
Trying to make a groovy little cURL script but having some trouble with regex I think



#1
picklecake

    Newbie

  • Members
  • 3 posts
So yeah, I've been working on a script that will eventually download a course for me automatically from MIT OpenCourseWare, but I'm having some trouble with my second regex. It's looking for links to video lecture pages. As far as I know I'm not having trouble with cURL, but I might be mistaken. I tried my regex on myregextester.com and it worked, so I'm kind of puzzled here.

The regex in question is this:
$pattern2 = '/row"><td>(\d)<\/td><td><a href="([^"]+)">([^<]+)<\/a>(?:<br \/><br \/><a href="([^"]+)">([^<]+))?/';

Any help with this, even a suggestion for doing it differently, would be appreciated. My PHP file is attached.




#2
InitVI

    Newbie

  • Members
  • 11 posts
I don't know whether your cURL is working or not. I did make some changes to your regex and foreach statement.

When I did this script I just used file_get_contents to test your regex. Your regex was working (kind of). It wasn't getting all of the links. But here's what I got. Adapt as needed.


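<?php

    // Fetch the video-lectures index page (file_get_contents instead of cURL, just to test the regex).
    $htmlpreparse = file_get_contents('http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-002-circuits-and-electronics-spring-2007/video-lectures/');

    // Groups 1-2: lecture number (optional first digit, then last digit); group 3: href; group 4: link text.
    $pattern2 = '/row"><td>(\d)?(\d)<\/td><td><a href="([^"]+)">([^<]+)<\/a>(?:(<br \/>)?<br \/><a href="([^"]+)">([^<]+))?/';

    preg_match_all($pattern2, $htmlpreparse, $matchtwo, PREG_SET_ORDER);

    foreach ($matchtwo as $lecture) {

        // Prepend the host to the captured href (group 3).
        $dl_url = 'http://ocw.mit.edu' . $lecture[3];

        echo $dl_url;

        // do download stuff using $dl_url as the url :)

    }

?>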


RegExr is what I use for my regex stuff.

#3
picklecake

    Newbie

  • Members
  • 3 posts

Thanks for trying to solve my problem, but I'm still not getting anything in my $matchtwo array after preg_match_all(). I did forget that the videos could have a double-digit number, and I changed that, but is there anything else different? I'm thinking about just forgetting PHP and doing this in Perl, but maybe someone can help me here. Are you guys getting different results from this script?
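
For anyone following along, here is a minimal sketch of the double-digit issue mentioned above. The two table rows are made-up stand-ins for the real OCW markup (an assumption, not the actual page), and the patterns are shortened versions of the one in this thread. It only shows how `(\d)` and `(\d+)` behave differently once a lecture number reaches 10.

```php
<?php
// Hypothetical table rows standing in for the real OCW markup (an assumption).
$html = '<tr class="row"><td>9</td><td><a href="/courses/demo/lecture-9/">Lecture 9</a></td></tr>'
      . '<tr class="row"><td>10</td><td><a href="/courses/demo/lecture-10/">Lecture 10</a></td></tr>';

// (\d) matches exactly one digit, so the row numbered 10 is skipped.
$single = '/row"><td>(\d)<\/td><td><a href="([^"]+)">([^<]+)<\/a>/';
// (\d+) matches one or more digits, so both rows match.
$multi  = '/row"><td>(\d+)<\/td><td><a href="([^"]+)">([^<]+)<\/a>/';

$singleCount = preg_match_all($single, $html, $singleMatches, PREG_SET_ORDER);
$multiCount  = preg_match_all($multi, $html, $multiMatches, PREG_SET_ORDER);

echo "single-digit pattern: $singleCount match(es)\n";
echo "multi-digit pattern:  $multiCount match(es)\n";

foreach ($multiMatches as $lecture) {
    // With PREG_SET_ORDER: [1] = lecture number, [2] = href, [3] = link text.
    echo $lecture[1], ' => ', $lecture[2], "\n";
}
```

Using `(\d+)` keeps the number in a single capture group, so the href stays at index 2 instead of shifting to index 3 as it does with the `(\d)?(\d)` variant.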

#4
picklecake

    Newbie

  • Members
  • 3 posts
Haha. I think I found the problem. For some reason cURL isn't returning a string from curl_exec() even though I set the CURLOPT_RETURNTRANSFER option. So it is a cURL problem. Other than that, I think everything else is working now, even with my other mistakes.
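
A rough sketch of how the cURL side could be checked, since the attached script isn't shown here. `fetch_page` is a hypothetical helper name, and following redirects is an assumption about what the page might need. The point is just that curl_exec() returns false on failure even when CURLOPT_RETURNTRANSFER is set, so the result has to be tested before it goes to preg_match_all().

```php
<?php
// Hypothetical helper (not from the attached script): fetch a URL and return
// the body as a string, or throw if cURL reports a failure.
function fetch_page(string $url): string
{
    $ch = curl_init($url);
    if ($ch === false) {
        throw new RuntimeException('curl_init() failed');
    }

    // Without this option curl_exec() prints the body and returns true.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Assumption: the page may answer with a redirect, so follow it.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_exec() returns false on failure even with RETURNTRANSFER set.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    return $body;
}
```

The returned string could then be used in place of the file_get_contents() result, e.g. assigned to $htmlpreparse before the regex runs.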

#5
John

    Writes binary right handed and hex left handed

  • Moderators
  • 6,321 posts
I'd highly recommend using the PHP Simple HTML DOM Parser (on SourceForge.net) to parse the DOM.

#6
Guest_johnny.dacu_*

  • Guests

John said:

I'd highly recommend using the PHP Simple HTML DOM Parser (on SourceForge.net) to parse the DOM.

Or use DOMDocument from PHP core. It works even if the document is not valid (it will output some notices). You can select elements by id or by another attribute...
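
A minimal sketch of the DOMDocument approach, for illustration only. The markup below is invented (rows with a "row" class and unclosed tr tags are an assumption about the page's shape), and the XPath query is one possible way to select by attribute, not necessarily what the real page needs.

```php
<?php
// Hypothetical, slightly sloppy markup (unclosed <tr> tags, no <html>/<body>)
// standing in for the real page.
$html = '<table><tr class="row"><td>1</td><td><a href="/courses/demo/lecture-1/">Lecture 1</a></td>'
      . '<tr class="row"><td>10</td><td><a href="/courses/demo/lecture-10/">Lecture 10</a></td></table>';

// Collect libxml parse errors internally instead of emitting them as PHP messages.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every link inside a table row whose class attribute is "row".
$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//tr[@class="row"]//a[@href]') as $a) {
    $links[] = 'http://ocw.mit.edu' . $a->getAttribute('href');
}

print_r($links);
```

Compared with the regex, this doesn't depend on the exact spacing or order of the tags, only on the row class and the presence of a link.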

#7
InitVI

    Newbie

  • Members
  • 11 posts

picklecake said:

Thanks for trying to solve my problem, but I'm still not getting anything in my $matchtwo array after preg_match_all(). I did forget that the videos could have a double-digit number, and I changed that, but is there anything else different? I'm thinking about just forgetting PHP and doing this in Perl, but maybe someone can help me here. Are you guys getting different results from this script?


I added a line break to the foreach to make the output look clean. But it works for me.

This is the 2nd computer I've tested on as well.

http://img94.imagesh...reenshot3yk.png