#1
05-30-2008, 12:18 PM
CygnetGames (Programmer)
Join Date: May 2007
Location: York, England
Posts: 113
RSS feeds and cURL

Hi all,

I'm using cURL from within a PHP webpage to display some RSS feeds on my site. I'm displaying comics from the xkcd feed and the Dilbert feed. The xkcd feed works perfectly, but Dilbert does something very strange.

At certain times of day Dilbert works fine, but at other times it does a 302 redirect to a different page with an older comic on it. However, only cURL seems to get this 302 redirect. When you paste the url into Firefox, you get the current comic!

This is really confusing me as I'm not sure whether the problem is in my code or in the Dilbert RSS feed. It's happened consistently for the last three days though. It works fine from around 11:00pm (GMT) to around 11:00am (GMT), and does the redirect from 11am until 11pm. (The times are only very rough, I haven't tracked them down precisely.)

The url I am using for the Dilbert feed is:
Code:
http://feeds.feedburner.com/DilbertDailyStrip?format=xml
and the one it redirects to is:
Code:
http://feedproxy.feedburner.com/DilbertDailyStrip?format=xml
My cURL code is:
PHP Code:
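$ch = curl_init();

// Fetch the feed, return it as a string, and follow any redirects.
curl_setopt($ch, CURLOPT_URL, $feedURL);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

// Write cURL's verbose transfer log to a per-request file for debugging.
curl_setopt($ch, CURLOPT_VERBOSE, 1);
$out = fopen('out ' . urlencode($feedURL) . time() . '.txt', 'w');
curl_setopt($ch, CURLOPT_STDERR, $out);

$buffer = curl_exec($ch);

curl_close($ch);
fclose($out);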
Hopefully, someone may be able to shed some light on this - it's really puzzling me!

#2
05-30-2008, 12:39 PM
Jordan (Administrator)
Join Date: Nov 2005
Location: Hendersonville, NC
Posts: 9,203
Re: RSS feeds and cURL

How often are you connecting to it, and how many requests do you make each time? They may have some custom code that forwards IPs that connect too often to older RSS feeds. You can also try changing:

PHP Code:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
to

PHP Code:
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 0);
and see what happens.
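
One way to see what the server actually sends back is to read the status code and Location header directly, without following the redirect. The sketch below is not from the thread: `summarize_response_headers` and `fetch_response_summary` are hypothetical names, and it assumes PHP 7.1 or later with the cURL extension loaded.

```php
<?php
// Hypothetical helper: pull the status code and Location header
// out of a raw HTTP header block.
function summarize_response_headers(string $rawHeaders): array
{
    $status = null;
    $location = null;
    foreach (preg_split('/\r\n|\n/', $rawHeaders) as $line) {
        if ($status === null && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        } elseif (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            $location = trim($m[1]);
        }
    }
    return ['status' => $status, 'location' => $location];
}

// Hypothetical wrapper: request $url without following redirects and
// summarise the response headers.
function fetch_response_summary(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_HEADER, true); // prepend the response headers to the returned string
    $raw = curl_exec($ch);
    if ($raw === false) {
        return ['status' => null, 'location' => null];
    }
    // Keep only the header part of the response before parsing it.
    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    return summarize_response_headers(substr($raw, 0, $headerSize));
}
```

Calling `fetch_response_summary($feedURL)` at a time when the feed misbehaves should show whether a 302 and a Location header come back, which separates a server-side redirect from a problem in the parsing code.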

#3
05-30-2008, 06:43 PM
CygnetGames
Re: RSS feeds and cURL

I'm connecting once every time the page is refreshed by a client - is that too much? I was considering caching the feeds on my server and updating the cache if the page is refreshed and the cache is over a day old (or it's past whatever time the feeds are updated with new content, or something...).
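
The caching idea described above could look roughly like the following. This is a minimal sketch, not the code used on the site: `cached_feed`, the cache file path, and the `$fetch` callback are all hypothetical, and the maximum age is left as a parameter so it can be a day, an hour, or anything else.

```php
<?php
// Hypothetical helper: return the cached feed if it is younger than
// $maxAgeSeconds; otherwise call $fetch() and store the result.
function cached_feed(string $cacheFile, int $maxAgeSeconds, callable $fetch): string
{
    // Make sure the file checks below see the current state of the file.
    clearstatcache(true, $cacheFile);

    if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAgeSeconds) {
        return (string) file_get_contents($cacheFile);
    }

    $fresh = $fetch();
    if (is_string($fresh) && $fresh !== '') {
        // LOCK_EX avoids two simultaneous page loads interleaving their writes.
        file_put_contents($cacheFile, $fresh, LOCK_EX);
        return $fresh;
    }

    // The fetch failed: fall back to a stale copy if one exists.
    return is_file($cacheFile) ? (string) file_get_contents($cacheFile) : '';
}
```

With this shape, the remote feed is contacted at most once per cache period no matter how many visitors refresh the page, and a failed fetch does not wipe out the last good copy.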

It does the same thing with CURLOPT_FOLLOWLOCATION set to 0. That's how I had it set originally - my XML parsing code choked on the 302 page returned by cURL, which was how I discovered it was getting redirected.

It may well be that they are redirecting me for connecting too often.

#4
05-30-2008, 08:07 PM
Jordan
Re: RSS feeds and cURL

Like I said, I don't see anything wrong with your code, so I assume it is them blocking you. I think once every time someone connects is way too often.

#5
06-01-2008, 11:40 AM
CygnetGames
Re: RSS feeds and cURL

I was hoping that you were right about the blocking, but I've just tested it today and it's still happening.

I loaded the page at around 10:30am (GMT) this morning and it worked fine. Then I didn't touch it until just now (4:30pm), and it's redirected to the old comic.

It's not as if the old comic is a fixed amount of time before the new one either; it has been redirecting to the same comic - May 16 - for the last few days.

I will definitely cache it on my server though, so as not to connect to their feed too often.

#6
06-05-2008, 03:20 PM
CygnetGames
Re: RSS feeds and cURL

I fixed it!

You were half right, Jordan. They were blocking me, but not because I was connecting too often. It was because I had a blank user agent header. Apparently, some servers don't like that. When I spoofed my user agent as Firefox, it worked perfectly.

Thanks for the help.

Here is the additional code for anyone else with a similar problem:
PHP Code:
$useragent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1';
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
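
For anyone assembling the whole request, the user agent can be set alongside the other options in one place. The following is a sketch under assumptions rather than the exact code from the finished page: `feed_request_options` and `fetch_feed` are hypothetical names, and the redirect limit and timeout values are illustrative choices that do not come from the thread.

```php
<?php
// Hypothetical helper: build the cURL options for a feed request,
// including an explicit User-Agent header.
function feed_request_options(string $url, string $userAgent): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 5,   // illustrative limit, not from the thread
        CURLOPT_TIMEOUT        => 10,  // illustrative timeout in seconds
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

// Hypothetical wrapper: return the feed body, or false when the request
// fails or the final response is not a 200.
function fetch_feed(string $url, string $userAgent)
{
    $ch = curl_init();
    curl_setopt_array($ch, feed_request_options($url, $userAgent));
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return ($body !== false && $status === 200) ? $body : false;
}
```

Calling `fetch_feed($feedURL, $useragent)` with the Firefox string above would reproduce the fix described in this post. A descriptive string that identifies the site may work as well, but whether the Dilbert feed accepts one has not been tested here.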

#7
06-05-2008, 03:46 PM
Jordan

Good deal! How did you figure it out, trial and error?


#8
06-06-2008, 07:06 PM
CygnetGames
Re: RSS feeds and cURL

Quote:
Originally Posted by Jordan
Good deal! How did you figure it out, trial and error?
It was partly trial and error and partly random luck!

While looking for something different, I found someone on a forum having a similar problem: they were trying to scrape some content from another site and needed to spoof their user agent before the server would accept their requests. I wondered if my problem was the same, and it looks like it was.

Here is the finished, working version of the comics page. It pulls the latest comic from both Dilbert and xkcd, caching them every hour on my server.

I might make this into a tutorial if it continues to work for the next few days without showing any more problems! It's a nice, simple example of cURL and degradable Ajax.

#9
06-07-2008, 10:35 AM
John (Co-Administrator)
Join Date: Jul 2006
Posts: 3,432
Re: RSS feeds and cURL

Good job, and a tutorial would be awesome!
All times are GMT -5.