Getting <title>


#1
Bioshox

    Programming Professional

  • Members
  • 207 posts
Hey guys,

I recently developed some code that lets me get the title of a URL posted into a form. It works for most websites but not for others, and for one that I came across in beta testing it outputs the error below.

Yes, I know it's a strange thing to be typing in, but I was testing my blocking filters, since websites like these aren't allowed on the site, and it output this error, which is interesting:

Warning: file_get_contents((snip)) [function.file-get-contents]: failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in /home/hrefdir/public_html/submit.php on line 71

This seems to be the only URL I've tried so far that it happens with. The code I am using is:

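function getTitle($link){
    // file_get_contents() raises a warning and returns false when the request fails
    $str = file_get_contents($link);
    if($str !== false && strlen($str)>0){
        // Non-greedy, case-insensitive match that also spans line breaks
        if(preg_match("/<title[^>]*>(.*?)<\/title>/is",$str,$title)){
            return trim($title[1]);
        }
    }
    return "";
}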
<?php 
$gettitle = getTitle($url);
if($gettitle == ""){
?>
                <input type="text" name="title" class="submitform" value="Website Title" onFocus="if(this.value == 'Website Title') {this.value = '';}" onBlur="if (this.value == '') {this.value = 'Website Title';}" />
<?php
}else{
?>
                <input type="text" name="title" class="submitform" value="<?php echo htmlspecialchars($gettitle, ENT_QUOTES); ?>" />
<?php
}
?>
The site I'm using it on is The Hyperlink Directory - The ultimate web directory and backlink creator.

Try entering a URL, and you will see the title of the page come up automatically.

Any help on this would be appreciated.

Also, when I enter a URL that doesn't exist, I get these errors:

Warning: file_get_contents() [function.file-get-contents]: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/hrefdir/public_html/submit.php on line 71

Warning: file_get_contents(http://oswgvgdewgewegcall.net) [function.file-get-contents]: failed to open stream: php_network_getaddresses: getaddrinfo failed: Name or service not known in /home/hrefdir/public_html/submit.php on line 71

Any idea on how I can change this?

Edited by Alexander, 05 April 2011 - 06:57 PM.
(Removed unintentional hyperlinking to porn site)


#2
Alexander

    It's Science!

  • Moderators
  • 4,124 posts
  • Location:Vancouver, Eh! Cleverness: 200
If the site is accessible to you but not to your server, then, assuming the server has proper connectivity, the site most likely has something in place to stop bots from spidering its content.

Try adding this at the top of your script, if you are allowed to modify settings at run time:
ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']);
You can also view the headers of any given website with print_r( get_headers($urlname, 1) );. This may be useful for debugging what your application does and where it fails on a valid address.
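
To act on those headers rather than just print them, something like the following could pull out the final status code. This is only a sketch: finalStatusCode is a hypothetical helper name, and it assumes the array comes from get_headers(), where each response in a redirect chain contributes its own "HTTP/..." status line.

```php
<?php
// Hypothetical helper: returns the status code of the last response
// found in a get_headers() result, or null if no status line is present.
function finalStatusCode(array $headers)
{
    $code = null;
    foreach ($headers as $value) {
        // With get_headers($url, 1) some values are arrays; skip those.
        if (is_string($value) && preg_match('#^HTTP/\S+\s+(\d{3})#', $value, $m)) {
            $code = (int) $m[1]; // later status lines overwrite earlier ones
        }
    }
    return $code;
}

// Usage (needs network access, so it is left commented out):
// $headers = @get_headers($urlname, 1);
// $status  = ($headers === false) ? null : finalStatusCode($headers);
```

A null result means the request itself failed (for example, the host name did not resolve), while 404 or 403 means the server answered but refused the page.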

Note: file_get_contents goes through the same stream wrappers as fopen, so you may need to use fopen with a stream context, fsockopen, or cURL instead, as they can give finer control over how the content is retrieved (timeout, etc.).
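
As one way to get that finer control without leaving file_get_contents, a stream context can carry the user agent and a timeout per request. The sketch below is an assumption-laden example, not the only way to do it: buildHttpOptions and fetchPage are hypothetical names, and the 5 second default is arbitrary.

```php
<?php
// Hypothetical helper: builds the "http" options for a stream context.
function buildHttpOptions($userAgent, $timeoutSeconds = 5)
{
    return array(
        'http' => array(
            'method'     => 'GET',
            'user_agent' => $userAgent,      // same idea as ini_set('user_agent', ...)
            'timeout'    => $timeoutSeconds, // read timeout in seconds
        ),
    );
}

// Hypothetical helper: returns the page body, or false if the request failed.
function fetchPage($url, array $options)
{
    $context = stream_context_create($options);
    // @ suppresses the warning; the caller checks for false instead.
    return @file_get_contents($url, false, $context);
}
```

Calling fetchPage($url, buildHttpOptions($_SERVER['HTTP_USER_AGENT'])) and testing the result against false replaces the warnings with an ordinary return value the form can react to.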
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#3
Bioshox

    Programming Professional

  • Members
  • 207 posts
Sweet! That's fixed one or two issues, although there are still a few websites the script can't connect to. I've now set up special error handling for when the script can't connect. One of the sites I can't connect to is: http://creattica.com/

And the modified script is now this:

config.php (global, included in all files across the script)

if($user_agent == 'browser') ini_set('user_agent', $_SERVER['HTTP_USER_AGENT']); 

submit.php
//check the website connects
$checkurl = $url;
$handle = @fopen($checkurl,'r');
if($handle !== false){
    //Run this if the website connected
    function getTitle($link){
        $str = file_get_contents($link);
        if(strlen($str)>0){
            preg_match("/\<title\>(.*)\<\/title\>/",$str,$title);
            return $title[1];
        }
    }


#4
Alexander

    It's Science!

  • Moderators
  • 4,124 posts
  • Location:Vancouver, Eh! Cleverness: 200
In this specific case, the main page infinitely redirects to itself. Your browser may be smart enough to give up after 20 or so redirects; PHP's fopen, called without a stream context, will likewise fail with an error, so it will take some hacking to get this address's contents. In fact, it keeps redirecting to http://creattica/, which is invalid. I am unsure what on earth they did, but that was not smart.
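
To see where a site like this is sending you, one option is to switch off redirect following and read the Location header yourself. This is a rough sketch under a few assumptions: redirectTarget and peekRedirect are hypothetical names, and it relies on the http wrapper exposing the raw response header lines through stream_get_meta_data().

```php
<?php
// Hypothetical helper: returns the value of the first Location header
// in a list of raw header lines, or null if there is none.
function redirectTarget(array $headerLines)
{
    foreach ($headerLines as $line) {
        if (is_string($line) && stripos($line, 'Location:') === 0) {
            return trim(substr($line, strlen('Location:')));
        }
    }
    return null;
}

// Hypothetical helper: opens the URL without following redirects and
// reports where it wanted to send us (null if nowhere, false on failure).
function peekRedirect($url)
{
    $context = stream_context_create(array(
        'http' => array('follow_location' => 0),
    ));
    $handle = @fopen($url, 'r', false, $context);
    if ($handle === false) {
        return false;
    }
    $meta = stream_get_meta_data($handle);
    fclose($handle);
    $lines = isset($meta['wrapper_data']) ? $meta['wrapper_data'] : array();
    return redirectTarget($lines);
}
```

If peekRedirect returns the same address you asked for, or a host that cannot exist, the script can give up immediately instead of waiting for the redirect limit to be hit.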

For safety's sake, if fopen fails I would bail out of the function early and return false, i.e.
$handle = @fopen($checkurl, "r");
if($handle === false) return false;
You can then check whether the function returned something with this:
if(!getTitle($url)) {
  echo "Could not get page title from website";
}
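
For that check to be reliable, the title parsing itself has to return false when nothing is found. A minimal sketch of the parsing step on its own follows; extractTitle is a hypothetical name, and the regex approach is kept only because the thread already uses it (a real HTML parser would be more robust).

```php
<?php
// Hypothetical helper: returns the page title from an HTML string,
// or false when there is no usable <title> element.
function extractTitle($html)
{
    if (!is_string($html) || $html === '') {
        return false;
    }
    // i = case-insensitive, s = dot matches newlines, .*? = stop at the first </title>
    if (!preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        return false;
    }
    $title = html_entity_decode($m[1], ENT_QUOTES, 'UTF-8');
    $title = trim(preg_replace('/\s+/', ' ', $title)); // collapse line breaks and runs of spaces
    return $title === '' ? false : $title;
}
```

Keeping the fetch and the parse separate means a failed connection and a page without a title both end up as false, so a single if(!...) test covers both cases.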

Your previous post appears to use both fopen and file_get_contents; was that just a paraphrase? fopen could replace file_get_contents completely.