How do I grab information off a webpage?

Anonymous

Guest
I am trying to write a function that connects to a site, reads its source code, and returns the description and the title of the page.

Example:

Let's say this is our function: get_description($url)
Then, calling get_description("http://www.excite.com") should return the following text:
"Excite is the leading personalization Web portal, featuring world-class search, content and functionality. From financial portfolios to sports scores, local weather forecasts to movie listings, Excite gathers what matters most to you every day. It's like your very own online personal assitant."

See the description META tag at http://www.excite.com if this example does not make sense yet.

Your help is greatly appreciated, and your function code even more so.

Best Regards,
Med
madmed@dotnetos.com
 
Read the manual on File system functions. You're looking for a function called fread().

You'll then also want to look up Regular Expressions in the manual, so you can work out how to filter out everything except what's between the tags you specifically want.

That should get you started
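
As a rough sketch of the regular-expression step, assuming the page source has already been read into a string (for example with fread()), something like the following could pull out the title and the description. The function name extract_title_and_description() and the sample HTML are made up for illustration, and the pattern only handles a META tag whose name attribute comes before content and whose content value is quoted.

```php
<?php
// Hypothetical helper: takes HTML source as a string and returns the title and
// the description META content (null for whichever one is not found).
function extract_title_and_description($html)
{
    $result = array('title' => null, 'description' => null);

    // Title: everything between <title> and </title>, case-insensitive.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $result['title'] = trim($m[1]);
    }

    // Description: <meta name="description" content="...">.
    // Assumes name comes before content and the content value is quoted.
    $pattern = '/<meta\s+name\s*=\s*["\']?description["\']?\s+content\s*=\s*(["\'])(.*?)\1/is';
    if (preg_match($pattern, $html, $m)) {
        $result['description'] = trim($m[2]);
    }

    return $result;
}

// Made-up sample page, used only to show the call.
$sample = '<html><head><TITLE>My Excite Start Page</TITLE>'
        . '<META NAME="description" CONTENT="Excite is the leading personalization Web portal.">'
        . '</head><body></body></html>';

$info = extract_title_and_description($sample);
echo $info['title'], "\n";
echo $info['description'], "\n";
```

Real pages vary a lot (attribute order, unquoted values, missing tags), so treat the patterns as a starting point rather than a complete parser.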
 
I've got this so far:

Code:
<?php
$host = "www.excite.com";
$fp = fsockopen($host, 80, $errno, $errstr, 30);
$sitecode = "";
$description = "";   // stays empty if no description tag is found
if (!$fp) {
    echo "$errstr ($errno)<br>\n";
} else {
    $document = "";
    fputs($fp, "GET /" . $document . " HTTP/1.0\r\nHost: " . $host . "\r\n\r\n");
    // Read the whole response (headers and body) into $sitecode.
    while (!feof($fp)) {
        $sitecode .= fgets($fp, 128);
    }
    fclose($fp);

    // Keep only what sits between <head> and </head>.
    // explode() is case-sensitive, so this expects lowercase head tags.
    $codepart = explode("</head>", $sitecode);
    $codepart = explode("<head>", $codepart[0]);
    $head = isset($codepart[1]) ? $codepart[1] : "";

    // Split the head into pieces that each follow a "<META" (uppercase only).
    $meta = explode("<META", $head);

    for ($i = 1; $i < count($meta); $i++) {
        if (stristr($meta[$i], "name=description") or stristr($meta[$i], "name=\"description\"")) {
            $metadesc = explode(">", $meta[$i]);
            $content  = explode("=", $metadesc[0]);
            for ($j = 0; $j < count($content); $j++) {
                // The piece after the one containing "content" holds the quoted value.
                if (stristr($content[$j], "content") and isset($content[$j + 1])) {
                    $target = explode("\"", $content[$j + 1]);
                    $description = isset($target[1]) ? $target[1] : "";
                    break;
                }
            }
        }
    }

    echo "<br>Description starts here <br>";
    echo $description;
    echo "<br>Description ends here <br>";
}
?>

It kind of works, but only weakly. I am not too concerned with the data or how to filter it; my main problem is handling sites that redirect you to others. I will be using this code with "jump" pages mainly.

Thanks for your help and inputs!

Med
madmed@dotnetos.com
 
MadMed said:
my main problem is handling sites that redirect you to others. I will be using this code with "jump" pages mainly.

Thanks for your help and inputs!

Med
madmed@dotnetos.com
That would really depend on how the sites are redirecting you. You see, a webpage is just text, formatted to HTML standards and available at a specific URL. This is how you access and read it. Your browser reads the text, interprets the HTML commands (tags etc.), and displays it accordingly.

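If the sites are using HTTP headers to redirect you (a 3xx status with a Location header), there was never a page there to begin with, so there is nothing to parse at that address; your script has to read the Location header and request that URL instead. If they're using JavaScript or meta-refresh tags, those are only actioned by a browser, so your script will simply receive the jump page itself and can read it like any other page.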
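
For what it's worth, a header redirect still carries the target address in a Location header, and a meta-refresh tag carries it in its content attribute, so a script can usually find the next URL in either case. Here is a minimal sketch that works on a raw response like the one fsockopen() returns in the code posted earlier. The function names find_header_redirect() and find_meta_refresh() and the sample responses are invented for illustration; the sketch assumes an absolute URL and a conventionally written meta-refresh tag, and it does not attempt JavaScript redirects.

```php
<?php
// Hypothetical helper: returns the Location target when the raw HTTP response
// is a 3xx redirect, or null otherwise.
function find_header_redirect($rawResponse)
{
    // The headers end at the first blank line.
    $parts = preg_split('/\r?\n\r?\n/', $rawResponse, 2);
    $headers = $parts[0];

    // The status line looks like "HTTP/1.0 302 Found".
    if (!preg_match('/^HTTP\/\d\.\d\s+3\d\d\b/', $headers)) {
        return null;
    }
    if (preg_match('/^Location:\s*(\S+)/mi', $headers, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: returns the URL from a meta-refresh tag, or null.
// Assumes http-equiv comes before content and the URL itself is not quoted.
function find_meta_refresh($html)
{
    $pattern = '/<meta\s+http-equiv\s*=\s*["\']?refresh["\']?\s+content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*([^"\'>\s]+)/i';
    if (preg_match($pattern, $html, $m)) {
        return $m[1];
    }
    return null;
}

// Made-up responses, used only to show the calls.
$response = "HTTP/1.0 302 Found\r\nLocation: http://www.example.com/new\r\n\r\n";
echo find_header_redirect($response), "\n";

$jumpPage = '<html><head><meta http-equiv="refresh" content="0; url=http://www.example.com/target"></head></html>';
echo find_meta_refresh($jumpPage), "\n";
```

Either function gives you a URL that can be fed back into the same fetch code. A real script would probably also want a limit on how many redirects it follows, so two pages pointing at each other can't loop forever.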
 