Rip data from a page

  • Thread starter: Anonymous (Guest)
Is there a (simple) script that can automatically rip the text between two specific pieces of text in the source of another website?
I want to rip the text between the two comment tags, for example:
Code:
RIP THIS NEWS

(if this is the wrong forum to ask this, please let me know which forum is suited for queries like this)
 
Read the page line by line and apply this condition (pseudocode):
Code:
if ($line=="") {
  while($line !="") {
    echo "$line";
     }

}
What this code does: when it first comes across the opening line, it enters the if condition, and the while loop then runs until the closing line is found.
Inside the while loop you keep reading the page's lines!
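The line-by-line idea above could be sketched roughly like this. It is only a sketch under assumptions: the real comment tags are not shown in the thread, so `START_TAG` and `END_TAG` are hypothetical placeholders, and `extract_between_lines()` is an illustrative helper name. It also assumes each tag sits on a line of its own.

```php
<?php
// Hypothetical delimiters; the thread does not show the real ones.
const START_TAG = '<!-- start -->';
const END_TAG   = '<!-- end -->';

/**
 * Collect the lines found between a line that is exactly START_TAG
 * and a line that is exactly END_TAG (compared after trim()).
 */
function extract_between_lines(array $lines): array
{
    $inside = false;
    $kept = [];
    foreach ($lines as $line) {
        $trimmed = trim($line);
        if (!$inside) {
            // Still looking for the opening tag.
            if ($trimmed === START_TAG) {
                $inside = true;
            }
            continue;
        }
        if ($trimmed === END_TAG) {
            break; // Closing tag reached: this is the exit condition.
        }
        $kept[] = rtrim($line, "\r\n");
    }
    return $kept;
}

$page = "<p>intro</p>\n<!-- start -->\nFirst item\nSecond item\n<!-- end -->\n<p>footer</p>\n";
$news = extract_between_lines(explode("\n", $page));
echo implode("\n", $news), "\n";
```

The `$inside` flag replaces the nested while loop: a single pass over the lines, with a `break` as soon as the closing tag turns up.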
 
sigix said:
Code:
if ($line=="") {
  while($line !="") {
    echo "$line";
     }

}

What if the tag isn't on a line all by itself? Even if there's some whitespace, your script will fail. No, what you need to do is use strpos() to find the first instance of the opening tag (and remember its position), and then again to find the location of the closing tag. Then use substr() to grab just that text. Something like this (beware, thoroughly untested):

Code:
<?php
$content = ''; /* your HTML input... */
$start_delim = ''; /* the opening comment tag */
$end_delim = '';   /* the closing comment tag */

/* find the index of the first character after the start
   delimiter
*/
$start_pos = strpos($content, $start_delim) + strlen($start_delim);

/* find the number of characters between $start_pos and
   the beginning of the end delimiter (search from $start_pos
   so an earlier copy of the end delimiter is ignored)
*/
$content_len = strpos($content, $end_delim, $start_pos) - $start_pos;

/* extract what's between the delimiters (trim for
   good measure)
*/
$content = trim(substr($content, $start_pos, $content_len));

echo $content;
?>

You will, of course, want to do some error-checking to make sure the delimiters are present, etc.
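For what it's worth, the error-checking mentioned above might look something like this. Treat it as a sketch: `text_between()` is a made-up helper name, and the `<!-- start -->` / `<!-- end -->` delimiters in the demo are assumptions, since the real tags are not shown in the thread. It relies on `strpos()` returning `false` (not `0`) when nothing is found, hence the strict `===` comparisons.

```php
<?php
/**
 * Return the trimmed text between $start_delim and $end_delim,
 * or null when either delimiter is empty or missing.
 */
function text_between(string $content, string $start_delim, string $end_delim): ?string
{
    if ($start_delim === '' || $end_delim === '') {
        return null;
    }
    $start = strpos($content, $start_delim);
    if ($start === false) {
        return null; // opening delimiter not present
    }
    $start += strlen($start_delim);

    // Search for the closing delimiter only after the opening one.
    $end = strpos($content, $end_delim, $start);
    if ($end === false) {
        return null; // closing delimiter not present
    }
    return trim(substr($content, $start, $end - $start));
}

// Hypothetical input and delimiters, for illustration only.
$html = '<p>old</p><!-- start --> RIP THIS NEWS <!-- end --><p>more</p>';
var_dump(text_between($html, '<!-- start -->', '<!-- end -->'));
var_dump(text_between($html, '<!-- missing -->', '<!-- end -->'));
```

Returning `null` rather than an empty string lets the caller tell "delimiters not found" apart from "nothing between the delimiters".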
 
I only wrote pseudocode :(
As written it would be an endless loop, since there is no condition for getting out of it.
I just wanted to give an idea of how to do it. :!:
 
Ok, I use fopen() to get the HTML input, but then I get the following error:
Code:
Warning: fopen(): URL file-access is disabled in the server configuration
The people who configured this say hackers can damage the server with "php-code-injection", which is why they disabled it. :wink:
Can anyone explain what they mean by that? I don't believe this is possible...
 
Hi,

If you do not have permission to write, modify, or read the files, then you probably can't do much.

You need to set the permissions first.
 
So can anyone explain what a "php-code-injection" is, and what it has to do with the fopen() command?
 
Basically, if you leave a hole in your code that allows someone to specify an arbitrary URL for your fopen() command (via GET or POST) and then execute that code, they could cause it to retrieve and execute a malicious script from their server. This sort of attack is pretty common, and your BOFHs are wise to stop it at the source.
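One common way to close that kind of hole, on servers where URL access is enabled, is to never pass request input straight to fopen() and to check it against a fixed list of hosts first. A rough sketch, with assumptions flagged: `ALLOWED_HOSTS` and `is_allowed_url()` are hypothetical names, and the host list is a placeholder.

```php
<?php
// Hypothetical allowlist; replace with the hosts you actually trust.
const ALLOWED_HOSTS = ['www.example.com'];

/**
 * Accept only http/https URLs whose host is on the allowlist.
 */
function is_allowed_url(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }
    $scheme = strtolower($parts['scheme']);
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false; // rejects php://, file://, ftp:// and so on
    }
    return in_array(strtolower($parts['host']), ALLOWED_HOSTS, true);
}

var_dump(is_allowed_url('http://www.example.com/news.html'));
var_dump(is_allowed_url('http://attacker.example.net/evil.txt'));
var_dump(is_allowed_url('php://input'));
```

This only decides whether a URL is worth opening at all; whatever comes back should still be treated as untrusted text and never executed.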
 
Look at this for how to strip data between two tags:
Code:
<?php
$file = fopen ("http://www.example.com/", "r");
if (!$file) {
    echo "<p>Unable to open remote file.\n";
    exit;
}
while (!feof ($file)) {
    $line = fgets ($file, 1024);
    /* This only works if the title and its tags are on one line
       (the i modifier makes the match case-insensitive) */
    if (preg_match ('/<title>(.*)<\/title>/i', $line, $out)) {
        $title = $out[1];
        break;
    }
}
fclose($file);
?>
:wink:
http://se2.php.net/manual/en/features.remote-files.php
 
Sorry to say, this fails if the opening tag and closing tag aren't on the same line. The regex would work, I think, if you put all of the lines into a single string before matching (with preg_match() you would also need the s modifier so that "." matches newlines). Bummer, that. I'd be interested to know, however, whether the regex method or the strpos() method is faster.
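A sketch of the single-string variant, for illustration: the lines are joined first and the pattern is applied once. `extract_title()` is a hypothetical helper name and the sample markup is made up; on a server where URL access is enabled, file_get_contents() would give you the whole page as one string directly.

```php
<?php
/**
 * Return the contents of the first <title> element in $html,
 * or null if there is none. Works across line breaks.
 */
function extract_title(string $html): ?string
{
    // i: case-insensitive, s: "." also matches newlines, .*? is non-greedy
    if (preg_match('/<title>(.*?)<\/title>/is', $html, $out)) {
        return trim($out[1]);
    }
    return null;
}

// Made-up sample: the tags are spread over several lines.
$lines = ["<html><head><TITLE>\n", "  Example Domain\n", "</TITLE></head></html>\n"];
$html = implode('', $lines);
var_dump(extract_title($html));
```

The non-greedy `.*?` matters once everything is in one string: a greedy `.*` would run to the last closing tag in the document rather than the first.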
 
Yep, the example I posted (link given) parses the file line by line, so you have to build one string before making that call...
 