Scraping a webpage



themitch said:
Hi all

I am looking to scrape a webpage and e-mail myself when it has been updated. I think, to do this, I should use file_get_contents, say, and just save the contents into a MySQL database, then check it every hour or so; if it has changed, I can e-mail myself.

Firstly, does this sound possible?

Secondly, how do I do the hourly thing? Is that a 'cron' job?

Finally, the page has, I think, HTTP authentication (an explorer popup appears asking for a username and password). I obviously have a username and password, but how do I pass that through to the file_get_contents function?

Thanks in advance for your help.

1. Two ways:
a) Read the file into an array, where each line of the file becomes one element of the array (e.g. with file()).
b) Use the output control functions, the ones whose names start with ob_.
2. If you are on *NIX, use the cron application.
3. You need to see how the info is retrieved. You could try passing it in the query string, something like ?user=<user>&password=<password>.
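For point 2, the hourly run is a single line in your crontab (edit it with `crontab -e`); the paths to the PHP binary and to your script below are assumptions, so adjust them for your system:

```shell
# Run the page-checker script at minute 0 of every hour.
0 * * * * /usr/bin/php /home/you/check_page.php
```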
Sounds possible... yes, the hourly thing is a cron job on *nix and the Task Scheduler on Windows.

The idea about file_get_contents is a good one, actually, as you can compare the two versions as two variables, the old one with the new one... then use mail() to send yourself an email :)
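A minimal sketch of that fetch-compare-mail loop. The URL, e-mail address, and state-file path are placeholders, and for brevity it stores a hash of the page in a file rather than the full contents in MySQL:

```php
<?php
// Sketch: fetch a page, compare it with what was saved last time, and
// e-mail yourself if it changed. URL, address, and file path are
// made-up examples -- adjust for your setup.

// True when the fetched content no longer matches the stored hash.
function hasChanged(string $content, string $storedHash): bool
{
    return hash('sha256', $content) !== $storedHash;
}

function checkPage(string $url, string $stateFile): void
{
    $content = file_get_contents($url);
    if ($content === false) {
        return; // fetch failed; just try again on the next cron run
    }

    $oldHash = is_file($stateFile) ? trim(file_get_contents($stateFile)) : '';

    if (hasChanged($content, $oldHash)) {
        // Remember the new state, then notify yourself.
        file_put_contents($stateFile, hash('sha256', $content));
        mail('you@example.com', 'Page updated', "The page at $url has changed.");
    }
}

// checkPage('http://example.com/watched-page.html', '/tmp/watched-page.hash');
```

Storing only a hash keeps the state small; if you want to see *what* changed, save the full contents (in MySQL, as you planned) and diff the old and new versions instead.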

Passing them through would most likely be done by putting the username and password in the URL:

This is exactly like passing an FTP login (user:password@host).
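Both styles, sketched below with made-up host and credentials. The URL form works because PHP's HTTP stream wrapper accepts user:password@host, just like an FTP URL; the stream-context form sends an explicit Authorization header instead, which keeps the password out of the URL:

```php
<?php
// HTTP Basic auth with file_get_contents(). Credentials and URL are
// placeholders.

// Build the Basic Authorization header value for a user/password pair.
function basicAuthHeader(string $user, string $pass): string
{
    return 'Authorization: Basic ' . base64_encode("$user:$pass");
}

// Option 1: credentials embedded in the URL, FTP-style.
// $html = file_get_contents('http://user:secret@example.com/protected.html');

// Option 2: an explicit header via a stream context.
function fetchWithBasicAuth(string $url, string $user, string $pass)
{
    $context = stream_context_create([
        'http' => ['header' => basicAuthHeader($user, $pass)],
    ]);
    return file_get_contents($url, false, $context); // string, or false on failure
}

// $html = fetchWithBasicAuth('http://example.com/protected.html', 'user', 'secret');
```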