bkd
March 4th, 2006, 10:58 PM
Hi -
I need some help parsing Amazon.com HTML. I've written a Perl script to pull out the Book Description/Book Info section, but the page format has changed. I need this for Amazon.com only.
Here's what I used to have:
if ($page =~ /<a name="15659242665000">(.*)<\/a>/) {
    $junk = $1;
}
$page already contains the HTML source by the time the if statement is executed.
What I need to end up with is an HTML string starting with the <b>Book Description tag and ending just before the next <img src tag after Book Info.
The snippet I need should be as short as the one above - I'm rusty on my regular expressions and just haven't had enough time to hack it.
Here are a couple of ISBNs to play with: 1565924266 and 0596000278.
Will pay $10-$15 via PayPal - email bkdurham (at) gmail.com with any questions etc. I will post here after the solution has been accepted (so you other folks don't try to send me solutions).
Thanks -
Brian
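
For anyone picking this up, here is a rough sketch of the kind of match described above, not a tested solution against the live pages. It assumes the literal strings "<b>Book Description" and "Book Info" appear in the page in that order; the sub name extract_book_section and the sample markup are made up for illustration and are not real Amazon HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the HTML from "<b>Book Description" up to
# (but not including) the first "<img src" that follows "Book Info".
sub extract_book_section {
    my ($page) = @_;
    # /s lets "." match newlines; ".*?" is non-greedy, so each part stops
    # at the earliest possible point. The lookahead keeps "<img src" out
    # of the captured text.
    if ($page =~ /(<b>Book Description.*?Book Info.*?)(?=<img\s+src)/s) {
        return $1;
    }
    return;    # undef in scalar context when the markers are not found
}

# Made-up sample markup for illustration only (an assumption).
my $page = <<'HTML';
<div><img src="top.gif">
<b>Book Description</b><br>
A book about Perl.
<b>Book Info</b><br>
Covers regular expressions.
<img src="spacer.gif"><p>Other stuff</p>
<img src="later.gif">
HTML

my $junk = extract_book_section($page);
print defined $junk ? $junk : "no match\n";
```

The two changes from the old pattern are the /s modifier, so the match can span multiple lines, and the non-greedy .*? so it stops at the first <img src after Book Info rather than the last one on the page. If the real page uses different casing or extra attributes on the tags, the pattern would need adjusting (for example, adding /i).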