Scraping Amazon's Wishlist

Sat, 21st May 2016

I wanted to keep track of the prices on my Amazon wishlist, in case something I was after suddenly dipped in price (as they do) and I could grab myself a bargain. I shoved together a little script in PHP that extracts the items and prices, compares them to the previously recorded prices, and emails me if a price has dropped.

It's not a complicated script and doesn't do anything fancy (like tracking the original price). It notifies me whenever a price drops compared to the previous price it recorded. I was going to keep track of the lowest prices instead, but I figured that a lowest price might be a one-off, and the script would then only ever notify me once. And while this script does notify for any price drop, I only receive one or two emails per day, so it's not spamming me.

If you use this script as part of a scheduled job (such as cron), I would recommend not calling it more than once an hour, at most. I've read suggestions that Amazon could blacklist your IP address if it thinks you're attacking the site. Once an hour is plenty.

Note: this script has only been run against Amazon UK. You may need to modify the $url to suit your own needs.
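The comparison described above (notify only when the current price is lower than the last recorded one) can be sketched on its own, without any scraping. This is a minimal illustration, not code lifted from the script: the function name findPriceDrops and the sample data are made up, and it assumes that items without a usable price are stored as 0.

```php
<?php
// Hypothetical helper, not part of the original script: returns the items
// whose current price is lower than the previously recorded one,
// mapped to the (negative) price difference.
function findPriceDrops(array $previous, array $current): array
{
    $drops = [];
    foreach ($current as $title => $price) {
        $price = round($price, 2);
        // Skip items seen for the first time and items with no usable
        // price (assumed here to be stored as 0).
        if ($price <= 0 || !isset($previous[$title])) {
            continue;
        }
        $diff = round($price - $previous[$title], 2);
        if ($diff < 0) {
            $drops[$title] = $diff;
        }
    }
    return $drops;
}

// Made-up sample data, for illustration only.
$previous = ["Book A" => 12.99, "Gadget B" => 40.00, "Game C" => 25.00];
$current  = ["Book A" => 10.49, "Gadget B" => 40.00, "Game C" => 0, "New Item D" => 5.00];

foreach (findPriceDrops($previous, $current) as $title => $diff) {
    echo "$title dropped by " . number_format(abs($diff), 2) . "\n";
}
```

With this sample data only "Book A" is reported: "Gadget B" is unchanged, "Game C" has no usable price, and "New Item D" has no previous price to compare against.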
#!/usr/bin/php
<?php

$OUT_FILE = "path/to/wishList.json";
$EMAIL_TO = "example@example.com";
$WISHLIST_ID = "wishlist.id";

$items = [];

function getItems()
{
    global $items;
    global $OUT_FILE, $EMAIL_TO, $WISHLIST_ID;

    $page = 1;
    $hasMore = false;
    $lastTitle = "";
    $sanity = 0;

    do
    {
        $hasMore = false;

        $url = "https://www.amazon.co.uk/gp/registry/wishlist/$WISHLIST_ID/ref=cm_wl_sortbar_o_page_2?ie=UTF8&page=$page";
        $html = file_get_contents($url);

        // Item titles: drop duplicates, then reindex from 0
        $regex = '/title="([^"]+)" href="\\/dp/i';
        preg_match_all($regex, $html, $title);
        $title[1] = array_unique($title[1]);
        $titles = array_values($title[1]);

        // Prices: an "Unavailable" item gives an empty string in $price[2]
        $regex = '/[ ]{2,}(\\£([0-9.]+)|Unavailable)/i';
        preg_match_all($regex, $html, $price);
        $prices = $price[2];

        // Only trust the page if every title has a matching price
        if (count($titles) > 0 && count($titles) == count($prices))
        {
            for ($i = 0 ; $i < count($titles) ; $i++)
            {
                $title = "$titles[$i]";
                // Cast rather than "0 + ...": an empty string (unavailable
                // item) becomes 0 instead of raising an error on newer PHP
                $price = (float) $prices[$i];

                $items[$title] = $price;

                echo "$title = $price\n";
            }

            // A page that starts with the same title as the previous one
            // means we have run off the end of the list
            $hasMore = $lastTitle != $titles[0];
            $lastTitle = $titles[0];
        }

        $page++;

        // Never fetch more than 10 pages
        if (++$sanity >= 10)
        {
            $hasMore = false;
        }

        sleep(1);
    }
    while ($hasMore);
}

getItems();

if (file_exists($OUT_FILE))
{
    $json = json_decode(file_get_contents($OUT_FILE), true);

    // Treat an unreadable or corrupt file as "no previous prices"
    if (!is_array($json))
    {
        $json = [];
    }

    // Nothing scraped, but we had items last time: something went wrong
    if (count($items) == 0 && count($json) > 0)
    {
        mail($EMAIL_TO, "Wishlist checker failed", "");
    }

    foreach ($items as $title => $price)
    {
        $price = round($price, 2);
        $diff = 0;

        if (isset($json[$title]))
        {
            $diff = round($price - $json[$title], 2);
        }

        // Email only for a real price that is lower than the last one seen
        if ($price > 0 && $diff < 0)
        {
            $subject = "$title : £$price ($diff)";
            mail($EMAIL_TO, $subject, "");
        }

        $items[$title] = $price;
    }
}

$json = json_encode($items, JSON_PRETTY_PRINT);

//mail($EMAIL_TO, "Prices", $json);

file_put_contents($OUT_FILE, $json);

?>

To run this script, copy the code above into a file (or download it at the bottom of this page), update $OUT_FILE, $WISHLIST_ID, and $EMAIL_TO with actual, valid values, chmod +x the file, and then run it.
To wit:

> vi wishList.sh
> chmod +x wishList.sh
> ./wishList.sh

All being well, the wishlist will be scraped and a JSON file written containing all the items and prices. The code will attempt to scan multiple pages, so you can have more than 25 items (the current limit per page) in your list.

Downloads: wishList.sh (0Kb)
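The multi-page scanning can be illustrated without touching Amazon at all. The sketch below is a simplified variation on the script's loop, not the script itself: collectAllTitles and $fetchTitles are hypothetical names, the page data is made up, and it assumes (as the script's check implies) that asking for a page past the end returns the last page again. Unlike the script, it stops before re-adding the repeated page.

```php
<?php
// Hypothetical sketch of the paging loop: $fetchTitles stands in for the
// download-and-regex step and returns the titles found on a given page.
function collectAllTitles(callable $fetchTitles, int $maxPages = 10): array
{
    $all = [];
    $lastFirstTitle = null;

    // The page cap plays the role of the script's $sanity counter.
    for ($page = 1; $page <= $maxPages; $page++) {
        $titles = $fetchTitles($page);

        // Stop on an empty page, or when the page starts with the same
        // title as the previous page (taken as the end of the list).
        if (count($titles) === 0 || $titles[0] === $lastFirstTitle) {
            break;
        }

        $lastFirstTitle = $titles[0];
        $all = array_merge($all, $titles);
    }

    return $all;
}

// Fake three-page wishlist, for illustration only. Requests past the last
// page return the last page again (an assumption, see above).
$pages = [1 => ["Item 1", "Item 2"], 2 => ["Item 3", "Item 4"], 3 => ["Item 5"]];
$fetch = function (int $page) use ($pages): array {
    return $pages[min($page, count($pages))];
};

print_r(collectAllTitles($fetch));
```

Passing the fetch step in as a callable keeps the loop testable offline; in a real run it would wrap the file_get_contents and preg_match_all calls, with a pause between requests.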