Scraping Amazon's Wishlist

Sat, 21st May 2016

I wanted to keep track of the prices on my Amazon wishlist, in case something I was after suddenly dipped in price (as they do) and I could grab myself a bargain. I threw together a little PHP script that extracts the items and prices, compares them to the previously recorded prices, and emails me if a price has dropped. It's not a complicated script and doesn't do anything fancy (like tracking the original price, etc.).

It notifies me whenever a price drops compared to the previous price it recorded. I was going to keep track of the lowest prices instead, but I figured that a lowest price might be a one-off, in which case the script would only ever notify once. And while this script does notify for any price drop, I only receive one or two emails per day, so it's not spamming me. If you use this script as part of a scheduled job (such as cron), I would recommend not calling it more than once an hour. I've read suggestions that Amazon could blacklist your IP address if it thinks you're attacking the site. Once an hour is plenty.
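The comparison rule described above can be sketched in isolation. This is only an illustration of the idea, not a replacement for the full script: findPriceDrops() is a hypothetical helper name, and the item titles and prices are made up.

```php
<?php

// Hypothetical helper: returns the items whose price fell since the last run.
// $previous and $current both map item title => price, like the script's JSON file.
function findPriceDrops(array $previous, array $current): array
{
	$drops = [];

	foreach ($current as $title => $price)
	{
		$price = round($price, 2);

		// New items and unavailable items (price 0) never trigger a notification
		if (!isset($previous[$title]) || $price <= 0)
		{
			continue;
		}

		$diff = round($price - $previous[$title], 2);

		if ($diff < 0)
		{
			$drops[$title] = $diff;
		}
	}

	return $drops;
}

// Made-up sample data: only "Book" is cheaper than last time
$previous = ["Book" => 10.00, "Game" => 25.00, "Lamp" => 15.00];
$current  = ["Book" => 8.50, "Game" => 25.00, "Lamp" => 0, "Mug" => 4.00];

print_r(findPriceDrops($previous, $current));
```

Because each run compares against the previous run only, a price that keeps falling will notify on every step down, which is the behaviour I wanted.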

Note: this script has only been run against Amazon UK. You may need to modify the $url (and the £ in the price regex) to suit your own needs.
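If you do want to point the script at a different Amazon site, one option is to pull the domain out into a parameter. The sketch below assumes other domains use the same path as Amazon UK, which I have not tested; buildWishlistUrl() is a hypothetical helper, not part of the script.

```php
<?php

// Hypothetical helper: builds the wishlist page URL for a given Amazon domain.
// Only www.amazon.co.uk is known to work; any other domain is an untested assumption.
function buildWishlistUrl(string $domain, string $wishlistId, int $page): string
{
	// Same query string as the script's $url, with an explicit "&" separator
	$query = http_build_query(["ie" => "UTF8", "page" => $page], "", "&");

	return "https://$domain/gp/registry/wishlist/"
		. rawurlencode($wishlistId)
		. "/ref=cm_wl_sortbar_o_page_2?"
		. $query;
}

echo buildWishlistUrl("www.amazon.co.uk", "wishlist.id", 2), "\n";
```

Bear in mind that the price regex would also need its currency symbol changed for a non-UK site.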

#!/usr/bin/php

<?php

	$OUT_FILE = "path/to/wishList.json";
	$EMAIL_TO = "example@example.com";
	$WISHLIST_ID = "wishlist.id";
	
	$items = [];
	
	function getItems()
	{
		global $items;
		global $OUT_FILE, $EMAIL_TO, $WISHLIST_ID;
		
		$page = 1;
		$hasMore = false;
		$lastTitle = "";
		$sanity = 0;
		
		do
		{
			$hasMore = false;
		
			$url = "https://www.amazon.co.uk/gp/registry/wishlist/$WISHLIST_ID/ref=cm_wl_sortbar_o_page_2?ie=UTF8&page=$page";
		
			$html = file_get_contents($url);
			
			$regex = '/title="([^"]+)" href="\\/dp/i';
			preg_match_all($regex, $html, $title);
			$title[1] = array_unique($title[1]);
			$titles = array_values($title[1]);
			
			$regex = '/[ ]{2,}(\\£([0-9.]+)|Unavailable)/i';
			preg_match_all($regex, $html, $price);
			$prices = $price[2];
			
			// Only trust the page if it has items and every title has a matching price
			if (count($titles) > 0 && count($titles) == count($prices))
			{
				for ($i = 0 ; $i < count($titles) ; $i++)
				{
					$title = $titles[$i];
					// "Unavailable" items capture an empty price; the cast turns "" into 0
					$price = (float) $prices[$i];
					
					$items[$title] = $price;
					
					echo "$title = $price\n";
				}
				
				$hasMore = $lastTitle != $titles[0];
				
				$lastTitle = $titles[0];
			}
			
			$page++;
			
			// Safety cap: never fetch more than 10 pages in one run
			if (++$sanity >= 10)
			{
				$hasMore = false;
			}
			
			sleep(1);
		}
		while ($hasMore);
	}
	
	getItems();
	
	if (file_exists($OUT_FILE))
	{
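		$json = json_decode(file_get_contents($OUT_FILE), true);
		
		// Treat an unreadable or corrupt file as having no previous prices
		if (!is_array($json))
		{
			$json = [];
		}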
		
		if (count($items) == 0 && count($json) > 0)
		{
			mail($EMAIL_TO, "Wishlist checker failed", "");
		}
		
		foreach ($items as $title => $price)
		{
			$price = round($price, 2);
		
			$diff = 0;
			
			if (isset($json[$title]))
			{
				$diff = round($price - $json[$title], 2);
			}
			
			if ($price > 0 && $diff < 0)
			{
				$subject = "$title : £$price ($diff)";
				
				mail($EMAIL_TO, $subject, "");
			}
			
			$items[$title] = $price;
		}
	}
	
	$json = json_encode($items, JSON_PRETTY_PRINT);
	
	//mail($EMAIL_TO, "Prices", $json);
	
	file_put_contents($OUT_FILE, $json);

?>

To run this script, copy the code above into a file (or download it at the bottom of this page), update $OUT_FILE, $WISHLIST_ID, and $EMAIL_TO with valid values, chmod +x the file, and then run it. To wit:

> vi wishList.sh
> chmod +x wishList.sh
> ./wishList.sh

All being well, the wishlist will be scraped and a JSON file written out containing all the items and prices. The code will attempt to scan multiple pages, so you can have more than 25 items (the current limit per page) in your list.
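To see what the extraction step does without hitting Amazon, the two regexes can be run against a small fragment. The HTML below is made up to fit the patterns and is not real Amazon markup; extractItems() is a hypothetical wrapper, and the price pattern here uses the u modifier with a plain £ rather than the escaped form in the script.

```php
<?php

// Hypothetical HTML fragment shaped to fit the script's regexes.
// It is NOT real Amazon markup.
$html = <<<HTML
<a title="First Book" href="/dp/B000000001">First Book</a>
<span>
    £7.99
</span>
<a title="Second Book" href="/dp/B000000002">Second Book</a>
<span>
    Unavailable
</span>
HTML;

// Hypothetical wrapper around the two regex passes: returns title => price
function extractItems(string $html): array
{
	// Titles come from links to product pages (/dp/...)
	preg_match_all('/title="([^"]+)" href="\/dp/i', $html, $title);
	$titles = array_values(array_unique($title[1]));

	// Prices are preceded by two or more spaces; "Unavailable" captures no number
	preg_match_all('/[ ]{2,}(£([0-9.]+)|Unavailable)/iu', $html, $price);
	$prices = $price[2];

	// If the counts disagree, the page layout is not what we expect
	if (count($titles) != count($prices))
	{
		return [];
	}

	$items = [];

	foreach ($titles as $i => $name)
	{
		$items[$name] = (float) $prices[$i];
	}

	return $items;
}

print_r(extractItems($html));
```

Unavailable items end up with a price of 0, which is why the main script only notifies when the price is greater than zero.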

Downloads: wishList.sh (0Kb)
 