IKEA’s BILLY bookcase 80x28x106cm in birch veneer. A design classic. £45 for the everyman’s unobtrusive storage solution. Over 40 million BILLY bookcases sold since they were created in 1979. But how many of them are there in the my local branch of IKEA at the moment?

billy-bookcase

Not the most important question in the world but a useful one to do a quick demo of how to automatically pull data out of an API in PHP at a regular interval and store it in a database (MySQL in this case) – something that we commonly want to do to compile data-sets where live data readings are provided but historical data is not.

Where is my data?

The first thing to know is where we can get our live data. In the case of the IKEA website, there is a live stock check in the right hand column where you can select a particular store and see availability for the selected product. Here’s the location of the birch veneer BILLY bookcase on IKEA’s website: http://www.ikea.com/gb/en/catalog/products/80279786/. Note that final 8 digit number – that’s our unique product ID.

We could download the HTML page and try to get the number out of that but with a little digging into the website code, we can find that actually it’s pulling that live stock data from an API with the URL: http://www.ikea.com/gb/en/iows/catalog/availability/80279786 which returns XML which is a bit easier for us to work with in our PHP code.

The XML returned actually contains the stock level for the BILLY at all UK IKEA stores so we need to know where to look. Again digging into the HTML shows a hidden input field containing all store IDs.

<input name="localStores" value="113|belfast:142|wednesbury:264|bristol:267|cardiff:150|coventry:144|croydon:265|edinburgh:143|gateshead:266|glasgow:262|thurrock:261|leeds:186|ashton:185|milton_keynes:263|nottingham:125|southampton:255|edmonton:140|warrington:141|brent_park" type="hidden">

The Bristol store has an ID of 264 which corresponds to the “buCode” attribute of 264 in the XML.

Great, that’s all the information we need to find our live data. Let’s get coding…

Coding the PHP data grabber

PHP is usually used to serve webpages to a website visitor. But we can also use it to run code on our webserver in the background to do useful things for us. First we’ll write the code and then we’ll get it running at a regular interval using “cron”.

Let’s start with the code to fetch the XML and pull out the values we want. We’ll use PHP’s built in SimpleXML parser to help us navigate our way through the XML tree and the file_get_contents function to request the data from the URL and we’ll also create a variable to store the current time as a string so that we know when we queried the data.

$url = 'http://www.ikea.com/gb/en/iows/catalog/availability/80279786';
$xml = simplexml_load_string(file_get_contents($url));
$dts = date('Y-m-d H:i:s');

Simple so far. We’ve created an object representation of the XML data tree in $xml. SimpleXML gives us a way to drill into that tree for the data we want(some basic usage examples here).

Let’s loop through each of the store locations and pull out the elements we’re interested in storing – the store ID (buCode) and the current available stock of BILLYs in variables.

foreach ($xml->availability[0]->localStore as $store) {
	$storeId = $store['buCode'];
	$dataVal = $store->stock->availableStock;
	echo $storeId . ': ' . $dataVal . '<br/>';
}

If we now navigate to this page we should see a list of each store ID and its current stock level. Nice. Maybe all we want is a way to see the data right now. But what if we want to track the stock level over time. We need to store the data somewhere…

Writing to a MySQL database

There are lots of tutorials online about how to store data in a MySQL database with PHP so I won’t go into that in detail here. I’ve created a table called BillyCount in my database with 3 columns – StoreId, DataTimeStamp and Data to store my data in. If I was querying multiple products, I’d also have a ProductId column.

I’ve then used a bit of PHP to connect to my database and run some SQL to insert each data row. Here’s the final code for our php data grabbing page. If we go to this location in our browser we see lots of “New record created successfully” messages (hopefully) and we should be able to see that the data’s been added to our database.

<?php

$servername = "usuallylocalhost";
$username = "yourDBusername";
$password = "yourDBpassword";
$dbname = "yourDBname";

// Create MySQL connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

$url = 'http://www.ikea.com/gb/en/iows/catalog/availability/80279786';
$xml = simplexml_load_string(file_get_contents($url));
$dts = date('Y-m-d H:i:s');

foreach ($xml->availability[0]->localStore as $store) {
	$storeId = $store['buCode'];
	$dataVal = $store->stock->availableStock;

	$sql = "INSERT INTO BillyCount (StoreId, DataTimeStamp, Data) VALUES ('".$storeId."', '".$dts."', ".$dataVal.")";
	//echo $sql;
	
	if ($conn->query($sql) === TRUE) {
	    echo "New record created successfully";
	} else {
	    echo "Error: " . $sql . "<br>" . $conn->error;
	}
}

// close MySQL connection
$conn->close();

?>

What’s a cron?

So that’s great. But we don’t want to have to manually visit the page in our browser each time we want to add new data to the database. We want to set something up to make that happen on a schedule. Maybe once a day at midday or once every hour on the hour. Fortunately there’s a way to do that. In Windows this is called a Scheduled Task but in Linux (ie most webservers) this is called a cron job. If your server’s running cPanel, you should have a Cron Jobs button. Some documentation for cron in cPanel here.

We get some options about when we want the job to run and cPanel helps us out with some common options. I’m going to use a more complex one to run every 3 hours during opening hours (let’s assume the stock isn’t going to change too much while the shop is closed). That comes out as “0 9,12,15,18,21 * * *”. You can try out different cron settings here http://crontab.guru and it’ll translate them into plain text for you.

We then need to specify a command to run. We need to use the absolute path to the task on our server. In this case we’re going to run a php job so we need to provide a path to php on our server eg “/usr/local/cpanel/3rdparty/bin/php” and then the path to our php page eg “/home/user/public_html/ikeaDataGrabber.php”. In this case we don’t want the output to go anywhere so we tell cron not to do anything with the output by finishing our command with “> /dev/null 2>&1”. So the final line is something like:

0 9,12,15,18 * * * /usr/local/cpanel/3rdparty/bin/php /home/user/public_html/ikeaDataGrabber.php > /dev/null 2>&1

The specifics will depend on your server but hopefully that’ll help you get there.

Done!

We now have a job running that’ll put the stock of BILLY’s in a database every 3 hours for all IKEA stores in the UK. Is that useful. Probably not. But hopefully this has been a useful demo of how to quickly set up a data scraper to pull and store data over time just using PHP and MySQL. I have this script running on my server now. If I find out anything interesting from the data, I’ll let you know!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>