I said at the start of the month that 2013 would be the year of API chess. While the mainstay of the moves have come from Twitter it looks like Facebook are starting their moves too.
This is not just a retail thing, neither is it a government thing. It’s not an elearning, social media or other flight of fancy that’s happening right now.
If household budgets are squeezed lower your price a little. If there’s a huge shift from your traditional sales channels to other channels then investigate them and look at this risk to you. If you feel a thing in your gut that the landscape is going to change then keep an eye on it.
This is not new, it’s been going on for years. The thing that killed civilisations, retailers, startups and individuals was the refusal to accept that change was happening around them.
Ecommerce in the real world
With all things analytics based they can slow down databases with the huge amount of reads/writes that are performed. Initial tests with the Recommendation Engine demo showed that using a JDBC data model was slow on a per user recommendation against a csv file based dataset.
Ultimately with all these things we don’t want to annoy or distrupt the customer experience on the website side of things or the point of sale.
Replication is preferred in terms of having the master database deal with all the realworld tasks (ecommerce, point of sale and so on) and a replicated slave database for the recommendation engine to the processing.
The diagram below shows the “live” web server (with the ecommerce site) and the live database in white, the slave server with the replicated database and the recommendation engine in the blue. With this setup the slave can do the analytics on the customer data while not impacting the live server to the customers.
Now for the wake up call…
Replication is communication between two database servers (the walkthrough I’ve done is for MySQL), what happens on the master is copied to the slave.
There is one big issue to keep in mind and that’s how the two servers communicate with each other. Straight master database user to slave database user is a dangerous way of doing things and doesn’t do a good job in protecting the data. Ideally each database server should only run locally and use secure shell (ssh) to communicate with each other.
This means using a ssh tunnel from the slave to the master to get the replication working.
The walk through
A few assumptions. Firstly that you’re using MySQL for your database needs. Secondly that you are running on some form of Unix/Linux based server. This should work on a Windows based system but I’ve never tried it, nor shall I in the near future.
First things first, on the slave server create a ssh key, passwordless. This will save to the .ssh/ folder in a file called id_rsa and id_rsa.pub.
$ ssh-keygen -t rsa
Now copy this to the master server. If your ssh is running under a different port number you are better off to use the server address in quotes with the port number within the quotes. The ssh-copy-id is a script and not an actual program.
$ ssh-copy-id 'firstname.lastname@example.org -p [port]'
Now create the ssh tunnel from the slave.myserver.com to the master.
$ ssh -p [port] master.myserver.com -L 3305:127.0.0.1:3306 -N
To find out if it’s working you can run a mysql client from the slave server to see if you can access the master database.
$ mysql -h 127.0.0.1 -P 3305 -uYourUser -p
On the master server edit the /etc/mysql/my.cnf file and ensure the following are in place. Replication requires binary logging is enabled for MySQL. For other database servers you’ll have to read the documentation.
Now create a user specific for the replication slave to access. Then grant this user permission to access the master db as a replicated slave.
CREATE USER 'repuser'@'127.0.0.1' IDENTIFIED BY 'reppass';GRANT REPLICATION SLAVE ON *.* TO 'repuser'@'127.0.0.1';
On the slave DB edit the /etc/mysql/my.cnf. You can name the databases you want to replicate with the “replicate-do-db” setting. You can list as many as you want.
Make sure the table structure is in place on the slave as well as the master. Now restart both servers and attempt to insert data into the master server and make sure the results are appeari
ng on the slave.
The recommendation engine
On the slave server you can now use Sqoop and the recommendation engine to append the transactions and recommend products back to the user.
The synchronisation works on way, the master db records an entry and the slave replicates it. Writing the recommendations to the slave won’t work their way back to the master.
One way to do this though is to rig the recommendation engine to write out a csv file of userid/itemid pairs to a file and as the ssh tunnel is in place import these back to the master db.
While putting the recommendation engine demo together over the last few days my thoughts turned to exporting data out of some data store into CSV.
The original demo uses MySQL to store transaction data, nothing wrong with that I know. And I put a small script and sql command to export the data out.
select * into outfile “/tmp/recom.csv” fields terminated by “,” lines terminated by “n” from recom;
All well and good. Until it comes to file permissions. MySQL will write to the /tmp directory no problems. It will complain if the file already exists.
You could use Sqoop as an alternative which is a handy little tool that’s used with all the BigData tools kicking around.
sqoop import –connect jdbc:mysql://localhost/mydb –username auser –table recom –as-textfile
You can also extend it to run specific queries to pull out certain parts of data and selective runs since specific row writes and so on. A better solution in an incremental fashion.