Ecommerce in the real world
With all things analytics based they can slow down databases with the huge amount of reads/writes that are performed. Initial tests with the Recommendation Engine demo showed that using a JDBC data model was slow on a per user recommendation against a csv file based dataset.
Ultimately with all these things we don’t want to annoy or distrupt the customer experience on the website side of things or the point of sale.
Replication is preferred in terms of having the master database deal with all the realworld tasks (ecommerce, point of sale and so on) and a replicated slave database for the recommendation engine to the processing.
The diagram below shows the “live” web server (with the ecommerce site) and the live database in white, the slave server with the replicated database and the recommendation engine in the blue. With this setup the slave can do the analytics on the customer data while not impacting the live server to the customers.
Now for the wake up call…
Replication is communication between two database servers (the walkthrough I’ve done is for MySQL), what happens on the master is copied to the slave.
There is one big issue to keep in mind and that’s how the two servers communicate with each other. Straight master database user to slave database user is a dangerous way of doing things and doesn’t do a good job in protecting the data. Ideally each database server should only run locally and use secure shell (ssh) to communicate with each other.
This means using a ssh tunnel from the slave to the master to get the replication working.
The walk through
A few assumptions. Firstly that you’re using MySQL for your database needs. Secondly that you are running on some form of Unix/Linux based server. This should work on a Windows based system but I’ve never tried it, nor shall I in the near future.
First things first, on the slave server create a ssh key, passwordless. This will save to the .ssh/ folder in a file called id_rsa and id_rsa.pub.
$ ssh-keygen -t rsa
Now copy this to the master server. If your ssh is running under a different port number you are better off to use the server address in quotes with the port number within the quotes. The ssh-copy-id is a script and not an actual program.
$ ssh-copy-id 'firstname.lastname@example.org -p [port]'
Now create the ssh tunnel from the slave.myserver.com to the master.
$ ssh -p [port] master.myserver.com -L 3305:127.0.0.1:3306 -N
To find out if it’s working you can run a mysql client from the slave server to see if you can access the master database.
$ mysql -h 127.0.0.1 -P 3305 -uYourUser -p
On the master server edit the /etc/mysql/my.cnf file and ensure the following are in place. Replication requires binary logging is enabled for MySQL. For other database servers you’ll have to read the documentation.
Now create a user specific for the replication slave to access. Then grant this user permission to access the master db as a replicated slave.
CREATE USER 'repuser'@'127.0.0.1' IDENTIFIED BY 'reppass';GRANT REPLICATION SLAVE ON *.* TO 'repuser'@'127.0.0.1';
On the slave DB edit the /etc/mysql/my.cnf. You can name the databases you want to replicate with the “replicate-do-db” setting. You can list as many as you want.
Make sure the table structure is in place on the slave as well as the master. Now restart both servers and attempt to insert data into the master server and make sure the results are appeari
ng on the slave.
The recommendation engine
On the slave server you can now use Sqoop and the recommendation engine to append the transactions and recommend products back to the user.
The synchronisation works on way, the master db records an entry and the slave replicates it. Writing the recommendations to the slave won’t work their way back to the master.
One way to do this though is to rig the recommendation engine to write out a csv file of userid/itemid pairs to a file and as the ssh tunnel is in place import these back to the master db.