I’ve sung RabbitMQ’s praises for a long time now. It’s powered the messaging for a number of my personal projects, it’s well supported, and the documentation and tutorials are some of the clearest I’ve come across. Call me an advocate.

Over the last short while I’ve been thinking about message volume. What happens when a client wants a batch of 200,000 items to be queued and processed? How long does it take, and what’s the knock-on effect (if there is one)?

Bring out the postcodes

A large dataset, that’s what I need. The postcode database will do me fine: it has 1.8 million records, and mine comprises three fields (postcode, latitude and longitude). The postcode is indexed, so queries should be fast enough:

mysql> select lat,lon from postcode where postcode="BT499PL";
+---------+----------+
| lat     | lon      |
+---------+----------+
| 55.0425 | -7.04186 |
+---------+----------+
1 row in set (0.00 sec)

Seems fast enough for me.

I can also generate 200,000 random postcodes to test with:

mysql> select postcode from postcode ORDER BY RAND() LIMIT 0,200000;

With that saved in a text file, I can reuse it at will.

The RabbitMQ Consumer

The consumer is basic: it takes the message, fetches the lat and lon from the database, then prints the answers. It’s a fire-and-forget call. With a prefetch count of 1 and manual acknowledgements, the queue won’t hand a consumer another message until the current one is finished.

ConnectionFactory factory = new ConnectionFactory();
factory.setHost("localhost");
Connection connection = factory.newConnection();
Channel channel = connection.createChannel();
// Second argument (durable = true): the queue definition survives a broker restart.
channel.queueDeclare(TASK_QUEUE_NAME, true, false, false, null);
System.out.println(" [*] Waiting for messages. To exit press CTRL+C");
// Prefetch of 1: at most one unacknowledged message per consumer at a time.
channel.basicQos(1);
QueueingConsumer consumer = new QueueingConsumer(channel);
// autoAck = false, so each message is acknowledged manually below.
channel.basicConsume(TASK_QUEUE_NAME, false, consumer);
while (true) {
   // Blocks until the next message is delivered.
   QueueingConsumer.Delivery delivery = consumer.nextDelivery();
   String message = new String(delivery.getBody());
   System.out.println(" [x] Received '" + message + "'");
   // doWork performs the postcode lookup and returns the result as text.
   System.out.println(doWork(message) + " to tag "
      + delivery.getEnvelope().getDeliveryTag());
   System.out.println(" [x] Done");
   // Acknowledge only once the work is finished.
   channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
}
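The doWork method isn't shown in the listing. Here is a minimal sketch of what such a lookup could look like over JDBC, reusing the query from the mysql session earlier. The class name PostcodeLookup, the formatResult helper and the returned text format are illustrative assumptions, not the code that was actually run.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: one possible shape for doWork, not the original.
class PostcodeLookup {
    // Same query as the mysql session, parameterised rather than concatenated.
    static final String SQL = "SELECT lat, lon FROM postcode WHERE postcode = ?";

    private final Connection db;

    // The caller is assumed to supply an open JDBC connection to the postcode database.
    PostcodeLookup(Connection db) {
        this.db = db;
    }

    // Looks up a single postcode and returns a printable result.
    String doWork(String message) throws SQLException {
        String postcode = message.trim();
        try (PreparedStatement stmt = db.prepareStatement(SQL)) {
            stmt.setString(1, postcode);
            try (ResultSet rs = stmt.executeQuery()) {
                if (!rs.next()) {
                    return postcode + " -> not found";
                }
                return formatResult(postcode, rs.getDouble("lat"), rs.getDouble("lon"));
            }
        }
    }

    // Formatting is kept separate so it can be checked without a database.
    static String formatResult(String postcode, double lat, double lon) {
        return postcode + " -> " + lat + "," + lon;
    }
}
```

Reusing one connection per consumer process, rather than opening one per message, keeps connection setup out of the per-message cost.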

We can run any number of these but I’ll run six of them for the moment.

java -cp ./lib/rabbitmq-client.jar:./lib/commons-cli-1.1.jar:./lib/commons-io-1.2.jar:./lib/mysql-connector-java-5.1.6-bin.jar:./bin  uk.co.dataissexy.rabbitmq.postcode.Consumer

The Publisher….

We need a piece of code to send our postcodes to the message queue. It’s a case of reading the text file of 200,000 postcodes and publishing them to the queue; what happens after that is all down to the queue. The lovely thing with RabbitMQ is that if the queue dies and needs restarting, the unprocessed messages carry on where they left off (provided the queue is durable and the messages are published as persistent).
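The publisher code isn't listed here, so what follows is only a rough sketch of the read-and-publish loop. To keep it runnable without a broker, the actual send is passed in as a callback; the class name PostcodePublisher, the publishAll method and the sample postcodes in main are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

// Hypothetical publisher loop: one postcode per line, one message per postcode.
class PostcodePublisher {
    // Hands every non-blank line to `send` and returns how many were handed over.
    static int publishAll(Reader source, Consumer<String> send) {
        int sent = 0;
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String postcode = line.trim();
                if (postcode.isEmpty()) {
                    continue; // skip blank lines in the export
                }
                send.accept(postcode);
                sent++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sent;
    }

    public static void main(String[] args) {
        // In a real run the Reader would wrap the saved postcode file, and the
        // callback would wrap the RabbitMQ client call, along the lines of:
        //   channel.basicPublish("", TASK_QUEUE_NAME,
        //       MessageProperties.PERSISTENT_TEXT_PLAIN, postcode.getBytes());
        int n = publishAll(new StringReader("BT499PL\n\nBT11AA\n"),
                p -> System.out.println("queued " + p));
        System.out.println(n + " messages handed to the sender");
    }
}
```

Publishing with persistent message properties to a durable queue is what lets unprocessed messages survive a broker restart.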

Some numbers

So, 200,000 messages to six consumer instances.

Total time was 294 seconds (4 minutes and 54 seconds). Here’s the interesting part: for the first 54 seconds, while the publisher was hammering the queue with new messages, the consumers processed about 92 messages a second. I’m cool with that.


Once the queue was populated, it took off: 240 seconds to process the remaining 195,032 messages, a rate of 812.6 a second. Now that’s not bad going.

Altering the publisher

This got me thinking about whether there was a way of increasing the performance of the publisher. The bottleneck is that while the publisher is delivering messages it’s taking precious time from the queue. Firing up more consumers won’t make much difference, at least not until the queue has been fully populated. As it stands it’s taking about 18% of the total processing time (54 of 294 seconds) to work just 2.5% of the queue.
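The figures quoted so far can be rechecked with a few lines of arithmetic. The sketch below uses only the numbers from this post (200,000 messages, 195,032 left once publishing finished, 54 and 240 seconds); the class and method names are made up for the example.

```java
// Rechecks the throughput figures quoted in the post.
class ThroughputCheck {
    static double perSecond(long messages, long seconds) {
        return (double) messages / seconds;
    }

    static double percent(double part, double whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        long total = 200_000;
        long afterLoad = 195_032;            // processed once the queue was populated
        long duringLoad = total - afterLoad; // processed while the publisher was running
        long loadSeconds = 54;
        long drainSeconds = 240;

        System.out.printf("during load: %.1f msg/s%n", perSecond(duringLoad, loadSeconds));
        System.out.printf("after load:  %.1f msg/s%n", perSecond(afterLoad, drainSeconds));
        System.out.printf("time share:  %.1f%%%n", percent(loadSeconds, loadSeconds + drainSeconds));
        System.out.printf("work share:  %.1f%%%n", percent(duringLoad, total));
    }
}
```

That works out at 4,968 messages in the first 54 seconds, so roughly 92 a second during publishing against roughly 812.6 a second afterwards, with the publishing phase using about 18.4% of the wall-clock time for about 2.5% of the messages.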

One idea I had was to use a thread pool in the publisher, essentially parallelising the publisher to fire out more messages. It started off well, but then the publisher got upset, the thread pool buckled and the consumers had no idea what was going on.
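The post doesn't record what actually went wrong, so the following is not a diagnosis. It is a sketch of one common arrangement for a pooled publisher: a fixed number of workers, each given its own slice of the input and its own sender. With the RabbitMQ Java client, that would mean a Channel per worker, since the client documentation advises against sharing a channel between publishing threads. The sender is a plain callback here so the sketch runs without a broker; PooledPublisher, publish and senderFactory are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical pooled publisher: one sender per worker, never shared.
class PooledPublisher {
    // Splits the input into one slice per worker and returns the number sent.
    static int publish(List<String> postcodes, int workers,
                       Supplier<Consumer<String>> senderFactory) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        AtomicInteger sent = new AtomicInteger();
        int chunk = (postcodes.size() + workers - 1) / workers; // ceiling division
        for (int w = 0; w < workers; w++) {
            int from = Math.min(w * chunk, postcodes.size());
            int to = Math.min(from + chunk, postcodes.size());
            List<String> slice = postcodes.subList(from, to);
            pool.execute(() -> {
                // Created inside the worker, so it is only ever used by this thread.
                Consumer<String> sender = senderFactory.get();
                for (String postcode : slice) {
                    sender.accept(postcode);
                    sent.incrementAndGet();
                }
            });
        }
        pool.shutdown(); // no new tasks; let the submitted ones finish
        try {
            if (!pool.awaitTermination(10, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            pool.shutdownNow();
        }
        return sent.get();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("BT499PL", "BT11AA", "BT12BB");
        int n = publish(sample, 2, () -> p -> System.out.println("queued " + p));
        System.out.println(n + " messages sent");
    }
}
```

Messages sent this way arrive in no particular order across workers, which doesn't matter for independent postcode lookups. Whether this arrangement would have avoided the failure described above is untested.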

The search for volume based queue enlightenment continues….
