In Part 1 I introduced you to Spring XD and its lovely ways of pulling in streaming Twitter data. Think of it like a continuous catwalk of data….


The story so far….

We’ve got streams of data coming into the server and they are being stored. All very well, but Twitter streaming responses are huge chunks of JSON data, and when they are coming in thick and fast they take up disk space, and quick.

I’m only bothered about two things from all this data: firstly the date/time of the tweet and secondly the content.

Within the grand data chunk I see that “created_at” and “text” are what we really need.

Transformers

We can write custom pieces of code to act as extra bits to the pipe and manipulate the data as it comes in. We’ve established that we’re looking for two things and I want this to be output to the text file.

So where I currently have:

xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#louboutins'| file"

I want to add a transformer to strip back all the JSON and just give me the bits I want. To create a transformer we write it in code and then deploy it to our Spring XD node.

The code and bean definition.

Here’s the main body of the code:

Map<String, Object> tweet = mapper.readValue(payload,new TypeReference<Map<String, Object>>() {});
sb.append(tweet.get("created_at").toString());
sb.append("|");
sb.append(tweet.get("text").toString());
return sb.toString();

If you want to read the full class, the project is on GitHub.
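
One thing the snippet above glosses over: the streaming API can also send messages that aren't tweets (delete notices, for instance), and those won't have "created_at" or "text", so calling toString() on them would blow up. Tweet text can also contain line breaks, which would spoil the one-tweet-per-line output. Here's a rough sketch of how the formatting step could guard against both. It is not the class from the project; TweetLineFormatter and its format method are names I've made up for illustration, and it works on an already-parsed map so it runs without Jackson.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper for illustration only; not the class from the project.
class TweetLineFormatter {

    // Builds "created_at|text" from a parsed tweet, or returns an empty
    // Optional when either field is missing (i.e. the message is not a tweet).
    static Optional<String> format(Map<String, Object> tweet) {
        Object createdAt = tweet.get("created_at");
        Object text = tweet.get("text");
        if (createdAt == null || text == null) {
            return Optional.empty();
        }
        // Collapse line breaks so each tweet stays on a single output line.
        String singleLine = text.toString().replaceAll("[\\r\\n]+", " ");
        return Optional.of(createdAt + "|" + singleLine);
    }

    public static void main(String[] args) {
        Map<String, Object> tweet = new HashMap<>();
        tweet.put("created_at", "Sun Nov 10 11:42:06 +0000 2013");
        tweet.put("text", "Starry Night bag\n#fashion");
        System.out.println(format(tweet).orElse("(not a tweet)"));
    }
}
```

What you do with the empty case (skip it, log it, filter it out earlier in the stream) is up to you; the sketch only makes the missing-field case explicit instead of letting it throw.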

The last thing we need before deploying is an XML file that defines our transformation class.

<?xml version="1.0" encoding="UTF-8"?>
<beans:beans xmlns="http://www.springframework.org/schema/integration"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xmlns:beans="http://www.springframework.org/schema/beans"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/integration
http://www.springframework.org/schema/integration/spring-integration.xsd">
  <channel id="input"/>
  <transformer input-channel="input" output-channel="output">
    <beans:bean class="co.uk.dataissexy.xd.samples.TwitterStreamTransform" />
  </transformer>
  <channel id="output"/>
</beans:beans>

Deployment

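Spring XD wants your code to be stored as a jar file and placed in the xd/lib directory. The XML definition file needs to be placed in the xd/modules/processor directory; its file name (without the .xml extension) is the name you refer to the module by in a stream definition, which is twitterstreamtransformer in this case. Then restart the server for the changes to take effect.

Now we can run the transformer in a stream. Where before we had: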
xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#fashion'| file"

We now need to add in our new transformer.

xd:>stream create --name tweetLouboutins --definition "twitterstream --track='#fashion'| twitterstreamtransformer | file"

And now, with a quick inspection of the data directory, we’ll see the data is a lot more manageable.
Sun Nov 10 11:42:06 +0000 2013|RT @GliStolti: Starry Night bag http://t.co/Xa1342og1G #fashion #trend #style #design #handmade #handicraft #shopping #rome #italy #madeini…
Sun Nov 10 11:42:16 +0000 2013|RT @GliStolti: #VanGogh necklace http://t.co/08v0Jwd4r7 #fashion #trend #style #design #handmade #handicraft #madeinitaly #shopping #rome #…
Sun Nov 10 11:42:16 +0000 2013|Was at @StuntDolly yesterday getting the #XMASCARBOOT organised for Sat the 16th! Be sure to come! #fashion #dalston http://t.co/lT6aLyf60r
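
Because every line is now just "created_at|text", reading it back in is simple. Here's a small sketch of how you might split a line and turn the timestamp into a proper date. TweetLineParser is a made-up name for illustration, and the date pattern is my reading of the format shown in the sample lines above. It splits on the first pipe only, since the tweet text itself could contain one.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper for illustration only.
class TweetLineParser {

    // Matches timestamps such as "Sun Nov 10 11:42:06 +0000 2013".
    static final DateTimeFormatter CREATED_AT =
            DateTimeFormatter.ofPattern("EEE MMM dd HH:mm:ss Z yyyy", Locale.ENGLISH);

    // Everything before the first '|' is the timestamp.
    static OffsetDateTime createdAt(String line) {
        return OffsetDateTime.parse(line.substring(0, line.indexOf('|')), CREATED_AT);
    }

    // Everything after the first '|' is the tweet text.
    static String text(String line) {
        return line.substring(line.indexOf('|') + 1);
    }

    public static void main(String[] args) {
        String line = "Sun Nov 10 11:42:06 +0000 2013|RT @GliStolti: Starry Night bag #fashion";
        System.out.println(createdAt(line));
        System.out.println(text(line));
    }
}
```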

Next time….

In Part 3 I’m going to bring Hadoop into the fold and collate the hashtags and attempt to create some form of visualisation with D3.