It's All a Bet

Jason Bell – Author, Advisor and Practitioner in Machine Learning and Artificial Intelligence

Over 475 machine learning citations and 50+ patent citations on machine learning.

Filtering the Streaming #Twitter API with #Onyx Plugin – #clojure #twitter #data #firehose #onyx

Twitter Fashion Analytics Revisited?

Not quite but I have been working on revisiting the original work I did in SpringXD and moving it to the Onyx Framework.

20091019-DSCF0212-copy

The Twitter Plugin for Onyx Workflows

The original Onyx Twitter plugin acted as a handy input stream from the Twitter firehose. Nice and easy to setup too, just put your Twitter app credentials as environment variables and add a catalog. For example:

{:twitter/consumer-key (env :twitter-consumer-key)
 :twitter/consumer-secret (env :twitter-consumer-secret)
 :twitter/access-token (env :twitter-access-token)
 :twitter/access-secret (env :twitter-access-secret)
 :twitter/keep-keys [:id :text]}

The plugin used the .sample call from Twitter4J library which gives you the 1% of tweets from the public feed. Fine but the data coming out was wayward, it’s very random.

The .filter function

For a more refined look at the Twitter firehose you’re better off using the .filter function which takes a FilterQuery and uses an array of Strings that you want to monitor, this then refines the matches against the firehose and then you get 1% of the matching tweets and not a bunch of random noise.

So, to that end, I’m delighted to say that the Onyx Twitter plugin now supports the tracking of strings in the stream.

Just add:

:twitter/track ["#fashion" "#louboutins" "#shoes"]

..to the catalog and you’re away. If you leave this option off then you get the usual random sample from the public stream.

I might get around to revisiting the whole Twitter Fashion analytics thing in Clojure, with the Onyx Platform soon.

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.