The Hadoop space is large, like the amount of data it processes. SMEs are in a bit of a no man’s land, often short on developer resource to put such things together. Well, I finally got my Big/Small Data act together and built Surgeon. It’s a Hadoop distribution, but instead of putting the emphasis on developers to write the MapReduce parts, I made sure the most useful things were done first.

The thought of an SME being able to just put the data in a directory and “make it work” appeals to me. There’s no charge for the downloads; it’s all free. If you do use it, I’d be delighted to know how you are using it.

You can download everything, both Surgeon Core and the modules, from Datasentiment. The modules currently available are:

  • Word Count – Yeah, the basic Hadoop demo, but it still has its place.
  • Twitter Mentions – extract all the mentions from Tweets. I originally demo’d this at the BigData Bash in Belfast last year.
  • Twitter Hashtags – as above, but extracts hashtags (#) instead of mentions (@).
  • Random Sampling – extracts a random percentage of data from a large data set. The source data is not modified; the module only generates a new file.
  • Sentiment Analysis – a basic sentiment score run against positive/negative word sets, but it works for a lot of people’s needs.
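
To give a feel for what the modules above do, here is a rough single-machine sketch in Python. It is not Surgeon’s code and says nothing about how the modules are implemented on Hadoop; the function names, the regular expressions and the tiny word sets are all assumptions made up for illustration.

```python
import random
import re
from collections import Counter

# Assumed patterns: a mention is "@" plus word characters, a hashtag is "#" plus word characters.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")

# Illustrative word sets only; a real run would load much larger lists.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def count_tokens(lines, pattern):
    """Count case-insensitive matches of `pattern` (mentions or hashtags) across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(match.lower() for match in pattern.findall(line))
    return counts


def sample_lines(lines, percent, seed=None):
    """Keep each line with probability percent/100; the input itself is left untouched."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() * 100 < percent]


def sentiment_score(text):
    """Number of positive words minus number of negative words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


if __name__ == "__main__":
    tweets = [
        "Thanks @Alice and @bob for the #Hadoop demo",
        "@alice great talk, love it #hadoop #BigData",
    ]
    print(count_tokens(tweets, MENTION_RE))
    print(count_tokens(tweets, HASHTAG_RE))
    print(sample_lines(tweets, 50, seed=1))
    print([sentiment_score(tweet) for tweet in tweets])
```

Because each line is kept independently, the random sample comes out at roughly the requested percentage rather than an exact count, which is usually fine when the input is large.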


There are more modules in the pipeline, including segment analysis and predictive analytics (something I’ve done some posts on in the past).