This post was originally published on the DeskHoppa Engineering Blog on Medium.

We built DeskHoppa on data-driven decisions. The technology, though, augments our decision making rather than making it for us. How we choose the hosts we contact is based on data, algorithms and probability.

The search and match processes that pair a guest with a host are a pursuit of accuracy, one that can only be refined over time with data, training and evaluation.

Putting those things together is not easy; much of the groundwork is done by others who put the time in on their own dime. Open source software powers a lot of what we do.

Giving Something Back To The Community

Deciding to publish any code and setups that are useful to others was a very simple decision to make. What seems simple to us may be days of work for someone else; uncovering the gotchas and documenting them can save a developer days, weeks or even months of unpicking. We’ve been there and have gone down the same development rabbit holes that others have.

We’ve put our publishable repositories on our GitHub account. Some of it is code written by us; some of it is handy scripts that may have come from elsewhere, collated in a way that’s easy for a developer to implement.

Using Kafka and Twitter Data

There’s a natural fit between Kafka and streams of Twitter data. Using Kafka Connect to consume the Twitter Streaming API, then using the KSQL streaming query language to transform and query the stream, is powerful even in the most simplistic of contexts.
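As a minimal sketch of the kind of KSQL transformation this enables (the topic name and tweet fields here are illustrative assumptions, not taken from our repository), assuming the Connect source writes tweets to a `twitter_status` topic in Avro:

```sql
-- Register the raw Connect topic as a KSQL stream; with Avro,
-- the schema is picked up from the Confluent Schema Registry.
CREATE STREAM tweets WITH (KAFKA_TOPIC='twitter_status', VALUE_FORMAT='AVRO');

-- Derive a slimmer stream carrying just the field we care about.
CREATE STREAM tweet_text AS
  SELECT Text AS tweet_text
  FROM tweets;

-- Ad-hoc query: watch tweets mentioning a keyword as they arrive.
SELECT tweet_text FROM tweet_text WHERE tweet_text LIKE '%kafka%';
```

Even this simple shape, a raw stream plus a derived stream, is enough to start filtering and reshaping tweets without writing any consumer code.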

While we do an awful lot more with the data beyond the KSQL stages, we wanted to share a really quick setup for anyone to use. For our first community release to GitHub we wanted to start with raw data; it’s important to collate relevant data from the outset. Our Kafka/Twitter configuration, based on the excellent blog post by Robin Moffatt on the Confluent Blog, is our baseline.

The configuration and required files are on GitHub at https://github.com/deskhoppa/kafka-twitter-collector, with a README of what to put where. Assuming you’re using the community edition of the Confluent Platform, everything should slot into place without any bother.
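For orientation, a Twitter source connector config typically looks something like the sketch below. The property names follow the community kafka-connect-twitter connector used in Robin Moffatt’s post; the credentials, keywords and topic name are placeholders, and our repository’s README is the authoritative version of what goes where.

```properties
# Sketch of a Twitter source connector configuration.
# Credentials come from a Twitter developer app; replace the placeholders.
name=twitter-source
connector.class=com.github.jcustenborder.kafka.connect.twitter.TwitterSourceConnector
twitter.oauth.consumerKey=YOUR_CONSUMER_KEY
twitter.oauth.consumerSecret=YOUR_CONSUMER_SECRET
twitter.oauth.accessToken=YOUR_ACCESS_TOKEN
twitter.oauth.accessTokenSecret=YOUR_ACCESS_TOKEN_SECRET
# Track tweets containing these terms and write them to this topic.
filter.keywords=kafka,ksql
kafka.status.topic=twitter_status
process.deletes=false
```

Once loaded into a Connect worker, the connector streams matching tweets into the named topic, which KSQL can then pick up directly.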