My friends at AirPOS are fundraising on Crowdcube. #pointofsale @airpos #startups #crowdcube

A scalable, secure and mature software-as-a-service platform, AirPOS enables hundreds of independent retailers in over a dozen countries to manage their business and serve their customers more easily. The company is targeting a potential market of 20m cloud point-of-sale terminals worldwide.

AirPOS – Crowdcube Pitch from AirPOS on Vimeo.

I’ve watched this company evolve; I was also their CTO for a while back in the early days. Give yourself some time and look over their pitch, there are seven days on the clock.

Time to Remind Myself What a #Startup Is.

From Wikipedia:

A startup company (startup or start-up) is an entrepreneurial venture which is typically a newly emerged, fast-growing business that aims to meet a marketplace need by developing or offering an innovative product, process or service. A startup is usually a company such as a small business, a partnership or an organization designed to rapidly develop a scalable business model.

Sorry, I just had to remind myself. It’s not about getting on accelerators, crowdfunding, flattering venture capitalists, faux mentors or any of that. It’s about having an idea, building it out and selling it to paying customers. If you need help then that’s fine, the help is out there, just remember that those services come at a cost whether that’s in sanity or time, probably both.

Government money is fine as long as it’s not your single source of income; if it is, take a good look at your idea, because it’s probably on life support already. It’s also shifting sand: it could stop at any moment.

Where possible I’m still in favour of just getting on with it by yourself. As put nicely by a hedge fund founder in Vanity Fair a number of years ago, “VC funding is one step up from human trafficking….”

Better to think like Mike ‘Wags’ Wagner.

“You know I want to be on the outside, rocking with the marauders.”


Revisiting #Spark Scripts From the Command Line. #bigdata #spark #scala

It’s been a while since I looked at any Spark code, I’ve just been working on other things. There’s been a few comments on the blog about running Spark jobs from the command line shell.

Test Data

First let’s have some text data to work from. We’ll do a basic word count on it. Nothing to hand apart from the output of my TensorFlow algorithmic book generation.

I Wordlessly Kate and I gaze at the elevator at the end. I have never understood what you’re going to do with my safety. I groan as my body is rigid, tension radi- ating out me in front of me. He looks so remorseful, and in the same color as the crowd arrives and in my apartment. The thought is crippling. But and I don’t want to go to me that I want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to him — and I can tell him about 17 miles a deal. “Did you have to compro- mise. I giggle. “Wench. Food, now, please.” “Since you want to talk about you in my own way, and I am going to be very surprised, not to see you. Ax (Your fiancee) I ask softly. He looks so vulnerable — and I don’t know if it’s my heightened way of the ‘old,’ son. I have a hairdresser arriving at your mom?” “Yes.” He grins at me and winks, making me flush. He smirks at me. “What is it?” I ask. He gazes at me, his eyes dark and earnest. “Find out the elevators, of the first time in a half-bear — and I have to go to church . . . Date: June 10, 2011 16:05 To: Christian Grey Twiddling Christian and I don’t know if it’s not at the rules are a hostile Anthem, “Every Breath You Take.” I do you have to do with you?” he asks. 
“I don’t want to go to work for a living, and I’ll be very persuasive,” he murmurs, and his eyes are alight with humor. “He’s like a drink,” Jack mut- ters, locking the eggs. I crack through my body. But what I do to make you uncomfortable.” I shake my head to fetch him at the same COURTESY to a child. “I thought you were in the apartment or you^?

It’s not a classic I know.

The Scala Spark Script

Next a Scala script that does the word count in Spark.

val text = sc.textFile("/Users/jasonbell/sample.txt")
val counts = text.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
counts.collect()

Basic…. but it works.
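If you want to sanity-check the Spark result without firing up a cluster, here’s the same word count in plain Python (a sketch, not part of the original script):

```python
from collections import Counter

def word_count(text):
    # Mirrors the Scala pipeline: flatMap(split(" ")) ->
    # map(word -> (word, 1)) -> reduceByKey(_ + _)
    counts = Counter()
    for line in text.splitlines():
        counts.update(line.split(" "))
    return counts
```

It splits on single spaces only, so punctuation stays glued to words, which is why the Spark output contains entries like (COURTESY,1) and (uncomfortable.”,1).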

And A Run Through

And then run it from the command line.

$ /usr/local/spark-2.1.0-bin-hadoop2.3/bin/spark-shell -i wc.scala
Using Spark's default log4j profile: org/apache/spark/
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/04/08 09:07:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/08 09:07:20 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at
Spark context available as 'sc' (master = local[*], app id = local-1491638836119).
Spark session available as 'spark'.
Loading wc.scala...
text: org.apache.spark.rdd.RDD[String] = /Users/jasonbell/sample.txt MapPartitionsRDD[1] at textFile at <console>:24
counts: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[4] at reduceByKey at <console>:26
res0: Array[(String, Int)] = Array((COURTESY,1), (“Since,1), (flush.,1), (is,3), (now,,1), (2011,1), (arrives,1), (same,2), (June,1), (am,1), (have,5), (never,1), (tension,1), (winks,,1), (dark,1), (miles,1), (with,3), (fiancee),1), (crippling.,1), (first,1), (—,3), (fetch,1), (talk,1), (uncomfortable.”,1), (eyes,2), (crack,1), (my,7), (Take.”,1), (child.,1), (go,3), (make,1), (Breath,1), (what,2), (out,2), (Twiddling,1), (me,,1), (gazes,1), (looks,2), (Date:,1), (deal.,1), (remorseful,,1), (me,4), (him,3), (his,2), (are,2), (body,1), (shake,1), (persuasive,”,1), (“Yes.”,1), (can,1), (half-bear,1), (mise.,1), (Wordlessly,1), (“What,1), (elevator,1), (Food,,1), (.,3), (earnest.,1), (as,2), (going,2), (‘old,’,1), (very,2), (don’t,27), (you,1), (son.,1), (safety.,1), (eggs.,1), (apartment...
Welcome to
 ____ __
 / __/__ ___ _____/ /__
 _\ \/ _ \/ _ `/ __/ '_/
 /___/ .__/\_,_/_/ /_/\_\ version 2.1.0

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60)
Type in expressions to have them evaluated.
Type :help for more information.

If you’re not getting the results then something is wrong.




The Northern Ireland #AI Startup Problem – #AI #NorthernIreland #Startups

(The post here reflects my own thoughts and may not be the thoughts of my employer, just putting that out there now to avoid any confusion)

The Shift

Over the last few months there’s been a shift. A movement from web sites and apps that do stuff (mostly useful, some utterly useless) to more refined thinking on process and insight.

Over the weekend I was looking at the funding patterns of artificial intelligence startups. Handily KDnuggets (the place you look for anything on data mining and machine intelligence) had a piece on 50 of the “top” companies right now in AI.

The 50 to Watch

Company Sector Investment ($m)
(Provo UT) Ad Sales 251.2
Persado (New York NY) Ad Sales 66
APPIER (Taipei Taiwan) Ad Sales 49
DrawBridge (San Mateo CA) Ad Sales 46
Zoox (Menlo Park CA) Autotech 290
Nauto Inc. (Palo Alto CA) Autotech 14.9
nuTonomy (Cambridge MA) Autotech 19.6
Dataminr (New York NY) BI 183.44
Trifacta (San Francisco CA) BI 76.3
Paxata (Redwood City CA) BI 60.99
DataRobot (Boston MA) BI 57.42
Context Relevant (Seattle WA) BI 44.3
Tamr (Cambridge MA) BI 41.2
CrowdFlower Inc. (San Francisco CA) BI 38
RapidMiner (Boston MA) BI 36
(Tel Aviv Israel) BI 23.9
BloomReach (Mountain View CA) Commerce 97
Mobvoi Inc. (Beijing China) Conversation AI 71.62
(New York NY) Conversation AI 34.3
MindMeld (San Francisco CA) Conversation AI 15.4
Sentient Technologies (San Francisco CA) Core AI 135.78
Voyager Labs (Israel) Core AI 100
Ayasdi (Menlo Park CA) Core AI 106.35
Digital Reasoning (Franklin TN) Core AI 73.96
Vicarious (San Francisco CA) Core AI 72
Affectiva (Waltham MA) Core AI 33.72
(Mountain View CA) Core AI 33.6
CognitiveScale (Austin TX) Core AI 25
Numenta (Redwood City CA) Core AI 24
Cylance (Irvine CA) Cyber Sec 177
Darktrace (London UK) Cyber Sec 104.5
Sift science (San Francisco CA) Cyber Sec 53.6
Kensho (Cambridge MA) Fintech 67
Alphasense (San Francisco CA) Fintech 35
iCarbonX (Shenzhen China) Healthcare 199.48
Benevolent.AI (London UK) Healthcare 100
Babylon health (London UK) Healthcare 25
Zebra medical vision (Shefayim HaMerkaz Israel) Healthcare 20
Anki (San Francisco CA) IOT 157.5
Ubtech (Shenzhen China) IOT 120
Rokid (Hangzhou Zhejiang China) IOT 50
Sight Machine (San Francisco CA) IOT 44.15
Verdigris tech. (Moffett Field CA) IOT 16.1
Narrative science (Chicago IL) Text Analysis 29.4
Captricity (Oakland CA) Vision 51.9
Clarifai (New York NY) Vision 40
Orbital Insight Inc. (Mountain View CA) Vision 28.7
Chronocam (Paris France) Vision 18.35
Zymergen (Emeryville CA) Other 174.1
Blue river tech (Sunnyvale CA) Other 30.4

Key Summary

  • Minimum Investment – $14.9m
  • Maximum Investment – $290m
  • Average Investment – $73.26m
  • Number of companies listed – 50

The listed companies were “ones to watch”; that doesn’t take into account the other 10,000 or so that are in stealth, not on anyone’s radar, or just making sales and getting on with it.
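The key summary figures are simple to reproduce; here’s a sketch using a handful of rows from the table above (the real calculation runs over all 50):

```python
# (company, investment in $m) - a sample of rows from the table above
investments = {
    "Zoox": 290.0,        # the maximum on the list
    "Nauto Inc.": 14.9,   # the minimum on the list
    "Dataminr": 183.44,
    "Cylance": 177.0,
}

minimum = min(investments.values())
maximum = max(investments.values())
average = sum(investments.values()) / len(investments)
```

The average here is of the sample only; run it over the full 50 rows to get the $73.26m figure quoted above.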

For me one concern is the lower investment limit, $14.9m; I’ve not seen any NI company raise that amount. I’ve spent time thinking about why that could be.

  • All the startups are donkeys; they’re just not worth that amount.
  • All the founders are playing the Northern Ireland funding game; they raised their little $1m and can’t raise again as they’ve already given away 20-25% of the company.
  • There’s no actual IP or product.
  • There are no customers.
  • There’s no problem being solved.

That’s off the top of my head, if I really went all mind palace on it I’d probably come up with another ten reasons.

The Talent Pool

The much lauded reason for FDI companies setting up shop in Belfast and, occasionally, Derry.

“you have graduates – there’s a lot of talent in Belfast” From the BT, here.

Which I read as, “There’s plenty of cheap graduates looking for a job in Belfast; we can exploit that and cut our costs.”

It’s time to seriously question this marketing message. Yes, there are some very talented graduates in Northern Ireland. Are they ready for the market where they are needed? Debatable. Do they fill the gap of what’s really missing? No, they don’t.

It still skirts around the issue for any startup: a complete lack of good CTO talent. What I’m seeing more and more of are companies setting up, getting that free government money (startup DLA, if you will) and handing out vanity titles like there’s no tomorrow. I’ve written and spoken about this many times before; if you want to read it again then have a look at this.

Good CTOs in NI are hard to find, plain and simple. The reason is simple too: they’re mostly in great jobs with large employers, on deals too good to lose. Don’t think that’s a fluke; the large companies engineer it that way, as they obviously don’t want to lose good talent when they see it.

Jumping to a startup with a very questionable runway is a huge risk. Look at yourself in the mirror and ask yourself, “Am I worth the risk to my employees, my C levels and most importantly my customers?”.

If you flinch or can’t do it then you obviously need a session with Wendy Rhodes.

NI Needs a BIG WIN

If you think you’re on the starting wave of AI technology then you’re already five years too late. The same mistake was made with BigData opportunities. What I personally believe is required right now is for someone to bring a product along that is so unique and solves a problem better than anyone else that the rest of the world can’t do anything but look.

This thing also needs to IPO big time and make the founders and early stage investors so rich that people look at Northern Ireland as the place. The time is now to stop kidding ourselves and thinking we’re at the start of a wave, you’re already behind. Still thinking that social media data is going to make you (and others) rich, I doubt it, that edge is long gone.

There’s little point building tools; it’s hard to create revenue with programming tools and APIs. Solve a problem better than anyone else so it can’t be ignored. The tools to do AI and machine learning are plentiful, whether it be TensorFlow, Weka or what have you. Search hard enough here and you’ll find posts on those technologies. At the end of the day the programming side isn’t that difficult when you have good coders who understand the logic.

I firmly believe it can be done; I just think the thinking needs to change. Stop listening to salaried government PR (use them, fine, but weigh up what’s being said) and focus on idea, IP and customers.

  • Kick ass product
  • Kick ass team
  • More than $7m in investment
  • An edge that no one can ignore.
  • Main focus to remain in NI and IPO.

Your focus needs to be three standard deviations from the mean; that’s where the risk and the potential rewards are.

And keep this in mind: AI is not about replacing jobs, it’s about focusing on job creation and creating new jobs that don’t currently exist. It’s an exciting time to be in NI, but there’s some serious catching up to do.

Beltech 2017

I’m on the panel at Beltech 2017, “Public Debate: The Impact of AI on our World” at 6pm though I’ll be there most of the day on behalf of Mastodon C. So feel free to catch up with me there.


[#Kafka Diaries] – Your daily morning streaming meditation guide.

Kafka acting up like a toddler is a symptom and not the cause; you’re doing something to it for it to act that way.

Strive to reduce latency. Remember the Rule of 72.

Use frameworks, but remember there’s added latency, so make sure you can tune it.

Log everything.

Time everything.

Know your message size and size the broker memory on “write throughput * 30 seconds” accordingly.

Know your log retention size and time; if you don’t know it then it’s probably 168 hours.

Ensure your consumers can die neatly if they need to without wrecking the other consumers.

That niggle inside saying something ain’t right: heed it.

If another company can process thirty million messages a second, you can too.

Bonus: Tea Solves Everything


[#Kafka Diaries] Topic Level Settings You Can’t Ignore Part 1. – #Data #Streaming

For the majority of users the defaults are there and they kind of work: your messages are small and there’s enough volume on the box to be able to relax. If you are working in local development there’s a good chance you don’t even consider such things; once things go live, though, it’s a different matter.

Message Retention

There are two methods available for setting retention of messages in Kafka: first by the time a message is in the log, and second by log size.

log.retention.hours, log.retention.minutes, log.retention.ms

Yes, there are three, but they all control the same thing: how long messages are retained in the log by time. The default is 168 hours (seven days). You can use hours, minutes or milliseconds as they all set the same thing; if more than one setting is present then the one with the smallest unit takes precedence.
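That precedence rule can be sketched in a few lines (illustrative Python, not broker source):

```python
def effective_retention_ms(hours=None, minutes=None, ms=None):
    # The setting with the smallest unit wins:
    # log.retention.ms beats log.retention.minutes beats log.retention.hours.
    if ms is not None:
        return ms
    if minutes is not None:
        return minutes * 60 * 1000
    if hours is not None:
        return hours * 60 * 60 * 1000
    return 168 * 60 * 60 * 1000  # the 168-hour (seven day) default
```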


log.retention.bytes

You can also retain messages by the total number of bytes of messages in the log. The retention is set per partition, so a topic with three partitions and a log.retention.bytes of 1GB retains 3GB at the very most. If you increased the partition count by one on that topic, for example, the retained total would increase to 4GB.
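Because the limit applies per partition, the worst-case disk usage for a topic is a straight multiplication; a quick sketch:

```python
GB = 1024 ** 3

def max_retained_bytes(partitions, retention_bytes):
    # log.retention.bytes is enforced per partition, so the topic-level
    # ceiling grows with every partition you add.
    return partitions * retention_bytes
```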

The two types of log retention, size and time, can be used together. If both are set then messages are removed when either of the settings is satisfied. If you have a retention time of one day and 2GB retention in size, the log rules will be applied if you exceed 2GB before the one-day period is up.


message.max.bytes

Producers are limited in the size of messages they can produce. The default is 1MB; if a producer sends a message over that then it will not be accepted. The setting refers to the compressed size of the message, so the message itself can be over the set limit uncompressed.

While you can set Kafka to use larger message sizes this does have performance impact across the network and I/O throughput. So it’s worth sitting down with a pen and paper (or a spreadsheet) to gauge the average message sizes and adjust the settings accordingly.
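The pen-and-paper exercise boils down to an average over a sample of compressed message sizes; the sizes below are made up for illustration:

```python
def average_message_size(sizes_bytes):
    # Average compressed message size from a sample, to compare
    # against the broker's 1MB default before raising any limits.
    return sum(sizes_bytes) / len(sizes_bytes)

sample = [850, 1200, 640, 990, 1320]  # hypothetical compressed sizes in bytes
```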


Realtime & Streaming Workflows and The Rule of 72 #data #streaming #kafka #kinesis


In Investment, the Rule of 72

The Rule of 72 is a simple calculation used in accountancy and investment. Simply, if you divide 72 by the interest rate you get the number of periods it takes for your investment to double. If I have £100 and an interest rate of 10% per year then it will take 7.2 years (72/10 = 7.2) for my money to double.
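The worked example in code:

```python
def years_to_double(rate_percent):
    # Rule of 72: periods to double ~= 72 / interest rate.
    return 72 / rate_percent
```

So the £100 above doubles in 7.2 years at 10%.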

In Realtime and Streaming Applications

Workflow throughput is everything. How consumers perform and behave will have a knock on effect to the number of messages you can process. So please allow me to present, The Streaming Rule of 72.

In realtime and streaming applications, the rule of 72, the rule of 70 and the rule of 69.3 are methods for estimating the doubling time of the volume of messages being processed. The rule number (e.g. 72) is divided by the percentage gain per period to obtain the approximate number of periods (usually seconds) required for doubling.

For example, say 100 messages are flowing through the system per second. We change the workflow timeouts, increase a container’s shared memory volume or extend a heartbeat timeout, measure throughput again, and now process 130 messages a second. That’s a 30% increase ((130-100)/100), so with the streaming rule of 72….. we will double the message volume every 2.4 seconds.
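The same example as a sketch:

```python
def doubling_periods(before_rate, after_rate, rule=72):
    # Percentage gain per period, then the rule number divided by it
    # gives the approximate number of periods to double throughput.
    gain_percent = (after_rate - before_rate) / before_rate * 100
    return rule / gain_percent
```

doubling_periods(100, 130) reproduces the 2.4 seconds worked through above.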


Monitoring Consumer Offsets in #Kafka.

With a bunch of applications acting as consumers of a Kafka stream, it appears to be something of a dark art to find any decent information on what’s going where and doing what. The big question is: where is my application up to in the topic log?

After hours of try, test, rinse, repeat, tea, pulling hair, more tea, Stackoverflow (we all do, get over it) and yet more tea, this dear digital plumber was looking like this….


….but in the male form, less angry and not using tables, just more tea.

The /consumer node in Zookeeper is a bit of a red herring, your application consumer group ids don’t show up there but ones from the Kafka shell console do. This makes running the ConsumerGroupCommand class a bit of a dead end.

Consumer Offsets Hidden in Plain Sight

It does exist though! It’s right there, looking at you…

$ bin/ localhost:2181
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is disabled


WatchedEvent state:SyncConnected type:None path:null
ls /brokers/topics
[__consumer_offsets, topic-input]

..just not plainly obvious.

Most consumers are basically while loops collecting a number of records, using the poll() method to update the consumer offset of any records not yet dealt with, basically saying “I’m up to here boss! I didn’t read these though”. It also acts as the initial link with the Kafka Group Coordinator to register a new consumer group. Those consumer groups do not show up where you expect them to.

Finding Out Where the Offset Is

At this point Zookeeper isn’t much help to me, using the Zookeeper shell doesn’t give me much to go on.

get /brokers/topics/__consumer_offsets/partitions/44/state
cZxid = 0x4e
ctime = Sat Mar 04 07:41:28 GMT 2017
mZxid = 0x4e
mtime = Sat Mar 04 07:41:28 GMT 2017
pZxid = 0x4e
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 72
numChildren = 0

So I need something else…. and help is at hand. It just takes a little jiggery pokery.

Using The Kafka Consumer Console to Read Offset Information

We can use Kafka’s console tools to read the __consumer_offsets. First thing to do is create a config file in a temporary directory.

$ echo "exclude.internal.topics=false" > /tmp/consumer.config

Then we can start the console.

$ bin/ --consumer.config /tmp/consumer.config --formatter "kafka.coordinator.GroupMetadataManager\$OffsetsMessageFormatter" --zookeeper localhost:2181 --topic __consumer_offsets --from-beginning

Any consumer applications you have running should show up in the offset log. In this example I have two applications running from the same topic (topic-input) on one partition. So I can see from here that my-stream-processing-application is up to offset 315 in the topic while my-other-processing-application is further ahead at 504. That could potentially tell us there’s an issue with the first application as it appears to be way behind in the topic.

[my-stream-processing-application,topic-input,0]::[OffsetMetadata[189,NO_METADATA],CommitTime 1488613422905,ExpirationTime 1488699822905]
[my-stream-processing-application,topic-input,0]::[OffsetMetadata[252,NO_METADATA],CommitTime 1488613901498,ExpirationTime 1488700301498]
[my-stream-processing-application,topic-input,0]::[OffsetMetadata[315,NO_METADATA],CommitTime 1488614472422,ExpirationTime 1488700872422]
[my-other-processing-application,topic-input,0]::[OffsetMetadata[378,NO_METADATA],CommitTime 1488614576300,ExpirationTime 1488700976300]
[my-other-processing-application,topic-input,0]::[OffsetMetadata[441,NO_METADATA],CommitTime 1488614606314,ExpirationTime 1488701006314]
[my-other-processing-application,topic-input,0]::[OffsetMetadata[504,NO_METADATA],CommitTime 1488615237410,ExpirationTime 1488701637410]
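The formatter output is regular enough to scrape. Here’s a sketch that keeps only the newest committed offset per (group, topic, partition), assuming the line format shown above:

```python
import re

# Matches lines like:
# [group,topic,0]::[OffsetMetadata[315,NO_METADATA],CommitTime ...,ExpirationTime ...]
LINE = re.compile(r"\[([^,\]]+),([^,\]]+),(\d+)\]::\[OffsetMetadata\[(\d+),")

def latest_offsets(lines):
    offsets = {}
    for line in lines:
        match = LINE.search(line)
        if match:
            group, topic, partition, offset = match.groups()
            # Later commits overwrite earlier ones, leaving the newest offset.
            offsets[(group, topic, int(partition))] = int(offset)
    return offsets
```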

The frustration at this point is that it’s hard to know the total number of records in the log; we know where the offset is up to, but not the total or the lag.

The search continues…..




My first foray in to #novel #writing with #AI. – #Tensorflow #AI

TL;DR – Quick Summary

For all interested writers, authors and creative writing types…. I think it’s fairly safe to assume you’re safe for the time being.

Have a good day.

Can AI Write Me a Book?

First of all, this isn’t really about code, it’s just about process, so there are no juicy code snippets or scripts to get all hot under the collar about. This whole (stupid) episode started out with a couple of questions.

  1. Could AI write a novel or a novella of a quality that it could be entered into a writing competition?
  2. Is it possible to make 50 Shades of Grey readable?

So before my usual working day I downloaded some recurrent neural network code, installed TensorFlow, trained it on Shakespeare and left the laptop alone to do its thing. Yup, training takes a long time; in my case it was eight hours on a commodity Toshiba C70D laptop running Ubuntu. If you want to read more about RNNs then there’s an excellent explanation here. Generating samples of text from the trained RNN is a doddle….. it takes seconds.
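The RNN itself is too heavy to reproduce here, but the core sampling idea (predict the next token given what came before) can be sketched with a far cruder word-level Markov chain. To be clear, this is an illustrative stand-in, not the TensorFlow code I actually ran:

```python
import random
from collections import defaultdict

def build_chain(text):
    # Map each word to the list of words observed to follow it.
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length):
    # Walk the chain, picking a random observed follower each step.
    out = [start]
    while len(out) < length:
        followers = chain.get(out[-1])
        if not followers:
            break
        out.append(random.choice(followers))
    return " ".join(out)
```

Train something this crude on a small corpus and it gets stuck in loops for much the same reason the undertrained RNN did: not enough data to vary the continuations.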

That Flipping Book.

So, rinse and repeat with some different text. How about 161,528 words of that book? Now, I have a confession: I’ve never read that book, or in fact any novels; for some reason my brain is firmly planted in non-fiction. Now I’m wondering if I can get AI to write me an O’Reilly book…. wonder what animal I’d get?

Another eight hours pass, overnight this time.

So how quick is it to generate 500 words of AI driven wordery? No time at all it seems…..

In fact, with some simple bash scripting I can write an 11,500-word novella in under two minutes, on a cheap laptop. I shall be rich after all!

Not So Fast….

While the output was, well, okay, it needs A LOT OF WORK to make it actually work on a human-readable level. The main reason is in the training: if you look at that book it trundles in at 900KB as a text file, which is way too small for training. In the samples the AI would get stuck in a loop and repeat the same phrases over and over. Sometimes it would actually add to the paragraph; most times it repeated so often it didn’t make sense.

“He looks so remorseful, and in the same color as the crowd arrives and in my apartment. The thought is crippling. But and I don’t want to go to me that I want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to you. I don’t want to be beholden to him — and I can tell him about 17 miles a deal.”

The only thing I can think of that came close was Adrian Belew’s repetitive shouting of “I repeat myself when under stress, I repeat myself when under stress…..” (Warning: YouTube link contains King Crimson and a Chapman Stick).

Regardless, part of me thinks that’s naff, part of me thinks that’s rather cool, AI did that. So in theory and with a little cleaning up it’s possible to craft something.

What About Topic and Flow?

This is the thing with creative text: it has characters, themes and a story flow. What I’ve done so far doesn’t address any of that, and that’s where everything falls flat on its bum for AI. Without some hefty topic wrangling it’s going to be difficult to craft something that actually flows and makes sense.

My favourite book on text mining by a country mile is not one that has tons of code in it: it’s The Bestseller Code by Jodie Archer and Matthew Jockers. It’s a good attempt, though by their own admission it could be improved: an investigation using multivariate analysis, NLP and other text mining tools to see if there were patterns in the bestseller list.


Topic is important; that goes without saying. My AI version has no plot line whatsoever as it plainly isn’t told of such matters. If you want to know more about plot lines then there are the seven basic plots that are widely used. Baking those plot lines into an AI will take work.

The more text you generate, the harder it’s going to be to keep a basic plot going. A way to get the AI to focus on generating certain aspects of the story over a timeline would be beneficial but hard to do. Once again though, nothing is impossible.

An Industry Of Authorship, Automated?

The future automation of all things literature is, I think, a long way off. Still, let’s look at this from a 30,000ft view: I can generate an eleven-thousand-word book which, while ropey, shows some promise, even if it needs an editor to sort the wording out.

With APIs that exist now I could pick out words to form a title; “rules are a hostile anthem” came out…. then one Google Image Search for Creative Commons photos….


And automatically pick an image that fits a certain AI criteria. No text… pass that on to an overlay of the title of the book and the author’s name, “Alan Inglis” (geddit A.I.) and package that up in a Kindle format (that can be automated too) and off it goes to an Amazon account.

*I did check on Alan Inglis; there are no authors of that name but, rather ironically, there is one within clinical neurosciences….. I should really write an algorithm to create a surname that doesn’t exist. I just guessed this one.

Perhaps Not Fiction Then….

Perhaps not, but with texts that tend to take the same form it could be easy to create fairly accurate drafts which require some form of editorial gaze afterwards: news reports, Invest NI jobs-created press releases, term sheets and even business plans. Yup, I think there’s sufficient scope for something to happen. I don’t think you’ll replace the human element, but then again you don’t really want to.

Back, To Answer My Original Question

So, could my AI submit something to a writing competition? Perhaps, if it were less than 10,000 words. With enough corpus text it would be possible to produce something of a quality that could be considered readable. Would a judge notice it was AI writing? Who knows. There are some bits within the samples I’ve generated that are quite interesting; it looks like there’s heart in the prose, but it simply isn’t true.

I think the Alan Inglis’s of this world are safe for the time being….. I suppose I should go and read The Circle by Dave Eggers to see what my future holds.





The 30 Second Bayesian Classifier #machinelearning #bayes #classification

I’m putting this up as I got a nice email from a reader who was having trouble with running the Britney example. And as developers know, bad examples are enough to put people off…. actually they’re toxic.


See what I did there…


The Classifier4J library is old, so it’s not in any Maven repository I’m aware of. We have to go old school and download the jar file by hand. You can find the Classifier4J library at

If you don’t have the code for the book you can download it from


Open a terminal window and go to the example code for the book. In chapter2 is the Britney code. Keep a note of where you’ve downloaded the Classifier4J jar file as you’ll need this in the Java compile command.

$ javac -cp /path/to/Classifier4J-0.6.jar chapter2/BritneyDilemma.java


There should be a .class file in your directory now. Running the Java class is a simple matter. Notice we’re referencing the package and class we want to execute.

$ java -cp .:..:/path/to/Classifier4J-0.6.jar chapter2.BritneyDilemma
brittany spears = 0.7071067811865475
brittney spears = 0.7071067811865475
britany spears = 0.7071067811865475
britny spears = 0.7071067811865475
briteny spears = 0.7071067811865475
britteny spears = 0.7071067811865475
briney spears = 0.7071067811865475
brittny spears = 0.7071067811865475
brintey spears = 0.7071067811865475
britanny spears = 0.7071067811865475
britiny spears = 0.7071067811865475
britnet spears = 0.7071067811865475
britiney spears = 0.7071067811865475
christina aguilera = 0.0
britney spears = 0.9999999999999998
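Those repeating 0.7071… scores are no accident: 0.7071 is 1/√2. The numbers are consistent with a plain cosine similarity where the vector space is the trained vocabulary ({britney, spears}) and query words outside it score nothing, so any misspelling that keeps “spears” matches exactly one of two terms. Here’s a sketch of that maths (my reading of the behaviour, not Classifier4J’s actual source):

```python
import math

def cosine_over_trained_vocab(query, trained):
    # Build vectors over the trained vocabulary only; query words
    # outside it contribute nothing to the similarity.
    vocab = sorted(set(trained.split()))
    query_words = set(query.split())
    q = [1 if word in query_words else 0 for word in vocab]
    t = [1] * len(vocab)
    dot = sum(a * b for a, b in zip(q, t))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_t = math.sqrt(len(vocab))
    return dot / (norm_q * norm_t) if norm_q else 0.0
```

With this, “brittany spears” against “britney spears” gives 0.7071…, the exact match gives roughly 1.0 (floating point leaves it at 0.9999999999999998), and “christina aguilera” gives 0.0: the same three values in the output above.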

About the Book


You can find out more about the book on the Wiley website. As well as practical machine learning examples, it also has sections on Hadoop, streaming data, Spark and R.