The Fact Your #Data is Being Used Should Be a Surprise to No-one

It’s been an interesting weekend for my field of work. Especially in an industry where I do stuff with data….

Ellie Mae O’Hagan wrote a piece called “No one can pretend Facebook is harmless fun anymore” and it’s not a bad overview of where things are. The last line says it all:

“…because people with Facebook profiles aren’t the company’s customers: they are the product it sells to advertisers.”

Which is basically the worst kept secret among technology companies, entrepreneurs and tech “thought leaders”. The value is in the data, and once you figure out how to monetise that, giving the product away free to customers is no bad thing.

Anyone who knows me knows my love of customer loyalty data. I’ve worked with it since 2002, mining Nectar card data and coming up with recommendations, via vouchers and offers, on how to get customers to change behaviour. The Cambridge Analytica approach is far from new; what’s new is the domain it was applied to.

Once you know you can change another person’s behaviour, a sense of responsibility comes with it. As the custodians of the data you now have the power to change the course of another person’s future without their knowledge. That thought alone is scary, as I know some who would exploit it for profit like squeezing a grape until no more juice comes out.

So think about it all: every card, whether it’s a loyalty card or a bank card, and your medical records on the GP’s system. Do the likes of Tesco/Dunnhumby have a public list of where their Clubcard data is sold? Probably not. I asked a question during a Big Data Week panel in 2015, “Who has a Clubcard?”, and pretty much the whole room raised a hand. “Who wouldn’t mind if your shopping habits were passed on to the insurance company?”, and all hands, with the exception of two, went down very quickly.

Telephone call logs are another, along with the classic line, “we may record your call for training purposes”. Training what, exactly? Another customer representative, or a machine learning or AI tool deciding whether to keep your custom? How do we know? We never do, because we never find out.

Will the events of the weekend turn the tide against Facebook? It’s 50/50. I mean, 50m users of Facebook is only about 2.5% of the user base and most hardened cat/dog/baby picture posting users won’t care. If I were to bet, I’d say probably nothing much will happen.

The only people who need to change are you and me: thinking about what data goes where, how it will be used and how to have it deleted when we’re done with that service.



Setting up Org Mode and Babel for the Nervous #emacs #vi #babel

I’m claiming a moral victory for my sanity here…..

As a die-hard vi user, Emacs occasionally confuses me; I’m happy to admit that. Many a time I’ve typed :wq instead of C-x C-s when it comes to saving files.

Thing is, Emacs has loads of goodies that I never quite get to try out. Org mode being able to run scripts and stuff is one of them. Curiosity has now given way to requirement, so it was a wrestling match to get it working (and reading the documentation did help… I admit). There are probably better ways to do this but it worked for me.

Installing Org mode

Open your init.el file and add the following line, which points the package system at Org’s own ELPA archive:

(add-to-list 'package-archives '("org" . "https://orgmode.org/elpa/") t)

I’m assuming that you’ve already got (require 'package) in your init.el file; if you don’t, then add it above the add-to-list line.

Open Emacs; you now need to install the org package and the org-contrib package.

M-x list-packages

You will see org and org-contrib listed (mine were at the top). Install them both: click on the package name, then click “Install”. Emacs will output a load of stuff but all is normally well. With that done we can now make sure that we can run bash from within Org mode.

Enabling Bash within Emacs

Open your init.el file again. You will now add the org-babel command to load the shell so it can be called from your org file. I usually add this stuff at the bottom of my init file.

(org-babel-do-load-languages
 'org-babel-load-languages
 '((shell . t)))

Save the file and restart Emacs.

Testing within Org mode

So far so good, now to test.

Either open an org mode file or create one. Now add the following:

#+BEGIN_SRC bash
echo 'this is a test'
#+END_SRC

Then evaluate the block using C-c C-c and you will be prompted “Evaluate this bash code block on your system?”. Respond with yes.

Look at your org file again and you should see the output.

#+BEGIN_SRC bash
echo 'this is a test'
#+END_SRC

#+RESULTS:
: this is a test

That’s good enough for me and now I have notes on how to get it working on my work machine tomorrow morning (I’ll forget if I don’t write it down).


Walking as a debugging technique. #programming #debugging #code #learning

Kris is totally on the money; this tweet is 100% true. One story I tell developers is from personal experience.

While working on the Sporting Life* website for the Press Association, I was writing a Perl script, quite a beefy one, to populate pools coupons so people could play online.

All morning I was fixated on a bug and couldn’t see the wood for the trees. My boss sat opposite but didn’t say a word, nor did I realise he was teaching me. After a while he decided the time was right: “Jase, go for a walk.” I was blunt: “No, not until I’ve fixed this bug….” “Jase, go for a WALK!” I got the hint…..

The Press Association car park is a fair size so I did a lap, just the one. All the while during that lap I was muttering under my breath about such an absurd command from my boss. My first proper programming job and I was less than impressed…..

That all changed in an instant. I opened the door to the office, walked to my desk and, before I even sat down, pointed at the screen and said, “Oh look, there’s a comma missing….”, made the correction and it worked first time.

Stuck with a problem? Go for a walk.


* Two milestones of my programming career: being one of the first involved in the very first online betting platform and, second, the first online pools coupon….. this coming from the man who has no interest in sport at all.


I’m talking about streaming apps at ClojureX 4-5th December, London at @Skillsmatter #clojure #onyx #streaming #kafka #kinesis

Who Let Him Up There Again??

Last year at ClojureX I did an introduction to Onyx; this year it’s about what I really learned at the coal face. I’ll be talking about how I bled all over Onyx on a really big project.

This time though: no naff jokes, no Strictly Come Dancing and Linear Regression*, no temptation to use that Japanese War Tuba picture. It will be about designing streaming applications, task lifecycles, heartbeats, Docker deployment considerations and calculating log volume sizes for when you’re on holiday.

I’m looking forward to it. If you are interested in the current schedule you can read it here; if you want more information on the conference then that’s on the SkillsMatter website.

* If you’re interested, the Darcey Coefficient is (as a Clojure function):

(ns darceycoefficient.core)
(defn predict-score [x-score]
 (+ 3.031 (* 0.6769 x-score)))
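
As a quick sanity check, here’s a call with an example judge’s score of 8 (my own input, not one from the talk):

(predict-score 8) ;; => 3.031 + (0.6769 * 8), roughly 8.45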

Like BigData tools you won’t need AI 99% of the time. #bigdata #data #machinelearning #ai #hadoop #spark #kafka

The Prologue.

Recently I’ve been very curious; I know that alone makes people in tech really nervous. I was curious to find the first mentions of BigData and Hadoop in this blog: April 2012, and the previous year I’d been doing a lot of reading on cloud technologies and, moreover, data. My thirty year focus is data, and right now, in 2017, I’m halfway through.

The edge as I saw it would be to go macro on data and insight; that had been my thought ten years earlier. The whole play with customer data was clear in my mind then. In 2002, though, we didn’t have the tooling, so we made it ourselves. Crude, yes. Worked, it did.

When I moved to Northern Ireland I kept talking about the data plays, mainly to deaf ears; some got it, most didn’t. “Hadoop? Never heard of it”. Five years later everyone has heard of Hadoop… too late.

It’s usually about now we have a word cloud with lots of big data related words on it.

Small Data, Big Data Tools

Most of the stories I hear about Big Data adoption are just this: using Big Data tools to solve small data problems. On the face of it, the amount of data an organisation has rarely amounts to the need for huge tooling like Hadoop or Spark. My guess is (and I’ve seen it partially confirmed) that the larger platforms like Cloudera, MapR and Hortonworks compete on a very narrow field of really big customers.

Let’s be honest with ourselves: Netflix and Amazon sized data are deviations from the mean rather than the mean itself, and the probability of that scale of data being handed to you is very small unless it’s made public.

I personally found out in 2012, when I put together Cloudatics, that using big data tools is a very hard sell. Many companies just don’t care, not all understand the benefits, and those who did care still didn’t see how it would apply to them. Your pipeline is slim; at a guess a 100:1 ratio would apply, and that was optimistic then, let alone five years on.

Most of us aren’t near “Average Sized Data”, let alone Big Data.

When I first met Bruce Durling back in late 2013 (he probably regretted that coffee) we talked about all the tools, how there’s no need to write all this Java stuff when a few lines of Pig will do, and how solving a specific problem with existing big data tools was far better than trying to launch a platform (yup, know that, already tried).

What Bruce and I also know is that we work with average sized data…. it’s not big data but it’s not small data. Do we need Hadoop or Spark? Probably not. Can we code and scale it on our own? Yes we can. Do we have the skills to do huge data processing? You betcha.

I sat in a room a few weeks ago where mining 40,000 tweets was classed as a monumental achievement. I don’t want to burst anyone’s bubble, but it’s not. Even 80 million tweets is not a big data problem, nor even an average sized data one. On my laptop, doing sentiment analysis on them took under a minute.

Now enter the life-saving AI!

And guess what: it looks like the same mistake is going to be repeated, this time with artificial intelligence. It’ll save lives! It’ll replace jobs! It’ll replace humans! It can’t tell the difference between a turtle and a gun! All that stuff is coming back.

If you firmly believe that a black box is going to revolutionise your business then please, be my guest. Just be ready with the legals and the customer service department; AI is rarely 100% accurate.

Like big data, you’ll need tons of data to train your “I have no idea how it works, it’s all voodoo” black box algorithm. The less you train, the more error prone your predictions will be. Ultimately the only thing it will harm is the organisation that ran the AI in the first place. Take it as fact that customers will point the finger straight back at you, very publicly, if you get a prediction wildly wrong.

I’ve seen Google video and Amazon Alexa voice classification neural networks do amazing things; the usual startup on the street may have access to the tools but rarely the data to train them. And my key takeaway since doing all that Nectar card stuff: without quality data, and lots of it, your fight will be a hard one.

I think there are still a good few years at the R&D coalface trying to figure out where AI fits properly. Yes, jobs will be replaced by AI, and new jobs will be created. Humans will sit alongside robotic machines that take the heavy lifting away (that was going on for a long time before the marketers got hold of AI and started scaring the s**t out of people with it).

It’s not impossible to start something in the AI space and put it on the cloud, though the costs can add up if you take your eye off the ball. The real question is, “do you really have to do it that way? Is there an easier method?”. Most crunching could be done on a database (not blockchain, may I add); hell, even an Excel spreadsheet is capable for some, without the programming knowledge or money to spend on services.

Popular learning methods are still the tried and true ones: decision trees, logistic regression and k-means clustering, not black boxes. The numbers can be worked out away from code as confirmation, though who does that is a different matter entirely. The most well known algorithms can be reverse engineered: for decision trees, Bayes networks, Support Vector Machines and logistic regression the maths is laid bare, showing how they work. The rule of thumb is simple: if traditional machine learning methods are not showing good results then try a neural network (the backbone of AI), but only as a last resort, not the first go-to.

If you want my advice, try the traditional, well tested algorithms first with the small data you have. I even wrote a book to help you…..

Like BigData, you more than likely don’t need AI.



How to run params in R scripts from Clojure – #clojure #r #datascience #data #java

You can read the main post here.

Passing parameters into Rscript.

A Mrs. Trellis of North Wales writes….

There’s always one, isn’t there? The decent chap has a point though, so let’s plough on with it now.

New R code

First a new R script to handle arguments being passed to it.

#!/usr/bin/env Rscript
args = commandArgs(trailingOnly=TRUE)

if(length(args)==0) {
 stop("No args supplied")
} else {
 # args arrive as strings; convert with as.numeric(args) if you need numbers
 print(args)
}
If I test this from the command line I get the following:

$ Rscript params.R 1 2 3 4
[1] "1" "2" "3" "4"

Okay, that works so far, now to turn our attention to the Clojure code.

The Clojure Code

Let’s assume we have a vector of numbers; these will be passed to the run command as arguments. So I need a way of converting the vector into a string that can be passed to the sh command.

(defn prepare-params [params]
 (apply str (map #(str " " %) params)))

Which gives an output like this:

rinclojure.example1> params
[1 2 3 4]
rinclojure.example1> (prepare-params params)
" 1 2 3 4"

With a little amendment to the run command function (I’m going to create a new function to handle it)….

(defn run-command-with-values [r-filepath params]
 (let [formatted-params (prepare-params params)
       command-output (sh "Rscript" r-filepath formatted-params)]
   (if (= 0 (:exit command-output))
     (:out command-output)
     (:err command-output))))

Running the example passes the parameters as a single string into the R script.

rinclojure.example1> (run-command-with-values filename params)
"[1] \" 1 2 3 4\"\n"

Not quite going according to plan. We have one string of arguments, meaning there’d be some parsing to do within the R script. Let’s refactor this function a little more.

(defn run-command-with-values [r-filepath params]
 (let [sh-segments (into ["Rscript" r-filepath] (mapv #(str %) params))
       command-output (apply sh sh-segments)]
   (if (= 0 (:exit command-output))
      (:out command-output)
      (:err command-output))))

The prepare-params function is now redundant and has been removed. Using the into function we create a single vector of instructions to pass to sh: the Rscript command, the file path and each of the parameter values mapped to a string.

Instead of running sh on its own I’m applying the vector against sh. When it’s run against the R script we get the following output:

rinclojure.example1> (run-command-with-values filename params)
"[1] \"1\" \"2\" \"3\" \"4\"\n"

Now we’ve got what we’re after: separate entries being registered within the R script. The R script will still have to deal with the argument input, converting the strings to numbers, but we’re passing Clojure things into R as parameters.

Mrs. Trellis, as the basics go: job done. I’m sure it could be done better, and each case is going to be different, so you’ll have to prepare the vector for each R script you work on.




Reverse Engineering the Nonsense. #marketing #coco #eureka


It looks daft, doesn’t it…. it’s either bats**t loonball or someone has just done their job very well indeed…. personally I think it’s marketing genius. This is how I imagine the phone call went…..

[Eureka] – “Hi, is that Coco? It’s marketing dude at Eureka here, we’ve got this idea to sell these vacuum cleaners.”

[Coco] – “Go on, I’m listening….”

[Eureka] – “Will you walk down the high street while one of our other dudes vacuums the street? We’ll give you 10% of the sales”

[Coco] – “Deal! Can I wear what I want? If I’m gonna look mad I might as well do it in style….”

[Eureka] – “Deal!”

(Disclaimer: The above is ALL MADE UP)

Back of the Beermat later….

Facebook views as I took the screenshot: 15,777,263….. nice.

One percent convert to sales? A long shot but hey, it’s madness this morning.

So that’s 157,772 sales at $219, as let’s be honest you want the one that Coco gets someone to clean the street with…. $34,552,205.97. Nice.

Coco walks about with $3.4m in her back pocket (assuming the getup has pockets).

Not bad for an hour’s work: a bit of mockery on Facebook and YouTube, some odd headlines about you, but hey, the exposure is priceless. Eureka have saved a fortune on YouTube CPM fees and a full marketing campaign.

That doesn’t even take into account the outfit and what the baby is wearing. Now if you could scan the image into an app and find out about it…… Oh Kim’s working on that already….

How to run R scripts from Clojure – #clojure #r #datascience #data #java

An interesting conversation came up during a tea break at a London meeting this week: how do you run R scripts from within Clojure? One way was simple, the other (mine) was far more complicated (see the “More Complicated Ways” section below).

So here’s me busking my way through the simple way.

Run it from the command line

The Clojure Code

Using the package gives you access to the Java system command process tools. I’m only interested in running a script, so all I need is the sh command.

(ns rinclojure.example1
 (:use [ :only [sh]]))

The sh function produces a map with three keys: an exit code (:exit), the output (:out) and an error (:err). I can evaluate the output map and check the exit code; anything that’s not zero means failure, so I dump the error, or if all is well send out the output.

(defn run-command [r-filepath]
 (let [command-output (sh "Rscript" r-filepath)]
   (if (= 0 (:exit command-output))
     (:out command-output)
     (:err command-output))))

The R Code

I’ve kept this function simple: I’m only interested in running Rscript and checking the exit code. If all is well then we show the output, otherwise we send out the error.

The now preferred way to run R scripts from the command line is the Rscript command, which is bundled with R when you download it. If I have R scripts saved then it’s a case of running them through Rscript and evaluating the output.

Here’s my R script.

myvec <- c(1,2,3,2,3,4,5,4,3,4,3,2,1)
mean(myvec)

Not complicated I know, just a vector of numbers and a call to mean to get the average.

Running in the REPL

Remember, the error is from the running of the command, not from within your R code. If you mess the R code up then those errors will appear in the :out value.

A quick test in the REPL gives us…..

rinclojure.example1> (def f "/Users/jasonbell/work/projects/rinclojure/resources/r/meantest.R")
rinclojure.example1> (run-command f)
"[1] 2.846154\n"

Easy enough to parse by removing the \n and the [1] prefix which R has generated. We’re not interacting with R, only dumping out its output; after that there’s an amount of string manipulation to do.
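
For the single-line case a minimal clean-up sketch might look like this; clean-r-output is my own helper name, not anything from a library:

(defn clean-r-output [output]
 ;; strip R's "[1] " index prefix and the surrounding whitespace
 (-> output
     (clojure.string/replace #"\[\d+\]\s*" "")
     (clojure.string/trim)))

(Double/valueOf (clean-r-output "[1] 2.846154\n")) ;; => 2.846154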

Expanding to Multiline Output From R

Let’s modify the meantest.R file to give us something multiline.

myvec <- c(1,2,3,2,3,4,5,4,3,4,3,2,1)
mean(myvec)
summary(myvec)

Nothing spectacular I know but it has implications. Let’s run it through our Clojure command function.

rinclojure.example1> (def f "/Users/jasonbell/work/projects/rinclojure/resources/r/meantest.R")
rinclojure.example1> (run-command f)
"[1] 2.846154\n Min. 1st Qu. Median Mean 3rd Qu. Max. \n 1.000 2.000 3.000 2.846 4.000 5.000 \n"

Using clojure.string/split will break the output into a vector of lines.

rinclojure.example1> (clojure.string/split x #"\n")
["[1] 2.846154" " Min. 1st Qu. Median Mean 3rd Qu. Max. " " 1.000 2.000 3.000 2.846 4.000 5.000 "]

There’s still an amount of tidying up to do though. Assuming I’ve created x to hold the output from Rscript, first split the \n’s out.

rinclojure.example1> (def foo (clojure.string/split x #"\n"))
rinclojure.example1> foo
["[1] 2.846154" " Min. 1st Qu. Median Mean 3rd Qu. Max. " " 1.000 2.000 3.000 2.846 4.000 5.000 "]

If, for example, I wanted the summary values then I have to do some string manipulation to get them.

rinclojure.example1> (nth foo 2)
" 1.000 2.000 3.000 2.846 4.000 5.000 "

Split again by the space.

rinclojure.example1> (clojure.string/split (nth foo 2) #" +")
["" "1.000" "2.000" "3.000" "2.846" "4.000" "5.000"]

The final step is then to convert the values to numbers, ignoring the first as it’s blank. So I would end up with something like:

rinclojure.example1> (map (fn [v] (Double/valueOf v)) (rest (clojure.string/split (nth foo 2) #" +")))
(1.0 2.0 3.0 2.846 4.0 5.0)

We have no reference to what each number means: whether it’s the min, max, mean and so on. At this point there would be more string manipulation required, and you could convert the positions to keywords of your own, as in the sketch below.
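
One way to label them, reusing foo from above (the keyword names are my own, mirroring summary()’s column order):

rinclojure.example1> (zipmap [:min :q1 :median :mean :q3 :max]
                             (map (fn [v] (Double/valueOf v))
                                  (rest (clojure.string/split (nth foo 2) #" +"))))
{:min 1.0, :q1 2.0, :median 3.0, :mean 2.846, :q3 4.0, :max 5.0}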

More Complicated Ways.

Within the R libraries exists the rJava package. This lets you run Java from R and R from Java. I wrote a chapter on R in my book back in 2014.

It’s not the easiest thing to set up but it’s worth the investment. There is a Clojure project on GitHub that acts as a wrapper between R and Clojure, clj-jri. Once set up, you run R as a REngine and evaluate the output that way. There’s far more control but it comes at the cost of complexity.

Keeping Things Simple

Personally I think it’s easier to keep things as simple as possible. Use Rscript to run the R code but it’s worth considering the following points.

  • Keep your R scripts as simple as possible, outputting to one line where possible.
  • Ensure that all your R packages are installed and working; it’s not ideal to install them during the Clojure runtime as the output will become hard to parse. Also make sure that all the libraries are running on the same instance as your Clojure code.
  • In the long run, have a set of solid string manipulation functions to hand for dealing with the R output. Remember, it’s one big string (a minimal sketch of such a helper follows).
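
By way of example, here’s a sketch of the kind of helper I mean; parse-r-line is my own name and it only claims to handle space-separated numeric lines like the summary() output above:

(defn parse-r-line [line]
 ;; split on runs of spaces, drop the blanks, coerce the rest to doubles
 (->> (clojure.string/split line #" +")
      (remove clojure.string/blank?)
      (map #(Double/valueOf %))))

(parse-r-line " 1.000 2.000 3.000 2.846 4.000 5.000 ")
;; => (1.0 2.0 3.0 2.846 4.0 5.0)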


Time Critical Offers 101: Watch @garyvee #smartretail

A short post but an important one. It’s one of the most interesting plays I’ve seen to push a time critical offer, and it’s worth breaking down. So, in the great Gary Vaynerchuk tradition, let’s get micro on this a little bit.

Buy My Stuff, In Exchange I’ll Give You My Time

So, to push a two hour conference, here’s the deal: you buy two cases of wine, selected by Gary, for $479.99. There’s no “buy tickets to this event”, no GetInvited or EventBrite links to buy access (giving another supplier revenue). It’s simple: buy this and you get what Gary is offering, a place at the conference.

Time critical offers are a mix of components. Get them right and you can measure the success:

  • An item, could be an appointment, a session or a stock item. In this case it’s wine.
  • A payoff: money off, free gift or access to something scarce. Here it’s Gary’s time.
  • A time limit. Here it’s the day of the conference, October 14th. Assume, given the audience size (the image shows it viewed over 206,000 times), that the offer will sell out beforehand. Scarcity accelerates demand.
  • A clear outline of the overheads involved, more on this in a minute.

We now have the elements of a formula:

Item retail price * available = total potential incremental revenue

Not a lot to it really…..

$479.99 * 200 = $95,998

Not bad going. A call to action and incremental revenue. Perfect. At a guess there’s a clear 30% profit margin once you take off sales tax and salaries, as there’s no room hire or, I’m assuming, any fee for Gary’s two hours (and the rest). Overhead reduction means profit increase.


The scene is simple really: know your audience, know your stock and know your numbers. The time frame is critical; there are customers who want your product and don’t want to lose out.

Find them via the medium they consume (Snapchat, Instagram, Facebook, Twitter etc) and deliver the message. If you can personalise it then even better, though that takes effort.

In my opinion Gary executed it perfectly; the results, though, will be in the point of sale. That’s the measure.

Saving the Stylist Time. Dappad is ripe for machine learning: @dappad_official #dragonsden #toukertime

It’s not often I watch Dragon’s Den and get a little bit excited. Okay, I kind of knew that investment wouldn’t be on the table, but the opportunity is there. What concerned me was that it’s Erika’s gig: she is the stylist and the brand, and that brings its own problems as growth happens.

Time is the main metric

Throughput of orders and recommendations takes time. The three boxes a year is very similar to Tesco’s “four Christmases a year” concept for Clubcard vouchers.

If you reduce the time, you put more orders through. Doing it on your own is possible, but growth can only be taken to the point of the number of boxes you can put together in one day.

So if we can find a way to save time we can process more. There are two key aspects that will make that happen: customer preference data and product attribute data. If you can marry those two then you are on the way to improving the process. I don’t know how Erika is doing it right now; from the pitch it sounded like it was all a manual process. I could be wrong.

Machine Learning Can Help

The main focus here is to get machine learning to automate the selection process for Erika: some form of matchmaking algorithm, the who-gets-what selection that gives a list of preferred items to box.

The final say is with Erika, not the algorithm, and that’s the important part: the customer is still paying for a personal service so there needs to be human involvement. Machine learning aids the process but does not take over.
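
To make that concrete, here’s a minimal, entirely hypothetical sketch of the who-gets-what idea in Clojure: score each product by how many of its attribute tags overlap with a customer’s stated preferences, then hand the stylist a ranked shortlist. Every name and data shape here is my own invention, not anything Dappad actually runs.

(ns dappad.matcher
 (:require [clojure.set :as set]))

;; Hypothetical shapes: a customer holds a set of preferred attributes,
;; a product carries a set of attribute tags.
(defn match-score [customer product]
 (count (set/intersection (:preferences customer) (:attributes product))))

(defn shortlist [customer products n]
 ;; rank products by overlap with the customer's preferences, best first
 (take n (sort-by (partial match-score customer) > products)))

(shortlist {:preferences #{:navy :slim-fit :casual}}
           [{:id 1 :attributes #{:navy :slim-fit}}
            {:id 2 :attributes #{:red :formal}}]
           1) ;; => picks product 1, the better match

The stylist still makes the final call; the sketch only orders the pile.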

Measure Everything

Peter Jones’s main beef was over returns, which is a reasonable concern. We know what products are going out (from our theoretical system) and we know that some products are going to come back. This becomes a self-learning system: items that worked and items that didn’t are fed back in so the recommendations can improve.

Be certain of one thing: you will never have a perfect prediction, but you can feed as much data as possible back into the algorithm to ensure that your error rate starts to reduce. As certainty increases, you reduce the chance of returns. That starts to increase the value of the customer and therefore the bottom line.

The matter of held inventory was also an issue. Using automated recommendations there’s a process that could, over time, minimise the stock held by Dappad and allow ordering on a just-in-time basis: automate the recommendations across the user base, order the required quantities from the suppliers and then box appropriately.

Summing Up

There’s nothing here that I have presented that’s out of the ordinary, nor anything that would worry me as a customer. It’s just taking a look at the supply chain process and seeing what could be improved with a little automation and algorithmic learning.

The questions in my head right now:

  • If you introduced four boxes a year instead of three, what’s the impact on turnover?
  • Can you apply Zara’s supply chain learning to Dappad and get down to near zero stock?
  • Would the introduction of some form of artificial intelligence or machine learning reduce returns by 30%? If so, what’s the financial uplift?
  • Can you replicate it for different bands of customer: low spend, mid spend and luxury markets?

Ultimately all five Dragons passed on Dappad and for once in my life I actually think that Touker Suleyman missed a trick here….. no #toukertime this time.