This is the first line of the blog post, so I’ll say this up front: everything from this point on is hypothetical, a complete “what if” chain of events and solutions. But it’s fun to do (well, in my eyes anyway).
Shall we begin?
What actually happened?
The failure happened during an upgrade of CA-7, which is a workload automation system. It batches up 250,000 banking transactions and processes them, then takes the next lot, and everyone is happy. The original claim that the upgrade happened in India was corrected in the select committee, when it emerged it was done in Edinburgh. Egg. Face. Alignment.
The merging of three systems.
RBS acquired NatWest and then acquired Ulster Bank. That’s the order that transactions get bundled. No fun if it goes wrong and you’re an Ulster Bank customer... My knowledge from that point on is hazy, which is a shame as it looks like a nightmare and that’s what usually piques my interest.
So what we know:
- There are 20,000,000 (twenty million) transactions to pile through for Ulster Bank.
- They have to happen in date order*
- Dual currency adds to the mix (Euro and Sterling)
Jase’s very pragmatic and highly simplistic 30,000ft view.
Why do transactions have to happen in date order? Here’s my theory:
For every batch of 250,000 transactions you have four values: a Sterling total debit, a Sterling total credit, a Euro total debit and finally a Euro total credit. The running totals for each batch are calculated before the processing of the batch starts. By the end of the batch the totals “should” (there’s that word again) tally up.
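To make that theory a bit more concrete, here is a minimal Python sketch of the four control totals. Everything in it is my own assumption for illustration: the `Transaction` record, the `"GBP"`/`"EUR"` and `"debit"`/`"credit"` labels, and the `process_batch` stand-in are all hypothetical and have nothing to do with how CA-7 or the bank's systems actually work.

```python
from collections import namedtuple
from decimal import Decimal

# Hypothetical record layout: currency is "GBP" or "EUR",
# kind is "debit" or "credit", amount is a Decimal.
Transaction = namedtuple("Transaction", ["currency", "kind", "amount"])


def control_totals(batch):
    """Return the four totals for a batch, keyed by (currency, kind)."""
    totals = {
        ("GBP", "debit"): Decimal("0"),
        ("GBP", "credit"): Decimal("0"),
        ("EUR", "debit"): Decimal("0"),
        ("EUR", "credit"): Decimal("0"),
    }
    for tx in batch:
        totals[(tx.currency, tx.kind)] += tx.amount
    return totals


def process_batch(batch):
    """Stand-in for real processing: returns True if the totals tally."""
    # The expected totals are calculated before processing starts.
    expected = control_totals(batch)

    # Placeholder for the real posting step: tally what gets "posted".
    posted = {key: Decimal("0") for key in expected}
    for tx in batch:
        posted[(tx.currency, tx.kind)] += tx.amount

    # At the end of the batch the two sets of totals "should" match.
    return expected == posted
```

I've used `Decimal` rather than floats because rounding errors in money totals would make the tally check meaningless.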
If we can do it with one then we can do it with lots. Assuming that works you can then chunk up the input transactions and map them out and process them.
So 20,000,000 transactions in 250,000 blocks is 80 chunks. Assuming each chunk takes an hour to process (audit trails, backups and the like have to take place), wouldn’t it be nice for something like our big Hadoop clusters to take the strain and work in parallel?
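The arithmetic, and the general shape of "chunk it up and farm it out", can be sketched in a few lines of Python. This is only a toy under loud assumptions: the helper names are mine, a thread pool stands in for a real cluster like Hadoop, the one-hour-per-chunk figure is the guess from above, and it assumes the chunks really are independent of each other (which the date-order requirement may well rule out).

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_count(total, size):
    """Number of blocks needed to cover `total` items in blocks of `size`."""
    return math.ceil(total / size)


def wall_clock_hours(chunks, workers, hours_per_chunk=1):
    """Rough elapsed time if `workers` chunks can run at the same time."""
    return math.ceil(chunks / workers) * hours_per_chunk


def chunked(items, size):
    """Yield consecutive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_in_parallel(items, size, worker, workers=4):
    """Apply `worker` to each chunk; results come back in chunk order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, chunked(items, size)))


if __name__ == "__main__":
    chunks = chunk_count(20_000_000, 250_000)
    print(chunks)                        # 80
    print(wall_clock_hours(chunks, 1))   # 80 hours, one chunk at a time
    print(wall_clock_hours(chunks, 8))   # 10 hours with eight in parallel
```

Note that `pool.map` hands results back in the same order as the chunks went in, so the per-chunk results could still be checked in sequence even if the work itself ran side by side.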
Customers would like two days rather than four weeks. Even by IT standards, four weeks is a very long time to get something like this up and running again.
I’m not here to mock or antagonise the IT folk who are working to sort all this out. Nor am I going to point fingers at anyone. Yes, I feel sorry for any customer who’s been let down, and please don’t use this as a solution; it’s not, it’s just an idea circling in my head.
That said, I am wondering (a lot) whether there could have been a solution to take the data and process it quicker than was humanly possible. Now, I’ve never worked in banking, though it is something that interests me. I’m always keen to learn more.