My data – it’s just so big!

Regional ship on the Amazon River (Photo credit: Wikipedia)

Every day we produce 2.5 exabytes of data. How big is an exabyte? It’s a quintillion bytes. How many in a quintillion? 1 followed by 18 zeroes. Still struggling to visualise it? A quintillion pennies, if laid edge to edge, would cover the Earth two deep. If arranged in a cube, they would measure 5 miles a side.
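To put the daily figure into more familiar units, here is a rough back-of-the-envelope sketch in Python. It assumes the decimal (SI) definition of an exabyte, 10^18 bytes, and simply spreads the quoted 2.5 exabytes evenly over the seconds in a day; the variable names are my own.

```python
# Back-of-the-envelope arithmetic for the 2.5 exabytes/day figure quoted above.
EXABYTE = 10**18  # bytes, assuming the decimal (SI) definition

bytes_per_day = 25 * EXABYTE // 10  # 2.5 exabytes, kept as an exact integer
seconds_per_day = 24 * 60 * 60      # 86,400 seconds

# Spread the daily total evenly across the day (a simplifying assumption).
bytes_per_second = bytes_per_day / seconds_per_day
terabytes_per_second = bytes_per_second / 10**12

print(f"{bytes_per_day:,} bytes per day")
print(f"about {terabytes_per_second:.1f} TB per second")
```

On those assumptions it works out at roughly 29 terabytes every second.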

You can’t go to a conference these days without someone telling you that we produce more data today in the blink of a gnat’s eye than we did from the dawn of creation up to when something happened a long time ago. Every day, more people start using the Internet. More people join social networks. More social networks are created. More connected devices are manufactured. They all produce data.

Despite the soundbites, I doubt anyone really knows how much data we produce now, let alone how much we produced 10 or 20 years ago. Any figures you hear must be based on estimates on top of estimates, but no-one doubts we are producing a lot and over time the amount of data we produce increases at an ever-growing pace.

When it’s growing this fast and when we have this much, coping with it becomes a challenge. It’s a bit like trying to analyse the Amazon river. Every second, the Amazon spits out roughly 55 million gallons. Even the largest tank in the world would fill in less than a heartbeat. We have to use a different strategy. Either we sample the water every so often and extrapolate, or we find a way to get a message whenever something we’re interested in passes by.
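Both strategies can be sketched in a few lines of Python. This is only an illustration under made-up assumptions: the `readings` generator stands in for the river with invented temperature values, and the function names, sampling interval and alert threshold are all hypothetical. The point is that the stream is read once and never stored.

```python
import random
from typing import Callable, Iterable, Iterator


def readings(n: int, seed: int = 1) -> Iterator[float]:
    """Hypothetical stand-in for the river: n made-up temperature readings."""
    rng = random.Random(seed)
    for _ in range(n):
        yield rng.gauss(25.0, 2.0)


def sample_and_watch(
    stream: Iterable[float],
    every: int,
    is_interesting: Callable[[float], bool],
) -> tuple[float, list[tuple[int, float]]]:
    """Make one pass over the stream without storing it.

    Strategy 1: keep every `every`-th reading and average the samples
                as an estimate for the whole stream.
    Strategy 2: record a "message" whenever a reading is interesting.
    """
    total = 0.0
    kept = 0
    alerts: list[tuple[int, float]] = []
    for i, value in enumerate(stream):
        if i % every == 0:  # periodic sample
            total += value
            kept += 1
        if is_interesting(value):  # event of interest passes by
            alerts.append((i, value))
    estimate = total / kept if kept else float("nan")
    return estimate, alerts


estimate, alerts = sample_and_watch(
    readings(100_000), every=1_000, is_interesting=lambda v: v > 33.0
)
print(f"estimated mean from samples: {estimate:.2f}")
print(f"alerts raised: {len(alerts)}")
```

Sampling trades accuracy for a tiny memory footprint, while the alert path sees every reading but keeps only the ones that matter.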

It’s no surprise that the software industry recognises the need to process and analyse all this data and it’s even less of a surprise that buzzwords have come about to describe the process. Big data is big business and it’s easy to see why. By analysing weather patterns, large retail chains can make a good guess about what’s going to sell and stock their shops accordingly. By analysing web searches, astonishingly accurate predictions can be made about election results or the potential success of a film or a music artist.

However clever all this seems, I can’t help thinking that we are like toddlers discovering our first toy. The potential in all this data is enormous. Who knows, we may be able to predict earthquakes, volcanic eruptions, traffic jams, epidemics and murders by analysing everything from how many Big Macs are eaten in Bolton through to the water temperature in Tahiti.

Happy birthday to you

Happy Birthday to You! (Photo credit: Wikipedia)

If you saw a sign above the door of a shop announcing that the proprietor established the business in 1993, you would probably shrug your shoulders and say “so what?” After all, 20 years is not a long time for a shop in the scheme of things.

In technology terms though, two decades is an eternity. Although Apple and Microsoft can trace their roots back nearly 40 years, there are not many tech firms that can. Amazon, eBay, Google and Facebook were just a twinkle in someone’s eye 20 years ago.

This year, my employer celebrates their 20th birthday and after working for them for 13 years, I can’t help but feel a certain pride in the achievement. It hasn’t always been plain sailing. The world collectively held its breath after 9/11 which meant that sales of banking software (among other things) fell off a cliff. The latest banking crisis (followed by the sovereign debt crisis) also meant that banks were a bit preoccupied. Still, we have emerged from these crises and the future looks bright for Temenos.

When any tech company first sets out, they’re going to need some IT. Assuming they went for the state of the art, then their machines would have been powered by Pentiums – probably with a 60 MHz clock speed. Windows NT came out in 1993 so perhaps that would be the operating system of choice. If they waited until the end of the year, Windows 3.11 (or Windows for Workgroups) might be an option.

If they wanted to do some research on the internet, they would have found it fairly barren with only 50 World Wide Web servers. Just about every page would have a cute “Under Construction” graphic and their browser of choice would probably have been Mosaic (the granddaddy of Netscape Navigator).

If they wanted to stay in touch with each other whilst out on the road, they would need some mobile phones. They would be fairly chunky, have terrible battery life and be analogue in nature. The mobile operators were still building their networks so the chances of holding a complete conversation free of interference were fairly slim.

No-one had heard of Big Data. After all, we transmit more data round the internet in a single second than we did in the whole of 1993. If people talked about clouds, they were the white, fluffy sort that float around in the sky. The words “Service Oriented Architecture” had yet to be uttered by overpaid consultants.

Today – a startup company has unbelievable resources at their fingertips. The internet is chock full of useful information. Social media makes it easy to build a network and get your message out. Cloud means a startup can commission a sophisticated network of IT for no capital outlay. It has never been so easy to start a company. Unfortunately, your competition also have all these resources at their disposal.

Temenos had none of these resources at their disposal and yet they have grown from nothing to a half-a-billion-dollar company. They employ 4,000 people, of whom I am one. Happy birthday Temenos. Here’s to many more.

What will archeologists make of the here and now?

trowel (Photo credit: turtlemoon)

It has been a while since I visited the dump – sorry, municipal waste processing plant. The last time I was there, all the metal containers were for household waste. You parked up, selected the container closest to the car, and lugged everything you wanted to dispose of up the metal stairs and over the side. Today, things have changed. The first clue is the name. It has changed to the household waste recycling centre. The second clue is the sign outside which explains what goes where.

No longer is every container for household waste. There is but one such container and it has a guardian. Many climbed the steps to the household waste container, but few were deemed worthy by the guardian. They were dismissed with such mystical words as “no mate – electrical” or “sorry mate – that’s timber”.

Sometimes I thought he was being cruel by waiting until the poor soul had struggled up the steps with something particularly heavy or awkward before making his assessment. But no – he took his job very seriously and sometimes he needed a closer look before deciding whether to consider the item worthy of entry into his household waste container.

It was a bit bewildering at first, but the operatives were incredibly patient and when asked (for what must have been the hundredth time that day) where something went, they politely pointed in the right direction. I’m all in favour of anything that makes our meagre resources go further and the less stuff that ends up as landfill, the better.

If we get really good at this though, the future archeologists of the world are going to struggle. I once took a part-time archeology course, and archeologists live or die by what they manage to dig out of the ground. If we become so efficient that we recycle absolutely everything, there will be nothing to dig up. I guess graffiti, the modern equivalent of cave painting, might give them a few clues, but otherwise, they will be stumped.

It doesn’t help that so much of what we live and breathe is now digital either. Formats change so often that given enough time, any form of media will prove impossible to read. Books and paper degrade too, so it’s highly unlikely that an archeologist will be digging up a copy of Fifty Shades of Grey in a thousand years, although what they would make of a society that read such literature is hard to say.

The volume of data we produce and the rate at which it grows are unbelievable. We currently measure that data in exabytes, but how long until that becomes zettabytes or yottabytes is anyone’s guess. The ironic thing is that if we recycle everything and all media is left unreadable, a future archeologist might conclude that we were no more advanced than our stone age ancestors.

Information overload

Big Data: water wordscape (Photo credit: Marius B)

Information is the most valuable tool we have. Without reliable information, decisions can only be made on hunches. One of the main benefits of computers was that they were supposed to help us deal with information, but there are some unfortunate side effects. Partly because of our hunger for ever more power and partly because of progress in making bigger and better computers, they have steadily grown more powerful ever since they were invented, a phenomenon widely referred to as Moore’s law.

Not only that, but devices have become cheaper and cheaper. The Sinclair Spectrum was priced at £179 at launch 30 years ago, a sum probably equivalent to double that now. That was a machine with no screen, 48K of memory and no storage. Nowadays you could have a tablet, a phone or even a couple of netbooks, all with power that would completely dwarf the humble Spectrum. As computers have become more affordable, they have proliferated.

And not just consumer devices. It now makes perfect sense to use computational devices in almost every setting thanks to the cheapness and the ingenuity of the Chinese. IBM told me at Impact that there are 9.5Bn connected devices in the world today, growing to 20Bn in 2015. All of these devices are producing data at an alarming rate. Storage companies fall over themselves to tell you how fast this data is growing, but take it from me – it’s exponential.

The other side effect of computers is that the average human being now has the attention span of a hyperactive goldfish. Because the answer is available from Google in 0.2 nanoseconds, why would you want to read a report or a book? Twitter delivers a constant stream of news in 140-character chunks. This means that we are no longer happy to do lots of analysis to get our answers; we want to know instantly.

So how do we make sense of this muddle?

I have always been a fan of infographics as a mechanism for presenting complex information. Ingeniously, through pictures, infographics help the reader to understand the big picture. Spatial and relative relationships are easily picked out, but detail is not. Newspapers use infographics to great effect, especially when trying to explain concepts like exactly how much money we all owe thanks to the financial meltdown. This is a great site to see some really good examples: http://www.coolinfographics.com/

Big Data was a much-vaunted concept at IBM Impact too. If you want to process the huge amount of data available today, then traditional methods are just not going to work. Reading all the data into a relational database and then running a query will mean you need a huge building to house the storage, a power station to feed the computers and half a century to wait for the result. The only way to process this data is as a stream – a bit like the water coming out of a firehose. Then you need to use something called complex event processing to look for relationships between the data.
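To give a flavour of what processing a stream for relationships might look like, here is a minimal Python sketch. It is not how any particular IBM product works; the `Event` type, the `detect_bursts` function and the declined-payment example are all hypothetical. It assumes events arrive in timestamp order and looks for one simple pattern: the same kind of event happening several times for one key within a short window.

```python
from collections import defaultdict, deque
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    timestamp: float  # seconds since some arbitrary start
    key: str          # hypothetical identifier, e.g. a card or sensor id
    kind: str         # hypothetical event type, e.g. "declined"


def detect_bursts(
    events: Iterable[Event], kind: str, threshold: int, window: float
) -> Iterator[tuple[str, float]]:
    """Yield (key, timestamp) when `threshold` events of `kind` for the
    same key fall within `window` seconds. Assumes time-ordered input."""
    recent: dict[str, deque] = defaultdict(deque)
    for event in events:
        if event.kind != kind:
            continue
        times = recent[event.key]
        times.append(event.timestamp)
        # Forget anything that has slid out of the time window.
        while times and event.timestamp - times[0] > window:
            times.popleft()
        if len(times) >= threshold:
            yield event.key, event.timestamp
            times.clear()  # start counting afresh after raising the alert


stream = [
    Event(0, "card-a", "declined"),
    Event(10, "card-a", "declined"),
    Event(15, "card-b", "declined"),
    Event(20, "card-a", "declined"),
    Event(200, "card-a", "declined"),
]
for key, when in detect_bursts(stream, "declined", threshold=3, window=60):
    print(f"burst for {key} at t={when}")
```

Only the most recent timestamps per key are held in memory, so nothing has to be loaded into a database before the question can be answered.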

One of the most interesting concepts in dealing with the information overload is the semantic web. The idea is that computers need to make sense of all this data themselves and answer any questions we have. This is the technology behind the Siri voice response software in the latest iPhone. There is even a search engine. It’s not perfect by any means – try searching for “Who is the richest man in France” or my personal favourite “Who is the sexiest woman” 🙂 But what is interesting is how often it does get it right.

One day, the characters in Star Trek: The Next Generation talking to their computers will look positively antiquated, and I for one can’t wait.