DATA IS THE NEW OIL - CRUDE OIL

Posting date: 9/2/2013 2:16 PM

When Nasdaq stopped trading this week, it again showed how global firms are at the mercy of a power that created them

"Data is the new oil," declared Clive Humby, a Sheffield mathematician who with his wife, Edwina Dunn, made £90m helping Tesco with its Clubcard system. Though he said it in 2006, the realization that there is a lot of money to be made – and lost – through the careful or careless marshalling of "big data" has only begun to dawn on many business people.

The crash that knocked out the Nasdaq trading system was only one example; in the past week, Amazon, Google and Apple have all suffered breaks in service that have affected their customers, lost sales or caused inconvenience. When Amazon's main shopping site went offline for nearly an hour, estimates suggested millions of dollars of sales were lost. When Google went offline for just four minutes this month, the missed chance to show adverts to searchers could have cost it $500,000.

Michael Palmer, of the Association of National Advertisers, expanded on Humby's quote: "Data is just like crude. It's valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value."

For Amazon and Google especially, being able to process and store huge amounts of data is essential to their success. But when it goes wrong – as it inevitably does – the effects can be dramatic. And the biggest problem can be data which is "dirty", containing erroneous or garbled entries which can corrupt files and throw systems into a tailspin. That can cause the sort of "software glitch" that brought down the Nasdaq – or lead to servers locking up and a domino effect of overloading.
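As a simple illustration of the kind of defensive check that keeps "dirty" records out of a pipeline before they can corrupt anything downstream, here is a minimal sketch; the field names and sanity rules are invented for the example, not taken from any real trading system:

```python
# Minimal sketch: screening incoming records before they reach downstream
# systems. Field names and validation rules are invented for illustration.

def is_clean(record):
    """Reject entries that are missing fields, garbled, or out of range."""
    try:
        price = float(record["price"])
    except (KeyError, TypeError, ValueError):
        return False  # missing or unparseable price
    if not record.get("symbol", "").isalpha():
        return False  # ticker symbols should be purely alphabetic
    return 0 < price < 1_000_000  # crude sanity bound on the value

records = [
    {"symbol": "AAPL", "price": "498.50"},
    {"symbol": "??#1", "price": "498.50"},   # garbled symbol
    {"symbol": "GOOG", "price": "n/a"},      # unparseable price
]
clean = [r for r in records if is_clean(r)]
print(f"kept {len(clean)} of {len(records)} records")  # kept 1 of 3
```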

"Whenever I meet people I ask them about the quality of their data," says Duncan Ross, director of data sciences at Teradata, which provides data warehousing systems for clients including Walmart, Tesco and Apple. "When they tell me that the quality is really good, I assume that they haven't actually looked at it."

That's because the systems businesses use increasingly rely on external data, whether from governments or private companies, which cannot be assumed to be reliable. Ross says: "It's always dirty."

And that puts businesses at the mercy of the occasional high-pressure data spill. Inject the wrong piece of data and trouble follows. In April, when automatic systems read a tweet from the Associated Press Twitter feed which said the White House had been bombed and Barack Obama injured, they sold stock faster than the blink of an eye, sending the US Dow index down 143 points within seconds. But the data was dirty: AP's Twitter feed had been hacked.

The statistics are stunning: about 90% of all the data in the world has been generated in the past two years (a statistic that has held roughly true as time passes). There are about 2.7 zettabytes of data in the digital universe, where 1ZB is a billion terabytes (a typical computer hard drive these days can hold about 0.5TB, or 500 gigabytes). IBM predicts that will hit 8ZB by 2015. Facebook alone stores and analyzes more than 50 petabytes (50,000TB) of data.
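To put those units on one scale, a quick back-of-the-envelope conversion using the article's own figures (2.7ZB in total, 500GB per consumer drive):

```python
# Back-of-the-envelope: how many 500GB drives would hold 2.7 zettabytes?
ZB_IN_TB = 1_000_000_000          # 1 zettabyte = a billion terabytes
digital_universe_tb = 2.7 * ZB_IN_TB
drive_tb = 0.5                    # typical 2013 hard drive, per the article
drives = digital_universe_tb / drive_tb
print(f"{drives:.1e} drives")     # 5.4e+09 -- roughly 5.4 billion drives
```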

Data is also moving faster than ever before: by last year, between 50% and 70% of all trades on US stock exchanges were being done by machines that could execute a transaction in less than a microsecond (a millionth of a second). Trading runs over fibre-optic connections, where financial companies will pay to shave even five milliseconds off a link so that those microsecond-scale transactions can be done more quickly.
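The mismatch in scales is easier to see with numbers. A rough illustration using the article's figures shows how many sub-microsecond trades fit inside a single five-millisecond saving:

```python
# How many microsecond-scale trades fit inside a 5ms network saving?
trade_time_s = 1e-6    # one microsecond per transaction (article's figure)
saving_s = 5e-3        # five milliseconds shaved off the connection
print(round(saving_s / trade_time_s))   # 5000 -- thousands of extra trades
```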

We're also storing and processing more and more of it. But that doesn't mean we're just hoarding data, says Ross: "The pace of change of markets generally is so rapid that it doesn't make sense to retain information for more than a few years.

"If you think about something like handsets or phone calls, go back three or four years and the latest thing was the iPhone 3GS and BlackBerrys were really popular. It's useless for analysis. The only area where you store data for any length of time is regulatory work."

Yet the amount of short-term data being processed is rocketing. Twitter recently rewrote its entire back-end database system because it would not otherwise be able to cope with the 500m tweets, each about as long as a text message, arriving each day. (By comparison, the four UK mobile networks together handle about 250m text messages a day, a figure that is falling as people shift to services such as Twitter.)
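Averaged out, that daily total is a steady firehose. A quick calculation from the article's figures gives the baseline rate; real traffic is far spikier, as the World Cup and film-broadcast examples below show:

```python
# Average tweet rate implied by 500m tweets per day
tweets_per_day = 500_000_000
seconds_per_day = 24 * 60 * 60
print(f"{tweets_per_day / seconds_per_day:,.0f} tweets/second on average")
# ~5,787/second -- the record peak mentioned below is roughly 25x that
```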

Raffi Krikorian, Twitter's vice-president for "platform engineering" – that is, in charge of keeping the ship running, and the whale away – admits that the 2010 World Cup was a dramatic lesson, when goals, penalties and free kicks being watched by a global audience made the system creak and quail.

A wholesale rewrite of its back-end systems over the past three years means it can now "withstand" events such as the television broadcast in Japan of the animated film Castle in the Sky, which set a record by generating 143,199 tweets a second on 2 August at 3.21pm BST. "The number of machines involved in serving the site has been decreased anywhere from five to 12 times," he notes proudly. Even better, Twitter has been available for about 99.9999% of the past six months, even with that Japanese peak.
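Six nines of availability leaves remarkably little room for failure. A rough calculation from the figure Twitter quotes shows the implied downtime budget over six months:

```python
# Downtime budget implied by 99.9999% availability over six months
availability = 0.999999
six_months_s = 182.5 * 24 * 60 * 60     # ~15.8 million seconds
downtime_s = six_months_s * (1 - availability)
print(f"{downtime_s:.0f} seconds of downtime")   # about 16 seconds
```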

Yet even while Twitter moved quickly, the concern is that other parts of the information structure will not be resilient enough to deal with inevitable collapses – and that could have unpredictable effects.

"We've had mains power for more than a century, but can have an outage caused by somebody not resetting a switch," says Ross. "The only security companies can have is if they build plenty of redundancy into the systems that affect our lives."

