DATA IS THE NEW OIL - CRUDE OIL

Krishen Patel, our consultant managing the role
Posting date: 9/2/2013 2:16 PM

When Nasdaq stopped trading this week, it again showed how global firms are at the mercy of the power that created them

"Data is the new oil," declared Clive Humby, a Sheffield mathematician who with his wife, Edwina Dunn, made £90m helping Tesco with its Clubcard system. Though he said it in 2006, the realization that there is a lot of money to be made – and lost – through the careful or careless marshalling of "big data" has only begun to dawn on many business people.

The crash that knocked out the Nasdaq trading system was only one example; in the past week, Amazon, Google and Apple have all suffered breaks in service that have affected their customers, lost sales or caused inconvenience. When Amazon's main shopping site went offline for nearly an hour, estimates suggested millions of dollars of sales were lost. When Google went offline for just four minutes this month, the missed chance to show adverts to searchers could have cost it $500,000.

Michael Palmer, of the Association of National Advertisers, expanded on Humby's quote: "Data is just like crude. It's valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc to create a valuable entity that drives profitable activity; so must data be broken down, analyzed for it to have value."

For Amazon and Google especially, being able to process and store huge amounts of data is essential to their success. But when it goes wrong – as it inevitably does – the effects can be dramatic. And the biggest problem can be data which is "dirty", containing erroneous or garbled entries which can corrupt files and throw systems into a tailspin. That can cause the sort of "software glitch" that brought down the Nasdaq – or lead to servers locking up and a domino effect of overloading.
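To make the idea of "dirty" data concrete, here is a minimal, purely illustrative Python sketch of the kind of validation that catches erroneous or garbled entries before they reach downstream systems. The field names and rules are assumptions made up for this example, not anything described in the article.

```python
from datetime import datetime

def validate_record(record: dict) -> list:
    """Return a list of problems found in a single incoming record."""
    problems = []

    # Garbled or missing identifiers are a classic source of corrupt files.
    symbol = record.get("symbol", "")
    if not symbol or not symbol.isalpha():
        problems.append("bad symbol")

    # Erroneous numeric entries: negative or absurdly large prices.
    price = record.get("price")
    if not isinstance(price, (int, float)) or price <= 0 or price > 1e6:
        problems.append("implausible price")

    # Timestamps that fail to parse can throw time-ordered systems into a tailspin.
    try:
        datetime.fromisoformat(record.get("timestamp", ""))
    except ValueError:
        problems.append("unparseable timestamp")

    return problems

clean = {"symbol": "ACME", "price": 42.5, "timestamp": "2013-08-22T13:01:00"}
dirty = {"symbol": "AC;;ME", "price": -1, "timestamp": "22/08/13"}
print(validate_record(clean))  # []
print(validate_record(dirty))  # ['bad symbol', 'implausible price', 'unparseable timestamp']
```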

"Whenever I meet people I ask them about the quality of their data," says Duncan Ross, director of data sciences at Teradata, which provides data warehousing systems for clients including Walmart, Tesco and Apple. "When they tell me that the quality is really good, I assume that they haven't actually looked at it."

That's because the systems businesses use increasingly rely on external data, whether from governments or private companies, which cannot be assumed to be reliable. Ross says: "It's always dirty."

And that puts businesses at the mercy of the occasional high-pressure data spill. Inject the wrong piece of data and trouble follows. In April, when automatic systems read a tweet from the Associated Press Twitter feed which said the White House had been bombed and Barack Obama injured, they sold stock faster than the blink of an eye, sending the US Dow index down 143 points within seconds. But the data was dirty: AP's Twitter feed had been hacked.

The statistics are stunning: about 90% of all the data in the world has been generated in the past two years (a statistic that has held roughly true even as time passes). There are about 2.7 zettabytes of data in the digital universe, where 1ZB of data is a billion terabytes (a typical computer hard drive these days can hold about 0.5TB, or 500 gigabytes). IBM predicts that will hit 8ZB by 2015. Facebook alone stores and analyzes more than 50 petabytes (50,000 TB) of data.
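The scale is easier to grasp with some back-of-the-envelope arithmetic. This short Python sketch simply reworks the figures quoted above (2.7ZB, 0.5TB drives, Facebook's 50PB); nothing in it comes from additional sources.

```python
ZB_IN_TB = 1_000_000_000     # 1 zettabyte = a billion terabytes
PB_IN_TB = 1_000             # 1 petabyte  = a thousand terabytes
DRIVE_TB = 0.5               # typical consumer hard drive, ~500GB

digital_universe_tb = 2.7 * ZB_IN_TB
print(f"{digital_universe_tb / DRIVE_TB:.1e} drives to hold the digital universe")
# ~5.4e+09, i.e. roughly 5.4 billion half-terabyte drives

facebook_tb = 50 * PB_IN_TB
print(f"{facebook_tb / DRIVE_TB:,.0f} drives just for Facebook's 50PB")
# 100,000 drives
```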

Data is also moving faster than ever before: by last year, between 50% and 70% of all trades on US stock exchanges were being done by machines that could execute a transaction in less than a microsecond (a millionth of a second). Internet connectivity runs over fibre-optic links, and financial companies will seek to shave even five milliseconds from a connection so that those microsecond-scale transactions can be done more quickly.

We're also storing and processing more and more of it. But that doesn't mean we're just hoarding data, says Ross: "The pace of change of markets generally is so rapid that it doesn't make sense to retain information for more than a few years.

"If you think about something like handsets or phone calls, go back three or four years and the latest thing was the iPhone 3GS and BlackBerrys were really popular. It's useless for analysis. The only area where you store data for any length of time is regulatory work."

Yet the amount of short-term data being processed is rocketing. Twitter recently rewrote its entire back-end database system because it would not otherwise be able to cope with the 500m tweets, each as long as a text message, arriving each day. (By comparison, the four UK mobile networks together handle about 250m text messages a day, a figure that is falling as people shift to services such as Twitter.)

Raffi Krikorian, Twitter's vice-president for "platform engineering" – that is, in charge of keeping the ship running, and the whale away – admits that the 2010 World Cup was a dramatic lesson, when goals, penalties and free kicks being watched by a global audience made the system creak and quail.

A wholesale rewrite of its back-end systems over the past three years means it can now "withstand" events such as a television screening in Japan of the film Castle in the Sky, which set a record by generating 143,199 tweets a second on 2 August at 3.21pm BST. "The number of machines involved in serving the site has been decreased anywhere from five to 12 times," he notes proudly. Even better, Twitter has been available for about 99.9999% of the past six months, even with that Japanese peak.
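Those Twitter numbers reward a quick sanity check. The Python sketch below only recomputes, from the figures quoted in this piece, how the record peak compares with the average rate and what 99.9999% availability implies; it is illustrative arithmetic, not data from Twitter.

```python
SECONDS_PER_DAY = 24 * 60 * 60

average_rate = 500_000_000 / SECONDS_PER_DAY   # 500m tweets a day
peak_rate = 143_199                            # record tweets per second
print(f"average ~{average_rate:,.0f} tweets/s vs peak {peak_rate:,} tweets/s")
print(f"the peak is roughly {peak_rate / average_rate:.0f}x the average")   # ~25x

# 99.9999% availability over six months leaves very little room for outages.
six_months_seconds = 182.5 * SECONDS_PER_DAY
downtime = six_months_seconds * (1 - 0.999999)
print(f"~{downtime:.0f} seconds of downtime in six months")                 # ~16 seconds
```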

Yet even while Twitter moved quickly, the concern is that other parts of the information structure will not be resilient enough to deal with inevitable collapses – and that could have unpredictable effects.

"We've had mains power for more than a century, but can have an outage caused by somebody not resetting a switch," says Ross. "The only security companies can have is if they build plenty of redundancy into the systems that affect our lives."



Related blog & news

With over 10 years' experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.

Visit our Blogs & News portal or check out the related posts below.

Weekly News Digest: 22nd - 26th Feb 2021

This is Harnham's weekly news digest, the place to come for a quick breakdown of the week's top news stories from the world of Data & Analytics.

Search Engine Journal: 4 ways call tracking is changing (and why it's a good thing)

Call tracking is no longer about a customer seeing an ad, calling up the company, telling them how much they loved the ad and then deciding to purchase goods. This is a positive thing, really, because it wasn't the most effective way for businesses to track how well adverts were doing anyway - who really remembers where they saw a billboard that took their interest, or what time of day an advert popped up on the TV? As call tracking technology becomes more advanced, call analytics have become much more accessible for all. Not only have they transformed how businesses of all shapes and sizes advertise and track their success, but also how they market to potential audiences and track their sentiment.

This article from Search Engine Journal looks at the evolution of call tracking and call analytics from its most basic form, how it works now, and what the future of this crucial set of analytics will look like.

Read more on this here.

Towards Data Science: Data Science Year Zero

Skills and qualifications in Data Science are becoming incredibly sought after by many employers, but how to break into the sector is still a little unclear for potential candidates. In this article, Towards Data Science breaks down the crucial elements of entering the industry successfully in four easy steps:

- What the author, Bala Vishal, lacked when he started and how you can set off on a better footing.
- The most important skills and tools to have under your belt.
- Which skills you should home in on first.
- How to thrive in the workplace.

This incredibly insightful piece should be a 'must-read' for any budding Data Scientist looking to break into Data in 2021 and beyond.

Read more here.

KD Nuggets: 10 Statistical Concepts You Should Know for Data Science Interviews

This article is perfect for anyone in the Data Science industry. Whether you're new to the game or looking to take the next step on the career ladder, make sure you brush up on these crucial statistical concepts so you know them inside out before entering an interview. A few, in no particular order (a short worked sketch of the Bayes example appears after this digest):

- Z tests vs T tests: an invaluable piece of knowledge that will be used daily if you are involved in any statistical work.
- Sampling techniques: make sure you've got the main five solidified in your knowledge bank - Simple Random, Systematic, Convenience, Cluster, and Stratified sampling.
- Bayes' Theorem/Conditional Probability: underpins one of the most popular machine learning algorithms, and a must-know in this new era of technology.

Want to know about the other seven? Read more here.

Forbes: 48 per cent of Sales Leaders Say Their CRM System Doesn't Meet Their Needs. The Good News Is That This Is Fixable.

This article by Gene Marks explores why teams aren't happy with their current CRM systems, and how this can be remedied. New research from SugarCRM found:

- 52 per cent of sales leaders reported that their CRM platform is costing them potential revenue opportunities.
- 50 per cent of the companies said they cannot access customer data across marketing, sales and service systems.
- Nearly one-third complained that their customer data is incomplete, out of date, or inaccurate.

Damning statistics, but Marks then goes into how this worrying situation can be fixed for good.
He says: "Like just about all problems in business, this problem comes down to two factors: time and money. The blunt fact is that most companies are not willing to spend the necessary time or money needed to enable their CRM systems to truly do what they're designed to do. CRM systems are not just for sales teams. And they're not just for service teams. For a CRM system to be effective, a company must adapt it as its main, collaborative platform."

Read more on this here.

We've loved seeing all the news from Data and Analytics in the past week; it's a market full of exciting and dynamic opportunities. To learn more about our work in this space, get in touch with us at info@harnham.com.
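Since conditional probability comes up in the KD Nuggets piece above, here is the promised worked sketch of Bayes' theorem in Python. The disease-and-test numbers are invented purely for illustration and do not come from the article.

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease = 0.01             # prior: 1% of people have the condition
p_pos_given_disease = 0.95   # test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate

# Total probability of a positive test (law of total probability).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of having the condition given a positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.1%}")  # ~16.1%
```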

How Are Data & Analytics Professionals Mapping COVID Trends With Data?

The coronavirus pandemic has impacted industries across the globe. There's no ignoring that simple fact. This disruption most notably caused devastating effects in two strands: to our health and to business operations. As the virus spread, the health and wellbeing of people in society worsened, and businesses felt the strain of projects being placed on hold and work slowing or grinding to a halt entirely. As of 24th February 2021, the disease had infected more than 112,237,188 people, with 2,487,349 reported deaths.

For Data & Analytics professionals, it soon became evident that they could use their skills to help. Using the mass of data available, professionals and researchers turned to big data analytics tools to track and monitor the virus's spread, along with a variety of trends (a minimal sketch of this kind of trend smoothing appears at the end of this post). Here's how:

Genomics and sequencing

Life science is a significant application within Data & Analytics and explores the study of all living things on earth. One particular section of this study looks at the concept of genomic sequencing. Genomic sequencing is significant because it allows us to look at the entire genetic code of a virus - in this case, COVID-19. Most importantly, the technique means that researchers and analysts can identify dangerous mutations and track the movements of specific variants.

The UK has the most advanced system for tracing COVID variants. Last year, Britain launched one of the world's largest coronavirus sequencing projects by investing £20 million in the COVID-19 Genomics UK consortium. In a group that included NHS researchers, public health agencies, academic partners and the Wellcome Sanger Institute, they set out to map the genetic code of as many strains of the coronavirus as possible. And the buy-in paid off: it took the US approximately 72 days to process and share each genetic sequence, compared with 23 days for UK researchers, according to figures compiled by the Broad Institute with data from Gisaid.

Tech giants stepping in

Organisations also proved more agile than they might have thought. Regardless of the size of a business, or the industry in which it operates, the sector's response in applying analysis and data to track the coronavirus was nothing short of miraculous. Google introduced a series of features such as popular times and live busyness, COVID-19 alerts in transit, and COVID checkpoints in driving navigation in order to keep their one billion (and growing) app users safe. They also introduced the COVID layer in Maps, a tool that shows critical information about COVID-19 cases in a given area, allowing their customers to make informed decisions about where to go and what to do. Apple also released a mobility data trends tool from Apple Maps. This data was shared in order to provide insights to local governments and health authorities so that they could support mapping specific COVID trends.

These first-hand examples indicate the influence and power of using data to better our understanding of the virus. Before the coronavirus pandemic, professionals, businesses and industries alike worked in silos. What we have witnessed since has been very much the opposite, as experts quickly came together to begin mapping out data requirements and supporting the world's focus on improving the public's health and getting businesses back on their feet. Without Data & Analytics, none of this would be possible. If you're looking to take the next step in your career or build out a diverse Data & Analytics team, we may be able to help.
Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more. 
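As promised above, here is a minimal Python sketch of the kind of trend smoothing behind most COVID dashboards: a seven-day rolling average of daily new cases. The case numbers are invented for the example and are not real data.

```python
# Illustrative only: smoothing noisy daily counts into a trend line.
daily_new_cases = [412, 530, 498, 601, 577, 349, 288,
                   455, 612, 590, 634, 602, 371, 300]

def rolling_average(values, window=7):
    """Average each day with the previous window - 1 days."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

trend = rolling_average(daily_new_cases)
print([round(x) for x in trend])
# A rising smoothed series signals accelerating spread even when
# individual days (weekends, reporting lags) dip.
```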

Related Jobs

Salary: £40,000 - £70,000 per annum + bonus + benefits
Location: West Midlands
Description: An opportunity to help shape a brand new data engineering function for a well-known national name.

Salary: US$250,000 - US$275,000 per year
Location: Boston, Massachusetts
Description: Are you a disruptor in the Supply Chain & Inventory Optimization space looking for a new role?

Salary: US$230,000 - US$250,000 per annum
Location: Boston, Massachusetts
Description: Are you a Supply Chain Analytics Leader with strong Advanced Analytics and/or Data Science fundamentals?
