Big data demystified

William Wrigley our consultant managing the role
Posting date: 11/17/2014 12:00 AM

The term “Big Data” as applied to IT came into widespread use around 2011, and various people have claimed to be the first to coin it. It has become a buzzword that is sometimes misunderstood and often abused. Here we will try to demystify it so that we can understand what it is and how to realize its real value.

What is Big Data?

Many alternative definitions of Big Data have been published. One of the most insightful was proposed by Gartner and has become widely accepted. It defines Big Data as “High volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

These three V’s of Big Data (volume, velocity and variety) have more recently been augmented by a fourth V: veracity.

Volume

It is now almost a cliché to say that 90% of the world’s existing data has been generated in the last two years. Around 2.4 billion gigabytes (2.4 exabytes) of data are generated globally each day. Much of it comes from the internet and from information-generating digital devices such as smartphones, digital cameras and CCTV, and the total is growing exponentially. It is estimated that by 2020 worldwide corporate data will exceed 35,000 exabytes, where an exabyte is one quintillion bytes.
Despite its size, Big Data is not all about volume. In principle, any volume of data could be processed using conventional database software if volume were the only issue.
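The volume figures above move between several decimal (SI) storage units, each 1,000 times larger than the one before. The short Python sketch below just makes that arithmetic explicit; the unit table is standard SI, while the `convert` helper is a hypothetical name used for illustration.

```python
# Decimal (SI) storage units: each prefix is 1,000 times the previous one.
UNITS = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,  # one exabyte = one quintillion bytes
    "ZB": 10**21,  # one zettabyte = 1,000 exabytes
}


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert a quantity of data between decimal storage units."""
    return amount * UNITS[from_unit] / UNITS[to_unit]


if __name__ == "__main__":
    # The 2020 forecast quoted above, expressed in zettabytes.
    print(convert(35_000, "EB", "ZB"))
```

Run as a script, it shows that the 35,000-exabyte forecast is the same as 35 zettabytes.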

Velocity

Data velocity, the rate at which information flows, is also increasing at a similar rate to volume. The increase in data velocity keeps pace with improving technology, which has been developing broadly in accordance with Moore’s Law.
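Velocity is usually expressed as a rate, such as events per second. As an illustrative sketch that is not tied to any particular streaming product, the Python class below keeps the timestamps of recent events in a deque and reports the arrival rate over a sliding time window, discarding events as they age out. The class name and the `window` parameter are hypothetical.

```python
from collections import deque


class SlidingWindowRate:
    """Track how many events per second arrived in the last `window` seconds."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.timestamps: deque[float] = deque()  # event times, oldest first

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps are assumed to arrive in order."""
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        """Events per second within the window (now - window, now]."""
        self._evict(now)
        return len(self.timestamps) / self.window

    def _evict(self, now: float) -> None:
        # Discard events at or before the start of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()
```

For example, recording events at 0.0, 0.5, 1.0, 1.5 and 2.0 seconds with a one-second window gives a rate of 2.0 events per second at t = 2.0, because only the last two events fall inside the window. Real streaming systems apply the same idea at far higher event rates and across many machines.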

Variety

The pivotal V that makes data Big Data is variety. While standard database software handles structured data, Big Data is often unstructured and cannot be processed with the same toolsets. In fact, it can be a combination of various data categories, including structured, semi-structured and unstructured. Typically it might consist of XML, database tables, audio and video files, text messages, tweets, and so forth.
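To make the structured, semi-structured and unstructured distinction concrete, here is a minimal Python sketch that maps three kinds of input (a CSV row, an XML fragment and a free-text message) onto one common record shape. The field names (`source`, `user`, `text`) and the tiny parsers are hypothetical and chosen only for illustration; real pipelines need far more robust parsing.

```python
import csv
import io
import xml.etree.ElementTree as ET


def from_csv(line: str) -> dict:
    """Structured: a CSV row with a known column order (user, text)."""
    user, text = next(csv.reader(io.StringIO(line)))
    return {"source": "csv", "user": user, "text": text}


def from_xml(fragment: str) -> dict:
    """Semi-structured: tagged fields, any of which may be missing."""
    root = ET.fromstring(fragment)
    return {
        "source": "xml",
        "user": root.findtext("user", default=None),
        "text": root.findtext("body", default=""),
    }


def from_free_text(message: str) -> dict:
    """Unstructured: no schema at all, so only the raw text is kept."""
    return {"source": "text", "user": None, "text": message.strip()}
```

Each parser copes with its own format, but all three produce records that downstream analytics can treat uniformly, which is the essence of handling variety.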

Veracity

Veracity is an obvious addition. Unless the data is relevant, accurate and can be trusted, it is of little or no value. Ensuring the veracity of Big Data can be a challenge as it is difficult to control its quality. Any organization using Big Data must have the means of deciding whether it is beneficial and the extent to which it can be trusted.
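Deciding how far data can be trusted usually starts with simple, automated quality checks. The Python sketch below shows one possible approach rather than any standard: it scores each record by the fraction of hypothetical rules it passes (an identifier is present, the age is plausible, the email address is well formed) so that low-scoring records can be flagged or excluded. The rules and the default threshold are assumptions for illustration.

```python
import re

# Hypothetical validation rules: each maps a record to True (pass) or False (fail).
EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

RULES = {
    "has_id": lambda r: bool(r.get("id")),
    "age_plausible": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
    "email_valid": lambda r: isinstance(r.get("email"), str) and bool(EMAIL.match(r["email"])),
}


def trust_score(record: dict) -> float:
    """Fraction of rules the record passes (1.0 means it passes every check)."""
    passed = sum(1 for rule in RULES.values() if rule(record))
    return passed / len(RULES)


def filter_trusted(records: list[dict], threshold: float = 1.0) -> list[dict]:
    """Keep only records whose trust score meets the threshold."""
    return [r for r in records if trust_score(r) >= threshold]
```

In practice the rules would come from the organization's own knowledge of its data, and the threshold would reflect how much risk a given analysis can tolerate.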

Dealing with Big Data

One of the more popular ways of dealing with Big Data is Apache Hadoop. Named after a toy elephant, it is an open-source project designed to enable large-scale storage and processing of big data sets across clusters of servers.
It is highly scalable, readily growing from a single server to many thousands of servers, either on premises or in the cloud. Originally developed at Yahoo, building on designs that Google had published, it is used by Yahoo, Facebook, Twitter, LinkedIn, and many more.

In addition to its scalability, it provides an inexpensive approach to massively parallel computing. It is flexible, handling structured or unstructured data from many sources, which can then be joined and aggregated, and it is fault tolerant.
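Hadoop's original processing engine, MapReduce, splits a job into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that aggregates each group. The classic illustration is counting words. The sketch below simulates the three phases in plain, single-process Python; it does not use Hadoop's actual Java API, and on a real cluster each phase would run in parallel across many machines, with the framework re-running any tasks that fail.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(document: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: aggregate the values for one key (here, a simple sum)."""
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # On a cluster, each document (input split) could be mapped on a different node.
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Calling `word_count(["the cat sat", "the dog sat down"])` returns `{"the": 2, "cat": 1, "sat": 2, "dog": 1, "down": 1}`. Because each map and reduce call depends only on its own input, the work can be spread across as many servers as are available, which is where the scalability comes from.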

Another popular approach is NoSQL. Rather than a single product, NoSQL is a family of non-relational databases, many of them open source, that enable the storage and processing of large quantities of structured and unstructured data.
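Many NoSQL systems, document stores in particular, drop the fixed table schema of a relational database, so records with different fields can sit side by side. The toy in-memory Python class below illustrates only that document model; it is hypothetical, is not the API of any real NoSQL product, and omits the distribution, persistence and indexing that real systems provide.

```python
from typing import Any, Optional


class DocumentStore:
    """Toy in-memory document collection with no fixed schema (illustrative only)."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def put(self, doc_id: str, document: dict[str, Any]) -> None:
        """Insert or replace a document; its fields are not checked against any schema."""
        self._docs[doc_id] = dict(document)

    def get(self, doc_id: str) -> Optional[dict[str, Any]]:
        """Return the stored document, or None if the id is unknown."""
        return self._docs.get(doc_id)

    def find(self, **criteria: Any) -> list[str]:
        """Return ids of documents that have every given field with the given value."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(field in doc and doc[field] == value for field, value in criteria.items())
        ]
```

For example, a customer record with a `city` field and another holding a list of `tweets` can be stored in the same collection; a query on `city` simply skips documents that lack that field, where a relational table would instead need a column (often left empty) for every attribute.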

Big Data in Practice

Big Data and analytics have proven to be a success for many organizations. Their applications have included:

  • Understanding and targeting customers – one of the main applications of Big Data. Examples in practice include Amazon’s product recommendations and Tesco’s personalized money-off coupons
  • Business processes – applications include optimizing supply route logistics, stock control based on social media trends, and HR processes including recruitment
  • Healthcare – for instance, studying the side effects of drugs, correlations between lifestyle and health, the human genome, and the spread of infections
  • Big science – for instance, the LHC at CERN generates one petabyte of data a second. Although most of it is discarded, CERN scientists store and process 30 petabytes a year using 65,000 servers.
  • Security and law enforcement – in the US the NSA uses Big Data in its war on terrorism, as does GCHQ in the UK.
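Recommendations like the Amazon example in the list above are often built on co-occurrence: products frequently bought together with an item are suggested alongside it. The Python sketch below is a deliberately simplified illustration of that idea, using made-up shopping baskets; it is not how Amazon or any other retailer actually implements recommendations, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(baskets: list[set[str]]) -> dict[str, Counter]:
    """Count, for each item, how often every other item appears in the same basket."""
    counts: dict[str, Counter] = {}
    for basket in baskets:
        for a, b in combinations(sorted(basket), 2):
            counts.setdefault(a, Counter())[b] += 1
            counts.setdefault(b, Counter())[a] += 1
    return counts


def recommend(item: str, counts: dict[str, Counter], k: int = 2) -> list[str]:
    """Suggest the k items most often bought together with `item`."""
    return [other for other, _ in counts.get(item, Counter()).most_common(k)]
```

With baskets such as {tent, sleeping bag, torch}, {tent, sleeping bag} and {tent, stove}, the top suggestion for a tent is the sleeping bag, since it appears alongside the tent most often. At Big Data scale the same counting runs over millions of orders, which is why it is usually computed in parallel.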

Is Big Data Over-hyped?

While the value of Big Data is clear, it isn’t a panacea. It has certainly failed to live up to many of its early expectations and, according to some commentators, it has passed the “peak of inflated expectations” and descended into the “trough of disillusionment”, to use the terms of Gartner’s hype cycle.

The backlash set in following the failure of Google Flu Trends, which claimed to identify flu outbreaks from search queries. It got it spectacularly wrong, missing the 2009 pandemic and then significantly overestimating cases in later seasons up to 2013.

Big Data has several intrinsic weaknesses. These include:

  • While Big Data can detect subtle correlations, it can’t show causal relationships. This can lead to bad and dangerous conclusions. For instance, the rising number of autism diagnoses has been highly correlated with organic food sales, although no one would seriously suggest that one causes the other.
  • Big Data throws up correlations that appear to be statistically significant but occur purely by chance, simply because so many variables are being compared. The harder you look, the more patterns you find, even though they aren’t really there.
  • Some Big Data advocates have claimed that building models is no longer necessary because Big Data alone can deliver the answers. This is a dangerous and potentially catastrophic position that is, fortunately, losing sway.
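Both statistical pitfalls in the list above can be reproduced with made-up numbers. The Python sketch below uses only synthetic data. The first part shows two unrelated series that both happen to rise over time producing a very high correlation; the second correlates a random target with 1,000 equally random variables and counts how many clear the usual 5% significance bar purely by chance. The 0.361 threshold is the approximate two-tailed 5% critical value of Pearson's r for 30 observations; all variable names are illustrative.

```python
import random
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# 1. Two synthetic series that both trend upward but have no causal link.
years = range(20)
series_a = [i + (i % 3) for i in years]          # one quantity growing over time
series_b = [2 * i + (7 * i) % 5 for i in years]  # an unrelated quantity also growing
trend_r = pearson(series_a, series_b)

# 2. Multiple comparisons: test many pure-noise variables against a noise target.
rng = random.Random(42)
N_OBS, N_VARS, R_CRIT = 30, 1000, 0.361  # R_CRIT: approx. 5% critical r for n = 30
target = [rng.gauss(0, 1) for _ in range(N_OBS)]
false_hits = sum(
    1
    for _ in range(N_VARS)
    if abs(pearson(target, [rng.gauss(0, 1) for _ in range(N_OBS)])) > R_CRIT
)

print(f"correlation between unrelated trends: {trend_r:.2f}")
print(f"'significant' noise variables: {false_hits} of {N_VARS}")
```

The trending series correlate at well above 0.9 even though neither influences the other, and roughly 5% of the noise variables, around 50, typically pass the significance test despite having no relationship with the target at all.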

Finally

Big Data is a large amount of data that may be structured, unstructured or both. It is characterized by its volume, velocity and variety, and to be valuable it must have veracity too. However, its real value is realized only when analytics are used to extract useful information from it.

It has changed how we do business, how we interact with each other and with our customers, and how we protect our citizens from terrorism. Its benefits are clear, but so too are its potential dangers. Either way, it’s here to stay, so we should make sure we learn how to handle it.


Related blog & news

With over 10 years’ experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.

Visit our Blogs & News portal or check out the related posts below.

How to Succeed in Self-Service BI


Business Intelligence, along with Business Analytics and Big Data, is one of the terms often associated with decision-making processes in organisations. However, there is little discussion of what skills decision makers in your organisation need in order to use the technology efficiently.

In recent years, user-friendly Self-Service BI tools have become increasingly common. Self-Service BI (SSBI) is an approach to BI in which anyone in an organisation can collect and organise data for analysis without the assistance of data specialists. As a result, many businesses have invested in comprehensive storage and information processing tools. However, many are beginning to find that they are not realising the gains they expected from these investments, often because they underestimated the difficulty of introducing these systems into current processes and of turning existing knowledge into actual actions and decisions.

In a worst-case scenario, if left unplanned, Self-Service BI can sabotage a successful BI deployment by cutting mass user adoption, impairing query performance, failing to reduce report backlogs, and increasing confusion over the “single truth”. To prevent this from happening, here are our top three tips for implementing SSBI successfully in your company:

UNDERSTAND YOUR USERS’ NEEDS

There are three major user areas for analytics tools: strategic, tactical and operational. Strategic users make few but important decisions. Tactical users make many decisions during a week and need updated information daily. Operational users are often closest to the customer, and this group needs data in its own applications in order to carry out a large number of requests and transactions. Understanding the different needs of each group is necessary to know what information should be available, and how frequently, to help scale the BI solution.

HARNESS THE POWER OF ADVANCED USERS

To ensure a successful BI deployment, utilising advanced users is key. Self-Service BI is not a one-size-fits-all approach. Casual users usually don’t have the time to learn the tool and will often reach out to ‘Power Users’ to create what they need. Hence, these users can become the go-to resource for creating ad-hoc views of data. Power Users are the ideal advocates for your business’s Self-Service BI implementation and should be able to help spur user adoption.

UPGRADE INTERNAL COMPETENCIES

Our final tip for a successful implementation is to communicate the new tool thoroughly to its users. It is highly unlikely that employees who have not been involved in the actual development project will immediately understand what the tool should be used for, who needs it, and what it should replace. By upgrading internal competencies, you can avoid becoming dependent on external assistance. Establishing a cross-organisational BI competence centre of 5-10 members who meet regularly to share their experiences will help drive and prioritise future use of the tool. An added benefit of a successful implementation is that it will generate new ideas from users for how the organisation can use data to make better decisions.

If you have the skillset to implement Business Intelligence solutions, we may have a role for you. Take a look at our latest opportunities or get in contact with our team.

Real Time Pricing - Coming to a store near you


Personal shopping is on the brink of taking on a whole new meaning. The advancement of mobile technology and the information held on individuals' shopping histories mean product prices could soon adapt as shoppers walk up and down their supermarket aisle.

Gone are the days of retailers only being able to actively manage the price of a small number of products once a week. Algorithmic pricing and real-time competitive pricing data allow product prices to be changed on the fly.

Amazon is at the forefront of such "real-time pricing" initiatives, which have traditionally been the preserve of online-only retailers. However, brick-and-mortar retailers in the US are showing their UK counterparts the limitless possibilities when it comes to dynamic pricing.

Independent consumer electronics retailer Abt Electronics pipes competitive pricing data gathered by Dynamite Data into its point-of-sale systems to allow staff to negotiate prices at the point of sale, according to Dynamite Data chief executive Diana Schulz. Meanwhile, another of Dynamite Data’s unnamed clients uses electronic shelf labels and re-prices every product in its stores each morning based on the prices of its rivals.

The ability to change prices dynamically is not simply the preserve of all-powerful brands such as Walmart or Target either. Schulz explained that her company has "seen these types of technologies in both large and mid-sized retailers" despite the "investment in technology and competitive data that is typically needed".

Commercial sensitivities

Back in the UK, things are not quite as close to a Minority Report-style personalized shopping experience. Even online-only specialists Shop Direct and Ocado claim they do not engage in real-time pricing, while those that make heavy use of real-time data to adapt their prices, such as the airline brands, are reluctant to discuss the issue. EasyJet declined to comment when contacted because of commercial sensitivities around discussing pricing-related issues.

Grocers Tesco, Asda and Sainsbury’s have all claimed they do not engage in real-time pricing, with the latter two both citing the logistical difficulties of aligning such a strategy across their physical stores and online presence. A Sainsbury’s spokesman claims real-time pricing would result in "chaos", while an Asda spokeswoman said such a strategy would be a "nightmare".

Yet, despite such a negative perspective from UK brands, experts are confident real-time pricing will arrive on these shores sooner or later. Simon Spyer, a partner at VCCP data arm Conduit who began his career working on the Sainsbury's Nectar business, believes the UK will begin to see "more and more" of matching rivals’ prices dynamically, particularly in the grocery and electrical sectors. He explained that real-time pricing is likely to affect "anything where the product is largely commoditized" and instances where the only way retailers can differentiate that product is by "being really keen on price".

Electronic labels

As it stands, the major barrier to implementing "real-time pricing" in-store is changing the prices to match the online price, a hurdle that could be removed by the electronic shelf labels being pioneered in the US.

In the UK, various retailers have dipped their toes into the water when it comes to electronic shelf-labeling, including a Nisa Local store in Shrewsbury that launched a trial in August last year to carry out automatic pricing and timed promotional updates, alongside QR codes and meal deals. Tesco has also experimented with electronic labeling on various occasions, with trials in 2006 and 2008, but the retail giant has yet to combine real-time pricing with its electronic labels.

Spyer claims "the capability is definitely there both online and offline – it is whether there is a business rationale for investing in it". However, with major UK supermarkets lacking a pressing reason to implement real-time pricing, that investment may be slow in arriving, argues Kaye Coleman, the founder of price consultancy Ripe Strategic. Coleman explains: "The supermarkets already do price matching – it is not so sophisticated but price matching is already happening".

Schemes including the Tesco Price Promise, the Asda Price Guarantee and the Sainsbury’s Brand Match currently use real-time data to "price match" by offering money off the next shop. A cynic could argue the supermarkets should knock money off at the till rather than relying on customers to redeem their vouchers at the next shop, but such an action could hit the companies' bottom line.

Mobile sophistication

The growing sophistication of mobile marketing is also likely to revolutionize the way brands approach their price matching. "If you can come up with a value proposition where I check-in [on my mobile] when I walk through the store for the first time and that presents me with a personalized experience based on my purchase history then I could see the benefit for a customer and a retailer," said Spyer.

The trick for retailers is persuading customers to adopt such behavior, but the offer of ever-changing personalized price offers and messages delivered in-store is a compelling proposition.

Personalization is already a priority for retailers. Sainsbury’s uses anonymized shopping data gathered from the Nectar card to personalize offers, and the levels of personalization it offers are increasingly complex. If a female customer buys folic acid, she will be sent promotions on other pregnancy-related supplements during the pregnancy period and offers on nappies further down the line.

UK retailers are sure to keep a close eye on developments across the Atlantic, with Schulz claiming she knows of clients that are piloting technologies that enable in-store personalized discounts.

The challenges on the high street mean there will inevitably be more casualties, but real-time pricing does not have to be the sole preserve of online-only retailers. Innovative ways of manipulating real-time data could be the shot in the arm the high-street retail industry so desperately needs.

This article was first published on marketingmagazine.co.uk.

Related jobs

Salary

£50000 - £75000 per annum + benefits

Location

City of London, London

Description

This is the leading company in the online food retail space operating in the UK. They are seeking an Infosec Team Lead.

Salary

£50000 - £60000 per annum

Location

City of London, London

Description

*SENIOR INSIGHT ANALYST - CENTRAL LONDON - SQL & PYTHON/R - UP TO £60,000*

Salary

£42 - £65000 per annum + bonus and benefits

Location

London

Description

A well-known marketplace app is seeking a Senior Analyst.

Salary

£36000 - £37000 per annum + Yes

Location

Milton Keynes, Buckinghamshire

Description

Junior Data Scientist, London United Kingdom.
