Big data demystified

William Wrigley our consultant managing the role
Posting date: 11/17/2014 12:00 AM

The term “Big Data” as applied to IT came into widespread use around 2011, and several people have laid claim to having coined it. It has become a buzzword that is sometimes misunderstood and often abused. Here we will try to demystify it so we can understand what it is and how we can realize its real value.

What is Big Data?

Many alternative definitions of Big Data have been published. One of the most insightful of these was proposed by Gartner and has become the accepted standard. It defines Big Data as “High volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

These three V’s of Big Data, volume, velocity and variety, have more recently been augmented by the addition of a fourth V, veracity.

Volume

It is now almost a cliché to say that 90% of the existing data has been generated in the last two years. Around 2.4 trillion gigabytes of data are generated globally each day. Much of it arises through the internet and information-generating digital devices such as smartphones, digital cameras and CCTVs, and it is growing exponentially. It is estimated that by 2020 worldwide corporate data will exceed 35,000 exabytes, where an exabyte is one quintillion bytes.
Despite its size, Big Data is not all about volume. In principle, any volume of data could be processed using conventional database software if volume were the only issue.

Velocity

Data velocity, the rate at which information flows, is also increasing at a similar rate to volume. The increase in data velocity is in line with improving technologies, which are developing in accordance with Moore’s Law.

Variety

The pivotal V that makes data Big Data is variety. While standard database software handles structured data, Big Data is often unstructured and cannot be processed with the same tool sets. In fact it can be a combination of various data categories including structured, semi-structured and unstructured. Typically it could consist of XML, database tables, audio and video files, text messages, tweets, and so forth.
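To make the three categories concrete, here is a minimal Python sketch. The sample inputs and function names are invented purely for illustration and do not come from any real system. It shows why one tool set rarely fits all: the structured data can be read by column name, the semi-structured data by nested key, and the unstructured text can only be broken into tokens.

```python
import csv
import io
import json

# Hypothetical sample inputs, invented purely for illustration.
STRUCTURED_CSV = "customer_id,amount\n42,19.99\n43,5.00\n"
SEMI_STRUCTURED_JSON = '{"user": {"id": 42}, "tags": ["offer", "coupon"]}'
UNSTRUCTURED_TEXT = "Loved the coupon, will shop again!"


def load_structured(text):
    """Structured data: every row follows the same fixed columns."""
    return list(csv.DictReader(io.StringIO(text)))


def load_semi_structured(text):
    """Semi-structured data: self-describing, but fields may nest or vary."""
    return json.loads(text)


def load_unstructured(text):
    """Unstructured data: no schema at all, so we can only tokenise it."""
    return text.lower().replace(",", " ").replace("!", " ").split()


rows = load_structured(STRUCTURED_CSV)
doc = load_semi_structured(SEMI_STRUCTURED_JSON)
words = load_unstructured(UNSTRUCTURED_TEXT)

print(rows[0]["amount"])  # field access by column name
print(doc["user"]["id"])  # field access by nested key
print(words)              # just a list of tokens
```

Each loader needs a different parser and produces a different shape of result, which is the practical difficulty that variety introduces.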

Veracity

Veracity is an obvious addition. Unless the data is relevant, accurate and can be trusted, it is of little or no value. Ensuring the veracity of Big Data can be a challenge as it is difficult to control its quality. Any organization using Big Data must have the means of deciding whether it is beneficial and the extent to which it can be trusted.

Dealing with Big Data

One of the more popular ways of dealing with Big Data is Apache Hadoop. Named after a toy elephant, it is an open source project designed to enable the large scale storage and processing of Big Data sets across server clusters.
It is hugely scalable and can readily be scaled from a single server to many thousands of servers, either on premises or in the cloud. Originally created by Doug Cutting and Mike Cafarella, inspired by papers published by Google and developed extensively at Yahoo, its users include Yahoo, Facebook, Twitter, LinkedIn, and many more.

In addition to its scalability, it provides an inexpensive approach to massively parallel computing. It is flexible and fault tolerant, and it can handle any kind of structured or unstructured data from many sources, which can then be joined and aggregated.
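Hadoop's parallel processing is built on the MapReduce model, in which each server processes its own block of data and the partial results are then combined. The following is only a single-process Python sketch of that idea, not Hadoop's actual API. The input blocks and function names are hypothetical, and a real cluster would run the map step on many machines at once.

```python
from collections import defaultdict

# Hypothetical input: each string stands in for a block of data
# held on a different server in the cluster.
blocks = ["big data big value", "data velocity", "big variety"]


def map_phase(block):
    """Map: turn one block into (word, 1) pairs, independently of other blocks."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group together all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


mapped = [pair for block in blocks for pair in map_phase(block)]
counts = reduce_phase(shuffle(mapped))
print(counts)  # {'big': 3, 'data': 2, 'value': 1, 'velocity': 1, 'variety': 1}
```

Because the map step never needs to see more than one block, adding servers adds capacity, which is where the scalability described above comes from.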

Another popular approach is NoSQL. This is a family of database technologies, many of them open source, that enable the storage and processing of large quantities of structured and unstructured data without requiring a fixed relational schema.

Big Data in Practice

Big Data and analytics have proven to be a success for many organizations. Their applications have included:

  • Understanding and targeting customers – one of the main applications of Big Data. Examples of this in practice include Amazon Recommendations and personalized Tesco money-off coupons.
  • Business processes – applications include optimizing supply route logistics, stock control based on social media trends, and HR processes including recruitment.
  • Healthcare – for instance the side effects of drugs, correlations between lifestyle and health, the human genome, and the spread of infections.
  • Big science – for instance the LHC at CERN generates one petabyte of data a second. Although most of it is discarded, CERN scientists store and process 30 petabytes a year using 65,000 servers.
  • Security and law enforcement – in the US the NSA uses Big Data in its war on terrorism as does GCHQ in the UK.

Is Big Data Over-hyped?

While the value of big data is clear, it isn’t a panacea. Certainly it has failed to live up to many of its early expectations and, according to some commentators, it has passed the peak of inflated expectations and has descended into the “trough of disillusionment”.

The backlash set in following the failure of Google Flu Trends, which claimed to identify flu outbreaks using search queries. It proved spectacularly wrong, overestimating cases significantly from 2009 to 2013.

Big Data has several intrinsic weaknesses. These include:

  • While Big Data can detect subtle correlations, it can’t show causal relationships. This can lead to bad and dangerous conclusions. For instance, the increasing number of autism diagnoses has been highly correlated with organic food sales, yet neither causes the other.
  • Big Data throws up correlations that appear to be statistically significant but arise purely by chance, simply because of the volume of data. The harder you look, the more patterns you find, even though they aren’t really there.
  • Big Data advocates have claimed that searching for models is no longer relevant as Big Data alone can deliver the answers. This is a dangerous and potentially catastrophic position that fortunately is losing sway.
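The chance-correlation problem in the list above is easy to demonstrate. The Python sketch below is an illustration under stated assumptions, not a rigorous statistical test. Every series is pure random noise, the function names are hypothetical, and the threshold of 0.444 is the approximate two-tailed 5% critical value of the correlation coefficient for 20 data points. Nothing in the data is genuinely related, yet some candidates still pass the test.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_chance_correlations(n_candidates, n_points=20, threshold=0.444, seed=0):
    """Count random series that look 'significantly' correlated with a random target."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    hits = 0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        if abs(pearson(target, candidate)) > threshold:
            hits += 1
    return hits


few = count_chance_correlations(20)
many = count_chance_correlations(2000)
print(f"20 candidate series:   {few} apparent correlations")
print(f"2000 candidate series: {many} apparent correlations")
```

Roughly one candidate in twenty clears the threshold by luck alone, so the more series you compare, the more "findings" you collect. This is why a correlation discovered by trawling a very large data set needs independent confirmation before anyone acts on it.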

Finally

Big Data is a large amount of data that may be structured, unstructured or both. It is characterized by its volume, velocity, and variety, and to be valuable it must have veracity too. However, its real value is realized only when analytics are used to extract useful information from it.

It has changed how we do business, interact with each other and our customers, and protect our citizens from terrorism. Its benefits are clear, but so too are its potential dangers. Regardless, it’s here to stay so we should ensure that we learn how to handle it.
