With over 10 years' experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.
Visit our News & Blogs portal or check out our recent posts below.
The term “Big Data” as applied to IT gained widespread currency around 2011, and several people have laid claim to being the first to coin it. It has become a buzzword that is sometimes misunderstood and often abused. Here we will try to demystify it, so we can understand what it is and how to realize its real value.
What is Big Data?
Many alternative definitions of Big Data have been published. One of the most insightful, proposed by Gartner, has become a de facto standard. It defines Big Data as “high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
These three V’s of Big Data, volume, velocity and variety, have more recently been augmented by the addition of a fourth V, veracity.
Volume
It is now almost a cliché to say that 90% of the world's data has been generated in the last two years. Around 2.5 quintillion bytes (2.5 exabytes) of data are generated globally each day. Much of it arises from the internet and from digital devices such as smartphones, digital cameras and CCTV, and the total is growing exponentially. It was estimated that by 2020 worldwide corporate data would exceed 35,000 exabytes, where an exabyte is one quintillion (10^18) bytes.
Despite its size, Big Data is not only about volume. In principle, any volume of data could be processed using conventional database software if volume were the only issue.
Velocity
Data velocity, the rate at which information flows, is increasing at a similar rate to volume. The increase in data velocity tracks improving technologies, which develop in accordance with Moore's Law.
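To make the scale of that growth concrete, here is a small illustrative calculation. It assumes the classic Moore's-Law formulation of a doubling every two years, which is a simplification rather than a measured data rate:

```python
# Illustrative only: project relative growth under a Moore's-Law-style
# doubling every two years (an assumed rate, not a measurement).
def growth_factor(years: float, doubling_period: float = 2.0) -> float:
    """Return the multiplicative growth after `years` of steady doubling."""
    return 2 ** (years / doubling_period)

# A quantity doubling every two years grows 32-fold in a decade.
print(growth_factor(10))  # 32.0
```

Exponential growth like this is exactly why tools built for yesterday's data rates keep falling behind.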
Variety
The pivotal V that makes data Big Data is variety. While standard database software handles structured data, Big Data is often unstructured and cannot be processed with the same toolsets. In fact, it is frequently a combination of data categories: structured, semi-structured and unstructured. Typically it might consist of XML, database tables, audio and video files, text messages, tweets, and so forth.
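A minimal sketch of why variety is the hard part: three sources in three formats, each needing its own parser before anything can be analyzed together. The field names here are purely illustrative:

```python
# Three data categories, one common record shape (names are illustrative).
import csv
import io
import json

def from_csv(text):          # structured: fixed columns, fixed schema
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]

def from_json(text):         # semi-structured: nested, optional fields
    return [{"user": d.get("user"), "msg": d.get("body", "")}
            for d in json.loads(text)]

def from_raw_text(text):     # unstructured: no schema at all
    return [{"user": None, "msg": line}
            for line in text.splitlines() if line]

records = (from_csv("user,msg\nana,hello\n")
           + from_json('[{"user": "bo", "body": "hi"}]')
           + from_raw_text("free-form note"))
print(len(records))  # 3
```

A conventional relational database handles the first source natively; the other two are what push organizations toward Big Data tooling.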
Veracity
Veracity is an obvious addition: unless the data is relevant, accurate and trustworthy, it is of little or no value. Ensuring the veracity of Big Data is a challenge because its quality is hard to control. Any organization using Big Data must have a means of deciding whether the data is beneficial and the extent to which it can be trusted.
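In practice, veracity checks often take the form of sanity rules applied before data reaches any analysis. The rules below are toy examples for an imagined sensor feed, not a real quality framework:

```python
# A toy veracity filter: keep only records that pass basic sanity rules.
# The field names and thresholds are illustrative assumptions.
def is_trustworthy(rec):
    return (
        rec.get("temperature") is not None
        and -90 <= rec["temperature"] <= 60   # physically plausible range
        and bool(rec.get("sensor_id"))        # must identify its source
    )

readings = [
    {"sensor_id": "s1", "temperature": 21.5},
    {"sensor_id": "s2", "temperature": 999.0},  # faulty sensor
    {"sensor_id": None, "temperature": 18.0},   # unknown provenance
]
clean = [r for r in readings if is_trustworthy(r)]
print(len(clean))  # 1
```

Even crude rules like these make the trust decision explicit instead of leaving bad records to poison the results silently.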
Dealing with Big Data
One of the more popular ways of dealing with Big Data is Apache Hadoop. Named after a toy elephant, it is an open-source project designed to enable the large-scale storage and processing of big data sets across server clusters.
It is hugely scalable and can readily grow from a single server to many thousands of servers, either on premises or in the cloud. Originally developed at Yahoo, building on research papers published by Google, it is used by Yahoo, Facebook, Twitter, LinkedIn, and many more.
In addition to its scalability, it provides an inexpensive approach to massively parallel computing. It is flexible, handling any kind of structured or unstructured data from virtually unlimited sources, which can be joined and aggregated, and it is fault-tolerant.
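Hadoop's processing model is MapReduce. This pure-Python word count mimics its three phases, map, shuffle and reduce, on a single machine; Hadoop's contribution is running the same pattern unchanged across thousands of servers:

```python
# A single-machine sketch of the MapReduce pattern (word count).
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: aggregate each key's values into a final result."""
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big clusters"])))
print(counts["big"])  # 2
```

Because each map call and each reduce call is independent, the framework can farm them out to different machines and rerun any that fail, which is where the fault tolerance comes from.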
Another popular approach is NoSQL, a broad family of databases, most of them open source, designed to store and process large quantities of structured and unstructured data without a fixed relational schema.
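The key idea is schema-less storage. This toy in-memory document store (purely illustrative, not any particular NoSQL product) shows how records with completely different fields can live side by side:

```python
# A toy document store illustrating the schema-less NoSQL model.
store = {}

def put(doc_id, document):
    """Insert or replace a free-form document; no schema is enforced."""
    store[doc_id] = document

def find(predicate):
    """Return every document matching an arbitrary predicate."""
    return [d for d in store.values() if predicate(d)]

put("u1", {"name": "Ana", "tags": ["analytics"]})
put("u2", {"name": "Bo", "followers": 120})  # different fields: fine

print(len(find(lambda d: "followers" in d)))  # 1
```

A relational table would force both records into one column layout up front; here the structure is decided by each document, which is what makes NoSQL a fit for high-variety data.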
Big Data in Practice
Big Data and analytics have proven a success for many organizations, across a wide range of applications.
Is Big Data Over-hyped?
While the value of Big Data is clear, it isn't a panacea. It has certainly failed to live up to many of its early expectations and, according to some commentators, it has passed the peak of inflated expectations and descended into the “trough of disillusionment”.
The backlash set in following the failure of Google Flu Trends, which claimed to identify flu outbreaks from search queries. It got it spectacularly wrong, persistently overestimating flu prevalence between 2011 and 2013.
Big Data also has several intrinsic weaknesses of this kind.
Finally
Big Data is a large amount of data that may be structured, unstructured or both. It is characterized by its volume, velocity and variety, and to be valuable it must have veracity too. However, its real value is realized only when analytics are used to extract useful information from it.
It has changed how we do business, how we interact with each other and with our customers, and how we protect our citizens from terrorism. Its benefits are clear, but so too are its potential dangers. Either way, it is here to stay, so we should make sure we learn how to handle it.