Big data demystified

William Wrigley, our consultant managing the role
Posting date: 11/17/2014 12:00 AM

The term “Big Data” as applied to IT came into widespread use around 2011, and several people have claimed to be the first to coin it. It has become a buzzword that is sometimes misunderstood and often abused. Here we will try to demystify it so that we can understand what it is and how its real value can be realized.

What is Big Data?

Many alternative definitions of Big Data have been published. One of the most insightful was proposed by Gartner and has become widely accepted. It defines Big Data as “High volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

These three Vs of Big Data (volume, velocity and variety) have more recently been augmented by a fourth V: veracity.

Volume

It is now almost a cliché to say that 90% of the world’s data has been generated in the last two years. Around 2.4 quintillion bytes (2.4 billion gigabytes) of data are generated globally each day. Much of it comes from the internet and from information-generating digital devices such as smartphones, digital cameras and CCTV, and it is growing exponentially. It is estimated that by 2020 worldwide corporate data will exceed 35,000 exabytes, where an exabyte is one quintillion bytes.
Despite its size, Big Data is not all about volume. In principle, any volume of data could be processed using conventional database software if volume were the only issue.
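
To put the 2020 estimate in perspective, here is a minimal Python sketch that converts it between decimal (SI) units. It assumes the decimal definitions used above (an exabyte is 10^18 bytes, and each unit is 1,000 times the one below it); the `UNITS` table and the `convert` helper are illustrative names, not part of any library.

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
UNITS = {
    "byte": 1,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,    # one quintillion bytes
    "zettabyte": 10**21,  # 1,000 exabytes
}


def convert(amount: float, src: str, dst: str) -> float:
    """Convert an amount of data from unit `src` to unit `dst` (hypothetical helper)."""
    return amount * UNITS[src] / UNITS[dst]


corporate_2020 = 35_000  # exabytes, the estimate quoted in the text
print(convert(corporate_2020, "exabyte", "zettabyte"))  # 35.0
print(convert(corporate_2020, "exabyte", "gigabyte"))   # 35000000000000.0
```

In other words, the estimate amounts to 35 zettabytes, or 35 trillion gigabytes.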

Velocity

Data velocity, the rate at which information flows, is also increasing at a similar rate to volume. This increase reflects improving technologies, which are developing broadly in line with Moore’s Law.
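
Velocity is usually measured as a rate, such as events per second. Below is a minimal sketch of one common technique, a sliding-window counter; the `RateMeter` class and the example timestamps are hypothetical, and it assumes events arrive in time order.

```python
from collections import deque


class RateMeter:
    """Track events per second over a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds: float) -> None:
        self.window = window_seconds
        self.timestamps = deque()  # event times in seconds, oldest first

    def record(self, timestamp: float) -> None:
        # Assumes timestamps arrive in non-decreasing order.
        self.timestamps.append(timestamp)
        self._evict(timestamp)

    def rate(self, now: float) -> float:
        self._evict(now)
        return len(self.timestamps) / self.window

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self.timestamps and self.timestamps[0] <= now - self.window:
            self.timestamps.popleft()


meter = RateMeter(window_seconds=1.0)
for t in [0.1, 0.2, 0.5, 0.9, 1.05]:  # event arrival times in seconds
    meter.record(t)
print(meter.rate(now=1.05))  # 5.0 events per second
print(meter.rate(now=1.6))   # 2.0: the three oldest events have expired
```

A deque keeps both appending new events and discarding expired ones cheap, which matters when events arrive at high velocity.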

Variety

The pivotal V that makes data Big Data is variety. While standard database software handles structured data, Big Data is often unstructured and cannot be processed with the same tool sets. In fact it can be a combination of various data categories including structured, semi-structured and unstructured. Typically it could consist of XML, database tables, audio and video files, text messages, tweets, and so forth.
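
In practice, handling variety often means mapping differently shaped inputs onto a common record so they can be analysed together. The sketch below uses only Python’s standard library to parse a CSV table row (structured), an XML fragment (semi-structured) and a free-text message (unstructured); the inputs, field names and parsing rules are all hypothetical.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical inputs about the same customer, in three different shapes.
table_row = "customer_id,amount\n42,19.99\n"                               # structured
xml_order = "<order><customer>42</customer><amount>5.00</amount></order>"  # semi-structured
tweet = "Customer 42 says: love the new store!"                           # unstructured


def from_csv(text):
    row = next(csv.DictReader(io.StringIO(text)))
    return {"customer_id": int(row["customer_id"]), "amount": float(row["amount"]), "source": "csv"}


def from_xml(text):
    root = ET.fromstring(text)
    return {
        "customer_id": int(root.findtext("customer")),
        "amount": float(root.findtext("amount")),
        "source": "xml",
    }


def from_text(text):
    # Free text has no schema, so keep the raw message and extract what we can.
    match = re.search(r"Customer (\d+)", text)
    return {"customer_id": int(match.group(1)) if match else None, "message": text, "source": "text"}


records = [from_csv(table_row), from_xml(xml_order), from_text(tweet)]
for record in records:
    print(record)
```

A real pipeline would need far more robust extraction for unstructured text; the regular expression here only illustrates that such data has to be interpreted rather than simply parsed.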

Veracity

Veracity is an obvious addition. Unless the data is relevant, accurate and can be trusted, it is of little or no value. Ensuring the veracity of Big Data can be a challenge as it is difficult to control its quality. Any organization using Big Data must have the means of deciding whether it is beneficial and the extent to which it can be trusted.
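
One practical way to decide how far data can be trusted is to run explicit quality checks before analysis. The sketch below applies a few hypothetical rules (missing values, physically implausible readings, future timestamps and duplicates) to invented sensor readings and reports what share passes; the thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Reading:
    sensor_id: str
    temperature_c: Optional[float]  # None means the value is missing
    timestamp: int                  # Unix time in seconds


def validate(readings: List[Reading], now: int) -> Tuple[List[Reading], List[Tuple[Reading, str]]]:
    """Split readings into trusted ones and rejected ones with a reason (hypothetical rules)."""
    trusted, rejected = [], []
    seen = set()
    for r in readings:
        key = (r.sensor_id, r.timestamp)
        if r.temperature_c is None:
            rejected.append((r, "missing value"))
        elif not -50.0 <= r.temperature_c <= 60.0:  # assumed plausible range
            rejected.append((r, "implausible value"))
        elif r.timestamp > now:
            rejected.append((r, "timestamp in the future"))
        elif key in seen:
            rejected.append((r, "duplicate"))
        else:
            seen.add(key)
            trusted.append(r)
    return trusted, rejected


readings = [
    Reading("s1", 21.5, 1_000),
    Reading("s1", 21.5, 1_000),   # exact duplicate
    Reading("s2", None, 1_001),
    Reading("s3", 480.0, 1_002),  # far outside the assumed plausible range
    Reading("s4", 19.0, 9_999),   # later than "now"
]
trusted, rejected = validate(readings, now=2_000)
print(f"{len(trusted) / len(readings):.0%} of readings passed")  # 20% of readings passed
for reading, reason in rejected:
    print(reading.sensor_id, reason)
```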

Dealing with Big Data

One of the more popular ways of dealing with Big Data is Apache Hadoop. Named after a toy elephant, it is an open-source project designed to enable the large-scale storage and processing of big data sets across clusters of servers.
It is highly scalable, growing readily from a single server to many thousands of servers, either on premises or in the cloud. Originally developed at Yahoo, building on designs published by Google (the MapReduce programming model and the Google File System), it is used by Yahoo, Facebook, Twitter, LinkedIn and many more.

In addition to its scalability, it provides an inexpensive approach to massively parallel computing on commodity hardware. It is flexible, handling any kind of structured or unstructured data from many different sources, which can then be joined and aggregated, and it is fault tolerant.
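
Hadoop’s original processing engine, MapReduce, splits a job into a map phase, a shuffle that groups intermediate results by key, and a reduce phase, with each phase spread across the cluster. The sketch below imitates those three phases for the classic word-count example in a single Python process; it does not use any Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit a (key, value) pair for every word, as a mapper would.
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    # Group values by key; on a cluster the framework does this between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Combine all values for each key into a single result.
    return {key: sum(values) for key, values in groups.items()}


documents = ["big data is big", "data about data"]  # each could live on a different node
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```

On a real cluster the mappers and reducers run in parallel on different nodes, and the framework handles the shuffle, task scheduling and recovery from failed tasks.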

Another popular approach is NoSQL. Rather than a single product, NoSQL is a family of non-relational databases, many of them open source (for example MongoDB, Cassandra and HBase), designed to store and process large quantities of structured and unstructured data.
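
NoSQL databases typically drop the fixed table schema of relational databases. The sketch below, a hypothetical in-memory class rather than any real product’s API, illustrates the document-store flavour: records with different fields live side by side in one collection and can still be queried.

```python
class DocumentCollection:
    """A toy in-memory document store (hypothetical; not the API of any real NoSQL product)."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc: dict) -> int:
        # No schema is enforced: any dictionary can be stored.
        doc_id = self._next_id
        self._docs[doc_id] = dict(doc)
        self._next_id += 1
        return doc_id

    def find(self, **criteria):
        # Return documents containing every given field with the given value.
        return [
            doc for doc in self._docs.values()
            if all(field in doc and doc[field] == value for field, value in criteria.items())
        ]


posts = DocumentCollection()
posts.insert({"user": "alice", "text": "Hello", "hashtags": ["bigdata"]})
posts.insert({"user": "bob", "text": "New photo", "image_file": "cat.jpg"})
posts.insert({"user": "alice", "text": "At the park", "geo": {"lat": 51.5, "lon": -0.12}})
print(len(posts.find(user="alice")))  # 2
```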

Big Data in Practice

Big Data and analytics have proven successful for many organizations. Their applications have included:

  • Understanding and targeting customers – one of the main applications of Big Data. Examples in practice include Amazon’s recommendations and Tesco’s personalized money-off coupons
  • Business processes – applications include optimizing supply route logistics, stock control based on social media trends, and HR processes including recruitment
  • Healthcare – for instance the side effects of drugs, correlations between lifestyle and health, the human genome, and the spread of infections
  • Big science – for instance, the LHC at CERN generates one petabyte of data a second. Although most of it is discarded, CERN scientists store and process 30 petabytes a year using 65,000 servers.
  • Security and law enforcement – in the US the NSA uses Big Data in its war on terrorism as does GCHQ in the UK.
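
Recommendation features of the kind mentioned above can be approximated with simple item co-occurrence counting (“customers who bought this also bought...”). The sketch below is a deliberately simplified stand-in, not Amazon’s actual algorithm, and the purchase histories are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical purchase histories: one set of items per customer.
baskets = [
    {"tent", "sleeping bag", "torch"},
    {"tent", "sleeping bag"},
    {"tent", "stove"},
    {"torch", "batteries"},
]

# Count how often each ordered pair of items appears in the same basket.
co_counts = defaultdict(Counter)
for basket in baskets:
    for a, b in permutations(basket, 2):
        co_counts[a][b] += 1


def recommend(item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co_counts[item].most_common(k)]


print(recommend("tent", k=1))  # ['sleeping bag']
```

Production systems add normalisation (so that very popular items do not dominate every list) and run the counting in parallel over millions of baskets, which is where Big Data tooling comes in.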

Is Big Data Over-hyped?

While the value of Big Data is clear, it isn’t a panacea. It has certainly failed to live up to many of its early expectations and, according to some commentators, it has passed the “peak of inflated expectations” and descended into the “trough of disillusionment” (two stages of Gartner’s hype cycle).

The backlash set in following the failure of Google Flu Trends, which claimed to identify flu outbreaks from search queries. It got its estimates spectacularly wrong, significantly overestimating cases between 2009 and 2013.

Big Data has several intrinsic weaknesses. These include:

  • While Big Data can detect subtle correlations, correlation alone can’t establish causal relationships. This can lead to bad and even dangerous conclusions. For instance, the rising number of autism diagnoses has been highly correlated with organic food sales, yet nobody would seriously conclude that one causes the other.
  • Big Data throws up correlations that appear statistically significant but arise purely by chance, simply because of the volume of data. The harder you look, the more patterns you find, even though they aren’t really there.
  • Big Data advocates have claimed that searching for models is no longer relevant because Big Data alone can deliver the answers. This is a dangerous and potentially catastrophic position that, fortunately, is losing sway.
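
The chance correlations described in the second point are easy to demonstrate. The sketch below correlates one random series with 1,000 other purely random series; at the conventional 5% significance level, roughly 5% of them (around 50) would be expected to look “significant” even though every one is noise. The sample size, number of series and seed are arbitrary choices for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


rng = random.Random(42)
n_points = 20         # e.g. 20 years of annual figures
n_candidates = 1_000  # unrelated series we "search" through
critical_r = 0.444    # approx. |r| needed for p < 0.05 (two-sided) with n = 20

target = [rng.gauss(0, 1) for _ in range(n_points)]
spurious = 0
for _ in range(n_candidates):
    noise = [rng.gauss(0, 1) for _ in range(n_points)]  # pure noise, unrelated to target
    if abs(pearson(target, noise)) > critical_r:
        spurious += 1

print(f"{spurious} of {n_candidates} random series look 'significant'")
```

Searching more series only produces more of these false positives, which is why findings mined from large data sets need independent confirmation.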

Finally

Big Data is a large amount of data that may be structured, unstructured or both. It is characterized by its volume, velocity and variety, and to be valuable it must have veracity too. However, its real value is realized only when analytics are used to extract useful information from it.

It has changed how we do business, how we interact with each other and our customers, and how we protect our citizens from terrorism. Its benefits are clear, but so are its potential dangers. Regardless, it’s here to stay, so we should learn how to handle it.

Related blog & news

With over 10 years’ experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.

Visit our Blogs & News portal or check out the related posts below.

Where Tech Meets Tradition

If you’re lamenting the decline of handmade traditional products, cast your cares aside. There’s a new sheriff in town, and its name is Tech. Just a generation ago, children would leave the farm or the family business, go to school, and then move on to make their place in the world doing their own thing, away from family. Today, the landscape has changed and those who left are coming home. But this time, they’re bringing technology with them to make things more efficient and more productive.

Is Tech-Assisted Still Handmade?

In a word, yes. Artists still make things “from scratch”, except that now technology lets not only them but also their customers see their vision in real time. Have you ever wondered what the image in your head might look like on paper or in metal? What about prosthetic arms and healthcare devices produced by 3D printers? You’re still designing and creating.

But as with any new technology, there’s a learning curve, even for cutting-edge craftspeople, who sometimes find that the line between craftsmanship and high-tech creativity is a bit of a blur. Not to mention the expense, whether of the equipment required or of offering art made with traditional tools at technology-assisted prices. Somewhere between the two there is a trade-off, and it’s up to the individual to determine where it lies.

Life in the Creative Economy

One of Banksy’s paintings recently shredded itself upon purchase at an auction. AI is making music and writing books. Augmented Reality, Virtual Reality and Blockchain all have their place in the creative economy, from immersive entertainment to efficient manufacturing processes. Each of these touches the way we live now. In a joint study, 'Creative Disruption: The impact of emerging technologies on the creative economy', McKinsey and the World Economic Forum broke down the various technologies used in the creative economy and how they’re driving change.

For example, AI is being used to distil user preferences when curating movies and music. The Associated Press has used AI to free up reporters’ time, and the Washington Post has created a tool that helps it generate up to 70 articles a month, many of them stories to which it would not otherwise have dedicated staff. Machine Learning has begun to create original content. Virtual Reality and Augmented Reality have come together as a new medium that moves people to get up, get active and go play, whether it’s a stroll through a virtual art gallery or watching your children at the playground.

Where else might immersive media play out? Content today could help tell humanitarian stories or offer workplace diversity training. But back to artisan handicrafts.

Artistry with technology

Whilst publishing firms may be looking to use AI to redefine the creative economy, they are not alone. Other artists using these technologies include:

  • Sculptors
  • Digital artists
  • Painters
  • Jewellery makers
  • Bourbon distillers

America’s oldest distiller has gotten on the technology bandwagon: while there is no rushing good Bourbon, you can manage the process more efficiently. They’ve even taken things a step further and created an app for aficionados to follow along with the process. Talk about crafted and curated for individual tastes and transparency.

It may seem almost self-explanatory how other artisans are using technology. But what about distilleries? What are they doing? They’re creating efficiency by:

  • Adding IoT sensors for Data Analytics collection
  • Adding RFID tags to their barrels
  • Creating experimental ageing warehouses (AR, anyone?) to refine their craft

Don’t worry, though. These changes won’t affect the spirit itself. After all, according to Mr. Wheatley, Master Distiller, “There’s no way to cheat mother nature or father time.” Ultimately, the idea is not only to understand the history behind the process, but to make it more efficient and repeatable: a way to preserve the processes of the past while using the advances of the present with an eye to the future.

If you’re interested in using Data & Analytics to drive creativity, we may have a role for you. Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more.

How Will New Financial Risk Regulations Affect European Banks?

The financial crisis of 2007-2008 changed banking. The world moved from taking out mortgage loans in our dogs’ names to strict regulations prohibiting banks from lending to “anyone” without properly assessing Risk. In 2010 the Basel Committee on Banking Supervision (BCBS) introduced BASEL III, a regulatory framework that builds on BASEL I and BASEL II. This framework changed how banks and financial institutions assess risk. It kept the Advanced Internal Ratings-Based approach (commonly known as the AIRB approach), first introduced under BASEL II, which allows banks to use their own internal models to assess risk. Now, the committee has introduced further changes and, by 2022, all banks and institutions will have to implement the revised IRB framework, revised regulations for the standardised approach and the CVA framework, and new frameworks for Operational Risk and Market Risk. So, what does this mean for those working in Risk?

Change Is Coming

Change is inevitable, no matter what you do. If you work in Risk Management and Compliance, you can expect change to happen often. As mentioned above, there will be many changes by 2022. The Basel Committee calls this initiative the “finalised reforms”, or BASEL IV, which builds on the current regulatory framework, BASEL III. In short, the changes limit the reduction in capital that banks can achieve by using their IRB models.

This change is predicted to affect banks in Sweden and Denmark the most, with estimates that their capital ratios will fall by 2.5-3%, far more than the 0.9% expected for the average European bank. So what does all this mean for Swedish and Danish banks?

What’s Happening Now?

One of the main things Swedish and Danish banks need to revise for these new regulations is their internal models. The new regulations introduced a new definition of Probability of Default, measured through a model commonly known as a PD model. Effectively, this means that every bank must “re-develop” its internal PD models under the IRB approach.

Consequently, we are already seeing a clear response from the banks in their strategies. It has become quite apparent that many banks are making IRB model development their focus for 2019, 2020 and 2021. This has triggered a hiring boom for developers with experience in IRB modelling and Credit Risk modelling in general, with high demand meeting a low supply of such candidates. Understandably aware of this, modellers are now looking to negotiate higher salaries.

What You Can Do

For candidates with the right experience, there are good opportunities at hand. If so inclined, they can use this chance to finally see whether the grass really is greener on the other side. However, there are a couple of things worth considering before making a move.

Firstly, are you actually keen on switching jobs? Your skills are probably equally in demand at your current employer and, if you have doubts about moving from the get-go, you may well be able to negotiate a rise without pursuing a new opportunity. However, if you are serious about finding something new, this is a great time to do so. The majority of banks have found that these new regulations are creating an unsustainable workload, and are now looking externally for talent to expand their teams. This means that experienced modellers can pretty much have their pick of the litter.

Furthermore, if you are a junior modeller, there are now plenty of opportunities to enter a niche area known for being exciting and innovative. So, wherever you are in your career, these regulatory changes are likely to have a large impact and open up new avenues for you to explore.

We all agree that regulation in banking and finance is now essential, even if it can be a little frustrating. However, what people often overlook are the opportunities that new regulatory requirements create. In the case of BASEL IV, we’re already seeing an increase in demand for strong talent, and for people who are passionate about Risk Management and model development.

For businesses, new regulations also provide the chance not only to improve their teams, but to create new models that can be used to optimise and automate. Many financial institutions are already aware of this and are using these models to gain a competitive advantage, as well as to stay one hundred percent compliant.

If you’re looking to build out your Risk Management team or take on a new Risk opportunity for yourself, we may be able to help. Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more.
