Big Data Demystified




The term “Big Data” as applied to IT came into widespread use around 2011, and several people have claimed to be the first to coin it. It has become a buzzword that is sometimes misunderstood and often abused. Here we try to demystify it so that we can understand what it is and how to realize its real value.

What is Big Data?

Many alternative definitions of Big Data have been published. One of the most insightful was proposed by Gartner and has become widely accepted. It defines Big Data as “High volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

These three Vs of Big Data (volume, velocity and variety) have more recently been joined by a fourth V: veracity.

Volume

It is now almost a cliché to say that 90% of the existing data has been generated in the last two years. Around 2.4 trillion gigabytes of data are generated globally each day. Much of it arises through the internet and information-generating digital devices such as smartphones, digital cameras and CCTV systems, and it is growing exponentially. It is estimated that by 2020 worldwide corporate data will exceed 35,000 exabytes, where an exabyte is one quintillion bytes.
Despite its size, Big Data is not all about volume. In principle, any volume of data could be processed using conventional database software if volume were the only issue.

Velocity

Data velocity, the rate at which information flows, is also increasing at a similar rate to volume. The increase in data velocity is in line with improving technologies, which are developing in accordance with Moore’s Law.

Variety

The pivotal V that makes data Big Data is variety. While standard database software handles structured data, Big Data is often unstructured and cannot be processed with the same tool sets. In fact it can be a combination of various data categories including structured, semi-structured and unstructured. Typically it could consist of XML, database tables, audio and video files, text messages, tweets, and so forth.
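To make the idea of variety concrete, here is a minimal Python sketch of what bringing mixed formats into one common shape can look like. The field names (`user`, `text`), the `normalize` helper and the sample records are all hypothetical and chosen purely for illustration; real pipelines deal with far messier inputs.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def from_json(raw):
    # Semi-structured: a JSON document with named fields.
    data = json.loads(raw)
    return {"source": "json", "user": data["user"], "text": data["text"]}


def from_xml(raw):
    # Semi-structured: an XML element with <user> and <text> children.
    root = ET.fromstring(raw)
    return {"source": "xml", "user": root.findtext("user"), "text": root.findtext("text")}


def from_csv_row(raw):
    # Structured: one row exported from a database table (user, text).
    row = next(csv.reader(io.StringIO(raw)))
    return {"source": "csv", "user": row[0], "text": row[1]}


def from_free_text(raw):
    # Unstructured: no fields to rely on, so only the text itself is kept.
    return {"source": "text", "user": None, "text": raw.strip()}


# Hypothetical dispatch table: one parser per kind of input.
PARSERS = {
    "json": from_json,
    "xml": from_xml,
    "csv": from_csv_row,
    "text": from_free_text,
}


def normalize(kind, raw):
    """Convert one raw record of the given kind into a common dictionary."""
    return PARSERS[kind](raw)


if __name__ == "__main__":
    print(normalize("json", '{"user": "ann", "text": "hello"}'))
    print(normalize("xml", "<msg><user>bob</user><text>hi</text></msg>"))
    print(normalize("csv", "cy,hey there"))
    print(normalize("text", "  just words  "))
```

Each format needs its own parsing logic before the records can be analyzed together, and the unstructured case has the least to offer a conventional database schema.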

Veracity

Veracity is an obvious addition. Unless the data is relevant, accurate and can be trusted, it is of little or no value. Ensuring the veracity of Big Data can be a challenge as it is difficult to control its quality. Any organization using Big Data must have the means of deciding whether it is beneficial and the extent to which it can be trusted.

Dealing with Big Data

One of the more popular ways of dealing with Big Data is Apache Hadoop. Named after a toy elephant, it is an open-source project designed to enable the large-scale storage and processing of big data sets across server clusters.
It is hugely scalable and can readily be scaled from a single server to many thousands of servers, either on premises or in the cloud. It was originally created by Doug Cutting and Mike Cafarella, building on ideas published by Google, and much of its early development took place at Yahoo. Its users include Yahoo, Facebook, Twitter, LinkedIn, and many more.

In addition to its scalability, it provides an inexpensive approach to massively parallel computing. It is flexible and can handle any kind of structured or unstructured data from any number of sources, which can then be joined and aggregated. It is also fault tolerant.
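Hadoop's processing layer is built around the MapReduce programming model, in which work is split into a map step, a grouping ("shuffle") step and a reduce step. The sketch below imitates that flow in plain Python on a single machine. It does not use Hadoop or any of its APIs, and the function names are made up for illustration; in a real cluster each chunk would be processed on a different server.

```python
from collections import defaultdict


def map_phase(chunk):
    # Map: turn one chunk of text into (word, 1) pairs.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_phase(mapped_chunks):
    # Shuffle: group all values that share the same key.
    grouped = defaultdict(list)
    for pairs in mapped_chunks:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Reduce: collapse each key's list of values into a single total.
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Count words across chunks using the map, shuffle, reduce pattern."""
    mapped = [map_phase(chunk) for chunk in chunks]  # independent per chunk
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["big data", "Big value"]))
```

Because each call to `map_phase` depends only on its own chunk, the map step can be spread across as many machines as there are chunks, which is where the scalability comes from.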

Another popular approach is NoSQL. Rather than a single product, it is a family of non-relational database systems, many of them open source, that enable the storage and processing of large quantities of structured and unstructured data.
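One common flavor of NoSQL is the document store, where records in the same collection need not share a schema. The toy in-memory class below is only a sketch of that idea, assuming nothing beyond the Python standard library. The `DocumentCollection` class and its methods are hypothetical and do not mirror the API of any real database.

```python
class DocumentCollection:
    """A toy schemaless collection: every document is just a dictionary."""

    def __init__(self):
        self._docs = {}
        self._next_id = 1

    def insert(self, doc):
        # Store a copy so later changes to the caller's dict don't leak in.
        doc_id = self._next_id
        self._next_id += 1
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        # Return every document whose fields match all given criteria.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


if __name__ == "__main__":
    posts = DocumentCollection()
    # Documents with different fields live side by side.
    posts.insert({"type": "tweet", "user": "ann", "text": "hello"})
    posts.insert({"type": "video", "user": "bob", "duration_s": 42})
    print(posts.find(type="tweet"))
```

A relational table would need a column for every possible field up front; here each document carries only the fields it actually has.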

Big Data in Practice

Big Data and analytics have proven to be a success for many organizations. Their applications have included:

  • Understanding and targeting customers – one of the main applications of Big Data. Examples of this in practice include Amazon Recommendations and personalized Tesco money-off coupons.
  • Business processes – applications include optimizing supply route logistics, stock control based on social media trends, and HR processes including recruitment
  • Healthcare – for instance the side effects of drugs, correlations between lifestyle and health, the human genome, and the spread of infections
  • Big science – for instance the LHC at CERN generates one petabyte of data a second. Although most of it is discarded, CERN scientists store and process 30 petabytes a year using 65,000 servers.
  • Security and law enforcement – in the US the NSA uses Big Data in its war on terrorism as does GCHQ in the UK.

Is Big Data Over-hyped?

While the value of big data is clear, it isn’t a panacea. Certainly it has failed to live up to many of its early expectations and, according to some commentators, it has passed the peak of inflated expectations and has descended into the “trough of disillusionment”.

The backlash set in following the failure of Google Flu Trends, which claimed to identify flu outbreaks using search queries. It got it spectacularly wrong, missing the 2009 H1N1 pandemic and then significantly overestimating flu cases between 2011 and 2013.

Big Data has several intrinsic weaknesses. These include:

  • While Big Data can detect subtle correlations, it can’t show causal relationships. This can lead to bad and dangerous conclusions. For instance, the increasing number of autism diagnoses has been highly correlated with organic food sales, yet nobody would argue that one causes the other.
  • Big Data throws up correlations that appear to be statistically significant but arise purely by chance because of the sheer volume of data. The harder you look, the more patterns you find, even though they aren’t really there.
  • Big Data advocates have claimed that searching for models is no longer relevant as Big Data alone can deliver the answers. This is a dangerous and potentially catastrophic position that fortunately is losing sway.
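The point about chance correlations is easy to demonstrate with a small simulation. The Python sketch below compares one column of pure random noise against many other columns of pure random noise and counts how many look strongly correlated. The sample size, the 0.5 threshold and the function names are arbitrary choices made for illustration.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def count_spurious(n_variables, n_samples=30, threshold=0.5, seed=0):
    """Count random variables that correlate with a random target by chance."""
    rng = random.Random(seed)
    # The target is noise, so any correlation found with it is spurious.
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        if abs(pearson(target, candidate)) >= threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    for n in (10, 1_000, 10_000):
        print(n, "variables ->", count_spurious(n), "apparent correlations")
```

None of these variables has anything to do with the target, yet the number of apparent relationships grows as more variables are examined. This is the multiple-comparisons problem, and it is why a surprising correlation found by trawling a large data set needs independent confirmation.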

Finally

Big Data is a large amount of data that may be structured, unstructured or both. It is characterized by its volume, velocity, and variety, and to be valuable it must have veracity too. However, its real value is realized only when analytics are used to extract useful information from it.

It has changed how we do business, interact with each other and our customers, and protect our citizens from terrorism. Its benefits are clear, but so too are its potential dangers. Regardless, it’s here to stay so we should ensure that we learn how to handle it.

 
