How Machine Learning and AI Can Help Us See the Forest for the Trees

Author: Tim Jonas
Posting date: 10/1/2020 2:09 PM
In the early days of 2020, Johns Hopkins, the CDC, the WHO, and a host of other public organizations banded together. They were on a mission to ensure the world had real-time information about a virus that would forever change the course of this year and the years to come. That is great for families with a computer in the home and for anyone with smartphone access. But what about the rest of the world?

How do you improve the lives of people who lack access to basic needs? A health non-profit using AI and Machine Learning is aiming to do just this. But the data is vast, and the sheer numbers of people involved need to be corralled by someone into something computers can read and make decisions on. Who would have thought Public Research and Data Science would come together in such a manner and at such an important time?

Three Benefits of Data Science and Machine Learning in Healthcare


In a seminar given in September 2019, two research scientists explained to the CDC the promises and challenges of using Big Data for public health initiatives. After a few definitions and correlations, the focus soon turned to the benefits.

  1. The focus of Machine Learning is to learn data patterns.
  2. Once learned, those patterns can be validated to ensure they make sense.
  3. Validated patterns can reveal links between seemingly unrelated factors, such as the relationship between a person’s environment and their genetics.

To the scientists working with these scenarios, the decisions seem simple. Yet when it comes to explaining them to laymen such as policymakers, there can be a shift in understanding. This shift can lead to arbitrary and inconsistent findings, which can affect medical decision-making. Why?

Could it be that using Random Forests to link the data is confusing?

Data Classification is Not as Cut-and-Dried as a Work Flow or Org Chart


If someone shows us a work flow or organizational chart, we immediately understand which tasks are done in which order, or who reports to whom. But trying to link uncorrelated bits of information using decision trees can seem more like abstract art, more subjective than direct. Yet it is those correlations which answer the bigger questions raised by Research Scientists, Public Health Researchers, Data Scientists, and AI working together to see the bigger picture.

Decision trees, ultimately, are the great classifier. In the random forest model, though, it’s not just one decision tree, it’s many: each tree casts a vote and the forest goes with the majority. A few things need to be in place first. This is definitely a case where, if done right, you will see the forest for the trees and at the same time be able to determine patterns in those trees. It’s a bit counter-intuitive, but this is what stretches our minds to see correlations and patterns we might not see otherwise, don’t you think?
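
For the curious, here is a rough sketch in Python of what "many trees instead of one" can look like. It is not how any particular health project does it, just a minimal illustration under some loud assumptions: the data is an invented toy set, each "tree" is a one-question stump rather than a full decision tree, and the function names (train_stump, train_forest, predict_forest) are made up for this example. Each tree learns from a random resample of the data, and the forest answers with the majority vote.

```python
import random
from collections import Counter


def train_stump(rows, labels, rng):
    """Fit a one-question 'tree' (a stump) on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
        right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
        if not left or not right:
            continue
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(lab != left_label for lab in left)
        errors += sum(lab != right_label for lab in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    if best is None:
        # No usable split in this sample: always answer with the majority label.
        majority = Counter(labels).most_common(1)[0][0]
        return (feature, 0.0, majority, majority)
    return (feature, best[1], best[2], best[3])


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap resample of the data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """The forest's answer is the majority vote of its trees."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Invented toy data (not real health records): two numeric features per row.
rows = [[1, 2], [2, 1], [3, 3], [2, 4], [7, 8], [8, 6], [9, 9], [6, 7]]
labels = ["low", "low", "low", "low", "high", "high", "high", "high"]

forest = train_forest(rows, labels)
print(predict_forest(forest, [1, 1]))
print(predict_forest(forest, [9, 9]))
```

A production random forest grows much deeper trees and is usually built with an established library, but the shape is the same: resample, train many trees, count the votes.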

So, what do you need to help make predictions? 

Two Important Needs to Help Make Predictions


  1. Predictive power. The features you employ should make some sense. For example, without a basic knowledge of cooking, you can’t just throw random items from your refrigerator into a pot and expect it to taste good. Unless, of course, you’re making soup and all you have to do is add water.
  2. The trees and their predictions should be uncorrelated. Each tree should make its own mistakes, so that errors cancel out when the votes are combined. If you’ve ever seen M. Night Shyamalan’s Lady in the Water, there’s a little boy who can ‘read’ cereal boxes and tell a coherent story. A predictive coherent story. This is the layman’s version of random forests, their predictive nature, and ultimately, the scientists who can ‘read’ and explain the patterns.
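
Why does it matter that the trees are uncorrelated? A small simulation in Python gives a feel for it. The numbers are assumptions chosen for illustration, not measurements: 101 trees, each right 60% of the time, and the simulate function is a made-up helper. When the trees err independently, their mistakes tend to cancel in the vote; when they all make the same call, the vote is no better than a single tree.

```python
import random


def majority_is_right(votes):
    """True when more than half of the trees voted for the correct answer."""
    return sum(votes) > len(votes) / 2


def simulate(n_trees, tree_accuracy, correlated, n_cases=2000, seed=1):
    """Estimate how often a majority vote is right (illustrative only)."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_cases):
        if correlated:
            # Every tree makes the same call, right or wrong together.
            shared = rng.random() < tree_accuracy
            votes = [shared] * n_trees
        else:
            # Each tree is right or wrong independently of the others.
            votes = [rng.random() < tree_accuracy for _ in range(n_trees)]
        correct += majority_is_right(votes)
    return correct / n_cases


# Assumed values: 101 trees, each correct 60% of the time.
independent = simulate(101, 0.6, correlated=False)
identical = simulate(101, 0.6, correlated=True)
print(f"independent trees: {independent:.2f}")
print(f"identical trees:   {identical:.2f}")
```

Real trees are never perfectly independent, which is why random forests resample the data and the features to push the trees apart.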

If you're looking for your first or next role in Big Data, Web Analytics, Marketing & Insight, Life Science Analytics, and more, check out our current vacancies or contact one of our recruitment consultants to learn more.  

For our West Coast team, contact us at (415) 614-4999 or send an email to sanfraninfo@harnham.com.

For our Mid-West and East Coast teams, contact us at (212) 796-6070 or send an email to newyorkinfo@harnham.com.

