A data janitor, the sexiest job of the 21st century

Daniel Lewis our consultant managing the role
Posting date: 7/17/2013 3:19 PM

A job invented in Silicon Valley is going mainstream as more industries try to gain an edge from big data.

The job description “data scientist” didn’t exist five years ago. No one advertised for an expert in data science, and you couldn’t go to school to specialize in the field. Today, companies are fighting to recruit these specialists, courses on how to become one are popping up at many universities, and the Harvard Business Review even proclaimed that data scientist is the “sexiest” job of the 21st century.

Data scientists take huge amounts of data and attempt to pull useful information out. The job combines statistics and programming to identify sometimes subtle factors that can have a big impact on a company’s bottom line, from whether a person will click on a certain type of ad to whether a new chemical will be toxic in the human body.

While Wall Street, Madison Avenue, and Detroit have always employed data jockeys to make sense of business statistics, the rise of this specialty reflects the massive expansion in the scope and variety of data now available in some industries, like those that collect data about customers on the Web. There’s more data than individual managers can wrap their minds around—too much of it, changing too fast, to be analyzed with traditional approaches.

As smartphones promise to become a new source of valuable data to retailers, for example, Walmart is competing to bring more data scientists on board and now advertises for dozens of open positions, including “Big Fast Data Engineer.” Sensors in factories and on industrial equipment are also delivering mountains of new data, leading General Electric to hire data scientists to analyze these feeds.

The term “data science” was coined in Silicon Valley in 2008 by two data analysts then working at LinkedIn and Facebook (see “What Facebook Knows”). Now many startups are basing their businesses on their ability to analyze large quantities of data—often from disparate sources. ZestFinance, for example, has a predictive model that uses hundreds of variables to determine whether a lender should offer high-risk credit. The underwriting risk it achieves is 40 percent lower than that borne by traditional lenders, says ZestFinance data scientist John Candido. “All data is credit data to us,” he says.

Data scientist has become a popular job title partly because it has helped pull together a growing number of haphazardly defined and overlapping job roles, says Jake Klamka, who runs a six-week fellowship to place PhDs from fields like math, astrophysics, and even neuroscience in such jobs. “We have anyone who works with a lot of data in their research,” Klamka says. “They need to know how to program, but they also have to have strong communications skills and curiosity.”

The best data scientists are defined as much by their creativity as by their code-writing prowess. The company Kaggle organizes contests where data scientists compete to find the best way to make sense of massive data sets (see “Startup Turns Data Crunching into a High-Stakes Sport”). Many of the top Kagglers (there are 88,000 registered on the site) come from fields like astrophysics or electrical engineering, says CEO Anthony Goldbloom. The top-ranked participant is an actuary in Singapore.

Universities are starting to respond to the job market’s needs. Stanford University plans to launch a data science master’s track in its statistics department, says department chair Guenther Walther. A dozen or so other programs have already been started at schools including Columbia University and the University of California, San Francisco. Cloudera, a company that sells software to process and organize large volumes of data, announced in April that it would work with seven universities to offer undergraduates professional training on how to work with “big data” technologies.

Cloudera’s education program director, Mark Morissey, says a skills shortage is looming and that “the market is not going to grow at the rate it currently wants to.” That has driven salaries up. In Silicon Valley, salaries for entry-level data scientists are around $110,000 to $120,000.

Others think the trend could create a new area of outsourcing. Shashi Godbole, a data scientist in Mumbai, India, who is ranked 20th on Kaggle’s scoreboard, recently completed a Kaggle-arranged hourly consulting gig, a new business the platform is getting into. He did work for a tiny health advocacy nonprofit located in Chicago and is now bidding on more jobs (he earns $200 per hour, and Kaggle collects $300 an hour). His Kaggle work is part time for now, but he says it’s possible that it could be his major source of income one day.

To the data scientists themselves, the job is certainly less sexy than it’s being made out to be. Josh Wills, a senior director of data science at Cloudera, says most of the time it involves cleaning up messy data—for example, by putting it in the right columns and sorting it.

“I’m a data janitor. That’s the sexiest job of the 21st century,” he says. “It’s very flattering, but it’s also a little baffling.”



Related blog & news


Weekly News Digest: 5th - 9th April 2021

This is Harnham’s weekly news digest, the place to come for a quick breakdown of the week’s top news stories from the world of Data & Analytics.

The Drum: How data visualisation turns marketing metrics into business intelligence

Gathering data is just one part of a marketer’s job, but having the ability to turn this data into something visually stunning, informative and easy to use is another skill completely.

Marketers, on the whole, are extremely visual learners, along with around 65 per cent of the population. Most of us absorb data more effectively if the information is presented in a way that is pleasing to the eye. This is why Data Visualisation exists; it allows us to group, organise and represent data sets in a way that lets us analyse larger quantities of information, compare findings, spot patterns and extract meaningful insights from raw data. Not only does Data Visualisation allow us to learn more effectively, but we can then turn this understanding into much broader and deeper Business Intelligence.

The full article in The Drum covers more of the positives of Data Visualisation and how to translate them into meaningful Business Intelligence.

ZDNet: The five Vs of customer data platforms

According to ZDNet, Customer Data Platforms (CDPs) are the hottest marketing technology today, offering companies a way to capture, unify, activate, and analyse customer data. Research done in 2020 by Salesforce showed that CDPs were among the highest priority investments for CMOs in 2021. If you’re planning to invest in a CDP this year, what five critical things do you need to think about when developing a successful strategy? ZDNet tells all.

Velocity - Your systems need to manage a high volume of data, coming in at various speeds.
Variety - Every system has a slightly different main identifier or "source of truth," and the goal is to have one. This starts with being able to provision a universal information model, or schema, which can organize all of the differently labelled data into a common taxonomy.
Veracity - Companies must ensure they can provision a single, persistent profile for every customer or account.
Volume - It has been theorized that, in 2020, 1.7MB of data was created every second for every person on Earth. If you want to use those interactions to form the basis of your digital engagement strategy, you have to store them somewhere.
Value - Once you have a clean, unified set of scaled data, now’s the time to think about how to derive value from it.

To learn more, read the full article on ZDNet.

Towards Data Science: How to Prepare for Business Case Interview Questions as a Data Scientist

When you think of Data Science, the first thing that comes to mind will be technical knowledge of coding languages and fantastic statistical ability; softer skills such as communication and exceptional business knowledge may be overlooked. However, this is where many budding Data Scientists trip up. It is these softer skills and business acumen that set brilliant candidates apart from others.

But how, when it is not usually taught at university, do you gather the business knowledge that will set you apart from the competition and showcase it in interview? Towards Data Science shares a few key pointers.

Build a foundation - Brush up on your business basics. Research project management methodologies, organisational roles, tools, tech and metrics; all are crucial here.
Company specifics - Research your company and its staff. Make sure your knowledge is tailored to the company you’re interviewing for.
Products - This is where you’ll stand out above the rest if you get it right. The more you know the ins and outs of products and metrics at the company, the more prepared you will be to answer business case questions.

Read the full article on Towards Data Science.

Harnham: Amped up Analytics: Google Analytics 4

Joshua Poore, one of our Senior Managers based in the US West division of Harnham, explores Google’s new and improved data insight capabilities, predominantly across consumer behaviours and preferences.

This exciting new feature of Google was born in the last quarter of 2020 and has now fully come into its infancy, and it’s an exciting time for Data & Analytics specialists across the globe. Joshua explores four key advantages of Google Analytics 4.0.

Combined data and reporting - Rather than focusing on one property (web or app) at a time, this platform allows marketers to track a customer’s journey more holistically.
A focus on anonymised data - By crafting a unified user journey centred around machine learning to fill in any gaps, marketers and businesses have a way to get the information they need without diving into personal data issues.
Predictive metrics - Using Machine Learning to predict future transactions is a game changer for the platform. These predictive metrics for e-commerce sites on Google properties allow for targeted ads to visitors who seem most likely to make a purchase within one week of visiting the site.
Machine Learning driven insights - GA4 explains it “has machine learning at its core to automatically surface helpful insights and gives you a complete understanding of your customers across devices and platforms.” Machine Learning-driven insights include details that elude human analysts.

Joshua’s full insights on GA4 are available on the Harnham blog.

We've loved seeing all the news from Data & Analytics in the past week; it’s a market full of exciting and dynamic opportunities. To learn more about our work in this space, get in touch with us at info@harnham.com.

How Are Life Science Analytics Innovating For A Post-Pandemic World?

As COVID-19 unfolded, the Life Science discipline was thrust into the spotlight. The pandemic has shown the extent of the Life Sciences industry’s ability to innovate and collaborate. When facing a new disease, Life Sciences adapted quickly. The rate at which pharmaceutical companies successfully developed COVID-19 vaccines was unprecedented. Approaches that may previously have been labelled risky were implemented to manage changing demand and deliver increased throughput. Embracing digitisation and innovation enabled organisations to adapt and accept constant change.

The pandemic has shown just how well the Life Science industry is able to innovate and develop according to changing demands. As the world looks to the future, how can Life Sciences continue to remain dynamic?

Cloud data

The cloud is becoming a CEO agenda item for Life Sciences. The cloud has the potential to enable more effective and profitable ways of doing business throughout the life science industry. It offers a powerful, secure platform for innovation and collaboration, with immense transactional power and data throughput. The cloud is necessary for creating data enablement, ensuring the right data is in the right place at the right time. It enables companies to innovate faster, work at a greater scale and increase collaboration.

Virtual communication

According to Accenture, 61 per cent of healthcare professionals now communicate more with pharmaceutical sales reps than before the pandemic, and 87 per cent now want either purely virtual or a blend of in-person and virtual meetings post-pandemic.

New means of virtual communication have created new opportunities in the industry. Digitisation allows for increased communication with trial participants and new opportunities to educate people about their conditions and care. There was already a growing trend for virtual healthcare interactions, but the pandemic has shifted this into becoming the new normal.

Collaboration ecosystem

COVID-19 has led to increasing collaboration between companies. The race for a vaccine has seen cooperation evolve at an extraordinary pace. Companies that usually compete are now coming together to share data and cooperate. Organisations have created collaborative agreements in a matter of weeks; partnerships that pre-pandemic would have taken years to create.

The industry is now seeing the value of ecosystem partnership. The success of organisations post-pandemic relies on this continued collaboration.

AI and blockchain technology

COVID-19 has increased the focus on AI in Life Sciences. Yet, Life Sciences have only scratched the surface of AI capabilities. AI has the potential to transform the industry; it can design novel compounds, identify genetic targets, expedite drug development and improve supply chains. The use of AI in Life Sciences is expected to continue to grow, and organisations will need to focus ever more on merging human knowledge and AI capabilities.

Blockchain is also becoming increasingly trusted in Life Sciences. Its ability to create tamper-proof records makes it a key resource in increasing patient trust in remote clinical trials. As more of the industry understands the skills needed to use blockchain and increases collaboration, blockchain has the potential to become ubiquitous in Life Sciences.

The pandemic has shown the importance of digital technology in Life Sciences. Digitisation increases efficiency and collaboration, and also helps create a framework for future scientific discoveries. As we look towards a post-pandemic world, a successful Life Science industry must continue to embrace this mindset of innovation, collaboration and dynamism.

If you’re in the world of Data & Analytics and looking to take a step up or find the next member of your team, we can help. Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more.

