Data engineers, the unsung heroes of data science

Joshua Carter our consultant managing the role
Posting date: 5/21/2018 6:33 PM
Before you can build a house, you need a blueprint of its design and schematics. When you begin construction, you must first lay the foundation upon which it will be built. These are tangible steps, taken one by one, to create first a house and then a home. However, in the world of data science, companies seem to have skipped the blueprints and foundations and gone straight for the aesthetics. But how do you decorate a house before it’s built?

A house without a foundation becomes a house of cards, and the same is true of data analysis. Before data scientists can process and analyse data, the engineers must come first: the data engineers who lay the digital foundation and set the parameters, who create the data lakes and platforms, so that data analysts have something to make sense of. As high as the demand is for data scientists, the demand and the need are even greater for data engineers, yet a shortage remains.


Where are the Data Engineers?

Data engineering jobs outnumber data scientist jobs nearly four to one, according to a quick search on job boards such as Glassdoor and Indeed. Yet the complex technical nature of data engineering in support of data scientists takes more than a degree. Data analysts, data scientists, and other data professionals can land a mid-level job directly out of university; data engineers cannot.

Ultimately, it takes between five and ten years for mid-level data engineers to gain enough experience for practical application. Systems do not yet exist in schools and universities to supplement data engineers' undergraduate or postgraduate degrees in preparation for real-life work experience in the field. And even once that experience is gained, it can take a company that has hired a data engineer up to two years to catch up with its competition.

With the pace of change in the tech world, this can be detrimental to both the business and the data science teams. Therein lies the Catch-22: data engineers must have experience before they can be hired, but there is no way to learn outside of hands-on, real-life application.


Why You Need to Add a Data Engineer to Your Data Science Team

A data science team is not complete without a data engineer. Why? Because, just like a house, grand schemes and ideas to solve complex business problems must first have a foundation. Data engineers are that foundational support: the experts who design, build, and maintain data-based systems and organizational operations.

Data engineers not only lay the foundation upon which data can be built, analysed, and ultimately translated for business professionals; they must also ensure that the data is timely. Timely data leads to more data and better predictions.

Data engineers are not completely siloed from data science teams; they are also responsible for deploying the code and models written by data scientists. For more on the reasons data engineering is more important than data science for companies today, check out this article from Captech Consulting.

Data Science Team Seeks Data Engineer

Companies know data drives business, and they know the importance of data professionals. However, they may mistakenly assume either that their data teams can pick up engineering experience as they work their way through a project or that the titles are interchangeable.

In the world of data engineering, there is no entry level job. Experience trumps education in this field.

Just as the once-siloed data science team is now integrated across the business with sales, marketing, and advertising departments, so must the role of data engineer be integrated. This is not a marriage of convenience, but of necessity in order to stay ahead of the competition. Together, your fully integrated data teams, with data engineering and data science now on equal footing, will be able to help your business reach better predictions faster, making you a voice of authority in your discipline.

Your Turn: Route to the Role of Data Engineer

The route to the role of Data Engineer may seem daunting, given the Catch-22 that experience supersedes education. So, in the spirit of collaboration, we thought we’d ask for your thoughts and opinions on a few items of interest, such as how we can educate aspiring data engineers and get them into companies faster. What kind of cross-training programs might businesses and schools employ to fill the shortage? What other backgrounds are we overlooking as businesses seek to find and engage this most critical role within their data science teams?

According to the website Datanami, 2018 will be the year of the data engineer. If this is you, then we may have a role for you.

