Route to the role of Data Engineer

Joshua Carter our consultant managing the role
Posting date: 5/21/2018 7:22 PM
Do you like breaking things down to see how they work? Do you want to build something that helps solve problems and can make lives better? Are you a puzzle solver curious about the world around you with a knack for mathematics? Do you prefer to work behind the scenes or front of stage? If you want to be the person behind the curtain, then this is your year. The year of the Data Engineer is here.

In last week’s article, we talked about the Data Engineer as the unsung hero of the data science world and briefly touched on the route to the role. Though experience supersedes education, you do need the basics: a bachelor’s degree in computer science, data science, applied math, physics, statistics, or software/computer engineering, which can lead to a Master’s in Data Engineering. To cement your knowledge, fellowships and professional organisations are now available around the world. In today’s article, we’ll cover a few options.

Lay Your Educational Foundation

Computer Science, Data Science, and Engineering programs abound in universities today, but no school can really teach big data skills. The field is too specialised. Most schools today offer general-purpose tech education with a focus on web development or backend systems. And here begins the Catch-22: though experience supersedes education, you still need a framework from which to build.

More often than not, if you type Data Engineer into Google looking for education programs, you’ll get undergrad opportunities for Data Science. However, that’s not to say a Bachelor’s in Data Science can’t lead you to a Master’s in Data Engineering. So, how do you get from point A to point B? Here are a few suggestions: 

  • Beef up your skills with specific certifications for the languages businesses need: Scala, Python, and Java.
  • Take courses in data engineering technology: Hadoop, Spark, AWS, GCP, Azure, etc.
  • Join a professional organization for Data Engineers such as The Data Warehousing Institute (TDWI) or the Institution of Engineering and Technology (IET). Here you’ll find articles, resources, and a network of mentors ready to offer advice and suggestions.
  • Apply for a fellowship with ASI Data Science: an 8-10 week intensive project with one of their partner companies to solve real-world business problems using Data Science or Data Engineering skills. If you’re a postgrad or higher, this is a perfect opportunity to build your portfolio.

Boost Your Data Engineering Resume with These Tips

In the world of data engineering, it’s important to highlight the details. 
  • Be specific: Companies will be more interested in interviewing you if you can clearly outline which technologies you have used and what you used them for. Keep this punchy and concise, and outline your input with those technologies
  • Outline projects you’ve worked on 
  • Detail the technologies you’ve used 

David Bianco, a Data Engineer with Urthecast, offers the following advice to data engineering students. 
  • Be fluent in the languages and tools you use to get the job 
  • Understand the concepts behind what you’re doing 
  • Get involved with a community: hackathons and other groups in your area are great places to get started.

If you’re interested in switching your career to Big Data, check out Jesse Anderson’s new e-book, The Ultimate Guide to Switching Careers to Big Data: Upgrading Your Skills for the Big Data Revolution.

Your Turn: Route to the Role of Data Engineer

Our data-driven world moves at lightning speed and it can be hard to keep up. If you’re a Data Engineer, we want to hear your story.

What was your route to the role?

What kind of cross-training programs might businesses and schools employ for future Data Engineers?
What other backgrounds are we overlooking as businesses seek to find and engage this most critical role within their data science teams?
What can we, as recruiters, do to engage qualified candidates ready for their next role in the world of data and analytics?

Related blog & news

With over 10 years’ experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.

Visit our Blogs & News portal or check out the related posts below.

Using Data Ethically To Guide Digital Transformation

Over the past few years, the uptick in the number of companies putting more budget behind digital transformation has been significant. However, since the start of 2020 and the outbreak of the coronavirus pandemic, this trend has accelerated on an unprecedented scale. Companies have been forced to re-evaluate their systems and services to make them more efficient, effective and financially viable in order to stay competitive in this time of crisis. These changes help companies to support internal operational agility and to learn about customers' needs and wants, creating a much more personalised customer experience.

However, despite the vast amount of good these systems can do for companies' offerings, a lot of them, such as AI and machine learning, are inherently data driven. Therefore, these systems run a high risk of breaching ethical standards, through privacy and security leaks or serious issues with bias, if not created, developed and managed properly.

So, what can businesses do to ensure their digital transformation efforts are implemented in the most ethical way possible?

Implement ways to reduce bias

From Twitter opting to show a white person in a photo instead of a black person, to soap dispensers not recognising black hands, to women being perpetually rejected for financial loans, digital transformation tools such as AI have proven over the years to be inherently biased.

Of course, a computer cannot be decisive about gender or race; this problem of inequality in computer algorithms stems from the humans behind the screen. Despite the advancements made with Diversity and Inclusion efforts across all industries, Data & Analytics is still a predominantly white and male industry. Only 22 per cent of AI specialists are women, and an even smaller share come from BAME communities.
Within Google, the world’s largest technology organisation, only 2.5 per cent of its employees are black, and a similar story can be seen at Facebook and Microsoft, where only 4 per cent of employees are black.

So, where our systems are being run by a group of people who are not representative of our diverse society, it should come as no surprise that our machines and algorithms are not representative either.

For businesses looking to implement AI and machine learning into their digital transformation moving forward, it is important you do so in a way that is truly reflective of a fair society. This can be achieved by encouraging a more diverse hiring process when looking for developers of AI systems, implementing fairness tests and always keeping your end user in mind, considering how the workings of your system may affect them.

Transparency

Capturing data is crucial for businesses when they are looking to implement or update digital transformation tools. Not only can this data show them the best ways to service customers’ needs and wants, but it can also show them where there are potential holes and issues in their current business models.

However, because of past cases of mismanagement, such as Cambridge Analytica, customers have become increasingly worried about sharing their data with businesses, for fear of personal data such as credit card details or home addresses being leaked. In 2018, the European Union's General Data Protection Regulation, or GDPR, came into effect to help minimise the risk of data breaches. Nevertheless, this still hasn’t stopped all businesses from collecting or sharing data illegally, which in turn has damaged the trustworthiness of even the most law-abiding businesses who need to collect relevant consumer data.

Transparency is key to successful data collection for digital transformation. Your priority should be to always think about the end user and the impact poorly managed data may have on them.
Explain your methods for data collection clearly, ensure you can provide a clear end-to-end map of how consumers’ data is being used, and always follow the law in order to keep your consumers, current and potential, safe from harm.

Make sure there is a process for accountability

Digital tools are usually brought in to replace a human being with qualifications and a wealth of experience. If this human being were to make a mistake in their line of work, then they would be held accountable and appropriate action would be taken. This process would then restore trust between business and consumer and things would carry on as usual.

But what happens if a machine makes an error? Who is accountable?

Unfortunately, some businesses have chosen to implement digital transformation tools in order to avoid corporate responsibility. This attitude will only cause harm, potentially lethal harm, to a business's reputation.

If you choose to implement digital tools, ensure you have a valid process for accountability which creates trust between yourself and your consumers and is representative of and fair to every group in society you’re potentially addressing.

Businesses must be aware of the potential ethical risks that come with badly managed digital transformation and the effects this may have on their brand’s reputation. Before implementing any technology, ensure you can, and will, do so in a transparent, trustworthy, fair, representative and law-abiding way.

If you’re in the world of Data & Analytics and looking to take a step up or find the next member of your team, we can help. Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more.

Defragmenting Data Analytics

This week's guest blog is written by Moray Barclay.

Around 20 years ago I was showing some draft business plans with cashflow projections to my new boss. His name was Marc Destrée and I concluded by saying I’d like to get the finance department involved.

“No”, Marc replied. He paused for several seconds, looked up from his desk, and explained, “Do the internal rate of return. Then we discuss. Then we give it to finance.”

He was right of course, for three reasons which together represent best practice. Firstly, it cemented the separate accountabilities between the different job functions responsible for the business case and financial governance. Secondly, there were no technical barriers to separating the “cashflow creation process” and the “P&L creation process” as everyone in the organisation used the same product: Excel. Thirdly, it assigned the right skills to activities.

Today, organisations have no equivalent best practice upon which to build their data analytics capability. The lack of best practice is caused by fragmentation: fragmentation of job functions, fragmentation of products, and fragmentation of skills. This is not necessarily a bad thing: fragmentation drives innovation, and those organisations who get it right will gain huge competitive advantage. But the application of best practice militates against unnecessary fragmentation and hence unnecessary inefficiencies.

So how could best practice be applied to an organisation’s data analytics capability? In other words, how do we defragment data job functions, data products and data skills?

Defragmenting data job functions

A good starting point to understanding best practice for data job functions is the informative and well-written publication “The scientist, the engineer and the warehouse”, authored by the highly respected Donald Farmer of TreeHive Strategy.
He includes references to four job functions: (i) the data scientist, (ii) the data engineer, (iii) the business intelligence analyst and (iv) the departmental end user.

(i) The data scientist: The accountability of the data scientist is to build data science models using their skills in maths and coding to solve business problems. In addition to using open source technologies, such as Python and R, data scientists can and do use data science platforms such as Knime which enable them to spend more time on maths and less time on coding (more on data science platforms later).

(ii) The data engineer: The accountability of the data engineer is to build robust and scalable data pipelines which automate the movement and transformation of data across the organisation’s infrastructure, using their skills in database engineering, database integration, and a technical process called extract/transform/load (ETL) and its variants (more on ETL production platforms later).

(iii) The business intelligence (BI) analyst: Donald Farmer’s publication does not address the accountabilities of the BI analyst in any detail because that is not its focus. Unlike the clearly defined roles of data scientists and data engineers, there are no best practice descriptions for the role of BI analyst. Typical accountabilities often include designing data visualisations from existing datasets, building these visualisations into reports or online dashboards and automating their production, and configuring end users to ensure they only have access to data that they are approved to see. Beyond these core accountabilities, BI analysts sometimes create entirely new datasets by building complex analytic models to add value to existing datasets, using either a suitable open source technology (such as Python, but used in a different way to data scientists) or a data analytic platform such as Alteryx which enables the creation of code-free analytic models.
One final point: a BI analyst might also build data science models, albeit typically more basic ones than those built by data scientists. BI analysts will inevitably become more like data scientists in the future, driven by their natural curiosity and ambitions, by vendors creating combined data science and data analytic platforms, and by organisations wanting to benefit from the integration of similar functions.

(iv) The departmental end-user: A departmental end-user is generally the most data-centric person within a department: it might be a sales operations professional within a sales department, for example. I am told that when Excel was first introduced into organisations in the 1980s, there would be a “go-to Excel expert”; self-evidently, over time everyone learned how to use it. I was there when CRM systems like Netsuite appeared 20 years later, and the same thing happened: initially there would be one or two pioneers, but eventually everyone learned to use them. The same democratisation is happening and will continue to happen with business intelligence. In the same way that CRM and Excel are used by everyone who needs to, soon anyone will be able to build their own data visualisations and reports to help identify and solve their own problems. In some organisations such as BP this is already well-established. And why stop there? If a departmental end-user can model different internal rates of return and create visualisations, then why should they not apply their own data science techniques to their own datasets?
But this can only happen if the role of the BI analyst has an accountability for democratisation, in addition to those mentioned earlier.

In summary, the following is a list of best practice accountabilities for the BI analyst:

(1) Build and automate the initial set of business intelligence reports and visualisations
(2) Create the data governance framework to enable self-service by departmental end-users
(3) Act as the initial go-to business intelligence expert
(4) Evangelise a data-driven culture and mentor those who want to become proficient in self-service
(5) Deploy resources which over time make redundant the role of a go-to business intelligence expert
(6) Over time, increase time devoted to creating innovative datasets by building complex analytic models which add value to existing datasets, using open source technologies and/or a data analytic platform
(7) Work with the data science function in such a way that over time the data science function and the BI function can be merged

The above best practice eventually results in the role of the BI analyst, or the BI analyst team, becoming redundant, much in the way that the role of a dedicated Excel specialist died out in the mid-1980s. As mentioned earlier, because BI analysts will move into data science, this should not result in people losing their jobs.

Defragmenting data products

Unlike open source technologies, the data product landscape is highly fragmented. Products include data science platforms, data analytic platforms, platforms which are more visualisation-centric, and platforms which are more focused on data governance. There are also ETL production platforms which are in the domain of the data engineer but which include functionality to build some types of analytic models.

Fragmented markets eventually consolidate. Even the three broadest cloud vendors, Amazon, Google and Microsoft, do not cover the entire landscape.
For visualisation there are Quicksight, Data Studio, and Power BI respectively, as well as competitive products, most obviously Tableau; for ETL production platforms there are Athena, Cloud Dataflow and Azure Data Factory, as well as competitive products such as Talend. But smaller vendors have the lead in data science platforms and data analytic platforms. The hiring by Microsoft of the Python inventor Guido van Rossum two months ago points to their ambitions in data science platforms and data analytic platforms.

Market consolidation in 2021 seems inevitable, but the details of actual acquisitions are not obvious. After all, it was Salesforce which bought Tableau in 2019: not Amazon, Google or Microsoft. Best practice for organisations is to consider possible vendor consolidation as part of their procurement process, because product fragmentation means there is a corresponding fragmentation of skills.

Defragmenting data skills

Fragmentation of data skills means that the market for jobs, particularly contract jobs, is less elastic than it could be. The fragmentation of skills is partly caused by the fragmentation of products and their associated education resources and certification. Vendors’ product pricing typically falls into three categories:

(i) more expensive commercial products (c. £500 - £5000 per user per month) which include free online education resources and certification;
(ii) inexpensive commercial products (c. £5 - £50 per user per month) which usually require a corporate email address but have free online education resources and reasonably-priced certification exam fees (c. £100 - £200); and
(iii) products which are normally expensive but have an inexpensive licensed version that cannot be used for commercial purposes, again including free online education resources and certification.

The last of these approaches is best practice for solving the fragmentation of skills because the barriers to learning (i.e. high product cost or the need for a corporate email address) are removed. Best practice includes the Microstrategy Analyst Pass, which is available to anyone and costs $350 per year including a non-commercial product licence, online education resources and access to certification exams.

University students (as well as self-educated hackers) learn open source technologies and one would expect that those skills are sufficient for them to enter the workplace in any data analytics environment. Yet several vendors who provide the more expensive commercial products (c. £500 - £5000 per user per month) and do not have discounted licences for non-commercial purposes make one exception: universities. At face value, this seems benign or even generous. But it contributes to the inelasticity of the job market at graduate level, because an unintended consequence is that some graduate data analytics jobs require the graduate to be competent in a product before they have started work. Best practice is for organisations to employ graduates based on their skills in maths, statistics and open source technologies, not product.

In seeking corporate acquisitions, vendors might find that their customers value “education bundling” as much as “product bundling”. Customers who are happy to pick, for example, the best visualisation product and the best data storage product from different vendors might be more attracted to their people using a single education portal with the same certification process across all products. And if an organisation can allocate 100% of its education budget to a single vendor then it will surely do so. Best practice is for vendors to consider the value of consolidating and standardising education resources, and not just products, when looking at corporate acquisitions.

Defragmenting data analytics

Implementing a best practice data analytics capability based on the principles of defragmentation has profound consequences for an organisation.
It enables a much richer conversation than the one which took place 20 years ago.

A young business development manager is showing some draft business plans to their new boss. They conclude by saying they’d like to get a data scientist involved.

“No”, the boss replies. He pauses for several seconds, looks up from his desk and explains, “Segment our customer base in different ways using different clustering techniques. Then run the cashflow scenarios. Then we discuss. Then we give it to data science.”

You can view Moray's original article here.

Moray Barclay is an experienced Data Analyst working in hands-on coding, Big Data analytics, cloud computing and consulting.


