Junior Data Engineer

London
£30,000 - £40,000 per annum

A hands-on role for a junior data engineer at a growing media company based in London. You will sit in a small team of data engineers working on an exciting end-to-end project.

THE COMPANY:

This role is with a media company that works with a wide range of tech brands and has grown rapidly since its launch five years ago. They have a centralised tech platform and are currently building a Google Cloud Platform environment to serve the rest of the data team.

THE ROLE:

The successful candidate will take on a core data engineer role within the business's small data function.

In particular, this role will involve:

  • End-to-end data engineering duties, including building a new data extraction process in Python.
  • Running transformation and loading processes in production.
  • Data warehouse management.
  • Communicating with managers and meeting their requirements.

YOUR SKILLS & EXPERIENCE:

  • 1-2 years' commercial experience in data engineering.
  • Educated to degree level in a relevant field.
  • Commercial experience with Python programming (ideally also GCP and Airflow/Docker).
  • Strong SQL knowledge and experience.

THE BENEFITS:

The selected candidate will receive a salary of between £30,000 and £40,000, depending on their experience and requirements, as well as a bonus.

HOW TO APPLY:

Please register your interest by sending your CV to Holly Neeves via the Apply link on this page.

37703/HN
  1. Permanent
  2. Big Data

Similar Jobs

Salary

US$120000 - US$130000 per annum + Additional Benefits

Location

Cincinnati, Ohio

Description

My client in Ohio is looking for big data engineering experts who want to join a cutting-edge, learning-based environment and grow technically!

Salary

US$180000 - US$200000 per annum + Additional Benefits

Location

New York

Description

A NYC healthcare leader is looking to bring in a Director of Data Engineering to join their team!

Salary

£500 - £600 per day

Location

London

Description

Data Engineer 6 month contract (Outside IR35) Azure, Python, Spark Remote/London

Salary

€40000 - €50000 per annum + Additional benefits

Location

Amsterdam, North Holland

Description

A start-up is looking for a data engineer to join their team and build ETL pipelines that will transform the experience of online supermarket shopping.

Harnham blog & news

With over 10 years' experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.

Visit our Blogs & News portal or check out our recent posts below.

From Broken Data Pipelines to Broken Data Headlines

This week's guest post is written by Moray Barclay.

Two things have caused the UK’s Test & Trace application to lose 16,000 Covid-19 test results, both of which are close to my heart. The first is the application’s data pipeline, which is broken. The second is a lack of curiosity. The former does not necessarily mean that a data application will fail. But when compounded by the latter, failure is certain.

Data Pipelines

All data applications have several parts, including an interesting part (algorithms, recently in the news), a boring part (data wrangling, never in the news), a creative part (visualisation, often a backdrop to the news), and an enabling part (engineering, usually misunderstood by the news).

Data engineering, in addition to the design and implementation of the IT infrastructure common to all software applications, includes the design and implementation of the data pipeline. As its name suggests, a data pipeline is the mechanism by which data is entered at one end of a data application and flows through the application via various algorithms to emerge in a very different form at the other end.

A well architected data application has a single pipeline from start to finish. This does not mean that there should be no human interaction with the data as it travels down the pipeline, but it should be limited to actions which can do no harm. Human actions which do no harm include: pressing buttons to start running algorithms or other blocks of code, reading and querying data, and exporting data to do manual exploratory or forensic analysis within a data governance framework.

The data pipeline for Test & Trace will look something like this:

  1. A patient manually fills out a web-form, which automatically updates a patient list.
  2. For each test, the laboratory adds the test result for that patient.
  3. The lab sends an Excel file to Public Health England with the IDs of positive patients.
  4. PHE manually transpose the data in the Excel file to the NHS Test & Trace system.
  5. The NHS T&T system pushes each positive patient's contact details to NHS T&T agents.
  6. For each positive patient, an NHS T&T contact centre agent phones them.

This is not a single pipeline because in the middle a human being needs to open up an editable file and transpose it into another file. The pipeline is therefore broken, splitting at the point at which the second Excel file is manually created. If you put yourself in the shoes of the person receiving one of these Excel files, you can probably identify several ways in which this manual manipulation of data could lead to harm.

And it is not just the data which needs to be moved manually from one side of the broken pipeline to the other side, it is the associated data types, and CSV files can easily lose data type information. This matters. You may have experienced importing or exporting data with an application which changes 06/10/20 to 10/06/20. Patient identifiers should be of data type text, even if they consist only of numbers, for future-proofing. Real numbers represented in exponential format should, obviously, be of a numeric data type. And so on.

One final point: the different versions of Excel (between the Pillar 2 laboratories and PHE) are a side-show. Treating them as the cause implies that, had the versions been the same, everything would have been fine. This is wrong. The BBC have today reported that “To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap.” This solves the specific Excel incompatibility problem (assuming the process of creating small batches is error-free) but has no bearing on the more fundamental problem of the broken data pipeline, which will stay until the manual Excel manipulation is replaced by a normal and not particularly complex automated process.

Curiosity

So where does curiosity fit in? The first thing that any Data Analyst does when they receive data is to look at it. This is partly a technical activity, but it is also a question of judgement and it requires an element of curiosity. Does this data look right? What is the range between the earliest and the latest dates? If I graph one measurement over time (in this case positive tests over time), does the line look right? If I graph two variables (such as Day Of Week versus positive tests) what does the scatter chart look like? Better still, if I apply regression analysis to the scatter chart what is the relationship between the two variables and within what bounds of confidence? How does that relate to the forecast? Why?

This is not about skills. If I receive raw data in CSV format I would open it in a Python environment or an SQL database. But anyone given the freedom to use their curiosity can open a CSV file in Notepad and see there are actually one million rows of data and not 65,000. Anyone given the freedom to use their curiosity can graph data in Excel to see whether it has strange blips. Anyone given the freedom to use their curiosity can drill down into anomalies.

Had those receiving the data from the Pillar 2 laboratories been allowed to focus some of their curiosity on what they were receiving, they would have spotted pretty quickly that the 16,000 patient results were missing. As it was, I suspect they were not given that freedom: I suspect they were told to transpose as much data as they could as quickly as possible, for what could possibly go wrong?
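As a rough illustration of the points above about data types and row counts, here is a minimal Python sketch of an automated load step. It is not the Test & Trace implementation: the column names (`patient_id`, `test_date`, `result`), the day-first date format, and the idea that the sender supplies an expected row count are all assumptions made for the example.

```python
import csv
import io
from datetime import datetime

# Assumed day-first format, so 06/10/20 is always read as 6 October 2020.
DATE_FORMAT = "%d/%m/%y"


def load_results(csv_text, expected_rows):
    """Parse test results with explicit types and fail loudly on missing rows.

    Column names and the expected_rows argument are hypothetical.
    """
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            # Identifiers stay as text, so leading zeros survive.
            "patient_id": record["patient_id"],
            # Dates are parsed with one explicit format, never guessed.
            "test_date": datetime.strptime(record["test_date"], DATE_FORMAT).date(),
            "positive": record["result"].strip().lower() == "positive",
        })
    # The "does this look right?" check: compare against the count the sender reported.
    if len(rows) != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, got {len(rows)}")
    return rows


sample = (
    "patient_id,test_date,result\n"
    "00123,06/10/20,positive\n"
    "00456,07/10/20,negative\n"
)
results = load_results(sample, expected_rows=2)
print(results[0]["patient_id"], results[0]["test_date"].isoformat())
```

A truncated file would raise an error here instead of being passed on silently, which is the kind of check a curious recipient would otherwise make by hand.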
Single Data Pipeline, Singular Curiosity: Pick At Least One

To reiterate, the current problems with T&T would never have arisen with a single data pipeline which excluded any manual manipulation in Excel. But knowing that the data pipeline was broken and manual manipulation was by design part of the solution, the only way to minimise the risk was to encourage people engaged in that manual process to engage their curiosity about the efficacy of the data they were manipulating.

In their prototype phases – for that is the status of the T&T application – data projects will sometimes go wrong. But they are much more likely to go wrong if the people involved, at all levels, do not have enough time or freedom to think, to engage their curiosity, and to ask themselves “is this definitely right?”

You can view Moray's original article here.

Moray Barclay is an Experienced Data Analyst working in hands-on coding, Big Data analytics, cloud computing and consulting.

2020: The Year of the Data Engineer

Data Engineers are the architects of Data. They lay the foundation businesses use to collect, gather, store, and make Data usable. At each stage of the pipeline, the Data is cleaned and analysed so that Data professionals can use it for their reports and Machine Learning models.

A ROLE IN HIGH DEMAND

Even as businesses reopen, reassess, and for some, remain remote, the demand for Data Engineers is high. Computer applications, Data modelling, prediction modelling, Machine Learning, and more need Data professionals to lay the groundwork to help businesses benefit in today’s Data-driven culture.

The term gets thrown around a bit, but when the majority of business has moved online, Data-driven is the name of the game. Having a Data plan and a Data team, both aligned with your business strategy, is imperative to the way business is done today. This type of innovation can offer insight for better business decisions, enhance customer engagement, and improve customer retention without missing a beat.

Without Data Engineers, Data Scientists can’t do their jobs. The volume of Data, the speed at which it is delivered, and its variety all require Engineers to create reliable and efficient systems.

As with many Data roles, Data Engineers are still in high demand in 2020. Yet a skills shortage remains. This has created an emerging field of professionals from other backgrounds who are looking to take on the role of Data Engineer and fill the gap. Whether by necessity or design, these individuals build and manage pipelines, automate projects, and see their projects through to the end result.

CAREER OPPORTUNITIES OUTSIDE THE NORM

As this growing trend emerges, it has created career opportunities for those with experience outside the normal channels of Data Engineering study. While this might involve individuals from backgrounds such as software engineering, databases, or something similarly IT-related, some businesses are also upskilling their own talented employees.

Rapid growth, reskilling, upskilling, and ever-constant changes still leave businesses with a shortage of Data Engineers to meet the demand. It’s critical to fill the gap for success. According to LinkedIn’s 2020 Emerging Jobs Report, Data Engineering is listed among the top 10 jobs experiencing growth.

THREE STEPS TOWARDS BECOMING A DATA ENGINEER

This is a vital role in today’s organisations. So, if you’re in the tech industry and want to take a deeper dive into Data as a Data Engineer, what steps can you take?

This is a time like no other. There’s time to assess your goals, take online classes, and get hands-on with projects. A grounding in computer science, mathematics, or a business-related degree is always a good start.

  1. Be well-versed in popular languages and tools such as SQL, Python, R, Hadoop, Spark, and Amazon Web Services (AWS).
  2. Prepare for an entry-level role once you have your bachelor’s degree.
  3. Consider additional education to stay ahead of the curve. This can include not only professional certifications, but higher education degrees as well.

The more experience you have, hands-on as well as academic, the more in demand you’ll be as a Data Engineer. Data Scientists might be the rockstars of Data, but Data Engineers set the stage.

As business processes have shifted online, looking for your next job has become more daunting than ever before. If you’re looking for your next opportunity in Data, take a look at our current jobs or get in touch with one of our expert consultants to find out more.
