Data Architect - GCP, AWS, dbt

London
£60000 - £95000 per annum

£90,000

Do you want to work with the latest tools in the data and technology space? Have you built ETL flows for cloud data warehouses? If so, this could be the role for you.

THE COMPANY:

This business is one of the UK's leading news outlets and is drawing up new plans for its data infrastructure. Having built a data lake in Scala, Spark and AWS, the company has mature data structures and needs a team to wrangle the data for the analytics team. Its tech stack is at the forefront of data and analytics, using GCP, the BigQuery data warehouse, dbt and git. In this role you will work across these technologies to deliver on targets for multiple teams across the business.

THE ROLE:

Reporting to the Head of Data, as Data Architect you will lead the design of logical and physical data models to deliver on the business's objectives. Your role will require you to:

  • Have architectural accountability for how the business collects, structures and uses data across the organisation
  • Design how the business delivers on its data needs
  • Ensure that the data models meet a high standard
  • Ensure that the architectural and modelling standards are understood by the engineering teams

YOUR SKILLS AND EXPERIENCE:

To qualify for this Data Architect role, you will require:

  • Experience creating data architectures and models in large, consumer-orientated organisations
  • The ability to demonstrate a technology-agnostic approach
  • Proven experience delivering end-to-end solutions
  • A broad understanding of data environments
  • Experience with modern data analytics and business intelligence tools: AWS, GCP, DBT, Data Lake, Spark, Scala, SQL, Tableau, Looker, LookML

THE BENEFITS:

A successful applicant will receive:

  • A salary of up to £90,000
  • The opportunity to take ownership and work with technologies at the forefront of data and analytics

HOW TO APPLY:

Please register your interest in this Data Architect role by applying via this website. For more information on this role or other roles in the Business Intelligence market, reach out to Tom Brammer at Harnham.

76283
London
£60000 - £95000 per annum
  1. Permanent
  2. Business Intelligence

Similar Jobs

Salary

£100000 - £125000 per annum + bonus and benefits

Location

London

Description

Great role for a head of analytics with significant experience across reporting, customer analysis and building teams

Salary

€640 - €720 per day

Location

Rotterdam, South Holland

Description

A Data and Analytics Consultancy is seeking a BI Consultant to work on exciting projects for its high-profile clients.

Salary

£60000 - £70000 per annum + package

Location

London

Description

Great role for someone with excellent production-level SQL and dbt experience in a cloud environment to join a rapidly expanding, well-funded start-up

Salary

£400 - £500 per day

Location

London

Description

A major name in the Housing industry is looking for an Azure Data Engineer to support their data team during their exciting cloud journey

Harnham blog & news

With over 10 years' experience working solely in the Data & Analytics sector, our consultants are able to offer detailed insights into the industry.

Visit our Blogs & News portal or check out our recent posts below.

It Takes Two: Data Architect Meets Data Engineer

Information. Data. The lifeblood of business. Information and data are used interchangeably, gathered, collected, and analysed to create actionable insights for informed business decisions. So, what does that mean exactly? And to that end, how do we know what information or data we need to make those decisions? Enter the Data Architect.

The Role of a Data Architect

Just like you might hire an architect to sketch out your dreamhouse, the Data Architect is a Data Visionary. They see the full picture and can craft the design and framework creating the blueprint for the Data Engineer, who will ultimately build the digital framework. Data Architects are the puzzle solvers who can take a jumble of puzzle pieces, in this case massive amounts of data, and put everything in order. It’s their job to figure out what’s important and what isn’t based on an organisation's business objectives.

Skills for a Data Architect might include:

  • Computer Science degree, or some variation thereof.
  • Plenty of experience working with systems and application development.
  • Extensive knowledge of, and the ability to deep dive into, Information Management.

If you’re just starting your Data Architect path, be prepared for years of building your experience in data design, data storage, and Data Management.

The Role of a Data Engineer

The Data Engineer builds the vision and brings it to life. But they don’t work in a vacuum and are integral to the Data Team, working nearly in tandem with the Data Architect. These engineers are building the infrastructure – the pipelines and data lakes. Once exclusive to the software-engineering field, the data engineer’s role has evolved exponentially as data-focused software became an industry standard.

Important skills for a Data Engineer might include:

  • Strong developer skills.
  • Understand a host of technologies such as Python, R, Hadoop, and more.
  • Craft projects to show what you can do, not just talk about what you can do – your education isn’t much of a factor when it comes to data engineering. On-the-job training does it best.
  • Social and communication skills are critical as you map initial designs, and a love of learning keeps everything humming along, even as technology libraries shift and you have to learn something new.

The Major Differences between the Data Architect and Data Engineer Roles

As intertwined as these two roles might seem, there are some crucial differences.

Data Architect

  • Crafts concept and visualises framework
  • Leads the Data Science teams

Data Engineer

  • Builds and maintains the framework
  • Provides supporting framework

With a focus on Database Management technologies, it can seem as though Data Architect and Data Engineer are interchangeable. And at one time, Data Architects did also take on the Data Engineering role. But the knowledge each has is used differently.

Whether you’re looking to enter the field of Data Engineering, want to move up or over with your years of experience to Data Architect, or are just starting out, Harnham may have a role for you. Check out our current opportunities or get in touch with one of our expert consultants to learn more.

From Broken Data Pipelines to Broken Data Headlines

This week's guest post is written by Moray Barclay.

Two things have caused the UK’s Test & Trace application to lose 16,000 Covid-19 test results, both of which are close to my heart. The first is the application’s data pipeline, which is broken. The second is a lack of curiosity. The former does not necessarily mean that a data application will fail. But when compounded by the latter it is certain.

Data Pipelines

All data applications have several parts, including an interesting part (algorithms, recently in the news), a boring part (data wrangling, never in the news), a creative part (visualisation, often a backdrop to the news), and an enabling part (engineering, usually misunderstood by the news).

Data engineering, in addition to the design and implementation of the IT infrastructure common to all software applications, includes the design and implementation of the data pipeline. As its name suggests, a data pipeline is the mechanism by which data is entered at one end of a data application and flows through the application via various algorithms to emerge in a very different form at the other end.

A well-architected data application has a single pipeline from start to finish. This does not mean that there should be no human interaction with the data as it travels down the pipeline but it should be limited to actions which can do no harm. Human actions which do no harm include: pressing buttons to start running algorithms or other blocks of code, reading and querying data, and exporting data to do manual exploratory or forensic analysis within a data governance framework.
The data pipeline for Test & Trace will look something like this:

  1. a patient manually fills out a web-form, which automatically updates a patient list
  2. for each test, the laboratory adds the test result for that patient
  3. the lab sends an Excel file to Public Health England with the IDs of positive patients
  4. PHE manually transpose the data in the Excel file to the NHS Test & Trace system
  5. the NHS T&T system pushes each positive patient's contact details to NHS T&T agents
  6. for each positive patient, an NHS T&T contact centre agent phones them.

This is not a single pipeline because in the middle a human being needs to open up an editable file and transpose it into another file. The pipeline is therefore broken, splitting at the point at which the second Excel file is manually created. If you put yourself in the shoes of the person receiving one of these Excel files, you can probably identify several ways in which this manual manipulation of data could lead to harm.

And it is not just the data which needs to be moved manually from one side of the broken pipeline to the other side, it is the associated data types, and CSV files can easily lose data type information. This matters. You may have experienced importing or exporting data with an application which changes 06/10/20 to 10/06/20. Patient identifiers should be of data type text, even if they consist only of numbers, for future-proofing. Real numbers represented in exponential format should, obviously, be of a numeric data type. And so on.

One final point: the different versions of Excel (between the Pillar 2 laboratories and PHE) are a side-show, because otherwise this implies that had the versions been the same, then everything would be fine. This is wrong. The BBC have today reported that “To handle the problem, PHE is now breaking down the test result data into smaller batches to create a larger number of Excel templates. That should ensure none hit their cap.” This solves the specific Excel incompatibility problem (assuming the process of creating small batches is error-free) but has no bearing on the more fundamental problem of the broken data pipeline, which will stay until the manual Excel manipulation is replaced by a normal and not particularly complex automated process.

Curiosity

So where does curiosity fit in? The first thing that any Data Analyst does when they receive data is to look at it. This is partly a technical activity, but it is also a question of judgement and it requires an element of curiosity. Does this data look right? What is the range between the earliest and the latest dates? If I graph one measurement over time (in this case positive tests over time), does the line look right? If I graph two variables (such as Day Of Week versus positive tests) what does the scatter chart look like? Better still, if I apply regression analysis to the scatter chart what is the relationship between the two variables and within what bounds of confidence? How does that relate to the forecast? Why?

This is not about skills. If I receive raw data in csv format I would open it in a python environment or an SQL database. But anyone given the freedom to use their curiosity can open a csv file in Notepad and see there are actually one million rows of data and not 65,000. Anyone given the freedom to use their curiosity can graph data in Excel to see whether it has strange blips. Anyone given the freedom to use their curiosity can drill down into anomalies.

Had those receiving the data from the Pillar 2 laboratories been allowed to focus some of their curiosity on what they were receiving, they would have spotted pretty quickly that the 16,000 patient results were missing. As it was, I suspect they were not given that freedom: I suspect they were told to transpose as much data as they could as quickly as possible, for what could possibly go wrong?
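As an illustration of the "look at the data" checks described above, here is a minimal Python sketch using only the standard library. It is not the Test & Trace implementation: the column names patient_id and test_date, the day-first date format and the sample rows are all hypothetical, and the cap of 65,536 rows is the legacy .xls worksheet limit, assumed here to be the one the "65,000" figure alludes to.

```python
import csv
import io
from datetime import datetime

# Row limit of the legacy .xls worksheet format (2**16 rows).
XLS_ROW_CAP = 65_536


def summarise_results(csv_text, date_format="%d/%m/%Y"):
    """Return basic sanity-check figures for a CSV of test results.

    Assumes hypothetical columns 'patient_id' and 'test_date'.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # The csv module returns every field as text, so identifiers such as
    # "00123" keep their leading zeros instead of becoming the number 123.
    patient_ids = [row["patient_id"] for row in rows]

    # Parse dates with an explicit format rather than letting a tool guess
    # whether 06/10/2020 means 6 October or 10 June.
    dates = [
        datetime.strptime(row["test_date"], date_format).date() for row in rows
    ]

    return {
        "row_count": len(rows),
        "earliest": min(dates) if dates else None,
        "latest": max(dates) if dates else None,
        "unique_patients": len(set(patient_ids)),
        # A count sitting exactly on the cap (with or without a header row)
        # is a hint that the file may have been truncated.
        "possibly_truncated": len(rows) in (XLS_ROW_CAP, XLS_ROW_CAP - 1),
    }


# Hypothetical sample data, for illustration only.
sample = (
    "patient_id,test_date\n"
    "00123,06/10/2020\n"
    "00124,25/09/2020\n"
    "00123,02/10/2020\n"
)
summary = summarise_results(sample)
print(summary)
```

None of this replaces judgement. The point is only that the row count, the date range and the identifier types are cheap to inspect before anything is passed further down the pipeline.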
Single Data Pipeline, Singular Curiosity: Pick At Least One

To reiterate, the current problems with T&T would never have arisen with a single data pipeline which excluded any manual manipulation in Excel. But knowing that the data pipeline was broken and manual manipulation was by design part of the solution, the only way to minimise the risk was to encourage people engaged in that manual process to engage their curiosity about the efficacy of the data they were manipulating.

In their prototype phases – for that is the status of the T&T application – data projects will sometimes go wrong. But they are much more likely to go wrong if the people involved, at all levels, do not have enough time or freedom to think, to engage their curiosity, and to ask themselves “is this definitely right?”

You can view Moray's original article here.

Moray Barclay is an Experienced Data Analyst working in hands-on coding, Big Data analytics, cloud computing and consulting.
