AI Driven Data – Built in the North and Midlands | How Data Engineers are Using AI to Build Better Data Pipelines

Across organisations, AI is rapidly changing how data-driven teams operate, from how insights are generated to how decisions are made and executed. Whether it’s CRO, Engineering, Analytics, or Data Science, each function is experiencing a shift in responsibilities, required skillsets, and the speed at which value can be delivered.

In this series, I am working with current leaders in this space across the North of England and the Midlands to explore how AI is impacting each of these core areas, highlighting the opportunities, challenges, and practical implications for teams looking to stay ahead. From automation and augmentation to entirely new ways of working, this collection breaks down what AI means in real terms for the people and functions at the heart of modern data organisations.

This is the second article in the series, in which I collaborated with Head of Engineering Stephen Lynch, who has worked at companies including Penguin, CloudFactory, and Chetwood.

How Data Engineers Are Using AI to Build Better Data Pipelines

AI is finding its way into almost every part of the modern data stack.

For data engineers, the value isn’t in replacing engineering work. It’s in reducing the time spent on repetitive tasks such as schema design, pipeline development, monitoring, and documentation.

Tools like Snowflake Cortex, dbt Copilot, and Matillion Maia can help accelerate delivery, but they’re only part of the picture. The real question is where AI genuinely adds value and where engineering judgement still matters.

Where AI fits into the Data Engineering lifecycle

One thing I often see when people talk about AI in data engineering is a focus on individual tools rather than where those tools actually create value.

From my perspective, a typical data engineering lifecycle looks something like this:

  1. Customer requests data
  2. Architect and Business Analyst investigate requirements and define a model
  3. Engineer builds schema
  4. Engineer ingests data and discovers it doesn’t conform to the expected schema
  5. Engineer amends schema
  6. Engineer builds pipeline to ingest data
  7. Engineer resolves issues with the data and pipeline
  8. Engineer defines and deploys pipeline monitoring
  9. Engineer makes improvements based on monitoring insights
  10. Engineer creates supporting documentation and releases into hypercare

In most cases, steps three through nine consume the majority of the effort. They’re iterative, they’re often complex, and they’re where engineers spend most of their time troubleshooting, refining, and optimising.

They’re also the areas where AI can provide the greatest benefit.
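To make steps four and five more concrete, here is the kind of conformance check an engineer, or an AI assistant, might write to catch schema drift before a load fails. It is a minimal Python sketch that assumes records arrive as dictionaries; `EXPECTED_SCHEMA` and its column names are hypothetical, and in practice this logic would more often live in dbt tests or platform-native validation than in hand-rolled code.

```python
from typing import Any

# Hypothetical expected schema for illustration: column name -> expected Python type.
EXPECTED_SCHEMA: dict[str, type] = {
    "customer_id": int,
    "email": str,
    "signup_date": str,
}


def schema_drift(record: dict[str, Any], expected: dict[str, type]) -> dict[str, list[str]]:
    """Compare one incoming record with the expected schema and report the differences."""
    # Columns the model expects but the source did not send.
    missing = [col for col in expected if col not in record]
    # Columns the source sent that the model does not know about.
    unexpected = [col for col in record if col not in expected]
    # Columns present with a non-null value of the wrong type.
    wrong_type = [
        col
        for col, col_type in expected.items()
        if col in record and record[col] is not None and not isinstance(record[col], col_type)
    ]
    return {"missing": missing, "unexpected": unexpected, "wrong_type": wrong_type}


if __name__ == "__main__":
    sample = {"customer_id": "123", "email": "a@example.com", "country": "UK"}
    print(schema_drift(sample, EXPECTED_SCHEMA))
    # → {'missing': ['signup_date'], 'unexpected': ['country'], 'wrong_type': ['customer_id']}
```

Reporting all three kinds of difference at once, rather than failing on the first, gives the engineer the full picture needed to amend the schema in a single pass.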

Using AI during schema design and pipeline development

If we look at the lifecycle again, there are already several tools that can support data engineers throughout the development process.

Snowflake Cortex Code can help with schema design, validation, and refinement. dbt Copilot can support model development, testing, and pipeline creation. Tools such as Matillion Maia can help accelerate pipeline development and operational workflows.

The opportunity isn’t necessarily about introducing more tools. It’s often about understanding what capabilities already exist within the platforms you’re using today.

That’s where I think many organisations should focus first. Before investing in another AI platform, it’s worth asking what functionality is already available within your existing stack and whether you’re getting value from it.

That said, AI-generated outputs should still be treated as a starting point rather than a finished product.

Using Cortex AI functionality, it’s possible to generate Snowflake SQL, dbt models, validation frameworks, and tests. It can help accelerate development considerably. But those outputs still need to be reviewed, optimised, and productionised by experienced engineers.

Just because something works doesn’t mean it works well.

Improving monitoring and operational efficiency

The conversation doesn’t stop once a data product is live.

In many organisations, monitoring pipelines and responding to failures remains one of the most time-consuming operational responsibilities for data engineering teams.

Tools such as Sifflet are helping address this challenge through AI-powered monitoring, alerting, and anomaly detection.

Rather than waiting for something to fail, these platforms can help identify potential issues earlier, allowing engineers to intervene before business users are impacted.
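Commercial platforms use their own, more sophisticated detection methods, and this is not how any particular product works. As a rough illustration of the underlying idea only, here is a minimal Python sketch that flags a daily row count sitting far outside its recent history using a simple z-score. The three-standard-deviation threshold and the sample counts are assumptions chosen for the example.

```python
from statistics import mean, stdev


def is_anomalous(history: list[int], latest: int, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations from the historical mean."""
    if len(history) < 2:
        return False  # not enough history to judge what "normal" looks like
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return latest != mu  # any change from a perfectly flat history is unusual
    return abs(latest - mu) / sigma > threshold


if __name__ == "__main__":
    # Hypothetical daily row counts for one table over the past week.
    daily_row_counts = [10_120, 9_980, 10_050, 10_210, 9_940, 10_080, 10_000]
    print(is_anomalous(daily_row_counts, 10_090))  # → False (a normal day)
    print(is_anomalous(daily_row_counts, 4_200))   # → True (looks like a partial load)
```

A check like this catches a partial load the morning it happens, rather than when a dashboard user notices the numbers look wrong.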

I can also see opportunities for tighter integration with platforms such as ServiceNow, for example automatically raising tickets when issues occur to streamline the operational response.
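As a sketch of what that integration might involve, the Python below turns a monitoring alert into the JSON body for a new record in ServiceNow’s `incident` table, which the ServiceNow Table API accepts via a POST to `/api/now/table/incident`. The `PipelineAlert` type, the severity mapping, and the instance name are hypothetical; authentication and the HTTP call itself are deliberately left out, and field values should be checked against how your own instance is configured.

```python
import json
from dataclasses import dataclass


@dataclass
class PipelineAlert:
    """Hypothetical alert raised by a monitoring tool."""
    pipeline: str
    check: str
    detail: str
    severity: str  # assumed to be "high", "medium" or "low"


# Assumed mapping onto ServiceNow's default urgency/impact choices (1 = High, 2 = Medium, 3 = Low).
SEVERITY_TO_LEVEL = {"high": "1", "medium": "2", "low": "3"}


def build_incident_payload(alert: PipelineAlert) -> dict[str, str]:
    """Build the JSON body for a new incident; unknown severities default to Low."""
    level = SEVERITY_TO_LEVEL.get(alert.severity, "3")
    return {
        "short_description": f"[{alert.pipeline}] {alert.check} failed",
        "description": alert.detail,
        "urgency": level,
        "impact": level,
    }


def incident_url(instance: str) -> str:
    """Table API endpoint for the incident table; `instance` is a placeholder."""
    return f"https://{instance}.service-now.com/api/now/table/incident"


if __name__ == "__main__":
    alert = PipelineAlert("orders_daily", "row_count_anomaly", "Loaded 4,200 rows against a typical 10,000.", "high")
    print(incident_url("example-instance"))
    print(json.dumps(build_incident_payload(alert), indent=2))
```

Keeping payload construction separate from sending makes the mapping easy to test and review before anything touches a live ticketing system.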

Over time, I expect predictive monitoring capabilities to become increasingly valuable as organisations look to reduce manual oversight and improve platform reliability.

The productivity question: is AI worth the cost?

One of the biggest conversations surrounding AI in data engineering isn’t capability. It’s economics.

Tools such as Snowflake Cortex can become expensive surprisingly quickly as token usage and compute requirements increase.

The question organisations need to answer isn’t whether AI can generate code or automate tasks. We already know it can.

The more important question is whether the productivity gains outweigh the associated costs.

There needs to be a clear comparison between improved engineering efficiency and the ongoing cost of running these services at scale.
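One simple way to frame that comparison is to set the value of engineering time saved against the monthly cost of the service. The Python sketch below assumes you can estimate hours saved per engineer and a loaded hourly cost; every figure in the example is hypothetical, and a real cost model would also need to account for review effort and variable token usage.

```python
def monthly_net_benefit(
    engineers: int,
    hours_saved_per_engineer: float,
    loaded_hourly_cost: float,
    monthly_ai_spend: float,
) -> float:
    """Value of engineering time saved per month minus the monthly cost of the AI service."""
    time_value = engineers * hours_saved_per_engineer * loaded_hourly_cost
    return time_value - monthly_ai_spend


def break_even_hours_per_engineer(
    engineers: int, loaded_hourly_cost: float, monthly_ai_spend: float
) -> float:
    """Hours each engineer must save per month for the service to pay for itself."""
    if engineers <= 0 or loaded_hourly_cost <= 0:
        raise ValueError("engineers and loaded_hourly_cost must be positive")
    return monthly_ai_spend / (engineers * loaded_hourly_cost)


if __name__ == "__main__":
    # Hypothetical team: 5 engineers each saving 10 hours a month at £60/hour, with £2,000 of AI spend.
    print(monthly_net_benefit(5, 10, 60.0, 2_000.0))  # → 1000.0
    print(break_even_hours_per_engineer(5, 60.0, 2_000.0))
```

The break-even figure is often the more useful number to take into a budget conversation, because it turns an abstract cost into a concrete productivity target.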

AI should solve real problems, not simply add another line item to the technology budget.

What comes next?

One area I’m particularly interested in is the idea of agentic data pipelines.

Today, AI is helping engineers generate code, create documentation, identify issues, and accelerate delivery. The next evolution could be systems that automatically detect failures, recommend fixes, and potentially resolve some issues without direct human intervention.

The idea of self-healing pipelines that continue ingesting and exposing the majority of data despite operational issues is an interesting one.

We’re not there yet, but it’s certainly a direction worth watching.
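Fully agentic, self-healing pipelines are some way off, but the underlying idea of continuing to ingest the majority of data while isolating problem records can be sketched simply. The Python below is a deliberately basic stand-in, not an agentic system: it splits a batch into loadable and quarantined records, and halts when the rejection rate passes a threshold. The 5% threshold and the `is_valid` rule in the example are assumptions.

```python
from typing import Any, Callable

Record = dict[str, Any]


def partition_records(
    records: list[Record],
    is_valid: Callable[[Record], bool],
    max_reject_ratio: float = 0.05,
) -> tuple[list[Record], list[Record]]:
    """Split records into (loadable, quarantined).

    Raises RuntimeError when too many records fail, since that usually points to an
    upstream change that needs a human rather than silent quarantine.
    """
    good: list[Record] = []
    quarantined: list[Record] = []
    for record in records:
        (good if is_valid(record) else quarantined).append(record)
    if records and len(quarantined) / len(records) > max_reject_ratio:
        raise RuntimeError(
            f"{len(quarantined)} of {len(records)} records failed validation; halting load"
        )
    return good, quarantined


if __name__ == "__main__":
    # Hypothetical batch: 99 valid orders and one with a missing amount.
    batch = [{"order_id": i, "amount": 10.0} for i in range(99)] + [{"order_id": 99, "amount": None}]
    loaded, held_back = partition_records(batch, lambda r: r.get("amount") is not None)
    print(len(loaded), len(held_back))  # → 99 1
```

The threshold is the important design choice: below it, the pipeline keeps the business supplied and leaves a small queue for review; above it, something has likely changed upstream, and stopping is safer than quietly loading a fraction of the data.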

AI still needs Data Engineers

I’m a huge fan of Snowflake and the broader ecosystem. We’ve come a long way from when Snowflake was simply viewed as a cloud-first data warehouse.

I regularly use AI functionality to review my work and take some of the more repetitive data engineering tasks off my plate. It saves time and helps improve productivity.

However, I’m not at the stage where I can trust AI to do a perfect job.

It’s improving constantly, but there are still risks around cost, quality, and over-reliance on generated outputs.

One concern I have is the potential separation of data engineering from actual engineering skills. Understanding the data, designing robust architectures, and writing efficient, maintainable code remain essential capabilities.

AI can generate code. It cannot replace engineering judgement.

For me, AI works best alongside strong data engineering fundamentals. It’s a tool that helps accelerate delivery, reduce manual effort, and improve productivity. It shouldn’t replace the expertise required to build reliable, scalable, and well-designed data platforms.

The organisations that get the most value from AI won’t be the ones that rely on it entirely.

They’ll be the ones that combine AI capabilities with strong engineering talent and solid data foundations.

 

Building or scaling a data engineering team?  

Harnham has spent more than 19 years placing data and AI talent across the UK and US. 

If you want a team that uses AI well rather than leaning on it entirely, our Data Engineering specialists can help you find engineers who pair AI with strong fundamentals. You can also browse live Data & AI roles or benchmark your hiring with our Data & AI Salary Guides.