Senior Machine Learning Infrastructure Engineer

Salary: $200,000 – $250,000 per annum
Job Type: Permanent

ML Infrastructure Engineer

Location: Remote (US or UK)

About the Role

We're looking for an ML Infrastructure Engineer to build and scale production systems for cutting-edge generative AI models. You'll architect scalable inference pipelines, optimize model deployment, and ensure our 3D and multimodal generation systems run reliably at scale.

What You'll Do

  • Design and deploy high-performance backend systems for serving generative models in production
  • Build and optimize GPU-based inference services with a focus on latency, throughput, and cost efficiency
  • Implement model optimization techniques including quantization, pruning, and distillation
  • Develop robust APIs and microservices for model serving using FastAPI, Flask, or gRPC
  • Manage cloud infrastructure and CI/CD pipelines for continuous model deployment
  • Scale distributed inference systems to handle high-concurrency workloads with request batching
  • Collaborate with ML researchers to productionize diffusion models, transformers, and multimodal pipelines

Required Experience

Generative AI Models

  • Hands-on experience with diffusion models and transformer-based architectures
  • Background in multimodal pipelines combining image and 3D generation
  • Familiarity with 3D generation or computer graphics pipelines (meshes, textures, multi-view data)

Production Infrastructure

  • Strong track record building backend and infrastructure systems in production environments
  • Expert-level Python programming with production-grade API design
  • Deep experience deploying and operating ML models at scale, including GPU-based inference services, concurrency handling, request batching, and latency/throughput optimization

ML Deployment Stack

  • Proficiency with cloud platforms: AWS (SageMaker, EC2, EKS), GCP, or equivalent
  • Experience with containerization (Docker), orchestration, and CI/CD pipelines
  • Hands-on work with model optimization and distributed training/inference frameworks: ONNX Runtime, TensorRT, FSDP, DeepSpeed
  • Knowledge of distributed systems and scalable inference frameworks (Ray, Triton, TorchServe)

Nice to Have

  • Experience with real-time inference systems or streaming pipelines
  • Background in graphics rendering or game engine technologies
  • Contributions to open-source ML infrastructure projects
  • Understanding of cost optimization strategies for GPU compute

Contact

Gabriella Varela, Recruitment Consultant

Can’t Find the Right Opportunity?

If you can’t see what you’re looking for right now, send us your CV anyway – we’re always getting fresh roles through the door.