Senior Machine Learning Infrastructure Engineer

Salary: $200,000 – $250,000 per annum
Job Type: Permanent

ML Infrastructure Engineer

Location: Remote (US or UK)

About the Role

We're looking for an ML Infrastructure Engineer to build and scale production systems for cutting-edge generative AI models. You'll architect scalable inference pipelines, optimize model deployment, and ensure our 3D and multimodal generation systems run reliably at scale.

What You'll Do

  • Design and deploy high-performance backend systems for serving generative models in production
  • Build and optimize GPU-based inference services with a focus on latency, throughput, and cost efficiency
  • Implement model optimization techniques including quantization, pruning, and distillation
  • Develop robust APIs and microservices for model serving using FastAPI, Flask, or gRPC
  • Manage cloud infrastructure and CI/CD pipelines for continuous model deployment
  • Scale distributed inference systems to handle high-concurrency workloads with request batching
  • Collaborate with ML researchers to productionize diffusion models, transformers, and multimodal pipelines

Required Experience

Generative AI Models

  • Hands-on experience with diffusion models and transformer-based architectures
  • Background in multimodal pipelines combining image and 3D generation
  • Familiarity with 3D generation or computer graphics pipelines (meshes, textures, multi-view data)

Production Infrastructure

  • Strong track record building backend and infrastructure systems in production environments
  • Expert-level Python programming with production-grade API design
  • Deep experience deploying and operating ML models at scale, including GPU-based inference services, concurrency handling, request batching, and latency/throughput optimization

ML Deployment Stack

  • Proficiency with cloud platforms: AWS (SageMaker, EC2, EKS), GCP, or equivalent
  • Experience with containerization (Docker), orchestration, and CI/CD pipelines
  • Hands-on work with model optimization and distributed training/inference frameworks: ONNX Runtime, TensorRT, FSDP, DeepSpeed
  • Knowledge of distributed systems and scalable inference frameworks (Ray, Triton, TorchServe)

Nice to Have

  • Experience with real-time inference systems or streaming pipelines
  • Background in graphics rendering or game engine technologies
  • Contributions to open-source ML infrastructure projects
  • Understanding of cost optimization strategies for GPU compute

Contact

Gabriella Varela, Recruitment Consultant

Can’t Find the Right Opportunity?

If you can’t see what you’re looking for right now, send us your CV anyway – we’re always getting fresh roles through the door.