Summary
We are looking for a Senior MLOps Engineer to support the AI CoE in building and scaling machine learning operations. This position requires both strategic oversight and direct involvement in MLOps infrastructure design, automation, and optimization. The person will lead a team while collaborating with various stakeholders to manage machine learning pipelines and model deployments in GCP / AWS / Azure. A key part of this role is managing data and models using data cataloging tools, ensuring that they are well documented, versioned, and accessible for reuse and auditing.
Job Description:
⮚ Deploy models to production in GCP and own the model maintenance, monitoring, and support activities
⮚ Split time between high-level strategy and hands-on technical implementation
⮚ Architect, build, and maintain scalable MLOps pipelines, with a focus on GCP / AWS / Azure services such as Vertex AI, GKE, Cloud Storage, and BigQuery; stay up to date with the latest trends and advancements in MLOps
⮚ Implement and optimize CI/CD pipelines for machine learning model deployment, ensuring minimal downtime and streamlined processes
⮚ Work closely with data scientists and data engineers to ensure efficient data processing pipelines, model training, testing, and deployment
⮚ Manage data catalog tools for model and dataset versioning, lineage tracking, and governance. Ensure that all models and datasets are properly documented and discoverable
⮚ Develop automated systems for model monitoring, logging, and performance tracking in production environments
⮚ Lead the integration of data cataloging tools (e.g., OpenMetadata), ensuring the traceability and versioning of both datasets and models
Required Experience:
⮚ Bachelor’s degree in Computer Science, Engineering, or a similar quantitative discipline
⮚ 4+ years of professional experience in MLOps or similar roles
⮚ Ability to write code for machine learning workloads
⮚ Excellent analytical and problem-solving skills for technical challenges related to MLOps
⮚ Excellent English proficiency, presentation, and communication skills
⮚ Proven experience in deploying, monitoring, and managing machine learning models on GCP / AWS / Azure
⮚ Hands-on experience with data catalog tools
⮚ Expertise in GCP / AWS / Azure services such as Vertex AI, GKE, BigQuery, Cloud Build, Endpoints, etc. for building scalable ML infrastructure (official GCP / AWS / Azure certifications are a huge plus)
⮚ Experience with model serving frameworks (e.g., TensorFlow Serving, TorchServe) and MLOps tools such as Kubeflow, MLflow, or TFX