Overview
We are seeking a highly motivated and experienced Generative AI Operations (GenAI Ops) Engineer to join our forward-thinking team.
In this position, you will play a key role in building, deploying, and maintaining the operational backbone for advanced generative AI models and services. You will collaborate with data scientists, machine learning engineers, and software developers to ensure our GenAI applications, including complex multi-agent systems, are scalable, reliable, and efficient across leading cloud platforms. If you are passionate about operationalizing large-scale AI systems and eager to make a meaningful impact, we want to hear from you.
Responsibilities
- Design, implement, and maintain automated CI/CD pipelines for training, evaluating, and deploying large language models (LLMs) and AI agents
- Deploy and manage sophisticated multi-agent AI systems, ensuring seamless agent-to-agent communication and collaboration for automating complex business processes
- Integrate AI agents with external tools and APIs, using open standards like Model Context Protocol (MCP) to ensure interoperability and security
- Use AI-powered development tools to speed up writing infrastructure code, testing, and troubleshooting in cloud environments
- Define and manage GenAI infrastructure using cloud-native or cloud-agnostic Infrastructure as Code (IaC) tools such as Terraform
- Implement monitoring and logging solutions to track model and agent performance, resource usage, and system health, including tracing agent actions and multi-step conversational flows
- Design and optimize scalable architectures for model serving and inference, focusing on performance and cost-effectiveness
- Apply security best practices and ensure compliance with industry standards and regulations for GenAI infrastructure and data
Requirements
- Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience
- At least 3 years of experience in a DevOps, SRE, or MLOps role with a focus on cloud infrastructure
- Hands-on experience with cloud services from major providers such as AWS, Google Cloud, or Azure
- Strong background in building and managing CI/CD pipelines using tools like Jenkins, GitLab CI, or cloud-native solutions
- Proficiency in at least one scripting language such as Python or Bash
- Experience with Infrastructure as Code (IaC) tools like AWS CDK, CloudFormation, or Terraform
- Familiarity with cloud-native GenAI services such as Amazon Bedrock, Azure AI Foundry, or Google Vertex AI
- Understanding of the architecture and operational challenges of Large Language Models (LLMs)
- Practical experience with generative AI frameworks and SDKs such as Hugging Face, OpenAI, or LangChain
- Experience with monitoring and observability tools for AI/ML systems such as Prometheus or Grafana
- Experience with ML experiment tracking and versioning tools like MLflow or Weights & Biases
- Hands-on experience with containerization and orchestration technologies such as Docker and Kubernetes
- Fluent English communication skills at a B2+ level
Nice to have
- Master’s degree or PhD in Computer Science, AI, Machine Learning, or a related discipline
- Experience designing or managing multi-agent systems or orchestrated AI workflows
- Relevant certifications in cloud or DevOps technologies
- Strong analytical and problem-solving skills with the ability to thrive in a fast-paced, collaborative environment
Turkiye
CONTINUOUS UPSKILLING, LEARNING & DEVELOPMENT
- Diversity of tasks and projects
- Assessment center for objective review of competency level
- Personal development plan
- Mentoring programs and leadership development
- Certification and professional development support
- Access to learning platforms including more than 2,500 internal courses and the LinkedIn Learning library with 20,000+ courses
- English courses taught by certified teachers
CORPORATE BENEFITS
- Extra leave days
- Referral bonuses
COMPENSATION PACKAGE
- Competitive compensation paid in USD
- Regular salary and performance reviews
MEDICAL & HEALTHCARE
- Private health insurance
- Well-being events
WORKING ENVIRONMENT
- Recreation areas and kitchens
- Tea, coffee, and snacks
- Well-being events
- Sports equipment and game consoles
- IT equipment
- Microsoft's Software Assurance Home Use Program (HUP)
About EPAM
EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
CVs in English
Please note that our Talent Attraction Team reviews applications and CVs submitted in English.