Overview

We are seeking a Senior DevOps Engineer to enhance our high-performance computing services and collaborate closely with the scientific community to optimize research computing.

Join our team to build and operate cutting-edge HPC capabilities using automation and infrastructure-as-code. Apply now to contribute to innovative computational solutions in a dynamic environment.

Responsibilities

  • Design, implement, and maintain robust platform infrastructure using Infrastructure as Code tools such as Terraform
  • Develop, deliver, and operate research computing services and applications
  • Apply Site Reliability Engineering principles to manage HPC service deployment, monitoring, and incident response
  • Solve complex technical problems related to HPC services and user applications
  • Manage large-scale HPC, HTC, or BC computing environments for optimal performance
  • Collaborate with scientific users to tailor HPC resources to research needs
  • Automate deployment processes to ensure consistency across HPC infrastructure
  • Maintain and administer large-scale cluster and server computing software such as Slurm, LSF, or Grid Engine
  • Develop and maintain monitoring and dashboards using tools like Prometheus and Grafana
  • Work within a DevOps team environment following agile methodologies
  • Operate and utilize virtualized private cloud resources such as OpenStack
  • Administer large-scale parallel filesystems including Weka, GPFS, or Lustre
  • Use configuration management tools like Ansible, Salt, or Puppet to manage IT operations
  • Develop scripts and tools for HPC and DevOps platform operations using Bash and Python

Requirements

  • 3+ years of experience with DevOps processes and automation using Infrastructure as Code tools such as Terraform
  • Hands-on experience operating or engineering large-scale HPC or similar computing environments
  • Proven expertise in Linux system administration including TCP/IP networking and storage subsystems
  • Experience administering large-scale cluster management software such as Slurm, LSF, or Grid Engine
  • Knowledge of configuration management tools like Ansible, Salt, or Puppet
  • Experience working in agile DevOps teams
  • Ability to develop and maintain monitoring with tools such as Prometheus and Grafana
  • Experience with scripting languages such as Bash and Python for automation and tool development
  • Strong experience managing virtualized private cloud environments like OpenStack
  • Degree in a scientific discipline, or equivalent experience in computationally intensive scientific data analysis
  • Proven ability to manage relationships with third-party suppliers
  • Upper-intermediate proficiency in English (B2+)

Nice to have

  • Experience with container technologies such as LXD, Singularity, Docker, or Kubernetes
  • Operation and configuration experience with public cloud platforms like AWS, Azure, or GCP
  • Experience with HashiCorp tools such as Vault, Consul, and Nomad
  • Development experience with programming languages such as Java, C++, Python, Ruby, or Perl
  • Experience with parallel filesystems like Weka, GPFS, or Lustre


We gather like-minded people:

  • Engineering community of industry professionals
  • Friendly team and enjoyable working environment
  • Flexible schedule and opportunity to work remotely within Poland
  • Chance to work abroad for up to 60 days annually
  • Business-driven relocation opportunities

We provide growth opportunities:

  • Outstanding career roadmap
  • Leadership development, career advising, soft skills, and well-being programs
  • Certification opportunities (GCP, Azure, AWS)
  • Unlimited access to LinkedIn Learning, getAbstract, and A Cloud Guru
  • English classes

We cover it all:

  • Stable income (Employment Contract or B2B)
  • Participation in the Employee Stock Purchase Plan
  • Benefits package (health insurance, multisport, shopping vouchers)
  • Strategically located offices featuring entertainment and relaxation zones, table tennis and football, free snacks, fantastic coffee, and more
  • Referral bonuses
  • Corporate, social and well-being events

Please note:

  • The set of benefits may vary depending on the role you apply for; specifics will be discussed with our recruiter during the general interview.
  • Only shortlisted candidates will be contacted.

About EPAM

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.