This is a remote position.

Primary Responsibilities
1. OpenStack Architecture & Platform Engineering
  • Design production-grade OpenStack environments across controller, compute, and storage nodes.
  • Architect HA control planes using HAProxy, Keepalived, Galera, and RabbitMQ clustering.
  • Build scalable cell-based Nova architectures.
  • Implement multi-region replication strategies.
  • Perform platform capacity modeling and growth forecasting.
2. Compute Virtualization (Nova)
  • Nova scheduler tuning and filters.
  • CPU pinning and isolation.
  • NUMA topology alignment.
  • HugePages configuration.
  • Live migrations and evacuations.
  • GPU passthrough and SR-IOV provisioning.
The hypervisor stack includes KVM, QEMU, libvirt, and VirtIO.
3. Networking & SDN (Neutron)
  • ML2 plugin architecture.
  • OVS, OVN, Linux Bridge deployments.
  • VXLAN and Geneve overlays; VLAN segmentation.
  • DVR and L3 routing.
  • Floating IP NAT design.
  • SR-IOV and DPDK acceleration.
  • Integration with BGP EVPN, MPLS, VRFs, and SD-WAN.
4. Storage Engineering
Ceph (Primary Requirement)
  • RBD block storage.
  • CephFS file storage and RGW object storage.
  • CRUSH map tuning.
  • Placement group optimization.
  • BlueStore performance tuning.
  • NVMe and SSD tiering.
Additional exposure to LINSTOR, DRBD, iSCSI, and NVMe-oF is preferred.
5. Image & Lifecycle Services
  • Glance image pipelines.
  • QCOW2 optimization.
  • Cloud-init automation.
  • Golden image lifecycle management.
6. Identity & Access (Keystone)
  • RBAC modeling.
  • LDAP/AD integration.
  • SAML/SSO federation.
  • Token lifecycle management.
7. Orchestration & Automation
  • Heat orchestration templates.
  • Terraform automation.
  • Ansible playbooks.
  • CI/CD for infrastructure.
Deployment frameworks include Kolla-Ansible, OpenStack-Ansible, TripleO, and MAAS/Juju.
8. Kubernetes & Containerized Control Planes
  • Operate OpenStack on Kubernetes.
  • Helm/Operator-based deployments.
  • Pod and persistent volume troubleshooting.
9. Bare Metal Provisioning (Ironic)
  • PXE/iPXE pipelines.
  • Hardware introspection.
  • Integration with MAAS/Foreman.
10. Observability & Reliability Engineering
  • Prometheus and Grafana monitoring.
  • ELK logging pipelines.
  • Incident response and RCA.
  • SLA tracking and alert tuning.
11. Upgrade & Lifecycle Management
  • Major version upgrades.
  • Rolling compute upgrades.
  • Database migrations.
  • Zero-downtime patching.

Requirements

Required Technical Experience
  • 8–12+ years of Linux systems engineering.
  • 5+ years of OpenStack production operations.
  • Strong KVM virtualization expertise.
  • Networking: BGP, VXLAN, EVPN.
  • Storage: Ceph production operations.
  • Databases: MariaDB/Galera.
  • Messaging: RabbitMQ.
  • Automation: Ansible/Terraform.
  • Scripting: Python/Bash.
Preferred Skills
  • Platform9 / Canonical / Red Hat OpenStack.
  • Ironic bare-metal provisioning.
  • DPDK / SR-IOV acceleration.
  • GPU workloads.
  • Hybrid cloud integrations.
Work Model Requirements
  • Remote within India.
  • Mandatory U.S. EST shift overlap.
  • Night shift operations required.
  • On-call rotation participation.