About the Role
We're seeking a Senior Infrastructure Engineer to help build and scale Hyperbolic's GPU Cloud Marketplace by developing a multi-tenant provisioning and virtualization solution. You'll transform raw GPUs from diverse global suppliers into a programmable, orchestrated pool that serves thousands of AI developers and researchers.
Requirements
- Experience with bare-metal provisioning and lifecycle management (e.g., IPMI/Redfish, BMC, PXE, OS deployment)
- Experience with GPU scheduling and orchestration
- Experience with infrastructure and DevOps tools (e.g., Terraform or Pulumi, CI/CD, secrets management, configuration management, observability tools)
- Experience with storage and data infrastructure for AI/ML workloads (e.g., object storage, block storage, distributed file systems)
- Experience with API design and cloud-init
- Experience with GPU architecture, CUDA, and GPU compute
- Experience working with hardware vendors or vendor engineering teams
- Experience building and scaling cloud infrastructure or distributed systems in production environments
Bonus Skills
- Familiarity with high-performance networking technologies such as InfiniBand and RoCE
- Experience with distributed storage systems such as Ceph, Weka, or VAST Data