DevOps / Platform Engineer – Healthcare SaaS (Azure, Python/Django)

Role Summary
We are seeking a senior, hands-on DevOps / Platform Engineer to own production readiness, reliability, security, multi-tenancy, and cost optimization for our Azure-based, AI-native healthcare platform.
This role is critical to ensuring the company operates as a secure, compliant, highly available SaaS platform as we onboard regulated healthcare customers and scale multi-tenant workloads. You will act as the platform owner, working closely with engineering, security, and leadership.

Key Responsibilities

Production Readiness & Reliability
Own production readiness across development, staging, and production environments
Design and implement:
Safe deployment strategies (blue-green, rolling, canary)
Automated rollbacks
Health checks, monitoring, and alerting
Operate and maintain high-availability, fault-tolerant systems
Lead incident response, root-cause analysis (RCA), and preventive remediation
Establish SLOs, SLIs, and operational runbooks

Multi-Tenancy & Scalability
Design and operate secure multi-tenant infrastructure for a healthcare SaaS platform
Implement tenant isolation across compute, network, data, configuration, and secrets
Enable tenant-aware deployments and customer onboarding
Ensure scalability without cross-tenant impact, data leakage, or performance degradation

Azure Cloud Infrastructure
Design, build, and operate Azure-first infrastructure, including:
AKS and containerized microservices
App Services, Azure Functions, and background workers
Azure SQL, Cosmos DB, Blob Storage
VNETs, private endpoints, NSGs, firewalls, and ingress controls
Manage infrastructure using Infrastructure as Code (Terraform preferred; Bicep/ARM acceptable)
Ensure environments are reproducible, auditable, and secure by default

Azure Security & Identity
Implement and manage Azure security and identity services:
Microsoft Entra ID (RBAC, managed identities, conditional access)
Microsoft Defender for Cloud
Azure Key Vault (secrets, keys, certificates)
Enforce least-privilege access, strong authentication, and audit logging
Support SOC 2 Type I & II and HIPAA-aligned security controls
Partner with security and engineering teams on threat modeling and compliance readiness

CI/CD & Release Engineering
Build and maintain CI/CD pipelines using GitHub Actions and/or Azure DevOps
Enable:
Zero-downtime deployments
Versioned APIs and backward compatibility
Environment-specific configuration and secrets
Improve release reliability, deployment speed, and developer productivity

Cost Optimization (FinOps)
Monitor and optimize Azure cloud spend across environments and tenants
Implement:
Budgets, alerts, and cost attribution
Environment-level and tenant-level cost visibility
Right-size compute, storage, and networking resources
Partner with engineering and leadership on cost forecasting and optimization

AI & Data Platform Enablement
Support AI-native workloads, including Azure OpenAI–based services
Operate document ingestion and event-driven pipelines (fax, PDFs, clinical data)
Ensure secure handling of PHI and regulated healthcare data across pipelines
Support scalable, resilient background processing and async workloads

Customer Onboarding & Integrations
Support production onboarding of new healthcare customers
Enable repeatable, automated deployment and go-live processes
Support integrations with payer platforms, EHRs, and external vendors
Act as a technical escalation point during customer launches

Qualifications

Required
7+ years of experience in DevOps, SRE, or Platform Engineering
Strong hands-on experience with Microsoft Azure
Experience operating production SaaS platforms

Deep experience with:
Kubernetes / AKS
Docker and containerized workloads
CI/CD pipelines
Infrastructure as Code (Terraform, Bicep, or ARM)
Experience designing and operating multi-tenant architectures
Strong understanding of cloud security, identity, and access management
Experience with regulated or compliance-driven environments (SOC 2, HIPAA, etc.)
Experience with cloud cost optimization / FinOps
Experience with Azure OpenAI or AI/ML platforms

Nice to Have
Experience supporting Python / Django production systems
Prior ownership of production incident management and reliability metrics