About This Role

Saaf AI is building the infrastructure backbone for modern mortgage operations by combining advanced AI with scalable, reliable systems. As part of a top-10 private lender processing billions in loan volume, backed by leading asset managers and funds, we are growing fast, and infrastructure and AI are central to how we build and operate.

As a Senior DevOps Engineer, you will own the infrastructure, deployment pipelines, and reliability practices that support our platform and enable engineering teams to ship quickly and safely. You will design and operate scalable systems, improve observability, and ensure high availability across critical workflows.

We are an AI-native engineering team, where AI-assisted tools are a regular part of how we build, deploy, and maintain infrastructure. From writing infrastructure code to debugging production issues and optimizing system performance, you will use these tools to improve efficiency and reliability. You will also support the infrastructure required to run AI-driven workflows in production, ensuring they are robust, scalable, and maintainable.

Key Responsibilities

Infrastructure & Cloud Operations

  • Design, build, and maintain production-grade AWS infrastructure using Infrastructure-as-Code (Terraform preferred).

  • Architect and manage serverless and containerized environments that balance cost, performance, and reliability.

  • Implement and maintain networking, security groups, IAM policies, and cloud resource configurations following least-privilege principles.

CI/CD & Deployment

  • Own and evolve the CI/CD pipeline ecosystem, primarily using GitHub Actions, to enable fast, safe, and repeatable deployments.

  • Implement deployment strategies (blue-green, canary, rolling) that minimize risk and downtime.

  • Automate build, test, and release workflows across multiple services and environments.

AI-Integrated DevOps

  • Leverage AI-assisted tools (code generation, intelligent autocomplete, automated IaC authoring) as a regular part of your infrastructure workflow to accelerate delivery and reduce configuration errors.

  • Use AI tools to support incident diagnosis, log analysis, runbook generation, and documentation of infrastructure decisions.

  • Evaluate and integrate emerging AI tools and practices into the team's DevOps processes.

  • Build and support the infrastructure layer for agentic workflows, including compute orchestration, autoscaling, and cost-efficient execution of AI-powered automation.

Monitoring, Observability & Incident Management

  • Design and maintain monitoring, logging, and alerting systems that provide clear visibility into platform health and performance.

  • Implement distributed tracing and structured logging across services and multi-step workflows.

  • Lead incident response, conduct post-mortems, and drive reliability improvements based on findings.

Security & Compliance

  • Apply cloud security best practices across all infrastructure, including secrets management, encryption, network segmentation, and access controls.

  • Design secure secrets and configuration management for agentic processes, including API keys, model tokens, and external service credentials.

  • Ensure infrastructure meets financial regulatory and compliance requirements with full auditability.

Data Infrastructure Support

  • Support and maintain infrastructure for data engineering workflows, including Snowflake environments, ETL/ELT pipelines, and dbt execution.

  • Manage serverless event-driven pipelines and orchestration tools (Step Functions, Temporal, or similar).

Team & Process

  • Collaborate with product engineers, data engineers, and founders to ensure infrastructure supports rapid iteration and reliable delivery.

  • Document infrastructure decisions, runbooks, and operational procedures to support team knowledge sharing and onboarding.

  • Regularly review and improve operational workflows, automation coverage, and infrastructure cost efficiency.

Qualifications

Required

  • 4+ years of experience in DevOps, SRE, or similar infrastructure-focused roles.

  • Proficiency in AWS, with strong Infrastructure-as-Code experience (Terraform preferred).

  • Strong CI/CD expertise with GitHub Actions.

  • Experience with containerization and serverless architectures.

  • Hands-on skill in monitoring, logging, and incident management.

  • Strong scripting and automation skills in Bash, Python, or Node.js.

  • Knowledge of cloud security principles, least privilege, and compliance requirements.

  • Experience with Snowflake and data engineering workflows (ETL, dbt).

  • Exposure to Kubernetes and orchestration tools.

  • Understanding of serverless event-driven pipelines (Step Functions, Temporal).

  • Demonstrated, regular use of AI-powered development tools (e.g., Cursor, GitHub Copilot, Claude Code, or similar) to accelerate infrastructure authoring, debugging, or documentation workflows.

  • Startup mindset: hands-on, resourceful, and comfortable operating in a fast-paced environment.

Preferred

  • Experience with event-driven workflow orchestration tools such as Step Functions, Temporal, Airflow, or Prefect.

  • Familiarity with agentic workflow patterns, including integrating API-based decision points, asynchronous task handling, and dynamic routing of requests.

  • Understanding of infrastructure requirements for AI-powered automation, including latency optimization, autoscaling strategies, and cost-efficient compute for high-throughput processes.

  • Ability to design secure secrets and configuration management systems for agentic processes, including API keys, model tokens, and external service credentials.

  • Experience implementing observability for multi-step workflows, including distributed tracing, structured logging, and audit-friendly data pipelines.

  • Experience with prompt engineering for IaC generation, incident analysis, or building AI-powered operational tooling.

  • Prior early-stage startup experience is highly preferred.

Benefits

  • Competitive salary

  • Unlimited PTO

  • Remote-first with flexible hours

  • Yearly professional development budget

  • Home office setup stipend