About the Role
We are building a multi-tenant, hardware-agnostic IoT platform from the ground up. We need a senior engineer who can design the system architecture in the morning and write production backend code in the afternoon. This is not an architecture-only role and not a coding-only role: it is both, simultaneously, in a fast-moving early-stage environment. You will also own cloud infrastructure as interim DevOps until a dedicated DevOps engineer joins in Phase 2.
The Ideal Candidate
You have built IoT backend platforms before — not just used them. You understand the hard problems: device auth at scale, MQTT broker design, time-series ingestion performance, multi-tenant data isolation, and real-time delivery to web clients. You are comfortable making architectural decisions autonomously, documenting them clearly, and standing by them. You work remotely with discipline — you flag risks before they become problems.
Key Responsibilities
Platform Architecture
- Design the full end-to-end IoT platform architecture: device connectivity layer → MQTT/protocol ingestion → stream processing → time-series storage → REST/GraphQL API layer → real-time WebSocket delivery
- Define the multi-tenant data model: strict data isolation between customers, tenant-scoped API tokens, row-level security
- Design the device lifecycle system: provisioning, X.509/JWT authentication, device registry, status tracking, decommissioning
- Architect the protocol abstraction layer so MQTT, Modbus, OPC-UA, CoAP, and HTTP devices all normalise to the same internal data model
- Design a configurable rule engine: event-condition-action rules for alerts, automations, and integrations — no code required from customers
- Plan OTA firmware update management: secure delivery, versioning, rollback, fleet orchestration
- Write Architecture Decision Records (ADRs) for every major technical choice — nothing undocumented
- Design the scaling path from 100 devices (pilot) to 500,000+ (production) without structural rework
Backend Development
- Build core platform services from scratch: device management, telemetry ingestion, rule engine, notification/alerting, OTA update, multi-tenant API gateway
- Develop REST and GraphQL APIs, with a full OpenAPI specification for the REST surface, version-controlled from Day 1
- Implement WebSocket and SSE endpoints for real-time telemetry delivery to web and mobile clients
- Build device command-and-control with acknowledgement, retry logic, and timeout handling
- Implement a device shadow service: the last-known state of every device stays accessible even when the device is offline
- Write unit, integration, and load tests — no service reaches staging without test coverage
- Own service reliability: SLO definitions, alerting runbooks, on-call incident response
Cloud Infrastructure (Interim)
- Provision and manage all AWS environments (dev, staging, production) using Terraform — no click-ops
- Configure AWS IoT Core: MQTT endpoint, topic namespace, rules engine, certificate management
- Set up CI/CD pipelines via GitHub Actions for all backend services
- Configure CloudWatch monitoring, log aggregation, and automated health alerts
- Manage IAM for all team members — least-privilege access, no shared credentials
- Hand off infrastructure fully documented when a DevOps engineer joins in Phase 2
Requirements
- 7–12 years of software or systems engineering experience, including at least 4 years specifically building IoT platform backends or connected product infrastructure
- Expert-level, hands-on experience with AWS IoT Core or Azure IoT Hub — production deployments, not tutorials ⚑ NON-NEGOTIABLE
- Expert MQTT knowledge: v3.1.1 and v5.0, topic hierarchy design, QoS levels, retained messages, Last Will & Testament, broker sizing and clustering ⚑ NON-NEGOTIABLE
- Proficiency in Python and Node.js/TypeScript for production backend services — Go is a strong advantage
- Hands-on experience with a time-series database: InfluxDB, TimescaleDB, or Amazon Timestream
- Terraform or AWS CloudFormation — you provision cloud infrastructure programmatically, not through the console
- Multi-tenant SaaS backend architecture: data isolation patterns, tenant-scoped access control, shared infrastructure design
- Security fundamentals applied in practice: TLS/mTLS, X.509 certificates, OAuth 2.0, JWT, secrets management (Vault or AWS Secrets Manager)
- Message broker or streaming experience: Kafka, RabbitMQ, Amazon Kinesis, or the AWS IoT Core rules engine
- Proven ability to work autonomously at a senior level: you make decisions, document your rationale, and flag risks without being prompted ⚑ REMOTE DISCIPLINE
Nice to Have
- Industrial protocol knowledge: Modbus TCP/RTU, OPC-UA, BACnet — even as a consumer or integrator
- EMQX, HiveMQ, or VerneMQ broker deployment and production operation
- Edge computing runtimes: AWS IoT Greengrass V2, Azure IoT Edge, or Balena
- Digital twin frameworks: AWS IoT TwinMaker, Azure Digital Twins
- Container orchestration: Kubernetes, ECS, or equivalent, in preparation for the Phase 2 migration
- Open-source IoT contributions or published technical writing on platform architecture
Skills at a Glance
Architecture: IoT platform end-to-end design · Multi-tenant SaaS patterns · Device lifecycle management · Protocol abstraction · Rule engine design · Horizontal scaling strategy
Backend: Python / Node.js / TypeScript / Go · REST + GraphQL API design · WebSocket / SSE real-time delivery · MQTT broker configuration · Time-series DB (InfluxDB / TimescaleDB / Timestream) · PostgreSQL or equivalent RDBMS
Cloud & DevOps: AWS IoT Core / Azure IoT Hub · Terraform / CloudFormation · GitHub Actions CI/CD · Docker containers · CloudWatch monitoring · IAM and security policy management
Security: TLS / mTLS configuration · X.509 certificate management · OAuth 2.0 / JWT implementation · Secrets management · Device authentication at scale