We’re looking for a Site Reliability Engineer (SRE) to join our team in Cyprus (on-site) or remotely. In this role, you will be responsible for maintaining the stability and reliability of our production environment.
Responsibilities:
- Ensure the stability of production and development infrastructure
- Develop and improve monitoring, alerting, and observability (metrics, logs, tracing)
- Configure and optimize metrics and logging systems
- Analyze incidents and prevent their recurrence
- Work with alerts and improve their quality
- Increase service reliability and fault tolerance
- Optimize system performance and stability
Key competencies:
- Strong understanding of Linux
- Experience as an SRE, DevOps Engineer, or Systems Engineer
- Solid experience with monitoring and alerting tools (Prometheus, Grafana, or similar)
- Understanding of observability (metrics, logs, tracing)
- Experience with Kubernetes and containerization
- Experience in incident analysis and production troubleshooting
- Automation skills (Bash, Python)
- Understanding of networking, performance, and fault tolerance
- Experience with GCP is a plus
We offer:
- Remote work or work from our office in Limassol
- Reimbursement for English or Greek language classes
- Health insurance (Cyprus only)
- Office lunches (Cyprus only)
- Flexible start to the working day