We're looking for a skilled Data Engineer to join our growing data platform team. In this role, you'll be at the heart of building and scaling the infrastructure that powers our data-driven decisions: designing robust pipelines, shaping our lakehouse architecture, and ensuring data flows reliably from source to insight.

You'll work hands-on with Databricks and Apache Spark on AWS, taking ownership of end-to-end pipeline development while collaborating closely with analytics, engineering, and product teams. If you're someone who takes pride in clean, performant code and enjoys solving complex data challenges at scale, we'd love to hear from you.

Key Responsibilities

1. Data Pipeline Development

  • Design, build, and maintain scalable ETL/ELT pipelines using Databricks (Apache Spark).

  • Implement batch and streaming data ingestion from multiple sources (databases, APIs, event streams).

  • Ensure pipelines are fault-tolerant, efficient, and cost-optimized on AWS.

2. Data Platform & Architecture

  • Develop and maintain data lake / lakehouse architectures using AWS S3, Delta Lake, and Databricks.

  • Implement medallion architecture (Bronze / Silver / Gold layers).

  • Optimize data storage formats (Parquet, Delta) and partitioning strategies.

3. AWS Services Integration

  • Work with AWS services such as S3 (data lake storage), IAM (access control), and Lambda / Step Functions (orchestration).

  • Manage secure data access across AWS accounts and environments.

4. Databricks Development & Optimization

  • Develop notebooks and jobs using PySpark / SQL / Python.

  • Optimize Spark jobs for performance and cost (cluster sizing, caching, joins, shuffles).

  • Manage Databricks Workflows, jobs, and cluster configurations.

  • Implement Unity Catalog for governance and data access control (if used).

  • Respond to failures and performance issues.

 

Requirements

 

1. Databricks & Spark

  • Hands-on experience with Databricks.

  • Strong proficiency in Apache Spark (PySpark and/or Scala).

  • Good command of Spark SQL and performance optimization techniques.

  • Experience with Delta Lake and lakehouse architectures.

 

2. Programming & Tools

  • Proficiency in Python for data engineering.

  • Experience writing clean, testable, and maintainable code.

  • Familiarity with SQL for data transformation and analytics.

  • Experience with Git and version control workflows.

 

3. Soft Skills

  • Strong problem-solving and analytical thinking skills.

  • Ability to work independently and take ownership of data pipelines.

  • Good communication skills and ability to collaborate with cross-functional teams.

  • Attention to detail and focus on data correctness and reliability.

Nice to Have

  • Familiarity with Unity Catalog or other data governance tools.

  • Experience supporting BI and analytics use cases.

  • Knowledge of cost optimization in AWS and Databricks.

  • Excellent Spark knowledge.