We are looking for an AI Evaluation Engineer to design benchmark tasks that simulate real-world analytical workflows, with a focus on data analysis, multi-agent systems, and verification logic.

Responsibilities

  • Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
  • Create or curate realistic datasets (CSV, JSON, logs, reports, financial or operational data)
  • Build tasks that require:
      ◦ Cross-referencing multiple data sources
      ◦ Anomaly detection and contradiction identification
      ◦ Statistical analysis and interpretation
  • Define task decomposition strategies across specialized sub-agents (e.g., financial, technical, operational analysis)
  • Develop verification logic to validate precise analytical outputs (not generic summaries)
  • Implement evaluation pipelines using Python and SQL
  • Create reproducible environments using Docker
  • Analyze task performance and refine tasks for clarity, difficulty calibration, and scoring accuracy
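
To give a flavor of the verification work: a task verifier typically compares an agent's structured answers against ground truth with explicit numeric tolerances, rather than string-matching a free-text summary. A minimal sketch (all field names and values are illustrative):

```python
# Minimal sketch of output verification: check an agent's structured
# answers against ground truth with numeric tolerances instead of
# accepting a generic summary. All field names here are illustrative.
import math

def verify(agent_output: dict, ground_truth: dict, rel_tol: float = 1e-3) -> dict:
    """Return per-field pass/fail results plus an overall score in [0, 1]."""
    results = {}
    for field, expected in ground_truth.items():
        actual = agent_output.get(field)
        if isinstance(expected, (int, float)) and isinstance(actual, (int, float)):
            # Numeric fields pass within a relative tolerance.
            results[field] = math.isclose(actual, expected, rel_tol=rel_tol)
        else:
            # Non-numeric fields (labels, IDs) must match exactly.
            results[field] = actual == expected
    score = sum(results.values()) / len(results)
    return {"checks": results, "score": score}

truth = {"q3_revenue": 1_204_500.0, "anomalous_vendor": "ACME-17"}
answer = {"q3_revenue": 1_204_499.8, "anomalous_vendor": "ACME-17"}
print(verify(answer, truth))  # both checks pass, score 1.0
```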
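
The Python-and-SQL pipeline work might look like the following sketch, where ground truth is derived with SQL directly from the task dataset so scoring stays reproducible (table, column, and vendor names are illustrative, using the stdlib `sqlite3` module):

```python
# Sketch of an evaluation-pipeline step: derive ground truth with SQL
# from the raw dataset, then score the agent's answer against it.
# Table, column, and vendor names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transactions (vendor TEXT, amount REAL);
    INSERT INTO transactions VALUES
        ('ACME-17', 900.0), ('ACME-17', 5000.0), ('Globex', 120.0);
""")

# Ground truth: per-vendor totals computed directly from the data.
truth = dict(conn.execute(
    "SELECT vendor, SUM(amount) FROM transactions GROUP BY vendor"
))

agent_answer = {"ACME-17": 5900.0, "Globex": 120.0}
passed = all(abs(truth[v] - agent_answer.get(v, 0.0)) < 0.01 for v in truth)
print("PASS" if passed else "FAIL")  # prints "PASS"
```

Because the expected values are recomputed from the dataset at evaluation time rather than hard-coded, regenerating or perturbing the data automatically updates the scoring key.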