AI Evaluation Engineer (Data Analysis & Multi-Agent Systems)

Remote

Full time

Middle

Remote, Colombia

Docker, Python, Data Analysis, agent systems, clarity, Evaluation, Senior AI Analytics Engineer, AI Model Evaluation Specialist, AI Evaluation Engineer, AI Analytics Engineer

Job description

We are looking for an AI Evaluation Engineer specializing in data analysis to design benchmark tasks that simulate real-world analytical workflows. The ideal candidate has 5+ years of experience in data analysis or analytics-heavy roles, strong proficiency in Python and SQL, and experience working with messy, real-world datasets.

Requirements

Design and develop multi-agent benchmark tasks focused on complex data analysis workflows
Create or curate realistic datasets
Implement evaluation pipelines using Python and SQL
Create reproducible environments using Docker
Analyze task performance and refine for clarity, difficulty, and scoring accuracy

Benefits

Competitive salary

