We are looking for an AI QA Engineer to ensure the reliability, accuracy, security, and stability of AI agents in production by implementing automated evaluation frameworks, auditable metrics, and continuous validation processes.
Responsibilities
- Design, validate, and improve evaluation frameworks for AI agents.
- Implement automated testing and regression suites for generative models.
- Define and monitor quality metrics such as relevance, faithfulness, coherence, precision, and hallucination rate.
- Build LLM-as-a-Judge evaluation systems.
- Establish performance benchmarks for new models and existing agents.
- Validate updates to prompts, models, and RAG pipelines.
- Collaborate with AI and development teams to define acceptance criteria (pass/fail).
- Analyze evaluation results and propose continuous improvements.
- Produce metrics reports and maintain traceability of agent quality.
Benefits
- Modality: 100% Remote
- Excellent work environment with a young, dynamic, and committed team; growth opportunities; and participation in innovative projects.