We are looking for German-speaking AI Content Analysts to support the evaluation of a new personalization capability within a leading AI assistant platform. The role involves designing realistic conversational prompts, evaluating AI responses, and assessing integration quality.

Responsibilities

  • Design and run short multi-turn conversations (typically 1–5 turns) intended to test AI personalization behavior
  • Create prompts grounded in realistic personal scenarios to evaluate contextual understanding
  • Review AI responses to determine whether personalization is correctly applied
  • Check grounding quality to ensure the model does not invent unsupported claims about the user
  • Evaluate integration quality, confirming that personal signals are used naturally (not forced or robotic)
  • Compare two responses side-by-side and determine which is more helpful, natural, and relevant
  • Write clear, structured rationales explaining rankings and referencing specific conversation turns
  • Verify debug information showing whether the correct data sources were used
  • Maintain strict workflow hygiene (including deleting evaluation conversations when required)

Compensation & Commitment

  • Paid for hours logged and approved
  • 30–40 hours/week commitment