Agent Performance Manager

Scaled Cognition

United States
Posted on Oct 16, 2025
Scaled Cognition is the world’s only model lab dedicated exclusively to customer experience, pioneering agentic models purpose-built for reliable, action-taking enterprise applications. Backed by Khosla Ventures, the company’s flagship Agentic Pretrained Transformer (APT) eliminates hallucinations, enforces enterprise policies, and increases reliability in real-world CX workflows. Founded by serial AI entrepreneurs Dan Roth, former Microsoft Corporate Vice President of Conversational AI, and Dan Klein, UC Berkeley AI Professor, and built by a team of world-class PhD researchers and engineers, Scaled Cognition advances the science of agentic AI to deliver safe, policy-aligned automation that enterprises can trust.

As an Agent Performance Manager at Scaled Cognition, you will:

  • Develop and implement scalable QA plans for evaluating AI agents, defining key performance metrics to measure progress over time.
  • Collaborate with product and engineering teams to document findings, test fixes, and recommend improvements to the underlying models and conversational flows.
  • Lead and mentor a team of QA engineers, establishing best practices and processes for testing conversational AI agents.

Example projects could include:

  • Building test sets to track regressions, agent robustness, and end-to-end testing.
  • Reviewing and analyzing voice and chat transcripts to quickly identify conversational gaps and provide data for faster iteration on customer deployments.
  • Designing and automating testing pipelines to scale QA capacity across a diverse portfolio of customers and to continuously evaluate the performance of our AI agents.

Preferred Qualifications:

  • Intermediate-level proficiency in Python and experience building and testing conversational AI/LLM systems.
  • Background in implementing evaluation benchmarks and production monitoring metrics.
  • Experience working with libraries and tooling common in the AI/LLM ecosystem.
  • Demonstrated precision in documenting test plans, test cases, and bug reports, ensuring data is accurate and easily understandable by cross-functional teams.
  • Experience leveraging AI-powered assistants and tooling to enable rapid iteration, prototyping, and accelerated delivery.