Course Description

This hands-on course takes you deep into the world of Agent Evaluation—the backbone of reliable AI workflows. You’ll learn how to measure, trace, and improve AI agents that autonomously handle tasks, call tools, collaborate with other agents, and run in production systems.

Starting with the fundamentals of the evaluation cycle, you’ll explore techniques such as LLM-as-a-judge, human annotations, code-based evals, and offline vs. online evaluation, then apply them to routers, skills, and agent trajectories. The course culminates in an end-to-end project: building a research agent, tracing it in production, and monitoring its evaluations at scale.

Perfect for developers, AI enthusiasts, and professionals looking to evaluate and operationalize agents at scale.

Course curriculum

  1. Overview of Agent Evaluation
     • Intro to Agent Evaluation
     • Quiz
     • The Cycle of Evaluation
     • Quiz
     • Course Handouts
  2. Agent Evaluation Techniques
     • LLM as a Judge
     • Human Annotations
     • Code-based Evals
     • Offline vs. Online Evals
     • Quiz
  3. Agent Evaluation Types
     • Router Evals
     • Skill Evals
     • Path, Planning, and Reflection Evals
     • Creating a Ground-Truth Labeled Dataset
     • Quiz
  4. End-to-End Example: Evaluating a Research Agent
     • Building Your Agent
     • Tracing Your Agent in Production
     • Router and Skill Evaluation
     • Adding Trajectory Evaluations
     • Adding Structure to Your Evaluations
     • Monitoring Agents
     • Quiz
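The code-based evals covered in Module 2 can be sketched in a few lines of plain Python. Below, a hypothetical ground-truth labeled dataset (as in Module 3) is scored with a simple keyword check; the dataset, the stub agent, and the pass criterion are illustrative assumptions, not the course's framework:

```python
# Minimal code-based eval: score agent outputs against a ground-truth dataset.
# The dataset, agent stub, and pass criterion below are illustrative assumptions.

def keyword_check(output: str, required: list[str]) -> bool:
    """Pass if every required keyword appears in the agent's output."""
    lowered = output.lower()
    return all(kw.lower() in lowered for kw in required)

def run_eval(agent, dataset: list[dict]) -> float:
    """Return the fraction of ground-truth cases the agent passes."""
    passed = sum(
        keyword_check(agent(case["question"]), case["required_keywords"])
        for case in dataset
    )
    return passed / len(dataset)

# Hypothetical ground-truth labeled dataset.
dataset = [
    {"question": "What is the capital of France?", "required_keywords": ["Paris"]},
    {"question": "Who wrote Hamlet?", "required_keywords": ["Shakespeare"]},
]

# Stub agent standing in for a real LLM-backed agent.
def stub_agent(question: str) -> str:
    answers = {
        "What is the capital of France?": "The capital is Paris.",
        "Who wrote Hamlet?": "Hamlet was written by Shakespeare.",
    }
    return answers[question]

print(run_eval(stub_agent, dataset))  # → 1.0
```

In practice the stub agent would be replaced by a real agent call, and the keyword check by whatever deterministic criterion fits the skill being tested.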

Who Should Enroll?

  • Developers & Engineers who want to build intelligent agent systems

  • AI/ML Practitioners exploring real-world agent operations

  • Automation Specialists scaling workflows beyond no-code tools

  • Product Managers & Innovators curious about leveraging AI agents in business use cases

  • Tech Enthusiasts eager to master the future of AI-driven automation

Key Takeaways

  • Understand the fundamentals of agent evaluation and the evaluation cycle

  • Apply techniques such as LLM-as-a-judge, human annotations, and code-based evals

  • Distinguish offline from online evaluation and know when to use each

  • Evaluate routers, skills, and agent path, planning, and reflection behavior

  • Create ground-truth labeled datasets for repeatable benchmarks

  • Trace, evaluate, and monitor a research agent end to end in production

About Instructor

Tanika Gupta - Director Data Science, Sigmoid

Tanika Gupta is a GenAI and data science leader with 13+ years of experience scaling AI solutions and building high-performing teams across global organizations like JPMorgan Chase, Mastercard, and American Express, and now as Director of Data Science at Sigmoid. Currently, she leads a 60+ person team, drives enterprise-wide adoption of GenAI, and builds multi-agent systems. She has also filed patents, spoken at international AI conferences, and mentored the next generation of AI talent through various platforms.

FAQ

  • What is Agent Evaluation, and why does it matter?

    Agent Evaluation is the process of measuring how well your AI agent performs in real-world scenarios. It ensures agents are not only functioning but also reliable, efficient, and aligned with the intended goals.

  • Do I need prior experience with evaluation frameworks?

    No. The course introduces evaluation methods step by step, starting with simple metrics and moving to advanced evaluation strategies, so both beginners and experienced learners can follow along.

  • What techniques will I learn for evaluating agents?

    You’ll explore performance metrics, benchmarking techniques, qualitative vs. quantitative evaluation, and tools for testing robustness, reliability, and fairness of agents.
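As one concrete instance of a quantitative metric, a router eval (covered in Module 3) can be framed as a classification check: did the agent choose the expected tool for each query? A minimal sketch, using a hypothetical routing log (the tool names and log entries are made up for illustration):

```python
from collections import Counter

def router_accuracy(log: list[tuple[str, str]]) -> dict:
    """Compare expected vs. predicted routes.

    Each log entry is (expected_route, predicted_route).
    Returns overall accuracy plus per-route accuracy.
    """
    total = Counter()
    correct = Counter()
    for expected, predicted in log:
        total[expected] += 1
        if predicted == expected:
            correct[expected] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_route = {route: correct[route] / total[route] for route in total}
    return {"overall": overall, "per_route": per_route}

# Hypothetical trace: (expected tool, tool the router actually chose).
log = [
    ("search", "search"),
    ("search", "calculator"),
    ("calculator", "calculator"),
    ("calculator", "calculator"),
]
print(router_accuracy(log))
# → {'overall': 0.75, 'per_route': {'search': 0.5, 'calculator': 1.0}}
```

Per-route breakdowns like this are what make router evals actionable: they show which tool choices the agent systematically gets wrong, not just an aggregate score.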

  • Will I learn to evaluate multi-agent systems too?

    Yes. Beyond individual agent performance, you’ll practice evaluating multi-agent collaboration, role assignment efficiency, and workflow success rates.

  • How is evaluation connected to improving agents?

    Evaluation is not just about scoring agents—it feeds directly into iterative improvement, helping you identify weaknesses, fine-tune prompts, adjust logic, and improve overall reliability.