LLM Apps: Evaluation
Develop techniques for building, optimizing, and scaling AI validators with minimal human input. Learn to streamline evaluation pipelines and refine your AI systems with precision and efficiency.
Welcome to the course
Course setup and credits
Gemini, Veo, Imagen
Google AI Studio tour
The course use cases
Chapter intro
Data in evaluation
Annotation scores
Medical extraction notebook
Annotations in code
What is alignment?
Creating automated checks
Understanding test results
Chapter intro
Code diff application
Programmatic criteria
Creating a baseline
Improving the system
Manual evaluation vs LLM evaluators
Takeaways
Chapter intro
Prompting
Evaluator
Dataset and metrics
Iteration
Structured outputs
Alignment
Alignment metrics
Cohen's Kappa in the essay use case
Setting a baseline
Improving the alignment
Chapter conclusion and next steps
Building an LLM evaluator with an agentic tool
Evaluating agents
At the end of the course, after watching the lessons and working with the code notebooks, I will
have a solid understanding of the why, how, and when of LLM evaluation
know how to create a working LLM-as-a-judge
improve my automated evaluation to align with human feedback using minimal human input (a small code sketch of these ideas follows below)
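To make the outcomes above more concrete, here is a minimal sketch of an LLM-as-a-judge evaluator whose verdicts are checked against human annotations with Cohen's Kappa. It uses the Gemini API, which the course also relies on, but the model name, grading prompt, rubric, and placeholder data are illustrative assumptions rather than the course's actual code.

```python
# Minimal sketch: an LLM "judge" grades essays, and Cohen's Kappa measures
# how well its verdicts align with human annotations.
# The model name, prompt, and sample data below are hypothetical.
from sklearn.metrics import cohen_kappa_score
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment


def judge(essay: str) -> str:
    """Ask the model for a pass/fail verdict on one essay (hypothetical rubric)."""
    prompt = (
        "You are grading a short essay. Reply with exactly one word, "
        "'pass' or 'fail', based on clarity and structure.\n\nEssay:\n" + essay
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash", contents=prompt
    )
    return response.text.strip().lower()


essays = ["First sample essay...", "Second sample essay..."]  # placeholder data
human_labels = ["pass", "fail"]                                # placeholder annotations

llm_labels = [judge(e) for e in essays]
kappa = cohen_kappa_score(human_labels, llm_labels)
print(f"Agreement with human annotators (Cohen's Kappa): {kappa:.2f}")
```

A higher Kappa means the automated evaluator agrees with human annotators beyond what chance would produce, which is the alignment signal the course uses to decide whether the judge can stand in for manual review.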
To best take advantage of the course, we recommend familiarity with Python and basic experience using LLMs. If you have never built with LLMs, we recommend checking out our Building LLM powered apps course first.