LLM Engineering: Evaluation
Develop techniques for building, optimizing, and scaling AI validators with minimal human input. Learn to streamline evaluation pipelines and refine your AI systems with precision and efficiency.
Welcome to the course
Course setup and credits
Chapter intro
Data in evaluation
Annotation scores
Medical extraction notebook
Annotations in code
What is alignment?
Creating automated checks
Understanding test results
Chapter intro
Code diff application
Programmatic criteria
Improving the system
Manual evaluation vs LLM evaluators
Takeaways
Chapter intro
Prompting
Evaluator
Dataset and metrics
Iteration
Structured outputs
Alignment
Alignment metrics
Cohen's Kappa
Alignment conclusion
Using an LLM to build a judge
Evalforge
Assignment
Conclusion and next steps
At the end of the course, after watching the lessons and working with the code notebooks, I will
have a solid understanding of the why, how, and when of LLM evaluation
know how to create a working LLM-as-a-judge
be able to improve my automated evaluation so it aligns with human feedback, with minimal human input (a short sketch of these ideas follows below)
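To make those last two goals concrete, here is a minimal, hypothetical sketch (not the course's own code) of an LLM-as-a-judge verdict function plus a hand-rolled Cohen's Kappa to measure how well the automated verdicts agree with human annotations. The call_llm helper is a placeholder for whatever model client you use.

```python
# Minimal sketch, assuming a placeholder call_llm client (not the course's actual code).
from collections import Counter


def call_llm(prompt: str) -> str:
    """Hypothetical LLM call; swap in your provider's client."""
    raise NotImplementedError


def llm_judge(question: str, answer: str) -> int:
    """Ask the judge model for a binary verdict: 1 = acceptable, 0 = not."""
    prompt = (
        "You are an evaluator. Reply with exactly PASS or FAIL.\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    return 1 if call_llm(prompt).strip().upper().startswith("PASS") else 0


def cohens_kappa(judge_labels: list[int], human_labels: list[int]) -> float:
    """Agreement between judge and human labels, corrected for chance agreement."""
    n = len(judge_labels)
    observed = sum(a == b for a, b in zip(judge_labels, human_labels)) / n
    judge_counts = Counter(judge_labels)
    human_counts = Counter(human_labels)
    expected = sum(
        (judge_counts[c] / n) * (human_counts[c] / n)
        for c in set(judge_labels) | set(human_labels)
    )
    return (observed - expected) / (1 - expected) if expected < 1 else 1.0


# Example: kappa between automated verdicts and human annotations (prints ~0.615).
print(cohens_kappa([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]))
```

The course covers this alignment loop in depth; the sketch is only meant to show the shape of the workflow you will build.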
To get the most out of the course, we recommend familiarity with Python and basic experience using LLMs. If you have never built with LLMs, we recommend checking out our Building LLM-powered apps course.