LLM Engineering: Evaluation
Develop techniques for building, optimizing, and scaling AI validators with minimal human input. Learn to streamline evaluation pipelines and refine your AI systems with precision and efficiency.
Welcome to the course
Course setup and credits
Why do we need evaluation for LLM-based systems?
How do we evaluate correctly?
How do we build an evaluation systematically?
Building an optimized evaluation pipeline
Programmatic evaluation
Manual evaluation
The 'what' and 'how' of LLM as a Judge (LLM validators)
Notebook
Prompting
Position Bias
Knowledge Bias
Format Bias
Scoring/Numeric Bias
Single validators
Iterative Refinement: Aligning LLM Evaluation with Human Feedback
Using an LLM to build a judge
Evalforge
Assignment
Conclusion and next steps
At the end of the course, after watching the lessons and working through the code notebooks, I will:
have a solid understanding of the why, how, and when of LLM evaluation
know how to create a working LLM as a judge (a minimal sketch of the idea follows this list)
be able to improve my automated evaluation to align with human feedback using minimal human input
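To give a flavor of what an LLM validator looks like in code before you start, here is a minimal sketch of an LLM-as-a-judge call. It is an illustration only: the OpenAI Python client, the gpt-4o-mini model name, and the pass/fail prompt are assumptions made for this sketch, not necessarily the exact tooling used in the course notebooks.

```python
# Minimal LLM-as-a-judge sketch (illustrative; the client, model name, and prompt
# wording are assumptions, not the course's exact setup).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = (
    "You are an evaluator. Given a question and a candidate answer, reply with a "
    "single word: PASS if the answer is correct and helpful, FAIL otherwise."
)

def judge(question: str, answer: str) -> bool:
    """Return True if the judge model labels the candidate answer PASS."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # hypothetical choice of judge model
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Question: {question}\nAnswer: {answer}"},
        ],
        temperature=0,  # keep judgments as deterministic as possible
    )
    verdict = response.choices[0].message.content.strip().upper()
    return verdict.startswith("PASS")

if __name__ == "__main__":
    print(judge("What is 2 + 2?", "4"))
```

A single pass/fail verdict like this is the simplest form of an LLM validator; the lessons above cover its biases (position, knowledge, format, scoring) and how to align such a judge with human feedback using minimal human input.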
To get the most out of the course, we recommend familiarity with Python and basic experience using LLMs. If you have never built with LLMs, we recommend checking out our Building LLM powered apps course.
Address common questions ahead of time to save yourself an email.