Course curriculum

Chapter 1

    1. Welcome to the course
    2. Course setup and credits

Chapter 2

    1. Chapter intro
    2. Data in evaluation
    3. Annotation scores
    4. Medical extraction notebook
    5. Annotations in code
    6. What is alignment?
    7. Creating automated checks
    8. Understanding test results

Chapter 3

    1. Chapter intro
    2. Code diff application
    3. Programmatic criteria
    4. Improving the system
    5. Manual evaluation vs LLM evaluators
    6. Takeaways

Chapter 4

    1. Chapter intro
    2. Prompting
    3. Evaluator
    4. Dataset and metrics
    5. Iteration
    6. Structured outputs
    7. Alignment
    8. Alignment metrics
    9. Cohen's Kappa
    10. Alignment conclusion

Chapter 5

    1. Using an LLM to build a judge
    2. Evalforge

Chapter 6

    1. Assignment
    2. Conclusion and next steps

About this course

  • Free
  • 30 lessons
  • 1 hour of video content

In collaboration with

  • Google
  • All Hands AI

Guest instructors

Paige Bailey

GenAI Developer Experience @ Google

Paige Bailey is the engineering lead for GenAI Developer Experience at Google. Paige has a deep understanding of the generative AI landscape, having previously served as an applied machine learning engineer at Microsoft and GitHub, and as a product lead for Google's PaLM 2 and Gemini models. Paige is passionate about making cutting-edge AI technology accessible and empowering developers to build the next generation of innovative applications.

Graham Neubig

Chief Scientist @ All Hands AI

Graham Neubig is an Associate Professor at Carnegie Mellon University, and Chief Scientist at All Hands AI. His research work focuses on AI agents for web browsing and code generation, as well as improvements to LLMs for multilingual and multimodal applications. He is a big proponent of open source and open science, including the OpenHands framework for software engineering agents, developed by All Hands AI.

What can I expect from this course?

By the end of the course, after watching the lessons and working through the code notebooks, you will

  • have a solid understanding of the why, how, and when of LLM evaluation

  • know how to create a working LLM-as-a-judge evaluator (see the sketch below)

  • be able to align your automated evaluations with human feedback using minimal human input
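
To make the last two outcomes concrete, here is a minimal sketch of the pattern the course builds up to: an LLM judge that scores (question, answer) pairs, whose verdicts are then compared against human annotations using Cohen's kappa. The prompt, model name (gpt-4o-mini), and data below are illustrative assumptions rather than the course's actual notebook code, and the sketch assumes the openai and scikit-learn packages are installed.

    from openai import OpenAI
    from sklearn.metrics import cohen_kappa_score

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    JUDGE_PROMPT = (
        "You are a strict evaluator. Given a question and an answer, reply "
        "with exactly one word: PASS if the answer is correct and complete, "
        "FAIL otherwise.\n\nQuestion: {question}\nAnswer: {answer}"
    )

    def judge(question: str, answer: str) -> int:
        """Score one (question, answer) pair with the judge: 1 = PASS, 0 = FAIL."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any capable chat model works here
            messages=[{
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }],
            temperature=0,
        )
        return 1 if "PASS" in response.choices[0].message.content.upper() else 0

    # Hypothetical evaluation data: (question, answer) pairs plus human verdicts.
    dataset = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Berlin"),
        ("Name a prime number.", "7"),
    ]
    human_labels = [1, 0, 1]

    judge_labels = [judge(q, a) for q, a in dataset]

    # Alignment between judge and human: Cohen's kappa corrects raw agreement
    # for agreement expected by chance (1.0 = perfect, 0.0 = chance level).
    print("Cohen's kappa:", cohen_kappa_score(human_labels, judge_labels))

A kappa close to 1.0 means the judge agrees with human annotators well beyond chance; the alignment chapters of the course cover how to measure this and iterate on the judge when agreement is low.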

Your course instructors

Ayush Thakur

ML Engineer @ Weights & Biases

Ayush Thakur is an ML Engineer at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in everything related to computer vision and representation learning. For the past two years he has been working with LLMs, covering RLHF and the how and what of building LLM-based systems.

Anish Shah

ML Engineer @ Weights & Biases

Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

FAQ

  • Any prerequisites?

    To best take advantage of the course, we recommend familiarity with Python and basic experience using LLMs. If you have never built with LLMs, we recommend checking out our Building LLM powered apps course first.
