Course curriculum

    1. Welcome to the course

    2. Course setup and credits

    1. Why do we need evaluation for LLM-based systems?

    2. How to correctly evaluate?

    3. How do we build an evaluation systematically?

    4. Building an optimized evaluation pipeline

    1. Programmatic evaluation

    2. Manual evaluation

    3. 'What' and 'how' of LLM as a Judge or LLM validators?

    4. Notebook

    1. Prompting

    2. Position Bias

    3. Knowledge Bias

    4. Format Bias

    5. Scoring/numeric bias

    6. Single validators

    7. Iterative Refinement: Aligning LLM Evaluation with Human Feedback

    1. Using an LLM to build a judge

    2. Evalforge

    1. Assignment

    2. Conclusion and next steps

About this course

  • Free
  • 21 lessons
  • 0.5 hours of video content

Guest instructors

Paige Bailey

GenAI Developer Experience @ Google

Paige Bailey is the engineering lead for GenAI Developer Experience at Google. Paige has a deep understanding of the generative AI landscape, having previously served as an applied machine learning engineer at Microsoft and GitHub, and a product lead for Google's PaLM v2 and Gemini models. Paige is passionate about making cutting-edge AI technology accessible, and empowering developers to build the next generation of innovative applications.

What can I expect from this course?

At the end of the course, after watching the lessons and working with the code notebooks, I will:

  • have a solid understanding of the why, how, and when of LLM evaluation

  • know how to create a working LLM as a judge

  • be able to improve my automated evaluation so that it aligns with human feedback while requiring minimal human input

Your course instructors

Ayush Thakur

ML Engineer @ Weights & Biases

Ayush Thakur is an MLE at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in everything computer vision and representation learning. For the past 2 years he has been working with LLMs and has covered RLHF and the how and what of building LLM-based systems.

Anish Shah

ML Engineer @ Weights & Biases

Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

FAQ

  • Any prerequisites?

    To get the most out of the course, we recommend familiarity with Python and basic experience using LLMs. If you have never built with LLMs, we recommend checking out our Building LLM-powered apps course.
