Course curriculum

Chapter 1

    1. Welcome to the course
    2. Course setup and credits

Chapter 2

    1. Chapter intro
    2. Data in evaluation
    3. Annotation scores
    4. Medical extraction notebook
    5. Annotations in code
    6. What is alignment?
    7. Creating automated checks
    8. Understanding test results

Chapter 3

    1. Chapter intro
    2. Code diff application
    3. Programmatic criteria
    4. Improving the system
    5. Manual evaluation vs LLM evaluators
    6. Takeaways

Chapter 4

    1. Chapter intro
    2. Prompting
    3. Evaluator
    4. Dataset and metrics
    5. Iteration
    6. Structured outputs
    7. Alignment
    8. Alignment metrics
    9. Cohen's Kappa
    10. Alignment conclusion

Chapter 5

    1. Using an LLM to build a judge
    2. Evalforge

Chapter 6

    1. Assignment
    2. Conclusion and next steps

About this course

  • Free
  • 30 lessons
  • 1 hour of video content

In collaboration with

  • Google
  • All Hands AI

Guest instructors

Paige Bailey

GenAI Developer Experience @ Google

Paige Bailey is the engineering lead for GenAI Developer Experience at Google. Paige has a deep understanding of the generative AI landscape, having previously served as an applied machine learning engineer at Microsoft and GitHub, and as a product lead for Google's PaLM 2 and Gemini models. Paige is passionate about making cutting-edge AI technology accessible and empowering developers to build the next generation of innovative applications.

Graham Neubig

Chief Scientist @ All Hands AI

Graham Neubig is an Associate Professor at Carnegie Mellon University, and Chief Scientist at All Hands AI. His research work focuses on AI agents for web browsing and code generation, as well as improvements to LLMs for multilingual and multimodal applications. He is a big proponent of open source and open science, including the OpenHands framework for software engineering agents, developed by All Hands AI.

What can I expect from this course?

By the end of the course, after watching the lessons and working through the code notebooks, you will

  • have a solid understanding of the why, how, and when of LLM evaluation

  • know how to create a working LLM-as-a-judge evaluator (see the sketch below)

  • be able to align your automated evaluations with human feedback using minimal human input
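
To make the last two outcomes concrete, here is a minimal sketch of the pattern the course builds up to: an LLM judge that scores (question, answer) pairs, whose verdicts are then compared against human annotations using Cohen's kappa. The prompt, model name (gpt-4o-mini), and data below are illustrative assumptions rather than the course's actual notebook code, and the sketch assumes the openai and scikit-learn packages are installed.

    from openai import OpenAI
    from sklearn.metrics import cohen_kappa_score

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    JUDGE_PROMPT = (
        "You are a strict evaluator. Given a question and an answer, reply "
        "with exactly one word: PASS if the answer is correct and complete, "
        "FAIL otherwise.\n\nQuestion: {question}\nAnswer: {answer}"
    )

    def judge(question: str, answer: str) -> int:
        """Score one (question, answer) pair with the judge: 1 = PASS, 0 = FAIL."""
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # assumption: any capable chat model works here
            messages=[{
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }],
            temperature=0,
        )
        return 1 if "PASS" in response.choices[0].message.content.upper() else 0

    # Hypothetical evaluation data: (question, answer) pairs plus human verdicts.
    dataset = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Berlin"),
        ("Name a prime number.", "7"),
    ]
    human_labels = [1, 0, 1]

    judge_labels = [judge(q, a) for q, a in dataset]

    # Alignment between judge and human: Cohen's kappa corrects raw agreement
    # for agreement expected by chance (1.0 = perfect, 0.0 = chance level).
    print("Cohen's kappa:", cohen_kappa_score(human_labels, judge_labels))

A kappa close to 1.0 means the judge agrees with human annotators well beyond chance; the alignment chapters of the course cover how to measure this and iterate on the judge when agreement is low.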

Your course instructors

Ayush Thakur

ML Engineer @ Weights & Biases

Ayush Thakur is an ML Engineer at Weights & Biases and a Google Developer Expert in Machine Learning (TensorFlow). He is interested in everything related to computer vision and representation learning. For the past two years he has been working with LLMs, covering RLHF and the how and what of building LLM-based systems.

Anish Shah

ML Engineer @ Weights & Biases

Anish loves turning ML ideas into ML products. Anish started his career working with multiple Data Science teams within SAP, working with traditional ML, deep learning, and recommendation systems before landing at Weights & Biases. With the art of programming and a little bit of magic, Anish crafts ML projects to help better serve our customers, turning “oh nos” to “a-ha”s!

FAQ

  • Any prerequisites?

    To best take advantage of the course, we recommend familiarity with Python and basic experience using LLMs. If you have never built with LLMs, we recommend checking out our Building LLM powered apps course first.
