Nano Course: Building Large Language Models for Code

In this free Nano GenAI course on Building Large Language Models for Code, you will:

  • Learn how to train LLMs for code from scratch, covering training data curation, data preparation, model architecture, training, and evaluation frameworks.

  • Explore each step in-depth, delving into the algorithms and techniques used to create StarCoder, a 15B code generation model trained on 80+ programming languages.

  • Learn the best practices for training your own StarCoder model on your own data.


  • Duration: 38 mins

  • Rating: 4.7

  • Level: Intermediate

Key Takeaways from the “Nano Course: Building Large Language Models for Code”

  • Learn how to train LLMs for code from scratch

  • Take a deep dive into the StarCoder journey

  • Understand the algorithms and techniques used at each step of StarCoder's development

  • Learn best practices for training your own StarCoder model on your own data

  • Explore the model architecture, training, and evaluation frameworks for code LLMs

Course curriculum

  • 1
    Building Large Language Models for Code
    • Introduction
    • Agenda
    • BigCode Community
    • Training LLMs for Code from Scratch: Training Data Curation
    • Training Data Formatting and Preprocessing
    • Model Architecture
    • BigCode Ecosystem
    • Training Frameworks
    • Model Evaluation
    • Tools and Descendants of StarCoder

Instructor

Loubna Ben Allal, ML Engineer at Hugging Face

Loubna Ben Allal is a Machine Learning Engineer at Hugging Face, where she works on LLMs for code. She is part of the BigCode core team that released The Stack dataset and the SantaCoder and StarCoder models.