Nano Course: Building Large Language Models for Code
In this free Nano GenAI Course on Building Large Language Models for Code, you will:
- Learn how to train LLMs for code from scratch, covering training data curation, data preparation, model architecture, training, and evaluation frameworks.
- Explore each step in depth, delving into the algorithms and techniques used to create StarCoder, a 15B-parameter code generation model trained on 80+ programming languages.
- Learn the best practices for training your own StarCoder model on your data.
Key Takeaways from the “Nano Course: Building Large Language Models for Code”
- Learn how to train LLMs for code from scratch
- Take a deep dive into the StarCoder journey
- Understand the algorithms and techniques used at each step in the development of StarCoder
- Learn best practices for training your own StarCoder model on your data
- Explore the model architecture, training, and evaluation frameworks for code LLMs
Course curriculum
1. Getting Started with Code LLMs
- Introduction
- Agenda
- BigCode Community
- Quiz
2. Data Curation and Preparation
- Training LLMs for Code from Scratch: Training Data Curation
- Training Data Formatting and Preprocessing
- Model Architecture
- Quiz
3. Ecosystem and Infrastructure
- BigCode Ecosystem
- Training Frameworks
- Model Evaluation
- Quiz
4. Beyond the Model
- Tools and Descendants of StarCoder
Instructor
Loubna Ben Allal, ML Engineer at Hugging Face
