This repository is a hands-on study project to build deep learning parallelism from scratch.
Inspired by the Ultrascale Playbook, the goal is not just to use distributed training libraries, but to understand and implement them step by step.
- Understand how modern large-scale training works
- Implement core parallelism techniques manually
- Share the learning process with others
Recommended learning order: dp1 -> dp2 -> dp3 -> zero1 -> zero2 -> fsdp -> device_mesh
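The starting point of that progression is plain data parallelism: each worker runs the same model on its own shard of the batch and gradients are averaged with an all-reduce before the update. Below is a minimal pure-Python sketch of that idea (the repository's files presumably use `torch.distributed`; this simulation, including the `local_gradients` and `all_reduce_mean` helpers, is illustrative only):

```python
# Toy simulation of data parallelism: each worker computes gradients on
# its own batch shard; an all-reduce averages them so replicas stay in sync.

def local_gradients(weights, batch):
    # Gradient of mean squared error for y = w * x on this worker's shard.
    grads = [0.0] * len(weights)
    for x, y in batch:
        for i, w in enumerate(weights):
            grads[i] += 2 * (w * x - y) * x / len(batch)
    return grads

def all_reduce_mean(per_worker_grads):
    # Element-wise average across workers (stand-in for a real all-reduce).
    n = len(per_worker_grads)
    return [sum(g[i] for g in per_worker_grads) / n
            for i in range(len(per_worker_grads[0]))]

# Two workers, each holding a different shard of the global batch.
weights = [0.5]
shards = [[(1.0, 2.0)], [(2.0, 4.0)]]
grads = all_reduce_mean([local_gradients(weights, s) for s in shards])
weights = [w - 0.1 * g for w, g in zip(weights, grads)]
```

Because every replica applies the same averaged gradient, all copies of the model remain identical after each step, which is the invariant the later stages (ZeRO, FSDP) relax to save memory.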
data_parallelism/
├── dp1.py
├── dp2.py
├── dp3.py
├── zero1.py
├── zero2.py
├── fsdp.py
├── device_mesh.py
├── dp_benchmark.py
└── utils.py
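As one example of where the progression leads, ZeRO stage 1 keeps full parameters and gradients on every rank but shards the optimizer state, so each rank updates only its slice before an all-gather restores the full parameters. A toy sketch of that idea (what `zero1.py` actually does is assumed; `shard_bounds` and `zero1_step` are hypothetical names):

```python
# Toy sketch of ZeRO stage 1: parameters and gradients are replicated,
# but optimizer state (here, SGD momentum) is sharded across ranks.

WORLD_SIZE = 2

def shard_bounds(n_params, rank):
    # Contiguous slice of the parameter list owned by this rank.
    per_rank = (n_params + WORLD_SIZE - 1) // WORLD_SIZE
    return rank * per_rank, min((rank + 1) * per_rank, n_params)

def zero1_step(params, grads, momenta, lr=0.1, beta=0.9):
    # Each rank holds momentum only for its shard and updates that slice.
    updated = list(params)
    for rank in range(WORLD_SIZE):
        lo, hi = shard_bounds(len(params), rank)
        for i in range(lo, hi):
            momenta[rank][i - lo] = beta * momenta[rank][i - lo] + grads[i]
            updated[i] = params[i] - lr * momenta[rank][i - lo]
    # All-gather: in real code each rank broadcasts its updated slice;
    # here `updated` already holds the gathered result.
    return updated

params = [1.0, 2.0, 3.0, 4.0]
grads = [0.5, 0.5, 0.5, 0.5]
momenta = [[0.0, 0.0] for _ in range(WORLD_SIZE)]
params = zero1_step(params, grads, momenta)
```

The saving is that each rank stores `1/WORLD_SIZE` of the optimizer state instead of a full copy; ZeRO-2 and FSDP extend the same sharding to gradients and parameters.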
Work in progress. Expect incomplete and evolving code.