## Release Plan for v0.2.0

* Release Manager: TBD
* Code Freeze Date: TBD
* Test Verification/Bug Bash:
* Release Date:
* Release Note:

-----

## Features (P0)

- [x] Explicit warp specialize
- [ ] Tile scheduler
- [x] Support transposeB=False for ROCm
- [x] Correctness evaluation
- [x] Layout swizzling

## Kernels (P0)

- [x] Implement Flash MLA kernel
  - [x] Init version
  - [x] Optimize to SoTA
  - [x] MI300
- [x] Implement NSA kernel
  - [x] Init version
  - [x] Decoding
  - [x] Varlen
  - [x] Fuse top-k
  - [x] Bwd
  - [x] MI300
- [x] Implement Flash SeerAttention
  - [x] Init version
  - [x] Different q/kv sequence lengths
  - [x] Varlen
  - [x] Bwd
- [x] Optimize TileLang Flash Attention kernel to SoTA
  - [x] H100
  - [x] MI300
- [ ] Complete support for commonly used attributes in Flash Attention
  - [x] Varlen
  - [ ] Mask/bias
  - [ ] List all supported dims (benchmark)
  - [ ] FA3 dim 256 fwd + bwd
  - [ ] FA3 bwd (64, 128)

-----

## Backends (#56)

- [x] Pass and migrate CI to H100
- [ ] Fix fp16 x fp4 dequant: `testing/python/kernel/test_tilelang_kernel_dequantize_gemm.py::test_simple_impl_float16xfp4_gemm`
- [ ] Fix TMA load for float32: `testing/python/kernel/test_tilelang_kernel_gemm.py::test_gemm_f32f32f32_nn`
- [x] Add support for WebGPU
- [ ] Add support for Metal
- [ ] Add support for Hexagon

## Kernels

- [ ] Compare with DeepGEMM
- [ ] End-to-end example: kernel development flow
- [ ] Support FP8/INT8 `T.gemm`
- [ ] Add examples to CI tests
- [ ] Optimize TileLang Flash Attention kernel to SoTA on A100

## Features

- [x] Nightly build
- [ ] Update API: replace all `tilelang.lower` calls with `tilelang.compile` in examples and tests
- [ ] Reduce LLVM dependencies
- [ ] Provide prebuilt and PyPI packages for ROCm platforms
- [ ] Integrate TileLang with Torch Inductor
- [ ] Configure API access levels to enable advanced features

## Cost Model

- [x] Integrate Cost Model Carver into auto-tuning