burmese-coding-eval
A benchmark and dataset collection for Burmese programming assistants.
The Evaluation Suite
Properly measuring the safety and accuracy of language models requires rigorous benchmarks. burmese-coding-eval is a specialized multi-track framework that evaluates code generated by AI models from Burmese prompts for code correctness, linguistic coherence, and cultural appropriateness.
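The code-correctness track can be pictured as functional testing in the MBPP/HumanEval style: the model's completion is executed and then checked with assertions. The sketch below is illustrative only. It assumes each problem carries a Burmese prompt and a list of `assert` statements. The `Problem` record, its field names, and the `check_candidate` helper are hypothetical and are not the project's actual schema or API. A real harness would also sandbox execution instead of calling `exec()` in-process.

```python
from dataclasses import dataclass, field


@dataclass
class Problem:
    """Hypothetical MBPP-style record; not the project's actual schema."""
    task_id: str
    prompt_my: str  # problem statement in Burmese
    tests: list[str] = field(default_factory=list)  # assert statements


def check_candidate(problem: Problem, candidate_code: str) -> bool:
    """Return True if candidate_code passes every test assertion.

    WARNING: exec() runs untrusted model output in this process. A real
    harness should isolate it (separate process, resource limits, timeout).
    """
    namespace: dict = {}
    try:
        exec(candidate_code, namespace)  # define the candidate's functions
        for test in problem.tests:
            exec(test, namespace)  # raises AssertionError on failure
    except Exception:
        # Syntax errors, runtime errors, and failed asserts all count as a fail.
        return False
    return True


# Illustrative item; the Burmese prompt reads roughly
# "Write an add function that returns the sum of two numbers."
problem = Problem(
    task_id="example/0",
    prompt_my="ကိန်းနှစ်ခု၏ ပေါင်းလဒ်ကို ပြန်ပေးသော add ဖန်ရှင်ကို ရေးပါ။",
    tests=["assert add(2, 3) == 5", "assert add(-1, 1) == 0"],
)
print(check_candidate(problem, "def add(a, b):\n    return a + b"))  # True
print(check_candidate(problem, "def add(a, b):\n    return a - b"))  # False
```

Treating any exception as a failure keeps the check binary, which is the form that pass-rate metrics expect.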
Core Datasets
- burmese-mbpp: A localized, translated, and culturally aligned variant of the Mostly Basic Python Problems dataset.
- burmese-human-eval: A Burmese-language adaptation of HumanEval, the standard benchmark of hand-written Python function-synthesis problems, each checked by unit tests.
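MBPP and HumanEval are conventionally scored with pass@k, the probability that at least one of k sampled completions passes all of a problem's tests. This page does not say whether burmese-coding-eval reports exactly this metric, so treat that as an assumption. The sketch below implements the unbiased estimator introduced with HumanEval. It uses n samples per problem, c of which pass, and computes 1 - C(n - c, k) / C(n, k). The function names are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that passed, k: sample budget.
    Computes 1 - C(n - c, k) / C(n, k).
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing completion.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Three hypothetical problems with 10 samples each.
results = [(10, 3), (10, 0), (10, 10)]
print(round(mean_pass_at_k(results, k=1), 4))  # 0.4333
```

Averaging per-problem estimates, rather than pooling all samples, gives every problem equal weight in the final score.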
Impact
By standardizing how AI performance is measured in Burmese (Myanmar), burmese-coding-eval speeds up the development of local AI coding assistants and makes them more reliable. It lets researchers compare models objectively and refine model architectures against empirical linguistic criteria.
How this benchmark connects to related resources
The benchmark page sits alongside the model it evaluates, the base model, the technical white paper, and the source repository. The table lists each resource and why it matters.
| Item | Source | Why it matters |
|---|---|---|
| Benchmark page | burmese-coding-eval | Central page describing the datasets and evaluation suite. |
| Model under evaluation | Burmese-Coder-4B | The code model this benchmark directly evaluates. |
| Base model reference | Burmese GPT | Explains the language foundation behind the benchmarked model family. |
| Technical white paper | | Documents the benchmark design and evaluation methodology. |
| Source repository | GitHub | Provides the implementation source for the benchmark suite. |