Tags: burmese-coder, myanmar llm, code generation, qlora, gemma, mlx, gguf

Burmese-Coder-4B: Training a Code LLM for Myanmar Developers

Article Summary

Burmese-Coder-4B is a 4B-parameter Burmese coding assistant by Dr. Wai Yan Nyein Naing. It is adapted from the Gemma-3 4B family through supervised fine-tuning on Burmese MBPP (974 tasks), followed by a DPO alignment stage that reduces mixed-language drift. Evaluation follows the two-track pipeline described in the technical whitepaper: Pass@1 plus LLM-as-a-judge rubric scoring with Gemini 2.5 and DeepSeek V3.


Introduction

When a Burmese-speaking developer asks their AI assistant “Python ဖြင့် list ကို sort လုပ်နည်း” (How do I sort a list in Python?), the response should be more than code: it should include a clear explanation in Burmese.

Burmese-Coder-4B was built to make this possible.


Architecture & Training

Base Model

Burmese-Coder-4B is adapted from the Gemma-3 4B family. The goal is not only “correct code”, but also stable Burmese explanations without mixed-script contamination.
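To make "mixed-script contamination" concrete, here is a minimal sketch of a script check that flags letters that are neither ASCII nor Myanmar script. This is an illustration only, not the method used in the whitepaper (which relies on LLM judges); the function names are hypothetical, and the code-point ranges are the Unicode Myanmar, Myanmar Extended-A, and Myanmar Extended-B blocks.

```python
import unicodedata

# Unicode blocks: Myanmar, Myanmar Extended-A, Myanmar Extended-B.
MYANMAR_RANGES = [(0x1000, 0x109F), (0xAA60, 0xAA7F), (0xA9E0, 0xA9FF)]


def is_myanmar(ch: str) -> bool:
    """True if the character falls inside one of the Myanmar blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in MYANMAR_RANGES)


def foreign_letters(text: str) -> list:
    """Return letters that are neither ASCII nor Myanmar script.

    Hypothetical helper: ASCII is allowed because code and identifiers
    are expected to stay in Latin script inside a Burmese explanation.
    """
    found = []
    for ch in text:
        if ch.isascii() or is_myanmar(ch):
            continue
        # Count only letters; punctuation, symbols, and combining marks are skipped.
        if unicodedata.category(ch).startswith("L"):
            found.append(ch)
    return found


clean = "Python ဖြင့် list ကို sort လုပ်နည်း"
drifted = "list ကို 列表"  # a CJK word leaking into a Burmese sentence
print(foreign_letters(clean))    # []
print(foreign_letters(drifted))  # ['列', '表']
```

A check like this can only catch script-level leakage; it says nothing about fluency or terminology, which is why the evaluation described later uses judge models.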

Two-Stage Adaptation: SFT → DPO

[Figure: SFT to DPO training pipeline for Burmese-Coder-4B]

The technical whitepaper describes a two-stage pipeline:

Stage 1 — Supervised Fine-Tuning (SFT) using LoRA (4-bit loading):

  • Context length: 2048
  • LoRA rank: 16
  • LoRA alpha: 16
  • Batch size: 16
  • Gradient accumulation: 4
  • Learning rate: 2e-4
  • Warmup steps: 15
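
The Stage 1 settings can be collected into a small config object to see what they imply together. This is a sketch, not the project's training script: the class and field names are hypothetical, and the effective batch size assumes the listed batch size is per optimizer micro-step on a single device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SftSettings:
    """Stage 1 hyperparameters as listed above (field names are illustrative)."""
    max_seq_length: int = 2048
    lora_rank: int = 16
    lora_alpha: int = 16
    batch_size: int = 16
    gradient_accumulation_steps: int = 4
    learning_rate: float = 2e-4
    warmup_steps: int = 15
    load_in_4bit: bool = True

    @property
    def effective_batch_size(self) -> int:
        # Examples contributing to each optimizer update (single-device assumption).
        return self.batch_size * self.gradient_accumulation_steps

    @property
    def lora_scaling(self) -> float:
        # Standard LoRA multiplies the low-rank update by alpha / rank.
        return self.lora_alpha / self.lora_rank


sft = SftSettings()
print(sft.effective_batch_size)  # 64
print(sft.lora_scaling)          # 1.0
```

With rank and alpha both at 16, the LoRA update is applied at a scaling factor of 1.0, so the learning rate is the main knob controlling how strongly the adapter moves the base model.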

Stage 2 — Direct Preference Optimization (DPO) alignment (to reduce multilingual drift):

  • Context length: 2048
  • 4-bit loading
  • Batch size: 2
  • Gradient accumulation: 16
  • Learning rate: 3e-6
  • Max steps: 300
  • β: 0.5
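
For readers unfamiliar with the role of β, the standard DPO objective for a single preference pair can be written in a few lines. This is a sketch of the published DPO loss, not code from the whitepaper; the argument names are illustrative, and the inputs are summed log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.5) -> float:
    """Per-pair DPO loss: -log(sigmoid(beta * margin))."""
    # How much more the policy prefers the chosen answer than the reference does.
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # Numerically stable evaluation of -log(sigmoid(z)).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


# Policy identical to the reference: the loss equals log(2).
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy favors the chosen (e.g. clean Burmese) response: the loss falls below log(2).
print(dpo_loss(-8.0, -12.0, -10.0, -10.0))
```

A larger β penalizes deviation from the reference model more heavily for the same preference margin. With batch size 2 and gradient accumulation 16, each of the 300 steps would see 32 preference pairs, assuming a single device.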

Training Data (Authoritative)

Per the whitepaper, the supervised training corpus is Burmese MBPP (974 tasks). Each instance includes:

  • a Burmese instruction
  • a Python solution
  • a Burmese explanation of the code logic
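
One way to picture a training instance is as a three-field record that is flattened into a prompt/response pair for SFT. The sketch below is an assumption about shape only: the class name, field names, and the response layout (code first, then explanation) are hypothetical and are not taken from the whitepaper or the dataset files.

```python
from dataclasses import dataclass


@dataclass
class BurmeseMbppExample:
    """One supervised instance (field names are illustrative)."""
    instruction_my: str   # Burmese instruction
    solution_py: str      # Python solution
    explanation_my: str   # Burmese explanation of the code logic


def to_sft_pair(ex: BurmeseMbppExample) -> dict:
    """Flatten a record into a prompt/response pair (assumed layout)."""
    response = ex.solution_py.strip() + "\n\n" + ex.explanation_my.strip()
    return {"prompt": ex.instruction_my.strip(), "response": response}


example = BurmeseMbppExample(
    instruction_my="Python ဖြင့် list ကို sort လုပ်နည်း",
    solution_py="def sort_list(items):\n    return sorted(items)\n",
    explanation_my="<Burmese explanation of the code logic goes here>",
)
pair = to_sft_pair(example)
print(pair["prompt"])
print(pair["response"])
```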

For held-out functional evaluation, the benchmark is Burmese HumanEval (run via the burmese-coding-eval pipeline).


Evaluation: burmese-coding-eval (Two-Track)

The whitepaper uses a two-track evaluation framework:

Metrics

Track       | Metric       | Description
------------|--------------|--------------------------------------------------
Functional  | Pass@1       | Unit-test correctness on Burmese HumanEval
Judge-based | Rubric score | LLM-as-a-judge scoring across multiple dimensions

The judge-based rubric evaluates: fluency, instruction following, semantic correctness, terminology quality, and a mixed-language penalty. Rubric scores are reported with two separate judges: Gemini 2.5 and DeepSeek V3.
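
The functional track can be sketched as follows: with one generated solution per problem, Pass@1 is simply the percentage of problems whose solution passes its unit tests. This is a simplified illustration, not the burmese-coding-eval implementation; the function names are hypothetical, and a real harness would run untrusted code in an isolated process with a timeout rather than calling `exec` directly.

```python
def passes(candidate_src: str, test_src: str) -> bool:
    """Run the candidate code, then its unit tests, in a fresh namespace."""
    env = {}
    try:
        exec(candidate_src, env)  # define the candidate function(s)
        exec(test_src, env)       # assertions raise on failure
    except Exception:
        return False
    return True


def pass_at_1(outcomes: list) -> float:
    """Percentage of problems solved, given one sample per problem."""
    if not outcomes:
        raise ValueError("no outcomes to score")
    return 100.0 * sum(outcomes) / len(outcomes)


good = "def add(a, b):\n    return a + b\n"
bad = "def add(a, b):\n    return a - b\n"
tests = "assert add(1, 2) == 3\n"

outcomes = [passes(good, tests), passes(bad, tests), passes(good, tests), passes(good, tests)]
print(pass_at_1(outcomes))  # 75.0
```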

Main Results (from the whitepaper)

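Model            | Pass@1 (%) | Rubric (DeepSeek) | Rubric (Gemini)
-----------------|------------|-------------------|----------------
burmese-coder-4b | 62.0       | 3.456             | 3.779
gemma3_4b        | 62.0       | 2.939             | 3.203
qwen2.5_3b       | 45.0       | 1.220             | 2.526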
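
Reading the rows against the Gemma-3 4B baseline makes the pattern easier to see: Pass@1 is unchanged while both rubric scores rise. The short sketch below only does arithmetic on the reported numbers; the dictionary keys and the `delta` helper are illustrative names.

```python
# Values copied from the results table above.
results = {
    "burmese-coder-4b": {"pass@1": 62.0, "deepseek": 3.456, "gemini": 3.779},
    "gemma3_4b":        {"pass@1": 62.0, "deepseek": 2.939, "gemini": 3.203},
    "qwen2.5_3b":       {"pass@1": 45.0, "deepseek": 1.220, "gemini": 2.526},
}


def delta(metric: str, model: str = "burmese-coder-4b", baseline: str = "gemma3_4b") -> float:
    """Absolute difference between a model and the baseline on one metric."""
    return round(results[model][metric] - results[baseline][metric], 3)


print(delta("pass@1"))    # 0.0
print(delta("deepseek"))  # 0.517
print(delta("gemini"))    # 0.576
```

In other words, the reported gain over the base model is in the quality of the Burmese output as scored by the judges, not in functional correctness.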

Mixed-language contamination drops sharply after alignment:

  • Under Gemini: 0.69 → 0.02
  • Under DeepSeek: 0.72 → 0.09
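
Expressed as relative reductions, the two judges agree on the direction and roughly on the size of the drop. The helper below is a plain percentage calculation on the figures above; its name is illustrative.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


print(round(relative_reduction(0.69, 0.02), 1))  # 97.1 (Gemini)
print(round(relative_reduction(0.72, 0.09), 1))  # 87.5 (DeepSeek)
```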

Where to Get It

For the main project links and the latest release artifacts, use the project pages:


Impact

Burmese-Coder-4B represents the first serious attempt to bridge the gap between:

  • Modern code generation AI (GitHub Copilot, Claude, GPT-4)
  • Myanmar-language developers who think, document, and communicate in Burmese

By making LLMs accessible in the Myanmar language, we lower the barrier for thousands of developers who would otherwise have to use AI assistance exclusively in English.





Keywords: Burmese-Coder-4B, Myanmar LLM, Gemma fine-tuning, QLoRA, Burmese code generation, Myanmar developer tools, WYNN747, Dr. Wai Yan Nyein Naing
