What Changed Today
JetBrains released Mellum2 — a 12B parameter Mixture-of-Experts (MoE) model specialized for software engineering tasks. It's open-source (Apache 2.0) and designed to be a fast, local "focal model" inside larger AI pipelines, not a replacement for frontier models like GPT-4 or Claude.
Key specs:
- 12B total params, 2.5B active per token — runs on a single GPU or even CPU with quantization
- MoE architecture: 64 experts, 8 activated per token
- 128K context window — handles large codebases
- Multi-Token Prediction (MTP) head for speculative decoding speedups
- Trained on ~10.6T tokens with a three-phase curriculum shifting from web → code → math
- Six checkpoints released covering the full training run
Source: MarkTechPost, The New Stack
Why This Matters for Developers and Business Owners
1. Local AI Becomes Actually Usable
Previous local coding models were either too small (2B-4B) to be useful, or too large (70B+) to run without expensive hardware. Mellum2 sits in the sweet spot: 12B total, but only 2.5B active per token due to MoE routing. That means:
- Run on a single RTX 3090/4090 (24GB VRAM)
- Or even CPU with 4-bit quantization for lighter tasks
- No API costs, no data leaving your machine
For teams handling proprietary code or working under compliance constraints, this is a game-changer.
2. It's a "Focal Model" — Not Trying to Be Everything
JetBrains is honest about positioning: Mellum2 is a specialized component, not a general-purpose replacement. Use it for:
- Code completion and inline suggestions
- Debugging assistance
- Refactoring and code review
- Function calling and tool use
Pair it with a frontier model for architecture decisions, and Mellum2 for the 80% of daily coding tasks.
3. Cost Math for Business Owners
| Approach | Monthly Cost (10 devs) | Data Control |
|---|---|---|
| GitHub Copilot Pro | $190/mo | Microsoft-hosted |
| Claude Code API | $500-2000/mo (variable) | Anthropic-hosted |
| Mellum2 (local) | $0 (hardware amortized) | Fully local |
| Mellum2 + small cloud GPU | ~$50-100/mo | You control the box |
For bootstrapped teams or agencies with tight margins, local models cut a real monthly expense.
How to Use It
Quick Start with Ollama
# Pull the model (when available on Ollama Hub — expected within days)
ollama pull mellum2:12b
# Or run from HuggingFace with llama.cpp
# Download weights from https://huggingface.co/jetbrains
# Start the server
ollama run mellum2:12b
Integration with JetBrains IDEs
JetBrains will likely integrate Mellum2 into AI Assistant as a local model option. Until then:
# Configure custom model in JetBrains AI Assistant
# Settings → AI Assistant → Custom Model → http://localhost:11434/v1/chat/completions
VS Code / Continue.dev Setup
// .continue/config.json
{
"models": [
{
"title": "Mellum2 Local",
"provider": "ollama",
"model": "mellum2:12b",
"apiBase": "http://localhost:11434"
}
]
}
API Example (Python)
import requests
response = requests.post("http://localhost:11434/api/generate", json={
"model": "mellum2:12b",
"prompt": "Refactor this Python function to use list comprehensions:\n\ndef get_even(nums):\n result = []\n for n in nums:\n if n % 2 == 0:\n result.append(n)\n return result",
"stream": False
})
print(response.json()["response"])
Production Notes / Gotchas
- MoE models need careful quantization — Standard 4-bit quantization may degrade expert routing. Use
Q6_KorQ8_0for critical work. - Context window is 128K but memory scales with it — Long files eat VRAM. Split large files into chunks for review tasks.
- Speculative decoding with MTP head — The built-in draft model can 2x speed, but requires compatible inference engine (llama.cpp dev branch or vLLM).
- Not multimodal — No image input. Stick to text/code tasks only.
- Apache 2.0 = commercial use OK — No attribution required beyond license file, but check your legal team if redistributing.
Bottom Line
Mellum2 is the first practical, open-source coding model that doesn't require enterprise hardware or enterprise budgets. For teams already using JetBrains tools, it's a natural upgrade path. For everyone else, it's proof that the "local AI" promise is finally becoming real.
Try it if: You pay for Copilot/Claude Code and wonder if there's a cheaper way.
Skip it if: You need frontier-level reasoning for architecture or complex debugging — pair it with GPT-4 instead.
Published on Build With Abdallah
Questions? Email: buildwithabdallah@gmail.com