Introducing North Mini Code: Cohere's First Model For Developers

Overview

Cohere released North Mini Code, its first model built specifically for software developers and agentic coding work. The open-weight model packs 30 billion total parameters but activates only 3 billion per query through a Mixture-of-Experts design, and it ships free under the permissive Apache 2.0 license. It scores 80.2% pass@10 on the SWE-Bench Verified benchmark, beating several models four times its size while running on a single H100 GPU.

Key Takeaways

North Mini Code is Cohere's first model aimed at developers, built for writing code, fixing bugs across whole repositories, and running terminal-based agent workflows.
The Mixture-of-Experts design uses 30 billion total parameters with only 3 billion active at a time, routing each token to a small set of specialized expert networks to keep running costs low.
It is open-weight under the Apache 2.0 license and available on Hugging Face, through the Cohere API, and with OpenCode integration.
On SWE-Bench Verified it reaches 80.2% pass@10, outperforming larger systems such as Nemotron 3 Super and Mistral Small 4 that carry over 100 billion parameters.
Cohere reports up to 2.8 times higher output throughput than the competing Devstral Small 2 model, lowering the per-task cost of running coding agents at scale.
The model runs on a single H100 GPU when served in its FP8 quantized form, putting enterprise-grade coding within reach of modest hardware.

Stats & Key Facts

›30 billion total parameters, with only 3 billion active per query through Mixture-of-Experts routing.
›128 expert networks, with 8 activated per token, for scale without proportional compute cost.
›80.2% pass@10 on SWE-Bench Verified, a leading result for the model's size class.
›Trained on more than 70,000 verifiable tasks drawn from roughly 5,000 unique code repositories.
›Up to 2.8 times higher output throughput than Devstral Small 2, reducing cost per coding task.
›Reinforcement learning added 7.9 percentage points on Terminal-Bench v2 and 3.0 points on SWE-Bench over the fine-tuned checkpoint.

Cohere's First Model Built for Developers and Agentic Coding

North Mini Code marks Cohere's entry into the developer-tools market.

North Mini Code is the first in a new family of Cohere models designed for software engineering work. The company built it for tasks such as writing code, fixing bugs across whole repositories, and running terminal-based agent workflows where the model takes multiple steps on its own.

Cohere positions the release around accessibility and what it calls sovereign AI, meaning organizations run the model on their own terms rather than depending on a closed service. The model was published on June 9, 2026, and is open to teams that want to inspect, modify, and self-host it.

How the 30B Mixture-of-Experts Design Cuts Compute Cost

The architecture is the reason a small-footprint model performs like a much larger one.

›30 billion total parameters, but only 3 billion active for any single query.
›128 expert networks, with 8 routed in per token by a sigmoid router.
›Interleaved sliding-window and global attention in a 3-to-1 ratio, using RoPE positioning.
›SwiGLU activation inside the feed-forward blocks.
›A context window of 128,000 tokens, large enough to hold sizeable codebases in working memory.
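
The 3-to-1 interleaving in the list above can be pictured with a short sketch. It is illustrative only: the layer count, the window size, and the position of the global layer within each group of four are assumptions, since the source gives just the ratio.

```python
from typing import List


def layer_pattern(num_layers: int, local_per_global: int = 3) -> List[str]:
    """Label layers so that every group is `local_per_global` sliding layers
    followed by one global layer (the ordering is an assumption)."""
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "sliding"
            for i in range(num_layers)]


def can_attend(query_pos: int, key_pos: int, kind: str, window: int) -> bool:
    """Causal visibility rule for a single query/key pair."""
    if key_pos > query_pos:
        return False  # causal: never look at future tokens
    if kind == "global":
        return True   # global layers see the whole prefix
    return query_pos - key_pos < window  # sliding layers see only recent tokens


# Hypothetical 8-layer stack, just to show the repeating pattern.
pattern = layer_pattern(8)
print(pattern)

# Hypothetical window of 4 tokens: token 10 cannot see token 2 in a
# sliding layer, but can in a global layer.
print(can_attend(10, 2, "sliding", window=4), can_attend(10, 2, "global", window=4))
```

The design trade-off is that sliding layers keep attention cost bounded by the window, while the occasional global layer lets information travel across the full context.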

A Mixture-of-Experts model splits its knowledge across many specialized sub-networks and routes each token to only a few of them. This keeps the per-token compute close to that of a 3 billion parameter model while preserving the breadth of a 30 billion parameter one.
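
As a rough illustration of that routing step, the sketch below scores 128 experts with a sigmoid gate and keeps the top 8 for one token. Random logits stand in for a learned router, and renormalizing the selected gates so they sum to 1 is an assumption; the source does not describe how Cohere combines expert outputs.

```python
import math
import random
from typing import List, Tuple

NUM_EXPERTS = 128  # from the article
TOP_K = 8          # from the article


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def route(logits: List[float], top_k: int = TOP_K) -> List[Tuple[int, float]]:
    """Return (expert_index, weight) pairs for the top_k experts.

    Each expert gets an independent sigmoid score; the selected scores are
    then renormalized to sum to 1 (an assumed convention, not Cohere's spec).
    """
    scores = [sigmoid(z) for z in logits]
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    total = sum(scores[i] for i in chosen)
    return [(i, scores[i] / total) for i in chosen]


# Stand-in router logits for a single token (a real router computes these
# from the token's hidden state).
rng = random.Random(0)
logits = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

assignment = route(logits)
for expert, weight in assignment:
    print(f"expert {expert:3d} weight {weight:.3f}")
```

Only the eight selected experts run their feed-forward computation for this token; the other 120 are skipped, which is where the compute savings come from.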

Benchmark Scores That Beat Models Four Times Its Size

Cohere reports results across several widely used coding benchmarks.

›80.2% pass@10 on SWE-Bench Verified, which tests fixing real GitHub issues.
›61.0% pass@1 on the Mini-SWE-Agent test.
›A 7.9 percentage point gain from reinforcement learning on Terminal-Bench v2, which checks command-line agent skill.
›33.4 on the Artificial Analysis Coding Index, ahead of Qwen3.5, Gemma 4, and Devstral Small 2.

Cohere states the model outperforms substantially larger systems, including Nemotron 3 Super at roughly 120 billion parameters, Mistral Small 4 at about 119 billion parameters, and Devstral 2 at 123 billion parameters. The takeaway for buyers is that raw parameter count no longer predicts coding ability the way it once did.
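
The pass@1 and pass@10 figures above differ in how many attempts the model gets per task, so they are not directly comparable. A commonly used unbiased estimator, given n sampled attempts of which c pass, is 1 - C(n-c, k) / C(n, k). The sketch below implements it; whether Cohere computed its numbers this way is not stated in the source, and the sample counts are made up for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k attempts passes,
    given n sampled attempts of which c passed."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # too few failures to fill k slots, so some attempt passes
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 10 attempts sampled, 3 of them pass the tests.
single = pass_at_k(n=10, c=3, k=1)
best_of_ten = pass_at_k(n=10, c=3, k=10)
print(f"pass@1  = {single:.2f}")
print(f"pass@10 = {best_of_ten:.2f}")
```

A task the model solves only some of the time still counts fully toward pass@10, which is why pass@10 scores run higher than pass@1 scores for the same model.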

Efficiency and Throughput as the Business Case

For companies running agents at scale, speed and cost matter as much as accuracy.

›Up to 2.8 times higher output throughput than Devstral Small 2.
›Runs on a single H100 GPU when served in FP8 form.
›Both BF16 and FP8 quantized versions are published on Hugging Face.
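
A back-of-the-envelope weight-memory estimate shows why the FP8 version is the one paired with a single GPU. The sketch assumes the 80 GB H100 variant and counts weights only; KV cache, activations, and runtime overhead are not modeled, so treat the leftover figure as headroom rather than a guarantee.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}
TOTAL_PARAMS = 30e9     # from the article
H100_MEMORY_GB = 80     # assumption: the 80 GB H100 variant


def weight_gb(total_params: float, dtype: str) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("bf16", "fp8"):
    gb = weight_gb(TOTAL_PARAMS, dtype)
    headroom = H100_MEMORY_GB - gb
    print(f"{dtype}: {gb:.0f} GB of weights, {headroom:.0f} GB left for KV cache and activations")
```

Although only 3 billion parameters are active per token, all 30 billion typically need to be resident in memory, because any expert may be selected for the next token. Halving the bytes per parameter is what frees room for long-context KV cache on one card.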

Coding agents often loop through many model calls to plan, edit, test, and retry. Higher throughput means each loop finishes faster and costs less, so the savings compound across a team's daily workload. Running on one accelerator also lowers the barrier for teams without large GPU clusters.
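
The compounding effect can be made concrete with simple arithmetic. In the sketch below the baseline throughput, the number of calls per task, and the tokens per call are all hypothetical; only the 2.8 times factor comes from Cohere's claim, and it is an upper bound ("up to"). Prompt processing and tool execution time are ignored.

```python
def task_seconds(calls: int, tokens_per_call: int, tokens_per_second: float) -> float:
    """Time spent generating output tokens for one agent task."""
    return calls * tokens_per_call / tokens_per_second


BASELINE_TPS = 100.0  # hypothetical output tokens per second
SPEEDUP = 2.8         # best-case factor reported by Cohere
CALLS = 40            # hypothetical plan/edit/test/retry loop length
TOKENS_PER_CALL = 500 # hypothetical output size per call

base = task_seconds(CALLS, TOKENS_PER_CALL, BASELINE_TPS)
fast = task_seconds(CALLS, TOKENS_PER_CALL, BASELINE_TPS * SPEEDUP)
saved_fraction = 1 - fast / base

print(f"baseline: {base:.0f} s per task")
print(f"faster:   {fast:.0f} s per task")
print(f"generation time saved: {saved_fraction:.0%}")
```

The saved fraction is 1 - 1/2.8 regardless of the other numbers chosen, so in the best case roughly two thirds of generation time disappears from every task.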

How Cohere Trained the Model on Real Repositories

Training combined supervised learning with reinforcement learning across two stages.

›More than 70,000 verifiable tasks drawn from roughly 5,000 unique code repositories.
›Two stages of supervised fine-tuning, with code making up 61% of trainable tokens in the second stage.
›A second-stage data mixture of 4.5 billion tokens.
›Context length grew from 64,000 tokens in stage one to 128,000 tokens in stage two.

A reinforcement learning stage, run across both terminal and software engineering environments, sharpened the model after fine-tuning. That stage added 7.9 percentage points on Terminal-Bench v2 and 3.0 points on SWE-Bench over the fine-tuned checkpoint.

Human Evaluation and Open License Terms

Cohere backed the benchmark numbers with human review and a permissive release.

In a human evaluation across 85 code-editing samples, the reinforcement-learning version of the model won 66.1% of head-to-head comparisons against the earlier fine-tuned checkpoint. That gives a second signal, beyond automated benchmarks, that the final training stage produced edits people prefer.
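
With only 85 samples, it is worth checking how much uncertainty surrounds a 66.1% win rate. The sketch below computes a 95% Wilson score interval, treating the 85 samples as independent comparisons; that is an assumption, since the source does not say how ties or multiple raters were handled.

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion observed over n trials."""
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(0.661, 85)
print(f"95% interval for the win rate: {low:.3f} to {high:.3f}")
```

Under these assumptions the lower bound stays above 50%, which is consistent with a real preference for the reinforcement-learning version, though the interval is wide.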

The model is released under Apache 2.0, one of the most permissive open licenses, which allows commercial use, modification, and redistribution. It is available on Hugging Face for self-hosting, through the Cohere API for managed access, and with OpenCode integration for agent workflows.

Frequently Asked Questions

What is North Mini Code?

It is Cohere's first model built for software developers, designed for agentic coding tasks such as writing code, fixing bugs across repositories, and running terminal-based agent workflows. It is open-weight under the Apache 2.0 license.

How does a 30B model beat models with over 100 billion parameters?

North Mini Code uses a Mixture-of-Experts design, so only 3 billion of its 30 billion parameters activate per query. Cohere paired that efficient architecture with training focused tightly on coding tasks, which lets it score 80.2% pass@10 on SWE-Bench Verified and outperform larger general models.

Where can businesses get and run the model?

It is available on Hugging Face in BF16 and FP8 versions for self-hosting, through the Cohere API for managed access, and with OpenCode integration. The FP8 version runs on a single H100 GPU.

What does the Apache 2.0 license allow?

Apache 2.0 is a permissive open license that allows commercial use, modification, and redistribution of the model weights. Teams can inspect and self-host the model rather than depending on a closed service.

How was the model trained?

Cohere used two stages of supervised fine-tuning followed by reinforcement learning, drawing on more than 70,000 verifiable tasks from roughly 5,000 code repositories. The reinforcement learning stage added 7.9 percentage points on Terminal-Bench v2 and 3.0 points on SWE-Bench over the fine-tuned checkpoint.

North Mini Code shows that focused training and an efficient Mixture-of-Experts design let a small model match coding results from systems many times its size. For business teams, the open license and single-GPU footprint make capable coding agents practical to run in-house.