
Nous Research, an open-source artificial intelligence startup backed by the crypto venture firm Paradigm, on Monday released a new competitive programming model that it says matches or exceeds many large proprietary systems, trained in just four days on 48 of NVIDIA's latest B200 graphics processors.
The model, called NousCoder-14B, is another entry in the crowded field of AI coding assistants, but it arrives at a particularly charged moment: Claude Code, the agentic programming tool from rival Anthropic, has dominated social media discussions since New Year's Day, with developers posting breathless appreciations of its abilities. The simultaneous developments underscore how quickly AI-assisted software development is evolving, and how companies large and small are competing to capture what many believe will become a core technology for writing software.
NousCoder-14B achieves an accuracy rate of 67.87% on LiveCodeBench v6, a benchmark that tests models on competitive programming problems published between August 2024 and May 2025. That figure represents a 7.08 percentage point improvement over the base model it was trained from, Alibaba's Qwen3-14B, according to a Nous Research technical report published alongside the release.
"I detailed the problem to Claude Code, he produced in an hour what we had built last year," Written by Jana Duggana principal engineer at Google responsible for the Gemini API, in a viral post on X last week that captured the prevailing mood around AI coding tools. Duggan was describing a distributed agent orchestration system his team had spent a year developing—a system cloud code closer to three-paragraph notation.
The juxtaposition is instructive: while Anthropic's Claude Code has captured imaginations with end-to-end software development demonstrations, Nous Research is betting that open-source alternatives trained on verifiable problems can close the gap, and show exactly how those models are built along the way.
How Nous Research Built an AI Coding Model That Anyone Can Replicate
What distinguishes the NousCoder-14B release from many rival announcements is its radical openness. Nous Research published not just the model weights but a full reinforcement learning environment, benchmark suite, and training harness, built on the company's Atropos framework, enabling any researcher with enough compute to reproduce or extend the work.
"The open-sourcing Atropos stack provides the infrastructure necessary for reproducible Olympiad-level reasoning research," A commentator mentioned on xa summary of importance to the academic and open source communities.
The model was trained by Joe Lee, a researcher in residence at Nous Research and a former competitive programmer himself. Lee's technical report reveals an unexpectedly personal dimension: he compared the model's path of improvement to his own journey on Codeforces, a competitive programming platform where participants earn ratings based on contest performance.
Mapping LiveCodeBench scores to Codeforces ratings using some rough estimates, Lee calculated that NousCoder-14B's performance climbed from roughly 1600-1750 to 2100-2200, a jump that took him about two years of practice between the ages of 14 and 16. The model did the equivalent work in four days.
"It was a surreal experience watching that final training." Lee wrote in a technical report.
But Lee was quick to note an important caveat that speaks to broader questions about AI efficiency: he solved about 1,000 problems over those two years, while the model needed 24,000. Humans, at least for now, remain dramatically more efficient learners than models.
Inside the reinforcement learning system that trains on 24,000 competitive programming problems
NousCoder-14B's training process provides a window into the increasingly sophisticated techniques researchers use to improve AI reasoning abilities through reinforcement learning.
The approach relies on what the researchers call "verifiable rewards": a system where the model generates code solutions, those solutions are executed against test cases, and the model receives a simple binary signal, pass or fail. This feedback loop, while conceptually straightforward, requires significant infrastructure to implement at scale.
Nous Research used Modal, a cloud computing platform, to run sandboxed code execution in parallel. Each of the 24,000 training problems contains hundreds of test cases on average, and the system must verify that generated code produces correct results within the prescribed time and memory constraints of 15 seconds and 4 gigabytes, respectively.
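To make the mechanics concrete, here is a minimal sketch of that kind of verifiable-reward check. It is not Modal's API or Nous Research's actual harness; it simply runs a candidate solution locally with Python's subprocess module, enforces the 15-second and 4 GB limits the report describes (the memory cap via a Linux-only rlimit), and returns the binary pass/fail signal used as the reward.

```python
import resource
import subprocess
from concurrent.futures import ThreadPoolExecutor

TIME_LIMIT_S = 15            # per-test wall-clock limit described in the report
MEMORY_LIMIT_B = 4 * 2**30   # 4 GB address-space cap

def _limit_memory():
    # Runs in the child process before exec; Linux-only.
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_LIMIT_B, MEMORY_LIMIT_B))

def run_test(solution_path: str, stdin_data: str, expected: str) -> bool:
    """Execute one candidate solution against one test case; return pass/fail."""
    try:
        proc = subprocess.run(
            ["python", solution_path],
            input=stdin_data,
            capture_output=True,
            text=True,
            timeout=TIME_LIMIT_S,
            preexec_fn=_limit_memory,
        )
    except subprocess.TimeoutExpired:
        return False
    return proc.returncode == 0 and proc.stdout.strip() == expected.strip()

def verifiable_reward(solution_path: str, tests: list[tuple[str, str]]) -> float:
    """Binary reward: 1.0 only if every (input, expected_output) test passes."""
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = pool.map(lambda t: run_test(solution_path, *t), tests)
        return 1.0 if all(results) else 0.0
```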
Training used a technique called DAPO (Decoupled Clip and Dynamic Sampling Policy Optimization), which the researchers found performed slightly better than alternatives in their experiments. A key ingredient is "dynamic sampling": discarding training examples where the model either succeeds on every attempt or fails on every attempt, since these provide no useful gradient signal for learning.
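As a rough illustration of the dynamic sampling step (an assumption about the filtering logic, not code from the Atropos release), a filter over groups of rollouts might look like this:

```python
def dynamic_sampling_filter(rollout_groups):
    """Keep only prompt groups whose rollouts have mixed outcomes.

    rollout_groups: list of groups, one per problem, where each group is a
    list of (solution_text, reward) pairs with reward in {0.0, 1.0}.
    Groups where every rollout passes or every rollout fails carry no
    learning signal under group-relative policy optimization, so they are
    dropped and the batch is refilled with fresh samples.
    """
    kept = []
    for group in rollout_groups:
        rewards = [reward for _, reward in group]
        if 0.0 < sum(rewards) < len(rewards):   # at least one pass and one fail
            kept.append(group)
    return kept
```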
The researchers also adopted "iterative context extension," training the model with a 32,000-token context window before expanding it to 40,000 tokens. At evaluation time, extending the context further to 80,000 tokens gave the best results, reaching the 67.87% accuracy figure.
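Expressed as a schedule, the staged approach might be configured along these lines; the phase names are illustrative, and the token counts are the approximate figures given in the article rather than exact training hyperparameters.

```python
# Approximate figures from the article; the exact token counts, phase split,
# and long-context method (e.g. RoPE scaling) are assumptions for illustration.
CONTEXT_SCHEDULE = [
    {"phase": "rl_training_stage_1", "max_context_tokens": 32_000},
    {"phase": "rl_training_stage_2", "max_context_tokens": 40_000},
    {"phase": "evaluation",          "max_context_tokens": 80_000},  # best score: 67.87%
]
```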
Perhaps most importantly, the training pipeline overlaps generation and verification: as soon as the model produces a solution, it starts working on the next problem while the previous solution is being checked. This pipelining, combined with running multiple model instances in parallel, maximizes hardware utilization on expensive GPU clusters.
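One way to picture that overlap is the sketch below, which submits sandbox checks to a thread pool so verification (I/O-bound) proceeds while the model keeps generating (GPU-bound). The function names are placeholders; the real system distributes this work across many machines rather than a single Python loop.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def pipelined_rollouts(problems, generate, verify, update, max_workers=32):
    """Overlap generation and verification: sandbox checks for earlier
    solutions run in background threads while the model already generates
    for the next problem. `generate`, `verify`, and `update` are placeholders
    for the inference call, the sandboxed test run, and the policy update."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        in_flight = set()
        for problem in problems:
            solution = generate(problem)                            # GPU-bound
            in_flight.add(pool.submit(verify, problem, solution))   # I/O-bound
            finished = {f for f in in_flight if f.done()}
            for f in finished:
                update(f.result())                                  # reward drives the update
            in_flight -= finished
        for f in as_completed(in_flight):                           # drain remaining checks
            update(f.result())
```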
A lack of data could slow the progress of AI coding models
Buried in Lee's technical report is a finding with important implications for the future of AI development: the training dataset for NousCoder-14B already covers "a significant portion" of the verifiable competitive programming problems readily available in a standard dataset format.
In other words, for this particular domain, researchers are reaching the limits of high-quality training data.
"The total number of competitive programming problems on the Internet is roughly the same order of magnitude," Lee wrote, citing 24,000 problems used for training. "This suggests that within the competitive programming domain, we have approached the limits of high-quality data."
The observation echoes broader concerns in the AI industry about data constraints. While compute is increasingly a matter of economics and engineering, training data, as Lee put it, is "sharply limited."
"It appears that some of the most important future research needs will be in the areas of artificial data generation and data efficient algorithms and architectures." He concluded.
The challenge is particularly acute for competitive programming because the domain requires problems with known correct solutions that can be automatically verified. Unlike natural language tasks, where human evaluation or proxy metrics suffice, code either works or it doesn't, which makes generating synthetic data considerably harder.
Lee identified one possible avenue: training models not only to solve problems but to create problems for other models to solve, enabling a form of self-play similar to techniques that have proven successful in game-playing AI systems. "Once synthetic problem generation is solved, self-play becomes a very interesting direction," he wrote.
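A conceptual sketch of what such a loop could look like is below. Nothing like this ships with NousCoder-14B; the "setter" and "solver" roles and every callable here are hypothetical stand-ins for the self-play direction Lee describes.

```python
def self_play_round(propose_problem, attempt, verify, update_solver, update_setter):
    """One conceptual self-play iteration: a 'setter' policy proposes a new
    problem plus test cases, a 'solver' policy attempts it, and the verified
    outcome becomes fresh training signal for both. All five callables are
    hypothetical stand-ins; the report only sketches this as a direction."""
    problem, tests = propose_problem()       # synthetic problem generation
    solution = attempt(problem)
    reward = verify(solution, tests)         # same binary verifiable reward as before
    update_solver(problem, solution, reward)
    update_setter(problem, reward)           # e.g. reward the setter for solvable-but-hard problems
    return reward
```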
A $65 million bet that open source AI can compete with Big Tech
Nous Research has carved out a distinctive niche in the AI landscape: a company committed to open-source releases that compete with, and sometimes surpass, proprietary alternatives.
The company raised $50 million in April 2025 in a round led by Paradigm, a crypto-focused venture firm co-founded by Coinbase co-founder Fred Ehrsam. Some reports put total funding at $65 million. The investment reflects growing interest in decentralized approaches to AI training, an area Nous Research has advanced with its Psyche platform.
Previous releases include Hermes 4, a family of models that, as we previously reported, aims to deliver strong performance without content restrictions, and DeepHermes-3, which the company touts as a first-of-its-kind "toggleable reasoning model" that lets users switch extended reasoning on and off on demand.
The company has cultivated a distinctive aesthetic and community, which has raised some doubts about whether style might overshadow substance. "OFC I would trust a mobile phone PFP company. Stop benchmark maxxing FFS," wrote one critic on X, taking aim at both Nous Research's branding and the broader industry habit of optimizing models for benchmark scores.
Others raised technical questions. "Based on the benchmarks, Nemotron is better," one commenter said, referring to NVIDIA's family of language models. Another asked whether NousCoder-14B is "agent focused or just 'one shot' coding," a distinction that matters for practical software development, where iterating on feedback usually produces better results than single attempts.
Where researchers say AI coding models must improve next
The release includes several directions for future work that indicate where AI coding research could go.
Multi-turn reinforcement learning tops the list. Currently, the model receives only a single final binary reward, pass or fail, after generating a solution. But competitive programming problems usually include public test cases that provide intermediate feedback: compilation errors, wrong answers, time limit violations. Training models to incorporate this feedback across multiple attempts could significantly improve performance.
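A minimal sketch of that multi-turn extension might look like the following. It is hypothetical: NousCoder-14B itself is trained with a single final reward, and `generate` and `run_public_tests` are caller-supplied stand-ins for the model call and the sandbox.

```python
def multi_turn_attempt(generate, run_public_tests, problem, max_turns=3):
    """Hypothetical multi-turn loop: the model revises its code using feedback
    from public test cases (compile errors, wrong answers, time-limit
    violations) before any final hidden-test reward is assigned. `generate`
    and `run_public_tests` are caller-supplied stand-ins for the model call
    and the sandbox."""
    code, feedback = "", ""
    for _ in range(max_turns):
        code = generate(problem, feedback)
        passed, feedback = run_public_tests(code)   # e.g. (False, "TLE on test 3")
        if passed:
            break
    return code
```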
Controlling response length also remains a challenge. The researchers found that incorrect solutions tended to be longer than correct ones, and that response lengths during training quickly grew to fill the available context window, a pattern that various algorithmic modifications failed to resolve.
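One common mitigation in the reinforcement learning literature, offered here only as an illustration and not as what Nous Research actually did, is to fold a small length penalty into the reward so that correct but concise solutions score highest:

```python
def length_penalized_reward(passed: bool, n_tokens: int,
                            max_tokens: int = 40_000, alpha: float = 0.1) -> float:
    """Subtract a small penalty proportional to how much of the context window
    a response consumes, so correct-but-concise solutions score highest.
    The penalty weight alpha is an arbitrary illustrative value."""
    base = 1.0 if passed else 0.0
    return base - alpha * (n_tokens / max_tokens)
```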
Perhaps most ambitiously, Lee proposed problem generation and self-play: training models both to solve and to create programming problems. This would directly address the data scarcity problem by enabling models to develop their own training curriculum.
"Humans are very good at creating interesting and useful problems for other competing programmers, but there still seems to be a significant gap in LLMs’ abilities in creative problem generation," Lee wrote.
The model is now available on Hugging Face under the Apache 2.0 license. For researchers and developers who want to extend the work, Nous Research has published the complete Atropos training stack alongside it.
What took Lee two years as a teenager, going from a 1600-rated novice to a 2100-rated competitor on Codeforces, an AI replicated in 96 hours. He needed about a thousand problems. The model required 24,000. But soon, these systems may learn to write their own problems, teach themselves, and bypass human benchmarks entirely.
The question is no longer whether machines can learn to code. It is whether they will soon be better teachers than we are.