
Nous Research, a San Francisco-based artificial intelligence startup, on Tuesday released what it calls an open-source mathematical reasoning system. The system, Nomos 1, achieved near-human performance on this year's William Lowell Putnam Mathematical Competition, one of the most famous and notoriously difficult undergraduate mathematics competitions in the world.
The Putnam is known for its difficulty: although a perfect score is 120, the top score in 2024 was 90 and the median was just 2. Nomos 1, by contrast, scored 87 points – a result that would have placed second among the 3,988 participants in the 2024 competition.
The release marks an inflection point in the rapidly accelerating race to build AI systems capable of sophisticated mathematical reasoning. Unlike the large-scale, compute-intensive models deployed by big technology companies, Nomos 1 achieves its results with a relatively compact architecture: 30 billion parameters, of which roughly 3 billion are active at any time, using a mixture-of-experts design based on Alibaba's Qwen3 model.
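As a rough illustration of why a mixture-of-experts model is far cheaper to run than its total parameter count suggests, the sketch below shows a toy top-k MoE layer in which each token activates only a small subset of experts. The dimensions, expert count, and class names are illustrative and do not reflect Qwen3's actual configuration.

```python
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    """Toy mixture-of-experts layer: each token runs only k of n experts."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # scores every expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        # Pick the k best-scoring experts per token; only those weights run.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[int(e)](x[mask])
        return out

layer = TopKMoE(dim=64)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Only the routed experts execute, which is why compute per token tracks the roughly 3 billion active parameters rather than the full 30 billion – though all 30 billion must still be held in memory.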
"This score will rank #2/3988 in 2024 and is our first step towards becoming a Sota AI Mathematician with Hullumbi." Nous Research announced on social media on Tuesday.
The same base model scored just 24 points without Nous's specialized training
Perhaps the most striking comparison is between Nomos 1 and its base model. When Nous Research ran the unmodified Qwen3-30B-A3B-Thinking-2507 model through the same test, it scored only 24 out of 120 – a result that underscores the critical importance of post-training optimization and specialized inference techniques over raw model scale.
"NOMOS 1 scored 87/120 with 8 perfect scores," The company said there is a difference in performance "This is largely due to training and data quality rather than usability."
The results were verified through blind grading by a human expert who had previously finished in the top 200 on the Putnam. Nous Research provided anonymized submissions to the grader, then published the complete set of de-anonymized run files on GitHub.
Why is the Putnam Competition considered the ultimate test of mathematical reasoning?
The William Lowell Putnam Mathematical Competition is an annual mathematics competition for undergraduate students enrolled at institutions of higher education in the United States and Canada. It is widely regarded as the world's most prestigious university-level mathematics competition.
The notoriously brutal exam is a test of mathematical ingenuity rather than coursework. It consists of two 3-hour sessions separated by a 2-hour break, with 12 questions in total, 6 per session. Each question is worth 10 points, for a maximum score of 120.
Putnam questions are not the kind that appear on regular exams or in textbooks. They are more puzzle than calculus, often requiring students to find an unconventional way to represent the problem before a solution can be found.
Last year, nearly 4,000 students from across the continent took the Putnam. Eighty-one percent scored three points or less, according to the Mathematical Association of America, which organizes the competition. The top score was 90 out of 120.
Many Putnam Fellows have gone on to become distinguished researchers in mathematics and other fields, including three Fields Medalists—John Milnor, David Mumford, and Daniel Quillen—and two Nobel laureates in physics—Richard Feynman and Kenneth Wilson.
Inside the two-phase reasoning harness that powers Nomos 1's mathematical achievements
Nomos 1 is a specialized version of the Qwen3-30B-A3B Thinking model, fine-tuned for natural-language mathematical problem solving and proof writing. The system was developed in collaboration with Hillclimb AI.
What distinguishes Nomos 1 from a simple model evaluation is its sophisticated inference harness – an open-source framework that orchestrates how the model approaches and solves problems. The harness runs in two distinct phases within a three-hour time limit, mirroring the actual structure of the Putnam competition.
In the solving phase, parallel workers tackle problems simultaneously using a priority-based system. Each worker picks a problem, drafts a submission, then scores its own work on a scale of 1 to 7. Problems with the lowest best self-score are prioritized, ensuring that the system focuses its compute on the hardest remaining challenges. This process continues until either every problem has achieved a perfect self-critique score or time runs out.
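Nous has published the actual harness on GitHub; the sketch below is only a simplified, single-threaded rendering of the priority loop described above, with hypothetical stand-ins (`attempt_solution`, `self_critique`) for the model calls. The real system runs many workers in parallel.

```python
import heapq
import time

def solving_phase(problems, deadline, attempt_solution, self_critique):
    """Keep working the problem with the weakest best-so-far self-score."""
    best = {p: (0, None) for p in problems}          # problem -> (score, submission)
    queue = [(0, p) for p in problems]               # min-heap keyed on best score
    heapq.heapify(queue)
    while time.monotonic() < deadline:
        score, prob = heapq.heappop(queue)
        if score != best[prob][0]:                   # stale heap entry, skip it
            continue
        if score == 7:                               # heap minimum is perfect: done
            break
        submission = attempt_solution(prob)          # model drafts a proof attempt
        new_score = self_critique(prob, submission)  # model grades itself, 1-7
        if new_score > best[prob][0]:
            best[prob] = (new_score, submission)
        heapq.heappush(queue, (best[prob][0], prob))
    return best
```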
The finalization phase starts 15 minutes before the time limit (or at the 50% mark for short runs) and employs a two-stage selection process. First, a consolidation step clusters the accumulated submissions and attempts to identify the correct cluster – importantly, not necessarily the largest one. Then a single-elimination pairwise tournament determines the final submission for each problem.
"Our open-source reasoning system consists of a solving phase, where workers try the minimum solved problem and self-test, followed by a finalization phase, which consolidates the submissions to choose a final presentation for each problem," Nous Research Explained.
How Nomos 1 compares to mathematical AI systems from DeepSeek, Google, and OpenAI
The Nomos 1 results arrive amid rapid advances in mathematical reasoning AI. DeepSeek's model, DeepSeekMath-V2, scored 118 out of 120 on the 2024 William Lowell Putnam Mathematical Competition questions, beating the top human score of 90. That model also performed at the level of gold medalists at the International Mathematical Olympiad.
This year, Google's advanced Gemini model produced rigorous mathematical proofs directly from formal problem descriptions, running end-to-end in natural language – all within a competitive time limit of 4.5 hours. Google obtained this year's result using an updated version of Gemini Deep Think.
What makes Nomos 1's success remarkable isn't raw performance – it trails DeepSeekMath-V2's 118/120 – but its accessibility and efficiency. At 30 billion parameters with only 3 billion active, the model can run on consumer-grade hardware, in stark contrast to the massive compute clusters required by OpenAI's and Google's frontier models.
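Some back-of-envelope arithmetic shows why that footprint matters. All 30 billion weights must be resident in memory even though only about 3 billion are active per token; the quantization levels below are illustrative, not a statement of how Nomos 1 is actually distributed.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes."""
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

for bits in (16, 8, 4):
    print(f"30B weights @ {bits}-bit: {weight_footprint_gb(30, bits):.0f} GB")
# 30B weights @ 16-bit: 60 GB  -> multi-GPU workstation
# 30B weights @ 8-bit:  30 GB  -> a single high-end GPU
# 30B weights @ 4-bit:  15 GB  -> a consumer GPU or a laptop with unified memory
```

Meanwhile, compute per generated token scales with the roughly 3 billion active parameters, which is what makes consumer-grade inference speeds plausible.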
Hermes 4.3 arrived just six days earlier, trained on a decentralized blockchain-coordinated network
The Nomos 1 announcement coincides with the Dec. 3 release of Nous Research's Hermes 4.3, a general-purpose language model that marked another major milestone for the company.
Hermes 4.3, built on ByteDance's Seed-OSS-36B base model, is the first production model that Nous Research trained entirely on its Psyche network – a distributed training infrastructure that uses a novel optimizer called DisTrO to coordinate training across nodes in data centers spread over the open internet, with coordination settled by consensus on the Solana blockchain.
The company trained Hermes 4.3 twice – once with traditional centralized methods and once on the Psyche network – specifically to verify that distributed training can match or exceed centralized performance for production workloads. The Psyche-trained version outperformed the centralized version across a suite of downstream tasks, the company reported.
"The training run proved to be completely stable, with an average of 144K tokens/second spread across 24 psychic nodes," Described by Nous Research. "Using the distro’s overlapped clustering strategy, all of the P2P communication is hidden from the training time, which achieves the equivalent of traditional, centralized training."
Hermes 4.3 also achieved state-of-the-art results on RefusalBench, a new benchmark that measures a model's willingness to be helpful in scenarios where other models typically refuse. The model answered 74.60% of RefusalBench queries in non-reasoning mode, beating its predecessor Hermes 4 70B (59.50%) and far outpacing closed models including Grok 4 (51.30%) and Gemini 2.5 Pro (24.23%).
Small models with smart training are closing the gap with trillion-parameter giants
Together, the two releases in the same week signal a strategic thesis for Nous Research: smaller, more efficient models – given advanced post-training and inference-time reasoning techniques – can compete with, and in some cases outperform, larger-scale models developed by better-funded competitors.
For enterprise decision makers, the implications are significant. Mathematical reasoning skills have applications far beyond academic competitions: they are essential for formal verification, theorem proving, scientific modeling, cryptographic analysis, and any domain requiring rigorous logical deduction.
The open-source nature of both releases – Nomos 1 is available under the Apache 2.0 license on Hugging Face, with the full reasoning harness on GitHub – means organizations can deploy these capabilities on their own infrastructure without relying on API calls to major cloud providers.
"For the first time, anyone can run or access sophisticated AI mathematicians," One observer noted on social media. "This reduces the barrier to serious mathematical research, proof validation, modeling complex systems, advanced reasoning work."
Key contributors to Nomos 1 include Roger Ginn, who led training; Jeffrey Quesnelle and Dakota Mahan, who built the infrastructure; Chen Guang, who advised; and Teknium and Jeffrey Coyle, who provided leadership. The model was developed with Hillclimb AI, whose team of mathematicians included Samuel Kim, Myron Yurkevich, and others.
The race to build AI mathematicians is accelerating faster than anyone predicted
The 86th Putnam Competition took place on Saturday, December 6, 2025 – just three days before Nous Research released Nomos 1. The timing illustrates how quickly the field is advancing: companies now release mathematical AI systems capable of near-elite human performance within days of the competitions they are designed to solve.
Competition in mathematical AI has intensified dramatically in recent months. In July, an updated version of Google DeepMind's Gemini model and an experimental reasoning model from OpenAI both achieved gold-medal status at IMO 2025. A new DeepSeek model matched their performance, solving 5 of 6 problems.
But the resource requirements of these frontier systems are prohibitive for most organizations. OpenAI's o1-pro is estimated at over 1.8 trillion parameters; Google's Gemini 2.5 Pro likely exceeds 400 billion. Nomos 1, in contrast, achieves competitive results with a fraction of that footprint.
The gap between massive frontier models and efficient open-source alternatives is narrowing. And for organizations that need mathematical reasoning capabilities without the budget for hyperscale compute, it may already have narrowed enough to make a difference.
As one observer put it on social media: "This marks a significant leap forward for AI mathematical models that are small enough to run on your laptop."
A laptop that could now place second among nearly 4,000 of the continent's best undergraduate mathematicians.