
For most of 2025, the frontier of open-weight language models has been defined not in Silicon Valley or New York City, but in Beijing and Hangzhou.
Chinese research labs, including Alibaba's Qwen team, DeepSeek, Moonshot AI and Baidu, have quickly set the pace in developing large-scale, open mixture-of-experts (MoE) models, often with permissive licenses and strong benchmark performance. OpenAI did release its open-weight, general-purpose LLMs gpt-oss-20B and gpt-oss-120B this summer, but uptake has been slowed by alternatives that perform as well or better.
Now, one small American company is fighting back.
Today, Arcee AI announced the release of the Trinity Mini and Trinity Nano previews, the first two models in its new "Trinity" family, an open-weight MoE model suite trained entirely in the United States.
Users can try them out for themselves live in a chatbot format on Arcee's new website, chat.arcee.ai, and developers can download both models from Hugging Face and run them themselves, as well as modify and fine-tune them as they choose, all for free under an enterprise-friendly Apache 2.0 license.
Set against the biggest frontier models, this release represents a rare attempt by an American startup to create an end-to-end open-weight model family at scale.
"I’m experiencing a combination of extreme pride in my team and crippling exhaustion, so I’m struggling to put into words how excited I am to get these models out," Written by Lucas Atkins, RC Chief Technology Officer (CTO). A post on Social Network X (formerly Twitter). "Especially money."
A third model, Trinity Large, is already in training: a 420B parameter model with 13B active parameters per token, scheduled for launch in January 2026.
"We want to add something that is missing from this picture," Atkins wrote in the Trinity launch manifesto published on Arcee's website. "A serious open-weight model family trained end-to-end in the US…that businesses and developers can actually own."
From miniature models to scaled ambitions
The Trinity project marks a turning point for Arcee AI, which until now has been known for its compact, enterprise-oriented models. The company has raised $29.5 million in funding to date, including a $24 million Series A led by Emergence Capital in 2024. Its previous releases include a compact instruction-tuned model launched in mid-2025 and a 70B-parameter instruction-tuned model before that.
Both aimed to address the regulatory and cost issues of adopting a proprietary LLM in the enterprise.
With Trinity, Arcee's goal is higher: not just instruction tuning or post-training, but full-stack pretraining of open-weight foundation models, built for future integration with long-horizon reasoning, synthetic data adaptation, and live training systems.
Originally conceived as stepping stones to the larger Trinity Large, both Mini and Nano grew out of Arcee's early modeling experiments and quickly became production targets in their own right.
Technical highlights
Trinity Mini is a 26B-parameter model with 3B active parameters per token, designed for high-throughput reasoning, function calling, and tool use. The Trinity Nano preview is a 6B-parameter model with approximately 800M active non-embedding parameters. It is a more experimental, chat-focused model with a strong personality but less reasoning strength.
Both models use Arcee's new attention-first mixture-of-experts (AFMoE) architecture, a custom MoE design that blends sparse expert routing with grouped, local/global, and gated attention techniques.
Inspired by recent progress from DeepSeek and Qwen, AFMoE departs from traditional MoE designs by tightly integrating sparse expert routing with improved attention, including grouped attention, gated attention, and a local/global pattern that improves long-context reasoning.
Think of a typical MoE model as a call center with 128 specialized agents (called "experts"), but only a few are consulted for each call, depending on the question. This saves time and compute, since not every expert needs to weigh in.
What makes AFMoE different is how it decides which agents to call and how it aggregates their responses. Most MoE models use a standard softmax-based approach that selects experts through a simple top-k ranking.
AFMoE, in contrast, uses a smoother method (called sigmoid routing) that's more like adjusting a volume dial than flipping a switch, allowing the model to blend multiple experts' perspectives more gracefully.
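Because the article only describes the routing behavior at a high level, here is a minimal, hypothetical sketch of the contrast it draws: standard softmax top-k routing versus the smoother sigmoid "volume dial" routing. None of the function or variable names reflect Arcee's actual code.

```python
# Illustrative sketch only: generic top-k softmax routing vs. sigmoid routing,
# not Arcee's actual AFMoE implementation.
import torch
import torch.nn.functional as F

def softmax_topk_routing(hidden, router_weights, k=8):
    """Classic MoE routing: softmax over all experts, keep the top-k."""
    logits = hidden @ router_weights              # [tokens, num_experts]
    probs = F.softmax(logits, dim=-1)             # expert scores compete via softmax
    topk_probs, topk_idx = probs.topk(k, dim=-1)
    # renormalize the selected experts' weights so they sum to 1
    topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)
    return topk_idx, topk_probs

def sigmoid_routing(hidden, router_weights, k=8):
    """Sigmoid routing: each expert gets an independent 0-1 'volume dial'."""
    logits = hidden @ router_weights
    gates = torch.sigmoid(logits)                 # scores do not compete with one another
    topk_gates, topk_idx = gates.topk(k, dim=-1)
    # rescaling the selected gates is optional but common in sigmoid routers
    topk_gates = topk_gates / (topk_gates.sum(dim=-1, keepdim=True) + 1e-9)
    return topk_idx, topk_gates

# Toy usage: 4 tokens, 512-dim hidden states, 128 experts, 8 active per token
hidden = torch.randn(4, 512)
router_weights = torch.randn(512, 128)
idx, weights = sigmoid_routing(hidden, router_weights, k=8)
print(idx.shape, weights.shape)   # torch.Size([4, 8]) torch.Size([4, 8])
```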
The "attention-first" part means the model puts heavy emphasis on how it attends to different parts of the conversation. Imagine reading a novel and remembering some parts more clearly than others based on significance, recency, or emotional impact; that's attention. AFMoE alternates in a rhythm, combining local attention (focusing on what was just said) with global attention (remembering key points from earlier).
Finally, AFMoE introduces something called gated attention, which acts like a volume control on each attention output, allowing the model to emphasize or dampen different pieces of information as needed, much like deciding how much weight to give each voice in a group discussion.
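To make that "volume control" idea concrete, the sketch below shows one common gated-attention formulation: a sigmoid gate computed from the hidden state and applied elementwise to the attention output. The article does not spell out AFMoE's exact gating design, so treat this as illustrative rather than Arcee's implementation.

```python
# Hedged sketch of output-gated attention; one common variant, not AFMoE's exact design.
import torch
import torch.nn as nn

class GatedAttention(nn.Module):
    def __init__(self, dim, num_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate_proj = nn.Linear(dim, dim)     # produces the per-channel "volume control"

    def forward(self, x):
        attn_out, _ = self.attn(x, x, x, need_weights=False)
        gate = torch.sigmoid(self.gate_proj(x))  # values in (0, 1), per token and channel
        return gate * attn_out                   # emphasize or dampen each piece of information

x = torch.randn(2, 16, 512)                      # 2 sequences, 16 tokens, 512-dim hidden states
layer = GatedAttention(dim=512, num_heads=8)
print(layer(x).shape)                            # torch.Size([2, 16, 512])
```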
All of this is designed to make the model more stable during training and more efficient at scale — so it can understand longer conversations, reason more clearly, and run faster without requiring massive computing resources.
Unlike many MoE implementations, AFMoE emphasizes stability at depth and training efficiency, using techniques such as auxiliary-loss-free sigmoid routing and depth-scaled normalization to support scaling without instability.
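Depth scaling can take several forms; the sketch below shows one common variant in which each residual branch's contribution is scaled down as the network gets deeper so activations stay bounded. The scaling constant and block structure here are assumptions for illustration, not Arcee's published recipe.

```python
# Hedged sketch of depth-scaled residual normalization; illustrative only.
import torch
import torch.nn as nn

class DepthScaledBlock(nn.Module):
    def __init__(self, dim, num_layers):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )
        # shrink each branch's contribution as total depth grows (assumed 1/sqrt(2L) scaling)
        self.residual_scale = (2.0 * num_layers) ** -0.5

    def forward(self, x):
        return x + self.residual_scale * self.mlp(self.norm(x))

blocks = nn.Sequential(*[DepthScaledBlock(512, num_layers=64) for _ in range(64)])
x = torch.randn(1, 8, 512)
print(blocks(x).std())   # activation scale stays roughly O(1) instead of growing with depth
```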
Model capabilities
Trinity Mini adopts an MoE architecture with 128 experts, eight active per token, plus one always-on shared expert. Context windows go up to 131,072 tokens, depending on the provider.
In benchmarks, Trinity Mini performs competitively with larger models on reasoning tasks, including outperforming gpt-oss on a factual-recall benchmark (which tests whether the model recalls facts and admits uncertainty when it doesn't), MMLU (zero-shot, a measure of broad academic knowledge with no examples provided), and BFCL V3 (a test of function and tool calling).
MMLU (zero-shot): 84.95
Math-500: 92.10
GPQA-Diamond: 58.55
BFCL V3: 59.67
Latency and throughput numbers across providers such as Clarifai show 200+ tokens per second of throughput with sub-three-second end-to-end latency.
Trinity Nano, while small and less stable in edge cases, demonstrates the viability of a sparse MoE architecture with under 1B active parameters per token.
Access, pricing, and ecosystem integration
Both models are released under an enterprise-friendly Apache 2.0 license, allowing unrestricted commercial and research use. Trinity Mini is also available as a hosted API through providers including OpenRouter.
API pricing for Trinity Mini on OpenRouter:
$0.045 per million input tokens
$0.15 per million output tokens
A free tier is available on OpenRouter for a limited time.
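For developers, access through OpenRouter uses its standard OpenAI-compatible API. Below is a minimal sketch; the model slug "arcee-ai/trinity-mini" is an assumption rather than a confirmed identifier, so check OpenRouter's model list before using it.

```python
# Minimal sketch of calling Trinity Mini through OpenRouter's OpenAI-compatible API.
# The model slug "arcee-ai/trinity-mini" is assumed, not confirmed by the article.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",       # OpenRouter's OpenAI-compatible endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="arcee-ai/trinity-mini",                 # placeholder slug
    messages=[{"role": "user", "content": "Summarize the Apache 2.0 license in one sentence."}],
)
print(response.choices[0].message.content)
```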
The model has already been integrated into apps including Benchable.AI, OpenWebUI, and SillyTavern. It supports Hugging Face Transformers, vLLM, LM Studio, and llama.cpp.
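For local use, a hedged example of loading one of the previews with Hugging Face Transformers follows. The repository id "arcee-ai/Trinity-Mini" is a placeholder assumption, and a custom architecture like AFMoE may require trust_remote_code=True.

```python
# Hedged sketch: loading a Trinity preview with Hugging Face Transformers.
# "arcee-ai/Trinity-Mini" is a placeholder repo id; use the actual id from Arcee's Hugging Face page.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "arcee-ai/Trinity-Mini"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,     # MoE weights are large; bf16 halves memory
    device_map="auto",
    trust_remote_code=True,         # may be needed if the architecture ships custom code
)

messages = [{"role": "user", "content": "What is a mixture-of-experts model?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```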
Data without compromise: The role of DatologyAI
Central to Arcee's approach is control over the training data, in contrast to many open models trained on raw web scrapes or legally ambiguous datasets. Here, DatologyAI, a data curation startup co-founded by former Meta and DeepMind researcher Ari Morcos, plays an important role.
DatologyAI's platform automates data filtering, deduplication, and quality enhancement, ensuring that Arcee's training corpus avoids the pitfalls of noise, bias, or copyright-risky content.
For Trinity, DatologyAI built a 10-trillion-token curriculum in three phases: 7T tokens of general data, 1.8T of high-quality text, and 1.2T of STEM-heavy content, including math and code.
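As a rough illustration only (this is not DatologyAI's actual tooling or configuration format), the three-phase token budget above can be written down as a simple config and sanity-checked:

```python
# Hypothetical config for the three-phase curriculum described above; illustrative only.
TRINITY_CURRICULUM = [
    {"phase": 1, "tokens": 7.0e12, "mix": "general data"},
    {"phase": 2, "tokens": 1.8e12, "mix": "high-quality text"},
    {"phase": 3, "tokens": 1.2e12, "mix": "STEM-heavy content (math and code)"},
]

total = sum(phase["tokens"] for phase in TRINITY_CURRICULUM)
assert abs(total - 10e12) < 1e9   # the phases sum to the 10T-token budget
```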
It's the same partnership that powered Arcee's AFM-4.5B, but scaled up significantly in both size and complexity. According to Arcee, it was DatologyAI's filtering and data-ranking tools that allowed Trinity to train on a cleaner corpus while improving performance on tasks like math, QA, and agentic tool use.
DatologyAI's contribution also extends to synthetic data generation. For Trinity Large, the company has generated more than 10 trillion synthetic tokens, paired with 10T curated web tokens, to create the 20T-token training corpus for the full-scale model.
Building infrastructure to compete: Prime Intellect
Arcee's ability to conduct full-scale training in the US is also thanks to its infrastructure partner, Prime Intellect. Founded in early 2024, the startup began with a mission to democratize access to AI compute by building a GPU marketplace and training stack.
Prime Intellect first made headlines with its distributed training runs, though its role in Trinity is more conventional.
For Trinity Mini and Nano, Prime Intellect provided the orchestration stack, a modified TorchTitan runtime, and the physical compute environment: 512 H200 GPUs in a custom BF16 pipeline, running high-performance HSDP (hybrid sharded data parallelism). It is also hosting the 2,048-GPU B300 cluster being used to train Trinity Large.
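The general HSDP-plus-BF16 recipe can be sketched in plain PyTorch FSDP terms. This is a generic illustration of the technique, not the modified TorchTitan stack Prime Intellect actually provided.

```python
# Hedged sketch of HSDP + bf16 with plain PyTorch FSDP; not Arcee's or Prime Intellect's code.
import torch
from torch.distributed.fsdp import (
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
    ShardingStrategy,
)

def wrap_model_for_hsdp(model: torch.nn.Module) -> FSDP:
    # HYBRID_SHARD shards parameters within a node and replicates across nodes,
    # cutting cross-node traffic on large clusters (e.g. 512 H200s).
    bf16 = MixedPrecision(
        param_dtype=torch.bfloat16,
        reduce_dtype=torch.bfloat16,
        buffer_dtype=torch.bfloat16,
    )
    return FSDP(
        model,
        sharding_strategy=ShardingStrategy.HYBRID_SHARD,
        mixed_precision=bf16,
        device_id=torch.cuda.current_device(),
    )

# Usage (inside a torchrun launch): call torch.distributed.init_process_group("nccl"),
# build the model, then model = wrap_model_for_hsdp(model) before the training loop.
```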
The collaboration highlights the difference between branding and execution: while Prime Intellect's long-term goal is decentralized compute, its short-term value to Arcee is efficient, transparent training infrastructure.
A strategic bet on model autonomy
Arcee's emphasis on full pretraining reflects a broader thesis: that the future of enterprise AI will depend on owning the training loop, not just fine-tuning. As systems evolve to adapt directly to usage and interact autonomously with tools, compliance and control over training objectives will matter as much as raw performance.
"As applications become more ambitious, the boundaries between 'model' and 'product' keep moving," Atkins noted in Arcee's Trinity manifesto. "To build software like this you need to control the weights and the training pipeline, not just the instruction layer."
This approach sets Trinity apart from other open-weight efforts. Rather than patching someone else's base model, Arcee has built its own stack, from data to deployment, infrastructure to optimizer, alongside partners who share its vision of openness and autonomy.
Looking ahead: Trinity at large
Training is currently underway for Trinity Large, Arcee's 420B-parameter MoE model, which uses the same AFMoE architecture with a larger expert set.
The dataset contains 20T tokens, split evenly between synthetic data from DatologyAI and curated web data.
The model is expected to launch next month, in January 2026, with a full technical report to follow soon after.
If successful, it would make Trinity one of the only fully open-weight, U.S.-trained frontier-scale model families, establishing Arcee as a serious player in the open ecosystem at a time when most U.S. LLM efforts are either closed or built on non-U.S. base models.
A tribute to American open source
In a landscape where the most ambitious open-weight models are increasingly built by Chinese research labs, Arcee's Trinity launch signals a rare change in direction: an effort to reclaim ground for transparent, U.S.-controlled model development.
Backed by specialized partners in data and infrastructure, and built from the ground up for long-term adaptability, Trinity is a bold statement about the future of American AI development, showing that smaller, lesser-known companies can still push boundaries and innovate in the open even as the industry grows increasingly proprietary and competitive.
What remains to be seen is whether Trinity Large can match the capabilities of its better-funded peers. But with Mini and Nano already in users' hands, and a solid architectural foundation in place, Arcee is already making the case for its central thesis: that model autonomy, not just model size, will define the next era of AI.