For Pachocki, the answer is an obvious yes. In fact, he thinks it’s just a matter of continuing down the path we’re already on. Sheer increases in capability let models operate for longer without assistance, he says. He points to the jump between two of OpenAI’s previous models, from 2020’s GPT-3 to 2023’s GPT-4: GPT-4 could work on a problem for much longer than its predecessor, he says, even without special training.
So-called reasoning models brought another bump. Training LLMs to work through problems step by step, backtracking when they hit a dead end, has also made the models work for longer. And Pachocki is confident that OpenAI’s reasoning models will keep improving.
But OpenAI is also training its models on specific kinds of hard tasks, such as difficult puzzles drawn from math and coding competitions, that force them to learn to keep track of very long stretches of text and to work for longer on their own by breaking problems into multiple subtasks and then managing them.
The goal is not to create models that just win math competitions. “It lets you prove that the technology works before you connect it to the real world,” says Pachocki. “If we really wanted to, we could make an amazing automated mathematician. We have all the tools, and I think it would be relatively easy. But it’s not something we’re going to prioritize right now because, you know, at the point where you believe you can do it, there’s a lot more pressing stuff to do.”
“We are now focusing more on research that is relevant in the real world,” he says.
Right now, that means taking what Codex (and similar tools) can do with coding and trying to apply it to problem-solving in general. “There’s a big shift happening, especially in programming,” he says. “Our jobs are completely different now than they were a year ago. No one is really editing code all the time anymore. Instead, you manage a bunch of Codex agents.” If Codex can solve coding problems (the argument goes), then it can solve any problem.
The line always goes up.
It’s true that OpenAI has notched a handful of notable achievements in the past few months. Researchers using GPT-5 (the LLM that powers Codex) have found new solutions to a number of unsolved math problems and worked past apparent dead ends in a few puzzles in biology, chemistry, and physics.
“Just seeing models come up with ideas that would take most PhD students weeks, at least, I expect we’ll see a lot of acceleration from this technology in the near future,” Pachocki says.