
- OpenAI’s latest AI models, GPT o3 and o4-mini, hallucinate significantly more often than their predecessors
- The increasing complexity of the models may be leading to more confident mistakes
- High error rates raise concerns about AI’s reliability in real-world applications
Brilliant but unreliable people are a staple of fiction (and history). Based on an OpenAI investigation reported by The New York Times, the same pattern may apply to AI as well. Hallucinations, imaginary facts, and straight-up lies have been part of AI chatbots since they first appeared. In theory, improved models should reduce how often they show up.
OpenAI’s latest flagship models, GPT o3 and o4-mini, are meant to mimic human reasoning. Unlike their predecessors, which focused mainly on fluent text generation, OpenAI built GPT o3 and o4-mini to think things through step by step. OpenAI has boasted that o1 could match or exceed the performance of PhD students in chemistry, biology, and mathematics. But OpenAI’s own report highlights some harrowing results for anyone who takes ChatGPT’s responses at face value.
OpenAI found that the GPT o3 model hallucinated in about a third of a benchmark test involving public figures, roughly double the error rate of last year’s o1 model. The more compact o4-mini model performed even worse, hallucinating on 48% of similar tasks.
When tested on more general knowledge questions in a simpler benchmark, hallucinations appeared in 51% of o3’s responses and 79% of o4-mini’s. That is not just a little noise in the system; it is a full-blown identity crisis. You would think something marketed as a reasoning system would at least check its own logic, but that is not the case.
One theory circulating in the AI research community is that the more a model tries to reason, the more chances it has to go off the rails. Unlike simpler models that stick to high-confidence predictions, reasoning models venture into territory where they must evaluate multiple potential paths, connect disparate facts, and essentially improvise. And improvising around facts is also known as making things up.
Imaginary facts
Correlation is not causation, and OpenAI told the Times that the rise in hallucinations may not be because reasoning models are inherently worse. Instead, they may simply be more verbose and adventurous in their answers. Because the new models are not just repeating safe predictions but speculating about possibilities, the line between theory and fabricated fact can blur for the AI. Unfortunately, some of those possibilities turn out to be entirely untethered from reality.
Nevertheless, more hallucinations are the opposite of what OpenAI or rivals like Google and Anthropic want from their latest models. Calling AI chatbots assistants and copilots implies they will be helpful, not hazardous. Lawyers have already gotten into trouble for using ChatGPT and failing to notice imaginary court citations; who knows how many such mistakes have caused problems in lower-stakes situations?
The opportunities for a hallucination to cause problems for a user are expanding rapidly as AI systems roll out in classrooms, offices, hospitals, and government agencies. Sophisticated AI can help draft job applications, resolve billing problems, or analyze spreadsheets, but the paradox is that the more useful AI becomes, the less room there is for error.
You can’t claim to save people time and effort if they have to spend just as long double-checking everything you tell them. Not that these models aren’t impressive: GPT o3 has demonstrated some remarkable feats of coding and logic, and in some respects it can outperform many humans. The problem is that the moment it decides Abraham Lincoln hosted a podcast or that water boils at 80°F, the illusion of reliability shatters.
Until these problems are solved, you should take any response from an AI model with a heaping spoonful of salt. Sometimes ChatGPT feels like that annoying guy in far too many meetings we have all attended: brimming with confidence in utter nonsense.