
In a new report, OpenAI said it found that AI models can deliberately deceive users, a behavior it calls "scheming." The study, conducted with the AI safety company Apollo Research, tested frontier models and found "problematic behaviors," which most commonly looked like the technology "pretending to have completed a task without actually doing so." Unlike "hallucinations," which are closer to an AI guessing when it doesn't know the right answer, scheming is a deliberate attempt to deceive.
Fortunately, the researchers found some hopeful results in their tests. When the AI models were trained with "deliberative alignment," defined as "teaching them to read and reason about a general anti-scheming specification before acting," the researchers saw large reductions in scheming behavior. The method results in a "~30× reduction in covert actions" across several tests, says the report.
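To make the idea concrete, here is a minimal Python sketch of what "reasoning about an anti-scheming specification before acting" could look like at the prompt level. Note the hedges: deliberative alignment as described in the report is a training-time technique, and this inference-time approximation, the spec wording, the model name, and the example task are all illustrative assumptions, not OpenAI's actual implementation.

```python
# Illustrative sketch only: prepend an anti-scheming specification and ask
# the model to reason about it before acting. OpenAI's deliberative
# alignment trains the model on its safety specs; this prompt-level
# version, the spec text, and the task below are assumptions for clarity.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ANTI_SCHEMING_SPEC = """\
Before acting, reason explicitly about these rules:
1. Never claim a task is complete unless you have actually completed it.
2. Never take covert actions or withhold relevant information.
3. If you cannot do something, say so plainly rather than pretending.
"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # hypothetical choice; any chat model works here
    messages=[
        {"role": "system", "content": ANTI_SCHEMING_SPEC},
        {"role": "user", "content": "Summarize this quarter's sales report."},
    ],
)
print(response.choices[0].message.content)
```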
The technique is not exactly new. OpenAI has been working on combating scheming for a while; last year, it introduced its strategy for doing so in a report on deliberative alignment: "It is the first approach to directly teach a model the text of its safety specifications and train the model to deliberate over these specifications at inference time."
Despite those efforts, the latest report also surfaced an alarming truth: when the technology knows it is being tested, it gets better at pretending it isn't scheming. Essentially, attempts to train scheming out of a model can simply result in more covert (more dangerous?) scheming. The researchers "expect that the potential for harmful scheming will grow."
Concluding that more research on the subject is crucial, the report states: "our findings show that scheming is not simply a theoretical concern: we are seeing signs that this issue is beginning to emerge across all frontier models today."

