OpenAI said its board can choose to hold back the release of an AI model even if the company’s leadership has deemed it safe, another sign of the artificial intelligence startup empowering its directors to bolster safeguards for developing the cutting-edge technology.
The arrangement was spelt out in a set of guidelines released Monday explaining how the ChatGPT-maker plans to deal with what it may deem to be extreme risks from its most powerful AI systems. The release of the guidelines follows a period of turmoil at OpenAI after Chief Executive Officer Sam Altman was briefly ousted by the board, putting a spotlight on the balance of power between directors and the company’s c-suite.
OpenAI’s recently announced “preparedness” team said it will continuously evaluate its AI systems to figure out how they fare across four different categories — including potential cybersecurity issues as well as chemical, nuclear and biological threats — and work to lessen any hazards the technology appears to pose. Specifically, the company is monitoring for what it calls “catastrophic” risks, which it defines in the guidelines as “any risk which could result in hundreds of billions of dollars in economic damage or lead to the severe harm or death of many individuals.”
Aleksander Madry, who is leading the preparedness group and is on leave from a faculty position at the Massachusetts Institute of Technology, told Bloomberg News his team will send a monthly report to a new internal safety advisory group. That group will then analyze Madry’s team’s work and send recommendations to Altman and the company’s board, which was overhauled after ousting the CEO. Altman and his leadership team can make a decision about whether to release a new AI system based on these reports, but the board has the right to reverse that decision, according to the document.
OpenAI announced the formation of the “preparedness” team in October, making it one of three separate groups overseeing AI safety at the startup. There’s also “safety systems,” which looks at current products such as GPT-4, and “superalignment,” which focuses on extremely powerful — and hypothetical — AI systems that may exist in the future.
Madry said his team will repeatedly evaluate OpenAI’s most advanced, unreleased AI models, rating them “low,” “medium,” “high,” or “critical” for different types of perceived risks. The team will also make changes in hopes of reducing potential dangers they spot in AI and measure their effectiveness. OpenAI will only roll out models that are rated “medium” or “low,” according to the new guidelines.
See also: test article with event
“AI is not something that just happens to us that might be good or bad,” Madry said. “It’s something we’re shaping.”
Madry said he hopes other companies will use OpenAI’s guidelines to evaluate potential risks from their AI models as well. The guidelines, he said, are a formalization of many processes OpenAI followed previously when evaluating AI technology it has already released. He and his team came up with the details over the past couple months, he said, and got feedback from others within OpenAI.