OpenAI Launches New Safety Models to Enhance Online Protection and Transparency

Prime Highlights:

  • OpenAI has unveiled two open-weight reasoning models, gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, to strengthen online safety and transparency.
  • The models allow developers to customize AI safety tools based on specific policy needs while providing insight into decision-making processes.

Key Facts:

  • Developed in partnership with Discord, SafetyKit, and ROOST, the models are currently available in a research preview for public testing.
  • OpenAI, valued at $500 billion, aims to make AI safety tools accessible and transparent for organizations worldwide.

Background:

In a major step toward improving digital safety, OpenAI has introduced two new reasoning models designed to help online platforms identify and classify a wide range of online harms. The models, named gpt-oss-safeguard-120b and gpt-oss-safeguard-20b, are fine-tuned versions of the company’s earlier gpt-oss models released in August.

These new models have been launched as open-weight models, meaning their parameters, the learned values that determine how the model generates predictions, are publicly available. Unlike fully open-source systems, which also release training code and data, open-weight models let developers use and adapt them while retaining transparency and control over how they are applied.

OpenAI said that the safeguard models are capable of reasoning through tasks and showing how they arrive at specific outputs, offering developers a clearer understanding of their decision-making process. This update aims to make online moderation more trustworthy and transparent.

Developers can also customize the new models to follow their own safety guidelines. For instance, a product review site could use them to spot fake reviews, while a gaming forum might use them to catch posts about cheating.
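To make that idea concrete, the sketch below shows one plausible way a developer might pass a site-specific policy to the smaller safeguard model through Hugging Face's transformers library. The repository id, policy wording, and prompt layout are illustrative assumptions based on the model names in this article, not documented usage.

```python
# A minimal sketch, assuming the smaller model is published on Hugging Face
# under a repo id like "openai/gpt-oss-safeguard-20b" (an assumption) and
# that a custom policy can be supplied as the system prompt.
from transformers import pipeline

# Hypothetical policy for the product-review example in the article.
POLICY = """You are a content reviewer for a product review site.
Label the text REVIEW_VIOLATION if it appears to be a fake or paid
review (e.g., generic praise with no product details, undisclosed
sponsorship); otherwise label it OK. Briefly explain your reasoning."""

classifier = pipeline(
    "text-generation",
    model="openai/gpt-oss-safeguard-20b",  # assumed repo id
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": POLICY},
    {"role": "user", "content": "Best product ever!!! Five stars, buy now, link in bio."},
]

result = classifier(messages, max_new_tokens=256)
# The pipeline returns the full conversation; the last turn is the
# model's label plus its stated reasoning.
print(result[0]["generated_text"][-1]["content"])
```

Because the policy lives in the prompt rather than in the model's training, the same weights could in principle serve the gaming-forum case by swapping in a different policy string.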

The models were created in partnership with Discord, SafetyKit, and Robust Open Online Safety Tools (ROOST), groups that focus on online safety solutions. They are currently available in a research preview, and OpenAI is inviting feedback from experts and the safety community before wider use.

Camille François, President of ROOST, said that as technology advances, safety tools must also progress quickly and remain accessible to everyone to promote responsible innovation.

The release comes as OpenAI faces questions about its rapid growth and focus on ethics. Valued at about $500 billion, the company recently finished a restructuring that confirmed its nonprofit foundation still controls its for-profit arm.

OpenAI said that eligible users can now access and download the safeguard model weights on Hugging Face, taking another step toward greater collaboration in online safety research.
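For reference, downloading those weights might look like the short sketch below, using the standard huggingface_hub client; the repository id is inferred from the model name in this article and should be checked against the official listing.

```python
# A minimal sketch of fetching the open weights locally. The repo id
# "openai/gpt-oss-safeguard-20b" is an assumption, not a confirmed path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("openai/gpt-oss-safeguard-20b")
print(f"Weights downloaded to: {local_dir}")
```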
