To help developers protect their applications against possible misuse, we are introducing a faster and more accurate Moderation endpoint. This endpoint gives OpenAI API developers free access to GPT-based classifiers that detect undesired content. It is an instance of using AI systems to assist with human supervision of those same systems. We have also released a technical paper describing our methodology, along with the dataset used for evaluation.
When given a text input, the Moderation endpoint assesses whether the content is sexual, hateful, or violent, or whether it promotes self-harm, all of which is prohibited by our content policy. The endpoint has been trained to be fast, accurate, and robust across a range of applications. Importantly, this reduces the chances of products "saying" the wrong thing, even when deployed to users at scale. As a result, AI can unlock benefits in sensitive settings, such as education, where it could not otherwise be used with confidence.
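The sketch below shows how an application might check text with the endpoint before displaying it to users. It is a minimal illustration, not official client code. It assumes the endpoint is reached with a POST to `https://api.openai.com/v1/moderations` with a JSON body of the form `{"input": ...}`, and that the response contains a `results` list whose entries carry a boolean `flagged` field and a `categories` map of category names to booleans. The helper names (`build_moderation_request`, `moderate`, `flagged_categories`, `should_block`) are hypothetical. Confirm the URL, category names, and response fields against the current API reference before relying on them.

```python
import json
import urllib.request

# Assumed endpoint URL; verify against the current API reference.
MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request asking the endpoint to classify `text`.

    The {"input": ...} payload shape is an assumption for this sketch.
    """
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def moderate(text: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response (requires network access)."""
    req = build_moderation_request(text, api_key)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def flagged_categories(response: dict) -> list[str]:
    """Return the sorted names of categories marked true in the first result.

    Assumes the response shape {"results": [{"flagged": bool, "categories": {...}}]}.
    """
    result = response["results"][0]
    return sorted(name for name, hit in result["categories"].items() if hit)


def should_block(response: dict) -> bool:
    """Hypothetical gate: block the text whenever the endpoint marks it as flagged."""
    return bool(response["results"][0]["flagged"])
```

In practice, an application would call `moderate(user_text, api_key)` and withhold or escalate the text when `should_block(...)` returns `True`, perhaps logging `flagged_categories(...)` for human review. Keeping the request construction separate from the network call makes the gating logic easy to test with hand-written response dictionaries, without contacting the API.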