Moderation Flags
This reference table lists the moderation categories used by the AI chat moderator. The moderator classifies messages detected in server chat into these categories, and the assigned categories help determine which moderation rules apply.
Category | Description
harassment | Content that insults, demeans, or targets a person or group with abusive or harassing language.
harassment/threatening | Harassment that also includes threats of violence or serious harm toward the target.
hate | Content that promotes hatred or discrimination toward protected groups (e.g., race, religion, gender, nationality, sexual orientation, disability).
hate/threatening | Hateful content that also includes threats or calls for violence against a protected group.
illicit | Content that provides instructions or advice for committing illegal activities (e.g., “how to shoplift”).
illicit/violent | Instructions or advice for illegal activities that involve violence or weapons.
self-harm | Content referencing or depicting self-harm behaviors such as suicide, cutting, or eating disorders.
self-harm/intent | Content where a person expresses intent to harm themselves.
self-harm/instructions | Content that provides instructions or encouragement for self-harm or suicide.
sexual | Explicit sexual content or content intended to arouse sexual interest (excluding educational contexts).
sexual/minors | Sexual content involving individuals under the age of 18.
violence | Content depicting or describing physical violence, injury, or death.
violence/graphic | Graphic or highly explicit depictions of violence or injury.
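As a rough illustration of how these categories could feed into moderation rules, here is a minimal Python sketch. It assumes the moderator reports each message as a mapping from category name to a true/false flag. The action names, the `RULES` mapping, and the functions `parent_category` and `choose_action` are all hypothetical and are not taken from the moderator's actual configuration.

```python
from typing import Dict

# Hypothetical actions, ordered from least to most severe.
ACTIONS_BY_SEVERITY = ["none", "warn", "delete", "ban"]

# Hypothetical rule mapping; a real server would define its own.
RULES: Dict[str, str] = {
    "harassment": "warn",
    "harassment/threatening": "delete",
    "hate": "delete",
    "hate/threatening": "ban",
    "self-harm": "warn",
    "sexual/minors": "ban",
    "violence": "warn",
}


def parent_category(name: str) -> str:
    """Return the top-level category, e.g. 'hate' for 'hate/threatening'."""
    return name.split("/", 1)[0]


def choose_action(flags: Dict[str, bool]) -> str:
    """Pick the most severe action among all flagged categories."""
    actions = []
    for name, flagged in flags.items():
        if not flagged:
            continue
        # A subcategory without its own rule falls back to its parent's rule,
        # and an unknown category falls back to a warning.
        fallback = RULES.get(parent_category(name), "warn")
        actions.append(RULES.get(name, fallback))
    return max(actions, key=ACTIONS_BY_SEVERITY.index, default="none")
```

For example, a message flagged as both `harassment` and `harassment/threatening` would resolve to the stricter of the two hypothetical actions, while a message with no flags set resolves to `none`.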