Moderation Flags

The table below lists the moderation categories used by the AI chat moderator. Messages detected in server chat are classified into these categories, and the result helps determine which moderation rules apply.

| Category | Description |
| --- | --- |
| harassment | Content that insults, demeans, or targets a person or group with abusive or harassing language. |
| harassment/threatening | Harassment that also includes threats of violence or serious harm toward the target. |
| hate | Content that promotes hatred or discrimination toward protected groups (e.g., race, religion, gender, nationality, sexual orientation, disability). |
| hate/threatening | Hateful content that also includes threats or calls for violence against a protected group. |
| illicit | Content that provides instructions or advice for committing illegal activities (e.g., “how to shoplift”). |
| illicit/violent | Instructions or advice for illegal activities that involve violence or weapons. |
| self-harm | Content referencing or depicting self-harm behaviors such as suicide, cutting, or eating disorders. |
| self-harm/intent | Content where a person expresses intent to harm themselves. |
| self-harm/instructions | Content that provides instructions or encouragement for self-harm or suicide. |
| sexual | Explicit sexual content or content intended to arouse sexual interest (excluding educational contexts). |
| sexual/minors | Sexual content involving individuals under the age of 18. |
| violence | Content depicting or describing physical violence, injury, or death. |
| violence/graphic | Graphic or highly explicit depictions of violence or injury. |
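
As a rough illustration of how flagged categories might be turned into a moderation rule, here is a minimal Python sketch. The category names come from the table above; everything else (the `Action` levels, the `RULES` mapping, and the `resolve_action` helper) is hypothetical and does not describe the moderator's actual configuration. It assumes that a subcategory such as `hate/threatening` falls back to its parent's rule when it has none of its own, and that the strictest matching action wins.

```python
from enum import IntEnum

# The thirteen category names listed in the table above.
CATEGORIES = frozenset({
    "harassment", "harassment/threatening",
    "hate", "hate/threatening",
    "illicit", "illicit/violent",
    "self-harm", "self-harm/intent", "self-harm/instructions",
    "sexual", "sexual/minors",
    "violence", "violence/graphic",
})


class Action(IntEnum):
    """Hypothetical moderation actions, ordered from least to most strict."""
    NONE = 0
    WARN = 1
    DELETE = 2
    BAN = 3


# Hypothetical rule table: which action a server applies per category.
RULES = {
    "harassment": Action.WARN,
    "harassment/threatening": Action.DELETE,
    "hate": Action.DELETE,
    "sexual/minors": Action.BAN,
}


def parent_category(category: str) -> str:
    """Return the top-level category, e.g. 'hate/threatening' -> 'hate'."""
    return category.split("/", 1)[0]


def resolve_action(flagged, rules=RULES) -> Action:
    """Pick the strictest action among all flagged categories.

    A subcategory without its own rule uses its parent's rule;
    a category with no rule at all maps to Action.NONE.
    """
    strictest = Action.NONE
    for category in flagged:
        if category not in CATEGORIES:
            raise ValueError(f"unknown moderation category: {category!r}")
        fallback = rules.get(parent_category(category), Action.NONE)
        strictest = max(strictest, rules.get(category, fallback))
    return strictest
```

For example, with the sample rules above, a message flagged as both `harassment` and `harassment/threatening` would resolve to `Action.DELETE`, and one flagged only as `violence` would resolve to `Action.NONE` because no rule is defined for it.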
