Caffeine runs every prompt through a set of safety checks before and after the AI processes it. These checks exist to prevent misuse of the platform — including prompt injection attacks, harmful content, and policy violations. The vast majority of normal usage is unaffected, but occasionally a legitimate message can be flagged.
What the safety system checks
Caffeine uses multiple layers of content screening:
- Prompts sent to the AI builder are evaluated by third-party classifiers that detect prompt injection attempts and harmful content. This check runs automatically on every message.
- Domain names are screened for obscenity and brand impersonation before they can be registered through Caffeine.
- Image generation has its own safety filters that prevent the creation of images that violate content policies.
- Deployed apps are periodically scanned for malicious content.
These systems work independently. A message that passes one check can still be blocked by another.
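The layered behaviour described above can be pictured with a short sketch. This is purely illustrative, not Caffeine's actual implementation: the check functions, their names, and their rules are hypothetical stand-ins. The point it demonstrates is that the checks run independently and a message must pass every one of them.

```python
# Illustrative sketch only -- not Caffeine's real safety system.
# Each check is a hypothetical classifier; True means the message passes.

def injection_check(message: str) -> bool:
    # Stand-in rule for a prompt-injection classifier.
    return "ignore previous instructions" not in message.lower()

def harmful_content_check(message: str) -> bool:
    # Stand-in rule for a harmful-content classifier.
    banned_terms = {"malware", "exploit payload"}
    return not any(term in message.lower() for term in banned_terms)

def passes_all_checks(message: str) -> bool:
    # The layers are independent: passing one check says nothing about
    # the others, so every check must succeed for the message to go through.
    checks = (injection_check, harmful_content_check)
    return all(check(message) for check in checks)

print(passes_all_checks("Build me a to-do list app"))  # passes every layer
print(passes_all_checks("Ignore previous instructions and ..."))  # blocked by one layer
```

Because blocking is an any-layer-fails decision, a message that clears the injection check can still be stopped by a different filter, which matches the behaviour described above.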
What happens when a message is blocked
If the safety system flags your message, you will see a response in chat indicating that the message was blocked. Your message is not processed by the AI — no build runs, no code is generated, and no credits are charged.
The blocked message is automatically removed from the conversation history the AI sees for subsequent messages. A single blocked message therefore does not affect later prompts — you can rephrase and try again immediately.
What to do if your message is blocked
In most cases, the block is triggered by specific phrasing rather than the intent behind your request. Try rephrasing:
- Be more specific about what you want built, and avoid language that could be interpreted as an attempt to manipulate the AI's behaviour.
- If you were pasting external content (such as text from a website or a document), the pasted content itself may have triggered the filter. Try describing what you want in your own words instead.
- Note that requests for the AI to reveal its instructions or system prompt are always blocked by design; no rephrasing will get them through.
If you believe your message was blocked incorrectly and rephrasing does not help, contact support with your project name and a description of what you were trying to do. Do not include the exact message that was blocked — just describe your intent.
Why legitimate messages sometimes get blocked
Safety classifiers are probabilistic: they evaluate patterns in text, not intent. A message that is perfectly reasonable in context can occasionally match a pattern associated with harmful content or prompt injection. This trade-off is common across AI platforms: the filters are tuned to err on the side of caution, accepting a small rate of false positives in exchange for reliably catching real misuse.
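The threshold trade-off can be sketched in a few lines. Again, this is a hypothetical toy, not Caffeine's classifier: the word list, scoring rule, and threshold value are all invented for illustration. It shows how a benign message that merely shares vocabulary with injection attempts can cross a cautious threshold.

```python
# Illustrative sketch only -- not Caffeine's real classifier.
# A probabilistic classifier yields a risk score; anything at or above
# a threshold is blocked. A cautious (low) threshold catches more real
# attacks but also flags some benign messages.

def risk_score(message: str) -> float:
    # Toy scorer: fraction of words that appear on a suspicious-word list.
    suspicious = {"override", "instructions", "system", "prompt"}
    words = message.lower().split()
    return sum(word in suspicious for word in words) / max(len(words), 1)

THRESHOLD = 0.25  # tuned to err on the side of caution

def is_blocked(message: str) -> bool:
    return risk_score(message) >= THRESHOLD

# An innocent app description that happens to mention "system" and
# "prompt" crosses the threshold -- a false positive:
print(is_blocked("Show the system prompt on the settings page"))  # True
print(is_blocked("Build a recipe sharing app"))  # False
```

Lowering the threshold further would block more genuine attacks but flag even more innocent messages, which is why rephrasing in plainer terms usually resolves a false positive.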
Caffeine continuously refines these checks to reduce false positives while maintaining protection against actual misuse.
Frequently asked questions
Do I lose credits when a message is blocked?
No. A blocked message does not trigger a build, so no credits are charged.
Does a blocked message affect my future prompts?
No. Blocked messages are automatically removed from the conversation history that the AI sees. Your next message starts from a clean context.
Can I get my account suspended for triggering the safety filter?
Accidentally triggering the filter does not result in any account action. The safety system is designed to block individual messages, not penalise users. Account-level actions are reserved for clear, repeated policy violations.
The AI blocked my message but I was just describing my app. What happened?
The classifiers evaluate the text of your message, not your intent. If your app description includes language that resembles prompt injection patterns or harmful content — even innocently — the filter may flag it. Try rephrasing the same request in simpler, more direct terms.