How to Stop AI Scraping
AI companies use automated crawlers to collect training data. While you can't completely prevent scraping, you can take steps to protect your content and to establish proof of authorship you can rely on when it happens.
Summary: Stopping AI scraping entirely is nearly impossible—crawlers can ignore most barriers. However, you can (1) signal your preferences with robots.txt and meta tags, (2) use technical barriers that make scraping harder, (3) establish proof of authorship before publishing, and (4) use invisible watermarks to trace your content. The most reliable protection is proving you created something first.
What This Means
The Hard Truth
Technical measures can discourage scraping, but determined actors can circumvent most barriers. The most effective strategy is assuming your content will be scraped and ensuring you have proof of authorship.
Technical Methods
1. robots.txt Directives
Add rules to your robots.txt file asking AI crawlers to stay away. This is a request rather than an enforcement mechanism; a server-side sketch that actually refuses the same crawlers follows the example below.
# Block known AI crawlers
User-agent: GPTBot
Disallow: /
User-agent: Google-Extended
Disallow: /
User-agent: anthropic-ai
Disallow: /
User-agent: CCBot
Disallow: /
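Because robots.txt only asks, you can back it up at the server if you control one: refuse requests from crawlers that identify themselves honestly. The sketch below assumes a Flask application; the crawler names mirror the list above and are illustrative rather than exhaustive.
# Sketch: refuse requests whose User-Agent matches a known AI crawler.
# Assumes a Flask app; the AI_CRAWLERS list is illustrative, not exhaustive.
from flask import Flask, abort, request

app = Flask(__name__)

AI_CRAWLERS = ("GPTBot", "Google-Extended", "anthropic-ai", "CCBot")

@app.before_request
def block_ai_crawlers():
    user_agent = request.headers.get("User-Agent", "")
    if any(bot.lower() in user_agent.lower() for bot in AI_CRAWLERS):
        abort(403)  # refuse declared AI crawlers outright

@app.route("/")
def index():
    return "Hello, human visitors."
A scraper can spoof its User-Agent, so treat this as a speed bump rather than a wall.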
2. Meta Tags
Add meta tags to indicate that AI training is not permitted. The noai and noimageai values are informal signals rather than part of a formal standard, so support varies by crawler.
<!-- Request no AI training -->
<meta name="robots" content="noai">
<meta name="robots" content="noimageai">
3. HTTP Headers
Some providers support HTTP headers to signal AI training preferences.
X-Robots-Tag: noai, noimageai
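Where the header is set depends on your stack; many sites add it in their web server or CDN configuration. As a sketch, assuming a Flask application, the same header can be attached to every response:
# Sketch: add the X-Robots-Tag header to every response of a Flask app.
# noai and noimageai are informal signals, not a formal standard.
from flask import Flask

app = Flask(__name__)

@app.after_request
def add_noai_header(response):
    response.headers["X-Robots-Tag"] = "noai, noimageai"
    return response

@app.route("/")
def index():
    return "Content served with an AI-training opt-out header."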
What Actually Works
Since you can't fully prevent scraping, focus on what you can control:
Establish proof before publishing
Create a blockchain-timestamped proof with Stelais before sharing your work. If your content is scraped and used, you have verifiable evidence of prior creation.
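Stelais handles the fingerprinting and anchoring for you, but the underlying idea can be illustrated with an ordinary cryptographic hash. A minimal sketch, with a placeholder file name; the digest plus a timestamp recorded somewhere you cannot quietly alter is what a claim of prior creation rests on.
# Sketch: compute a SHA-256 fingerprint of a work before publishing it.
# Illustrates the general idea of content fingerprinting, not the Stelais API.
import hashlib
from datetime import datetime, timezone

def fingerprint(path: str) -> str:
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            sha256.update(chunk)
    return sha256.hexdigest()

digest = fingerprint("my-artwork.png")  # hypothetical file name
created_at = datetime.now(timezone.utc).isoformat()
# Record the pair somewhere tamper-evident (a timestamping service or
# blockchain anchor, which is what Stelais provides).
print(f"{created_at}  sha256:{digest}")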
Apply invisible watermarks
Embed ownership data that persists through modifications. Even if AI-generated content is based on yours, you can trace it back.
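As a rough illustration only, and not the technique Stelais uses, the sketch below hides a short ASCII tag in the least-significant bits of an image's red channel with the Pillow library. File names and the tag are placeholders; production watermarks use far more robust schemes, since a simple LSB mark does not survive recompression or resizing.
# Toy sketch of an invisible watermark: hide a short ASCII tag in the
# least-significant bit of each pixel's red channel. Illustration only.
from PIL import Image  # third-party package: Pillow

def embed_tag(src: str, dst: str, tag: str) -> None:
    img = Image.open(src).convert("RGB")
    pixels = img.load()
    bits = "".join(f"{byte:08b}" for byte in tag.encode("ascii"))
    width, height = img.size
    if len(bits) > width * height:
        raise ValueError("image too small for this tag")
    for i, bit in enumerate(bits):
        x, y = i % width, i // width
        r, g, b = pixels[x, y]
        pixels[x, y] = ((r & ~1) | int(bit), g, b)  # overwrite the red LSB
    img.save(dst, format="PNG")  # lossless format so the hidden bits survive

embed_tag("original.png", "watermarked.png", "OWNER:example-2024")  # placeholder names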
Document your creative process
Keep drafts, sketches, and work-in-progress versions. Register multiple stages with Stelais. This chain of proofs is powerful evidence.
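One way to picture a chain of proofs is to let each stage's fingerprint commit to the fingerprint of the stage before it. A sketch under that assumption, with hypothetical draft file names; the Stelais registration step itself is not shown.
# Sketch: chain SHA-256 digests across successive drafts so each one
# commits to everything that came before it. File names are hypothetical.
import hashlib

def chain_drafts(paths: list[str]) -> list[str]:
    prev = b""
    chain = []
    for path in paths:
        with open(path, "rb") as f:
            digest = hashlib.sha256(prev + f.read()).hexdigest()
        chain.append(digest)
        prev = bytes.fromhex(digest)
    return chain

proofs = chain_drafts(["sketch-v1.png", "draft-v2.png", "final.png"])
Because every digest depends on all earlier ones, the order of your drafts becomes part of the evidence, not just their individual contents.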
Monitor for unauthorized use
Use reverse image search and monitoring tools to find unauthorized copies. Your Stelais proofs support enforcement actions.
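For images, one self-serve monitoring step is to compare perceptual hashes of your originals against files you find elsewhere. A sketch using the third-party imagehash package, with placeholder file names and an arbitrary distance threshold; a real monitoring workflow adds discovery and manual review around this check.
# Sketch: flag a candidate file as a likely copy when its perceptual hash
# is close to the original's. Uses the third-party "imagehash" package.
from PIL import Image
import imagehash

def looks_like_copy(original_path: str, candidate_path: str, max_distance: int = 8) -> bool:
    original = imagehash.phash(Image.open(original_path))
    candidate = imagehash.phash(Image.open(candidate_path))
    return original - candidate <= max_distance  # Hamming distance between hashes

if looks_like_copy("my-artwork.png", "found-online.jpg"):  # placeholder paths
    print("Possible unauthorized copy; check it against your Stelais proof.")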
What Doesn't Work Well
Visible watermarks
AI can easily remove visible overlays, and they degrade your content quality.
Not publishing online
Impractical for most creators, who need an online presence for their work.
CAPTCHAs and login walls
Hurt legitimate users more than determined scrapers who can bypass them.
Right-click disable scripts
Trivially bypassed and don't prevent page source access.
Low-resolution previews only
May protect high-res versions but AI can train on low-res too.
How Stelais Approaches This
Stelais takes a pragmatic approach: since you can't stop all scraping, focus on establishing your rights and creating tools for enforcement.
Before You Publish
Create a proof with Stelais. Your content is fingerprinted, timestamped, and anchored to the blockchain—creating undeniable evidence of when you created it.
When You Publish
Your content includes invisible watermarks that travel with it. Even scraped, modified, or AI-processed versions carry your ownership data.
If Infringement Occurs
Your proofs support DMCA takedowns, legal claims, and platform reports. The blockchain evidence is independently verifiable by anyone.