How to Block meta-externalagent (Meta AI Crawler) from Using Your Content for AI Training

Last updated on: December 23, 2024

Founder & CEO @ Nexunom

Meta's AI web crawler is named meta-externalagent. Meta uses this crawler specifically to crawl and index the web for AI training. To block meta-externalagent quickly, add these lines to your robots.txt file:

User-agent: meta-externalagent
Disallow: /

What follows is the story of how I tried to find Meta's AI web crawler when Meta first launched its AI without declaring any web crawler or user agent that website owners could block. At the time, I even emailed Meta to ask which user agent they use for AI training.

Meta AI has recently been widely discussed, with Mark Zuckerberg’s video announcing its latest release and features. Now, with many sites blocking AI crawlers such as GPTBot, Google-Extended, and Anthropic-AI (see the universal web crawler blocking report), especially in the News sector, you might be wondering how you can block Meta AI from accessing your site.

I did some research and was not able to find any official source from Facebook on the Meta AI user agent. There is a page on the Facebook developer’s guide with information about the crawlers that Facebook uses to browse a website when a link is shared between users:

facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
facebookexternalhit/1.1
facebookcatalog/1.0

However, these seem to be specific to Facebook only, and we don’t know if Meta AI uses the same User Agent.

So I went ahead and asked Meta AI itself about how we can block it from accessing a site via robots.txt, and this is the response:

[Image: Asking Meta AI itself about its user agent]

Going by Meta AI's answer, you would block these user agents by adding the following lines to your robots.txt file:

User-agent: MetaAI
User-agent: Meta-AI
User-agent: Meta AI
User-agent: Meta.AI
User-agent: meta-ai
User-agent: metaai
Disallow: /
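
One way to sanity-check a list like this is to run it through a robots.txt parser. The sketch below uses Python's standard urllib.robotparser to ask whether these guessed names would stop a crawler that identifies itself as meta-externalagent. The example.com URL and the helper name guessed_rules_allow are placeholders of mine, and the parser's substring matching is its own behavior, not necessarily how Meta's crawler matches rules.

```python
from urllib.robotparser import RobotFileParser

# The user agent names Meta AI suggested, exactly as listed above.
GUESSED_RULES = """\
User-agent: MetaAI
User-agent: Meta-AI
User-agent: Meta AI
User-agent: Meta.AI
User-agent: meta-ai
User-agent: metaai
Disallow: /
"""


def guessed_rules_allow(user_agent: str, url: str = "https://example.com/") -> bool:
    """Return True if GUESSED_RULES would still let `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(GUESSED_RULES.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("MetaAI", "meta-externalagent"):
        verdict = "allowed" if guessed_rules_allow(agent) else "blocked"
        print(f"{agent}: {verdict}")
```

In this parser, none of the guessed names match meta-externalagent, so a crawler using that token would not be covered by the group.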

I then asked Meta AI if it could provide me with the source of this information, but none of the sources it provided mentioned anything about how to block Meta AI or what the Meta AI user agent is.

[Image: None of the sources Meta AI provides mention anything about the Meta AI user agent]

So I assume that what Meta AI said about its user agent was simply made up: guesswork, or what is more commonly called AI hallucination.

So I contacted Facebook about this, but I have not heard back yet. I will update this post as soon as I receive a reply.

Meta did not answer my email, but they updated their page about Meta bots and web crawlers, adding a new web crawler named meta-externalagent, which crawls the web and directly indexes and stores content to train Meta’s AI models. Here is how you can block it via robots.txt:

User-agent: meta-externalagent
Disallow: /
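
Before deploying the rule, you can test it locally. The following is a minimal sketch using Python's standard urllib.robotparser. The example.com URL and the helper name is_blocked are placeholders of mine, and the full user agent string is an assumption about how the crawler identifies itself in request headers. The check only shows how a standards-following parser reads the file; it cannot tell you whether the crawler actually obeys it.

```python
from urllib.robotparser import RobotFileParser

# The rule from this post, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: meta-externalagent
Disallow: /
"""

# Assumption: the full header value the crawler sends; only the part
# before the first "/" is compared against robots.txt by this parser.
FULL_UA = (
    "meta-externalagent/1.1 "
    "(+https://developers.facebook.com/docs/sharing/webmasters/crawler)"
)


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` disallows `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/some-article/"  # placeholder URL
    for agent in ("meta-externalagent", FULL_UA, "Googlebot"):
        verdict = "blocked" if is_blocked(ROBOTS_TXT, agent, url) else "allowed"
        print(f"{agent}: {verdict}")
```

To test your live file instead of an inline string, the same parser offers set_url() and read(), which fetch and parse the robots.txt at a given address.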

Author

Saeed Khosravi

Saeed Khosravi is an SEO Strategist, Digital Marketer, and WordPress Expert with over 15 years of experience, starting his career in 2008. He graduated with a degree in MIB Marketing from HEC Montreal. As the Founder and CEO of Nexunom, Saeed, alongside his dedicated team, provides comprehensive digital marketing solutions to local businesses. He is also the founder and the main brain behind several successful marketing SAAS platforms, including Allintitle.co, ReviewTool.com, and Tavata.com.

3 Comments


Patch

1 year ago

Thank you for the information. They hit one of my Web applications over 148,000 times today — effectively causing a DOS attack so I had to stop the server. It’s a highly interactive Web application — content and links change depending on user selections, and there are navigational cross-links and backlinks — so the crawler may have been going around in loops possibly interminably. They should at least have the decency to space out the requests over more time.

Buck Manhands

The meta-externalagent ignores the robots.txt directive on my website.

I suggest stronger measures via htaccess setEnvIf directives to block any user agent with “meta” in its string. It will block everything meta except the facebook sharer bot.

If you are on Nginx then there are corresponding commands you can put in your config file to do the same thing.

There are multiple answers on stackoverflow that will provide the full code necessary.

Den

Reply to Buck Manhands

10 months ago

Do you have any advice on which measures to use? I have the same problem.
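
For readers who cannot edit .htaccess or Nginx configuration, the same idea (refusing requests by user agent on the server, rather than relying on robots.txt) can be sketched at the application level. The example below is a minimal Python WSGI middleware, not the SetEnvIf approach from the comment above. The names block_user_agents, demo_app, and status_for are hypothetical helpers for illustration, and the default token list is an assumption you should adapt.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: lowercase substrings that mark a request as unwanted.
BLOCKED_TOKENS = ("meta-externalagent",)


def block_user_agents(app, blocked_tokens=BLOCKED_TOKENS):
    """Wrap a WSGI app and answer 403 when the User-Agent contains a blocked token."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            body = b"Forbidden\n"
            headers = [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ]
            start_response("403 Forbidden", headers)
            return [body]
        # Not a blocked agent: hand the request to the wrapped app.
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in for a real site: always answers 200."""
    body = b"Hello\n"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def status_for(app, user_agent):
    """Send one fake request with `user_agent` and return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["HTTP_USER_AGENT"] = user_agent
    captured = []

    def start_response(status, headers, exc_info=None):
        captured.append(status)

    list(app(environ, start_response))  # consume the response body
    return captured[0]


if __name__ == "__main__":
    site = block_user_agents(demo_app)
    for agent in ("meta-externalagent/1.1", "facebookexternalhit/1.1", "Mozilla/5.0"):
        print(agent, "->", status_for(site, agent))
```

The broader rule suggested above corresponds to passing blocked_tokens=("meta",). That still lets facebookexternalhit through, since its string does not contain "meta", but it would also refuse any unrelated client whose user agent happens to include that substring.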
