Meta's AI web crawler is named meta-externalagent. Meta uses this crawler specifically to crawl and index the web for AI training purposes. To block meta-externalagent quickly, add these lines to your robots.txt file:
User-agent: meta-externalagent
Disallow: /
What follows is the story of how I tried to find the Meta AI web crawler when Meta first launched its AI without declaring any web crawler or user agent that website owners could block. At the time, I even emailed Meta to ask which user agent it uses for AI training purposes.
Meta AI has recently been widely discussed, with Mark Zuckerberg’s video announcing its latest release and features. Now, with many sites blocking AI crawlers such as GPTBot, Google-Extended, and Anthropic-AI (see the universal web crawler blocking report), especially in the News sector, you might be wondering how you can block Meta AI from accessing your site.
I did some research and was not able to find any official source from Facebook on the Meta AI user agent. There is a page on the Facebook developer’s guide with information about the crawlers that Facebook uses to browse a website when a link is shared between users:
- facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
- facebookexternalhit/1.1
- facebookcatalog/1.0
However, these seem to be specific to Facebook only, and we don’t know if Meta AI uses the same User Agent.
So I asked Meta AI itself how to block it from accessing a site via robots.txt, and this was its response:
So, to block these user agents, you can add these lines to your robots.txt file:
User-agent: MetaAI
User-agent: Meta-AI
User-agent: Meta AI
User-agent: Meta.AI
User-agent: meta-ai
User-agent: metaai
Disallow: /
I then asked Meta AI if it could provide me with the source of this information, but none of the sources it provided mentioned anything about how to block Meta AI or what the Meta AI user agent is.
So I assume that what Meta AI said about its user agent is simply made up: guesswork by the AI, or an AI hallucination, as it is more commonly called.
So I contacted Facebook about this, but I have not yet heard back. I will update this post as soon as I receive a reply.
Meta did not answer my email, but they updated their page about Meta bots and web crawlers, adding a new web crawler named meta-externalagent, which crawls the web and directly indexes and stores content to train Meta’s AI models. Here is how you can block it via robots.txt:
User-agent: meta-externalagent
Disallow: /
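If you want to sanity-check the rules before deploying them, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name is_allowed and the two constants are my own, made up for illustration. The check only shows how Python's parser reads the rules. It does not prove how Meta's crawler interprets them, and robots.txt is only honored by crawlers that choose to respect it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str = "/") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper for illustration; it relies on Python's own
    robots.txt parser, not on Meta's crawler behavior.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rule Meta documents for its AI training crawler.
OFFICIAL_RULES = """\
User-agent: meta-externalagent
Disallow: /
"""

# The user agent names that Meta AI guessed when I asked it.
GUESSED_RULES = """\
User-agent: MetaAI
User-agent: Meta-AI
User-agent: Meta AI
User-agent: Meta.AI
User-agent: meta-ai
User-agent: metaai
Disallow: /
"""

# The official rule blocks the crawler from the whole site.
print(is_allowed(OFFICIAL_RULES, "meta-externalagent"))  # False

# None of the guessed names match, so the crawler is still allowed.
print(is_allowed(GUESSED_RULES, "meta-externalagent"))  # True

# Other crawlers are unaffected by the official rule.
print(is_allowed(OFFICIAL_RULES, "Googlebot"))  # True
```

In this check, the guessed names from Meta AI's answer do not cover meta-externalagent, so only the officially documented name blocks it.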