Meta AI has been widely discussed lately, following Mark Zuckerberg’s video announcing its latest release and features. With many sites, especially in the News sector, now blocking AI crawlers such as GPTBot, Google-Extended, and Anthropic-AI (see the universal web crawler blocking report), you might be wondering how to block Meta AI from accessing your site.
I did some research and could not find any official source from Facebook on the Meta AI user agent. The Facebook developer’s guide does have a page listing the crawlers Facebook uses to fetch a page when a link to it is shared between users:
- facebookexternalhit/1.1 (+http://www.facebook.com/externalhit_uatext.php)
- facebookexternalhit/1.1
- facebookcatalog/1.0
However, these seem to be specific to Facebook only, and we don’t know if Meta AI uses the same User Agent.
So I went ahead and asked Meta AI itself how to block it from accessing a site via robots.txt. It responded with a list of possible user agent names. To block the user agents it named, you would add these lines to your robots.txt file:
User-agent: MetaAI
User-agent: Meta-AI
User-agent: Meta AI
User-agent: Meta.AI
User-agent: meta-ai
User-agent: metaai
Disallow: /
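If you want to confirm that a group like this parses the way you expect, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_blocked` and the `https://example.com/` URL are placeholders introduced for illustration, and the user agent tokens are the unverified ones suggested by Meta AI. The check only shows how a robots.txt parser reads the rules; it says nothing about whether Meta AI actually identifies itself with any of these names or honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Unverified user agent tokens suggested by Meta AI itself.
# None of them is confirmed by any official Meta source.
ROBOTS_TXT = """\
User-agent: MetaAI
User-agent: Meta-AI
User-agent: Meta AI
User-agent: Meta.AI
User-agent: meta-ai
User-agent: metaai
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str,
               url: str = "https://example.com/") -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical agents to test, including one of Facebook's link crawlers.
    for agent in ("MetaAI", "meta-ai", "facebookexternalhit/1.1", "Googlebot"):
        print(f"{agent}: blocked={is_blocked(ROBOTS_TXT, agent)}")
```

Because the group names only the guessed Meta AI tokens, crawlers such as facebookexternalhit are not matched by it and stay allowed under these rules.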
I then asked Meta AI if it could provide me with the source of this information, but none of the sources it provided mentioned anything about how to block Meta AI or what the Meta AI user agent is.
So I assume that what Meta AI said about its user agent is simply the AI making things up, or hallucinating, as it is more commonly called.
I have also contacted Facebook about this but have not yet heard back. I will update this post as soon as I receive a reply.