On August 8th, 2023, OpenAI, the company behind ChatGPT, announced details about its new web crawler, GPTBot.
OpenAI just launched GPTBot, a web crawler designed to automatically scrape data from the entire internet.
This data will be used to train future AI models like GPT-4 and GPT-5!
GPTBot ensures that sources violating privacy and those behind paywalls are excluded. pic.twitter.com/oR3kY4buaU
— Shubham Saboo (@Saboo_Shubham_) August 7, 2023
GPTBot is a web crawler that visits websites and collects content and data for OpenAI's AI platform. That data gives the models more information to draw on when answering questions or prompts.
If you don't want this web crawler to visit and crawl your website, there is a way to opt out. You can use robots.txt to block GPTBot from accessing your website, or parts of it.
— John Mueller (official) · #MaybeABot (@JohnMu) August 7, 2023
Adding a disallow rule for GPTBot to your robots.txt file tells the crawler not to crawl your site, which keeps your content out of the data it collects. The same technique can be used to block other bots such as Googlebot, Bingbot, or other crawlers.
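As a rough illustration of the rules described above, here is a minimal sketch in Python that holds a sample robots.txt as a string and checks it with the standard library's `urllib.robotparser`. The `GPTBot` user-agent token is the crawler name from the announcement; the `example.com` URLs, the `/private/` path, and the `is_allowed` helper are hypothetical and only there for demonstration. Note that robots.txt is a request that well-behaved crawlers honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Sample rules as they might appear in a robots.txt file at the site root.
# The first group blocks GPTBot from the whole site; the second group
# (hypothetical) keeps every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs used only to exercise the rules above.
    urls = ("https://example.com/blog/post", "https://example.com/private/notes")
    for agent in ("GPTBot", "Googlebot"):
        for url in urls:
            verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
            print(f"{agent:10} {url:40} {verdict}")
```

To block only part of a site for GPTBot, the `Disallow: /` line in its group could be replaced with a narrower path such as a single directory; the same check can then be rerun to confirm which URLs remain reachable.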