OpenAI has released a brief overview of GPTBot.
GPTBot is OpenAI’s web crawler and can be identified by the following user agent and string.
User agent token: GPTBot
Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
Usage
Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.
Disallowing GPTBot
To disallow GPTBot to access your site you can add the GPTBot to your site’s robots.txt:
User-agent: GPTBot Disallow: /
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |