OpenAI’s New Web Crawler: GPTBot

OpenAI has released a brief overview of GPTBot.

GPTBot is OpenAI’s web crawler and can be identified by the following user agent and string.

User agent token: GPTBot

Full user-agent string: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)

Usage

Web pages crawled with the GPTBot user agent may potentially be used to improve future models and are filtered to remove sources that require paywall access, are known to gather personally identifiable information (PII), or have text that violates our policies. Allowing GPTBot to access your site can help AI models become more accurate and improve their general capabilities and safety. Below, we also share how to disallow GPTBot from accessing your site.

Disallowing GPTBot

To disallow GPTBot to access your site you can add the GPTBot to your site’s robots.txt:

User-agent: GPTBot Disallow: /

| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |

Avatar photo

Author: Charles W. Bailey, Jr.

Charles W. Bailey, Jr.