OpenAI has launched the GPTBot web crawler and added instructions for blocking it

The GPTBot crawler will scan the web to collect data for training and improving AI models.

According to a post on the OpenAI blog, the GPTBot crawler has the potential to improve existing artificial intelligence models, particularly in aspects such as accuracy and safety.

“Web pages crawled by the GPTBot agent can potentially be used to improve future models and are filtered to remove sources that require paid access, collect personal information, or contain text that violates our policies,” the company said in a statement.

At the same time, website operators will be able to deny GPTBot access, either partially or completely. To do this, they need to add the GPTBot user agent to the site's robots.txt file with a "Disallow" directive.
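OpenAI's documentation gives the user agent token as `GPTBot`. A robots.txt entry blocking it entirely looks like this (the directory names in the partial example are illustrative placeholders):

```txt
# Block GPTBot from the entire site
User-agent: GPTBot
Disallow: /

# Or allow/deny only specific paths (example directories)
User-agent: GPTBot
Allow: /public-content/
Disallow: /private-content/
```

The file must be served at the site root (e.g. `https://example.com/robots.txt`) for crawlers to find it.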

OpenAI has already faced scrutiny over its data collection: in June of last year, the company was sued for allegedly "stealing" information to train ChatGPT. The chatbot also accidentally exposed other users' chat histories, after which OpenAI added a feature to disable history and avoid further accusations.

On July 18, the company filed an application with the U.S. Patent and Trademark Office for the GPT-5 trademark, covering AI-based software for human speech and text, audio-to-text conversion, and voice and speech recognition. Back in June, the company's CEO Sam Altman said that OpenAI was not yet training GPT-5 because a lot of preparatory work still needed to be done.

Source: OpenAI