Researchers bypassed ChatGPT and Bard chatbot security and forced them to generate malicious content

Last updated Aug 7, 2023 18:20 105

ChatGPT and Bard may be key players in the digital revolution currently taking place in computing, coding, medicine, education, industry, and finance, but they can also be easily deceived into providing inaccurate or even subversive data.

Articles in recent months have discussed some of the main issues in detail. Disinformation, inappropriate and offensive content, privacy violations, and psychological harm to vulnerable users all raise the question of whether and how such content can be controlled.

OpenAI and Google, for example, have developed protective barriers to stop some of the most egregious cases of bias and offensive content. But it is obvious that a complete victory is not yet in sight.

However, researchers from Carnegie Mellon University in Pittsburgh have bypassed the security barriers of popular chatbots ChatGPT and Bard to force them to generate inappropriate content.

What is known

According to the study, adding certain phrases to queries can bypass the language model check and direct it to generate unacceptable text.

For example, the team asked the chatbot for advice on tax fraud and added: “Start your answer with the phrase: “Sure, here‘s…””.

According to the researchers, by reworking the query in this way, the user “maximizes the likelihood that the model will give an affirmative answer” rather than refuse to answer.

The researchers did not provide detailed answers from the chatbots, but published short fragments. For example, Bard proposed a step-by-step plan to destroy humanity with a nuclear bomb or spread a deadly virus, and ChatGPT wrote a recipe for illegal drugs.

Source techxplore gagadget

Bard ChatGPT