Tuesday, March 5, 2024
HomeCyber SecurityWhite House calls for “red-teaming” to test security of AI models

White House calls for “red-teaming” to test security of AI models

The White House announced a surprising collaboration between top AI developers to participate in a public evaluation, i.e., red-teaming, of their generative AI systems at DEF CON 31, a hacker convention taking place in Las Vegas in August.

“Red-teaming” is a process by which security experts attempt to find vulnerabilities or flaws in an organization’s systems to improve overall security and resilience.

The announcement, made on May 4th, involved a surprising collaboration between OpenAI, Google, Antrhopic, Hugging Face, Microsoft, Nvidia and Stability AI, and called on the hacker community to push these new generative AI models, such as ChatGPT, to their limits.

Since last year, large language models (LLMs) such as ChatGPT have become a popular way to accelerate writing and communications tasks, but officials recognize that they also come with inherent risks. Issues such as confabulations, jailbreaks, and biases pose challenges for security professionals and the public. 

A statement from the White House said: “This independent exercise will provide critical information to researchers and the public about the impacts of these models and will enable AI companies and developers to take steps to fix issues found in those models.”

The event aligns with the Biden administration’s AI Bill of Rights and the National Institute of Standards and Technology’s AI Risk Management Framework.

In a parallel announcement written by AI Village, organizers Sven Cattell, Rumman Chowdhury, and Austin Carson call the upcoming event “the largest red teaming exercise ever for any group of AI models.” Thousands of people will take part in the public AI model assessment, which will utilize an evaluation platform developed by Scale AI.

According to Cattell, the founder of AI Village, “The diverse issues with these models will not be resolved until more people know how to red team and assess them.” By conducting the largest red-teaming exercise for any group of AI models, AI Village and DEF CON aim to grow the community of researchers equipped to handle vulnerabilities in AI systems.

LLMs have proven surprisingly difficult to lock down in part due to a technique called “prompt injection”. AI researcher Simon Willison has written in detail about the dangers of prompt injection, a technique that can derail a language model into performing actions not intended by its creator.

During the DEF CON event, participants will have timed access to multiple LLMs through laptops provided by the organizers. A capture-the-flag-style point system will encourage testing a wide range of potential harms. At the end, the person with the most points will win a high-end Nvidia GPU.

“We’ll publish what we learn from this event to help others who want to try the same thing,” writes AI Village. “The more people who know how to best work with these models, and their limitations, the better.”

DEF CON 31 will take place on August 10–13, 2023, at Caesar’s Forum in Las Vegas.

(Source: Arstechnica)


Stay Connected


Must Read