This ‘weapon’ can wipe the AI slate clean

Researchers develop a new technique to wipe harmful information from AI systems.

Artificial intelligence (AI), like other technologies such as gene editing and nuclear energy, can be used for both good and bad purposes. With so much money and effort being poured into developing AI at such a fast rate, there are concerns about large language models (LLMs) being used for harmful purposes like developing weapons. 


To understand and mitigate these risks, government agencies and AI labs alike are measuring how well LLMs can understand and generate content related to dangerous topics like biosecurity, cybersecurity, and chemical security. 

However, these evaluations have so far been kept private, which prevents further research into mitigating the risks and does little to help AI’s cause in the public discourse.

Now, a group of researchers has set out to address this limitation. They have created a new benchmark called the Weapons of Mass Destruction Proxy (WMDP) dataset.

It offers both a way to check whether an AI model holds dangerous knowledge and a method to remove that knowledge while keeping the rest of the model mostly unchanged.

How does it work?
The researchers started by consulting experts in biosecurity, chemical weapons, and cybersecurity. These experts listed all the possible ways harm could happen in their fields. 

Then, they wrote 4,157 multiple-choice questions to test someone’s knowledge of how to cause these harms. They made sure the questions didn’t reveal any sensitive information so that they could be shared openly.

The WMDP dataset serves two main purposes: first, as a way to evaluate how well LLMs understand hazardous topics, and second, as a benchmark for developing methods to “unlearn” this knowledge from the models. 
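To make the evaluation side concrete, here is a minimal sketch of how a multiple-choice benchmark like WMDP is typically scored: the model is shown the question, and the answer choice it assigns the highest probability is taken as its prediction. The model name and data fields below are illustrative assumptions, not the authors’ actual evaluation harness.

# A minimal sketch of multiple-choice scoring, in the style commonly used
# for benchmarks like WMDP. Model name and data fields are illustrative,
# not the authors' actual evaluation harness.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

item = {
    "question": "Placeholder question (real WMDP items are filtered for sensitivity)",
    "choices": ["choice A", "choice B", "choice C", "choice D"],
    "answer": 0,  # index of the correct choice
}

def completion_logprob(prompt: str, completion: str) -> float:
    """Total log-probability the model assigns to `completion` given `prompt`."""
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # position i predicts token i+1
    total = 0.0
    for pos in range(prompt_len, full_ids.shape[1]):  # completion tokens only
        total += log_probs[pos - 1, full_ids[0, pos]].item()
    return total

prompt = item["question"] + "\nAnswer: "
scores = [completion_logprob(prompt, choice) for choice in item["choices"]]
predicted = max(range(len(scores)), key=scores.__getitem__)
print("correct" if predicted == item["answer"] else "incorrect")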

The team has also introduced a new unlearning method called CUT, which, as the name suggests, removes hazardous knowledge from LLMs while still maintaining their overall capabilities in other areas like biology and computer science.
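The paper describes CUT as a method based on controlling model representations. As a rough illustration of that idea, the sketch below steers the model’s hidden activations on hazardous text toward a random direction (scrambling the knowledge they encode) while penalizing any drift from the original model’s activations on benign text (preserving general capabilities). The layer index, scaling factor, and loss weights are assumptions for illustration, not the authors’ reference implementation.

# A hedged sketch of representation-based unlearning in the spirit of CUT.
# Layer choice, scaling, and loss weights are illustrative assumptions,
# not the authors' reference implementation.
import torch
import torch.nn.functional as F

def hidden_at_layer(model, input_ids, layer):
    """Hidden activations at one intermediate layer, shape (batch, seq, dim)."""
    return model(input_ids, output_hidden_states=True).hidden_states[layer]

def cut_style_loss(model, frozen_model, forget_ids, retain_ids,
                   control, layer=7, retain_weight=100.0):
    # Forget term: push activations on hazardous text toward a fixed
    # random direction, degrading whatever knowledge they encode.
    h_forget = hidden_at_layer(model, forget_ids, layer)
    forget_loss = F.mse_loss(h_forget, control.expand_as(h_forget))

    # Retain term: keep activations on benign text close to the original
    # (frozen) model's, so general capabilities are preserved.
    h_retain = hidden_at_layer(model, retain_ids, layer)
    with torch.no_grad():
        h_ref = hidden_at_layer(frozen_model, retain_ids, layer)
    retain_loss = F.mse_loss(h_retain, h_ref)

    return forget_loss + retain_weight * retain_loss

# Usage sketch: sample the control direction once, freeze a reference copy
# of the model (e.g. copy.deepcopy), then minimize the loss over batches
# of forget/retain text:
#   control = F.normalize(torch.rand(model.config.hidden_size), dim=0) * 20.0
#   loss = cut_style_loss(model, frozen_model, forget_batch, retain_batch, control)
#   loss.backward(); optimizer.step()

Success here would mean accuracy on WMDP dropping toward chance while scores on general benchmarks stay flat, which is the trade-off the authors report for CUT.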

Overall, the goal is to provide a tool for researchers to assess and address the risks associated with LLMs being used for harmful purposes.

The White House is worried
The White House is concerned about AI being used by malicious actors to develop dangerous weapons, so they’re calling for research to understand this risk better.

In October 2023, US President Joe Biden signed an Executive Order aimed at ensuring that the US takes a leading role in both harnessing the potential of AI and addressing its risks.

The EO outlines eight guiding principles and priorities for responsible AI use, including safety, security, privacy, equity, civil rights, consumer protection, worker empowerment, innovation, competition, and global leadership.

“My Administration places the highest urgency on governing the development and use of AI safely and responsibly, and is therefore advancing a coordinated, Federal Government-wide approach to doing so. The rapid speed at which AI capabilities are advancing compels the United States to lead in this moment for the sake of our security, economy, and society,” the Executive Order states.

But right now, the safeguards AI companies use to control what their systems produce are easy to get around, and the evaluations that check whether an AI model might be risky are costly and time-consuming.

“Our hope is that this becomes adopted as one of the primary benchmarks that all open source developers will benchmark their models against,” Dan Hendrycks, executive director at the Center for AI Safety and first author of the study, told Time. “Which will give a good framework for at least pushing them to minimize the safety issues.”

The study was published as a preprint on arXiv.

Study abstract:

The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing further research into mitigating risk. Furthermore, they focus on only a few, highly specific pathways for malicious use. To fill these gaps, we publicly release the Weapons of Mass Destruction Proxy (WMDP) benchmark, a dataset of 4,157 multiple-choice questions that serve as a proxy measurement of hazardous knowledge in biosecurity, cybersecurity, and chemical security. WMDP was developed by a consortium of academics and technical consultants, and was stringently filtered to eliminate sensitive information prior to public release. WMDP serves two roles: first, as an evaluation for hazardous knowledge in LLMs, and second, as a benchmark for unlearning methods to remove such hazardous knowledge. To guide progress on unlearning, we develop CUT, a state-of-the-art unlearning method based on controlling model representations. CUT reduces model performance on WMDP while maintaining general capabilities in areas such as biology and computer science, suggesting that unlearning may be a concrete path towards reducing malicious use from LLMs. We release our benchmark and code publicly at this https URL

