Penn Engineering researchers said they created an algorithm that bypassed normal safety protocols stopping AI-powered robots from performing harmful actions.
Researchers have hacked artificial intelligence-powered robots and manipulated them into performing actions usually blocked by safety and ethical protocols, such as causing collisions or detonating a bomb.
Penn Engineering researchers published their findings in an Oct. 17 paper, detailing how their algorithm, RoboPAIR, achieved a 100% jailbreak rate by bypassing the safety protocols on three different AI robotic systems in a few days.
Under normal circumstances, the researchers say, robots controlled by large language models (LLMs) refuse to comply with prompts requesting harmful actions, such as knocking shelves onto people.
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world?
Our new paper finds that jailbreaking AI-controlled robots isn’t just possible.
It’s alarmingly easy. 🧵 pic.twitter.com/GzG4OvAO2M
— Alex Robey (@AlexRobey23) October 17, 2024
“Our results reveal, for the first time, that the risks of jailbroken LLMs extend far beyond text generation, given the distinct possibility that jailbroken robots could cause physical damage in the real world,” the researchers wrote.
Under the influence of RoboPAIR, researchers say they were able to elicit harmful actions “with a 100% success rate” in the test robots with tasks ranging from bomb detonation to blocking emergency exits and causing deliberate collisions.
The researchers said they tested Clearpath Robotics’ Jackal, a wheeled vehicle; NVIDIA’s Dolphins LLM, a self-driving simulator; and Unitree’s Go2, a four-legged robot.
Using RoboPAIR, researchers were able to make the Dolphins self-driving LLM collide with a bus, a barrier and pedestrians, and ignore traffic lights and stop signs.
Researchers were able to get the Jackal to find the most harmful place to detonate a bomb, block an emergency exit, knock warehouse shelves onto a person and collide with people in the room.
Penn Engineering researchers claim to have found a way to manipulate AI-driven robots into performing harmful actions 100% of the time. Source: Penn Engineering
They were able to get Unitree’s Go2 to perform similar actions, blocking exits and delivering a bomb.
Researchers also found that all three were vulnerable to other forms of manipulation, such as asking the robot to perform an action it had already refused, but with fewer situational details.
For example, asking a robot carrying a bomb to walk forward and then sit down, rather than asking it to deliver a bomb, gave the same result.
Prior to the public release, the researchers said they shared the findings, including a draft of the paper, with leading AI companies and the manufacturers of the robots used in the study.
Alexander Robey, one of the authors, said addressing the vulnerabilities requires more than simple software patches, and called for a reevaluation of how AI is integrated into physical robots and systems based on the paper’s findings.
“What is important to underscore here is that systems become safer when you find their weaknesses. This is true for cybersecurity. This is also true for AI safety,” he said.
“In fact, AI red teaming, a safety practice that entails testing AI systems for potential threats and vulnerabilities, is essential for safeguarding generative AI systems, because once you identify the weaknesses, then you can test and even train these systems to avoid them,” Robey added.
This article first appeared at Cointelegraph.com News