Researchers Study the Safety of AI-Powered Robots

From Benben to RoboPAIR: Why “Physical Intelligence” Requires a Multi-Layered Safety Revolution to Prevent Real-World Disasters.

The FastForward News Team

11 May 2026

Benben is an adorable four-legged robot that sings, dances, chats with people, and takes photos. When a team of researchers asks it to carry a bomb so that it can be detonated, the robot politely refuses. The refusal does not last long, however. Within the next two commands, the researchers bypass its safeguards by convincing it that the request is part of a movie shoot. A few seconds later, Benben is carrying the bomb.

The incident above is not a science fiction scenario but a real experiment conducted by a research team at the University of Pennsylvania led by George Pappas, Professor in the Department of Electrical and Systems Engineering and Associate Dean for Research. With it, the researchers demonstrated how easily the safety mechanisms that manufacturers build into AI systems can be bypassed. This kind of bypass is known internationally as jailbreaking.

While chatbots may be vulnerable to attacks bypassing security restrictions, the researchers highlighted that when these AI systems direct robots, they can become truly dangerous.

"There is a tremendous trend, especially over the last year, toward physical intelligence—the attempt to have artificial intelligence interact with the physical world. The issue, however, is to see what the risk of this direction is. Because large language models may not be safe, when they interact with the physical world, they can have impacts that lead to loss of life or environmental disasters. Therefore, the security risk is high," George Pappas explains to APE-MPE.

Robotics and Artificial Intelligence: A Dangerous Relationship

The integration of artificial intelligence into robotics began in the early 2010s, giving robots "vision." However, the real revolution has been under way since 2022, with the use of generative AI. AI models now give instructions to robots, reason better, can act autonomously, and are moving one step closer to interacting with humans.

George Pappas and his team have thoroughly researched robot safety, emphasizing the risks brought by AI integration.

In 2023, they created the PAIR algorithm, the first jailbreaking attack on large language models using prompts, which revealed the vulnerability of these systems. Two years after its publication, the algorithm has been cited more than 1,400 times in scientific articles and is widely used by companies that produce language models. That research led to the creation of JailbreakBench, a repository of prompts for bypassing safety rules and a leaderboard that tracks attacks on large language models.

Seeing how easy it is to jailbreak large language models, the researchers went on to investigate the vulnerability of AI-integrated robots and developed the RoboPAIR algorithm. In experiments on three different robotic systems, including the four-legged robot Benben, the algorithm bypassed safety restrictions with a 100% success rate, in just a few commands. The research was published last year in the proceedings of the "IEEE International Conference on Robotics and Automation."

One result that alarmed the scientists was that the language models did not simply comply with malicious prompts but actively offered suggestions, even describing how common objects could be used to strike people.

"A question therefore arises as to how safe it is to put language models into robots so quickly and have them already be products. There are thousands of such robots out there," Mr. Pappas points out, reminding that AI-powered robots are already being used in armed conflicts.

The Need for Multiple Layers of Security

In a more recent article published a few days ago in the journal Science Robotics, researchers from the University of Pennsylvania, Carnegie Mellon University, and the University of Oxford, with George Pappas as lead author, emphasize that, as previous research has shown, AI robots can carry out dangerous behaviors. Even seemingly harmless commands can become dangerous if the robots do not take the context into account when making decisions.

According to their analysis, addressing the risks that arise from integrating AI into robots requires a safety net for secure operation, with safety filters both at the language level and at the level where commands are executed in the physical world.

As Mr. Pappas explains to APE-MPE, applying filters at the physical level is a challenge. "This is something new and very difficult. For example, the command for a robot to cross a crosswalk may be safe; however, for its execution to also be safe, the robot must interpret this sentence according to the environment and the operational context in which it finds itself. This process is called contextual safety and will be the future in the effort to make robots safer." He adds that "the safety of robots in the future will be like that in airplanes, which have many layers of safety. We will need such an architecture in the future for robots circulating in society to be much safer."
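The layered idea Mr. Pappas describes can be illustrated with a minimal sketch. It is not the research team's implementation: the names `Context`, `language_filter`, `contextual_filter`, and `is_safe`, the keyword list, and the crosswalk rule are all hypothetical, chosen only to show a language-level check followed by a context-level check.

```python
from dataclasses import dataclass

# Hypothetical blocklist for the language-level layer. A real filter would
# need far more than keyword matching; this is only an illustration.
BLOCKED_TERMS = ("bomb", "weapon", "detonate")


@dataclass(frozen=True)
class Context:
    """Hypothetical snapshot of what the robot perceives right now."""
    pedestrian_light_green: bool
    vehicle_approaching: bool


def language_filter(command: str) -> bool:
    """Layer 1: reject commands whose wording is plainly harmful."""
    text = command.lower()
    return not any(term in text for term in BLOCKED_TERMS)


def contextual_filter(command: str, ctx: Context) -> bool:
    """Layer 2: check a harmless-sounding command against the surroundings."""
    if "cross" in command.lower():
        # Crossing is allowed only when the light is green and the road is clear.
        return ctx.pedestrian_light_green and not ctx.vehicle_approaching
    return True


def is_safe(command: str, ctx: Context) -> bool:
    """A command is executed only if every layer accepts it."""
    return language_filter(command) and contextual_filter(command, ctx)
```

In this sketch the same sentence, "cross the crosswalk", is accepted when the light is green and the road is clear and rejected otherwise, which is the point of contextual safety: the wording alone does not settle whether execution is safe.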

To this end, the research team has created the RoboGuard filter, which has been found to reduce problems from jailbreaking attacks by 95%. Mr. Pappas clarifies that all the solutions the team has developed are open-source, so companies can use them to close security gaps. "Our philosophy is to help the research community, as well as companies, to make artificial intelligence and robots much safer."

Finally, Mr. Pappas underlines the importance of creating a regulatory framework focused on the interaction of AI with robots. He highlights that the European Union's AI Act is pioneering; however, "a deepening of regulatory proposals will be needed for applications concerning robots."

Source: CNA (ΚΥΠΕ)
