Researchers Uncover Vulnerabilities in Large Language Models

Recent research has exposed significant vulnerabilities in large language models (LLMs), revealing that they can be manipulated into disclosing sensitive information. Despite advances in training and performance, these models continue to exhibit weaknesses with serious security implications.
A series of studies conducted by various research labs indicates that LLMs can be easily deceived through run-on sentences, improper grammar, and other unconventional prompts. For instance, researchers found that lengthy, unpunctuated instructions could confuse LLMs, bypassing security measures designed to prevent harmful outputs. David Shipley, a security expert at Beauceron Security, emphasized, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.”
Refusal-Affirmation Logit Gap
Typically, LLMs are trained to reject harmful queries via their logits, the raw scores a model assigns to each candidate next token. However, a gap identified by researchers at Palo Alto Networks’ Unit 42, termed the “refusal-affirmation logit gap,” shows that these models do not eliminate harmful responses outright. Instead, they are trained to make such responses less likely, leaving a window open for malicious actors to exploit.
Unit 42’s researchers shared a practical approach to exploiting this gap, stating, “Never let the sentence end—finish the jailbreak before a full stop, and the safety model has far less opportunity to re-assert itself.” Their findings revealed an alarming success rate of 80% to 100% using this technique with various mainstream models, including Google’s Gemma and OpenAI’s latest open-source model, gpt-oss-20b.
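The underlying idea can be illustrated with a minimal, hypothetical sketch (not code from the Unit 42 research): a model's next-token probabilities come from a softmax over logits, so safety tuning that merely raises the refusal logit above the affirming one still leaves the affirming token with a residual, non-zero probability. The specific logit values below are invented for illustration only.

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits at a risky decoding step:
# index 0 = a refusal token (e.g. "Sorry"), index 1 = an affirming token (e.g. "Sure").
# Safety training boosts the refusal logit but does not zero out the other.
refusal_logit, affirm_logit = 3.0, 1.5
probs = softmax([refusal_logit, affirm_logit])

# The affirming token retains a residual probability -- the "gap" is finite,
# so a prompt that nudges the logits can tip the balance.
print(f"P(refuse) = {probs[0]:.3f}, P(affirm) = {probs[1]:.3f}")
```

Because the gap is a difference in scores rather than a hard block, any prompt that shifts those scores, such as one that never gives the safety behavior a sentence boundary to re-assert itself at, can push the affirming token over the threshold.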
Exploiting Image Vulnerabilities
The vulnerabilities are not limited to textual prompts. Research by Trail of Bits highlighted how images can also be used to extract sensitive information. In experiments, researchers delivered images containing covert instructions that became visible only when the images were scaled down. This method allowed attackers to have models execute commands, such as instructing Google Gemini to check a calendar and send emails containing sensitive information.
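The mechanism can be shown with a small, hypothetical sketch (not the Trail of Bits tooling): if an attacker knows the downscaling algorithm and factor an AI pipeline uses, they can plant payload pixels exactly where the downscaler will sample. The toy image and payload values below are invented for illustration; real attacks tune the pixel values so the payload is near-invisible at full resolution.

```python
import numpy as np

# Build a 12x12 grayscale "image" that looks uniform at full size...
img = np.full((12, 12), 200, dtype=np.uint8)

# ...but plant payload pixels exactly where a 4x nearest-neighbor
# downscale will sample (here: every 4th pixel, starting at offset 0).
payload = np.array([[10, 20, 30],
                    [40, 50, 60],
                    [70, 80, 90]], dtype=np.uint8)
img[::4, ::4] = payload

# Naive nearest-neighbor downscale to 3x3: keep every 4th pixel.
small = img[::4, ::4]
print(small)  # the hidden payload emerges only after scaling
```

The same principle extends to bilinear or bicubic resampling: knowing the interpolation weights lets an attacker solve for full-resolution pixel values that blend into the image yet produce chosen text after downscaling, which is what makes the instructions invisible to a human reviewing the original.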
The method was found to be effective against various applications, including Google Assistant and Genspark. Shipley pointed out that the ability to hide malicious code in images has long been recognized, yet it remains a significant issue. He stated, “What this exploit shows is that security for many AI systems remains a bolt-on afterthought.”
Understanding AI Security Challenges
The findings also underscore a broader challenge in AI security. Valence Howden, an advisory fellow at Info-Tech Research Group, noted that the complexity of AI makes it difficult to establish effective security controls. He explained that with around 90% of models trained in English, the introduction of other languages often results in lost contextual cues, complicating security further.
Shipley further criticized the current state of AI security, asserting that many systems were “insecure by design” and lacked robust protective measures. He likened LLMs to “a big urban garbage mountain that gets turned into a ski hill,” suggesting that while superficial improvements may be made, the underlying issues persist.
As these vulnerabilities become more apparent, the implications for businesses and individuals who rely on LLMs for various applications are significant. The ongoing research emphasizes the need for improved security measures that go beyond current frameworks to safeguard sensitive information against determined adversaries.
These revelations about LLM vulnerabilities serve as a reminder of the critical need for continuous evaluation and enhancement of AI security protocols, ensuring that technology can be trusted to handle sensitive information responsibly.