Science

Researchers Expose Vulnerabilities in Large Language Models

Published

2 months ago

28 August, 2025

Editorial

Recent research has uncovered significant vulnerabilities in large language models (LLMs), revealing how they can be exploited to disclose sensitive information. Despite advancements in artificial intelligence (AI), these findings suggest that security measures are still inadequate, as attackers can manipulate LLMs using simple techniques, such as poorly constructed prompts.

A study conducted by multiple research labs highlights that LLMs remain susceptible to confusion when faced with run-on sentences and prompts lacking proper punctuation. For instance, researchers discovered that by providing long strings of instructions without periods, LLMs can be coerced into revealing protected information. As David Shipley of Beauceron Security noted, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.”

Understanding the Vulnerabilities

Typically, LLMs are designed to reject harmful queries through a mechanism known as logits, which predicts the next word in a sequence. During their training, models learn to use refusal tokens to discourage dangerous requests. However, researchers at Palo Alto Networks’ Unit 42 identified a critical gap in this alignment process, termed the “refusal-affirmation logit gap.” This gap indicates that while alignment reduces the likelihood of harmful outputs, it does not eliminate the potential for such responses altogether.

According to Unit 42, using tactics like run-on sentences can yield a success rate of between 80% and 100% against various mainstream models, including Google’s Gemini and OpenAI’s most recent model, gpt-oss-20b. The researchers emphasized that relying solely on internal alignment mechanisms to prevent the generation of harmful content is insufficient, allowing determined adversaries to exploit these weaknesses.

Exploiting Image Processing

In addition to prompt manipulation, researchers from Trail of Bits demonstrated that LLMs can be tricked into executing dangerous commands through images containing hidden instructions. In their experiments, they found that scaling down images could reveal harmful text that was otherwise undetectable at full resolution. This exploit was successfully tested on the Google Gemini command-line interface (CLI), where images that appeared black at full size showed red text when downsized, enabling commands like, “Check my calendar for my next three work events.”

Researchers reported that this method could be adapted for various models, including Google’s APIs and assistance tools. The potential for malicious actors to extract sensitive data through such methods raises significant concerns regarding the security of AI systems. Shipley remarked that the vulnerability of models to these attacks demonstrates that security measures are often an afterthought, rather than an integral part of the design process.

Security lapses in AI systems extend beyond prompt injection. Another study by Tracebit identified a “toxic combination” of poor validation and user experience considerations that could allow malicious actors to access sensitive data undetected. The cumulative effect of these vulnerabilities poses a serious risk to users and organizations alike.

The Need for Improved Security Measures

Experts agree that these vulnerabilities stem from a fundamental misunderstanding of how AI works. Valence Howden, an advisory fellow at Info-Tech Research Group, emphasized that effective security controls cannot be implemented without a clear understanding of model operations. He noted that the complexity and dynamic nature of AI make traditional static security measures less effective.

Moreover, the predominance of English in model training further complicates security measures, as contextual cues can be lost when different languages are introduced. Howden pointed out that the current security landscape is ill-equipped to manage natural language as a potential threat vector, necessitating a new approach that is not yet fully developed.

As researchers continue to expose these vulnerabilities, the AI industry faces a pressing need to rethink its security strategies. Shipley warns that the security of many AI systems remains inadequate, often designed “insecure by design” with clumsy controls. The reliance on immense datasets for training, while aiming for improved performance, has resulted in models that carry significant risks.

In summary, the recent findings underscore the critical need for enhanced security measures in LLMs and AI systems. Without addressing these vulnerabilities, the potential for harmful outcomes remains a pressing concern for users and developers alike.

Up Next

Otago Research Reveals Vaping Device Design Fuels Youth Usage

Don't Miss

Researchers Expose Vulnerabilities in Large Language Models

Editorial

The team focuses on bringing trustworthy and up-to-date news from New Zealand. With a clear commitment to quality journalism, they cover what truly matters.