Science
Researchers Uncover Security Flaws in Large Language Models

Recent research has highlighted serious vulnerabilities in large language models (LLMs), revealing that they can be easily manipulated into disclosing sensitive information. Despite claims of advanced training and near-artificial general intelligence (AGI), these models often lack the human-like common sense needed to navigate complex situations. Researchers from multiple institutions have documented methods that exploit these weaknesses, indicating that security measures for AI systems are still inadequately developed.
A key finding suggests that LLMs can be tricked into revealing confidential information through poorly constructed prompts, such as run-on sentences without punctuation. David Shipley, a representative from Beauceron Security, stated, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.” He emphasized that inadequate security measures leave users vulnerable to harmful content.
Technical Vulnerabilities in Language Models
In their research, experts at Palo Alto Networks’ Unit 42 identified a “refusal-affirmation logit gap” that compromises LLMs’ ability to reject harmful queries. Typically, these models are fine-tuned to refuse dangerous requests, but the researchers found that this alignment does not fully eliminate the potential for harmful responses. Instead, attackers can exploit this gap, using strategies like bad grammar and run-on sentences to bypass internal safety protocols. This method demonstrated an impressive success rate of between 80% and 100% across various mainstream models, including Google’s Gemini and OpenAI’s latest open-source model, gpt-oss-20b.
The researchers explained, “Never let the sentence end — finish the jailbreak before a full stop and the safety model has far less opportunity to re-assert itself.” This tactic effectively exposes the limitations of relying solely on internal alignment to filter out toxic content.
Exploiting Image Processing Systems
Another significant vulnerability arises when users upload images to LLMs, often without realizing that this process could inadvertently leak sensitive data. Researchers from Trail of Bits conducted experiments showing that harmful instructions embedded in images can remain undetected at full resolution but become visible when scaled down. For instance, in a test involving the Google Gemini command-line interface (CLI), a hidden command was uncovered that instructed the model to check a calendar and send event details via email.
The method’s effectiveness against various applications, including Google Assistant and Genspark, raises concerns about the potential for widespread exploitation. Shipley pointed out that many of these security flaws have long been acknowledged yet remain unaddressed, suggesting that AI systems often treat security as an afterthought.
Adding to the concerns, a study by Tracebit revealed that malicious actors could gain unauthorized access to sensitive data through a combination of prompt injection and inadequate validation processes. The researchers noted that such vulnerabilities create significant risks that are difficult to detect.
Valence Howden, an advisory fellow at Info-Tech Research Group, highlighted that a lack of understanding regarding AI operations hampers the establishment of effective security measures. “It’s difficult to apply security controls effectively with AI; its complexity and dynamic nature make static security controls significantly less effective,” he said.
The research emphasizes that many AI models are predominantly trained in English, which can lead to contextual misunderstandings when different languages are used. This limitation further complicates security efforts, as existing systems are not designed to manage natural language as a potential threat vector.
Shipley concluded that the current state of AI security is precarious. He noted, “There’s so much bad stuffed into these models in the mad pursuit of ever-larger corpuses in exchange for hoped-for-performance increases that the only sane thing, cleaning up the dataset, is also the most impossible.”
The ongoing discoveries of vulnerabilities signal a pressing need for improved security measures in AI development. As researchers continue to unveil these flaws, the potential for real harm from compromised language models becomes increasingly apparent.
-
Sports1 week ago
Gaël Monfils Set to Defend ASB Classic Title in January 2026
-
World4 weeks ago
Police Arrest Multiple Individuals During Funeral for Zain Taikato-Fox
-
Top Stories3 weeks ago
Former Superman Star Dean Cain Joins U.S. Immigration Agency
-
Sports4 weeks ago
Richie Mo’unga’s All Blacks Return Faces Eligibility Hurdles
-
Health4 weeks ago
Navigating the Complexities of ‘Friends with Benefits’ Relationships
-
World4 weeks ago
Fatal ATV Crash Claims Life on Foxton Beach
-
Business3 weeks ago
Grant Taylor Settles Before Zuru Nappy Trial, Shifting Dynamics
-
Sports1 week ago
Warriors Sign Haizyn Mellars on Three-Year Deal Ahead of 2028 Season
-
Entertainment3 weeks ago
Ben MacDonald Exits MasterChef Australia in Fifth Place
-
Entertainment3 weeks ago
New Zealand’s Ben MacDonald Reflects on MasterChef Australia Journey
-
Business1 week ago
Software Glitch Disrupts Air Traffic Control in New Zealand
-
Health4 weeks ago
Qatar Basketball Team Reveals Roster for FIBA Asia Cup 2025