Researchers Expose Major Vulnerabilities in Large Language Models

Recent investigations have uncovered significant vulnerabilities in large language models (LLMs), raising concerns about their ability to safeguard sensitive information. Findings from several research labs suggest that the security measures built into AI systems remain inadequate: despite advanced training and strong benchmark performance, LLMs can be misled in situations where human intuition would typically catch the deception.
One of the most notable findings involves the manipulation of prompts using run-on sentences and poor grammar. Researchers discovered that LLMs could be tricked into divulging confidential information through convoluted instructions devoid of punctuation. A specific tactic involves prolonging sentences without ending them, which can confuse the AI’s safety protocols. David Shipley, an expert from Beauceron Security, remarked, “The truth about many of the largest language models out there is that prompt security is a poorly designed fence with so many holes to patch that it’s a never-ending game of whack-a-mole.” He emphasized that this compromised security could expose individuals to harmful content.
Understanding the Logit Gap in AI Security
Typically, LLMs are trained to refuse harmful requests through their logits, the raw scores a model assigns to each candidate next token before they are converted into probabilities. During alignment training, models are exposed to refusal tokens, which raises the logits associated with refusing harmful prompts. However, researchers at Palo Alto Networks’ Unit 42 identified a “refusal-affirmation logit gap”: alignment training lowers the probability of an affirmative, harmful response, but it does not reduce that probability to zero.
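To make the idea concrete, the minimal sketch below (with invented token names and logit values, not Unit 42’s actual measurements) shows how alignment can push a refusal token’s logit above an affirmation token’s without driving the affirmative probability to zero:

```python
# Illustrative sketch of the "refusal-affirmation logit gap".
# The token labels and logit values are hypothetical, chosen only to show
# that alignment shrinks, but does not eliminate, the chance of compliance.
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    exps = [math.exp(x) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits for a harmful prompt.
# Index 0 = refusal token ("Sorry"), index 1 = affirmation token ("Sure"),
# index 2 = any other token.
before_alignment = [1.0, 2.5, 0.5]
after_alignment = [3.0, 1.5, 0.5]

for label, logits in [("before alignment", before_alignment),
                      ("after alignment", after_alignment)]:
    probs = softmax(logits)
    gap = logits[0] - logits[1]  # refusal logit minus affirmation logit
    print(f"{label}: refusal-affirmation gap = {gap:+.1f}, "
          f"P(affirmation) = {probs[1]:.3f}")

# Even after alignment the affirmative probability is reduced, not zero, so a
# prompt that shifts the balance by a few points can tip the model back
# toward compliance.
```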
Attackers can exploit this gap by crafting prompts that nudge the model’s next-token probabilities back toward an affirmative response. The Unit 42 team reported a success rate of 80% to 100% with this method across various mainstream models, including Google’s Gemini and OpenAI’s gpt-oss-20b, without the need for extensive prompt tuning. The researchers concluded that relying solely on an LLM’s internal alignment to prevent toxic content is insufficient, as determined adversaries can bypass these safeguards.
Exploiting Image Processing Vulnerabilities
In addition to prompt manipulation, researchers have highlighted vulnerabilities related to image processing. According to findings from Trail of Bits, workers frequently upload images to LLMs, unaware that this practice could inadvertently expose sensitive data. Experiments demonstrated that harmful instructions embedded in images became visible only when the images were scaled down, allowing researchers to extract data from systems like the Google Gemini command-line interface (CLI).
In one instance, a command embedded in a downsized image led the model to check a calendar and send event details via email. This incident reveals a significant flaw in security protocols, as the model interpreted the command as legitimate. The researchers noted that attacks must be tailored to the downscaling algorithm each system uses, indicating that this vulnerability could affect a wide range of applications.
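The dependence on the resampling algorithm is easy to see in isolation. The sketch below (a simplified illustration, not Trail of Bits’ actual exploit) compares how three common Pillow resampling filters handle the same high-frequency image, which is the property an attacker would tune a hidden payload against:

```python
# Different downscaling filters blend source pixels differently, so the same
# high-resolution image can produce visibly different low-resolution results.
# This is a generic demonstration with a synthetic image, not the payload
# construction used in the published attack.
from PIL import Image
import numpy as np

# A 512x512 one-pixel checkerboard: high-frequency detail that downscaling
# filters treat very differently.
pattern = np.indices((512, 512)).sum(axis=0) % 2 * 255
img = Image.fromarray(pattern.astype(np.uint8))

filters = {
    "NEAREST": Image.Resampling.NEAREST,
    "BILINEAR": Image.Resampling.BILINEAR,
    "BICUBIC": Image.Resampling.BICUBIC,
}

for name, resample in filters.items():
    small = np.asarray(img.resize((64, 64), resample=resample))
    # NEAREST picks individual source pixels, while BILINEAR and BICUBIC
    # average neighbourhoods toward gray, so the statistics diverge.
    print(f"{name:8s} mean={small.mean():6.1f} "
          f"unique values={len(np.unique(small))}")
```

Because an attacker hides instructions so that they only become legible after a specific filter’s averaging step, the payload has to be matched to the preprocessing pipeline of the target model.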
Shipley reiterated the implications of these vulnerabilities, stating, “What this exploit shows is that security for many AI systems remains a bolt-on afterthought.” This echoes the findings of another study by Tracebit, which identified a combination of prompt injection and improper validation as contributing factors to potential data breaches.
The Need for Comprehensive AI Security
These vulnerabilities highlight a broader issue within the AI community: a fundamental misunderstanding of how AI operates. According to Valence Howden from Info-Tech Research Group, establishing effective security controls is challenging when there is a lack of understanding about model functionality and prompt interactions. “It’s difficult to apply security controls effectively with AI; its complexity and dynamic nature make static security controls significantly less effective,” Howden explained.
As security measures continue to evolve, it is also important to recognize that roughly 90% of models are trained primarily in English, so contextual cues can be lost when prompts arrive in other languages. Howden emphasized that existing security frameworks are not designed to treat natural language as a threat vector, indicating a need for a new approach to securing AI systems.
Shipley further underscored the urgency of addressing these vulnerabilities, describing the current AI landscape as having the “worst of all security worlds.” He pointed out that many systems were built with inherent insecurities and insufficient protective measures. As researchers continue to expose these flaws, it becomes increasingly clear that the industry must prioritize robust security frameworks to safeguard against potential threats.
As these security failure stories emerge, they serve as reminders of the risks associated with advanced AI technologies. Shipley aptly compared the situation to “kids playing with a loaded gun,” warning that the consequences of these vulnerabilities could lead to real harm if not addressed promptly and effectively.