Study finds AI-generated code has 2.7x more security flaws
CodeRabbit's analysis of 470 real-world pull requests found that AI-generated code introduces 2.74 times more security vulnerabilities and 1.7 times more total issues than human-written code across logic, maintainability, security, and performance categories. The study provides hard data on vibe coding risks after multiple 2025 postmortems traced production failures to AI-authored changes.
The Study
On December 17, 2025, CodeRabbit - an AI-powered code review platform based in San Francisco - released its "State of AI vs Human Code Generation" report. The study analyzed 470 real-world pull requests from open-source projects on GitHub, comparing code that was co-authored with AI coding assistants against code written entirely by humans.
A pull request (PR) is the standard mechanism for proposing code changes in modern software development. A developer writes code, submits it as a pull request, and other team members review it before it gets merged into the main codebase. It's one of the few quality gates between new code and production software. CodeRabbit ran its automated review system across both sets of PRs and normalized the results to issues per 100 pull requests for consistent comparison.
The headline number: AI-generated code produced approximately 1.7 times more issues than human-written code. On average, AI pull requests contained 10.83 findings compared to 6.45 in human submissions. At the 90th percentile - the worst 10% of each group - AI pull requests hit 26 issues per change, more than double the human baseline.
The Breakdown by Category
The defect rate increase wasn't concentrated in one area. AI-generated code performed worse across every major category CodeRabbit tracked.
Logic and correctness errors were 1.75 times more common in AI-assisted code. These are the kinds of bugs that make software do the wrong thing: incorrect conditional logic, off-by-one errors, wrong variable references, missed edge cases. They're the bugs that pass a superficial code review because the code looks syntactically correct and well-structured, but produces wrong results under specific conditions.
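The off-by-one pattern can be made concrete with a hypothetical example. Both functions below are syntactically valid, pass a linter, and look reasonable in a quick review - but one silently miscounts (the function names and scenario are invented for illustration):

```python
def page_count_buggy(items, page_size):
    # Bug: integer division drops the final partial page, so a list of
    # 10 items with page_size 4 reports 2 pages instead of 3.
    return len(items) // page_size

def page_count_fixed(items, page_size):
    # Ceiling division counts the trailing partial page correctly.
    return -(-len(items) // page_size)

items = list(range(10))
print(page_count_buggy(items, 4))  # 2 (wrong: the last 2 items vanish)
print(page_count_fixed(items, 4))  # 3
```

The buggy version produces the right answer whenever the list length happens to be a multiple of the page size - exactly the kind of conditional correctness that survives a superficial review and fails in production.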
Code quality and maintainability issues increased by 1.64 times. AI-generated code tended to be harder to maintain long-term - more complex than necessary, less consistent with surrounding code patterns, and structured in ways that make future changes more difficult. Technical debt generated by AI tools accumulates the same way as debt from human shortcuts, except it accumulates faster because AI generates code faster.
Readability issues occurred more than three times as often in AI pull requests. Readability is not cosmetic. Code is read far more often than it is written, and unclear code leads to misunderstandings that lead to future bugs. AI-generated code that works correctly today but is hard for humans to understand becomes a liability when someone needs to modify it six months later.
Performance issues appeared less frequently overall, and the aggregate performance numbers were closer to the human baseline than in any other category. But the distribution was uneven: AI-generated code showed sharp increases in certain specific performance anti-patterns, and when AI code did have performance problems, those problems tended to be more severe.
The Security Numbers
The security findings are where the data gets most concerning, because security flaws in AI-generated code don't just cause bugs. They create attack surfaces.
AI-generated code was:
- 2.74x more likely to introduce cross-site scripting (XSS) vulnerabilities
- 1.91x more likely to create insecure direct object references
- 1.88x more likely to introduce improper password handling
- 1.82x more likely to implement insecure deserialization
Cross-site scripting - the worst category at nearly three times the human rate - is a class of vulnerability where an attacker can inject malicious scripts into web pages viewed by other users. It's one of the most common web security vulnerabilities and one that experienced developers are trained to prevent through input sanitization and output encoding. AI coding tools apparently absorb that training inconsistently, if at all.
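The output-encoding defense can be sketched in a few lines of Python using the standard library's html module (the function names here are invented; real applications would use their template engine's auto-escaping instead):

```python
import html

def render_comment_vulnerable(user_input):
    # Interpolating untrusted input directly into HTML: classic XSS.
    return f"<p>{user_input}</p>"

def render_comment_escaped(user_input):
    # Output encoding neutralizes injected markup before it reaches the browser.
    return f"<p>{html.escape(user_input)}</p>"

payload = "<script>alert(1)</script>"
print(render_comment_vulnerable(payload))  # <p><script>alert(1)</script></p>
print(render_comment_escaped(payload))     # <p>&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

The two versions are nearly identical on the page - which is part of why this class of flaw slips through review when the generated code "looks right."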
Insecure direct object references allow attackers to access data they shouldn't by manipulating parameters - changing a URL from ?user_id=123 to ?user_id=124 to view another person's account. Proper access control prevents this, but AI-generated code was nearly twice as likely to skip that check.
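The missing check is usually a single conditional. A minimal Python sketch of the pattern - all names and data below are invented for illustration:

```python
# Toy in-memory datastore standing in for a real database.
ACCOUNTS = {
    123: {"owner": "alice", "balance": 500},
    124: {"owner": "bob", "balance": 900},
}

def get_account_vulnerable(user_id):
    # Trusts the caller-supplied ID: changing ?user_id=123 to ?user_id=124
    # hands any requester Bob's account.
    return ACCOUNTS[user_id]

def get_account_checked(session_user, user_id):
    # Authorization check: the record must belong to the authenticated user.
    account = ACCOUNTS.get(user_id)
    if account is None or account["owner"] != session_user:
        raise PermissionError("not authorized to view this account")
    return account
```

Both versions "work" for the happy path of a user requesting their own account, so functional tests alone won't catch the difference.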
The overall picture: when normalized to 100 pull requests, critical security issues rose from 240 in human-authored code to 341 in AI co-authored code - a roughly 40% increase in the most serious category of bugs.
Why AI Code Has These Problems
AI coding assistants generate code by predicting the most likely next tokens based on patterns learned from training data, which consists overwhelmingly of public code repositories. There are several structural reasons why this approach produces the patterns CodeRabbit documented.
Public code repositories contain enormous quantities of insecure code. Deprecated practices, unpatched vulnerabilities, and copy-pasted snippets from Stack Overflow answers circa 2015 all live in the training data alongside current best practices. The AI model doesn't distinguish between "this is how it was done before we knew better" and "this is how it should be done now." It reproduces whatever patterns were most common in the training data.
AI assistants also lack architectural awareness. A human developer writing an authentication function knows that this function is part of a system where users can be admins or regular users, where sessions expire after 30 minutes, and where the last security audit flagged insufficient input validation. The AI sees the function signature, the surrounding code, and the prompt. It doesn't know the threat model. It doesn't know the security requirements. It generates code that looks correct in isolation but may not be correct in the context of the larger system.
The third factor is that AI code tends to optimize for plausibility rather than correctness. A well-structured, syntactically valid code block that passes linting and looks professional can still contain subtle logic errors. AI tools are extremely good at producing code that passes a cursory review. They are less good at producing code that handles edge cases, validates inputs at trust boundaries, and fails safely when unexpected data arrives.
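The distinction between plausible and correct shows up clearly at trust boundaries. A hypothetical sketch (function names invented) of the same parsing task written the plausible way and the defensive way:

```python
def parse_quantity_plausible(raw):
    # Looks clean and professional, but crashes on "abc" and
    # silently accepts -5 or 0 as a valid order quantity.
    return int(raw)

def parse_quantity_defensive(raw):
    # Validate at the trust boundary and fail safely with a clear error
    # instead of letting bad data propagate into the system.
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError(f"quantity must be an integer, got {raw!r}")
    if value <= 0:
        raise ValueError("quantity must be a positive integer")
    return value
```

The first version is what "most likely next tokens" looks like; the second encodes assumptions about what the data is allowed to be.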
The Productivity Paradox
By December 2025, over 90% of developers reported using AI coding tools for at least some of their work. Microsoft said that 30% of the code in some of its repositories was written by AI. GitHub Copilot, Cursor, and other AI coding assistants had become standard parts of the development workflow at companies of every size.
The pitch for these tools was always productivity. Write code faster. Produce more features in less time. Handle the routine parts of development so humans can focus on architecture and design. The CodeRabbit report complicates that pitch.
If AI code is 1.7 times more likely to have bugs, and those bugs require human time to find and fix, the productivity gain depends entirely on whether the time saved writing the code exceeds the time spent reviewing and debugging it. A separate study by the Model Evaluation & Threat Research group (METR), published in July 2025, found that "AI tooling slowed developers down" in an experiment measuring actual task completion time. The speed of code generation and the speed of delivering working software turned out to be different things.
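That trade-off can be written as a back-of-envelope model. The per-PR finding counts (10.83 vs 6.45) come from the CodeRabbit study; the time figures below are invented assumptions for illustration, not numbers from either report:

```python
def net_minutes_saved(write_minutes_saved, extra_issues, minutes_per_issue):
    # Time gained drafting the PR with AI, minus the time spent
    # finding and fixing the additional defects it carries.
    return write_minutes_saved - extra_issues * minutes_per_issue

# Hypothetical: AI saves 60 minutes of writing, but the average AI PR
# carries ~4.38 extra findings (10.83 - 6.45) at, say, 20 minutes each.
print(net_minutes_saved(60, 10.83 - 6.45, 20))  # -27.6: a net loss under these assumptions
```

Under these made-up costs the break-even point sits near 88 minutes of writing time saved per PR - the model's only purpose is to show that the sign of the productivity gain depends on review and debugging costs that the generation-speed pitch leaves out.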
What the Report Doesn't Say
CodeRabbit was careful to note that AI coding tools still generate functional code that passes many standard quality checks. The 1.7x issue rate means AI code has more problems, not that it's unusable. Many AI-assisted pull requests had zero or minimal issues.
The report also doesn't measure whether developers who use AI tools catch the extra problems during code review. If a team reviews AI-generated code more carefully because they know it needs more scrutiny, the defect rate in production code might be lower than the raw PR analysis suggests. The question is whether teams are actually doing that, or whether the same speed pressure that drives AI adoption also compresses the review time needed to catch AI-specific errors.
The study analyzed open-source pull requests, which represent a specific slice of software development. Corporate codebases with proprietary frameworks, internal libraries, and custom tooling may show different patterns. Open-source projects also tend to have more varied contributor experience levels, which could affect both the AI and human baselines.
The Timing
CodeRabbit released this report after a year in which multiple high-profile production failures were traced back to AI-authored code changes. The report's opening noted that "despite several high-profile 2025 postmortems identifying AI-authored or AI-assisted changes as contributing factors, before this report, there was little hard data on which issues AI introduces most often or how those patterns differ from human-written code."
That timing matters. As of December 2025, the industry was not debating whether to adopt AI coding tools. That decision was already made. The debate had moved to how to use them safely - what review processes to add, what categories of code to restrict from AI generation, and how to train developers to recognize the specific patterns of AI-generated bugs. The CodeRabbit report provided the quantitative foundation for those decisions: 1.7x more total issues, 2.74x more XSS vulnerabilities, 3x more readability problems. Hard numbers that turn "AI code might be risky" into "AI code is measurably riskier, in these specific ways, by these specific amounts."