As developers integrate AI-written code into projects, we have to grapple with potential impacts on code quality, security vulnerabilities, and long-term maintainability (technical debt).
Early studies and industry reports are a mixed bag: they show incredible productivity gains, but also worrying signs that, left unchecked, AI could propagate subpar coding patterns or vulnerabilities at scale.

On the quality front, there's evidence that AI assistance can lead to more code, and more code can mean more problems. A recent analysis by a tool vendor (GitClear) examined 153 million lines of code over 4 years and noted a significant uptick in "churn" and code duplication since the rise of LLM-based coding.
Code churn refers to code that gets written but then changed or reverted shortly after. High churn can indicate thrashing or lower-quality code that needed rework.
According to their data, starting around 2022 (when AI coding tools became popular), churn began increasing notably. In fact, by 2023, short-term churn was almost double what it was in the pre-Copilot era. The authors stop short of blaming AI outright, but the correlation is strong. They describe the AI-influenced code of 2023 as resembling "an itinerant contributor, prone to violate the DRY-ness of the repos", meaning it often introduced repetitive code and less refined structures.
Duplicate code blocks also spiked, which makes sense if AI is unaware it's regenerating code similar to something that exists elsewhere in the codebase. Essentially, AI might cause developers to write more, newer code rather than reusing existing functions, because it's so easy to generate something from scratch on demand. This can inadvertently increase technical debt (e.g., multiple implementations of the same logic).
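To make that concrete, here's a minimal, hypothetical sketch (the helper and function names are invented for illustration): the repository already has a small utility, but an assistant that only sees the current file re-implements the same logic inline, so the project ends up with two copies to maintain.

```python
# utils.py -- an existing helper somewhere in the codebase
def normalize_email(email: str) -> str:
    """Lowercase and trim an email address before storing it."""
    return email.strip().lower()


# signup.py -- AI-suggested code written without knowledge of utils.py
def register_user(raw_email: str) -> dict:
    # The assistant re-creates the normalization inline instead of
    # reusing normalize_email(), quietly duplicating the logic.
    email = raw_email.strip().lower()
    return {"email": email, "status": "pending"}
```

Neither version is wrong on its own; the debt shows up later, when a rule change (say, also stripping plus-addressing) has to be applied in two places instead of one.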
From a maintainability perspective, many team leads have voiced concern that AI may accelerate code delivery at the expense of clarity and simplicity. One CEO commented that "AI-enabled code development [will exponentially increase] the volume and velocity of code delivery, much of which we anticipate will be of lower quality and more bloated."
This sentiment is echoed by many: if a single developer can now produce, say, 30%+ more code per day with AI, that could be fantastic if the code is good. If not, it's just more lines for others to maintain or fix. The same CEO (of an application security company) warned that this surge of AI-generated code could overwhelm security teams, who are already struggling to keep up with manual code reviews and vulnerability scans.
It's like suddenly doubling the output of a factory without doubling quality control: you'll get more product, but also potentially more defects slipping through.
Security is a major part of this quality conversation. AI assistants, especially earlier versions, have been caught suggesting insecure code patterns.

In 2021, an academic study famously showed that GitHub Copilot would often emit code with common security flaws (like SQL injection vulnerabilities) when asked to generate certain functionality.
Recent studies show improvement, but not a complete fix. A 2023 replication study found that with newer versions, Copilot's suggestions were still insecure about 27% of the time, down from ~36% in earlier tests.
Another research group (Fu et al. 2023) analyzed outputs and found 32.8% of Copilot's generated Python code and 24.5% of its JavaScript code had security vulnerabilities.
These are significant numbers: roughly one out of three AI-generated snippets in some languages may introduce a vulnerability (newer model releases improve these figures, but don't eliminate the problem). The types of issues included things like using outdated cryptographic algorithms, not properly validating inputs, and misconfiguring security settings, often mirroring bad practices that existed in the training data.
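As an illustration of the "outdated algorithm" category, here's a hedged, standard-library-only sketch: the first function mirrors a pattern still common in old public code (a fast, unsalted MD5 hash for passwords), while the second shows a salted PBKDF2 alternative. It's a simplified example, not a complete password-storage recipe.

```python
import hashlib
import os

# Pattern an assistant may reproduce from older training data:
# MD5 is fast and unsalted, which makes stored passwords easy to crack.
def hash_password_weak(password: str) -> str:
    return hashlib.md5(password.encode()).hexdigest()

# A stronger stdlib option: a random salt plus many PBKDF2 iterations.
def hash_password_better(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest
```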
Another way AI can propagate risk is by surfacing older or less optimal solutions. If much of the training data comes from public code that isn't necessarily the best code, the AI might learn patterns that aren't ideal.
For instance, it might use a library in an insecure way just because that's what it saw most often. Without awareness, it can spread that pattern. This is why developers must be the filter: having an AI write code doesn't absolve us from applying our knowledge of secure practices. It's a bit like code review: you'd question a human colleague's code if they, say, concatenated strings to build SQL queries (risking injection). You need to question the AI's code in the same way.
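Sticking with that SQL example, here's what the difference looks like in Python with the built-in sqlite3 module (the users table is hypothetical): the first function is the concatenation pattern worth pushing back on, whoever wrote it; the second passes the value as a parameter so the driver treats it as data, not SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")

def find_user_unsafe(name: str):
    # String concatenation: a crafted input like "x' OR '1'='1"
    # changes the meaning of the query (SQL injection).
    query = "SELECT * FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name: str):
    # Parameterized query: the value is bound, never parsed as SQL.
    return conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
```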
Readability and maintainability of AI-generated code have also been scrutinized. In many cases, AI code is syntactically correct but not stylistically consistent. It might not adhere to your project's conventions unless you prompt it to. Developers have noticed things like inconsistent naming schemes or formatting in AI contributions. This can be cleaned up by linters and formatters, of course.
A bigger issue is logical clarity: sometimes AI will take a roundabout approach to solve a problem. For example, it might generate a solution using a complex nested loop because it pieced it together from examples, whereas a human might immediately use a clearer library function or a more idiomatic approach.
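A contrived but representative example (the function names are mine): both versions find the values two lists share, but the hand-rolled nested loop is quadratic and harder to skim than the idiomatic set-based approach a reviewer would expect.

```python
# Roundabout version, stitched together from generic loop examples:
def common_items_verbose(a: list[str], b: list[str]) -> list[str]:
    result = []
    for x in a:
        for y in b:
            if x == y and x not in result:
                result.append(x)
    return result

# Clearer, idiomatic version (note: returns the items in sorted order):
def common_items(a: list[str], b: list[str]) -> list[str]:
    return sorted(set(a) & set(b))
```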
Over time, if such AI-born code accumulates in a codebase, you could end up with a patchwork of different styles and a lot of "unnecessarily complicated" sections that future maintainers struggle to understand.
One IT leader pointed out that AI-generated code needs the same refactoring and cleanup as any other code; it's not drop-in perfection. If teams treat it as a rough draft that engineers then refine, quality can be kept high. But if they just accept it verbatim due to time pressure, those rough edges become tomorrow's tech debt.
It's worth noting that AI can also be a force for improving code quality in some respects. For example, tools like Amazon Q Developer attempt to automatically scan for vulnerabilities or bad practices and either avoid them or warn the user. AI can even be used to refactor code: there are experiments with getting models to optimize or simplify a given function.
In one study by IBM (the company I currently work for, and a client I have served for 20+ years), developers found AI suggestions helped them catch mistakes or consider edge cases, effectively acting like a pair reviewer that improves code quality rather than degrading it.
Some assistive tools will flag when an AI suggestion closely matches known existing code (like GitHub's code referencing feature, which shows if the suggestion matches code in a public repo, along with its license). These kinds of features can alert a developer if, say, the AI is about to insert a chunk of code that might be outdated or problematic. Additionally, since AI can generate tests or documentation, it might indirectly boost quality by encouraging more complete test coverage and clearer code comments (assuming developers use it for that).
However, those benefits are only realized if developers actively use the AI for quality purposes, not just raw code generation. The overall industry insight is that AI's role in technical debt is double-edged. It can help write tests, enforce patterns, and do mass refactors (reducing debt), but it can also flood a project with sloppy code if not kept in check (increasing debt).
Seasoned engineers have reported experiences where AI-produced code in large-scale production had to be heavily reviewed and in some cases rewritten, due to performance or maintainability issues.
One developer on a forum remarked that after a few months of juniors using AI without oversight, they found sections of the codebase that "felt like they were written by an AI, technically working, but very hard to follow and rife with odd edge case bugs."
In terms of real-world production use, organizations are starting to develop guidelines around AI-generated code. Some companies have banned direct use of AI suggestions in critical code without manual review or approval. Others allow it but require an annotation in the code review (so the reviewer knows that code was AI-suggested and might need extra scrutiny).
There's also increasing interest in tools that scan AI-generated code for known vulnerability patterns, essentially an AI to check the AI's work, reminiscent of virus scanners. For instance, researchers have developed prototypes like "Critic" models that specifically critique code for bugs or security issues. These could become part of the pipeline: AI writes code, another AI (or static analysis tool) flags potential issues, and then a human makes the final call.
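As a toy sketch of that middle "flag potential issues" step, here's a small check built only on Python's ast module; it looks for execute()/executemany() calls whose query is assembled by concatenation or an f-string. A real pipeline would lean on mature linters and SAST tools, and the function name flag_risky_sql is purely illustrative.

```python
import ast

SQL_CALLS = {"execute", "executemany"}

def flag_risky_sql(source: str, filename: str = "<generated>") -> list[str]:
    """Flag execute()/executemany() calls whose first argument is built
    via concatenation or an f-string (a common injection-prone pattern)."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in SQL_CALLS
                and node.args
                and isinstance(node.args[0], (ast.BinOp, ast.JoinedStr))):
            findings.append(
                f"{filename}:{node.lineno}: dynamically built SQL passed "
                f"to {node.func.attr}()"
            )
    return findings

# Example: screen an AI-suggested snippet before it reaches human review.
snippet = "cur.execute(\"SELECT * FROM users WHERE name = '\" + name + \"'\")"
print(flag_risky_sql(snippet))  # -> ['<generated>:1: dynamically built SQL ...']
```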
Overall, the quality of AI-generated code is improving as models get more advanced (for example, GPT-4.5 is far better than GPT-3.5 at following instructions and avoiding obvious errors, and newer models like Claude 3.7 Sonnet and DeepSeek R1 continue the trend), but it's nowhere near perfect. It can accelerate the production of both good and bad code.
The responsibility lies with developers and teams to harness the good and mitigate the bad. That means keeping security best practices in mind (don't blindly trust that the AI did something safely), doing code reviews and tests as diligently as ever, and refactoring AI code to your standards.
When used well, AI assistants can help produce clean, secure code; they might even suggest adding input validation or error handling that a human might forget. But used naively, they could amplify bad habits or outdated techniques, creating a minefield in your codebase.
In summary, AI coding tools can and will write flawed code. The scale might be new, but the solution is familiar: good engineering process.
As one article concluded, the data shows a "confluence of indicators that show code quality decline at scale, starting around 2022", but it's correlational. It likely reflects many developers using AI uncritically.
Going forward, the hope is that with better AI and better developer vigilance, we'll see code quality and security increase. AI could help eliminate human mistakes (typos, forgotten null checks) and even learn to avoid known pitfalls. But until then, developers must act as the quality gatekeepers. The code coming out of an AI is only as good as the guidance and checks we put around it.
Next, we'll discuss how AI influences higher-level software design and architectural decisions, an area where human judgment is even more crucial.
- Adoption & Usage Trends of AI Coding Assistants (new article)
- Most Popular AI Coding Tools & Their Capabilities (new article)
- Cognitive Effects & Skill Retention (new article)
- Debugging Challenges & AI's Limitations (new article)
- Software Quality, Security & Maintainability (this article)
- Architectural Decision-Making & AI's Role (next article)
- The Future of Developer Expertise in an AI-Driven World (new article)
- Best Practices for Developers to Retain Core Skills (new article)
Thank you for taking the time to read my article. Your support means a lot to me. If you found this content valuable, I would greatly appreciate it if you could:
• Clap 👏 to show your support.
• Comment 💬 with your thoughts or questions.
• Follow ➕ to stay updated with my latest articles.
Your feedback and interaction help me create better content for you. Thank you!