25 Mar 2026
How to spot AI-generated code in a GitHub repo
A practical checklist for reviewing repositories that may include AI-assisted code.
When you are reviewing a repository, the hard part is not deciding whether AI tools were used. The hard part is deciding whether the resulting code is trustworthy.
In practice, spotting AI-generated code in a repo means evaluating patterns that affect maintainability, security, and test confidence.
The most useful question is simple:
Does this repo look like it evolved through deliberate engineering decisions, or was it assembled in large generated batches with light validation?
No single signal is conclusive, but patterns are.
How Repo Watch surfaces these AI-risk signals
Repo Watch runs static analysis only and surfaces AI-risk indicators as directional heuristics, alongside test confidence, code quality, and security hygiene sections.
- Structured AI-risk indicators are shown in a dedicated section with clear score explainers.
- Top findings are prioritized so reviewers can inspect the highest-impact signals first.
- Each finding includes category, severity, and plain-language guidance for next steps.
- Scan completeness caveats are surfaced so teams understand confidence limits in the result.
This makes AI code detection for GitHub and GitLab repositories more actionable: you get a triage view, not a black-box verdict.
AI-Risk Indicators (section score: 50)
Score explainer: AI-Risk Indicators are directional, structure-based heuristics that look for signals often associated with AI-assisted code generation patterns.
Computed total: 50 (matches final section score).
This mirrors the actual AI-Risk Indicators section in Repo Watch scan results. Each contribution is shown explicitly so reviewers can trace exactly how the section score was produced, not just consume a number.
1. Style that is too uniform
Healthy teams usually show mild variation over time. AI-heavy output can look unnaturally consistent across every file.
Check for:
- identical comment tone in unrelated modules
- repetitive naming patterns everywhere
- little variation in implementation style across long timelines
// utils/formatDate.ts
/**
 * Formats a date string into a human-readable format.
 * @param date - The date string to format.
 * @returns A formatted date string.
 */
export function formatDate(date: string): string {
  return new Date(date).toLocaleDateString();
}

// utils/parseAmount.ts (different module, same author?)
/**
 * Parses an amount value into a numeric representation.
 * @param amount - The amount string to parse.
 * @returns A parsed numeric amount.
 */
export function parseAmount(amount: string): number {
  return parseFloat(amount);
}

Repo Watch does not currently scan for this. Comment tone and naming uniformity require AST-level style analysis that is outside the current static scan scope. The AI-Risk section uses structural fragmentation (many files per source file) as a coarser proxy for generated batches, but style uniformity itself is a manual review signal for now.
Consistency is good. Artificially perfect consistency can be a review clue.
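If you want to make "identical comment tone" concrete during a manual pass, a quick token-overlap check between doc comments from unrelated modules can help. This is a rough sketch, not a Repo Watch feature; `commentSimilarity` is a name invented for this example:

```typescript
// Jaccard similarity between the word sets of two doc comments.
// High similarity across unrelated modules is a (weak) uniformity clue.
function commentSimilarity(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().match(/[a-z]+/g) ?? []);
  const ta = tokens(a);
  const tb = tokens(b);
  const intersection = Array.from(ta).filter((t) => tb.has(t)).length;
  const union = new Set(Array.from(ta).concat(Array.from(tb))).size;
  return union === 0 ? 0 : intersection / union;
}

// The two JSDoc summaries from the example above:
const docA = "Formats a date string into a human-readable format.";
const docB = "Parses an amount value into a numeric representation.";
const docC = "Formats a date value into a human-readable format.";

console.log(commentSimilarity(docA, docB)); // low: different vocabulary
console.log(commentSimilarity(docA, docC)); // high: suspiciously uniform phrasing
```

Averaging this across comment pairs drawn from unrelated modules gives a rough "tone uniformity" score; treat anything it flags as a prompt to read the code, not as evidence on its own.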
2. Generic abstractions with no real demand
Generated code often adds extra layers "just in case": utility wrappers, broad interfaces, and extension points that never get used.
Check for:
- exported helpers that are never called
- interfaces with options no code path exercises
- abstractions that increase complexity without reducing duplication
// A wrapper around HttpClient that adds nothing
export class HttpClientWrapper {
  private readonly client: HttpClient;

  constructor(options?: HttpClientOptions) {
    this.client = new HttpClient(options);
  }

  async get<T>(url: string, config?: RequestConfig): Promise<ApiResponse<T>> {
    return this.client.get<T>(url, config);
  }

  async post<T>(url: string, data: unknown, config?: RequestConfig): Promise<ApiResponse<T>> {
    return this.client.post<T>(url, data, config);
  }
}

// Used exactly once, in a single page component, calling .get() with no config

Repo Watch does not currently scan for this. Two separate passes are planned. Structural proxy smells (classes whose every method just delegates to an inner object, all-optional interfaces) are addressable with targeted Semgrep rules and are on the roadmap. True dead-code and unused-export detection requires a cross-file symbol graph; we plan to add a knip pass to the worker pipeline for that, which resolves re-exports and barrel files without executing any code.
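For a manual first pass before a tool like knip is in place, even a crude in-memory scan for exports that no other file references can surface candidates. This toy sketch uses plain string matching, so it misses re-exports, barrel files, and name collisions; real tools resolve the symbol graph properly:

```typescript
// Coarse unused-export proxy: an export counts as "used" if any other
// file's source mentions its name. Deliberately naive.
type Files = Record<string, string>;

function unusedExports(files: Files): string[] {
  const exported = new Map<string, string>(); // name -> defining file
  for (const [path, src] of Object.entries(files)) {
    for (const m of src.matchAll(/export\s+(?:function|class|const)\s+(\w+)/g)) {
      exported.set(m[1], path);
    }
  }
  const unused: string[] = [];
  for (const [name, ownFile] of exported) {
    const referencedElsewhere = Object.entries(files).some(
      ([path, src]) => path !== ownFile && src.includes(name)
    );
    if (!referencedElsewhere) unused.push(name);
  }
  return unused;
}

const repo: Files = {
  "http-wrapper.ts": "export class HttpClientWrapper { /* delegates */ }",
  "page.ts": "import { HttpClientWrapper } from './http-wrapper';",
  "helpers.ts": "export function neverCalled() {}",
};

console.log(unusedExports(repo)); // ["neverCalled"]
```

The false positives and negatives are exactly why the roadmap calls for knip rather than regexes; the point here is only that "exported but never imported" is mechanically checkable.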
3. Coverage without confidence
A repo can show high coverage and still miss critical behavior.
Check for:
- minimal negative tests (throws, rejects, invalid input)
- no boundary tests (empty data, max limits, null variants)
- repeated happy-path unit tests with little failure exploration
// 3 tests, all green, strong coverage percentage
it("returns user by id", async () => {
  const user = await getUser("user-1");
  expect(user.id).toBe("user-1");
});

it("returns user name", async () => {
  const user = await getUser("user-1");
  expect(user.name).toBe("Alice");
});

it("returns user email", async () => {
  const user = await getUser("user-1");
  expect(user.email).toBe("alice@example.com");
});

// Missing: getUser("nonexistent"), getUser(null), deleted accounts, permission checks

Repo Watch partially scans for this. A test-to-source file ratio is measured as a structural confidence signal, and thin test footprints lower the Test Confidence section score. Repo Watch does not yet inspect test content or parse coverage artifacts to identify happy-path-only suites. Coverage artifact integration is planned work.
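What the missing failure-path coverage looks like in practice: a sketch with a hypothetical in-memory `getUser` (the store and error messages are invented for illustration). The shape of the assertions, not the store, is the point:

```typescript
// Hypothetical in-memory user lookup, standing in for the real one.
type User = { id: string; name: string; email: string };

const users = new Map<string, User>([
  ["user-1", { id: "user-1", name: "Alice", email: "alice@example.com" }],
]);

async function getUser(id: string | null): Promise<User> {
  if (id === null || id.trim() === "") throw new Error("invalid id"); // boundary: null / empty
  const user = users.get(id);
  if (!user) throw new Error("not found"); // negative path: unknown id
  return user;
}

// The failure-path cases the happy-path suite never exercised:
async function run() {
  await getUser("nonexistent").then(
    () => { throw new Error("should have rejected"); },
    (e: Error) => console.log("rejects unknown id:", e.message)
  );
  await getUser(null).then(
    () => { throw new Error("should have rejected"); },
    (e: Error) => console.log("rejects null id:", e.message)
  );
}
run();
```

A suite that asserts rejections, boundaries, and permission failures can have the same coverage percentage as the happy-path suite above while carrying far more confidence.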
Coverage is a metric. Test confidence is an outcome.
4. Commit shapes that feel synthetic
Organic repositories usually include small fixes, partial refactors, and uneven progress.
Check for:
- very large one-shot commits introducing many modules
- commit messages that are uniformly polished from day one
- no visible maintenance rhythm between feature drops
$ git log --oneline
a4f1c2e feat: implement complete user authentication system with OAuth
b3e2d1f feat: add full product catalog with search, filtering, and sorting
c2d3e4f feat: build checkout flow with payment integration and confirmation emails
d1e2f3a feat: create admin dashboard with analytics and user management
e0f1a2b feat: set up project structure with database schema and API layer

Repo Watch does not currently scan for this. Scans operate on the repository filesystem only and do not access git history. Commit shape analysis would require git log traversal, which is not in the current pipeline. This is a longer-term signal worth adding as the product matures.
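If you want to eyeball this yourself, the shape is easy to quantify once you have the file count per commit (for example from `git log --numstat`). A sketch of one possible heuristic; the thresholds (20 files, 50%, 20%) are illustrative assumptions, not how Repo Watch scores anything:

```typescript
// Flag a history where most commits land large batches of files
// and small maintenance commits are nearly absent.
function looksSynthetic(filesPerCommit: number[]): boolean {
  if (filesPerCommit.length === 0) return false;
  const large = filesPerCommit.filter((n) => n >= 20).length;
  const small = filesPerCommit.filter((n) => n <= 3).length;
  // Organic histories accumulate many small fixes; batch-generated ones often don't.
  return large / filesPerCommit.length > 0.5 && small / filesPerCommit.length < 0.2;
}

console.log(looksSynthetic([42, 38, 51, 29, 33]));           // true: all one-shot drops
console.log(looksSynthetic([42, 2, 1, 3, 5, 1, 2, 24, 1]));  // false: maintenance rhythm visible
```

Like every signal in this list, a true result is a prompt to read the commits, not a verdict.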
5. Dependency weight mismatch
Generated projects often import familiar libraries by default, even when platform primitives would suffice.
Check for:
- many dependencies in a small scope project
- legacy libraries for solved runtime features
- outdated packages that were likely copied from old examples
// package.json — a simple 3-route CRUD API
"dependencies": {
"axios": "^1.6.0", // fetch() is native
"lodash": "^4.17.21", // Array methods cover this
"moment": "^2.29.4", // Intl.DateTimeFormat is native
"uuid": "^9.0.0", // crypto.randomUUID() is native
"dotenv": "^16.0.0", // process.env loads without this in Node 20+
"cors": "^2.8.5",
"helmet": "^7.0.0",
"morgan": "^1.10.0"
}

Repo Watch partially scans for this. Dependency counts are parsed from lockfiles, and a high dependency footprint relative to source file count reduces the Code Quality section score. Repo Watch does not currently check for version staleness, available platform substitutes, or known vulnerabilities. Expanded dependency analysis is a planned improvement.
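For reference, here is what the same API needs from the platform alone. This assumes a recent Node runtime (fetch is global since Node 18, `--env-file` since Node 20); it is a sketch of the substitutions, not a drop-in migration:

```typescript
// uuid → node:crypto (randomUUID is also a global in Node 19+)
import { randomUUID } from "node:crypto";

// moment → Intl.DateTimeFormat, built into the runtime
const formatted = new Intl.DateTimeFormat("en-US", { dateStyle: "medium" })
  .format(new Date("2026-03-25T00:00:00Z"));

const id = randomUUID();

// axios → built-in fetch (global since Node 18); shown, not called here:
// const res = await fetch("https://api.example.com/users");

// dotenv → plain process.env, populated by `node --env-file=.env` in Node 20+

console.log(formatted, id);
```

The question to ask in review is not "is lodash bad?" but "did anyone decide to add it, or did it arrive by default?"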
6. Security controls only at the edge
This is a common risk pattern in generated output: route-level checks exist, but deeper layers assume trusted input.
Check for:
- validation only in controllers, not in domain or data layers
- inconsistent parameterization of data access operations
- missing defense-in-depth for sensitive write paths
// router.ts — auth and schema validation exist at the edge ✓
router.post("/transfer", requireAuth, validateTransferBody, transferHandler);
// transfer-handler.ts — no ownership check, no re-validation
async function transferHandler(req: Request, res: Response) {
  const { fromAccountId, toAccountId, amount } = req.body;

  // No check: does this user own fromAccountId?
  // No check: is amount within allowed limits?

  // Direct pass-through to data layer
  await db.transfer(fromAccountId, toAccountId, amount);
  res.json({ ok: true });
}

Repo Watch scans for this via Semgrep. The Security Hygiene section runs Semgrep with its auto ruleset, which includes rules targeting missing authorization checks and incomplete input validation at the domain layer. Patterns like the one above can surface as HIGH severity findings with file location and remediation guidance. Gitleaks also scans separately for exposed secrets and credential patterns.
How to use these signals
Treat this as a prioritization framework, not a verdict engine.
If multiple signals appear together, dig deeper. Ask for implementation rationale. Verify assumptions in tests. Check dependency and security posture before merge.
A structured scan gives you the starting list. Use it to focus the human review, not to replace it.
Run a first pass before the review
Sign in for 3 free scans a month. Paid plans unlock more scans, connected repositories, and priority processing. Questions about the results? Reach us directly.
No credit card required. Connect a GitHub repository or upload a ZIP to start.