25 Mar 2026
How to spot AI-generated code in a GitHub repo
A practical checklist for reviewing repositories that may include AI-assisted code.
When you are reviewing a repository, the hard part is not deciding whether AI tools were used. The hard part is deciding whether the resulting code is trustworthy.
In practice, spotting AI-generated code in a repo means evaluating patterns that affect maintainability, security, and test confidence.
The most useful question is simple:
Does this repo look like it evolved through deliberate engineering decisions, or was it assembled in large generated batches with light validation?
No single signal is conclusive, but patterns are.
How Repo Watch surfaces these AI-risk signals
Repo Watch runs static analysis only and surfaces AI-risk indicators as directional heuristics, alongside test confidence, code quality, and security hygiene sections.
- Structured AI-risk indicators are shown in a dedicated section with clear score explainers.
- Top findings are prioritized so reviewers can inspect the highest-impact signals first.
- Each finding includes category, severity, and plain-language guidance for next steps.
- Scan completeness caveats are surfaced so teams understand confidence limits in the result.
This makes AI code detection for GitHub and GitLab repositories more actionable: you get a triage view, not a black-box verdict.
AI-Risk Indicators (section score: 50)
Score explainer: AI-Risk Indicators are directional, structure-based heuristics that look for signals often associated with AI-assisted code generation patterns.
Computed total: 50 (matches final section score).
This mirrors the actual AI-Risk Indicators section in Repo Watch scan results. Each contribution is shown explicitly so reviewers can trace exactly how the section score was produced, not just consume a number.
1. Style that is too uniform
Healthy teams usually show mild variation over time. AI-heavy output can look unnaturally consistent across every file.
Check for:
- identical comment tone in unrelated modules
- repetitive naming patterns everywhere
- little variation in implementation style across long timelines
// utils/formatDate.ts
/**
 * Formats a date string into a human-readable format.
 * @param date - The date string to format.
 * @returns A formatted date string.
 */
export function formatDate(date: string): string {
  return new Date(date).toLocaleDateString();
}

// utils/parseAmount.ts (different module, same author?)
/**
 * Parses an amount value into a numeric representation.
 * @param amount - The amount string to parse.
 * @returns A parsed numeric amount.
 */
export function parseAmount(amount: string): number {
  return parseFloat(amount);
}

Repo Watch does not currently scan for this. Comment tone and naming uniformity require AST-level style analysis that is outside the current static scan scope. The AI-Risk section uses structural fragmentation (many files per source file) as a coarser proxy for generated batches, but style uniformity itself is a manual review signal for now.
Consistency is good. Artificially perfect consistency can be a review clue.
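If you want to make "identical comment tone" concrete during a manual pass, a quick token-overlap check between doc comments from unrelated modules can help. This is a rough sketch, not a Repo Watch feature; `commentSimilarity` is a name invented for this example:

```typescript
// Jaccard similarity between the word sets of two doc comments.
// High similarity across unrelated modules is a (weak) uniformity clue.
function commentSimilarity(a: string, b: string): number {
  const tokens = (s: string) => new Set(s.toLowerCase().match(/[a-z]+/g) ?? []);
  const ta = tokens(a);
  const tb = tokens(b);
  const intersection = Array.from(ta).filter((t) => tb.has(t)).length;
  const union = new Set(Array.from(ta).concat(Array.from(tb))).size;
  return union === 0 ? 0 : intersection / union;
}

// The two JSDoc summaries from the example above:
const docA = "Formats a date string into a human-readable format.";
const docB = "Parses an amount value into a numeric representation.";
const docC = "Formats a date value into a human-readable format.";

console.log(commentSimilarity(docA, docB)); // low: different vocabulary
console.log(commentSimilarity(docA, docC)); // high: suspiciously uniform phrasing
```

Averaging this across comment pairs drawn from unrelated modules gives a rough "tone uniformity" score; treat anything it flags as a prompt to read the code, not as evidence on its own.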
2. Generic abstractions with no real demand
Generated code often adds extra layers "just in case": utility wrappers, broad interfaces, and extension points that never get used.
Check for:
- exported helpers that are never called
- interfaces with options no code path exercises
- abstractions that increase complexity without reducing duplication
// A wrapper around HttpClient that adds nothing
export class HttpClientWrapper {
  private readonly client: HttpClient;

  constructor(options?: HttpClientOptions) {
    this.client = new HttpClient(options);
  }

  async get<T>(url: string, config?: RequestConfig): Promise<ApiResponse<T>> {
    return this.client.get<T>(url, config);
  }

  async post<T>(url: string, data: unknown, config?: RequestConfig): Promise<ApiResponse<T>> {
    return this.client.post<T>(url, data, config);
  }
}

// Used exactly once, in a single page component, calling .get() with no config

Repo Watch does not currently scan for this. Two separate passes are planned. Structural proxy smells (classes whose every method just delegates to an inner object, all-optional interfaces) are addressable with targeted Semgrep rules and are on the roadmap. True dead-code and unused-export detection requires a cross-file symbol graph; we plan to add a knip pass to the worker pipeline for that, which resolves re-exports and barrel files without executing any code.
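For a manual first pass before a tool like knip is in place, even a crude in-memory scan for exports that no other file references can surface candidates. This toy sketch uses plain string matching, so it misses re-exports, barrel files, and name collisions; real tools resolve the symbol graph properly:

```typescript
// Coarse unused-export proxy: an export counts as "used" if any other
// file's source mentions its name. Deliberately naive.
type Files = Record<string, string>;

function unusedExports(files: Files): string[] {
  const exported = new Map<string, string>(); // name -> defining file
  for (const [path, src] of Object.entries(files)) {
    for (const m of src.matchAll(/export\s+(?:function|class|const)\s+(\w+)/g)) {
      exported.set(m[1], path);
    }
  }
  const unused: string[] = [];
  for (const [name, ownFile] of exported) {
    const referencedElsewhere = Object.entries(files).some(
      ([path, src]) => path !== ownFile && src.includes(name)
    );
    if (!referencedElsewhere) unused.push(name);
  }
  return unused;
}

const repo: Files = {
  "http-wrapper.ts": "export class HttpClientWrapper { /* delegates */ }",
  "page.ts": "import { HttpClientWrapper } from './http-wrapper';",
  "helpers.ts": "export function neverCalled() {}",
};

console.log(unusedExports(repo)); // ["neverCalled"]
```

The false positives and negatives are exactly why the roadmap calls for knip rather than regexes; the point here is only that "exported but never imported" is mechanically checkable.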
3. Coverage without confidence
A repo can show high coverage and still miss critical behavior.
Check for:
- minimal negative tests (throws, rejects, invalid input)
- no boundary tests (empty data, max limits, null variants)
- repeated happy-path unit tests with little failure exploration
// 3 tests, all green, strong coverage percentage
it("returns user by id", async () => {
  const user = await getUser("user-1");
  expect(user.id).toBe("user-1");
});

it("returns user name", async () => {
  const user = await getUser("user-1");
  expect(user.name).toBe("Alice");
});

it("returns user email", async () => {
  const user = await getUser("user-1");
  expect(user.email).toBe("alice@example.com");
});

// Missing: getUser("nonexistent"), getUser(null), deleted accounts, permission checks

Repo Watch partially scans for this. A test-to-source file ratio is measured as a structural confidence signal, and thin test footprints lower the Test Confidence section score. Repo Watch does not yet inspect test content or parse coverage artifacts to identify happy-path-only suites. Coverage artifact integration is planned work.
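What the missing failure-path coverage looks like in practice: a sketch with a hypothetical in-memory `getUser` (the store and error messages are invented for illustration). The shape of the assertions, not the store, is the point:

```typescript
// Hypothetical in-memory user lookup, standing in for the real one.
type User = { id: string; name: string; email: string };

const users = new Map<string, User>([
  ["user-1", { id: "user-1", name: "Alice", email: "alice@example.com" }],
]);

async function getUser(id: string | null): Promise<User> {
  if (id === null || id.trim() === "") throw new Error("invalid id"); // boundary: null / empty
  const user = users.get(id);
  if (!user) throw new Error("not found"); // negative path: unknown id
  return user;
}

// The failure-path cases the happy-path suite never exercised:
async function run() {
  await getUser("nonexistent").then(
    () => { throw new Error("should have rejected"); },
    (e: Error) => console.log("rejects unknown id:", e.message)
  );
  await getUser(null).then(
    () => { throw new Error("should have rejected"); },
    (e: Error) => console.log("rejects null id:", e.message)
  );
}
run();
```

A suite that asserts rejections, boundaries, and permission failures can have the same coverage percentage as the happy-path suite above while carrying far more confidence.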
Coverage is a metric. Test confidence is an outcome.
4. Commit shapes that feel synthetic
Organic repositories usually include small fixes, partial refactors, and uneven progress.
Check for:
- very large one-shot commits introducing many modules
- commit messages that are uniformly polished from day one
- no visible maintenance rhythm between feature drops
$ git log --oneline
a4f1c2e feat: implement complete user authentication system with OAuth
b3e2d1f feat: add full product catalog with search, filtering, and sorting
c2d3e4f feat: build checkout flow with payment integration and confirmation emails
d1e2f3a feat: create admin dashboard with analytics and user management
e0f1a2b feat: set up project structure with database schema and API layer

Repo Watch does not currently scan for this. Scans operate on the repository filesystem only and do not access git history. Commit shape analysis would require git log traversal, which is not in the current pipeline. This is a longer-term signal worth adding as the product matures.
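If you want to eyeball this yourself, the shape is easy to quantify once you have the file count per commit (for example from `git log --numstat`). A sketch of one possible heuristic; the thresholds (20 files, 50%, 20%) are illustrative assumptions, not how Repo Watch scores anything:

```typescript
// Flag a history where most commits land large batches of files
// and small maintenance commits are nearly absent.
function looksSynthetic(filesPerCommit: number[]): boolean {
  if (filesPerCommit.length === 0) return false;
  const large = filesPerCommit.filter((n) => n >= 20).length;
  const small = filesPerCommit.filter((n) => n <= 3).length;
  // Organic histories accumulate many small fixes; batch-generated ones often don't.
  return large / filesPerCommit.length > 0.5 && small / filesPerCommit.length < 0.2;
}

console.log(looksSynthetic([42, 38, 51, 29, 33]));           // true: all one-shot drops
console.log(looksSynthetic([42, 2, 1, 3, 5, 1, 2, 24, 1]));  // false: maintenance rhythm visible
```

Like every signal in this list, a true result is a prompt to read the commits, not a verdict.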
5. Dependency weight mismatch
Generated projects often import familiar libraries by default, even when platform primitives would suffice.
Check for:
- many dependencies in a small scope project
- legacy libraries for solved runtime features
- outdated packages that were likely copied from old examples
// package.json — a simple 3-route CRUD API
"dependencies": {
"axios": "^1.6.0", // fetch() is native
"lodash": "^4.17.21", // Array methods cover this
"moment": "^2.29.4", // Intl.DateTimeFormat is native
"uuid": "^9.0.0", // crypto.randomUUID() is native
"dotenv": "^16.0.0", // process.env loads without this in Node 20+
"cors": "^2.8.5",
"helmet": "^7.0.0",
"morgan": "^1.10.0"
}

Repo Watch partially scans for this. Dependency counts are parsed from lockfiles, and a high dependency footprint relative to source file count reduces the Code Quality section score. Repo Watch does not currently check for version staleness, available platform substitutes, or known vulnerabilities. Expanded dependency analysis is a planned improvement.
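For reference, here is what the same API needs from the platform alone. This assumes a recent Node runtime (fetch is global since Node 18, `--env-file` since Node 20); it is a sketch of the substitutions, not a drop-in migration:

```typescript
// uuid → node:crypto (randomUUID is also a global in Node 19+)
import { randomUUID } from "node:crypto";

// moment → Intl.DateTimeFormat, built into the runtime
const formatted = new Intl.DateTimeFormat("en-US", { dateStyle: "medium" })
  .format(new Date("2026-03-25T00:00:00Z"));

const id = randomUUID();

// axios → built-in fetch (global since Node 18); shown, not called here:
// const res = await fetch("https://api.example.com/users");

// dotenv → plain process.env, populated by `node --env-file=.env` in Node 20+

console.log(formatted, id);
```

The question to ask in review is not "is lodash bad?" but "did anyone decide to add it, or did it arrive by default?"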
6. Security controls only at the edge
This is a common risk pattern in generated output: route-level checks exist, but deeper layers assume trusted input.
Check for:
- validation only in controllers, not in domain or data layers
- inconsistent parameterization of data access operations
- missing defense-in-depth for sensitive write paths
// router.ts — auth and schema validation exist at the edge ✓
router.post("/transfer", requireAuth, validateTransferBody, transferHandler);
// transfer-handler.ts — no ownership check, no re-validation
async function transferHandler(req: Request, res: Response) {
  const { fromAccountId, toAccountId, amount } = req.body;

  // No check: does this user own fromAccountId?
  // No check: is amount within allowed limits?

  // Direct pass-through to data layer
  await db.transfer(fromAccountId, toAccountId, amount);
  res.json({ ok: true });
}

Repo Watch scans for this via Semgrep. The Security Hygiene section runs Semgrep with its auto ruleset, which includes rules targeting missing authorization checks and incomplete input validation at the domain layer. Patterns like the one above can surface as HIGH severity findings with file location and remediation guidance. Gitleaks also scans separately for exposed secrets and credential patterns.
How to use these signals
Treat this as a prioritization framework, not a verdict engine.
If multiple signals appear together, dig deeper. Ask for implementation rationale. Verify assumptions in tests. Check dependency and security posture before merge.
A structured scan gives you the starting list. Use it to focus the human review, not to replace it.
Run a first pass before the review
Sign in for 3 free scans a month. Paid plans unlock more scans, connected repositories, and priority processing. Questions about the results? Reach us directly.
No credit card required. Connect a GitHub repository or upload a ZIP to start.