ELEPHAANT

Table of Contents

  • How it actually works (official pipeline)
  • Severity levels (from the docs)
  • When reviews run (per repo)
  • 1. What is it, and do I need it?
  • 2. How much does it really cost, and how do I control spend?
  • 3. How do I get access, and what's the catch?
  • 4. Is my code safe? Does Anthropic train on it?
  • 5. How does it compare to CodeRabbit, Copilot, and the free GitHub Action?
  • 6. What does it actually check—and what can I customize?
  • 7. What are the real limitations?
  • 8. When is $15–25 per PR actually worth it?
  • 9. How do I set it up and keep control?
  • The one story that sums it up
  • Bottom line
#ai #tools #development

Anthropic Code Review for Claude Code: The Developer Questions Nobody's Answering

Anthropic's $15–25/PR Code Review is live. Should you use it? How do you control cost? Is your code used for training? We answer the questions every developer actually has—including the ones that aren't in the marketing.

March 12, 2026

9 min read

Anthropic just shipped automated PR reviews at $15–25 per pull request. Before you enable it—or dismiss it—here are the answers you won't find in the launch post.

Code review has become a bottleneck everywhere. At Anthropic, code output per engineer grew 200% in the last year. Many teams are in the same spot: more AI-generated code, same number of human reviewers, and PRs that get a quick skim instead of a real read. In March 2026, Anthropic launched Code Review for Claude Code—a multi-agent system that reviews every PR in depth. It's the same pipeline they run on nearly every PR at Anthropic. Now it's in research preview for Team and Enterprise.

This post answers the questions developers actually have: cost control, privacy, when it's worth it, how it compares to alternatives, and the limitations that matter for your stack.


How it actually works (official pipeline)

From Anthropic's Code Review docs: once an admin enables Code Review, reviews run automatically when a pull request opens or updates. Here’s the pipeline, as described by Anthropic.

1. Parallel analysis. Multiple agents run on Anthropic’s infrastructure and analyze the diff and surrounding code in parallel. Each agent looks for a different class of issue (e.g. logic errors, security issues, edge cases, regressions).

2. Verification. A verification step checks candidate findings against actual code behavior to filter out false positives. So what you see isn’t raw model output—it’s been validated.

3. Dedupe and rank. Results are deduplicated, ranked by severity, and then posted as inline comments on the specific lines where issues were found. If no issues are found, Claude posts a short confirmation comment on the PR.

4. Scale with PR size. Review depth and cost scale with PR size and complexity. Large or complex PRs get more agents and a deeper pass; small changes get a lighter pass. Average time to completion is around 20 minutes.
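To make step 3 concrete, here's a small sketch of what "dedupe and rank" could look like. This is a conceptual illustration of the behavior the docs describe, not Anthropic's actual implementation; the `Finding` shape and the severity ordering are assumptions.

```python
from dataclasses import dataclass

# Hypothetical finding shape -- illustrative only, not Anthropic's internal format.
@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    severity: str   # "normal" | "nit" | "pre-existing"
    message: str

# Lower number = more severe, mirroring the Normal > Nit > Pre-existing ordering.
SEVERITY_RANK = {"normal": 0, "nit": 1, "pre-existing": 2}

def dedupe_and_rank(findings: list[Finding]) -> list[Finding]:
    # Deduplicate: several parallel agents may flag the same line for the same reason.
    unique = list(dict.fromkeys(findings))   # preserves first-seen order
    # Rank by severity, then by location for stable inline-comment placement.
    return sorted(unique, key=lambda f: (SEVERITY_RANK[f.severity], f.file, f.line))

if __name__ == "__main__":
    raw = [
        Finding("auth.py", 42, "normal", "token never expires"),
        Finding("auth.py", 42, "normal", "token never expires"),  # duplicate from a second agent
        Finding("utils.py", 7, "nit", "shadowed variable"),
    ]
    for f in dedupe_and_rank(raw):
        print(f"{f.file}:{f.line} [{f.severity}] {f.message}")
```

The end result is what you see on the PR: one comment per unique issue, ordered so the blocking findings surface first.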

Severity levels (from the docs)

Each finding is tagged with a severity level:

| Marker | Severity | Meaning |
| --- | --- | --- |
| 🔴 | Normal | A bug that should be fixed before merging |
| 🟡 | Nit | A minor issue, worth fixing but not blocking |
| 🟣 | Pre-existing | A bug that exists in the codebase but was not introduced by this PR |

Findings also include a collapsible extended reasoning section you can expand to see why Claude flagged the issue and how it verified the problem.

When reviews run (per repo)

Admins choose, per repository, when Code Review runs:

  • After every push to the PR branch – Review runs on each push, so you catch new issues as the PR evolves. Threads can auto-resolve when you fix flagged issues. More reviews, higher cost.
  • After PR creation only – Review runs once when the PR is opened or marked ready for review. Lower cost, good starting point.

The official docs have full setup, customization with REVIEW.md / CLAUDE.md, and the analytics dashboard at claude.ai/analytics/code-review.


1. What is it, and do I need it?

Code Review is a managed service that runs when you open a pull request. It doesn't replace your existing GitHub Action or your human reviewers. It adds a deep, multi-agent pass before (or alongside) human review.

  • Multiple agents look at the PR in parallel—data handling, edge cases, API misuse, cross-file consistency.
  • Agents cross-verify findings to cut false positives, then rank by severity.
  • You get one summary comment on the PR plus inline comments on specific lines.
  • It does not approve or block merges. A human still has to approve. It only closes the feedback gap.

Do you need it? If your team is already on Claude Team or Enterprise and you feel like many PRs get light review, it's worth trying. If you're on GitHub only and want depth over speed, it fits. If you're on GitLab/Bitbucket or need sub-minute reviews, it's not for you yet (see limitations below).


2. How much does it really cost, and how do I control spend?

Pricing is per review, not per seat: about $15–25 per PR, depending on size and complexity. So 100 PRs/month might land in the $1,500–2,500 range if they're average; large refactors cost more.

Controls that actually matter:

  • Analytics dashboard – PRs reviewed, acceptance rate, total review cost.
  • Repository-level toggle – Turn it on only for the repos you care about (e.g. production services, not docs or experiments).
  • Monthly org cap – Cap total spend across all Code Review usage for the org.

So you can limit cost by: (1) enabling only on critical repos, (2) setting a monthly cap, and (3) using the dashboard to see where the spend goes.
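To get a feel for the numbers, here's a back-of-the-envelope estimator built on the published $15–25 per-review range. The PR volumes are invented example inputs, not data from Anthropic.

```python
# Rough monthly cost estimate for Code Review, using the published $15-25/PR range.
PER_PR_LOW, PER_PR_HIGH = 15, 25

def monthly_cost_range(prs_per_month: int) -> tuple[int, int]:
    """Return (low, high) estimated monthly spend in dollars."""
    return prs_per_month * PER_PR_LOW, prs_per_month * PER_PR_HIGH

# Example: enabling review only on critical repos instead of the whole org.
all_repos_prs = 100   # hypothetical org-wide PR volume
critical_prs = 30     # hypothetical volume on production services only

print(monthly_cost_range(all_repos_prs))   # (1500, 2500)
print(monthly_cost_range(critical_prs))    # (450, 750)
```

Scoping review to the critical repos cuts the estimate by the same ratio as the PR volume, which is why the repo-level toggle is the first lever to pull.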


3. How do I get access, and what's the catch?

  • Who gets it: Claude Team and Enterprise only. Not Pro, not API-only.
  • State: Research preview (beta). Behavior and pricing may evolve.
  • Integration: GitHub only, via the Claude Code GitHub App. You install the app and choose which repos get reviews.

Important catch: Code Review is not available for orgs with Zero Data Retention. If your company requires zero data retention with vendors, the managed Code Review is off the table. The alternative is the open-source Claude Code GitHub Action (anthropics/claude-code-action), which you run in your own CI (e.g. GitHub Actions) and which doesn't route PR content through Anthropic's managed review service in the same way. Check with your security/compliance team before enabling either.
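If you go the self-hosted route, a minimal workflow looks roughly like this. Treat it as a sketch: the exact input names and triggers vary by version of anthropics/claude-code-action, so verify against that repo's README before copying.

```yaml
# .github/workflows/claude-review.yml -- sketch only; check the
# anthropics/claude-code-action README for the inputs of the version you pin.
name: Claude PR review (self-hosted)
on:
  pull_request:
    types: [opened, ready_for_review]

jobs:
  review:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: anthropics/claude-code-action@v1
        with:
          anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
          prompt: "Review this PR for logic errors, security issues, and edge cases."
```

You pay for API tokens and runner minutes instead of a per-PR fee, and PR content flows through infrastructure you control.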


4. Is my code safe? Does Anthropic train on it?

Short answer: for commercial use (Team/Enterprise), your data is not used for model training by default.

  • Commercial products (Claude for Work, Enterprise, API, and by extension Code Review for Team/Enterprise): Anthropic does not train on your data unless you explicitly join the optional Development Partner Program and agree to share session data.
  • Consumer products (Claude Free, Pro, Max): Chat/session data can be used for training only if you've enabled it in settings (or it's flagged for safety). Incognito chats are never used for training.

Code Review runs in the same commercial bucket: your PR content is processed for the review, not for training, unless you've opted into the partner program. For full assurance, confirm with your legal/compliance and Anthropic's current privacy and security docs.


5. How does it compare to CodeRabbit, Copilot, and the free GitHub Action?

| | Claude Code Review | CodeRabbit | GitHub Copilot Code Review | Claude Code GitHub Action |
| --- | --- | --- | --- | --- |
| Pricing model | Per-PR ($15–25) | Per seat (~$24/dev/mo) | Per seat (~$19/mo Business) | Free / you pay API/compute |
| Speed | ~20 min | 9+ min | ~30 sec | Depends on your runners |
| Depth | Deep, multi-agent | Strong (e.g. security) | Fast, surface-level | Configurable, you control |
| Platform | GitHub only | GitHub, GitLab, Azure DevOps, Bitbucket | GitHub | GitHub (self-hosted in Actions) |
| Focus | Logic, correctness, security | Security + general | Quick pass, refactors | Whatever you configure |

When to pick Code Review: You want depth over speed, you're on Claude Team/Enterprise, and you're okay with GitHub-only and per-PR cost. Good for high-stakes or large PRs.

When to stick with CodeRabbit: You need GitLab/Bitbucket, per-seat pricing, or committable fixes and learning from feedback.

When to use Copilot: You want fast, cheap first-pass review and are already on GitHub Copilot Business.

When to use the GitHub Action: You need full control, zero data retention, or custom triggers (e.g. only when someone tags @claude). You trade off managed UX for flexibility and data locality.


6. What does it actually check—and what can I customize?

Out of the box: Code Review focuses on correctness and logic—bugs that can break production—not style or formatting. Think: data handling, boundary conditions, API misuse, regressions, and cross-file consistency.

Customization: You can steer it with two files in your repo:

  • REVIEW.md – What to emphasize or ignore in reviews (e.g. “Flag any new env vars,” “Don’t nitpick naming in this legacy module”).
  • CLAUDE.md – Project context and conventions. Code Review uses this (and directory-level CLAUDE.md for touched paths) so agents align with your patterns and standards.

So you can keep the default “logic and correctness” focus and add project-specific rules and context without changing product settings.
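For example, a REVIEW.md might look like the following. The contents are invented to show the idea; the docs don't mandate a specific schema, so write it as plain instructions the reviewer should follow.

```markdown
# REVIEW.md -- hypothetical example; adapt to your project

## Emphasize
- Flag any new environment variables or config keys.
- Treat changes under src/billing/ as security-sensitive.

## Ignore
- Don't nitpick naming in legacy/ -- it follows old conventions on purpose.
- Skip formatting comments; Prettier handles style in CI.
```

Pair it with your existing CLAUDE.md so the review agents get both the rules (REVIEW.md) and the context (CLAUDE.md) in one pass.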


7. What are the real limitations?

  • GitHub only – No GitLab, Bitbucket, or Azure DevOps. For those, use something like CodeRabbit or the Claude Code Action in your CI (e.g. GitLab CI).
  • ~20 minutes per review – Not for “instant” feedback. Good for pre-merge depth, not for every commit.
  • No auto-approve or block – Findings are advisory. Humans still approve and merge.
  • Zero Data Retention – If your org requires it, this managed Code Review is off the table; use the self-hosted Action instead.
  • Team/Enterprise only – No Pro or consumer tier.
  • Research preview – Capabilities and pricing may change.

8. When is $15–25 per PR actually worth it?

Worth considering when:

  • Large or complex PRs – Anthropic’s own stats: on PRs over 1,000 lines, 84% got findings (avg 7.5 issues). Small PRs (<50 lines) had 31% with findings (avg 0.5). So the value is highest on big changes.
  • Critical paths – Auth, billing, data pipeline, security-sensitive code. One prevented bug can justify many review runs.
  • Lots of AI-generated code – Stanford work has shown that AI-assisted code can introduce more security issues while developers feel more confident. A second “pair of eyes” that’s tuned for logic and security can pay off.
  • Review bottleneck – If many PRs get only a quick skim, Code Review can consistently add substantive feedback (Anthropic saw 16% → 54% of PRs getting substantive comments).
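Anthropic's PR-size stats translate into very different cost-per-finding figures, which is the arithmetic behind the "big PRs first" advice. A quick calculation, using a $20 midpoint review price (my assumption) and the published averages from the bullets above:

```python
# Cost per finding at a $20 midpoint review price, using Anthropic's published
# averages: 7.5 findings on PRs >1,000 lines vs 0.5 on PRs <50 lines.
REVIEW_PRICE = 20  # midpoint of the $15-25 range

def cost_per_finding(avg_findings: float) -> float:
    return REVIEW_PRICE / avg_findings

print(f"large PRs: ${cost_per_finding(7.5):.2f} per finding")   # ~$2.67
print(f"small PRs: ${cost_per_finding(0.5):.2f} per finding")   # $40.00
```

Roughly a 15x difference per finding, before you even weigh how severe the findings on large PRs tend to be.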

Less compelling when:

  • Tiny, trivial PRs – You might not need a $15–25 pass on a one-line typo fix.
  • Budget-sensitive teams – If you ship hundreds of small PRs per month, per-PR pricing can add up; compare to per-seat tools.
  • You need instant feedback – Then Copilot or a fast linter/check is a better fit.

9. How do I set it up and keep control?

For developers: Once an admin enables Code Review and installs the GitHub App, reviews run automatically on new PRs in the selected repos. No per-PR configuration required.

For admins:

  1. Open Claude Code settings (e.g. claude.ai/admin-settings/claude-code).
  2. Enable Code Review and install the GitHub App.
  3. Choose repositories where reviews should run.
  4. Optionally set monthly spend caps and use the analytics dashboard to monitor usage.

Use repo-level enablement and monthly caps so cost and scope stay under control.


The one story that sums it up

Anthropic shared a concrete example: a one-line change to a production service looked routine—the kind of diff that often gets a quick approval. Code Review flagged it as critical. That change would have broken authentication for the service. The engineer said they wouldn’t have caught it on their own. It was fixed before merge.

That’s the pitch: not “AI replaces review,” but “AI catches the bugs that slip past a quick skim.” Whether that’s worth $15–25 per PR depends on your PR volume, repo criticality, and how much you trust a single fast pass today.


Bottom line

  • What it is: A deep, multi-agent PR review for GitHub, ~20 min per PR, $15–25 per review, Team/Enterprise only, research preview.
  • Worth it if: You’re on Claude Team/Enterprise, use GitHub, care about depth over speed, and have high-impact or large PRs (or a lot of AI-generated code).
  • Not for you if: You need Zero Data Retention, GitLab/Bitbucket, or sub-minute reviews; or you prefer per-seat pricing and multi-platform support.
  • Control: Repo-level enablement, monthly caps, and analytics. Customize with REVIEW.md and CLAUDE.md.
  • Privacy: Commercial usage isn’t used for training by default; confirm with your org and Anthropic’s latest docs.

If you’re evaluating it, enable it on one or two critical repos first, set a monthly cap, and compare the feedback quality and cost to your current process. That’s the only way to know if it’s worth it for your team.
