One of the big problems when working with large language models in development environments is context contamination.
A single chat quickly turns into a Frankenstein: mixing code review with debugging, then with architecture questions, later with documentation… everything ends up in an endless prompt where the model loses precision and coherence.
Subagents in Claude Code appear as a direct response to that problem.
They let you design specialized assistants, each with its own context, system instructions, and permissions. In other words: they introduce a separation-of-concerns pattern into your collaboration with AI, similar to what we have applied for years with microservices or CI/CD pipelines.
It is not just another "trick", but a change in how we think about the architecture of working with AI.
⚠️ Before starting: to get the most out of this guide you need some practice with Claude Code. If you want to learn the essentials of Claude Code, take a look at my guide.
## What are Subagents and how do they work?
A subagent is a specialized AI personality, defined by a configuration file (Markdown with YAML frontmatter) that specifies:
- Clear purpose: each subagent has a well-delimited area of specialization.
- Separate context: it works in its own context window, independent of the main conversation.
- Controlled tools: you can limit which tools (Read, Bash, Grep, etc.) are available to it.
- Own system prompt: specific instructions that guide its behavior.
This allows delegating complex tasks in a structured way.
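A minimal sketch of that file format (the fields mirror the test-runner example later in this guide; the values here are placeholders):

```markdown
---
name: my-subagent            # unique identifier
description: When Claude should delegate to this subagent
tools: Read, Grep            # optional; omit it to inherit all tools
---

The system prompt goes here: the specific instructions
that guide this subagent's behavior.
```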
### What they are not
- Not fine-tuning: you do not train a model; you define a behavior, in this case within Claude Code.
- Not autonomous agents: they do not execute outside the Claude ecosystem, nor do they "live" on their own on the network.
- Not mere prompts: they persist, are reusable, and live in the file system.
### Key benefits
- Context preservation: you avoid mixing everything in the same thread.
- Specialization: each subagent is an expert in its domain.
- Reusability: they can be shared between projects and teams.
- Granular permissions: each subagent accesses only what it needs.
## Now seriously… why use them? (The drifting problem)
Sometimes it seems you spend more time configuring crap than programming. Between tuning prompts, adjusting permissions, and writing CLAUDE.md, you wonder whether this is applied AI or an expensive hobby.
But if you don't add structure, you end up losing even more time. Initial chaos is paid back with interest: the model starts strong, but by the fourth prompt it is inventing things because it has lost adherence (drifting).
Subagents exist precisely to mitigate that vicious circle. They allow you to distribute load: a subagent that investigates, another that validates, another that plans. Each with its own sandbox, without dragging garbage from the main conversation.
If you take away one idea, let it be this:
Subagents = less prompt engineering and more real work, with the model functioning as a team of well-disciplined developers instead of a hyperactive intern with attention deficit.
## Tutorial: How to create your first Subagent

Claude offers a simple flow:
- Open the interface with the `/agents` command.
- Choose to create a project subagent (`.claude/agents/`) or a user subagent (`~/.claude/agents/`) to have it available in all your projects.
- Recommended: generate a draft with the Claude Code assistant and then adjust it to your needs.
- Define it: name, description, tools (if you want to restrict them), and system prompt. At a minimum, write a detailed description of when it should be used; if you use the assistant, Claude will complete the rest.
- Save: from that moment Claude can use it proactively (automatic delegation) or by explicit invocation.
### Code Example: Test Runner Subagent
This subagent runs tests automatically when it detects changes. It isolates failures, proposes minimal fixes, and ensures quality without contaminating the main context.
---
name: test-runner
description: Proactively run tests and fix failures
tools: Read, Bash, Grep
---
You are a test automation specialist.
Always run tests after code changes.
If tests fail, analyze the failures and implement minimal fixes while preserving the original test intent.
Return a concise summary and any failing test output.
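Once saved, you can rely on automatic delegation or call it yourself, for example:

> Use the test-runner subagent to run the test suite and fix any failures.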
## Invocation Strategies
There are two main ways:
- Automatic: Claude selects the right subagent when its description matches your request.
- Explicit:
> Use the debugger subagent to investigate this error.
To force more proactive use, you can add phrases like "MUST BE USED when debugging errors" to the description field.
This turns the subagent into an active part of your flow: you don't always have to remember to invoke it. Although, in my experience, you often still need to remind Claude to use them.
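As an illustration, a hypothetical debugger subagent tuned for proactive delegation could phrase its description like this:

```markdown
---
name: debugger
description: Debugging specialist. MUST BE USED when debugging errors, stack traces, or failing tests.
tools: Read, Bash, Grep
---
```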
## Advanced Workflows (Agent Chaining)
The true power comes when you chain subagents:
> First use the code-analyzer subagent to find performance issues,
> then use the optimizer subagent to fix them.
This allows designing pipelines within the conversation itself:
- Performance analysis.
- Automatic optimization.
- Test execution.
- Final code review.
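As a sketch, that whole pipeline can be expressed as one chained request (code-analyzer and optimizer come from the prompt above, test-runner from the earlier example; code-reviewer is a hypothetical name):

> First use the code-analyzer subagent to find performance issues,
> then use the optimizer subagent to fix them,
> then use the test-runner subagent to verify nothing broke,
> and finally use the code-reviewer subagent for a last pass.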
The pattern is clear: subagents = AI “microservices” in your development environment.
### Practical considerations
- Each subagent starts with a clean context → adds some latency.
- Do not abuse chaining without a clear strategy, as subagents still devour tokens.
## Design Best Practices
- Create your own subagents based on real needs of your projects and your way of working.
- Design with unique responsibilities: one subagent = one task.
- As I recommend above, start with agents generated by Claude and then iterate to adjust them.
- Write detailed prompts: the clearer the definition, the better the performance.
- Limit tools: for security and focus. But keep their independence in mind: leave write permissions in place, and when you don't want a subagent to generate an artifact, say so explicitly (see the sketch after this list).
- Version in Git: so your team can collaborate on improving subagents.
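To ground the tool-limiting advice, here is a hypothetical read-only reviewer that keeps only the Read and Grep tools used in the examples above:

```markdown
---
name: readonly-reviewer
description: Code review specialist. Reads and reports; never edits files.
tools: Read, Grep
---

You are a code reviewer. Analyze the code you are pointed at and report
issues with file and line references. Do not modify any files and do not
generate artifacts; return your findings as a concise summary.
```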
## Real Use Cases (My Personal Agents)
I repeat: in my opinion, the best way to get the most out of this Claude Code feature is to create your own agents. Using ready-made ones is useful, and in a moment I'll point you to a huge collection of them, but seriously, every professional is different; that's why solutions must be personalized.
In my case, my current stack consists of Tailwind, Laravel with Inertia, and Vue.js, plus the OpenAI API. So I have one agent per piece of the stack:
- Laravel-Vue-Architect helps me make pragmatic decisions based on Laravel (with Inertia) and Vue.js best practices.
- Open-AI-Expert is, as the name implies, an expert in the OpenAI API (which is not simple, by the way). I use it when I want to make sure I'm using the best prompting techniques and caching, choosing the right model, its parameters, and so on.
- UI-Design-Reviewer is the agent in charge of validating my attempts at UI layouts and compositions (with Tailwind), even though I'm no professional designer.
### Analysis of the openai-api-expert Agent
Here is the content of my openai-api-expert.md agent, so you get an idea.
---
name: openai-api-expert
description: Use this agent when adding or modifying code that calls the OpenAI API. This includes: implementing new OpenAI API integrations, updating existing API calls, reviewing OpenAI-related code for best practices, debugging API issues, or optimizing API usage for cost and performance. The agent will analyze your code and provide concrete patches with production-ready patterns.
model: sonnet
color: purple
---
You are an expert in OpenAI API integrations, specializing in both client SDK and REST implementations. You enforce production-grade patterns, provide concrete code patches, and ensure cost-effective, observable, and resilient API usage.
## Core Responsibilities
You will:
1. **Enforce Robust Patterns**: Implement timeouts, exponential backoff retries, idempotency keys, and streaming where appropriate
2. **Add Minimal Observability**: Include request metadata logging, latency histograms, and error categorization
3. **Provide Cost/Latency Analysis**: Calculate transparent estimates based on model selection, token policies, and usage patterns
4. **Suggest Defensive Measures**: Recommend prompt guard rails, input validation, and rate limiting strategies
## Analysis Workflow
When analyzing code:
1. **Scan Integration Points**: Review client initialization, request factories, middleware, and error handlers
2. **Evaluate Model Selection**: Identify chosen models and justify based on quality/latency/cost trade-offs
3. **Generate Concrete Patches**: Provide unified diffs with production-ready code, not just recommendations
4. **Flag Outdated Information**: Mark any pricing/limit details as "TO VERIFY in vendor docs" with exact documentation anchors
5. **Create Test Runbooks**: Include curl or node.js snippets for local smoke testing with proper secret redaction
## Required Patterns to Enforce
### Timeout and Retry Logic
- Default timeout: 30s for standard requests, 60s for streaming
- Exponential backoff: 1s, 2s, 4s, 8s with jitter
- Max retries: 3 for transient errors (429, 503, network)
- Idempotency keys for all non-GET requests
### Error Handling
- Categorize errors: rate_limit, auth, validation, model, network, unknown
- Log with structured metadata: model, prompt_tokens, completion_tokens, latency_ms
- Implement circuit breaker for repeated failures
### Cost Optimization
- Use appropriate models: gpt-3.5-turbo for simple tasks, gpt-4 for complex reasoning
- Implement token counting before requests
- Cache responses where deterministic
- Stream for user-facing responses to improve perceived latency
### Security and Validation
- Validate and sanitize all user inputs
- Implement prompt injection detection
- Use system messages for behavioral constraints
- Never log full API keys, only last 4 characters
## Output Structure
Your response must include these sections:
### 1. Risks Found
List specific vulnerabilities or anti-patterns discovered:
- Missing error handling
- No timeout configuration
- Hardcoded API keys
- Unbounded token usage
- Missing rate limit handling
### 2. Proposed Patch
Provide a unified diff format patch:
```diff
--- a/path/to/file.js
+++ b/path/to/file.js
@@ -line,count +line,count @@
context lines
-removed lines
+added lines
```

### 3. Cost/Latency Assumptions
Provide detailed estimates:
- Model: [model-name] @ $X/1K input, $Y/1K output tokens (TO VERIFY: https://openai.com/pricing)
- Expected tokens: ~X input, ~Y output per request
- Estimated cost: $Z per 1000 requests
- P50 latency: Xms, P99 latency: Yms
- Rate limits: X RPM, Y TPM (TO VERIFY: https://platform.openai.com/docs/guides/rate-limits)
### 4. Local Test Runbook
Provide executable test commands:
```bash
# Test with curl (secrets redacted)
export OPENAI_API_KEY="sk-...XXXX"
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{...}'

# Or with Node.js
node -e "..."
```
## Model Selection Guidelines
- gpt-3.5-turbo: Classification, simple extraction, formatting (fast, cheap)
- gpt-4-turbo: Complex reasoning, code generation, analysis (balanced)
- gpt-4o: Multimodal tasks, highest accuracy requirements (premium)
- text-embedding-3-small: Standard embeddings (cheap, fast)
- text-embedding-3-large: High-precision similarity search (quality)
## Critical Checks
Always verify:
- API keys are from environment variables, not hardcoded
- Streaming is used for user-facing completions >1s expected latency
- Token limits are enforced (context window awareness)
- Proper error messages don't leak internal details
- Retry logic doesn't retry client errors (400, 401, 403)
- Observability includes correlation IDs for request tracing
When you identify outdated vendor information, provide the exact documentation URL and section to verify. Your patches should be immediately applicable without modification, following the project's existing code style and patterns.
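To make the retry policy concrete, here is a minimal TypeScript sketch of the timeout-plus-backoff pattern this agent enforces, assuming Node 18+ (built-in `fetch`) and the thresholds from the prompt above; treat it as an illustration, not production code:

```typescript
// Minimal sketch of the policy described above: 30s timeout, max 3 retries
// for transient errors only, exponential backoff with jitter (1s, 2s, 4s, 8s).

const API_URL = "https://api.openai.com/v1/chat/completions";

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function callWithRetry(payload: object, maxRetries = 3): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), 30_000); // 30s default timeout

    try {
      const res = await fetch(API_URL, {
        method: "POST",
        headers: {
          Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify(payload),
        signal: controller.signal,
      });

      if (res.ok) return await res.json();

      // Never retry client errors (400, 401, 403): they will not fix themselves.
      if (res.status !== 429 && res.status < 500) {
        throw new Error(`OpenAI client error ${res.status}`);
      }
      // Transient error (429, 5xx): fall through to the backoff below, or give up.
      if (attempt >= maxRetries) {
        throw new Error(`Gave up after ${attempt + 1} attempts (last status ${res.status})`);
      }
    } catch (err) {
      // Timeouts (AbortError) and network failures (TypeError) are also transient.
      const name = (err as Error)?.name;
      const transient = name === "AbortError" || name === "TypeError";
      if (!transient || attempt >= maxRetries) throw err;
    } finally {
      clearTimeout(timer);
    }

    // Exponential backoff with jitter: ~1s, 2s, 4s, 8s.
    await sleep(1000 * 2 ** attempt + Math.random() * 250);
  }
}
```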
### Example of "My Agents" in Action

In this capture you can see a real example of how **Claude Code Agents** work in practice.
The flow started with **UI-Design-Reviewer**, an agent specialized in reviewing visual consistency and usability.
It detected clear problems: poorly positioned buttons, confusing visual hierarchy, weak contrasts, and spatial disorientation.
The output was not vague phrases, but a structured diagnosis with **concrete proposals**: unify action zones, apply a modern color scheme, and reinforce consistency in all components.
Then **Laravel-Vue-Architect** came into play, another agent specialized in front–back architecture.
Its role was to translate those design recommendations into a **technical implementation plan**: create a unified component, centralize logic in a composable, type with TypeScript, and update existing components to align with the new architecture.
The result is a perfect example of how Claude Code Agents **work in a chain**, just like a real team would:
- The designer detects and proposes.
- The architect translates into technical solutions.
- And the human developer reviews and decides.
Far from being a single “mega-agent” that does everything, the strength of Agents is in **specialization and collaboration**.
## Subagent Repository (Build With Claude)
Although I have said it before and will repeat it: **it is best to design your own subagents**, adapted to your stack and way of working. That said, you don't have to start from scratch: there are public collections ready to use.
A popular example is [Build With Claude](https://www.buildwithclaude.com/browse), which even gives you a CLI to install them and keep them up to date without breaking a sweat.
## My honest experience (raw)
- In theory, Claude should invoke subagents autonomously. In practice… no. Even if you put *USE THIS AGENT EXTENSIVELY* in the description along with various emojis.
- The realistic approach is to invoke them explicitly, with clear instructions. Otherwise, they often just sit gathering dust.
- It is true that using them helps keep the **context window cleaner** (less compaction, less accumulated garbage), which translates into more stable prompts.
- But watch out: **they are not infallible**. They also hallucinate, over-analyze, and add unnecessary complexity. Apply the same skepticism to them that you apply to the main agent.
- The good side? They give you that brutal **initial feeling of progress**. False? Probably. Useful for getting started and staying motivated? Also true.
## Conclusion
Subagents are not an optional extra: they represent a **new architectural pattern** for working with AI in development.
If until now we thought of Claude as “an assistant in a chat”, with subagents we move to designing a **system of specialized assistants**, each with its role, its independent context, and its set of tools.
The conclusion is clear: just as a good technical team does not rely on a single service that does everything, neither should you depend on a single AI thread to cover all your needs.
To delve into the technical basis of all this, I recommend understanding [how MCP integrations work](/en/writing/mcp-frontend-ai-claude-code), which empower these agents.
## Resources
- [Claude Code Subagents (Anthropic docs)](https://docs.anthropic.com/en/docs/claude-code/sub-agents#managing-subagents)
- [Build With Claude subagent collection](https://www.buildwithclaude.com/browse)