Unit 2: Agent Tool Architecture

CSEC 601 — Semester 1 | Weeks 5–8

Building on Unit 1 Week 4: Week 4 introduced the context engineering framework — system prompts, memory architectures, structured outputs, and retrieval as a conceptual model. This unit builds the infrastructure that implements those concepts: standardized tool interfaces (MCP), secure tool design patterns, output schemas, and production-grade knowledge retrieval. By the end of Week 8, every component of the context engineering stack will have a working implementation behind it.

Week 5: Model Context Protocol (MCP) — The Agent-Tool Interface

Start your context library now. The context library is the collection of curated reference files you build throughout this course — prompts that worked, patterns you discovered, schemas you designed. Start it in Week 5. At the end of every lab this unit, you'll add one entry. By Unit 4 you'll have a personal reference library you can pull into any Claude session.

Build your context library as you go

Throughout Unit 2 you will build MCP tools, schemas, and patterns. After each week's lab, capture what worked into context-library/patterns/ — a growing artifact you will reuse in later units. Week 8 will show you the full structure; start adding to it from Week 5.

Day 1 — Theory & Foundations

Learning Objectives

Understand the historical evolution from custom API integrations to standardized agent-tool protocols
Analyze MCP architecture (clients, servers, transports) and its security implications
Evaluate MCP governance under the Linux Foundation and its role in AI standardization
Compare MCP with alternative agent-tool frameworks (LLM-specific tools, proprietary platforms, REST APIs)
Design a secure tool exposure strategy for a hypothetical security system

From Custom APIs to Standardized Protocols: The MCP Origin Story

For the first generation of AI agents (2022–2023), integrating external tools into LLM workflows was chaotic. Engineers built custom connectors for each LLM provider. A tool that worked with Claude required separate integration for GPT-4, and yet another for Gemini. Security teams couldn't audit tool access in a standardized way. There was no common language for describing what a tool could do, what it required, or how it failed.

Anthropic, in collaboration with community partners, recognized this fragmentation as a critical blocker for enterprise AI adoption. In late 2024, they released the Model Context Protocol (MCP), an open standard that fundamentally changed how agents discover, access, and use external tools. Rather than building LLM-specific integrations, teams now build once and connect to any LLM that supports MCP.

🔑 Key Concept: MCP solves the "integration sprawl" problem by establishing a standardized, transport-agnostic interface for agent-tool communication. This is analogous to how HTTPS became the standard for web security rather than requiring bespoke encryption for every web application.

The MCP Architecture: Three Layers of Control

MCP defines three architectural components:

Clients — The AI agent or application that consumes tools. In a security context, this is often Claude running in Claude Code or Claude Agent SDK, tasked with analyzing threats, generating reports, or triaging alerts.
Servers — Stateless services that expose tools. A security team might run an MCP server that wraps their SIEM (Splunk, Elasticsearch), another for their vulnerability scanner, another for their threat intelligence feed. Each server is responsible for authentication, input validation, and logging.
Transports — The communication channel between clients and servers. MCP supports:
stdio — Standard input/output, used for local tools and Claude Code integrations
HTTP/HTTPS — For client-server architectures across networks
WebSockets — For persistent, bidirectional communication in real-time security operations

This three-layer model provides crucial security advantages. A client (agent) never directly executes code on the server; it sends structured requests and receives structured responses. The server can enforce rate limits, validate inputs, and log every interaction. The transport layer can be encrypted and authenticated independently.

Discussion Prompt: If you were designing a tool that gives an AI agent access to your organization's vulnerability database, what security policies would you enforce at each layer (client restrictions, server validation, transport security)?

MCP Concepts Mapped to OOP

If you come from an OOP background, MCP maps directly to concepts you already know:

OOP Concept	Agentic Equivalent
Class	MCP Server
Interface / Abstract class	Tool schema (the contract)
Method	Tool (single responsibility function)
Encapsulation	Server hides implementation, exposes only the schema
Access modifiers	PoLP — tools exposed only to agents that need them
Decorator pattern	Rate limiting + audit logging wrapping tools at the dispatch layer
Circuit breaker	Graceful degradation when external APIs fail

Where the analogy breaks — non-determinism: In OOP you control the call graph. In agentic engineering the agent controls the call graph — you only control the boundaries. Temperature=0 and structured outputs make agent behavior more predictable, not deterministic. The orchestration layer — which tools get called, in what order, how intermediate results get interpreted — remains genuinely non-deterministic. Design for this: define boundaries, observe behavior over time, make audit trails mandatory.

Tool Discovery and Metadata

MCP servers expose their capabilities through a discovery mechanism. When an agent connects to an MCP server, the server responds with a list of available tools, including:

Tool name — e.g., query_cve, enrich_alert, fetch_logs
Description — Natural language description of what the tool does
Input schema — JSON Schema defining required and optional parameters, types, constraints
Output schema — JSON Schema defining the structure of responses
Error codes — Documented failure modes and remediation steps

This metadata allows agents to understand what's available without requiring hardcoded knowledge. A new security analyst can launch Claude Code, load an MCP server, and immediately see all available tools with documentation.

Discussion: tool discovery as attack surface. Every tool your MCP server exposes is discoverable by any agent (or attacker) with access to the server. Overpermissioned MCP servers don't just risk misuse — they enable capability amplification: an agent that should only read CVE data can instead scan networks if those tools are available. The governance response is allowance profiles — pre-approved lists of which tools each agent role can invoke. You'll implement this with Cedar policies in Unit 3.

Discussion: Agentic RBAC and PoLP

If MCP tool discovery exposes all server capabilities to any connected agent, what governance controls prevent an agent from using tools beyond its intended scope? How does Principle of Least Privilege apply to agent-tool relationships?

Agents without scope boundaries will pursue goals using any available tool — including tools outside their intended role. Tool discovery is an attack surface. The defense is PoLP applied at the control plane: only expose tools the agent needs, enforced by identity, not just permission checks.

Further Reading: See the MCP Specification for detailed schema examples and Tool Design Patterns for reference implementations.

Real-World Case Study: MASS as an MCP Server

MASS (Modular Agent Security System) exemplifies modern tool design. MASS exposes its security analyzers—context analysis, vulnerability assessment, threat profiling—as discoverable MCP tools rather than as opaque internal functions. When an agent uses MASS, it:

Queries the MASS MCP server for available tools
Selects relevant tools (e.g., analyze_security_context, scan_vulnerabilities)
Passes data to each tool
Receives structured, auditable results
MASS logs every invocation for compliance and forensics

This design allows MASS to be integrated into other security workflows (ticketing systems, SOAR platforms, custom dashboards) without rewriting the core analyzers.

🔑 Key Concept: Tool-agnostic design (exposing capabilities via MCP rather than writing tool-specific glue code) reduces technical debt and increases system resilience. If your threat intelligence provider changes APIs, you update the MCP server once, not every client that uses it.

Comparing MCP to Alternatives

Approach	Pros	Cons	Best For
MCP	Standardized, vendor-agnostic, auditable, composable	Requires server implementation	Enterprise, multi-LLM deployments
LLM-Specific Tools (e.g., Claude SDK tools)	Simple, well-documented	Lock-in to single LLM, no standardization	Prototypes, single-provider teams
REST APIs	Flexible, universally understood	No tool discovery, requires custom serialization	Legacy systems, unstructured tool access
Proprietary Agent Frameworks (e.g., proprietary SOAR)	Optimized for specific use case	Lock-in, expensive, slow to update	Highly specialized, risk-averse orgs

MCP is rapidly becoming the industry standard for serious AI agent deployments because it balances standardization with flexibility.

Governance and the Linux Foundation

In 2025, the Linux Foundation took governance of MCP, establishing it as a community-driven open standard. This matters for security teams because:

No single vendor lock-in — Microsoft, Google, Anthropic, and open-source maintainers all contribute to MCP's evolution
Transparent roadmap — Security-critical features (like enhanced audit logging) go through community review
Long-term stability — LF governance ensures MCP won't be abandoned or pivoted by a commercial entity
Extensibility process — New security features can be proposed and standardized across the ecosystem

Discussion Prompt: How would your organization's AI governance policies change if your agent infrastructure depended on a proprietary tool protocol vs. an open standard like MCP?

The service layer pattern. Your MCP server is not the business logic — it's a translation layer. The pattern: your REST API or Python service holds the core logic; the MCP server is a thin wrapper that translates between Claude's tool call format and your service's interface. This separation matters for security: you can add rate limiting, authentication, and audit logging at the MCP layer without touching the core logic, and you can test the core logic independently.

Principle of Least Privilege (PoLP) in tool design. Every tool you build should expose the minimum capability needed for its declared purpose. Three practices:

Scope boundaries — define what the tool will never return, not just what it will return
Single responsibility — one tool, one capability; resist adding convenience features that expand the attack surface
Output contracts — the tool's return schema is a security boundary; undocumented fields are unvalidated fields

PoLP isn't a checkbox — it's a design discipline you apply before writing a single line of code.

The "Lost in the Middle" Effect — Why Tool Output Size Matters

Research on transformer attention patterns reveals a consistent finding: model performance on information retrieval degrades for content placed in the middle of a long context window. The model attends most strongly to content at the beginning (system prompt, early context) and the end (most recent user turn). Information buried in the middle of a large context is processed, but with reduced salience.

For security agents, this has direct implications:

Large tool outputs get partially "lost" — If a SIEM tool returns 10,000 log lines, the relevant indicators in lines 4,000–6,000 may receive less attention than those at the start or end
Design tool outputs to be focused — Return the 10 most relevant events, not all 10,000. The tool's job is pre-filtering, not raw data dump
Place critical context strategically — High-priority information (threat indicators, confidence scores, required actions) should appear at the start or end of your prompt, not buried in the middle

The Scratchpad Pattern is a direct response to this problem. Rather than loading all context into one large prompt, you give the agent an explicit working space to accumulate findings incrementally — each tool call adds to a structured scratchpad rather than appending to a growing context block. This keeps relevant findings near the end of context (most recent) and prevents critical data from drifting into the low-attention middle. You'll implement this in Unit 5 when building multi-agent workflows.

Context Anxiety — The Late-Session Failure Mode

Anthropic's engineering team identified a companion failure mode called "context anxiety": models begin wrapping up work prematurely as they approach what they believe is their context limit — not because they've actually run out of context, but because they start behaving as if they need to finish quickly.

Symptoms to watch for:

Agent starts summarizing instead of continuing work
Agent declares tasks "complete" that aren't actually done
Agent stops exploring alternatives and commits to the first approach
Output quality degrades noticeably in the last 20% of a long session

Compaction vs. Context Reset: /compact helps but doesn't fully resolve context anxiety — the agent knows it was compacted and may still feel "late in session." A context reset (start a fresh agent with a structured handoff artifact) provides a clean slate. This is why /build-spec writes output to plans/ — the spec package survives context resets. A new agent reads plans/SPEC-*.md and picks up where the previous agent left off, without inheriting any context fatigue.

For long sessions: write important state to files (scratchpad pattern), then /clear and start fresh with the files as input. The new session has full context budget and no anxiety. Source: Anthropic Engineering, "Harness design for long-running application development," March 2026.

Memory Surfaces — Where Agent Data Actually Lands

Before building data governance or retention policies, map every surface where your agent writes data. Most agents have more write paths than their builders realize:

Conversation history — the message array grows every turn. Tool results, system messages, and all model outputs accumulate here. Persistence: session duration. Deletion: only on session end or explicit clear.
Tool call results — verbose API responses injected directly into context. A SIEM query returning 10,000 log lines enters your conversation history unfiltered unless a PostToolUse hook intervenes.
Scratchpad and progress files — unstructured writes to disk (claude-progress.txt, TASK.md, session manifests). Persistence: indefinite. Often forgotten after the session ends.
External memory systems — vector databases, MCP resources, third-party write-backs (Slack, CRM, databases). Persistence: indefinite, cross-session. Access scope: any agent that connects to the same server.

The memory surface map is the prerequisite for any data governance or retention policy. If you don't know where data lands, you can't govern it.

PII Accumulation Drift

Unlike a single PII disclosure event, accumulation drift is invisible in any individual turn — it only becomes a compliance exposure at the session or system level. An agent that handles customer support tickets may receive a name in turn 3, an address in turn 7, and a partial credit card number in turn 12. No single turn violated policy. The conversation history at turn 12 is a GDPR data collection event.

Controls: (1) PostToolUse hooks to strip PII from tool results before they enter context; (2) the scratchpad pattern — explicit structured writes with declared fields replace uncontrolled context growth; (3) session-scoped memory with automatic disposal on session end. The scratchpad doesn't prevent accumulation — it makes accumulation auditable by routing all writes through a declared structure.

Day 1 Deliverable

Write a 2-page analysis (500–750 words) comparing how MCP would change your current tool integration workflow. If you're using REST APIs, Zapier, or custom connectors to link your security tools, describe:

Current pain points in your integration
How MCP's standardized interface would address them
How tool discovery and metadata would improve your workflow
One security consideration you'd need to evaluate

Day 2 — Hands-On Lab

Lab Objectives

Build a functional MCP server that exposes a security tool (CVE database lookup)
Connect the MCP server to Claude Code via stdio transport
Interact with the agent using natural language to query the tool
Understand the tool discovery, request/response cycle, and error handling in MCP
Document the server, validate inputs, and measure performance

Part 1: Set Up Your MCP Development Environment

Prerequisites:

Python 3.11+ or Node.js 18+
Claude Code installed and authenticated
A public CVE database API key (NVD API is free; no key required for basic queries)

What You're Wrapping — The NVD API

Your MCP server is a structured interface in front of the NIST National Vulnerability Database (NVD) REST API. Before building the wrapper, understand the underlying service:

Endpoint: https://services.nvd.nist.gov/rest/json/cves/2.0
Query by CVE ID: append ?cveId=CVE-2023-44487 — returns CVSS score, severity, affected products, references
Query by keyword: append ?keywordSearch=openssl&cvssV3Severity=CRITICAL
Auth: unauthenticated = 5 req/30 sec; free API key (register at nvd.nist.gov) = 50 req/30 sec
Docs: nvd.nist.gov/developers/vulnerabilities

The MCP layer adds: input validation (CVE format check before the call), schema enforcement (structured output the agent can reason over), error handling (rate limits, unreachable API), and the natural language interface. The NVD API does the actual data retrieval — your MCP server makes it agent-accessible.

Install MCP SDK and dependencies:

If using Python:

pip install mcp
pip install httpx  # for API requests
pip install pydantic  # for data validation

If using Node.js:

npm init -y
npm install @modelcontextprotocol/sdk
npm install axios  # for API requests

Pro Tip: Start with the Lab Setup Guide to avoid environment issues. MCP SDK frequently updates; verify you're on version 0.8.0 or later.

Part 2: Design the CVE Lookup Tool

Your MCP server will expose a single tool: query_cve. Here's the tool schema:

Input Schema:

{
  "type": "object",
  "properties": {
    "cve_id": {
      "type": "string",
      "description": "CVE identifier in format CVE-YYYY-NNNNN (e.g., CVE-2023-44487)",
      "pattern": "^CVE-[0-9]{4}-[0-9]{4,6}$"
    },
    "include_remediation": {
      "type": "boolean",
      "description": "Whether to include remediation steps (default: true)",
      "default": true
    }
  },
  "required": ["cve_id"]
}

Output Schema:

{
  "type": "object",
  "properties": {
    "cve_id": {"type": "string"},
    "published": {"type": "string", "format": "date-time"},
    "description": {"type": "string"},
    "severity": {"type": "string", "enum": ["critical", "high", "medium", "low", "unknown"]},
    "cvss_v3_score": {"type": "number", "minimum": 0, "maximum": 10},
    "affected_products": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "vendor": {"type": "string"},
          "product": {"type": "string"},
          "affected_versions": {"type": "array", "items": {"type": "string"}}
        }
      }
    },
    "remediation": {"type": "string"},
    "references": {"type": "array", "items": {"type": "string", "format": "uri"}}
  },
  "required": ["cve_id", "description", "severity"]
}

Common Pitfall: CVE databases have rate limits and uptime issues. Always implement retry logic and fallback responses. If the NVD API is unreachable, return a cached response or a user-friendly error message rather than crashing.

Part 3: Implement the MCP Server (Python Example)

Architecture Decision: MCP Server vs. API Client

An MCP (Model Context Protocol) server is middleware that: 1. Exposes tools to Claude via a standardized interface 2. Validates tool inputs using schemas (Pydantic models) 3. Calls external APIs (e.g., NVD for CVE data) 4. Returns structured responses back to Claude

A complete MCP server requires:

Async I/O (handling multiple concurrent tool calls)
Input validation (preventing malformed requests)
Error handling (timeouts, API errors, invalid inputs)
Tool registration (defining schemas so Claude knows what to call)

This is non-trivial infrastructure. In production, you'd deploy it. But for learning?

🔑 Key Concept: Building a real MCP server teaches you infrastructure thinking. But the logic of the server (fetch CVE from NVD, parse CVSS, extract affected products) is the same whether you use MCP, FastAPI, or direct API calls.

Claude Code Workflow: Design the Server Before You Finish the Build

You are still building a real MCP server in the lab. Use Claude Code here to reason through the server architecture before or during implementation, not instead of implementation:

Claude Code Prompt:

Teach me how an MCP server would work for CVE lookups. Walk through the architecture:

1. INPUT VALIDATION: If a user asks for "CVE-2023-44487", how would we validate it's a real CVE format?

2. EXTERNAL API CALL: Write pseudocode for calling the NVD API to fetch details. What parameters? What response shape?

3. PARSING: Given an NVD response JSON (I'll provide an example), extract:
   - CVSS v3 score
   - Severity (critical/high/medium/low based on score)
   - Affected products (vendor, product, versions)
   - References

4. RESPONSE FORMATTING: Shape the parsed data into a structured response a Claude agent would understand.

5. ERROR HANDLING: What could go wrong? (Timeout, malformed CVE ID, API rate limiting, network error)

Then show me: If Claude asked for CVE-2023-44487, what would the MCP server return?

[Include a sample NVD API response here so Claude can work with real data]

After Claude explains the architecture:

Ask:

"Show me the validation schema (Pydantic model) for a CVE query"
"Write the async function that calls NVD and parses the response"
"How would you implement error handling for timeouts?"
"What JSON schema would Claude see so it knows what parameters to pass?"

Why use Claude Code during the build:

Design the logic before you write infrastructure code
See the entire flow (validation → API call → parsing → response) clearly before debugging implementation details
Iterate on schemas, parsing rules, and failure handling faster
Use the prompt outputs as scaffolding for the real server you are building in the Week 5 lab
Reduce blind trial-and-error when you move into async code and MCP registration

Pro Tip: Treat Claude Code as your design and debugging partner. First design the tool flow in natural language, then translate it into the real Python/MCP server from the lab. The implementation is still required.

When to Build a Real MCP Server:

Build an actual MCP server when:

You have repeated tool calls (worth the infrastructure cost)
You need persistent state (caching, rate limiting)
You're deploying to production (security, reliability, monitoring)
You're integrating with proprietary APIs (credential management)

Don't build one when:

You're learning (use Claude Code)
Tool calls are infrequent (overhead not justified)
You're prototyping (iteration is slow with infrastructure)

Remember: The register_tool() call defines the tool's interface. The agent will read this schema to understand what parameters to send and what to expect back. Always include clear descriptions and constraints in your schemas.

Part 4: Connect to Claude Code

Create a claude_code_config.json:

{
  "mcpServers": {
    "cve-lookup": {
      "command": "python",
      "args": ["cve_mcp_server.py"]
    }
  }
}

Then register your MCP server with Claude:

claude mcp add cve-lookup -- python cve_mcp_server.py

Pro Tip: If your MCP server crashes, Claude Code's stdio transport will cleanly disconnect and log the error. Always run your server with logging enabled during development so you can debug communication issues.

Part 5: Test with the Agent

In Claude Code, interact with the agent:

I'd like you to help me check a few CVEs. First, what is CVE-2023-44487?

The agent should: 1. Discover the query_cve tool 2. Parse the CVE ID from your question 3. Call query_cve with {"cve_id": "CVE-2023-44487"} 4. Receive the JSON response 5. Summarize the findings in natural language

Try these follow-up queries:

"Are there any critical vulnerabilities in Apache Log4j released in 2023?"
"Find the CVSS score for CVE-2021-44228 and tell me the affected products."
"What recent vulnerabilities affect OpenSSL?"

Measure the response time and accuracy. Note whether the agent correctly parses CVE IDs and filters by date or severity.

Common Pitfall: The NVD API has rate limits (~60 requests per minute for unauthenticated access). If you hit rate limits, implement exponential backoff or cache responses locally. Your MCP server should return a clear error message rather than hanging.

Part 6: Error Handling and Edge Cases

Add error handling for:

Malformed CVE IDs — "CVE-2023-INVALID" should reject with schema validation error
Non-existent CVEs — Return a graceful "not found" message
API timeouts — Return a timeout error with retry guidance
Rate limiting — Detect HTTP 429 and wait before retrying

Test each error case and document how your server handles it.

Deliverables

MCP Server Code (Python or Node.js)
- Well-commented, with logging
- Input validation using schemas
- Error handling for all identified edge cases
- Deploy instructions (dependencies, how to run)
Tool Schema Documentation
- Input and output JSON schemas
- Example request/response
- Error codes and recovery steps
Performance Report
- Time to first CVE lookup: _____ ms
- Time for 5 sequential queries: _____ ms
- Timeout rate under normal load: _____ %
- Any API rate-limiting issues encountered: _____
Demo Video or Walkthrough (3–5 minutes)
- Show the MCP server starting
- Show Claude Code connecting and discovering tools
- Show the agent answering 3–5 natural language queries
- Demonstrate one error case and recovery

Sources & Tools

Week 6: Tool Design Patterns for Security Agents

Day 1 — Theory & Foundations

Learning Objectives

Apply the five core principles of secure tool design to real-world security scenarios
Evaluate input validation strategies and their effectiveness against injection attacks
Design least-privilege access controls for agent-tool communication
Analyze tool versioning and deprecation strategies in compliance-critical systems
Create testable, observable tool interfaces

The Five Pillars of Secure Tool Design

Over the past two years, security teams have learned painful lessons about agent-tool integration. A poorly designed tool can:

Allow agents to execute unintended queries (SQL injection through a log query tool)
Expose sensitive data that agents shouldn't access
Bypass rate limits and cause denial-of-service
Leave no audit trail (critical for compliance)
Fail unpredictably and require human intervention

Modern secure tool design rests on five interdependent principles:

1. Single Responsibility — Each tool does exactly one thing. A "security operations" tool that can query logs, enrich alerts, check IP reputation, and remediate threats is unmaintainable and difficult to audit. Instead, build four separate tools, each with a narrow scope. This makes it easier to test, audit, and revoke access to specific functionality.

2. Clear Schemas — Input and output must be unambiguous and machine-readable. A tool that accepts "anything goes" parameters (e.g., a bare string like log_query="SELECT * FROM events WHERE..."``) is vulnerable to injection. Instead, define strict JSON schemas with parameterized inputs. A log query tool should have discrete parameters:source_type,time_range,event_type,limit`, not a free-form query field.

3. Error Handling — Tools must fail gracefully. When something goes wrong, provide:

A clear error code (not a stack trace)
Context about what failed and why
Remediation steps if applicable
No sensitive information in error messages (don't leak internal paths or credentials)

4. Security Boundaries — Enforce least privilege at every level:

Input validation: Reject invalid requests before processing
Rate limiting: Prevent agents from hammering a tool
Data access scoping: Limit which records a tool can query
Temporal scoping: Some sensitive tools should only be available during business hours
Audit logging: Every call is logged with timestamp, requester, parameters, and result

5. Observability — Every tool invocation must be observable and auditable. This is non-negotiable for compliance. Log:

Who called the tool (agent identity)
What parameters were passed
How long it took
What the response was
Any errors or exceptions

🔑 Key Concept: "Security by default" means designing tools with the assumption that agents will, eventually, misuse them (whether through adversarial prompting or genuine mistakes). Make the secure path the easy path. This is the Pit of Success from Agentic Engineering practice—design your tools so that the right behavior (least privilege, validated input, audit-logged operations) is not just safe but the easiest path for the agent to follow.

Further Reading: See the Agentic Engineering additional reading on tool design for deeper discussion of tool design principles and the Pit of Success pattern.

The Service Layer Pattern: API First, MCP Second

Service Layer Pattern

Build your core logic as a standard REST API first. Then treat MCP as a thin translation layer that maps agent tool calls to your API endpoints. This keeps your business logic testable, reusable, and independent of any AI framework.

The MCP server describes what the agent can do — the API defines how it actually gets done. Changes to the AI interface (MCP) don't require touching business logic, and vice versa. This separation is where production governance hooks live in Unit 7.

In production environments, the most resilient tool architectures follow a critical principle: build the REST API first, then layer the MCP server on top as a consumption client. This pattern decouples core business logic from the AI integration layer.

The Architecture:

Core Business Logic
      ↓
FastAPI/Flask REST API (with auth, rate limiting, structured responses)
      ↓
    ┌─────────────────────────────────┐
    ├── MCP Server (wrapper)          │
    ├── Web Dashboard (client)         │
    ├── CLI Tool (client)              │
    └── CI/CD Pipeline (client)        │

Why This Matters:

Separation of Concerns: Security logic lives in the API. The MCP server is just a thin translation layer between MCP tool calls and REST endpoints.
Multiple Consumers: The same API serves agents, dashboards, CI/CD integrations, and future protocols.
Production-Ready Security: The API becomes your deployable artifact. It containerizes cleanly with proper auth, rate limiting, and observability.
Future-Proof: When agent protocols evolve (MCP → next-gen standards), you swap the wrapper—not the logic.

Example: A Threat Intel Lookup Tool

Naive approach: build an MCP server that directly queries a database.

# ❌ Monolithic MCP server—logic and protocol tightly coupled
class ThreatIntelMCPServer:
    def query_threat_intel(self, ip_address: str):
        # Direct database access, no auth, no rate limiting
        return db.query(f"SELECT * FROM threats WHERE ip = {ip_address}")

Production approach: build the API first, then wrap it.

# Step 1: Build the REST API
from fastapi import FastAPI, Depends, RateLimiter
app = FastAPI()

@app.get("/api/v1/threat-intel/{ip_address}")
async def get_threat_intel(ip_address: str, api_key: str = Depends(verify_api_key)):
    # Validate input
    if not is_valid_ip(ip_address):
        raise HTTPException(status_code=400, detail="Invalid IP")
    # Rate limiting, auth already applied
    return {"ip": ip_address, "threat_level": query_db(ip_address), ...}

# Step 2: Wrap with MCP server
class ThreatIntelMCPServer:
    def query_threat_intel(self, ip_address: str):
        # Call the REST API—the MCP server is a thin client
        response = requests.get(
            f"http://localhost:8000/api/v1/threat-intel/{ip_address}",
            headers={"Authorization": f"Bearer {self.api_key}"}
        )
        return response.json()

The Key Insight: The API is the stable contract. Authentication, rate limiting, audit logging, and data validation happen at the API boundary. The MCP server doesn't need to know about any of that—it just calls the API like any other client. When you deploy, you containerize the API, push to ECR, and run it on ECS. The MCP server (or future protocols) calls it without modification.

Input Validation: The First Line of Defense

Consider a log query tool. The naive implementation:

def query_logs(query: str):
    # DANGEROUS: Query injection vulnerability
    return execute_sql(f"SELECT * FROM logs WHERE {query}")

An agent might be tricked into running:

query_logs("source='app' OR 1=1 --")

This returns all logs, exposing sensitive data.

The secure implementation uses parameterized queries:

def query_logs(
    source_type: str,
    time_range: str,  # "last_hour", "last_day", "last_week"
    event_type: str = None,
    limit: int = 100
):
    # Validate source_type against whitelist
    allowed_sources = ["app", "network", "database", "authentication"]
    if source_type not in allowed_sources:
        raise ValueError(f"Invalid source: {source_type}")

    # Validate and parse time_range
    time_map = {
        "last_hour": "-1h",
        "last_day": "-1d",
        "last_week": "-7d"
    }
    if time_range not in time_map:
        raise ValueError(f"Invalid time range: {time_range}")

    # Validate event_type
    if event_type:
        allowed_events = ["error", "warning", "info", "authentication", "data_access"]
        if event_type not in allowed_events:
            raise ValueError(f"Invalid event type: {event_type}")

    # Validate limit
    if not isinstance(limit, int) or limit < 1 or limit > 1000:
        raise ValueError(f"Limit must be between 1 and 1000")

    # Now build the parameterized query
    query = "SELECT * FROM logs WHERE source_type = ?"
    params = [source_type]

    if time_range:
        query += " AND timestamp > ?"
        params.append(time_map[time_range])

    if event_type:
        query += " AND event_type = ?"
        params.append(event_type)

    query += " LIMIT ?"
    params.append(limit)

    return execute_parameterized_sql(query, params)

This approach:

Whitelists allowed values for categorical parameters
Validates ranges for numeric parameters
Uses parameterized queries (never string interpolation)
Provides clear error messages

Discussion Prompt: Your security team has a tool that can block IP addresses. The MCP schema requires an IP address as input. What validation would you add to prevent an agent from accidentally blocking 0.0.0.0 or blocking your company's entire subnet?

Discussion (~13 min): The format() Shadowing — AI Assumptions Kill You

Setup: Show students these two patches:

# Patch 1
def y(self):
    return format(self.data.year, "04d")[-2:]

# Patch 2
def y(self):
    return '%02d' % (self.data.year % 100)

Both fix 2-digit year formatting in Django for years before 1000 CE. Are they equivalent? Most students will say yes — format(476, '04d') gives '0476', take last 2 chars, get '76'. Modulo 100 of 476 is 76. Same result. Ask: "What assumptions are you making about format()?"

Reveal: Django's dateformat.py has a module-level function also named format() — it expects a datetime object, not an integer. Python's name resolution finds the module-level function BEFORE the builtin. Patch 1 calls the wrong format() and raises AttributeError. Standard AI reasoning missed this because it assumed format() is the builtin. Semi-formal reasoning caught it because the template required tracing the actual function definition — the agent ran grep to find where format() was defined in the module.

Key insight: "The AI assumed format() means what it always means. In this codebase, it doesn't. How many of YOUR assumptions about standard library functions are actually correct in the specific codebase you're working in?" For security tools: your config auditor that checks for 'default passwords' assumes it knows what 'default' means. But the application might have a custom auth module. Your scanner gives a clean report. The application is vulnerable. Same pattern, different domain.

Course connection: Anti-patterns reference Pattern 1.4 (State Assumptions). AI assumes clean context; production has layers of overrides, shadowing, and framework magic. Reference this example whenever students make assumptions about what functions do without checking.

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

Least Privilege for Agent Tool Access

Even with perfect tool design, you must limit what agents can access. This involves several strategies:

1. Capability Scoping — An agent tasked with "triage alerts" doesn't need access to "modify firewall rules." Give it read-only access to logs and threat intelligence, not write access to security controls.

2. Data Scoping — An alert triage agent shouldn't access all historical logs—only logs from the past hour. A compliance auditor might need access to a broader set. Implement this at the tool level:

def query_logs(source_type: str, time_range: str = "last_hour", ...):
    # Some agents are allowed "last_week"; others only "last_hour"
    # This is enforced at the tool level based on agent identity
    if agent_role == "alert_triage":
        allowed_ranges = ["last_hour", "last_day"]
    elif agent_role == "compliance_auditor":
        allowed_ranges = ["last_hour", "last_day", "last_week", "last_month"]

    if time_range not in allowed_ranges:
        raise PermissionError(f"Agent {agent_id} cannot access time range {time_range}")

    # ... continue with query

3. Temporal Scoping — Sensitive tools (like credential rotation or firewall changes) might only be available during maintenance windows. Tools can return "not available" errors outside these windows.

4. Rate Limiting — Prevent an agent from hammering a tool:

from collections import defaultdict
from time import time

# Track calls per agent per minute
call_history = defaultdict(list)

def rate_limit_check(agent_id: str, max_calls_per_minute: int = 10):
    now = time()
    # Remove old calls (older than 1 minute)
    call_history[agent_id] = [t for t in call_history[agent_id] if now - t < 60]

    if len(call_history[agent_id]) >= max_calls_per_minute:
        raise RateLimitError(f"Agent {agent_id} has exceeded {max_calls_per_minute} calls per minute")

    call_history[agent_id].append(now)

Common Pitfall: Rate limiting without clear feedback is frustrating for users and can cause agents to retry ineffectively. Always return a clear error with guidance: "Rate limit exceeded. Try again in 45 seconds." Don't silently drop requests.

Policy must be in the tool, not trusted to the operator. A tool's scope constraints must be baked into its implementation — hardcoded allowed targets, maximum operation counts, required authorization checks. Relying on the operator (or the agent) to "use the tool responsibly" is not a security control. If ALLOWED_TARGETS is a frozenset in the tool code, it cannot be overridden at runtime. If it's a parameter the operator sets, it can be changed. Know the difference.

Tool Versioning and Deprecation

Security tools evolve. Your IP reputation API might improve its accuracy, your log query tool might add new filter options. How do you update tools without breaking agents that depend on them?

Semantic Versioning for Tools:

MAJOR version — Incompatible changes (removing a parameter, changing output type)
MINOR version — Backward-compatible additions (new optional parameter)
PATCH version — Bug fixes (no behavior changes)

Deprecation Strategy:

Release new MINOR version with new parameter, keeping old behavior
Document deprecation path in tool description: "The query parameter is deprecated as of v2.1. Use filters object instead."
Log deprecation warnings when agents use old parameters
Set deprecation timeline (e.g., "query parameter will be removed in v3.0, available until Jan 1, 2027")
Migrate gradually — Give agents and applications time to update

Example:

def query_logs_v2(
    source_type: str,
    time_range: str,
    filters: dict = None,  # NEW in v2.0
    query: str = None      # DEPRECATED, kept for compatibility
):
    if query is not None:
        logger.warning(f"query parameter is deprecated. Use filters instead.")
        # Parse query for backward compatibility
        filters = parse_legacy_query(query)

    if filters is None:
        filters = {}

    # ... rest of implementation

Testing Secure Tools

Secure tool design requires rigorous testing. A security tool test suite should include:

Input validation tests — Test every validation rule independently
Injection attack tests — Attempt SQL injection, command injection, etc.
Rate limit tests — Verify rate limiting works correctly
Authorization tests — Verify access control is enforced
Error handling tests — Verify error messages are safe and helpful
Audit log tests — Verify every call is logged correctly

Example test:

import pytest

def test_query_logs_rejects_invalid_source():
    """Test that invalid source types are rejected."""
    with pytest.raises(ValueError, match="Invalid source"):
        query_logs(source_type="invalid_source", time_range="last_hour")

def test_query_logs_sql_injection_protection():
    """Test that SQL injection is not possible through time_range."""
    with pytest.raises(ValueError):
        query_logs(source_type="app", time_range="'; DROP TABLE logs; --")

def test_query_logs_rate_limiting():
    """Test that rate limiting is enforced."""
    agent_id = "test_agent"
    for i in range(10):
        query_logs(agent_id, source_type="app", time_range="last_hour")

    # 11th call should fail
    with pytest.raises(RateLimitError):
        query_logs(agent_id, source_type="app", time_range="last_hour")

def test_query_logs_audit_logged():
    """Test that calls are logged."""
    with patch('audit_log.write') as mock_log:
        query_logs(source_type="app", time_range="last_hour")
        mock_log.assert_called_once()
        call_args = mock_log.call_args[0][0]
        assert "query_logs" in call_args
        assert "app" in call_args

Further Reading: See OWASP Application Security Testing Guide and Tool Design Patterns for detailed security testing methodologies.

Case Study: Operation GTG-1002 — When Attackers Build the Same Stack You're Learning

In November 2025, Anthropic published the full technical report on what they designated GTG-1002: the first documented case of AI-orchestrated cyber espionage executed at scale. This case study is required reading for this course because the attacker built exactly the architecture you're learning to build defensively—and every principle they violated is one your tools must enforce.

What Happened:

A Chinese state-sponsored threat actor (GTG-1002) built an autonomous attack framework using Claude Code and custom MCP servers. The framework decomposed complex multi-stage cyberattacks into discrete technical tasks for Claude sub-agents: vulnerability scanning, credential validation, data extraction, and lateral movement. Each task appeared legitimate when evaluated in isolation—the attacker presented them as routine technical requests through carefully crafted personas claiming to be employees of legitimate cybersecurity firms conducting authorized penetration testing.

The operation targeted roughly 30 entities—major technology corporations, financial institutions, chemical manufacturers, and government agencies. Anthropic's investigation validated a handful of successful intrusions.

The Architecture (Through Our Five Pillars Lens):

The attacker's MCP server architecture is a mirror image of what you built in Week 5—but with every secure design principle inverted:

1. Single Responsibility — Weaponized: The attacker did follow single responsibility. Each MCP server wrapped one commodity tool: a network scanner, a search tool, a data retrieval tool, a code analysis tool, an exploitation tool. This modularity is what made the attack hard to detect—each individual tool call looked like legitimate security work. The lesson: single responsibility is a design principle, not a security guarantee. Attackers use good architecture too.

2. Clear Schemas — Exploited for Deception: The attacker's tool schemas were clean and well-defined. They presented structured inputs (target IP, scan type, port range) that could pass any schema validation. The attack wasn't through malformed inputs—it was through correctly-formed inputs with malicious intent. This is why input validation alone is insufficient. You also need intent validation: Does this sequence of tool calls make sense for the agent's stated purpose?

3. Error Handling — Turned into Reconnaissance: When Claude's exploitation attempts failed, the error responses themselves became intelligence. A "connection refused" error on port 443 tells you there's no HTTPS service. A "timeout" tells you a firewall is filtering. The attacker's framework parsed errors to map the target's infrastructure. Defensive takeaway: your error messages must be designed with the assumption that an adversary is reading them.

4. Security Boundaries — Bypassed Through Social Engineering the Model: This is the most critical lesson. Claude is extensively trained to refuse harmful requests. The attacker's key innovation was role-play: they convinced Claude it was conducting authorized defensive security testing for a legitimate cybersecurity firm. The "social engineering" that works on humans also works on AI models. The attacker didn't break the security boundaries—they convinced the agent the boundaries didn't apply.

5. Observability — What Caught Them: Ultimately, it was operational tempo that triggered detection. The sustained request rates (thousands of requests, multiple operations per second) were physically impossible for human-directed operations. Anthropic's monitoring detected the anomalous pattern. The lesson: observability at the platform level caught what tool-level security missed.

🔑 Key Concept: GTG-1002 demonstrates that the tools and patterns you're learning (MCP servers, agent orchestration, multi-agent coordination) are dual-use. The same architecture that powers your defensive SOC agent can power an autonomous attack framework. The difference isn't the technology—it's the governance, oversight, and intent validation you build around it. This is why CCT matters: evidence-based analysis, inclusive perspective, and ethical governance aren't optional decorations—they're the difference between a security tool and a weapon.

The Hallucination Problem:

An important finding from the report: Claude frequently overstated findings and occasionally fabricated data during autonomous offensive operations—claiming to have obtained credentials that didn't work, or identifying "critical discoveries" that proved to be publicly available information. This AI hallucination in offensive security contexts presented real challenges for the attacker's operational effectiveness.

This is a powerful validation of CCT Pillar 1 (Evidence-Based Analysis) and Pillar 4 (Adaptive Innovation). Even the attacker had to deal with the fact that autonomous agents generate plausible-sounding results that require human verification. The attacker's 10-20% human involvement wasn't optional—it was necessary because the AI couldn't be trusted to validate its own outputs.

Discussion (~12 min): "The Agent Was Right for the Wrong Reasons"

Setup: A patch equivalence study evaluated whether agents correctly determine if two code patches produce the same behavior. The researchers acknowledged: "an agent might sometimes arrive at the right answer through flawed reasoning." They only checked the final YES/NO answer, not the reasoning chain.

Discussion prompt: Your CVE scanner reports "no vulnerabilities found." Is that a clean bill of health or a broken scanner? How do you tell the difference without running the scan again yourself? Push beyond "check the logs" — what if the logs say "scan complete, 0 findings"? Does that prove it actually scanned everything?

Key insight: A security tool that gives the right answer for the wrong reason will give the WRONG answer next time conditions change. The only way to trust a result is if the tool shows you WHAT it checked (premises), WHERE it looked (traces with file:line evidence), and HOW it concluded (formal derivation from evidence). This is a certificate. "0 vulnerabilities" isn't the output — "I checked these 47 files, traced these 12 data flows, tested these 5 input vectors, found no paths where untrusted input reaches sensitive operations" is the output. Without the certificate, you don't know if the scanner checked 47 files or 0 files. Both produce "0 vulnerabilities."

Course connection: AIUC-1 Domain A (Accountability). Every automated decision needs a traceable evidence chain. The certificate IS the audit trail. Every course skill produces a certificate, not just an answer. Return to this concept every time a student builds a tool that produces a bare result without evidence.

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

Discussion Prompt: GTG-1002 used persona-based prompting to bypass safety guardrails ("I'm a security researcher at a legitimate firm conducting authorized testing"). How would you design a tool-level defense that can distinguish between legitimate security testing and malicious use of the same tools? Is this even possible at the tool level, or does it require platform-level monitoring?

The Human-AI Split:

The report quantifies the operational split: AI executed 80-90% of tactical work independently, with humans serving in strategic supervisory roles (10-20%). Human operators made decisions at critical escalation points: approving the transition from reconnaissance to active exploitation, authorizing use of harvested credentials for lateral movement, and making final decisions about data exfiltration scope.

This mirrors—inversely—the architecture we teach in this course. Your defensive agents should also operate with human oversight at decision gates. The difference is that your gates exist to ensure responsible action, while the attacker's gates existed to direct malicious action. The pattern is identical; the intent is opposite.

Further Reading: Anthropic, "Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign," Full Report, November 2025. This is the GTG-1002 primary source document. See also the June 2025 "vibe hacking" findings that documented the earlier, less autonomous precursor to this operation. Both are available in the Reading List.

Design before you build: the /think → /build-spec → build workflow.

/think — use Claude Code's /think skill to surface your design reasoning: "What are the security implications of each tool I'm adding? What could go wrong?"
/build-spec — use /build-spec to capture your decisions in a spec document: interface contracts, error handling strategy, PoLP decisions, known gaps
Build — with the spec written, build to the spec; decisions already made don't slow you down

The spec is not bureaucracy — it's the record of your security design choices. When you audit your own system in Unit 3, your spec is Exhibit A.

Discussion (~10 min): The Template Changes BEHAVIOR, Not Just FORMAT

Setup: Meta's research found something unexpected: structured templates changed what the agent DOES, not just how it reports. Agents given semi-formal templates took 2.8x more steps — they read more files, traced more function definitions, and explored more code paths — because the template required evidence they could only get by doing more work.

Discussion prompt: What's the difference between asking someone to "write a security report" and asking them to: (1) state what each component does, (2) trace how data flows through the system, (3) identify where untrusted input reaches sensitive operations, (4) conclude with a finding for each gap, citing file and line. Is the second one just a format specification? Or does it change what work the person actually does?

Key insight: The person doing the second task HAS to read more code, trace more paths, and think more carefully — because the template requires evidence they can only produce through deeper investigation. This is why SKILL.md templates matter more than most people realize. The template structure determines the quality of the agent's REASONING, not just the quality of its output. A skill that says "analyze this code" produces surface-level work. A skill that says "state premises, trace execution, identify divergences, conclude with evidence" produces deep work — because the agent has to actually DO those things to fill in the template.

Course connection: Walk through the /build-spec SKILL.md structure. Each section requires the agent to do specific investigation work (evaluate approaches, select with justification, define components). The template isn't formatting — it's a work specification.

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

/code-review — The Post-Build Step Before You Commit

After building to your spec, run /code-review before committing. This is the ASSISTIVE gate — it's what separates "I built something" from "I built something ready to be audited."

Under the hood, /code-review dispatches 4 parallel review agents. Each examines your changes independently. Only findings with ≥80 confidence are reported — the threshold filters noise so you actually read the output. Findings are tagged: bug, security, performance, style, compliance.

/code-review                 # Review to terminal
/code-review --comment       # Post as PR comment (if on a branch)

Security framing: You are building security tools — tools that find vulnerabilities in other systems. /code-review checks for vulnerabilities you introduced while building the vulnerability-finder. The meta-security step that most developers skip.

Updated workflow for Week 6+:

/think → /build-spec → Build → /code-review → [ASSISTIVE gate ✓] → commit → /audit-aiuc1 (Week 9)

The confidence threshold (80) is configurable in the plugin's command file. Lower = more findings and more noise. Higher = fewer findings and potential misses. 80 is the course default — adjust after you understand your system's noise profile.

Discussion (~9 min): Same Model, Different Prompt, 10 Points Better

Setup: Patch equivalence accuracy improved from 78% to 88%. No fine-tuning. No new model. No additional training data. Same model. Different prompt template.

Discussion prompt: What does this tell you about where the leverage is in AI engineering? If you had one week to improve your security tool's accuracy, would you spend it trying a different model, or redesigning your prompt templates?

Key insight: The harness matters more than the model. The governance layer matters more than the raw capability. The prompt structure matters more than the prompt content. This is why this course focuses on harness design, skill templates, and evaluation methodology — not on which model is best this week. The model will change every quarter. The engineering principles are durable. Every component in a harness encodes an assumption about what the model can't do on its own. The semi-formal template encodes the assumption that "the model can't reliably gather evidence before concluding." When models improve enough to do this natively, the template becomes unnecessary overhead. Until then, it's 10 percentage points for free.

Course connection: Engineering Assessment Stack principle applied to prompting. Start with the simplest approach, measure, add structure where it improves results, remove structure when the model handles it natively.

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

Day 1 Deliverable

Design a security tool (your choice of domain: log querying, IP reputation, alert enrichment, credential rotation, etc.) and document:

Tool Specification — Name, description, single responsibility
Input Schema — All parameters with validation rules and constraints
Output Schema — Response structure
Security Boundaries — How you enforce least privilege, rate limiting, data access scoping
Error Handling — 5–10 documented error cases with responses
Testing Plan — 10–15 test cases covering validation, injection attempts, authorization, rate limiting
Audit Logging Strategy — What information is logged for every call?

(2–3 pages, ~800–1000 words)

Day 2 — Hands-On Lab

Lab Objectives

Build a multi-tool MCP server with proper input validation and error handling
Implement rate limiting and audit logging
Design and test security boundaries at the tool level
Compose multiple tools to solve a realistic security workflow
Measure and report on tool performance and security

Part 1: Architecture Overview

Your multi-tool server will expose three tools:

query_logs — Structured, parameterized log queries
check_ip_reputation — Query threat intelligence for IP addresses
enrich_alert — Combine log data and threat intel to provide rich context

These tools work together: an agent receives a raw alert, uses query_logs to fetch context, uses check_ip_reputation to assess the source IP, and uses enrich_alert to synthesize a final assessment.

Part 2: Implement the Multi-Tool Server

Understand the MCP Architecture for Multi-Tool Security Analysis

The key insight: Instead of writing monolithic tools, you build modular tools that work together:

Tool 1: query_logs - Fetch context from SIEM/logs based on filters
Tool 2: check_ip_reputation - Assess IP threat level from reputation databases
Tool 3: enrich_alert - Synthesize a complete assessment from multiple sources

An MCP server orchestrates these tools. Claude (the agent) decides which tools to call and in what order.

🔑 Key Concept: The value of MCP isn't the individual tools—it's the composition. An agent can call query_logs, then check_ip_reputation, then enrich_alert in sequence, each tool using outputs from previous ones.

Discussion (~10 min): Agentic vs Single-Shot — Why Exploration Matters

Setup: Meta tested two approaches. Single-shot: the model sees a code snapshot and reasons from that alone — 80–87% accuracy. Agentic: the model can explore the repository, follow imports, read test files, trace function definitions — 87–93% accuracy.

Discussion prompt: When you review a pull request, do you only look at the diff? Or do you also look at what the changed functions call, what tests exist, what other files import the changed code, and how the change fits into the broader architecture? Now think about your MCP servers. When your CVE lookup MCP server returns results, does the agent just take those results at face value? Or does it explore further — cross-reference with other data sources, check the configuration, verify the result makes sense in context?

Key insight: A security agent that only analyzes what's directly in front of it is a single-shot scanner. A security agent that EXPLORES — follows dependencies, queries related systems, checks configurations, traces data flows across components — is an agentic scanner. The paper proves the agentic approach catches things single-shot misses. This is WHY your MCP servers matter — they give the agent access to explore beyond its immediate context. A CVE lookup MCP server, a SIEM query MCP server, and a config database MCP server together let the agent build a richer understanding than any single data source provides. But also: more tools = larger attack surface. The tradeoff is real.

Course connection: Justifies the multi-MCP architecture students are building. Each MCP server extends the agent's exploration capability. More tools = more evidence = better reasoning = more accurate security assessments.

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

Why Not Just Write Python Functions?

You could write Python functions that do this:

def analyze_alert(alert_id):
    logs = query_logs(alert_id.source)
    ip_threat = check_ip_reputation(alert_id.source_ip)
    assessment = enrich_alert(logs, ip_threat)
    return assessment

But then you're orchestrating the logic in Python. With MCP, Claude orchestrates it. Why is that better?

Claude can adapt the flow based on intermediate results
You can compose tools from different sources (your SIEM, external threat intel, internal policies)
You don't need to rebuild logic when requirements change
Claude can explain its reasoning (which tools it used, why)

Claude Code Workflow: Design the Multi-Tool System

Instead of implementing a full MCP server, use Claude Code to design it:

Claude Code Prompt:

I'm building a security alert enrichment system with three tools:

1. query_logs(source_type, time_range, event_type, limit) - Returns log entries matching filters
   - Inputs: source_type (app|network|database|auth), time_range (last_hour|day|week), event_type (optional), limit (1-1000)
   - Output: Array of log entries with timestamp, message, severity

2. check_ip_reputation(ip_address) - Returns threat intel on an IP
   - Input: IP address
   - Output: {threat_level: low|medium|high|critical, reputation_score: 0-100, known_attacks: [...], false_positive_risk: 0-100}

3. enrich_alert(alert_id, logs, ip_threat) - Synthesizes a final assessment
   - Inputs: alert_id, logs from tool 1, threat data from tool 2
   - Output: {incident_severity: low|medium|high|critical, recommended_action: string, reasoning: string}

Walk me through how an agent would use these tools to analyze this alert:

ALERT: Unusual data access
- Source IP: 203.45.12.89
- User: admin@company.local
- Action: Downloaded 10 GB from customer database
- Time: 2026-03-05 14:22 UTC

The agent should:
1. Call query_logs to get context about this IP, this user, and this action
2. Call check_ip_reputation on the IP
3. Call enrich_alert with results from steps 1 and 2
4. Return the final assessment

Show me: What would the agent ask at each step? What outputs would it receive? How would it reason about them?

After Claude walks through the logic:

Ask:

"What validation would you add to prevent malicious inputs to query_logs?"
"How would you handle if check_ip_reputation times out?"
"What if the tools return conflicting signals (logs say suspicious, IP reputation says clean)?"
"How would an agent decide whether to escalate to human SOC?"

Key Takeaway:

MCP servers are infrastructure. The pattern of decomposing complex analysis into modular, composable tools is universal. You can learn that pattern in Claude Code before investing in building real infrastructure.

When you're ready to build a production MCP server, you'll: 1. Translate your Claude Code design to Python classes and async functions 2. Add error handling (timeouts, retries, circuit breakers) 3. Deploy the server alongside your SIEM/threat intel integrations 4. Monitor tool call patterns (which tools do agents use most? which fail most?)

But the hard part—deciding which tools to build and how they should interact—you figure out in Claude Code first.

Part 3: Test with Claude Code

Create test scenarios in Claude Code:

Scenario 1: IP Reputation Check

An alert came in: suspicious IP 198.51.100.89 was detected accessing the web server. Check the IP's reputation and tell me what you find.

The agent should: 1. Call check_ip_reputation with the IP 2. Receive high risk score and "scanner" threat type 3. Recommend blocking

Scenario 2: Alert Enrichment

We have an alert: Alert ID alert-001, source IP 203.0.113.42 attempted to access a protected API endpoint. Can you enrich this alert and tell me the severity?

The agent should: 1. Call enrich_alert with the details 2. Combine IP reputation and log data 3. Return overall severity and recommendation

Scenario 3: Multi-tool Investigation

I noticed some suspicious log activity from 198.51.100.89. Can you check that IP's reputation, find all logs from that IP, and give me a complete assessment?

The agent should compose multiple tools to provide a comprehensive answer.

Part 4: Validate and Test Security

Test input validation:
- Try query_logs with an invalid source_type (should be rejected)
- Try check_ip_reputation with malformed IP (should be rejected)
- Try enrich_alert with missing required fields (should be rejected)
Test rate limiting:
- Make 25 rapid calls to the same tool (should be rate-limited after 20)
- Verify error message includes guidance
Test audit logging:
- Run several tool calls
- Check audit.log file
- Verify all calls are logged with timestamp, agent_id, tool, parameters, and result
Test error handling:
- Request logs for a non-existent source type
- Check IP reputation for an IP not in the database
- Verify graceful error responses

Part 5: Performance Analysis

Measure: 1. Average response time per tool (query_logs, check_ip_reputation, enrich_alert) 2. Maximum response time 3. Rate of successful calls (what percentage succeed without errors)

Create a brief performance report:

| Tool | Avg Time (ms) | Max Time (ms) | Success Rate |
|------|---------------|---------------|--------------|
| query_logs | ___ | ___ | ___% |
| check_ip_reputation | ___ | ___ | ___% |
| enrich_alert | ___ | ___ | ___% |

Part 6: Red Team / Blue Team Exercise — Operation Forge Fire

This exercise applies the GTG-1002 case study from today's lecture. You'll build both sides of the equation: a simplified offensive reconnaissance agent AND a defensive detection agent, both targeting your own sandboxed infrastructure. You'll experience firsthand how the same tools and patterns serve offense and defense.

Common Pitfall: This exercise involves building tools that perform network reconnaissance. You will only target infrastructure you own and control (local Docker containers). Never run reconnaissance tools against systems you don't own. This is both a legal requirement and a professional ethics standard.

Step 1: Set Up Your Sandboxed Target Environment

Launch a local target environment using Docker Compose. This creates the isolated infrastructure your red team agent will scan and your blue team agent will defend:

mkdir -p ~/forge-fire-lab && cd ~/forge-fire-lab

cat > docker-compose.yml << 'EOF'
version: "3.8"
services:
  # Target web application
  webapp:
    image: nginx:alpine
    ports:
      - "8080:80"
    volumes:
      - ./webapp:/usr/share/nginx/html
    networks:
      - forge-net

  # Target database
  database:
    image: postgres:15-alpine
    environment:
      POSTGRES_USER: admin
      POSTGRES_PASSWORD: forge-lab-2026
      POSTGRES_DB: targets
    ports:
      - "5432:5432"
    networks:
      - forge-net

  # Simulated internal API
  api:
    image: python:3.11-slim
    command: python -m http.server 9000
    ports:
      - "9000:9000"
    networks:
      - forge-net

  # Log collector (your blue team monitors this)
  log-collector:
    image: python:3.11-slim
    command: python -m http.server 8888
    ports:
      - "8888:8888"
    volumes:
      - ./logs:/app/logs
    networks:
      - forge-net

networks:
  forge-net:
    driver: bridge
EOF

mkdir -p webapp logs
echo "<h1>Meridian Financial Portal</h1>" > webapp/index.html
docker-compose up -d

Step 2: Responsible AI Security Testing — Getting Claude's Approval

Before building offensive tools, you must properly frame your security testing context for Claude. This is a critical professional skill. The GTG-1002 attackers bypassed safety guardrails through deceptive role-play. You'll do the opposite: transparent, honest framing of legitimate security testing.

Create a file security-testing-policy.md:

# Responsible AI Security Testing Policy — Noctua Lab Exercise

## Authorization Scope
- **Target systems:** Local Docker containers ONLY (127.0.0.1, localhost)
- **Target ports:** 8080, 5432, 9000, 8888 (as defined in docker-compose.yml)
- **Authorized activities:** Port scanning, service enumeration, banner grabbing
- **Prohibited activities:** Exploitation, credential brute-forcing, data exfiltration
- **Duration:** This lab session only
- **Authorization:** Course instructor-approved lab exercise

## Ethical Boundaries
- All targets are locally owned and controlled infrastructure
- No external systems will be scanned or contacted
- All findings are for educational analysis only
- This exercise teaches defensive awareness through controlled offensive simulation

## Claude Usage Guidelines
- Clearly state the educational context in every prompt
- Never ask Claude to generate actual exploit code
- Focus on reconnaissance patterns, not exploitation
- Document all interactions for course submission

Now, when prompting Claude for offensive tool design, include this context:

I'm a graduate student in an AI Security Engineering course. I have a local Docker
environment with 4 containers (nginx on 8080, postgres on 5432, python http.server
on 9000 and 8888) that I set up for a lab exercise.

I need to build a simple MCP tool that performs service enumeration against these
LOCAL containers only (127.0.0.1). This is for a course lab exercise studying the
GTG-1002 case study — we're learning how attackers used MCP tools for reconnaissance
so we can build better defenses.

The tool should ONLY scan localhost and should refuse any non-local targets.
Can you help me design the tool schema and implementation?

🔑 Key Concept: Notice the difference between this prompt and what GTG-1002 did. The attacker created a false persona ("I'm a security researcher at a legitimate firm") to bypass guardrails. You're providing truthful context ("I'm a student in a lab exercise targeting my own infrastructure"). Transparent intent is not just ethically required—it produces better results because Claude can tailor its assistance to your actual needs rather than a fabricated scenario.

Step 3: Build the Red Team — Reconnaissance Agent

Using Claude, design and build a simple reconnaissance MCP tool that can only target localhost:

# red_team_recon.py — Reconnaissance MCP Tool (localhost only)
import socket
import json
from datetime import datetime

class ReconTool:
    """Simplified reconnaissance tool for educational security testing.
    HARD-CODED to localhost only — will refuse all other targets."""

    ALLOWED_HOSTS = ["127.0.0.1", "localhost", "0.0.0.0"]
    ALLOWED_PORTS = range(1, 65536)
    MAX_PORTS_PER_SCAN = 100  # Rate limiting

    def port_scan(self, target: str, ports: list[int]) -> dict:
        """Scan specified ports on localhost ONLY."""
        # SECURITY: Hard-coded localhost restriction
        if target not in self.ALLOWED_HOSTS:
            return {
                "error": "BLOCKED: This tool only scans localhost.",
                "target_requested": target,
                "policy": "Lab exercise tools are restricted to local targets."
            }

        if len(ports) > self.MAX_PORTS_PER_SCAN:
            return {"error": f"Too many ports. Maximum {self.MAX_PORTS_PER_SCAN}."}

        results = []
        for port in ports:
            try:
                sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
                sock.settimeout(1)
                result = sock.connect_ex(("127.0.0.1", port))
                status = "open" if result == 0 else "closed"
                results.append({"port": port, "status": status})
                sock.close()
            except Exception as e:
                results.append({"port": port, "status": "error", "detail": str(e)})

        return {
            "target": "127.0.0.1",
            "timestamp": datetime.utcnow().isoformat(),
            "ports_scanned": len(ports),
            "results": results,
            "open_ports": [r for r in results if r["status"] == "open"]
        }

    def banner_grab(self, target: str, port: int) -> dict:
        """Attempt to grab service banner from an open port on localhost."""
        if target not in self.ALLOWED_HOSTS:
            return {"error": "BLOCKED: This tool only targets localhost."}

        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.settimeout(3)
            sock.connect(("127.0.0.1", port))
            sock.send(b"HEAD / HTTP/1.1\r\nHost: localhost\r\n\r\n")
            banner = sock.recv(1024).decode("utf-8", errors="replace")
            sock.close()
            return {
                "target": "127.0.0.1",
                "port": port,
                "banner": banner[:500],  # Truncate for safety
                "timestamp": datetime.utcnow().isoformat()
            }
        except Exception as e:
            return {"target": "127.0.0.1", "port": port, "error": str(e)}

Use Claude to run the red team tools against your Docker targets:

Using the ReconTool, scan localhost ports 8080, 5432, 9000, and 8888.
Then banner-grab any open ports. Report what services you find and what
an attacker would learn from this reconnaissance.

Compare your findings to Phase 2 of the GTG-1002 report: "Reconnaissance
and attack surface mapping." What information did the attacker gather,
and how does our simplified scan compare?

Step 4: Build the Blue Team — Detection Agent

Now build the defensive side. Your blue team MCP tool monitors the same infrastructure and detects the red team's activity:

# blue_team_monitor.py — Detection MCP Tool
import json
import os
from datetime import datetime
from collections import defaultdict

class DetectionTool:
    """Blue team monitoring and anomaly detection tool."""

    def __init__(self):
        self.connection_log = []
        self.alert_threshold = 10  # connections per minute = suspicious
        self.baseline = {}

    def log_connection(self, source_ip: str, dest_port: int,
                       protocol: str = "TCP") -> dict:
        """Log an observed connection attempt."""
        event = {
            "timestamp": datetime.utcnow().isoformat(),
            "source_ip": source_ip,
            "dest_port": dest_port,
            "protocol": protocol,
            "event_type": "connection_attempt"
        }
        self.connection_log.append(event)
        return event

    def detect_port_scan(self, time_window_seconds: int = 60) -> dict:
        """Detect port scanning behavior in recent connection logs."""
        now = datetime.utcnow()
        recent = [e for e in self.connection_log
                  if (now - datetime.fromisoformat(e["timestamp"])).seconds
                     < time_window_seconds]

        # Group by source IP
        by_source = defaultdict(list)
        for event in recent:
            by_source[event["source_ip"]].append(event["dest_port"])

        alerts = []
        for ip, ports in by_source.items():
            unique_ports = len(set(ports))
            if unique_ports >= 4:  # Scanning 4+ ports = suspicious
                alerts.append({
                    "alert_type": "PORT_SCAN_DETECTED",
                    "severity": "high" if unique_ports >= 10 else "medium",
                    "source_ip": ip,
                    "unique_ports_scanned": unique_ports,
                    "ports": sorted(set(ports)),
                    "total_connections": len(ports),
                    "time_window": f"{time_window_seconds}s",
                    "recommendation": "Investigate source. Consider temporary block.",
                    "cct_note": "Apply Pillar 2: What would the network team say? "
                               "Is this a legitimate service health check?"
                })

        return {
            "scan_time": now.isoformat(),
            "events_analyzed": len(recent),
            "alerts": alerts,
            "status": "ALERT" if alerts else "CLEAR"
        }

    def analyze_reconnaissance_pattern(self, events: list) -> dict:
        """Apply CCT analysis to detected reconnaissance activity."""
        return {
            "pillar_1_evidence": {
                "description": "Raw observations from connection logs",
                "observations": [
                    f"{len(events)} connection attempts detected",
                    f"Unique ports targeted: {len(set(e['dest_port'] for e in events))}",
                    f"Time span: {events[0]['timestamp']} to {events[-1]['timestamp']}"
                        if events else "No events"
                ],
                "question": "Are these facts or could monitoring artifacts mislead us?"
            },
            "pillar_2_perspective": {
                "description": "Who else should weigh in?",
                "teams_to_consult": [
                    "Network team: Is this normal service discovery?",
                    "DevOps: Did someone deploy a health checker?",
                    "Application team: Are services self-registering?"
                ]
            },
            "pillar_3_connections": {
                "description": "What patterns emerge?",
                "questions": [
                    "Does the scanning pattern match known GTG-1002 TTPs?",
                    "Are the targeted ports consistent with data exfiltration?",
                    "What second-order effects if this IS an attack?"
                ]
            },
            "pillar_4_adaptive": {
                "description": "What would prove us wrong?",
                "hypotheses": [
                    "H1: This is an automated attack (evidence: rapid sequential scanning)",
                    "H2: This is a misconfigured monitoring tool (evidence: regular interval)",
                    "H3: This is an authorized penetration test (evidence: check with security team)"
                ],
                "falsification": "If connections are at regular intervals from a known "
                                 "monitoring IP, H2 is most likely."
            },
            "pillar_5_ethics": {
                "description": "Proportional response?",
                "considerations": [
                    "Blocking too aggressively may disrupt legitimate services",
                    "Not blocking fast enough may allow data exfiltration",
                    "Document all decisions for audit trail"
                ]
            }
        }

Step 5: Red vs. Blue — The Forge Fire Exercise

Now run both sides simultaneously. This is the core exercise:

Red team agent runs reconnaissance using the ReconTool against your Docker containers
Blue team agent monitors connections and runs detect_port_scan in real-time
You analyze the results through both lenses — what the attacker learned vs. what the defender detected

Use Claude to orchestrate both sides:

You are running a red team / blue team exercise. I have two tool sets:

RED TEAM (ReconTool): port_scan, banner_grab — targets localhost Docker containers
BLUE TEAM (DetectionTool): log_connection, detect_port_scan, analyze_reconnaissance_pattern

Run this exercise:
1. Red team: Scan ports 80, 443, 5432, 8080, 8888, 9000, 3306, 6379, 27017, 22 on localhost
2. Blue team: Log each connection attempt the red team makes
3. Blue team: Run port scan detection
4. Blue team: Run CCT analysis on detected activity
5. Compare: What did the red team learn? What did the blue team detect? What was missed?

Then answer: If this were a real GTG-1002-style attack, what additional detection
would you need? What would the attacker do next after this reconnaissance phase?

Step 6: Draft a Responsible AI Security Testing Policy

Based on what you've learned from the GTG-1002 case study and this exercise, draft a formal Responsible AI Security Testing Policy for your organization. This document should be something you could hand to a security team lead or a compliance officer.

Your policy should address:

Scope and Authorization — What systems can be tested? Who authorizes testing? How is authorization documented?
AI Model Usage — How should security testers frame requests to AI models? What is transparent vs. deceptive prompting? Where is the ethical line?
Tool Restrictions — What hard-coded restrictions should offensive tools have? (Target restrictions, rate limits, scope limits)
Monitoring and Oversight — How should AI-assisted security testing be monitored? What triggers a halt?
Documentation and Audit — What records must be kept? How are findings reported?
Lessons from GTG-1002 — What specific practices from the case study inform your policy? (persona abuse, autonomous escalation without human gates, hallucinated findings)

Remember: The GTG-1002 attacker's key innovation was deceptive framing — convincing Claude it was doing legitimate security work through false personas. Your policy must address: how does an organization distinguish between legitimate AI-assisted security testing and adversarial abuse of the same capabilities? The answer isn't just technical—it's procedural, cultural, and ethical.

Deliverables

Multi-Tool MCP Server Code (Parts 1–5)
- All three defensive tools with input validation, error handling, rate limiting
- Audit logging for all calls
- Well-commented, production-ready
Tool Schema Documentation
- Input schema for each tool
- Output schema for each tool
- Validation rules
- Error codes and messages
Red Team / Blue Team Lab Report (Part 6)
- Red team reconnaissance findings (what did your agent discover about the Docker environment?)
- Blue team detection results (what alerts fired? what was the CCT analysis?)
- Gap analysis: What did the red team learn that the blue team missed?
- Comparison to GTG-1002: How does your simplified exercise map to the real operation's Phase 2–3?
- Your Claude interaction log showing how you framed the security testing context
Responsible AI Security Testing Policy (1,000–1,500 words)
- Scope, authorization, tool restrictions, monitoring, documentation
- Specific lessons from GTG-1002 integrated into policy recommendations
- Clear position on transparent vs. deceptive prompting of AI models
Performance and Security Report
- Response times for each tool
- Rate limiting behavior
- Audit log samples from both red and blue team operations
- Security boundaries tested and results

Sources & Tools

MCP Specification
Pydantic Documentation (for schema validation)
Claude Agent SDK
OWASP Injection Prevention Cheat Sheet
Rate Limiting Patterns
Anthropic, "Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign," Full Report, November 2025
Anthropic, "Detecting and Countering AI-Enabled Cyber Threats" (Vibe Hacking findings), June 2025

Week 7: Structured Outputs & Security Reporting

Day 1 — Theory & Foundations

Learning Objectives

Distinguish between natural language and deterministic structured outputs in security contexts
Design JSON schemas for compliance-ready security reports
Implement output validation and constraint enforcement
Chain multiple Claude calls with different schemas to build complex analyses
Integrate AI-generated reports into downstream security systems (SIEM, SOAR, ticketing)

The Determinism Imperative in Security

Imagine a security analyst receives an alert and needs to decide: block this IP, log it, or ignore it? If the decision comes from an LLM's natural language response ("This IP might be a threat..." or "The risk is probably moderate..."), the analyst must interpret the uncertainty. Different analysts might interpret the same output differently, leading to inconsistent decisions.

Deterministic structured outputs solve this. Instead of natural language, the LLM returns strict JSON that downstream systems can parse and act on automatically:

{
  "alert_id": "alert-12345",
  "severity": "critical",
  "action": "block",
  "confidence": 0.92
}

Now a SOAR platform can automatically block the IP with 92% confidence, and an analyst can review the decision if confidence falls below 90%. No ambiguity.

🔑 Key Concept: In security, "precise" beats "conversational." Deterministic outputs enable automation, auditability, and compliance. They also reduce alert fatigue by eliminating vague threat assessments.

Structured Output Formats for Security

JSON for Machine Readability — JSON is the lingua franca of modern security tooling. It's easy for machines to parse and validate, and it integrates natively with APIs, databases, and workflow engines.

CVSS for Severity — The Common Vulnerability Scoring System (CVSS) provides a standardized way to rate the severity of vulnerabilities. Rather than inventing custom severity scales, use CVSS v3.1:

CVSS Base Score: 0.0–10.0
9.0–10.0: Critical
7.0–8.9: High
4.0–6.9: Medium
0.1–3.9: Low

MITRE ATT&CK for Threat Classification — MITRE ATT&CK is a knowledge base of adversary tactics and techniques based on real-world observations. Rather than creating custom threat taxonomies, map findings to ATT&CK:

{
  "mitre_attack": {
    "tactic": "Reconnaissance",
    "technique": "Active Scanning",
    "technique_id": "T1595"
  }
}

STIX/TAXII for Threat Intelligence — STIX (Structured Threat Information Expression) is an XML/JSON format for sharing threat intelligence. Tools like Splunk, Cortex XSOAR, and VirusTotal speak STIX natively, so AI-generated threat reports can flow directly into these platforms.

Designing Compliance-Ready JSON Schemas

A compliance-ready security report must include:

Metadata — Alert ID, timestamp, analyst/system
Assessment — What happened, severity, confidence
Evidence — What data supports the assessment
Recommendations — What action to take
Audit Trail — Who analyzed it, when, and why

Example schema:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "title": "Security Alert Assessment",
  "properties": {
    "alert_id": {
      "type": "string",
      "description": "Unique identifier for the alert",
      "pattern": "^alert-[0-9]+$"
    },
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "ISO 8601 timestamp of assessment"
    },
    "severity": {
      "type": "string",
      "enum": ["critical", "high", "medium", "low"],
      "description": "Alert severity"
    },
    "confidence": {
      "type": "number",
      "minimum": 0,
      "maximum": 1,
      "description": "Confidence level (0.0–1.0)"
    },
    "threat_classification": {
      "type": "object",
      "properties": {
        "attack_type": {"type": "string"},
        "mitre_tactic": {"type": "string"},
        "mitre_technique": {"type": "string"},
        "mitre_technique_id": {"type": "string", "pattern": "^T[0-9]{4}$"}
      }
    },
    "evidence": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "source": {"type": "string"},
          "timestamp": {"type": "string", "format": "date-time"},
          "description": {"type": "string"}
        }
      }
    },
    "recommended_actions": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "priority": {"type": "string", "enum": ["immediate", "urgent", "standard"]},
          "action": {"type": "string"},
          "rationale": {"type": "string"}
        }
      }
    }
  },
  "required": ["alert_id", "timestamp", "severity", "confidence", "evidence", "recommended_actions"]
}

Common Pitfall: Schemas that are too loose (allowing any string for severity, for example) defeat the purpose of structuring outputs. Enforce constraints with enums, patterns, and ranges. A severity field should accept only "critical", "high", "medium", "low"—not free-form text.

Chaining Claude Calls with Different Schemas

Complex security analyses often require multiple steps:

Step 1: Classify the alert (threat type, MITRE mapping)
Step 2: Enrich with context (affected systems, timeline)
Step 3: Generate recommendations (actions, priority)

Rather than asking Claude to do all three in one call, chain calls with different output schemas:

# Step 1: Threat classification
classification = claude_call(
    prompt=f"Classify this alert: {alert_data}",
    output_schema=threat_classification_schema
)

# Step 2: Enrichment
enrichment = claude_call(
    prompt=f"Enrich this alert with context: {alert_data}. Classification: {json.dumps(classification)}",
    output_schema=enrichment_schema
)

# Step 3: Recommendations
recommendations = claude_call(
    prompt=f"Generate security recommendations for this alert: {json.dumps({**classification, **enrichment})}",
    output_schema=recommendations_schema
)

# Combine results
final_report = {
    **classification,
    **enrichment,
    **recommendations
}

This approach:

Lets Claude focus on one task per call (better quality)
Allows intermediate validation (if classification is nonsense, stop)
Enables reuse (can use the classification in multiple downstream analyses)
Facilitates debugging (each step has clear input/output)

Pro Tip: Claude is better at complex reasoning when you break tasks into steps with clear schemas at each step. Instead of "analyze this alert and give me everything," try "first classify the alert type," then "then determine severity," then "finally recommend actions."

Pattern: Claude Calling Claude. When your MCP tool needs to analyze, classify, or generate structured content, it can call the Claude API internally. This creates a composition: one Claude session orchestrates (the user-facing agent), another Claude call specializes (the tool-level reasoning task).

Design principles for this pattern:

Role separation — the orchestrating Claude handles conversation; the tool-level Claude handles a single focused task (classification, extraction, report generation)
Model selection — use a faster/cheaper model for the tool-level task if the task is well-defined; reserve Sonnet for complex reasoning
Context isolation — the tool-level Claude call should have only the data it needs; don't pass the full conversation history into a tool call

Pattern: Schema-Driven Decomposition. When a complex report requires multiple reasoning tasks (classification → enrichment → recommendation), decompose into sequential Claude calls, each with its own focused schema.

Why this works:

Each call is a narrow reasoning task with a clear validation target
Validation gates between calls prevent bad data from propagating
A failed validation at step 2 stops step 3 — no recommendations generated from bad enrichment data

When validation fails repeatedly (>2 retries), treat it as a signal of schema-model mismatch — your schema may be too strict, too ambiguous, or asking the model to reason about something it can't reliably output in that structure. Adjust the schema, not the retry count.

Integration with Security Infrastructure

SIEM Integration (Splunk Example)

A SIEM can ingest structured reports and create dashboards:

import splunk_sdk

# Generate structured report
report = generate_security_report(alert)

# Send to Splunk
splunk_client.ingest(
    source="ai_security_agent",
    sourcetype="security_alert_assessment",
    event=json.dumps(report)
)

SOAR Integration (Cortex XSOAR Example)

A SOAR platform can consume structured assessments and orchestrate responses:

# If confidence > 0.9 and severity == "critical", execute incident response playbook
if report["confidence"] > 0.9 and report["severity"] == "critical":
    xsoar_client.execute_playbook(
        playbook="incident_response_critical",
        incident_data=report
    )

Ticketing Integration (Jira/ServiceNow Example)

Create tickets automatically from reports:

ticket = jira_client.create_issue(
    project="SEC",
    issue_type="Security Incident",
    summary=report["summary"],
    description=json.dumps(report, indent=2),
    priority=priority_map[report["severity"]],  # Map severity to Jira priority
    components=["SecOps"]
)

Validation and Quality Assurance

Every structured output must be validated against its schema:

import jsonschema

def validate_security_report(report, schema):
    try:
        jsonschema.validate(instance=report, schema=schema)
        return True, None
    except jsonschema.ValidationError as e:
        return False, str(e)

# Use it
is_valid, error = validate_security_report(claude_output, report_schema)
if not is_valid:
    logger.error(f"Report validation failed: {error}")
    # Handle invalid output (re-prompt, fallback, etc.)

A robust system should: 1. Always validate outputs against schemas 2. Log validation failures for debugging 3. Retry on validation failure (sometimes re-prompting Claude with the error fixes it) 4. Have a fallback if validation repeatedly fails (e.g., escalate to human analyst)

Further Reading: JSON Schema Best Practices and CVSS v3.1 Specification.

Include model attribution in structured outputs. Every agent output that goes downstream — to another system, to a human reviewer, to an audit log — should include metadata about how it was produced:

Which model classified/analyzed the data
Which model generated the output
Timestamp and duration
Invocation chain (what called what)

This is not logging overhead — it's the audit record. When a recommendation is wrong, you need to know whether the classification was wrong, the enrichment was wrong, or the output generation was wrong. Without attribution, you're debugging blind.

Bulk Analysis — The Message Batches API

When structured reporting is applied at scale — processing thousands of alerts, scanning a CVE backlog, or generating compliance reports for an entire infrastructure — sequential API calls are too slow and expensive. The Message Batches API lets you submit up to 10,000 requests in a single batch and process them asynchronously, with a 50% cost reduction compared to standard API calls.

Security use case: bulk alert triage

import anthropic

client = anthropic.Anthropic()

# Build one request per alert
alert_requests = []
for alert in alerts:  # alerts is a list of raw alert dicts
    alert_requests.append({
        "custom_id": alert["alert_id"],  # correlate results back to source
        "params": {
            "model": "claude-haiku-4-5-20251001",  # fast/cheap for bulk classification
            "max_tokens": 256,
            "system": "Classify this security alert as: false_positive | low | medium | high | critical. Return JSON with fields: classification, confidence (0.0-1.0), reason (one sentence).",
            "messages": [{"role": "user", "content": str(alert)}]
        }
    })

# Submit the batch
batch = client.beta.messages.batches.create(requests=alert_requests)
print(f"Batch ID: {batch.id} — {len(alert_requests)} alerts submitted")

# Poll until complete (or use a webhook in production)
import time
while batch.processing_status == "in_progress":
    time.sleep(60)
    batch = client.beta.messages.batches.retrieve(batch.id)

# Collect results
for result in client.beta.messages.batches.results(batch.id):
    if result.result.type == "succeeded":
        alert_id = result.custom_id
        classification = result.result.message.content[0].text
        # Parse JSON and route to SOAR/SIEM

Assistant prefill is a complementary technique for enforcing structured output. You can pre-populate the start of Claude's response with an opening brace, forcing it to complete a JSON object:

messages = [
    {"role": "user", "content": "Classify this alert: " + alert_text},
    # Prefill the assistant turn to force JSON output — Claude must continue this
    {"role": "assistant", "content": "{"}
]

The model receives its own partial response as context and will complete the JSON structure. Combined with Batches API, this ensures every batch result starts valid JSON, reducing parse failures in automated pipelines. Use prefill when schema compliance is non-negotiable; use standard structured output prompting when you need Claude to reason before formatting.

Day 1 Deliverable

Design a complete structured reporting system for a security domain of your choice (breach detection, vulnerability triage, incident response, threat hunting, etc.):

JSON Schema — Define the report structure with all required fields, enums, ranges, and validation rules (2–3 pages)
Chaining Strategy — Describe how you would break down the analysis into 2–3 Claude calls with different output schemas
Integration Plan — Describe how this report would integrate with SIEM, SOAR, or ticketing systems
Validation Strategy — How you would validate outputs and handle failures
Example Report — Provide a sample output that conforms to your schema

(3–4 pages, ~1200–1500 words)

Day 2 — Hands-On Lab

Lab Objectives

Build an automated security report generator using Claude API with structured outputs
Implement JSON schema validation for all outputs
Chain multiple Claude calls with different schemas
Integrate structured reports into a downstream system (mock SIEM/ticketing)
Measure accuracy and completeness of generated reports

Part 1: Set Up Report Generation Infrastructure

Report Generation Architecture

The problem: Manually writing security reports is:

Time-consuming (2-4 hours per incident)
Inconsistent (different analysts write differently)
Hard to standardize (no fixed structure)
Error-prone (human typing mistakes)

The solution: Use Claude to generate reports from raw incident data.

But "generate reports" is vague. You need:

Structure: Fixed schema (incident summary, timeline, impact, recommendations)
Consistency: Same format every time
Accuracy: Grounded in actual incident data, not hallucinated
Compliance: Meets regulatory requirements (GDPR, SOC 2, etc.)

🔑 Key Concept: Claude can write better prose than most humans. But you need to constrain it with schemas and context to ensure accuracy and consistency.

The Three Components of Report Generation:

Input: Raw incident data (logs, alerts, forensics results)
Processing: Claude reads input, applies a schema, generates structured output
Output: Final report (markdown, PDF, email, database entry)

Claude Code Workflow: Building a Report Template

Instead of writing Python infrastructure, use Claude Code to design the report:

Claude Code Prompt:

Design a security incident response report template with these properties:

1. STRUCTURE (Fixed sections that every report has):
   - Executive Summary (1 paragraph, <100 words)
   - Incident Timeline (bullet list, reverse chronological)
   - Technical Analysis (evidence-based, not speculative)
   - Impact Assessment (scope of compromise)
   - Containment Actions (what we did to stop it)
   - Remediation Steps (longer-term fixes)
   - Detection Recommendations (how to catch this in future)
   - Lessons Learned (what we got wrong, what we fixed)

2. SCHEMA (Structured data that machines can parse):
   - incident_id: String
   - detection_timestamp: ISO timestamp
   - containment_timestamp: ISO timestamp
   - mitti_minutes: Number (time to investigate)
   - mtts_minutes: Number (time to suppress)
   - affected_systems: [String]
   - affected_users: [String]
   - data_exposed: Bool
   - exposure_count: Number (how many records/accounts/files)
   - root_cause: String
   - severity: LOW|MEDIUM|HIGH|CRITICAL
   - recommended_actions: [{action: String, priority: HIGH|MEDIUM|LOW, owner: String}]

3. CONTENT REQUIREMENTS (What makes a good report):
   - Grounded in evidence (cite logs, not speculation)
   - Transparent about uncertainty ("We don't know X, recommend investigating Y")
   - Actionable (recommendations should be implementable)
   - Auditable (clear decision trail that regulators would accept)

Now, given this raw incident data, write a report:

[Include raw incident data: logs, alerts, forensics, timeline]

Generate output in JSON + Markdown:
- JSON section contains structured data (schema above)
- Markdown section contains prose (narrative, timeline, analysis)

Include: Which sections are high-confidence (well-supported by evidence)? Which are low-confidence (need more investigation)?

After Claude generates a report:

Ask:

"What data would you need to increase confidence in the root cause?"
"For the affected_users count, how did you determine it?"
"The recommended_actions—are they ordered by urgency or impact?"
"If I disagree with the severity rating, what evidence would change your mind?"

Iterative Refinement:

The power of Claude Code: You can refine the report in real-time.

"Rewrite the executive summary for a technical audience (CTO) vs. non-technical (CEO)"
"Add a section: 'Questions still open' with investigation steps"
"Format the timeline to show the interval between each event"
"Add compliance implications: How does this affect SOC 2 certification?"

Why Claude Code instead of building Python infrastructure:

With Python (structured outputs, JSON schemas, report generation), you need to:

Set up dependencies
Define Pydantic models
Handle parsing errors
Test edge cases
Debug mismatches between schema and actual Claude output

With Claude Code, you:

Iterate on structure immediately
See prose quality in real-time
Test different formats (markdown, JSON, HTML) in seconds
Refine content without redeploying

When to Graduate to Python:

Once you've iterated on the report design in Claude Code and you're happy with the structure, then you build Python infrastructure to:

Integrate with your incident management system
Automatically generate reports for every incident
Store reports in a database
Distribute via email/Slack
Track metrics (time to report, quality scores)

But the "what should the report contain and how should it be structured?" question—answer that first in Claude Code.

Remember: Each Claude call should be focused and have clear output expectations. By chaining calls with different schemas, you enable validation at each step and can retry individual steps if they fail validation.

Part 2: Test with Sample Alerts

Run the report generator with 5–10 sample alerts (provided or created):

python security_report_generator.py

Collect the generated reports and analyze: 1. How many reports passed validation on first try? 2. How many needed retry? 3. How accurate was the threat classification? 4. How complete was the enrichment?

Part 3: Integrate with Downstream Systems

Integration Architecture: From Report to Action

Once Claude generates a report, where does it go?

Options: 1. Email: Send to security leadership 2. SIEM: Ingest into your SIEM for correlation with other alerts 3. Ticketing: Create a Jira/ServiceNow ticket 4. Database: Store for compliance audit trail 5. Slack: Notify the SOC team

Each integration has different requirements:

SIEM needs structured fields (incident ID, severity, affected systems)
Email needs formatted prose (markdown, HTML)
Ticketing needs title + description + priority
Compliance database needs immutable record with signatures

Rather than building connectors now, design them first.

Claude Code Workflow: Integration Design

Claude Code Prompt:

I have a security incident report generated by Claude. It's in JSON + Markdown format:

{
  "incident_id": "INC-2026-0345",
  "severity": "HIGH",
  "affected_systems": ["app-prod-01", "db-backup-01"],
  "affected_users": 47,
  "root_cause": "Unpatched vulnerability in Jenkins",
  "recommended_actions": [...]
}

[Markdown report with full details]

I want to export this to multiple systems. Design how each system would ingest it:

1. **Splunk SIEM:**
   - What fields would you extract and send?
   - What format? (JSON, syslog, REST API?)
   - How would you handle missing fields?

2. **Jira ticketing:**
   - What should the ticket title be?
   - How to structure the description?
   - What custom fields (priority, component, due date)?
   - How to link related incidents?

3. **Compliance database:**
   - What immutable data must we store?
   - Who approves the incident? (audit trail)
   - How long to keep records? (retention policy)
   - How to ensure non-repudiation (nobody can deny it happened)?

4. **Email notifications:**
   - Different audience = different format?
   - CISO wants: 1-paragraph summary + risk
   - CTO wants: Technical details + remediation steps
   - Board wants: Business impact + regulatory implications

For each integration, show me: What data goes in? What format? What could go wrong?

Why design before building:

Integrations have operational impact:

Email to wrong person = data exposure
Jira ticket with wrong priority = misaligned response
SIEM ingestion without normalization = useless alerts
Compliance record without approval = audit failure

Design in Claude Code first. Test the logic. Then build the Python/API connectors.

Pro Tip: Most "integration failures" aren't code bugs—they're design mistakes. Wrong field mapped, wrong priority assigned, wrong person notified. Claude Code helps you catch these before code.

When to Build Integration Code:

Build actual integrations when:

You have a stable incident response process
You know your SIEM/ticketing system APIs
You have credentials/authentication set up
You want to automate (every incident triggers exports)

Don't build when:

You're still figuring out what data to capture
Your incident response process is changing
You're learning integrations (simulate them first)

Deliverables

Report Generator Code
- Complete implementation with three chained Claude calls
- Schema validation at each step
- Retry logic for validation failures
- Well-commented and documented
Schemas Documentation
- Threat classification schema
- Alert enrichment schema
- Recommendations schema
- Example inputs and outputs for each
Generated Reports
- 5–10 sample reports from real or realistic alert data
- All in valid JSON format
Validation Report
- How many reports passed validation on first try: ___
- How many required retry: ___
- Average Claude API calls per report: ___
- Any recurring validation issues: ___
Integration Demo
- Show reports being ingested into SIEM
- Show tickets being created in ticketing system
- CSV exports or log files showing integration

Sources & Tools

Week 8: Retrieval-Augmented Generation (RAG) for Security Knowledge

Day 1 — Theory & Foundations

Learning Objectives

Understand RAG architecture and how it improves LLM accuracy for domain-specific questions
Evaluate vector databases and embedding models for security applications
Design chunking strategies for security documents without losing critical context
Compare RAG, MCP tools, and fine-tuning for different security use cases
Implement source attribution for compliance and verification

Why RAG: The Accuracy Challenge

In Unit 1, you learned that LLMs have a knowledge cutoff. Claude's training data ends in February 2025—so it doesn't know about vulnerabilities disclosed in March 2026. Beyond the cutoff problem, LLMs also struggle with proprietary or domain-specific information: your company's security policies, your incident response procedures, your threat intelligence feeds.

A naive approach is to ask Claude directly: "What does our security policy say about password length?" The model might hallucinate a reasonable-sounding answer, which is catastrophically wrong in a compliance context.

Retrieval-Augmented Generation (RAG) solves this. Rather than relying on training data or hallucination, RAG:

Retrieves relevant documents from a knowledge base
Augments the prompt with those documents
Generates a response grounded in the retrieved documents

Now Claude answers: "Our security policy (Section 3.2, last updated Jan 15 2025) requires passwords of at least 16 characters for privileged accounts. [Document excerpt]." And you know the answer came from your actual policy, not an LLM hallucination.

🔑 Key Concept: RAG is not a replacement for LLMs; it's a pattern that augments LLM capabilities with retrieval. The LLM's role shifts from "know everything" to "synthesize retrieved information." This is a more honest and verifiable approach to AI in security.

When to use MCP, when to use RAG — the decision framework.

Use MCP when	Use RAG when
Data must be current (CVEs, live threat feeds)	Data is a curated static corpus (runbooks, policies)
Lookup is deterministic (ID → record)	Retrieval is fuzzy (concept → relevant chunks)
The source has an API	You own the documents
Precision matters more than coverage	Coverage matters more than precision

Common mistake: using RAG for CVE lookup. CVEs change — a RAG corpus goes stale. Hallucination risk is high when the model retrieves a near-match instead of the exact CVE. For CVE data, use the NVD API via MCP — deterministic, always current.

Context Engineering & the Capability Capacity Model

Beyond RAG, strategic context management is fundamental to agent reliability. The Capability Capacity Model from Agentic Engineering practice establishes that when an agent's context fills beyond 40%, performance degradation becomes measurable. This is not just a theoretical concern—it's a practical design constraint. An agent given 200K tokens of context, plus detailed tool definitions, plus system prompts, plus error examples, can lose accuracy precisely when you need it most: during high-stakes investigations.

Context engineering means:

Systematically structuring context in predictable formats (the ACE Playbook pattern)
Measuring context fill at each stage and staying below the 40% threshold
Prioritizing which context to include based on task relevance
Deferring auxiliary information to tool calls or retrieval systems rather than embedding it in every prompt

RAG is one mechanism; others include MCP tool servers, structured examples, and dynamic context selection.

Further Reading: See the Agentic Engineering additional reading on context engineering for the complete Capability Capacity Model and ACE Playbook format for organizing context systematically.

RAG Architecture: The Complete Pipeline

RAG consists of five stages:

1. Document Ingestion & Preprocessing

Raw documents (PDFs, markdown files, logs, threat reports) are loaded into the system. This might involve:

Extracting text from PDFs
Parsing structured documents (YAML, JSON)
Deduplicating and cleaning text
Removing sensitive information (personal data, credentials)

2. Chunking

Documents are split into manageable pieces. Too large chunks (entire documents) make retrieval inefficient. Too small chunks (single sentences) lose context. A typical security chunk is 300–800 tokens, roughly:

Example chunk from NIST SP 800-53:

"AC-2 ACCOUNT MANAGEMENT

Control: The organization manages information system accounts, including establishment,
activation, modification, review, deactivation, and removal...

Supplemental Guidance: Information system account management activities include:
(i) Identification of account type (e.g., individual, shared, system, guest/anonymous);
(ii) Establishment of conditions for group and role membership;
(iii) Assignment of access authorizations..."

Length: ~150 words, ~250 tokens

3. Embedding

Each chunk is converted into a high-dimensional vector (embedding) using an embedding model. Similar chunks have similar vectors. This enables semantic search: "What's our password policy?" matches documents about authentication even if they don't use the exact word "password."

Common embedding models for security:

Anthropic's Claude Embeddings — Optimized for longer documents (8K tokens)
OpenAI text-embedding-3-large — General-purpose, high-quality
Cohere embed-english-v3.0 — Good for compliance/legal text
Open-source (all-MiniLM-L6-v2) — Runs locally, no API calls

Embedding model selection is domain-specific. General-purpose embedding models (all-MiniLM-L6-v2, text-embedding-3-small) work well for most security content. For formal regulatory and compliance text (NIST, GDPR, legal language), domain-specific models like Cohere's embed-english-v3.0 often outperform general models on retrieval recall. The test: run retrieval eval on a sample of your actual corpus before committing to a model.

4. Storage & Indexing

Embeddings are stored in a vector database (also called a vector index), which enables fast similarity search. Popular choices:

Pinecone — Managed, scalable, low operational overhead
Weaviate — Open-source, flexible schema, good for security use cases
Chroma — Lightweight, great for prototypes, runs locally
Milvus — Open-source, high-performance, suitable for large deployments

A vector database stores millions of embeddings and can retrieve the 10 most similar to a query embedding in milliseconds.

Vector databases are a security boundary, not just a retrieval optimization. In production, the vector store becomes a hidden authority source for the agent: whatever is retrieved shapes what the model treats as relevant evidence. Four controls matter immediately: collection isolation (one tenant or trust domain must not semantically retrieve another's data), metadata enforcement (filters such as classification, region, and document owner must be applied before results are returned), ingestion provenance (every chunk should retain source, ingest time, and integrity context), and retention/deletion discipline (sensitive content embedded into vectors still counts as stored data). If you secure the prompt but not the vector store, your agent can still be steered by poisoned or unauthorized retrieval.

5. Retrieval & Augmentation

When an agent or user asks a question:

The question is embedded using the same embedding model
A similarity search finds the top-K most similar document chunks
These chunks are inserted into the prompt alongside the question
Claude answers based on the retrieved documents

Example RAG prompt:

You are a security policy assistant. Answer the user's question based on the provided documents.

RETRIEVED DOCUMENTS:
[Document 1: Policy section about password requirements]
[Document 2: Incident response procedure mentioning similar situation]
[Document 3: Compliance note linking to NIST controls]

USER QUESTION: What's our password policy for admin accounts?

INSTRUCTIONS: If the answer is in the documents, cite the source. If not, say "This is not covered in our policy documents."

Discussion Prompt: Your organization uses RAG to answer questions about security policies. A user asks, "Is it OK to share passwords with contractors?" RAG retrieves a document saying "Passwords may never be shared" but that document is from 2020 and your company updated this policy in 2024. How would you prevent this stale information from being used?

Securing retrieval means securing ranking, filtering, and freshness together. A secure RAG pipeline does not stop at "top-K similarity search." It should enforce metadata filters before ranking, prefer authoritative and current sources over merely similar ones, log which chunks were returned for every answer, and fail safely when the retrieved evidence is weak or contradictory. Retrieval quality is a security property when the agent's output can influence operational decisions.

Chunking Strategies for Security Documents

Naive chunking (breaking documents into equal-sized pieces) often fails for security content because context matters. A CVSS score of 9.5 is meaningless without knowing which vulnerability it describes.

Smart Chunking Strategies:

1. Semantic Chunking — Break at logical boundaries, not arbitrary token limits.

Instead of:

[150 tokens]
[150 tokens]  ← might split a control in half
[150 tokens]

Do:

AC-1 ACCESS CONTROL POLICY (340 tokens)  ← one complete control
AC-2 ACCOUNT MANAGEMENT (450 tokens)    ← another complete control

2. Hierarchical Chunking — Preserve document structure.

Document: NIST SP 800-53
  Section: AC (Access Control)
    Control: AC-2 Account Management
      Requirement: Organizations must establish account policies
      Supplemental Guidance: ...

When retrieving, retrieve at the appropriate level. For "What's our account management requirement?" retrieve the requirement level, not the entire section.

3. Overlap-Based Chunking — Add context overlap between chunks.

Chunk 1: [sentences 1–15] Chunk 2: [sentences 12–27] ← overlap with Chunk 1 Chunk 3: [sentences 25–40] ← overlap with Chunk 2

This ensures that related information isn't split across chunks.

4. Metadata Enrichment — Attach metadata to chunks.

{
  "chunk_id": "AC-2-001",
  "document": "NIST SP 800-53",
  "section": "AC (Access Control)",
  "control_id": "AC-2",
  "title": "Account Management",
  "text": "...",
  "applicable_systems": ["web_servers", "databases"],
  "last_updated": "2024-01-15",
  "severity": "high"
}

When retrieving, filter by metadata. "What controls apply to databases?" retrieves only chunks with applicable_systems: ["databases"].

Common Pitfall: Storing too much metadata in the chunk itself increases embedding size and retrieval latency. Use a hybrid approach: store metadata separately in the vector database, use it for filtering, but don't embed it.

Metadata is a retrieval quality lever. Every chunk in your RAG corpus should carry metadata that enables filtered retrieval:

source_date — when the document was last updated (enables freshness filtering)
doc_type — runbook, policy, threat intel, CVE, vendor advisory
classification_level — public, internal, confidential
relevant_tools — which MCP tools this document supports

A retrieval pipeline that ignores metadata will surface outdated runbooks alongside current ones. Timestamp-aware retrieval — downweighting chunks where source_date exceeds 90 days — is a simple technique with significant impact on answer quality.

Vector Databases for Security: A Comparison

Database	Pros	Cons	Best For
Pinecone	Managed, scales easily, good UI	Vendor lock-in, pricing	Production systems with budget
Weaviate	Open-source, flexible, GraphQL API	More operational overhead	Organizations wanting full control
Chroma	Lightweight, runs locally, simple	Not for large deployments (>1M embeddings)	Prototypes, small teams
Milvus	Highly scalable, open-source, fast	Steeper learning curve	Large-scale deployments (millions of documents)

For a typical security team (5000–50000 security documents), Chroma or Weaviate are good starting points.

RAG vs. MCP Tools vs. Fine-Tuning

Approach	Use Case	Pros	Cons
RAG	Large, changing document collections (policies, threat reports, runbooks)	No retraining, always current, supports source citation	Retrieval quality depends on chunking and embeddings
MCP Tools	Real-time queries (current incident status, live system queries)	Deterministic, fresh data, can be composed	Requires tool infrastructure, not for unstructured knowledge
Fine-tuning	Consistent writing style or specific domain terminology	Model learns patterns	Expensive to train/retrain, outdated by knowledge cutoff

Most security teams use all three:

RAG for policies, procedures, threat reports (documents)
MCP tools for live queries (incidents, IP reputation, system status)
Fine-tuning rarely; only for very specific style/tone requirements

Source Attribution in RAG

A critical requirement for compliance: every answer must cite where information came from. Bad practice:

Q: What are our password requirements?
A: Passwords must be at least 16 characters.

Good practice:

Q: What are our password requirements?
A: According to our Access Control Policy (AC-2, updated Jan 15 2024),
"Passwords for privileged accounts must be at least 16 characters,
with complexity requirements including uppercase, lowercase, digits, and special characters."

Source: /policies/access-control/AC-2-Account-Management.md, Section 3.2

Implementation:

def rag_answer_with_citation(question: str, top_k_documents: list) -> dict:
    # Augment prompt with retrieved documents
    prompt = f"""
    Answer this question based ONLY on the provided documents.
    Cite your sources.

    QUESTION: {question}

    RETRIEVED DOCUMENTS:
    """

    for doc in top_k_documents:
        prompt += f"\n[Source: {doc['source']}, Last updated: {doc['updated']}]\n{doc['text']}\n"

    prompt += "\nRespond with the answer and explicit citations."

    response = claude.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}]
    )

    return {
        "question": question,
        "answer": response.content[0].text,
        "sources": [doc['source'] for doc in top_k_documents],
        "retrieved_documents": len(top_k_documents)
    }

Further Reading: Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (arxiv). This foundational paper explains RAG and its advantages.

Day 1 Deliverable

Design a RAG system for a security use case (policy assistant, threat intelligence, incident response knowledge base, etc.):

Knowledge Base Strategy — What documents/data would you ingest? How would you keep them current?
Chunking Plan — How would you chunk these documents? Provide examples of 2–3 chunks.
Embedding Model Choice — Which embedding model? Why?
Vector Database Choice — Which database? Why? How many documents/embeddings?
Retrieval Strategy — How many documents (top-K)? Any filtering? How do you evaluate retrieval quality?
Source Attribution — How would you ensure every answer cites sources?
Quality Evaluation — How would you measure RAG system accuracy? (RAGAS metrics, human evaluation, etc.)

(3–4 pages, ~1200–1500 words)

Day 2 — Hands-On Lab

Lab Objectives

Build a working RAG system with a security knowledge base
Implement document chunking and embedding
Set up a vector database for similarity search
Create a RAG-powered assistant that answers security questions with citations
Measure accuracy and quality of retrieved documents

Part 1: Set Up RAG Infrastructure

RAG (Retrieval-Augmented Generation) Architecture

RAG solves a critical problem: Claude's knowledge cutoff is February 2025. If you need to analyze incidents using data from March 2026 (your company's incident database, threat intelligence from last week, newly published CVE information), Claude can't access it natively.

RAG works like this:

Retrieval: When Claude analyzes an incident, fetch relevant context from:
- Previous similar incidents (your database)
- Recent threat intelligence (feeds)
- Organizational policies (stored as text)
- Known vulnerabilities and patches (your asset inventory)
Augmentation: Inject that context into Claude's prompt
Generation: Claude analyzes the incident with that fresh context

Example:

Without RAG:

User: "Is CVE-2026-12345 actively exploited?"
Claude: "I don't have data on 2026 vulnerabilities; my knowledge cutoff is February 2025."

With RAG:

User: "Is CVE-2026-12345 actively exploited?"
System retrieves: [threat intel report from yesterday showing active exploitation]
System injects: "Recent threat intelligence (March 5, 2026) reports: CVE-2026-12345 is being actively exploited by APT-28..."
Claude: "Based on the threat intel provided, CVE-2026-12345 is actively exploited. We should prioritize patching systems..."

🔑 Key Concept: RAG doesn't make Claude smarter. It makes Claude current. Your incident data is always fresher than Claude's training data.

RAG Components:

Vector Database: Store and search documents by semantic meaning
- Previous incidents → searchable by attack type, technique, severity
- Threat intel → searchable by APT, exploit, vulnerability
- Policies → searchable by topic (MFA, encryption, incident response)
Retrieval Logic: Given an incident, fetch relevant documents
- Semantic search (find similar incidents using embeddings)
- Keyword search (find mentions of IOCs, CVEs)
- Metadata filtering (only policies updated in last 30 days)
Prompt Injection: Safely add retrieved context to Claude's prompt
- Format as markdown or structured text
- Include source and timestamp (so Claude knows how old the data is)
- Limit context size (don't overwhelm Claude with 100 documents)

Claude Code Workflow: Design RAG

Instead of building a vector database, design the system first:

Claude Code Prompt:

I'm building a RAG system for security incident analysis. Here's my data:

PREVIOUS INCIDENTS:
- INC-2025-0102: Ransomware via phishing, Emotet malware, 48 hours to recover
- INC-2025-0087: Insider data theft, admin account abuse, regulatory notification
- INC-2025-0156: Supply chain compromise, third-party software, 2-week investigation

THREAT INTELLIGENCE (CURRENT):
- APT-28 targeting manufacturing sector with spear-phishing
- CVE-2026-12345: RCE in Windows SMB, actively exploited by multiple groups
- New extortion campaign: Scattered Spider variant, targeting financial services

ORGANIZATIONAL POLICIES:
- MFA required for all remote access
- Data classification: PII, Financial, Public
- Incident escalation: MEDIUM triggers email to CISO, HIGH triggers war room

NEW INCIDENT:
- Alert: Unusual SMB traffic from user account "jchen" to multiple systems
- Source: IP 203.45.12.89 (Singapore)
- User: John Chen, Finance VP
- Time: March 5, 2026

Walk me through RAG:

1. RETRIEVAL: What should the system search for?
   - "Unusual SMB traffic" → Find incidents with lateral movement?
   - "John Chen" → Find incidents involving this user?
   - "Singapore IP" → Find incidents from that geography?
   - "CVE-2026-12345" → Check if any previous incidents involved SMB vulnerabilities?

2. AUGMENTATION: Which previous incidents are relevant to show Claude?
   - INC-2025-0156 involved supply chain (different threat actor)?
   - INC-2025-0102 involved phishing (different vector)?
   - Neither is a perfect match, but what can Claude learn?

3. GENERATION: How should Claude use this context?
   - "Based on previous incident INC-2025-0102, phishing-based attacks often lead to lateral movement within 24 hours"
   - "APT-28 is currently targeting the financial sector"
   - "We have MFA enabled, so if John's account was compromised, attacker would need MFA too"

What documents would be most useful for Claude to see? In what order?

After Claude designs the RAG logic:

Ask:

"How would you handle if no previous incidents match?"
"Should recent incidents weight higher than old ones?"
"What if conflicting policy documents (old vs. new version)?"
"How do you prevent false positives from irrelevant documents?"

Why Claude Code instead of building RAG now:

RAG infrastructure is non-trivial:

Vector database (Pinecone, Weaviate, Elasticsearch)
Embedding model (how to represent documents as vectors)
Retrieval algorithm (semantic + keyword + filtering)
Indexing pipeline (parse documents, chunk them, embed them)
Monitoring (is retrieval working? are we getting relevant documents?)

Before investing in that infrastructure, answer: "What documents do we need? How often do they change? How should they be searched?" Claude Code helps you answer these questions first.

When to Build a Real RAG System:

Build actual RAG when:

You have incident data to index (100+ previous incidents)
You have threat intelligence feeds (continuously updated)
You have organizational policies (stored as documents)
You want automation (every new incident automatically retrieves context)
Scale matters (analyzing dozens of incidents/day)

For learning, simulating RAG in Claude Code is enough: "Imagine these are previous incidents. Which would be relevant? How would they change the analysis?"

Remember: The quality of RAG depends on three factors: document quality, chunking strategy, and retrieval ranking. Invest time in understanding why certain documents are retrieved and adjust chunking/metadata if needed.

Part 2: Test the RAG System

Run the system:

python security_rag_system.py

Measure: 1. Retrieval Precision — Does the system retrieve relevant documents? 2. Answer Quality — Are answers accurate and well-cited? 3. Citation Coverage — Do answers cite sources?

Part 3: Compare RAG vs. Unaugmented Claude

Test Claude without RAG on the same questions:

def answer_without_rag(question: str) -> str:
    """Answer without RAG—just Claude's training data."""
    response = claude_client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        messages=[{
            "role": "user",
            "content": f"Answer this security question: {question}"
        }]
    )
    return response.content[0].text

# Compare
print("WITH RAG:", rag_answer["answer"])
print("\nWITHOUT RAG:", answer_without_rag(question))

Document differences in accuracy, detail, and confidence.

Growing Your Context Library: Tool Patterns

In Unit 2, you've built MCP servers and defined tools. Now it's time to capture the patterns that work—tool definition schemas, error handling approaches, structured output templates. These become your "style guide" for Claude Code to follow.

🔑 Key Concept: A tool definition isn't just code—it's a contract. When you write a tool schema that's clear, validates input properly, and returns structured output, Claude learns to respect that contract. Save your best tool definitions in your context library. Next project, paste them as context: "Here are my preferred tool patterns. Use this style." Claude Code will follow your conventions.

Why This Matters for Unit 2 Specifically:

Tool Design is Hard: Getting tool definitions right requires iteration—constraint specification, error handling, output validation. Your Unit 2 work is refined. Capture it.
Consistency Across Projects: When you build many MCP servers, tool definitions should feel familiar. Your library ensures this.
Onboarding New Projects: In Unit 3 and 4, you'll build different tools. Referencing your Unit 2 patterns ensures consistency and saves design time.

Expand Your Context Library Structure

Add a new section to your existing context-library/:

mkdir -p ~/context-library/patterns/tool-definitions
mkdir -p ~/context-library/patterns/error-handling

Unit 2 Task: Extract Tool Patterns

In this unit, you've designed and built:

Tool Definition Schema: The structure for defining tools (parameters, constraints, return types)
Error Handling Pattern: How tools fail gracefully and communicate errors back to the agent
Structured Output Template: The JSON/response format tools should return

Capture These Patterns:

Add to context-library/patterns/tool-definitions/mcp-tool-schema.md:

# MCP Tool Definition Template

## Description
[Clear one-liner about what the tool does]

## Parameters Schema
[Your standard parameter validation approach]
[Include examples of well-constrained parameters]

## Return Schema
[Your standard output format]
[Include error handling response format]

## Example: [One real tool from Unit 2]
[Complete definition showing the pattern in action]

Add to context-library/patterns/error-handling/tool-errors.md:

# Tool Error Handling Pattern

## Error Categories
[Classification of errors: validation, timeout, access control, etc.]

## Response Format
[How your tools communicate failures to agents]

## Example Failure Scenarios
[Real cases from Unit 2 testing]

Add to context-library/prompts/tool-design.md:

# Tool Design Decision Framework

When designing a new tool, ask:
1. What is the minimal capability set?
2. What inputs MUST be validated?
3. How does this tool fail? (timeout, permissions, malformed input)
4. What should the agent do when it fails?

[Examples from Unit 2]

How to Use Your Library in Future Sessions

When you start a new Claude Code project in Unit 3 or 4, provide this context:

Here are my preferred patterns for tool definitions and error handling.
When you design new tools, follow this style:

[Paste your context-library/patterns/tool-definitions/mcp-tool-schema.md]
[Paste your context-library/patterns/error-handling/tool-errors.md]

This ensures consistency and alignment with my standards.

Pro Tip: Review your library. Did you discover any error handling patterns Unit 2 that surprised you? Did you learn a better way to structure tool output? Update your library entries—they should evolve as you learn.

Deliverables

RAG System Code
- Document ingestion and chunking
- Vector database setup (Chroma)
- Similarity search
- Claude integration with RAG prompts
- Source citation
Knowledge Base Documentation
- List of documents in the system
- Chunking strategy used
- Embedding model
- Sample chunks and embeddings
Evaluation Report
- Retrieval precision (how many correct documents retrieved?)
- Answer accuracy (were answers correct and well-cited?)
- Comparison vs. unaugmented Claude
- Any hallucinations or failures
Sample Q&A
- 10–15 example questions with RAG answers
- Citations included
- Evaluation of answer quality
Performance Metrics
- Avg time to retrieve documents: _____ ms
- Avg time to generate answer: _____ ms
- Total latency (retrieval + generation): _____ ms

Sources & Tools

Summary

Unit 2 equips you with the modern toolkit for AI-powered security:

Week 5 (MCP) — Standard interfaces for agents to discover and use tools
Week 6 (Tool Design) — Secure, composable, observable tool architecture
Week 7 (Structured Outputs) — Machine-readable reports that integrate with downstream systems
Week 8 (RAG) — Domain-specific knowledge systems with source attribution

Together, these techniques enable you to build AI agents that are secure, auditable, and grounded in your organization's knowledge and tools.

Next: Unit 3 will explore building production-ready security agents that integrate these technologies into real incident response workflows.