Lab Guide: Unit 4 — Rapid Prototyping with Agentic Tools

CSEC 601 | Weeks 13–16 | Semester 1

Four weeks of agentic engineering: multi-agent architecture (Week 13), Sprint I build (Week 14), Sprint II hardening (Week 15), and midyear presentations (Week 16). This is where everything from Semester 1 comes together.

Claude for velocity, not shortcuts: The goal is to learn production thinking fast, not to avoid thinking. Ask Claude to review your architecture decisions, not to make them for you.

Week 13 — Multi-Agent Architecture Deep Dive

Week 13 Lab: Build a Multi-Agent SOC System with Worktrees

Lab Goal: Build a four-agent SOC system (Orchestrator + Recon Agent + Analysis Agent + Reporting Agent) using Claude Agent SDK, with worktrees for parallel development. Implement the Hierarchical orchestration pattern.

Tool-agnostic framing: worktrees and multi-agent patterns

Worktrees are a git feature, not a Claude Code feature. Any IDE or terminal supports them. VS Code Multi-Root Workspaces, Cursor, and JetBrains all work with git worktrees. The Claude Code integration shown here is one way to use them.

Orchestrator + specialized worker is a pattern, not a framework. Whether you use the Anthropic SDK (shown here), Claude Managed Agents, OpenAI Agents SDK, or AutoGen — the architecture is the same: one agent routes and coordinates, specialized agents execute with focused context and tool scope. The framework changes; the pattern doesn't.
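The pattern can be sketched framework-free in a few lines of plain Python. The `Agent` and `Orchestrator` classes below are illustrative stand-ins (not SDK types), with stub handlers where real model calls would go:

```python
class Agent:
    """Specialist: narrow prompt, focused tool scope (illustrative stand-in)."""
    def __init__(self, name, handler):
        self.name, self.handler = name, handler

    def run(self, task):
        return self.handler(task)

class Orchestrator:
    """Routes each task to the one specialist whose scope matches."""
    def __init__(self, routes):
        self.routes = routes  # {task_type: Agent}

    def run(self, task_type, payload):
        agent = self.routes.get(task_type)
        if agent is None:
            raise ValueError(f"no specialist registered for {task_type!r}")
        return agent.run(payload)

# Stub handlers stand in for real model calls:
triage = Orchestrator({
    "recon":  Agent("recon",  lambda alert: {"iocs": ["203.0.113.7"]}),
    "report": Agent("report", lambda data:  {"summary": str(data)}),
})
print(triage.run("recon", "suspicious login alert"))  # → {'iocs': ['203.0.113.7']}
```

Swapping the lambda handlers for SDK, Managed Agents, or AutoGen calls changes the syntax inside `run`, not the routing structure around it.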

Knowledge Check — Week 13

1. Why do worktrees enable parallel agent development without conflicts?

2. What distinguishes the Expert Swarm Pattern from Hierarchical orchestration?

3. Why can multiple specialized agents sometimes cost LESS than one large general agent?

Try /worktree-setup for Parallel Agent Development

This lab has you building three agents simultaneously across isolated worktrees. The /worktree-setup skill generates the exact commands for your project — worktree create commands, directory map, and integration merge steps — based on your spec.

curl -o ~/.claude/commands/worktree-setup.md https://raw.githubusercontent.com/r33n3/Noctua/main/docs/skills/worktree-setup.md
# Then in Claude Code, after your spec is ready:
/worktree-setup three-agent SOC system: recon-agent, analysis-agent, reporting-agent

What you're learning here transfers beyond Claude Code. Worktrees are a git feature — not a Claude Code invention. The git worktree add command works in any git repository, with any development tool. Agent orchestration patterns (supervisor → specialist, parallel fanout, sequential pipeline) appear in Claude Managed Agents, OpenAI Agents SDK, AutoGen, and the Claude Agent SDK with different syntax but identical architecture. The mental model you build this week — how to decompose a complex task into a supervised multi-agent pipeline — is framework-independent. The syntax is not.

Lab Exercise: Three-Agent SOC System

Architecture: Incident arrives → Orchestrator routes to Recon Agent (gathers IoC data via your MCP tools) → Analysis Agent (applies CCT framework to synthesize findings) → Reporting Agent (generates structured JSON + Markdown report). All three run under the Orchestrator's control.
cd ~/noctua-labs
git init soc-agent-system && cd soc-agent-system
git commit --allow-empty -m "init"
# Create worktrees for each agent component
git worktree add ../soc-recon   -b feature/recon-agent
git worktree add ../soc-analysis -b feature/analysis-agent
git worktree add ../soc-reporting -b feature/reporting-agent
cd ~/noctua-labs/soc-recon
claude
# Prompt: "Build a recon security agent in Python using the Anthropic SDK.
# The agent receives raw alert text and must:
# 1. Extract all IoCs (IPs, hashes, CVEs, domains)
# 2. Use tool calls to enrich each IoC via the MCP tools
# 3. Return structured JSON: {iocs: [{type, value, enrichment, severity}], summary}
# Use claude-sonnet-4-6. Include error handling for failed tool calls."
cd ~/noctua-labs/soc-agent-system
claude
# Prompt: "Build orchestrator.py that coordinates three agents in sequence.
# Import: recon_agent.run(alert_text), analysis_agent.run(recon_json),
# reporting_agent.run(analysis_json, start_ts)
# Requirements:
# - 30s asyncio timeout per agent via asyncio.wait_for()
# - If recon times out: pass partial data to analysis with recon_status='timeout'
# - If analysis fails: reporting still runs with error summary
# - Return: {mtti_seconds, recon_status, analysis_status, report}
# - Log each stage with timestamp to structured JSON log"
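The requirements in that prompt reduce to one reusable skeleton. This sketch is not the graded solution: it shows how `asyncio.wait_for()` gives each stage its 30-second budget and how partial results flow forward on timeout or failure. The agent callables are assumed to be async functions matching the signatures in the prompt:

```python
import asyncio
import time

async def run_stage(coro, timeout=30):
    """Run one agent stage under a timeout; never let an exception escape."""
    try:
        return "ok", await asyncio.wait_for(coro, timeout=timeout)
    except asyncio.TimeoutError:
        return "timeout", None
    except Exception as exc:
        return "error", str(exc)

async def orchestrate(alert_text, recon, analysis, reporting):
    """Sequence the three agents; degrade gracefully on timeout or failure."""
    start = time.monotonic()
    recon_status, recon_json = await run_stage(recon(alert_text))
    if recon_status != "ok":
        recon_json = {"iocs": [], "recon_status": recon_status}  # pass partial data forward
    analysis_status, analysis_json = await run_stage(analysis(recon_json))
    if analysis_status != "ok":
        analysis_json = {"error_summary": f"analysis {analysis_status}"}
    _, report = await run_stage(reporting(analysis_json, start))
    return {
        "mtti_seconds": round(time.monotonic() - start, 2),
        "recon_status": recon_status,
        "analysis_status": analysis_status,
        "report": report,
    }
```

The structured JSON logging the prompt asks for would hook into each `run_stage` call; it is omitted here to keep the control flow visible.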
Alternative: Claude Managed Agents

The system you just built manages the orchestration loop in Python — your code calls each agent in sequence, handles errors, and wires results together. Claude Managed Agents is Anthropic's hosted version of this pattern: you define the agent config once, start a session per investigation, and stream events. The container, tools, and agent loop run on Anthropic's infrastructure.

The same SOC investigation, rebuilt as a Managed Agent:

import anthropic

client = anthropic.Anthropic()

# ── ONE-TIME SETUP (run once, save the IDs) ──────────────────────────────────
agent = client.beta.agents.create(
    name="SOC Analyst",
    model="claude-sonnet-4-6",
    system="""You are a senior SOC analyst. When given a security alert:
1. Enrich all IoCs (IPs, domains, hashes) using available tools
2. Apply CCT analysis: identify TTPs, generate 3 hypotheses with probabilities
3. Produce a structured incident report in JSON + narrative summary""",
    tools=[{"type": "agent_toolset_20260401"}],  # bash, read/write, web_search all included
)
environment = client.beta.environments.create(
    name="soc-env",
    config={"type": "cloud", "networking": {"type": "unrestricted"}},
)
# Save agent.id and environment.id — reuse across all investigations

# ── PER-INVESTIGATION (run for each new alert) ────────────────────────────────
session = client.beta.sessions.create(
    agent=agent.id,
    environment_id=environment.id,
    title="Meridian Financial Incident",
)

with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(
        session.id,
        events=[{"type": "user.message", "content": [
            {"type": "text", "text": meridian_incident_text}
        ]}],
    )
    for event in stream:
        if event.type == "agent.message":
            for block in event.content:
                if hasattr(block, "text"):
                    print(block.text, end="")
        elif event.type == "agent.tool_use":
            print(f"\n[Tool: {event.name}]")
        elif event.type == "session.status_idle":
            print("\n── Investigation complete ──")
            break

Custom orchestrator vs. Managed Agents — when to choose:

  • Custom SDK orchestrator (this lab): use when you need custom tool execution on your own infrastructure, approval gates before sensitive actions, or fine-grained control over every agent call.
  • Claude Managed Agents: use when you want Anthropic to host the container and run the loop — ideal for long-running investigations, file-heavy work, and teams that don't want to manage agent infrastructure.

The orchestrator pattern you designed this week transfers directly to Managed Agents. The agent's reasoning logic (recon → analysis → report) lives in the system prompt; what changes is who runs the infrastructure underneath it. You'll evaluate both approaches quantitatively in Unit 5.

Week 13 Deliverables
  • soc-agent-system/ — complete multi-agent repository with all three agents and orchestrator
  • Architecture Diagram — data flow diagram showing alert → Orchestrator → Recon → Analysis → Reporting → output
  • MTTI Comparison — Week 1 manual vs. Week 13 automated, with cost analysis

Week 14 — Rapid Prototyping Sprint I

Week 14 Lab: Concept to Working Demo in 3 Hours

Lab Goal: Execute a time-boxed rapid prototyping sprint: 20 minutes for problem scoping and spec, 2 hours for building, 40 minutes for testing and demo prep. You will measure MTTS, MTTP, and MTTSol for your sprint. This prototype becomes the foundation of your midyear project.

Knowledge Check — Week 14

1. In the sprint context, what does MTTS measure?

2. What does the Think → Spec → Build → Retro cycle prescribe for time allocation?

Use /think + /build-spec for Your 20-Minute Scoping Phase

The Think → Spec → Build → Retro cycle was built for exactly this sprint format. In your 20-minute planning window: run /think first to validate your chosen problem, then run /build-spec to produce the formal 1-page spec. Both skills give you structured output you can paste directly into the build phase.

curl -o ~/.claude/commands/build-spec.md https://raw.githubusercontent.com/r33n3/Noctua/main/docs/skills/build-spec.md
# Sprint scoping sequence in Claude Code:
/think I want to build a phishing email triage pipeline for this sprint. What are the risks and unknowns?
# Then after validating direction:
/build-spec phishing email triage pipeline — single sprint, Claude agent + MCP tools, 3-hour build window

Track two metrics, not one. Most students track MTTP (Mean Time to Prototype) — time from "go" to working demo. Also track MTTS (Mean Time to Spec) — time from "go" to a written, signed-off spec. Why:

  • MTTP tells you how fast you execute
  • MTTS tells you how fast you make design decisions
  • If MTTP improves by compressing MTTS (skipping the spec to start building faster), you've traded evaluation validity for speed

The spec phase should not be rushed. It is where security decisions are made.
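A timestamp log is enough to capture both metrics. This minimal sketch (the class and method names are ours, not course-provided) records phase boundaries against the sprint "go" signal:

```python
import time

class SprintClock:
    """Record phase boundaries against the sprint 'go' signal."""
    def __init__(self):
        self.marks = {"go": time.monotonic()}

    def mark(self, phase):
        self.marks[phase] = time.monotonic()

    def minutes_to(self, phase):
        return (self.marks[phase] - self.marks["go"]) / 60

clock = SprintClock()
# ...write and sign off the spec...
clock.mark("spec_signed_off")   # MTTS = clock.minutes_to("spec_signed_off")
# ...build until the demo works end-to-end...
clock.mark("demo_working")      # MTTP = clock.minutes_to("demo_working")
```

If `minutes_to("spec_signed_off")` shrinks sprint over sprint while `minutes_to("demo_working")` barely moves, you are compressing design time, not execution time.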

Required: AIUC-1 pre-check before finalizing your spec (5 minutes).

Before writing a single line of code, answer these four questions and document them in sprint1/aiuc1-precheck.md:

  1. What data does this system process? (Email content? Does it contain PII? Attachments?)
  2. Who is affected by a wrong decision? (False positive = missed real threat reported to nobody. False negative = analyst alerted on legitimate email.)
  3. Which AIUC-1 domains are in scope? (B: Security — does the system have access to security-sensitive data? D: Reliability — what happens when it's wrong? E: Accountability — who reviews its decisions?)
  4. What human oversight exists for high-severity outputs? (Does a P1 classification auto-escalate, or does a human review first?)

The full AIUC-1 audit happens in Unit 3 (which you've already completed). This pre-check ensures your Sprint I design avoids the gaps your Unit 3 audit identified.
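If you want a file skeleton to fill in during the five minutes, a heredoc works. The headings below mirror the four questions, and the path matches the one named above:

```shell
mkdir -p sprint1
cat > sprint1/aiuc1-precheck.md <<'EOF'
# AIUC-1 Pre-Check — Sprint I

## 1. What data does this system process?
(email content? PII? attachments?)

## 2. Who is affected by a wrong decision?
(false positive vs. false negative impact)

## 3. Which AIUC-1 domains are in scope?
(B: Security, D: Reliability, E: Accountability)

## 4. What human oversight exists for high-severity outputs?
(auto-escalate vs. human review for P1 classifications)
EOF
```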

Lab Exercise: Sprint I — Timed Prototype

Timer is active: The lab instructor will signal the start. Record your exact start time. Each phase is time-boxed — when the phase ends, move on regardless of completion state. Track your metrics honestly.

Finding the skill

The /audit-aiuc1 skill is included in the course skills bundle. The skill file is at .claude/skills/audit-aiuc1/SKILL.md.

mkdir -p ~/noctua-labs/unit4/sprint1
cd ~/noctua-labs/unit4/sprint1
git init
# Start building with Claude Code immediately — spec in context
claude
# Prompt: "I am building [your chosen system]. Here is my spec:
# [paste your 1-page spec]. Let's start with the core agent loop.
# Build [first component] first."
/check-antipatterns ~/noctua-labs/unit4/sprint1/

# Required: Zero CRITICAL findings to pass Sprint I
# Document: HIGH findings → deferred to Sprint II with justification
# Include: report output in Sprint I deliverables package
Discussion (~10 min): The Cost of Being Thorough

Setup: The semi-formal approach took 2.8x more agent steps. Full harness runs cost $125–200 and took 4–6 hours. The quick version was $9 and took 20 minutes.

Discussion prompt: Your sprint has a token budget. Running /check-antipatterns with full semi-formal analysis costs 3x more than a quick scan. You have 5 components to assess. Do you run the thorough check on all 5? Or do you triage — run the quick check on everything, and the thorough check on the 2 highest-risk components? How do you decide WHICH components get the thorough check? What makes a component high-risk?

Key insight: Thoroughness is a budget allocation problem, not a binary choice. The answer is tiered verification: every commit gets quick checks (linting, basic /code-review); every PR gets standard checks (/code-review with confidence threshold); before deployment gets full semi-formal analysis (all three evaluators); high-risk components get additional manual review on top. The cost of a thorough assessment is always less than the cost of a missed vulnerability in production. But you can't run thorough assessments on everything — you'd never ship. Match the reasoning depth to the stakes of the decision.

Instructor note: Have students actually measure the cost difference. Run /check-antipatterns on a component and record the token cost. Then run it with explicit instructions to "trace every function call and provide file:line evidence for every finding." Compare costs. The difference makes the tradeoff concrete.

Course connection: /cost tracking, /effort levels, three-tier code review, sprint budget management. Students should track their actual token costs for quick vs thorough assessments across the sprint.

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

Week 14 Deliverables
  • Sprint I Prototype — code repository with README, working end-to-end (even if incomplete)
  • Sprint I Metrics — MTTS, MTTP, MTTSol, token cost, and completion percentage
  • 5-Minute Demo Script — written and rehearsed presentation for Sprint I showcase
  • AI Methodology Note — 100-word description of how Claude Code was used in the sprint

Week 15 — Rapid Prototyping Sprint II: Hardening

Week 15 Lab: Iterate, Harden, and Production-Ready Quality

Lab Goal: Transform your Sprint I prototype into a hardened, documented, and ethically audited system. Apply the full quality checklist: error handling, input validation, logging, CCT analysis, AIUC-1 compliance, performance measurement.

Knowledge Check — Week 15

1. What does 'hardening' a security prototype specifically require?

Dependency review before you plan. Before committing to Sprint II scope, audit your dependencies:

  • What Python packages do you need that aren't currently installed?
  • What does adding them require (compilation? system libraries? significant disk space?)
  • Do any planned remediations (PII scanning, encryption, external API integrations) require new dependencies?

Dependencies discovered mid-sprint become scope constraints. A PII scanner that requires a 200MB model download or a complex compilation step can block a sprint if discovered on day two. Five minutes of dependency review at sprint start saves hours of mid-sprint replanning.
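A five-minute audit can be scripted. This stdlib-only sketch (the function name and the `planned` list are ours, for illustration) reports which required distributions are missing from the current environment before the sprint clock starts:

```python
from importlib import metadata

def missing_distributions(required):
    """Return the required distributions not installed in this environment."""
    missing = []
    for name in required:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing

# Run against your Sprint II plan before committing to scope:
planned = ["anthropic", "presidio-analyzer", "cryptography"]
print(missing_distributions(planned))
```

Anything this prints still needs the second half of the review by hand: does installing it require compilation, system libraries, or a large model download?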

Lab Exercise: Sprint II Hardening Checklist

import logging, json

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            'ts': self.formatTime(record),
            'level': record.levelname,
            'event': record.getMessage(),
            'agent': getattr(record, 'agent', None),
            'tool': getattr(record, 'tool', None),
        })

# Wire the formatter into a handler once at startup:
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger = logging.getLogger("soc")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Use: logger.info("tool_call", extra={'agent': 'recon', 'tool': 'query_cve'})
/check-antipatterns ~/noctua-labs/unit4/sprint1/

# Track improvement:
# Sprint I:  CRITICAL __ HIGH __ MEDIUM __
# Sprint II: CRITICAL __ HIGH __ MEDIUM __
# Target: READY status (no CRITICAL or HIGH)
import anthropic, json

client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY from env

# ── STEP 1: Deploy the agent (run ONCE — save the IDs) ───────────────────────
# Paste your Sprint II system prompt below:
SYSTEM_PROMPT = """
[Your hardened Sprint II system prompt here — the one that passed AIUC-1 audit]
"""

agent = client.beta.agents.create(
    name="Unit 4 Sprint II — <your tool name>",
    model="claude-sonnet-4-6",
    system=SYSTEM_PROMPT,
    tools=[{"type": "agent_toolset_20260401"}],  # bash, file ops, web_search included
)
environment = client.beta.environments.create(
    name="unit4-sprint2-env",
    config={"type": "cloud", "networking": {"type": "unrestricted"}},
)

# Save IDs — you'll reuse these for every session
ids = {"agent_id": agent.id, "environment_id": environment.id}
with open("managed_agent_ids.json", "w") as f:
    json.dump(ids, f, indent=2)
print(f"Deployed: {ids}")

# ── STEP 2: Run a test session against your Sprint I test cases ───────────────
with open("managed_agent_ids.json") as f:
    ids = json.load(f)

session = client.beta.sessions.create(
    agent=ids["agent_id"],
    environment_id=ids["environment_id"],
    title="Test Session — Week 15",
)

test_input = "[Paste your Sprint I test case #1 here]"

with client.beta.sessions.events.stream(session.id) as stream:
    client.beta.sessions.events.send(session.id, events=[{
        "type": "user.message",
        "content": [{"type": "text", "text": test_input}]
    }])
    for event in stream:
        if event.type == "agent.message":
            for block in event.content:
                if hasattr(block, "text"):
                    print(block.text, end="", flush=True)
        elif event.type == "agent.tool_use":
            print(f"\n[Tool: {event.name}]", flush=True)
        elif event.type == "session.status_idle":
            print("\n\n── Session complete ──")
            break
After deployment: answer these questions
  • Did your agent produce the same output as the local version? If not — why? (context window, tool availability, system prompt differences?)
  • How long did the session take vs. local execution? What drove the difference?
  • Your agent is now "live" — anyone with the session API and your agent ID can run it. What access controls would a production deployment need?
  • Look at tools/mass/claude-managed-agents/ in this repo — that's the MASS scanner deployed the same way. Your Sprint II agent is structurally identical. In Unit 5 you'll compare this pattern against two others.
Discussion (~10 min): Semi-Formal vs Fully Formal — The Practical Middle

Setup: The paper explicitly positions semi-formal reasoning between unstructured chain-of-thought (too loose) and fully formal verification in Lean or Coq (too rigid). Unstructured: 78% accuracy. Semi-formal: 88% accuracy. Fully formal: theoretically 100% accuracy on whatever you can formalize — but formalizing a Django codebase with Python, PostgreSQL, Redis, and three API integrations would take years.

Discussion prompt: Why not just use formal verification for everything? Let students identify: cost, time, scope limitations. Then push: "Is there any part of your capstone system that SHOULD be formally verified? What about the parts that can't be?"

Key insight: Three levels of verification rigor. Informal (chain-of-thought): fast, cheap, broad scope, 78% accuracy — use for quick checks, brainstorming, initial triage. Semi-formal (structured templates): moderate cost, broad scope, 88% accuracy — use for security assessments, code review, audit, most production work. Fully formal (Lean, Coq, proof systems): expensive, narrow scope, 100% accuracy on formalized scope — use for cryptographic implementations, authentication logic, critical safety properties. For your capstone: your agent's system prompt governance is semi-formal. Your IAM policies are closer to formal (AWS validates them against a policy language). Your overall security posture is assessed semi-formally (AIUC-1 audit with evidence chains). The mix is intentional.

Instructor note: This is a good capstone prep discussion because it frames what the CISO panel will ask: "How confident are you in this assessment? What's the rigor level? Where would formal verification add value? Where is it impractical?" Students who can articulate the rigor-cost-scope tradeoff demonstrate senior engineering judgment.

Course connection: Engineering Assessment Stack (start simple, escalate when needed), four defense layers, the entire course philosophy of "minimum viable rigor for the stakes involved."

Source: Ugare & Chandra, "Agentic Code Reasoning," arXiv:2603.01896v2

Week 15 Deliverables
  • Hardened Prototype — complete code with error handling, validation, logging, and README
  • Sprint II AIUC-1 Governance Audit — AIUC-1 compliance matrix for the hardened prototype
  • Sprint I vs II Comparison — metrics comparison table showing measurable improvements
  • Managed Agent Deployment — managed_agent_ids.json proving a live deployment + answers to the 4 post-deployment questions
Close the Cycle: Run /retro Before Week 16 Presentations

Before your demo, run a structured retrospective on both sprints. The /retro skill produces a document comparing what you spec'd vs. what you built, what worked, what didn't, and what you'd carry into a third sprint. This feeds your presentation directly — and becomes part of your portfolio.

curl -o ~/.claude/commands/retro.md https://raw.githubusercontent.com/r33n3/Noctua/main/docs/skills/retro.md
# After Sprint II, in Claude Code:
/retro Unit 4 Sprint I + II — phishing triage pipeline
Merge Before You Present: Run /merge-worktrees

If you built Sprint II across multiple worktrees, merge them now — before Week 16 presentations. Presenting from an unmerged worktree means your demo runs against a branch that won't survive the sprint. The /merge-worktrees skill handles conflict resolution, runs your test suite after each merge, and produces a merge report.

# From your main Claude Code session (not inside a worktree):
/merge-worktrees

Critical: Run this from your main session, not from inside a worktree. The skill merges worktrees into main — it can't run from inside one of the branches being merged.
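For reference, the manual flow the skill automates is plain git. This self-contained sketch runs in a scratch repo so you can see the loop: branch in a worktree, merge into main, remove the worktree. In your real project, run your test suite after each merge before continuing:

```shell
# Scratch-repo demo of the manual flow (requires git >= 2.28 for `init -b`):
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git -c user.email=demo@example.com -c user.name=demo commit -q --allow-empty -m init
git worktree add -q "$repo-agent" -b feature/agent
( cd "$repo-agent" &&
  echo "print('agent')" > agent.py &&
  git add agent.py &&
  git -c user.email=demo@example.com -c user.name=demo commit -q -m "agent work" )
# Back on main: merge the feature branch, then run your tests
git merge -q --no-ff feature/agent -m "merge feature/agent"
# pytest || exit 1    # gate each merge on green tests in a real project
git worktree remove "$repo-agent"
git branch -d feature/agent
git log --oneline
```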


Week 16 — Midyear Project Presentations

Week 16 Lab: Demo, Defend, Reflect

Lab Goal: Present your hardened prototype to the class and instructor panel. Defend architectural decisions using CCT. Conduct peer reviews. Reflect on the semester's learning trajectory.

Class Discussion — Before Presentations

Before the demo session begins, take 15 minutes for structured reflection. These questions are the synthesis moment — 16 weeks of CCT, tools, ethics, and prototyping converge here.

  • Week 1 vs. Week 16 MTTI: Your MTTI at Week 1 was roughly 26 minutes. What is your Week 16 MTTI for a similar-complexity investigation? What accounts for the difference — skill, tooling, or workflow? How much of the improvement came from faster CCT reasoning vs. faster execution?
  • Changed assumptions: What assumption about AI-assisted security work did you hold in Week 1 that turned out to be wrong? What evidence from a specific lab changed your mind?
  • CCT in practice: Which of the five CCT pillars challenged your instincts most? Was there a lab where you realized you had been applying a pillar superficially?
  • Course redesign: If you were designing this semester's curriculum, what would you add, cut, or reorder?
  • Portfolio gap: Your Sprint II prototype is now a portfolio item. What would you need to do to present it in a job interview with confidence? What is the gap between "it works" and "I would stake my professional reputation on this"?

Ship It: Release Pipeline Checklist

Before your prototype earns a PR and goes to leadership review, run it through this shipping checklist. Each step has a reason — this is the production shipping discipline practiced throughout the course: spec before build, test before ship, document before deploy.

Unit 4 Pre-Landing AI Checklist

Review each item. Pass = confirmed. Skipped = documented in your PR description with written justification.

  1. LLM trust boundaries — Does your system treat all LLM outputs as untrusted input? Is there validation before acting on agent decisions? Pass: every agent output is validated before downstream use.
  2. SQL / injection safety — If your system writes to any database or constructs queries, are inputs parameterized? Pass: no string concatenation in queries.
  3. Race conditions — If agents run concurrently, is shared state (logs, files, rate counters) protected? Pass: thread-safe or process-isolated.
  4. Enum completeness — Do all match/case or if-elif blocks have an explicit default? Pass: no silent fall-through on unexpected values.
  5. Error propagation — Does every exception either recover gracefully or surface clearly to the caller? Pass: no bare except: pass blocks.
  6. Secrets in env vars — Are all API keys, tokens, and credentials in environment variables, not in code or committed files? Pass: git grep -iE "api_key[[:space:]]*=" returns zero matches.
  7. Blast radius — Does this PR change fewer than 5 files? If more, is the scope justified? Pass: documented rationale for any large-scope change.

Document any item you knowingly skip and the reason why. This is a graded deliverable — the documentation is as important as the checklist result.
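Checklist item 1 is concrete enough to demo. A hedged sketch of the "validate before downstream use" rule, with the schema and `ALLOWED_SEVERITIES` allow-list invented for illustration:

```python
import json

ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}

def validate_agent_output(raw: str) -> dict:
    """Parse and range-check agent JSON before anything downstream acts on it."""
    data = json.loads(raw)  # malformed JSON raises ValueError here
    if not isinstance(data.get("iocs"), list):
        raise ValueError("agent output missing 'iocs' list")
    for ioc in data["iocs"]:
        if ioc.get("severity") not in ALLOWED_SEVERITIES:
            raise ValueError(f"unexpected severity: {ioc.get('severity')!r}")
    return data

print(validate_agent_output('{"iocs": [{"severity": "high"}]}'))
```

The point is that the gate sits between the agent and anything with side effects; a rejected output stops the pipeline instead of silently escalating an incident.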

Coming in Semester 2 — the full production checklist

Unit 7 extends this checklist with governance-grade requirements: AIUC-1 audit across all 6 domains, MASS security scan, agent identity and allowance profiles, distributed tracing, red team report, cost caps, and tested human escalation paths. The 7 items above are the Semester 1 foundation you build on.

gh pr create \
  --title "Sprint II: Threat Hunter v0.1.0 — Hardened" \
  --body "## Problem
Analysts spend 45 min/day triaging phishing alerts manually.

## What this does
3-agent system: classifier → enricher → reporter. 94% accuracy on test set.

## Architecture decisions
- Claude Sonnet for classification (cost/accuracy tradeoff)
- Async enrichment with 30s timeout + fallback
- Human escalation at confidence < 0.7

## Test coverage
- 23 unit tests, 4 integration tests, all passing
- Pre-landing checklist: all 7 items reviewed, 0 deferred

## Known gaps
- No rate limiting on the enrichment API (tracked in TODOS.md)
- Evaluation dataset is synthetic; production data may differ"

Presentation Preparation Checklist

Peer Review Template — copy and complete for each presenter
# Peer Review — [Presenter Name]
**Reviewer:** [Your Name] | **Date:** [Date] | **System:** [System Name]

## 1. Problem solved
[1-2 sentences: What security problem did they address? Was the problem well-defined?]

## 2. Most impressive technical achievement
[Specific: name the component, approach, or design decision that stood out.
Not "it worked well" — what specifically demonstrated skill?]

## 3. Most significant gap or risk
[Specific: a security gap, architectural weakness, or untested edge case.
Apply CCT Pillar 1 — what evidence supports this being a real risk?]

## 4. One improvement suggestion
[Actionable: "Add rate limiting to the enrichment API call in recon_agent.py"
not "improve security." Something they could implement in a day.]

## Overall: Would you use this in a real SOC?
[ ] Yes, as-is  [ ] Yes, with modifications  [ ] Not yet — needs X first
Reason: [one sentence]
Week 16 Deliverables
  • 10-Minute Live Demo — of your hardened Sprint II prototype to the class
  • Peer Review Forms — one for each teammate's presentation
  • Semester Reflection (750 words) — learning trajectory from Week 1 to Week 16
  • GitHub Repository — final, tagged release of your prototype with complete documentation
Semester 1 Portfolio — Make It Public and Share It

Stop and take stock of what you've built: a CCT analysis framework, a multi-tool MCP server with RAG, an AI security policy, an AI ethics audit, and a 3-agent SOC prototype that went through two hardening sprints. That is a real portfolio — not coursework, not exercises, actual working security systems.

If your repositories aren't public on GitHub with proper READMEs, fix that now. Share the links — on LinkedIn, in security forums, with your team, in Discord servers like BloodHound Gang or Security BSides channels. There are security practitioners, hobbyists, and students who would learn from your AI ethics audit template alone. Security only improves when practitioners share what works. You don't have to wait until you're an expert to share. You are already building things people need.

Use this prompt to generate READMEs:

Write a GitHub README for my [project name] that explains the security problem it solves, how to run it locally, what tools it uses, and what someone could fork or extend.
Build Your Sprint Skill — Shortcut the Next One

You've now run two sprints and have a repeatable pattern: planning → scaffolding → hardening → review. Turn this into a Claude Code skill. Create a /sprint-setup skill that scaffolds a new security agent project with your preferred directory structure, CLAUDE.md, logging config, and ethics checklist pre-wired. The next sprint starts in 20 seconds instead of 20 minutes.

Use this prompt:

Based on my Sprint I and II work, write a Claude Code skill file called sprint-setup.md that scaffolds a new security agent project with my standard structure, dependencies, CLAUDE.md, and hardening checklist already in place.

Knowledge Check — Week 16

1. Which best describes production readiness for an AI system?

2. What does applying the production engineer mindset to your final presentation mean in practice?


Semester 1 Complete!

You have completed all four units of CSEC 601: CCT Foundations, Agent Tool Architecture, AI Security Governance, and Rapid Prototyping. You are ready for Semester 2.

What Semester 2 does with what you built.

The tools and systems you built this semester are the starting material for Semester 2 — not background context, but direct inputs.

  • Unit 5 (Multi-Agent Orchestration): Your phishing triage agent from Sprint I becomes a supervised multi-agent pipeline. The single agent that classifies and reports becomes a specialist team: a classifier agent, an enrichment agent, a report-writer agent.
  • Unit 6 (Red Teaming): You will attack your own Unit 2 MCP server using the techniques from Unit 6. The tools you built are the targets.
  • Unit 7 (Hardening): The Cedar policies you wrote in Week 12 are deployed to Amazon Verified Permissions. Your Unit 2 MCP server gets production security hardening. The gaps your Unit 3 audit identified get closed.
  • Unit 8 (Capstone): Your Sprint II prototype is the capstone starting point. You're not starting from scratch — you're hardening and scaling what's already there.

✓ What you mastered

  • Sprint planning: spec-first, scope definition, integration estimation
  • Agentic prototype development under ethical constraints
  • Evaluation methodology: ground truth independence, adversarial test cases
  • Multi-agent architecture patterns (worktree-based, supervisor/specialist)

↻ What was introduced (returns later)

  • Production hardening (Unit 7)
  • Red team testing of your own tools (Unit 6)
  • Multi-agent orchestration at scale (Unit 5)

→ What's waiting next

Semester 2 begins with Unit 5 — your prototype enters a multi-agent pipeline, and the security evaluation gets serious.

Continue: Semester 2 Lab Guide — Unit 5: Multi-Agent Orchestration →