Unit 8: Capstone Projects

CSEC 602 — Semester 2 | Weeks 13–16



Unit Learning Goals

Capstone as Production Delivery: Your capstone is not a prototype showcase—it's a production delivery exercise. You'll apply the full prototype-to-production pipeline from Agentic Engineering: rapid prototyping (Weeks 13-14), leadership evaluation through architecture review, and production hardening (Weeks 14-15). By presentation day, your capstone demonstrates not just a clever idea but a deployable, observable, governed system ready for real-world use. Your reflection should articulate how you'd take this from demo to production: what monitoring would operators need? What policies would governance require? How would you handle failures? This is the mindset of a production engineer, not just a developer.

The deployment test: A required component of your capstone assessment is a written deployment justification. Assume you have one engineer-month and $10K to take this system from presentation to production. Write a phased plan: what ships first (MVP), what rolls out gradually, what's explicitly deferred and why, and how you'd handle a critical bug in production. If you can't answer these questions, the system isn't ready — regardless of how technically complete the implementation is. This question is what separates engineers who build demos from practitioners who deliver value.


Week 13: Capstone Kickoff and Architecture Reviews

Day 1 — Theory & Foundations: Project Selection and Architecture Design

Learning Objectives

Project Scope and Requirements

The capstone is your opportunity to demonstrate mastery of everything you've learned in CSEC 602. You'll work in teams of 2–3 to design and build a production-quality agentic security system that addresses a tangible cybersecurity problem.

Your capstone project must include:

  1. Multi-Agent Architecture — Minimum 3 specialized agents with distinct roles, expertise, and tool sets. Agents must communicate clearly and work toward a common goal.
  2. Collaborative Critical Thinking (CCT) Analysis — Documentation showing how the multi-agent design enables deeper reasoning, validates assumptions, and identifies risks that a single agent would miss.
  3. MITRE ATLAS Threat Model — Identify and mitigate the top 5 AI-specific threats to your system.
  4. Observability and Monitoring — Comprehensive logging, metrics, audit trails, and operational dashboards.
  5. Ethical Impact Assessment — Stakeholder analysis, potential misuse scenarios, and responsible AI alignment.
  6. AIUC-1 Domain Mapping — Map your capstone system against all six AIUC-1 domains (Data & Privacy, Security, Safety, Reliability, Accountability, Society). For each domain, document: which controls your system implements, which controls are not applicable (with justification), and what gaps remain. Reference: https://www.aiuc-1.com/
  7. AIVSS Risk Assessment — Score the top 5 AI-specific vulnerabilities in your system using OWASP AIVSS methodology. For each vulnerability: describe the risk, assign an AIVSS score, map it to the relevant AIUC-1 domain, and document your mitigation. Demonstrate how AIVSS scoring informed your prioritization decisions.
  8. Containerized Delivery — Your capstone must be deliverable as a containerized artifact, including:
    • Dockerfile with multi-stage build, non-root user, health checks
    • docker-compose.yml for local testing and development
    • Container image scanning (Trivy) results documenting any CVE findings and mitigations
    • Supply chain security (SBOM) in CycloneDX or SPDX format
  9. Infrastructure as Code (IaC) — CloudFormation or Terraform template showing how your system deploys to production (ECS task definition, Kubernetes manifests, or equivalent). IaC enables repeatable, versioned deployments.
  10. CI/CD Pipeline — GitHub Actions workflow demonstrating the DevSecOps promotion pipeline:
    • Pre-commit: secrets detection
    • PR review: SAST scanning (Bandit/Semgrep)
    • Build: container image scanning, SBOM generation
    • Deploy: promotion gates (dev → pilot → preprod → prod) with approval workflows
  11. Deployment Plan — Documentation on how this system scales to production, including operational runbook, incident playbook, and observability setup.
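The container requirements in component 8 (multi-stage build, non-root user, health check) can be sketched in a minimal Dockerfile. This is an illustrative sketch, not a mandated template: the base image, the /health endpoint, port 8080, and the app.py entrypoint are all placeholder assumptions you should replace with your own system's details.

```dockerfile
# Stage 1: build dependencies in a throwaway layer
FROM python:3.12-slim AS build
WORKDIR /app
COPY requirements.txt .
RUN pip install --prefix=/install -r requirements.txt

# Stage 2: minimal runtime image (no build tools, smaller attack surface)
FROM python:3.12-slim
COPY --from=build /install /usr/local
COPY . /app
WORKDIR /app

# Run as a non-root user, as the rubric requires
RUN useradd --create-home agent
USER agent

# Health check: assumes the app serves /health on port 8080
HEALTHCHECK --interval=30s --timeout=5s \
  CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8080/health')"

CMD ["python", "app.py"]
```

Trivy and the SBOM generator from component 8 run against the final stage only, which is one reason the multi-stage split matters: build-time dependencies never reach the scanned image.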

🔑 Key Concept: The capstone is not just about building a cool system—it's about demonstrating that you can engineer agentic security solutions with the same rigor as traditional software engineering. Production-quality means security, observability, documentation, and responsible AI built in from the start, not bolted on afterward.

Production-Promotable Capstone: By Week 16, your capstone must be ready to move from demo to production. This means: containerized and tested locally via docker-compose, with a complete CI/CD pipeline defined (GitHub Actions with all security gates), an IaC template ready for your ops team to deploy to ECS/Kubernetes, and documentation proving observability and incident response are designed in. Your capstone isn't just code; it's a deployable artifact with full provenance, governance, and operational readiness. If leadership said "deploy this Monday morning," your team could hand off a complete, hardened system—not a collection of notebooks and scripts.

Deployment Target: Cloud Infrastructure (Required)
Your capstone system must be deployed to production infrastructure — not running locally on Claude Code. The primary stack is Claude SDK for custom agent logic + Claude Managed Agents for hosted execution, deployed via Docker/containers to cloud infrastructure (AWS ECS, Lambda, or equivalent).

Pre-capstone checkpoint (do before Week 13 starts): Verify your Anthropic API key is active and you can make a basic client.messages.create(...) call. Confirm your container registry and deployment pipeline are configured. Do not discover environment issues on the first day of Week 13.
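The checkpoint above can be scripted so that environment issues surface as a readable list rather than a stack trace on kickoff day. In this sketch, ANTHROPIC_API_KEY is the SDK's standard environment variable; CONTAINER_REGISTRY is a hypothetical placeholder for however your own pipeline names its registry setting.

```python
import os

def preflight(env=None):
    """Return a list of missing prerequisites; an empty list means ready for Week 13."""
    env = os.environ if env is None else env
    missing = []
    if not env.get("ANTHROPIC_API_KEY"):
        missing.append("ANTHROPIC_API_KEY is not set")
    if not env.get("CONTAINER_REGISTRY"):  # hypothetical: your registry URL variable
        missing.append("container registry is not configured")
    return missing

# With the key configured, a minimal live smoke test looks like:
#   import anthropic
#   client = anthropic.Anthropic()
#   client.messages.create(model="claude-sonnet-4-20250514", max_tokens=16,
#                          messages=[{"role": "user", "content": "ping"}])
# (substitute whichever model your team actually uses)
```

Run this once per team member: a key that works on one laptop but not in the shared container is exactly the kind of issue this is meant to catch early.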

If Claude Managed Agents hosted execution is not yet available in your region, deploy as containerized agents with the Claude SDK — the agent logic is identical; only the hosting boundary changes.

Capstone Project Ideas

Here are concrete, achievable project ideas suitable for a 4-week capstone:

Autonomous SOC Analyst

Proactive Threat Hunting System

Automated Compliance Auditor

Intelligent Phishing Defense

Vulnerability Management Orchestrator

AI Red Team System

MASS Plugin Development

PeaRL Governance Extension

Discussion Prompt: In your team, discuss which project idea resonates with your interests. Why? What real-world problem would you want to solve? How would a multi-agent approach help where a single agent or traditional automation would fall short?

Further Reading: Review Framework documentation to understand available agent frameworks (Claude SDK, Claude Managed Agents, OpenAI Agents SDK) and how they support multi-agent patterns.

🔑 Key Concept: Both PeaRL and MASS are open source because their creator believes security should always be open to anyone to use. This isn't just ideology — it's sound engineering. Open-source security tools benefit from community review, diverse perspectives, and rapid improvement cycles. When you build your capstone, consider: would the security community benefit from your work being open? How does open-sourcing change your approach to code quality, documentation, and design?

Architecture Review Methodology

Week 13 is structured around a peer and faculty architecture review. Here's how it works:

Timeline:

What Reviewers Look For:

  1. Problem clarity — Is the cybersecurity problem well-defined and significant?
  2. Solution fit — Is an agentic multi-agent approach the right tool? Or is this overengineered?
  3. Technical feasibility — Can a team of 2–3 actually build this in 3 weeks? (Scope is critical!)
  4. Architectural soundness — Do the agents have clear roles? Is orchestration realistic?
  5. Security thinking — Do you demonstrate understanding of threat models and hardening?
  6. Ethical awareness — Have you thought through potential harms and misuse?

Common Pitfall: Over-scoping. Many teams try to build a system that would take 6 months. Scope ruthlessly. A simple, well-executed 3-agent system beats an incomplete 10-agent vision. Ask your reviewers: "What's the minimum viable product that still demonstrates the concepts?"

Day 2 — Hands-On Lab: Proposal Development and Peer Review

Lab Objectives

Step 1: Form Teams (Due Wednesday)

Submit to faculty:

Pro Tip: Choose a co-lead architect and lead developer early. Assign one person to champion security/hardening and one to champion observability/ops. These aren't "nice to have" roles—they're critical to your grade.

Step 2: Write Your Proposal (Due Thursday)

Format: 500–1000 words

Content:

  1. The Problem (2–3 paragraphs): What cybersecurity challenge are you solving? Why does it matter? How is it currently addressed? What are the gaps?
  2. Why Multi-Agent? (1 paragraph): Why is a multi-agent approach better than a single agent or traditional automation?
  3. Proposed Solution (2 paragraphs): High-level overview of your system. What does it do? Who uses it? What are the main workflows?
  4. Success Metrics (1 paragraph): How will you know your system works? What are 3–5 key metrics (accuracy, latency, cost, false positive rate, etc.)?

Remember: A proposal is a sales pitch. You're convincing your reviewers (and yourself) that this is worth 4 weeks of intensive work. Be specific. Use numbers and examples.

Step 3: Develop Your Architecture Document (Due Thursday)

Format: 1500–2500 words (this is substantial; start early)

Methodology: Your capstone follows the Think → Spec → Build → Retro cycle. Week 13 is the Think + Spec phase (critical analysis, architecture review, and formal specification of your design decisions). Weeks 14-15 are the Build phase (rapid development with Claude Code using /worktree-setup for isolated parallel work). The red team review closes the Retro phase (external validation and hardening). By Week 16, you've completed a full cycle and can reflect on how iteration improved your system.

Structure:

1. System Overview (200 words)

2. Multi-Agent Design (600 words)

🔑 Key Concept: Good multi-agent design is about separation of concerns. Each agent should have a clear, bounded role. Agent A doesn't try to do everything; it calls Agent B when specialized expertise is needed. This mirrors how human teams work. The Pit of Success principle from Agentic Engineering principles means designing your multi-agent system so the right behavior (agents respecting role boundaries, escalating appropriately, handling failures gracefully) emerges naturally from the architecture, not from constant oversight.

3. Collaborative Critical Thinking (CCT) Analysis (400 words)

Pro Tip: CCT isn't abstract. Show concrete examples. Don't just say "agents discuss threats." Say: "Agent A (Alert Triager) flags alert severity as LOW. Agent B (Threat Analyst) reviews threat intel and overrides to CRITICAL because this IP just attacked 3 other companies in our industry." That's CCT in action.

4. Security Hardening Plan (400 words)

5. Observability Plan (300 words)

6. Deployment Plan (300 words)

7. Ethical Considerations (300 words)

8. Success Criteria (200 words)

Design Thinking: As you finalize your capstone architecture, reflect on the mental models that underpin it. Agentic Engineering principles ask: What assumptions are you making about how users will interact with your system? How will operators understand what went wrong? Are you designing for the cognitive model of your users or against it? Use these questions to stress-test your architecture before building.

9. Timeline and Milestones (100 words)

Step 4: Present Your Architecture (Thursday Afternoon)

Format: 15-minute presentation + 15-minute feedback/Q&A

Presentation structure (aim for ~10 slides):

  1. Problem and context (1–2 slides)
  2. Proposed solution overview (1 slide)
  3. Multi-agent architecture (2 slides: agent roles + orchestration diagram)
  4. CCT analysis — concrete example (1 slide)
  5. Security hardening plan (1 slide)
  6. Observability approach (1 slide)
  7. Timeline and risks (1 slide)
  8. Questions?

Common Pitfall: Slides that are text-heavy or too technical. Reviewers want to understand your vision in 15 minutes. Use diagrams. Show your system architecture visually. Practice beforehand and time yourself.

Pro Tip: In the Q&A, be honest about unknowns. "We haven't decided on framework yet, but we're between Claude SDK and Claude Managed Agents because..." is better than "We'll use whatever works." Reviewers respect intellectual honesty.

Step 5: Incorporate Feedback and Finalize (Thursday–Friday)

After your presentation, you'll receive written feedback from reviewers focusing on:

Action: Meet with your team Friday. Read feedback. Refine your architecture document and confirm:

Deliverables (Due Friday)

  1. Capstone Proposal (500–1000 words)
  2. Architecture Document (1500–2500 words)
  3. Presentation Slides (PDF)
  4. Peer Review Feedback Summary (1 page: what did you learn? what did you change?)

Sources & Tools


Week 14: Capstone Development Sprint I

Day 1 — Daily Standup Check-In

Learning Objectives

Structure (15 minutes daily)

Each team answers:

  1. What did you complete yesterday? (Focus on working code, not just effort)
  2. What's your plan for today?
  3. What's blocking you? (Faculty can help)

🔑 Key Concept: Standups are a team synchronization tool, not a status report to management. Keep them tight. If a blocker needs deep discussion, take it offline after standup.

Mid-Week Checkpoint (Wednesday, 30 minutes)

Each team demos progress to faculty:

Day 2 — Hands-On Development Sprint

Lab Objectives

Development Focus for Sprint I

Week 14 is about getting the minimum viable product (MVP) working:

  1. Implement core agents — Each team member builds 1–2 agents. Ensure they can communicate.
  2. Establish data flows — Data moves from one agent to the next; end-to-end workflow completes.
  3. Deploy basic tools — If agents call external APIs or tools, get those integrated.
  4. Add logging and monitoring — Every agent decision should be logged; set up basic metrics.
  5. Get to "working" — The system doesn't need to be perfect, but it should run end-to-end without crashing.
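One lightweight way to satisfy step 4 is a structured audit record per agent decision. A minimal sketch, assuming a flat JSON schema; the field names here are illustrative, not course-mandated:

```python
import json
import logging
import time

audit = logging.getLogger("agent.audit")

def log_decision(agent: str, action: str, inputs: dict, outcome: str) -> dict:
    """Emit one JSON audit record per agent decision and return it."""
    record = {
        "ts": time.time(),   # when the decision was made
        "agent": agent,      # which agent decided
        "action": action,    # what it did (e.g. "triage_alert")
        "inputs": inputs,    # the evidence it acted on
        "outcome": outcome,  # the result (e.g. "severity=LOW")
    }
    audit.info(json.dumps(record))
    return record
```

Counting these records per agent per hour then gives you the "basic metrics" half of step 4 almost for free, and the same records feed the Week 15 observability dashboard.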

Common Pitfall: Perfectionism in Week 14. Don't spend 3 days optimizing agent prompts when you haven't built the orchestration layer yet. Build the skeleton first; refine later.

Pro Tip: Use Claude Code and Git heavily. Create a branch for each agent. Use pull requests for code review. Maintain a clear README so any team member can spin up the environment. You'll thank yourself in Week 16 when you need to demo quickly.

Static Review vs. Dynamic QA — Testing the Running System

/code-review and /audit-aiuc1 analyze your code for vulnerabilities and compliance gaps. But code that looks correct can still break at runtime. Anthropic's engineering team adds a Playwright MCP to their evaluator agent, letting it navigate the running application like a real user — clicking through features, submitting inputs, and verifying outputs against expected behavior.

For your capstone: if your system has a security dashboard, alert triage UI, or any user-facing interface, connect the Playwright MCP to your evaluator agent and have it test the live deployment — not just the source code. This catches "the agent runs but produces wrong findings" or "the dashboard loads but displays stale data" — issues that static review will always miss.

For API-only systems (no UI): focus dynamic QA on the MCP server endpoints and agent output schemas. Have a separate evaluator agent call the tools directly with edge-case inputs and verify outputs meet the declared schemas. Source: Anthropic Engineering, "Harness design for long-running application development," March 2026.
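A minimal sketch of that evaluator pattern, assuming tools are plain Python callables and declared schemas are simple field-to-type maps; a production system might use jsonschema or pydantic instead:

```python
# Illustrative edge cases: empty input, oversized input, path traversal,
# and a template-injection-style payload
EDGE_CASES = ["", "A" * 10_000, "../../etc/passwd", "{{injected}}"]

def validate_output(payload: dict, schema: dict):
    """Check one tool response against its declared field types."""
    for field, expected_type in schema.items():
        if field not in payload:
            return False, f"missing field: {field}"
        if not isinstance(payload[field], expected_type):
            return False, f"wrong type for field: {field}"
    return True, "ok"

def fuzz_tool(tool, schema: dict):
    """Call a tool with edge-case inputs; return (input, reason) per violation."""
    failures = []
    for case in EDGE_CASES:
        ok, reason = validate_output(tool(case), schema)
        if not ok:
            failures.append((case, reason))
    return failures
```

An empty return from fuzz_tool means the schema held under every edge case; any failures go straight into your Sprint I progress report as dynamic QA findings.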

Deliverable: Sprint I Progress Report (Due Friday)

Format: 2–3 pages, including:

  1. Implementation Status:
    • List agents implemented (with % complete for each)
    • Working end-to-end workflows
    • What's in progress or deferred

  2. Code & Artifacts:

    • Link to GitHub repo
    • README with "how to run" instructions
    • Demo or screenshot of working system

  3. Metrics:

    • Lines of code written (rough estimate)
    • Number of agents deployed
    • Functionality coverage (e.g., "70% of design implemented")

  4. Obstacles & Adjustments:

    • What challenges did you hit? How did you solve them?
    • Any scope or architecture adjustments?
    • Risk assessment: What might not make it?

  5. Plan for Sprint II:

    • What will you focus on in Week 15?
    • How will you prepare for the red team review?

Remember: This report is not just for your instructors—it's for your team. Be honest about what's working and what's not. If you're behind, now's the time to course-correct.


Week 15: Capstone Development Sprint II and Red Team Review

Day 1 — Sprint II Kickoff and Red Team Assignment

Learning Objectives

Red Team Review Overview

On Wednesday of Week 15, your team will conduct a peer security review of another team's capstone project. Simultaneously, another team will red team your system. This is a time-boxed exercise designed to surface vulnerabilities through adversarial thinking.

What Red Teamers Will Do:

  1. Review your architecture and threat model
  2. Attempt 3–5 common attacks
  3. Document findings with evidence and severity ratings

How This Helps You:

🔑 Key Concept: Red team reviews are constructive, not punitive. The goal is to make your system better. Reviewers are peers, not adversaries. Treat findings as gifts—they show you where to focus hardening effort.

Deployment Freeze

On Day 1 of Week 15, teams finalize their production deployments. After the freeze, no changes until red team results are received. Deployment must include:

  1. Working security agent system (3+ agents with distinct IAM roles, real MCP connections)
  2. Full governance stack: IAM per-agent, guardrails layer (NeMo Guardrails or equivalent), observability dashboard, SBOM
  3. Documentation package:
    • Architecture diagram (agents, tools, data flows)
    • Security controls matrix (every control, which layer, inside/outside reasoning loop)
    • AIUC-1 domain mapping (which domains covered, which controls)
    • AWS scoping matrix position (GenAI Scope × Agentic Scope)
    • Cost model (estimated monthly cost at projected usage)
    • Known limitations and accepted risks
  4. Access package for red team: read-only observer role + scoped attacker role (instructor-configured IAM permission boundary — scope is limited to your team's sandbox)

Red Team Assignment

Each team receives ANOTHER team's deployment to red team. Red teams have 48 hours (Day 1 afternoon through Day 2). Teams work from the architecture documentation and test the production deployment.

Red Team Methodology: OWASP Agentic Top 10

Test each OWASP Agentic risk against the production deployment. For each finding: OWASP AIVSS severity score, OWASP Agentic risk number, defense layer exploited (L1/L2/L3/L4), inside or outside the reasoning loop, recommended fix with specific implementation guidance.

Scope boundary: Supply chain testing (#7) is theoretical only — analyze whether dependencies are hash-pinned and document the blast radius if one were compromised. Do not attempt to modify packages or dependencies in another team's environment. This is a finding-and-reporting exercise, not a destructive penetration test.

Phase 1: Reconnaissance (2 hours)
Phase 2: Vulnerability Testing (3 hours)
For each OWASP Agentic risk, what to test:

  #1 Excessive Agency: Can any agent take actions beyond its stated scope? What's the blast radius of a single agent error? Try to exceed failure caps.
  #2 Insufficient Guardrails: Test the guardrails layer (NeMo Guardrails / system prompt defenses) with adversarial inputs from Unit 6. Can you bypass content filtering?
  #3 Insecure Tool Integration: Path traversal on file-reading tools. Command injection on bash-executing tools. Are inputs validated before execution?
  #4 Lack of Output Validation: Can you make the agent produce findings with fabricated evidence? Does output schema enforcement hold?
  #5 Prompt Injection: Inject via tool outputs. Inject via data the agent retrieves (RAG poisoning). Test indirect injection through MCP server responses.
  #6 Memory Poisoning: If the agent has persistent memory, can you corrupt it? Can you inject false context that affects future decisions?
  #7 Supply Chain: Theoretical only. Are dependencies hash-pinned? Is the SBOM complete? Document blast radius — do not modify packages.
  #8 Insufficient Logging: Make the agent do something anomalous. Can the blue team detect it from the observability dashboard?
  #9 Over-reliance: Make the agent produce a plausible but wrong finding. Would a human analyst catch the error from the output alone?
  #10 Inadequate IAM: Can one agent's credentials access another agent's resources? Are there shared credentials? Try to escalate from attacker role.
Phase 3: Advanced Testing (2 hours)
Phase 4: Report (1 hour)

For each finding:

Deliverable: Red team report suitable for CISO briefing — executive summary (Critical/High count, most significant finding) + detailed findings table.

Day 2 — Hardening Preparation

Teams continue red teaming (48-hour window). Blue teams use Day 2 to prepare for the hardening response — reviewing their own system with the red team methodology to anticipate findings before the report arrives.

Pro tip: Run the OWASP Agentic Top 10 table against your OWN system now. Any finding you identify and fix before receiving the red team report is a finding you already remediated — and that shows up in your hardening response as proactive, not reactive.

Day 2 — Sprint II Development and Hardening

Lab Objectives

Production Hardening: Week 15 applies the production hardening practices from Agentic Engineering (Ch. 7: Practices — Production Concerns). You're not just fixing bugs; you're ensuring your system can run reliably under load, with visible observability, clear error messages, and graceful degradation. By end of this week, your system should be deployment-ready, not prototype-ready. That distinction matters.

Hardening Checklist

By end of week 15, your system should address:

Input Validation:

Output Filtering:

Tool Permission Scoping:

Monitoring & Alerts:

Error Handling:

Pro Tip: Don't try to prevent every possible attack. Instead, focus on defense in depth: multiple layers of protection (validation, filtering, monitoring, logging). If one layer fails, others catch it. Plus, comprehensive logging means you can detect attacks even if they partially succeed.
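The layering idea can be made concrete: run every input through an ordered list of checks, block on the first veto, but keep running the remaining layers so even a partially successful attack leaves a complete log. A toy sketch; the layer names and checks below are illustrative stand-ins for your real validators:

```python
def defense_in_depth(text: str, layers):
    """Apply ordered (name, check) layers; return the verdict plus a full log.

    Every layer runs even after a veto, so partial attacks still leave
    a complete audit trail for the monitoring layer to inspect."""
    log = []
    verdict = "allow"
    for name, check in layers:
        passed = bool(check(text))
        log.append((name, passed))
        if not passed and verdict == "allow":
            verdict = f"blocked:{name}"  # first failing layer names the block
    return verdict, log

# Example layers: input validation, then a crude injection filter
LAYERS = [
    ("length", lambda t: len(t) < 4096),
    ("no_shell_meta", lambda t: ";" not in t),
]
```

The returned log is what makes the Pro Tip's detection claim real: a request that passes one layer but trips another still shows up in full, even though it never reached the agent.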

Red Team Findings Response

By Thursday, you'll receive the red team report. Action plan:

  1. Read and categorize — Which findings are valid? Which are misunderstandings of the system?
  2. Prioritize — Fix critical/high severity before presentation. Medium/low can be documented as "accepted risk."
  3. Mitigate or document — Either implement a fix or document why you're accepting the risk (e.g., "This attack requires admin access, which is out of scope for this MVP").
  4. Test your fixes — Make sure mitigations actually work.

Remember: You don't need to fix every finding. But for every finding you don't fix, you need a good reason (documented in your final presentation).

Deliverable: Sprint II Progress Report (Due Friday)

Format: 3–4 pages

  1. Hardening Summary:
    • Security improvements implemented (with brief description)
    • Red team findings and responses (table: finding, severity, status)
    • Any risks you're accepting

  2. Observability Implementation:

    • Monitoring dashboard or reporting system deployed
    • Key metrics defined and tracked
    • Audit logging configured

  3. Code Quality:

    • Code review completed (peer review of pull requests)
    • Documentation updated
    • Test coverage (unit tests, integration tests)

  4. Performance Metrics:

    • Track 5 key metrics from Week 14 to Week 15 (show improvement if possible)
    • Examples: accuracy, latency, cost, false positive rate, uptime

  5. Readiness Assessment:

    • % of architecture implemented and tested
    • Remaining work for Week 16
    • Risks: "What might not be done by presentation day?"

Sources & Tools


Week 16: Capstone Presentations and Course Wrap

Day 1 — Defense Hardening

Morning: Receive Red Team Report

Each team receives the red team report on their system. You have 4 hours to triage, fix, and document:

  1. Triage findings by severity (Critical → High → Medium → Low)
  2. Fix Critical and High findings — focus on code and configuration changes; document the fix plan for any finding requiring infrastructure changes that take longer than 4 hours
  3. For Medium/Low: document accepted risk with rationale (why it's acceptable, what would change the calculus)
  4. For each fix: what defense layer does it operate at? Is it inside or outside the reasoning loop? How do you verify it holds?
  5. Re-run the specific attacks that found the vulnerabilities — verify fixes hold

Scope the hardening realistically. Fixing an IAM misconfiguration, reconfiguring a guardrails layer, or redeploying a containerized agent takes longer than fixing a code vulnerability. For infrastructure-level fixes, provide the fix plan + code change — your instructor will verify the approach is correct. What matters is that you understand what the fix is and why it addresses the finding.

Afternoon: Final Verification and Presentation Prep

Reflection Essay (1000–1500 words, Due Friday)

Write a reflection on your capstone experience:

  1. What you learned about agentic AI — What surprised you? What challenges did you face?
  2. How your thinking evolved — When you started, what did you think agentic systems could do? Now?
  3. Your hardening journey — What vulnerabilities did you discover? How did you think about security differently?
  4. Ethical implications and AIUC-1 alignment — What are the risks of deploying this system? How does your AIUC-1 domain mapping reveal gaps in your governance approach? Which AIUC-1 domain was hardest to address, and why?
  5. Production readiness — What would need to happen before this system could run in a real organization? What observability, governance, or operational procedures would teams need? What could go wrong, and how would operators detect and respond to it?
  6. The bigger picture — What are the implications of agentic security systems for the field of cybersecurity?

From Prototype to Production: Use your reflection to articulate the prototype-to-production journey your capstone has taken. How did your system evolve from an idea (Week 13) to a working implementation (Week 14) to a hardened, observable system ready for deployment (Week 16)? What did you learn about building production systems that you didn't know before? This reflection isn't just introspection—it's documentation of your growth as an engineer.

🔑 Key Concept: The reflection isn't a summary of your system. It's introspection. Think of it as a letter to yourself or to future practitioners building agentic security systems. What do you wish you had known at the start? What will your experience teach others?

Day 2 — Presentations and Course Retrospective

Capstone Presentations (Thursday)

Schedule: Each team presents 25 min. All faculty and students attend.

Presentation Format: 25 Minutes Per Team

Audience: simulated CISO, compliance officer, and engineering director.

  1. System Overview (5 min): What does it do? What security problem does it solve? Architecture diagram. AWS scoping matrix position. Dark Factory maturity assessment (see Unit 7).
  2. Security Architecture (5 min): Four-layer defense model applied. Controls matrix: every control, layer, inside/outside reasoning loop. AIUC-1 domain coverage. What's enforced (L2-4) vs guidance (L1)?
  3. Red Team Results (5 min): What was found? OWASP AIVSS severity breakdown. Most interesting/surprising finding. What the red team did NOT find — and why your controls worked.
  4. Hardening Response (5 min): How you fixed Critical/High. What you accepted as residual risk and why. Before/after security posture comparison.
  5. Production Readiness (3 min): Cost model at projected scale. Observability: what you monitor, what triggers alerts. Supply chain: dependency audit results. What would need to change for fully delegated autonomous operation?
  6. Q&A (2 min): Panel questions from CISO/compliance perspective.

Evaluation Rubric (40% of course grade):

  Technical Sophistication (30%): Complexity and depth of multi-agent architecture; proper use of patterns; code quality
  CCT Application (20%): Quality of critical thinking analysis; how agents enable reasoning; concrete examples of agent collaboration
  Security Hardening (20%): Strength of threat model; identified and mitigated vulnerabilities; response to red team findings
  Ethical Considerations (15%): Stakeholder analysis; potential harms identified; responsible AI principles demonstrated
  Practical Applicability (10%): Real-world relevance; feasibility of deployment; operational readiness
  Presentation Quality (5%): Clarity, organization, time management, ability to engage audience

Pro Tip: Q&A is part of your grade. Be humble. If you don't know the answer to a question, say so. Offer to research and follow up. Defensive answers lose points.

Capstone Project Deliverables (Due Friday, 5 PM)

Submit to faculty:

  1. Source Code (GitHub)
    • Clean, well-organized repository
    • Comprehensive README with setup and usage instructions
    • CI/CD pipeline configuration
    • Deployment scripts / Dockerfiles
    • .gitignore properly configured (no API keys, secrets, or large files)

  2. Technical Documentation

    • System architecture and design document (updated from Week 13)
    • Multi-agent design and orchestration details
    • API and tool documentation
    • Configuration reference

  3. Security Documentation

    • MITRE ATLAS threat model (summary)
    • Security hardening measures (with implementation details)
    • Red team findings and your responses
    • Security deployment checklist
    • AIUC-1 domain mapping (all six domains with control coverage assessment)
    • AIVSS risk scores for top 5 AI-specific vulnerabilities

  4. Observability & Operations

    • Monitoring and metrics documentation
    • Operations runbook (how to troubleshoot, deploy, scale)
    • Incident response procedures
    • Cost tracking and optimization

  5. Presentation Materials

    • Slides (PDF)
    • Architecture diagrams (in presentation and standalone)
    • Demo video (backup if live demo isn't possible)

  6. Reflection Paper

    • 1000–1500 words
    • Address prompts listed in Day 1 section above

Remember: This is a portfolio piece. These deliverables will be evidence of your mastery of agentic security engineering. Make them clear, professional, and complete.

Course Retrospective (Friday Afternoon, 2 Hours)

All students and faculty gather to reflect on the course and capstone projects.

Structure:

1. Key Takeaways (Each student shares 1–2 minutes)

2. Cohort Themes (Faculty synthesizes)

3. Where Is the Field Heading? (Discussion)

4. Course Feedback

Pro Tip: Be honest in the retrospective. Your feedback directly shapes future iterations of this course. We're building a curriculum together.


Context Library: Your Professional Toolkit

As you finish CSEC 602, your context library has evolved from a personal reference collection into a professional-grade toolkit. This toolkit—combined with your deep knowledge of AI security—is your competitive advantage in any security role.

The Capstone: Context Library as Deliverable

Your context library is now a formal component of your capstone evaluation. As part of your final presentation and deliverables:

Include a section titled "Context Library":

  1. Directory Structure — Show how you've organized patterns (screenshot or tree output)
  2. Key Artifacts — List the 5-10 highest-value patterns you've captured (supervisor pattern, defense layers, CI/CD pipeline, etc.)
  3. Breadth — How many domains does your library cover? (Multi-agent patterns, red team, blue team, DevOps, observability, security hardening)
  4. Depth — Pick 2-3 patterns and show how they've evolved from Unit 5 through Unit 8 (version history, refinements based on lessons learned)
  5. Composability — Demonstrate how patterns combine (e.g., your CI/CD pipeline + canary deployment + observability config = a complete deployment system)
  6. Team-Readiness — Would a teammate or junior engineer be able to use your library? Is it documented?
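Team-readiness can be checked mechanically. As a hedged sketch (the `lint_library` function name and the required-file set are assumptions for illustration, not a course requirement), a small linter like this can verify that every pattern directory in your library is documented before you share it:

```python
from pathlib import Path

# Files every pattern directory should contain before it counts as
# "team-ready". This set is an assumption -- adapt it to your own library.
REQUIRED_FILES = {"README.md", "CHANGELOG.md"}

def lint_library(root: str) -> list[str]:
    """Return a list of team-readiness issues found under a library root.

    Each immediate subdirectory is treated as one pattern; a pattern is
    flagged once for every required file it is missing.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return [f"{root}: not a directory"]
    issues = []
    for pattern_dir in sorted(p for p in root_path.iterdir() if p.is_dir()):
        present = {f.name for f in pattern_dir.iterdir() if f.is_file()}
        for missing in sorted(REQUIRED_FILES - present):
            issues.append(f"{pattern_dir.name}: missing {missing}")
    return issues
```

Running a check like this in CI for your library repo turns "is it documented?" from a judgment call into a gate a teammate can trust.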

Evaluation Criteria for Context Library

Your library will be evaluated on:

| Criterion | What We're Looking For | Example |
| --- | --- | --- |
| Breadth | Coverage across domains | Library includes multi-agent, red team, blue team, DevOps, and observability patterns |
| Depth | Quality and completeness of individual patterns | Supervisor pattern includes code, decision rationale, usage examples, and common pitfalls |
| Iteration | Evidence of refinement over the semester | Supervisor pattern v1.0 (Unit 5) → v1.2 (Unit 6) → v2.0 (Unit 8) with changelog explaining improvements |
| Composability | Patterns work together, not in isolation | Your CI/CD pipeline references your Dockerfile template; both work with your canary deployment script |
| Documentation | Teammates could use your library | README explains what each pattern solves, how to use it, when to use it, and how to customize it |
| Production Quality | Patterns are ready for real deployment | Your Dockerfile isn't a learning exercise; it's a hardened, secure base image for actual systems |
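To make the Documentation and Iteration criteria concrete, one possible per-pattern README skeleton (a hypothetical template, not a required format — the headings and pattern name are illustrative) looks like this:

```markdown
# Supervisor Orchestration Pattern (v2.0)

**Problem it solves:** Coordinating 3+ specialized agents under one routing and escalation policy.
**When to use:** Distinct agent roles with a clear hierarchy of decisions.
**When NOT to use:** Two agents with a simple request/response relationship.
**How to use:** Steps to adopt the pattern, with a minimal working example.
**Customization points:** What to change for your own agents and tools.
**Changelog:** v1.0 (Unit 5) → v1.2 (Unit 6) → v2.0 (Unit 8), with the reason for each change.
```

A skeleton like this makes every pattern answer the same questions, which is exactly what lets a teammate use your library without asking you.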

What Makes a Senior Professional

The capstone isn't just about building one great system. It's about demonstrating that you can build systems and share the patterns you've learned so others benefit.

A junior engineer builds a system and moves on. A senior engineer builds a system, extracts reusable patterns, documents them, and shares them so the whole team gets better.

Your context library is proof you think like a senior engineer. You're not just solving today's problem; you're building tools for tomorrow's problems.

Your Library After Graduation

Immediately after CSEC 602:

In your first security role:

Over your career:

Template for Final Submission

In your capstone presentation and deliverables, include:

Slide: "Context Library: My Professional Toolkit"

What I've Captured

Multi-Agent Patterns (Unit 5):
  ✓ Supervisor orchestration pattern
  ✓ Agent communication protocol
  ✓ Framework selection guide
  ✓ Evaluation harness template

Attack & Defense Playbooks (Unit 6):
  ✓ Attack templates with evasion techniques
  ✓ Defense layer configurations
  ✓ Incident response runbook
  ✓ Scoring rubric for security assessments

Production Engineering (Unit 7):
  ✓ CI/CD pipeline (GitHub Actions)
  ✓ Production Dockerfile (multi-stage, hardened)
  ✓ Canary deployment script
  ✓ Observability and metrics configuration

Security Hardening (Unit 8 Capstone):
  ✓ MITRE ATLAS threat model template
  ✓ Input validation and output filtering patterns
  ✓ Red team findings response template
  ✓ Security deployment checklist

📊 Library Metrics

  Patterns captured: 40+
  Domains covered: 6 (multi-agent, red team, blue team, DevOps, ops, observability)
  Lines of code/documentation: 10,000+
  Version iterations: 2-3 per pattern (showing evolution and refinement)
  Team-ready patterns: 15 (documented and ready to share)

Why This Matters

Your context library is not academic. Every pattern was discovered and tested through hands-on (simulated but realistic) security work. When you use these patterns in production, you're deploying knowledge earned the hard way.

Final Reflection Question

In your capstone reflection essay, address:

"Describe your context library. What patterns are you proudest of? How have they evolved since Unit 5? How would a teammate use your library in their first week on a project? What makes your library a reflection of your professional standards?"

This question isn't about boasting. It's about demonstrating that you've thought deeply about quality, reusability, and scalability—the hallmarks of professional engineering.

The Bigger Picture

You're leaving CSEC 602 with two things:

  1. Deep Knowledge — You understand agentic AI security at a level most practitioners will never reach. You've designed attacks, built defenses, orchestrated agents, and deployed systems. This knowledge is in your head.

  2. Professional Toolkit — You have a context library of patterns proven to work. This library is your competitive advantage. When you face a new problem, you don't start from scratch. You pull a pattern from your library, adapt it, and build faster and better than peers without your toolkit.

The knowledge is invaluable, but the toolkit compounds over a career. Treat your context library with the care you would a production system.


Final Evaluation and Grade Determination

Your final grade in CSEC 602 is calculated as:

Capstone grade components:

Capstone Grading Rubric

| Component | Weight | Key Criteria |
| --- | --- | --- |
| Deployed System | 25% | Works in production (cloud infrastructure/containers), 3+ agents with distinct IAM roles, real MCP connections, guardrails layer configured |
| Security Governance Package | 20% | Controls matrix complete and accurate, AIUC-1 mapped with evidence, PeaRL Delegated Autonomous gate verified, SBOM present |
| Red Team Report (attacking) | 20% | All 10 OWASP Agentic risks tested, OWASP AIVSS scored, defense layer classified, fixes recommended with implementation detail |
| Hardening Response (defending) | 15% | Critical/High findings fixed or documented fix plan provided, residual risk documented with rationale, verification evidence present |
| Presentation | 15% | Clear narrative, CISO-appropriate framing, demonstrates understanding of tradeoffs, not just feature demo |
| Code Quality | 5% | Clean and documented, hash-pinned requirements.txt, REVIEW.md present, CI/CD pipeline configured |

Key Concept: This is a mastery-based grading course. You're evaluated on depth of understanding and quality of work, not just completion. A simple system well-designed and well-documented scores higher than an ambitious system with gaps.

Next Steps After CSEC 602

Congratulations! You've completed a graduate-level course in agentic security engineering. Here's how to continue your journey:

Publish Your Work

Contribute to the Field

Keep Learning

Professional Growth


Key Resources


Course Contact: For questions about the capstone or Unit 8, reach out to course faculty. Office hours are posted on the course homepage.