LLMs & Generative AI

Deep dive into the LLM landscape — model comparisons, API integration, prompt engineering mastery, and production agentic systems.

LLM Landscape & Comparison

The major model families, their strengths, and when to use each.

Interactive Comparison Matrix

| Model | Provider | Tag | Context | Input / 1M | Output / 1M | Strengths |
|-------|----------|-----|---------|------------|-------------|-----------|
| GPT-4o | OpenAI | Popular | 128K | $5 | $15 | Coding, Reasoning, Vision |
| Claude 3.5 Sonnet | Anthropic | Preferred | 200K | $3 | $15 | Long context, Safety, Writing |
| Gemini 2.5 Flash | Google | Fast | 1M | $0.15 | $0.60 | Speed, Long context, Cost |
| Gemini 2.5 Pro | Google | Power | 2M | $3.50 | $10.50 | Reasoning, Huge context, Code |
| Llama 3.3 70B | Meta (OSS) | OSS | 128K | Free* | Free* | Open source, Privacy, Fine-tunable |
| Phi-4 | Microsoft | Tiny | 16K | Free* | Free* | On-device, Small size, Reasoning |

* Hosting costs apply for self-hosted open-source models. Pricing approximate and subject to change.
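The per-million-token rates above translate directly into per-request cost estimates. A minimal sketch (the function name and the Claude 3.5 Sonnet rates in the example are taken from the matrix above; prices change, so treat this as arithmetic, not a quote):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Estimate request cost in dollars from per-1M-token rates."""
    return (input_tokens / 1_000_000) * in_rate_per_m \
         + (output_tokens / 1_000_000) * out_rate_per_m

# A 200K-token document plus a 50K-token response on Claude 3.5 Sonnet ($3/$15):
print(round(estimate_cost(200_000, 50_000, 3.00, 15.00), 2))  # → 1.35
```

The same arithmetic explains why routing cheap queries to a model like Gemini 2.5 Flash or Claude Haiku pays off at volume.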

Anthropic Claude API

The preferred model for long-context tasks, safety-critical applications, and tool use. Powers this platform's AI assistant.

| Tier | Model | Context | Input / Output per 1M | Best for |
|------|-------|---------|-----------------------|----------|
| Best | claude-3-5-sonnet-20241022 | 200K | $3 / $15 | Production workloads |
| Fast | claude-3-5-haiku-20241022 | 200K | $0.80 / $4 | High-volume, routing |
| Power | claude-3-opus-20240229 | 200K | $15 / $75 | Complex reasoning |

🐍claude_api.py
import anthropic

client = anthropic.Anthropic(api_key="your-key")

# Basic message
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a DoD financial analyst AI.",
    messages=[
        {"role": "user", "content": "Summarize FY2025 defense priorities"}
    ]
)
print(message.content[0].text)

# Tool use / Function calling
tools = [{
    "name": "search_budget_data",
    "description": "Search DoD budget database",
    "input_schema": {
        "type": "object",
        "properties": {
            "year": {"type": "integer", "description": "Fiscal year"},
            "service": {"type": "string", "description": "Military service branch"}
        },
        "required": ["year"]
    }
}]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What was Army FY2024 budget?"}]
)

Why Claude for This Platform

Claude's 200K token context window is ideal for analyzing long federal documents (NDAA, OMB Circulars). Constitutional AI training makes it particularly reliable for sensitive DoD policy questions.
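Before sending a long document, it helps to check whether it plausibly fits the 200K-token window. A rough heuristic is ~4 characters per token for English text (an approximation; use a real tokenizer for exact counts):

```python
def fits_context(text: str, context_tokens: int = 200_000,
                 chars_per_token: float = 4.0, reserve: int = 4_096) -> bool:
    """Rough check: does the text fit, leaving room for the model's response?"""
    est_tokens = len(text) / chars_per_token
    return est_tokens <= context_tokens - reserve

# ~700K characters ≈ 175K tokens fits; ~900K characters ≈ 225K does not
print(fits_context("x" * 700_000))  # → True
print(fits_context("x" * 900_000))  # → False
```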

Google Gemini API

Powers the MyThing platform agentic AI assistant with Google Search grounding and function calling.

🔷gemini_agent.ts
import { GoogleGenerativeAI } from '@google/generative-ai';

const genai = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);

// Function calling setup
const tools = [{
  functionDeclarations: [{
    name: "get_budget_data",
    description: "Retrieve DoD budget data",
    parameters: {
      type: "object",
      properties: {
        fiscal_year: { type: "string" },
        component: { type: "string" }
      }
    }
  }]
}];

async function runAgent(query: string) {
  const model = genai.getGenerativeModel({
    model: "gemini-2.0-flash",
    tools,
    systemInstruction: "You are a federal budget analyst AI."
  });

  const chat = model.startChat();
  const result = await chat.sendMessage(query);

  // Handle function calls
  const response = result.response;
  const parts = response.candidates?.[0]?.content?.parts || [];

  for (const part of parts) {
    if (part.functionCall) {
      console.log("Tool called:", part.functionCall.name);
      // Execute the function...
    }
  }

  return response.text();
}

Prompt Engineering Mastery

Production-tested techniques for getting the best results from LLMs.

🎯

Zero-Shot

No examples — direct instruction. Works well with capable models.

"Classify this document as UNCLASSIFIED, CUI, or SECRET."
📋

Few-Shot

Provide 2-5 examples before the task. Dramatically improves accuracy.

"Example 1: X → Y. Example 2: A → B. Now classify: C → ?"
🧠

Chain-of-Thought

Ask the model to reason step-by-step before answering.

"Think step by step about the budget implications..."
⚙️

System Prompts

Set role, constraints, output format at the system level.

"You are a DoD analyst. Always cite FM references. Output JSON."
🌳

Tree of Thoughts

Explore multiple reasoning paths, select the best.

Generate 3 approaches, evaluate each, pick the optimal.
📊

Structured Output

Force JSON/XML output for programmatic processing.

"Respond ONLY in JSON: {"risk_level": ..., "factors": [...]}"

Interactive Prompt Playground


Test prompts against Claude · Powered by Anthropic API


Agentic AI Systems

Building multi-agent systems with tool use, routing, and orchestration — based on the MyThing platform implementation.

🔄

ReAct Agent

Reason → Act → Observe loop. Tool use + reasoning interleaved.

📋

Plan & Execute

Generate a full plan first, then execute each step.

🗺️

Multi-Agent Router

Classify query → route to specialized agent. (MyThing pattern)

🪞

Reflexive Agent

Self-critique and revise before final output.
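The ReAct pattern above reduces to a loop: at each step the model either calls a tool or returns a final answer, and every tool result is fed back in as an observation. A minimal sketch, where `model` is a stand-in callable for a real LLM call (not any particular API):

```python
from typing import Callable

def react_loop(model: Callable[[list[dict]], dict],
               tools: dict[str, Callable[[str], str]],
               query: str, max_steps: int = 5) -> str:
    """Reason → Act → Observe until the model returns a final answer."""
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        # Reason: model returns {"tool": ..., "input": ...} or {"answer": ...}
        step = model(messages)
        if "answer" in step:
            return step["answer"]
        observation = tools[step["tool"]](step["input"])           # Act
        messages.append({"role": "tool", "content": observation})  # Observe
    return "Max steps reached without a final answer."
```

In production the `model` stub is replaced by a tool-use API call, as in the Claude function-calling example above; the `max_steps` cap guards against runaway loops.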

Case Study: MyThing Platform Agent Architecture

🐍multi_agent.py
# Multi-Agent System (MyThing Platform Pattern)
from anthropic import Anthropic

client = Anthropic()

def route_query(query: str) -> str:
    """Determine which specialized agent to use"""
    routing = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=50,
        system="Classify the query. Respond with ONLY one word: portfolio, tech, dod, or notes",
        messages=[{"role": "user", "content": query}]
    )
    return routing.content[0].text.strip().lower()

AGENT_CONFIGS = {
    "portfolio": "You are a portfolio analyst for Peter Shang's projects...",
    "tech": "You are a tech trends analyst with expertise in AI/ML...",
    "dod": "You are a DoD policy and federal finance expert...",
    "notes": "You help retrieve and summarize personal notes...",
}

def run_agent(query: str) -> str:
    agent_type = route_query(query)
    system_prompt = AGENT_CONFIGS.get(agent_type, AGENT_CONFIGS["tech"])

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": query}]
    )

    return f"[{agent_type.upper()} Agent] {response.content[0].text}"

# Usage
print(run_agent("What AI projects has Peter built?"))  # → portfolio
print(run_agent("Latest trends in agentic AI?"))        # → tech
print(run_agent("Explain FIAR audit requirements"))     # → dod

Production Implementation

This multi-agent pattern is live at shangthing.vercel.app — routing between Portfolio, Tech Trends, DoD Policy, and Notes agents using Gemini 2.5 with Google Search grounding.

RAG (Retrieval-Augmented Generation)

🐍rag.py
from anthropic import Anthropic
import numpy as np

# Simple RAG implementation
client = Anthropic()

def cosine_similarity(a: list, b: list) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def simple_rag(query: str, query_embedding: list, documents: list[dict]) -> str:
    """
    Retrieval-Augmented Generation with Claude
    query_embedding: vector for the query (produce with a real embedding API)
    documents: [{"source": "...", "content": "...", "embedding": [...]}]
    """
    # 1. Rank documents by similarity to the query
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )

    # 2. Build context from the top 3 relevant chunks
    context = "\n\n".join(
        f"[Source: {doc['source']}]\n{doc['content']}"
        for doc in ranked[:3]
    )

    # 3. Generate with context
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system="""Answer questions using ONLY the provided context.
        Cite sources when possible. If unsure, say so.""",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}"
        }]
    )

    return response.content[0].text
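The retrieval step at the heart of RAG is just ranking chunks by embedding similarity. A self-contained sketch with toy 3-dimensional vectors (real systems use embedding-API vectors with hundreds of dimensions, and the document contents here are placeholders):

```python
import numpy as np

def top_k(query_vec: list[float], docs: list[dict], k: int = 3) -> list[dict]:
    """Return the k documents whose embeddings are most similar to the query."""
    def cos(a, b):
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(docs, key=lambda d: cos(query_vec, d["embedding"]), reverse=True)[:k]

docs = [
    {"content": "NDAA summary",  "embedding": [1.0, 0.0, 0.0]},
    {"content": "OMB Circular",  "embedding": [0.0, 1.0, 0.0]},
    {"content": "Budget tables", "embedding": [0.9, 0.1, 0.0]},
]
print([d["content"] for d in top_k([1.0, 0.0, 0.0], docs, k=2)])
# → ['NDAA summary', 'Budget tables']
```

At scale this linear scan is replaced by a vector index, but the ranking criterion is the same.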