LLMs & Generative AI

Deep dive into the LLM landscape — model comparisons, API integration, prompt engineering mastery, and production agentic systems.

LLM Landscape & Comparison

The major model families, their strengths, and when to use each.

Interactive Comparison Matrix

| Model | Provider | Tag | Context | Input / 1M | Output / 1M | Strengths |
|-------|----------|-----|---------|------------|-------------|-----------|
| GPT-4o | OpenAI | Popular | 128K | $5 | $15 | Coding, Reasoning, Vision |
| Claude 3.5 Sonnet | Anthropic | Preferred | 200K | $3 | $15 | Long context, Safety, Writing |
| Gemini 2.5 Flash | Google | Fast | 1M | $0.15 | $0.60 | Speed, Long context, Cost |
| Gemini 2.5 Pro | Google | Power | 2M | $3.50 | $10.50 | Reasoning, Huge context, Code |
| Llama 3.3 70B | Meta (OSS) | OSS | 128K | Free* | Free* | Open source, Privacy, Fine-tunable |
| Phi-4 | Microsoft | Tiny | 16K | Free* | Free* | On-device, Small size, Reasoning |

* Hosting costs apply for self-hosted open-source models. Pricing approximate and subject to change.
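The per-million-token rates above translate directly into per-request cost estimates. A minimal sketch (the function name and the Claude 3.5 Sonnet rates in the example are taken from the matrix above; prices change, so treat this as arithmetic, not a quote):

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Estimate request cost in dollars from per-1M-token rates."""
    return (input_tokens / 1_000_000) * in_rate_per_m \
         + (output_tokens / 1_000_000) * out_rate_per_m

# A 200K-token document plus a 50K-token response on Claude 3.5 Sonnet ($3/$15):
print(round(estimate_cost(200_000, 50_000, 3.00, 15.00), 2))  # → 1.35
```

The same arithmetic explains why routing cheap queries to a model like Gemini 2.5 Flash or Claude Haiku pays off at volume.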

Anthropic Claude API

The preferred model for long-context tasks, safety-critical applications, and tool use. Powers this platform's AI assistant.

| Tier | Model | Context | Input / Output per 1M | Best for |
|------|-------|---------|-----------------------|----------|
| Best | claude-3-5-sonnet-20241022 | 200K | $3 / $15 | Production workloads |
| Fast | claude-3-5-haiku-20241022 | 200K | $0.80 / $4 | High-volume, routing |
| Power | claude-3-opus-20240229 | 200K | $15 / $75 | Complex reasoning |

🐍claude_api.py
import anthropic

client = anthropic.Anthropic(api_key="your-key")

# Basic message
message = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    system="You are a DoD financial analyst AI.",
    messages=[
        {"role": "user", "content": "Summarize FY2025 defense priorities"}
    ]
)
print(message.content[0].text)

# Tool use / Function calling
tools = [{
    "name": "search_budget_data",
    "description": "Search DoD budget database",
    "input_schema": {
        "type": "object",
        "properties": {
            "year": {"type": "integer", "description": "Fiscal year"},
            "service": {"type": "string", "description": "Military service branch"}
        },
        "required": ["year"]
    }
}]

response = client.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": "What was Army FY2024 budget?"}]
)

Why Claude for This Platform

Claude's 200K token context window is ideal for analyzing long federal documents (NDAA, OMB Circulars). Constitutional AI training makes it particularly reliable for sensitive DoD policy questions.
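Before sending a long document, it helps to check whether it plausibly fits the 200K-token window. A rough heuristic is ~4 characters per token for English text (an approximation; use a real tokenizer for exact counts):

```python
def fits_context(text: str, context_tokens: int = 200_000,
                 chars_per_token: float = 4.0, reserve: int = 4_096) -> bool:
    """Rough check: does the text fit, leaving room for the model's response?"""
    est_tokens = len(text) / chars_per_token
    return est_tokens <= context_tokens - reserve

# ~700K characters ≈ 175K tokens fits; ~900K characters ≈ 225K does not
print(fits_context("x" * 700_000))  # → True
print(fits_context("x" * 900_000))  # → False
```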

Google Gemini API

Powers the MyThing platform agentic AI assistant with Google Search grounding and function calling.

🔷gemini_agent.ts
import { GoogleGenerativeAI } from '@google/generative-ai';

const genai = new GoogleGenerativeAI(process.env.GOOGLE_GEMINI_API_KEY!);

// Function calling setup
const tools = [{
  functionDeclarations: [{
    name: "get_budget_data",
    description: "Retrieve DoD budget data",
    parameters: {
      type: "object",
      properties: {
        fiscal_year: { type: "string" },
        component: { type: "string" }
      }
    }
  }]
}];

async function runAgent(query: string) {
  const model = genai.getGenerativeModel({
    model: "gemini-2.0-flash",
    tools,
    systemInstruction: "You are a federal budget analyst AI."
  });

  const chat = model.startChat();
  const result = await chat.sendMessage(query);

  // Handle function calls
  const response = result.response;
  const parts = response.candidates?.[0]?.content?.parts || [];

  for (const part of parts) {
    if (part.functionCall) {
      console.log("Tool called:", part.functionCall.name);
      // Execute the function...
    }
  }

  return response.text();
}

Prompt Engineering Mastery

Production-tested techniques for getting the best results from LLMs.

🎯

Zero-Shot

No examples — direct instruction. Works well with capable models.

"Classify this document as UNCLASSIFIED, CUI, or SECRET."
📋

Few-Shot

Provide 2-5 examples before the task. Dramatically improves accuracy.

"Example 1: X → Y. Example 2: A → B. Now classify: C → ?"
🧠

Chain-of-Thought

Ask the model to reason step-by-step before answering.

"Think step by step about the budget implications..."
⚙️

System Prompts

Set role, constraints, output format at the system level.

"You are a DoD analyst. Always cite FM references. Output JSON."
🌳

Tree of Thoughts

Explore multiple reasoning paths, select the best.

Generate 3 approaches, evaluate each, pick the optimal.
📊

Structured Output

Force JSON/XML output for programmatic processing.

"Respond ONLY in JSON: {"risk_level": ..., "factors": [...]}"

Interactive Prompt Playground


Test prompts against Claude · Powered by Anthropic API


Agentic AI Systems

Building multi-agent systems with tool use, routing, and orchestration — based on the MyThing platform implementation.

🔄

ReAct Agent

Reason → Act → Observe loop. Tool use + reasoning interleaved.

📋

Plan & Execute

Generate a full plan first, then execute each step.

🗺️

Multi-Agent Router

Classify query → route to specialized agent. (MyThing pattern)

🪞

Reflexive Agent

Self-critique and revise before final output.
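The ReAct pattern above reduces to a loop: at each step the model either calls a tool or returns a final answer, and every tool result is fed back in as an observation. A minimal sketch, where `model` is a stand-in callable for a real LLM call (not any particular API):

```python
from typing import Callable

def react_loop(model: Callable[[list[dict]], dict],
               tools: dict[str, Callable[[str], str]],
               query: str, max_steps: int = 5) -> str:
    """Reason → Act → Observe until the model returns a final answer."""
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        # Reason: model returns {"tool": ..., "input": ...} or {"answer": ...}
        step = model(messages)
        if "answer" in step:
            return step["answer"]
        observation = tools[step["tool"]](step["input"])           # Act
        messages.append({"role": "tool", "content": observation})  # Observe
    return "Max steps reached without a final answer."
```

In production the `model` stub is replaced by a tool-use API call, as in the Claude function-calling example above; the `max_steps` cap guards against runaway loops.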

Case Study: MyThing Platform Agent Architecture

🐍multi_agent.py
# Multi-Agent System (MyThing Platform Pattern)
from anthropic import Anthropic

client = Anthropic()

def route_query(query: str) -> str:
    """Determine which specialized agent to use"""
    routing = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=50,
        system="Classify the query. Respond with ONLY one word: portfolio, tech, dod, or notes",
        messages=[{"role": "user", "content": query}]
    )
    return routing.content[0].text.strip().lower()

AGENT_CONFIGS = {
    "portfolio": "You are a portfolio analyst for Peter Shang's projects...",
    "tech": "You are a tech trends analyst with expertise in AI/ML...",
    "dod": "You are a DoD policy and federal finance expert...",
    "notes": "You help retrieve and summarize personal notes...",
}

def run_agent(query: str) -> str:
    agent_type = route_query(query)
    system_prompt = AGENT_CONFIGS.get(agent_type, AGENT_CONFIGS["tech"])

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": query}]
    )

    return f"[{agent_type.upper()} Agent] {response.content[0].text}"

# Usage
print(run_agent("What AI projects has Peter built?"))  # → portfolio
print(run_agent("Latest trends in agentic AI?"))        # → tech
print(run_agent("Explain FIAR audit requirements"))     # → dod

Production Implementation

This multi-agent pattern is live at shangthing.vercel.app — routing between Portfolio, Tech Trends, DoD Policy, and Notes agents using Gemini 2.5 with Google Search grounding.

RAG (Retrieval-Augmented Generation)

🐍rag.py
from anthropic import Anthropic
import numpy as np

# Simple RAG implementation
client = Anthropic()

def cosine_similarity(a: list, b: list) -> float:
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def simple_rag(query: str, query_embedding: list, documents: list[dict]) -> str:
    """
    Retrieval-Augmented Generation with Claude
    query_embedding: vector for the query (produce with a real embedding API)
    documents: [{"source": "...", "content": "...", "embedding": [...]}]
    """
    # 1. Rank documents by similarity to the query
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_embedding, doc["embedding"]),
        reverse=True,
    )

    # 2. Build context from the top 3 relevant chunks
    context = "\n\n".join(
        f"[Source: {doc['source']}]\n{doc['content']}"
        for doc in ranked[:3]
    )

    # 3. Generate with context
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        system="""Answer questions using ONLY the provided context.
        Cite sources when possible. If unsure, say so.""",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}"
        }]
    )

    return response.content[0].text
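The retrieval step at the heart of RAG is just ranking chunks by embedding similarity. A self-contained sketch with toy 3-dimensional vectors (real systems use embedding-API vectors with hundreds of dimensions, and the document contents here are placeholders):

```python
import numpy as np

def top_k(query_vec: list[float], docs: list[dict], k: int = 3) -> list[dict]:
    """Return the k documents whose embeddings are most similar to the query."""
    def cos(a, b):
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return sorted(docs, key=lambda d: cos(query_vec, d["embedding"]), reverse=True)[:k]

docs = [
    {"content": "NDAA summary",  "embedding": [1.0, 0.0, 0.0]},
    {"content": "OMB Circular",  "embedding": [0.0, 1.0, 0.0]},
    {"content": "Budget tables", "embedding": [0.9, 0.1, 0.0]},
]
print([d["content"] for d in top_k([1.0, 0.0, 0.0], docs, k=2)])
# → ['NDAA summary', 'Budget tables']
```

At scale this linear scan is replaced by a vector index, but the ranking criterion is the same.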