Section 6
Advanced Topics & Research
Cutting-edge 2026 AI landscape, RAG systems, fine-tuning, RLHF, and the research papers that matter.
6.1
2026 AI Landscape
The major trends shaping AI/ML right now.
Agentic AI Revolution
Models that plan, use tools, and take actions over multiple steps. From basic chatbots to autonomous AI workers (a minimal tool-use loop is sketched at the end of this section).
Multi-Million Token Contexts
Context windows in the millions of tokens (Gemini's Pro models reach 2M) mean entire codebases, years of documents, or whole books fit in context.
Reasoning Models (o1/o3)
Models that spend extra inference compute on reasoning tokens before answering. Dramatically better at math and logic.
On-Device / Edge AI
Phi-4, Gemma, Llama running on device. Privacy-preserving, offline-capable, no API costs.
Multimodal Everything
Text + Images + Audio + Video as native inputs. GPT-4o, Gemini 2.0 Flash as examples.
AI Governance & Policy
Executive orders, the EU AI Act, DoD AI policies. Compliance is becoming essential for federal AI work.
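The agentic pattern above boils down to a loop: the model asks for a tool, the application runs it and returns the result, and the loop ends when the model answers directly. Below is a minimal sketch of that loop using the Anthropic tool-use API; the lookup_contract tool, its canned output, and the contract ID are hypothetical examples, not part of the original material.

# Minimal agentic loop (hedged sketch): the model requests a tool, we run it,
# feed the result back, and stop when the model produces a final text answer.
from anthropic import Anthropic

client = Anthropic()

tools = [{
    "name": "lookup_contract",              # hypothetical example tool
    "description": "Look up a contract record by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"contract_id": {"type": "string"}},
        "required": ["contract_id"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    # Hypothetical local implementation with canned data
    if name == "lookup_contract":
        return f"Contract {args['contract_id']}: active, $1.2M ceiling"
    return "unknown tool"

messages = [{"role": "user", "content": "What is the status of contract W91-123?"}]
while True:
    resp = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        print(resp.content[0].text)          # final answer, loop ends
        break
    # Echo the assistant turn, execute each requested tool, return the results
    messages.append({"role": "assistant", "content": resp.content})
    tool_results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": run_tool(block.name, block.input)}
        for block in resp.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})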
6.2
Advanced RAG Systems
Production-grade retrieval-augmented generation: over-retrieve with vector search, rerank with an LLM, then generate a cited answer. (Hybrid search adds a keyword signal such as BM25 on top of the vector retrieval.)
# Advanced RAG: over-retrieve with vector search, then rerank with an LLM
import json
import re

import chromadb
from chromadb.utils import embedding_functions
from anthropic import Anthropic

class AdvancedRAG:
    def __init__(self, collection_name: str = "policy_docs"):
        self.chroma = chromadb.PersistentClient("./rag_db")
        self.ef = embedding_functions.OpenAIEmbeddingFunction(
            api_key="your-key",                 # embeddings use OpenAI here
            model_name="text-embedding-3-small"
        )
        self.collection = self.chroma.get_or_create_collection(
            collection_name, embedding_function=self.ef
        )
        self.anthropic = Anthropic()

    def add_documents(self, docs: list[dict]):
        """Add documents with metadata."""
        self.collection.add(
            documents=[d["content"] for d in docs],
            metadatas=[d["metadata"] for d in docs],
            ids=[d["id"] for d in docs]
        )

    def query_with_reranking(self, query: str, k: int = 5) -> str:
        # Step 1: Initial retrieval (over-retrieve, then rerank down to k)
        results = self.collection.query(
            query_texts=[query],
            n_results=k * 2
        )
        docs = results["documents"][0]
        metas = results["metadatas"][0]

        # Step 2: LLM reranking with a small, fast model
        rerank_response = self.anthropic.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Given this query: "{query}"
Rank these {len(docs)} documents by relevance (1 = most relevant).
Return ONLY a JSON array of indices: [3, 0, 2, 1, ...]
Documents:
{chr(10).join(f'{i}: {d[:200]}' for i, d in enumerate(docs))}"""
            }]
        )

        ranking_text = rerank_response.content[0].text
        idx_match = re.search(r'\[.*?\]', ranking_text, re.DOTALL)
        indices = json.loads(idx_match.group()) if idx_match else list(range(k))

        # Step 3: Generate with the top-ranked context, citing sources
        context = "\n\n---\n\n".join(
            f"[Source: {metas[i].get('source', 'Unknown')}]\n{docs[i]}"
            for i in indices[:k] if i < len(docs)
        )

        response = self.anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1500,
            system="Answer based ONLY on the provided context. Cite sources.",
            messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuery: {query}"}]
        )
        return response.content[0].text

# Usage
rag = AdvancedRAG()
answer = rag.query_with_reranking("What are FY2025 audit requirements?")

When to Use RAG vs Fine-Tuning
Use RAG when the knowledge changes frequently, must be cited, or is too large to bake into model weights. Use fine-tuning when you need consistent style, format, or domain-specific behavior. Production systems often combine both.
6.3
Fine-Tuning with LoRA / QLoRA
Parameter-efficient fine-tuning that trains only 0.1% of model parameters while achieving near-full fine-tune performance.
Full Fine-Tune: 100% trainable params, ~160GB VRAM. When resources allow.
LoRA: ~0.1% trainable params, ~40GB VRAM. Standard adaptation.
QLoRA: ~0.1% trainable params, ~10GB VRAM. Consumer GPU fine-tuning.
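Where the small trainable fraction comes from: LoRA freezes each targeted weight matrix and learns two low-rank factors B (d_out x r) and A (r x d_in), so it adds r * (d_out + d_in) parameters instead of updating all d_out * d_in. A back-of-envelope sketch with illustrative dimensions (not the exact shapes of the model used below):

# Back-of-envelope LoRA parameter count (illustrative dimensions only)
d_out, d_in, r = 4096, 4096, 16            # one hypothetical attention projection, rank 16
full_update = d_out * d_in                 # weights a full fine-tune would update: 16,777,216
lora_update = r * (d_out + d_in)           # LoRA factors B (d_out x r) + A (r x d_in): 131,072
print(f"{lora_update / full_update:.2%}")  # ~0.78% of this matrix; the whole-model fraction is
                                           # smaller still once frozen embeddings and untargeted
                                           # weights are counted in the denominator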
# LoRA fine-tuning with Hugging Face Transformers + PEFT + TRL
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer

# Load the base model (4-bit quantized for QLoRA-style memory savings)
model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,        # QLoRA: saves roughly 75% of memory
    torch_dtype=torch.float16,
    device_map="auto"
)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                     # Rank: higher = more trainable parameters, better quality
    lora_alpha=32,            # Scaling factor
    target_modules=[          # Modules to apply LoRA to
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 3,670,016 || all params: 3,212,749,824
# trainable%: 0.1142 ← only ~0.1% of parameters are trained

# Training
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,    # your prepared dataset with a "text" column
    args=TrainingArguments(
        output_dir="./fine-tuned-llama",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        save_strategy="epoch",
    ),
    dataset_text_field="text",
    max_seq_length=2048,
)

trainer.train()
model.save_pretrained("./dod-financial-llama")

6.4
RLHF & Constitutional AI
How modern LLMs are aligned to be helpful, harmless, and honest.
RLHF Process
1. Pre-train on a massive text corpus
2. Supervised fine-tuning (SFT) on human demonstrations
3. Train a reward model from human preference comparisons (see the sketch after this list)
4. Reinforcement learning (PPO) to maximize the reward model's score
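Step 3 is where human judgment enters: labelers pick the better of two candidate responses, and the reward model is trained so the chosen one scores higher than the rejected one. A minimal sketch of that pairwise (Bradley-Terry) loss in PyTorch; reward_model is a stand-in for any network that maps a response to a scalar score, not something from the original material.

import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Pairwise loss used to fit a reward model to human preference data."""
    r_chosen = reward_model(chosen)        # scalar score for the preferred response
    r_rejected = reward_model(rejected)    # scalar score for the rejected response
    # Push the preferred response's score above the rejected one's
    return -F.logsigmoid(r_chosen - r_rejected).mean()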
Constitutional AI (Anthropic)
1. Generate outputs from a model
2. Use AI (not humans) to critique them against a written "constitution" of principles
3. Revise the outputs based on the AI critique (illustrated in the toy example below)
4. Train on the revised outputs (RLAIF: reinforcement learning from AI feedback)
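A toy illustration of the critique-and-revise step using the Messages API. This is only a sketch: the real Constitutional AI pipeline operates on training data at scale, and the single principle and question below are made-up stand-ins.

# Toy critique-and-revise loop (illustrative only)
from anthropic import Anthropic

client = Anthropic()
principle = "Choose the response that is most helpful while avoiding harmful or deceptive content."
question = "Explain how phishing attacks work."

def ask(prompt: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

draft = ask(question)
critique = ask(f"Principle: {principle}\n\nResponse:\n{draft}\n\nCritique this response against the principle.")
revised = ask(f"Original response:\n{draft}\n\nCritique:\n{critique}\n\nRewrite the response to address the critique.")
# In Constitutional AI, pairs like (draft, revised) become the training signal (RLAIF)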
6.5
Notable Papers
The research papers that changed everything.
Attention Is All You Need
Foundational · Vaswani et al. (2017)
Invented the transformer. The foundation of every modern LLM.
BERT: Pre-training of Deep Bidirectional Transformers
NLP Revolution · Devlin et al. (2018)
Brought transfer learning to NLP and established the pre-train-then-fine-tune paradigm.
Language Models are Few-Shot Learners (GPT-3)
GPT-3 · Brown et al. (2020)
Demonstrated emergent capabilities at scale. Kick-started the LLM era.
Chain-of-Thought Prompting Elicits Reasoning
Prompting · Wei et al. (2022)
CoT prompting unlocks step-by-step reasoning. Now standard practice.
Training Language Models to Follow Instructions (InstructGPT)
Alignment · Ouyang et al. (2022)
RLHF for instruction following. Led directly to ChatGPT.
LoRA: Low-Rank Adaptation of Large Language Models
Fine-Tuning · Hu et al. (2021)
Parameter-efficient fine-tuning. Made fine-tuning accessible to everyone.