Section 6
Advanced Topics & Research
Cutting-edge 2026 AI landscape, RAG systems, fine-tuning, RLHF, and the research papers that matter.
6.1
2026 AI Landscape
The major trends shaping AI/ML right now.
Agentic AI Revolution
Models that plan, use tools, and take actions over multiple steps. From basic chatbots to autonomous AI workers (a minimal tool-use loop is sketched at the end of this section).
Multi-Million Token Contexts
Context windows in the millions of tokens (Gemini's Pro models reach 2M) mean entire codebases, years of documents, or whole books fit in context.
Reasoning Models (o1/o3)
Models that spend extra inference compute on reasoning tokens before answering. Dramatically better at math and logic.
On-Device / Edge AI
Phi-4, Gemma, Llama running on device. Privacy-preserving, offline-capable, no API costs.
Multimodal Everything
Text + Images + Audio + Video as native inputs. GPT-4o, Gemini 2.0 Flash as examples.
AI Governance & Policy
Executive orders, the EU AI Act, DoD AI policies. Compliance is becoming essential for federal AI work.
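The agentic pattern above boils down to a loop: the model asks for a tool, the application runs it and returns the result, and the loop ends when the model answers directly. Below is a minimal sketch of that loop using the Anthropic tool-use API; the lookup_contract tool, its canned output, and the contract ID are hypothetical examples, not part of the original material.

# Minimal agentic loop (hedged sketch): the model requests a tool, we run it,
# feed the result back, and stop when the model produces a final text answer.
from anthropic import Anthropic

client = Anthropic()

tools = [{
    "name": "lookup_contract",              # hypothetical example tool
    "description": "Look up a contract record by its ID.",
    "input_schema": {
        "type": "object",
        "properties": {"contract_id": {"type": "string"}},
        "required": ["contract_id"],
    },
}]

def run_tool(name: str, args: dict) -> str:
    # Hypothetical local implementation with canned data
    if name == "lookup_contract":
        return f"Contract {args['contract_id']}: active, $1.2M ceiling"
    return "unknown tool"

messages = [{"role": "user", "content": "What is the status of contract W91-123?"}]
while True:
    resp = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=1024,
        tools=tools,
        messages=messages,
    )
    if resp.stop_reason != "tool_use":
        print(resp.content[0].text)          # final answer, loop ends
        break
    # Echo the assistant turn, execute each requested tool, return the results
    messages.append({"role": "assistant", "content": resp.content})
    tool_results = [
        {"type": "tool_result", "tool_use_id": block.id,
         "content": run_tool(block.name, block.input)}
        for block in resp.content if block.type == "tool_use"
    ]
    messages.append({"role": "user", "content": tool_results})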
6.2
Advanced RAG Systems
Production-grade retrieval-augmented generation: over-retrieve with vector search, rerank with an LLM, then generate a cited answer. (Hybrid search adds a keyword signal such as BM25 on top of the vector retrieval.)
# Advanced RAG: over-retrieve with vector search, then rerank with an LLM
import json
import re

import chromadb
from chromadb.utils import embedding_functions
from anthropic import Anthropic

class AdvancedRAG:
    def __init__(self, collection_name: str = "policy_docs"):
        self.chroma = chromadb.PersistentClient("./rag_db")
        self.ef = embedding_functions.OpenAIEmbeddingFunction(
            api_key="your-key",                 # embeddings use OpenAI here
            model_name="text-embedding-3-small"
        )
        self.collection = self.chroma.get_or_create_collection(
            collection_name, embedding_function=self.ef
        )
        self.anthropic = Anthropic()

    def add_documents(self, docs: list[dict]):
        """Add documents with metadata."""
        self.collection.add(
            documents=[d["content"] for d in docs],
            metadatas=[d["metadata"] for d in docs],
            ids=[d["id"] for d in docs]
        )

    def query_with_reranking(self, query: str, k: int = 5) -> str:
        # Step 1: Initial retrieval (over-retrieve, then rerank down to k)
        results = self.collection.query(
            query_texts=[query],
            n_results=k * 2
        )
        docs = results["documents"][0]
        metas = results["metadatas"][0]

        # Step 2: LLM reranking with a small, fast model
        rerank_response = self.anthropic.messages.create(
            model="claude-3-5-haiku-20241022",
            max_tokens=500,
            messages=[{
                "role": "user",
                "content": f"""Given this query: "{query}"
Rank these {len(docs)} documents by relevance (1 = most relevant).
Return ONLY a JSON array of indices: [3, 0, 2, 1, ...]
Documents:
{chr(10).join(f'{i}: {d[:200]}' for i, d in enumerate(docs))}"""
            }]
        )

        ranking_text = rerank_response.content[0].text
        idx_match = re.search(r'\[.*?\]', ranking_text, re.DOTALL)
        indices = json.loads(idx_match.group()) if idx_match else list(range(k))

        # Step 3: Generate with the top-ranked context, citing sources
        context = "\n\n---\n\n".join(
            f"[Source: {metas[i].get('source', 'Unknown')}]\n{docs[i]}"
            for i in indices[:k] if i < len(docs)
        )

        response = self.anthropic.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1500,
            system="Answer based ONLY on the provided context. Cite sources.",
            messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuery: {query}"}]
        )
        return response.content[0].text

# Usage
rag = AdvancedRAG()
answer = rag.query_with_reranking("What are FY2025 audit requirements?")

When to Use RAG vs Fine-Tuning
Use RAG when the knowledge changes frequently, must be cited, or is too large to bake into model weights. Use fine-tuning when you need consistent style, format, or domain-specific behavior. Production systems often combine both.
6.3
Fine-Tuning with LoRA / QLoRA
Parameter-efficient fine-tuning that trains only 0.1% of model parameters while achieving near-full fine-tune performance.
Full Fine-Tune: 100% trainable params, ~160GB VRAM. When resources allow.
LoRA: ~0.1% trainable params, ~40GB VRAM. Standard adaptation.
QLoRA: ~0.1% trainable params, ~10GB VRAM. Consumer GPU fine-tuning.
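Where the small trainable fraction comes from: LoRA freezes each targeted weight matrix and learns two low-rank factors B (d_out x r) and A (r x d_in), so it adds r * (d_out + d_in) parameters instead of updating all d_out * d_in. A back-of-envelope sketch with illustrative dimensions (not the exact shapes of the model used below):

# Back-of-envelope LoRA parameter count (illustrative dimensions only)
d_out, d_in, r = 4096, 4096, 16            # one hypothetical attention projection, rank 16
full_update = d_out * d_in                 # weights a full fine-tune would update: 16,777,216
lora_update = r * (d_out + d_in)           # LoRA factors B (d_out x r) + A (r x d_in): 131,072
print(f"{lora_update / full_update:.2%}")  # ~0.78% of this matrix; the whole-model fraction is
                                           # smaller still once frozen embeddings and untargeted
                                           # weights are counted in the denominator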
# LoRA fine-tuning with Hugging Face Transformers + PEFT + TRL
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer

# Load the base model (4-bit quantized for QLoRA-style memory savings)
model_name = "meta-llama/Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,        # QLoRA: saves roughly 75% of memory
    torch_dtype=torch.float16,
    device_map="auto"
)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                     # Rank: higher = more trainable parameters, better quality
    lora_alpha=32,            # Scaling factor
    target_modules=[          # Modules to apply LoRA to
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj"
    ],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.CAUSAL_LM
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# trainable params: 3,670,016 || all params: 3,212,749,824
# trainable%: 0.1142 ← only ~0.1% of parameters are trained

# Training
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,    # your prepared dataset with a "text" column
    args=TrainingArguments(
        output_dir="./fine-tuned-llama",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        learning_rate=2e-4,
        fp16=True,
        logging_steps=10,
        save_strategy="epoch",
    ),
    dataset_text_field="text",
    max_seq_length=2048,
)

trainer.train()
model.save_pretrained("./dod-financial-llama")

6.4
RLHF & Constitutional AI
How modern LLMs are aligned to be helpful, harmless, and honest.
RLHF Process
1. Pre-train on a massive text corpus
2. Supervised fine-tuning (SFT) on human demonstrations
3. Train a reward model from human preference comparisons (see the sketch after this list)
4. Reinforcement learning (PPO) to maximize the reward model's score
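Step 3 is where human judgment enters: labelers pick the better of two candidate responses, and the reward model is trained so the chosen one scores higher than the rejected one. A minimal sketch of that pairwise (Bradley-Terry) loss in PyTorch; reward_model is a stand-in for any network that maps a response to a scalar score, not something from the original material.

import torch.nn.functional as F

def preference_loss(reward_model, chosen, rejected):
    """Pairwise loss used to fit a reward model to human preference data."""
    r_chosen = reward_model(chosen)        # scalar score for the preferred response
    r_rejected = reward_model(rejected)    # scalar score for the rejected response
    # Push the preferred response's score above the rejected one's
    return -F.logsigmoid(r_chosen - r_rejected).mean()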
Constitutional AI (Anthropic)
1. Generate outputs from a model
2. Use AI (not humans) to critique them against a written "constitution" of principles
3. Revise the outputs based on the AI critique (illustrated in the toy example below)
4. Train on the revised outputs (RLAIF: reinforcement learning from AI feedback)
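A toy illustration of the critique-and-revise step using the Messages API. This is only a sketch: the real Constitutional AI pipeline operates on training data at scale, and the single principle and question below are made-up stand-ins.

# Toy critique-and-revise loop (illustrative only)
from anthropic import Anthropic

client = Anthropic()
principle = "Choose the response that is most helpful while avoiding harmful or deceptive content."
question = "Explain how phishing attacks work."

def ask(prompt: str) -> str:
    resp = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=300,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.content[0].text

draft = ask(question)
critique = ask(f"Principle: {principle}\n\nResponse:\n{draft}\n\nCritique this response against the principle.")
revised = ask(f"Original response:\n{draft}\n\nCritique:\n{critique}\n\nRewrite the response to address the critique.")
# In Constitutional AI, pairs like (draft, revised) become the training signal (RLAIF)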
6.5
Notable Papers
The research papers that changed everything.
Attention Is All You Need
Foundational · Vaswani et al. (2017)
Invented the transformer. The foundation of every modern LLM.
BERT: Pre-training of Deep Bidirectional Transformers
NLP Revolution · Devlin et al. (2018)
Brought transfer learning to NLP and established the pre-train-then-fine-tune paradigm.
Language Models are Few-Shot Learners (GPT-3)
GPT-3 · Brown et al. (2020)
Demonstrated emergent capabilities at scale. Kick-started the LLM era.
Chain-of-Thought Prompting Elicits Reasoning
Prompting · Wei et al. (2022)
CoT prompting unlocks step-by-step reasoning. Now standard practice.
Training Language Models to Follow Instructions (InstructGPT)
Alignment · Ouyang et al. (2022)
RLHF for instruction following. Led directly to ChatGPT.
LoRA: Low-Rank Adaptation of Large Language Models
Fine-Tuning · Hu et al. (2021)
Parameter-efficient fine-tuning. Made fine-tuning accessible to everyone.