Section 1
ML Fundamentals
Comprehensive learning path from zero to production — core algorithms, deep learning, transformers, and evaluation metrics.
1.1
Machine Learning Basics
The three paradigms of ML — how and when to use each.
Supervised Learning
Learn from labeled data. Output is a prediction.
- Classification
- Regression
- Ranking
Unsupervised Learning
Find patterns in unlabeled data.
- Clustering
- Dimensionality Reduction
- Anomaly Detection
Reinforcement Learning
Learn through reward/penalty signals.
- Game Playing
- Robotics
- RLHF for LLMs
Key Concept: Bias-Variance Tradeoff. Simple models underfit (high bias), overly flexible models memorize noise (high variance); the goal is the model capacity that minimizes error on unseen data.
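A minimal sketch of the tradeoff using polynomial regression on synthetic data (degrees, seed, and sample sizes are arbitrary illustrative choices):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic data: a noisy sine wave
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, 200).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(0, 0.3, 200)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):  # underfit, reasonable, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```

Degree 1 underfits (train and test error both high), while degree 15 fits the training noise and its test error climbs; the middle setting usually generalizes best.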
1.2
Core Algorithms
Deep dives into the most important ML algorithms with theory, implementation, and real-world use cases.
Linear Regression
A regression algorithm that finds the best-fit linear relationship between features and a continuous target variable using the least squares method. The foundation of all regression analysis.
✅ Best For
- Continuous output prediction
- When relationships are linear
- Feature importance (coefficients)
- DoD budget trend forecasting
❌ Avoid When
- Non-linear relationships
- Categorical outputs needed
- High multicollinearity
- Complex feature interactions
Always check residual plots and look for heteroscedasticity. In federal finance, log-transforming budget figures often improves linearity.
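As a minimal sketch of that workflow (synthetic data; the exponential relationship and column meanings are illustrative assumptions, not real budget figures):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic "budget" data with a multiplicative relationship,
# so log-transforming the target makes it roughly linear
rng = np.random.default_rng(42)
X = rng.uniform(1, 10, (300, 1))                      # e.g. program size
y = 5_000 * np.exp(0.4 * X.ravel()) * rng.lognormal(0, 0.2, 300)

model = LinearRegression().fit(X, np.log(y))          # fit on log-dollars
residuals = np.log(y) - model.predict(X)

print("slope on the log scale:", model.coef_[0])
print("residual std:", residuals.std())
# Plot residuals against fitted values to check for heteroscedasticity:
# a funnel shape means the constant-variance assumption is violated.
```

Fitting the raw dollars instead of log-dollars here would show exactly the funnel-shaped residuals the note above warns about.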
Random Forest
A classification algorithm: an ensemble of decision trees where each tree is trained on a random subset of features and data. Combines bagging with feature randomization for robust, high-performance predictions.
✅ Best For
- Tabular data
- Feature importance analysis
- Handles missing data
- DoD audit risk classification
❌ Avoid When
- Large-scale real-time inference
- When model interpretability is critical
- Very high-dimensional sparse data
Feature importance from Random Forest is invaluable for DoD audit reports: it highlights which factors drive financial risk.
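A minimal sketch of training a forest and reading its feature importances (synthetic data; the column names are placeholders, not actual audit fields):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic tabular data standing in for audit records
rng = np.random.default_rng(7)
X = pd.DataFrame({
    "obligation_amount": rng.lognormal(10, 1, 1000),
    "days_to_close": rng.integers(1, 365, 1000),
    "vendor_count": rng.integers(1, 50, 1000),
})
# Toy label so the example runs end to end
y = (X["obligation_amount"] > X["obligation_amount"].median()).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
for name, imp in sorted(zip(X.columns, clf.feature_importances_), key=lambda t: -t[1]):
    print(f"{name:20s} {imp:.3f}")
```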
Support Vector Machine (SVM)
A classification algorithm that finds the optimal hyperplane maximally separating the classes, using the kernel trick to handle non-linear decision boundaries by projecting into higher-dimensional spaces.
✅ Best For
- High-dimensional text data
- Small-to-medium datasets
- Clear margin separation needed
- Binary classification
❌ Avoid When
- Large datasets (slow training)
- Probabilistic outputs needed
- Very large numbers of both features and samples (kernel computation becomes expensive)
SVMs with RBF kernel shine on document classification — great for policy analysis (OMB Circulars, NDAA documents).
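A minimal sketch of a TF-IDF + RBF-kernel SVM pipeline (the toy corpus and labels are placeholders, not real policy documents):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy corpus standing in for policy and finance documents
docs = [
    "appropriations for operations and maintenance",
    "weapon system procurement milestones",
    "financial statement audit findings",
    "contract award and vendor payments",
]
labels = ["budget", "acquisition", "audit", "acquisition"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), SVC(kernel="rbf", C=10))
clf.fit(docs, labels)

print(clf.predict(["audit of financial statements"]))
```

With only four documents this is purely illustrative; in practice you would cross-validate C and gamma, and add probability calibration if you need scores rather than labels.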
Algorithm Quick Reference
Text/NLP: Transformer-based LLMs → Fine-tuned BERT → TF-IDF + SVM
Images: CNN (ResNet/EfficientNet) → Transfer Learning
Time series: LSTM → XGBoost with lag features → ARIMA
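The "lag features" idea in the time-series row can be sketched as follows (synthetic monthly series; assumes the xgboost package is installed, though any gradient-boosting regressor would work the same way):

```python
import numpy as np
import pandas as pd
from xgboost import XGBRegressor

# Synthetic monthly series with trend and yearly seasonality
rng = np.random.default_rng(1)
t = np.arange(120)
series = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)

df = pd.DataFrame({"y": series})
for lag in (1, 2, 3, 12):                     # previous values become input features
    df[f"lag_{lag}"] = df["y"].shift(lag)
df = df.dropna()

X, y = df.drop(columns="y"), df["y"]
train, test = X.index < 100, X.index >= 100   # time-ordered split, no shuffling

model = XGBRegressor(n_estimators=300, learning_rate=0.05).fit(X[train], y[train])
mae = np.mean(np.abs(model.predict(X[test]) - y[test]))
print("test MAE:", mae)
```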
1.3
Deep Learning Essentials
From perceptrons to modern transformer architectures.
Transformers Architecture
The backbone of all modern LLMs
Introduced in "Attention Is All You Need" (2017), transformers replaced RNNs with multi-head self-attention, enabling parallel processing and long-range dependency capture. Every modern LLM (GPT, Claude, Gemini, Llama) is built on this architecture.
Self-Attention
Each token attends to every other token. Captures context across the full sequence.
Multi-Head
Multiple attention heads run in parallel (8 in the original Transformer; large LLMs use many more), each learning different relationship patterns.
Positional Encoding
Adds sequence order information since attention is position-agnostic.
Feed-Forward
Position-wise MLP layers add non-linearity after each attention block.
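A minimal sketch of the sinusoidal positional encoding described above (an illustrative standalone function; real implementations usually register this as a buffer inside the model). The multi-head self-attention itself is implemented in the block that follows.

```python
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Build the (seq_len, d_model) matrix of sine/cosine position signals."""
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float()
                         * (-torch.log(torch.tensor(10000.0)) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
    return pe

# Added to the token embeddings before the first attention block
pe = sinusoidal_positional_encoding(seq_len=128, d_model=512)
print(pe.shape)  # torch.Size([128, 512])
```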
```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads  # per-head dimension

        # Learned projections for queries, keys, values, and the output
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def scaled_dot_product_attention(self, Q, K, V, mask=None):
        # Similarity of every query with every key, scaled by sqrt(d_k)
        scores = torch.matmul(Q, K.transpose(-2, -1)) / (self.d_k ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attention = torch.softmax(scores, dim=-1)
        return torch.matmul(attention, V)

    def forward(self, x):
        B, T, _ = x.shape
        # Project, then split d_model across heads: (B, num_heads, T, d_k)
        Q = self.W_q(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)

        out = self.scaled_dot_product_attention(Q, K, V)
        # Merge heads back into d_model and apply the output projection
        out = out.transpose(1, 2).contiguous().view(B, T, self.d_model)
        return self.W_o(out)
```
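A quick sanity check of the block above (batch size, sequence length, and values are arbitrary):

```python
# Assumes MultiHeadAttention from the block above is in scope
mha = MultiHeadAttention(d_model=512, num_heads=8)
x = torch.randn(2, 16, 512)   # (batch, tokens, d_model)
out = mha(x)
print(out.shape)              # torch.Size([2, 16, 512]): same shape in, same shape out
```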
1.4
Model Evaluation Metrics
Choosing the right metric is as important as choosing the right model.
| Metric | Type | Formula | Use When |
|---|---|---|---|
| Accuracy | Classification | (TP+TN) / Total | Balanced classes |
| Precision | Classification | TP / (TP+FP) | False positives costly |
| Recall | Classification | TP / (TP+FN) | False negatives costly |
| F1-Score | Classification | 2·P·R / (P+R) | Imbalanced classes |
| AUC-ROC | Classification | Area under ROC curve | Ranking models |
| RMSE | Regression | √(Σ(y-ŷ)²/n) | Penalize large errors |
| MAE | Regression | Σ\|y-ŷ\|/n | Robust to outliers |
| R² | Regression | 1 - SS_res/SS_tot | Variance explained |
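For the regression rows of the table, a minimal sketch using sklearn's metric functions (the two arrays are made-up numbers for illustration):

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([120.0, 95.5, 230.0, 180.2])   # e.g. actual spend in $M
y_pred = np.array([110.0, 100.0, 250.0, 175.0])  # model predictions

rmse = np.sqrt(mean_squared_error(y_true, y_pred))
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"RMSE: {rmse:.2f}  MAE: {mae:.2f}  R²: {r2:.3f}")
```

The classification metrics from the table are bundled into the helper below.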
```python
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support,
    roc_auc_score, confusion_matrix
)
import matplotlib.pyplot as plt
import seaborn as sns

def evaluate_classifier(y_true, y_pred, y_prob=None):
    """Comprehensive classification evaluation"""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='weighted'
    )

    metrics = {
        'Accuracy': acc,
        'Precision': prec,
        'Recall (Sensitivity)': rec,
        'F1-Score': f1,
    }

    if y_prob is not None:
        metrics['AUC-ROC'] = roc_auc_score(y_true, y_prob)

    for name, value in metrics.items():
        print(f"{name:25s}: {value:.4f}")

    return metrics
```
DoD Context