ML Fundamentals

Comprehensive learning path from zero to production — core algorithms, deep learning, transformers, and evaluation metrics.

Machine Learning Basics

The three paradigms of ML — how and when to use each.

🎯 Supervised Learning

Learn from labeled data. Output is a prediction.

  • Classification
  • Regression
  • Ranking

🔍 Unsupervised Learning

Find patterns in unlabeled data.

  • Clustering
  • Dimensionality Reduction
  • Anomaly Detection

🎮 Reinforcement Learning

Learn through reward/penalty signals.

  • Game Playing
  • Robotics
  • RLHF for LLMs

Key Concept: Bias-Variance Tradeoff

High bias = underfitting (model too simple). High variance = overfitting (model too complex). The goal is to find the sweet spot using cross-validation and regularization.
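
To make the tradeoff concrete, here is a minimal sketch (synthetic sine data; degrees 1, 5, and 15 chosen purely for illustration) that uses 5-fold cross-validation to compare an underfit, reasonable, and overfit polynomial model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 60)  # noisy sine curve

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for degree in (1, 5, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    results[degree] = mse
    print(f"degree={degree:2d}  cv_mse={mse:.3f}")
```

A degree-1 fit cannot capture the curve (high bias), while a degree-15 fit chases the noise (high variance); cross-validated error is typically lowest somewhere in between.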

Core Algorithms

Deep dives into the most important ML algorithms with theory, implementation, and real-world use cases.

📈 Linear Regression

regression
Time: O(n·p²)
Space: O(p²)

Finds the best-fit linear relationship between features and a continuous target variable using the least squares method. The foundation of all regression analysis.

✅ Best For

  • Continuous output prediction
  • When relationships are linear
  • Feature importance (coefficients)
  • DoD budget trend forecasting

❌ Avoid When

  • Non-linear relationships
  • Categorical outputs needed
  • High multicollinearity
  • Complex feature interactions

Always check residual plots and look for heteroscedasticity. In federal finance, log-transforming budget figures often improves linearity.
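
As a rough illustration of that advice, the sketch below fits a synthetic exponential "budget" series (all numbers invented) both raw and log-transformed; small, roughly constant residuals on the log scale are the pattern a residual check is looking for:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
years = np.arange(2000, 2024, dtype=float).reshape(-1, 1)
# Multiplicative noise: absolute errors grow with the budget level.
budget = 100 * np.exp(0.05 * (years.ravel() - 2000) + rng.normal(0, 0.05, 24))

raw = LinearRegression().fit(years, budget)
logged = LinearRegression().fit(years, np.log(budget))

raw_resid = budget - raw.predict(years)          # fans out as budget grows
log_resid = np.log(budget) - logged.predict(years)  # roughly constant spread

print("raw-scale residual std :", float(raw_resid.std()))
print("log-scale residual std :", float(log_resid.std()))
print("estimated growth rate  :", float(logged.coef_[0]))
```

On the log scale the fit also recovers the underlying growth rate directly from the slope coefficient.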

🌲 Random Forest

classification
Time: O(n·p·log n)
Space: O(trees·depth)

An ensemble of decision trees where each tree is trained on a random subset of features and data. Combines bagging with feature randomization for robust, high-performance predictions.

✅ Best For

  • Tabular data
  • Feature importance analysis
  • Handles missing data
  • DoD audit risk classification

❌ Avoid When

  • Large-scale real-time inference
  • When model interpretability is critical
  • Very high-dimensional sparse data

Feature importance from Random Forest is invaluable for DoD audit reports — it tells you exactly which factors drive financial risk.
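
A minimal sketch of that workflow, assuming an illustrative dataset where the feature names (obligation_delta, vendor_count, etc.) are invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
# Only the first two features actually drive the label.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

# Hypothetical feature names, for illustration only.
features = ["obligation_delta", "vendor_count", "quarter", "office_code"]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

for name, imp in sorted(zip(features, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
```

The importances sum to 1, so they read directly as each feature's relative share of the model's decisions.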

✂️ Support Vector Machine (SVM)

classification
Time: O(n²) to O(n³)
Space: O(n)

Finds the optimal hyperplane that maximally separates classes. Uses the kernel trick to handle non-linear decision boundaries by projecting to higher-dimensional spaces.

✅ Best For

  • High-dimensional text data
  • Small-to-medium datasets
  • Clear margin separation needed
  • Binary classification

❌ Avoid When

  • Large datasets (slow training)
  • Probabilistic outputs needed
  • Both sample count and feature count are very large

SVMs with RBF kernel shine on document classification — great for policy analysis (OMB Circulars, NDAA documents).
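
A toy sketch of that pipeline with scikit-learn (the four documents and their labels are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

docs = [
    "appropriations shall be obligated before the end of the fiscal year",
    "the circular requires agencies to report improper payments quarterly",
    "the destroyer completed sea trials ahead of schedule",
    "carrier air wing conducted flight operations in the pacific",
]
labels = ["finance", "finance", "operations", "operations"]

# TF-IDF vectorization feeds an RBF-kernel SVM.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf", gamma="scale"))
clf.fit(docs, labels)

print(clf.predict(["agencies must obligate funds this fiscal year"]))
```

In practice you would train on far more documents and tune C and gamma with cross-validation; a pipeline like this keeps the vectorizer's vocabulary tied to the fitted model.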

Algorithm Quick Reference

Tabular data: XGBoost → Random Forest → Linear/Logistic Regression
Text/NLP: Transformer-based LLMs → Fine-tuned BERT → TF-IDF + SVM
Images: CNN (ResNet/EfficientNet) → Transfer Learning
Time series: LSTM → XGBoost with lag features → ARIMA

Deep Learning Essentials

From perceptrons to modern transformer architectures.

Transformers Architecture

The backbone of all modern LLMs

Introduced in "Attention Is All You Need" (2017), transformers replaced RNNs with multi-head self-attention, enabling parallel processing and long-range dependency capture. Every modern LLM (GPT, Claude, Gemini, Llama) is built on this architecture.

Self-Attention

Each token attends to every other token. Captures context across the full sequence.

Multi-Head

Multiple attention heads (8 in the original Transformer, more in modern LLMs) run in parallel, each learning different relationship patterns.

Positional Encoding

Adds sequence order information since attention is position-agnostic.

Feed-Forward

Position-wise MLP layers add non-linearity after each attention block.

🐍 transformer.py

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads  # dimension per head

        # Learned projections for queries, keys, values, and output
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def scaled_dot_product_attention(self, Q, K, V, mask=None):
        scores = torch.matmul(Q, K.transpose(-2, -1)) / (self.d_k ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attention = torch.softmax(scores, dim=-1)
        return torch.matmul(attention, V)

    def forward(self, x):
        B, T, _ = x.shape
        # Project, then split into (B, num_heads, T, d_k)
        Q = self.W_q(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)

        out = self.scaled_dot_product_attention(Q, K, V)
        # Re-merge heads back to (B, T, d_model)
        out = out.transpose(1, 2).contiguous().view(B, T, self.d_model)
        return self.W_o(out)
```
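
The positional-encoding step described above can be sketched in the same style; this is the sinusoidal form from the original paper, written as a standalone helper:

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))               # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
    return pe

# Added to the token embeddings before the first attention block.
pe = sinusoidal_positional_encoding(max_len=128, d_model=512)
print(pe.shape)
```

Each dimension oscillates at a different frequency, so every position gets a unique, smoothly varying signature that attention can exploit.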

Model Evaluation Metrics

Choosing the right metric is as important as choosing the right model.

| Metric | Type | Formula | Use When |
|---|---|---|---|
| Accuracy | Classification | (TP+TN) / Total | Balanced classes |
| Precision | Classification | TP / (TP+FP) | False positives costly |
| Recall | Classification | TP / (TP+FN) | False negatives costly |
| F1-Score | Classification | 2·P·R / (P+R) | Imbalanced classes |
| AUC-ROC | Classification | Area under ROC curve | Ranking models |
| RMSE | Regression | √(Σ(y−ŷ)²/n) | Penalize large errors |
| MAE | Regression | Σ\|y−ŷ\|/n | Robust to outliers |
| R² | Regression | 1 − SS_res/SS_tot | Variance explained |
🐍 evaluation.py

```python
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support, roc_auc_score
)

def evaluate_classifier(y_true, y_pred, y_prob=None):
    """Comprehensive classification evaluation."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='weighted'
    )

    metrics = {
        'Accuracy': acc,
        'Precision': prec,
        'Recall (Sensitivity)': rec,
        'F1-Score': f1,
    }

    if y_prob is not None:
        metrics['AUC-ROC'] = roc_auc_score(y_true, y_prob)

    for name, value in metrics.items():
        print(f"{name:25s}: {value:.4f}")

    return metrics
```
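
The regression rows of the metrics table can be computed just as directly; a short sketch with made-up predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.6])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # √(Σ(y−ŷ)²/n)
mae = mean_absolute_error(y_true, y_pred)           # Σ|y−ŷ|/n
r2 = r2_score(y_true, y_pred)                       # 1 − SS_res/SS_tot

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R²={r2:.3f}")
```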

DoD Context

In audit risk prediction, use Recall as primary metric — missing a true audit risk (false negative) is far more costly than a false alarm. Target >0.90 recall with acceptable precision.
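
One way to implement that recall-first policy is to tune the decision threshold instead of accepting the default 0.5 cutoff; a sketch on simulated classifier scores (the data and the 0.90 target are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
# Simulated scores that are informative but imperfect.
y_prob = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, 1000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# recall decreases as the threshold rises; keep the highest threshold
# (i.e. best precision) that still achieves the recall target.
ok = recall[:-1] >= 0.90
best_threshold = thresholds[ok].max()
print(f"threshold={best_threshold:.3f}")
```

Predicting positive whenever the score meets this threshold guarantees the recall target on this data, trading away some precision in return.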