ML Fundamentals

Comprehensive learning path from zero to production — core algorithms, deep learning, transformers, and evaluation metrics.

Machine Learning Basics

The three paradigms of ML — how and when to use each.

🎯 Supervised Learning

Learn from labeled data. Output is a prediction.

  • Classification
  • Regression
  • Ranking

🔍 Unsupervised Learning

Find patterns in unlabeled data.

  • Clustering
  • Dimensionality Reduction
  • Anomaly Detection

🎮 Reinforcement Learning

Learn through reward/penalty signals.

  • Game Playing
  • Robotics
  • RLHF for LLMs

Key Concept: Bias-Variance Tradeoff

High bias = underfitting (model too simple). High variance = overfitting (model too complex). The goal is to find the sweet spot using cross-validation and regularization.
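
To make the tradeoff concrete, here is a minimal sketch (synthetic sine data; degrees 1, 5, and 15 chosen purely for illustration) that uses 5-fold cross-validation to compare an underfit, reasonable, and overfit polynomial model:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (60, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.3, 60)  # noisy sine curve

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for degree in (1, 5, 15):  # underfit, reasonable fit, overfit
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=cv,
                           scoring="neg_mean_squared_error").mean()
    results[degree] = mse
    print(f"degree={degree:2d}  cv_mse={mse:.3f}")
```

A degree-1 fit cannot capture the curve (high bias), while a degree-15 fit chases the noise (high variance); cross-validated error is typically lowest somewhere in between.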

Core Algorithms

Deep dives into the most important ML algorithms with theory, implementation, and real-world use cases.

📈 Linear Regression

regression
Time: O(n·p²)
Space: O(p²)

Finds the best-fit linear relationship between features and a continuous target variable using the least squares method. The foundation of all regression analysis.

✅ Best For

  • Continuous output prediction
  • When relationships are linear
  • Feature importance (coefficients)
  • DoD budget trend forecasting

❌ Avoid When

  • Non-linear relationships
  • Categorical outputs needed
  • High multicollinearity
  • Complex feature interactions

Always check residual plots and look for heteroscedasticity. In federal finance, log-transforming budget figures often improves linearity.
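
As a rough illustration of that advice, the sketch below fits a synthetic exponential "budget" series (all numbers invented) both raw and log-transformed; small, roughly constant residuals on the log scale are the pattern a residual check is looking for:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
years = np.arange(2000, 2024, dtype=float).reshape(-1, 1)
# Multiplicative noise: absolute errors grow with the budget level.
budget = 100 * np.exp(0.05 * (years.ravel() - 2000) + rng.normal(0, 0.05, 24))

raw = LinearRegression().fit(years, budget)
logged = LinearRegression().fit(years, np.log(budget))

raw_resid = budget - raw.predict(years)          # fans out as budget grows
log_resid = np.log(budget) - logged.predict(years)  # roughly constant spread

print("raw-scale residual std :", float(raw_resid.std()))
print("log-scale residual std :", float(log_resid.std()))
print("estimated growth rate  :", float(logged.coef_[0]))
```

On the log scale the fit also recovers the underlying growth rate directly from the slope coefficient.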

🌲 Random Forest

classification
Time: O(n·p·log n)
Space: O(trees·depth)

An ensemble of decision trees where each tree is trained on a random subset of features and data. Combines bagging with feature randomization for robust, high-performance predictions.

✅ Best For

  • Tabular data
  • Feature importance analysis
  • Handles missing data
  • DoD audit risk classification

❌ Avoid When

  • Large-scale real-time inference
  • When model interpretability is critical
  • Very high-dimensional sparse data

Feature importance from Random Forest is invaluable for DoD audit reports — it tells you exactly which factors drive financial risk.
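
A minimal sketch of that workflow, assuming an illustrative dataset where the feature names (obligation_delta, vendor_count, etc.) are invented for the example:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 4))
# Only the first two features actually drive the label.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n) > 0).astype(int)

# Hypothetical feature names, for illustration only.
features = ["obligation_delta", "vendor_count", "quarter", "office_code"]

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

for name, imp in sorted(zip(features, forest.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:18s} {imp:.3f}")
```

The importances sum to 1, so they read directly as each feature's relative share of the model's decisions.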

✂️ Support Vector Machine (SVM)

classification
Time: O(n²) to O(n³)
Space: O(n)

Finds the optimal hyperplane that maximally separates classes. Uses the kernel trick to handle non-linear decision boundaries by projecting to higher-dimensional spaces.

✅ Best For

  • High-dimensional text data
  • Small-to-medium datasets
  • Clear margin separation needed
  • Binary classification

❌ Avoid When

  • Large datasets (slow training)
  • Probabilistic outputs needed
  • Both sample count and feature count are very large

SVMs with RBF kernel shine on document classification — great for policy analysis (OMB Circulars, NDAA documents).
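
A toy sketch of that pipeline with scikit-learn (the four documents and their labels are invented for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

docs = [
    "appropriations shall be obligated before the end of the fiscal year",
    "the circular requires agencies to report improper payments quarterly",
    "the destroyer completed sea trials ahead of schedule",
    "carrier air wing conducted flight operations in the pacific",
]
labels = ["finance", "finance", "operations", "operations"]

# TF-IDF vectorization feeds an RBF-kernel SVM.
clf = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf", gamma="scale"))
clf.fit(docs, labels)

print(clf.predict(["agencies must obligate funds this fiscal year"]))
```

In practice you would train on far more documents and tune C and gamma with cross-validation; a pipeline like this keeps the vectorizer's vocabulary tied to the fitted model.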

Algorithm Quick Reference

Tabular data: XGBoost → Random Forest → Linear/Logistic Regression
Text/NLP: Transformer-based LLMs → Fine-tuned BERT → TF-IDF + SVM
Images: CNN (ResNet/EfficientNet) → Transfer Learning
Time series: LSTM → XGBoost with lag features → ARIMA

Deep Learning Essentials

From perceptrons to modern transformer architectures.

Transformers Architecture

The backbone of all modern LLMs

Introduced in "Attention Is All You Need" (2017), transformers replaced RNNs with multi-head self-attention, enabling parallel processing and long-range dependency capture. Every modern LLM (GPT, Claude, Gemini, Llama) is built on this architecture.

Self-Attention

Each token attends to every other token. Captures context across the full sequence.

Multi-Head

Multiple attention heads (8 in the original Transformer, more in modern LLMs) run in parallel, each learning different relationship patterns.

Positional Encoding

Adds sequence order information since attention is position-agnostic.

Feed-Forward

Position-wise MLP layers add non-linearity after each attention block.

🐍 transformer.py

```python
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, num_heads=8):
        super().__init__()
        self.d_model = d_model
        self.num_heads = num_heads
        self.d_k = d_model // num_heads  # dimension per head

        # Learned projections for queries, keys, values, and output
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def scaled_dot_product_attention(self, Q, K, V, mask=None):
        scores = torch.matmul(Q, K.transpose(-2, -1)) / (self.d_k ** 0.5)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, -1e9)
        attention = torch.softmax(scores, dim=-1)
        return torch.matmul(attention, V)

    def forward(self, x):
        B, T, _ = x.shape
        # Project, then split into (B, num_heads, T, d_k)
        Q = self.W_q(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)
        K = self.W_k(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)
        V = self.W_v(x).view(B, T, self.num_heads, self.d_k).transpose(1, 2)

        out = self.scaled_dot_product_attention(Q, K, V)
        # Re-merge heads back to (B, T, d_model)
        out = out.transpose(1, 2).contiguous().view(B, T, self.d_model)
        return self.W_o(out)
```
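
The positional-encoding step described above can be sketched in the same style; this is the sinusoidal form from the original paper, written as a standalone helper:

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))               # (d_model/2,)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dims: sine
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dims: cosine
    return pe

# Added to the token embeddings before the first attention block.
pe = sinusoidal_positional_encoding(max_len=128, d_model=512)
print(pe.shape)
```

Each dimension oscillates at a different frequency, so every position gets a unique, smoothly varying signature that attention can exploit.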

Model Evaluation Metrics

Choosing the right metric is as important as choosing the right model.

| Metric | Type | Formula | Use When |
|---|---|---|---|
| Accuracy | Classification | (TP+TN) / Total | Balanced classes |
| Precision | Classification | TP / (TP+FP) | False positives costly |
| Recall | Classification | TP / (TP+FN) | False negatives costly |
| F1-Score | Classification | 2·P·R / (P+R) | Imbalanced classes |
| AUC-ROC | Classification | Area under ROC curve | Ranking models |
| RMSE | Regression | √(Σ(y−ŷ)²/n) | Penalize large errors |
| MAE | Regression | Σ\|y−ŷ\|/n | Robust to outliers |
| R² | Regression | 1 − SS_res/SS_tot | Variance explained |
🐍 evaluation.py

```python
from sklearn.metrics import (
    accuracy_score, precision_recall_fscore_support, roc_auc_score
)

def evaluate_classifier(y_true, y_pred, y_prob=None):
    """Comprehensive classification evaluation."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average='weighted'
    )

    metrics = {
        'Accuracy': acc,
        'Precision': prec,
        'Recall (Sensitivity)': rec,
        'F1-Score': f1,
    }

    if y_prob is not None:
        metrics['AUC-ROC'] = roc_auc_score(y_true, y_prob)

    for name, value in metrics.items():
        print(f"{name:25s}: {value:.4f}")

    return metrics
```
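
The regression rows of the metrics table can be computed just as directly; a short sketch with made-up predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.9, 6.6])

rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # √(Σ(y−ŷ)²/n)
mae = mean_absolute_error(y_true, y_pred)           # Σ|y−ŷ|/n
r2 = r2_score(y_true, y_pred)                       # 1 − SS_res/SS_tot

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R²={r2:.3f}")
```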

DoD Context

In audit risk prediction, use Recall as primary metric — missing a true audit risk (false negative) is far more costly than a false alarm. Target >0.90 recall with acceptable precision.
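
One way to implement that recall-first policy is to tune the decision threshold instead of accepting the default 0.5 cutoff; a sketch on simulated classifier scores (the data and the 0.90 target are illustrative):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 1000)
# Simulated scores that are informative but imperfect.
y_prob = np.clip(0.3 * y_true + rng.normal(0.35, 0.2, 1000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_prob)

# recall decreases as the threshold rises; keep the highest threshold
# (i.e. best precision) that still achieves the recall target.
ok = recall[:-1] >= 0.90
best_threshold = thresholds[ok].max()
print(f"threshold={best_threshold:.3f}")
```

Predicting positive whenever the score meets this threshold guarantees the recall target on this data, trading away some precision in return.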