MLflow Tracking

All experiments are tracked with MLflow for reproducibility and comparison.

Starting the UI

make mlflow
# or: mlflow ui --backend-store-uri experiments

This starts the MLflow UI at http://localhost:5000; open that address in your browser.

Tracking URI

Experiments are stored locally in the experiments/ directory (gitignored):

file:./experiments/
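
Any script or notebook that logs runs can point at the same store; a minimal sketch (assuming it is run from the repository root):

import os

# Point MLflow at the repo's local store. The MLFLOW_TRACKING_URI environment
# variable is read when the tracking URI is resolved, so setting it before the
# first logging call is enough.
os.environ["MLFLOW_TRACKING_URI"] = "file:./experiments"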

What Gets Logged

Benchmark Runs

Each model's k-fold CV run logs the following (a logging sketch appears after the list):

  • Parameters: model name, hyperparameters, n_folds, data config
  • Metrics: accuracy, balanced_accuracy, macro_f1, log_loss (per fold and averaged)
  • Artifacts: results DataFrame, confusion matrices
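
The exact logging calls live in the benchmark code; the sketch below only illustrates the pattern (run name, parameter values, and fold scores are placeholders, not project results):

import mlflow

mlflow.set_tracking_uri("file:./experiments")   # the local store described above
mlflow.set_experiment("pitch_benchmark")

fold_accuracy = [0.81, 0.79, 0.83, 0.80, 0.82]  # placeholder per-fold scores

with mlflow.start_run(run_name="example_model"):
    mlflow.log_params({"model": "example_model", "n_folds": len(fold_accuracy)})
    for fold, acc in enumerate(fold_accuracy):
        mlflow.log_metric("accuracy", acc, step=fold)          # per-fold value
    mlflow.log_metric("accuracy_mean", sum(fold_accuracy) / len(fold_accuracy))
    # Artifacts (results DataFrame dump, confusion-matrix plots) are logged with
    # mlflow.log_artifact("path/to/file") inside the same run.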

Ablation Runs

Each ablation variant logs the following (a tagging sketch appears after the list):

  • Parameters: ablation type, model, variant description
  • Metrics: performance under each ablation condition
  • Tags: ablation type for easy filtering
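
Tags make variants easy to slice later; another minimal sketch (tag key, variant name, and score are placeholders):

import mlflow

mlflow.set_tracking_uri("file:./experiments")
mlflow.set_experiment("pitch_ablation")

with mlflow.start_run(run_name="variant_a"):
    mlflow.set_tag("ablation_type", "feature_group")             # used for filtering below
    mlflow.log_params({"model": "example_model", "variant": "drop feature group A"})
    mlflow.log_metric("accuracy", 0.74)                          # placeholder score

# The tag then works as a filter when querying runs:
runs = mlflow.search_runs(
    experiment_names=["pitch_ablation"],
    filter_string="tags.ablation_type = 'feature_group'",
)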

Experiment Names

Run Type          Default Experiment Name
Benchmark         pitch_benchmark
Ablation          pitch_ablation
Single training   pitch_train

Comparing Runs

In the MLflow UI:

  1. Select an experiment from the sidebar
  2. Check the runs you want to compare
  3. Click Compare to see side-by-side metrics and parameters
  4. Use the Chart view to visualize metric distributions

Programmatic Access

import mlflow

mlflow.set_tracking_uri("file:./experiments")  # same local store as the Makefile's --backend-store-uri

# List experiments
for exp in mlflow.search_experiments():
    print(exp.name, exp.experiment_id)

# Query runs
runs = mlflow.search_runs(experiment_names=["pitch_benchmark"])
print(runs[["params.model", "metrics.accuracy", "metrics.macro_f1"]])
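
Because search_runs returns a pandas DataFrame, picking out the best run is a one-liner; for example, ranking benchmark runs by macro F1 (column names follow the ones logged above):

import mlflow

mlflow.set_tracking_uri("file:./experiments")

# Sort benchmark runs by macro F1 and inspect the top entry.
best = mlflow.search_runs(
    experiment_names=["pitch_benchmark"],
    order_by=["metrics.macro_f1 DESC"],
).iloc[0]
print(best["params.model"], best["metrics.macro_f1"])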

Cleaning Up

Experiment artifacts are stored in experiments/ and are gitignored. To clean up:

make clean  # removes experiments/ and other build artifacts