API Reference
Auto-generated API documentation from source code docstrings.
Top-Level Package
pitch_sequencing
Baseball pitch sequence prediction and analysis.
__version__ = '0.1.0'
module-attribute
MODEL_REGISTRY = {'logistic_regression': LogisticRegressionModel, 'random_forest': RandomForestModel, 'hmm': HMMModel, 'autogluon': AutoGluonModel, 'lstm': LSTMModel, 'cnn1d': CNN1DModel, 'transformer': TransformerModel}
module-attribute
DataConfig
dataclass
Source code in src/pitch_sequencing/config.py
ModelConfig
dataclass
Source code in src/pitch_sequencing/config.py
get_model(name, config=None)
Instantiate a model by registry name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | | Key in MODEL_REGISTRY (e.g. 'lstm', 'random_forest'). | required |
| `config` | | Optional dict of hyperparameters. | `None` |
Returns:
| Type | Description |
|---|---|
| | Instance of the model class. |
Source code in src/pitch_sequencing/models/__init__.py
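A name-to-class lookup like the one described above is often implemented as a plain dictionary dispatch. The following is a minimal sketch of that pattern, not the package's actual code: `DemoBaseModel`, `DemoLSTMModel`, `DemoRandomForestModel`, `DEMO_REGISTRY`, `get_demo_model`, and the `hidden_size` key are all hypothetical stand-ins, and raising `ValueError` for an unknown name is an assumption about error handling.

```python
from typing import Any, Dict, Optional, Type


class DemoBaseModel:
    """Hypothetical stand-in base class; not part of pitch_sequencing."""

    def __init__(self, config: Optional[Dict[str, Any]] = None) -> None:
        # Copy the dict so later changes by the caller do not leak in.
        self.config: Dict[str, Any] = dict(config or {})


class DemoLSTMModel(DemoBaseModel):
    """Stand-in for a registered model class."""


class DemoRandomForestModel(DemoBaseModel):
    """Stand-in for a registered model class."""


# Same shape as the documented MODEL_REGISTRY: registry name -> class.
DEMO_REGISTRY: Dict[str, Type[DemoBaseModel]] = {
    "lstm": DemoLSTMModel,
    "random_forest": DemoRandomForestModel,
}


def get_demo_model(
    name: str, config: Optional[Dict[str, Any]] = None
) -> DemoBaseModel:
    """Look up a class by registry key and instantiate it with config."""
    try:
        model_cls = DEMO_REGISTRY[name]
    except KeyError:
        # Assumed behaviour: the real get_model may report this differently.
        known = ", ".join(sorted(DEMO_REGISTRY))
        raise ValueError(
            f"Unknown model {name!r}; expected one of: {known}"
        ) from None
    return model_cls(config)


# 'hidden_size' is an illustrative hyperparameter name only.
model = get_demo_model("lstm", {"hidden_size": 64})
print(type(model).__name__, model.config)
```

Keeping the registry as a module-level dict means a new model type can be made available by adding one entry, without touching the lookup function.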
load_pitch_data(path, filter_none_prev=True)
Load the main pitch dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `path` | `str` | Path to baseball_pitch_data.csv. | required |
| `filter_none_prev` | `bool` | If True, drop rows where PreviousPitchType is 'None'. | `True` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | DataFrame with pitch data. |
Source code in src/pitch_sequencing/data/loader.py
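The effect of `filter_none_prev` can be illustrated without the package itself. The sketch below uses only the standard library and plain dicts in place of a DataFrame, so it shows the filtering rule rather than the real loader. The sample rows, the `PitchType` column, and the helper name `load_rows` are made up for illustration; only `PreviousPitchType` and the literal string 'None' come from the description above.

```python
import csv
import io
from typing import Dict, List

# Made-up sample standing in for baseball_pitch_data.csv.
SAMPLE_CSV = """PitchNumber,PitchType,PreviousPitchType
1,Fastball,None
2,Slider,Fastball
3,Curveball,Slider
1,Fastball,None
2,Changeup,Fastball
"""


def load_rows(text: str, filter_none_prev: bool = True) -> List[Dict[str, str]]:
    """Parse CSV text and optionally drop rows with no previous pitch."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if filter_none_prev:
        # The first pitch of a sequence has the literal string 'None' here.
        rows = [row for row in rows if row["PreviousPitchType"] != "None"]
    return rows


kept = load_rows(SAMPLE_CSV)
everything = load_rows(SAMPLE_CSV, filter_none_prev=False)
print(len(kept), len(everything))
```

With the default setting, the two rows that open a sequence are dropped and three rows remain; passing `filter_none_prev=False` keeps all five.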
create_sequences(df, window_size=8, feature_cols=None, target_col='PitchType_enc')
Create sliding-window sequences respecting game boundaries.
Game boundaries are detected via PitchNumber resets (the raw column must be present or reconstructable). The function expects that categorical columns have already been encoded (e.g. PitchType_enc, PitcherType_enc).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | DataFrame with encoded features. | required |
| `window_size` | `int` | Number of previous timesteps per sample. | `8` |
| `feature_cols` | `Optional[List[str]]` | Columns to include as features in each timestep. | `None` |
| `target_col` | `str` | Column to predict. | `'PitchType_enc'` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | (X, y, game_starts) where X has shape (n_samples, window_size, n_features), |
| `ndarray` | y has shape (n_samples,), and game_starts lists the indices where new games start. |
Source code in src/pitch_sequencing/data/loader.py
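The windowing idea can be sketched in plain Python, with nested lists standing in for the arrays. This is an illustration under stated assumptions, not the package's implementation: a game boundary is assumed wherever PitchNumber fails to increase, each sample is assumed to pair the previous `window_size` rows with the target of the row that follows them, and `game_starts` is assumed to index into the samples. The helper name `sliding_windows` is hypothetical.

```python
from typing import Any, List, Sequence, Tuple


def sliding_windows(
    pitch_numbers: Sequence[int],
    features: Sequence[Sequence[float]],
    targets: Sequence[Any],
    window_size: int = 8,
) -> Tuple[List[List[List[float]]], List[Any], List[int]]:
    """Build (X, y, game_starts) without letting a window span two games."""
    n = len(pitch_numbers)
    # Assumption: a new game begins at row 0 and wherever PitchNumber resets.
    resets = [i for i in range(1, n) if pitch_numbers[i] <= pitch_numbers[i - 1]]
    boundaries = [0] + resets + [n]

    X: List[List[List[float]]] = []
    y: List[Any] = []
    game_starts: List[int] = []
    for start, end in zip(boundaries, boundaries[1:]):
        if end - start <= window_size:
            continue  # Game too short to produce a full window plus a target.
        game_starts.append(len(X))  # Index of this game's first sample.
        for t in range(start + window_size, end):
            # Features of the window_size rows before t; target taken at t.
            X.append([list(features[j]) for j in range(t - window_size, t)])
            y.append(targets[t])
    return X, y, game_starts


# Two made-up games of 5 and 4 pitches, one feature per timestep.
pitch_numbers = [1, 2, 3, 4, 5, 1, 2, 3, 4]
features = [[float(i)] for i in range(9)]
targets = list(range(10, 19))

X, y, game_starts = sliding_windows(pitch_numbers, features, targets, window_size=3)
print(len(X), y, game_starts)
```

Iterating game by game, rather than sliding one window over the whole table, is what keeps the last pitches of one game from being used to predict the first pitches of the next.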
generate_dataset(num_games=3000, at_bats_per_game=35, seed=42)
Generate the main pitch dataset by simulating full games.