API Documentation
- class hysom.hysom.HSOM(width: int, height: int, input_dim: tuple, random_seed: int | None = None)
Self-Organizing Map (SOM) for 2D time series data.
- Parameters:
width (int) – Number of units along the width of the map.
height (int) – Number of units along the height of the map.
input_dim (tuple of int) – Shape of the input samples. Typically (seq_len, 2), where seq_len is the number of (x, y) coordinate points representing a sample.
random_seed (int, optional) – Seed for the random number generator, used to make results reproducible. If None, results may vary between runs because the training process has random elements. Default is None.
- attribute_matrix(data: ndarray, attribute: ndarray, agg_method: Callable[[List], float] = np.median) -> ndarray
Create an attribute matrix based on the provided data and attribute values.
- Parameters:
data (np.ndarray) – Collection of data samples with shape (nsamples, seq_len, 2).
attribute (np.ndarray) – Attribute values corresponding to each sample in data.
agg_method (Callable, optional (default=np.median)) – Aggregation method to apply to the attribute values for each BMU.
- Returns:
Attribute map with shape (height, width).
- Return type:
np.ndarray
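The aggregation described above can be sketched in plain NumPy. This is an illustration, not the library's implementation: `attribute_matrix_sketch` and `bmus` (a list of precomputed BMU coordinates, one per sample) are hypothetical names, and leaving units without samples as NaN is an assumption of the sketch.

```python
import numpy as np

def attribute_matrix_sketch(bmus, attribute, height, width, agg_method=np.median):
    """Aggregate attribute values per map unit (illustrative, not hysom's code)."""
    # Collect the attribute values of all samples that share a BMU.
    groups = {}
    for (row, col), value in zip(bmus, attribute):
        groups.setdefault((row, col), []).append(value)

    # Units that received no samples stay NaN (an assumption of this sketch).
    matrix = np.full((height, width), np.nan)
    for (row, col), values in groups.items():
        matrix[row, col] = agg_method(values)
    return matrix

# Hypothetical BMU coordinates for four samples on a 2 x 3 map.
bmus = [(0, 0), (0, 0), (1, 2), (0, 0)]
attribute = np.array([1.0, 3.0, 10.0, 8.0])
result = attribute_matrix_sketch(bmus, attribute, height=2, width=3)
```

Passing a different callable as `agg_method` (for example `np.mean` or `max`) changes only how the values collected for each unit are reduced to a single number.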
- classify(samples: ndarray) -> dict[tuple, list]
Assign each sample in samples to its Best Matching Unit (BMU).
- Parameters:
samples (np.ndarray) – Array of input samples with shape (n_samples, seq_len, n_features). For this SOM implementation, n_features is typically 2.
- Returns:
A dictionary mapping BMU coordinates (row, col) to a list of samples whose BMU corresponds to that coordinate. Each key is a tuple representing the BMU position on the SOM grid, and each value is the list of samples assigned to that node.
- Return type:
dict[tuple, list]
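The returned structure can be illustrated with a small stand-alone sketch. It is not the library's code: `find_bmu` and `classify_sketch` are hypothetical helpers, the prototypes are random, and a Euclidean distance is used as a stand-in for whatever distance the map was trained with.

```python
import numpy as np

def find_bmu(prototypes, sample):
    """Return (row, col) of the prototype closest to `sample`."""
    # Euclidean stand-in; prototypes has shape (height, width, seq_len, 2).
    distances = np.sqrt(((prototypes - sample) ** 2).sum(axis=(2, 3)))
    row, col = np.unravel_index(np.argmin(distances), distances.shape)
    return int(row), int(col)

def classify_sketch(prototypes, samples):
    """Group samples by the coordinates of their BMU."""
    groups = {}
    for sample in samples:
        groups.setdefault(find_bmu(prototypes, sample), []).append(sample)
    return groups

rng = np.random.default_rng(0)
prototypes = rng.random((2, 3, 5, 2))  # (height, width, seq_len, 2)
# Samples are slightly perturbed copies of known prototypes.
samples = np.stack([prototypes[0, 1], prototypes[1, 2], prototypes[0, 1]]) + 1e-6
groups = classify_sketch(prototypes, samples)
```

Here two samples end up under the key `(0, 1)` and one under `(1, 2)`; units that attract no samples do not appear as keys in this sketch.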
- frequency_matrix(data: ndarray, relative=False) -> ndarray
Create a frequency matrix based on the provided data.
- Parameters:
data (np.ndarray) – Collection of data samples with shape (nsamples, seq_len, 2).
- Returns:
Frequency matrix with shape (height, width).
- Return type:
np.ndarray
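A frequency matrix of this kind can be sketched as a count of samples per BMU. The helper below is hypothetical and takes precomputed BMU coordinates. The meaning given to `relative` (dividing the counts by the total number of samples) is an assumption, since the parameter is not described above.

```python
import numpy as np

def frequency_matrix_sketch(bmus, height, width, relative=False):
    """Count how many samples map to each unit (illustrative only)."""
    counts = np.zeros((height, width))
    for row, col in bmus:
        counts[row, col] += 1
    # Assumption: relative=True returns fractions of the total sample count.
    if relative:
        counts = counts / counts.sum()
    return counts

# Hypothetical BMU coordinates for four samples on a 2 x 3 map.
bmus = [(0, 0), (0, 0), (1, 2), (0, 0)]
absolute = frequency_matrix_sketch(bmus, height=2, width=3)
fractions = frequency_matrix_sketch(bmus, height=2, width=3, relative=True)
```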
- get_BMU(sample: ndarray) -> Tuple
Return BMU coordinates for a given sample, following matrix notation: (row, col).
- Parameters:
sample (np.ndarray) – Input sample with shape (sequence_length, 2).
- Returns:
Coordinates of the Best Matching Unit (row, col).
- Return type:
Tuple
- get_QE_history() -> Tuple
Get the average quantization error across iterations.
Only available if track_errors is set to True during training.
- Returns:
iteration (List) – Iteration indices.
QE (List) – Average quantization error values.
- get_TE_history() -> Tuple
Get the average topographic error across iterations.
Only available if track_errors is set to True during training.
- Returns:
iteration (List) – Iteration indices.
TE (List) – Average topographic error values.
- get_distance_to_bmu(sample: ndarray) -> float
Return the distance to the BMU for a given sample.
- Parameters:
sample (np.ndarray) – Input sample with shape (sequence_length, 2).
- Returns:
Distance from the sample to its BMU.
- Return type:
float
- get_prototypes(bmu: tuple[int, int] | None = None) -> ndarray
Get the prototypes.
- Returns:
Prototypes array.
- Return type:
np.ndarray
- quantization_error(data: ndarray) -> List
Compute the quantization error for each sample in data.
- Parameters:
data (np.ndarray) – Collection of data samples with shape (nsamples, seq_len, 2).
- Returns:
Quantization error for each data sample.
- Return type:
List
- random_init(data: ndarray)
Initialize prototypes randomly from data.
- Parameters:
data (np.ndarray) – Data samples from which the initial prototypes are drawn.
- set_init_prototypes(prototypes: ndarray)
Initialize prototypes.
- Parameters:
prototypes (np.ndarray) – The shape of prototypes must be consistent with the SOM dimensions and input dimensions (input_dim): prototypes.shape = (height, width, seq_len, 2).
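As a sketch of the shape requirement, the snippet below builds a prototypes array from synthetic data by drawing one sample per map unit. The data and the sampling scheme are assumptions made for illustration. Whether random_init selects samples in the same way is not stated here.

```python
import numpy as np

height, width, seq_len = 4, 6, 20
rng = np.random.default_rng(42)

# Synthetic stand-in data with shape (nsamples, seq_len, 2).
data = rng.random((100, seq_len, 2))

# Draw one distinct sample per map unit and arrange them on the grid.
chosen = rng.choice(len(data), size=height * width, replace=False)
prototypes = data[chosen].reshape(height, width, seq_len, 2)
```

An array built this way has the shape `(height, width, seq_len, 2)` expected by `set_init_prototypes` for a map created with the same `width`, `height`, and `input_dim=(seq_len, 2)`.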
- topographic_error(data: ndarray) -> List
Compute the topographic error for each sample in data.
- Parameters:
data (np.ndarray) – Collection of data samples with shape (nsamples, seq_len, 2).
- Returns:
Topographic error for each data sample.
- Return type:
List
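The two per-sample error measures can be sketched from a matrix of sample-to-prototype distances. The sketch assumes the common definitions: quantization error is the distance to the BMU, and topographic error is 1 when the best and second-best units are not adjacent on the grid. The adjacency rule used here (at most one row and one column apart) and the helper name are assumptions; the library may define them differently.

```python
import numpy as np

def errors_from_distances(distances):
    """Return (quantization_error, topographic_error) for one sample.

    `distances` has shape (height, width) and holds the distance from the
    sample to each prototype.
    """
    order = np.argsort(distances, axis=None)  # flat indices, nearest first
    first = np.unravel_index(order[0], distances.shape)
    second = np.unravel_index(order[1], distances.shape)
    qe = float(distances[first])
    # Assumption: units are adjacent if at most one row and one column apart.
    adjacent = max(abs(first[0] - second[0]), abs(first[1] - second[1])) <= 1
    te = 0.0 if adjacent else 1.0
    return qe, te

# Best and second-best units are side by side: no topographic error.
neighbors = np.array([[0.1, 0.2, 0.9],
                      [0.8, 0.7, 0.6]])
# Best and second-best units are two columns apart: topographic error.
far_apart = np.array([[0.1, 0.5, 0.2],
                      [0.8, 0.7, 0.6]])
qe_a, te_a = errors_from_distances(neighbors)
qe_b, te_b = errors_from_distances(far_apart)
```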
- train(data: ndarray, epochs: int, random_order: bool = True, initial_sigma: float | None = None, initial_learning_rate: float = 1.0, final_sigma: float = 0.3, final_learning_rate: float = 0.01, decay_sigma_func: str | Callable = 'power', decay_learning_rate_func: str | Callable = 'power', neighborhood_function: str | Callable = 'gaussian', distance_function: str | Callable = 'dtw', track_errors: bool = False, errors_sampling_rate: int = 4, errors_data_fraction: float = 1.0, verbose: bool | int = False)
Trains the Self-Organizing Map (SOM).
- Parameters:
data (np.ndarray) – Data array. The first dimension corresponds to the number of samples. The second and third dimensions must be consistent with input_dim.
epochs (int) – Defines the number of training iterations (total_iterations = number_of_samples * epochs). Each data sample is fed to the map once every epoch.
random_order (bool, optional (default=True)) – If True, samples are picked randomly without replacement. If False, they are fed sequentially.
initial_sigma (float, optional (default: sqrt(width * height))) – Neighborhood radius at the first iteration.
initial_learning_rate (float, optional (default: 1.0)) – Learning rate at the first iteration.
final_sigma (float, optional (default: 0.3)) – Neighborhood radius at the last iteration.
final_learning_rate (float, optional (default: 0.01)) – Learning rate at the last iteration.
decay_sigma_func (str or callable, optional (default: "power")) –
Decay function for the neighborhood radius. Defines how the neighborhood radius changes from initial_sigma to final_sigma. Available options: “power”, “linear”. If callable, the function should accept four arguments:
init_val (float): Initial neighborhood radius.
iter (int): Current iteration.
max_iter (int): Maximum number of iterations.
final_val (float): Minimum radius value.
The function must return a numeric value.
Examples
>>> def decay_linear(init_val, iteration, max_iter, final_val):
...     slope = (init_val - final_val) / max_iter
...     return init_val - (slope * iteration)
See the Tutorials for additional details.
decay_learning_rate_func (str or callable, optional (default: "power")) – Same format as decay_sigma_func, but applied to the learning rate.
neighborhood_function (str or callable, optional (default: "gaussian")) –
Defines the neighborhood function.
Available options: “gaussian”, “bubble”. If callable, the function should accept three arguments:
grid (tuple of numpy arrays): Coordinate matrices as returned by numpy.meshgrid using the matrix indexing convention.
center (tuple): Coordinates where the function returns 1.0 (peak value), using (i, j) matrix convention.
sigma (float): Defines the neighborhood radius.
The function should return a matrix of neighborhood values with shape (height, width), matching the (row, col) matrix convention used for BMU coordinates. See the Tutorials for additional details.
distance_function (str or callable, optional (default: "dtw")) –
Defines the distance function used to identify the BMU.
Available options: “dtw”, “euclidean”.
If callable, the function should accept two arguments:
prototypes (np.ndarray): prototypes array as returned by get_prototypes().
sample (np.ndarray): a sample data of shape input_dim.
The function should return an np.array of shape (height, width) containing the distance from sample to each prototype.
track_errors (bool, optional (default=False)) – If True, quantization error (QE) and topographic error (TE) will be computed during training. These values can be accessed using get_QE_history() and get_TE_history().
errors_sampling_rate (int, optional (default=4)) – If track_errors is True, this parameter controls how often errors are tracked. Errors will be tracked errors_sampling_rate times per epoch.
errors_data_fraction (float, optional (default=1.0)) – If track_errors is True, this parameter specifies the fraction of the data used to compute errors. It should be between 0 and 1.0 (inclusive). If set to 1.0, all samples are used; if set to a value less than 1.0, the calculation is faster but uses fewer samples.
verbose (bool or int, optional (default=False)) – If True, the status of the training process will be printed each epoch. If int, this value represents the approximate number of times the status of the training process will be printed each epoch.
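The callable options for decay_sigma_func, neighborhood_function, and distance_function can be sketched as plain functions with the argument lists given above. The three functions below are hypothetical examples, not part of the library. They assume that grid unpacks as (rows, cols), that prototypes has shape (height, width, seq_len, 2), and that the distance callable returns one value per prototype.

```python
import numpy as np

def decay_exponential(init_val, iteration, max_iter, final_val):
    """Decay from init_val (iteration 0) to final_val (iteration max_iter)."""
    return init_val * (final_val / init_val) ** (iteration / max_iter)

def neighborhood_cone(grid, center, sigma):
    """Linear cone: 1.0 at center, falling to 0.0 at grid distance sigma."""
    rows, cols = grid  # assumption: grid unpacks as (rows, cols)
    dist = np.sqrt((rows - center[0]) ** 2 + (cols - center[1]) ** 2)
    return np.clip(1.0 - dist / sigma, 0.0, None)

def distance_manhattan(prototypes, sample):
    """Sum of absolute differences between the sample and each prototype."""
    # Assumption: prototypes has shape (height, width, seq_len, 2).
    return np.abs(prototypes - sample).sum(axis=(2, 3))

# Try the callables on a small 3 x 4 grid without training a map.
height, width, seq_len = 3, 4, 5
grid = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
weights = neighborhood_cone(grid, center=(1, 2), sigma=2.0)

prototypes = np.zeros((height, width, seq_len, 2))
sample = np.ones((seq_len, 2))
distances = distance_manhattan(prototypes, sample)
```

If these assumptions match the library's conventions, the functions would be passed by name, for example `decay_sigma_func=decay_exponential`, `neighborhood_function=neighborhood_cone`, and `distance_function=distance_manhattan` in the call to train.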