DR¶

classify(X)[source]¶

Classifies data using the model’s classify method.

Parameters:: X (Union[ndarray, Tensor]) – A numpy array or PyTorch tensor with the data to be classified.
Return type:: Tensor
Returns:: A PyTorch tensor with the predicted class labels for the data.

compute_derived_data(only=None)[source]¶

Computes the derived data entries in the registered dataset.

After calling this function, the derived entries will be stored as regular entries in the routine’s dataset.

Parameters:: only (Optional[Literal[‘rel_based’, ‘other’]]) – If “rel_based”, only those entries are computed that require global relations. If “other”, all other entries are computed. By default (None), all derived entries are computed.
Return type:: None

compute_global_relations(force=False)[source]¶

Computes the global relations.

The computed relation data are stored in the instance’s global_relation_data attribute.

Parameters:: force (bool) – Whether to force a new computation if relations have already been computed for the same instance.
Return type:: None

embed(X)[source]¶

Embeds data into the learned embedding space using the model’s embed method.

Parameters:: X (Union[ndarray, Tensor]) – A numpy array or PyTorch tensor with the data to be embedded.
Return type:: Tensor
Returns:: A PyTorch tensor with the embedding coordinates for the data.

classmethod from_spec(file_or_spec, model=None)[source]¶

Creates a paradime.dr.ParametricDR routine from a ParaDime specification.

Parameters:: file_or_spec (Union[str, dict]) – The specification, either as a dictionary or as a path to a YAML/JSON file.
Return type:: TypeVar(_ParametricDR, bound= ParametricDR)
Returns:: The paradime.dr.ParametricDR routine.
Raises:: paradime.exceptions.SpecificationError – If the validation of the specification has failed.

run_training_phase(training_phase)[source]¶

Runs a single training phase.

Parameters:: training_phase (TrainingPhase) – A paradime.dr.TrainingPhase instance.
Return type:: None

set_training_defaults(training_phase=None, epochs=None, batch_size=None, batches_per_epoch=None, sampling=None, edge_rel_key=None, neg_sampling_rate=None, loss_keys=None, loss_weights=None, optimizer=None, learning_rate=None, report_interval=5, **kwargs)[source]¶

Sets a parametric dimensionality reduction routine’s default training parameters.

This method accepts either a paradime.dr.TrainingPhase instance or individual parameters passed with the same keyword syntax used by paradime.dr.TrainingPhase. The specified default parameters will be used instead of the regular defaults when adding training phases.

Parameters:: training_phase (Optional[TrainingPhase]) – A paradime.dr.TrainingPhase instance with the new default settings. Instead of this, individual parameters can also be passed. For a full list of training phase settings, see paradime.dr.TrainingPhase.
Return type:: None
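The precedence described above can be pictured as a layered merge. The following is a minimal sketch and not ParaDime code: resolve_phase_settings and LIBRARY_DEFAULTS are hypothetical names, the default values are copied from the paradime.dr.TrainingPhase signature in this reference, and treating None as "not specified" is an assumption.

```python
# Regular defaults, copied from the TrainingPhase signature in this reference.
LIBRARY_DEFAULTS = {
    "epochs": 5,
    "batch_size": 50,
    "batches_per_epoch": -1,
    "sampling": "standard",
    "learning_rate": 0.01,
    "report_interval": 5,
}


def resolve_phase_settings(routine_defaults, phase_overrides):
    """Merge settings so that later sources win (hypothetical helper)."""
    settings = dict(LIBRARY_DEFAULTS)
    # Assumption: None means "not specified" and must not overwrite a value.
    for source in (routine_defaults, phase_overrides):
        settings.update({k: v for k, v in source.items() if v is not None})
    return settings


# Defaults set once for the routine, then one phase that overrides the epochs.
routine_defaults = {"epochs": 20, "batch_size": None, "learning_rate": 0.001}
phase = resolve_phase_settings(routine_defaults, {"epochs": 10})
print(phase["epochs"], phase["batch_size"], phase["learning_rate"])
```

Here the phase keeps its own epochs, inherits the routine-level learning rate, and falls back to the regular default for batch_size.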

train(data=None)[source]¶

Runs all training phases of a parametric dimensionality reduction routine.

Parameters:: data – The training data, passed either as a single numpy array or PyTorch tensor, or as a dictionary containing multiple arrays and/or tensors.

Return type:: None

class paradime.dr.TrainingPhase(name=None, epochs=5, batch_size=50, batches_per_epoch=-1, sampling='standard', edge_rel_key='rel', neg_sampling_rate=5, loss_keys=['loss'], loss_weights=None, optimizer=<class 'torch.optim.adam.Adam'>, learning_rate=0.01, report_interval=5, **kwargs)[source]¶

A collection of parameter settings for a single phase in the training of a paradime.dr.ParametricDR instance.

Parameters:

name (Optional[str]) – The name of the training phase.
epochs (int) – The number of epochs to run in this phase. In standard item-based sampling, the model sees every item once per epoch. In the case of negative edge sampling, this is not guaranteed, and an epoch instead comprises batches_per_epoch batches (see parameter description below).
batch_size (int) – The number of items/edges in a batch. In standard item-based sampling, a batch has this many items, and the edges used for batch relations are constructed from the items. In the case of negative edge sampling, this is the number of sampled positive edges. The total number of edges is higher by a factor of r + 1, where r is the negative sampling rate. The same holds for the number of items (apart from possible duplicates, which can result from the edge sampling and are removed).
batches_per_epoch (int) – The number of batches per epoch. This parameter only has an effect for negative edge sampling, where the number of batches per epoch is not determined by the dataset size and the batch size. If this parameter is set to -1 (default), an epoch will comprise a number of batches that leads to a total number of sampled items roughly equal to the number of items in the dataset. If this parameter is set to an integer, an epoch will instead comprise that many batches.
sampling (Literal[‘standard’, ‘negative_edge’]) – The sampling strategy, which can be either 'standard' (simple item-based sampling; default) or 'negative_edge' (negative edge sampling).
edge_rel_key (str) – The key under which to find the global relations that should be used for negative edge sampling.
neg_sampling_rate (int) – The number of negative (i.e., non-neighbor) edges to sample for each real neighborhood edge.
loss_keys (list[str]) – The keys under which to find the losses that should be minimized in this training phase.
loss_weights (Optional[list[float]]) – The weights for the losses. If none are specified, losses will be weighted equally.
optimizer (type) – The optimizer to use for loss minimization.
learning_rate (float) – The learning rate used in the optimization.
report_interval (int) – How often the loss should be reported during training, given in terms of epochs. E.g., with a setting of 5, the loss will be reported every 5 epochs.
kwargs – Additional kwargs that are passed on to the optimizer.

loss¶: The loss constructed from the keys and weights specified above.
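The batch_size description states that, with negative edge sampling, the total number of edges in a batch is higher than the number of positive edges by a factor of r + 1. The arithmetic can be sketched as follows; edges_per_batch is a hypothetical helper for illustration, not part of ParaDime.

```python
def edges_per_batch(batch_size, neg_sampling_rate):
    """Return (positive, negative, total) edge counts for one batch."""
    positive = batch_size                      # sampled neighborhood edges
    negative = batch_size * neg_sampling_rate  # r negative edges per positive edge
    return positive, negative, positive + negative


# Values taken from the TrainingPhase defaults: batch_size=50, neg_sampling_rate=5.
pos, neg, total = edges_per_batch(batch_size=50, neg_sampling_rate=5)
print(pos, neg, total)  # 50 250 300
```

The total of 300 equals 50 * (5 + 1), matching the r + 1 factor in the parameter description.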

paradime.dr.register_data_func(name, data_func)[source]¶

Registers a new data function to be used in ParaDime specifications.

Parameters:

name (str) – The name of the data function.
data_func (Callable) – The data function to be registered.

Return type:

paradime.dr.register_loss(name, loss)[source]¶

Registers a new loss type to be used in ParaDime specifications.

Parameters:

name (str) – The name of the loss type.
loss (type[Loss]) – The paradime.pdloss.Loss to be registered.

Return type:

paradime.dr.register_loss_func(name, loss_func)[source]¶

Registers a new loss function to be used in ParaDime specifications.

Parameters:

name (str) – The name of the loss function.
loss_func – The loss function to be registered.

Return type:

paradime.dr.register_relations(name, rel)[source]¶

Registers a new type of relations to be used in ParaDime specifications.

Parameters:

name (str) – The name of the relation type.
rel (type[Relations]) – The paradime.relations.Relations to be registered.

Return type:

paradime.dr.register_transform(name, tf)[source]¶

Registers a new relation transform to be used in ParaDime specifications.

Parameters:

name (str) – The name of the transform.
tf (type[RelationTransform]) – The paradime.transforms.RelationTransform to be registered.

Return type:

paradime.dr.validate_spec(file_or_spec)[source]¶

Validates a ParaDime specification.

Parameters:: file_or_spec (Union[str, dict]) – The specification, either as a dictionary or as a path to a YAML/JSON file.
Return type:: dict[str, Any]
Returns:: The validated specification as a dictionary.
Raises:: paradime.exceptions.SpecificationError – If the validation of the specification failed.

Relations¶

Relation computation for ParaDime.

The paradime.relations module defines various classes used to compute relations between data points.

class paradime.relations.DifferentiablePDist(p=2, metric=None, transform=None, data_key='main')[source]¶

Differentiable pairwise distances between data points.

Parameters:

p (float) – Parameter that specifies which p-norm to use as a distance function. Ignored if metric is set.
metric (Optional[Callable[[Tensor, Tensor], Tensor]]) – The distance metric to be used.
transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
data_key (str) – The key to access the data for which to compute relations.
verbose – Verbosity toggle.

relations¶: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]¶

Calculates the pairwise distances.

If metric is not None, a flexible but memory-inefficient implementation is used instead of PyTorch’s torch.nn.functional.pdist().

Parameters:: X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.
Return type:: RelationData
Returns:: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
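To make the role of p concrete, the quantity being computed can be written out in plain Python. This is only an illustration of p-norm pairwise distances, not the library's implementation. It is not differentiable and does not use PyTorch. The helper name pairwise_p_distances and the pair ordering (all pairs i < j, row by row) are assumptions made for the sketch.

```python
def pairwise_p_distances(rows, p=2.0):
    """Return the p-norm distance for every pair of rows (i, j) with i < j."""
    dists = []
    for i in range(len(rows)):
        for j in range(i + 1, len(rows)):
            # Sum of |a - b|^p over all coordinates, then the p-th root.
            total = sum(abs(a - b) ** p for a, b in zip(rows[i], rows[j]))
            dists.append(total ** (1.0 / p))
    return dists


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
euclidean = pairwise_p_distances(points, p=2.0)  # pairs (0,1), (0,2), (1,2)
manhattan = pairwise_p_distances(points, p=1.0)
print(euclidean, manhattan)
```

With three points there are three pairs, so each result has three entries. Setting p=1 yields Manhattan distances and p=2 yields Euclidean distances.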

class paradime.relations.DistsFromTo(metric=None, transform=None, data_key='main')[source]¶

Distances between individual pairs of data points.

Parameters:

metric (Optional[Callable[[Tensor, Tensor], Tensor]]) – The distance metric to be used.
transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
data_key (str) – The key to access the data for which to compute relations.

relations¶: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]¶

Calculates the distances.

Parameters:: X (Union[ndarray, Tensor, None]) – Input data tensor of shape (2, n, dim), where n is the number of pairs of data points.
Return type:: RelationData
Returns:: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
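The expected input layout (2, n, dim) pairs up the i-th point of the first half with the i-th point of the second half. A minimal sketch of that layout using nested lists follows. dists_from_to is a hypothetical helper, and the use of Euclidean distance is an assumption, since this reference does not state which metric applies when metric is None.

```python
import math


def dists_from_to(pairs):
    """Distance between pairs[0][i] and pairs[1][i] for every pair index i."""
    sources, targets = pairs
    if len(sources) != len(targets):
        raise ValueError("Both halves must contain the same number of points.")
    # Assumption: Euclidean distance stands in for the configured metric.
    return [math.dist(a, b) for a, b in zip(sources, targets)]


X = [
    [[0.0, 0.0], [1.0, 1.0]],  # first point of each pair
    [[3.0, 4.0], [1.0, 2.0]],  # second point of each pair
]
result = dists_from_to(X)
print(result)  # [5.0, 1.0]
```

In contrast to the full pairwise classes, the output has one value per pair (n values), not one value per combination of points.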

class paradime.relations.NeighborBasedPDist(n_neighbors=None, metric=None, transform=None, data_key='main', verbose=False)[source]¶

Approximate, nearest-neighbor-based pairwise distances between data points.

Parameters:

n_neighbors (Optional[int]) – Number of nearest neighbors to be considered. If not specified, this will be set to 5 percent of the number of data points. If the transforms include any paradime.transforms.AdaptiveNeighborhoodRescale instances, this parameter will be overridden according to their parameters.
metric (Union[Callable[[Tensor, Tensor], Tensor], str, None]) – The distance metric to be used.
transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
data_key (str) – The key to access the data for which to compute relations.
verbose (bool) – Verbosity toggle.

relations¶: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]¶

Calculates the pairwise distances.

Parameters:: X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.
Return type:: RelationData
Returns:: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.

class paradime.relations.PDist(metric=None, transform=None, keep_result=True, data_key='main', verbose=False)[source]¶

Full pairwise distances between data points.

Parameters:

metric (Union[Callable, str, None]) – The distance metric to be used.
transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
keep_result – Specifies whether or not to keep previously calculated distances, rather than computing new ones.
data_key (str) – The key to access the data for which to compute relations.
verbose (bool) – Verbosity toggle.

relations¶: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]¶

Calculates the pairwise distances.

Parameters:: X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.
Return type:: RelationData
Returns:: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.

class paradime.relations.Precomputed(X, transform=None)[source]¶

Precomputed relations between data points.

Parameters:

X (Union[ndarray, Tensor]) – The precomputed relations, in a form accepted by paradime.relationdata.relation_factory().
transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.

relations¶: A paradime.relationdata.RelationData instance containing the (possibly transformed) relations.

compute_relations(X=None, **kwargs)[source]¶

Obtain the precomputed relations.

Parameters:: X (Union[ndarray, Tensor, None]) – Ignored, since relations are already precomputed.
Return type:: RelationData
Returns:: A paradime.relationdata.RelationData instance containing the (possibly transformed) relations.

class paradime.relations.Relations(transform=None, data_key='main')[source]¶

Base class for calculating relations between data points.

Custom relations should subclass this class.

Relation Data¶

Relation data containers for ParaDime.

The paradime.relationdata module implements container classes for various formats of relation data. The relation data containers are used by the different paradime.relations.Relations (see paradime.relations) and paradime.transforms.RelationTransform (see paradime.transforms).

class paradime.relationdata.FlatRelationArray(relations)[source]¶

Relation data in the form of a flat array of individual relations.

Parameters:: relations (ndarray) – A flat Numpy array of relation values.

data¶: The raw relation data.

to_flat_array()[source]¶

Converts the relations to a paradime.relationdata.FlatRelationArray.

Return type:: FlatRelationArray
Returns:: The converted relations.

to_flat_tensor()[source]¶

Converts the relations to a paradime.relationdata.FlatRelationTensor.

Return type:: FlatRelationTensor
Returns:: The converted relations.

class paradime.relationdata.FlatRelationTensor(relations)[source]¶

Relation data in the form of a flat tensor of individual relations.

Parameters:: relations (Tensor) – A flat PyTorch tensor of relation values.

data¶: The raw relation data.

to_flat_array()[source]¶

Converts the relations to a paradime.relationdata.FlatRelationArray.

Return type:: FlatRelationArray
Returns:: The converted relations.

to_flat_tensor()[source]¶

Converts the relations to a paradime.relationdata.FlatRelationTensor.

Return type:: FlatRelationTensor
Returns:: The converted relations.

class paradime.relationdata.NeighborRelationTuple(relations, sort=None)[source]¶

Relation data in neighborhood tuple form.

Parameters:

relations (tuple[ndarray, ndarray]) – A tuple (n, r) of relation data, where n is an array of neighbor indices for each data point and r is an array of relation values. Both arrays must be of shape (num_points, num_neighbors).
sort (Optional[Literal[‘ascending’, ‘descending’]]) – Sorting option. If None is passed (default), values are kept as is. Otherwise, values for each item are sorted either in 'ascending' or 'descending' order.

data¶: The raw relation data.
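The (n, r) layout can be illustrated by deriving it from a small square distance matrix. The sketch below uses nested lists in place of arrays. to_neighbor_tuple is a hypothetical function written for this example, and excluding each point from its own neighbor list is an assumption.

```python
def to_neighbor_tuple(square, num_neighbors):
    """Return (indices, values), each of shape (num_points, num_neighbors)."""
    indices, values = [], []
    for i, row in enumerate(square):
        # Rank all other points by relation value, smallest first.
        others = sorted((v, j) for j, v in enumerate(row) if j != i)
        nearest = others[:num_neighbors]
        indices.append([j for _, j in nearest])
        values.append([v for v, _ in nearest])
    return indices, values


dist = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
n, r = to_neighbor_tuple(dist, num_neighbors=2)
print(n)  # [[1, 2], [0, 2], [1, 0]]
print(r)  # [[1.0, 4.0], [1.0, 2.0], [2.0, 4.0]]
```

Row i of n lists the neighbors of item i, and row i of r holds the matching relation values, here in ascending order.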

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

class paradime.relationdata.RelationData[source]¶

Base class for storing relations between data points.

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.

to_flat_array()[source]¶

Converts the relations to a paradime.relationdata.FlatRelationArray.

Return type:: FlatRelationArray
Returns:: The converted relations.

to_flat_tensor()[source]¶

Converts the relations to a paradime.relationdata.FlatRelationTensor.

Return type:: FlatRelationTensor
Returns:: The converted relations.

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

class paradime.relationdata.SparseRelationArray(relations)[source]¶

Relation data in sparse array form.

Parameters:: relations (spmatrix) – A square, sparse Scipy array of relation values.

data¶: The raw relation data.

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

class paradime.relationdata.SquareRelationArray(relations)[source]¶

Relation data in the form of a square array.

Parameters:: relations (ndarray) – A square Numpy array of relation values.

data¶: The raw relation data.

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.
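For a square relation matrix, this kind of subsampling amounts to selecting the same rows and columns. A minimal sketch with nested lists follows. The function sub_square is a stand-in written for illustration, and unlike the documented method it returns a list, not a PyTorch tensor.

```python
def sub_square(square, indices):
    """Keep only the rows and columns of the given item indices."""
    return [[square[i][j] for j in indices] for i in indices]


relations = [
    [0, 1, 2, 3],
    [1, 0, 4, 5],
    [2, 4, 0, 6],
    [3, 5, 6, 0],
]
batch = sub_square(relations, [0, 2, 3])
print(batch)  # [[0, 2, 3], [2, 0, 6], [3, 6, 0]]
```

The result is again square, with side length equal to the number of indices, which is why it can serve as the relation matrix for a batch of items.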

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

class paradime.relationdata.SquareRelationTensor(relations)[source]¶

Relation data in the form of a square tensor.

Parameters:: relations (Tensor) – A square PyTorch tensor of relation values.

data¶: The raw relation data.

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

class paradime.relationdata.TriangularRelationArray(relations)[source]¶

Relation data in ‘triangular’ vector-form.

Parameters:: relations (ndarray) – A Numpy array of relation values, as accepted by scipy.spatial.distance.squareform().

data¶: The raw relation data.
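The 'triangular' vector form stores only the values above the diagonal of a symmetric matrix, row by row. The sketch below expands such a vector into a square matrix. It follows the same convention as scipy.spatial.distance.squareform() but is a plain-Python illustration with a hypothetical name (vector_to_square), not the code used by ParaDime.

```python
def vector_to_square(vector, num_points):
    """Expand a vector of n*(n-1)/2 values into a symmetric n x n matrix."""
    expected = num_points * (num_points - 1) // 2
    if len(vector) != expected:
        raise ValueError(f"Expected {expected} values, got {len(vector)}.")
    square = [[0.0] * num_points for _ in range(num_points)]
    k = 0
    for i in range(num_points):
        for j in range(i + 1, num_points):
            # Mirror each value across the diagonal; the diagonal stays zero.
            square[i][j] = square[j][i] = vector[k]
            k += 1
    return square


full = vector_to_square([1.0, 2.0, 3.0], num_points=3)
print(full)  # [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
```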

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

class paradime.relationdata.TriangularRelationTensor(relations)[source]¶

Relation data in ‘triangular’ vector-form.

Parameters:: relations (Tensor) – A PyTorch tensor of relation values, with a shape as accepted by scipy.spatial.distance.squareform().

data¶: The raw relation data.

sub(indices)[source]¶

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:: indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.
Return type:: Tensor
Returns:: A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:: NeighborRelationTuple
Returns:: The converted relations.

to_sparse_array()[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:: SparseRelationArray
Returns:: The converted relations.

to_square_array()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:: SquareRelationArray
Returns:: The converted relations.

to_square_tensor()[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:: SquareRelationTensor
Returns:: The converted relations.

to_triangular_array()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:: TriangularRelationArray
Returns:: The converted relations.

to_triangular_tensor()[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:: TriangularRelationTensor
Returns:: The converted relations.

paradime.relationdata.relation_factory(relations, force_flat=False)[source]¶

Creates a paradime.relationdata.RelationData object from a variety of input formats.

Parameters:

relations (Union[ndarray, Tensor, spmatrix, Tuple[ndarray, ndarray]]) – The relations, specified either as a flat array or tensor, a square array or tensor, a vector-form (triangular) array or tensor, a sparse array, or a tuple (n, r), where n is an array of neighbor indices for each data point and r is an array of relation values of the same shape.
force_flat (bool) – If set true, disables the check for triangular arrays and tensors. Useful if flat relation data might have a length equal to a triangular number.

Return type:

Returns:

A paradime.relationdata.RelationData object with a subclass depending on the input format.
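The force_flat option exists because a one-dimensional input is ambiguous whenever its length is a triangular number, that is, of the form n(n-1)/2. A check of this kind could look like the sketch below. points_for_triangular_length is a hypothetical helper, and how the factory itself performs the check is not documented here.

```python
import math


def points_for_triangular_length(length):
    """Return n if length == n*(n-1)/2 for an integer n >= 2, else None."""
    # Solving n^2 - n - 2*length = 0 gives n = (1 + sqrt(1 + 8*length)) / 2.
    disc = 1 + 8 * length
    root = math.isqrt(disc)
    if root * root != disc:
        return None  # not a perfect square, so no integer solution
    n = (1 + root) // 2
    return n if n >= 2 else None


print(points_for_triangular_length(6))  # 4
print(points_for_triangular_length(7))  # None
```

A flat array of 6 values could therefore be read either as 6 individual relations or as the vector form for 4 points, which is the case where setting force_flat resolves the ambiguity.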

Transforms¶

Relation transforms for ParaDime.

The paradime.transforms module defines various classes used to transform relations between data points.

class paradime.transforms.AdaptiveNeighborhoodRescale(kernel, find_param, verbose=False, **kwargs)[source]¶

Rescales relation values for each data point based on its neighbors.

This is a base class for transformations such as those used by t-SNE or UMAP. For each data point, a parameter is fitted by comparing kernel-transformed relations to a target value. Once the parameter value is found, the kernel function is used to transform the relations.

Parameters:

kernel (Callable[[ndarray, float], Union[float, ndarray]]) – The kernel function used to transform the relations. This is a callable taking the relation values for a data point, along with a parameter.
find_param (Callable[..., float]) – The function used to find the parameter value. This is a callable taking the relation values and a fixed value to compare the transformed relations against.
verbose (bool) – Verbosity toggle.
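The interplay of the two callables can be sketched in plain Python. Everything in this example is an assumption made for illustration: the Gaussian kernel, the bisection-based find_param with its bracket and step count, and the target value of 1.5. It only mirrors the described scheme, in which a parameter is fitted per data point and then used to transform that point's relations.

```python
import math


def kernel(relations, param):
    """Illustrative Gaussian kernel; param acts as the kernel width."""
    return [math.exp(-((d / param) ** 2)) for d in relations]


def find_param(relations, target, lo=1e-6, hi=1e6, steps=200):
    """Bisect for the width at which the kernel values sum to target."""
    for _ in range(steps):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits the wide bracket
        if sum(kernel(relations, mid)) < target:
            lo = mid  # kernel too narrow: the sum is too small, so widen it
        else:
            hi = mid
    return math.sqrt(lo * hi)


# One row of neighbor distances per data point.
neighbor_dists = [[1.0, 2.0, 3.0], [0.5, 0.6, 4.0]]
target = 1.5
param_values = [find_param(row, target) for row in neighbor_dists]
rescaled = [kernel(row, p) for row, p in zip(neighbor_dists, param_values)]
print(param_values)
```

Each data point ends up with its own width, so points in dense and sparse regions are rescaled to the same summed kernel value.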

property param_values: ndarray¶

The parameter values determined for each data point. Available only after calling the transform.

Return type:: ndarray

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ConnectivityBasedRescale(n_neighbors=15, verbose=False, **kwargs)[source]¶

Applies a connectivity-based transformation to the relation values.

The relation values are rescaled using shifted Gaussian kernels. The shift is equal to the distance to the closest neighboring data point, and the kernel width is set by comparing the summed kernel values to the binary logarithm of the specified number of neighbors. This is the relation transform used by UMAP.

Parameters:

n_neighbors (float) – The number of nearest neighbors used to determine the kernel widths.
verbose (bool) – Verbosity toggle.
**kwargs – Passed on to scipy.optimize.root_scalar(), which determines the kernel widths. By default, this is set to use a bracket of [10^(-6), 10^6] for the root search.

class paradime.transforms.Functional(f, in_place=True, check_valid=False)[source]¶

Applies a function to the relation data.

By default, this transform applies a given function to the data attribute of the paradime.relationdata.RelationData instance in place and returns the transformed instance. This assumes that the transform does not change the data in a way that is incompatible with the paradime.relationdata.RelationData subclass. The transform can also be applied to the whole paradime.relationdata.RelationData instance by setting in_place to False. In this case, the output is that of the given function.

Parameters:

f (Callable[..., Any]) – Function to be applied to the relations.
in_place (bool) – Toggles whether the function is applied to the data attribute of the paradime.relationdata.RelationData object (default), or to the paradime.relationdata.RelationData itself.
check_valid (bool) – Toggles whether a check for the transformed relation data’s validity is performed. If in_place is set to False, no checks are performed regardless of this parameter.
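The difference between the two modes can be shown with a toy container. ToyRelationData and apply_functional below are hypothetical stand-ins written for this sketch. They imitate the documented behavior of in_place but are not the library's classes, and validity checking is left out.

```python
class ToyRelationData:
    """Hypothetical stand-in for a RelationData container."""

    def __init__(self, data):
        self.data = data


def apply_functional(f, reldata, in_place=True):
    """Apply f to the data attribute (default) or to the whole container."""
    if in_place:
        # f sees only the raw values; the same container is kept and returned.
        reldata.data = f(reldata.data)
        return reldata
    # f sees the whole container, and its output is returned as is.
    return f(reldata)


rel = ToyRelationData([1.0, 2.0, 3.0])
same = apply_functional(lambda values: [2 * v for v in values], rel)
total = apply_functional(lambda r: sum(r.data), rel, in_place=False)
print(same is rel, rel.data, total)  # True [2.0, 4.0, 6.0] 12.0
```

With in_place left at True, the container type is preserved. With in_place set to False, the result can be anything the function returns, which is consistent with no validity check being performed in that mode.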

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.Identity(**kwargs)[source]¶

A placeholder identity transform.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ModifiedCauchyTransform(min_dist=0.1, spread=1.0, a=None, b=None)[source]¶

Transforms relations based on a modified Cauchy distribution.

This transform applies a modified Cauchy distribution function to the relations. The distribution’s parameters a and b are determined from the parameters min_dist and spread by fitting a smooth approximation of an offset exponential decay.

Parameters:

min_dist (float) – Effective minimum distance of points if the transformed relations were to be used for calculating an embedding.
spread (float) – Effective scale of the points if the transformed relations were to be used for calculating an embedding.
a (Union[float, Tensor, None]) – Parameter to define the distribution directly. It can be optimized together with the DR model in a ParametricDR by setting it to one of the model’s additional parameters.
b (Union[float, Tensor, None]) – Parameter to define the distribution directly. It can be optimized together with the DR model in a ParametricDR by setting it to one of the model’s additional parameters.
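
As a rough illustration of the distribution function, the sketch below assumes the form 1 / (1 + a * d^(2b)) known from UMAP. The function name modified_cauchy is hypothetical, and the fitting of a and b from min_dist and spread is not shown.

```python
def modified_cauchy(d, a=1.0, b=1.0):
    """Map a distance d >= 0 to a similarity in (0, 1].

    Assumed form: 1 / (1 + a * d^(2b)); hypothetical helper, not the library code.
    """
    return 1.0 / (1.0 + a * d ** (2.0 * b))


# With a = b = 1 the function reduces to a standard Cauchy kernel.
values = [modified_cauchy(d) for d in (0.0, 1.0, 2.0)]
```

Larger values of a or b make the similarity fall off more quickly with distance.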

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.Normalize(**kwargs)[source]¶

Normalizes all relations at once.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.NormalizeRows(**kwargs)[source]¶

Normalizes the relation values for each data point separately.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.
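
The difference between Normalize and NormalizeRows can be illustrated with nested lists. The sketch assumes that normalizing means dividing by the sum of the values (all values at once, or per row); both function names are hypothetical, and the code does not use ParaDime's relation data classes.

```python
def normalize_all(matrix):
    """Divide every entry by the sum over the whole matrix (assumed behavior)."""
    total = sum(sum(row) for row in matrix)
    return [[value / total for value in row] for row in matrix]


def normalize_rows(matrix):
    """Divide every entry by the sum of its own row (assumed behavior)."""
    return [[value / sum(row) for value in row] for row in matrix]


# Made-up relation values for three data points.
rel = [[0.0, 1.0, 3.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
by_total = normalize_all(rel)
by_row = normalize_rows(rel)
```

After global normalization all entries together sum to one, whereas after row-wise normalization each row sums to one separately.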

class paradime.transforms.PerplexityBasedRescale(perplexity=30, verbose=False, **kwargs)[source]¶

Applies a perplexity-based transformation to the relation values.

The relation values are rescaled using Gaussian kernels. For each data point, the kernel width is determined by comparing the entropy of the relation values to the binary logarithm of the specified perplexity. This is the relation transform used by t-SNE.

Parameters:

perplexity (float) – The desired perplexity, which can be understood as a smooth measure of nearest neighbors.
verbose (bool) – Verbosity toggle.
**kwargs – Passed on to scipy.optimize.root_scalar(), which determines the kernel widths. By default, this is set to use a bracket of [0.01, 1.] for the root search.
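
The perplexity matching described above can be sketched for a single data point as follows. This is an illustration under stated assumptions, not the library's code: the kernel is parametrized by a precision beta (weights exp(-beta * d^2)), a plain bisection stands in for scipy.optimize.root_scalar(), the bracket differs from the library default, and the names entropy_bits and find_beta are hypothetical.

```python
import math


def entropy_bits(dists, beta):
    """Shannon entropy (in bits) of the normalized Gaussian kernel values."""
    weights = [math.exp(-beta * d * d) for d in dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def find_beta(dists, perplexity, lo=1e-3, hi=1e3, iters=200):
    """Bisect for beta so that the entropy equals log2(perplexity)."""
    target = math.log2(perplexity)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if entropy_bits(dists, mid) > target:
            lo = mid  # distribution too flat, sharpen the kernel
        else:
            hi = mid  # distribution too peaked, widen the kernel
    return math.sqrt(lo * hi)


# Distances from one data point to six neighbors (made-up values).
dists = [0.5, 0.8, 1.1, 1.9, 2.4, 3.0]
beta = find_beta(dists, perplexity=3.0)
```

Because the entropy decreases monotonically as beta grows, the search converges to the unique kernel whose effective number of neighbors, 2 raised to the entropy, matches the requested perplexity.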

class paradime.transforms.RelationTransform(**kwargs)[source]¶

Base class for relation transforms.

Custom transforms should subclass this class.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.StudentTTransform(alpha)[source]¶

Transforms relations based on Student’s t-distribution.

Parameters:: alpha (Union[float, Tensor]) – Degrees of freedom of the distribution. This can either be a float or a PyTorch tensor. Alpha can be optimized together with the DR model in a paradime.dr.ParametricDR by setting it to one of the model’s additional parameters.
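
For orientation, the sketch below evaluates an unnormalized Student's t kernel of the form (1 + d^2 / alpha)^(-(alpha + 1) / 2), as used by t-SNE. Whether ParaDime applies exactly this form is an assumption here, and the name student_t_kernel is hypothetical.

```python
def student_t_kernel(d, alpha=1.0):
    """Unnormalized Student's t kernel for a distance d (assumed form)."""
    return (1.0 + d * d / alpha) ** (-(alpha + 1.0) / 2.0)


# With alpha = 1 this is the heavy-tailed Cauchy kernel familiar from t-SNE.
values = [student_t_kernel(d) for d in (0.0, 1.0, 3.0)]
```

Larger values of alpha make the tails lighter, so distant points receive smaller relation values.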

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.Symmetrize(subtract_product=False)[source]¶

Symmetrizes the relation values.

Parameters:: subtract_product (bool) – Specifies which symmetrization routine to use. If set to False (default), a matrix M is symmetrized by calculating 1/2 * (M + M^T); if set to True, M is symmetrized by calculating M + M^T - M * M^T, where ‘*’ is the element-wise (Hadamard) product.
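
The two symmetrization formulas can be written out for a small matrix stored as nested lists. This is a plain-Python sketch of the formulas only; the function name symmetrize is hypothetical and ParaDime's relation data classes are not used.

```python
def symmetrize(m, subtract_product=False):
    """Symmetrize a square matrix given as a list of rows."""
    n = len(m)
    if subtract_product:
        # M + M^T - M * M^T with an element-wise product
        return [
            [m[i][j] + m[j][i] - m[i][j] * m[j][i] for j in range(n)]
            for i in range(n)
        ]
    # 1/2 * (M + M^T)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]


# Asymmetric relation values between two points (made-up values).
m = [[0.0, 0.8], [0.2, 0.0]]
averaged = symmetrize(m)
combined = symmetrize(m, subtract_product=True)
```

For values in [0, 1], the second variant equals the probability that at least one of two independent events occurs, which is why it suits UMAP-style relations.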

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToFlatArray(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.FlatRelationArray.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToFlatTensor(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.FlatRelationTensor.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToNeighborTuple(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToSparseArray(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.SparseRelationArray.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToSquareArray(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.SquareRelationArray.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToSquareTensor(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.SquareRelationTensor.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToTriangularArray(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationArray.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToTriangularTensor(**kwargs)[source]¶

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ZeroDiagonal(**kwargs)[source]¶

Sets all self-relations to zero.

transform(reldata)[source]¶

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

Loss¶

Losses for ParaDime routines.

The paradime.loss module implements the specification of losses for ParaDime routines. The supported losses are paradime.loss.RelationLoss, paradime.loss.ClassificationLoss, paradime.loss.PositionLoss, paradime.loss.ReconstructionLoss, and paradime.loss.CompoundLoss.

class paradime.loss.ClassificationLoss(loss_function=CrossEntropyLoss(), data_key='main', label_key='labels', classification_method='classify', name=None)[source]¶

A loss that compares predicted class labels against ground truth labels.

This loss compares predicted class labels to ground truth labels in a batch using a specified loss function (cross-entropy by default). Class labels are predicted by applying the model’s classify() method to the specified data entry of the input batch.

Parameters:

loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
data_key (str) – The key under which to find the data in the input batch.
label_key (str) – The key under which ground truth labels are stored in the input batch.
classification_method (str) – The model method to be used for classifying the batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]¶

Apply the loss to a batch of input data.

Parameters:

model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.

Return type:

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.CompoundLoss(losses, weights=None, name=None)[source]¶

A weighted sum of multiple losses.

Parameters:

losses (list[Loss]) – A list of paradime.loss.Loss instances to be summed.
weights (Union[ndarray, Tensor, list[float], None]) – A list of weights to multiply the losses with. Must be of the same length as the list of losses. If no weights are specified, all losses are weighted equally.
name (Optional[str]) – Name of the loss (used by logging functions).

checkpoint()[source]¶

Create a checkpoint of the most recent accumulated loss.

Appends the value of the most recent accumulated loss to the loss’s history attribute. If the loss is a paradime.loss.CompoundLoss, checkpoints are also created for each individual loss.

Return type:: None

detailed_history()[source]¶

Returns a detailed history of the compound loss.

Return type:: Tensor
Returns:: A PyTorch tensor with the history of each loss component multiplied by its weight.

forward(model, global_relations, batch_relations, batch, device)[source]¶

Apply the loss to a batch of input data.

Parameters:

model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.

Return type:

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.Loss(name=None)[source]¶

Base class for losses.

Custom losses should subclass this class.

name¶: The name of the loss (used by logging functions).

checkpoint()[source]¶

Create a checkpoint of the most recent accumulated loss.

Appends the value of the most recent accumulated loss to the loss’s history attribute. If the loss is a paradime.loss.CompoundLoss, checkpoints are also created for each individual loss.

Return type:: None

forward(model, global_relations, batch_relations, batch, device)[source]¶

Apply the loss to a batch of input data.

Parameters:

model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.

Return type:

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.PositionLoss(loss_function=MSELoss(), data_key='main', position_key='pos', embedding_method='embed', name=None)[source]¶

A loss that compares embedding coordinates to given positions.

This loss compares embedding coordinates to given ground-truth coordinates in a batch using a specified loss function (mean-square-error by default). Embedding positions are computed by applying the model’s embed() method to the specified data entry of the input batch.

Parameters:

loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
data_key (str) – The key under which to find the data in the input batch.
position_key (str) – The key under which the ground truth positions are stored in the input batch.
embedding_method (str) – The model method to be used for embedding the batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]¶

Apply the loss to a batch of input data.

Parameters:

model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.

Return type:

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.ReconstructionLoss(loss_function=MSELoss(), data_key='main', encoding_method='encode', decoding_method='decode', name=None)[source]¶

A simple reconstruction loss for auto-encoding data.

This loss compares reconstructed data to input data in a batch using a specified loss function (mean-square-error by default). Reconstructed data is computed by successively applying the model’s encode() and decode() methods to the specified data entry of the input batch.

Parameters:

loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
data_key (str) – The key under which to find the data in the input batch.
encoding_method (str) – The model method to be used for encoding the batch of input data.
decoding_method (str) – The model method to be used for decoding the encoded batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]¶

Apply the loss to a batch of input data.

Parameters:

model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.

Return type:

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.RelationLoss(loss_function, global_relation_key='rel', batch_relation_key='rel', embedding_method='embed', normalize_sub=True, name=None)[source]¶

A loss that compares batch-wise relation data against a subset of global relation data.

This loss applies a specified loss function to a subset of pre-computed global relations and the batch-wise relations found under specified keys, respectively. Batch-wise relations are computed from embedded coordinates by applying the model’s embed() method to the specified data entry of the input batch.

Parameters:

loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
global_relation_key (str) – Key under which to find the global relations.
batch_relation_key (str) – Key under which to find the batch-wise relations.
embedding_method (str) – The model method to be used for embedding the batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]¶

Apply the loss to a batch of input data.

Parameters:

model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.

Return type:

Returns:

A single-item PyTorch tensor with the computed loss.

paradime.loss.cross_entropy_loss(p, q, epsilon=1e-07)[source]¶

Cross-entropy loss as used by UMAP.

To be used as a loss function in the paradime.loss.RelationLoss of a parametric DR routine.

Parameters:

p (Tensor) – Input tensor containing (a batch of) probabilities.
q (Tensor) – Input tensor containing (a batch of) probabilities.
epsilon (float) – Small constant used to avoid numerical errors caused by near-zero probability values.

Return type:

Returns:

The cross-entropy loss of the two input tensors, divided by the number of items in the batch.
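
A minimal sketch of such a loss on plain Python lists is given below. It assumes the binary cross-entropy between edge probabilities, which is the form UMAP optimizes up to terms that do not depend on q. The exact clipping used by ParaDime may differ, the name cross_entropy_sketch is hypothetical, and every entry is counted as one batch item.

```python
import math


def cross_entropy_sketch(p, q, epsilon=1e-7):
    """Binary cross-entropy between target p and predicted q (assumed form)."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        # Clip q away from 0 and 1 so that both logarithms stay finite.
        q_c = min(max(q_i, epsilon), 1.0 - epsilon)
        total -= p_i * math.log(q_c)  # attractive term
        total -= (1.0 - p_i) * math.log(1.0 - q_c)  # repulsive term
    return total / len(p)  # assumption: one batch item per entry


# Made-up target and predicted edge probabilities.
p = [1.0, 0.0, 1.0]
good = cross_entropy_sketch(p, [0.9, 0.1, 0.8])
bad = cross_entropy_sketch(p, [0.1, 0.9, 0.2])
```

Predictions that agree with the targets give a smaller loss than predictions that contradict them, and epsilon keeps the loss finite even for predictions of exactly 0 or 1.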

paradime.loss.kullback_leibler_div(p, q, epsilon=1e-07)[source]¶

Kullback-Leibler divergence.

To be used as a loss function in the paradime.loss.RelationLoss of a parametric DR routine.

Parameters:

p (Tensor) – Input tensor containing (a batch of) probabilities.
q (Tensor) – Input tensor containing (a batch of) probabilities.
epsilon (float) – Small constant used to avoid numerical errors caused by near-zero probability values.

Return type:

Returns:

The Kullback-Leibler divergence of the two input tensors, divided by the number of items in the batch.
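
The divergence can be sketched on plain Python lists as follows. How ParaDime applies epsilon internally is an assumption here, the name kl_div_sketch is hypothetical, and every entry is counted as one batch item.

```python
import math


def kl_div_sketch(p, q, epsilon=1e-7):
    """Sum of p * log(p / q), stabilized with epsilon (assumed placement)."""
    total = sum(
        p_i * math.log((p_i + epsilon) / (q_i + epsilon))
        for p_i, q_i in zip(p, q)
    )
    return total / len(p)  # assumption: one batch item per entry


# Two made-up probability distributions over three outcomes.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
same = kl_div_sketch(p, p)
diff = kl_div_sketch(p, q)
```

The divergence is zero for identical inputs and positive otherwise. It is not symmetric in general, so the order of p and q matters.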

Routines¶

Predefined ParaDime routines for existing DR techniques.

The paradime.routines module implements parametric versions of existing dimensionality reduction techniques using the paradime.dr.ParametricDR interface.

class paradime.routines.ParametricTSNE(perplexity=30.0, alpha=1.0, model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], initialization='pca', epochs=30, init_epochs=10, batch_size=500, init_batch_size=None, learning_rate=0.01, init_learning_rate=None, data_key='main', use_cuda=False, verbose=False)[source]¶

A parametric version of t-SNE.

This class provides a high-level interface for a paradime.dr.ParametricDR routine with the following specifications:

The global relations are paradime.relations.NeighborBasedPDist, transformed with a paradime.transforms.PerplexityBasedRescale followed by paradime.transforms.Symmetrize.
The batch relations are paradime.relations.DifferentiablePDist, transformed with a paradime.transforms.StudentTTransform followed by paradime.transforms.Normalize.
The first (optional) training phase initializes the model to approximate PCA (see initialization below).
The second training phase uses the Kullback-Leibler divergence to compare the relations.

Parameters:

perplexity (float) – The desired perplexity, which can be understood as a smooth measure of nearest neighbors used to determine high-dimensional relations between data points.
alpha (float) – Degrees of freedom of the Student’s t-distribution used to calculate low-dimensional relations between data points.
model (Optional[Model]) – The model used to embed the high dimensional data.
in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.
out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).
hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.
initialization (Optional[str]) – How to pretrain the model to mimic initialization of low-dimensional positions. By default ('pca') the model is pretrained to output an approximation of PCA before beginning the main training phase.
epochs (int) – The number of epochs in the main training phase.
init_epochs (int) – The number of epochs in the pretraining (initialization) phase.
batch_size (int) – The number of items in a batch during the main training phase.
init_batch_size (Optional[int]) – The number of items in a batch during the pretraining (initialization).
learning_rate (float) – The learning rate during the main training phase.
init_learning_rate – The learning rate during the pretraining (initialization).
data_key (str) – The key under which the data can be found in the dataset.
dataset – The dataset on which to perform the training. Datasets can be registered after instantiation using the register_dataset() class method.
use_cuda (bool) – Whether or not to use the GPU for training.
verbose (bool) – Verbosity flag.

class paradime.routines.ParametricUMAP(n_neighbors=30, min_dist=0.01, spread=1.0, a=None, b=None, model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], initialization='spectral', epochs=30, init_epochs=5, batch_size=10, negative_sampling_rate=5, init_batch_size=100, learning_rate=0.005, init_learning_rate=0.05, data_key='main', dataset=None, use_cuda=False, verbose=False)[source]¶

A parametric version of UMAP.

This class provides a high-level interface for a paradime.dr.ParametricDR routine with the following specifications:

The global relations are paradime.relations.NeighborBasedPDist, transformed with a paradime.transforms.ConnectivityBasedRescale followed by paradime.transforms.Symmetrize with product subtraction.
The batch relations are paradime.relations.DistsFromTo (since negative edge sampling is used), transformed with a paradime.transforms.ModifiedCauchyTransform.
The first (optional) training phase initializes the model to approximate a spectral embedding based on the global relations (see initialization below).
The second training phase uses cross-entropy to compare the relations. This phase uses negative edge sampling.

Parameters:

n_neighbors (int) – The desired number of neighbors used for computing the high-dimensional pairwise relations.
min_dist (float) – Effective minimum distance of points in the embedding.
spread (float) – Effective scale of the points in the embedding.
a (Optional[float]) – Parameter to define the modified Cauchy distribution used to compute low-dimensional relations.
b (Optional[float]) – Parameter to define the modified Cauchy distribution used to compute low-dimensional relations.
model (Optional[Model]) – The model used to embed the high dimensional data.
in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.
out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).
hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.
initialization (Optional[str]) – How to pretrain the model to mimic initialization of low-dimensional positions. By default ('spectral') the model is pretrained to output an approximation of a spectral embedding based on the high-dimensional relations before beginning the main training phase.
epochs (int) – The number of epochs in the main training phase.
init_epochs (int) – The number of epochs in the pretraining (initialization) phase.
batch_size (int) – The number of items in a batch during the main training phase.
init_batch_size (int) – The number of items in a batch during the pretraining (initialization).
learning_rate (float) – The learning rate during the main training phase.
init_learning_rate (float) – The learning rate during the pretraining (initialization).
data_key (str) – The key under which the data can be found in the dataset.
dataset (Union[ndarray, Tensor, Mapping[str, Union[ndarray, Tensor]], Dataset, None]) – The dataset on which to perform the training. Datasets can be registered after instantiation using the register_dataset() class method.
use_cuda (bool) – Whether or not to use the GPU for training.
verbose (bool) – Verbosity flag.

Utils¶

Utility functions for ParaDime.

The paradime.utils subpackage includes various modules that implement utility functions for logging, plotting, and input conversion.

Convert¶

Conversion utilities for ParaDime.

The paradime.utils.convert module implements various conversion functions for tensor-like objects and index lists.

paradime.utils.convert.rowcol_to_triu_index(i, j, dim)[source]¶

Converts matrix indices to upper-triangular form.

Converts a pair of row and column indices of a symmetrical square array to the corresponding index of the list of upper triangular values.

Parameters:

i (int) – The row index.
j (int) – The column index.
dim (int) – The size of the square matrix.

Return type:

int

Returns:

The upper triangular index.

Raises:

ValueError – For diagonal indices (i.e., if i equals j).
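
The index conversion can be sketched as follows, assuming that the upper-triangular values are stored row by row without the diagonal (the same ordering as SciPy's condensed distance format). The function name carries a _sketch suffix to mark it as an illustration rather than the library function.

```python
from itertools import combinations


def rowcol_to_triu_index_sketch(i, j, dim):
    """Index of entry (i, j) in the row-major list of upper-triangular values."""
    if i == j:
        raise ValueError("Diagonal entries have no upper-triangular index.")
    if i > j:
        i, j = j, i  # symmetric matrix: use the mirrored entry
    # Rows 0..i-1 contribute i * dim - i * (i + 1) / 2 entries before row i.
    return i * dim - i * (i + 1) // 2 + (j - i - 1)


dim = 4
pairs = list(combinations(range(dim), 2))  # (0, 1), (0, 2), ..., (2, 3)
indices = [rowcol_to_triu_index_sketch(i, j, dim) for i, j in pairs]
```

For a 4 x 4 matrix the six off-diagonal pairs map to the indices 0 through 5 in order, and swapping i and j yields the same index.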

paradime.utils.convert.to_numpy(X)[source]¶

Converts a tensor-like object to a numpy array.

Parameters:: X (Union[ndarray, Tensor, list[float]]) – The tensor-like object to be converted.
Return type:: ndarray
Returns:: The resulting numpy array.

paradime.utils.convert.to_torch(X)[source]¶

Converts a tensor-like object to a PyTorch tensor.

Parameters:: X (Union[ndarray, Tensor, list[float]]) – The tensor-like object to be converted.
Return type:: Tensor
Returns:: The resulting PyTorch tensor. If the input was not a PyTorch tensor already, the output tensor will be of type float32 for float inputs and of type int32 for integer inputs.

paradime.utils.convert.triu_to_square_dim(len_triu)[source]¶

Calculates the size of a square matrix given the length of the list of its upper-triangular values.

Parameters:: len_triu (int) – The length of the list of upper-triangular values.
Return type:: int
Returns:: The size of the square matrix.
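
A square matrix of size n has n(n - 1)/2 values above the diagonal, so the size follows from solving this relation for n. The sketch below does so with integer arithmetic. The function name and the error raised for invalid lengths are assumptions, not documented library behavior.

```python
import math


def triu_to_square_dim_sketch(len_triu):
    """Solve n * (n - 1) / 2 == len_triu for the matrix size n."""
    discriminant = 1 + 8 * len_triu
    root = math.isqrt(discriminant)
    if root * root != discriminant:
        # Assumption: invalid lengths are rejected; the library may behave differently.
        raise ValueError("Not a valid number of upper-triangular values.")
    return (1 + root) // 2


size = triu_to_square_dim_sketch(6)  # six values above the diagonal
```

Using math.isqrt avoids floating-point rounding issues for very long lists.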

Logging¶

Logging utility for ParaDime.

The paradime.utils.logging module implements logging functionality used by verbose ParaDime routines.

paradime.utils.logging.log(message)[source]¶

Calls the ParaDime logger to print a timestamp and a message.

Parameters:: message (str) – The message string to print.
Return type:: None

paradime.utils.logging.set_logfile(filename, mode='a', disable_stdout=False, disable_other_files=False)[source]¶

Configure the ParaDime logger to write its output to a file.

Parameters:

filename (str) – The path to the log file.
mode (str) – The mode to open the file.
disable_stdout (bool) – Whether or not to disable logging to stdout.
disable_other_files (bool) – Whether or not to remove other file handlers from the ParaDime logger.

Return type:

Plotting¶

Plotting utilities for ParaDime.

The paradime.utils.plotting module implements plotting functions and color palette retrieval.

paradime.utils.plotting.get_color_palette()[source]¶

Get the custom ParaDime color palette.

The palette is usually located in an assets folder in the form of a JSON file. If the JSON file is not found, this method attempts to create it by parsing an SVG file.

Return type:: dict[str, str]
Returns:: The color palette as a dict of names and hex color values.
Raises:: FileNotFoundError – If neither the JSON nor the SVG file can be found.

paradime.utils.plotting.scatterplot(coords, labels=None, colormap=None, labels_to_index=None, figsize=(10, 10), bgcolor='#fcfcfc', legend=True, legend_options=None, ax=None, **kwargs)[source]¶

Creates a scatter plot of points at the given coordinates.

Parameters:

coords (Union[ndarray, Tensor]) – The coordinates of the points.
labels (Union[ndarray, Tensor, None]) – A list of categorical labels. If labels are given, a categorical color scale is used and a legend is constructed automatically.
colormap (Optional[list[str]]) – A list of colors to use instead of the default categorical color scale based on the ParaDime palette.
labels_to_index (Optional[dict]) – A dict that maps labels to indices which are then used to access the colors in the categorical color scale.
figsize (tuple[float, float]) – Width and height of the plot in inches.
bgcolor (Optional[str]) – The background color of the plot, which by default is also used to draw thin outlines around the points.
legend (bool) – Whether or not to include the automatically created legend.
legend_options (Optional[dict[str, Any]]) – A dict of keyword arguments that are passed on to the legend method.
ax (Optional[matplotlib.axes.Axes]) – An axes of the current figure. This argument is useful if the scatterplot should be added to an existing figure.
kwargs – Any other keyword arguments are passed on to matplotlib’s scatter method.

Return type: