DR

Main module of ParaDime.

The paradime.dr module implements the main functionality of ParaDime. This includes the paradime.dr.ParametricDR class, as well as paradime.dr.Dataset and paradime.dr.TrainingPhase.

class paradime.dr.Dataset(data)[source]

A dataset for dimensionality reduction.

Constructs a PyTorch torch.utils.data.Dataset from the given data in such a way that each item or batch of items is a dictionary with PyTorch tensors as values. If only a single numpy array or PyTorch tensor is passed, this data will be available under the 'data' key of the dict. Alternatively, a dict of tensors and/or arrays can be passed, which allows additional data such as labels for supervised learning. By default, an entry for indices is added to the dict if the passed dict does not already include one.

Parameters:

data (Union[ndarray, Tensor, Mapping[str, Union[ndarray, Tensor]]]) – The data, passed either as a single numpy array or PyTorch tensor, or as a dictionary containing multiple arrays and/or tensors.
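
The item layout described above can be illustrated without PyTorch. The following is a plain-Python sketch, not ParaDime's implementation: make_items is a hypothetical helper, nested lists stand in for arrays and tensors, and the key names 'data' and 'indices' are taken from the description above.

```python
from collections.abc import Mapping


def make_items(data, default_key="data"):
    """Return one dict per item, mimicking the layout described above."""
    # A single array-like is stored under the default key;
    # a mapping is copied so the caller's dict is left untouched.
    entries = dict(data) if isinstance(data, Mapping) else {default_key: data}
    n_items = len(next(iter(entries.values())))
    # Add an index entry unless the caller already supplied one.
    entries.setdefault("indices", list(range(n_items)))
    return [
        {key: values[i] for key, values in entries.items()}
        for i in range(n_items)
    ]


items = make_items({"data": [[0.0, 1.0], [2.0, 3.0]], "labels": [0, 1]})
print(items[1])  # {'data': [2.0, 3.0], 'labels': 1, 'indices': 1}
```

Passing extra entries such as labels alongside the main data is what makes supervised losses possible later on.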

class paradime.dr.DerivedData(func, type_key_tuples=[('data', 'main')], **kwargs)[source]

A derived dataset entry to be computed later.

Derived dataset entries can be used to set up rules for extending existing datasets later based on functions acting on other dataset entries or global relations.

Parameters:
  • func (Callable[..., Union[ndarray, Tensor]]) – The function to compute the derived data.

  • type_key_tuples (list[tuple[Union[Literal['data'], Literal['rels']], str]]) – A list of (type, key) tuples, where the types can be 'data' or 'rels', and the keys are used to access the respective entries.
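
To make the idea of a deferred entry concrete, here is a minimal plain-Python sketch. It is not ParaDime's code: resolve_derived and squared_norms are hypothetical names, and passing the looked-up entries as positional arguments (with extra keyword arguments forwarded to the function) is an assumption about how such a rule might be evaluated.

```python
def resolve_derived(func, type_key_tuples, data, rels, **kwargs):
    """Look up each (type, key) pair and pass the results to func."""
    sources = {"data": data, "rels": rels}
    args = [sources[kind][key] for kind, key in type_key_tuples]
    return func(*args, **kwargs)


def squared_norms(points):
    """Example rule: per-item squared Euclidean norm."""
    return [sum(x * x for x in p) for p in points]


data = {"main": [[0.0, 0.0], [3.0, 4.0]]}
rels = {}  # no global relations are needed for this rule

# The rule is evaluated later and stored like a regular dataset entry.
data["sq_norm"] = resolve_derived(squared_norms, [("data", "main")], data, rels)
print(data["sq_norm"])  # [0.0, 25.0]
```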

class paradime.dr.NegSampledEdgeDataset(dataset, relations, neg_sampling_rate=5, data_key='main')[source]

A dataset that supports negative edge sampling.

Constructs a PyTorch torch.utils.data.Dataset suitable for negative sampling from a regular paradime.dr.Dataset. The passed relation data, along with the negative sampling rate r, is used to inform the negative sampling process. Each “item” i of the resulting dataset is essentially a small batch of items, including the item i of the original dataset, one of its actual neighbors, and r random other items that are considered to not be neighbors of i. Remaining data from the original dataset is collated using PyTorch’s torch.utils.data.default_collate() method.
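
The composition of one such “item” can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: sample_edge_item is a hypothetical helper, the neighbor lists are made up, and random items are simply treated as non-neighbors without checking.

```python
import random


def sample_edge_item(i, neighbors, n_items, neg_sampling_rate, rng):
    """Return [i, one neighbor of i, r random other items]."""
    positive = rng.choice(neighbors[i])
    candidates = [j for j in range(n_items) if j != i]
    # Random items are treated as non-neighbors without checking.
    negatives = [rng.choice(candidates) for _ in range(neg_sampling_rate)]
    return [i, positive] + negatives


rng = random.Random(0)  # seeded only to make the sketch repeatable
neighbors = {0: [1, 2], 1: [0], 2: [0], 3: [2], 4: [3]}
batch = sample_edge_item(0, neighbors, n_items=5, neg_sampling_rate=3, rng=rng)
print(len(batch))  # 5, i.e. 1 item + 1 neighbor + r = 3 random items
```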

class paradime.dr.ParametricDR(model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], derived_data=None, global_relations=None, batch_relations=None, losses=None, training_defaults=TrainingPhase(   name=None,   epochs=5,   batch_size=50,   batches_per_epoch=-1,   sampling='standard',   edge_rel_key='rel',   neg_sampling_rate=5,   loss_keys=['loss'],   loss_weights=[1.0],   _loss=None,   optimizer=<class 'torch.optim.adam.Adam'>,   learning_rate=0.01,   report_interval=5,   kwargs={}, ), training_phases=None, use_cuda=False, verbose=False)[source]

A general parametric dimensionality reduction routine.

Parameters:
  • model (Optional[Module]) – The PyTorch torch.nn.Module whose parameters are optimized during training.

  • in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.

  • out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).

  • hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.

  • derived_data (Optional[dict[str, DerivedData]]) – A dictionary of paradime.dr.DerivedData instances. These entries are computed before training, either before or after the global relations, depending on the options in the entries.

  • global_relations (Union[Relations, dict[str, Relations], None]) – A single paradime.relations.Relations instance or a dictionary with multiple paradime.relations.Relations instances. Global relations are calculated once for the whole dataset before training.

  • batch_relations (Union[Relations, dict[str, Relations], None]) – A single paradime.relations.Relations instance or a dictionary with multiple paradime.relations.Relations instances. Batch relations are calculated during training for each batch and are compared to an appropriate subset of the global relations by a paradime.loss.RelationLoss.

  • losses (Union[Loss, dict[str, Loss], None]) – A single paradime.loss.Loss instance or a dictionary with multiple paradime.loss.Loss instances. These losses are accessed by the training phases via the respective keys.

  • training_defaults (TrainingPhase) – A paradime.dr.TrainingPhase object with settings that override the default values of all other training phases. This parameter is useful to avoid having to repeatedly set parameters to the same non-default value across training phases. Defaults can also be specified after instantiation using the set_training_defaults() method.

  • training_phases (Optional[list[TrainingPhase]]) – A single paradime.dr.TrainingPhase object or a list of paradime.dr.TrainingPhase objects defining the training phases to be run. Training phases can also be added after instantiation using the add_training_phase() method.

  • use_cuda (bool) – Whether or not to use the GPU for training.

  • verbose (bool) – Verbosity flag. This setting overrides all verbosity settings of relations, transforms and/or losses used within the parametric dimensionality reduction.

device

The device on which the model is allocated (depends on the value specified for use_cuda).

add_data(data)[source]

Adds data to a parametric dimensionality reduction routine.

Tensor-like data will be added to the registered dataset. If none is registered yet, a new one will be created and registered. Derived entries will be added to the routine.

Parameters:

data (Mapping[str, Union[ndarray, Tensor, DerivedData]]) – A dict containing the data tensors or derived dataset entries to be added.

Return type:

None

add_training_phase(training_phase=None, name=None, epochs=None, batch_size=None, batches_per_epoch=None, sampling=None, edge_rel_key=None, neg_sampling_rate=None, loss_keys=None, loss_weights=None, optimizer=None, learning_rate=None, report_interval=None, **kwargs)[source]

Adds a single training phase to a parametric dimensionality reduction routine.

This method accepts either a paradime.dr.TrainingPhase instance or individual parameters passed with the same keyword syntax used by paradime.dr.TrainingPhase.

Parameters:

training_phase (Optional[TrainingPhase]) – A paradime.dr.TrainingPhase instance to be added. Instead of this, individual parameters can also be passed. For a full list of training phase settings, see paradime.dr.TrainingPhase.

Raises:

paradime.exceptions.UnsupportedConfigurationError – This error is raised if the type of paradime.relations.Relations is not compatible with the sampling option.

Return type:

None

apply(X, method=None)[source]

Applies the model to input data.

Applies the model to an input tensor after first switching off PyTorch’s automatic gradient tracking. This method also ensures that the resulting output tensor is on the CPU. The method parameter allows any of the model’s methods to be called in this way, but by default, the model’s __call__ method will be used (which wraps around forward).

Parameters:
  • X (Union[ndarray, Tensor]) – A numpy array or PyTorch tensor with the input data.

  • method (Optional[str]) – The name of the model method to be applied.

Return type:

Tensor

classify(X)[source]

Classifies data using the model’s classify method.

Parameters:

X (Union[ndarray, Tensor]) – A numpy array or PyTorch tensor with the data to be classified.

Return type:

Tensor

Returns:

A PyTorch tensor with the predicted class labels for the data.

compute_derived_data(only=None)[source]

Computes the derived data entries in the registered dataset.

After calling this function, the derived entries will be stored as regular entries in the routine’s dataset.

Parameters:

only (Optional[Literal['rel_based', 'other']]) – If 'rel_based', only those entries are computed that require global relations. If 'other', all other entries are computed. By default (None), all entries are computed.

Return type:

None

compute_global_relations(force=False)[source]

Computes the global relations.

The computed relation data are stored in the instance’s global_relation_data attribute.

Parameters:

force (bool) – Whether or not to force a new computation, when relations have been previously computed for the same instance.

Return type:

None
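
The effect of the force flag can be pictured as a simple compute-once cache. The class below is a hypothetical stand-in written for illustration (CachedRelations, compute_global and n_calls are not ParaDime names), assuming that a repeated call without force reuses the stored result.

```python
class CachedRelations:
    """Computes a value once and reuses it unless forced."""

    def __init__(self, compute):
        self._compute = compute
        self.result = None
        self.n_calls = 0  # counts how often the computation actually ran

    def compute_global(self, force=False):
        if self.result is None or force:
            self.result = self._compute()
            self.n_calls += 1


cache = CachedRelations(lambda: [[0.0, 1.0], [1.0, 0.0]])
cache.compute_global()            # first call computes
cache.compute_global()            # second call reuses the stored result
cache.compute_global(force=True)  # forced call recomputes
print(cache.n_calls)  # 2
```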

embed(X)[source]

Embeds data into the learned embedding space using the model’s embed method.

Parameters:

X (Union[ndarray, Tensor]) – A numpy array or PyTorch tensor with the data to be embedded.

Return type:

Tensor

Returns:

A PyTorch tensor with the embedding coordinates for the data.

classmethod from_spec(file_or_spec, model=None)[source]

Creates a paradime.dr.ParametricDR routine from a ParaDime specification.

Parameters:

file_or_spec (Union[str, dict]) – The specification, either as a dictionary or as a path to a YAML/JSON file.

Return type:

TypeVar(_ParametricDR, bound=ParametricDR)

Returns:

The paradime.dr.ParametricDR routine.

Raises:

paradime.exceptions.SpecificationError – If the validation of the specification has failed.

run_training_phase(training_phase)[source]

Runs a single training phase.

Parameters:

training_phase (TrainingPhase) – A paradime.dr.TrainingPhase instance.

Return type:

None

set_training_defaults(training_phase=None, epochs=None, batch_size=None, batches_per_epoch=None, sampling=None, edge_rel_key=None, neg_sampling_rate=None, loss_keys=None, loss_weights=None, optimizer=None, learning_rate=None, report_interval=5, **kwargs)[source]

Sets a parametric dimensionality reduction routine’s default training parameters.

This method accepts either a paradime.dr.TrainingPhase instance or individual parameters passed with the same keyword syntax used by paradime.dr.TrainingPhase. The specified default parameters will be used instead of the regular defaults when adding training phases.

Parameters:

training_phase (Optional[TrainingPhase]) – A paradime.dr.TrainingPhase instance with the new default settings. Instead of this, individual parameters can also be passed. For a full list of training phase settings, see paradime.dr.TrainingPhase.

Return type:

None
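
How custom defaults interact with individual phases can be sketched with a small dataclass. This is only an illustration of the override logic described above: PhaseSettings and with_overrides are hypothetical, and the three fields are a subset chosen for brevity.

```python
from dataclasses import dataclass, replace


@dataclass
class PhaseSettings:
    """Hypothetical stand-in with a few TrainingPhase-like fields."""
    epochs: int = 5
    batch_size: int = 50
    learning_rate: float = 0.01


def with_overrides(defaults, **settings):
    """Copy the defaults, replacing only settings that are not None."""
    given = {k: v for k, v in settings.items() if v is not None}
    return replace(defaults, **given)


# Custom defaults take the place of the regular ones ...
defaults = with_overrides(PhaseSettings(), batch_size=200, learning_rate=0.001)
# ... so each added phase only states what differs.
phase = with_overrides(defaults, epochs=30, batch_size=None)
print(phase)
```

Treating None as "not specified" mirrors the None defaults in the method signatures above.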

train(data=None)[source]

Runs all training phases of a parametric dimensionality reduction routine.

Parameters:

data – The training data, passed either as a single numpy array or PyTorch tensor, or as a dictionary containing multiple arrays and/or tensors.

Return type:

None

class paradime.dr.TrainingPhase(name=None, epochs=5, batch_size=50, batches_per_epoch=-1, sampling='standard', edge_rel_key='rel', neg_sampling_rate=5, loss_keys=['loss'], loss_weights=None, optimizer=<class 'torch.optim.adam.Adam'>, learning_rate=0.01, report_interval=5, **kwargs)[source]

A collection of parameter settings for a single phase in the training of a paradime.dr.ParametricDR instance.

Parameters:
  • name (Optional[str]) – The name of the training phase.

  • epochs (int) – The number of epochs to run in this phase. In standard item-based sampling, the model sees every item once per epoch. In the case of negative edge sampling, this is not guaranteed, and an epoch instead comprises batches_per_epoch batches (see parameter description below).

  • batch_size (int) – The number of items/edges in a batch. In standard item-based sampling, a batch has this many items, and the edges used for batch relations are constructed from the items. In the case of negative edge sampling, this is the number of sampled positive edges. The total number of edges is higher by a factor of r + 1, where r is the negative sampling rate. The same holds for the number of items (apart from possible duplicates, which can result from the edge sampling and are removed).

  • batches_per_epoch (int) – The number of batches per epoch. This parameter only has an effect for negative edge sampling, where the number of batches per epoch is not determined by the dataset size and the batch size. If this parameter is set to -1 (default), an epoch will comprise a number of batches that leads to a total number of sampled items roughly equal to the number of items in the dataset. If this parameter is set to an integer, an epoch will instead comprise that many batches.

  • sampling (Literal['standard', 'negative_edge']) – The sampling strategy, which can be either 'standard' (simple item-based sampling; default) or 'negative_edge' (negative edge sampling).

  • edge_rel_key (str) – The key under which to find the global relations that should be used for negative edge sampling.

  • neg_sampling_rate (int) – The number of negative (i.e., non-neighbor) edges to sample for each real neighborhood edge.

  • loss_keys (list[str]) – The keys under which to find the losses that should be minimized in this training phase.

  • loss_weights (Optional[list[float]]) – The weights for the losses. If none are specified, losses will be weighted equally.

  • optimizer (type) – The optimizer to use for loss minimization.

  • learning_rate (float) – The learning rate used in the optimization.

  • report_interval (int) – How often the loss should be reported during training, given in terms of epochs. E.g., with a setting of 5, the loss will be reported every 5 epochs.

  • kwargs – Additional kwargs that are passed on to the optimizer.
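
The batch-size arithmetic for negative edge sampling can be worked through with a few lines of Python. The first function restates the r + 1 factor from the batch_size description. The second is an assumed rule for the -1 setting of batches_per_epoch (round up so that roughly as many items are sampled as the dataset contains); the exact rounding used by ParaDime is not stated here, and both function names are hypothetical.

```python
import math


def edges_per_batch(batch_size, neg_sampling_rate):
    """Positive edges plus r negative edges for each of them."""
    return batch_size * (neg_sampling_rate + 1)


def default_batches_per_epoch(n_items, batch_size, neg_sampling_rate):
    """Assumed rule: enough batches to sample about n_items items."""
    return math.ceil(n_items / edges_per_batch(batch_size, neg_sampling_rate))


print(edges_per_batch(50, 5))                   # 300
print(default_batches_per_epoch(60000, 50, 5))  # 200
```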

loss

The loss constructed from the keys and weights specified above.
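
One way to picture a loss built from keys and weights is a weighted sum over the selected losses. The sketch below uses plain floats in place of loss objects; combine_losses and the keys 'embedding' and 'reconstruction' are made up for illustration, and using a weight of 1.0 for each loss when no weights are given is an assumption.

```python
def combine_losses(loss_values, loss_keys, loss_weights=None):
    """Weighted sum of the selected losses."""
    if loss_weights is None:
        # Assumption: equal weighting means a weight of 1.0 each.
        loss_weights = [1.0] * len(loss_keys)
    if len(loss_weights) != len(loss_keys):
        raise ValueError("need one weight per loss key")
    return sum(w * loss_values[k] for k, w in zip(loss_keys, loss_weights))


values = {"embedding": 0.5, "reconstruction": 2.0}
keys = ["embedding", "reconstruction"]
print(combine_losses(values, keys))               # 2.5
print(combine_losses(values, keys, [1.0, 0.25]))  # 1.0
```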

paradime.dr.register_data_func(name, data_func)[source]

Registers a new data function to be used in ParaDime specifications.

Parameters:
  • name (str) – The name of the data function.

  • data_func (Callable) – The data function to be registered.

Return type:

None

paradime.dr.register_loss(name, loss)[source]

Registers a new loss type to be used in ParaDime specifications.

Parameters:
  • name (str) – The name of the loss type.

  • loss (type[Loss]) – The paradime.loss.Loss to be registered.

Return type:

None

paradime.dr.register_loss_func(name, loss_func)[source]

Registers a new loss function to be used in ParaDime specifications.

Parameters:
  • name (str) – The name of the loss function.

  • loss_func – The loss function to be registered.

Return type:

None

paradime.dr.register_relations(name, rel)[source]

Registers a new type of relations to be used in ParaDime specifications.

Return type:

None

paradime.dr.register_transform(name, tf)[source]

Registers a new relation transform to be used in ParaDime specifications.

Parameters:
  • name (str) – The name of the transform.

  • tf (type[RelationTransform]) – The paradime.transforms.RelationTransform to be registered.

Return type:

None

paradime.dr.validate_spec(file_or_spec)[source]

Validates a ParaDime specification.

Parameters:

file_or_spec (Union[str, dict]) – The specification, either as a dictionary or as a path to a YAML/JSON file.

Return type:

dict[str, Any]

Returns:

The validated specification as a dictionary.

Raises:

paradime.exceptions.SpecificationError – If the validation of the specification failed.

Relations

Relation computation for ParaDime.

The paradime.relations module defines various classes used to compute relations between data points.

class paradime.relations.DifferentiablePDist(p=2, metric=None, transform=None, data_key='main')[source]

Differentiable pairwise distances between data points.

Parameters:
  • p (float) – Parameter that specifies which p-norm to use as a distance function. Ignored if metric is set.

  • metric (Optional[Callable[[Tensor, Tensor], Tensor]]) – The distance metric to be used.

  • transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.RelationTransform or list of paradime.transforms.RelationTransform instances to be applied to the relations.

  • data_key (str) – The key to access the data for which to compute relations.

  • verbose – Verbosity toggle.

relations

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]

Calculates the pairwise distances.

If metric is not None, a flexible but memory-inefficient implementation is used instead of PyTorch’s torch.nn.functional.pdist().

Parameters:

X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
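
For readers unfamiliar with p-norm pairwise distances, the following plain-Python sketch computes them for all pairs i < j, which is the condensed layout also returned by torch.nn.functional.pdist(). It is a non-differentiable illustration only; pdist_p is a hypothetical helper and nested lists stand in for tensors.

```python
from itertools import combinations


def pdist_p(points, p=2.0):
    """p-norm distances for all pairs i < j, in row-major pair order."""
    return [
        sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)
        for x, y in combinations(points, 2)
    ]


points = [[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]]
print(pdist_p(points, p=2))  # [5.0, 4.0, 3.0]
print(pdist_p(points, p=1))  # [7.0, 4.0, 3.0]
```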

class paradime.relations.DistsFromTo(metric=None, transform=None, data_key='main')[source]

Distances between individual pairs of data points.

relations

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]

Calculates the distances.

Parameters:

X (Union[ndarray, Tensor, None]) – Input data tensor of shape (2, n, dim), where n is the number of pairs of data points.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
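
The (2, n, dim) input layout means that the k-th point of the first block is paired with the k-th point of the second block. A minimal sketch, assuming a Euclidean metric (the behavior for metric=None is not specified above) and using the hypothetical helper dists_from_to:

```python
import math


def dists_from_to(pairs):
    """pairs has shape (2, n, dim): pairs[0][k] is compared with pairs[1][k]."""
    sources, targets = pairs
    return [math.dist(a, b) for a, b in zip(sources, targets)]


pairs = [
    [[0.0, 0.0], [1.0, 1.0]],  # n = 2 source points
    [[3.0, 4.0], [1.0, 2.0]],  # n = 2 target points
]
print(dists_from_to(pairs))  # [5.0, 1.0]
```

The result has one distance per pair, in contrast to the all-pairs output of the pairwise-distance classes.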

class paradime.relations.NeighborBasedPDist(n_neighbors=None, metric=None, transform=None, data_key='main', verbose=False)[source]

Approximate, nearest-neighbor-based pairwise distances between data points.

relations

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]

Calculates the pairwise distances.

Parameters:

X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
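
Neighbor-based distances keep only the distances to each point's nearest neighbors. The sketch below uses an exact brute-force search in place of an approximate one, purely to show the shape of the result; knn_tuple is a hypothetical helper, and excluding each point from its own neighbor list is an assumption.

```python
import math


def knn_tuple(points, n_neighbors):
    """Exact k-nearest-neighbor search returning (indices, distances)."""
    indices, distances = [], []
    for i, x in enumerate(points):
        nearest = sorted(
            (math.dist(x, y), j) for j, y in enumerate(points) if j != i
        )[:n_neighbors]
        indices.append([j for _, j in nearest])
        distances.append([d for d, _ in nearest])
    return indices, distances


points = [[0.0], [1.0], [3.0], [7.0]]
idx, dist = knn_tuple(points, n_neighbors=2)
print(idx[0], dist[0])  # [1, 2] [1.0, 3.0]
```

Both returned lists have shape (num_points, num_neighbors), matching the tuple form described for paradime.relationdata.NeighborRelationTuple.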

class paradime.relations.PDist(metric=None, transform=None, keep_result=True, data_key='main', verbose=False)[source]

Full pairwise distances between data points.

Parameters:
  • metric (Union[Callable, str, None]) – The distance metric to be used.

  • transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.RelationTransform or list of paradime.transforms.RelationTransform instances to be applied to the relations.

  • keep_result – Specifies whether or not to keep previously calculated distances, rather than computing new ones.

  • data_key (str) – The key to access the data for which to compute relations.

  • verbose (bool) – Verbosity toggle.

relations

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().

compute_relations(X=None, **kwargs)[source]

Calculates the pairwise distances.

Parameters:

X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.

class paradime.relations.Precomputed(X, transform=None)[source]

Precomputed relations between data points.

relations

A paradime.relationdata.RelationData instance containing the (possibly transformed) relations.

compute_relations(X=None, **kwargs)[source]

Obtain the precomputed relations.

Parameters:

X (Union[ndarray, Tensor, None]) – Ignored, since relations are already precomputed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the (possibly transformed) relations.

class paradime.relations.Relations(transform=None, data_key='main')[source]

Base class for calculating relations between data points.

Custom relations should subclass this class.

Relation Data

Relation data containers for ParaDime.

The paradime.relationdata module implements container classes for various formats of relation data. The relation data containers are used by the different paradime.relations.Relations classes (see paradime.relations) and paradime.transforms.RelationTransform classes (see paradime.transforms).

class paradime.relationdata.FlatRelationArray(relations)[source]

Relation data in the form of a flat array of individual relations.

Parameters:

relations (ndarray) – A flat Numpy array of relation values.

data

The raw relation data.

to_flat_array()[source]

Converts the relations to a paradime.relationdata.FlatRelationArray.

Return type:

FlatRelationArray

Returns:

The converted relations.

to_flat_tensor()[source]

Converts the relations to a paradime.relationdata.FlatRelationTensor.

Return type:

FlatRelationTensor

Returns:

The converted relations.

class paradime.relationdata.FlatRelationTensor(relations)[source]

Relation data in the form of a flat tensor of individual relations.

Parameters:

relations (Tensor) – A flat PyTorch tensor of relation values.

data

The raw relation data.

to_flat_array()[source]

Converts the relations to a paradime.relationdata.FlatRelationArray.

Return type:

FlatRelationArray

Returns:

The converted relations.

to_flat_tensor()[source]

Converts the relations to a paradime.relationdata.FlatRelationTensor.

Return type:

FlatRelationTensor

Returns:

The converted relations.

class paradime.relationdata.NeighborRelationTuple(relations, sort=None)[source]

Relation data in neighborhood tuple form.

Parameters:
  • relations (tuple[ndarray, ndarray]) – A tuple (n, r) of relation data, where n is an array of neighbor indices for each data point and r is an array of relation values. Both arrays must be of shape (num_points, num_neighbors).

  • sort (Optional[Literal['ascending', 'descending']]) – Sorting option. If None is passed (default), values are kept as is. Otherwise, values for each item are sorted either in 'ascending' or 'descending' order.
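
The sort option can be illustrated with nested lists in place of arrays. In this sketch, sort_neighbor_tuple is a hypothetical helper, and reordering the neighbor indices together with their values (so that each index stays attached to its relation value) is an assumption about the intended behavior.

```python
def sort_neighbor_tuple(relations, sort=None):
    """Sort each row of (neighbors, values) by value; None keeps the order."""
    neighbors, values = relations
    if sort is None:
        return neighbors, values
    if sort not in ("ascending", "descending"):
        raise ValueError("sort must be None, 'ascending' or 'descending'")
    reverse = sort == "descending"
    sorted_n, sorted_v = [], []
    for row_n, row_v in zip(neighbors, values):
        # Sort (value, index) pairs so indices follow their values.
        pairs = sorted(zip(row_v, row_n), reverse=reverse)
        sorted_v.append([v for v, _ in pairs])
        sorted_n.append([n for _, n in pairs])
    return sorted_n, sorted_v


n = [[2, 1], [0, 2], [1, 0]]
r = [[0.9, 0.1], [0.3, 0.7], [0.5, 0.2]]
sorted_n, sorted_r = sort_neighbor_tuple((n, r), sort="ascending")
print(sorted_n)  # [[1, 2], [0, 2], [0, 1]]
print(sorted_r)  # [[0.1, 0.9], [0.3, 0.7], [0.2, 0.5]]
```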

data

The raw relation data.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

class paradime.relationdata.RelationData[source]

Base class for storing relations between data points.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.
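
For relations stored as a square matrix, batch-wise subsampling amounts to selecting the rows and columns that belong to the batch. A minimal sketch with nested lists in place of tensors (sub_square is a hypothetical helper, not the library method):

```python
def sub_square(matrix, indices):
    """Select the rows and columns given by indices, in that order."""
    return [[matrix[i][j] for j in indices] for i in indices]


relations = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.0, 0.4, 0.5],
    [0.2, 0.4, 0.0, 0.6],
    [0.3, 0.5, 0.6, 0.0],
]
print(sub_square(relations, [3, 1]))  # [[0.0, 0.5], [0.5, 0.0]]
```

The result is always square, with one row and one column per requested index, so it can be compared directly with batch relations computed for the same items.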

to_flat_array()[source]

Converts the relations to a paradime.relationdata.FlatRelationArray.

Return type:

FlatRelationArray

Returns:

The converted relations.

to_flat_tensor()[source]

Converts the relations to a paradime.relationdata.FlatRelationTensor.

Return type:

FlatRelationTensor

Returns:

The converted relations.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

class paradime.relationdata.SparseRelationArray(relations)[source]

Relation data in sparse array form.

Parameters:

relations (spmatrix) – A square, sparse Scipy array of relation values.

data

The raw relation data.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

class paradime.relationdata.SquareRelationArray(relations)[source]

Relation data in the form of a square array.

Parameters:

relations (ndarray) – A square Numpy array of relation values.

data

The raw relation data.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

class paradime.relationdata.SquareRelationTensor(relations)[source]

Relation data in the form of a square tensor.

Parameters:

relations (Tensor) – A square PyTorch tensor of relation values.

data

The raw relation data.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

class paradime.relationdata.TriangularRelationArray(relations)[source]

Relation data in ‘triangular’ vector-form.

Parameters:

relations (ndarray) – A Numpy array of relation values, as accepted by scipy.spatial.distance.squareform().
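
The ‘triangular’ vector form stores only the n(n-1)/2 values above the diagonal of a symmetric matrix, row by row, which is the condensed format handled by scipy.spatial.distance.squareform(). The following plain-Python sketch shows the correspondence; to_square and to_triangular are illustrative helpers, not the library's conversion methods.

```python
def to_square(condensed):
    """Expand a condensed vector of length n*(n-1)/2 to an n x n matrix."""
    n = 1
    while n * (n - 1) // 2 < len(condensed):
        n += 1
    if n * (n - 1) // 2 != len(condensed):
        raise ValueError("length is not n*(n-1)/2 for any n")
    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Fill both halves; the diagonal stays zero.
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


def to_triangular(square):
    """Collect the strict upper triangle row by row."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


vec = [1.0, 2.0, 3.0]
print(to_square(vec))  # [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
```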

data

The raw relation data.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

class paradime.relationdata.TriangularRelationTensor(relations)[source]

Relation data in ‘triangular’ vector-form.

Parameters:

relations (Tensor) – A PyTorch tensor of relation values, with a shape as accepted by scipy.spatial.distance.squareform().

data

The raw relation data.

sub(indices)[source]

Subsamples the relation matrix based on item indices.

Intended to be used for batch-wise subsampling of global relations.

Parameters:

indices (Union[list[int], ndarray[Any, dtype[integer]], Tensor]) – A flat list of item indices.

Return type:

Tensor

Returns:

A square PyTorch tensor consisting of all relations between items with the given indices.

to_neighbor_tuple()[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

Return type:

NeighborRelationTuple

Returns:

The converted relations.

to_sparse_array()[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

Return type:

SparseRelationArray

Returns:

The converted relations.

to_square_array()[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

Return type:

SquareRelationArray

Returns:

The converted relations.

to_square_tensor()[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

Return type:

SquareRelationTensor

Returns:

The converted relations.

to_triangular_array()[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

Return type:

TriangularRelationArray

Returns:

The converted relations.

to_triangular_tensor()[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

Return type:

TriangularRelationTensor

Returns:

The converted relations.

paradime.relationdata.relation_factory(relations, force_flat=False)[source]

Creates a paradime.relationdata.RelationData object from a variety of input formats.

Parameters:
  • relations (Union[ndarray, Tensor, spmatrix, Tuple[ndarray, ndarray]]) – The relations, specified either as a flat array or tensor, a square array or tensor, a vector-form (triangular) array or tensor, a sparse array, or a tuple (n, r), where n is an array of neighbor indices for each data point and r is an array of relation values of the same shape.

  • force_flat (bool) – If set to True, disables the check for triangular arrays and tensors. Useful if flat relation data might have a length equal to a triangular number.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData object with a subclass depending on the input format.
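The force_flat option exists because a one-dimensional input is ambiguous whenever its length is a triangular number n(n-1)/2: it could be a flat list of relations or the vector form of an n x n matrix. The following sketch shows the kind of length check involved. The helper name is hypothetical, and this is not ParaDime's actual implementation.

```python
import math


def square_dim_if_triangular(length):
    """Return n if length == n * (n - 1) / 2 for an integer n >= 2, else None."""
    if length < 1:
        return None
    # Solve n^2 - n - 2 * length = 0 for n with exact integer arithmetic.
    disc = 1 + 8 * length
    root = math.isqrt(disc)
    if root * root != disc:
        return None
    return (1 + root) // 2


# A length-6 vector could be the vector form of a 4x4 matrix (4 * 3 / 2 == 6),
# so a flat relation list of that length is ambiguous without force_flat.
print(square_dim_if_triangular(6))  # prints 4
print(square_dim_if_triangular(7))  # prints None
```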

Transforms

Relation transforms for ParaDime.

The paradime.transforms module defines various classes used to transform relations between data points.

class paradime.transforms.AdaptiveNeighborhoodRescale(kernel, find_param, verbose=False, **kwargs)[source]

Rescales relation values for each data point based on its neighbors.

This is a base class for transformations such as those used by t-SNE or UMAP. For each data point, a parameter is fitted by comparing kernel-transformed relations to a target value. Once the parameter value is found, the kernel function is used to transform the relations.

Parameters:
  • kernel (Callable[[ndarray, float], Union[float, ndarray]]) – The kernel function used to transform the relations. This is a callable taking the relation values for a data point, along with a parameter.

  • find_param (Callable[..., float]) – The function used to find the parameter value. This is a callable taking the relation values and a fixed value to compare the transformed relations against.

  • verbose (bool) – Verbosity toggle.

property param_values: ndarray

The parameter values determined for each data point. Available only after calling the transform.

Return type:

ndarray

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ConnectivityBasedRescale(n_neighbors=15, verbose=False, **kwargs)[source]

Applies a connectivity-based transformation to the relation values.

The relation values are rescaled using shifted Gaussian kernels. The shift is equal to the distance to the closest neighboring data point, and the kernel width is set by comparing the summed kernel values to the binary logarithm of the specified number of neighbors. This is the relation transform used by UMAP.

Parameters:
  • n_neighbors (float) – The number of nearest neighbors used to determine the kernel widths.

  • verbose (bool) – Verbosity toggle.

  • **kwargs – Passed on to scipy.optimize.root_scalar(), which determines the kernel widths. By default, this is set to use a bracket of [10^(-6), 10^6] for the root search.
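A minimal sketch of the connectivity-based rescaling for a single data point, written in plain Python. It replaces scipy.optimize.root_scalar() with a simple bisection over the same default bracket, and the kernel form exp(-(d - rho) / sigma) follows the UMAP paper; both choices are assumptions made for illustration rather than a description of ParaDime's code.

```python
import math


def connectivity_rescale(dists, n_neighbors, lo=1e-6, hi=1e6, iters=200):
    """Rescale the neighbor distances of one data point."""
    rho = min(dists)                 # shift: distance to the closest neighbor
    target = math.log2(n_neighbors)  # value the summed kernel should reach

    def kernel(sigma):
        return [math.exp(-(d - rho) / sigma) for d in dists]

    # The summed kernel grows monotonically with sigma, so bisect the bracket.
    for _ in range(iters):
        mid = (lo + hi) / 2
        if sum(kernel(mid)) < target:
            lo = mid  # kernel too narrow: widen it
        else:
            hi = mid
    sigma = (lo + hi) / 2
    return kernel(sigma), sigma


values, sigma = connectivity_rescale([0.5, 0.8, 1.1, 1.9, 2.4], n_neighbors=5)
print(values, sigma)
```

The closest neighbor always receives the value 1, and the remaining values shrink with distance until their sum matches the binary logarithm of n_neighbors.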

class paradime.transforms.Functional(f, in_place=True, check_valid=False)[source]

Applies a function to the relation data.

By default, this transform applies a given function to the data attribute of the paradime.relationdata.RelationData instance in place and returns the transformed instance. This assumes that the transform does not change the data in a way that is incompatible with the paradime.relationdata.RelationData subclass. The transform can also be applied to the whole paradime.relationdata.RelationData instance by setting in_place to False. In this case, the output is that of the given function.

Parameters:
  • f (Callable[..., Any]) – Function to be applied to the relations.

  • in_place (bool) – Toggles whether the function is applied to the data attribute of the paradime.relationdata.RelationData object (default), or to the paradime.relationdata.RelationData itself.

  • check_valid (bool) – Toggles whether a check for the transformed relation data’s validity is performed. If in_place is set to False, no checks are performed regardless of this parameter.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.Identity(**kwargs)[source]

A placeholder identity transform.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ModifiedCauchyTransform(min_dist=0.1, spread=1.0, a=None, b=None)[source]

Transforms relations based on a modified Cauchy distribution.

This transform applies a modified Cauchy distribution function to the relations. The distribution’s parameters a and b are determined from the parameters min_dist and spread by fitting a smooth approximation of an offset exponential decay.

Parameters:
  • min_dist (float) – Effective minimum distance of points if the transformed relations were to be used for calculating an embedding.

  • spread (float) – Effective scale of the points if the transformed relations were to be used for calculating an embedding.

  • a (Union[float, Tensor, None]) – Parameter to define the distribution directly. It can be optimized together with the DR model in a ParametricDR by setting it to one of the model’s additional parameters.

  • b (Union[float, Tensor, None]) – Parameter to define the distribution directly. It can be optimized together with the DR model in a ParametricDR by setting it to one of the model’s additional parameters.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.
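The relationship between the two parameterizations of the modified Cauchy transform can be sketched as follows. The functional form 1 / (1 + a * d^(2b)) and the offset exponential target curve are taken from UMAP and are assumptions about what this transform computes. The values of a and b are approximately those that UMAP reports for min_dist=0.1 and spread=1.0; they are hard-coded here rather than fitted.

```python
import math


def modified_cauchy(d, a, b):
    """Modified Cauchy similarity 1 / (1 + a * d^(2b)) for a distance d >= 0."""
    return 1.0 / (1.0 + a * d ** (2 * b))


def offset_exponential(d, min_dist, spread):
    """Target curve: 1 up to min_dist, then exponential decay with scale spread."""
    return 1.0 if d <= min_dist else math.exp(-(d - min_dist) / spread)


# Approximate UMAP values for min_dist=0.1, spread=1.0 (assumed, not fitted here).
a, b = 1.577, 0.895

# Compare the smooth approximation with the curve it is meant to approximate.
for d in (0.0, 0.5, 1.0, 2.0):
    print(d, round(modified_cauchy(d, a, b), 3), round(offset_exponential(d, 0.1, 1.0), 3))
```

Passing a and b directly skips the fit, which is what makes it possible to treat them as trainable parameters.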

class paradime.transforms.Normalize(**kwargs)[source]

Normalizes all relations at once.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.NormalizeRows(**kwargs)[source]

Normalizes the relation values for each data point separately.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.
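The difference between Normalize and NormalizeRows can be sketched with nested Python lists. The sketch assumes that normalizing means dividing by a sum, so that the result can be read as a probability distribution; the function names are hypothetical and the actual norm used by ParaDime is not stated here.

```python
def normalize_all(matrix):
    """Divide every entry by the sum over the whole matrix."""
    total = sum(sum(row) for row in matrix)
    return [[v / total for v in row] for row in matrix]


def normalize_rows(matrix):
    """Divide every entry by the sum of its own row."""
    return [[v / sum(row) for v in row] for row in matrix]


m = [[0.0, 1.0, 3.0],
     [1.0, 0.0, 1.0],
     [3.0, 1.0, 0.0]]

joint = normalize_all(m)         # all entries together sum to 1
conditional = normalize_rows(m)  # each row sums to 1
print(joint)
print(conditional)
```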

class paradime.transforms.PerplexityBasedRescale(perplexity=30, verbose=False, **kwargs)[source]

Applies a perplexity-based transformation to the relation values.

The relation values are rescaled using Gaussian kernels. For each data point, the kernel width is determined by comparing the entropy of the relation values to the binary logarithm of the specified perplexity. This is the relation transform used by t-SNE.

Parameters:
  • perplexity (float) – The desired perplexity, which can be understood as a smooth measure of nearest neighbors.

  • verbose (bool) – Verbosity toggle.

  • **kwargs – Passed on to scipy.optimize.root_scalar(), which determines the kernel widths. By default, this is set to use a bracket of [0.01, 1.] for the root search.
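The entropy matching described above can be sketched for one data point in plain Python. The sketch parameterizes the Gaussian kernel by a precision beta, uses its own bracket, and finds the root by bisection instead of scipy.optimize.root_scalar(); these are illustrative assumptions, and ParaDime's parameterization may differ.

```python
import math


def conditional_probs(sq_dists, beta):
    """Gaussian kernel values for one data point, normalized to sum to 1."""
    d_min = min(sq_dists)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-beta * (d - d_min)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def fit_beta(sq_dists, perplexity, lo=1e-4, hi=1e4, iters=200):
    """Bisect for the precision whose entropy equals log2(perplexity)."""
    target = math.log2(perplexity)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint suits the wide bracket
        if entropy_bits(conditional_probs(sq_dists, mid)) > target:
            lo = mid  # distribution too flat: sharpen the kernel
        else:
            hi = mid
    return math.sqrt(lo * hi)


sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
beta = fit_beta(sq_dists, perplexity=3.0)
probs = conditional_probs(sq_dists, beta)
print(beta, probs, 2 ** entropy_bits(probs))
```

After the fit, two raised to the entropy of the rescaled values recovers the requested perplexity, which is why the perplexity can be read as a smooth neighbor count.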

class paradime.transforms.RelationTransform(**kwargs)[source]

Base class for relation transforms.

Custom transforms should subclass this class.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.StudentTTransform(alpha)[source]

Transforms relations based on Student’s t-distribution.

Parameters:

alpha (Union[float, Tensor]) – Degrees of freedom of the distribution. This can either be a float or a PyTorch tensor. Alpha can be optimized together with the DR model in a paradime.dr.ParametricDR by setting it to one of the model’s additional parameters.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.
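For reference, the standard unnormalized Student's t kernel with alpha degrees of freedom can be written as below. That this transform applies exactly this expression to distances is an assumption, and the function name is hypothetical.

```python
def student_t_kernel(dist, alpha):
    """Unnormalized Student-t similarity for a distance and alpha degrees of freedom."""
    return (1.0 + dist ** 2 / alpha) ** (-(alpha + 1.0) / 2.0)


# With alpha = 1 this reduces to the Cauchy kernel 1 / (1 + d^2) used by t-SNE.
for d in (0.0, 1.0, 2.0):
    print(d, student_t_kernel(d, alpha=1.0))
```

Larger values of alpha make the tails lighter, so distant points contribute less to the low-dimensional relations.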

class paradime.transforms.Symmetrize(subtract_product=False)[source]

Symmetrizes the relation values.

Parameters:

subtract_product (bool) – Specifies which symmetrization routine to use. If set to False (default), a matrix M is symmetrized by calculating 1/2 * (M + M^T); if set to True, M is symmetrized by calculating M + M^T - M * M^T, where ‘*’ is the element-wise (Hadamard) product.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.
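The two symmetrization routines selected by subtract_product can be sketched with nested lists as follows. This is a plain-Python illustration of the two formulas given above, not ParaDime's array-based implementation, and the function name is hypothetical.

```python
def symmetrize(m, subtract_product=False):
    """Symmetrize a square matrix given as a list of rows."""
    n = len(m)
    if subtract_product:
        # M + M^T - M * M^T with an element-wise product
        return [[m[i][j] + m[j][i] - m[i][j] * m[j][i] for j in range(n)]
                for i in range(n)]
    # 1/2 * (M + M^T)
    return [[0.5 * (m[i][j] + m[j][i]) for j in range(n)] for i in range(n)]


m = [[0.0, 0.8],
     [0.4, 0.0]]

averaged = symmetrize(m)
union = symmetrize(m, subtract_product=True)
print(averaged)
print(union)
```

If the entries are read as probabilities, the product-subtracting variant corresponds to the probability that at least one of the two directed relations holds, which is the symmetrization used by UMAP.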

class paradime.transforms.ToFlatArray(**kwargs)[source]

Converts the relations to a paradime.relationdata.FlatRelationArray.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToFlatTensor(**kwargs)[source]

Converts the relations to a paradime.relationdata.FlatRelationTensor.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToNeighborTuple(**kwargs)[source]

Converts the relations to a paradime.relationdata.NeighborRelationTuple.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToSparseArray(**kwargs)[source]

Converts the relations to a paradime.relationdata.SparseRelationArray.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToSquareArray(**kwargs)[source]

Converts the relations to a paradime.relationdata.SquareRelationArray.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToSquareTensor(**kwargs)[source]

Converts the relations to a paradime.relationdata.SquareRelationTensor.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToTriangularArray(**kwargs)[source]

Converts the relations to a paradime.relationdata.TriangularRelationArray.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ToTriangularTensor(**kwargs)[source]

Converts the relations to a paradime.relationdata.TriangularRelationTensor.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

class paradime.transforms.ZeroDiagonal(**kwargs)[source]

Sets all self-relations to zero.

transform(reldata)[source]

Applies the transform to input data.

Parameters:

reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.

Return type:

RelationData

Returns:

A paradime.relationdata.RelationData instance containing the transformed relation values.

Loss

Losses for ParaDime routines.

The paradime.loss module implements the specification of losses for ParaDime routines. The supported losses are paradime.loss.RelationLoss, paradime.loss.ClassificationLoss, paradime.loss.ReconstructionLoss, and paradime.loss.CompoundLoss.

class paradime.loss.ClassificationLoss(loss_function=CrossEntropyLoss(), data_key='main', label_key='labels', classification_method='classify', name=None)[source]

A loss that compares predicted class labels against ground truth labels.

This loss compares predicted class labels to ground truth labels in a batch using a specified loss function (cross-entropy by default). Class labels are predicted by applying the model’s classify() method to the specified data entry of the input batch.

Parameters:
  • loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.

  • data_key (str) – The key under which to find the data in the input batch.

  • label_key (str) – The key under which ground truth labels are stored in the input batch.

  • classification_method (str) – The model method to be used for classifying the batch of input data.

  • name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]

Apply the loss to a batch of input data.

Parameters:
Return type:

Tensor

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.CompoundLoss(losses, weights=None, name=None)[source]

A weighted sum of multiple losses.

Parameters:
  • losses (list[Loss]) – A list of paradime.loss.Loss instances to be summed.

  • weights (Union[ndarray, Tensor, list[float], None]) – A list of weights to multiply the losses with. Must be of the same length as the list of losses. If no weights are specified, all losses are weighted equally.

  • name (Optional[str]) – Name of the loss (used by logging functions).
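The weighting described for losses and weights amounts to a weighted sum. The sketch below works on plain floats instead of PyTorch tensors and assumes that "weighted equally" means a weight of 1 for every component; the function name is hypothetical.

```python
def compound_loss(loss_values, weights=None):
    """Weighted sum of already computed loss values."""
    if weights is None:
        weights = [1.0] * len(loss_values)  # assumption: equal weights of 1
    if len(weights) != len(loss_values):
        raise ValueError("weights and losses must have the same length")
    return sum(w * v for w, v in zip(weights, loss_values))


relation_loss, reconstruction_loss = 2.0, 4.0
unweighted = compound_loss([relation_loss, reconstruction_loss])
weighted = compound_loss([relation_loss, reconstruction_loss], weights=[1.0, 0.5])
print(unweighted, weighted)
```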

checkpoint()[source]

Create a checkpoint of the most recent accumulated loss.

Appends the value of the most recent accumulated loss to the loss’s history attribute. If the loss is a paradime.loss.CompoundLoss, checkpoints are also created for each individual loss.

Return type:

None

detailed_history()[source]

Returns a detailed history of the compound loss.

Return type:

Tensor

Returns:

A PyTorch tensor with the history of each loss component multiplied by its weight.

forward(model, global_relations, batch_relations, batch, device)[source]

Apply the loss to a batch of input data.

Parameters:
Return type:

Tensor

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.Loss(name=None)[source]

Base class for losses.

Custom losses should subclass this class.

name

The name of the loss (used by logging functions).

checkpoint()[source]

Create a checkpoint of the most recent accumulated loss.

Appends the value of the most recent accumulated loss to the loss’s history attribute. If the loss is a paradime.loss.CompoundLoss, checkpoints are also created for each individual loss.

Return type:

None

forward(model, global_relations, batch_relations, batch, device)[source]

Apply the loss to a batch of input data.

Parameters:
Return type:

Tensor

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.PositionLoss(loss_function=MSELoss(), data_key='main', position_key='pos', embedding_method='embed', name=None)[source]

A loss that compares embedding coordinates to given positions.

This loss compares embedding coordinates to given ground-truth coordinates in a batch using a specified loss function (mean-square-error by default). Embedding positions are computed by applying the model’s embed() method to the specified data entry of the input batch.

Parameters:
  • loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.

  • data_key (str) – The key under which to find the data in the input batch.

  • position_key (str) – The key under which the ground truth positions are stored in the input batch.

  • embedding_method (str) – The model method to be used for embedding the batch of input data.

  • name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]

Apply the loss to a batch of input data.

Parameters:
Return type:

Tensor

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.ReconstructionLoss(loss_function=MSELoss(), data_key='main', encoding_method='encode', decoding_method='decode', name=None)[source]

A simple reconstruction loss for auto-encoding data.

This loss compares reconstructed data to input data in a batch using a specified loss function (mean-square-error by default). Reconstructed data is computed by applying the model’s encode() and decode() methods in sequence to the specified data entry of the input batch.

Parameters:
  • loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.

  • data_key (str) – The key under which to find the data in the input batch.

  • encoding_method (str) – The model method to be used for encoding the batch of input data.

  • decoding_method (str) – The model method to be used for decoding the encoded batch of input data.

  • name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]

Apply the loss to a batch of input data.

Parameters:
Return type:

Tensor

Returns:

A single-item PyTorch tensor with the computed loss.

class paradime.loss.RelationLoss(loss_function, global_relation_key='rel', batch_relation_key='rel', embedding_method='embed', normalize_sub=True, name=None)[source]

A loss that compares batch-wise relation data against a subset of global relation data.

This loss applies a specified loss function to a subset of pre-computed global relations and the batch-wise relations found under specified keys, respectively. Batch-wise relations are computed from embedded coordinates by applying the model’s embed() method to the specified data entry of the input batch.

Parameters:
  • loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.

  • global_relation_key (str) – Key under which to find the global relations.

  • batch_relation_key (str) – Key under which to find the batch-wise relations.

  • embedding_method (str) – The model method to be used for embedding the batch of input data.

  • name (Optional[str]) – Name of the loss (used by logging functions).

forward(model, global_relations, batch_relations, batch, device)[source]

Apply the loss to a batch of input data.

Parameters:
Return type:

Tensor

Returns:

A single-item PyTorch tensor with the computed loss.

paradime.loss.cross_entropy_loss(p, q, epsilon=1e-07)[source]

Cross-entropy loss as used by UMAP.

To be used as a loss function in the paradime.loss.RelationLoss of a parametric DR routine.

Parameters:
  • p (Tensor) – Input tensor containing (a batch of) probabilities.

  • q (Tensor) – Input tensor containing (a batch of) probabilities.

  • epsilon (float) – Small constant used to avoid numerical errors caused by near-zero probability values.

Return type:

Tensor

Returns:

The cross-entropy loss of the two input tensors, divided by the number of items in the batch.
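A sketch of the binary (fuzzy set) cross-entropy that UMAP uses, written for flat Python lists of edge probabilities. Where exactly epsilon enters and how the tensors are shaped in ParaDime are assumptions; the function name is hypothetical.

```python
import math


def umap_cross_entropy(p, q, epsilon=1e-7):
    """Binary cross-entropy between target p and predicted q, per batch item."""
    total = 0.0
    for pi, qi in zip(p, q):
        attract = -pi * math.log(qi + epsilon)                # pulls true edges together
        repel = -(1.0 - pi) * math.log(1.0 - qi + epsilon)    # pushes non-edges apart
        total += attract + repel
    return total / len(p)


p = [1.0, 0.0, 0.0]        # one true edge, two negative samples
q_good = [0.9, 0.1, 0.1]   # embedding agrees with p
q_bad = [0.1, 0.9, 0.9]    # embedding contradicts p
print(umap_cross_entropy(p, q_good), umap_cross_entropy(p, q_bad))
```

The repulsive term is what makes negative edge sampling meaningful: sampled non-neighbors contribute only through it.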

paradime.loss.kullback_leibler_div(p, q, epsilon=1e-07)[source]

Kullback-Leibler divergence.

To be used as a loss function in the paradime.loss.RelationLoss of a parametric DR routine.

Parameters:
  • p (Tensor) – Input tensor containing (a batch of) probabilities.

  • q (Tensor) – Input tensor containing (a batch of) probabilities.

  • epsilon (float) – Small constant used to avoid numerical errors caused by near-zero probability values.

Return type:

Tensor

Returns:

The Kullback-Leibler divergence of the two input tensors, divided by the number of items in the batch.
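The Kullback-Leibler divergence with batch-size scaling can be sketched as follows for square matrices given as nested lists. The placement of epsilon inside the logarithm is an assumption, and the function name is hypothetical.

```python
import math


def kl_divergence(p, q, epsilon=1e-7):
    """Sum of p * log(p / q) over all entries, divided by the number of rows."""
    total = sum(
        pij * math.log((pij + epsilon) / (qij + epsilon))
        for p_row, q_row in zip(p, q)
        for pij, qij in zip(p_row, q_row)
    )
    return total / len(p)


# Relations for a batch of two items; each matrix sums to 1.
p = [[0.0, 0.5],
     [0.5, 0.0]]
q = [[0.0, 0.4],
     [0.6, 0.0]]
print(kl_divergence(p, q), kl_divergence(p, p))
```

Zero entries of p contribute nothing, and epsilon keeps the logarithm finite where q is zero.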

Routines

Predefined ParaDime routines for existing DR techniques.

The paradime.routines module implements parametric versions of existing dimensionality reduction techniques using the paradime.dr.ParametricDR interface.

class paradime.routines.ParametricTSNE(perplexity=30.0, alpha=1.0, model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], initialization='pca', epochs=30, init_epochs=10, batch_size=500, init_batch_size=None, learning_rate=0.01, init_learning_rate=None, data_key='main', use_cuda=False, verbose=False)[source]

A parametric version of t-SNE.

This class provides a high-level interface for a paradime.dr.ParametricDR routine with the following specifications:

Parameters:
  • perplexity (float) – The desired perplexity, which can be understood as a smooth measure of nearest neighbors used to determine high-dimensional relations between data points.

  • alpha (float) – Degrees of freedom of the Student’s t-distribution used to calculate low-dimensional relations between data points.

  • model (Optional[Model]) – The model used to embed the high dimensional data.

  • in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.

  • out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).

  • hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.

  • initialization (Optional[str]) – How to pretrain the model to mimic initialization of low-dimensional positions. By default ('pca') the model is pretrained to output an approximation of PCA before beginning the main training phase.

  • epochs (int) – The number of epochs in the main training phase.

  • init_epochs (int) – The number of epochs in the pretraining (initialization) phase.

  • batch_size (int) – The number of items in a batch during the main training phase.

  • init_batch_size (Optional[int]) – The number of items in a batch during the pretraining (initialization).

  • learning_rate (float) – The learning rate during the main training phase.

  • init_learning_rate – The learning rate during the pretraining (initialization).

  • data_key (str) – The key under which the data can be found in the dataset.

  • dataset – The dataset on which to perform the training. Datasets can be registered after instantiation using the register_dataset() class method.

  • use_cuda (bool) – Whether or not to use the GPU for training.

  • verbose – Verbosity flag.

class paradime.routines.ParametricUMAP(n_neighbors=30, min_dist=0.01, spread=1.0, a=None, b=None, model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], initialization='spectral', epochs=30, init_epochs=5, batch_size=10, negative_sampling_rate=5, init_batch_size=100, learning_rate=0.005, init_learning_rate=0.05, data_key='main', dataset=None, use_cuda=False, verbose=False)[source]

A parametric version of UMAP.

This class provides a high-level interface for a paradime.dr.ParametricDR routine with the following specifications:

  • The global relations are paradime.relations.NeighborBasedPDist, transformed with a paradime.transforms.ConnectivityBasedRescale followed by paradime.transforms.Symmetrize with product subtraction.

  • The batch relations are paradime.relations.DistsFromTo (since negative edge sampling is used), transformed with a paradime.transforms.ModifiedCauchyTransform.

  • The first (optional) training phase initializes the model to approximate a spectral embedding based on the global relations (see initialization below).

  • The second training phase uses cross-entropy to compare the relations. This phase uses negative edge sampling.

Parameters:
  • n_neighbors (int) – The desired number of neighbors used for computing the high-dimensional pairwise relations.

  • min_dist (float) – Effective minimum distance of points in the embedding.

  • spread (float) – Effective scale of the points in the embedding.

  • a (Optional[float]) – Parameter to define the modified Cauchy distribution used to compute low-dimensional relations.

  • b (Optional[float]) – Parameter to define the modified Cauchy distribution used to compute low-dimensional relations.

  • model (Optional[Model]) – The model used to embed the high dimensional data.

  • in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.

  • out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).

  • hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.

  • initialization (Optional[str]) – How to pretrain the model to mimic initialization of low-dimensional positions. By default ('spectral') the model is pretrained to output an approximation of a spectral embedding based on the high-dimensional relations before beginning the main training phase.

  • epochs (int) – The number of epochs in the main training phase.

  • init_epochs (int) – The number of epochs in the pretraining (initialization) phase.

  • batch_size (int) – The number of items in a batch during the main training phase.

  • init_batch_size (int) – The number of items in a batch during the pretraining (initialization).

  • learning_rate (float) – The learning rate during the main training phase.

  • init_learning_rate – The learning rate during the pretraining (initialization).

  • data_key (str) – The key under which the data can be found in the dataset.

  • dataset (Union[ndarray, Tensor, Mapping[str, Union[ndarray, Tensor]], Dataset, None]) – The dataset on which to perform the training. Datasets can be registered after instantiation using the register_dataset() class method.

  • use_cuda (bool) – Whether or not to use the GPU for training.

  • verbose – Verbosity flag.

Utils

Utility functions for ParaDime.

The paradime.utils subpackage includes various modules that implement utility functions for logging, plotting, and input conversion.

Convert

Conversion utilities for ParaDime.

The paradime.utils.convert module implements various conversion functions for tensor-like objects and index lists.

paradime.utils.convert.rowcol_to_triu_index(i, j, dim)[source]

Converts matrix indices to upper-triangular form.

Converts a pair of row and column indices of a symmetrical square array to the corresponding index of the list of upper triangular values.

Parameters:
  • i (int) – The row index.

  • j (int) – The column index.

  • dim (int) – The size of the square matrix.

Return type:

int

Returns:

The upper triangular index.

Raises:

ValueError – For diagonal indices (i.e., if i equals j).
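The index conversion can be sketched with the row-major layout used by scipy.spatial.distance.squareform(). The function name below is hypothetical, and swapping the indices for entries below the diagonal is an assumption based on the matrix being symmetrical.

```python
def triu_index(i, j, dim):
    """Index of entry (i, j) in the list of upper-triangular values."""
    if i == j:
        raise ValueError("Diagonal entries have no upper-triangular index.")
    if i > j:
        i, j = j, i  # symmetric matrix: (i, j) and (j, i) share one entry
    # Entries in the rows above row i, plus the offset within row i.
    return dim * i - i * (i + 1) // 2 + (j - i - 1)


# Enumerating the pairs of a 4x4 matrix in row-major order yields 0..5.
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
indices = [triu_index(i, j, 4) for i, j in pairs]
print(indices)  # prints [0, 1, 2, 3, 4, 5]
```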

paradime.utils.convert.to_numpy(X)[source]

Converts a tensor-like object to a numpy array.

Parameters:

X (Union[ndarray, Tensor, list[float]]) – The tensor-like object to be converted.

Return type:

ndarray

Returns:

The resulting numpy array.

paradime.utils.convert.to_torch(X)[source]

Converts a tensor-like object to a PyTorch tensor.

Parameters:

X (Union[ndarray, Tensor, list[float]]) – The tensor-like object to be converted.

Return type:

Tensor

Returns:

The resulting PyTorch tensor. If the input was not a PyTorch tensor already, the output tensor will be of type float32 for float inputs and of type int32 for integer inputs.

paradime.utils.convert.triu_to_square_dim(len_triu)[source]

Calculates the size of a square matrix given the length of the list of its upper-triangular values.

Parameters:

len_triu (int) – The length of the list of upper-triangular values.

Return type:

int

Returns:

The size of the square matrix.

Logging

Logging utility for ParaDime.

The paradime.utils.logging module implements logging functionality used by verbose ParaDime routines.

paradime.utils.logging.log(message)[source]

Calls the ParaDime logger to print a timestamp and a message.

Parameters:

message (str) – The message string to print.

Return type:

None

paradime.utils.logging.set_logfile(filename, mode='a', disable_stdout=False, disable_other_files=False)[source]

Configure the ParaDime logger to write its output to a file.

Parameters:
  • filename (str) – The path to the log file.

  • mode (str) – The mode in which to open the file.

  • disable_stdout (bool) – Whether or not to disable logging to stdout.

  • disable_other_files (bool) – Whether or not to remove other file handlers from the ParaDime logger.

Return type:

None
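
To make the effect of these parameters more concrete, here is a minimal sketch built on Python's standard logging module. It is not ParaDime's implementation: the logger name paradime_sketch, the message format, and the functions log_sketch and set_logfile_sketch are all hypothetical stand-ins that mimic the documented behavior.

```python
import logging
import sys

# Hypothetical stand-in for the ParaDime logger; the name is an assumption.
logger = logging.getLogger("paradime_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
_formatter = logging.Formatter("%(asctime)s: %(message)s")

_stdout_handler = logging.StreamHandler(sys.stdout)
_stdout_handler.setFormatter(_formatter)
logger.addHandler(_stdout_handler)


def log_sketch(message: str) -> None:
    """Emit a timestamped message through the sketch logger."""
    logger.info(message)


def set_logfile_sketch(
    filename: str,
    mode: str = "a",
    disable_stdout: bool = False,
    disable_other_files: bool = False,
) -> None:
    """Attach a file handler, optionally dropping the other handlers."""
    for handler in list(logger.handlers):
        # FileHandler subclasses StreamHandler, so test for it explicitly.
        is_file = isinstance(handler, logging.FileHandler)
        if (is_file and disable_other_files) or (not is_file and disable_stdout):
            logger.removeHandler(handler)
            handler.close()
    file_handler = logging.FileHandler(filename, mode=mode)
    file_handler.setFormatter(_formatter)
    logger.addHandler(file_handler)


log_sketch("Printed to stdout only.")
```

In this sketch, calling set_logfile_sketch("run.log", mode="w", disable_stdout=True) would leave the file handler as the only active handler, so subsequent messages go to the file alone.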

Plotting

Plotting utilities for ParaDime.

The paradime.utils.plotting module implements plotting functions and color palette retrieval.

paradime.utils.plotting.get_color_palette()[source]

Get the custom ParaDime color palette.

The palette is usually located in an assets folder in the form of a JSON file. If the JSON file is not found, this method attempts to create it by parsing an SVG file.

Return type:

dict[str, str]

Returns:

The color palette as a dict of names and hex color values.

Raises:

FileNotFoundError – If neither the JSON nor the SVG file can be found.
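
The JSON-first lookup with an SVG fallback might look roughly like the sketch below. The function load_palette_sketch and its two path arguments are hypothetical, and the assumed SVG layout (one element per color, with the name in its id attribute and the hex value in its fill attribute) is a guess made for illustration; the actual asset files may be structured differently.

```python
import json
import xml.etree.ElementTree as ET
from pathlib import Path


def load_palette_sketch(json_path: Path, svg_path: Path) -> dict[str, str]:
    """Load a name -> hex color dict, rebuilding the JSON from an SVG if needed."""
    if json_path.is_file():
        return json.loads(json_path.read_text())
    if not svg_path.is_file():
        raise FileNotFoundError(f"Neither {json_path} nor {svg_path} exists.")
    palette = {}
    # Assumption: each swatch is an element with 'id' (name) and 'fill' (hex).
    for element in ET.parse(svg_path).getroot().iter():
        name, fill = element.get("id"), element.get("fill")
        if name and fill:
            palette[name] = fill
    # Cache the parsed palette so that later calls can read the JSON directly.
    json_path.write_text(json.dumps(palette, indent=2))
    return palette
```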

paradime.utils.plotting.scatterplot(coords, labels=None, colormap=None, labels_to_index=None, figsize=(10, 10), bgcolor='#fcfcfc', legend=True, legend_options=None, ax=None, **kwargs)[source]

Creates a scatter plot of points at the given coordinates.

Parameters:
  • coords (Union[ndarray, Tensor]) – The coordinates of the points.

  • labels (Union[ndarray, Tensor, None]) – A list of categorical labels. If labels are given, a categorical color scale is used and a legend is constructed automatically.

  • colormap (Optional[list[str]]) – A list of colors to use instead of the default categorical color scale based on the ParaDime palette.

  • labels_to_index (Optional[dict]) – A dict that maps labels to indices which are then used to access the colors in the categorical color scale.

  • figsize (tuple[float, float]) – Width and height of the plot in inches.

  • bgcolor (Optional[str]) – The background color of the plot, which by default is also used to draw thin outlines around the points.

  • legend (bool) – Whether or not to include the automatically created legend.

  • legend_options (Optional[dict[str, Any]]) – A dict of keyword arguments that are passed on to the legend method.

  • ax (Optional[matplotlib.axes.Axes]) – An axes of the current figure. This argument is useful if the scatterplot should be added to an existing figure.

  • kwargs – Any other keyword arguments are passed on to matplotlib’s scatter method.

Return type:

None

Returns:

The matplotlib.axes.Axes instance of the plot.
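
The interplay of labels, colormap, and labels_to_index can be pictured with the following pure-Python sketch. The helper colors_for_labels is hypothetical, the colors are arbitrary examples rather than the ParaDime palette, and the default of indexing unique labels in sorted order is an assumption.

```python
def colors_for_labels(labels, colormap, labels_to_index=None):
    """Hypothetical helper: pick one color per point from its label."""
    if labels_to_index is None:
        # Assumption: unique labels are indexed in sorted order by default.
        labels_to_index = {lab: k for k, lab in enumerate(sorted(set(labels)))}
    return [colormap[labels_to_index[lab]] for lab in labels]


colormap = ["#1f77b4", "#ff7f0e", "#2ca02c"]
labels = ["cat", "dog", "cat", "bird"]
print(colors_for_labels(labels, colormap, {"cat": 2, "dog": 0, "bird": 1}))
# ['#2ca02c', '#1f77b4', '#2ca02c', '#ff7f0e']
```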

Seed

Random seeding for ParaDime.

The paradime.utils.seed module implements a function to seed all random number generators potentially involved in a ParaDime routine.

paradime.utils.seed.seed_all(seed)[source]

Sets several seeds to maximize reproducibility.

For information on reproducibility in PyTorch, see https://pytorch.org/docs/stable/notes/randomness.html.

Parameters:

seed (int) – The integer to use as a seed.

Return type:

Generator

Returns:

The torch.Generator instance returned by torch.manual_seed().