DR¶
Main module of ParaDime.
The paradime.dr module implements the main functionality of ParaDime. This includes the paradime.dr.ParametricDR class, as well as paradime.dr.Dataset and paradime.dr.TrainingPhase.
- class paradime.dr.Dataset(data)[source]¶
A dataset for dimensionality reduction.
Constructs a PyTorch torch.utils.data.Dataset from the given data in such a way that each item or batch of items is a dictionary with PyTorch tensors as values. If only a single NumPy array or PyTorch tensor is passed, this data will be available under the 'data' key of the dict. Alternatively, a dict of tensors and/or arrays can be passed, which allows additional data such as labels for supervised learning. By default, an entry for indices is added to the dict if the passed dict does not already include one.
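The dictionary layout described above can be sketched in plain Python, without PyTorch. This is only an illustration of the idea, not ParaDime's implementation: the helper name `as_item_dicts` is hypothetical, and the key name `'indices'` is an assumption based on the description.

```python
def as_item_dicts(data):
    """Turn raw data into per-item dictionaries (illustrative only)."""
    # A bare sequence is stored under the 'data' key, as described above.
    if not isinstance(data, dict):
        data = {"data": data}
    n_items = len(next(iter(data.values())))
    # Add an index entry unless the caller already supplied one.
    # The key name 'indices' is an assumption made for this sketch.
    if "indices" not in data:
        data = {**data, "indices": list(range(n_items))}
    return [{key: values[i] for key, values in data.items()} for i in range(n_items)]


items = as_item_dicts([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]])
labeled = as_item_dicts({"data": [[0.1], [0.2]], "labels": [0, 1]})
```

Every item is a dictionary, so extra entries such as labels travel with the data through batching.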
- class paradime.dr.DerivedData(func, type_key_tuples=[('data', 'main')], **kwargs)[source]¶
A derived dataset entry to be computed later.
Derived dataset entries can be used to set up rules for extending existing datasets later based on functions acting on other dataset entries or global relations.
- Parameters:
- class paradime.dr.NegSampledEdgeDataset(dataset, relations, neg_sampling_rate=5, data_key='main')[source]¶
A dataset that supports negative edge sampling.
Constructs a PyTorch torch.utils.data.Dataset suitable for negative sampling from a regular paradime.dr.Dataset. The passed relation data, along with the negative sampling rate r, is used to inform the negative sampling process. Each “item” i of the resulting dataset is essentially a small batch of items, including the item i of the original dataset, one of its actual neighbors, and r random other items that are considered to not be neighbors of i. Remaining data from the original dataset is collated using PyTorch’s torch.utils.data.default_collate() method.
- Parameters:
  - dataset – The data in the form of a ParaDime paradime.dr.Dataset.
  - relations (RelationData) – A paradime.relationdata.RelationData object with the edge data used for negative edge sampling.
  - neg_sampling_rate (int) – The negative sampling rate.
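The sampling scheme described above (item i, one actual neighbor, and r random other items) can be sketched as follows. This is a minimal stand-alone illustration, not the library's code; the function name `sample_negative_edge_item` and the neighbor dictionary are hypothetical.

```python
import random


def sample_negative_edge_item(i, neighbors, n_items, rate, rng):
    """Return [i, one true neighbor, `rate` random other items]."""
    positive = rng.choice(neighbors[i])
    # Negatives are drawn uniformly from all other items; as in the
    # description above, they are merely *considered* non-neighbors.
    candidates = [j for j in range(n_items) if j != i]
    negatives = [rng.choice(candidates) for _ in range(rate)]
    return [i, positive] + negatives


rng = random.Random(0)
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
batch = sample_negative_edge_item(2, neighbors, n_items=4, rate=5, rng=rng)
```

With a rate of 5, each "item" of the resulting dataset contains 7 indices: the item itself, one positive, and five negatives.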
- class paradime.dr.ParametricDR(model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], derived_data=None, global_relations=None, batch_relations=None, losses=None, training_defaults=TrainingPhase( name=None, epochs=5, batch_size=50, batches_per_epoch=-1, sampling='standard', edge_rel_key='rel', neg_sampling_rate=5, loss_keys=['loss'], loss_weights=[1.0], _loss=None, optimizer=<class 'torch.optim.adam.Adam'>, learning_rate=0.01, report_interval=5, kwargs={}, ), training_phases=None, use_cuda=False, verbose=False)[source]¶
A general parametric dimensionality reduction routine.
- Parameters:
  - model (Optional[Module]) – The PyTorch torch.nn.Module whose parameters are optimized during training.
  - in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.
  - out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).
  - hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.
  - derived_data (Optional[dict[str, DerivedData]]) – A dictionary of paradime.dr.DerivedData instances. These entries are computed before training, either before or after the global relations, depending on the options in the entries.
  - global_relations (Union[Relations, dict[str, Relations], None]) – A single paradime.relations.Relations instance or a dictionary with multiple paradime.relations.Relations instances. Global relations are calculated once for the whole dataset before training.
  - batch_relations (Union[Relations, dict[str, Relations], None]) – A single paradime.relations.Relations instance or a dictionary with multiple paradime.relations.Relations instances. Batch relations are calculated during training for each batch and are compared to an appropriate subset of the global relations by a paradime.loss.RelationLoss.
  - losses (Union[Loss, dict[str, Loss], None]) – A single paradime.loss.Loss instance or a dictionary with multiple paradime.loss.Loss instances. These losses are accessed by the training phases via the respective keys.
  - training_defaults (TrainingPhase) – A paradime.dr.TrainingPhase object with settings that override the default values of all other training phases. This parameter is useful to avoid having to repeatedly set parameters to the same non-default value across training phases. Defaults can also be specified after instantiation using the set_training_defaults() method.
  - training_phases (Optional[list[TrainingPhase]]) – A single paradime.dr.TrainingPhase object or a list of paradime.dr.TrainingPhase objects defining the training phases to be run. Training phases can also be added after instantiation using the add_training_phase() method.
  - use_cuda (bool) – Whether or not to use the GPU for training.
  - verbose (bool) – Verbosity flag. This setting overrides all verbosity settings of relations, transforms and/or losses used within the parametric dimensionality reduction.
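The role of training_defaults can be illustrated with a small stand-alone sketch: settings left unset in a phase fall back to a shared set of defaults. The `PhaseSettings` class and the `resolve` helper are hypothetical stand-ins, not ParaDime classes, and the use of None as a "not set" marker is an assumption of this sketch.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class PhaseSettings:
    """Hypothetical stand-in for a training phase; None means 'not set'."""
    epochs: Optional[int] = None
    batch_size: Optional[int] = None
    learning_rate: Optional[float] = None


def resolve(phase, defaults):
    """Fill every unset field of `phase` from `defaults`."""
    filled = {
        name: getattr(defaults, name)
        for name, value in vars(phase).items()
        if value is None
    }
    return replace(phase, **filled)


defaults = PhaseSettings(epochs=5, batch_size=50, learning_rate=0.01)
# Override one default once, then reuse it across several phases.
defaults = replace(defaults, batch_size=200)
init_phase = resolve(PhaseSettings(epochs=10), defaults)
main_phase = resolve(PhaseSettings(learning_rate=0.001), defaults)
```

Both phases pick up the batch size of 200 without repeating it, which is the convenience the training_defaults parameter is described as providing.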
- device¶
The device on which the model is allocated (depends on the value specified for use_cuda).
- add_data(data)[source]¶
Adds data to a parametric dimensionality reduction routine.
Tensor-like data will be added to the registered dataset. If none is registered yet, a new one will be created and registered. Derived entries will be added to the routine.
- add_training_phase(training_phase=None, name=None, epochs=None, batch_size=None, batches_per_epoch=None, sampling=None, edge_rel_key=None, neg_sampling_rate=None, loss_keys=None, loss_weights=None, optimizer=None, learning_rate=None, report_interval=None, **kwargs)[source]¶
Adds a single training phase to a parametric dimensionality reduction routine.
This method accepts either a paradime.dr.TrainingPhase instance or individual parameters passed with the same keyword syntax used by paradime.dr.TrainingPhase.
- Parameters:
  - training_phase (Optional[TrainingPhase]) – A paradime.dr.TrainingPhase instance to be added. Instead of this, individual parameters can also be passed. For a full list of training phase settings, see paradime.dr.TrainingPhase.
- Raises:
  - paradime.exceptions.UnsupportedConfigurationError – This error is raised if the type of paradime.relations.Relations is not compatible with the sampling option.
- Return type:
- apply(X, method=None)[source]¶
Applies the model to input data.
Applies the model to an input tensor after first switching off PyTorch’s automatic gradient tracking. This method also ensures that the resulting output tensor is on the CPU. The method parameter allows calling any of the model’s methods in this way, but by default, the model’s __call__ method will be used (which wraps around forward).
- compute_derived_data(only=None)[source]¶
Computes the derived data entries in the registered dataset.
After calling this function, the derived entries will be stored as regular entries in the routine’s dataset.
- compute_global_relations(force=False)[source]¶
Computes the global relations.
The computed relation data are stored in the instance’s global_relation_data attribute.
- classmethod from_spec(file_or_spec, model=None)[source]¶
Creates a paradime.dr.ParametricDR routine from a ParaDime specification.
- Parameters:
  - file_or_spec (Union[str, dict]) – The specification, either as a dictionary or as a path to a YAML/JSON file.
- Return type: TypeVar(_ParametricDR, bound=ParametricDR)
- Returns: The paradime.dr.ParametricDR routine.
- Raises:
  - paradime.exceptions.SpecificationError – If the validation of the specification has failed.
- run_training_phase(training_phase)[source]¶
Runs a single training phase.
- Parameters:
  - training_phase (TrainingPhase) – A paradime.dr.TrainingPhase instance.
- Return type:
- set_training_defaults(training_phase=None, epochs=None, batch_size=None, batches_per_epoch=None, sampling=None, edge_rel_key=None, neg_sampling_rate=None, loss_keys=None, loss_weights=None, optimizer=None, learning_rate=None, report_interval=5, **kwargs)[source]¶
Sets a parametric dimensionality reduction routine’s default training parameters.
This method accepts either a paradime.dr.TrainingPhase instance or individual parameters passed with the same keyword syntax used by paradime.dr.TrainingPhase. The specified default parameters will be used instead of the regular defaults when adding training phases.
- Parameters:
  - training_phase (Optional[TrainingPhase]) – A paradime.dr.TrainingPhase instance with the new default settings. Instead of this, individual parameters can also be passed. For a full list of training phase settings, see paradime.dr.TrainingPhase.
- Return type:
- class paradime.dr.TrainingPhase(name=None, epochs=5, batch_size=50, batches_per_epoch=-1, sampling='standard', edge_rel_key='rel', neg_sampling_rate=5, loss_keys=['loss'], loss_weights=None, optimizer=<class 'torch.optim.adam.Adam'>, learning_rate=0.01, report_interval=5, **kwargs)[source]¶
A collection of parameter settings for a single phase in the training of a paradime.dr.ParametricDR instance.
- Parameters:
  - epochs (int) – The number of epochs to run in this phase. In standard item-based sampling, the model sees every item once per epoch. In the case of negative edge sampling, this is not guaranteed, and an epoch instead comprises batches_per_epoch batches (see parameter description below).
  - batch_size (int) – The number of items/edges in a batch. In standard item-based sampling, a batch has this many items, and the edges used for batch relations are constructed from the items. In the case of negative edge sampling, this is the number of sampled positive edges. The total number of edges is higher by a factor of r + 1, where r is the negative sampling rate. The same holds for the number of items (apart from possible duplicates, which can result from the edge sampling and are removed).
  - batches_per_epoch (int) – The number of batches per epoch. This parameter only has an effect for negative edge sampling, where the number of batches per epoch is not determined by the dataset size and the batch size. If this parameter is set to -1 (default), an epoch will comprise a number of batches that leads to a total number of sampled items roughly equal to the number of items in the dataset. If this parameter is set to an integer, an epoch will instead comprise that many batches.
  - sampling (Literal['standard', 'negative_edge']) – The sampling strategy, which can be either 'standard' (simple item-based sampling; default) or 'negative_edge' (negative edge sampling).
  - edge_rel_key (str) – The key under which to find the global relations that should be used for negative edge sampling.
  - neg_sampling_rate (int) – The number of negative (i.e., non-neighbor) edges to sample for each real neighborhood edge.
  - loss_keys (list[str]) – The keys under which to find the losses that should be minimized in this training phase.
  - loss_weights (Optional[list[float]]) – The weights for the losses. If none are specified, losses will be weighted equally.
  - optimizer (type) – The optimizer to use for loss minimization.
  - learning_rate (float) – The learning rate used in the optimization.
  - report_interval (int) – How often the loss should be reported during training, given in terms of epochs. E.g., with a setting of 5, the loss will be reported every 5 epochs.
  - kwargs – Additional kwargs that are passed on to the optimizer.
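The batch arithmetic in the batch_size description can be checked with a short calculation. The helper names below are hypothetical, and the standard-sampling count assumes the last, possibly smaller batch is kept; how the library rounds is not stated here.

```python
def edges_per_batch(batch_size, neg_sampling_rate):
    """Total edges in one negative-edge batch: positives times (r + 1)."""
    return batch_size * (neg_sampling_rate + 1)


def standard_batches_per_epoch(n_items, batch_size):
    """Batches needed so that standard sampling sees every item once.

    Assumes the final partial batch is kept (ceiling division).
    """
    return -(-n_items // batch_size)


total_edges = edges_per_batch(batch_size=50, neg_sampling_rate=5)
n_batches = standard_batches_per_epoch(n_items=1000, batch_size=50)
```

With the default settings (batch size 50, negative sampling rate 5), a negative-edge batch therefore contains 300 edges, of which 50 are positive.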
- loss¶
The loss constructed from the keys and weights specified above.
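One way to picture a loss built from keys and weights is a weighted sum. The sketch below is an assumption about the general shape of such a combination, not the library's code; `combine_losses` is a hypothetical helper, and treating "equal weighting" as a weight of 1.0 per loss is also an assumption.

```python
def combine_losses(values, loss_keys, loss_weights=None):
    """Weighted sum of the named losses; equal weights if none are given."""
    if loss_weights is None:
        # Assumption: equal weighting means a weight of 1.0 for each loss.
        loss_weights = [1.0] * len(loss_keys)
    if len(loss_weights) != len(loss_keys):
        raise ValueError("need exactly one weight per loss key")
    return sum(w * values[k] for k, w in zip(loss_keys, loss_weights))


values = {"embedding": 0.8, "classification": 0.25}
equal = combine_losses(values, ["embedding", "classification"])
weighted = combine_losses(values, ["embedding", "classification"], [1.0, 4.0])
```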
- paradime.dr.register_data_func(name, data_func)[source]¶
Registers a new data function to be used in ParaDime specifications.
- paradime.dr.register_loss(name, loss)[source]¶
Registers a new loss type to be used in ParaDime specifications.
- paradime.dr.register_loss_func(name, loss_func)[source]¶
Registers a new loss function to be used in ParaDime specifications.
- paradime.dr.register_relations(name, rel)[source]¶
Registers a new type of relations to be used in ParaDime specifications.
- Parameters:
  - name (str) – The name of the relation type.
  - rel (type[Relations]) – The paradime.relations.Relations to be registered.
- Return type:
- paradime.dr.register_transform(name, tf)[source]¶
Registers a new relation transform to be used in ParaDime specifications.
- Parameters:
  - name (str) – The name of the transform.
  - tf (type[RelationTransform]) – The paradime.transforms.RelationTransform to be registered.
- Return type:
Relations¶
Relation computation for ParaDime.
The paradime.relations module defines various classes used to compute relations between data points.
- class paradime.relations.DifferentiablePDist(p=2, metric=None, transform=None, data_key='main')[source]¶
Differentiable pairwise distances between data points.
- Parameters:
  - p (float) – Parameter that specifies which p-norm to use as a distance function. Ignored if metric is set.
  - metric (Optional[Callable[[Tensor, Tensor], Tensor]]) – The distance metric to be used.
  - transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
  - data_key (str) – The key to access the data for which to compute relations.
  - verbose – Verbosity toggle.
- relations¶
A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().
- compute_relations(X=None, **kwargs)[source]¶
Calculates the pairwise distances.
If metric is not None, a flexible but memory-inefficient implementation is used instead of PyTorch’s torch.nn.functional.pdist().
- Parameters:
  - X (Union[ndarray, Tensor, None]) – Input data tensor with one sample per row.
- Return type:
- Returns: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
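To make the role of p concrete, the following plain-Python sketch computes p-norm distances for all pairs i < j. It is not differentiable and is not the library's implementation; it only shows which quantity is computed for each pair. The function name `pairwise_p_dist` is hypothetical.

```python
def pairwise_p_dist(points, p=2.0):
    """List of p-norm distances for all pairs i < j, in row-major order."""
    dists = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            total = sum(abs(a - b) ** p for a, b in zip(points[i], points[j]))
            dists.append(total ** (1.0 / p))
    return dists


points = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
euclidean = pairwise_p_dist(points, p=2.0)
manhattan = pairwise_p_dist(points, p=1.0)
```

For n points the result has n(n - 1)/2 entries, one per unordered pair.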
- class paradime.relations.DistsFromTo(metric=None, transform=None, data_key='main')[source]¶
Distances between individual pairs of data points.
- Parameters:
  - metric (Optional[Callable[[Tensor, Tensor], Tensor]]) – The distance metric to be used.
  - transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
  - data_key (str) – The key to access the data for which to compute relations.
- relations¶
A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().
- compute_relations(X=None, **kwargs)[source]¶
Calculates the distances.
- Parameters:
  - X (Union[ndarray, Tensor, None]) – Input data tensor of shape (2, n, dim), where n is the number of pairs of data points.
- Return type:
- Returns: A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances.
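The (2, n, dim) input layout can be illustrated with nested lists: the first slice holds the n source points and the second the n target points, and one distance is computed per pair. The sketch assumes a Euclidean metric purely for illustration, since the default metric is not stated above, and `dists_from_to` is a hypothetical name.

```python
import math


def dists_from_to(pairs):
    """Euclidean distance for each (source, target) pair.

    `pairs` mimics the (2, n, dim) layout: pairs[0] holds the n source
    points and pairs[1] the n target points.
    """
    sources, targets = pairs
    return [math.dist(s, t) for s, t in zip(sources, targets)]


pairs = [
    [[0.0, 0.0], [1.0, 1.0]],  # sources
    [[3.0, 4.0], [1.0, 2.0]],  # targets
]
distances = dists_from_to(pairs)
```

Unlike full pairwise distances, the output has exactly n entries, one per pair.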
- class paradime.relations.NeighborBasedPDist(n_neighbors=None, metric=None, transform=None, data_key='main', verbose=False)[source]¶
Approximate, nearest-neighbor-based pairwise distances between data points.
- Parameters:
  - n_neighbors (Optional[int]) – Number of nearest neighbors to be considered. If not specified, this will be set to 5 percent of the number of data points. If the transforms include any paradime.transforms.AdaptiveNeighborhoodRescale instances, this parameter will be overridden according to their parameters.
  - metric (Union[Callable[[Tensor, Tensor], Tensor], str, None]) – The distance metric to be used.
  - transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
  - data_key (str) – The key to access the data for which to compute relations.
  - verbose (bool) – Verbosity toggle.
- relations¶
A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().
- class paradime.relations.PDist(metric=None, transform=None, keep_result=True, data_key='main', verbose=False)[source]¶
Full pairwise distances between data points.
- Parameters:
  - metric (Union[Callable, str, None]) – The distance metric to be used.
  - transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
  - keep_result – Specifies whether or not to keep previously calculated distances, rather than computing new ones.
  - data_key (str) – The key to access the data for which to compute relations.
  - verbose (bool) – Verbosity toggle.
- relations¶
A paradime.relationdata.RelationData instance containing the (possibly transformed) pairwise distances. Available only after calling compute_relations().
- class paradime.relations.Precomputed(X, transform=None)[source]¶
Precomputed relations between data points.
- Parameters:
  - X (Union[ndarray, Tensor]) – The precomputed relations, in a form accepted by paradime.relationdata.relation_factory().
  - transform (Union[RelationTransform, list[RelationTransform], None]) – A single paradime.transforms.Transform or list of paradime.transforms.Transform instances to be applied to the relations.
- relations¶
A paradime.relationdata.RelationData instance containing the (possibly transformed) relations.
Relation Data¶
Relation data containers for ParaDime.
The paradime.relationdata module implements container classes for various formats of relation data. The relation data containers are used by the different paradime.relations.Relations (see paradime.relations) and paradime.transforms.RelationTransform (see paradime.transforms) classes.
- class paradime.relationdata.FlatRelationArray(relations)[source]¶
Relation data in the form of a flat array of individual relations.
- Parameters:
  - relations (ndarray) – A flat NumPy array of relation values.
- data¶
The raw relation data.
- to_flat_array()[source]¶
Converts the relations to a paradime.relationdata.FlatRelationArray.
- Return type:
- Returns: The converted relations.
- to_flat_tensor()[source]¶
Converts the relations to a paradime.relationdata.FlatRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.FlatRelationTensor(relations)[source]¶
Relation data in the form of a flat tensor of individual relations.
- Parameters:
  - relations (Tensor) – A flat PyTorch tensor of relation values.
- data¶
The raw relation data.
- to_flat_array()[source]¶
Converts the relations to a paradime.relationdata.FlatRelationArray.
- Return type:
- Returns: The converted relations.
- to_flat_tensor()[source]¶
Converts the relations to a paradime.relationdata.FlatRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.NeighborRelationTuple(relations, sort=None)[source]¶
Relation data in neighborhood tuple form.
- Parameters:
  - relations (tuple[ndarray, ndarray]) – A tuple (n, r) of relation data, where n is an array of neighbor indices for each data point and r is an array of relation values. Both arrays must be of shape (num_points, num_neighbors).
  - sort (Optional[Literal['ascending', 'descending']]) – Sorting option. If None is passed (default), values are kept as is. Otherwise, values for each item are sorted either in 'ascending' or 'descending' order.
- data¶
The raw relation data.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.RelationData[source]¶
Base class for storing relations between data points.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
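For a square relation matrix, batch-wise subsampling amounts to keeping only the rows and columns of the batch items. The following sketch shows the idea with nested lists; it is an illustration under that assumption, not the library's implementation, and `sub_square` is a hypothetical helper.

```python
def sub_square(matrix, indices):
    """Keep only the rows and columns of `matrix` listed in `indices`."""
    return [[matrix[i][j] for j in indices] for i in indices]


global_rel = [
    [0.0, 0.1, 0.2, 0.3],
    [0.1, 0.0, 0.4, 0.5],
    [0.2, 0.4, 0.0, 0.6],
    [0.3, 0.5, 0.6, 0.0],
]
# Relations among the batch items 0, 2 and 3 only.
batch_rel = sub_square(global_rel, [0, 2, 3])
```

The resulting matrix has the same size as the batch, so it can be compared directly with relations computed for that batch.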
- to_flat_array()[source]¶
Converts the relations to a paradime.relationdata.FlatRelationArray.
- Return type:
- Returns: The converted relations.
- to_flat_tensor()[source]¶
Converts the relations to a paradime.relationdata.FlatRelationTensor.
- Return type:
- Returns: The converted relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.SparseRelationArray(relations)[source]¶
Relation data in sparse array form.
- Parameters:
  - relations (spmatrix) – A square, sparse SciPy array of relation values.
- data¶
The raw relation data.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.SquareRelationArray(relations)[source]¶
Relation data in the form of a square array.
- Parameters:
  - relations (ndarray) – A square NumPy array of relation values.
- data¶
The raw relation data.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.SquareRelationTensor(relations)[source]¶
Relation data in the form of a square tensor.
- Parameters:
  - relations (Tensor) – A square PyTorch tensor of relation values.
- data¶
The raw relation data.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.TriangularRelationArray(relations)[source]¶
Relation data in ‘triangular’ vector-form.
- Parameters:
  - relations (ndarray) – A NumPy array of relation values, as accepted by scipy.spatial.distance.squareform().
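The "triangular" vector form stores only the n(n - 1)/2 entries above the diagonal of a symmetric matrix with zero diagonal, in row-major order. The plain-Python sketch below mirrors what scipy.spatial.distance.squareform() does for that case; the function names `to_square` and `to_condensed` are hypothetical.

```python
def to_square(condensed, n):
    """Expand a condensed (upper-triangular) vector to an n x n matrix."""
    if len(condensed) != n * (n - 1) // 2:
        raise ValueError("length must equal n * (n - 1) / 2")
    square = [[0.0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Mirror each value across the diagonal.
            square[i][j] = square[j][i] = condensed[k]
            k += 1
    return square


def to_condensed(square):
    """Collect the entries above the diagonal in row-major order."""
    n = len(square)
    return [square[i][j] for i in range(n) for j in range(i + 1, n)]


square = to_square([1.0, 2.0, 3.0], n=3)
roundtrip = to_condensed(square)
```

The vector form halves the memory needed for symmetric relations, at the cost of less direct indexing.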
- data¶
The raw relation data.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- class paradime.relationdata.TriangularRelationTensor(relations)[source]¶
Relation data in ‘triangular’ vector-form.
- Parameters:
  - relations (Tensor) – A PyTorch tensor of relation values, with a shape as accepted by scipy.spatial.distance.squareform().
- data¶
The raw relation data.
- sub(indices)[source]¶
Subsamples the relation matrix based on item indices.
Intended to be used for batch-wise subsampling of global relations.
- to_neighbor_tuple()[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- Return type:
- Returns: The converted relations.
- to_sparse_array()[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_array()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- Return type:
- Returns: The converted relations.
- to_square_tensor()[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- Return type:
- Returns: The converted relations.
- to_triangular_array()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- Return type:
- Returns: The converted relations.
- to_triangular_tensor()[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- Return type:
- Returns: The converted relations.
- paradime.relationdata.relation_factory(relations, force_flat=False)[source]¶
Creates a paradime.relationdata.RelationData object from a variety of input formats.
- Parameters:
  - relations (Union[ndarray, Tensor, spmatrix, Tuple[ndarray, ndarray]]) – The relations, specified either as a flat array or tensor, a square array or tensor, a vector-form (triangular) array or tensor, a sparse array, or a tuple (n, r), where n is an array of neighbor indices for each data point and r is an array of relation values of the same shape.
  - force_flat (bool) – If set to True, disables the check for triangular arrays and tensors. Useful if flat relation data might have a length equal to a triangular number.
- Return type:
- Returns: A paradime.relationdata.RelationData object with a subclass depending on the input format.
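The force_flat option exists because a one-dimensional input whose length is a triangular number is ambiguous: it could be a flat list of relations or the vector form of a square matrix. The sketch below shows how such a length check could work; it is an assumption about the kind of check involved, not the library's code, and `is_triangular` is a hypothetical name.

```python
import math


def is_triangular(length):
    """True if `length` equals n * (n - 1) / 2 for some integer n >= 2."""
    # Solve n * (n - 1) / 2 = length for n and verify the integer solution.
    n = (1 + math.isqrt(1 + 8 * length)) // 2
    return n >= 2 and n * (n - 1) // 2 == length


# Lengths up to 15 that could be mistaken for vector-form relations.
ambiguous = [length for length in range(1, 16) if is_triangular(length)]
```

A flat array of, say, 10 relation values would match the vector form for 5 points, which is the situation in which force_flat is described as useful.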
Transforms¶
Relation transforms for ParaDime.
The paradime.transforms module defines various classes used to transform relations between data points.
- class paradime.transforms.AdaptiveNeighborhoodRescale(kernel, find_param, verbose=False, **kwargs)[source]¶
Rescales relation values for each data point based on its neighbors.
This is a base class for transformations such as those used by t-SNE or UMAP. For each data point, a parameter is fitted by comparing kernel-transformed relations to a target value. Once the parameter value is found, the kernel function is used to transform the relations.
- Parameters:
  - kernel (Callable[[ndarray, float], Union[float, ndarray]]) – The kernel function used to transform the relations. This is a callable taking the relation values for a data point, along with a parameter.
  - find_param (Callable[..., float]) – The function used to find the parameter value. This is a callable taking the relation values and a fixed value to compare the transformed relations against.
  - verbose (bool) – Verbosity toggle.
- property param_values: ndarray¶
The parameter values determined for each data point. Available only after calling the transform.
- Return type:
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
  - reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns: A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ConnectivityBasedRescale(n_neighbors=15, verbose=False, **kwargs)[source]¶
Applies a connectivity-based transformation to the relation values.
The relation values are rescaled using shifted Gaussian kernels. The shift is equal to the distance to the closest neighboring data point, and the kernel width is set by comparing the summed kernel values to the binary logarithm of the specified number of neighbors. This is the relation transform used by UMAP.
- Parameters:
  - n_neighbors (float) – The number of nearest neighbors used to determine the kernel widths.
  - verbose (bool) – Verbosity toggle.
  - **kwargs – Passed on to scipy.optimize.root_scalar(), which determines the kernel widths. By default, this is set to use a bracket of [10^(-6), 10^6] for the root search.
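The width search described above can be sketched for a single data point. This is a simplified illustration, not the library's code: it uses plain bisection in place of scipy.optimize.root_scalar(), and the kernel form exp(-(d - rho) / sigma) is taken from the UMAP method as an assumption about the shifted kernel. The name `fit_kernel_width` is hypothetical.

```python
import math


def fit_kernel_width(dists, n_neighbors, lo=1e-6, hi=1e6, iters=200):
    """Bisect for sigma so that the summed kernel values hit log2(k)."""
    rho = min(dists)  # shift: distance to the closest neighbor
    target = math.log2(n_neighbors)

    def kernel_sum(sigma):
        # Assumed kernel form, following UMAP.
        return sum(math.exp(-(d - rho) / sigma) for d in dists)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # kernel_sum grows with sigma, so move the bracket accordingly.
        if kernel_sum(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), rho


dists = [1.0, 2.0, 3.0, 4.0, 5.0]
sigma, rho = fit_kernel_width(dists, n_neighbors=5)
achieved = sum(math.exp(-(d - rho) / sigma) for d in dists)
```

The default bracket in the sketch matches the [10^(-6), 10^6] range mentioned for the root search.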
- class paradime.transforms.Functional(f, in_place=True, check_valid=False)[source]¶
Applies a function to the relation data.
By default, this transform applies a given function to the data attribute of the paradime.relationdata.RelationData instance in place and returns the transformed instance. This assumes that the transform does not change the data in a way that is incompatible with the paradime.relationdata.RelationData subclass. The transform can also be applied to the whole paradime.relationdata.RelationData instance by setting in_place to False. In this case, the output is that of the given function.
- Parameters:
  - f (Callable[..., Any]) – Function to be applied to the relations.
  - in_place (bool) – Toggles whether the function is applied to the data attribute of the paradime.relationdata.RelationData object (default), or to the paradime.relationdata.RelationData itself.
  - check_valid (bool) – Toggles whether a check for the transformed relation data’s validity is performed. If in_place is set to False, no checks are performed regardless of this parameter.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.Identity(**kwargs)[source]¶
A placeholder identity transform.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ModifiedCauchyTransform(min_dist=0.1, spread=1.0, a=None, b=None)[source]¶
Transforms relations based on a modified Cauchy distribution.
This transform applies a modified Cauchy distribution function to the relations. The distribution’s parameters a and b are determined from the parameters min_dist and spread by fitting a smooth approximation of an offset exponential decay.
- Parameters:
min_dist (float) – Effective minimum distance of points if the transformed relations were to be used for calculating an embedding.
spread (float) – Effective scale of the points if the transformed relations were to be used for calculating an embedding.
a (Union[float, Tensor, None]) – Parameter to define the distribution directly. It can be optimized together with the DR model in a ParametricDR by setting it to one of the model’s additional parameters.
b (Union[float, Tensor, None]) – Parameter to define the distribution directly. It can be optimized together with the DR model in a ParametricDR by setting it to one of the model’s additional parameters.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
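To make the relationship between min_dist, spread, a, and b more concrete, the sketch below fits a and b to an offset exponential decay. It rests on two assumptions that are not confirmed by this reference: the kernel has the form 1 / (1 + a * d^(2b)) known from UMAP, and the target curve is 1 up to min_dist and exp(-(d - min_dist) / spread) beyond it. A coarse grid search stands in for a proper least-squares fit, and all function names are hypothetical.

```python
import math

def modified_cauchy(d, a, b):
    """Assumed form of the modified Cauchy kernel: 1 / (1 + a * d^(2b))."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))

def offset_exp_decay(d, min_dist, spread):
    """Assumed target curve: flat at 1 up to min_dist, exponential decay after."""
    return 1.0 if d <= min_dist else math.exp(-(d - min_dist) / spread)

def fit_a_b(min_dist=0.1, spread=1.0):
    """Coarse grid search over a and b, minimizing the squared error."""
    ds = [3.0 * spread * k / 299 for k in range(300)]
    target = [offset_exp_decay(d, min_dist, spread) for d in ds]
    best = None
    for ia in range(10, 61):          # a from 0.50 to 3.00 in steps of 0.05
        for ib in range(10, 31):      # b from 0.50 to 1.50 in steps of 0.05
            a, b = ia * 0.05, ib * 0.05
            err = sum((modified_cauchy(d, a, b) - t) ** 2
                      for d, t in zip(ds, target))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]

a, b = fit_a_b()
```

Passing a and b directly, as the parameter list allows, skips this fitting step entirely.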
- class paradime.transforms.Normalize(**kwargs)[source]¶
Normalizes all relations at once.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.NormalizeRows(**kwargs)[source]¶
Normalizes the relation values for each data point separately.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
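The difference between Normalize and NormalizeRows can be shown on a small square relation matrix. The sketch assumes that normalizing means dividing by the sum of the values, which this reference does not state explicitly, and it works on a plain NumPy array rather than a RelationData instance.

```python
import numpy as np

rel = np.array([[0.0, 2.0, 2.0],
                [1.0, 0.0, 3.0],
                [4.0, 4.0, 0.0]])

# Global normalization: all entries together sum to one.
normalized = rel / rel.sum()

# Row-wise normalization: the relations of each data point sum to one.
row_normalized = rel / rel.sum(axis=1, keepdims=True)
```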
- class paradime.transforms.PerplexityBasedRescale(perplexity=30, verbose=False, **kwargs)[source]¶
Applies a perplexity-based transformation to the relation values.
The relation values are rescaled using Gaussian kernels. For each data point, the kernel width is determined by comparing the entropy of the relation values to the binary logarithm of the specified perplexity. This is the relation transform used by t-SNE.
- Parameters:
perplexity (float) – The desired perplexity, which can be understood as a smooth measure of nearest neighbors.
verbose (bool) – Verbosity toggle.
**kwargs – Passed on to scipy.optimize.root_scalar(), which determines the kernel widths. By default, this is set to use a bracket of [0.01, 1.] for the root search.
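The entropy matching described above can be sketched for one data point. This is an illustration under assumptions, not ParaDime's code: the kernel is taken to be exp(-d^2 / (2 * sigma^2)) normalized over the point's neighbors, the search runs over sigma with a hand-written bisection instead of scipy.optimize.root_scalar(), and the bracket differs from the documented default. All names are hypothetical.

```python
import math

def conditional_probs(dists, sigma):
    """Gaussian kernel values for one point, normalized to sum to one."""
    # Subtracting the smallest squared distance avoids an all-zero
    # numerator when sigma is very small.
    d0 = min(d * d for d in dists)
    w = [math.exp(-(d * d - d0) / (2.0 * sigma ** 2)) for d in dists]
    total = sum(w)
    return [x / total for x in w]

def entropy_bits(probs):
    """Shannon entropy in bits; zero probabilities contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def find_sigma(dists, perplexity, lo=1e-3, hi=1e3, iters=100):
    """Bisect for sigma so that the entropy equals log2(perplexity)."""
    target = math.log2(perplexity)
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if entropy_bits(conditional_probs(dists, mid)) > target:
            hi = mid                  # distribution too flat: shrink sigma
        else:
            lo = mid                  # distribution too peaked: grow sigma
    return math.sqrt(lo * hi)

dists = [0.5, 0.8, 1.1, 1.9, 2.4, 3.0]
sigma = find_sigma(dists, perplexity=3.0)
probs = conditional_probs(dists, sigma)
```

With the fitted sigma, 2 raised to the entropy of probs recovers the requested perplexity, which is why perplexity reads as a smooth neighbor count.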
- class paradime.transforms.RelationTransform(**kwargs)[source]¶
Base class for relation transforms.
Custom transforms should subclass this class.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.StudentTTransform(alpha)[source]¶
Transforms relations based on Student’s t-distribution.
- Parameters:
alpha (Union[float, Tensor]) – Degrees of freedom of the distribution. This can either be a float or a PyTorch tensor. Alpha can be optimized together with the DR model in a paradime.dr.ParametricDR by setting it to one of the model’s additional parameters.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
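For intuition about the role of alpha, the following sketch evaluates an unnormalized Student's t kernel on a few distances. The formula (1 + d^2 / alpha)^(-(alpha + 1) / 2) is the one commonly used in t-SNE and is an assumption here; this reference does not spell out the exact expression, and the function name is hypothetical.

```python
def student_t_kernel(d, alpha=1.0):
    """Unnormalized Student's t kernel (assumed form) for a distance d."""
    return (1.0 + d * d / alpha) ** (-(alpha + 1.0) / 2.0)

# With alpha = 1 the kernel reduces to the Cauchy form 1 / (1 + d^2).
cauchy_values = [student_t_kernel(d) for d in (0.0, 1.0, 2.0)]

# Larger alpha gives lighter tails, so distant points count less.
heavy_tail = student_t_kernel(3.0, alpha=1.0)
light_tail = student_t_kernel(3.0, alpha=10.0)
```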
- class paradime.transforms.Symmetrize(subtract_product=False)[source]¶
Symmetrizes the relation values.
- Parameters:
subtract_product (bool) – Specifies which symmetrization routine to use. If set to False (default), a matrix M is symmetrized by calculating 1/2 * (M + M^T); if set to True, M is symmetrized by calculating M + M^T - M * M^T, where ‘*’ is the element-wise (Hadamard) product.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
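The two symmetrization routines named under subtract_product can be written out directly. The sketch below applies both formulas to a plain NumPy array; the function symmetrize is a hypothetical stand-in and does not operate on RelationData instances.

```python
import numpy as np

def symmetrize(m, subtract_product=False):
    """Symmetrize a square matrix with one of the two documented formulas."""
    if subtract_product:
        return m + m.T - m * m.T      # '*' is the element-wise product
    return 0.5 * (m + m.T)

m = np.array([[0.0, 0.8, 0.0],
              [0.2, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

averaged = symmetrize(m)
with_product = symmetrize(m, subtract_product=True)
```

For the asymmetric pair 0.8 and 0.2, averaging yields 0.5, while product subtraction yields 0.8 + 0.2 - 0.16 = 0.84, so the second routine keeps strong one-directional relations closer to their original strength.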
- class paradime.transforms.ToFlatArray(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.FlatRelationArray.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToFlatTensor(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.FlatRelationTensor.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToNeighborTuple(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.NeighborRelationTuple.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToSparseArray(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.SparseRelationArray.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToSquareArray(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.SquareRelationArray.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToSquareTensor(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.SquareRelationTensor.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToTriangularArray(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationArray.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ToTriangularTensor(**kwargs)[source]¶
Converts the relations to a paradime.relationdata.TriangularRelationTensor.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
- class paradime.transforms.ZeroDiagonal(**kwargs)[source]¶
Sets all self-relations to zero.
- transform(reldata)[source]¶
Applies the transform to input data.
- Parameters:
reldata (RelationData) – The paradime.relationdata.RelationData instance to be transformed.
- Return type:
- Returns:
A paradime.relationdata.RelationData instance containing the transformed relation values.
Loss¶
Losses for ParaDime routines.
The paradime.loss module implements the specification of losses for ParaDime routines. The supported losses are paradime.loss.RelationLoss, paradime.loss.ClassificationLoss, paradime.loss.ReconstructionLoss, and paradime.loss.CompoundLoss.
- class paradime.loss.ClassificationLoss(loss_function=CrossEntropyLoss(), data_key='main', label_key='labels', classification_method='classify', name=None)[source]¶
A loss that compares predicted class labels against ground truth labels.
This loss compares predicted class labels to ground truth labels in a batch using a specified loss function (cross-entropy by default). Class labels are predicted by applying the model’s classify() method to the specified data entry of the input batch.
- Parameters:
loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
data_key (str) – The key under which to find the data in the input batch.
label_key (str) – The key under which ground truth labels are stored in the input batch.
classification_method (str) – The model method to be used for classifying the batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).
- forward(model, global_relations, batch_relations, batch, device)[source]¶
Apply the loss to a batch of input data.
- Parameters:
model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.
- Return type:
- Returns:
A single-item PyTorch tensor with the computed loss.
- class paradime.loss.CompoundLoss(losses, weights=None, name=None)[source]¶
A weighted sum of multiple losses.
- Parameters:
losses (list[Loss]) – A list of paradime.loss.Loss instances to be summed.
weights (Union[ndarray, Tensor, list[float], None]) – A list of weights to multiply the losses with. Must be of the same length as the list of losses. If no weights are specified, all losses are weighted equally.
name (Optional[str]) – Name of the loss (used by logging functions).
- checkpoint()[source]¶
Create a checkpoint of the most recent accumulated loss.
Appends the value of the most recent accumulated loss to the loss’s history attribute. If the loss is a paradime.loss.CompoundLoss, checkpoints are also created for each individual loss.
- Return type:
- detailed_history()[source]¶
Returns a detailed history of the compound loss.
- Return type:
- Returns:
A PyTorch tensor with the history of each loss component multiplied by its weight.
- forward(model, global_relations, batch_relations, batch, device)[source]¶
Apply the loss to a batch of input data.
- Parameters:
model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.
- Return type:
- Returns:
A single-item PyTorch tensor with the computed loss.
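The weighting rule for compound losses can be sketched with plain numbers in place of PyTorch tensors. The helper compound below is hypothetical, and treating missing weights as a weight of one per loss is an assumption; the reference only states that all losses are then weighted equally.

```python
def compound(loss_values, weights=None):
    """Weighted sum of already-computed loss values (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * len(loss_values)    # assumed equal weighting
    if len(weights) != len(loss_values):
        raise ValueError("weights and losses must have the same length")
    return sum(w * v for w, v in zip(weights, loss_values))

# Example: an embedding loss of 0.5 combined with a reconstruction loss of 2.0.
total_default = compound([0.5, 2.0])
total_weighted = compound([0.5, 2.0], weights=[1.0, 0.1])
```

Down-weighting one component, as in the second call, is a common way to keep an auxiliary loss from dominating the main objective.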
- class paradime.loss.Loss(name=None)[source]¶
Base class for losses.
Custom losses should subclass this class.
- name¶
The name of the loss (used by logging functions).
- checkpoint()[source]¶
Create a checkpoint of the most recent accumulated loss.
Appends the value of the most recent accumulated loss to the loss’s history attribute. If the loss is a paradime.loss.CompoundLoss, checkpoints are also created for each individual loss.
- Return type:
- forward(model, global_relations, batch_relations, batch, device)[source]¶
Apply the loss to a batch of input data.
- Parameters:
model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.
- Return type:
- Returns:
A single-item PyTorch tensor with the computed loss.
- class paradime.loss.PositionLoss(loss_function=MSELoss(), data_key='main', position_key='pos', embedding_method='embed', name=None)[source]¶
A loss that compares embedding coordinates to given positions.
This loss compares embedding coordinates to given ground-truth coordinates in a batch using a specified loss function (mean-square-error by default). Embedding positions are computed by applying the model’s embed() method to the specified data entry of the input batch.
- Parameters:
loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
data_key (str) – The key under which to find the data in the input batch.
position_key (str) – The key under which the ground truth positions are stored in the input batch.
embedding_method (str) – The model method to be used for embedding the batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).
- forward(model, global_relations, batch_relations, batch, device)[source]¶
Apply the loss to a batch of input data.
- Parameters:
model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.
- Return type:
- Returns:
A single-item PyTorch tensor with the computed loss.
- class paradime.loss.ReconstructionLoss(loss_function=MSELoss(), data_key='main', encoding_method='encode', decoding_method='decode', name=None)[source]¶
A simple reconstruction loss for auto-encoding data.
This loss compares reconstructed data to input data in a batch using a specified loss function (mean-square-error by default). Reconstructed data is computed by applying the model’s encode() and decode() methods in succession to the specified data entry of the input batch.
- Parameters:
loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
data_key (str) – The key under which to find the data in the input batch.
encoding_method (str) – The model method to be used for encoding the batch of input data.
decoding_method (str) – The model method to be used for decoding the encoded batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).
- forward(model, global_relations, batch_relations, batch, device)[source]¶
Apply the loss to a batch of input data.
- Parameters:
model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.
- Return type:
- Returns:
A single-item PyTorch tensor with the computed loss.
- class paradime.loss.RelationLoss(loss_function, global_relation_key='rel', batch_relation_key='rel', embedding_method='embed', normalize_sub=True, name=None)[source]¶
A loss that compares batch-wise relation data against a subset of global relation data.
This loss applies a specified loss function to a subset of pre-computed global relations and the batch-wise relations found under specified keys, respectively. Batch-wise relations are computed from embedded coordinates by applying the model’s embed() method to the specified data entry of the input batch.
- Parameters:
loss_function (Callable[[Tensor, Tensor], Tensor]) – The loss function to be applied.
global_relation_key (str) – Key under which to find the global relations.
batch_relation_key (str) – Key under which to find the batch-wise relations.
embedding_method (str) – The model method to be used for embedding the batch of input data.
name (Optional[str]) – Name of the loss (used by logging functions).
- forward(model, global_relations, batch_relations, batch, device)[source]¶
Apply the loss to a batch of input data.
- Parameters:
model (Model) – The torch.nn.Module used to embed, classify, or reconstruct a batch of input data.
global_relations (dict[str, RelationData]) – A dictionary with paradime.relationdata.RelationData computed for the whole dataset.
batch_relations (dict[str, Relations]) – A dictionary with paradime.relations.Relations to be computed for the batch of input data.
batch (dict[str, Tensor]) – A batch of input data as a dictionary of PyTorch tensors.
device (device) – The device that all relevant tensors will be moved to.
- Return type:
- Returns:
A single-item PyTorch tensor with the computed loss.
- paradime.loss.cross_entropy_loss(p, q, epsilon=1e-07)[source]¶
Cross-entropy loss as used by UMAP.
To be used as a loss function in the paradime.loss.RelationLoss of a parametric DR routine.
- Parameters:
- Return type:
- Returns:
The cross-entropy loss of the two input tensors, divided by the number of items in the batch.
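A plain-Python sketch of a UMAP-style cross-entropy may help when reading this entry. It assumes the fuzzy-set form with an attractive term p * log(q) and a repulsive term (1 - p) * log(1 - q), and it assumes that epsilon is added inside the logarithms; neither detail is confirmed by this reference. Every entry of p is treated as one item for the final division, and the function name is hypothetical.

```python
import math

def cross_entropy(p, q, epsilon=1e-7):
    """UMAP-style cross-entropy between target p and predicted q (assumed form)."""
    attract = sum(pi * math.log(qi + epsilon) for pi, qi in zip(p, q))
    repel = sum((1.0 - pi) * math.log(1.0 - qi + epsilon)
                for pi, qi in zip(p, q))
    return -(attract + repel) / len(p)

p = [1.0, 0.0, 1.0]                   # target relations (neighbor or not)
close_match = cross_entropy(p, [0.9, 0.1, 0.8])
poor_match = cross_entropy(p, [0.1, 0.9, 0.2])
```

The epsilon keeps the logarithms finite when a predicted relation is exactly 0 or 1.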
- paradime.loss.kullback_leibler_div(p, q, epsilon=1e-07)[source]¶
Kullback-Leibler divergence.
To be used as a loss function in the paradime.loss.RelationLoss of a parametric DR routine.
- Parameters:
- Return type:
- Returns:
The Kullback-Leibler divergence of the two input tensors, divided by the number of items in the batch.
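The following plain-Python sketch shows one way such a divergence can be computed. Adding epsilon to both the numerator and the denominator inside the logarithm is an assumption, as is treating every entry of p as one item for the final division; the function name kl_div is hypothetical.

```python
import math

def kl_div(p, q, epsilon=1e-7):
    """Kullback-Leibler divergence of q from p, averaged over the entries."""
    total = sum(pi * math.log((pi + epsilon) / (qi + epsilon))
                for pi, qi in zip(p, q))
    return total / len(p)

p = [0.5, 0.3, 0.2]
identical = kl_div(p, p)
different = kl_div(p, [0.2, 0.3, 0.5])
```

The divergence vanishes when both inputs agree and grows as q moves away from p, which is the behavior the main t-SNE training phase relies on.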
Routines¶
Predefined ParaDime routines for existing DR techniques.
The paradime.routines
module implements parametric versions of existing
dimensionality reduction techniques using the paradime.dr.ParametricDR
interface.
- class paradime.routines.ParametricTSNE(perplexity=30.0, alpha=1.0, model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], initialization='pca', epochs=30, init_epochs=10, batch_size=500, init_batch_size=None, learning_rate=0.01, init_learning_rate=None, data_key='main', use_cuda=False, verbose=False)[source]¶
A parametric version of t-SNE.
This class provides a high-level interface for a paradime.dr.ParametricDR routine with the following specifications:
- The global relations are paradime.relations.NeighborBasedPDist, transformed with a paradime.transforms.PerplexityBasedRescale followed by paradime.transforms.Symmetrize.
- The batch relations are paradime.relations.DifferentiablePDist, transformed with a paradime.transforms.StudentTTransform followed by paradime.transforms.Normalize.
- The first (optional) training phase initializes the model to approximate PCA (see initialization below).
- The second training phase uses the Kullback-Leibler divergence to compare the relations.
- Parameters:
perplexity (float) – The desired perplexity, which can be understood as a smooth measure of nearest neighbors used to determine high-dimensional relations between data points.
alpha (float) – Degrees of freedom of the Student’s t-distribution used to calculate low-dimensional relations between data points.
model (Optional[Model]) – The model used to embed the high-dimensional data.
in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.
out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).
hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.
initialization (Optional[str]) – How to pretrain the model to mimic initialization of low-dimensional positions. By default ('pca') the model is pretrained to output an approximation of PCA before beginning the main training phase.
epochs (int) – The number of epochs in the main training phase.
init_epochs (int) – The number of epochs in the pretraining (initialization) phase.
batch_size (int) – The number of items in a batch during the main training phase.
init_batch_size (Optional[int]) – The number of items in a batch during the pretraining (initialization).
learning_rate (float) – The learning rate during the main training phase.
init_learning_rate – The learning rate during the pretraining (initialization).
data_key (str) – The key under which the data can be found in the dataset.
dataset – The dataset on which to perform the training. Datasets can be registered after instantiation using the register_dataset() class method.
use_cuda (bool) – Whether or not to use the GPU for training.
verbose – Verbosity flag.
- class paradime.routines.ParametricUMAP(n_neighbors=30, min_dist=0.01, spread=1.0, a=None, b=None, model=None, in_dim=None, out_dim=2, hidden_dims=[100, 50], initialization='spectral', epochs=30, init_epochs=5, batch_size=10, negative_sampling_rate=5, init_batch_size=100, learning_rate=0.005, init_learning_rate=0.05, data_key='main', dataset=None, use_cuda=False, verbose=False)[source]¶
A parametric version of UMAP.
This class provides a high-level interface for a paradime.dr.ParametricDR routine with the following specifications:
- The global relations are paradime.relations.NeighborBasedPDist, transformed with a paradime.transforms.ConnectivityBasedRescale followed by paradime.transforms.Symmetrize with product subtraction.
- The batch relations are paradime.relations.DistsFromTo (since negative edge sampling is used), transformed with a paradime.transforms.ModifiedCauchyTransform.
- The first (optional) training phase initializes the model to approximate a spectral embedding based on the global relations (see initialization below).
- The second training phase uses cross-entropy to compare the relations. This phase uses negative edge sampling.
- Parameters:
n_neighbors (int) – The desired number of neighbors used for computing the high-dimensional pairwise relations.
min_dist (float) – Effective minimum distance of points in the embedding.
spread (float) – Effective scale of the points in the embedding.
a (Optional[float]) – Parameter to define the modified Cauchy distribution used to compute low-dimensional relations.
b (Optional[float]) – Parameter to define the modified Cauchy distribution used to compute low-dimensional relations.
model (Optional[Model]) – The model used to embed the high-dimensional data.
in_dim (Optional[int]) – The number of dimensions of the input data, used to construct a default model in case none is specified. If a dataset is specified at instantiation, the correct value for this parameter will be inferred from the data dimensions.
out_dim (int) – The number of output dimensions (i.e., the dimensionality of the embedding).
hidden_dims (list[int]) – Dimensions of hidden layers for the default fully connected model that is created if no model is specified.
initialization (Optional[str]) – How to pretrain the model to mimic initialization of low-dimensional positions. By default ('spectral') the model is pretrained to output an approximation of a spectral embedding based on the high-dimensional relations before beginning the main training phase.
epochs (int) – The number of epochs in the main training phase.
init_epochs (int) – The number of epochs in the pretraining (initialization) phase.
batch_size (int) – The number of items in a batch during the main training phase.
init_batch_size (int) – The number of items in a batch during the pretraining (initialization).
learning_rate (float) – The learning rate during the main training phase.
init_learning_rate – The learning rate during the pretraining (initialization).
data_key (str) – The key under which the data can be found in the dataset.
dataset (Union[ndarray, Tensor, Mapping[str, Union[ndarray, Tensor]], Dataset, None]) – The dataset on which to perform the training. Datasets can be registered after instantiation using the register_dataset() class method.
use_cuda (bool) – Whether or not to use the GPU for training.
verbose – Verbosity flag.
Utils¶
Utility functions for ParaDime.
The paradime.utils
subpackage includes various modules that implement
utility functions for logging, plotting, and input conversion.
Convert¶
Conversion utilities for ParaDime.
The paradime.utils.convert
module implements various conversion
functions for tensor-like objects and index lists.
- paradime.utils.convert.rowcol_to_triu_index(i, j, dim)[source]¶
Converts matrix indices to upper-triangular form.
Converts a pair of row and column indices of a symmetrical square array to the corresponding index of the list of upper triangular values.
- Parameters:
- Return type:
- Returns:
The upper triangular index.
- Raises:
ValueError – For diagonal indices (i.e., if i equals j).
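The index conversion can be sketched with a closed-form expression. The version below assumes that the upper-triangular values are stored row by row with the diagonal excluded, which matches the documented ValueError for diagonal indices but is otherwise an assumption; the function name is hypothetical.

```python
from itertools import combinations

def rowcol_to_triu(i, j, dim):
    """Index of entry (i, j) in the row-major list of upper-triangular values."""
    if i == j:
        raise ValueError("diagonal entries have no upper-triangular index")
    if i > j:
        i, j = j, i       # symmetric array: (i, j) and (j, i) share one value
    # Entries in the rows above row i, plus the offset within row i.
    return i * dim - i * (i + 1) // 2 + (j - i - 1)

# Enumerate all pairs of a 5 x 5 array in row-major order for comparison.
dim = 5
pairs = list(combinations(range(dim), 2))
indices = [rowcol_to_triu(i, j, dim) for i, j in pairs]
```

For a 5 x 5 array the list holds 10 values, and the computed indices run through them in order.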
Logging¶
Logging utility for ParaDime.
The paradime.utils.logging
module implements logging functionality used
by verbose ParaDime routines.
- paradime.utils.logging.log(message)[source]¶
Calls the ParaDime logger to print a timestamp and a message.
Plotting¶
Plotting utilities for ParaDime.
The paradime.utils.plotting
module implements plotting functions and
color palette retrieval.
- paradime.utils.plotting.get_color_palette()[source]¶
Get the custom ParaDime color palette.
The palette is usually located in an assets folder in the form of a JSON file. If the JSON file is not found, this method attempts to create it by parsing an SVG file.
- Return type:
- Returns:
The color palette as a dict of names and hex color values.
- Raises:
FileNotFoundError – If neither the JSON nor the SVG file can be found.
- paradime.utils.plotting.scatterplot(coords, labels=None, colormap=None, labels_to_index=None, figsize=(10, 10), bgcolor='#fcfcfc', legend=True, legend_options=None, ax=None, **kwargs)[source]¶
Creates a scatter plot of points at the given coordinates.
- Parameters:
coords (Union[ndarray, Tensor]) – The coordinates of the points.
labels (Union[ndarray, Tensor, None]) – A list of categorical labels. If labels are given, a categorical color scale is used and a legend is constructed automatically.
colormap (Optional[list[str]]) – A list of colors to use instead of the default categorical color scale based on the ParaDime palette.
labels_to_index (Optional[dict]) – A dict that maps labels to indices which are then used to access the colors in the categorical color scale.
figsize (tuple[float, float]) – Width and height of the plot in inches.
bgcolor (Optional[str]) – The background color of the plot, which by default is also used to draw thin outlines around the points.
legend (bool) – Whether or not to include the automatically created legend.
legend_options (Optional[dict[str, Any]]) – A dict of keyword arguments that are passed on to the legend method.
ax (Optional[matplotlib.axes.Axes]) – An axes of the current figure. This argument is useful if the scatterplot should be added to an existing figure.
kwargs – Any other keyword arguments are passed on to matplotlib’s scatter method.
- Return type:
- Returns:
The
matplotlib.axes.Axes
instance of the plot.
Seed¶
Random seeding for ParaDime.
The paradime.utils.seed
subpackage implements a function to seed all
random number generators potentially involved in a ParaDime routine.
- paradime.utils.seed.seed_all(seed)[source]¶
Sets several seeds to maximize reproducibility.
For information on reproducibility in PyTorch, see https://pytorch.org/docs/stable/notes/randomness.html.
- Parameters:
seed (int) – The integer to use as a seed.
- Return type:
- Returns:
The torch.Generator instance returned by torch.manual_seed().