
Simple Usage

paraDime is a flexible framework for specifying parametric dimensionality reduction routines. A routine basically consists of a neural network, a dataset, and some instructions about what exactly paraDime should do with your data to reduce its dimensionality.

paraDime's API includes several predefined classes for each part of a routine, and each part can be fully customized by extending these classes. If you want to learn more about what exactly makes up a routine, see the Building Blocks page.

But for now, the easiest way to get started with paraDime is to use one of the predefined routines. In this short tutorial, we are going to train one of them to reduce the dimensionality of data from the MNIST dataset of handwritten digits.

Importing paraDime and Loading the Dataset

First, we import the routines submodule of paraDime, which includes the predefined routines. We also import paraDime's utils subpackage, which provides a scatterplot function that we will use later. Finally, we import torchvision, which gives us convenient access to the MNIST dataset.

[1]:
import paradime.routines
import paradime.utils
import torchvision

mnist = torchvision.datasets.MNIST(
    '../data',
    train=True,
    download=True,
)
mnist_data = mnist.data.reshape(-1, 28*28) / 255.
num_items = 5000

Note that we have already flattened the image data into vectors of length 784 and normalized the values to a range between 0 and 1. num_items is the size of the MNIST subset that we are going to use for training our routine.
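As a quick sanity check, you can verify the shape and value range of the preprocessed data (a minimal sketch; the MNIST training split contains 60,000 images):

# Optional sanity check on the preprocessed data.
print(mnist_data.shape)  # torch.Size([60000, 784])
print(mnist_data.min().item(), mnist_data.max().item())  # 0.0 1.0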

Setting Up a Predefined Routine

We now create an instance of a parametric version of the t-SNE algorithm:

[2]:
dr = paradime.routines.ParametricTSNE(
    perplexity=100,
    dataset=mnist_data[:num_items],
    epochs=40,
    use_cuda=True,
    verbose=True,
)
2022-08-30 22:02:39,540: Registering dataset.

When initializing a routine, paraDime only needs minimal information to set up the underlying neural network. In this case, paraDime infers all the necessary information from the dataset that we pass. For more info on the default model construction, see the section about Models. We tell paraDime that the main part of the training should go on for 40 epochs, and that we would like to use the GPU for training (set use_cuda=False or remove this line if you don't have CUDA available). Finally, the verbose flag tells paraDime to log some information about what is going on behind the scenes.
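If you are unsure whether CUDA is available on your machine, you can also make the flag conditional using PyTorch's own check (a small sketch of an alternative setup):

import torch

dr = paradime.routines.ParametricTSNE(
    perplexity=100,
    dataset=mnist_data[:num_items],
    epochs=40,
    use_cuda=torch.cuda.is_available(),  # only use the GPU if one is present
    verbose=True,
)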

You might have noticed that we also pass a perplexity value, which is specific to the t-SNE algorithm. It roughly controls how many neighbors each point takes into account when pairwise similarities between data points are computed.
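If you want to explore the effect of this hyperparameter, you could set up several routines with different perplexities (just a sketch; each instance would be trained separately, as shown in the next section):

# Sketch: one routine per perplexity value. Lower values emphasize
# local structure, higher values more global structure.
routines = {
    p: paradime.routines.ParametricTSNE(
        perplexity=p,
        dataset=mnist_data[:num_items],
        epochs=40,
    )
    for p in (30, 100, 300)
}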

Training the Routine and Visualizing the Results

Since all other necessary building blocks are already predefined in this case, all that's left to do is to train the model. To do this, we simply call:

[3]:
dr.train()
2022-08-30 22:02:39,792: Adding entry 'pca' to dataset.
2022-08-30 22:02:39,793: Computing global relations 'rel'.
2022-08-30 22:02:39,793: Indexing nearest neighbors.
2022-08-30 22:02:57,248: Calculating probabilities.
2022-08-30 22:02:57,855: Beginning training phase 'pca_init'.
2022-08-30 22:03:00,066: Loss after epoch 0: 23.56941819190979
2022-08-30 22:03:00,490: Loss after epoch 5: 0.20473657734692097
2022-08-30 22:03:00,786: Beginning training phase 'embedding'.
2022-08-30 22:03:01,285: Loss after epoch 0: 0.04917518841102719
2022-08-30 22:03:02,806: Loss after epoch 5: 0.03781552822329104
2022-08-30 22:03:04,476: Loss after epoch 10: 0.03461226262152195
2022-08-30 22:03:06,175: Loss after epoch 15: 0.03340187715366483
2022-08-30 22:03:07,904: Loss after epoch 20: 0.03263849508948624
2022-08-30 22:03:09,622: Loss after epoch 25: 0.03208582056686282
2022-08-30 22:03:11,334: Loss after epoch 30: 0.031265218276530504
2022-08-30 22:03:13,031: Loss after epoch 35: 0.031297678826376796

After the training is done, we can apply our trained model to the input data:

[4]:
reduced = dr.apply(mnist_data[:num_items])
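The result contains one low-dimensional point per input sample, which you can quickly verify (assuming the routine's default two-dimensional embedding):

# One 2D point per input sample.
print(reduced.shape)  # (5000, 2)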

Now we can plot the dimensionality-reduced version of the data that we used for training:

[5]:
paradime.utils.plotting.scatterplot(reduced, mnist.targets[:num_items])
[5]:
<AxesSubplot:>
[Figure: scatterplot of the 2D embedding of the 5,000 MNIST training samples, colored by digit label.]
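If you want full control over the plot's styling, a minimal matplotlib equivalent could look like this (a sketch assuming reduced is array-like with shape (n, 2)):

import matplotlib.pyplot as plt

# Manual scatterplot of the embedding, colored by digit label.
plt.scatter(
    reduced[:, 0],
    reduced[:, 1],
    c=mnist.targets[:num_items],
    cmap='tab10',
    s=2,
)
plt.colorbar(label='digit')
plt.show()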

Because paraDime models are parametric, you can easily apply the trained model to the whole MNIST dataset, even though our routine only ever saw a small subset of it:

[6]:
paradime.utils.plotting.scatterplot(dr.apply(mnist_data), mnist.targets)
[6]:
<AxesSubplot:>
[Figure: scatterplot of the embedding of the full MNIST dataset (60,000 samples), colored by digit label.]
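The same holds for data the routine has never seen at all. For example, you could embed MNIST's held-out test split, preprocessed in the same way as the training data (a sketch reusing the steps from above):

# Load the test split and preprocess it like the training data.
mnist_test = torchvision.datasets.MNIST(
    '../data',
    train=False,  # load the test split instead of the training split
    download=True,
)
test_data = mnist_test.data.reshape(-1, 28*28) / 255.

paradime.utils.plotting.scatterplot(dr.apply(test_data), mnist_test.targets)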

If you want to configure your own paraDime routines, you will need to get an understanding of what is going on behind the scenes. The output log above might have given you a first idea of the different parts and steps involved in a routine. All the details are explained in the section about the Building Blocks of a ParaDime Routine.