# Visualising Data using t-SNE

This highly cited <mark style="color:blue;">**2008**</mark> paper by Laurens van der Maaten and Geoffrey Hinton presented <mark style="color:blue;">**t-SNE (t-distributed Stochastic Neighbor Embedding)**</mark>, a technique for *<mark style="color:yellow;">**visualising high-dimensional data by giving each data point a location in a two- or three-dimensional map.**</mark>*

t-SNE builds on the original <mark style="color:blue;">**Stochastic Neighbor Embedding (SNE)**</mark>, developed by Hinton and Roweis in 2002.

t-SNE modifies SNE so that it is easier to optimise and produces better visualisations, in particular by reducing SNE's tendency to crowd points together in the centre of the map (the "crowding problem").

This matters most for data that lie on several different but related low-dimensional manifolds, which are common in datasets such as images of objects seen from multiple viewpoints, or text data.

{% embed url="<https://arxiv.org/abs/2108.01301>" %}
Visualizing Data using t-SNE
{% endembed %}

### <mark style="color:purple;">Key Contributions and Methodology</mark>

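<mark style="color:green;">**Improved Optimisation:**</mark> t-SNE is easier to optimise than its predecessor SNE. It uses a symmetrised version of the SNE cost function, whose gradient is simpler and faster to compute.

<mark style="color:green;">**Better Visualisation:**</mark> By measuring similarities in the low-dimensional map with a heavy-tailed Student-t distribution (one degree of freedom) instead of a Gaussian, t-SNE alleviates the crowding of points at the map's centre, a common issue in similar methods. This makes the resulting visualisations easier to read.

<mark style="color:green;">**Adaptability:**</mark> t-SNE can visualise complex data structures from many domains. Because the Gaussian bandwidth is set separately for each data point, it adapts to regions of the data with different scales and densities.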
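The effect of the heavy tail on crowding can be illustrated numerically. The sketch below compares the unnormalised Gaussian kernel exp(-d²), which SNE uses in the map, with the unnormalised Student-t kernel 1 / (1 + d²) that t-SNE uses. It asks how far apart two map points must be under the Student-t kernel to have the same similarity they would have at a given Gaussian distance. The function name is hypothetical, and the sketch ignores normalisation, so it shows the tail behaviour only.

```python
import math


def matching_student_t_distance(gaussian_distance: float) -> float:
    """Distance at which the Student-t kernel 1 / (1 + d^2) equals the
    Gaussian kernel exp(-d^2) evaluated at `gaussian_distance`.

    Solving exp(-g^2) = 1 / (1 + t^2) for t gives t = sqrt(exp(g^2) - 1).
    """
    # math.expm1 computes exp(x) - 1 accurately when x is small
    return math.sqrt(math.expm1(gaussian_distance ** 2))


if __name__ == "__main__":
    for g in (0.5, 1.0, 2.0, 3.0):
        t = matching_student_t_distance(g)
        print(f"Gaussian distance {g:.1f} -> Student-t distance {t:.2f}")
```

Nearby points get almost the same distance under both kernels. Moderately distant ones can sit much farther apart under the Student-t kernel: a Gaussian distance of 2 corresponds to a Student-t distance of about 7.3. That extra room is what relieves the crowding.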

### <mark style="color:purple;">Technical Details</mark>

<mark style="color:green;">**Probability Distributions**</mark>

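t-SNE starts by converting the high-dimensional Euclidean distances between data points into conditional probabilities that represent similarities. The similarity of point x_j to point x_i is the probability that x_i would pick x_j as its neighbour if neighbours were picked in proportion to a Gaussian density centred at x_i. These probabilities let the method preserve the local structure of the data in the lower-dimensional space.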
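A minimal NumPy sketch of this first step follows. It computes p_{j|i} = exp(-‖x_i - x_j‖² / 2σ_i²) / Σ_{k≠i} exp(-‖x_i - x_k‖² / 2σ_i²), with p_{i|i} = 0. Each bandwidth σ_i is chosen by binary search so that the perplexity 2^H(P_i) of row i matches a user-chosen value, where H is the Shannon entropy in bits. The function name, tolerance and iteration cap are illustrative assumptions. The dense n × n distance matrix limits this sketch to small datasets.

```python
import numpy as np


def conditional_probabilities(X: np.ndarray, perplexity: float = 30.0,
                              tol: float = 1e-5, max_iter: int = 100) -> np.ndarray:
    """Return the n x n matrix whose row i holds p_{j|i} (diagonal is zero).

    Each row uses its own Gaussian precision beta_i = 1 / (2 sigma_i^2), found by
    binary search so that 2 ** H(P_i) matches `perplexity`.
    """
    n = X.shape[0]
    # Squared Euclidean distances between all pairs of points
    sq_norms = np.sum(X ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T, 0.0)
    target_entropy = np.log2(perplexity)  # desired H(P_i) in bits
    P = np.zeros((n, n))
    for i in range(n):
        d = np.delete(D[i], i)  # distances from x_i to every other point
        beta, beta_lo, beta_hi = 1.0, 0.0, np.inf
        for _ in range(max_iter):
            # Subtracting the minimum avoids underflow; it cancels on normalisation
            w = np.exp(-(d - d.min()) * beta)
            p = w / w.sum()
            nz = p[p > 0]
            entropy = -np.sum(nz * np.log2(nz))
            if abs(entropy - target_entropy) < tol:
                break
            if entropy > target_entropy:
                # Distribution too flat: narrow the Gaussian (larger beta)
                beta_lo = beta
                beta = beta * 2.0 if np.isinf(beta_hi) else (beta + beta_hi) / 2.0
            else:
                # Distribution too peaked: widen the Gaussian (smaller beta)
                beta_hi = beta
                beta = (beta + beta_lo) / 2.0
        P[i, np.arange(n) != i] = p
    return P
```

For example, `conditional_probabilities(np.random.default_rng(0).normal(size=(100, 5)), perplexity=10.0)` returns a 100 × 100 matrix whose rows each sum to 1.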

<mark style="color:green;">**Kullback-Leibler Divergence**</mark>

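t-SNE minimises a single Kullback-Leibler divergence between a joint probability distribution P over pairs of points in the high-dimensional space and a joint distribution Q over pairs of points in the low-dimensional map. (SNE, by contrast, minimises a sum of Kullback-Leibler divergences between conditional distributions.) Because Q is built from a heavy-tailed Student-t distribution, minimising this divergence keeps similar data points close together in the map while allowing moderately dissimilar points to be placed much farther apart.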
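The joint distributions and the cost can be sketched in NumPy. Following the paper, the sketch symmetrises the conditional probabilities as p_ij = (p_{j|i} + p_{i|j}) / 2n. It defines the map similarities with a Student-t kernel with one degree of freedom: q_ij = (1 + ‖y_i - y_j‖²)⁻¹ / Σ_{k≠l} (1 + ‖y_k - y_l‖²)⁻¹. The function names are hypothetical, and the small `eps` is an assumption that guards against taking the log of zero.

```python
import numpy as np


def joint_probabilities(P_cond: np.ndarray) -> np.ndarray:
    """Symmetrise conditional probabilities: p_ij = (p_{j|i} + p_{i|j}) / (2n)."""
    n = P_cond.shape[0]
    return (P_cond + P_cond.T) / (2.0 * n)


def student_t_affinities(Y: np.ndarray) -> np.ndarray:
    """Joint probabilities q_ij in the map, using a Student-t kernel (1 degree of freedom)."""
    sq_norms = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    num = 1.0 / (1.0 + D)      # heavy-tailed kernel (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(num, 0.0)  # q_ii = 0
    return num / num.sum()


def kl_divergence(P: np.ndarray, Q: np.ndarray, eps: float = 1e-12) -> float:
    """C = KL(P || Q) = sum_{i != j} p_ij * log(p_ij / q_ij); zero p_ij contribute nothing."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```

Because both P and Q sum to 1 over all pairs, the cost is a single divergence between two distributions, rather than one divergence per data point as in SNE.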

<mark style="color:green;">**Gradient Descent**</mark>

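The method uses gradient descent to find the map that best represents the structure of the high-dimensional data. The gradient has a simple form: each pair of map points attracts or repels the other in proportion to the difference between the pair's similarity in the high-dimensional space and its similarity in the map. To find good embeddings, the optimisation adds a momentum term and an adaptive learning rate, and uses "early exaggeration", in which the high-dimensional similarities are multiplied by a constant during the first iterations.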
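The paper gives the gradient of the cost as ∂C/∂y_i = 4 Σ_j (p_ij - q_ij)(y_i - y_j)(1 + ‖y_i - y_j‖²)⁻¹. The sketch below implements that gradient and a plain momentum-based descent with early exaggeration. The function names, the momentum schedule, the initial scale and the other constants are illustrative assumptions. The adaptive learning-rate scheme is left out for brevity, so this is a teaching sketch and does not reproduce the paper's optimiser exactly.

```python
import numpy as np


def tsne_gradient(P: np.ndarray, Y: np.ndarray):
    """Return (gradient, Q) for C = KL(P || Q) with a Student-t map kernel.

    dC/dy_i = 4 * sum_j (p_ij - q_ij) * (y_i - y_j) / (1 + ||y_i - y_j||^2)
    """
    sq_norms = np.sum(Y ** 2, axis=1)
    D = np.maximum(sq_norms[:, None] + sq_norms[None, :] - 2.0 * Y @ Y.T, 0.0)
    num = 1.0 / (1.0 + D)  # unnormalised Student-t kernel
    np.fill_diagonal(num, 0.0)
    Q = num / num.sum()
    W = (P - Q) * num  # pairwise attraction (>0) or repulsion (<0) weights
    # sum_j W_ij (y_i - y_j) = y_i * sum_j W_ij - sum_j W_ij y_j
    grad = 4.0 * (np.sum(W, axis=1)[:, None] * Y - W @ Y)
    return grad, Q


def optimise_embedding(P, n_components=2, n_iter=500, learning_rate=100.0,
                       exaggeration=4.0, exaggeration_iters=50, seed=0):
    """Gradient descent with momentum and early exaggeration (illustrative settings).

    Assumed schedule: momentum 0.5 for the first 250 iterations, 0.8 afterwards;
    P is multiplied by `exaggeration` for the first `exaggeration_iters` iterations.
    """
    rng = np.random.default_rng(seed)
    n = P.shape[0]
    Y = rng.normal(scale=1e-2, size=(n, n_components))  # small random initial map
    velocity = np.zeros_like(Y)
    for t in range(n_iter):
        P_eff = P * exaggeration if t < exaggeration_iters else P
        grad, _ = tsne_gradient(P_eff, Y)
        momentum = 0.5 if t < 250 else 0.8
        velocity = momentum * velocity - learning_rate * grad
        Y = Y + velocity
    return Y
```

The default learning rate of 100 is meant for maps with many points, where each p_ij is tiny. With only a handful of points, the p_ij are much larger and a far smaller value (for example 0.5) is needed to keep the updates stable.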

<figure><img src="https://1839612753-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FpV8SlQaC976K9PPsjApL%2Fuploads%2FnMGgDRwUyhcv1jOy94Zd%2Fimage.png?alt=media&#x26;token=9e3a6541-816f-4792-91b8-bae996477df2" alt=""><figcaption><p>Visualisations of 6,000 handwritten digits from the MNIST dataset</p></figcaption></figure>

### <mark style="color:purple;">Practical Impact and Theoretical Significance</mark>

The practical impact of t-SNE has been substantial. On datasets with intricate internal structure, it produces markedly clearer visualisations than the other techniques the paper compares it against, including Sammon mapping, Isomap and LLE.

On the theoretical side, the paper shows how *<mark style="color:yellow;">**dimensionality reduction can be effectively achieved by managing the trade-off between local and global data structures.**</mark>* A t-SNE map retains much of the local structure of the data while still revealing global structure, such as the presence of clusters at several scales. This makes t-SNE particularly useful for exploratory analysis where both kinds of structure matter.

t-SNE advanced the field of dimensionality reduction by directly addressing two weaknesses of SNE that make high-dimensional data hard to visualise: the crowding problem and a cost function that is difficult to optimise. These improvements have made it a widely used tool in applications ranging from bioinformatics to social network analysis.

t-SNE is a powerful tool for visualising high-dimensional data, particularly in domains where the data's intrinsic structure is complex and spans multiple scales. It is computationally demanding and sensitive to parameter settings such as the perplexity. Even so, its ability to produce clear visualisations makes it a valuable method for machine learning practitioners and data scientists.

### <mark style="color:purple;">Conclusions and Future Work</mark>

The paper concluded that t-SNE is highly effective for visualising complex datasets, because it retains local data structure while revealing global structure such as clusters.

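The technique is computationally intensive: its time and memory costs grow quadratically with the number of data points. The paper manages this with a landmark approach that runs t-SNE on a subset of landmark points. The similarities between landmarks are defined through random walks on a neighbourhood graph built from all the data.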
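The landmark idea can be sketched as follows. Similarities between landmark points are estimated from random walks on a k-nearest-neighbour graph built from all the data, so the non-landmark points still shape how the landmarks relate to each other. This sketch simply simulates the walks (Monte Carlo), which is easy to follow but slow. The neighbourhood size, edge weights, walk counts, stopping rule and function name are all assumptions made for illustration, so it is only suitable for small inputs.

```python
import numpy as np


def landmark_random_walk_affinities(X, landmarks, k=10, n_walks=200,
                                    max_steps=1000, seed=0):
    """Estimate p_{j|i} between landmarks by simulating random walks (sketch).

    Assumptions: directed k-nearest-neighbour graph over all points; a step from
    x_a to neighbour x_b has probability proportional to exp(-||x_a - x_b||^2);
    a walk from landmark i stops at the first *other* landmark it reaches, and
    p_{j|i} is the fraction of walks from i that stop at j. Walks that reach no
    landmark within `max_steps` are discarded.
    """
    rng = np.random.default_rng(seed)
    sq = np.sum(X ** 2, axis=1)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    np.fill_diagonal(D, np.inf)  # exclude self-loops
    nbrs = np.argsort(D, axis=1)[:, :k]  # indices of the k nearest neighbours
    d_nbrs = np.take_along_axis(D, nbrs, axis=1)
    # Edge transition probabilities; subtracting the row minimum avoids underflow
    T = np.exp(-(d_nbrs - d_nbrs.min(axis=1, keepdims=True)))
    T /= T.sum(axis=1, keepdims=True)

    landmarks = np.asarray(landmarks)
    index_of = {int(p): idx for idx, p in enumerate(landmarks)}
    m = len(landmarks)
    counts = np.zeros((m, m))
    for a, start in enumerate(landmarks):
        start = int(start)
        for _ in range(n_walks):
            node = start
            for _ in range(max_steps):
                node = int(nbrs[node, rng.choice(k, p=T[node])])
                if node != start and node in index_of:
                    counts[a, index_of[node]] += 1
                    break
    totals = counts.sum(axis=1, keepdims=True)
    # Rows whose walks never reached another landmark are left as zeros
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)
```

The resulting matrix plays the role of the conditional probabilities for the landmark points. It could then be symmetrised and embedded in the same way as ordinary t-SNE similarities.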

### <mark style="color:purple;">Software Libraries Implementing t-SNE</mark>

t-SNE is widely implemented across several major machine learning and data analysis libraries, including:

<mark style="color:green;">**scikit-learn (Python):**</mark> Provides the `sklearn.manifold.TSNE` estimator, which uses the Barnes-Hut approximation by default and also offers an exact method. It is commonly used in academia and industry for data visualisation tasks.

<mark style="color:green;">**R (Rtsne package):**</mark> Offers an implementation tailored for use within the R statistical computing environment.

<mark style="color:green;">**MATLAB:**</mark> Includes the `tsne` function in its Statistics and Machine Learning Toolbox, so it integrates easily with other MATLAB functionality.

### <mark style="color:purple;">Modern Applications of t-SNE</mark>

t-SNE is used across various fields to analyse and visualise high-dimensional data:

#### <mark style="color:green;">**Biomedical Data Visualisation**</mark>

For example, it is used in single-cell RNA sequencing data analysis to visualise the variation in gene expression levels across individual cells, helping identify different cell types based on their gene expression profiles.

#### <mark style="color:green;">**Financial Data Analysis**</mark>

Analysts use t-SNE to identify clusters of similar financial products or to analyse consumer behaviour based on high-dimensional data.

#### <mark style="color:green;">**Image Data Exploration**</mark>

t-SNE helps visualise datasets of high-resolution images by grouping similar images together, which is useful in fields like digital pathology or retail catalogue management.

t-SNE continues to be a vital tool in machine learning and data science, with ongoing research aimed at improving its theoretical understanding and computational efficiency. Its ability to reveal intricate structures hidden within complex datasets makes it an indispensable tool for exploratory data analysis.


