Embedding Visualizer

Paste a CSV with label, x, y (and an optional group) columns to plot 2D vector embeddings. Points are color-coded by group, with hover tooltips, a zoom slider, and drag-to-pan. Download the plot as SVG.

CSV Input

CSV Format Guide

  • label,x,y (3 columns; group defaults to "default")
  • label,x,y,group (4 columns; points color-coded by group)
  • The first row may be a header (auto-detected).
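For instance, a valid 4-column input might look like this (labels and coordinates are illustrative):

```
label,x,y,group
cat,-1.20,0.80,animals
dog,-1.05,0.95,animals
laptop,1.40,-0.60,tech
pizza,0.30,1.70,food
```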

Example dataset stats: 19 points in 4 groups (animals, tech, food, sports), X range [-2.40, 2.70], Y range [-2.30, 2.40], zoom 1.0×.

Scatter Plot — drag to pan, use slider to zoom


Understanding 2D Embedding Projections

Modern AI models encode meaning as high-dimensional vectors. A text embedding model might produce a 1536-dimensional vector for each sentence, while an image model might use 2048 dimensions. These spaces are completely opaque to human inspection. Dimensionality reduction projects these vectors onto a 2D plane while attempting to preserve the neighborhood structure — items that were close in high-dimensional space should remain close in the plot.

This tool takes the output of that reduction step — a CSV of (label, x, y) or (label, x, y, group) — and renders an interactive scatter plot. Colors correspond to groups or categories you assign, making it easy to verify whether an embedding model is clustering your data the way you expect.

  • PCA (quick exploration): linear; preserves global variance; fast on large datasets.
  • t-SNE (tight cluster visualization): non-linear; distorts global distances; best for ≤ 50k points.
  • UMAP (balanced structure): non-linear; preserves both local and global structure; faster than t-SNE.

Generating 2D Coordinates with Python

If you have embeddings stored as a NumPy array or a list of vectors, the following snippets produce CSV output ready to paste into this tool.

t-SNE with scikit-learn

import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# embeddings: np.ndarray of shape (N, D)
# labels: list of N strings
# groups: list of N group strings (optional)

tsne = TSNE(n_components=2, random_state=42)  # default perplexity=30 requires N > 30
coords = tsne.fit_transform(embeddings)  # (N, 2)

df = pd.DataFrame({
    "label": labels,
    "x": coords[:, 0],
    "y": coords[:, 1],
    "group": groups,          # omit if no groups
})
print(df.to_csv(index=False))

UMAP with umap-learn

import umap

reducer = umap.UMAP(n_components=2, random_state=42)
coords = reducer.fit_transform(embeddings)  # (N, 2)

# Then build and print the CSV as above

Interpreting Embedding Plots

A well-trained embedding model should produce distinct, cohesive clusters for each semantic category when projected to 2D. If points from different groups are randomly interleaved, the model may not have learned discriminative features for your data. Tight, well-separated clusters are a good sign, but keep in mind that non-linear projections (t-SNE in particular) can exaggerate apparent separation, so confirm patterns across parameter settings or methods before drawing conclusions.

Good signs

  • Same-group points cluster tightly together
  • Different groups have visible separation
  • No obvious outliers unless expected
  • Smooth gradient between related groups

Warning signs

  • Groups completely intermixed — model not discriminating
  • Single massive blob — embeddings collapsed
  • Many isolated outliers — data quality issues
  • Uniform grid layout — projection failed, check parameters
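As a rough numeric companion to the visual checks above, you can compare between-group centroid distances to within-group spread on the 2D coordinates. This is an illustrative sketch using only NumPy; the ratio is an ad-hoc heuristic, not a standard metric (for a principled measure, consider silhouette score):

```python
import numpy as np

def separation_ratio(coords, groups):
    """Mean distance between group centroids divided by mean
    within-group spread. Higher is better; values near or below
    1 suggest the groups are intermixed in the 2D projection."""
    coords = np.asarray(coords, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    centroids = np.array([coords[groups == g].mean(axis=0) for g in labels])
    # Mean distance of each point to its own group's centroid
    spreads = [np.linalg.norm(coords[groups == g] - c, axis=1).mean()
               for g, c in zip(labels, centroids)]
    within = np.mean(spreads)
    # Mean pairwise distance between distinct centroids
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    between = dists[np.triu_indices(len(labels), k=1)].mean()
    return between / within

# Two tight, well-separated blobs produce a large ratio
pts = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5]])
grp = np.array(["a", "a", "b", "b"])
print(separation_ratio(pts, grp))
```

A ratio well above 1 corresponds to the "visible separation" good sign; a ratio near 1 matches the "groups completely intermixed" warning.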

Frequently Asked Questions

What are vector embeddings and why visualize them?

Vector embeddings are numerical representations of data — words, sentences, images, or any object — in a high-dimensional space, produced by neural networks. Models like Word2Vec, BERT, CLIP, and OpenAI's text-embedding-ada-002 map semantically similar items close together. Because humans cannot perceive more than 3 dimensions, dimensionality reduction algorithms like t-SNE (t-distributed Stochastic Neighbor Embedding), UMAP, or PCA are used to project embeddings down to 2D coordinates while preserving local structure. Visualizing these 2D projections reveals cluster quality, outliers, and semantic relationships in the embedding space.

How do I generate 2D coordinates from my embeddings?

Start with your high-dimensional embeddings (e.g., 768-dimensional BERT vectors). Then run a dimensionality reduction algorithm: use sklearn.manifold.TSNE(n_components=2) in Python for t-SNE, umap.UMAP(n_components=2) for UMAP, or sklearn.decomposition.PCA(n_components=2) for PCA. All three produce a 2-column array of (x, y) coordinates — one row per input item. Export these alongside your labels as a CSV and paste them here.
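As a dependency-light illustration of the PCA option, the same projection can be computed directly with NumPy via SVD of the centered data. This is a minimal sketch: sklearn.decomposition.PCA does the equivalent (its output may differ by sign flips per axis) and adds conveniences like explained-variance reporting:

```python
import numpy as np

def pca_2d(embeddings):
    """Project (N, D) embeddings onto their top two principal
    components, matching PCA(n_components=2) up to sign flips."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)                       # center each dimension
    # SVD of centered data; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                          # (N, 2) coordinates

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 768))                # stand-in embeddings
coords = pca_2d(emb)
print(coords.shape)  # (100, 2)
```

The first output column carries the most variance by construction, which is why PCA plots often look wider than they are tall.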

What is the difference between t-SNE, UMAP, and PCA for embedding visualization?

PCA (Principal Component Analysis) is linear and fast but may not separate non-linear clusters well. It preserves global variance. t-SNE is non-linear and excellent at revealing tight local clusters, but it distorts global distances — two clusters far apart in 2D may not be far apart in the original space. UMAP is also non-linear and generally faster than t-SNE, better preserves both local and global structure, and produces more stable results across runs. For quick exploration use PCA; for publication-quality cluster visualization use UMAP or t-SNE.

What CSV format does this tool accept?

The tool accepts CSV with 3 or 4 columns: label,x,y or label,x,y,group. The first row may optionally be a header (auto-detected). Each subsequent row is one data point. The label column is a string used for tooltips. x and y are floating-point coordinates (your 2D projection output). The optional group column is a string used for color-coding — all points with the same group value share a color. Up to 8 distinct group colors are supported.
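A sketch of how such input can be parsed in Python, including a simple header auto-detection rule (treat the first row as a header when its x/y fields are not numeric). The tool's actual detection logic may differ; this is an illustrative reimplementation:

```python
import csv
import io

def parse_points(text):
    """Parse label,x,y[,group] CSV text into a list of point dicts."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]

    def is_num(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    # Header auto-detection: skip row 1 if its coordinates aren't numeric
    if rows and len(rows[0]) >= 3 and not (is_num(rows[0][1]) and is_num(rows[0][2])):
        rows = rows[1:]

    points = []
    for r in rows:
        points.append({
            "label": r[0],
            "x": float(r[1]),
            "y": float(r[2]),
            "group": r[3] if len(r) > 3 else "default",  # 3-column fallback
        })
    return points

sample = "label,x,y,group\ncat,-1.2,0.8,animals\nrouter,1.4,-0.6\n"
pts = parse_points(sample)
print(len(pts), pts[1]["group"])  # 2 default
```

Note that rows with 3 and 4 columns can be mixed; the shorter rows simply fall into the "default" group.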

Can I export the plot?

Yes. The Download SVG button serializes the current SVG element to a .svg file, which preserves vector quality at any resolution. You can open the SVG in Inkscape, Adobe Illustrator, or a browser to further edit or convert it to PNG/PDF. The downloaded SVG reflects the current zoom and pan state, so adjust the view before downloading for the framing you want.