Embedding Visualizer
Paste CSV with label, x, y (and optional group) to plot 2D vector embeddings. Color-coded by group with hover tooltips, zoom slider, and drag-to-pan. Download as SVG.
CSV Input
CSV Format Guide
label,x,y (3 columns, group = "default")
label,x,y,group (4 columns, color-coded by group)
First row may be a header (auto-detected).
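For example, a 4-column CSV (values are illustrative):

```csv
label,x,y,group
cat,-1.20,0.85,animals
dog,-0.95,1.10,animals
car,1.45,-0.60,vehicles
truck,1.70,-0.40,vehicles
```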
Stats
Points
19
Groups
4
X Range
[-2.40, 2.70]
Y Range
[-2.30, 2.40]
Legend
Zoom
1.0×
Scatter Plot — drag to pan, use slider to zoom
Understanding 2D Embedding Projections
Modern AI models encode meaning as high-dimensional vectors. A text embedding model might produce a 1536-dimensional vector for each sentence, while an image model might use 2048 dimensions. These spaces are completely opaque to human inspection. Dimensionality reduction projects these vectors onto a 2D plane while attempting to preserve the neighborhood structure — items that were close in high-dimensional space should remain close in the plot.
This tool takes the output of that reduction step — a CSV of (label, x, y) or (label, x, y, group) — and renders an interactive scatter plot. Colors correspond to groups or categories you assign, making it easy to verify whether an embedding model is clustering your data the way you expect.
PCA: quick exploration. Linear. Preserves global variance. Fast on large datasets.
t-SNE: tight cluster visualization. Non-linear. Distorts global distances. Best for ≤ 50k points.
UMAP: balanced structure. Non-linear. Preserves both local and global structure. Faster than t-SNE.
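The comparison above also mentions PCA, which is not covered by the snippets below; a minimal sketch with scikit-learn (random placeholder data stands in for your embeddings):

```python
import numpy as np
from sklearn.decomposition import PCA

# Placeholder for your (N, D) embedding matrix
rng = np.random.default_rng(42)
embeddings = rng.normal(size=(100, 768))

# Project onto the two directions of greatest variance
pca = PCA(n_components=2)
coords = pca.fit_transform(embeddings)  # (N, 2)
```

Because PCA is linear and deterministic, it needs no random seed or tuning, which makes it a good first look before reaching for t-SNE or UMAP.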
Generating 2D Coordinates with Python
If you have embeddings stored as a NumPy array or a list of vectors, the following snippets produce CSV output ready to paste into this tool.
t-SNE with scikit-learn
import numpy as np
import pandas as pd
from sklearn.manifold import TSNE
# embeddings: np.ndarray of shape (N, D)
# labels: list of N strings
# groups: list of N group strings (optional)
tsne = TSNE(n_components=2, random_state=42)
coords = tsne.fit_transform(embeddings) # (N, 2)
df = pd.DataFrame({
"label": labels,
"x": coords[:, 0],
"y": coords[:, 1],
"group": groups, # omit if no groups
})
print(df.to_csv(index=False))
UMAP with umap-learn
import umap

reducer = umap.UMAP(n_components=2, random_state=42)
coords = reducer.fit_transform(embeddings)  # (N, 2)
# Then build and print the CSV as above
Interpreting Embedding Plots
A well-trained embedding model should produce distinct, cohesive clusters for each semantic category when projected to 2D. If points from different groups are randomly interleaved, the model may not have learned discriminative features for your data. Tight, well-separated clusters generally indicate high embedding quality, though keep in mind that non-linear projections such as t-SNE can exaggerate apparent separation.
Good signs
- Same-group points cluster tightly together
- Different groups have visible separation
- No obvious outliers unless expected
- Smooth gradient between related groups
Warning signs
- Groups completely intermixed — model not discriminating
- Single massive blob — embeddings collapsed
- Many isolated outliers — data quality issues
- Uniform grid layout — projection failed, check parameters
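Beyond eyeballing the plot, one rough way to quantify the separation described above (an optional check, not part of this tool) is the silhouette score computed on the 2D coordinates against your group labels; synthetic data stands in for your projection here:

```python
import numpy as np
from sklearn.metrics import silhouette_score

# Two synthetic, well-separated 2D groups (placeholders for your
# projected coordinates and group labels)
rng = np.random.default_rng(0)
coords = np.vstack([
    rng.normal([-2, 0], 0.3, size=(50, 2)),
    rng.normal([2, 0], 0.3, size=(50, 2)),
])
groups = ["a"] * 50 + ["b"] * 50

# Near 1.0 = tight, separated clusters; near 0 = intermixed groups
score = silhouette_score(coords, groups)
```

A low score on the 2D projection mirrors the "groups completely intermixed" warning sign, but remember it measures the projection, not the original high-dimensional space.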
Frequently Asked Questions
What are vector embeddings and why visualize them?
Vector embeddings are numerical representations of data — words, sentences, images, or any object — in a high-dimensional space, produced by neural networks. Models like Word2Vec, BERT, CLIP, and OpenAI's text-embedding-ada map semantically similar items close together. Because humans cannot perceive more than 3 dimensions, dimensionality reduction algorithms like t-SNE (t-distributed Stochastic Neighbor Embedding), UMAP, or PCA are used to project embeddings down to 2D coordinates while preserving local structure. Visualizing these 2D projections reveals cluster quality, outliers, and semantic relationships in the embedding space.
How do I generate 2D coordinates from my embeddings?
Start with your high-dimensional embeddings (e.g., 768-dimensional BERT vectors). Then run a dimensionality reduction algorithm: use sklearn.manifold.TSNE(n_components=2) in Python for t-SNE, umap.UMAP(n_components=2) for UMAP, or sklearn.decomposition.PCA(n_components=2) for PCA. All three produce a 2-column array of (x, y) coordinates — one row per input item. Export these alongside your labels as a CSV and paste them here.
What is the difference between t-SNE, UMAP, and PCA for embedding visualization?
PCA (Principal Component Analysis) is linear and fast but may not separate non-linear clusters well. It preserves global variance. t-SNE is non-linear and excellent at revealing tight local clusters, but it distorts global distances — two clusters far apart in 2D may not be far apart in the original space. UMAP is also non-linear and generally faster than t-SNE, better preserves both local and global structure, and produces more stable results across runs. For quick exploration use PCA; for publication-quality cluster visualization use UMAP or t-SNE.
What CSV format does this tool accept?
The tool accepts CSV with 3 or 4 columns: label,x,y or label,x,y,group. The first row may optionally be a header (auto-detected). Each subsequent row is one data point. The label column is a string used for tooltips. x and y are floating-point coordinates (your 2D projection output). The optional group column is a string used for color-coding — all points with the same group value share a color. Up to 8 distinct group colors are supported.
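The parsing rules above can be sketched in a few lines of Python; this is an illustration of the accepted format, not the tool's actual implementation (the header is detected by checking whether the x/y columns parse as floats, which matches the described behavior under that assumption):

```python
import csv
import io

def parse_points(text):
    """Parse label,x,y[,group] CSV into point dicts, skipping an
    optional header row whose x/y columns are not numeric."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]
    points = []
    for i, row in enumerate(rows):
        if len(row) not in (3, 4):
            raise ValueError(f"row {i + 1}: expected 3 or 4 columns")
        try:
            x, y = float(row[1]), float(row[2])
        except ValueError:
            if i == 0:
                continue  # non-numeric first row treated as header
            raise
        group = row[3] if len(row) == 4 else "default"
        points.append({"label": row[0], "x": x, "y": y, "group": group})
    return points
```

For instance, `parse_points("label,x,y\na,1.0,2.0")` skips the header and yields one point with group `"default"`.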
Can I export the plot?
Yes. The Download SVG button serializes the current SVG element to a .svg file, which preserves vector quality at any resolution. You can open the SVG in Inkscape, Adobe Illustrator, or a browser to further edit or convert it to PNG/PDF. The downloaded SVG reflects the current zoom and pan state, so adjust the view before downloading for the framing you want.