Embedding Visualizer

Paste a CSV with label, x, y (and an optional group) columns to plot 2D vector embeddings. Points are color-coded by group, with hover tooltips, a zoom slider, and drag-to-pan. Download the plot as SVG.

CSV Input

CSV Format Guide

  • label,x,y (3 columns; group defaults to "default")
  • label,x,y,group (4 columns; points color-coded by group)
  • The first row may be a header (auto-detected).
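For instance, a valid 4-column input might look like this (labels and coordinates are illustrative):

```
label,x,y,group
cat,-1.20,0.80,animals
dog,-1.05,0.95,animals
laptop,1.40,-0.60,tech
pizza,0.30,1.70,food
```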

Example dataset stats: 19 points in 4 groups (animals, tech, food, sports), X range [-2.40, 2.70], Y range [-2.30, 2.40], zoom 1.0×.

Scatter Plot — drag to pan, use slider to zoom


Understanding 2D Embedding Projections

Modern AI models encode meaning as high-dimensional vectors. A text embedding model might produce a 1536-dimensional vector for each sentence, while an image model might use 2048 dimensions. These spaces are completely opaque to human inspection. Dimensionality reduction projects these vectors onto a 2D plane while attempting to preserve the neighborhood structure — items that were close in high-dimensional space should remain close in the plot.

This tool takes the output of that reduction step — a CSV of (label, x, y) or (label, x, y, group) — and renders an interactive scatter plot. Colors correspond to groups or categories you assign, making it easy to verify whether an embedding model is clustering your data the way you expect.

  • PCA (quick exploration): linear; preserves global variance; fast on large datasets.
  • t-SNE (tight cluster visualization): non-linear; distorts global distances; best for ≤ 50k points.
  • UMAP (balanced structure): non-linear; preserves both local and global structure; faster than t-SNE.

Generating 2D Coordinates with Python

If you have embeddings stored as a NumPy array or a list of vectors, the following snippets produce CSV output ready to paste into this tool.

t-SNE with scikit-learn

import numpy as np
import pandas as pd
from sklearn.manifold import TSNE

# embeddings: np.ndarray of shape (N, D)
# labels: list of N strings
# groups: list of N group strings (optional)

tsne = TSNE(n_components=2, random_state=42)  # default perplexity=30 requires N > 30
coords = tsne.fit_transform(embeddings)  # (N, 2)

df = pd.DataFrame({
    "label": labels,
    "x": coords[:, 0],
    "y": coords[:, 1],
    "group": groups,          # omit if no groups
})
print(df.to_csv(index=False))

UMAP with umap-learn

import umap

reducer = umap.UMAP(n_components=2, random_state=42)
coords = reducer.fit_transform(embeddings)  # (N, 2)

# Then build and print the CSV as above

Interpreting Embedding Plots

A well-trained embedding model should produce distinct, cohesive clusters for each semantic category when projected to 2D. If points from different groups are randomly interleaved, the model may not have learned discriminative features for your data. Tight, well-separated clusters are a good sign, but keep in mind that non-linear projections (t-SNE in particular) can exaggerate apparent separation, so confirm patterns across parameter settings or methods before drawing conclusions.

Good signs

  • Same-group points cluster tightly together
  • Different groups have visible separation
  • No obvious outliers unless expected
  • Smooth gradient between related groups

Warning signs

  • Groups completely intermixed — model not discriminating
  • Single massive blob — embeddings collapsed
  • Many isolated outliers — data quality issues
  • Uniform grid layout — projection failed, check parameters
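As a rough numeric companion to the visual checks above, you can compare between-group centroid distances to within-group spread on the 2D coordinates. This is an illustrative sketch using only NumPy; the ratio is an ad-hoc heuristic, not a standard metric (for a principled measure, consider silhouette score):

```python
import numpy as np

def separation_ratio(coords, groups):
    """Mean distance between group centroids divided by mean
    within-group spread. Higher is better; values near or below
    1 suggest the groups are intermixed in the 2D projection."""
    coords = np.asarray(coords, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    centroids = np.array([coords[groups == g].mean(axis=0) for g in labels])
    # Mean distance of each point to its own group's centroid
    spreads = [np.linalg.norm(coords[groups == g] - c, axis=1).mean()
               for g, c in zip(labels, centroids)]
    within = np.mean(spreads)
    # Mean pairwise distance between distinct centroids
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    between = dists[np.triu_indices(len(labels), k=1)].mean()
    return between / within

# Two tight, well-separated blobs produce a large ratio
pts = np.array([[0, 0], [0.1, 0], [5, 5], [5.1, 5]])
grp = np.array(["a", "a", "b", "b"])
print(separation_ratio(pts, grp))
```

A ratio well above 1 corresponds to the "visible separation" good sign; a ratio near 1 matches the "groups completely intermixed" warning.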

Frequently Asked Questions

What are vector embeddings and why visualize them?

Vector embeddings are numerical representations of data — words, sentences, images, or any object — in a high-dimensional space, produced by neural networks. Models like Word2Vec, BERT, CLIP, and OpenAI's text-embedding-ada-002 map semantically similar items close together. Because humans cannot perceive more than 3 dimensions, dimensionality reduction algorithms like t-SNE (t-distributed Stochastic Neighbor Embedding), UMAP, or PCA are used to project embeddings down to 2D coordinates while preserving local structure. Visualizing these 2D projections reveals cluster quality, outliers, and semantic relationships in the embedding space.

How do I generate 2D coordinates from my embeddings?

Start with your high-dimensional embeddings (e.g., 768-dimensional BERT vectors). Then run a dimensionality reduction algorithm: use sklearn.manifold.TSNE(n_components=2) in Python for t-SNE, umap.UMAP(n_components=2) for UMAP, or sklearn.decomposition.PCA(n_components=2) for PCA. All three produce a 2-column array of (x, y) coordinates — one row per input item. Export these alongside your labels as a CSV and paste them here.
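As a dependency-light illustration of the PCA option, the same projection can be computed directly with NumPy via SVD of the centered data. This is a minimal sketch: sklearn.decomposition.PCA does the equivalent (its output may differ by sign flips per axis) and adds conveniences like explained-variance reporting:

```python
import numpy as np

def pca_2d(embeddings):
    """Project (N, D) embeddings onto their top two principal
    components, matching PCA(n_components=2) up to sign flips."""
    X = np.asarray(embeddings, dtype=float)
    X = X - X.mean(axis=0)                       # center each dimension
    # SVD of centered data; rows of Vt are the principal axes
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                          # (N, 2) coordinates

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 768))                # stand-in embeddings
coords = pca_2d(emb)
print(coords.shape)  # (100, 2)
```

The first output column carries the most variance by construction, which is why PCA plots often look wider than they are tall.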

What is the difference between t-SNE, UMAP, and PCA for embedding visualization?

PCA (Principal Component Analysis) is linear and fast but may not separate non-linear clusters well. It preserves global variance. t-SNE is non-linear and excellent at revealing tight local clusters, but it distorts global distances — two clusters far apart in 2D may not be far apart in the original space. UMAP is also non-linear and generally faster than t-SNE, better preserves both local and global structure, and produces more stable results across runs. For quick exploration use PCA; for publication-quality cluster visualization use UMAP or t-SNE.

What CSV format does this tool accept?

The tool accepts CSV with 3 or 4 columns: label,x,y or label,x,y,group. The first row may optionally be a header (auto-detected). Each subsequent row is one data point. The label column is a string used for tooltips. x and y are floating-point coordinates (your 2D projection output). The optional group column is a string used for color-coding — all points with the same group value share a color. Up to 8 distinct group colors are supported.
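A sketch of how such input can be parsed in Python, including a simple header auto-detection rule (treat the first row as a header when its x/y fields are not numeric). The tool's actual detection logic may differ; this is an illustrative reimplementation:

```python
import csv
import io

def parse_points(text):
    """Parse label,x,y[,group] CSV text into a list of point dicts."""
    rows = [r for r in csv.reader(io.StringIO(text)) if r]

    def is_num(s):
        try:
            float(s)
            return True
        except ValueError:
            return False

    # Header auto-detection: skip row 1 if its coordinates aren't numeric
    if rows and len(rows[0]) >= 3 and not (is_num(rows[0][1]) and is_num(rows[0][2])):
        rows = rows[1:]

    points = []
    for r in rows:
        points.append({
            "label": r[0],
            "x": float(r[1]),
            "y": float(r[2]),
            "group": r[3] if len(r) > 3 else "default",  # 3-column fallback
        })
    return points

sample = "label,x,y,group\ncat,-1.2,0.8,animals\nrouter,1.4,-0.6\n"
pts = parse_points(sample)
print(len(pts), pts[1]["group"])  # 2 default
```

Note that rows with 3 and 4 columns can be mixed; the shorter rows simply fall into the "default" group.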

Can I export the plot?

Yes. The Download SVG button serializes the current SVG element to a .svg file, which preserves vector quality at any resolution. You can open the SVG in Inkscape, Adobe Illustrator, or a browser to further edit or convert it to PNG/PDF. The downloaded SVG reflects the current zoom and pan state, so adjust the view before downloading for the framing you want.