Unicode / UTF-8 Inspector

Inspect any text character by character: see code points, UTF-8 bytes, UTF-16 code units, HTML entities, categories, and character names. Supports emoji, combining marks, and multilingual text.

Graphemes: 12
Unique: 10
UTF-8 Bytes: 19
Contains Emoji: Yes
Non-ASCII: Yes
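| Char | Code Point | HTML Entity | UTF-8 Bytes | UTF-16 | Category | Name |
|---|---|---|---|---|---|---|
| H | U+0048 | &#72; | 48 | 0048 | Letter | LATIN CAPITAL LETTER H |
| e | U+0065 | &#101; | 65 | 0065 | Letter | LATIN SMALL LETTER E |
| l | U+006C | &#108; | 6C | 006C | Letter | LATIN SMALL LETTER L |
| l | U+006C | &#108; | 6C | 006C | Letter | LATIN SMALL LETTER L |
| o | U+006F | &#111; | 6F | 006F | Letter | LATIN SMALL LETTER O |
| , | U+002C | &#44; | 2C | 002C | Punctuation | COMMA |
| (space) | U+0020 | &#32; | 20 | 0020 | Space | SPACE |
| 世 | U+4E16 | &#19990; | E4 B8 96 | 4E16 | Other | U+4E16 |
| 界 | U+754C | &#30028; | E7 95 8C | 754C | Other | U+754C |
| ! | U+0021 | &#33; | 21 | 0021 | Punctuation | EXCLAMATION MARK |
| (space) | U+0020 | &#32; | 20 | 0020 | Space | SPACE |
| 🎉 | U+1F389 | &#127881; | F0 9F 8E 89 | D83C DF89 | Emoji | U+1F389 |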

Frequently Asked Questions

What is a Unicode code point?

A Unicode code point is the unique number assigned to each character in the Unicode standard. It is written as U+ followed by four to six hexadecimal digits (e.g., U+0041 for 'A', U+1F389 for 🎉), and code points range from U+0000 to U+10FFFF. Unicode covers over 140,000 characters across writing systems, symbols, and emoji.
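As a small illustration of the notation, the Python sketch below converts between a character and its U+ form. The helper names `to_u_notation` and `from_u_notation` are made up for this example and are not part of the inspector.

```python
def to_u_notation(ch: str) -> str:
    """Format one character's code point as U+XXXX (at least four hex digits)."""
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    return f"U+{ord(ch):04X}"


def from_u_notation(text: str) -> str:
    """Parse a string such as 'U+0041' back into the character it names."""
    if not text.upper().startswith("U+"):
        raise ValueError("expected a string starting with 'U+'")
    return chr(int(text[2:], 16))


print(to_u_notation("A"))         # U+0041
print(to_u_notation("🎉"))        # U+1F389
print(from_u_notation("U+4E16"))  # 世
```

Python strings are sequences of code points, so `ord` and `chr` map directly onto the U+ notation without any encoding step.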

What is the difference between UTF-8 and UTF-16?

UTF-8 is a variable-width encoding that uses 1 to 4 bytes per code point. ASCII characters use just 1 byte, which makes UTF-8 efficient for English text and is one reason it is common on the web. UTF-16 uses 2 or 4 bytes per code point: one 16-bit code unit for characters up to U+FFFF, and a surrogate pair of two code units for everything above. It is common in Windows and Java environments. This tool shows both encodings side by side.
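The difference is easy to see by encoding the same characters both ways. The following is a minimal Python sketch using only the standard library; the helper names `utf8_hex` and `utf16_units` are illustrative, and big-endian UTF-16 without a byte order mark is an assumption made so that each code unit prints as a plain four-digit hex value.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of text as space-separated uppercase hex."""
    return text.encode("utf-8").hex(" ").upper()


def utf16_units(text: str) -> list[str]:
    """Return the UTF-16 code units of text as four-digit uppercase hex strings."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return [data[i:i + 2].hex().upper() for i in range(0, len(data), 2)]


# One character from each UTF-8 length class: 1, 2, 3 and 4 bytes.
for ch in "Aé世🎉":
    units = " ".join(utf16_units(ch))
    print(f"U+{ord(ch):04X}  UTF-8: {utf8_hex(ch)}  UTF-16: {units}")
```

For 🎉 (U+1F389) this gives four UTF-8 bytes, F0 9F 8E 89, and a UTF-16 surrogate pair, D83C DF89, while 'A' is a single byte in UTF-8 and a single code unit in UTF-16.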

Why do some emoji show as multiple rows?

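Many emoji are composed of more than one Unicode code point. Family groups and other compound emoji join several emoji with the Zero Width Joiner (ZWJ, U+200D), skin-tone variations append a modifier code point to a base emoji, and most country flags are pairs of regional indicator symbols. The inspector breaks these into individual code points so you can see exactly what makes up the sequence.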
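A similar breakdown can be reproduced with a few lines of Python. This is only a sketch: the sample sequences and the `describe` helper are choices made for this example, and the names come from Python's `unicodedata` module, so they may not match the inspector's Name column.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def describe(sequence: str) -> list[tuple[str, str]]:
    """Return (code point, name) pairs for every code point in the sequence."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
        for ch in sequence
    ]


# Sample sequences chosen for this sketch, written as escapes so the
# individual code points stay visible in the source.
samples = {
    "family (ZWJ sequence)": "\U0001F468\u200D\U0001F469\u200D\U0001F467",
    "flag (regional indicator pair)": "\U0001F1EF\U0001F1F5",
    "waving hand with skin-tone modifier": "\U0001F44B\U0001F3FD",
}

for label, sequence in samples.items():
    print(f"{label}: {len(sequence)} code points, contains ZWJ: {ZWJ in sequence}")
    for code_point, name in describe(sequence):
        print(f"  {code_point}  {name}")
```

Each sample renders as a single emoji on screen, yet the family sequence consists of five code points, two of which are the joiner U+200D.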

What are the escape output options?

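JavaScript escaping converts non-ASCII characters to \uXXXX sequences that are safe in JS string literals; a character above U+FFFF needs either two such escapes (a surrogate pair) or the \u{XXXXX} form. URL encoding converts each byte of a character's UTF-8 encoding to %XX percent-encoded form for use in URLs. HTML entity encoding converts characters to &#NNN; numeric character references, where NNN is the decimal code point, for safe use in HTML documents.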
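The three formats can be approximated in Python as follows. This is a sketch under stated assumptions rather than the inspector's actual behavior: the function names are hypothetical, ASCII characters are left untouched in the JavaScript and HTML variants, and characters above U+FFFF are written as surrogate pairs in the JavaScript variant.

```python
from urllib.parse import quote


def js_escape(text: str) -> str:
    # Non-ASCII characters become one \uXXXX escape per UTF-16 code unit.
    # Assumption: quotes and backslashes are not escaped here.
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            out.append(ch)
            continue
        data = ch.encode("utf-16-be")
        for i in range(0, len(data), 2):
            out.append("\\u" + data[i:i + 2].hex().upper())
    return "".join(out)


def url_escape(text: str) -> str:
    # Percent-encode the UTF-8 bytes; safe="" also encodes "/" and spaces.
    return quote(text, safe="")


def html_escape_numeric(text: str) -> str:
    # Non-ASCII characters become decimal numeric character references.
    # Assumption: "&", "<" and ">" are left as they are.
    return "".join(ch if ord(ch) < 0x80 else f"&#{ord(ch)};" for ch in text)


sample = "世界! 🎉"
print(js_escape(sample))            # \u4E16\u754C! \uD83C\uDF89
print(url_escape(sample))           # %E4%B8%96%E7%95%8C%21%20%F0%9F%8E%89
print(html_escape_numeric(sample))  # &#19990;&#30028;! &#127881;
```

Output meant for real HTML would additionally need the markup characters escaped (for example with Python's `html.escape`), which this sketch leaves out to keep the focus on non-ASCII handling.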