Text Encoding Explorer

See how characters are represented as bytes. Understand ASCII, UTF-8, and Unicode without diving into implementation details.


Computers store everything as numbers. Text encoding is simply a system that assigns a number to each character. That number is then stored as bytes.

ASCII is one of the oldest encoding standards still in use. It uses the numbers 0-127 to represent basic English letters, digits, and symbols, so each character fits in 1 byte.

Example
'A' = 65 = 01000001 in binary
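The same lookup can be reproduced in a few lines of Python. This is only a small sketch using the built-in `ord` and `format` functions; the variable names are illustrative.

```python
# Look up the number assigned to a character and show it as 8 binary digits.
ch = "A"
code = ord(ch)               # the character's number
bits = format(code, "08b")   # the same number, zero-padded to 8 bits
print(f"{ch!r} = {code} = {bits}")  # 'A' = 65 = 01000001

# Encoding the character as ASCII produces exactly one byte.
encoded = ch.encode("ascii")
print(len(encoded))  # 1
```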

Unicode is a universal catalog that assigns a unique number (a code point) to each character, covering virtually every writing system in use. Code points are written as U+XXXX in hexadecimal (e.g., U+0041 for 'A').
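As a rough illustration of the U+XXXX notation, the sketch below formats a character's code point in Python. The helper name `code_point` is hypothetical, and the non-ASCII samples are written as escapes so the snippet does not depend on the file's own encoding.

```python
def code_point(ch: str) -> str:
    """Format one character's Unicode code point as U+XXXX (hypothetical helper)."""
    # ord() returns the code point; it is shown in hex, padded to at least 4 digits.
    return f"U+{ord(ch):04X}"

# Samples: A, e with acute accent, euro sign, globe emoji
for ch in ["A", "\u00e9", "\u20ac", "\U0001F30D"]:
    print(ch, code_point(ch))
```

Code points above U+FFFF, such as the emoji, simply use more than four hex digits.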

UTF-8 is the most common encoding for storing Unicode text. It is variable-length: ASCII characters use 1 byte, and all other characters use 2 to 4 bytes.
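To see the variable length in practice, here is a minimal sketch that encodes a few sample characters and prints their bytes in hex. The helper name `utf8_hex` and the choice of samples are assumptions made for illustration.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 bytes of a character as space-separated hex (hypothetical helper)."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

# Samples: A (1 byte), e with acute accent (2), euro sign (3), globe emoji (4)
for ch in ["A", "\u00e9", "\u20ac", "\U0001F30D"]:
    size = len(ch.encode("utf-8"))
    print(f"{ch}  {size} byte(s)  {utf8_hex(ch)}")
```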

Characters ≠ Bytes. A string's length in characters is often different from its size in bytes.

UTF-8 ≠ Unicode. Unicode is the character catalog. UTF-8 is one way to encode those characters into bytes.

Emojis are multi-byte. A single emoji like 🌍 takes 4 bytes in UTF-8, and emoji built from several code points take more.
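The gap between character count and byte count can be checked directly. The sketch below is one possible way to do it; `text_stats` and its keys are hypothetical names, and "characters" here means Unicode code points as counted by Python's `len`, which can differ from what a reader perceives as one character.

```python
def text_stats(text: str) -> dict:
    """Summarize a string's size in code points and UTF-8 bytes (hypothetical helper)."""
    ascii_chars = sum(1 for ch in text if ord(ch) < 128)
    return {
        "characters": len(text),                  # code points, not bytes
        "utf8_bytes": len(text.encode("utf-8")),  # size once encoded
        "ascii_chars": ascii_chars,               # 1 byte each in UTF-8
        "multi_byte": len(text) - ascii_chars,    # 2 to 4 bytes each in UTF-8
    }

# "Hi " followed by the globe emoji: 4 characters but 7 bytes.
print(text_stats("Hi \U0001F30D"))
```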

32 = space
48-57 = 0-9
65-90 = A-Z
97-122 = a-z
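The ranges above can be turned into a simple classifier. This is a sketch only; the function name `ascii_kind` and the labels it returns are illustrative choices, not part of any standard.

```python
def ascii_kind(ch: str) -> str:
    """Classify a character using the ASCII ranges listed above (hypothetical helper)."""
    code = ord(ch)
    if code == 32:
        return "space"
    if 48 <= code <= 57:
        return "digit"
    if 65 <= code <= 90:
        return "uppercase"
    if 97 <= code <= 122:
        return "lowercase"
    return "other"

for ch in [" ", "7", "Q", "q", "!"]:
    print(repr(ch), ord(ch), ascii_kind(ch))
```

One consequence of this layout is that each lowercase letter sits exactly 32 above its uppercase counterpart ('A' is 65, 'a' is 97).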