Text Encoding Explorer
See how characters are represented as bytes. Understand ASCII, UTF-8, and Unicode without diving into implementation details.
How Text Encoding Works
The Core Concept
Computers store everything as numbers. A character set assigns a number to each character, and a text encoding defines how that number is stored as bytes.
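A minimal Python sketch of those two steps, using only built-ins: `ord()` gives a character's number, and `str.encode()` produces the bytes that are actually stored. The helper name `describe` is hypothetical.

```python
def describe(ch: str) -> tuple[int, bytes]:
    """Return a character's number (its code point) and its UTF-8 bytes."""
    number = ord(ch)            # step 1: character -> number
    data = ch.encode("utf-8")   # step 2: that number, encoded as bytes
    return number, data


for ch in ["A", "é", "€"]:
    number, data = describe(ch)
    print(ch, number, data.hex(" "))

# Decoding runs the steps in reverse: bytes -> number -> character.
print(b"\xe2\x82\xac".decode("utf-8"))
```

Running it prints `A 65 41`, `é 233 c3 a9`, and `€ 8364 e2 82 ac`: each character has one number, but the number of bytes varies.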
ASCII (7-bit)
The oldest of these three standards. It uses the numbers 0-127 to represent basic English letters, digits, punctuation, and control characters. Each character fits in 7 bits and is normally stored in one byte.
Example
'A' = 65 = 01000001 in binary
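The same example as a short Python sketch. `ascii_bits` is a hypothetical helper; `format(code, "08b")` pads the number to a full 8-bit byte, and the leading bit is always 0 because ASCII only needs 7 bits.

```python
def ascii_bits(ch: str) -> str:
    """Return the 8-bit binary form of an ASCII character."""
    code = ord(ch)
    if code > 127:
        raise ValueError(f"{ch!r} is outside ASCII (0-127)")
    return format(code, "08b")  # pad to one full byte


print(ascii_bits("A"))          # 01000001
print("A".encode("ascii"))      # b'A', a single byte

# Characters outside 0-127 cannot be encoded as ASCII at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as err:
    print("not ASCII:", err.reason)
```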
Unicode
A universal catalog that assigns a unique number (a code point) to the characters of nearly every writing system, plus symbols and emoji. Code points are written as U+ followed by four to six hexadecimal digits (e.g., U+0041 for 'A').
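A small sketch of the U+ notation (the helper `code_point_label` is hypothetical): the code point is printed in uppercase hexadecimal, padded to at least four digits, so characters beyond U+FFFF simply show more digits.

```python
def code_point_label(ch: str) -> str:
    """Format a character's code point in U+ notation."""
    # At least four uppercase hex digits; larger code points use five or six.
    return f"U+{ord(ch):04X}"


for ch in ["A", "é", "€", "\U0001F600"]:  # the last one is the emoji 😀
    print(ch, code_point_label(ch))

# Going the other way: chr() turns a code point back into a character.
print(chr(0x41))
```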
UTF-8
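The most widely used encoding for storing Unicode text. It is variable-length: ASCII characters use 1 byte with the same values as in ASCII (so plain ASCII text is already valid UTF-8), and all other characters use 2-4 bytes.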
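A rough sketch of the kind of summary this explorer reports, assuming that "characters" means Unicode code points (so an accented letter built from a base letter plus a combining mark would count as two). The function name `analyze` and its dictionary keys are hypothetical.

```python
def analyze(text: str) -> dict[str, int]:
    """Count characters, UTF-8 bytes, and 1-byte vs multi-byte characters."""
    per_char = [len(ch.encode("utf-8")) for ch in text]  # bytes per code point
    return {
        "characters": len(text),                           # code points
        "utf8_bytes": sum(per_char),
        "ascii_chars": sum(1 for n in per_char if n == 1),  # U+0000 to U+007F
        "multi_byte": sum(1 for n in per_char if n > 1),
    }


for ch in "Aé€\U0001F600":
    data = ch.encode("utf-8")
    print(ch, len(data), data.hex(" "))

print(analyze("Café €5"))
# {'characters': 7, 'utf8_bytes': 10, 'ascii_chars': 5, 'multi_byte': 2}
```

A character uses exactly one UTF-8 byte only when it is in the ASCII range, which is why the byte count alone is enough to classify it.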
| Code Point Range | Bytes | Example |
|---|---|---|
| U+0000 to U+007F | 1 byte | A, B, 1, 2 |
| U+0080 to U+07FF | 2 bytes | é, ñ, α |
| U+0800 to U+FFFF | 3 bytes | 中, 日, € |
| U+10000 to U+10FFFF | 4 bytes | 😀, 🎉 |
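Each row of the table corresponds to a fixed bit pattern: continuation bytes start with `10`, and the leading bits of the first byte say how many bytes the character uses. The sketch below (the function `utf8_encode` is hypothetical) encodes one code point by hand and checks itself against Python's built-in codec. It rejects U+D800 to U+DFFF, which sits inside the 3-byte row but is reserved for UTF-16 surrogates and cannot be encoded in UTF-8.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point to UTF-8 by hand, following the range table."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if cp <= 0x7F:                          # 1 byte: 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                        # 3 bytes: 1110xxxx + 2 continuation bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),        # 4 bytes: 11110xxx + 3 continuation bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# Check the hand-written encoder against Python's built-in codec.
for ch in "Aéñα中日€\U0001F600":
    encoded = utf8_encode(ord(ch))
    assert encoded == ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(encoded), "byte(s)", encoded.hex(" "))
```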