CoderTools

Text Encoding Converter

Convert text between Hex, Binary, Unicode and more encoding formats

Encoding Converter Documentation

What is Character Encoding?

Character encoding is a system that maps characters to numbers that computers can process. Different encoding schemes serve different purposes, such as storing, transmitting, or displaying text data. Common encodings include ASCII, UTF-8, and UTF-16.

Supported Formats

Hexadecimal (Hex)

Hexadecimal representation using digits 0-9 and letters A-F. Each byte is represented by two hex characters. Widely used in programming and debugging.
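As a rough illustration of the hex format, the sketch below encodes text as UTF-8 and writes each byte as two hex digits. The function names and the `delimiter` and `prefix` parameters are hypothetical, loosely modeled on the Byte Delimiter and Add Prefix options; this is not the tool's actual implementation.

```python
def to_hex(text: str, delimiter: str = " ", prefix: str = "") -> str:
    # Encode as UTF-8 (an assumption), then render each byte as two uppercase hex digits.
    return delimiter.join(f"{prefix}{b:02X}" for b in text.encode("utf-8"))


def from_hex(hex_text: str, delimiter: str = " ", prefix: str = "") -> str:
    # Split on the (non-empty) delimiter, drop the prefix, and parse each token as base 16.
    tokens = hex_text.split(delimiter)
    raw = bytes(int(token[len(prefix):], 16) for token in tokens)
    return raw.decode("utf-8")


print(to_hex("Hi"))                                # 48 69
print(to_hex("Hi", delimiter=", ", prefix="0x"))   # 0x48, 0x69
```

Note that a non-ASCII character such as é occupies two bytes in UTF-8, so it produces two hex pairs (C3 A9).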

Binary

Binary representation using only 0 and 1. Each byte is represented by 8 bits. This is the fundamental data representation used by computers.
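The same idea can be sketched for binary output, again assuming UTF-8 as the byte encoding. The function names and the `delimiter` parameter are hypothetical.

```python
def to_binary(text: str, delimiter: str = " ") -> str:
    # Each UTF-8 byte becomes exactly 8 bits, zero-padded on the left.
    return delimiter.join(f"{b:08b}" for b in text.encode("utf-8"))


def from_binary(bits: str, delimiter: str = " ") -> str:
    # Parse each 8-bit group as base 2 and decode the resulting bytes.
    raw = bytes(int(group, 2) for group in bits.split(delimiter))
    return raw.decode("utf-8")


print(to_binary("Hi"))  # 01001000 01101001
```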

Unicode Escape

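Unicode escape sequences in \uXXXX format, commonly used in JavaScript source code and in JSON strings to represent Unicode characters. Each XXXX is a 16-bit UTF-16 code unit in hexadecimal, so characters above U+FFFF are written as a surrogate pair of two escapes.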
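A minimal sketch of producing and reading \uXXXX escapes follows. It assumes the JavaScript/JSON convention of escaping UTF-16 code units, and it escapes every character, including plain ASCII. The function names are hypothetical.

```python
import json


def to_unicode_escape(text: str) -> str:
    # Encode to big-endian UTF-16 (no BOM) and emit one escape per 2-byte code unit.
    units = text.encode("utf-16-be")
    return "".join(
        f"\\u{int.from_bytes(units[i:i + 2], 'big'):04x}"
        for i in range(0, len(units), 2)
    )


def from_unicode_escape(escaped: str) -> str:
    # Parse the escapes as a JSON string literal; json joins surrogate pairs.
    # Assumes the input contains only escape sequences and no bare quotes.
    return json.loads(f'"{escaped}"')


print(to_unicode_escape("中"))   # \u4e2d
print(to_unicode_escape("😀"))  # \ud83d\ude00
```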

HTML Entity

HTML entity encoding, including named entities (like &amp;) and numeric entities (like &#38; or &#x26;). Used to safely display special characters in HTML.
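Python's standard `html` module covers the common cases and can serve as a quick illustration. The helper `to_numeric_entities` is a hypothetical addition that writes every non-ASCII character as a decimal numeric entity.

```python
import html


def to_numeric_entities(text: str) -> str:
    # Leave ASCII untouched; write every other character as a decimal entity.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


escaped = html.escape("<a & b>")               # named entities for <, >, &
restored = html.unescape("&amp; &#38; &#x26;")  # all three forms decode to "&"

print(escaped)                      # &lt;a &amp; b&gt;
print(restored)                     # & & &
print(to_numeric_entities("café"))  # caf&#233;
```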

Punycode

Encoding scheme for Internationalized Domain Names (IDN). Converts each domain label that contains Unicode characters into an ASCII-compatible encoding, and marks the encoded label with the xn-- prefix.
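For a quick experiment, Python ships `punycode` and `idna` codecs. Note that the built-in `idna` codec follows the older IDNA 2003 rules, so treat this as a sketch rather than a full IDN implementation; the domain name is just an example.

```python
label = "münchen"
domain = "münchen.de"

# Raw Punycode for a single label (no xn-- prefix).
raw_punycode = label.encode("punycode")

# The idna codec encodes label by label and adds the xn-- prefix where needed.
ascii_domain = domain.encode("idna")

# Decoding reverses the process.
unicode_domain = ascii_domain.decode("idna")

print(raw_punycode)    # b'mnchen-3ya'
print(ascii_domain)    # b'xn--mnchen-3ya.de'
print(unicode_domain)  # münchen.de
```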

Common Use Cases

  • View hexadecimal or binary representation of characters during debugging
  • Handle data encoding in network protocols
  • Analyze and fix encoding issues (mojibake)
  • Use Unicode escape sequences in code
  • Handle Internationalized Domain Names (IDN)
  • Character escaping in HTML/XML
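As an example of the mojibake case in the list above, the sketch below reproduces a common failure (UTF-8 bytes misread as Latin-1) and then reverses it. It assumes the wrong decoder was Latin-1; real-world text may have been damaged by a different encoding or more than once.

```python
original = "café"

# Simulate the bug: UTF-8 bytes are decoded with the wrong encoding.
garbled = original.encode("utf-8").decode("latin-1")

# Repair: turn the garbled text back into the original bytes, then decode correctly.
repaired = garbled.encode("latin-1").decode("utf-8")

print(garbled)   # cafÃ©
print(repaired)  # café
```

Viewing the hex bytes of the garbled text (C3 83 C2 A9 instead of C3 A9 for é) is often the fastest way to recognize this pattern.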

Character Set vs Encoding Format

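Character set and encoding format are two different concepts. A character set defines which characters are available and assigns each one a number (examples: ASCII, GB2312, GBK, Unicode), while an encoding format defines how those numbers are stored as bytes (examples: UTF-8, UTF-16). For older standards such as GB2312 and GBK, the same name covers both the character set and its byte encoding. Unicode separates the two: one character set with several encoding formats. For example, the same Chinese text can be stored as GB2312 bytes, or as Unicode characters encoded in UTF-8, and the resulting bytes differ.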

If you need to re-encode text from one character encoding to another (for example GBK to UTF-8, or ISO-8859-1 to UTF-8) rather than just change how its bytes are written out, please use the Character Set Converter tool.
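The distinction is easy to see by encoding one character several ways. The sketch below uses Python's built-in codecs; the character 中 is just an example.

```python
ch = "中"  # Unicode code point U+4E2D

gb2312_bytes = ch.encode("gb2312")    # GB2312 character set and encoding
utf8_bytes = ch.encode("utf-8")       # Unicode character set, UTF-8 format
utf16_bytes = ch.encode("utf-16-be")  # Unicode character set, UTF-16 format

print(gb2312_bytes.hex(" ").upper())  # D6 D0
print(utf8_bytes.hex(" ").upper())    # E4 B8 AD
print(utf16_bytes.hex(" ").upper())   # 4E 2D

# Re-encoding between character sets goes through decoded text.
converted = gb2312_bytes.decode("gb2312").encode("utf-8")
```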

Frequently Asked Questions

What is the difference between encoding and encryption?

Encoding transforms data into another representation using a publicly known scheme — there is no secret key involved, and the process is fully reversible by anyone. Encryption scrambles data using a secret key, so only someone with the key can reverse it. Base64 and hex are encodings; AES and RSA are encryption algorithms.

Why does Base64-encoded text end with = or ==?

Base64 encodes every 3 input bytes into 4 output characters. When the input length is not divisible by 3, the final group is short, and one or two = characters are appended so that the output length is a multiple of 4. One = means the final group held 2 bytes; two == means it held only 1 byte. Some implementations omit padding; unpadded output works only if the decoder accepts it.
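The padding behavior can be checked with Python's standard `base64` module. The helper `b64decode_lenient` is a hypothetical name for one way to accept unpadded input.

```python
import base64


def b64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")


def b64decode_lenient(text: str) -> bytes:
    # Restore stripped padding so the length is a multiple of 4 before decoding.
    return base64.b64decode(text + "=" * (-len(text) % 4))


print(b64(b"Man"))  # TWFu  (3 bytes, no padding)
print(b64(b"Ma"))   # TWE=  (2 bytes in the final group)
print(b64(b"M"))    # TQ==  (1 byte in the final group)
print(b64decode_lenient("TQ"))  # b'M'
```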

What is the difference between ASCII and Unicode?

ASCII is a 7-bit encoding that covers 128 characters: 95 printable ones (A-Z, a-z, 0-9, common punctuation, and space) plus 33 control characters. Unicode is a character repertoire standard covering over 140,000 characters across all writing systems. UTF-8, UTF-16, and UTF-32 are different ways to encode Unicode code points as bytes. UTF-8 is backward-compatible with ASCII: the first 128 code points are encoded as the same single bytes.
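A short sketch makes the relationship concrete: ASCII text yields identical bytes in UTF-8, while other characters take a different number of bytes in each Unicode encoding. The sample characters are arbitrary.

```python
def byte_lengths(ch: str) -> dict:
    # Little-endian variants are used so that no byte order mark is added.
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-le")),
        "utf-32": len(ch.encode("utf-32-le")),
    }


ascii_matches_utf8 = "Hello".encode("ascii") == "Hello".encode("utf-8")

print(ascii_matches_utf8)   # True
print(byte_lengths("A"))    # {'utf-8': 1, 'utf-16': 2, 'utf-32': 4}
print(byte_lengths("中"))    # {'utf-8': 3, 'utf-16': 2, 'utf-32': 4}
print(byte_lengths("😀"))   # {'utf-8': 4, 'utf-16': 4, 'utf-32': 4}
```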

When should I use hex encoding instead of Base64?

Hex (Base16) represents each byte as two hexadecimal characters, which doubles the size but keeps every byte instantly readable for technical inspection. That makes it handy for debugging byte streams, cryptographic keys, and binary protocol values. Base64 adds only about 33% overhead, so its output is roughly a third shorter than hex, and it is preferred when transmitting binary data in JSON or email, or in URLs with the URL-safe alphabet.
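The size difference is simple to measure. The sketch below encodes the same 30 arbitrary bytes both ways using the standard library.

```python
import base64

data = bytes(range(30))  # 30 arbitrary bytes

hex_text = data.hex()
b64_text = base64.b64encode(data).decode("ascii")

print(len(data))      # 30
print(len(hex_text))  # 60  (2 characters per byte)
print(len(b64_text))  # 40  (4 characters per 3 bytes)
```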

What does the Unicode code point U+XXXX notation mean?

U+XXXX is the standard notation for a Unicode code point, where XXXX is a hexadecimal number. For example, U+0041 is the Latin capital letter A, and U+4E2D is the Chinese character 中. Code points range from U+0000 to U+10FFFF. The U+ prefix was introduced by the Unicode Consortium to distinguish code points from byte values.
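Converting between a character and its U+XXXX notation takes only a few lines. The function names below are hypothetical.

```python
def to_code_point(ch: str) -> str:
    # At least four uppercase hex digits, as in the U+XXXX convention.
    return f"U+{ord(ch):04X}"


def from_code_point(notation: str) -> str:
    # Expects a string that starts with "U+" followed by hex digits.
    value = int(notation[2:], 16)
    if not 0 <= value <= 0x10FFFF:
        raise ValueError("code point out of range")
    return chr(value)


print(to_code_point("A"))         # U+0041
print(to_code_point("中"))         # U+4E2D
print(to_code_point("😀"))        # U+1F600
print(from_code_point("U+4E2D"))  # 中
```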
