CoderTools

Charset Converter

Professional character encoding conversion with auto-detection support

Charset Converter Documentation

What is Character Encoding?

Character encoding is a system that maps characters to numbers (code points) and then to bytes. Different encodings use different mappings, which is why text can appear garbled when opened with the wrong encoding. Choosing the correct encoding is crucial for properly displaying and processing multilingual text.
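As a small illustration of the character → code point → bytes mapping described above, the following Python sketch encodes the same characters with several encodings. It uses only Python's built-in codecs; the sample characters are arbitrary choices, not something taken from the tool itself.

```python
# One character has one code point, but different byte sequences per encoding.
e_acute = "é"
code_point = ord(e_acute)                 # Unicode code point U+00E9

e_utf8 = e_acute.encode("utf-8")          # two bytes in UTF-8
e_latin1 = e_acute.encode("latin-1")      # one byte in ISO-8859-1
e_utf16le = e_acute.encode("utf-16-le")   # two bytes in UTF-16 LE

zhong = "中"
zhong_utf8 = zhong.encode("utf-8")        # three bytes in UTF-8
zhong_gbk = zhong.encode("gbk")           # two bytes in GBK

for label, data in [("é UTF-8", e_utf8), ("é Latin-1", e_latin1),
                    ("é UTF-16 LE", e_utf16le), ("中 UTF-8", zhong_utf8),
                    ("中 GBK", zhong_gbk)]:
    print(f"{label:12} {data.hex(' ').upper()}")
```

Because the byte sequences differ, a reader has to know which encoding produced the bytes before it can turn them back into the right characters.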

How to Use This Tool

Text Conversion Mode

  1. Click the 'Text Conversion' tab to enter text conversion mode
  2. Select the source encoding from the dropdown, or use 'Auto Detect' to automatically identify the encoding
  3. Select the target encoding (default is UTF-8, the most universal encoding)
  4. Choose input/output format: Plain Text, Base64, Hex, or C/C++ array format
  5. Enter or paste your text, then click the 'Convert' button. Use 'Copy' to copy the result or 'Download' to save it as a file
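Conceptually, the conversion in the steps above amounts to decoding bytes with the source encoding and re-encoding the text with the target encoding. The sketch below shows that idea with Python's built-in codecs; `convert_bytes` is a hypothetical helper name, and this is not a description of how the tool is implemented internally.

```python
def convert_bytes(data: bytes, source: str, target: str = "utf-8") -> bytes:
    """Decode `data` using `source`, then re-encode it using `target`.

    Hypothetical helper for illustration; raises UnicodeDecodeError or
    UnicodeEncodeError if the data does not fit the chosen encodings.
    """
    text = data.decode(source)   # bytes -> Unicode text
    return text.encode(target)   # Unicode text -> bytes

# "你好" stored as GBK is the byte sequence C4 E3 BA C3.
gbk_bytes = bytes.fromhex("C4E3BAC3")
utf8_bytes = convert_bytes(gbk_bytes, "gbk")
print(utf8_bytes.decode("utf-8"))
```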

File Conversion Mode

  1. Click the 'File Conversion' tab to enter file mode
  2. Drag and drop files into the upload area, or click to select files (multiple files are supported, max 10 MB each)
  3. The system automatically detects each file's encoding and shows it in the 'Source Encoding' column. You can override it manually if needed
  4. Select the target encoding for all files
  5. Click 'Convert All' to convert, then 'Download All' to save the converted files
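For readers who want to script the same kind of file conversion locally, here is a minimal Python sketch. The function name `convert_file` and the `MAX_BYTES` constant are assumptions made for this example (the constant simply mirrors the 10 MB limit mentioned above); the source encoding is passed in explicitly rather than auto-detected.

```python
from pathlib import Path

MAX_BYTES = 10 * 1024 * 1024  # assumption: mirrors the 10 MB per-file limit

def convert_file(src, dst, source_encoding, target_encoding="utf-8"):
    """Read `src` as bytes, decode it, and write it to `dst` re-encoded."""
    raw = Path(src).read_bytes()
    if len(raw) > MAX_BYTES:
        raise ValueError(f"{src} is larger than {MAX_BYTES} bytes")
    text = raw.decode(source_encoding)
    # Writing bytes (not text) avoids any platform newline translation.
    Path(dst).write_bytes(text.encode(target_encoding))
```

A call such as `convert_file("legacy.txt", "legacy-utf8.txt", "cp1252")` would write a UTF-8 copy of a Windows-1252 file; both file names here are placeholders.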

Supported Input/Output Formats

  • Plain Text - Regular text content, typed or pasted directly
  • Base64 - Base64 encoded string, commonly used in email attachments and data URLs
  • Hex - Continuous hexadecimal bytes, e.g., 48656C6C6F
  • Hex with spaces - Space-separated hexadecimal bytes, e.g., 48 65 6C 6C 6F
  • C/C++ Array - C/C++ style byte array format, e.g., 0x48,0x65,0x6C,0x6C,0x6F
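The formats listed above are all different spellings of the same bytes. The following Python sketch parses each format into bytes and renders bytes back out; the function names and the format keys (`"plain"`, `"base64"`, `"hex"`, `"c_array"`) are hypothetical, and plain text is assumed to be UTF-8 for simplicity.

```python
import base64
import re

def parse_input(data: str, fmt: str) -> bytes:
    """Turn one of the textual input formats into raw bytes."""
    if fmt == "plain":
        return data.encode("utf-8")      # assumption: plain text is UTF-8
    if fmt == "base64":
        return base64.b64decode(data)
    if fmt == "hex":
        return bytes.fromhex(data)       # accepts "48656C" and "48 65 6C"
    if fmt == "c_array":
        # Pick out every 0xNN token, ignoring commas, braces and whitespace.
        return bytes(int(tok, 16) for tok in re.findall(r"0x([0-9A-Fa-f]{1,2})", data))
    raise ValueError(f"unknown format: {fmt}")

def format_hex(data: bytes, spaced: bool = False) -> str:
    """Render bytes as uppercase hex, optionally space-separated."""
    return " ".join(f"{b:02X}" for b in data) if spaced else data.hex().upper()

def format_c_array(data: bytes) -> str:
    """Render bytes as a comma-separated list of 0xNN literals."""
    return ",".join(f"0x{b:02X}" for b in data)

hello = parse_input("SGVsbG8=", "base64")
print(format_hex(hello), "|", format_hex(hello, spaced=True), "|", format_c_array(hello))
```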

Common Use Cases

Fix Garbled Text

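Text files or emails look garbled when their bytes are decoded with the wrong encoding. To restore readable content, select the encoding the text was originally written in as the source (or use 'Auto Detect'), then convert it to the encoding you need, typically UTF-8.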
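A typical case is UTF-8 text that was mistakenly read as Windows-1252. The Python sketch below reproduces that kind of garbling and then reverses it; the sample word is an arbitrary choice, and the repair only works when the wrong decoding step did not lose any bytes.

```python
original = "café"

# Simulate the mistake: UTF-8 bytes decoded as Windows-1252.
garbled = original.encode("utf-8").decode("cp1252")

# Undo it: recover the raw bytes, then decode them with the right encoding.
repaired = garbled.encode("cp1252").decode("utf-8")

print(garbled, "->", repaired)
```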

Database Migration

When migrating data between different database systems or servers, use this tool to ensure character encoding consistency and prevent data corruption.

Web Development

Convert legacy web pages to UTF-8 encoding to ensure proper display across modern browsers and different platforms.

Cross-Platform File Sharing

Convert files exchanged between Windows (which often uses legacy code pages such as GBK on Simplified Chinese systems), macOS, and Linux so that text displays correctly on all platforms.

Tips & Best Practices

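  • Use 'Auto Detect' when you're unsure about the source encoding. Detection is heuristic: it is generally reliable for longer samples in most languages, but short or mostly-ASCII input can be ambiguous, so check the result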
  • Enable 'Show Hex' to view actual byte values, useful for debugging encoding issues
  • Add BOM (Byte Order Mark) when creating UTF-8/UTF-16 files for Windows applications that require it
  • For batch file conversion, use the 'File Conversion' tab, which handles multiple files at once
  • When converting between encodings, some characters may not exist in the target encoding and will be replaced with '?' or similar placeholders
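Two of the tips above, adding a BOM and placeholder substitution for unsupported characters, can be reproduced with Python's built-in codecs. This is a sketch of the general behavior, not of the tool's exact output; the sample strings are arbitrary.

```python
import codecs

# UTF-8 with a BOM: the "utf-8-sig" codec prepends EF BB BF.
utf8_with_bom = "hi".encode("utf-8-sig")

# UTF-16 LE with an explicit BOM (FF FE) in front of the encoded text.
utf16le_with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# Latin-1 has no Euro sign, so errors="replace" substitutes '?'.
lossy = "naïve €".encode("latin-1", errors="replace")

# With the default strict error handling, the same conversion fails instead.
try:
    "€".encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(utf8_with_bom, utf16le_with_bom, lossy, strict_failed)
```

Replacement is lossy: once a character has become '?', converting back cannot recover it, so keep the original file when the target encoding is narrower than the source.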

Supported Encodings Reference

This tool supports 30+ character encodings covering major languages and regions worldwide. Below is a reference for the most commonly used ones, grouped by script.

Unicode Encodings

Encoding | Description | Bytes per Character | Specification
UTF-8 | Variable-length Unicode encoding, the most widely used encoding on the web. Backward compatible with ASCII. | 1-4 bytes | RFC 3629
UTF-16 LE | UTF-16 Little Endian, commonly used on Windows systems. Uses 2 or 4 bytes per character. | 2 or 4 bytes | RFC 2781
UTF-16 BE | UTF-16 Big Endian, used in some network protocols and Java. Uses 2 or 4 bytes per character. | 2 or 4 bytes | RFC 2781
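The byte counts in the table can be checked directly. The Python sketch below measures a few sample characters (chosen here for illustration) and shows how the byte order differs between the two UTF-16 variants.

```python
samples = ["A", "é", "中", "😀"]

# Bytes needed per character in UTF-8 and UTF-16.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

# Same code point (U+4E2D), opposite byte order.
zhong_le = "中".encode("utf-16-le")
zhong_be = "中".encode("utf-16-be")

for ch, n8, n16 in zip(samples, utf8_lengths, utf16_lengths):
    print(f"{ch!r}: UTF-8 {n8} byte(s), UTF-16 {n16} byte(s)")
```

Characters outside the Basic Multilingual Plane, such as the emoji in the sample list, are the ones that take 4 bytes in both UTF-8 and UTF-16.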

Chinese Encodings

Encoding | Description | Usage | Specification
GBK | Extension of GB2312, supports 21,003 Chinese characters including traditional characters. Common in Simplified Chinese Windows. | Simplified Chinese Windows, older websites | IANA GBK
GB2312 | Original Chinese national standard (1980), supports 6,763 simplified Chinese characters and 682 symbols. | Legacy systems, emails | GB 2312-1980
GB18030 | Chinese national standard, mandatory in China. Supports all Unicode characters, including those of minority languages. | Modern Chinese systems, government docs | GB 18030-2005
Big5 | Traditional Chinese encoding, primarily used in Taiwan and Hong Kong. Contains roughly 13,000 traditional Chinese characters. | Taiwan, Hong Kong websites | IANA Charset

Japanese Encodings

Encoding | Description | Usage | Specification
Shift_JIS | Japanese encoding developed by ASCII Corporation together with Microsoft, supports the JIS X 0201 and JIS X 0208 character sets. | Windows, older websites, games | IANA Charset
EUC-JP | Extended Unix Code for Japanese, variable-length encoding compatible with ASCII. | Unix/Linux systems, older websites | IANA Charset
ISO-2022-JP | 7-bit Japanese encoding using escape sequences. Also known as JIS encoding. | Japanese emails, older systems | RFC 1468

Korean Encodings

Encoding | Description | Usage | Specification
EUC-KR | Extended Unix Code for Korean, based on the KS X 1001 standard. Supports 8,224 characters, including 2,350 Hangul syllables and 4,888 Hanja. | Korean websites, legacy systems | RFC 1557

Western European Encodings

Encoding | Description | Languages | Specification
ISO-8859-1 | Also known as Latin-1, the first part of the ISO-8859 series. Covers 191 characters from Western European languages. | English, French, German, Spanish, Portuguese, Italian | ISO/IEC 8859-1
ISO-8859-15 | Latin-9, updates Latin-1 with the Euro sign (€) and additional French/Finnish characters. | Western European languages with Euro symbol | ISO/IEC 8859-15
Windows-1252 | Microsoft's extension to Latin-1, adds typographic characters like curly quotes and em-dashes. | Western European languages on Windows | Unicode.org

Cyrillic Encodings

Encoding | Description | Languages | Specification
Windows-1251 | Microsoft's Cyrillic encoding for Windows, supports Russian and other Cyrillic-based languages. | Russian, Ukrainian, Bulgarian, Serbian | Unicode.org
KOI8-R | 8-bit Cyrillic encoding designed for Russian. If the high bit of each byte is stripped, the text degrades to a case-swapped Latin transliteration that is still roughly readable. | Russian | RFC 1489
ISO-8859-5 | ISO standard Cyrillic encoding, part of the ISO-8859 series. Supports basic Cyrillic characters. | Russian, Bulgarian, Macedonian, Serbian | ISO/IEC 8859-5
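The KOI8-R property noted in the table, readability after the high bit is stripped, can be demonstrated in a few lines of Python. The sample word is an arbitrary choice; the sketch simply clears bit 7 of every byte, as an old 7-bit mail channel might have done.

```python
word = "Привет"                      # Russian for "hi"
koi8_bytes = word.encode("koi8-r")

# Clear the high bit of every byte, leaving plain 7-bit ASCII.
stripped = bytes(b & 0x7F for b in koi8_bytes).decode("ascii")

# Letter case comes out inverted, so swapping it back aids readability.
print(stripped, "->", stripped.swapcase())
```

The result is a Latin transliteration with inverted case rather than the original Cyrillic, which is why the text stays legible to a Russian reader even though information about the script is lost.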

Other Encodings

Encoding | Description | Usage | Specification
ASCII | American Standard Code for Information Interchange, the foundation of most modern encodings. 7-bit encoding with 128 characters. | Basic English text, programming | RFC 20
Macintosh | Apple's original character encoding for Mac OS Classic, also known as Mac Roman. | Legacy Mac files, old Mac applications | Unicode.org