Charset Converter
Professional character encoding conversion with auto-detection support
Charset Converter Documentation
What is Character Encoding?
Character encoding is a system that maps characters to numbers (code points) and then to bytes. Different encodings use different mappings, which is why text can appear garbled when opened with the wrong encoding. Choosing the correct encoding is crucial for properly displaying and processing multilingual text.
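The short Python illustration below shows why the wrong encoding garbles text. It uses Python's built-in codecs only for demonstration; they are not necessarily what this tool uses internally. The same string produces different bytes under different encodings, and decoding UTF-8 bytes as Latin-1 yields the classic garbled output:

```python
text = "café"

utf8_bytes = text.encode("utf-8")      # é (U+00E9) becomes two bytes: C3 A9
latin1_bytes = text.encode("latin-1")  # é becomes one byte: E9

# Reading UTF-8 bytes with the wrong encoding turns C3 A9 into "Ã©".
garbled = utf8_bytes.decode("latin-1")

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
print(garbled)       # cafÃ©
```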
How to Use This Tool
Text Conversion Mode
- Click the 'Text Conversion' tab to enter text conversion mode
- Select the source encoding from the dropdown, or use 'Auto Detect' to automatically identify the encoding
- Select the target encoding (default is UTF-8, the most universal encoding)
- Choose input/output format: Plain Text, Base64, Hex, or C/C++ array format
- Enter or paste your text, then click the 'Convert' button. Use 'Copy' to copy the result or 'Download' to save it as a file
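Conceptually, a conversion has two steps. First, decode the input bytes with the source encoding into Unicode text. Then encode that text with the target encoding. Below is a minimal Python sketch of that idea. `convert` is a hypothetical helper, not this tool's actual code, and the `errors` parameter mirrors Python's codec error handlers:

```python
def convert(data: bytes, source: str, target: str, errors: str = "strict") -> bytes:
    """Re-encode `data` from the `source` encoding to the `target` encoding."""
    text = data.decode(source)          # step 1: bytes -> Unicode text
    return text.encode(target, errors)  # step 2: Unicode text -> target bytes


gbk_bytes = b"\xd6\xd0\xce\xc4"  # "中文" encoded in GBK
print(convert(gbk_bytes, "gbk", "utf-8").hex(" ").upper())  # E4 B8 AD E6 96 87
```

With `errors="strict"`, a character missing from the target raises an error. With `errors="replace"`, it becomes `?` instead.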
File Conversion Mode
- Click the 'File Conversion' tab to enter file mode
- Drag and drop files into the upload area, or click to select files (supports multiple files, max 10MB each)
- The tool automatically detects each file's encoding and shows it in the 'Source Encoding' column. You can override it manually if needed
- Select the target encoding for all files
- Click 'Convert All' to convert, then 'Download All' to save the converted files
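Encoding detection is heuristic. One simple approach is trial decoding: try candidate encodings in order and keep the first one that decodes without error. The sketch below shows it; this is not necessarily how this tool detects encodings, and real detectors use statistical models. Several details are assumptions for illustration:

- The candidate list and its order.
- The hypothetical `detect_encoding` and `convert_batch` names.
- Reading "10MB" as 10 MiB.

The ordering is crude. Because GB18030 accepts most byte sequences, later candidates are reached only when GB18030 rejects the input.

```python
from typing import Dict, Optional, Tuple

MAX_BYTES = 10 * 1024 * 1024  # assumption: "10MB" interpreted as 10 MiB

# Hypothetical candidate list; order matters. UTF-8 goes first because
# non-UTF-8 text rarely forms valid UTF-8 byte sequences by accident.
CANDIDATES = ("utf-8", "gb18030", "shift_jis", "euc_kr", "cp1252")


def detect_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate that decodes `data` without errors."""
    for enc in CANDIDATES:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


def convert_batch(
    files: Dict[str, bytes],
    target: str = "utf-8",
    overrides: Optional[Dict[str, str]] = None,
) -> Dict[str, Tuple[str, bytes]]:
    """Convert each file to `target`; returns name -> (status, converted bytes)."""
    overrides = overrides or {}
    results: Dict[str, Tuple[str, bytes]] = {}
    for name, data in files.items():
        if len(data) > MAX_BYTES:
            results[name] = ("skipped: larger than 10MB", b"")
            continue
        # A manually selected source encoding takes precedence over detection.
        source = overrides.get(name) or detect_encoding(data)
        if source is None:
            results[name] = ("skipped: encoding not detected", b"")
            continue
        results[name] = (f"{source} -> {target}", data.decode(source).encode(target))
    return results
```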
Supported Input/Output Formats
- Plain Text - Regular text content, directly input or paste
- Base64 - Base64 encoded string, commonly used in email attachments and data URLs
- Hex - Continuous hexadecimal bytes, e.g., 48656C6C6F
- Hex with spaces - Space-separated hexadecimal bytes, e.g., 48 65 6C 6C 6F
- C/C++ Array - C/C++ style byte array format, e.g., 0x48,0x65,0x6C,0x6C,0x6F
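The formats above are just different text renderings of the same bytes. Below is a sketch of how each can be produced and parsed with Python's standard library. The hypothetical `to_formats` and `from_format` helpers and their format keys are illustrative assumptions, not this tool's API:

```python
import base64


def to_formats(data: bytes) -> dict:
    """Render raw bytes in each of the listed text formats."""
    return {
        "base64": base64.b64encode(data).decode("ascii"),
        "hex": data.hex().upper(),
        "hex_spaces": data.hex(" ").upper(),  # the sep argument needs Python 3.8+
        "c_array": ",".join(f"0x{b:02X}" for b in data),
    }


def from_format(s: str, fmt: str) -> bytes:
    """Parse one of the formats above back into raw bytes."""
    if fmt == "base64":
        return base64.b64decode(s, validate=True)
    if fmt in ("hex", "hex_spaces"):
        return bytes.fromhex(s)  # skips ASCII whitespace (Python 3.7+)
    if fmt == "c_array":
        # int(..., 16) accepts the 0x prefix and surrounding whitespace.
        return bytes(int(tok, 16) for tok in s.split(",") if tok.strip())
    raise ValueError(f"unknown format: {fmt}")


print(to_formats("Hello".encode("utf-8")))
```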
Common Use Cases
Fix Garbled Text
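When a file or email displays as garbled text, it was usually decoded with the wrong encoding. Select the encoding the text was originally written in as the source (or use 'Auto Detect') and convert it to the correct target encoding to restore readable content.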
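Text that has already been garbled can sometimes be repaired by reversing the mistake. Re-encode it with the encoding that was wrongly used, which recovers the original bytes, then decode those bytes with the right encoding. The sketch below assumes the wrong decode was lossless, which holds for Latin-1 because it maps every byte to a character. If the garbled text already had characters replaced by `?`, the original bytes are gone. `repair_mojibake` is a hypothetical helper:

```python
def repair_mojibake(garbled: str, wrong: str = "latin-1", right: str = "utf-8") -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return garbled.encode(wrong).decode(right)


print(repair_mojibake("cafÃ©"))  # café

# The same trick for GBK-encoded Chinese text that was misread as Latin-1.
misread = "中文".encode("gbk").decode("latin-1")
print(repair_mojibake(misread, right="gbk"))  # 中文
```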
Database Migration
When migrating data between different database systems or servers, use this tool to ensure character encoding consistency and prevent data corruption.
Web Development
Convert legacy web pages to UTF-8 encoding to ensure proper display across modern browsers and different platforms.
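When a page is re-encoded, its own charset declaration should match the new bytes, or the browser may decode it wrongly. Below is a rough sketch of converting a legacy page and updating the declaration. The regular expression is a naive assumption that only handles simple `charset=...` attributes; a real migration would use an HTML parser. `html_to_utf8` is a hypothetical helper:

```python
import re

# Naive pattern for charset="..." or charset=... declarations (assumption).
CHARSET_RE = re.compile(r"""charset=(["']?)[\w-]+""", re.IGNORECASE)


def html_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode a legacy HTML page as UTF-8 and rewrite its charset declaration."""
    text = raw.decode(source_encoding)
    text = CHARSET_RE.sub(r"charset=\1utf-8", text)  # keep the original quote style
    return text.encode("utf-8")


page = b'<meta charset="windows-1252"><p>caf\xe9 \x80</p>'
print(html_to_utf8(page, "cp1252").decode("utf-8"))
```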
Cross-Platform File Sharing
Convert files between systems with different default encodings, such as GBK on Simplified Chinese Windows and UTF-8 on macOS and Linux, so that text displays correctly on all platforms.
Tips & Best Practices
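- Use 'Auto Detect' when you're unsure about the source encoding. Detection is heuristic: it works well for most longer texts but can misidentify short samples or closely related encodings, so check the result before relying on it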
- Enable 'Show Hex' to view actual byte values, useful for debugging encoding issues
- Add a BOM (Byte Order Mark) when creating UTF-8 or UTF-16 files for Windows applications that require one
- For batch file conversion, use the 'File Conversion' tab which supports multiple files simultaneously
- When converting between encodings, some characters may not exist in the target encoding and will be replaced with '?' or similar placeholders
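The BOM and replacement tips can be illustrated with Python's codecs. This is a sketch of the general behavior, not of this tool's internals:

- The `utf-8-sig` codec writes and strips the UTF-8 BOM (EF BB BF).
- `codecs.BOM_UTF16_LE` is the little-endian UTF-16 BOM (FF FE).
- `errors="replace"` substitutes `?` for characters the target encoding lacks.

```python
import codecs

text = "café €"

# UTF-8 with a BOM: "utf-8-sig" prepends EF BB BF on encode and strips it on decode.
utf8_bom = text.encode("utf-8-sig")
assert utf8_bom.startswith(codecs.BOM_UTF8)
assert utf8_bom.decode("utf-8-sig") == text

# UTF-16 LE with an explicit BOM in front of the little-endian code units.
utf16le_bom = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Lossy conversion: é and € do not exist in ASCII, so each becomes "?".
ascii_bytes = text.encode("ascii", errors="replace")
print(ascii_bytes)  # b'caf? ?'
```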
Supported Encodings Reference
This tool supports 30+ character encodings covering major languages and regions worldwide. The tables below describe the most commonly used ones, grouped by language or region.
Unicode Encodings
| Encoding | Description | Bytes per Character | Specification |
|---|---|---|---|
| UTF-8 | Variable-length Unicode encoding, the most widely used encoding on the web. Backward compatible with ASCII. | 1-4 bytes | RFC 3629 |
| UTF-16 LE | UTF-16 Little Endian, commonly used on Windows systems. Uses 2 or 4 bytes per character. | 2/4 bytes | RFC 2781 |
| UTF-16 BE | UTF-16 Big Endian, used in some network protocols and Java. Uses 2 or 4 bytes per character. | 2/4 bytes | RFC 2781 |
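The variable lengths in the table can be checked directly. This Python sketch counts the bytes each sample character needs in UTF-8 and UTF-16, and shows how LE and BE differ only in byte order. `encoded_lengths` is a hypothetical helper:

```python
def encoded_lengths(ch: str):
    """Return (code point, UTF-8 byte count, UTF-16 byte count) for one character."""
    return (f"U+{ord(ch):04X}", len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))


for ch in ["A", "é", "中", "😀"]:
    print(ch, *encoded_lengths(ch))
# A U+0041 1 2
# é U+00E9 2 2
# 中 U+4E2D 3 2
# 😀 U+1F600 4 4   (outside the BMP: a UTF-16 surrogate pair)

# Same code unit, opposite byte order.
print("A".encode("utf-16-le").hex(" "), "A".encode("utf-16-be").hex(" "))  # 41 00 00 41
```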
Chinese Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| GBK | Extended GB2312, supports 21,003 Chinese characters including traditional characters. Common in Simplified Chinese Windows. | Simplified Chinese Windows, older websites | IANA GBK |
| GB2312 | Original Chinese national standard (1980), supports 6,763 simplified Chinese characters and 682 symbols. | Legacy systems, emails | GB 2312-1980 |
| GB18030 | Chinese national standard, mandatory in China. Successor to GBK and largely backward compatible with it; covers all Unicode characters, including those of minority languages. | Modern Chinese systems, government docs | GB 18030-2005 |
| Big5 | Traditional Chinese encoding, primarily used in Taiwan and Hong Kong. Contains 13,060 traditional Chinese characters. | Taiwan, Hong Kong websites | IANA Charset |
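The coverage differences between the Chinese encodings show up when you try to encode specific characters. The Python sketch below uses three characters:

- 中, which is shared by simplified and traditional Chinese.
- 國, which is traditional only.
- An emoji outside the Basic Multilingual Plane.

Per the table, 國 should be missing from GB2312, and only GB18030 should handle the emoji. `try_encode` is a hypothetical helper, and the codec names are Python's:

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


for ch in ["中", "國", "😀"]:
    row = {enc: try_encode(ch, enc) for enc in ("gb2312", "gbk", "gb18030", "big5")}
    print(ch, row)
```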
Japanese Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| Shift_JIS | Japanese encoding developed by ASCII Corporation together with Microsoft; supports the JIS X 0201 and JIS X 0208 character sets. | Windows, older websites, games | IANA Charset |
| EUC-JP | Extended Unix Code for Japanese, variable-length encoding compatible with ASCII. | Unix/Linux systems, older websites | IANA Charset |
| ISO-2022-JP | 7-bit Japanese encoding using escape sequences. Also known as JIS encoding. | Japanese emails, older systems | RFC 1468 |
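The three Japanese encodings structure their bytes quite differently, as this Python sketch shows. The codec names are Python's:

- **ISO-2022-JP** stays 7-bit and switches character sets with escape sequences.
- **EUC-JP** sets the high bit on every byte of a kanji.
- **Shift_JIS** can use ASCII-range trail bytes. For example, the second byte of 表 is 0x5C, the backslash, which is a well-known source of bugs in software that is not Shift_JIS-aware.

```python
text = "日本"  # "Japan"

for enc in ("shift_jis", "euc_jp", "iso2022_jp"):
    print(enc, text.encode(enc).hex(" "))

jis = text.encode("iso2022_jp")
# ISO-2022-JP: ESC $ B switches to JIS X 0208, ESC ( B switches back to ASCII.
assert all(b < 0x80 for b in jis)
assert jis.startswith(b"\x1b$B") and jis.endswith(b"\x1b(B")

# EUC-JP: both bytes of each kanji have the high bit set.
assert all(b >= 0xA1 for b in text.encode("euc_jp"))

# Shift_JIS: a trail byte can collide with ASCII "\" (0x5C).
assert "表".encode("shift_jis")[1:] == b"\\"
```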
Korean Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| EUC-KR | Extended Unix Code for Korean, based on the KS X 1001 standard. Covers 2,350 precomposed Hangul syllables and 4,888 Hanja. | Korean websites, legacy systems | RFC 1557 |
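As a quick illustration, this Python sketch shows that EUC-KR uses two high-bit bytes (0xA1 to 0xFE) per Hangul syllable. UTF-8 uses three bytes for the same syllable, so legacy Korean text is often smaller in EUC-KR. `euc_kr` is Python's codec name:

```python
text = "한글"  # "Hangul"

euc = text.encode("euc_kr")
utf8 = text.encode("utf-8")

print(euc.hex(" "), len(euc), len(utf8))  # 2 bytes per syllable vs 3 in UTF-8
assert all(0xA1 <= b <= 0xFE for b in euc)
assert euc.decode("euc_kr") == text
```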
Western European Encodings
| Encoding | Description | Languages | Specification |
|---|---|---|---|
| ISO-8859-1 | Also known as Latin-1, the first part of ISO-8859 series. Covers 191 characters from Western European languages. | English, French, German, Spanish, Portuguese, Italian | ISO/IEC 8859-1 |
| ISO-8859-15 | Latin-9, updates Latin-1 with Euro sign (€) and additional French/Finnish characters. | Western European languages with Euro symbol | ISO/IEC 8859-15 |
| Windows-1252 | Microsoft's extension to Latin-1, adds typographic characters like curly quotes and em-dashes. | Western European languages on Windows | Unicode.org |
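The differences between the three Western European encodings are concentrated in a few byte positions. The Python sketch below makes them visible:

- **Latin-1** has no € or curly quotes.
- **ISO-8859-15** puts € at 0xA4, where Latin-1 has ¤.
- **Windows-1252** fills the 0x80 to 0x9F range with characters such as € (0x80) and “ (0x93).

`try_encode` is a hypothetical helper, and the codec names are Python's:

```python
def try_encode(text: str, encoding: str):
    """Return the encoded bytes, or None if the encoding lacks the character."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


for ch in ["é", "€", "\u201c"]:  # \u201c is the left curly quote
    print(ch, {enc: try_encode(ch, enc) for enc in ("latin-1", "iso8859_15", "cp1252")})

# The same byte means different things in each encoding.
print(b"\xa4".decode("latin-1"), b"\xa4".decode("iso8859_15"))  # ¤ €
print(b"\x80".decode("cp1252"))  # €
```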
Cyrillic Encodings
| Encoding | Description | Languages | Specification |
|---|---|---|---|
| Windows-1251 | Microsoft's Cyrillic encoding for Windows, supports Russian and other Cyrillic-based languages. | Russian, Ukrainian, Bulgarian, Serbian | Unicode.org |
| KOI8-R | 8-bit Cyrillic encoding designed for Russian. Letters are arranged so that stripping the high bit leaves a rough Latin transliteration (with letter case inverted), keeping text partly readable. | Russian | RFC 1489 |
| ISO-8859-5 | ISO standard Cyrillic encoding, part of ISO-8859 series. Supports basic Cyrillic characters. | Russian, Bulgarian, Macedonian, Serbian | ISO/IEC 8859-5 |
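The same Russian word gets different bytes in each Cyrillic encoding, which is why mixing them up produces garbage. This Python sketch also shows the KOI8-R property described in the table: clearing the high bit of each byte leaves a rough Latin transliteration, with lowercase Cyrillic becoming uppercase Latin. The codec names are Python's:

```python
word = "привет"  # "hello"

# Three different byte sequences for the same word.
for enc in ("cp1251", "koi8_r", "iso8859_5"):
    print(enc, word.encode(enc).hex(" "))

# KOI8-R: strip the high bit (b & 0x7F) from every byte.
koi8 = word.encode("koi8_r")
stripped = bytes(b & 0x7F for b in koi8).decode("ascii")
print(stripped)  # PRIWET
```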
Other Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| ASCII | American Standard Code for Information Interchange, the foundation of most modern encodings. 7-bit encoding with 128 characters. | Basic English text, programming | RFC 20 |
| Macintosh | Apple's original character encoding for Mac OS Classic, also known as Mac Roman. | Legacy Mac files, old Mac applications | Unicode.org |
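ASCII is the common 7-bit subset, so plain English text has identical bytes in ASCII, UTF-8 and Mac Roman. Above 0x7F the encodings diverge. The Python sketch below shows é at 0x8E in Mac Roman, a byte that Windows-1252 reads as Ž. `mac_roman` and `cp1252` are Python's codec names:

```python
text = "Hello"
# Pure ASCII text is byte-for-byte identical in these encodings.
assert text.encode("ascii") == text.encode("utf-8") == text.encode("mac_roman")

# Above 0x7F they diverge: é is 0x8E in Mac Roman but 0xE9 in Latin-1.
print("é".encode("mac_roman"), "é".encode("latin-1"))  # b'\x8e' b'\xe9'

# Reading Mac Roman bytes as Windows-1252 garbles the text.
print(b"caf\x8e".decode("cp1252"))  # cafŽ

# ASCII cannot represent é at all.
try:
    "é".encode("ascii")
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)
```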