Charset Converter

Professional character encoding conversion with auto-detection support

🔒 100% Local Processing: Your input data is processed entirely in your browser. It is not uploaded to any server.

Charset Converter Documentation

What is Character Encoding?

Character encoding is a system that maps characters to numbers (code points) and then to bytes. Different encodings use different mappings, which is why text can appear garbled when opened with the wrong encoding. Choosing the correct encoding is crucial for properly displaying and processing multilingual text.
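
As a small illustration of the character, code point, and byte layers described above, the following Python sketch encodes one character in two encodings and then decodes the bytes with the wrong one. It is only an example of the general idea, not the tool's own implementation.

```python
# One character, one code point, but different bytes per encoding.
char = "é"
code_point = ord(char)                  # 233, written U+00E9
utf8_bytes = char.encode("utf-8")       # two bytes: C3 A9
latin1_bytes = char.encode("latin-1")   # one byte: E9

# Decoding UTF-8 bytes as Latin-1 produces the classic garbled result.
garbled = utf8_bytes.decode("latin-1")

# Character counts and byte counts differ for non-ASCII text.
word = "héllo"
char_count = len(word)
byte_count = len(word.encode("utf-8"))

print(f"U+{code_point:04X}", utf8_bytes.hex(" "), latin1_bytes.hex(" "), garbled)
print(char_count, "characters,", byte_count, "bytes in UTF-8")
```

The same text therefore has one character count but a byte count that depends on the encoding.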

How to Use This Tool

Text Conversion Mode

1. Click the 'Text Conversion' tab to enter text conversion mode.
2. Select the source encoding from the dropdown, or choose 'Auto Detect' to identify the encoding automatically.
3. Select the target encoding (the default is UTF-8, the most widely supported encoding).
4. Choose the input and output formats: Plain Text, Base64, Hex, Hex with spaces, or C/C++ Array.
5. Enter or paste your text, then click the 'Convert' button. Use 'Copy' to copy the result or 'Download' to save it as a file.

File Conversion Mode

1. Click the 'File Conversion' tab to enter file mode.
2. Drag and drop files into the upload area, or click to select files (multiple files are supported; large files may affect performance).
3. The tool automatically detects each file's encoding and shows it in the 'Source Encoding' column. You can override it manually if needed.
4. Select the target encoding for all files.
5. Click 'Convert All' to convert, then 'Download All' to save the converted files.
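
For readers who want to script the same workflow outside the browser, here is a minimal Python sketch of the file conversion steps above. The helper names `guess_encoding` and `convert_file` are hypothetical, and the detection shown is a simple "first candidate that decodes cleanly" stand-in; the tool's actual detection method is not documented here.

```python
import tempfile
from pathlib import Path
from typing import Optional, Sequence


def guess_encoding(data: bytes,
                   candidates: Sequence[str] = ("utf-8", "gb18030", "shift_jis")) -> Optional[str]:
    """Return the first candidate that decodes the data without errors (a crude heuristic)."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return None


def convert_file(src: Path, dst: Path, target: str = "utf-8",
                 source: Optional[str] = None) -> str:
    """Re-encode src into dst; a manual source encoding overrides the guess."""
    data = src.read_bytes()
    source = source or guess_encoding(data)
    if source is None:
        raise ValueError(f"could not determine the encoding of {src}")
    dst.write_bytes(data.decode(source).encode(target))
    return source


# Demo: write a GBK file, convert it to UTF-8, and read the result back.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "legacy.txt"
    dst = Path(tmp) / "converted.txt"
    src.write_bytes("你好".encode("gbk"))
    detected = convert_file(src, dst)
    converted = dst.read_bytes()

print(detected, converted.decode("utf-8"))
```

The order of the candidate list matters: permissive single-byte encodings accept almost any input, so they belong at the end of the list, if they are included at all.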

Supported Input/Output Formats

Plain Text - Regular text content, directly input or paste
Base64 - Base64 encoded string, commonly used in email attachments and data URLs
Hex - Continuous hexadecimal bytes, e.g., 48656C6C6F
Hex with spaces - Space-separated hexadecimal bytes, e.g., 48 65 6C 6C 6F
C/C++ Array - C/C++ style byte array format, e.g., 0x48,0x65,0x6C,0x6C,0x6F
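
The formats listed above are all different spellings of the same bytes. The Python sketch below shows one way to parse and produce them; `parse_input` and `format_output` are hypothetical helper names, and treating Plain Text as UTF-8 is an assumption made only for this example.

```python
import base64
import re


def parse_input(data: str, fmt: str) -> bytes:
    """Turn one of the textual formats into raw bytes."""
    if fmt == "plain":
        return data.encode("utf-8")          # assumption: plain text is taken as UTF-8
    if fmt == "base64":
        return base64.b64decode(data)
    if fmt == "hex":
        return bytes.fromhex(data)           # also accepts space-separated hex
    if fmt == "c_array":
        tokens = re.findall(r"0[xX]([0-9a-fA-F]{1,2})", data)
        return bytes(int(tok, 16) for tok in tokens)
    raise ValueError(f"unknown format: {fmt}")


def format_output(raw: bytes, fmt: str) -> str:
    """Render raw bytes in one of the textual formats."""
    if fmt == "base64":
        return base64.b64encode(raw).decode("ascii")
    if fmt == "hex":
        return raw.hex().upper()
    if fmt == "hex_spaced":
        return " ".join(f"{b:02X}" for b in raw)
    if fmt == "c_array":
        return ",".join(f"0x{b:02X}" for b in raw)
    raise ValueError(f"unknown format: {fmt}")


for fmt, text in [("hex", "48656C6C6F"), ("hex", "48 65 6C 6C 6F"),
                  ("c_array", "0x48,0x65,0x6C,0x6C,0x6F"), ("base64", "SGVsbG8=")]:
    print(fmt, parse_input(text, fmt))
```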

Common Use Cases

Fix Garbled Text

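Garbled text in a file or email usually means the bytes were decoded with the wrong encoding. Select the encoding the text was actually written in as the source, then convert to the encoding you need (typically UTF-8) to restore readable content.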
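
The repair can be sketched in Python as follows, assuming you already know, or can guess, both the encoding that was wrongly applied and the one the text was really written in. The helper name `repair_mojibake` is hypothetical.

```python
def repair_mojibake(garbled: str, wrong: str, actual: str) -> str:
    """Undo a wrong decode: recover the original bytes, then decode them correctly."""
    return garbled.encode(wrong).decode(actual)


# Russian text saved as Windows-1251 but opened as ISO-8859-1.
original = "Привет"
garbled = original.encode("cp1251").decode("latin-1")
repaired = repair_mojibake(garbled, wrong="latin-1", actual="cp1251")

print(garbled, "->", repaired)
```

This only works when the wrong decode was lossless. If characters were already replaced with '?' along the way, the original bytes cannot be recovered.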

Database Migration

When migrating data between different database systems or servers, use this tool to ensure character encoding consistency and prevent data corruption.

Web Development

Convert legacy web pages to UTF-8 encoding to ensure proper display across modern browsers and different platforms.

Cross-Platform File Sharing

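Convert files between Windows, macOS, and Linux systems so that text displays correctly on all platforms. For example, Simplified Chinese Windows commonly uses GBK, while macOS and Linux typically default to UTF-8.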

Tips & Best Practices

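Use 'Auto Detect' when you're unsure about the source encoding. Detection accuracy is high for most languages, but very short inputs and closely related encodings can be misidentified, so override the result manually if the output looks wrong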
Enable 'Show Hex' to view actual byte values, useful for debugging encoding issues
Add BOM (Byte Order Mark) when creating UTF-8/UTF-16 files for Windows applications that require it
For batch file conversion, use the 'File Conversion' tab which supports multiple files simultaneously
When converting between encodings, some characters may not exist in the target encoding and will be replaced with '?' or similar placeholders
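
Two of the tips above, the BOM and the '?' placeholders, can be illustrated with Python's standard codecs. This is a sketch of the general behaviour, not of how this tool implements either option.

```python
import codecs

# Add a BOM: the "utf-8-sig" codec writes EF BB BF before the text.
with_bom = "hi".encode("utf-8-sig")
has_bom = with_bom.startswith(codecs.BOM_UTF8)

# Remove a BOM if present: "utf-8-sig" also strips it when decoding.
without_bom = with_bom.decode("utf-8-sig")

# For UTF-16 the BOM additionally records the byte order.
utf16_le_with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")

# Characters missing from the target encoding become '?' with errors="replace".
lossy = "naïve €5".encode("ascii", errors="replace")

print(with_bom, without_bom, utf16_le_with_bom, lossy)
```

The replacement is one-way: once 'ï' and '€' have become '?', converting back cannot restore them.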

Supported Encodings Reference

This tool supports 30+ character encodings covering major languages and regions worldwide. Below is a detailed reference for each supported encoding.

Unicode Encodings

Encoding	Description	Byte Range	Specification
UTF-8	Variable-length Unicode encoding, the most widely used encoding on the web. Backward compatible with ASCII.	1-4 bytes	RFC 3629
UTF-16 LE	UTF-16 Little Endian, commonly used on Windows systems. Uses 2 or 4 bytes per character.	2/4 bytes	RFC 2781
UTF-16 BE	UTF-16 Big Endian, used in some network protocols and Java. Uses 2 or 4 bytes per character.	2/4 bytes	RFC 2781
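
The byte ranges in the table can be checked with a short Python sketch. The sample characters are chosen here only to hit each length class.

```python
def encoded_lengths(ch: str) -> dict:
    """Number of bytes one character occupies in each Unicode encoding."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-16-be")}


# ASCII letter, Latin-1 letter, Euro sign, emoji outside the Basic Multilingual Plane.
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X}", encoded_lengths(ch))

# Byte order only changes the order of the bytes within each 16-bit unit.
little_endian = "A".encode("utf-16-le")
big_endian = "A".encode("utf-16-be")
```

Characters above U+FFFF take 4 bytes in UTF-16 because they are stored as a surrogate pair.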

Chinese Encodings

Encoding	Description	Usage	Specification
GBK	Extended GB2312, supports 21,003 Chinese characters including traditional characters. Common in Simplified Chinese Windows.	Simplified Chinese Windows, older websites	IANA GBK
GB2312	Original Chinese national standard (1980), supports 6,763 simplified Chinese characters and 682 symbols.	Legacy systems, emails	GB 2312-1980
GB18030	Current Chinese national standard for character encoding, mandatory in China. Can represent every Unicode character, including those used by minority languages.	Modern Chinese systems, government docs	GB 18030-2005 (revised as GB 18030-2022)
Big5	Traditional Chinese encoding, primarily used in Taiwan and Hong Kong. Contains 13,060 traditional Chinese characters.	Taiwan, Hong Kong websites	IANA Charset
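
The difference in coverage between these encodings can be probed with Python's built-in codecs, which may differ in edge cases from the tables this tool uses. The helper name `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# The same character has different bytes in mainland and Taiwanese encodings.
zhong_gbk = "中".encode("gbk")      # D6 D0
zhong_big5 = "中".encode("big5")    # A4 A4

# Traditional characters are missing from GB2312 but present in GBK.
print("國 in GB2312:", can_encode("國", "gb2312"))
print("國 in GBK:   ", can_encode("國", "gbk"))

# GB18030 reaches beyond GBK by using 4-byte sequences.
print("😀 in GBK:    ", can_encode("😀", "gbk"))
print("😀 in GB18030:", len("😀".encode("gb18030")), "bytes")
```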

Japanese Encodings

Encoding	Description	Usage	Specification
Shift_JIS	Microsoft's Japanese encoding, supports JIS X 0201 and JIS X 0208 character sets.	Windows, older websites, games	IANA Charset
EUC-JP	Extended Unix Code for Japanese, variable-length encoding compatible with ASCII.	Unix/Linux systems, older websites	IANA Charset
ISO-2022-JP	7-bit Japanese encoding using escape sequences. Also known as JIS encoding.	Japanese emails, older systems	RFC 1468
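
To make the structural difference between the three Japanese encodings concrete, this Python sketch encodes the same hiragana character with each of them, using Python's codec names `shift_jis`, `euc_jp`, and `iso2022_jp`.

```python
ch = "あ"

sjis = ch.encode("shift_jis")     # 82 A0
eucjp = ch.encode("euc_jp")       # A4 A2
jis = ch.encode("iso2022_jp")     # ESC $ B, then 24 22, then ESC ( B

# ISO-2022-JP switches character sets with escape sequences and stays 7-bit.
is_seven_bit = all(b < 0x80 for b in jis)

print(sjis.hex(" "), eucjp.hex(" "), jis.hex(" "), is_seven_bit)
```

The escape sequences are why ISO-2022-JP could pass through mail systems that only handled 7-bit data.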

Korean Encodings

Encoding	Description	Usage	Specification
EUC-KR	Extended Unix Code for Korean, based on the KS X 1001 standard, which defines 2,350 Hangul syllables and 4,888 Hanja.	Korean websites, legacy systems	RFC 1557

Western European Encodings

Encoding	Description	Languages	Specification
ISO-8859-1	Also known as Latin-1, the first part of ISO-8859 series. Covers 191 characters from Western European languages.	English, French, German, Spanish, Portuguese, Italian	ISO/IEC 8859-1
ISO-8859-15	Latin-9, updates Latin-1 with Euro sign (€) and additional French/Finnish characters.	Western European languages with Euro symbol	ISO/IEC 8859-15
Windows-1252	Microsoft's extension to Latin-1, adds typographic characters like curly quotes and em-dashes.	Western European languages on Windows	Unicode.org
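
These three encodings agree on most bytes and differ on only a few, which makes mix-ups hard to spot. The Python sketch below decodes the bytes where they disagree.

```python
# Byte A4: generic currency sign in Latin-1, Euro sign in Latin-9.
currency_latin1 = b"\xa4".decode("iso-8859-1")
currency_latin9 = b"\xa4".decode("iso-8859-15")

# Windows-1252 puts the Euro sign and curly quotes in the 80-9F range.
euro_cp1252 = b"\x80".decode("cp1252")
quotes_cp1252 = b"\x93Hi\x94".decode("cp1252")

# In ISO-8859-1 the same range holds invisible C1 control characters.
control_latin1 = b"\x93".decode("iso-8859-1")

print(currency_latin1, currency_latin9, euro_cp1252, quotes_cp1252, repr(control_latin1))
```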

Cyrillic Encodings

Encoding	Description	Languages	Specification
Windows-1251	Microsoft's Cyrillic encoding for Windows, supports Russian and other Cyrillic-based languages.	Russian, Ukrainian, Bulgarian, Serbian	Unicode.org
KOI8-R	8-bit Cyrillic encoding designed for Russian. Text remains roughly readable as a Latin transliteration even when the high bit is stripped.	Russian	RFC 1489
ISO-8859-5	ISO standard Cyrillic encoding, part of ISO-8859 series. Supports basic Cyrillic characters.	Russian, Bulgarian, Macedonian, Serbian	ISO/IEC 8859-5
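
The high-bit property of KOI8-R mentioned in the table can be demonstrated with a few lines of Python. The helper name `strip_high_bit` is hypothetical.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a 7-bit channel by clearing the top bit of every byte."""
    return bytes(b & 0x7F for b in data)


koi8 = "Привет".encode("koi8-r")
stripped = strip_high_bit(koi8).decode("ascii")

print(stripped)
```

The result is a Latin transliteration with the letter case inverted, because KOI8-R places lowercase Cyrillic letters 128 positions above uppercase Latin ones.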

Other Encodings

Encoding	Description	Usage	Specification
ASCII	American Standard Code for Information Interchange, the foundation of most modern encodings. 7-bit encoding with 128 characters.	Basic English text, programming	RFC 20
Macintosh	Apple's original character encoding for Mac OS Classic, also known as Mac Roman.	Legacy Mac files, old Mac applications	Unicode.org
