July 13, 2004

UTF-8 conversion support for mIRC

mIRC's lack of UTF-8 support has been an issue for quite some time. Its author promised to 'look at it', but in the meantime, chatting in UTF-8 is not possible. This is problematic for any language that uses more than the occasional accented letter.

So I decided to make a temporary fix myself. The result is a flexible conversion mechanism between UTF-8 and the ANSI codepages. The user sees and types regular ANSI characters, but all data sent to and received from the IRC server is UTF-8 encoded. You are still limited to one ANSI codepage at a time, though: making mIRC support real Unicode is not possible without an mIRC rewrite.

The script performs real UTF-8 encoding and decoding, so unlike a simple 'find and replace' approach, characters that do not fit into the current codepage are indicated as such.
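
To illustrate the idea outside of mIRC, here is a minimal sketch in Python. It is not the code from utf-8.mrc; the function names and the '?' marker are my own assumptions, and it leans on Python's built-in codecs rather than hand-made tables. Incoming bytes are decoded as UTF-8, each character is looked up in the chosen codepage, and anything that does not fit is replaced by a visible marker.

```python
def utf8_to_ansi(data: bytes, codepage: str = "cp1252", marker: str = "?") -> bytes:
    """Convert UTF-8 bytes (as received from the server) to one ANSI codepage.

    Hypothetical helper: characters with no slot in the codepage become
    `marker` instead of being dropped or mangled.
    """
    # Malformed UTF-8 decodes to U+FFFD, which no ANSI codepage contains,
    # so it ends up as the marker too.
    text = data.decode("utf-8", errors="replace")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(codepage)
        except UnicodeEncodeError:
            out += marker.encode(codepage)
    return bytes(out)


def ansi_to_utf8(data: bytes, codepage: str = "cp1252") -> bytes:
    """Convert ANSI bytes (as typed by the user) to UTF-8 for sending."""
    # Bytes that are undefined in the codepage decode to U+FFFD.
    return data.decode(codepage, errors="replace").encode("utf-8")
```

With codepage 1252, an accented Latin letter survives the round trip, while Cyrillic text arriving from the server shows up as a run of markers. Switching the codepage argument to "cp1251" makes the Cyrillic text readable instead.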

I included conversion tables for all of the Windows-125x ANSI codepages:

1250 (ANSI - Central Europe)
1251 (ANSI - Cyrillic)
1252 (ANSI - Western Europe / Latin I)
1253 (ANSI - Greek)
1254 (ANSI - Turkish)
1255 (ANSI - Hebrew)
1256 (ANSI - Arabic)
1257 (ANSI - Baltic)
1258 (ANSI/OEM - Viet Nam)

There is also a little utility (with source) for generating conversion tables for more codepages.
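
For a rough idea of what such a table contains, here is a small Python sketch. It is not the bundled utility and says nothing about the table format the script expects; `build_table` is a hypothetical name, and it assumes a single-byte codepage that Python's codecs know about. It maps every byte in the upper half of the codepage to its Unicode code point and skips bytes the codepage leaves undefined.

```python
def build_table(codepage: str) -> dict[int, int]:
    """Map each byte 0x80..0xFF of a single-byte codepage to a Unicode code point.

    Hypothetical helper for illustration; bytes 0x00..0x7F are plain ASCII
    in the Windows ANSI codepages, so they are left out.
    """
    table = {}
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codepage)
        except UnicodeDecodeError:
            # This byte has no character assigned in the codepage.
            continue
        table[byte] = ord(ch)
    return table


if __name__ == "__main__":
    # Print the table for codepage 1252 as "byte -> code point" pairs.
    for byte, codepoint in sorted(build_table("cp1252").items()):
        print(f"0x{byte:02X} -> U+{codepoint:04X}")
```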

For instructions on how to use it, check the top of the utf-8.mrc file. You can download the script here (19 KB).

Important: This script is provided as-is without any guarantees. Use it if you like it, but don't bug me if you can't get it to work. If you find bugs, feel free to report them, but try to give a little more information than just 'it doesn't work'.

Steven Wittens