Multi-byte strings

To support strings with special characters, the OS uses a custom encoding in which some characters take not one byte, but two. Here’s an example of such a string:

 /* The literal is split after \x02 so the following 'c' is not parsed as a hex digit. */
 const char* string = "S\xE6\x16me \xE6\x02" "c\xE6\x08\xE6\x0Bnts";

When printed to screen with a function prepared for dealing with multi-byte strings, the previous string would be shown like this:

[Image 1: the example string as rendered on screen]

You can see that certain characters are encoded as they would be on any other platform that supports ASCII. But the character set used by the OS is non-standard: certain characters do not correspond to their common ASCII meaning, and characters above 0x7F do not correspond to any known character encoding. For example, if a string contains a line feed character (code 10, usually written in C as “\n”), none of the known text printing syscalls interprets it as a line break; instead, they display a graphical representation of the code:

[Image 2: a line feed character rendered as a graphical symbol]

There are various syscalls for handling multi-byte strings, including detection of the “leading” byte and of the second byte, and even special versions of strcpy, strcmp and strcat (which aren’t really necessary, as the usual implementations of these functions appear to work just fine with multi-byte strings). So far, the only documented one is MB_ElementCount, which returns the number of characters in a string as they would be printed.
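To illustrate the difference between the byte length and the printed character count, here is a minimal sketch in portable C. It is not the OS implementation of MB_ElementCount. The helper names `is_lead_byte` and `count_chars` are hypothetical, and the sketch assumes that 0xE6 is the only leading byte, because it is the only one that appears in the example above; the real set of leading bytes is likely larger.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Assumption: 0xE6 is the only leading byte,
   since it is the only one shown in the example string. */
static int is_lead_byte(unsigned char c)
{
    return c == 0xE6;
}

/* Hypothetical helper: counts characters as they would be printed,
   treating a leading byte plus the byte after it as one character. */
static size_t count_chars(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        if (is_lead_byte((unsigned char)*s) && s[1] != '\0')
            s += 2;   /* two-byte character */
        else
            s += 1;   /* single-byte character, or a dangling lead byte */
        n++;
    }
    return n;
}
```

Under this assumption, the example string occupies 16 bytes according to strlen, while `count_chars` reports 12 characters.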

Characters supported

Most accented Latin characters are defined as multi-byte characters. The whole Greek alphabet and the Russian Cyrillic alphabet also appear to be supported as multi-byte characters.

CJK Text

Text in Chinese and related languages is supported using the Chinese standard GB 18030 encoding. When using PrintXY, the first two bytes of the provided string are passed to ProcessPrintChars to set the requested encoding, and the encoding is reset to the default afterwards. For other display syscalls like PrintCXY, the character set must be set manually using either ProcessPrintChars or EnableGB18030 (or DisableGB18030 to switch back to the default Latin character set).
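The two-byte prefix that PrintXY consumes can be pictured with a small sketch of the buffer layout. The helper `build_prefixed_string` is hypothetical and not part of the OS or any SDK. The selector byte values are not given here, so they are left as parameters; the sketch assumes the visible text starts right after those two bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes two encoding-selector bytes followed by
   the text into dst. Returns 0 on success, -1 if dst is too small.
   Assumption: the visible text begins at offset 2. */
static int build_prefixed_string(char *dst, size_t dst_size,
                                 unsigned char sel0, unsigned char sel1,
                                 const char *text)
{
    size_t len = strlen(text);
    if (dst_size < len + 3)          /* 2 prefix bytes + text + NUL */
        return -1;
    dst[0] = (char)sel0;
    dst[1] = (char)sel1;
    memcpy(dst + 2, text, len + 1);  /* copies the terminating NUL too */
    return 0;
}
```

A buffer built this way would then be handed to PrintXY as its string argument, while syscalls such as PrintCXY would receive the text without the prefix after the character set has been selected separately.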

The CJK font is wider than the default, reducing the homescreen text width from 21 to 16 characters. See Fonts for more information.

Font support

Not all fonts included in the OS support every character; in fact, some lack many ASCII codes that other OS fonts do support. See Fonts for more information.