Conv$

ConvertedString = Conv$ ( String AS String , SourceCharset AS String , DestinationCharset AS String ) AS String
ConvertedString = Conv ( String AS String , SourceCharset AS String , DestinationCharset AS String ) AS String

Converts a string from one charset to another charset. A charset is represented by a string like "ASCII", "ISO-8859-1", or "UTF-8".

The Gambas interpreter internally uses the UTF-8 charset.

The charset used by the system is returned by System.Charset. It was ISO-8859-15 on Mandrake 10.2, but virtually all current Linux systems are UTF-8 based.

The charset used by the graphical user interface is returned by Desktop.Charset. It should always be UTF-8.
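As a minimal sketch of how these properties can be combined with Conv$, the following converts an internal UTF-8 string to the system charset only when the two differ. The variable names and the sample text are hypothetical and chosen for illustration.

```gambas
DIM sText AS String
DIM sOut AS String

sText = "Gambas"   ' hypothetical sample text, held internally as UTF-8

' Convert only when the system charset differs from the interpreter's UTF-8.
IF System.Charset = "UTF-8" THEN
  sOut = sText
ELSE
  sOut = Conv$(sText, "UTF-8", System.Charset)
ENDIF

PRINT System.Charset; ": "; sOut
```

On a UTF-8 system the string is passed through unchanged, so the conversion cost is only paid where it is actually needed.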

Note that not all combinations of encoding names can be used for the SourceCharset and DestinationCharset parameters and that a coded character set can have a number of aliases.

The conversion relies on the GNU iconv() library function and can convert text written in, among many other encodings, Turkish (ISO-8859-9), Korean (EUC-KR), Simplified Chinese (GB2312), Arabic (WINDOWS-1256), Cyrillic (KOI8-R), or Japanese (ISO-2022-JP) into human-readable UTF-8.

For the full list of supported encodings, run iconv -l in a terminal.

The returned string is internally terminated by a single null byte, but that is not enough if you pass the string to an external function expecting, for example, a null-terminated wide-character string (wchar_t in C). On systems where wchar_t is four bytes wide, such as Linux, such a string must end with four null bytes.

To work around that problem, add a null byte to the string before converting it:
Print Conv(MyString & Chr$(0), "UTF-8", "WCHAR_T")

That way, you will be sure that the string is null-terminated in the target character set.

Errors

Message                              Description
Bad string conversion (32)           The string to convert contains untranslatable characters.
Unsupported string conversion (31)   The specified charsets are unknown, or cannot be converted.
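The errors listed above can be caught with TRY, as in the hedged sketch below. It assumes that converting a non-ASCII character to "ASCII" is rejected by the underlying iconv() implementation. The sample text and variable name are hypothetical.

```gambas
DIM sResult AS String

' "é" has no ASCII equivalent, so this conversion is expected to fail.
TRY sResult = Conv$("café", "UTF-8", "ASCII")

IF ERROR THEN
  ' Error.Text holds the message and Error.Code the numeric code.
  PRINT "Conversion failed: "; Error.Text; " ("; Error.Code; ")"
ELSE
  PRINT sResult
ENDIF
```

Checking ERROR right after TRY lets the program fall back to another charset or report the problem instead of stopping.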

Examples

DIM sStr AS String
DIM iInd AS Integer

sStr = Conv$("Gambas", "ASCII", "EBCDIC-US")

FOR iInd = 1 TO Len(sStr)
  PRINT Hex$(Asc(Mid$(sStr, iInd, 1)), 2); " ";
NEXT
C7 81 94 82 81 A2

PRINT Conv$("\xE7\xC1\xCD\xC2\xC1\xD3\x20\xD0\xCF\xDE\xD4\xC9\x20\xCF\xDA\xCE\xC1\xDE\xC1\xC5\xD4\x20\xC2\xC5\xCA\xD3\xC9\xCB","KOI8-R","UTF-8")
Гамбас почти означает бейсик

See also