Re: [linux-minidisc] [commit] Linux utilities to deal with Sony Minidisc Walkman branch, master, updated. 0.1.0-9-g763a17a

Michael Karcher <Michael.Karcher@fu-berlin.de> · Mon, 11 Apr 2011 00:09:50 +0200

On Sunday, 10.04.2011, at 23:27 +0200, manner.moe@gmx.de wrote:
> g_convert has the following syntax:
> gchar * g_convert (const gchar *str, gssize len, const gchar *to_codeset,
> const gchar *from_codeset, gsize *bytes_read, gsize *bytes_written, GError
> **error);
> "to_codeset" and "from_codeset" have to be swapped.
> I think your patch tries to convert utf8 into a valid himd encoding format.
You are right. That function has a confusing parameter order.
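
To make the ordering concrete, here is a minimal sketch. It uses POSIX
iconv instead of g_convert so that it compiles without glib; iconv_open()
has the same target-first argument order. The helper name convert_bytes
and its parameters are made up for this illustration and are not part of
libhimd or glib.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical helper (not libhimd or glib API).
 * Converts `inlen` bytes of `in` from `from_codeset` to `to_codeset`.
 * Returns the number of bytes written to `out`, or (size_t)-1 on error. */
static size_t convert_bytes(const char *to_codeset, const char *from_codeset,
                            const char *in, size_t inlen,
                            char *out, size_t outcap)
{
    assert(in != NULL && out != NULL);

    /* Like g_convert, iconv_open takes the TARGET encoding first. */
    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *inp = (char *)in;   /* iconv takes a non-const input pointer */
    char *outp = out;
    size_t inleft = inlen;
    size_t outleft = outcap;

    size_t rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    if (rc != (size_t)-1)
        rc = iconv(cd, NULL, NULL, &outp, &outleft);  /* flush shift state */
    iconv_close(cd);

    return rc == (size_t)-1 ? (size_t)-1 : outcap - outleft;
}
```

Calling convert_bytes("UTF-16BE", "UTF-8", "Hi", 2, buf, sizeof buf)
converts from UTF-8 to UTF-16BE; swapping the first two arguments would
ask for the opposite conversion.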

> My previous patch uses "length = strlen(string)" inside himd_add_string() and 
> doesn't need the string length as an argument of himd_add_string().
Right.

> Your current patch uses "gsize length" uninitialized.
Wrong. It's initialized by g_convert. And you *cannot* achieve what you
want using strlen. You need the length of the converted string in bytes.
The converted string might be in UTF-16, and strlen cannot work on
UTF-16 strings, because they contain zero bytes inside the text.
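
A small sketch of the strlen problem, in plain C. The sample buffer and
the helper names (naive_length, utf16_byte_length) are invented for this
illustration; in the patch itself the byte count comes from the
bytes_written output of g_convert.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* "Hi" encoded as UTF-16BE, followed by a 16-bit zero terminator. */
static const char hi_utf16be[] = { 0x00, 'H', 0x00, 'i', 0x00, 0x00 };

/* What strlen sees: it stops at the first zero BYTE. */
static size_t naive_length(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical helper: byte length of a UTF-16 string that is
 * terminated by a 16-bit zero code unit (terminator not counted). */
static size_t utf16_byte_length(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0 || s[n + 1] != 0)
        n += 2;   /* advance one 16-bit code unit at a time */
    return n;
}
```

For this buffer strlen reports 0, because the very first byte is already
zero, while the number of bytes that actually has to be stored is 4.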

> Compilation fails with error "too many arguments to function 'himd_add_string'".
Right. I forgot to add the patches to himddump.c.

> If I understand your patch correctly, it depends on strings encoded in
> UTF-8.
Yes. This is intended, because the user of libhimd should not need to
care about what encodings are understood by the Hi-MD format. As we can
store UTF-16 on Hi-MD, all Unicode characters can be represented (which
does not mean the Hi-MD Walkman necessarily can display them). UTF-8
can also represent all Unicode characters, and UTF-8 is the more common
character set in the Linux community.

> This is true for strings taken from get_songinfo() in himddump.c, which
> reads the strings from the ID3 tag and converts them to UTF-8 automatically.
Right.

> For example, if the encoding of the user's computer is SHIFT_JIS, we shouldn't
> convert it to UTF-8 and then try to reconvert it to make himd_add_string work.
Why is that? In the general case, the user's encoding might be something
completely different, like Big5 (a Chinese character set), Latin-5
(ISO-8859-9, a Turkish character set) or code page 1251 (the default
Cyrillic character set on Windows). Passing data in an arbitrary
character set does not seem to make sense to me.
Also note that with Windows NT and later, the preferred character set is
Unicode (the wide character functions all use UTF-16LE), the internal
character set of Qt is Unicode (using UTF-16 with native endianness), and
the default character set of Gtk is also Unicode (represented as UTF-8).
So if you get the string from a GUI, you can most likely get it as
Unicode. The conversions Unicode->Shift_JIS and Shift_JIS->Unicode
(especially if performed by glib in both directions) should be
completely reversible and thus do no harm.

Regards,
  Michael Karcher