Proposal: Specify UTF encoding of Unicode strings
Sairus Patel
sppatel at adobe.com
Thu Nov 24 01:05:03 CET 2011
[I'm sending this to both the OT and OFF lists, per guidelines from the specifications' editors.]
=== Background
I don't believe OT/OFF specifies the encoding of 'name' table Unicode strings.
Microsoft and others, please verify in particular that <platform=3, encoding=10> strings must be in UTF-16. There are no fonts with such strings in my Windows 7 fonts folder, so I can't easily comment on current practice.
Also, I don't believe the specification states anywhere that <3,0> strings have Unicode semantics, though that's how current "Windows symbol" fonts are made. I chose UCS-2 in the proposal below because <3,0> predates <3,10>, so parsers may choke on surrogate pairs in <3,0>. But I'm fine with UCS-4 if that's what is preferred.
=== Proposal { my comments are in curly brackets }
{ In http://www.microsoft.com/typography/otspec/name.htm [OFF sec. 5.2.6]: }
1. { Insert the following sentence at the end of the paragraph "Unicode platform encoding ID 5 can be used for encodings in the 'cmap' table but not for strings in the 'name' table.": }
Strings for all Unicode platform encoding IDs other than 5 must be encoded in UTF-16 (big endian).
2. { Insert the following paragraphs at the end of the "Windows platform-specific encoding IDs (platform ID= 3)" section: }
Strings for Windows platform encoding ID 0 are considered to have Unicode semantics (UCS-2).
Strings for Windows platform encoding IDs 0, 1, and 10 must be encoded in UTF-16 (big endian).
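{ Illustration only, not proposed specification text: a minimal Python sketch of how a parser might apply the rules above. The function name decode_name_string and its error handling are hypothetical. Rejecting surrogate pairs in <3,0> is an assumption about what "UCS-2 semantics" would mean in practice. Platforms other than Unicode (0) and Windows (3) are out of scope here. }

```python
def decode_name_string(platform_id: int, encoding_id: int, raw: bytes) -> str:
    """Decode raw 'name' table string bytes under the proposed rules (sketch)."""
    if platform_id == 0:
        # Unicode platform: encoding ID 5 is for 'cmap' only, never 'name'.
        if encoding_id == 5:
            raise ValueError("Unicode encoding ID 5 is not valid in the 'name' table")
        return raw.decode("utf-16-be")

    if platform_id == 3 and encoding_id in (0, 1, 10):
        # Windows platform: all three encoding IDs are UTF-16 (big endian).
        text = raw.decode("utf-16-be")
        # Assumption: UCS-2 semantics for <3,0> means no code points above
        # U+FFFF, i.e. no surrogate pairs in the encoded string.
        if encoding_id == 0 and any(ord(ch) > 0xFFFF for ch in text):
            raise ValueError("<3,0> string contains a surrogate pair")
        return text

    raise NotImplementedError("platform/encoding not covered by this sketch")
```

{ For example, the bytes 00 41 00 62 decode to "Ab" under <3,1>, while a string containing U+1D11E would be accepted under <3,10> but rejected under <3,0> in this sketch. }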
Sairus