[mpeg-OTspec] Re: New work on 3rd edition of the OFF (AHG kick-off) - name table

Peter Constable petercon at microsoft.com
Wed Aug 22 19:35:36 CEST 2012


There’s one other possible angle on using 3/0 name records: I don’t know if, back in 1990, the thinking was that you might want to have different strings (for a given name ID) on different platforms, and hence that the platform/encoding sub-keys mattered as much to filter strings according to the target platform as to identify the encoding scheme.

If I were redesigning the ‘name’ table today, I’d design around assumptions that strings are platform agnostic and are always in a specific Unicode encoding scheme (either UTF-8 or UTF-16).


Peter

From: mpeg-OTspec at yahoogroups.com [mailto:mpeg-OTspec at yahoogroups.com] On Behalf Of Peter Constable
Sent: August 22, 2012 10:24 AM
To: Behdad Esfahbod
Cc: Levantovsky, Vladimir; bobh528; mpeg-OTspec at yahoogroups.com
Subject: RE: [mpeg-OTspec] Re: New work on 3rd edition of the OFF (AHG kick-off) - name table



What's the metric for determining "wrong"?

I agree that this looks like it may have been copied from the 'cmap' table. If nothing else, the text could have been better contextualized (e.g., "...strings should be encoded using [encoding scheme] and corresponding NameRecord entries should use [platform x, encoding y]...").

And I also agree that it looks pretty nonsensical. E.g., the strings for Wingdings cannot possibly be expressed using the (non-standard) characters supported in Wingdings, and so also encoding the strings in that (font-specific) symbol encoding makes no sense.

Even so, for better or worse, it appears that symbol fonts on Windows have long used 3/0 NameRecord entries that point to string data that is Unicode encoded -- specifically, I experimented to confirm that the string data will be interpreted using UTF-16 on Windows 8 (and I'm certain it would be the same on earlier versions).

It would probably be better to document the details as I've expressed them here: NameRecord entries with 3/0 pointing to string data encoded with UTF-16.
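As a minimal sketch of the reading behaviour described above (Python; `decode_name_string` is a hypothetical helper, not an API from any spec or library), a consumer could map Windows name records to a decoder like this. Treating 3/0 as UTF-16BE reflects the Windows behaviour observed in this thread, not normative spec text.

```python
# Hypothetical helper: decode 'name' table string data for platform ID 3 (Windows).
# Assumption: 3/0 (Symbol), 3/1 (Unicode BMP) and 3/10 name strings are all stored
# as UTF-16BE, matching the behaviour observed on Windows.
WINDOWS_UTF16_ENCODING_IDS = {0, 1, 10}


def decode_name_string(platform_id: int, encoding_id: int, data: bytes) -> str:
    """Decode raw name string bytes according to the record's platform/encoding IDs."""
    if platform_id == 3 and encoding_id in WINDOWS_UTF16_ENCODING_IDS:
        # Strict decoding raises UnicodeDecodeError on odd length or unpaired surrogates.
        return data.decode("utf-16-be")
    raise NotImplementedError(
        f"encoding {platform_id}/{encoding_id} is not handled in this sketch"
    )


# A symbol font's family name stored in a 3/0 record, encoded as UTF-16BE.
raw = "Wingdings".encode("utf-16-be")
print(decode_name_string(3, 0, raw))  # Wingdings
```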

An interesting experiment that I haven't tried would be to create a font that uses 3/0 in the 'cmap' table, but 3/1 in the 'name' table. My guess is that it would work as desired, though that would need to be confirmed. At the same time, it would not at all surprise me if commonly used type design tools such as FontLab do not make it easy to construct fonts that way. (I don't presently have FL installed on a machine to confirm.)


Peter

-----Original Message-----
From: Behdad Esfahbod [mailto:behdad.esfahbod at gmail.com] On Behalf Of Behdad Esfahbod
Sent: August 21, 2012 1:21 PM
To: Peter Constable
Cc: Levantovsky, Vladimir; bobh528; mpeg-OTspec at yahoogroups.com
Subject: Re: [mpeg-OTspec] Re: New work on 3rd edition of the OFF (AHG kick-off) - name table

I have wanted to raise an issue with the 'name' table and Windows encodings for a while. To me, it looks like there's some confusion in the spec right now.
The 'name' table page in the MS otspec [1] says:

"When building a Unicode font for Windows, the platform ID should be 3 and the encoding ID should be 1. When building a symbol font for Windows, the platform ID should be 3 and the encoding ID should be 0."

This is plain wrong. I assume it was copied and pasted from the 'cmap' table. The platform/encoding ID for the 'name' table should match the encoding of the name strings, not the kind of glyphs the font has. Indeed, in fontconfig I had to allow encoding ID 0 in the 'name' table to mean UTF-16BE...
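The point is visible in the table layout: each NameRecord carries its own platform and encoding IDs, and those IDs are the only description of how its string bytes are encoded. A minimal Python sketch, assuming a format 0 'name' table already extracted from the font file (`parse_name_table` and `NameRecord` are hypothetical names chosen for illustration):

```python
import struct
from typing import List, NamedTuple


class NameRecord(NamedTuple):
    platform_id: int
    encoding_id: int
    language_id: int
    name_id: int
    data: bytes  # raw string bytes; their encoding is implied by platform_id/encoding_id


def parse_name_table(table: bytes) -> List[NameRecord]:
    """Parse a format 0 'name' table: a 6-byte header, then count 12-byte NameRecords."""
    fmt, count, storage_offset = struct.unpack_from(">HHH", table, 0)
    if fmt != 0:
        raise ValueError("this sketch only handles format 0")
    records = []
    for i in range(count):
        pid, eid, lid, nid, length, offset = struct.unpack_from(">6H", table, 6 + 12 * i)
        start = storage_offset + offset  # string offsets are relative to the storage area
        records.append(NameRecord(pid, eid, lid, nid, table[start:start + length]))
    return records


# One 3/0 record (language 0x0409, name ID 1) whose string is UTF-16BE.
family = "Wingdings".encode("utf-16-be")
blob = (
    struct.pack(">HHH", 0, 1, 6 + 12)
    + struct.pack(">6H", 3, 0, 0x0409, 1, len(family), 0)
    + family
)
rec = parse_name_table(blob)[0]
print(rec.platform_id, rec.encoding_id, rec.data.decode("utf-16-be"))  # 3 0 Wingdings
```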

Any clarification in this area will be appreciated.

In the same vein, on the same page, platform ID 3 encoding ID 10 is called Unicode UCS-4 (which makes sense for the 'cmap' table), but for the 'name' table it should probably say UTF-16BE instead.

behdad


[1] http://www.microsoft.com/typography/otspec/name.htm


On 08/21/2012 04:00 PM, Peter Constable wrote:
>
>
> One issue Bob’s comments raise has to do with the way that platform
> and encoding IDs are used both for name records and cmap subtables. In
> a cmap subtable, the difference between UCS-2 and UTF-16 is really
> important since specific formats would be needed to support UTF-16. In
> contrast, there’s nothing that would necessarily need to be different
> for name table data structures. In fact, I doubt that there’s anywhere
> in the Windows platform where a name table string might get processed
> that would assume UCS-2 and _not_ UTF-16.
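To make the UCS-2/UTF-16 distinction concrete for name strings: the only difference is whether surrogate pairs may appear. A minimal Python sketch (the helper names are hypothetical, not from any spec or library) that splits UTF-16BE name data into 16-bit code units and tests whether it stays within UCS-2:

```python
def utf16_code_units(data: bytes) -> list:
    """Split UTF-16BE bytes into 16-bit code units."""
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


def is_ucs2(data: bytes) -> bool:
    """True if no surrogate code units occur, i.e. the text is BMP-only (valid UCS-2)."""
    return all(not 0xD800 <= unit <= 0xDFFF for unit in utf16_code_units(data))


for text in ("Abc", "A\U0001D400"):  # U+1D400 MATHEMATICAL BOLD CAPITAL A is outside the BMP
    data = text.encode("utf-16-be")
    print(len(text), "chars,", len(data), "bytes, UCS-2:", is_ucs2(data))
```

Both strings are valid UTF-16BE; only the second falls outside UCS-2, and it also breaks the "two bytes per character" assumption (2 characters, 6 bytes).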
>
> Hence, there might not be any problem if the spec were to state that
> 3/1 _or_ 3/10 name strings are assumed to be encoded as UTF-16; or
> even further, to stipulate that 3/10 should not be used in name
> records and that 3/1 name strings are assumed to be UTF-16.
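The stricter option above could be expressed as a validation rule. A minimal sketch, assuming Python and a hypothetical `check_windows_name_records` helper operating on already-extracted (platform ID, encoding ID, name ID, bytes) tuples: it flags 3/10 name records and requires 3/1 strings to decode as UTF-16BE, so surrogate pairs are accepted but unpaired surrogates are not.

```python
from typing import List, Tuple

# Each entry: (platform_id, encoding_id, name_id, raw_bytes)
NameEntry = Tuple[int, int, int, bytes]


def check_windows_name_records(entries: List[NameEntry]) -> List[str]:
    """Hypothetical lint rule: reject 3/10 name records, require 3/1 strings to be UTF-16BE."""
    problems = []
    for platform_id, encoding_id, name_id, data in entries:
        if platform_id != 3:
            continue  # only Windows records are checked here
        if encoding_id == 10:
            problems.append(f"name ID {name_id}: 3/10 should not be used in name records")
        elif encoding_id == 1:
            try:
                data.decode("utf-16-be")  # strict: unpaired surrogates raise an error
            except UnicodeDecodeError:
                problems.append(f"name ID {name_id}: 3/1 string is not valid UTF-16BE")
    return problems


entries = [
    (3, 1, 1, "Example".encode("utf-16-be")),
    (3, 1, 4, "\U0001D400".encode("utf-16-be")),  # supplementary character: a surrogate pair, valid
    (3, 10, 1, "Example".encode("utf-16-be")),
    (3, 1, 2, b"\xd8\x35"),  # unpaired high surrogate
]
for problem in check_windows_name_records(entries):
    print(problem)
```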
>
> Peter
>
> *From:* mpeg-OTspec at yahoogroups.com [mailto:mpeg-OTspec at yahoogroups.com] *On Behalf Of* Levantovsky, Vladimir
> *Sent:* August 8, 2012 8:28 AM
> *To:* bobh528; mpeg-OTspec at yahoogroups.com
> *Subject:* RE: [mpeg-OTspec] Re: New work on 3rd edition of the OFF (AHG kick-off) - name table
>
> Hi Bob,
>
>
>
> Thank you very much for taking the time to review the draft and for your comments.
>
> Aside from the changes to the OS/2 Panose field and the new ‘rclt’ feature
> description, all other changes you currently see in the draft are
> rolled in from previously issued and approved amendments and
> corrigenda. The text of the ‘name’ table description hasn’t been
> modified at all recently; the last changes were discussed back
> in 2009/2010, when the second amendment was finalized. I verified that
> the current text is an exact match of OT 1.6
> (http://www.microsoft.com/typography/otspec/name.htm), with the
> exception of the example page
> (http://www.microsoft.com/typography/otspec/namesmp.htm), which is a separate, linked page in the HTML version of OT 1.6 but is ‘inlined’ in the ISO text.
>
>
>
> I agree with you that there are quite a few places where the current ‘name’
> table text could be improved – in fact, the total re-write of this
> section was already proposed by Josh Hadley earlier this year:
> http://tech.groups.yahoo.com/group/mpeg-OTspec/message/714
>
> Now may be a good time to discuss it in detail and see if we can
> improve this section of the spec while the editing period is still open (until 8/31/12).
> However, it’s not a “now or never” kind of deal, so I don’t want anyone
> to feel rushed to make changes. The clarity of the spec is what
> matters, so if it takes us a little longer to finalize it, that’s fine
> (this is what the working drafts are for).
>
>
>
> Thank you,
>
> Vlad
>
> *From:* mpeg-OTspec at yahoogroups.com [mailto:mpeg-OTspec at yahoogroups.com] *On Behalf Of* bobh528
> *Sent:* Tuesday, August 07, 2012 6:04 PM
> *To:* mpeg-OTspec at yahoogroups.com
> *Subject:* [mpeg-OTspec] Re: New work on 3rd edition of the OFF (AHG kick-off) - name table
>
> (sorry -- previous post seems to have gone astray...)
>
>
>
> On 2012-07-27 at 15:06 Levantovsky, Vladimir wrote:
>
> I would like to ask you to review the first draft text
>
>
> Thanks for getting this process going.
>
> I have some questions about the spec for the name table.
>
> 1) In section 5.2.6.3 Name IDs, below the table of name IDs, is a Note
> in which the text:
>
> All 'name' table strings for platform ID 3 (Windows platform) must be
> in Unicode, using the UTF-16 encoding form. The character set encoding for 'name'
> table strings with platform ID 0 (Macintosh) is determined by the encoding ID.
>
> has been replaced with:
>
>
> Note that OS/2 and Windows both require that all name strings be
> defined in Unicode. Thus all 'name' table strings for platform ID = 3
> (Windows) will require two bytes per character. Macintosh fonts require single byte strings.
>
>
> This appears to be a regression to the text from MS spec 1.6 -- is
> that intended? If so, the "two bytes per character" phrase needs to
> be updated to modern language.
>
> But in either case, a key question is whether SMP characters (coded
> using surrogate pairs) are permitted or not. If they are, then the
> correct term to use is "UTF-16". If they are not, then "UTF-16" is
> /not/ the correct term -- I think the correct term would then be "UCS-2".
>
> 2) Section 5.2.6.2 /Platform IDs, Platform-specific encoding
> IDs and Language IDs/ currently includes this table:
>
>
> *Windows platform-specific encoding IDs (platform ID = 3)*
>
> Platform ID   Encoding ID   Description
> -----------   -----------   -------------------
> 3             0             Symbol
> 3             1             Unicode BMP (UCS-2)
> 3             2             ShiftJIS
> 3             3             PRC
> 3             4             Big5
> 3             5             Wansung
> 3             6             Johab
> 3             7             Reserved
> 3             8             Reserved
> 3             9             Reserved
> 3             10            Unicode UCS-4
>
>
> What does the third column of this table mean? In context, it
> seems to be saying that if I want a name string with SMP characters in
> it, then I can use 3/10 encoding and encode the string in UCS-4. Is that
> what it is really saying? If so, then it runs counter to /either/ of
> the quotes in my question 1 above (about UTF-16 or 2-byte characters).
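To illustrate why the "UCS-4" label for 3/10 matters for 'name' strings, a minimal Python sketch with an arbitrary sample string: the same name encoded as UTF-16BE and as UCS-4 (here produced with Python's `utf-32-be` codec, which gives the same bytes as big-endian UCS-4 for valid scalar values) yields different byte sequences, so a reader that assumes UTF-16BE for every Windows record would mis-decode UCS-4 string data.

```python
name = "Example"

as_utf16 = name.encode("utf-16-be")
as_ucs4 = name.encode("utf-32-be")  # 4 bytes per character, big-endian

print(len(as_utf16), len(as_ucs4))  # 14 28
# Reading UCS-4 data with a UTF-16BE decoder yields a NUL before every character:
print(repr(as_ucs4.decode("utf-16-be")[:4]))  # '\x00E\x00x'
```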
>
> Bob Hallissy


