[MPEG-OTSPEC] [MPEGGroup/OpenFontFormat] 32-bit glyph IDs: what and why (#10)

Sairus Patel sppatel at adobe.com
Fri Sep 18 03:16:30 CEST 2020


The CFF2<https://docs.microsoft.com/en-us/typography/opentype/spec/cff2> table was 32-bit glyph ID–ready exactly 4 years ago (to the week):

uint32   count     Number of objects stored in INDEX
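
To make the difference concrete, here is a minimal sketch (not taken from any font engine) of reading the INDEX count field. It assumes big-endian data as everywhere in OpenType; the function name `read_index_count` and the sample bytes are hypothetical, made up for illustration.

```python
import struct

def read_index_count(data: bytes, offset: int, cff2: bool) -> int:
    """Return the object count stored at the start of an INDEX structure."""
    if cff2:
        # CFF2 INDEX: count is a big-endian uint32.
        (count,) = struct.unpack_from(">I", data, offset)
    else:
        # CFF (version 1) INDEX: count is a big-endian Card16 (uint16).
        (count,) = struct.unpack_from(">H", data, offset)
    return count

# Hypothetical header bytes for a CharStrings INDEX holding 100,000 entries,
# a count that would not fit in the 16-bit field of a CFF version 1 INDEX.
cff2_header = struct.pack(">I", 100_000)
print(read_index_count(cff2_header, 0, cff2=True))
```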


(Other benefits of this glorious table with the attractively space-less tag:


  *   Data redundant with the rest of OT removed, e.g. glyph widths and even the hallowed PostScript FontName, thus making it even smaller and – more importantly – finally an integral part of OT instead of a full-fledged standalone font format spliced into the sfnt (understandable & appropriate at the time)

  *   Legacy concepts such as glyph names and CID, crucial in their time, removed (I distinctly remember the high-five between two major stakeholder representatives at the OT spec meeting at which that last was mentioned; happily, my request for a re-enactment for a photograph was granted)

  *   All the geeky quirky goodness of CFF that gave the ‘C’ to its name was retained, resulting in greatly reduced file size compared to TT for Japanese variable fonts; numbers to be shared in another thread at some point

  *   CFF2 allows for variable as well as non-variable fonts; it is truly the future-looking CFF in OpenType & theoretically the only version of CFF (and the only OT outline flavor, period, since that is being brought up in other threads) that needs to be supported in forward-looking implementations that want cubic Béziers and for which the hinting model is desirable or permissible

  *   Already implemented in all major font engines

You can blame my enthusiasm and possible thread derailing on the fact that Adobeites are about to embark on a 3-day weekend, with clear skies and air here, but all that accounts for only part of it.)

Sairus


From: Peter Constable <notifications at github.com>
Reply-To: MPEGGroup/OpenFontFormat <reply+ACA2O6A2PLMQIEXRXHHZCAN5NZEWPEVBNHHCTZZ5BM at reply.github.com>
Date: Wednesday, September 16, 2020 at 12:32 PM
To: MPEGGroup/OpenFontFormat <OpenFontFormat at noreply.github.com>
Cc: Subscribed <subscribed at noreply.github.com>
Subject: [MPEGGroup/OpenFontFormat] 32-bit glyph IDs: what and why (#10)


This is to start some high-level discussion regarding the possibility of supporting 32-bit glyph IDs, but in (high-level) terms of what it would entail and potential reasons / benefits for it.

Benefits (why)

CJK

A primary motivation for 32-bit GIDs that's been around for some time is for CJK fonts: there are more than 64K Han ideographs encoded in Unicode, so currently it's not possible to create a single font to support even default glyphs for all Han ideographs. (Then there are all the other characters needed in typical CJK fonts, "horizontal" (market-specific) variants, glyphs for Ideographic Variation Sequences, etc.)

Up to now, it's been necessary to split characters/glyphs into separate fonts, e.g., Simsun and Simsun-ExtB. Such font pairs can potentially be packaged together into a single TTC file, but that addresses only one limitation. Bigger problems pertain to the relevant characters being divided between two font names; in fact, two font families! When authoring a document, the font applied to a run of CJK text might have to change back and forth. An app could perhaps handle that automatically... if there were a way for it to know how. Nothing in a font currently can indicate it's a unicode-range complement to some other font. (In Web content, @font-face rules do provide a way to work around this in CSS; but there's nothing like that in DTP/word-processing apps generally.) So, in the general case, it's up to the user to handle.

A further issue is that there cannot be any OpenType Layout (OTL) interaction between glyphs in different fonts. So, for example, if a spacing adjustment is needed in certain contexts, a lookup for that context can't be created if the glyphs are from two different fonts.

Implementation scenarios requiring many glyph IDs

There are some scenarios in which implementation requires extra glyph IDs.

For example, in some OTL font implementations, it might be necessary to have a GSUB derivation in successive lookups in which earlier lookups map glyph sequences into transient sequences that are then mapped to final sequences in later lookups. In those middle sequences, virtual glyphs (GIDs without real glyph data) might be used to distinguish the transitional states.

Colour font implementations are another scenario in which many GIDs may get consumed. For example, a colour depiction for an emoji character might be comprised of many elements. In a COLR implementation, graphic elements can be combined into a single glyph if they have exactly the same fill and can be placed in the same layer of a z-order stack. Otherwise, different glyph IDs are required. If the colour depiction requires 10 different fills, then at least 10 GIDs will be needed.
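
As a rough illustration of why the count is "at least" the number of fills, the sketch below counts layer glyphs for a z-ordered stack. It assumes a simplified model in which only elements that are adjacent in z-order and share a fill can be merged; the function name, element names and fills are all hypothetical.

```python
from itertools import groupby

def count_layer_glyphs(elements):
    """Count layer glyphs for (name, fill) pairs listed bottom-to-top.

    Simplifying assumption: elements merge into one glyph only when they
    are consecutive in z-order and use exactly the same fill.
    """
    return sum(1 for _fill, _run in groupby(elements, key=lambda e: e[1]))

# Hypothetical emoji built from five elements using three distinct fills.
emoji = [
    ("face", "yellow"),
    ("left_eye", "brown"),
    ("right_eye", "brown"),   # merges with left_eye: same fill, adjacent
    ("mouth", "red"),
    ("highlight", "yellow"),  # same fill as face, but on a higher layer
]
print(count_layer_glyphs(emoji))  # 4
```

Here three fills still cost four glyph IDs, because the highlight sits above the eyes and mouth and so cannot share a layer with the face.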

What

The simple idea is that glyph IDs would all become 32-bit. But this should be un-packed some.

By far, the biggest hurdle for breaking the 64K glyph limit is in OTL tables: to support 32-bit GIDs in GPOS, GSUB and GDEF would entail a very major change affecting many dozens of structures. That would require a large engineering investment for implementers.

In the current OT/OFF format, the 'glyf' table is not limited in the number of glyphs for which data is included except that the table length (in the font's table directory) is a 32-bit value. Only the 64K glyph limit has been identified as a problem, not the size of glyph data. The only issue for 'glyf' would be in relation to component GIDs within composite glyph descriptions.

The 'loca' table format has an inherent limit of 2^28 glyph IDs (max elements in the offsets[] array due to max table length in TableDirectory). Presumably more than enough.

Similarly, the inherent limitation in the hmtx table is long metrics for at most 2^28 glyphs.

For the 'glyf', 'loca' and 'hmtx' tables, the only real limitation (for practical purposes) is maxp.numGlyphs being uint16.
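
A minimal sketch of where that limit bites, assuming the standard 'maxp' layout (a 4-byte version followed by a uint16 numGlyphs). The helper names `read_num_glyphs` and `pack_maxp_header` are hypothetical, and only the 6-byte version 0.5 form of the table is built here.

```python
import struct

MAX_UINT16 = 0xFFFF  # 65,535: the largest value numGlyphs can hold

def read_num_glyphs(maxp: bytes) -> int:
    # 'maxp' begins with a 4-byte version, followed by numGlyphs (uint16).
    (num_glyphs,) = struct.unpack_from(">H", maxp, 4)
    return num_glyphs

def pack_maxp_header(num_glyphs: int) -> bytes:
    # Build a version 0.5 'maxp' table (version 0x00005000 + numGlyphs).
    if not 0 <= num_glyphs <= MAX_UINT16:
        raise ValueError(f"numGlyphs {num_glyphs} does not fit in uint16")
    return struct.pack(">IH", 0x00005000, num_glyphs)

print(read_num_glyphs(pack_maxp_header(65_535)))  # 65535
try:
    pack_maxp_header(65_536)
except ValueError as err:
    print(err)
```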

I'll assume that glyph names in production fonts for >64K glyphs is not a goal.
