[MPEG-OTSPEC] [EXTERNAL] Re: comments wrt wide glyph ID proposal

Sergey Malkin sergeym at microsoft.com
Tue Dec 12 20:02:47 CET 2023


> In those extreme cases the subtable can be broken down into more, like we currently do with 16bit offsets. I don't think it's a realistic limitation, but happy to bump all Coverage and ClassDef offsets to 32bit.

For me, the main reason would be not overflow but the potential inconvenience of having too many formats. We have a mix of 16-, 24-, and 32-bit offsets with no clear reason to use one over another (and no real explanation of what future additions should use). I'll have to look up the spec every time I want to read one or another. Saving one byte is not worth the trouble.

Thanks
Sergey



________________________________
From: mpeg-otspec <mpeg-otspec-bounces at lists.aau.at> on behalf of Behdad Esfahbod <behdad at behdad.org>
Sent: Tuesday, December 12, 2023 4:39 AM
To: Peter Constable <pconstable at microsoft.com>; Liam Quin <liam at fromoldbooks.org>
Cc: MPEG OT Spec list <mpeg-otspec at lists.aau.at>
Subject: [EXTERNAL] Re: [MPEG-OTSPEC] comments wrt wide glyph ID proposal

Thanks Peter for the excellent feedback.

Comments inline.

On Mon, Dec 11, 2023 at 7:54 PM Peter Constable <pconstable at microsoft.com> wrote:


Hybrid narrow/wide fonts:

Hybrid fonts are going to be more challenging to build and maintain—much more so than hybrid COLRv0/v1. Attempting to engineer mechanisms specifically to accommodate hybrid fonts is likely to add to complexity.

I agree it should not be the focus of the work.


TTCs:

A second take-away for us from thinking about hybrid fonts is that we think TTCs can provide another approach to creating hybrid fonts—one that could be easier for font developers to create and maintain. To that end, we think it would make sense to define a v2.1 TTC header that adds numFonts2 and tableDirectoryOffsets2 members, and provide guidance that software that supports wide glyph IDs should use only these new members, ignoring numFonts and tableDirectoryOffsets. In this way, older software could see only fonts with narrow glyph IDs, while newer software could see a distinct set of fonts without duplication.

I'd be happy to incorporate this.
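
To make the TTC idea concrete, here is a minimal sketch of how a reader might choose between the two font lists. The field order up to the DSIG members follows the existing v2.0 TTC header; the placement of numFonts2 and tableDirectoryOffsets2 directly after the DSIG fields is purely an assumption for illustration, since no layout has been proposed yet. The function name font_offsets and its supports_wide_gids flag are hypothetical.

```python
import struct

def font_offsets(data: bytes, supports_wide_gids: bool) -> list:
    """Return table directory offsets from a TTC header.

    Assumption (not from any spec): a v2.1 header appends numFonts2 and
    tableDirectoryOffsets2[numFonts2] right after the v2.0 DSIG fields.
    """
    # ttcTag (4 bytes), majorVersion, minorVersion (uint16), numFonts (uint32)
    tag, major, minor, num_fonts = struct.unpack_from(">4sHHI", data, 0)
    if tag != b"ttcf":
        raise ValueError("not a TTC header")
    pos = 12
    narrow = list(struct.unpack_from(f">{num_fonts}I", data, pos))
    pos += 4 * num_fonts

    # Older software, or an older header, sees only the narrow font list.
    if (major, minor) < (2, 1) or not supports_wide_gids:
        return narrow

    pos += 12  # skip dsigTag, dsigLength, dsigOffset (uint32 each, since v2.0)
    (num_fonts2,) = struct.unpack_from(">I", data, pos)
    pos += 4
    # Wide-GID-aware software uses only the new members.
    return list(struct.unpack_from(f">{num_fonts2}I", data, pos))
```

With a layout like this, the same file would present one font list to older software and a distinct list to newer software, without either list having to duplicate the other's entries.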


This brought to my mind that, six – ten years ago (I forget the exact timeframe), there was discussion between Adobe, Apple and MS about defining a _dmap_ (delta cmap) table for use in TTCs: It’s very common in TTCs that there are cmap differences, with the result that each font in the TTC must have its own cmap without any sharing of data. In CJK fonts, the cmap table is one of the largest tables (probably second only to glyf or CFF / CFF2). Moreover, in a CJK font, the majority of mappings in a cmap table could be the same, with only a small portion of mappings being different. (E.g., in MS Gothic vs MS PGothic, all the ideograph glyphs are the same; it’s just Latins and punctuation that differ.) A dmap table would allow fonts in a TTC to share a common base cmap table with small, font-specific dmap tables handling differences. In our discussions, we came up with formats that would work, except we hadn’t figured out how to handle format 14 cmap subtables.
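
As a rough illustration of the dmap idea only (not of any binary format that was discussed), the sketch below overlays a small per-font delta on a shared base mapping. The code points, glyph IDs, and names (base_cmap, dmap_proportional, effective_cmap) are made up for the example, and it does not attempt to cover removed mappings or format 14 subtables.

```python
from collections import ChainMap

# Shared base mapping: code point -> glyph ID (hypothetical values).
base_cmap = {0x4E00: 1200, 0x4E01: 1201, 0x0041: 34, 0x002C: 15}

# Per-font delta: only the mappings that differ, e.g. Latin and punctuation.
dmap_proportional = {0x0041: 9034, 0x002C: 9015}

def effective_cmap(base, delta):
    """Look up the delta first, then fall back to the shared base."""
    return ChainMap(delta, base)

proportional_font = effective_cmap(base_cmap, dmap_proportional)
print(proportional_font[0x4E00])  # ideograph comes from the shared base
print(proportional_font[0x0041])  # Latin 'A' comes from the font's delta
```

The storage argument is that the base mapping is stored once for the whole TTC, while each font carries only its short delta.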

This reminds me of another idea from Monotype that we discussed in, I think, 2019: introducing a `cmap` subtable that would map individual characters to sequences of glyphs. Then the pre-composed Unicode characters wouldn't need to have their own glyphs. Back then we dropped the idea for backwards-compat reasons. But maybe we can pick it up now?



COLR, MATH:

We noted that the proposal doesn’t include any integration for COLR or MATH tables. There might be several things to consider in relation to the MATH table, and we have no concern with leaving that for future consideration.



But COLR might not be too difficult. So, we think it’s worth discussing options:

  1.  Postpone for future consideration.
  2.  Create a new major version — i.e., a new table tag — to design a table with wide glyph IDs (it wouldn’t need to support narrow IDs).
  3.  Create a minor version enhancement (COLR v2) that maintains backward compatibility while adding wide support.



The third option would need to add new offsets in the header for wide variants of base glyph and clip lists, with new BaseGlyphPaintRecord2 and ClipRecord2 formats. (There’d also need to be a new PaintGlyph format, but that will be true regardless.)



We haven’t yet decided which option we prefer; we just want to get it into discussion.

My preference is to introduce PaintGlyph formats with wide GIDs without bumping the format number for now, and to postpone ClipList and other enhancements to a future v2. Note that there already exist COLRv1 fonts that hit the 64k glyph limit because of all the components. Those would become feasible with just new PaintGlyph2 / PaintColrGlyph2 / etc. extensions and do not need the full ClipBox etc. widening.




Max profile:

The current proposal doesn’t make any change wrt ‘maxp’, other than to say numGlyphs isn’t used for wide-GID support. In a hybrid font, it’s unclear what font developers should do with all the other maxp members: if they’re set as appropriate for narrow GIDs, then the values may not work for wide GIDs and the app could run out of resources. On the other hand, if the values are set for wide GIDs, those can work for both narrow and wide, but for older software could lead to over-allocation of unused resources.



Since we’re already considering glyf/loca and GLYF/LOCA that can exist side by side, it seems simple and clean to define a MAXP table that gets used only in conjunction with GLYF/LOCA. These tables are small, so the file size impact is negligible.

How real is the use of max profile data these days? My understanding is that since the data cannot be trusted anyway, software doesn't rely on it.



GPOS/GSUB:

It appears the proposal doesn’t yet include wide versions for common table formats that will be required (e.g., coverage). These will, of course, be needed.

I'm surprised by that. But you are right, they seem to be missing from the PDF document. @Liam Quin

The proposal is:

  https://github.com/harfbuzz/boring-expansion-spec/issues/30


This may be an opportunity to deprecate certain formats from use in wide-GID fonts. E.g., GSUB type 5 and GPOS type 7 (contextual) were effectively obsoleted when the chaining contextual formats were added. If we agreed, then Contextual positioning / substitution subtable formats 4 – 6 wouldn’t need to be added.

I'm ambivalent here. Adding them is simple enough for me and keeps consistency.


Various formats are proposed using uint24 for subtable counts and Offset24 for subtable offsets. This could turn into a real limitation. For example, consider single substitution format 4: if glyphCount were 5,592,406, then the size of the substituteGlyphIDs[] array would exceed 0xFFFFFF and Offset24 for coverageOffset would not work. We’re inclined to make offsets, and any counts not limited by 24-bit GIDs, 32-bit.

In those extreme cases the subtable can be broken down into more, like we currently do with 16bit offsets. I don't think it's a realistic limitation, but happy to bump all Coverage and ClassDef offsets to 32bit.
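
For reference, the threshold quoted above can be checked with a few lines of arithmetic. This is a sketch that assumes each entry of substituteGlyphIDs[] is a 3-byte (24-bit) glyph ID and counts only the array itself, ignoring the subtable header; the constant and function names are illustrative.

```python
GLYPH_ID24_SIZE = 3       # bytes per 24-bit glyph ID (assumed entry size)
OFFSET24_MAX = 0xFFFFFF   # largest value an Offset24 can hold

def substitute_array_size(glyph_count: int) -> int:
    """Size in bytes of substituteGlyphIDs[] for a given glyphCount."""
    return glyph_count * GLYPH_ID24_SIZE

def first_overflowing_count() -> int:
    """Smallest glyphCount whose array alone is larger than OFFSET24_MAX."""
    return OFFSET24_MAX // GLYPH_ID24_SIZE + 1

print(first_overflowing_count())
print(substitute_array_size(5_592_406) > OFFSET24_MAX)
```

Under these assumptions the first glyphCount that overflows is 5,592,406, matching the figure in the example: an offset to anything placed after the array no longer fits in 24 bits.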

Thanks,

behdad