[MPEG-OTSPEC] [EXTERNAL] Re: comments wrt wide glyph ID proposal

Peter Constable pconstable at microsoft.com
Thu Dec 21 20:09:40 CET 2023


It may be worth considering the implications a bit more in cases of _arrays_ of offsets. For example, in MultipleSubstFormat2, changing coverageOffset from Offset24 to Offset32 adds one byte to the size, but changing sequenceOffsets[] from Offset24 to Offset32 adds sequenceCount bytes to the size. Similarly for AlternateSubstFormat2 and alternateSetOffsets[].
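To make the byte math concrete, here is a minimal sketch; widen_cost is a hypothetical helper, not anything from the spec, and the field names are just borrowed from MultipleSubstFormat2 for illustration:

```python
# Hypothetical sketch: widening Offset24 -> Offset32 costs one extra byte per
# offset, so a single offset field costs 1 byte but an array of offsets costs
# one byte per element.

def widen_cost(single_offsets: int, array_counts: list[int]) -> int:
    """Extra bytes from changing Offset24 to Offset32 everywhere."""
    return single_offsets + sum(array_counts)

# MultipleSubstFormat2: one coverageOffset plus sequenceCount sequenceOffsets.
print(widen_cost(single_offsets=1, array_counts=[1000]))  # 1001 extra bytes
```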

In these two cases, it’s possible these lookup types might not be heavily used in fonts today, so the impact might not be significant. (That’s an empirical question; it could be checked against some large collections such as Google Fonts.)

But I suspect ligature substitutions are fairly heavily used. One could investigate typical or worst-case values for ligatureSetCount (= coverage size) or for ligatureCount (number of ligatures with the same starting glyph). A large ligatureSetCount doesn’t present a perf concern, since the LigatureSet array is in coverage order and coverage can be searched quickly. On the other hand, with a large ligatureSetCount, the size impact of changing to Offset32 would be larger.

As Behdad pointed out in the meeting, having a large ligatureCount will run into processing perf issues, since a fast search over the Ligature table array isn’t possible and the tables must be scanned in order. What’s the implication? On the one hand, one might argue that Offset32 isn’t needed for ligatureOffsets; but on the other hand, it might be argued that the impact of Offset32 for ligatureOffsets is limited since ligatureCount isn’t ever likely to be large.
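The asymmetry between the two counts can be sketched roughly as follows; this is an illustrative toy, not real layout-engine code, and the data shapes are invented for the example:

```python
from bisect import bisect_left

# Coverage is sorted by glyph ID, so a large ligatureSetCount is cheap to
# search: O(log n) binary search.
def find_ligature_set(coverage_glyphs, gid):
    i = bisect_left(coverage_glyphs, gid)
    return i if i < len(coverage_glyphs) and coverage_glyphs[i] == gid else None

# Ligature tables within a LigatureSet are in preference order, not sorted,
# so each must be tried in turn: O(ligatureCount) per starting glyph.
def match_ligature(ligatures, glyph_sequence):
    for components, lig_gid in ligatures:
        if tuple(glyph_sequence[1:1 + len(components)]) == components:
            return lig_gid
    return None

print(find_ligature_set([3, 17, 42], 17))                      # 1
print(match_ligature([((5,), 99), ((5, 6), 100)], [4, 5, 6]))  # 99
```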

Similar thinking applies to (chaining) contextual substitutions: the rule sets are in coverage order and can be scanned quickly, so there isn’t a processing perf reason to limit the count, though a large count would mean a larger size impact using Offset32. But within a rule set, rules have to be scanned in order hence there is a processing perf reason to limit the rule count.


One of the concerns MS folk raised is the potential confusion for developers mixing up offset sizes. The suggestion was that using Offset24 in more places would add to that concern. Of course, if we define formats that use Offset32 for some members but Offset24 for others, that would add further to that concern.

Pros and cons… I haven’t formulated a definite opinion on sub-subtable offsets.


Peter

From: Behdad Esfahbod <behdad at behdad.org>
Sent: Sunday, December 17, 2023 7:21 PM
To: Sergey Malkin <sergeym at microsoft.com>
Cc: Peter Constable <pconstable at microsoft.com>; Liam Quin <liam at fromoldbooks.org>; MPEG OT Spec list <mpeg-otspec at lists.aau.at>
Subject: Re: [EXTERNAL] Re: [MPEG-OTSPEC] comments wrt wide glyph ID proposal

I bumped the subtable-level offsets to Offset32. But I need direction about the sub-subtable-level offsets. They still pose some of the same "problems". That is, you cannot have all of the 24bit glyphs in the same subtable. I can address that by upgrading most of the Offset24's (now mostly in arrays) to Offset32.

How do people feel about this?

behdad
http://behdad.org/


On Tue, Dec 12, 2023 at 12:02 PM Sergey Malkin <sergeym at microsoft.com<mailto:sergeym at microsoft.com>> wrote:
> In those extreme cases the subtable can be broken down into more, like we currently do with 16bit offsets. I don't think it's a realistic limitation, but happy to bump all Coverage and ClassDef offsets to 32bit.

For me, the main reason would be not overflow but the potential inconvenience of having too many formats. We have a mix of 16-, 24- and 32-bit offsets with no clear reason to choose one over another (and no real explanation of what future additions should use). I'll have to look up the spec every time I want to read one or another. Saving one byte is not worth the trouble.

Thanks
Sergey


________________________________
From: mpeg-otspec <mpeg-otspec-bounces at lists.aau.at<mailto:mpeg-otspec-bounces at lists.aau.at>> on behalf of Behdad Esfahbod <behdad at behdad.org<mailto:behdad at behdad.org>>
Sent: Tuesday, December 12, 2023 4:39 AM
To: Peter Constable <pconstable at microsoft.com<mailto:pconstable at microsoft.com>>; Liam Quin <liam at fromoldbooks.org<mailto:liam at fromoldbooks.org>>
Cc: MPEG OT Spec list <mpeg-otspec at lists.aau.at<mailto:mpeg-otspec at lists.aau.at>>
Subject: [EXTERNAL] Re: [MPEG-OTSPEC] comments wrt wide glyph ID proposal

Thanks Peter for the excellent feedback.

Comments inline.

On Mon, Dec 11, 2023 at 7:54 PM Peter Constable <pconstable at microsoft.com<mailto:pconstable at microsoft.com>> wrote:



Hybrid narrow/wide fonts:

Hybrid fonts are going to be more challenging to build and maintain—much more so than hybrid COLRv0/v1. Attempting to engineer mechanisms specifically to accommodate hybrid fonts is likely to add to complexity.

I agree it should not be the focus of the work


TTCs:

A second take-away for us from thinking about hybrid fonts is that we think TTCs can provide another approach to creating hybrid fonts—one that could be easier for font developers to create and maintain. To that end, we think it would make sense to define a v2.1 TTC header that adds numFonts2 and tableDirectoryOffsets2 members, and provide guidance that software that supports wide glyph IDs should use only these new members, ignoring numFonts and tableDirectoryOffsets. In this way, older software could see only fonts with narrow glyph IDs, while newer software could see a distinct set of fonts without duplication.
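A rough reader-side sketch of such a v2.1 header might look like the following. To be clear, numFonts2 and tableDirectoryOffsets2 are names from this proposal only, not the published TTC spec, and the exact field placement here (after the v2.0 DSIG fields) is an assumption:

```python
import struct

# Hypothetical parse of the *proposed* TTC header v2.1 from this thread.
# A legacy reader stops after the v2.0 fields and sees only narrow-GID fonts;
# a wide-glyph-aware reader would read only the new members.
def parse_ttc_v21(data: bytes):
    tag, major, minor, num_fonts = struct.unpack_from(">4sHHI", data, 0)
    pos = 12 + 4 * num_fonts  # skip legacy tableDirectoryOffsets[]
    dsig_tag, dsig_len, dsig_off = struct.unpack_from(">III", data, pos)
    pos += 12
    (num_fonts2,) = struct.unpack_from(">I", data, pos)  # proposed field
    pos += 4
    offsets2 = struct.unpack_from(f">{num_fonts2}I", data, pos)  # proposed
    return num_fonts, num_fonts2, list(offsets2)
```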

I'd be happy to incorporate this.


This brought to my mind that, six to ten years ago (I forget the exact timeframe), there was discussion between Adobe, Apple and MS about defining a _dmap_ (delta cmap) table for use in TTCs: It’s very common in TTCs that there are cmap differences, with the result that each font in the TTC must have its own cmap without any sharing of data. In CJK fonts, the cmap table is one of the largest tables (probably second only to glyf or CFF / CFF2). Moreover, in a CJK font, the majority of mappings in a cmap table could be the same, with only a small portion of mappings being different. (E.g., in MS Gothic vs MS PGothic, all the ideograph glyphs are the same; it’s just the Latin letters and punctuation that differ.) A dmap table would allow fonts in a TTC to share a common base cmap table, with small, font-specific dmap tables handling differences. In our discussions, we came up with formats that would work, except we hadn’t figured out how to handle format 14 cmap subtables.

This reminds me of another idea we discussed in, I think, 2019, from Monotype to introduce a `cmap` subtable that would map individual characters to sequences of glyphs. Then the pre-composed Unicode characters wouldn't need to have their own glyphs. Back then we dropped the idea for backwards-compat reasons. But maybe we can pick it up now?



COLR, MATH:

We noted that the proposal doesn’t include any integration for COLR or MATH tables. There might be several things to consider in relation to the MATH table, and we have no concern with leaving that for future consideration.



But COLR might not be too difficult. So, we think it’s worth discussing options:

  1.  Postpone for future consideration.
  2.  Create a new major version — i.e., a new table tag — to design a table with wide glyph IDs (it wouldn’t need to support narrow IDs).
  3.  Create a minor version enhancement (COLR v2) that maintains backward compatibility while adding wide support.



The third option would need to add new offsets in the header for wide variants of base glyph and clip lists, with new BaseGlyphPaintRecord2 and ClipRecord2 formats. (There’d also need to be a new PaintGlyph format, but that will be true regardless.)



We haven’t yet decided which option we prefer; we just want to get it into discussion.

My preference is to introduce PaintGlyphs with wide GIDs without bumping the format number for now, and postpone ClipList and other enhancements to a future v2 version. Note that there already exist COLRv1 fonts that hit the 64k glyph limit because of all the components. Those would become feasible with just a new PaintGlyph2 / PaintColrGlyph2 / etc. extension and do not need the full ClipBox etc. widening.




Max profile:

The current proposal doesn’t make any change wrt ‘maxp’, other than to say numGlyphs isn’t used for wide-GID support. In a hybrid font, it’s unclear what font developers should do with all the other maxp members: if they’re set as appropriate for narrow GIDs, then the values may not work for wide GIDs and the app could run out of resources. On the other hand, if the values are set for wide GIDs, those can work for both narrow and wide, but for older software could lead to over-allocation of unused resources.



Since we’re already considering glyf/loca and GLYF/LOCA that can exist side by side, it seems simple and clean to define a MAXP table that gets used only in conjunction with GLYF/LOCA. These tables are small, so the file size impact is negligible.

How real is the use of max profile data these days? My understanding is that since the data cannot be trusted anyway, software doesn't rely on it.



GPOS/GSUB:

It appears the proposal doesn’t yet include wide versions for common table formats that will be required (e.g., coverage). These will, of course, be needed.

I'm surprised by that. But you are right, they seem to be missing from the PDF document. @Liam Quin<mailto:liam at fromoldbooks.org>

The proposal is:

  https://github.com/harfbuzz/boring-expansion-spec/issues/30


This may be an opportunity to deprecate certain formats from use in wide-GID fonts. E.g., GSUB type 5 and GPOS type 7 (contextual) were effectively obsoleted when the chaining contextual formats were added. If we agreed, then Contextual positioning / substitution subtable formats 4 – 6 wouldn’t need to be added.

I'm ambivalent here. Adding them is simple enough for me and keeps consistency.


Various formats are proposed using uint24 for subtable counts and Offset24 for subtable offsets. This could turn into a real limitation. For example, consider single substitution format 4: if glyphCount were 5,592,406, then the size of the substituteGlyphIDs[] array would exceed 0xFFFFFF and Offset24 for coverageOffset would not work. We’re inclined to make offsets, and any counts not limited by 24-bit GIDs, 32-bit.
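The arithmetic behind that example, as a quick sanity check (24-bit glyph IDs mean 3 bytes per substituteGlyphID):

```python
# Overflow check for single substitution format 4: does the
# substituteGlyphIDs[] array alone outgrow what an Offset24 can address?
glyph_count = 5_592_406
array_bytes = glyph_count * 3      # 3 bytes per 24-bit glyph ID
print(array_bytes)                 # 16777218
print(array_bytes > 0xFFFFFF)      # True: beyond Offset24 reach (16777215)
```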

In those extreme cases the subtable can be broken down into more, like we currently do with 16bit offsets. I don't think it's a realistic limitation, but happy to bump all Coverage and ClassDef offsets to 32bit.

Thanks,

behdad
