[MPEG-OTSPEC] [EXTERNAL] Re: Cmap format to map 1 char to multiple glyphs?

Skef Iterum skef at skef.org
Thu Dec 21 22:35:45 CET 2023


Hmm, and going down this road a little further, I suppose one could 
require that all of the relevant substitutions happen in a GSUB feature 
with a specific tag, applied before any other, analogous to rvrn. 
That would make validation much more straightforward.
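
As a rough illustration of why a dedicated feature would simplify 
validation, here is a minimal Python sketch. It assumes the font data has 
already been reduced to plain dictionaries. The names 
validate_virtual_gids, glyph_limit, and decomp (standing in for the 
lookups of the hypothetical dedicated feature) are illustrative only and 
are not part of any spec or library.

```python
from typing import Dict, List


def validate_virtual_gids(
    cmap: Dict[int, int],
    decomp: Dict[int, List[int]],
    glyph_limit: int,
) -> List[str]:
    """Check the proposed rule for GIDs at or above glyph_limit.

    cmap:        code point -> GID (hypothetical, already parsed)
    decomp:      virtual GID -> output GIDs from the dedicated feature
                 (hypothetical, already parsed)
    glyph_limit: first GID that has no entry in the other tables

    Returns a list of problem descriptions; empty means the check passed.
    """
    problems = []
    for codepoint, gid in sorted(cmap.items()):
        if gid < glyph_limit:
            continue  # ordinary glyph, nothing to check
        outputs = decomp.get(gid)
        if not outputs:
            problems.append(
                f"U+{codepoint:04X}: virtual GID {gid} has no substitution"
            )
        elif any(g >= glyph_limit for g in outputs):
            problems.append(
                f"U+{codepoint:04X}: virtual GID {gid} maps to a virtual GID"
            )
    return problems


# Assumed toy data: GIDs 0..99 are real, 100 and up are virtual.
example_cmap = {0x41: 10, 0xC1: 100, 0xC9: 101}
example_decomp = {100: [10, 50]}  # Aacute -> A + acute (made-up GIDs)
example_problems = validate_virtual_gids(example_cmap, example_decomp, 100)
```

Because every virtual GID has to be resolved in one feature that runs 
first, the check is a single pass over cmap rather than an exploration of 
every combination of active features.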

Skef

On 12/21/23 13:27, Skef Iterum wrote:
>
> This is just an idea (and perhaps an ill-informed one if I'm not 
> understanding the full problem correctly):
>
> It seems to me that if these cmap proposals were to be adopted it 
> would almost certainly be at the same time as the >64k GID extensions. 
> The problem isn't running out of GIDs, there are plenty. Instead it's 
> removing the overhead associated with those GIDs in various tables.
>
> We've already discussed that there will be some way of determining the 
> equivalent of the maxp GID count going forward, either explicit or 
> based on the length of a certain table. So suppose that we say that any 
> GID higher than that (or, if that's too informal, GIDs higher than the 
> value in some additional font-wide table field, or between the values 
> in two such fields) has the following requirements:
>
>   * It can only appear in cmap and GSUB
>   * Under any valid combination of active GSUB feature tags, the GID
>     must be substituted by other GIDs below the usual limit.
>
> This way:
>
>  1. No additional cmap or GSUB "logic" is necessary for interpreting fonts
>  2. The system already accommodates any relevant form of substitution
>     available in GSUB, now or in the future
>
> The main potential drawback I can see is that validating the GSUB 
> requirement could be tricky, but I'm not sure that iron-clad 
> validation is necessarily a requirement. There are lots of ways that 
> fonts can have bugs that one can't necessarily rule out in advance.
>
> Skef
>
> On 12/21/23 10:00, Peter Constable wrote:
>>
>> It seems to me there’d at least be a compatibility boundary: newer 
>> fonts with 1:m cmap mappings wouldn’t produce desired results in 
>> older software unless the same effect were also implemented in GSUB 
>> lookups.
>>
>> Peter
>>
>> From: mpeg-otspec <mpeg-otspec-bounces at lists.aau.at> On Behalf Of
>> Behdad Esfahbod
>> Sent: Tuesday, December 12, 2023 8:40 PM
>> To: Ned Holbrook <ned at apple.com>
>> Cc: mpeg-otspec at lists.aau.at
>> Subject: [EXTERNAL] Re: [MPEG-OTSPEC] Cmap format to map 1 char to
>> multiple glyphs?
>>
>> Fair.  I'll do some measurements and report back if I find something 
>> interesting.
>>
>>
>> behdad
>> http://behdad.org/
>>
>> On Tue, Dec 12, 2023 at 4:01 PM Ned Holbrook <ned at apple.com> wrote:
>>
>>     My main concern with producing multiple glyphs is that it has
>>     substantial API and tooling implications.
>>
>>
>>
>>         On Dec 12, 2023, at 10:57 AM, Behdad Esfahbod
>>         <behdad at behdad.org> wrote:
>>
>>         On Tue, Dec 12, 2023 at 11:51 AM John Hudson <john at tiro.ca>
>>         wrote:
>>
>>             I proposed that to the OT developer list a long while
>>             ago, and recall that Kamal had a similar idea, initially
>>             in terms of handling Unicode decompositions such that
>>             fonts would not need precomposed diacritics. At the time,
>>             Microsoft thought it unlikely to get traction, as it
>>             implied significant engineering for unclear benefit, but
>>             perhaps the benefit is clearer now? As you say, being
>>             able to decompose a Unicode character to an arbitrary
>>             sequence of glyphs is very useful for Arabic, and
>>             by-passes the need to handle such decompositions in GSUB
>>             prior to other shaping.
>>
>>             I suppose the question is whether there is a significant
>>             benefit to doing this outside of GSUB? — or, indeed, if
>>             there might be a reason it would be preferable in GSUB?
>>
>>             The inconsistency in dot handling in different joining
>>             forms of some Arabic characters means that one doesn’t
>>             always want to up-front decompose some characters to base
>>             grapheme and combining dots, but those could be excluded
>>             from the cmap and passed to GSUB for decomposition in
>>             the joining form features. But that being the case, why
>>             not do it all in GSUB?
>>
>>         Thanks John. The main benefit in my opinion is not allocating
>>         a gid to every precomposed Unicode character, most of them
>>         Latin. The Arabic use-case is extra.
>>
>>         b
>>
>>             JH
>>
>>             On 2023-12-12 9:04 am, Behdad Esfahbod wrote:
>>
>>                 Thank you everyone for the very productive meeting.
>>
>>                 I'd also like to bring this issue up. If there is
>>                 interest, I can work on it. I wrote in my reply to
>>                 Peter earlier:
>>
>>                 /This reminds me of another idea we discussed in, I
>>                 think, 2019, from Monotype to introduce a `cmap`
>>                 subtable that would map individual characters to
>>                 sequences of glyphs. Then the pre-composed Unicode
>>                 characters wouldn't need to have their own glyphs.
>>                 Back then we dropped the idea for backwards-compat
>>                 reasons. But maybe we can pick it up now?/
>>
>>                 This is very useful for Arabic as well...
>>
>>                 behdad
>>                 http://behdad.org/
>>
>>                 _______________________________________________
>>                 mpeg-otspec mailing list
>>                 mpeg-otspec at lists.aau.at
>>                 https://lists.aau.at/mailman/listinfo/mpeg-otspec
>>
>>             -- 
>>             John Hudson
>>
>>             Tiro Typeworks Ltd www.tiro.com <http://www.tiro.com/>
>>
>>             Tiro Typeworks is physically located on islands
>>             in the Salish Sea, on the traditional territory
>>             of the Snuneymuxw and Penelakut First Nations.
>>
>>             __________
>>
>>             EMAIL HOUR
>>
>>             In the interests of productivity, I am only dealing
>>             with email towards the end of the day, typically
>>             between 4PM and 5PM. If you need to contact me more
>>             urgently, please use other means.
>>