[MPEG-OTSPEC] Does a rendering system know if a glyph requested via a variation selector is not available in a font?

Fri Jun 28 04:21:01 CEST 2024

Hin-Tak,

For better or worse, I am effectively the caretaker of the history of much of the CJK-related type activity that took place at Adobe over the last 30+ years, including the development of the Source Han and Noto CJK Pan-CJK typefaces, which are clones of one another.

As for your observations, particularly that the lookup of UVSes in Source Han is suboptimal: that was intentional. While I have been the IVD Registrar since May 2011, the registration of virtually all Adobe-Japan1 IVSes was performed by my former Adobe colleague, Eric Muller. I suspect that your observation concerns the Variation Selector that is associated with what is deemed the default UVS, meaning that the Format 14 'cmap' subtable defers to the Format 12 (or 4) 'cmap' subtable for the GID. When the first (and by far the largest) batch of Adobe-Japan1 IVSes was registered in the IVD, the lowest Variation Selector (in code point order) was intentionally not associated with the UVS that is considered the default (aka encoded) one. This was done on purpose so that implementations would not assume that the lowest Variation Selector always selects the default glyph.
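To make the default-UVS behaviour concrete, here is a minimal sketch of how a lookup might resolve a <base, selector> pair. It assumes the Format 14 data has already been parsed into plain dictionaries; the data model, the function name `resolve_uvs` and the glyph names are hypothetical. (fontTools, for example, exposes similar data as the `uvsDict` attribute of its Format 14 subtable, with `None` marking default UVS entries.) In the made-up sample data the default UVS deliberately uses U+E0101 rather than the lowest selector, U+E0100, so nothing in the code depends on selector order.

```python
from typing import Dict, Optional, Set

# Hypothetical parsed form of the 'cmap' subtables (not a real library API):
#   cmap12:          code point -> glyph name (Format 12 or 4 subtable)
#   default_uvs:     selector -> set of base code points whose glyph comes
#                    from cmap12 (Format 14 "default UVS" records)
#   non_default_uvs: selector -> {base code point -> glyph name}
#                    (Format 14 "non-default UVS" records)


def resolve_uvs(
    base: int,
    vs: int,
    cmap12: Dict[int, str],
    default_uvs: Dict[int, Set[int]],
    non_default_uvs: Dict[int, Dict[int, str]],
) -> Optional[str]:
    """Return the glyph for the sequence <base, vs>, or None if the font
    does not list that sequence at all."""
    # Non-default UVS: the Format 14 subtable names the glyph itself.
    glyph = non_default_uvs.get(vs, {}).get(base)
    if glyph is not None:
        return glyph
    # Default UVS: Format 14 defers to the Format 12 (or 4) subtable.
    if base in default_uvs.get(vs, set()):
        return cmap12.get(base)
    # Not listed: the sequence is unsupported; the caller decides whether
    # to fall back to the base glyph.
    return None


# Made-up data: the default UVS is U+E0101, not the lowest selector U+E0100.
CMAP12 = {0x845B: "base.glyph"}
DEFAULT_UVS = {0xE0101: {0x845B}}
NON_DEFAULT_UVS = {0xE0100: {0x845B: "variant.a"}, 0xE0102: {0x845B: "variant.b"}}

if __name__ == "__main__":
    for selector in (0xE0100, 0xE0101, 0xE0102, 0xE0103):
        print(hex(selector), resolve_uvs(0x845B, selector, CMAP12, DEFAULT_UVS, NON_DEFAULT_UVS))
```

Returning None instead of silently substituting the base glyph keeps "unsupported" distinguishable from "default", which is the distinction the original question is about.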

BTW, you may be interested in the "IVS Test" project that I started while at Adobe:

https://github.com/adobe-fonts/ivs-test/

Regards...

-- Ken

> On Jun 27, 2024, at 16:53, Hin-Tak Leung via mpeg-otspec <mpeg-otspec at lists.aau.at> wrote:
> 
> This is probably fairly well known among Adobe folks, and perhaps among Google Noto folks too. I have added a little more code to the submitted example on Adobe Source Han Sans JP to dump some UVS statistics (so the findings probably apply to Noto CJK too). The current usage is as follows: just under 60,000 glyphs are base/canonical(?) single-character glyphs. About 1,400 characters map to multiple glyphs via variation selectors. The largest number of variants for a single character is 15, the second largest is 8, and many characters have 2 or 3 variants. I guess the average for characters that have variants is under 4, and roughly 60,000 + 4 × 1,400 = 65,600, which is just over 65,535.
> (We will be over 64K glyphs soon... hurray!)
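A minimal sketch of the kind of UVS statistics described above. It assumes the Format 14 data has been parsed into two hypothetical dictionaries (selector -> set of base code points for default UVS records, selector -> {base -> glyph name} for non-default ones); the function names are made up. It counts listed sequences per base character, which can overcount distinct glyphs when several selectors resolve to the same glyph. The second function just redoes the back-of-the-envelope check against 65,535, the largest value the 16-bit `numGlyphs` field in 'maxp' can hold.

```python
from collections import Counter
from typing import Dict, Set, Tuple


def uvs_stats(
    default_uvs: Dict[int, Set[int]],
    non_default_uvs: Dict[int, Dict[int, str]],
) -> Tuple[int, int, float, Counter]:
    """Return (number of base characters with UVSes, max sequences for one
    base, mean sequences per such base, histogram of sequence counts)."""
    per_base: Counter = Counter()
    for bases in default_uvs.values():
        for base in bases:
            per_base[base] += 1
    for mapping in non_default_uvs.values():
        for base in mapping:
            per_base[base] += 1
    counts = list(per_base.values())
    if not counts:
        return 0, 0, 0.0, Counter()
    return len(counts), max(counts), sum(counts) / len(counts), Counter(counts)


def exceeds_16bit_glyph_count(base_glyphs: int, chars_with_variants: int,
                              avg_variants: float) -> bool:
    """Rough estimate from the thread: base glyphs plus an average number of
    variant glyphs per character, compared with the 16-bit limit."""
    return base_glyphs + avg_variants * chars_with_variants > 65_535


if __name__ == "__main__":
    default = {0xE0101: {0x845B, 0x8FBB}}
    non_default = {0xE0100: {0x845B: "a", 0x8FBB: "b"}, 0xE0102: {0x845B: "c"}}
    print(uvs_stats(default, non_default))
    print(exceeds_16bit_glyph_count(60_000, 1_400, 4))
```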
> 
> I had a look at some of them myself: some of the characters that have variants are quite common, e.g. the "loong" character (as people will tell you, this year is the "year of the loong" rather than the "year of the dragon" in the Chinese zodiac... the Chinese loong is a majestic creature, quite different from the evil Western dragon...) and the given name of the pianist Lang-Lang (his surname and given name are transliterated to the same English word but are different characters, and one of them has a few glyph variants). They aren't really exotic variants: most native readers would recognise and accept the different variants as valid, while having an individual/regional preference for which one to use. It's a bit like spelling "favourite/favorite" etc.
> 
> The order/numbering of the variants is a bit ad hoc, though (and it differs between characters that have 2 variants and those that have 15), so it is probably vendor- and font-version-specific. And remember that 1,400 is a small number compared to 60,000.
> 
> Back to the original question: it is computationally cheap to check whether the glyph IDs for a character with and without a selector agree, or whether the selector mapping is missing altogether. It is more of a UI/application issue than a rendering-system one.
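As a rough illustration of why this check is cheap, here is a sketch that classifies a single request by comparing the glyph for <base, selector> with the glyph for the base character alone. It assumes hypothetical pre-parsed dictionaries rather than the raw subtables (a real Format 14 subtable stores sorted records that are binary-searched, but the logic is the same); the function name and category strings are made up for this example.

```python
from typing import Dict, Set


def classify_request(
    base: int,
    vs: int,
    cmap12: Dict[int, str],
    default_uvs: Dict[int, Set[int]],
    non_default_uvs: Dict[int, Dict[int, str]],
) -> str:
    """Compare the glyph for <base, vs> with the glyph for base alone.

    cmap12:          code point -> glyph name (Format 12 or 4 subtable)
    default_uvs:     selector -> base code points (Format 14 default UVS)
    non_default_uvs: selector -> {base -> glyph name} (non-default UVS)
    """
    base_glyph = cmap12.get(base)
    if base_glyph is None:
        return "no-base-glyph"      # the character itself is not in the font
    if base in default_uvs.get(vs, set()):
        return "same-as-base"       # default UVS: defers to Format 12/4
    glyph = non_default_uvs.get(vs, {}).get(base)
    if glyph is None:
        return "missing"            # sequence not supported by this font
    return "same-as-base" if glyph == base_glyph else "distinct"
```

An application could, for instance, warn the user only in the "missing" case; the renderer itself needs nothing beyond these few lookups.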
> 
> I don't quite get the construction of Adobe Source Han Sans: the lookup is not minimal (i.e. not all selectors are distinct; some just map back to the "base" glyph), and it is not exhaustive either (i.e. it does not fill in the "upper" selectors by mapping them to the base). I don't expect the latter, as it wastes space, but I sort of expect the former, i.e. that the selectors are distinct and minimal.
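A sketch of how one might audit a font for the "not minimal" pattern described here: for each base character, list the selectors that end up at the base glyph, either through a default UVS record or through a non-default record that happens to name the same glyph as the Format 12 (or 4) subtable. The dictionary layout and the function name are hypothetical. Going by Ken's reply, such a selector is a deliberate default UVS and need not be the lowest-numbered one.

```python
from typing import Dict, List, Set


def selectors_resolving_to_base(
    cmap12: Dict[int, str],
    default_uvs: Dict[int, Set[int]],
    non_default_uvs: Dict[int, Dict[int, str]],
) -> Dict[int, List[int]]:
    """Map each base code point to the sorted selectors whose glyph is
    the same as the base glyph from the Format 12 (or 4) subtable."""
    result: Dict[int, List[int]] = {}
    # Default UVS records resolve to the base glyph by definition.
    for vs, bases in default_uvs.items():
        for base in bases:
            result.setdefault(base, []).append(vs)
    # Non-default records that explicitly name the base glyph have the
    # same effect (they could have been default UVS records instead).
    for vs, mapping in non_default_uvs.items():
        for base, glyph in mapping.items():
            if cmap12.get(base) == glyph:
                result.setdefault(base, []).append(vs)
    for selectors in result.values():
        selectors.sort()
    return result
```

An empty result would correspond to the "distinct and minimal" layout expected above; a font built along the lines Ken describes would report its default UVS selectors here.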