From vladimir.levantovsky at gmail.com  Thu Jul 11 22:09:46 2024
From: vladimir.levantovsky at gmail.com (Vladimir Levantovsky)
Date: Thu, 11 Jul 2024 16:09:46 -0400
Subject: [MPEG-OTSPEC] Text of ISO/IEC 14496-22/CD is now available for review

Dear all,

The text of the Committee Draft of the 5th edition ISO/IEC 14496-22/CD "Open Font Format" is now available for download and review:
https://www.mpeg.org/wp-content/uploads/mpeg_meetings/146_Rennes/w23797.zip

Once the ballot is open, any changes you wish to make in the current text need to be submitted via your respective National Bodies as comments, using the following ISO commenting template. The CD text is available for download in both "tracked changes" and "clean" versions. When submitting comments, please reference the clean version of the text for clarity and correct page number references.

Please note that if the proposed change requires a substantial amount of text to be replaced, a comment may reference a future input contribution document with the detailed proposed changes. Any such document would have to be registered and uploaded in advance, to reserve the document number and provide comments before the ballot closing date (which is yet to be announced).
Please also note that different National Bodies have their own additional processes for comment submission and may impose earlier deadlines than the ballot closing date.

Thank you for your contributions and hard work!
Vladimir

From eb2mmrt at gmail.com  Sat Jul 13 01:36:55 2024
From: eb2mmrt at gmail.com (MURATA)
Date: Sat, 13 Jul 2024 08:36:55 +0900
Subject: [MPEG-OTSPEC] Text of ISO/IEC 14496-22/CD is now available for review

To the editors,

I have not read this CD carefully yet, but I am happy to report that neither "MUST" nor "must" appears in this document. Unfortunately, "may not" is still used.
I will try to make JP raise a comment.

While editing the OOXML specification (ISO/IEC 29500-1, 2, 3, and 4), ISO instructed SC34/WG4 not to use normative modal verbs (such as "shall", "may", and "should") in non-normative text such as examples and notes. So I wrote a program to enumerate all such occurrences. I will modify this program and use it for this document.

Regards,
Makoto
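Makoto's program is not included in the thread. As a rough illustration of the idea only, the Python sketch below flags the modal verbs "shall", "should", "may" and "must" inside non-normative paragraphs. It assumes the CD text has already been exported to plain text with one paragraph per string, and that notes and examples begin with "NOTE" or "EXAMPLE" (optionally numbered), following ISO drafting conventions. The function name `find_modals_in_informative_text` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed convention: informative paragraphs start with "NOTE" or "EXAMPLE",
# optionally followed by a number ("NOTE 2", "EXAMPLE 1").
INFORMATIVE = re.compile(r"^\s*(?:NOTE|EXAMPLE)(?:\s+\d+)?\b")

# Modal verbs that carry normative force in ISO drafting (whole words only).
MODAL = re.compile(r"\b(shall|should|may|must)\b", re.IGNORECASE)


def find_modals_in_informative_text(paragraphs: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (paragraph index, modal verb as written) for each modal verb
    found inside a NOTE or EXAMPLE paragraph."""
    hits = []
    for index, para in enumerate(paragraphs):
        if not INFORMATIVE.match(para):
            continue  # normative text is allowed to use these verbs
        hits.extend((index, m.group(1)) for m in MODAL.finditer(para))
    return hits


paragraphs = [
    "The offset shall be a 32-bit value.",
    "NOTE 1 Implementations may cache this table.",
    "EXAMPLE A font should include a 'cmap' table.",
    "NOTES are not flagged, even if they may look similar.",
]
print(find_modals_in_informative_text(paragraphs))
# prints: [(1, 'may'), (2, 'should')]
```

The `\b` word boundaries keep words such as "mayor" from being flagged. A real checker would also need to handle multi-paragraph notes and notes inside tables, which this sketch ignores.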
From lunde at unicode.org  Mon Jul 15 05:45:29 2024
From: lunde at unicode.org (Ken Lunde)
Date: Sun, 14 Jul 2024 20:45:29 -0700
Subject: [MPEG-OTSPEC] Does a rendering system know if a variation selector requested glyph is not available in a font?
In-Reply-To: <87213699.19147171.1719712990690@mail.yahoo.com>
References: <459491008.18128671.1719528838728.ref@mail.yahoo.com> <459491008.18128671.1719528838728@mail.yahoo.com> <34D8BB87-C9DF-4E9C-9FA6-A2D562204D1D@unicode.org> <87213699.19147171.1719712990690@mail.yahoo.com>
Message-ID: <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org>

Hin-Tak,

Apologies for the delay in replying. I spent the last week working on Unicode- and IRG-related matters.

I just compared the Format 14 'cmap' subtables of the latest Source Han Sans (Version 2.004) and Source Han Serif (Version 2.002) fonts, and they are "pretty much" in sync. (I developed tools for doing such a comparison.) I found only four differences, all of which came down to whether the sequences are "default" or "non-default," which may actually be a bug. That is no longer my problem, but I can still confidently state that their UVSes are "pretty much" in sync.

I am, however, curious about your "suboptimal" claims. I suspect that the Format 14 'cmap' subtable itself may be suboptimal, as the "IVS Test" project sort of demonstrates, given the sheer size of its 'cmap' table. As the number of UVSes increases, the optimization decreases, or rather, the fact that it is suboptimal becomes more apparent.

Regards...
-- Ken

> On Jun 29, 2024, at 19:03, Hin-Tak Leung wrote:
>
> On Friday 28 June 2024 at 06:12:44 BST, Ken Lunde wrote:
>
> > Hin-Tak,
> >
> > For better or worse, I am effectively the caretaker of the history of much of the CJK-related type activities that took place at Adobe over the last 30+ years, to include the development of the Source Han and Noto CJK Pan-CJK typefaces, which are clones of one another.
> >
> > About the observations that you made, particularly about the lookup of UVSes in Source Han being suboptimal, that was intentional. While I have been the IVD Registrar since May of 2011, the registration of virtually all Adobe-Japan1 IVSes was performed by my former Adobe colleague, Eric Muller. I suspect that your observation is about the Variation Selector that is associated with what is deemed the default UVS, meaning that the Format 14 'cmap' subtable defers to the Format 12 (or 4) 'cmap' subtable for the GID. When the first (and by far, largest) batch of Adobe-Japan1 IVSes was registered in the IVD, it was intentional that the lowest (by code point order) Variation Selector was not associated with the UVS that is considered the default (aka encoded) one. This was purposefully done so that implementations would not make such an assumption.
> >
> > BTW, you may be interested in the "IVS Test" project that I started while at Adobe:
> >
> > https://github.com/adobe-fonts/ivs-test/
>
> Thanks Ken, for the anecdotes about the development history. I am aware that technical decisions are often made not entirely based on technical considerations. They may not even be optimal at the time, and certainly not in hindsight. It is always interesting to learn how "oddities" come to be.
>
> It makes a lot of sense to intentionally NOT associate the lowest variation selector with the default. Technologically it is redundant (one can save one code point by just "spec-ing it out" and removing it, gaining the use of one empty slot).
> A lot of parties are going to argue that they want their favourite as the default, so "default" in this case is a political minefield too.
>
> I was curious about the non-optimalness of the format 14 cmap in Adobe Source Han Sans, and wondered whether it is in sync with the Serif font. That is, two glyph shapes can be non-degenerate and different in the serif font (e.g. a brush stroke tapering from top right to bottom left, vs the reverse tapering from bottom left to top right) but become identical in the Sans font. But I found that the serif font has an entirely different versioning and release schedule. While its UVS table feels more optimal, no conclusion could be drawn from its relationship with the Sans font. There is probably another interesting story there.
>
> Thanks for the URL for the ivs-test; it looks to be an interesting "stress test" benchmarking sample for performance in related software/code paths!
>
> Regards,
> Hin-Tak

From htl10 at users.sourceforge.net  Tue Jul 16 02:27:36 2024
From: htl10 at users.sourceforge.net (Hin-Tak Leung)
Date: Tue, 16 Jul 2024 00:27:36 +0000 (UTC)
Subject: [MPEG-OTSPEC] Does a rendering system know if a variation selector requested glyph is not available in a font?
In-Reply-To: <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org>
References: <459491008.18128671.1719528838728.ref@mail.yahoo.com> <459491008.18128671.1719528838728@mail.yahoo.com> <34D8BB87-C9DF-4E9C-9FA6-A2D562204D1D@unicode.org> <87213699.19147171.1719712990690@mail.yahoo.com> <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org>
Message-ID: <1317480633.1427931.1721089656328@mail.yahoo.com>

Hi Ken,

Apologies, I double-checked; I think you are right about them being more or less in sync. For some reason I had the mistaken impression that one is v2... and the other v4, with widely different release dates. Some of the other (non-Han) Adobe Source fonts do indeed seem to be at v4...
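Ken's comparison tools are not shown in the thread. A minimal sketch of such a check follows, assuming each font's format 14 data has already been extracted into a Python dict keyed by (base code point, variation selector), holding a glyph ID for a non-default UVS and `None` for a default UVS. fontTools, for example, exposes this information as `uvsDict` on its format 14 subtable (keyed by selector, with glyph names rather than IDs), so a small conversion step would be needed. Glyph IDs are deliberately not compared, because two different designs such as Sans and Serif assign unrelated IDs. Only the set of sequences and each sequence's default/non-default status are checked, which is the kind of difference Ken reports. The names `compare_format14`, `VarSeq` and `Format14Map` are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# A variation sequence: (base code point, variation selector code point).
VarSeq = Tuple[int, int]

# Format 14 data for one font: glyph ID for a non-default UVS, or None for a
# default UVS (whose glyph comes from the font's format 4/12 subtable).
Format14Map = Dict[VarSeq, Optional[int]]


def compare_format14(a: Format14Map, b: Format14Map) -> Dict[str, Set[VarSeq]]:
    """Report sequences present in only one font, and sequences whose
    default/non-default status differs between the two fonts."""
    keys_a, keys_b = set(a), set(b)
    status_differs = {
        seq for seq in keys_a & keys_b
        # Compare only whether each entry is default (None); glyph IDs from
        # two different designs are not comparable.
        if (a[seq] is None) != (b[seq] is None)
    }
    return {
        "only_in_a": keys_a - keys_b,
        "only_in_b": keys_b - keys_a,
        "status_differs": status_differs,
    }


# Made-up sample data, not taken from the real Source Han fonts.
sans = {(0x4E00, 0xE0100): None, (0x4E01, 0xE0100): None, (0x4E01, 0xE0101): 1234}
serif = {(0x4E00, 0xE0100): None, (0x4E01, 0xE0100): 5678, (0x4E01, 0xE0101): 9012}

for kind, seqs in compare_format14(sans, serif).items():
    for base, vs in sorted(seqs):
        print(f"{kind}: U+{base:04X} U+{vs:04X}")
# prints: status_differs: U+4E01 U+E0100
```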
As for the sub-optimal claim, I just meant minimal in terms of the number of references to distinct glyph shapes (and minimal table size). Reading UTS #37 properly, I see that this is very much not the case, and as you said, as the number of UVSes increases, the number of registered collections increases. Or rather, the other way round: as the number of UVSes increases AS A CONSEQUENCE OF more registered collections, there will be partially overlapping collections, and redundant/duplicated references to exactly the same shape across different collections, and they will take up more variation selector slots with the same glyph shapes.

In fact, if I read UTS #37 correctly (sorry, this sounds like asking the author to explain the subtlety/intention/clarification; I see you wrote UTS #37), as a hypothetical scenario, it is entirely possible for a later version of a font to have no new glyphs compared to an earlier version, but just a much larger UVS cmap. And it seems to imply that a (specific versioned instance of a) UVS cmap should have a corresponding (specific versioned instance of) IVD_Collections + IVD_Sequences?

To put it in simpler terms: to get around the "suboptimal" construction of the current Adobe Source Han fonts, could a vendor (say, Google) register a "web font usage UVS collection" with exactly one IVS per distinct glyph, and ship a font that does not support any other collections?

Hidden in there is the idea that the current Adobe Source Han fonts must have a (versioned) list of (versioned) IVS collections they claim to support, and that it should be possible to check the implementation of a UVS cmap against that text-based list?

I hope this is not too tedious a discussion...

Regards,
Hin-Tak

From htl10 at users.sourceforge.net  Tue Jul 16 02:48:30 2024
From: htl10 at users.sourceforge.net (Hin-Tak Leung)
Date: Tue, 16 Jul 2024 00:48:30 +0000 (UTC)
Subject: [MPEG-OTSPEC] Does a rendering system know if a variation selector requested glyph is not available in a font?
In-Reply-To: <1317480633.1427931.1721089656328@mail.yahoo.com>
References: <459491008.18128671.1719528838728.ref@mail.yahoo.com> <459491008.18128671.1719528838728@mail.yahoo.com> <34D8BB87-C9DF-4E9C-9FA6-A2D562204D1D@unicode.org> <87213699.19147171.1719712990690@mail.yahoo.com> <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org> <1317480633.1427931.1721089656328@mail.yahoo.com>
Message-ID: <1651110466.1422018.1721090910465@mail.yahoo.com>

Argh, looking at the dates again: https://github.com/adobe-fonts/source-han-serif 2.002 was released in Aug 2023, while https://github.com/adobe-fonts/source-han-sans 2.004R was released in Apr 2021. So the "later" 2.004R release is 2 years earlier than 2.002. That's why I mistakenly thought that they might not be in sync. But checking the UVS data itself, they are more or less the same.

From lunde at unicode.org  Tue Jul 16 03:57:21 2024
From: lunde at unicode.org (Ken Lunde)
Date: Mon, 15 Jul 2024 18:57:21 -0700
Subject: [MPEG-OTSPEC] Does a rendering system know if a variation selector requested glyph is not available in a font?
In-Reply-To: <1317480633.1427931.1721089656328@mail.yahoo.com>
References: <459491008.18128671.1719528838728.ref@mail.yahoo.com> <459491008.18128671.1719528838728@mail.yahoo.com> <34D8BB87-C9DF-4E9C-9FA6-A2D562204D1D@unicode.org> <87213699.19147171.1719712990690@mail.yahoo.com> <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org> <1317480633.1427931.1721089656328@mail.yahoo.com>

Hin-Tak,

Not tedious at all. In fact, I might be the only person on the planet who can meaningfully respond to your questions.

In any case, I now understand your sub-optimal claim, and it has merit. I can explain the background. Adobe-Japan1 was the first IVD collection to be registered (in 2007), and the philosophy that was used was to register sequences for *every* ideograph in Adobe-Japan1-6, regardless of whether a particular ideograph had any unencoded variants. The number of registered Adobe-Japan1 sequences is 14,684, but the number of unencoded glyphs is only 1,372. This means that the number of base characters with no variants is on the order of 12,000. For example, the common ideograph U+4E00 一, which has no variant forms, has a registered Adobe-Japan1 IVS. These are referred to as "default" UVSes, meaning that the Format 14 'cmap' subtable stores only the sequence, not a GID. This means that the default glyph for the base character, as specified in the Format 4 or 12 'cmap' subtable, should be used to render the sequence. I was not the person who insisted on this, and given that registered IVSes cannot be unregistered, we need to live with it. Luckily, subsequent IVD collections did not follow this philosophy, and instead register an IVS for a base character only when there are one or more unencoded variants. In retrospect, that should have been done for the Adobe-Japan1 IVD collection.
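The subject of this thread is whether a renderer can tell that a requested variation sequence is unavailable. The format 14 design Ken describes gives three outcomes: a sequence is listed as a non-default UVS (with its own glyph ID), listed as a default UVS (no glyph ID; the glyph comes from the format 4 or 12 subtable), or not listed at all. The Python sketch below models one `varSelector` record as a list of (startUnicodeValue, additionalCount) ranges plus a dict of unicodeValue to glyphID mappings, mirroring the Default UVS and Non-Default UVS tables of the OpenType 'cmap' format 14 subtable. The in-memory layout and the names `VarSelectorRecord` and `lookup_variation_sequence` are hypothetical, and falling back to the base glyph for an unlisted sequence is one possible renderer policy, not something the thread prescribes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VarSelectorRecord:
    """Hypothetical in-memory view of one varSelector record in a format 14 subtable."""
    # Default UVS table: (startUnicodeValue, additionalCount) ranges; no glyph IDs.
    default_ranges: List[Tuple[int, int]] = field(default_factory=list)
    # Non-Default UVS table: unicodeValue -> glyphID.
    non_default: Dict[int, int] = field(default_factory=dict)


def lookup_variation_sequence(
    base_cmap: Dict[int, int],               # format 4/12 data: code point -> glyph ID
    format14: Dict[int, VarSelectorRecord],  # variation selector -> record
    base: int,
    selector: int,
) -> Tuple[Optional[int], bool]:
    """Return (glyph ID to use, whether the font lists the sequence)."""
    record = format14.get(selector)
    if record is not None:
        # Non-default UVS: the format 14 subtable supplies the glyph directly.
        if base in record.non_default:
            return record.non_default[base], True
        # Default UVS: only the sequence is listed; defer to the base cmap.
        for start, additional in record.default_ranges:
            if start <= base <= start + additional:
                return base_cmap.get(base), True
    # Sequence not listed: the renderer knows it is unsupported. Falling back
    # to the base character's glyph here is an assumed policy.
    return base_cmap.get(base), False


# Made-up glyph IDs and sequences, not real Adobe-Japan1 data.
base_cmap = {0x4E00: 10, 0x4E01: 20}
format14 = {
    0xE0100: VarSelectorRecord(default_ranges=[(0x4E00, 1)]),  # U+4E00..U+4E01
    0xE0101: VarSelectorRecord(non_default={0x4E01: 99}),
}
print(lookup_variation_sequence(base_cmap, format14, 0x4E01, 0xE0101))  # (99, True)
print(lookup_variation_sequence(base_cmap, format14, 0x4E00, 0xE0102))  # (10, False)
```

Under the fallback policy assumed here, a default UVS and an unlisted sequence end up drawing the same glyph; only the second element of the result differs, which is what lets the renderer know whether the font actually lists the sequence.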
About the Source Han and Noto CJK fonts, their Japanese versions support the Adobe-Japan1 IVD collection, along with SVSes (Standardized Variation Sequences) that correspond to supported CJK Compatibility Ideographs, slashed zero glyphs, and East Asian punctuation. Anything outside that scope does not use any variation sequences. Note that there will be new SVSes for the smart quotes in Unicode Version 16.0, and whether they are supported in the Source Han and Noto CJK fonts is up to our friends at Adobe and Google. See the end of the following page:

https://www.unicode.org/alloc/Pipeline.html

Regards...

-- Ken

From htl10 at users.sourceforge.net  Wed Jul 17 04:40:53 2024
From: htl10 at users.sourceforge.net (Hin-Tak Leung)
Date: Wed, 17 Jul 2024 02:40:53 +0000 (UTC)
Subject: [MPEG-OTSPEC] Does a rendering system know if a variation selector requested glyph is not available in a font?
References: <459491008.18128671.1719528838728.ref@mail.yahoo.com> <459491008.18128671.1719528838728@mail.yahoo.com> <34D8BB87-C9DF-4E9C-9FA6-A2D562204D1D@unicode.org> <87213699.19147171.1719712990690@mail.yahoo.com> <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org> <1317480633.1427931.1721089656328@mail.yahoo.com>
Message-ID: <1589827470.2148326.1721184053809@mail.yahoo.com>

Hi Ken,

While discussing implementations of UVS tables falls quite firmly within the topics of the list, I am somewhat conscious that others might not be too interested in specific details about Adobe Source fonts... Anyway.
On the non-optimal observation, I am referring to the fact that sometimes the default UVS is encoded with VS17, or even VS22 (in the case of U+53A9). And in the two cases with 6 variants, U+51DE and U+97FF have VS1 and VS18 mapped to the same non-default GID. I looked up what they are - apparently there are semantic variants, specialized semantic variants, traditional variants and simplified variants - thus the semantic variant / specialized semantic variants might be logically different from the simplified/traditional variants, but "happen" to be of the same shape. I know I am "preaching to the priest", as you wrote UAX #38, which details the Unihan variant properties too :-). The https://www.unicode.org/alloc/Pipeline.html URL is also interesting reading - I didn't know "left justified form" and "right justified form" are a thing. This points to an oversight of mine: I was just looking at duplicate GID entries/shapes in the UVS table, and forgot that positioning (of the same glyph) could be a variant form. Those could be East Asian punctuation. Dropping the default UVS entries would be an instant size saving of a few 10k's (?), with no functional impact. Hindsight is a wonderful thing. Thanks for an interesting discussion. Hin-Tak On Tuesday 16 July 2024 at 03:02:58 BST, Ken Lunde wrote: Hin-Tak, Not tedious at all. In fact, I might be the only person on the planet who can meaningfully respond to your questions. In any case, I now understand your sub-optimal claim, and it has merit. I can explain the background. Adobe-Japan1 was the first IVD collection to be registered (in 2007), and the philosophy that was used was to register sequences for *every* ideograph in Adobe-Japan1-6 regardless of whether a particular ideograph had any unencoded variants. The number of registered Adobe-Japan1 sequences is 14,684, but the number of unencoded glyphs is only 1,372. This means that the number of base characters with no variants is on the order of 12,000.
For example, the common ideograph U+4E00 一, which has no variant forms, has a registered Adobe-Japan1 IVS. These are referred to as "default" UVSes, meaning that the Format 14 'cmap' subtable stores only the sequence, not a GID. This means that the default glyph for the base character as specified in the Format 4 or 12 'cmap' subtable should be used to render the sequence. I was not the person who insisted on this, and given that registered IVSes cannot be unregistered, we need to live with it. Luckily, subsequent IVD collections that were registered did not follow this philosophy, and instead registered an IVS for a base character only when there were one or more unencoded variants. In retrospect, that should have been done for the Adobe-Japan1 IVD collection. About the Source Han and Noto CJK fonts, their Japanese versions support the Adobe-Japan1 IVD collection, along with SVSes (Standardized Variation Sequences) that correspond to supported CJK Compatibility Ideographs, slashed zero glyphs, and East Asian punctuation. Anything outside that scope does not use any variation sequences. Note there will be new SVSes for the smart quotes in Unicode Version 16.0, and whether they are supported in the Source Han and Noto CJK fonts is up to our friends at Adobe and Google. See the end of the following page: https://www.unicode.org/alloc/Pipeline.html Regards... -- Ken > On Jul 15, 2024, at 17:27, Hin-Tak Leung wrote: > > Hi Ken, > > Apologies, I double-checked - I think you are right about them being more or less in sync. For some reason I seem to have had the mistaken impression that one is v2... the other v4 and with widely different release dates. Some of the other, non-Han Adobe Source fonts do indeed seem to be at v4... > > As for the sub-optimal claim, I just meant minimal in terms of numbers of references to distinct glyph shapes (and minimal table size).
Reading UTS #37 properly, I see that this is very much not the case, and as you said, as the number of UVSes increases, the number of registered collections increases. Or rather, the other way round: as the number of UVSes increases AS A CONSEQUENCE OF more registered collections, there will be partially overlapping collections, and redundancies / duplicated references to the exact same shape across different collections, and they will take up more variation selector slots with the same glyph shapes. > > In fact, if I read UTS #37 correctly (sorry, this sounds like asking the author to explain the subtlety/intention/clarification - I see you wrote UTS #37), as a hypothetical scenario, it is entirely possible for a later version of a font to have no new glyphs compared to an earlier version, but just a much larger UVS cmap. And it seems to imply that a (specific versioned instance of) UVS cmap should have a corresponding (specific versioned instance of) IVD_Collections + IVD_Sequences? > > To put it in simpler terms, regarding the "suboptimal" claim about the current construction of Source Han: to get around it, a vendor - say, Google - could register a "web font usage UVS collection" with exactly one IVS per distinct glyph, and ship a font that does not support any other collections? > > Hidden in there is the idea that the current Source Han must have a (versioned) list of (versioned) IVS collections it claims to support - and it should be possible to check the implementation of a UVS cmap against that text-based list? > > I hope this is not too tedious a discussion... > > Regards > Hin-Tak > > > > On Monday 15 July 2024 at 04:51:59 BST, Ken Lunde wrote: > > > Hin-Tak, > > Apologies for the delay in replying. I spent the last week working on Unicode- and IRG-related matters. > > I just compared the Format 14 'cmap' subtables of the latest Source Han Sans (Version 2.004) and Source Han Serif (Version 2.002) fonts, and they are "pretty much" in sync.
(I developed tools for doing such a comparison.) I found only four differences, which were attributed to whether the sequences are "default" or "non-default," which may actually be a bug. That is no longer my problem, but I can still confidently state that their UVSes are "pretty much" in sync. > > I am, however, curious about your "suboptimal" claims. I suspect that the Format 14 'cmap' subtable itself may be suboptimal, as the "IVS Test" project sort of demonstrates, given the sheer size of its 'cmap' table. As the number of UVSes increases, the optimization decreases, or rather, the fact that it is suboptimal becomes more apparent. > > Regards... > > -- Ken From lunde at unicode.org Wed Jul 17 05:47:35 2024 From: lunde at unicode.org (Ken Lunde) Date: Tue, 16 Jul 2024 20:47:35 -0700 Subject: [MPEG-OTSPEC] Does a rendering system know if a variation selector requested glyph is not available in a font?
In-Reply-To: <1589827470.2148326.1721184053809@mail.yahoo.com> References: <459491008.18128671.1719528838728.ref@mail.yahoo.com> <459491008.18128671.1719528838728@mail.yahoo.com> <34D8BB87-C9DF-4E9C-9FA6-A2D562204D1D@unicode.org> <87213699.19147171.1719712990690@mail.yahoo.com> <85E5906C-8AD1-4700-8543-72C15F13186C@unicode.org> <1317480633.1427931.1721089656328@mail.yahoo.com> <1589827470.2148326.1721184053809@mail.yahoo.com> Message-ID: <1ACA29FF-E59D-4011-9445-22A54F64ADB9@unicode.org> Hin-Tak, This is simply due to history. The Adobe-Japan1 IVD collection was the first one to be registered, which was on 2007-12-14. You referenced the use of VS1 (aka U+FE00) and VS18 (aka U+E0101). The difference is that VS1 is among the 16 variation selectors that are used for SVSes (Standardized Variation Sequences) and EVSes (Emoji Variation Sequences), and VS18 is among the 240 variation selectors that are (currently) dedicated for use for the IVD (Ideographic Variation Database). Unicode Version 6.3 (2013) added SVSes for all 1,002 CJK Compatibility Ideographs. 
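The VS numbering used in this exchange follows a fixed rule: VS1 to VS16 are U+FE00..U+FE0F, and VS17 to VS256 (the 240 supplementary selectors) are U+E0100..U+E01EF. A minimal Python sketch of that mapping follows; the function names are hypothetical, and the two-way classification only mirrors the description in this message, not the Unicode data files themselves.

```python
# VS1..VS16 live in the Variation Selectors block (U+FE00..U+FE0F);
# VS17..VS256 live in the Variation Selectors Supplement (U+E0100..U+E01EF).

def vs_to_codepoint(n: int) -> int:
    """Map a variation selector number (1..256) to its code point."""
    if 1 <= n <= 16:
        return 0xFE00 + (n - 1)
    if 17 <= n <= 256:
        return 0xE0100 + (n - 17)
    raise ValueError(f"no such variation selector: VS{n}")


def codepoint_to_vs(cp: int) -> int:
    """Inverse mapping; raises ValueError for non-selector code points."""
    if 0xFE00 <= cp <= 0xFE0F:
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:
        return cp - 0xE0100 + 17
    raise ValueError(f"U+{cp:04X} is not a variation selector")


def selector_kind(cp: int) -> str:
    """Simplified split as described above: the first 16 selectors serve
    SVSes/EVSes, the 240 supplementary ones serve the IVD."""
    return "SVS/EVS range" if codepoint_to_vs(cp) <= 16 else "IVD range"


if __name__ == "__main__":
    for n in (1, 16, 17, 18, 22, 256):
        cp = vs_to_codepoint(n)
        print(f"VS{n} = U+{cp:04X} ({selector_kind(cp)})")
```

For example, VS18 maps to U+E0101 and VS22 to U+E0105, which is why the two selector families can never collide.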
Adobe-Japan1-related resources had mappings to CJK Compatibility Ideographs, and while they could not serve as base characters per UTS #37, Adobe-Japan1 IVSes were registered for them using their canonical equivalents as base characters:

51DE E0101; Adobe-Japan1; CID+20307
97FF E0101; Adobe-Japan1; CID+13337

I am referencing the "Adobe-Japan1_sequences.txt" data file in the Adobe-Japan1-7 Character Collection project, which is the "source of truth" for that glyph set:

https://github.com/adobe-type-tools/Adobe-Japan1/

The following corresponding SVSes were added in Unicode Version 6.3:

51DE FE00; CJK COMPATIBILITY IDEOGRAPH-FA15;
97FF FE00; CJK COMPATIBILITY IDEOGRAPH-FA69;

These became the following additional entries in the "Adobe-Japan1_sequences.txt" data file:

51DE FE00; Standardized_Variants; CID+20307
97FF FE00; Standardized_Variants; CID+13337

In other words, for CJK Compatibility Ideographs U+FA15 and U+FA69, there is both an SVS and a registered IVS. IVSes cannot be unregistered, and the SVSes need to be supported. Luckily, none of the subsequent registered IVD collections needed to deal with this, so there was no duplication of SVSes and registered IVSes among their sequences. BTW, there are 89 cases like this. Search for "# 89 Standardized Variants (Unicode 6.3)" in the "Adobe-Japan1_sequences.txt" data file.
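The duplication described above (the same base character reaching the same CID through both an SVS and an IVS) can be found mechanically. The following Python sketch assumes the line format of the entries quoted above ("BASE VS; COLLECTION; CID+N") and treats "#" as a comment marker; it has not been checked against the full Adobe-Japan1_sequences.txt file, and the function names are hypothetical.

```python
from collections import defaultdict


def parse_sequences(lines):
    """Return (base, selector, collection, cid) tuples; '#' starts a comment.
    Assumes exactly three ';'-separated fields per data line."""
    entries = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        seq, collection, cid_field = (f.strip() for f in line.split(";")[:3])
        base_hex, vs_hex = seq.split()
        cid = int(cid_field.split("+", 1)[1])  # "CID+20307" -> 20307
        entries.append((int(base_hex, 16), int(vs_hex, 16), collection, cid))
    return entries


def svs_ivs_overlaps(entries):
    """Map (base, cid) -> sorted selectors for every base character that
    reaches the same CID through both a Standardized_Variants entry and an
    entry from another collection (for example an Adobe-Japan1 IVS)."""
    groups = defaultdict(set)
    for base, vs, collection, cid in entries:
        groups[(base, cid)].add((vs, collection))
    overlaps = {}
    for key, members in groups.items():
        collections = {c for _, c in members}
        if "Standardized_Variants" in collections and len(collections) > 1:
            overlaps[key] = sorted(vs for vs, _ in members)
    return overlaps


SAMPLE = """\
51DE E0101; Adobe-Japan1; CID+20307
97FF E0101; Adobe-Japan1; CID+13337
51DE FE00; Standardized_Variants; CID+20307
97FF FE00; Standardized_Variants; CID+13337
"""

if __name__ == "__main__":
    found = svs_ivs_overlaps(parse_sequences(SAMPLE.splitlines()))
    for (base, cid), selectors in sorted(found.items()):
        vs_list = ", ".join(f"U+{vs:04X}" for vs in selectors)
        print(f"U+{base:04X} -> CID+{cid} via {vs_list}")
```

Run on the four sample entries, it reports U+51DE -> CID+20307 and U+97FF -> CID+13337, each reachable via both U+FE00 and U+E0101.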
About SVSes for East Asian punctuation, the first batch was added in Unicode Version 12.0 (2019) based on a proposal that I submitted:

3001 FE00; corner-justified form; # IDEOGRAPHIC COMMA
3001 FE01; centered form; # IDEOGRAPHIC COMMA
3002 FE00; corner-justified form; # IDEOGRAPHIC FULL STOP
3002 FE01; centered form; # IDEOGRAPHIC FULL STOP
FF01 FE00; corner-justified form; # FULLWIDTH EXCLAMATION MARK
FF01 FE01; centered form; # FULLWIDTH EXCLAMATION MARK
FF0C FE00; corner-justified form; # FULLWIDTH COMMA
FF0C FE01; centered form; # FULLWIDTH COMMA
FF0E FE00; corner-justified form; # FULLWIDTH FULL STOP
FF0E FE01; centered form; # FULLWIDTH FULL STOP
FF1A FE00; corner-justified form; # FULLWIDTH COLON
FF1A FE01; centered form; # FULLWIDTH COLON
FF1B FE00; corner-justified form; # FULLWIDTH SEMICOLON
FF1B FE01; centered form; # FULLWIDTH SEMICOLON
FF1F FE00; corner-justified form; # FULLWIDTH QUESTION MARK
FF1F FE01; centered form; # FULLWIDTH QUESTION MARK

The second batch is targeted for Unicode Version 16.0, which is scheduled for release in September. Regards... -- Ken > On Jul 16, 2024, at 19:40, Hin-Tak Leung wrote: > > Hi Ken, > > While discussion of UVS table implementations falls quite firmly within the topics of the list, I am somewhat conscious that others might not be too interested in specific details about Adobe Source fonts... Anyway. > > On the non-optimal observation, I am referring to the fact that sometimes the default UVS is encoded with VS17, or even VS22 (in the case of U+53A9). And in the two cases with 6 variants, U+51DE and U+97FF have VS1 and VS18 mapped to the same non-default GID. I looked up what they are - apparently there are semantic variants, specialized semantic variants, traditional variants and simplified variants - thus the semantic variant / specialized semantic variants might be logically different from the simplified/traditional variants, but "happen" to be of the same shape.
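The lookup behaviour discussed throughout this thread can be sketched as follows: a default UVS defers to the Format 4/12 subtable, a non-default UVS carries its own glyph ID, and a sequence found in neither is unsupported. That last case is how a rendering system can tell that a requested variant is not available; what it does next (for example, showing the base character's default glyph) is up to the implementation. This is a minimal Python sketch over already-decoded data, not a binary parser and not any FreeType or HarfBuzz API. The class and function names and the glyph IDs are hypothetical. The size helper assumes the OpenType layout of a DefaultUVS table: a uint32 range count plus 4-byte UnicodeRange records (uint24 start, uint8 additionalCount).

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class VarSelectorRecord:
    var_selector: int
    # DefaultUVS: (startUnicodeValue, additionalCount) pairs, sorted by start
    default_ranges: list = field(default_factory=list)
    # NonDefaultUVS: unicodeValue -> glyphID
    non_default: dict = field(default_factory=dict)

    def in_default(self, cp):
        """True if cp falls inside one of the DefaultUVS ranges."""
        starts = [start for start, _ in self.default_ranges]
        i = bisect_right(starts, cp) - 1
        if i < 0:
            return False
        start, additional = self.default_ranges[i]
        return cp <= start + additional


def lookup_uvs(base, vs, records, base_cmap):
    """Glyph ID for the sequence <base, vs>, or None if unsupported.
    base_cmap stands in for the font's Format 4/12 subtable."""
    rec = records.get(vs)
    if rec is None:
        return None  # the font has no record for this selector
    if rec.in_default(base):
        return base_cmap.get(base)  # default UVS: defer to Format 4/12
    return rec.non_default.get(base)  # explicit glyph, or None


def default_uvs_size(codepoints):
    """Bytes for one DefaultUVS table holding these code points, after
    coalescing contiguous runs (additionalCount is a uint8, so max 255)."""
    ranges = []
    for cp in sorted(set(codepoints)):
        last = ranges[-1] if ranges else None
        if last and cp == last[0] + last[1] + 1 and last[1] < 255:
            last[1] += 1
        else:
            ranges.append([cp, 0])
    return 4 + 4 * len(ranges)


# Hypothetical glyph IDs, not taken from any real font.
base_cmap = {0x4E00: 1200, 0x51DE: 3000}
records = {
    0xE0100: VarSelectorRecord(0xE0100, default_ranges=[(0x4E00, 0)]),
    0xE0101: VarSelectorRecord(0xE0101, non_default={0x51DE: 4567}),
}

if __name__ == "__main__":
    print(lookup_uvs(0x4E00, 0xE0100, records, base_cmap))  # 1200 (default)
    print(lookup_uvs(0x51DE, 0xE0101, records, base_cmap))  # 4567 (non-default)
    print(lookup_uvs(0x4E00, 0xE0101, records, base_cmap))  # None (unsupported)
    print(default_uvs_size([0x4E00, 0x4E01, 0x4E02, 0x4E08]))  # 12
```

As a rough check on the size estimate in this thread: with on the order of 12,000 default entries, the UnicodeRange records come to about 48,000 bytes (12,000 × 4) if no two code points are contiguous. Range coalescing only lowers that figure, so the real saving depends on how the default entries cluster.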