<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>While I think everything I said below is a reasonable response, I don't

      think it gets at my real concern, especially given that I have no

      particular objection to dmap in isolation.</p>

    <p>My basic concern is that if we a) add dmap to the spec and b)

      don't say anything else in particular about expected, desired, or

      virtuous font development practices, the result will be that fewer

      fonts will attempt to use the script/langsys mechanism

      correctly, with the effect that in the medium term (at least) font

      consumers will be stuck with loading and picking particular fonts

      for particular language/region use, even if they have an authoring

      environment that handles the tag switching "correctly", or at

      least effectively.</p>

    <p>Maybe that's fine, maybe that's inevitable. If we truly believe

      that it is inevitable, we should deprecate the mechanism by

      recommending "dflt"-only langsys use. (We probably don't believe

      that with enough certainty.)<br>

    </p>

    <p>What I don't give credit to is any sub-argument about what would

      be "our fault". As in "it's just a mechanism, we didn't tell

      people to use it or how to use it". We'll be indicating to people,

      among other things, that saving file space is a virtue and that

      dmap is a way to save file space. They will deduce for themselves

      that constructing multiple GSUBs, each corresponding to a

      different cmap/dmap, and each then able to remap back to various

      langsys tags, will:</p>

    <ol>

      <li>Use significantly more file space</li>

      <li>Be a pain to construct</li>

      <li>Be a much bigger pain to QA.</li>

    </ol>

    <p>And they will deduce this was not our expectation. So, if we

      don't want foundries to make dflt-only fonts we'll have to offer a

      more workable alternative.<br>

    </p>

    <p>Skef<br>

    </p>

    <div class="moz-cite-prefix">On 1/2/24 16:58, Skef Iterum wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:d81e1fa8-1d63-4e7d-9264-192fce447f1c@skef.org">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <p><br>

      </p>

      <div class="moz-cite-prefix">On 1/2/24 13:12, Peter Constable

        wrote:<br>

      </div>

      <blockquote type="cite"

cite="mid:DS7PR21MB3367EA286C482F8D81AF3592DE612@DS7PR21MB3367.namprd21.prod.outlook.com">

        <meta http-equiv="Content-Type"

          content="text/html; charset=UTF-8">


        <style>@font-face

        {font-family:Wingdings;

        panose-1:5 0 0 0 0 0 0 0 0 0;}@font-face

        {font-family:"Cambria Math";

        panose-1:2 4 5 3 5 4 6 3 2 4;}@font-face

        {font-family:Calibri;

        panose-1:2 15 5 2 2 2 4 3 2 4;}@font-face

        {font-family:Aptos;}@font-face

        {font-family:"Segoe UI";

        panose-1:2 11 5 2 4 2 4 2 2 3;}@font-face

        {font-family:Consolas;

        panose-1:2 11 6 9 2 2 4 3 2 4;}p.MsoNormal, li.MsoNormal, div.MsoNormal

        {margin:0in;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}a:link, span.MsoHyperlink

        {mso-style-priority:99;

        color:blue;

        text-decoration:underline;}pre

        {mso-style-priority:99;

        mso-style-link:"HTML Preformatted Char";

        margin:0in;

        font-size:10.0pt;

        font-family:"Courier New";}p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph

        {mso-style-priority:34;

        margin-top:0in;

        margin-right:0in;

        margin-bottom:0in;

        margin-left:.5in;

        font-size:12.0pt;

        font-family:"Aptos",sans-serif;}span.HTMLPreformattedChar

        {mso-style-name:"HTML Preformatted Char";

        mso-style-priority:99;

        mso-style-link:"HTML Preformatted";

        font-family:Consolas;}span.EmailStyle25

        {mso-style-type:personal-compose;

        font-family:"Aptos",sans-serif;

        color:windowtext;}.MsoChpDefault

        {mso-style-type:export-only;

        font-size:10.0pt;

        mso-ligatures:none;}div.WordSection1

        {page:WordSection1;}ol

        {margin-bottom:0in;}ul

        {margin-bottom:0in;}</style>

        <div class="WordSection1">

          <p class="MsoNormal"><span style="font-size:11.0pt">Happy

              2024, all!<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">&gt; </span>My

            worry about using dmap for multi-language/region support…<o:p></o:p></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">First, let

              me repeat: multi-language/region support is not the only

              reason for creating TTCs / not the only motivation for a

              dmap table.<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">As for

              handling multi-language/region support…<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">If I

              understand correctly your line of reasoning, you’re OK

              with the idea that distinct font resources (bundled in a

              TTC for de-duplication of data) can be used as the means

              given to the user for selecting language-/region-specific

              glyphs. But instead of cmap/dmap as the implementation

              mechanism to get different glyphs according to the font

              resource that is selected, you want a mechanism that

              instead integrates with GSUB/GPOS.<o:p></o:p></span></p>

        </div>

      </blockquote>

      <p>Yes<br>

      </p>

      <blockquote type="cite"

cite="mid:DS7PR21MB3367EA286C482F8D81AF3592DE612@DS7PR21MB3367.namprd21.prod.outlook.com">

        <div class="WordSection1">

          <p class="MsoNormal"><span style="font-size:11.0pt"><b>Please

                note that dmap is really orthogonal to your line of

                reasoning</b>. Ignore dmap for a moment: today, a TTC

              can bundle language-/region-specific font resources with

              distinct cmaps as the mechanism for selecting distinct

              glyphs.<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">So, your

              suggestion is that, instead of distinct cmaps, we

              could provide a way for the fonts in a TTC to share a

              common cmap and instead have some distinct data that

              triggers different GSUB/GPOS actions in each font

              resulting in selecting language-/region-specific glyphs.

              Like the dmap proposal, this would avoid a lot of

              duplication in cmap data, and the size savings in each

              case would likely be comparable. The key difference

              between these is<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <ul style="margin-top:0in" type="disc">

            <li class="MsoListParagraph"

              style="margin-left:0in;mso-list:l0 level1 lfo5"><span

                style="font-size:11.0pt">Integration into initial

                character-to-glyph mapping<o:p></o:p></span></li>

          </ul>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">versus<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <ul style="margin-top:0in" type="disc">

            <li class="MsoListParagraph"

              style="margin-left:0in;mso-list:l0 level1 lfo5"><span

                style="font-size:11.0pt">Integration into OT Layout

                glyph actions that occur after initial

                character-to-glyph mapping (which is typically done in a

                shaping engine).<o:p></o:p></span></li>

          </ul>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">Now, I

              said earlier that TTCs can be created for reasons other

              than <b>multi-language/region support. We could

                generalize the latter to cover any situation provided

                the mechanism doesn’t require use of registered

                script/langsys tags.</b><o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

        </div>

      </blockquote>

      <p>I think that it's an overstatement to say that they're

        "orthogonal". The only reason I can see for adding dmap is to

        reduce file size -- otherwise you could just have separate maps.

        It's true the reasoning I've offered for a GSUB/GPOS mechanism

        is based in multi-language/region support, but as (I think) you

        imply here, it could be extended to other cases via the use of

        reserved or otherwise non-conforming langsys tags. <br>

      </p>

      <p>Generally speaking, distinct cmaps (whether achieved via

        separate tables or dmap) will either be upstream of distinct

        GSUB tables or (rarely) upstream of a font without a GSUB.

        (Constructing a GSUB that will work correctly downstream from

        different cmaps is a strange thing to even attempt.) So in

        practice, by enshrining dmap one more or less gives up saving the

        duplicate GSUB space. A typical such GSUB may be smaller, in

        that more of the mapping will be sorted out in the dmap, but the

        work of any optional features will be duplicated. So how

        orthogonal the topics are depends in part on what degree of file

        size savings is desired or expected.<br>

      </p>

      <blockquote type="cite"

cite="mid:DS7PR21MB3367EA286C482F8D81AF3592DE612@DS7PR21MB3367.namprd21.prod.outlook.com">

        <div class="WordSection1">

          <p class="MsoNormal"><span style="font-size:11.0pt">I’ll

              digress for a moment to point out that this situation is

              not entirely unlike the need to handle Unicode variation

              sequences: given a triggering condition (presence of a VS

              character / user selection of a particular font resource)

              some characters need to be mapped to different glyphs. It

              was about 15 years ago that Adobe approached Microsoft to

              work on a solution. In that case, those of us at MS

              thought this could simply be handled in GSUB without

              needing to design any new table formats. But Adobe pushed

              strongly, and convinced us, to create a new table format,

              cmap subtable format 14. While I forget all of the

              details, part of their argument was that this should

              really be handled in the initial character-to-glyph

              mapping, not in OT Layout glyph processing that comes

              later.<o:p></o:p></span></p>

        </div>

      </blockquote>

      <p>I can't speak to past Adobe positions beyond observing that

        companies change their positions all the time. As far as this

        specific issue is concerned, the point that Adobe may have been

        making then and the point I'm making now aren't necessarily in

        conflict. cmap, and thus dmap, are creatures of Unicode

        mappings, and are accordingly a context for working out various

        complex issues of <i>identity</i>. In some cases whether a

        question relates to identity or to, say, "style" is a matter of

        opinion, and people may have strongly felt that for the format

        14 stuff it was the former. <br>

      </p>

      <p>In any case, when there are questions of identity but no

        subsequent questions of style, one can just map in cmap and be

        done with it. When there are further questions of style -- and

        the decision by Unicode to use a common subset of codepoints for

        different CJK languages and regions effectively <i>classifies</i>

        those distinctions as matters of style, broadly speaking -- one

        needs to treat the GID initially cmapped-to as "symbolic", in a

        sense, in order to support multiple such languages/regions in

        the same (sfnt) font. Doing more of that work up front with dmap

        is more of a "you won't have to bother with all that" sort of

        thing, there's no big conceptual leap.<br>

      </p>

      <blockquote type="cite"

cite="mid:DS7PR21MB3367EA286C482F8D81AF3592DE612@DS7PR21MB3367.namprd21.prod.outlook.com">

        <div class="WordSection1">

          <p class="MsoNormal"><span style="font-size:11.0pt">Returning

              to the main topic…<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">&gt;</span>

            Ideally we want fonts that support different scripts and

            languages via [OT Layout]…<o:p></o:p></p>

          <p class="MsoNormal"><o:p> </o:p></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">Is your

              preference for OTL integration because you focused on TTCs

              for multi-language/region support and OTL already has

              script/langsys mechanisms? Or would you prefer OTL

              integration even for TTCs not providing

              multi-language/region support?<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">If the

              former, then (i) I’d counter that (again)

              multi-language/region support isn’t the only reason for

              creating TTCs.<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">In either

              case, I’ll ask: Why is OTL integration the ideal approach?<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p><br>

              </o:p></span></p>

        </div>

      </blockquote>

      <p>I think there are virtues to doing either at the OTL level

        because it will further minimize file size, which is the only

        plausible goal of dmap. So it seems like the argument for dmap

        would therefore be that it fits better into existing toolchains,

        which is probably true. If that's right then the question boils

        down to the balance between how much reward we want from tool

        development vs how much we think we can ask of it.</p>

      <blockquote type="cite"

cite="mid:DS7PR21MB3367EA286C482F8D81AF3592DE612@DS7PR21MB3367.namprd21.prod.outlook.com">

        <div class="WordSection1">

          <p class="MsoNormal"><span style="font-size:11.0pt">For VSs,

              Adobe argued the other way around: relying on OTL was less

              preferable to initial character-to-glyph mapping. One

              argument against OTL integration involves

              character-palette UI: today that can be handled using cmap

              data alone. With OTL integration, there’s all of the GSUB

              formats that need to be processed—effectively invoking

              shaping logic. And this applies not just to

              character-palette UIs: platform APIs that return the

              initial character-to-glyph mapping (which are

              independently needed) also need to route through code

              that, to now, has been designed for use only after initial

              character-to-glyph mapping. <o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">It’s

              mixing layers that, for some implementations, could be

              kept cleanly separated. For example, in the GDI text

              stack, the <a

href="https://learn.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-getglyphindicesw"

                moz-do-not-send="true"> GetGlyphIndices()</a>

              implementation would read cmap data directly without

              calling into Uniscribe for OTL processing. Using OTL

              integration to de-dup cmap data in TTCs would require the

              GetGlyphIndices implementation to call into Uniscribe.<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">So, I

              remain unconvinced this is ideal and would like to

              understand more why you think it is.<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">&gt;</span>

            <span style="font-size:11.0pt"> 4. However, after thinking

              about how GSUB tables are structured one realizes one can

              probably accomplish the same thing without any spec

              changes…<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">Except you

              haven’t proposed any way to trigger a font-specific

              required glyph substitution as part of the initial

              character-to-glyph mapping, which it seems to me is <i>the

                essence of what is required</i>. </span></p>

        </div>

      </blockquote>

      <p>The mechanism I discussed is allowing the lookup table

        associated with DFLT/dflt to be distinct for each TTC entry.

        After thinking about it a bit more one might also want to

        influence the dflt entries for non-DFLT scripts to "match", but

        this is just a matter of a bit more "duplicated" material. This

        GSUB mechanism does raise some further questions about switching

        between language systems in TTCs built this way, but distinct

        cmaps raise similar questions. If a font can't support it, it can

        just leave every TTC slot as, in effect, DFLT-dflt-only. <br>

      </p>

      <p>Skef<br>

      </p>

      <blockquote type="cite"

cite="mid:DS7PR21MB3367EA286C482F8D81AF3592DE612@DS7PR21MB3367.namprd21.prod.outlook.com">

        <div class="WordSection1">

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">A simple

              way to do that could be a small, font-specific table that

              provides an index into the GSUB lookup list, with the

              constraint that the lookup must be a type 1 (single

              substitution) lookup (else it will be ignored). But I’m

              still not sure this is ideal and preferable to a dmap

              table.<o:p></o:p></span></p>
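          <p>A minimal sketch of how such a font-specific index might
            plug into initial character-to-glyph mapping. Everything here
            is an assumption made for illustration: the <code>Lookup</code>
            class, the <code>initial_glyph</code> function, the glyph IDs,
            and the use of plain dictionaries in place of parsed cmap and
            GSUB data. It is not a proposed table format.</p>
<pre>
```python
from dataclasses import dataclass

SINGLE_SUBSTITUTION = 1  # GSUB lookup type 1


@dataclass(frozen=True)
class Lookup:
    """Hypothetical stand-in for one parsed entry of the GSUB lookup list."""
    lookup_type: int
    mapping: dict  # glyph ID -> glyph ID, only meaningful for type 1 here


def initial_glyph(codepoint, cmap, lookups, lookup_index):
    """Map a code point through the shared cmap, then through the one
    single-substitution lookup selected by the per-font index (if any)."""
    gid = cmap.get(codepoint, 0)  # glyph 0 is .notdef
    if lookup_index is None or lookup_index >= len(lookups):
        return gid
    lookup = lookups[lookup_index]
    if lookup.lookup_type != SINGLE_SUBSTITUTION:
        # Constraint from the message above: any other lookup type is ignored.
        return gid
    return lookup.mapping.get(gid, gid)


# Shared data for every font in the collection (glyph IDs are made up).
shared_cmap = {0x76F4: 100}
shared_lookups = [
    Lookup(SINGLE_SUBSTITUTION, {100: 101}),  # regional form of the same character
    Lookup(4, {}),                            # a ligature lookup, not eligible
]

print(initial_glyph(0x76F4, shared_cmap, shared_lookups, None))  # base font
print(initial_glyph(0x76F4, shared_cmap, shared_lookups, 0))     # regional font
```
</pre>
          <p>Because only a flat glyph-to-glyph map is consulted, an API
            that returns initial glyph indices would not need to invoke
            general shaping logic under this sketch.</p>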

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt">Peter

              Constable<o:p></o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>

          <div>

            <div

style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">

              <p class="MsoNormal"><b><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif">From:</span></b><span

style="font-size:11.0pt;font-family:'Calibri',sans-serif">

                  mpeg-otspec <a class="moz-txt-link-rfc2396E"

                    href="mailto:mpeg-otspec-bounces@lists.aau.at"

                    moz-do-not-send="true">&lt;mpeg-otspec-bounces@lists.aau.at&gt;</a>

                  <b>On Behalf Of </b>Skef Iterum<br>

                  <b>Sent:</b> Wednesday, December 27, 2023 4:19 AM<br>

                  <b>To:</b> Ken Lunde <a class="moz-txt-link-rfc2396E"

                    href="mailto:lunde@unicode.org"

                    moz-do-not-send="true">&lt;lunde@unicode.org&gt;</a><br>

                  <b>Cc:</b> <a

class="moz-txt-link-abbreviated moz-txt-link-freetext"

                    href="mailto:mpeg-otspec@lists.aau.at"

                    moz-do-not-send="true">mpeg-otspec@lists.aau.at</a><br>

                  <b>Subject:</b> [EXTERNAL] [MPEG-OTSPEC] Shared

                  GSUB/GPOS notes, was Re: dmap proposal<o:p></o:p></span></p>

            </div>

          </div>

          <p class="MsoNormal"><o:p> </o:p></p>


          <p>Some preliminary notes on an idea I'm looking into,

            starting from this line of reasoning:<o:p></o:p></p>

          <ol type="1" start="1">

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l1 level1 lfo1">

              My worry about using dmap for multi-language/region

              support is that the solution is separate from the

              script/language GSUB/GPOS mechanism. Ideally we want fonts

              that support different scripts and languages via the

              latter, and doing so while starting with different initial

              cmaps is a lot of work and QA.<o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l1 level1 lfo1">

              This leads to the idea of allowing a TTC slot to pick a

              script and language to serve as the default that I brought

              up last week, but that will face the objection that it

              violates the current semantic properties of the TTC font

              collection spec: TTC currently works only at the SFNT

              level, not below.<o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l1 level1 lfo1">

              That leads naturally to the prospect of "DSUB": the

              metaphorical equivalent of dmap but at the GSUB level. All

              this would need to do is specify a "pseudo-default" for

              GSUB: act like this script and this language were the

              defaults so they're used unless one is specified. <o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l1 level1 lfo1">

              However, after thinking about how GSUB tables are

              structured one realizes one can probably accomplish the

              same thing without any spec changes. <o:p></o:p></li>

          </ol>

          <p>Why? In GSUB and GPOS all offset fields below the header

            are "hierarchical" -- each is relative to the start of the

            subtable it appears in. And the header is basically a short

            list of offsets (relative to the start of the table).

            Together this means that one should be able to do the

            following with (e.g.) GSUB:<o:p></o:p></p>

          <ol type="1" start="1">

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo2">

              Move the table, and those below it, a bit further down in

              the font file<o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo2">

              Add a new GSUB header above the existing one pointing to

              (literally) the same featureList, lookupList and, if

              relevant, featureVariations tables. (Each offset being

              increased by the difference in start of the two headers.)<o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo2">

              Add a new top-level ScriptList table below the new header,

              with all offsets adjusted similarly except the one for DFLT,

              which points to a new Script table below it. <o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l2 level1 lfo2">

              Add the new Script table, with all offsets adjusted

              similarly except the one for defaultLangSys, which is

              adjusted to point to the LangSys table of a different

              language (within the original GSUB table).<o:p></o:p></li>

          </ol>
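          <p>The steps above can be sketched roughly as follows. This is
            a toy illustration under stated assumptions, not a tested
            font-building tool: the helper names are invented, the
            "original" table is a hand-built version 1.0 table with
            placeholder bytes standing in for the FeatureList and
            LookupList, FeatureVariations is ignored, and the original is
            assumed to contain a DFLT script.</p>
<pre>
```python
import struct

HEADER_SIZE = 10  # version 1.0 header: major, minor, then three Offset16 fields


def u16(data, pos):
    return struct.unpack_from(">H", data, pos)[0]


def pack_langsys(features):
    # lookupOrderOffset (NULL), requiredFeatureIndex (0xFFFF = none), count, indices
    return struct.pack(f">HHH{len(features)}H", 0, 0xFFFF, len(features), *features)


def pack_script(default_langsys, tagged):
    # Script table followed directly by the LangSys tables it points to.
    pos = 4 + 6 * len(tagged)
    head = struct.pack(">HH", pos, len(tagged))
    body = default_langsys
    pos += len(default_langsys)
    for tag, langsys in tagged:
        head += struct.pack(">4sH", tag, pos)
        body += langsys
        pos += len(langsys)
    return head + body


def build_original():
    # Toy table: DFLT/dflt uses feature 0, hani/KOR uses feature 1.
    dflt = pack_script(pack_langsys([0]), [])
    hani = pack_script(pack_langsys([0]), [(b"KOR ", pack_langsys([1]))])
    records = struct.pack(">4sH4sH", b"DFLT", 14, b"hani", 14 + len(dflt))
    script_list = struct.pack(">H", 2) + records + dflt + hani
    features, lookups = b"FEATURELIST-PLACEHOLDER!", b"LOOKUPLIST-PLACEHOLDER"
    feature_pos = HEADER_SIZE + len(script_list)
    header = struct.pack(">HHHHH", 1, 0, HEADER_SIZE, feature_pos,
                         feature_pos + len(features))
    return header + script_list + features + lookups


def find_script(data, table, tag):
    script_list = table + u16(data, table + 4)
    for i in range(u16(data, script_list)):
        rec = script_list + 2 + 6 * i
        if data[rec:rec + 4] == tag:
            return script_list + u16(data, rec + 4)
    raise KeyError(tag)


def find_langsys(data, table, script_tag, lang_tag):
    script = find_script(data, table, script_tag)
    for i in range(u16(data, script + 2)):
        rec = script + 4 + 6 * i
        if data[rec:rec + 4] == lang_tag:
            return script + u16(data, rec + 4)
    raise KeyError(lang_tag)


def default_features(data, table, script_tag=b"DFLT"):
    script = find_script(data, table, script_tag)
    langsys = script + u16(data, script)
    return [u16(data, langsys + 6 + 2 * i) for i in range(u16(data, langsys + 4))]


def add_overlay(original, script_tag, lang_tag):
    # Layout: [new header][new ScriptList][new DFLT Script][original table]
    old_list = u16(original, 4)
    count = u16(original, old_list)
    new_list = HEADER_SIZE
    new_script = new_list + 2 + 6 * count
    shift = new_script + 4  # the new Script table has no LangSysRecords
    target = shift + find_langsys(original, 0, script_tag, lang_tag)
    out = struct.pack(">HHHHH", 1, 0, new_list,
                      shift + u16(original, 6), shift + u16(original, 8))
    out += struct.pack(">H", count)
    for i in range(count):
        rec = old_list + 2 + 6 * i
        tag = original[rec:rec + 4]
        if tag == b"DFLT":
            offset = new_script - new_list
        else:
            offset = shift + old_list + u16(original, rec + 4) - new_list
        out += struct.pack(">4sH", tag, offset)
    out += struct.pack(">HH", target - new_script, 0)
    return out + original, shift


original = build_original()
blob, shift = add_overlay(original, b"hani", b"KOR ")
print(default_features(blob, shift))  # via the original header
print(default_features(blob, 0))      # via the overlay header
print(len(blob) - len(original), "bytes added")
```
</pre>
          <p>In this sketch, one TTC slot's GSUB directory entry would
            point at byte 0 of the blob and another at byte
            <code>shift</code>. Every offset is packed as an unsigned
            16-bit value, so <code>struct.pack</code> raises an error once
            the shared tables sit more than 65535 bytes past a header,
            which is the Offset16 limit on how many overlays can be
            stacked.</p>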

          <p>Now, with a very modest amount of added memory, you have

            two GSUB tables -- one with the original mapping for DFLT

            dflt and one with a new mapping for those. The latter will

            include some "junk" bytes (the former's header, ScriptList

            and DFLT Script tables) but nothing in it will make any use

            of those areas. (I haven't yet tested whether ots and such

            will complain about that.) And you can do this for more

            languages by adding more such table combinations, limited

            only by the Offset16 fields in the header. (One could, of

            course, repeat the whole GSUB table to buy more overlapping

            table sets if needed.)<o:p></o:p></p>

          <p>With similar modifications to GPOS (when needed), I think

            all that one needs to build out the language-specific TTC

            slots is:<o:p></o:p></p>

          <ol type="1" start="1">

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l4 level1 lfo3">

              Per-slot head tables (to get the checksums right -- would

              be nice if this wasn't required)<o:p></o:p></li>

            <li class="MsoNormal"

style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l4 level1 lfo3">

              Per-slot name tables (although for this one could add the

              font-specific name strings to the end of the string data

              and do something similar to GSUB with the NameRecord

              array, sharing the string storage)<o:p></o:p></li>

          </ol>
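          <p>For the name-table variant, a rough sketch of what sharing
            the string storage might look like. The assumptions are loud
            ones: the font names are invented, only format 0 with
            Windows/Unicode BMP/en-US records is produced, and the layout
            (all per-slot headers and NameRecord arrays first, one shared
            string pool last) is just one way to read the idea above.
            Whether validators accept overlapping name tables is
            untested here.</p>
<pre>
```python
import struct


def pack_records(records, storage_offset):
    """Format 0 name table header plus its NameRecord array (no strings)."""
    out = struct.pack(">HHH", 0, len(records), storage_offset)
    for name_id, offset, length in records:
        # platformID 3 (Windows), encodingID 1 (Unicode BMP), languageID 0x0409
        out += struct.pack(">6H", 3, 1, 0x0409, name_id, length, offset)
    return out


def build_shared_names(slots):
    """slots: one {nameID: string} dict per TTC slot.
    Returns (blob, starts), where starts[i] is where slot i's table begins."""
    storage = b""
    seen = {}
    per_slot = []
    for names in slots:
        records = []
        for name_id in sorted(names):
            raw = names[name_id].encode("utf-16-be")
            if raw not in seen:  # identical strings are stored once
                seen[raw] = len(storage)
                storage += raw
            records.append((name_id, seen[raw], len(raw)))
        per_slot.append(records)
    sizes = [6 + 12 * len(records) for records in per_slot]
    total = sum(sizes)  # the shared string pool starts here
    blob, starts, pos = b"", [], 0
    for records, size in zip(per_slot, sizes):
        starts.append(pos)
        # storageOffset is relative to the start of this slot's own table
        blob += pack_records(records, total - pos)
        pos += size
    return blob + storage, starts


def read_name(blob, start, wanted_id):
    _, count, storage_offset = struct.unpack_from(">HHH", blob, start)
    for i in range(count):
        fields = struct.unpack_from(">6H", blob, start + 6 + 12 * i)
        name_id, length, offset = fields[3], fields[4], fields[5]
        if name_id == wanted_id:
            begin = start + storage_offset + offset
            return blob[begin:begin + length].decode("utf-16-be")
    return None


slots = [
    {0: "(c) Example", 1: "Example Sans JP"},
    {0: "(c) Example", 1: "Example Sans KR"},
]
blob, starts = build_shared_names(slots)
for start in starts:
    print(start, read_name(blob, start, 1))
```
</pre>
          <p>Each slot's table directory entry would point at its own
            <code>starts[i]</code>, and the bytes of the other slots'
            headers that fall inside its range are simply never
            referenced.</p>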

          <p>All other tables, including cmap, would be shared in the

            normal way.<o:p></o:p></p>

          <p>To be clear, unless a picky client-side validator barfs on

            these conventions I suspect one could build a cross-language

            TTC font collection in this way <i>today</i>, minimizing

            the memory cost of the additional slots. I'm currently

            poking at constructing an example to make this more

            concrete, but, of course, existing tools aren't designed

            with this sort of thing in mind. I'll send another note if

            and when I make progress.<o:p></o:p></p>

          <p>Skef<o:p></o:p></p>

          <div>

            <p class="MsoNormal">On 12/21/23 23:58, Skef Iterum wrote:<o:p></o:p></p>

          </div>

          <p>I stand dystopianed. <o:p></o:p></p>

          <p>However, to not yet give up entirely on this line of

            thinking ...<o:p></o:p></p>

          <p>What is on the table in these messages is a further

            extension of an existing table, in this case cmap. Which at

            least suggests that the problem here isn't "system-level"

            support -- we think we can get those changes. What you

            describe is, loosely speaking, "application level" support

            -- allowing the context that the user interacts with to

            specify the needed parameters, and then educating the user

            to do so.<o:p></o:p></p>

          <p>I agree that's hopeless for the foreseeable future. <o:p></o:p></p>

          <p>These dmap ideas do have the benefit of being <i>somewhat</i>

            general (although one might worry about unusual cases).

            Maybe other compelling use cases, or just the value of

            generality itself, justify such an extension. Still, if the

            fundamental problems are what you describe, we might also

            consider addressing them directly and specifically. Instead

            of extending cmap, and building region- or language-specific

            fonts via a separate mechanism, we should at least consider

            extending TTC to associate a named subfont with the missing

            parameters. Basically: "render this set of tables using this

            script and this language by default". Done a bit subtly, one

            could just ship every cross-language font file with a "base"

            font with just the name, and some entries for other scripts

            and languages, suitably named, and otherwise sharing TTC

            data-structures. <o:p></o:p></p>

          <p>From the perspective of the font engineer that seems more

            productive than building a cross-language font with one set

            of mechanisms and then building multiple data-sharing

            individual language fonts using a different mechanism

            (assuming we still want engineers to do the former).<o:p></o:p></p>

          <p>Skef<o:p></o:p></p>

          <div>

            <p class="MsoNormal">On 12/21/23 18:15, Ken Lunde wrote:<o:p></o:p></p>

          </div>

          <pre>Skef,<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>I might be the only one in this discussion who clearly remembers that Version 1.000 of Source Han Sans and Noto Sans CJK, which were released on 2014-07-15, *was* utopian in that the fonts with the full set of 64K glyphs, meaning genuine Pan-CJK, expected that language tagging would be used to access the desired non-default region-specific glyphs, with the default glyphs being for Japan. Reality quickly taught us that expecting language tagging alone to solve this was completely unrealistic for the following three reasons:<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>1) The app must support language tagging<o:p></o:p></pre>

          <pre>2) The app must support language tagging for the appropriate East Asian languages, which is now up to five for these Pan-CJK fonts<o:p></o:p></pre>

          <pre>3) Assuming #1 and #2 work, the user must then language-tag the text<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>Going on 10 years later, not much has changed for #1 and #2.<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>Modern browsers supported the 'locl' GSUB feature way back in 2014, but support in authoring apps is still severely lacking today.<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>I use Adobe InDesign to get full language-tagging support for these fonts, which is still about the only game in town. Adobe Illustrator silently added East Asian language-tagging in the 2018 release (in 2017), but it was a "close but no cigar" outcome in that they added only "Chinese" (that turned out to be Traditional Chinese for Taiwan) and Japanese, and despite filing bugs over five years ago, Adobe Illustrator 2024 (in 2023) is still unchanged in this regard. What makes the current support even less useful for mainstream users, ignoring that three of the five East Asian regions are not supported at all, is that the two supported East Asian regions are visible only when creating Character or Paragraph styles. They are not shown in the list of languages in the Character or Properties panels. Adobe Photoshop 2024 (in 2023) still does not support language tagging for East Asian languages.<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>Getting back to Source Han Sans and Noto Sans CJK, Version 1.001, released on 2014-09-12, added separate 64K-glyph fonts for each of the four (at the time) supported East Asian regions. The 'locl' feature is still included for the benefit of those environments that support language tagging. Support for all five regions did not arrive until Version 2.000, released on 2018-11-19, which meant five separate sets of 64K-glyph fonts. The fifth region, of course, was Hong Kong SAR.<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>In other words, we are quite far from Utopia, and we are unlikely to arrive there anytime soon.<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>Regards...<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>-- Ken<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>On Dec 21, 2023, at 17:04, Skef Iterum <a

          href="mailto:skef@skef.org" moz-do-not-send="true">&lt;skef@skef.org&gt;</a> wrote:<o:p></o:p></pre>

          <pre><o:p> </o:p></pre>

          <pre>More stuff after hitting send too fast:<o:p></o:p></pre>

          <pre>I can see a set of arguments against trying to deal with these regional problems within a single mega-font, all grounded one way or another in GIDs being a limited resource. But we've already decided to overcome that problem. So, for example, if we need to spend a GID to, in effect, abstractly represent a given codepoint to bridge from cmap into the shaping tables, we have GIDs to spend now. (And, as implied in my other messages today, we wouldn't necessarily have to pay the typical file overhead for them.)<o:p></o:p></pre>

          <pre>As I understand it, that's how regional variations in, e.g., Cyrillic are handled now. So I guess, other than the large number of glyphs in CJK fonts, I'm not understanding what requirements are pushing the solution in such a different (and seemingly ad hoc) direction.<o:p></o:p></pre>

          <pre>Skef<o:p></o:p></pre>

          <pre>On 12/21/23 16:49, Skef Iterum wrote:<o:p></o:p></pre>

          <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

            <pre>Maybe I'm being utopian but I can't help thinking that either there's some token ("dialect"?) that Unicode should be tracking and formalizing but isn't, or Unicode is doing that and we haven't tilted the font specifications enough in its direction to use it. There's already all of that script and language infrastructure there that is meant for this flavor of need, and it seems like a much better place to be solving these problems than wrapping stuff up in a TTC and having the client side pick out the sub-font by name or whatever.<o:p></o:p></pre>

            <pre>Skef<o:p></o:p></pre>

            <pre>On 12/21/23 15:00, Peter Constable wrote:<o:p></o:p></pre>

            <blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">

              <pre>During the recent AHG meeting, I mentioned that Apple, Adobe and Microsoft, some years ago, had started discussing a ‘dmap’ (delta character map) table proposal. This was in late fall of 2016; the focus was on pan-CJK fonts, and in that timeframe Ken Lunde had submitted a proposal to UTC (L2/16-063 Proposal to accept the submission to register the “PanCJKV” IVD collection) to define variation sequences for ideographs that designated a range of variation selector characters to correspond to several regions for which regional glyph variants of CJK ideographs might need to be supported. I managed to find an archive of some emails from discussions at the time, so can summarize:<o:p></o:p></pre>

              <pre> The aim was to be able to support distinct fonts for regional CJK variants without duplication of data. A TTC could allow de-duplication of glyph data, but there would be other duplication. We agreed the biggest concern was with ‘cmap’ data: If any one of the regional variant fonts in the collection were taken as a point of reference, then any of the other regional variants would have many of the same mappings (perhaps most), though not all the same mappings. But there wasn’t any existing means to share common mappings across fonts while there were also some different mappings. Dwane Robinson suggested that we define a new ‘dmap’ table that uses ‘cmap’ formats but is just used to describe the differences in mappings from a common ‘cmap’.  We agreed that a ‘dmap’ table doesn’t need the duplication of different platforms/encodings, and that we can converge on only one platform/encoding (hence, no encoding records are necessary). We discussed format 4 versus 12, and agreed to allow either, but that both are never required. Now, we had teleconfs between Apple and MS, but the emails I found indicate that Behdad was also kept informed: one of the emails records that Behdad requested that format 13 also be allowed.<o:p></o:p></pre>

              <pre> We hadn’t settled, however, on what to do about format 14 subtables. It wasn’t a priority for Apple at the time, but it seemed like it would be incomplete if we ignored it. Knowing that Ken Lunde was dealing a lot with VSes and also working on the pan-CJK Source Han Sans, we brought Adobe into our discussion at that point.<o:p></o:p></pre>

              <pre> The issue with format 14 is that it divides variation sequences into two groups: (i) VSes that map to the same glyph already mapped in a format 4 or 12 subtable (DefaultUVS), and (ii) VSes that map to a different glyph (NonDefaultUVS). Certainly the default mappings would be different in the various regional variant fonts, and some of the non-default mappings could also be different. (Even if a given VS never mapped to different glyphs in the different fonts, the fonts could still differ in what VSes they need to support.) So it’s necessary to resolve how a dmap/14 subtable should interact with a cmap/4 (or cmap/12) subtable, with a cmap/14 subtable, with a dmap/4 (or dmap/12) subtable, and with a dmap/14 subtable. One possible approach would be that the dmap/14 subtable completely supersedes the cmap/14 subtable (i.e., the latter is not used at all, and there is no de-duplication of that data). Another approach could be that a dmap/14 subtable complements the cmap/14 subtable by providing select replacement mappings (a delta, though there are still further details about how that would work exactly).<o:p></o:p></pre>

              <pre> There were some useful points brought up along the way:  <o:p></o:p></pre>

              <pre>    • Ned Holbrook pointed out that the format 14 DefaultUVS subtable is just a space-saving variant of the NonDefaultUVS subtable. A font doesn’t need to have any DefaultUVS table: the same sequences could be handled in NonDefaultUVS subtables, less efficiently… _in a single font_.<o:p></o:p></pre>

              <pre>    • For CJK, Ken Lunde pointed out that there are two kinds of UVSes to consider: <o:p></o:p></pre>

              <pre>        • “Standardized” VSes: these are defined in the Unicode Standard (see unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt) for CJK Compatibility Ideographs. They are defined in Unicode in a region-independent manner, but most represent region-specific glyphs.<o:p></o:p></pre>

              <pre>        • “Ideographic” VSes: these are VSes registered in the Ideographic Variation Database (unicode.org) in region-specific collections. <o:p></o:p></pre>

              <pre>Because of the nature of each type, Ken thought there might be limited sharing across fonts. (E.g., at least some font developers would want to support a given IVS collection only in the one regional font for the corresponding region.) He did identify cases, however, in which the same SVS would need to map to different glyphs in different fonts.<o:p></o:p></pre>

              <pre>    • Again, for CJK, there would be cases in which different fonts would need to support the same VSes, but they would differ wrt DefaultUVS vs. NonDefaultUVS mappings.<o:p></o:p></pre>
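              <pre>A minimal Python sketch of Ned's point above, not taken from the original discussion: every DefaultUVS entry can be rewritten as an explicit NonDefaultUVS entry by resolving the base character through the font's own cmap. Plain dicts and sets stand in for the real subtables, and the function name and sample values are hypothetical.</pre>

              <pre>
```python
def expand_default_uvs(cmap, default_uvs, non_default_uvs):
    """Rewrite DefaultUVS entries as explicit NonDefaultUVS entries.

    cmap: dict of code point to glyph ID (stand-in for a format 4/12 subtable)
    default_uvs: set of (base, selector) pairs
    non_default_uvs: dict of (base, selector) to glyph ID
    """
    expanded = dict(non_default_uvs)
    for base, selector in sorted(default_uvs):
        # A default sequence renders with whatever glyph cmap gives the base.
        expanded[(base, selector)] = cmap[base]
    return expanded


# Hypothetical sample data.
cmap = {0x4E00: 10, 0x4E01: 11}
default_uvs = {(0x4E00, 0xE0100)}
non_default_uvs = {(0x4E01, 0xE0100): 42}
expanded = expand_default_uvs(cmap, default_uvs, non_default_uvs)
print(expanded)
```
</pre>

              <pre>The expanded table bakes in a glyph ID taken from one font's cmap, which is why the equivalence holds only within a single font: a font with different base mappings would need different explicit entries.</pre>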

              <pre> Ken also called out some other uses in email exchanges. It all suggested that an ideal solution would make it possible to construct a collection file in which<o:p></o:p></pre>

              <pre>- two or more fonts can share some UVS mapping data while also having some font-specific mapping data; and<o:p></o:p></pre>

              <pre>- it's also possible to have other fonts that do not share any UVS mapping data with other fonts.<o:p></o:p></pre>

              <pre> That would allow the fonts to support only UVSes that are relevant for their respective markets, while also having an efficiency benefit from data-sharing between certain of the fonts.<o:p></o:p></pre>

              <pre> That was in December 2016. We ran into end-of-year holidays and never resumed to close on an approach that optimizes size of VS mapping data. The following is the last draft proposal that we exchanged.<o:p></o:p></pre>

              <pre> —<o:p></o:p></pre>

              <pre>dmap - Character to Glyph Index Differences Table<o:p></o:p></pre>

              <pre> This table is an optional adjunct to the ‘cmap’ table defining differences from the nominal mappings in order to increase sharing of the ‘cmap’ itself across fonts in a TTC.<o:p></o:p></pre>

              <pre> If a font production tool determines that the ‘cmap’ tables across the fonts in a TTC are largely but not entirely identical, it can choose one font to be used as the basis for the others in terms of character to glyph index mapping, expressing the mappings of the other fonts using only the mappings that are different from those of the former font. An example would be a CJK font family with region-specific fonts, where most characters would map to the same glyph index.<o:p></o:p></pre>
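              <pre>As a rough illustration of what such a production tool might do (a sketch under stated assumptions, not part of the draft): dicts of code point to glyph ID stand in for the Unicode cmap subtables, and the function name and sample values are hypothetical. Because the lookup steps in this draft only honor non-zero dmap entries, a code point that the base font maps but a regional font does not is reported separately rather than encoded.</pre>

              <pre>
```python
def build_dmap_delta(base_cmap, regional_cmap):
    """Return (delta, unexpressible) for one regional font.

    delta holds only the code points whose glyph differs from the base font.
    unexpressible lists code points the base maps but the regional font does
    not; an override to glyph 0 is assumed here to mean "no override".
    """
    delta = {
        cp: gid
        for cp, gid in sorted(regional_cmap.items())
        if base_cmap.get(cp) != gid
    }
    unexpressible = sorted(cp for cp in base_cmap if cp not in regional_cmap)
    return delta, unexpressible


# Hypothetical sample data: three base mappings, one regional override,
# one regional-only code point, and one code point missing from the region.
base_cmap = {0x4E00: 10, 0x4E01: 11, 0x4E02: 12}
regional_cmap = {0x4E00: 10, 0x4E01: 21, 0x4E03: 13}
delta, unexpressible = build_dmap_delta(base_cmap, regional_cmap)
print(delta, unexpressible)
```
</pre>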

              <pre> The ‘dmap’ table<o:p></o:p></pre>

              <pre>Type    Name               Description<o:p></o:p></pre>

              <pre>UInt16  version            Set to 0.<o:p></o:p></pre>

              <pre>UInt16  numTables          Number of offset fields to follow.<o:p></o:p></pre>

              <pre>UInt32  offset[numTables]  Array of byte offsets from beginning of table to cmap subtables. All subtables are assumed to use Unicode. There can be at most one subtable of either format 4, 12, or 13.<o:p></o:p></pre>

              <pre> Each ‘dmap’ subtable shall have the same structure as the corresponding ‘cmap’ subtable, starting with a format field that determines the remainder. The language field for a format 4, 12, or 13 subtable must be set to zero.<o:p></o:p></pre>
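              <pre>A hedged Python sketch of reading the header described above. It assumes big-endian fields, as elsewhere in OpenType, and a uint16 format field at the start of each subtable, as in cmap; the function name and the sample bytes are hypothetical, and subtable bodies are not parsed.</pre>

              <pre>
```python
import struct


def parse_dmap_header(data):
    """Read version, numTables and the offset array from a draft dmap table."""
    version, num_tables = struct.unpack_from(">HH", data, 0)
    if version != 0:
        raise ValueError("unsupported dmap version: %d" % version)
    offsets = struct.unpack_from(">%dI" % num_tables, data, 4)
    # Each subtable starts with a uint16 format field, as in cmap.
    formats = [struct.unpack_from(">H", data, off)[0] for off in offsets]
    return list(offsets), formats


# Hypothetical table: one subtable at byte offset 8 whose format field is 12.
sample = struct.pack(">HHI", 0, 1, 8) + struct.pack(">H", 12)
offsets, formats = parse_dmap_header(sample)
print(offsets, formats)
```
</pre>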

              <pre> The steps for determining the glyph index for a given UVS consisting of a base character and optional variation selector are as follows:<o:p></o:p></pre>

              <pre> <o:p></o:p></pre>

              <pre>1. Apply the Unicode ‘cmap’ subtable to the base character to get the nominal glyph index.<o:p></o:p></pre>

              <pre>2. If the font has a ‘dmap’ format 4 or 12 subtable that maps the base character to a non-zero glyph index, it will replace the nominal glyph index.<o:p></o:p></pre>

              <pre>3. If the ‘cmap’ has a format 14 subtable, apply it in this way:<o:p></o:p></pre>

              <pre>3.1.If the Default UVS Table contains the base character, the final glyph index will be the one determined by the ‘cmap’.<o:p></o:p></pre>

              <pre>3.2.Else if the Non-Default UVS Table contains the base character, it will determine the final glyph index.<o:p></o:p></pre>

              <pre>3.3.Else the final glyph index will remain as it was after step 2.<o:p></o:p></pre>
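              <pre>The steps above can be sketched in Python as follows. This is one reading of the draft, not a reference implementation: dicts and sets stand in for the subtables, the names are hypothetical, and step 3.1 is read literally, so a Default UVS match yields the glyph from the ‘cmap’ (step 1) rather than the ‘dmap’ override from step 2.</pre>

              <pre>
```python
def lookup_glyph(base, selector, cmap, dmap, default_uvs, non_default_uvs):
    """Resolve a base character plus optional variation selector.

    cmap, dmap: dicts of code point to glyph ID
    default_uvs: set of (base, selector) pairs from the cmap format 14 subtable
    non_default_uvs: dict of (base, selector) to glyph ID from the same subtable
    """
    nominal = cmap.get(base, 0)      # step 1: nominal glyph from cmap
    glyph = nominal
    override = dmap.get(base, 0)     # step 2: a non-zero dmap entry replaces it
    if override != 0:
        glyph = override
    if selector is None:
        return glyph
    key = (base, selector)
    if key in default_uvs:           # step 3.1 (literal reading: cmap glyph)
        return nominal
    if key in non_default_uvs:       # step 3.2
        return non_default_uvs[key]
    return glyph                     # step 3.3: keep the result of step 2


# Hypothetical sample data.
cmap = {0x4E00: 10}
dmap = {0x4E00: 20}
default_uvs = {(0x4E00, 0xFE00)}
non_default_uvs = {(0x4E00, 0xE0100): 30}
for vs in (None, 0xFE00, 0xE0100, 0xE0101):
    print(vs, lookup_glyph(0x4E00, vs, cmap, dmap, default_uvs, non_default_uvs))
```
</pre>

              <pre>If step 3.1 was instead meant to keep the glyph from step 2, the only change would be returning glyph rather than nominal in that branch.</pre>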

              <pre> Note: An earlier draft of this document allowed for a second subtable of format 14, which would allow redefinition of variation sequences. Owing to uncertainty about usefulness and the exact behavior of the Default UVS Table, however, it has been removed pending further discussion.<o:p></o:p></pre>

              <pre> —<o:p></o:p></pre>

              <pre> In the previous draft, a different set of steps for handling UVSes was considered:<o:p></o:p></pre>

              <pre> —<o:p></o:p></pre>

              <pre>The steps for determining the glyph index for a given UVS consisting of a base character and optional variation selector are as follows:<o:p></o:p></pre>

              <pre> 1. Apply the ‘cmap’ to the base character to get the nominal glyph index.<o:p></o:p></pre>

              <pre>2. If the font has a ‘dmap’ format 4 or 12 subtable that maps the base character to a non-zero glyph index, it will replace the nominal glyph index.<o:p></o:p></pre>

              <pre>3. If the ‘dmap’ has a format 14 subtable, it will be used in place of the one in the ‘cmap’.<o:p></o:p></pre>

              <pre>4. If there is a format 14 subtable, apply it in this way:<o:p></o:p></pre>

              <pre>4.1.If the Default UVS Table contains the base character, the final glyph index will be the one determined by the ‘cmap’.<o:p></o:p></pre>

              <pre>4.2.Else if the Non-Default UVS Table contains the base character, it will determine the final glyph index.<o:p></o:p></pre>

              <pre>4.3.Else the final glyph index will remain as it was after step 2.<o:p></o:p></pre>
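              <pre>For comparison, a Python sketch of this earlier variant, in which a ‘dmap’ format 14 subtable, when present, is used in place of the one in the ‘cmap’. As before, this is an illustration under assumptions: each format 14 subtable is modeled as a hypothetical (default_set, non_default_dict) pair or None, and step 4.1 is again read literally as returning the ‘cmap’ glyph.</pre>

              <pre>
```python
def lookup_glyph_previous_draft(base, selector, cmap, dmap, cmap_uvs, dmap_uvs):
    """cmap_uvs and dmap_uvs are None or a (default_set, non_default_dict) pair."""
    nominal = cmap.get(base, 0)                  # step 1
    glyph = dmap.get(base, 0) or nominal         # step 2: non-zero override wins
    # Step 3: a dmap format 14 subtable supersedes the cmap one entirely.
    uvs = dmap_uvs if dmap_uvs is not None else cmap_uvs
    if selector is None or uvs is None:
        return glyph
    default_set, non_default = uvs               # step 4
    key = (base, selector)
    if key in default_set:                       # step 4.1 (literal reading)
        return nominal
    if key in non_default:                       # step 4.2
        return non_default[key]
    return glyph                                 # step 4.3


# Hypothetical sample data: the dmap redefines a default sequence as non-default.
cmap = {0x4E00: 10}
dmap = {0x4E00: 20}
cmap_uvs = ({(0x4E00, 0xE0100)}, {})
dmap_uvs = (set(), {(0x4E00, 0xE0100): 30})
with_dmap14 = lookup_glyph_previous_draft(0x4E00, 0xE0100, cmap, dmap, cmap_uvs, dmap_uvs)
without_dmap14 = lookup_glyph_previous_draft(0x4E00, 0xE0100, cmap, dmap, cmap_uvs, None)
print(with_dmap14, without_dmap14)
```
</pre>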

              <pre> —<o:p></o:p></pre>

              <pre>  Peter<o:p></o:p></pre>

              <pre> <o:p></o:p></pre>

              <pre>_______________________________________________<o:p></o:p></pre>

              <pre>mpeg-otspec mailing list<o:p></o:p></pre>

              <pre><a href="mailto:mpeg-otspec@lists.aau.at"

              moz-do-not-send="true" class="moz-txt-link-freetext">mpeg-otspec@lists.aau.at</a><o:p></o:p></pre>

              <pre><a

              href="https://lists.aau.at/mailman/listinfo/mpeg-otspec"

              moz-do-not-send="true" class="moz-txt-link-freetext">https://lists.aau.at/mailman/listinfo/mpeg-otspec</a><o:p></o:p></pre>

              <pre><o:p> </o:p></pre>

            </blockquote>

          </blockquote>


        </div>

      </blockquote>

      <br>


    </blockquote>

  </body>

</html>