<!DOCTYPE html>

<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body>

    <p>I’ve begun leaning the other way: perhaps we should ditch the

      multivalency of langsys, restrict it to tags that map to

      languages, and only use it in that way?</p>

    <p>JH<br>

    </p>

    <p><br>

    </p>

    <div class="moz-cite-prefix">On 2023-12-27 9:42 pm, Skef Iterum

      wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:70e94944-27dc-43d8-b1b0-98a5150b8d9a@skef.org">

      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

      <p>This all makes sense, but is what I was getting at in my

        earlier message when I said (as one horn of alternatives)

        "there's some token ("dialect"?) that Unicode should be tracking

        and formalizing but isn't". If what we need to track is specific

        enough to point a user at the right font, it should be specific

        enough to assign a token to, for use as a langsys, or some

        successor of a langsys. It seems better to me to try to get that

        worked out and up to date than to just let the current system

        rot relative to actual usage.</p>

      <p>Is the current system so inflexible (in terms of "registry" or

        whatever) that it's not possible to get some new tags allocated

        to match the regions we would be building ttc-type fonts for?</p>

      <p>As far as multiple options go, that sounds fine to me as long

        as a good faith and ongoing effort is being made to make the

        different options viable. Whereas it sounds a little like dmap

        is a bit of a "here's a hack so we can just not worry about that

        other stuff" sort of thing.<br>

      </p>

      <p>Skef<br>

      </p>

      <div class="moz-cite-prefix">On 12/27/23 17:49, John Hudson wrote:<br>

      </div>

      <blockquote type="cite"

        cite="mid:29ed5749-713f-47b9-abef-8ee88d0db466@tiro.ca">

        <meta http-equiv="Content-Type"

          content="text/html; charset=UTF-8">

        <div class="moz-cite-prefix">On 2023-12-27 1:56 pm, Skef Iterum

          wrote:<br>

        </div>

        <blockquote type="cite"

          cite="mid:cfd1aa4d-7fa3-4d5e-b5cd-5757e69b0315@skef.org">

          <meta http-equiv="Content-Type"

            content="text/html; charset=UTF-8">

          <p>If I understand you right, things have gone against the

            script/language mechanism over the past decades on the

            (broadly speaking) client side. So the responsible thing to

            do now would be to deprecate that mechanism in the spec and

            recommend that future fonts do all substitutions and

            positioning in the context of DFLT dflt. This will save

            foundries a lot of effort and heartache. <br>

          </p>

        </blockquote>

        <p>The <i>script</i> system in OTL is mostly fine, since its

          implementation is mostly derived from Unicode script

          properties. The only shaky part of that infrastructure is the

          lack of a standardised algorithm for script itemisation and

          glyph run segmentation, which can lead to inconsistent results

          for script=Common characters in different shaping engines.</p>
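        <p>To illustrate how that inconsistency can arise, here is a minimal Python sketch; it is not any shaping engine's actual algorithm. The <code>toy_script</code> lookup and the two resolution policies are assumptions made up for this example, and real engines rely on the full Unicode Script property and more elaborate rules. The same string itemises differently depending on whether script=Common characters join the preceding or the following script:</p>
        <pre>
```python
from itertools import groupby

COMMON = "Zyyy"  # ISO 15924 code for script=Common


def toy_script(ch):
    """Toy stand-in for the Unicode Script property (assumption)."""
    if ord(ch) in range(0x0370, 0x0400):  # Greek and Coptic block
        return "Grek"
    if ch.isascii() and ch.isalpha():
        return "Latn"
    return COMMON


def itemise(text, policy):
    """Split text into (script, run) pairs.

    policy="previous": Common characters join the preceding script.
    policy="next": Common characters join the following script.
    """
    scripts = [toy_script(ch) for ch in text]
    order = range(len(text))
    if policy == "next":
        order = reversed(order)
    neighbour = None
    for i in order:
        if scripts[i] == COMMON:
            if neighbour is not None:
                scripts[i] = neighbour
        else:
            neighbour = scripts[i]
    return [
        (script, "".join(ch for ch, _ in group))
        for script, group in groupby(zip(text, scripts), key=lambda p: p[1])
    ]


print(itemise("abc (αβγ)", "previous"))
print(itemise("abc (αβγ)", "next"))
```
</pre>
        <p>Under the first policy the space and opening parenthesis land in the Latin run; under the second they land in the Greek run and the closing parenthesis is left unresolved, so each run would be shaped with different script lookups.</p>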

        <p>I always found the DFLT script concept confusing and

          uninviting (except possibly for PUA), and I don’t agree that it

          would ‘save foundries a lot of effort and heartache’; rather,

          it would push font makers into the AAT-like realm of trying to

          implement all shaping behaviour—even standard behaviour

          derivable from character properties, such as Indic

          reordering—within GSUB and GPOS. Again: the <i>script</i>

          shaping aspect of OTL is mostly pretty reliable and robust: it

          could just do with a bit better standardisation of upfront

          itemisation and segmentation.<br>

        </p>

        <p>It is the <i>langsys</i> aspect that has proven to be

          unreliable and fragile, and while Simon is partly right when

          he says that this is a vendor implementation failure rather

          than a font format failure, I think he is also partly wrong,

          because there are conceptual problems in langsys that

          contribute to those implementation failures along with, of

          course, <i>the absence of an implementation specification.</i>

          As originally conceived by Eliyezer, a registered langsys tag

          represented something like a ‘set of typographic conventions

          that might be shared by multiple fonts and that <i>might</i>

          be associated with a particular language’.</p>

        <p>[One of my favourite examples of the distinction between

          langsys and language was provided by Paul Nelson in the early

          days of registering langsys tags: he pointed to differing

          conventions employed by French and German classicists in their

          typography of Greek texts, and noted that these could be

          captured in the script/langsys pairings grek/FRA and

          grek/DEU.]<br>

        </p>

        <p>That we are now talking about cmap vs GSUB in the context of

          ‘the language/region problem’ illustrates the conceptual

          problem of langsys in OpenType. Neither language nor region

          is reliably and unambiguously captured in langsys, and hence

          the mapping of langsys layout behaviours in GSUB and GPOS to

          specific languages or regions is more-or-less guessed at, or

          fails to be guessed at, in those vendor applications to which

          Simon referred. So, for example, Adobe chose to make OTL

          langsys GSUB and GPOS accessible via spellchecking and

          dictionary language settings, which is the sort of thing that

          appears to work for a lot of languages, but does so by simply

          ignoring the ways in which langsys was designed to be able to

          represent sets of typographic conventions beyond

          language-specific forms or behaviours. This means that there

          are registered langsys tags that are never going to be

          accessible within Adobe’s implementation model, e.g. IPPH.<br>

        </p>
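        <p>A rough Python sketch of why such a model leaves some tags out of reach. The table below is hypothetical and is not Adobe's actual mapping; the tag list is only a small sample of the registry. Any registered tag that is not the value of some language setting can never be selected:</p>
        <pre>
```python
# Hypothetical application table: document language (BCP 47) to OTL langsys tag.
LANGUAGE_TO_LANGSYS = {
    "fr": "FRA ",
    "de": "DEU ",
    "tr": "TRK ",
}

# A small sample of registered langsys tags, including one (IPPH) that names
# a set of conventions rather than a language.
SAMPLE_REGISTERED_TAGS = {"FRA ", "DEU ", "TRK ", "IPPH"}


def unreachable_tags(mapping, registered):
    """Return registered tags that no language setting can ever select."""
    return sorted(registered - set(mapping.values()))


print(unreachable_tags(LANGUAGE_TO_LANGSYS, SAMPLE_REGISTERED_TAGS))
```
</pre>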

        <p>Even if the implementation of langsys is limited in this way,

          to hard-coded lists of langsys-to-language mappings, reliable

          application of the langsys GSUB and GPOS relies on users or

          user agents setting text language tags in documents, which is

          not something I have found can be relied upon. Software could

          assist in this regard by automatically identifying text

          language and applying appropriate language tags, so perhaps

          failure to do so is the sort of thing Simon has in mind. But

          there remain edge-cases, e.g. where text is too short to be

          reliably identified, or where a user wants to invoke a

          particular langsys behaviour—perhaps because it is <i>regionally</i>

          appropriate—for a language other than the one with which it is

          associated by the software.<br>

        </p>

        <p>From the preamble to the OTL langsys registry:</p>

        <blockquote>

          <p><i>What is meant by a “language system” in this context is

              a set of typographic conventions for how text in a given

              script should be presented. Such conventions may be

              associated with particular languages, with particular

              genres of usage, with different publications, and other

              such factors. For example, particular glyph variants for

              certain characters may be required for particular

              languages, or for phonetic transcription or mathematical

              notation.</i><br>

          </p>

        </blockquote>

        <p>Given the multivalency inherent in that definition of what is

          meant by language system, it is difficult to see exactly <i>how</i>

          software vendors are meant to ‘correctly’ implement support.

          Personally, I think a proper implementation is one that

          provides the user with a mechanism to explicitly apply a

          particular OTL langsys to text, independent of all other

          language or region tagging, i.e. to be able to invoke

          particular GSUB and GPOS behaviour as grouped within a given

          font under langsys tags in a way that overrides any

          algorithmic application of the tags.<br>

        </p>
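        <p>One way to picture that override, as a minimal Python sketch under stated assumptions: <code>resolve_langsys</code>, its parameters, and the language table are hypothetical names invented for this example, not part of any existing API. An explicit user choice takes precedence over any tag derived algorithmically from the text language, and either is used only if the font contains it:</p>
        <pre>
```python
DEFAULT_LANGSYS = "dflt"

# Hypothetical language-to-langsys table, as an application might hard-code it.
LANGUAGE_TO_LANGSYS = {"fr": "FRA ", "de": "DEU "}


def resolve_langsys(font_tags, text_language=None, explicit_tag=None):
    """Pick the langsys to shape with.

    An explicit user choice wins over any tag derived from the text language;
    either is used only if the font actually contains it.
    """
    if explicit_tag is not None and explicit_tag in font_tags:
        return explicit_tag
    derived = LANGUAGE_TO_LANGSYS.get(text_language)
    if derived in font_tags:
        return derived
    return DEFAULT_LANGSYS


font_tags = {"FRA ", "DEU ", "IPPH"}  # langsys tags under one script in a font
print(resolve_langsys(font_tags, text_language="fr"))
print(resolve_langsys(font_tags, text_language="fr", explicit_tag="IPPH"))
print(resolve_langsys(font_tags, text_language="el", explicit_tag="DEU "))
```
</pre>
        <p>The last call corresponds to the grek/DEU case: the text is tagged as Greek, but the user asks for the conventions grouped under DEU.</p>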

        <blockquote type="cite"

          cite="mid:cfd1aa4d-7fa3-4d5e-b5cd-5757e69b0315@skef.org">

          <p> </p>

          In contrast, a hinge point in GSUB/GPOS means that one can

          design a single unified font and just tie into the "initial"

          script/language using the overlapping GSUB trick (which could

          presumably be canned in a tool-set like fontTools) and TTC,

          addressing the messy present while not giving up on the better

          future. </blockquote>

        <p>There is a third option, of course, which is to provide both

          mechanisms and let the font makers decide which to employ or,

          even, to invent ways to combine them. In the same way that we

          can currently make TTCs with separate cmap tables or with

          separate GSUB tables, or with both, why not make it possible

          for us to use data-optimised dmap or overlapping GSUB or both?</p>

        <p>JH<br>

        </p>

        <p><br>

        </p>

        <p>PS. I rather like the idea of region langsys tags or language

          group langsys tags, which would provide more efficient

          mechanisms in fonts to address conventions across multiple

          languages, and to make distinctions between e.g. Eastern and

          Western styles of Devanagari in a single Sanskrit font.<br>

        </p>

        <p><br>

        </p>

        <p><span style="white-space: pre-wrap">

</span></p>

        <pre class="moz-signature" cols="72">-- 

John Hudson

Tiro Typeworks Ltd    <a class="moz-txt-link-abbreviated"

        href="http://www.tiro.com" moz-do-not-send="true">www.tiro.com</a>

Tiro Typeworks is physically located on islands 

in the Salish Sea, on the traditional territory 

of the Snuneymuxw and Penelakut First Nations.

__________

EMAIL HOUR

In the interests of productivity, I am only dealing 

with email towards the end of the day, typically 

between 4PM and 5PM. If you need to contact me more 

urgently, please use other means.</pre>

        <br>

        <fieldset class="moz-mime-attachment-header"></fieldset>

        <pre class="moz-quote-pre" wrap="">_______________________________________________

mpeg-otspec mailing list

<a class="moz-txt-link-abbreviated moz-txt-link-freetext"

        href="mailto:mpeg-otspec@lists.aau.at" moz-do-not-send="true">mpeg-otspec@lists.aau.at</a>

<a class="moz-txt-link-freetext"

        href="https://lists.aau.at/mailman/listinfo/mpeg-otspec"

        moz-do-not-send="true">https://lists.aau.at/mailman/listinfo/mpeg-otspec</a>

</pre>

      </blockquote>


    </blockquote>


  </body>

</html>