[MPEG-OTSPEC] Shared GSUB/GPOS notes, was Re: dmap proposal
John Hudson
john at tiro.ca
Thu Dec 28 17:02:41 CET 2023
I’ve begun leaning the other way: perhaps we should ditch the
multivalency of langsys, restrict it to tags that map to languages, and
only use it in that way?
JH
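To make the contrast concrete, here is a minimal Python sketch of how a layout engine might choose a langsys for a run of text in one script. It is an illustration under stated assumptions, not any vendor's actual implementation. An explicit user override wins. Failing that, the document's BCP 47 language tag is looked up in a hard-coded language-to-langsys table. Failing that, the script's DefaultLangSys is used. The table contents, the function name `resolve_langsys`, the override parameter, and passing the font's langsys tags as a plain set are all hypothetical. Note that Greek text tagged `el` never reaches a font's grek/FRA langsys through the language path; only the override does. That is the multivalency problem the quoted thread describes.

```python
from typing import Optional

# Hypothetical, deliberately tiny mapping from BCP 47 primary language
# subtags to OpenType langsys tags. Real engines ship much larger tables.
# OpenType tags are four characters, padded with spaces.
LANG_TO_LANGSYS = {
    "de": "DEU ",
    "el": "ELL ",
    "fr": "FRA ",
    "sa": "SAN ",
}


def resolve_langsys(
    font_langsys: set[str],
    lang: Optional[str],
    override: Optional[str] = None,
) -> Optional[str]:
    """Pick a langsys tag for a run of text in one script.

    font_langsys: langsys tags the font defines under that script.
    lang: the text's BCP 47 language tag, if the document has one.
    override: a langsys tag the user applied explicitly (hypothetical UI).
    Returns None to mean "use the script's DefaultLangSys".
    """
    # Step 1: an explicit user choice wins, if the font defines that tag.
    if override is not None and override in font_langsys:
        return override
    # Step 2: map the language tag through the hard-coded table.
    if lang:
        primary = lang.split("-")[0].lower()
        tag = LANG_TO_LANGSYS.get(primary)
        if tag is not None and tag in font_langsys:
            return tag
    # Step 3: fall back to the script's default langsys.
    return None


# A hypothetical Greek font with French and German classicist conventions
# (grek/FRA and grek/DEU), as in Paul Nelson's example.
greek = {"FRA ", "DEU "}
print(repr(resolve_langsys(greek, "el")))                     # None
print(repr(resolve_langsys(greek, "el", override="FRA ")))    # 'FRA '
print(repr(resolve_langsys({"IPPH"}, "en")))                  # None
print(repr(resolve_langsys({"IPPH"}, "en", override="IPPH"))) # 'IPPH'
```

In this sketch, restricting langsys to tags that map to languages would make step 2 sufficient on its own, whereas keeping the multivalent definition makes something like step 1 necessary for tags such as IPPH.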
On 2023-12-27 9:42 pm, Skef Iterum wrote:
>
> This all makes sense, but it is what I was getting at in my earlier
> message when I said (as one horn of alternatives) "there's some token
> ("dialect"?) that Unicode should be tracking and formalizing but
> isn't". If what we need to track is specific enough to point a user at
> the right font, it should be specific enough to be assigned a token
> for use as a langsys, or some successor of a langsys. It seems better to
> me to try to get that worked out and up to date than to just let the
> current system rot relative to actual usage.
>
> Is the current system so inflexible (in terms of "registry" or
> whatever) that it's not possible to get some new tags allocated to
> match the regions we would be building ttc-type fonts for?
>
> As far as multiple options go, that sounds fine to me as long as a
> good faith and ongoing effort is being made to make the different
> options viable. Whereas it sounds a little like dmap is a bit of a
> "here's a hack so we can just not worry about that other stuff" sort
> of thing.
>
> Skef
>
> On 12/27/23 17:49, John Hudson wrote:
>> On 2023-12-27 1:56 pm, Skef Iterum wrote:
>>>
>>> If I understand you right, things have gone against the
>>> script/language mechanism over the past decades on the (broadly
>>> speaking) client side. So the responsible thing to do now would be
>>> to deprecate that mechanism in the spec and recommend that future
>>> fonts do all substitutions and positioning in the context of DFLT
>>> dflt. This will save foundries a lot of effort and heartache.
>>>
>> The /script/ system in OTL is mostly fine, since its implementation
>> is mostly derived from Unicode script properties. The only shaky part
>> of that infrastructure is the lack of a standardised algorithm for
>> script itemisation and glyph run segmentation, which can lead to
>> inconsistent results for script=Common characters in different
>> shaping engines.
>>
>> I always found the DFLT script concept confusing and
>> uninviting—except possibly for PUA—, and I don’t agree that it would
>> ‘save foundries a lot of effort and heartache’; rather, it would push
>> font makers into the AAT-like realm of trying to implement all
>> shaping behaviour—even standard behaviour derivable from character
>> properties, such as Indic reordering—within GSUB and GPOS. Again: the
>> /script/ shaping aspect of OTL is mostly pretty reliable and robust:
>> it could just do with a bit better standardisation of upfront
>> itemisation and segmentation.
>>
>> It is the /langsys/ aspect that has proven to be unreliable and
>> fragile, and while Simon is partly right when he says that this is a
>> vendor implementation failure rather than a font format failure, I
>> think he is also partly wrong, because there are conceptual problems
>> in langsys that contribute to those implementation failures along
>> with, of course, /the absence of an implementation specification./ As
>> originally conceived by Eliyezer, a registered langsys tag
>> represented something like a ‘set of typographic conventions that
>> might be shared by multiple fonts and that /might/ be associated with
>> a particular language’.
>>
>> [One of my favourite examples of the distinction between langsys and
>> language was provided by Paul Nelson in the early days of registering
>> langsys tags: he pointed to differing conventions employed by French
>> and German classicists in their typography of Greek texts, and noted
>> that these could be captured in the script/langsys pairings grek/FRA
>> and grek/DEU.]
>>
>> That we are now talking about cmap vs GSUB in the context of ‘the
>> language/region problem’ illustrates the conceptual problem of
>> langsys in OpenType. Neither language nor region is reliably and
>> unambiguously captured in langsys, and hence the mapping of langsys
>> layout behaviours in GSUB and GPOS to specific languages or regions
>> is more or less guessed at, or not guessed at all, in those
>> vendor applications to which Simon referred. So, for example, Adobe
>> chose to make OTL langsys GSUB and GPOS accessible via spellchecking
>> and dictionary language settings, which is the sort of thing that
>> appears to work for a lot of languages, but does so by simply
>> ignoring the ways in which langsys was designed to be able to
>> represent sets of typographic conventions beyond language-specific
>> forms or behaviours. This means that there are registered langsys
>> tags that are never going to be accessible within Adobe’s
>> implementation model, e.g. IPPH.
>>
>> Even if the implementation of langsys is limited in this way, to
>> hard-coded lists of langsys-to-language mappings, reliable
>> application of the langsys GSUB and GPOS relies on users or user
>> agents setting text language tags in documents, which is not
>> something I have found can be relied upon. Software could assist in
>> this regard by automatically identifying text language and applying
>> appropriate language tags, so perhaps failure to do so is the sort of
>> thing Simon has in mind. But there remain edge-cases, e.g. where text
>> is too short to be reliably identified, or where a user wants to
>> invoke a particular langsys behaviour—perhaps because it is
>> /regionally/ appropriate—for a language other than the one with which
>> it is associated by the software.
>>
>> From the preamble to the OTL langsys registry:
>>
>> /What is meant by a “language system” in this context is a set of
>> typographic conventions for how text in a given script should be
>> presented. Such conventions may be associated with particular
>> languages, with particular genres of usage, with different
>> publications, and other such factors. For example, particular
>> glyph variants for certain characters may be required for
>> particular languages, or for phonetic transcription or
>> mathematical notation./
>>
>> Given the multivalency inherent in that definition of what is meant
>> by language system, it is difficult to see exactly /how/ software
>> vendors are meant to ‘correctly’ implement support. Personally, I
>> think a proper implementation is one that provides the user with a
>> mechanism to explicitly apply a particular OTL langsys to text,
>> independent of all other language or region tagging, i.e. to be able
>> to invoke particular GSUB and GPOS behaviour as grouped within a
>> given font under langsys tags in a way that overrides any algorithmic
>> application of the tags.
>>
>>> In contrast, a hinge point in GSUB/GPOS means that one can design a
>>> single unified font and just tie into the "initial" script/language
>>> using the overlapping GSUB trick (which could presumably be canned
>>> in a tool-set like fontTools) and TTC, addressing the messy present
>>> while not giving up on the better future.
>>
>> There is a third option, of course, which is to provide both
>> mechanisms and let the font makers decide which to employ or, even,
>> to invent ways to combine them. In the same way that we can currently
>> make TTCs with separate cmap tables or with separate GSUB tables, or
>> with both, why not make it possible for us to use data-optimised dmap
>> or overlapping GSUB or both?
>>
>> JH
>>
>>
>> PS. I rather like the idea of region langsys tags or language group
>> langsys tags, which would provide more efficient mechanisms in fonts
>> to address conventions across multiple languages, and to make
>> distinctions between e.g. Eastern and Western styles of Devanagari in
>> a single Sanskrit font.
>>
>>
>> --
>>
>> John Hudson
>> Tiro Typeworks Ltd  www.tiro.com
>>
>> Tiro Typeworks is physically located on islands
>> in the Salish Sea, on the traditional territory
>> of the Snuneymuxw and Penelakut First Nations.
>>
>> __________
>>
>> EMAIL HOUR
>> In the interests of productivity, I am only dealing
>> with email towards the end of the day, typically
>> between 4PM and 5PM. If you need to contact me more
>> urgently, please use other means.
>>
>> _______________________________________________
>> mpeg-otspec mailing list
>> mpeg-otspec at lists.aau.at
>> https://lists.aau.at/mailman/listinfo/mpeg-otspec
>