[MPEG-OTSPEC] Shared GSUB/GPOS notes, was Re: dmap proposal

John Hudson john at tiro.ca
Thu Dec 28 17:02:41 CET 2023


I’ve begun leaning the other way: perhaps we should ditch the 
multivalency of langsys, restrict it to tags that map to languages, and 
only use it in that way?

JH


On 2023-12-27 9:42 pm, Skef Iterum wrote:
>
> This all makes sense, but it is what I was getting at in my earlier 
> message when I said (as one horn of alternatives) "there's some token 
> ("dialect"?) that Unicode should be tracking and formalizing but 
> isn't". If what we need to track is specific enough to point a user at 
> the right font, it should be specific enough to assign it a token to 
> use as a langsys, or some successor of a langsys. It seems better to 
> me to try to get that worked out and up to date than to just let the 
> current system rot relative to actual usage.
>
> Is the current system so inflexible (in terms of "registry" or 
> whatever) that it's not possible to get some new tags allocated to 
> match the regions we would be building ttc-type fonts for?
>
> As far as multiple options go, that sounds fine to me as long as a 
> good faith and ongoing effort is being made to make the different 
> options viable. Whereas it sounds a little like dmap is a bit of a 
> "here's a hack so we can just not worry about that other stuff" sort 
> of thing.
>
> Skef
>
> On 12/27/23 17:49, John Hudson wrote:
>> On 2023-12-27 1:56 pm, Skef Iterum wrote:
>>>
>>> If I understand you right, things have gone against the 
>>> script/language mechanism over the past decades on the (broadly 
>>> speaking) client side. So the responsible thing to do now would be 
>>> to deprecate that mechanism in the spec and recommend that future 
>>> fonts do all substitutions and positioning in the context of DFLT 
>>> dflt. This will save foundries a lot of effort and heartache.
>>>
>> The /script/ system in OTL is mostly fine, since its implementation 
>> is mostly derived from Unicode script properties. The only shaky part 
>> of that infrastructure is the lack of a standardised algorithm for 
>> script itemisation and glyph run segmentation, which can lead to 
>> inconsistent results for script=Common characters in different 
>> shaping engines.
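A minimal sketch of one itemisation heuristic, assuming a toy script lookup (`SCRIPT_RANGES`, `script_of` and `itemise` are hypothetical names; real engines use the full Unicode Script and Script_Extensions properties): characters with script=Common inherit the script of the preceding run, and leading Common characters take the script of the first following non-Common character.

```python
# Hypothetical, heavily simplified script lookup; real engines use the
# Unicode Script property (Scripts.txt) plus Script_Extensions.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),
    (0x0061, 0x007A, "Latn"),
    (0x0370, 0x03FF, "Grek"),
    (0x0900, 0x097F, "Deva"),
]


def script_of(ch: str) -> str:
    """Return a toy script code for ch, defaulting to Common ("Zyyy")."""
    cp = ord(ch)
    for lo, hi, script in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return script
    return "Zyyy"


def itemise(text: str) -> list[tuple[str, str]]:
    """Split text into (script, run) pairs, resolving Common characters."""
    raw = [script_of(ch) for ch in text]
    # Leading Common characters take the first non-Common script, if any.
    current = next((s for s in raw if s != "Zyyy"), "Zyyy")
    resolved = []
    for s in raw:
        if s != "Zyyy":
            current = s  # a real script starts or continues a run
        resolved.append(current)  # Common inherits the current run's script
    runs: list[tuple[str, str]] = []
    for ch, s in zip(text, resolved):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + ch)
        else:
            runs.append((s, ch))
    return runs


print(itemise("(αβγ) abc"))  # [('Grek', '(αβγ) '), ('Latn', 'abc')]
```

Here the space between the two words ends up in the Greek run; an engine that attached it to the following Latin run would segment the same text differently, which is the kind of divergence a standardised algorithm would remove.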
>>
>> I always found the DFLT script concept confusing and 
>> uninviting (except possibly for PUA), and I don’t agree that it would 
>> ‘save foundries a lot of effort and heartache’; rather, it would push 
>> font makers into the AAT-like realm of trying to implement all 
>> shaping behaviour—even standard behaviour derivable from character 
>> properties, such as Indic reordering—within GSUB and GPOS. Again: the 
>> /script/ shaping aspect of OTL is mostly pretty reliable and robust: 
>> it could just do with a bit better standardisation of upfront 
>> itemisation and segmentation.
>>
>> It is the /langsys/ aspect that has proven to be unreliable and 
>> fragile, and while Simon is partly right when he says that this is a 
>> vendor implementation failure rather than a font format failure, I 
>> think he is also partly wrong, because there are conceptual problems 
>> in langsys that contribute to those implementation failures along 
>> with, of course, /the absence of an implementation specification./ As 
>> originally conceived by Eliyezer, a registered langsys tag 
>> represented something like a ‘set of typographic conventions that 
>> might be shared by multiple fonts and that /might/ be associated with 
>> a particular language’.
>>
>> [One of my favourite examples of the distinction between langsys and 
>> language was provided by Paul Nelson in the early days of registering 
>> langsys tags: he pointed to differing conventions employed by French 
>> and German classicists in their typography of Greek texts, and noted 
>> that these could be captured in the script/langsys pairings grek/FRA 
>> and grek/DEU.]
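To make the script/langsys pairing concrete, a minimal sketch of how a layout engine might choose features, assuming a GSUB ScriptList reduced to a plain dict (the feature assignments and the `select_features` helper are hypothetical, not taken from any real font or engine). In the font format each Script table has an optional DefaultLangSys plus LangSys records keyed by langsys tag; this sketch falls back to the default when the requested tag is absent.

```python
from typing import Optional

# Hypothetical GSUB ScriptList, reduced to feature tags per langsys.
# "default" stands in for the Script table's DefaultLangSys; the other
# keys are four-character, space-padded langsys tags.
SCRIPT_LIST = {
    "grek": {
        "default": ["ccmp", "liga"],
        "FRA ": ["ccmp", "liga", "ss01"],  # hypothetical French classicist forms
        "DEU ": ["ccmp", "liga", "ss02"],  # hypothetical German classicist forms
    },
    "DFLT": {"default": ["ccmp"]},
}


def select_features(script: str, langsys: Optional[str]) -> list[str]:
    """Return feature tags for a script/langsys pair.

    In this sketch, a missing langsys falls back to the script's default
    langsys, and a missing script falls back to DFLT.
    """
    table = SCRIPT_LIST.get(script, SCRIPT_LIST["DFLT"])
    if langsys is not None and langsys in table:
        return table[langsys]
    return table["default"]
```

The same Greek text receives different features depending only on the langsys tag, regardless of the language the text is written in, which is the distinction the grek/FRA and grek/DEU example draws.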
>>
>> That we are now talking about cmap vs GSUB in the context of ‘the 
>> language/region problem’ illustrates the conceptual problem of 
>> langsys in OpenType. Neither language nor region is reliably and 
>> unambiguously captured in langsys, and hence the mapping of langsys 
>> layout behaviours in GSUB and GPOS to specific languages or regions 
>> is more or less guessed at, or not guessed at all, in those 
>> vendor applications to which Simon referred. So, for example, Adobe 
>> chose to make OTL langsys GSUB and GPOS accessible via spellchecking 
>> and dictionary language settings, which is the sort of thing that 
>> appears to work for a lot of languages, but does so by simply 
>> ignoring the ways in which langsys was designed to be able to 
>> represent sets of typographic conventions beyond language-specific 
>> forms or behaviours. This means that there are registered langsys 
>> tags that are never going to be accessible within Adobe’s 
>> implementation model, e.g. IPPH.
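A minimal sketch of that limitation, assuming an application that derives the langsys tag only from a dictionary/spellchecking language via a hard-coded table (the `LANGUAGE_TO_LANGSYS` entries are a small hypothetical excerpt, not Adobe's actual table): any langsys tag in the font that no language setting maps to, such as IPPH, can never be selected.

```python
# Hypothetical excerpt of an application's hard-coded table mapping a
# dictionary/spellchecking language setting to an OpenType langsys tag.
LANGUAGE_TO_LANGSYS = {
    "fr": "FRA ",
    "de": "DEU ",
    "el": "ELL ",
    "sa": "SAN ",
}


def unreachable_langsys(font_langsys: set[str]) -> set[str]:
    """Return the font's langsys tags that no language setting can select."""
    reachable = set(LANGUAGE_TO_LANGSYS.values())
    return font_langsys - reachable


# A hypothetical font with French, German and IPA-phonetic langsys data.
print(sorted(unreachable_langsys({"FRA ", "DEU ", "IPPH"})))  # ['IPPH']
```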
>>
>> Even if the implementation of langsys is limited in this way, to 
>> hard-coded lists of langsys-to-language mappings, reliable 
>> application of the langsys GSUB and GPOS relies on users or user 
>> agents setting text language tags in documents, which is not 
>> something I have found can be relied upon. Software could assist in 
>> this regard by automatically identifying text language and applying 
>> appropriate language tags, so perhaps failure to do so is the sort of 
>> thing Simon has in mind. But there remain edge-cases, e.g. where text 
>> is too short to be reliably identified, or where a user wants to 
>> invoke a particular langsys behaviour—perhaps because it is 
>> /regionally/ appropriate—for a language other than the one with which 
>> it is associated by the software.
>>
>> From the preamble to the OTL langsys registry:
>>
>>     /What is meant by a “language system” in this context is a set of
>>     typographic conventions for how text in a given script should be
>>     presented. Such conventions may be associated with particular
>>     languages, with particular genres of usage, with different
>>     publications, and other such factors. For example, particular
>>     glyph variants for certain characters may be required for
>>     particular languages, or for phonetic transcription or
>>     mathematical notation./
>>
>> Given the multivalency inherent in that definition of what is meant 
>> by language system, it is difficult to see exactly /how/ software 
>> vendors are meant to ‘correctly’ implement support. Personally, I 
>> think a proper implementation is one that provides the user with a 
>> mechanism to explicitly apply a particular OTL langsys to text, 
>> independent of all other language or region tagging, i.e. to be able 
>> to invoke particular GSUB and GPOS behaviour as grouped within a 
>> given font under langsys tags in a way that overrides any algorithmic 
>> application of the tags.
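One way to read that proposal, as a minimal sketch under stated assumptions: an explicit, user-applied langsys tag always wins; only in its absence does the engine fall back to mapping the text's language tag, and then to the default langsys. The `resolve_langsys` function, its parameters and the mapping table are hypothetical illustrations, not an existing API.

```python
from typing import Optional

# Hypothetical mapping from BCP 47 primary language subtags to langsys tags.
LANGUAGE_TO_LANGSYS = {"fr": "FRA ", "de": "DEU ", "sa": "SAN "}


def resolve_langsys(
    explicit_langsys: Optional[str],
    text_language: Optional[str],
) -> Optional[str]:
    """Pick the langsys tag to apply to a run of text.

    1. An explicit user-applied langsys overrides everything else.
    2. Otherwise map the text's language tag (primary subtag only).
    3. Otherwise return None, meaning "use the default langsys".
    """
    if explicit_langsys is not None:
        return explicit_langsys
    if text_language:
        primary = text_language.split("-")[0].lower()
        return LANGUAGE_TO_LANGSYS.get(primary)
    return None
```

With this ordering, a tag with no language association (e.g. `resolve_langsys("IPPH", "en")`) becomes reachable, and a langsys associated with one language can be applied to text tagged as another (e.g. `resolve_langsys("FRA ", "de")`).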
>>
>>> In contrast, a hinge point in GSUB/GPOS means that one can design a 
>>> single unified font and just tie into the "initial" script/language 
>>> using the overlapping GSUB trick (which could presumably be canned 
>>> in a tool-set like fontTools) and TTC, addressing the messy present 
>>> while not giving up on the better future. 
>>
>> There is a third option, of course, which is to provide both 
>> mechanisms and let the font makers decide which to employ or, even, 
>> to invent ways to combine them. In the same way that we can currently 
>> make TTCs with separate cmap tables or with separate GSUB tables, or 
>> with both, why not make it possible for us to use data-optimised dmap 
>> or overlapping GSUB or both?
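A minimal sketch of the table-sharing idea behind such TTCs, assuming each face is just a mapping from table tag to raw bytes (the `pack_collection` helper is hypothetical; a real TTC writer must also handle offsets, alignment and checksums): identical tables are stored once and referenced from every face's table directory, so faces can differ in cmap only, in GSUB only, or in both.

```python
import hashlib


def pack_collection(
    faces: list[dict[str, bytes]],
) -> tuple[dict[str, bytes], list[dict[str, str]]]:
    """Store each distinct table once and give every face a directory.

    Returns (blobs, directories): blobs maps a SHA-256 content digest to
    the table bytes; each directory maps a face's table tags to the
    digest of the shared blob it points at.
    """
    blobs: dict[str, bytes] = {}
    directories: list[dict[str, str]] = []
    for face in faces:
        directory: dict[str, str] = {}
        for tag, data in face.items():
            digest = hashlib.sha256(data).hexdigest()
            blobs.setdefault(digest, data)  # identical tables are stored once
            directory[tag] = digest
        directories.append(directory)
    return blobs, directories


# Two hypothetical faces sharing glyph data but with their own cmap and GSUB.
glyf = bytes(1000)
face_a = {"glyf": glyf, "cmap": b"cmap-A", "GSUB": b"GSUB-A"}
face_b = {"glyf": glyf, "cmap": b"cmap-B", "GSUB": b"GSUB-B"}
blobs, dirs = pack_collection([face_a, face_b])
stored = sum(len(b) for b in blobs.values())
print(len(blobs), stored)  # 5 blobs, 1024 bytes instead of 2024
```

Giving both faces the same GSUB bytes would leave four blobs; only the tables that actually differ cost extra storage.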
>>
>> JH
>>
>>
>> PS. I rather like the idea of region langsys tags or language group 
>> langsys tags, which would provide more efficient mechanisms in fonts 
>> to address conventions across multiple languages, and to make 
>> distinctions between e.g. Eastern and Western styles of Devanagari in 
>> a single Sanskrit font.
>>
>>
>> -- 
>>
>> John Hudson
>> Tiro Typeworks Ltd    www.tiro.com
>>
>> Tiro Typeworks is physically located on islands
>> in the Salish Sea, on the traditional territory
>> of the Snuneymuxw and Penelakut First Nations.
>>
>> __________
>>
>> EMAIL HOUR
>> In the interests of productivity, I am only dealing
>> with email towards the end of the day, typically
>> between 4PM and 5PM. If you need to contact me more
>> urgently, please use other means.
>>
>> _______________________________________________
>> mpeg-otspec mailing list
>> mpeg-otspec at lists.aau.at
>> https://lists.aau.at/mailman/listinfo/mpeg-otspec
