[MPEG-OTSPEC] Shared GSUB/GPOS notes, was Re: dmap proposal

John Hudson john at tiro.ca
Thu Dec 28 02:49:51 CET 2023


On 2023-12-27 1:56 pm, Skef Iterum wrote:
>
> If I understand you right, things have gone against the 
> script/language mechanism over the past decades on the (broadly 
> speaking) client side. So the responsible thing to do now would be to 
> deprecate that mechanism in the spec and recommend that future fonts 
> do all substitutions and positioning in the context of DFLT dflt. This 
> will save foundries a lot of effort and heartache.
>
The /script/ system in OTL is mostly fine, since its implementation is 
mostly derived from Unicode script properties. The only shaky part of 
that infrastructure is the lack of a standardised algorithm for script 
itemisation and glyph run segmentation, which can lead to inconsistent 
results for script=Common characters in different shaping engines.
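
As a rough illustration of that inconsistency, here is a minimal Python sketch. It is not any shaping engine's actual algorithm: SCRIPT_OF is a hand-written stand-in for Unicode Script property data, and the two "previous"/"next" policies are hypothetical. It only shows that the same text can segment differently depending on which neighbouring run a Common character is attached to.

```python
# Toy script itemiser. SCRIPT_OF is a hypothetical, hand-written stand-in
# for the Unicode Script property; real engines use full Unicode data.
SCRIPT_OF = {
    "a": "Latn", "b": "Latn",
    "α": "Grek", "β": "Grek",
    " ": "Zyyy", "-": "Zyyy",  # Zyyy = Common
}


def resolve_common(scripts, common_joins):
    """Replace Common (Zyyy) entries with a neighbouring script.

    "previous" copies the script of the nearest earlier character,
    "next" copies the script of the nearest later character.
    Common characters with no such neighbour stay Zyyy.
    """
    indices = range(len(scripts))
    if common_joins == "next":
        indices = reversed(indices)
    resolved = list(scripts)
    current = None
    for i in indices:
        if resolved[i] == "Zyyy":
            if current is not None:
                resolved[i] = current
        else:
            current = resolved[i]
    return resolved


def itemise(text, common_joins="previous"):
    """Split text into (script, substring) runs under the given policy."""
    scripts = resolve_common([SCRIPT_OF[ch] for ch in text], common_joins)
    runs = []
    for ch, script in zip(text, scripts):
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs
```

With the text "ab αβ", the "previous" policy puts the space in the Latin run and the "next" policy puts it in the Greek run, so any GSUB or GPOS lookup involving the space would be applied under a different script in each case.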

I always found the DFLT script concept confusing and uninviting (except 
possibly for PUA), and I don’t agree that it would ‘save foundries a lot 
of effort and heartache’; rather, it would push font makers into the 
AAT-like realm of trying to implement all shaping behaviour—even 
standard behaviour derivable from character properties, such as Indic 
reordering—within GSUB and GPOS. Again: the /script/ shaping aspect of 
OTL is mostly pretty reliable and robust: it could just do with a bit 
better standardisation of upfront itemisation and segmentation.

It is the /langsys/ aspect that has proven to be unreliable and fragile, 
and while Simon is partly right when he says that this is a vendor 
implementation failure rather than a font format failure, I think he is 
also partly wrong, because there are conceptual problems in langsys that 
contribute to those implementation failures along with, of course, /the 
absence of an implementation specification./ As originally conceived by 
Eliyezer, a registered langsys tag represented something like a ‘set of 
typographic conventions that might be shared by multiple fonts and that 
/might/ be associated with a particular language’.

[One of my favourite examples of the distinction between langsys and 
language was provided by Paul Nelson in the early days of registering 
langsys tags: he pointed to differing conventions employed by French and 
German classicists in their typography of Greek texts, and noted that 
these could be captured in the script/langsys pairings grek/FRA and 
grek/DEU.]

That we are now talking about cmap vs GSUB in the context of ‘the 
language/region problem’ illustrates the conceptual problem of langsys 
in OpenType. Neither language nor region is reliably and unambiguously 
captured in langsys, and hence mappings of langsys layout behaviours in 
GSUB and GPOS to specific languages or regions are more-or-less guessed 
at, or not guessed at all, in those vendor applications to which 
Simon referred. So, for example, Adobe chose to make OTL langsys GSUB and 
GPOS accessible via spellchecking and dictionary language settings, 
which is the sort of thing that appears to work for a lot of languages, 
but does so by simply ignoring the ways in which langsys was designed to 
be able to represent sets of typographic conventions beyond 
language-specific forms or behaviours. This means that there are 
registered langsys tags that are never going to be accessible within 
Adobe’s implementation model, e.g. IPPH.
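
A minimal Python sketch of why that happens, under stated assumptions: the table below is a hypothetical stand-in for an application's hard-coded list, not Adobe's actual data, and FONT_LANGSYS_TAGS is an invented example font. Any langsys tag that no document-language setting maps to simply cannot be selected.

```python
# Hypothetical hard-coded table: a document language setting (written here
# as a BCP 47-style code) selects exactly one OTL langsys tag.
LANGUAGE_TO_LANGSYS = {
    "fr": "FRA ",
    "de": "DEU ",
    "tr": "TRK ",
}

# Langsys tags that an illustrative font defines under one script.
FONT_LANGSYS_TAGS = {"FRA ", "DEU ", "TRK ", "IPPH"}


def unreachable_langsys_tags(font_tags, mapping):
    """Return the font's langsys tags that no language setting can select."""
    return set(font_tags) - set(mapping.values())
```

Here unreachable_langsys_tags(FONT_LANGSYS_TAGS, LANGUAGE_TO_LANGSYS) leaves only IPPH: it names a set of conventions rather than a dictionary language, so a language-keyed table has no entry that leads to it.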

Even if the implementation of langsys is limited in this way, to 
hard-coded lists of langsys-to-language mappings, reliable application 
of the langsys GSUB and GPOS relies on users or user agents setting text 
language tags in documents, which is not something I have found can be 
relied upon. Software could assist in this regard by automatically 
identifying text language and applying appropriate language tags, so 
perhaps failure to do so is the sort of thing Simon has in mind. But 
there remain edge-cases, e.g. where text is too short to be reliably 
identified, or where a user wants to invoke a particular langsys 
behaviour—perhaps because it is /regionally/ appropriate—for a language 
other than the one with which it is associated by the software.

From the preamble to the OTL langsys registry:

    /What is meant by a “language system” in this context is a set of
    typographic conventions for how text in a given script should be
    presented. Such conventions may be associated with particular
    languages, with particular genres of usage, with different
    publications, and other such factors. For example, particular glyph
    variants for certain characters may be required for particular
    languages, or for phonetic transcription or mathematical notation./

Given the multivalency inherent in that definition of what is meant by 
language system, it is difficult to see exactly /how/ software vendors 
are meant to ‘correctly’ implement support. Personally, I think a proper 
implementation is one that provides the user with a mechanism to 
explicitly apply a particular OTL langsys to text, independent of all 
other language or region tagging, i.e. to be able to invoke particular 
GSUB and GPOS behaviour as grouped within a given font under langsys 
tags in a way that overrides any algorithmic application of the tags.
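
The precedence I have in mind can be sketched in a few lines of Python. This is an assumption-laden illustration, not a specification: select_langsys and its parameters are hypothetical names, the font is modelled as a plain dict of script tag to langsys tags, and "dflt" stands for the script's default langsys.

```python
def select_langsys(font_scripts, script, mapped_langsys=None, user_langsys=None):
    """Pick the (script, langsys) pair whose lookups should be applied.

    font_scripts:   dict of script tag -> set of langsys tags in the font
                    (a simplified, hypothetical model of a script list).
    mapped_langsys: tag derived algorithmically from language tagging.
    user_langsys:   tag chosen explicitly by the user; it wins if present.
    Returns None when the font has neither the script nor a DFLT script.
    """
    if script not in font_scripts:
        if "DFLT" not in font_scripts:
            return None
        script = "DFLT"
    available = font_scripts[script]
    # The explicit user choice is tried first, then the algorithmic mapping.
    for candidate in (user_langsys, mapped_langsys):
        if candidate is not None and candidate in available:
            return (script, candidate)
    # Neither applies: fall back to the script's default langsys.
    return (script, "dflt")
```

For a font modelled as {"grek": {"FRA ", "DEU "}}, text tagged as German would map to DEU, but a user who explicitly asks for FRA gets ("grek", "FRA "); with no usable tag at all the result is ("grek", "dflt").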

> In contrast, a hinge point in GSUB/GPOS means that one can design a 
> single unified font and just tie into the "initial" script/language 
> using the overlapping GSUB trick (which could presumably be canned in 
> a tool-set like fontTools) and TTC, addressing the messy present while 
> not giving up on the better future. 

There is a third option, of course, which is to provide both mechanisms 
and let the font makers decide which to employ or, even, to invent ways 
to combine them. In the same way that we can currently make TTCs with 
separate cmap tables or with separate GSUB tables, or with both, why not 
make it possible for us to use data-optimised dmap or overlapping GSUB 
or both?

JH


PS. I rather like the idea of region langsys tags or language group 
langsys tags, which would provide more efficient mechanisms in fonts to 
address conventions across multiple languages, and to make distinctions 
between e.g. Eastern and Western styles of Devanagari in a single 
Sanskrit font.


-- 

John Hudson
Tiro Typeworks Ltd www.tiro.com

Tiro Typeworks is physically located on islands
in the Salish Sea, on the traditional territory
of the Snuneymuxw and Penelakut First Nations.

__________

EMAIL HOUR
In the interests of productivity, I am only dealing
with email towards the end of the day, typically
between 4PM and 5PM. If you need to contact me more
urgently, please use other means.