[MPEG-OTSPEC] [EXTERNAL] Re: Shared GSUB/GPOS notes, was Re: dmap proposal

Peter Constable pconstable at microsoft.com
Wed Jan 3 00:50:05 CET 2024


Simon wrote:
> I'm uncomfortable with this wording and this line of argument. The langsys mechanism in OTL works *fine*; application vendor implementations do not.

There are a couple of difficulties with the mechanism that I don't think can be blamed on vendors not implementing correctly.

First, let's suppose that all text to be rendered has associated language metadata, specifically a BCP 47 language tag. It's not always obvious for vendors how they should map from that language metadata to an OTL langsys tag. We could perhaps solve that (in part) by pre-defining a mapping from any possible ISO 639-3 ID to a corresponding langsys tag. (This has been floated in the past.)

Note, though, that that would put a greater burden on font vendors to deal with lots of distinct langsys tags / ISO 639-3 IDs when, for all they know, the actual typographic distinctions are much less granular. (John also mentioned this.)
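
To make the pre-defined mapping idea concrete, here is a minimal Python sketch. It is only an illustration: the tables are hypothetical and cover three languages, the function name langsys_for is invented, and the tag parsing is deliberately naive (it looks at the primary language subtag and nothing else). Langsys tags are written space-padded to four characters, and 'dflt' stands in for "use the script's default language system".

```python
# Hypothetical, partial table from ISO 639-3 IDs to OTL langsys tags.
# A real table would need an entry for every ID the vendor supports.
ISO639_3_TO_LANGSYS = {
    "deu": "DEU ",
    "fra": "FRA ",
    "mal": "MAL ",
}

# BCP 47 uses the two-letter subtag where one exists, so a lookup
# also needs a 639-1 to 639-3 step (again, only a partial table).
ISO639_1_TO_3 = {"de": "deu", "fr": "fra", "ml": "mal"}

# Stand-in meaning "fall back to the script's default language system".
DEFAULT_LANGSYS = "dflt"


def langsys_for(bcp47_tag: str) -> str:
    """Map a BCP 47 tag to a langsys tag using only its primary subtag."""
    primary = bcp47_tag.split("-")[0].lower()
    iso3 = ISO639_1_TO_3.get(primary, primary)
    return ISO639_3_TO_LANGSYS.get(iso3, DEFAULT_LANGSYS)
```

Note that region, script and variant subtags are simply discarded here, so "de-AT" and "de" get the same answer. That is exactly the kind of decision a vendor has to make with no specification to guide it.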

But there's still a more fundamental problem: the concept represented by langsys tags is not the same as that represented by ISO 639-3 IDs. In BCP 47 terms, langsys tags correspond more closely to any distinction that can be made using the language, region, script and variant subtags (and potentially defined extensions) of a BCP 47 language tag. E.g., 'MAL ' ("Malayalam") versus 'MLR ' ("Malayalam, Reformed"). Even if doc formats included exactly that semantic in metadata on the text, UI designers would have the problem of providing a reasonable UI for content authors to make that selection. Nobody has come up with a workable UI that allows choosing from among ~7000 ISO 639-3 IDs, and if you add that langsys tags can capture distinctions in arbitrary dimensions (time periods, orthography reforms, transcription conventions...), the problem is even more intractable. Plus there's the challenge that there's a _registry_ of langsys tags, meaning that each new product release could have to support any number of new tags. Not many app vendors would be eager to sign up for that maintenance challenge.
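
The mismatch shows up as soon as you try to invert the registry. The Python sketch below is illustrative only: the pairs are a hand-picked excerpt reflecting my reading of the langsys registry and should be checked against it, and the names REGISTRY_EXCERPT and candidates_by_iso are invented for the example.

```python
from collections import defaultdict

# (langsys tag, ISO 639-3 ID) pairs. Hand-picked, partial excerpt;
# assumed to match the registry, not copied from it programmatically.
REGISTRY_EXCERPT = [
    ("MAL ", "mal"),  # Malayalam
    ("MLR ", "mal"),  # Malayalam, Reformed
    ("ELL ", "ell"),  # Greek
    ("PGR ", "ell"),  # Polytonic Greek
    ("DEU ", "deu"),  # German
]


def candidates_by_iso(pairs):
    """Group langsys tags by the ISO 639-3 ID they are associated with."""
    grouped = defaultdict(list)
    for tag, iso in pairs:
        grouped[iso].append(tag)
    return dict(grouped)


CANDIDATES = candidates_by_iso(REGISTRY_EXCERPT)

# IDs for which the language alone cannot pick a single langsys tag.
AMBIGUOUS = {iso: tags for iso, tags in CANDIDATES.items() if len(tags) > 1}
```

For "mal" and "ell" the language ID leaves two candidates, so something other than the ISO 639-3 ID (an orthography choice, in these cases) has to come from the text metadata or from the user.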

As John mentioned, there was never an implementation specification, and there have been different understandings of the intent. As mentioned, the idea has been floated in the past to pre-define a mapping from any potential 639-3 ID, and that has assumed that the language tagging of a run of text is what should determine the langsys tag. But that is actually different from the intended usage Eliyezer Kohen first had in mind. John mentioned the example of French and German classicists having different typographic conventions for Greek texts. So, consider this text:

Das Wort ἄξιοι ist Plural.

The language tagging markup would typically be along these lines:

<span lang="de">Das Wort </span><span lang="grc">ἄξιοι</span><span lang="de"> ist Plural.</span>

However, in the way Eliyezer originally envisioned the use of langsys tags, you'd want markup more along these lines:

<doc lang="de">Das Wort <span lang="grc">ἄξιοι</span> ist Plural.</doc>

Then the langsys tag would be determined by the language on the <doc> element, not the <span>; that is, grek/DEU rather than grek/PGR.
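
The two readings can be put side by side in a short Python sketch. Everything here is hypothetical: the Run type, the LANGSYS table and the otl_tags function are invented for illustration, and "grc" is the BCP 47 subtag for Ancient Greek. Langsys tags are written unpadded to match the notation above.

```python
from dataclasses import dataclass


@dataclass
class Run:
    text: str
    script: str  # OTL script tag for the run
    lang: str    # language tag carried by the run itself


# Hypothetical, partial mapping from language tags to langsys tags.
LANGSYS = {"de": "DEU", "grc": "PGR"}


def otl_tags(run: Run, doc_lang: str, use_doc_lang: bool) -> str:
    """Return 'script/langsys' for a run.

    use_doc_lang=False: the langsys follows the run's own language.
    use_doc_lang=True:  the langsys follows the enclosing document's
                        language, with the script still taken from the run.
    """
    lang = doc_lang if use_doc_lang else run.lang
    return f"{run.script}/{LANGSYS.get(lang, 'dflt')}"


greek = Run(text="ἄξιοι", script="grek", lang="grc")

per_run = otl_tags(greek, doc_lang="de", use_doc_lang=False)  # grek/PGR
per_doc = otl_tags(greek, doc_lang="de", use_doc_lang=True)   # grek/DEU
```

The only difference between the two calls is which language feeds the lookup, yet a font would be asked for different lookups in each case.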

Btw, given that understanding, it's conceivable that some situations might require a distinction on the _script_ tag. E.g., if German classicists used different Greek typography than would be used in German documents referring to Modern Greek, then you'd want OTL tags something like grek/DEU vs. grec/DEU.


All in all, probably the only way app vendors could actually make langsys tags work would be to give users direct control over the text metadata, as CSS has done with font-language-override. But I don't think that's a viable option for many apps.



Peter Constable

-----Original Message-----
From: mpeg-otspec <mpeg-otspec-bounces at lists.aau.at> On Behalf Of Simon Cozens
Sent: Wednesday, December 27, 2023 11:09 AM
To: mpeg-otspec at lists.aau.at
Subject: [EXTERNAL] Re: [MPEG-OTSPEC] Shared GSUB/GPOS notes, was Re: dmap proposal

On 27/12/2023 16:55, John Hudson wrote:
> Do we? The langsys mechanism in OTL has proven to be unreliable for
> coming up on thirty years,

I'm uncomfortable with this wording and this line of argument. The langsys mechanism in OTL works *fine*; application vendor implementations do not.

The reason why the distinction is important is that it guides how we should respond; what I am uncomfortable about is making decisions about the spec based on the behaviour of people who fail to implement it.
Telling those same vendors "because you didn't implement X correctly, we've now added Y and need you to implement that" will almost certainly lead to Y not being properly implemented either.


S


