[MPEG-OTSPEC] [EXTERNAL] Re: Does OpenType have a one to many GSUB-like feature please?

Peter Constable pconstable at microsoft.com
Fri Dec 23 17:46:55 CET 2022

John hit a key point:

> GSUB operates at the glyph ID number level…

A GSUB type 2 lookup can replace a single glyph with a sequence of glyphs, but after the initial character-to-glyph mapping (from the cmap table), you’re dealing with glyph IDs. And glyph IDs are font-specific, not something that can interoperate outside the font. Text-to-speech systems currently don’t interact with the glyph processing, but even if they did they’d have no basis to interpret the glyph IDs.
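To make the point concrete, here is a minimal sketch in Python, not real font data: the two cmap dictionaries, the GSUB rule and every glyph ID in it are invented for illustration, and `shape` is a hypothetical helper rather than any shaping engine's API. It shows that the same emoji character ends up as unrelated glyph ID sequences in two fonts, so the numbers carry no meaning outside the font that defines them.

```python
# Hypothetical cmap tables for two imaginary fonts (glyph IDs are invented).
FONT_A_CMAP = {0x1F600: 412, 0x41: 36}
FONT_B_CMAP = {0x1F600: 1093, 0x41: 3}

# Hypothetical GSUB type 2 (multiple substitution) rule in font A:
# one glyph ID in, a sequence of glyph IDs out.
FONT_A_GSUB_TYPE2 = {412: [880, 881, 882]}


def shape(text, cmap, multiple_subst):
    """Map characters to glyph IDs, then apply one-to-many substitutions."""
    # Step 1: character-to-glyph mapping; 0 stands for the .notdef glyph.
    glyphs = [cmap.get(ord(ch), 0) for ch in text]
    # Step 2: from here on only glyph IDs exist; code points are gone.
    out = []
    for gid in glyphs:
        out.extend(multiple_subst.get(gid, [gid]))
    return out


glyphs_a = shape("\U0001F600", FONT_A_CMAP, FONT_A_GSUB_TYPE2)
glyphs_b = shape("\U0001F600", FONT_B_CMAP, {})
```

Both results come from the same character, U+1F600, yet nothing in either list of integers tells a downstream consumer that, or what the glyphs depict.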

Instead of an approach using font-internal details, a better approach would be for text-to-speech systems to recognize the emoji characters and then use dictionaries of emoji keyword data, which Unicode CLDR provides, to select a word or words to render as speech.
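A rough sketch of that character-level approach, in Python: the `EMOJI_SPEECH_NAMES` dictionary is a tiny hand-written stand-in for CLDR annotation data (a real system would load the per-locale CLDR annotation files), and `to_speakable` is a hypothetical helper name, not part of any text-to-speech API. It also assumes single-code-point emoji only; multi-code-point sequences would need proper segmentation first.

```python
import unicodedata

# Stand-in for CLDR emoji annotation data (illustrative subset only).
EMOJI_SPEECH_NAMES = {
    "\U0001F600": "grinning face",
    "\U0001F389": "party popper",
}


def to_speakable(text, names=EMOJI_SPEECH_NAMES):
    """Replace emoji and other symbols with words before speech synthesis."""
    parts = []
    for ch in text:
        if ch in names:
            # Preferred path: a name from the (stand-in) CLDR dictionary.
            parts.append(f" {names[ch]} ")
        elif unicodedata.category(ch) == "So":
            # Fallback for other symbols: the Unicode character name.
            parts.append(f" {unicodedata.name(ch, 'symbol').lower()} ")
        else:
            parts.append(ch)
    # Collapse the extra whitespace introduced around replaced characters.
    return " ".join("".join(parts).split())


spoken = to_speakable("Well done \U0001F389")
```

The lookup is keyed on characters, so it works the same whichever font, if any, is used to display the text.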


From: mpeg-otspec <mpeg-otspec-bounces at lists.aau.at> on behalf of John Hudson <john at tiro.ca>
Date: Friday, December 23, 2022 at 9:09 AM
To: mpeg-otspec at lists.aau.at <mpeg-otspec at lists.aau.at>
Subject: [EXTERNAL] Re: [MPEG-OTSPEC] Does OpenType have a one to many GSUB-like feature please?
Yes, OTL GSUB has a one-to-many lookup type (GSUB lookup type 2), but I don’t think this is relevant to the issue of emoji in text-to-speech systems.

> I am wondering if a font could have a table where the codepoint (or postscript name) of an emoji is substituted by the codepoints of a sequence of regular text characters, with the output then routed to a text to speech system.

GSUB operates at the glyph ID number level, not at the codepoint level. Text-to-speech and text-to-glyphs can be seen as parallel but separate presentation technologies, both sitting on top of Unicode encoded text, but presenting that text in different ways. Text-to-speech systems rely on natural language processing, and the reason they become clunky when dealing with emoji—as with other symbols and non-linguistic content—is that such characters do not participate in linguistic communication, so at best have to be named or described. But what is being named or described is still at the character level, not at the glyph level.



John Hudson

Tiro Typeworks Ltd    www.tiro.com

Tiro Typeworks is physically located on islands
in the Salish Sea, on the traditional territory
of the Snuneymuxw and Penelakut First Nations.



In the interests of productivity, I am only dealing
with email towards the end of the day, typically
between 4PM and 5PM. If you need to contact me more
urgently, please use other means.