[MPEG-OTSPEC] New AHG mandates and other news!

Vladimir Levantovsky vladimir.levantovsky at gmail.com
Thu May 13 00:21:35 CEST 2021


Dear William,

 

Standards serve very specific and particular purposes. Among the many standards, some are developed to ensure connectivity and interoperability between objects:

-          nuts and bolts by standardizing threads; 

-          plugs and outlets, to make sure that a proper electrical connection can be established where appropriate (and to prevent wrong connections from happening, for safety reasons);

-          signals and protocols, to allow different modules to communicate, etc. – these are just a few of many possible examples.

 

Each standard has a well-defined scope and purpose. Often, multiple standards are needed to enable a specific function. For example, to establish a USB connection between devices, one needs to standardize the shapes and sizes of USB plugs and outlets, develop standards for the electrical signals that travel over the USB cable, and develop the protocols that connected devices use to convert electrical signals into bits of data … the list of examples can go on and on, but the point I am trying to make is that it would be odd [and inappropriate, IMO] to demand that the standards defining USB plugs and outlets also specify algorithms for data encoding, even though that data will eventually travel as electrical signals through a USB cable connected via a standardized USB plug.

 

Similarly, the standards and objects we work with have limited, well-defined scope:

 

Fonts provide a comprehensive and diverse dataset including [but not limited to] glyphs _as units of text display_ (defined by their vector outlines or bitmap images), glyph metrics, character-to-glyph mapping, contextual data for glyph substitution and positioning … everything that may be needed to *visualize* text. The font standard specifies how these datasets are encoded and how to interpret the data, enabling different devices to use fonts regardless of the tools used to display text (whether on screen or in print).
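To make the role of the character-to-glyph mapping concrete, here is a deliberately simplified sketch: the glyph IDs and code points below are invented for illustration, and a real OpenType 'cmap' subtable is a binary structure defined by the specification, not a Python dictionary.

```python
# Toy sketch of what a font's character-to-glyph mapping ('cmap') provides:
# a lookup from Unicode code points to glyph identifiers. All values here
# are invented for illustration; real cmap data is defined by the OpenType spec.
cmap = {
    0x0041: 1,   # 'A' -> glyph ID 1
    0x0042: 2,   # 'B' -> glyph ID 2
    0x00E9: 37,  # 'é' -> glyph ID 37
}

def glyphs_for(text):
    """Map each character of a string to a glyph ID (0 = the .notdef glyph)."""
    return [cmap.get(ord(ch), 0) for ch in text]

print(glyphs_for("AB"))  # [1, 2]
print(glyphs_for("AZ"))  # [1, 0] -- 'Z' is unsupported, falls back to .notdef
```

The fallback to glyph 0 mirrors how a real renderer substitutes the .notdef glyph when a chosen font does not cover a character.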

 

The Unicode Standard defines a unified encoding of characters _as units of text content_, making text strings machine-readable and searchable, but _it does not define the meaning of words_ composed from standardized Unicode code points, nor does it define how the encoded characters are to be visualized (this is what fonts are for). Because of this separation of scope, it becomes possible to select different fonts to visualize an encoded text string as a sequence of characters – as long as a chosen font supports the particular character set, the result of text rendering will be human-readable. This separation also enables things like search: one can search for a sequence of characters across many different documents and be guaranteed to find it, regardless of the meaning of that particular character sequence or of the fonts used to display the text.
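The point that search operates on code points alone, independent of font or meaning, can be sketched in a few lines (the documents are invented for illustration):

```python
# Toy sketch: text search compares Unicode code-point sequences only.
# Which font later renders the text, or what the words mean, never enters
# into the comparison. The corpus below is invented for illustration.
docs = {
    "doc1": "The font renders glyphs.",
    "doc2": "Unicode encodes characters.",
    "doc3": "Glyphs visualize text; the font standard defines them.",
}

def search(needle, corpus):
    """Return the names of documents whose code-point sequence contains `needle`."""
    return sorted(name for name, text in corpus.items() if needle in text)

print(search("font", docs))     # ['doc1', 'doc3']
print(search("Unicode", docs))  # ['doc2']
```

The same search would succeed no matter what font eventually displays those documents, which is exactly the guarantee the separation of scope provides.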

 

Character sequences become words, words become sentences … all of these constructs may carry different semantic meanings that are, generally, completely outside the scope of the standards that define character encoding (i.e., Unicode), and even further outside the scope of the font standard, which defines what glyphs look like and how they should be arranged to visualize text (characters, words, and sentences, regardless of their meaning). Having a particular text string with a specific meaning encoded as part of the font data (e.g., a sequence of characters that identifies this particular font name, or a font vendor name) may be necessary and justified, but it does not follow that other, completely arbitrary character strings should be encoded as part of the font data.

 

> So I keep trying to get my proposal for localizable sentences considered by Unicode Inc.

 

I hope my earlier explanation of the limits of scope for different standards helps explain why localizable sentences would be considered out of scope for Unicode. Unicode deals with character encoding, regardless of the semantic meaning of the words and sentences composed from those characters. The semantic meaning of words belongs to a different level; the semantic meaning of sentences composed from different words (and how to preserve a particular meaning when translating those sentences from one language to another) sits at yet another, much higher application level. The task of standardizing a specific, pre-defined set of localizable sentences and assigning them specific codes (defined as sequences of characters) requires a very different kind of standardization – one that would *use* Unicode to specify sequences of characters, but without making them part of the Unicode Standard itself.

 

> I am not using this list as a back door. There has been a call for ideas and I have put one forward.

 

I appreciate your contributions, but, as I mentioned earlier, we develop a standard with a strictly defined scope. For the same reason that your proposal is deemed out of scope by Unicode, it is one step further out of scope for the font standard, which deals with the visualization of text, *not* with text encoding or its meaning.

 

> So I am in favour of having the 'text' table and Peter is not, so that is 1 vote for and 1 vote against at this time.

 

It’s not just about how many votes for or against the proposal we get. We are developing a standard with a predefined scope; this work is conducted as part of an ISO Working Group that develops standards to serve the specific needs of the industry [and that are limited in scope], and this Working Group is established by an ISO Subcommittee with a specific set of mandates. Even if we were all in favor of a new idea for standardization because we considered it useful, that would not mean we could pursue it if it is deemed out of scope for this particular group or subcommittee. I personally do think that your idea is far out of scope for the font standard.

 

> My idea is that the message list will be an international standard and that localization will take place automatically in the receiving device when a language-independent encoded message is received, using a decoding list local to the recipient.

 

I am not criticizing your idea – it may well be worthy of being developed into an international standard – but I am asserting my opinion that it lies far outside the scope of _this_ particular international standard to which you are proposing it. I hope my earlier explanations help explain why this is the case.

 

Thank you,

Vladimir

 

 

From: mpeg-otspec [mailto:mpeg-otspec-bounces at lists.aau.at] On Behalf Of William_J_G Overington
Sent: Tuesday, May 11, 2021 2:43 PM
To: 'MPEG OT Spec list' <mpeg-otspec at lists.aau.at>
Subject: Re: [MPEG-OTSPEC] New AHG mandates and other news!

 

> Your original idea of localizable sentences, as I recall, involved assigning Unicode code points to particular semantic propositions, or “sentences”.

 

Yes, that was the original idea, back in 2009.

 

Research has continued and developed. There are several possible encodings in the research, all involving sequences: two are markup; of the others, one involves the exclamation mark and ordinary digits, and the other involves an integral sign and circled digits – harder to use when writing a message, but more robust.

 

The third possible encoding needs a regular Unicode / ISO/IEC 10646 encoding, but would be unambiguous, highly robust, and clearly free of concerns about proprietary rights. Yet it needs agreement from the Unicode Inc. and ISO/IEC 10646 committees.

 

> Unicode has stated clearly it is not interested in pursuing that idea and banned further discussion of that idea from its email lists.

 

Actually no. A fictional character with the email address root at unicode.org banned discussion. It was not a statement by a named officer of Unicode Inc. acting officially, so its validity is highly questionable. If Unicode Inc. wishes to ban discussion of localizable sentence technology, then it could officially state that, but Unicode Inc. has not done so. No notice of disapproval for encoding localizable sentences has been made.

 

Rather, the banning by a fictional character is like a Unicode version of The Luxembourg Compromise.

 

The fictional character did not state any reason why localizable sentences are unsuitable for encoding.

 

https://en.wikipedia.org/wiki/Luxembourg_compromise

 

I have not been given a fair opportunity to state my case and have it debated.

 

QID emoji has been treated as a serious proposal and a Public Review has taken place.

 

My proposal for localizable sentences being encoded is far more robust, and, I opine, should be treated seriously and assessed properly on a "sauce for pasta is sauce for rice" basis.

 

So there is nothing OFFICIAL about localizable sentences from Unicode Inc. of which I am aware.

 

So I keep trying to get my proposal for localizable sentences considered by Unicode Inc.

 

> I don’t think you should be trying to use this list as a back door to revisit the same idea.

 

I am not using this list as a back door. There was a call for ideas and I have put one forward. From what you now write, it appears that the 'name' table will not do what I am proposing; hence what, for purposes of discussion, can be called the 'text' table, since, as far as I am aware, that name is not already in use for an OpenType table.

 

Also, I am entitled to try to get my invention implemented.

 

So I am in favour of having the 'text' table and Peter is not, so that is 1 vote for and 1 vote against at this time.

 

So the proposal goes forward and hopefully other people will express a view and a consensus will emerge.

 

> Again, there’s an unstated premise of this idea that the font will get transported with the message.

 

No, there is no such premise.

 

There is, as far as I am aware, no premise or presumption when sending any email message that a font will be transported with the message.

 

My idea is that the message list will be an international standard and that localization will take place automatically in the receiving device when a language-independent encoded message is received, using a decoding list local to the recipient.
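As a minimal sketch of the mechanism described here (all codes, locales, and sentences below are invented for illustration; this is not a proposed wire format):

```python
# Hypothetical sketch: the sender transmits a language-independent message
# code; the receiving device localizes it using a decoding list local to
# the recipient. Every code, locale, and sentence here is invented.
DECODING_LISTS = {
    "en": {101: "The meeting is cancelled.", 102: "Please call me."},
    "de": {101: "Das Treffen ist abgesagt.", 102: "Bitte rufen Sie mich an."},
}

def decode(code, locale):
    """Localize a message code using the recipient's decoding list.

    Falls back to English when the locale is unknown, and to a visible
    placeholder when the code is not in the standardized list.
    """
    sentences = DECODING_LISTS.get(locale, DECODING_LISTS["en"])
    return sentences.get(code, f"[unknown message {code}]")

print(decode(101, "de"))  # Das Treffen ist abgesagt.
print(decode(102, "en"))  # Please call me.
```

The sketch illustrates why no font needs to travel with the message: only the code is transmitted, and all localization data already resides on the receiving device.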

 

I have recently decided that all localizable sentences that are encoded shall have a language-independent glyph - at one time I considered that glyphs were not always needed, but I have since changed my mind on this as my research has proceeded.

 

I have replied to the comments made. The 'text' table would have far wider application than just localizable sentences.

 

William Overington

 

Tuesday 11 May 2021

 

 

