FW: Feedback on CFR

Levantovsky, Vladimir vladimir.levantovsky at monotypeimaging.com
Tue Jan 18 06:21:04 CET 2011

Forwarding the email from Behdad to the AHG list.

-----Original Message-----
From: Behdad Esfahbod [mailto:behdad.esfahbod at gmail.com] On Behalf Of Behdad Esfahbod
Sent: Friday, January 14, 2011 6:26 PM
To: Levantovsky, Vladimir
Subject: Feedback on CFR

Hi Vlad,

Here is feedback from two people at Google, although I guess you can count
them as individual expert feedback since Google is not officially part of the
WG right now.



Feedback from Mark Davis:

From a quick review, I see the following issues. I see other ones, but just
ran out of steam. I can see the rationale for defining a standard Component
font structure, but this would need considerable revision before it was workable.

> and an optional character mapping (cmap) that defines how Unicode characters
> map to glyphs in the component font

If the components are treated as atomic, you wouldn't want access to their
glyphs; just to characters. That is, logically, the component font would tell
you which font to use for which range(s) of Unicode characters, optionally
with a transform of the metrics.

Ideally, you'd want a somewhat more elaborate structure for determining
the ranges, because you can end up with a situation ABCdefGHI, where ABC
should definitely be in Font1, and GHI should definitely be in Font2, but
where def could be in either one, and you want a more interesting algorithm to
determine that (such as handling matching braces, etc.)
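
To illustrate the ABCdefGHI case, here is a minimal Python sketch of one
possible resolution rule. It is an assumption for illustration, not
something the CFR draft defines: characters covered by exactly one font are
pinned to it, and ambiguous characters inherit the font of the nearest
pinned neighbor (preceding first, then following). The function name
assign_fonts and the coverage tables are hypothetical, and brace matching
is not handled.

```python
from typing import Dict, List, Optional, Set


def assign_fonts(text: str, coverage: Dict[str, Set[str]]) -> List[Optional[str]]:
    """Assign a font name to each character of text.

    coverage maps a (hypothetical) font name to the characters it supports.
    """
    # Which fonts could render each character?
    candidates = [[name for name, chars in coverage.items() if ch in chars]
                  for ch in text]
    # Pin characters that only one font can render.
    fonts: List[Optional[str]] = [c[0] if len(c) == 1 else None for c in candidates]

    # Forward pass: ambiguous characters follow the preceding font if it covers them.
    last = None
    for i, cands in enumerate(candidates):
        if fonts[i] is not None:
            last = fonts[i]
        elif last in cands:
            fonts[i] = last

    # Backward pass: anything still open follows the next resolved font.
    nxt = None
    for i in range(len(text) - 1, -1, -1):
        if fonts[i] is not None:
            nxt = fonts[i]
        elif nxt in candidates[i]:
            fonts[i] = nxt

    # Last resort: take the first declared candidate, if there is one.
    return [f if f is not None else (c[0] if c else None)
            for f, c in zip(fonts, candidates)]


coverage = {"Font1": set("ABCdef"), "Font2": set("defGHI")}
print(assign_fonts("ABCdefGHI", coverage))
```

With these tables, def is attached to Font1 because the preceding run ABC
is pinned to it; for the input defGHI the same characters would go to Font2.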

> language = "string"
> Optional. The two-letter ISO 639 code that corresponds to the language of
> the 'string' attribute.

This appears in 2 places. It should be the BCP47 language tag...
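
As a rough illustration of why a full BCP47 tag is more useful than a
two-letter code, the Python sketch below matches a requested tag against a
table of language-preferred components by progressively dropping trailing
subtags. This is a simplified form of the lookup scheme in RFC 4647 (it
does not treat single-letter extension subtags specially), and the function
name and component names are hypothetical.

```python
from typing import Dict, Optional


def lookup_language(tag: str, preferred: Dict[str, str]) -> Optional[str]:
    """Return the component registered for the longest matching tag prefix."""
    # Language tags compare case-insensitively.
    table = {key.lower(): value for key, value in preferred.items()}
    subtags = tag.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in table:
            return table[candidate]
        subtags.pop()  # drop the last subtag and retry with a shorter prefix
    return None


preferred = {
    "zh-Hant": "TraditionalComponent",  # hypothetical component names
    "zh": "SimplifiedComponent",
    "ja": "JapaneseComponent",
}
print(lookup_language("zh-Hant-TW", preferred))
print(lookup_language("zh-CN", preferred))
```

A two-letter code alone cannot distinguish the first two entries, which is
the case where script subtags matter.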

> Required. A series of Unicode code points or code point ranges that are
> specified according to ICU's UnicodeSet pattern syntax. As an example, the
> ranges U+0020 through U+007E and U+4E00 through U+9FCC can thus be expressed
> as [[\u0020-\u007E]|[\u4E00-\u9FCC]].

Syntax is wrong, and a better example would be something like
More importantly, the spec doesn't discuss what to do when two different
components have intersecting UnicodeSets (or contents).
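
To make the intersection problem concrete, here is a minimal Python sketch
that reports where the code point ranges of two components overlap. It
works on plain inclusive (start, end) pairs rather than parsing UnicodeSet
patterns, and the component names and ranges are invented for
illustration. What a conforming implementation should do with the reported
overlaps is exactly the question the spec leaves open.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Range = Tuple[int, int]  # inclusive (first, last) code points


def find_overlaps(components: Dict[str, List[Range]]) -> List[Tuple[str, str, Range]]:
    """Return (component_a, component_b, shared_range) for every overlap."""
    found = []
    for (name_a, ranges_a), (name_b, ranges_b) in combinations(components.items(), 2):
        for lo_a, hi_a in ranges_a:
            for lo_b, hi_b in ranges_b:
                # Two inclusive ranges intersect when the larger start
                # does not exceed the smaller end.
                lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
                if lo <= hi:
                    found.append((name_a, name_b, (lo, hi)))
    return found


components = {
    "Latin": [(0x0020, 0x007E)],
    "CJK": [(0x4E00, 0x9FCC)],
    "Punct": [(0x0021, 0x002F), (0x3000, 0x303F)],
}
for a, b, (lo, hi) in find_overlaps(components):
    print(f"{a} and {b} both claim U+{lo:04X}..U+{hi:04X}")
```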


Feedback from Doug Felt:

- 'prefered' is not the preferred spelling of 'preferred'
- The name and metrics fields are essentially taken from parts of the head,
  OS/2, hhea, and vhea tables in the OpenType spec.  This spec must be
  referenced.
- Language is defined as using ISO 639 codes; this should use BCP47 codes.
  In particular, it should not be restricted to two letters, and should
  accept script tags.
- LanguagePreferedList (sic) is described as containing two or more
  LanguagePreferredComponentDef instances; this should be one or more.
- LanguagePreferredComponentDef should have language as a required
  attribute, not as an element.
- It is not clear to me why ComponentDef needs to allow more than one
  UnicodeCharSet element.
- I don't see the need for the ToUnicode element's fromEncoding attribute.
  Composite fonts should not need to support components that require custom
  encoding behavior.
- The examples section is not normative, yet that is the only place where
  the LanguagePreferredComponentDef usage is described.  There must be a
  normative section detailing how language and Unicode character sequence
  are used together with the font spec to select a font and glyph id(s).
- The phrase 'unicode code point sequence' is used in conjunction with
  UnicodeSet.  I'm assuming these are sequences of characters or UnicodeSet
  expressions in a single string.  Since UnicodeSets are delimited by '['
  and ']', is there a way to specify these characters without defining a
  UnicodeSet?  Or must they be defined within a UnicodeSet?
- UnicodeSet allows strings as elements.  Are these allowed?
- UnicodeSet allows the use of Unicode properties, which in turn depend on
  the version of Unicode.  Are these allowed?  Does the platform define the
  version of Unicode used?
- ToUnicode maps from 'sequences' of codepoints to 'sequences' of
  codepoints, yet the term 'sequence' is a bit ambiguous here.  Are these
  sequences, or ordered sets?  If sequences, and UnicodeSet notation is
  used, are these patterns and not strict sequences?  And if ordered sets,
  do the from and to values need to be the same length?  If not, what
  happens if they are not?  (One can imagine, for instance, using this to
  map multiple codepoints onto a single replacement codepoint, but it's not
  clear if that use is intended.)
- I'm assuming the cmap is definitive as to what characters are presumed
  supported by the font.  So for example, if the actual composite font does
  not support the character in its cmap (even though specified in the
  ComponentDef), this font is still resolved as far as font lookup through
  the composite is concerned (and the result will be that component's
  missing glyph).  An explicit statement would be useful.
- In general it's desirable to choose glyphs from the same font whenever
  possible.  Composite fonts can get in the way of this process.  For
  example, combining marks (U+0300 block) should come from the font the base
  character comes from, and not be overly specified by the composite
  character mapping table.  Enclosing punctuation is often best obtained
  from the font that the surrounding characters come from as well, and of
  course paired punctuation should always come from the same font.  It's
  not clear how these issues should be dealt with by people employing
  composite
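
One way to picture the combining-mark concern is the Python sketch below.
It is an assumption for illustration only, not behavior the CFR draft
specifies: a mark (Unicode general category M*) reuses the font chosen for
its base character whenever that font covers the mark, and only otherwise
falls back to the composite's own mapping. The function name and coverage
tables are hypothetical, and enclosing or paired punctuation is not
handled.

```python
import unicodedata
from typing import Dict, List, Optional, Set


def select_fonts(text: str, coverage: Dict[str, Set[int]]) -> List[Optional[str]]:
    """Pick a component font per character, keeping marks with their base."""
    result: List[Optional[str]] = []
    base_font: Optional[str] = None
    for ch in text:
        cp = ord(ch)
        is_mark = unicodedata.category(ch).startswith("M")
        # Prefer the base character's font for a combining mark.
        if is_mark and base_font is not None and cp in coverage[base_font]:
            result.append(base_font)
            continue
        # Otherwise use the first component (in declaration order) that covers it.
        font = next((name for name, cps in coverage.items() if cp in cps), None)
        result.append(font)
        if not is_mark:
            base_font = font
    return result


coverage = {
    "Marks": {0x0301},                              # listed first on purpose
    "Latin": set(range(0x0020, 0x007F)) | {0x0301},
}
print(select_fonts("e\u0301", coverage))
```

Without the base-font rule, the acute accent here would be taken from the
"Marks" component simply because it is declared first.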

