[OpenType] ISO/IEC 14496-22 Amendment for Font Collections

Wed Apr 8 16:42:22 CEST 2015

On Tuesday, April 07, 2015 5:36 PM Richard Wordingham wrote:

On Tue, 7 Apr 2015 13:45:39 +0000
"Levantovsky, Vladimir" <Vladimir.Levantovsky at monotype.com> wrote:
> [VL]
> Well, I am no security expert but if you allow overlapping tables one 
> can easily doctor the font file content (e.g. to modify table lengths 
> / offsets in a table directory to create an overlap and use it as an 
> exploit to load malicious data). SFNT structure isn't bullet-proof 
> and anyone can easily add arbitrary data to a font file, but at least 
> you are protected to an extent because a custom table data won't be 
> loaded by a font rasterizer. If table overlaps are allowed, a doctored 
> font data can be used to force loading of malicious data.

One can doctor the font file content without creating overlapping tables.  GSUB, GPOS, name and cmap tables can all contain large chunks of unreferenced data.  So also can the glyf table, but such chunks are easier to notice.  Many users would not notice if a kern table were removed, or severely truncated, as a way of modifying an existing font file without changing its size.

Note that I am not talking of overlapping tables *within* fonts, but
*between* fonts.

[VL] I think we are talking about very different things using the same terms. Overlapping tables, in my interpretation, is when, within the same font, one table ends _after_ another table begins (i.e., when offset_n + length_n > offset_(n+1)). You are talking about overlapping tables between fonts (I am not really sure what that means, though), but if you are thinking about different fonts in a collection reusing parts of a table's data, I call that table sharing. For example, a 'glyf' table can have many glyphs, and each font in a collection may use only a subset of what is there. One 'glyf' table would in essence represent a union of different font subsets, but it is still one single table that doesn’t overlap with other tables.
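
To make the within-font overlap condition above concrete, here is a minimal sketch in Python. It assumes the table directory of one font has already been parsed into (tag, offset, length) entries; the `TableRecord` type and the `find_overlaps` name are made up for illustration and do not come from any spec or library.

```python
from typing import List, Tuple

# Hypothetical parsed table directory entry: (tag, offset, length).
TableRecord = Tuple[str, int, int]


def find_overlaps(records: List[TableRecord]) -> List[Tuple[str, str]]:
    """Return (tag_n, tag_n+1) pairs where offset_n + length_n > offset_(n+1)."""
    # Sort by offset so that "next table" means the next one in the file.
    ordered = sorted(records, key=lambda r: r[1])
    overlaps = []
    for (tag_a, off_a, len_a), (tag_b, off_b, _) in zip(ordered, ordered[1:]):
        if off_a + len_a > off_b:
            overlaps.append((tag_a, tag_b))
    return overlaps


if __name__ == "__main__":
    directory = [("cmap", 100, 50), ("glyf", 140, 200), ("loca", 340, 20)]
    print(find_overlaps(directory))
```

Only neighbouring tables are compared, which is enough to tell whether any overlap exists. Note that two fonts in a collection pointing at the very same (offset, length) would be table sharing in the sense above, so a collection-wide check would have to de-duplicate identical entries first.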

> Moreover, WOFF2 prohibits fonts' tables overlapping without 
> coinciding.  (A WOFF contains only one font.)  That strongly limits 
> schemes that save space in uncompressed font collections.
> 
> [VL]
> Can you please elaborate on such a scheme?

Some Tai Tham fonts accommodate printing constraints by limiting lines of text to three rows - one for base characters, one for marks above and one for marks below.  This includes handwritten camera-ready copy. I've noticed that in Northern Thailand there is little if any difference between the restrained glyphs used for these styles and the less restrained fonts where stacks are not constrained vertically. I am considering creating a variant of my unconstrained font which limits itself to three rows.

Now, I might declare that the unconstrained form was for Northern Thai and the constrained form was for Tai Khün.  Most of the GSUB and GPOS lookups would be the same, but there would be some lookups that differed.  I would simply select different lookups for different languages.  However, this assignment to languages would be dishonest.

Microsoft has very little support of typographical features for complex scripts.  Possibly I could exploit control of the calt and clig features.  All this assumes that the Universal Shaping Engine is modified to support Tai Tham, which Microsoft originally expected it to support.

It occurred to me that I could achieve similar compactness and consistency by generating a font collection instead.  I would organise a large GSUB table as follows:

GSUB_1_start:
  Offset from GSUB_1_start to script list for font 1
  Offset from GSUB_1_start to common feature list
  Offset from GSUB_1_start to common lookup list
  Script list for font 1
GSUB_2_start:
  Offset from GSUB_2_start to script list for font 2
  Offset from GSUB_2_start to common feature list
  Offset from GSUB_2_start to common lookup list
  Script list for font 2
  Common feature list
  Language system tables, lookup list, tables and subsidiary data

The two fonts would reference different sets of feature tables.
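
A rough sketch of how the offsets in such a nested layout could be computed, in Python. This is only an illustration under stated assumptions, not a font compiler: the script, feature and lookup lists are treated as opaque byte strings, the header is taken to be the GSUB version 1.0 header (two uint16 version fields followed by three 16-bit offsets), and 4-byte alignment of table starts is ignored. The function name `layout_shared_gsub` is hypothetical.

```python
import struct

HEADER_SIZE = 10  # assumed GSUB 1.0 header: major, minor, then three Offset16 fields


def layout_shared_gsub(script_list_1: bytes, script_list_2: bytes,
                       feature_list: bytes, lookup_list: bytes):
    """Return (blob, gsub_1_start, gsub_2_start) for two nested GSUB headers."""
    gsub_1_start = 0
    gsub_2_start = HEADER_SIZE + len(script_list_1)
    # The common lists sit after font 2's script list, as in the outline above.
    feature_pos = gsub_2_start + HEADER_SIZE + len(script_list_2)
    lookup_pos = feature_pos + len(feature_list)

    def header(start: int) -> bytes:
        # Every offset is measured from this header's own start.
        # struct.pack raises struct.error if an offset does not fit in 16 bits.
        return struct.pack(">HHHHH", 1, 0, HEADER_SIZE,
                           feature_pos - start, lookup_pos - start)

    blob = (header(gsub_1_start) + script_list_1 +
            header(gsub_2_start) + script_list_2 +
            feature_list + lookup_list)
    return blob, gsub_1_start, gsub_2_start


if __name__ == "__main__":
    blob, s1, s2 = layout_shared_gsub(b"A" * 6, b"B" * 8, b"F" * 4, b"L" * 2)
    print(s1, s2, struct.unpack_from(">HHHHH", blob, s1), struct.unpack_from(">HHHHH", blob, s2))
```

In a collection, font 1's table directory would then point at gsub_1_start with a length running to the end of the blob, and font 2's at gsub_2_start, so the second table lies wholly inside the first.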

I have not started writing the code to generate such a font collection.  Nesting tables did look as though it would be difficult to control at font compilation time.  Conditional compilation directives are probably an easier way to go, with a font collection being stitched together from individual font files.

[VL] Ok, thank you for the explanation but IMO this has nothing to do with overlapping tables within a file (as explained earlier).

> WOFF2 also prohibits fonts sharing glyf but not loca (admittedly a 
> perverse combination unless the last glyph in one font contains more 
> than 3 bytes of ignored junk) and prohibits fonts from sharing loca 
> but not glyf (a perverse combination or probably requiring the 
> tweaking of the sizes of the glyph definitions).  This smacks of the 
> authors being exhausted.
> 
> [VL]
> Where did you find this info? Why do you think WOFF2 has anything to 
> do with prohibiting table sharing ('glyf' /'loca' or any other table 
> for that matter)?

WOFF File format 2.0 W3C Editors Draft 2 April 2015
(http://dev.w3.org/webfonts/WOFF2/spec/) Section 4.2 Paragraph 6 (ignoring tables):

"Sharing of glyf and loca tables is allowed and encouraged (this is one of the major benefits of a font collection); however, it is possible that font collections may have two or more pairs of the glyf / loca tables that may not be shared. When the tables are shared, an encoder MUST verify that both tables are shared and that both form an associated pair (if more than one pair of glyf / loca tables are present) and MUST reject a collection containing fonts that share only one of either glyf or loca table."

[VL] The paragraph you refer to and the procedure outlined there is a simple security measure. 

As you know, the 'loca' table translates glyph indices (a logical reference to a glyph record) into glyph record offsets (a physical reference that is "hard-coded" to a particular 'glyf' table). If a font collection file contains multiple pairs of glyf/loca tables (which is unusual in a typical TTC scenario but possible considering the larger scope of an arbitrary font collection), the association between the tables of each glyf/loca pair must be preserved so that a font rasterizer is able to find and read glyph records. Mixing the wrong 'glyf' and 'loca' tables would have catastrophic consequences, and fonts in a font collection cannot _choose_ to share one table of a pair but not the other. A font doesn’t have to use all of the glyphs (using only a subset from a shared 'glyf' table is fine, as we discussed earlier), but when using a shared 'glyf' table a font has to share the associated 'loca' table to find the glyph records it needs. The subset in use by each font in a collection will be determined by the 'cmap' table that translates codepoints to glyph indices.
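
To illustrate why the two tables have to travel together, here is a small Python sketch of the lookup a rasterizer performs. It assumes the raw 'glyf' and 'loca' bytes and the indexToLocFormat value from 'head' are already in hand (0 for the short format, which stores offset / 2 in 16 bits; 1 for the long format with 32-bit offsets). The `glyph_record` function is a made-up name for this example.

```python
import struct


def glyph_record(glyf: bytes, loca: bytes, glyph_id: int,
                 index_to_loc_format: int) -> bytes:
    """Return the raw glyph record for glyph_id, using loca to index into glyf."""
    if index_to_loc_format == 0:
        # Short format: uint16 entries holding the actual offset divided by 2.
        start, end = struct.unpack_from(">HH", loca, 2 * glyph_id)
        start, end = 2 * start, 2 * end
    else:
        # Long format: uint32 entries holding the actual offset.
        start, end = struct.unpack_from(">II", loca, 4 * glyph_id)
    # With a mismatched pair, the offsets point at the wrong place or past the end.
    if not (start <= end <= len(glyf)):
        raise ValueError("loca entry does not fit this glyf table")
    return glyf[start:end]


if __name__ == "__main__":
    glyf = b"AAAABBBBBB"                      # two fake glyph records
    loca = struct.pack(">III", 0, 4, 10)      # long format, n + 1 entries
    print(glyph_record(glyf, loca, 1, 1))
```

The range check only catches the grossest mismatches; a 'loca' from another pair whose offsets happen to fall inside the 'glyf' table would silently return garbage, which is the reason the pairing itself has to be verified.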

When dealing with a font resource downloaded from a web server (and, hence, dealing with the built-in risks associated with possible malicious intent), WOFF2 decoders will attempt to analyze font structures and will invalidate anything that has wrong data: fonts with overlapping tables, fonts that share wrong glyf/loca pairs or share one table but not the other, etc. Properly constructed font files (assuming they were built using tools that do things the right way) will not be affected, and this will surely have no effect on authors who produce font collections using those tools. I don't see any reason to be concerned about it.
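
As a sketch of the kind of check described above (not the actual WOFF2 encoder or decoder logic, which I am not reproducing here), the "share both or neither" rule could be tested like this in Python. Each font is assumed to be a mapping from table tag to (offset, length) within the collection file; the `Font` alias and the function name are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical per-font view of the table directory: tag -> (offset, length).
Font = Dict[str, Tuple[int, int]]


def glyf_loca_sharing_ok(fonts: List[Font]) -> bool:
    """True if every pair of fonts shares both glyf and loca, or neither."""
    for a, b in combinations(fonts, 2):
        # Two fonts share a table when their directory entries are identical.
        same_glyf = a.get("glyf") == b.get("glyf")
        same_loca = a.get("loca") == b.get("loca")
        if same_glyf != same_loca:
            return False  # only one table of the pair is shared
    return True


if __name__ == "__main__":
    f1 = {"glyf": (1000, 500), "loca": (1500, 40)}
    f2 = {"glyf": (1000, 500), "loca": (2000, 40)}  # shares glyf but not loca
    print(glyf_loca_sharing_ok([f1, dict(f1)]), glyf_loca_sharing_ok([f1, f2]))
```

A collection built by stitching together fonts with a tool that shares the pair as a unit passes this check trivially.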

Thank you,
Vladimir
