[OpenType] ISO/IEC 14496-22 Amendment for Font Collections

Thu Apr 9 17:12:40 CEST 2015

On Wednesday, April 08, 2015 3:38 PM Richard Wordingham wrote:
On Wed, 8 Apr 2015 14:42:22 +0000
"Levantovsky, Vladimir" <Vladimir.Levantovsky at monotype.com> wrote:

> Mixing wrong
> 'glyf' and 'loca' tables would have catastrophic consequences, and 
> fonts in a font collection cannot _choose_ to share one table of a 
> pair but not the other.

Two fonts can be *constructed* so that exactly one of the two tables is identical.  The trivial example is a glyf table less than 128kB in size.
The corresponding loca table may use either scaled 16-bit offsets or unscaled 32-bit offsets.  Another possible example is an unhinted pair of regular and italic faces.  If the points were all stored as 16-bit offsets (remember that some people have their own font compilers) and the italic face were generated from the regular face simply by slanting, the same loca table would serve for both sets of glyphs.
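The first case can be illustrated with a minimal Python sketch. It assumes glyph records are laid out back to back and already padded to even lengths; `build_loca` and the glyph lengths are hypothetical names and values chosen for illustration, not taken from any real font or tool.

```python
import struct

def build_loca(glyph_lengths, long_format):
    """Return 'loca' bytes for glyph records laid out back to back."""
    offsets = [0]
    for length in glyph_lengths:
        offsets.append(offsets[-1] + length)
    if long_format:
        # Long format: unscaled 32-bit byte offsets.
        return struct.pack(">%dI" % len(offsets), *offsets)
    # Short format: 16-bit values holding offset / 2, so every offset must
    # be even and no larger than 2 * 0xFFFF = 131070 bytes (just under 128kB).
    if any(o % 2 for o in offsets) or offsets[-1] > 2 * 0xFFFF:
        raise ValueError("glyf layout does not fit the short loca format")
    return struct.pack(">%dH" % len(offsets), *[o // 2 for o in offsets])

# Hypothetical glyph record lengths in bytes (glyph 0 is empty).
glyph_lengths = [0, 84, 120, 36]
short_loca = build_loca(glyph_lengths, long_format=False)
long_loca = build_loca(glyph_lengths, long_format=True)
```

Both results describe the same glyf layout, yet the two loca tables differ byte for byte, so a naive byte comparison would treat them as distinct tables.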

[VL] I admit that the probability of constructing two fonts with identical 'loca' tables is greater than zero. That would require an equal number of glyphs, each with an equal number of outlines defined by an equal number of points whose coordinates are represented in exactly the same format*, so that each glyph record occupies exactly the same number of bytes and therefore starts at the same physical location. However, that probability is likely approaching the probability of myself blindly typing on the keyboard and retyping the complete OT spec by heart (all 550+ pages of it). I've seen it many times, I edited it many times, so one might argue that the probability of me blindly re-typing it "as is" is also greater than zero.

{*Remember that TTF offers three different ways to record point coordinates, selected by flags, to optimize the coordinate data stream. Regular outlines, where only one of the [x,y] coordinates is likely to change from one point to the next, won’t have their coordinates recorded in the same way as slanted outlines, where both [x,y] coordinates are likely to change between neighboring points - therefore, they will likely not be recorded in the same coordinate stream formats.}
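The footnote's point can be sketched in Python. The sketch counts only the coordinate data bytes of a simple glyph (0 bytes for a repeated value, 1 byte for a delta within ±255, 2 bytes otherwise) and ignores the flag bytes and any flag run-length compression. The outline, the shear factor and the helper names are hypothetical.

```python
def coord_stream_sizes(values):
    """Data bytes needed per coordinate delta in a simple-glyph array."""
    sizes = []
    previous = 0  # the first point is stored relative to the origin
    for value in values:
        delta = value - previous
        if delta == 0:
            sizes.append(0)  # "same as previous" flag, no data bytes
        elif -255 <= delta <= 255:
            sizes.append(1)  # short vector: one byte, sign kept in the flag
        else:
            sizes.append(2)  # signed 16-bit delta
        previous = value
    return sizes

def slant(points, shear=0.2):
    """Shear x by y * shear, rounding to the integer grid."""
    return [(round(x + y * shear), y) for x, y in points]

# Hypothetical upright stem: x changes on only two of the four edges.
upright = [(100, 0), (100, 700), (180, 700), (180, 0)]
italic = slant(upright)

upright_x = coord_stream_sizes([x for x, _ in upright])
italic_x = coord_stream_sizes([x for x, _ in italic])
```

Here the upright stem needs two bytes of x data and the slanted one needs four, so the glyph records differ in length and every later 'loca' offset shifts.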

If we ever get a font collection file with two fonts that have identical 'loca' tables that could've been shared but weren’t - I can live with that!

> A font doesn’t have to use all of the glyphs (using only a subset from 
> a shared 'glyf' table is fine, as we discussed earlier) but when using 
> a shared 'glyf' table a font has to share the associated 'loca' table 
> to find the glyph records you need
> - the subset in use by each font in a collection will be determined by 
> 'cmap' table that translates codepoints to glyph indices.

That's not quite true.  The entries for unused glyphs could be removed (and the glyphs renumbered for the font).  I'm not sure what, if anything, goes wrong if the loca table indicates that the glyf table is much shorter than it actually is.  If it is acceptable, then one of the loca tables might use 16-bit offsets while the other had to use 32-bit offsets, and thereby save some space.

[VL] I am afraid you're confusing two different use cases - the purpose of font subsetting would be to remove the unused glyph entries to make the individual font file smaller; however, the purpose of creating a font collection is to reuse the same glyph records that are shared between different fonts. Therefore, it is a given that there will be glyphs that are used by one font in a collection but not the other. The 'cmap' table for each font in a collection would have only the codepoints supported by that font and would translate them to a subset of glyph indices, but the 'loca' table has to be complete so that a font engine can fetch the physical glyph records from their correct locations in the 'glyf' table. Offsets and the 'loca' table format are determined by the size of the 'glyf' table, not by the number of glyphs supported by a font.
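A minimal Python sketch of that arrangement follows, with toy data standing in for real tables: the byte strings, offsets and codepoint mappings are hypothetical, and 'loca' is shown as an already decoded list of byte offsets.

```python
def glyph_record(glyf, loca_offsets, glyph_id):
    """Slice one glyph record out of 'glyf' using consecutive 'loca' entries."""
    start, end = loca_offsets[glyph_id], loca_offsets[glyph_id + 1]
    return glyf[start:end]

# One shared 'glyf' table holding three toy glyph records, and its
# complete 'loca' table (number of glyphs + 1 entries).
shared_glyf = b"AAAA" + b"BBBBBB" + b"CC"
shared_loca = [0, 4, 10, 12]

# Each font's 'cmap' covers only the codepoints that font supports.
font_one_cmap = {0x41: 0, 0x42: 1}
font_two_cmap = {0x41: 0, 0x43: 2}

def lookup(cmap, codepoint):
    return glyph_record(shared_glyf, shared_loca, cmap[codepoint])

record_b = lookup(font_one_cmap, 0x42)
record_c = lookup(font_two_cmap, 0x43)
```

The second font never maps a codepoint to glyph 1, but it still depends on the 'loca' entries for that glyph to find where glyph 2 starts.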

> When dealing with a font resource downloaded from a web server (and, 
> hence, dealing with the built-in risks associated with possible 
> malicious intent), WOFF2 decoders will attempt to analyze font 
> structures and will invalidate anything that has wrong data - fonts 
> with overlapping tables, fonts that share wrong glyf/loca pairs, or 
> share one table but not the other, etc. Properly constructed font 
> files (assuming they were built using tools that do things the right
> way) will not be affected, and it will sure have no effect on authors 
> who produce font collections using those tools. I don't see any reason 
> to be concerned about it.

It is the *encoder* that is to reject because there is not a one-to-one correspondence of glyf and loca tables.  The structure of a WOFF2 file guarantees a one-to-one correspondence within a WOFF2 file - the loca table has to follow the corresponding repackaged glyf table.  Now, if there are two inequivalent loca tables for one glyf table, or two different glyf tables for one loca table, the encoder in general does need to produce two pairs of glyf and loca tables if there is to be any confidence that the WOFF2 file will define the same glyphs as the TTC.

[VL] Again, you seem to be making an assumption that compressed WOFF2 files cannot be touched or 'doctored' in any way and I would argue that your assumption is wrong. In practical cases, the encoding process would happen in a controlled environment (e.g. the Monotype production process for web fonts, where all input fonts are checked and validated before being encoded as web fonts), and then the web fonts are stored on multiple servers around the globe (including those that are part of third-party CDN networks not controlled by us) to enable their use by our subscriber base. It is much more likely that already encoded WOFF1 or WOFF2 files might be subjected to malicious modifications, and it is the job of the decoder to catch what it can and prevent a maliciously modified font from loading.
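The kind of pairing check discussed in this thread could look roughly like the Python sketch below. It is an illustration only, not the behaviour of any actual WOFF2 encoder or decoder: each font is assumed to be a dict mapping a table tag to the (offset, length) pair from its table directory, and the function name is hypothetical.

```python
def glyf_loca_pairing_errors(fonts):
    """Report fonts whose glyf/loca sharing is not one-to-one."""
    loca_for_glyf = {}
    glyf_for_loca = {}
    errors = []
    for index, tables in enumerate(fonts):
        glyf, loca = tables.get("glyf"), tables.get("loca")
        if glyf is None and loca is None:
            continue  # no TrueType outlines in this font
        if glyf is None or loca is None:
            errors.append(f"font {index}: glyf or loca is missing its partner")
            continue
        # A shared table must always appear with the same partner.
        if loca_for_glyf.setdefault(glyf, loca) != loca:
            errors.append(f"font {index}: shared glyf paired with another loca")
        if glyf_for_loca.setdefault(loca, glyf) != glyf:
            errors.append(f"font {index}: shared loca paired with another glyf")
    return errors

# Hypothetical directories: (offset, length) of each table in the file.
good = [{"glyf": (1000, 5000), "loca": (6000, 200)},
        {"glyf": (1000, 5000), "loca": (6000, 200)}]
bad = [{"glyf": (1000, 5000), "loca": (6000, 200)},
       {"glyf": (1000, 5000), "loca": (6200, 200)}]

good_errors = glyf_loca_pairing_errors(good)
bad_errors = glyf_loca_pairing_errors(bad)
```

Whether such a file is rejected at encode time or at decode time is the point under debate above; the check itself is the same in either place.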

Thank you,
Vladimir
