New cmap format

John Hudson john at tiro.ca
Tue Apr 3 23:21:24 CEST 2012


The format 14 subtable, which implements support for Unicode variation 
selectors, maps from sequences of two Unicode values to a single variant 
glyph. It is fairly simple and elegant, and enables a character-level 
solution for variation selector sequences, which seems appropriate.
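
As a rough illustration of that two-to-one mapping, here is a minimal 
sketch in Python. It does not reproduce the binary layout of the format 
14 subtable; the table names, the glyph names and the chosen sequence 
are all hypothetical, and the fallback to the base glyph for an unlisted 
sequence is an assumption of the sketch.

```python
# Hypothetical data: glyph names and the listed sequence are illustrative only.
BASE_CMAP = {0x8FBB: "uni8FBB"}                   # ordinary one-to-one cmap
UVS_CMAP = {(0x8FBB, 0xE0100): "uni8FBB.v1"}      # (base, selector) -> variant glyph


def lookup_variation(base, selector):
    """Return the variant glyph for (base, selector).

    Assumption: a sequence that is not listed falls back to the glyph
    that the ordinary cmap gives for the base character alone.
    """
    return UVS_CMAP.get((base, selector), BASE_CMAP.get(base))
```

Here lookup_variation(0x8FBB, 0xE0100) picks the variant glyph, while an 
unlisted selector simply yields the base glyph.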

I would like to explore the possibility of adding a new cmap format that 
would perform the opposite operation, i.e. that would map from a single 
Unicode codepoint to a sequence of two or more glyphs. My thinking 
behind this is that it is currently necessary for fonts to include large 
numbers of glyphs for Unicode precomposed diacritic characters, even 
though the great majority of these can be represented using glyph 
sequences and dynamic mark positioning. Although the effect of all these 
glyphs on glyf or CFF table size is negligible if composites or 
subroutines are used, they have a significant impact on font development 
time -- not least in maintaining consistency between precomposed glyphs 
and dynamic mark positioning -- and on GSUB and GPOS table size.
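
To illustrate the point about precomposed characters, the sketch below 
uses Python's standard unicodedata module to show that U+00C4 has a 
canonical decomposition into a base letter plus a combining mark. The 
helper name canonical_parts is made up for this example.

```python
import unicodedata


def canonical_parts(char):
    """Return the code points of the canonical (NFD) decomposition of char."""
    return [ord(c) for c in unicodedata.normalize("NFD", char)]


# U+00C4 decomposes to U+0041 (A) followed by U+0308 (combining diaeresis).
print([hex(cp) for cp in canonical_parts("\u00c4")])  # ['0x41', '0x308']
```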

My idea is a cmap that would map from a single Unicode codepoint to a 
sequence of two or more GIDs that, in combination with GPOS, would be 
able to display that Unicode character. So instead of mapping

	U+00C4 to /Adieresis/

the new format cmap would map

	U+00C4 to /A/ /dieresiscomb.cap/

Note how this kind of mapping can also bypass contextual GSUB 
substitutions to access appropriate variant mark glyphs etc., which 
should be more efficient.
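
A minimal sketch of how such a one-to-many lookup might behave, again in 
Python. This is only an illustration of the idea, not a proposed table 
layout: the dictionary names, the glyph names and the .notdef fallback 
are assumptions, and GPOS mark positioning is left out entirely.

```python
# Hypothetical tables; glyph names are illustrative only.
SINGLE_CMAP = {0x0041: ["A"], 0x0308: ["dieresiscomb"]}   # one code point -> one glyph
SEQUENCE_CMAP = {0x00C4: ["A", "dieresiscomb.cap"]}       # one code point -> glyph sequence


def map_text(text):
    """Map each character to glyph names, consulting the sequence table first."""
    glyphs = []
    for ch in text:
        cp = ord(ch)
        if cp in SEQUENCE_CMAP:
            # One code point expands directly to two or more glyphs.
            glyphs.extend(SEQUENCE_CMAP[cp])
        else:
            # Assumption: unmapped code points fall back to .notdef.
            glyphs.extend(SINGLE_CMAP.get(cp, [".notdef"]))
    return glyphs
```

With these tables, map_text("\u00c4") yields the base glyph followed by 
the cap-height mark variant, without any GSUB lookup being involved in 
choosing that variant.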

Thoughts?


JH



-- 

Tiro Typeworks        www.tiro.com
Gulf Islands, BC      tiro at tiro.com

The criminologist's definition of 'public order
crimes' comes perilously close to the historian's
description of 'working-class leisure-time activity.'
  - Sidney Harring, _Policing a Class Society_



More information about the mpeg-otspec mailing list