Composite Font Requirements (was RE: [mpeg-OTspec] AHG on Open Font Format)

Mon Mar 16 21:42:11 CET 2009

All,

I am still digesting the recent posts by Karsten, John, Jeff, and  
Mikhail. I am also attending the Worldware Conference for the next  
three days, turning me into a pumpkin for most of this week.

I did want to convey some thoughts before that happens, though...

One question that has been raised is about encoding and Unicode.  
Historically, encoding has been the "glue" for legacy Composite Font  
formats. Text is what applications use, and the encoded values are  
used to eventually get to the glyphs. For the Composite Font format  
that we're discussing, we need some type of glue, and Unicode serves  
this purpose well, given its broad overall coverage of languages and  
scripts. In many ways, Composite Fonts are merely convenience
mechanisms for presenting what are logically multiple fonts as though
they were a single font. This is how we are able to effectively break
the 64K glyph barrier.
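To make the "glue" idea concrete, here is a minimal sketch, assuming a Composite Font is nothing more than a list of non-overlapping Unicode ranges, each pointing at one component font's cmap. The class and method names (ComponentFont, CompositeFont, resolve) are hypothetical and not part of any proposed format:

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComponentFont:
    # Hypothetical stand-in for one component font: its name and its cmap,
    # mapping Unicode code points to glyph IDs that are local to this font.
    name: str
    cmap: dict[int, int] = field(default_factory=dict)


@dataclass
class CompositeFont:
    # Non-overlapping (first, last, component) Unicode ranges; Unicode is the glue.
    ranges: list[tuple[int, int, ComponentFont]]

    def __post_init__(self) -> None:
        # Sort by range start so the owning range can be found by binary search.
        self.ranges.sort(key=lambda r: r[0])
        self._starts = [r[0] for r in self.ranges]

    def resolve(self, cp: int) -> Optional[tuple[str, int]]:
        """Map a code point to (component name, local GID), or None if unmapped."""
        i = bisect_right(self._starts, cp) - 1
        if i < 0:
            return None
        first, last, font = self.ranges[i]
        if cp > last or cp not in font.cmap:
            return None
        return font.name, font.cmap[cp]
```

Note that the GID returned is only meaningful together with the component name, which is the point made in the next paragraph: past this lookup, you are working with a component font rather than with the Composite Font as a whole.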

I would also argue that once the encoded values, such as Unicode code
points, become GIDs, which are necessarily specific to each component font of
a Composite Font, you're no longer dealing with the Composite Font as  
a whole, but rather you're dealing with the component font. And yes,  
Composite Fonts will have limitations. To some extent, using Unicode  
as the glue may be thought of as the source of the limitation, but at  
least for today, it seems to be the best glue we have. Our original  
Composite Font format used Shift-JIS encoding as the glue, so using  
Unicode is a huge step forward. In terms of limitations, it will be
very difficult for glyphs to interact across component fonts, because
GSUB features operate on GIDs, which are local to each component font,
not on character codes. This means
that if a developer wants certain glyphs to interact in a GSUB  
feature, they should be in the same component font. Luckily, all of  
the glyphs for each supported script are likely to be in a single  
component font. The only possible exception is likely to be the CJK
Unified Ideographs. If we consider ISO 10646 up through and including  
Amendment 6, there are 74,382 such characters. This is obviously over  
the 64K glyph barrier. Some CJK Unified Ideographs do interact via  
GSUB features, such as the simplified, traditional, and variant forms.  
A large number of them do not interact.
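The arithmetic behind the CJK exception, and the resulting GSUB constraint, can be sketched as follows. This assumes the limit comes from the 16-bit maxp.numGlyphs field (at most 65,535 glyphs per font, GID 0 being .notdef), and it uses code points as stand-ins for glyphs for readability, although real GSUB lookups operate on GIDs. The helper names and the sample assignment are hypothetical:

```python
import math

# maxp.numGlyphs is a 16-bit field, so one font holds at most 65,535 glyphs;
# GID 0 is reserved for .notdef, leaving 65,534 slots for other glyphs.
MAX_GLYPHS_PER_FONT = 65535
USABLE_GLYPHS = MAX_GLYPHS_PER_FONT - 1
CJK_UNIFIED_IDEOGRAPHS = 74382  # count cited above (ISO 10646 through Amendment 6)


def min_component_fonts(glyph_count: int, capacity: int = USABLE_GLYPHS) -> int:
    """Lower bound on how many component fonts are needed for glyph_count glyphs."""
    return math.ceil(glyph_count / capacity)


def split_gsub_pairs(assignment: dict, pairs: list) -> list:
    """Return the substitution pairs whose source and target were assigned to
    different component fonts; a single GSUB lookup cannot link those."""
    return [(src, dst) for src, dst in pairs if assignment[src] != assignment[dst]]
```

For example, if simplified U+4E1C and traditional U+6771 end up in components 0 and 1, split_gsub_pairs flags that pair, so a feature substituting one for the other would not work until both are placed in the same component font.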

I very much like the idea of using plain XML for representing this  
format. I also favor specifying flags or characteristics that may  
trigger certain behavior. For example, we can define a
"cross-platform" flag that triggers requirements, such as Unicode encoding
for the component fonts, a flat structure (a Composite Font cannot be  
used as a component font of another Composite Font), and perhaps even  
font format (OpenType). If a Composite Font is not flagged as
cross-platform, the client is then responsible for handling the encoding,
any recursion, and the font formats. This will allow the format to  
serve the needs of more users and developers.
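As an illustration of how such a flag could be checked, here is a minimal sketch using Python's standard xml.etree.ElementTree. The markup itself (the compositeFont and component elements and the crossPlatform, file, format, and encoding attributes) is entirely hypothetical; nothing like it has been agreed on:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only.
SAMPLE = """<compositeFont name="ExampleComposite" crossPlatform="true">
  <component file="Latin.otf" format="OpenType" encoding="Unicode"/>
  <component file="CJK.otf" format="OpenType" encoding="Unicode"/>
</compositeFont>"""


def cross_platform_violations(xml_text: str) -> list:
    """List the ways a description breaks the cross-platform requirements."""
    root = ET.fromstring(xml_text)
    if root.get("crossPlatform") != "true":
        # Not flagged: the client handles encoding, recursion, and font formats.
        return []
    problems = []
    # Flat structure: no Composite Font nested anywhere inside this one.
    if root.find(".//compositeFont") is not None:
        problems.append("nested compositeFont element: structure must be flat")
    for comp in root.findall("component"):
        name = comp.get("file", "?")
        if comp.get("encoding") != "Unicode":
            problems.append(f"{name}: encoding must be Unicode")
        if comp.get("format") != "OpenType":
            problems.append(f"{name}: format must be OpenType")
    return problems
```

The design choice here is that the flag only adds constraints: an unflagged description is accepted as-is and everything is left to the client, as described above.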

About defining the Composite Font metrics, isn't the 'BASE' table  
designed to serve this purpose? In other words, the necessary  
information, or at least a good chunk of it, should be encapsulated in  
this OpenType table. Of course, some fonts lack this table. (Thinking  
out loud, the presence of this table could also be considered one of  
the requirements when a Composite Font is flagged as
"cross-platform.") The ability to adjust these metrics on a
per-component-font basis, along with scaling, needs to be in the format.
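One way per-component scaling could feed into the Composite Font's own metrics is sketched below, combined with Karsten's suggestion (point (b) in the quoted message) that maximum dimensions be taken from the largest component. Assumptions: a composite em of 1000 units, a single scale factor per component, and metric values supplied by hand rather than read from the fonts' head, OS/2, or BASE tables. It takes the maximum per value, so ascent and descent may come from different components, which is one reading of "the font with largest dimensions." All names are hypothetical:

```python
import math
from dataclasses import dataclass


@dataclass
class ComponentMetrics:
    # Hypothetical per-component record; field names are illustrative only.
    name: str
    units_per_em: int   # the component's head.unitsPerEm
    win_ascent: int     # OS/2.usWinAscent (positive, above the baseline)
    win_descent: int    # OS/2.usWinDescent (positive, below the baseline)
    scale: float = 1.0  # extra per-component scale requested by the Composite Font


def to_composite_units(value, comp, composite_upm=1000):
    """Convert a value from the component's design units to composite units,
    applying the per-component scale factor."""
    return value * composite_upm / comp.units_per_em * comp.scale


def composite_win_metrics(components, composite_upm=1000):
    """Largest scaled ascent and descent over all components, rounded up so the
    result is never smaller than any component's scaled value."""
    ascent = max(to_composite_units(c.win_ascent, c, composite_upm) for c in components)
    descent = max(to_composite_units(c.win_descent, c, composite_upm) for c in components)
    return math.ceil(ascent), math.ceil(descent)
```

With a 2048-unit Latin component (ascent 1900, descent 700) and a 1000-unit CJK component (ascent 1160, descent 288), the composite values come out as (1160, 342): the ascent comes from the CJK font and the descent from the Latin font.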

Regards...

-- Ken

On 2009/03/13, at 5:00, karstenluecke wrote:

> I like that you reduce the Composite Font Format's intention to the
> question: which issues should the format address?
>
> As to 2.
> ["what are the defining metrics (e.g. max ascender, descender,  
> leading) of the composite font and how closely do the components of  
> a composite need to adhere to these metrics?"]
> I think there are two aspects:
> (a) Metrics that define ideal/recommended/automatic line-to-line  
> distance.
> (a.1) Two columns of different-script texts do not necessarily need  
> the same line-to-line distance. E.g. Latin--Arabic or Latin--Chinese/ 
> Japanese may even suffer from it. I am not sure if a composite font  
> needs to impose "global" values here.
> (a.2) In case of text which includes single different-script words  
> or phrases, the font that provides glyphs for the "primary" script  
> text may determine the line-to-line distance, and the other script  
> would follow. Here, the "scale" factors suggested in Mr Leonov's
> point 4 may come into play.
> (b) Metrics that define maximum dimensions (OS/2.usWinAscent/ 
> Descent) should not have any impact on line-to-line distance anyway.  
> If a composite font provides these, they should be taken from
> the component font with the largest dimensions. There is no need to keep these
> values identical with every future composite font update or addition  
> of other fonts to the composite font.
> But that would be an ideal world.
>
> Perhaps one more question which I cannot find addressed in the posts:
>
> 9.
> Do Unicode ranges defined in a composite font (a) refer to
> precomposed character-glyphs only, or do they also (b) include
> characters not covered in the font/cmap as such, but that would
> result from Unicode composition rules + separate base/mark glyphs +
> ccmp/mark/mkmk?
> (b) would require that composite-font-savvy layout engines must,  
> rather than may, support layout tables.
>
> Best wishes,
> Karsten Luecke
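The difference between Karsten's options (a) and (b) in question 9 can be sketched as a coverage test, using Python's standard unicodedata.normalize. Under (a), a character counts as covered only if it has its own cmap entry; under (b), it also counts if its canonical decomposition (NFD) into base and mark characters is fully mapped, which only works if the layout engine then assembles the glyphs via ccmp/mark/mkmk. That dependency is why (b) would make layout-table support mandatory. The function names are hypothetical, and this checks coverage only; it does not shape text:

```python
import unicodedata


def covered_precomposed(ch: str, cmap: set) -> bool:
    """Option (a): the character itself has a cmap entry."""
    return ord(ch) in cmap


def covered_with_composition(ch: str, cmap: set) -> bool:
    """Option (b): mapped directly, or every character of its canonical
    decomposition (base + marks) is mapped, leaving the layout engine to
    compose and position them with ccmp/mark/mkmk."""
    if covered_precomposed(ch, cmap):
        return True
    decomposed = unicodedata.normalize("NFD", ch)
    return all(ord(c) in cmap for c in decomposed)
```

For a cmap containing only U+0061 and U+0301, "á" (U+00E1) fails (a) but passes (b), while "é" (U+00E9) fails both because U+0065 is not mapped. Characters with no canonical decomposition can only ever be covered directly.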