[mpeg-OTspec] Composite Font syntax

Thu Jun 25 14:47:20 CEST 2009

Looks elegant. To be sure I understand:

(1)
How would the re-encoding mechanism fold in? Starting from your example:

    <!-- Latin -->
    <Encoding Target="0000-007F">
        <ComponentFont BaselineShift="12" ScaleFactor="95"
        Target="LatinFont-1"/>
    </Encoding>

Since ComponentFont is nested inside Encoding, does one need to repeat
ComponentFont for each such re-encoding entry? For example, with nonsense
values:

    <!-- Latin -->
    <Encoding Target="0000-007D">
        <ComponentFont BaselineShift="12" ScaleFactor="95"
        Target="LatinFont-1"/>
    </Encoding>
    <Encoding Target="007E" Original="FE59">
        <ComponentFont BaselineShift="12" ScaleFactor="95"
        Target="LatinFont-1"/>
    </Encoding>
    <Encoding Target="007F" Original="FE61">
        <ComponentFont BaselineShift="12" ScaleFactor="95"
        Target="LatinFont-1"/>
    </Encoding>

(2)
Following the description of the fallback mechanism, could one do this?

    <!-- ENGLISH -->
    <Language Target="eng">
        <Encoding Target="0028-0029, 002C, 002E, 0041-005A, 0061-007A,
        2018-2019, 201C-201D"><!-- POSSIBLY OBSOLETE IN EXAMPLE -->
            <ComponentFont Target="LatinFont-1"/>
            <ComponentFont Target="LatinFont-2"/>
        </Encoding>
    </Language>

    <!-- OTHER LATIN-SCRIPT LANGUAGES, without Language -->
    <Encoding Target="0000-[...]">
        <ComponentFont Target="LatinFont-1"/>
        <ComponentFont Target="LatinFont-2"/>
    </Encoding>

(3)
Even if given as a percentage, perhaps ScaleFactor should be a float
rather than an integer, for higher precision?

Best wishes,
Karsten

Ken Lunde wrote:
> All,
> 
> Mikhail Leonov, daan Strebe, and I have spent some time since the last
> meeting in private discussions about the syntax of the Composite Font
> format, and I am now prepared to share with the AHG what we have agreed
> to. Any delay in bringing this discussion to the AHG is due to me, and
> for that I apologize.
> 
> There are three basic elements for the syntax that describes the  
> functional portion of the Composite Font recipe:
> 
>    Language
>    Component Fonts
>    Unicode Ranges
> 
> (As a side note, we also discussed the notion of "Script" as an  
> alternative to specifying Unicode ranges, and agreed to defer that  
> portion of the discussion in order to bring the agreed portions to the  
> rest of the group.)
> 
> Given the "elective detail" principle, along with the acknowledgment
> that the intentions and needs of creators and consumers are diverse,
> these three Composite Font elements must be specified in a hierarchy,
> and that hierarchy depends on the intent of the creator. Furthermore,
> in keeping with the "elective detail" principle, none of these
> elements is required, except for a minimum of one Component Font.
> 
> We have agreed that a minimal Composite Font specifies a single  
> Component Font, such as the following:
> 
>    <ComponentFont Target="LatinFont-1"/>
> 
> This is functionally equivalent to the form that uses start- and end-tags:
> 
>    <ComponentFont Target="LatinFont-1"></ComponentFont>
> 
> Another minimal form of a Composite Font is simply an ordered list of
> Component Fonts. The order in which they appear in the list is their
> order of preference, so each subsequent font functions as a fallback
> font:
> 
>    <ComponentFont Target="LatinFont-1, LatinFont-2"/>
> 
> or:
> 
>    <ComponentFont Target="LatinFont-1"/>
>    <ComponentFont Target="LatinFont-2"/>
> 
> The latter form is necessary if one of the Component Fonts requires an  
> attribute not shared by another:
> 
>    <ComponentFont Target="LatinFont-1"/>
>    <ComponentFont ScaleFactor="110" Target="LatinFont-2"/>
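To make the equivalence of the two list forms concrete, here is a minimal Python sketch that flattens either form into an ordered fallback chain of (font name, attributes) pairs. The proposal does not name a root element, so <CompositeFont> is a hypothetical wrapper added only to make the fragments parseable, and fallback_chain is an illustrative helper, not part of any specification.

```python
import xml.etree.ElementTree as ET

# <CompositeFont> is a hypothetical root element; the proposal names none.
SINGLE_TAG = """<CompositeFont>
  <ComponentFont Target="LatinFont-1, LatinFont-2"/>
</CompositeFont>"""

SEPARATE_TAGS = """<CompositeFont>
  <ComponentFont Target="LatinFont-1"/>
  <ComponentFont ScaleFactor="110" Target="LatinFont-2"/>
</CompositeFont>"""


def fallback_chain(xml_text):
    """Return (font name, attributes) pairs in order of preference."""
    root = ET.fromstring(xml_text)
    chain = []
    for elem in root.findall("ComponentFont"):
        # Attributes other than Target apply to every font named in Target.
        attrs = {k: v for k, v in elem.attrib.items() if k != "Target"}
        for name in elem.get("Target", "").split(","):
            name = name.strip()
            if name:
                chain.append((name, attrs))
    return chain


print(fallback_chain(SINGLE_TAG))
# [('LatinFont-1', {}), ('LatinFont-2', {})]
print(fallback_chain(SEPARATE_TAGS))
# [('LatinFont-1', {}), ('LatinFont-2', {'ScaleFactor': '110'})]
```

Both forms yield the same order of preference; only the separate-element form can attach an attribute to a single font in the chain.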
> 
> Note that tag attributes are used to specify the content of the tags,
> as opposed to string data between the start- and end-tags. When an
> element has no further hierarchical information, the empty-element tag
> form can be used; when there is further hierarchy to be specified, the
> start- and end-tag form must be used.
> 
> The <ComponentFont> tag examples above use the "Target" attribute,
> which specifies the name of the Component Font. Other attributes for
> this tag can include "BaselineShift" and "ScaleFactor". Given that the
> design spaces of Component Fonts can be diverse, ranging from 256 to
> 2048 units per em, with 1000 units per em being typical for
> PostScript-based fonts, these values are best specified as
> percentages. The ScaleFactor attribute obviously benefits from this.
> The following is an example that uses all of the attributes:
> 
>    <ComponentFont ScaleFactor="110" BaselineShift="-2"
>        Target="LatinFont-1"/>
> 
> In other words, the Component Font named "LatinFont-1" is scaled to  
> 110% of its size, and a -2% baseline shift is performed.
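The point about diverse design spaces can be illustrated with a small conversion from the percentage attributes to absolute values. This is a sketch under assumptions the proposal does not settle: that BaselineShift is a percentage of the Component Font's em, and that both attributes are parsed as floats (so fractional values such as "97.5" are accepted). to_font_units is a hypothetical helper name.

```python
def to_font_units(scale_factor, baseline_shift, units_per_em):
    """Turn ScaleFactor/BaselineShift percentages into absolute values.

    Assumptions (not settled in the proposal): both attributes are
    percentages, BaselineShift is a percentage of the Component Font's em,
    and the strings are parsed as floats so that "97.5" is accepted too.
    Returns (scale multiplier, baseline shift in font units).
    """
    scale = float(scale_factor) / 100.0
    shift = float(baseline_shift) * units_per_em / 100.0
    return scale, shift


# The same attribute values yield different absolute shifts
# depending on the Component Font's design space.
for upm in (256, 1000, 2048):
    print(upm, to_font_units("110", "-2", upm))
```

Because the attribute values are relative, the same recipe line works unchanged whether the Component Font uses 256, 1000, or 2048 units per em.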
> 
> The other elements are specified as the following tags:
> 
>    <Language>
>    <Encoding>
> 
> The <Language> tag uses "Target" as its attribute, which specifies one  
> or more three-letter ISO 639-2/T language codes. The <Encoding> tag  
> also uses "Target" as its primary attribute, which specifies Unicode  
> ranges and Unicode code points, separated by commas. Some examples:
> 
>    <Language Target="jpn"/>
>    <Encoding Target="4E00-9FCB"/>
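A minimal Python sketch of how an <Encoding> Target value might be parsed: each comma-separated item is either a single hexadecimal code point (treated as a length-1 range) or a hyphenated start-end range. parse_ranges and covers are hypothetical helper names, and rejecting descending ranges is this sketch's own choice.

```python
def parse_ranges(target):
    """Parse an Encoding Target such as "0028-0029, 002C" into (start, end) pairs."""
    ranges = []
    for item in target.split(","):
        item = item.strip()
        if not item:
            continue
        start, _, end = item.partition("-")
        lo = int(start, 16)
        hi = int(end, 16) if end else lo  # a single code point is a length-1 range
        if hi < lo:
            raise ValueError(f"descending range: {item}")
        ranges.append((lo, hi))
    return ranges


def covers(ranges, code_point):
    """True if code_point falls inside any of the parsed ranges."""
    return any(lo <= code_point <= hi for lo, hi in ranges)


cjk = parse_ranges("4E00-9FCB")
punct = parse_ranges("0028-0029, 002C, 002E, 2018-2019, 201C-201D")
print(covers(cjk, 0x6F22))    # True
print(covers(punct, 0x002D))  # False: U+002D is not listed
```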
> 
> Let us consider John Hudson's example of mixed English and Devanagari
> text. The issue in that example was language tagging: punctuation that
> is common across languages shares the same Unicode code points, yet
> needs appropriate treatment for each language. Let us consider only
> the following code points for this example:
> 
>    Punctuation: 0028-0029, 002C, 002E, 2018-2019, 201C-201D
>    Latin:       0041-005A, 0061-007A
>    Devanagari:  0900-097F
> 
> When the language is English (or another Latin-script language), the
> punctuation should come from the font intended for English; likewise,
> when the language is Hindi, the punctuation should come from the
> Devanagari font.
> 
>    <!-- English -->
>    <Language Target="eng">
>        <Encoding Target="0028-0029, 002C, 002E, 0041-005A, 0061-007A,
>            2018-2019, 201C-201D">
>            <ComponentFont BaselineShift="25" ScaleFactor="108"
>                Target="LatinFont-1"/>
>            <ComponentFont Target="LatinFont-2"/>
>        </Encoding>
>    </Language>
> 
>    <!-- Hindi -->
>    <Language Target="hin">
>        <Encoding Target="0028-0029, 002C, 002E, 0900-097F, 2018-2019,
>            201C-201D">
>            <ComponentFont Target="DevanagariFont-1"/>
>        </Encoding>
>    </Language>
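The language-dependent lookup in this example can be sketched as follows: given a language code and a code point, find the matching <Language> block, then the first <Encoding> inside it whose Target covers the code point, and return its Component Fonts in order of preference. The <CompositeFont> root and the helpers in_target and candidates are hypothetical; glyph-coverage checks (whether LatinFont-1 actually has the glyph) are deliberately left out.

```python
import xml.etree.ElementTree as ET

# Ken's English/Hindi example, wrapped in a hypothetical <CompositeFont>
# root element so that it parses.
RECIPE = """<CompositeFont>
  <Language Target="eng">
    <Encoding Target="0028-0029, 002C, 002E, 0041-005A, 0061-007A,
                      2018-2019, 201C-201D">
      <ComponentFont BaselineShift="25" ScaleFactor="108" Target="LatinFont-1"/>
      <ComponentFont Target="LatinFont-2"/>
    </Encoding>
  </Language>
  <Language Target="hin">
    <Encoding Target="0028-0029, 002C, 002E, 0900-097F, 2018-2019,
                      201C-201D">
      <ComponentFont Target="DevanagariFont-1"/>
    </Encoding>
  </Language>
</CompositeFont>"""


def in_target(target, cp):
    """True if code point cp falls in a comma-separated Encoding Target."""
    for item in target.split(","):
        item = item.strip()
        if not item:
            continue
        start, _, end = item.partition("-")
        if int(start, 16) <= cp <= int(end or start, 16):
            return True
    return False


def candidates(root, lang, cp):
    """Ordered Component Font names for code point cp in language lang."""
    for language in root.findall("Language"):
        codes = [c.strip() for c in language.get("Target", "").split(",")]
        if lang not in codes:
            continue
        for enc in language.findall("Encoding"):
            if in_target(enc.get("Target", ""), cp):
                return [f.strip() for cf in enc.findall("ComponentFont")
                        for f in cf.get("Target", "").split(",")]
    return []  # not covered by any language-specific declaration


root = ET.fromstring(RECIPE)
print(candidates(root, "eng", 0x201C))  # ['LatinFont-1', 'LatinFont-2']
print(candidates(root, "hin", 0x201C))  # ['DevanagariFont-1']
```

The same code point (U+201C) resolves to different fonts purely because of the language tag, which is the behavior the example asks for.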
> 
> If we were to implement Composite Font support equivalent to what  
> Adobe applications provide, I think that the following represents a  
> good example, though it is highly simplified (and incomplete) for the  
> purpose of this explanation:
> 
>    <!-- Latin -->
>    <Encoding Target="0000-007F">
>        <ComponentFont BaselineShift="12" ScaleFactor="95"
>            Target="LatinFont-1"/>
>    </Encoding>
> 
>    <!-- Japanese Punctuation -->
>    <Encoding Target="3000-303F">
>        <ComponentFont Target="JapaneseFont-1"/>
>    </Encoding>
> 
>    <!-- Japanese Kana -->
>    <Encoding Target="3041-30FF">
>        <ComponentFont Target="KanaFont-1"/>
>    </Encoding>
> 
>    <!-- Everything Else -->
>    <ComponentFont Target="JapaneseFont-2"/>
> 
> Note how "JapaneseFont-2" serves as the Base Font, because it does not  
> declare any other elements or attributes. The Component Fonts that are  
> declared prior to that line take precedence, and are used for the  
> specific Unicode ranges that are declared. I could have declared a  
> Unicode range for the last font, but for the purpose of this Composite  
> Font, it is not necessary, because any and all glyphs that it contains  
> can be used, other than those masked by previous declarations in the  
> Composite Font.
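The precedence rule described here (earlier declarations mask later ones, and a bare Component Font with no range acts as the Base Font) can be sketched as a first-match walk over the top-level declarations. <CompositeFont> is a hypothetical root element, resolve and in_target are hypothetical helpers, and for simplicity the sketch returns the first font of a matching Encoding without checking whether that font actually contains the glyph.

```python
import xml.etree.ElementTree as ET

# Ken's simplified Adobe-style example; <CompositeFont> is a hypothetical
# root element added only so the fragment parses.
RECIPE = """<CompositeFont>
  <Encoding Target="0000-007F">
    <ComponentFont BaselineShift="12" ScaleFactor="95" Target="LatinFont-1"/>
  </Encoding>
  <Encoding Target="3000-303F">
    <ComponentFont Target="JapaneseFont-1"/>
  </Encoding>
  <Encoding Target="3041-30FF">
    <ComponentFont Target="KanaFont-1"/>
  </Encoding>
  <ComponentFont Target="JapaneseFont-2"/>
</CompositeFont>"""


def in_target(target, cp):
    """True if code point cp falls in a comma-separated Encoding Target."""
    for item in target.split(","):
        item = item.strip()
        if not item:
            continue
        start, _, end = item.partition("-")
        if int(start, 16) <= cp <= int(end or start, 16):
            return True
    return False


def resolve(root, cp):
    """Return the Component Font for cp; earlier declarations win."""
    for child in root:
        if child.tag == "Encoding":
            if in_target(child.get("Target", ""), cp):
                font = child.find("ComponentFont")
                return font.get("Target").split(",")[0].strip()
        elif child.tag == "ComponentFont":
            # No range declared: it takes everything not masked by an
            # earlier declaration, i.e. it acts as the Base Font.
            return child.get("Target").split(",")[0].strip()
    return None


root = ET.fromstring(RECIPE)
for cp in (0x0041, 0x3001, 0x30A2, 0x6F22):
    print(f"U+{cp:04X}", resolve(root, cp))
```

Note that a bare ComponentFont placed first would mask every later declaration, which matches the "declared prior take precedence" reading.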
> 
> Also note that the language was not declared, because it is not  
> important for this specific Composite Font. This adheres to the  
> "elective detail" principle.
> 
> I cannot think of any attributes for the <Language> tag other than
> "Target", which specifies a three-letter ISO 639-2/T language code.
> For the <Encoding> tag, it should be possible to re-encode a Component
> Font, and an "Original" attribute can be used for this purpose.
> Consider the following two scenarios, both of which are very real:
> 
> 1) To be able to add glyphs from a Component Font that are encoded in
> one way (such as according to a single-byte encoding), but to encode
> them according to a different encoding in the Composite Font, such as
> in the PUA region. Legacy Composite Font mechanisms referred to this
> as the ability to add "gaiji" to fonts, and re-encoding was a
> necessary step: a single-byte Component Font was re-encoded so that
> its glyphs could be accessed via character codes in the Composite Font.
> 
> 2) To be able to change the encoding of a select number of glyphs in a
> Component Font. A good example is the set of GB 18030 glyphs that are
> encoded using PUA code points, but can also be encoded according to
> non-PUA code points. It would be reasonable for a Composite Font
> definition to perform this function. I would claim that both code
> points (PUA in the original Component Font and non-PUA in the
> Composite Font) could result in the same glyph when accessed via the
> Composite Font.
> 
> Legacy Composite Font formats handled these by specifying encoding
> ranges for the Composite Font, with a minimum length of 1 (meaning
> that the start and end character codes are the same), and specifying
> only the start character code for the Component Font. For example:
> 
>    <Encoding Target="4E00-4EFF" Original="00"/>
> 
> In other words, U+4E00 through U+4EFF in the Composite Font are mapped  
> to 0x00 through 0xFF in the Component Font.
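The mapping rule above is a simple offset: a code point at position k within the Target range maps to the Original start code plus k. A minimal Python sketch, where original_code is a hypothetical helper name:

```python
def original_code(target, original, cp):
    """Map a Composite Font code point to the Component Font's character code.

    target:   the Encoding Target range, e.g. "4E00-4EFF"
    original: the Original attribute, i.e. only the start code in the
              Component Font, e.g. "00"
    Returns None if cp lies outside the Target range.
    """
    start, _, end = target.partition("-")
    lo, hi = int(start, 16), int(end or start, 16)
    if not lo <= cp <= hi:
        return None
    return int(original, 16) + (cp - lo)


print(hex(original_code("4E00-4EFF", "00", 0x4E00)))  # 0x0
print(hex(original_code("4E00-4EFF", "00", 0x4E41)))  # 0x41
print(original_code("4E00-4EFF", "00", 0x4F00))       # None
```

Only the start of the Original range is needed because its length is implied by the Target range.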
> 
> Here is a good example of GB 18030 characters that are often
> PUA-encoded in fonts, but could be re-encoded in Composite Fonts via
> this mechanism:
> 
>    <Encoding Target="9FB4" Original="FE59"/>
>    <Encoding Target="9FB5" Original="FE61"/>
>    <Encoding Target="9FB6-9FB7" Original="FE66"/>
>    <Encoding Target="9FB8" Original="FE6D"/>
>    <Encoding Target="9FB9" Original="FE7E"/>
>    <Encoding Target="9FBA" Original="FE90"/>
>    <Encoding Target="9FBB" Original="FEA0"/>
>    <Encoding Target="20087" Original="FE51"/>
>    <Encoding Target="20089" Original="FE52"/>
>    <Encoding Target="200CC" Original="FE53"/>
>    <Encoding Target="215D7" Original="FE6C"/>
>    <Encoding Target="2298F" Original="FE76"/>
>    <Encoding Target="241FE" Original="FE91"/>
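Assuming the Component Font keeps its own cmap for the PUA code points, the GB 18030 entries above can be expanded into a lookup table and used to check the claim that the PUA and non-PUA code points reach the same glyph. The sketch below invents a stand-in cmap (font_cmap) with arbitrary glyph IDs; remap_table and composite_lookup are hypothetical helper names, and <CompositeFont> is a hypothetical root element.

```python
import xml.etree.ElementTree as ET

# Ken's GB 18030 re-encoding entries, wrapped in a hypothetical
# <CompositeFont> root element so that they parse.
ENTRIES = """<CompositeFont>
  <Encoding Target="9FB4" Original="FE59"/>
  <Encoding Target="9FB5" Original="FE61"/>
  <Encoding Target="9FB6-9FB7" Original="FE66"/>
  <Encoding Target="9FB8" Original="FE6D"/>
  <Encoding Target="9FB9" Original="FE7E"/>
  <Encoding Target="9FBA" Original="FE90"/>
  <Encoding Target="9FBB" Original="FEA0"/>
  <Encoding Target="20087" Original="FE51"/>
  <Encoding Target="20089" Original="FE52"/>
  <Encoding Target="200CC" Original="FE53"/>
  <Encoding Target="215D7" Original="FE6C"/>
  <Encoding Target="2298F" Original="FE76"/>
  <Encoding Target="241FE" Original="FE91"/>
</CompositeFont>"""


def remap_table(root):
    """Expand every re-encoding entry into {composite code: original code}."""
    table = {}
    for enc in root.findall("Encoding"):
        start, _, end = enc.get("Target").partition("-")
        lo, hi = int(start, 16), int(end or start, 16)
        base = int(enc.get("Original"), 16)
        for cp in range(lo, hi + 1):
            if cp in table:
                raise ValueError(f"U+{cp:04X} is re-encoded twice")
            table[cp] = base + (cp - lo)
    return table


table = remap_table(ET.fromstring(ENTRIES))

# Hypothetical cmap of a PUA-encoded Component Font: code point -> glyph ID.
font_cmap = {pua: gid
             for gid, pua in enumerate(sorted(set(table.values())), start=1)}


def composite_lookup(cp):
    """Glyph ID for cp: re-encoded codes are redirected, others pass through."""
    return font_cmap.get(table.get(cp, cp))


# Both the non-PUA and the original PUA code point reach the same glyph.
print(composite_lookup(0x9FB7) == composite_lookup(0xFE67))  # True
```

Raising an error on overlapping Targets is a choice made for this sketch; the proposal does not say what should happen if two entries re-encode the same code point.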
> 
> I understand that the above is a lot to digest (especially considering  
> that I polished off half a bottle of red wine, followed by two shots  
> of brandy), but if anyone has any comments or feedback, please post it  
> to the mailing list.
> 
> Regards...
> 
> -- Ken