[mpeg-OTspec] Composite Font syntax
Karsten Luecke
karstenluecke at yahoo.de
Thu Jun 25 14:47:20 CEST 2009
Looks elegant. To be sure I understand:
(1)
How would the re-encoding mechanism fold in? Starting from your example:
<!-- Latin -->
<Encoding Target="0000-007F">
<ComponentFont BaselineShift="12" ScaleFactor="95"
Target="LatinFont-1"/>
</Encoding>
Since ComponentFont is inside Encoding, does one need to define
ComponentFont again for each such re-encoding entry? E.g., with nonsense
values:
<!-- Latin -->
<Encoding Target="0000-007D">
<ComponentFont BaselineShift="12" ScaleFactor="95"
Target="LatinFont-1"/>
</Encoding>
<Encoding Target="007E" Original="FE59">
<ComponentFont BaselineShift="12" ScaleFactor="95"
Target="LatinFont-1"/>
</Encoding>
<Encoding Target="007F" Original="FE61">
<ComponentFont BaselineShift="12" ScaleFactor="95"
Target="LatinFont-1"/>
</Encoding>
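For reference, a minimal Python sketch of the code point mapping that such Target/Original pairs would imply. It assumes, as in the "4E00-4EFF" / "00" example in the quoted message, that Original names only the first source code point and that the rest of the run follows consecutively. The function names parse_target and reencode are made up for illustration, and carrying Original across a multi-range Target is my assumption, not something the proposal states.

```python
def parse_target(target):
    """Split a Target string such as "0028-0029, 002C" into (first, last) pairs."""
    ranges = []
    for part in target.split(","):
        part = part.strip()
        if not part:
            continue
        start, _, end = part.partition("-")
        first = int(start, 16)
        # A single code point is treated as a range of length 1.
        last = int(end, 16) if end else first
        ranges.append((first, last))
    return ranges


def reencode(target, original):
    """Map Composite Font code points to Component Font code points.

    Assumption: Original gives only the first source code point; the
    remaining code points of the Target run follow consecutively.
    """
    mapping = {}
    source = int(original, 16)
    for first, last in parse_target(target):
        for code_point in range(first, last + 1):
            mapping[code_point] = source
            source += 1
    return mapping


# The two single-code-point entries from the example above.
print(reencode("007E", "FE59"))
print(reencode("007F", "FE61"))
```

Under this reading, every re-encoded run needs its own Encoding element, which is why the question of repeating ComponentFont arises.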
(2)
Following the description of the fallback mechanism, could one do this?
<!-- ENGLISH -->
<Language Target="eng">
<Encoding Target="0028-0029, 002C, 002E, 0041-005A, 0061-007A,
2018-2019, 201C-201D"><!-- POSSIBLY OBSOLETE IN EXAMPLE -->
<ComponentFont Target="LatinFont-1"/>
<ComponentFont Target="LatinFont-2"/>
</Encoding>
</Language>
<!-- OTHER LATIN-SCRIPT LANGUAGES, without Language -->
<Encoding Target="0000-[...]">
<ComponentFont Target="LatinFont-1"/>
<ComponentFont Target="LatinFont-2"/>
</Encoding>
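To spell out the lookup order I have in mind, here is a small Python sketch. It assumes entries are tried in document order, that a Language entry only applies when the text is tagged with that language, and that within an entry the first Component Font containing the glyph wins. The coverage table, the "0000-007F" stand-in range, and the names resolve and has_glyph are hypothetical, chosen only for this illustration.

```python
# Hypothetical glyph coverage; a real implementation would query each font's cmap.
COVERAGE = {
    "LatinFont-1": set(range(0x41, 0x5B)) | set(range(0x61, 0x7B)),
    "LatinFont-2": set(range(0x20, 0x7F)) | set(range(0x2018, 0x201E)),
}

# Entries in document order: (language or None, code point ranges, fonts).
ENTRIES = [
    ("eng",
     [(0x28, 0x29), (0x2C, 0x2C), (0x2E, 0x2E), (0x41, 0x5A),
      (0x61, 0x7A), (0x2018, 0x2019), (0x201C, 0x201D)],
     ["LatinFont-1", "LatinFont-2"]),
    # Stand-in range only: the upper bound is left open in the example above.
    (None, [(0x0000, 0x007F)], ["LatinFont-1", "LatinFont-2"]),
]


def has_glyph(font, code_point):
    """Return True if the (hypothetical) font covers the code point."""
    return code_point in COVERAGE.get(font, set())


def resolve(code_point, language, entries=ENTRIES):
    """Return the first matching font name, or None if nothing matches."""
    for entry_language, ranges, fonts in entries:
        # Skip language-specific entries that do not match the text language.
        if entry_language is not None and entry_language != language:
            continue
        if not any(first <= code_point <= last for first, last in ranges):
            continue
        # Fonts are listed in order of preference; later ones are fallbacks.
        for font in fonts:
            if has_glyph(font, code_point):
                return font
    return None


print(resolve(0x2018, "eng"))  # falls through LatinFont-1 to its fallback
print(resolve(0x41, "deu"))    # handled by the entry without Language
```

If an entry matches but none of its fonts has the glyph, this sketch simply continues with the next entry; whether that is the intended behaviour is part of the question.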
(3)
Even if given as a percentage, perhaps ScaleFactor had better be a float
rather than an integer, for higher precision?
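A quick Python calculation of the granularity at stake, assuming the 2048-unit em mentioned in the quoted message and a plain "units times percent divided by 100" reading of ScaleFactor. The helper name scaled_units is made up for this illustration.

```python
def scaled_units(units, scale_percent):
    """Scale a length in font units by a percentage."""
    return units * scale_percent / 100.0


EM = 2048  # upper end of the design spaces mentioned in the proposal

# Size of one step when ScaleFactor is a whole percent versus a tenth of a percent.
whole_percent_step = scaled_units(EM, 1)
tenth_percent_step = scaled_units(EM, 0.1)

print(whole_percent_step, tenth_percent_step)
print(scaled_units(EM, 95), scaled_units(EM, 95.5))
```

With integer percentages, the smallest adjustment on a 2048-unit em is about 20 units; a float value allows finer matching of, say, x-heights between Component Fonts.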
Best wishes,
Karsten
Ken Lunde wrote:
> All,
>
> Mikhail Leonov, daan Strebe, and I have spent some time since the last
> meeting in private discussions related to the syntax of the
> Composite Font format, and I am now prepared to share with the AHG
> what we have agreed to. Any delay in bringing this discussion to the
> AHG is due to me. And for that, I apologize.
>
> There are three basic elements for the syntax that describes the
> functional portion of the Composite Font recipe:
>
> Language
> Component Fonts
> Unicode Ranges
>
> (As a side note, we also discussed the notion of "Script" as an
> alternative to specifying Unicode ranges, and agreed to defer that
> portion of the discussion in order to bring the agreed portions to the
> rest of the group.)
>
> Given the "elective detail" principle, along with the acknowledgment
> that the intentions and needs of creators and consumers are diverse,
> these three Composite Font elements must be specified in a hierarchy,
> and the hierarchy depends on the intent of the creator.
> Furthermore, in adhering to the "elective detail" principle, none of
> these elements is required to be specified, except for a minimum
> of one Component Font.
>
> We have agreed that a minimal Composite Font specifies a single
> Component Font, such as the following:
>
> <ComponentFont Target="LatinFont-1"/>
>
> This is functionally equivalent to the form that uses start- and end-
> tags:
>
> <ComponentFont Target="LatinFont-1"></ComponentFont>
>
> Another minimal form of a Composite Font is simply an ordered list of
> Component Fonts, and the order in which they appear in the list is
> their order of preference, and would function as a fallback font:
>
> <ComponentFont Target="LatinFont-1, LatinFont-2"/>
>
> or:
>
> <ComponentFont Target="LatinFont-1"/>
> <ComponentFont Target="LatinFont-2"/>
>
> The latter form is necessary if one of the Component Fonts requires an
> attribute not shared by another:
>
> <ComponentFont Target="LatinFont-1"/>
> <ComponentFont ScaleFactor="110" Target="LatinFont-2"/>
>
> Note that tag attributes are used to specify the content of the tags,
> as opposed to string data between the start- and end-tags. When an
> element has no further hierarchical information, the empty-element tag
> form can be used, and when there is further hierarchy to be specified,
> the start- and end-tag form must be used.
>
> The <ComponentFont> tag example above provided one attribute,
> specifically "Target", which specifies the name of the Component Font.
> Other attributes for this tag can include "BaselineShift" and
> "ScaleFactor" values. Given that design spaces of Component Fonts can
> be diverse, ranging from 256- to 2048-em, with 1000-em being typical
> for PostScript-based fonts, these values are best specified as
> percentages. The ScaleFactor attribute obviously benefits from this.
> The following is an example that uses all of the attributes:
>
> <ComponentFont ScaleFactor="110" BaselineShift="-2"
> Target="LatinFont-1"/>
>
> In other words, the Component Font named "LatinFont-1" is scaled to
> 110% of its size, and a -2% baseline shift is performed.
>
> The other elements are specified as the following tags:
>
> <Language>
> <Encoding>
>
> The <Language> tag uses "Target" as its attribute, which specifies one
> or more three-letter ISO 639-2/T language codes. The <Encoding> tag
> also uses "Target" as its primary attribute, which specifies Unicode
> ranges and Unicode code points, separated by commas. Some examples:
>
> <Language Target="jpn"/>
> <Encoding Target="4E00-9FCB"/>
>
> Let us consider John Hudson's example of mixed English and Devanagari
> text. The issue for that example was about language tagging:
> punctuation that is common across languages, and thus shares the same
> Unicode code points, needs appropriate treatment. Let us consider only
> the following code points for this example:
>
> Punctuation: 0028-0029, 002C, 002E, 2018-2019, 201C-201D
> Latin: 0041-005A, 0061-007A
> Devanagari: 0900-097F
>
> When the language is English (or another Latin-based one), the
> punctuation should be from the font intended for English, and
> likewise, when the language is Hindi, the punctuation should be from
> the Devanagari font.
>
> <!-- English -->
> <Language Target="eng">
> <Encoding Target="0028-0029, 002C, 002E, 0041-005A, 0061-007A,
> 2018-2019, 201C-201D">
> <ComponentFont BaselineShift="25" ScaleFactor="108"
> Target="LatinFont-1"/>
> <ComponentFont Target="LatinFont-2"/>
> </Encoding>
> </Language>
>
> <!-- Hindi -->
> <Language Target="hin">
> <Encoding Target="0028-0029, 002C, 002E, 0900-097F, 2018-2019,
> 201C-201D">
> <ComponentFont Target="DevanagariFont-1"/>
> </Encoding>
> </Language>
>
> If we were to implement Composite Font support equivalent to what
> Adobe applications provide, I think that the following represents a
> good example, though it is highly simplified (and incomplete) for the
> purpose of this explanation:
>
> <!-- Latin -->
> <Encoding Target="0000-007F">
> <ComponentFont BaselineShift="12" ScaleFactor="95"
> Target="LatinFont-1"/>
> </Encoding>
>
> <!-- Japanese Punctuation -->
> <Encoding Target="3000-303F">
> <ComponentFont Target="JapaneseFont-1"/>
> </Encoding>
>
> <!-- Japanese Kana -->
> <Encoding Target="3041-30FF">
> <ComponentFont Target="KanaFont-1"/>
> </Encoding>
>
> <!-- Everything Else -->
> <ComponentFont Target="JapaneseFont-2"/>
>
> Note how "JapaneseFont-2" serves as the Base Font, because it does not
> declare any other elements or attributes. The Component Fonts that are
> declared prior to that line take precedence, and are used for the
> specific Unicode ranges that are declared. I could have declared a
> Unicode range for the last font, but for the purpose of this Composite
> Font, it is not necessary, because any and all glyphs that it contains
> can be used, other than those masked by previous declarations in the
> Composite Font.
>
> Also note that the language was not declared, because it is not
> important for this specific Composite Font. This adheres to the
> "elective detail" principle.
>
> I cannot think of any other attributes for the <Language> tag other
> than "Target" to specify a three-letter ISO 639-2/T language code. For
> the <Encoding> tag, it should be possible to re-encode a Component
> Font, and an "Original" attribute can be used. Consider the following
> two scenarios, both of which are very real:
>
> 1) To be able to add glyphs from a Component Font that are encoded in
> one way (such as according to a single-byte encoding), but to encode
> them according to a different encoding in the Composite Font, such as
> in the PUA region. Legacy Composite Font mechanisms referred to this
> as the ability to add "gaiji" to fonts. The re-encoding was a
> necessary step. Basically, this means re-encoding a single-byte
> Component Font so that the glyphs are accessed via character codes in
> the Composite Font.
>
> 2) To be able to change the encoding of a select number of glyphs in a
> Component Font. A good example is GB 18030 glyphs that are encoded
> using PUA code points, but can be encoded according to non-PUA code
> points. It would be reasonable for a Composite Font definition to
> perform this function. I would claim that both code points (PUA in the
> original Component Font and non-PUA in the Composite Font) could
> result in the same glyph when accessed via the Composite Font.
>
> The way that these were handled in legacy Composite Font formats was
> to specify encoding ranges for the Composite Font, which could be
> length=1 (meaning that the start and end character code are the same)
> at a minimum, and specify only the start character code for the
> Component Font. For example:
>
> <Encoding Target="4E00-4EFF" Original="00"/>
>
> In other words, U+4E00 through U+4EFF in the Composite Font are mapped
> to 0x00 through 0xFF in the Component Font.
>
> Here is a good example of GB 18030 characters that are often
> PUA-encoded in fonts, but could be re-encoded in Composite Fonts via
> this mechanism:
>
> <Encoding Target="9FB4" Original="FE59"/>
> <Encoding Target="9FB5" Original="FE61"/>
> <Encoding Target="9FB6-9FB7" Original="FE66"/>
> <Encoding Target="9FB8" Original="FE6D"/>
> <Encoding Target="9FB9" Original="FE7E"/>
> <Encoding Target="9FBA" Original="FE90"/>
> <Encoding Target="9FBB" Original="FEA0"/>
> <Encoding Target="20087" Original="FE51"/>
> <Encoding Target="20089" Original="FE52"/>
> <Encoding Target="200CC" Original="FE53"/>
> <Encoding Target="215D7" Original="FE6C"/>
> <Encoding Target="2298F" Original="FE76"/>
> <Encoding Target="241FE" Original="FE91"/>
>
> I understand that the above is a lot to digest (especially considering
> that I polished off half a bottle of red wine, followed by two shots
> of brandy), but if anyone has any comments or feedback, please post it
> to the mailing list.
>
> Regards...
>
> -- Ken