[MPEG-OTSPEC] dmap proposal

Skef Iterum skef at skef.org
Fri Dec 22 01:49:04 CET 2023


Maybe I'm being utopian but I can't help thinking that either there's 
some token ("dialect"?) that Unicode should be tracking and formalizing 
but isn't, or Unicode is doing that and we haven't tilted the font 
specifications enough in its direction to use it. There's already all of 
that script and language infrastructure there that is meant for this 
flavor of need, and it seems like a much better place to be solving 
these problems than wrapping stuff up in a TTC and having the client side
pick out the sub-font by name or whatever.

Skef

On 12/21/23 15:00, Peter Constable wrote:
>
> During the recent AHG meeting, I mentioned that Apple, Adobe and 
> Microsoft, some years ago, had started discussing a ‘dmap’ (delta 
> character map) table proposal. This was in late fall of 2016; the 
> focus was on pan-CJK fonts, and in that timeframe Ken Lunde had 
> submitted a proposal to UTC (L2/16-063 Proposal to accept the 
> submission to register the “PanCJKV” IVD collection 
> <https://www.unicode.org/L2/L2016/16063-pancjkv-ivd-collection.pdf>) 
> to define variation sequences for ideographs that designated a range 
> of variation selector characters to correspond to several regions for 
> which regional glyph variants of CJK ideographs might need to be 
> supported. I managed to find an archive of some emails from 
> discussions at the time, so can summarize:
>
> The aim was to be able to support distinct fonts for regional CJK 
> variants without duplication of data. A TTC could allow de-duplication 
> of glyph data, but there would be other duplication. We agreed the 
> biggest concern was with ‘cmap’ data: If any one of the regional 
> variant fonts in the collection were taken as a point of reference, 
> then any of the other regional variants would have many of the same 
> mappings (perhaps most), though not all the same mappings. But there 
> wasn’t any existing means to share common mappings across fonts while 
> there were also some different mappings. Dwane Robinson suggested that 
> we define a new ‘dmap’ table that uses ‘cmap’ formats but is just used 
> to describe the differences in mappings from a common ‘cmap’.
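A minimal sketch of the delta idea, using plain Python dicts as stand-ins for ‘cmap’ subtables (code point to glyph ID). The function names and glyph IDs are hypothetical. The sketch assumes each regional font maps every code point that the shared ‘cmap’ maps, because a delta that can only add or replace mappings has no way to express a removal.

```python
def make_dmap(shared_cmap: dict[int, int], font_cmap: dict[int, int]) -> dict[int, int]:
    """Keep only the mappings in font_cmap that differ from shared_cmap."""
    missing = shared_cmap.keys() - font_cmap.keys()
    if missing:
        # A replace-only delta cannot say "this code point is unmapped here".
        raise ValueError(f"cannot express removed mappings: {sorted(missing)}")
    return {cp: gid for cp, gid in font_cmap.items() if shared_cmap.get(cp) != gid}


def effective_cmap(shared_cmap: dict[int, int], dmap: dict[int, int]) -> dict[int, int]:
    """Overlay a font's delta on the shared mapping to recover its full cmap."""
    merged = dict(shared_cmap)
    merged.update(dmap)
    return merged


# Hypothetical glyph IDs: two regional fonts differ only for U+9AA8.
shared = {0x4E00: 10, 0x4E01: 11, 0x9AA8: 12}
regional = {0x4E00: 10, 0x4E01: 11, 0x9AA8: 40}
delta = make_dmap(shared, regional)
assert delta == {0x9AA8: 40}
assert effective_cmap(shared, delta) == regional
```

Only the one differing mapping has to be stored per regional font. The rest comes from the shared ‘cmap’.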
>
> We agreed that a ‘dmap’ table doesn’t need the duplication of 
> different platforms/encodings, and that we can converge on only one 
> platform/encoding (hence, no encoding records are necessary). We 
> discussed format 4 versus 12, and agreed to allow either, but that 
> both are never required. Now, we had teleconfs between Apple and MS, 
> but the emails I found indicate that Behdad was also kept informed: 
> one of the emails records that Behdad requested that format 13 also be 
> allowed.
>
> We hadn’t settled, however, on what to do about format 14 subtables. 
> It wasn’t a priority for Apple at the time, but it seemed like it 
> would be incomplete if we ignored it. Knowing that Ken Lunde was 
> dealing a lot with VSes and was also working on the pan-CJK Source Han
> Sans, we brought Adobe into our discussion at that point.
>
> The issue with format 14 is that it divides variation sequences into 
> two groups: (i) VSes that map to the same glyph already mapped in a 
> format 4 or 12 subtable (DefaultUVS), and (ii) VSes that map to a 
> different glyph. Certainly the default mappings would be different in 
> the various regional variant fonts, and some of the non-default 
> mappings could also be different. (Even if a given VS never mapped to 
> different glyphs in the different fonts, the fonts could still differ 
> in what VSes they need to support.) So it’s necessary to resolve how a 
> dmap/14 subtable should interact with a cmap/4 (or cmap/12) subtable, 
> with a cmap/14 subtable, with a dmap/4 (or dmap/12) subtable, and with 
> a dmap/14 subtable. One possible approach would be that the dmap/14 
> subtable completely supersedes the cmap/14 subtable (i.e., the latter 
> is not used at all, and there is no de-duplication of that data). 
> Another approach could be that a dmap/14 subtable complements the 
> cmap/14 subtable by providing select replacement mappings (a 
> delta—though there are still further details about how that would work 
> exactly).
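The two candidate behaviors can be contrasted in a small sketch. Format 14 data is modeled as a dict from (base character, variation selector) to a glyph ID, with None marking a DefaultUVS entry. The names and values are hypothetical. The "complement" function is only one possible reading of "select replacement mappings", since the thread leaves those details open.

```python
from typing import Optional

# (base code point, variation selector) -> glyph ID, or None for a DefaultUVS entry.
UVSTable = dict[tuple[int, int], Optional[int]]


def supersede(cmap14: UVSTable, dmap14: Optional[UVSTable]) -> UVSTable:
    """Approach 1: a dmap/14 subtable, if present, replaces cmap/14 outright."""
    return dict(dmap14) if dmap14 is not None else dict(cmap14)


def complement(cmap14: UVSTable, dmap14: Optional[UVSTable]) -> UVSTable:
    """Approach 2: dmap/14 entries override matching cmap/14 entries; the rest are kept.

    This simple overlay cannot remove a sequence that cmap/14 supports.
    """
    merged = dict(cmap14)
    if dmap14:
        merged.update(dmap14)
    return merged


shared14 = {(0x8FBB, 0xE0100): None, (0x8FBB, 0xE0101): 500}
delta14 = {(0x8FBB, 0xE0101): 600}
assert supersede(shared14, delta14) == {(0x8FBB, 0xE0101): 600}
assert complement(shared14, delta14) == {(0x8FBB, 0xE0100): None, (0x8FBB, 0xE0101): 600}
```

With supersede, none of the cmap/14 data is reused. With complement, shared sequences are kept, but dropping a sequence would need some extra mechanism.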
>
> There were some useful points brought up along the way:
>
>   * Ned Holbrook pointed out that the format 14 DefaultUVS subtable is
>     just a space-saving variant of the NonDefaultUVS subtable. A font
>     doesn’t need to have any DefaultUVS table: the same sequences
>     could be handled in NonDefaultUVS subtables (less efficiently),
>     _in a single font_.
>   * For CJK, Ken Lunde pointed out that there are two kinds of UVSes
>     to consider:
>       o “Standardized” VSes: these are defined in the Unicode Standard
>         (see <https://www.unicode.org/Public/UCD/latest/ucd/StandardizedVariants.txt>)
>         for CJK Compatibility Ideographs. They are defined in Unicode
>         in a region-independent manner, but most represent
>         region-specific glyphs.
>       o “Ideographic” VSes: these are VSes registered in the
>         Ideographic Variation Database (<https://www.unicode.org/ivd/>)
>         in region-specific collections.
>
> Because of the nature of each type, Ken thought there might be limited 
> sharing across fonts. (E.g., at least some font developers would want 
> to support a given IVS collection only in the one regional font for 
> the corresponding region.) He did identify cases, however, in which 
> the same SVS would need to map to different glyphs in different fonts.
>
>   * Again, for CJK, there would be cases in which different fonts
>     would need to support the same VSes, but they would differ with
>     respect to DefaultUVS vs. NonDefaultUVS mappings.
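Ned's point, and why it holds only within a single font, can be illustrated with a sketch. It rewrites DefaultUVS entries as explicit NonDefaultUVS mappings by resolving them through one font's ‘cmap’. The function name, glyph IDs, and the choice of U+9AA8 with VS17 (U+E0100) are hypothetical.

```python
def default_to_nondefault(
    default_uvs: set[tuple[int, int]], cmap: dict[int, int]
) -> dict[tuple[int, int], int]:
    """Rewrite DefaultUVS entries as explicit NonDefaultUVS mappings for one font.

    A DefaultUVS entry means "use whatever glyph the cmap gives the base
    character", so the explicit glyph ID depends on that font's cmap.
    """
    return {(base, vs): cmap[base] for base, vs in default_uvs}


# The same sequence is a DefaultUVS entry in two regional fonts whose
# cmaps map U+9AA8 to different glyphs.
seq = (0x9AA8, 0xE0100)
jp = default_to_nondefault({seq}, {0x9AA8: 12})
cn = default_to_nondefault({seq}, {0x9AA8: 40})
assert jp[seq] == 12 and cn[seq] == 40  # same DefaultUVS data, different explicit glyphs
```

In DefaultUVS form the two fonts' data is identical and could be shared. Once made explicit, it differs, which is one reason the Default/NonDefault split matters for sharing across a collection.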
>
> Ken also called out some other uses in email exchanges. It all 
> suggested that an ideal solution would make it possible to construct a 
> collection file in which
>
> - two or more fonts can share some UVS mapping data while also having 
> some font-specific mapping data; and
>
> - it's also possible to have other fonts that do not share any UVS 
> mapping data with other fonts.
>
> That would allow the fonts to support only UVSs that are relevant for 
> their respective markets, while also having an efficiency benefit from 
> data-sharing between certain of the fonts.
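A sketch of the data split that these two requirements imply. Mappings identical in every font of a sharing group go into one shared table. Everything else stays per-font, and fonts outside the group share nothing. The function, font names, and glyph IDs are hypothetical, and only the partitioning is shown, not any table format.

```python
def partition_uvs(
    fonts: dict[str, dict[tuple[int, int], int]], group: set[str]
) -> tuple[dict[tuple[int, int], int], dict[str, dict[tuple[int, int], int]]]:
    """Split UVS mappings into data shared by all fonts in `group` plus per-font remainders."""
    first, *rest = (fonts[name] for name in sorted(group))
    # A mapping is shareable only if every font in the group has exactly the same one.
    shared = {k: v for k, v in first.items() if all(k in f and f[k] == v for f in rest)}
    specific = {
        # Group members drop what the shared table already covers;
        # fonts outside the group keep all of their own mappings.
        name: {k: v for k, v in m.items() if not (name in group and k in shared)}
        for name, m in fonts.items()
    }
    return shared, specific


fonts = {
    "JP": {(0x8FBB, 0xE0100): 1, (0x8FBB, 0xE0101): 2},
    "TW": {(0x8FBB, 0xE0100): 1, (0x8FBB, 0xE0101): 3},
    "KR": {(0x8FBB, 0xE0100): 1},
}
shared_uvs, specific_uvs = partition_uvs(fonts, {"JP", "TW"})
assert shared_uvs == {(0x8FBB, 0xE0100): 1}
assert specific_uvs["KR"] == {(0x8FBB, 0xE0100): 1}  # outside the group: nothing shared
```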
>
> That was in December 2016. We ran into end-of-year holidays and never 
> resumed to close on an approach that optimizes the size of VS mapping data.
>
> The following is the last draft proposal that we exchanged.
>
> ----
>
> *dmap - Character to Glyph Index Differences Table*
>
> This table is an optional adjunct to the ‘cmap’ table defining 
> differences from the nominal mappings in order to increase sharing of 
> the ‘cmap’ itself across fonts in a TTC.
>
> If a font production tool determines that the ‘cmap’ tables across the 
> fonts in a TTC are largely but not entirely identical, it can choose 
> one font to be used as the basis for the others in terms of character 
> to glyph index mapping, expressing the mappings of the other fonts 
> using only the mappings that differ from those of the basis font.
> An example would be a CJK font family with region-specific 
> fonts, where most characters would map to the same glyph index.
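The draft leaves the choice of basis font to the production tool. One simple heuristic, assumed here rather than taken from the proposal, is to try each font as the basis and keep the one that minimizes the total number of delta entries. Entry count is only a rough proxy for encoded size, because format 4/12 subtable size depends on how entries group into ranges. Function names, font names, and glyph IDs are hypothetical.

```python
def delta_size(basis: dict[int, int], font: dict[int, int]) -> int:
    """Number of 'dmap' entries `font` would need relative to `basis`.

    Code points mapped by the basis but absent from `font` are ignored here.
    """
    return sum(1 for cp, gid in font.items() if basis.get(cp) != gid)


def choose_basis(cmaps: dict[str, dict[int, int]]) -> str:
    """Pick the font whose cmap minimizes the total delta size of all the others."""
    return min(
        cmaps,
        key=lambda b: sum(delta_size(cmaps[b], cmaps[o]) for o in cmaps if o != b),
    )


cmaps = {
    "A": {1: 1, 2: 2},
    "B": {1: 1, 2: 20},
    "C": {1: 10, 2: 20},
}
assert choose_basis(cmaps) == "B"  # B costs 2 entries in total; A and C cost 3 each
```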
>
> The ‘dmap’ table
>
> Type     Name               Description
> UInt16   version            Set to 0.
> UInt16   numTables          Number of offset fields to follow.
> UInt32   offset[numTables]  Array of byte offsets from beginning of table to
>                             cmap subtables. All subtables are assumed to use
>                             Unicode. There can be at most one subtable of
>                             either format 4, 12, or 13.
>
> Each ‘dmap’ subtable shall have the same structure as the corresponding 
> ‘cmap’ subtable format, starting with a format field that determines 
> the remainder. The language field for a format 4, 12, or 13 subtable 
> must be set to zero.
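A sketch of reading the draft header and enforcing the subtable constraints above. It assumes big-endian fields, as elsewhere in OpenType, and the standard ‘cmap’ subtable header layouts: the language field is a uint16 at byte 4 in format 4 and a uint32 at byte 8 in formats 12 and 13. Because format 14 was dropped from this draft, any other format is rejected. `parse_dmap` is a hypothetical name.

```python
import struct


def parse_dmap(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, format) for each subtable of a draft 'dmap' table.

    Layout per the draft: UInt16 version, UInt16 numTables, UInt32 offset[numTables].
    """
    version, num_tables = struct.unpack_from(">HH", data, 0)
    if version != 0:
        raise ValueError(f"unsupported dmap version {version}")
    offsets = struct.unpack_from(f">{num_tables}I", data, 4)
    result = []
    for off in offsets:
        (fmt,) = struct.unpack_from(">H", data, off)
        if fmt == 4:
            (language,) = struct.unpack_from(">H", data, off + 4)
        elif fmt in (12, 13):
            (language,) = struct.unpack_from(">I", data, off + 8)
        else:
            raise ValueError(f"format {fmt} is not allowed in this draft")
        if language != 0:
            raise ValueError(f"format {fmt} subtable must have language 0")
        result.append((off, fmt))
    if len(result) > 1:
        raise ValueError("at most one format 4, 12, or 13 subtable is allowed")
    return result


# Example: a table with one empty format 12 subtable at offset 8.
table = struct.pack(">HHI", 0, 1, 8) + struct.pack(">HHIII", 12, 0, 16, 0, 0)
assert parse_dmap(table) == [(8, 12)]
```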
>
> The steps for determining the glyph index for a given UVS consisting 
> of a base character and optional variation selector are as follows:
>
>  1. Apply the Unicode ‘cmap’ subtable to the base character to get the
>     nominal glyph index.
>  2. If the font has a ‘dmap’ format 4 or 12 subtable that maps the
>     base character to a non-zero glyph index, it will replace the
>     nominal glyph index.
>  3. If the ‘cmap’ has a format 14 subtable, apply it in this way:
>
>     3.1. If the Default UVS Table contains the base character, the final
>          glyph index will be the one determined by the ‘cmap’.
>     3.2. Else if the Non-Default UVS Table contains the base character,
>          it will determine the final glyph index.
>     3.3. Else the final glyph index will remain as it was after step 2.
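A sketch of these steps, with dicts and sets standing in for the subtables. Three readings here are assumptions that the draft does not settle. First, "contains the base character" is taken to mean that the table lists the (base character, variation selector) pair. Second, step 3 runs only when a selector is present. Third, "determined by the ‘cmap’" in 3.1 is read literally as the step 1 result, so a DefaultUVS entry would undo a ‘dmap’ override. `resolve_glyph` is a hypothetical name.

```python
from typing import Optional


def resolve_glyph(
    base: int,
    vs: Optional[int],
    cmap: dict[int, int],
    dmap: dict[int, int],
    default_uvs: set[tuple[int, int]],
    nondefault_uvs: dict[tuple[int, int], int],
) -> int:
    # Step 1: nominal glyph from the shared Unicode 'cmap' (0 means .notdef).
    nominal = cmap.get(base, 0)
    gid = nominal
    # Step 2: a non-zero 'dmap' mapping replaces the nominal glyph.
    if dmap.get(base, 0) != 0:
        gid = dmap[base]
    # Step 3: consult the 'cmap' format 14 data only when a selector is present.
    if vs is not None:
        if (base, vs) in default_uvs:
            gid = nominal  # 3.1, read literally: the 'cmap' result, ignoring 'dmap'
        elif (base, vs) in nondefault_uvs:
            gid = nondefault_uvs[(base, vs)]  # 3.2
        # 3.3: otherwise keep the result of step 2
    return gid


cmap = {0x9AA8: 12}
dmap = {0x9AA8: 40}
default_uvs = {(0x9AA8, 0xE0100)}
nondefault_uvs = {(0x9AA8, 0xE0101): 77}
assert resolve_glyph(0x9AA8, None, cmap, dmap, default_uvs, nondefault_uvs) == 40
assert resolve_glyph(0x9AA8, 0xE0100, cmap, dmap, default_uvs, nondefault_uvs) == 12
```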
>
> Note: An earlier draft of this document allowed for a second subtable 
> of format 14, which would allow redefinition of variation sequences. 
> Owing to uncertainty about usefulness and the exact behavior of the 
> Default UVS Table, however, it has been removed pending further 
> discussion.
>
>
>>
> In the previous draft, a different set of steps for handling UVSes 
> was considered:
>
>>
> The steps for determining the glyph index for a given UVS consisting 
> of a base character and optional variation selector are as follows:
>
> 1. Apply the ‘cmap’ to the base character to get the nominal glyph index.
>
> 2. If the font has a ‘dmap’ format 4 or 12 subtable that maps the base 
> character to a non-zero glyph index, it will replace the nominal glyph 
> index.
>
> 3. If the ‘dmap’ has a format 14 subtable, it will be used in place of 
> the one in the ‘cmap’.
>
> 4. If there is a format 14 subtable, apply it in this way:
>
>    4.1. If the Default UVS Table contains the base character, the final
>         glyph index will be the one determined by the ‘cmap’.
>    4.2. Else if the Non-Default UVS Table contains the base character,
>         it will determine the final glyph index.
>    4.3. Else the final glyph index will remain as it was after step 2.
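This earlier variant differs only in step 3: a ‘dmap’ format 14 subtable replaces the ‘cmap’ one outright, so none of the ‘cmap’ format 14 data is reused in that font. The sketch makes the same assumed readings as any literal implementation would need. Table membership is tested on (base character, variation selector) pairs, step 4 runs only when a selector is present, and "determined by the ‘cmap’" is read as the step 1 result. `resolve_glyph_prev` and `Format14` are hypothetical names.

```python
from typing import Optional

# (DefaultUVS sequences, NonDefaultUVS mappings) for one format 14 subtable.
Format14 = tuple[set[tuple[int, int]], dict[tuple[int, int], int]]


def resolve_glyph_prev(
    base: int,
    vs: Optional[int],
    cmap: dict[int, int],
    dmap: dict[int, int],
    cmap14: Optional[Format14],
    dmap14: Optional[Format14] = None,
) -> int:
    nominal = cmap.get(base, 0)  # step 1
    gid = dmap[base] if dmap.get(base, 0) != 0 else nominal  # step 2
    # Step 3: a 'dmap' format 14 subtable replaces the 'cmap' one entirely.
    fmt14 = dmap14 if dmap14 is not None else cmap14
    if vs is not None and fmt14 is not None:  # step 4
        default_uvs, nondefault_uvs = fmt14
        if (base, vs) in default_uvs:
            gid = nominal  # 4.1, read literally as the 'cmap' result
        elif (base, vs) in nondefault_uvs:
            gid = nondefault_uvs[(base, vs)]  # 4.2
        # 4.3: otherwise keep the result of step 2
    return gid


prev_cmap14 = (set(), {(0x8FBB, 0xE0101): 6})
assert resolve_glyph_prev(0x8FBB, 0xE0101, {0x8FBB: 5}, {}, prev_cmap14) == 6
# An empty dmap/14 hides every cmap/14 mapping, so the step 2 result is kept.
assert resolve_glyph_prev(0x8FBB, 0xE0101, {0x8FBB: 5}, {}, prev_cmap14, (set(), {})) == 5
```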
>
>>
> Peter
>
>