[MPEG-OTSPEC] Defining the text shaping working group’s scope

John Hudson john at tiro.ca
Tue Aug 4 19:19:28 CEST 2020


I have a slightly different take on shaping and layout:

Text shaping is a multi-step, multi-level process that proceeds from 
plain text encoding and results in display of text in a specific font or 
fonts. The first step is script itemisation of characters in the plain 
text, followed by run segmentation for subsequent glyph processing (the 
latter is the first thing that needs better documentation, as there are 
inconsistencies in how different software performs run segmentation). 
Next is a series of steps that I call orthographic unit shaping, in 
which the font cmap is used to access default glyph IDs for the 
characters in the run, and then a sequence of OpenType Layout GSUB 
features is applied to obtain the basic level of readable text display. 
The number of features involved and the complexity of this phase depend 
on the script and the language. Orthographic unit shaping for most 
European languages can be super simple, but may be more complex if 
diacritics require use of combining marks, or if the language has some 
atypical behaviour for the script. Orthographic unit shaping for scripts 
whose basic readable display involves joining behaviours, positional 
forms, reordering of glyphs, etc. may be significantly complex, 
involving not only performing substitutions but tracking the output of 
those substitutions within the glyph run. Shaping for some scripts, 
notably South and Southeast Asian scripts derived from the Brahmi model, 
may involve a (second) reordering of glyphs at the conclusion of the 
orthographic unit shaping stage. After orthographic unit shaping is 
completed, the same text engines are used to perform standard and 
conditional or discretionary typographic substitutions, and this is 
where using a single term, ‘shaping’, may start to be inadequate.
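
To make the first two steps concrete, here is a rough Python sketch of script itemisation followed by run segmentation. It is only an illustration under loud assumptions: `rough_script` is a hypothetical stand-in that guesses a script from the first word of the Unicode character name, whereas a real engine would use the Unicode Script and Script_Extensions properties. The rule that neutral characters join the preceding run is likewise just one arbitrary choice, and is exactly the kind of decision on which implementations differ.

```python
import unicodedata

# Assumption: only a handful of scripts are recognised, by name prefix.
KNOWN = {"LATIN", "ARABIC", "DEVANAGARI", "GREEK", "CYRILLIC", "HEBREW"}
NEUTRAL = {"Common", "Inherited"}


def rough_script(ch):
    """Crude stand-in for the Unicode Script property (not the real thing)."""
    if unicodedata.category(ch).startswith("M"):
        return "Inherited"  # combining marks take the script of their base
    name = unicodedata.name(ch, "")
    first = name.split(" ")[0] if name else ""
    return first.capitalize() if first in KNOWN else "Common"


def itemise(text):
    """Split text into (script, substring) runs.

    Neutral characters join the preceding run; a run that so far holds
    only neutral characters adopts the script of the next real character.
    """
    runs = []  # list of [script, substring]
    for ch in text:
        script = rough_script(ch)
        if not runs:
            runs.append([script, ch])
        elif script in NEUTRAL or script == runs[-1][0]:
            runs[-1][1] += ch
        elif runs[-1][0] in NEUTRAL:
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


if __name__ == "__main__":
    # Latin text followed by four Hebrew letters, written as escapes.
    for script, chunk in itemise("abc \u05e9\u05dc\u05d5\u05dd"):
        print(script, repr(chunk))
```

Each resulting run would then be handed to glyph processing separately, so where the boundaries fall (for instance, which run a space or a digit belongs to) directly affects what the font's lookups can see.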

I think some people may limit the definition of /shaping/ per se to 
orthographic unit shaping, i.e. to the process of getting from plain 
text encoding to basic readable text display. Which is why I don't think 
it is a good idea to call the proposed working group the ‘text shaping 
working group’: there is a broader scope of text processing, layout, and 
display that needs better documentation, up to and including text block 
layout (especially with regard to directionality).

For example, some of us have circled back several times over the past 
six years—without conclusion—to the topic of post-line breaking glyph 
processing, i.e. being able to perform substitutions and positioning 
independent of pre-line breaking run segmentation, orthographic unit 
shaping, and various standard and conditional/discretionary typographic 
layout features. Several ideas have been put forward at ad hoc OTL 
working group meetings, but no consensus has yet been reached. I would 
consider this definitely within scope of the proposed group.

So I would be inclined to define the scope thus:

  * Script itemisation
  * Run segmentation
  * Default glyph selection (cmap)
  * Low level glyph processing (lookup types and how to apply them;
    glyph tracking and reordering operations)
  * Orthographic unit shaping (general and script specific)
  * Standard typographic presentation
  * Conditional typographic presentation
  * Positioning (various kinds)
  * Line layout and line breaking
  * Justification and post-line breaking glyph processing
  * Paragraph and text block layout
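
As a toy illustration of the ‘default glyph selection’ and ‘low level glyph processing’ items, the Python sketch below maps characters to glyph IDs through a cmap-like table and then applies a single ligature substitution. Everything in it is hypothetical: the glyph IDs, the `CMAP` and `LIGATURES` tables, and the function names are invented for the example and are not read from any font. A real GSUB ligature lookup also involves coverage tables, lookup flags, and longer component sequences, none of which is modelled here.

```python
# Hypothetical toy font data, invented for this sketch.
NOTDEF = 0                      # glyph used when a character is unmapped
CMAP = {"f": 1, "i": 2, "n": 3}  # character -> default glyph ID
LIGATURES = {(1, 2): 4}         # glyph pair (f, i) -> fi ligature glyph


def default_glyphs(text):
    """Default glyph selection: one glyph ID per character via the cmap."""
    return [CMAP.get(ch, NOTDEF) for ch in text]


def apply_ligatures(glyphs):
    """Replace each matching adjacent glyph pair with its ligature glyph.

    Scans left to right and consumes both components on a match, so the
    output run can be shorter than the input run.
    """
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


if __name__ == "__main__":
    run = default_glyphs("fin")
    print(run, apply_ligatures(run))
```

Even in this tiny case the output no longer corresponds one-to-one with the input characters, which is why tracking glyphs through substitutions belongs in the scope alongside the lookup types themselves.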

For many of these items, we should aim to provide authoritative 
documentation, i.e. description of the correct way to perform 
operations, with reference implementation and test cases. For some of 
the later stages, we may only be able to provide best-practices advice.

So with that in mind, I would be inclined to use a broad, general term 
when naming the group, e.g. the Text Display Working Group.

JH


-- 

John Hudson
Tiro Typeworks Ltd    www.tiro.com
Salish Sea, BC        tiro at tiro.com

NOTE: In the interests of productivity, I am currently
dealing with email on only two days per week, usually
Monday and Thursday unless this schedule is disrupted
by travel. If you need to contact me urgently, please
use some other method of communication. Thank you.
