[MPEG-OTSPEC] Defining the text shaping working group’s scope
John Hudson
john at tiro.ca
Tue Aug 4 19:19:28 CEST 2020
I have a slightly different take on shaping and layout:
Text shaping is a multi-step, multi-level process that proceeds from
plain text encoding and results in display of text in a specific font or
fonts. The first step is script itemisation of characters in the plain
text, followed by run segmentation for subsequent glyph processing (the
latter is the first thing that needs better documentation, as there are
inconsistencies in how different software performs run segmentation).
Next is a series of steps that I call orthographic unit shaping, in
which the font cmap is used to access default glyph IDs for the
characters in the run, and then a sequence of OpenType Layout GSUB
features are applied to obtain the basic level of readable text display.
The number of features involved and the complexity of this phase depend
on the script and the language. Orthographic unit shaping for most
European languages can be super simple, but may be more complex if
diacritics require use of combining marks, or if the language has some
atypical behaviour for the script. Orthographic unit shaping for scripts
whose basic readable display involves joining behaviours, positional
forms, reordering of glyphs, etc. may be significantly complex,
involving not only performing substitutions but tracking the output of
those substitutions within the glyph run. Shaping for some scripts,
notably South and Southeast Asian scripts derived from the Brahmi model,
may involve a (second) reordering of glyphs at the conclusion of the
orthographic unit shaping stage. After orthographic unit shaping is
completed, the same text engines are used to perform standard and
conditional or discretionary typographic substitutions, and this is
where using a single term, ‘shaping’, may start to be inadequate.
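
To make the first two steps concrete, here is a minimal Python sketch of script itemisation followed by run segmentation. It is an illustration under stated assumptions, not a description of any particular engine: the SCRIPT_RANGES table is a hypothetical, heavily abbreviated stand-in for the full Unicode Script property, and the rule that Common and Inherited characters join the preceding script is only one of several heuristics in use (which is part of why run segmentation differs between implementations).

```python
import unicodedata
from itertools import groupby

# Hypothetical, abbreviated script table. Real engines use the complete
# Unicode Script property (Scripts.txt), which the Python stdlib lacks,
# so these ranges are only approximate.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),
    (0x0061, 0x007A, "Latn"),
    (0x00C0, 0x024F, "Latn"),
    (0x0600, 0x06FF, "Arab"),
    (0x0900, 0x097F, "Deva"),
]

def script_of(ch):
    """Return a script tag for one character (toy approximation)."""
    cp = ord(ch)
    for lo, hi, tag in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return tag
    # Combining marks outside the table are treated as Inherited.
    if unicodedata.category(ch).startswith("M"):
        return "Zinh"
    return "Zyyy"  # Common: spaces, digits, punctuation, etc.

def itemise(text):
    """Split text into (script, substring) runs."""
    resolved = []
    current = None
    for ch in text:
        s = script_of(ch)
        if s in ("Zyyy", "Zinh"):
            s = current  # assumption: attach to the preceding script
        else:
            current = s
        resolved.append(s)
    # Leading Common/Inherited characters take the first real script.
    first = next((s for s in resolved if s is not None), "Zyyy")
    resolved = [s if s is not None else first for s in resolved]
    return [
        (tag, "".join(ch for ch, _ in group))
        for tag, group in groupby(zip(text, resolved), key=lambda p: p[1])
    ]
```

Calling itemise("abc \u0645\u0631\u062d\u0628\u0627") yields one Latin run that includes the trailing space, followed by one Arabic run. A different engine could just as reasonably attach the space to the following run, or segment further on direction, language, or font changes.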
I think some people may limit the definition of /shaping/ per se to
orthographic unit shaping, i.e. to the process of getting from plain
text encoding to basic readable text display. Which is why I don't think
it is a good idea to call the proposed working group the ‘text shaping
working group’: there is a broader scope of text processing, layout, and
display that needs better documentation, up to and including text block
layout (especially with regard to directionality).
For example, some of us have circled back several times over the past
six years—without conclusion—to the topic of post-line breaking glyph
processing, i.e. being able to perform substitutions and positioning
independent of pre-line breaking run segmentation, orthographic unit
shaping, and various standard and conditional/discretionary typographic
layout features. Several ideas have been put forward at ad hoc OTL
working group meetings, but no consensus has yet been reached. I would
consider this definitely within scope of the proposed group.
So I would be inclined to define the scope thus:
* Script itemisation
* Run segmentation
* Default glyph selection (cmap)
* Low level glyph processing (lookup types and how to apply them;
glyph tracking and reordering operations)
* Orthographic unit shaping (general and script specific)
* Standard typographic presentation
* Conditional typographic presentation
* Positioning (various kinds)
* Line layout and line breaking
* Justification and post-line breaking glyph processing
* Paragraph and text block layout
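
As a rough illustration of the ‘default glyph selection’ and ‘low level glyph processing’ items in the list above, the following Python sketch maps characters to glyph IDs through a cmap-like table and then applies substitution rules feature by feature. Everything in it is hypothetical: the glyph IDs, the CMAP and GSUB tables, and the function names are invented for the example and do not come from any real font or library. Real engines also order lookups by their index in the font's lookup list and group features into script-specific stages; this sketch simply applies features in the order given.

```python
# Hypothetical toy font data; glyph IDs and rules are invented.
CMAP = {"f": 10, "i": 11, "n": 12}
NOTDEF = 0  # glyph used when the font has no mapping for a character

# Feature tag -> list of (input glyph sequence, replacement glyph).
GSUB = {
    "liga": [((10, 11), 20)],  # f + i -> fi ligature
}

def map_to_glyphs(text):
    """Default glyph selection: one glyph ID per character."""
    return [CMAP.get(ch, NOTDEF) for ch in text]

def apply_feature(glyphs, rules):
    """Scan left to right, replacing the first matching sequence at each position."""
    out = []
    i = 0
    while i < len(glyphs):
        for seq, repl in rules:
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(repl)
                i += len(seq)
                break
        else:
            # No rule matched here: copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out

def shape(text, features):
    """Map characters to glyphs, then apply each feature in turn."""
    glyphs = map_to_glyphs(text)
    for tag in features:
        glyphs = apply_feature(glyphs, GSUB.get(tag, []))
    return glyphs
```

With this toy data, shape("fin", ["liga"]) returns [20, 12], and shape("fin", []) returns the unsubstituted [10, 11, 12]. Note that after substitution there is no longer a one-to-one correspondence between characters and glyphs, which is where the glyph tracking mentioned in the list becomes necessary.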
For many of these items, we should aim to provide authoritative
documentation, i.e. description of the correct way to perform
operations, with reference implementation and test cases. For some of
the later stages, we may only be able to provide best-practices advice.
So with that in mind, I would be inclined to use a broad, general term
when naming the group, e.g. the Text Display Working Group.
JH
--
John Hudson
Tiro Typeworks Ltd www.tiro.com
Salish Sea, BC tiro at tiro.com
NOTE: In the interests of productivity, I am currently
dealing with email on only two days per week, usually
Monday and Thursday unless this schedule is disrupted
by travel. If you need to contact me urgently, please
use some other method of communication. Thank you.