[MPEG-OTSPEC] Defining the text shaping working group’s scope

Mon Aug 10 21:33:37 CEST 2020

I’ve seen many different comments that might pertain to scope, though it hasn’t been entirely clear (particularly in the thread about vertical layout) which things are suggested as specification gaps versus implementation bugs. I’d like to make some high-level comments/suggestions.

There is a broad scope of interest that encompasses everything having to do with layout of text. And there are valid scenario requirements that include (among other things) that it should be possible for one person to author content using certain fonts and for another to view the content using different fonts but with certain fidelity in presentation to what the author intended: a certain type of _interoperability_. But the scenarios are varied and require different levels of fidelity to what the original author saw, that is, different kinds of interoperability. And there are multiple layers of content specification and implementation involved that will have bearing on what specifications are needed.

Questions have arisen as to what should be considered “shaping”, “text layout” etc. Dave Crossland suggested that one relevant distinction here is plain text versus rich text. That is certainly a relevant distinction, but not the only relevant distinction. I think there is another key distinction: presentation of single lines of text versus presentation of blocks (or pages) of text.

There is a certain correlation between these two X vs Y distinctions: if you are dealing with multiple lines of text, then implicitly you are dealing with something more than plain text — i.e., text that is “rich” to some degree. If nothing else, there are block metrics that have been specified, and that is more than plain text.

I think a useful definition of “shaping” involves something that can be fully specified to happen on single lines of text. That’s not to imply that higher-level “rich” information is never relevant. For instance, if a line of text is to be contained in a block and page layout that has vertical orientation, then that information is a relevant input parameter for how that line needs to be shaped. And the metrics for a block may determine that lines will break at certain points; but once line breaks are chosen, the layout/shaping of a line can be done without additional consideration of what is happening at the block or higher levels.

Of course, there are ways in which block-level layout needs to interact with the line layout. For instance, line breaks don’t get determined until after line-break opportunities (LBOs) are identified and the most appropriate LBO is selected. And that may require iterative operations, laying out lines with different LBOs assumed in order to determine line metrics in each case (since the laid-out glyph metrics can be affected by where line breaks do occur). But that is an iterative process in which single lines of text are shaped/laid out with certain input parameters assumed from the higher-level processing: where lines get broken, whether the layout is horizontal or vertical, etc.
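
To make that interaction concrete, here is a minimal sketch in Python of a block-level loop that tries successive LBOs and asks a line-level shaper for the resulting advance each time. Everything in it is an assumption for illustration only: toy_shape_line is a hypothetical stand-in for a real shaper (one unit per character, trailing spaces dropped at a line end), break_opportunities treats each space as an LBO rather than following any real line-breaking rules, and the greedy "last LBO that fits" choice is just one possible strategy.

```python
from typing import Callable, List


def toy_shape_line(line: str, vertical: bool = False) -> float:
    """Hypothetical stand-in for a shaper: returns the advance of one line.

    Assumption: every character advances 1.0 unit and trailing spaces at a
    line end contribute nothing. A real shaper would return glyphs and
    positions, and the result could depend on the 'vertical' input.
    """
    return float(len(line.rstrip(" ")))


def break_opportunities(text: str) -> List[int]:
    """Toy LBO finder: one opportunity after each space, plus end of text."""
    lbos = [i + 1 for i, ch in enumerate(text) if ch == " "]
    if not lbos or lbos[-1] != len(text):
        lbos.append(len(text))
    return lbos


def layout_block(
    text: str,
    max_advance: float,
    shape: Callable[[str, bool], float] = toy_shape_line,
    vertical: bool = False,
) -> List[str]:
    """Block level: pick breaks by repeatedly shaping candidate single lines."""
    lines: List[str] = []
    lbos = break_opportunities(text)
    start = 0
    while start < len(text):
        candidates = [b for b in lbos if b > start]
        best = candidates[0]  # always take one segment so the loop advances
        for b in candidates:
            # Each trial is a complete single-line shaping call whose inputs
            # (break position, orientation) come from the block level.
            if shape(text[start:b], vertical) <= max_advance:
                best = b
            else:
                break
        lines.append(text[start:best].rstrip(" "))
        start = best
    return lines


print(layout_block("aa bb cc dd", 5))
```

The point of the sketch is only the division of labour: the block level owns the iteration and the choice of break, while each call to the shaper sees exactly one candidate line plus the parameters handed down to it.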

So, I have always understood “shaping” to be a process that happens on, and can be fully specified for, single lines of text.

There are some distinct factors that are relevant for “shaping”:

  1.  Unicode Bidi Algorithm
  2.  Typographic behaviours required for culturally appropriate presentation of Unicode strings
  3.  Certain styling that occurs on spans of text, independent of blocks—in particular, discretionary typographic features
  4.  Certain inputs from higher-level layout, as mentioned above—e.g., horizontal vs. vertical block layout

Depending on how one might define things, UBA may or may not be encompassed within “shaping”. For instance, one could think of “shaping” as processing that happens on runs of text of a single bidi level. Or you might define “shaping” as happening on lines of text after bidi levels have been resolved. One way or another, I’m inclined to incorporate UBA in some way since the way in which bidi levels get resolved will determine what glyphs are adjacent to what other glyphs and that can have bearing on certain interactions that I think should be in scope for “shaping” — e.g., I’m inclined to consider kerning to be part of the shaping process, and also think it should be (ideally) possible to kern glyphs across a bidi level boundary.
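
As an illustration of why bidi resolution affects glyph adjacency, here is a small Python sketch. It assumes the embedding levels for a line have already been resolved (it does not implement the level-resolution part of UBA), and the function names level_runs and visual_order are hypothetical. level_runs splits a line into runs of a single level, which is the unit one might hand to a run-based shaper; visual_order applies the UBA reordering step (reverse each contiguous sequence at a given level or higher, from the highest level down to the lowest odd level) so one can see which logical positions end up next to each other.

```python
from itertools import groupby
from typing import List, Tuple


def level_runs(levels: List[int]) -> List[Tuple[int, int, int]]:
    """Split resolved bidi levels into (start, end, level) runs, end exclusive."""
    runs = []
    pos = 0
    for level, group in groupby(levels):
        n = len(list(group))
        runs.append((pos, pos + n, level))
        pos += n
    return runs


def visual_order(levels: List[int]) -> List[int]:
    """Return logical indices in left-to-right visual order for one line."""
    order = list(range(len(levels)))
    if not levels:
        return order
    highest = max(levels)
    # If there are no odd levels, the range below is empty: nothing reorders.
    lowest_odd = min((l for l in levels if l % 2), default=highest + 1)
    for level in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(order):
            if levels[order[i]] >= level:
                j = i
                while j < len(order) and levels[order[j]] >= level:
                    j += 1
                order[i:j] = reversed(order[i:j])
                i = j
            else:
                i += 1
    return order


# Two left-to-right characters, three right-to-left, one left-to-right.
levels = [0, 0, 1, 1, 1, 0]
print(level_runs(levels))    # [(0, 2, 0), (2, 5, 1), (5, 6, 0)]
print(visual_order(levels))  # [0, 1, 4, 3, 2, 5]
```

In this example logical position 1 ends up visually adjacent to logical position 4, and position 2 to position 5. Those are pairs that sit in different level runs, so a shaper that only ever sees one run at a time would have no opportunity to kern them.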

OK, with those things in mind, let me bubble up to the subject line for this thread: “Defining the text shaping working group’s scope”:

While the OT spec has been around for 20+ years and has provided a _basis_ for shaping specification that can allow for interoperability of text, content and fonts, shaping itself has not been specified, and the degree of interoperability that we have today, to a significant degree, has been the product of newer implementations trying to reproduce the behaviours observed in earlier implementations. In particular, Microsoft’s “Uniscribe” implementation has been taken as a de facto reference to be copied.

But that is not really a robust way to ensure broad interoperability, even when limiting to single lines of plain text (and the more limited expectations that can be placed on fidelity to author expectation for plain text). And if it’s not a sufficient basis to ensure interop with single lines of plain text, then it certainly isn’t an adequate basis for higher-level layout of rich text.

So, there certainly may be gaps for specifications pertaining to layout of blocks of rich text that should at some point be addressed. But I’m inclined to think that “shaping” specifications are a more fundamental gap, and that the scope for that can be limited to processing that happens on single lines of (plain or rich) text. If progress can be made within that scope, without taking on boiling the ocean of _all_ text layout, then I think that would be a very significant and worthwhile advance.

Peter