[MPEG-OTSPEC] checksum / 4-byte table offset weasel wording

Peter Constable pgcon6 at msn.com
Mon Aug 31 17:41:18 CEST 2020


Currently the OT spec / OFF, as well as Apple's spec, are somewhat ambiguous with respect to the need for top-level tables to begin at offsets that are integral multiples of four bytes.

Apple's spec<https://developer.apple.com/fonts/TrueType-Reference-Manual/RM06/Chap6.html> gives the algorithm to calculate a checksum for a table, which sums 32-bit chunks. This doesn't necessarily imply that table lengths must be multiples of four (the calculation could consume the first one to three bytes of a following table), except in the case of the last table in the file: there's no allowance in the algorithm for reaching EOF before reaching the table length. But allowing the checksum for one table to count bytes from a following table would seem odd. It could also create confusion when calculating a checksum for the file as a whole: do overlapping bytes get counted twice?
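To make the two cases above concrete, here is a rough Python sketch of a 32-bit-chunk checksum that reads straight from the file bytes. It is only my transliteration of the idea, not text from either spec; the function name naive_checksum and the sample bytes are made up for illustration.

```python
import struct

def naive_checksum(data: bytes, offset: int, length: int) -> int:
    """Sum big-endian uint32 words starting at offset, modulo 2**32.

    The word count is rounded up, so a length that is not a multiple
    of four makes the last read run past the end of the table.
    """
    total = 0
    n_words = (length + 3) // 4
    for i in range(n_words):
        # Raises struct.error if fewer than 4 bytes remain in data.
        (word,) = struct.unpack_from(">I", data, offset + 4 * i)
        total = (total + word) & 0xFFFFFFFF
    return total

# Hypothetical layout: a 5-byte table followed directly, with no padding,
# by a 3-byte table that is also the last thing in the file.
data = bytes([0x00, 0x00, 0x00, 0x01, 0x02]) + bytes([0xAA, 0xBB, 0xCC])

# The first table's checksum absorbs all three bytes of the second table.
first = naive_checksum(data, 0, 5)

# The second table's only word would need a byte beyond EOF.
try:
    naive_checksum(data, 5, 3)
    hit_eof = False
except struct.error:
    hit_eof = True
```

With this layout, the second word read for the first table is 0x02AABBCC, so the bytes 0xAA 0xBB 0xCC would be counted once for each table if both checksums were computed this way.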

The section on calculating checksums in the OT spec<https://docs.microsoft.com/en-us/typography/opentype/spec/otff#calculating-checksums> and in OFF gives the same algorithm, but adds the following note:

"This function implies that the length of a table must be a multiple of four bytes. In fact, a font is not considered structurally well-formed without the correct padding. All tables must begin on four-byte boundaries, and any remaining space between tables is padded with zeros. The length of all tables should be recorded in the table record with their actual length (not their padded length)."

That seems like a clearly-stated requirement. But if four-byte alignment is a requirement, it shouldn't be stated here as an implication of the checksum calculation; it should be stated in the description of table records within the table directory.

But then in the Recommendations section<https://docs.microsoft.com/en-us/typography/opentype/spec/recom#table-alignment-and-length> of OT/OFF, four-byte alignment is mentioned as a _recommendation_, not a requirement:


"All tables should be aligned to begin at offsets which are multiples of four bytes. While this is not required by the TrueType rasterizer, it does prevent ambiguous checksum calculations and greatly speeds table access on some processors.

"All tables should be recorded in the table directory with their actual length. To ensure that checksums are calculated correctly, it is suggested that tables begin on 32-bit boundaries. Any extra space after a table (and before the next 32-bit boundary) should be padded with zeros."
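For anyone who wants to see how existing fonts fare against the quoted wording, a check along these lines might do. This is only a sketch: it assumes a single-font sfnt file (not a TTC) with the usual 12-byte header followed by 16-byte table records (tag, checksum, offset, length), and the function names are made up.

```python
import struct

def table_records(font: bytes):
    """Read (tag, checksum, offset, length) tuples from the table directory."""
    (num_tables,) = struct.unpack_from(">H", font, 4)
    records = []
    for i in range(num_tables):
        tag, checksum, offset, length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i)
        records.append((tag.decode("latin-1"), checksum, offset, length))
    return records

def alignment_problems(font: bytes):
    """List tables that start off a four-byte boundary or have non-zero padding."""
    problems = []
    for tag, _checksum, offset, length in table_records(font):
        if offset % 4:
            problems.append((tag, "offset not a multiple of four"))
        end = offset + length
        # Bytes between the recorded end of the table and the next
        # four-byte boundary (or EOF, whichever comes first).
        pad_end = min((end + 3) & ~3, len(font))
        if any(font[end:pad_end]):
            problems.append((tag, "non-zero padding"))
    return problems
```

Called as alignment_problems(open(path, "rb").read()), it returns an empty list for a font that follows the stricter reading. Note that a table followed immediately by a misaligned table shows up as having non-zero padding, which is the overlap case again.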

This is all weasel wording. Either four-byte alignment (with zero padding as needed) should be required, in which case the checksum calculation can be stated simply, as it is now; or four-byte alignment should not be required, in which case the description of the checksum calculation should be elaborated to cover edge cases unambiguously.

As allowing for the edge cases adds non-trivial complication, both in the description and in implementations, I'm inclined to make it a requirement that tables begin at four-byte-multiple offsets (and remove the weasel wording in the Recommendations section).

Either way, there would be one possible edge case that should be covered: the last table in a file, and how to calculate a checksum if the table length is not a multiple of four. My inclination would be to cover this (which would also cover the general case) by saying that, if a table length is not a multiple of four, then it must be padded with zero bytes to the next four-byte boundary.
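In code terms, the rule I have in mind would amount to something like the following sketch. It assumes the checksum is computed over the table's own bytes only, using its recorded length; the name padded_checksum is made up.

```python
def padded_checksum(table: bytes) -> int:
    """Checksum of a table treated as zero-padded to a four-byte boundary.

    Takes exactly the table's recorded bytes, so the result never
    depends on what follows the table in the file.
    """
    padded = table + b"\x00" * (-len(table) % 4)
    total = 0
    for i in range(0, len(padded), 4):
        total = (total + int.from_bytes(padded[i:i + 4], "big")) & 0xFFFFFFFF
    return total
```

For a five-byte table 00 00 00 01 02 this sums the words 0x00000001 and 0x02000000, whether or not it is the last table in the file. If the file itself carries the zero padding, reading 32-bit chunks directly from the file gives the same value.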

Would this break any assumptions in current runtime or tool implementations to make this a clear requirement?



Thanks
Peter