9.1 Text terminology

Character
A character, as far as an XML document is concerned, is a byte or sequence of bytes that represents a numeric value defined by the Unicode standard. For example, what we call the letter "g" is the character with Unicode value 103.
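As a minimal sketch of the "byte or bytes" idea, the Python snippet below looks up the Unicode value of a character and the bytes used to store it. It assumes UTF-8 as the encoding, which the text does not specify, and the helper name `describe` is hypothetical.

```python
# Illustration only: `describe` is a hypothetical helper, and UTF-8 is an
# assumed encoding (an XML document may declare a different one).

def describe(ch: str) -> tuple:
    """Return the Unicode value of one character and its UTF-8 bytes."""
    return ord(ch), ch.encode("utf-8")

print(describe("g"))       # (103, b'g')         one byte
print(describe("\u00e9"))  # (233, b'\xc3\xa9')  two bytes for the same single character
```

The numeric value stays the same whatever the encoding; only the number of bytes changes.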
Glyph
A glyph is the visible representation of a character or characters.
A single character can have many different glyphs to represent it; for example, the letter g drawn in two different typefaces.
Multiple characters can reduce to a single glyph; some fonts have separate glyphs for the letter combinations "fl" and "ff" to make their spacing look better (these are called ligatures).
Other times, a single character can be composed of multiple glyphs; a print program might create the character é (which has Unicode value 233) by combining the "e" glyph with a non-spacing accent mark "´".
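Unicode has a character-level analogue of both cases, which can be explored with Python's standard `unicodedata` module. This is only a rough parallel: normalization operates on characters, not on the glyphs a font or print program actually draws, and the combining accent Unicode uses here is U+0301 rather than the spacing accent "´" shown above.

```python
import unicodedata

precomposed = "\u00e9"  # é as a single character, Unicode value 233

# Canonical decomposition (NFD) splits é into a base letter plus a combining mark.
decomposed = unicodedata.normalize("NFD", precomposed)
print([hex(ord(c)) for c in decomposed])          # ['0x65', '0x301']

# A nonzero combining class marks the accent as non-spacing.
print(unicodedata.combining(decomposed[1]) != 0)  # True

# Canonical composition (NFC) merges the pair back into one character.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == precomposed)                  # True

# Unicode also carries a few ligatures as compatibility characters;
# U+FB02 decomposes to the two letters "f" and "l".
ligature = "\ufb02"
print(unicodedata.normalize("NFKD", ligature))    # fl
```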
Font
A font is a collection of glyphs representing a certain set of characters.
Glyph measurements
All the glyphs in a font will normally have the following characteristics in common:
  • baseline - all the glyphs in a font line up on the baseline
  • ascent - the distance from the baseline to the top of the character
  • descent - the distance from the baseline to the bottom of the character
The total height of the character (the ascent plus the descent) is also called the em-height.
The em-box is a square whose sides are each one em-height long.
[Figure: glyph measurements (glyph_measurements.png)]
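The relationship between these measurements can be sketched in a few lines of Python. The class `FontMetrics` and the sample values are hypothetical, and the arithmetic follows this section's simplified definition (em-height as ascent plus descent) rather than any particular font format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontMetrics:
    """Hypothetical container for the measurements shared by a font's glyphs."""
    ascent: float   # distance from the baseline up to the top of the character
    descent: float  # distance from the baseline down to the bottom of the character

    @property
    def em_height(self) -> float:
        # Total height of the character, as defined in this section.
        return self.ascent + self.descent

    @property
    def em_box(self) -> tuple:
        # The em-box is a square: width and height both equal the em-height.
        side = self.em_height
        return (side, side)


metrics = FontMetrics(ascent=800, descent=200)  # made-up sample values
print(metrics.em_height)  # 1000
print(metrics.em_box)     # (1000, 1000)
```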