Homoglyph

The homoglyphs
U+0061 a LATIN SMALL LETTER A and
U+0430 а CYRILLIC SMALL LETTER A overlaid. In the image, both characters are set in Helvetica LT Std Roman.

In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar but may have differing meaning. The designation is also applied to sequences of characters sharing these properties.

In 2008, the Unicode Consortium published its Technical Report #36[1] on a range of issues deriving from the visual similarity of characters both in single scripts, and similarities between characters in different scripts.

Examples of homoglyphic symbols are (a) the diaeresis and umlaut (both a pair of dots, but with different meaning, although encoded with the same code points); and (b) the hyphen and minus sign (both a short horizontal stroke, but with different meaning, although often encoded with the same code point). Among digits and letters, digit 1 and lowercase l are always encoded separately but in many typefaces are given very similar glyphs, and digit 0 and capital O are always encoded separately but in many typefaces are given very similar glyphs. Virtually every example of a homoglyphic pair of characters can potentially be differentiated graphically with clearly distinguishable glyphs and separate code points, but this is not always done. Typefaces that do not emphatically distinguish the one/el and zero/oh homoglyphs are considered unsuitable for writing formulas, URLs, source code, IDs and other text where characters cannot always be differentiated without context. Fonts which distinguish glyphs by means of a slashed zero, for example, are preferred for those uses.

  1. ^ "UTR #36: Unicode Security Considerations". www.unicode.org.

Developed by StudentB