Prefix code

A prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that there is no whole code word in the system that is a prefix (initial segment) of any other code word in the system. It is trivially true for fixed-length codes, so only a point of consideration for variable-length codes.

For example, a code with code {9, 55} has the prefix property; a code consisting of {9, 5, 59, 55} does not, because "5" is a prefix of "59" and also of "55". A prefix code is a uniquely decodable code: given a complete and accurate sequence, a receiver can identify each word without requiring a special marker between words. However, there are uniquely decodable codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely decodable (it is a suffix code), but it is not necessarily a prefix code.

Prefix codes are also known as prefix-free codes, prefix condition codes and instantaneous codes. Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. The term comma-free code is sometimes also applied as a synonym for prefix-free codes[1][2] but in most mathematical books and articles (e.g.[3][4]) a comma-free code is used to mean a self-synchronizing code, a subclass of prefix codes.

Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any out-of-band markers or (alternatively) special markers between words to frame the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing sequences that form valid code words. This is not generally possible with codes that lack the prefix property, for example {0, 1, 10, 11}: a receiver reading a "1" at the start of a code word would not know whether that was the complete code word "1", or merely the prefix of the code word "10" or "11"; so the string "10" could be interpreted either as a single codeword or as the concatenation of the words "1" then "0".

The variable-length Huffman codes, country calling codes, the country and publisher parts of ISBNs, the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard, and the instruction sets (machine language) of most computer microarchitectures are prefix codes.

Prefix codes are not error-correcting codes. In practice, a message might first be compressed with a prefix code, and then encoded again with channel coding (including error correction) before transmission.

For any uniquely decodable code there is a prefix code that has the same code word lengths.[5] Kraft's inequality characterizes the sets of code word lengths that are possible in a uniquely decodable code.[6]

  1. ^ US Federal Standard 1037C
  2. ^ ATIS Telecom Glossary 2007, archived from the original on July 8, 2010, retrieved December 4, 2010
  3. ^ Berstel, Jean; Perrin, Dominique (1985), Theory of Codes, Academic Press
  4. ^ Golomb, S. W.; Gordon, Basil; Welch, L. R. (1958), "Comma-Free Codes", Canadian Journal of Mathematics, 10 (2): 202–209, doi:10.4153/CJM-1958-023-9, S2CID 124092269
  5. ^ Le Boudec, Jean-Yves, Patrick Thiran, and Rüdiger Urbanke. Introduction aux sciences de l'information: entropie, compression, chiffrement et correction d'erreurs. PPUR Presses polytechniques, 2015.
  6. ^ Berstel et al (2010) p.75

Developed by StudentB