Language(s) | International |
---|---|
Standard | Unicode Standard |
Classification | Unicode Transformation Format, variable-width encoding |
Extends | UCS-2 |
Transforms / Encodes | ISO/IEC 10646 (Unicode) |
UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (this number is in fact dictated by the design of UTF-16). The encoding is variable-length, as each code point is encoded with one or two 16-bit code units. UTF-16 arose from an earlier, now obsolete fixed-width 16-bit encoding known as "UCS-2" (for 2-byte Universal Character Set),[1][2] once it became clear that more than 2^16 (65,536) code points were needed,[3] including most emoji and important CJK characters such as those used in personal and place names.[4]
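The figures above can be checked with a short Python sketch. It is illustrative only: the surrogate ranges (U+D800 to U+DFFF) and the pair arithmetic are not spelled out in this section and are taken here as standard UTF-16 facts, and the helper names `utf16_code_units` and `to_surrogate_pair` are hypothetical.

```python
# Assumed layout: the Basic Multilingual Plane (BMP) covers U+0000..U+FFFF, and
# 2,048 of its values (U+D800..U+DFFF) are reserved as surrogates.
BMP_SIZE = 0x10000
SURROGATES = 0xE000 - 0xD800        # 2,048 reserved values
SUPPLEMENTARY = 0x400 * 0x400       # 1,024 high x 1,024 low surrogates

# Code points encodable by UTF-16: non-surrogate BMP values plus all pairs.
valid_code_points = (BMP_SIZE - SURROGATES) + SUPPLEMENTARY


def utf16_code_units(ch: str) -> int:
    """Return how many 16-bit code units UTF-16 uses for one character."""
    return len(ch.encode("utf-16-le")) // 2


def to_surrogate_pair(cp: int):
    """Split a code point above U+FFFF into (high, low) surrogate code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    offset = cp - 0x10000                 # 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


print(valid_code_points)                      # total encodable code points
print(utf16_code_units("A"))                  # BMP character
print(utf16_code_units("\U0001F600"))         # emoji outside the BMP
print([hex(u) for u in to_surrogate_pair(0x1F600)])
```

A BMP character such as "A" takes one code unit, while an emoji such as U+1F600 takes two, which is what makes the encoding variable-length.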
UTF-16 is used by systems such as the Microsoft Windows API, the Java programming language and JavaScript/ECMAScript. It is also sometimes used for plain text and word-processing data files on Microsoft Windows. It is used by more modern implementations of SMS.[5]
UTF-16 is the only encoding still allowed on the web that is incompatible with ASCII.[6][nb 1] It never gained popularity on the web, where it is declared by under 0.003% of web pages.[8] UTF-8, by comparison, accounts for over 98% of all web pages.[9] The Web Hypertext Application Technology Working Group (WHATWG) considers UTF-8 "the mandatory encoding for all [text]" and advises that, for security reasons, browser applications should not use UTF-16.[10]
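A small Python sketch of what incompatibility with ASCII means in practice: ASCII text encoded as UTF-8 is byte-for-byte identical to its ASCII form, whereas UTF-16 gives every character a 16-bit code unit and so interleaves zero bytes. The little-endian variant without a byte order mark is chosen here only to keep the output short.

```python
text = "Hi"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")   # little-endian, no byte order mark

# UTF-8 preserves the ASCII byte sequence; UTF-16 does not.
print(ascii_bytes)
print(utf8_bytes == ascii_bytes)
print(utf16_bytes)
print(utf16_bytes == ascii_bytes)
```

Software that expects ASCII-compatible bytes would therefore see embedded NUL bytes in UTF-16 data.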
[...] the term UCS-2 should now be considered obsolete. It no longer refers to an encoding form in either 10646 or the Unicode Standard.
UCS-2 is obsolete terminology which refers to a Unicode implementation up to Unicode 1.1 [...]
UTF-16 uses a single 16-bit code unit to encode over 60,000 of the most common characters in Unicode
I first came up with the idea for this Top Ten List over 10 years ago, which was prompted by some environments that still supported only BMP code points. The idea, of course, was to motivate the developers of such environments to support code points beyond the BMP by providing an enumerated list of reasons to do so. And yes, there are still some environments that support only BMP code points, such as the VivaDesigner app.
UTF-16 encodings are the only encodings that this specification needs to treat as not being ASCII-compatible encodings.
The UTF-8 encoding is the most appropriate encoding for interchange of Unicode, the universal coded character set. Therefore for new protocols and formats, as well as existing formats deployed in new contexts, this specification requires (and defines) the UTF-8 encoding. [...] The problems outlined here go away when exclusively using UTF-8, which is one of the many reasons that UTF-8 is now the mandatory encoding for all text things on the Web.