Module:UCS/doc

--[=[

The deletion of this module from the English Wikipedia broke an example given in the 2014 discussion w:Wikipedia_talk:Lua/Archive_1.

Usage
{{#invoke:UCS|table|format|list|annotations}}

Parameters
All three currently supported parameters of the table call are positional.

format
Currently ignored but reserved for forward compatibility.

list
Input data, as a sequence of ASCII characters, for building the table. Supported inputs are:


 * +hexadecimal – jump to the specified UCS code point, given in hexadecimal digits (usually, but not necessarily, four of them). Closes the current row if necessary. The default start location is U+0020 SPACE.
 * ! string – name or description of the character block, in wiki code. Should not be used while the current row is unfinished. The string extends to the end of the line, so character specifications must start on the next line.
 * Classifiers, each for exactly one code point:
    * - (hyphen-minus) – the code point is disallowed; produces a purple empty cell.
    * Basic Latin letters (A–Z or a–z) – the code point is an allowed character belonging to the specified class (see Character classes below). Different classes produce cells with different background colors. Class letters are case-insensitive, but lowercase letters produce smaller character samples.
 * Newline (0x0A) – closes the current table row. As a special case, a row consisting of a block description and a single “-” produces a pink cell spanning the full table width, meaning that the specified code points are disallowed.
 * #, ;, / – a comment running to the end of the line. With #, the terminating line feed is not part of the comment and takes effect; with ; and /, the line feed is swallowed as well, so the interpreter resumes on the next line as if there were no line break.
 * Spaces (0x20) are ignored, and will likely remain ignored in future versions.

Tabs (0x09) are currently ignored, but may be interpreted in future versions. All other characters may cause errors or be ignored. Support for page transclusion is planned, but not implemented.
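The list syntax above can be read by a simple character-by-character state machine. The following is a hypothetical Lua sketch, not the code of Module:UCS: the name parse_list, the row and cell table layout, and the rules that each classifier advances to the next code point and that a pending description survives a “+” jump until its row receives cells are all assumptions.

```lua
-- Hypothetical reader for the `list` mini-language; NOT the code of Module:UCS.
-- Assumptions: each classifier ("-" or a letter) consumes one code point and
-- advances to the next; a "!" description ends at, and consumes, its line feed;
-- a pending description is kept across a "+" jump until its row gets cells.
local function parse_list(list)
  local rows = {}
  local row = { cells = {} }
  local cp = 0x20                          -- default start: U+0020 SPACE
  local i, n = 1, #list

  local function close_row()               -- flush the row only if it has cells
    if #row.cells > 0 then
      table.insert(rows, row)
      row = { cells = {} }
    end
  end

  while i <= n do
    local c = list:sub(i, i)
    if c == "+" then                       -- jump to a code point
      local hex = list:match("^%x+", i + 1)
      assert(hex, "hexadecimal digits expected after '+'")
      close_row()
      cp = tonumber(hex, 16)
      i = i + 1 + #hex
    elseif c == "!" then                   -- block description up to line end
      local stop = list:find("\n", i, true) or (n + 1)
      row.title = list:sub(i + 1, stop - 1):match("^%s*(.-)%s*$")
      i = stop + 1
    elseif c == "#" then                   -- comment; its line feed still counts
      i = list:find("\n", i, true) or (n + 1)
    elseif c == ";" or c == "/" then       -- comment; its line feed is swallowed
      i = (list:find("\n", i, true) or n) + 1
    elseif c == "\n" then                  -- end of table row
      close_row()
      i = i + 1
    elseif c == " " or c == "\t" then      -- ignored
      i = i + 1
    elseif c == "-" then                   -- disallowed code point (no class)
      table.insert(row.cells, { cp = cp })
      cp, i = cp + 1, i + 1
    elseif c:match("%a") then              -- allowed code point of class c
      local upper = c:upper()
      table.insert(row.cells, { cp = cp, class = upper, small = (c ~= upper) })
      cp, i = cp + 1, i + 1
    else
      error(("unexpected character %q at position %d"):format(c, i))
    end
  end
  close_row()

  -- Special case: a described row holding a single "-" becomes a full-width banner.
  for _, r in ipairs(rows) do
    r.banner = (r.title ~= nil and #r.cells == 1 and r.cells[1].class == nil)
  end
  return rows
end
```

For example, under these assumptions the input `+0061 kk; note` followed by a line `k` yields a single row of three cells, U+0061 to U+0063, because `;` swallows its line feed; with `#` in place of `;` the same input yields two rows.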

If list is omitted or empty, a hard-coded list is processed instead, producing a table for ISO 8859-1.

annotations
An optional list of lines specifying where #-links (links to anchors) are placed on characters. Currently only lines of the form

c₁c₂…cₙ#Anchor_for_internal_link

are supported; such a line puts a #-link to the given anchor on each of the characters c₁ through cₙ.

Support for “+” code points, ranges, and other targets (links to the mainspace) is planned, but not implemented.
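The annotation form above can be read line by line, splitting each line at its first “#”. The following hypothetical Lua sketch is not the module's code; the name parse_annotations and the rule that a later line overrides an earlier one for the same character are assumptions. It maps every character before the “#” to the anchor after it, iterating over UTF-8 characters rather than bytes so that non-ASCII characters are handled.

```lua
-- Hypothetical parser for annotation lines of the form c₁c₂…cₙ#Anchor.
-- Not the code of Module:UCS; later lines override earlier ones (assumption).
local UTF8_CHAR = "[\1-\127\194-\244][\128-\191]*"   -- one UTF-8 encoded character

local function parse_annotations(annotations)
  local anchors = {}                          -- character -> anchor name
  for line in annotations:gmatch("[^\n]+") do
    local chars, anchor = line:match("^([^#]+)#(.+)$")
    if chars then                             -- other line forms are ignored here
      for ch in chars:gmatch(UTF8_CHAR) do
        anchors[ch] = anchor
      end
    end
  end
  return anchors
end
```

With the illustrative input `ÀÁÂ#A_with_diacritics`, each of À, Á and Â maps to the anchor A_with_diacritics.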

Character classes
This is an original classification; it does not correspond to the Unicode General Category property. Classifiers are not stored in the module or any other permanent location; they are read from the list argument, so the same character may be classified differently in different tables.


 * D – digraphs, ligatures, presentation forms, and other redundant characters. Currently a light gray background.
 * I – IPA Extensions and other IPA symbols (except basic Latin). Currently a violet background.
 * J – combining characters. Currently shown in yellow on a black background.
 * K, L, M – Latin alphabet. Namely, “K” are basic (ASCII) Latin letters, “L” are less common letters, and “M” are exotic letters. Currently all have backgrounds in various tones of blue and cyan.
 * N – numerals. Currently a pale red background.
 * O – control characters, broadly construed. Only characters allowed in HTML are classified here. Currently an orange background.
 * P, Q – punctuation marks, common (in English) and exotic respectively. Currently have backgrounds in shades of green.
 * S, T, U – symbols. May also include characters from non-Latin scripts, although most of those are not intended to be shown in tables. Namely, “S” are common symbols, “T” are semigraphics, and “U” are exotic symbols. Currently have backgrounds in hues around yellow, olive, and lime.
 * X – classification is unknown. Includes unallocated code points. Currently an empty (default) background.

Class letters A, B, C, E, F, G, H, R, V, W, Y, Z are currently reserved.
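Because class letters are case-insensitive and only 14 of the 26 Basic Latin letters are assigned, a classifier letter can be decoded with a small lookup table. The following is a hypothetical Lua sketch: the table CLASSES, the function classify and the short descriptions are illustrative, and background colors are left out because this page describes them only as current choices.

```lua
-- Hypothetical class-letter lookup following the list above; not the module's code.
-- Every Basic Latin letter missing from this table is treated as reserved.
local CLASSES = {
  D = "redundant (digraphs, ligatures, presentation forms)",
  I = "IPA",
  J = "combining",
  K = "basic Latin letter", L = "less common Latin letter", M = "exotic Latin letter",
  N = "numeral",
  O = "control",
  P = "common punctuation", Q = "exotic punctuation",
  S = "common symbol", T = "semigraphics", U = "exotic symbol",
  X = "unknown",
}

-- Returns the class description and whether a smaller sample is requested.
local function classify(letter)
  assert(#letter == 1 and letter:match("%a"), "a single Basic Latin letter is expected")
  local upper = letter:upper()
  local description = CLASSES[upper]
  if description == nil then
    error("class letter " .. upper .. " is reserved")
  end
  return description, letter ~= upper   -- lowercase requests a smaller sample
end
```

Here classify("k") returns the description for “K” together with true, since the lowercase letter asks for a smaller sample, while classify("A") raises an error because “A” is reserved.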

The classification has no firm basis and largely reflects the personal taste of its creator. For instance, the separate class for the International Phonetic Alphabet reflects its extensive use in Wikipedia, and there is no sharp criterion for distinguishing “common” from “exotic” characters. The distinction between “U” (exotic symbols) and “Q” (exotic punctuation) is rather arbitrary and is probably applied inconsistently in places.

Examples

 * Further information: user:Incnis Mrsi/UCS map

]=]