{{Documentation subpage}}
This template produces a formatted description of a [[Unicode]] character, to be used inline with regular text.
* <code>The character {{tlx|unichar|<nowiki>a9|COPYRIGHT SIGN</nowiki>}} is about intellectual property.</code> →
*: The character {{unichar|a9|COPYRIGHT SIGN}} is about intellectual property.

==Usage==
The <nowiki>{{unichar}}</nowiki> template needs the character's Unicode hexadecimal value (mandatory) and, if the name should be shown, the character name as input, like <code><nowiki>{{unichar|00A9|COPYRIGHT SIGN}}</nowiki></code> → {{unichar|00A9|COPYRIGHT SIGN}}. The name may be either all-caps or lowercase.

This template produces a ''formatted description'' of a [[Unicode]] character, to be used ''in-line'' with regular text. It follows the standard Unicode presentation of a character: the "U+" prefix with the hex code point, followed by the glyph of the character, optionally followed (if a name is input) by the character name, formal alias, or whatever the input is, using Unicode’s inline formatting recommendation. In running text such as the Unicode Standard, Wikipedia, or other rich-text environments, the character name is preferably displayed in {{sc2|SMALL-CAPS STYLE}}. (The all-caps presentation is mainly designed for plain-text environments.)

The hexadecimal value is required (e.g. A9); all other input is optional. The actual glyph is rendered using a font that contains the character. The font can be set to something more specific, e.g. to language- or [[International Phonetic Alphabet|IPA]]-specific fonts. Alternatively, the font glyph can be overridden with an image. A wikilink to an article on the character or set of characters, and another to the article [[Unicode]], can be created. It is also possible to add, in parentheses (like this), the calculated decimal value, the HTML character references, and a custom note.

Some special [[code point]]s are given extra care, like control and space characters. These are automatically detected by the <code>[[Template:unichar/gc|unichar/gc]]</code> sub-template.
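
As a rough illustration of the output layout described above, the following Python sketch assembles the same three parts (prefix with code point, glyph, optional name). It is not the template's code: the function name <code>describe</code> is hypothetical, and padding the hex value to at least four uppercase digits is an assumption taken from the usual U+ notation.

```python
def describe(hex_value: str, name: str = "") -> str:
    """Build a plain-text 'U+XXXX glyph NAME' description (illustrative only)."""
    code_point = int(hex_value, 16)          # the first parameter is read as hexadecimal
    parts = [f"U+{code_point:04X}", chr(code_point)]
    if name:
        # Plain text has no small caps, so the name is shown in all-caps here.
        parts.append(name.upper())
    return " ".join(parts)


print(describe("a9", "copyright sign"))
```

Inputs such as <code>a9</code>, <code>A9</code> and <code>00A9</code> all yield the same description in this sketch, mirroring the accepted input forms listed under Parameters.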

===Examples===
* <code><nowiki>{{unichar|00A9}}</nowiki></code> → {{unichar|00A9}}
* <code><nowiki>{{unichar|00A9|COPYRIGHT SIGN}}</nowiki></code> → {{unichar|00A9|COPYRIGHT SIGN}}
* <code><nowiki>{{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol}}</nowiki></code> → {{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol}}
* <code><nowiki>{{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol|note={{crossref|See also [[Copyleft]] symbol}}}}</nowiki></code> → {{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol|note={{crossref|See also [[Copyleft]] symbol}}}}
* <code><nowiki>{{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol|dec=|html=}}</nowiki></code> → {{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol|dec=|html=}}

====Bad examples====
* <code><nowiki>{{unichar|00A9|nlink=Copyright symbol}}</nowiki></code> → {{unichar|00A9|nlink=Copyright symbol}} – broken because there's no name to link
* <code><nowiki>{{unichar|00A9|COPYRIGHT SIGN|nlink=}}</nowiki></code> → {{unichar|00A9|COPYRIGHT SIGN|nlink=}} – broken because [[COPYRIGHT SIGN]] doesn't redirect to our actual article, [[Copyright symbol]]
* {{crossref|See also {{section link||Possible errors}}, below}}

===Parameters===
The blank template, with all parameters, is as follows:
<source lang=xml>
{{unichar
| <!-- hex value, code point (do not add the "U+") -->
| <!-- Unicode name, in ALL-CAPS -->
| ulink =
| image =
| cwith =
| size =
| use =
| use2 =
| nlink =
| dec =
| html =
| note =
}}
</source>

Inline version:
<source lang=xml>
{{unichar| <!--hex value (do not add "U+")-->| <!--Unicode name in ALL-CAPS-->|ulink= |image= |cwith= |size= |use= |use2= |nlink= |dec= |html= |note= }}
</source>

* '''First unnamed parameter''' or '''1=''' Required. The hexadecimal value of the code point, e.g. {{code|00A9}}.
*:''Notes'': The parameter accepts input like {{code|A9}}, {{code|a9}} and {{code|00A9}} as hexadecimal value. Decimal values are not detected as decimal, and will give unexpected results {{crossref|(see also {{section link||Possible errors}}, below)}}.
* '''Second unnamed parameter''' or '''2=''' Optional. The Unicode name of the character. This is given in ALL-CAPS, and the template will re-render it in {{Smallcaps|SMALL-CAPS}}. This name may differ from the title of the corresponding Wikipedia article (see below: nlink=).
* '''nlink=''' Optional wikilink. Name of the Wikipedia page that will be linked to. If used, the Unicode name (second parameter) has a wikilink to the article.
*: ''Warning'': This parameter must have a valid value if it is present; if present and empty, a red-link error will appear unless a redirect from the formal Unicode symbol name like [[COPYRIGHT SIGN]] exists and goes to the correct article here (in this case [[Copyright symbol]]). Eventually all of these redirects should exist, but few do {{as of|2017|post=.}}<br/>
*:''Note'': The name of the page is case-sensitive as with all Wikipedia pages.
*:<code><nowiki>{{unichar|00A9|COPYRIGHT SIGN|nlink=Copyright symbol}}</nowiki></code> &rarr; {{unichar|00A9|Copyright sign|nlink=Copyright symbol}}
* '''ulink=''' Optional. Creates a wikilink from the <samp>U+</samp> prefix. When used without a value (i.e., {{para|ulink}}, blank), the article [[Unicode]] is used as the default link target in the output: <samp><nowiki>[[Unicode|U+]]</nowiki></samp> producing [[Unicode|U+]]. Set a value only if you have a reason to link somewhere other than [[Unicode]], e.g. to an article on a subset of Unicode characters.
* '''dec=''' Optional. Adds the decimal value to the text, in the bracketed note. You do not need to add the value manually; just add {{para|dec}}, blank.
* '''html=''' Optional. Adds the HTML character reference to the text, like <samp>&amp;#160;</samp> in the bracketed note. If a ''named character reference'' exists, like <samp>"&amp;nbsp;"</samp>, that is added too. You do not need to add the values manually, just add {{para|html}}, blank.
*'''use=''' Optional. Sets the font-hinting template to get the glyph, since the character may not be present in a regular browser font. Default is {{tlx|unicode}}, other options are {{tlx|IPA}}, {{tlx|lang}} and {{tlx|script}}.
*'''use2=''' Optional. When setting {{para|use|lang}} or {{para|use|script}}, {{para|use2}} should be used to set the language (e.g. {{para|use2|fr}}) or the script (e.g. {{para|use2|Cyrs}}). A glyph may still not show as expected due to browser effects. For a detailed description, see each template's documentation.
*: <code><nowiki>{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=|use=script|use2=Cyrs}}</nowiki></code> &rarr; {{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=|use=script|use2=Cyrs}}
* '''image=''' Optional. Allows for a graphic image file to represent the glyph; overrides the font completely. The filename should include the extension (like <samp>.svg</samp> or <samp>.png</samp>), but {{em|not}} the prefix <samp>File:</samp>.
* '''cwith=''' Optional. Useful when the Unicode character is {{em|combining}}. Using {{para|cwith}} adds a space before the character, allowing the combining effect. So when used with a character like {{para|cwith|a}}, the character will be combined with the letter "a". In Unicode, a general glyph used to place a combined character is {{unichar|25CC|DOTTED CIRCLE|html=}}.
*: without {{para|cwith}}:
*:<code><nowiki>{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA}}</nowiki></code> &rarr; {{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA}}
*: {{para|cwith}} without parameter:
*:<code><nowiki>{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=}}</nowiki></code> &rarr; {{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=}}
*: {{para|cwith}} with dotted circle:
*:<code><nowiki>{{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=&amp;#9676;}}</nowiki></code> &rarr; {{unichar|0485|COMBINING CYRILLIC DASIA PNEUMATA|cwith=&#9676;}}
* '''size=''' Optional. Can be used to set the size ''of the glyph''. The default value is <samp>125%</samp>. For the font, all CSS font-size style inputs are accepted: <kbd>7px</kbd>, <kbd>150%</kbd>, <kbd>2em</kbd>, <kbd>larger</kbd>.
*:<code><nowiki>{{unichar|0041|LATIN CAPITAL LETTER A|size=2em}}</nowiki></code> &rarr; {{unichar|0041|LATIN CAPITAL LETTER A|size=2em}}
*: When using an ''image'' (file) instead of a font, this parameter accepts only sizes in <samp>px</samp>, like <kbd>12px</kbd>. The default for images is <samp>10px</samp>.

<pre>
{{unichar
| A9
| COPYRIGHT SIGN
| ulink = Universal Character Set characters
| image =
| size = 150%
| nlink = Copyright symbol
| note = Example
}}
</pre> Produces:
* {{unichar
| A9
| COPYRIGHT SIGN
| ulink = Universal Character Set characters
| image =
| size = 150%
| nlink = Copyright symbol
| note = Example
}}
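
The values that {{para|dec}} and {{para|html}} add to the bracketed note can be checked by hand. The Python sketch below shows the arithmetic under stated assumptions: the function name <code>notes_for</code> is hypothetical, and the named references come from Python's HTML 4 entity table, which may not match the set of names the template knows.

```python
from html.entities import codepoint2name  # HTML 4 entity names, keyed by code point


def notes_for(hex_value: str) -> dict:
    """Return the decimal value and HTML character references for a code point."""
    code_point = int(hex_value, 16)
    references = [f"&#{code_point};"]        # the numeric reference always exists
    entity_name = codepoint2name.get(code_point)
    if entity_name is not None:
        references.append(f"&{entity_name};")  # added only when a named reference exists
    return {"dec": str(code_point), "html": references}


print(notes_for("00A0"))
print(notes_for("00A9"))
```

For U+00A0 this gives the decimal value 160 together with <samp>&amp;#160;</samp> and <samp>&amp;nbsp;</samp>, matching the example in the {{para|html}} description.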

===Presentation effects===
Since this template is aimed at presenting a ''formatted, inline description'', some effects are introduced to support this aim.
* '''Showing space characters''': All space characters (those with [[Unicode_character_property#General_Category|General Category]]: Zs) are presented with a light-blue background, to show their actual presence and width: <code>{{unichar|00A0|No-break space|nlink=NBSP}}</code>.
*:Incidentally, the regular space {{background color|#CEEEF2|&nbsp;}} is replaced with <code>&amp;#x00A0;</code> (NBSP) to prevent wiki-markup from collapsing it as a repeated space.
* '''Removing formatting characters''': Formatting characters (those with [[Unicode_character_property#General_Category|General Category]]: Cf, Zl and Zp) are removed from the output. By definition, formatting characters have no glyph. By removing them they cannot have a formatting effect.
*:Exception: five Arabic Cf/formatting number markings, U+0600..U+0603 and U+06DD, are shown. While Cf formatting characters usually have no glyph, these five do. By internally adding "(visible)" to the category, these characters are shown.
*'''Removing whitespace''': The template removes formatting code and surrounding whitespace from the input. A &lt;Return&gt; in the name input (possibly unintended) would otherwise break the expected inline behaviour.
* '''Showing a label like &lt;control-0007&gt;''': Unicode states that a code point has ''[[Unicode character property#General Category|no name]]'' when it is one of these: a control character, a private-use character, a surrogate, an unassigned (reserved) code point, or a noncharacter. These code points should instead be referred to by a "Code Point Label", such as &lt;private-use&gt; or &lt;private-use-E000&gt;. In this situation, this template ''replaces'' the glyph with that label, so that the presentation follows Unicode usage to the letter.
* "Control" general category=Cc: <code>&lt;control&gt;</code> or <code>&lt;control-''0007''&gt;</code>
* "Surrogate" general category=Cs: <code>&lt;surrogate&gt;</code> or <code>&lt;surrogate-''D800''&gt;</code>
* "Private Use": general category=Co: <code>&lt;private-use&gt;</code> or <code>&lt;private-use-''E000''&gt;</code>
* "Not a character" (minus the reserved code points, see below): general category=Cn: <code>&lt;not-a-character&gt;</code>, <code>&lt;non-character&gt;</code> or <code>&lt;not-a-character-''FFFF''&gt;</code>

The second parameter (Unicode name) is not presented, since it cannot exist. It is possible to create a link to an article.
*''Note'': A &lt;reserved> (unassigned) code point cannot be detected yet, and so is not presented with this label. These code points too are given Cn category.
*:(Background on <>-labels: A Name can never contain <>-brackets. These rules prevent a name from being mixed up with an actual control character, so a [[Bell character|bell]] will not ring when a page is opened that contains a Name of U+0007.)
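
The label rules above can be sketched in Python with the standard <code>unicodedata</code> module. This is only an approximation of what <code>unichar/gc</code> does: the function names are hypothetical, the label spellings are copied from the list above, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# General categories that get a label instead of a glyph (spellings as listed above).
LABEL_PREFIX = {"Cc": "control", "Cs": "surrogate", "Co": "private-use"}


def is_noncharacter(code_point: int) -> bool:
    """True for U+FDD0..U+FDEF and for code points ending in FFFE or FFFF."""
    return 0xFDD0 <= code_point <= 0xFDEF or (code_point & 0xFFFE) == 0xFFFE


def code_point_label(hex_value: str):
    """Return a label such as '<control-0007>', or None for ordinary characters."""
    code_point = int(hex_value, 16)
    category = unicodedata.category(chr(code_point))
    if category in LABEL_PREFIX:
        return f"<{LABEL_PREFIX[category]}-{code_point:04X}>"
    if category == "Cn" and is_noncharacter(code_point):
        return f"<not-a-character-{code_point:04X}>"
    # Reserved (unassigned) code points are also Cn, but get no label here,
    # mirroring the limitation noted above.
    return None


for value in ("0007", "D800", "E000", "FFFF", "00A9"):
    print(value, code_point_label(value))
```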

===Possible errors===
* The template produces an {{error|Error-message|tag=span}} when {{para|1}} (or first unnamed parameter), the hex value, is missing, empty, or invalid.
* A non-hexadecimal input like {{!mxt|00G9}} produces an error (because {{!mxt|G}} or {{!mxt|g}} is not hexadecimal).
* Do not add the {{!mxt|U+}} prefix, as in {{!mxt|U+00A9}}. It will not be recognised.
* If the template {{em|only}} shows the code point number, like <samp>2038</samp>, you're probably using the wrong template, {{tlx|unicode}}, instead of {{tlx|unichar}}.
* The glyph may be overruled and changed into a {{em|label}} like <samp>&lt;control-0007&gt;</samp>. These characters have no Unicode name. An {{para|nlink}} then links directly to the article (entered in a form like {{para|nlink|Bell signal}}). A blank {{para|nlink}} cannot work for <samp>&lt;{{var|label}}-{{var|hhhh}}&gt;</samp> characters (there is no character name at all to make into a link), and produces an error.
* A decimal-value input like {{para|1|98}} will be read as being hexadecimal value <samp>0098</samp>. There is {{em|no way}} that the template can detect you intended to enter <kbd>98<sub>10</sub>=62<sub>16</sub></kbd>. No warning is issued, and the wrong character, <samp>U+0098<sub>16</sub></samp>, will be shown ({{em|not}} <samp>U+0062</samp>).
* As noted above, misuse of the {{para|nlink}} parameter may result in red links to articles that don't exist.
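
The hexadecimal pitfalls listed above can be reproduced outside the wiki. The following Python sketch is a hypothetical validator, not the template's own check: the name <code>parse_code_point</code> and the exact error messages are assumptions made for illustration.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def parse_code_point(hex_value: str) -> int:
    """Read the first parameter as hexadecimal, rejecting the inputs listed above."""
    text = hex_value.strip()
    if text.upper().startswith("U+"):
        raise ValueError("do not add the 'U+' prefix")
    if not text or not set(text) <= HEX_DIGITS:
        raise ValueError(f"not a hexadecimal value: {hex_value!r}")
    return int(text, 16)


# "98" is silently read as hexadecimal: it names U+0098, not U+0062.
print(parse_code_point("98"))   # the code point of U+0098, as a decimal number
print(format(98, "04X"))        # decimal 98 written in hexadecimal
```

The last two lines show why a decimal input cannot be detected: <kbd>98</kbd> is a perfectly valid hexadecimal string, so only the editor can know that <samp>0062</samp> was intended.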

===Technical notes===
The string "unichar" is used only in English Wikipedia, as a name for this template. It has no meaning outside this context.<br/>

The template uses these subtemplates:
* {{tl|unichar/main}} Accepts all the input from {{tlx|unichar}}. Calls several subtemplates to produce the text strings, and then strings them together. Also checks for non-hexadecimal input and reports the error.
* {{tl|unichar/ulink}} Creates a piped link for the <samp>U+</samp> prefix.
* {{tl|unichar/gc}} Determines the Unicode general category, when this category is special (as for control characters).
* {{tl|unichar/glyph}} For rendering the glyph by font. Accepts {{para|image}}, which overrides the font. Also processes {{para|use}}, {{para|use2}}, {{para|size}}, {{para|cwith}}.
* {{tl|unichar/na}} Produces the formatted name of the character in {{Smallcaps|smallcaps}}. Accepts the {{para|nlink}} to create a piped wikilink to an article. When the [[Unicode character property#General Category|general category]] (gc) is special, the name will change into a <samp>&lt;{{var|label}}-{{var|hhhh}}&gt;</samp>.
* {{tl|unichar/notes}} Produces the three optional notes in parentheses (round brackets): <samp>decimal</samp> (from {{para|dec}}); <samp>HTML</samp> (from {{para|html}} – both decimal like <samp>&amp;#160;</samp> and named like <samp>&amp;nbsp;</samp> if that exists, using {{tlx|numcr2namecr}}); and the free-text {{para|note}}. Also does the parentheses themselves.
* The main template serves as an easy-input wrapper: it performs few calculations (only two hex-to-decimal conversions) and allows default values to be set without going deep into the subtemplates.
* The value <code>&lt;#salted#&gt;</code> is used internally to pass through an undefined input parameter. This value is safe for the Unicode name, because a name cannot contain the characters <code>&lt;</code>, <code>#</code> or <code>&gt;</code>; hence [[Salting the earth|salted]] is the right word (meaning uninhabitable). For ease of code maintenance, it is used in various places in the code.

====Issues====
* Unassigned code points, to be labelled &lt;reserved&gt;, cannot be detected.
* When using {{para|use|script}}, then {{para|use2}} needs lowercase (e.g. 0485, Cyrs or cyrs){{clarify|date=December 2017|reason=This makes no sense. Numerals have no case, and "Cyrs" is not lowercase but mixed-case.}}
* When the template is used for one of the RTL formatting marks, the mark's effect may break out of the template (text following the template goes RTL, too). As it is now, preventing this requires extra code.

{{Unicode templates}}

==TemplateData==
{{TemplateData header}}
<templatedata>
{
"params": {
"1": {
"aliases": [
"hval"
],
"label": "Hex value",
"description": "Hexadecimal Unicode code point",
"example": "031A",
"type": "string",
"required": true
},
"2": {
"aliases": [
"na"
],
"label": "Character name",
"description": "If provided, shows Unicode character name",
"example": "COMBINING LEFT ANGLE ABOVE",
"type": "string",
"suggested": true
},
"ulink": {
"example": "Phonetic symbols in Unicode",
"type": "line"
},
"image": {},
"cwith": {
"type": "string"
},
"size": {
"description": "Relative size of rendered character",
"example": "200%",
"type": "string"
},
"use": {
"type": "string"
},
"use2": {
"type": "string"
},
"nlink": {
"type": "string"
},
"dec": {
"type": "number"
},
"HTML": {
"aliases": [
"html"
],
"type": "string",
"label": "Show HTML code?",
"description": "If provided, shows HTML code",
"example": "yes",
"suggested": true
},
"note": {
"type": "line"
}
},
"description": "Formats a Unicode character description for inline use.",
"format": "inline"
}
</templatedata>

==See also, external links==
Useful links for researching Unicode characters:
* [http://unicode.org/charts/ Unicode.org] charts in PDF format, showing the U+ hex values.
* [https://www.fileformat.info/info/unicode/char/search.htm Fileformat.info search], to search by ''name'' (whole or partial), by U+ ''hex value'' or ''decimal value'', or by the font ''symbol'' (copy-paste it). Extra information is provided per character. One character only.
* [http://www.branah.com/unicode-converter branah.com], a multi-character Unicode converter.

<includeonly>{{sandbox other||
<!-- Categories below this line, please; interwikis at Wikidata -->
[[Category:Unicode character templates]]
}}</includeonly>