Character Entities

End Tag: NA | Support Key: [2|3|3.2|4] [X1|X1.1] [IE1|M1|N1|O2.1]
= Index DOT Html by Brian Wilson =


What is it?

A text character is usually stored as an octet: a single byte, or 8 bits, of data. Eight bits allow 256 distinct character codes (0-255). Although the HTTP protocol can transport the full 256-character range of the ISO 8859-1 (ISO Latin-1) character set, not all operating systems or applications natively support this range. To make these characters portable and viewable in all browsers, HTML offers alternative representations of all the ISO Latin characters as coded Character Entities (see the index below). These case-sensitive representations are written using only characters from ASCII (codes 0-127), a proper subset of the ISO Latin character set.
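As a rough illustration of the portability idea, the following Python sketch (an assumption; the original page contains no code, and the function name is hypothetical) reduces text containing ISO Latin-1 characters to pure ASCII by replacing every character above 127 with a decimal numbered entity. It relies on Python's built-in `xmlcharrefreplace` codec error handler, which writes `&#NNN;` references.

```python
def latin1_to_ascii_entities(text: str) -> str:
    """Return text using only ASCII, with every non-ASCII character
    replaced by a decimal numbered entity such as &#233;."""
    # Confirm the input fits the 0-255 ISO Latin-1 range discussed above;
    # this raises UnicodeEncodeError for characters outside it.
    text.encode("iso-8859-1")
    # 'xmlcharrefreplace' turns each non-ASCII character into &#NNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    print(latin1_to_ascii_entities("Café à la carte"))  # Caf&#233; &#224; la carte
```

The output contains only ASCII characters, so it survives any system that mishandles bytes above 127, while a browser still renders the original accented letters.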

Included in the Character Entity domain are both numbered and named entities:

Numbered Entity Syntax: &#charnumber;: Where charnumber is the character's decimal code, an integer from 0-255 for the ISO Latin characters.
Named Entity Syntax: &charname;: Where charname is a unique mnemonic shorthand for the character to be represented.
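The two syntaxes can be compared with a small Python sketch. It assumes Python's standard `html.entities.codepoint2name` table (which holds the HTML 4.01 entity names) as the list of known names; the function `to_entity` is hypothetical. It emits the named form when a name exists and falls back to the numbered form otherwise.

```python
from html.entities import codepoint2name


def to_entity(ch: str, prefer_named: bool = True) -> str:
    """Encode one character as a named entity when HTML 4 defines a name
    for it, otherwise as a decimal numbered entity."""
    code = ord(ch)
    if prefer_named and code in codepoint2name:
        return f"&{codepoint2name[code]};"  # named form: &charname;
    return f"&#{code};"                      # numbered form: &#charnumber;


if __name__ == "__main__":
    for ch in "\u00a9\u00c0\u00e9":  # copyright sign, A-grave, e-acute
        print(ch, to_entity(ch), to_entity(ch, prefer_named=False))
```

Named entities are easier to read in source, while numbered entities work for any character code, including those that have no mnemonic.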

Note: The trailing semicolon (';') is strictly necessary only when the character that follows the entity reference would otherwise be read as part of the entity name. Even so, it is wise to always include the terminating semicolon for consistency.
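The rule in the note can be modelled with a small decoder. This is a hypothetical Python sketch, not how any particular browser works: a reference name is assumed to run for as long as letters or digits follow the '&', and the ';' is optional. (SGML also counts characters such as '.' and '-' as name characters; the sketch ignores them.) Names are looked up case-sensitively in Python's `html.entities.name2codepoint` table of HTML 4 names.

```python
import re
from html.entities import name2codepoint

# '&', then a decimal number or a name made of letters/digits, then an
# optional ';'. Simplification: SGML also allows '.', '-' etc. in names.
_REF = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*);?")


def decode_refs(text):
    """Replace known entity references; leave anything else untouched."""
    def replace(m):
        name = m.group(1)
        if name.startswith("#"):
            code = int(name[1:])
            return chr(code) if code <= 0x10FFFF else m.group(0)
        if name in name2codepoint:        # case-sensitive lookup
            return chr(name2codepoint[name])
        return m.group(0)                 # unknown name: keep as typed
    return _REF.sub(replace, text)


if __name__ == "__main__":
    for sample in ["&nbsp;test", "&nbsp test", "&nbsptest"]:
        print(repr(sample), "->", repr(decode_refs(sample)))
```

In the third sample the name swallows the following letters and becomes the unknown name `nbsptest`, so nothing is decoded; only a ';' after `nbsp` would have separated the two.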

Character Entity Indexes
The ISO-8859-1 Character Set: 000-031 | 032-064 | 065-096 | 097-126 | 127-159 | 160-191 | 192-223 | 224-255
Unicode Character Entities:
Arrows - Arrow shapes
Greek Capitals - Greek capital characters
Greek Smalls - Greek 'lower case' characters
Math Symbols - Characters commonly used in mathematics
Miscellaneous letters - Latin Extended-A and B characters and Letter-like Symbols
Miscellaneous shapes - Playing card suit symbols and other graphical symbols
Miscellaneous technical symbols - Characters used in various technical disciplines
Bi-directional and spacing characters - Characters used to control bi-directional text and text spacing
General punctuation set 1 - Commonly used punctuation characters
General punctuation set 2 - More commonly used punctuation characters
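A table like the index pages above can be generated for any code range. The sketch below is an illustration only; it assumes Python's `html.entities.codepoint2name` (the HTML 4.01 names) as the source of entity names, and `entity_table` is a hypothetical helper. Because that table also covers the HTML 4.0 Unicode sets, the same function works for Greek letters, arrows or math symbols.

```python
from html.entities import codepoint2name
from typing import List, Tuple


def entity_table(start: int, end: int) -> List[Tuple[int, str, str]]:
    """Return (code, character, named entity) rows for codes start..end.
    Codes without an HTML 4 entity name get "(none)"."""
    rows = []
    for code in range(start, end + 1):
        name = codepoint2name.get(code)
        rows.append((code, chr(code), f"&{name};" if name else "(none)"))
    return rows


if __name__ == "__main__":
    # Same range as the "160-191" index entry above.
    for code, char, named in entity_table(160, 191):
        print(f"&#{code};\t{char}\t{named}")
```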

Attributes: Character Entities do not accept attributes

Example: &Agrave; = À
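The example can be checked with Python's standard `html.unescape`, used here only as a convenient stand-in for a browser's entity decoding. The named and numbered forms produce the same character, while the lower-case name is a different character, because entity names are case-sensitive.

```python
import html

named, numbered = "&Agrave;", "&#192;"
print(named, "=", html.unescape(named))        # &Agrave; = À
print(numbered, "=", html.unescape(numbered))  # &#192; = À
# Names are case-sensitive: the lower-case name is a different character.
print("&agrave;", "=", html.unescape("&agrave;"))  # &agrave; = à
```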

Parent Model: %In-line Parent% | %Block Parent% | <Del> | <Ins> | <Legend> | <Option> | <Script> | <Style> | <Textarea> | <Title>
Content Model: Character Entities do not accept content.

Tips & Tricks

Character entities can be used anywhere regular characters will be displayed on screen.
For elements such as IMG or INPUT, entities are used only in text that is ultimately displayed (the ALT text of an image or the VALUE of an input element).
Entities are not to be used in path names for URLs.
DTD Note: The &quot; named character entity was removed from the HTML 3.2 DTD. There is still some confusion as to WHY this was done, since the entity is in wide use and exists in the HTML 2.0, 3.0 and 4.0 DTDs. There are two differing stories as to why it was deleted from the 3.2 DTD:
1. Dan Connolly (co-author of HTML 2.0) has said the omission was a mistake.
2. Dave Raggett (author of HTML 3.0, 3.2 and 4.0) has said that the omission was intentional due to a disagreement in the HTML ERB over which entities should be in HTML 3.2. Only the basic set of entities was agreed upon. (Many thanks to a reader who sent me some mail clarifying this.)
Any document using &quot; will generate validation errors under the HTML 3.2 DTD, but it should be safe to leave these entities in legacy documents, since the entity is widely supported by past and future browsers and DTDs. The numbered form of this entity (&#34;) WILL validate and should be considered when authoring new documents.
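Putting the attribute-value and &quot; tips together, the following hypothetical Python sketch escapes text for a double-quoted ALT or VALUE attribute. It uses the standard `html.escape` with `quote=False` for '&', '<' and '>', then writes '"' in the numbered form &#34; so the result also validates under HTML 3.2. The helper name `attr_value` and the file name `qa.gif` are illustrative assumptions.

```python
import html


def attr_value(text: str) -> str:
    """Escape text for a double-quoted attribute such as ALT or VALUE.
    '&', '<' and '>' become named entities; '"' becomes &#34;, which
    validates under HTML 3.2 where &quot; does not."""
    escaped = html.escape(text, quote=False)  # '&' first, then '<', '>'
    return escaped.replace('"', "&#34;")      # added after '&' is escaped


if __name__ == "__main__":
    alt = 'The "Q&A" logo'
    print(f'<IMG SRC="qa.gif" ALT="{attr_value(alt)}">')
    # <IMG SRC="qa.gif" ALT="The &#34;Q&amp;A&#34; logo">
```

The order matters: replacing '"' before escaping '&' would turn the new &#34; into &amp;#34;.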

Browser Peculiarities

Internet Explorer 1.0-3.0 treated character entities case-insensitively, so "&EacuTE;" was treated the same as "&eacute;". In IE 4.0+, character entities are correctly case-sensitive.
IE seems to be VERY lenient in parsing character entities: it let me leave off the trailing semicolon in every case I tried, whereas the Netscape 4.x+ and Opera browsers I tried both choked on the same test cases, about half of them. Netscape and Opera could handle *&nbsp.test*, *&nbsp test*, and *&nbsp test*, but they couldn't handle *&nbsptest*, *&nbsp1test*, *&nbspptest*, *&nbsp99test* or *&nbsp&nbsptest*. IE handled ALL of these cases and rendered every one of the attempted non-breaking spaces. I leave it to the reader to infer the equivalence classes behind this behavior, but the gist of this item is: don't forget the semicolon!
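Since browsers disagree on both points above, one option is to check documents before publishing them. The following is a hypothetical Python lint sketch, not a feature of any browser or validator. It flags references with no trailing semicolon and names that match a known HTML 4 entity only when case is ignored (such as "&EacuTE;"). As in the other sketches, names are assumed to consist of letters and digits only, and Python's `html.entities.name2codepoint` serves as the list of valid names.

```python
import re
from html.entities import name2codepoint

# Name characters limited to letters and digits (a simplification).
_REF = re.compile(r"&(#[0-9]+|[A-Za-z][A-Za-z0-9]*)(;?)")
_NAMES_LOWER = {name.lower() for name in name2codepoint}


def lint_entities(text):
    """Return (reference, problem) pairs for risky entity references."""
    problems = []
    for m in _REF.finditer(text):
        ref, name, semicolon = m.group(0), m.group(1), m.group(2)
        if not name.startswith("#") and name not in name2codepoint:
            if name.lower() in _NAMES_LOWER:
                problems.append((ref, "wrong case: names are case-sensitive"))
            else:
                problems.append((ref, "unknown name"))
        if not semicolon:
            problems.append((ref, "missing trailing ';'"))
    return problems


if __name__ == "__main__":
    for ref, problem in lint_entities("&EacuTE; &nbsp test &nbsptest"):
        print(ref, "->", problem)
```

A reference that passes this check, such as "&nbsp;" or "&eacute;", should display the same way in all of the browsers described above.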