HTML Character Sets
To display an HTML page correctly, the browser must know what character set (encoding) to use:
Example
<meta charset="UTF-8">
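To see why the declared character set matters, here is a minimal Python sketch (the sample word and variable names are illustrative, not part of the tutorial). It encodes a short text as UTF-8 and then decodes the same bytes twice: once with the correct character set and once assuming Windows-1252, which is roughly what a browser does when it guesses wrong.

```python
# The same bytes decoded with two different character sets.
text = "café"
data = text.encode("utf-8")      # the bytes as they would travel over the network

correct = data.decode("utf-8")   # decoder matches the encoder
wrong = data.decode("cp1252")    # decoder wrongly assumes Windows-1252

print(correct)  # café
print(wrong)    # cafÃ©
```

The two-byte UTF-8 sequence for "é" is read as two separate Windows-1252 characters, which produces the familiar garbled output.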
The HTML5 specification encourages web developers to use the UTF-8 character set!
This has not always been the case. The character encoding for the early web was ASCII.
Later, from HTML 2.0 to HTML 4.01, ISO-8859-1 was considered the standard character set.
With XML and HTML5, UTF-8 finally arrived and solved a lot of character encoding problems.
In the Beginning: ASCII
Computers store data as binary codes, such as 01000101.
To standardize the storing of text, the American Standard Code for Information Interchange (ASCII) was created. It defined a unique binary number for each character, covering the digits 0-9, the upper and lower case English alphabet (a-z, A-Z), and special characters like ! $ + - ( ) @ < > , .
Since ASCII used 7 bits per character, it could only represent 128 different characters.
The biggest weakness of ASCII was that it excluded non-English letters.
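The 7-bit limit can be checked with a short Python sketch (variable names are illustrative). It shows that the binary code 01000101 is the ASCII code for the letter E, and that a non-English letter cannot be encoded as ASCII at all.

```python
# Every ASCII character maps to a number from 0 to 127, which fits in 7 bits.
code = ord("E")
bits = format(code, "08b")   # padded to a full byte, so the leading bit is 0

print(code, bits)            # 69 01000101

# Encoding fails for any character outside the 128-character range.
try:
    "é".encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(ascii_ok)              # False
```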
ASCII is still in use today: its 128 characters keep the same byte values in Windows-1252, ISO-8859-1, and UTF-8.
For a closer look, please study our Complete ASCII Reference.
In Windows: Windows-1252
Windows-1252 was the default character set in Windows, up to Windows 95.
It is an extension to ASCII, with added international characters.
It uses a full byte (8 bits) per character, which allows up to 256 different characters.
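As a small illustration in Python (the sample word is arbitrary, and `cp1252` is Python's codec name for Windows-1252), each character becomes exactly one byte. The ASCII letter keeps its value below 128, while the added international letters use values with the eighth bit set.

```python
# Windows-1252 stores each character in exactly one byte.
word = "Ærø"
data = word.encode("cp1252")

print(len(word), len(data))      # 3 3
print([hex(b) for b in data])    # ['0xc6', '0x72', '0xf8']
```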
Since Windows-1252 has been the default in Windows, it is supported by all browsers.
For a closer look, please study: The Complete Windows-1252 Reference.
In HTML 4: ISO-8859-1
The character set most often used in HTML 4 was ISO-8859-1.
ISO-8859-1 is an extension to ASCII, with added international characters.
Example
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
In HTML 4, a character set different from ISO-8859-1 can be specified in the <meta> tag:
Example
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-8">
All HTML 4 processors also support UTF-8:
Example
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
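When a browser detects ISO-8859-1, it normally defaults to Windows-1252, because Windows-1252 assigns printable characters (such as the euro sign and curly quotes) to most of the 32 positions from 0x80 to 0x9F, which ISO-8859-1 reserves for control codes.
For a closer look, please study: The Complete ISO-8859-1 Reference.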
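The difference between the two character sets shows up when the same byte is decoded both ways. The Python sketch below is only an illustration: Python's `iso-8859-1` codec follows the strict ISO definition, whereas a browser would apply the Windows-1252 interpretation.

```python
# The byte 0x80 means different things in the two character sets.
byte = bytes([0x80])

as_cp1252 = byte.decode("cp1252")        # the euro sign
as_latin1 = byte.decode("iso-8859-1")    # an invisible control character

print(as_cp1252)                 # €
print(repr(as_latin1))           # '\x80'
print(as_latin1.isprintable())   # False
```

Treating the page as Windows-1252 therefore displays a useful character where strict ISO-8859-1 decoding would display nothing.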
In HTML5: Unicode UTF-8
The HTML5 specification encourages web developers to use the UTF-8 character set.
Example
<meta charset="UTF-8">
A character set different from UTF-8 can be specified in the <meta> tag:
Example
<meta charset="ISO-8859-1">
The Unicode Consortium developed the UTF-8 and UTF-16 standards because the ISO-8859 character sets are limited and not compatible with a multilingual environment.
The Unicode Standard covers (almost) all the characters, punctuation marks, and symbols in the world.
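Unlike the single-byte character sets, UTF-8 and UTF-16 use a variable number of bytes per character. The following Python sketch (sample characters chosen for illustration) prints the Unicode code point of each character and the number of bytes it needs in each encoding.

```python
# Byte length of the same characters in UTF-8 and UTF-16.
samples = ["A", "é", "€", "😀"]

utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]
utf16_lengths = [len(ch.encode("utf-16-le")) for ch in samples]

for ch, n8, n16 in zip(samples, utf8_lengths, utf16_lengths):
    print(f"U+{ord(ch):04X}", n8, n16)

# U+0041 1 2
# U+00E9 2 2
# U+20AC 3 2
# U+1F600 4 4
```

ASCII characters still take a single byte in UTF-8, which is one reason it works well as the default for web pages.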
All HTML5 processors support UTF-8, UTF-16, Windows-1252, and ISO-8859. XML processors are required to support UTF-8 and UTF-16; support for other encodings is optional.
For a closer look, please study: The Complete Unicode Reference.