It’s hard to find one certain character in over 110,000 codepoints. This site aims to make it as easy as possible with the following search options:
If you already have the character in question, just paste it into the search box. That takes you directly to its description page.
You don’t know the name or any properties of a codepoint, only its general look? Fear not: on Shapecatcher you can draw the character and have it recognized. This works remarkably well for many characters.
If you know Unicode and the rough range where the codepoint might be, you can give the range directly in the URL. For example, to inspect characters in the range U+0200 to U+0300, enter “beta.codepoints.net/U+0200..U+0300” in the address bar.
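The range URL can also be assembled programmatically. The following is a minimal Python sketch, assuming the `U+XXXX..U+YYYY` path format from the example above; the helper name `range_url` and its `host` parameter are hypothetical and not part of the site.

```python
def range_url(first: int, last: int, host: str = "beta.codepoints.net") -> str:
    """Return the address-bar string for the codepoint range first..last."""
    if not 0 <= first <= last <= 0x10FFFF:
        raise ValueError("expected 0 <= first <= last <= 0x10FFFF")
    # Codepoints are conventionally written as "U+" followed by
    # at least four uppercase hexadecimal digits.
    return f"{host}/U+{first:04X}..U+{last:04X}"


print(range_url(0x0200, 0x0300))  # beta.codepoints.net/U+0200..U+0300
```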
Computers use 0’s and 1’s to store information. To get useful information out of that, in our case to display text, we need an encoding that tells the computer how to transform those 0’s and 1’s into an alphabet. The first standardized encoding was ASCII, which assigns basic Latin upper- and lowercase letters, digits and some punctuation to 128 positions in all. The W3C has published a very good introduction to the topic of character encodings.
128 positions did not suffice for long. Many institutions and companies began to implement their own encodings. In 2010 there were a whopping 250 encodings in wide use, not counting some obscure or privately used ones. This situation proved disastrous when computers started to talk to one another over the Internet. If the sender didn’t specify the encoding of a message, there was a good chance the receiver would only get a stream of nonsense and rubbish.
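To illustrate what such “nonsense” looks like, here is a small Python sketch in which a receiver decodes a message with a different encoding than the sender used. The function name `transmit` and the choice of UTF-8 and Latin-1 are assumptions made for the illustration only.

```python
def transmit(text: str, sender_encoding: str, receiver_encoding: str) -> str:
    """Encode text as the sender would, then decode it as the receiver would."""
    data = text.encode(sender_encoding)
    # Bytes that are invalid in the receiver's encoding become U+FFFD.
    return data.decode(receiver_encoding, errors="replace")


print(transmit("café", "utf-8", "utf-8"))    # café
print(transmit("café", "utf-8", "latin-1"))  # cafÃ©
```

The bytes on the wire are identical in both calls; only the receiver's assumption about the encoding differs.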
Enter Unicode. Apple and Xerox decided in 1987 that this situation could not continue and that a universal encoding scheme was needed. 1991 saw the publication of the first version of Unicode, with the international standardization as ISO 10646 following two years later. (Fun fact: ASCII is standardized in ISO 646; the number for the Unicode standard was deliberately chosen.) Meanwhile the Unicode Consortium began to form in order to guide the further development of the standard.
The most recent version of Unicode is 8.0.0, containing over 110,000 characters in over 100 different scripts. Its encoding form UTF-8, a superset of ASCII, is the most popular encoding worldwide, and the consortium counts Apple, Oracle, Microsoft, Google, IBM, Nokia and many others among its members.
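That UTF-8 is a superset of ASCII can be checked directly: text consisting only of ASCII characters produces the same bytes in both encodings, while other characters take two or more bytes. A short Python sketch (the helper name `utf8_length` is hypothetical):

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


ascii_text = "Unicode"
# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8.
print(ascii_text.encode("ascii") == ascii_text.encode("utf-8"))  # True

for ch in ("A", "\u00e9", "\u342d"):
    print(f"U+{ord(ch):04X} needs {utf8_length(ch)} byte(s)")
# U+0041 needs 1 byte(s)
# U+00E9 needs 2 byte(s)
# U+342D needs 3 byte(s)
```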
Unicode is a mechanism for universally identifying characters. Every character is assigned a “codepoint”, which universally refers to it. For example, the letter “A” is assigned the codepoint 65, the Chinese character “㐭” the codepoint 13357. Codepoints are usually represented in hexadecimal notation, where “A” to “F” represent the numbers 10 to 15.
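The relationship between a character, its decimal codepoint and the hexadecimal notation can be tried out in a few lines of Python. This is only a sketch; the helper name `notation` is hypothetical.

```python
def notation(ch: str) -> str:
    """Return the conventional U+XXXX notation for a single character."""
    return f"U+{ord(ch):04X}"


print(ord("A"), notation("A"))          # 65 U+0041
print(notation(chr(13357)))             # U+342D
# The hexadecimal digits A to F stand for the values 10 to 15.
print([int(d, 16) for d in "ABCDEF"])   # [10, 11, 12, 13, 14, 15]
```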
To bring the sheer mass of the possible 1,114,112 codepoints into a useful order, Unicode is divided into 17 planes, which are further divided into logically connected blocks. There are ten principles that guide the extension and care of the Unicode standard:
This website is a private project coordinated by Manuel Strehl. It is not affiliated with or approved by the Unicode Consortium. You can contact me via:
The content on this website reflects the information found in
The Unicode Consortium. The Unicode Standard, Version 8.0.0, (Mountain View, CA: The Unicode Consortium, 2012. ISBN 978-1-936213-02-3)
which happens to be the most relevant version of the Unicode Standard as of August, 2012.
If you find problems, inaccuracies, bugs or other issues with this site, please e-mail me or file a new bug at the bug tracker. The source code for this site is live on GitHub. If you like, fork the code, enhance it and send me a pull request. (If you don’t have a GitHub account, please send the git patch via e-mail.)
There is no warranty that the content on this site is accurate, complete or error-free! For normative references please refer to the Unicode website itself.
The images representing single Unicode blocks are taken from the font Unidings by George Douros, released under a permissive license. The quotes from Wikipedia are subject to the Creative Commons Attribution Share-alike license. Details can be obtained by following the respective link on each quote. The geographic localization of blocks (used in the “Find My Codepoint” wizard) is based on the categorization on decodeunicode.org, published under the CC BY NC license.
All code provided specifically for Codepoints.net is released under both the GPL and the MIT license, with the licensee free to choose. Content original to this site is released under the Creative Commons Attribution 3.0 Germany license. Attribution in this case is a simple backlink, optionally with the link text “Based on information from Codepoints.net”.
This site uses Piwik to gather statistics about page views. The sole purpose is to enhance this site. If you don’t want your visits to be tracked at all, please follow these instructions:
First of all, we’d like to thank the contributors of the Unicode Consortium, who work to standardize an essential part of computing, the display of characters. The same holds for the authors of Wikipedia, who gather knowledge about many parts of the lettering universe. Their share is an important part of this site.
The Polish translation is kindly provided by professor Janusz S. Bień.
The developers supporting this site with their knowledge, bug reports and input have a fair share in keeping it awesome. We want to thank specifically the people contributing code:
The hosting is done on Uberspace, a fantastic provider with extremely helpful and flexible support.
The LaTeX names are derived from www.w3.org/Math/characters/unicode.xml, which is curated by David Carlisle and provided together with the MathML specification of the W3C.
Many people base their work on Unicode. We want to thank the authors of these fonts for making it possible to re-use them for this project:
The background image on the front page is released under the Creative Commons Attribution license by Flickr user Willi Heidelbach. The button backgrounds on the front page are in the public domain: map of Charlemagne’s empire, 18th century dowser, and NASA Mars Rover.
The “We’re Open Source” image is released under the Creative Commons Attribution Non-Commercial No-Derivations license by Flickr user tima.
The icons are part of the Font Awesome icon set.
Finally I’d like to thank Mathias Bynens for pushing me to publish this site at last.