IAintaBlonde.com

Now that u know, lets get serious…………..

UTF-8 and Unicode Standards :: What is UTF-8?

March 3

UTF-8 stands for Unicode Transformation Format-8. It is an octet-based (8-bit), lossless encoding of Unicode characters.

UTF-8 encodes each Unicode character as a sequence of 1 to 4
octets, where the number of octets depends on the integer value (code
point) assigned to the character. It is an efficient encoding for Unicode
documents that use mostly US-ASCII characters, because it represents each
character in the range U+0000 through U+007F as a single octet. UTF-8
is also the default encoding for XML: a document with no encoding
declaration and no byte order mark is read as UTF-8.
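As a rough illustration of the variable-length scheme described above, here is a minimal Python sketch. The helper name `utf8_length` and the four sample characters are my own choices for illustration, not part of any standard; the sketch only relies on Python's built-in `str.encode` and `bytes.decode`.

```python
def utf8_length(ch: str) -> int:
    """Return how many octets UTF-8 uses for a single character."""
    return len(ch.encode("utf-8"))


# Sample characters, one per sequence length (chosen for illustration):
# U+0041 LATIN CAPITAL LETTER A, U+00E9 e with acute,
# U+20AC EURO SIGN, U+1F600 GRINNING FACE.
samples = ["\u0041", "\u00e9", "\u20ac", "\U0001F600"]

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} octet(s): {encoded.hex(' ')}")
    # Lossless: decoding the octets gives back the original character.
    assert encoded.decode("utf-8") == ch
```

The US-ASCII letter takes one octet, while the other three samples take two, three, and four octets respectively, and every one of them decodes back to the original character.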

Standards

  • RFC 3629: UTF-8, a transformation format of ISO 10646. November 2003.
  • The Unicode Standard 5.0, November 2006. In particular, see the informal description of UTF-8 in sections 2.5 and 2.6, pages 30-32, and a much more formal definition in sections 3.9 and 3.10, pages 77-81.
  • Unicode Demystified: A Practical Programmer's Guide to the Encoding Standard

Articles and background reading

  • UTF-8 and Unicode FAQ for Unix/Linux, by Markus Kuhn
  • Forms of Unicode, an excellent overview by Mark Davis
  • Wikipedia: UTF-8, which contains a good discussion of why five- and six-octet sequences are now illegal in UTF-8
  • Unicode Transformation Formats [czyborra.com]
  • Unicode UTF-8 FAQ
  • Unicode in XML and other Markup Languages: Unicode Technical Report #20
  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!), an amusing and informative article by Joel Spolsky

Character Sets

The MIME character set name for UTF-8 is UTF-8.
Character set names are case-insensitive, so utf-8 is equally
valid [IANA Character Sets].

In an HTML file, place this tag inside <head></head>:

<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">

In an XML prolog, the encoding is typically specified as an attribute:

<?xml version="1.0" encoding="UTF-8" ?>
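To see the declaration in action, the sketch below parses two small documents with Python's standard `xml.etree.ElementTree` module: one that carries the prolog above and one with no prolog at all. The `<greeting>` element, the sample text, and the helper `parse_text` are made up for this example.

```python
import xml.etree.ElementTree as ET

text = "caf\u00e9"  # contains one non-ASCII character (U+00E9)

# Same content, with and without an explicit encoding declaration.
declared = (
    f'<?xml version="1.0" encoding="UTF-8" ?><greeting>{text}</greeting>'
).encode("utf-8")
undeclared = f"<greeting>{text}</greeting>".encode("utf-8")


def parse_text(document: bytes) -> str:
    """Parse raw XML bytes and return the text of the root element."""
    return ET.fromstring(document).text


print(parse_text(declared))
print(parse_text(undeclared))
```

Both calls should return the same string, since this parser falls back to UTF-8 when the bytes carry no declaration. Declaring the encoding explicitly is still the safer habit, because it documents the author's intent for every tool that later reads the file.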

In the Apache server config or in .htaccess, this directive adds charset=UTF-8 to the Content-Type HTTP header for text/html and text/plain content:

AddDefaultCharset UTF-8
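On the receiving side, a client can read the charset back out of the Content-Type value. The sketch below uses Python's standard `email.message.Message` class to do that; the helper name `charset_of` and the sample header strings are assumptions for illustration, not output captured from a real Apache server.

```python
from email.message import Message
from typing import Optional


def charset_of(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, lower-cased,
    or None when the value carries no charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Hypothetical header values, as a server might send them.
print(charset_of("text/html; charset=UTF-8"))
print(charset_of("text/html;charset=utf-8"))
print(charset_of("text/plain"))
```

`get_content_charset` lower-cases its result, so both spellings of the name compare equal, which matches the case-insensitivity rule for character set names. A header with no charset parameter yields `None`, which is the situation the directive above is meant to prevent.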
