About UTF-8 | Unix Systems | 領測軟件測試網

Published: 2007-05-26



Annex B UTF-8

UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. It has the clear advantage that the character addresses U+0000 to U+007F, which correspond to the ASCII (and ISO 646:1991) values 00 to 7F hexadecimal, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
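The single-octet property for U+0000 to U+007F can be checked with a minimal Python sketch. It relies only on the built-in `str.encode`; the helper name `is_ascii_transparent` is hypothetical and chosen for illustration.

```python
def is_ascii_transparent() -> bool:
    """Return True if every address U+0000..U+007F encodes to one octet of the same value."""
    for value in range(0x80):
        encoded = chr(value).encode("utf-8")
        if encoded != bytes([value]):
            return False
    return True


if __name__ == "__main__":
    print(is_ascii_transparent())        # True
    print("A".encode("utf-8").hex())     # 41   (one octet, same value as ASCII)
    print("\u00e9".encode("utf-8").hex())  # c3a9 (U+00E9 needs two octets)
```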

Input and output of up to 21-bit Unicode 3 character addresses for all 1 114 112 characters on the 17 Code Planes 0 through 16 can be cumbersome in normal byte-oriented data systems. In Table B.1, the length of the binary data representation of characters to be encoded (ignoring leading zero bits) determines how many UTF-8 bytes are required.
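As a rough illustration of how the significant bit length drives the byte count, the sketch below maps a character address to the number of UTF-8 bytes it needs. The function name `utf8_length` is an assumption made for this example, and the sketch does not treat any address range specially beyond the U+10FFFF upper bound.

```python
def utf8_length(code_point: int) -> int:
    """Number of UTF-8 bytes needed, based on the bit length without leading zeros."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("address outside Code Planes 0 through 16")
    bits = code_point.bit_length()
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    return 4  # 17 to 21 bits


if __name__ == "__main__":
    # 17 planes of 65 536 addresses each give the 1 114 112 total quoted above.
    print(17 * 0x10000)  # 1114112
    for cp in (0x41, 0x7FF, 0xFFFF, 0x10FFFF):
        print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```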

Table B.1: UTF-8 byte sequences for Unicode character addresses

Data type and length	Unicode address (binary format)	1st Byte	2nd Byte	3rd Byte	4th Byte
Up to 7 bits, encoded as 7-bit ASCII or ISO 646	00000000 0xxxxxxx	0xxxxxxx
8 to 11 bits	00000yyy yyxxxxxx	110yyyyy	10xxxxxx
12 to 16 bits (BMP)	zzzzyyyy yyxxxxxx	1110zzzz	10yyyyyy	10xxxxxx
17 to 21 bits, Code Planes 1-16	000uuuuu zzzzyyyy yyxxxxxx	11110uuu	10uuzzzz	10yyyyyy	10xxxxxx
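The bit patterns of Table B.1 can be turned into a small encoder. The following is a sketch under stated assumptions rather than a reference implementation: the name `encode_utf8` is hypothetical, and the function only applies the bit layout without checking for addresses that other parts of the Unicode Standard may disallow.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one character address using the bit layout of Table B.1."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("address outside Code Planes 0 through 16")
    if code_point < 0x80:        # up to 7 bits: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:       # 8 to 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:     # up to 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # up to 21 bits: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


if __name__ == "__main__":
    print(encode_utf8(0x20AC).hex())   # e282ac
    print(encode_utf8(0x1F600).hex())  # f09f9880
```

For addresses that Python's own codec accepts, the result should match `chr(code_point).encode("utf-8")`, which gives a convenient cross-check.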

During decoding, the number of bytes in each UTF-8 byte sequence can be immediately determined from the first byte of each sequence.
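A minimal sketch of that first-byte test follows. The names `sequence_length` and `split_sequences` are hypothetical. The code inspects only the leading bit pattern of each first byte, so it determines the length but does not by itself establish that a sequence is legal.

```python
from typing import List


def sequence_length(first_byte: int) -> int:
    """Derive the sequence length from the leading bits of the first byte."""
    if first_byte & 0x80 == 0x00:   # 0xxxxxxx
        return 1
    if first_byte & 0xE0 == 0xC0:   # 110yyyyy
        return 2
    if first_byte & 0xF0 == 0xE0:   # 1110zzzz
        return 3
    if first_byte & 0xF8 == 0xF0:   # 11110uuu
        return 4
    raise ValueError(f"0x{first_byte:02X} cannot start a sequence")


def split_sequences(data: bytes) -> List[bytes]:
    """Split a byte string into per-character sequences using first bytes only."""
    sequences = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        if i + n > len(data):
            raise ValueError("truncated sequence at end of input")
        sequences.append(data[i:i + n])
        i += n
    return sequences


if __name__ == "__main__":
    data = "A\u20ac\U0001F600".encode("utf-8")
    print([seq.hex() for seq in split_sequences(data)])
    # ['41', 'e282ac', 'f09f9880']
```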

Legal UTF-8 byte sequences shall conform to Unicode Technical Report 27 as summarized in Table B.2.

Table B.2: Unicode address ranges for legal UTF-8 byte sequences

Unicode address range	1st Byte	2nd Byte	3rd Byte	4th Byte
U+0000 to U+007F	00...7F
U+0080 to U+07FF	C2...DF	80...BF
U+0800 to U+0FFF	E0	A0...BF	80...BF
U+1000 to U+FFFF	E1...EF	80...BF	80...BF
U+10000 to U+3FFFF	F0	90...BF	80...BF	80...BF
U+40000 to U+FFFFF	F1...F3	80...BF	80...BF	80...BF
U+100000 to U+10FFFF	F4	80...8F	80...BF	80...BF
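The ranges of Table B.2 lend themselves to a table-driven check. The sketch below is illustrative only; `LEGAL_RANGES` and `is_legal_sequence` are hypothetical names. Two assumptions are worth stating. First, the second byte after F4 is limited to 80...8F, the range that keeps the address at or below U+10FFFF. Second, the E1...EF row is applied as tabulated, so the sketch does not exclude the surrogate range that later versions of the Unicode Standard rule out.

```python
# Each entry: (first-byte range, second-byte range or None, sequence length).
LEGAL_RANGES = [
    ((0x00, 0x7F), None, 1),
    ((0xC2, 0xDF), (0x80, 0xBF), 2),
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),
    ((0xE1, 0xEF), (0x80, 0xBF), 3),
    ((0xF0, 0xF0), (0x90, 0xBF), 4),
    ((0xF1, 0xF3), (0x80, 0xBF), 4),
    ((0xF4, 0xF4), (0x80, 0x8F), 4),  # assumption: keeps the address <= U+10FFFF
]


def is_legal_sequence(seq: bytes) -> bool:
    """Check a single UTF-8 byte sequence against the tabulated byte ranges."""
    if not seq:
        return False
    for (lo, hi), second, length in LEGAL_RANGES:
        if lo <= seq[0] <= hi:
            if len(seq) != length:
                return False
            if second is not None and not second[0] <= seq[1] <= second[1]:
                return False
            # Third and fourth bytes, when present, are always 80...BF.
            return all(0x80 <= b <= 0xBF for b in seq[2:])
    return False  # first byte is 80...C1 or F5...FF


if __name__ == "__main__":
    print(is_legal_sequence(b"\xe2\x82\xac"))      # True  (U+20AC)
    print(is_legal_sequence(b"\xc0\x80"))          # False (C0 is not a legal first byte)
    print(is_legal_sequence(b"\xf4\x90\x80\x80"))  # False (would exceed U+10FFFF)
```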

Original source: http://www.kjueaiud.com
