  • About UTF-8

    Published: 2007-05-26

    Annex B: UTF-8


    UTF-8 compaction mode is principally designed to support data systems with 8-bit communications paths. It has the clear advantage that the character addresses U+0000hex to U+007Fhex, which correspond to the ASCII (and ISO 646:1991) values 00hex to 7Fhex, are represented by single octets of the same value. It is straightforward both to generate and to parse, and it produces reasonable compaction.
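    The single-octet property for U+0000 to U+007F can be checked with Python's built-in codecs. This is only an illustrative sketch; the sample string is an arbitrary assumption, not something taken from the annex.

```python
# Arbitrary sample text containing only characters in U+0000..U+007F.
ascii_text = "UTF-8 mode"

# Each character becomes exactly one octet whose value equals its address.
octets = ascii_text.encode("utf-8")
addresses = [ord(ch) for ch in ascii_text]

assert list(octets) == addresses
assert octets == ascii_text.encode("ascii")
```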


    Unicode 3 character addresses are up to 21 bits long and cover all 1 114 112 addresses on the 17 Code Planes 0 through 16, so their input and output can be cumbersome in normal byte-oriented data systems. As Table B.1 shows, the length of the binary representation of the character to be encoded (ignoring leading zero bits) determines how many UTF-8 bytes are required.
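    As a rough sketch of that rule, the number of significant bits in an address can be mapped to a byte count as follows. The function name `utf8_length` is a hypothetical helper introduced for illustration only.

```python
def utf8_length(address: int) -> int:
    """Return how many UTF-8 bytes a Unicode address needs (hypothetical helper)."""
    if not 0 <= address <= 0x10FFFF:
        raise ValueError(f"address outside Code Planes 0 to 16: {address:#x}")
    bits = address.bit_length()  # length ignoring leading zero bits
    if bits <= 7:
        return 1
    if bits <= 11:
        return 2
    if bits <= 16:
        return 3
    return 4  # 17 to 21 bits


# 17 planes of 65 536 addresses each give the total quoted in the text.
TOTAL_ADDRESSES = 17 * 0x10000
```

    For example, `utf8_length(0x7FF)` is 2 while `utf8_length(0x800)` is 3, because the latter needs 12 bits.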


    Table B.1: UTF-8 byte sequences for Unicode character addresses

    Data type and length                              Unicode address (binary format)  1st Byte  2nd Byte  3rd Byte  4th Byte
    Up to 7 bits, encoded as 7-bit ASCII or ISO 646   00000000 0xxxxxxx                0xxxxxxx
    8 to 11 bits                                      00000yyy yyxxxxxx                110yyyyy  10xxxxxx
    12 to 16 bits (BMP)                               zzzzyyyy yyxxxxxx                1110zzzz  10yyyyyy  10xxxxxx
    17 to 21 bits, Code Planes 1 to 16                000uuuuu zzzzyyyy yyxxxxxx       11110uuu  10uuzzzz  10yyyyyy  10xxxxxx

    During decoding, the number of bytes in each UTF-8 byte sequence can be immediately determined from the first byte of each sequence.
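    A decoder can read the length from the leading bits of the first byte, as sketched below. The name `sequence_length` is hypothetical, and the check covers only the bit pattern: some first bytes that match a pattern (C0, C1, F5 to F7) never start a legal sequence.

```python
def sequence_length(first_byte: int) -> int:
    """Return the UTF-8 sequence length implied by a first byte (sketch)."""
    if first_byte & 0x80 == 0x00:   # 0xxxxxxx
        return 1
    if first_byte & 0xE0 == 0xC0:   # 110yyyyy
        return 2
    if first_byte & 0xF0 == 0xE0:   # 1110zzzz
        return 3
    if first_byte & 0xF8 == 0xF0:   # 11110uuu
        return 4
    # 10xxxxxx is a continuation byte; anything else is not a valid first byte.
    raise ValueError(f"not a valid first byte: {first_byte:#04x}")


def split_sequences(data: bytes) -> list[bytes]:
    """Split a byte string into per-character sequences using first bytes only."""
    out, i = [], 0
    while i < len(data):
        n = sequence_length(data[i])
        out.append(data[i:i + n])
        i += n
    return out
```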


    Legal UTF-8 byte sequences shall conform to Unicode Technical Report 27 as summarized in Table B.2.






    Table B.2: Unicode address ranges for legal UTF-8 byte sequences

    Unicode address range    1st Byte  2nd Byte  3rd Byte  4th Byte
    U+0000 to U+007F         00..7F
    U+0080 to U+07FF         C2..DF    80..BF
    U+0800 to U+0FFF         E0        A0..BF    80..BF
    U+1000 to U+FFFF         E1..EF    80..BF    80..BF
    U+10000 to U+3FFFF       F0        90..BF    80..BF    80..BF
    U+40000 to U+FFFFF       F1..F3    80..BF    80..BF    80..BF
    U+100000 to U+10FFFF     F4        80..8F    80..BF    80..BF
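    The legality rules can be expressed as a small lookup keyed on the first byte. This is a sketch under stated assumptions: `LEGAL_RANGES` and `is_legal_sequence` are hypothetical names; the second byte after F4 is limited to 80..8F, because any higher value would address beyond U+10FFFF; and, like the table, the sketch does not exclude the surrogate range, which later Unicode versions rule out by restricting the second byte after ED to 80..9F.

```python
# (first-byte range, second-byte range or None, total length).
# Third and fourth bytes, when present, are always 80..BF.
LEGAL_RANGES = [
    ((0x00, 0x7F), None,         1),  # U+0000 to U+007F
    ((0xC2, 0xDF), (0x80, 0xBF), 2),  # U+0080 to U+07FF
    ((0xE0, 0xE0), (0xA0, 0xBF), 3),  # U+0800 to U+0FFF
    ((0xE1, 0xEF), (0x80, 0xBF), 3),  # U+1000 to U+FFFF
    ((0xF0, 0xF0), (0x90, 0xBF), 4),  # U+10000 to U+3FFFF
    ((0xF1, 0xF3), (0x80, 0xBF), 4),  # U+40000 to U+FFFFF
    ((0xF4, 0xF4), (0x80, 0x8F), 4),  # U+100000 to U+10FFFF
]


def is_legal_sequence(seq: bytes) -> bool:
    """Check one complete byte sequence against the ranges above (sketch)."""
    if not seq:
        return False
    for (first_lo, first_hi), second, length in LEGAL_RANGES:
        if first_lo <= seq[0] <= first_hi:
            if len(seq) != length:
                return False
            if second is None:
                return True
            if not second[0] <= seq[1] <= second[1]:
                return False
            return all(0x80 <= b <= 0xBF for b in seq[2:])
    return False  # 80..C1 and F5..FF never start a legal sequence
```

    With these ranges, overlong forms such as C0 80 or E0 80 80 are rejected, while F4 8F BF BF (U+10FFFF) is accepted.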


    Original source: http://www.kjueaiud.com
