UTF-8 Encode and Decode

UTF-8 is a variable-length character encoding for Unicode (also known as the universal code) and one of the standard ways of implementing it. UTF-8 uses 1 to 4 bytes per character, which saves storage space compared with a fixed four-byte representation such as UTF-32. The correspondence between UTF-8 byte length and Unicode code point range is as follows:
        One byte (0x00-0x7F) -> U+0000~U+007F
        Two bytes (0xC280-0xDFBF) -> U+0080~U+07FF
        Three bytes (0xE0A080-0xEFBFBF) -> U+0800~U+FFFF
        Four bytes (0xF0908080-0xF48FBFBF) -> U+10000~U+10FFFF
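
The ranges above can be turned into a small lookup. The following is a minimal Python sketch, not a reference implementation; the function name utf8_length is hypothetical and chosen only for illustration. It also rejects the surrogate range U+D800~U+DFFF, which lies inside the three-byte range but cannot be encoded in well-formed UTF-8.

```python
def utf8_length(code_point: int) -> int:
    """Return how many bytes UTF-8 needs for one Unicode code point.

    Hypothetical helper for illustration only.
    """
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("code point outside the Unicode range")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded in UTF-8")
    if code_point <= 0x7F:
        return 1  # U+0000~U+007F
    if code_point <= 0x7FF:
        return 2  # U+0080~U+07FF
    if code_point <= 0xFFFF:
        return 3  # U+0800~U+FFFF
    return 4      # U+10000~U+10FFFF


# Cross-check against Python's built-in encoder:
# "A", e with acute accent, a common Chinese character, an emoji.
for ch in ("A", "\u00e9", "\u4e2d", "\U0001F600"):
    assert utf8_length(ord(ch)) == len(ch.encode("utf-8"))
```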
        The characters U+0000 to U+007F (ASCII) are encoded as the single bytes 0x00 to 0x7F, so UTF-8 is ASCII-compatible. This means that a file containing only 7-bit ASCII characters is byte-for-byte identical in ASCII and in UTF-8.
        All characters above U+007F are encoded as a sequence of multiple bytes. Each byte carries marker bits in its high-order positions: the lead byte starts with 110, 1110, or 11110 to announce a two-, three-, or four-byte sequence, and every following byte starts with 10. Most common Chinese characters are encoded as three bytes.
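
To make the marker bits concrete, here is a minimal Python sketch of encoding and decoding a single code point by hand. The names utf8_encode and utf8_decode_one are hypothetical and exist only for this illustration; in practice Python's built-in str.encode("utf-8") and bytes.decode("utf-8") do this work. The decoder assumes its input holds exactly one UTF-8 sequence.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative sketch)."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not an encodable Unicode code point")
    if code_point <= 0x7F:
        return bytes([code_point])                       # 0xxxxxxx
    if code_point <= 0x7FF:
        return bytes([0xC0 | (code_point >> 6),          # 110xxxxx
                      0x80 | (code_point & 0x3F)])       # 10xxxxxx
    if code_point <= 0xFFFF:
        return bytes([0xE0 | (code_point >> 12),         # 1110xxxx
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    return bytes([0xF0 | (code_point >> 18),             # 11110xxx
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


def utf8_decode_one(data: bytes) -> int:
    """Decode a byte string holding exactly one UTF-8 sequence."""
    if not data:
        raise ValueError("empty input")
    lead = data[0]
    # The lead byte's high bits give the sequence length; `minimum` is the
    # smallest code point allowed for that length (rejects overlong forms).
    if lead < 0x80:
        length, value, minimum = 1, lead, 0x0
    elif 0xC0 <= lead <= 0xDF:
        length, value, minimum = 2, lead & 0x1F, 0x80
    elif 0xE0 <= lead <= 0xEF:
        length, value, minimum = 3, lead & 0x0F, 0x800
    elif 0xF0 <= lead <= 0xF7:
        length, value, minimum = 4, lead & 0x07, 0x10000
    else:
        raise ValueError("invalid lead byte")
    if len(data) != length:
        raise ValueError("wrong number of bytes for this lead byte")
    for b in data[1:]:
        if b & 0xC0 != 0x80:                             # must be 10xxxxxx
            raise ValueError("invalid continuation byte")
        value = (value << 6) | (b & 0x3F)
    if value < minimum or value > 0x10FFFF or 0xD800 <= value <= 0xDFFF:
        raise ValueError("overlong or out-of-range sequence")
    return value


# U+4E2D, a common Chinese character, takes three bytes.
assert utf8_encode(0x4E2D) == b"\xe4\xb8\xad"
assert utf8_decode_one(b"\xe4\xb8\xad") == 0x4E2D
```

The decoder checks the minimum value for each length so that overlong forms, such as a two-byte sequence for an ASCII character, are rejected instead of silently accepted.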

