
Famous stories about the transmission of information (from ancient times to the present)

Posted by a user on 2022-04-22 14:39


2 answers

Helpful user, 2023-11-09 12:55

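The content below is reposted from my Baidu Space, where I collected it. If the formatting looks poor here, you can read the article directly in my space: http://hi.baidu.com/newkedison/blog/item/1c7d2c392cc192f63b87ce12.html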

Some notes on UTF-8
Friday, June 13, 2008, 08:17

I. Most important: converting between UTF-8 and Unicode

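UTF-8 is a widely used encoding of Unicode (UCS), the character set that aims to cover all of the world's languages in one unified code; it already includes the major Asian languages. UTF stands for UCS Transformation Format.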

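UTF-8 represents each character with a variable number of bytes. The original design allowed up to 6 bytes per character; RFC 3629 later restricted UTF-8 to at most 4 bytes, which is enough for every code point up to U+10FFFF. UTF-8 is compatible with ASCII (0-127): an ASCII character is encoded in UTF-8 exactly as it is in ASCII. Characters that need more than one byte follow the rules below: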

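In the first (leftmost) byte, the number of leading 1 bits gives the number of bytes in the character's encoding. For example, a two-byte character has the form 110xxxxx 10xxxxxx; a three-byte character has the form 1110xxxx 10xxxxxx 10xxxxxx; and so on, up to the six-byte form 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx. The x bits are filled with the bits of the character's code point in binary. Only the shortest multi-byte sequence that can represent the code point may be used. For example: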

Unicode character 00 A9 (copyright sign) = 1010 1001; its UTF-8 encoding is 11000010 10101001 = 0xC2 0xA9. Character 22 60 (not-equal sign) = 0010 0010 0110 0000; its UTF-8 encoding is 11100010 10001001 10100000 = 0xE2 0x89 0xA0.

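The conversion examples above have been checked and are correct. If they are not clear at first, work through the bit patterns once more.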
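The two examples can be reproduced with a short Python sketch. This is only an illustration of the bit-filling rule described above, not code from the original post; the function name `utf8_encode` is made up for this example, and it is limited to the 4-byte forms that RFC 3629 allows.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point with the bit patterns described above (1 to 4 bytes).

    Assumption for this sketch: surrogate code points (D800-DFFF) are not
    rejected, although a strict encoder would refuse them.
    """
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("code point out of range")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


print(utf8_encode(0x00A9).hex(" "))  # c2 a9
print(utf8_encode(0x2260).hex(" "))  # e2 89 a0
```

Because each branch is chosen by the smallest range that contains the code point, the result is always the shortest form, as the rule requires.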

Correspondence between Unicode code points and UTF-8 encodings
The table below summarizes the format of these different octet types.
The letter x indicates bits available for encoding bits of the
character number.

Char. number range | UTF-8 octet sequence
(hexadecimal) | (binary)
--------------------+---------------------------------------------
0000 0000-0000 007F | 0xxxxxxx
0000 0080-0000 07FF | 110xxxxx 10xxxxxx
0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
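The table above maps Unicode code point ranges to UTF-8 byte patterns. Commonly used Chinese characters fall in the range 0000 0800-0000 FFFF, so each of them takes three bytes in UTF-8.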
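Read in the other direction, the table is enough to decode a byte sequence. The following Python sketch is an illustration under that reading (the name `utf8_decode_one` is hypothetical and not part of the original post): it picks the row from the lead byte, collects six bits from each continuation byte, and rejects sequences that are longer than necessary.

```python
def utf8_decode_one(data: bytes, i: int = 0):
    """Decode one character starting at data[i]; return (code_point, bytes_used)."""
    b = data[i]
    if b < 0x80:                      # 0xxxxxxx
        return b, 1
    if (b & 0xE0) == 0xC0:            # 110xxxxx
        n, cp, lowest = 2, b & 0x1F, 0x80
    elif (b & 0xF0) == 0xE0:          # 1110xxxx
        n, cp, lowest = 3, b & 0x0F, 0x800
    elif (b & 0xF8) == 0xF0:          # 11110xxx
        n, cp, lowest = 4, b & 0x07, 0x10000
    else:
        raise ValueError("invalid lead byte")
    if i + n > len(data):
        raise ValueError("truncated sequence")
    for c in data[i + 1:i + n]:
        if (c & 0xC0) != 0x80:        # every following byte must be 10xxxxxx
            raise ValueError("invalid continuation byte")
        cp = (cp << 6) | (c & 0x3F)
    if cp < lowest or cp > 0x10FFFF:  # not the shortest form, or beyond the table
        raise ValueError("overlong or out-of-range sequence")
    return cp, n


cp, used = utf8_decode_one("中".encode("utf-8"))
print(hex(cp), used)  # 0x4e2d 3
```

The character used in the example, U+4E2D, sits in the third row of the table, which is why three bytes are consumed.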

II. About the BOM

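UTF-8 uses the byte as its code unit, so it has no byte-order problem. UTF-16 uses two bytes as its code unit, so before interpreting UTF-16 text you must know the byte order of each code unit. For example, the Unicode code point of "奎" is 594E and that of "乙" is 4E59. If we receive the UTF-16 byte stream "594E", is it "奎" or "乙"?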
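The ambiguity is easy to see in a small Python illustration (not part of the original post), which decodes the same two bytes with both byte orders:

```python
raw = bytes([0x59, 0x4E])

# The same two bytes give different characters depending on the assumed byte order.
print(raw.decode("utf-16-be"))  # 奎 (U+594E)
print(raw.decode("utf-16-le"))  # 乙 (U+4E59)
```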

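The method the Unicode standard recommends for marking byte order is the BOM. This BOM is not the "Bill Of Material" kind but the Byte Order Mark. It is a rather clever idea:

UCS has a character called "ZERO WIDTH NO-BREAK SPACE" whose code point is FEFF. FFFE, by contrast, is not a character in UCS, so it should never appear in real data. The UCS specification recommends transmitting the character "ZERO WIDTH NO-BREAK SPACE" before the byte stream itself.

So if the receiver sees FEFF, the byte stream is Big-Endian; if it sees FFFE, the stream is Little-Endian. For this reason the character "ZERO WIDTH NO-BREAK SPACE" is also called the BOM.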

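UTF-8 does not need a BOM to indicate byte order, but a BOM can be used to mark the encoding. The UTF-8 encoding of "ZERO WIDTH NO-BREAK SPACE" is EF BB BF (readers can verify this with the encoding method described earlier). So a receiver that gets a byte stream starting with EF BB BF knows that it is UTF-8.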
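A receiver could apply these rules roughly as in the Python sketch below. The helper `detect_bom` is a hypothetical name for this illustration, and it covers only the three BOMs discussed here; UTF-32 BOMs are deliberately ignored.

```python
def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no known BOM is found."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8", 3
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be", 2
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le", 2
    return None, 0


# U+FEFF encoded as UTF-8 gives the three bytes quoted above.
print("\ufeff".encode("utf-8").hex(" "))  # ef bb bf

# A little-endian stream: BOM FF FE, then the bytes 59 4E.
stream = bytes([0xFF, 0xFE, 0x59, 0x4E])
encoding, skip = detect_bom(stream)
print(encoding, stream[skip:].decode(encoding))  # utf-16-le 乙
```

With the BOM in front, the bytes 59 4E from the earlier example are no longer ambiguous.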

III. VB functions for converting UTF-8 to Unicode

1. Without the API

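Function Utf8ToUnicode(ByRef Utf() As Byte) As String
    Dim utfLen As Long

    ' UBound raises an error on an uninitialized array; treat that case as empty input.
    utfLen = -1
    On Error Resume Next
    utfLen = UBound(Utf)
    If utfLen = -1 Then Exit Function

    On Error GoTo 0

    Dim i As Long, j As Long, k As Long, N As Long
    Dim B As Byte, cnt As Byte
    Dim Buf() As String
    ReDim Buf(utfLen)

    i = 0
    j = 0
    Do While i <= utfLen
        B = Utf(i)

        ' The number of leading 1 bits in the first byte gives the sequence length.
        If (B And &HFC) = &HFC Then
            cnt = 6
        ElseIf (B And &HF8) = &HF8 Then
            cnt = 5
        ElseIf (B And &HF0) = &HF0 Then
            cnt = 4
        ElseIf (B And &HE0) = &HE0 Then
            cnt = 3
        ElseIf (B And &HC0) = &HC0 Then
            cnt = 2
        Else
            cnt = 1
        End If

        ' The input ends in the middle of a sequence.
        If i + cnt - 1 > utfLen Then
            Buf(j) = "?"
            Exit Do
        End If

        ' Keep only the payload bits of the first byte.
        Select Case cnt
        Case 2
            N = B And &H1F
        Case 3
            N = B And &HF
        Case 4
            N = B And &H7
        Case 5
            N = B And &H3
        Case 6
            N = B And &H1
        Case Else
            ' Single byte: ASCII (a stray continuation byte is passed through unchanged).
            Buf(j) = Chr(B)
            GoTo Continued
        End Select

        ' Each following byte (10xxxxxx) contributes six more bits.
        For k = 1 To cnt - 1
            B = Utf(i + k)
            N = N * &H40 + (B And &H3F)
        Next

        If N <= &HFFFF& Then
            Buf(j) = ChrW(N)
        ElseIf N <= &H10FFFF Then
            ' ChrW cannot take values above &HFFFF, so emit a UTF-16 surrogate pair.
            N = N - &H10000
            Buf(j) = ChrW(&HD800& + N \ &H400) & ChrW(&HDC00& + (N And &H3FF))
        Else
            ' Beyond the Unicode range.
            Buf(j) = "?"
        End If
Continued:
        i = i + cnt
        j = j + 1
    Loop

    Utf8ToUnicode = Join(Buf, "")
End Function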

2. Using the API (including Unicode to UTF-8)

Private Declare Function WideCharToMultiByte Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpDefaultChar As String, ByVal lpUsedDefaultChar As Long) As Long
Private Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Private Const CP_UTF8 = 65001

Function Utf8ToUnicode(ByRef Utf() As Byte) As String
    Dim lRet As Long
    Dim lLength As Long
    Dim lBufferSize As Long
    lLength = UBound(Utf) - LBound(Utf) + 1
    If lLength <= 0 Then Exit Function
    ' One character per input byte would already be enough; the factor of 2 is spare room.
    lBufferSize = lLength * 2
    Utf8ToUnicode = String$(lBufferSize, Chr(0))
    lRet = MultiByteToWideChar(CP_UTF8, 0, VarPtr(Utf(LBound(Utf))), lLength, StrPtr(Utf8ToUnicode), lBufferSize)
    ' lRet is the number of characters written; 0 means the call failed.
    If lRet <> 0 Then
        Utf8ToUnicode = Left(Utf8ToUnicode, lRet)
    Else
        Utf8ToUnicode = ""
    End If
End Function

Function UnicodeToUtf8(ByVal UCS As String) As Byte()
    Dim lLength As Long
    Dim lBufferSize As Long
    Dim lResult As Long
    Dim abUTF8() As Byte
    lLength = Len(UCS)
    If lLength = 0 Then Exit Function
    ' Each UTF-16 code unit becomes at most 3 UTF-8 bytes.
    lBufferSize = lLength * 3 + 1
    ReDim abUTF8(lBufferSize - 1)
    lResult = WideCharToMultiByte(CP_UTF8, 0, StrPtr(UCS), lLength, abUTF8(0), lBufferSize, vbNullString, 0)
    ' lResult is the number of bytes written; shrink the array to that size.
    If lResult <> 0 Then
        lResult = lResult - 1
        ReDim Preserve abUTF8(lResult)
        UnicodeToUtf8 = abUTF8
    End If
End Function

Private Sub Command1_Click()
    Dim byt() As Byte
    byt = UnicodeToUtf8("测试")
    Debug.Print Hex(byt(0)) & Hex(byt(1)) & Hex(byt(2))
    Debug.Print Utf8ToUnicode(byt())
End Sub
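To know what the test routine above should print, the same string can be encoded independently. The Python lines below are only a cross-check and not part of the original answer:

```python
data = "测试".encode("utf-8")

print(data.hex().upper())      # E6B58BE8AF95, all six bytes
print(data[:3].hex().upper())  # E6B58B, the first three bytes that Command1_Click prints
print(data.decode("utf-8"))    # 测试
```

Note that VB's Hex does not pad to two digits, so the two outputs only line up character for character because every byte here is at least &H10.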

Reference: http://hi.baidu.com/newkedison/blog/item/1c7d2c392cc192f63b87ce12.html

Helpful user, 2023-11-09 12:56

' Copy the code below into a module.
' Usage: Text1.Text = UTF8_Decode(UTF8Zfc)
' Note: convert the data directly after downloading it; do not apply any other conversion (such as StrConv) first.

'*************** Module code ********************
' API declaration for converting UTF-8 characters to Unicode characters
Public Declare Function MultiByteToWideChar Lib "kernel32" (ByVal CodePage As Long, ByVal dwFlags As Long, ByRef lpMultiByteStr As Any, ByVal cchMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long
Public Const CP_UTF8 = 65001
' Declarations for querying the system type
Private Declare Function GetVersionExA Lib "kernel32" (lpVersionInformation As OSVERSIONINFO) As Long
Private Type OSVERSIONINFO
    dwOSVersionInfoSize As Long
    dwMajorVersion As Long
    dwMinorVersion As Long
    dwBuildNumber As Long
    dwPlatformId As Long
    szCSDVersion As String * 128
End Type

' Get the system type.
' The result starts with "1" for the Windows 9x family and "2" for the NT family.
Public Function GetVersion() As String
    Dim osinfo As OSVERSIONINFO
    Dim retvalue As Long

    osinfo.dwOSVersionInfoSize = 148
    osinfo.szCSDVersion = Space$(128)
    retvalue = GetVersionExA(osinfo)

    With osinfo
        Select Case .dwPlatformId
        Case 1
            Select Case .dwMinorVersion
            Case 0
                GetVersion = "1Windows 95"
            Case 10
                GetVersion = "1Windows 98"
            Case 90
                GetVersion = "1Windows Millennium"
            End Select
        Case 2
            Select Case .dwMajorVersion
            Case 3
                GetVersion = "2Windows NT 3.51"
            Case 4
                GetVersion = "2Windows NT 4.0"
            Case 5
                If .dwMinorVersion = 0 Then
                    GetVersion = "2Windows 2000"
                Else
                    GetVersion = "2Windows XP"
                End If
            Case Is >= 6
                ' Without this case, newer systems returned an empty string
                ' and UTF8_Decode fell back to its manual decoder.
                GetVersion = "2Windows Vista or later"
            End Select
        Case Else
            GetVersion = "Failed"
        End Select
    End With
End Function

' Purpose: convert UTF-8 characters to Unicode characters.
' sUTF8 is expected to hold the raw UTF-8 bytes, which is why no StrConv is applied.
Public Function UTF8_Decode(ByVal sUTF8 As String) As String
    Dim lngUtf8Size As Long
    Dim strBuffer As String
    Dim lngBufferSize As Long
    Dim lngResult As Long
    Dim bytUtf8() As Byte
    Dim n As Long
    If LenB(sUTF8) = 0 Then Exit Function
    If Left(GetVersion(), 1) = "2" Then
        ' NT family: let the API do the conversion.
        On Error GoTo EndFunction
        'bytUtf8 = StrConv(sUTF8, vbFromUnicode)
        bytUtf8 = sUTF8
        lngUtf8Size = UBound(bytUtf8) + 1
        On Error GoTo 0
        'Set buffer for longest possible string i.e. each byte is
        'ANSI, thus 1 unicode(2 bytes)for every utf-8 character.
        lngBufferSize = lngUtf8Size * 2
        strBuffer = String$(lngBufferSize, vbNullChar)
        'Translate using code page 65001(UTF-8)
        lngResult = MultiByteToWideChar(CP_UTF8, 0, bytUtf8(0), _
            lngUtf8Size, StrPtr(strBuffer), lngBufferSize)
        'Trim result to actual length
        If lngResult Then
            UTF8_Decode = Left(strBuffer, lngResult)
        End If
    Else
        ' Windows 9x family: decode by hand.
        Dim i As Long
        Dim TopIndex As Long
        Dim TwoBytes(1) As Byte
        Dim ThreeBytes(2) As Byte
        Dim AByte As Byte
        Dim TStr As String
        Dim BArray() As Byte

        'Resume on error in case someone inputs text with accents
        'that should have been encoded as UTF-8
        On Error Resume Next

        TopIndex = LenB(sUTF8) ' Number of bytes equal TopIndex+1
        If TopIndex = 0 Then Exit Function ' get out if there's nothing to convert
        'BArray = StrConv(sUTF8, vbFromUnicode)
        BArray = sUTF8
        i = 0 ' Initialise pointer
        TopIndex = TopIndex - 1
        ' Iterate through the Byte Array
        Do While i <= TopIndex
            AByte = BArray(i)
            If AByte < &H80 Then
                ' Normal ANSI character - use it as is
                TStr = TStr & Chr$(AByte): i = i + 1 ' Increment byte array index
            ElseIf AByte >= &HF0 Then
                ' Start of a 4 byte UTF-8 group: not handled by this fallback, emit "?"
                TStr = TStr & "?": i = i + 4
            ElseIf AByte >= &HE0 Then 'was = &HE1 Then
                ' Start of 3 byte UTF-8 group for a character
                ' Copy 3 byte to ThreeBytes
                ThreeBytes(0) = BArray(i): i = i + 1
                ThreeBytes(1) = BArray(i): i = i + 1
                ThreeBytes(2) = BArray(i): i = i + 1
                ' Combine the payload bits into one code unit.
                ' &H1000& is a Long, so the product cannot overflow an Integer.
                TStr = TStr & ChrW$((ThreeBytes(0) And &HF) * &H1000& + (ThreeBytes(1) And &H3F) * &H40 + (ThreeBytes(2) And &H3F))
            ElseIf (AByte >= &HC2) And (AByte <= &HDF) Then
                ' Start of 2 byte UTF-8 group for a character (lead bytes C2 to DF)
                TwoBytes(0) = BArray(i): i = i + 1
                TwoBytes(1) = BArray(i): i = i + 1
                ' Combine the payload bits into one code unit
                TStr = TStr & ChrW$((TwoBytes(0) And &H1F) * &H40 + (TwoBytes(1) And &H3F))
            Else
                ' Normal ANSI character - use it as is
                TStr = TStr & Chr$(AByte): i = i + 1 ' Increment byte array index
            End If
        Loop
        UTF8_Decode = TStr ' Return the resultant string
        Erase BArray
    End If

EndFunction:

End Function