Thursday, September 10, 2009

UTF8ToUTF16

I wrote this script because I had to convert from UTF-8 to UTF-16 in VBScript, and it turned out to be very unintuitive. A definition of UTF-8 can be found here.

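I have only used it on three-byte sequences (those whose lead byte starts with 1110), but the same principle applies to the other sequence lengths. The function takes a single UTF-8 encoded character as a hex string, such as "E282AC" for the euro sign, and returns the UTF-16 value as hex, such as "20AC".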
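To see what the three-byte branch does, take E2 82 AC (U+20AC). The marker bits (1110 on the lead byte, 10 on each continuation byte) are dropped and the remaining payload bits, shown in bold, are concatenated into one 16-bit value:

```latex
\begin{aligned}
\underbrace{1110\,\mathbf{0010}}_{\texttt{E2}}\;
\underbrace{10\,\mathbf{000010}}_{\texttt{82}}\;
\underbrace{10\,\mathbf{101100}}_{\texttt{AC}}
&\;\longrightarrow\; \mathbf{0010}\,\mathbf{000010}\,\mathbf{101100} \\
&= 0010\;0000\;1010\;1100_2 = \texttt{20AC}_{16}
\end{aligned}
```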
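Function UTF8ToUTF16(strUTF8)
' strUTF8 is one UTF-8 encoded character as hex, e.g. "E282AC"
dim binUTF8, strResult
binUTF8 = HexToBin(strUTF8)
if left(binUTF8, 1) = "0" then
' 0xxxxxxx: 7 payload bits, left-padded to 16
strResult = "000000000" & mid(binUTF8, 2, 7)
elseif left(binUTF8, 3) = "110" then
' 110xxxxx 10yyyyyy: 5 + 6 payload bits, left-padded to 16
strResult = "00000" & mid(binUTF8, 4, 5) & mid(binUTF8, 11, 6)
elseif left(binUTF8, 4) = "1110" then
' 1110xxxx 10yyyyyy 10zzzzzz: 4 + 6 + 6 = 16 payload bits
strResult = mid(binUTF8, 5, 4) & mid(binUTF8, 11, 6) & mid(binUTF8, 19, 6)
elseif left(binUTF8, 5) = "11110" then
' 11110www 10xxxxxx 10yyyyyy 10zzzzzz: 3 + 6 + 6 + 6 = 21 bits, left-padded to 24.
' This is the code point (above U+FFFF); UTF-16 stores it as a surrogate pair.
strResult = "000" & mid(binUTF8, 6, 3) & mid(binUTF8, 11, 6) & mid(binUTF8, 19, 6) & mid(binUTF8, 27, 6)
end if
UTF8ToUTF16 = BinToHex(strResult)
End Function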
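Characters encoded as four-byte sequences (lead byte 11110) have code points above U+FFFF, which do not fit in a single 16-bit UTF-16 unit; UTF-16 stores them as a surrogate pair. The following is a minimal sketch of that last step, assuming the code point arrives as a hex string such as "1F600". HexToLong and CodePointToUTF16 are hypothetical names, not part of the original script. Decimal constants are used instead of &HD800-style literals because VBScript reads 16-bit hex literals at or above &H8000 as negative Integers.

```vbscript
' Sketch: convert a code point given as hex (e.g. "1F600") into UTF-16
' code units as hex. HexToLong and CodePointToUTF16 are hypothetical names.

' Parses a hex string into a Long, one digit at a time (leading zeros are fine)
Function HexToLong(hexText)
    Dim i, result
    result = CLng(0)
    For i = 1 To Len(hexText)
        ' Position in the digit table minus 1 is the digit's value (0-15)
        result = result * 16 + InStr("0123456789ABCDEF", UCase(Mid(hexText, i, 1))) - 1
    Next
    HexToLong = result
End Function

Function CodePointToUTF16(hexCodePoint)
    Dim cp, offset
    cp = HexToLong(hexCodePoint)
    If cp < 65536 Then
        ' U+0000..U+FFFF: a single 16-bit code unit, padded to 4 hex digits
        CodePointToUTF16 = Right("000" & Hex(cp), 4)
    Else
        ' U+10000..U+10FFFF: subtract 0x10000 (65536), then put the top 10 bits
        ' in a high surrogate (0xD800 = 55296) and the low 10 bits in a
        ' low surrogate (0xDC00 = 56320)
        offset = cp - 65536
        CodePointToUTF16 = Hex(55296 + (offset \ 1024)) & Hex(56320 + (offset Mod 1024))
    End If
End Function
```

For example, CodePointToUTF16("1F600") returns "D83DDE00", and CodePointToUTF16("20AC") returns "20AC" unchanged.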

Here are the HexToBin and BinToHex helper functions that UTF8ToUTF16 uses.
Function HexToBin(hexNumber)
' Expands each hex digit into 4 binary digits, e.g. "E2" -> "11100010"
dim i, hexChar, strResult
for i = 1 to len(hexNumber)
' ucase lets lowercase input such as "e282ac" match the comparisons below
hexChar = ucase(mid(hexNumber, i, 1))
if hexChar = "0" then strResult = strResult & "0000"
if hexChar = "1" then strResult = strResult & "0001"
if hexChar = "2" then strResult = strResult & "0010"
if hexChar = "3" then strResult = strResult & "0011"
if hexChar = "4" then strResult = strResult & "0100"
if hexChar = "5" then strResult = strResult & "0101"
if hexChar = "6" then strResult = strResult & "0110"
if hexChar = "7" then strResult = strResult & "0111"
if hexChar = "8" then strResult = strResult & "1000"
if hexChar = "9" then strResult = strResult & "1001"
if hexChar = "A" then strResult = strResult & "1010"
if hexChar = "B" then strResult = strResult & "1011"
if hexChar = "C" then strResult = strResult & "1100"
if hexChar = "D" then strResult = strResult & "1101"
if hexChar = "E" then strResult = strResult & "1110"
if hexChar = "F" then strResult = strResult & "1111"
next
HexToBin = strResult
End Function

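Function BinToHex(ByVal binNumber)
' Collapses each group of 4 binary digits into one hex digit, e.g. "11100010" -> "E2"
dim i, binChar, strResult
' Left-pad with zeros so the length is a multiple of 4
binNumber = string((4 - len(binNumber) mod 4) mod 4, "0") & binNumber
' One array slot per 4-bit group
redim arrBinNumber((len(binNumber) - 1) \ 4)
for i = 1 to len(binNumber) step 4
arrBinNumber((i - 1) \ 4) = mid(binNumber, i, 4)
next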
for each binChar in arrBinNumber
if binChar = "0000" then strResult = strResult & "0"
if binChar = "0001" then strResult = strResult & "1"
if binChar = "0010" then strResult = strResult & "2"
if binChar = "0011" then strResult = strResult & "3"
if binChar = "0100" then strResult = strResult & "4"
if binChar = "0101" then strResult = strResult & "5"
if binChar = "0110" then strResult = strResult & "6"
if binChar = "0111" then strResult = strResult & "7"
if binChar = "1000" then strResult = strResult & "8"
if binChar = "1001" then strResult = strResult & "9"
if binChar = "1010" then strResult = strResult & "A"
if binChar = "1011" then strResult = strResult & "B"
if binChar = "1100" then strResult = strResult & "C"
if binChar = "1101" then strResult = strResult & "D"
if binChar = "1110" then strResult = strResult & "E"
if binChar = "1111" then strResult = strResult & "F"
next
BinToHex = strResult
End Function
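The helpers above work on strings of "0" and "1" characters. VBScript's And operator and ordinary integer arithmetic can do the same decoding directly on byte values. This is a minimal sketch of that alternative, assuming the same input format (one UTF-8 character as a hex string) and covering only 1- to 3-byte sequences. ByteAt and UTF8ToUTF16Arith are hypothetical names, not part of the original script.

```vbscript
' Sketch: decode one UTF-8 character (1 to 3 bytes, given as hex) with
' bit masks instead of binary strings. Hypothetical names throughout.

' Returns the value (0-255) of the n-th byte (1-based) of a hex string like "E282AC"
Function ByteAt(hexText, n)
    Dim hi, lo
    ' InStr gives the 1-based position in the digit table, so subtract 1
    hi = InStr("0123456789ABCDEF", UCase(Mid(hexText, 2 * n - 1, 1))) - 1
    lo = InStr("0123456789ABCDEF", UCase(Mid(hexText, 2 * n, 1))) - 1
    ByteAt = hi * 16 + lo
End Function

' Returns the UTF-16 code unit as 4 hex digits, or "" for 4-byte or invalid lead bytes
Function UTF8ToUTF16Arith(strUTF8)
    Dim b1, cp
    b1 = CLng(ByteAt(strUTF8, 1))
    If b1 < &H80 Then
        ' 0xxxxxxx: the byte is the value
        cp = b1
    ElseIf (b1 And &HE0) = &HC0 Then
        ' 110xxxxx 10yyyyyy: 5 bits shifted left by 6, plus 6 bits
        cp = (b1 And &H1F) * 64 + (ByteAt(strUTF8, 2) And &H3F)
    ElseIf (b1 And &HF0) = &HE0 Then
        ' 1110xxxx 10yyyyyy 10zzzzzz: 4 bits shifted by 12, 6 by 6, plus 6
        cp = (b1 And &HF) * 4096 + (ByteAt(strUTF8, 2) And &H3F) * 64 + (ByteAt(strUTF8, 3) And &H3F)
    Else
        ' 4-byte sequences need a surrogate pair; not handled in this sketch
        UTF8ToUTF16Arith = ""
        Exit Function
    End If
    UTF8ToUTF16Arith = Right("000" & Hex(cp), 4)
End Function
```

For example, UTF8ToUTF16Arith("C3A9") returns "00E9" and UTF8ToUTF16Arith("E282AC") returns "20AC". Multiplying by 64 and 4096 stands in for left shifts, which VBScript lacks.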
