Abstract
With so many character encodings in use today, how application software determines which character set a file is encoded in, and then decodes it correctly, has become a problem that cannot be ignored. This paper analyzes the Windows Notepad bug in which the two characters "联通" (the name of China Unicom) are not displayed correctly. The file is inspected with WinHex to obtain its hexadecimal byte values, the cause of the misjudgment is inferred from those bytes, and the inferred cause is then confirmed by annotating Notepad's IsTextUTF8 function. The analysis further identifies more Chinese characters that Windows Notepad cannot display correctly.
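The byte-level reasoning summarized above can be illustrated with a short sketch. It assumes the file was saved in a GB2312-compatible code page (Python's `gbk` codec here) and uses a hypothetical helper, `looks_like_utf8`, that performs only a structural check of lead and continuation bytes. It is a simplified stand-in for illustration, not Notepad's actual IsTextUTF8 code.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Structural check only (hypothetical helper, not Notepad's code):
    every lead byte must be followed by the right number of 10xxxxxx bytes."""
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                 # 0xxxxxxx: single-byte (ASCII)
            follow = 0
        elif b & 0xE0 == 0xC0:       # 110xxxxx: lead byte of a 2-byte sequence
            follow = 1
        elif b & 0xF0 == 0xE0:       # 1110xxxx: lead byte of a 3-byte sequence
            follow = 2
        elif b & 0xF8 == 0xF0:       # 11110xxx: lead byte of a 4-byte sequence
            follow = 3
        else:                        # stray continuation byte or invalid lead
            return False
        tail = data[i + 1 : i + 1 + follow]
        if len(tail) != follow or any(t & 0xC0 != 0x80 for t in tail):
            return False
        i += 1 + follow
    return True


liantong = "联通".encode("gbk")      # bytes as stored in an ANSI (GBK) file
print(liantong.hex())                # c1aacda8
print([f"{b:08b}" for b in liantong])
print(looks_like_utf8(liantong))     # True: both pairs match 110xxxxx 10xxxxxx
print(looks_like_utf8("中文".encode("gbk")))  # False: second byte is not 10xxxxxx
```

The four bytes C1 AA CD A8 happen to fit the two-byte UTF-8 pattern twice, so a purely structural detector accepts them as UTF-8. A strict UTF-8 decoder would still reject them, because 0xC1 can only start an overlong encoding.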
Source
Software Engineer (《软件工程师》)
2014, No. 9, pp. 22-24 (3 pages)
Keywords
encoding scheme
character set
UTF-8
Notepad
misjudgment