Understanding StreamReader and How to Fix Garbled Text When Reading Files

程序员文章站 2023-12-16 17:56:22


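Many people run into garbled text (mojibake) when reading files. Garbled text is text that was decoded with the wrong encoding. It appears when a file is read with a different encoding from the one it was saved in.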
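To make the cause concrete, here is a minimal sketch that encodes a string with one encoding and decodes the resulting bytes with another. The class and method names are made up for illustration.

```csharp
using System.Text;

static class MojibakeDemo
{
    // Encodes `text` with `writeAs`, then decodes the bytes with `readAs`.
    // When the two encodings differ, the result is usually garbled.
    public static string Roundtrip(string text, Encoding writeAs, Encoding readAs)
    {
        byte[] bytes = writeAs.GetBytes(text); // GetBytes never adds a BOM
        return readAs.GetString(bytes);
    }
}
```

For example, `MojibakeDemo.Roundtrip("测试", Encoding.Unicode, Encoding.UTF8)` returns "KmՋ". The UTF-16 bytes 4B 6D D5 8B are read as the UTF-8 characters K, m and U+054B. Decoding with the same encoding that was used for writing returns "测试" unchanged.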

Demo program:

First, create three text files.


Each file is saved in the encoding it is named after: ansi.txt (ANSI), unicode.txt (Unicode, i.e. UTF-16) and utf8.txt (UTF-8).

All three files contain the same text (the last line is Chinese for "test data"):

~！@#￥%……&*（）
abcdefg
123456789
测试数据
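The three files can also be generated in code if you want to reproduce the demo without a text editor. This is a sketch under several assumptions:

- the "ANSI" file is taken to be GB2312 (code page 936, as on Simplified Chinese Windows);
- lines are separated by CRLF;
- the `TestFiles` class is hypothetical.

On .NET Core / .NET 5+, the GB2312 encoding has to be registered through `CodePagesEncodingProvider` first. On .NET Framework that line can be removed.

```csharp
using System.IO;
using System.Text;

static class TestFiles
{
    // Same text as the article's sample files (CRLF line breaks assumed).
    public const string Content = "~！@#￥%……&*（）\r\nabcdefg\r\n123456789\r\n测试数据";

    // Writes ansi.txt, unicode.txt and utf8.txt into `directory`.
    public static void Create(string directory)
    {
        // Makes Windows code pages such as GB2312 available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Directory.CreateDirectory(directory);

        // "ANSI" assumed to be GB2312 (code page 936); no BOM is written.
        File.WriteAllText(Path.Combine(directory, "ansi.txt"), Content, Encoding.GetEncoding("gb2312"));
        // Encoding.Unicode is UTF-16 LE; its preamble FF FE is written first.
        File.WriteAllText(Path.Combine(directory, "unicode.txt"), Content, Encoding.Unicode);
        // Encoding.UTF8 has a preamble, so the file starts with EF BB BF.
        File.WriteAllText(Path.Combine(directory, "utf8.txt"), Content, Encoding.UTF8);
    }
}
```

Calling `TestFiles.Create(@"h:\testtext")` would produce the three files read below. Only unicode.txt and utf8.txt begin with a byte order mark (BOM).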


The code that reads these files is as follows:

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    public static void Main()
    {
        List<string> lstFilePath = new List<string>()
        {
            "h:\\testtext\\ansi.txt",
            "h:\\testtext\\unicode.txt",
            "h:\\testtext\\utf8.txt"
        };

        foreach (string filePath in lstFilePath)
        {
            // The path-only constructor decodes as UTF-8 unless the file starts with a BOM.
            using (StreamReader reader = new StreamReader(filePath))
            {
                Console.WriteLine("Reading file " + filePath);
                Console.WriteLine(reader.ReadToEnd());
                Console.WriteLine("************************************************************");
            }
        }
    }
}

In the output (a screenshot in the original article), the content of the first file, ansi.txt, is garbled.

The first file is garbled because it was saved in ANSI encoding, while the default StreamReader constructor decodes text as UTF-8.

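StreamReader is designed for character input in a particular encoding, whereas the Stream class is designed for byte input and output. Use StreamReader to read lines of information from a standard text file.

Unless specified otherwise, the default encoding for StreamReader is UTF-8, not the ANSI code page of the current system. UTF-8 handles Unicode characters correctly and provides consistent results on localized versions of the operating system.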
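So why do unicode.txt and utf8.txt display correctly even with the default constructor? The default StreamReader constructors also look for a byte order mark (BOM) at the start of the stream. When they find one, they switch to UTF-16 or UTF-8; only BOM-less data is decoded with the default UTF-8. This explains the output, assuming the Unicode and UTF-8 files were saved with a BOM.

The hypothetical helper below reads a byte array with the default constructor and reports which encoding the reader ended up using:

```csharp
using System.IO;
using System.Text;

static class BomDetectionDemo
{
    // Decodes `bytes` with the default StreamReader constructor and returns
    // the text together with the encoding the reader settled on.
    public static (string Text, Encoding Used) Read(byte[] bytes)
    {
        using var reader = new StreamReader(new MemoryStream(bytes));
        string text = reader.ReadToEnd();
        return (text, reader.CurrentEncoding);
    }
}
```

Feeding it the bytes FF FE 4B 6D D5 8B (a UTF-16 LE BOM followed by "测试") returns "测试" with code page 1200 (UTF-16). The same bytes without FF FE are decoded as UTF-8 (code page 65001) and come out as "KmՋ".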

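So one fix for the problem above is to keep using StreamReader but pass Encoding.Default as the encoding. On .NET Framework running on Chinese-language Windows, Encoding.Default is usually GB2312.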

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class Program
{
    public static void Main()
    {
        List<string> lstFilePath = new List<string>()
        {
            "h:\\testtext\\ansi.txt",
            "h:\\testtext\\unicode.txt",
            "h:\\testtext\\utf8.txt"
        };

        foreach (string filePath in lstFilePath)
        {
            // Encoding.Default is used only for files without a BOM.
            using (StreamReader reader = new StreamReader(filePath, Encoding.Default))
            {
                Console.WriteLine("Reading file " + filePath);
                Console.WriteLine(reader.ReadToEnd());
                Console.WriteLine("************************************************************");
            }
        }
    }
}

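In the output (a screenshot in the original article), all three files now display correctly. The Unicode and UTF-8 files are unaffected because the StreamReader(string, Encoding) constructor still detects byte order marks by default. The encoding passed in is used only when no BOM is found.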

This suggests a conclusion: use StreamReader and pass Encoding.Default as the encoding.

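Unfortunately, this conclusion breaks down in some cases, for example when Encoding.Default on your system is Encoding.UTF8. On .NET Core and .NET 5+ that is always the case, because Encoding.Default there returns UTF-8 regardless of the system code page. As a result, ansi.txt is garbled again.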
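One way to stop depending on Encoding.Default is to name the code page explicitly. This sketch assumes the ANSI files are Simplified Chinese GB2312 (code page 936), and the class name is hypothetical. On .NET Core / .NET 5+ the code page has to be registered via `CodePagesEncodingProvider` before `Encoding.GetEncoding("gb2312")` succeeds.

```csharp
using System.IO;
using System.Text;

static class ExplicitEncodingReader
{
    // Reads a text file as GB2312 (code page 936) unless a BOM says otherwise.
    public static string ReadAsGb2312(string path)
    {
        // Needed on .NET Core / .NET 5+, where code-page encodings are not built in.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding gb2312 = Encoding.GetEncoding("gb2312");

        // This overload detects byte order marks by default, so UTF-8/UTF-16
        // files that start with a BOM are still decoded correctly.
        using var reader = new StreamReader(path, gb2312);
        return reader.ReadToEnd();
    }
}
```

This behaves the same on every runtime. However, it still assumes that every BOM-less file is GB2312.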

The ideal solution is to read each file with the same encoding it was saved in.

So how do you find out which encoding a file uses?

The following code does it:


// Place this method inside a class; it needs: using System; using System.IO; using System.Text;
public static Encoding GetEncoding(string filePath)
{
    if (filePath == null)
    {
        throw new ArgumentNullException(nameof(filePath));
    }
    Encoding encoding1 = Encoding.Default;
    if (File.Exists(filePath))
    {
        using (FileStream stream1 = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        {
            if (stream1.Length > 0)
            {
                // detectEncodingFromByteOrderMarks = true: reading one character makes
                // the reader inspect the BOM and update CurrentEncoding.
                using (StreamReader reader1 = new StreamReader(stream1, true))
                {
                    char[] charArray1 = new char[1];
                    reader1.Read(charArray1, 0, 1);
                    encoding1 = reader1.CurrentEncoding;
                    reader1.BaseStream.Position = 0;
                    // Still UTF-8 means either a real UTF-8 BOM or no BOM at all
                    // (UTF-8 is the reader's default), so check the first bytes.
                    if (encoding1.CodePage == Encoding.UTF8.CodePage)
                    {
                        byte[] buffer1 = Encoding.UTF8.GetPreamble(); // EF BB BF
                        if (stream1.Length >= buffer1.Length)
                        {
                            byte[] buffer2 = new byte[buffer1.Length];
                            stream1.Read(buffer2, 0, buffer2.Length);
                            for (int num1 = 0; num1 < buffer2.Length; num1++)
                            {
                                if (buffer2[num1] != buffer1[num1])
                                {
                                    // No UTF-8 BOM: assume the system ANSI code page.
                                    encoding1 = Encoding.Default;
                                    break;
                                }
                            }
                        }
                        else
                        {
                            encoding1 = Encoding.Default;
                        }
                    }
                }
            }
        }
    }
    return encoding1;
}

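The method first lets StreamReader look for a byte order mark (BOM). If the reader still reports UTF-8, the file either starts with the UTF-8 BOM or has no BOM at all. To tell these apart, the method calls GetPreamble() to get the UTF-8 BOM bytes (EF BB BF), rereads the first bytes of the file and compares them. If they differ, the file is treated as Encoding.Default.

Otherwise it is Encoding.UTF8. One consequence is that a UTF-8 file saved without a BOM is also reported as Encoding.Default.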
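GetEncoding relies on StreamReader to recognise the BOM. An alternative (not the article's method) is to compare the first bytes of the file against the known BOMs directly. Here is a minimal sketch with a hypothetical `BomSniffer` class and a caller-supplied fallback for files that have no BOM:

```csharp
using System;
using System.IO;
using System.Text;

static class BomSniffer
{
    // Maps a known byte order mark at the start of `head` to an Encoding;
    // returns `fallback` when no BOM is recognised.
    public static Encoding Detect(byte[] head, Encoding fallback)
    {
        // UTF-32 LE (FF FE 00 00) must be tested before UTF-16 LE (FF FE).
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return Encoding.UTF32;
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return Encoding.UTF8;
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return Encoding.Unicode;          // UTF-16 LE
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return Encoding.BigEndianUnicode; // UTF-16 BE
        return fallback;
    }

    // Reads up to the first 4 bytes of a file and passes them to Detect.
    public static Encoding DetectFile(string path, Encoding fallback)
    {
        byte[] head = new byte[4];
        int total = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            int n;
            // Stream.Read may return fewer bytes than requested, so loop.
            while (total < head.Length && (n = fs.Read(head, total, head.Length - total)) > 0)
                total += n;
        }
        Array.Resize(ref head, total);
        return Detect(head, fallback);
    }
}
```

Like GetEncoding, this cannot identify BOM-less files. The fallback you pass (Encoding.Default, GB2312, ...) remains an assumption about how those files were saved.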

With the GetEncoding(filePath) method available, the reading code can be changed as follows:

using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

class Program
{
    public static void Main()
    {
        List<string> lstFilePath = new List<string>()
        {
            "h:\\testtext\\ansi.txt",
            "h:\\testtext\\unicode.txt",
            "h:\\testtext\\utf8.txt"
        };

        foreach (string filePath in lstFilePath)
        {
            // GetEncoding(string) is the method shown above; place it in this class.
            using (StreamReader reader = new StreamReader(filePath, GetEncoding(filePath)))
            {
                Console.WriteLine("Reading file " + filePath);
                Console.WriteLine(reader.ReadToEnd());
                Console.WriteLine("Current encoding: " + reader.CurrentEncoding.EncodingName);
                Console.WriteLine("************************************************************");
            }
        }
    }
}

The output (a screenshot in the original article) shows each file's content followed by the encoding that was used to read it.

From this you can see that for the ANSI file the encoding is Encoding.Default, which here is Chinese Simplified (GB2312).
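Neither Encoding.Default nor a BOM check can recognise a UTF-8 file that was saved without a BOM. A common heuristic, not part of the original article, is to try decoding the bytes as strict UTF-8. Text in a legacy code page such as GB2312 rarely forms valid UTF-8, so a decoding error points to the ANSI code page. Here is a sketch with a hypothetical `Utf8Check` class:

```csharp
using System.Text;

static class Utf8Check
{
    // Returns true when `bytes` form valid UTF-8. The strict decoder
    // (throwOnInvalidBytes: true) throws DecoderFallbackException otherwise.
    public static bool IsValidUtf8(byte[] bytes)
    {
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

This is only a heuristic. Pure ASCII content is valid in both encodings, although in that case either choice decodes it the same way.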
