Repost: Java String Encoding

程序员文章站 2022-07-14 19:40:09

...

Source: http://blog.sina.com.cn/s/blog_3f4dc73b0100afub.html

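In Java, a char occupies 2 bytes (16 bits) and holds one UTF-16 code unit. A common Chinese character such as 人 lies in the Basic Multilingual Plane (BMP), so it fits in a single char (2 bytes) and can be assigned to a char variable. An English letter has a code value below 128, so it can also be stored in a single byte (although as a char it still occupies 2 bytes); a Chinese character's code value is too large for a byte. Characters outside the BMP, such as many emoji and rare CJK ideographs, need two chars, a so-called surrogate pair.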
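A minimal sketch for checking these claims yourself. The class name `CharSizeDemo` and the helper `charsNeeded` are hypothetical, and U+1F600 is just an arbitrary code point outside the BMP chosen for illustration.

```java
public class CharSizeDemo {
    // Number of chars (UTF-16 code units) a code point needs: 1 inside the BMP, 2 outside it
    static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES);          // 2: a char is 16 bits
        char han = '人';                              // U+4EBA fits in a single char
        System.out.println(Integer.toHexString(han)); // 4eba
        byte letter = (byte) 'A';                     // 'A' is 65, small enough for a byte
        System.out.println(letter);                   // 65
        System.out.println(charsNeeded(0x4EBA));      // 1

        // U+1F600 lies outside the BMP, so it is stored as a surrogate pair
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                          // 2 chars
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
    }
}
```

Note that `String.length()` counts chars, not characters, which is why the emoji reports a length of 2.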

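When a char is printed on its own, the character itself is output. When it is combined with + and the other operand is numeric (an int or another char), binary numeric promotion turns it into an int, so its numeric code is printed; this is its UTF-16 code unit value, which equals the ASCII code only for ASCII characters. If the other operand is a String, + performs string concatenation and the character itself appears.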
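A small sketch of these rules (the class name `CharPlusDemo` is hypothetical). Because + is evaluated left to right, the position of the String operand decides whether you get addition or concatenation.

```java
public class CharPlusDemo {
    public static void main(String[] args) {
        char c = 'A';
        System.out.println(c);            // A     (a char on its own prints as a character)
        System.out.println(c + 1);        // 66    (c is promoted to int)
        System.out.println(c + 'B');      // 131   (65 + 66: both chars are promoted to int)
        System.out.println("" + c + 'B'); // AB    (String concatenation, left to right)
        System.out.println(c + 'B' + ""); // 131   (the int addition happens before concatenation)

        char han = '人';
        System.out.println(han + 0);      // 20154 (U+4EBA: a Unicode value, not an ASCII code)
    }
}
```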

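UTF-16BE and UTF-16LE are two members of the Unicode encoding family. The Unicode Standard defines three encoding forms, UTF-8, UTF-16 and UTF-32, and seven encoding schemes: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE and UTF-32LE. Java represents char and String values as UTF-16 code units; when it serializes them to bytes with the "UTF-16" charset, it uses big-endian byte order (as in UTF-16BE) and prepends a 2-byte byte order mark (BOM).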
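The byte order difference is easy to see by dumping the bytes of a string in each scheme. This sketch uses `StandardCharsets` (Java 7+); the class name `ByteOrderDemo` and the helper `hex` are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class ByteOrderDemo {
    // Format bytes as space-separated upper-case hex pairs
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "A";
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));    // FE FF 00 41 (BOM, then big-endian)
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE)));  // 00 41
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE)));  // 41 00
        System.out.println(hex("人".getBytes(StandardCharsets.UTF_8)));  // E4 BA BA
    }
}
```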

Encoding example:

import java.io.UnsupportedEncodingException;

public class EncodeTest {

    // Print how many bytes s occupies when encoded with the named charset
    public static void printByteLength(String s, String encodingName) {
        System.out.print("Bytes: ");
        try {
            System.out.print(s.getBytes(encodingName).length);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println("; encoding: " + encodingName);
    }

    public static void main(String[] args) {
        String en = "A";
        String ch = "人";

        // Byte count of one English letter in each encoding
        System.out.println("English letter: " + en);
        EncodeTest.printByteLength(en, "GB2312");
        EncodeTest.printByteLength(en, "GBK");
        EncodeTest.printByteLength(en, "GB18030");
        EncodeTest.printByteLength(en, "ISO-8859-1");
        EncodeTest.printByteLength(en, "UTF-8");
        EncodeTest.printByteLength(en, "UTF-16");
        EncodeTest.printByteLength(en, "UTF-16BE");
        EncodeTest.printByteLength(en, "UTF-16LE");

        System.out.println();

        // Byte count of one Chinese character in each encoding
        System.out.println("Chinese character: " + ch);
        EncodeTest.printByteLength(ch, "GB2312");
        EncodeTest.printByteLength(ch, "GBK");
        EncodeTest.printByteLength(ch, "GB18030");
        EncodeTest.printByteLength(ch, "ISO-8859-1");
        EncodeTest.printByteLength(ch, "UTF-8");
        EncodeTest.printByteLength(ch, "UTF-16");
        EncodeTest.printByteLength(ch, "UTF-16BE");
        EncodeTest.printByteLength(ch, "UTF-16LE");
    }
}

Output:

English letter: A
Bytes: 1; encoding: GB2312
Bytes: 1; encoding: GBK
Bytes: 1; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 1; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

Chinese character: 人
Bytes: 2; encoding: GB2312
Bytes: 2; encoding: GBK
Bytes: 2; encoding: GB18030
Bytes: 1; encoding: ISO-8859-1
Bytes: 3; encoding: UTF-8
Bytes: 4; encoding: UTF-16
Bytes: 2; encoding: UTF-16BE
Bytes: 2; encoding: UTF-16LE

ISO-8859-1 cannot represent 人 at all, so getBytes substitutes the single replacement byte '?', which is why it reports 1 byte. UTF-16 reports 4 bytes for both strings: 2 for the byte order mark plus 2 for the character.
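The ISO-8859-1 result silently loses data: the byte that comes back is just '?'. A sketch of how to detect this beforehand with `CharsetEncoder.canEncode`, and what a round trip actually returns. The class name `LossyEncodingDemo` and the helper `roundTrip` are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingDemo {
    // Encode s with cs, then decode the bytes again with the same charset
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // canEncode reports whether the charset can represent the character at all
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode('人')); // false
        System.out.println(StandardCharsets.UTF_8.newEncoder().canEncode('人'));      // true

        byte[] latin = "人".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(latin.length + " " + latin[0]);                 // 1 63 (63 is '?')
        System.out.println(roundTrip("人", StandardCharsets.ISO_8859_1));   // ? (the character is lost)
        System.out.println(roundTrip("人", StandardCharsets.UTF_8));        // 人
    }
}
```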

String truncation example:

import java.io.UnsupportedEncodingException;

public class CutString {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "我ZWR爱JAVA";
        // Get the GBK-encoded bytes: 我 and 爱 take 2 bytes each, ASCII letters 1 byte each
        byte[] data = s.getBytes("GBK");
        byte[] tmp = new byte[6];
        // Copy the first six bytes of data into tmp
        System.arraycopy(data, 0, tmp, 0, 6);
        // Decode those six bytes with the same charset (GBK) and print them;
        // the sixth byte is only the first half of 爱
        s = new String(tmp, "GBK");
        System.out.println(s);
    }
}

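Output (the sixth byte is only the first half of 爱, so it cannot be decoded into a complete character and shows up as ?):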


我ZWR? 
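Instead of printing a broken character, you can detect that a byte prefix ends in the middle of a character. This sketch assumes the GBK charset is available (it ships with standard OpenJDK builds, but the Java specification does not require it). A decoder obtained from `newDecoder()` reports malformed input by default, so `decode` throws when the last character is incomplete. The names `DetectSplitChar` and `isCompleteGbk` are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.util.Arrays;

public class DetectSplitChar {
    // True if the bytes decode cleanly as GBK; a fresh decoder reports malformed input as an exception
    static boolean isCompleteGbk(byte[] bytes) {
        try {
            Charset.forName("GBK").newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false; // e.g. a lead byte whose second byte was cut off
        }
    }

    public static void main(String[] args) {
        byte[] data = "我ZWR爱JAVA".getBytes(Charset.forName("GBK"));
        System.out.println(isCompleteGbk(Arrays.copyOf(data, 5))); // true:  我ZWR
        System.out.println(isCompleteGbk(Arrays.copyOf(data, 6))); // false: ends with half of 爱
        System.out.println(isCompleteGbk(Arrays.copyOf(data, 7))); // true:  我ZWR爱
    }
}
```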

Example 2:

import java.io.UnsupportedEncodingException;

public class CutString {

    public static boolean isChineseChar(char c)
            throws UnsupportedEncodingException {
        // Treat a character as Chinese if it takes more than 1 byte in GBK.
        // This is not a rigorous way to tell English letters from Chinese characters,
        // but it is sufficient for this problem
        return String.valueOf(c).getBytes("GBK").length > 1;
    }

    // Truncate original to count GBK bytes without splitting a Chinese character.
    // If count ends in the middle of a Chinese character, the whole character is kept,
    // so the result can be one byte longer than count.
    public static String substring(String original, int count)
            throws UnsupportedEncodingException {
        // The original string is neither null nor empty
        if (original != null && !original.isEmpty()) {
            // Truncate only if count is greater than 0 and less than the string's GBK byte length
            if (count > 0 && count < original.getBytes("GBK").length) {
                StringBuilder buff = new StringBuilder();
                char c;
                for (int i = 0; i < count; i++) {
                    // charAt(int index) also indexes the string by character, not by byte
                    c = original.charAt(i);
                    buff.append(c);
                    if (CutString.isChineseChar(c)) {
                        // A Chinese character uses 2 bytes, so reduce the remaining budget by 1
                        --count;
                    }
                }
                return buff.toString();
            }
        }
        return original;
    }

    public static void main(String[] args) {
        // Original string
        String s = "我ZWR爱JAVA";
        System.out.println("Original string: " + s);
        try {
            System.out.println("First 1 byte: " + CutString.substring(s, 1));
            System.out.println("First 2 bytes: " + CutString.substring(s, 2));
            System.out.println("First 4 bytes: " + CutString.substring(s, 4));
            System.out.println("First 6 bytes: " + CutString.substring(s, 6));
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}

Output:

Original string: 我ZWR爱JAVA
First 1 byte: 我
First 2 bytes: 我
First 4 bytes: 我ZW
First 6 bytes: 我ZWR爱
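The substring method above keeps a Chinese character whole when only its first byte fits, so a limit of 1 returns 我 (2 bytes) and a limit of 6 returns 我ZWR爱 (7 bytes in GBK). If the result must never exceed the limit, for example when filling a fixed-size database column, the straddling character has to be dropped instead. The following is a sketch of that variant, not the author's method; `SafeTruncate` and `truncateToBytes` are hypothetical names, and it assumes the GBK charset is available. It walks the string by code point so a surrogate pair is never split.

```java
import java.nio.charset.Charset;

public class SafeTruncate {
    private static final Charset GBK = Charset.forName("GBK");

    // Keep whole characters only, so the result never exceeds maxBytes when encoded as GBK
    static String truncateToBytes(String s, int maxBytes) {
        StringBuilder out = new StringBuilder();
        int used = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            int len = ch.getBytes(GBK).length;
            if (used + len > maxBytes) {
                break; // the next character would cross the limit, so drop it
            }
            out.append(ch);
            used += len;
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "我ZWR爱JAVA";
        System.out.println("[" + truncateToBytes(s, 1) + "]"); // []
        System.out.println("[" + truncateToBytes(s, 2) + "]"); // [我]
        System.out.println("[" + truncateToBytes(s, 4) + "]"); // [我ZW]
        System.out.println("[" + truncateToBytes(s, 6) + "]"); // [我ZWR]
    }
}
```

Rounding down rather than up is a design choice: it guarantees the byte limit, at the cost of returning an empty string when the first character alone is too large.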