String

程序员文章站 (Programmer Article Site) 2022-05-09 22:21:21

...

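String - JDK 1.8.0_131

(Note: the excerpts below that use the offset and count fields are JDK 6 source. From JDK 7u6 onward, String keeps only the value and hash fields, and substring() copies its characters.)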

I. Class definition

1. Source

public final class String implements java.io.Serializable, Comparable<String>, CharSequence

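2. Analysis

a. The class is declared final, so String cannot be subclassed.
b. It implements Serializable, so String objects can be serialized: converted to a byte stream that can be stored or transmitted and later restored.
c. It implements Comparable<String>, providing compareTo(), which compares strings lexicographically.
d. It implements CharSequence, a read-only view of a sequence of chars (length(), charAt(), subSequence()). Internally, the characters are stored in a char[].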

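3. Summary

The contents of a String object cannot change, so String is thread-safe.
Any operation that "changes" a string returns a new String and leaves the original untouched.

4. Design pattern
Flyweight: the string constant pool lets identical literals share a single instance.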

II. Class Javadoc

1. Source

/**
 * <p><blockquote><pre>
 *     String str = "abc";
 * </pre></blockquote><p>
 * is equivalent to:
 * <p><blockquote><pre>
 *     char data[] = {'a', 'b', 'c'};
 *     String str = new String(data);
 * </pre></blockquote><p>
 */

String str = "abc" ;
<==>
char data[] = {'a', 'b', 'c'};
String str = new String(data);

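2. Analysis
a. str is a reference. As a local variable it lives in the stack frame and holds the address of the "abc" object on the heap.
b. In JDK 6, the heap object "abc" has the following fields (JDK 7u6 and later keep only value and hash):

char[] value
int offset
int count
int hash

c. char[] value is itself a reference: it holds the address of a separate heap array that stores {'a','b','c'}.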

III. Fields

1. Source

    /** The value is used for character storage. */
    // the characters of the string are stored in a char array
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    // index of the first array element this string uses; 0 by default, set to the start index when substring() shares the array
    private final int offset;

    /** The count is the number of characters in the String. */
    // length of the string
    private final int count;

    /** Cache the hash code for the string */
    // cached hash code, computed lazily by hashCode()
    private int hash; // Default to 0

    /** use serialVersionUID from JDK 1.0.2 for interoperability */
    // serialization version ID, fixed since JDK 1.0.2 so serialized Strings stay compatible across versions
    private static final long serialVersionUID = -6849794470754667710L;

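2. Analysis

2.1 String objects are immutable, but references are not

A string is backed by a char[], and that field is declared private final. final only prevents the field from being reassigned to another array; String is immutable because the array is private, no method modifies it, and arrays passed in or handed out are copied.

Immutability means the String object cannot change; the reference to a string can.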

String str = "a" ;
str = "b" ;

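The contents of the object "a" do not change; what changes is the address stored in str, which pointed to "a" and now points to "b".
Reassigning str never modifies the old object. If nothing else refers to it, the old object becomes eligible for garbage collection (literals stay reachable through the string pool).
Methods that return a new String work on a copy of the source characters, so the original string is never modified and the result is unaffected by later changes to any array passed in.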
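A minimal sketch of the points above, assuming JDK 8 or later. The class ImmutabilityDemo and its helper transform() are hypothetical names used only for illustration: toUpperCase() and concat() return new String objects, the original stays "a", and reassigning str only changes which object the reference points to.

```java
import java.util.Locale;

public class ImmutabilityDemo {

    // Returns {original, upper-cased, concatenated} so the caller can compare them
    static String[] transform(String s) {
        String upper = s.toUpperCase(Locale.ROOT); // returns a new String
        String longer = s.concat("!");             // also returns a new String
        return new String[] {s, upper, longer};
    }

    public static void main(String[] args) {
        String str = "a";
        String[] r = transform(str);
        System.out.println(r[0]); // a  (the original is unchanged)
        System.out.println(r[1]); // A
        System.out.println(r[2]); // a!

        String before = str;
        str = "b";                  // rebinds the reference; the object "a" is untouched
        System.out.println(before); // a
        System.out.println(str);    // b
    }
}
```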

2.2 Why String implements Serializable

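Implementing Serializable lets a String be written to a byte stream (for example with ObjectOutputStream) and read back later, so strings can be persisted or sent over the network. The serialVersionUID field shown above keeps the serialized form compatible across JDK versions.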
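A minimal round-trip sketch using the standard java.io object streams. The class name StringSerializationDemo and the helper roundTrip() are hypothetical; the point is only that a String written with ObjectOutputStream comes back from ObjectInputStream with the same contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class StringSerializationDemo {

    // Serializes a String into a byte array, then deserializes it again
    static String roundTrip(String s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (String) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String original = "hello";
        String copy = roundTrip(original);
        System.out.println(copy.equals(original)); // true: same contents
    }
}
```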

2.3 Hash code computation


    /**
     * Returns a hash code for this string. The hash code for a
     * <code>String</code> object is computed as
     * <blockquote><pre>
     * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
     * </pre></blockquote>
     * using <code>int</code> arithmetic, where <code>s[i]</code> is the
     * <i>i</i>th character of the string, <code>n</code> is the length of
     * the string, and <code>^</code> indicates exponentiation.
     * (The hash value of the empty string is zero.)
     *
     * @return  a hash code value for this object.
     */
    public int hashCode() {
	int h = hash;
	if (h == 0) {
	    int off = offset;
	    char val[] = value;
	    int len = count;

            for (int i = 0; i < len; i++) {
                h = 31*h + val[off++];
            }
            hash = h;
        }
        return h;
    }

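Summary:
For a String backed by char[] s of length n:
hash = s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], computed in int arithmetic (overflow wraps around)
s[i] is the numeric value of the i-th char, i.e. its UTF-16 code unit; for ASCII characters this equals the ASCII code.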

	public static void main(String[] args) {
		String a = "123";
		System.out.println(a.hashCode()); // 48690
	}

Analysis:
h1 = 31 * 0 + 49 (49 is the code of '1')
h2 = 31 * 49 + 50
h3 = 31 * (31 * 49 + 50) + 51
   = 49 * 31^(3-1) + 50 * 31^(3-2) + 51 * 31^(3-3)
   = 47089 + 1550 + 51 = 48690

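Purpose:
The hash code is used by hash-based collections such as HashMap and HashSet to choose a bucket. Equal strings always produce equal hash codes, and because a String never changes, the value is computed once and cached in the hash field. Giving similar strings adjacent hash values is not a goal; what matters is that different strings rarely collide.

Why the multiplier is 31:
1. It is an odd prime. Multiplying by a prime tends to spread the results evenly and reduce collisions, while an even multiplier would shift bits out and lose information on overflow.
2. A larger multiplier does not necessarily help and can be slower; 31 is commonly described as a trade-off between collision rate and speed that was chosen by experiment.
3. 31 * i == (i << 5) - i, so the JIT compiler can replace the multiplication with a shift and a subtraction.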
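The formula and the shift identity can be checked with a short sketch. polyHash() is a hypothetical helper that repeats the h = 31*h + c loop from the source using charAt(), so it should agree with String.hashCode(); polyHashShift() replaces 31*h with (h << 5) - h, which gives the same int result even when the value overflows.

```java
public class StringHashDemo {

    // Same loop as String.hashCode(): h = 31*h + c, with int overflow wrapping
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // 31 * h rewritten as a shift and a subtraction
    static int polyHashShift(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = ((h << 5) - h) + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String a = "123";
        System.out.println(polyHash(a));      // 48690
        System.out.println(a.hashCode());     // 48690
        System.out.println(polyHashShift(a)); // 48690
    }
}
```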


IV. Constructors

1. No-argument constructor

    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    // no-argument constructor
    public String() {
	this.offset = 0;
	this.count = 0;
	this.value = new char[0];
    }

2. String argument

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
	int size = original.count;
	char[] originalValue = original.value;
	char[] v;
  	if (originalValue.length > size) {
 	    // The array representing the String is bigger than the new
 	    // String itself.  Perhaps this constructor is being called
 	    // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
 	} else {
 	    // The array representing the String is the same
 	    // size as the String, so no point in making a copy.
	    v = originalValue;
 	}
	this.offset = 0;
	this.count = size;
	this.value = v;
    }

Explanation: when can originalValue.length > size happen?

    /**
     * Returns a new string that is a substring of this string. The
     * substring begins at the specified <code>beginIndex</code> and
     * extends to the character at index <code>endIndex - 1</code>.
     * Thus the length of the substring is <code>endIndex-beginIndex</code>.
     * <p>
     * Examples:
     * <blockquote><pre>
     * "hamburger".substring(4, 8) returns "urge"
     * "smiles".substring(1, 5) returns "mile"
     * </pre></blockquote>
     *
     * @param      beginIndex   the beginning index, inclusive.
     * @param      endIndex     the ending index, exclusive.
     * @return     the specified substring.
     * @exception  IndexOutOfBoundsException  if the
     *             <code>beginIndex</code> is negative, or
     *             <code>endIndex</code> is larger than the length of
     *             this <code>String</code> object, or
     *             <code>beginIndex</code> is larger than
     *             <code>endIndex</code>.
     */
    public String substring(int beginIndex, int endIndex) {
	if (beginIndex < 0) {
	    throw new StringIndexOutOfBoundsException(beginIndex);
	}
	if (endIndex > count) {
	    throw new StringIndexOutOfBoundsException(endIndex);
	}
	if (beginIndex > endIndex) {
	    throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
	}
	return ((beginIndex == 0) && (endIndex == count)) ? this :
	    new String(offset + beginIndex, endIndex - beginIndex, value);
    }

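When substring() is called, the String it returns through new String(offset + beginIndex, endIndex - beginIndex, value) still shares the original string's char[] value.

Example: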


String str = "abcde";
// char[] value = {'a','b','c','d','e'};
// offset = 0 ; start position
// count = 5
String strSub = str.substring(0,3);
// strSub still shares str's char[] value
// offset = 0
// count = 3
String strSubNew = new String(strSub);
// the array strSub uses has length 5
// but strSub.count = 3
// so originalValue.length > size, and the constructor copies only the 3 characters it needs
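On a modern JDK the sharing cannot be observed directly, because substring() copies and the fields are private. The sketch below is a hypothetical, simplified model that makes it visible. MiniString, of() and copyOf() are invented for illustration and only mirror the value/offset/count logic of the JDK 6 source quoted above; they are not part of java.lang.String.

```java
import java.util.Arrays;

// Hypothetical model of the JDK 6 layout (value + offset + count), not the real String
public class MiniString {
    final char[] value;
    final int offset;
    final int count;

    // Shares the array, like JDK 6's package-private String(int, int, char[])
    MiniString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static MiniString of(String s) {
        return new MiniString(s.toCharArray(), 0, s.length());
    }

    // JDK 6 style: no copy, just a new window onto the same array
    MiniString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new IndexOutOfBoundsException(begin + ", " + end);
        }
        return (begin == 0 && end == count) ? this
                : new MiniString(value, offset + begin, end - begin);
    }

    // Mirrors String(String original): trims the array only when it is oversized
    static MiniString copyOf(MiniString original) {
        char[] v = original.value.length > original.count
                ? Arrays.copyOfRange(original.value, original.offset,
                                     original.offset + original.count)
                : original.value;
        return new MiniString(v, 0, original.count);
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        MiniString str = MiniString.of("abcde");
        MiniString strSub = str.substring(0, 3);
        System.out.println(strSub);                    // abc
        System.out.println(strSub.value == str.value); // true: the array is shared
        System.out.println(strSub.value.length);       // 5

        MiniString strSubNew = MiniString.copyOf(strSub);
        System.out.println(strSubNew);                 // abc
        System.out.println(strSubNew.value.length);    // 3: the extra characters are dropped
    }
}
```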

3. char[] argument


    public String(char value[]) {
	int size = value.length;
	this.offset = 0;
	this.count = size;
	this.value = Arrays.copyOf(value, size);
    }

The constructor stores a copy of the array, so later changes to the original array do not affect the new string.
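A small sketch of the defensive copy (the copy is made in JDK 6 and in current releases alike). CharArrayCopyDemo and buildThenMutate() are hypothetical names.

```java
public class CharArrayCopyDemo {

    // Builds a String from the array, then mutates the array afterwards
    static String buildThenMutate(char[] data) {
        String s = new String(data); // the constructor copies the array
        data[0] = 'X';               // changes only the caller's array
        return s;
    }

    public static void main(String[] args) {
        char[] data = {'a', 'b', 'c'};
        String s = buildThenMutate(data);
        System.out.println(s);                // abc
        System.out.println(new String(data)); // Xbc
    }
}
```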

4. char[] argument with offset and count

    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.offset = 0;
        this.count = count;
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }

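Compared with the previous constructor, this one adds bounds checks: offset and count must be non-negative, and offset must not exceed value.length - count. Writing the last check as offset > value.length - count instead of offset + count > value.length avoids int overflow when offset or count is close to Integer.MAX_VALUE (the -1>>>1 in the source comment).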
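A sketch of those checks in action. The helper tryBuild() and the class OffsetCountDemo are hypothetical: tryBuild() returns null when the constructor rejects its arguments, catching IndexOutOfBoundsException, the superclass of StringIndexOutOfBoundsException. The last call shows a count near Integer.MAX_VALUE being rejected instead of slipping through an overflowed offset + count.

```java
public class OffsetCountDemo {

    // Returns the new String, or null if the constructor rejected the arguments
    static String tryBuild(char[] data, int offset, int count) {
        try {
            return new String(data, offset, count);
        } catch (IndexOutOfBoundsException e) { // StringIndexOutOfBoundsException is a subclass
            return null;
        }
    }

    public static void main(String[] args) {
        char[] data = {'a', 'b', 'c', 'd', 'e'};
        System.out.println(tryBuild(data, 1, 3));                 // bcd
        System.out.println(tryBuild(data, 3, 3));                 // null: 3 > 5 - 3
        System.out.println(tryBuild(data, -1, 2));                // null: negative offset
        System.out.println(tryBuild(data, 1, Integer.MAX_VALUE)); // null: no overflow slip-through
    }
}
```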

V. substring()

Extracting a substring

1. One argument: begin index

public String substring(int beginIndex) {
    // the end index defaults to the length of the string
    return substring(beginIndex, count);
}

String s = "0123456789";
System.out.println(s.substring(2)); // 23456789 (indexes 2 to 9)

2. Two arguments: begin index, end index

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

String s = "0123456789";
System.out.println(s.substring(2)); // 23456789 (indexes 2 to 9)

System.out.println(s.substring(3,6)); // 345 (indexes 3 to 5)

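Summary:
0. Indexes start at 0.
1. The range includes the begin index and excludes the end index, i.e. the half-open interval [a, b) (closed on the left, open on the right).

Analysis: System.out.println(s.substring(3,6)); prints the characters at indexes 3 to 5, not 3 to 6.

From the source, the call becomes new String(0 + 3, 3, value);

which invokes the package-private constructor below; the String(String original) constructor that follows it is repeated from section IV for reference: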

String(int offset, int count, char value[]) {
    this.value = value;
    this.offset = offset;
    this.count = count;
}

public String(String original) {
    int size = original.count;
    char[] originalValue = original.value;
    char[] v;
    if (originalValue.length > size) {
        // The array representing the String is bigger than the new
        // String itself.  Perhaps this constructor is being called
        // in order to trim the baggage, so make a copy of the array.
        int off = original.offset;
        v = Arrays.copyOfRange(originalValue, off, off+size);
    } else {
        // The array representing the String is the same
        // size as the String, so no point in making a copy.
        v = originalValue;
    }
    this.offset = 0;
    this.count = size;
    this.value = v;
}

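In JDK 6 the package-private constructor copies nothing: the substring shares value and records offset = 3 and count = 3, so it covers value[3] to value[5]. A copy is made only if the substring is then passed to String(String original), which trims the array with v = Arrays.copyOfRange(originalValue, off, off+size), i.e. Arrays.copyOfRange(value, 3, 6). From JDK 7u6 on, substring() always copies, through the same Arrays.copyOfRange(value, 3, 6) call.

Arrays.copyOfRange is implemented as: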

public static char[] copyOfRange(char[] original, int from, int to) {
    int newLength = to - from;
    if (newLength < 0)
        throw new IllegalArgumentException(from + " > " + to);
    char[] copy = new char[newLength];
    System.arraycopy(original, from, copy, 0,
                     Math.min(original.length - from, newLength));
    return copy;
}

System.arraycopy(original, from, copy, 0,
Math.min(original.length - from, newLength));
==> System.arraycopy(original, 3, copy, 0, 3); // newLength = 6 - 3 = 3, so 3 chars are copied

That is, with char[] data = {'0','1','2','3','4','5','6','7','8','9'};
substring(3,6) starts at data[3] and takes 6 - 3 = 3 characters: data[3], data[4], data[5].
data[6] is not included.
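The half-open [begin, end) rule can be seen in a short example (SubstringDemo is a hypothetical class name). The results are the same on JDK 6 and on later versions, since only the internal storage changed.

```java
public class SubstringDemo {
    public static void main(String[] args) {
        String s = "0123456789";
        String a = s.substring(3, 6);
        System.out.println(a);                   // 345
        System.out.println(a.length());          // 3 (= 6 - 3)
        System.out.println(s.substring(2));      // 23456789
        // the one-argument form is the two-argument form with end = length()
        System.out.println(s.substring(2).equals(s.substring(2, s.length()))); // true
        // begin == end yields the empty string
        System.out.println(s.substring(4, 4).isEmpty()); // true
    }
}
```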

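VI. Two ways to create a String

1.
String str1 = "a";
String str2 = new String("a");

2.
The first form creates at most one object. The JVM checks whether the string constant pool already contains "a"; if not, it adds "a" to the pool. Either way, str1 holds the address of the pooled "a".

The second form creates up to two objects:
first the literal "a" is looked up in the constant pool and added if it is missing;
then new allocates a separate object on the heap with the contents "a", and str2 holds the address of that heap object.
The pooled "a" is not a leftover waiting for GC: it stays in the pool, and every occurrence of the literal "a" refers to it.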
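A minimal sketch of the difference (LiteralVsNewDemo is a hypothetical class name). == compares references, while equals() compares contents.

```java
public class LiteralVsNewDemo {
    public static void main(String[] args) {
        String lit1 = "a";             // pooled instance
        String lit2 = "a";             // the same pooled instance
        String obj1 = new String("a"); // a new heap object
        String obj2 = new String("a"); // another new heap object

        System.out.println(lit1 == lit2);      // true
        System.out.println(obj1 == obj2);      // false
        System.out.println(lit1 == obj1);      // false
        System.out.println(lit1.equals(obj1)); // true: same contents
    }
}
```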

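VII. intern()

1. Source

// If the pool already contains a string equal to this one (as determined by equals()), return the pooled instance; otherwise add this string to the pool and return a reference to it
public native String intern();

2. Test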

String str2 = new String("a");
String str1 = "a" ;
System.out.println(str1 == str2 );
System.out.println(str1 == str2.intern());

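str1 == str2 is false:
str1 holds the address of "a" in the constant pool,
str2 holds the address of a separate "a" object on the heap, so they can never be equal.

str1 == str2.intern() is true: intern() looks up the contents of str2 in the constant pool and returns the pooled "a", which is the very object str1 refers to.
(str1.equals(str2) is also true, since equals() compares contents rather than addresses.)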
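The test above uses a literal and new String. The sketch below (InternDemo is a hypothetical name) builds the string at run time with StringBuilder instead; the literal is resolved first, so intern() returns that pooled instance.

```java
public class InternDemo {
    public static void main(String[] args) {
        String literal = "hello"; // resolved from the pool first
        String built = new StringBuilder("hel").append("lo").toString(); // new object at run time

        System.out.println(built == literal);          // false: different objects
        System.out.println(built.equals(literal));     // true
        System.out.println(built.intern() == literal); // true: intern() returns the pooled instance
    }
}
```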

VIII. The + concatenation operator

1. Code

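String s1 = "a" ;
String s2 = "a" + "b" ;     // constant expression: folded to "ab" at compile time
String s3 = "a" + 1 + 2  ;  // also folded at compile time, to "a12"
// Not known at compile time:
String s4 = "a" + s1 ; // s1 is not a constant variable, so the compiler cannot know its value
// javac 8 compiles this to roughly:
String s5 = new StringBuilder().append("a").append(s1).toString();
// i.e. when a literal is concatenated with a variable, the work is done by StringBuilder.append
// each + expression creates one StringBuilder (JDK 9+ uses invokedynamic instead, with the same result)

// Repeated concatenation: every iteration creates a new StringBuilder and a new String
String s6 = "" ;
for(int index = 0 ; index < 100 ; index++){
    s6 = s6 + index ;
}
// Only one StringBuilder is created
StringBuilder s7 = new StringBuilder("");
for(int index = 0 ; index < 100 ; index++){
    s7.append(index);
}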

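2. Summary
Concatenations of compile-time constants only (literals and final constant variables) are folded by the compiler into a single literal in the constant pool.
Concatenations that involve a non-constant variable are evaluated at run time and produce a new String object on the heap, just like new,
because the compiler cannot know the value the variable will hold.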
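The rules above can be checked with ==, which compares references. In this sketch (ConcatDemo is a hypothetical name), constant expressions, including one built from a final local variable, resolve to the pooled literal, while a concatenation with an ordinary variable yields a separate object with equal contents. The last lines show that the + loop and the StringBuilder loop build the same text.

```java
public class ConcatDemo {
    public static void main(String[] args) {
        String s1 = "a";      // ordinary variable: not a constant
        final String f = "a"; // constant variable: known at compile time

        System.out.println(("a" + "b") == "ab");     // true: folded at compile time
        System.out.println(("a" + 1 + 2) == "a12");  // true: also a constant expression
        System.out.println((f + "b") == "ab");       // true: f is a constant variable
        System.out.println((s1 + "b") == "ab");      // false: built at run time
        System.out.println((s1 + "b").equals("ab")); // true

        // + in a loop: a new String on every iteration
        String viaPlus = "";
        for (int i = 0; i < 5; i++) {
            viaPlus = viaPlus + i;
        }
        // one StringBuilder reused across iterations
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5; i++) {
            sb.append(i);
        }
        System.out.println(viaPlus.equals(sb.toString())); // true: both are "01234"
    }
}
```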

IX. trim()
1. Source

    public String trim() {
	int len = count;
	int st = 0;
	int off = offset;      /* avoid getfield opcode */
	char[] val = value;    /* avoid getfield opcode */

	while ((st < len) && (val[off + st] <= ' ')) {
	    st++;
	}
	while ((st < len) && (val[off + len - 1] <= ' ')) {
	    len--;
	}
	return ((st > 0) || (len < count)) ? substring(st, len) : this;
    }

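2. Analysis

'\u0020' is the space character (decimal 32). In ASCII, 0 to 31 are control characters (0 is NUL, not a blank), and trim() treats every char <= '\u0020' as whitespace.

st: the number of consecutive whitespace characters at the start.
len: reduced by the number of consecutive whitespace characters at the end.

If st > 0, there is leading whitespace.
If len < count, there is trailing whitespace.

In either case trim() returns substring(st, len), a new string; otherwise it returns this.

3. Summary

'\u0020' is the space; \r, \n, \t and the other control characters all have smaller values, so trim() removes them too. Characters above '\u0020', such as the non-breaking space '\u00A0', are not removed.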
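A short sketch of those rules (TrimDemo is a hypothetical name). Unicode escapes are used so the control characters and the non-breaking space are visible in the source.

```java
public class TrimDemo {
    public static void main(String[] args) {
        String padded = "\t\n ab c \r";
        System.out.println("[" + padded.trim() + "]"); // [ab c]: the inner space is kept

        String clean = "abc";
        System.out.println(clean.trim() == clean);     // true: nothing to remove, returns this

        String nbsp = "\u00A0x\u00A0";                 // non-breaking space is above U+0020
        System.out.println(nbsp.trim().length());      // 3: not removed

        String ctrl = "\u0001x\u001F";                 // control characters below U+0020
        System.out.println(ctrl.trim());               // x
    }
}
```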

X. length()

1. Source

    /**
     * Returns the length of this string.
     * The length is equal to the number of <a href="Character.html#unicode">Unicode
     * code units</a> in the string.
     *
     * @return  the length of the sequence of characters represented by this
     *          object.
     */
    // number of UTF-16 code units in the string
    public int length() {
        return count;
    }

From

    public String(char value[]) {
	int size = value.length;
	this.offset = 0;
	this.count = size;
	this.value = Arrays.copyOf(value, size);
    }

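we get:

count is the length of the char[] passed in (size), so length() returns the number of chars in the string.

2. Summary

length() returns the number of code units in the string.

A code point is the numeric value that a character set such as Unicode assigns to a character, written with a U+ prefix; for example, U+0041 is the code point of the letter A.

In Java, a code unit is the 16-bit storage unit of UTF-16, i.e. one char. Characters outside the Basic Multilingual Plane (code points above U+FFFF) take two code units, a surrogate pair, so length() can be larger than the number of characters.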
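A sketch of the difference between the two counts, assuming Java 5 or later for the code point methods (CodePointDemo is a hypothetical name). The emoji U+1F600 is written as its UTF-16 surrogate pair \uD83D\uDE00.

```java
public class CodePointDemo {
    public static void main(String[] args) {
        String a = "A";                // U+0041, one code unit
        String emoji = "\uD83D\uDE00"; // U+1F600 as a surrogate pair

        System.out.println(a.length());                              // 1
        System.out.println(emoji.length());                          // 2 code units
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1 code point
        System.out.printf("U+%04X%n", emoji.codePointAt(0));         // U+1F600
    }
}
```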
