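Java's String Class in Detail
Apart from the primitive types, String is the most heavily used class in Java, and arguably it is used even more than the primitives. The JDK accordingly applies many optimizations to it.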
Class definition
    public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
        /** The value is used for character storage. */
        private final char value[];

        /** Cache the hash code for the string */
        private int hash; // Default to 0

        /** use serialVersionUID from JDK 1.0.2 for interoperability */
        private static final long serialVersionUID = -6849794470754667710L;

        /**
         * Class String is special cased within the Serialization Stream Protocol.
         *
         * A String instance is written into an ObjectOutputStream according to
         * <a href="{@docRoot}/../platform/serialization/spec/output.html">
         * Object Serialization Specification, Section 6.2, "Stream Elements"</a>
         */
        private static final ObjectStreamField[] serialPersistentFields =
            new ObjectStreamField[0];

        // constructors and methods follow (discussed in the sections after this one)
    }
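The final modifier means the class cannot be subclassed, so its methods cannot be overridden. Many important JDK classes are declared final so that application code cannot subclass them and undermine the safety of the JDK.
It implements Serializable, so a String can be serialized safely.
It implements Comparable<String>, so strings can be sorted by their natural order.
It implements CharSequence, the core interface for character sequences.
The char array value is declared final. (This is the JDK 8 layout; since JDK 9 the backing store is a byte[] plus a coder field.)
The int field hash caches the hash code so that it is not recomputed on every call.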
Implementation of compareTo from the Comparable interface
    public int compareTo(String anotherString) {
        int len1 = value.length;
        int len2 = anotherString.value.length;
        int lim = Math.min(len1, len2);
        char v1[] = value;
        char v2[] = anotherString.value;

        int k = 0;
        while (k < lim) { // only loops up to the length of the shorter string
            char c1 = v1[k];
            char c2 = v2[k];
            if (c1 != c2) {
                return c1 - c2;
            }
            k++;
        }
        return len1 - len2; // if the common prefix is identical, the longer string is greater
    }
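Characters are compared one at a time from left to right, so, as the code shows, "s" > "asssssssssssssss".
- The loop only runs up to the length of the shorter string.
- If every character in that common prefix is equal, the longer string is the greater one.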
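The comparison rule can be checked with a small standalone sketch. The class and method names here (CompareToDemo, compare) are made up for illustration; the loop mirrors the JDK code above but works through charAt rather than the private value array.

```java
class CompareToDemo {
    // Hypothetical re-implementation of the compareTo loop for illustration.
    static int compare(String a, String b) {
        int lim = Math.min(a.length(), b.length());
        for (int k = 0; k < lim; k++) {
            char c1 = a.charAt(k);
            char c2 = b.charAt(k);
            if (c1 != c2) {
                return c1 - c2; // the first differing character decides
            }
        }
        return a.length() - b.length(); // equal prefix: the longer string is greater
    }

    public static void main(String[] args) {
        System.out.println(compare("s", "asssssssssssssss")); // 18, i.e. 's' - 'a'
        System.out.println(compare("abc", "abcd"));           // -1, the length difference
        System.out.println(compare("abc", "abc"));            // 0
    }
}
```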
Constructors
    /**
     * Initializes a newly created {@code String} object so that it represents
     * an empty character sequence.  Note that use of this constructor is
     * unnecessary since Strings are immutable.
     */
    public String() {
        this.value = "".value;
    }

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        this.value = original.value;
        this.hash = original.hash;
    }

    /**
     *
     */
    public String(byte bytes[], int offset, int length, Charset charset) {
        if (charset == null)
            throw new NullPointerException("charset");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charset, bytes, offset, length);
    }
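The no-argument constructor simply produces the empty string "".
The constructor that takes another string only copies the reference to that string's value array and its hash value. The two strings cannot interfere with each other: value is private final and String has no method that modifies the array, while hash is private and is only written lazily by hashCode(), which always produces the same result for the same characters.
The no-argument constructor does not set hash, so the field keeps its default value of 0.
- The byte-array constructor converts the bytes to characters by calling StringCoding.decode(charset, bytes, offset, length). StringCoding is a package-private class (no access modifier) and its methods are package-private static methods, so unfortunately application code cannot call them directly.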
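The behaviour of these constructors can be observed from the outside through the public API only. The following is a minimal sketch; ConstructorDemo and decodeUtf8 are hypothetical names chosen for this example.

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {
    // Uses the public byte-array constructor, which delegates to the charset's decoder.
    static String decodeUtf8(byte[] bytes, int offset, int length) {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "hello";
        String copy = new String(original);
        System.out.println(copy.equals(original)); // true: same characters
        System.out.println(copy == original);      // false: a distinct object

        System.out.println(new String().isEmpty()); // true

        // U+00E9 takes two bytes in UTF-8, so the array is x x C3 A9 y y.
        byte[] bytes = "xx\u00e9yy".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(bytes, 2, 2).equals("\u00e9")); // true
    }
}
```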
The StringCoding.decode method
    static char[] decode(Charset cs, byte[] ba, int off, int len) {
        // (1)We never cache the "external" cs, the only benefit of creating
        // an additional StringDe/Encoder object to wrap it is to share the
        // de/encode() method. These SD/E objects are short-lifed, the young-gen
        // gc should be able to take care of them well. But the best approash
        // is still not to generate them if not really necessary.
        // (2)The defensive copy of the input byte/char[] has a big performance
        // impact, as well as the outgoing result byte/char[]. Need to do the
        // optimization check of (sm==null && classLoader0==null) for both.
        // (3)getClass().getClassLoader0() is expensive
        // (4)There might be a timing gap in isTrusted setting. getClassLoader0()
        // is only chcked (and then isTrusted gets set) when (SM==null). It is
        // possible that the SM==null for now but then SM is NOT null later
        // when safeTrim() is invoked...the "safe" way to do is to redundant
        // check (... && (isTrusted || SM == null || getClassLoader0())) in trim
        // but it then can be argued that the SM is null when the opertaion
        // is started...
        CharsetDecoder cd = cs.newDecoder();
        int en = scale(len, cd.maxCharsPerByte());
        char[] ca = new char[en];
        if (len == 0)
            return ca;
        boolean isTrusted = false;
        if (System.getSecurityManager() != null) {
            if (!(isTrusted = (cs.getClass().getClassLoader0() == null))) {
                ba = Arrays.copyOfRange(ba, off, off + len);
                off = 0;
            }
        }
        cd.onMalformedInput(CodingErrorAction.REPLACE)
          .onUnmappableCharacter(CodingErrorAction.REPLACE)
          .reset();
        if (cd instanceof ArrayDecoder) {
            int clen = ((ArrayDecoder)cd).decode(ba, off, len, ca);
            return safeTrim(ca, clen, cs, isTrusted);
        } else {
            ByteBuffer bb = ByteBuffer.wrap(ba, off, len);
            CharBuffer cb = CharBuffer.wrap(ca);
            try {
                CoderResult cr = cd.decode(bb, cb, true);
                if (!cr.isUnderflow())
                    cr.throwException();
                cr = cd.flush(cb);
                if (!cr.isUnderflow())
                    cr.throwException();
            } catch (CharacterCodingException x) {
                // Substitution is always enabled,
                // so this shouldn't happen
                throw new Error(x);
            }
            return safeTrim(ca, cb.position(), cs, isTrusted);
        }
    }
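- The actual byte[] to char[] conversion is done by the abstract class CharsetDecoder, and the decoder instance is created by the Charset you pass in (cs.newDecoder()).
Let's look at the CharsetDecoder implementation for UTF-8.
The UTF-8 decoder is a private static nested class that extends CharsetDecoder and implements the ArrayDecoder interface. Its decode method is long and consists entirely of byte-to-character arithmetic; following it requires some knowledge of the encoding. The trace stops here.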
    private static class Decoder extends CharsetDecoder implements ArrayDecoder {
        private Decoder(Charset var1) {
            super(var1, 1.0F, 1.0F);
        }

        // unrelated methods omitted .......

        /**
         * The method that actually converts bytes to chars
         */
        public int decode(byte[] var1, int var2, int var3, char[] var4) {
            int var5 = var2 + var3;
            int var6 = 0;
            int var7 = Math.min(var3, var4.length);

            ByteBuffer var8;
            for(var8 = null; var6 < var7 && var1[var2] >= 0; var4[var6++] = (char)var1[var2++]) {
            }

            while(true) {
                while(true) {
                    while(var2 < var5) {
                        byte var9 = var1[var2++];
                        if (var9 < 0) {
                            byte var10;
                            if (var9 >> 5 != -2 || (var9 & 30) == 0) {
                                byte var11;
                                if (var9 >> 4 == -2) {
                                    if (var2 + 1 < var5) {
                                        var10 = var1[var2++];
                                        var11 = var1[var2++];
                                        if (isMalformed3(var9, var10, var11)) {
                                            if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                return -1;
                                            }

                                            var4[var6++] = this.replacement().charAt(0);
                                            var2 -= 3;
                                            var8 = getByteBuffer(var8, var1, var2);
                                            var2 += malformedN(var8, 3).length();
                                        } else {
                                            char var15 = (char)(var9 << 12 ^ var10 << 6 ^ var11 ^ -123008);
                                            if (Character.isSurrogate(var15)) {
                                                if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                                    return -1;
                                                }

                                                var4[var6++] = this.replacement().charAt(0);
                                            } else {
                                                var4[var6++] = var15;
                                            }
                                        }
                                    } else {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }

                                        if (var2 >= var5 || !isMalformed3_2(var9, var1[var2])) {
                                            var4[var6++] = this.replacement().charAt(0);
                                            return var6;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                    }
                                } else if (var9 >> 3 != -2) {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                } else if (var2 + 2 < var5) {
                                    var10 = var1[var2++];
                                    var11 = var1[var2++];
                                    byte var12 = var1[var2++];
                                    int var13 = var9 << 18 ^ var10 << 12 ^ var11 << 6 ^ var12 ^ 3678080;
                                    if (!isMalformed4(var10, var11, var12) && Character.isSupplementaryCodePoint(var13)) {
                                        var4[var6++] = Character.highSurrogate(var13);
                                        var4[var6++] = Character.lowSurrogate(var13);
                                    } else {
                                        if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                            return -1;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                        var2 -= 4;
                                        var8 = getByteBuffer(var8, var1, var2);
                                        var2 += malformedN(var8, 4).length();
                                    }
                                } else {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    int var14 = var9 & 255;
                                    if (var14 <= 244 && (var2 >= var5 || !isMalformed4_2(var14, var1[var2] & 255))) {
                                        ++var2;
                                        if (var2 >= var5 || !isMalformed4_3(var1[var2])) {
                                            var4[var6++] = this.replacement().charAt(0);
                                            return var6;
                                        }

                                        var4[var6++] = this.replacement().charAt(0);
                                    } else {
                                        var4[var6++] = this.replacement().charAt(0);
                                    }
                                }
                            } else {
                                if (var2 >= var5) {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                    return var6;
                                }

                                var10 = var1[var2++];
                                if (isNotContinuation(var10)) {
                                    if (this.malformedInputAction() != CodingErrorAction.REPLACE) {
                                        return -1;
                                    }

                                    var4[var6++] = this.replacement().charAt(0);
                                    --var2;
                                } else {
                                    var4[var6++] = (char)(var9 << 6 ^ var10 ^ 3968);
                                }
                            }
                        } else {
                            var4[var6++] = (char)var9;
                        }
                    }

                    return var6;
                }
            }
        }
    }
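To make one piece of that arithmetic concrete, here is a small sketch of the two-byte branch only, the expression (char)(var9 << 6 ^ var10 ^ 3968). The class and method names (Utf8TwoByteDemo, decodeTwoBytes) are invented for the example, and it assumes the two bytes already form a valid two-byte UTF-8 sequence, so none of the malformed-input handling is reproduced.

```java
class Utf8TwoByteDemo {
    // Both bytes are negative as Java bytes, so they sign-extend to ints with
    // all high bits set. XOR-ing the shifted lead byte with the continuation
    // byte cancels most of those bits; what is left over from the sign
    // extension and the 110xxxxx / 10xxxxxx markers is exactly 0x0F80 (3968),
    // which the final XOR removes, leaving the 11 payload bits.
    static char decodeTwoBytes(byte lead, byte continuation) {
        return (char) (lead << 6 ^ continuation ^ 3968);
    }

    public static void main(String[] args) {
        // U+00E9 is encoded in UTF-8 as the bytes C3 A9.
        char c = decodeTwoBytes((byte) 0xC3, (byte) 0xA9);
        System.out.println((int) c); // 233, i.e. 0xE9
    }
}
```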
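Conclusion: converting bytes to a string goes through the decode method of the utility class StringCoding, which in turn relies on the decode method of the Decoder (a static nested class inside the concrete Charset implementation) to do the real conversion. By defining interfaces, Java moves the concrete work into each charset class, and String only has to program against the interface. This also makes it easy to add further encodings.
Likewise, String's getBytes method hands the main work to the encoder (CharsetEncoder) of the concrete Charset class.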
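StringCoding itself is not accessible, but the same decoder and encoder machinery is available through the public java.nio.charset API. The sketch below assumes UTF-8 and the REPLACE error action, matching what StringCoding.decode configures; CharsetRoundTripDemo and its decode helper are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class CharsetRoundTripDemo {
    // Decodes bytes with an explicitly configured CharsetDecoder.
    static String decode(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        CharBuffer chars = decoder.decode(ByteBuffer.wrap(bytes));
        return chars.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] encoded = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        System.out.println(encoded.length);                      // 5: the last character needs two bytes
        System.out.println(decode(encoded).equals("caf\u00e9")); // true

        byte[] broken = {'o', 'k', (byte) 0xFF};                 // 0xFF never occurs in valid UTF-8
        System.out.println(decode(broken).equals("ok\uFFFD"));   // true: replaced with U+FFFD
    }
}
```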
The hashCode method
String overrides this method, and the result depends on every character.
    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i]; // why is the previous value multiplied by 31?
            }
            hash = h;
        }
        return h;
    }
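On the question in the comment: the explanation usually given (for example in Effective Java) is that 31 is an odd prime and that 31 * h can be computed as (h << 5) - h, which is cheap. The sketch below recomputes the hash that way and compares it with the built-in result; HashCodeDemo and hash are names invented for this example.

```java
class HashCodeDemo {
    // Same polynomial as String.hashCode, written with shift and subtract.
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = (h << 5) - h + s.charAt(i); // equals 31 * h + s.charAt(i), overflow included
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a' = 97, 'b' = 98, 'c' = 99: (97 * 31 + 98) * 31 + 99 = 96354
        System.out.println(hash("abc"));      // 96354
        System.out.println("abc".hashCode()); // 96354
    }
}
```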
String concatenation: the concat method and the static join method
The concat method
    public String concat(String str) {
        int otherLen = str.length();
        if (otherLen == 0) {
            return this;
        }
        int len = value.length;
        char buf[] = Arrays.copyOf(value, len + otherLen);
        str.getChars(buf, len);
        return new String(buf, true);
    }
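It copies the characters into a new, larger array and then creates a new String object. This is thread-safe, but every call allocates a fresh array and a fresh String, so chaining many calls is relatively slow.
- You can also concatenate with +.
According to https://blog.csdn.net/youanyyou/article/details/78992978, + is compiled into bytecode that uses StringBuilder (this is what javac did up to JDK 8; since JDK 9 it emits an invokedynamic call instead), whereas concat copies the array and creates a new object on every call. On balance, prefer + when joining several pieces in one expression, since it generally performs better there.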
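The three ways of concatenating give the same characters, and the early return in concat is visible from the outside. A minimal sketch; ConcatDemo and joinWithBuilder are hypothetical names, and the helper only imitates what a StringBuilder-based expansion of + looks like.

```java
class ConcatDemo {
    // One StringBuilder for all parts, instead of one intermediate String per step.
    static String joinWithBuilder(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = "foo";
        String empty = "";
        System.out.println(a.concat("bar"));      // foobar
        System.out.println(a.concat(empty) == a); // true: concat returns this for an empty argument
        System.out.println(a + "bar");            // foobar
        System.out.println(joinWithBuilder("foo", "bar", "baz")); // foobarbaz
    }
}
```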
The static join method
    public static String join(CharSequence delimiter, CharSequence... elements) {
        Objects.requireNonNull(delimiter);
        Objects.requireNonNull(elements);
        // Number of elements not likely worth Arrays.stream overhead.
        StringJoiner joiner = new StringJoiner(delimiter);
        for (CharSequence cs: elements) {
            joiner.add(cs);
        }
        return joiner.toString();
    }
The real work happens in the StringJoiner class:
    public final class StringJoiner {
        private final String prefix;
        private final String delimiter;
        private final String suffix;

        /*
         * StringBuilder value -- at any time, the characters constructed from the
         * prefix, the added element separated by the delimiter, but without the
         * suffix, so that we can more easily add elements without having to jigger
         * the suffix each time.
         */
        private StringBuilder value;

        /**
         * Adds a copy of the given {@code CharSequence} value as the next
         * element of the {@code StringJoiner} value. If {@code newElement} is
         * {@code null}, then {@code "null"} is added.
         *
         * @param  newElement The element to add
         * @return a reference to this {@code StringJoiner}
         */
        public StringJoiner add(CharSequence newElement) {
            prepareBuilder().append(newElement);
            return this;
        }

        private StringBuilder prepareBuilder() {
            if (value != null) {
                value.append(delimiter);
            } else {
                value = new StringBuilder().append(prefix);
            }
            return value;
        }

        // remaining members omitted
    }
- Internally it is still implemented with StringBuilder; join is purely a convenience utility method.
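A short usage sketch of both entry points follows. JoinDemo and bracketed are hypothetical names; the three-argument StringJoiner constructor (delimiter, prefix, suffix) is part of the public API.

```java
import java.util.StringJoiner;

class JoinDemo {
    // Uses StringJoiner directly to add a prefix and suffix, which String.join does not offer.
    static String bracketed(String... items) {
        StringJoiner joiner = new StringJoiner(", ", "[", "]");
        for (String item : items) {
            joiner.add(item);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(String.join("-", "2024", "01", "31")); // 2024-01-31
        System.out.println(bracketed("a", "b", "c"));             // [a, b, c]
        System.out.println(bracketed());                          // []
    }
}
```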
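The replace method
    public String replace(char oldChar, char newChar)
- Replaces by iterating over the char array.
    public String replace(CharSequence target, CharSequence replacement)
- Implemented with a regular expression. In JDK 8 the target is compiled with Pattern.LITERAL and the replacement is quoted, so both are treated as plain text. The regex source code will be analyzed in a later article.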
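Because replace treats its arguments literally while replaceAll interprets a regular expression, the two can give very different results for the same input. A minimal sketch, with ReplaceDemo and its helper methods as made-up names:

```java
class ReplaceDemo {
    // Literal replacement: "." means a dot, nothing else.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // Regex replacement: "." matches any character.
    static String regex(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println("a.b.c".replace('.', '-'));      // a-b-c
        System.out.println(literal("a.b.c"));               // a-b-c
        System.out.println(regex("a.b.c"));                 // -----
        System.out.println("a.b.c".replaceAll("\\.", "-")); // a-b-c
        System.out.println("cost".replace("cost", "$1"));   // $1, no group reference is applied
    }
}
```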
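The static format method formats a string and is mainly used for string internationalization.
Internally it uses the Formatter class, and Formatter in turn (in JDK 8) uses a regular expression to parse the format specifiers.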
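A brief usage sketch with an explicit Locale, so that the output does not depend on the machine's default locale. FormatDemo and describe are hypothetical names.

```java
import java.util.Locale;

class FormatDemo {
    // %s = string, %d = integer, %,.2f = two decimals with a grouping separator.
    static String describe(String name, int count, double price) {
        return String.format(Locale.US, "%s x%d = %,.2f", name, count, price);
    }

    public static void main(String[] args) {
        System.out.println(describe("widget", 3, 1234.5)); // widget x3 = 1,234.50
    }
}
```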
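The toLowerCase method
    public String toLowerCase(Locale locale)
- Iterates over the char array and lowercases each character with Character.toLowerCase, with special handling for a few locales such as Turkish.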
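The locale parameter matters in practice: in Turkish, an uppercase I lowercases to the dotless ı (U+0131). A small sketch, using the hypothetical names LowerCaseDemo and lower:

```java
import java.util.Locale;

class LowerCaseDemo {
    // Lowercases s under the locale identified by a BCP 47 language tag.
    static String lower(String s, String languageTag) {
        return s.toLowerCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        System.out.println(lower("TITLE", "en"));                    // title
        System.out.println(lower("TITLE", "tr").equals("t\u0131tle")); // true: dotless i
        System.out.println("TITLE".toLowerCase(Locale.ROOT));        // title
    }
}
```

For identifiers, keys and protocol strings, passing Locale.ROOT avoids this kind of surprise.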
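The trim method
It scans from the front and from the back to skip whitespace, where a character counts as whitespace if char <= ' ' (a useful detail to pick up), and then uses substring to extract the part in between.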
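The char <= ' ' rule has a visible consequence: control characters are stripped, but Unicode spaces above U+0020, such as the no-break space U+00A0, are not. The sketch below re-implements the scan for illustration; TrimDemo and trimSketch are invented names, not the JDK code.

```java
class TrimDemo {
    // Skips every char <= ' ' at both ends, then cuts out the middle.
    static String trimSketch(String s) {
        int start = 0;
        int end = s.length();
        while (start < end && s.charAt(start) <= ' ') {
            start++;
        }
        while (start < end && s.charAt(end - 1) <= ' ') {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(trimSketch("\t hi \n"));           // hi
        System.out.println(trimSketch("\u00a0hi").length());  // 3: U+00A0 is greater than ' ' and stays
        System.out.println("\u00a0hi".trim().length());       // 3: the real trim behaves the same way
    }
}
```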
The substring method
Internally it uses the constructor public String(char value[], int offset, int count) to create the new string, and that constructor copies the requested range of the array.
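The effect of that copy can be imitated with public API only. This is a sketch under the assumption that the constructor copies the range with Arrays.copyOfRange, as the JDK 8 source does; SubstringDemo and substringSketch are hypothetical names.

```java
import java.util.Arrays;

class SubstringDemo {
    // Copies value[begin, end) into a fresh array and builds a String from it.
    static String substringSketch(char[] value, int begin, int end) {
        char[] copy = Arrays.copyOfRange(value, begin, end);
        return new String(copy);
    }

    public static void main(String[] args) {
        char[] chars = "hello world".toCharArray();
        String sub = substringSketch(chars, 6, 11);
        chars[6] = 'W';            // later changes to the source array...
        System.out.println(sub);   // world  (...do not affect the substring)
        System.out.println("hello world".substring(6, 11)); // world
    }
}
```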
The valueOf methods
    // Uses the argument's own toString method; if the object does not
    // override toString, the default Object.toString is used.
    public static String valueOf(Object obj) {
        return (obj == null) ? "null" : obj.toString();
    }

    // Picks the matching constructor for the array that was passed in.
    public static String valueOf(char data[]) {
        return new String(data);
    }

    // Maps the boolean to the text "true" or "false".
    public static String valueOf(boolean b) {
        return b ? "true" : "false";
    }
The static copyValueOf method
    // Static utility method; uses the matching constructor to cut out the
    // range and create a new string.
    public static String copyValueOf(char data[], int offset, int count) {
        return new String(data, offset, count);
    }
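A short usage sketch of valueOf and copyValueOf, including the null case handled by valueOf(Object). ValueOfDemo and describe are hypothetical names.

```java
class ValueOfDemo {
    // Routes through the Object overload, so null becomes the text "null".
    static String describe(Object obj) {
        return String.valueOf(obj);
    }

    public static void main(String[] args) {
        System.out.println(describe(null));                            // null (a four-character string)
        System.out.println(describe(42));                              // 42
        System.out.println(String.valueOf(true));                      // true
        System.out.println(String.valueOf(new char[] {'a', 'b'}));     // ab
        System.out.println(String.copyValueOf(new char[] {'h', 'e', 'l', 'l', 'o'}, 1, 3)); // ell
    }
}
```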
The native intern method
This method touches on String memory and the string constant pool; it will be covered in detail in another article.
    public native String intern();
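Until that article, the observable effect can be shown in a few lines: a string created with new is a separate object from the pooled literal, and intern returns the pooled one. InternDemo and sameObject are hypothetical names.

```java
class InternDemo {
    // Reference comparison, not content comparison.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "abc";           // comes from the constant pool
        String fresh = new String("abc"); // a new object on the heap

        System.out.println(fresh.equals(literal));              // true: same characters
        System.out.println(sameObject(fresh, literal));         // false: different objects
        System.out.println(sameObject(fresh.intern(), literal)); // true: intern returns the pooled instance
    }
}
```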