Converting Chinese Characters to Pinyin on Android, with Polyphone Recognition
程序员文章站 (Programmer Article Station)
2024-02-11 13:15:46
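This article shows how to convert Chinese characters to pinyin on Android with recognition of polyphonic characters (characters with more than one reading). It is offered for reference; the details follow.
Where the problem came from
While sorting place names by their first letter, I ran into a bug: 长沙 (Changsha) was converted to "zhangsha" and 重庆 (Chongqing) to "zhong qing", so the sort order came out wrong.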
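The pinyin library and the polyphone library
1. A phrase dictionary listing the words in which each polyphonic character takes a given reading
2. A pinyin table keyed by the numeric value of each character's encoded bytes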
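Key code
1. First, the character to convert is encoded as "gb2312". A Chinese character normally occupies two bytes in this encoding. If the result is a single byte, that byte is used directly as the code; if it is two bytes, they are combined into one number and shifted by an offset. The code is as follows: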
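/**
 * Converts a single character to its GB2312 code value.
 *
 * @param chs a string holding exactly one character
 * @return the byte value for a one-byte character, or the two-byte code
 *         minus 65536 for a Chinese character; 0 if conversion fails
 */
private int getChsAscii(String chs) {
    int asc = 0;
    try {
        byte[] bytes = chs.getBytes("gb2312");
        if (bytes == null || bytes.length > 2 || bytes.length <= 0) {
            throw new RuntimeException("illegal resource string");
        }
        if (bytes.length == 1) {
            // Single byte: the byte itself is the code.
            asc = bytes[0];
        }
        if (bytes.length == 2) {
            // Java bytes are signed; adding 256 recovers the unsigned value
            // of the (negative) GB2312 lead and trail bytes.
            int hightByte = 256 + bytes[0];
            int lowByte = 256 + bytes[1];
            asc = (256 * hightByte + lowByte) - 256 * 256;
        }
    } catch (Exception e) {
        System.out.println("ERROR:ChineseSpelling.class-getChsAscii(String chs)" + e);
    }
    return asc;
}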
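As a worked example of the offset arithmetic above, here is a minimal standalone sketch. The class name `Gb2312CodeDemo` and method name `codeOf` are made up for illustration and are not part of the original project. It assumes the runtime provides the GB2312 charset, and it uses `& 0xFF` in place of `256 + b`, which gives the same result for the negative bytes of a two-byte GB2312 character.

```java
import java.nio.charset.Charset;

/** Sketch: the GB2312 code value of one character, shifted by -65536. */
class Gb2312CodeDemo {
    // Assumption: the runtime ships the GB2312 charset.
    private static final Charset GB2312 = Charset.forName("GB2312");

    static int codeOf(String ch) {
        byte[] bytes = ch.getBytes(GB2312);
        if (bytes.length == 1) {
            return bytes[0];                 // ASCII: the byte is the code
        }
        if (bytes.length == 2) {
            int high = bytes[0] & 0xFF;      // undo Java's signed bytes
            int low = bytes[1] & 0xFF;
            return (high << 8 | low) - 0x10000;
        }
        throw new IllegalArgumentException("expected exactly one character: " + ch);
    }

    public static void main(String[] args) {
        System.out.println(codeOf("A"));   // 65
        System.out.println(codeOf("啊"));  // 0xB0A1 - 0x10000 = -20319
    }
}
```

The negative result for Chinese characters is intentional: a pinyin table built for this scheme would be keyed by these shifted values.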
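2. The pinyin obtained for each single character is then checked against the HashMap built from the polyphone dictionary. The code is as follows: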
public String getSellingWithPolyphone(String chs) {
    // Load the dictionary on first use (the original check skipped the null case).
    if (polyphoneMap == null || polyphoneMap.isEmpty()) {
        polyphoneMap = initDictionary();
    }
    String key, value, resultPy = null;
    buffer = new StringBuilder();
    for (int i = 0; i < chs.length(); i++) {
        key = chs.substring(i, i + 1);
        if (key.getBytes().length >= 2) {
            value = (String) convert(key);
            if (value == null) {
                value = "unknown";
            }
        } else {
            value = key;
        }
        resultPy = value;

        // Take one extra character to the left.
        String left = null;
        if (i >= 1 && i + 1 <= chs.length()) {
            left = chs.substring(i - 1, i + 1);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left)) {
                resultPy = value;
            }
        }

        // Take one extra character to the right, e.g. [长]沙
        String right = null;
        if (i <= chs.length() - 2) {
            right = chs.substring(i, i + 2);
            if (polyphoneMap.containsKey(right)) {
                resultPy = polyphoneMap.get(right);
            }
        }

        // Take one extra character on each side, e.g. 龙[爪]槐
        String middle = null;
        if (i >= 1 && i + 2 <= chs.length()) {
            middle = chs.substring(i - 1, i + 2);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(middle)) {
                resultPy = value;
            }
        }

        // Take two extra characters to the left, e.g. 芈月[传], 列车[长]
        String left3 = null;
        if (i >= 2 && i + 1 <= chs.length()) {
            left3 = chs.substring(i - 2, i + 1);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(left3)) {
                resultPy = value;
            }
        }

        // Take two extra characters to the right, e.g. [长]孙无忌
        String right3 = null;
        if (i <= chs.length() - 3) {
            right3 = chs.substring(i, i + 3);
            if (polyphoneMap.containsKey(value) && polyphoneMap.get(value).contains(right3)) {
                resultPy = value;
            }
        }
        buffer.append(resultPy);
    }
    return buffer.toString();
}
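The idea behind the context windows above is that a polyphonic character is disambiguated by the word it appears in. One simplified way to sketch that idea, which is not the repository's implementation, is a phrase-first lookup: try the longest known phrase starting at the current position, and fall back to a per-character table otherwise. The class name, both tables, and their contents are hypothetical sample data, and here the phrase table stores the pinyin of the whole phrase.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: phrase-first pinyin lookup with a per-character fallback. */
class PhraseFirstPinyin {
    // Hypothetical sample data; a real app would load both tables from files.
    private static final Map<String, String> PHRASES = new HashMap<>();
    private static final Map<String, String> SINGLE = new HashMap<>();
    private static final int MAX_PHRASE_LENGTH = 3;

    static {
        PHRASES.put("长沙", "changsha");
        PHRASES.put("重庆", "chongqing");
        SINGLE.put("长", "zhang");   // default reading when no phrase matches
        SINGLE.put("沙", "sha");
        SINGLE.put("重", "zhong");
        SINGLE.put("庆", "qing");
    }

    static String toPinyin(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int matched = 0;
            // Try the longest phrase starting at position i first.
            for (int len = Math.min(MAX_PHRASE_LENGTH, text.length() - i); len >= 2; len--) {
                String pinyin = PHRASES.get(text.substring(i, i + len));
                if (pinyin != null) {
                    out.append(pinyin);
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                String ch = text.substring(i, i + 1);
                // Characters missing from the table pass through unchanged.
                out.append(SINGLE.getOrDefault(ch, ch));
                matched = 1;
            }
            i += matched;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toPinyin("长沙"));
        System.out.println(toPinyin("重庆"));
    }
}
```

With this sample data 长沙 resolves through the phrase table, while the same characters in an unknown order fall back to their default single-character readings.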
3. Parse the contents of the file in the assets directory into a HashMap.
public HashMap<String, String> initDictionary() {
    String fileName = "py4j.dic";
    InputStreamReader inputReader = null;
    BufferedReader bufferedReader = null;
    HashMap<String, String> polyphoneMap = new HashMap<String, String>();
    try {
        inputReader = new InputStreamReader(
                MyApplication.mContext.getResources().getAssets().open(fileName), "UTF-8");
        bufferedReader = new BufferedReader(inputReader);
        String line = null;
        while ((line = bufferedReader.readLine()) != null) {
            // Each line holds a pinyin followed by the words that use it.
            String[] arr = line.split(PINYIN_SEPARATOR);
            // Skip lines without a word list instead of failing on arr[1].
            if (arr.length > 1 && isNotEmpty(arr[1])) {
                String[] dyzs = arr[1].split(WORD_SEPARATOR);
                for (String dyz : dyzs) {
                    if (isNotEmpty(dyz)) {
                        // Map each word to its pinyin.
                        polyphoneMap.put(dyz.trim(), arr[0]);
                    }
                }
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // Close the outer reader first; it also closes the wrapped reader.
        if (bufferedReader != null) {
            try {
                bufferedReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        if (inputReader != null) {
            try {
                inputReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    return polyphoneMap;
}
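The parsing step can be tried without Android by reading from any `Reader`. The sketch below does that with try-with-resources. The separator values `#` and `/` are assumptions, since the article does not show what `PINYIN_SEPARATOR` and `WORD_SEPARATOR` are set to. The class name and the sample lines are likewise hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Sketch: parse "pinyin, separator, word list" lines into a word-to-pinyin map. */
class PolyphoneDictionaryParser {
    // Assumed separators; the article does not show their actual values.
    private static final String PINYIN_SEPARATOR = "#";
    private static final String WORD_SEPARATOR = "/";

    static Map<String, String> parse(Reader source) {
        Map<String, String> map = new HashMap<>();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(PINYIN_SEPARATOR, 2);
                if (parts.length < 2) {
                    continue;                // no word list on this line
                }
                String pinyin = parts[0].trim();
                for (String word : parts[1].split(WORD_SEPARATOR)) {
                    String trimmed = word.trim();
                    if (!trimmed.isEmpty()) {
                        map.put(trimmed, pinyin);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in the assumed format.
        Map<String, String> map = parse(new StringReader("chang#长沙/长江\nchong#重庆"));
        System.out.println(map.get("长沙"));
        System.out.println(map.get("重庆"));
    }
}
```

On Android, the same method could be fed an `InputStreamReader` over the asset stream, as the original code does.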
Source code on GitHub: https://github.com/loveburce/chinesepolyphone.git
That is all for this article. I hope it helps with your learning, and I hope you will keep supporting us.