PHP similar_text()、levenshtein()、lcs()支持中文汉字版,
程序员文章站
2023-12-27 17:40:46
...
PHP similar_text()、levenshtein()、lcs()支持中文汉字版,
PHP 原生的similar_text()函数、levenshtein()函数对中文汉字支持不好,我自己写了一个
similar_text()中文汉字版
1 php 2 //拆分字符串 3 function split_str($str) { 4 preg_match_all("/./u", $str, $arr); 5 return $arr[0]; 6 } 7 8 //相似度检测 9 function similar_text_cn($str1, $str2) { 10 $arr_1 = array_unique(split_str($str1)); 11 $arr_2 = array_unique(split_str($str2)); 12 $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1)); 13 14 return $similarity; 15 }
levenshtein()中文汉字版
1 php 2 //拆分字符串 3 function mbStringToArray($string, $encoding = 'UTF-8') { 4 $arrayResult = array(); 5 6 while ($iLen = mb_strlen($string, $encoding)) { 7 array_push($arrayResult, mb_substr($string, 0, 1, $encoding)); 8 $string = mb_substr($string, 1, $iLen, $encoding); 9 } 10 11 return $arrayResult; 12 } 13 14 //编辑距离 15 function levenshtein_cn($str1, $str2, $costReplace = 1, $encoding = 'UTF-8') { 16 $count_same_letter = 0; 17 $d = array(); 18 19 $mb_len1 = mb_strlen($str1, $encoding); 20 $mb_len2 = mb_strlen($str2, $encoding); 21 22 $mb_str1 = mbStringToArray($str1, $encoding); 23 $mb_str2 = mbStringToArray($str2, $encoding); 24 25 for ($i1 = 0; $i1 $mb_len1