字符串相似度处理函数
程序员文章站
2022-03-24 09:03:30
oracle里面查比如存储过程里面与表SALES有关jobs: SELECT * FROM (SELECT a.name,upper(b.what)AS what,SYS.UTL_MATCH.edit_distance_similarity (a.name,upper(b.what)) AS sim ......
oracle里面查比如存储过程里面与表sales有关jobs:
select * from (select a.name,upper(b.what)as what,sys.utl_match.edit_distance_similarity (a.name,upper(b.what)) as similarity from dba_source a,dba_jobs b where sys.utl_match.edit_distance_similarity(a.name,upper(b.what))>80 and upper(a.text) like '%sales%' and b.what not like '%dbms_refresh%') order by 3 desc;
有自带相似度函数 sys.utl_match.edit_distance_similarity 可以直接用。
mysql8里面我只查到了一个搜索相关的文档并不适合我自己用,找了一下大佬的文章,发现了一个能用的自定义函数,用于计算字符串相似度。
delimiter $$ create definer=`root`@`localhost` function `compare_string`( s1 text, s2 text) returns int(11) deterministic begin declare s1_len, s2_len, i, j, c, c_temp, cost int; declare s1_char char; declare cv0, cv1 text; set s1_len = char_length(s1), s2_len = char_length(s2), cv1 = 0x00, j = 1, i = 1, c = 0; if s1 = s2 then return 0; elseif s1_len = 0 then return s2_len; elseif s2_len = 0 then return s1_len; else while j <= s2_len do set cv1 = concat(cv1, unhex(hex(j))), j = j + 1; end while; while i <= s1_len do set s1_char = substring(s1, i, 1), c = i, cv0 = unhex(hex(i)), j = 1; while j <= s2_len do set c = c + 1; if s1_char = substring(s2, j, 1) then set cost = 0; else set cost = 1; end if; set c_temp = conv(hex(substring(cv1, j, 1)), 16, 10) + cost; if c > c_temp then set c = c_temp; end if; set c_temp = conv(hex(substring(cv1, j+1, 1)), 16, 10) + 1; if c > c_temp then set c = c_temp; end if; set cv0 = concat(cv0, unhex(hex(c))), j = j + 1; end while; set cv1 = cv0, i = i + 1; end while; end if; return c; end$$ delimiter ; delimiter $$ create definer=`root`@`localhost` function `similarity_string`(a text, b text) returns double begin return abs(((compare_string(a, b) / length(b)) * 100) - 100); end$$ delimiter ;
试了一下还挺好用的,一些逻辑可以自己再适当的修改。
而在gp里面,我找了许久,发现一个 fuzzystrmatch 看起来比较高档的函数,这个函数需要安装:
[gpadmin@szwpldb1085 ~]$ psql -d postgres -f $gphome/share/postgresql/contrib/fuzzystrmatch.sql set create function create function create function create function create function create function create function create function create function create function
然后会在 postgres 库多出一些函数:
用法在