
Oracle Row-Column Conversion Example: A Column-to-Row Report


A brief note:

This post is not aimed at readers with zero background in Oracle row-column conversion; it is a reporting example built on top of an advanced column-to-row split:
There is a table containing redundant data
  id1,url1
  id2,url1&url2&url3&url3&url2...
  id3,url1&url2&url3&url3&url2...
  id4,url1&url4&url5&url5...
The task is to de-duplicate the records by url: find the ids of every url that appears more than once and string those ids together.
The urls are open-ended (there may be tens of thousands of them or more), a single record may contain duplicate urls, and adjacent records may share exactly the same urls.
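For the placeholder rows above (ignoring whatever the elided ... parts contain), the desired report would look roughly like this: url1 maps to id1,id2,id3,id4, while url2 and url3 each map to id2,id3; url4 and url5 occur under a single id only, so they would not appear in the report.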

Prepare the test data:

create table id_urls(id number,urls varchar2(255));
-- disable substitution variables so the & characters in the URL strings are inserted literally
set define off
insert into ID_URLS values(1,'URL1');
insert into ID_URLS values(2,'URL1');
insert into ID_URLS values(3,'URL1&URL2&URL3');
insert into ID_URLS values(4,'URL1&URL2&URL3');
insert into ID_URLS values(5,'URL2&URL3&URL2');
insert into ID_URLS values(6,'URL2&URL3&URL2');
insert into ID_URLS values(7,'URL6&URL7&URL8&URL9&URL0');
insert into ID_URLS values(8,'URL10');
commit;
set define on

Deriving the query step by step:

1° Replace the URL text with nothing to see at most how many URLs the column holds

select regexp_replace(urls, '[^&]+', '') from ID_URLS;
select length(regexp_replace(urls, '[^&]+', '')) from ID_URLS;
select max(length(regexp_replace(urls, '[^&]+', ''))) + 1 as max_len from ID_URLS;
-- The count must be increased by at least 1; if the record with the most URLs happens to end with the & delimiter, adding 1 makes the computed maximum one larger than the real one
-- The computed maximum URL count is what drives the Cartesian product of the records
-- For example, if a record holds only one URL and the maximum is 10, that record is multiplied 10 times
-- Splitting those 10 copies on & yields tokens 1 through 10; 9 of them are empty, and filtering out the empty ones takes care of that
-- If the computed maximum is smaller than the real one, data is lost during the split
-- For example, if a record holds 10 URLs but the computed maximum is 5, that record is only multiplied 5 times
-- and splitting those 5 copies on & only yields tokens 1 through 5, so 5 URLs are lost
-- So a computed maximum that is larger than the real one is acceptable; a smaller one is not
-- Only by checking whether every record does or does not end with a trailing & delimiter could the maximum be made exactly equal to the real count
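If the target database is 11g or later (an assumption; regexp_count does not exist before 11.1), the exact maximum can be computed directly by counting the non-empty tokens, with no +1 and no concern about trailing delimiters. A minimal sketch:

select max(regexp_count(urls, '[^&]+')) as max_len from ID_URLS;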

2° To restore all the URLs via a column-to-row split, each record has to be repeated at least max_len times

with max_length as
 (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1 as max_len from ID_URLS)
select level as lvl from dual connect by level <= (select max_len from max_length);
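With the test data above, max_len should work out to 5 (record 7 holds five URLs), so this generator is expected to return lvl values 1 through 5.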

3° Repeat every record max_len times and cut all of the URLs out of the string

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length))
select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat order by id;
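Taking id 3 ('URL1&URL2&URL3') as an example, the cross join should expand it into five rows whose single_url values are URL1, URL2, URL3, NULL and NULL; the two NULLs come from lvl values 4 and 5, for which regexp_substr finds no fourth or fifth token.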

4° Not every record actually holds max_len URLs, so some of the generated rows have a NULL URL; filter them out

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat)
select id, single_url from cut_url where single_url is not null order  by id;

5° The same URL may appear more than once within a single record, so de-duplicate

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat)
select distinct id, single_url from cut_url where single_url is not null order  by id;
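Record 5 ('URL2&URL3&URL2'), for example, produces the pair (5, URL2) twice before the DISTINCT and only once after it.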

6° Check which URLs are duplicated and how many times each one occurs

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat),
distinct_record as
  (select distinct id, single_url from cut_url where single_url is not null order  by id)
select single_url,count(*) from distinct_record where single_url is not null group by single_url having count(*)>1;
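Against the test data, this should report three duplicated URLs, each shared by four ids (row order may differ):

SINGLE_URL    COUNT(*)
----------  ----------
URL1                 4
URL2                 4
URL3                 4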

7° Concatenate the ids that share each duplicated URL and throw the result in the requester's face

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat),
distinct_record as
  (select distinct id, single_url from cut_url where single_url is not null order  by id)
select single_url,to_char(wm_concat(id)) from distinct_record where single_url is not null group by single_url having count(*)>1;
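With the test data, this should return something like URL1 → 1,2,3,4, URL2 → 3,4,5,6 and URL3 → 3,4,5,6, although wm_concat does not guarantee the order of the concatenated ids. Also note that wm_concat is an undocumented function and is gone from Oracle 12c onward; on 11gR2 and later the documented LISTAGG gives the same report with a deterministic id order. A sketch using the same CTE chain as above:

with max_length as
  (select max(length(regexp_replace(urls, '[^&]+', ''))) + 1  as max_len from ID_URLS),
repeat as
  (select level as lvl from dual connect by level <= (select max_len from max_length)),
cut_url as
  (select id,regexp_substr(urls,'[^&]+',1,lvl) as single_url from ID_URLS,repeat),
distinct_record as
  (select distinct id, single_url from cut_url where single_url is not null)
select single_url, listagg(id, ',') within group (order by id) as ids
from distinct_record group by single_url having count(*)>1;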

Wrapping up:

Why the requirement arose:

This was a real production requirement; the URLs stored in this table are image URLs.
Users upload images from the front-end page, possibly several per upload.
The images go to a third party, which returns the storage URLs of the uploaded images to the back end.
Because of network problems or bugs on the third party's side, a single upload action was sometimes reported back to the back end multiple times,
and the image URLs in the returned payload contained duplicates.
The back end inserted those image URLs straight into the database without any processing, which is what produced this requirement.

The fix:

The back end now stores every received image URL in Redis, one key per image URL, with a one-minute expiry.
When an image URL arrives, Redis is checked for that URL within the current minute; if the key exists, the URL is not inserted into the database again.
If it does not exist, a key is created for that URL with a one-minute expiry and the URL is inserted into the database.

An aside:

The fix makes it fairly obvious whose fault this was.
Application developers cannot write SQL of this difficulty (we can be grateful if they simply produce fewer bugs),
and neither can an ordinary maintenance-oriented DBA; a DBA or SQL developer who specializes in BI reporting can.
As a DBA, though, one should strive for the ability to mold the data at hand with ease.
