欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页  >  IT编程

删除一个表中的重复数据同时保留第一次插入那一条以及sql优化

程序员文章站 2023-10-30 23:20:58
删除一个表中的重复数据同时保留第一次插入那一条以及sql优化          业务:一个表中有很多数据(id为自增主键),在...

删除一个表中的重复数据同时保留第一次插入那一条以及sql优化

 

       业务:一个表中有很多数据(id为自增主键),在这些数据中有个别数据出现了重复的数据。

 

       目标:需要把这些重复数据删除同时保留第一次插入的那一条数据,还要保持其它的数据不受影响。

 

       解题过程:

 

       第一步:查出所有要保留的下来的数据的id(save_id)
[sql] 
select id as save_id  
  from yujing.alarm_event_info_snapshot aeis  
 where aeis.event_id in  
       (select ae.id  
          from yujing.alarm_event ae  
         where ae.event_uuid like 'yuanwtj_%')  
 group by (aeis.event_id)  
优化后:
[sql] 
select aeis.id as save_id  
  from yujing.alarm_event ae  
 right join yujing.alarm_event_info_snapshot aeis  
    on aeis.event_id = ae.id  
 where ae.event_uuid like 'yuanwtj_%'  
 group by (aeis.event_id)  
  
       第二步:获取所有相关数据的id(all_id)
[sql] 
select aeis.id as all_id  
  from yujing.alarm_event_info_snapshot aeis  
 where aeis.event_id in  
       (select ae.id  
          from yujing.alarm_event ae  
         where ae.event_uuid like 'yuanwtj_%')  
 order by aeis.event_id  
优化后:       
[sql] 
select aeis.id as all_id  
  from yujing.alarm_event ae  
 right join yujing.alarm_event_info_snapshot aeis  
    on aeis.event_id = ae.id  
 where ae.event_uuid like 'yuanwtj_%'  

        第三步:获取要删除的数据的id(del_id)
[sql] 
select ad.all_id as del_id  
  from (select aeis.id as all_id  
          from yujing.alarm_event_info_snapshot aeis  
         where aeis.event_id in  
               (select ae.id  
                  from yujing.alarm_event ae  
                 where ae.event_uuid like 'yuanwtj_%')) as ad  
 where ad.all_id not in (select id as save_id  
                           from yujing.alarm_event_info_snapshot aeis  
                          where aeis.event_id in  
                                (select ae.id  
                                   from yujing.alarm_event ae  
                                  where ae.event_uuid like 'yuanwtj_%')  
                          group by (aeis.event_id))  
优化后:
[sql] 
select ad.all_id as del_id  
  from (select aeis.id as all_id  
          from yujing.alarm_event ae  
         right join yujing.alarm_event_info_snapshot aeis  
            on aeis.event_id = ae.id  
         where ae.event_uuid like 'yuanwtj_%') as ad  
  left join (select aeis.id as save_id  
               from yujing.alarm_event ae  
              right join yujing.alarm_event_info_snapshot aeis  
                 on aeis.event_id = ae.id  
              where ae.event_uuid like 'yuanwtj_%'  
              group by (aeis.event_id)) as sd  
    on ad.all_id = sd.save_id  
 where sd.save_id is null  

 

       

       第四步:根据id删除所有节点,注意mysql中如果有大量数据时需要批量删除,我最后使用了etl工具进行的批量删除

 

       总结:在mysql中,sql语句中最好不要在in或not in关键字的查询里动态获取匹配的值,数据量大的情况下使用它们效率很低,可以使用左右连接来代替in操作,这样效率会提高很多倍,大数据量下尤为明显。