处理重复数据的问题
表结构如下:Timer
[Id] [smallint] NOT NULL ,
[Card] [int] Not NULL ,
[EmployeeId] [int] NULL ,
[BrushTime] [datetime] NOT NULL ,
现在此表有近一亿多条数据,其中有大量重复数据,现在要清理掉这些重复数据,大家有什么好的解决办法!现在主要是因为数据量太大,处理不过来!
[解决办法]
表结构如下:Timer
[Id] [smallint] NOT NULL ,
[Card] [int] Not NULL ,
[EmployeeId] [int] NULL ,
[BrushTime] [datetime] NOT NULL ,
现在此表有近一亿多条数据,其中有大量重复数据,现在要清理掉这些重复数据,大家有什么好的解决办法!现在主要是因为数据量太大,处理不过来!
以ID为例,BrushTime分大小
delete from timer where BrushTime not in (select min(BrushTime) from timer group by id)
[解决办法]
如果所有列都有重复,指不能区分两行(按任何字段排序).
则需要建立临时表来删除.(不过一亿,恐怖啊)
select px = identity(int,1,1),* into temp from timer
delete from temp where px not in (select min(px) from temp group by id)
[解决办法]
---保留相同Id,Cardid,EmployeeId,BrushTime最大那行
Delete A From Timer As A Where Exists
(Select 1 From Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId And BrushTime> A.BrushTime)
---保留相同Id,Cardid,EmployeeId,BrushTime最小那行
Delete A From Timer As A Where Exists
(Select 1 From Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId And BrushTime <A.BrushTime)
---只要Id,Cardid,EmployeeId有重复的,就将其全部删除
Delete A From @Timer As A Where Exists
(Select 1 From @Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId Group By Id,Cardid,EmployeeId Having Count(1)> 1)
[解决办法]
哇,看來新增時要多注意重復的問題才行。
[解决办法]
数据重复是指数据完全一样,现在的问题是怎么处理大量的重复数据!
1亿数据,天啊。这样:
select distinct * into #temp from Timer -- 但愿你的内存和磁盘空间足够
truncate table Timer
insert Timer select * from #temp
drop table #temp