首页 诗词 字典 板报 句子 名言 友答 励志 学校 网站地图
当前位置: 首页 > 教程频道 > 数据库 > SQL Server >

处理重复数据的有关问题

2012-01-21 
处理重复数据的问题表结构如下:Timer[Id][smallint]NOTNULL,[Card][int]NotNULL,[EmployeeId][int]NULL,[B

处理重复数据的问题
表结构如下:Timer
[Id]   [smallint]   NOT   NULL   ,
[Card]   [int]   Not   NULL   ,
[EmployeeId]   [int]   NULL   ,
[BrushTime]   [datetime]   NOT   NULL   ,
现在此表有近一亿多条数据,其中有大量重复数据,现在要清理掉这些重复数据,大家有什么好的解决办法!现在主要是因为数据量太大,处理不过来!

[解决办法]
表结构如下:Timer
[Id] [smallint] NOT NULL ,
[Card] [int] Not NULL ,
[EmployeeId] [int] NULL ,
[BrushTime] [datetime] NOT NULL ,
现在此表有近一亿多条数据,其中有大量重复数据,现在要清理掉这些重复数据,大家有什么好的解决办法!现在主要是因为数据量太大,处理不过来!

以ID为例,BrushTime分大小
delete from timer where BrushTime not in (select min(BrushTime) from timer group by id)
[解决办法]
如果所有列都有重复,指不能区分两行(按任何字段排序).

则需要建立临时表来删除.(不过一亿,恐怖啊)

select px = identity(int,1,1),* into temp from timer
delete from temp where px not in (select min(px) from temp group by id)
[解决办法]
---保留相同Id,Cardid,EmployeeId,BrushTime最大那行
Delete A From Timer As A Where Exists
(Select 1 From Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId And BrushTime> A.BrushTime)

---保留相同Id,Cardid,EmployeeId,BrushTime最小那行
Delete A From Timer As A Where Exists
(Select 1 From Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId And BrushTime <A.BrushTime)

---只要Id,Cardid,EmployeeId有重复的,就将其全部删除
Delete A From @Timer As A Where Exists
(Select 1 From @Timer Where Id=A.Id And Cardid=A.Cardid And
EmployeeId=A.EmployeeId Group By Id,Cardid,EmployeeId Having Count(1)> 1)
[解决办法]
哇,看來新增時要多注意重復的問題才行。
[解决办法]
数据重复是指数据完全一样,现在的问题是怎么处理大量的重复数据!

1亿数据,天啊。这样:

select distinct * into #temp from Timer -- 但愿你的内存和磁盘空间足够

truncate table Timer

insert Timer select * from #temp

drop table #temp

热点排行