Selecting rows that are not consecutive with their neighbors: solutions

2012-05-31

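Selecting rows that are not consecutive with their neighbors
Table tb (int id). Definition of consecutive: the previous value + 1 equals the next value, judged within each consecutive group.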
1
2
3
4
1 (select)
1
2
3
...
7
8
Here the fifth row (the value 1) is not consecutive and should be selected (1-4 above it and 1-8 below it are consecutive).

1
2
3
4
5
1
2
3
...
7
8
Here all the data is consecutive: 1-5 above and 1-8 below are both consecutive, so there is nothing to select.

1
2
3
1 (select)
1 (select)
1 (select)
1
2
3
4
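The rows selected are the fourth, fifth and sixth rows (each with the value 1), which are not consecutive (1-3 before them is consecutive, and 1-4 after them is consecutive).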

[Solution]

SQL code

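-- Missing ranges and existing ranges (also known as the gaps and islands problem)

-- 1. Missing ranges (gaps)
/*
Collected by: TravyLee
Date: 2012-03-25
If you quote this, please credit: "This content comes from Inside Microsoft SQL Server 2008: T-SQL Querying"
*/
/*
There are several ways to solve the gaps problem. I chose the three that
perform better (the cursor-based method is omitted; anyone interested in
filling it in is welcome to reply).

---------------------------------
Gaps, solution 1: using subqueries
step 1: find the values that come right before a gap and add one interval to each
step 2: for each gap starting point, find the next existing value in the
        sequence and subtract one interval
My understanding: check whether each value plus one (or minus one) exists in
the source table. Corrections are welcome.
*/

-- Generate test data:
go
if object_id('tbl') is not null
    drop table tbl
go
create table tbl(id int not null)
go
insert tbl
values (2),(3),(11),(12),(13),(27),(33),(34),(35),(42)
go

-- Requirement: find the ranges of ids that do not exist in the table above.
-- Expected output:
/*
start_range end_range
4           10
14          26
28          32
36          41
*/

-- Step by step:
-- step 1: find the values that come right before a gap and add one interval to each.
-- The start of each gap is an existing value whose value + 1 is not in the table,
-- which a subquery can check:
select id+1 as start_range
from tbl as a
where not exists(select 1 from tbl as b where b.id=a.id+1)
  and id<(select max(id) from tbl)
-- This query returns:
/*
start_range
4
14
28
36
*/

-- step 2: for each gap starting point, find the next existing value in the
-- sequence and subtract one interval
select id+1 as start_range,
       (select min(b.id) from tbl as b where b.id>a.id)-1 as end_range
from tbl a
where not exists(select 1 from tbl as b where b.id=a.id+1)
  and id<(select max(id) from tbl)
-- Output:
/*
start_range end_range
4           10
14          26
28          32
36          41
*/
-- The correlated subqueries above find the gap ranges in the source table.
-- The author reports that this approach performs clearly better than the others.

-- Gaps, solution 2: using subqueries (note how it differs from solution 1)
-- step 1: match each existing value with the next existing value, producing
--         pairs of current value and next value
-- step 2: keep only the pairs where next value minus current value is greater than 1
-- step 3: for the remaining pairs, add one interval to each current value and
--         subtract one interval from each next value
-- Expressed in T-SQL:
-- step 1:
select id as cur,
       (select min(b.id) from tbl b where b.id>a.id) as nxt
from tbl a
-- This query returns:
/*
cur nxt
2   3
3   11
11  12
12  13
13  27
27  33
33  34
34  35
35  42
42  NULL
*/

-- step 2 and step 3:
select cur+1 as start_range, nxt-1 as end_range
from (select id as cur,
             (select min(b.id) from tbl b where b.id>a.id) as nxt
      from tbl a) as d
where nxt-cur>1
-- Result:
/*
start_range end_range
4           10
14          26
28          32
36          41
*/

-- Gaps, solution 3: using a ranking function
-- This is similar to solution 2, so it is done here in a single step:
;with c as
(
    select id, row_number() over(order by id) as rownum
    from tbl
)
select cur.id+1 as start_range, nxt.id-1 as end_range
from c as cur
join c as nxt on nxt.rownum=cur.rownum+1
where nxt.id-cur.id>1
-- Output:
/*
start_range end_range
4           10
14          26
28          32
36          41
*/

-- 2. Existing ranges (islands)
-- Using the same test data, produce the following output:
/*
start_range end_range
2           3
11          13
27          27
33          35
42          42
*/
-- As with gaps, the islands problem has several solutions. Only three are
-- covered here; the cursor-based one is omitted.

-- Islands, solution 1: using subqueries and ranking calculations
-- step 1: find the points that come right after a gap and number them
--         (these are the island starting points)
-- step 2: find the points that come right before a gap and number them
--         (these are the island ending points)
-- step 3: match starting points to ending points on equal row numbers
-- Code:
;with startpoints as
(
    select id, row_number() over(order by id) as rownum
    from tbl as a
    where not exists(select 1 from tbl as b where b.id=a.id-1)
    /*
    Run on its own, this query returns:
    id rownum
    2  1
    11 2
    27 3
    33 4
    42 5
    */
),
endpoints as
(
    select id, row_number() over(order by id) as rownum
    from tbl as a
    where not exists(select 1 from tbl as b where b.id=a.id+1)
    /*
    Run on its own, this query returns:
    id rownum
    3  1
    13 2
    27 3
    35 4
    42 5
    */
)
select s.id as start_range, e.id as end_range
from startpoints as s
inner join endpoints as e on e.rownum=s.rownum
-- Result:
/*
start_range end_range
2           3
11          13
27          27
33          35
42          42
*/

-- Islands, solution 2: using a group identifier based on a subquery
-- The code, given directly:
;with d as
(
    select id,
           (select min(b.id)
            from tbl b
            where b.id>=a.id
              and not exists(select * from tbl c where c.id=b.id+1)) as grp
    from tbl a
)
select min(id) as start_range, max(id) as end_range
from d
group by grp
/*
start_range end_range
2           3
11          13
27          27
33          35
42          42
*/

-- Islands, solution 3: using a group identifier based on a ranking calculation
-- step 1: compute row numbers in id order:
select id, row_number() over(order by id) as rownum
from tbl
/*
id rownum
2  1
3  2
11 3
12 4
13 5
27 6
33 7
34 8
35 9
42 10
*/

-- step 2: compute the difference between id and the row number:
select id, id-row_number() over(order by id) as diff
from tbl
/*
id diff
2  1
3  1
11 8
12 8
13 8
27 21
33 26
34 26
35 26
42 32
*/
-- Why this works: within an island, both sequences grow by the same interval,
-- so their difference stays constant. Each time a new island starts, the
-- difference increases. Step 3 shows what this is used for.

-- step 3: for each distinct diff value from the second query, take the
-- minimum and maximum id
;with t as
(
    select id, id-row_number() over(order by id) as diff
    from tbl
)
select min(id) as start_range, max(id) as end_range
from t
group by diff
/*
start_range end_range
2           3
11          13
27          27
33          35
42          42
*/
-- For the islands problem, the third method is more efficient than the first
-- two and relies on a neat trick; I recommend adopting it in practice.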
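
On SQL Server 2012 or later, the "current value / next value" pairing used in gaps solution 2 can probably be written more directly with the LEAD window function. The following is only a sketch and is not part of the original post: it assumes a server version that supports LEAD, and it copies the sample values into a table variable named @tbl (a name chosen here for illustration) so that it runs on its own.

```sql
-- Assumption: SQL Server 2012 or later (LEAD is not available in earlier versions).
-- @tbl is a stand-in for the tbl test table, holding the same sample values.
declare @tbl table (id int not null);
insert @tbl values (2),(3),(11),(12),(13),(27),(33),(34),(35),(42);

select cur + 1 as start_range,
       nxt - 1 as end_range
from (
    select id as cur,
           lead(id) over (order by id) as nxt  -- next existing id; NULL on the last row
    from @tbl
) as d
where nxt - cur > 1   -- keep only pairs separated by a gap; the NULL pair is filtered out
order by start_range;
-- With the sample values this returns (4,10), (14,26), (28,32), (36,41).
```

The logic is the same as in solution 2; LEAD only replaces the correlated `min(b.id)` subquery.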

[Solution]

SQL code

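create table zen(id int)

insert into zen(id)
select 1 union all
select 2 union all
select 3 union all
select 1 union all
select 1 union all
select 1 union all
select 1 union all
select 2 union all
select 3 union all
select 4

-- Note: order by getdate() does not define a real order. The row numbers follow
-- whatever order the engine returns the rows in, which is not guaranteed to
-- match the insertion order.
;with t as
(
    select row_number() over(order by getdate()) rn, id
    from zen
)
select a.id
from t a
left join t b on a.rn=b.rn+1   -- b is the previous row
left join t c on a.rn=c.rn-1   -- c is the next row
where a.id-isnull(b.id,0)<>1   -- the row does not continue from the previous row
  and isnull(c.id,0)-a.id<>1   -- and the next row does not continue from it

/*
id
-----------
1
1
1

(3 row(s) affected)
*/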
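
The reply above depends on the physical row order, which a table with only an id column does not record. A minimal sketch of the same neighbor check with an explicit ordering column is shown below. It is an illustration under assumptions, not the original poster's method: the seq column and the @zen table variable are hypothetical, and LAG/LEAD require SQL Server 2012 or later.

```sql
-- Assumptions: SQL Server 2012 or later (for LAG/LEAD); seq is a hypothetical
-- column that records the intended row order; @zen is a stand-in table variable.
declare @zen table (seq int primary key, id int not null);
insert @zen (seq, id)
values (1,1),(2,2),(3,3),(4,1),(5,1),(6,1),(7,1),(8,2),(9,3),(10,4);

select seq, id
from (
    select seq, id,
           lag(id)  over (order by seq) as prev_id,  -- id of the previous row, NULL on the first row
           lead(id) over (order by seq) as next_id   -- id of the next row, NULL on the last row
    from @zen
) as t
where id - isnull(prev_id, 0) <> 1    -- does not continue from the previous row
  and isnull(next_id, 0) - id <> 1    -- and is not continued by the next row
order by seq;
-- With the sample rows this returns seq 4, 5 and 6, each with id 1.
```

The boundary handling mirrors the reply: a missing neighbor is treated as 0, so a first row with id 1 counts as the start of a run and is not selected.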
