oracle--随机查询
转自:http://1985wanggang.blog.163.com/blog/static/77638332010424102256740/
?
从Oracle8i开始Oracle提供采样表扫描特性。
Oracle访问数据的基本方法有:
1.全表扫描
2.采样表扫描
全表扫描(Full table Scan)
全表扫描返回表中所有的记录。
执行全表扫描,Oracle读表中的所有记录,考查每一行是否满足WHERE条件。Oracle顺序的读分配给该表的每一个数据块,这样全表扫描能够受益于多块读.
每个数据块Oracle只读一次.
采样表扫描(sample table scan)
采样表扫描返回表中随机采样数据。
这种访问方式需要在FROM语句中包含SAMPLE选项或者SAMPLE BLOCK选项.
SAMPLE选项:
当按行采样来执行一个采样表扫描时,Oracle从表中读取特定百分比的记录,并判断是否满足WHERE子句以返回结果。
SAMPLE BLOCK选项:
使用此选项时,Oracle读取特定百分比的BLOCK,考查结果集是否满足WHERE条件以返回满足条件的纪录.
Sample_Percent:
Sample_Percent是一个数字,定义结果集中包含记录占总记录数量的百分比。
Sample值应该在[0.000001,99.999999]之间。
1.使用SAMPLE选项
SQL> select * from employee SAMPLE(30);
?
???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
????? 7369 SMITH????? CLERK?????????? 7902 17-DEC-80??????? 800??????????????????? 20
????? 7788 SCOTT????? ANALYST???????? 7566 19-APR-87?????? 3000??????????????????? 20
????? 7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????? ????????10
?
Execution Plan
----------------------
?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=25 Bytes=2175)
?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=25 Bytes=2175)
?
Statistics
----------------------
????????? 0? recursive calls
????????? 0? db block gets
????????? 5? consistent gets
????????? 0? physical reads
????????? 0? redo size
??????? 880? bytes sent via SQL*Net to client
??????? 503? bytes received via SQL*Net from client
????????? 2? SQL*Net roundtrips to/from client
????????? 0? sorts (memory)
????????? 0? sorts (disk)
????????? 3? rows processed
?
SQL> select * from employee SAMPLE(20);
?
???? EMPNO ENAME????? JOB????????????? MGR HIREDATE?????? ??SAL?????? COMM???? DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
????? 7654 MARTIN???? SALESMAN??????? 7698 28-SEP-81?????? 1250?????? 1400???????? 30
????? 7844 TURNER???? SALESMAN??????? 7698 08-SEP-81??? ???1500????????? 0???????? 30
?
Execution Plan
----------------------
?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=16 Bytes=1392)
?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=16 Bytes=1392)
?
Statistics
----------------------
????????? 0? recursive calls
????????? 0? db block gets
????????? 5? consistent gets
????????? 0? physical reads
????????? 0? redo size
??????? 839? bytes sent via SQL*Net to client
??? ????503? bytes received via SQL*Net from client
????????? 2? SQL*Net roundtrips to/from client
????????? 0? sorts (memory)
????????? 0? sorts (disk)
????????? 2? rows processed
2.使用SAMPLE BLOCK选项
SQL> SELECT * FROM employee SAMPLE BLOCK (50);
?
???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
????? 7369 SMITH????? CLERK?????????? 7902 17-DEC-80??????? 800??????????????????? 20
?? ???7499 ALLEN????? SALESMAN??????? 7698 20-FEB-81?????? 1600??????? 300???????? 30
????? 7521 WARD?????? SALESMAN??????? 7698 22-FEB-81?????? 1250??????? 500???????? 30
????? 7566 JONES????? MANAGER???????? 7839 02-APR-81?????? 2975??????????????????? 20
????? 7654 MARTIN???? SALESMAN??????? 7698 28-SEP-81?????? 1250?????? 1400???????? 30
????? 7698 BLAKE????? MANAGER???????? 7839 01-MAY-81?????? 2850??????????????????? 30
????? 7782 CLARK????? MANAGER???????? 7839 09-JUN-81?????? 2450??????????????????? 10
????? 7788 SCOTT????? ANALYST???????? 7566 19-APR-87?????? 3000??????????????????? 20
????? 7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????????????? 10
????? 7844 TURNER???? SALESMAN??????? 7698 08-SEP-81?????? 1500????????? 0???????? 30
?
10 rows selected.
?
Execution Plan
----------------------
?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=41 Bytes=3567)
?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=41 Bytes=3567)
?
Statistics
----------------------
????????? 0? recursive calls
????????? 0? db block gets
????????? 4? consistent gets
????????? 0? physical reads
????????? 0? redo size
?????? 1162? bytes sent via SQL*Net to client
??????? 503? bytes received via SQL*Net from client
????????? 2? SQL*Net roundtrips to/from client
????????? 0? sorts (memory)
????????? 0? sorts (disk)
???????? 10? rows processed
?
SQL>
3.采样前n条记录的查询
也可以使用dbms_random包实现
SQL> select * from (
? 2? select * from employee
? 3? order by dbms_random.value )
? 4? where rownum <= 4;???
?
???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
????? 7654 MARTIN???? SALESMAN??????? 7698 28-SEP-81?????? 1250?????? 1400???????? 30
????? 7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????????????? 10
????? 7369 SMITH????? CLERK?????????? 7902 17-DEC-80??????? 800?????????????????? ?20
????? 7788 SCOTT????? ANALYST???????? 7566 19-APR-87?????? 3000??????????????????? 20
?
Execution Plan
----------------------
?? 0????? SELECT STATEMENT Optimizer=CHOOSE
?? 1??? 0?? COUNT (STOPKEY)
?? 2??? 1???? VIEW
?? 3??? 2?????? SORT (ORDER BY STOPKEY)
?? 4??? 3???????? TABLE ACCESS (FULL) OF 'EMPLOYEE'
?
Statistics
----------------------
????????? 0? recursive calls
????????? 0? db block gets
????????? 3? consistent gets
??????? ??0? physical reads
????????? 0? redo size
??????? 927? bytes sent via SQL*Net to client
??????? 503? bytes received via SQL*Net from client
????????? 2? SQL*Net roundtrips to/from client
????????? 1? sorts (memory)
????????? 0? sorts (disk)
????????? 4? rows processed
对比一下SAMPLE选项
SQL> SELECT * FROM employee SAMPLE (40);
?
???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO
---------- ---------- --------- ---------- --------- ---------- ---------- ----------
????? 7499 ALLEN????? SALESMAN??????? 7698 20-FEB-81?????? 1600??????? 300???????? 30
????? 7521 WARD?????? SALESMAN??????? 7698 22-FEB-81?????? 1250??????? 500???????? 30
????? 7698 BLAKE????? MANAGER???????? 7839 01-MAY-81?????? 2850??????????????????? 30
??? ??7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????????????? 10
????? 7844 TURNER???? SALESMAN??????? 7698 08-SEP-81?????? 1500????????? 0???????? 30
?
Execution Plan
----------------------
?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=33 Bytes=2871)
?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=33 Bytes=2871)
?
Statistics
----------------------
????????? 0? recursive calls
????????? 0? db block gets
????????? 5? consistent gets
????????? 0? physical reads
????????? 0? redo size
??????? 961? bytes sent via SQL*Net to client
??????? 503? bytes received via SQL*Net from client
????????? 2? SQL*Net roundtrips to/from client
????????? 0? sorts (memory)
????????? 0? sorts (disk)
????????? 5? rows processed
?
SQL>
主要注意以下几点:
1.sample只对单表生效,不能用于表连接和远程表
2.sample会使SQL自动使用CBO
===============================
【1】方法一:通过dbms_random.random
select * from (select * from largetable order by dbms_random.random) where rownum <= 20000;
【2】方法二:通过dbms_random.value
select * from (select * from largetable order by dbms_random.value) where rownum <= 20000;
【3】方法三:通过采样表扫描
select * from (select * from largetable sample(10)) where rownum <= 20000;
下面我们通过实践来比较这3种方法的效率,首先我们创建一个包含有10W条记录的表用于实验:
create table LARGETABLE
(
?? ID???????NUMBER not null primary key,
?? BIRTHDAY DATE not null
)
接下来我们插入10W条数据
create or replace procedure random_insert as
?? i?????????number;
?? startDate date := sysdate;
begin
??for i in 1 .. 100000 loop
????insert into largetable values (i, startDate + 1);
??end loop commit;
end;
?
?
在SQL*PLUS下设置显示SQL语句执行时间:set timing on,让后分别运行上述三条语句:
第一个的执行时间为 00: 00: 16: 04
第二个的执行时间为 00: 00: 54: 04
第三个的执行时间为 00: 00: 08: 07
从这里我们可以看出在进行数据随机抽取时,采用sample的方法效率是最高的。为了保证每次随机查询的数据尽量不重复,我们可以把sample中的百分比提高一些(例如从10%提高到20%)。
?