首页 诗词 字典 板报 句子 名言 友答 励志 学校 网站地图
当前位置: 首页 > 教程频道 > 数据库 > 其他数据库 >

oracle-随机查询

2012-08-15 
oracle--随机查询转自:http://1985wanggang.blog.163.com/blog/static/77638332010424102256740/?从Oracle

oracle--随机查询

转自:http://1985wanggang.blog.163.com/blog/static/77638332010424102256740/

?

从Oracle8i开始Oracle提供采样表扫描特性。

Oracle访问数据的基本方法有:
1.全表扫描
2.采样表扫描

全表扫描(Full table Scan)
全表扫描返回表中所有的记录。
执行全表扫描,Oracle读表中的所有记录,考查每一行是否满足WHERE条件。Oracle顺序的读分配给该表的每一个数据块,这样全表扫描能够受益于多块读.
每个数据块Oracle只读一次.

采样表扫描(sample table scan)
采样表扫描返回表中随机采样数据。
这种访问方式需要在FROM语句中包含SAMPLE选项或者SAMPLE BLOCK选项.

SAMPLE选项:
当按行采样来执行一个采样表扫描时,Oracle从表中读取特定百分比的记录,并判断是否满足WHERE子句以返回结果。

SAMPLE BLOCK选项:
使用此选项时,Oracle读取特定百分比的BLOCK,考查结果集是否满足WHERE条件以返回满足条件的纪录.

Sample_Percent:
Sample_Percent是一个数字,定义结果集中包含记录占总记录数量的百分比。
Sample值应该在[0.000001,99.999999]之间。

1.使用SAMPLE选项

SQL> select * from employee SAMPLE(30);

?

???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO

---------- ---------- --------- ---------- --------- ---------- ---------- ----------

????? 7369 SMITH????? CLERK?????????? 7902 17-DEC-80??????? 800??????????????????? 20

????? 7788 SCOTT????? ANALYST???????? 7566 19-APR-87?????? 3000??????????????????? 20

????? 7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????? ????????10

?

Execution Plan

----------------------

?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=25 Bytes=2175)

?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=25 Bytes=2175)

?

Statistics

----------------------

????????? 0? recursive calls

????????? 0? db block gets

????????? 5? consistent gets

????????? 0? physical reads

????????? 0? redo size

??????? 880? bytes sent via SQL*Net to client

??????? 503? bytes received via SQL*Net from client

????????? 2? SQL*Net roundtrips to/from client

????????? 0? sorts (memory)

????????? 0? sorts (disk)

????????? 3? rows processed

?

SQL> select * from employee SAMPLE(20);

?

???? EMPNO ENAME????? JOB????????????? MGR HIREDATE?????? ??SAL?????? COMM???? DEPTNO

---------- ---------- --------- ---------- --------- ---------- ---------- ----------

????? 7654 MARTIN???? SALESMAN??????? 7698 28-SEP-81?????? 1250?????? 1400???????? 30

????? 7844 TURNER???? SALESMAN??????? 7698 08-SEP-81??? ???1500????????? 0???????? 30

?

Execution Plan

----------------------

?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=16 Bytes=1392)

?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=16 Bytes=1392)

?

Statistics

----------------------

????????? 0? recursive calls

????????? 0? db block gets

????????? 5? consistent gets

????????? 0? physical reads

????????? 0? redo size

??????? 839? bytes sent via SQL*Net to client

??? ????503? bytes received via SQL*Net from client

????????? 2? SQL*Net roundtrips to/from client

????????? 0? sorts (memory)

????????? 0? sorts (disk)

????????? 2? rows processed

2.使用SAMPLE BLOCK选项

SQL> SELECT * FROM employee SAMPLE BLOCK (50);

?

???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO

---------- ---------- --------- ---------- --------- ---------- ---------- ----------

????? 7369 SMITH????? CLERK?????????? 7902 17-DEC-80??????? 800??????????????????? 20

?? ???7499 ALLEN????? SALESMAN??????? 7698 20-FEB-81?????? 1600??????? 300???????? 30

????? 7521 WARD?????? SALESMAN??????? 7698 22-FEB-81?????? 1250??????? 500???????? 30

????? 7566 JONES????? MANAGER???????? 7839 02-APR-81?????? 2975??????????????????? 20

????? 7654 MARTIN???? SALESMAN??????? 7698 28-SEP-81?????? 1250?????? 1400???????? 30

????? 7698 BLAKE????? MANAGER???????? 7839 01-MAY-81?????? 2850??????????????????? 30

????? 7782 CLARK????? MANAGER???????? 7839 09-JUN-81?????? 2450??????????????????? 10

????? 7788 SCOTT????? ANALYST???????? 7566 19-APR-87?????? 3000??????????????????? 20

????? 7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????????????? 10

????? 7844 TURNER???? SALESMAN??????? 7698 08-SEP-81?????? 1500????????? 0???????? 30

?

10 rows selected.

?

Execution Plan

----------------------

?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=41 Bytes=3567)

?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=41 Bytes=3567)

?

Statistics

----------------------

????????? 0? recursive calls

????????? 0? db block gets

????????? 4? consistent gets

????????? 0? physical reads

????????? 0? redo size

?????? 1162? bytes sent via SQL*Net to client

??????? 503? bytes received via SQL*Net from client

????????? 2? SQL*Net roundtrips to/from client

????????? 0? sorts (memory)

????????? 0? sorts (disk)

???????? 10? rows processed

?

SQL>

3.采样前n条记录的查询

也可以使用dbms_random包实现

SQL> select * from (

? 2? select * from employee

? 3? order by dbms_random.value )

? 4? where rownum <= 4;???

?

???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO

---------- ---------- --------- ---------- --------- ---------- ---------- ----------

????? 7654 MARTIN???? SALESMAN??????? 7698 28-SEP-81?????? 1250?????? 1400???????? 30

????? 7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????????????? 10

????? 7369 SMITH????? CLERK?????????? 7902 17-DEC-80??????? 800?????????????????? ?20

????? 7788 SCOTT????? ANALYST???????? 7566 19-APR-87?????? 3000??????????????????? 20

?

Execution Plan

----------------------

?? 0????? SELECT STATEMENT Optimizer=CHOOSE

?? 1??? 0?? COUNT (STOPKEY)

?? 2??? 1???? VIEW

?? 3??? 2?????? SORT (ORDER BY STOPKEY)

?? 4??? 3???????? TABLE ACCESS (FULL) OF 'EMPLOYEE'

?

Statistics

----------------------

????????? 0? recursive calls

????????? 0? db block gets

????????? 3? consistent gets

??????? ??0? physical reads

????????? 0? redo size

??????? 927? bytes sent via SQL*Net to client

??????? 503? bytes received via SQL*Net from client

????????? 2? SQL*Net roundtrips to/from client

????????? 1? sorts (memory)

????????? 0? sorts (disk)

????????? 4? rows processed

对比一下SAMPLE选项

SQL> SELECT * FROM employee SAMPLE (40);

?

???? EMPNO ENAME????? JOB????????????? MGR HIREDATE???????? SAL?????? COMM???? DEPTNO

---------- ---------- --------- ---------- --------- ---------- ---------- ----------

????? 7499 ALLEN????? SALESMAN??????? 7698 20-FEB-81?????? 1600??????? 300???????? 30

????? 7521 WARD?????? SALESMAN??????? 7698 22-FEB-81?????? 1250??????? 500???????? 30

????? 7698 BLAKE????? MANAGER???????? 7839 01-MAY-81?????? 2850??????????????????? 30

??? ??7839 KING?????? PRESIDENT??????????? 17-NOV-81?????? 5000??????????????????? 10

????? 7844 TURNER???? SALESMAN??????? 7698 08-SEP-81?????? 1500????????? 0???????? 30

?

Execution Plan

----------------------

?? 0????? SELECT STATEMENT Optimizer=CHOOSE (Cost=2 Card=33 Bytes=2871)

?? 1??? 0?? TABLE ACCESS (SAMPLE) OF 'EMPLOYEE' (Cost=2 Card=33 Bytes=2871)

?

Statistics

----------------------

????????? 0? recursive calls

????????? 0? db block gets

????????? 5? consistent gets

????????? 0? physical reads

????????? 0? redo size

??????? 961? bytes sent via SQL*Net to client

??????? 503? bytes received via SQL*Net from client

????????? 2? SQL*Net roundtrips to/from client

????????? 0? sorts (memory)

????????? 0? sorts (disk)

????????? 5? rows processed

?

SQL>

主要注意以下几点:

1.sample只对单表生效,不能用于表连接和远程表
2.sample会使SQL自动使用CBO


===============================
【1】方法一:通过dbms_random.random

select * from (select * from largetable order by dbms_random.random) where rownum <= 20000;


【2】方法二:通过dbms_random.value

select * from (select * from largetable order by dbms_random.value) where rownum <= 20000;


【3】方法三:通过采样表扫描

select * from (select * from largetable sample(10)) where rownum <= 20000;


下面我们通过实践来比较这3种方法的效率,首先我们创建一个包含有10W条记录的表用于实验:

create table LARGETABLE
(
?? ID???????NUMBER not null primary key,
?? BIRTHDAY DATE not null
)


接下来我们插入10W条数据

create or replace procedure random_insert as
?? i?????????number;
?? startDate date := sysdate;

begin

??for i in 1 .. 100000 loop
????insert into largetable values (i, startDate + 1);
??end loop commit;

end;

?

?

在SQL*PLUS下设置显示SQL语句执行时间:set timing on,让后分别运行上述三条语句:

第一个的执行时间为 00: 00: 16: 04
第二个的执行时间为 00: 00: 54: 04
第三个的执行时间为 00: 00: 08: 07

从这里我们可以看出在进行数据随机抽取时,采用sample的方法效率是最高的。为了保证每次随机查询的数据尽量不重复,我们可以把sample中的百分比提高一些(例如从10%提高到20%)。

?

热点排行