MySQL水平分区表初体验总结
本文总结个这段时间研究MySQL水平分区表总结,列举分区表的相关操作和通过实际数据对分区表读写的性能比较.
在网上看了很多文章,都太过于概念,注意集中在介绍分区表的优点,而不注重时间操作,跟大学教授论文似的,唯一由于的一片文章和大家分享一下吧http://fanqiang.chinaunix.net/db/mysql/2006-05-08/4135.shtml.
MySQL分区表操作代码(本案例按月分区):1.????????创建分区表:
????????CREATE TABLE `表名` (
? ??? `EQUIPMENTID`char(17) NOT NULL,
? ??? `ATTRIBUTEID`char(4) NOT NULL,
? ??? `VALUE`varchar(20) NOT NULL,
? ??? `COLLECTTIME`datetime NOT NULL
) ENGINE=InnoDB(适用大部分引擎,可根据需要调整) DEFAULT CHARSET=latin1
PARTITION BY RANGE(to_days(`时间字段名`))
(PARTITION pminVALUES LESS THAN (to_days('2010-01-01')),
PARTITION p201001VALUES LESS THAN (to_days('2010-02-01')) ,
PARTITION p201002VALUES LESS THAN (to_days('2010-03-01')) ,
PARTITION p201003VALUES LESS THAN (to_days('2010-04-01')) ,
PARTITION p201004VALUES LESS THAN (to_days('2010-05-01')) ,
PARTITION p201005VALUES LESS THAN (to_days('2010-06-01')) ,
PARTITION p201006VALUES LESS THAN (to_days('2010-07-01')) ,
PARTITION p201007VALUES LESS THAN (to_days('2010-08-01')) ,
PARTITION p201008VALUES LESS THAN (to_days('2010-09-01')) ,
PARTITION p201009VALUES LESS THAN (to_days('2010-10-01')) ,
PARTITION p201010VALUES LESS THAN (to_days('2010-11-01')),
PARTITION p201011VALUES LESS THAN (to_days('2010-12-01')),
PARTITION p201012VALUES LESS THAN (to_days('2011-01-01')),
PARTITION pmax VALUESLESS THAN MAXVALUE );
?
2.????????为现有表创建分区:
????????alter table表名
PARTITION BY RANGE(to_days(`时间字段名`))
(PARTITION pminVALUES LESS THAN (to_days('2010-01-01')),
PARTITION p201001VALUES LESS THAN (to_days('2010-02-01')) ,
PARTITION p201002VALUES LESS THAN (to_days('2010-03-01')) ,
PARTITION p201003VALUES LESS THAN (to_days('2010-04-01')) ,
PARTITION p201004 VALUESLESS THAN (to_days('2010-05-01')) ,
PARTITION p201005VALUES LESS THAN (to_days('2010-06-01')) ,
PARTITION p201006VALUES LESS THAN (to_days('2010-07-01')) ,
PARTITION p201007VALUES LESS THAN (to_days('2010-08-01')) ,
PARTITION p201008VALUES LESS THAN (to_days('2010-09-01')) ,
PARTITION p201009VALUES LESS THAN (to_days('2010-10-01')) ,
PARTITION p201010VALUES LESS THAN (to_days('2010-11-01')),
PARTITION p201011VALUES LESS THAN (to_days('2010-12-01')),
PARTITION p201012VALUES LESS THAN (to_days('2011-01-01')),
PARTITION pmax VALUESLESS THAN MAXVALUE );
?
3.????????删除表中的指定分区(删除分区会导致分区数据丢失,建议先备份):
ALTERTABLE表名DROP PARTITION p0;
?
4.????????追加表分区
需要先删除MAXVALUE分区后增加分区后再重建MAXVALUE分区,删除前需要先备份MAXVALUE分区数据.
ALTER TABLE表名 DROPPARTITION pmax;
ALTER TABLE表名
ADD PARTITION (
PARTITION p201201VALUES LESS THAN (to_days('2012-2-1')),
PARTITION pmax VALUESLESS THAN MAXVALUE);
?
5.?????查看标分区信息
SELECT
? ?????????????? partition_namepart,?
? ?????????????? partition_expressionexpr,?
? ?????????????? partition_descriptiondescr,?
? ?????????????? table_rows?
FROM
? ?????????????? INFORMATION_SCHEMA.partitions?
WHERE
? ? ? ? ? ? ? ? TABLE_SCHEMA= schema()
? ? ? ? AND
TABLE_NAME='表名';?
?
6.?????查看查询语句涉及分区信息
explainpartitions
select …from表名 where …;
性能对比:1.????????测试环境
CPU: Intel 奔腾双核 E5300
硬盘: 西数(320GB/7200/16M蓝盘)
内存: 南亚易胜 DDR2 800MHz 1GB +三星 DDR2 800MHz 1GB
操作系统:Windows XP
MySQL版本: 5.1.57(5.1+版本支持分区表)
?
2.????????表信息
表结构:
名
类型
长度
?
EQUIPMENTID
char
17
主键1
ATTRIBUTEID
char
4
主键2
VALUE
varchar
20
?
COLLECTTIME
datetime
?
主键3
?
总记录数:580W
分区信息(红色为主要测试区域):
part
expr
descr
table_rows
pmin
to_days(COLLECTTIME)
734138
2686
p201001
to_days(COLLECTTIME)
734169
2511883
p201002
to_days(COLLECTTIME)
734197
192497
p201003
to_days(COLLECTTIME)
734228
811103
p201004
to_days(COLLECTTIME)
734258
82894
p201005
to_days(COLLECTTIME)
734289
109297
p201006
to_days(COLLECTTIME)
734319
555065
p201007
to_days(COLLECTTIME)
734350
742949
p201008
to_days(COLLECTTIME)
734381
525900
p201009
to_days(COLLECTTIME)
734411
89
p201010
to_days(COLLECTTIME)
734442
71665
p201011
to_days(COLLECTTIME)
734472
85964
p201012
to_days(COLLECTTIME)
734503
1612
p201101
to_days(COLLECTTIME)
734534
176
p201102
to_days(COLLECTTIME)
734562
253
p201103
to_days(COLLECTTIME)
734593
44824
p201104
to_days(COLLECTTIME)
734623
62324
p201105
to_days(COLLECTTIME)
734654
50658
p201106
to_days(COLLECTTIME)
734684
0
p201107
to_days(COLLECTTIME)
734715
0
p201108
to_days(COLLECTTIME)
734746
0
p201109
to_days(COLLECTTIME)
734776
0
p201110
to_days(COLLECTTIME)
734807
0
p201111
to_days(COLLECTTIME)
734837
0
p201112
to_days(COLLECTTIME)
734868
0
p201201
to_days(COLLECTTIME)
734899
0
p201202
to_days(COLLECTTIME)
734928
0
pmax
to_days(COLLECTTIME)
MAXVALUE
921
?
3.????????查询效率对比
对比表:无分区表名nopart_data,有分区表名part_data
查询条件:select count(*) from 表名 whereCOLLECTTIME > 起始时间 and COLLECTTIME <终止时间
查询耗时按照3次平均值统计
统计表:
开始时间
结束时间
查询结果
无分区耗时
有分区耗时
涉及分区
全部
5848859
6.26s
9.58s
全部
2010-5-1
2010-6-1
109086
7.04s
0.48s
pmin,p201005
2010-6-1
2010-7-1
554695
8.34s
0.38s
pmin,p201006
2010-7-1
2010-8-1
742565
7.57s
0.43s
pmin,p201007
2010-5-1
2010-7-1
663781
7.07s
0.51s
pmin,p201005,p201006
2010-6-1
2010-8-1
1297260
6.84s
1.93s
pmin,p201006,p201007
2010-5-1
2010-8-1
1406346
6.97s
2.30s
pmin,p201006,p201007,p201008
小结:
1)????分区表查询在查询上有明显优势.但在跨区查询时会有查询时间消耗,因此需要注意分区的疏密程度.
2)????每次查询都会查询pmin(第一个分区),因此需要尽量减少这个分区的数据.
?
4.????????写入数据效率对比
COLLECTTIME
无分区耗时
有分区耗时
2010-5-22 15:36
0.05s
0.03s
2010-6-22 15:36
0.02s
0.05s
2010-7-22 15:36
0.03s
0.03s
小结:
1) 分区对单条数据的插入操作无较大影响.
?
以上是我对MySQL的初体验总结,没啥心得体会,只有一点点成就感,希望和大家分享.
另外分区表尚存在问题:
1,是否可将分区表设置在不同硬盘,innodb可行?
2,是否可根据多条件进行水平分区,类似group by 列1,列2...
3,是否能将分区设置成不同引擎,例如当前使用中的分区为innodb,老的分区使用MyISAM