The Two Basic Modes of MySQL PreparedStatement & a Source-Code Analysis of Client-Side Memory Usage

2013-10-08


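For every JDBC driver, precompilation (PreparedStatement) serves one common purpose: preventing SQL injection. Databases such as Oracle also have the notion of a "soft parse", which makes precompilation a very good fit for OLTP-style systems.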

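Common JDBC frameworks such as ibatis and jdbcTemplate use precompilation by default when they talk to JDBC (jdbcTemplate falls back to createStatement when no parameters are passed), and that looks harmless enough. In one application, however, we found that a large number of precompiled objects was causing frequent GC, so I traced through the source code and wrote this article. It walks through the parameters involved and shows how the source code uses them.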

Let's see which parameters there are:

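MySQL JDBC obtains a connection through the connect method of its Driver. Connection parameters can be set in the JDBC URL or in a Properties object, and the driver creates a Connection based on them. Put simply, the parameters are parsed into a K-V structure and handed to the Connection object, which maps them onto the many properties it recognizes. These properties are of type ConnectionProperty, with subclasses for the different value types, for example BooleanConnectionProperty and IntegerConnectionProperty.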
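To make the two ways of passing parameters concrete, here is a minimal sketch. The property names useServerPrepStmts, cachePrepStmts, prepStmtCacheSize and prepStmtCacheSqlLimit are the Connector/J 5.x names as I understand them; the article itself only shows the getters, so treat the names as an assumption and check them against your driver version. The class PrepStmtProps, its helper methods and the localhost URL are hypothetical and exist only for illustration.

```java
import java.util.Properties;
import java.util.TreeSet;

class PrepStmtProps {
    /** Builds the Properties one would pass to DriverManager.getConnection(url, props). */
    public static Properties cacheProps(boolean serverSide, int cacheSize, int sqlLimit) {
        Properties p = new Properties();
        // Assumed Connector/J 5.x property names (not spelled out in the article).
        p.setProperty("useServerPrepStmts", String.valueOf(serverSide));
        p.setProperty("cachePrepStmts", "true");
        p.setProperty("prepStmtCacheSize", String.valueOf(cacheSize));
        p.setProperty("prepStmtCacheSqlLimit", String.valueOf(sqlLimit));
        return p;
    }

    /** Renders the same settings as a query string appended to the JDBC URL. */
    public static String toUrl(String baseUrl, Properties p) {
        StringBuilder sb = new StringBuilder(baseUrl);
        char sep = '?';
        // Sort the names so the resulting URL is deterministic.
        for (String name : new TreeSet<>(p.stringPropertyNames())) {
            sb.append(sep).append(name).append('=').append(p.getProperty(name));
            sep = '&';
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Properties p = cacheProps(false, 25, 256);
        System.out.println(toUrl("jdbc:mysql://localhost:3306/test", p));
    }
}
```

Either form ends up as the same K-V pairs inside the driver, so which one to use is mostly a question of where your data source configuration lives.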

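These parameters are stored in the Connection object (in early versions of the source the class is com.mysql.jdbc.Connection; in newer versions it is com.mysql.jdbc.ConnectionImpl, with interface and implementation separated; both are simply called the Connection object here). More precisely, they are stored in the parent class of this Connection. The parameters relevant to this topic are excerpted below: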

The getter for "detectServerPreparedStmts" is "getUseServerPreparedStmts()", as shown in the figure below:

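Fine, enough about that clumsy way of doing things; back to the main topic.
Let's look at how a PreparedStatement is initialized and compiled:
Precompilation naturally goes through the Connection, and the method called by default looks like this: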

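The logic here splits into two major parts: one is com.mysql.jdbc.ServerPreparedStatement and the other is the default. Put the other way round, if it is a server-side Statement, the handling class can be recognized at a glance from its name.
So when does the server-side PreparedStatement get used, and how does it really differ from the ordinary one? Take the first question first; the following lines of code are the key to entering that branch: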


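This logic is basically the same as the logic inside ServerPreparedStatement. The only difference is that this one implements LRU explicitly, and the LRU is the simplest least-recently-used scheme: drop the last entry and write the current one in. It likewise uses getCachePreparedStatements() and getPreparedStatementCacheSqlLimit() to decide whether to cache, and it likewise uses a K-V structure as the cache. That K-V structure is initialized by createPreparedStatementCaches(), which the Connection's initialization method initializeDriverProperties(Properties) calls indirectly. You can see that it is initialized as a HashMap; earlier versions create several objects of similar size.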
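The caching rule described above can be sketched roughly as follows. This is not the driver's code: it assumes an access-ordered LinkedHashMap as the K-V structure, and StatementLruCache, maxEntries and sqlLengthLimit are hypothetical names that stand in for the cache-size and SQL-length limits.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** A per-connection LRU cache keyed by SQL text. Deliberately not thread-safe. */
class StatementLruCache<V> {
    private final int sqlLengthLimit;
    private final Map<String, V> entries;

    public StatementLruCache(final int maxEntries, int sqlLengthLimit) {
        this.sqlLengthLimit = sqlLengthLimit;
        // accessOrder = true keeps the least recently used entry at the head.
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // Evict the eldest entry once the size limit is exceeded.
                return size() > maxEntries;
            }
        };
    }

    /** Caches the parsed form unless the SQL text is too long; returns whether it was cached. */
    public boolean offer(String sql, V parsed) {
        if (sql.length() > sqlLengthLimit) {
            return false;
        }
        entries.put(sql, parsed);
        return true;
    }

    /** Looks up a cached entry and marks it as recently used. */
    public V get(String sql) {
        return entries.get(sql);
    }

    public boolean contains(String sql) {
        return entries.containsKey(sql);
    }

    public int size() {
        return entries.size();
    }
}
```

With a limit of 2 entries, caching A and B, reading A and then caching C evicts B, because B is the entry that was touched least recently.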

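Now to the problem. A single HashMap is not enough to cause much trouble, because the LRU queue bounds its length. But if you read the code you will notice that it does nothing about concurrent access, and HashMap is not thread-safe, so why doesn't MySQL JDBC run into trouble? Because this HashMap is bound entirely to the Connection object as one of its fields, and no connection pool hands the same Connection to two requests at the same time. The driver therefore leaves the concurrency problem to the connection pool and assumes its own accesses are safe. Conversely, if you use one Connection from several threads in parallel yourself, you may run into problems.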

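Back to the question. Each Connection may cache anywhere from a few dozen to a few hundred Statement objects. With a typical production data source, 5~10 connections is already on the large side, so that is at most around a thousand objects. JVMs are configured with gigabytes of heap, so what harm could a few thousand objects do?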

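So let's look at what it caches, which is mainly the ordinary PreparedStatement. The code shows that compilation returns an object of type ParseInfo, which is then written into the HashMap as the value. ParseInfo is an inner class of PreparedStatement [its definition is not preserved here]. What follows is the setString method with which the ordinary PreparedStatement binds a string parameter: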
public void setString(int parameterIndex, String x) throws SQLException {
    // if the passed string is null, then set this column to null
    if (x == null) {
        setNull(parameterIndex, Types.CHAR);
    } else {
        checkClosed();
        int stringLength = x.length();
        if (this.connection.isNoBackslashEscapesSet()) {
            // Scan for any nasty chars
            boolean needsHexEscape = isEscapeNeededForString(x,
                    stringLength);
            if (!needsHexEscape) {
                byte[] parameterAsBytes = null;
                StringBuffer quotedString = new StringBuffer(x.length() + 2);
                quotedString.append('\'');
                quotedString.append(x);
                quotedString.append('\'');
                if (!this.isLoadDataQuery) {
                    parameterAsBytes = StringUtils.getBytes(quotedString.toString(),
                            this.charConverter, this.charEncoding,
                            this.connection.getServerCharacterEncoding(),
                            this.connection.parserKnowsUnicode());
                } else {
                    // Send with platform character encoding
                    parameterAsBytes = quotedString.toString().getBytes();
                }
                setInternal(parameterIndex, parameterAsBytes);
            } else {
                byte[] parameterAsBytes = null;
                if (!this.isLoadDataQuery) {
                    parameterAsBytes = StringUtils.getBytes(x,
                            this.charConverter, this.charEncoding,
                            this.connection.getServerCharacterEncoding(),
                            this.connection.parserKnowsUnicode());
                } else {
                    // Send with platform character encoding
                    parameterAsBytes = x.getBytes();
                }
                setBytes(parameterIndex, parameterAsBytes);
            }
            return;
        }
        String parameterAsString = x;
        boolean needsQuoted = true;
        if (this.isLoadDataQuery || isEscapeNeededForString(x, stringLength)) {
            needsQuoted = false; // saves an allocation later
            StringBuffer buf = new StringBuffer((int) (x.length() * 1.1));
            buf.append('\'');
            //
            // Note: buf.append(char) is _faster_ than
            // appending in blocks, because the block
            // append requires a System.arraycopy()....
            // go figure...
            //
            for (int i = 0; i < stringLength; ++i) {
                char c = x.charAt(i);
                switch (c) {
                case 0: /* Must be escaped for 'mysql' */
                    buf.append('\\');
                    buf.append('0');
                    break;
                case '\n': /* Must be escaped for logs */
                    buf.append('\\');
                    buf.append('n');
                    break;
                case '\r':
                    buf.append('\\');
                    buf.append('r');
                    break;
                case '\\':
                    buf.append('\\');
                    buf.append('\\');
                    break;
                case '\'':
                    buf.append('\\');
                    buf.append('\'');
                    break;
                case '"': /* Better safe than sorry */
                    if (this.usingAnsiMode) {
                        buf.append('\\');
                    }
                    buf.append('"');
                    break;
                case '\032': /* This gives problems on Win32 */
                    buf.append('\\');
                    buf.append('Z');
                    break;
                default:
                    buf.append(c);
                }
            }
            buf.append('\'');
            parameterAsString = buf.toString();
        }
        byte[] parameterAsBytes = null;
        if (!this.isLoadDataQuery) {
            if (needsQuoted) {
                parameterAsBytes = StringUtils.getBytesWrapped(parameterAsString,
                        '\'', '\'', this.charConverter, this.charEncoding, this.connection
                                .getServerCharacterEncoding(), this.connection
                                .parserKnowsUnicode());
            } else {
                parameterAsBytes = StringUtils.getBytes(parameterAsString,
                        this.charConverter, this.charEncoding, this.connection
                                .getServerCharacterEncoding(), this.connection
                                .parserKnowsUnicode());
            }
        } else {
            // Send with platform character encoding
            parameterAsBytes = parameterAsString.getBytes();
        }
        setInternal(parameterIndex, parameterAsBytes);
        this.parameterTypes[parameterIndex - 1 + getParameterIndexOffset()] = Types.VARCHAR;
    }
}

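As you can see, it escapes the special characters in the parameter that is passed in and wraps the string in single quotes. In other words, what MySQL JDBC sends to the server is the SQL after escaping: not the SQL and the parameters as separate pieces, but one concatenated SQL string, with SQL injection prevented by escaping alone.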
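The effect is easier to see in a stripped-down form. The sketch below follows the same escape cases as the switch in setString but leaves out charset conversion, ANSI mode and the NO_BACKSLASH_ESCAPES branch, and its placeholder replacement is naive (it does not skip a ? inside a string literal or comment). ClientSideBinding, quote and interpolate are hypothetical names, not driver APIs.

```java
class ClientSideBinding {
    /** Escapes special characters and wraps the value in single quotes. */
    public static String quote(String x) {
        if (x == null) {
            return "NULL";
        }
        StringBuilder buf = new StringBuilder(x.length() + 2);
        buf.append('\'');
        for (int i = 0; i < x.length(); i++) {
            char c = x.charAt(i);
            switch (c) {
            case 0:      buf.append("\\0");  break;
            case '\n':   buf.append("\\n");  break;
            case '\r':   buf.append("\\r");  break;
            case '\\':   buf.append("\\\\"); break;
            case '\'':   buf.append("\\'");  break;
            case '\032': buf.append("\\Z");  break;
            default:     buf.append(c);
            }
        }
        buf.append('\'');
        return buf.toString();
    }

    /** Replaces each ? with the quoted parameter, yielding one concatenated SQL string. */
    public static String interpolate(String sql, String... params) {
        StringBuilder out = new StringBuilder(sql.length() + 16 * params.length);
        int next = 0;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (c == '?' && next < params.length) {
                out.append(quote(params[next++]));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The injection attempt stays inside a single string literal.
        System.out.println(interpolate("SELECT * FROM t WHERE name = ?", "a' OR '1'='1"));
    }
}
```

Running main prints SELECT * FROM t WHERE name = 'a\' OR \'1\'=\'1', so the quotes supplied by the attacker are escaped rather than terminating the literal.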

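MySQL JDBC actually contains many similar pseudo-implementations. Batch processing is one example: it is carried out with a loop. Still, it does satisfy the basic JDBC driver specification.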

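In addition, a distributed (sharded) MySQL deployment has a great many sharded tables. Each physical table has at least several SQL statements of its own, and even a single database holds many of them. With a cache of a few dozen entries, what hit rate can you really expect? Even multiple Connections to the same database keep caches that are independent of one another. So the more databases there are, the more tables per database and the more complex the business logic, the more cache a single Connection needs before it has the desired effect. The price of caching is more JVM heap, a lot more. Even if it fits in memory, in today's JVMs any Full GC will still scan these objects and move them. On the other hand, parsing such a SQL statement only means locating the placeholders. That is purely CPU-bound work, the number of times it happens is trivial for a CPU, and an ordinary SQL statement may take around 1us. There is no need to fight the JVM over this and do thankless work; the approach is crude enough already, and making it any cruder would finish it off.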
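For a sense of how little work "locating the placeholders" is, here is a rough sketch of such a parse. It assumes the parse result is just the static SQL fragments between the placeholders, which is my reading of what gets cached rather than something the article shows. PlaceholderSplit and split are hypothetical names, and the scanner only understands quoted literals, not comments.

```java
import java.util.ArrayList;
import java.util.List;

class PlaceholderSplit {
    /** Splits SQL into the static fragments around each ? that lies outside a quoted literal. */
    public static List<String> split(String sql) {
        List<String> parts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuote = false;
        char quoteChar = 0;
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (inQuote) {
                cur.append(c);
                if (c == '\\' && i + 1 < sql.length()) {
                    // Keep a backslash-escaped character as part of the literal.
                    cur.append(sql.charAt(++i));
                } else if (c == quoteChar) {
                    inQuote = false;
                }
            } else if (c == '\'' || c == '"') {
                inQuote = true;
                quoteChar = c;
                cur.append(c);
            } else if (c == '?') {
                // A placeholder ends the current static fragment.
                parts.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        parts.add(cur.toString());
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("SELECT * FROM t WHERE id = ? AND name = ?"));
    }
}
```

A statement with n placeholders yields n + 1 fragments from a single pass over the string, which is the kind of cost being weighed here against keeping the parsed form on the heap for every Connection.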
