分布式系统中某个节点随机响应时间升高解析[String常量池导致dumpThread变慢]
故事要从半年前说起....
大体的问题描述是:
约一个季度以上的时间内, 我们发现我们系统的5台机器的90%响应时间都是每天都有随机的一台比其他机器高出20ms-30ms.
在 同事的努力下开始寻找这个问题, 主要怀疑是定时任务导致的. 经过不懈的努力, 一段时间的结果是发现 fastjson 进行反序列化的 json的key 做缓存, 放到static域里的一个List不释放. 大约这个List有65w大小. 然后就导致Log打印从以前的1ms 升到了20ms.
?
1, 从观察到上述的结果后,? 我们开始寻找具体导致这个问题的原因. 模型就是[65w的List String不释放] --> [Log.info一次20ms以上]
观察fastjson的源码发现, 这65w的string存储不释放可能导致的原因比较不好寻找, 不过在思考后, 直觉是由可能这两者都是在操作字符串, 会不会是字符串常量池导致的.
fastjson的缓存key代码如下:
?
/** * Constructs a new entry from the specified symbol information and next entry reference. */public Entry(char[] ch, int offset, int length, int hash, Entry next){ characters = new char[length]; System.arraycopy(ch, offset, characters, 0, length); symbol = new String(characters).intern(); this.next = next; this.hashCode = hash; this.bytes = null;}public Entry(String text, int offset, int length, int hash, Entry next){ symbol = text.substring(offset, offset + length).intern(); characters = symbol.toCharArray(); this.next = next; this.hashCode = hash; this.bytes = null;}
?
上边这块代码是在fastjson使用自己实现的SymbolTable 缓存反序列化对象的key时 存储的数据结构和代码.?? 这里的key都会进行一次 intern 操作. 为了放入全局常量池中, 加快速度. 但这里我们调用的 商家系统 等价接口返回的key是变化的. 所以会导致缓存的key特别多.? 知道65w数据而不释放.?
2, 看到了fastjson使用了intern操作, 敏感的意识到intern操作常量池后, 会导致Log打印日志时操作字符串变慢. 来看String.intern 的实现:
?
/** * Returns a canonical representation for the string object. * <p> * A pool of strings, initially empty, is maintained privately by the * class <code>String</code>. * <p> * When the intern method is invoked, if the pool already contains a * string equal to this <code>String</code> object as determined by * the {@link #equals(Object)} method, then the string from the pool is * returned. Otherwise, this <code>String</code> object is added to the * pool and a reference to this <code>String</code> object is returned. * <p> * It follows that for any two strings <code>s</code> and <code>t</code>, * <code>s.intern() == t.intern()</code> is <code>true</code> * if and only if <code>s.equals(t)</code> is <code>true</code>. * <p> * All literal strings and string-valued constant expressions are * interned. String literals are defined in section 3.10.5 of the * <cite>The Java™ Language Specification</cite>. * * @return a string that has the same contents as this string, but is * guaranteed to be from a pool of unique strings. */public native String intern();
??
好吧, 你们看到了.? intern 方法是个native的. 这一点以前没有跟过感觉有点麻烦了.看方法的注释里如下解释:
?
When the intern method is invoked, if the pool already contains a * string equal to this <code>String</code> object as determined by * the {@link #equals(Object)} method, then the string from the pool is * returned. Otherwise, this <code>String</code> object is added to the * pool and a reference to this <code>String</code> object is returned.
?
注释中明确说明, 如果常量池中存在当前字符串, 就会直接返回当前字符串.? 如果常量池中没有此字符串, 会将此字符串放入常量池中后, 再返回.
从这里可以明确确定, String.intern的操作增大了常量池. 具体常量池是怎么运行的得继续寻找native代码.
?
3, 测试到底是不是intern操作导致的Log变慢:
?
private static final List<Object> finalStringList = new ArrayList<Object>();public void fillPermGenMemory() { for (int i = 0; i < 1000000; i++) { finalStringList.add(String.valueOf(i).intern()); }}public JsonResponse testLogIntern() { long start = System.currentTimeMillis(); System.out.println("start=" + start); logger.info("hao......"); logger.info("hao......"); logger.info("hao......"); logger.info("hao......"); long end = System.currentTimeMillis(); long c = end - start; return new JsonResponse().addData(c);}
?
这样运行 fillPermGenMemory()方法后, 日志打印就到100ms了.?
若将 finalStringList.add(String.valueOf(i).intern()); 这句代码中的intern操作去掉, 变成 finalStringList.add(String.valueOf(i)); 后. 日志打印时间就变成1ms之内了.
因此通过上述的测试确定 Log变慢跟intern有十分强烈的联系. 接下来就要寻找具体的联系了.
?
4, 在网上找资料看String的常量池实现. 发现资料不是想象中的那么多, 而且1.6和1.7还有不小的区别
http://hellojava.info/?p=61
这篇文章中有提到 1.7 的jdk将String的常量池移到了非Perm区的队里, 轻易不会导致OOM了.
而且这里也比较明确的说道:
?
String.intern放进的String Pool是一个固定大小的Hashtable,默认值是1009,如果放进String Pool的String非常多,就会造成Hash冲突严重,从而导致链表会很长,而链表长了后直接会造成的影响就是当调用String.intern时性能会大幅下降(因为要一个一个找)。
?以及继续看他的native实现代码:
?\openjdk7\jdk\src\share\native\java\lang\String.c
?
Java_java_lang_String_intern(JNIEnv *env, jobject this){ return JVM_InternString(env, this);}
??
?\openjdk7\hotspot\src\share\vm\prims\jvm.h?
?
/** java.lang.String*/JNIEXPORT jstring JNICALLJVM_InternString(JNIEnv *env, jstring str); C:\cygwin\home\john\workspace\openjdk7\hotspot\src\share\vm\prims\jvm.cpp// String support ///////////////////////////////////////////////////////////////////////////JVM_ENTRY(jstring, JVM_InternString(JNIEnv *env, jstring str)) JVMWrapper("JVM_InternString"); JvmtiVMObjectAllocEventCollector oam; if (str == NULL) return NULL; oop string = JNIHandles::resolve_non_null(str); oop result = StringTable::intern(string, CHECK_NULL); return (jstring) JNIHandles::make_local(env, result);JVM_END
?
\openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp
oop StringTable::intern(Handle string_or_null, jchar* name, int len, TRAPS) { unsigned int hashValue = java_lang_String::hash_string(name, len); int index = the_table()->hash_to_index(hashValue); oop string = the_table()->lookup(index, name, len, hashValue); // Found if (string != NULL) return string; // Otherwise, add to symbol to table return the_table()->basic_add(index, string_or_null, name, len, hashValue, CHECK_NULL);}
?
\openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp
?
oop StringTable::lookup(int index, jchar* name, int len, unsigned int hash) { for (HashtableEntry<oop>* l = bucket(index); l != NULL; l = l->next()) { if (l->hash() == hash) { if (java_lang_String::equals(l->literal(), name, len)) { return l->literal(); } } } return NULL;}
?
大体的结构就是:
public JsonResponse testFormatIntern() { long start = System.currentTimeMillis(); System.out.println("start=" + start); System.out.println(MessageFormatter.format("zhe shi yige {} de rizi. {} duibudui ?", new Random().nextDouble(), new Random().nextDouble()).getMessage()); System.out.println(MessageFormatter.format("zhe shi yige {} de rizi2. {} duibudui ?", new Random().nextDouble(), new Random().nextDouble()).getMessage()); System.out.println(MessageFormatter.format("zhe shi yige {} de rizi3. {} duibudui ?", new Random().nextDouble(), new Random().nextDouble()).getMessage()); System.out.println(MessageFormatter.format("zhe shi yige {} de rizi4. {} duibudui ?", new Random().nextDouble(), new Random().nextDouble()).getMessage()); long end = System.currentTimeMillis(); long c = end - start; return new JsonResponse().addData(c);}?
但是发现这个的时间在 intern 增大后几乎没有任何的影响, 还是1 2ms.. 这里就郁闷了, 发现走错路了.
?
6, 然后从这里开始使用housemd 工具对Logger.info() 方法进行时间的跟踪.
首先是 trace ?SLF4JLogger. 如下结果:
?
org/slf4j/impl/SLF4JLogger.debug(String) sun.misc.Launcher$AppClassLoader@109aca82 2 <1ms org.slf4j.impl.SLF4JLogger@6039fe60org/slf4j/impl/SLF4JLogger.debug(String, Object[]) sun.misc.Launcher$AppClassLoader@109aca82 2 <1ms org.slf4j.impl.SLF4JLogger@7f6b81faorg/slf4j/impl/SLF4JLogger.getLevel(int) sun.misc.Launcher$AppClassLoader@109aca82 14 1ms org.slf4j.impl.SLF4JLogger@3ede1597org/slf4j/impl/SLF4JLogger.info(String) sun.misc.Launcher$AppClassLoader@109aca82 2 41ms org.slf4j.impl.SLF4JLogger@67c5fef9org/slf4j/impl/SLF4JLogger.isDebugEnabled() sun.misc.Launcher$AppClassLoader@109aca82 1 <1ms org.slf4j.impl.SLF4JLogger@665b8cbforg/slf4j/impl/SLF4JLogger.isTraceEnabled() sun.misc.Launcher$AppClassLoader@109aca82 2 <1ms org.slf4j.impl.SLF4JLogger@708cd580org/slf4j/impl/SLF4JLogger.log(Marker, String, int, String, Object[], Throwable) sun.misc.Launcher$AppClassLoader@109aca82 14 9ms org.slf4j.impl.SLF4JLogger@3ede1597
?
实现代码如下:
?
@Overridepublic void info(final String format) { if (logger.isInfoEnabled()) { logger.log(null, FQCN, Level.INFO, new SimpleMessage(format), null); }}
?找到logger的实现类继续 trace AbstractLoggerWrapper:
?
org/apache/logging/log4j/spi/AbstractLoggerWrapper.isEnabled(Level, Marker, Object, Throwable) sun.misc.Launcher$AppClassLoader@109aca82 1 1ms org.apache.logging.log4j.spi.AbstractLoggerWrapper@56d31953org/apache/logging/log4j/spi/AbstractLoggerWrapper.isEnabled(Level, Marker, String) sun.misc.Launcher$AppClassLoader@109aca82 1 <1ms org.apache.logging.log4j.spi.AbstractLoggerWrapper@d8c5450org/apache/logging/log4j/spi/AbstractLoggerWrapper.log(Marker, String, Level, Message, Throwable) sun.misc.Launcher$AppClassLoader@109aca82 2 38ms org.apache.logging.log4j.spi.AbstractLoggerWrapper@1ac28616
?继续trace AsyncLogger
?
org/apache/logging/log4j/core/async/AsyncLogger.actualAsyncLog(RingBufferLogEvent) sun.misc.Launcher$AppClassLoader@109aca82 1 1ms org.apache.logging.log4j.core.async.AsyncLogger@19de86bborg/apache/logging/log4j/core/async/AsyncLogger.location(String) sun.misc.Launcher$AppClassLoader@109aca82 1 30ms org.apache.logging.log4j.core.async.AsyncLogger@19de86bborg/apache/logging/log4j/core/async/AsyncLogger.log(Marker, String, Level, Message, Throwable) sun.misc.Launcher$AppClassLoader@109aca82 1 61ms org.apache.logging.log4j.core.async.AsyncLogger@19de86bb
?这里可以看一下?AsyncLogger.location 这个方法比较异常. 里边主要是调用了 return Log4jLogEvent.calcLocation(fqcnOfLogger); ??Log4jLogEvent.calcLocation()的代码如下:
?
public static StackTraceElement calcLocation(final String fqcnOfLogger) { if (fqcnOfLogger == null) { return null; } final StackTraceElement[] stackTrace = Thread.currentThread().getStackTrace(); boolean next = false; for (final StackTraceElement element : stackTrace) { final String className = element.getClassName(); if (next) { if (fqcnOfLogger.equals(className)) { continue; } return element; } if (fqcnOfLogger.equals(className)) { next = true; } else if (NOT_AVAIL.equals(className)) { break; } } return null;}
?
?
7, 这里才考虑到可能是?Thread.currentThread().getStackTrace(); 调用的时候, 因为这个方法主要是获取当前方法的调用栈, 特别是在打印异常的时候会用到. ?因此此方法会去常量池中取 java的类名, 方法名, 如果常量池比较答, 所以搜索时间会比较长. 因此根据这个思路去跟他的native的方法实现.
具体的过程如下:
?
public StackTraceElement[] getStackTrace() { if (this != Thread.currentThread()) { // check for getStackTrace permission SecurityManager security = System.getSecurityManager(); if (security != null) { security.checkPermission( SecurityConstants.GET_STACK_TRACE_PERMISSION); } // optimization so we do not call into the vm for threads that // have not yet started or have terminated if (!isAlive()) { return EMPTY_STACK_TRACE; } StackTraceElement[][] stackTraceArray = dumpThreads(new Thread[] {this}); StackTraceElement[] stackTrace = stackTraceArray[0]; // a thread that was alive during the previous isAlive call may have // since terminated, therefore not having a stacktrace. if (stackTrace == null) { stackTrace = EMPTY_STACK_TRACE; } return stackTrace; } else { // Don't need JVM help for current thread return (new Exception()).getStackTrace(); }}private native static StackTraceElement[][] dumpThreads(Thread[] threads);
?
?
?\openjdk7\jdk\src\share\native\java\lang\Thread.c
?
static JNINativeMethod methods[] = { {"start0", "()V", (void *)&JVM_StartThread}, {"stop0", "(" OBJ ")V", (void *)&JVM_StopThread}, {"isAlive", "()Z", (void *)&JVM_IsThreadAlive}, {"suspend0", "()V", (void *)&JVM_SuspendThread}, {"resume0", "()V", (void *)&JVM_ResumeThread}, {"setPriority0", "(I)V", (void *)&JVM_SetThreadPriority}, {"yield", "()V", (void *)&JVM_Yield}, {"sleep", "(J)V", (void *)&JVM_Sleep}, {"currentThread", "()" THD, (void *)&JVM_CurrentThread}, {"countStackFrames", "()I", (void *)&JVM_CountStackFrames}, {"interrupt0", "()V", (void *)&JVM_Interrupt}, {"isInterrupted", "(Z)Z", (void *)&JVM_IsInterrupted}, {"holdsLock", "(" OBJ ")Z", (void *)&JVM_HoldsLock}, {"getThreads", "()[" THD, (void *)&JVM_GetAllThreads}, {"dumpThreads", "([" THD ")[[" STE, (void *)&JVM_DumpThreads},};
??
\openjdk7\hotspot\src\share\vm\prims\jvm.h 294:
?
/* getStackTrace() and getAllStackTraces() method */JNIEXPORT jobjectArray JNICALLJVM_DumpThreads(JNIEnv *env, jclass threadClass, jobjectArray threads); C:\cygwin\home\john\workspace\openjdk7\hotspot\src\share\vm\prims\jvm.cpp4382-4414:// Support for java.lang.Thread.getStackTrace() and getAllStackTraces() methods// Return StackTraceElement[][], each element is the stack trace of a thread in// the corresponding entry in the given threads arrayJVM_ENTRY(jobjectArray, JVM_DumpThreads(JNIEnv *env, jclass threadClass, jobjectArray threads)) JVMWrapper("JVM_DumpThreads"); JvmtiVMObjectAllocEventCollector oam; // Check if threads is null if (threads == NULL) { THROW_(vmSymbols::java_lang_NullPointerException(), 0); } objArrayOop a = objArrayOop(JNIHandles::resolve_non_null(threads)); objArrayHandle ah(THREAD, a); int num_threads = ah->length(); // check if threads is non-empty array if (num_threads == 0) { THROW_(vmSymbols::java_lang_IllegalArgumentException(), 0); } // check if threads is not an array of objects of Thread class klassOop k = objArrayKlass::cast(ah->klass())->element_klass(); if (k != SystemDictionary::Thread_klass()) { THROW_(vmSymbols::java_lang_IllegalArgumentException(), 0); } ResourceMark rm(THREAD); GrowableArray<instanceHandle>* thread_handle_array = new GrowableArray<instanceHandle>(num_threads); for (int i = 0; i < num_threads; i++) { oop thread_obj = ah->obj_at(i); instanceHandle h(THREAD, (instanceOop) thread_obj); thread_handle_array->append(h); } Handle stacktraces = ThreadService::dump_stack_traces(thread_handle_array, num_threads, CHECK_NULL); return (jobjectArray)JNIHandles::make_local(env, stacktraces());JVM_END
?
?
\openjdk7\hotspot\src\share\vm\services\threadService.cpp?
235-267:?
?
// Dump stack trace of threads specified in the given threads array.// Returns StackTraceElement[][] each element is the stack trace of a thread in// the corresponding entry in the given threads arrayHandle ThreadService::dump_stack_traces(GrowableArray<instanceHandle>* threads, int num_threads, TRAPS) { assert(num_threads > 0, "just checking"); ThreadDumpResult dump_result; VM_ThreadDump op(&dump_result, threads, num_threads, -1, /* entire stack */ false, /* with locked monitors */ false /* with locked synchronizers */); VMThread::execute(&op); // Allocate the resulting StackTraceElement[][] object ResourceMark rm(THREAD); klassOop k = SystemDictionary::resolve_or_fail(vmSymbols::java_lang_StackTraceElement_array(), true, CHECK_NH); objArrayKlassHandle ik (THREAD, k); objArrayOop r = oopFactory::new_objArray(ik(), num_threads, CHECK_NH); objArrayHandle result_obj(THREAD, r); int num_snapshots = dump_result.num_snapshots(); assert(num_snapshots == num_threads, "Must have num_threads thread snapshots"); int i = 0; for (ThreadSnapshot* ts = dump_result.snapshots(); ts != NULL; i++, ts = ts->next()) { ThreadStackTrace* stacktrace = ts->get_stack_trace(); if (stacktrace == NULL) { // No stack trace result_obj->obj_at_put(i, NULL); } else { // Construct an array of java/lang/StackTraceElement object Handle backtrace_h = stacktrace->allocate_fill_stack_trace_element_array(CHECK_NH); result_obj->obj_at_put(i, backtrace_h()); } } return result_obj;}
?
?
\openjdk7\hotspot\src\share\vm\services\threadService.cpp 566-577:
?
Handle ThreadStackTrace::allocate_fill_stack_trace_element_array(TRAPS) { klassOop k = SystemDictionary::StackTraceElement_klass(); assert(k != NULL, "must be loaded in 1.4+"); instanceKlassHandle ik(THREAD, k); // Allocate an array of java/lang/StackTraceElement object objArrayOop ste = oopFactory::new_objArray(ik(), _depth, CHECK_NH); objArrayHandle backtrace(THREAD, ste); for (int j = 0; j < _depth; j++) { StackFrameInfo* frame = _frames->at(j); methodHandle mh(THREAD, frame->method()); oop element = java_lang_StackTraceElement::create(mh, frame->bci(), CHECK_NH); backtrace->obj_at_put(j, element); } return backtrace;}
?
?
?
\openjdk7\hotspot\src\share\vm\classfile\javaClasses.cpp 1635-[1651,1654,1658]:
?
oop java_lang_StackTraceElement::create(methodHandle method, int bci, TRAPS) { // SystemDictionary::stackTraceElement_klass() will be null for pre-1.4 JDKs assert(JDK_Version::is_gte_jdk14x_version(), "should only be called in >= 1.4"); // Allocate java.lang.StackTraceElement instance klassOop k = SystemDictionary::StackTraceElement_klass(); assert(k != NULL, "must be loaded in 1.4+"); instanceKlassHandle ik (THREAD, k); if (ik->should_be_initialized()) { ik->initialize(CHECK_0); } Handle element = ik->allocate_instance_handle(CHECK_0); // Fill in class name ResourceMark rm(THREAD); const char* str = instanceKlass::cast(method->method_holder())->external_name(); oop classname = StringTable::intern((char*) str, CHECK_0); java_lang_StackTraceElement::set_declaringClass(element(), classname); // Fill in method name oop methodname = StringTable::intern(method->name(), CHECK_0); java_lang_StackTraceElement::set_methodName(element(), methodname); // Fill in source file name Symbol* source = instanceKlass::cast(method->method_holder())->source_file_name(); oop filename = StringTable::intern(source, CHECK_0); java_lang_StackTraceElement::set_fileName(element(), filename); // File in source line number int line_number; if (method->is_native()) { // Negative value different from -1 below, enabling Java code in // class java.lang.StackTraceElement to distinguish "native" from // "no LineNumberTable". line_number = -2; } else { // Returns -1 if no LineNumberTable, and otherwise actual line number line_number = method->line_number_from_bci(bci); } java_lang_StackTraceElement::set_lineNumber(element(), line_number); return element();}
?
?经过漫长的jvm源码跟踪, 走过了几个岔路, 还是终于走上了正途, 最后的三句代码问题就是
?
?
oop classname = StringTable::intern((char*) str, CHECK_0);oop methodname = StringTable::intern(method->name(), CHECK_0);oop filename = StringTable::intern(source, CHECK_0);
?
?
这三段代码是获取类名, 方法名, 和 文件名. 去常量池中搜索的时候性能急剧下降.?
而且一般的StackTrace的长度在70左右, 因此一般一条log语句要调用 210次左右的intern操作. ?性能耗费也是很巨大
?
8, 好了, 基本过程结束, 就来做最后的验证工作:
?
public JsonResponse testGetStackTraceIntern() throws Exception{ StackTraceElement[] stackTraceElements = Thread.currentThread().getStackTrace(); long start = System.currentTimeMillis(); System.out.println("start=" + start); for (int i = 0; i < 4; i++) { for (StackTraceElement stackTraceElement : stackTraceElements) { stackTraceElement.getClassName().intern(); stackTraceElement.getMethodName().intern(); if (stackTraceElement.getFileName() != null) stackTraceElement.getFileName().intern(); } } long end = System.currentTimeMillis(); long c = end - start; return new JsonResponse().addData(c);}
??这里是模拟了StackTrace调用4次的intern操作结果, ?跟打印4次log调用的intern次数和字符串都是基本相同的, ?得到的结果是这个时间和打印log的时间相差在5%以内. 所以验证工作基本结束了. 这里才是真相!
?
前前后后, 我和同事寻找这个问题找了1 2 个月, 才把最后的结果给找了出来, 中途一度因为没有思路而没法跟下去, 但经过坚持最后还是得到了真相, 而且让我对JNI的调用, Java native方法的调用和实现都有了比较深入的认识和学习.
结果就是因为
1,每天我们执行一个统计任务. 统计任务调用商家系统的等价接口返回等价信息
2,fastjson反序列化等价信息的json, 将所有的key缓存到 String的常量池中不释放.
3,Logger.info的时候需要通过Thread.currentTread().getStackTrace() 去定位当前栈信息.
4,Thread.currentTread().getStackTrace() 方法生成StackTrace对象的时候, 会去String常量池中取 类的方法名 类名和文件名, ?因此时间都耗费在链表的查询上了
?
这个问题前后找了1 2个月, 在网上也没有任何跟这个例子有关的文章, 因为详细的记录下来, 避免大家以后碰到此问题比较棘手.