Reducing GC Overhead: Netty 4 at Twitter
A blog post published at Twitter by Trustin Lee, the founder of Netty. It is very good, so I am reposting it as is.
The following text is from Twitter.
At Twitter, Netty (@netty_project) is used in core places requiring networking functionality.
For example:
For those who aren’t aware, Netty is an open source Java NIO framework that makes it easier to create high-performing protocol servers. An older version of Netty v3 used Java objects to represent I/O events. This was simple, but could generate a lot of garbage, especially at our scale. In the new Netty 4 release, changes were made so that instead of short-lived event objects, methods on long-lived channel objects are used to handle I/O events. There is also a specialized buffer allocator that uses pools.
We take the performance, usability, and sustainability of the Netty project seriously, and we have been working closely with the Netty community to improve it in all aspects. In particular, we will discuss our usage of Netty 3 and will aim to show why migrating to Netty 4 has made us more efficient.
A problem was Netty 3’s reliance on the JVM’s memory management for buffer allocations. Netty 3 creates a new heap buffer whenever a new message is received or a user sends a message to a remote peer. This means a ‘new byte[capacity]’ for each new buffer. These buffers caused GC pressure and consumed memory bandwidth: allocating a new byte array consumes memory bandwidth to fill the array with zeros for safety. However, the zero-filled byte array is very likely to be filled with the actual data, consuming the same amount of memory bandwidth. We could have reduced the consumption of memory bandwidth to 50% if the Java Virtual Machine (JVM) provided a way to create a new byte array which is not necessarily filled with zeros, but there’s no such way at this moment.
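The double write described above can be seen in a few lines of plain Java. This is only an illustrative sketch: `ZeroFillDemo` and `copyIntoNewBuffer` are hypothetical names, not Netty code, and the helper simply mimics allocating one fresh heap array per message.

```java
import java.util.Arrays;

final class ZeroFillDemo {
    // Hypothetical helper (not a Netty API): copies an incoming payload into a
    // freshly allocated heap array, one allocation per message.
    static byte[] copyIntoNewBuffer(byte[] payload) {
        // Write 1: the JVM zero-fills every element of a new array.
        byte[] buffer = new byte[payload.length];
        // Write 2: the real data immediately overwrites those zeros.
        System.arraycopy(payload, 0, buffer, 0, payload.length);
        return buffer;
    }

    public static void main(String[] args) {
        byte[] fresh = new byte[4];
        System.out.println(Arrays.toString(fresh)); // [0, 0, 0, 0]
        System.out.println(Arrays.toString(copyIntoNewBuffer(new byte[] {1, 2, 3, 4}))); // [1, 2, 3, 4]
    }
}
```

Every byte of the buffer is written twice, once with a zero and once with data, which is the extra memory bandwidth the paragraph refers to.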
To address this issue, we made the following changes for Netty 4.
Instead of creating event objects, Netty 4 defines different methods for different event types. In Netty 3, the ChannelHandler has a single method that handles all event objects:
class Before implements ChannelUpstreamHandler {
    void handleUpstream(ctx, ChannelEvent e) {
        if (e instanceof MessageEvent) { ... }
        else if (e instanceof ChannelStateEvent) { ... }
        ...
    }
}
Netty 4 has as many handler methods as the number of event types:
class After implements ChannelInboundHandler {
    void channelActive(ctx) { ... }
    void channelInactive(ctx) { ... }
    void channelRead(ctx, msg) { ... }
    void userEventTriggered(ctx, evt) { ... }
    ...
}
Note a handler now has a method called ‘userEventTriggered’ so that it does not lose the ability to define a custom event object.
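To make the difference concrete without putting Netty on the classpath, here is a minimal sketch in plain Java. `InboundHandler`, `Pipeline`, and `RecordingHandler` are hypothetical stand-ins, not Netty types, and the context argument is left out for brevity. The only point is that the dispatcher calls one method per event type and never wraps an event in a short-lived object.

```java
import java.util.ArrayList;
import java.util.List;

final class HandlerDemo {
    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline();
        RecordingHandler handler = new RecordingHandler();
        pipeline.addLast(handler);

        pipeline.fireChannelActive();
        pipeline.fireChannelRead("ping");
        pipeline.fireUserEventTriggered("idle"); // custom events still have a home
        pipeline.fireChannelInactive();

        System.out.println(handler.log()); // active,read:ping,user:idle,inactive
    }
}

// Hypothetical stand-in for a Netty 4 style handler: one method per event type.
interface InboundHandler {
    void channelActive();
    void channelInactive();
    void channelRead(Object msg);
    void userEventTriggered(Object evt);
}

// Hypothetical dispatcher: calls handler methods directly, so no per-event
// wrapper object is created. Indexed loops avoid allocating iterators.
final class Pipeline {
    private final List<InboundHandler> handlers = new ArrayList<>();

    void addLast(InboundHandler handler) { handlers.add(handler); }

    void fireChannelActive() {
        for (int i = 0; i < handlers.size(); i++) handlers.get(i).channelActive();
    }

    void fireChannelInactive() {
        for (int i = 0; i < handlers.size(); i++) handlers.get(i).channelInactive();
    }

    void fireChannelRead(Object msg) {
        for (int i = 0; i < handlers.size(); i++) handlers.get(i).channelRead(msg);
    }

    void fireUserEventTriggered(Object evt) {
        for (int i = 0; i < handlers.size(); i++) handlers.get(i).userEventTriggered(evt);
    }
}

// Records which methods were invoked, in order, for demonstration only.
final class RecordingHandler implements InboundHandler {
    private final List<String> calls = new ArrayList<>();

    @Override public void channelActive() { calls.add("active"); }
    @Override public void channelInactive() { calls.add("inactive"); }
    @Override public void channelRead(Object msg) { calls.add("read:" + msg); }
    @Override public void userEventTriggered(Object evt) { calls.add("user:" + evt); }

    String log() { return String.join(",", calls); }
}
```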
Netty 4 also introduced a new interface, ‘ByteBufAllocator’. It now provides a buffer pool implementation via that interface; the pool is a pure Java variant of jemalloc, which implements buddy memory allocation and slab allocation.
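The general idea of pooling can be sketched in a few lines. To be clear, this is not Netty's allocator and not jemalloc: `SimpleBufferPool` is a hypothetical, single-threaded toy that rounds each request up to a power-of-two size class (similar in spirit to how a buddy allocator sizes its blocks) and keeps one free list per size class.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

final class PoolDemo {
    public static void main(String[] args) {
        SimpleBufferPool pool = new SimpleBufferPool();
        byte[] first = pool.allocate(100);  // rounded up to the 128-byte size class
        pool.release(first);                // goes back to the free list
        byte[] second = pool.allocate(120); // same size class, so the array is reused
        System.out.println(first.length + " " + (first == second)); // 128 true
    }
}

// Hypothetical toy pool, not thread-safe, for illustration only.
final class SimpleBufferPool {
    private final Map<Integer, ArrayDeque<byte[]>> freeLists = new HashMap<>();

    // Rounds a request up to the next power of two (the size class).
    static int normalize(int requested) {
        if (requested <= 0 || requested > (1 << 30)) {
            throw new IllegalArgumentException("requested: " + requested);
        }
        int highest = Integer.highestOneBit(requested);
        return highest == requested ? requested : highest << 1;
    }

    byte[] allocate(int requested) {
        int size = normalize(requested);
        ArrayDeque<byte[]> free = freeLists.get(size);
        byte[] pooled = (free == null) ? null : free.pollFirst();
        // A reused array is handed out as-is, without being zero-filled again.
        return pooled != null ? pooled : new byte[size];
    }

    // Only arrays obtained from allocate() should be passed here.
    void release(byte[] buffer) {
        freeLists.computeIfAbsent(buffer.length, k -> new ArrayDeque<>()).addFirst(buffer);
    }
}
```

Note that the caller has to hand the array back explicitly; nothing returns it to the pool automatically.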
Now that Netty has its own memory allocator for buffers, it doesn’t waste memory bandwidth by filling buffers with zeros. However, this approach opens another can of worms—reference counting. Because we cannot rely on GC to put the unused buffers into the pool, we have to be very careful about leaks. Even a single handler that forgets to release a buffer can make our server’s memory usage grow boundlessly.
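Here is a minimal sketch of what reference counting on a pooled buffer can look like. `CountedBuffer` is a hypothetical class written for this illustration; it borrows the method names `retain()`, `release()`, and `refCnt()` from Netty 4's reference-counted buffers, but the body is an assumption, not Netty's implementation. The `recycler` callback stands in for "put the buffer back into the pool".

```java
import java.util.concurrent.atomic.AtomicInteger;

final class RefCountDemo {
    public static void main(String[] args) {
        AtomicInteger returnedToPool = new AtomicInteger();
        CountedBuffer buf = new CountedBuffer(new byte[256], returnedToPool::incrementAndGet);

        buf.retain();                               // a second owner appears, refCnt == 2
        buf.release();                              // first owner is done, refCnt == 1
        System.out.println(returnedToPool.get());   // 0: still in use, not recycled
        buf.release();                              // last owner is done, refCnt == 0
        System.out.println(returnedToPool.get());   // 1: handed back exactly once
    }
}

// Hypothetical reference-counted wrapper, for illustration only.
final class CountedBuffer {
    private final byte[] data;
    private final Runnable recycler;                // stand-in for "return to pool"
    private final AtomicInteger refCnt = new AtomicInteger(1);

    CountedBuffer(byte[] data, Runnable recycler) {
        this.data = data;
        this.recycler = recycler;
    }

    int refCnt() { return refCnt.get(); }

    byte[] array() {
        if (refCnt.get() == 0) throw new IllegalStateException("buffer already released");
        return data;
    }

    // Adds one owner. Fails if the buffer has already been recycled.
    CountedBuffer retain() {
        for (;;) {
            int current = refCnt.get();
            if (current == 0) throw new IllegalStateException("buffer already released");
            if (refCnt.compareAndSet(current, current + 1)) return this;
        }
    }

    // Drops one owner. Returns true when this call recycled the buffer.
    boolean release() {
        for (;;) {
            int current = refCnt.get();
            if (current == 0) throw new IllegalStateException("buffer already released");
            if (refCnt.compareAndSet(current, current - 1)) {
                if (current == 1) {
                    recycler.run();
                    return true;
                }
                return false;
            }
        }
    }
}
```

If any owner skips its `release()` call, the count never reaches zero and the recycler never runs, which is exactly the kind of leak described above.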
Because of the changes mentioned above, Netty 4 has no backward compatibility with Netty 3. It means our projects built on top of Netty 3, as well as other community projects, have to spend a non-trivial amount of time on migration. Is it worth doing that?
We compared two echo protocol servers built on top of Netty 3 and 4 respectively. (Echo is simple enough such that any garbage created is Netty’s fault, not the protocol). I let them serve the same distributed echo protocol clients with 16,384 concurrent connections sending 256-byte random payload repetitively, nearly saturating gigabit ethernet.
According to our test result, Netty 4 had:
I also wanted to make sure our buffer pool is fast enough. Here’s a graph where the X and Y axes denote the size of each allocation and the time taken to allocate a single buffer respectively:
As you see, the buffer pool is much faster than JVM as the size of the buffer increases. It is even more noticeable for direct buffers. However, it could not beat JVM for small heap buffers, so we have something to work on here.
Although some parts of our services already migrated from Netty 3 to 4 successfully, we are performing the migration gradually. We discovered some barriers that slow our adoption that we hope to address in the near future:
We also are thinking of adding more cool features such as:
What’s interesting about Netty is that it is used by many different people and companies worldwide, mostly not from Twitter. It is an independent and very healthy open source project with many contributors. If you are interested in building ‘the future of network programming’, why don’t you visit the project web site, follow @netty_project, jump right into the source code at GitHub or even consider joining the flock to help us improve Netty?
The Netty project was founded by Trustin Lee (@trustin), who joined the flock in 2011 to help build Netty 4. We would also like to thank Jeff Pinner (@jpinner) from the TFE team, who gave many great ideas mentioned in this article and became a guinea pig for Netty 4 without hesitation. Furthermore, Norman Maurer (@normanmaurer), one of the core Netty committers, made an enormous amount of effort to help us materialize the great ideas into an actually shippable piece of code as part of the Netty project. There are also countless individuals who gladly tried a lot of unstable releases, catching up with all the breaking changes we had to make. In particular, we would like to thank: Berk Demir (@bd), Charles Yang (@cmyang), Evan Meagher (@evanm), Larry Hosken (@lahosken), Sonja Keserovic (@thesonjake), and Stu Hood (@stuhood).