Reducing GC Overhead: Netty 4 at Twitter

2013-12-06

A blog post by Trustin Lee, the founder of Netty, published on the Twitter blog. It is excellent, so I am reposting it directly.

The following text is from Twitter:

At Twitter, Netty (@netty_project) is used in core places requiring networking functionality.

For example:

- Finagle is our protocol-agnostic RPC system whose transport layer is built on top of Netty, and it is used to implement most services internally, like Search.
- TFE (Twitter Front End) is our proprietary spoon-feeding reverse proxy, which serves most of the public-facing HTTP and SPDY traffic using Netty.
- Cloudhopper sends billions of SMS messages every month to hundreds of mobile carriers all around the world using Netty.

For those who aren’t aware, Netty is an open-source Java NIO framework that makes it easier to create high-performing protocol servers. The older Netty 3 used Java objects to represent I/O events. This was simple, but could generate a lot of garbage, especially at our scale. In the new Netty 4 release, changes were made so that instead of short-lived event objects, methods on long-lived channel objects are used to handle I/O events. There is also a specialized buffer allocator that uses pools.
We take the performance, usability, and sustainability of the Netty project seriously, and we have been working closely with the Netty community to improve it in all aspects. In particular, we will discuss our usage of Netty 3 and will aim to show why migrating to Netty 4 has made us more efficient.
Reducing GC pressure and memory bandwidth consumption
One problem was Netty 3’s reliance on the JVM’s memory management for buffer allocations. Netty 3 creates a new heap buffer whenever a new message is received or a user sends a message to a remote peer. This means a ‘new byte[capacity]’ for each new buffer. These buffers caused GC pressure and consumed memory bandwidth: allocating a new byte array consumes memory bandwidth to fill the array with zeros for safety. However, the zero-filled byte array is very likely to be filled with the actual data right away, consuming the same amount of memory bandwidth again. We could have reduced memory bandwidth consumption by 50% if the Java Virtual Machine (JVM) provided a way to create a new byte array that is not necessarily filled with zeros, but there is currently no such way.
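To make the 50% figure concrete, here is a small Java model that simply counts the bytes written per message under the two approaches. It is an accounting sketch, not a benchmark: the class and method names are hypothetical, it assumes a 256-byte buffer that is completely filled by a 256-byte payload, and real memory traffic also depends on caches and JIT optimizations. The zero-fill itself is guaranteed by Java: every element of a newly created array starts at its default value.

```java
import java.util.Arrays;

// Hypothetical accounting model: counts bytes *written* per message,
// it does not measure real memory traffic.
final class ZeroFillCost {
    // Netty 3 style: a fresh heap buffer per message. The JVM zero-fills
    // all `capacity` bytes, then the payload is copied over them.
    static long bytesWrittenFresh(byte[] payload, int capacity) {
        byte[] buf = new byte[capacity];                       // zero-filled by the JVM
        System.arraycopy(payload, 0, buf, 0, payload.length);  // requires payload.length <= capacity
        return (long) capacity + payload.length;
    }

    // Pooled style: an already-allocated buffer is reused, so only the copy writes memory.
    static long bytesWrittenReused(byte[] payload, byte[] pooled) {
        System.arraycopy(payload, 0, pooled, 0, payload.length);
        return payload.length;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[256];
        Arrays.fill(payload, (byte) 7);                          // stand-in for received data
        byte[] pooled = new byte[256];
        System.out.println(bytesWrittenFresh(payload, 256));     // 512
        System.out.println(bytesWrittenReused(payload, pooled)); // 256
    }
}
```

With the buffer exactly filled, the fresh-allocation path writes twice as many bytes as the reuse path, which is the 2:1 ratio behind the "50%" remark.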
To address this issue, we made the following changes for Netty 4.
Removal of event objects
Instead of creating event objects, Netty 4 defines different methods for different event types. In Netty 3, the ChannelHandler has a single method that handles all event objects:
```
class Before implements ChannelUpstreamHandler {
  void handleUpstream(ctx, ChannelEvent e) {
    if (e instanceof MessageEvent) { ... }
    else if (e instanceof ChannelStateEvent) { ... }
    ...
  }
}
```
Netty 4 has as many handler methods as the number of event types:
```
class After implements ChannelInboundHandler {
  void channelActive(ctx) { ... }
  void channelInactive(ctx) { ... }
  void channelRead(ctx, msg) { ... }
  void userEventTriggered(ctx, evt) { ... }
  ...
}
```
Note that a handler now has a method called ‘userEventTriggered’, so it does not lose the ability to define a custom event object.
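The practical difference between the two handler styles is one allocation per event. The following dependency-free Java mock-up uses hypothetical stand-in types with Netty-like names (they are not Netty's real interfaces, which take a ChannelHandlerContext and have more methods) and counts the wrapper objects that the Netty 3 style creates for reads the Netty 4 style handles with a plain method call.

```java
// Hypothetical stand-ins with Netty-like names; NOT Netty's real interfaces.
final class DispatchStyles {
    // Netty 3 style: every I/O event becomes an object handed to one method.
    interface ChannelEvent {}

    static final class MessageEvent implements ChannelEvent {
        final Object message;
        MessageEvent(Object message) { this.message = message; }
    }

    static final class ChannelStateEvent implements ChannelEvent {
        final boolean open;
        ChannelStateEvent(boolean open) { this.open = open; }
    }

    static final class Before {
        int reads;
        long eventObjects;                         // wrapper objects created so far

        void handleUpstream(ChannelEvent e) {
            if (e instanceof MessageEvent) { reads++; }
            else if (e instanceof ChannelStateEvent) { /* open/close handling */ }
        }

        void fireRead(Object msg) {
            eventObjects++;
            handleUpstream(new MessageEvent(msg)); // short-lived garbage per read
        }
    }

    // Netty 4 style: one method per event type on a long-lived handler.
    static final class After {
        int reads;

        void channelActive() { }
        void channelInactive() { }
        void channelRead(Object msg) { reads++; }  // no wrapper object needed
        void userEventTriggered(Object evt) { }    // custom events remain possible
    }

    public static void main(String[] args) {
        Before before = new Before();
        After after = new After();
        for (int i = 0; i < 1000; i++) {
            before.fireRead("payload");
            after.channelRead("payload");
        }
        System.out.println(before.reads + " " + after.reads + " " + before.eventObjects); // 1000 1000 1000
    }
}
```

Both handlers see the same 1000 reads, but only the event-object style leaves 1000 short-lived objects behind for the garbage collector.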
Buffer pooling
Netty 4 also introduced a new interface, ‘ByteBufAllocator’, and provides a buffer pool implementation through it. The pool is a pure Java variant of jemalloc, which implements buddy memory allocation and slab allocation.
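The buddy half of that design can be illustrated with a classic buddy allocator: a region whose size is a power of two is tracked as a binary tree in which each node records the largest free block beneath it, blocks are split in halves on allocation, and two free buddies are merged back on release. This is a simplified, single-threaded sketch with a hypothetical class name and units of abstract "pages"; it is not Netty's allocator code and it leaves out the slab (small size class) part entirely.

```java
// Simplified buddy allocator over `capacity` units (a power of two).
// Illustrative only; not Netty's actual allocator implementation.
final class BuddyAllocator {
    private final int capacity;
    private final int[] longest; // longest[i] = largest free block under tree node i

    BuddyAllocator(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1)
            throw new IllegalArgumentException("capacity must be a power of two");
        this.capacity = capacity;
        this.longest = new int[2 * capacity - 1];
        int nodeSize = capacity * 2;
        for (int i = 0; i < longest.length; i++) {
            if (Integer.bitCount(i + 1) == 1) nodeSize /= 2; // a new tree level starts at i = 2^d - 1
            longest[i] = nodeSize;
        }
    }

    /** Returns the offset of a free block of at least {@code units}, or -1 if none fits. */
    int allocate(int units) {
        if (units <= 0) throw new IllegalArgumentException("units must be positive");
        int size = Integer.highestOneBit(units);
        if (size < units) size <<= 1;                        // round up to a power of two
        if (longest[0] < size) return -1;
        int index = 0;
        for (int nodeSize = capacity; nodeSize != size; nodeSize /= 2) {
            int left = 2 * index + 1;
            index = longest[left] >= size ? left : left + 1; // split: descend into a fitting half
        }
        longest[index] = 0;                                  // mark this block as used
        int offset = (index + 1) * size - capacity;
        while (index > 0) {                                  // refresh ancestors
            index = (index - 1) / 2;
            longest[index] = Math.max(longest[2 * index + 1], longest[2 * index + 2]);
        }
        return offset;
    }

    /** Frees the block that was returned at {@code offset}, merging free buddies. */
    void free(int offset) {
        int nodeSize = 1;
        int index = offset + capacity - 1;                   // leaf for this offset
        while (longest[index] != 0) {                        // climb to the allocated node
            if (index == 0) throw new IllegalStateException("not allocated: " + offset);
            nodeSize *= 2;
            index = (index - 1) / 2;
        }
        longest[index] = nodeSize;
        while (index > 0) {
            index = (index - 1) / 2;
            nodeSize *= 2;
            int l = longest[2 * index + 1], r = longest[2 * index + 2];
            longest[index] = (l + r == nodeSize) ? nodeSize : Math.max(l, r); // merge buddies
        }
    }

    int largestFreeBlock() { return longest[0]; }

    public static void main(String[] args) {
        BuddyAllocator a = new BuddyAllocator(16);
        int x = a.allocate(4), y = a.allocate(3), z = a.allocate(8);
        System.out.println(x + " " + y + " " + z + " " + a.allocate(1)); // 0 4 8 -1
        a.free(0); a.free(4); a.free(8);
        System.out.println(a.largestFreeBlock());                       // 16
    }
}
```

Requests are rounded up to powers of two (3 units take a 4-unit block), which trades some internal fragmentation for very cheap splitting and merging; that is why jemalloc-style allocators pair it with slab-style size classes for small requests.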
Now that Netty has its own memory allocator for buffers, it doesn’t waste memory bandwidth by filling buffers with zeros. However, this approach opens another can of worms—reference counting. Because we cannot rely on GC to put the unused buffers into the pool, we have to be very careful about leaks. Even a single handler that forgets to release a buffer can make our server’s memory usage grow boundlessly.
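A minimal sketch of the reference-counting contract: a buffer starts with a count of 1, retain() increments it, and the release() that brings it to 0 returns the buffer to the pool. The class names are hypothetical and the sketch is single-threaded; Netty's ByteBuf does expose retain() and release(), but this is not its implementation. The usage in main shows why one leaky handler is so costly: nothing is ever recycled, so every message allocates a fresh buffer.

```java
import java.util.ArrayDeque;

// Hypothetical, single-threaded pool of reference-counted buffers.
// Loosely modeled on the retain()/release() contract; not Netty's ByteBuf implementation.
final class RefCountedPool {
    private final ArrayDeque<PooledBuf> free = new ArrayDeque<>();
    private final int bufferSize;
    int created;                                  // buffers ever allocated with `new`

    RefCountedPool(int bufferSize) { this.bufferSize = bufferSize; }

    PooledBuf acquire() {
        PooledBuf b = free.poll();                // reuse a recycled buffer if one exists
        if (b == null) {
            b = new PooledBuf(this, new byte[bufferSize]);
            created++;
        }
        b.refCnt = 1;                             // the caller owns one reference
        return b;
    }

    static final class PooledBuf {
        private final RefCountedPool pool;
        final byte[] data;
        private int refCnt;

        PooledBuf(RefCountedPool pool, byte[] data) { this.pool = pool; this.data = data; }

        PooledBuf retain() {
            if (refCnt <= 0) throw new IllegalStateException("already released");
            refCnt++;
            return this;
        }

        /** Drops one reference; returns true if the buffer went back to the pool. */
        boolean release() {
            if (refCnt <= 0) throw new IllegalStateException("already released");
            if (--refCnt == 0) {
                pool.free.push(this);
                return true;
            }
            return false;
        }
    }

    public static void main(String[] args) {
        RefCountedPool pool = new RefCountedPool(256);
        for (int i = 0; i < 1000; i++) pool.acquire().release(); // well-behaved handler
        System.out.println(pool.created);                        // 1
        for (int i = 0; i < 1000; i++) pool.acquire();           // leaky handler: never releases
        System.out.println(pool.created);                        // 1000
    }
}
```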
Was it worthwhile to make such big changes?
Because of the changes mentioned above, Netty 4 is not backward compatible with Netty 3. This means our projects built on top of Netty 3, as well as other community projects, have to spend a non-trivial amount of time on migration. Is it worth doing that?
We compared two echo protocol servers built on top of Netty 3 and Netty 4 respectively. (Echo is simple enough that any garbage created is Netty’s fault, not the protocol’s.) I let them serve the same distributed echo protocol clients with 16,384 concurrent connections sending 256-byte random payloads repetitively, nearly saturating gigabit Ethernet.
According to our test results, Netty 4 had:
- About 5 times less frequent GC pauses: 45.5 vs. 9.2 times/min (Netty 3 vs. Netty 4)
- About 5 times less garbage production: 207.11 vs. 41.81 MiB/s (Netty 3 vs. Netty 4)
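As a quick sanity check on the rounded "5 times" figures, the ratios of the reported Netty 3 and Netty 4 numbers both come out just under 5. The class name is hypothetical; the inputs are exactly the values from the test results.

```java
import java.util.Locale;

// Hypothetical helper: recomputes the Netty 3 / Netty 4 ratios from the reported results.
final class EchoBenchmarkRatios {
    static double ratio(double netty3, double netty4) { return netty3 / netty4; }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the system locale.
        System.out.printf(Locale.ROOT, "GC pauses: %.2fx%n", ratio(45.5, 9.2));     // 4.95x
        System.out.printf(Locale.ROOT, "Garbage:   %.2fx%n", ratio(207.11, 41.81)); // 4.95x
    }
}
```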

I also wanted to make sure our buffer pool is fast enough. Here’s a graph where the X and Y axes denote the size of each allocation and the time taken to allocate a single buffer, respectively:
As you can see, the buffer pool is much faster than the JVM as the buffer size increases. The difference is even more noticeable for direct buffers. However, it could not beat the JVM for small heap buffers, so we have something to work on here.
Moving forward
Although some parts of our services have already migrated from Netty 3 to 4 successfully, we are performing the migration gradually. We discovered some barriers that slow our adoption, which we hope to address in the near future: