Friday, November 4, 2011

New efficient Ehcache Automatic Resource Control (ARC)

To speed up an application, the most common technique is caching, and Ehcache is the most widely used cache in the Java world. BigMemory Ehcache with Terracotta can cache terabytes of data without any GC issues, bringing data closer to the clustered application in an efficient manner. The Terracotta Server stores all the data in BigMemory and serves the required data to the clustered application. With the new feature, BigMemory at the application level (the Terracotta client, or Layer 1/L1), we bring cached data even closer to the application.

In an application, we might need to cache different types of data in different caches. At any point in time, one of the caches may be used heavily while the others see only a few hits. Before the Terracotta 3.6.0 release, we could specify the number of elements to be cached per cache, but with limited heap/BigMemory it is hard to allocate space to each cache efficiently. The memory allocated to each cache is fixed, so even if a cache is not used much, its data stays resident with the application.
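As a sketch of what that fixed, per-cache sizing looked like before 3.6.0 (sizes and cache names here are illustrative, using the pre-2.5 per-cache attributes):

```xml
<!-- Pre-ARC: each cache gets its own fixed allocation, whether it is busy or idle -->
<cache name="cache_0"
       maxElementsInMemory="10000"
       overflowToOffHeap="true"
       maxMemoryOffHeap="1500M"/>
<cache name="cache_1"
       maxElementsInMemory="10000"
       overflowToOffHeap="true"
       maxMemoryOffHeap="1500M"/>
```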

With the new feature, Automatic Resource Control (ARC), Ehcache manages the allocated heap/BigMemory based on cache usage. As usage of cache_0 increases, Ehcache tries to allocate more memory to cache_0, and as usage of cache_1 increases, it rebalances the space between the two caches.

Test Case

Here is a small test which creates two caches, cache_0 and cache_1, and loads data into both. During the test phase, threads access both caches, but for one of the caches a small delay is introduced after each transaction, reducing the hits to that cache.

Each cache can store 1.5GB of data, 3GB in total, while the BigMemory allocated at the application level is only 2GB. That is enough to hold all the cached data for one of the caches, but not both.

Access Pattern
  1. For 2 mins, both caches are used
  2. For the next 10 mins, cache_0 is used heavily
  3. Again for 2 mins, both caches are used
  4. For the next 10 mins, cache_1 is used heavily
  5. Repeat
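The alternating pattern above can be sketched as a small helper that tells a worker thread which phase it is in at a given elapsed time (the class, method, and phase names are ours, not from the actual test harness):

```java
public class AccessPattern {
    public enum Phase { BOTH, CACHE_0_HEAVY, CACHE_1_HEAVY }

    // One full cycle is 24 minutes:
    //   minutes  0-2   both caches used
    //   minutes  2-12  cache_0 used heavily
    //   minutes 12-14  both caches used
    //   minutes 14-24  cache_1 used heavily
    // During a "heavy" phase, the other cache sees a small per-transaction
    // delay, which is what reduces its hit rate in the test.
    public static Phase phaseFor(long elapsedMinutes) {
        long m = elapsedMinutes % 24;
        if (m < 2)  return Phase.BOTH;
        if (m < 12) return Phase.CACHE_0_HEAVY;
        if (m < 14) return Phase.BOTH;
        return Phase.CACHE_1_HEAVY;
    }
}
```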

First, let's look at the case without ARC.

The tps remains almost constant even when one of the caches is not being used; whether one or both caches are active, application throughput is the same.

The L1 BigMemory usage remains constant throughout the test.

Now let's see the benefits of ARC.

With ARC, whenever the full data set of the active cache fits in application BigMemory, throughput gets a boost and touches 140k txn/sec. When both caches are being used, the tps is almost the same with and without ARC.

The throughput variation can be understood from the graph of L1 BigMemory usage. We can see the L1 BigMemory usage for cache_0 increasing as it is heavily used; over time, most of the memory goes to cache_0. As cache_1 usage increases, its memory usage grows in turn, giving the throughput a boost.

To enable ARC, we just need to set maxBytesLocalOffHeap at the CacheManager level.

Here is a sample ehcache.xml:
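A minimal sketch of such a configuration (the Terracotta server URL, cache names, and sizes are placeholders, not the actual test configuration):

```xml
<ehcache maxBytesLocalHeap="512M" maxBytesLocalOffHeap="2G">

  <!-- Point the client (L1) at the Terracotta server -->
  <terracottaConfig url="localhost:9510"/>

  <!-- No per-cache off-heap sizing: both caches draw from the shared
       2G CacheManager-level pool, and ARC balances it between them -->
  <cache name="cache_0" eternal="true">
    <terracotta/>
  </cache>
  <cache name="cache_1" eternal="true">
    <terracotta/>
  </cache>

</ehcache>
```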


This test uses a single client attached to the Terracotta Server to keep the test case simple. With multiple nodes, the benefits would be even more pronounced. :)

The above test case uses only two caches; just picture having tens of caches, where tuning each cache individually would be a real problem. With Ehcache ARC, it is Ehcache's responsibility to manage the data efficiently, getting the maximum throughput out of the application.
