Thursday, November 24, 2011

From "Terabytes :O" to "Just Terabytes!!"


My first computer had 64MB of RAM. Since then, technology has improved and become a lot cheaper; we can get terabytes of RAM easily.

But when I talk about storing a TB of data in a single Java application/process,
I get reactions like: are you insane?! A TB of data in a Java application? It won't even start, and if it gets into GC (garbage collection), you can go have a coffee at Starbucks and it still won't be finished.

Then I say: BigMemory is the saviour, you don't have to worry about GC any more.
But still, can BigMemory store TBs of data without any performance degradation?

Here is my experiment: I tried loading 1TB of data into a single JVM with BigMemory.
I loaded around a billion elements with a payload of roughly 850 bytes each. The total came to ~900GB before hitting hardware limitations, but with more hardware we could certainly push it past 1TB.
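As a back-of-the-envelope check on those numbers (a quick sketch, using 850 bytes as the approximate payload size):

```java
public class DataSizeCheck {
    public static void main(String[] args) {
        long payloadBytes = 850L;                    // approximate payload per element
        long totalBytes = 900L * 1024 * 1024 * 1024; // ~900GB loaded in the test
        long elements = totalBytes / payloadBytes;
        System.out.println(elements);                // roughly 1.1 billion elements
    }
}
```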

I came across a huge box with 1TB of RAM, which made this happen. To reduce GC issues, the JVM heap was kept at just 2GB.
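For reference, a BigMemory setup like this is launched with a small heap and a large direct-memory allowance; here is a sketch of such a command line (the jar and main class are placeholders, and the exact flags for your BigMemory version should be checked against its docs):

```shell
# 2G heap keeps GC pauses short; off-heap storage is backed by direct memory.
java -Xms2g -Xmx2g \
     -XX:MaxDirectMemorySize=990g \
     -cp app.jar com.example.LoadTest   # placeholder classpath and main class
```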

The test creates an Ehcache and loads the data into it. Here is the ehcache configuration used for the test:

<ehcache
    name="cacheManagerName_0"
    maxBytesLocalOffHeap="990g">
    <cache
        name="mgr_0_cache_0"
        maxEntriesLocalHeap="3000"
        overflowToOffHeap="true"/>
</ehcache>
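The warmup loop itself is straightforward; here is a minimal stand-alone sketch of the idea, using a plain ConcurrentHashMap in place of the Ehcache instance and a tiny element count so it runs anywhere (in the real test the puts go into the off-heap cache configured above):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WarmupSketch {
    public static void main(String[] args) {
        // Stand-in for the off-heap cache; the real test puts into Ehcache.
        Map<Long, byte[]> cache = new ConcurrentHashMap<>();
        int elements = 10_000;             // the real test loads ~1 billion
        for (long key = 0; key < elements; key++) {
            cache.put(key, new byte[850]); // ~850-byte payload per element
        }
        System.out.println(cache.size());  // prints 10000
    }
}
```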

Here is a graph of the warmup throughput over time, sampled every 4 seconds. The secondary axis of the chart shows the total data stored.



There are a few slight dips in the chart; these happen when BigMemory is expanding to store more data. The throughput stays above 200,000 txns/sec the whole time, with an average of 350,000 txns/sec.

Latencies are also a big concern for applications.


Latency stays below 1 ms, with an average of 300 µs.

Okay, we have loaded a TB of data; now what? Does it even work?

Yes, it does. The test phase performs read-write operations over the data set: it randomly selects an element, updates it, and puts it back into the cache.
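The read-write phase can be sketched like this (again with a plain map standing in for the cache, and a small key space so the sketch runs quickly):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class ReadWriteSketch {
    public static void main(String[] args) {
        Map<Long, byte[]> cache = new ConcurrentHashMap<>();
        long keySpace = 1_000;                   // tiny stand-in key space
        for (long k = 0; k < keySpace; k++) cache.put(k, new byte[850]);

        // Read-write phase: pick a random element, mutate it, put it back.
        for (int txn = 0; txn < 100_000; txn++) {
            long key = ThreadLocalRandom.current().nextLong(keySpace);
            byte[] value = cache.get(key);       // read
            value[0]++;                          // update
            cache.put(key, value);               // write back
        }
        System.out.println(cache.size());        // key space unchanged: 1000
    }
}
```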




I'd say the throughput and latencies are not bad at all :)
The spikes are due to JVM GC; even with a 2GB heap we will have a few GCs, but the pause time is under 2 seconds. So the maximum latency for the test comes out around 2 secs, while the 99th percentile is around 500 µs.

So if your application is slowed down by a database, or you are spending thousands of dollars maintaining one,
get BigMemory and offload your database!

There would be concerns about searching this huge data set; Ehcache Search makes that possible.
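For context, search is enabled per cache in the configuration; here is a sketch of what that could look like (the search attribute name and expression are made-up examples, not from this test):

```xml
<cache name="mgr_0_cache_0"
       maxEntriesLocalHeap="3000"
       overflowToOffHeap="true">
    <searchable>
        <!-- hypothetical attribute extracted from each cached value -->
        <searchAttribute name="customerId" expression="value.getCustomerId()"/>
    </searchable>
</cache>
```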



Friday, November 4, 2011

New efficient Ehcache Automatic Resource Control (ARC)

To speed up an application, the most common technique is caching, and Ehcache is the most commonly used cache in the Java world. BigMemory Ehcache with Terracotta can cache terabytes of data without any GC issues, bringing data closer to the clustered application in an efficient manner. The Terracotta Server stores the whole data set in BigMemory and provides the required data to the clustered application. With the new feature, BigMemory at the application level (the Terracotta client, or Layer 1/L1), we bring cached data even closer to the application.

In an application, we might need to cache different types of data in different caches. At some point in time, one of the caches may be used heavily with only a few hits to the others. Before the Terracotta 3.6.0 release, we could specify the number of elements to be cached per cache, but with limited heap/BigMemory it gets tough to allocate space to each cache efficiently. The memory allocated to each cache is fixed, so even if a cache is not used much, its data still resides with the application.

With the new feature, Automatic Resource Control (ARC), Ehcache manages the allocated heap/BigMemory based on cache usage. With increased usage of cache_0, Ehcache tries to allocate more memory to cache_0, and as the usage of cache_1 increases, it rebalances the space between the two caches.
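The balancing idea can be pictured with a toy calculation: a fixed pool whose per-cache share follows recent usage (this is only an illustration of the concept, not Ehcache's actual algorithm):

```java
public class ArcSketch {
    public static void main(String[] args) {
        long poolBytes = 2L * 1024 * 1024 * 1024;    // shared 2GB pool
        long hitsCache0 = 9_000, hitsCache1 = 1_000; // recent usage counts
        long share0 = poolBytes * hitsCache0 / (hitsCache0 + hitsCache1);
        long share1 = poolBytes - share0;
        // The heavily used cache gets the larger share of the pool.
        System.out.println(share0 > share1);         // prints true
    }
}
```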

Test Case

Here is a small test which creates two caches and loads data into both. During the test phase, threads access both caches, cache_0 and cache_1, but for one of them a small delay is introduced after each transaction, reducing the hits to that cache.

Each cache can store 1.5GB of data, 3GB in total, while only 2GB of BigMemory is allocated at the application level. That is enough to store the full data set of one cache, but not both.

Access Pattern
  1. For 2 mins, both caches are used
  2. For the next 10 mins, cache_0 is used more heavily
  3. Again, for 2 mins both caches are used
  4. For the next 10 mins, cache_1 is used heavily
  5. Repeat

First, let's look at the case without ARC.




The tps remains almost constant even when the other cache is not being used; when both caches are in use, the application throughput is also the same.



The L1 BigMemory usage remains constant throughout the test.

Now for the benefits of ARC.




With ARC, we can see that when the full cached data set fits in application-level BigMemory, the throughput gets a boost and touches 140k txns/sec. When both caches are in use, the tps is almost the same with or without ARC.



The throughput variation can be understood from the graph of L1 BigMemory usage. We can see the L1 BigMemory usage for cache_0 increasing as it is heavily used; over time, most of the memory goes to cache_0. As cache_1 usage increases, its memory share grows too, giving the throughput a boost.

To enable ARC, we just need to provide maxBytesLocalOffHeap at the CacheManager level.

Here is a sample ehcache.xml:

<ehcache
    name="cacheManagerName_0"
    maxBytesLocalHeap="512m"
    maxBytesLocalOffHeap="2g">
    <defaultCache/>
    <cache
        name="cache_1"
        overflowToOffHeap="true">
        <terracotta/>
    </cache>
    <cache
        name="cache_0"
        overflowToOffHeap="true">
        <terracotta/>
    </cache>
    <terracottaConfig
        url="localhost:9510"/>
</ehcache>

This test uses a single client attached to the Terracotta Server to keep the test case simple. With multiple nodes, the benefits would be even more visible. :)

The above test case uses only two caches; just picture having tens of caches, where tuning each one would be a problem. With Ehcache ARC, it is Ehcache's responsibility to manage the data efficiently, getting the maximum throughput out of the application.