
Sunday, October 7, 2012

BigMemory Go - Fast Restartable Store: Kickass!!


BigMemory Go is the latest product offering from Terracotta. BigMemory Go lets you keep all of your application's data instantly available in your server's memory. Your application responds promptly when your data is closest to the application.

A typical application has a database storing gigabytes of data, which makes the application slow. BigMemory Go provides a brand new reliable, durable, crash-resilient and, above all, fast storage called the Fast Restartable Store. The Fast Restart feature keeps a fully consistent copy of the in-memory data on the local disk at all times.

We generally use a database for bookkeeping, state-of-record, persistent data. BigMemory Go provides a durable persistent data store that can handle any kind of shutdown, planned or unplanned. The next time your application starts up, all of the data that was in memory is still available and very quickly accessible.

Here we compare the performance of a MySQL DB (tuned to the best of my knowledge), Ehcache as a Hibernate 2nd-level cache (with a DB backend), and the BigMemory store. The sample application is Spring's classic PetClinic app (using Hibernate), modified for simplicity, stripped of Spring WebFlow, and converted into a standalone Java performance test. The app has a number of Owners, who have Pets and corresponding Visits to the Clinic. The test initially creates a number of Owners with pets and visits and puts them into storage. Then it executes a read/write test (10% writes) for 30 minutes.

The three cases considered here are:
  1. No Hibernate caching, with DB
  2. Ehcache with BigMemory as the Hibernate 2nd-level cache, with DB backend
  3. BigMemory Go Fast Restartable Store (FRS)
The performance comparison chart shows that BigMemory Go FRS outperforms the competition. With no Hibernate or DB transaction management overhead, it simply runs faster.
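For reference, the FRS case stores the entities in a restartable Ehcache. A minimal sketch of such a configuration (cache name, sizes, and disk path are illustrative, not the exact ones from the test):

```xml
<ehcache name="petclinic">
    <!-- FRS writes its log/data files under this path -->
    <diskStore path="/data/frs"/>
    <cache name="owners"
           maxEntriesLocalHeap="3000"
           maxBytesLocalOffHeap="2g"
           overflowToOffHeap="true">
        <!-- localRestartable enables the Fast Restartable Store -->
        <persistence strategy="localRestartable"/>
    </cache>
</ehcache>
```

On restart, the cache manager replays the on-disk copy, so the data that was in memory at shutdown (planned or not) is available again.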

DB vs Ehcache h2lc vs BigMemory FRS

BigMemory Go FRS delivers a consistent 50 µs latency 99.99% of the time, compared to a 3 ms average latency for the Ehcache 2nd-level cache and 60 ms for the MySQL DB. The Hibernate 2nd-level cache surely makes reads faster, but there is still too much Hibernate and JPA transaction overhead on top of it.

BigMemory Go FRS is a new-age data store: durable, reliable and super-fast. ;)


Test Sources (Maven Project Download)

Thursday, May 31, 2012

Fast, Predictable & Highly-Available @ 1 TB/Node




The world is pushing huge amounts of data to applications every second, from mobiles, the web, and various gadgets. More applications these days have to deal with this data. To preserve performance, these applications need fast access to the data tier.

RAM prices have tumbled over the past few years, and hardware with a terabyte of RAM is now much cheaper. OK, we've got the hardware, now what? We generally use virtualization to carve out smaller virtual machines to meet applications' scale-out requirements, since a Java application with a terabyte of heap is impractical: JVM garbage collection would slaughter the application right away. Ever imagined how long a single full garbage collection of a terabyte heap would take? It can pause an application for hours, making it unusable.

BigMemory is the key to accessing terabytes of data with milliseconds of latency, with no disk/RAID/database maintenance.

BigMemory = Big Data + In-memory

BigMemory can utilize your hardware down to the last byte of RAM, storing up to a terabyte of data in a single Java process.

BigMemory provides "fast", "predictable" and "highly-available" data at 1 terabyte per node.

The following test uses two boxes, each with a terabyte of RAM. Leaving enough room for the OS, we were able to allocate 2 x 960 GB of BigMemory, for a total of over 1.8 TB of data. No high latencies, no huge scale-out architecture ... just the hardware, used as it is.
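Each box's 960 GB of BigMemory is simply Ehcache off-heap storage. A configuration along these lines would look roughly like this (cache name and heap entry count are illustrative):

```xml
<ehcache maxBytesLocalOffHeap="960g">
    <cache name="dataCache"
           maxEntriesLocalHeap="3000"
           overflowToOffHeap="true"/>
</ehcache>
```

The JVM also needs its direct-memory limit raised (e.g. -XX:MaxDirectMemorySize=980g) while keeping the heap small (say -Xmx2g), so GC pauses stay negligible.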

Test results: 23K readonly transactions per second with 20 ms latency.
Graphs for test throughput and periodic latency over time.



Readonly Periodic Throughput Graph


Readonly Periodic Latency Graph

Friday, May 11, 2012

Billions of Entries and Terabytes of Data - BigMemory




Combine BigMemory and Terracotta Server Array for Performance and Scalability

The age of Big Data is upon us. With ever expanding data sets, and the requirement to minimize latency as much as possible, you need a solution that offers the best in reliability and performance. Terracotta’s BigMemory caters to the world of big data, giving your application access to literally terabytes of data, in-memory, with the highest performance and controlled latency.
At Terracotta, while working with customers and testing our releases, we continuously experiment with huge amounts of data. This blog illustrates how we were blown away by the results of a test using BigMemory and the Terracotta Server Array (TSA) to cluster and distribute data over multiple computers.

Test Configuration

Our goal was to take four large servers, each with 1TB of RAM, and push them to the limit with BigMemory in terms of data set size, as well as performance while reading and writing to this data set. As shown in Figure 1, all four servers were configured using the Terracotta Server Array to act as one distributed but uniform data cache.
Figure 1 - Terracotta Server Array with a ~4TB BigMemory Cache

We then configured 16 instances of BigMemory across the four servers in the Terracotta Server Array, with 952 GB of BigMemory in-memory cache allocated per server. This left enough free RAM available to the OS on each server. With the Terracotta Server Array, you can configure large data caches with high availability, with your choice of data striping and/or mirroring across the servers in the array (for more information, read http://terracotta.org/documentation/terracotta-server-array/configuration-guide or watch this video: http://blip.tv/terracotta/terracotta-server-array-striping-2865283). The end result was 3.8 terabytes of BigMemory available to our sample application for its in-memory data needs.
Next, we ran 16 instances of our test application, each on its own server, to load data and then perform read and write operations. Additionally, the Terracotta Developer Console (see Figure 2) makes it quick and simple to view and test your in-memory data performance while your application is running. Note that we could have configured BigMemory on these servers as well, thereby forming a Level 1 (L1) cache layer for even lower latency access to hot-set data. However, in this blog we decided to focus on the near-linear scalability of BigMemory as configured on the four stand-alone servers. We’ll cover L1 and L2 cache hierarchies and hot-set data in a future blog.
Figure 2 - The developer console helps configure your application's in-memory data topology.
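On the application side, pointing an Ehcache at the Terracotta Server Array is a matter of adding a terracottaConfig element and marking caches as clustered. A minimal sketch (hostname and cache name are illustrative):

```xml
<ehcache>
    <!-- Point the cache manager at one of the TSA servers -->
    <terracottaConfig url="tsa-host-1:9510"/>
    <cache name="bigDataCache"
           maxEntriesLocalHeap="1000">
        <!-- Mark the cache as clustered/distributed via the TSA -->
        <terracotta/>
    </cache>
</ehcache>
```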

Data Set Size (2 billion entries; 4TB total)

Now that we had our 3.8 terabyte BigMemory up and running, we loaded close to 4 terabytes of data into the Terracotta Server Array at a rate of about 450 gigabytes per hour. The sample objects loaded were data arrays, each 1700 bytes in size. To be precise, we loaded 2 billion of these entries with 200GB left over for key space.
The chart below outlines the data set configuration, as well as the test used:
Test Configuration

Test: Offheap-test
svn url:
Element #: 2 Billion
Value Size: 1700 bytes (simple byte arrays)
Terracotta Server #: 16
Stripes #: 16 (1 Terracotta Server per Mirror Group)
Application Node #: 16
Terracotta Server Heap: 3 GB
Application Node Heap: 2 GB
Cache Warmup Threads: 100
Test Threads: 100
Read Write %age: 10
Terracotta Server BigMemory: 238 GB/Stripe. Total: 3.8 TB

The Test Results

As mentioned above, we ran 16 instances of our test application, each loading data at a rate of 4,000 transactions per second (tps) per server, for a total of 64,000 tps. At this rate we were limited mostly by our network: 64K tps at our sample data size translates to around 110 MB per second, which is almost 1 Gigabit per second (our network's theoretical maximum). Figure 3 graphs the average latency measured while loading the data.
Figure 3 - Average latency, in milliseconds, during the load phase.
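The bandwidth arithmetic above is easy to check: 64K tps of 1700-byte entries works out to roughly 110 MB/s of payload, close to the 1 Gbit/s ceiling of the network:

```java
public class BandwidthCheck {
    public static void main(String[] args) {
        long tps = 64_000;           // total load rate: 4,000 tps x 16 application nodes
        long entryBytes = 1_700;     // size of each sample value
        double mbPerSec = tps * entryBytes / 1e6;    // payload bandwidth in MB/s
        double gbitPerSec = mbPerSec * 8 / 1_000;    // ... and in Gbit/s
        System.out.printf("%.1f MB/s = %.2f Gbit/s%n", mbPerSec, gbitPerSec);
    }
}
```

This ignores protocol overhead, so the real wire usage is somewhat higher, which is why the network was the bottleneck.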

The test phase consisted of operations distributed as 10% writes and 90% reads on randomly accessed keys over our data set of 2 billion entries. The chart below summarizes the incredible performance, measured in tps, of our sample application running with BigMemory.


Test Results

Warmup Avg TPS: 4k / Application Node = 64k total
Warmup Avg Latency: 24 ms
Test Avg TPS: 4122 / Application Node = 66k total
Test Avg Latency: 23 ms
We reached this level of performance with only four physical servers, each with 1 terabyte of RAM. With more servers and more RAM, we can easily scale up to 100 terabytes of data, almost linearly. This performance and scalability simply wouldn’t be possible without BigMemory and the Terracotta Server Array. 

Check out the following links for more information:
Terracotta BigMemory: http://terracotta.org/documentation/bigmemory/overview


Thursday, November 24, 2011

From "Terabytes :O" to "Just Terabytes!!"


My first computer had 64 MB of RAM; since then technology has improved and gotten a lot cheaper. We can get terabytes of RAM easily now.

But when I talk about storing a TB of data in a single Java application/process,
I get reactions like "are you insane or what?!" A TB of data in a Java application? It won't even start, and if it gets into GC (Garbage Collection), you can go have a coffee at Starbucks and it still won't be finished.

Then I say: BigMemory is the saviour, you don't have to worry about GCs any more.
But can BigMemory really store TBs of data without any performance degradation?

Here is my experiment: I tried loading 1 TB of data into a single JVM with BigMemory.
I loaded a trillion bytes of data (yes, you read that correctly, a thousand times a billion bytes): over a billion elements of around 850 bytes of payload each. The total data is ~900 GB; I hit the hardware's limits, but we could surely push past a TB given the hardware.

A huge box with 1 TB of RAM came my way, which made this possible. To avoid GC issues, I reduced the JVM heap to 2 GB.

The test creates an Ehcache and loads the data into it. Here is the Ehcache configuration used for the test:

<ehcache
    name="cacheManagerName_0"
    maxBytesLocalOffHeap="990g">
    <cache
        name="mgr_0_cache_0"
        maxEntriesLocalHeap="3000"
        overflowToOffHeap="true"/>
</ehcache>

Here is the graph of periodic (4-second interval) warmup throughput over time. The secondary axis of the chart shows the total data stored.



There are a few slight dips in the chart; these occur when BigMemory is expanding to store more data. The throughput stays above 200,000 txns/sec at all times, with an average of 350,000 txns/sec.

Latency is also a big concern for applications.


It stays below 1 ms, with an average of 300 µs.

Okay, we have loaded a TB of data; now what? Does it even work?

Yes, it does. The test phase performs read/write operations over the data set: it randomly selects an element, updates it, and puts it back into the cache.
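The read/write step can be sketched like this (a minimal stand-in using a plain ConcurrentHashMap in place of the cache; names and sizes are illustrative, not the actual test harness):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ThreadLocalRandom;

public class ReadWritePhase {
    // Randomly select an element, update it, and put it back.
    static void step(Map<Long, byte[]> cache, long keySpace) {
        long key = ThreadLocalRandom.current().nextLong(keySpace);
        byte[] value = cache.get(key);      // read
        if (value != null) {
            value[0]++;                     // mutate the payload
            cache.put(key, value);          // write it back
        }
    }

    public static void main(String[] args) {
        long keySpace = 1_000;
        Map<Long, byte[]> cache = new ConcurrentHashMap<>();
        for (long k = 0; k < keySpace; k++) {
            cache.put(k, new byte[850]);    // ~850-byte payload per entry
        }
        for (int i = 0; i < 10_000; i++) {
            step(cache, keySpace);
        }
        System.out.println("entries: " + cache.size());
    }
}
```

In the real test the map is the Ehcache instance configured above, and many threads run this loop concurrently.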




I'd say the throughput and latencies are not bad at all :)
The spikes are due to JVM GC; even with a 2 GB heap we get a few GCs, but the pause time is under 2 seconds. So the max latency for the test is around 2 seconds, but the 99th percentile is around 500 µs.

So if your application is slowed down by a database, or you are spending thousands of dollars maintaining one:
Get BigMemory and offload your database!

There may be concerns about searching this huge data set; Ehcache Search makes that possible too.



Tuesday, November 9, 2010

"Terracotta Fairy" brings BigMemory for Java users

I used to discuss the goodness of Java with my friends working on native platforms. But they would always crib about Java: it's slower, max latency is high, Garbage Collection (GC) ruins the user experience, and tuning GC is a NIGHTMARE! I didn't have anything to defend Java on these points :(, because they're facts.
Garbage collection kills the Java application.

In the Java world, loading 100 GB of data onto the Java heap sounds crazy. I ran some experiments loading 100 GB of data into a single JVM. Even in the read-only case (no writes/updates, to reduce GC problems), it wouldn't fit in 150 GB of heap.

Tuning GC (reducing the young-gen space, reducing the survivor ratio, etc.) didn't help much. The test just ran into back-to-back full GCs, killing application throughput and latency. To get it working at all I had to give it 200 GB of heap, and it performed terribly.

Then I wished: wouldn't it be really nice to fit the whole data set in memory, without any GC problems? The "Terracotta Fairy" listened, and here we have BigMemory Ehcache. BigMemory is a GC murder weapon from Terracotta, like an AA-12 shotgun.
Now we can store 350 GB of data with no GC.
Can you believe this?! Literally NO GC!!

Want to see it with your own eyes? Here are the charts from the battle of Troy: On-Heap vs BigMemory.

The following charts show an Ehcache use case that I thought would be fair. Ehcache, the most widely used Java cache, already outperforms the other available caches. I didn't want to choose the best case for BigMemory (100% writes) nor the best case for on-heap (read-only), so the read/write ratio is 50% reads and 50% writes. The hot set is such that 90% of the time, cache.get() accesses 10% of the key set. This is representative of the familiar Pareto distribution that is very commonly observed. The test loads the full data set into Ehcache and then starts doing r/w operations on it.
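The hot-set key distribution can be sketched as follows (illustrative, not the actual harness: 90% of accesses land in the first 10% of the key space):

```java
import java.util.Random;

public class AccessPattern {
    // Pick the next key: 90% of accesses hit a "hot set" of 10% of the keys.
    public static long nextKey(Random rnd, long keySpace) {
        long hotSetSize = keySpace / 10;                 // 10% of keys are "hot"
        if (rnd.nextDouble() < 0.9) {                    // 90% of accesses
            return (long) (rnd.nextDouble() * hotSetSize);
        }
        return hotSetSize + (long) (rnd.nextDouble() * (keySpace - hotSetSize));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        long keySpace = 1_000_000;
        long hotHits = 0, samples = 100_000;
        for (long i = 0; i < samples; i++) {
            if (nextKey(rnd, keySpace) < keySpace / 10) hotHits++;
        }
        System.out.printf("hot-set hit ratio: %.2f%n", (double) hotHits / samples);
    }
}
```

Each test thread draws keys from this distribution and then flips a 50/50 coin between cache.get() and cache.put().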

The test machine was a Cisco UCS box with Intel(R) Xeon(R) processors: six quad-core 2.93 GHz Xeon CPUs for a total of 24 cores, 378 GB (396191024 kB) of RAM, running RHEL 5.1 with Sun JDK 1.6.0_21 in 64-bit mode.

The BigMemory test case used just 2 GB of Java heap even when loading 350 GB of data, while the on-heap test cases used a Java heap of twice the data size.
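The BigMemory side of such a test case would be configured roughly like this (cache name and heap entry count are illustrative; attribute names vary slightly across Ehcache releases):

```xml
<ehcache>
    <cache name="testCache"
           maxEntriesLocalHeap="10000"
           maxBytesLocalOffHeap="350g"
           overflowToOffHeap="true"/>
</ehcache>
```

The JVM would be started with a small heap and a large direct-memory limit, e.g. -Xmx2g -XX:MaxDirectMemorySize=360g.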





This chart compares the longest full-GC pause observed during a full run of the test. The numbers are taken from verbose GC logs.
If you take a microscope, you can see a small green bar beside the huge, Burj-tower-like red bars: those are the GC durations for BigMemory :). Barely going above 1.2 seconds, BigMemory surely kills garbage collection and removes that stigma from Java.



This chart compares the max latency during the test run. As expected, it should equal the max full-GC duration, since GC simply blocks the application. BigMemory fairly defeats on-heap here. Anyway, who would want 4-5 minute pauses in their application? Not me, at least!



Let's see how BigMemory throughput behaves as the data size increases. The chart above shows that after a certain point the throughput is unaffected by data size. I also did a run with 350 GB of data, and the tps/latency stayed constant. (Did we ever think of caching 350 GB of data in an application? :O) The drop in tps from 512 MB to 4 GB of data is because, for smaller data sizes, Ehcache stores the entries on heap (remember the test has a 10% hot set, so as long as the hot set is on heap and small enough to fit, it's faster), and we don't have much GC occurring at those sizes.



Latency: the factor we worry about most for user experience. We don't want our users waiting 5 seconds, because the first impression is the last impression. The charts show the mean latency for the tests. Note that all the numbers are in microseconds, so they are well under 0.5 seconds, meeting your deadliest SLAs. BigMemory wins undoubtedly.



Here comes the biggest test for BigMemory. Why would anyone use BigMemory if it didn't perform as well as on-heap? We can't just trade throughput for latency. We can see that the BigMemory throughput numbers outperform the on-heap numbers here as well. On-heap throughput just keeps decreasing as full GCs kill the test: imagine pausing for 4 minutes mid-test; it will surely drag the average throughput down significantly.
Note: the test I ran is 50% writes, so we might be overshadowing the on-heap throughput, but the 10%-writes throughput numbers were also comparable.



The mean latency graphs tell the same story I have been telling all over this blog: BigMemory outperforms on-heap :)


Do-it-yourself: Here is the svn link to the test, a Maven-based performance comparison between different store configurations.
Note: You will need to get a demo license key and install it as discussed above to run the test.

So bottom line:
If you are fed up with GCs, check out BigMemory.
If you want to cache 350 GB of data, check out BigMemory.
If you want to use the most AWESOME Java cache ever made, check out BigMemory.


If you like the post, vote it up on dzone. :)