Thursday, May 31, 2012

Fast, Predictable & Highly-Available @ 1 TB/Node




The world is pushing huge amounts of data to applications every second, from mobiles, the web, and various gadgets. More applications these days have to deal with this data. To preserve performance, these applications need fast access to the data tier.

RAM prices have crumbled over the past few years and we can now get hardware with a Terabyte of RAM much more cheaply. OK, got the hardware, now what? We generally use virtualization to create smaller virtual machines to meet applications scale-out requirements, as having a Java application with a terabyte of heap is impractical. JVM Garbage Collection will slaughter your application right away. Ever imagined how much time will it take to do a single full garbage collection for a terabyte of heap? It can pause an application for hours, making it unusable.

BigMemory is the key to access terabytes of data with milliseconds of latency, with no maintenance of disk/raid configurations/databases.

BigMemory = Big Data + In-memory

BigMemory can utilize your hardware to the last byte of RAM. BigMemory can store up to a terabyte of data in single java process.

BigMemory provides "fast", "predictable" and "highly-available" data at 1 terabytes per node.

The following test uses two boxes, each with a terabyte of RAM. Leaving enough room for the OS, we were able to allocate 2 x 960 GB of BigMemory, for a total of 1.8+ TB of data. Without facing the problems of high latencies, huge scale-out architectures ... just using the hardware as it is.

Test results: 23K readonly transactions per second with 20 ms latency.
Graphs for test throughput and periodic latency over time.



Readonly Periodic Throughput Graph


Readonly Periodic Latency Graph

Friday, May 11, 2012

Billions of Entries and Terabytes of Data - BigMemory




Combine BigMemory and Terracotta Server Array for Performance and Scalability

The age of Big Data is upon us. With ever expanding data sets, and the requirement to minimize latency as much as possible, you need a solution that offers the best in reliability and performance. Terracotta’s BigMemory caters to the world of big data, giving your application access to literally terabytes of data, in-memory, with the highest performance and controlled latency.
At Terracotta, while working with customers and testing our releases, we continuously experiment with huge amounts of data. This blog illustrates how we were blown away by the results of a test using BigMemory and the Terracotta Server Array (TSA) to cluster and distribute data over multiple computers.

Test Configuration

Our goal was to take four large servers, each with 1TB of RAM, and push them to the limit with BigMemory in terms of data set size, as well as performance while reading and writing to this data set. As shown in Figure 1, all four servers were configured using the Terracotta Server Array to act as one distributed but uniform data cache.
Figure 1 - Terracotta Server Array with a ~4TB BigMemory Cache

We then configured 16 instances of BigMemory across the four servers in the Terracotta Server Array, where each server had 952GB of BigMemory in-memory cache allocated. This left enough free RAM available to the OS on each server. With Terracotta Server Array, you can configure large data caches with high-availability, with your choice of data striping and/or mirroring across the servers in the array (for more information on this, read here http://terracotta.org/documentation/terracotta-server-array/configuration-guide or watch this video http://blip.tv/terracotta/terracotta-server-array-striping-2865283. The end result was 3.8 terabytes of BigMemory available to our sample application for its in-memory data needs.
Next, we ran 16 instances of our test application, each on its own server, to load data and then perform read and write operations. Additionally, the Terracotta Developer Console (see Figure 2) makes it quick and simple to view and test your in-memory data performance while your application is running. Note that we could have configured BigMemory on these servers as well, thereby forming a Level 1 (L1) cache layer for even lower latency access to hot-set data. However, in this blog we decided to focus on the near-linear scalability of BigMemory as configured on the four stand-alone servers. We’ll cover L1 and L2 cache hierarchies and hot-set data in a future blog.
Figure 2 - The developer console helps configure your application's in-memory data topology.

Data Set Size (2 billion entries; 4TB total)

Now that we had our 3.8 terabyte BigMemory up and running, we loaded close to 4 terabytes of data into the Terracotta Server Array at a rate of about 450 gigabytes per hour. The sample objects loaded were data arrays, each 1700 bytes in size. To be precise, we loaded 2 billion of these entries with 200GB left over for key space.
The chart below outlines the data set configuration, as well as the test used:
Test Configuration

Test:
Offheap-test
svn url:
Element #
2 Billion
Value Size
1700 bytes (simple byte arrays)
Terracotta Server #
16
Stripes #
16 (1 Terracotta Server per Mirror Group)
Application Node #
16
Terracotta Server Heap
3 GB
Application Node Heap
2 GB
Cache Warmup Threads
100
Test Threads
100
Read Write %age
10
Terracotta Server BigMemory
238 GB/Stripe. Total: 3.8 TB

The Test Results

As mentioned above, we ran 16 instances of our test application, each loading data at a rate of 4,000 transactions per second (tps), per server, reaching a total of 64,000 tps. At this rate, we were limited mostly by our network, as 64K tps of our data sample size translates to around 110 MB per second, which is almost 1 Gigabit per second (our network’s theoretical maximum). Figure 2 graphs the average latency measured while loading the data.
Figure 3 - Average latency, in millisecond, during the load phase.

The test phase consisted of operations distributed as 10% writes and 90% reads on randomly accessed keys over our data set of 2 billion entries. The chart below summarizes the incredible performance, measured in tps, of our sample application running with BigMemory.


Test Results      

Warmup Avg TPS
4k / Application Node = 64 k total
Warmup Avg Latency
24 ms
Test Avg TPS
4122 / Application Node  = 66 k total
Test Avg Latency
23 ms
We reached this level of performance with only four physical servers, each with 1 terabyte of RAM. With more servers and more RAM, we can easily scale up to 100 terabytes of data, almost linearly. This performance and scalability simply wouldn’t be possible without BigMemory and the Terracotta Server Array. 

Check out the following links for more information:
·         Terracotta BigMemory: http://terracotta.org/documentation/bigmemory/overview