Garbage Collection Logs Analysis Using Splunk
Each application is different and therefore no one size fits all solution for it. However the approach for garbage collector tuning is same for all types of application. Garbage collector tuning is a cyclic process that involves monitoring and adjusting the JVM GC parameters to match the application needs.
In this article, I am going to describe how we can monitor and identify problems related to garbage collection using SPLUNK analytics tool. There are many GC log analysis tools available, however this tutorial focuses on splunk queries and techniques to identify problems in a production application.
Here are the steps to analyze GC logs using splunk
Step 1: Enable GC LogsJava Hot spot VM has a built in support for logging garbage collection activities. This provides a lot of useful information about running application and garbage collection. This log can provide you following information
- Total size of objects using memory
- Number of times GC has been triggered.
- Total available memory for JVM
- Type of GC collection - Full or Partial
Below JVM parameters can be used to enable GC logging in a HotSpot VM, Its highly recommended to enable this logging in production systems and monitor for Full collections.
Additional attribute logging params
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
Sample log entries in GC log file will look like this
[GC 12345.678: [ParNew: 844129K->4912K(943544K), 0.0092400 secs] 1974793K->1135530K(3040896K) icms_dc=0 , 0.0095380 secs]
[GC 123567.891: [ParNew: 841112K->4824K(943544K), 0.0087620 secs] 2021920K->1184327K(3040896K) icms_dc=0 , 0.0090380 secs]
[GC 232617.109: [Full GC 7525K->7477K(60800K), 0.0615190 secs]
Step 2 : Index GC Logs in SplunkOnce the logs are setup for your application you can start indexing it in splunk. This can be an easy task if you already have splunk setup. For this article I am assuming you can setup the splunk for GC log file.
Step 3 : Query Splunk For Full Garbage collectionsOnce the GC log file is setup in splunk you can start querying it. The most important check on a GC log is to scan for a full collection. Full collections can cause application pauses and create a lot of performance issues. Full collections are common in two types of application
- Applications with memory leaks
- Applications with heavy load that have poor or no GC tuning.
To check for a Full GC collection enter this query in your splunk,
index=myindex sourcetype=myappsource “Full GC"
Run this query for a days time and see if you get any results. Every row in the results will correspond to a Full GC collection. If you do not see any results run this query for longer period of time and observe the application behavior for several days of garbage collection cycle.
How Many Full GC are Good?In a healthy and well tuned application there should be no full GC collection required. Therefore any occurrence of a Full GC log entry is an opportunity for you to optimize your application performance by either fixing memory leaks or tuning garbage collector algorithm to fit you application needs.
How To Avoid Full GC?Avoiding full GC may not be easy. Below are 5 Key Tips to Avoid Full GC in your application
Tip #1Full GC collections are very common in applications that have memory leaks. Therefore, profile the application thoroughly and ensure all memory leaks are fixed. Try to understand the application memory needs and ensure no extra object or resource is left leaking.
Tip #2You may also observe full GC in heavy load applications that have no memory leaks. This is a simple optimization problem. You should tune the GC for you application to match its needs. Try to tune the GC such that it does small but frequent collections such that there is no backlog.
Tip #3Try to avoid having too big heap size. In case your application requires lot of memory you may want to consider GC tuning before it goes live in production. To a load test run to monitor application for sometime. A typical HotSpot vm may be good for GC up to 3-4Gb heap size. Larger heap size may introduce bigger collections and longer GC pauses.
If you have a fat box with lot of RAM - run multiple smaller JVMs on it instead of running one big JVM.
Tip #4Consider using other JVMs like Azul - http://www.azulsystems.com/zing/pgc or others that claim to have pause-less garbage collector. Such JVM is good to host a large heap application (10+GB of heaps).
Tip #5Leave some memory for operating system. A typical production server must leave 20% of total memory for operating system to deal with crisis.
Step 4 : Query Splunk for Minor Garbage collectionsMinor garbage collections are commonly seen in all java applications. Its not easy to identify a problem by looking at few minor collection log entries, however you can identify a pattern of collection by tracking the frequency of minor collection. JVM does frequent minor collections when a application is creating lot of garbage.
Below is the splunk query to find the frequency of minor collections
index=myindex sourcetype=myappsource “ParNew:” | timechart span=5m count(_time)
This query should display a line chart of minor GC collections over time. Notice the peaks in the chart are the times when GC is working hard to collect more garbage.
Understand the application usage in this time frame and relate it to the GC behavior.
Extracting other values in minor collection logBelow query extracts the various parameters from a minor collection entry and shows them as a table of values.
sourcetype=myappsource index=myindex "ParNew:” | rex "(?i)\\), (?P
[^ ]+)" | rex "(?i).*?\\((?P \\d+\\w+)(?=\\))" | rex "(?i)\\[ParNew: (?P [^\\-]+)" | rex "(?i)\\->(?P [^\\(]+)" | table COLLECTION_TIME, HEAP_SIZE_AVAILABLE, OBJECT_SIZE_BEFORE_GC, OBJECT_SIZE_AFTER_GC
- COLLECTION_TIME - Time take by GC to collect garbage objects.
- HEAP_SIZE_AVAILABLE - Memory size available in heap for application
- OBJECT_SIZE_BEFORE_GC - Size of objects in memory before GC was triggered
- OBJECT_SIZE_AFTER_GC - Size of objects in memory after GC completed.
Step 5 : Setup AlertsSplunk makes it really easy to setup any alerts on your log files. Make sure to setup following alerts
- A must have alerts is for Full GC
- Optional alert can also be set for a limit of minor GC collections within a timeframe. For example get an alert if minor GC collections exceed 100 in 10 minutes window. Make sure you tune this alert based on your application behavior and keep adjusting these alerts when required.
Step 6 : Tune the application JVM paramsOnce you understand the GC pattern in your application you can start working on tuning it. The GC tuning is a big topic and requires a lot of efforts. Some popular options you may want to try are listed below
To Use Mark and Sweep GC
-XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing
Step 7: Repeat Step 3 to 6Tuning application for best performance will require you to do several round of changes. Repeat the steps 3 to 6 until you get the optimal performance. Do not over optimize GC.
Useful Resources of GC Tuning and Monitoring
A tutorial for splunk admin on how to configure gc logs in splunk.
A good tutorial about JVM memory model and GC demonstrating the memory by easy to understand diagrams.
A tool by Oracle to conveniently read GC logs.
An in depth tutorial on how to do GC tuning.
Another good tutorial about monitoring applications using GC logs.
Oracle documentation about GC tuning.
A simple to follow tutorial for enabling and analyzing GC logs in any java application.