10 Splunk Best Practices

Splunk is a popular choice for log analytics. I am a Java developer and love using Splunk for production analytics. I have used Splunk for more than 5 years and like its simplicity.
This article is a list of best practices that I have learned from good Splunk books and from using Splunk in everyday software projects.

Most of these lessons are familiar to any software architect, but it is important to document them for new developers. Doing so makes the software easier to maintain after it goes live in production.

Almost any software becomes difficult to change after it is live in production. There are so many things you may need to worry about. Using these best practices while implementing Splunk logging in your software will help you in the long run.

First Things First: Keep Splunk Logs Separate

Keep Splunk logs separate from debug and error logs. Debug logs can be verbose, so define a separate Splunk log file in your application. This also saves licensing cost, since you will not index unwanted logs.
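
As a rough illustration, the sketch below sets up a dedicated logger that writes only to its own file. It uses java.util.logging simply because it ships with the JDK; the logger name "splunk.audit" and the class name are made up for this example, and a real project would more likely configure the same thing in its existing framework.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.util.logging.FileHandler;
import java.util.logging.Formatter;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class SplunkLoggerFactory {
    // Hypothetical logger name, used only in this example.
    static final String LOGGER_NAME = "splunk.audit";

    private SplunkLoggerFactory() {
    }

    /** Creates a logger that writes one plain line per event to the given file only. */
    static Logger create(String filePath) {
        Logger logger = Logger.getLogger(LOGGER_NAME);
        // Keep Splunk events out of the regular application log.
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.INFO);
        try {
            FileHandler handler = new FileHandler(filePath, true); // true = append
            handler.setFormatter(new Formatter() {
                @Override
                public String format(LogRecord record) {
                    // ISO-8601 timestamp followed by the key=value message
                    return Instant.ofEpochMilli(record.getMillis()) + " "
                            + formatMessage(record) + System.lineSeparator();
                }
            });
            logger.addHandler(handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return logger;
    }
}
```

Call create once at startup and reuse the returned logger; calling it repeatedly would attach a new file handler each time.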

Use Standard Logging Framework

Use an existing logging framework to write to the Splunk log file. Do not invent your own logging framework. Just make sure the Splunk log file stays separate. I recommend using an asynchronous logger to avoid performance problems caused by heavy logging.

Some popular choices of logging frameworks in Java are listed below.

Log In KEY=VALUE Format

Follow the key=value format in Splunk logging. Splunk understands key=value pairs, so your fields are extracted automatically. This format is also easy to read without Splunk, so you may want to follow it for your other logs too.
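
A minimal sketch of a formatter for such lines is shown below. The class name is hypothetical, and the quoting rule is a simplification chosen for this example: values containing spaces or '=' are wrapped in double quotes, and embedded double quotes are swapped for single quotes so the line stays unambiguous.

```java
import java.util.Map;

final class KeyValueFormatter {
    private KeyValueFormatter() {
    }

    /** Renders the fields as space-separated key=value pairs, in the map's iteration order. */
    static String format(Map<String, String> fields) {
        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (line.length() > 0) {
                line.append(' ');
            }
            line.append(field.getKey()).append('=').append(quoteIfNeeded(field.getValue()));
        }
        return line.toString();
    }

    private static String quoteIfNeeded(String value) {
        if (value == null || value.isEmpty()) {
            return "\"\"";
        }
        boolean needsQuotes = value.contains(" ") || value.contains("=") || value.contains("\"");
        if (!needsQuotes) {
            return value;
        }
        // Simplification for this sketch: replace embedded double quotes with single quotes.
        return "\"" + value.replace('"', '\'') + "\"";
    }
}
```

Passing a LinkedHashMap keeps the fields in insertion order, which makes the raw log lines easier to scan by eye.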

Use Shorter KEY Names

Keep key names short, preferably under 10 characters. Even if you have plenty of disk space, it is better to keep tabs on how much you log, since heavy logging may create performance problems in the long run. At the same time, keep the names understandable.

Use Enums For Keys

Define a Java enum, SplunkKey, that holds a description of each key and uses the enum constant's name as the Splunk key.

public enum SplunkKey {
    TXID("Transaction id");

    /**
     * Describes the purpose of field to be splunked - not logged
     */
    private String description;

    SplunkKey(String description) {
        this.description = description;
    }

    public String getDescription() {
        return description;
    }
}

Create A Util Class To Log In Splunk

Define a SplunkAudit class in your project that does all Splunk logging through easy-to-call methods.

import java.util.HashMap;
import java.util.Map;

public class SplunkAudit {
    // Key=value pairs collected for the current transaction
    private final Map<String, String> values = new HashMap<>();
    // Holds one SplunkAudit instance per thread
    private static final ThreadLocal<SplunkAudit> auditLocal = new ThreadLocal<>();

    public static SplunkAudit getInstance() {
        SplunkAudit instance = auditLocal.get();
        if (instance == null) {
            instance = new SplunkAudit();
            auditLocal.set(instance);
        }
        return instance;
    }

    private SplunkAudit() {
    }

    public void add(SplunkKey key, String message) {
        values.put(key.name(), message);
    }

    public void flush() {
        StringBuilder fullMessage = new StringBuilder();
        for (Map.Entry<String, String> val : values.entrySet()) {
            fullMessage.append(val.getKey());
            fullMessage.append("=");
            fullMessage.append(val.getValue());
            fullMessage.append(" ");
        }
        // log the full message now
        // log.info(fullMessage.toString());

        // clear the values so a reused (pooled) thread starts its next transaction empty
        values.clear();
    }
}

Collect the Splunk parameters (a collection of key=value pairs) during the transaction and log them at the end of the transaction to avoid multiple writes.

Use Async Log Writer

It is recommended to use an async logger for Splunk logs. Async logging performs the write in a separate thread, so the transaction thread is not held up by file I/O. Below are some options.
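
To make the idea concrete, here is a framework-free sketch of an async writer: callers put lines on a bounded queue and a single background thread drains it. The class name, the thread name, and the choice to drop lines when the queue is full are all assumptions made for this example; an established framework's async appender is usually the better choice in a real project.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

final class AsyncLineWriter implements AutoCloseable {
    // Unique sentinel instance, compared by identity, that tells the worker to stop.
    private static final String STOP = new String("STOP");

    private final BlockingQueue<String> queue;
    private final Thread worker;

    AsyncLineWriter(int capacity, Consumer<String> sink) {
        this.queue = new LinkedBlockingQueue<>(capacity);
        this.worker = new Thread(() -> drain(sink), "splunk-log-writer");
        this.worker.setDaemon(true);
        this.worker.start();
    }

    /** Queues a line without blocking; returns false if the queue is full and the line was dropped. */
    boolean write(String line) {
        return queue.offer(line);
    }

    private void drain(Consumer<String> sink) {
        try {
            while (true) {
                String line = queue.take();
                if (line == STOP) {
                    return;
                }
                sink.accept(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits until all queued lines are written, then stops the worker thread. */
    @Override
    public void close() {
        try {
            queue.put(STOP);
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The sink can be anything that accepts a line, for example a method that appends to the Splunk log file. Dropping lines under pressure protects transaction latency at the cost of completeness; blocking instead is the opposite trade-off.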

Set Up Alerts

Save Splunk queries as alerts to get automatic notifications.

Index GC Logs in Splunk

Index Java garbage collection (GC) logs separately in Splunk. The GC log format is different from that of your regular application logs, and the two may get mixed up, so it is better to keep them separate. Here are some tips to do GC log analytics using Splunk.

Log These Fields

Production logs are key to debugging problems in your software. The following fields are almost always useful. This list is only the minimum; you may add more fields based on your application domain.

Thread Name

This is the most important field for debugging and identifying multithreading problems in a Java application. Ensure every thread in your application has a logical name so that you can tell threads apart. For example, transaction threads and background threads may use different prefixes in their names.

Give each thread a unique id as well. It is easy to set thread names in Java; a one-line statement will do it.

Thread.currentThread().setName("NameOfThread-UniqueId");
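
When threads come from a pool, one way to apply the same naming rule is a ThreadFactory that stamps a prefix and a running number on every thread it creates. The sketch below assumes that approach; the class name and the "txn" prefix in the usage note are illustrative only.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicLong;

final class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicLong counter = new AtomicLong();

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // Produces names such as "txn-1", "txn-2", ...
        return new Thread(task, prefix + "-" + counter.incrementAndGet());
    }
}
```

It can be passed to the standard executors, for example Executors.newFixedThreadPool(4, new NamedThreadFactory("txn")).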

Thread Count

Log the number of threads in the JVM at any point in time. The one-liner below returns an estimate of the number of active threads in the current thread's thread group and its subgroups.

java.lang.Thread.activeCount()
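
Because Thread.activeCount() only covers the current thread group, a JVM-wide figure has to come from elsewhere. The sketch below puts it next to ThreadMXBean.getThreadCount(), which reports all live threads in the JVM; the helper class and its method names are made up for this example.

```java
import java.lang.management.ManagementFactory;

final class ThreadCounts {
    private ThreadCounts() {
    }

    /** Estimate of active threads in the current thread's group and its subgroups. */
    static int inCurrentGroup() {
        return Thread.activeCount();
    }

    /** Number of live threads in the whole JVM, daemon and non-daemon. */
    static int inJvm() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }
}
```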

Server IP Address

Logging the server IP address becomes essential when the application runs on multiple servers. Most enterprise applications run on a cluster of servers, so it is important to be able to isolate errors that are specific to one server.

It is easy to get the IP address of the current server. The line of code below should work in most places (unless the server has multiple IP addresses).

InetAddress.getLocalHost().getHostAddress()
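
That call throws a checked UnknownHostException, and the address rarely changes while the application runs, so it can make sense to resolve it once and reuse it. A small sketch under those assumptions follows; the class name and the "unknown" fallback value are choices made for this example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class ServerAddress {
    // Resolved once at class load and reused for every log line.
    private static final String LOCAL_IP = resolve();

    private ServerAddress() {
    }

    static String localIp() {
        return LOCAL_IP;
    }

    private static String resolve() {
        try {
            return InetAddress.getLocalHost().getHostAddress();
        } catch (UnknownHostException e) {
            // Logging should never fail a transaction, so fall back to a marker value.
            return "unknown";
        }
    }
}
```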

Version

The version of the source from version control is an important field. Software keeps changing for various reasons, and you need to be able to identify the exact version that is currently live in production. You can include your version control details in the manifest file of the deployable WAR / EAR file. This is easy to do with Maven.

Once the information is available in your WAR / EAR file, the application can read it at runtime and log it to the Splunk log file.
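
Reading the value back could look roughly like the sketch below. It assumes the build wrote the standard Implementation-Version attribute into the manifest; if your build stores the version control revision under a different attribute, substitute that name. The class name and the "unknown" fallback are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.Manifest;

final class BuildVersion {
    private BuildVersion() {
    }

    /** Returns the Implementation-Version from the manifest stream, or "unknown" if it is absent. */
    static String fromManifest(InputStream manifestStream) {
        if (manifestStream == null) {
            return "unknown";
        }
        try {
            Manifest manifest = new Manifest(manifestStream);
            String version = manifest.getMainAttributes().getValue("Implementation-Version");
            return version != null ? version : "unknown";
        } catch (IOException e) {
            return "unknown";
        }
    }
}
```

In a web application the stream would typically come from ServletContext.getResourceAsStream("/META-INF/MANIFEST.MF"). Read the version once at startup and reuse it for every log line.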

API Name

Every application performs some tasks. They may be called APIs or something else, but they are the key identifiers of actions. Log a unique API name for each action in your application. For example:

API=CREATE_USER
API=DELETE_USER
API=RESET_PASS

Transaction ID

A transaction id is a unique identifier of the transaction. It need not be your database transaction id, but you need some unique identifier to be able to trace one full transaction.
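
If nothing in the application already provides such an identifier, a random UUID generated at the start of the transaction is one simple option. The helper below is only a sketch of that idea, and its name is hypothetical.

```java
import java.util.UUID;

final class TransactionIds {
    private TransactionIds() {
    }

    /** Returns a new random identifier, generated once per transaction. */
    static String next() {
        return UUID.randomUUID().toString();
    }
}
```

Generate the id once when the transaction starts and log the same value on every line that belongs to it.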

User ID – Unique Identifier

User identification is important for debugging many use cases. You may not want to log user emails or other sensitive information, but you can always log a unique identifier that represents the user in your database.

Success / Failure of Transaction

Ensure you log the success or failure of each transaction in Splunk. This gives you an easy view of failure trends in your system. A sample field would look like:

TXS=S (Success transaction)
TXS=F (Failed transaction)

Error Code

Log an error code whenever there is a failure. Error codes can uniquely identify the exact scenario, so spend time defining them in your application. The best way is to define an enum of error codes, like the one below.

public enum ErrorCodes {
    INVALID_EMAIL(1);

    private int id;

    ErrorCodes(int id) {
        this.id = id;
    }

    public int getId() {
        return id;
    }
}

Elapsed Time – Time Taken to Finish Transaction

Log the total time taken by a transaction. It will help you easily identify slow transactions.

Elapsed Time of Each Major Component in Transaction

If your transaction is made of multiple steps, also log the time taken by each step. This can narrow down a problem to the component that is slow.
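
One possible way to collect both the per-step and the total timings is a small timer object created at the start of the transaction. In the sketch below the class name and the key names (T_DB, T_TOTAL) are invented for illustration, and all values are in milliseconds.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

final class StepTimer {
    // Elapsed milliseconds per step, kept in the order the steps ran.
    private final Map<String, Long> elapsedMillis = new LinkedHashMap<>();
    // The total is measured from the moment the timer is created.
    private final long startNanos = System.nanoTime();

    /** Runs the step and records how long it took under the given key. */
    void time(String key, Runnable step) {
        long begin = System.nanoTime();
        try {
            step.run();
        } finally {
            // Record the time even if the step throws.
            elapsedMillis.put(key, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - begin));
        }
    }

    /** Renders the step timings followed by the total, as key=value pairs. */
    String toKeyValues() {
        StringBuilder line = new StringBuilder();
        for (Map.Entry<String, Long> step : elapsedMillis.entrySet()) {
            line.append(step.getKey()).append('=').append(step.getValue()).append(' ');
        }
        long total = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
        return line.append("T_TOTAL=").append(total).toString();
    }
}
```

A transaction would wrap each major step, for example timer.time("T_DB", () -> saveUser(user)) where saveUser stands in for your own code, and then append timer.toKeyValues() to the Splunk line at the end.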

I hope you find these tips useful. Please share with us anything this page has missed.