While a project is being developed, many problems need to be solved, and optimizing execution time is one of them. Applications constantly load data from one or more sources, be it a file system, a database or another system, and transform it in some way. Sometimes loading that data takes a long time, and loading the same data repeatedly is unacceptable. Caches were created to solve this problem.
A cache is a mechanism that lets us get frequently used, rarely changing data in the shortest possible time. It usually takes the form of in-memory storage with an API for getting data by key and clearing it when it is no longer needed.
Because data caching is such a common need, many implementations already exist. Two popular ones are Guava Cache and Caffeine.
Guava Cache
This implementation is part of Google Guava, a library created by Google. It is easy to install and easy to use, with a fairly intuitive API.
Installation
To use it, we need to add the Guava dependency to the pom.xml file.
<dependencies>
    <!-- https://mvnrepository.com/artifact/com.google.guava/guava -->
    <dependency>
        <groupId>com.google.guava</groupId>
        <artifactId>guava</artifactId>
        <version>28.1-jre</version>
    </dependency>
</dependencies>
And that is all. We have all that is needed to create a cache in an application.
Usage
I will present some basic information about Guava Cache here, but I highly recommend reading more in the explanation on GitHub: https://github.com/google/guava/wiki/CachesExplained
There are multiple ways to add a key-value pair to a cache, but I think the most popular one is to create a loading cache: a cache that knows how to get data that is needed but not yet present. For example:
LoadingCache<Key, Graph> graphs = CacheBuilder.newBuilder()
    .maximumSize(1000)
    .expireAfterWrite(10, TimeUnit.MINUTES)
    .build(
        new CacheLoader<Key, Graph>() {
            public Graph load(Key key) throws AnyException {
                return createExpensiveGraph(key);
            }
        });
Here we create a LoadingCache using CacheBuilder. The builder lets us add conditions for evicting entries. If we neglect to do this, the cache can keep loading records until it uses all available memory. In the example I used a maximum size, which evicts entries that have not been used recently once the number of cached records approaches the limit. The second condition tells the cache to remove a record a specified time after it was written. These two are the basic ones, and I highly recommend checking the rest to match them to the needs of a specific application.
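To make the maximum-size condition more concrete, here is a minimal sketch of size-based eviction written with the plain JDK. It is not how Guava implements maximumSize; BoundedCache and BoundedCacheDemo are hypothetical names, and the class is not thread-safe. It only shows the idea of dropping the least recently used entry once a limit is exceeded.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: forgets its least recently used entry once it grows past maxSize. */
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    BoundedCache(int maxSize) {
        // accessOrder = true: iteration order runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called by LinkedHashMap after every put; returning true drops the eldest entry
        return size() > maxSize;
    }
}

class BoundedCacheDemo {
    public static void main(String[] args) {
        BoundedCache<String, Integer> cache = new BoundedCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // "a" is now the most recently used entry
        cache.put("c", 3);  // the limit is exceeded, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

Time-based expiry is left out of the sketch on purpose; in a real application the builder conditions described above are the better choice.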
The second important part of a LoadingCache is the CacheLoader. This class, and specifically the implementation of its load method, tells the cache how to get a value that is missing. Usually it calls a method that looks up the value in a database, a remote system or another source using the given key. The returned value is saved in the cache for later use.
It is also possible to insert data directly with the put method, but I recommend keeping this logic in the loader.
The most basic way of accessing data is the get method:
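The load-on-miss idea behind a loader can be sketched with the plain JDK as well. This is an illustration under stated assumptions rather than Guava's implementation: LoadOnMissCache, LoadOnMissDemo and slowLookup are hypothetical names, and slowLookup merely stands in for a slow database or remote call.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical cache that asks a loader function for any value it does not hold yet. */
class LoadOnMissCache<K, V> {
    private final ConcurrentMap<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    LoadOnMissCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent runs the loader only when the key has no value yet
        return store.computeIfAbsent(key, loader);
    }

    void invalidate(K key) {
        store.remove(key);
    }
}

class LoadOnMissDemo {
    static final AtomicInteger LOADS = new AtomicInteger();

    // Stands in for a slow database or remote call (hypothetical)
    static String slowLookup(String key) {
        LOADS.incrementAndGet();
        return key.toUpperCase();
    }

    public static void main(String[] args) {
        LoadOnMissCache<String, String> cache = new LoadOnMissCache<>(LoadOnMissDemo::slowLookup);
        cache.get("graph");
        cache.get("graph");               // served from the cache, no second load
        System.out.println(LOADS.get());  // prints 1
        cache.invalidate("graph");
        cache.get("graph");               // the entry is gone, so the loader runs again
        System.out.println(LOADS.get());  // prints 2
    }
}
```

Keeping the lookup inside the loader means callers never need to know whether a value came from the cache or from the source.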
cache.get(key);
It returns data from the cache or, if there is none for the given key, loads it with the load method. Another way of doing that is the getUnchecked method:
cache.getUnchecked(key);
Behind the scenes it uses the get method, but the difference is in exception handling. The get method declares the checked ExecutionException, so it needs a try-catch block. getUnchecked wraps any exception thrown by the loader in UncheckedExecutionException, which extends RuntimeException, so no try-catch is required. However, it is worth remembering that an exception thrown inside the loader reaches the caller wrapped in another type. This is important if we create custom exceptions: the original one is available through getCause().
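The effect of this wrapping on custom exceptions can be sketched with the plain JDK. The code below does not call Guava: UncheckedLoadException is a hypothetical stand-in for UncheckedExecutionException, GraphNotFoundException is a hypothetical custom exception, and the two static methods only mimic how get and getUnchecked report a failing loader.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;

/** Hypothetical custom exception thrown by the data source. */
class GraphNotFoundException extends Exception {
    GraphNotFoundException(String message) {
        super(message);
    }
}

/** Stand-in for Guava's UncheckedExecutionException; only the wrapping idea is modelled. */
class UncheckedLoadException extends RuntimeException {
    UncheckedLoadException(Throwable cause) {
        super(cause);
    }
}

class WrappingDemo {
    // Mimics get: a failure in the loader surfaces as a checked ExecutionException
    static <V> V get(Callable<V> loader) throws ExecutionException {
        try {
            return loader.call();
        } catch (Exception e) {
            throw new ExecutionException(e);
        }
    }

    // Mimics getUnchecked: same call, but the wrapper is a RuntimeException
    static <V> V getUnchecked(Callable<V> loader) {
        try {
            return get(loader);
        } catch (ExecutionException e) {
            throw new UncheckedLoadException(e.getCause());
        }
    }

    // A catch block for GraphNotFoundException would never match here,
    // so the caller inspects the cause of the wrapper instead
    static String describeFailure() {
        try {
            getUnchecked(() -> { throw new GraphNotFoundException("no graph for key 42"); });
            return "loaded";
        } catch (UncheckedLoadException e) {
            Throwable cause = e.getCause();
            return cause.getClass().getSimpleName() + ": " + cause.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeFailure()); // prints GraphNotFoundException: no graph for key 42
    }
}
```

With the real library the pattern is the same: catch the wrapper type and check getCause() for the custom exception.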
Sometimes we need to remove some records manually. In that case we have methods to invalidate data:
cache.invalidate(key);
cache.invalidateAll();
Caffeine
The other cache implementation is Caffeine. It is an open-source rewrite of Guava Cache designed to be more efficient than the original. It is available at https://github.com/ben-manes/caffeine. Many projects have adopted Caffeine, for example Spring Framework (https://github.com/spring-projects/spring-framework/issues/18370).
Installation
Adding this cache is as easy as adding Guava Cache. It is available as a Maven or Gradle dependency, for example:
<!-- https://mvnrepository.com/artifact/com.github.ben-manes.caffeine/caffeine -->
<dependency>
    <groupId>com.github.ben-manes.caffeine</groupId>
    <artifactId>caffeine</artifactId>
    <version>2.8.1</version>
</dependency>
Usage
Using Caffeine in a project is similar to using Guava Cache. The author even wrote that he reused most of the API, so the code can look like this:
LoadingCache<Key, Graph> graphs = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .build(key -> createExpensiveGraph(key));
Here we see that CacheBuilder is replaced by Caffeine. Instead of the lambda that calls createExpensiveGraph, the build method also accepts a CacheLoader implementation, as in Guava.
The APIs of the two cache implementations differ in places, but basic methods like getting or invalidating data stay the same, so it is easy to try both out.
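One example of such a difference is the handling of null: in Caffeine a loader may return null, in which case get returns null and nothing is stored, whereas in Guava a loader that returns null causes an exception (InvalidCacheLoadException).
Another is that there is no getUnchecked method. The get method in Caffeine declares no checked exceptions: a failure in the loader surfaces as a RuntimeException (a checked exception from the loader is wrapped in CompletionException), so there is no need for a separate method.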
Is it worth changing?
There are multiple benchmarks comparing the two caches, and some include other implementations as well. Results differ depending on the environment, but Caffeine usually performs better.
An example of such a benchmark is:
https://medium.com/outbrain-engineering/oh-my-guava-we-are-moving-to-caffeine-99387819fdbb
I highly recommend searching for benchmarks yourself or trying it out.
Another point concerns using Guava Cache in a multithreaded environment. A problem can appear when one thread loads data from the data source and saves it to the cache while a second thread changes the data in the data source and invalidates the cached entry. The invalidate operation does not wait for an in-progress load to finish, so the cache can end up holding data that differs from the source system. More information can be found at:
https://softwaremill.com/race-condition-cache-guava-caffeine/
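The interleaving described above can be reproduced in a small plain-JDK sketch. It does not use Guava: two maps stand in for the data source and the cache, and latches force the unlucky ordering so that the outcome is repeatable. StaleEntryDemo and its helper names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

class StaleEntryDemo {
    // Runs the forced interleaving and reports what the source and the cache hold afterwards
    static String run() {
        ConcurrentMap<String, String> source = new ConcurrentHashMap<>(); // stands in for the database
        ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
        source.put("key", "old");

        CountDownLatch valueRead = new CountDownLatch(1);
        CountDownLatch invalidated = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            String loaded = source.get("key"); // 1. read "old" from the source
            valueRead.countDown();
            awaitQuietly(invalidated);         // 3. wait until the writer has finished
            cache.put("key", loaded);          // 4. store the value read in step 1
        });
        Thread writer = new Thread(() -> {
            awaitQuietly(valueRead);
            source.put("key", "new");          // 2. update the source ...
            cache.remove("key");               //    ... and invalidate an entry that is not cached yet
            invalidated.countDown();
        });

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return "source=" + source.get("key") + ", cache=" + cache.get("key");
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints source=new, cache=old
    }
}
```

The invalidation arrives before the stale value is stored, so nothing ever removes it; the cache keeps serving "old" until the entry expires or is invalidated again.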
In conclusion, I recommend using Caffeine rather than Guava Cache in new projects, and I would consider migrating existing projects as well.