Until recently, Java did not take running inside a container into account: JVM ergonomics and Runtime.availableProcessors() were based on the host’s resources rather than those assigned to the container. Typically, this resulted in thread pools that were far too large for the container’s resources.
As of version 8u131 Java has become “container aware”, basing its ergonomic decisions on the container’s resources. Further support was backported from later JVM versions in 8u191.
In addition, Runtime.availableProcessors() now reflects the container’s resources. This is important, as many libraries and applications base thread pool sizes on this value.
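As a minimal sketch of the sizing pattern in question (availableProcessors() is the real JDK method; the pool itself is illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Many libraries size pools like this; inside a container the value now
// reflects cgroup limits rather than the host's core count.
public class PoolSizing {
    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // A common pattern: one worker thread per (container-visible) core.
        ExecutorService pool = Executors.newFixedThreadPool(cores);
        System.out.println("Sizing pool to " + cores + " threads");
        pool.shutdown();
    }
}
```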
The article from Oracle announcing this feature states:
the JVM is Docker-aware with respect to Docker CPU limits transparently
But what does this mean? Docker uses Linux Control Groups (cgroups) to limit CPU usage. Java developers are used to dealing with virtual or physical machines, so configuring cgroups is unfamiliar. Understanding how cgroups work and how the JVM interprets them is critical to successfully deploying and running a JVM based application using Docker and Kubernetes.
First, let’s look at how a container can have its CPU limited. We’ll only discuss shares and quotas, as CPU sets are not widely used by mainstream container orchestrators. We’ll then see how different JVM versions interpret these cgroup settings.
From the Docker documentation:
[set cpu shares] to a value greater or less than the default of 1024 to increase or reduce the container’s weight, and give it access to a greater or lesser proportion of the host machine’s CPU cycles.
This is a relative figure that gives the container a share of the total CPU cycles. It means nothing in isolation, only when compared with the other containers deployed to the same host, so it should not be interpreted as a number of cores.
The default value is 1024, which is important for later when we see how the JVM interprets this figure.
For example, if one container has 512 shares and another has 1024 shares, then the second container will get twice the CPU time of the first. However, this only applies in the contended case. From the next section of the Docker documentation:
[cpu shares are] only enforced when CPU cycles are constrained.
If there is no contention, any container is free to use all of the host’s CPU time. In the example above, if the second container is idle, the first container can use all of the CPU cycles, even though it was configured with fewer shares.
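The contended-case arithmetic can be sketched as follows; contendedFraction is a hypothetical helper for illustration, not a Docker API:

```java
// Under contention, the fraction of host CPU a container receives is its
// shares divided by the sum of all containers' shares on that host.
public class ShareMath {
    static double contendedFraction(int myShares, int[] allShares) {
        int total = 0;
        for (int s : allShares) total += s;
        return (double) myShares / total;
    }

    public static void main(String[] args) {
        int[] shares = {512, 1024};
        System.out.println(contendedFraction(512, shares));  // 1/3 of the host
        System.out.println(contendedFraction(1024, shares)); // 2/3 of the host
    }
}
```

When one container is idle, these fractions no longer apply: the busy container may use any spare cycles.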
This is a huge benefit of using containers. Shares allow increased utilization while being fair when all the containers need CPU time but still allowing a subset of the containers to fully utilize a host when others are idle. In projects I’ve worked on this benefit has reduced cloud costs by an order of magnitude when migrating from a VM per instance of an application.
However, if an application somehow interprets this value as the number of cores and sizes its thread pools accordingly, then even though cgroups would allow it to use all of the CPU time, there might not be a sufficient number of threads to do so.
From the Docker documentation:
[cpu_quota is the] number of microseconds per --cpu-period that the container is limited to before throttled
A quota is the amount of CPU time that can be used per cpu_period. The default period is 100 milliseconds (100,000 microseconds), so a quota of 50 milliseconds is like giving a container half a core, and a quota of 200 milliseconds is like giving it two cores.
However, if the host has more cores than this and your application is multi-threaded, then the threads can run on many cores and use up the quota in a fraction of the period. For example, on a 64-core host, an application with 20 busy threads can use up a quota of 200 milliseconds in 10 milliseconds, meaning all 20 threads will be throttled for the remainder of the period (90 milliseconds). You can track this with throttled_time in cpu.stat, which is surfaced by tools like cAdvisor.
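The throttling arithmetic can be sketched as follows (illustrative helper names, not a real API):

```java
// Back-of-the-envelope CFS throttling math, all times in microseconds.
public class QuotaMath {
    // Effective cores = quota / period.
    static double effectiveCores(long quotaUs, long periodUs) {
        return (double) quotaUs / periodUs;
    }

    // With `threads` runnable threads all burning quota in parallel,
    // how far into the period is the quota exhausted?
    static double exhaustedAfterUs(long quotaUs, int threads) {
        return (double) quotaUs / threads;
    }

    public static void main(String[] args) {
        long period = 100_000; // default cpu period: 100ms
        long quota = 200_000;  // "2 cores" worth of quota
        System.out.println(effectiveCores(quota, period));  // 2.0
        System.out.println(exhaustedAfterUs(quota, 20));    // 10,000us = 10ms
        // The threads then sit throttled for the remaining 90ms of the period.
    }
}
```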
Though quotas enable more predictable throughput, they can severely affect latency due to throttling. The behaviour of the system will change depending on the host’s number of cores.
Shares and quotas can be used together, with the shares first deciding how CPU should be divided until a container hits its quota.
Kubernetes has its own abstraction called millicores, where 1000 millicores corresponds to roughly one core.
Millicores can either be set as a request or a limit. A request sets the CPU share for your application where 1000 millicores is equivalent to 1024 shares.
Requests are also used for scheduling. If a node in a Kubernetes cluster has 10 cores, it can host containers whose requests add up to 10,000 millicores (10,240 shares).
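The conversions can be sketched as follows; this is a simplification of what the kubelet does when writing cgroup values, not its actual code:

```java
// Simplified millicore-to-cgroup conversions (the real logic lives in the
// kubelet's cgroup/CRI layers).
public class Millicores {
    // A request becomes cpu shares: 1000m -> 1024 shares.
    static long requestToShares(long millicores) {
        return millicores * 1024 / 1000;
    }

    // A limit becomes a quota: 1000m -> one full period of quota.
    static long limitToQuotaUs(long millicores, long periodUs) {
        return millicores * periodUs / 1000;
    }

    public static void main(String[] args) {
        System.out.println(requestToShares(1000));         // 1024
        System.out.println(requestToShares(10_000));       // 10240
        System.out.println(limitToQuotaUs(2000, 100_000)); // 200000us, i.e. "2 cores"
    }
}
```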
When shares are used in this way with the Kubernetes scheduler, the millicore request value is effectively the minimum amount of CPU your container will get when all the containers on the host are under load.
If the other containers on the host are idle or there are no other containers then your container can use all the available CPU. This results in high utilization while preserving fairness.
A Kubernetes limit is set as a quota. For every 1000 millicores, Kubernetes gives your container a period’s worth of quota. This sets a hard limit on the amount of CPU time your container can use even if the node has unused CPU cycles. Kubernetes limits typically reduce utilization.
It is very common to set requests without limits. This results in decent scheduling, high utilization of resources and fairness when all containers need CPU cycles.
A disadvantage of this approach is that capacity planning becomes harder, because the amount of CPU your container gets varies depending on what else is running on the same host. In my experience, the benefit of higher utilization outweighs this downside.
Another option is to set neither a limit nor a request. The container will then be able to use the entire host’s resources, but will be the first to be throttled when containers with requests and limits need CPU time. I don’t advise this: it is very unpredictable, and it also means your containers will be the first to be OOM-killed when the host runs low on memory.
Before container awareness, shares were ignored by the JVM. Once support arrived (backported to Java 8 in 8u191), the JVM divides the shares by 1024 to pick a number of cores, effectively equating 1024 shares with one core. This was a huge change that can significantly alter the runtime behaviour of your application. Interpreting 1024 shares as a single core has a downside: even if you don’t set a quota, your application may not be able to fully utilize an idle host because it doesn’t have enough threads to saturate all the cores.
The JVM equates 1024 shares to one core
Shares are often set quite low, because they act as a lower bound on the CPU time a container will get when the host is heavily utilized; values below 1024 are not uncommon. However, if you did this and then upgraded to a container-aware JVM, your application would suddenly have very few threads for GC, compilation, and the Fork-Join pool, as well as for any application thread pool sized from Runtime.availableProcessors().
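A rough sketch of the JVM’s share-based calculation (the real logic lives in HotSpot’s cgroup support code; this is an approximation that, like HotSpot, rounds up and never goes below one):

```java
// Approximation of the JVM's shares-to-CPU-count conversion:
// ceil(shares / 1024), with a floor of 1.
public class SharesToCores {
    static int cpuCountFromShares(int shares) {
        return Math.max(1, (shares + 1023) / 1024); // integer ceiling division
    }

    public static void main(String[] args) {
        System.out.println(cpuCountFromShares(512));  // 1 - a "low shares" container
        System.out.println(cpuCountFromShares(2048)); // 2
    }
}
```

This is why a container with 512 shares, which previously saw all host cores, suddenly sees a single core on a container-aware JVM.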
Java 8 has no support for quotas; support was added in Java 11 along with a new flag that is on by default: -XX:+PreferContainerQuotaForCPUCount.
When PreferContainerQuotaForCPUCount is set, the JVM first looks at the quota value; if a quota is set, it is divided by the period to get an effective number of cores. If no quota is set, shares are used as described above.
This makes a lot of sense. If an application bases its thread pools on availableProcessors, a single thread pool won’t burn through the quota just because more host cores happen to be available. Quotas are a hard limit, so there is no point having more GC or processing threads (unless they block on IO) than the effective core count implied by your quota.
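A simplified sketch of that decision logic (the real HotSpot implementation also considers cpusets and caps the result at the host’s core count; activeProcessorCount and its parameters are hypothetical names for illustration):

```java
// Simplified sketch of the JDK 11 container CPU count logic behind
// -XX:+PreferContainerQuotaForCPUCount. Not the actual HotSpot code.
public class ContainerCpuCount {
    static int activeProcessorCount(long quotaUs, long periodUs, int shares,
                                    boolean preferQuota) {
        int quotaCount = quotaUs > 0 ? (int) Math.ceil((double) quotaUs / periodUs) : 0;
        int shareCount = shares > 0 ? (int) Math.ceil(shares / 1024.0) : 0;
        int count;
        if (quotaCount != 0 && preferQuota) {
            count = quotaCount;        // quota wins when the flag is on
        } else if (shareCount != 0) {
            count = shareCount;        // otherwise fall back to shares
        } else {
            count = Runtime.getRuntime().availableProcessors(); // no cgroup limits
        }
        return Math.max(1, count);
    }

    public static void main(String[] args) {
        // quota 200ms / period 100ms => 2 cores, even with 4096 shares set
        System.out.println(activeProcessorCount(200_000, 100_000, 4096, true)); // 2
    }
}
```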
However, if your application has more threads than its effective CPU quota then it can use it up in a fraction of the period and then be throttled.
Now that we know how CPU resources work in Docker and Kubernetes, and how the JVM interprets them, we can discuss how to configure the JVM, shares, and quota.
In my experience, the least configuration is required by using JDK 11 and setting CPU shares to 1024 times the number of CPUs you want thread pools sized for. Before the JVM was container aware, a smaller shares value could be used to pack more JVMs onto each Kubernetes host. That isn’t really possible now, as a small shares value results in very few threads being created.
Quotas should be reserved for special cases where you explicitly don’t want to use more CPU cycles than configured or you are sure that your application won’t have more threads than the quota equivalent worth of CPUs.
JVM applications, even with carefully managed thread pools, typically have vastly more threads than CPUs. Two execution models commonly used on the JVM are thread-per-request and the asynchronous event loop.
Thread-per-request applications, such as most applications built on Servlet-based HTTP servers (e.g. Jetty and Tomcat), dedicate a thread to each request, allowing that thread to block when IO needs to be performed (database access, file access). The thread pool for these applications is sized for the desired number of concurrent requests rather than the number of cores, meaning that using a quota will likely result in throttling.
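For instance, a thread-per-request pool might be sized like this (the numbers are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A thread-per-request pool sized by a capacity goal, not by core count.
public class RequestPool {
    public static void main(String[] args) {
        int targetConcurrentRequests = 200; // capacity goal, not a core count
        ExecutorService requestPool =
                Executors.newFixedThreadPool(targetConcurrentRequests);
        // With a quota worth "2 cores", 200 runnable threads would burn
        // through the quota almost immediately and then sit throttled.
        requestPool.shutdown();
    }
}
```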
Asynchronous event-loop execution models, such as Netty, Ratpack, Vert.x, and Akka, instead size their thread pools based on the number of cores. These libraries are far better suited to containers, provided asynchronous IO is used for database access and inter-service communication.
However, it is still dangerous to use quotas as these applications will always have other threads: GC threads, compilation threads, additional thread pools for blocking IO, and thread pools for isolating different types of tasks.
One argument for using quotas is that, with auto-scaling in place, you will still get the same utilization: even though one container can’t utilize a whole host, another instance will be created that can. This makes sense for runtimes with very little overhead, where many small instances are as efficient as a few larger ones. The JVM has enough overhead that many JVMs, each with 100MB of RAM and two threads, are not desirable; it is often better to have a smaller number of medium-sized JVMs than many small ones. This further supports setting shares without a quota.
Using only shares does require a shift in mindset away from asking “How many CPUs does my application need?”, especially if you are used to capacity planning with fixed resources.
The JVM has come a long way in terms of container support. Java 11 does a sensible thing for quotas by default, and as long as you understand the implications of what it does for shares, your JVM application can easily be configured to work well in a container. CPU is just part of the puzzle; I’ll be following up with articles on memory, base images, and diagnostic sidecars.
Thanks to @h100gfld for pointing out that cpu_share support was added later than version 8u131