Hosting a chaos mode for Twitch streamer StaySafeTV this past weekend has been an incredibly valuable experience for our team.  While we had plenty of experience hosting low- and medium-population servers, this recent influx of players taught us a lot about the strengths and weaknesses of our hardware and software stacks, and we want to take a moment to discuss them in this post.

Background

In general, game servers operate in ticks: during each tick, the server takes in a small quantum of input from clients, calculates and stores all relevant game state, and then sends the results back out to the clients as updates.

In Minecraft, those calculations handle chunk generation, physics and flow calculations for water and lava, lighting updates, and the movements and actions of monsters and animals: basically everything the player can do to their environment and everything the environment can do to the player.  These ticks are supposed to happen once every 50 milliseconds, or 20 times per second.

Minecraft servers are also (mostly) single-threaded: most of the operations they perform are done sequentially, one at a time.  This isn't a problem on small servers with limited player populations, but it can become a serious problem as a server's population grows.  Normally the work for a tick can be completed in under 50ms, but as more entities interact with each other, more work has to be done each tick, and a single tick can take well over its 50ms budget.  When that happens, the server can no longer keep up with 20 ticks per second, and players experience it as lag.  Fortunately, there are ways to mitigate this.
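
To make the 50ms budget concrete, here is a small, simplified Python model (our own illustration, not Minecraft's actual server loop): a tick that finishes early waits out the rest of its 50ms slot, while a tick that overruns takes as long as its work does. It ignores any catch-up logic the real server may use, and the name effective_tps is hypothetical.

```python
TICK_BUDGET_MS = 50.0  # 20 ticks per second target


def effective_tps(work_ms):
    """Average ticks per second, given how long each tick's work took (in ms).

    Simplified: an early tick sleeps out the rest of its 50 ms slot, an
    overrunning tick just takes as long as its work, and no catch-up ticks run.
    """
    total_ms = sum(max(TICK_BUDGET_MS, w) for w in work_ms)
    return len(work_ms) / (total_ms / 1000.0)


print(effective_tps([30.0] * 20))   # 20.0
print(effective_tps([100.0] * 20))  # 10.0
```

In this model, ticks that consistently take 100ms halve the tick rate, so the whole world effectively runs at half speed.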

Solution: Alternate Minecraft Servers

With the scaling and configuration limitations of the Minecraft server becoming evident very soon after the launch of multiplayer mode, the Minecraft community stepped forward and started developing alternatives with richer configuration and customization features than the vanilla server offered.

CraftBukkit was the first of these alternate servers, allowing for plugins which can drastically change the gameplay experience, as well as basic configuration options that allow for general performance tuning.  CraftBukkit is no longer available, instead living on as part of the Vanilla Minecraft server and

Spigot is a performance-enhanced fork of CraftBukkit that is in very common use today.  It supports an extension of the Bukkit API (Spigot-API) and advanced configuration options that allow for much better performance on high-population servers, as well as compatibility with the Minecraft proxies that tie multiple servers together into networks like ours.

Paper is a high-performance fork of Spigot used by many high-population Minecraft servers.  It adds further extensions to the Spigot-API and includes several optimizations that significantly improve the performance of high-population servers.  The feature that caused us to switch from Spigot to Paper was asynchronous chunk loading and world generation, which has provided a significant performance boost to all of our servers.

We are aware that there are other custom servers, which we have not yet evaluated, that may perform better than Paper.

Solution: Server Configuration

Given enough players, monsters, animals, physics events, etc., out-of-the-box installations of Spigot and even Paper can become laggy messes.  Fortunately, Spigot and Paper are highly configurable, and their behavior can easily be tuned to provide a much higher-performance experience.

Unfortunately, there is a lot of conflicting information out there on how to properly optimize a server.  Our configuration is based heavily on the recommendations posted by frash23 on Spigot's forums, with a few additional changes.

We also disable Spigot's tick limiter, which can interfere with individual players (who are entities as well), and instead rely on Aikar's entity activation range system, which does a much better job of directing tick time to the entities that actually need it for a good player experience.
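
The idea behind activation ranges can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Spigot's implementation: the range values are hypothetical, only horizontal distance is checked, and the periodic wake-ups that real servers give inactive entities are left out. The names Entity, is_active, and entities_to_tick are illustrative.

```python
from dataclasses import dataclass

# Hypothetical activation ranges (in blocks) per entity type; not real config values.
ACTIVATION_RANGE = {"monster": 32, "animal": 32, "misc": 16}


@dataclass
class Entity:
    kind: str  # "player", "monster", "animal", or "misc"
    x: float
    z: float


def is_active(entity, players):
    """True if any player is within the entity's activation range.

    Simplified: checks a horizontal square around each player and ignores height.
    """
    r = ACTIVATION_RANGE[entity.kind]
    return any(abs(entity.x - p.x) <= r and abs(entity.z - p.z) <= r for p in players)


def entities_to_tick(entities, players):
    """Players always tick; other entities tick only when a player is nearby."""
    return [e for e in entities if e.kind == "player" or is_active(e, players)]


player = Entity("player", 0, 0)
zombie = Entity("monster", 10, 10)  # within 32 blocks of the player
cow = Entity("animal", 100, 0)      # far away: skipped this tick
item = Entity("misc", 20, 0)        # outside the 16-block misc range
print(entities_to_tick([player, zombie, cow, item], [player]))
```

Unlike a global per-tick time limit, this approach never starves the player entity itself; it simply spends less time on entities nobody is near.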

Solution: JVM Options

To keep Java garbage collection (GC) pauses from making our tick times jitter, we adopted Aikar's recommended JVM options on our servers, and we have yet to see G1GC cause spikes in our tick timings.
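
As a rough illustration of how such options are applied, here is a hypothetical Python helper that assembles a java command line. It includes only a small subset of standard HotSpot G1 flags of the kind Aikar's set uses; it is not his full or current list, and paper.jar is a placeholder file name.

```python
def build_launch_command(jar_path, heap_gb):
    """Build a java command line for a Minecraft server jar (illustrative only)."""
    heap = f"{heap_gb}G"
    return [
        "java",
        f"-Xms{heap}",                  # initial heap equal to max heap...
        f"-Xmx{heap}",                  # ...so the heap size stays fixed
        "-XX:+UseG1GC",                 # use the G1 garbage collector
        "-XX:MaxGCPauseMillis=200",     # G1 pause-time goal, in milliseconds
        "-XX:+ParallelRefProcEnabled",  # process references with multiple threads
        "-XX:+DisableExplicitGC",       # make System.gc() calls (e.g. from plugins) no-ops
        "-XX:+AlwaysPreTouch",          # touch every heap page at JVM startup
        "-jar",
        jar_path,
    ]


cmd = build_launch_command("paper.jar", 10)
print(" ".join(cmd))
```

The resulting list can be passed directly to subprocess.run to start the server.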

Solution: Faster Hardware

The majority of our network currently runs on dedicated Xeon E5-2670 processors.  We've found through testing that, with the right Minecraft server software, configuration, and JVM options, we can reasonably handle 40 to 50 players per server instance on these processors.  Because Minecraft is mostly single-threaded, getting better performance per server instance means investing in hardware with better single-core performance.

We are currently investigating a hybrid solution: using our existing Xeons for low- to medium-population Minecraft server instances, while putting our high-population server instances on systems built around Intel's i9-9900K processor and fast memory.  Based on benchmarks, these server instances should be able to handle 60 to 100 players each, or up to twice as many as each of our Xeon-based instances can handle.

Solution: A Smaller Per-Instance Player Population

Even with the right server software, configuration, JVM options, and hardware, any given Minecraft server instance still needs a player limit to prevent unacceptable performance.  Many larger server networks get around this limitation by sharding their game modes across multiple server instances.  While this breaks the player base up into smaller, more manageable chunks, it has the side effect of isolating character status, items, awards, and achievements from the other server instances.
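
A minimal sketch of the sharding approach, assuming a hypothetical Shard type and a simple policy: each new player goes to the least-populated shard that still has room, and a new shard is opened only when every existing one is full. Because each shard tracks only its own players, the sketch also shows where the isolation comes from: nothing here is shared between shards.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    name: str
    capacity: int
    players: set = field(default_factory=set)


def assign_player(shards, player, capacity):
    """Put a player on the least-populated shard with room, opening a new shard if all are full."""
    open_shards = [s for s in shards if len(s.players) < s.capacity]
    if open_shards:
        shard = min(open_shards, key=lambda s: len(s.players))
    else:
        # Every shard is at its player limit: start another instance.
        shard = Shard(name=f"shard-{len(shards) + 1}", capacity=capacity)
        shards.append(shard)
    shard.players.add(player)
    return shard


shards = []
for name in ["alice", "bob", "carol"]:
    print(name, "->", assign_player(shards, name, capacity=2).name)
```

Filling the least-populated shard spreads load evenly; a network that cares more about keeping players together could instead fill the fullest shard that still has room.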

Non-Solution: More Memory

If our performance limitations could be solved by more memory, we would have fixed them long ago.  In Chaos, we average around 6GB of old-generation heap usage at our rated population, so the 8-12GB of memory we allocate to each server instance gives G1GC plenty of room to work with.

Reviewing our server timings also reveals that garbage collection is not a performance issue for our servers, and that the limiting factor is processing power in relation to the number of entities being processed.

Non-Solution: Pre-generated World

While we don't publicly disclose the max-world-size values on our servers, they're quite large.  Pre-generating our worlds would not only take an exceedingly long time, it would also take an exceedingly large amount of space to store, for very little, if any, benefit.
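
The cost grows quickly with world size. Minecraft's max-world-size setting is a radius in blocks, so a square border of radius R spans 2R blocks per side, or roughly (2R/16)^2 chunks of 16x16 blocks. The sketch below uses a hypothetical radius of 10,000 blocks, not one of our actual values.

```python
import math


def chunks_in_border(radius_blocks):
    """Approximate number of 16x16 chunks covering a square world border of the given radius.

    Ignores any misalignment between the border and the chunk grid.
    """
    blocks_per_side = 2 * radius_blocks
    chunks_per_side = math.ceil(blocks_per_side / 16)
    return chunks_per_side ** 2


print(chunks_in_border(10_000))  # 1562500
print(chunks_in_border(20_000))  # 6250000
```

Because the count is quadratic in the radius, doubling the border radius quadruples the number of chunks to generate and store.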

Since Paper now implements asynchronous chunk loading and world generation, those operations can be performed outside of the main server thread, before the chunks are needed, alleviating any need to pre-generate them.
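
A minimal sketch of the idea, assuming a hypothetical generate_chunk function and a Python thread pool standing in for Paper's internals (which this does not reproduce): chunks around a player are queued for generation on worker threads, and the main thread only picks up chunks that are already finished, never blocking on them.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_chunk(cx, cz):
    """Hypothetical stand-in for expensive terrain generation or disk loading."""
    return {"pos": (cx, cz)}


class AsyncChunkCache:
    def __init__(self, workers=4):
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._futures = {}  # (cx, cz) -> Future for that chunk

    def prefetch_around(self, cx, cz, radius):
        """Queue every chunk within `radius` chunks of (cx, cz); returns immediately."""
        for x in range(cx - radius, cx + radius + 1):
            for z in range(cz - radius, cz + radius + 1):
                if (x, z) not in self._futures:
                    self._futures[(x, z)] = self._pool.submit(generate_chunk, x, z)

    def get_if_ready(self, cx, cz):
        """Return the chunk if a worker has finished it, else None; never blocks."""
        future = self._futures.get((cx, cz))
        if future is not None and future.done():
            return future.result()
        return None

    def shutdown(self):
        self._pool.shutdown(wait=True)


cache = AsyncChunkCache()
cache.prefetch_around(0, 0, radius=1)  # 3x3 chunks queued around the player
cache.shutdown()                       # demo only: wait for the workers to finish
print(cache.get_if_ready(0, 0))
```

In a real tick loop the main thread would call prefetch_around as players move and simply skip chunks that are not ready yet, rather than waiting on them.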

Conclusion

In our minds, a server that isn't fun to play on isn't worth hosting.  Striking the right balance between a reasonable population size and good interactivity with the environment will be an ongoing task, and one we look forward to tackling as best we can.