⚡ BitFaster.Caching

High performance, thread-safe in-memory caching primitives for .NET.

Installing via NuGet

Install-Package BitFaster.Caching

Quick Start

Please refer to the wiki for more detailed documentation.

ConcurrentLru

ConcurrentLru is a lightweight drop-in replacement for ConcurrentDictionary, but with bounded size enforced by a pseudo LRU eviction policy. There are no background threads and no lock contention; lookups are fast, and the hit rate outperforms a pure LRU in all tested scenarios.

Choose a capacity and use it just like ConcurrentDictionary, but with bounded size:
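A minimal sketch, assuming a hypothetical SomeItem value type; the constructor capacity and GetOrAdd usage mirror ConcurrentDictionary:

    using BitFaster.Caching.Lru;

    // SomeItem is a placeholder for your cached value type.
    // Bound the cache to 128 items; beyond that, the pseudo LRU policy evicts items.
    var lru = new ConcurrentLru<string, SomeItem>(128);

    // Look up the key, invoking the value factory only on a miss.
    SomeItem value = lru.GetOrAdd("key1", k => new SomeItem(k));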

Atomic GetOrAdd

ConcurrentDictionary.GetOrAdd is not performed atomically. In other words, concurrent requests for the same key can invoke the valueFactory delegate multiple times, and the last write wins.
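A small standalone demo (not part of the library) that makes the race visible; under contention the factory can run more than once, even though only one value is stored:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    var dictionary = new ConcurrentDictionary<int, int>();
    int factoryCalls = 0;

    Parallel.For(0, 100, _ =>
    {
        dictionary.GetOrAdd(42, key =>
        {
            // Count factory invocations; sleep briefly to widen the race window.
            Interlocked.Increment(ref factoryCalls);
            Thread.Sleep(1);
            return key;
        });
    });

    // Frequently prints a number greater than 1.
    Console.WriteLine($"valueFactory invoked {factoryCalls} time(s)");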

ConcurrentLru can be configured to use atomic GetOrAdd using the ConcurrentLruBuilder:
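A sketch using the builder, again assuming the hypothetical SomeItem type; with atomic GetOrAdd, concurrent callers for the same key share a single valueFactory invocation:

    using BitFaster.Caching;
    using BitFaster.Caching.Lru;

    // Build a bounded cache where GetOrAdd runs the value factory at most once per key.
    ICache<string, SomeItem> lru = new ConcurrentLruBuilder<string, SomeItem>()
        .WithAtomicGetOrAdd()
        .WithCapacity(128)
        .Build();

    SomeItem value = lru.GetOrAdd("key1", k => new SomeItem(k));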

Time based eviction

ConcurrentTLru functions the same as ConcurrentLru, but entries also expire after a fixed duration since creation or most recent replacement. This can be used to remove stale items. If the values generated for each key can change over time, ConcurrentTLru is eventually consistent, with an inconsistency window equal to the time to live (TTL).
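A sketch of time-based expiry using the ConcurrentTLru constructor, assuming the hypothetical SomeItem type and an illustrative five-minute TTL:

    using System;
    using BitFaster.Caching.Lru;

    // Entries expire 5 minutes after creation or most recent replacement.
    var tlru = new ConcurrentTLru<string, SomeItem>(128, TimeSpan.FromMinutes(5));

    SomeItem value = tlru.GetOrAdd("key1", k => new SomeItem(k));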

Caching IDisposable values

It can be useful to combine object pooling and caching to reduce allocations, using IDisposable to return objects to the pool. All cache classes in BitFaster.Caching own the lifetime of cached values, and will automatically dispose values when they are evicted.

To avoid races using objects after they have been disposed by the cache, use IScopedCache which wraps values in Scoped<T>. The call to ScopedGetOrAdd creates a Lifetime that guarantees the scoped object will not be disposed until the lifetime is disposed. Scoped cache is thread safe, and guarantees correct disposal for concurrent lifetimes.
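A sketch of the scoped pattern, assuming a hypothetical SomeDisposable type that implements IDisposable:

    using BitFaster.Caching;
    using BitFaster.Caching.Lru;

    IScopedCache<string, SomeDisposable> cache = new ConcurrentLruBuilder<string, SomeDisposable>()
        .AsScopedCache()
        .WithCapacity(128)
        .Build();

    // The value factory wraps the disposable value in Scoped<T>.
    using (var lifetime = cache.ScopedGetOrAdd("key1", k => new Scoped<SomeDisposable>(new SomeDisposable())))
    {
        // lifetime.Value will not be disposed until the lifetime itself is disposed,
        // even if the cache evicts the entry in the meantime.
        SomeDisposable value = lifetime.Value;
    }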

Caching Singletons by key

SingletonCache enables mapping every key to a single instance of a value, and keeping the value alive only while it is in use. This is useful when the total number of keys is large, but few are in use at any moment, and removing an item while it is in use would result in an invalid program state.

The example below shows how to implement exclusive Url access using a lock object per Url.
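A sketch of the pattern, assuming a hypothetical Url type with value equality:

    using BitFaster.Caching;

    var urlLocks = new SingletonCache<Url, object>();

    var url = new Url("https://foo.com");

    // Acquire returns a handle that keeps this Url's lock object cached while held;
    // the entry is removed once the last handle is released.
    using (var handle = urlLocks.Acquire(url))
    {
        lock (handle.Value)
        {
            // exclusive access for this Url
        }
    }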

Performance

DISCLAIMER: Always measure performance in the context of your application. The results provided here are intended as a guide.

The cache replacement policy must maximize the cache hit rate while minimizing the computational and space overhead of implementing the policy. Below, hit rate vs cache size, latency and throughput are analyzed.

ConcurrentLru Hit rate

The charts below show the relative hit rate of classic LRU vs ConcurrentLru on a Zipfian distribution of input keys, with parameter s = 0.5 and s = 0.86 respectively. If there are N items, the probability of accessing an item numbered i or less is (i / N)^s.

Here N = 50000, and we take 1 million sample keys. The hit rate is the number of cache hits divided by 1 million. The test was repeated with the cache configured to different sizes, expressed as a percentage of N (e.g. 10% is a cache with capacity 5000). ConcurrentLru is configured with the default FavorFrequencyPartition to allocate internal queue capacity (see here for details).

The same test as above, but with a sequential scan of every key interleaved (also known as sequential flooding). In this case, ConcurrentLru performs significantly better across the board, and is therefore more resistant to scanning.

These charts summarize the percentage increase in hit rate for ConcurrentLru vs LRU. The increase in hit rate is significant at lower cache sizes, outperforming the classic LRU by over 150% when s = 0.5, in the best case for both Zipf and scan access patterns.

ConcurrentLru Latency

In these benchmarks, a cache miss is essentially free. These tests exist purely to compare the raw execution speed of the cache bookkeeping code. In a real setting, where a cache miss is presumably quite expensive, the relative overhead of the cache will be very small.

Benchmarks are based on BenchmarkDotNet and are therefore single threaded. The ConcurrentLru family of classes is composed internally of ConcurrentDictionary.GetOrAdd and ConcurrentQueue.Enqueue/Dequeue method calls, and scales well to concurrent workloads.

Benchmark results below are from a workstation with the following config:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.22000
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
  [Host]             : .NET Framework 4.8 (4.8.4510.0), X64 RyuJIT
  .NET 6.0           : .NET 6.0.5 (6.0.522.21309), X64 RyuJIT
  .NET Framework 4.8 : .NET Framework 4.8 (4.8.4510.0), X64 RyuJIT

The relative ranking of each cache implementation is stable across .NET Framework/Core/5/6 and on the CPU architectures available in Azure (e.g. Intel Skylake, AMD Zen). Absolute performance can vary.

What are FastConcurrentLru/FastConcurrentTLru?

These are variants of ConcurrentLru/ConcurrentTLru in which the hit counting logic is eliminated by the JIT. If hit counts are not required, this makes the code around 10% faster.

Lookup keys with a Zipf distribution

Take 1000 samples of a Zipfian distribution over a set of keys of size N and use the keys to look up values in the cache. If there are N items, the probability of accessing an item numbered i or less is (i / N)^s.

s = 0.86 (yields approx 80/20 distribution)
N = 500

Cache size = N / 10 (so we can cache 10% of the total set). ConcurrentLru has approximately the same computational overhead as a standard LRU in this single threaded test.

Method              Mean      Error    StdDev   Ratio  RatioSD
ClassicLru          175.7 ns  2.75 ns  2.43 ns  1.00   0.00
FastConcurrentLru   180.2 ns  2.55 ns  2.26 ns  1.03   0.02
ConcurrentLru       189.1 ns  3.14 ns  2.94 ns  1.08   0.03
FastConcurrentTLru  261.4 ns  4.53 ns  4.01 ns  1.49   0.04
ConcurrentTLru      266.1 ns  3.96 ns  3.51 ns  1.51   0.03

Raw Lookup speed

In this test the same items are fetched repeatedly and no items are evicted. This is representative of a high hit rate scenario with a small number of hot items.

  • The ConcurrentLru family does not move items in the queues; on a pure cache hit, items are simply marked as accessed.
  • ClassicLru must maintain item order, internally splicing the fetched item to the head of the linked list.
  • MemoryCache and ConcurrentDictionary represent a pure lookup. This is the best case scenario for MemoryCache, since the lookup key is a string (if the key were a Guid, using MemoryCache would add string conversion overhead).

FastConcurrentLru does not allocate and is approximately 5-10x faster than System.Runtime.Caching.MemoryCache or the newer Microsoft.Extensions.Caching.Memory.MemoryCache.

Method                    Runtime             Mean        StdDev     Ratio  Allocated
ConcurrentDictionary      .NET 6.0              7.783 ns  0.0720 ns   1.00  -
FastConcurrentLru         .NET 6.0              9.773 ns  0.0361 ns   1.26  -
ConcurrentLru             .NET 6.0             13.615 ns  0.0606 ns   1.75  -
FastConcurrentTLru        .NET 6.0             25.480 ns  0.0935 ns   3.28  -
ConcurrentTLru            .NET 6.0             29.890 ns  0.2107 ns   3.84  -
ClassicLru                .NET 6.0             54.422 ns  0.2935 ns   7.00  -
RuntimeMemoryCacheGet     .NET 6.0            115.016 ns  0.6619 ns  14.79  32 B
ExtensionsMemoryCacheGet  .NET 6.0             53.328 ns  0.2130 ns   6.85  24 B
ConcurrentDictionary      .NET Framework 4.8   13.644 ns  0.0601 ns   1.00  -
FastConcurrentLru         .NET Framework 4.8   14.639 ns  0.0892 ns   1.07  -
ConcurrentLru             .NET Framework 4.8   17.008 ns  0.2538 ns   1.25  -
FastConcurrentTLru        .NET Framework 4.8   43.854 ns  0.0827 ns   3.22  -
ConcurrentTLru            .NET Framework 4.8   47.954 ns  1.2772 ns   3.52  -
ClassicLru                .NET Framework 4.8   62.683 ns  0.8105 ns   4.60  -
RuntimeMemoryCacheGet     .NET Framework 4.8  287.627 ns  1.3691 ns  21.08  32 B
ExtensionsMemoryCacheGet  .NET Framework 4.8  114.511 ns  0.5902 ns   8.39  24 B

ConcurrentLru Throughput

In this test, we generate 2000 samples of 500 keys with a Zipfian distribution (s = 0.86). Caches have a capacity of 50. From N concurrent threads, fetch the sample keys in sequence (each thread uses the same input keys). The principal scalability limit in concurrent applications is the exclusive resource lock. As the number of threads increases, ConcurrentLru significantly outperforms an LRU implemented with a short-lived exclusive lock used to synchronize the linked list data structure.

This test was run on a Standard D16s v3 Azure VM (16 cpus), with .NET Core 3.1.

Issues

A recent enhancement request, opened by juancarrey:

It would be nice if the caches that support TTL could return the stale value while the cached instance is being revalidated, updating the cache in the background once the newer instance has been retrieved so that later hits see the fresh value.

This would speed up misses by allowing a stale object to be returned while revalidation is in progress.

This could optionally be defined on the policies or cache options.

Thank you :)

Versions

Quick list of the latest released versions

v2.1.0 - Sep 11, 2022

What's Changed

  • Added ConcurrentLfu, a .NET implementation of the W-TinyLfu eviction policy. This closely follows the approach taken by the Caffeine library by Ben Manes - including buffered reads/writes and hill climbing to optimize hit rate. A ConcurrentLfuBuilder provides integration with the existing atomic value factory and scoped value features.
  • To support ConcurrentLfu added the MpscBoundedBuffer and StripedMpscBuffer classes.
  • To support ConcurrentLfu added the ThreadPoolScheduler, BackgroundThreadScheduler and ForegroundScheduler classes.
  • Added the Counter class for fast concurrent counting, based on LongAdder by Doug Lea.
  • Updated ConcurrentLru to use Counter for all metrics and added padding to internal queue counters. This improved throughput by about 2.5x with about 10% worse latency.
  • Added DebuggerTypeProxy types to customize the debugger view of FastConcurrentLru, ConcurrentLru, FastConcurrentTLru, ConcurrentTLru and ConcurrentLfu.
  • API documentation is now included in the NuGet package. Provided documentation for all public APIs.
  • Rewrote and corrected bugs in the throughput analysis tests, which now support Read, Read + Write, Update and Evict scenarios.

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v2.0.0...v2.1.0

v2.0.0 - Jul 29, 2022

What's Changed

  • Split ICache into ICache, IAsyncCache, IScopedCache and IScopedAsyncCache interfaces. Mixing sync and async code paths is problematic and generally discouraged. Splitting sync/async enables the most optimized code for each case. Scoped caches return Lifetime<T> instead of values, and internally have all the boilerplate code to safely resolve races.
  • Added ConcurrentLruBuilder, providing a fluent builder API to ease creation of different cache configurations. Each cache option comes with a small performance overhead. The builder enables the developer to choose the exact combination of options needed, without any penalty from unused features.
  • Cache interfaces have optional metrics, events and policy objects depending on the options chosen when constructing the cache.
  • Implemented optional support for atomic GetOrAdd methods (configurable via ConcurrentLruBuilder), mitigating cache stampede.
  • ConcurrentLru now has configurable hot, warm and cold queue size via ICapacityPartition. Default partition scheme changed from equal to 80% warm via FavorWarmPartition, improving hit rate across all tests.
  • Fixed ConcurrentLru warmup, allowing items to enter the warm queue until warm is full. Improves hit rate across all tests.
  • Added hit rate analysis tests for real world traces from Wikibench, ARC and glimpse workloads. This replicates the test suite used for Java's Caffeine.
  • Async get methods now return ValueTask, reducing memory allocations.
  • Added eviction count to cache metrics

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v1.1.0...v2.0.0

v1.1.0 - Jul 01, 2022

What's Changed

  • Added Trim(int itemCount) to ICache and all derived classes
  • Added TrimExpired() to TLRU classes
  • When an item is updated TLRU classes reset the item timestamp, extending TTL
  • Fixed TLRU long ticks policy on macos
  • Add Cleared and Trimmed to ItemRemovedReason
  • Item removal event that fires on Clear() now has ItemRemovedReason.Cleared instead of Evicted

Full changelog: https://github.com/bitfaster/BitFaster.Caching/compare/v1.0.7...v1.1.0

v1.0.7 - Feb 13, 2022

Added diagnostic features to dump cache contents:

  • ClassicLru/ConcurrentLru family implements IEnumerable<KeyValuePair<K,V>>. Enables enumeration of keys and values in the cache.
  • ClassicLru/ConcurrentLru family implements ICollection Keys. Enables enumeration of the keys in the cache.

v1.0.6 - Nov 17, 2021

  • Implement ItemRemoved event on ConcurrentLru and ConcurrentTLru
  • NuGet package now produced with deterministic build. See https://reproducible-builds.org/ for info

v1.0.5 - Nov 10, 2021

  • Wrap LRU dispose logic in a static class
  • Dispose objects created by LRU valueFactory that are not added to the cache due to a race
  • Fix race in disposed Scoped instances
  • .net6 build target compiled using .NET SDK 6.0.100

v1.0.4 - Sep 11, 2021

  • Add Clear method to ICache and LRUs (ClassicLru, ConcurrentLru, ConcurrentTLru)
  • Fixed ConcurrentLru so that capacity is exact when the constructor argument is not divisible by 3
  • Added net6 build target (compiled with .NET SDK 6.0.0-preview.7)
  • Removed netcoreapp2.0 and netcoreapp3.0 targets (since they are out of support)

v1.0.3 - May 31, 2021

  • Fixed ObjectDisposedException in SingletonCache.Release()
  • Old values are now disposed when replaced when calling the LRU methods TryUpdate() and AddOrUpdate()
  • LRU uses atomic ConcurrentDictionary method to remove items

v1.0.2 - Jan 12, 2021

Added TryUpdate and AddOrUpdate methods to LRU classes to support updating cached values.

Added .NET 5 target. Included target frameworks:

  • .NET Framework 4.7.2 and 4.8
  • .NET Standard 2.0
  • .NET Core 2.0, 3.0 and 3.1
  • .NET 5

v1.0.1 - Jul 09, 2020

Now includes multiple target frameworks:

  • .NET Framework 4.7.2 and 4.8
  • .NET Standard 2.0
  • .NET Core 2.0, 3.0 and 3.1

v1.0.0 - Jul 02, 2020

v0.9.4 - Jun 19, 2020

v0.9.3 - Jun 16, 2020

Library Stats (Sep 12, 2022)

Subscribers: 8
Stars: 175
Forks: 6
Issues: 3
