Hamann Distributed

Making mistakes at scale so you don't have to.

Cache Invalidation Made Easy

“There are only two hard problems in Computer Science: cache invalidation and naming things.”

Phil Karlton

Full disclaimer: No, I didn’t find the perfect solution either (guess it’s an NP-hard problem). For a lot of use cases, one of the generically applicable patterns I like best is explained very well by DHH – my problem with it is that it doesn’t cover the case where an entity’s content changes under the very same ID. I’ll show you another nice, generic pattern for exactly this case, so you have one more trick up your sleeve.

For us, the key problem was distributed tracking applications caching metadata about the campaigns and images we deliver. For example, if a campaign manager changes the target URL, the tracking application needs to redirect to the new target. As tracking is high volume, caching was a no-brainer. For cache invalidation, we initially settled on a pull approach with one-minute refreshes from the database, which could unfortunately serve stale data and obviously wouldn’t scale with a growing number of entries and servers.

Now instead of custom-building something ourselves, we thought of a more generic approach – and for us, it boiled down to a dead simple convention.

How it works

First step, set up a messaging service. If you’re on Amazon like us, SNS (plus SQS, perhaps) fits the bill perfectly and is set up in minutes; otherwise you might consider RabbitMQ or any other AMQP provider.
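
If you go the SNS + SQS route, here is a minimal sketch of the wiring using boto3, the AWS SDK for Python. The topic and queue names are placeholders I made up, and in a real setup you would additionally attach a queue policy that lets SNS deliver to the queue (omitted here).

    # Minimal sketch: wire an SNS topic to a per-app SQS queue with boto3.
    # Topic and queue names are illustrative placeholders.
    import boto3

    sns = boto3.client("sns")
    sqs = boto3.client("sqs")

    # The topic that cache invalidations will be published to.
    topic_arn = sns.create_topic(Name="campaign")["TopicArn"]

    # Each app that caches this entity gets its own queue (fan-out).
    queue_url = sqs.create_queue(QueueName="tracker-1-campaign")["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Hook the queue up to the topic. Note: a queue policy permitting SNS
    # to send messages to this queue is also required (omitted here).
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)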

Second step, create one topic for every entity, named after the entity.
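
With boto3, that convention is a couple of lines; the entity names below are just the examples from this post (campaigns and images), and create_topic is idempotent, so it’s safe to run on every deploy.

    # Sketch: one SNS topic per cached entity, named after the entity.
    import boto3

    sns = boto3.client("sns")
    for entity in ("campaign", "image"):
        sns.create_topic(Name=entity)  # returns the existing ARN if the topic already exists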

Third step, follow an easy convention:

  • Everyone who is mutating this entity (with us, it’s just the API, making things even easier) publishes the entity ID to the topic after writing to the database.
  • Everyone who is caching this entity subscribes to the topic and re-pulls the instance when a “dirty” ID comes in (see the sketch after this list).
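
Here is a rough sketch of both sides of that convention, again with boto3. save_campaign() and refresh_campaign_cache() are hypothetical placeholders for your own database write and cache refresh; note that only the entity ID ever travels through the topic.

    # Sketch of the convention: the publisher marks an entity "dirty" by ID,
    # subscribers re-pull that entity from the source of truth.
    # save_campaign() and refresh_campaign_cache() are hypothetical placeholders.
    import json
    import boto3

    sns = boto3.client("sns")
    sqs = boto3.client("sqs")

    CAMPAIGN_TOPIC_ARN = sns.create_topic(Name="campaign")["TopicArn"]

    # --- Publisher side (with us, the API) ---
    def update_campaign(campaign):
        save_campaign(campaign)                        # write to the database first
        sns.publish(TopicArn=CAMPAIGN_TOPIC_ARN,       # then announce the dirty ID
                    Message=str(campaign["id"]))

    # --- Subscriber side (a tracking app holding a cache) ---
    def consume_invalidations(queue_url):
        while True:
            resp = sqs.receive_message(QueueUrl=queue_url,
                                       MaxNumberOfMessages=10,
                                       WaitTimeSeconds=20)   # long polling
            for msg in resp.get("Messages", []):
                envelope = json.loads(msg["Body"])     # SNS wraps the payload in JSON
                dirty_id = envelope["Message"]
                refresh_campaign_cache(dirty_id)       # re-pull from the database
                sqs.delete_message(QueueUrl=queue_url,
                                   ReceiptHandle=msg["ReceiptHandle"])

Re-pulling by ID instead of pushing the changed content through the queue is exactly what keeps the coupling and security story below so simple.
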
This approach is so beautiful for a lot of reasons:
  • Loose coupling. There are no hard dependencies between any apps following this pattern, yet they all work together magically with no stale caches.
  • Very easy to convert an existing pull-based system.
  • Little to no security hassle. Data never goes through the queue, only IDs.
  • Added plus for SNS users: you can safely leave the dirty work of monitoring and reliably keeping a messaging system up to Amazon.

Caveat lector: Obviously this approach will probably only be viable for medium- to low-volume core and master data. For caching and invalidation at several orders of magnitude higher volume, you’d probably be looking at specialized solutions and optimized modelling (Cassandra, S4 and the like). Also, be careful: this eventually consistent solution still has a small but unbounded window during which the cache can be stale. So if you’re absolutely dependent on consistency in a problem domain, there are few options other than disabling caching altogether.