the CAP theorem

In NoSql solutions there is always a tradeoff between consistency availability and partition tolerance. Thinking of ER architectures we have the possibility to relax isolation levels to reach acceptable performance, for example with read uncommitted isolation where data not committed yet can be read, or using read committed transaction level to eliminate read-write conflicts, or the serialised isolation, the highest one.

But what happens in the NoSql world where the scenario is much different from a single-server system (where most relational db systems live in)? We have a high number of clusters that communicate between them and operate on the same data (replicated across the network); the goal (ideally) is to have all the clusters in sync with the same data so, in a way, the clusters should be tolerant to network partitions that might happen in case of communication issues: the lack of “communication” leads to a trade off of consistency versus availability (if a cluster cannot communicate with the other its data might not be consistent so I might decide to limit the data processing and hence limiting the availability of my system to reply to requests)

Let's see some example to dig into the problem and understand the approach:

Let's suppose we have an hotel booking system, and this is our biz scenario:

Mario is booking an Holiday package (last on left) and he's connected to a server in Dublin (where the room and offers are kept), in the middle time Jack is looking at the same offer and its's connected to another server in NY. To ensure consistency both the servers need to agree on who is going to get the special offer and who not: this ensure consistency; but it the network link breaks (between Dublin and NY) so we have sacrifice availability, that means nobody can book. This is the idea of tradeoff in practice: if the network is partitioned (line broken) you cannot have consistency and availability too…

In real life implementation some extra booking are allowed or some extra rooms (in hotels) are kept to accommodate overbooking case; this is necessary and sometimes is not possible to implement the ideal scenario where all the data is consistent, as possible using Acid transaction in non big data solutions; we can rely on the aggregate oriented transaction nature of NoSql as dbs support them, so the biz requirement are relevant in the design of how we build the aggregates… we can think of availability as the maximum latency the system can tolerate; when it gets too high, we give up and treat data as unavailable…


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s