Consistency & NoSQL introduction

Consistency is a problem that spans both the ER and NoSQL worlds.

ER databases try to ensure strong consistency (ACID transactions are one example of this approach); this works well if we assume a single server where all the transactions take place. In a distributed scenario with commodity clusters (and NoSQL data persistence), consistency issues come into play in different forms:

Update consistency

Let’s consider an example to understand the problem for this consistency scenario:

image002

This is a so-called write-write conflict: two or more updates are made to the same data item, but one update is lost and its committer is not aware of that (in the example, Mario is not aware that row 123 = XYZ, as Jack’s update happens last).

In any type of environment, without concurrency control we cannot control the order in which the updates are serialized.
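The lost update can be reproduced with a minimal sketch. The store, the starting value "OLD" and Mario's value are hypothetical; only row 123 and Jack's final value XYZ come from the example above.

```python
# Hypothetical single-item store; only "XYZ" comes from the example in the text.
store = {"row_123": "OLD"}

def read(key):
    return store[key]

def write(key, value):
    # Unconditional write: overwrites whatever is currently stored.
    store[key] = value

# Both committers read the same starting value.
seen_by_mario = read("row_123")
seen_by_jack = read("row_123")

# Each then writes without checking whether the item changed in between.
write("row_123", "MARIO-VALUE")  # Mario's update (hypothetical value)
write("row_123", "XYZ")          # Jack's update happens last and silently wins

final_value = read("row_123")
print(final_value)
```

Nothing in `write` tells Mario that his update was overwritten, which is exactly what makes the conflict hard to notice.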

Approaches for dealing with consistency

We can divide the approaches into two families:

Pessimistic approach

It prevents conflicts from occurring by using write locks, so each committer needs to acquire a lock before it can commit any change.
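As a rough single-process illustration of the pessimistic approach, the sketch below guards one data item with a write lock. `LockedItem` and `committer` are hypothetical names, and a real data store would use its own locking mechanism rather than `threading.Lock`.

```python
import threading

class LockedItem:
    """A data item guarded by a write lock (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self._write_lock = threading.Lock()

    def update(self, fn):
        # The committer holds the lock for the whole read-modify-write,
        # so no other committer can interleave with it.
        with self._write_lock:
            self.value = fn(self.value)

item = LockedItem(0)

def committer():
    for _ in range(1000):
        item.update(lambda v: v + 1)

threads = [threading.Thread(target=committer) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

result = item.value
print(result)
```

Because every update runs under the lock, none of the 4 x 1000 increments can be lost; the cost is that committers wait for each other.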

Optimistic approach

Conflicts can occur, but they are detected and acted upon using conditional updates; when an inconsistency is detected, the committers have to perform some merge/resolution action.
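A conditional update can be sketched as a version check performed at write time. `VersionedItem` and its methods are hypothetical, as are the starting value and Mario's value; the sketch is single-threaded and only shows the detection logic, not a production implementation.

```python
class VersionedItem:
    """A data item with a version counter (hypothetical helper)."""

    def __init__(self, value):
        self.value = value
        self.version = 0

    def read(self):
        return self.value, self.version

    def conditional_update(self, new_value, expected_version):
        # Apply the write only if nobody updated the item since it was read.
        if self.version != expected_version:
            return False  # conflict detected, nothing is written
        self.value = new_value
        self.version += 1
        return True

item = VersionedItem("OLD")

# Both committers read the item at the same version.
_, mario_version = item.read()
_, jack_version = item.read()

mario_ok = item.conditional_update("MARIO-VALUE", mario_version)  # succeeds
jack_ok = item.conditional_update("XYZ", jack_version)            # rejected

# Jack resolves the conflict: re-read, decide what to keep, then retry.
current_value, fresh_version = item.read()
jack_retry_ok = item.conditional_update("XYZ", fresh_version)
```

Unlike the unconditional write, the second committer is told about the conflict and can decide how to merge or resolve it before retrying.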

As said, these approaches can be implemented only when a consistent serialization of the updates exists. On a single server this is doable; with two or more servers (e.g. peer-to-peer replication), different nodes might apply the updates in different orders because of the nature of the architecture.

N> With peer-to-peer replication, one approach is to use a single node as the target for the updates and apply them there; they are then replicated across the other nodes.

Read consistency

Unfortunately, having update consistency does not guarantee that the data read from the data stores is consistent across the network:

image004

If the data is kept in multiple aggregates, Jack might read partially committed data, as some data can be replicated at different times; this is a logical consistency problem. In an ER database you could use an ACID transaction to wrap the two updates on rows 123 and 999 together, so that Jack reads either both or neither, but in NoSQL the scenario is more complicated. NoSQL supports atomic updates within a single aggregate, but as said, in this case the data is part of two different ones. Here we can think of a so-called inconsistency window: the time during which the aggregates are being replicated across the network. Its length is a function of different factors (data locality, network issues, etc.).

Replication consistency addresses the need to read the same data (value) from different replicas; data that is out of date is referred to as stale.

A related need is so-called read-your-writes consistency: once you have made an update, you keep seeing it, whichever replica you read from. This can be achieved by providing session consistency (if the session is lost, the consistency guarantee is lost too).
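The inconsistency window can be illustrated with two plain dictionaries standing in for a primary node and a replica. The layout, the "old"/"new" values and the replication order are assumptions made for the sketch; only rows 123 and 999 come from the example.

```python
# Two aggregates stored on a primary and copied to a replica (hypothetical values).
primary = {"row_123": "old", "row_999": "old"}
replica = dict(primary)

# The writer updates both aggregates on the primary. Each write is atomic
# on its own, but nothing ties the two writes together.
primary["row_123"] = "new"
primary["row_999"] = "new"

# Replication ships aggregates one at a time; suppose only row_123 has arrived.
replica["row_123"] = primary["row_123"]

# A read from the replica inside the inconsistency window mixes new and old data.
snapshot_during_window = dict(replica)

# Once the second aggregate is replicated, the window closes.
replica["row_999"] = primary["row_999"]
snapshot_after_window = dict(replica)

print(snapshot_during_window)
print(snapshot_after_window)
```

The first snapshot is what a reader like Jack would get inside the window: one row already updated, the other still stale.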

Two techniques to achieve that are:

Sticky session: the session is tied to one node (session affinity), so read-your-writes consistency on that node is enough to ensure session consistency.

N> Unfortunately, this of course hurts load balancing.

Version stamps: every interaction with the data store includes the latest version stamp seen by the session, so a node can make sure it has the updates up to that version before it responds.
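The version-stamp technique can be sketched as follows. `Replica`, `Session` and the single integer stamp are simplifying assumptions; a real store might wait for replication or redirect the request instead of raising an error.

```python
class Replica:
    """A node holding data plus the highest version stamp it has applied."""

    def __init__(self):
        self.data = {}
        self.version = 0

    def apply(self, key, value, version):
        self.data[key] = value
        self.version = version

    def read(self, key, min_version):
        # Refuse to answer if this node has not caught up to what the session saw.
        if self.version < min_version:
            raise LookupError("replica is behind the session's version stamp")
        return self.data.get(key)

class Session:
    """Remembers the latest version stamp seen and sends it with every read."""

    def __init__(self):
        self.last_seen = 0

    def write(self, replica, key, value):
        new_version = replica.version + 1
        replica.apply(key, value, new_version)
        self.last_seen = new_version
        return new_version

    def read(self, replica, key):
        return replica.read(key, self.last_seen)

node_a, node_b = Replica(), Replica()
session = Session()
stamp = session.write(node_a, "row_123", "XYZ")

# node_b has not received the update yet, so it must not serve this session.
try:
    session.read(node_b, "row_123")
    stale_read_rejected = False
except LookupError:
    stale_read_rejected = True

node_b.apply("row_123", "XYZ", stamp)  # replication catches up
value_after_sync = session.read(node_b, "row_123")
```

Compared with sticky sessions, the session is free to talk to any node; the stamp is what prevents it from reading a state older than its own write.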
