Consistency is a problem that spans both the ER and NoSQL worlds.
ER databases try to ensure strong consistency (ACID transactions are an example of this approach); this is fine if we assume a single server where all the transactions take place. In a distributed scenario with commodity clusters (with NoSQL data persistence), consistency issues come into play in different forms.
Let’s consider an example to understand the problem in this consistency scenario:
This is a so-called write-write conflict: two or more updates target the same data item, but one update is lost and its committer is not aware of that (in the example, Mario is not aware that row 123 = XYZ, as Jack’s update happens last).
In either type of environment, without concurrency control we cannot control the serialization of the updates.
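The lost update can be reproduced with a minimal sketch. The in-memory `store`, the helper names and the values "ABC" and "MMM" are assumptions made for illustration; only row 123 and the value XYZ come from the example above.

```python
# Hypothetical in-memory store standing in for a database table.
store = {123: "ABC"}

def read(row_id):
    return store[row_id]

def blind_write(row_id, value):
    # No concurrency control: whoever writes last silently wins.
    store[row_id] = value

# Mario and Jack both read the same starting value.
mario_seen = read(123)
jack_seen = read(123)

# Both commit an update based on what they read.
blind_write(123, "MMM")  # Mario's update (hypothetical value)
blind_write(123, "XYZ")  # Jack's update happens last

# Mario's update is lost and nothing told him so.
final_value = read(123)
print(final_value)
```

Neither write fails, which is exactly the problem: the store accepts both updates and the first committer has no way to find out.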
Approaches for dealing with consistency
We can divide the approaches into two families:
Pessimistic approach: it prevents conflicts from occurring by using write locks, so each committer needs to acquire a lock before it can commit any change
Optimistic approach: conflicts can occur, but it detects them and takes action using conditional updates, so when an inconsistency is detected some merge/resolution action is required from the committers
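A conditional update can be sketched with a per-row version number. This is a single-process illustration under stated assumptions: `VersionedStore`, `VersionConflict` and the values "ABC" and "MMM" are hypothetical, and a real store would have to perform the version check and the write as one atomic step.

```python
class VersionConflict(Exception):
    """Raised when the row changed since the committer last read it."""

class VersionedStore:
    def __init__(self):
        self._rows = {}  # row_id -> (version, value)

    def read(self, row_id):
        # Unknown rows are reported as version 0 with no value.
        return self._rows.get(row_id, (0, None))

    def conditional_update(self, row_id, expected_version, value):
        current_version, _ = self.read(row_id)
        if current_version != expected_version:
            raise VersionConflict(
                f"row {row_id}: expected v{expected_version}, found v{current_version}"
            )
        self._rows[row_id] = (current_version + 1, value)
        return current_version + 1

store = VersionedStore()
store.conditional_update(123, 0, "ABC")

# Mario and Jack read the same version of the row.
mario_version, _ = store.read(123)
jack_version, _ = store.read(123)

store.conditional_update(123, mario_version, "MMM")  # Mario commits first

conflict_detected = False
try:
    store.conditional_update(123, jack_version, "XYZ")  # Jack's view is outdated
except VersionConflict:
    conflict_detected = True
    # Resolution step: re-read, decide what to do, then retry.
    fresh_version, current_value = store.read(123)
    store.conditional_update(123, fresh_version, "XYZ")
```

Unlike the lock-based family, nobody is blocked up front; the cost is that the losing committer has to handle the conflict explicitly.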
As said, these approaches can be implemented only when a consistent serialization of the updates exists. On a single server this is doable; with two or more servers (e.g. peer-to-peer replication), two nodes might apply the updates in a different order because of the nature of the architecture.
N> With peer-to-peer replication, one approach is to pick a single node as the target for the updates: they are applied there first and then replicated across the other nodes
Unfortunately, having update consistency does not guarantee that the data read from the data stores is consistent across the network:
If the data is kept in multiple aggregates, Jack might read partially committed data, since the aggregates can be replicated at different times; this is a logical consistency problem. In an ER database you could use an ACID transaction to wrap the two updates on rows 123 and 999 together, so that Jack would read either both or neither; in NoSQL the scenario is more complicated. NoSQL stores support atomic updates within a single aggregate but, as said, in this case the data belongs to two different aggregates. Here we can speak of a so-called inconsistency window: the time during which the aggregates are still being replicated across the network. Its length is a function of several factors (data locality, network issues, etc.).
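The inconsistency window can be illustrated with a toy model in which each aggregate is replicated separately. The `primary` and `replica` dictionaries, the queue and the key names are assumptions made for this sketch; only the row numbers 123 and 999 come from the text.

```python
from collections import deque

# Two aggregates, each replicated on its own (hypothetical keys and values).
primary = {"row:123": "old", "row:999": "old"}
replica = dict(primary)
replication_queue = deque()  # updates accepted by primary, not yet on replica

def write(key, value):
    primary[key] = value
    replication_queue.append((key, value))

def replicate_one():
    # Ship a single pending update to the replica.
    key, value = replication_queue.popleft()
    replica[key] = value

# Mario updates both aggregates; there is no transaction spanning them.
write("row:123", "new")
write("row:999", "new")

replicate_one()  # only the first aggregate has arrived so far

# Jack reads from the replica inside the inconsistency window.
jack_view = (replica["row:123"], replica["row:999"])

replicate_one()  # the window closes once the second update arrives
later_view = (replica["row:123"], replica["row:999"])

print(jack_view, later_view)
```

Jack sees one new and one old value, a combination that never existed on the primary as a committed state.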
Replication consistency addresses the need to read the same data (value) from different replicas; data that is out of date is referred to as stale.
A related need is so-called read-your-writes consistency: once you have made an update, you are guaranteed to keep seeing it, whichever replica you read from. This can be achieved by providing session consistency, that is, read-your-writes consistency within a session (if the session is lost, the guarantee is lost too).
Two techniques to achieve that are:
Sticky session: the session is tied to one node (session affinity), so read-your-writes consistency on that node is enough to ensure session consistency
N> Unfortunately, this of course hurts load balancing
Version stamps: every interaction with the data store includes the latest version stamp seen by the session; the node serving the request must make sure it has the updates up to that version stamp before responding.
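The version-stamp technique can be sketched as follows. `Replica`, `Cluster` and `Session` are hypothetical names, the stamp is a plain counter, and "catching up" is reduced to applying the pending updates immediately; a real data store would do this differently.

```python
class Replica:
    def __init__(self):
        self.version = 0  # highest version stamp applied on this node
        self.data = {}

    def apply(self, version, key, value):
        self.data[key] = value
        self.version = version

class Cluster:
    def __init__(self):
        self.primary = Replica()
        self.follower = Replica()
        self.pending = []  # updates not yet replicated to the follower

    def write(self, key, value):
        version = self.primary.version + 1
        self.primary.apply(version, key, value)
        self.pending.append((version, key, value))
        return version

    def sync(self):
        for update in self.pending:
            self.follower.apply(*update)
        self.pending.clear()

    def read(self, replica, key, min_version):
        # The node must hold every update up to the session's stamp
        # before it answers; in this sketch it catches up right away.
        if replica.version < min_version:
            self.sync()
        return replica.data.get(key), replica.version

class Session:
    def __init__(self, cluster):
        self.cluster = cluster
        self.last_seen = 0  # latest version stamp seen by this session

    def write(self, key, value):
        self.last_seen = max(self.last_seen, self.cluster.write(key, value))

    def read(self, replica, key):
        value, version = self.cluster.read(replica, key, self.last_seen)
        self.last_seen = max(self.last_seen, version)
        return value

cluster = Cluster()
session = Session(cluster)
session.write("row:123", "XYZ")

# A raw read on the follower is stale: replication has not happened yet.
stale_value = cluster.follower.data.get("row:123")

# A session read carries the stamp, so the follower catches up first.
value = session.read(cluster.follower, "row:123")
```

Compared with sticky sessions, the session can talk to any node, at the price of sending the stamp along with every request.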