Interview question: Distributed transaction

  • 1. What is a distributed transaction?

With the popularity of microservice architecture, a large business system is often composed of several subsystems, each with its own independent database. A single business process often spans multiple subsystems, and its operations may need to complete within one transaction. Such scenarios are ubiquitous in microservice systems. Supporting them requires some mechanism for transactions that span multiple databases, which is what is commonly called a "distributed transaction".

  • 2. CAP theorem

The CAP theorem states that a distributed system can satisfy at most two of the three requirements C, A, and P at the same time.

2.1 The meaning of CAP

C: Consistency

Whether all copies of the same data hold the same value at the same time.

A: Availability

Availability: the system is called available when every request receives a definite response within a bounded period of time.

P: Partition tolerance

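The system keeps providing service even when a network failure splits its nodes into groups that cannot communicate with each other. Deploying the same service on multiple nodes means that when one node is down or unreachable, other nodes can still provide that service.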

3. Distributed transaction protocols

3.1 2PC

2PC stands for the two-phase commit protocol. One of the difficulties of distributed systems is keeping multiple nodes consistent when they perform transactional operations. The two-phase commit algorithm addresses this under the following assumptions:

  • In the distributed system, one node acts as the coordinator (Coordinator) and the other nodes act as participants (Cohorts). The nodes can communicate with each other over the network.
  • All nodes use write-ahead logs, and once written, the logs are kept on reliable storage. Even if a node fails, its log data is not lost.
  • No node is damaged permanently; a node that fails can always be recovered.

1) The first stage (voting stage)

  • The coordinator node asks all participant nodes whether they can commit the transaction, and starts waiting for a response from each participant node.
  • Each participant node executes all transaction operations up to the point at which the query was issued, and writes Undo and Redo information to its log. (Note: if this succeeds, the participant has already performed the transaction operations; it just has not committed them.)
  • Each participant node responds to the query from the coordinator node. If its transaction operations actually succeeded, it returns an "agree" message; if they actually failed, it returns an "abort" message.

2) The second stage (commit stage)

When the response messages the coordinator node receives from all participant nodes are "agree":

  • The coordinator node sends a "commit" request to all participant nodes.
  • The participant node formally completes the operation and releases the resources occupied during the entire transaction period.
  • The participant node sends a "complete" message to the coordinator node.
  • The coordinator node completes the transaction after receiving the "complete" message fed back by all participant nodes.

If any participant node returns "abort" in the first stage, or the coordinator node times out before every participant has responded:

  • The coordinator node sends a "rollback" request to all participant nodes.
  • Participant nodes use the previously written Undo information to perform rollback and release the resources occupied during the entire transaction period.
  • The participant node sends a "rollback complete" message to the coordinator node.
  • The coordinator node cancels the transaction after receiving the "rollback complete" message fed back by all participant nodes.
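
The two stages above can be sketched in a few lines of Python. This is only an in-memory illustration under simplifying assumptions: the `Participant` class, its `will_succeed` flag, and the `two_phase_commit` function are hypothetical names invented for this sketch, there is no network, no timeout handling, and the "log" is a plain list rather than reliable storage.

```python
from enum import Enum


class Vote(Enum):
    AGREE = "agree"
    ABORT = "abort"


class Participant:
    """Hypothetical in-memory stand-in for a participant node."""

    def __init__(self, name, will_succeed=True):
        self.name = name
        self.will_succeed = will_succeed  # simulates success or failure of the local work
        self.log = []                     # stands in for the write-ahead log
        self.state = "init"

    def prepare(self):
        # Stage 1: do the work, record Undo/Redo information, then vote.
        self.log.append("undo/redo")
        if self.will_succeed:
            self.state = "prepared"
            return Vote.AGREE
        self.state = "failed"
        return Vote.ABORT

    def commit(self):
        # Stage 2 (all agreed): make the work permanent and release resources.
        self.state = "committed"
        return "complete"

    def rollback(self):
        # Stage 2 (someone aborted): undo the work using the logged Undo information.
        self.state = "rolled_back"
        return "rollback complete"


def two_phase_commit(participants):
    """Coordinator logic: returns "committed" or "aborted"."""
    # Stage 1 (voting): ask every participant and collect the votes.
    votes = [p.prepare() for p in participants]

    # Stage 2: commit only if every single vote is "agree".
    if all(v is Vote.AGREE for v in votes):
        for p in participants:
            p.commit()
        return "committed"

    for p in participants:
        p.rollback()
    return "aborted"
```

Calling `two_phase_commit([Participant("a"), Participant("b", will_succeed=False)])` in this sketch rolls back both participants, because a single "abort" vote is enough to cancel the whole transaction.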

3.2 3PC

Three-phase commit protocol

Three-phase commit has three stages: CanCommit, PreCommit, and DoCommit.
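
As a rough illustration of how the three stages chain together, here is a minimal in-memory sketch. What each stage does here follows the usual textbook description of 3PC rather than anything stated in these notes, and the `Participant3PC` class and `three_phase_commit` function are hypothetical names. The timeout behavior that distinguishes 3PC from 2PC is deliberately left out.

```python
class Participant3PC:
    """Hypothetical in-memory participant for a three-phase commit sketch."""

    def __init__(self, able=True):
        self.able = able      # whether this node thinks it can commit
        self.state = "init"

    def can_commit(self):
        # CanCommit: only answer the question, do no transactional work yet.
        return self.able

    def pre_commit(self):
        # PreCommit: do the work and log it, but do not commit.
        self.state = "pre_committed"
        return True

    def do_commit(self):
        # DoCommit: make the work permanent.
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def three_phase_commit(participants):
    """Coordinator logic without timeouts: returns "committed" or "aborted"."""
    # Stage 1: CanCommit. Ask everyone; any "no" aborts the transaction.
    answers = [p.can_commit() for p in participants]
    if not all(answers):
        for p in participants:
            p.abort()
        return "aborted"

    # Stage 2: PreCommit. Everyone prepares; any failure aborts.
    acks = [p.pre_commit() for p in participants]
    if not all(acks):
        for p in participants:
            p.abort()
        return "aborted"

    # Stage 3: DoCommit. Everyone commits.
    for p in participants:
        p.do_commit()
    return "committed"
```

The extra CanCommit round lets a participant refuse before it has done any work or locked any resources, which is the main structural difference from the 2PC flow.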

3.3 TCC

Try Confirm Cancel

  • Try: Attempt the business operation

This phase does not execute the business operation itself. It performs all the business consistency checks and reserves all the resources that execution will need.

  • Confirm: Execute the business operation

This phase actually executes the business operation. Because the consistency checks were completed in the Try phase, it runs directly without further checks, using only the business resources reserved in the Try phase.

  • Cancel: Cancel the business operation

If business execution fails, the transaction enters the Cancel phase, which releases all the business resources reserved in the Try phase and undoes any operations already performed.
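
The three TCC phases can be illustrated with a small sketch. It assumes a made-up scenario in which an order must reserve both stock and funds; the `Reservation` class and the `run_tcc` function are hypothetical names for this illustration, and everything runs in memory with no retries or persistence.

```python
class Reservation:
    """Hypothetical resource that tracks available and reserved amounts."""

    def __init__(self, available):
        self.available = available
        self.reserved = 0

    def try_(self, amount):
        # Try: check the business rule and reserve the resource, nothing more.
        if amount > self.available:
            return False
        self.available -= amount
        self.reserved += amount
        return True

    def confirm(self, amount):
        # Confirm: consume what was reserved; no further checks are needed.
        self.reserved -= amount

    def cancel(self, amount):
        # Cancel: give the reserved amount back.
        self.reserved -= amount
        self.available += amount


def run_tcc(steps):
    """steps is a list of (resource, amount) pairs. Returns True if confirmed."""
    tried = []
    for resource, amount in steps:
        if resource.try_(amount):
            tried.append((resource, amount))
        else:
            # One Try failed: cancel every reservation made so far.
            for r, a in tried:
                r.cancel(a)
            return False

    # Every Try succeeded: confirm them all.
    for r, a in tried:
        r.confirm(a)
    return True
```

For example, with `stock = Reservation(5)` and `funds = Reservation(100)`, `run_tcc([(stock, 2), (funds, 60)])` confirms both reservations. Running the same call a second time fails on the funds check, so the stock reservation made in that run is cancelled and both resources keep their previous values.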

TCC case:


4. Solutions:

  • Global transactions:
JTA: Java Transaction API
  • Distributed transaction based on reliable message queue

Message queue: messages may be delayed or delivered more than once, so consumers need to handle repeated consumption.
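
One common way to cope with repeated consumption is to make the consumer idempotent by remembering which message IDs it has already handled. The sketch below assumes each message is a dictionary with a unique `id` and an `amount`; the `IdempotentConsumer` class and those field names are hypothetical, and a real system would keep the processed IDs in durable storage rather than in an in-memory set.

```python
class IdempotentConsumer:
    """Hypothetical consumer that applies each message at most once."""

    def __init__(self):
        self.processed_ids = set()  # IDs of messages already applied
        self.balance = 0

    def handle(self, message):
        """Apply the message unless it was seen before. Returns True if applied."""
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            # Duplicate delivery: ignore it so the effect is not applied twice.
            return False
        self.balance += message["amount"]
        self.processed_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
msg = {"id": "tx-1", "amount": 10}
first = consumer.handle(msg)   # applied
second = consumer.handle(msg)  # same message delivered again, skipped
```

After both calls the balance has been increased only once, which is the property a consumer needs when the queue cannot guarantee exactly-once delivery.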