
Interview question: How do you ensure the high availability of a message queue?



How do you ensure the high availability of a message queue?


Analysis of the Interview Question

If an interviewer asks about your MQ knowledge, high availability is almost certain to come up. Asking it in this general form is a good question. An interviewer shouldn't ask "How do you ensure the high availability of Kafka?" or "How do you ensure the high availability of ActiveMQ?", because that would be a poor question: the candidate may only have used RabbitMQ. If someone has never used Kafka and you open by asking them how Kafka works, you are just making things difficult for them.

So a competent interviewer asks "How do you ensure the high availability of an MQ?" That way, you can explain high availability for whichever MQ you have actually used.


1. High availability of RabbitMQ

RabbitMQ is fairly representative because its high availability is based on a master-slave (non-distributed) design, so we use it as the example of the first type of MQ high availability.


RabbitMQ has three modes: stand-alone mode, normal cluster mode, and mirrored cluster mode.


Stand-alone mode

Stand-alone mode is demo-level: you typically start it locally to experiment, and nobody runs stand-alone mode in production.


Normal cluster mode (no high availability)

Normal cluster mode means starting multiple RabbitMQ instances on multiple machines, one instance per machine. A queue you create lives on only one RabbitMQ instance, but every instance synchronizes the queue's metadata. (Metadata is essentially the queue's configuration; from it, any instance can find the instance where the queue is located.) When you consume and happen to be connected to a different instance, that instance pulls the data from the instance that holds the queue.
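To make this concrete, here is a toy in-memory model of a normal cluster. It is a sketch under simplifying assumptions, not RabbitMQ's API: the `Cluster` and `Node` classes and the `pulls` counter are hypothetical. Every node shares the queue's metadata, but the messages live only on the owning node. Consuming through any other node costs an extra cross-node pull.

```python
from collections import deque


class Node:
    """One RabbitMQ instance in a toy normal-cluster model (hypothetical, not RabbitMQ's API)."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.queues = {}  # queue name -> messages; only present on the owning node

    def consume(self, queue):
        # Every node has the replicated metadata, so it can locate the owning node.
        owner = self.cluster.metadata[queue]
        if owner is not self:
            # Not the owner: fetch the message from the owner on the client's behalf.
            self.cluster.pulls += 1
        return owner.queues[queue].popleft()


class Cluster:
    def __init__(self, names):
        self.metadata = {}  # replicated on all nodes: queue name -> owning node
        self.pulls = 0      # cross-node transfers caused by consuming via a non-owner
        self.nodes = {n: Node(n, self) for n in names}

    def declare(self, queue, on):
        owner = self.nodes[on]
        owner.queues[queue] = deque()  # the messages live on this node only
        self.metadata[queue] = owner

    def publish(self, queue, msg):
        self.metadata[queue].queues[queue].append(msg)


cluster = Cluster(["rabbit1", "rabbit2", "rabbit3"])
cluster.declare("orders", on="rabbit1")
cluster.publish("orders", "order-1")
print(cluster.nodes["rabbit2"].consume("orders"), cluster.pulls)  # order-1 1
```

Consuming through `rabbit2` works, but only because `rabbit2` quietly pulls the message from `rabbit1`. This is the overhead described next.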

This approach is cumbersome and not very good. It isn't truly distributed; it is just an ordinary cluster. You either connect to a random instance each time and let it pull the data, or you always connect to the instance that holds the queue. The former adds data-pulling overhead, and the latter creates a single-instance performance bottleneck.


And if the instance holding the queue goes down, the other instances can no longer pull from it. If you have enabled message persistence so that RabbitMQ writes messages to disk, the messages will not necessarily be lost. However, you have to wait for that instance to recover before you can consume from the queue again.
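The failure case can be sketched the same way. The `QueueOwner` class below is a hypothetical illustration, not how RabbitMQ stores data internally. Persistence only means the messages survive on the owner's own disk. No other instance can serve them until that node comes back.

```python
class QueueOwner:
    """Toy model of the one node that holds a queue in normal cluster mode (hypothetical)."""

    def __init__(self, persistent):
        self.persistent = persistent
        self.up = True
        self.memory = []  # messages held in RAM
        self.disk = []    # copy written to disk, only when persistence is enabled

    def publish(self, msg):
        self.memory.append(msg)
        if self.persistent:
            self.disk.append(msg)

    def crash(self):
        self.up = False
        self.memory = []  # whatever was only in RAM is gone

    def recover(self):
        self.up = True
        self.memory = list(self.disk)  # reload the persisted messages

    def consume(self):
        if not self.up:
            # No other instance holds these messages, so nothing can be served.
            raise ConnectionError("queue owner is down")
        msg = self.memory.pop(0)
        if self.persistent:
            self.disk.pop(0)  # disk and RAM hold the same messages in the same order
        return msg


owner = QueueOwner(persistent=True)
owner.publish("m1")
owner.crash()
owner.recover()          # the queue is usable again only after the owner is back
print(owner.consume())   # m1
```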


So this setup is awkward: there is no real high availability. Its main purpose is to improve throughput, that is, to let multiple nodes in the cluster serve the read and write operations of a queue.


Mirrored cluster mode (high availability)

This is RabbitMQ's high-availability mode. Unlike normal cluster mode, mirrored cluster mode places a queue you create on multiple instances, including both its metadata and its messages. Each RabbitMQ node holds a complete mirror of the queue, meaning all of the queue's data. Every time you write a message to the queue, the message is automatically synchronized to the queue on the other instances.
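A toy model of a mirrored queue is shown below. `MirroredQueue` is hypothetical and ignores real details such as resynchronizing a node that comes back. Every write lands on every live replica, so when the master fails, a mirror can be promoted without losing data.

```python
class MirroredQueue:
    """Toy mirrored queue: every node keeps a full copy (hypothetical, not RabbitMQ's API)."""

    def __init__(self, nodes):
        self.replicas = {n: [] for n in nodes}  # node -> complete copy of the queue
        self.master = nodes[0]
        self.down = set()

    def publish(self, msg):
        # Each write is synchronized to every live replica.
        for node, copy in self.replicas.items():
            if node not in self.down:
                copy.append(msg)

    def fail(self, node):
        self.down.add(node)
        if node == self.master:
            # Promote a surviving mirror; it already holds the full data.
            self.master = next(n for n in self.replicas if n not in self.down)

    def consume(self):
        msg = self.replicas[self.master].pop(0)
        for node, copy in self.replicas.items():
            if node != self.master and node not in self.down:
                copy.pop(0)  # keep live mirrors in step with the master
        return msg


q = MirroredQueue(["rabbit1", "rabbit2", "rabbit3"])
q.publish("m1")
q.fail("rabbit1")
print(q.master, q.consume())  # rabbit2 m1
```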

So how do you turn on mirrored cluster mode? It is actually very simple. RabbitMQ has a good management console where you add a policy, and this policy is the mirrored-cluster policy. When you define it, you can require data to be synchronized to all nodes, or to a specified number of nodes. When you then create a queue, the policy is applied and the queue's data is automatically synchronized to the other nodes.
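The policy body can be sketched as data. The keys `ha-mode` (`all` or `exactly`), `ha-params` and `ha-sync-mode` are the ones RabbitMQ's classic mirroring policies use. Such a policy is typically set with `rabbitmqctl set_policy` or in the management UI. Note that classic queue mirroring was deprecated in later 3.x releases and removed in RabbitMQ 4.0. The helpers `mirroring_policy` and `nodes_for_queue` are hypothetical, and the "first N nodes" placement is a simplification, not RabbitMQ's actual mirror placement.

```python
import json
import re


def mirroring_policy(pattern, mode="all", count=None):
    """Build a classic-mirroring policy body (keys as in RabbitMQ's HA policies)."""
    if mode not in ("all", "exactly"):
        raise ValueError("this sketch only covers ha-mode 'all' and 'exactly'")
    definition = {"ha-mode": mode, "ha-sync-mode": "automatic"}
    if mode == "exactly":
        definition["ha-params"] = count  # total number of copies, master included
    return {"pattern": pattern, "definition": definition, "apply-to": "queues"}


def nodes_for_queue(queue, policy, cluster_nodes):
    """Toy placement: which nodes would hold a copy of `queue` under `policy`."""
    if not re.search(policy["pattern"], queue):
        return list(cluster_nodes[:1])  # no matching policy: the queue stays on one node
    d = policy["definition"]
    if d["ha-mode"] == "all":
        return list(cluster_nodes)
    return list(cluster_nodes[: d["ha-params"]])


if __name__ == "__main__":
    # Mirror every queue whose name starts with "ha." onto exactly two nodes.
    print(json.dumps(mirroring_policy(r"^ha\.", "exactly", 2)))
```

Mirroring to `all` nodes maximizes redundancy. Mirroring to `exactly` N nodes limits the replication cost discussed next.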

The advantage is that if any one of your machines goes down, that's fine: the other nodes also hold the queue's complete data, so consumers can switch to another node and keep consuming. There are two disadvantages. First, the performance overhead is high, because every message must be synchronized to all machines, which puts heavy pressure on network bandwidth. Second, this setup is not distributed and has no scalability at all. If a queue is heavily loaded and you add a machine, the new machine also holds all of that queue's data, so there is no way to scale the queue linearly.
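A back-of-the-envelope calculation shows why mirroring does not scale out. It assumes a hypothetical 10,000-byte queue and 100-byte messages, and ignores protocol overhead. Per-node storage stays at the full queue size however many nodes you add, and replication traffic grows with the node count. The `sharded_cost` contrast is a hypothetical even split, the approach Kafka takes below.

```python
def mirrored_cost(queue_bytes, publish_bytes, nodes):
    """Per-node storage and replication traffic when every node mirrors the queue."""
    per_node_storage = queue_bytes                     # each node keeps the full queue
    replication_traffic = publish_bytes * (nodes - 1)  # each write is copied to every mirror
    return per_node_storage, replication_traffic


def sharded_cost(queue_bytes, nodes):
    """Hypothetical contrast: per-node storage if the data were split evenly instead."""
    return queue_bytes / nodes


for n in (1, 3, 5):
    storage, traffic = mirrored_cost(queue_bytes=10_000, publish_bytes=100, nodes=n)
    print(f"{n} nodes: mirrored {storage} B per node, {traffic} B replicated per message; "
          f"sharded {sharded_cost(10_000, n):.0f} B per node")
```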


2. Kafka's high availability

The most basic understanding of Kafka's architecture is this. Kafka consists of multiple brokers, and each broker is a node. A topic you create can be divided into multiple partitions. Each partition can live on a different broker, and each partition holds part of the topic's data.


This makes Kafka a naturally distributed message queue: a topic's data is spread across multiple machines, and each machine holds a portion of it.
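A small sketch shows how a topic's data ends up spread across brokers. The round-robin partition placement is a simplification, and `partition_for` is a hypothetical CRC32-based partitioner. Kafka's default producer partitioner hashes keys with murmur2 instead, but the idea is the same: each key maps to one partition, and each partition lives on one broker.

```python
import zlib
from collections import defaultdict


def assign_partitions(num_partitions, brokers):
    """Spread a topic's partitions over brokers round-robin (simplified placement)."""
    return {p: brokers[p % len(brokers)] for p in range(num_partitions)}


def partition_for(key, num_partitions):
    # Hypothetical partitioner using CRC32; Kafka's default producer uses murmur2.
    return zlib.crc32(key.encode()) % num_partitions


def distribute(keys, num_partitions, brokers):
    """Return broker -> keys stored there: each broker holds only part of the topic."""
    placement = assign_partitions(num_partitions, brokers)
    per_broker = defaultdict(list)
    for k in keys:
        per_broker[placement[partition_for(k, num_partitions)]].append(k)
    return dict(per_broker)


print(assign_partitions(3, ["broker0", "broker1", "broker2"]))
# {0: 'broker0', 1: 'broker1', 2: 'broker2'}
```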


RabbitMQ, by contrast, is not a distributed message queue. It is a traditional message queue that merely provides some clustering and HA (High Availability) mechanisms. However you configure it, a RabbitMQ queue's data is stored on one node, and under the mirrored cluster each node stores the queue's complete data.


Before Kafka 0.8 there was no HA mechanism. If any broker went down, the partitions on that broker became unavailable and could be neither written nor read, so there was no high availability at all.


For example, suppose a topic is created with 3 partitions, one on each of three machines. If the second machine goes down, 1/3 of the topic's data is lost, so this is not highly available.
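The arithmetic of this example can be checked with a tiny helper. `unavailable_fraction` is a hypothetical name, and the replicated layout is an illustrative assumption. It counts partitions with no live copy: with one copy per partition, losing one of three brokers takes out a third of the topic, and with two copies, nothing is lost.

```python
from fractions import Fraction


def unavailable_fraction(placement, down):
    """placement: partition -> brokers holding a copy; down: set of dead brokers."""
    lost = [p for p, holders in placement.items() if all(b in down for b in holders)]
    return Fraction(len(lost), len(placement))


single = {0: ["b1"], 1: ["b2"], 2: ["b3"]}                        # pre-0.8: one copy each
replicated = {0: ["b1", "b2"], 1: ["b2", "b3"], 2: ["b3", "b1"]}  # two copies each
print(unavailable_fraction(single, {"b2"}), unavailable_fraction(replicated, {"b2"}))  # 1/3 0
```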

Since Kafka 0.8, an HA mechanism has been provided: replication. Each partition's data is synchronized to other machines, forming multiple replicas of the partition. The replicas elect a leader, producers and consumers deal only with that leader, and the other replicas are followers. On writes, the leader is responsible for getting the data synchronized to all followers; on reads, data is read directly from the leader. Why read and write only through the leader? Simple: if you could read and write any follower at will, you would have to handle data consistency, and the resulting system complexity would make problems much more likely. Kafka distributes all replicas of a partition evenly across different machines to improve fault tolerance.
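Replica placement can be sketched as follows. `assign_replicas` is a hypothetical, simplified round-robin. Kafka's real assignment also randomizes the starting broker and shifts follower positions. The key property holds here: the replicas of one partition never share a broker, and leadership is spread across brokers.

```python
def assign_replicas(num_partitions, brokers, replication_factor):
    """Simplified round-robin replica placement; the first replica acts as leader."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    n = len(brokers)
    assignment = {}
    for p in range(num_partitions):
        # Consecutive brokers starting at p % n, so no two replicas share a broker.
        replicas = [brokers[(p + i) % n] for i in range(replication_factor)]
        assignment[p] = {"leader": replicas[0], "followers": replicas[1:]}
    return assignment


print(assign_replicas(3, ["b0", "b1", "b2"], 2)[1])  # {'leader': 'b1', 'followers': ['b2']}
```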

This is what provides high availability. If a broker goes down, that's fine, because the partitions on that broker have copies on other machines. If the failed broker was the leader of some partition, a new leader is elected from that partition's followers, and everyone continues reading and writing through the new leader.
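Failover can be sketched as a toy controller step. `elect_after_failure` is hypothetical, and it only picks new leaders from the in-sync replicas (ISR), which matches Kafka's behavior when unclean leader election is disabled. When a broker dies, it drops out of every ISR, and any partition it led gets a surviving in-sync follower as its new leader. If no in-sync replica is left, the partition goes offline.

```python
def elect_after_failure(partitions, dead):
    """partitions: id -> {"leader": broker, "isr": [in-sync brokers]} (toy controller logic)."""
    for state in partitions.values():
        state["isr"] = [b for b in state["isr"] if b != dead]
        if state["leader"] == dead:
            # Pick a surviving in-sync follower; None means the partition is offline.
            state["leader"] = state["isr"][0] if state["isr"] else None
    return partitions


parts = {
    0: {"leader": "b0", "isr": ["b0", "b1"]},
    1: {"leader": "b1", "isr": ["b1", "b2"]},
}
elect_after_failure(parts, "b1")  # partition 1 is now led by b2
```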

When writing data, the producer writes to the leader, the leader writes the data to its local disk, and the followers actively pull the data from the leader. Once a follower has synchronized the data, it sends an ack to the leader. After the leader has received acks from all followers, it returns a write-success response to the producer. (This is only one of the modes; the behavior can be tuned.)
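The write path can be sketched with a toy partition. In the real producer client, this trade-off is the `acks` setting (`0`, `1` or `all`). Strictly speaking, `all` waits for the in-sync replicas, and that is often paired with the topic setting `min.insync.replicas`. The `Partition` class and its methods are hypothetical. Followers pull from the leader, and a write counts as acknowledged under `acks="all"` only once every follower has copied it.

```python
class Partition:
    """Toy leader plus followers; followers pull (fetch) from the leader (hypothetical)."""

    def __init__(self, followers):
        self.leader_log = []
        self.follower_logs = {f: [] for f in followers}

    def append(self, record):
        self.leader_log.append(record)   # leader writes to its local log first
        return len(self.leader_log) - 1  # offset of the new record

    def fetch(self, follower):
        # A follower copies whatever it has not replicated yet from the leader.
        log = self.follower_logs[follower]
        log.extend(self.leader_log[len(log):])

    def is_acked(self, offset, acks):
        if acks == "1":
            return offset < len(self.leader_log)  # leader's local write is enough
        if acks == "all":
            return all(len(log) > offset for log in self.follower_logs.values())
        raise ValueError("this sketch supports acks '1' and 'all'")


p = Partition(["f1", "f2"])
off = p.append("m0")
p.fetch("f1")
p.fetch("f2")
print(p.is_acked(off, "all"))  # True
```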

When consuming, data is read only from the leader. A message becomes readable by consumers only after it has been synchronized to all followers and they have acknowledged it.
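Kafka tracks this boundary as the high watermark. The leader can hold records beyond it, but consumers only see records below it. The sketch below is a simplification that assumes every follower is in sync and that log end offsets are plain record counts. The function names are hypothetical.

```python
def high_watermark(leader_leo, follower_leos):
    """Offsets below the returned value are present on every in-sync replica."""
    return min([leader_leo, *follower_leos])


def readable(leader_log, follower_leos):
    # Consumers only see records below the high watermark, even though the leader
    # may already hold newer records.
    return leader_log[: high_watermark(len(leader_log), follower_leos)]


print(readable(["a", "b", "c"], [2, 3]))  # ['a', 'b']
```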


