This article looks at how the RocketMQ Broker implements a highly available, high-concurrency message relay service.
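The broker's main job is to store messages, so the focus here is on how it handles them. I'll start with a few questions and answer them by reading the code.
How does the broker register with the NameServer at startup?
How are messages sent by a producer stored?
How does a consumer pull data from the broker?
How is high availability achieved? If a broker goes down, the data must have a backup.
Registration happens at startup: the broker registers its information with every NameServer. The NameServer addresses can be configured at startup. The code is in org.apache.rocketmq.broker.out.BrokerOuterAPI#registerBrokerAll; I have omitted the rest of the method here.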
public List<RegisterBrokerResult> registerBrokerAll(
    final String clusterName,
    final String brokerAddr,
    final String brokerName,
    final long brokerId,
    final String haServerAddr,
    final TopicConfigSerializeWrapper topicConfigWrapper,
    final List<String> filterServerList,
    final boolean oneway,
    final int timeoutMills,
    final boolean enableActingMaster,
    final boolean compressed,
    final Long heartbeatTimeoutMillis,
    final BrokerIdentity brokerIdentity) {

    final List<RegisterBrokerResult> registerBrokerResultList = new CopyOnWriteArrayList<>();
    List<String> nameServerAddressList = this.remotingClient.getAvailableNameSrvList();
    if (nameServerAddressList != null && nameServerAddressList.size() > 0) {
        final CountDownLatch countDownLatch = new CountDownLatch(nameServerAddressList.size());
        for (final String namesrvAddr : nameServerAddressList) {
            brokerOuterExecutor.execute(new AbstractBrokerRunnable(brokerIdentity) {
                @Override
                public void run2() {
                    try {
                        RegisterBrokerResult result = registerBroker(namesrvAddr, oneway, timeoutMills, requestHeader, body);
                        if (result != null) {
                            registerBrokerResultList.add(result);
                        }
                        LOGGER.info("Registering current broker to name server completed. TargetHost={}", namesrvAddr);
                    } catch (Exception e) {
                        LOGGER.error("Failed to register current broker to name server. TargetHost={}", namesrvAddr, e);
                    } finally {
                        countDownLatch.countDown();
                    }
                }
            });
        }
        try {
            if (!countDownLatch.await(timeoutMills, TimeUnit.MILLISECONDS)) {
                LOGGER.warn("Registration to one or more name servers does NOT complete within deadline. Timeout threshold: {}ms", timeoutMills);
            }
        } catch (InterruptedException ignore) {
        }
    }
    return registerBrokerResultList;
}
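A CountDownLatch is used to wait until registration with every NameServer has finished. If that does not happen within timeoutMills, a warning is logged and the method returns whatever results have arrived so far.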
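The fan-out-and-wait pattern used above can be tried in isolation. The following is a minimal sketch, not RocketMQ code: the class FanOutRegistration, the registerOne callback and the addresses are all made up for illustration, and a plain fixed thread pool stands in for brokerOuterExecutor.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class FanOutRegistration {
    /** Calls registerOne for every address in parallel and waits at most timeoutMillis. */
    static List<String> registerAll(List<String> addrs, Function<String, String> registerOne,
                                    long timeoutMillis) throws InterruptedException {
        List<String> results = new CopyOnWriteArrayList<>(); // written from worker threads
        if (addrs.isEmpty()) {
            return results;
        }
        ExecutorService pool = Executors.newFixedThreadPool(addrs.size());
        CountDownLatch latch = new CountDownLatch(addrs.size());
        try {
            for (String addr : addrs) {
                pool.execute(() -> {
                    try {
                        String r = registerOne.apply(addr);
                        if (r != null) {
                            results.add(r);
                        }
                    } catch (RuntimeException e) {
                        System.err.println("register failed: " + addr + " " + e);
                    } finally {
                        latch.countDown(); // count down on success and on failure
                    }
                });
            }
            // A timeout is only reported; results collected so far are still returned.
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                System.err.println("registration did not finish within " + timeoutMillis + "ms");
            }
        } finally {
            pool.shutdownNow();
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical addresses; the lambda stands in for a real remote call.
        System.out.println(registerAll(List.of("ns1:9876", "ns2:9876"), a -> "ok@" + a, 1000));
    }
}
```

Counting down in finally is what keeps one failing NameServer from holding up the whole wait.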
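For how messages are stored, see the design documentation on the official site. I quote part of it here.
In the message storage architecture, three kinds of files are involved in storing messages.
(1) CommitLog: the main store for message bodies and metadata. It holds the message bodies written by producers; messages are not fixed-length. Each file is 1 GB by default. The file name is 20 digits long, left-padded with zeros, and the remaining digits are the file's starting offset. For example, 00000000000000000000 is the first file, with starting offset 0 and size 1 GB = 1073741824 bytes. When the first file is full, the second file is 00000000001073741824, with starting offset 1073741824, and so on. Messages are appended sequentially to the log file; when a file is full, writing continues in the next one.
(2) ConsumeQueue: the consumption index, introduced mainly to speed up message consumption. As the index for consuming messages, it stores, for each queue of a given topic, the starting physical offset of the message in the CommitLog, the message size, and the hash code of the message tag. A consumequeue file can be seen as a per-topic index into the commitlog, so the consumequeue directory is organized in three levels: topic/queue/file.
(3) IndexFile: the index file provides a way to query messages by key or by time range. Index files are stored at $HOME/store/index/{fileName}, where fileName is the timestamp at which the file was created. A single IndexFile has a fixed size of about 400 MB and can hold 20 million index entries. Its underlying layout implements a HashMap structure in the file system, so RocketMQ's index file is a hash index.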
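The CommitLog naming rule in (1) is easy to check with a few lines of code. This is a minimal sketch under the assumption of the default 1 GB file size; the class and method names are made up for illustration and are not RocketMQ APIs.

```java
final class CommitLogFileNames {
    static final long FILE_SIZE = 1024L * 1024 * 1024; // 1 GB default, as stated in the text

    /** 20-digit, zero-padded file name for a file that starts at startOffset. */
    static String fileNameFor(long startOffset) {
        return String.format("%020d", startOffset);
    }

    /** Starting offset of the file that contains the given physical offset. */
    static long fileStartOffset(long physicalOffset) {
        return physicalOffset - (physicalOffset % FILE_SIZE);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFor(0));                                  // 00000000000000000000
        System.out.println(fileNameFor(fileStartOffset(FILE_SIZE + 12345))); // 00000000001073741824
    }
}
```

Because the file name is the starting offset, any physical offset can be mapped to its file with a single modulo operation.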
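Requests themselves are handled through Netty.
NettyRemotingAbstract#processRequestCommand looks up the concrete processor by request code.
Among them:
SendMessageProcessor handles requests from producers to send messages;
PullMessageProcessor handles requests from consumers to consume messages;
QueryMessageProcessor handles requests to query messages, for example by message key.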
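Looking up a processor by request code boils down to a table lookup with a fallback. The sketch below only illustrates that idea; the class ProcessorTable, the string-based handler type and the numeric codes are assumptions made for this example, not the real Netty-based implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class ProcessorTable {
    // Illustrative request codes for this sketch only.
    static final int SEND_MESSAGE = 10;
    static final int PULL_MESSAGE = 11;
    static final int QUERY_MESSAGE = 12;

    private final Map<Integer, Function<String, String>> processors = new HashMap<>();
    // Used when no processor is registered for a code.
    private final Function<String, String> defaultProcessor = body -> "REQUEST_CODE_NOT_SUPPORTED";

    void register(int code, Function<String, String> processor) {
        processors.put(code, processor);
    }

    String process(int code, String body) {
        return processors.getOrDefault(code, defaultProcessor).apply(body);
    }

    public static void main(String[] args) {
        ProcessorTable table = new ProcessorTable();
        table.register(SEND_MESSAGE, body -> "stored:" + body);
        table.register(PULL_MESSAGE, body -> "pulled:" + body);
        System.out.println(table.process(SEND_MESSAGE, "hello"));
        System.out.println(table.process(QUERY_MESSAGE, "key"));
    }
}
```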
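The write itself mainly happens in asyncPutMessage. DefaultMessageStore#asyncPutMessage is the entry point; the code below, which works on mappedFileQueue and ends in handleDiskFlushAndHA, is the CommitLog method it delegates to.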
public CompletableFuture<PutMessageResult> asyncPutMessage(final MessageExtBrokerInner msg) {
    ......
    topicQueueLock.lock(topicQueueKey);
    try {
        boolean needAssignOffset = true;
        if (defaultMessageStore.getMessageStoreConfig().isDuplicationEnable()
            && defaultMessageStore.getMessageStoreConfig().getBrokerRole() != BrokerRole.SLAVE) {
            needAssignOffset = false;
        }
        if (needAssignOffset) {
            defaultMessageStore.assignOffset(msg, getMessageNum(msg));
        }

        PutMessageResult encodeResult = putMessageThreadLocal.getEncoder().encode(msg);
        if (encodeResult != null) {
            return CompletableFuture.completedFuture(encodeResult);
        }
        msg.setEncodedBuff(putMessageThreadLocal.getEncoder().getEncoderBuffer());
        PutMessageContext putMessageContext = new PutMessageContext(topicQueueKey);

        putMessageLock.lock(); // spin or ReentrantLock, depending on store config
        try {
            long beginLockTimestamp = this.defaultMessageStore.getSystemClock().now();
            this.beginTimeInLock = beginLockTimestamp;

            // Here settings are stored timestamp, in order to ensure an orderly
            // global
            if (!defaultMessageStore.getMessageStoreConfig().isDuplicationEnable()) {
                msg.setStoreTimestamp(beginLockTimestamp);
            }

            if (null == mappedFile || mappedFile.isFull()) {
                // First, get the mappedFile
                mappedFile = this.mappedFileQueue.getLastMappedFile(0); // Mark: NewFile may be cause noise
            }
            if (null == mappedFile) {
                log.error("create mapped file1 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
                beginTimeInLock = 0;
                return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.CREATE_MAPPED_FILE_FAILED, null));
            }

            // Write the data
            result = mappedFile.appendMessage(msg, this.appendMessageCallback, putMessageContext);
            switch (result.getStatus()) {
                case PUT_OK:
                    onCommitLogAppend(msg, result, mappedFile);
                    break;
                case END_OF_FILE:
                    onCommitLogAppend(msg, result, mappedFile);
                    unlockMappedFile = mappedFile;
                    // Create a new file, re-write the message
                    mappedFile = this.mappedFileQueue.getLastMappedFile(0);
                    if (null == mappedFile) {
                        // XXX: warn and notify me
                        log.error("create mapped file2 error, topic: " + msg.getTopic() + " clientAddr: " + msg.getBornHostString());
                        beginTimeInLock = 0;
                        return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.CREATE_MAPPED_FILE_FAILED, result));
                    }
                    result = mappedFile.appendMessage(msg, this.appendMessageCallback, putMessageContext);
                    if (AppendMessageStatus.PUT_OK.equals(result.getStatus())) {
                        onCommitLogAppend(msg, result, mappedFile);
                    }
                    break;
                case MESSAGE_SIZE_EXCEEDED:
                case PROPERTIES_SIZE_EXCEEDED:
                    beginTimeInLock = 0;
                    return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.MESSAGE_ILLEGAL, result));
                case UNKNOWN_ERROR:
                    beginTimeInLock = 0;
                    return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result));
                default:
                    beginTimeInLock = 0;
                    return CompletableFuture.completedFuture(new PutMessageResult(PutMessageStatus.UNKNOWN_ERROR, result));
            }

            elapsedTimeInLock = this.defaultMessageStore.getSystemClock().now() - beginLockTimestamp;
            beginTimeInLock = 0;
        } finally {
            putMessageLock.unlock();
        }
    } finally {
        topicQueueLock.unlock(topicQueueKey);
    }

    if (elapsedTimeInLock > 500) {
        log.warn("[NOTIFYME]putMessage in lock cost time(ms)={}, bodyLength={} AppendMessageResult={}", elapsedTimeInLock, msg.getBody().length, result);
    }

    if (null != unlockMappedFile && this.defaultMessageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
        this.defaultMessageStore.unlockMappedFile(unlockMappedFile);
    }

    PutMessageResult putMessageResult = new PutMessageResult(PutMessageStatus.PUT_OK, result);

    // Statistics
    storeStatsService.getSinglePutMessageTopicTimesTotal(msg.getTopic()).add(result.getMsgNum());
    storeStatsService.getSinglePutMessageTopicSizeTotal(topic).add(result.getWroteBytes());

    // Flush strategy and HA
    return handleDiskFlushAndHA(putMessageResult, msg, needAckNums, needHandleHA);
}
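First the mappedFile is obtained; think of it as a memory mapping of one commitLog file. When a mappedFile is created, two files are requested at once, the one needed now and the one after it, so the next roll-over does not have to wait for file creation.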
org.apache.rocketmq.store.AllocateMappedFileService#mmapOperation
private boolean mmapOperation() {
    boolean isSuccess = false;
    AllocateRequest req = null;
    try {
        req = this.requestQueue.take();
        AllocateRequest expectedRequest = this.requestTable.get(req.getFilePath());
        if (null == expectedRequest) {
            log.warn("this mmap request expired, maybe cause timeout " + req.getFilePath() + " " + req.getFileSize());
            return true;
        }
        if (expectedRequest != req) {
            log.warn("never expected here, maybe cause timeout " + req.getFilePath() + " " + req.getFileSize()
                + ", req:" + req + ", expectedRequest:" + expectedRequest);
            return true;
        }

        if (req.getMappedFile() == null) {
            long beginTime = System.currentTimeMillis();

            MappedFile mappedFile;
            if (messageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
                try {
                    mappedFile = ServiceLoader.load(MappedFile.class).iterator().next();
                    mappedFile.init(req.getFilePath(), req.getFileSize(), messageStore.getTransientStorePool());
                } catch (RuntimeException e) {
                    log.warn("Use default implementation.");
                    mappedFile = new DefaultMappedFile(req.getFilePath(), req.getFileSize(), messageStore.getTransientStorePool());
                }
            } else {
                mappedFile = new DefaultMappedFile(req.getFilePath(), req.getFileSize());
            }

            long elapsedTime = UtilAll.computeElapsedTimeMilliseconds(beginTime);
            if (elapsedTime > 10) {
                int queueSize = this.requestQueue.size();
                log.warn("create mappedFile spent time(ms) " + elapsedTime + " queue size " + queueSize
                    + " " + req.getFilePath() + " " + req.getFileSize());
            }

            // pre write mappedFile
            if (mappedFile.getFileSize() >= this.messageStore.getMessageStoreConfig().getMappedFileSizeCommitLog()
                && this.messageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
                mappedFile.warmMappedFile(this.messageStore.getMessageStoreConfig().getFlushDiskType(),
                    this.messageStore.getMessageStoreConfig().getFlushLeastPagesWhenWarmMapedFile());
            }

            req.setMappedFile(mappedFile);
            this.hasException = false;
            isSuccess = true;
        }
    } catch (InterruptedException e) {
        log.warn(this.getServiceName() + " interrupted, possibly by shutdown.");
        this.hasException = true;
        return false;
    } catch (IOException e) {
        log.warn(this.getServiceName() + " service has exception. ", e);
        this.hasException = true;
        if (null != req) {
            requestQueue.offer(req);
            try {
                Thread.sleep(1);
            } catch (InterruptedException ignored) {
            }
        }
    } finally {
        if (req != null && isSuccess)
            req.getCountDownLatch().countDown();
    }
    return true;
}
This is where the MappedFile gets initialized:
org.apache.rocketmq.store.logfile.DefaultMappedFile#init
private void init(final String fileName, final int fileSize) throws IOException {
    ......
    try {
        this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();
        this.mappedByteBuffer = this.fileChannel.map(MapMode.READ_WRITE, 0, fileSize);
        TOTAL_MAPPED_VIRTUAL_MEMORY.addAndGet(fileSize);
        TOTAL_MAPPED_FILES.incrementAndGet();
        ok = true;
    } catch (FileNotFoundException e) {
        log.error("Failed to create file " + this.fileName, e);
        throw e;
    } catch (IOException e) {
        log.error("Failed to map file " + this.fileName, e);
        throw e;
    } finally {
        if (!ok && this.fileChannel != null) {
            this.fileChannel.close();
        }
    }
}
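This simply uses Java NIO's FileChannel.map to create the file and map it into memory.
If the off-heap buffer pool (TransientStorePool) is enabled, data is written through writeBuffer, while reads still go through mappedByteBuffer.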
@Override
public void init(final String fileName, final int fileSize,
    final TransientStorePool transientStorePool) throws IOException {
    init(fileName, fileSize);
    this.writeBuffer = transientStorePool.borrowBuffer();
    this.transientStorePool = transientStorePool;
}
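The memory-mapping step can be tried on its own with nothing but the JDK. The following is a minimal sketch, assuming a small temporary file instead of a 1 GB commitLog file; the class MmapDemo and its method are made up for this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class MmapDemo {
    /** Maps the file read-write, writes text through the mapping, then reads it back from disk. */
    static String writeAndReadBack(Path file, int fileSize, String text) throws IOException {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            // Mapping READ_WRITE beyond the current length grows the file to fileSize.
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
            buffer.put(data); // a plain memory write, no write() system call
            buffer.force();   // ask the OS to flush the dirty pages to disk
        }
        byte[] back = new byte[data.length];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.readFully(back);
        }
        return new String(back, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mmap-demo", ".bin");
        file.toFile().deleteOnExit();
        System.out.println(writeAndReadBack(file, 4096, "hello"));
    }
}
```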
Once the MappedFile has been created, there is also a warm-up step:
public void warmMappedFile(FlushDiskType type, int pages) {
    this.mappedByteBufferAccessCountSinceLastSwap++;

    long beginTime = System.currentTimeMillis();
    ByteBuffer byteBuffer = this.mappedByteBuffer.slice();
    int flush = 0;
    long time = System.currentTimeMillis();
    // Write a zero byte into every OS page of the file (1 GB by default) so the OS really
    // allocates physical memory. Pages that are never touched get no physical memory,
    // and writing messages would then trigger page faults.
    for (int i = 0, j = 0; i < this.fileSize; i += DefaultMappedFile.OS_PAGE_SIZE, j++) {
        byteBuffer.put(i, (byte) 0);
        // force flush when flush disk type is sync
        if (type == FlushDiskType.SYNC_FLUSH) {
            if ((i / OS_PAGE_SIZE) - (flush / OS_PAGE_SIZE) >= pages) {
                flush = i;
                mappedByteBuffer.force();
            }
        }

        // Sleep briefly every so often so that other threads, GC threads included, get a
        // chance to run in the middle of the loop instead of waiting for it to finish.
        // prevent gc
        if (j % 1000 == 0) {
            log.info("j={}, costTime={}", j, System.currentTimeMillis() - time);
            time = System.currentTimeMillis();
            try {
                Thread.sleep(0);
            } catch (InterruptedException e) {
                log.error("Interrupted", e);
            }
        }
    }

    // force flush when prepare load finished
    if (type == FlushDiskType.SYNC_FLUSH) {
        log.info("mapped file warm-up done, force to disk, mappedFile={}, costTime={}",
            this.getFileName(), System.currentTimeMillis() - beginTime);
        mappedByteBuffer.force();
    }
    log.info("mapped file warm-up done. mappedFile={}, costTime={}", this.getFileName(),
        System.currentTimeMillis() - beginTime);

    this.mlock();
}
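An mmap mapping only ties a range of the process's virtual address space to the file; it does not load the file's pages into memory. When a read or write touches a page that is not in the page cache, a page fault occurs and the data has to be brought in from disk, which hurts read and write performance. To avoid these page faults, RocketMQ warms the file up so that the corresponding page cache is already in memory, and then locks it so the operating system does not move those pages to swap space.
The loop also sleeps every 1000 iterations so that GC can run. I asked ChatGPT about this, and its answer (that the sleep reduces how often GC runs) misses the point.
As far as I understand, the if (j % 1000 == 0) branch is about pause time, not GC frequency. A JIT-compiled loop with an int counter may contain no safepoint check, so while it walks a 1 GB file, a stop-the-world GC has to wait for the loop and every other thread stalls with it. Calling Thread.sleep(0) gives the JVM a point at which this thread can be paused, which keeps those waits short.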
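The page-touching part of the warm-up can be sketched with the JDK alone. This is a simplified illustration, assuming a 4096-byte page size and a small temporary file; WarmUpSketch and its methods are made-up names, and the mlock/madvise calls are left out because they need native bindings.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

final class WarmUpSketch {
    static final int PAGE_SIZE = 4096; // assumed OS page size

    /** Writes one zero byte per page so every page is faulted in; returns pages touched. */
    static int touchPages(MappedByteBuffer buffer) {
        int touched = 0;
        for (int i = 0; i < buffer.capacity(); i += PAGE_SIZE) {
            buffer.put(i, (byte) 0); // absolute put, does not move the buffer position
            touched++;
        }
        return touched;
    }

    /** Maps the file at the given size, touches every page and flushes once at the end. */
    static int warm(Path file, int fileSize) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, fileSize);
            int touched = touchPages(buffer);
            buffer.force();
            return touched;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("warm-up", ".bin");
        file.toFile().deleteOnExit();
        System.out.println("pages touched: " + warm(file, 4 * PAGE_SIZE));
    }
}
```

One byte per page is enough because the operating system allocates memory in whole pages.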
Finally, the mapping is locked in memory:
public void mlock() {
    final long beginTime = System.currentTimeMillis();
    final long address = ((DirectBuffer) (this.mappedByteBuffer)).address();
    Pointer pointer = new Pointer(address);
    {
        // The mlock system call locks the file's page cache so it cannot be moved to swap space
        int ret = LibC.INSTANCE.mlock(pointer, new NativeLong(this.fileSize));
        log.info("mlock {} {} {} ret = {} time consuming = {}", address, this.fileName, this.fileSize, ret,
            System.currentTimeMillis() - beginTime);
    }

    {
        // The madvise system call hints to the OS that the file will be accessed in the near future
        int ret = LibC.INSTANCE.madvise(pointer, new NativeLong(this.fileSize), LibC.MADV_WILLNEED);
        log.info("madvise {} {} {} ret = {} time consuming = {}", address, this.fileName, this.fileSize, ret,
            System.currentTimeMillis() - beginTime);
    }
}
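After that, the message is written into the MappedFile, that is, the actual bytes are written through the buffer.
Next come the flush strategy and high availability.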
org.apache.rocketmq.store.CommitLog#handleDiskFlushAndHA
private CompletableFuture<PutMessageResult> handleDiskFlushAndHA(PutMessageResult putMessageResult,
    MessageExt messageExt, int needAckNums, boolean needHandleHA) {
    // Handle the disk flush
    CompletableFuture<PutMessageStatus> flushResultFuture = handleDiskFlush(putMessageResult.getAppendMessageResult(), messageExt);
    CompletableFuture<PutMessageStatus> replicaResultFuture;
    if (!needHandleHA) {
        replicaResultFuture = CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
    } else {
        // Handle HA
        replicaResultFuture = handleHA(putMessageResult.getAppendMessageResult(), putMessageResult, needAckNums);
    }

    return flushResultFuture.thenCombine(replicaResultFuture, (flushStatus, replicaStatus) -> {
        if (flushStatus != PutMessageStatus.PUT_OK) {
            putMessageResult.setPutMessageStatus(flushStatus);
        }
        if (replicaStatus != PutMessageStatus.PUT_OK) {
            putMessageResult.setPutMessageStatus(replicaStatus);
        }
        return putMessageResult;
    });
}
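The way the flush result and the replication result are merged can be isolated into a few lines. This is a minimal sketch with a local Status enum standing in for PutMessageStatus; the class name is made up for illustration.

```java
import java.util.concurrent.CompletableFuture;

final class CombineSketch {
    // Local stand-in for the status type used in the method above.
    enum Status { PUT_OK, FLUSH_DISK_TIMEOUT, FLUSH_SLAVE_TIMEOUT }

    /** Completes when both futures are done; a replica failure overrides a flush failure. */
    static CompletableFuture<Status> combine(CompletableFuture<Status> flush,
                                             CompletableFuture<Status> replica) {
        return flush.thenCombine(replica, (flushStatus, replicaStatus) -> {
            Status status = Status.PUT_OK;
            if (flushStatus != Status.PUT_OK) {
                status = flushStatus;
            }
            if (replicaStatus != Status.PUT_OK) {
                status = replicaStatus;
            }
            return status;
        });
    }

    public static void main(String[] args) {
        CompletableFuture<Status> flush = CompletableFuture.completedFuture(Status.FLUSH_DISK_TIMEOUT);
        CompletableFuture<Status> replica = CompletableFuture.completedFuture(Status.PUT_OK);
        System.out.println(combine(flush, replica).join()); // FLUSH_DISK_TIMEOUT
    }
}
```

Since thenCombine waits for both futures, the caller only sees a result after the flush and the replication have each reported.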
Handling the disk flush:
org.apache.rocketmq.store.CommitLog.DefaultFlushManager#handleDiskFlush
@Override
public CompletableFuture<PutMessageStatus> handleDiskFlush(AppendMessageResult result, MessageExt messageExt) {
    // Synchronization flush
    if (FlushDiskType.SYNC_FLUSH == CommitLog.this.defaultMessageStore.getMessageStoreConfig().getFlushDiskType()) {
        final GroupCommitService service = (GroupCommitService) this.flushCommitLogService;
        if (messageExt.isWaitStoreMsgOK()) {
            GroupCommitRequest request = new GroupCommitRequest(result.getWroteOffset() + result.getWroteBytes(),
                CommitLog.this.defaultMessageStore.getMessageStoreConfig().getSyncFlushTimeout());
            flushDiskWatcher.add(request);
            service.putRequest(request);
            return request.future();
        } else {
            service.wakeup();
            return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
        }
    }
    // Asynchronous flush
    else {
        if (!CommitLog.this.defaultMessageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
            flushCommitLogService.wakeup();
        } else {
            commitLogService.wakeup();
        }
        return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
    }
}
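The concrete flush strategy depends on whether synchronous or asynchronous flushing is configured. With SYNC_FLUSH, a GroupCommitRequest is queued and the returned future completes once the data has been flushed or the timeout expires. With asynchronous flushing, the flush thread is only woken up and PUT_OK is returned immediately.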
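The idea behind the synchronous path, where each writer gets a future that completes once the flushed position passes its own end offset, can be sketched as follows. This is a simplified, single-threaded illustration and not the GroupCommitService implementation: GroupCommitSketch and flushUpTo are made-up names, and there is no timeout handling.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;

final class GroupCommitSketch {
    private static final class Request {
        final long nextOffset; // end offset of the message that must be on disk
        final CompletableFuture<Boolean> future = new CompletableFuture<>();

        Request(long nextOffset) {
            this.nextOffset = nextOffset;
        }
    }

    private final List<Request> pending = new ArrayList<>();
    private long flushedOffset;

    /** Called by a writer after appending; the future completes when its data is flushed. */
    synchronized CompletableFuture<Boolean> putRequest(long nextOffset) {
        Request request = new Request(nextOffset);
        pending.add(request);
        return request.future;
    }

    /** Called by the flush thread after forcing data up to offset; one flush serves many requests. */
    synchronized void flushUpTo(long offset) {
        flushedOffset = Math.max(flushedOffset, offset);
        for (Iterator<Request> it = pending.iterator(); it.hasNext(); ) {
            Request request = it.next();
            if (request.nextOffset <= flushedOffset) {
                request.future.complete(Boolean.TRUE);
                it.remove();
            }
        }
    }

    public static void main(String[] args) {
        GroupCommitSketch service = new GroupCommitSketch();
        CompletableFuture<Boolean> first = service.putRequest(100);
        CompletableFuture<Boolean> second = service.putRequest(200);
        service.flushUpTo(150);
        System.out.println(first.isDone() + " " + second.isDone()); // true false
    }
}
```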
Handling high availability:
org.apache.rocketmq.store.CommitLog#handleHA
private CompletableFuture<PutMessageStatus> handleHA(AppendMessageResult result,
    PutMessageResult putMessageResult, int needAckNums) {
    if (needAckNums >= 0 && needAckNums <= 1) {
        return CompletableFuture.completedFuture(PutMessageStatus.PUT_OK);
    }

    HAService haService = this.defaultMessageStore.getHaService();

    long nextOffset = result.getWroteOffset() + result.getWroteBytes();

    // Wait enough acks from different slaves
    GroupCommitRequest request = new GroupCommitRequest(nextOffset,
        this.defaultMessageStore.getMessageStoreConfig().getSlaveTimeout(), needAckNums);
    haService.putRequest(request);
    haService.getWaitNotifyObject().wakeupAll();
    return request.future();
}
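In fact a background thread continuously handles replication. Comparing the commitLog offsets of the master and the slave shows how much data the slave is missing, so only that data needs to be sent over. I'll cover this in detail in a later article.
That leaves one more question: how are the consumeQueue and the indexFile handled?
ReputMessageService reads data from the commitLog and writes it into the ConsumeQueue and the IndexFile.
Separate dispatchers handle the two files; I won't go into the details here.
The ConsumeQueue is handled in: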
org.apache.rocketmq.store.DefaultMessageStore.CommitLogDispatcherBuildConsumeQueue#dispatch
The file path is simply topic/queueId. The data written is:
this.byteBufferIndex.flip();
this.byteBufferIndex.limit(CQ_STORE_UNIT_SIZE);
this.byteBufferIndex.putLong(offset);
this.byteBufferIndex.putInt(size);
this.byteBufferIndex.putLong(tagsCode);
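Each entry is 20 bytes: the commitLog offset (8 bytes), the message size (4 bytes) and the tag hash code (8 bytes). With the offset and the size, the actual message can be fetched from the commitLog.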
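The fixed-width entry layout can be reproduced with a plain ByteBuffer. This is a minimal sketch of the layout written above; the class ConsumeQueueEntry is made up for illustration and is not a RocketMQ type.

```java
import java.nio.ByteBuffer;

final class ConsumeQueueEntry {
    static final int UNIT_SIZE = 20; // 8 (offset) + 4 (size) + 8 (tags code)

    final long commitLogOffset;
    final int size;
    final long tagsCode;

    ConsumeQueueEntry(long commitLogOffset, int size, long tagsCode) {
        this.commitLogOffset = commitLogOffset;
        this.size = size;
        this.tagsCode = tagsCode;
    }

    /** Encodes the entry into exactly UNIT_SIZE bytes, ready for reading. */
    ByteBuffer encode() {
        ByteBuffer buffer = ByteBuffer.allocate(UNIT_SIZE);
        buffer.putLong(commitLogOffset);
        buffer.putInt(size);
        buffer.putLong(tagsCode);
        buffer.flip();
        return buffer;
    }

    /** Reads one entry from the buffer's current position. */
    static ConsumeQueueEntry decode(ByteBuffer buffer) {
        return new ConsumeQueueEntry(buffer.getLong(), buffer.getInt(), buffer.getLong());
    }

    public static void main(String[] args) {
        ByteBuffer encoded = new ConsumeQueueEntry(1073741824L, 256, "TagA".hashCode()).encode();
        ConsumeQueueEntry entry = decode(encoded);
        System.out.println(entry.commitLogOffset + " " + entry.size);
    }
}
```

Fixed-width entries mean that the n-th entry of a queue always sits at byte n * 20, so no search is needed to find it.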
org.apache.rocketmq.store.DefaultMessageStore.CommitLogDispatcherBuildIndex
The indexFile is written with the following data:
this.mappedByteBuffer.putInt(absIndexPos, keyHash);
this.mappedByteBuffer.putLong(absIndexPos + 4, phyOffset);
this.mappedByteBuffer.putInt(absIndexPos + 4 + 8, (int) timeDiff);
this.mappedByteBuffer.putInt(absIndexPos + 4 + 8 + 4, slotValue);
this.mappedByteBuffer.putInt(absSlotPos, this.indexHeader.getIndexCount());
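That is, the hash of the key, the physical offset, a time difference, and so on. Index files are named after their creation timestamp, so they are naturally ordered by time. For a lookup by key, the slot position is derived from the key's hash, so the right place in the file is known immediately.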
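The slot-plus-chain layout written above behaves like a hash map stored in arrays. The following in-memory sketch illustrates the idea under simplifying assumptions: arrays instead of a mapped file, no time field, and made-up names (HashIndexSketch, put, query).

```java
import java.util.ArrayList;
import java.util.List;

final class HashIndexSketch {
    private final int[] slots;       // per slot: number of the newest entry, 0 means empty
    private final int[] keyHashes;   // entry fields, entries are numbered from 1
    private final long[] phyOffsets;
    private final int[] prevIndex;   // previous entry in the same slot (the slotValue field)
    private int count;

    HashIndexSketch(int slotNum, int maxEntries) {
        slots = new int[slotNum];
        keyHashes = new int[maxEntries + 1];
        phyOffsets = new long[maxEntries + 1];
        prevIndex = new int[maxEntries + 1];
    }

    private static int hash(String key) {
        return key.hashCode() & Integer.MAX_VALUE; // non-negative hash
    }

    void put(String key, long phyOffset) {
        if (count + 1 >= keyHashes.length) {
            throw new IllegalStateException("index full");
        }
        int keyHash = hash(key);
        int slot = keyHash % slots.length;
        count++;
        keyHashes[count] = keyHash;
        phyOffsets[count] = phyOffset;
        prevIndex[count] = slots[slot]; // link to the older entry in this slot
        slots[slot] = count;            // the slot now points at the newest entry
    }

    /** Offsets whose key hash matches, newest first. Equal hashes are only candidates. */
    List<Long> query(String key) {
        int keyHash = hash(key);
        List<Long> result = new ArrayList<>();
        for (int i = slots[keyHash % slots.length]; i != 0; i = prevIndex[i]) {
            if (keyHashes[i] == keyHash) {
                result.add(phyOffsets[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        HashIndexSketch index = new HashIndexSketch(1, 10); // one slot forces collisions
        index.put("a", 1L);
        index.put("b", 2L);
        index.put("a", 3L);
        System.out.println(index.query("a")); // [3, 1]
    }
}
```

Because only the hash is stored, a match is a candidate; the message has to be read back from the commitLog to confirm that its key really is the one requested.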
That more or less covers data storage. Now let's see how messages are read.
Pulling messages has its own processor:
org.apache.rocketmq.broker.processor.PullMessageProcessor#processRequest
It contains a lot of extra logic; the core is in the following method:
org.apache.rocketmq.store.DefaultMessageStore#getMessage
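Reading a message is simple. The broker looks up the consumeQueue by topic and queueId. The consumer knows where its last pull ended, so the broker reads directly at that consumeQueue offset. The consumeQueue entry holds the commitLog offset and the size, and with these two values the message is fetched from the commitLog and returned. The broker then computes the next offset and returns it to the consumer.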
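The two-step lookup described above (queue offset to index entry, index entry to message bytes) can be sketched with two in-memory buffers. This is a simplified illustration: the buffers stand in for the mapped files, only the raw body is stored where the real commitLog stores a full message record, and ReadPathSketch is a made-up name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ReadPathSketch {
    static final int CQ_UNIT_SIZE = 20; // offset (8) + size (4) + tags code (8)

    private final ByteBuffer commitLog = ByteBuffer.allocate(1024);
    private final ByteBuffer consumeQueue = ByteBuffer.allocate(CQ_UNIT_SIZE * 16);

    /** Appends a body to the log and indexes it; returns its queue offset (0, 1, 2, ...). */
    long append(byte[] body) {
        long physicalOffset = commitLog.position();
        commitLog.put(body);
        consumeQueue.putLong(physicalOffset).putInt(body.length).putLong(0L);
        return consumeQueue.position() / CQ_UNIT_SIZE - 1;
    }

    /** Reads the message at the given queue offset, or null if nothing is stored there yet. */
    byte[] get(long queueOffset) {
        long entryPos = queueOffset * CQ_UNIT_SIZE;
        if (queueOffset < 0 || entryPos >= consumeQueue.position()) {
            return null;
        }
        long physicalOffset = consumeQueue.getLong((int) entryPos); // absolute reads
        int size = consumeQueue.getInt((int) entryPos + 8);
        byte[] body = new byte[size];
        ByteBuffer view = commitLog.duplicate(); // independent position, shared content
        view.position((int) physicalOffset);
        view.get(body);
        return body;
    }

    public static void main(String[] args) {
        ReadPathSketch store = new ReadPathSketch();
        store.append("hello".getBytes(StandardCharsets.UTF_8));
        long offset = store.append("world!".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(store.get(offset), StandardCharsets.UTF_8)); // world!
        System.out.println("next offset to pull: " + (offset + 1));
    }
}
```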
org.apache.rocketmq.store.DefaultMessageStore#queryMessage
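This mainly queries the indexFile. As mentioned above, index files are named by creation time, so the first step is to select the index files that match the time range. Then the key's hash gives the position in the file where the entry was written. Because hashes can collide, the entries at that position are walked one by one to collect all those with an equal hash. Finally, the offset recorded in the indexFile is used to look up the message in the commitLog.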