CN106980572A - The on-line debugging method and system of distributed system - Google Patents


Info

Publication number
CN106980572A
Authority
CN
China
Prior art keywords
debugging
distributed
server
instruction
debugging information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610035223.0A
Other languages
Chinese (zh)
Other versions
CN106980572B (en)
Inventor
马涛
郑旭
杨兵兵
陈生栋
李渭民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Tmall Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610035223.0A priority Critical patent/CN106980572B/en
Publication of CN106980572A publication Critical patent/CN106980572A/en
Application granted granted Critical
Publication of CN106980572B publication Critical patent/CN106980572B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/36Prevention of errors by analysis, debugging or testing of software
    • G06F11/362Debugging of software
    • G06F11/3644Debugging of software by instrumenting at runtime

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present application proposes an online debugging method and system for a distributed system. The method includes the following steps: an i-th distributed node receives a debugging information collection instruction, wherein the debugging information collection instruction includes a collection identifier and i is a positive integer; the i-th distributed node enters an online debugging mode according to the debugging information collection instruction and collects debugging information; the i-th distributed node sends the debugging information to a server, wherein the debugging information has a number corresponding to the debugging information collection instruction; and the i-th distributed node sends the debugging information collection instruction to an (i+1)-th distributed node. The online debugging method reduces the difficulty of collecting debugging logs and improves log collection efficiency, which makes it easier for debugging personnel to analyze the debugging information as a whole and improves debugging efficiency.

Description

Online debugging method and system for a distributed system

Technical field

The present application relates to the technical field of online debugging, and in particular to an online debugging method and system for a distributed system.

Background

In a traditional distributed system, logs are mainly written to the local file system of each distributed node. For analyzing and solving problems in the distributed system, this approach has the following drawbacks:

1. A traditional log system stores logs locally on each distributed node. Under the strict permission requirements of a distributed system, not everyone is allowed to access the logs stored on each node, so the logs are difficult to access.

2. In a distributed system, a single application response may pass through many modules deployed on different distributed nodes, and each module generates and stores its own logs independently. The logs of the modules involved in one response are therefore stored separately, and debugging personnel must look up logs on each module in turn, which is very inconvenient. In addition, because distributed systems run in parallel, the contextual correlation between logs on different modules is not obvious, which makes debugging considerably harder.

3. Each module in a distributed system serves requests as an independent cluster, and a cluster consists of hundreds or thousands of machines. Because of load balancing, a different machine may respond to a given request each time, so collecting the logs related to a particular request is difficult and inefficient.

4. A distributed system handles hundreds or thousands of requests per second, and every request writes the internal running state of the system to the log, so the log grows quickly and the entries for a particular request are hard to pick out. Moreover, in a traditional log multiple processes write to the same log file at the same time, so the file fills with entries from other processes that are unrelated to the response in question, which makes the entries for that request even harder to find and lowers analysis efficiency.

Summary of the invention

The present application aims to solve the above technical problems at least to some extent.

To this end, a first objective of the present application is to propose an online debugging method for a distributed system that reduces the difficulty of collecting debugging logs and improves log collection efficiency.

A second objective of the present application is to propose an online debugging system for a distributed system.

To achieve the above objectives, an embodiment of a first aspect of the present application proposes an online debugging method for a distributed system, including the following steps: an i-th distributed node receives a debugging information collection instruction, wherein the debugging information collection instruction includes a collection identifier and i is a positive integer; the i-th distributed node enters an online debugging mode according to the debugging information collection instruction and collects debugging information; the i-th distributed node sends the debugging information to a server, wherein the debugging information has a number corresponding to the debugging information collection instruction; and the i-th distributed node sends the debugging information collection instruction to an (i+1)-th distributed node.

In the online debugging method for a distributed system according to the embodiments of the present application, a distributed node in the distributed system, on receiving a debugging information collection instruction that includes a collection identifier, enters an online debugging mode, collects debugging information, and sends it to the server. Because the collection identifier marks the instruction, debugging information can be collected in a targeted manner, which keeps it from being mixed up with other running-log information and greatly reduces log bloat. In addition, the debugging information has a number corresponding to the debugging information collection instruction, so debugging information belonging to different collection instructions can be told apart by number, which facilitates collecting, searching and associating debugging information. The embodiments of the present application therefore reduce the difficulty of collecting debugging logs, improve log collection efficiency, make it easier for debugging personnel to analyze the debugging information as a whole, and improve debugging efficiency.

An embodiment of a second aspect of the present application provides an online debugging system for a distributed system, including a plurality of distributed nodes and a server. Each distributed node is configured to receive a debugging information collection instruction that includes a collection identifier, enter an online debugging mode according to the debugging information collection instruction, collect debugging information, send the debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collection instruction, and send the debugging information collection instruction to the next distributed node. The server is configured to receive the debugging information sent by the distributed nodes.

In the online debugging system for a distributed system according to the embodiments of the present application, a distributed node in the distributed system, on receiving a debugging information collection instruction that includes a collection identifier, enters an online debugging mode, collects debugging information, and sends it to the server. Because the collection identifier marks the instruction, debugging information can be collected in a targeted manner, which keeps it from being mixed up with other running-log information and greatly reduces log bloat. In addition, the debugging information has a number corresponding to the debugging information collection instruction, so debugging information belonging to different collection instructions can be told apart by number, which facilitates collecting, searching and associating debugging information. The embodiments of the present application therefore reduce the difficulty of collecting debugging logs, improve log collection efficiency, make it easier for debugging personnel to analyze the debugging information as a whole, and improve debugging efficiency.

Additional aspects and advantages of the present application will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the present application.

Description of drawings

The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a flowchart of an online debugging method for a distributed system according to one embodiment of the present application;

Fig. 2 is a flowchart of an online debugging method for a distributed system according to another embodiment of the present application;

Fig. 3 is a schematic diagram of online debugging of a distributed system according to another embodiment of the present application;

Fig. 4 is a schematic structural diagram of an online debugging system for a distributed system according to one embodiment of the present application.

Detailed description

Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they are intended only to explain the present application and should not be construed as limiting it.

The online debugging method and system for a distributed system according to the embodiments of the present application are described below with reference to the accompanying drawings.

Fig. 1 is a flowchart of an online debugging method for a distributed system according to one embodiment of the present application.

As shown in Fig. 1, the online debugging method for a distributed system according to an embodiment of the present application includes:

S101. An i-th distributed node receives a debugging information collection instruction, wherein the debugging information collection instruction includes a collection identifier and i is a positive integer.

A distributed system consists of multiple distributed nodes, and each distributed node can process requests sent by the server. A client user can send the server an application business request, an online debugging request, or the like, as needed. For a business request, for example, nodes can be assigned to process the request according to how idle each distributed node is. For an online debugging request, a debugging information collection instruction can be sent, according to the client's request, to the nodes that need to collect debugging information.

The server can send corresponding instructions to the corresponding distributed nodes in the distributed system according to the request sent by the client, so that the distributed nodes process the client's request. An online debugging request differs from a business request: for a client's business request the server sends a corresponding business processing instruction to the distributed system, and for a client's online debugging request it sends a corresponding debugging information collection instruction.

In the embodiments of the present application, to distinguish an application's business requests from online debugging requests, the server can set a collection identifier on the debugging information collection instruction when sending it. Each distributed node can then determine whether a received request is a debugging information collection instruction by checking whether it includes the collection identifier, and can subsequently collect only the logs corresponding to debugging information collection instructions, which is more convenient and improves the efficiency of information collection.

In one embodiment of the present application, the server can send a debugging information collection instruction to the corresponding distributed node in the distributed system according to the client's request in order to perform debugging, and each distributed node can, during debugging or when debugging completes, send the debugging information collection instruction to the next distributed node so that the next node is debugged as well. The debugging information collection instruction received by the i-th distributed node may therefore be sent either by the server or by the (i-1)-th distributed node.
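As an illustration of how a node might tell a debugging information collection instruction from a business instruction, the following minimal Python sketch models an instruction as a small record with a boolean collection identifier. The class name `Instruction`, its fields, and the function `is_debug_collection` are hypothetical; the application does not specify a wire format for the instruction or for the collection identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Instruction:
    """Hypothetical shape of an instruction delivered to a distributed node."""
    payload: str
    collect_flag: bool = False          # assumed encoding of the collection identifier
    debug_number: Optional[str] = None  # number of the debugging session, if any


def is_debug_collection(instruction: Instruction) -> bool:
    """Return True when the instruction carries the collection identifier."""
    return instruction.collect_flag


# A business instruction and a debugging information collection instruction.
business = Instruction(payload="search: shoes")
debug = Instruction(payload="search: shoes", collect_flag=True, debug_number="42")
```

With this shape, a node only needs one check on arrival to decide whether to enter the online debugging mode.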

S102. The i-th distributed node enters an online debugging mode according to the debugging information collection instruction and collects debugging information.

After receiving the debugging information collection instruction, the i-th distributed node can enter the online debugging mode; that is, the operation performed for the current instruction is treated as online debugging, and the node records the current processing to obtain debugging information.

Specifically, in the embodiments of the present application, each distributed node is provided with a Trace API (Trace Application Programming Interface), through which the node can collect debugging information.
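The application names a Trace API but does not describe its interface. The sketch below is one possible shape, given purely as an assumption: a per-node collector that records entries only after the node has entered the online debugging mode. The class `TraceCollector` and its methods `enter_debug_mode` and `trace` are hypothetical names.

```python
import time
from typing import Dict, List, Optional, Union


class TraceCollector:
    """Hypothetical per-node Trace API that records only in debugging mode."""

    def __init__(self, node_name: str) -> None:
        self.node_name = node_name
        self.debug_number: Optional[str] = None  # None means normal mode
        self.entries: List[Dict[str, Union[str, float]]] = []

    def enter_debug_mode(self, debug_number: str) -> None:
        """Switch to online debugging mode for one debugging session."""
        self.debug_number = debug_number
        self.entries = []

    def trace(self, message: str) -> None:
        """Record one step of the current processing, if debugging is active."""
        if self.debug_number is None:
            return  # ordinary business request: nothing is recorded
        self.entries.append({
            "number": self.debug_number,
            "node": self.node_name,
            "time": time.time(),
            "message": message,
        })
```

Because `trace` is a no-op outside debugging mode, ordinary business requests add nothing to the collected information, which matches the stated goal of limiting log growth.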

S103. The i-th distributed node sends the debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collection instruction.

After collecting the debugging information, the i-th distributed node can send it to the server, so that the server can combine it with the debugging information that other distributed nodes collected for the same debugging information collection instruction.

The number may be a character string of a preset length, for example a 64-digit numeric string. Each number corresponds to one online debugging session and uniquely identifies the corresponding debugging information. In other words, debugging information from the same online debugging session carries the same number regardless of which distributed node it comes from. The debugging information that different distributed nodes collect for the same debugging information collection instruction can therefore be associated by number, which makes it easier for debugging personnel to analyze the debugging information as a whole.
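The application does not say how the number is produced. As a minimal sketch, assuming the example is read as a string of 64 decimal digits and that random generation is acceptable, a number could be created as follows. The function name `new_debug_number` is hypothetical.

```python
import secrets
import string


def new_debug_number(length: int = 64) -> str:
    """Return a random numeric string of a preset length (assumed scheme)."""
    return "".join(secrets.choice(string.digits) for _ in range(length))


number = new_debug_number()
```

Any scheme works as long as every node involved in the same online debugging session attaches the same number to its debugging information.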

S104. The i-th distributed node sends the debugging information collection instruction to the (i+1)-th distributed node.

After completing debugging, the i-th distributed node can send the debugging information collection instruction to its next distributed node (that is, the (i+1)-th distributed node) to make the (i+1)-th distributed node collect debugging information.
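Steps S101 to S104 can be sketched together as a chain of nodes, each of which checks the collection identifier, records debugging information, reports it to the server, and forwards the instruction to the next node. This is an illustrative model only: the class `Node`, the dictionary keys, and the in-memory list standing in for the network link to the server are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical distributed node in a forwarding chain."""
    name: str
    server_inbox: List[Dict[str, str]]   # stands in for the link to the server
    next_node: Optional["Node"] = None

    def handle(self, instruction: Dict[str, object]) -> None:
        if instruction.get("collect"):            # S101: collection identifier present
            record = {                            # S102: collect debugging information
                "number": str(instruction["number"]),
                "node": self.name,
                "message": f"{self.name} processed {instruction['payload']}",
            }
            self.server_inbox.append(record)      # S103: send to the server
        if self.next_node is not None:            # S104: forward to the next node
            self.next_node.handle(instruction)


inbox: List[Dict[str, str]] = []
third = Node("node-3", inbox)
second = Node("node-2", inbox, third)
first = Node("node-1", inbox, second)
first.handle({"payload": "query", "collect": True, "number": "7"})
```

After the call, `inbox` holds one record per node, all carrying the same number, while an instruction without the collection identifier passes through the chain without producing any record.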

In the online debugging method for a distributed system according to the embodiments of the present application, a distributed node in the distributed system, on receiving a debugging information collection instruction that includes a collection identifier, enters an online debugging mode, collects debugging information, and sends it to the server. Because the collection identifier marks the instruction, debugging information can be collected in a targeted manner, which keeps it from being mixed up with other running-log information and greatly reduces log bloat. In addition, the debugging information has a number corresponding to the debugging information collection instruction, so debugging information belonging to different collection instructions can be told apart by number, which facilitates collecting, searching and associating debugging information. The embodiments of the present application therefore reduce the difficulty of collecting debugging logs, improve log collection efficiency, make it easier for debugging personnel to analyze the debugging information as a whole, and improve debugging efficiency.

Further, Fig. 2 is a flowchart of an online debugging method for a distributed system according to another embodiment of the present application.

As shown in Fig. 2, the online debugging method for a distributed system according to the present application includes steps S201-S204, which are the same as steps S101-S104 in Fig. 1, and may further include steps S205-S209.

S205. The (i+1)-th distributed node receives the debugging information collection instruction, wherein the debugging information collection instruction includes the collection identifier.

S206. The (i+1)-th distributed node enters the online debugging mode according to the debugging information collection instruction and collects debugging information.

S207. The (i+1)-th distributed node sends the debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collection instruction.

S208. The (i+1)-th distributed node sends the debugging information collection instruction to the (i+2)-th distributed node.

S209. The server receives the debugging information sent by the distributed nodes and aggregates it by the number in the debugging information to generate a debugging log.

After collecting debugging information, each distributed node that needs to collect it can send the collected debugging information to the server. The server can receive the debugging information sent by the distributed nodes and aggregate it by the number it carries. Specifically, debugging information with the same number can be merged to generate the debugging log corresponding to that number.
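The aggregation in S209 amounts to grouping records by their number. A minimal sketch, assuming each piece of debugging information arrives as a dictionary with a `number` key (a hypothetical record shape), could look like this:

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_by_number(
    records: List[Dict[str, str]],
) -> Dict[str, List[Dict[str, str]]]:
    """Group debugging records by number, keeping arrival order within a group."""
    logs: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for record in records:
        logs[record["number"]].append(record)
    return dict(logs)


received = [
    {"number": "7", "node": "Merger", "message": "merged results"},
    {"number": "9", "node": "Merger", "message": "merged results"},
    {"number": "7", "node": "SN", "message": "searched index"},
]
debug_logs = aggregate_by_number(received)
```

Here `debug_logs["7"]` contains the Merger and SN records of one debugging session, and `debug_logs["9"]` contains the record of another.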

In the embodiments of the present application, the server can have an access interface through which the client retrieves the debugging logs on the server. Specifically, the client can receive, as user input, the number corresponding to a debugging information collection instruction and retrieve the corresponding debugging log from the server by that number.

In one embodiment of the present application, the debugging log is in Extensible Markup Language (XML) format. This makes it easy for the client to display the debugging log in a structured way after reading it from the server, which facilitates analysis and further improves debugging efficiency.
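The application states only that the debugging log is XML; it does not give a schema. The sketch below uses Python's standard `xml.etree.ElementTree` module with invented element and attribute names (`debug_log`, `entry`, `number`, `node`) to show how merged records could be serialized and read back by a client.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_debug_log_xml(number: str, records: List[Dict[str, str]]) -> str:
    """Serialize the records of one debugging session as XML (assumed schema)."""
    root = ET.Element("debug_log", {"number": number})
    for record in records:
        entry = ET.SubElement(root, "entry", {"node": record["node"]})
        entry.text = record["message"]
    return ET.tostring(root, encoding="unicode")


xml_text = build_debug_log_xml("7", [
    {"node": "Merger", "message": "merged results"},
    {"node": "SN", "message": "searched index"},
])

# A client can parse the log back into a tree for structured display.
parsed = ET.fromstring(xml_text)
```

A structured format of this kind lets the client group entries by node when it displays the log.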

The online debugging method of this embodiment is illustrated below with an application scenario. As shown in Fig. 3, the server-side distributed system consists of six nodes: FE (Front End), Merger (business logic node), QR (Query Rewrite, query rewriting node), DN (Data Node), SN (Search Node) and ORS (Online Ranking System, online scoring system). Each node that needs to collect debugging information (the three nodes ORS, SN and Merger) can collect it through the Trace API and, after completing debugging, send the collected debugging information over the network to the Trace Server (the tracking server).

Specifically, when the Merger node receives an instruction, it can determine whether the instruction includes the collection identifier; if so, it enters the online debugging mode, collects debugging information (Merger Trace information) and sends it to the Trace Server. After determining that the collection identifier is included, or after it finishes collecting debugging information, the Merger node can send the instruction to the SN node. The SN node can determine whether the instruction includes the collection identifier; if so, it enters the online debugging mode, collects debugging information (SN Trace information) and sends it to the Trace Server. After determining that the collection identifier is included, or after it finishes collecting debugging information, the SN node can send the instruction to the ORS node. The ORS node can determine whether the instruction includes the collection identifier; if so, it enters the online debugging mode, collects debugging information (ORS Trace information) and sends it to the Trace Server.

The Trace Server can merge the debugging information from the ORS, SN and Merger nodes to generate and store the complete debugging log for one debugging request. Through a UI (User Interface), the client can read the corresponding debugging log stored on the server according to the number entered by the user, parse it, and display it in a structured form.

In the embodiments of the present application, each node that collects debugging information can send it to the server on its own once its collection is complete, which avoids the possibility of blocking. The debugging information is also not returned together with the results of other processing requests, so it does not affect the response time of those results and does not intrude into them; sensitive information on the server is less likely to leak, which improves data security.
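One way a node could report debugging information without delaying the response to a request is to hand each record to a background worker. The following sketch is an assumption about how such decoupling might be built, not a mechanism described in the application; the class `AsyncReporter` and the `send` callable that stands in for the network call to the server are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict


class AsyncReporter:
    """Hypothetical reporter that sends debugging records off the request path."""

    def __init__(self, send: Callable[[Dict[str, str]], None]) -> None:
        self._send = send                      # stands in for the call to the server
        self._queue: "queue.Queue" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def report(self, record: Dict[str, str]) -> None:
        """Queue a record and return immediately."""
        self._queue.put(record)

    def _run(self) -> None:
        while True:
            record = self._queue.get()
            if record is None:                 # sentinel: stop the worker
                break
            self._send(record)

    def close(self) -> None:
        """Flush queued records and stop the worker thread."""
        self._queue.put(None)
        self._worker.join()


delivered = []
reporter = AsyncReporter(delivered.append)
reporter.report({"number": "7", "node": "ORS", "message": "scored results"})
reporter.close()
```

Since `report` only enqueues the record, the business response does not wait for the debugging information to reach the server.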

With the online debugging method for a distributed system according to the embodiments of the present application, debugging information corresponding to different debugging information collection instructions can be distinguished by number, and each distributed node can send its collected debugging information to the server so that the server aggregates it by number to generate a debugging log. This reduces the difficulty of collecting debugging logs, improves log collection efficiency, and makes it easier for debugging personnel to analyze the debugging information as a whole, thereby improving debugging efficiency.

Corresponding to the online debugging method provided in the above embodiments, the present application also proposes an online debugging system for a distributed system.

Fig. 4 is a schematic structural diagram of an online debugging system for a distributed system according to one embodiment of the present application.

As shown in Fig. 4, the online debugging system for a distributed system according to an embodiment of the present application includes a plurality of distributed nodes 10 and a server 20.

Specifically, each distributed node 10 is configured to receive a debugging information collection instruction that includes a collection identifier, enter an online debugging mode according to the debugging information collection instruction, collect debugging information, send the debugging information to the server 20, wherein the debugging information has a number corresponding to the debugging information collection instruction, and send the debugging information collection instruction to the next distributed node.

The server 20 is configured to receive the debugging information sent by the distributed nodes.

其中,多个分布式节点10属于同一分布式系统。每个分布式节点10可用于处理服务器发送的请求。客户端用户可根据处理需求向服务器20发送应用程序的业务请求或在线调试请求等。例如,对于业务请求来说,可根据各个分布式节点的空闲度分配相应的节点来处理该业务请求。而对于在线调试请求来说,可根据客户端请求向需要收集调试信息的节点发送调试信息收集指令。Wherein, multiple distributed nodes 10 belong to the same distributed system. Each distributed node 10 can be used to process the request sent by the server. The client user may send a service request or an online debugging request of the application program to the server 20 according to processing requirements. For example, for a service request, corresponding nodes may be allocated to process the service request according to the idleness of each distributed node. For an online debugging request, a debugging information collection instruction can be sent to a node that needs to collect debugging information according to a client request.

The number may be a character string of preset length, for example a 64-bit numeric string. Each number corresponds to one online debugging session and is used to uniquely identify the exception debugging information. In other words, debugging information from the same online debugging session carries the same number regardless of which distributed node it resides on. The debugging information that different distributed nodes collect for the same debugging information collection instruction can therefore be correlated by number, which makes it easier for debugging personnel to analyze the debugging information as a whole.
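As an illustration of the numbering idea, the following is a minimal sketch in Python. The function names `new_debug_number` and `tag_record`, the use of a random value, and the zero-padded decimal rendering are assumptions made for this example; the application does not specify how the number is produced.

```python
import secrets


def new_debug_number() -> str:
    """Return a random 64-bit value as a fixed-length decimal string.

    Hypothetical helper: the source only says the number may be a
    string of preset length, e.g. a 64-bit numeric string.
    """
    # 2**64 - 1 has 20 decimal digits, so zero-pad to 20 characters.
    return f"{secrets.randbits(64):020d}"


def tag_record(number: str, node: str, message: str) -> dict:
    """Attach the session number to one piece of debugging information."""
    return {"number": number, "node": node, "message": message}
```

Every record produced during one debugging session would carry the same number, so records from different nodes can later be matched by comparing that field.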

The server 20 can send corresponding instructions to the appropriate distributed nodes 10 in the distributed system according to the request sent by the client, so that the distributed nodes 10 process the client's request. An online debugging request differs from a service request: for a client's service request, the server 20 sends a corresponding service processing instruction to the distributed system; for a client's online debugging request, it sends a corresponding debugging information collection instruction to the distributed system.

In the embodiments of the present application, to distinguish an application's service requests from online debugging requests, the server 20 may set a collection identifier on the debugging information collection instruction when sending it. Each distributed node 10 can then determine whether a received request is a debugging information collection instruction by checking whether it includes the collection identifier, and can subsequently collect only the logs that correspond to debugging information collection instructions, which is more convenient and improves the efficiency of information collection.
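The check on the collection identifier can be sketched as follows. This is only an illustrative Python sketch: the `Instruction` class, its `collect` field (standing in for the collection identifier) and the `handle` function are hypothetical names, and the upper-casing step is a placeholder for real request processing.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    payload: str
    number: str = ""        # number of the debugging session, if any
    collect: bool = False   # hypothetical stand-in for the collection identifier


def handle(instr: Instruction, debug_log: list) -> str:
    """Process an instruction; record debugging information only when flagged."""
    result = instr.payload.upper()  # placeholder for the node's real work
    if instr.collect:
        # Online debugging mode: record what was done, tagged with the number.
        debug_log.append((instr.number, f"processed {instr.payload!r}"))
    return result
```

An ordinary service request passes through `handle` without adding anything to `debug_log`, which is the property that keeps debugging output separate from normal traffic.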

In one embodiment of the present application, the server 20 may send a debugging information collection instruction to the appropriate distributed node 10 in the distributed system according to the client's request, so as to perform debugging. During debugging or when debugging is complete, each distributed node 10 may send the debugging information collection instruction to the next distributed node so that the next distributed node is debugged in turn.

Therefore, the debugging information collection instruction received by the i-th distributed node may be sent either by the server or by the (i-1)-th distributed node. After receiving the debugging information collection instruction, the i-th distributed node can enter the online debugging mode; that is, the operation performed under the current instruction is treated as online debugging, and the current processing can be recorded to obtain debugging information. After collecting the debugging information, the i-th distributed node can send it to the server 20, so that the server 20 can combine it with the debugging information that other distributed nodes collected for the same debugging information collection instruction.

After completing debugging, the i-th distributed node may send the debugging information collection instruction to its next distributed node (that is, the (i+1)-th distributed node) to make the (i+1)-th distributed node collect debugging information. In this way, each distributed node that needs to collect debugging information can be made to do so in turn, and each such node sends the debugging information it collects to the server 20. The server 20 can receive the debugging information sent by the distributed nodes and aggregate it according to the number it carries. Specifically, the server 20 may merge debugging information that has the same number to generate the debugging log corresponding to that number.
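The server-side merging by number can be sketched as below. This is a minimal in-memory Python sketch, not the application's implementation; the class name `TraceCollector` and its methods are hypothetical, and persistence and networking are left out.

```python
from collections import defaultdict


class TraceCollector:
    """Hypothetical server-side store that merges debugging information by number."""

    def __init__(self):
        self._by_number = defaultdict(list)

    def receive(self, number: str, node: str, info: str) -> None:
        # Debugging information from any node is filed under its number.
        self._by_number[number].append((node, info))

    def debug_log(self, number: str) -> list:
        # The merged log for one session: every entry that shares the number.
        return list(self._by_number.get(number, []))
```

Looking a log up by number, as `debug_log` does here, is also the kind of operation an access interface on the server could expose to a client.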

In the embodiments of the present application, the server 20 may have an access interface through which the client can retrieve the debugging logs held on the server. Specifically, the client may receive the number, entered by the user, that corresponds to a debugging information collection instruction, and retrieve the corresponding debugging log from the server according to that number.

In one embodiment of the present application, the debugging log is in Extensible Markup Language (XML) format. This makes it easy for the client to display the debugging log in structured form after reading it from the server 20, which facilitates analysis and further improves debugging efficiency.
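To show what an XML debugging log might look like, here is a small Python sketch using the standard library's `xml.etree.ElementTree`. The element and attribute names (`debugLog`, `entry`, `number`, `node`) are assumptions for illustration; the application does not define a schema.

```python
import xml.etree.ElementTree as ET


def to_xml(number: str, entries: list) -> str:
    """Serialize (node, info) entries of one debugging session to XML."""
    root = ET.Element("debugLog", {"number": number})
    for node, info in entries:
        item = ET.SubElement(root, "entry", {"node": node})
        item.text = info
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str):
    """Parse the XML back into (number, [(node, info), ...]) for display."""
    root = ET.fromstring(text)
    entries = [(e.get("node"), e.text) for e in root.findall("entry")]
    return root.get("number"), entries
```

A client could call `from_xml` on the retrieved log and render the resulting entries per node, which is one way to obtain the structured display described above.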

The online debugging method for a distributed system of this embodiment is described below through the following application scenario. As shown in Fig. 3, the server-side distributed system consists of six nodes: FE (Front End), Merger (business logic node), QR (Query Rewrite, the query rewriting node), DN (Data Node), SN (Search Node) and ORS (Online Ranking System, an online scoring system). Each node that needs to collect debugging information (the three nodes ORS, SN and Merger) can collect it through the Trace API and, after debugging is complete, send the collected debugging information over the network to the Trace Server (tracking server).

Specifically, when the Merger node receives an instruction, it can determine whether the instruction includes the collection identifier; if so, it enters the online debugging mode, collects debugging information and sends it to the Trace Server. After determining that the collection identifier is included, or after it has finished collecting debugging information, the Merger node may send the instruction to the SN node. The SN node can determine whether the instruction includes the collection identifier; if so, it enters the online debugging mode, collects debugging information and sends it to the Trace Server. After determining that the collection identifier is included, or after it has finished collecting debugging information, the SN node may send the instruction to the ORS node. The ORS node can determine whether the instruction includes the collection identifier; if so, it enters the online debugging mode, collects debugging information and sends it to the Trace Server.
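The Merger, SN, ORS sequence can be sketched as a simple forwarding loop. This Python sketch is illustrative only: the `CHAIN` table, the dictionary-based instruction with `collect` and `number` keys, and the list standing in for the Trace Server are all assumptions, and real nodes would communicate over the network rather than in one process.

```python
# Hypothetical forwarding table: each node and the node it forwards to.
CHAIN = {"Merger": "SN", "SN": "ORS", "ORS": None}


def run_chain(start: str, instruction: dict, trace_server: list) -> None:
    """Pass one instruction along the chain, collecting at each flagged node."""
    node = start
    while node is not None:
        if instruction.get("collect"):
            # Each node reports to the trace server on its own.
            trace_server.append(
                (instruction["number"], node, f"{node} handled request")
            )
        node = CHAIN[node]  # forward the same instruction to the next node
```

Because the instruction is forwarded unchanged, every node in the chain sees the same collection identifier and the same number.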

The Trace Server can merge the debugging information from the three nodes ORS, SN and Merger into a complete debugging log for one debugging request and store it. Through a UI (User Interface), the client can read the corresponding debugging log stored on the server according to the number entered by the user, parse it, and display it in structured form.

In the embodiments of the present application, each node that collects debugging information can send its debugging information to the server separately once its own collection is complete, which avoids possible blocking. Because the debugging information is not returned together with the processing results of other processing requests, it does not affect the response time of those results and does not intrude into them. Sensitive information on the server is therefore less likely to leak, which improves data security.
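One way to keep debugging output off the response path is to hand it to a background sender. The following Python sketch assumes a queue and a worker thread, which the application does not prescribe; `start_sender` and `handle_request` are hypothetical names, the list `delivered` stands in for a network send to the server, and string reversal is a placeholder for real processing.

```python
import queue
import threading


def start_sender(outbox: queue.Queue, delivered: list) -> threading.Thread:
    """Start a background worker that ships queued debugging information."""
    def worker():
        while True:
            item = outbox.get()
            if item is None:          # sentinel: stop the worker
                break
            delivered.append(item)    # stand-in for a network send
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def handle_request(payload: str, number: str, outbox: queue.Queue) -> str:
    """Return the business result; debugging information goes out separately."""
    outbox.put((number, f"handled {payload}"))
    return payload[::-1]  # placeholder result, contains no debugging data
```

The value returned to the caller never includes the debugging record, so the response stays the same size and shape whether or not collection is active.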

In the online debugging system for a distributed system according to the embodiments of the present application, a distributed node in the distributed system, on receiving a debugging information collection instruction that includes a collection identifier, enters the online debugging mode, collects debugging information and sends it to the server. The collection identifier allows debugging information to be collected in a targeted manner, which prevents it from being confused with other runtime log information and greatly reduces log bloat. In addition, the debugging information carries a number corresponding to the debugging information collection instruction, so the debugging information for different collection instructions can be distinguished by number, which facilitates collecting, searching and correlating it. The embodiments of the present application therefore make debugging logs easier to collect, improve log collection efficiency, and make it easier for debugging personnel to analyze the debugging information as a whole, which improves debugging efficiency.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

The logic and/or steps represented in a flowchart or otherwise described herein may, for example, be regarded as an ordered list of executable instructions for implementing logical functions, and may be embodied in any computer-readable medium for use by, or in combination with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in combination with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that each part of the present application may be implemented in hardware, software, firmware or a combination thereof. In the embodiments described above, multiple steps or methods may be implemented by software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), and so on.

Those of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.

In the description of this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that the specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, such references do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principle and spirit of the present application, the scope of which is defined by the claims and their equivalents.

Claims (11)

1. An online debugging method for a distributed system, characterized by comprising the following steps:
an i-th distributed node receives a debugging information collection instruction, wherein the debugging information collection instruction includes a collection identifier and i is a positive integer;
the i-th distributed node enters an online debugging mode according to the debugging information collection instruction and collects debugging information;
the i-th distributed node sends the debugging information to a server, wherein the debugging information has a number corresponding to the debugging information collection instruction;
the i-th distributed node sends the debugging information collection instruction to an (i+1)-th distributed node.
2. The online debugging method for a distributed system according to claim 1, characterized by further comprising:
the (i+1)-th distributed node receives the debugging information collection instruction, wherein the debugging information collection instruction includes the collection identifier;
the (i+1)-th distributed node enters the online debugging mode according to the debugging information collection instruction and collects debugging information;
the (i+1)-th distributed node sends the debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collection instruction;
the i-th distributed node sends the debugging information collection instruction to an (i+2)-th distributed node.
3. The online debugging method for a distributed system according to claim 1, characterized by further comprising:
the server receives the debugging information sent by the distributed nodes and aggregates it according to the number in the debugging information to generate a debugging log.
4. The online debugging method for a distributed system according to claim 1, characterized in that the debugging information collection instruction received by the i-th distributed node is sent by the server or by an (i-1)-th distributed node.
5. The online debugging method for a distributed system according to claim 3, characterized in that the server has an access interface through which the debugging log is retrieved.
6. The online debugging method for a distributed system according to claim 3, characterized in that the debugging log is in Extensible Markup Language (XML) format.
7. An online debugging system for a distributed system, characterized by comprising a plurality of distributed nodes and a server, wherein
each distributed node is configured to receive a debugging information collection instruction, wherein the debugging information collection instruction includes a collection identifier, to enter an online debugging mode according to the debugging information collection instruction and collect debugging information, to send the debugging information to the server, wherein the debugging information has a number corresponding to the debugging information collection instruction, and to send the debugging information collection instruction to the next distributed node;
the server is configured to receive the debugging information sent by the distributed nodes.
8. The online debugging system for a distributed system according to claim 7, characterized in that
the server is further configured to aggregate the debugging information according to the number in the debugging information to generate a debugging log.
9. The online debugging system for a distributed system according to claim 7, characterized in that the debugging information collection instruction received by the distributed node is sent by the server or by the previous distributed node of that distributed node.
10. The online debugging system for a distributed system according to claim 8, characterized in that the server has an access interface through which the debugging log is retrieved.
11. The online debugging system for a distributed system according to claim 8, characterized in that the debugging log is in Extensible Markup Language (XML) format.
CN201610035223.0A 2016-01-19 2016-01-19 Online debugging method and system for distributed system Active CN106980572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610035223.0A CN106980572B (en) 2016-01-19 2016-01-19 Online debugging method and system for distributed system

Publications (2)

Publication Number Publication Date
CN106980572A true CN106980572A (en) 2017-07-25
CN106980572B CN106980572B (en) 2021-03-02

Family

ID=59339857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610035223.0A Active CN106980572B (en) 2016-01-19 2016-01-19 Online debugging method and system for distributed system

Country Status (1)

Country Link
CN (1) CN106980572B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1703870A (en) * 2002-10-10 2005-11-30 思科技术公司 System and method for distributed diagnostics in a communication system.
CN101043375A (en) * 2007-03-15 2007-09-26 华为技术有限公司 Distributed system journal collecting method and system
US20090327458A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Online predicate checking for distributed systems
CN103036961A (en) * 2012-12-07 2013-04-10 蓝盾信息安全技术股份有限公司 Distributed collection and storage method of journal
CN105119752A (en) * 2015-09-08 2015-12-02 北京京东尚科信息技术有限公司 Distributed log acquisition method, device and system

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108365982A (en) * 2018-02-06 2018-08-03 北京小米移动软件有限公司 Unit exception adjustment method, device, equipment and storage medium
CN109408310A (en) * 2018-10-19 2019-03-01 网易(杭州)网络有限公司 Adjustment method, server and the readable storage medium storing program for executing of server
CN109408310B (en) * 2018-10-19 2022-02-18 网易(杭州)网络有限公司 Debugging method of server, server and readable storage medium
CN110018956A (en) * 2019-01-28 2019-07-16 阿里巴巴集团控股有限公司 Using adjustment method and relevant apparatus
US20210089419A1 (en) * 2019-09-25 2021-03-25 Alibaba Group Holding Limited Debugging unit and processor
US11755441B2 (en) * 2019-09-25 2023-09-12 Alibaba Group Holding Limited Debugging unit and processor
CN112328491A (en) * 2020-11-18 2021-02-05 Oppo广东移动通信有限公司 Tracking message output method, electronic device and storage medium
CN114371990A (en) * 2021-11-30 2022-04-19 北京仿真中心 Distributed software debugging test method and system based on log
CN114741313A (en) * 2022-04-28 2022-07-12 重庆长安汽车股份有限公司 Service gateway-based SOA service system online debugging method

Also Published As

Publication number Publication date
CN106980572B (en) 2021-03-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211116

Address after: Room 507, floor 5, building 3, No. 969, Wenyi West Road, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: Zhejiang Tmall Technology Co., Ltd.

Address before: P.O. Box 847, 4th floor, Grand Cayman capital building, British Cayman Islands

Patentee before: Alibaba Group Holding Limited