CN106919462A - A kind of method and device for generating processor fault record - Google Patents
- Publication number: CN106919462A (application CN201510992820.8A)
- Authority: CN (China)
- Prior art keywords
- instruction address
- processing unit
- type
- entry
- control chip
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Classifications (all under G06F11/00: Error detection; Error correction; Monitoring)
- G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0721: Error or fault processing not based on redundancy, taking place within a central processing unit [CPU]
- G06F11/0751: Error or fault detection not based on redundancy
- G06F11/0787: Storage of error reports, e.g. persistent data storage, storage using memory protection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Quality & Reliability (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Debugging And Monitoring (AREA)
Description
Technical field
The present application relates to the field of computers, and in particular to a method and device for generating processor fault records.
Background
The central processing unit (CPU) is the computing and control core of devices with data processing functions, such as computers and servers. The CPU in a device is connected to the device's memory through a bus, and processes data by interpreting and executing the program instructions stored in that memory.
An existing CPU typically includes an arithmetic logic unit (ALU), registers, a cache, and the data, control, and status buses that connect these three components.
In a program run by the CPU, a series of instructions is executed repeatedly until a preset condition in the program is reached. In actual operation, design defects or unforeseeable conditions in the application environment may prevent the preset condition from being reached for a long time, so that the series of instructions is executed over and over abnormally. When this abnormal repetition lasts long enough that the CPU cannot respond to other services, the phenomenon is called an infinite loop.
To address this problem, the prior art usually integrates a dedicated reset device, such as a watchdog, into the device to monitor the operating status of the CPU. When the reset device determines that the CPU has stopped responding, it triggers a CPU interrupt and reset, so that faults such as an infinite loop do not disrupt services for a long time. For example, the CPU's operating status may be monitored through heartbeat detection: if the CPU fails to respond to the reset device within a preset time, the heartbeat timer overflows, and the reset device resets the CPU to break it out of the infinite loop.
Registers in the CPU preserve the instruction address at which the CPU was interrupted before the reset. However, program code today often contains complex nested calls among multiple functions, so the instruction at the address held in the register at the moment of the interrupt does not necessarily belong to the function that caused the infinite loop. It is therefore difficult to locate that function accurately using only the instruction address held in the register at the time of the interrupt.
Existing techniques for locating a CPU infinite loop rely largely on developer experience. Developers reason by association and guesswork about the functions related to the instruction at the address saved at interrupt time, and then use debugging tools to locate the fault. However, many infinite loops that occur in practice are sporadic or do not occur on every run, which makes them hard to reproduce. The low efficiency of existing CPU infinite-loop localization has therefore become a pressing problem in software product design.
Summary of the invention
In view of this, the present application provides a method for generating a processor fault record. The method captures more information from the CPU at the time of a fault, which makes it easier to locate the cause of a CPU infinite loop.
The technical solutions provided by the embodiments of the present application are as follows.
According to a first aspect, a method for generating a processor fault record is provided. The method is applied to a hardware platform that includes a control chip and a central processing unit (CPU), and includes:
The control chip detects that a processing unit in the CPU has stopped responding;
The control chip obtains, through a Joint Test Action Group (JTAG) channel, the instruction address in the current program counter (PC) of the processing unit;
The control chip creates a first-type entry containing the instruction address in the current PC, and records the first-type entry in an instruction address table;
The control chip determines whether the number of entries recorded in the instruction address table has reached a preset value, where the preset value is greater than or equal to 2;
If the number of entries recorded in the instruction address table has not reached the preset value, the control chip returns to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit;
If the number of entries recorded in the instruction address table has reached the preset value, the control chip triggers the CPU interrupt.
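The steps of the first aspect can be sketched in Python as follows. This is a minimal sketch, not the patent's implementation. `read_pc` and `trigger_interrupt` are hypothetical callables that stand in for the control chip's JTAG access and its interrupt line, and the simulated PC values are invented.

```python
import itertools
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FirstTypeEntry:
    pc: int  # instruction address read from the stalled unit's PC


def record_fault(read_pc: Callable[[], int],
                 trigger_interrupt: Callable[[], None],
                 preset: int) -> List[FirstTypeEntry]:
    """Sample the PC until the instruction address table holds `preset`
    entries, then trigger the CPU interrupt (hypothetical hooks)."""
    if preset < 2:
        raise ValueError("the preset value must be >= 2")
    table: List[FirstTypeEntry] = []
    while True:
        table.append(FirstTypeEntry(pc=read_pc()))  # create and record an entry
        if len(table) >= preset:                      # preset value reached?
            trigger_interrupt()
            return table
        # Not reached yet: go back and read the PC again.


# Usage with a simulated unit looping over three instructions.
samples = itertools.cycle([0x80001000, 0x80001004, 0x80001008])
interrupts: List[bool] = []
table = record_fault(lambda: next(samples), lambda: interrupts.append(True), preset=5)
print([hex(e.pc) for e in table])
```

Checking the count after every append mirrors the order of the steps: create and record an entry, compare against the preset value, then either loop or interrupt.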
With this solution, when the processing unit stops responding, the control chip uses the JTAG channel to collect, over a period of time, multiple instruction addresses held in the processing unit's PC and records them in the instruction address table. The table therefore reflects how the processing unit ran its program during the period after it stopped responding. When the processing unit is stuck in an infinite loop, the looping function and the corresponding lines of code are executed repeatedly, so their addresses appear repeatedly among the recorded samples. The prior art, by contrast, triggers an interrupt as soon as it detects that the processing unit has stopped responding and records only the single instruction address being executed at the moment of the interrupt. Compared with that approach, the method provided in this embodiment reflects the function and code range in which the infinite loop occurs more accurately, which helps improve the efficiency of CPU fault analysis.
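To make the argument concrete, the sketch below shows one hypothetical way to read such a table offline. It counts how often each sampled PC recurs and reports the address interval the samples span. The patent does not prescribe this analysis. The sketch also assumes that every sample falls inside the loop; real samples may include stray addresses, for example from interrupt handlers, which would widen the interval.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def summarize_samples(pcs: Sequence[int],
                      top: int = 3) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Return (lowest PC, highest PC, most frequent PCs with their counts)."""
    if not pcs:
        raise ValueError("the instruction address table is empty")
    counts = Counter(pcs)
    return min(pcs), max(pcs), counts.most_common(top)


# Hypothetical PC samples from a unit stuck in a short loop.
samples = [0x1010, 0x1014, 0x1018, 0x1010, 0x1014, 0x1010]
low, high, common = summarize_samples(samples)
print(hex(low), hex(high), [(hex(addr), n) for addr, n in common])
```

A single interrupt-time PC would yield only one of these addresses. The repeated samples expose both the hottest instruction and the extent of the looping code.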
Optionally, before the first-type entry is recorded in the instruction address table, the method further includes:
The control chip obtains, through the JTAG channel, the instruction address in the current function return address register of the processing unit;
The control chip adds the instruction address from the current function return address register to the first-type entry.
Recording the instruction address from the processing unit's current function return address register in the instruction address table lets the table show more clearly the call relationships among the functions the processing unit is running. This further improves the efficiency of CPU fault analysis.
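Here is a hypothetical sketch of how the return address adds call information. Each sampled PC is paired with the return address read in the same pass, which shows the call site that the running code will return to. The field names are assumptions. On architectures such as ARM, the link register is reliable mainly in leaf functions: once a function has saved the register and made calls of its own, the register no longer holds that function's return address. A real tool would need to account for this.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FirstTypeEntry:
    pc: int  # instruction address in the program counter
    ra: int  # instruction address in the function return address register


def call_pairs(table: List[FirstTypeEntry]) -> Counter:
    """Count (return address, pc) pairs: each says that code running at
    `pc` was entered from a call that will return to `ra`."""
    return Counter((e.ra, e.pc) for e in table)


# Hypothetical samples: a loop around 0x2004-0x2008 that returns to 0x1100,
# plus one sample in a helper at 0x3000 whose return address 0x2010 lies
# inside the looping function.
table = [FirstTypeEntry(0x2004, 0x1100), FirstTypeEntry(0x2008, 0x1100),
         FirstTypeEntry(0x3000, 0x2010), FirstTypeEntry(0x2004, 0x1100)]
pairs = call_pairs(table)
for (ra, pc), n in pairs.most_common():
    print(f"pc {hex(pc)} returns to {hex(ra)}: {n} sample(s)")
```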
Optionally, before the first-type entry is recorded in the instruction address table, the method further includes:
The control chip obtains the current time and adds it to the first-type entry.
Optionally, recording the first-type entry in the instruction address table includes:
The control chip records the first-type entries in the instruction address table in the order in which they are generated.
Optionally, the control chip returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit includes:
The control chip waits for a delay period T1;
After the period T1 elapses, the control chip returns to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit.
Waiting for the delay period T1 lets the control chip perform other tasks during T1. The repeated cycle of reading the processing unit's PC and recording it in a first-type entry therefore does not tie up too many of the control chip's resources.
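The effect of the delay T1 can be sketched with Python's `asyncio`. This is only a stand-in for whatever scheduler the control chip firmware actually uses. The sampler awaits `asyncio.sleep(t1)` between JTAG reads, so another task (a hypothetical housekeeping job) runs in the gaps. `read_pc`, the simulated PC values, and the T1 value are assumptions.

```python
import asyncio
from typing import Callable, List, Tuple


async def sample_with_delay(read_pc: Callable[[], int], preset: int,
                            t1: float) -> List[int]:
    """Read the PC until `preset` addresses are recorded, waiting T1 between reads."""
    table: List[int] = []
    while True:
        table.append(read_pc())
        if len(table) >= preset:
            return table
        await asyncio.sleep(t1)  # delay period T1: the event loop runs other tasks


async def housekeeping(log: List[str]) -> None:
    """Hypothetical unrelated control-chip task."""
    for i in range(3):
        log.append(f"housekeeping step {i}")
        await asyncio.sleep(0.001)


async def main() -> Tuple[List[int], List[str]]:
    pcs = iter(range(0x100, 0x200, 4))  # simulated PC values
    log: List[str] = []
    table, _ = await asyncio.gather(
        sample_with_delay(lambda: next(pcs), preset=4, t1=0.002),
        housekeeping(log),
    )
    return table, log


table, log = asyncio.run(main())
print([hex(pc) for pc in table], log)
```

A blocking busy-wait in place of the `await` would produce the same table, but the housekeeping task would not run until sampling finished. That is the resource cost the delay T1 is meant to avoid.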
Optionally, the control chip detecting that the processing unit has stopped responding includes:
The control chip performs heartbeat detection on the processing unit and determines that the processing unit has stopped responding.
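The patent leaves the heartbeat mechanism open. The minimal sketch below assumes that the processing unit periodically signals the control chip and that a missed deadline counts as "stopped responding". The class name, the timeout, and the injectable clock are illustrative assumptions.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Flags a processing unit as unresponsive when no heartbeat has arrived
    within `timeout` seconds, like a watchdog timer overflowing."""

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock
        self.last_beat = clock()

    def beat(self) -> None:
        """Record a heartbeat from the processing unit (resets the timer)."""
        self.last_beat = self.clock()

    def stopped_responding(self) -> bool:
        return self.clock() - self.last_beat > self.timeout


# Usage with a fake clock so the example is deterministic.
now = [0.0]
monitor = HeartbeatMonitor(timeout=1.0, clock=lambda: now[0])
now[0] = 0.5
monitor.beat()
now[0] = 1.2
early = monitor.stopped_responding()  # 0.7 s since the last beat
now[0] = 1.6
late = monitor.stopped_responding()   # 1.1 s since the last beat
print(early, late)  # False True
```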
Optionally, the method further includes: the control chip reads the first-type entries from the instruction address table one at a time, in the order in which they were stored, and for each first-type entry read, performs the following:
Based on the PC instruction address in the first-type entry, the control chip looks up the corresponding function name and code line in a preset correspondence between instruction addresses of the processing unit and function names and code lines;
The control chip creates a second-type entry containing the function name and code line corresponding to the PC instruction address in the first-type entry;
The control chip records the second-type entries in a function operation table in the order in which they are generated.
The control chip generates one second-type entry in the function operation table for each first-type entry in the instruction address table. The resulting function operation table records the function names and code lines of the program run by the processing unit during the period after it stopped responding. Function names and code lines reflect the running program more directly than raw instruction addresses, which further improves the efficiency of CPU fault analysis.
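The patent does not say how the preset correspondence between addresses, functions, and lines is stored. One plausible sketch assumes a sorted table of start addresses in the style of a linker map or DWARF line table, and resolves each PC with a binary search (`bisect`). The function names and addresses are invented. Because each row records only a start address, any address past the last row maps to that row. A real table would also need end addresses.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SecondTypeEntry:
    function: str  # function name
    line: int      # source code line


class AddressMap:
    """Hypothetical preset correspondence: each row is
    (first instruction address of a source line, function name, line number)."""

    def __init__(self, rows: List[Tuple[int, str, int]]) -> None:
        rows = sorted(rows)
        self.starts = [addr for addr, _, _ in rows]
        self.targets = [(fn, line) for _, fn, line in rows]

    def lookup(self, addr: int) -> SecondTypeEntry:
        # Rightmost row whose start address is <= addr.
        i = bisect.bisect_right(self.starts, addr) - 1
        if i < 0:
            raise KeyError(f"no symbol for {addr:#x}")
        fn, line = self.targets[i]
        return SecondTypeEntry(fn, line)


def build_function_table(pcs: List[int], amap: AddressMap) -> List[SecondTypeEntry]:
    """One second-type entry per first-type entry, in stored order."""
    return [amap.lookup(pc) for pc in pcs]


amap = AddressMap([(0x1000, "poll_queue", 40), (0x1008, "poll_queue", 41),
                   (0x1010, "poll_queue", 43), (0x1020, "send_frame", 12)])
function_table = build_function_table([0x1004, 0x100C, 0x1004, 0x1024], amap)
for entry in function_table:
    print(entry.function, entry.line)
```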
Optionally, the method further includes:
The control chip reads the first-type entries from the instruction address table one at a time, in the order in which they were stored, and for each first-type entry read, performs the following:
Based on the PC instruction address in the first-type entry, the control chip looks up the corresponding function name and code line in a preset correspondence between instruction addresses of the processing unit and function names and code lines;
Based on the function return address register value in the first-type entry, the control chip looks up the corresponding function name and code line in the same preset correspondence;
The control chip creates a second-type entry containing the function name and code line corresponding to the PC instruction address in the first-type entry, and the function name and code line corresponding to the return address register value in the first-type entry;
The control chip records the second-type entries in a function operation table in the order in which they are generated.
According to a second aspect, a method for generating a processor fault record is provided. The method is applied to a hardware platform that includes a control chip and a multi-core CPU. The multi-core CPU includes a first processing unit and a second processing unit, both of which are slave cores of the multi-core CPU.
The method includes:
The control chip detects that the first processing unit has stopped responding;
If the number of entries recorded in the instruction address table corresponding to the first processing unit has not reached a first preset value, the control chip obtains, through a Joint Test Action Group (JTAG) channel, the instruction address in the current program counter (PC) of the first processing unit, where the first preset value is greater than or equal to 2;
The control chip creates a first-type entry containing the instruction address in the current PC of the first processing unit, and records it in the instruction address table corresponding to the first processing unit;
If the number of entries recorded in the instruction address table corresponding to the second processing unit has not reached a second preset value, the control chip obtains, through the JTAG channel, the instruction address in the current PC of the second processing unit, where the second preset value is greater than or equal to 2;
The control chip creates another first-type entry containing the instruction address in the current PC of the second processing unit, and records it in the instruction address table corresponding to the second processing unit;
The control chip determines whether the number of entries recorded in each of the two instruction address tables (the one corresponding to the first processing unit and the one corresponding to the second processing unit) has reached its corresponding preset value;
If the number of entries recorded in at least one of the two instruction address tables has not reached its corresponding preset value, the control chip returns to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the first processing unit;
If the numbers of entries recorded in both instruction address tables have reached their corresponding preset values, the control chip triggers an interrupt of the multi-core CPU.
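The loop in the second aspect can be sketched as round-robin sampling over the cores. Each core's table stops growing once it reaches that core's own preset value, and the interrupt fires only when every table is full. `read_pc(core)`, the core names, and the simulated PCs are hypothetical. The sketch is written for the two slave cores described, although the same loop would accept more.

```python
from typing import Callable, Dict, List


def record_multicore(read_pc: Callable[[str], int],
                     presets: Dict[str, int],
                     trigger_interrupt: Callable[[], None]) -> Dict[str, List[int]]:
    """Sample each core's PC in turn; a core's table stops growing once it
    reaches that core's preset value, and the interrupt fires when all are full."""
    if any(p < 2 for p in presets.values()):
        raise ValueError("every preset value must be >= 2")
    tables: Dict[str, List[int]] = {core: [] for core in presets}
    while True:
        for core, preset in presets.items():
            if len(tables[core]) < preset:          # this table is not yet full
                tables[core].append(read_pc(core))  # one first-type entry
        if all(len(tables[c]) >= presets[c] for c in presets):
            trigger_interrupt()                     # interrupt the multi-core CPU
            return tables
        # At least one table is still short: sample again.


# Usage with two simulated slave cores whose PCs advance by 4 per read.
counters = {"core1": 0x4000, "core2": 0x8000}


def fake_read_pc(core: str) -> int:
    counters[core] += 4
    return counters[core]


fired: List[int] = []
tables = record_multicore(fake_read_pc, {"core1": 3, "core2": 2},
                          lambda: fired.append(1))
print({core: [hex(pc) for pc in pcs] for core, pcs in tables.items()})
```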
With this solution, when the first processing unit stops responding, the control chip uses the JTAG channel to collect, over a period of time, multiple instruction addresses held in the first processing unit's PC and records them in an instruction address table. This table reflects how the first processing unit ran its program during the period after it stopped responding. The control chip also records the instruction addresses held in the PC of the second processing unit, which resides in the same multi-core CPU as the first processing unit. In an infinite loop, the looping function and the corresponding lines of code are executed repeatedly, and in a multi-core CPU, functions on different slave cores may call one another. The prior art triggers an interrupt as soon as it detects that the first processing unit has stopped responding, and records only the instruction addresses that the first and second processing units are executing at the moment of the interrupt. Compared with that approach, the method provided by this application reflects the function and code range in which the infinite loop occurs more accurately, which helps improve the efficiency of CPU fault analysis.
Optionally, before the first-type entry is recorded in the instruction address table corresponding to the first processing unit, the method further includes:
The control chip obtains, through the JTAG channel, the instruction address in the current function return address register of the first processing unit;
The control chip adds the instruction address from the first processing unit's current function return address register to the first-type entry;
Before the other first-type entry is recorded in the instruction address table corresponding to the second processing unit, the method further includes:
The control chip obtains, through the JTAG channel, the instruction address in the current function return address register of the second processing unit;
The control chip adds the instruction address from the second processing unit's current function return address register to the other first-type entry.
The instruction addresses from the current function return address registers of the first and second processing units are recorded in the respective instruction address tables. Each table therefore shows more clearly the call relationships among the functions running on the corresponding processing unit, which further improves the efficiency of CPU fault analysis.
Optionally, the control chip reads the first-type entries from the instruction address table corresponding to the first processing unit one at a time, in the order in which they were stored, and for each first-type entry read, performs the following:
Based on the PC instruction address in the first-type entry, the control chip looks up the corresponding function name and code line in a preset correspondence between instruction addresses of the first processing unit and function names and code lines;
The control chip creates a second-type entry containing the function name and code line corresponding to the PC instruction address in the first-type entry;
The control chip records the second-type entries in the function operation table corresponding to the first processing unit in the order in which they are generated.
Optionally, the control chip reads the first-type entries from the instruction address table corresponding to the second processing unit one at a time, in the order in which they were stored, and for each first-type entry read, performs the following:
Based on the PC instruction address in the first-type entry, the control chip looks up the corresponding function name and code line in a preset correspondence between instruction addresses of the second processing unit and function names and code lines;
The control chip creates a second-type entry containing the function name and code line corresponding to the PC instruction address in the first-type entry;
The control chip records the second-type entries in the function operation table corresponding to the second processing unit in the order in which they are generated.
With this solution, the control chip generates the second-type entries of the first processing unit's function operation table from the first-type entries in that unit's instruction address table. It generates the second-type entries of the second processing unit's function operation table in the same way, from the first-type entries in the second unit's instruction address table. Each function operation table records the function names and code lines of the program run by the corresponding processing unit during the period after the fault was detected. Function names and code lines reflect the running program more directly than raw instruction addresses, which further improves the efficiency of CPU fault analysis.
According to a third aspect, a device for generating a processor fault record is provided. The device is applied to a hardware platform that includes the device and a first central processing unit (CPU). The first CPU includes at least one processing unit, and the device communicates with the first CPU through a JTAG channel. The device includes a second processor, a memory, and a JTAG interface, which are connected through a bus;
The JTAG interface is configured to obtain, through the JTAG channel, the instruction address in the program counter (PC) of a processing unit in the first CPU, and to send that instruction address to the second processor over the bus;
The second processor is configured to read the program code stored in the memory and perform the following operations:
Detect that a processing unit in the first CPU has stopped responding;
Obtain, through the JTAG channel, the instruction address in the current PC of the processing unit;
Create a first-type entry containing the instruction address in the current PC, and record the first-type entry in an instruction address table;
Determine whether the number of entries recorded in the instruction address table has reached a preset value, where the preset value is greater than or equal to 2;
If the number of entries recorded in the instruction address table has not reached the preset value, return to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit;
If the number of entries recorded in the instruction address table has reached the preset value, trigger an interrupt of the first CPU.
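The third aspect divides the work between a JTAG interface and a second processor that runs code from memory. The Python sketch below shows that division. `JtagPort` and `InterruptLine` are hypothetical interfaces, modeled here as structural `typing.Protocol` types, and the simulated backend is a test double. The point is that the recording logic depends only on the interfaces, so it can be exercised against a simulated JTAG backend before real hardware is available.

```python
from dataclasses import dataclass
from typing import Iterator, List, Protocol


class JtagPort(Protocol):
    """Hypothetical JTAG interface of the device."""
    def read_pc(self, unit: int) -> int: ...


class InterruptLine(Protocol):
    """Hypothetical means by which the device interrupts the first CPU."""
    def raise_interrupt(self) -> None: ...


@dataclass
class FaultRecorder:
    """Logic run by the second processor; it depends only on the two interfaces."""
    jtag: JtagPort
    irq: InterruptLine
    preset: int

    def __post_init__(self) -> None:
        if self.preset < 2:
            raise ValueError("the preset value must be >= 2")

    def run(self, unit: int) -> List[int]:
        table: List[int] = []
        while len(table) < self.preset:  # keep sampling until the preset is reached
            table.append(self.jtag.read_pc(unit))
        self.irq.raise_interrupt()
        return table


class SimulatedJtag:
    """Test double that replays a fixed PC sequence."""
    def __init__(self, pcs: List[int]) -> None:
        self._pcs: Iterator[int] = iter(pcs)

    def read_pc(self, unit: int) -> int:
        return next(self._pcs)


class CountingIrq:
    """Test double that counts interrupt requests."""
    def __init__(self) -> None:
        self.count = 0

    def raise_interrupt(self) -> None:
        self.count += 1


irq = CountingIrq()
recorder = FaultRecorder(SimulatedJtag([0x10, 0x14, 0x18, 0x1C]), irq, preset=3)
table = recorder.run(unit=0)
print([hex(pc) for pc in table], irq.count)
```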
Optionally, the JTAG interface is further configured to obtain, through the JTAG channel, the instruction address in the current function return address register of the processing unit in the first CPU, and to send that instruction address to the second processor over the bus;
The second processor is further configured to perform the following operations before recording the first-type entry in the instruction address table:
Obtain, through the JTAG channel, the instruction address in the current function return address register of the processing unit;
Add the instruction address from the current function return address register to the first-type entry.
Optionally, the second processor returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit includes:
Waiting for a delay period T1;
After the period T1 elapses, returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit.
Optionally, the first CPU is a multi-core CPU and the processing unit is the master core of the first CPU;
The second processor detecting that a processing unit in the first CPU has stopped responding includes:
Performing heartbeat detection on the master core and determining that the master core has stopped responding.
Optionally, the second processor is further configured to read the first-type entries from the instruction address table one at a time, in the order in which they were stored, and for each first-type entry read, perform the following:
Based on the PC instruction address in the first-type entry, look up the corresponding function name and code line in a preset correspondence between instruction addresses of the processing unit and function names and code lines;
Create a second-type entry containing the function name and code line corresponding to the PC instruction address in the first-type entry;
Record the second-type entries in a function operation table in the order in which they are generated.
Optionally, the second processor is further configured to read the first-type entries from the instruction address table one at a time, in the order in which they were stored, and to perform the following for each first-type entry read:
according to the PC instruction address included in the first-type entry, querying a preset correspondence between instruction addresses of the processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
according to the function return register instruction address included in the first-type entry, querying the same preset correspondence, to obtain the function name and code line corresponding to that function return register instruction address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry, as well as the function name and code line corresponding to the function return register instruction address in the first-type entry; and
recording the second-type entries in a function run table in the order in which they are generated.
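The two lookup variants above can be sketched in Python as follows. This is a minimal sketch, assuming the "preset correspondence" is a table of non-overlapping address ranges, each tagged with a function name and source line (for example, extracted offline from the firmware's debug information). `LineRecord`, `AddressMap`, and `build_function_run_table` are hypothetical names. When no return-register value was captured, the Ra lookup is skipped, which covers the PC-only variant.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

Location = Tuple[str, int]  # (function name, source code line)

@dataclass(frozen=True)
class LineRecord:
    """One row of the hypothetical preset map: covers addresses in [start, end)."""
    start: int
    end: int
    function: str
    line: int

@dataclass
class FirstTypeEntry:
    pc: int                   # sampled PC instruction address
    ra: Optional[int] = None  # sampled return-address register value, if captured

@dataclass
class SecondTypeEntry:
    pc_location: Optional[Location]
    ra_location: Optional[Location]

class AddressMap:
    """Resolves an instruction address to (function, line) by binary search."""

    def __init__(self, records: Iterable[LineRecord]):
        self._records = sorted(records, key=lambda r: r.start)
        self._starts = [r.start for r in self._records]

    def lookup(self, addr: int) -> Optional[Location]:
        # Index of the last record whose start address is <= addr.
        i = bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        rec = self._records[i]
        return (rec.function, rec.line) if addr < rec.end else None

def build_function_run_table(address_table: Iterable[FirstTypeEntry],
                             address_map: AddressMap) -> List[SecondTypeEntry]:
    """Walk first-type entries in storage order; emit second-type entries in the same order."""
    run_table = []
    for entry in address_table:
        pc_loc = address_map.lookup(entry.pc)
        ra_loc = address_map.lookup(entry.ra) if entry.ra is not None else None
        run_table.append(SecondTypeEntry(pc_loc, ra_loc))
    return run_table
```

Binary search keeps each lookup logarithmic in the size of the map, which matters little for a handful of samples but keeps the sketch reasonable for a large firmware image.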
Optionally, the JTAG interface is integrated in a monitoring chip and the second processor is integrated in a main control chip; the monitoring chip communicates with the first CPU through the JTAG channel, and the main control chip communicates with the monitoring chip through a bus.
With the above solution, when the device for generating a processor fault record detects that the processing unit has stopped responding, it uses the JTAG interface to obtain, over a period of time, multiple instruction addresses held in the PC of the processing unit and records them in the instruction address table. The instruction address table therefore reflects what the processing unit was executing during a period after it stopped responding. When the processing unit is stuck in an infinite loop, the looping function and the corresponding range of code lines are executed repeatedly, so they appear repeatedly among the sampled addresses. The prior art triggers an interrupt immediately after detecting that the processing unit has stopped responding and records only the instruction address being executed at the moment of the interrupt. Compared with that, this solution more accurately identifies the function and code range in which the infinite loop occurs, which helps improve the efficiency of CPU fault analysis.
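The argument above relies on loop code dominating the samples. One way an analyst might exploit that is sketched below, assuming the sampled PCs have already been resolved to (function, line) pairs: count samples per function and report the most frequent function together with the span of lines seen in it. This majority-count heuristic and the name `locate_suspected_loop` are illustrative assumptions, not a step stated in the source.

```python
from collections import Counter
from typing import List, Optional, Tuple

def locate_suspected_loop(
        samples: List[Tuple[str, int]]) -> Optional[Tuple[str, int, Tuple[int, int]]]:
    """Return (function, hit count, (lowest line, highest line)) for the most-sampled function."""
    if not samples:
        return None
    hits = Counter(function for function, _ in samples)
    function, count = hits.most_common(1)[0]
    # Line range observed inside the dominant function approximates the loop body.
    lines = [line for fn, line in samples if fn == function]
    return function, count, (min(lines), max(lines))
```

With a single sample, as in the prior-art approach, the "most frequent" function is simply whatever happened to be executing at the interrupt, which is exactly the weakness the text describes.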
In a fourth aspect, a device for processor fault recording is provided. The device is applied to a hardware platform that includes the device and a first multi-core CPU. The first multi-core CPU includes a first processing unit and a second processing unit, both of which are slave cores of the first multi-core CPU. The device communicates with the first multi-core CPU through a Joint Test Action Group (JTAG) channel and includes a second processor, a memory, and a JTAG interface, which are connected through a bus;
the JTAG interface is configured to obtain, through the JTAG channel, the instruction address in the program counter (PC) of a processing unit in the first multi-core CPU, and send the PC instruction address to the second processor through the bus; and
the second processor is configured to read the program code stored in the memory and perform the following operations:
detecting that the first processing unit has stopped responding;
if the number of entries recorded in the instruction address table corresponding to the first processing unit has not reached a first preset value, obtaining, through the JTAG interface, the instruction address in the current PC of the first processing unit, where the first preset value is greater than or equal to 2;
creating a first-type entry that includes the instruction address in the current PC of the first processing unit, and recording it in the instruction address table corresponding to the first processing unit;
if the number of entries recorded in the instruction address table corresponding to the second processing unit has not reached a second preset value, obtaining, through the JTAG interface, the instruction address in the current PC of the second processing unit, where the second preset value is greater than or equal to 2;
creating another first-type entry that includes the instruction address in the current PC of the second processing unit, and recording it in the instruction address table corresponding to the second processing unit;
determining whether the number of entries recorded in at least one of the two instruction address tables (the one corresponding to the first processing unit and the one corresponding to the second processing unit) has reached its corresponding preset value;
if neither of the two instruction address tables has reached its corresponding preset value, returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the first processing unit; and
if the number of entries recorded in at least one of the two instruction address tables has reached its corresponding preset value, triggering an interrupt of the first multi-core CPU.
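The multi-core sampling loop above can be sketched as follows. This is a minimal sketch under stated assumptions: `read_pc(core)` is a hypothetical stand-in for a JTAG read of one slave core's PC, `presets` maps each core identifier to its preset value, and `trigger_interrupt` is a hypothetical callback that interrupts the multi-core CPU. The stop condition follows the text: sampling ends as soon as at least one table reaches its preset. The optional delay between rounds is borrowed from the T1 option of the single-core aspect.

```python
import time
from typing import Callable, Dict, List

def sample_slave_cores(read_pc: Callable[[int], int],
                       presets: Dict[int, int],
                       trigger_interrupt: Callable[[], None],
                       delay_s: float = 0.0) -> Dict[int, List[int]]:
    """Sample each core's PC into its own table until any table reaches its preset."""
    if any(p < 2 for p in presets.values()):
        raise ValueError("each preset value must be >= 2")
    tables: Dict[int, List[int]] = {core: [] for core in presets}
    while True:
        for core, preset in presets.items():
            if len(tables[core]) < preset:   # table not full yet: take another sample
                tables[core].append(read_pc(core))
        if any(len(tables[c]) >= presets[c] for c in presets):
            trigger_interrupt()               # at least one table is full
            return tables
        if delay_s:
            time.sleep(delay_s)               # optional pause between sampling rounds
```

Because both cores are sampled in the same round, entries with the same index in the two tables were captured close together in time, which is what makes cross-core call relationships visible.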
Optionally, the JTAG interface is further configured to obtain, through the JTAG channel, the instruction address in the current function return register of a processing unit in the first multi-core CPU, and send that instruction address to the second processor through the bus;
before recording the first-type entry in the instruction address table corresponding to the first processing unit, the second processor is further configured to perform:
obtaining, through the JTAG interface, the instruction address in the current function return address register of the first processing unit; and
adding that instruction address to the first-type entry;
before recording the other first-type entry in the instruction address table corresponding to the second processing unit, the second processor is further configured to perform:
obtaining, through the JTAG interface, the instruction address in the current function return address register of the second processing unit; and
adding that instruction address to the other first-type entry.
Optionally, the second processor detecting that the first processing unit has stopped responding includes:
receiving indication information sent by the master core of the first multi-core CPU, where the indication information carries an identifier of the first processing unit; and
determining, by the second processor according to the indication information, that the first processing unit has stopped responding.
Optionally, the second processor is further configured to read the first-type entries from the instruction address table corresponding to the first processing unit one at a time, in the order in which they were stored, and to perform the following for each first-type entry read:
according to the PC instruction address included in the first-type entry, querying a preset correspondence between instruction addresses of the first processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry; and
recording the second-type entries in the function run table corresponding to the first processing unit, in the order in which they are generated.
Optionally, the second processor is further configured to read the first-type entries from the instruction address table corresponding to the second processing unit one at a time, in the order in which they were stored, and to perform the following for each first-type entry read:
according to the PC instruction address included in the first-type entry, querying a preset correspondence between instruction addresses of the second processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry; and
recording the second-type entries in the function run table corresponding to the second processing unit, in the order in which they are generated.
Optionally, the JTAG interface is integrated in a monitoring chip and the second processor is integrated in a main control chip; the monitoring chip communicates with the first multi-core CPU through the JTAG channel, and the main control chip communicates with the monitoring chip through a bus.
With the above solution, when the first processing unit stops responding, the fault record generation device obtains, through the JTAG channel and over a period of time, multiple instruction addresses held in the PC of the first processing unit and records them in an instruction address table, which reflects what the first processing unit was executing during a period after it stopped responding. The device also records the instruction addresses held in the PC of the second processing unit, which is in the same multi-core CPU as the first processing unit. In an infinite loop, the looping function and the corresponding range of code lines are executed repeatedly, and in a multi-core CPU, functions running on different slave cores may call one another. The prior art triggers an interrupt immediately after detecting that the first processing unit has stopped responding and records only the instruction addresses that the first and second processing units are executing at the moment of the interrupt. Compared with that, this solution more accurately identifies the function and code range in which the infinite loop occurs, which helps improve the efficiency of CPU fault analysis.
Description of Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments or the prior art. Evidently, the accompanying drawings in the following description show only some embodiments of this application, and a person of ordinary skill in the art may still derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of this application;
FIG. 2 is a flowchart of a method for generating a processor fault record according to an embodiment of this application;
FIG. 3 is a flowchart of another method for generating a processor fault record according to an embodiment of this application;
FIG. 4 is a schematic structural diagram of a device for generating a processor fault record according to an embodiment of this application; and
FIG. 5 is a schematic structural diagram of another device for generating a processor fault record according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of this application clearer, the following describes the technical solutions in the embodiments of this application with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of an application scenario according to an embodiment of this application. A network device 100 includes a hardware platform consisting of a CPU 101 and a control chip 102. The CPU 101 executes programs stored in the memory of the network device 100 to implement the service functions of the network device 100. The control chip 102 monitors the running status of the CPU 101.
For example, the control chip 102 may include a main control chip 1021 and a monitoring chip 1022. The main control chip 1021 manages the service running status of the CPU 101: for example, it monitors the CPU 101 by performing heartbeat detection on it, and when it detects that the CPU 101 has stopped responding, it triggers an interrupt of the CPU 101 and restarts the CPU 101 to bring it out of the fault state. The monitoring chip 1022 manages the operating conditions of the network device, such as voltage and temperature. The main control chip 1021, the monitoring chip 1022, and the CPU 101 are connected to and communicate with one another through buses.
The CPU 101 may be a single-core or a multi-core processor. If the CPU 101 is a single-core processor, it includes a processing unit 1011; if it is a multi-core processor, it further includes at least a processing unit 1012.
When the CPU 101 is a multi-core processor, the main control chip 1021 may perform heartbeat detection on the processing unit 1011 and the processing unit 1012 separately to monitor the running status of the CPU 101. When the multi-core processor includes a master core (for example, the processing unit 1011) and a slave core (for example, the processing unit 1012), the main control chip 1021 may instead perform heartbeat detection only on the master core, with the master core monitoring the slave core. When the master core detects that the slave core has stopped responding, it notifies the main control chip 1021.
The main control chip 1021 and the CPU 101 may be integrated on the same printed circuit board (PCB) or on different PCBs. When they are on the same PCB, they can communicate through an internal bus; when they are on different PCBs, they can communicate through an Ethernet interface. The monitoring chip 1022 is generally integrated on the same PCB as the CPU 101.
A program executed by the CPU 101 consists of a series of instructions, and the storage location of each instruction in memory is identified by an instruction address. To execute an instruction, the CPU first stores its instruction address in a CPU register called the program counter (PC), then fetches the instruction from the corresponding memory address and stores it in the CPU's instruction register (IR) for execution. Some types of CPUs also include a function return address register: when an instruction calls a function in the program, the address of that instruction is saved in the function return address register, and when the called function finishes, the CPU continues with the instruction following the address held in that register.
In the prior art, when the main control chip 1021 detects that the CPU 101 has stopped responding, it immediately triggers an interrupt of the CPU 101. In responding to the interrupt, the CPU 101 saves the instruction addresses held at that moment in its internal registers, such as the PC and the function return register, and can subsequently output these saved addresses. However, current program code often contains complex nested calls among many functions, so the instruction at an address held in a register at the moment of the interrupt is not necessarily in the function that causes the infinite loop. It is therefore difficult to locate the looping function accurately from the register contents at the moment of the interrupt.
In the method for generating a processor fault record provided in this application, the communication channels between the control chip 102 and the CPU 101 include a JTAG channel. A JTAG channel is a channel that communicates through an interface defined in the protocols of the Joint Test Action Group (JTAG). For example, the IEEE 1149.1 standard defines four JTAG interface signals: Test Data In (TDI), Test Data Out (TDO), Test Clock (TCK), and Test Mode Select (TMS). The control chip reads register information from the processing unit through the interface defined by IEEE 1149.1.
When the control chip 102 includes the main control chip 1021 and the monitoring chip 1022, the JTAG channel can generally be implemented by the communication channel between the CPU 101 and the monitoring chip 1022. The main control chip 1021 generally includes a processor. Following the method provided in the embodiments of this application, the monitoring chip 1022 obtains the instruction addresses held in registers of the CPU 101 through the JTAG channel and sends them to the main control chip 1021 through an internal bus, and the processor in the main control chip 1021 generates the fault record.
This arrangement makes full use of the chip structure already present in distributed network devices: the monitoring chip on the same PCB as the CPU implements the JTAG interface, and the processor in the main control chip generates the instruction address table in the method shown in FIG. 2. This helps reduce the difficulty of implementing the method for generating a processor fault record.
FIG. 2 shows a method for generating a processor fault record according to an embodiment of this application. The method is applied to a hardware platform that includes a control chip and a central processing unit (CPU). For example, the hardware platform may be the one shown in FIG. 1, the control chip may be the control chip 102 shown in FIG. 1, and the CPU may be the CPU 101 shown in FIG. 1.
The method for generating a processor fault record provided by this embodiment is described in detail below with reference to FIG. 2.
S201: The control chip detects that a processing unit in the CPU has stopped responding.
For example, the control chip performs heartbeat detection on the processing unit to determine that it has stopped responding: the control chip periodically sends a detection message to the processing unit, and if it does not receive the processing unit's reply to the detection message within a preset time, it determines that the processing unit has stopped responding.
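The heartbeat check above can be sketched as follows. This is a minimal sketch, assuming a hypothetical `probe(timeout_s)` callable that sends one detection message and returns whether a reply arrived within the timeout; `detect_unresponsive` and its parameters are illustrative names, and `sleep` is injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable, Optional

def detect_unresponsive(probe: Callable[[float], bool],
                        timeout_s: float,
                        period_s: float,
                        max_rounds: int,
                        sleep: Callable[[float], None] = time.sleep) -> Optional[int]:
    """Return the index of the first probe that got no reply in time, or None."""
    for round_no in range(max_rounds):
        if not probe(timeout_s):   # no reply to the detection message within the preset time
            return round_no
        sleep(period_s)            # wait until the next periodic probe
    return None
```

As described, this declares the unit unresponsive after a single missed reply. A deployment could instead require several consecutive misses to tolerate transient delays; that is a design option, not something the source specifies.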
The processing unit may be the processing unit 1011 shown in FIG. 1. When the CPU is a multi-core processor, the processing unit 1011 may be either the master core or a slave core of the CPU.
S202: The control chip obtains, through a Joint Test Action Group (JTAG) channel, the instruction address in the current program counter (PC) of the processing unit.
It should be noted that, for registers in the processing unit to be readable through the JTAG channel, the processing unit must contain a JTAG module. The JTAG module consists of a test access port (TAP) controller and several registers. The control chip sends instructions to the TAP controller through the JTAG channel, causing the contents of internal registers of the processing unit, such as the PC, to be read into the registers of the JTAG module and then sent to the control chip through the JTAG channel. Most common high-end devices, such as MIPS and ARM processors, include modules that support the JTAG protocol.
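To make the TAP controller concrete, the standard IEEE 1149.1 TAP state machine is shown below as a transition table, with the next state selected by the TMS value on each TCK rising edge. Reading a core register such as the PC typically means shifting a debug instruction into the instruction register (through Shift-IR) and then shifting the data out through a data register (through Shift-DR). The instruction codes and debug registers involved are vendor-specific (for example, MIPS EJTAG or the ARM debug interface) and are not modeled here; `TAP_NEXT` and `walk` are illustrative names.

```python
from typing import Dict, Iterable, Tuple

# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1).
TAP_NEXT: Dict[str, Tuple[str, str]] = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}

def walk(state: str, tms_bits: Iterable[int]) -> str:
    """Apply one TMS bit per TCK cycle and return the resulting TAP state."""
    for bit in tms_bits:
        state = TAP_NEXT[state][bit]
    return state
```

For example, from Run-Test/Idle the TMS sequence 1, 0, 0 reaches Shift-DR and 1, 1, 0, 0 reaches Shift-IR, and five consecutive TMS=1 cycles return the controller to Test-Logic-Reset from any state.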
Optionally, in S202 the control chip also obtains, through the JTAG channel, the instruction address in the current function return address register of the processing unit.
Specifically, some types of CPUs include a function return address register. When an instruction calls a function in the program, the address of that instruction is saved in the function return address register; after the called function finishes, the CPU continues with the instruction following the address held in that register. For example, in a MIPS CPU the function return address register may be the R31 register, also called the Ra register; in an ARM CPU it may be the R14 register, also called the link register (LR). For ease of description, the following embodiments use the Ra register as the example.
Recording the instruction address held in the processing unit's current function return address register in the instruction address table lets the table show more clearly the call relationships between the functions the processing unit is running, which further improves the efficiency of CPU fault analysis.
It should be noted that, in S202, the instruction address in the current PC and the instruction address in the current Ra register may be obtained at the same time; alternatively, the PC may be read first and the Ra register immediately afterward, or the Ra register first and the PC immediately afterward. During a function call, for example when function A calls function B, the Ra register holds the address of the instruction in A that called B for as long as B is running (provided B does not itself call other functions); that is, the Ra value does not change while B executes. Therefore, a person skilled in the art will understand that the PC and Ra reads need not be performed at exactly the same moment.
S203: The control chip creates a first-type entry that includes the instruction address in the current PC, and records the first-type entry in the instruction address table.
For example, before obtaining the instruction address in the PC of the processing unit for the first time, the control chip initializes the instruction address table: it allocates space in its memory to store the table's entries and sets a preset value for the number of entries to be recorded. The preset value is greater than or equal to 2.
Optionally, if the control chip also obtained the instruction address in the processing unit's current function return address register in S202, then in S203 it also adds that instruction address to the first-type entry.
可选的,所述控制芯片获取当前的时刻,并将所述当前的时刻添加到所述第一类表项中。Optionally, the control chip acquires the current time, and adds the current time to the first type of entry.
需要说明的是,由于在所述第一类表项中记录所述当前的时刻,主要的目的在于确定故障发生的大致时间,所述指令地址表中每一个第一类表项创建的时间间隔,因此所述当前的时刻,可以是S202中获取PC中指令地址的时刻,也可以是S202中获取函数返回寄存器中指令地址的时刻,还可以是S203中创建所述第一类表项的时刻,所述当前的时刻可以是通过所述JTAG通道从所述处理单元中获取的,也可以是由所述控制芯片生成的。本申请对于获取所述当前的时刻的具体时间点和方式不做限制。It should be noted that since the current time is recorded in the first type of entry, the main purpose is to determine the approximate time when the fault occurs, and the time interval between the creation of each first type of entry in the instruction address table , so the current moment may be the moment when the instruction address in the PC is obtained in S202, or the moment when the instruction address in the register is obtained by the function return register in S202, or the moment when the first type of entry is created in S203 , the current time may be obtained from the processing unit through the JTAG channel, or may be generated by the control chip. The present application does not limit the specific time point and method for obtaining the current moment.
可选的,所述将所述第一类表项记录在指令地址表中,包括:所述控制芯片按照所述第一类表项生成的先后顺序,将所述第一类表项记录在指令地址表中。Optionally, the recording the first type of entry in the instruction address table includes: the control chip records the first type of entry in the order in which the first type of entry is generated. instruction address table.
S204: The control chip determines whether the number of entries recorded in the instruction address table has reached a preset value, where the preset value is greater than or equal to 2.
If the number of recorded entries has not reached the preset value, the control chip returns to S202 and S203; if it has reached the preset value, S205 is executed.
Optionally, returning to S202 and S203 when the number of recorded entries has not reached the preset value includes: the control chip waits for a time period T1, and after T1 elapses, returns to S202 and S203.
Waiting for T1 lets the control chip perform other tasks during that period, which prevents the S202 to S204 loop from consuming too many of the control chip's resources. For example, T1 may be 1 ms, 10 ms, 50 ms, or 100 ms.
S205: The control chip triggers an interrupt of the CPU.
For example, the interrupt triggered by the control chip may be a non-maskable interrupt (NMI). By triggering the interrupt, the control chip in turn causes the CPU to restart so that it recovers from the current fault.
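The S202 to S205 loop can be sketched as follows. This is a minimal host-side Python model, not the implementation described in this application: the JTAG reads and the NMI are abstracted as callables (`read_pc`, `read_ra`, `trigger_nmi`), which are hypothetical stand-ins for whatever debug-access routines the control chip firmware actually provides. The defaults for `preset` (16) and T1 (50 ms, one of the example values above) are arbitrary illustration choices.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FirstTypeEntry:
    """One row of the instruction address table (compare Table 1)."""
    pc: int                            # instruction address sampled from the PC
    ra: Optional[int] = None           # optional: function return address register (Ra)
    timestamp: Optional[float] = None  # optional: time at which the sample was taken


def record_after_hang(
    read_pc: Callable[[], int],               # hypothetical JTAG read of the PC
    read_ra: Optional[Callable[[], int]],     # hypothetical JTAG read of Ra, or None
    trigger_nmi: Callable[[], None],          # hypothetical NMI trigger
    preset: int = 16,
    t1_seconds: float = 0.05,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[FirstTypeEntry]:
    """Sample the hung core's PC (and optionally Ra) until `preset` entries exist, then raise an NMI."""
    if preset < 2:
        raise ValueError("the preset value must be greater than or equal to 2")
    table: List[FirstTypeEntry] = []
    while True:
        # S202: read PC and, optionally, Ra. Their relative order is not critical
        # because Ra stays stable while the current callee is running.
        pc = read_pc()
        ra = read_ra() if read_ra is not None else None
        # S203: create a first-type entry and append it in generation order.
        table.append(FirstTypeEntry(pc=pc, ra=ra, timestamp=clock()))
        # S204: stop sampling once the preset number of entries is reached.
        if len(table) >= preset:
            break
        # Optional delay T1 so the control chip can do other work between samples.
        sleep(t1_seconds)
    # S205: trigger the non-maskable interrupt that restarts the CPU.
    trigger_nmi()
    return table
```

Injecting `clock` and `sleep` keeps the sketch testable on a host machine; on a real control chip the T1 delay would more likely be a timer or a scheduler yield than a blocking sleep.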
Table 1 is an example of the instruction address table.
Table 1
As shown in Table 1, at 6:18:01.050 the control chip obtains the instruction address 0x2dc9c0 from the PC of the processing unit and 0xddb2f0ae from the Ra register; it creates a first-type entry (entry 1 in Table 1) and records both addresses in it. At 6:18:01.100, the control chip obtains 0x151a4d4 from the PC and 0x2dc3d1 from the Ra register, creates another first-type entry (entry 2 in Table 1), and records both addresses in it. Likewise, the control chip creates entry 3 at 6:18:01.150 and entry N at 6:18:04.050, recording the corresponding instruction addresses in each.
With the above solution, when the processing unit stops responding, the control chip uses the JTAG channel to obtain, over a period of time, multiple instruction addresses stored in the processing unit's PC and records them in the instruction address table. The table therefore reflects what the processing unit was executing during the period after it stopped responding. When a processing unit enters an infinite loop, the looping function and the corresponding code lines are executed repeatedly. The prior art triggers an interrupt immediately after detecting that the processing unit has stopped responding and records only the instruction address being executed at the moment of the interrupt. Compared with that approach, the method provided in this application more accurately identifies the function and code range in which the infinite loop occurs, which helps improve the efficiency of CPU fault analysis.
Optionally, S206: The control chip reads the first-type entries from the instruction address table one at a time, in the order in which they were stored, and for each first-type entry read, performs the following: based on the PC instruction address contained in the entry, the control chip queries a preset correspondence between instruction addresses in the processing unit and function names and code lines to obtain the function name and code line corresponding to that address; the control chip creates a second-type entry containing this function name and code line; and the control chip records the second-type entries in a function execution table in the order in which they are generated.
For example, the preset correspondence between instruction addresses in the processing unit and function names and code lines can be obtained by disassembling the program running on the processing unit. The program is fed to a disassembler, which generates a disassembly file containing the name of each function in the program, the assembly code of each function, and the correspondence between each line of assembly code and an instruction address in the processing unit's memory. For example, the disassembler may be objdump, which is part of GNU Binutils and is commonly used together with the GNU Compiler Collection (GCC). The file given to objdump may be a management-plane file or a data-plane file.
Optionally, if in S203 the control chip also added the instruction address in the current function return address register to the first-type entry, then in S206, for each first-type entry read, the control chip further queries the preset correspondence using the function return address register's instruction address contained in the entry, obtains the corresponding function name and code line, and adds them to the second-type entry.
By generating one second-type entry in the function execution table from each first-type entry in the instruction address table, the control chip produces a table that records the function names and code lines of the program executed during the period after the processing unit stopped responding. Function names and code lines reflect the running program more directly than raw instruction addresses, which further improves the efficiency of CPU fault analysis.
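A minimal sketch of the S206 lookup, assuming the address-to-(function, code line) correspondence has already been extracted from the disassembly (for example from `objdump -d -l` output, whose parsing is not shown) into a list sorted by address. Each sampled PC or Ra address is mapped to the listing entry with the greatest start address not exceeding it, using binary search. The names `lookup`, `build_function_table`, and `SecondTypeEntry`, and the `(pc, ra, timestamp)` tuple layout for first-type entries, are illustrative assumptions rather than details from the source.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical pre-built correspondence, e.g. extracted from a disassembly listing:
# (start_address, function_name, code_line), sorted by start_address.
AddrMap = Sequence[Tuple[int, str, int]]
Location = Tuple[Optional[str], Optional[int]]


@dataclass
class SecondTypeEntry:
    """One row of the function execution table (compare Table 2)."""
    number: int
    pc_function: Optional[str]
    pc_line: Optional[int]
    ra_function: Optional[str] = None
    ra_line: Optional[int] = None
    timestamp: Optional[float] = None


def lookup(addr_map: AddrMap, starts: List[int], addr: int) -> Location:
    """Map an address to the listing entry with the greatest start address <= addr."""
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0:
        return None, None  # address lies below every known instruction
    _, function, line = addr_map[i]
    return function, line


def build_function_table(
    first_type_entries: Sequence[Tuple[int, Optional[int], Optional[float]]],
    addr_map: AddrMap,
) -> List[SecondTypeEntry]:
    """S206: translate each (pc, ra, timestamp) entry, in stored order, into a second-type entry."""
    starts = [start for start, _, _ in addr_map]
    table: List[SecondTypeEntry] = []
    for number, (pc, ra, ts) in enumerate(first_type_entries, start=1):
        pc_function, pc_line = lookup(addr_map, starts, pc)
        # Ra columns are optional: left empty when no Ra address was recorded.
        ra_function, ra_line = lookup(addr_map, starts, ra) if ra is not None else (None, None)
        table.append(SecondTypeEntry(number, pc_function, pc_line, ra_function, ra_line, ts))
    return table
```

Because the lookup has no upper bound, an address past the last listed instruction maps to the last entry; a fuller version would also check each function's end address.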
Table 2 shows an example of the function execution table.
Table 2
As shown in Table 2, each second-type entry in the function execution table contains six items: the entry number, the function corresponding to the instruction address in the PC, the code line corresponding to the instruction address in the PC, the function corresponding to the instruction address in Ra, the code line corresponding to the instruction address in Ra, and the time. The last three items (the Ra function, the Ra code line, and the time) are optional.
Each second-type entry corresponds to one first-type entry created in S203. For example, entry i in Table 2 corresponds to entry 1 in Table 1, entry ii to entry 2, entry iii to entry 3, and entry M to entry N. In each second-type entry, the function and code line corresponding to the instruction address in the PC are obtained by looking up the PC instruction address recorded in the corresponding first-type entry in the preset correspondence between instruction addresses in the processing unit and function names and code lines. Likewise, the function and code line corresponding to the instruction address in Ra are obtained by looking up the Ra instruction address recorded in the corresponding first-type entry in the same correspondence.
If Table 2 does not include the function and code line corresponding to the instruction address in Ra, analyzing the CPU fault with Table 2 shows that function B and function C are executed alternately many times, which suggests that the infinite loop may occur while functions B and C call each other.
An engineer analyzing the processor fault can then consult the program source code to obtain the calling relationship between function B and function C. If, for example, function B calls function C, the engineer can conclude that the code in function B that calls function C may be the cause of the infinite loop.
If Table 2 does include the function and code line corresponding to the instruction address in Ra, the calling relationship between functions can be obtained more directly. For example, Table 2 shows that the instruction corresponding to line 129 of function B calls function C repeatedly. Because functions B and C appear alternately, function C evidently runs normally and returns to function B. It can therefore be concluded that the instructions around line 129 of function B may be the cause of the infinite loop.
By contrast, if the control chip triggered an interrupt immediately upon detecting that the processing unit had stopped responding, the instruction address in the PC at the moment of the interrupt might belong to function C, making it much harder to analyze the cause of the fault efficiently.
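The manual analysis described above could be partly automated. The following sketch is a simple counting heuristic, not a technique taken from the source: it counts how often consecutive second-type entries switch between two functions, and which Ra call site (function, code line) appears most often. The input sequences are hypothetical projections of the function execution table's PC-function column and Ra (function, line) columns.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def most_frequent_alternation(
    pc_functions: Sequence[str],
) -> Optional[Tuple[Tuple[str, ...], int]]:
    """Return the unordered function pair the PC switches between most often, with its count."""
    switches = Counter(
        tuple(sorted((a, b)))  # treat B->C and C->B as the same pair
        for a, b in zip(pc_functions, pc_functions[1:])
        if a != b  # only count actual switches between functions
    )
    return switches.most_common(1)[0] if switches else None


def most_frequent_call_site(
    ra_sites: Sequence[Optional[Tuple[str, int]]],
) -> Optional[Tuple[Tuple[str, int], int]]:
    """Return the (function, code line) that Ra points into most often, with its count."""
    counts = Counter(site for site in ra_sites if site is not None)
    return counts.most_common(1)[0] if counts else None
```

For a PC-function column reading B, C, B, C, B, `most_frequent_alternation` returns `(('B', 'C'), 4)`. Such counts only point an engineer at candidate functions; confirming the loop still requires reading the source code.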
FIG. 3 shows another method for generating a processor fault record according to an embodiment of this application. The method is applied to a hardware platform that includes a control chip and a multi-core CPU, where the multi-core CPU includes a first processing unit and a second processing unit, both of which are slave cores of the multi-core CPU. For example, the hardware platform may be the one shown in FIG. 1, the control chip may be control chip 102, the CPU may be CPU 101, the first processing unit may be first processing unit 1011, and the second processing unit may be second processing unit 1012, all shown in FIG. 1.
The following describes this method in detail with reference to FIG. 3.
S301: The control chip detects that the first processing unit has stopped responding.
For example, the control chip may monitor the first processing unit directly and determine that it has stopped responding. Alternatively, the master core of the multi-core CPU may monitor whether the first processing unit has stopped responding; when the master core detects that it has, it sends indication information carrying the identifier of the first processing unit to the control chip, and the control chip determines from this information that the first processing unit has stopped responding.
S302: If the number of entries recorded in the instruction address table corresponding to the first processing unit has not reached a first preset value, the control chip obtains the instruction address in the first processing unit's current program counter (PC) through the Joint Test Action Group (JTAG) channel and executes S303, where the first preset value is greater than or equal to 2. If the number of entries recorded in that table has reached the first preset value, S303 is skipped.
Optionally, in S302, the control chip also obtains the instruction address in the first processing unit's current function return address register through the JTAG channel.
S303: The control chip creates a first-type entry containing the instruction address in the first processing unit's current PC and records it in the instruction address table corresponding to the first processing unit.
Optionally, if in S302 the control chip also obtained the instruction address in the first processing unit's current function return address register through the JTAG channel, then in S303 the control chip adds that instruction address to the first-type entry.
S304: If the number of entries recorded in the instruction address table corresponding to the second processing unit has not reached a second preset value, the control chip obtains the instruction address in the second processing unit's current PC through the JTAG channel and executes S305, where the second preset value is greater than or equal to 2.
Optionally, the control chip also obtains the instruction address in the second processing unit's current function return address register through the JTAG channel.
If the number of entries recorded in the instruction address table corresponding to the second processing unit has reached the second preset value, S305 is skipped.
S305: The control chip creates another first-type entry containing the instruction address in the second processing unit's current PC and records it in the instruction address table corresponding to the second processing unit.
Optionally, if in S304 the control chip also obtained the instruction address in the second processing unit's current function return address register through the JTAG channel, then in S305 the control chip adds that instruction address to this other first-type entry.
Note that in this application, S302 and S303 may be executed before S304 and S305, or S304 and S305 may be executed before S302 and S303.
S306: The control chip determines whether the numbers of entries recorded in the instruction address table corresponding to the first processing unit and in the instruction address table corresponding to the second processing unit have each reached the corresponding preset value. If at least one of the two tables has not reached its preset value, the control chip returns to S302. If both tables have reached their corresponding preset values, S307 is executed.
S307: The control chip triggers an interrupt of the multi-core CPU, in a manner similar to S205.
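The S302 to S307 flow for several cores can be sketched as a host-side Python model. The assumptions here are all hypothetical: `read_pc(core_id)` stands in for a JTAG read of the given core's PC, `trigger_interrupt` stands in for the interrupt mechanism, entries hold only the PC address for brevity, and core identifiers and per-core preset values are supplied by the caller.

```python
from typing import Callable, Dict, List


def record_multicore(
    read_pc: Callable[[int], int],         # hypothetical JTAG read of one core's PC
    presets: Dict[int, int],               # core id -> preset value for that core's table
    trigger_interrupt: Callable[[], None],  # hypothetical interrupt trigger
) -> Dict[int, List[int]]:
    """Sample each core's PC until every core's table reaches its own preset, then interrupt."""
    if any(preset < 2 for preset in presets.values()):
        raise ValueError("each preset value must be greater than or equal to 2")
    tables: Dict[int, List[int]] = {core: [] for core in presets}
    # S306: keep looping while at least one table is still below its preset.
    while any(len(tables[core]) < preset for core, preset in presets.items()):
        # S302/S303, then S304/S305; the order across cores is not significant.
        for core, preset in presets.items():
            if len(tables[core]) < preset:
                tables[core].append(read_pc(core))  # record a first-type entry (PC only)
            # Otherwise this core's table is full, so its sampling step is skipped.
    trigger_interrupt()  # S307: interrupt the multi-core CPU
    return tables
```

A core whose table fills first simply stops being sampled while the others continue, which mirrors the skip rules in S302 and S304.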
For example, the instruction address tables corresponding to the first and second processing units are similar to the instruction address table shown in Table 1.
With the above solution, when the first processing unit stops responding, the control chip uses the JTAG channel to obtain, over a period of time, multiple instruction addresses stored in the first processing unit's PC and records them in an instruction address table, which reflects what the first processing unit was executing during the period after it stopped responding. The control chip also records the instruction addresses stored in the PC of the second processing unit, which resides in the same multi-core CPU as the first processing unit. In an infinite loop, the looping function and the corresponding code lines are executed repeatedly, and in a multi-core CPU, functions on different slave cores may call one another. The prior art triggers an interrupt immediately after detecting that the first processing unit has stopped responding and records only the instruction addresses being executed by the first and second processing units at the moment of the interrupt. Compared with that approach, the method provided in this application more accurately identifies the function and code range in which the infinite loop occurs, which helps improve the efficiency of CPU fault analysis.
Optionally, the method further includes S308 and S309.
S308: The control chip reads the first-type entries from the instruction address table corresponding to the first processing unit one at a time, in the order in which they were stored, and for each first-type entry read, performs the following: based on the PC instruction address contained in the entry, the control chip queries a preset correspondence between instruction addresses in the first processing unit and function names and code lines to obtain the corresponding function name and code line. The control chip creates a second-type entry containing that function name and code line. The control chip records the second-type entries in the function execution table corresponding to the first processing unit in the order in which they are generated.
Optionally, if in S303 the control chip added the instruction address in the first processing unit's current function return address register to the first-type entry, then in S308, for each first-type entry read, the control chip further queries the preset correspondence for the first processing unit using the function return address register's instruction address contained in the entry, obtains the corresponding function name and code line, and adds them to the second-type entry.
S309: The control chip reads the first-type entries from the instruction address table corresponding to the second processing unit one at a time, in the order in which they were stored, and for each first-type entry read, performs the following: based on the PC instruction address contained in the entry, the control chip queries a preset correspondence between instruction addresses in the second processing unit and function names and code lines to obtain the corresponding function name and code line. The control chip creates a second-type entry containing that function name and code line. The control chip records the second-type entries in the function execution table corresponding to the second processing unit in the order in which they are generated.
Optionally, if in S305 the control chip added the instruction address in the second processing unit's current function return address register to the first-type entry, then in S309, for each first-type entry read, the control chip further queries the preset correspondence for the second processing unit using the function return address register's instruction address contained in the entry, obtains the corresponding function name and code line, and adds them to the second-type entry.
Note that this application does not limit the order in which S308 and S309 are executed.
By recording the instruction addresses in the current function return address registers of the first and second processing units in the corresponding instruction address tables, the tables more clearly reflect the calling relationships between the functions running on each processing unit, which further improves the efficiency of CPU fault analysis.
Table 3 is an example of the function execution table corresponding to the first processing unit and the function execution table corresponding to the second processing unit.
Table 3
As shown in Table 3, each second-type entry in these function execution tables contains six items: the entry number, the function corresponding to the instruction address in the PC, the code line corresponding to the instruction address in the PC, the function corresponding to the instruction address in Ra, the code line corresponding to the instruction address in Ra, and the time. The last three items (the Ra function, the Ra code line, and the time) are optional.
If Table 3 does not include the function and code line corresponding to the instruction address in Ra, analyzing the CPU fault with Table 3 shows that the first processing unit stops making progress after reaching line 300 of function B, while function C on the second processing unit is executed repeatedly.
An engineer analyzing the CPU fault can then consult the program source code to learn that function B calls function C on the second processing unit and that function C keeps running in a loop, and conclude that the first processing unit may have stopped responding because of an infinite loop in function C on the second processing unit. If Table 3 includes the function and code line corresponding to the instruction address in Ra, the calling relationship between functions can be obtained more directly.
图4是本申请实施例提供的一种生成处理器故障记录的装置结构示意图。如图4所示,生成处理器故障记录的装置400包括第二处理器410,存储器420,JTAG接口430和总线440,所述第二处理器410,存储器420和JTAG接口430通过总线440相互连接。Fig. 4 is a schematic structural diagram of an apparatus for generating a processor fault record provided by an embodiment of the present application. As shown in FIG. 4 , the device 400 for generating a processor failure record includes a second processor 410, a memory 420, a JTAG interface 430 and a bus 440, and the second processor 410, the memory 420 and the JTAG interface 430 are interconnected through the bus 440 .
所述生成处理器故障记录的装置400应用于包括所述装置400和第一中央处理器CPU的硬件平台中,所述第一CPU中包括至少一个处理单元,所述装置通过JTAG通道与所述第一CPU通信。The device 400 for generating a processor fault record is applied to a hardware platform including the device 400 and a first central processing unit CPU, the first CPU includes at least one processing unit, and the device communicates with the device through a JTAG channel. The first CPU communicates.
存储器420包括但不限于是随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM或者快闪存储器)、或便携式只读存储器(CD-ROM)。Memory 420 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or portable read-only memory (CD-ROM).
第二处理器410可以是一个或多个中央处理器(英文:Central Processing Unit,简称CPU),在第二处理器410是一个CPU的情况下,该CPU可以是单核CPU,也可以是多核CPU。The second processor 410 can be one or more central processing units (English: Central Processing Unit, CPU for short), and in the case where the second processor 410 is a CPU, the CPU can be a single-core CPU or a multi-core CPU.
JTAG接口430可以是采用JTAG的相关协议中定义的接口。例如,在IEEE 1149.1标准中,定义了JTAG接口需要有四个接口,分别是测试数据输入(TDI,Test Data In),测试数据输出(TDO,Test Data Out),测试时钟(TCK,Test Clock)和测试模式选择(TMS,Test Mode Select)。The JTAG interface 430 may be an interface defined in a related protocol using JTAG. For example, in the IEEE 1149.1 standard, it is defined that the JTAG interface requires four interfaces, namely, test data input (TDI, Test Data In), test data output (TDO, Test Data Out), and test clock (TCK, Test Clock) And test mode selection (TMS, Test Mode Select).
The JTAG interface 430 is configured to obtain, through the JTAG channel, the instruction address in the program counter (PC) of a processing unit in the first CPU, and to send that instruction address to the second processor 410 over the bus 440.
Optionally, the JTAG interface 430 is further configured to obtain, through the JTAG channel, the instruction address in the current function return address register of the processing unit in the first CPU, and to send that instruction address to the second processor 410 over the bus 440.
The memory 420 is further configured to store the instruction address table shown in Table 1, the function execution table shown in Table 2, and so on.
The second processor 410 is configured to read the program code stored in the memory 420 and perform the following operations:
detecting that a processing unit in the first CPU has stopped responding;
obtaining, through the JTAG channel, the instruction address currently held in the program counter (PC) of the processing unit;
creating a first-type entry that includes the instruction address from the current PC, and recording the first-type entry in an instruction address table;
determining whether the number of entries recorded in the instruction address table has reached a preset value, where the preset value is greater than or equal to 2;
if the number of recorded entries has not reached the preset value, returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit; and
if the number of recorded entries has reached the preset value, triggering an interrupt on the first CPU.
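The detect, sample, check and interrupt sequence above can be sketched in Python as follows. The callables `read_pc` and `trigger_interrupt` are hypothetical stand-ins for the JTAG read and the interrupt mechanism, which the text does not specify, and each dict plays the role of a first-type entry.

```python
from typing import Callable, List


def record_pc_samples(
    read_pc: Callable[[], int],             # hypothetical: reads the stalled unit's PC over JTAG
    trigger_interrupt: Callable[[], None],  # hypothetical: raises an interrupt on the first CPU
    preset: int,
) -> List[dict]:
    """Sample the PC until the instruction address table holds `preset` entries,
    then trigger an interrupt on the first CPU and return the table."""
    if preset < 2:
        raise ValueError("the preset value must be >= 2")
    table: List[dict] = []              # the instruction address table
    while True:
        entry = {"pc": read_pc()}       # a first-type entry holding the current PC value
        table.append(entry)
        if len(table) >= preset:        # enough samples collected
            trigger_interrupt()
            return table
        # otherwise loop back to the PC-read step
```

With a simulated core stuck cycling through three addresses, `preset=5` yields five entries that repeat the loop body, which is exactly the pattern a single snapshot at interrupt time would miss.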
Optionally, before recording the first-type entry in the instruction address table, the second processor 410 is further configured to perform the following operations:
obtaining, through the JTAG channel, the instruction address in the current function return address register of the processing unit; and
adding the instruction address from the current function return address register to the first-type entry.
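A first-type entry that optionally carries the return address could be modeled as below. The names `FirstTypeEntry`, `make_entry` and `read_return_addr` are illustrative assumptions; on ARM cores, for example, the function return address register corresponds to the link register (LR).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class FirstTypeEntry:
    pc: int                            # instruction address read from the program counter
    return_addr: Optional[int] = None  # read from the function return address register, if enabled


def make_entry(read_pc: Callable[[], int],
               read_return_addr: Optional[Callable[[], int]] = None) -> FirstTypeEntry:
    """Build a first-type entry; the return address is added before the entry is recorded."""
    entry = FirstTypeEntry(pc=read_pc())
    if read_return_addr is not None:
        # Optional step: add the return address to the same entry.
        entry.return_addr = read_return_addr()
    return entry
```

Keeping both values in one entry lets later analysis see not only where the core was, but also which caller it would return to.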
Optionally, when returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit, the second processor 410 performs:
waiting for a time period T1; and
after the time period T1 elapses, returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the processing unit.
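Inserting the delay T1 between reads spreads the samples over a longer window, roughly (preset - 1) x T1 plus the JTAG read time. A minimal sketch, assuming a hypothetical `read_pc` callable and an injectable `sleep` so the pacing can be tested without actually waiting:

```python
import time
from typing import Callable, List


def record_pc_samples_paced(read_pc: Callable[[], int], preset: int, t1: float,
                            sleep: Callable[[float], None] = time.sleep) -> List[int]:
    """Read the PC `preset` times, waiting T1 seconds between consecutive reads."""
    table: List[int] = []
    while True:
        table.append(read_pc())
        if len(table) >= preset:
            return table      # the caller triggers the interrupt at this point
        sleep(t1)             # delay T1 before returning to the PC-read step
```

The choice of T1 trades observation window against detail: a larger T1 covers more of a long loop with the same table size, while a smaller T1 resolves tighter loops.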
Optionally, the first CPU is a multi-core CPU, and the processing unit is the main core of the multi-core CPU.
In this case, the second processor 410 detects that the processing unit in the first CPU has stopped responding by performing:
heartbeat detection on the main core, which determines that the main core has stopped responding.
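The text does not say how heartbeat detection is implemented. One common approach, shown here as an assumption rather than the patented mechanism, is to timestamp each heartbeat received from the main core and declare it unresponsive once no heartbeat has arrived within a timeout:

```python
from typing import Optional


class HeartbeatMonitor:
    """Declare the main core unresponsive when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.last_beat: Optional[float] = None

    def beat(self, now: float) -> None:
        # Called whenever a heartbeat message from the main core is received.
        self.last_beat = now

    def stopped_responding(self, now: float) -> bool:
        if self.last_beat is None:
            return False  # hypothetical policy: no verdict before the first heartbeat
        return now - self.last_beat > self.timeout
```

Timestamps are passed in explicitly so the logic can be driven by any clock, such as `time.monotonic()` on the control chip.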
Further, the second processor 410 is configured to read first-type entries from the instruction address table one at a time, in the order in which they were stored, and for each first-type entry read, perform the following:
querying, based on the PC instruction address included in the first-type entry, a preset mapping between instruction addresses in the processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry; and
recording the second-type entries in a function execution table in the order in which they were generated.
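The address-to-source mapping can be held as a table of function start addresses sorted in ascending order, so that each PC is resolved with a binary search. The symbol data below is hypothetical; in practice such a mapping would typically be derived from the firmware's symbol and debug information, for example with GNU binutils `addr2line`. For simplicity the first-type entries here are plain PC values.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SecondTypeEntry:
    function: str
    line: int


# Hypothetical preset mapping: (start address, function name, source line), sorted by address.
SYMBOLS: List[Tuple[int, str, int]] = [
    (0x1000, "poll_queue", 42),
    (0x1010, "poll_queue", 45),
    (0x1020, "handle_packet", 88),
]
_STARTS = [s[0] for s in SYMBOLS]


def lookup(addr: int) -> SecondTypeEntry:
    """Return the function/line of the last mapping entry whose start address is <= addr."""
    i = bisect.bisect_right(_STARTS, addr) - 1
    if i < 0:
        return SecondTypeEntry("<unknown>", 0)
    _, func, line = SYMBOLS[i]
    return SecondTypeEntry(func, line)


def build_function_table(pc_table: List[int]) -> List[SecondTypeEntry]:
    # First-type entries are processed in storage order, so the output keeps generation order.
    return [lookup(pc) for pc in pc_table]
```

Because the sketch stores only start addresses, an address past the end of the last function still resolves to that function; a fuller mapping would also record end addresses.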
Further, the second processor 410 is configured to read first-type entries from the instruction address table one at a time, in the order in which they were stored, and for each first-type entry read, perform the following:
querying, based on the PC instruction address included in the first-type entry, a preset mapping between instruction addresses in the processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
querying, based on the instruction address from the function return address register included in the first-type entry, the same preset mapping, to obtain the function name and code line corresponding to that return address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry, as well as the function name and code line corresponding to the return address in the first-type entry; and
recording the second-type entries in a function execution table in the order in which they were generated.
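When the first-type entry also carries a return address, each second-type entry can hold both the current location and its caller. The sketch below uses a hypothetical symbol table with the same binary-search lookup; entries are `(pc, return_address)` pairs, with `None` when no return address was read.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical preset mapping: (start address, function name, source line), sorted by address.
SYMBOLS = [(0x1000, "poll_queue", 42), (0x1020, "handle_packet", 88), (0x1040, "rx_task", 12)]
_STARTS = [s[0] for s in SYMBOLS]


def lookup(addr: int) -> Tuple[str, int]:
    """Map an instruction address to (function name, source line)."""
    i = bisect.bisect_right(_STARTS, addr) - 1
    return ("<unknown>", 0) if i < 0 else SYMBOLS[i][1:]


def build_function_table(entries: List[Tuple[int, Optional[int]]]) -> List[Dict]:
    table = []
    for pc, ret in entries:                  # storage order of the first-type entries
        row = {"pc": lookup(pc)}             # where the core was executing
        if ret is not None:
            row["caller"] = lookup(ret)      # where it would return to
        table.append(row)
    return table
```

A return address points at the instruction after the call, so symbolizers often look up a slightly earlier address to report the exact call-site line; the sketch skips that refinement.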
The apparatus 400 provided in this embodiment may be integrated in the control chip 102 shown in FIG. 1, and the hardware platform including the apparatus 400 and the first CPU may be integrated in the network device 100 shown in FIG. 1. The first CPU may be the CPU 101 shown in FIG. 1.
Optionally, the JTAG interface 430 is integrated in a monitoring chip and the second processor 410 is integrated in a main control chip. The monitoring chip communicates with the first CPU through the JTAG channel, and the main control chip communicates with the monitoring chip through the bus 440.
For example, the apparatus 400 is integrated in the control chip 102 shown in FIG. 1, the JTAG interface 430 is integrated in the monitoring chip 1022 shown in FIG. 1, the second processor 410 is integrated in the main control chip 1021 shown in FIG. 1, and the first CPU may be the CPU 101 shown in FIG. 1. The bus 440 between the second processor 410 and the JTAG interface 430 may be implemented by the bus between the main control chip 1021 and the monitoring chip 1022 shown in FIG. 1, and the JTAG channel may be implemented by the bus between the monitoring chip 1022 and the CPU 101 shown in FIG. 1. For the specific connection structure among the main control chip 1021, the monitoring chip 1022 and the CPU 101, refer to the description of FIG. 1.
The apparatus 400 provided in this embodiment can be applied to the method of the embodiment of FIG. 2 to implement the functions of the control chip in that method. For other functions that the apparatus 400 can implement and for its interaction with the first CPU, refer to the description of the control chip in the method embodiment; details are not repeated here.
With the foregoing solution, when the apparatus for generating a processor fault record detects that the processing unit has stopped responding, it uses the JTAG interface to obtain, over a period of time, multiple instruction addresses held in the PC of the processing unit and records them in the instruction address table. The instruction address table therefore reflects what the processing unit was executing during the period after it stopped responding. When the processing unit enters an infinite loop, the looping function and the corresponding code lines are called repeatedly. Compared with the prior art, which triggers an interrupt immediately upon detecting that the processing unit has stopped responding and records only the instruction address being executed at the moment of the interrupt, the method provided in this application more accurately identifies the function and code range in which the infinite loop occurs, which helps improve the efficiency of CPU fault analysis.
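The reasoning above can be made concrete by counting how often each location appears in the function execution table: a code range that keeps reappearing across samples is a strong candidate for the loop. This ranking step is an illustrative assumption, not a step recited in the text; samples are `(function, line)` pairs.

```python
from collections import Counter
from typing import List, Tuple


def hottest_locations(samples: List[Tuple[str, int]], top: int = 3):
    """Rank (function, line) pairs by how many PC samples landed on them."""
    return Counter(samples).most_common(top)


def hottest_functions(samples: List[Tuple[str, int]]):
    """Rank functions by sample count, ignoring the exact line."""
    return Counter(func for func, _ in samples).most_common()
```

A single interrupt-time snapshot gives one sample; the repeated samples are what make such a ranking meaningful.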
FIG. 5 is a schematic structural diagram of an apparatus for generating a processor fault record according to an embodiment of this application. As shown in FIG. 5, the apparatus 500 for generating a processor fault record includes a second processor 510, a memory 520, a JTAG interface 530 and a bus 540. The second processor 510, the memory 520 and the JTAG interface 530 are interconnected through the bus 540.
The apparatus 500 for generating a processor fault record is applied to a hardware platform that includes the apparatus 500 and a first multi-core central processing unit (CPU). The first multi-core CPU includes a first processing unit and a second processing unit, both of which are slave cores of the first multi-core CPU, and the apparatus communicates with the first multi-core CPU through a JTAG channel.
The memory 520 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or compact disc read-only memory (CD-ROM).
The second processor 510 may be one or more central processing units (CPUs). When the second processor 510 is a single CPU, that CPU may be a single-core CPU or a multi-core CPU.
The JTAG interface 530 may be an interface defined in a JTAG-related protocol. For example, the IEEE 1149.1 standard specifies that a JTAG interface has four signals: Test Data In (TDI), Test Data Out (TDO), Test Clock (TCK) and Test Mode Select (TMS).
The JTAG interface 530 is configured to obtain, through the JTAG channel, the instruction address in the program counter (PC) of a processing unit in the first multi-core CPU, and to send that instruction address to the second processor 510 over the bus 540.
Optionally, the JTAG interface 530 is further configured to obtain, through the JTAG channel, the instruction address in the current function return address register of a processing unit in the first multi-core CPU, and to send that instruction address to the second processor 510 over the bus 540.
The memory 520 is further configured to store the instruction address table shown in Table 1, the function execution table shown in Table 3, and so on.
The second processor 510 is configured to read the program code stored in the memory 520 and perform the following operations:
detecting that the first processing unit has stopped responding;
if the number of entries recorded in the instruction address table corresponding to the first processing unit has not reached a first preset value, obtaining, through the JTAG interface, the instruction address in the current PC of the first processing unit, where the first preset value is greater than or equal to 2;
creating a first-type entry that includes the instruction address in the current PC of the first processing unit, and recording it in the instruction address table corresponding to the first processing unit;
if the number of entries recorded in the instruction address table corresponding to the second processing unit has not reached a second preset value, obtaining, through the JTAG interface, the instruction address in the current PC of the second processing unit, where the second preset value is greater than or equal to 2;
creating another first-type entry that includes the instruction address in the current PC of the second processing unit, and recording it in the instruction address table corresponding to the second processing unit;
determining whether the number of entries recorded in at least one of the instruction address table corresponding to the first processing unit and the instruction address table corresponding to the second processing unit has reached its corresponding preset value;
if the number of entries recorded in at least one of the two instruction address tables has not reached its corresponding preset value, returning to the step of obtaining, through the JTAG channel, the instruction address in the current PC of the first processing unit; and
if the number of entries recorded in at least one of the two instruction address tables has reached its corresponding preset value, triggering an interrupt on the first multi-core CPU.
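The two-core sampling loop might look like the sketch below, with one instruction address table per slave core and a per-core preset. Because the stop condition can be read either as "any table is full" or as "every table is full" (the per-core guards only skip reads under the second reading), the hypothetical flag `stop_when_any` selects between the two. `read_pc` and `trigger_interrupt` are hypothetical stand-ins for the JTAG read and the interrupt.

```python
from typing import Callable, Dict, List


def record_multi_core(
    read_pc: Callable[[str], int],          # hypothetical: reads a given slave core's PC over JTAG
    presets: Dict[str, int],                # core id -> preset value (each >= 2)
    trigger_interrupt: Callable[[], None],  # hypothetical: interrupts the multi-core CPU
    stop_when_any: bool = True,
) -> Dict[str, List[int]]:
    tables: Dict[str, List[int]] = {core: [] for core in presets}  # one table per core
    while True:
        for core, preset in presets.items():
            if len(tables[core]) < preset:       # only sample cores whose table is not yet full
                tables[core].append(read_pc(core))
        full = [len(tables[c]) >= presets[c] for c in presets]
        if any(full) if stop_when_any else all(full):
            trigger_interrupt()
            return tables
```

Sampling both cores in the same pass keeps their tables roughly aligned in time, which helps when functions on one slave core call into the other.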
Optionally, before recording the first-type entry in the instruction address table corresponding to the first processing unit, the second processor 510 is further configured to perform:
obtaining, through the JTAG interface 530, the instruction address in the current function return address register of the first processing unit; and
adding the instruction address from the current function return address register of the first processing unit to the first-type entry.
Before recording the other first-type entry in the instruction address table corresponding to the second processing unit, the second processor 510 is further configured to perform:
obtaining, through the JTAG interface 530, the instruction address in the current function return address register of the second processing unit; and
adding the instruction address from the current function return address register of the second processing unit to the other first-type entry.
Optionally, the second processor 510 detects that the first processing unit has stopped responding by performing:
receiving indication information sent by the main core of the first multi-core CPU, where the indication information carries an identifier of the first processing unit; and
determining, based on the indication information, that the first processing unit has stopped responding.
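The format of the main core's indication information is not specified. Assuming it simply carries the identifier of the stalled slave core (the `Indication` type and `split_cores` helper below are hypothetical), the control chip could use it to select the first processing unit and the remaining slave cores to sample alongside it:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Indication:
    core_id: int  # identifier of the slave core that stopped responding


def split_cores(msg: Indication, slave_cores: List[int]) -> Tuple[int, List[int]]:
    """Return (first processing unit, other slave cores to sample alongside it)."""
    if msg.core_id not in slave_cores:
        raise ValueError(f"unknown core id {msg.core_id}")
    others = [c for c in slave_cores if c != msg.core_id]
    return msg.core_id, others
```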
Further, the second processor 510 is configured to read first-type entries from the instruction address table corresponding to the first processing unit one at a time, in the order in which they were stored, and for each first-type entry read, perform the following:
querying, based on the PC instruction address included in the first-type entry, a preset mapping between instruction addresses in the first processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry; and
recording the second-type entries in the function execution table corresponding to the first processing unit, in the order in which they were generated.
Further, the second processor 510 is configured to read first-type entries from the instruction address table corresponding to the second processing unit one at a time, in the order in which they were stored, and for each first-type entry read, perform the following:
querying, based on the PC instruction address included in the first-type entry, a preset mapping between instruction addresses in the second processing unit and function names and code lines, to obtain the function name and code line corresponding to that PC instruction address;
creating a second-type entry that includes the function name and code line corresponding to the PC instruction address in the first-type entry; and
recording the second-type entries in the function execution table corresponding to the second processing unit, in the order in which they were generated.
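The text resolves each core's addresses against a mapping preset for that processing unit, which matters when slave cores run different code. A sketch with hypothetical per-core symbol maps, in which the same address resolves to different functions on different cores:

```python
import bisect
from typing import Dict, List, Tuple

SymbolMap = List[Tuple[int, str, int]]  # (start address, function name, source line), sorted


def lookup(symbols: SymbolMap, addr: int) -> Tuple[str, int]:
    """Binary-search one core's mapping for the entry covering `addr`."""
    starts = [s[0] for s in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    return ("<unknown>", 0) if i < 0 else (symbols[i][1], symbols[i][2])


def per_core_function_tables(addr_tables: Dict[str, List[int]],
                             symbol_maps: Dict[str, SymbolMap]) -> Dict[str, List[Tuple[str, int]]]:
    # Each core's addresses are resolved against that core's own mapping, in storage order.
    return {core: [lookup(symbol_maps[core], pc) for pc in pcs]
            for core, pcs in addr_tables.items()}
```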
The apparatus 500 provided in this embodiment may be integrated in the control chip 102 shown in FIG. 1, and the hardware platform including the apparatus 500 and the first multi-core CPU may be integrated in the network device 100 shown in FIG. 1. The first multi-core CPU may be the CPU 101 shown in FIG. 1, the first processing unit may be the first processing unit 1011 shown in FIG. 1, and the second processing unit may be the second processing unit 1012 shown in FIG. 1.
Optionally, the JTAG interface 530 is integrated in a monitoring chip and the second processor 510 is integrated in a main control chip. The monitoring chip communicates with the first multi-core CPU through the JTAG channel, and the main control chip communicates with the monitoring chip through the bus 540.
For example, the apparatus 500 is integrated in the control chip 102 shown in FIG. 1, the JTAG interface 530 is integrated in the monitoring chip 1022 shown in FIG. 1, the second processor 510 is integrated in the main control chip 1021 shown in FIG. 1, and the first multi-core CPU may be the CPU 101 shown in FIG. 1. The bus 540 between the second processor 510 and the JTAG interface 530 may be implemented by the bus between the main control chip 1021 and the monitoring chip 1022 shown in FIG. 1, and the JTAG channel may be implemented by the bus between the monitoring chip 1022 and the CPU 101 shown in FIG. 1. For the specific connection structure among the main control chip 1021, the monitoring chip 1022 and the CPU 101, refer to the description of FIG. 1.
The apparatus 500 provided in this embodiment can be applied to the method of the embodiment of FIG. 3 to implement the functions of the control chip in that method. For other functions that the apparatus 500 can implement and for its interaction with the first multi-core CPU, refer to the description of the control chip in the method embodiment; details are not repeated here.
With the foregoing solution, when the first processing unit stops responding, the fault record generation apparatus obtains, through the JTAG channel and over a period of time, multiple instruction addresses held in the PC of the first processing unit and records them in an instruction address table. That table reflects what the first processing unit was executing during the period after it stopped responding. The apparatus also records the instruction addresses held in the PC of the second processing unit, which resides in the same multi-core CPU as the first processing unit. An infinite loop causes the looping function and the corresponding code lines to be called repeatedly, and in a multi-core CPU, functions on different slave cores may call one another. Therefore, compared with the prior art, which triggers an interrupt immediately after detecting that the first processing unit has stopped responding and records only the instruction addresses that the first and second processing units are executing at the moment of the interrupt, the method provided in this application more accurately identifies the function and code range in which the infinite loop occurs, which helps improve the efficiency of CPU fault analysis.
The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are basically similar to the method embodiments, they are described relatively briefly; for related details, refer to the description of the method embodiments.
Obviously, those skilled in the art can make various changes and modifications to this application without departing from its scope. This application is intended to cover these changes and modifications provided that they fall within the scope of the claims of this application and their equivalent technologies.
Claims (25)
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201510992820.8A CN106919462B (en) | 2015-12-25 | 2015-12-25 | A method and apparatus for generating a processor fault record |
| PCT/CN2016/098537 WO2017107576A1 (en) | 2015-12-25 | 2016-09-09 | Method and device for generating fault record of processor |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN106919462A true CN106919462A (en) | 2017-07-04 |
| CN106919462B CN106919462B (en) | 2020-04-21 |
Family ID: 59088920
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201510992820.8A Active CN106919462B (en) | 2015-12-25 | 2015-12-25 | A method and apparatus for generating a processor fault record |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN106919462B (en) |
| WO (1) | WO2017107576A1 (en) |
Cited By (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108873010A (en) * | 2018-06-20 | 2018-11-23 | 北京亿信华辰软件有限责任公司 | A kind of silo stock amount detection terminal |
| CN109491856A (en) * | 2017-09-12 | 2019-03-19 | 中兴通讯股份有限公司 | Monitoring bus system, method and device |
| CN112232027A (en) * | 2020-10-19 | 2021-01-15 | 腾讯科技(深圳)有限公司 | A symbol translation method, apparatus, device and computer-readable storage medium |
| CN113832663A (en) * | 2021-09-18 | 2021-12-24 | 珠海格力电器股份有限公司 | Control chip fault recording method and device and control chip fault reading method |
| CN114020645A (en) * | 2021-12-03 | 2022-02-08 | 海宁奕斯伟集成电路设计有限公司 | Test method, device, equipment, readable storage medium and computer program product |
| CN119473695A (en) * | 2025-01-16 | 2025-02-18 | 芯云晟(杭州)电子科技有限公司 | Methods for locating faults in complex RiscV core operation situations |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN112084050B (en) * | 2019-06-14 | 2024-12-24 | 北京北方华创微电子装备有限公司 | Information recording method and system |
| CN113220334B (en) * | 2021-05-25 | 2024-04-16 | 百富计算机技术(深圳)有限公司 | Program fault location method, terminal device and computer readable storage medium |
| CN114416408A (en) * | 2021-12-13 | 2022-04-29 | 飞腾信息技术有限公司 | Interrupt processing method and device |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1484159A (en) * | 2002-09-19 | 2004-03-24 | 华为技术有限公司 | Method for Centralized Control Processing Using CPU on System Board |
| CN101131657A (en) * | 2006-08-25 | 2008-02-27 | 华为技术有限公司 | System and method for assisting CPU to drive chip |
| US20090307436A1 (en) * | 2008-06-06 | 2009-12-10 | International Business Machines Corporation | Hypervisor Page Fault Processing in a Shared Memory Partition Data Processing System |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20060095624A1 (en) * | 2004-11-03 | 2006-05-04 | Ashok Raj | Retargeting device interrupt destinations |
| US7594144B2 (en) * | 2006-08-14 | 2009-09-22 | International Business Machines Corporation | Handling fatal computer hardware errors |
| CN101149636B (en) * | 2007-10-23 | 2010-07-07 | 华为技术有限公司 | Repositioning system and method |
| CN101556551B (en) * | 2009-04-15 | 2011-12-21 | 杭州华三通信技术有限公司 | Hardware acquisition system and method for equipment failure log |
| CN102214137B (en) * | 2010-04-06 | 2014-01-22 | 华为技术有限公司 | Debugging method and debugging equipment |
| CN102662889B (en) * | 2012-04-24 | 2016-12-14 | 华为技术有限公司 | Interruption processing method, interrupt control unit and processor |
- 2015-12-25: CN201510992820.8A, patent CN106919462B (en), Active
- 2016-09-09: PCT/CN2016/098537, patent WO2017107576A1 (en), Not active (Ceased)
Cited By (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109491856A (en) * | 2017-09-12 | 2019-03-19 | 中兴通讯股份有限公司 | Monitoring bus system, method and device |
| WO2019052275A1 (en) * | 2017-09-12 | 2019-03-21 | 中兴通讯股份有限公司 | Bus monitoring system, method and apparatus |
| US11093361B2 (en) | 2017-09-12 | 2021-08-17 | Zte Corporation | Bus monitoring system, method and apparatus |
| CN108873010A (en) * | 2018-06-20 | 2018-11-23 | 北京亿信华辰软件有限责任公司 | A kind of silo stock amount detection terminal |
| CN112232027A (en) * | 2020-10-19 | 2021-01-15 | 腾讯科技(深圳)有限公司 | A symbol translation method, apparatus, device and computer-readable storage medium |
| CN112232027B (en) * | 2020-10-19 | 2024-12-03 | 腾讯科技(深圳)有限公司 | A symbol translation method, device, equipment and computer-readable storage medium |
| CN113832663A (en) * | 2021-09-18 | 2021-12-24 | 珠海格力电器股份有限公司 | Control chip fault recording method and device and control chip fault reading method |
| CN113832663B (en) * | 2021-09-18 | 2022-08-16 | 珠海格力电器股份有限公司 | Control chip fault recording method and device and control chip fault reading method |
| CN114020645A (en) * | 2021-12-03 | 2022-02-08 | 海宁奕斯伟集成电路设计有限公司 | Test method, device, equipment, readable storage medium and computer program product |
| CN119473695A (en) * | 2025-01-16 | 2025-02-18 | 芯云晟(杭州)电子科技有限公司 | Methods for locating faults in complex RiscV core operation situations |
| CN119473695B (en) * | 2025-01-16 | 2025-04-22 | 芯云晟(杭州)电子科技有限公司 | Method for locating RiscV core faults under complex operation condition |
Also Published As
| Publication number | Publication date |
|---|---|
| CN106919462B (en) | 2020-04-21 |
| WO2017107576A1 (en) | 2017-06-29 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN106919462A (en) | A kind of method and device for generating processor fault record | |
| CN112131118B (en) | Mock testing method and device, electronic equipment and computer readable storage medium | |
| US9569325B2 (en) | Method and system for automated test and result comparison | |
| CN103109276B (en) | System detection method | |
| WO2021248754A1 (en) | System testing method and apparatus, and storage medium and electronic device | |
| EP3167371B1 (en) | A method for diagnosing power supply failure in a wireless communication device | |
| EP3591485B1 (en) | Method and device for monitoring for equipment failure | |
| CN104683180A (en) | Performance monitoring method and system as well as application server | |
| CN106155883A (en) | A virtual machine reliability testing method and device | |
| CN114153783B (en) | Method, system, computer device and storage medium for implementing multi-core communication mechanism | |
| WO2021056913A1 (en) | Fault locating method, apparatus and system based on i2c communication | |
| CN111966599A (en) | Virtualization platform reliability testing method, system, terminal and storage medium | |
| Schmidt et al. | Checkpoint/restart and beyond: Resilient high performance computing with FPGAs | |
| JP2006164185A (en) | Debug device | |
| CN112068980B (en) | Method and device for sampling information before CPU suspension, equipment and storage medium | |
| US9195524B1 (en) | Hardware support for performance analysis | |
| CN114138600A (en) | Storage method, device, equipment and storage medium for firmware key information | |
| US20180349253A1 (en) | Error handling for device programmers and processors | |
| CN106656684B (en) | Grid resource reliability monitoring method and device | |
| CN115756935A (en) | Abnormal fault positioning method, device and equipment of embedded software system | |
| WO2023185266A1 (en) | Automatic detection method, single board, electronic device and storage medium | |
| CN114238067B (en) | A method for fast locating abnormal processes based on program performance counting | |
| US8205117B2 (en) | Migratory hardware diagnostic testing | |
| CN112925700A (en) | Program debugging method, device and system and embedded equipment | |
| US20260064565A1 (en) | Mechanisms for assessing service resilience through fault injections |
Legal Events
| Code | Title |
|---|---|
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |