WO2017016255A1 - Method and apparatus for parallel processing of multi-issue micro engine instructions, and storage medium - Google Patents

Method and apparatus for parallel processing of multi-issue micro engine instructions, and storage medium

Info

Publication number
WO2017016255A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction
instructions
class
thread
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/080579
Other languages
English (en)
Chinese (zh)
Inventor
周峰
安康
王志忠
刘衡祁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sanechips Technology Co Ltd
Original Assignee
Sanechips Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sanechips Technology Co Ltd
Publication of WO2017016255A1

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • The present invention relates to network processor technology, and in particular to a multi-transmission (multi-issue) instruction parallel processing method and device for a network processor micro engine (ME), and to a storage medium.
  • the core routers in the backbone of the Internet have undergone one technological change.
  • Network processors have become an irreplaceable part of the routing and forwarding engine thanks to their outstanding packet processing performance and programmability.
  • The ME is the core component of the network processor and is responsible for parsing and processing messages according to microcode instructions. The processing performance of the micro engine is therefore an important parameter that determines the overall performance of the network processor.
  • The traditional single-issue instruction pipeline can process only one instruction at a time and completes only one type of operation (logic calculation, jump, or data movement), which leaves many other execution units idle.
  • The kernel's resources are not fully utilized, i.e., the micro engine's performance is not maximized.
  • The existing multi-issue instruction pipeline mainly uses very long instruction word (VLIW) technology.
  • Users are expected to use as many different executable units as possible in each very long instruction, according to their requirements, to improve instruction parallelism.
  • This kind of scheme relies mainly on the pre-compilation stage, and the user must arrange the parallel use of the execution units, which increases programming complexity and therefore labor cost.
  • In addition, storing very long instructions requires a larger instruction memory, which increases the cost of the chip.
  • an embodiment of the present invention provides a multi-transmission instruction parallel processing method and device, and a storage medium of a micro engine.
  • the instructions are parsed by the parallel decoding unit to obtain an instruction type of each instruction and an address of the source operand;
  • the determining and marking the correlation between the instructions includes:
  • the determining, according to the marking, whether to transmit the instructions in parallel comprises:
  • the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
  • the multi-port kernel register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  • the instruction types are mainly classified into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each large class includes a plurality of small classes; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;
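  • the assigning the corresponding executable unit to the instruction according to the instruction type of the instruction comprises: when the large classes of the two instructions of one thread are inconsistent, allocating each instruction to its corresponding executable unit; when the two instructions of one thread are in the same large class but in different small classes:
  • when the instruction type is a logical computing class instruction, the respective logical computing class execution units are allocated in the thread;
  • when the instruction type is an upload/download class instruction, the respective data upload/download class execution units are allocated in the thread;
  • when one of the instructions is a jump class instruction, the respective executable units are allocated according to the constraints.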
  • a compiling unit configured to determine and mark the correlation between the instructions, and determine whether to transmit the instructions in parallel according to the marking
  • a parallel decoding unit configured to parse the instructions in parallel when the instructions are transmitted in parallel, to obtain the instruction type of each instruction and the address of the source operand;
  • a read unit configured to obtain a source operand in the multi-port kernel register according to an address of a source operand of the instruction
  • An instruction allocating unit configured to allocate, according to an instruction type of the instruction, a corresponding executable unit to process the source operand
  • a write unit configured to store processing results in a multiport core register.
  • the compiling unit is further configured to: determine whether the destination registers of two consecutive instructions are in the same area; when the destination registers are not in the same area, determine whether there is a data hazard between the two instructions; when there is no data hazard, determine whether the instruction types of the two instructions are different; when the instruction types are different, determine whether the previous instruction is a jump instruction; and when the previous instruction is not a jump instruction, determine that the two instructions are irrelevant and set an irrelevant flag on the latter instruction.
  • the compiling unit is further configured so that, when the latter instruction carries an irrelevant flag, one thread transmits the two consecutive instructions simultaneously.
  • the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
  • the multi-port kernel register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  • the instruction types are mainly classified into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each large class includes a plurality of small classes; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;
  • the instruction allocating unit is further configured to allocate each instruction to its corresponding executable unit when the large classes of the two instructions of one thread are inconsistent; when the two instructions of one thread are in the same large class but in different small classes, the allocation proceeds as follows:
  • when the instruction type is a logical computing class instruction, the respective logical computing class execution units are allocated in the thread; when the instruction type is an upload/download class instruction, the respective data upload/download class execution units are allocated in the thread;
  • when one of the instructions is a jump class instruction, the respective executable units are allocated according to the constraints.
  • the storage medium provided by the embodiment of the present invention stores a computer program for executing the multi-transmission instruction parallel processing method of the micro engine.
  • the compiling unit completes the judgment and marking of the correlation between the instructions, thereby reducing the complexity of microcode programming; whether to transmit the instructions in parallel is determined according to the marking; when the instructions are transmitted in parallel, the parallel decoding unit parses the instructions to obtain the instruction type of each instruction and the address of the source operand, which implements parallel parsing of the multi-transmission instructions; the source operand is then obtained from the multi-port kernel register according to the address of the source operand of the instruction; the source operand is processed by assigning a corresponding executable unit to the instruction according to the instruction type of the instruction; and the processing result is stored in the multi-port kernel register.
  • the unique multi-port kernel register structure supports parallel processing of multiple instructions well, and the corresponding executable units can also process the source operands in parallel, which greatly improves the performance of the micro engine.
  • FIG. 1 is a schematic flowchart diagram of a multi-transmission instruction parallel processing method of a micro engine according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of parallel processing of multiple transmit instructions according to an embodiment of the present invention
  • FIG. 3 is a flowchart of determining and marking the correlation between instructions according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of the pipeline reading source operands and writing back to destination registers according to an embodiment of the present invention.
  • FIG. 5 is a structural diagram of a multi-port kernel register according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of the pipeline processing instructions in parallel according to an embodiment of the present invention.
  • FIG. 7 is a structural diagram of a parallel decoding unit and an instruction allocation unit according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a multi-transmission instruction parallel processing apparatus of a microengine according to an embodiment of the present invention.
  • In the multi-transmission instruction parallel processing method and apparatus for a micro engine, the compiling unit completes the judgment and marking of inter-instruction correlation; a unique multi-port kernel register structure is designed; and a parallel decoding unit and executable units complete the parallel processing of the multi-transmission instructions.
  • FIG. 1 is a schematic flowchart of a multi-transmission instruction parallel processing method of a micro-engine according to an embodiment of the present invention. As shown in FIG. 1 , the multi-transmission instruction parallel processing method of the micro-engine includes the following steps:
  • Step 101 Determine and mark the correlation between the instructions, and determine whether to transmit the instructions in parallel according to the mark.
  • the correlation between the instructions includes:
  • Embodiments of the present invention support simultaneous scheduling of two thread executions, namely, thread A and thread B.
  • During compilation, the compiling unit judges the correlation between two consecutive instructions and sets the irrelevant flag of the latter instruction to valid when the two instructions are irrelevant.
  • Each thread decides whether to transmit one instruction or two instructions at the same time according to the irrelevant flag.
  • In this way, instruction parallelism and execution unit efficiency are maximized and the performance loss caused by idle units is reduced, which improves the overall performance of the ME.
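
The flag-driven decision described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Instr` type, its `irrelevant` field, and the `issue_groups` function are hypothetical names introduced here, and the sketch assumes a flagged instruction that ends up first in a group is simply transmitted on its own or paired with its successor.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    name: str
    # Hypothetical field: the irrelevant flag the compiling unit places on
    # the latter instruction of an independent pair.
    irrelevant: bool = False


def issue_groups(stream: List[Instr]) -> List[List[str]]:
    """Split one thread's instruction stream into groups of one or two."""
    groups: List[List[str]] = []
    i = 0
    while i < len(stream):
        # Two instructions go out together only when the second one
        # carries the irrelevant flag.
        if i + 1 < len(stream) and stream[i + 1].irrelevant:
            groups.append([stream[i].name, stream[i + 1].name])
            i += 2
        else:
            groups.append([stream[i].name])
            i += 1
    return groups
```

With two threads scheduled at once, each contributing at most two instructions, this gives the upper bound of four instructions in flight that the description mentions.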
  • the determining and marking the correlation between the instructions includes:
  • the determining, according to the marking, whether to transmit the instructions in parallel comprises:
  • Step 102 When the instructions are transmitted in parallel, the instructions are parsed by the parallel decoding unit to obtain the instruction type of each instruction and the address of the source operand.
  • the instruction enters the pipeline decoding stage and performs instruction parsing 201.
  • the embodiment of the present invention provides four parallel decoding units.
  • the decoding unit decodes the instruction and parses out the instruction type.
  • the instruction type includes:
  • the instructions are divided into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions.
  • Each large class includes multiple small classes; for example, logical computing class instructions include addition operations, subtraction operations, logical OR operations, etc. Each small class has its own separate instruction code.
  • the instruction type described in the embodiments of the present invention mainly refers to the small class (instruction subclass) of each instruction.
  • the parallel decoding unit also parses the address of the source operand required by the instruction in the multiport core register.
  • Step 103 Obtain a source operand in a multi-port kernel register according to an address of a source operand of the instruction.
  • the multi-port kernel register is accessed to obtain the source operand 202.
  • the multi-port kernel register of the embodiment of the present invention provides eight data read ports and four data write ports, and can simultaneously support four instruction accesses, each of which can access two source operands and one destination operand.
  • the multi-port kernel register is divided into two groups of registers according to threads, and each group of registers includes four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units.
  • Step 104 Process the source operand by assigning a corresponding executable unit to the instruction according to the instruction type of the instruction.
  • the instruction allocation unit starts the allocation of the executable unit according to the instruction type to maximize the processing performance 203.
  • the executable unit includes: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit.
  • the three types of execution units described in the embodiments of the present invention respectively perform the execution functions of the three types of instructions.
  • Embodiments of the present invention provide two sets of logical computing class execution units, two sets of data uploading/downloading class execution units, and two sets of jump class execution units.
  • the pipeline of the embodiment of the present invention executes at most four instructions at the same time; the instruction allocation unit allocates the instructions to the respective executable units according to their instruction types and ensures that instructions of the same type are allocated to different groups of executable units, so that no resource conflict arises that would trigger a structural hazard.
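  • the instruction types in the embodiments of the present invention are classified into logical computing class instructions, data upload/download class instructions, and jump class instructions; when the two instructions of one thread are in the same large class but in different small classes:
  • when the instruction type is a logical computing class instruction, the respective logical computing class execution units are allocated in the thread;
  • when the instruction type is an upload/download class instruction, the respective data upload/download class execution units are allocated in the thread;
  • when one of the instructions is a jump class instruction, the respective executable units are allocated according to the constraints.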
  • Step 105 Store the processing result in a multi-port kernel register.
  • the instructions are allocated to the respective executable units and the execution is completed.
  • After execution, the processing result needs to be written back to the specified destination register; for a jump class instruction, the instruction memory is re-addressed instead 204.
  • the kernel register of the embodiment of the invention provides four data write ports and supports up to four instructions completing data write-back at once. After the operation result is written back, the processing of the instruction is complete.
  • FIG. 3 is a flowchart of determining and marking the correlation between instructions according to an embodiment of the present invention, where the process includes the following steps:
  • Step 301 Determine whether the destination registers of the two instructions before and after are in the same area.
  • the same area is mainly:
  • The multi-port kernel register can provide 32 registers for each thread, numbered in order from register 0 to register 31, and each register space is 4 bytes. Register 0 to register 15 are divided into one area, and register 16 to register 31 are divided into another area.
  • If the destination registers of the two instructions are in the same area, the two instructions are determined to be related and, as shown in FIG. 3, the compiling unit does not set the irrelevant flag because the condition is not met. If the destination registers of the two instructions are not in the same area, the determination of step 302 follows.
  • Step 302 Determine whether there is a data hazard between the two consecutive instructions.
  • the data hazard is mainly: whether a source operand register of the latter instruction is the destination register of the previous instruction.
  • If there is a data hazard between the two instructions, the two instructions are determined to be related and, as shown in FIG. 3, the compiling unit does not set the irrelevant flag. If there is no data hazard between the two instructions, the determination of step 303 follows.
  • Step 303 Determine whether the instruction types of the two consecutive instructions are different, so that they do not use the same executable unit.
  • The instruction type judged here is the instruction subclass, except for jump class instructions. If the two instruction subclasses are the same, the two instructions are determined to be related. For jump class instructions, only the instruction large class is judged: if both instructions are jump class instructions, they are determined to be related. In either case, as shown in FIG. 3, the compiling unit does not set the irrelevant flag because the condition is not met. If the instruction types of the two instructions are different, the determination of step 304 follows.
  • Step 304 Determine whether the previous instruction is a jump instruction.
  • Step 305 If the previous instruction is a jump instruction, the two instructions are determined to be related, and the compiling unit does not set the irrelevant flag because the condition is not met.
  • Step 306 If the previous instruction is not a jump instruction, the two instructions are determined to be irrelevant, and an irrelevant flag is set on the latter instruction.
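
Steps 301 to 306 amount to a short chain of checks on a pair of consecutive instructions. The sketch below is one possible reading of that chain, not the compiling unit's actual code. The `Instr` fields, the class labels `"logic"`, `"updown"` and `"jump"`, and the function names are all assumptions made for illustration; the sketch also assumes that an instruction with no destination register (`dst=None`) skips the area and hazard checks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    major: str                   # large class: "logic", "updown", or "jump"
    minor: str                   # small class, e.g. "add" or "load"
    dst: Optional[int]           # destination register 0..31, None if absent
    srcs: Tuple[int, ...] = ()   # source operand registers


def area(reg: int) -> int:
    """Registers 0..15 form area 0; registers 16..31 form area 1."""
    if not 0 <= reg <= 31:
        raise ValueError(f"register out of range: {reg}")
    return 0 if reg < 16 else 1


def is_irrelevant(prev: Instr, nxt: Instr) -> bool:
    """Return True when the irrelevant flag may be set on `nxt`."""
    # Step 301: destination registers must lie in different areas.
    if prev.dst is not None and nxt.dst is not None:
        if area(prev.dst) == area(nxt.dst):
            return False
    # Step 302: data hazard if nxt reads what prev writes.
    if prev.dst is not None and prev.dst in nxt.srcs:
        return False
    # Step 303: compare small classes, or only large classes for jumps.
    if "jump" in (prev.major, nxt.major):
        same_type = prev.major == nxt.major
    else:
        same_type = (prev.major, prev.minor) == (nxt.major, nxt.minor)
    if same_type:
        return False
    # Steps 304 to 306: a preceding jump always makes the pair related.
    return prev.major != "jump"
```

For example, an `add` writing register 3 followed by a `load` writing register 20 passes every check, whereas the same pair fails step 302 if the `load` reads register 3.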
  • FIG. 4 is a flowchart of a pipeline read source operand and a write-back destination register according to an embodiment of the present invention, and the flow includes the following steps:
  • Step 401 According to the thread allocation, the multi-port kernel registers are divided into two groups.
  • the multi-port kernel register module of the embodiment of the present invention is divided into two groups, one group of registers for thread A and one for thread B, and each group of registers provides four register units.
  • the four register units of thread A are: one set of registers 0 to 15 as register unit 0 and another set of registers 0 to 15 as register unit 2; one set of registers 16 to 31 as register unit 1 and another set of registers 16 to 31 as register unit 3.
  • the four register unit division rules of thread B are the same as thread A, which are register units 4, 5, 6, and 7, respectively.
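
The unit numbering just described can be written as a small mapping. This is only a sketch of the numbering rule: the function name and the `copy` parameter (which of the two sets holding the same register range is meant) are names introduced here and do not appear in the source.

```python
def register_unit(thread: str, reg: int, copy: int) -> int:
    """Map (thread, register number, set index) to a register unit 0..7."""
    if thread not in ("A", "B"):
        raise ValueError("thread must be 'A' or 'B'")
    if not 0 <= reg <= 31:
        raise ValueError("register must be in 0..31")
    if copy not in (0, 1):
        raise ValueError("copy must be 0 or 1")
    base = 0 if thread == "A" else 4   # thread A: units 0..3, thread B: units 4..7
    area = 0 if reg < 16 else 1        # registers 0..15 vs registers 16..31
    return base + 2 * copy + area
```

Under this rule, registers 0 to 15 of thread A land in units 0 and 2 and registers 16 to 31 in units 1 and 3, matching the description; thread B follows the same pattern offset by four.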
  • Step 402 Within the group, the source operand is read according to constraints and instructions.
  • the two source operands of an instruction use the operands in two regions as much as possible, that is, one in register 0 to register 15, and the other in register 16 to register 31.
  • read port 0 and read port 1 are supplied to instruction 0 to complete the reading of its source operands; likewise, read port 2 and read port 3 are provided to instruction 1, read port 4 and read port 5 to instruction 2, and read port 6 and read port 7 to instruction 3. In this way one instruction can access all 32 registers and obtain two different operands, the kernel register read ports are fully utilized, and up to four instructions can access the multi-port kernel register simultaneously.
  • Step 403 In the group, the operation result is written back to the destination register according to the constraint and the instruction.
  • the destination registers of the two instructions of a thread also use the registers in the two regions.
  • write port 0 is supplied to instruction 0 to write the operation result back to the destination register; likewise, write port 1 is supplied to instruction 1, write port 2 to instruction 2, and write port 3 to instruction 3. In this way the kernel register write ports are fully utilized, and up to four instructions can access the kernel register at the same time.
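
The fixed port assignment described in steps 402 and 403 reduces to simple arithmetic on the instruction slot number. The following sketch assumes slots are numbered 0 to 3 as in the text; `ports_for` is a hypothetical helper name.

```python
from typing import Dict, Tuple, Union


def ports_for(slot: int) -> Dict[str, Union[Tuple[int, int], int]]:
    """Return the read-port pair and the write port for instruction `slot`."""
    if not 0 <= slot <= 3:
        raise ValueError("at most four instructions access the registers at once")
    # Two read ports per instruction (eight in total), one write port each (four in total).
    return {"read": (2 * slot, 2 * slot + 1), "write": slot}
```

Four slots times two source operands accounts for all eight read ports, and four slots times one destination operand accounts for all four write ports.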
  • FIG. 6 is a flowchart of a pipeline parallel processing instruction according to an embodiment of the present invention, where the process includes the following steps:
  • Step 601 Decode the instructions in parallel and parse out the instruction types.
  • four decoding units analyze and decode four instructions and parse out the respective instruction types.
  • the three large classes of instruction types described in the embodiments are: the logical computing class instruction, the data upload/download class instruction, and the jump class instruction.
  • Step 602 Group the executable units according to the type of the instruction.
  • the embodiment of the present invention provides two sets of logical computing class execution units, two sets of data upload/download class execution units, and two sets of jump class execution units, so that threads A and B are each provided with one logical computing class execution unit, one data upload/download class execution unit, and one jump class execution unit.
  • the classification rules here mainly cover the case where the two instructions of one thread are in the same large class but in different small classes. If the large classes of the two instructions are inconsistent, each instruction only needs to be assigned to its corresponding executable unit, and there is no conflict.
  • the first case: the instruction type is a logical computing class instruction and the small classes are inconsistent; the respective calculation units are allocated in the thread.
  • the second case: the instruction type is an upload/download class instruction and the small classes are inconsistent; the respective data upload/download units are allocated in the thread.
  • the third case: one of the instructions is a jump class instruction, and the respective execution units are allocated according to the constraints.
  • the compiling unit constraint is: if the previous instruction is a jump instruction, the thread transmits only one instruction, so only the jump execution unit is assigned to this instruction. If the latter instruction is a jump instruction, the execution units are assigned according to type.
  • Step 603 The instruction allocating unit completes the effective allocation of the executable unit.
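
The allocation rules of steps 601 to 603 can be sketched as below. The source does not spell out how two instructions of the same large class share a thread's execution unit, so this sketch assumes that a class execution unit can serve two different small classes at once and models each target as a `(thread, large class, small class)` triple. The function name, the class labels, and that triple are all hypothetical.

```python
from typing import List, Tuple

LARGE_CLASSES = ("logic", "updown", "jump")


def allocate(thread: str, instrs: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Assign one or two (large class, small class) instructions of a thread."""
    if thread not in ("A", "B"):
        raise ValueError("thread must be 'A' or 'B'")
    if not 1 <= len(instrs) <= 2:
        raise ValueError("a thread transmits one or two instructions")
    for large, _ in instrs:
        if large not in LARGE_CLASSES:
            raise ValueError(f"unknown large class: {large}")
    if len(instrs) == 2:
        first, second = instrs
        # Compiling-unit constraint: a jump in the first position is transmitted alone.
        if first[0] == "jump":
            raise ValueError("a leading jump must be transmitted alone")
        # Same small class would contend for the same resource (structural hazard).
        if first == second:
            raise ValueError("identical small classes cannot be paired")
    # Each instruction goes to its own thread's unit for its large class.
    return [(thread, large, small) for large, small in instrs]
```

The two error cases correspond to pairs that the correlation check would never flag as irrelevant, so in the described design they should not reach the instruction allocation unit at all; the sketch rejects them only to make the constraint visible.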
  • FIG. 8 is a schematic structural diagram of a multi-transmission instruction parallel processing apparatus of a micro-engine according to an embodiment of the present invention. As shown in FIG. 8, the multi-transmission instruction parallel processing apparatus of the micro-engine includes:
  • the compiling unit 81 is configured to determine and mark the correlation between the instructions, and determine, according to the flag, whether to transmit the instructions in parallel;
  • the parallel decoding unit 82 is configured to parse the instructions in parallel when the instructions are transmitted in parallel, to obtain an instruction type of each instruction and an address of the source operand;
  • a reading unit 83 configured to obtain a source operand in the multi-port kernel register according to an address of the source operand of the instruction
  • the instruction allocating unit 84 is configured to allocate, according to the instruction type of the instruction, the corresponding executable unit to process the source operand;
  • Write unit 85 is configured to store the processing results in a multi-port core register.
  • the compiling unit 81 is further configured to: determine whether the destination registers of two consecutive instructions are in the same area; when the destination registers are not in the same area, determine whether there is a data hazard between the two instructions; when there is no data hazard, determine whether the instruction types of the two instructions are different; when the instruction types are different, determine whether the previous instruction is a jump instruction; and when the previous instruction is not a jump instruction, determine that the two instructions are irrelevant and set an irrelevant flag on the latter instruction.
  • the compiling unit 81 is further configured to: when the latter instruction is provided with an irrelevant flag, one thread transmits two instructions before and after in parallel.
  • the multi-port kernel register is divided into two groups of registers according to threads, each group of registers including four register units; the two source operands of one instruction are in two different register units; the destination operands of the two instructions of one thread are in two different register units;
  • the multi-port core register has eight data read ports and four data write ports, and supports four instruction accesses, each of which accesses two source operands and one destination operand.
  • the instruction types are divided into three large classes: logical computing class instructions, data upload/download class instructions, and jump class instructions; each large class includes a plurality of instruction subclasses; each thread corresponds to a set of executable units, including: a logical computing class execution unit, a data upload/download class execution unit, and a jump class execution unit;
  • the instruction allocating unit 84 is further configured to allocate each instruction to its corresponding executable unit when the large classes of the two instructions of one thread are inconsistent; when the two instructions of one thread are in the same large class but in different small classes, they are processed according to the following three situations: when the instruction type is a logical computing class instruction, the respective logical computing class execution units are allocated in the thread; when the instruction type is an upload/download class instruction, the respective data upload/download class execution units are allocated in the thread; when one of the instructions is a jump class instruction, the respective executable units are allocated according to the constraints.
  • In the multi-transmission instruction parallel processing method and device of the micro engine, the compiling unit completes the judgment and marking of inter-instruction correlation; a unique kernel register structure supporting multi-port access is designed; and the decoding unit and the instruction allocation unit perform the parallel processing of the multi-transmission instructions.
  • The embodiment of the present invention first reduces the programming complexity for microcode developers by having the compiling unit judge and mark the correlation between instructions; in addition, the unique multi-port kernel register structure supports parallel processing of multiple instructions well; finally, the parallel decoding unit and the instruction allocation unit implement parallel processing of multiple transmitted instructions. The scheme is relatively simple to implement and greatly improves the performance of the micro engine.
  • Each of the above units may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA) in an electronic device.
  • the multi-transmission instruction parallel processing apparatus of the micro engine may also be stored in a computer readable storage medium if it is implemented in the form of a software function module and sold or used as a separate product.
  • the technical solution of the embodiments of the present invention may, in essence, be embodied in the form of a software product stored in a storage medium and including a plurality of instructions that cause a computer device (which may be a personal computer, server, or network device, etc.) to perform all or part of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read only memory (ROM), a magnetic disk, or an optical disk.
  • The embodiment of the present invention further provides a storage medium storing a computer program for executing the multi-transmission instruction parallel processing method of the microengine of the embodiment of the present invention.
  • Embodiments of the present invention can be provided as a method, a system, or a computer program product. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or a combination of software and hardware. Moreover, the invention can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) containing computer-usable program code.
  • The computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • Judging and marking the correlation between instructions reduces the programming complexity for microcode developers; whether instructions are transmitted in parallel is determined according to the marks; when instructions are transmitted in parallel, they are parsed to obtain the instruction type and source operand address of each instruction, and a corresponding executable unit is allocated to each instruction to process the source operand.
  • The multi-port kernel register structure readily supports parallel processing of multiple instructions, and the corresponding executable units can also process the source operands in parallel, which greatly improves the performance of the microengine.
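The compile-time judging and marking step summarized above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the `Instr` record, the `parallel` flag, the issue width of two, and the use of read-after-write, write-after-write and write-after-read checks as the "correlation" test are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    op: str                    # hypothetical mnemonic
    dst: Optional[str]         # destination register, or None if nothing is written
    srcs: Tuple[str, ...]      # source registers
    parallel: bool = False     # mark set at compile time: may issue with the previous instruction


def depends_on(later: Instr, earlier: Instr) -> bool:
    """Assumed correlation test: True on a RAW, WAW or WAR hazard."""
    raw = earlier.dst is not None and earlier.dst in later.srcs
    waw = earlier.dst is not None and earlier.dst == later.dst
    war = later.dst is not None and later.dst in earlier.srcs
    return raw or waw or war


def mark(program: List[Instr], width: int = 2) -> List[Instr]:
    """Mark each instruction that can join the issue group opened before it."""
    group: List[Instr] = []
    for ins in program:
        fits = bool(group) and len(group) < width
        if fits and not any(depends_on(ins, g) for g in group):
            ins.parallel = True
            group.append(ins)
        else:
            ins.parallel = False   # starts a new issue group
            group = [ins]
    return program


def issue_groups(program: List[Instr]) -> List[List[Instr]]:
    """Rebuild the issue groups from the marks alone, as the hardware would."""
    groups: List[List[Instr]] = []
    for ins in program:
        if ins.parallel and groups:
            groups[-1].append(ins)
        else:
            groups.append([ins])
    return groups
```

In this sketch the hardware side never re-checks dependencies: `issue_groups` reads only the marks, which mirrors the idea that the compiling unit, not the microcode programmer, carries the burden of deciding what may be transmitted together.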

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

Disclosed are a method and an apparatus for parallel processing of multi-transmission instructions of a microengine, and a storage medium. The method comprises: judging and marking the correlation between instructions, and determining, according to the marks, whether to process the transmitted instructions in parallel; when the transmitted instructions are processed in parallel, parsing the instructions by means of a parallel decoding unit to obtain the instruction type and the source operand address of each instruction; acquiring the source operand from a multi-port kernel register according to the source operand address of each instruction; allocating a corresponding executable unit to each instruction, according to its instruction type, to process the source operand; and storing the processing result in the multi-port kernel register.
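The decode, operand fetch, dispatch and write-back steps listed in the abstract can be sketched in Python as follows. This is a rough model under stated assumptions, not the patented design: the tuple instruction encoding, the `EXEC_UNITS` table, the `MultiPortRegisterFile` class and its port counts (four read ports, two write ports) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical encoding: (instruction type, destination, source address A, source address B)
Instr = Tuple[str, str, str, str]

# Hypothetical executable units, selected by instruction type.
EXEC_UNITS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "and": lambda a, b: a & b,
}


class MultiPortRegisterFile:
    """Register file model that serves several reads and writes per cycle."""

    def __init__(self, values: Dict[str, int], read_ports: int = 4, write_ports: int = 2):
        self.regs = dict(values)
        self.read_ports = read_ports      # assumed port count
        self.write_ports = write_ports    # assumed port count

    def read_many(self, addrs: List[str]) -> List[int]:
        if len(addrs) > self.read_ports:
            raise ValueError("not enough read ports for this issue group")
        return [self.regs[a] for a in addrs]

    def write_many(self, updates: List[Tuple[str, int]]) -> None:
        if len(updates) > self.write_ports:
            raise ValueError("not enough write ports for this issue group")
        for dst, value in updates:
            self.regs[dst] = value


def decode(ins: Instr) -> Tuple[str, str, Tuple[str, str]]:
    """Parse one instruction into its type, destination and source addresses."""
    kind, dst, src_a, src_b = ins
    return kind, dst, (src_a, src_b)


def issue_parallel(group: List[Instr], rf: MultiPortRegisterFile) -> None:
    """Process one group of instructions already marked as independent."""
    decoded = [decode(ins) for ins in group]                  # parallel decode
    addrs = [a for _, _, srcs in decoded for a in srcs]
    operands = rf.read_many(addrs)                            # all reads happen before any write
    results = []
    for k, (kind, dst, _) in enumerate(decoded):
        a, b = operands[2 * k], operands[2 * k + 1]
        results.append((dst, EXEC_UNITS[kind](a, b)))         # dispatch by instruction type
    rf.write_many(results)                                    # write back through the write ports
```

Reading every source operand before writing any result is what makes the group behave as if its instructions executed in the same cycle; the port checks show why the register file needs multi-port access once more than one instruction is issued at a time.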
PCT/CN2016/080579 2015-07-29 2016-04-28 Method and apparatus for parallel processing of multi-transmission instructions of a microengine, and storage medium Ceased WO2017016255A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510456059.6 2015-07-29
CN201510456059.6A CN106406820B (zh) 2015-07-29 2015-07-29 Multi-issue instruction parallel processing method and device for a network processor micro-engine

Publications (1)

Publication Number Publication Date
WO2017016255A1 (fr) 2017-02-02

Family

ID=57884049

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/080579 Ceased WO2017016255A1 (fr) 2015-07-29 2016-04-28 Method and apparatus for parallel processing of multi-transmission instructions of a microengine, and storage medium

Country Status (2)

Country Link
CN (1) CN106406820B (fr)
WO (1) WO2017016255A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
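CN113703841A (zh) * 2021-09-10 2021-11-26 中国人民解放军国防科技大学 Optimization method, device, and medium for register data reading
CN114217856A (zh) * 2021-12-17 2022-03-22 中国人民解放军国防科技大学 CPU instruction micro-benchmarking method and system for the AArch64 architecture
CN114816535A (zh) * 2022-04-29 2022-07-29 深圳大学 Method for instruction fusion using a superscalar processor, and related device
CN117008975A (zh) * 2023-06-14 2023-11-07 进迭时空(杭州)科技有限公司 Instruction fusion and splitting method, processor core, and processor
CN117093270A (zh) * 2023-08-18 2023-11-21 摩尔线程智能科技(北京)有限责任公司 Instruction sending method, apparatus, device, and storage medium
CN118642762A (zh) * 2024-08-14 2024-09-13 北京开源芯片研究院 Instruction processing method, apparatus, electronic device, and readable storage medium
CN119415155A (zh) * 2025-01-08 2025-02-11 山东浪潮科学研究院有限公司 RISC-V instruction acceleration method, system, device, and storage medium
CN119537041A (zh) * 2025-01-23 2025-02-28 山东浪潮科学研究院有限公司 GPGPU-based instruction execution acceleration method and GPGPU architecture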

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
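CN109257280B (zh) * 2017-07-14 2022-05-27 深圳市中兴微电子技术有限公司 Micro-engine and packet processing method thereof
CN111240682B (zh) * 2018-11-28 2024-11-08 深圳市中兴微电子技术有限公司 Instruction data processing method and apparatus, device, and storage medium
CN115657090B (zh) * 2022-10-24 2023-04-28 上海时空奇点智能技术有限公司 Low-latency parsing method for GNSS BeiDou positioning module interface data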

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026132A2 (fr) * 1997-11-17 1999-05-27 Advanced Micro Devices, Inc. Processor configured to generate lookahead results and to reduce moves and compares, and simple arithmetic instructions receiving the lookahead results
CN101706715A (zh) * 2009-12-04 2010-05-12 北京龙芯中科技术服务中心有限公司 Instruction scheduling device and method
CN101957743A (zh) * 2010-10-12 2011-01-26 中国电子科技集团公司第三十八研究所 Parallel digital signal processor
CN102945148A (zh) * 2012-09-26 2013-02-27 中国航天科技集团公司第九研究院第七七一研究所 Method for implementing a parallel instruction set
CN103218207A (zh) * 2012-01-18 2013-07-24 上海算芯微电子有限公司 Microprocessor instruction processing method and system based on a single/dual-issue instruction set

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO1999026132A2 (fr) * 1997-11-17 1999-05-27 Advanced Micro Devices, Inc. Processor configured to generate lookahead results and to reduce moves and compares, and simple arithmetic instructions receiving the lookahead results
CN101706715A (zh) * 2009-12-04 2010-05-12 北京龙芯中科技术服务中心有限公司 Instruction scheduling device and method
CN101957743A (zh) * 2010-10-12 2011-01-26 中国电子科技集团公司第三十八研究所 Parallel digital signal processor
CN103218207A (zh) * 2012-01-18 2013-07-24 上海算芯微电子有限公司 Microprocessor instruction processing method and system based on a single/dual-issue instruction set
CN102945148A (zh) * 2012-09-26 2013-02-27 中国航天科技集团公司第九研究院第七七一研究所 Method for implementing a parallel instruction set

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
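CN113703841A (zh) * 2021-09-10 2021-11-26 中国人民解放军国防科技大学 Optimization method, device, and medium for register data reading
CN113703841B (zh) * 2021-09-10 2023-09-26 中国人民解放军国防科技大学 Optimization method, device, and medium for register data reading
CN114217856A (zh) * 2021-12-17 2022-03-22 中国人民解放军国防科技大学 CPU instruction micro-benchmarking method and system for the AArch64 architecture
CN114816535A (zh) * 2022-04-29 2022-07-29 深圳大学 Method for instruction fusion using a superscalar processor, and related device
CN117008975A (zh) * 2023-06-14 2023-11-07 进迭时空(杭州)科技有限公司 Instruction fusion and splitting method, processor core, and processor
CN117093270A (zh) * 2023-08-18 2023-11-21 摩尔线程智能科技(北京)有限责任公司 Instruction sending method, apparatus, device, and storage medium
CN117093270B (zh) * 2023-08-18 2024-06-14 摩尔线程智能科技(北京)有限责任公司 Instruction sending method, apparatus, device, and storage medium
CN118642762A (zh) * 2024-08-14 2024-09-13 北京开源芯片研究院 Instruction processing method, apparatus, electronic device, and readable storage medium
CN119415155A (zh) * 2025-01-08 2025-02-11 山东浪潮科学研究院有限公司 RISC-V instruction acceleration method, system, device, and storage medium
CN119537041A (zh) * 2025-01-23 2025-02-28 山东浪潮科学研究院有限公司 GPGPU-based instruction execution acceleration method and GPGPU architecture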

Also Published As

Publication number Publication date
CN106406820B (zh) 2019-01-15
CN106406820A (zh) 2017-02-15

Similar Documents

Publication Publication Date Title
WO2017016255A1 (fr) Method and apparatus for parallel processing of multi-transmission instructions of a microengine, and storage medium
US9672035B2 (en) Data processing apparatus and method for performing vector processing
US8495603B2 (en) Generating an executable version of an application using a distributed compiler operating on a plurality of compute nodes
US8683468B2 (en) Automatic kernel migration for heterogeneous cores
US8332854B2 (en) Virtualized thread scheduling for hardware thread optimization based on hardware resource parameter summaries of instruction blocks in execution groups
US9195443B2 (en) Providing performance tuned versions of compiled code to a CPU in a system of heterogeneous cores
CN101799760B (zh) System and method for generating parallel single-instruction multiple-data (SIMD) code for an arbitrary target architecture
RU2614583C2 (ru) Determining a path profile using a combination of hardware and software
US20200336421A1 (en) Optimized function assignment in a multi-core processor
EP3111333B1 (fr) Thread and data assignment in multi-core processors
US20130117543A1 (en) Low overhead operation latency aware scheduler
US20080155197A1 (en) Locality optimization in multiprocessor systems
US9354932B2 (en) Dynamically allocated thread-local storage
CN113934455B (zh) Instruction conversion method and device
US10430191B2 (en) Methods and apparatus to compile instructions for a vector of instruction pointers processor architecture to enable speculative execution and avoid data corruption
TWI639951B (zh) Simultaneous multithreading (SMT)-based central processing unit and device for detecting data dependencies of instructions
EP3186704A1 (fr) Multi-cluster very long instruction word (VLIW) processing core
US20090037889A1 (en) Speculative code motion for memory latency hiding
US11507386B2 (en) Booting tiles of processing units
US20070169001A1 (en) Methods and apparatus for supporting agile run-time network systems via identification and execution of most efficient application code in view of changing network traffic conditions
US11513841B2 (en) Method and system for scheduling tasks in a computing system
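CN116301874B (zh) Code compilation method, electronic device, and storage medium
CN107506623B (zh) Application hardening method and apparatus, computing device, and computer storage medium
CN105446733A (zh) Separation kernel
JP2022512879A (ja) Network interface device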

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16829622

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16829622

Country of ref document: EP

Kind code of ref document: A1