WO2016202001A1 - Computer instruction processing method, coprocessor, and system - Google Patents

Computer instruction processing method, coprocessor, and system

Publication number
WO2016202001A1
Authority
WO
WIPO (PCT)
Prior art keywords
instruction set
binary code
coprocessor
cpu
instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2016/073942
Other languages
English (en)
French (fr)
Inventor
高云伟
林鑫龙
詹剑锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to EP16810742.3A (patent EP3301567B1)
Publication of WO2016202001A1
Priority to US15/844,191 (patent US10514929B2)

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/52Binary to binary
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros
    • G06F9/30174Runtime instruction translation, e.g. macros for non-native instruction set, e.g. Javabyte, legacy code
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861Recovery, e.g. branch miss-prediction, exception handling
    • G06F9/3863Recovery, e.g. branch miss-prediction, exception handling using multiple copies of the architectural state, e.g. shadow registers
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45516Runtime code conversion or optimisation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • G06F9/5088Techniques for rebalancing the load in a distributed system involving task migration
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/4557Distribution of virtual machine instances; Migration and load balancing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/509Offload
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/3017Runtime instruction translation, e.g. macros
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877Concurrent instruction execution, e.g. pipeline or look ahead using a secondary processor, e.g. coprocessor
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3877Concurrent instruction execution, e.g. pipeline or look ahead using a secondary processor, e.g. coprocessor
    • G06F9/3879Concurrent instruction execution, e.g. pipeline or look ahead using a secondary processor, e.g. coprocessor for non-native instruction execution, e.g. executing a command; for Java instruction set
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45504Abstract machines for programme code execution, e.g. Java virtual machine [JVM], interpreters, emulators
    • G06F9/45508Runtime interpretation or emulation, e g. emulator loops, bytecode interpretation
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45537Provision of facilities of other operating environments, e.g. WINE
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/485Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F9/4856Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration
    • G06F9/4862Task life-cycle, e.g. stopping, restarting, resuming execution resumption being on a different machine, e.g. task migration, virtual machine migration the task being a mobile agent, i.e. specifically designed to migrate

Definitions

  • Embodiments of the present invention relate to the field of computers, and in particular, to computer instruction processing methods, coprocessors, and systems.
  • a coprocessor is a chip that is mainly used to process specific tasks on behalf of a central processing unit (CPU). Due to some differences in the instruction set between the coprocessor and the central processing unit, programs running on the coprocessor often need to be compiled separately using the compiler, and some adjustments to the code are required.
  • a tag is generally made in the code of the application, and the tag distinguishes which code is executed by the CPU and which code is executed by the coprocessor.
  • embodiments of the present invention provide a computer instruction processing method, a coprocessor, and a system in which a CPU may migrate computer instructions to a coprocessor running an operating system, and the coprocessor executes the computer instructions to reduce the CPU load.
  • an embodiment of the present invention provides a computer instruction processing method, which is applied to a processor system, where the processor system includes a coprocessor and a central processing unit CPU, the CPU runs a first operating system, and the coprocessor runs a second operating system; the method includes:
  • the coprocessor obtains a second instruction set according to the first instruction set, wherein the binary code in the second instruction set is used to instruct the coprocessor to perform the computer operation in the second operating system;
  • the coprocessor executes the binary code in the second set of instructions.
  • the obtaining, by the coprocessor, the second instruction set according to the first instruction set includes:
  • the coprocessor looks up the opcode of each binary code in the first instruction set in a preset translation table; if the opcode of a first binary code in the first instruction set is matched in the translation table, the opcode of the first binary code is translated into the opcode of a second binary code according to the matching entry in the translation table, thereby obtaining the second binary code; the coprocessor obtains the second instruction set from the at least one second binary code obtained, wherein the translation table contains the correspondence between the different opcodes generated when the same computer instruction is compiled for the first operating system and for the second operating system, and the second binary code is a binary code applicable to the second operating system.
  • before the coprocessor executes the binary code in the second instruction set, the method further includes:
  • the coprocessor converts a register address of the CPU in a binary code included in the second instruction set to a register address of the coprocessor.
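The register address conversion can be pictured as a lookup applied to every register operand before execution. The following is a minimal sketch only: the `(opcode, registers)` instruction layout, the mapping values, and the name `convert_register_addresses` are assumptions made for illustration and are not defined by the source.

```python
from typing import Dict, List, Tuple

# Assumed layout for illustration: (opcode, register addresses used as operands).
Instruction = Tuple[str, Tuple[int, ...]]

# Hypothetical mapping from CPU register addresses to coprocessor register addresses.
CPU_TO_COPROCESSOR_REG: Dict[int, int] = {0: 16, 1: 17, 2: 18}


def convert_register_addresses(second_set: List[Instruction],
                               reg_map: Dict[int, int]) -> List[Instruction]:
    """Rewrite every CPU register address to the coprocessor's register address."""
    converted: List[Instruction] = []
    for opcode, registers in second_set:
        # A register missing from the map raises KeyError; the source does not
        # say how such a case is handled, so this behaviour is an assumption.
        converted.append((opcode, tuple(reg_map[r] for r in registers)))
    return converted


converted = convert_register_addresses(
    [("B1B1B1B1", (0, 2)), ("AAAA", (1,))], CPU_TO_COPROCESSOR_REG
)
```

Opcodes are left untouched here; only the register operands change.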
  • the obtaining, by the coprocessor, the second instruction set according to the first instruction set further includes:
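  • if the opcode of a third binary code in the first instruction set is not matched in the translation table, the coprocessor uses the third binary code as a binary code in the second instruction set.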
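The translation step, including the pass-through of binary codes whose opcode has no entry in the translation table, might look like the sketch below. The tuple-based instruction model, the table contents, and the function name are illustrative assumptions, not the encoding used by the patent.

```python
from typing import Dict, List, Tuple

# Assumed layout for illustration: (opcode, operands).
Instruction = Tuple[str, Tuple[int, ...]]

# Hypothetical translation table: opcode under the first operating system
# mapped to the opcode of the same instruction under the second operating system.
TRANSLATION_TABLE: Dict[str, str] = {"BBBB": "B1B1B1B1"}


def translate_instruction_set(first_set: List[Instruction],
                              table: Dict[str, str]) -> List[Instruction]:
    """Build the second instruction set from the first instruction set."""
    second_set: List[Instruction] = []
    for opcode, operands in first_set:
        # A matched opcode is replaced by its table entry; an unmatched
        # opcode is carried over unchanged into the second instruction set.
        second_set.append((table.get(opcode, opcode), operands))
    return second_set


first = [("AAAA", (1, 2)), ("BBBB", (3,)), ("EEEE", ())]
second = translate_instruction_set(first, TRANSLATION_TABLE)
```

Carrying unmatched codes over unchanged means an instruction the coprocessor cannot recognize is only discovered at execution time.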
  • the first instruction set is migrated by the CPU to the coprocessor when the CPU usage of the CPU is greater than a first threshold.
  • the receiving, by the coprocessor, of the first instruction set migrated by the CPU includes:
  • the coprocessor receives an address of the first instruction set to be migrated, sent by the CPU, where the address of the first instruction set is the address at which the first instruction set is stored in the memory of the CPU.
  • the address of the first instruction set is sent by the CPU to the coprocessor when a memory usage of the CPU is less than or equal to a second threshold;
  • the coprocessor acquires the first instruction set by accessing a memory of the CPU based on an address of the first instruction set.
  • the first set of instructions is sent by the CPU to the coprocessor when the memory usage of the CPU is greater than a second threshold.
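The two thresholds can be read as a small decision procedure on the CPU side: migrate only above the first (CPU usage) threshold, then send either the address or the instruction set itself depending on the second (memory usage) threshold. The sketch below is an illustration under assumed names; `AddressMessage`, `InstructionSetMessage`, and `plan_migration` are hypothetical and do not appear in the source.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class AddressMessage:
    """Only the address of the first instruction set in CPU memory is sent."""
    address: int


@dataclass
class InstructionSetMessage:
    """The first instruction set itself is sent."""
    binary_codes: List[str]


def plan_migration(cpu_usage: float, memory_usage: float,
                   first_threshold: float, second_threshold: float,
                   address: int, binary_codes: List[str]
                   ) -> Optional[Union[AddressMessage, InstructionSetMessage]]:
    """Decide whether and how the CPU hands the first instruction set over."""
    if cpu_usage <= first_threshold:
        return None  # CPU is not overloaded, nothing is migrated
    if memory_usage <= second_threshold:
        # The coprocessor later reads the CPU memory at this address.
        return AddressMessage(address)
    # Memory usage is high, so the instruction set is sent directly.
    return InstructionSetMessage(list(binary_codes))
```

For example, `plan_migration(0.9, 0.5, 0.8, 0.7, 0x1000, ["AAAA"])` returns an `AddressMessage`, while raising the memory usage to `0.9` returns an `InstructionSetMessage`.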
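  • the executing, by the coprocessor, of the binary code in the second instruction set includes:
  • the coprocessor sequentially executes the binary code in the second instruction set; if a binary code recognition exception is detected during execution, the coprocessor determines the fourth binary code that triggered the exception, converts the fourth binary code into an intermediate code, translates the intermediate code into a fifth binary code applicable to the second operating system, executes the fifth binary code, and continues with the binary code that follows the fourth binary code in the second instruction set.
  • before the converting of the fourth binary code into the intermediate code, the method further includes: the coprocessor sends an instruction set fetch request to the CPU and receives a reject-fetch instruction sent by the CPU.
  • in another possible implementation manner, the executing, by the coprocessor, of the binary code in the second instruction set includes:
  • the coprocessor sequentially executes the binary code in the second instruction set; if a binary code recognition exception is detected during execution, the coprocessor determines the sixth binary code that triggered the exception, obtains, from the binary code starting at the sixth binary code in the second instruction set, a third instruction set applicable to the first operating system, and migrates the third instruction set to the CPU.
  • before the obtaining of the third instruction set applicable to the first operating system from the binary code starting at the sixth binary code in the second instruction set, the method further includes: the coprocessor sends an instruction set fetch request to the CPU and receives an instruction set fetch response sent by the CPU.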
  • an embodiment of the present invention provides a coprocessor, which is applied to a processor system, where the processor system includes the coprocessor and a central processing unit CPU running a first operating system, and the coprocessor runs a second operating system; the coprocessor includes:
  • a first instruction set receiving unit configured to receive a first instruction set migrated by the CPU, where the first instruction set is used to instruct the CPU to perform a computer operation in the first operating system, the first instruction set being a set of binary codes applicable to the first operating system;
  • a second instruction set obtaining unit configured to obtain a second instruction set according to the first instruction set, where the binary code in the second instruction set is used to instruct the coprocessor to perform the computer operation in the second operating system;
  • a second instruction set execution unit configured to execute a binary code in the second instruction set.
  • the second instruction set obtaining unit being configured to obtain the second instruction set according to the first instruction set includes:
  • the second instruction set obtaining unit is configured to look up the operation code of each binary code in the first instruction set in a preset translation table; if the operation code of a first binary code in the first instruction set is matched in the translation table, the operation code of the first binary code is translated into the operation code of the second binary code according to the matching entry corresponding to the operation code of the first binary code in the translation table.
  • the coprocessor further includes:
  • a register address conversion unit configured to convert a register address of the CPU in a binary code included in the second instruction set into a register address of the coprocessor.
  • the second instruction set obtaining unit is further configured to: if the opcode of a third binary code in the first instruction set is not matched in the translation table, use the third binary code as a binary code in the second instruction set.
  • the first instruction set is migrated by the CPU to the coprocessor when the CPU usage of the CPU is greater than a first threshold.
  • the first instruction set receiving unit being configured to receive the first instruction set migrated by the CPU includes:
  • the first instruction set receiving unit is configured to receive an address of the first instruction set to be migrated, sent by the CPU, and to access the memory of the CPU based on the address of the first instruction set to obtain the first instruction set; wherein the address of the first instruction set is the address at which the first instruction set is stored in the memory of the CPU, and the address of the first instruction set is sent by the CPU to the coprocessor when the memory usage of the CPU is less than or equal to the second threshold.
  • the first set of instructions is sent by the CPU to the coprocessor when the memory usage of the CPU is greater than a second threshold.
  • the second instruction set execution unit being configured to execute the binary code in the second instruction set includes:
  • the second instruction set execution unit is configured to sequentially execute the binary code in the second instruction set; if a binary code recognition exception is detected while executing the second instruction set, determine the fourth binary code that triggered the exception, convert the fourth binary code into an intermediate code, translate the intermediate code into a fifth binary code applicable to the second operating system, execute the fifth binary code, and continue executing the binary code after the fourth binary code in the second instruction set.
  • the second instruction set execution unit is further configured to: before converting the fourth binary code into the intermediate code, send an instruction set fetch request to the CPU and receive a reject-fetch instruction sent by the CPU.
  • in a fifth possible implementation manner or a sixth possible implementation manner, the second instruction set execution unit being configured to execute the binary code in the second instruction set includes:
  • the second instruction set execution unit is configured to sequentially execute the binary code in the second instruction set; if a binary code recognition exception is detected while executing the second instruction set, determine the sixth binary code that triggered the exception; obtain, from the binary code starting at the sixth binary code in the second instruction set, a third instruction set applicable to the first operating system; and migrate the third instruction set to the CPU.
  • the second instruction set execution unit is further configured to: before obtaining the third instruction set applicable to the first operating system from the binary code starting at the sixth binary code in the second instruction set, send an instruction set fetch request to the CPU and receive an instruction set fetch response sent by the CPU.
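The two ways of handling a binary code recognition exception (re-translate locally when the CPU rejects the fetch request, or hand the remainder back when the CPU accepts it) can be sketched as one execution loop. Everything named here is hypothetical: `recognized` stands in for the coprocessor's decoder, `cpu_accepts_fetch` for the fetch request and its answer, and `retranslate` for the conversion through an intermediate code. Translating the returned remainder back into a form applicable to the first operating system is omitted for brevity.

```python
from typing import Callable, List, Set, Tuple


def run_second_set(second_set: List[str],
                   recognized: Set[str],
                   cpu_accepts_fetch: Callable[[], bool],
                   retranslate: Callable[[str], str]
                   ) -> Tuple[List[str], List[str]]:
    """Return (codes executed on the coprocessor, codes handed back to the CPU)."""
    executed: List[str] = []
    for index, code in enumerate(second_set):
        if code in recognized:
            executed.append(code)  # normal sequential execution
            continue
        # Recognition exception: ask the CPU whether it takes the rest back.
        if cpu_accepts_fetch():
            # Everything from the faulting code onward goes back to the CPU.
            return executed, second_set[index:]
        # The CPU refused: convert the faulting code via an intermediate code
        # into one the coprocessor can run, execute it, then continue.
        executed.append(retranslate(code))
    return executed, []
```

With `second_set = ["AAAA", "EEEE", "CCCC"]` and only `"AAAA"` and `"CCCC"` recognized, a refusing CPU leads to all three positions being executed on the coprocessor, while an accepting CPU receives `["EEEE", "CCCC"]`.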
  • an embodiment of the present invention provides a coprocessor, where the coprocessor is connected to a memory through a bus, the memory is used to store computer-executable instructions, and the coprocessor reads the computer-executable instructions stored in the memory and performs the computer instruction processing method provided by the above first aspect or any one of the possible implementation manners of the first aspect.
  • an embodiment of the present invention provides a processor system, where the processor system includes a central processing unit CPU and a coprocessor, wherein the CPU runs a first operating system and the coprocessor runs a second operating system;
  • the CPU is configured to migrate the first instruction set to the coprocessor;
  • the coprocessor is configured to perform the computer instruction processing method provided by the above first aspect or any one of the possible implementation manners of the first aspect.
  • the coprocessor obtains, according to the first instruction set, the second instruction set to be executed by the coprocessor; execution of the second instruction set by the coprocessor replaces execution of the first instruction set by the CPU, which reduces the load on the CPU and increases the utilization of the coprocessor.
  • FIG. 1 is a schematic diagram of the system logic structure of a prior-art application scenario in which compiled binary code is allocated;
  • FIG. 2 is a schematic diagram of the system logic structure of an application scenario of a computer instruction processing method;
  • FIG. 3 is a schematic diagram of the correspondence between computer instructions in the instruction set of a central processing unit and in the instruction set of a coprocessor;
  • FIG. 5 is an optional exemplary flowchart of step A402;
  • FIG. 6 is another exemplary flowchart of a computer instruction processing method;
  • FIG. 7 is an optional exemplary flowchart of step A401;
  • FIG. 8 is an optional exemplary flowchart of processing a binary code exception in step A403;
  • FIG. 9 is an optional exemplary flowchart of processing a binary code exception in step A403;
  • FIG. 10 is an optional exemplary flowchart of processing a binary code exception in step A403;
  • FIG. 11 is an optional exemplary flowchart of processing a binary code exception in step A403;
  • FIG. 12 is a schematic diagram of a logical structure of the coprocessor 202;
  • FIG. 13 is a schematic diagram of an optional logical structure of the coprocessor 202;
  • FIG. 14 is a schematic diagram of the system logic structure of a system composed of a coprocessor 1401 and a memory 1402.
  • a system 100 provided by the prior art includes a central processing unit (CPU) 101 and a coprocessor 102.
  • the instruction set supported by the central processing unit 101 and the instruction set supported by the coprocessor 102 are different.
  • the central processing unit 101 is installed with an operating system (for example, an operating system supporting the X86 instruction set), and the coprocessor 102 does not have an operating system installed.
  • when a program is compiled, the source code to be processed by the coprocessor 102 is compiled according to the instruction set of the coprocessor 102, so that the resulting binary code can be recognized and executed by the coprocessor 102.
  • the binary code compiled for the central processing unit 101 can be recognized and executed by the central processing unit 101, but may contain binary code that the coprocessor 102 cannot recognize; so even if the load on the central processing unit 101 is high, the coprocessor 102 cannot take over the binary code that is to be executed by the central processing unit 101.
  • a system 200 is provided by an embodiment of the present invention.
  • the system 200 includes a central processing unit 201 and a coprocessor 202.
  • the coprocessor 202 has control capabilities, and the central processor 201 and the coprocessor 202 each run an operating system; in contrast to the prior art, in which the coprocessor does not have an operating system installed, the present invention installs an operating system on the coprocessor 202,
  • so that processes running on the coprocessor 202 are scheduled by the operating system of the coprocessor 202; thus, the central processor 201 and the coprocessor 202 can migrate processes to each other.
  • the central processing unit 201 and the coprocessor 202 included in the system 200 may be located in the same data processing device; where they are placed within the data processing device is not limited in the embodiment of the present invention.
  • the central processing unit 201 is connected to the coprocessor 202; how the central processing unit 201 and the coprocessor 202 included in the system 200 are connected within the data processing device is not limited in the embodiment of the present invention, provided that the central processing unit 201 and the coprocessor 202 can transmit data to each other.
  • the data processing device includes a bus, the central processor 201 and the coprocessor 202 included in the system 200 are both connected to the bus, and the central processor 201 and the coprocessor 202 exchange data via the bus.
  • the data transmission requirement includes a data transmission speed and a data transmission format, and the specific model of the bus and the supported bus protocol are not limited;
  • other media may be used to connect the central processing unit 201 and the coprocessor 202 included in the system 200, and the data interaction speed between the central processing unit 201 and the coprocessor 202 is improved by the medium.
  • the medium may be used in place of the bus connecting the central processor 201 and the coprocessor 202, or the medium may coexist with the bus for data interaction between the central processor 201 and the coprocessor 202.
  • for example, the central processing unit 201 is implemented using an X86 processor, the coprocessor 202 is implemented using an Intel Many Integrated Core (MIC) processor, and the two are connected by a Peripheral Component Interconnect Express (PCI-E) bus.
  • the central processing unit 201 and the coprocessor 202 may also not be located in the same device; in that case the central processing unit 201 is communicatively coupled to the coprocessor 202, and the two exchange data by means of messages.
  • the instruction set supported by the central processing unit 201 and the instruction set supported by the coprocessor 202 may include partially different computer instructions.
  • the first operating system, run by the central processing unit 201, supports the instruction set of the central processing unit 201, and the second operating system, run by the coprocessor 202, supports the instruction set of the coprocessor 202; if the first operating system is the same as the second operating system, that operating system supports both the instruction set of the central processing unit 201 and the instruction set of the coprocessor 202;
  • if the first operating system does not support the instruction set of the coprocessor 202, or the second operating system does not support the instruction set of the central processing unit 201, then the first operating system is not the same operating system as the second operating system.
  • relative to the instruction set supported by the coprocessor 202, the instruction set supported by the central processor 201 can be divided into three sub-instruction sets: a first partial sub-instruction set, a second partial sub-instruction set, and a third partial sub-instruction set.
  • the first partial sub-instruction set: for each computer instruction it contains, the instruction set of the coprocessor 202 also contains the same computer instruction, and the binary code supported by the central processing unit 201 that represents the computer instruction is the same as the binary code supported by the coprocessor 202 that represents the computer instruction.
  • for example, the binary code "AAAA" represents a computer instruction in the first partial sub-instruction set; the instruction set supported by the coprocessor also includes this computer instruction, and the binary code "AAAA" likewise represents it in the instruction set supported by the coprocessor; similarly, the binary codes "CCCC" and "DDDD" represent two other computer instructions in the first partial sub-instruction set and also represent those two computer instructions in the instruction set of the coprocessor.
  • The second partial sub-instruction set: for each computer instruction it contains, the instruction set of the coprocessor 202 contains the same computer instruction, but the binary code supported by the central processing unit 201 that represents the computer instruction is not the same as the binary code supported by the coprocessor 202 that represents it.
  • For example, the binary code "BBBB" represents a computer instruction in the second partial sub-instruction set; the instruction set supported by the coprocessor also includes this computer instruction, but the binary code supported by the coprocessor that represents it is "B1B1B1B1". "BBBB" and "B1B1B1B1" are different binary codes.
  • The third partial sub-instruction set: none of the computer instructions it contains is included in the instruction set of the coprocessor 202.
  • For example, the binary code "EEEE" represents a computer instruction in the third partial sub-instruction set, and the instruction set of the coprocessor does not include this computer instruction; for the binary code "EEEE", no binary code corresponding to the computer instruction can be found in the coprocessor's instruction set.
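The three-way division described above can be sketched as follows. This is only an illustration: the instruction names (`instr_a` and so on) and the function name are hypothetical, and the binary codes are the placeholders used in the text, not real encodings.

```python
def partition_instruction_set(cpu_codes, cop_codes):
    """Split the CPU's instructions into the three partial sub-instruction sets.

    cpu_codes / cop_codes map an instruction name to the binary code that
    represents it on the CPU / on the coprocessor.
    """
    first, second, third = {}, {}, {}
    for name, cpu_code in cpu_codes.items():
        if name not in cop_codes:
            # Third part: the coprocessor has no such instruction.
            third[name] = cpu_code
        elif cop_codes[name] == cpu_code:
            # First part: same instruction, same binary code.
            first[name] = cpu_code
        else:
            # Second part: same instruction, different binary code.
            second[name] = (cpu_code, cop_codes[name])
    return first, second, third


# Hypothetical instruction names paired with the placeholder codes from the text.
CPU_CODES = {"instr_a": "AAAA", "instr_b": "BBBB", "instr_c": "CCCC",
             "instr_d": "DDDD", "instr_e": "EEEE"}
COP_CODES = {"instr_a": "AAAA", "instr_b": "B1B1B1B1",
             "instr_c": "CCCC", "instr_d": "DDDD"}

first, second, third = partition_instruction_set(CPU_CODES, COP_CODES)
```

Only the second part needs a translation table entry; the third part is the one that later triggers unrecognized-code exceptions.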
  • The coprocessor 202 receives the data related to the process migrated by the central processing unit 201, including the binary code required to execute the process, the process state of the process, and so on.
  • The opcode of a binary code is the binary code that represents a computer instruction; the binary code may also include an operand, and the operand is likewise represented in binary.
  • For the second partial sub-instruction set, the binary code supported by the central processing unit 201 that represents a computer instruction differs from the binary code supported by the coprocessor 202 that represents the same computer instruction. In view of this, the computer instruction processing method provided by the present invention establishes a translation table for the second partial sub-instruction set and uses the translation table to convert the binary code supported by the central processing unit 201 that represents a computer instruction of the second partial sub-instruction set into the binary code supported by the coprocessor 202 that represents that computer instruction. The coprocessor 202 can then recognize the converted binary code and, by executing it while running the migrated process, implement the function of the computer instruction.
  • For the third partial sub-instruction set, an unrecognized binary code may appear while the coprocessor 202 runs the migrated process (the opcode of the binary code represents a computer instruction that the coprocessor 202 does not recognize), and this triggers an exception. In the computer instruction processing method provided by the present invention, the binary code that triggers the exception is first converted into one or more intermediate codes, each intermediate code is then converted into binary code supported by the coprocessor 202, and the process continues from the converted binary code.
  • An optional specific implementation of intermediate code conversion is as follows. The opcode represented by each intermediate code is determined according to the opcode in the binary code that triggers the exception (a binary code representing a computer instruction of the third partial sub-instruction set). If the intermediate code has operands, the operand corresponding to each opcode represented by the intermediate code is determined according to the operand in the binary code that triggers the exception. Each opcode of the binary code supported by the coprocessor 202 is then determined according to the opcode represented by the intermediate code, and the operand corresponding to each such opcode is determined according to the operand represented by the intermediate code.
  • The central processing unit 201 is configured as the device that executes processes preferentially, and the coprocessor 202 is configured as the secondary device for executing processes.
  • While the central processing unit 201 executes a process, the process is migrated to the coprocessor 202 when the following occurs: the central processing unit 201, while executing the process, identifies binary code that is to be executed by the coprocessor 202; the process is then migrated to the coprocessor 202, and the coprocessor 202 executes that binary code. Optionally, the coprocessor 202 feeds back the execution result of the process to the central processing unit 201.
  • The coprocessor 202 runs the migrated process by translating the binary code required to execute the process according to the translation table and then executing the translated binary code. The translation action is specifically: traverse the binary code required to execute the process (optionally in the execution order of the binary code) and look for matches against the binary codes supported by the central processing unit 201 that are recorded in the translation table; whenever a match is found, replace the matched binary code, according to the translation table, with the binary code supported by the coprocessor 202 that represents the same computer instruction.
  • An embodiment of the present invention details how the coprocessor runs the process migrated from the CPU.
  • As an example, the coprocessor is implemented by a MIC and the central processing unit is an X86 processor.
  • The X86 processor and the MIC are connected by a PCI-E bus. The X86 processor runs a general-purpose operating system (such as an operating system that supports the X86 instruction set), and the MIC runs a custom uOS operating system. The X86 processor and the MIC each have their own memory and registers.
  • Optionally, the registers of the X86 processor are 128-bit registers. For example, the X86 processor includes 16 128-bit XMM registers. The XMM registers are vector registers and support the Streaming SIMD Extensions (SSE) instruction set, where SIMD stands for Single Instruction Multiple Data; that is, the X86 processor can operate the XMM register group with the SSE instruction set to perform 128-bit vector operations.
  • Optionally, the registers of the X86 processor are 256-bit registers. For example, the X86 processor contains 16 256-bit YMM registers, which are vector registers and support the Advanced Vector Extensions (AVX) instruction set. The X86 processor can operate the YMM register group with the AVX instruction set to perform 256-bit vector operations, such as floating-point operations.
  • The registers of the MIC are 512-bit registers. For example, the MIC contains 32 512-bit ZMM registers. The ZMM registers are vector registers, and the ZMM register group can be operated to perform 512-bit vector operations.
  • The MIC registers also support the SSE instruction set and the AVX instruction set.
  • To be compatible with the register operations of the X86 processor, the MIC selects 16 of its 32 registers to support 128-bit operations (such as operations of the SSE instruction set) and 256-bit operations (such as operations of the AVX instruction set). Optionally, the MIC uses the lower 256 bits of 16 512-bit registers to perform the 128-bit or 256-bit operations.
  • In the binary code required to execute the process, the MIC replaces the binary code representing a register of the X86 processor with the binary code representing the register selected in the MIC; that is, binary code that points to a register of the X86 processor is replaced with binary code that points to the selected register of the MIC. Neither the bit width of the operated data nor the operation rules need to be adjusted.
  • For example, the MIC replaces the binary code representing an XMM register of the X86 processor in the binary code migrated from the X86 processor with the binary code representing the register selected in the MIC. The MIC then executes binary code representing computer instructions of the SSE instruction set and operates the 16 registers selected in the MIC to perform 128-bit vector operations.
  • Likewise, the MIC replaces the binary code representing a YMM register of the X86 processor in the binary code migrated from the X86 processor with the binary code representing the register selected in the MIC. The MIC then executes binary code representing computer instructions of the AVX instruction set and operates the 16 registers selected in the MIC to perform 256-bit vector operations.
  • After replacing the binary code representing the registers of the X86 processor in the binary code required for the process with the binary code representing the registers selected in the MIC, the MIC runs the migrated process using its own registers.
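The register substitution can be illustrated with register names rather than binary encodings. The sketch below assumes, purely for illustration, that the 16 selected MIC registers are ZMM0 to ZMM15 and that an X86 register maps to the ZMM register with the same index; the source does not say which 16 registers are selected. `map_register` is a hypothetical helper.

```python
# Active vector width, in bits, for each kind of X86 vector register.
ACTIVE_WIDTH = {"XMM": 128, "YMM": 256}


def map_register(x86_register):
    """Map an X86 vector register name to (MIC register name, active bits).

    Assumption: register N on the X86 side maps to ZMM N on the MIC side,
    and only the low 128 or 256 bits of the 512-bit register are used.
    """
    kind, index = x86_register[:3], int(x86_register[3:])
    if kind not in ACTIVE_WIDTH or not 0 <= index < 16:
        raise ValueError(f"not an XMM/YMM register in range 0-15: {x86_register}")
    return f"ZMM{index}", ACTIVE_WIDTH[kind]
```

For instance, `map_register("YMM15")` yields the pair `("ZMM15", 256)`: the operation keeps its 256-bit width and only the register reference changes.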
  • A translation table is created, and the MIC performs matching against the created translation table.
  • The translation table is established for the computer instructions included in the second partial sub-instruction set, because for these instructions the binary code supported by the X86 processor that represents the computer instruction differs from the binary code supported by the MIC that represents it.
  • The translation table is built by adding to it, for each such computer instruction, the binary code supported by the X86 processor that represents the instruction and the binary code supported by the MIC that represents the instruction, and recording the correspondence between the two.
  • Table 1 exemplifies five computer instructions, as follows:
  • "FXSAVE" in Table 1 is the floating-point state save instruction, used to save the state of the Floating Point Unit (FPU) registers. The binary code supported by the X86 processor that represents "FXSAVE" is "01111011110111010000", and the binary code supported by the MIC that represents "FXSAVE" is "00001111101011101111".
  • "FXRSTOR" in Table 1 is the floating-point state restore instruction, used to restore the saved FPU register state to the FPU registers. The binary code supported by the X86 processor that represents "FXRSTOR" is "11011101", and the binary code supported by the MIC that represents "FXRSTOR" is "00001111101011100001".
  • "RDPMC" in Table 1 is the read performance-monitoring counter instruction, used to read the count of a performance monitor. The X86 processor supports two binary codes that represent "RDPMC", "0000111100110001" and "0000111100110010"; the MIC supports only one binary code that represents "RDPMC", which is "0000111100110011".
  • "FSUB" in Table 1 is the floating-point subtraction instruction, used to subtract floating-point numbers. The binary code supported by the X86 processor that represents "FSUB" is "1101100011100000", and the binary code supported by the MIC that represents "FSUB" is "0000111101011100".
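As a rough sketch, the translation table can be held as a mapping from each X86 binary code to the MIC binary code for the same instruction. The codes below are the ones quoted for Table 1; the data layout and the helper name `build_translation_table` are assumptions made for illustration.

```python
# (instruction, binary codes supported by the X86 processor, MIC binary code)
TABLE_1_ROWS = [
    ("FXSAVE", ["01111011110111010000"], "00001111101011101111"),
    ("FXRSTOR", ["11011101"], "00001111101011100001"),
    ("RDPMC", ["0000111100110001", "0000111100110010"], "0000111100110011"),
    ("FSUB", ["1101100011100000"], "0000111101011100"),
]


def build_translation_table(rows):
    """Return a dict mapping every X86 binary code to its MIC binary code."""
    table = {}
    for _name, x86_codes, mic_code in rows:
        for x86_code in x86_codes:
            # Several X86 codes may map to one MIC code (as for RDPMC).
            table[x86_code] = mic_code
    return table


translation_table = build_translation_table(TABLE_1_ROWS)
```

Keying the table by the X86 code makes the later match-and-replace step a single lookup per binary code.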
  • During the migration of a process from the X86 processor to the MIC, the MIC acquires the data related to the process from the CPU and extracts from this data the binary code required to execute the process.
  • The extracted binary code is matched and replaced according to the translation table: binary code corresponding to a computer instruction included in the second partial sub-instruction set is replaced by the binary code in the translation table that represents the computer instruction and is supported by the MIC. The MIC can then recognize the replaced binary code, which to a certain extent improves the accuracy and efficiency with which the MIC executes the migrated process.
  • The operating system of the X86 processor runs a piece of code that implements a monitoring module, and the monitoring module detects the load of the X86 processor.
  • Detecting the load includes detecting the CPU usage of the X86 processor and detecting the memory usage of the memory used by the X86 processor.
  • If the memory usage of the memory used by the X86 processor is greater than the second threshold, the process is migrated from the X86 processor to the MIC regardless of whether the CPU usage of the X86 processor is greater than the first threshold.
  • When the X86 processor migrates the process to the MIC, it sends the data related to the process in the memory used by the X86 processor to the MIC.
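The load check made by the monitoring module can be sketched as below. The document describes heavy load as CPU usage above the first threshold and/or memory usage above the second threshold; the function name and the idea of passing usages as fractions are assumptions for illustration.

```python
def should_migrate(cpu_usage, mem_usage, first_threshold, second_threshold):
    """Return True when the X86 processor is considered heavily loaded.

    High memory usage alone is enough to trigger migration, regardless of
    whether the CPU usage exceeds the first threshold.
    """
    return cpu_usage > first_threshold or mem_usage > second_threshold
```

With a first threshold of 0.8 and a second threshold of 0.9 (hypothetical values), a process would be migrated when memory usage is 0.95 even if CPU usage is only 0.5.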
  • The MIC stores the received data related to the process in the memory used by the MIC and extracts the binary code required to execute the process from the data stored in the memory of the MIC. The MIC traverses the extracted binary code; if a binary code matches a binary code supported by the CPU in the translation table (that is, a binary code corresponding to a computer instruction contained in the second partial sub-instruction set), it is replaced with the corresponding binary code in the translation table that represents the computer instruction and is supported by the MIC, and the replaced binary code is stored in the memory of the MIC. In this way, the binary code stored in the MIC that corresponds to computer instructions contained in the second partial sub-instruction set is replaced, according to the translation table, with the binary code supported by the MIC that represents those computer instructions.
  • Alternatively, when the X86 processor migrates the process to the MIC, it sends to the MIC the storage address at which the data related to the process is stored in the memory of the X86 processor.
  • The MIC allocates a storage space from the memory used by the MIC and establishes an address mapping between the storage addresses included in that storage space and the received storage address (the storage address of the data related to the process in the memory of the X86 processor).
  • According to the address mapping relationship, the MIC accesses the data related to the process in the memory of the X86 processor through the PCI-E bus and extracts from this data the binary code required to execute the process.
  • If an extracted binary code matches a binary code supported by the CPU in the translation table (that is, a binary code corresponding to a computer instruction contained in the second partial sub-instruction set), it is replaced with the binary code supported by the MIC in the translation table, and the replaced binary code is written back to the memory of the X86 processor.
  • In this way, the binary code stored in the memory of the X86 processor that corresponds to computer instructions contained in the second partial sub-instruction set is replaced, according to the translation table, with the binary code supported by the MIC that represents those computer instructions. The MIC then runs the process using the memory of the CPU according to the address mapping relationship.
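One simple way to picture the address mapping is a linear offset between a window in the MIC's address space and the region of X86 memory that holds the process data. The source does not specify the form of the mapping, so the contiguous layout, the class name, and the addresses below are all assumptions.

```python
class AddressMapping:
    """Linear mapping from a MIC-side window to an X86-side memory region."""

    def __init__(self, mic_base, x86_base, size):
        self.mic_base = mic_base   # start of the space reserved by the MIC
        self.x86_base = x86_base   # storage address received from the X86 side
        self.size = size           # length of the mapped region in bytes

    def to_x86(self, mic_address):
        """Translate a MIC-side address into the X86 memory address."""
        offset = mic_address - self.mic_base
        if not 0 <= offset < self.size:
            raise ValueError("address outside the mapped storage space")
        return self.x86_base + offset


# Hypothetical addresses, for illustration only.
mapping = AddressMapping(mic_base=0x1000, x86_base=0x80000, size=0x400)
```

An access by the MIC to `0x1010` would then be directed, over the bus, to `0x80010` in the memory of the X86 processor.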
  • The uOS operating system of the MIC runs a piece of code that implements a translation module, and the translation module loads the translation table.
  • For each process migrated by the X86 processor, the translation module traverses the binary code required to execute the process and looks for binary codes supported by the X86 processor that appear in the translation table. Each time a match is found, the matched binary code is translated, according to the translation table, into the binary code supported by the MIC. This continues until the traversal is complete.
  • For the XOR instruction, the binary code supported by the X86 processor varies depending on the objects being compared. If two register values are compared, the binary code of the XOR instruction supported by the X86 processor is "0011001"; if a register value and a value stored in memory are compared, it is "0011000". The MIC, however, uniformly represents the XOR instruction with the binary code "0011000". So that the MIC can recognize an XOR instruction that compares two register values, the mapping between "0011001" and "0011000" is recorded in the translation table. If the translation module, while searching the binary code included in a process migrated from the X86 processor, finds "0011001" by matching against the translation table, it replaces "0011001" in the binary code included in the process with "0011000" according to the translation table.
  • The X86 processor supports two binary codes for the "RDPMC" instruction, "0000111100110001" and "0000111100110010", while the MIC supports only one binary code that represents "RDPMC", "0000111100110011". The translation table therefore records the mapping between "0000111100110001" and "0000111100110011" and the mapping between "0000111100110010" and "0000111100110011".
  • If the translation module finds "0000111100110001" by matching against the translation table in the binary code included in a process migrated from the X86 processor, it replaces "0000111100110001" in that binary code with "0000111100110011" according to the translation table.
  • If the translation module finds "0000111100110010" by matching against the translation table in the binary code included in a process migrated from the X86 processor, it replaces "0000111100110010" in that binary code with "0000111100110011" according to the translation table.
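The traverse, match, and replace behaviour of the translation module can be sketched with the XOR and RDPMC mappings quoted above. Modelling the process's binary code as a list of already-separated opcode strings is a simplification made for this sketch, and `translate` is a hypothetical name.

```python
# X86 binary code -> MIC binary code, using the mappings given in the text.
TRANSLATION_TABLE = {
    "0011001": "0011000",                    # XOR comparing two registers
    "0000111100110001": "0000111100110011",  # RDPMC, first X86 encoding
    "0000111100110010": "0000111100110011",  # RDPMC, second X86 encoding
}


def translate(opcodes, table):
    """Traverse the opcodes in order and replace every one found in the table.

    Opcodes that are not recorded in the table are left unchanged.
    """
    return [table.get(code, code) for code in opcodes]
```

Codes that belong to the first partial sub-instruction set pass through untouched, and codes from the third partial sub-instruction set also pass through, which is why they can still trigger an exception at execution time.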
  • During the migration of a process from the X86 processor to the MIC, the MIC not only acquires the data related to the process from the memory used by the X86 processor but also obtains the register values associated with the process from the registers of the X86 processor, and transfers the obtained register values to the corresponding registers of the MIC.
  • The process state of the process is first extracted from the data related to the process stored in the memory of the MIC. The process state includes the state information necessary for the process to run, such as the process priority, the process identifier, and the stack pointer. The MIC then starts from the point of execution determined from the process state and uses the registers of the MIC to execute the process based on the data related to the process stored in the memory of the MIC (this data includes the binary code translated according to the translation table).
  • The uOS operating system of the MIC runs a piece of code that implements an exception handling module. The exception handling module can intercept exceptions that occur while the MIC executes a process, including an exception triggered when the process reaches an unrecognized binary code.
  • When the exception handling module detects a process execution exception, the process is suspended and exception information recording the abnormal execution of the process is generated.
  • The computer instructions included in the third partial sub-instruction set are not included in the instruction set of the MIC, so the MIC cannot recognize the binary code corresponding to such a computer instruction. In addition, the translation table does not record the binary code of the computer instructions contained in the third partial sub-instruction set, so this binary code cannot be converted, according to the translation table, into binary code corresponding to a computer instruction in the instruction set of the MIC. The MIC therefore cannot recognize a binary code whose opcode is such a binary code, and if the process executes to that binary code, an unrecognized-binary-code exception is triggered.
  • When the exception handling module detects a process execution exception triggered by an instruction recognition exception, it suspends the process and handles the exception in one of the following three optional modes.
  • First mode: the exception handling module moves the suspended process back to the X86 processor, which resumes it from the binary code that triggered the process exception. This includes the following. For the binary code required to execute the moved-back process that is stored in memory (which may be the memory of the MIC or the memory of the X86 processor), opcodes (in binary form) are matched against the translation table, each matched binary code is converted into the binary code supported by the X86 processor, and the corresponding binary code in memory is updated with the converted binary code. If the updated binary code is stored in the memory of the MIC, it is transferred to the memory of the X86 processor; one way to implement the transfer is through data communication between the MIC and the X86 processor.
  • The binary code representing registers of the MIC in the binary code required to execute the moved-back process is also replaced.
  • The X86 processor can then run the moved-back process using its own registers and memory.
  • Second mode: the exception handling module determines whether the suspended process is a process that the X86 processor migrated to the MIC. If it is, the module identifies the binary code that triggered the exception, converts it into the intermediate code of a simulator (such as the qemu simulator), converts the intermediate code into binary code supported by the MIC, and continues executing the process from the converted binary code.
  • Third mode: the exception handling module sends a process fetch request to the X86 processor to notify it that the currently abnormal process is expected to be moved back. The X86 processor responds to the process fetch request by checking the usage of the X86 processor and the memory usage of the memory of the X86 processor currently reported by the monitoring module.
  • If the X86 processor determines that its usage is less than the first threshold and the memory usage of its memory is less than or equal to the second threshold, it feeds back a process fetch instruction to the MIC. If it determines that its usage is greater than the first threshold, or that the memory usage of its memory is greater than the second threshold, it feeds back to the MIC a refusal of the process fetch.
  • If the exception handling module receives the process fetch instruction fed back by the X86 processor, the suspended process is moved back to the X86 processor. Moving the process back from the MIC to the X86 processor follows the same principle as the first mode described above and is not repeated here.
  • If the fetch is refused, the suspended process is not moved back to the X86 processor, and it is determined whether the suspended process is a process that the X86 processor migrated to the MIC. If it is, the binary code that triggered the exception is identified and converted into the intermediate code of a simulator (such as the qemu simulator), the intermediate code is converted into binary code supported by the MIC, and the process continues from the converted binary code.
  • If the suspended process is not a process that the X86 processor migrated to the MIC, the MIC directly outputs the exception information.
  • One possible reason for this case is that the X86 processor, while running a process, identified a code segment to be executed by the MIC and transferred the code segment to the MIC, and the MIC created a new process to execute the code segment, during which the exception occurred.
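The decision flow of the third mode can be summarised in a short sketch. The function names and the returned labels are hypothetical. The source does not say what happens when the CPU usage exactly equals the first threshold; this sketch treats that case as a refusal, which is an assumption.

```python
def cpu_accepts_fetch_back(cpu_usage, mem_usage, first_threshold, second_threshold):
    """X86 side: accept the fetch request only when the load is light.

    Assumption: CPU usage equal to the first threshold counts as a refusal.
    """
    return cpu_usage < first_threshold and mem_usage <= second_threshold


def handle_unrecognized_code(cpu_accepts, migrated_from_cpu):
    """MIC side: choose how to handle a process suspended by the exception."""
    if cpu_accepts:
        # Same principle as the first mode.
        return "move_back_to_cpu"
    if migrated_from_cpu:
        # Same conversion as the second mode.
        return "convert_via_intermediate_code"
    return "output_exception_info"
```

The third mode therefore falls back to intermediate code conversion only when the X86 processor is too busy to take the process back.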
  • The first type is computer instructions that can be split into two actions, including the conditional move instruction "CMOV", the compare-and-exchange 16-byte instruction "CMPXCHG16B", the floating-point conditional move instruction "FCMOVcc", the floating-point compare and set flags instruction "FCOMI", the floating-point compare, set flags and pop instruction "FCOMIP", the floating-point unordered compare and set flags instruction "FUCOMI", and the floating-point unordered compare, set flags and pop instruction "FUCOMIP". If the MIC executes binary code containing a computer instruction of the first type, a process exception is triggered. If the process is to continue on the MIC, the MIC converts the binary code containing the first-type computer instruction into intermediate code; the intermediate code is binary code containing two actions, each of which is represented in the MIC instruction set.
  • The intermediate code is then converted into binary code supported by the MIC. The opcode of each binary code in the binary code supported by the MIC represents a computer instruction in the instruction set of the MIC, so the MIC can recognize and execute the converted binary code.
  • Taking the conditional move instruction as an example: the MIC cannot recognize a binary code whose opcode is the binary code representing the conditional move instruction, and the exception handling module converts this binary code into binary code supported by the MIC. The conversion of the instruction in binary form is as follows: the condition judgment instruction and the move instruction represented by the intermediate code are determined from the conditional move instruction, and they are then translated into the condition judgment instruction and the move instruction (MOV) of the MIC. The MIC first executes the condition judgment instruction to determine whether the condition is satisfied and, if so, executes the move instruction (MOV).
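Splitting a conditional move into a condition check followed by a plain move can be illustrated with a toy model. The tuple-based intermediate form, the flag and register names, and both function names are assumptions made for this sketch; they are not the intermediate code of any real simulator.

```python
def split_conditional_move(condition, dest, src):
    """Return a two-action intermediate form: a condition check, then a MOV."""
    return [("CHECK", condition), ("MOV", dest, src)]


def execute(intermediate, flags, registers):
    """Run the two actions in order: the MOV happens only if the check passes."""
    (_, condition), (_, dest, src) = intermediate
    if flags[condition]:
        registers[dest] = registers[src]
    return registers
```

For example, with the hypothetical flag `ZF` set, `execute(split_conditional_move("ZF", "rax", "rbx"), {"ZF": True}, {"rax": 1, "rbx": 2})` copies the value of `rbx` into `rax`; with the flag clear the registers are left unchanged.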
  • The second type is computer instructions that read or write data through a port, including the port input instruction "IN", the port input string instruction "INS", the port input byte string instruction "INSB", the port input doubleword string instruction "INSD", the port input word string instruction "INSW", the monitor instruction "MONITOR", the thread synchronization instruction "MWAIT", the port output instruction "OUT", the port output string instruction "OUTS", the port output byte string instruction "OUTSB", the port output doubleword string instruction "OUTSD", and the port output word string instruction "OUTSW". Executing an instruction of the second type on the MIC triggers a process exception. If the process is to continue on the MIC, there are two cases.
  • The first case is a computer instruction that writes data through a port; in this case the MIC notifies the X86 processor to write the data to the target port specified by the computer instruction.
  • The second case is a computer instruction that reads data through a port; in this case the MIC first notifies the X86 processor to read the data from the target port specified by the computer instruction into memory, and then accesses the memory to obtain the data.
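The two cases can be sketched as a small planner that decides which steps the MIC takes for a port instruction. The step labels and the function name are hypothetical. The sketch covers only the IN and OUT families, since the source does not describe how "MONITOR" and "MWAIT" fit the two cases.

```python
PORT_READS = {"IN", "INS", "INSB", "INSD", "INSW"}
PORT_WRITES = {"OUT", "OUTS", "OUTSB", "OUTSD", "OUTSW"}


def plan_port_access(mnemonic, port):
    """Return the ordered steps for a port read or write delegated to the X86 side."""
    if mnemonic in PORT_WRITES:
        # First case: the X86 processor performs the write to the target port.
        return [("notify_x86_write_port", port)]
    if mnemonic in PORT_READS:
        # Second case: the X86 processor reads into memory, then the MIC reads memory.
        return [("notify_x86_read_port_into_memory", port),
                ("mic_read_from_memory", port)]
    raise ValueError(f"not a port input/output instruction: {mnemonic}")
```

In both cases the port itself is only ever touched by the X86 processor; the MIC sees the data through memory.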
  • The third type includes the pause instruction "PAUSE", the system entry instruction "SYSENTER", and the system exit instruction "SYSEXIT". These three instructions were added later. The pause instruction "PAUSE" reduces the performance loss of spin locks, and the system entry instruction "SYSENTER" and the system exit instruction "SYSEXIT" reduce the cost of switching between kernel mode and user mode. The third type of computer instruction is an optimization of the original X86 instruction set, but the MIC does not support it.
  • If executing the pause instruction "PAUSE" represented by a binary code triggers a process exception on the MIC and the process is to continue on the MIC, the spin lock instruction represented by a binary code is executed instead of the pause instruction "PAUSE", and the process can continue to run.
  • If executing the system entry instruction "SYSENTER" represented by a binary code triggers a process exception, the instruction for switching from user mode to kernel mode represented by a binary code is executed instead of "SYSENTER", and the process can continue to run.
  • If executing the system exit instruction "SYSEXIT" represented by a binary code triggers a process exception, the instruction for switching from kernel mode to user mode represented by a binary code is executed instead of "SYSEXIT", and the process can continue to run.
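The substitutions for the third type amount to a fixed lookup. In the sketch below the replacement labels are hypothetical placeholders for the MIC-side instructions, whose actual names and encodings the source does not give.

```python
# Unsupported X86 instruction -> placeholder for the MIC instruction run instead.
THIRD_TYPE_SUBSTITUTES = {
    "PAUSE": "SPIN_LOCK",                  # spin lock instruction
    "SYSENTER": "SWITCH_USER_TO_KERNEL",   # user mode -> kernel mode switch
    "SYSEXIT": "SWITCH_KERNEL_TO_USER",    # kernel mode -> user mode switch
}


def substitute_third_type(mnemonic):
    """Return the substitute for a third-type instruction, or None if not one."""
    return THIRD_TYPE_SUBSTITUTES.get(mnemonic)
```

Returning `None` for anything else lets the exception handler fall through to the handling for the first and second types.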
  • The MIC may feed back the execution result to the X86 processor, or may output the execution result directly. The output manner includes but is not limited to presenting the execution result through a data output module such as a display module, or performing other actions based on the execution result.
  • Because the MIC runs the uOS operating system, it can schedule processes. When the X86 processor is under heavy load (the usage of the X86 processor is greater than the first threshold, and/or the memory usage of the X86 processor is greater than the second threshold), selected processes are migrated to the MIC for execution, which shares the load of the X86 processor. In particular, the X86 processor can migrate heavily loaded processes to the MIC for execution, which extends the service life of the X86 processor and also helps ensure that each process is allocated enough resources to run efficiently.
  • This embodiment adapts and extends the technical solution of the foregoing embodiment and, from the perspective of the coprocessor 101, proposes a basic implementation flow of a computer instruction processing method.
  • FIG. 4 shows an exemplary workflow of the computer instruction processing method; for convenience of description, only the parts related to this embodiment of the present invention are shown.
  • The computer instruction processing method provided in this embodiment is applied to a processor system that includes a coprocessor and a central processing unit (CPU). The CPU runs a first operating system and the coprocessor runs a second operating system. The first operating system is an operating system that supports the instruction set of the CPU, and the second operating system is an operating system that supports the instruction set of the coprocessor.
  • Through the second operating system, the coprocessor can run processes and threads and schedule between processes and between threads. The CPU can migrate one or more processes to the coprocessor; this embodiment defines the first process as a single process that the CPU migrates to the coprocessor. The CPU may also migrate one or more threads to the coprocessor; this embodiment defines the first thread as a single thread that the CPU migrates to the coprocessor.
  • the CPU and the coprocessor can not only migrate processes to each other, migrate threads to each other, but also migrate one or more binary codes to each other.
  • This embodiment defines a first instruction set. If the CPU migrates a process to the coprocessor, the first instruction set is the binary code required to execute the process; if the CPU migrates a thread to the coprocessor, the first instruction set is the binary code required to execute the thread; if the CPU migrates one or more binary codes to the coprocessor, the first instruction set is the set of binary codes that the CPU migrates to the coprocessor.
  • The binary code in the first instruction set is compiled according to the instruction set of the CPU. Who performs the compilation is not limited: the compilation may be performed under the first operating system, or the first operating system may obtain the first instruction set after another compiler has completed the compilation. The source code, and the programming language in which it is written, are likewise not limited herein.
  • the computer instruction processing method provided in this embodiment includes: step A401, step A402, and step A403.
  • Step A401: the coprocessor receives a first instruction set migrated by the CPU, where the first instruction set is used to instruct the CPU to perform a computer operation in the first operating system, and the first instruction set is a set of binary codes applicable to the first operating system.
  • The condition that triggers the CPU to migrate the first instruction set to the coprocessor is not limited; the CPU may even migrate the first instruction set to the coprocessor under any condition.
  • When the CPU receives an instruction set migration instruction while executing the first instruction set, it migrates the first instruction set to the coprocessor. The instruction set migration instruction is triggered by any of the following three conditions:
  • The first condition is manual triggering: when the CPU and the coprocessor are both integrated into one data processing device, a person operating the data processing device triggers the instruction set migration instruction;
  • The second condition: the CPU determines whether to migrate the first instruction set to the coprocessor according to its CPU usage; if the CPU usage of the CPU is greater than the first threshold, the instruction set migration instruction is triggered;
  • The third condition: the CPU determines whether to migrate the first instruction set to the coprocessor according to the usage of the memory used by the CPU; if the memory usage of the CPU is greater than the second threshold, the instruction set migration instruction is triggered.
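  • The three trigger conditions above can be sketched as a single predicate. This is only an illustrative sketch: the names `CpuLoad` and `should_migrate`, the representation of usage as a fraction between 0 and 1, and the threshold values used in the usage note are assumptions, not part of the described method.

```python
from dataclasses import dataclass


@dataclass
class CpuLoad:
    cpu_usage: float     # fraction of CPU time in use, 0.0 to 1.0 (assumed unit)
    memory_usage: float  # fraction of the CPU's memory in use, 0.0 to 1.0 (assumed unit)


def should_migrate(load, first_threshold, second_threshold, manual_request=False):
    """Return True if any of the three trigger conditions holds."""
    if manual_request:                            # first condition: manual trigger
        return True
    if load.cpu_usage > first_threshold:          # second condition: CPU usage
        return True
    return load.memory_usage > second_threshold   # third condition: memory usage
```

  • For instance, with hypothetical thresholds of 0.9 and 0.8, `should_migrate(CpuLoad(0.95, 0.2), 0.9, 0.8)` is true because only the CPU usage condition holds.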
  • The operation code (opcode) of each binary code in the compiled first instruction set is a binary code that belongs to the instruction set of the CPU. Therefore, every binary code that represents a computer instruction in the first instruction set can be recognized and executed by the CPU.
  • A binary code that represents a computer instruction triggers a computer operation. For example, a process running on an X86 processor executes the binary code "1101100011100000" to perform a floating-point subtraction; as another example, a process running on an X86 processor executes the binary code "1101100011100000" to perform the computer operation of restoring saved FPU register state to the FPU registers.
  • Each binary code in the first instruction set includes not only an opcode but also an operand; the opcode is represented by a binary code, and the operand is also represented by a binary code.
  • the CPU migrates the first instruction set to the coprocessor, and the coprocessor performs step A401 to receive the first instruction set migrated by the CPU.
  • While migrating the first process to the coprocessor, the CPU sends the first instruction set to the coprocessor, and the coprocessor performs step A401 to receive the first instruction set migrated by the CPU.
  • Step A402: the coprocessor obtains a second instruction set according to the first instruction set, where the binary code in the second instruction set is used to instruct the coprocessor to perform a computer operation in the second operating system.
  • As described above, the instruction set of the CPU is divided into a first partial sub-instruction set, a second partial sub-instruction set, and a third partial sub-instruction set. The instruction set of the coprocessor does not include the third partial sub-instruction set, so the instruction set of the coprocessor differs from the instruction set of the CPU. In addition, for each computer instruction in the second partial sub-instruction set, the binary code that the central processor 201 supports for representing the computer instruction is not the same as the binary code that the coprocessor 202 supports for representing it. Therefore, neither the binary codes representing computer instructions in the second partial sub-instruction set nor the binary codes representing computer instructions in the third partial sub-instruction set can be recognized and executed by the coprocessor.
  • This embodiment provides step A402 to convert some or all of the binary codes in the first instruction set and thereby obtain the second instruction set. Compared with the first instruction set, the second instruction set contains more binary codes that the coprocessor can recognize and execute, so the second instruction set is more recognizable and executable. As in the first instruction set, each binary code in the second instruction set triggers a computer operation, and this embodiment expects the computer operations triggered when the coprocessor executes the second instruction set to be the same as the computer operations triggered when the CPU executes the first instruction set.
  • If the first instruction set contains a binary code that represents a computer instruction in the first partial sub-instruction set, then when obtaining the second instruction set according to the first instruction set, step A402 acquires that binary code directly into the second instruction set.
  • If the first instruction set contains a binary code that represents a computer instruction in the second partial sub-instruction set, then when obtaining the second instruction set according to the first instruction set, step A402 converts that binary code into the binary code that the coprocessor supports for representing the same computer instruction, and acquires the converted binary code into the second instruction set.
  • If the first instruction set contains a binary code that represents a computer instruction in the third partial sub-instruction set, then when obtaining the second instruction set according to the first instruction set, step A402 acquires that binary code directly into the second instruction set.
  • The coprocessor not only receives from the CPU the first instruction set required to execute the first process, but also acquires from the CPU other data related to the first process, including the process state of the first process and the register values associated with the first process. The process state includes the necessary state information of a process, such as the process priority, the process identifier, and the stack pointer. The coprocessor stores the register values associated with the first process in the registers of the coprocessor.
  • If the registers of the coprocessor and the registers of the CPU are not the same registers, then when obtaining the second instruction set according to the first instruction set, step A402 needs to replace each CPU register address in the binary code of the first instruction set with a coprocessor register address and acquire the replaced coprocessor register address into the second instruction set. Optionally, if the registers of the coprocessor and the registers of the CPU are the same registers, then when obtaining the second instruction set according to the first instruction set, step A402 acquires the CPU register addresses in the binary code of the first instruction set directly into the second instruction set.
  • The coprocessor not only receives from the CPU the first instruction set required to execute the first thread, but also acquires from the CPU other data related to the first thread, including the thread state of the first thread and the register values associated with the first thread.
  • The coprocessor stores the register values associated with the first thread in the registers of the coprocessor. As in the case described above, whether the CPU register addresses or the coprocessor register addresses are acquired into the second instruction set when step A402 obtains the second instruction set according to the first instruction set is determined according to whether the registers of the coprocessor and the registers of the CPU are the same registers.
  • If the first instruction set migrated by the CPU to the coprocessor is a set of binary codes, the coprocessor also acquires from the CPU other data required to execute the first instruction set, including the register values associated with the first instruction set, and stores those register values in the registers of the coprocessor. As in the case in which the CPU migrates the first instruction set required to execute the first thread, whether the CPU register addresses or the coprocessor register addresses are acquired into the second instruction set when step A402 obtains the second instruction set according to the first instruction set is determined according to whether the registers of the coprocessor and the registers of the CPU are the same registers.
  • Step A403: the coprocessor executes the binary code in the second instruction set.
  • For a migrated process, the second operating system determines, on the coprocessor, the process running node of a second process according to the process state and the register values of the first process; starting from that process running node and using the registers of the coprocessor, it executes the binary code in the second instruction set to run the second process.
  • For a migrated thread, the second operating system determines, on the coprocessor, the thread running node of a second thread according to the thread state and the register values of the first thread; starting from that thread running node and using the registers of the coprocessor, it executes the binary code in the second instruction set to run the second thread.
  • For migrated binary code, the coprocessor uses the registers of the coprocessor to execute the second instruction set.
  • For the first instruction set migrated by the CPU, even if the first instruction set, which is compiled according to the instruction set of the CPU, contains binary codes that the coprocessor cannot recognize, performing step A402 converts some or all of those binary codes and generates the second instruction set accordingly. The coprocessor's recognition rate for the second instruction set is greater than its recognition rate for the first instruction set, and the coprocessor executes the second instruction set, which relieves the CPU of the load required to run the first instruction set.
  • After the coprocessor finishes executing the second instruction set in step A403, whether the execution result of the second instruction set is output to the CPU is determined according to the specific application. If the CPU is to perform other computer operations according to the result, the coprocessor feeds the result back to the CPU; if the coprocessor itself is to perform other computer operations according to the result, the coprocessor need not feed the result back to the CPU. For example, if the next computer operation is that the CPU controls the display module to display the execution result of the second instruction set, the coprocessor returns the execution result to the CPU. If the coprocessor can control the display module directly and the next computer operation is that the coprocessor controls the display module to display the result, the coprocessor need not feed the result back to the CPU and instead directly controls the display module to display the execution result.
  • Step A402 is refined: obtaining, by the coprocessor, the second instruction set according to the first instruction set includes:
  • Step A4021: the coprocessor matches the opcodes of the binary codes in the first instruction set against a preset translation table. If the opcode of a first binary code in the first instruction set is matched in the translation table, the coprocessor translates the opcode of the first binary code into the opcode of a second binary code according to the matching item that corresponds to the opcode of the first binary code in the translation table, and thereby obtains the second binary code. The coprocessor obtains the second instruction set according to the at least one second binary code obtained. The translation table contains the correspondence between the different opcodes that the same computer instruction has when compiled for the first operating system and for the second operating system, and the second binary code is a binary code applicable to the second operating system.
  • For each computer instruction in the second partial sub-instruction set, the instruction set supported by the coprocessor also includes the same computer instruction, but the binary code that the CPU supports for representing the computer instruction differs from the binary code that the coprocessor supports for representing it.
  • A translation table is therefore established. For each computer instruction in the second partial sub-instruction set, the translation table records a correspondence, which includes the binary code supported by the CPU that represents the computer instruction and the binary code supported by the coprocessor that represents the same computer instruction.
  • This embodiment defines a specific computer instruction as a computer instruction whose correspondence is recorded in the translation table. Because this embodiment adds the correspondence of every computer instruction in the second partial sub-instruction set to the translation table, every computer instruction in the second partial sub-instruction set is a specific computer instruction.
  • Step A4021 traverses the binary codes in the first instruction set and, according to the translation table, searches for first binary codes. The opcode of a first binary code is a binary code supported by the CPU that is recorded in the translation table (that is, a binary code supported by the CPU that represents a computer instruction in the second partial sub-instruction set).
  • For each first binary code in the first instruction set, step A4021 replaces the opcode in the first binary code with the matching item that the translation table records for that opcode. The matching item is the binary code supported by the coprocessor that represents the specific computer instruction (the specific computer instruction represented by the opcode in the first binary code). The binary code obtained after the opcode is replaced is used as the second binary code, in which the matching item is the opcode, and the second binary code is acquired into the second instruction set.
  • For example, the binary code with which the CPU expresses the XOR instruction differs depending on the operands: if the operands are the values of two registers, the binary code of the XOR instruction is "0011001"; if the operands are the value of a register and a value in memory, the binary code of the XOR instruction is "0011000". In the coprocessor, however, the binary code of the XOR instruction is uniformly "0011000", so the translation table records the mapping between "0011001" and "0011000".
  • When the coprocessor, according to the translation table, matches the opcode "0011001" of a first binary code in the first instruction set, it uses "0011000" as the opcode of the second binary code. If the first binary code has an operand, how the operand of the second binary code is generated from the operand of the first binary code is not limited herein.
  • When the coprocessor obtains the second instruction set according to the first instruction set, the traversal according to the translation table also finds other binary codes that are not first binary codes; the manner in which such binary codes are acquired into the second instruction set is not limited here.
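  • The XOR example can be sketched as a table lookup on a single binary code. This is a minimal sketch under assumptions: the `BinaryCode` tuple, the bit-string representation, the `translate` function, and the choice to copy the operand unchanged are illustrative only (the text leaves operand handling open).

```python
from typing import NamedTuple, Optional


class BinaryCode(NamedTuple):
    opcode: str   # opcode bits as a string of '0' and '1' (assumed representation)
    operand: str  # operand bits; may be empty


# Translation table: opcode supported by the CPU -> opcode supported by the
# coprocessor for the same computer instruction (the XOR entry from the text).
TRANSLATION_TABLE = {"0011001": "0011000"}


def translate(code: BinaryCode, table=TRANSLATION_TABLE) -> Optional[BinaryCode]:
    """Step A4021 for one binary code: return the second binary code,
    or None if the opcode is not matched in the translation table."""
    new_opcode = table.get(code.opcode)
    if new_opcode is None:
        return None
    # Assumption: the operand is carried over unchanged.
    return code._replace(opcode=new_opcode)
```

  • With a hypothetical operand "1100", `translate(BinaryCode("0011001", "1100"))` yields a second binary code whose opcode is "0011000", while a code whose opcode is not in the table yields `None`.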
  • To address how step A402 handles a first instruction set that contains binary codes representing computer instructions in the first partial sub-instruction set and/or the third partial sub-instruction set, step A402 is further refined. Referring to FIG. 5, obtaining, by the coprocessor, the second instruction set according to the first instruction set further includes:
  • Step A4022: if the opcode of a third binary code in the first instruction set is not matched in the translation table, the coprocessor uses the third binary code as a binary code in the second instruction set.
  • The third binary code is one of the other binary codes, found when the first instruction set is traversed according to the translation table, that are not first binary codes.
  • If the opcode of a binary code in the first instruction set is a binary code representing a computer instruction in the first partial sub-instruction set, the opcode of that binary code is not matched in the translation table; likewise, if the opcode of a binary code in the first instruction set is a binary code representing a computer instruction in the third partial sub-instruction set, the opcode of that binary code is not matched in the translation table. Therefore, the opcode of the third binary code may be a binary code representing a computer instruction in the first partial sub-instruction set, or a binary code representing a computer instruction in the third partial sub-instruction set.
  • For each binary code in the first instruction set whose opcode is not matched in the translation table (that is, each third binary code), step A4022 acquires the third binary code directly from the first instruction set into the second instruction set.
  • Step A402 may include step A4021 and/or step A4022; whether step A4021 or step A4022 is performed when step A402 is performed is determined according to the specific implementation scenario.
  • For example, if the first instruction set contains no specific computer instruction and the CPU and the coprocessor use the same registers to execute the process, step A402 includes step A4022; if the first instruction set contains a specific computer instruction, step A402 includes step A4021. FIG. 5 shows a schematic diagram in which step A4021 and/or step A4022 are performed in step A402.
  • If both step A4021 and step A4022 are to be performed in step A402, step A4021 may be performed first and then step A4022, or step A4021 and step A4022 may be performed in parallel.
  • When step A4021 and step A4022 are performed in parallel, the first instruction set is traversed according to the translation table to search for first binary codes. Each time it is determined whether a binary code in the first instruction set is a first binary code, the result determines whether step A4021 or step A4022 is performed: if the binary code is a first binary code, step A4021 is performed to acquire the corresponding second binary code into the second instruction set; if the binary code is a third binary code, step A4022 is performed to acquire the third binary code directly into the second instruction set.
  • The traversal of the first instruction set according to the translation table is performed in the execution order of the binary codes in the first instruction set.
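  • The traversal described above, which dispatches each binary code to step A4021 or step A4022 in execution order, can be sketched as follows. This is an illustrative sketch only: the function name, the `(opcode, operand)` tuple representation, and the opcode "1111111" used in the usage note are assumptions.

```python
def build_second_instruction_set(first_instruction_set, translation_table):
    """Traverse the first instruction set in execution order and build the
    second instruction set. Each item is an (opcode, operand) pair of bit strings."""
    second_instruction_set = []
    for opcode, operand in first_instruction_set:
        if opcode in translation_table:
            # First binary code: step A4021 replaces the opcode with the
            # matching item recorded in the translation table.
            second_instruction_set.append((translation_table[opcode], operand))
        else:
            # Third binary code: step A4022 takes it over unchanged.
            second_instruction_set.append((opcode, operand))
    return second_instruction_set
```

  • For example, with the table `{"0011001": "0011000"}`, the input `[("0011001", "01"), ("1111111", "10")]` becomes `[("0011000", "01"), ("1111111", "10")]`; the relative order of the binary codes is preserved.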
  • The translation table may record one or more correspondences. The translation table used in step A4021 above records the correspondence of every computer instruction in the second partial sub-instruction set, whereas the translation table here may record fewer correspondences than that table. The translation table here can therefore be updated, for example by adding to it the correspondence of a computer instruction in the second partial sub-instruction set, or by deleting one or more correspondences from it. This provides an alternative to step A4021: when step A4021 is performed, the translation table described here is used in place of the translation table described above.
  • In an optional refinement of the computer instruction processing method, before the coprocessor executes the binary code in the second instruction set, the method further includes step A601:
  • Step A601: the coprocessor converts the register addresses of the CPU in the binary code contained in the second instruction set into register addresses of the coprocessor.
  • The register address of the CPU is represented by a binary code, and the register address of the coprocessor is also represented by a binary code. If the registers of the CPU and the registers of the coprocessor are not the same type of register, the binary code that represents a register address of the CPU is not the same as the binary code that represents a register address of the coprocessor.
  • The coprocessor uses its own registers to run the second instruction set obtained from the first instruction set that the CPU migrated to the coprocessor. Before executing the second instruction set, the coprocessor searches the binary code of the second instruction set for register addresses of the CPU. If any are found, each CPU register address found in the second instruction set is replaced with the corresponding coprocessor register address according to a matching replacement relationship, which is a mapping between the register addresses of the coprocessor and the register addresses of the CPU.
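  • The replacement in step A601 can be sketched as a lookup in the matching replacement relationship. This is a sketch under assumptions: the register address bit patterns in `REGISTER_MAP`, the function name, and the representation of a binary code as an opcode plus a tuple of register address fields are all hypothetical.

```python
# Hypothetical matching replacement relationship:
# CPU register address -> coprocessor register address.
REGISTER_MAP = {"00001": "10001", "00010": "10010"}


def convert_register_addresses(instruction_set, register_map=REGISTER_MAP):
    """Replace every CPU register address that appears in the instruction set
    with the mapped coprocessor register address. Each item is an
    (opcode, registers) pair, where registers is a tuple of address bit strings."""
    converted = []
    for opcode, registers in instruction_set:
        # Addresses without an entry in the mapping are left unchanged.
        new_registers = tuple(register_map.get(r, r) for r in registers)
        converted.append((opcode, new_registers))
    return converted
```

  • Keeping the mapping in a separate table means the same routine could serve either before execution (step A601) or during step A402; that reuse is a design assumption of this sketch rather than something the text prescribes.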
  • As an alternative to step A601, step A602 is performed when the coprocessor obtains the second instruction set according to the first instruction set in step A402: the register addresses of the CPU in the binary code contained in the first instruction set are converted into register addresses of the coprocessor, and the register addresses of the coprocessor are acquired into the second instruction set. In this case, the execution order of step A602 relative to step A4021 is not limited; typically, this alternative is performed in parallel with step A4021.
  • The register of the CPU is a 128-bit register or a 256-bit register, and the register of the coprocessor is a 512-bit register.
  • The register of the CPU is a 128-bit XMM register or a 256-bit YMM register, and the register of the coprocessor is a 512-bit ZMM register.
  • The computer instructions in the first instruction set that use the registers of the CPU are vectorization instructions, and the computer instructions in the second instruction set that use the registers of the coprocessor are vectorization instructions.
  • The first instruction set is migrated by the CPU to the coprocessor when the CPU usage of the CPU is greater than a first threshold.
  • While executing the first instruction set, the CPU may optionally execute one or more other binary codes in parallel. If the CPU usage is greater than the first threshold, the CPU usage is too high, and the first instruction set is migrated to the coprocessor to reduce the load of the CPU.
  • The first process is taken as an example to explain how to select the first instruction set to be migrated to the coprocessor; the manner of selecting the first process is also suitable for selecting the first thread. The first process is selected as follows: if the CPU runs only one process, that process is the first process. If the CPU runs multiple processes in parallel, several optional refinements are provided for determining the first process from among the multiple processes executed by the CPU:
  • In the first optional refinement, when the CPU usage of the CPU is greater than the first threshold, one or more first processes are selected from the processes whose priority is less than a priority threshold; preferably, the process with the lowest priority is selected as the first process;
  • In the second optional refinement, one or more first processes are selected from the processes whose CPU usage is greater than an occupancy threshold; preferably, the process with the highest CPU usage is selected as the first process.
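  • The two selection refinements can be sketched as follows. This is illustrative only: the `Process` record, the function names, and the convention that a smaller number means a lower priority are assumptions; only the preferred single-process choice of each refinement is shown.

```python
from dataclasses import dataclass


@dataclass
class Process:
    pid: int
    priority: int     # assumption: a smaller value means a lower priority
    cpu_usage: float  # fraction of CPU time used by this process


def select_by_priority(processes, priority_threshold):
    """First refinement: among processes below the priority threshold,
    prefer the one with the lowest priority."""
    if len(processes) == 1:
        return processes[0]  # a single running process is the first process
    candidates = [p for p in processes if p.priority < priority_threshold]
    return min(candidates, key=lambda p: p.priority) if candidates else None


def select_by_cpu_usage(processes, occupancy_threshold):
    """Second refinement: among processes above the occupancy threshold,
    prefer the one with the highest CPU usage."""
    if len(processes) == 1:
        return processes[0]
    candidates = [p for p in processes if p.cpu_usage > occupancy_threshold]
    return max(candidates, key=lambda p: p.cpu_usage) if candidates else None
```

  • Both functions return `None` when no process qualifies, which leaves the decision not to migrate to the caller.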
  • Receiving, by the coprocessor, the first instruction set migrated by the CPU includes step A4011 and step A4012.
  • Step A4011: the coprocessor receives the address, sent by the CPU, of the first instruction set to be migrated, where the address of the first instruction set is the address at which the first instruction set is stored in the memory of the CPU, and the CPU sends the address of the first instruction set to the coprocessor when the memory usage of the CPU is less than or equal to a second threshold;
  • Step A4012: the coprocessor acquires the first instruction set by accessing the memory of the CPU based on the address of the first instruction set.
  • If the CPU usage of the CPU is greater than the first threshold and the memory usage of the CPU is less than or equal to the second threshold, the CPU usage is too high but the memory usage is not; in this case, the CPU sends the address of the first instruction set to the coprocessor.
  • The address of the first instruction set is the physical address at which the first instruction set is stored in the memory of the CPU.
  • The coprocessor performs step A4012 to access the memory of the CPU according to the address of the first instruction set and reads the first instruction set from the memory of the CPU.
  • Step A402 is then performed to obtain the second instruction set according to the first instruction set, and the obtained second instruction set is stored in the memory of the CPU. One optional specific manner is to replace the first instruction set in the memory of the CPU with the obtained second instruction set.
  • An optional implementation of step A4011 and step A4012 is as follows: when the CPU usage is greater than the first threshold and the memory usage of the CPU is less than or equal to the second threshold, the CPU sends to the coprocessor the storage address, in the memory of the CPU, of the data related to the first instruction set; the coprocessor then accesses the memory of the CPU over a bus (such as the PCI-E bus) according to the storage address and reads the data related to the first instruction set from the memory of the CPU.
  • The data related to the first instruction set includes the first instruction set and also other data required to execute the first instruction set, such as the running state of the first instruction set (for example, the process state of the first process).
  • Alternatively, the first instruction set is sent by the CPU to the coprocessor when the memory usage of the CPU is greater than the second threshold.
  • If the memory usage of the CPU is greater than the second threshold, the memory usage is too high. In this case, regardless of whether the CPU usage of the CPU is greater than the first threshold, the CPU migrates the first instruction set to the coprocessor, and in step A401 the coprocessor receives the first instruction set and stores it in the memory of the coprocessor.
  • A specific optional implementation is as follows: when the memory usage of the CPU is greater than the second threshold, the CPU reads the data related to the first instruction set from the memory used by the CPU and sends that data to the coprocessor. The coprocessor receives the data related to the first instruction set and stores it in the coprocessor memory. The coprocessor then extracts the first instruction set from that data, performs step A402 to obtain the second instruction set according to the first instruction set, stores the obtained second instruction set in the coprocessor memory, and executes the second instruction set from the coprocessor memory.
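  • The two delivery paths (sending only an address versus sending the data itself) depend on the two thresholds and can be sketched as a small decision function. This is a sketch under assumptions: the `TransferMode` names, the function name, and the usage values in the usage note are illustrative.

```python
from enum import Enum


class TransferMode(Enum):
    NONE = "no migration triggered by load"
    BY_ADDRESS = "CPU sends the address; coprocessor reads the CPU memory"
    BY_COPY = "CPU sends the data; coprocessor stores it in its own memory"


def choose_transfer_mode(cpu_usage, memory_usage, first_threshold, second_threshold):
    """Pick how the first instruction set reaches the coprocessor."""
    if memory_usage > second_threshold:
        # Memory usage too high: send the data itself, whatever the CPU usage.
        return TransferMode.BY_COPY
    if cpu_usage > first_threshold:
        # CPU usage too high but memory usage is not: send only the address
        # (steps A4011 and A4012).
        return TransferMode.BY_ADDRESS
    return TransferMode.NONE
```

  • With hypothetical thresholds of 0.9 and 0.8, a CPU usage of 0.95 with a memory usage of 0.5 selects `BY_ADDRESS`, while any memory usage above 0.8 selects `BY_COPY`.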
  • the processing may be performed in any one of the following four alternative manners.
  • When the coprocessor performs step A403 to execute the binary code in the second instruction set, the execution specifically includes step A801, step A802, step A803, and step A804.
  • Step A801: the coprocessor sequentially executes the binary code in the second instruction set;
  • Step A802: if a binary code recognition exception is detected while the second instruction set is being executed, the fourth binary code that triggered the exception is determined;
  • Step A803: the fourth binary code is converted into an intermediate code, and the intermediate code is converted into a fifth binary code applicable to the second operating system, where the fifth binary code is one or more binary codes;
  • Step A804: the fifth binary code is executed, and execution continues with the binary code after the fourth binary code in the second instruction set.
  • Step A402 obtains the second instruction set according to the first instruction set. Although the binary codes in the obtained second instruction set are used to instruct the coprocessor to perform computer operations in the second operating system, not every binary code in the second instruction set can necessarily be recognized by the coprocessor.
  • When the coprocessor reaches a binary code that it cannot recognize, a binary code recognition exception is triggered; the binary code that triggers the binary code recognition exception is defined as the fourth binary code.
  • For example, the binary codes in the first instruction set that represent computer instructions in the third partial sub-instruction set are acquired directly into the second instruction set. When the coprocessor, while executing the second instruction set, reaches a binary code whose opcode is such a binary code, it cannot recognize the binary code and a binary code recognition exception is triggered; that binary code is therefore a fourth binary code.
  • In step A801, the coprocessor sequentially executes the binary code in the second instruction set. If step A802 detects a binary code recognition exception, the fourth binary code that triggered the exception is determined. Step A803 then converts the fourth binary code into an intermediate code and converts the intermediate code into a fifth binary code that the coprocessor can recognize. Step A804 then executes the fifth binary code and continues with the binary code that follows the fourth binary code in the second instruction set.
  • The intermediate code has a mapping relationship with the fourth binary code, in which one fourth binary code may correspond to one or more intermediate codes; the intermediate code also has a mapping relationship with the fifth binary code. Provided that both mapping relationships exist, the specific form of the intermediate code is not limited. For example, Java bytecode may be used as the intermediate code: a mapping between the fourth binary code and the corresponding Java bytecode is determined when the fourth binary code is converted into that Java bytecode, and a mapping between that Java bytecode and the fifth binary code is determined when the Java bytecode is converted into the fifth binary code. Similarly, the conversion of the fourth binary code into the fifth binary code in step A803 may use the intermediate code of the Simics simulator or the intermediate code of the QEMU emulator.
  • For example, the MIC does not support 22 computer instructions included in the third partial sub-instruction set, such as CMOV, OUT, PAUSE, and SYSEXIT. When the MIC reaches a fourth binary code whose operation code is the binary code of one of these instructions, the code is not recognized and a binary code recognition exception occurs.
  • Taking the conditional move instruction (CMOV) as an example: when the MIC executes a fourth binary code whose operation code represents CMOV, a binary code recognition exception is triggered and is detected in step A802.
  • In step A803, a condition determination instruction and a move instruction expressed in intermediate code are determined from the CMOV instruction and are translated into a condition determination instruction and a move instruction (MOV) that the MIC can recognize. In step A804, the MIC first executes the condition determination instruction to check whether the condition is satisfied and, if so, executes the move instruction (MOV); it then continues executing the second instruction set.
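The CMOV handling described above (a condition check followed by an ordinary MOV) can be sketched as below. The register names, the `flags` dictionary, and the function name are assumptions made for illustration; the sketch shows only why one unsupported instruction may turn into two supported ones.

```python
def emulate_cmov(registers, flags, condition, dst, src):
    """Emulate a conditional move with a condition check plus an ordinary move.

    registers: dict of register name -> value (hypothetical register file)
    flags:     dict of condition name -> bool (hypothetical status flags)
    """
    # First replacement instruction: evaluate the condition.
    condition_met = bool(flags.get(condition, False))
    # Second replacement instruction: an ordinary MOV, run only if the condition holds.
    if condition_met:
        registers[dst] = registers[src]
    return condition_met
```

For instance, with `flags = {"zero": True}` the destination register takes the source value; with `{"zero": False}` it is left unchanged.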
  • Step A803 can convert the fourth binary code, which represents the binary code of the computer instruction, into the fifth binary code by way of the intermediate code. One fourth binary code may be converted into one or more fifth binary codes; the number of fifth binary codes is not limited here.
  • When an exception occurs while step A403 executes the second instruction set, execution of the second instruction set is suspended and exception information is output. The exception information includes the fourth binary code that triggered the exception, the exception type, the result of the abnormal execution, and so on.
  • The status of executing the second instruction set is written to the run log of the second instruction set in real time, and the exception information is also written to that run log. If the exception information shows that the exception is a binary code recognition exception, step A802 determines the fourth binary code that triggered the exception from the exception information.
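One possible shape for the exception information and the run log is sketched below. The field names, the `RECOGNITION` label, and the idea of scanning the log from the newest entry are assumptions; the text above only states which items the exception information contains and that it is written to the run log.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExceptionInfo:
    triggering_code: bytes  # the binary code that triggered the exception
    exception_type: str     # hypothetical label for the kind of exception
    result: str             # description of the abnormal execution result


# Hypothetical label for a binary code recognition exception.
RECOGNITION = "binary_code_recognition"


def find_fourth_binary_code(run_log: List[object]) -> Optional[bytes]:
    """Return the triggering code of the newest recognition exception in the log."""
    for entry in reversed(run_log):
        if isinstance(entry, ExceptionInfo) and entry.exception_type == RECOGNITION:
            return entry.triggering_code
    return None
```

Status lines and exception records share one log here, so the lookup skips anything that is not an `ExceptionInfo` of the recognition type.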
  • In this way, even if a binary code recognition exception occurs while the coprocessor executes the second instruction set, the fourth binary code can be converted indirectly, via the intermediate code, into a fifth binary code that the coprocessor supports. Execution of the second instruction set resumes from the fifth binary code, which overcomes the exception and keeps the second instruction set executing normally. Each exception that occurs during execution of the second instruction set can be handled in the same way, with execution continuing after each one.
  • An extended instruction set supported only by the coprocessor is developed for the coprocessor's special applications; the computer instructions (represented in binary code) included in the extended instruction set are supported only by the coprocessor.
  • The intermediate code has a mapping relationship with one or more computer instructions included in the extended instruction set. Therefore, when step A803 converts the intermediate code into the fifth binary code, the operation code of the fifth binary code may be the binary code of a computer instruction in the extended instruction set.
  • Because the fourth binary code is indirectly converted into a binary code whose operation code is the binary code of a computer instruction in the extended instruction set, the exception indicated by the exception information is overcome and the efficiency with which the coprocessor executes the second instruction set can also be improved.
  • In the second alternative manner, referring to FIG. 9, steps A901 and A902 are performed before the fourth binary code is converted into intermediate code in step A803.
  • Step A901: send an instruction set fetch request to the CPU.
  • Step A902: receive the fetch rejection instruction sent by the CPU.
  • When a binary code recognition exception occurs while step A403 executes the second instruction set, step A901 sends an instruction set fetch request to the CPU. The CPU responds to the request by determining whether to move the second instruction set back to the CPU for execution; if it decides not to fetch the second instruction set, it sends a fetch rejection instruction to the coprocessor. If the coprocessor receives the fetch rejection instruction in step A902, step A803 is executed to convert the fourth binary code into intermediate code.
  • An optional way for the CPU to respond to the instruction set fetch request is to decide, according to the CPU's load, whether to move the second instruction set back to the CPU for execution. The CPU load includes the CPU usage and the memory usage of the CPU's memory. If the CPU usage is greater than a third threshold, or the memory usage is greater than a fourth threshold, the CPU sends the fetch rejection instruction to the coprocessor.
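The load-based rejection rule can be written as a single predicate. This is a sketch only: the function name and the numeric values in the usage note are hypothetical, and the text does not specify concrete thresholds.

```python
def cpu_rejects_fetch(cpu_usage, memory_usage, third_threshold, fourth_threshold):
    """Return True if the CPU should send the fetch rejection instruction.

    The CPU rejects when either usage figure exceeds its threshold.
    """
    return cpu_usage > third_threshold or memory_usage > fourth_threshold
```

With assumed thresholds of 0.8 for both figures, a CPU at 0.9 usage rejects the request regardless of its memory usage, and the coprocessor then proceeds to step A803.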
  • Step A403, executing the binary code in the second instruction set, includes steps B1001, B1002, and B1003.
  • Step B1001: the coprocessor sequentially executes the binary code in the second instruction set.
  • Step B1002: if a binary code recognition exception is detected while the second instruction set is being executed, determine the sixth binary code that triggered the exception.
  • Step B1003: obtain a third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code, and migrate the third instruction set to the CPU.
  • Step A402 obtains the second instruction set from the first instruction set, and not every binary code in the second instruction set is necessarily recognized by the coprocessor; executing an unrecognized binary code causes the coprocessor to trigger a binary code recognition exception. This embodiment defines the binary code that triggers the binary code recognition exception as the sixth binary code.
  • The sixth binary code is defined in the same way as the fourth binary code; see the explanation of the fourth binary code in the first alternative manner.
  • For example, when the coprocessor, executing the second instruction set, reaches a binary code whose operation code it does not recognize, a binary code recognition exception is triggered; that binary code is therefore a sixth binary code.
  • Step B1003 is executed to handle the binary code recognition exception.
  • In step B1003, the coprocessor obtains the third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code. The implementation principle is the same as that of obtaining the second instruction set from the first instruction set in step A402; see the explanation of step A402 and of its optional refinements, for example steps A4021 and A4022.
  • Similar to step A4021, when step B1003 obtains the third instruction set applicable to the first operating system, it traverses, according to the translation table, each binary code in the second instruction set starting at the sixth binary code and checks whether it is a seventh binary code. The operation code of a seventh binary code is the binary code, recorded in the translation table, of a specific computer instruction supported by the coprocessor.
  • The operation code of the seventh binary code is replaced with its matching entry in the translation table, and the binary code after the replacement is the eighth binary code; that is, the matching entry is the operation code of the eighth binary code. The eighth binary code is acquired into the third instruction set.
  • In this way, when step B1003 obtains the third instruction set, each seventh binary code in the second instruction set from the sixth binary code onward is converted into a corresponding eighth binary code, and the eighth binary code is acquired into the third instruction set.
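The reverse translation in step B1003 can be sketched as a table lookup in the opposite direction. The sketch assumes each binary code is an `(opcode, operands)` pair and reuses the placeholder codes "BBBB" and "B1B1B1B1" from the description of FIG. 3; the table layout, the function name, and the decision to copy unmatched codes unchanged are assumptions.

```python
# Hypothetical translation table: CPU opcode -> coprocessor opcode.
TRANSLATION_TABLE = {"BBBB": "B1B1B1B1"}


def build_third_instruction_set(second_set, sixth_index, table=TRANSLATION_TABLE):
    """Translate the unexecuted tail of the second instruction set back for the CPU.

    second_set:  list of (opcode, operands) pairs
    sixth_index: position of the sixth binary code (the one that triggered the exception)
    """
    # Invert the table so coprocessor opcodes map back to CPU opcodes.
    reverse = {coprocessor: cpu for cpu, coprocessor in table.items()}
    third = []
    for opcode, operands in second_set[sixth_index:]:
        # A "seventh binary code" has a coprocessor-specific opcode found in the
        # table; replacing the opcode yields the "eighth binary code".
        # Codes without a match are copied unchanged (assumption).
        third.append((reverse.get(opcode, opcode), operands))
    return third
```

Only the slice starting at the sixth binary code is translated, since the earlier codes have already been executed on the coprocessor.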
  • In the first processing mode, the register address conversion is performed by the coprocessor. Specifically, when the coprocessor obtains the third instruction set applicable to the first operating system in step B1003, it also converts the coprocessor register addresses contained in the binary code starting at the sixth binary code into CPU register addresses, so that the third instruction set contains CPU register addresses. The order in which the translation-table conversion to the eighth binary code and this register address conversion are performed is not limited; they are usually performed in parallel.
  • In the second processing mode, the conversion is performed by the CPU. Specifically, after receiving the third instruction set migrated by the coprocessor in step B1003, the CPU replaces the coprocessor register addresses in the third instruction set with CPU register addresses.
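The two processing modes differ only in which side applies the register mapping, which the sketch below makes explicit. The register names ("c0", "r0"), the `(opcode, operands)` representation, and the `mode` strings are hypothetical.

```python
def convert_registers(instruction_set, register_map):
    """Replace coprocessor register addresses with CPU register addresses."""
    return [
        (opcode, tuple(register_map.get(operand, operand) for operand in operands))
        for opcode, operands in instruction_set
    ]


def migrate(third_set, register_map, mode):
    """Return (what the coprocessor sends, what the CPU finally holds).

    mode "coprocessor": convert before sending (first processing mode).
    mode "cpu":         convert after receiving (second processing mode).
    """
    if mode == "coprocessor":
        sent = convert_registers(third_set, register_map)
        received = sent
    elif mode == "cpu":
        sent = list(third_set)
        received = convert_registers(sent, register_map)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sent, received
```

Either mode leaves the CPU with the same instruction set; the choice only moves the conversion work from one processor to the other.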
  • If the second instruction set is stored in the memory of the CPU, step B1003 replaces the second instruction set stored there with the third instruction set, according to the storage address of the second instruction set in the CPU's memory. If the second instruction set is stored in the memory of the coprocessor, step B1003 migrates the third instruction set to the CPU so that the CPU stores the third instruction set in its own memory.
  • After executing the sixth binary code triggers a binary code recognition exception, the coprocessor migrates the unexecuted part of the second instruction set to the CPU. During this migration, if data related to the unexecuted part is stored in the coprocessor's memory, that data is sent to the CPU so that the CPU stores it in its own memory; this data includes the third instruction set migrated to the CPU.
  • The coprocessor also sends the register values associated with the unexecuted part of the second instruction set to the CPU, and the CPU stores them in the CPU registers that correspond to the register addresses in the third instruction set.
  • Before the third instruction set is obtained in step B1003, steps B1101 and B1102 are also performed.
  • Step B1101: send an instruction set fetch request to the CPU.
  • Step B1102: receive the instruction set fetch response sent by the CPU.
  • When a binary code recognition exception occurs while step A403 executes the second instruction set, step B1101 sends an instruction set fetch request to the CPU. The CPU responds by determining whether to move the second instruction set back to the CPU for execution; if it decides to fetch the second instruction set, it returns an instruction set fetch response to the coprocessor.
  • After the coprocessor receives the instruction set fetch response in step B1102, it performs step B1003 to obtain the third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code.
  • An optional way for the CPU to respond to the instruction set fetch request is to decide, according to the CPU's load, whether to move the second instruction set back to the CPU for execution. The CPU load includes the CPU usage and the memory usage of the CPU's memory. If the CPU usage is less than or equal to a fifth threshold and the memory usage is less than or equal to a sixth threshold, the CPU sends the instruction set fetch response to the coprocessor.
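The request and response exchange can be sketched as two small functions, one per side. Combining the acceptance rule here with a rejection fallback in a single dispatcher is an assumption made for illustration (the text describes the two outcomes in separate alternatives), and the names `Reply`, `cpu_reply`, and `coprocessor_action` are hypothetical.

```python
from enum import Enum


class Reply(Enum):
    FETCH_RESPONSE = "fetch_response"
    REJECT = "reject"


def cpu_reply(cpu_usage, memory_usage, fifth_threshold, sixth_threshold):
    """CPU side: accept only when both usage figures are at or below their thresholds."""
    if cpu_usage <= fifth_threshold and memory_usage <= sixth_threshold:
        return Reply.FETCH_RESPONSE
    return Reply.REJECT  # fallback assumed for this sketch


def coprocessor_action(reply):
    """Coprocessor side: pick how to handle the recognition exception."""
    if reply is Reply.FETCH_RESPONSE:
        # Step B1003: build the third instruction set and migrate it to the CPU.
        return "migrate_third_instruction_set"
    # Step A803: translate via intermediate code and keep executing locally.
    return "translate_via_intermediate_code"
```

A lightly loaded CPU therefore takes the remaining work back, while a busy one leaves the coprocessor to recover on its own.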
  • FIG. 12 is a schematic diagram of an optional logical structure of the coprocessor 202 of this embodiment. The coprocessor 202 is applied to a processor system that includes the coprocessor 202 and a central processing unit (CPU) running a first operating system; the coprocessor 202 runs a second operating system.
  • The coprocessor 202 includes:
  • a first instruction set receiving unit 2021, configured to receive a first instruction set migrated by the CPU, where the first instruction set is used to instruct the CPU to perform a computer operation in the first operating system and is a set of binary codes applicable to the first operating system;
  • a second instruction set obtaining unit 2022, configured to obtain a second instruction set according to the first instruction set, where the binary code in the second instruction set is used to instruct the coprocessor 202 to perform the computer operation in the second operating system; and
  • a second instruction set execution unit 2023, configured to execute the binary code in the second instruction set.
  • The second instruction set obtaining unit 2022 being configured to obtain the second instruction set according to the first instruction set includes:
  • the second instruction set obtaining unit 2022 is configured to match the operation codes of the binary codes in the first instruction set against a preset translation table; if the operation code of a first binary code in the first instruction set is matched in the translation table, translate the operation code of the first binary code into the operation code of a second binary code according to the corresponding matching entry, thereby obtaining the second binary code; and obtain the second instruction set from the at least one second binary code so obtained. The translation table contains the correspondence between the different operation codes that the same computer instruction compiles to in the first operating system and in the second operating system, and the second binary code is a binary code applicable to the second operating system.
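The translation-table lookup performed by the obtaining unit 2022 can be sketched as follows. The `(opcode, operands)` representation and the function name are assumptions, and the placeholder codes are borrowed from the description of FIG. 3. Copying an unmatched code unchanged corresponds to the handling of the third binary code described elsewhere in the text.

```python
# Hypothetical translation table: CPU opcode -> coprocessor opcode.
TRANSLATION_TABLE = {"BBBB": "B1B1B1B1"}


def obtain_second_instruction_set(first_set, table=TRANSLATION_TABLE):
    """Build the second instruction set from the first by opcode translation."""
    second = []
    for opcode, operands in first_set:
        if opcode in table:
            # "First binary code": matched, so its opcode is translated,
            # which yields the "second binary code".
            second.append((table[opcode], operands))
        else:
            # "Third binary code": not matched, so it is used as is.
            second.append((opcode, operands))
    return second
```

Operands pass through untouched in this sketch; register address conversion is a separate step handled by the register address conversion unit.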
  • The coprocessor 202 further includes:
  • a register address conversion unit 2024, configured to convert the CPU register addresses in the binary code included in the second instruction set into register addresses of the coprocessor 202.
  • The second instruction set obtaining unit 2022 is further configured to: if the operation code of a third binary code in the first instruction set is not matched in the translation table, use the third binary code as a binary code in the second instruction set.
  • The first instruction set is migrated by the CPU to the coprocessor 202 when the CPU usage of the CPU is greater than a first threshold.
  • The first instruction set receiving unit 2021 being configured to receive the first instruction set migrated by the CPU includes:
  • the first instruction set receiving unit 2021 is configured to receive the address, sent by the CPU, of the first instruction set to be migrated, and to access the CPU's memory based on that address to obtain the first instruction set. The address of the first instruction set is the address at which the first instruction set is stored in the CPU's memory, and it is sent by the CPU to the coprocessor 202 when the CPU's memory usage is less than or equal to a second threshold.
  • Alternatively, the first instruction set is sent by the CPU to the coprocessor 202 when the CPU's memory usage is greater than the second threshold.
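The threshold rules for migrating the first instruction set can be combined into one decision function. Treating them as a single decision, and returning "no_migration" when the CPU usage does not exceed the first threshold, are assumptions; the text states each condition as a separate implementation.

```python
def plan_migration(cpu_usage, memory_usage, first_threshold, second_threshold):
    """Decide whether and how the CPU hands the first instruction set over."""
    if cpu_usage <= first_threshold:
        return "no_migration"  # assumed: migration is only triggered above the threshold
    if memory_usage <= second_threshold:
        # CPU memory is not under pressure: send only the address and let the
        # coprocessor read the instruction set from the CPU's memory.
        return "send_address"
    # CPU memory is under pressure: send the instruction set itself.
    return "send_instruction_set"
```

Sending only the address avoids a copy when the CPU's memory can keep holding the instruction set; sending the set itself lets the CPU free that memory.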
  • The second instruction set execution unit 2023 being configured to execute the binary code in the second instruction set includes:
  • the second instruction set execution unit 2023 is configured to: sequentially execute the binary code in the second instruction set; if a binary code recognition exception is detected while the second instruction set is being executed, determine the fourth binary code that triggered the exception; convert the fourth binary code into intermediate code and convert the intermediate code into a fifth binary code applicable to the second operating system; and execute the fifth binary code and continue executing the binary code that follows the fourth binary code in the second instruction set.
  • The second instruction set execution unit 2023 is further configured to, before converting the fourth binary code into intermediate code, send an instruction set fetch request to the CPU and receive the fetch rejection instruction sent by the CPU.
  • Alternatively, the second instruction set execution unit 2023 being configured to execute the binary code in the second instruction set includes:
  • the second instruction set execution unit 2023 is configured to: sequentially execute the binary code in the second instruction set; if a binary code recognition exception is detected while the second instruction set is being executed, determine the sixth binary code that triggered the exception; obtain a third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code; and migrate the third instruction set to the CPU.
  • The second instruction set execution unit 2023 is further configured to, before obtaining the third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code, send an instruction set fetch request to the CPU and receive the instruction set fetch response sent by the CPU.
  • FIG. 14 is a schematic diagram of the hardware structure of the coprocessor 1401 according to this embodiment.
  • The coprocessor 1401 is connected to the memory 1402 via a bus 1403. The memory 1402 is used to store computer-executable instructions; the coprocessor 1401 reads the computer-executable instructions stored in the memory 1402 and executes the computer instruction processing method provided by the above embodiment.
  • For a specific implementation of the computer instruction processing method, refer to the related description in the foregoing embodiment; details are not repeated here.
  • The coprocessor 1401 may be an Intel Many Integrated Core (MIC) device, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and executes the related program to implement the technical solution provided by the foregoing method embodiment, including the computer instruction processing method provided by the foregoing embodiment.
  • The memory 1402 may be a read-only memory (ROM), a static storage device, a dynamic storage device, or a random access memory (RAM).
  • The memory 1402 can store an operating system and other applications.
  • The program code for implementing the technical solution provided by the foregoing method embodiment, including the program code of the computer instruction processing method that the foregoing embodiment applies to the coprocessor 1401, is stored in the memory 1402 and executed by the coprocessor 1401.
  • The bus 1403 may include a path for transmitting information between the components, namely the coprocessor 1401 and the memory 1402.
  • Although FIG. 14 shows only the coprocessor 1401, the memory 1402, and the bus 1403, those skilled in the art will understand that, in a specific implementation, the coprocessor 1401 also includes other devices necessary for normal operation, such as a communication interface. Those skilled in the art will also appreciate that the coprocessor 1401 may include hardware devices that implement additional functions, depending on particular needs, and that the coprocessor 1401 may include only the components necessary to implement the above method embodiments, not necessarily all of the devices shown in FIG. 14.
  • The processor system 200 includes a central processing unit (CPU) 201 and a coprocessor 202.
  • The CPU 201 runs a first operating system, and the coprocessor 202 runs a second operating system.
  • The CPU 201 is configured to migrate the first instruction set to the coprocessor 202.
  • The coprocessor 202 is configured to execute the computer instruction processing method provided by the foregoing embodiment or by an optional refinement of the foregoing embodiment.
  • The disclosed systems, devices, and methods may be implemented in other manners.
  • The device embodiments described above are merely illustrative.
  • For example, the division into modules and units is only a division by logical function; in an actual implementation there may be other divisions, for example multiple modules, units, or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or module, and may be electrical, mechanical, or of another form.
  • The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • Each functional module in each embodiment of the present invention may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware, or in the form of hardware plus software function modules.
  • The above integrated modules, when implemented in the form of software function modules, can be stored in a computer-readable storage medium.
  • The software function modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes a removable hard disk, a read-only memory (ROM), and a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

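A computer instruction processing method, a coprocessor (202), and a system (200). The computer instruction processing method includes: the coprocessor (202) receives a first instruction set migrated by a central processing unit (CPU) (201) (A401); obtains, according to the first instruction set suited to execution by the CPU (201), a second instruction set to be executed on the coprocessor (202) (A402); and executes the binary code in the second instruction set (A403). In this way, the coprocessor (202) executes the second instruction set in place of the CPU (201) executing the first instruction set, which reduces the load on the CPU (201) and raises the utilization of the coprocessor (202).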

Description

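Computer instruction processing method, coprocessor, and system
Technical Field
Embodiments of the present invention relate to the computer field, and in particular to a computer instruction processing method, a coprocessor, and a system.
Background
At present, a coprocessor is a chip used mainly to handle specific tasks on behalf of a central processing unit (CPU). Because the instruction sets of the coprocessor and the CPU differ in part, programs that run on a coprocessor usually have to be compiled separately with a compiler, and the code has to be adjusted to some extent. In the prior art, the code of an application is generally labeled, and the labels distinguish which code is executed by the CPU and which by the coprocessor. First, all of the application's code is compiled, with the code to be executed by the CPU and the code to be executed by the coprocessor compiled differently according to the labels. After compilation, while the CPU runs the application's process, when execution reaches compiled code that must be executed by the coprocessor, the CPU suspends the process and sends that compiled code to the coprocessor, offloading the code at the label to the coprocessor for execution.
As can be seen from the above, in the prior art it is already fixed, once the application's code has been compiled, which compiled code is executed by the CPU and which by the coprocessor. Arbitrary code cannot be sent to the coprocessor on demand, so the actual utilization of the coprocessor is low and the load on the CPU is not relieved well.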
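Summary
In view of this, embodiments of the present invention provide a computer instruction processing method, a coprocessor, and a system, in which the CPU can migrate computer instructions to a coprocessor running an operating system, and the coprocessor executes those computer instructions to reduce the load on the CPU.
According to a first aspect, an embodiment of the present invention provides a computer instruction processing method applied to a processor system. The processor system includes a coprocessor and a central processing unit (CPU); a first operating system runs on the CPU and a second operating system runs on the coprocessor. The method includes:
the coprocessor receives a first instruction set migrated by the CPU, where the first instruction set is used to instruct the CPU to perform a computer operation in the first operating system and is a set of binary codes applicable to the first operating system;
the coprocessor obtains a second instruction set according to the first instruction set, where the binary code in the second instruction set is used to instruct the coprocessor to perform the computer operation in the second operating system; and
the coprocessor executes the binary code in the second instruction set.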
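With reference to the first aspect, in a first possible implementation, the coprocessor obtaining the second instruction set according to the first instruction set includes:
the coprocessor matches the operation codes of the binary codes in the first instruction set against a preset translation table; if the operation code of a first binary code in the first instruction set is matched in the translation table, the coprocessor translates the operation code of the first binary code into the operation code of a second binary code according to the corresponding matching entry in the translation table, thereby obtaining the second binary code; the coprocessor obtains the second instruction set from the at least one second binary code so obtained. The translation table contains the correspondence between the different operation codes that the same computer instruction compiles to in the first operating system and in the second operating system, and the second binary code is a binary code applicable to the second operating system.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation, before the coprocessor executes the binary code in the second instruction set, the method further includes:
the coprocessor converts the CPU register addresses in the binary code included in the second instruction set into register addresses of the coprocessor.
With reference to the first or the second possible implementation of the first aspect, in a third possible implementation, the coprocessor obtaining the second instruction set according to the first instruction set further includes:
if the operation code of a third binary code in the first instruction set is not matched in the translation table, the coprocessor uses the third binary code as a binary code in the second instruction set.
With reference to the first aspect or any of the first to third possible implementations of the first aspect, in a fourth possible implementation, the first instruction set is migrated by the CPU to the coprocessor when the CPU usage of the CPU is greater than a first threshold.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, the coprocessor receiving the first instruction set migrated by the CPU includes:
the coprocessor receives the address, sent by the CPU, of the first instruction set to be migrated, where the address of the first instruction set is the address at which the first instruction set is stored in the CPU's memory and is sent by the CPU to the coprocessor when the CPU's memory usage is less than or equal to a second threshold; and
the coprocessor accesses the CPU's memory based on the address of the first instruction set to obtain the first instruction set.
With reference to the first aspect or any of the first to third possible implementations of the first aspect, in a sixth possible implementation, the first instruction set is sent by the CPU to the coprocessor when the CPU's memory usage is greater than a second threshold.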
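With reference to the first aspect or any of the first to sixth possible implementations of the first aspect, in a seventh possible implementation, the coprocessor executing the binary code in the second instruction set includes:
the coprocessor sequentially executes the binary code in the second instruction set;
if a binary code recognition exception is detected while the second instruction set is being executed, the fourth binary code that triggered the exception is determined;
the fourth binary code is converted into intermediate code, and the intermediate code is converted into a fifth binary code applicable to the second operating system; and
the fifth binary code is executed, and execution continues with the binary code that follows the fourth binary code in the second instruction set.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation, before the fourth binary code is converted into intermediate code, the method further includes:
sending an instruction set fetch request to the CPU; and
receiving the fetch rejection instruction sent by the CPU.
With reference to the first aspect or any of the first to sixth possible implementations of the first aspect, in a ninth possible implementation, the coprocessor executing the binary code in the second instruction set includes:
the coprocessor sequentially executes the binary code in the second instruction set;
if a binary code recognition exception is detected while the second instruction set is being executed, the sixth binary code that triggered the exception is determined; and
a third instruction set applicable to the first operating system is obtained from the binary code in the second instruction set starting at the sixth binary code, and the third instruction set is migrated to the CPU.
With reference to the ninth possible implementation of the first aspect, in a tenth possible implementation, before the third instruction set applicable to the first operating system is obtained from the binary code in the second instruction set starting at the sixth binary code, the method further includes:
sending an instruction set fetch request to the CPU; and
receiving the instruction set fetch response sent by the CPU.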
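According to a second aspect, an embodiment of the present invention provides a coprocessor applied to a processor system. The processor system includes the coprocessor and a central processing unit (CPU) running a first operating system; a second operating system runs on the coprocessor. The coprocessor includes:
a first instruction set receiving unit, configured to receive a first instruction set migrated by the CPU, where the first instruction set is used to instruct the CPU to perform a computer operation in the first operating system and is a set of binary codes applicable to the first operating system;
a second instruction set obtaining unit, configured to obtain a second instruction set according to the first instruction set, where the binary code in the second instruction set is used to instruct the coprocessor to perform the computer operation in the second operating system; and
a second instruction set execution unit, configured to execute the binary code in the second instruction set.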
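With reference to the second aspect, in a first possible implementation, the second instruction set obtaining unit being configured to obtain the second instruction set according to the first instruction set includes:
the second instruction set obtaining unit is configured to match the operation codes of the binary codes in the first instruction set against a preset translation table; if the operation code of a first binary code in the first instruction set is matched in the translation table, translate the operation code of the first binary code into the operation code of a second binary code according to the corresponding matching entry in the translation table, thereby obtaining the second binary code; and obtain the second instruction set from the at least one second binary code so obtained. The translation table contains the correspondence between the different operation codes that the same computer instruction compiles to in the first operating system and in the second operating system, and the second binary code is a binary code applicable to the second operating system.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation, the coprocessor further includes:
a register address conversion unit, configured to convert the CPU register addresses in the binary code included in the second instruction set into register addresses of the coprocessor.
With reference to the first or the second possible implementation of the second aspect, in a third possible implementation, the second instruction set obtaining unit is further configured to: if the operation code of a third binary code in the first instruction set is not matched in the translation table, use the third binary code as a binary code in the second instruction set.
With reference to the second aspect or any of the first to third possible implementations of the second aspect, in a fourth possible implementation, the first instruction set is migrated by the CPU to the coprocessor when the CPU usage of the CPU is greater than a first threshold.
With reference to the fourth possible implementation of the second aspect, in a fifth possible implementation, the first instruction set receiving unit being configured to receive the first instruction set migrated by the CPU includes:
the first instruction set receiving unit is configured to receive the address, sent by the CPU, of the first instruction set to be migrated, and to access the CPU's memory based on that address to obtain the first instruction set. The address of the first instruction set is the address at which the first instruction set is stored in the CPU's memory, and it is sent by the CPU to the coprocessor when the CPU's memory usage is less than or equal to a second threshold.
With reference to the second aspect or any of the first to third possible implementations of the second aspect, in a sixth possible implementation, the first instruction set is sent by the CPU to the coprocessor when the CPU's memory usage is greater than a second threshold.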
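With reference to the second aspect or any of the first to sixth possible implementations of the second aspect, in a seventh possible implementation, the second instruction set execution unit being configured to execute the binary code in the second instruction set includes:
the second instruction set execution unit is configured to: sequentially execute the binary code in the second instruction set; if a binary code recognition exception is detected while the second instruction set is being executed, determine the fourth binary code that triggered the exception; convert the fourth binary code into intermediate code and convert the intermediate code into a fifth binary code applicable to the second operating system; and execute the fifth binary code and continue executing the binary code that follows the fourth binary code in the second instruction set.
With reference to the seventh possible implementation of the second aspect, in an eighth possible implementation, the second instruction set execution unit is further configured to, before converting the fourth binary code into intermediate code, send an instruction set fetch request to the CPU and receive the fetch rejection instruction sent by the CPU.
With reference to the second aspect or any of the first to sixth possible implementations of the second aspect, in a ninth possible implementation, the second instruction set execution unit being configured to execute the binary code in the second instruction set includes:
the second instruction set execution unit is configured to: sequentially execute the binary code in the second instruction set; if a binary code recognition exception is detected while the second instruction set is being executed, determine the sixth binary code that triggered the exception; obtain a third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code; and migrate the third instruction set to the CPU.
With reference to the ninth possible implementation of the second aspect, in a tenth possible implementation, the second instruction set execution unit is further configured to, before obtaining the third instruction set applicable to the first operating system from the binary code in the second instruction set starting at the sixth binary code, send an instruction set fetch request to the CPU and receive the instruction set fetch response sent by the CPU.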
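According to a third aspect, an embodiment of the present invention provides a coprocessor. The coprocessor is connected to a memory through a bus; the memory is used to store computer-executable instructions, and the coprocessor reads the computer-executable instructions stored in the memory and executes the computer instruction processing method provided by the first aspect or by any possible implementation of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides a processor system that includes a central processing unit (CPU) and a coprocessor, characterized in that a first operating system runs on the CPU and a second operating system runs on the coprocessor;
the CPU is configured to migrate a first instruction set to the coprocessor; and
the coprocessor is configured to execute the computer instruction processing method provided by the first aspect or by any possible implementation of the first aspect.
With the above solution, for a compiled first instruction set suited to execution in the first operating system, the coprocessor obtains, according to the first instruction set, a second instruction set to be executed on the coprocessor. The coprocessor executes the second instruction set in place of the CPU executing the first instruction set, which reduces the load on the CPU and raises the utilization of the coprocessor.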
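Brief Description of the Drawings
FIG. 1 is a schematic diagram of the system logical structure of an application scenario in which the prior art allocates compiled binary code;
FIG. 2 is a schematic diagram of the system logical structure of an application scenario of the computer instruction processing method;
FIG. 3 is a schematic diagram of the correspondence between the computer instructions in the CPU's instruction set and the coprocessor's instruction set;
FIG. 4 is an exemplary flowchart of the computer instruction processing method;
FIG. 5 is an optional exemplary flowchart of step A402;
FIG. 6 is another exemplary flowchart of the computer instruction processing method;
FIG. 7 is an optional exemplary flowchart of step A401;
FIG. 8 is an optional exemplary flowchart of handling a binary code exception in step A403;
FIG. 9 is an optional exemplary flowchart of handling a binary code exception in step A403;
FIG. 10 is an optional exemplary flowchart of handling a binary code exception in step A403;
FIG. 11 is an optional exemplary flowchart of handling a binary code exception in step A403;
FIG. 12 is a schematic diagram of a logical structure of the coprocessor 202;
FIG. 13 is a schematic diagram of an optional logical structure of the coprocessor 202;
FIG. 14 is a schematic diagram of the system logical structure of a system composed of the coprocessor 1401 and the memory 1402.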
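Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to FIG. 1, the prior art provides a system 100 that includes a central processing unit (CPU) 101 and a coprocessor 102; the instruction set supported by the CPU 101 differs from that supported by the coprocessor 102. In addition, the CPU 101 has an operating system installed (for example, an operating system that supports the X86 instruction set), whereas the coprocessor 102 does not. When the source code of a program run by the CPU 101 is compiled, the source code to be processed by the coprocessor 102 is compiled according to the instruction set of the coprocessor 102; the resulting binary code is recognizable and executable by the coprocessor 102 but may contain binary code that the CPU 101 cannot recognize. The source code to be processed by the CPU 101 is compiled according to the instruction set of the CPU 101; the resulting binary code is recognizable and executable by the CPU 101 but may contain binary code that the coprocessor 102 cannot recognize. Therefore, even when the CPU 101 is heavily loaded, the coprocessor 102 cannot share the execution of binary code meant for the CPU 101.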
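Referring to FIG. 2, an embodiment of the present invention provides a system 200 that includes a central processing unit 201 and a coprocessor 202. The coprocessor 202 has control capability, and the central processing unit 201 and the coprocessor 202 each run an operating system. In contrast to the prior art, in which the coprocessor 102 has no operating system installed, the present invention installs an operating system on the coprocessor 202, so that processes can run on the coprocessor 202 and be scheduled by its operating system; in this way, processes can be migrated in both directions between the central processing unit 201 and the coprocessor 202.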
可选地,该系统200可以同时位于同一数据处理设备,对于该系统200包括的中央处理器201和协处理器202在该数据处理设备上设置的位置,本发明实施例不做限定;该中央处理器201与该协处理器202连接,对于该系统200包括的中央处理器201和协处理器202在该数据处理设备上如何实现连接的,本发明实施例不做限定;连接后的该中央处理器201与该协处理器202可相互之间进行数据传输。
例如,该数据处理设备包括总线,系统200包括的中央处理器201和协处理器202同时连接在该总线上,中央处理器201与协处理器202之间经该总线进行数据交互,在该总线满足中央处理器201与协处理器202之间的数据传输需求的情况下,该数据传输需求包括数据传输速度和数据传输格式,对于该总线的具体型号和支持的总线协议不做限定;另外,随着时代发展,可采用其他介质连接该系统200包括的中央处理器201和协处理器202,通过该介质提高该中央处理器201与该协处理器202之间的数据交互速度,具体实施时,可将该介质替代用于连接该中央处理器201与该协处理器202的总线,或者该介质与该总线并存,都用于该中央处理器201与该协处理器202之间的数据交互。
举例说明,该系统200中,中央处理器201采用X86处理器实现,协处理器202采用因特尔集成众核架构(Many Integrated Core,简称MIC)实现,该X86处理器和该MIC通过快速外设部件互连(Peripheral Component Interconnect Express,简称PCI-E)总线连接,X86处理器和MIC之间通过PCI-E总线进行数据交互。
可选地,系统200中,中央处理器201与协处理器202不位于同一设备内,该中央处理器201与该协处理器202通信连接,该中央处理器201与该协处理器202之间通过消息的方式进行数据交互。
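Referring to FIG. 2, in the application scenario of the computer instruction processing method provided by the present invention, the instruction set supported by the central processing unit 201 and the instruction set supported by the coprocessor 202 contain some computer instructions that differ. The first operating system, which runs on the central processing unit 201, supports the instruction set of the central processing unit 201; the second operating system, which runs on the coprocessor 202, supports the instruction set of the coprocessor 202. If the first operating system and the second operating system are the same, both support the instruction set of the central processing unit 201 and the instruction set of the coprocessor 202. If the first operating system does not support the instruction set of the coprocessor 202, or the second operating system does not support the instruction set of the central processing unit 201, the two are different operating systems.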
相对于协处理器202支持的指令集,中央处理器201支持的指令集可划分为三部分子指令集,包括第一部分子指令集、第二部分子指令集和第三部分子指令集。
第一部分子指令集,其包含的每个计算机指令,协处理器202的指令集中也包含相同的计算机指令;并且,中央处理器201支持的表示该计算机指令的二进制码,与协处理器202支持的表示该计算机指令的二进制码相同。以图3为例,二进制码“AAAA”表示第一部分子指令集中的一个计算机指令,协处理器支持的指令集也包含该计算机指令,并且,该二进制码“AAAA”也表示协处理器支持的指令集中的该计算机指令;与二进制码“AAAA”类似,对应表示第一部分子指令集中其他两个计算机指令的二进制码“CCCC”和二进制码“DDDD”,也表示协处理器的指令集中包含的该两个计算机指令。
第二部分子指令集,其包含的每个计算机指令,协处理器202的指令集中也包含相同的计算机指令;但是,中央处理器201支持的表示该计算机指令的二进制码,与协处理器202支持的表示该计算机指令的二进制码是不相同的。以图3为例,二进制码“BBBB”表示第二部分子指令集中的某个计算机指令,协处理器支持的指令集也包含该计算机指令,但是,协处理器支持的表示该计算机指令的二进制码“B1B1B1B1”,“BBBB”与“B1B1B1B1”为不同的二进制码。
第三部分子指令集,其包含的每个计算机指令,协处理器202的指令集并不包含。以图3为例,二进制码“EEEE”表示第三部分子指令集中的某个计算机指令,协处理器的指令集并不包含该计算机指令,因此对于二进制码“EEEE”,协处理器的指令集中找不到对应该计算机指令的二进制码。
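The three-way partition described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the strings "AAAA", "BBBB", "B1B1B1B1" and so on are the symbolic placeholders of FIG. 3, not real encodings, and the names `COPROCESSOR_OPCODES`, `TRANSLATION_TABLE` and `classify` are hypothetical.

```python
# Symbolic placeholders taken from FIG. 3; they are not real machine encodings.
COPROCESSOR_OPCODES = {"AAAA", "B1B1B1B1", "CCCC", "DDDD"}

# CPU encoding -> coprocessor encoding, for instructions both sides have
# but encode differently (the second sub-instruction-set).
TRANSLATION_TABLE = {"BBBB": "B1B1B1B1"}


def classify(cpu_opcode):
    """Return 1, 2 or 3: the sub-instruction-set a CPU opcode belongs to."""
    # Check the table first: a CPU encoding that needs translation might
    # coincide with a coprocessor encoding of a different form.
    if cpu_opcode in TRANSLATION_TABLE:
        return 2   # same instruction, different encoding
    if cpu_opcode in COPROCESSOR_OPCODES:
        return 1   # same instruction, same encoding
    return 3       # no counterpart on the coprocessor


for opcode in ("AAAA", "BBBB", "EEEE"):
    print(opcode, "-> sub-instruction-set", classify(opcode))
```

Checking the translation table before the coprocessor set is a design choice of this sketch, not something the text prescribes.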
本发明实施例中,若中央处理器201向协处理器202迁移进程,协处理器202接收中央处理器201迁移的与该进程相关的数据,包括执行该进程所需的二进制代码,还包括该进程的进程状态等等。在执行该进程所需的每条二进制代码中,该二进制代码的操作码属于表示计算机指令的二进制码,该二进制代码还可能包括操作数,操作数也是采用二进制码表示的。
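For the computer instructions in the second sub-instruction-set, the binary encoding by which the central processing unit 201 represents an instruction differs from the binary encoding by which the coprocessor 202 represents the same instruction. For this reason, the computer instruction processing method provided by the present invention builds a translation table for the second sub-instruction-set and converts encodings through that table: the encoding by which the central processing unit 201 represents an instruction of the second sub-instruction-set is converted into the encoding by which the coprocessor 202 represents that instruction. The coprocessor 202 can then identify the converted encoding and, while running the migrated process, execute it to perform the function of the instruction. For each computer instruction in the third sub-instruction-set, the coprocessor 202 cannot identify the encoding that represents the instruction, so an exception occurs in which a binary code (whose opcode is the encoding of that instruction) cannot be identified. There are two ways to handle this exception: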
第一种,将进程回迁中央处理器201;
第二种,将触发异常的二进制代码先转换为一个或多个中间代码,再将每个中间代码转换为协处理器202支持的二进制代码,从转换后的二进制代码开始继续运行进程;一种经中间代码转换的可选具体实现是:根据触发异常的二进制代码中的操作码(属于表示第三部分子指令集中计算机指令的二进制码)确定每个中间代码表示的操作码,如果还有中间代码表示的操作数,再根据触发异常的二进制代码中的操作数确定中间代码表示的与每个操作码对应的操作数,然后再根据中间代码表示的操作码确定协处理器202支持的二进制代码中的每个操作码,并根据中间代码表示的操作数确定协处理器202支持的二进制代码中的每个操作码对应的操作数。
可选地,将所述中央处理器201设置为首选执行进程的设备,设置协处理器202为次选执行进程的设备,首先由中央处理器201执行进程,执行进程的过程中如出现以下情况,则将进程迁移至协处理器202,包括:
第一种情况,中央处理器201执行进程的过程中,识别到由协处理器202负责执行的二进制代码时,将该进程迁移至协处理器202,由协处理器202运行该进程来执行该二进制代码;可选地,协处理器202将该进程的执行结果反馈中央处理器201;
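In the second case, when the CPU usage of the central processing unit 201 is too high, one or more processes are selected (for example, the process with the highest CPU usage) and migrated to the coprocessor 202. The coprocessor 202 translates the binary code needed to execute the process according to the translation table and then executes the translated binary code to run the process. The translation works as follows: the binary code needed to execute the process is traversed, optionally in execution order, and searched for any CPU-supported encoding recorded in the translation table; each match is replaced, according to the translation table, with the coprocessor-supported encoding that represents the same computer instruction;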
第三种情况,当中央处理器201使用内存的内存使用率过高时,筛选出一个或多个进程(例如筛选出内存使用率最大的进程),将筛选出的进程迁移至协处理器202,由协处理器202根据翻译表对执行该进程所需的二进制代码进行翻译、再执行翻译后的进程。
本发明一实施例,详述协处理器如何运行从CPU迁移过来的进程,为便于说明,以协处理器采用MIC实现、以中央处理器采用X86处理器实现为例,X86处理器与MIC通过PCI-E总线连接,X86处理器运行通用操作系统(如支持X86指令集的操作系统),MIC运行定制的uOS操作系统,X86处理器和MIC端分别具有独立的内存和寄存器。
本实施例中,X86处理器的寄存器为128位的寄存器,例如X86处理器包含16个128位的XMM寄存器,该XMM寄存器属于矢量寄存器,支持单指令流多数据流扩展(Streaming SIMD Extensions,简称SSE)指令集,其中,所述SIMD的中文名称为单指令流多数据流,所述SIMD的英文全称为Single Instruction Multiple Data;即X86处理器可采用SSE指令集操作XMM寄存器组,进行128位的矢量运算。或者,X86处理器的寄存器为256位的寄存器,例如X86处理器包含16个256位的YMM寄存器,该YMM寄存器属于矢量寄存器,支持高级矢量扩展(Advanced Vector Extensions,简称AVX)指令集,即X86处理器可采用AVX指令集操作YMM寄存器组,进行256位的矢量运算,例如操作YMM寄存器组进行浮点数的运算。
MIC的寄存器为512位的寄存器,例如MIC包含32个512位的ZMM寄存器,该ZMM寄存器属于矢量寄存器,可操作ZMM寄存器组进行512位的矢量运算;另外,MIC的寄存器也支持SSE指令集和AVX指令集。
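In this embodiment, to remain compatible with the register operations of the X86 processor, the MIC selects 16 of its 32 registers to support 128-bit operations (such as SSE operations) and 256-bit operations (such as AVX operations). Optionally, the MIC uses the low 256 bits of these 16 512-bit registers for 128-bit or 256-bit operations. For a process migrated from the X86 processor, the MIC replaces, in the binary code needed to execute the process, the encodings that denote X86 registers with encodings that denote the selected MIC registers; that is, encodings pointing to X86 registers are replaced with encodings pointing to the selected MIC registers. The MIC registers can then be used without adjusting the width of the operand data or the operation rules. For example, the MIC replaces the encodings that denote XMM registers in the migrated binary code with encodings that denote the selected MIC registers, executes the binary code representing SSE instructions, and performs 128-bit vector operations on the 16 selected registers. As another example, the MIC replaces the encodings that denote YMM registers with encodings that denote the selected MIC registers, executes the binary code representing AVX instructions, and performs 256-bit vector operations on the 16 selected registers.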
这样,对于X86处理器向MIC迁移的进程,MIC将执行该进程所需的二进制代码中代表X86处理器的寄存器的二进制码替换为代表MIC中选用的寄存器的二进制码后,使用MIC的寄存器运行迁移的进程。
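The register substitution described above can be sketched as follows. The text does not say which 16 ZMM registers are selected or how register operands are encoded, so this sketch assumes, purely for illustration, that XMMn and YMMn both map to ZMMn and that registers are identified by name; `remap_register` is a hypothetical helper.

```python
import re

# Bits of the 512-bit ZMM register that a migrated operation actually uses.
WIDTH_BITS = {"xmm": 128, "ymm": 256}
_REGISTER = re.compile(r"^(xmm|ymm)(\d+)$")


def remap_register(cpu_register):
    """Map an X86 vector register name to (MIC register name, bits used).

    Assumption: the 16 selected MIC registers are zmm0..zmm15 and the
    register index is preserved.
    """
    match = _REGISTER.match(cpu_register)
    if match is None:
        raise ValueError(f"not an XMM/YMM register: {cpu_register!r}")
    kind, index = match.group(1), int(match.group(2))
    if index >= 16:
        raise ValueError("only 16 registers are selected on the MIC side")
    return f"zmm{index}", WIDTH_BITS[kind]


print(remap_register("xmm3"))
print(remap_register("ymm15"))
```

Because the operand width is carried along unchanged, no data-width adjustment is needed, which matches the point the paragraph makes.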
另外在本实施例中,建立翻译表,建立的翻译表由MIC执行匹配。具体地,翻译表是针对上述第二部分子指令集包含的计算机指令建立的,原因是:X86处理器支持的表示该计算机指令的二进制码与MIC支持的表示该计算机指令的二进制码不同;建立翻译表的方法为,在翻译表中分别添加X86处理器支持的表示该计算机指令的二进制码、MIC支持的表示该计算机指令的二进制码,并在翻译表中确定X86处理器支持的表示该计算机指令的二进制码与MIC支持的表示该计算机指令的二进制码的映射关系。作为翻译表一个举例,表1例举了五个计算机指令,如下:
表1
Figure PCTCN2016073942-appb-000001
Figure PCTCN2016073942-appb-000002
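“FXSAVE” in Table 1 is the instruction that saves floating-point state; it saves the state of the floating-point unit (Floating Point Unit, FPU) registers. The binary encoding by which the X86 processor represents “FXSAVE” is “01111011110111010000”, and the binary encoding by which the MIC represents “FXSAVE” is “00001111101011101111”;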
表1中的“FXRSTOR”,是指示浮点恢复状态的指令,用于将保存的FPU寄存器的状态恢复到FPU寄存器中;X86支持的表示“FXRSTOR”的二进制码为“11011101”,MIC支持的表示“FXRSTOR”的二进制码为“00001111101011100001”;
表1中的“RDPMC”,是读执行监视计数的指令,用于读取性能监控器的计数;X86处理器支持的表示“RDPMC”的二进制码有两个,包括“0000111100110001”和“0000111100110010”,MIC支持的表示“RDPMC”的二进制码仅一个,为“0000111100110011”;
表1中的“FSUB”,是浮点减指令,用于浮点数的减法运算,X86支持的表示“FSUB”的二进制码为“1101100011100000”,MIC支持的表示“FSUB”的二进制码为“0000111101011100”。
这样,MIC运行的uOS操作系统加载该翻译表后,针对X86处理器向MIC迁移进程的过程中MIC从CPU获取到的与该进程相关的数据,从与该进程相关的数据提取出执行该进程所需的二进制代码,根据该翻译表对该二进制代码中与第二部分子指令集包含的计算机指令对应的二进制码进行匹配替换,替换为翻译表中MIC支持的表示该计算机指令的二进制码,继而MIC能够识别替换后的二进制码,一定程度提高了MIC执行迁移后的进程的正确率和效率。
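In this embodiment, two conditions trigger migration of a process from the X86 processor to the MIC: the CPU usage of the X86 processor is detected to exceed a first threshold, or the memory usage of the memory used by the X86 processor is detected to exceed a second threshold. In a specific implementation, the operating system of the X86 processor runs a piece of code that implements a monitoring module, which monitors the load of the X86 processor, namely its CPU usage and the usage of the memory it uses. When the monitoring module detects that either condition is met, it suspends one or more processes, locks the memory space used by each such process, and migrates the process to the MIC.
When the X86 processor migrates a process to the MIC because the memory usage of the X86 processor exceeds the second threshold, the data related to the process in the memory used by the X86 processor is sent to the MIC, regardless of whether the CPU usage exceeds the first threshold. Correspondingly, the MIC stores the received process data in its own memory and then extracts from it the binary code needed to execute the process. While traversing the extracted binary code, whenever a CPU-supported encoding recorded in the translation table is matched (that is, an encoding corresponding to an instruction of the second sub-instruction-set), it is replaced with the corresponding MIC-supported encoding that represents the same instruction, and the replacement is written back to the MIC memory. In this way, every encoding in the MIC memory that corresponds to an instruction of the second sub-instruction-set is replaced, according to the translation table, with the MIC-supported encoding that represents that instruction.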
X86处理器将进程向MIC迁移时,如果是因X86处理器的CPU使用率大于第一阈值且X86处理器使用内存的内存使用率小于或等于第二阈值而触发的将进程从X86处理器迁移到MIC,将X86处理器的内存中存储该进程相关的数据的存储地址发送至MIC。对应地,MIC从其使用的内存中划分出一个存储空间,建立该个存储空间包含的存储地址与接收的存储地址(X86处理器的内存中存储该进程相关的数据的存储地址)的地址映射关系;继而,MIC通过PCI-E总线,根据该地址映射关系访问X86处理器的内存中与该进程相关的数据,从与该进程相关的数据中提取执行该进程所需的二进制代码;若在提取的二进制代码中匹配到翻译表中CPU支持的二进制码(即与第二部分子指令集包含的计算机指令对应的二进制码),替换为翻译表中MIC对应支持的二进制码,并将替换的二进制码更新存储至X86处理器的内存。这样,X86处理器的内存中存储的与第二部分子指令集包含的计算机指令对应的二进制码,都会根据翻译表替换为MIC支持的表示该计算机指令的二进制码;进而,MIC根据该地址映射关系使用CPU的内存来运行该进程。
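The two migration paths described above (copying the process data into MIC memory versus sending only its storage address) can be summarized as one decision function. This is a minimal sketch: `choose_transfer_mode` and its return labels are hypothetical names, and usages and thresholds are assumed to be fractions between 0 and 1.

```python
def choose_transfer_mode(cpu_usage, memory_usage, first_threshold, second_threshold):
    """Decide how a process is handed from the X86 processor to the MIC.

    Returns "copy_data" when memory usage exceeds the second threshold
    (data is sent and stored in MIC memory), "send_address" when only CPU
    usage exceeds the first threshold (the MIC maps and accesses X86 memory
    over the bus), and None when neither trigger condition holds.
    """
    if memory_usage > second_threshold:
        return "copy_data"        # applies regardless of CPU usage
    if cpu_usage > first_threshold:
        return "send_address"     # memory usage <= second threshold here
    return None


print(choose_transfer_mode(0.95, 0.40, 0.80, 0.80))
print(choose_transfer_mode(0.50, 0.90, 0.80, 0.80))
```

Testing the memory condition first mirrors the text, where a memory-triggered migration copies the data whatever the CPU usage is.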
下面,对MIC如何根据翻译表转换二进制码的具体实现做如下描述:
MIC的uOS操作系统运行一段代码来实现翻译模块,该翻译模块加载翻译表。
针对X86处理器迁移来的每个进程,翻译模块遍历执行该进程所需的二进制代码,匹配查找是否存在翻译表中X86处理器支持的二进制码,每次匹配查找到,根据翻译表将该二进制码翻译成MIC支持的二进制码,直到完成遍历查找。
例如,对于异或指令(XOR),X86处理器支持的二进制码根据比较的对象不同而有所变化,如果比较两个寄存器值,X86处理器支持该异或指令的二进制码表示为“0011001”,如果比较寄存器值和内存存储的值,X86处理器支持该异或指令的二进制码表示为“0011000”;但是,MIC支持该异或指令(XOR)的二进制码统一表示为“0011000”;为MIC能够识别比较两个寄存器值的该异或指令,在翻译表中记录“0011001”与“0011000”的映射关系,若翻译模块从X86处理器迁移的进程包含的二进制代码中根据翻译表匹配查找到“0011001”,便根据该翻译表将该进程包含的二进制码中的“0011001”替换为“0011000”。
再例如,对于用于读取性能监控器的计数的“RDPMC”指令,X86处理器支持该“RDPMC”指令的二进制码为“0000111100110001”和“0000111100110010”,MIC支持的表示“RDPMC”的二进制码仅一个,为“0000111100110011”,因此在翻译表中记录“0000111100110001”与“0000111100110011”的映射关系,和在翻译表中记录“0000111100110010”与“0000111100110011”的映射关系。若翻译模块从X86处理器迁移的进程所包含的二进制代码中根据翻译表匹配查找到“0000111100110001”,便根据该翻译表将该进程包含的二进制码中的“0000111100110001”替换为“0000111100110011”;若翻译模块从X86处理器迁移的进程包含的二进制代码中根据翻译表匹配查找到“0000111100110010”,便根据该翻译表将该进程包含的二进制码中的“0000111100110010”替换为“0000111100110011”。
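The XOR and RDPMC examples above amount to a lookup table in which several X86 encodings may map to one MIC encoding. The sketch below uses the bit strings quoted in the text; representing encodings as Python strings and the name `translate_opcode` are assumptions made for illustration.

```python
# X86 encoding -> MIC encoding, using the bit strings quoted in the text.
TRANSLATION_TABLE = {
    "0011001": "0011000",                    # XOR, register/register form
    "0000111100110001": "0000111100110011",  # RDPMC, first X86 encoding
    "0000111100110010": "0000111100110011",  # RDPMC, second X86 encoding
}


def translate_opcode(opcode):
    """Return the MIC encoding for a matched X86 encoding, else the input."""
    return TRANSLATION_TABLE.get(opcode, opcode)


for x86_opcode in ("0011001", "0011000", "0000111100110010"):
    print(x86_opcode, "->", translate_opcode(x86_opcode))
```

Because both RDPMC encodings collapse to one MIC encoding, the table cannot be inverted mechanically; a reverse table for back-migration would have to pick one X86 encoding per instruction.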
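In this embodiment, while a process is being migrated from the X86 processor to the MIC, the MIC obtains not only the process data from the memory used by the X86 processor but also the register values related to the process from the registers of the X86 processor, and stores those values in the corresponding MIC registers. To run the process, the MIC first extracts the process state from the process data stored in MIC memory; the process state includes the process priority, process identifier, stack pointer, and other state information required to run the process. Starting from the execution point determined from that process state, the MIC then executes the process using its own registers and the process data stored in MIC memory (which includes the binary code translated according to the translation table).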
本实施例中,MIC的uOS操作系统运行一段代码来实现异常处理模块,该异常处理模块能够截获MIC执行进程出现的异常,包括进程执行到不能识别的二进制代码所触发的异常。可选地,MIC执行进程的过程中,如果异常处理模块检测到进程执行异常,将进程挂起,并生成记录进程异常执行的异常信息。
例如,上述第三部分子指令集包含的计算机指令的二进制码,MIC的指令集没有包含有,MIC无法识别与该计算机指令对应的二进制码;另外,翻译表也没有记录该第三部分子指令集包含的计算机指令的二进制码,无法根据翻译表将该第三部分子指令集包含的计算机指令的二进制码转换为MIC的指令集中的计算机指令对应的二进制码;因此,MIC不能识别以该二进制码作为操作码的二进制代码,如果进程执行到该二进制代码便会触发二进制代码无法识别的异常。
异常处理模块检测到因指令识别异常触发的执行进程异常,将进程挂起,采用以下三种可选的异常处理方式进行异常处理:
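In the first manner, the exception handling module migrates the suspended process back to the X86 processor, starting from the binary code that triggered the exception. Specifically, the opcodes (expressed as binary encodings) of the binary code needed to execute the process, held in memory (either MIC memory or X86 memory), are matched against the translation table; each matched encoding is converted into the encoding supported by the X86 processor, and the corresponding encoding in that memory is updated. If the updated binary code resides in MIC memory, it is transferred to the memory of the X86 processor; one way to do this is through data communication between the MIC and the X86 processor. In addition, the encodings that denote MIC registers in the binary code are replaced with encodings that denote X86 registers, and the stored encodings are updated accordingly. The register values related to the process are also transferred from the MIC registers to the X86 registers. The X86 processor can then run the migrated-back process using its own registers and memory.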
第二种方式,异常处理模块判断该挂起的进程是否属于X86处理器迁移到MIC的进程,如果该挂起的进程属于X86处理器迁移到MIC的进程,识别触发异常的二进制代码,并将触发异常的二进制代码转换为模拟器(如simics模拟器或者qemu模拟器)的中间代码,再将该中间代码转换为MIC支持的二进制代码,从转换得到的二进制代码开始继续执行该进程;
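In the third manner, the exception handling module sends a process back-migration request to the X86 processor to indicate that it wishes to migrate back the process that is currently in an exception state. The X86 processor responds to the request and determines the CPU usage and the memory usage currently observed by the monitoring module. If the CPU usage is less than or equal to the first threshold and the memory usage is less than or equal to the second threshold, the X86 processor returns a process back-migration instruction to the MIC; if the CPU usage is greater than the first threshold, or the memory usage is greater than the second threshold, the X86 processor returns a back-migration rejection instruction to the MIC;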
在第三种方式中,如果异常处理模块接收到X86处理器反馈的进程回迁指令,将挂起的进程回迁X86处理器,将进程从MIC回迁X86处理器的实现方式与上述第一种方式同原理实现,在此不再赘述;
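In the third manner, if the exception handling module receives a back-migration rejection instruction from the X86 processor, it does not migrate the suspended process back. Instead it determines whether the suspended process was migrated from the X86 processor to the MIC. If so, it identifies the binary code that triggered the exception, converts it into the intermediate code of an emulator (such as the simics emulator or the qemu emulator), converts the intermediate code into binary code supported by the MIC, and resumes the process from the converted binary code. Optionally, if the suspended process was not migrated from the X86 processor, the MIC outputs the exception information directly. A suspended process may fall into this category when, for example, a process on the X86 processor encounters a code segment that must be executed by the MIC and hands it over, and the MIC creates a new process to execute the segment and an exception then occurs.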
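The third handling manner can be read as a small decision procedure. The sketch below is hypothetical: the function names and returned labels are invented for illustration, usages are assumed to be fractions, and usage exactly equal to the first threshold is assumed to count as acceptable.

```python
def cpu_accepts_back_migration(cpu_usage, memory_usage,
                               first_threshold, second_threshold):
    """X86 side: accept back-migration only when neither load is too high.

    Assumption: usage equal to a threshold counts as acceptable.
    """
    return cpu_usage <= first_threshold and memory_usage <= second_threshold


def handle_unrecognized_code(migrated_from_cpu, cpu_accepts):
    """MIC side: choose the action after a binary code identification exception."""
    if cpu_accepts:
        return "migrate_back"                      # same as the first manner
    if migrated_from_cpu:
        return "translate_via_intermediate_code"   # same as the second manner
    return "report_exception"                      # process was created on the MIC


accepts = cpu_accepts_back_migration(0.90, 0.30, 0.80, 0.80)
print(handle_unrecognized_code(migrated_from_cpu=True, cpu_accepts=accepts))
```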
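In this embodiment, the following 22 computer instructions are supported by the X86 processor but not by the MIC: the conditional move instruction “CMOV”, the compare-and-exchange 16 bytes instruction “CMPXCHG16B”, the floating-point conditional move instruction “FCMOVcc”, the floating-point compare and set flags instruction “FCOMI”, the floating-point compare, set flags and pop instruction “FCOMIP”, the floating-point unordered compare and set flags instruction “FUCOMI”, the floating-point unordered compare, set flags and pop instruction “FUCOMIP”, the port input instruction “IN”, the port input string instruction “INS”, the port input byte string instruction “INSB”, the port input doubleword string instruction “INSD”, the port input word string instruction “INSW”, the monitor instruction “MONITOR”, the thread synchronization instruction “MWAIT”, the port output instruction “OUT”, the port output string instruction “OUTS”, the port output byte string instruction “OUTSB”, the port output doubleword string instruction “OUTSD”, the port output word string instruction “OUTSW”, the pause instruction “PAUSE”, the system entry instruction “SYSENTER”, and the system exit instruction “SYSEXIT”. These 22 computer instructions fall into three classes:
The first class consists of instructions that can be split into two actions: the conditional move instruction “CMOV”, the compare-and-exchange 16 bytes instruction “CMPXCHG16B”, the floating-point conditional move instruction “FCMOVcc”, the floating-point compare and set flags instruction “FCOMI”, the floating-point compare, set flags and pop instruction “FCOMIP”, the floating-point unordered compare and set flags instruction “FUCOMI”, and the floating-point unordered compare, set flags and pop instruction “FUCOMIP”. If executing binary code that contains a first-class instruction causes a process exception on the MIC and the MIC is to continue executing the process, the MIC converts that binary code into intermediate code comprising two actions, each of which has a corresponding instruction in the MIC instruction set, and then converts the intermediate code into binary code supported by the MIC, in which the opcode of each binary code represents the corresponding MIC instruction. The MIC can then identify and execute the converted binary code. Taking the conditional move instruction “CMOV” as an example, the MIC cannot identify a binary code whose opcode is the encoding of that instruction. The exception handling module converts the binary code, via intermediate code, into binary code supported by the MIC. At the level of binary encodings this involves determining, from the conditional move instruction, a condition test instruction and a move instruction expressed in intermediate code, and then translating these into a condition test instruction and a move instruction (MOV) that the MIC can identify. The MIC first executes the condition test instruction to determine whether the condition is met and, if so, executes the move instruction (MOV);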
第二类是通过端口读取数据或者写入数据的计算机指令,包括端口输入指令“IN”、端口输入串指令“INS”、端口输入字节串指令“INSB”、端口输入双字串指令“INSD”、端口输入字串指令“INSW”、监视指令“MONITOR”、线程同步指令“MWAIT”、端口输出指令“OUT”、端口输出串指令“OUTS”、端口输出字节串指令“OUTSB”、端口输出双字指令“OUTSD”、端口输出字串指令“OUTSW”;MIC执行第二类指令导致进程发生异常,如果继续由MIC执行该进程,分两种情况处理;第一种情况是通过端口写入数据,该情况下,MIC通知X86处理器从该计算机指令指定的目标端口写入数据即可;第二种情况是读取数据的计算机指令,该情况下,MIC先通知X86处理器从该计算机指令指定的目标端口读取数据到内存中,再访问这块内存来获取该数据;
第三类包括暂停指令“PAUSE”、系统进入指令“SYSENTER”、系统退出指令“SYSEXIT”,这三条指令是后期添加的,为了增加性能,其中暂停指令“PAUSE”是为了减少自旋锁的性能损失,系统进入指令“SYSENTER”、系统退出指令“SYSEXIT”是为了减少内核态和用户态之间切换的损失;第三类计算机指令是对原有X86指令集的优化,但MIC并不支持这种优化;若MIC执行二进制码表示的暂停指令“PAUSE”触发进程异常,如果继续由MIC执行该进程,执行二进制码表示的自旋锁指令,来替代执行二进制码表示的暂停指令“PAUSE”,即可继续运行进程;若执行二进制码表示的系统进入指令“SYSENTER”触发进程异常,执行二进制码表示的切换用户态到内核态的切换指令,来替代执行二进制码表示的系统进入指令“SYSENTER”,即可继续运行进程;若执行二进制码表示的系统退出指令“SYSEXIT”触发进程异常,执行二进制码表示的切换内核态到用户态的切换指令,来替代执行二进制码表示的系统退出指令“SYSEXIT”,即可继续运行进程。
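For the first class, the split of a conditional move into a condition test followed by a move can be sketched with a toy instruction format. Everything here is an assumption made for illustration: instructions are tuples, `IR_TEST` and `IR_MOVE` stand for the intermediate code, and `TEST` and `MOV` stand for whatever the coprocessor actually provides.

```python
def to_intermediate(instruction):
    """Split ("CMOV", condition, dst, src) into two intermediate-code steps."""
    opcode, condition, dst, src = instruction
    if opcode != "CMOV":
        raise ValueError(f"unsupported opcode: {opcode!r}")
    return [("IR_TEST", condition), ("IR_MOVE", dst, src)]


def to_coprocessor(intermediate):
    """Map each intermediate-code step to a coprocessor instruction."""
    mapping = {"IR_TEST": "TEST", "IR_MOVE": "MOV"}
    return [(mapping[step[0]],) + step[1:] for step in intermediate]


def execute(code, registers, flags):
    """Toy executor: MOV runs only if the preceding TEST found its flag set."""
    condition_met = True
    for step in code:
        if step[0] == "TEST":
            condition_met = flags[step[1]]
        elif step[0] == "MOV" and condition_met:
            registers[step[1]] = registers[step[2]]
    return registers


code = to_coprocessor(to_intermediate(("CMOV", "zero", "a", "b")))
print(code)
print(execute(code, {"a": 1, "b": 2}, {"zero": True}))
```

The operands of the two generated steps are derived from the operands of the original instruction, as the text requires.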
本实施例中,如果MIC顺利执行完X86处理器迁移的进程,MIC可将执行结果反馈X86处理器,也可控制直接输出执行结果,输出的方式包括但不限于:通过显示模块等数据输出模块呈现执行结果,或者基于执行结果执行其他动作。
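In this embodiment, once the MIC runs the uOS operating system it can schedule processes. When the X86 processor is heavily loaded (its CPU usage exceeds the first threshold and/or its memory usage exceeds the second threshold), the selected processes are migrated to the MIC for execution, which relieves the load on the X86 processor. In particular, a heavy process can be migrated to the MIC, which extends the service life of the X86 processor, helps each process obtain sufficient resources, and helps maintain the execution efficiency of each process.
In an embodiment of the present invention, based on the system 200 described above and on the improvements made in the foregoing embodiment to how the coprocessor runs a process migrated from the CPU, this embodiment extends the technical solution of the foregoing embodiment and presents, from the perspective of the coprocessor 202, the basic flow of a computer instruction processing method. FIG. 4 shows an exemplary workflow of the method; for ease of description, only the parts relevant to this embodiment are shown.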
本实施例提供的计算机指令处理方法,应用于处理器系统,所述处理器系统包括协处理器和中央处理器CPU,所述CPU上运行第一操作系统,所述协处理器上运行第二操作系统;其中,第一操作系统是指支持CPU的指令集的操作系统;第二操作系统是指支持协处理器的指令集的操作系统。
所述协处理器运行第二操作系统后,可通过第二操作系统运行进程和线程,进行进程之间的调度,及进行线程之间的调度;进而,CPU可以向协处理器迁移一个或多个进程,本实施例定义第一进程为从CPU向协处理器迁移的单个进程;另外,CPU还可以向协处理器迁移一个或多个线程,本实施例定义第一线程为从CPU向协处理器迁移的单个线程。
更进一步地,CPU与协处理器之间不但可以相互迁移进程,相互迁移线程,还可以相互迁移一条或多条二进制代码。本实施例定义第一指令集,如果CPU向协处理器迁移进程,第一指令集是指执行该进程所需的二进制代码;如果CPU向协处理器迁移线程,第一指令集是指执行该线程所需的二进制代码;如果CPU向协处理器迁移一条或多条二进制代码,第一指令集是指CPU迁移协处理器的二进制代码的集合。
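The binary code in the first instruction set is obtained by compiling source code according to the instruction set of the CPU. No limitation is placed on whether the compilation is performed by the first operating system: the first operating system may perform the compilation, or another compiler may perform it and the first operating system may then obtain the first instruction set. Likewise, no limitation is placed on the source code or on the programming language in which it is written.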
如图4所示,本实施例提供的计算机指令处理方法包括:步骤A401、步骤A402和步骤A403。
步骤A401,所述协处理器接收所述CPU迁移的第一指令集,所述第一指令集用于指示所述CPU在所述第一操作系统中执行计算机操作,所述第一指令集为适用于所述第一操作系统的二进制代码的集合。
本实施例中,对触发CPU向协处理器迁移第一指令集的触发条件不做限定,甚至CPU可以在任何条件下将第一指令集向协处理器迁移;
举例说明,CPU在执行第一指令集的过程中一旦接收到指令集迁移指令,则将该第一指令集迁移至协处理器;其中,由以下三个条件中任一条件触发该指令集迁移指令:
第一个条件,人为触发该指令集迁移指令,例如,CPU和协处理器同时集成在一数据处理设备中,人为操作数据处理设备触发该指令集迁移指令;
第二个条件,CPU根据CPU使用率确定是否将第一指令集迁移至协处理器,如果CPU的CPU使用率大于第一阈值,触发该指令集迁移指令;
第三个条件,CPU根据该CPU使用内存的内存使用率确定是否将第一指令集迁移至协处理器,如果CPU中该内存的内存使用率大于第二阈值,触发该指令集迁移指令。
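It should be noted that, because the first instruction set is obtained by compiling source code according to the binary encodings of the CPU instruction set, the opcode of every binary code in the compiled first instruction set is an encoding from the CPU instruction set, so every encoding that represents a computer instruction in the first instruction set can be identified and executed by the CPU. Generally, one encoding that represents a computer instruction triggers one computer operation. For example, when a process running on the X86 processor executes the encoding “1101100011100000”, it performs the floating-point subtraction operation; as another example, when a process running on the X86 processor executes the encoding “11011101”, it performs the operation of restoring the saved FPU register state into the FPU registers.
It should be noted that each binary code in the first instruction set includes an opcode and may also include operands; both the opcode and the operands are expressed as binary encodings.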
本实施例中,CPU向协处理器迁移第一指令集,协处理器执行步骤A401接收所述CPU迁移的第一指令集。例如,CPU向协处理器迁移第一进程的过程中向协处理器发送第一指令集,所述协处理器执行步骤A401接收所述CPU迁移的第一指令集。
步骤A402,所述协处理器根据所述第一指令集获得第二指令集,其中,所述第二指令集中的二进制代码用于指示所述协处理器在所述第二操作系统中执行所述计算机操作。
具体地,按照上述将CPU的指令集划分为第一部分子指令集、第二部分子指令集和第三部分子指令集,协处理器的指令集不包含第三部分子指令集,因此协处理器的指令集与CPU的指令集存在不同;并且对于第二部分子指令集包含的计算机指令,中央处理器201支持的表示该计算机指令的二进制码,与协处理器202支持的表示该计算机指令的二进制码是不相同的;因此,表示第二部分子指令集包含的计算机指令的二进制码,和表示第三部分子指令集包含的计算机指令的二进制码,均不能够被协处理器识别和执行。鉴于此,本实施例提供步骤A402将第一指令集中的部分二进制代码或全部二进制代码转换,转换得到第二指令集;相对于第一指令集,第二指令集包含更多能够被协处理器识别和执行的二进制码,所述第二指令集具有更好的可识别性和可执行性。与第一指令集类似,第二指令集包含的每个二进制码分别触发计算机操作,第二指令集包含的一个二进制码触发一个计算机操作,并且本实施例期望协处理器执行第二指令集触发的计算机操作与CPU执行第一指令集触发的计算机操作相同。
可选地,若所述第一指令集包含有表示计算机指令(该第一部分子指令集中的计算机指令)的二进制码,步骤A402在根据第一指令集获取第二指令集时,直接将所述第一指令集包含的该二进制码获取到第二指令集。
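Optionally, if the first instruction set contains a binary encoding that represents a computer instruction of the second sub-instruction-set, then when obtaining the second instruction set from the first instruction set, step A402 converts that encoding into the encoding by which the coprocessor represents the same instruction and places the converted encoding in the second instruction set.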
可选地,若所述第一指令集包含有表示计算机指令(该第三部分子指令集中的计算机指令)的二进制码,步骤A402在根据第一指令集获取第二指令集时,直接将第一指令集包含的该二进制码获取到第二指令集。
需说明的是,如果CPU向协处理器迁移第一进程,协处理器不但从CPU接收执行该第一进程所需的第一指令集,还从CPU获取其他与第一进程相关的数据,包括该第一进程的进程状态和该第一进程相关的寄存器值;该进程状态包括进程优先级、进程标识符、栈指针等进程运行的必要状态信息;协处理器将该第一进程相关的寄存器值存储至协处理器的寄存器中。另外,如果协处理器的寄存器与CPU的寄存器不属于同一种寄存器,步骤A402在根据第一指令集获取第二指令集时,还需将第一指令集中的二进制码中CPU的寄存器地址替换为协处理器的寄存器地址,将替换的协处理器的寄存器地址获取到第二指令集;可选地,如果协处理器的寄存器与CPU的寄存器属于同一种寄存器,步骤A402在根据第一指令集获取第二指令集时,直接将第一指令集中的二进制代码中CPU的寄存器地址获取到第二指令集。
需说明的是,如果CPU向协处理器迁移第一线程,协处理器不但从CPU接收执行该第一线程所需的第一指令集,还从CPU获取其他与第一线程相关的数据,包括该第一线程的线程状态和该第一线程相关的寄存器值。协处理器将该第一线程相关的寄存器值存储至协处理器的寄存器中;与CPU向协处理器迁移的是执行第一线程所需的第一指令集类似,此处也根据协处理器的寄存器与CPU的寄存器是否不属于同一种寄存器,确定在步骤A402在根据第一指令集获取第二指令集时将CPU的寄存器地址还是将协处理器的寄存器地址获取到第二指令集。
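It should be noted that, if the first instruction set migrated from the CPU to the coprocessor is a set of binary codes, the coprocessor also obtains from the CPU the other data needed to execute the first instruction set, including the register values related to the first instruction set, and stores those register values in its own registers. As in the case where the first instruction set is the one needed to execute the first thread, whether the CPU register addresses or the coprocessor register addresses are placed in the second instruction set in step A402 depends on whether the coprocessor registers and the CPU registers are of the same kind.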
步骤A403,所述协处理器执行所述第二指令集中的二进制代码。
如果CPU向协处理器迁移的是执行第一进程所需的第一指令集,第二操作系统在协处理器上,根据第一进程的进程状态和寄存器值确定第二进程的进程运行节点;从该进程运行节点开始,使用协处理器的寄存器,执行第二指令集中的二进制代码来运行第二进程。
如果CPU向协处理器迁移的是执行第一线程所需的第一指令集,第二操作系统在协处理器上,根据第一线程的线程状态和寄存器值确定第二线程的线程运行节点;从该线程运行节点开始,使用协处理器的寄存器,执行第二指令集中的二进制代码来运行第二线程。
如果CPU向协处理器迁移的第一指令集是二进制代码的集合,协处理器使用协处理器的寄存器,执行第二指令集。
本实施例中,协处理器针对CPU迁移来的第一指令集,即使根据CPU的指令集编译得到的第一指令集包含协处理器不能识别的二进制代码,执行步骤A402能够部分或全部转换该二进制代码并对应生成第二指令集,协处理器能够识别第二指令集的识别率大于能够识别第一指令集的识别率,协处理器执行第二指令集,为CPU省去了运行第一指令集所需的负荷。
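Optionally, once the coprocessor has finished executing the second instruction set in step A403, whether the execution result is returned to the CPU depends on the application. If the CPU is to perform further computer operations based on the result, the coprocessor returns the result to the CPU; if the coprocessor itself is to perform further operations based on the result, it need not return the result. For example, if the next computer operation is for the CPU to control a display module to display the execution result of the second instruction set, the coprocessor returns the result to the CPU; if the coprocessor can control the display module directly and the next operation is for the coprocessor to display the result, the coprocessor need not return the result to the CPU and instead controls the display module to display it.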
可选地,针对第一指令集包含表示所述第二部分子指令集中的计算机指令的二进制码这一场景,对步骤A402做一细化,参见图5,所述协处理器根据所述第一指令集获得第二指令集包括:
步骤A4021,所述协处理器在预先设置的翻译表中匹配所述第一指令集中的二进制代码的操作码,若所述第一指令集中的第一二进制代码的操作码在所述翻译表中被匹配到,则根据所述翻译表中所述第一二进制代码的操作码对应的匹配项,将所述第一二进制代码的操作码翻译为第二二进制代码的操作码,获得所述第二二进制代码,所述协处理器根据获得的至少一条所述第二二进制代码获得所述第二指令集,其中,所述翻译表包含相同的计算机指令分别编译生成的在所述第一操作系统和所述第二操作系统中不同操作码之间的对应关系,所述第二二进制代码为适用于所述第二操作系统的二进制代码。
具体地,对于上述第二部分子指令集包含的每个计算机指令,协处理器支持的指令集也包含相同计算机指令,但是,CPU支持的表示该计算机指令的二进制码与协处理器支持的表示该计算机指令的二进制码不同。为让协处理器识别出该计算机指令,建立了翻译表,该翻译表针对第二部分子指令集中的每个计算机指令,分别记录每条计算机指令的对应关系;该计算机指令的对应关系包含:CPU支持的表示该计算机指令的二进制码,和协处理器支持的表示该计算机指令的二进制码。
本实施例定义特定计算机指令为翻译表中记录有所述对应关系的计算机指令;因此,本实施例将第二部分子指令集包含的每个计算机指令的所述对应关系添加入翻译表,则第二部分子指令集包含的每个计算机指令均属于特定计算机指令。
协处理器的第二操作系统加载翻译表后,对于步骤A401从CPU向协处理器迁移的第一指令集,步骤A4021根据翻译表遍历该第一指令集中的每条二进制代码,来匹配查找是否存在第一二进制代码,该第一二进制代码的操作码为该翻译表中记录的CPU支持的二进制码(即CPU支持的表示第二部分子指令集中的计算机指令的二进制码)。
协处理器根据所述第一指令集获得第二指令集时,步骤A4021针对所述第一指令集中的每条第一二进制代码,替换该第一二进制代码中的操作码为翻译表记录的该操作码对应的匹配项,该匹配项为协处理器支持的表示特定计算机指令(该第一二进制代码中的操作码表示的特定计算机指令)的二进制码,将操作码替换所得的二进制代码作为第二二进制代码;在第二二进制代码中,该匹配项为该第二二进制代码的操作码;将第二二进制代码获取到第二指令集中。以此可知,协处理器根据所述第一指令集获得第二指令集时,第一指令集中的每条第一二进制代码都会转换为对应的第二二进制代码,将第二二进制代码获取到第二指令集。
举例说明,对于异或指令(XOR),CPU支持的表达该异或指令的二进制码根据比较的对象不同而有所不同;如果比较两个寄存器的值,该异或指令的二进制码表示为“0011001”,如果比较寄存器的值和内存的值,该异或指令的二进制码表示为“0011000”;但在协处理器中,异或指令(XOR)的二进制码统一表示为“0011000”,在翻译表中记录“0011001”与“0011000”的映射关系,协处理器根据翻译表从第一指令集的第一二进制码的操作码匹配到“0011001”,则将“0011000”作为第二二进制码的操作码,如果第一二进制码具有操作数,此处对如何根据第一二进制码的操作数生成第二二进制码的操作数不做限定。
另外,协处理器根据所述第一指令集获得第二指令集时,根据翻译表遍历查找出的不属于第一二进制代码的其他二进制代码,采用哪种方式获取与其他二进制代码对应的二进制代码到第二指令集不做限定。
进一步可选地,从步骤A402如何处理第一指令集包含表示所述第一部分子指令集和/或第三部分子指令集中的计算机指令的二进制码这一场景,对步骤A402做一细化,参见图5,所述协处理器根据所述第一指令集获得第二指令集还包括:
步骤A4022,若所述第一指令集中的第三二进制代码的操作码在所述翻译表中未被匹配到,则所述协处理器将所述第三二进制代码做为所述第二指令集中的二进制代码。
具体地,第三二进制代码属于步骤A4021根据翻译表从第一指令集中遍历查找出的不属于第一二进制代码的其他二进制代码。
如果第一指令集中的某条二进制代码的操作码为表示所述第一部分子指令集中的计算机指令的二进制码,则该条二进制代码的操作码不会在所述翻译表匹配到;如果第一指令集中的某条二进制代码的操作码为表示所述第三部分子指令集中的计算机指令的二进制码,则该条二进制代码的操作码不会在所述翻译表匹配到;因此,所述第三二进制代码的操作码,可能是表示所述第一部分子指令集中的计算机指令的二进制码,或者可能是所述第三部分子指令集中的计算机指令的二进制码。
协处理器根据所述第一指令集获得第二指令集时,步骤A4022针对所述第一指令集中不能在所述翻译表匹配到操作码的二进制代码(即第三二进制码),将该第三二进制代码从第一指令集直接获取到第二指令集。
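Optionally, step A402 may include step A4021 and/or step A4022; whether step A4021 or step A4022 is performed when step A402 is executed depends on the specific implementation scenario. In one scenario, if the first instruction set contains no specific computer instruction and the CPU and the coprocessor use the same kind of register to execute processes, step A402 includes step A4022; if the first instruction set contains a specific computer instruction, step A402 includes step A4021. FIG. 5 illustrates step A402 with step A4021 and/or step A4022.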
进一步可选地,如果在步骤A402中需执行步骤A4021和步骤A4022,可先执行步骤A4021再执行步骤A4022,或者步骤A4021和步骤A4022并行执行。
对步骤A4021和步骤A4022并行执行做一具体举例,在根据翻译表遍历第一指令集查找第一二进制代码时,每确定第一指令集中的一条二进制代码是否为第一二进制代码,便根据确定结果确定执行步骤A4021或者执行步骤A4022,具体地,如果确定结果为该条二进制代码为第一二进制代码,则执行步骤A4021将该第一二进制代码对应的第二二进制代码获取到第二指令集,如果确定结果为该条二进制代码为第三二进制代码,则执行步骤A4022将该第三二进制代码直接获取到第二指令集。
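The interleaved execution of steps A4021 and A4022 described above is essentially one pass over the first instruction set. A minimal sketch follows, assuming each binary code is an (opcode, operands) pair of strings and using the FSUB encodings quoted for Table 1; `derive_second_instruction_set` is a hypothetical name.

```python
def derive_second_instruction_set(first_set, translation_table):
    """Build the second instruction set in execution order.

    first_set: list of (opcode, operands) pairs compiled for the CPU.
    translation_table: CPU opcode -> coprocessor opcode.
    """
    second_set = []
    for opcode, operands in first_set:
        if opcode in translation_table:
            # Step A4021: a first binary code; translate only its opcode.
            second_set.append((translation_table[opcode], operands))
        else:
            # Step A4022: a third binary code; copy it unchanged.
            second_set.append((opcode, operands))
    return second_set


table = {"1101100011100000": "0000111101011100"}   # FSUB, per Table 1
first = [("1101100011100000", "0001"), ("1111", "")]
print(derive_second_instruction_set(first, table))
```

How operands of a translated code are produced is left open by the text; this sketch simply carries them over unchanged.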
可选地,根据翻译表遍历第一指令集查找第一二进制代码的查找顺序为:第一指令集中每条二进制代码的执行顺序。
可选地扩展,翻译表记录的对应关系(CPU支持的表示计算机指令的二进制码,和协处理器支持的表示该计算机指令的二进制码)可以是一条或多条,步骤A4021使用的上述翻译表记录了与第二部分子指令集中的每个计算机指令匹配的该对应关系,但此处的翻译表可以少于步骤A4021使用的翻译表中记录的对应关系的条数,因此此处的翻译表是可以更新的,如向翻译表添加与第二部分子指令集中的某个计算机指令匹配的该对应关系,删除翻译表中的一条或者多条该对应关系;从而提供一种可替换的步骤A4021,即在执行步骤A4021时使用此处的翻译表替换使用上述的翻译表。
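Optionally, for the scenario in which the registers of the CPU and the registers of the coprocessor are not of the same kind, the computer instruction processing method is further refined. Referring to FIG. 6, before the coprocessor executes the binary codes in the second instruction set, the method further includes step A601;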
步骤A601,所述协处理器将所述第二指令集包含的二进制代码中所述CPU的寄存器地址转换为所述协处理器的寄存器地址。
具体地,所述CPU的寄存器地址采用二进制码表示,所述协处理器的寄存器地址采用二进制码表示;CPU具有的寄存器与协处理器具有的寄存器不是同一种寄存器,表示CPU的寄存器地址的二进制码与表示协处理器的寄存器地址的二进制码不相同。
协处理器为使用自己的寄存器运行CPU向协处理器迁移的第二指令集,在执行第二指令集之前,查找第二指令集的二进制代码中是否包含所述CPU的寄存器地址,如果查找到,根据匹配替换关系替换第二指令集中查找到的所述CPU的寄存器地址为对应的协处理器的寄存器地址,该匹配替换关系为协处理器的寄存器地址与所述CPU的寄存器地址的映射关系。
可选地,作为步骤A601的替代方案,所述协处理器在步骤A402中根据所述第一指令集获得第二指令集时,便执行步骤A602将所述第一指令集包含的二进制代码中所述CPU的寄存器地址转换为所述协处理器的寄存器地址,将所述协处理器的寄存器地址获取到第二指令集中;这时,对步骤A602的替代方案与步骤A4021的执行顺序不做限定,通常,该替代方案与步骤A4021并行执行。
可选地,所述CPU的寄存器为128位的寄存器或者为256位的寄存器,所述协处理器的寄存器为512位的寄存器。
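Further optionally, the registers of the CPU are 128-bit XMM registers or 256-bit YMM registers, and the registers of the coprocessor are 512-bit ZMM registers. Whether in the CPU or in the coprocessor, the registers hold instructions, data and addresses temporarily while the second instruction set is executed. Compared with executing the second instruction set on XMM or YMM registers, executing it on ZMM registers can increase execution speed, so that execution of the second instruction set finishes sooner.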
进一步可选地,第一指令集中使用CPU的寄存器的计算机指令属于矢量化指令,第二指令集中使用协处理器的寄存器的计算机指令属于矢量化指令。在步骤A402中根据所述第一指令集获得第二指令集时,根据第一指令集中使用CPU的寄存器的矢量化指令,对应地将使用协处理器的寄存器的矢量化指令获取到第二指令集中。
可选地,所述第一指令集是所述CPU在所述CPU的CPU使用率大于第一阈值时向所述协处理器迁移的。
具体地,CPU执行第一指令集,可选地还可以并行执行一条或多条其他二进制代码。如果所述CPU使用率大于第一阈值,代表CPU使用率过高,将第一指令集迁移至协处理器,减小CPU的负荷。
下面以第一进程为例讲解如何筛选向协处理器迁移的第一指令集,当然,筛选第一进程的方式也适合筛选第一线程;筛选第一进程的方式详述如下:如果CPU仅运行一个进程,则该个进程为第一进程。如果CPU并行运行多个进程,对如何从CPU执行的多个进程中确定第一进程提供几种可选细化实现方式:
第一种可选细化实现方式,在CPU的CPU使用率大于第一阈值的当前,从优先级小于优先级阈值的进程选取一个或多个第一进程,优选地,选取优先级最低的进程作为第一进程;
第二种可选细化实现方式,在CPU的CPU使用率大于第一阈值的当前,从CPU占用率大于占用率阈值的进程选取一个或多个第一进程,优选地,选取CPU占用率最高的进程作为第一进程。
进一步可选地,参见图7,步骤A401所述协处理器接收所述CPU迁移的第一指令集,包括步骤A4011和步骤A4012。
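Step A4011: the coprocessor receives, from the CPU, the address of the first instruction set to be migrated, where the address of the first instruction set is the address at which the first instruction set is stored in the memory of the CPU, and the CPU sends this address to the coprocessor when the memory usage of the CPU is less than or equal to the second threshold;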
步骤A4012,所述协处理器基于所述第一指令集的地址访问所述CPU的内存来获取所述第一指令集。
具体地,CPU的CPU使用率大于第一阈值、CPU使用内存的内存使用率小于或等于第二阈值,代表CPU使用率过高但CPU使用内存的内存使用率没有过高;这种情况下,CPU向协处理器发送的是所述第一指令集的地址,可选地,所述第一指令集的地址为在CPU的内存中存储所述第一指令集的物理地址。协处理器在步骤A4011接收到所述第一指令集的地址后,执行步骤A4012根据所述第一指令集的地址访问所述CPU的内存,从所述CPU的内存读取第一指令集,再执行步骤A402根据第一指令集获取第二指令集,并将获取的第二指令集存储至CPU的内存中,一种可选的具体方式是,以获取的第二指令集替换CPU的内存中的第一指令集。
一种具体实现步骤A4011和步骤A4012的可选方式是,CPU在CPU使用率大于第一阈值、且CPU使用内存的内存使用率小于或等于第二阈值时,将与第一指令集相关的数据在CPU的内存中的存储地址发送至协处理器;而后,协处理器经总线(如PCI-E总线)根据该存储地址访问CPU的内存,从CPU的内存读取与第一指令集相关的数据,从与第一指令集相关的数据提取第一指令集,再执行步骤A402根据第一指令集获取第二指令集,并将获取的第二指令集存储至CPU的内存中,使用CPU的内存执行第二指令集。
其中,与第一指令集相关的数据包括第一指令集,还包括第一指令集的运行状态(如第一进程的进程状态)等其他执行第一指令集所需的数据。
可选地,所述第一指令集由所述CPU在所述CPU的内存使用率大于第二阈值时向所述协处理器发送。
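Specifically, if the memory usage of the memory used by the CPU exceeds the second threshold, the memory usage is too high. In this case, regardless of whether the CPU usage exceeds the first threshold, the CPU migrates the first instruction set to the coprocessor, and in step A401 the coprocessor receives the first instruction set and stores it in the coprocessor memory.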
一种具体的可选实现方式是,在CPU使用内存的内存使用率大于第二阈值时,CPU从其使用的内存中读取与第一指令集相关的数据,将读取的该与第一指令集相关的数据发送至协处理器;协处理器接收与第一指令集相关的数据,在协处理器的内存存储该与第一指令集相关的数据;继而协处理器再从与第一指令集相关的数据提取第一指令集,执行步骤A402根据第一指令集获取第二指令集,并将获取的第二指令集存储至协处理器的内存中,使用协处理器的内存执行第二指令集。
可选地,如果步骤A403执行第二指令集中出现异常,可采用以下四种可选方式中的任一种可选方式处理。
第一种可选方式,参见图8,步骤A403中所述协处理器执行所述第二指令集中的二进制代码,具体包括步骤A801、步骤A802、步骤A803和步骤A804。
步骤A801,所述协处理器依次执行所述第二指令集中的二进制代码;
步骤A802,如果检测到执行所述第二指令集时出现二进制代码识别异常,则确定触发异常的第四二进制代码;
步骤A803,将所述第四二进制代码转换为中间代码,再将所述中间代码转换为适用于所述第二操作系统的第五二进制代码;其中,第五二进制代码为一条或多条二进制代码;
步骤A804,执行所述第五二进制代码,并继续执行所述第二指令集中所述第四二进制代码之后的二进制代码。
具体地,步骤A402根据第一指令集获取第二指令集,虽然获取到的第二指令集中的二进制代码用于指示所述协处理器在所述第二操作系统中执行计算机操作,但第二指令集中的每条二进制代码不一定都能够被协处理器识别,对于协处理器执行到不能识别的二进制代码会触发二进制代码识别异常,本实施例定义触发二进制代码异常的二进制指令为第四二进制代码。
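One scenario in which a binary code in the second instruction set cannot be identified is as follows: if, in step A402, the binary encodings in the first instruction set that represent instructions of the third sub-instruction-set are copied directly into the second instruction set, then when the coprocessor, while running the second instruction set, reaches a binary code whose opcode is such an encoding, it cannot identify that binary code and a binary code identification exception is triggered. That binary code is therefore a fourth binary code.
While the coprocessor executes the binary codes of the second instruction set in sequence in step A801, if a binary code identification exception is detected in step A802, the fourth binary code that triggered it is determined. Step A803 then converts the fourth binary code into intermediate code and converts the intermediate code into a fifth binary code that the coprocessor can identify. Step A804 executes the fifth binary code and then continues with the binary codes that follow the fourth binary code in the second instruction set. The intermediate code has a mapping to the fourth binary code (one or more pieces of intermediate code may map to one fourth binary code) and also a mapping to the fifth binary code. Provided that both mappings exist, no limitation is placed on the concrete form of the intermediate code. For example, Java bytecode may serve as the intermediate code: the mapping between the fourth binary code and the corresponding Java bytecode is determined when the fourth binary code is converted into Java bytecode, and the mapping between the Java bytecode and the fifth binary code is determined when the Java bytecode is converted into the fifth binary code. Similarly, the conversion in step A803 may instead use the intermediate code of the simics emulator or of the qemu emulator.
For example, the MIC does not support the 22 computer instructions of the third sub-instruction-set described above, such as CMOV, OUT, PAUSE and SYSEXIT. When the MIC reaches a fourth binary code whose opcode is the encoding of such an instruction, it cannot identify it, and a binary code identification exception occurs. Taking the conditional move instruction (CMOV) as an example: when the MIC reaches a fourth binary code whose opcode is the encoding of this instruction, a binary code identification exception is triggered and is detected in step A802. In step A803, a condition test instruction and a move instruction expressed in intermediate code are determined from the conditional move instruction, and these are then translated into a condition test instruction and a move instruction (MOV) that the MIC can identify. In step A804 the MIC first executes the condition test instruction to determine whether the condition is met and, if so, executes the move instruction (MOV); it then executes the binary codes that follow the fourth binary code in the second instruction set. It should be noted that the operands of the condition test instruction and the move instruction in intermediate code are determined from the operands of the conditional move instruction (CMOV), and the operands of the MIC condition test instruction and move instruction (MOV) are in turn determined from those intermediate-code operands. Under the classification of the 22 instructions given above, for instructions of the first and third classes, step A803 can convert a fourth binary code whose opcode is the encoding of such an instruction into fifth binary code via intermediate code. One fourth binary code may yield one or more fifth binary codes; no limitation is placed on their number.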
可选地,步骤A403执行第二指令集出现异常,暂停执行第二指令集,并输出异常信息,所述异常信息包括触发异常的第四二进制代码、异常类型、异常执行结果等等;在可选的具体实施中,实时将执行第二指令集的状态信息等写入第二指令集的运行日志中,包括将执行第二指令集出现异常的异常信息也会写入第二指令集的运行日志中。如果根据异常信息确定异常为二进制代码识别异常,则步骤A802根据异常信息确定触发异常的第四二进制代码。
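In the first optional manner, even if a binary code identification exception occurs while the coprocessor executes the second instruction set, the fourth binary code can be converted indirectly, via intermediate code, into a fifth binary code supported by the coprocessor, and execution of the second instruction set continues from the fifth binary code. The exception is thus overcome and normal execution of the second instruction set is maintained. By the same reasoning, each such exception that arises during execution of the second instruction set can be overcome, with execution continuing after each one.
Optionally, to optimize the execution efficiency of the coprocessor, an extended instruction set supported only by the coprocessor is typically developed for its special applications; the computer instructions of this extended instruction set (expressed as binary encodings) are supported by the coprocessor but not by the CPU. When the mapping between the opcodes expressed in intermediate code and one or more instructions of the coprocessor instruction set is determined in advance, it may include a mapping to one or more instructions of the extended instruction set. Consequently, the opcode of a fifth binary code produced in step A803 may be the encoding of an instruction of the extended instruction set. Converting the fourth binary code indirectly into binary code whose opcode is such an encoding not only resolves the exception indicated by the exception information but can also improve the efficiency with which the coprocessor executes the second instruction set.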
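Steps A801 to A804 of the first optional manner can be sketched as an execution loop that recovers from unrecognized codes. This is a simulation under stated assumptions: "executing" a code just appends it to a trace, opcodes are mnemonic strings, and `lower_via_intermediate` stands in for the two-stage conversion through intermediate code.

```python
class UnrecognizedBinaryCode(Exception):
    """Raised when the simulated coprocessor meets an unsupported opcode."""


def execute_one(code, supported, trace):
    if code not in supported:
        raise UnrecognizedBinaryCode(code)
    trace.append(code)   # stand-in for actually executing the code


def run_second_instruction_set(second_set, supported, lower_via_intermediate):
    """Execute in order; on an identification exception, lower and continue."""
    trace = []
    for code in second_set:                            # step A801
        try:
            execute_one(code, supported, trace)
        except UnrecognizedBinaryCode:                 # step A802: fourth binary code
            for fifth in lower_via_intermediate(code):     # step A803
                execute_one(fifth, supported, trace)       # step A804
    return trace


def lower(code):
    # Hypothetical lowering: only CMOV is handled in this sketch.
    return ["TEST", "MOV"] if code == "CMOV" else []


print(run_second_instruction_set(
    ["ADD", "CMOV", "SUB"], {"ADD", "SUB", "TEST", "MOV"}, lower))
```

After the lowered codes run, the loop simply moves on to the code that followed the faulting one, which is the behaviour the paragraph describes.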
第二种可选方式,参见图9,在步骤A803将所述第四二进制代码转换为中间代码之前,包括步骤A901和步骤A902。
步骤A901,向所述CPU发送指令集回迁请求;
步骤A902,接收所述CPU发送的拒绝回迁指令。
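Specifically, when a binary code identification exception occurs while the second instruction set is executed in step A403, step A901 is performed to send an instruction set back-migration request to the CPU. The CPU responds to the request and determines whether to migrate the second instruction set back for execution on the CPU; if it decides not to, it returns a back-migration rejection instruction to the coprocessor. When the coprocessor receives the rejection instruction in step A902, it performs step A803 to convert the fourth binary code into intermediate code.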
CPU响应该指令集回迁请求的一种可选方式是,根据CPU的负荷确定是否将第二指令集回迁CPU执行;该CPU的负荷包括:CPU使用率,和CPU使用内存的内存使用率。如果所述CPU的CPU使用率大于所述第三阈值,或者如果所述CPU使用内存的内存使用率大于第四阈值,则向所述协处理器发送拒绝回迁指令。
第三种可选方式,参见图10,步骤A403中所述协处理器执行所述第二指令集中的二进制代码,包括步骤B1001、步骤B1002和步骤B1003。
步骤B1001,所述协处理器依次执行所述第二指令集中的二进制代码;
步骤B1002,如果检测到执行所述第二指令集时出现二进制代码识别异常,则确定触发异常的第六二进制代码;
步骤B1003,根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集,并向所述CPU迁移所述第三指令集。
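Specifically, step A402 obtains the second instruction set from the first instruction set, but not every binary code in the second instruction set can necessarily be identified by the coprocessor; when the coprocessor reaches a binary code that it cannot identify, a binary code identification exception is triggered. This embodiment defines the binary code that triggers the exception as the sixth binary code. The sixth binary code is defined on the same principle as the fourth binary code; see the explanation of the fourth binary code in the first optional manner. Likewise, if, in step A402, the binary encodings in the first instruction set that represent instructions of the third sub-instruction-set are copied directly into the second instruction set, then when the coprocessor, while running the second instruction set, reaches a binary code whose opcode is such an encoding, it cannot identify that binary code and a binary code identification exception is triggered; that binary code is therefore a sixth binary code.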
与第一种可选方式不同的是,所述协处理器在步骤B1001依次执行所述第二指令集中的二进制代码的过程中,若步骤B1002检测到二进制代码识别异常并确定触发该二进制代码识别异常的第六二进制代码后,执行步骤B1003来处理该二进制代码识别异常。
协处理器在步骤B1003根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集,与步骤A402根据所述第一指令集获得第二指令集的实现原理相同,可参照上述对步骤A402的相关解释,及对步骤A402的可选细化的相关解释,例如对步骤A4021、步骤A4022等的相关解释。与步骤A4021对应地,步骤B1003根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集时,根据翻译表遍历所述第六二进制代码开始的第二指令集中的每条二进制代码,匹配查找是否存在第七二进制代码,该第七二进制代码的操作码为该翻译表中记录的协处理器支持的二进制码,替换该第七二进制代码中的操作码为翻译表记录的该操作码对应的匹配项,该匹配项为CPU支持的表示特定计算机指令(该第七二进制代码中的操作码表示的特定计算机指令)的二进制码,将操作码替换后的二进制代码作为第八二进制代码;在第八二进制代码中,该匹配项为该第八二进制代码的操作码;将第八二进制代码获取到第三指令集中。以此可知,步骤B1003根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集时,所述第六二进制代码开始的第二指令集中的每条第七二进制代码都会转换为对应的第八二进制代码,将第八二进制代码获取到第三指令集。
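Step B1003 mirrors step A402 in the opposite direction and applies only to the codes from the sixth binary code onward. A minimal sketch follows, assuming (opcode, operands) pairs, a known index for the faulting code, and a reverse table (coprocessor encoding to CPU encoding) built in advance; the function and parameter names are hypothetical.

```python
def derive_third_instruction_set(second_set, fault_index, reverse_table):
    """Re-target the unexecuted tail of the second instruction set at the CPU.

    second_set: list of (opcode, operands) pairs for the coprocessor.
    fault_index: position of the sixth binary code (the one that faulted).
    reverse_table: coprocessor opcode -> CPU opcode.
    """
    third_set = []
    for opcode, operands in second_set[fault_index:]:
        # A seventh binary code has an opcode found in the table and becomes
        # an eighth binary code; anything else is copied unchanged.
        third_set.append((reverse_table.get(opcode, opcode), operands))
    return third_set


reverse = {"0000111101011100": "1101100011100000"}   # FSUB, per Table 1
second = [("1010", ""), ("EEEE", ""), ("0000111101011100", "0001")]
print(derive_third_instruction_set(second, 1, reverse))
```

Where several CPU encodings share one coprocessor encoding, the reverse table has to fix one of them; the text does not say which.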
另可选地,如果协处理器的寄存器与CPU的寄存器不是同一种寄存器,有两种处理方式:
第一种处理方式,由协处理器处理;具体地,所述协处理器在步骤B1003根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集时,便执行步骤B1003将第二指令集中所述第六二进制代码开始的二进制代码包含的所述协处理器的寄存器地址转换为所述CPU的寄存器地址,将所述CPU的寄存器地址获取到第三指令集中;这时,对根据翻译表获取第八二进制代码和第一种处理方式转换寄存器地址的执行顺序不做限定,通常并行执行;
第二种处理方式,由CPU处理;具体地,CPU接收到协处理器在步骤B1003迁移的第三指令集后,将第三指令集中所述协处理器的寄存器地址替换为所述CPU的寄存器地址。
可选地,如果第二指令集存储在CPU的内存中,则步骤B1003根据第二指令集在CPU的内存中的存储地址,以第三指令集替换CPU的内存中存储的第二指令集。如果第二指令集存储在协处理器的内存中,则步骤B1003将第三指令集向CPU迁移,使得CPU将第三指令集存储在CPU的内存中。
本可选方式中,协处理器执行第六二进制代码触发二进制代码识别异常,则向CPU迁移未执行完的第二指令集;向CPU迁移未执行完的第二指令集的过程中,如果是由协处理器的内存中存储的与该未执行完的第二指令集相关的数据,将协处理器的内存中存储的与该未执行完的第二指令集相关的数据向CPU发送,与该未执行完的第二指令集相关的数据包括:向CPU迁移的第三指令集,以便CPU将该未执行完的第二指令集相关的数据存储至CPU的内存中。另外,协处理器还将与该未执行完的第二指令集相关的寄存器值向CPU发送,CPU将与该未执行完的第二指令集相关的寄存器值存储至CPU的寄存器(与第三指令集中的寄存器地址对应)中。
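In the fourth optional manner, referring to FIG. 11, before step B1003 obtains the third instruction set applicable to the first operating system from the binary codes of the second instruction set starting at the sixth binary code, the method further includes step B1101 and step B1102.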
步骤B1101,向所述CPU发送指令集回迁请求;
步骤B1102,接收所述CPU发送的指令集回迁响应。
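Specifically, when a binary code identification exception occurs while the second instruction set is executed in step A403, step B1101 is performed to send an instruction set back-migration request to the CPU. The CPU responds to the request and determines whether to migrate the second instruction set back for execution on the CPU; if it decides to do so, it returns an instruction set back-migration response to the coprocessor. When the coprocessor receives the response in step B1102, it performs step B1003 to obtain the third instruction set applicable to the first operating system from the binary codes of the second instruction set starting at the sixth binary code.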
CPU响应该指令集回迁请求的一种可选方式是,根据CPU的负荷确定是否将第二指令集回迁CPU执行;该CPU的负荷包括:CPU使用率,和CPU使用内存的内存使用率。如果所述CPU的CPU使用率小于或等于第五阈值、且所述CPU使用内存的内存使用率小于或等于第六阈值,则向所述协处理器发送指令集回迁响应。
本发明一实施例,图12是本实施例的协处理器202的一种可选逻辑结构示意图;所述协处理器202应用于处理器系统,所述处理器系统包括所述协处理器202和运行第一操作系统的中央处理器(CPU),所述协处理器202上运行第二操作系统;
所述协处理器202包括:
第一指令集接收单元2021,用于接收所述中央处理器迁移的第一指令集,所述第一指令集用于指示所述中央处理器在所述第一操作系统中执行计算机操作,所述第一指令集为适用于所述第一操作系统的二进制代码的集合;
第二指令集获得单元2022,用于根据所述第一指令集获得第二指令集,其中,所述第二指令集中的二进制代码用于指示所述协处理器202在所述第二操作系统中执行所述计算机操作;
第二指令集执行单元2023,用于执行所述第二指令集中的二进制代码。
可选地,所述第二指令集获得单元2022,用于根据所述第一指令集获得第二指令集,包括:
所述第二指令集获得单元2022,用于在预先设置的翻译表中匹配所述第一指令集中的二进制代码的操作码,若所述第一指令集中的第一二进制代码的操作码在所述翻译表中被匹配到,则根据所述翻译表中所述第一二进制代码的操作码对应的匹配项,将所述第一二进制代码的操作码翻译为第二二进制代码的操作码,获得所述第二二进制代码,根据获得的至少一条所述第二二进制代码获得所述第二指令集,其中,所述翻译表包含相同的计算机指令分别编译生成的在所述第一操作系统和所述第二操作系统中不同的操作码之间的对应关系,所述第二二进制代码为适用于所述第二操作系统的二进制代码。
可选地,参见图13,所述协处理器202还包括:
寄存器地址转换单元2024,用于将所述第二指令集包含的二进制代码中所述中央处理器的寄存器地址转换为所述协处理器202的寄存器地址。
可选地,所述第二指令集获得单元2022,还用于若所述第一指令集中的第三二进制代码的操作码在所述翻译表中未被匹配到,则所述协处理器202将所述第三二进制代码做为所述第二指令集中的二进制代码。
可选地,所述第一指令集是所述中央处理器在所述中央处理器的CPU使用率大于第一阈值时向所述协处理器202迁移的。
可选地,所述第一指令集接收单元2021,用于接收所述中央处理器迁移的第一指令集,包括:
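The first instruction set receiving unit 2021 is configured to receive, from the central processing unit, the address of the first instruction set to be migrated, and to access the memory of the central processing unit based on that address to obtain the first instruction set; the address of the first instruction set is the address at which the first instruction set is stored in the memory of the central processing unit, and the central processing unit sends this address to the coprocessor 202 when the memory usage of the central processing unit is less than or equal to the second threshold.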
可选地,所述第一指令集由所述中央处理器在所述中央处理器的内存使用率大于第二阈值时向所述协处理器202发送。
可选地,所述第二指令集执行单元2023,用于执行所述第二指令集中的二进制代码,包括:
所述第二指令集执行单元2023,用于依次执行所述第二指令集中的二进制代码;如果检测到执行所述第二指令集时出现二进制代码识别异常,则确定触发异常的第四二进制代码,将所述第四二进制代码转换为中间代码,再将所述中间代码转换为适用于所述第二操作系统的第五二进制代码,执行所述第五二进制代码,并继续执行所述第二指令集中所述第四二进制代码之后的二进制代码。
可选地,所述第二指令集执行单元2023,还用于在将所述第四二进制代码转换为中间代码之前,向所述中央处理器发送指令集回迁请求,接收所述中央处理器发送的拒绝回迁指令。
可选地,所述第二指令集执行单元2023,用于执行所述第二指令集中的二进制代码,包括:
所述第二指令集执行单元2023,用于依次执行所述第二指令集中的二进制代码;如果检测到执行所述第二指令集时出现二进制代码识别异常,则确定触发异常的第六二进制代码;根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集,并向所述中央处理器迁移所述第三指令集。
可选地,所述第二指令集执行单元2023,还用于根据所述第二指令集中所述第六二进制代码开始的二进制代码获取适用于所述第一操作系统的第三指令集之前,向所述中央处理器发送指令集回迁请求,接收所述中央处理器发送的指令集回迁响应。
本发明一实施例,图14是本实施例提供的协处理器1401的硬件结构示意图,示出了所述协处理器1401的一种硬件结构。
如图14所示,协处理器1401与存储器1402通过总线1403连接,所述存储器1402用于存储计算机执行指令,所述协处理器1401读取所述存储器1402存储的所述计算机执行指令,执行上述实施例提供的计算机指令处理方法。该计算机指令处理方法的具体实现,参见上述实施例对该计算机指令处理方法的相关描述,在此不再赘述。
其中,协处理器1401可以采用因特尔集成众核架构(Many Integrated Core,简称MIC),微处理器,应用专用集成电路(Application Specific Integrated Circuit,ASIC),或者一个或多个集成电路,用于执行相关程序,以实现上述方法实施例所提供的技术方案,包括执行上述实施例提供的计算机指令处理方法。
其中,存储器1402可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。存储器1402可以存储操作系统和其他应用程序。在通过软件或者固件来实现上述方法实施例提供的技术方案时,用于实现上述方法实施例提供的技术方案的程序代码保存在存储器1402中,包括将应用于所述协处理器1401的上述实施例提供的计算机指令处理方法的程序代码保存在存储器1402中,并由协处理器1401来执行。
其中,总线1403可包括一通路,用于在所述协处理器1401中各个部件与存储器1402之间传送信息。
应注意,尽管图14所示的所述协处理器1401仅仅示出了协处理器1401、存储器1402以及总线1403,但是在具体实现过程中,本领域的技术人员应当明白,所述协处理器1401还包含实现正常运行所必须的其他器件,例如通信接口。同时,根据具体需要,本领域的技术人员应当明白,所述协处理器1401还可包含实现其他附加功能的硬件器件。此外,本领域的技术人员应当明白,所述协处理器1401也可仅仅包含实现上述方法实施例所必须的器件,而不必包含图14中所示的全部器件。
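An embodiment of the present invention provides a system 200. Referring to FIG. 2, the processor system 200 includes a central processing unit 201 (CPU) and a coprocessor 202; a first operating system runs on the CPU, and a second operating system runs on the coprocessor 202;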
所述中央处理器201,用于向协处理器202迁移第一指令集;
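The coprocessor 202 is configured to perform the computer instruction processing method provided by the foregoing embodiments or by any optional refinement of the foregoing embodiments.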
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,设备和方法,可以通过其它的方式实现。例如,以上所描述的设备实施例仅仅是示意性的,例如,所述模块和单元的划分,仅仅为一种逻辑功能划分,实现时可以有另外的划分方式,例如多个模块或单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,设备或模块的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,作为模块的部件可以是或者也可以不是物理模块,即可以位于一个地方,或者也可以分布到多个网络模块上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能模块可以集成在一个处理模块中,也可以是各个模块单独物理存在,也可以两个或两个以上模块集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用硬件加软件功能模块的形式实现。
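A module integrated in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.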
总之,以上所述仅为本发明技术方案的较佳实施例而已,并非用于限定本发明的保护范围。凡在本发明的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本发明的保护范围之内。

Claims (24)

  1. 一种计算机指令处理方法,应用于处理器系统,所述处理器系统包括协处理器和中央处理器CPU,其特征在于,所述CPU上运行第一操作系统,所述协处理器上运行第二操作系统;
    所述方法包括:
    所述协处理器接收所述CPU迁移的第一指令集,所述第一指令集用于指示所述CPU在所述第一操作系统中执行计算机操作,所述第一指令集为适用于所述第一操作系统的二进制代码的集合;
    所述协处理器根据所述第一指令集获得第二指令集,其中,所述第二指令集中的二进制代码用于指示所述协处理器在所述第二操作系统中执行所述计算机操作;
    所述协处理器执行所述第二指令集中的二进制代码。
  2. 根据权利要求1所述的方法,其特征在于,所述协处理器根据所述第一指令集获得第二指令集包括:
    所述协处理器在预先设置的翻译表中匹配所述第一指令集中的二进制代码的操作码,若所述第一指令集中的第一二进制代码的操作码在所述翻译表中被匹配到,则根据所述翻译表中所述第一二进制代码的操作码对应的匹配项,将所述第一二进制代码的操作码翻译为第二二进制代码的操作码,获得所述第二二进制代码,所述协处理器根据获得的至少一条所述第二二进制代码获得所述第二指令集,其中,所述翻译表包含相同的计算机指令分别编译生成的在所述第一操作系统和所述第二操作系统中不同的操作码之间的对应关系,所述第二二进制代码为适用于所述第二操作系统的二进制代码。
  3. 根据权利要求1或2所述的方法,其特征在于,在所述协处理器执行所述第二指令集中的二进制代码之前,所述方法还包括:
    所述协处理器将所述第二指令集包含的二进制代码中所述CPU的寄存器地址转换为所述协处理器的寄存器地址。
  4. 根据权利要求2或3所述的方法,其特征在于,所述协处理器根据所述第一指令集获得第二指令集还包括:
    若所述第一指令集中的第三二进制代码的操作码在所述翻译表中未被匹配到,则所述协处理器将所述第三二进制代码做为所述第二指令集中的二进制代码。
  5. 根据权利要求1至4任一项所述的方法,其特征在于,所述第一指令集是所述CPU在所述CPU的CPU使用率大于第一阈值时向所述协处理器迁移的。
  6. 根据权利要求5所述的方法,其特征在于,所述协处理器接收所述CPU迁移的第一指令集,包括:
    所述协处理器接收所述CPU发送的要迁移的所述第一指令集的地址,所述第一指令集的地址是指所述第一指令集在所述CPU的内存中存储的地址,其中,所述第一指令集的地址由所述CPU在所述CPU的内存使用率小于或等于第二阈值时向所述协处理器发送;
    所述协处理器基于所述第一指令集的地址访问所述CPU的内存来获取所述第一指令集。
  7. 根据权利要求1至4任一项所述的方法,其特征在于,所述第一指令集由所述CPU在所述CPU的内存使用率大于第二阈值时向所述协处理器发送。
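  8. The method according to any one of claims 1 to 7, characterized in that the executing, by the coprocessor, of the binary codes in the second instruction set comprises:
    executing, by the coprocessor, the binary codes in the second instruction set in sequence;
    if a binary code recognition exception is detected during execution of the second instruction set, determining a fourth binary code that triggered the exception;
    converting the fourth binary code into intermediate code, and then converting the intermediate code into a fifth binary code applicable to the second operating system; and
    executing the fifth binary code, and continuing to execute the binary codes that follow the fourth binary code in the second instruction set.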
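  9. The method according to claim 8, characterized in that, before the converting of the fourth binary code into intermediate code, the method further comprises:
    sending an instruction set migrate-back request to the CPU; and
    receiving a migrate-back rejection instruction sent by the CPU.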
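  10. The method according to any one of claims 1 to 7, characterized in that the executing, by the coprocessor, of the binary codes in the second instruction set comprises:
    executing, by the coprocessor, the binary codes in the second instruction set in sequence;
    if a binary code recognition exception is detected during execution of the second instruction set, determining a sixth binary code that triggered the exception; and
    obtaining, according to the binary codes in the second instruction set starting from the sixth binary code, a third instruction set applicable to the first operating system, and migrating the third instruction set to the CPU.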
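  11. The method according to claim 10, characterized in that, before the obtaining, according to the binary codes in the second instruction set starting from the sixth binary code, of the third instruction set applicable to the first operating system, the method further comprises:
    sending an instruction set migrate-back request to the CPU; and
    receiving an instruction set migrate-back response sent by the CPU.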
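  12. A coprocessor, applied to a processor system, the processor system comprising the coprocessor and a central processing unit (CPU) running a first operating system, characterized in that a second operating system runs on the coprocessor;
    the coprocessor comprises:
    a first instruction set receiving unit, configured to receive a first instruction set migrated by the CPU, wherein the first instruction set is used to instruct the CPU to perform a computer operation in the first operating system, and the first instruction set is a set of binary codes applicable to the first operating system;
    a second instruction set obtaining unit, configured to obtain a second instruction set according to the first instruction set, wherein the binary codes in the second instruction set are used to instruct the coprocessor to perform the computer operation in the second operating system; and
    a second instruction set execution unit, configured to execute the binary codes in the second instruction set.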
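  13. The coprocessor according to claim 12, characterized in that the second instruction set obtaining unit being configured to obtain the second instruction set according to the first instruction set comprises:
    the second instruction set obtaining unit is configured to match the operation codes of the binary codes in the first instruction set against a preset translation table; if the operation code of a first binary code in the first instruction set is matched in the translation table, translate the operation code of the first binary code into the operation code of a second binary code according to the entry in the translation table that corresponds to the operation code of the first binary code, to obtain the second binary code; and obtain the second instruction set according to at least one obtained second binary code, wherein the translation table contains the correspondence between the different operation codes that the same computer instruction yields when compiled for the first operating system and for the second operating system respectively, and the second binary code is a binary code applicable to the second operating system.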
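  14. The coprocessor according to claim 12 or 13, characterized in that the coprocessor further comprises:
    a register address conversion unit, configured to convert the register addresses of the CPU in the binary codes contained in the second instruction set into register addresses of the coprocessor.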
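  15. The coprocessor according to claim 13 or 14, characterized in that the second instruction set obtaining unit is further configured to, if the operation code of a third binary code in the first instruction set is not matched in the translation table, use the third binary code as a binary code in the second instruction set.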
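  16. The coprocessor according to any one of claims 12 to 15, characterized in that the first instruction set is migrated by the CPU to the coprocessor when the CPU usage of the CPU is greater than a first threshold.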
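  17. The coprocessor according to claim 16, characterized in that the first instruction set receiving unit being configured to receive the first instruction set migrated by the CPU comprises:
    the first instruction set receiving unit is configured to receive the address, sent by the CPU, of the first instruction set to be migrated, and to access the memory of the CPU based on the address of the first instruction set to obtain the first instruction set, wherein the address of the first instruction set is the address at which the first instruction set is stored in the memory of the CPU, and the address of the first instruction set is sent by the CPU to the coprocessor when the memory usage of the CPU is less than or equal to a second threshold.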
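  18. The coprocessor according to any one of claims 12 to 15, characterized in that the first instruction set is sent by the CPU to the coprocessor when the memory usage of the CPU is greater than a second threshold.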
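  19. The coprocessor according to any one of claims 12 to 18, characterized in that the second instruction set execution unit being configured to execute the binary codes in the second instruction set comprises:
    the second instruction set execution unit is configured to execute the binary codes in the second instruction set in sequence; if a binary code recognition exception is detected during execution of the second instruction set, determine a fourth binary code that triggered the exception, convert the fourth binary code into intermediate code, then convert the intermediate code into a fifth binary code applicable to the second operating system, execute the fifth binary code, and continue to execute the binary codes that follow the fourth binary code in the second instruction set.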
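  20. The coprocessor according to claim 19, characterized in that the second instruction set execution unit is further configured to, before converting the fourth binary code into intermediate code, send an instruction set migrate-back request to the CPU and receive a migrate-back rejection instruction sent by the CPU.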
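  21. The coprocessor according to any one of claims 12 to 18, characterized in that the second instruction set execution unit being configured to execute the binary codes in the second instruction set comprises:
    the second instruction set execution unit is configured to execute the binary codes in the second instruction set in sequence; if a binary code recognition exception is detected during execution of the second instruction set, determine a sixth binary code that triggered the exception; and obtain, according to the binary codes in the second instruction set starting from the sixth binary code, a third instruction set applicable to the first operating system, and migrate the third instruction set to the CPU.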
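  22. The coprocessor according to claim 21, characterized in that the second instruction set execution unit is further configured to, before obtaining, according to the binary codes in the second instruction set starting from the sixth binary code, the third instruction set applicable to the first operating system, send an instruction set migrate-back request to the CPU and receive an instruction set migrate-back response sent by the CPU.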
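  23. A coprocessor, characterized in that the coprocessor is connected to a memory through a bus, the memory is configured to store computer-executable instructions, and the coprocessor reads the computer-executable instructions stored in the memory to perform the computer instruction processing method according to any one of claims 1 to 11.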
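  24. A processor system, comprising a central processing unit (CPU) and a coprocessor, characterized in that a first operating system runs on the CPU and a second operating system runs on the coprocessor;
    the CPU is configured to migrate a first instruction set to the coprocessor; and
    the coprocessor is configured to perform the computer instruction processing method according to any one of claims 1 to 11.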
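
As an illustration of claims 2 to 4, the following Python sketch shows one way the table lookup, the pass-through of unmatched codes, and the register address conversion could fit together. It is not taken from the patent: the `BinaryCode` layout, the contents of `TRANSLATION_TABLE` and `REGISTER_MAP`, and both function names are hypothetical, and leaving unmapped register addresses unchanged is an assumption.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BinaryCode:
    opcode: int    # operation code
    register: int  # register address operand (one operand, for brevity)


# Hypothetical translation table: first-OS opcode -> second-OS opcode.
TRANSLATION_TABLE = {0x10: 0xA0, 0x11: 0xA1}

# Hypothetical map: CPU register address -> coprocessor register address.
REGISTER_MAP = {0: 8, 1: 9}


def obtain_second_instruction_set(first_set):
    """Translate matched opcodes; reuse unmatched binary codes unchanged."""
    second_set = []
    for code in first_set:
        new_opcode = TRANSLATION_TABLE.get(code.opcode)
        if new_opcode is None:
            # Claim 4: opcode not in the table, keep the code as it is.
            second_set.append(code)
        else:
            # Claim 2: swap in the opcode for the second operating system.
            second_set.append(replace(code, opcode=new_opcode))
    return second_set


def convert_register_addresses(second_set):
    """Claim 3: rewrite CPU register addresses to coprocessor addresses.

    Addresses missing from REGISTER_MAP are left as they are (an assumption).
    """
    return [
        replace(code, register=REGISTER_MAP.get(code.register, code.register))
        for code in second_set
    ]
```

Calling `convert_register_addresses(obtain_second_instruction_set(first_set))` on a list of `BinaryCode` values yields the list the coprocessor would then execute in this simplified model.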
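
Claims 5 to 7 tie the form of the migration to two thresholds. The sketch below is one possible reading of how a CPU-side policy could combine them; the `Migration` enum, the function name, and the default threshold values are hypothetical and are not specified in the claims.

```python
from enum import Enum


class Migration(Enum):
    NONE = "none"                            # keep the work on the CPU
    SEND_ADDRESS = "send_address"            # claim 6: send only the address
    SEND_INSTRUCTIONS = "send_instructions"  # claim 7: send the set itself


def choose_migration(cpu_usage, memory_usage,
                     first_threshold=0.8, second_threshold=0.7):
    """Pick a migration form from CPU usage and CPU memory usage (0.0 to 1.0).

    The threshold defaults are placeholders, not values from the patent.
    """
    if memory_usage > second_threshold:
        # Claim 7: memory usage above the second threshold, send the
        # first instruction set to the coprocessor.
        return Migration.SEND_INSTRUCTIONS
    if cpu_usage > first_threshold:
        # Claims 5 and 6: CPU usage above the first threshold and memory
        # usage at or below the second, send the address so the
        # coprocessor can read the set from CPU memory.
        return Migration.SEND_ADDRESS
    return Migration.NONE
```

In this reading the memory check comes first because claim 7 does not depend on claim 5, so a high memory usage alone is treated as sufficient to send the instruction set.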
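
The two exception paths of claims 8 to 11 can be sketched as a single loop. Combining them behind one migrate-back decision is an assumption made for illustration, and every name here (`UnrecognizedCode`, `run_second_instruction_set`, and the callables passed in) is hypothetical; the claims do not prescribe this structure.

```python
class UnrecognizedCode(Exception):
    """Stands in for the binary code recognition exception."""


def run_second_instruction_set(second_set, execute, cpu_accepts_migrate_back,
                               to_intermediate, from_intermediate,
                               back_translate):
    """Execute binary codes in order and handle a recognition exception.

    Returns None when every code ran on the coprocessor, or the third
    instruction set (a list) that should be migrated back to the CPU.
    """
    for index, code in enumerate(second_set):
        try:
            execute(code)
        except UnrecognizedCode:
            # Claims 9 and 11: ask the CPU whether it takes the work back.
            if cpu_accepts_migrate_back():
                # Claim 10: from the faulting code onward, produce codes
                # applicable to the first operating system.
                return [back_translate(c) for c in second_set[index:]]
            # Claim 8: faulting code -> intermediate code -> new binary
            # code for the second operating system, then carry on.
            execute(from_intermediate(to_intermediate(code)))
    return None
```

The callables keep the sketch independent of any concrete instruction format: `execute` raises `UnrecognizedCode` for a code the coprocessor cannot identify, and the three translation functions stand in for whatever translator an implementation provides.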
PCT/CN2016/073942 2015-06-17 2016-02-17 Computer instruction processing method, coprocessor, and system Ceased WO2016202001A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16810742.3A EP3301567B1 (en) 2015-06-17 2016-02-17 Computer instruction processing method, coprocessor, and system
US15/844,191 US10514929B2 (en) 2015-06-17 2017-12-15 Computer instruction processing method, coprocessor, and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510336409.5A CN106325819B (zh) 2015-06-17 2015-06-17 Computer instruction processing method, coprocessor, and system
CN201510336409.5 2015-06-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/844,191 Continuation US10514929B2 (en) 2015-06-17 2017-12-15 Computer instruction processing method, coprocessor, and system

Publications (1)

Publication Number Publication Date
WO2016202001A1 (zh) 2016-12-22

Family

ID=57545019

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/073942 Ceased WO2016202001A1 (zh) 2015-06-17 2016-02-17 Computer instruction processing method, coprocessor, and system

Country Status (4)

Country Link
US (1) US10514929B2 (zh)
EP (1) EP3301567B1 (zh)
CN (1) CN106325819B (zh)
WO (1) WO2016202001A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107489A1 (en) * 2015-06-17 2018-04-19 Huawei Technologies Co.,Ltd. Computer instruction processing method, coprocessor, and system
US11119941B2 (en) 2017-10-31 2021-09-14 Hewlett Packard Enterprise Development Lp Capability enforcement controller
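WO2024001699A1 (zh) * 2022-06-28 2024-01-04 Huawei Technologies Co., Ltd. Shader input data processing method and graphics processing apparatus
CN117687626A (zh) * 2024-02-04 2024-03-12 双一力(宁波)电池有限公司 System and method for matching a host computer and a main program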

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157164B2 (en) * 2016-09-20 2018-12-18 Qualcomm Incorporated Hierarchical synthesis of computer machine instructions
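CN108021454A (zh) * 2017-12-28 2018-05-11 努比亚技术有限公司 Processor load balancing method, terminal, and computer storage medium
CN108363726A (zh) * 2018-01-10 2018-08-03 郑州云海信息技术有限公司 Improved method and system for querying CPU and OS compatibility data
CN108647009A (zh) * 2018-03-22 2018-10-12 中钞信用卡产业发展有限公司杭州区块链技术研究院 Apparatus, method, and storage medium for blockchain information interaction
CN108810087B (zh) * 2018-04-28 2020-06-26 北京青云科技股份有限公司 Connection method, system, and device for a storage server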
US10949207B2 (en) 2018-09-29 2021-03-16 Intel Corporation Processor core supporting a heterogeneous system instruction set architecture
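CN111079916B (zh) * 2018-10-19 2021-01-15 安徽寒武纪信息科技有限公司 Operation method, system, and related product
CN110750304B (zh) * 2019-09-30 2022-04-12 百富计算机技术(深圳)有限公司 Method for improving task switching efficiency and terminal device
CN111290759B (zh) * 2020-01-19 2023-09-19 龙芯中科技术股份有限公司 Instruction generation method, apparatus, and device
CN113806006B (zh) * 2020-06-12 2025-06-06 Huawei Technologies Co., Ltd. Method and apparatus for handling exceptions or interrupts under a heterogeneous instruction set architecture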
US11403102B2 (en) 2020-06-27 2022-08-02 Intel Corporation Technology to learn and offload common patterns of memory access and computation
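CN114077424A (zh) * 2020-08-14 2022-02-22 上海芯联芯智能科技有限公司 Method and apparatus for extending a MIPS instruction set processor to support a RISC instruction set multi-mode system
CN115309693A (zh) * 2021-05-07 2022-11-08 脸萌有限公司 Integrated circuit, data processing apparatus, and method
CN113641404B (zh) * 2021-07-20 2024-10-29 昆仑芯(北京)科技有限公司 Program running method and apparatus, processor chip, electronic device, and storage medium
CN114237708B (zh) * 2021-09-23 2024-12-27 武汉深之度科技有限公司 Multiprocessor instruction execution method, computing device, and storage medium
CN118069222A (zh) * 2023-10-24 2024-05-24 上海芯联芯智能科技有限公司 Instruction execution method and apparatus
CN121255574B (zh) * 2025-12-03 2026-03-20 中国科学院软件研究所 Hardware-software cooperative process identification and monitoring method, apparatus, device, and medium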

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7167559B2 (en) * 2001-03-28 2007-01-23 Matsushita Electric Industrial Co., Ltd. Information security device, exponentiation device, modular exponentiation device, and elliptic curve exponentiation device
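CN101387969A (zh) * 2008-10-16 2009-03-18 上海交通大学 Dynamic binary translation method with hardware-software co-design
CN101546301A (zh) * 2009-05-05 2009-09-30 浪潮电子信息产业股份有限公司 Method for composing a cooperative computer from heterogeneous processors
CN101944077A (zh) * 2010-09-02 2011-01-12 东莞市泰斗微电子科技有限公司 Communication interface between a main processor and a coprocessor and control method thereof
CN102282540A (zh) * 2008-12-17 2011-12-14 超威半导体公司 Coprocessor unit with shared instruction stream
CN103294540A (zh) * 2013-05-17 2013-09-11 北京航空航天大学 Method for improving Erlang virtual machine performance by using a Xeon Phi coprocessor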

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3250729B2 (ja) * 1999-01-22 2002-01-28 日本電気株式会社 Program execution device, process migration method thereof, and storage medium storing a process migration control program
US7093258B1 (en) * 2002-07-30 2006-08-15 Unisys Corporation Method and system for managing distribution of computer-executable program threads between central processing units in a multi-central processing unit computer system
JP2008299648A (ja) * 2007-05-31 2008-12-11 Toshiba Corp Program and information processing apparatus
US8615647B2 (en) * 2008-02-29 2013-12-24 Intel Corporation Migrating execution of thread between cores of different instruction set architecture in multi-core processor and transitioning each core to respective on / off power state
US20090319662A1 (en) * 2008-06-24 2009-12-24 Barsness Eric L Process Migration Based on Exception Handling in a Multi-Node Environment
JP2010272055A (ja) * 2009-05-25 2010-12-02 Sony Corp Information processing apparatus and method, and program
CN102193788B (zh) * 2010-03-12 2016-08-03 复旦大学 Cross-platform driver reuse method based on dynamic binary translation
US9495183B2 (en) * 2011-05-16 2016-11-15 Microsoft Technology Licensing, Llc Instruction set emulation for guest operating systems
WO2013100996A1 (en) * 2011-12-28 2013-07-04 Intel Corporation Binary translation in asymmetric multiprocessor system
JP5929353B2 (ja) * 2012-03-14 2016-06-01 富士通株式会社 Exception handling method, program, and apparatus
US9141361B2 (en) * 2012-09-30 2015-09-22 Intel Corporation Method and apparatus for performance efficient ISA virtualization using dynamic partial binary translation
GB2508433A (en) * 2012-12-03 2014-06-04 Ibm Migration of processes in heterogeneous computing environments using emulating and compiling source code on target system
US9405551B2 (en) 2013-03-12 2016-08-02 Intel Corporation Creating an isolated execution environment in a co-designed processor
US20140375658A1 (en) * 2013-06-25 2014-12-25 Ati Technologies Ulc Processor Core to Graphics Processor Task Scheduling and Execution
US20150301955A1 (en) * 2014-04-21 2015-10-22 Qualcomm Incorporated Extending protection domains to co-processors
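CN104572307B (zh) * 2015-01-30 2019-03-05 无锡华云数据技术服务有限公司 Method for elastic scheduling of virtual resources
CN106325819B (zh) * 2015-06-17 2019-08-02 Huawei Technologies Co., Ltd. Computer instruction processing method, coprocessor, and system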

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3301567A4 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107489A1 (en) * 2015-06-17 2018-04-19 Huawei Technologies Co.,Ltd. Computer instruction processing method, coprocessor, and system
US11119941B2 (en) 2017-10-31 2021-09-14 Hewlett Packard Enterprise Development Lp Capability enforcement controller
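WO2024001699A1 (zh) * 2022-06-28 2024-01-04 Huawei Technologies Co., Ltd. Shader input data processing method and graphics processing apparatus
CN117687626A (zh) * 2024-02-04 2024-03-12 双一力(宁波)电池有限公司 System and method for matching a host computer and a main program
CN117687626B (zh) * 2024-02-04 2024-05-03 双一力(宁波)电池有限公司 System and method for matching a host computer and a main program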

Also Published As

Publication number Publication date
CN106325819A (zh) 2017-01-11
EP3301567A1 (en) 2018-04-04
US10514929B2 (en) 2019-12-24
US20180107489A1 (en) 2018-04-19
CN106325819B (zh) 2019-08-02
EP3301567A4 (en) 2018-05-30
EP3301567B1 (en) 2022-10-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16810742

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2016810742

Country of ref document: EP