WO2023241418A1 - Processor, and method, device and storage medium for data processing - Google Patents
Processor, and method, device and storage medium for data processing
- Publication number
- WO2023241418A1 (PCT/CN2023/098716)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- data
- instruction
- memory
- vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/3016—Decoding the operand specifier, e.g. specifier format
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/355—Indexed addressing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/3001—Arithmetic instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30018—Bit or string instructions
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30029—Logical and Boolean instructions, e.g. XOR, NOT
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30032—Movement instructions, e.g. MOVE, SHIFT, ROTATE, SHUFFLE
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
- G06F9/30038—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations using a mask
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/3004—Arrangements for executing specific machine instructions to perform operations on memory
- G06F9/30043—LOAD or STORE instructions; Clear instruction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/34—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes
- G06F9/345—Addressing or accessing the instruction operand or the result ; Formation of operand address; Addressing modes of multiple operands or results
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Definitions
- Example embodiments of the present disclosure relate generally to the field of computers, and in particular to processors and methods, devices, and computer-readable storage media for data processing.
- processors can be used in a variety of scenarios.
- different instruction set architectures (ISAs) have been proposed.
- These instruction set architectures often need to be compatible with a variety of usage scenarios.
- ISA: instruction set architecture
- In a first aspect of the present disclosure, a processor is provided. The processor includes an instruction decoder configured to decode target instructions for vector operations.
- Target instructions involve target opcodes, source operands, and destination operands.
- the target opcode indicates the vector operation specified by the target instruction.
- the source operand specifies the source storage location in memory from which to read the data to be processed.
- the destination operand specifies the destination storage location in memory to which the processing results are written.
- the processor also includes an arithmetic logic unit coupled to the instruction decoder and the memory.
- the arithmetic logic unit is configured to: read the data to be processed from the source storage location of the memory; perform the arithmetic logic operation associated with the vector operation specified by the target instruction on the data to be processed; and write the processing results to the target storage location of the memory.
- In a second aspect of the present disclosure, a method for data processing is provided. The method includes decoding target instructions for vector operations.
- Target instructions involve target opcodes, source operands, and destination operands.
- the target opcode indicates the vector operation specified by the target instruction.
- the source operand specifies the source storage location in memory from which to read the data to be processed.
- the destination operand specifies the destination storage location in memory to which the processing results are written.
- the method also includes reading the data to be processed from a source storage location in the memory; performing an arithmetic and logical operation on the data to be processed associated with the vector operation specified by the target instruction; and writing the processing result to the target storage location in the memory.
- In a third aspect of the present disclosure, an electronic device is provided. The electronic device includes at least a processor according to the first aspect.
- a computer-readable storage medium is provided.
- a computer program is stored on the computer-readable storage medium, and the computer program can be executed by a processor to implement the method of the second aspect.
- FIG. 1 illustrates a schematic diagram of an example environment in which embodiments of the present disclosure can be implemented.
- FIG. 2 shows a schematic diagram of example instructions in accordance with some embodiments of the present disclosure.
- FIG. 3 shows a schematic diagram of storage locations corresponding to example source operands according to some embodiments of the present disclosure.
- FIG. 4 illustrates a flow diagram of a process for data processing in accordance with some embodiments of the present disclosure.
- FIG. 5 illustrates a block diagram of an electronic device in which a processor may be included in accordance with one or more embodiments of the present disclosure.
- processors can be applied to a variety of scenarios.
- different instruction set architectures that can be adopted by processors have been proposed. These instruction set architectures often need to be compatible with a variety of usage scenarios.
- the usage scenarios of these conventional instruction set architectures are not consistent with the usage scenarios of vector computing such as neural network computing. Therefore, for some vector calculations with high instruction repetition and large data volume, a better instruction set architecture is needed to enable the processor to better handle such vector calculations.
- a conventional solution is to use a standard processor instruction set, such as the reduced instruction set computer (RISC)-V instruction set.
- RISC: reduced instruction set computer
- although these general instruction sets can complete various vector calculations, such as various neural network operators, it is difficult to ensure high execution efficiency because these general instruction sets need to be compatible with a variety of usage scenarios.
- the calculation of neural network operators usually involves a large number of vector calculations, for which general instruction sets are not well suited.
- a conventional solution may use a digital signal processor (DSP) architecture, such as a single instruction multiple data (SIMD) architecture, or may use a vector processor (Vector) architecture.
- DSP: digital signal processor
- SIMD: single instruction multiple data
- Vector: vector processor
- the instruction sets of the above-mentioned DSP architectures are usually not publicly available.
- for vector processor architectures, such as the vector instruction set under the RISC-V standard (referred to as the RISC-V vector instruction set),
- these instruction sets are usually highly complex and appear redundant for vector calculations such as neural network operators.
- the processor includes an instruction decoder and an arithmetic logic unit.
- the instruction decoder is used to receive target instructions for processing vector operations.
- This target instruction is suitable for memory-to-memory (MEM to MEM) processor architecture.
- the target instruction involves a target opcode, a source operand, and a destination operand.
- the target opcode indicates the vector operation specified by the target instruction.
- the source operand specifies at least the source storage location in the memory for reading the data to be processed.
- the target operand specifies at least the target storage location in the memory for writing the processing result.
- the processor's arithmetic logic unit is coupled to the instruction decoder and memory.
- the arithmetic logic unit is configured to perform the vector operation of the target instruction based on the decoding information produced by the instruction decoder for the target instruction. For example, the arithmetic logic unit is configured to read the data to be processed from a source storage location of the memory; perform an arithmetic logic operation associated with the vector operation specified by the target instruction on the data to be processed; and write the processing result of the data to be processed to the target storage location of the memory.
- This solution simplifies processor operation by using a processor suitable for memory-to-memory architecture.
- the processor is able to perform a large number of vector calculations using a simple instruction set.
- the processor can perform neural network vector calculations using a simple instruction set.
- this solution can use a simple instruction set to improve the efficiency of the processor in performing vector calculations.
- FIG. 1 shows a schematic diagram of an example environment 100 in which embodiments of the present disclosure can be implemented.
- processor 110 may represent any kind of instruction processing device.
- processor 110 may be a general purpose processor or any other suitable processor.
- Processor 110 is configured to receive instructions 140 and perform operations indicated by instructions 140, such as vector operations.
- processor 110 may receive instructions 140 from other devices in environment 100 .
- instructions 140 are SIMD instructions.
- Processor 110 includes an instruction decoder 120 and an arithmetic logic unit 130 .
- processor 110 may also include or be communicatively coupled to memory (not shown).
- the memory may be data memory such as Vector Closely-coupled Memory (VCCM).
- Instruction decoder 120, arithmetic logic unit 130, and memory are communicatively coupled. That is, the instruction decoder 120, the arithmetic logic unit 130, and the memory may communicate with each other according to appropriate data transmission protocols and/or standards. In operation, instruction decoder 120 receives instructions 140 and decodes instructions 140 .
- VCCM: Vector Closely-coupled Memory
- instruction decoder 120 may decode instructions 140 into arithmetic and/or logical operations, etc., that may be processed by arithmetic logic unit 130 .
- Instruction decoder 120 may be implemented using a variety of different mechanisms.
- instruction decoder 120 may be implemented using hardware circuitry, or at least in part by means of software modules.
- Arithmetic logic unit 130 is configured to operate based on information obtained by decoding instruction 140 by instruction decoder 120 . Arithmetic logic unit 130 may perform various arithmetic operations, logical operations, and the like. Arithmetic logic unit 130 may be implemented using a variety of different mechanisms. For example, arithmetic logic unit 130 may be implemented using hardware circuitry, or at least partially by means of software modules.
- processor 110 may be implemented in a variety of existing or future computing platforms or computing systems.
- the processor 110 may be implemented in various embedded applications (e.g., data processing systems of mobile network base stations) to provide services such as large-scale vector calculations.
- the processor 110 may also be integrated or embedded into various electronic devices or computing devices to provide various computing services.
- the application environment and application scenarios of the processor 110 are not limited here.
- instruction decoder 120 decodes received instructions 140 .
- Instructions 140 are sometimes referred to herein as "target instructions"; the two terms are used interchangeably in this context.
- FIG. 2 shows a schematic diagram of example instructions 140 in accordance with some embodiments of the present disclosure.
- instruction 140 includes a target opcode 210 , a source operand 220 , and a target operand 230 .
- the target opcode 210 is sometimes also referred to as an "opcode"; the two terms are used interchangeably in this context.
- Target opcode 210 may indicate the vector operation specified by instruction 140 .
- the source operand 220 is at least used to specify a source storage location in the memory for reading the data to be processed.
- the target operand 230 is at least used to specify a target storage location in the memory for writing the processing result.
- the instruction decoder 120 decodes the above information indicated by the target operation code 210, the source operand 220, and the target operand 230 for processing by the arithmetic logic unit 130.
- arithmetic logic unit 130 is configured to read data to be processed from a source storage location in memory specified by source operand 220 .
- Arithmetic logic unit 130 performs arithmetic logic operations associated with the vector operations specified by instructions 140 on the data to be processed.
- the arithmetic logic unit 130 then writes the processing result of the data to be processed into the target storage location specified by the target operand 230 .
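
The read, compute, and write-back flow described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the disclosed implementation: the memory is modeled as a list of vector words, and the operation names `vadd` and `vmul` as well as the function `execute` are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model: the memory is a list of vector words (lists of numbers).
Memory = List[List[float]]

# Hypothetical operation table; "vadd" and "vmul" are illustrative names only.
VECTOR_OPS: Dict[str, Callable[[float, float], float]] = {
    "vadd": lambda a, b: a + b,
    "vmul": lambda a, b: a * b,
}


def execute(memory: Memory, op: str, a_vaddr: int, b_vaddr: int, c_vaddr: int) -> None:
    """Memory-to-memory execution: read both sources, compute, write the result."""
    src_a = memory[a_vaddr]  # read the data to be processed (first source)
    src_b = memory[b_vaddr]  # read the data to be processed (second source)
    fn = VECTOR_OPS[op]      # operation selected by the decoded opcode
    result = [fn(x, y) for x, y in zip(src_a, src_b)]
    memory[c_vaddr] = result  # write the processing result to the target location


vccm: Memory = [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0], [0.0, 0.0, 0.0]]
execute(vccm, "vadd", a_vaddr=0, b_vaddr=1, c_vaddr=2)
```

Note that no register file appears in the sketch: both the operands and the result live in memory, which is the point of a memory-to-memory design.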
- instructions 140 may be encoded using, for example, binary. In other embodiments, instructions 140 may be encoded using other encoding forms or other bases. Herein, unless otherwise specified, the encoding format and encoding representation of instruction 140 described below are in binary. For example, instruction 140 in binary form may be defined in the format of Table 1 below.
- bits 86 to 95 are used to represent target opcode 210 of instruction 140.
- The operands in bits 22 to 85 represent the source operand 220 of the instruction 140.
- The parameters in bits 0 to 21 represent the target operand 230 of the instruction 140.
- any specific numerical values or digits appearing here and elsewhere herein are exemplary unless otherwise stated.
- the number of bits at which each operation code and/or operand is listed above is exemplary rather than limiting.
- the target opcode 210, source operand 220, and target operand 230 of instruction 140 may be located at other appropriate bit positions.
- the source operand A_vaddr in bits 70 to 85 represents the address index of the data of the A channel (also called the first storage space of the memory) in a memory, for example a data memory such as VCCM, that is, the address index in VCCM[A_vaddr].
- the address index is in units of a vector word.
- a vector word can represent a memory cell of a channel within the memory whose width equals the SIMD width. That is, the address index is in units of one SIMD width.
- the memory depth is, for example, 1024. In this example, only 10 of bits 70 to 85 need to be used to represent the address index of the A channel.
- the memory may have other suitable depths and the address index may have other suitable number of bits.
- the source operand A_index from bit 60 to bit 69 is used to represent the element index of the vector word of the A channel in the memory.
- Each vector word can have e.g. 64 elements. This element index can be used to indicate a certain element in the vector word within the A channel.
- if the vector word of the A channel is divided into, for example, 64 elements, only 6 of bits 60 to 69 may be used to represent A_index.
- the source operand A_vm in bits 54 to 59 is used to represent the index of the vector mask (VM) register of the A channel.
- the A channel has 16 VM registers. In such an example, A_vm may be represented using only 4 bits from bits 54 to 59.
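
The reduced field widths mentioned above follow directly from the number of items that must be indexed. A small sketch, in which `index_bits` is a hypothetical helper name:

```python
def index_bits(count: int) -> int:
    """Minimum number of bits needed to index `count` distinct items."""
    if count < 1:
        raise ValueError("count must be at least 1")
    # (count - 1) is the largest index; its bit length is the required width.
    return max(1, (count - 1).bit_length())


address_bits = index_bits(1024)  # memory depth of 1024 vector words -> 10
element_bits = index_bits(64)    # 64 elements per vector word -> 6
vm_bits = index_bits(16)         # 16 vector mask registers -> 4
```

The fields in the example format are wider than these minimums (16, 10, and 6 bits respectively), which leaves room for deeper memories or more registers.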
- the source operand B_vaddr in bits 38 to 53 represents the address index of the B channel (also known as the second storage space of the memory) within the memory (for example, the data memory VCCM), that is, the address index in VCCM[B_vaddr].
- the address index is in units of one vector word. That is, the address index is in units of one SIMD width.
- the source operand B_index in bits 28 to 37 is used to represent the element index of the vector word of the B channel.
- the source operand B_vm in bits 22 to 27 is used to represent the index of the vector mask register of the B channel.
- Examples of the target operand 230 in Table 1 include C_vaddr in bits 6 to 21, which can represent the address index of the C channel of the data memory VCCM, that is, the address index in VCCM[C_vaddr].
- the address index is in units of one vector word. That is, the address index is in units of one SIMD width.
- Examples of target operand 230 also include C_vm in bits 0 to 5, which represents the index of the vector mask register of the C channel.
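
The bit positions listed above for the Table 1 format can be collected into a single layout and used to pack and unpack a 96-bit instruction word. The following is a sketch only; the names `FIELDS`, `pack`, and `unpack` are hypothetical, and the bit ranges are the exemplary ones given in the text.

```python
# Field name -> (lowest bit, highest bit), inclusive, per the exemplary format.
FIELDS = {
    "opcode": (86, 95),
    "A_vaddr": (70, 85),
    "A_index": (60, 69),
    "A_vm": (54, 59),
    "B_vaddr": (38, 53),
    "B_index": (28, 37),
    "B_vm": (22, 27),
    "C_vaddr": (6, 21),
    "C_vm": (0, 5),
}


def pack(values: dict) -> int:
    """Build an instruction word from field values; missing fields default to 0."""
    word = 0
    for name, (lo, hi) in FIELDS.items():
        width = hi - lo + 1
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << lo
    return word


def unpack(word: int) -> dict:
    """Extract every field from an instruction word."""
    return {
        name: (word >> lo) & ((1 << (hi - lo + 1)) - 1)
        for name, (lo, hi) in FIELDS.items()
    }


word = pack({"opcode": 0b00_00110_01_0, "A_vaddr": 3, "B_vaddr": 7,
             "B_index": 5, "C_vaddr": 9, "C_vm": 3})
fields = unpack(word)
```

The field widths add up to 96 bits, so the nine fields tile the instruction word with no gaps or overlaps.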
- FIG. 3 shows a schematic diagram of storage locations corresponding to example source operands according to some embodiments of the present disclosure.
- the storage space of the memory is divided into multiple channels, such as channel 310-1, channel 310-2, ..., channel 310-N, etc., where N is an integer greater than 1.
- the channels 310-1, 310-2, ..., and 310-N are referred to below collectively as channels 310 or individually as channel 310.
- the value of N may be preset.
- N can be set to different values such as 1024, 512, etc.
- Each channel 310 includes, for example, 1024 bits or other suitable number of bits.
- Address index 330 may indicate the address of channel 310-1.
- the address of the channel 310 may be in units of vector words.
- Address index 330 may be 16 bits. For example, if address index 330 is "0b0000_0000_0000_0000", then address index 330 may indicate channel 310-1.
- the address index 330 may also be 10 bits; for example, channel 310-1 is indicated by the address "0b00_0000_0000". Note that encoding representations starting with "0b" herein are binary representations, which will not be repeated below. The "_" appearing in a binary representation is for readability only, has no actual meaning, and does not occupy a binary bit.
- the vector word of each channel 310 may be divided into multiple elements,
- such as element 320.
- Element 320 may include, for example, 64 bits.
- Element index 340 (e.g., source operand A_index or B_index) may indicate the index of a certain element, such as element 320.
- Element index 340 may be 10 bits. For example, if the element index 340 is "0b00_0000_0000", the element index 340 may indicate the element 320. For another example, in the example where the number of elements in the vector word of each channel 310 is 64, the element index may be 6 bits, for example, the element index "0b00_0000" may indicate the element 320.
- each channel may have a different number of bits, and each vector word may also have a different number of bits.
- the address index and the element index can also have different number of bits and different encoding representations. The scope of the present disclosure is not limited in this regard.
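
The two-level addressing described for FIG. 3 (an address index selecting a vector word, then an element index selecting an element inside it) can be sketched as follows. The memory shape used here (4 vector words of 8 elements) and the helper names `parse_index` and `read_element` are assumptions chosen to keep the example small.

```python
from typing import List


def parse_index(text: str) -> int:
    """Parse a "0b..." string; "_" separators carry no value, as in the text."""
    return int(text, 0)  # base 0 accepts the 0b prefix and underscores


def read_element(memory: List[List[int]], address_index: int, element_index: int) -> int:
    """Select a vector word by address index, then an element by element index."""
    vector_word = memory[address_index]
    return vector_word[element_index]


# Hypothetical small memory: element value encodes (word, element) as word*100 + element.
memory = [[word * 100 + element for element in range(8)] for word in range(4)]

first = read_element(memory, parse_index("0b00_0000_0000"), parse_index("0b00_0000"))
other = read_element(memory, 0b10, 0b011)
```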
- source operands 220 are listed above with reference to Table 1. Further examples of source operands 220 are described below with reference to Table 2.
- source operand 220 may include A_imm located at bits 54 to 85, which represents an immediate value in instruction 140. Similarly, source operand 220 may also include B_imm located at bits 22 to 53, which represents another immediate value in instruction 140 . Similar to Table 1, the target operand 230 in Table 2 may also include C_vaddr and/or C_vm.
- each source operand and/or each target operand described above in conjunction with Table 1 and Table 2 is only exemplary and not restrictive.
- the source operand 220 and/or the destination operand 230 used in the present disclosure may include any one or more of the above source operands and/or destination operands.
- source operand 220 and/or destination operand 230 may include any other suitable operand types other than the above source operands and/or destination operands.
- Table 3 below describes example encodings of opcodes for instruction 140. For example, if bit 0 is 0, instruction 140 is of variable type. If bit 0 is 1, instruction 140 is of immediate type.
- Bits 1 to 2 represent the sub-function code of instruction 140.
- Bits 3 to 7 represent the function code of instruction 140.
- Bits 8 to 9 represent the calculation precision of instruction 140. For example, binary "00" can represent a single-precision floating-point number. Other binary values are reserved for other calculation precisions.
- the vector operation specified by instruction 140 may be determined based on the target opcode 210 of instruction 140 .
- the processor 110 may pre-store the operation codes of each instruction.
- Instruction decoder 120 may determine the vector operation specified by instruction 140 based on the target operation code 210 of received instruction 140 .
- for example, if the target opcode 210 of instruction 140 is encoded as "0b00_00110_01_0",
- instruction decoder 120 may determine that instruction 140 is a v2indexr instruction. It should be understood that the examples of opcodes and instruction types listed above are only illustrative and not restrictive.
- the encoding "0b00_00110_01_0" can also specify other vector operations.
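
The opcode fields described for Table 3 can be split out with shifts and masks. The sketch below assumes the exemplary 10-bit layout given in the text (bit 0: type, bits 1 to 2: sub-function code, bits 3 to 7: function code, bits 8 to 9: precision); the function name `decode_opcode` and the lookup table `MNEMONICS` are hypothetical, and the table holds only the four example encodings quoted in this document.

```python
def decode_opcode(opcode: int) -> dict:
    """Split a 10-bit opcode into its exemplary fields."""
    return {
        "immediate": bool(opcode & 0b1),        # bit 0: 0 = variable, 1 = immediate
        "sub_function": (opcode >> 1) & 0b11,   # bits 1 to 2
        "function": (opcode >> 3) & 0b11111,    # bits 3 to 7
        "precision": (opcode >> 8) & 0b11,      # bits 8 to 9
    }


# (function, sub_function, immediate) -> mnemonic, for the quoted examples only.
MNEMONICS = {
    (0b00110, 0b00, False): "v2indexl",
    (0b00110, 0b01, False): "v2indexr",
    (0b00110, 0b00, True): "v2indexli",
    (0b00110, 0b01, True): "v2indexri",
}


def mnemonic(opcode: int) -> str:
    f = decode_opcode(opcode)
    return MNEMONICS.get((f["function"], f["sub_function"], f["immediate"]), "unknown")


decoded = decode_opcode(0b00_00110_01_0)
name = mnemonic(0b00_00110_01_0)
```

In these four examples the sub-function code distinguishes the search direction and bit 0 distinguishes the immediate variants.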
- source operand 220 may include two source operands, such as A_vaddr and B_vaddr, or A_vaddr and B_imm.
- the width of each source operand can be SIMD width.
- source operand 220 may include only one source operand B_vaddr, etc.
- the target operand 230, such as C_vaddr, can specify the target storage location where the processing result is written back to the memory, that is, VCCM[C_vaddr].
- the instruction 140 includes a processing result vector at the target storage location.
- Target operand 230 also indicates the target VM register, such as C_vm or vm3.
- the value at each position of the target VM register indicates whether the corresponding processing result is to be written at the corresponding position of the processing result vector. For example, if the target register vm3[i] is 1, it means that the i-th element of the processing result vector word is write-enabled and can be written to the corresponding processing result. On the contrary, if the target register vm3[i] is 0, the i-th element of the processing result vector word cannot be written to the corresponding processing result.
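
The write-enable behavior of the target VM register can be sketched as a masked write. The function name `masked_write` is hypothetical; lists stand in for the vector word at the target storage location and for the mask register.

```python
from typing import List


def masked_write(dest: List[int], result: List[int], vm: List[int]) -> None:
    """Write result[i] into dest[i] only where vm[i] is 1."""
    for i, enabled in enumerate(vm):
        if enabled == 1:
            dest[i] = result[i]  # element i is write-enabled
        # where vm[i] is 0, dest[i] keeps its previous value


v3 = [9, 9, 9, 9]
masked_write(v3, result=[1, 2, 3, 4], vm=[1, 0, 0, 1])
```

Elements whose mask bit is 0 keep whatever value was already stored at the target location.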
- Table 4 describes several example instructions that processor 110 may support.
- the instructions in Table 4 can be described with reference to the instruction definitions in Table 1 or Table 2, and can be encoded with reference to the example encoding method in Table 3.
- target operand 230 includes both C_vaddr (i.e., &v3) and C_vm (i.e., vm3).
- Reserved in Table 4 indicates one or more bits reserved. These reserved bits can be encoded or used later.
- instructions 140 include a first index determination instruction (e.g., v2indexl or v2indexr in Table 4).
- source operand 220 specifies the location of the first storage space of memory, i.e., the address index of channel A (A_vaddr is &v1).
- the source operand 220 also specifies a given index value of the data to be processed in the second storage space of the memory, that is, the element index within the vector word of channel B (B_vaddr is &v2, B_index is index2).
- arithmetic logic unit 130 is configured to determine the first index.
- the first index indicates the storage location in the first storage space of the value at the position indicated by the given index value in the data to be processed.
- the opcode of the instruction v2indexl can be encoded as "0b00_00110_00_0".
- the instruction v2indexl v1, v2, index2, v3, vm3 means assigning indext to v3[i], where indext is the index of the first element, searching from left to right, for which v1[indext] equals v2[index2].
- target operand 230 also indicates a target vector mask register. The value at each position of the target vector mask register indicates whether the corresponding processing result is to be written at the corresponding position of the processing result vector. For example, if vm3[i] equals 1, then v3[i] is write-enabled.
- the opcode of the instruction v2indexr can be encoded as "0b00_00110_01_0".
- the instruction v2indexr v1, v2, index2, v3, vm3 assigns indext to v3[i], where indext is the index of the first element, from right to left, that makes v1[indext] equal to v2[index2]. If no element satisfies this condition, indext is set to -1 in two's complement representation. In this example, if vm3[i] equals 1, then v3[i] is write-enabled.
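The behavior of the first index determination instructions can be sketched as below. This is an illustrative model, not the disclosed hardware: vectors are Python lists, the function names are hypothetical, and it is assumed that v2indexl searches from left to right and returns -1 on a miss in the same way the text states for v2indexr.

```python
def first_index(v1, target, from_left=True):
    """Index of the first lane of v1 equal to target in the given direction, or -1."""
    lanes = range(len(v1)) if from_left else range(len(v1) - 1, -1, -1)
    for i in lanes:
        if v1[i] == target:
            return i
    return -1


def v2index(v1, v2, index2, v3, vm3, from_left=True):
    """Model of v2indexl (from_left=True) and v2indexr (from_left=False)."""
    indext = first_index(v1, v2[index2], from_left)
    # Every write-enabled lane of v3 receives the same index value.
    return [indext if m == 1 else old for old, m in zip(v3, vm3)]


v1 = [5, 7, 7, 3]
v2 = [7, 0, 0, 0]
print(v2index(v1, v2, 0, [0, 0, 0, 0], [1, 1, 1, 1], from_left=True))   # [1, 1, 1, 1]
print(v2index(v1, v2, 0, [0, 0, 0, 0], [1, 1, 1, 1], from_left=False))  # [2, 2, 2, 2]
```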
- instructions 140 include a second index determination instruction (eg, v2indexli or v2indexri in Table 4).
- source operand 220 specifies the location of the first storage space of memory, ie, the address index of channel A (A_vaddr is &v1).
- Source operand 220 also specifies the first immediate value, namely, immediate value imm2.
- the arithmetic logic unit 130 is configured to determine the second index.
- the second index indicates the storage location of the first immediate value in the first storage space.
- the opcode of instruction v2indexli is encoded as "0b00_00110_00_1".
- the instruction v2indexli v1, imm2, v3, vm3 assigns indext to v3[i], where indext is the index of the first element, from left to right, that makes v1[indext] equal to imm2. If no element satisfies this condition, indext is set to -1 in two's complement representation. In this example, if vm3[i] equals 1, then v3[i] is write-enabled.
- the opcode of the instruction v2indexri can be encoded as "0b00_00110_01_1".
- the instruction v2indexri v1, imm2, v3, vm3 assigns indext to v3[i], where indext is the index of the first element, from right to left, that makes v1[indext] equal to imm2. If no element satisfies this condition, indext is set to -1 in two's complement representation. In this example, if vm3[i] equals 1, then v3[i] is write-enabled.
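The immediate variants, including the two's complement encoding of a miss, can be sketched as follows. The 16-bit lane width, the list-based vectors and the function names are assumptions for illustration; the text does not state the element width.

```python
def to_twos_complement(value, bits):
    """Encode a signed integer as an unsigned two's complement pattern of the given width."""
    return value & ((1 << bits) - 1)


def v2indexi(v1, imm2, v3, vm3, from_left=True, bits=16):
    """Model of v2indexli (from_left=True) and v2indexri (from_left=False).

    bits is an assumed lane width used only to show how -1 would be stored.
    """
    order = range(len(v1)) if from_left else reversed(range(len(v1)))
    indext = next((i for i in order if v1[i] == imm2), -1)
    encoded = to_twos_complement(indext, bits)
    return [encoded if m == 1 else old for old, m in zip(v3, vm3)]


v1 = [4, 8, 4, 1]
print(v2indexi(v1, 4, [0] * 4, [1] * 4, from_left=True))   # [0, 0, 0, 0]
print(v2indexi(v1, 4, [0] * 4, [1] * 4, from_left=False))  # [2, 2, 2, 2]
print(v2indexi(v1, 9, [0] * 4, [1, 1, 0, 1]))              # [65535, 65535, 0, 65535]
```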
- instructions 140 may include a first value determination instruction, such as instruction Sindex2v in Table 4.
- the source operand 220 specifies the location of the first storage space of the memory, that is, the address index of channel A (A_vaddr is &v1).
- the source operand 220 also specifies a given index value of the data to be processed in the second storage space of the memory, that is, the element index within the vector word of channel B (B_vaddr is &v2, B_index is index2).
- the arithmetic logic unit 130 is configured to: determine a given value of the data to be processed at the location indicated by the given index value, and determine a first value at the location in the first storage space indexed by the given value.
- the instruction Sindex2v v1,v2,index2,v3,vm3 has an opcode encoded as "0b00_00110_10_0". This instruction assigns v1[v2[index2]] to v3[i]. If vm3[i] is equal to 1, v3[i] is write-enabled.
- instructions 140 include second value determination instructions.
- the source operand 220 specifies a given index value of the data to be processed in the second storage space of the memory, that is, B_vaddr is &v2 and B_index is index2.
- the arithmetic logic unit 130 is configured to determine a second value in the data to be processed at the position indicated by the given index value.
- the instruction s2v v2,index2,v3,vm3 has an opcode encoded as "0b00_00110_10_1". This instruction assigns v2[index2] to v3[i]. If vm3[i] is equal to 1, v3[i] is write-enabled.
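The two value determination instructions can be sketched as below. The list-based vectors and the lowercase function names are assumptions for illustration; bounds checking of v2[index2] against the length of v1 is not described in the text and is left to Python's own indexing.

```python
def sindex2v(v1, v2, index2, v3, vm3):
    """Model of Sindex2v: write-enabled lanes of v3 receive v1[v2[index2]]."""
    value = v1[v2[index2]]
    return [value if m == 1 else old for old, m in zip(v3, vm3)]


def s2v(v2, index2, v3, vm3):
    """Model of s2v: write-enabled lanes of v3 receive the scalar v2[index2]."""
    value = v2[index2]
    return [value if m == 1 else old for old, m in zip(v3, vm3)]


v1 = [10, 11, 12, 13]
v2 = [3, 1, 0, 2]
print(sindex2v(v1, v2, 0, [0] * 4, [1, 1, 1, 1]))  # [13, 13, 13, 13]
print(s2v(v2, 1, [0] * 4, [1, 0, 1, 0]))           # [1, 0, 1, 0]
```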
- the processor 110 can better process some operations such as obtaining coordinates.
- Such operations include operators such as the index-of-maximum (ArgMax) operator, the index-of-minimum (ArgMin) operator, and the operator for finding the K highest-ranked values (TopK). Taking ArgMax as an example, it finds the index for which v[index] is the maximum value in the vector v.
- The instructions required for a 64-element ArgMax are as follows. First, v2smax v1,vm1,v2,vm2 (this instruction is described in Table 5 and Table 6 below) finds the largest element value in v1 and writes it to v2, where all bits of the values stored in vm1 and vm2 are 1. Next, v2indexl v1,v2,0,v3,vm3 finds the index such that v1[index] equals v2[0] and writes the value of index to v3, where all bits of the value stored in vm3 are 1.
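The two-instruction ArgMax sequence can be sketched as follows. This is a behavioral model under assumptions: vectors are Python lists, vm1 is treated as a read enable and vm2/vm3 as write enables, and v2indexl is assumed to search from left to right. The function names mirror the mnemonics but are not the disclosed implementation.

```python
def v2smax(v1, vm1, v2, vm2):
    """Assumed model: the maximum over read-enabled lanes of v1 goes to write-enabled lanes of v2."""
    largest = max(x for x, m in zip(v1, vm1) if m == 1)
    return [largest if m == 1 else old for old, m in zip(v2, vm2)]


def v2indexl(v1, v2, index2, v3, vm3):
    """Assumed model: first index (left to right) where v1 equals v2[index2], or -1."""
    target = v2[index2]
    indext = next((i for i, x in enumerate(v1) if x == target), -1)
    return [indext if m == 1 else old for old, m in zip(v3, vm3)]


def argmax(v1):
    """ArgMax composed from the two instructions, with all mask bits set to 1."""
    lanes = len(v1)
    ones = [1] * lanes
    v2 = v2smax(v1, ones, [0] * lanes, ones)
    v3 = v2indexl(v1, v2, 0, [0] * lanes, ones)
    return v3[0]


v1 = [(i * 37) % 64 for i in range(64)]  # a 64-element test vector
print(argmax(v1) == v1.index(max(v1)))   # True
```

Because every lane of v3 receives the same index when all mask bits are 1, reading any lane of v3 yields the result; the sketch reads lane 0.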
- the target instructions include vector transpose instructions, such as the instructions vtranspose or vstranspose.
- the source operand 220 specifies the first position in the first storage space in the memory, that is, A_vaddr is &v1, and A_index is index1.
- Source operand 220 also specifies the source vector mask register vm1 and optionally vm2.
- the arithmetic logic unit 130 is configured to vector-transpose the data to be processed at the first location in the first storage space to obtain the transposed data to be processed.
- the vector transpose instruction vtranspose v1,index1,vm1,vm2,v3,vm3 has an opcode encoded as 0b00_00111_11_0, which is used to transpose a vector (or matrix) of, for example, 32*32.
- the values of vm1 and vm2 enable the read lane (lane); the value of vm3 enables the write lane.
- the number R of consecutive 1 bits in vm1 is used to represent the number of rows of the matrix
- the number C of consecutive 1 bits in vm2 is used to represent the number of columns of the matrix, where R and C are both arbitrary natural numbers, and R and C can be the same or different.
- the valid bits of vm1, vm2 and vm3 must be consecutive; otherwise, the first 1 starting from the lowest bit prevails.
- the above vector transpose instruction vtranspose can be used to transpose the R*C matrix.
- the vector transpose instruction vstranspose v1, index1, vm1, v3, vm3 can be used, which is used to transpose the square matrix.
- the value of vm1 is the read channel enable; the value of vm3 is the write channel enable.
- the number R of consecutive 1 bits in vm1 is used to represent the number of rows (or columns) of the square matrix.
- the vector transpose instruction vstranspose can be used to transpose the R*R square matrix.
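How the masks select the matrix dimensions can be sketched as below. The storage layout is an assumption: the matrix is modeled as a list of rows (one vector word per row), list index 0 stands for the lowest mask bit, and the run of valid bits is assumed to start at the lowest bit. The write mask vm3 is omitted for brevity, and the function names are illustrative only.

```python
def count_low_ones(mask_bits):
    """Count consecutive 1 bits starting from the lowest bit (list index 0)."""
    count = 0
    for bit in mask_bits:
        if bit != 1:
            break
        count += 1
    return count


def vtranspose(matrix, vm1, vm2):
    """Transpose the R*C block selected by vm1 (rows) and vm2 (columns)."""
    rows = count_low_ones(vm1)
    cols = count_low_ones(vm2)
    return [[matrix[r][c] for r in range(rows)] for c in range(cols)]


def vstranspose(matrix, vm1):
    """Square variant: vm1 alone gives the side length R."""
    return vtranspose(matrix, vm1, vm1)


m = [[1, 2, 3, 0],
     [4, 5, 6, 0],
     [0, 0, 0, 0],
     [0, 0, 0, 0]]
print(vtranspose(m, [1, 1, 0, 0], [1, 1, 1, 0]))  # [[1, 4], [2, 5], [3, 6]]
print(vstranspose(m, [1, 1, 0, 0]))               # [[1, 4], [2, 5]]
```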
- the vector transpose instruction is not a standard RISC type instruction.
- the vector transposition function must be completed through multiple consecutive transposition instructions.
- This solution can improve the computing power of part of the network by using vector transpose instructions.
- the neural network training process usually involves a large number of transposition operations of matrices or square matrices. Using the vector transpose instruction of this solution can improve the computational efficiency of the neural network training process.
- the target instructions include exponent instructions, such as vexp.
- source operand 220 specifies the source storage location, that is, A_vaddr is &v1.
- arithmetic logic unit 130 is configured to determine an exponent value whose base is a predetermined value (e.g., the natural base e) and whose power is the data to be processed at the source storage location.
- vexp v1, v3, vm3 has an opcode encoded as "0b00_01000_01_0" and assigns exp(v1[i]) to v3[i]. If vm3[i] is equal to 1, v3[i] is write-enabled.
- the above exponential instructions are suitable for sigmoid operators and operators such as hyperbolic functions sinh/cosh/tanh.
- the sigmoid operator, sinh operator, cosh operator and tanh operator can be represented by the following equations (1) to (4).
- x represents the data to be processed.
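For reference, the standard mathematical definitions of these four operators are given below, numbered in the order the operators are listed above. It is an assumption that equations (1) to (4) of the disclosure take this form; each one depends only on exponentials of x, which is why a single exponent instruction can serve all four.

```latex
\begin{align}
\operatorname{sigmoid}(x) &= \frac{1}{1 + e^{-x}} \tag{1} \\
\sinh(x) &= \frac{e^{x} - e^{-x}}{2} \tag{2} \\
\cosh(x) &= \frac{e^{x} + e^{-x}}{2} \tag{3} \\
\tanh(x) &= \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} \tag{4}
\end{align}
```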
- instructions 140 include VM register instructions, such as vm2index instructions.
- Source operand 220 indicates the source VM register in memory, namely vm1.
- the arithmetic logic unit is configured to store the index at the enabled location in the source VM register to the target storage location.
- instructions 140 include onehot code conversion instructions, such as vindex2vm.
- This is a VM register manipulation instruction.
- the source operand 220 specifies a given index value of the data to be processed in the second storage space of the memory, that is, B_vaddr is &v2, and B_index is index2.
- in this instruction, the vector mask register is not read as a write enable for a memory write (other instructions that write to memory read the vector mask register as a write enable).
- Destination operand 230 specifies the destination VM register, which is vm3.
- the arithmetic logic unit is configured to: convert the value of the data to be processed at a given index value into a one-hot code, and store the one-hot code into the target VM register.
- the instruction vindex2vm v2,index2,vm3 has an opcode encoded as "0b00_10000_01_0". This instruction indicates that vm3 is assigned the value onehot(v2[index2]), where onehot() indicates the one-hot code conversion function.
- the one-hot code conversion instructions described above are suitable for index-type instructions and can support one-hot code operators, that is, operators that convert a number to one-hot encoded form.
- for example, such an operator can be implemented with the following two instructions: vindex2vm v1,0,vm1 and vmload vm1,v2,vm2. The first instruction converts the value of v1[0] into one-hot encoded form and writes it to vm1; the second instruction (vmload is described in Table 7 and Table 8 below) stores the value in vm1 to v2, where all bits of the value stored in vm2 are 1.
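The two-instruction one-hot sequence can be sketched as follows. Several points are assumptions for illustration: a lane count of 8, a VM register modeled as a list of bits, rejection of out-of-range values, and vmload placing one mask bit per lane of the destination vector. The actual vmload semantics are defined in Tables 7 and 8, which are not reproduced here.

```python
LANE_NUM = 8  # assumed number of lanes, for illustration only


def vindex2vm(v2, index2, lane_num=LANE_NUM):
    """Model of vindex2vm: return onehot(v2[index2]) as a list of mask bits."""
    value = v2[index2]
    if not 0 <= value < lane_num:
        # Out-of-range handling is not described in the text; raising is an assumption.
        raise ValueError("value does not fit in the mask register")
    return [1 if i == value else 0 for i in range(lane_num)]


def vmload(vm1, v2, vm2):
    """Assumed model of vmload: copy mask bits of vm1 into write-enabled lanes of v2."""
    return [bit if m == 1 else old for bit, old, m in zip(vm1, v2, vm2)]


v1 = [3, 0, 0, 0, 0, 0, 0, 0]
vm1 = vindex2vm(v1, 0)
print(vm1)                                        # [0, 0, 0, 1, 0, 0, 0, 0]
print(vmload(vm1, [0] * LANE_NUM, [1] * LANE_NUM))  # [0, 0, 0, 1, 0, 0, 0, 0]
```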
- Table 5 shows examples of more general instructions 140 supported by the processor 110 .
- the instructions in Table 5 can be described with reference to the instruction definitions in Table 1 or Table 2, and the opcodes are encoded using the example encoding method in Table 3.
- the MAX() and MIN() functions in Table 6 represent the functions for finding the maximum value and the minimum value respectively.
- DW represents the width of the vector word
- LANE_NUM represents the number of elements in a vector word
- the mod() function represents the remainder function
- the ceil() and floor() functions represent upward rounding and downward rounding respectively
- the SUM() function represents the summation function.
- instructions supported by processor 110 also include various vector mask register access and operation instructions.
- vector mask register access and manipulation instructions are shown in Table 7.
- The functions and definitions of each instruction in Table 7 are shown in Table 8.
- the functions of these instructions include reading, writing and operating on the vector mask register. In these instructions, the vector mask register is not read as a write enable for a memory write (other instructions that write to memory read the vector mask register as a write enable). These instructions are not described in detail here.
- the instructions 140 supported by the processor 110 also include internal register access and manipulation instructions. This type of instruction is used to handle access to internal registers and some special operations. For example, writing to the internal control and status register (CSR), writing a fixed value or data of a certain SIMD length in the data memory VCCM, and so on. Other examples are reading out the internal CSR or the data memory VCCM, and the empty instruction (i.e., no operation is performed and one cycle is waited).
- Table 9 shows several examples of internal register access and manipulation instructions.
- Table 10 shows the functions of each instruction in Table 9. These instructions are not described in detail here. Note that for the vwcsr instruction in Table 9, the source operand is in channel A, while for the vwcsri instruction, the immediate data is in channel B.
- the various instructions supported by the processor 110 are described above in conjunction with Table 4 to Table 10. These instructions may be decoded by instruction decoder 120 of processor 110 and executed by arithmetic logic unit 130. These instructions may constitute an instruction set supported by processor 110. It should be understood that in some embodiments, the instruction set may be constructed from only some or all of the above individual instructions. Alternatively or additionally, other suitable instructions not described above may also be employed to construct the instruction set supported by processor 110.
- although the individual instructions in Tables 4 to 10 are enumerated above with reference to the example instruction definitions and example opcode encoding representations of Tables 1 to 3, they are only exemplary and not limiting.
- the instruction set supported by the processor of the present disclosure may be defined and encoded in any suitable manner.
- individual bits of each instruction may have different meanings than those represented by each bit in Table 1 or Table 2.
- the coded representation of the operation code of each instruction may have different digits from those in Table 3, and each digit may also have a different meaning from each bit in Table 3.
- the encoding representation of the operation codes of each instruction in the above Table 4 to Table 10 can be changed or interchanged. Individual instructions can also be represented by other names. The scope of the present disclosure is not limited in this regard.
- the instructions described above do not include branch type instructions, nor do they include load/store type instructions.
- the processor used in the present disclosure has a memory-to-memory SIMD processor architecture.
- the above instruction set defines multiple (for example, 64 or more or less) vector mask registers to represent the specific vectors that each SIMD instruction needs to process.
- by adopting a SIMD design suited to a memory-to-memory architecture, the operation of processor 110 is simplified.
- processor 110 is able to perform a large number of vector calculations using a simple instruction set.
- the processor 110 can use a simple instruction set to perform tasks such as vector calculation of neural network operators.
- this solution can use a simple instruction set to improve the efficiency of the processor performing vector calculations.
- the solution of the present disclosure can greatly improve computational efficiency.
- a processor according to embodiments of the present disclosure may support various index determination instructions, thereby improving the efficiency of various vector calculations such as obtaining coordinates.
- the processor of the present disclosure can process instructions such as vector transpose, thereby improving the computational efficiency of corresponding calculations in the neural network training process.
- the processor of the present disclosure can support exponential instructions, thereby improving and optimizing the calculation efficiency of sigmoid operators and hyperbolic function operators.
- FIG. 4 illustrates a flow diagram of a process 400 for data processing in accordance with some embodiments of the present disclosure.
- Process 400 may be implemented at processor 110 .
- process 400 will be described with reference to environment 100 of FIG. 1 .
- the target instruction for a vector operation, such as instruction 140, is decoded. Instruction 140 may be decoded by instruction decoder 120 of processor 110.
- Instruction 140 involves a target opcode 210 , a source operand 220 , and a target operand 230 .
- Target opcode 210 indicates the vector operation specified by instruction 140 .
- the source operand 220 specifies at least a source storage location in memory from which to read the data to be processed.
- the target operand 230 specifies at least a target storage location in memory for writing the processing results.
- the data to be processed is read by the processor 110 from the source storage location of the memory.
- the data to be processed may be read from the source storage location of the memory by the arithmetic logic unit 130 of the processor 110 .
- arithmetic logical operations associated with the vector operations specified by the target instructions are performed by the processor 110 on the data to be processed.
- the arithmetic logic operations described above may be performed by the arithmetic logic unit 130 of the processor 110 .
- the processing result of the data to be processed is written by the processor 110 to a target storage location of the memory.
- the processing results may be written to the target storage location by the arithmetic logic unit 130 of the processor 110 .
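The decode, read, execute and write steps of process 400 can be sketched as a small memory-to-memory model. The opcode values are the ones quoted earlier for s2v and vexp; everything else is an assumption for illustration, including the dictionary-based instruction record, the `memory` and `masks` dictionaries, and the function names. The actual bit layout of Tables 1 to 3 is not modeled.

```python
import math

# Opcode values quoted in the description; the mapping structure itself is illustrative.
OPCODES = {
    0b00_00110_10_1: "s2v",
    0b00_01000_01_0: "vexp",
}


def decode(instruction):
    """Split a (hypothetical) instruction record into mnemonic, source and target operands."""
    return OPCODES[instruction["opcode"]], instruction["src"], instruction["dst"]


def execute(instruction, memory, masks):
    """Decode, read the source vector, compute, and write back under the target mask."""
    mnemonic, src, dst = decode(instruction)
    data = memory[src["vaddr"]]                      # read from the source storage location
    if mnemonic == "vexp":
        results = [math.exp(x) for x in data]
    elif mnemonic == "s2v":
        results = [data[src["index"]]] * len(data)
    else:
        raise NotImplementedError(mnemonic)
    old = memory[dst["vaddr"]]
    mask = masks[dst["vm"]]
    # Write to the target storage location; only write-enabled lanes change.
    memory[dst["vaddr"]] = [r if m == 1 else o for r, o, m in zip(results, old, mask)]


memory = {"v1": [0.0, 1.0], "v3": [5.0, 5.0]}
masks = {"vm3": [1, 0]}
execute({"opcode": 0b00_01000_01_0,
         "src": {"vaddr": "v1"},
         "dst": {"vaddr": "v3", "vm": "vm3"}}, memory, masks)
print(memory["v3"])  # [1.0, 5.0]
```

There is no separate load or store step in this model: each instruction names its source and target locations in memory directly, which mirrors the memory-to-memory style described above.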
- instructions 140 include index determination instructions.
- the index determination instruction may be a first index determination instruction (v2indexl or v2indexr) or a second index determination instruction (v2indexli or v2indexri).
- the source operand 220 specifies the location of the first storage space of the memory.
- the source operand 220 also specifies a given index value or the first immediate value of the data to be processed in the second storage space of the memory.
- the arithmetic logic operation performed by processor 110 includes: determining the first index or determining the second index.
- the first index indicates the storage location in the first storage space of the value at the position indicated by the given index value in the data to be processed.
- the second index indicates the storage location of the first immediate value in the first storage space.
- instruction 140 includes a first value determination instruction (e.g., instruction Sindex2v), source operand 220 specifies a location of a first storage space of memory, and source operand 220 also specifies a given index value of the data to be processed in a second storage space of memory.
- the arithmetic logic operations performed by the processor 110 include: determining a given value of the data to be processed at the location indicated by the given index value; and determining a first value at the location in the first storage space indexed by the given value.
- instructions 140 include second value determination instructions, such as instructions s2v.
- the source operand 220 specifies a given index value of the data to be processed within the second storage space of the memory.
- the arithmetic logic operations performed by the processor 110 include determining a second value in the data to be processed at the location indicated by the given index value.
- instructions 140 include vector transpose instructions, such as instructions vtranspose or vstranspose.
- Source operand 220 specifies a first location in a first storage space in memory.
- the arithmetic logic operation performed by the processor 110 includes vector transposing the data to be processed at the first location in the first storage space to obtain the transposed data to be processed.
- instructions 140 include exponent instructions, such as the vexp instruction.
- Source operand 220 specifies the source storage location.
- the arithmetic logic operation performed by the processor 110 includes determining an exponent value whose base is a predetermined value and whose power is the data to be processed at the source storage location.
- instructions 140 include VM register instructions, such as vm2index.
- Source operand 220 of instruction 140 indicates the source VM register in memory.
- the arithmetic logic operation performed by the processor 110 includes storing the index at the enabled location in the source VM register to the target storage location.
- for each instruction 140 described above, a processing result vector is included at the target storage location.
- Target operand 230 also indicates the target VM register.
- The value at each location in the target VM register indicates whether the corresponding processing result is to be written to the corresponding location in the processing result vector. For example, if vm3[i] of the target register is 1, the i-th element of the processing result vector word is write-enabled and the corresponding processing result can be written to it. Conversely, if vm3[i] is 0, the corresponding processing result cannot be written to the i-th element of the processing result vector word.
- instructions 140 include one-hot code conversion instructions, such as the instruction vindex2vm.
- the source operand 220 specifies a given index value of the data to be processed within the second storage space of the memory.
- Destination operand 230 specifies the destination VM register.
- the processor 110 converts the value of the data to be processed at the given index value into a one-hot code.
- the processor 110 is also configured to store the one-hot code into the target VM register.
- FIG. 5 illustrates a block diagram of an electronic device 500 in which a processor 110 may be included in accordance with one or more embodiments of the present disclosure. It should be understood that the electronic device 500 shown in FIG. 5 is merely exemplary and should not constitute any limitation on the functionality and scope of the embodiments described herein.
- electronic device 500 is in the form of a general electronic device or computing device.
- Components of electronic device 500 may include, but are not limited to, one or more processors 110 , memory 520 , storage devices 530 , one or more communication units 540 , one or more input devices 550 , and one or more output devices 560 .
- processor 110 may perform various processes according to programs stored in memory 520 .
- the processor 110 may be a multi-core processor that can execute computer-executable instructions in parallel to improve the parallel processing capability of the electronic device 500 .
- Electronic device 500 typically includes a plurality of computer storage media. Such media may be any available media that is accessible to electronic device 500, including, but not limited to, volatile and nonvolatile media, removable and non-removable media.
- Memory 520 may be volatile memory (e.g., registers, cache, random access memory (RAM)), nonvolatile memory (e.g., read only memory (ROM), electrically erasable programmable read only memory (EEPROM) , flash memory) or some combination thereof.
- Storage device 530 may be a removable or non-removable medium and may include machine-readable media such as a flash drive, a magnetic disk, or any other medium that is capable of storing information and/or data (e.g., training data for training) and can be accessed within the electronic device 500.
- Electronic device 500 may further include additional removable/non-removable, volatile/non-volatile storage media.
- a disk drive may be provided for reading from or writing to a removable, non-volatile disk (e.g., a "floppy disk"), and an optical disc drive may be provided for reading from or writing to a removable, non-volatile optical disc.
- each drive may be connected to the bus (not shown) by one or more data media interfaces.
- Memory 520 may include a computer program product 525 having one or more program modules configured to perform various methods or actions of various embodiments of the disclosure. For example, these program modules may be configured to implement various functions or actions of processor 110 , such as implementing the functions of instruction decoder 120 and arithmetic logic unit 130 .
- the communication unit 540 implements communication with other electronic devices or computing devices through communication media. Additionally, the functionality of the components of electronic device 500 may be implemented as a single computing cluster or as multiple computing machines capable of communicating over a communications connection. Accordingly, electronic device 500 may operate in a networked environment using a logical connection to one or more other servers, a network personal computer (PC), or another network node.
- Input device 550 may be one or more input devices, such as a mouse, a keyboard, a trackball, etc.
- Output device 560 may be one or more output devices, such as a display, speakers, printer, etc.
- the electronic device 500 may also communicate, as needed through the communication unit 540, with one or more external devices (not shown), such as storage devices and display devices, with one or more devices that enable a user to interact with the electronic device 500, or with any device (e.g., network card, modem, etc.) that enables electronic device 500 to communicate with one or more other electronic devices or computing devices. Such communication may be performed via an input/output (I/O) interface (not shown).
- a computer-readable storage medium is provided with computer-executable instructions stored thereon, wherein the computer-executable instructions are executed by a processor to implement the method described above.
- a computer program product is also provided, the computer program product is tangibly stored on a non-transitory computer-readable medium and includes computer-executable instructions, and the computer-executable instructions are executed by a processor to implement the method described above.
- These computer-readable program instructions may be provided to a processing unit of a general-purpose computer, a special-purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processing unit of the computer or other programmable data processing apparatus, create an apparatus that implements the functions/actions specified in one or more blocks of the flowchart and/or block diagram.
- These computer-readable program instructions can also be stored in a computer-readable storage medium. These instructions cause the computer, programmable data processing device and/or other equipment to work in a specific manner. Therefore, the computer-readable medium storing the instructions includes an article of manufacture that includes instructions that implement aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
- Computer-readable program instructions may be loaded onto a computer, other programmable data processing apparatus, or other equipment, causing a series of operating steps to be performed on the computer, other programmable data processing apparatus, or other equipment to produce a computer-implemented process, so that the instructions executed on the computer, other programmable data processing apparatus, or other equipment implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.
- Example 1 describes a processor including an instruction decoder configured to decode target instructions for vector operations.
- Target instructions involve target opcodes, source operands, and destination operands.
- the target opcode indicates the vector operation specified by the target instruction.
- the source operand specifies at least the source storage location in memory from which to read the data to be processed.
- the destination operand specifies at least a destination storage location in memory to which to write the processing results.
- the processor also includes an arithmetic logic unit coupled to the instruction decoder and the memory.
- the arithmetic logic unit is configured to: read the data to be processed from the source storage location of the memory; perform the arithmetic logic operation associated with the vector operation specified by the target instruction on the data to be processed; and write the processing result of the data to be processed to the target storage location of the memory.
- Example 2 includes the processor as described in Example 1, wherein the target instruction includes a first index determination instruction, the source operand specifies the location of the first storage space of the memory, and the source operand also specifies a given index value of the data to be processed in the second storage space of the memory.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction as follows: determine a first index, the first index indicating the storage location, in the first storage space, of the value at the position indicated by the given index value in the data to be processed.
- Example 3 includes the processor as described in Example 1, wherein the target instruction includes a second index determination instruction, the source operand specifies a location of the first storage space of the memory, and the source operand further specifies the first immediate value.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction: determine a second index indicating a storage location of the first immediate value in the first storage space.
- Example 4 includes the processor as described in Example 1, wherein the target instruction includes a first value determination instruction, the source operand specifies a location of a first storage space of the memory, and the source operand further specifies a given index value of the data to be processed in the second storage space of the memory.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction as follows: determine a given value of the data to be processed at a location indicated by a given index value; and determine a first value at the position in the first storage space indexed by the given value.
- Example 5 includes the processor as described in Example 1, wherein the target instruction includes a second value determination instruction, and the source operand specifies a given index value of data to be processed in a second storage space of the memory.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction: determine a second value in the data to be processed at a location indicated by a given index value.
- Example 6 includes the processor as described in Example 1, wherein the target instruction includes a vector transpose instruction and the source operand specifies at least a first location in a first storage space in the memory.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction as follows: vector-transpose the data to be processed at the first location in the first storage space to obtain the transposed data to be processed.
- Example 7 includes the processor as described in Example 1, wherein the target instruction includes an exponent instruction and the source operand specifies the source storage location.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction as follows: determine an exponent value whose base is a predetermined value and whose power is the data to be processed at the source storage location.
- Example 8 includes the processor as described in Example 1, wherein the target instruction includes a vector mask VM register instruction and the source operand indicates a source VM register in memory.
- the arithmetic logic unit is configured to perform an arithmetic logic operation associated with the vector operation specified by the target instruction: store an index at the enabled location in the source VM register to the target storage location.
- Example 9 includes the processor as described in any one of Examples 2 to 8, wherein a processing result vector is included at the target storage location, and the target operand further indicates a target vector mask VM register. , the value at each position of the target VM register indicates whether the corresponding processing result is to be written at the corresponding position of the processing result vector.
- Example 10 includes the processor as described in Example 1, wherein the target instruction includes a one-hot code conversion instruction, the source operand specifies a given index value of the data to be processed in the second storage space of the memory, and the target operand specifies a target vector mask (VM) register.
- The arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction as follows: convert the value of the data to be processed at the given index value into a one-hot code; and store the one-hot code in the target VM register.
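A small sketch of the one-hot conversion in Example 10. It assumes the value being converted is a non-negative integer smaller than the VM register width, and that the register is modelled as a list of 0/1 flags; `one_hot_to_vm` and `vm_width` are hypothetical names.

```python
def one_hot_to_vm(data, index, vm_width):
    """Convert data[index] to a one-hot mask of vm_width positions."""
    value = data[index]
    if not 0 <= value < vm_width:
        # Assumption: out-of-range values are rejected; the text is silent here.
        raise ValueError("value does not fit in the VM register")
    return [1 if pos == value else 0 for pos in range(vm_width)]


target_vm = one_hot_to_vm([3, 0, 2], index=2, vm_width=4)
print(target_vm)  # [0, 0, 1, 0]
```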
- Example 11 describes a method of data processing.
- The method includes decoding a target instruction for a vector operation, the target instruction involving a target opcode, a source operand, and a target operand.
- The target opcode indicates the vector operation specified by the target instruction.
- The source operand specifies at least a source storage location in memory from which to read the data to be processed.
- The target operand specifies at least a target storage location in memory to which to write the processing result.
- The method also includes: reading the data to be processed from the source storage location of the memory; performing, on the data to be processed, the arithmetic logic operation associated with the vector operation specified by the target instruction; and writing the processing result of the data to be processed to the target storage location of the memory.
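The decode, read, execute, and write steps of Example 11 can be sketched as a tiny interpreter. Everything concrete here is an assumption made for illustration: the `Instruction` fields, the mnemonics `vneg` and `vdouble`, and the flat-list memory model do not come from the text.

```python
from dataclasses import dataclass


@dataclass
class Instruction:
    opcode: str   # hypothetical mnemonic naming the vector operation
    src: int      # source storage location (start offset in memory)
    dst: int      # target storage location (start offset in memory)
    length: int   # number of elements to process (assumed operand)


# Hypothetical opcode table standing in for the instruction decoder.
OPERATIONS = {
    "vneg": lambda xs: [-x for x in xs],
    "vdouble": lambda xs: [2 * x for x in xs],
}


def execute(instr, memory):
    operation = OPERATIONS[instr.opcode]                  # decode the opcode
    data = memory[instr.src:instr.src + instr.length]     # read source data
    result = operation(data)                              # ALU operation
    memory[instr.dst:instr.dst + len(result)] = result    # write the result
    return memory


memory = [1, 2, 3, 0, 0, 0]
execute(Instruction("vdouble", src=0, dst=3, length=3), memory)
print(memory)  # [1, 2, 3, 2, 4, 6]
```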
- Example 12 includes the method described in Example 11, wherein the target instruction includes an index determination instruction, the source operand specifies a location of the first storage space of the memory, and the source operand further specifies at least one of the following: a given index value of the data to be processed in the second storage space of the memory, and a first immediate value.
- Performing the arithmetic logic operation associated with the vector operation specified by the target instruction includes at least one of: determining a first index, which indicates the storage location in the first storage space of the value in the data to be processed at the position indicated by the given index value; and determining a second index, which indicates the storage location of the first immediate value in the first storage space.
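The two index determinations in Example 12 amount to a search of the first storage space. The sketch below assumes a linear search that returns the first match and -1 when the value is absent; the text does not specify either behaviour, and all function names are hypothetical.

```python
def find_index(first_space, value):
    """Return the position of value in first_space, or -1 if it is absent."""
    try:
        return first_space.index(value)
    except ValueError:
        return -1  # assumption: a sentinel marks "not found"


def first_index(first_space, data, given_index):
    """Locate, in first_space, the value stored at data[given_index]."""
    return find_index(first_space, data[given_index])


def second_index(first_space, immediate):
    """Locate, in first_space, an immediate value carried by the instruction."""
    return find_index(first_space, immediate)


first_space = [10, 20, 30, 40]
data = [30, 10]
print(first_index(first_space, data, 0))   # 2
print(second_index(first_space, 40))       # 3
```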
- Example 13 includes the method described in Example 11, wherein the target instruction includes a first value determination instruction, the source operand specifies a location of a first storage space of the memory, and the source operand further specifies a given index value of the data to be processed in the second storage space of the memory.
- Performing the arithmetic logic operation associated with the vector operation specified by the target instruction includes: determining a given value of the data to be processed at the location indicated by the given index value; and determining a first value in the first storage space at the position indexed by the given value.
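Example 13 describes a two-step, indirect lookup: the given index selects a value from the data, and that value is then used as an index into the first storage space. A minimal sketch, with hypothetical names and plain lists standing in for the two storage spaces:

```python
def first_value(first_space, data, given_index):
    """Return first_space[data[given_index]] (an indirect, gather-style read)."""
    given_value = data[given_index]     # step 1: read the value at the index
    return first_space[given_value]     # step 2: use that value as an index


first_space = [10, 20, 30, 40]
data = [3, 1]
print(first_value(first_space, data, 0))  # 40
```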
- Example 14 includes the method described in Example 11, wherein the target instruction includes a second value determination instruction, and the source operand specifies a given index value of the data to be processed in a second storage space of the memory. Performing the arithmetic logic operation associated with the vector operation specified by the target instruction includes determining a second value in the data to be processed at the location indicated by the given index value.
- Example 15 includes the method described in Example 11, wherein the target instruction includes a vector transpose instruction, and wherein the source operand specifies at least a first location in a first storage space in the memory.
- Performing an arithmetic logical operation associated with the vector operation specified by the target instruction includes: vector transposing the data to be processed at the first location in the first storage space to obtain the transposed data to be processed.
- Example 16 includes the method described in Example 11, wherein the target instruction includes an exponent instruction and the source operand specifies the source storage location. Performing the arithmetic logic operations associated with the vector operation specified by the target instruction includes determining an exponent value with a predetermined numerical base raised to the power of the data to be processed at the source storage location.
- Example 17 includes the method described in Example 11, wherein the target instruction includes a vector mask VM register instruction and the source operand indicates a source VM register in memory. Performing the arithmetic logic operation associated with the vector operation specified by the target instruction includes storing the index at the enabled location in the source VM register to the target storage location.
- Example 18 includes the method described in Example 11, wherein the target instruction includes a one-hot code conversion instruction, the source operand specifies a given index value of the data to be processed in the second storage space of the memory, and the target operand specifies a target vector mask (VM) register. Performing the arithmetic logic operation associated with the vector operation specified by the target instruction includes converting the value of the data to be processed at the given index value into a one-hot code; and storing the one-hot code in the target VM register.
- Example 19 describes an electronic device that includes at least the processor according to any one of Examples 1 to 10.
- Example 20 describes a computer-readable storage medium having a computer program stored thereon.
- The computer program is executed by a processor to implement the method according to any one of Examples 11 to 18.
- Each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions that contains one or more executable instructions for implementing the specified logical functions.
- The functions noted in the blocks may occur out of the order noted in the figures. For example, two consecutive blocks may actually execute substantially in parallel, or they may sometimes execute in the reverse order, depending on the functionality involved.
- Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by a combination of special-purpose hardware and computer instructions.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Computational Mathematics (AREA)
- Pure & Applied Mathematics (AREA)
- Complex Calculations (AREA)
- Executing Machine-Instructions (AREA)
- Memory System (AREA)
Claims (20)
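- A processor, comprising: an instruction decoder configured to decode a target instruction for a vector operation, the target instruction involving a target opcode, a source operand, and a target operand, the target opcode indicating the vector operation specified by the target instruction, the source operand specifying at least a source storage location in a memory from which data to be processed is read, and the target operand specifying at least a target storage location in the memory to which a processing result is written; and an arithmetic logic unit coupled to the instruction decoder and the memory and configured to: read the data to be processed from the source storage location of the memory; perform, on the data to be processed, an arithmetic logic operation associated with the vector operation specified by the target instruction; and write the processing result of the data to be processed to the target storage location of the memory.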
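- The processor according to claim 1, wherein the target instruction comprises a first index determination instruction, the source operand specifies a location of a first storage space of the memory, the source operand further specifies a given index value of the data to be processed in a second storage space of the memory, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: determining a first index, the first index indicating the storage location, in the first storage space, of the value in the data to be processed at the position indicated by the given index value.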
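- The processor according to claim 1, wherein the target instruction comprises a second index determination instruction, the source operand specifies a location of a first storage space of the memory, the source operand further specifies a first immediate value, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: determining a second index, the second index indicating the storage location of the first immediate value in the first storage space.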
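- The processor according to claim 1, wherein the target instruction comprises a first value determination instruction, the source operand specifies a location of a first storage space of the memory, the source operand further specifies a given index value of the data to be processed in a second storage space of the memory, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: determining a given value of the data to be processed at the position indicated by the given index value; and determining a first value in the first storage space at the position indexed by the given value.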
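- The processor according to claim 1, wherein the target instruction comprises a second value determination instruction, the source operand specifies a given index value of the data to be processed in a second storage space of the memory, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: determining a second value in the data to be processed at the position indicated by the given index value.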
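- The processor according to claim 1, wherein the target instruction comprises a vector transpose instruction, the source operand specifies at least a first location in a first storage space in the memory, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: vector-transposing the data to be processed at the first location in the first storage space to obtain transposed data.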
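- The processor according to claim 1, wherein the target instruction comprises an exponent instruction, the source operand specifies the source storage location, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: determining an exponential value whose base is a predetermined number and whose exponent is the data to be processed at the source storage location.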
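- The processor according to claim 1, wherein the target instruction comprises a vector mask (VM) register instruction, the source operand indicates a source VM register in the memory, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: storing the index of each enabled position in the source VM register to the target storage location.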
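- The processor according to any one of claims 2 to 8, wherein the target storage location holds a processing result vector, the target operand further indicates a target vector mask (VM) register, and the value at each position of the target VM register indicates whether the corresponding processing result is to be written at the corresponding position of the processing result vector.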
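- The processor according to claim 1, wherein the target instruction comprises a one-hot code conversion instruction, the source operand specifies a given index value of the data to be processed in a second storage space of the memory, the target operand specifies a target vector mask (VM) register, and the arithmetic logic unit is configured to perform the arithmetic logic operation associated with the vector operation specified by the target instruction by: converting the value of the data to be processed at the given index value into a one-hot code; and storing the one-hot code in the target VM register.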
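- A method of data processing, comprising: decoding a target instruction for a vector operation, the target instruction involving a target opcode, a source operand, and a target operand, the target opcode indicating the vector operation specified by the target instruction, the source operand specifying at least a source storage location in a memory from which data to be processed is read, and the target operand specifying at least a target storage location in the memory to which a processing result is written; reading the data to be processed from the source storage location of the memory; performing, on the data to be processed, an arithmetic logic operation associated with the vector operation specified by the target instruction; and writing the processing result of the data to be processed to the target storage location of the memory.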
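- The method according to claim 11, wherein the target instruction comprises an index determination instruction, the source operand specifies a location of a first storage space of the memory, and the source operand further specifies at least one of: a given index value of the data to be processed in a second storage space of the memory, or a first immediate value; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises at least one of: determining a first index, the first index indicating the storage location, in the first storage space, of the value in the data to be processed at the position indicated by the given index value; or determining a second index, the second index indicating the storage location of the first immediate value in the first storage space.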
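- The method according to claim 11, wherein the target instruction comprises a first value determination instruction, the source operand specifies a location of a first storage space of the memory, and the source operand further specifies a given index value of the data to be processed in a second storage space of the memory; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises: determining a given value of the data to be processed at the position indicated by the given index value; and determining a first value in the first storage space at the position indexed by the given value.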
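- The method according to claim 11, wherein the target instruction comprises a second value determination instruction, and the source operand specifies a given index value of the data to be processed in a second storage space of the memory; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises: determining a second value in the data to be processed at the position indicated by the given index value.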
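- The method according to claim 11, wherein the target instruction comprises a vector transpose instruction, and the source operand specifies at least a first location in a first storage space in the memory; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises: vector-transposing the data to be processed at the first location in the first storage space to obtain transposed data.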
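- The method according to claim 11, wherein the target instruction comprises an exponent instruction, and the source operand specifies the source storage location; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises: determining an exponential value whose base is a predetermined number and whose exponent is the data to be processed at the source storage location.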
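- The method according to claim 11, wherein the target instruction comprises a vector mask (VM) register instruction, and the source operand indicates a source VM register in the memory; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises: storing the index of each enabled position in the source VM register to the target storage location.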
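- The method according to claim 11, wherein the target instruction comprises a one-hot code conversion instruction, the source operand specifies a given index value of the data to be processed in a second storage space of the memory, and the target operand specifies a target vector mask (VM) register; and wherein performing the arithmetic logic operation associated with the vector operation specified by the target instruction comprises: converting the value of the data to be processed at the given index value into a one-hot code; and storing the one-hot code in the target VM register.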
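- An electronic device, comprising at least the processor according to any one of claims 1 to 10.
- A computer-readable storage medium having a computer program stored thereon, the computer program being executed by a processor to implement the method according to any one of claims 11 to 18.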
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2024573128A JP2025519635A (ja) | 2022-06-14 | 2023-06-06 | Processor, method for data processing, device, and storage medium |
| KR1020247041480A KR20250008533A (ko) | 2022-06-14 | 2023-06-06 | Processor, method for data processing, device, and storage medium |
| EP23822987.6A EP4524729A4 (en) | 2022-06-14 | 2023-06-06 | PROCESSOR, DATA PROCESSING METHOD, DEVICE AND STORAGE MEDIUM |
| US18/979,402 US12461747B2 (en) | 2022-06-14 | 2024-12-12 | Processor, method, device and storage medium for data processing |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202210674857.6A CN117289991B (zh) | 2022-06-14 | 2022-06-14 | Processor, and method, device, and storage medium for data processing |
| CN202210674857.6 | 2022-06-14 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/979,402 Continuation US12461747B2 (en) | 2022-06-14 | 2024-12-12 | Processor, method, device and storage medium for data processing |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| WO2023241418A1 (zh) | 2023-12-21 |
| WO2023241418A9 (zh) | 2024-07-25 |
Family
ID=89192127
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/098716 Ceased WO2023241418A1 (zh) | 2022-06-14 | 2023-06-06 | Processor, and method, device, and storage medium for data processing |
Country Status (6)
| Country | Link |
|---|---|
| US (1) | US12461747B2 (zh) |
| EP (1) | EP4524729A4 (zh) |
| JP (1) | JP2025519635A (zh) |
| KR (1) | KR20250008533A (zh) |
| CN (1) | CN117289991B (zh) |
| WO (1) | WO2023241418A1 (zh) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN121277714B (zh) * | 2025-12-05 | 2026-03-31 | Shanghai Biren Technology Co., Ltd. | Execution unit and computing device |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
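| CN109791488A (zh) * | 2016-10-01 | 2019-05-21 | Intel Corporation | Systems and methods for executing a fused multiply-add instruction for complex numbers |
| CN109992240A (zh) * | 2017-12-29 | 2019-07-09 | Intel Corporation | Method and apparatus for multi-load and multi-store vector instructions |
| CN113849769A (zh) * | 2020-06-27 | 2021-12-28 | Intel Corporation | Matrix transpose and multiply |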
Family Cites Families (22)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US6907443B2 (en) * | 2001-09-19 | 2005-06-14 | Broadcom Corporation | Magnitude comparator |
| JP4788177B2 (ja) * | 2005-03-31 | 2011-10-05 | NEC Corporation | Information processing device, arithmetic processing device, memory access control method, and program |
| US20070118832A1 (en) * | 2005-11-18 | 2007-05-24 | Huelsbergen Lorenz F | Method and apparatus for evolution of custom machine representations |
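| CN104137054A (zh) * | 2011-12-23 | 2014-11-05 | Intel Corporation | Systems, apparatuses, and methods for performing conversion of a list of index values into a mask value |
| CN104185838B (zh) | 2011-12-30 | 2017-12-22 | Intel Corporation | Using reduced instruction set cores |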
| US8874933B2 (en) | 2012-09-28 | 2014-10-28 | Intel Corporation | Instruction set for SHA1 round processing on 128-bit data paths |
| US9552205B2 (en) * | 2013-09-27 | 2017-01-24 | Intel Corporation | Vector indexed memory access plus arithmetic and/or logical operation processors, methods, systems, and instructions |
| US9851970B2 (en) * | 2014-12-23 | 2017-12-26 | Intel Corporation | Method and apparatus for performing reduction operations on a set of vector elements |
| US9830151B2 (en) * | 2014-12-23 | 2017-11-28 | Intel Corporation | Method and apparatus for vector index load and store |
| GB2543302B (en) * | 2015-10-14 | 2018-03-21 | Advanced Risc Mach Ltd | Vector load instruction |
| US10509726B2 (en) * | 2015-12-20 | 2019-12-17 | Intel Corporation | Instructions and logic for load-indices-and-prefetch-scatters operations |
| US20170177360A1 (en) * | 2015-12-21 | 2017-06-22 | Intel Corporation | Instructions and Logic for Load-Indices-and-Scatter Operations |
| US20170315812A1 (en) * | 2016-04-28 | 2017-11-02 | Microsoft Technology Licensing, Llc | Parallel instruction scheduler for block isa processor |
| WO2018158603A1 (en) * | 2017-02-28 | 2018-09-07 | Intel Corporation | Strideshift instruction for transposing bits inside vector register |
| WO2018189728A1 (en) * | 2017-04-14 | 2018-10-18 | Cerebras Systems Inc. | Floating-point unit stochastic rounding for accelerated deep learning |
| US11481218B2 (en) * | 2017-08-02 | 2022-10-25 | Intel Corporation | System and method enabling one-hot neural networks on a machine learning compute platform |
| US10380063B2 (en) * | 2017-09-30 | 2019-08-13 | Intel Corporation | Processors, methods, and systems with a configurable spatial accelerator having a sequencer dataflow operator |
| GB2568230B (en) * | 2017-10-20 | 2020-06-03 | Graphcore Ltd | Processing in neural networks |
| US11294670B2 (en) * | 2019-03-27 | 2022-04-05 | Intel Corporation | Method and apparatus for performing reduction operations on a plurality of associated data element values |
| US10997116B2 (en) * | 2019-08-06 | 2021-05-04 | Microsoft Technology Licensing, Llc | Tensor-based hardware accelerator including a scalar-processing unit |
| US12086080B2 (en) * | 2020-09-26 | 2024-09-10 | Intel Corporation | Apparatuses, methods, and systems for a configurable accelerator having dataflow execution circuits |
| US12373206B2 (en) * | 2020-12-24 | 2025-07-29 | Intel Corporation | Methods, systems, and apparatuses to optimize cross-lane packed data instruction implementation on a partial width processor with a minimal number of micro-operations |
- 2022-06-14: CN application CN202210674857.6A, patent CN117289991B (zh), Active
- 2023-06-06: KR application KR1020247041480A, publication KR20250008533A (ko), Pending
- 2023-06-06: JP application JP2024573128A, publication JP2025519635A (ja), Pending
- 2023-06-06: WO application PCT/CN2023/098716, publication WO2023241418A1 (zh), Ceased
- 2023-06-06: EP application EP23822987.6A, publication EP4524729A4 (en), Pending
- 2024-12-12: US application US18/979,402, patent US12461747B2 (en), Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4524729A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| US20250110742A1 (en) | 2025-04-03 |
| US12461747B2 (en) | 2025-11-04 |
| CN117289991A (zh) | 2023-12-26 |
| EP4524729A4 (en) | 2025-12-17 |
| KR20250008533A (ko) | 2025-01-14 |
| JP2025519635A (ja) | 2025-06-26 |
| CN117289991B (zh) | 2025-09-12 |
| EP4524729A1 (en) | 2025-03-19 |
| WO2023241418A9 (zh) | 2024-07-25 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
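| CN109284130B (zh) | Neural network operation device and method | |
| US20240070226A1 (en) | Accelerator for sparse-dense matrix multiplication | |
| CN112506467B (zh) | Computer processor for higher-precision computation using mixed-precision decomposition of operations | |
| CN113762490B (zh) | Matrix multiplication acceleration of sparse matrices using column folding and squeezing | |
| CN114625423A (zh) | Systems and methods for executing instructions that transform matrices into a row-interleaved format | |
| KR101787819B1 (ko) | Sort acceleration processors, methods, systems, and instructions | |
| EP3623941B1 (en) | Systems and methods for performing instructions specifying ternary tile logic operations | |
| EP3567472B1 (en) | Systems, methods, and apparatuses utilizing an elastic floating-point number | |
| CN114625418A (zh) | Systems for executing instructions that quickly convert tiles and use tiles as one-dimensional vectors | |
| CN110312992A (zh) | Systems, methods, and apparatuses for tile matrix multiplication and accumulation | |
| CN109992304A (zh) | Systems and methods for loading a tile register pair | |
| EP3623940A2 (en) | Systems and methods for performing horizontal tile operations | |
| US10437562B2 (en) | Apparatus and method for processing sparse data | |
| CN109992305A (zh) | Systems and methods for zeroing a tile register pair | |
| CN114691217A (zh) | Apparatuses, methods, and systems for 8-bit floating-point matrix dot product instructions | |
| EP4462249A2 (en) | Matrix transpose and multiply | |
| CN114721624A (zh) | Processors, methods, and systems for processing matrices | |
| CN110826722A (zh) | Systems, apparatuses, and methods for generating an index by sorting and reordering elements based on the sort | |
| WO2023077769A1 (zh) | Data processing method, apparatus, device, and computer-readable storage medium | |
| CN116097212A (zh) | Apparatuses, methods, and systems for 16-bit floating-point matrix dot product instructions | |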
| CN112149050A (zh) | Apparatuses, methods, and systems for an enhanced matrix multiplier architecture | |
| US12461747B2 (en) | Processor, method, device and storage medium for data processing | |
| US20240037179A1 (en) | Data processing method and apparatus |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23822987; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2024573128; Country of ref document: JP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023822987; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 20247041480; Country of ref document: KR; Kind code of ref document: A |
| | WWE | Wipo information: entry into national phase | Ref document number: 1020247041480; Country of ref document: KR |
| | ENP | Entry into the national phase | Ref document number: 2023822987; Country of ref document: EP; Effective date: 20241212 |
| | NENP | Non-entry into the national phase | Ref country code: DE |