WO2017185395A1 - Apparatus and method for performing vector comparison operations - Google Patents
Apparatus and method for performing vector comparison operations
- Publication number
- WO2017185395A1 (application PCT/CN2016/081115)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- vector comparison
- comparison operation
- operation instruction
- instruction
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/02—Comparing digital values
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7839—Architectures of general purpose stored program computers comprising a single central processing unit with memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30021—Compare instructions, e.g. Greater-Than, Equal-To, MINMAX
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30003—Arrangements for executing specific machine instructions
- G06F9/30007—Arrangements for executing specific machine instructions to perform operations on data operands
- G06F9/30036—Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30145—Instruction analysis, e.g. decoding, instruction word fields
- G06F9/3016—Decoding the operand specifier, e.g. specifier format
Definitions
- the present disclosure relates to the field of computer instruction operations, and more particularly to an apparatus and method for performing vector comparison operations.
- In a vector comparison, the corresponding elements of two vectors of the same length are compared, and the comparison results form a new output vector.
- For example, in a restricted Boltzmann machine of an artificial neural network there is a step of sampling a vector composed of a group of neurons: each neuron in the vector is compared with a random number, and the neuron is set to 1 if its value is greater than the random number and to 0 otherwise.
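The sampling step described above can be sketched in Python as follows. This is only an illustrative software sketch, not the disclosed hardware; the function name `sample_neurons`, the example values, and the use of Python's `random` module are assumptions made for the example.

```python
import random


def sample_neurons(activations, thresholds):
    """Compare each neuron value with its random number: 1 if greater, else 0."""
    if len(activations) != len(thresholds):
        raise ValueError("vectors must have the same length")
    return [1 if a > t else 0 for a, t in zip(activations, thresholds)]


# Hypothetical example data: one value per neuron.
activations = [0.9, 0.2, 0.5, 0.7]
rng = random.Random(0)  # seeded only so that repeated runs give the same draw
thresholds = [rng.random() for _ in activations]
sampled = sample_neurons(activations, thresholds)
print(sampled)
```

The comparison itself is the element-wise "greater than" operation; only the source of the second vector (random numbers) is specific to the sampling use case.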
- Another prior art is to perform vector comparison operations on a graphics processing unit (GPU) in which operations are performed by using a general purpose register file and a general purpose stream processing unit to execute general SIMD instructions.
- the GPU on-chip cache is too small, and it is necessary to continuously perform off-chip data transfer when performing large-scale vector comparison operations, and the off-chip bandwidth becomes a main performance bottleneck.
- the present disclosure provides an apparatus and method for performing vector comparison operations, which can support vector comparison operations of arbitrary lengths according to instructions, and also has excellent execution performance.
- the device can perform a series of vector comparison operations according to the instructions, including but not limited to greater than or equal to, greater than, equal to, not equal to, less than, less than or equal to, and can flexibly support vector data of different lengths.
- an apparatus for performing a vector comparison operation including:
- a storage unit configured to store vector data related to the vector comparison operation instruction
- a register unit for storing scalar data related to the vector comparison operation instruction
- control unit configured to decode the vector comparison operation instruction, and control an operation process of the vector comparison operation instruction
- a vector comparison unit configured to perform a vector comparison operation on the two input vector data to be compared according to the decoded vector comparison operation instruction
- the vector comparison unit is a customized hardware circuit.
- the scalar data stored by the register unit includes the start addresses of the input vectors to be compared, the storage address of the comparison result output vector, and the length of the input vectors to be compared, all related to the vector comparison operation instruction; wherein the start addresses of the input vectors and the storage address of the comparison result output vector are addresses in the storage unit.
- control unit includes:
- the instruction queue module is configured to sequentially store the decoded vector comparison operation instructions, and obtain scalar data related to the vector comparison operation instruction.
- control unit includes:
- the dependency processing unit is configured to determine whether the current vector comparison operation instruction has a dependency relationship with the previously unexecuted operation instruction before the vector comparison unit acquires the current vector comparison operation instruction.
- control unit includes:
- the storage queue module is configured to temporarily store the current vector comparison operation instruction when it has a dependency relationship with a previously unexecuted operation instruction, and to send the temporarily stored vector comparison operation instruction to the vector comparison unit when the dependency relationship is eliminated.
- the device further includes:
- An instruction cache unit configured to store a vector operation instruction to be executed
- the input/output unit is configured to store the vector data related to the vector comparison operation instruction in the storage unit, or obtain the comparison result output vector of the vector comparison operation instruction from the storage unit.
- the vector comparison operation instruction includes an operation code and an operation domain
- the operation code indicates that a vector comparison operation is to be executed
- the operational field includes an immediate value and/or a register number indicating scalar data associated with a vector comparison operation, wherein the register number points to the register unit address.
- the storage unit is a scratch pad memory.
- an apparatus for performing a vector comparison operation comprising:
- the fetch module is configured to take out a vector comparison operation instruction to be executed from the instruction sequence, and transmit the vector comparison operation instruction to the decoding module;
- a decoding module configured to decode the vector comparison operation instruction, and transmit the decoded vector comparison operation instruction to the instruction queue module;
- the instruction queue module is configured to temporarily store the decoded vector comparison operation instruction, and to obtain scalar data related to the vector comparison operation instruction from the instruction itself or from the scalar register; after the scalar data is obtained, the vector comparison operation instruction is sent to the dependency processing unit;
- a scalar register file including a plurality of scalar registers for storing scalar data associated with vector comparison operations instructions
- a dependency processing unit configured to determine whether there is a dependency relationship between the vector comparison operation instruction and a previously unexecuted operation instruction; if there is a dependency relationship, the vector comparison operation instruction is sent to the storage queue module, and if there is no dependency relationship, the vector comparison operation instruction is sent to the vector comparison unit;
- a storage queue module configured to store a vector comparison operation instruction having a dependency relationship with the previous operation instruction, and sending the vector comparison operation instruction to the vector comparison unit after the dependency relationship is released;
- a vector comparison unit configured to perform a vector comparison operation on the input vector data according to the received vector comparison operation instruction
- a scratchpad memory for storing an input vector to be compared and a comparison result output vector
- An input/output access module for directly accessing the scratchpad memory, responsible for reading the input vectors to be compared from the scratchpad memory and writing the comparison result output vector to it.
- the vector comparison unit is a customized hardware circuit.
- a method for performing a vector comparison operation comprising:
- the instruction fetch module takes the next vector comparison operation instruction to be executed from the instruction sequence, and transmits the vector comparison operation instruction to the decoding module;
- the decoding module decodes the vector comparison operation instruction, and transmits the decoded vector comparison operation instruction to the instruction queue module;
- the instruction queue module temporarily stores the decoded vector comparison operation instruction, and obtains scalar data related to the vector comparison operation instruction from the instruction itself or from the scalar register; after obtaining the scalar data, it sends the vector comparison operation instruction to the dependency processing unit;
- the dependency processing unit determines whether there is a dependency relationship between the vector comparison operation instruction and a previously unexecuted operation instruction; if there is a dependency relationship, the vector comparison operation instruction is sent to the storage queue module, and if there is no dependency relationship, the vector comparison operation instruction is sent to the vector comparison unit;
- the storage queue module stores a vector comparison operation instruction having a dependency relationship with the previous operation instruction, and after the dependency relationship is released, sends the vector comparison operation instruction to the vector comparison unit;
- the vector comparison unit, according to the received vector comparison operation instruction, fetches the input vectors to be compared from the scratchpad memory through the input/output access module, performs the vector comparison operation on them, and writes the comparison result output vector into the scratchpad memory.
- The apparatus and method for performing vector comparison operations provided by the present disclosure carry out the complete process of a simplified vector comparison operation instruction with a customized hardware circuit; that is, a vector comparison operation can be performed by one simplified vector comparison instruction.
- The present disclosure also temporarily stores the vector data participating in the calculation in the scratchpad memory, so that the operation process can support data of different widths more flexibly and effectively. At the same time, the customized vector comparison unit can implement the various comparison operations more efficiently, and the instructions used in the present disclosure have a compact format, which makes them easy to use.
- The present disclosure can be applied to the following scenarios (including but not limited to): data processing, robots, computers, printers, scanners, phones, tablets, smart terminals, mobile phones, driving recorders, navigators, sensors, cameras, cloud servers, camcorders, projectors, watches, earphones, mobile storage, wearable devices and other electronic products; aircraft, ships, vehicles and other types of transportation; televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, range hoods and other household appliances; and medical equipment such as nuclear magnetic resonance instruments, B-mode ultrasound scanners, and electrocardiographs.
- FIG. 1 is a schematic structural diagram of an apparatus for performing vector comparison operations provided by the present disclosure.
- FIG. 2 is a schematic diagram of the format of a vector comparison operation instruction provided by the present disclosure.
- FIG. 3 is a schematic structural diagram of an apparatus for performing vector comparison operation provided by an embodiment of the present disclosure.
- FIG. 4 is a flowchart of a vector comparison operation device executing a vector comparison operation instruction according to an embodiment of the present disclosure.
- the vector comparison operation apparatus includes:
- The storage unit may be a scratchpad memory capable of supporting vector data of different sizes. The present disclosure temporarily stores the data needed for the calculation in the scratchpad memory, which lets the computing device support data of different widths more flexibly and efficiently during vector operations.
- The vector data related to the vector comparison operation instruction includes the input vector data to be compared and the comparison result output vector data. The present disclosure temporarily stores the vector data participating in the operation in the scratchpad memory, so that the vector operation process can support data of different widths more flexibly and effectively.
- the scratchpad memory can be implemented by a variety of different memory devices such as SRAM, DRAM, eDRAM, memristor, 3D-DRAM, and nonvolatile memory.
- A register unit configured to store scalar data related to the vector comparison operation instruction. The scalar data includes the start addresses and length of the input vector data to be compared, the storage address of the comparison result output vector data, and other related parameters, where the addresses of the input vector data and the storage address of the output vector data are addresses in the storage unit. In one embodiment, the register unit may be a scalar register file that provides the scalar registers required for the operation; the scalar registers store not only vector storage addresses but also other scalar data.
- The control unit is configured to decode the vector comparison operation instruction and control its execution, mainly by controlling the behavior of each module in the device.
- the control unit reads the prepared instruction, performs decoding to generate a control signal, and transmits it to other modules in the device, and other modules perform corresponding operations according to the obtained control signal.
- a vector comparison unit that implements a specified comparison operation on input vector data in accordance with an instruction.
- This unit is a vector operation unit and performs the same operation on all input data.
- The vector comparison unit obtains the start addresses and length of the two vectors to be compared according to the vector comparison operation instruction, fetches the two vectors from the storage unit, and compares their corresponding elements; when the comparison condition is satisfied, the corresponding position of the comparison result output vector is set to 1, and otherwise it is set to 0, which yields the comparison result.
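The behavior of the vector comparison unit described above can be modeled in software roughly as follows. This is a minimal sketch under stated assumptions: the storage unit is modeled as a flat Python list, and the name `vector_compare` and its parameter order are hypothetical rather than taken from the disclosure.

```python
import operator


def vector_compare(memory, addr_a, addr_b, addr_out, length, predicate):
    """Compare two vectors held in `memory` and store a 0/1 result vector."""
    # All three vectors must lie inside the modeled storage unit.
    for addr in (addr_a, addr_b, addr_out):
        if addr < 0 or addr + length > len(memory):
            raise IndexError("vector lies outside the storage unit")
    a = memory[addr_a:addr_a + length]
    b = memory[addr_b:addr_b + length]
    # 1 where the comparison condition holds, 0 elsewhere.
    result = [1 if predicate(x, y) else 0 for x, y in zip(a, b)]
    memory[addr_out:addr_out + length] = result
    return result


# Hypothetical layout: vector A at 0, vector B at 4, output at 8, length 4.
memory = [3, 7, 5, 1, 4, 7, 2, 9, 0, 0, 0, 0]
vector_compare(memory, 0, 4, 8, 4, operator.ge)
print(memory[8:12])  # [0, 1, 1, 0]
```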
- The vector comparison unit in the present disclosure is a customized hardware circuit, including but not limited to an FPGA, a CGRA, an application-specific integrated circuit (ASIC), an analog circuit, or a memristor. In cooperation with the other modules in the device, the vector comparison unit can complete comparison operations on vectors of any length.
- The present disclosure provides a vector comparison operation device in which an instruction controls which comparison operation is performed as well as the address and length of the vector data. The device mainly includes a storage unit, a register unit, a control unit, and a comparison operation unit: vectors are stored in the storage unit; vector storage addresses and other scalar parameters are stored in the register unit; the control unit decodes the instruction and controls each module according to it; and the comparison operation unit obtains the scalar parameters of the vectors either from the instruction itself or from the register unit, as the instruction specifies.
- the present disclosure temporarily stores the vector data participating in the calculation on the scratchpad memory, so that the vector data of different widths can be more flexibly and effectively supported in the operation process.
- The vector comparison operation device further includes an instruction cache unit, configured to store the vector comparison operation instructions to be executed.
- A vector comparison operation instruction remains cached in the instruction cache unit while it executes. When an instruction finishes executing, it is committed.
- The vector comparison operation instruction includes an operation code and a plurality of operation fields. The operation code indicates which vector comparison operation is performed, such as greater than or equal to, greater than, equal to, not equal to, less than, or less than or equal to. The operation fields hold scalar data related to the vector comparison operation, in the form of immediate values and register numbers, where a register number points to a specific register in the register unit. The immediate values and registers store the start addresses and length of the vectors to be compared, the storage address of the output vector, and so on. According to the instruction, the device can obtain the related scalar data either directly from the instruction or by accessing the register whose number the instruction provides. The start addresses of the vectors to be compared and the storage address of the output vector are all addresses in the storage unit.
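One way to picture this instruction format is the sketch below, in which each operation field is either an immediate value or a register number. The class names (`Imm`, `Reg`, `VectorCompareInstruction`), the field names, and the field order are assumptions made for illustration; the actual encoding is defined by FIG. 2, which is not reproduced in this text.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Imm:
    """Operation field holding the scalar value itself."""
    value: int


@dataclass(frozen=True)
class Reg:
    """Operation field holding the number of a scalar register."""
    number: int


Operand = Union[Imm, Reg]


@dataclass(frozen=True)
class VectorCompareInstruction:
    opcode: str        # which comparison to perform, e.g. "GE"
    addr_a: Operand    # start address of the first input vector
    addr_b: Operand    # start address of the second input vector
    length: Operand    # number of elements to compare
    addr_out: Operand  # storage address of the output vector


def resolve(operand, register_file):
    """Return the scalar for one field: the immediate, or the register contents."""
    if isinstance(operand, Imm):
        return operand.value
    return register_file[operand.number]


def read_scalars(instr, register_file):
    fields = (instr.addr_a, instr.addr_b, instr.length, instr.addr_out)
    return tuple(resolve(f, register_file) for f in fields)


register_file = [0, 64, 128, 16]  # hypothetical register contents
instr = VectorCompareInstruction("GE", Reg(0), Reg(1), Imm(16), Reg(2))
print(read_scalars(instr, register_file))  # (0, 64, 16, 128)
```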
- The control unit of the vector comparison apparatus further includes an instruction queue module, configured to store the decoded vector comparison operation instructions in order, obtain the scalar data related to each instruction according to the operation fields in the instruction, such as the start addresses and length of the vectors to be compared, fill that data into the instruction, and send the instruction to the dependency processing unit.
- The control unit of the apparatus further includes a dependency processing unit, configured to determine, before the vector comparison operation unit acquires the vector comparison operation instruction, whether that instruction has a dependency on a previous operation instruction that has not finished executing, for example whether both access the same vector storage address. If so, the vector comparison operation instruction is stored in the storage queue module, and the storage queue module supplies it to the vector comparison operation unit once the operation instruction it depends on has finished executing; otherwise, the vector comparison operation instruction is supplied directly to the vector comparison operation unit.
- Successive instructions may access the same block of storage space. To ensure that the instruction execution result is correct, if the current instruction is detected to have a dependency on the data of a previous instruction, the current instruction must wait in the storage queue until the dependency is eliminated.
- The control unit of the apparatus further includes a storage queue module. The module contains an ordered queue; an instruction that has a data dependency on a previous instruction is held in the ordered queue until the dependency is eliminated, after which the module supplies the instruction to the vector comparison operation unit.
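The ordered-queue behavior can be sketched as below. This is a simplified software model, not the disclosed circuit: the class name `StoreQueue`, the use of string identifiers for instructions, and the rule that only the head of the queue may be released are assumptions chosen to keep the example small.

```python
from collections import deque


class StoreQueue:
    """Hold dependent instructions in arrival order until they are unblocked."""

    def __init__(self):
        self._waiting = deque()  # entries are (instruction, set of blocker ids)

    def push(self, instr, blocked_by):
        self._waiting.append((instr, set(blocked_by)))

    def complete(self, finished_id):
        """Record that an earlier instruction finished; return the
        instructions that may now be sent to the comparison unit, in order."""
        for _, blockers in self._waiting:
            blockers.discard(finished_id)
        ready = []
        # Release from the head only, so queue order is preserved.
        while self._waiting and not self._waiting[0][1]:
            ready.append(self._waiting.popleft()[0])
        return ready


queue = StoreQueue()
queue.push("cmp_B", blocked_by=["cmp_A"])
queue.push("cmp_C", blocked_by=["cmp_A", "cmp_B"])
print(queue.complete("cmp_A"))  # ['cmp_B']
print(queue.complete("cmp_B"))  # ['cmp_C']
```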
- the apparatus further includes: an input and output unit configured to store the vector in the storage unit, or obtain a vector comparison operation result from the storage unit.
- The input and output unit can directly access the storage unit and is responsible for reading vector data from the memory or writing vector data to it.
- the instruction design of the device is in a simplified manner, and an instruction can perform a complete vector comparison operation.
- The device fetches an instruction, decodes it, and sends it to the instruction queue for storage. According to the decoding result, each parameter of the instruction is obtained; a parameter may be written directly in an operation field of the instruction (that is, as an immediate value), or it may be read from the specified register according to the register number in the operation field.
- the advantage of using register storage parameters is that there is no need to change the instruction itself. As long as the value in the register is changed by the instruction, most of the loops can be realized, thus greatly saving the number of instructions required to solve some practical problems.
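The point about register-held parameters can be illustrated with the sketch below, in which the same instruction is issued repeatedly and only the register contents change between iterations. The function `execute_ge`, the register assignment (registers 0 to 3 holding the two input addresses, the output address, and the length), and the list-based memory are all hypothetical.

```python
def execute_ge(memory, regs):
    """Hypothetical GE instruction whose four operands are register numbers 0..3."""
    addr_a, addr_b, addr_out, length = regs[0], regs[1], regs[2], regs[3]
    for i in range(length):
        memory[addr_out + i] = 1 if memory[addr_a + i] >= memory[addr_b + i] else 0


# Vector A at 0..3, vector B at 4..7, output at 8..11.
memory = [1, 5, 3, 8, 2, 4, 3, 9, 0, 0, 0, 0]
regs = [0, 4, 8, 2]  # process two elements per issue

for _ in range(2):
    execute_ge(memory, regs)   # the instruction itself never changes
    for r in (0, 1, 2):        # only the address registers are advanced
        regs[r] += regs[3]

print(memory[8:12])  # [0, 1, 1, 0]
```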
- The dependency processing unit then determines whether the data the instruction needs depends on a previous instruction, which decides whether the instruction can be sent to the operation unit immediately. Once a dependency on the data of a previous instruction is found, the instruction must wait until the instruction it depends on has finished executing before it can be sent to the operation unit. In the customized operation unit the instruction executes quickly, and the result, that is, the comparison result vector, is written back to the address provided by the instruction, at which point execution of the instruction is complete.
- the device can execute the following vector comparison operation instructions:
- Greater-than-or-equal-to operation instruction (GE). According to this instruction, the device obtains the parameters of the instruction, including the length of the vectors, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the register whose number the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value of the first vector is greater than or equal to the value of the second vector, the comparison result vector is set to 1 at that position; otherwise it is set to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
- Less-than-or-equal-to operation instruction. The parameters are obtained, and the two vectors are read and compared, in the same way as for GE. If, at a given position, the value of the first vector is less than or equal to the value of the second vector, the comparison result vector is set to 1 at that position; otherwise it is set to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
- Greater-than operation instruction (GT). The parameters are obtained, and the two vectors are read and compared, in the same way as for GE. If, at a given position, the value of the first vector is greater than the value of the second vector, the comparison result vector is set to 1 at that position; otherwise it is set to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
- Less-than operation instruction. The parameters are obtained, and the two vectors are read and compared, in the same way as for GE. If, at a given position, the value of the first vector is less than the value of the second vector, the comparison result vector is set to 1 at that position; otherwise it is set to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
- Equal-to operation instruction (EQ). The parameters are obtained, and the two vectors are read and compared, in the same way as for GE. If, at a given position, the value of the first vector is equal to the value of the second vector, the comparison result vector is set to 1 at that position; otherwise it is set to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
- Not-equal-to operation instruction (UEQ). The parameters are obtained, and the two vectors are read and compared, in the same way as for GE. If, at a given position, the value of the first vector is not equal to the value of the second vector, the comparison result vector is set to 1 at that position; otherwise it is set to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
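The six comparison instructions differ only in the condition applied at each position, which the following sketch makes explicit with a lookup table. The mnemonics GE, GT, EQ, and UEQ appear in the text; the keys "LE" and "LT" are assumed names for the less-than-or-equal-to and less-than instructions, and the function `compare` is hypothetical.

```python
import operator

# Opcode -> element-wise condition. "LE" and "LT" are assumed mnemonics.
COMPARE_OPS = {
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
    "EQ": operator.eq,
    "UEQ": operator.ne,
}


def compare(opcode, first, second):
    """Return the 0/1 result vector for the chosen comparison."""
    if len(first) != len(second):
        raise ValueError("vectors must have the same length")
    predicate = COMPARE_OPS[opcode]
    return [1 if predicate(x, y) else 0 for x, y in zip(first, second)]


first, second = [1, 4, 6], [2, 4, 5]
for opcode in COMPARE_OPS:
    print(opcode, compare(opcode, first, second))
# GE [0, 1, 1]
# LE [1, 1, 0]
# GT [0, 0, 1]
# LT [1, 0, 0]
# EQ [0, 1, 0]
# UEQ [1, 0, 1]
```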
- FIG. 3 is a schematic structural diagram of an apparatus for performing a vector comparison operation according to an embodiment of the present disclosure.
- The apparatus includes an instruction fetch module, a decoding module, an instruction queue module, a scalar register file, a dependency processing unit, a storage queue module, a vector comparison operation unit, a scratchpad memory, and an IO memory access module:
- the fetch module which is responsible for fetching the next instruction to be executed from the instruction sequence and passing the instruction to the decoding module;
- the decoding module, which is responsible for decoding the instruction and transmitting the decoded instruction to the instruction queue module;
- the instruction queue module is used to temporarily store the instruction obtained from the decoding module, and obtain the corresponding data of the instruction operation from the instruction or the scalar register, including the starting address and size of the vector data and some scalar constants. After obtaining the data, the instruction is sent to the dependency processing unit;
- a dependency processing unit that handles storage dependencies that may exist between vector comparison operations instructions and previously unexecuted instructions.
- A vector comparison operation instruction accesses the scratchpad memory to obtain the vectors to be compared, and successive instructions may access the same block of storage. The unit therefore checks whether the storage range of the current instruction's input data overlaps the storage range of the output data of an earlier instruction that has not finished executing, where each storage range is determined by its start address and data length. If the ranges overlap, the current instruction needs the result of the earlier instruction as input, so it is sent to the storage queue module and held there until the earlier instruction has finished executing and the dependency is eliminated.
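The overlap test described above, with each storage range given by a start address and a data length, can be sketched as follows. The function names and the representation of ranges as `(start, length)` tuples are assumptions made for the example.

```python
def ranges_overlap(start_a, len_a, start_b, len_b):
    """True if [start_a, start_a + len_a) and [start_b, start_b + len_b)
    share at least one address. Empty ranges never overlap."""
    if len_a <= 0 or len_b <= 0:
        return False
    return start_a < start_b + len_b and start_b < start_a + len_a


def depends_on(current_inputs, pending_outputs):
    """True if any input range of the current instruction overlaps any
    output range of an earlier instruction that has not finished."""
    return any(
        ranges_overlap(start, length, out_start, out_length)
        for start, length in current_inputs
        for out_start, out_length in pending_outputs
    )


# Current instruction reads [0, 4) and [8, 12); an earlier one writes [10, 14).
print(depends_on([(0, 4), (8, 4)], [(10, 4)]))  # True
# Adjacent ranges [0, 4) and [4, 8) do not overlap.
print(depends_on([(0, 4)], [(4, 4)]))  # False
```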
- the storage queue module, which is an ordered queue in which instructions that have a data dependency on a previous instruction are held until the dependency is eliminated; once the dependency is eliminated, the vector comparison operation instruction is sent to the vector comparison operation unit;
- the vector comparison operation unit, which is responsible for performing the comparison operation on the two vectors to be compared, including greater than or equal to, greater than, less than or equal to, less than, equal to, and not equal to; the vector comparison operation unit is implemented by a customized hardware circuit;
- the scratchpad memory, which is a temporary storage device dedicated to vector data and capable of supporting vector data of different sizes; it is mainly used for storing the vector data to be compared and the comparison result vector data;
- the IO memory access module, which directly accesses the scratchpad memory and is responsible for reading data from and writing data to the scratchpad memory.
- FIG. 4 is a flowchart of the vector comparison operation device executing a vector comparison operation instruction according to an embodiment of the present disclosure. As shown in FIG. 4, the process of executing a vector comparison operation instruction includes:
- step S1: the instruction fetch module fetches the vector comparison operation instruction and sends it to the decoding module.
- step S2: the decoding module decodes the vector comparison operation instruction and sends it to the instruction queue module.
- step S3: in the instruction queue module, the scalar data corresponding to the operand fields of the instruction, including the start addresses of the two input vectors to be compared, the input vector length, and the output vector address, is obtained from the instruction itself or from the scalar register file.
- step S4: once the required scalar data has been obtained, the vector comparison operation instruction is sent to the dependency processing unit.
- step S5: the dependency processing unit analyzes whether the vector comparison operation instruction has a data dependency on an earlier instruction that has not finished executing. If it does, the instruction is sent to the storage queue and waits there until the dependency no longer exists. If it does not, the instruction is sent to the vector comparison operation unit.
- step S6: the vector comparison operation unit fetches a part of the two vectors to be compared from the scratchpad memory, according to the start addresses and length of the two vectors given in the instruction.
- step S7: the vector comparison unit compares the elements at all positions of the fetched parts simultaneously. Taking the equality comparison as an example, when the two elements at a position are equal, the corresponding position of the output result is set to 1, and otherwise to 0.
- step S8: the flow returns to step S6, and the vector comparison unit fetches the next part of the two vectors for comparison, until the two vectors have been compared in full.
- step S9: after the operation is complete, the result vector is written back to the specified address in the scratchpad memory.
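The fetch-compare-repeat loop described above can be modelled in software as follows. This is a minimal sketch, not the hardware itself: the function name `compare_in_chunks` and the `chunk_size` parameter are assumptions made for illustration, and the element-wise work inside each part stands in for what the hardware does in parallel.

```python
import operator


def compare_in_chunks(a, b, predicate, chunk_size=4):
    """Compare two equal-length vectors part by part and return a 0/1 vector."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    result = []
    for offset in range(0, len(a), chunk_size):
        # Fetch the next part of both vectors.
        part_a = a[offset:offset + chunk_size]
        part_b = b[offset:offset + chunk_size]
        # Compare every position of the part; 1 where the condition holds.
        result.extend(1 if predicate(x, y) else 0 for x, y in zip(part_a, part_b))
    return result


# Equality comparison of two five-element vectors, two elements per part.
flags = compare_in_chunks([1, 2, 3, 4, 5], [1, 0, 3, 9, 5], operator.eq, chunk_size=2)
```

Because the loop walks the vectors in fixed-size parts, the vector length does not have to match the width of the comparison hardware, which is how the flow supports vectors of arbitrary length.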
- In summary, the present disclosure provides a vector comparison operation device which, together with the corresponding instructions, addresses the growing number of vector comparison operations in the computing field.
- Compared with existing conventional solutions, the present disclosure offers a compact instruction format, ease of use, flexible vector length support, and ample on-chip buffering.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Mathematical Physics (AREA)
- Computational Mathematics (AREA)
- Mathematical Analysis (AREA)
- Mathematical Optimization (AREA)
- Pure & Applied Mathematics (AREA)
- Computer Hardware Design (AREA)
- Data Mining & Analysis (AREA)
- Computing Systems (AREA)
- Algebra (AREA)
- Databases & Information Systems (AREA)
- Complex Calculations (AREA)
- Advance Control (AREA)
- Biomedical Technology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Abstract
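An apparatus and method for executing vector comparison operations. The apparatus comprises: a storage unit for storing vector data associated with a vector comparison operation instruction; a register unit for storing scalar data associated with the vector comparison operation instruction; a control unit for decoding the vector comparison operation instruction and controlling its execution; and a vector comparison unit for performing a vector comparison on two input vectors to be compared according to the decoded vector comparison operation instruction, wherein the vector comparison unit is a customized hardware circuit. By means of the customized hardware circuit, the apparatus and method carry out the complete process of a compact vector comparison operation instruction, so that a single compact vector comparison instruction suffices to perform a vector comparison operation.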
Description
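The present disclosure relates to the field of computer instruction operations, and in particular to an apparatus and method for executing vector comparison operations.
Vector comparison means that, for two vectors of equal length, corresponding elements are compared and the comparison results form a new output vector. Deep learning involves comparing the magnitudes of two vectors. For example, the restricted Boltzmann machine, a type of artificial neural network, includes a step that samples a vector formed by a group of neurons: each neuron in the vector is compared with a random number, and the result is 1 if the neuron's value is greater than the random number and 0 otherwise. As another example, when a group of 32-bit single-precision floating-point numbers is converted to 16-bit half-precision floating-point numbers using stochastic rounding, the truncated part must be compared with a random number drawn from a given distribution, and the value is rounded up if the truncated part is greater than the random number. This likewise requires a comparison between two vectors.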
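The restricted Boltzmann machine sampling step mentioned above amounts to a greater-than comparison between a vector of neuron values and a vector of random numbers. The following is a minimal sketch assuming neuron values in [0, 1] and uniformly distributed random numbers; the function name `sample_neurons` is hypothetical.

```python
import random


def sample_neurons(activations, rng):
    """Compare each neuron value with its own random number.

    Returns 1 where the neuron value is greater than the random number,
    and 0 otherwise.
    """
    thresholds = [rng.random() for _ in activations]  # one random number per neuron
    return [1 if a > t else 0 for a, t in zip(activations, thresholds)]


rng = random.Random(0)  # seeded only so that runs are repeatable
states = sample_neurons([0.0, 1.0, 0.5], rng)
```

A value of 0.0 is never sampled as 1 and a value of 1.0 always is, since `random()` returns numbers in [0, 1); values in between are sampled as 1 with probability equal to the value.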
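In the prior art, the most common way to implement vector comparison is to compare the elements one by one on a general-purpose CPU, which is inefficient.
Another prior-art approach performs vector comparison on a graphics processing unit (GPU), using a general-purpose register file and general-purpose stream processing units to execute general-purpose SIMD instructions. In this approach, however, the GPU's on-chip cache is too small, so large-scale vector comparison requires data to be moved on and off chip continually, and off-chip bandwidth becomes the main performance bottleneck.
In summary, neither existing general-purpose processors nor graphics processors can handle large-scale vector comparison operations efficiently.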
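Summary
In view of this, the present disclosure provides an apparatus and method for executing vector comparison operations that can support vector comparisons of arbitrary length under instruction control while also delivering excellent execution performance. Under instruction control, the apparatus can execute a series of vector comparison operations, including but not limited to greater than or equal to, greater than, equal to, not equal to, less than, and less than or equal to, and can flexibly support vector data of different lengths.
According to one aspect of the present disclosure, an apparatus for executing vector comparison operations is provided, comprising:
a storage unit for storing vector data associated with a vector comparison operation instruction;
a register unit for storing scalar data associated with the vector comparison operation instruction;
a control unit for decoding the vector comparison operation instruction and controlling its execution;
a vector comparison unit for performing a vector comparison on two input vectors to be compared according to the decoded vector comparison operation instruction;
wherein the vector comparison unit is a customized hardware circuit.
Optionally, the scalar data stored in the register unit includes the start addresses of the input vectors to be compared, the storage address of the comparison result output vector, and the length of the input vectors to be compared, all associated with the vector comparison operation instruction; the start addresses of the input vectors and the storage address of the output vector are addresses in the storage unit.
Optionally, the control unit comprises:
an instruction queue module for storing decoded vector comparison operation instructions in order and obtaining the scalar data associated with each vector comparison operation instruction.
Optionally, the control unit comprises:
a dependency processing unit for determining, before the vector comparison unit receives the current vector comparison operation instruction, whether the current instruction has a dependency on earlier operation instructions that have not finished executing.
Optionally, the control unit comprises:
a storage queue module for temporarily storing the current vector comparison operation instruction when it has a dependency on an earlier operation instruction that has not finished executing, and for sending the stored instruction to the vector comparison unit once the dependency is eliminated.
Optionally, the apparatus further comprises:
an instruction cache unit for storing vector operation instructions to be executed;
an input/output unit for storing the vector data associated with the vector comparison operation instruction in the storage unit, or for retrieving the comparison result output vector of the vector comparison operation instruction from the storage unit.
Optionally, the vector comparison operation instruction comprises an opcode and operand fields;
the opcode indicates that a vector comparison operation is to be executed;
the operand fields comprise immediates and/or register numbers that indicate the scalar data associated with the vector comparison operation, where a register number points to an address in the register unit.
Optionally, the storage unit is a scratchpad memory.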
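According to a second aspect of the present disclosure, an apparatus for executing vector comparison operations is provided, comprising:
an instruction fetch module for fetching the next vector comparison operation instruction to be executed from the instruction sequence and passing it to the decoding module;
a decoding module for decoding the vector comparison operation instruction and passing the decoded instruction to the instruction queue module;
an instruction queue module for temporarily storing the decoded vector comparison operation instruction and obtaining the scalar data associated with it from the instruction itself or from the scalar registers; once the scalar data has been obtained, the instruction is sent to the dependency processing unit;
a scalar register file comprising a plurality of scalar registers for storing the scalar data associated with the vector comparison operation instruction;
a dependency processing unit for determining whether the vector comparison operation instruction has a dependency on earlier operation instructions that have not finished executing; if a dependency exists, the instruction is sent to the storage queue module, and if not, it is sent to the vector comparison unit;
a storage queue module for storing vector comparison operation instructions that depend on earlier operation instructions and sending each instruction to the vector comparison unit once its dependency is resolved;
a vector comparison unit for performing a vector comparison on the input vector data according to the received vector comparison operation instruction;
a scratchpad memory for storing the input vectors to be compared and the comparison result output vector;
an input/output access module for directly accessing the scratchpad memory, responsible for reading the input vectors to be compared from the scratchpad memory and writing the comparison result output vector to it.
Optionally, the vector comparison unit is a customized hardware circuit.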
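According to a third aspect of the present disclosure, a method for executing vector comparison operations is provided, the method comprising:
an instruction fetch module fetches the next vector comparison operation instruction to be executed from the instruction sequence and passes it to a decoding module;
the decoding module decodes the vector comparison operation instruction and passes the decoded instruction to an instruction queue module;
the instruction queue module temporarily stores the decoded vector comparison operation instruction and obtains the scalar data associated with it from the instruction itself or from the scalar registers; once the scalar data has been obtained, the instruction is sent to a dependency processing unit;
the dependency processing unit determines whether the vector comparison operation instruction has a dependency on earlier operation instructions that have not finished executing; if a dependency exists, the instruction is sent to a storage queue module, and if not, it is sent to a vector comparison unit;
the storage queue module stores vector comparison operation instructions that depend on earlier operation instructions and sends each instruction to the vector comparison unit once its dependency is resolved;
the vector comparison unit, according to the received vector comparison operation instruction, fetches the input vectors to be compared from the scratchpad memory through an input/output access module, performs the vector comparison on them, and writes the comparison result output vector to the scratchpad memory.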
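The apparatus and method provided by the present disclosure carry out the complete process of a compact vector comparison operation instruction by means of a customized hardware circuit, so that a single compact vector comparison instruction suffices to perform a vector comparison operation. The present disclosure also holds the vector data involved in the computation in a scratchpad memory, so that vector data of different widths can be supported more flexibly and effectively during operation, while the customized vector comparison unit implements the various comparison operations more efficiently. The instructions adopted by the present disclosure have a compact format, which makes them convenient to use.
The present disclosure can be applied in scenarios including, but not limited to: electronic products such as data processing devices, robots, computers, printers, scanners, telephones, tablets, smart terminals, mobile phones, dashboard cameras, navigators, sensors, webcams, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, and wearable devices; vehicles such as aircraft, ships, and automobiles; household appliances such as televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lamps, gas stoves, and range hoods; and medical equipment such as nuclear magnetic resonance instruments, B-mode ultrasound scanners, and electrocardiographs.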
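FIG. 1 is a schematic structural diagram of the apparatus for executing vector comparison operations provided by the present disclosure.
FIG. 2 is a schematic diagram of the format of the vector comparison operation instruction provided by the present disclosure.
FIG. 3 is a schematic structural diagram of an apparatus for executing vector comparison operations provided by an embodiment of the present disclosure.
FIG. 4 is a flowchart of the vector comparison operation device executing a vector comparison operation instruction according to an embodiment of the present disclosure.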
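To make the objectives, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
FIG. 1 is a schematic structural diagram of an apparatus for executing vector comparison operations provided by the present disclosure. As shown in FIG. 1, the vector comparison operation apparatus comprises:
a storage unit for storing vector data associated with the vector comparison operation instruction. In one implementation, the storage unit may be a scratchpad memory capable of supporting vector data of different sizes. The vector data associated with the instruction comprises the input vector data to be compared and the comparison result output vector data. By holding the vector data involved in the computation in the scratchpad memory, the disclosure allows data of different widths to be supported more flexibly and effectively during vector operations. The scratchpad memory may be implemented with various storage devices, such as SRAM, DRAM, eDRAM, memristors, 3D-DRAM, and non-volatile memory.
a register unit for storing scalar data associated with the vector comparison operation instruction. The scalar data includes the start addresses and length of the input vector data to be compared, the storage address of the comparison result output vector data, and other related parameters, where the addresses of the input vector data and the storage address of the output vector data are the addresses at which the vectors are stored in the storage unit. In one implementation, the register unit may be a scalar register file that provides the scalar registers needed during operation; the scalar registers hold not only vector storage addresses but also other scalar data.
a control unit for decoding the vector comparison operation instruction and controlling its execution, which it does mainly by controlling the behavior of the modules in the apparatus. In one implementation, the control unit reads a prepared instruction, decodes it into control signals, and sends these to the other modules in the apparatus, which carry out the corresponding operations according to the control signals they receive.
a vector comparison unit, which performs the specified comparison on the input vector data according to the instruction. The unit is a vector operation unit that applies the same operation to all input data simultaneously. In one embodiment, the vector comparison unit obtains the start addresses and length of the two vectors to be compared from the vector comparison operation instruction, fetches the two vectors from the storage unit, and compares their corresponding elements; when the condition is satisfied it sets the corresponding position of the comparison result output vector to 1, and otherwise to 0, which yields the comparison result. In the present disclosure the vector comparison unit is a customized hardware circuit, including but not limited to an FPGA, a CGRA, an application-specific integrated circuit (ASIC), an analog circuit, or a memristor. By cooperating with the other modules in the apparatus, the vector comparison unit can complete comparisons of vectors of arbitrary length.
The present disclosure provides a vector comparison operation apparatus in which instructions control both the kind of comparison that is executed and the addresses and length of the vector data. The apparatus mainly comprises a storage unit, a register unit, a control unit, and a comparison operation unit. The storage unit stores the vectors; the register unit stores the vector storage addresses and other scalar parameters; the control unit performs decoding and controls the modules according to the instruction; and the comparison operation unit, according to the instruction, obtains the length, addresses, and other parameters of the vectors from the instruction itself or from the register unit, fetches the corresponding vector data from the storage unit according to those addresses and that length, and then performs the comparison on the vectors. Depending on the instruction, it can perform greater than or equal to, greater than, equal to, not equal to, less than, and less than or equal to comparisons. The present disclosure holds the vector data involved in the computation in the scratchpad memory, so that vector data of different widths can be supported more flexibly and effectively during operation.
According to one implementation of the present disclosure, the vector comparison operation apparatus further comprises an instruction cache unit for storing vector comparison operation instructions to be executed. A vector comparison operation instruction is also held in the instruction cache unit while it executes, and when an instruction finishes executing it is committed.
FIG. 2 is a schematic diagram of the format of the vector comparison operation instruction in the present disclosure. As shown in FIG. 2, the vector comparison operation instruction comprises an opcode and a plurality of operand fields. The opcode indicates which vector comparison is to be performed, such as greater than or equal to, greater than, equal to, not equal to, less than, or less than or equal to. The operand fields hold the scalar data associated with the vector comparison operation and consist of immediates and register numbers, where a register number points to a specific register in the register unit. The immediates and registers hold the start addresses and length of the vectors to be compared, the storage address of the output vector, and so on. According to the instruction, the apparatus can obtain the associated scalar data either directly from the instruction or by accessing the registers whose numbers the instruction provides. The start addresses of the vectors to be compared and the storage address of the output vector are all addresses in the storage unit.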
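The instruction format described above, an opcode plus operand fields that each hold either an immediate or a register number, can be illustrated with a small model. The field names (`src_a`, `src_b`, `length`, `dst`), the `Operand` type, and the `"imm"`/`"reg"` tags are assumptions chosen for this sketch; the disclosure does not specify a concrete encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    kind: str   # "imm": value is used directly; "reg": value is a register number
    value: int


@dataclass(frozen=True)
class VectorCompareInstruction:
    opcode: str      # which comparison to perform, e.g. "GE"
    src_a: Operand   # start address of the first vector to be compared
    src_b: Operand   # start address of the second vector to be compared
    length: Operand  # length of the vectors to be compared
    dst: Operand     # storage address of the output vector


def resolve(operand, scalar_regs):
    """Return the scalar value of an operand field."""
    if operand.kind == "imm":
        return operand.value
    if operand.kind == "reg":
        return scalar_regs[operand.value]
    raise ValueError(f"unknown operand kind: {operand.kind}")


def resolve_scalars(instr, scalar_regs):
    """Collect all scalar data needed by the instruction."""
    return {name: resolve(getattr(instr, name), scalar_regs)
            for name in ("src_a", "src_b", "length", "dst")}


regs = [0, 64, 128, 16]  # hypothetical scalar register file contents
instr = VectorCompareInstruction("GE", Operand("reg", 1), Operand("reg", 2),
                                 Operand("imm", 16), Operand("imm", 256))
scalars = resolve_scalars(instr, regs)
```

With register operands, the same instruction can be reused on different vectors simply by changing the register contents, which is the benefit of register-held parameters that the description points out.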
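According to one implementation of the present disclosure, the control unit of the vector comparison apparatus further comprises an instruction queue module for storing decoded vector comparison operation instructions in order. Using the operand fields of a vector comparison operation instruction, it obtains the scalar data associated with the instruction, such as the start addresses and length of the vectors to be compared, fills this data into the instruction, and then sends the instruction to the dependency processing unit.
According to one implementation of the present disclosure, the control unit of the apparatus further comprises a dependency processing unit which, before the vector comparison operation unit receives a vector comparison operation instruction, determines whether that instruction has a dependency on earlier operation instructions that have not finished executing, for example whether they access the same vector storage address. If so, the instruction is stored in the storage queue module, and once the instruction it depends on has finished executing, the storage queue module provides it to the vector comparison operation unit; otherwise, the instruction is provided to the vector comparison operation unit directly. Specifically, when vector comparison operation instructions access the scratchpad memory, consecutive instructions may access the same block of storage. To guarantee correct results, if the current instruction is found to have a data dependency on an earlier instruction, it must wait in the storage queue until the dependency is eliminated.
According to one implementation of the present disclosure, the control unit of the apparatus further comprises a storage queue module, which contains an ordered queue. Instructions that have a data dependency on earlier instructions are held in this ordered queue until the dependency is eliminated, after which the module provides the operation instruction to the vector comparison operation unit.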
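How the dependency processing unit and the ordered storage queue might cooperate can be sketched as a small issue model. Everything here is illustrative: the class name `Issuer`, the dictionary layout of an instruction, and in particular the choice to also check a new instruction against the outputs of instructions still waiting in the queue are assumptions, not details taken from the disclosure.

```python
from collections import deque


class Issuer:
    """Toy model: issue instructions, parking dependent ones in an ordered queue."""

    def __init__(self):
        self.in_flight = {}         # instruction id -> (start, length) of its output
        self.store_queue = deque()  # waiting instructions, oldest first
        self.issued = []            # ids in the order they reached the compare unit

    @staticmethod
    def _overlap(a, b):
        # a and b are (start, length) pairs describing half-open address ranges.
        return a[0] < b[0] + b[1] and b[0] < a[0] + a[1]

    def _blocked(self, instr, pending_outputs):
        return any(self._overlap(i, o)
                   for i in instr["inputs"] for o in pending_outputs)

    def _issue(self, instr):
        self.in_flight[instr["id"]] = instr["output"]
        self.issued.append(instr["id"])

    def dispatch(self, instr):
        """Send the instruction to the compare unit, or park it if it depends
        on the output of any unfinished (in-flight or queued) instruction."""
        pending = list(self.in_flight.values())
        pending += [q["output"] for q in self.store_queue]
        if self._blocked(instr, pending):
            self.store_queue.append(instr)
        else:
            self._issue(instr)

    def complete(self, instr_id):
        """Mark an instruction finished and release queued instructions in order."""
        del self.in_flight[instr_id]
        while self.store_queue and not self._blocked(
                self.store_queue[0], list(self.in_flight.values())):
            self._issue(self.store_queue.popleft())
```

In this model an instruction whose inputs overlap a pending output waits in the queue, while an unrelated later instruction can still be issued; when the producing instruction completes, the queue head is released.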
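According to one implementation of the present disclosure, the apparatus further comprises an input/output unit for storing vectors in the storage unit or retrieving vector comparison results from the storage unit. The input/output unit can access the storage unit directly and is responsible for reading vector data from memory or writing vector data to it.
According to one implementation of the present disclosure, the instruction design of the apparatus is compact: a single instruction completes one full vector comparison operation.
When the apparatus executes a vector comparison operation, it fetches the instruction, decodes it, and sends it to the instruction queue for storage. According to the decoding result, it obtains the parameters of the instruction, which may be written directly in the operand fields of the instruction (as immediates) or read from the registers designated by the register numbers in the operand fields. The benefit of holding parameters in registers is that the instruction itself does not need to change: most loops can be implemented merely by using instructions to change the values in the registers, which greatly reduces the number of instructions needed to solve certain practical problems. After all operands have been obtained, the dependency processing unit determines whether the data actually needed by the instruction has a dependency on earlier instructions, which decides whether the instruction can be sent to the operation unit for execution immediately. If a dependency on earlier data is found, the instruction must wait until the instruction it depends on has finished executing before it can be sent to the operation unit. In the customized operation unit the instruction executes quickly, and the result, that is, the comparison result vector, is written back to the address provided by the instruction, at which point the instruction has finished executing.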
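The apparatus can execute the following vector comparison operation instructions:
Greater-than-or-equal instruction (GE): according to this instruction, the apparatus obtains the parameters of the instruction, including the vector length, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the registers whose numbers the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value in the first vector is greater than or equal to the value in the second vector, the comparison result vector is set to 1 at that position, and otherwise to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
Less-than-or-equal instruction (LE): according to this instruction, the apparatus obtains the parameters of the instruction, including the vector length, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the registers whose numbers the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value in the first vector is less than or equal to the value in the second vector, the comparison result vector is set to 1 at that position, and otherwise to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
Greater-than instruction (GT): according to this instruction, the apparatus obtains the parameters of the instruction, including the vector length, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the registers whose numbers the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value in the first vector is greater than the value in the second vector, the comparison result vector is set to 1 at that position, and otherwise to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
Less-than instruction (LT): according to this instruction, the apparatus obtains the parameters of the instruction, including the vector length, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the registers whose numbers the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value in the first vector is less than the value in the second vector, the comparison result vector is set to 1 at that position, and otherwise to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
Equal instruction (EQ): according to this instruction, the apparatus obtains the parameters of the instruction, including the vector length, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the registers whose numbers the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value in the first vector is equal to the value in the second vector, the comparison result vector is set to 1 at that position, and otherwise to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.
Not-equal instruction (UEQ): according to this instruction, the apparatus obtains the parameters of the instruction, including the vector length, the start addresses of the two vectors, and the storage address of the output vector, either directly from the instruction or by accessing the registers whose numbers the instruction provides. It then reads the two vectors and compares the elements at all positions in the vector comparison operation unit. If, at a given position, the value in the first vector is not equal to the value in the second vector, the comparison result vector is set to 1 at that position, and otherwise to 0. Finally, the comparison result is written back to the specified storage address in the scratchpad memory.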
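The six instructions differ only in the predicate applied at each position, which a software model can capture with a lookup table. In this sketch the scratchpad memory is modelled as a plain Python list and addresses as list indices; the names `COMPARATORS` and `execute_compare` are hypothetical, while the mnemonics GE, LE, GT, LT, EQ, and UEQ are the ones used in the text.

```python
import operator

# Mnemonic -> element-wise predicate (first vector value, second vector value).
COMPARATORS = {
    "GE": operator.ge,
    "LE": operator.le,
    "GT": operator.gt,
    "LT": operator.lt,
    "EQ": operator.eq,
    "UEQ": operator.ne,
}


def execute_compare(opcode, scratchpad, addr_a, addr_b, length, addr_out):
    """Read two vectors, compare them position by position, write back 0/1 flags."""
    predicate = COMPARATORS[opcode]
    a = scratchpad[addr_a:addr_a + length]
    b = scratchpad[addr_b:addr_b + length]
    scratchpad[addr_out:addr_out + length] = [
        1 if predicate(x, y) else 0 for x, y in zip(a, b)
    ]


# First vector at address 0, second at address 3, result at address 6.
memory = [1, 5, 3, 2, 5, 1, 0, 0, 0]
execute_compare("GE", memory, 0, 3, 3, 6)
```

After the call, `memory[6:9]` holds the flags for `1 >= 2`, `5 >= 5`, and `3 >= 1`.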
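To make the objectives, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
FIG. 3 is a schematic structural diagram of an apparatus for executing vector comparison operations provided by an embodiment of the present disclosure. As shown in FIG. 3, the apparatus comprises an instruction fetch module, a decoding module, an instruction queue module, a scalar register file, a dependency processing unit, a storage queue module, a vector comparison operation unit, a scratchpad memory, and an IO memory access module;
the instruction fetch module, which fetches the next instruction to be executed from the instruction sequence and passes it to the decoding module;
the decoding module, which decodes the instruction and passes the decoded instruction to the instruction queue;
the instruction queue module, which temporarily stores the instruction received from the decoding module and obtains the data needed by the instruction from the instruction itself or from the scalar registers, including the start addresses and sizes of the vector data and some scalar constants. Once the data has been obtained, the instruction is sent to the dependency processing unit;
the scalar register file, which provides the scalar registers needed during operation;
the dependency processing unit, which handles storage dependencies that may exist between a vector comparison operation instruction and earlier instructions that have not finished executing. A vector comparison operation instruction accesses the scratchpad memory to obtain the vectors to be compared, and consecutive instructions may access the same block of storage. To guarantee correct results, if the current instruction is found to have a data dependency on an earlier instruction, it is sent to the storage queue module and waits there until the dependency is eliminated. Specifically, the unit checks whether the storage range of the input data of the current instruction overlaps the storage range of the output data of any earlier instruction that has not finished executing, where a storage range is determined by a start address and a data length. An overlap means that the current instruction needs the result of that earlier instruction as input, so it cannot start until that instruction has finished. During this time the instruction is held in the storage queue module.
the storage queue module, which is an ordered queue that holds instructions with a data dependency on earlier instructions until the dependency is eliminated; once the dependency is eliminated, the vector comparison operation instruction is sent to the vector comparison operation unit;
the vector comparison operation unit, which performs the comparison of the two vectors to be compared, including greater than or equal to, greater than, less than or equal to, less than, equal to, and not equal to; the unit is implemented as a customized hardware circuit;
the scratchpad memory, which is a temporary storage device dedicated to vector data and capable of supporting vector data of different sizes; it mainly stores the vector data to be compared and the comparison result vector data;
the IO memory access module, which directly accesses the scratchpad memory and is responsible for reading data from or writing data to it.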
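FIG. 4 is a flowchart of the vector comparison operation device executing a vector comparison operation instruction according to an embodiment of the present disclosure. As shown in FIG. 4, the process of executing a vector comparison operation instruction includes:
S1: the instruction fetch module fetches the vector comparison operation instruction and sends it to the decoding module.
S2: the decoding module decodes the vector comparison operation instruction and sends it to the instruction queue module.
S3: in the instruction queue module, the scalar data corresponding to the operand fields of the instruction, including the start addresses of the two input vectors to be compared, the input vector length, and the output vector address, is obtained from the instruction itself or from the scalar register file.
S4: once the required scalar data has been obtained, the vector comparison operation instruction is sent to the dependency processing unit.
S5: the dependency processing unit analyzes whether the vector comparison operation instruction has a data dependency on an earlier instruction that has not finished executing. If it does, the instruction is sent to the storage queue and waits there until the dependency no longer exists. If it does not, the instruction is sent to the vector comparison operation unit.
S6: the vector comparison operation unit fetches a part of the two vectors to be compared from the scratchpad memory, according to the start addresses and length of the two vectors given in the instruction.
S7: the vector comparison unit compares the elements at all positions of the fetched parts simultaneously. Taking the equality comparison as an example, when the two elements at a position are equal, the corresponding position of the output result is set to 1, and otherwise to 0.
S8: the flow returns to step S6, and the vector comparison unit fetches the next part of the two vectors for comparison, until the two vectors have been compared in full.
S9: after the operation is complete, the result vector is written back to the specified address in the scratchpad memory.
In summary, the present disclosure provides a vector comparison operation apparatus which, together with the corresponding instructions, addresses the growing number of vector comparison operations in the computing field. Compared with existing conventional solutions, the present disclosure offers a compact instruction format, ease of use, flexible vector length support, and ample on-chip buffering.
The specific embodiments described above explain the objectives, technical solutions, and beneficial effects of the present disclosure in further detail. It should be understood that they are merely specific embodiments of the present disclosure and are not intended to limit it; any modification, equivalent substitution, or improvement made within the spirit and principles of the present disclosure shall fall within its scope of protection.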
Claims (11)
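- An apparatus for executing vector comparison operations, characterized by comprising: a storage unit for storing vector data associated with a vector comparison operation instruction; a register unit for storing scalar data associated with the vector comparison operation instruction; a control unit for decoding the vector comparison operation instruction and controlling its execution; and a vector comparison unit for performing a vector comparison on two input vectors to be compared according to the decoded vector comparison operation instruction; wherein the vector comparison unit is a customized hardware circuit.
- The apparatus of claim 1, characterized in that the scalar data stored in the register unit includes the start addresses of the input vectors to be compared, the storage address of the comparison result output vector, and the length of the input vectors to be compared, all associated with the vector comparison operation instruction; wherein the start addresses of the input vectors to be compared and the storage address of the comparison result output vector are addresses in the storage unit.
- The apparatus of claim 1, characterized in that the control unit comprises: an instruction queue module for storing decoded vector comparison operation instructions in order and obtaining the scalar data associated with each vector comparison operation instruction.
- The apparatus of claim 1, characterized in that the control unit comprises: a dependency processing unit for determining, before the vector comparison unit receives the current vector comparison operation instruction, whether the current instruction has a dependency on earlier operation instructions that have not finished executing.
- The apparatus of claim 1, characterized in that the control unit comprises: a storage queue module for temporarily storing the current vector comparison operation instruction when it has a dependency on an earlier operation instruction that has not finished executing, and for sending the stored instruction to the vector comparison unit once the dependency is eliminated.
- The apparatus of any one of claims 1 to 5, characterized in that the apparatus further comprises: an instruction cache unit for storing vector operation instructions to be executed; and an input/output unit for storing the vector data associated with the vector comparison operation instruction in the storage unit, or for retrieving the comparison result output vector of the vector comparison operation instruction from the storage unit.
- The apparatus of claim 1, characterized in that the vector comparison operation instruction comprises an opcode and operand fields; the opcode indicates that a vector comparison operation is to be executed; the operand fields comprise immediates and/or register numbers that indicate the scalar data associated with the vector comparison operation, where a register number points to an address in the register unit.
- The apparatus of any one of claims 1 to 5 and 7, characterized in that the storage unit is a scratchpad memory.
- An apparatus for executing vector comparison operations, characterized by comprising: an instruction fetch module for fetching the next vector comparison operation instruction to be executed from the instruction sequence and passing it to the decoding module; a decoding module for decoding the vector comparison operation instruction and passing the decoded instruction to the instruction queue module; an instruction queue module for temporarily storing the decoded vector comparison operation instruction and obtaining the scalar data associated with it from the instruction itself or from the scalar registers, and, once the scalar data has been obtained, sending the instruction to the dependency processing unit; a scalar register file comprising a plurality of scalar registers for storing the scalar data associated with the vector comparison operation instruction; a dependency processing unit for determining whether the vector comparison operation instruction has a dependency on earlier operation instructions that have not finished executing, sending the instruction to the storage queue module if a dependency exists and to the vector comparison unit if not; a storage queue module for storing vector comparison operation instructions that depend on earlier operation instructions and sending each instruction to the vector comparison unit once its dependency is resolved; a vector comparison unit for performing a vector comparison on the input vector data according to the received vector comparison operation instruction; a scratchpad memory for storing the input vectors to be compared and the comparison result output vector; and an input/output access module for directly accessing the scratchpad memory, responsible for reading the input vectors to be compared from the scratchpad memory and writing the comparison result output vector to it.
- The apparatus of claim 9, characterized in that the vector comparison unit is a customized hardware circuit.
- A method for executing vector comparison operations, characterized in that the method comprises: an instruction fetch module fetches the next vector comparison operation instruction to be executed from the instruction sequence and passes it to a decoding module; the decoding module decodes the vector comparison operation instruction and passes the decoded instruction to an instruction queue module; the instruction queue module temporarily stores the decoded vector comparison operation instruction and obtains the scalar data associated with it from the instruction itself or from the scalar registers, and, once the scalar data has been obtained, sends the instruction to a dependency processing unit; the dependency processing unit determines whether the vector comparison operation instruction has a dependency on earlier operation instructions that have not finished executing, sending the instruction to a storage queue module if a dependency exists and to a vector comparison unit if not; the storage queue module stores vector comparison operation instructions that depend on earlier operation instructions and sends each instruction to the vector comparison unit once its dependency is resolved; the vector comparison unit, according to the received vector comparison operation instruction, fetches the input vectors to be compared from the scratchpad memory through an input/output access module, performs the vector comparison on them, and writes the comparison result output vector to the scratchpad memory through the input/output access module.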
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP16899906.8A EP3451151B1 (en) | 2016-04-26 | 2016-05-05 | Apparatus and method for executing vector comparison operation |
| US16/171,289 US20190065189A1 (en) | 2016-04-26 | 2018-10-25 | Apparatus and Methods for Comparing Vectors |
| US16/247,260 US10853069B2 (en) | 2016-04-26 | 2019-01-14 | Apparatus and methods for comparing vectors |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610266782.2A CN107315563B (zh) | 2016-04-26 | 2016-04-26 | 一种用于执行向量比较运算的装置和方法 |
| CN201610266782.2 | 2016-04-26 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/171,289 Continuation-In-Part US20190065189A1 (en) | 2016-04-26 | 2018-10-25 | Apparatus and Methods for Comparing Vectors |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2017185395A1 (zh) | 2017-11-02 |
Family
ID=60160566
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2016/081115 Ceased WO2017185395A1 (zh) | 2016-04-26 | 2016-05-05 | 一种用于执行向量比较运算的装置和方法 |
Country Status (4)
| Country | Link |
|---|---|
| US (2) | US20190065189A1 (zh) |
| EP (1) | EP3451151B1 (zh) |
| CN (2) | CN107315563B (zh) |
| WO (1) | WO2017185395A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10853069B2 (en) | 2016-04-26 | 2020-12-01 | Cambricon Technologies Corporation Limited | Apparatus and methods for comparing vectors |
Families Citing this family (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10684854B2 (en) * | 2017-11-28 | 2020-06-16 | Intel Corporation | Apparatus and method for converting a floating-point value from half precision to single precision |
| CN110163350B (zh) * | 2018-02-13 | 2021-06-08 | 上海寒武纪信息科技有限公司 | 一种计算装置及方法 |
| CN110163360B (zh) * | 2018-02-13 | 2021-06-25 | 上海寒武纪信息科技有限公司 | 一种计算装置及方法 |
| CN111353595A (zh) * | 2018-12-20 | 2020-06-30 | 上海寒武纪信息科技有限公司 | 运算方法、装置及相关产品 |
| CN110598175B (zh) * | 2019-09-17 | 2021-01-01 | 西安邮电大学 | 一种基于图计算加速器的稀疏矩阵列向量比较装置 |
| CN113407351B (zh) * | 2021-07-20 | 2024-08-23 | 昆仑芯(北京)科技有限公司 | 执行运算的方法、装置、芯片、设备、介质和程序产品 |
| CN119397155B (zh) * | 2021-08-20 | 2026-04-17 | 华为技术有限公司 | 一种计算装置、方法、系统、电路、芯片及设备 |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030037221A1 (en) * | 2001-08-14 | 2003-02-20 | International Business Machines Corporation | Processor implementation having unified scalar and SIMD datapath |
| CN102722469A (zh) * | 2012-05-28 | 2012-10-10 | 西安交通大学 | 基于浮点运算单元的基本超越函数运算方法及其协处理器 |
| CN102750133A (zh) * | 2012-06-20 | 2012-10-24 | 中国电子科技集团公司第五十八研究所 | 支持simd的32位三发射的数字信号处理器 |
| CN103699360A (zh) * | 2012-09-27 | 2014-04-02 | 北京中科晶上科技有限公司 | 一种向量处理器及其进行向量数据存取、交互的方法 |
| CN104699458A (zh) * | 2015-03-30 | 2015-06-10 | 哈尔滨工业大学 | 定点向量处理器及其向量数据访存控制方法 |
| CN105229599A (zh) * | 2013-03-15 | 2016-01-06 | 甲骨文国际公司 | 用于单指令多数据处理器的高效硬件指令 |
Family Cites Families (39)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP2609618B2 (ja) * | 1987-08-13 | 1997-05-14 | 株式会社東芝 | データ処理装置 |
| US5197130A (en) * | 1989-12-29 | 1993-03-23 | Supercomputer Systems Limited Partnership | Cluster architecture for a highly parallel scalar/vector multiprocessor system |
| JP3512272B2 (ja) * | 1995-08-09 | 2004-03-29 | 株式会社日立製作所 | 比較演算装置およびグラフィック演算システム |
| US5838984A (en) * | 1996-08-19 | 1998-11-17 | Samsung Electronics Co., Ltd. | Single-instruction-multiple-data processing using multiple banks of vector registers |
| US7100026B2 (en) * | 2001-05-30 | 2006-08-29 | The Massachusetts Institute Of Technology | System and method for performing efficient conditional vector operations for data parallel architectures involving both input and conditional vector values |
| EP1044407B1 (en) * | 1998-10-09 | 2014-02-26 | Koninklijke Philips N.V. | Vector data processor with conditional instructions |
| US7793084B1 (en) * | 2002-07-22 | 2010-09-07 | Mimar Tibet | Efficient handling of vector high-level language conditional constructs in a SIMD processor |
| CN101299185B (zh) * | 2003-08-18 | 2010-10-06 | 上海海尔集成电路有限公司 | 一种基于cisc结构的微处理器结构 |
| US7941585B2 (en) * | 2004-09-10 | 2011-05-10 | Cavium Networks, Inc. | Local scratchpad and data caching system |
| US7565514B2 (en) * | 2006-04-28 | 2009-07-21 | Freescale Semiconductor, Inc. | Parallel condition code generation for SIMD operations |
| US7676647B2 (en) * | 2006-08-18 | 2010-03-09 | Qualcomm Incorporated | System and method of processing data using scalar/vector instructions |
| US8332620B2 (en) * | 2008-07-25 | 2012-12-11 | Freescale Semiconductor, Inc. | System, method and computer program product for executing a high level programming language conditional statement |
| CN101685388B (zh) * | 2008-09-28 | 2013-08-07 | 北京大学深圳研究生院 | 执行比较运算的方法和装置 |
| JP4720915B2 (ja) * | 2009-02-26 | 2011-07-13 | 日本電気株式会社 | ベクトル命令間追い越し判定装置と方法 |
| US8972698B2 (en) * | 2010-12-22 | 2015-03-03 | Intel Corporation | Vector conflict instructions |
| CN202133997U (zh) * | 2011-02-28 | 2012-02-01 | 江苏中科芯核电子科技有限公司 | 一种数据的重排装置 |
| CN102156637A (zh) * | 2011-05-04 | 2011-08-17 | 中国人民解放军国防科学技术大学 | 向量交叉多线程处理方法及向量交叉多线程微处理器 |
| CN102200964B (zh) * | 2011-06-17 | 2013-05-15 | 孙瑞琛 | 基于并行处理的fft装置及其方法 |
| US9195463B2 (en) * | 2011-11-30 | 2015-11-24 | International Business Machines Corporation | Processing core with speculative register preprocessing in unused execution unit cycles |
| CN102495719B (zh) * | 2011-12-15 | 2014-09-24 | 中国科学院自动化研究所 | 一种向量浮点运算装置及方法 |
| SE535973C2 (sv) * | 2011-12-20 | 2013-03-12 | Mediatek Sweden Ab | Exekveringsenhet för digital signalprocessor |
| US9582464B2 (en) * | 2011-12-23 | 2017-02-28 | Intel Corporation | Systems, apparatuses, and methods for performing a double blocked sum of absolute differences |
| CN104126171B (zh) * | 2011-12-27 | 2018-08-07 | 英特尔公司 | 用于基于两个源写掩码寄存器生成依赖向量的系统、装置和方法 |
| US9588762B2 (en) * | 2012-03-15 | 2017-03-07 | International Business Machines Corporation | Vector find element not equal instruction |
| US9575753B2 (en) * | 2012-03-15 | 2017-02-21 | International Business Machines Corporation | SIMD compare instruction using permute logic for distributed register files |
| US9459864B2 (en) * | 2012-03-15 | 2016-10-04 | International Business Machines Corporation | Vector string range compare |
| US9489199B2 (en) * | 2012-12-28 | 2016-11-08 | Intel Corporation | Vector compare instructions for sliding window encoding |
| US9098121B2 (en) * | 2013-01-22 | 2015-08-04 | Freescale Semiconductor, Inc. | Vector comparator system for finding a peak number |
| US9389854B2 (en) * | 2013-03-15 | 2016-07-12 | Qualcomm Incorporated | Add-compare-select instruction |
| US20140289497A1 (en) * | 2013-03-19 | 2014-09-25 | Apple Inc. | Enhanced macroscalar comparison operations |
| US10191743B2 (en) * | 2013-12-29 | 2019-01-29 | Intel Corporation | Versatile packed data comparison processors, methods, systems, and instructions |
| US9678715B2 (en) * | 2014-10-30 | 2017-06-13 | Arm Limited | Multi-element comparison and multi-element addition |
| US20160179521A1 (en) * | 2014-12-23 | 2016-06-23 | Intel Corporation | Method and apparatus for expanding a mask to a vector of mask values |
| US20160179550A1 (en) * | 2014-12-23 | 2016-06-23 | Intel Corporation | Fast vector dynamic memory conflict detection |
| US10203955B2 (en) * | 2014-12-31 | 2019-02-12 | Intel Corporation | Methods, apparatus, instructions and logic to provide vector packed tuple cross-comparison functionality |
| US11544214B2 (en) * | 2015-02-02 | 2023-01-03 | Optimum Semiconductor Technologies, Inc. | Monolithic vector processor configured to operate on variable length vectors using a vector length register |
| US10387150B2 (en) * | 2015-06-24 | 2019-08-20 | International Business Machines Corporation | Instructions to count contiguous register elements having a specific value in a selected location |
| GB2548600B (en) * | 2016-03-23 | 2018-05-09 | Advanced Risc Mach Ltd | Vector predication instruction |
| CN107315563B (zh) | 2016-04-26 | 2020-08-07 | 中科寒武纪科技股份有限公司 | 一种用于执行向量比较运算的装置和方法 |
- 2016-04-26 CN CN201610266782.2A patent/CN107315563B/zh active Active
- 2016-04-26 CN CN201911329417.1A patent/CN111176608B/zh active Active
- 2016-05-05 EP EP16899906.8A patent/EP3451151B1/en active Active
- 2016-05-05 WO PCT/CN2016/081115 patent/WO2017185395A1/zh not_active Ceased
- 2018-10-25 US US16/171,289 patent/US20190065189A1/en not_active Abandoned
- 2019-01-14 US US16/247,260 patent/US10853069B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN107315563A (zh) | 2017-11-03 |
| US20190163477A1 (en) | 2019-05-30 |
| CN111176608A (zh) | 2020-05-19 |
| CN111176608B (zh) | 2025-03-11 |
| EP3451151A4 (en) | 2019-12-25 |
| EP3451151B1 (en) | 2021-03-24 |
| EP3451151A1 (en) | 2019-03-06 |
| CN107315563B (zh) | 2020-08-07 |
| US20190065189A1 (en) | 2019-02-28 |
| US10853069B2 (en) | 2020-12-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN107315715B (zh) | 一种用于执行矩阵加/减运算的装置和方法 | |
| WO2017185395A1 (zh) | 一种用于执行向量比较运算的装置和方法 | |
| WO2017185393A1 (zh) | 一种用于执行向量内积运算的装置和方法 | |
| CN107315566B (zh) | 一种用于执行向量循环移位运算的装置和方法 | |
| CN111651201B (zh) | 一种用于执行向量合并运算的装置和方法 | |
| CN111651203A (zh) | 一种用于执行向量四则运算的装置和方法 | |
| CN111651206B (zh) | 一种用于执行向量外积运算的装置和方法 | |
| WO2017185390A1 (zh) | 一种用于执行向量超越函数运算的装置和方法 | |
| EP3451161B1 (en) | Apparatus and method for executing operations of maximum value and minimum value of vectors | |
| WO2017185404A1 (zh) | 一种用于执行向量逻辑运算的装置及方法 | |
| CN107315565B (zh) | 一种用于生成服从一定分布的随机向量装置和方法 | |
| TW201805802A (zh) | 一種運算裝置及其操作方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWE | Wipo information: entry into national phase | Ref document number: 2016899906; Country of ref document: EP |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 16899906; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2016899906; Country of ref document: EP; Effective date: 20181126 |