WO2023030153A1 - Data storage device and data processing method - Google Patents
Data storage device and data processing method
- Publication number
- WO2023030153A1 (PCT/CN2022/114735)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- address
- ndp
- data
- unit
- space
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0815—Cache consistency protocols
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1416—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights
- G06F12/1425—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights the protection being physical, e.g. cell, word, block
- G06F12/1441—Protection against unauthorised use of memory or access to memory by checking the object accessibility, e.g. type of access defined by the memory independently of subject rights the protection being physical, e.g. cell, word, block for a range
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/14—Protection against unauthorised use of memory or access to memory
- G06F12/1458—Protection against unauthorised use of memory or access to memory by checking the subject access rights
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7821—Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/0223—User address space allocation, e.g. contiguous or non contiguous base addressing
- G06F12/0292—User address space allocation, e.g. contiguous or non contiguous base addressing using tables or multilevel address translation means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0875—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with dedicated cache, e.g. instruction or stack
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
- G06F12/1036—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] for multiple virtual address spaces, e.g. segmentation
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1016—Performance improvement
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1028—Power efficiency
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1041—Resource optimization
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/10—Providing a specific technical effect
- G06F2212/1052—Security improvement
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2212/00—Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
- G06F2212/25—Using a specific main memory architecture
- G06F2212/254—Distributed memory
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the field of chip technology, and in particular to a data storage device and a data processing method.
- Near Data Processing is a technology that deploys computing units (such as Microprocessor Units (MPUs), etc.) near storage devices (such as memory) to perform related data operations.
- MPUs Microprocessor Units
- This technology integrates computing units with storage devices through high-bandwidth links, giving the storage devices some computing capability.
- CPU Central Processing Unit
- some computing tasks originally performed by the central processing unit (Central Processing Unit, CPU) can be offloaded to a storage device with computing power, which greatly reduces long-distance data transmission between the CPU and the storage device, thereby improving system performance and reducing energy consumption.
- the computing unit used to perform computing tasks can access the memory in a virtual address mode or a physical address mode, so as to obtain corresponding data to perform computing tasks.
- the computing unit needs to interact with the processor multiple times during the calculation process.
- the embodiments of the present application provide a data storage device and a data processing method, which can eliminate the hardware overhead caused by address translation and greatly improve the computing performance and energy efficiency of near-data computing.
- the present application provides a data storage device; the data storage device includes a memory and a first near-data computing (NDP) unit, the first NDP unit is electrically connected to the memory, and the data storage device is connected to the processor via a bus; wherein the first NDP unit is used to store first physical address information, which includes a base address and a first length; the base address and the first length point to a first address space in the memory, and the first address space is a section of contiguous memory space that the first NDP unit has the right to use; the memory is used to store, in the first address space, first data from the processor that is used for near data calculation; the first NDP unit is also used to acquire a first offset address, read part or all of the first data from the first address space based on the acquired first offset address and the first physical address information, and perform calculations based on the part or all of the first data that was read.
- the first NDP unit internally stores the first physical address information that points to the first address space, so that subsequent addressing in the memory can be performed directly on physical addresses, and the first data used for near-data calculation can be obtained from the contiguous first address space.
- this avoids the address translation process in the first NDP unit (that is, the conversion between virtual addresses and physical addresses), which can effectively reduce the hardware complexity of the first NDP unit.
- this application can significantly reduce the number of interactions between the NDP unit and the processor side, that is, significantly reduce the communication overhead and the volume of data transmitted on the bus, and thus greatly improve the computing performance and energy efficiency of near-data computing.
- the above-mentioned data storage device may include multiple NDP units, and the process of each NDP unit performing the calculation task is correspondingly the same as the process of the first NDP unit performing the calculation task.
- the above-mentioned data storage device may be a storage product with programmable processing capability, for example, a general-purpose memory module or a magnetic disk.
- the first NDP unit includes a first register unit and a near data computing core (NDP core); the first NDP unit is specifically configured to: store the first physical address information in the first register unit; and, through the NDP core, obtain the first physical address information from the first register unit and read part or all of the first data from the first address space based on the first physical address information and the first offset address.
- since the application stores the first physical address information pointing to the first address space in the register unit, the security of the memory access process can be guaranteed as long as the data in the register unit is not illegally modified.
- the first physical address information includes a first boundary address and a first length, where the first boundary address is the starting physical address or the ending physical address of the first address space and the first length is the length of the first address space; the NDP core is specifically used to: when the first offset address is less than or equal to the first length, calculate a first access address based on the first offset address and the first boundary address; and read part or all of the first data from the first access address in the first address space.
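The boundary-plus-length check described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the names `first_access_address`, `BASE`, `LENGTH`, and the byte-granular `bytearray` standing in for the memory are all assumptions, as is the choice to count offsets downward when the boundary is the ending address.

```python
def first_access_address(boundary: int, length: int, offset: int,
                         boundary_is_end: bool = False) -> int:
    """Compute an access address from a boundary address, a length and an offset."""
    # The text only allows the access when the offset does not exceed the length.
    if not 0 <= offset <= length:
        raise ValueError("offset address exceeds the first length")
    # Assumption: offsets count upward from a start boundary, downward from an end boundary.
    return boundary - offset if boundary_is_end else boundary + offset


memory = bytearray(range(64))   # stand-in for the memory of the storage device
BASE, LENGTH = 16, 8            # stand-in for the contents of the first register unit

addr = first_access_address(BASE, LENGTH, 3)
first_byte = memory[addr]       # data read from the first access address
```

No virtual-to-physical translation appears anywhere in the sketch: the only arithmetic is one comparison and one addition or subtraction on physical addresses.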
- the first physical address information includes a second boundary address and a third boundary address, which are respectively the starting physical address and the ending physical address of the first address space; the NDP core is specifically configured to: calculate the first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address; and, when the first access address is located between the second boundary address and the third boundary address, read part or all of the first data from the first access address.
- the first offset address is an offset relative to the starting physical address or the ending physical address of the first address space.
- the first physical address information may include at least two of the starting physical address of the first address space, the ending physical address of the first address space, and the length of the first address space (i.e., the first length).
- when the first NDP unit accesses the memory in the manner described in the above embodiment (that is, using the described calculation of the first access address), the access range of the NDP core is guaranteed to stay within the preset contiguous memory space (that is, the first address space) and not exceed it, which improves the security of the memory access process.
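As a hedged sketch of the two-boundary variant, the code below first derives the start and end addresses from any two of the three quantities named above and then rejects any access address outside that range. The function names are hypothetical, and the relation `end = start + length` (an inclusive end address) is an assumption chosen to match the "less than or equal to" comparison in the text.

```python
from typing import Optional, Tuple


def region_bounds(start: Optional[int] = None, end: Optional[int] = None,
                  length: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) from any two of start, end and length."""
    if sum(v is not None for v in (start, end, length)) < 2:
        raise ValueError("need at least two of start, end and length")
    if start is None:
        start = end - length      # assumption: end = start + length
    if end is None:
        end = start + length
    return start, end


def checked_address(start: int, end: int, offset: int,
                    from_end: bool = False) -> int:
    """Compute the access address and verify it lies between the two boundaries."""
    addr = end - offset if from_end else start + offset
    if not start <= addr <= end:
        raise ValueError("access address outside the address space")
    return addr


start, end = region_bounds(start=16, length=8)
addr = checked_address(start, end, offset=5)
```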
- the data storage device further includes a second NDP unit, and the second NDP unit is electrically connected to the memory; the second NDP unit is configured to store second physical address information, wherein the second physical address information is used to point to a second address space in the memory, and the second address space is a contiguous memory space that the second NDP unit has the right to use; the memory is used to store, in the second address space, second data from the processor for near data calculation; the first NDP unit is also used to acquire a second offset address, read part or all of the second data from the second address space based on the second offset address and the second physical address information, and perform calculation based on the part or all of the second data that was read.
- when the first NDP unit performs computing tasks, it can access not only its own first address space but also the address spaces corresponding to other NDP units (such as the second address space corresponding to the second NDP unit); that is, in near data calculation according to the embodiments of the present application, when the first NDP unit needs to exchange data with another NDP unit (such as the second NDP unit), it can obtain the data used for calculation directly from the second address space based on the stored second physical address information.
- the embodiments of the present application thus allow the first NDP unit to access address spaces other than the first address space, which gives good scalability.
- the memory is further configured to receive an instruction from the processor; the instruction instructs the memory to allocate the first address space for the first NDP unit and to allocate the second address space for the second NDP unit.
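The application does not say how the address spaces are laid out, only that each NDP unit receives its own contiguous space. One simple possibility, shown purely as an assumption, is to hand out regions back to back; `allocate_regions` and the unit names `ndp1` and `ndp2` are hypothetical.

```python
from typing import Dict, Tuple


def allocate_regions(memory_size: int,
                     requests: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Assign each NDP unit a contiguous, non-overlapping (base, length) region."""
    regions: Dict[str, Tuple[int, int]] = {}
    next_free = 0
    for unit, length in requests.items():
        if next_free + length > memory_size:
            raise MemoryError(f"not enough memory for {unit}")
        regions[unit] = (next_free, length)   # region starts where the last one ended
        next_free += length
    return regions


# The processor-side instruction is modelled as a mapping of unit name to requested length.
regions = allocate_regions(1024, {"ndp1": 256, "ndp2": 512})
```

Each resulting `(base, length)` pair is what the corresponding NDP unit would keep as its physical address information.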
- the first NDP unit further includes a second register unit and a cache unit; in reading part or all of the second data from the second address space based on the second offset address and the second physical address information, the first NDP unit is specifically configured to: when the cache unit caches the second physical address information, obtain the second physical address information from the cache unit through the NDP core and update it into the second register unit, and then, through the NDP core, read part or all of the second data from the second address space based on the obtained second offset address and the second physical address information; or, when the cache unit does not cache the second physical address information, obtain the second physical address information from the second NDP unit through the NDP core and update it into the second register unit and the cache unit, and then, through the NDP core, read part or all of the second data from the second address space based on the obtained second offset address and the second physical address information.
- in terms of physical implementation, the cache unit can be a cache close to the NDP core, for example a static random access memory (SRAM); logically, it can be understood as a data structure holding the cached content, for example in a form that stores the second physical base address and the second length.
- SRAM static random access memory
- this application caches the obtained second physical address information in a cache unit set in the first NDP unit, so that when the information is needed again later in the calculation, it can be obtained directly from the cache unit and used to calculate the corresponding access address, from which the data used for near data calculation is read. Because the application adds this cache mechanism to the near data calculation process, the delay of near data calculation can be effectively reduced, and calculation efficiency and energy efficiency can be improved.
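The hit and miss paths of the cache unit can be sketched as below. The class `NdpUnit`, its attribute names, and the `peer_requests` counter are illustrative assumptions; a dictionary stands in for the SRAM-backed cache unit, and reading `peer.first_register` stands in for requesting the information from the second NDP unit.

```python
from typing import Dict, Optional, Tuple

Region = Tuple[int, int]   # (base address, length)


class NdpUnit:
    def __init__(self, name: str, base: int, length: int) -> None:
        self.name = name
        self.first_register: Region = (base, length)    # own address space
        self.second_register: Optional[Region] = None   # address space of a peer unit
        self.cache: Dict[str, Region] = {}              # cache unit
        self.peer_requests = 0                          # fetches that had to go to a peer

    def load_peer_region(self, peer: "NdpUnit") -> Region:
        info = self.cache.get(peer.name)
        if info is None:
            # Cache miss: obtain the information from the peer, then fill the cache unit.
            info = peer.first_register
            self.peer_requests += 1
            self.cache[peer.name] = info
        # On both paths the second register unit is updated before the access.
        self.second_register = info
        return info


ndp1 = NdpUnit("ndp1", base=0, length=256)
ndp2 = NdpUnit("ndp2", base=256, length=512)
ndp1.load_peer_region(ndp2)   # miss: fetched from ndp2
ndp1.load_peer_region(ndp2)   # hit: served from the cache unit
```

Only the first call reaches the peer unit, which is the latency saving the text attributes to the cache mechanism.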
- the second physical address information includes a fourth boundary address and a second length, where the fourth boundary address is the starting physical address or the ending physical address of the second address space and the second length is the length of the second address space; in reading part or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: read the fourth boundary address and the second length from the second register unit; when the second offset address is less than or equal to the second length, calculate a second access address based on the second offset address and the fourth boundary address; and read part or all of the second data from the second access address in the second address space.
- the second physical address information includes a fifth boundary address and a sixth boundary address, which are respectively the starting physical address and the ending physical address of the second address space; in reading part or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically used to: calculate the second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address; and, when the second access address is located between the fifth boundary address and the sixth boundary address, read part or all of the second data from the second access address.
- the second offset address is an offset relative to the starting physical address or the ending physical address of the second address space.
- when the first NDP unit accesses the second address space in the memory in the manner described in the above embodiment, the access range is guaranteed to stay inside the second address space and not exceed it, which ensures the security of the memory access process.
- because the NDP unit in the data storage device is not aware of the first data cached in the processor for near data calculation, when this application has no other mechanism to ensure cache consistency, the processor needs to clear its cached data belonging to the first address space in order to avoid cache coherency problems.
- each of the first register unit and the second register unit includes at least one register.
- the above-mentioned register unit can be a high-speed storage component with data storage and data read/write functions, whose read/write speed is much higher than that of external storage devices such as hard disks and USB flash drives (external storage devices are storage devices other than the memory and processor cache of the computing device).
- the number of registers in each register unit must be sufficient to store the address information.
- the first physical address information may be stored together in the same register, or stored in two registers respectively.
- the second physical address information may also be stored together in the same register, or stored in two registers respectively, which is not limited in the present application.
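To illustrate the "same register" option, the sketch below packs a base address and a length into one value and unpacks them again. The 32-bit field width, the resulting 64-bit register, and the function names are assumptions made for the example; the application does not specify register widths or layouts.

```python
from typing import Tuple

FIELD_BITS = 32                      # assumed width of each field
FIELD_MASK = (1 << FIELD_BITS) - 1


def pack_register(base: int, length: int) -> int:
    """Store base in the upper half and length in the lower half of one register."""
    if not (0 <= base <= FIELD_MASK and 0 <= length <= FIELD_MASK):
        raise ValueError("field does not fit in the assumed 32 bits")
    return (base << FIELD_BITS) | length


def unpack_register(value: int) -> Tuple[int, int]:
    """Recover (base, length) from a packed register value."""
    return value >> FIELD_BITS, value & FIELD_MASK


packed = pack_register(0x1000, 0x200)
base, length = unpack_register(packed)
```

With two registers, the same information would simply be held as two separate values and no packing step would be needed.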
- the first NDP unit is further configured to send a second signal to the processor through the bus after completing the near data calculation; the second signal is used to indicate that the first NDP unit has completed the near data calculation.
- when the first NDP unit completes the near data calculation, it can inform the processor side of its progress, so that the processor side can keep track of the completion status of the near data calculation tasks of each NDP unit.
- the present application provides a data processing method applied to a data storage device; the data storage device includes a memory and a first near-data computing (NDP) unit, the first NDP unit is electrically connected to the memory, and the data storage device is connected to the processor through a bus; the method includes: storing first physical address information through the first NDP unit, wherein the first physical address information is used to point to a first address space in the memory, and the first address space is a contiguous memory space that the first NDP unit has the right to use; storing, by the memory in the first address space, first data from the processor that is used for near data calculation; reading, by the first NDP unit, part or all of the first data from the first address space based on an obtained first offset address and the first physical address information; and performing calculations based on the part or all of the first data.
- the first NDP unit includes a first register unit and a near data computing core (NDP core); storing the first physical address information through the first NDP unit includes: storing the first physical base address and the first length through the first register unit; reading, by the first NDP unit, part or all of the first data from the first address space based on the obtained first offset address and the first physical address information includes: obtaining, by the NDP core, the first physical address information from the first register unit, and reading part or all of the first data from the first address space based on the first physical address information and the first offset address.
- the first physical address information includes a first boundary address and a first length, where the first boundary address is the starting physical address or the ending physical address of the first address space and the first length is the length of the first address space; reading part or all of the first data from the first address space based on the first physical address information and the first offset address includes: when the first offset address is less than or equal to the first length, calculating, by the NDP core, a first access address based on the first offset address and the first boundary address; and reading, by the NDP core, part or all of the first data from the first access address in the first address space.
- the first physical address information includes a second boundary address and a third boundary address, which are respectively the starting physical address and the ending physical address of the first address space; reading part or all of the first data from the first address space includes: calculating, by the NDP core, the first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address; and, when the first access address is between the second boundary address and the third boundary address, reading, by the NDP core, part or all of the first data from the first access address.
- the data storage device further includes a second NDP unit, and the second NDP unit is electrically connected to the memory; the method further includes: storing second physical address information through the second NDP unit, wherein the second physical address information is used to point to a second address space in the memory, and the second address space is a contiguous memory space that the second NDP unit has the right to use; storing, by the memory in the second address space, second data from the processor that is used for near data calculation; reading, by the first NDP unit, part or all of the second data from the second address space based on an obtained second offset address and the second physical address information; and performing calculations based on the part or all of the second data.
- the method further includes: receiving, by the memory, an instruction of the processor, the instruction instructing the memory to allocate the first address space for the first NDP unit, and instructing the memory to allocate the second address space for the second NDP unit.
- the first NDP unit further includes a second register unit and a cache unit; reading part or all of the second data from the second address space based on the obtained second offset address and the second physical address information includes: when the cache unit caches the second physical address information, obtaining the second physical address information from the cache unit through the NDP core and updating it into the second register unit, and reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information; or, when the cache unit does not cache the second physical address information, obtaining the second physical address information from the second NDP unit through the NDP core and updating it into the second register unit and the cache unit, and reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information.
- the second physical address information includes a fourth boundary address and a second length, where the fourth boundary address is the starting physical address or the ending physical address of the second address space and the second length is the length of the second address space; reading, by the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information includes: reading the fourth boundary address and the second length from the second register unit through the NDP core; when the second offset address is less than or equal to the second length, calculating, by the NDP core, a second access address based on the second offset address and the fourth boundary address; and reading part or all of the second data from the second access address in the second address space.
- the second physical address information includes a fifth boundary address and a sixth boundary address, which are respectively the starting physical address and the ending physical address of the second address space; reading, by the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information includes: calculating, by the NDP core, the second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address; and, when the second access address is located between the fifth boundary address and the sixth boundary address, reading, by the NDP core, part or all of the second data from the second access address.
- each of the first register unit and the second register unit includes at least one register.
- the method further includes: after the first NDP unit completes the near data calculation, sending, by the first NDP unit, a signal to the processor through the bus; the signal is used to indicate that the first NDP unit has completed the near data calculation.
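The method steps above can be sketched end to end as follows. Everything here is a simplified model under stated assumptions: the classes `Memory` and `FirstNdpUnit` are hypothetical, summing bytes stands in for an unspecified near data calculation, and a boolean flag stands in for the completion signal sent over the bus.

```python
class Memory:
    """Byte-addressable stand-in for the memory of the data storage device."""

    def __init__(self, size: int) -> None:
        self.cells = bytearray(size)

    def write(self, addr: int, data: bytes) -> None:
        self.cells[addr:addr + len(data)] = data

    def read(self, addr: int, count: int) -> bytes:
        return bytes(self.cells[addr:addr + count])


class FirstNdpUnit:
    def __init__(self, memory: Memory, base: int, length: int) -> None:
        self.memory = memory
        self.base, self.length = base, length   # first physical address information
        self.done = False                       # stand-in for the completion signal

    def run(self, offset: int, count: int) -> int:
        # Stay inside the first address space before touching the memory.
        if offset < 0 or count < 0 or offset + count > self.length:
            raise ValueError("access leaves the first address space")
        data = self.memory.read(self.base + offset, count)
        result = sum(data)                      # placeholder near data calculation
        self.done = True                        # would be a signal on the bus
        return result


memory = Memory(64)
memory.write(16, bytes([1, 2, 3, 4]))           # processor places the first data
unit = FirstNdpUnit(memory, base=16, length=8)
total = unit.run(offset=1, count=3)             # reads bytes 2, 3 and 4
```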
- an embodiment of the present application provides a data processing device, including a processor, the data storage device provided in any one of the implementation manners in the first aspect above, and a discrete device coupled to the data storage device.
- an embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, and when the computer program is executed, the data processing method described in any one of the above second aspects is realized.
- an embodiment of the present application provides a computer program, the computer program includes instructions, and when the computer program is executed, the data processing method described in any one of the above-mentioned second aspects is implemented.
- FIG. 1 is a schematic structural diagram of a system architecture for near-data computing in an embodiment of the present application
- Fig. 2 is a schematic structural diagram of a data processing device in an embodiment of the present application.
- Fig. 3 is a schematic structural diagram of another data processing device in the embodiment of the present application.
- Fig. 4 is a schematic diagram of calculation logic of a memory access address in the embodiment of the present application.
- Fig. 5 is a schematic structural diagram of another data processing device in the embodiment of the present application.
- FIG. 6 is a schematic diagram of a hardware structure of a data storage device in an embodiment of the present application.
- FIG. 7 is a schematic diagram of a spatial layout of a continuous memory space in an embodiment of the present application.
- Fig. 8 is a schematic flowchart of a data processing method in the embodiment of the present application.
- NDP Near-data Processing
- Cache consistency (cache coherence): in a computer system that uses a hierarchical storage system, a mechanism that ensures the data in the cache memory is the same as the data in the main memory. In a system where many different devices share a common memory resource, inconsistent data in the caches can cause problems. If some shared data exists in the caches of different devices at the same time, the consistency of that data must also be ensured.
- Address translation refers to the conversion process between virtual address and physical address.
- a virtual address usually refers to an address provided by a program, while a physical address refers to an effective memory address.
- the set of all virtual addresses is called the virtual address space (Virtual Address Space), and the set of all physical addresses is called the physical address space (Physical Address Space).
- the process of address translation can be simply understood as querying, through the page table, the physical address corresponding to a virtual address; the translation lookaside buffer (Translation Lookaside Buffer, TLB) is a hardware structure that caches part of the page table to speed up this query.
- Cache: a level in the computer memory system, located between the main memory and the processor, that is added to bridge the difference in processing speed between the two. Compared with the main memory, the cache is faster to access but has a smaller capacity. A cache can usually also be divided into multiple levels: the closer a level is to the CPU, the smaller its capacity and the faster its access speed.
- TSV Through Silicon Via
- Stack A special space in memory used to save local variables, function call parameters, etc., with the characteristics of last-in-first-out, generally growing from high address to low address.
- Energy efficiency ratio: the ratio of performance to power consumption, which usually indicates the performance level of a processor under a specific power consumption. The higher the value, the more calculations the processor can perform at a fixed power consumption.
- Continuous memory space refers to a physically continuous storage space (in the memory), which corresponds to a continuous physical address.
- Offset address refers to the offset relative to the starting physical address or ending physical address of the corresponding address space.
- FIG. 1 is a schematic structural diagram of a system architecture for near data computing in an embodiment of the present application.
- the system architecture 100 may include a central processing unit 110 (Central Processing Unit, CPU), a graphics processing unit 120 (Graphics Processing Unit, GPU), a digital signal processor 130 (Digital Signal Processor, DSP), M data storage devices 150, and a bus 160, where M is a positive integer.
- Each data storage device may include a memory and one or more near data computing NDP units.
- For example, the data storage device 1 includes a memory 1 and N NDP units, and the data storage device M includes a memory M and K NDP units; the memory and the NDP units are directly connected through a physical link, and N and K are positive integers.
- system architecture 100 may further include a memory controller (Memory Controller, MC, not shown in FIG. 1 ), which is used to control data read and write operations in the data storage device.
- the data storage device may be a storage product with programmable processing capabilities, for example a general-purpose memory module or a disk.
- the memory included in the data storage device may be any one of random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM).
- Random access memory RAM includes static random access memory (Static Random-Access Memory, SRAM) and dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
- Non-volatile memory NVM includes read-only memory (Read-only Memory, ROM), flash memory (Flash Memory), etc.
- the above-mentioned system architecture 100 can be included in any terminal device for performing near-data computing, and the terminal device can be a mobile phone, a computer, a tablet, a wearable device, or a vehicle-mounted terminal.
- the system architecture 100 can be applied to any scenario requiring near-data computing, such as general computing, high-performance computing, or artificial intelligence, which is not limited in this application.
- FIG. 2 is a schematic structural diagram of a data processing device in an embodiment of the present application.
- the data processing device 200 is a data processing device obtained based on the near-data computing system architecture 100 in FIG. 1 , and includes: a data storage device 210 , a processor 220 , and a bus 230 .
- the data storage device 210 is connected to the processor 220 through a bus 230 .
- the data storage device 210 includes a memory 211 and a first NDP unit 212 , and the memory 211 is electrically connected to the first NDP unit 212 .
- the electrical connection may refer to a direct connection between the memory 211 and the first NDP unit 212 through a physical circuit such as copper foil or a wire that can transmit electrical signals, that is, no other devices are included between them.
- the data storage device 210 may be any one of the M data storage devices in FIG. 1 .
- the data processing apparatus 200 may be any terminal device for performing near-data calculation, such as a mobile phone, a computer, a tablet, a wearable device, or a vehicle-mounted terminal.
- Processor 220 may be a central processing unit (CPU) or other processing core.
- the processor 220 may also be a heterogeneous processor, that is, a different type of processor, and the specific implementation solution of the processor is not described in this embodiment.
- the first NDP unit 212 is used to store the first address information; wherein the first address information is used to point to the first address space in the memory, and the first address space is a section of continuous memory space allocated by the processor for the first NDP unit 212.
- the processor 220 first performs initialization to allocate a continuous memory space in the memory 211 for the first NDP unit 212 , that is, the first address space.
- the first address space may be a set of a series of continuous physical addresses.
- After the processor 220 allocates the first address space to the first NDP unit 212, the first NDP unit 212 stores first physical address information representing the specific location of the first address space in the memory.
- the data processing apparatus 200 may execute a corresponding initialization process (also called a near-data computing task distribution process) by running a piece of initialization code on the processor 220, so as to allocate a section of continuous memory space in the memory 211 for the first NDP unit 212.
- the first NDP unit may be a central processing unit CPU or a microcontroller unit (Microcontroller Unit, MCU), etc., which is not limited in this application.
- the memory 211 is configured to store, in the first address space, the first data sent by the processor for near data calculation.
- the data processing apparatus 200 may implement writing the first data used for near data calculation into the first address space in the memory 211 by running the above initialization code on the processor 220 . Specifically, after the processor 220 allocates the corresponding first address space for the first NDP unit 212 , the first data may be written into the first address space through the bus 230 .
- the continuous memory space corresponding to the first address space in the memory 211 can be logically divided into a data block and a code block, which are used to store data and code respectively; that is, the first data used for near-data calculation includes both data and code. It should be noted that in a paged memory management system, the size of the continuous memory space allocated by the processor 220 is not limited by the page size.
- the memory 211 may be any one of random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM).
- Random access memory RAM includes static random access memory (Static Random-Access Memory, SRAM) and dynamic random access memory (Dynamic Random Access Memory, DRAM), etc.
- Non-volatile memory NVM includes read-only memory (Read-only Memory, ROM), flash memory (Flash Memory), etc.
- the first NDP unit 212 is further configured to read part or all of the first data from the first address space based on the obtained first offset address and the first physical address information, and to perform calculation based on that part or all of the first data.
- the calculation performed by the first NDP unit 212 may include one or more rounds of calculation. During each round, the first NDP unit needs to read the corresponding data from the memory 211. In any one round, the first NDP unit may first obtain the first offset address (also referred to as the first offset), then read part or all of the first data from the first address space based on the first offset address and the first physical address information, and then perform computation based on the data it has read.
- the first NDP unit 212 may obtain the first offset address from the code block in the first address space (for example, from the operand of a load or store instruction in the code block), from the program counter (Program Counter, PC) register in the first NDP unit, or from the data read from the first address space during the previous round of calculation; it then reads part or all of the first data from the first address space based on the first physical address information and the first offset address.
- the first NDP unit internally stores the first physical address information used to point to the first address space, so that subsequent addressing in the memory can be performed based on the physical address, and the first data used for near data calculation can be obtained from the continuous first address space.
- this application can significantly reduce the number of interactions between the NDP unit and the processor side, that is, significantly reduce the communication overhead and the data transmission volume on the bus, and thus greatly improve the computing performance and the energy efficiency ratio of near-data computing.
- FIG. 3 is a schematic structural diagram of another data processing device in the embodiment of the present application, as a refinement of the first NDP unit 212 in the data processing device 200 in FIG. 2 .
- the first NDP unit 212 may include a near data computing core NDP core 2121 and a first register unit 2122.
- the first NDP unit 212 is specifically configured to: store the first physical address information through the first register unit 2122; obtain the first physical address information from the first register unit through the NDP core 2121; and read part or all of the first data from the first address space based on the first physical address information and the first offset address.
- the process of the first NDP unit 212 reading data from the first address space based on the first physical address information and the first offset address is implemented by the NDP core 2121.
- Since this application stores the first physical address information pointing to the first address space in the register unit, the security of the memory access process can be guaranteed as long as the data in the register unit is not illegally modified.
- the first physical address information includes a first boundary address and a first length; the NDP core is specifically used to: when the first offset address is less than or equal to the first length, calculate the first access address based on the first offset address and the first boundary address, wherein the first boundary address is the starting physical address of the first address space or the ending physical address of the first address space, and the first length is the length of the first address space; and read part or all of the first data from the first access address in the first address space.
- the first offset address is an offset relative to the starting physical address or the ending physical address of the first address space.
- the first physical address information may include at least two of the starting physical address of the first address space, the ending physical address of the first address space, and the length of the first address space (ie, the first length).
- NDP Core 2121 first compares the first offset address with the first length; when the first offset address is less than or equal to the first length, it calculates the first access address based on the first offset address and the first boundary address.
- the process of calculating the first access address based on the first offset address and the first boundary address includes four situations:
- When the first offset address is the offset relative to the starting physical address of the first address space and the first boundary address is the starting physical address of the first address space, NDP Core 2121 adds the first boundary address and the first offset address to obtain the first access address.
- When the first offset address is the offset relative to the starting physical address of the first address space and the first boundary address is the ending physical address of the first address space, NDP Core 2121 first subtracts the first length from the first boundary address and then adds the first offset address to obtain the first access address.
- When the first offset address is the offset relative to the ending physical address of the first address space and the first boundary address is the starting physical address of the first address space, NDP Core 2121 first adds the first length to the first boundary address and then subtracts the first offset address to obtain the first access address.
- When the first offset address is the offset relative to the ending physical address of the first address space and the first boundary address is the ending physical address of the first address space, NDP Core 2121 subtracts the first offset address from the first boundary address to obtain the first access address.
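- The four cases above can be sketched in software as follows. This is only an illustrative model, not the hardware logic of the application: the names `Ref`, `AddressException` and `first_access_address` are hypothetical, and the sketch assumes that the ending physical address equals the starting physical address plus the first length.

```python
from enum import Enum


class Ref(Enum):
    """Which end of the address space a value refers to (hypothetical helper)."""
    START = "start"
    END = "end"


class AddressException(Exception):
    """Stands in for the abnormal-address signal (hypothetical name)."""


def first_access_address(offset, boundary, length, offset_ref, boundary_ref):
    """Compute the first access address from an offset, a boundary and a length.

    Assumption: end address == start address + length.
    """
    # The offset is compared with the length before any address is computed.
    if offset < 0 or offset > length:
        raise AddressException("offset exceeds the first length")
    if offset_ref is Ref.START and boundary_ref is Ref.START:
        return boundary + offset            # case 1
    if offset_ref is Ref.START and boundary_ref is Ref.END:
        return boundary - length + offset   # case 2
    if offset_ref is Ref.END and boundary_ref is Ref.START:
        return boundary + length - offset   # case 3
    return boundary - offset                # case 4
```

Whichever boundary is stored, an offset measured from the same end of the address space yields the same access address, which is why storing any two of the start address, end address and length is sufficient.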
- the first physical address information includes a second boundary address and a third boundary address; the NDP core is specifically configured to: calculate the first access address based on the first offset address and the second boundary address, or calculate the first access address based on the first offset address and the third boundary address, wherein the second boundary address and the third boundary address are respectively the starting physical address of the first address space and the ending physical address of the first address space; and, when the first access address is located between the second boundary address and the third boundary address, read part or all of the first data from the first access address.
- NDP Core 2121 can first calculate the first access address based on the second boundary address, the third boundary address and the first offset address, then judge whether the calculated first access address falls in the first address space, and, when it does, read part or all of the first data from the first access address.
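- Calculating the first access address based on the second boundary address, the third boundary address and the first offset address includes two cases:
- When the first offset address is the offset relative to the starting physical address of the first address space, NDP Core 2121 adds the first offset address and the second boundary address to obtain the first access address.
- When the first offset address is the offset relative to the ending physical address of the first address space, NDP Core 2121 subtracts the first offset address from the third boundary address to obtain the first access address.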
- the first access address calculated with the above calculation logic always falls within the first address space; that is, the access of the first NDP unit 212 to the memory 211 will not exceed the continuous memory space (i.e., the first address space) previously allocated by the processor to the first NDP unit, which improves the security of the access process of the first NDP unit 212 to the memory 211.
- When the first offset address is greater than the first length, or the calculated first access address does not fall within the first address space, the first NDP unit may output a signal indicating that the access address is abnormal, in order to ensure the security of the accessed memory 211.
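- The two-boundary variant and its range check can be sketched as follows. The function and exception names are hypothetical, and treating both boundary addresses as inclusive is an assumption made for illustration; the text only says the access address must lie between them.

```python
class AddressException(Exception):
    """Stands in for the abnormal-address signal (hypothetical name)."""


def access_address_from_bounds(offset, start, end, relative_to_start=True):
    """Compute an access address from two boundary addresses, then range-check it.

    Assumption: `start` and `end` are both treated as valid (inclusive) addresses.
    """
    if relative_to_start:
        address = start + offset   # offset measured from the starting address
    else:
        address = end - offset     # offset measured from the ending address
    # The address is computed first and only then checked against the bounds.
    if not (start <= address <= end):
        raise AddressException(f"address {address:#x} is outside the address space")
    return address
```

Unlike the boundary-plus-length variant, no length needs to be stored here: the check is made on the computed address itself rather than on the offset.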
- FIG. 4 is a schematic diagram of calculation logic of a memory access address in an embodiment of the present application.
- the calculation process can be implemented by a hardware module in the first NDP unit.
- the hardware modules involved in calculating the memory access address can include: PC register 410, first register 420, second register 430, adder 440, comparator 450 and access unit (Load Store Unit, LSU) 460.
- the calculation process of the first access address will be described in detail below with reference to the hardware logic shown in FIG. 4 .
- the hardware modules in FIG. 4 may be part of the hardware modules included in the first NDP unit 212 .
- the PC register 410 is used to store the first offset address.
- the first register unit 2122 includes a first register 420 and a second register 430, and the first register 420 and the second register 430 can be used to respectively store two kinds of information in the first physical address information (the first boundary address and the first length, or second boundary address and third boundary address).
- the comparator 450 obtains the first offset address and the first length from the PC register 410 and the second register 430 respectively, and compares the first offset address with the first length.
- the adder 440 respectively obtains the first offset address and the first boundary address from the PC register and the first register 420, and calculates the first access address based on one of the four corresponding calculation methods in the above-mentioned embodiments.
- the access unit 460 determines, based on the comparison result of the comparator, whether to read the data used for calculation from the first access address in the memory 211; specifically, when the first offset address is less than or equal to the first length, the access unit LSU reads data from the first access address in the first address space; when the first offset address is greater than the first length, the access unit LSU generates an address exception signal.
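- As a rough behavioural model of the logic in FIG. 4, the sketch below maps each hardware module to one method. It is an assumption-laden illustration: the class name, the use of a dictionary for the memory 211, and the returned status strings are all hypothetical, and only the case where the first boundary address is the starting physical address is modelled.

```python
from dataclasses import dataclass


@dataclass
class NdpAddressLogic:
    pc_register: int      # first offset address (PC register 410)
    first_register: int   # first boundary address, assumed to be the start address (first register 420)
    second_register: int  # first length (second register 430)

    def comparator(self) -> bool:
        """Comparator 450: is the offset less than or equal to the length?"""
        return self.pc_register <= self.second_register

    def adder(self) -> int:
        """Adder 440: boundary address plus offset gives the access address."""
        return self.first_register + self.pc_register

    def load_store_unit(self, memory: dict):
        """Access unit (LSU) 460: read on success, signal an exception otherwise."""
        if not self.comparator():
            return ("address_exception", None)
        return ("ok", memory.get(self.adder()))
```

In hardware the comparator and the adder can work in parallel; the sequential calls here are only a convenience of the software model.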
- FIG. 5 is a schematic structural diagram of another data processing device in the embodiment of the present application, as a refinement of some modules in the data processing device 200 in FIG. 2 or FIG. 3 .
- the data storage device 210 may include E NDP units (i.e., the first NDP unit 212, the second NDP unit 213, ..., the Eth NDP unit 214), where E is an integer greater than 2.
- the E NDP units respectively correspond to the E segments of continuous memory space in the memory 211.
- For example, the first address space 2112 is a segment of continuous memory space allocated by the processor 220 for the first NDP unit 212, the second address space 2113 is a segment of continuous memory space allocated by the processor 220 for the second NDP unit 213, ..., and the Eth address space 2114 is a segment of continuous memory space allocated by the processor 220 for the Eth NDP unit.
- the aforementioned E address spaces may be allocated once by the processor 220 during initialization.
- the E NDP units are connected to one another through an interconnection network (Crossbar Network) 214.
- the E NDP units are respectively directly connected to the memory 211 through physical links.
- FIG. 5 only shows the specific internal structure of the first NDP unit 212; the internal structure of the other E-1 NDP units may be the same as or different from that of the first NDP unit 212, which is not limited in this application.
- the first NDP unit 212 in this embodiment of the present application may be any one of the E NDP units included in the data storage device 210 .
- the second NDP unit is configured to store second physical address information, wherein the second physical address information is used to point to a second address space in the memory, and the second address space is a section of continuous memory space that the second NDP unit has the right to use; the memory is used to store, in the second address space, the second data from the processor that is used for near data calculation; the first NDP unit is further configured to read part or all of the second data from the second address space based on the acquired second offset address and the second physical address information, and to perform calculation based on that part or all of the second data.
- the processor 220 allocates a continuous memory space in the memory 211 for the second NDP unit 213, namely the second address space 2113 .
- the second address space 2113 is characterized by a second physical base address and a second length, that is, the starting physical address of the second address space 2113 is the second physical base address, and the length of the second address space 2113 is the second length.
- the second address space 2113 is a set of continuous physical addresses.
- In the process of distributing near-data calculation tasks, after allocating the second address space 2113 to the second NDP unit 213, the processor 220 is also used to write the second data used for near data calculation into the second address space 2113 through the bus 230.
- the manner in which the first NDP unit 212 acquires the second offset address may be correspondingly the same as the manner in which the first offset address is acquired, and details are not repeated here.
- the above-mentioned process in which the first NDP unit 212 reads part or all of the second data from the second address space for calculation may be one of one or more rounds of calculation tasks performed by the first NDP unit 212 .
- the first NDP unit can either read the corresponding data from the first address space 2112 allocated to it by the processor 220 for calculation, or read the corresponding data from the continuous memory space allocated by the processor 220 for another NDP unit for calculation. That is, two NDP units that need to exchange data only need to perform point-to-point communication and do not need to perform global synchronization. Therefore, in the embodiment of the present application, each NDP unit has good scalability in accessing remote address spaces in the memory.
- this embodiment of the present application only uses the first NDP unit 212 as an object to describe the specific calculation process of the NDP unit.
- the calculation process performed by each NDP unit is correspondingly the same as the calculation process performed by the first NDP unit 212 in the embodiment of the present application, and will not be repeated here.
- the memory is further configured to receive an instruction from the processor, where the instruction instructs the memory to allocate the first address space for the first NDP unit and to allocate the second address space for the second NDP unit.
- The first address space and the second address space are continuous memory spaces allocated by the processor for the first NDP unit and the second NDP unit respectively.
- the first NDP unit further includes a second register unit and a cache unit; in terms of reading part or all of the second data from the second address space based on the obtained second offset address and the second physical address information, the first NDP unit is specifically configured to: when the cache unit caches the second physical address information, obtain the second physical address information from the cache unit through the NDP core, update it into the second register unit, and read, through the NDP core, part or all of the second data from the second address space based on the obtained second offset address and the second physical address information; or, when the cache unit does not cache the second physical address information, acquire the second physical address information from the second NDP unit through the NDP core, update it into the second register unit and the cache unit, and read, through the NDP core, part or all of the second data from the second address space based on the obtained second offset address and the second physical address information.
- the cache unit 2124 is configured to cache the second physical address information used to point to the second address space 2113 during the calculation process of the first NDP unit 212 .
- the second NDP unit 213 may be any other NDP unit except the first NDP unit 212, that is, the second address space 2113 may be any continuous memory space except the first address space 2112, and the second The address space 2113 is a continuous memory space allocated by the processor 220 for the second NDP unit 213 .
- the cache unit 2124 may be random access memory RAM or any one of other feasible memories in hardware.
- the random access memory RAM may include static random-access memory (Static Random-Access Memory, SRAM) and dynamic random-access memory (Dynamic Random-Access Memory, DRAM).
- the cache unit 2124 can logically be a table for storing the second physical address and the second length.
- Reading part or all of the second data from the second address space through the NDP core based on the obtained second offset address and second physical address information further includes: the NDP core obtains the second physical address information from the second register unit 2123, and obtains part or all of the second data from the second address space 2113 based on the second physical address information and the second offset address, so as to perform the next round of calculation.
- this application caches the obtained second physical address information in a cache unit provided in the first NDP unit, so that when the information is needed again in a subsequent calculation, it can be obtained directly from the cache unit and used to calculate the corresponding access address, and the data used for calculation can then be read based on the calculated access address. Because this cache mechanism is added to the near data calculation process, the delay of near data calculation can be effectively reduced, and the calculation efficiency and the energy efficiency ratio can be improved.
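- The cache-hit and cache-miss paths described above can be sketched as follows. This is a simplified software model under stated assumptions: the class and method names are hypothetical, the cache unit is modelled as a dictionary keyed by an NDP unit identifier, the physical address information is modelled as a (starting physical address, length) pair, and `fetch_remote_info` stands in for the request sent to the second NDP unit.

```python
class FirstNdpUnitModel:
    """Toy model of the cache unit and second register unit (hypothetical names)."""

    def __init__(self, fetch_remote_info):
        self.cache_unit = {}               # unit id -> (start address, length)
        self.second_register_unit = None   # currently loaded address information
        self.fetch_remote_info = fetch_remote_info
        self.remote_fetches = 0            # counts requests sent to other NDP units

    def load_address_info(self, unit_id):
        info = self.cache_unit.get(unit_id)
        if info is None:
            # Miss: ask the owning NDP unit, then fill the cache unit.
            info = self.fetch_remote_info(unit_id)
            self.remote_fetches += 1
            self.cache_unit[unit_id] = info
        # Hit or miss, the register unit is updated before the access.
        self.second_register_unit = info
        return info

    def remote_access_address(self, unit_id, offset):
        start, length = self.load_address_info(unit_id)
        if offset < 0 or offset > length:
            raise ValueError("offset exceeds the second length")
        return start + offset
```

In this model only the first access to a given remote address space pays the cost of contacting its owner; later accesses are served from the cache unit.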
- the second physical address information includes a fourth boundary address and a second length; in terms of reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: read the fourth boundary address and the second length from the second register unit, wherein the fourth boundary address is the starting physical address of the second address space or the ending physical address of the second address space, and the second length is the length of the second address space; when the second offset address is less than or equal to the second length, calculate a second access address based on the second offset address and the fourth boundary address; and read part or all of the second data from the second access address in the second address space.
- the second physical address information includes a fifth boundary address and a sixth boundary address; in terms of reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically used to: calculate the second access address based on the second offset address and the fifth boundary address, or calculate the second access address based on the second offset address and the sixth boundary address, wherein the fifth boundary address and the sixth boundary address are respectively the starting physical address of the second address space and the ending physical address of the second address space; and, when the second access address is located between the fifth boundary address and the sixth boundary address, read part or all of the second data from the second access address.
- the second offset address is an offset relative to the starting physical address or the ending physical address of the second address space.
- When the first NDP unit in the above embodiment accesses the second address space in the memory, the access range is guaranteed to stay inside the second address space and not exceed it, thereby ensuring the security of the memory access process.
- Because the NDP unit in the data storage device is not aware of the first data for near data calculation that is cached in the processor, when no other mechanism ensures cache consistency, the processor needs to clear the data of the first address space from its cache in order to avoid cache consistency problems.
- the processor 220 may clear the data of the first address space cached in the processor 220 before the first NDP unit 212 obtains the data used for calculation from the first address space based on the first access address.
- Since the first NDP unit 212 does not perceive the cache structure of other components, before the first NDP unit 212 starts calculation, the processor 220 needs to be instructed to clear the data of the first address space that it caches, and the data of the first address space cached by the processor 220 is no longer accessed during subsequent calculations by the first NDP unit 212, thereby avoiding cache consistency problems.
- each of the first register unit and the second register unit includes at least one register.
- the above-mentioned register unit can be a high-speed storage device with data storage and data read and write functions, whose data read and write speed is much higher than that of external storage devices such as hard disks and USB flash drives (external storage devices refer to storage devices other than the memory of the computing device and the processor cache).
- the register unit can be implemented by registers with high read and write speeds.
- the number of registers in each register unit must meet the requirements for storing the address information.
- the first physical address information may be stored in the one register.
- the first register unit 2122 includes one register and the first physical address information includes two types of information
- the first physical address information may be stored in the one register.
- the two registers can be used to respectively store two types of information in the first physical address information (the first boundary address and the first length, or the second boundary address and the third boundary address ).
- the second register unit may also store the second physical address information in the same manner as the first register unit, which will not be repeated here.
- this application uses hardware registers to store the first physical address information pointing to the first address space 2112 in the memory 211. Because information in hardware registers is difficult to tamper with, and because the first access address is calculated by hardware logic, the calculation of the first access address is highly secure. In addition, with the calculation logic for the first access address in this application, the access range of the first NDP unit 212 in the memory 211 does not exceed the continuous memory space pointed to by the first physical address information, which ensures that the memory 211 is accessed securely. Similarly, security is also guaranteed when the first NDP unit 212 accesses other address spaces in the memory 211.
- the present application can also control the access rights of different blocks in the first address space by adding simple hardware logic, so as to enhance security.
- a memory protection unit (Memory Protection Unit, MPU)
- different access permissions can also be set for data blocks and code blocks in other address spaces, which will not be repeated here.
- the first NDP unit is further configured to send a signal to the processor through the bus after completing the near data calculation; the signal is used to indicate that the first NDP unit has completed the near data calculation.
- the first NDP unit 212 may send a signal to the processor 220 through the bus 230 .
- the processor 220 may maintain a counter, increase the value of the counter by 1 after receiving a signal sent by an NDP unit, and store the identifier of the NDP unit accordingly;
- the identifier may be the serial number of the NDP unit.
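The completion bookkeeping described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `CompletionTracker`, the `expected_units` field, and the `all_done` rule are assumptions introduced here; only the counter increment and the stored identifier come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompletionTracker:
    """Processor-side record of NDP completion signals (illustrative only)."""

    expected_units: int      # assumed: number of NDP units that were given tasks
    counter: int = 0         # incremented once per received signal
    finished_ids: List[int] = field(default_factory=list)

    def on_signal(self, ndp_id: int) -> None:
        # Increase the counter by 1 and store the identifier of the signalling unit.
        self.counter += 1
        self.finished_ids.append(ndp_id)

    def all_done(self) -> bool:
        # Assumed completion rule: one signal per unit that was given a task.
        return self.counter == self.expected_units


tracker = CompletionTracker(expected_units=2)
tracker.on_signal(0)   # serial number of the first NDP unit to finish
tracker.on_signal(1)
print(tracker.counter, tracker.finished_ids, tracker.all_done())
```

Storing the identifiers alongside the counter lets the processor tell which units have finished, not only how many.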
- FIG. 6 is a schematic diagram of a hardware structure of a data storage device in an embodiment of the present application.
- the data storage device 600 is a storage device based on a hybrid memory cube (Hybrid Memory Cube, HMC).
- the data storage device 600 can be applied to the data processing device in FIG. 1 , FIG. 2 , FIG. 3 or FIG. 5 as a data storage device therein.
- the data storage device 600 includes 8 layers of stacked DRAM chips (die A to die H in FIG. 6) and 1 layer of logic chip.
- the 8 layers of DRAM chips may serve as the memory 211 in the foregoing embodiments, and the logic chip may serve as a control unit and may include the first NDP unit in the embodiments of FIG. 2 and FIG. 3 or the E NDP units in the embodiment of FIG. 5.
- the chips of each layer can be connected by through silicon vias (Through Silicon Via, TSV).
- Each layer of chips in the data storage device 600 can be logically divided into several units (as shown in FIG. 6, each layer of DRAM chips is divided into 32 storage areas, and the logic chip is divided into 32 logic units).
- the storage units and the logic unit stacked in the same vertical column form a vault (Vault); the data storage device 600 can be divided into 32 vaults (Vault 00 to Vault 31), and Vault 00 includes logic unit 00 and the 8 storage units in the vertical direction (P00A to P00H).
- the Vaults are connected to one another through an internal interconnection network (Crossbar Network), and the interconnection network communicates with the processor (not shown in FIG. 6) through a bus (high-speed serial link).
- the data storage device 600 may be an 8GB storage device, and each layer of DRAM chips has a data capacity of 1GB.
- the interconnection network is connected to devices outside the data storage device 600 through eight 40GB/s high-speed serial links.
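The figures quoted for the data storage device 600 can be checked with a little arithmetic. The sketch below assumes binary units (1 GB = 2^30 bytes) and an even split of every DRAM layer across the 32 vaults; neither assumption is stated in the text, and the variable names are chosen here for illustration.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

DRAM_LAYERS = 8            # die A to die H
LAYER_CAPACITY = 1 * GIB   # each DRAM layer holds 1 GB
VAULTS = 32                # Vault 00 to Vault 31
LINKS = 8                  # high-speed serial links
LINK_BANDWIDTH_GB_S = 40   # per-link bandwidth in GB/s

total_capacity = DRAM_LAYERS * LAYER_CAPACITY    # whole device
vault_capacity = total_capacity // VAULTS        # one vault (8 stacked storage units)
partition_capacity = LAYER_CAPACITY // VAULTS    # one storage unit such as P00A
aggregate_bandwidth = LINKS * LINK_BANDWIDTH_GB_S

print(total_capacity // GIB, "GB total")
print(vault_capacity // MIB, "MB per vault")
print(partition_capacity // MIB, "MB per storage unit")
print(aggregate_bandwidth, "GB/s aggregate link bandwidth")
```

Under these assumptions each vault holds 256 MB, made up of eight 32 MB storage units, and the eight links together offer 320 GB/s to devices outside the cube.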
- the logic unit 00 inside Vault 00 includes a vault controller VC00 (Vault Controller, VC) and an NDP unit 00.
- VC00 is integrated in logic unit 00 and is responsible for data read and write operations inside Vault 00.
- the NDP unit can include a near-data computing core (Near-Data Processing Core, NDP Core), a scratchpad memory (Scratchpad Memory, SPM), a memory protection unit (Memory Protection Unit, MPU), and a direct memory access (Direct Memory Access, DMA) engine.
- each NDP00 unit may further include a first register unit, a second register unit and a cache unit, which are not shown in FIG. 6 for simplicity. It should be understood that the internal structure of other logical units may be the same as that of the logical unit 00, and will not be repeated here.
- FIG. 7 is a schematic diagram of a spatial layout of a continuous memory space in an embodiment of the present application.
- the continuous memory space 700 shown in FIG. 7 may be a continuous physical address space in the memory in a Vault, and the physical address space is allocated to the NDP unit in the Vault during the initialization process.
- the continuous memory space includes: .text area, .data area and .stack area.
- the .text area is used to save the compiled machine instructions of the computing task source code, corresponding to the code blocks in the aforementioned embodiments;
- the .data area is used to store data, corresponding to the data blocks in the aforementioned embodiments;
- the .stack area is a block of reserved stack space.
- the initialization process (that is, the distribution process of near-data computing tasks) will be described below by taking the continuous memory space 700 in FIG. 7 as an example.
- the processor can dynamically distribute computing tasks to the NDP units in each Vault through the bus according to the computing requirements and load conditions of the system.
- the processor first allocates a continuous memory space to each NDP unit, and then writes the corresponding data for near data calculation into the continuous memory space corresponding to each NDP unit.
- the process in which the processor writes the corresponding data for near data calculation into each continuous memory space may include calculation logic distribution and calculation data distribution.
- when distributing calculation logic, the processor writes the compiled binary of the computing task source code into the .text area in the continuous memory space 700, and then sets the PC register in the NDP core to store the corresponding offset address.
- when distributing calculation data, the processor writes the corresponding data into the .data area in the continuous memory space 700. It should be noted that if the data division and data distribution strategy selected in the memory allocation stage is reasonable, the processor may already have completed the distribution of calculation data while processing other computing tasks. In this case, only the distribution of calculation logic is needed.
- each NDP unit can start computing independently.
- the SPM is used to store the data required for the operation of the NDP unit;
- the MPU is used to provide protection for the data in the memory, that is, to assign different access rights to different areas in the continuous memory space ;
- the DMA engine is used to move data between the memory and the SPM.
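The layout of the continuous memory space 700, the two distribution steps, and the role of the MPU can be illustrated with a toy model. Everything concrete below is an assumption made for the sketch: the region sizes, the order of the regions, the permission strings (`rx`, `rw`), and the idea that writes by the processor during initialization bypass the MPU check. The names `Region`, `ContinuousSpace`, `processor_write`, and `ndp_store` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    offset: int   # offset of the region from the start of the continuous memory space
    size: int     # region size in bytes (hypothetical values below)
    perms: str    # hypothetical permission string: r = read, w = write, x = execute


class ContinuousSpace:
    """Toy model of one NDP unit's continuous memory space with MPU-style checks."""

    def __init__(self, regions):
        self.regions = list(regions)
        self.mem = bytearray(sum(r.size for r in self.regions))

    def _region_for(self, offset, length):
        # Find the single region that fully contains [offset, offset + length).
        for r in self.regions:
            if r.offset <= offset and offset + length <= r.offset + r.size:
                return r
        raise ValueError("access is outside every region")

    def processor_write(self, offset, payload):
        # Distribution by the processor during initialization (assumed to bypass the MPU).
        self._region_for(offset, len(payload))
        self.mem[offset:offset + len(payload)] = payload

    def ndp_store(self, offset, payload):
        # Store issued by the NDP core; the MPU-style check rejects non-writable regions.
        region = self._region_for(offset, len(payload))
        if "w" not in region.perms:
            raise PermissionError(f"{region.name} is not writable")
        self.mem[offset:offset + len(payload)] = payload


space = ContinuousSpace([
    Region(".text", 0, 64, "rx"),
    Region(".data", 64, 128, "rw"),
    Region(".stack", 192, 64, "rw"),
])
space.processor_write(0, b"\x13\x00\x00\x00")   # calculation logic distribution into .text
pc = 0                                          # PC register holds the offset of the code
space.processor_write(64, bytes([1, 2, 3, 4]))  # calculation data distribution into .data
space.ndp_store(64, b"\x09")                    # allowed: .data is writable by the NDP core
```

In this sketch a store from the NDP core into `.text` raises `PermissionError`, which is one way to picture different access rights for different areas of the same continuous memory space.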
- FIG. 8 is a schematic flowchart of a data processing method in an embodiment of the present application.
- the method is applied to a data storage device, and the data storage device includes a memory and a first near-data computing NDP unit, the first NDP unit is electrically connected to the memory, and the data storage device is connected to a processor through a bus.
- the methods include:
- Step S810: storing first physical address information by the first NDP unit, wherein the first physical address information points to a first address space in the memory, and the first address space is a continuous memory space that the first NDP unit has the right to use.
- Step S820: storing, by the memory, in the first address space, first data from the processor for performing near data calculation.
- Step S830: reading, by the first NDP unit, part or all of the first data from the first address space based on the obtained first offset address and the first physical address information, and performing calculation based on the part or all of the first data.
- the first NDP unit includes a first register unit and a near data calculation core (NDP core); storing the first physical address information by the first NDP unit includes: storing the first physical base address and the first length through the first register unit; reading, by the first NDP unit, part or all of the first data from the first address space based on the obtained first offset address and the first physical address information includes: obtaining, by the NDP core, the first physical address information from the first register unit, and reading part or all of the first data from the first address space based on the first physical address information and the first offset address.
- the first physical address information includes a first boundary address and a first length; reading part or all of the first data from the first address space based on the first physical address information and the first offset address includes: when the first offset address is less than or equal to the first length, calculating, by the NDP core, a first access address based on the first offset address and the first boundary address, wherein the first boundary address is the starting physical address or the ending physical address of the first address space, and the first length is the length of the first address space; and reading, by the NDP core, part or all of the first data from the first access address in the first address space.
- the first physical address information includes a second boundary address and a third boundary address; reading part or all of the first data from the first address space based on the first physical address information and the first offset address includes: calculating, by the NDP core, a first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address, wherein the second boundary address and the third boundary address are respectively the starting physical address and the ending physical address of the first address space; and when the first access address is between the second boundary address and the third boundary address, reading, by the NDP core, part or all of the first data from the first access address.
- the data storage device further includes a second NDP unit, and the second NDP unit is electrically connected to the memory; the method further includes: storing second physical address information by the second NDP unit, wherein the second physical address information points to a second address space in the memory, and the second address space is a continuous memory space that the second NDP unit has the right to use; storing, by the memory, in the second address space, second data from the processor for near data calculation; reading, by the first NDP unit, part or all of the second data from the second address space based on an obtained second offset address and the second physical address information; and performing calculation based on the part or all of the second data.
- the method further includes: receiving, by the memory, an instruction of the processor, the instruction instructing the memory to allocate the first address space for the first NDP unit, and instructing the memory to allocate the second address space for the second NDP unit.
- the first NDP unit further includes a second register unit and a cache unit; reading part or all of the second data from the second address space based on the obtained second offset address and the second physical address information includes: when the cache unit caches the second physical address information, obtaining, through the NDP core, the second physical address information from the cache unit and updating it into the second register unit, and reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information; or, when the cache unit does not cache the second physical address information, obtaining, through the NDP core, the second physical address information from the second NDP unit and updating it into the second register unit and the cache unit, and reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information.
- the second physical address information includes a fourth boundary address and a second length; reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information includes: reading, through the NDP core, the fourth boundary address and the second length from the second register unit, wherein the fourth boundary address is the starting physical address or the ending physical address of the second address space, and the second length is the length of the second address space; when the second offset address is less than or equal to the second length, calculating, by the NDP core, a second access address based on the second offset address and the fourth boundary address; and reading, by the NDP core, part or all of the second data from the second access address in the second address space.
- the second physical address information includes a fifth boundary address and a sixth boundary address; reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information includes: calculating, by the NDP core, a second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address, wherein the fifth boundary address and the sixth boundary address are respectively the starting physical address and the ending physical address of the second address space; and when the second access address is between the fifth boundary address and the sixth boundary address, reading, by the NDP core, part or all of the first data from the second access address.
- each of the first register unit and the second register unit includes at least one register.
- the method further includes: after the first NDP unit completes the near data calculation, sending, by the first NDP unit, a signal to the processor through the bus; the signal is used to indicate that the first NDP unit has completed the near data calculation.
- An embodiment of the present application provides a data processing device, including a processor, the data storage device provided in any implementation manner in the foregoing embodiments, and a discrete device coupled to the data storage device.
- the data processing device may be the data processing device described in any one of the embodiments in FIG. 2 , FIG. 3 and FIG. 5 .
- the embodiment of the present application provides a computer storage medium that stores a computer program; when the computer program is executed, the first NDP unit 212 can execute some or all of the steps described in the above method embodiments to complete the above calculation process.
- the embodiment of the present application provides a computer program that includes instructions; when the computer program is executed by the processor or the first NDP unit 212, the first NDP unit 212 can execute some or all of the steps of any one of the methods described in the above method embodiments.
- the disclosed device can be implemented in other ways.
- the device embodiments described above are only illustrative.
- the division of the above units is only a logical function division.
- there may be other division methods; for example, multiple units or components can be combined or integrated into another system, or some features may be ignored or not implemented.
- the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical or other forms.
- the units described above as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Abstract
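This application discloses a data storage device and a data processing method. The data storage device includes a memory and a first near-data computing (NDP) unit; the first NDP unit is electrically connected to the memory, and the data storage device is connected to a processor through a bus. The first NDP unit is configured to store first physical address information that points to a first address space, where the first address space is a continuous memory space that the first NDP unit has the right to use. The memory is configured to store, in the first address space, first data from the processor. The first NDP unit is further configured to read part or all of the first data from the first address space based on an obtained first offset address and the first physical address information, and to perform calculation. The embodiments of this application can eliminate the hardware overhead that address translation introduces in the first NDP unit and substantially improve computing performance and energy efficiency during near-data computing.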
Description
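This application claims priority to Chinese patent application No. 202111008166.4, entitled "Data storage device and data processing method" and filed with the China Patent Office on August 30, 2021, the entire contents of which are incorporated herein by reference.
This application relates to the field of chip technologies, and in particular to a data storage device and a data processing method.
Near-data computing (Near Data Processing, NDP) is a technique that deploys computing units (such as a microprocessor unit (Microprocessor Unit, MPU)) near a storage device (such as main memory) to perform the related data operations. The technique integrates the computing units into the storage device through high-bandwidth links, giving the storage device some computing capability. With near-data computing, some computing tasks originally executed by the central processing unit (Central Processing Unit, CPU) can be offloaded to the storage device that has computing capability, which greatly reduces long-distance data transfer between the CPU and the storage device, thereby improving system performance and reducing energy consumption.
During near-data computing, the computing unit that executes a computing task may access the memory using either virtual addresses or physical addresses to obtain the data needed for the task. When physical addresses are used to access the memory, the computing unit needs to interact with the processor multiple times during the calculation.
In the above prior art, when virtual addresses are used to access the memory, the hardware complexity of the computing unit is high; when physical addresses are used to access the memory, both the communication overhead and the data transfer volume on the bus are large.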
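Summary
The embodiments of this application provide a data storage device and a data processing method that can eliminate the hardware overhead caused by address translation and substantially improve computing performance and energy efficiency during near-data computing.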
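In a first aspect, this application provides a data storage device. The data storage device includes a memory and a first near-data computing (NDP) unit; the first NDP unit is electrically connected to the memory, and the data storage device is connected to a processor through a bus. The first NDP unit is configured to store a base address of first physical address information and a first length, where the base address and the first length point to a first address space in the memory, and the first address space is a continuous memory space that the first NDP unit has the right to use. The memory is configured to store, in the first address space, first data from the processor for near-data computing. The first NDP unit is further configured to obtain a first offset address, read part or all of the first data from the first address space based on the obtained first offset address and the first physical address information, and perform calculation based on the part or all of the first data.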
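In terms of technical effect, in this application the first NDP unit internally stores the first physical address information pointing to the first address space, so that it can subsequently address the memory by physical address and obtain the first data for near-data computing from the continuous first address space. Compared with the prior-art process of computing with virtual addressing, this omits address translation in the first NDP unit (that is, the conversion between virtual and physical addresses) and can effectively reduce the hardware complexity of the first NDP unit. In addition, when the device of this application performs near-data computing, the processor and the first NDP unit interact over the bus only when the processor allocates the first address space to the first NDP unit and writes data into the first address space. After the calculation has started, the first NDP unit no longer needs to interact with the processor side over the bus and interacts only with the memory in the storage device through the physical link. Compared with the prior art, this application can therefore significantly reduce the number of interactions between the NDP unit and the processor side, that is, significantly reduce the communication overhead and data transfer volume on the bus, and thus substantially improve computing performance and energy efficiency during near-data computing.
It should be understood that the data storage device may include multiple NDP units, and each NDP unit executes computing tasks in the same way as the first NDP unit.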
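The data storage device may be a storage product with programmable processing capability, for example, a general-purpose memory module or a disk. In a feasible implementation, the first NDP unit includes a first register unit and a near-data computing core (NDP core); the first NDP unit is specifically configured to: store the first physical address information through the first register unit; and obtain, through the NDP core, the first physical address information from the first register unit, and read part or all of the first data from the first address space based on the first physical address information and the first offset address.
In terms of technical effect, because this application stores the first physical address information pointing to the first address space in a register unit, the security of the memory access process is guaranteed as long as the data in the register unit is not illegally modified.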
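In a feasible implementation, the first physical address information includes a first boundary address and a first length; the NDP core is specifically configured to: when the first offset address is less than or equal to the first length, calculate a first access address based on the first offset address and the first boundary address, where the first boundary address is the starting physical address or the ending physical address of the first address space, and the first length is the length of the first address space; and read part or all of the first data from the first access address in the first address space.
In a feasible implementation, the first physical address information includes a second boundary address and a third boundary address; the NDP core is specifically configured to: calculate a first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address, where the second boundary address and the third boundary address are respectively the starting physical address and the ending physical address of the first address space; and when the first access address is between the second boundary address and the third boundary address, read part or all of the first data from the first access address.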
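Here, the first offset address is an offset relative to the starting physical address or the ending physical address of the first address space.
Optionally, the first physical address information may include at least two of: the starting physical address of the first address space, the ending physical address of the first address space, and the length of the first address space (that is, the first length).
In terms of technical effect, the way the first NDP unit accesses the memory in the above embodiments (that is, the way the first access address is calculated) ensures that the access range of the NDP core stays inside a preset continuous memory space (the first address space) and does not exceed it, which improves the security of the memory access process.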
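In a feasible implementation, the data storage device further includes a second NDP unit that is electrically connected to the memory. The second NDP unit is configured to store second physical address information, where the second physical address information points to a second address space in the memory, and the second address space is a continuous memory space that the second NDP unit has the right to use. The memory is configured to store, in the second address space, second data from the processor for near-data computing. The first NDP unit is further configured to read part or all of the second data from the second address space based on an obtained second offset address and the second physical address information, and to perform calculation based on the part or all of the second data.
In terms of technical effect, while executing a computing task the first NDP unit can access not only its own first address space but also the address spaces of other NDP units (such as the second address space of the second NDP unit). That is, when the first NDP unit needs to exchange data with another NDP unit (such as the second NDP unit) during near-data computing, it can obtain the data for calculation directly from the second address space based on the stored second physical address information. Compared with the prior art, which needs to traverse the address spaces of all other NDP units in the data storage device, the embodiments of this application allow the first NDP unit's access to address spaces other than the first address space to scale well.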
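In a feasible implementation, the memory is further configured to receive an instruction from the processor, where the instruction instructs the memory to allocate the first address space to the first NDP unit and to allocate the second address space to the second NDP unit.
In a feasible implementation, the first NDP unit further includes a second register unit and a cache unit. With respect to reading part or all of the second data from the second address space based on the obtained second offset address and the second physical address information, the first NDP unit is specifically configured to: when the cache unit caches the second physical address information, obtain the second physical address information from the cache unit through the NDP core and update it into the second register unit, and read, through the NDP core, part or all of the second data from the second address space based on the obtained second offset address and the second physical address information; or, when the cache unit does not cache the second physical address information, obtain the second physical address information from the second NDP unit through the NDP core and update it into the second register unit and the cache unit, and read, through the NDP core, part or all of the second data from the second address space based on the obtained second offset address and the second physical address information.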
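Physically, the cache unit may be a cache (Cache) close to the NDP core, for example, a static random-access memory (SRAM); logically, it can be understood as a data structure that caches the corresponding content, for example, a table that stores the second physical base address and the second length.
In terms of technical effect, this application places a cache unit in the first NDP unit to cache second physical address information that has already been obtained. When that information is needed again in a later calculation, it can be taken directly from the cache unit to calculate the corresponding access address, and the data for near-data computing is then read at the calculated access address. Because this caching mechanism is added to the near-data computing process, the latency of near-data computing can be effectively reduced and computing efficiency and energy efficiency improved.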
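The cache-hit and cache-miss paths described above can be sketched as a small lookup routine. This is a hedged illustration under stated assumptions: the class `PeerInfoCache`, the dictionary `peer_units` that stands in for querying the second NDP unit over the interconnect, and the `remote_fetches` counter are all hypothetical names introduced here, and the address information is modelled as a (boundary address, length) pair.

```python
from typing import Dict, Optional, Tuple

AddressInfo = Tuple[int, int]  # (boundary address, length) of a peer's address space


class PeerInfoCache:
    """Models the cache unit and second register unit inside the first NDP unit."""

    def __init__(self, peer_units: Dict[int, AddressInfo]):
        self.peer_units = peer_units                    # stand-in for asking other NDP units
        self.cache: Dict[int, AddressInfo] = {}         # cache unit
        self.second_register: Optional[AddressInfo] = None
        self.remote_fetches = 0                         # counts accesses to peer NDP units

    def load(self, peer_id: int) -> AddressInfo:
        info = self.cache.get(peer_id)
        if info is None:
            # Miss: fetch from the peer NDP unit, then update the cache unit.
            info = self.peer_units[peer_id]
            self.remote_fetches += 1
            self.cache[peer_id] = info
        # Hit or miss, the second register unit ends up holding the information.
        self.second_register = info
        return info


unit = PeerInfoCache({2: (0x2000, 0x100)})
unit.load(2)   # first use: fetched from the second NDP unit
unit.load(2)   # later use: served from the cache unit
print(unit.remote_fetches, unit.second_register)
```

Only the first lookup reaches the peer unit; repeated use of the same address information is served locally, which is the latency saving the paragraph above describes.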
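In a feasible implementation, the second physical address information includes a fourth boundary address and a second length. With respect to reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: read the fourth boundary address and the second length from the second register unit, where the fourth boundary address is the starting physical address or the ending physical address of the second address space, and the second length is the length of the second address space; when the second offset address is less than or equal to the second length, calculate a second access address based on the second offset address and the fourth boundary address; and read part or all of the second data from the second access address in the second address space.
In a feasible implementation, the second physical address information includes a fifth boundary address and a sixth boundary address. With respect to reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: calculate a second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address, where the fifth boundary address and the sixth boundary address are respectively the starting physical address and the ending physical address of the second address space; and when the second access address is between the fifth boundary address and the sixth boundary address, read part or all of the first data from the second access address.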
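Here, the second offset address is an offset relative to the starting physical address or the ending physical address of the second address space.
In terms of technical effect, the way the first NDP unit accesses the second address space in the memory in the above embodiments ensures that the access range stays inside the second address space and does not exceed it, which guarantees the security of the memory access process.
In terms of technical effect, because the NDP unit in the data storage device is not aware of the first data cached in the processor for near-data computing, when this application has no other mechanism to guarantee cache coherence, the data of the first address space cached in the processor needs to be cleared to avoid cache coherence problems.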
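In a feasible implementation, each of the first register unit and the second register unit includes at least one register.
The register unit may be a high-speed storage device with data storage and data read/write functions, whose read/write speed is much higher than that of external storage devices such as hard disks and USB flash drives (external storage devices are storage devices other than the memory of the computing device and the processor cache).
The number of registers in each register unit must be sufficient to store the address information. Optionally, the first physical address information may be stored together in one register or separately in two registers. Similarly, the second physical address information may be stored together in one register or separately in two registers, which is not limited in this application.
In terms of technical effect, because the first physical address information is stored in registers, the security of the memory access process is guaranteed as long as the data in the registers is not illegally modified.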
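In a feasible implementation, the first NDP unit is further configured to send a second signal to the processor through the bus after completing the near-data computing; the second signal indicates that the first NDP unit has completed the near-data computing.
In terms of technical effect, when the first NDP unit completes the near-data computing it can inform the processor side of its progress, so that the processor side knows the completion status of the near-data computing tasks of each NDP unit.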
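In a second aspect, this application provides a data processing method applied to a data storage device. The data storage device includes a memory and a first near-data computing (NDP) unit; the first NDP unit is electrically connected to the memory, and the data storage device is connected to a processor through a bus. The method includes: storing first physical address information by the first NDP unit, where the first physical address information points to a first address space in the memory, and the first address space is a continuous memory space that the first NDP unit has the right to use; storing, by the memory, in the first address space, first data from the processor for near-data computing; reading, by the first NDP unit, part or all of the first data from the first address space based on an obtained first offset address and the first physical address information; and performing calculation based on the part or all of the first data.
In a feasible implementation, the first NDP unit includes a first register unit and a near-data computing core (NDP core). Storing the first physical address information by the first NDP unit includes: storing the first physical base address and the first length through the first register unit. Reading, by the first NDP unit, part or all of the first data from the first address space based on the obtained first offset address and the first physical address information includes: obtaining, by the NDP core, the first physical address information from the first register unit, and reading part or all of the first data from the first address space based on the first physical address information and the first offset address.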
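In a feasible implementation, the first physical address information includes a first boundary address and a first length. Reading part or all of the first data from the first address space based on the first physical address information and the first offset address includes: when the first offset address is less than or equal to the first length, calculating, by the NDP core, a first access address based on the first offset address and the first boundary address, where the first boundary address is the starting physical address or the ending physical address of the first address space, and the first length is the length of the first address space; and reading, by the NDP core, part or all of the first data from the first access address in the first address space.
In a feasible implementation, the first physical address information includes a second boundary address and a third boundary address. Reading part or all of the first data from the first address space based on the first physical address information and the first offset address includes: calculating, by the NDP core, a first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address, where the second boundary address and the third boundary address are respectively the starting physical address and the ending physical address of the first address space; and when the first access address is between the second boundary address and the third boundary address, reading, by the NDP core, part or all of the first data from the first access address.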
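In a feasible implementation, the data storage device further includes a second NDP unit that is electrically connected to the memory. The method further includes: storing second physical address information by the second NDP unit, where the second physical address information points to a second address space in the memory, and the second address space is a continuous memory space that the second NDP unit has the right to use; storing, by the memory, in the second address space, second data from the processor for near-data computing; reading, by the first NDP unit, part or all of the second data from the second address space based on an obtained second offset address and the second physical address information; and performing calculation based on the part or all of the second data.
In a feasible implementation, the method further includes: receiving, by the memory, an instruction from the processor, where the instruction instructs the memory to allocate the first address space to the first NDP unit and to allocate the second address space to the second NDP unit.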
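In a feasible implementation, the first NDP unit further includes a second register unit and a cache unit. Reading part or all of the second data from the second address space based on the obtained second offset address and the second physical address information includes: when the cache unit caches the second physical address information, obtaining the second physical address information from the cache unit through the NDP core and updating it into the second register unit, and reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information; or, when the cache unit does not cache the second physical address information, obtaining the second physical address information from the second NDP unit through the NDP core and updating it into the second register unit and the cache unit, and reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information.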
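In a feasible implementation, the second physical address information includes a fourth boundary address and a second length. Reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information includes: reading, through the NDP core, the fourth boundary address and the second length from the second register unit, where the fourth boundary address is the starting physical address or the ending physical address of the second address space, and the second length is the length of the second address space; when the second offset address is less than or equal to the second length, calculating, by the NDP core, a second access address based on the second offset address and the fourth boundary address; and reading, by the NDP core, part or all of the second data from the second access address in the second address space.
In a feasible implementation, the second physical address information includes a fifth boundary address and a sixth boundary address. Reading, through the NDP core, part or all of the second data from the second address space based on the second offset address and the second physical address information includes: calculating, by the NDP core, a second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address, where the fifth boundary address and the sixth boundary address are respectively the starting physical address and the ending physical address of the second address space; and when the second access address is between the fifth boundary address and the sixth boundary address, reading, by the NDP core, part or all of the first data from the second access address.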
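In a feasible implementation, each of the first register unit and the second register unit includes at least one register.
In a feasible implementation, the method further includes: after the first NDP unit completes the near-data computing, sending, by the first NDP unit, a signal to the processor through the bus; the signal indicates that the first NDP unit has completed the near-data computing.
In a third aspect, an embodiment of this application provides a data processing device, including a processor, the data storage device provided in any implementation of the first aspect, and a discrete device coupled to the data storage device.
In a fourth aspect, an embodiment of this application provides a computer storage medium that stores a computer program; when the computer program is executed, the data processing method according to any one of the implementations of the second aspect is implemented.
In a fifth aspect, an embodiment of this application provides a computer program that includes instructions; when the computer program is executed, the data processing method according to any one of the implementations of the second aspect is implemented.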
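The accompanying drawings used in the embodiments of this application are introduced below.
FIG. 1 is a schematic structural diagram of a system architecture for near-data computing in an embodiment of this application;
FIG. 2 is a schematic structural diagram of a data processing device in an embodiment of this application;
FIG. 3 is a schematic structural diagram of another data processing device in an embodiment of this application;
FIG. 4 is a schematic diagram of the calculation logic of a memory access address in an embodiment of this application;
FIG. 5 is a schematic structural diagram of yet another data processing device in an embodiment of this application;
FIG. 6 is a schematic diagram of the hardware structure of a data storage device in an embodiment of this application;
FIG. 7 is a schematic diagram of the spatial layout of a continuous memory space in an embodiment of this application;
FIG. 8 is a schematic flowchart of a data processing method in an embodiment of this application.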
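The embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth", and the like in the specification, claims, and drawings of this application are used to distinguish different objects rather than to describe a particular order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that contains a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product, or device. A reference to an "embodiment" herein means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of this application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.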
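The relevant terms in the embodiments of this application are explained first:
(1) Near-data computing (Near-data Processing, NDP): a technique that deploys computing units near a storage device, intended to greatly reduce long-distance memory access requests so as to relieve memory-bound workloads and improve overall performance and energy efficiency.
(2) Cache coherence: in a computer system with a hierarchical storage system, the mechanism that keeps the data in the cache identical to the data in main memory. When many different devices in a system share a common memory resource, inconsistent data in their caches causes problems. If some shared data exists in the caches of different devices at the same time, the consistency of that data must also be guaranteed.
(3) Address translation: the conversion between virtual addresses and physical addresses. A virtual address usually refers to an address supplied by a program, and a physical address refers to an effective memory address. The set of all virtual addresses is called the virtual address space (Virtual Address Space), and the set of all physical addresses is called the physical address space (Physical Address Space). Address translation can be understood simply as looking up, through the page table, the physical address that corresponds to a virtual address; the translation lookaside buffer (Translation Lookaside Buffer, TLB) in the abbreviation table is a hardware structure that caches part of the page table to speed up this lookup.
(4) Cache (Cache): a level of the computer memory system located between main memory and the processor, added to bridge the difference in processing speed between the two. Compared with main memory, a cache is faster to access but smaller in capacity. A cache is usually divided into multiple levels; the closer a level is to the CPU, the smaller its capacity and the faster its access speed.
(5) Through silicon via (Through Silicon Via, TSV): a technique commonly used in 3D-packaged memory that connects the multiple layers of a chip in the vertical direction and provides very high data transfer bandwidth.
(6) Stack (Stack): a special region of memory used to hold local variables, function call arguments, and the like. It is last-in, first-out and generally grows from high addresses toward low addresses.
(7) Energy efficiency ratio: the performance-to-power ratio, that is, the ratio of performance to power consumption, usually indicating the performance level of a processor at a given power consumption. The higher the value, the more computation the processor can complete at a fixed power consumption.
(8) Continuous memory space: a storage space that is physically continuous (in the memory) and corresponds to a continuous range of physical addresses.
(9) Offset address: the offset relative to the starting physical address or the ending physical address of the corresponding address space.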
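Referring to FIG. 1, FIG. 1 is a schematic structural diagram of a system architecture for near-data computing in an embodiment of this application. As shown in FIG. 1, the system architecture 100 may include a central processing unit 110 (Central Processing Unit, CPU), a graphics processing unit 120 (Graphics Processing Unit, GPU), a digital signal processor 130 (Digital Signal Processor, DSP), M data storage devices 150, and a bus 160, where M is a positive integer.
The CPU, GPU, and DSP are connected to the memory controller (MC) through the bus 160, and the MC is directly connected to the M data storage devices 150 through physical wiring. The CPU, GPU, and DSP may be part of a system on chip (System on Chip, SOC). Each data storage device may include one memory and one or more near-data computing (NDP) units. For example, data storage device 1 includes memory 1 and N NDP units, and data storage device M includes memory M and K NDP units, where the memory and the NDP units are directly connected through physical links, and N and K are positive integers.
The system architecture 100 may further include a memory controller (Memory Controller, MC, not shown in FIG. 1) for controlling data read and write operations in the data storage devices.
The data storage device may be a storage product with programmable processing capability, for example, a general-purpose memory module or a disk. The memory included in the data storage device may be either a random-access memory (Random Access Memory, RAM) or a non-volatile memory (Non-Volatile Memory, NVM). RAM includes static random-access memory (Static Random-Access Memory, SRAM), dynamic random-access memory (Dynamic Random Access Memory, DRAM), and the like; NVM includes read-only memory (Read-only Memory, ROM), flash memory (Flash Memory), and the like.
The system architecture 100 may be included in any terminal device used for near-data computing, such as a mobile phone, computer, tablet, wearable device, or in-vehicle terminal. The system architecture 100 can be applied to any scenario that requires near-data computing, such as general-purpose computing, high-performance computing, or artificial intelligence, which is not limited in this application.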
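Referring to FIG. 2, FIG. 2 is a schematic structural diagram of a data processing device in an embodiment of this application. As shown in FIG. 2, the data processing device 200 is a data processing device based on the near-data computing system architecture 100 in FIG. 1 and includes a data storage device 210, a processor 220, and a bus 230. The data storage device 210 is connected to the processor 220 through the bus 230. The data storage device 210 includes a memory 211 and a first NDP unit 212, and the memory 211 is electrically connected to the first NDP unit 212.
Optionally, electrical connection may mean that the memory 211 and the first NDP unit 212 are directly connected through a physical line that can carry electrical signals, such as copper foil or wire, with no other device between them.
The data storage device 210 may be any one of the M data storage devices in FIG. 1. The data processing device 200 may be any terminal device used for near-data computing, such as a mobile phone, computer, tablet, wearable device, or in-vehicle terminal. The processor 220 may be a central processing unit (CPU) or another processing core. The processor 220 may also be a heterogeneous processor, that is, processors of different types; the specific implementation of the processor is not elaborated in this embodiment.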
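The first NDP unit 212 is configured to store first address information, where the first address information points to a first address space in the memory, and the first address space is a continuous memory space allocated by the processor to the first NDP unit 212.
Specifically, when the data processing device 200 performs near-data computing, the processor 220 first performs initialization to allocate a continuous memory space in the memory 211, namely the first address space, to the first NDP unit 212. The first address space may be a set of consecutive physical addresses. After the processor 220 has allocated the first address space to the first NDP unit 212, the first NDP unit 212 stores the first physical address information that represents the specific location of the first address space in the memory.
Optionally, the data processing device 200 may run a piece of initialization code on the processor 220 to perform the corresponding initialization process (also called the near-data computing task distribution process), which allocates a continuous memory space in the memory 211 to the first NDP unit 212. The first NDP unit may be a central processing unit (CPU), a microprocessor unit (Microprocessor Unit, MCU), or the like, which is not limited in this application.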
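The memory 211 is configured to store, in the first address space, the first data sent by the processor for near-data computing.
Optionally, the data processing device 200 may run the above initialization code on the processor 220 to write the first data for near-data computing into the first address space in the memory 211. Specifically, after the processor 220 has allocated the corresponding first address space to the first NDP unit 212, it can write the first data into the first address space through the bus 230.
Optionally, the continuous memory space in the memory 211 that corresponds to the first address space can be logically divided into a data block and a code block, used to store data and code respectively; that is, the first data for near-data computing includes both data and code. It should be noted that, in a paged memory management system, the size of the continuous memory space allocated by the processor 220 is not limited by the page size.
The memory 211 may be either a random-access memory (Random Access Memory, RAM) or a non-volatile memory (Non-Volatile Memory, NVM). RAM includes static random-access memory (Static Random-Access Memory, SRAM), dynamic random-access memory (Dynamic Random Access Memory, DRAM), and the like; NVM includes read-only memory (Read-only Memory, ROM), flash memory (Flash Memory), and the like.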
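The first NDP unit 212 is further configured to read part or all of the first data from the first address space based on the obtained first offset address and the first physical address information, and to perform calculation based on the part or all of the first data.
Specifically, the calculation performed by the first NDP unit 212 may include one or more rounds. In each round, the first NDP unit needs to read the corresponding data from the memory 211. In one round of a multi-round calculation, the first NDP unit may first obtain the first offset address (also called the first offset), then read part or all of the first data from the first address space based on the first offset address and the first physical address information, and then perform calculation based on the part or all of the first data that was read. Optionally, the first NDP unit 212 may obtain the first offset address from the code block in the first address space (for example, the operand of a load or store instruction in the code block), from the program counter (Program Counter, PC) register in the first NDP unit, or from the data that the first NDP unit read from the first address space in the previous round, and then read part or all of the first data from the first address space based on the first physical address information and the first offset address.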
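In terms of technical effect, in this application the first NDP unit internally stores the first physical address information pointing to the first address space, so that it can subsequently address the memory by physical address and obtain the first data for near-data computing from the continuous first address space. Compared with the prior-art process of near-data computing with virtual addressing, this omits address translation in the first NDP unit (that is, the conversion between virtual and physical addresses) and can effectively reduce the hardware complexity of the first NDP unit. In addition, when the device of this application performs near-data computing, the processor and the first NDP unit interact over the bus only when the processor allocates the first address space to the first NDP unit and writes data into the first address space. After the calculation has started, the first NDP unit no longer needs to interact with the processor side over the bus and interacts only with the memory in the storage device through the physical link. Compared with the prior art, this application can therefore significantly reduce the number of interactions between the NDP unit and the processor side, that is, significantly reduce the communication overhead and data transfer volume on the bus, and thus substantially improve computing performance and energy efficiency during near-data computing.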
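Referring to FIG. 3, FIG. 3 is a schematic structural diagram of another data processing device in an embodiment of this application, which refines the first NDP unit 212 in the data processing device 200 of FIG. 2. As shown in FIG. 3, the first NDP unit 212 may include a near-data computing core (NDP core) 2121 and a first register unit 2122.
In a feasible implementation, the first NDP unit 212 is specifically configured to: store the first physical address information through the first register unit 2122; and obtain, through the NDP core 2121, the first physical address information from the first register unit, and read part or all of the first data from the first address space based on the first physical address information and the first offset address.
Specifically, the process in which the first NDP unit 212 reads data from the first address space based on the first physical address information and the first offset address is implemented by the NDP core 2121.
In terms of technical effect, because this application stores the first physical address information pointing to the first address space in a register unit, the security of the memory access process is guaranteed as long as the data in the register unit is not illegally modified.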
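In a feasible implementation, the first physical address information includes a first boundary address and a first length; the NDP core is specifically configured to: when the first offset address is less than or equal to the first length, calculate a first access address based on the first offset address and the first boundary address, where the first boundary address is the starting physical address or the ending physical address of the first address space, and the first length is the length of the first address space; and read part or all of the first data from the first access address in the first address space.
Here, the first offset address is an offset relative to the starting physical address or the ending physical address of the first address space.
Optionally, the first physical address information may include at least two of: the starting physical address of the first address space, the ending physical address of the first address space, and the length of the first address space (that is, the first length).
Specifically, when the first physical address information includes the first boundary address and the first length, the NDP core 2121 first compares the first offset address with the first length; when the first offset address is less than or equal to the first length, it calculates the first access address based on the first offset address and the first boundary address.
In this case, calculating the first access address based on the first offset address and the first boundary address covers four cases: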
(1)第一偏移地址为相对于第一地址空间起始物理地址的偏移量,第一边界地址为第一地址空间的起始物理地址时,NDP Core 2121将第一边界地址和第一偏移地址相加,得到第一访问地址。
(2)第一偏移地址为相对于第一地址空间起始物理地址的偏移量,第一边界地址为第一地址空间的终止物理地址时,NDP Core 2121首先将第一边界地址减去第一长度,再加上第一偏移地址,得到第一访问地址。
(3)第一偏移地址为相对于第一地址空间终止物理地址的偏移量,第一边界地址为第一地址空间的起始物理地址时,NDP Core 2121将第一边界地址加上第一长度,再减去第一偏移量,得到第一访问地址。
(4)第一偏移地址为相对于第一地址空间终止物理地址的偏移量,第一边界地址为第一地址空间的终止物理地址时,NDP Core 2121将第一边界地址减去第一偏移量,得到第一访问地址。
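The four cases above can be sketched in a few lines of Python. This is only an illustrative software model under stated assumptions, not the hardware of this application: the function name `first_access_address` and its two boolean flags are hypothetical, and addresses are modelled as plain integers.

```python
def first_access_address(boundary, length, offset,
                         boundary_is_start, offset_from_start):
    """Model of cases (1)-(4): return the first access address.

    Hypothetical helper for illustration. Raises ValueError when the
    offset exceeds the length, mirroring the "offset <= length" check.
    """
    if offset > length:
        raise ValueError("address fault: offset exceeds region length")
    # Recover both ends of the region from the stored boundary and length.
    start = boundary if boundary_is_start else boundary - length
    end = boundary + length if boundary_is_start else boundary
    # Offsets count upward from the start or downward from the end.
    return start + offset if offset_from_start else end - offset


# Region starting at 0x1000 with length 0x100 (so it ends at 0x1100).
print(hex(first_access_address(0x1000, 0x100, 0x10, True, True)))    # case (1)
print(hex(first_access_address(0x1100, 0x100, 0x10, False, True)))   # case (2)
print(hex(first_access_address(0x1000, 0x100, 0x10, True, False)))   # case (3)
print(hex(first_access_address(0x1100, 0x100, 0x10, False, False)))  # case (4)
```

Cases (1) and (2) yield the same address, as do cases (3) and (4), because the stored boundary only changes how the region's ends are recovered, not where the offset points.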
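In a possible implementation, the first physical address information includes a second boundary address and a third boundary address. The NDP core is specifically configured to: compute a first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address, where the second boundary address and the third boundary address are respectively the start physical address and the end physical address of the first address space; and, when the first access address lies between the second boundary address and the third boundary address, read some or all of the first data from the first access address.
Specifically, when the first physical address information includes the second boundary address and the third boundary address, the NDP core 2121 may first compute the first access address from the first offset address and the second or third boundary address, and then determine whether the computed first access address falls within the first address space. When it does, the NDP core reads some or all of the first data from the first access address.
The first access address is computed from the second boundary address, the third boundary address and the first offset address in one of two ways:
(1) When the first offset address is an offset relative to the start physical address of the first address space, the NDP core 2121 adds the first offset address to the second boundary address to obtain the first access address.
(2) When the first offset address is an offset relative to the end physical address of the first address space, the NDP core 2121 subtracts the first offset address from the third boundary address to obtain the first access address.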
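The two-boundary variant can be sketched as follows. This is an illustrative model only: the function name is hypothetical, both boundaries are assumed to be inclusive (the text says only that the access address lies "between" them), and an out-of-range result is reported as `None` in place of a hardware fault signal.

```python
def access_address_two_bounds(start, end, offset, offset_from_start=True):
    """Compute the access address first, then range-check it.

    Hypothetical helper. `start` and `end` stand for the second and
    third boundary addresses; both are treated as inclusive here.
    """
    addr = start + offset if offset_from_start else end - offset
    if not (start <= addr <= end):
        return None  # stands in for the "abnormal access address" signal
    return addr


print(access_address_two_bounds(0x2000, 0x20FF, 0x10))         # from the start
print(access_address_two_bounds(0x2000, 0x20FF, 0x10, False))  # from the end
print(access_address_two_bounds(0x2000, 0x20FF, 0x200))        # outside the region
```

Unlike the boundary-plus-length form, no length is stored, so the check is applied to the computed address rather than to the offset.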
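In terms of technical effect, whichever of the above forms the first physical address information takes, the first access address computed with the above logic falls within the first address space. In other words, accesses by the first NDP unit 212 to the memory 211 do not go beyond the contiguous memory space (the first address space) that the processor allocated to the first NDP unit in advance, which improves the security of accesses by the first NDP unit 212 to the memory 211. When the first offset address is greater than the first length, or the computed first access address does not fall within the first address space, the first NDP unit may output a signal indicating an abnormal access address in order to keep accesses to the memory 211 secure.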
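Refer to FIG. 4. FIG. 4 is a schematic diagram of the logic for computing a memory access address according to an embodiment of this application. The computation may be implemented by hardware modules in the first NDP unit. As shown in FIG. 4, the hardware modules involved in computing the memory access address may include a PC register 410, a first register 420, a second register 430, an adder 440, a comparator 450 and a load/store unit (LSU) 460.
The computation of the first access address is described in detail below with reference to the hardware logic shown in FIG. 4. The hardware modules in FIG. 4 may be a subset of the hardware modules included in the first NDP unit 212.
Specifically, during computation of the first access address, the PC register 410 stores the first offset address. The first register unit 2122 includes the first register 420 and the second register 430, which may respectively store the two items of the first physical address information (the first boundary address and the first length, or the second boundary address and the third boundary address). When the first register 420 and the second register 430 store the first boundary address and the first length respectively, the comparator 450 obtains the first offset address from the PC register 410 and the first length from the second register 430 and compares them. The adder 440 obtains the first offset address from the PC register 410 and the first boundary address from the first register 420, and computes the first access address in one of the four ways described in the foregoing embodiment. The LSU 460 decides, based on the comparator's result, whether to read the data used for computation from the first access address in the memory 211: when the first offset address is less than or equal to the first length, the LSU reads data from the first access address in the first address space; when the first offset address is greater than the first length, the LSU generates an address fault signal.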
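The data path of FIG. 4 can be mimicked in software as a rough behavioural model. The class and attribute names below are hypothetical, the memory is a plain Python list, the first register is assumed to hold the start physical address (case (1) of the four), and the fault signal is modelled as an exception.

```python
class AddressFault(Exception):
    """Stands in for the address fault signal generated by the LSU."""


class NdpAddressPath:
    """Behavioural sketch of PC register, two registers, adder, comparator, LSU."""

    def __init__(self, memory, boundary, length):
        self.memory = memory
        self.pc = 0                # PC register 410: first offset address
        self.reg_boundary = boundary  # first register 420: first boundary address
        self.reg_length = length      # second register 430: first length

    def load(self):
        in_range = self.pc <= self.reg_length   # comparator 450
        address = self.reg_boundary + self.pc   # adder 440
        if not in_range:                        # LSU 460 acts on the comparison
            raise AddressFault(f"offset {self.pc} exceeds length {self.reg_length}")
        return self.memory[address]


memory = list(range(256))          # toy memory: each cell holds its own address
path = NdpAddressPath(memory, boundary=16, length=8)
path.pc = 3
print(path.load())                 # reads the cell at address 16 + 3
```

In hardware the comparator and adder would operate in parallel; the sequential order here is only a consequence of modelling them in software.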
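It should be understood that the hardware logic for computing the memory access address shown in FIG. 4 is only one example in the embodiments of this application. A person skilled in the art may use other hardware or software logic to compute the memory access address, and this application imposes no limitation in this respect.
Refer to FIG. 5. FIG. 5 is a schematic structural diagram of yet another data processing apparatus according to an embodiment of this application, and details some of the modules of the data processing apparatus 200 in FIG. 2 or FIG. 3. As shown in FIG. 5, the data storage apparatus 210 may include E NDP units (the first NDP unit 212, the second NDP unit 213, ..., the E-th NDP unit 214), where E is an integer greater than 2. The E NDP units correspond to E contiguous memory spaces in the memory 211. Specifically, the first address space 2112 is a contiguous memory space that the processor 220 allocates to the first NDP unit 212, the second address space 2113 is a contiguous memory space that the processor 220 allocates to the second NDP unit, ..., and the E-th address space 2114 is a contiguous memory space that the processor 220 allocates to the E-th NDP unit. The processor 220 may allocate all E address spaces at once during initialization. The E NDP units are connected through an interconnection network (crossbar network) 214, and each of the E NDP units is directly connected to the memory 211 through a physical link.
It should be understood that, for convenience, FIG. 5 shows the internal structure of the first NDP unit 212 only. The internal structure of the other E-1 NDP units may be the same as or different from that of the first NDP unit 212, and this application imposes no limitation in this respect. The first NDP unit 212 in the embodiments of this application may be any one of the E NDP units included in the data storage apparatus 210.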
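In a possible implementation, the second NDP unit is configured to store second physical address information, where the second physical address information points to a second address space in the memory and the second address space is a contiguous memory space that the second NDP unit is entitled to use. The memory is configured to store, in the second address space, second data from the processor for near-data processing. The first NDP unit is further configured to read some or all of the second data from the second address space based on an obtained second offset address and the second physical address information, and to perform computation based on some or all of the second data.
Optionally, while the processor 220 runs initialization code to distribute near-data processing tasks to the NDP units, the processor 220 allocates a contiguous memory space in the memory 211 to the second NDP unit 213, namely the second address space 2113. The second address space 2113 is characterized by a second physical base address and a second length: the start physical address of the second address space 2113 is the second physical base address, and the length of the second address space 2113 is the second length. The second address space 2113 is a set of contiguous physical addresses. In addition, while distributing near-data processing tasks, the processor 220, after allocating the second address space 2113 to the second NDP unit 213, further writes the second data for near-data processing into the second address space 2113 through the bus 230.
Optionally, the first NDP unit 212 may obtain the second offset address in the same ways in which it obtains the first offset address. Details are not repeated here.
It should be understood that the process in which the first NDP unit 212 reads some or all of the second data from the second address space and performs computation may be one of the one or more rounds of computation performed by the first NDP unit 212.
In terms of technical effect, the first NDP unit can read data for computation both from the first address space 2112 that the processor 220 allocated to it and from the contiguous memory spaces that the processor 220 allocated to other NDP units. Two NDP units that need to exchange data therefore need only point-to-point communication and no global synchronization, so in the embodiments of this application each NDP unit's access to remote address spaces in the memory scales well.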
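It should be understood that the embodiments of this application use the first NDP unit 212 only as an example to describe how an NDP unit performs computation. When the data storage apparatus 210 includes multiple NDP units, each NDP unit performs computation in the same way as the first NDP unit 212 in the embodiments of this application. Details are not repeated here.
In a possible implementation, the memory is further configured to receive an instruction from the processor, where the instruction instructs the memory to allocate the first address space to the first NDP unit and to allocate the second address space to the second NDP unit.
That is, the first address space and the second address space are contiguous memory spaces that the processor allocates to the first NDP unit and the second NDP unit respectively.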
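In a possible implementation, the first NDP unit further includes a second register unit and a cache unit. With regard to the first NDP unit being further configured to read some or all of the second data from the second address space based on the obtained second offset address and the second physical address information, the first NDP unit is specifically configured to: when the cache unit has cached the second physical address information, obtain, through the NDP core, the second physical address information from the cache unit and update it into the second register unit, and read, through the NDP core, some or all of the second data from the second address space based on the obtained second offset address and the second physical address information; or, when the cache unit has not cached the second physical address information, obtain, through the NDP core, the second physical address information from the second NDP unit and update it into the second register unit and the cache unit, and read, through the NDP core, some or all of the second data from the second address space based on the obtained second offset address and the second physical address information.
The cache unit 2124 is configured to cache, while the first NDP unit 212 performs computation, the second physical address information that points to the second address space 2113.
It should be understood that the second NDP unit 213 may be any NDP unit other than the first NDP unit 212. That is, the second address space 2113 may be any contiguous memory space other than the first address space 2112, and the second address space 2113 is a contiguous memory space that the processor 220 allocates to the second NDP unit 213.
Optionally, in hardware the cache unit 2124 may be a random-access memory (RAM) or any other suitable memory. The RAM may be a static random-access memory (SRAM) or a dynamic random-access memory (DRAM). Logically, the cache unit 2124 may be a table that stores the second physical address and the second length.
Reading, through the NDP core, some or all of the second data from the second address space based on the obtained second offset address and the second physical address information further includes: the NDP core obtains the second physical address information from the second register unit 2123 and, based on the second physical address information and the second offset address, obtains some or all of the second data from the second address space 2113 for the next round of computation.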
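The hit and miss paths described above can be sketched as a small software model. All names here (`NdpUnit`, `resolve_remote`, `remote_fetches`) are hypothetical, the cache is modelled as a dictionary keyed by the peer unit's identifier, the address information is a (start address, length) pair, and fetching from the peer is a plain attribute read in place of a transfer over the interconnection network.

```python
class NdpUnit:
    """Sketch of an NDP unit holding its own and a peer's address information."""

    def __init__(self, unit_id, base, length):
        self.unit_id = unit_id
        self.first_register = (base, length)  # own physical address information
        self.second_register = None           # peer's information, once resolved
        self.cache = {}                       # cache unit: peer id -> (base, length)
        self.remote_fetches = 0               # counts fetches from peer units

    def resolve_remote(self, peer):
        info = self.cache.get(peer.unit_id)
        if info is None:                      # miss: ask the peer, then cache it
            info = peer.first_register
            self.cache[peer.unit_id] = info
            self.remote_fetches += 1
        self.second_register = info           # hit or miss: update the register
        return info

    def read_remote(self, memory, peer, offset):
        base, length = self.resolve_remote(peer)
        if offset > length:
            raise ValueError("address fault: offset exceeds peer region length")
        return memory[base + offset]


memory = list(range(100))                     # toy memory: cell value == address
unit_a = NdpUnit(0, base=0, length=15)
unit_b = NdpUnit(1, base=16, length=15)
print(unit_a.read_remote(memory, unit_b, 4))  # miss: fetches unit_b's information
print(unit_a.read_remote(memory, unit_b, 5))  # hit: served from the cache
print(unit_a.remote_fetches)
```

Only the first remote access pays for the round trip to the peer unit; subsequent accesses to the same peer's address space reuse the cached pair.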
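In terms of technical effect, this application provides a cache unit in the first NDP unit to cache second physical address information that has already been obtained. When that information is needed again in a later computation, it can be taken directly from the cache unit to compute the corresponding access address, and the data used for computation can then be read from that address. Because this caching mechanism is added to the near-data processing flow, the latency of near-data processing can be effectively reduced and computing efficiency and energy efficiency improved.
In a possible implementation, the second physical address information includes a fourth boundary address and a second length. With regard to reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: read the fourth boundary address and the second length from the second register unit, where the fourth boundary address is the start physical address or the end physical address of the second address space and the second length is the length of the second address space; when the second offset address is less than or equal to the second length, compute a second access address based on the second offset address and the fourth boundary address; and read some or all of the second data from the second access address in the second address space.
In a possible implementation, the second physical address information includes a fifth boundary address and a sixth boundary address. With regard to reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: compute a second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address, where the fifth boundary address and the sixth boundary address are respectively the start physical address and the end physical address of the second address space; and, when the second access address lies between the fifth boundary address and the sixth boundary address, read some or all of the second data from the second access address.
Specifically, for how the first NDP unit 212 computes the second access address through the NDP core, refer to the description in the foregoing embodiments of how the NDP core computes the first access address. Details are not repeated here.
The second offset address is an offset relative to the start physical address or the end physical address of the second address space.
In terms of technical effect, accessing the second address space in the memory in the manner described for the first NDP unit in the foregoing embodiments ensures that accesses stay inside the second address space and do not go beyond it, which keeps accesses to the memory secure.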
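In terms of technical effect, because the NDP units in the data storage apparatus are unaware of the first data for near-data processing that is cached in the processor, when this application provides no other cache-coherence mechanism, the data of the first address space cached in the processor needs to be cleared to avoid cache-coherence problems.
In a possible implementation, the processor 220 may clear the data of the first address space cached in the processor 220 before the first NDP unit 212 obtains the data used for computation from the first address space based on the first access address.
In terms of technical effect, because the first NDP unit 212 is unaware of the cache structures of other components, the processor 220 needs to be instructed to clear its cached data of the first address space before the first NDP unit 212 starts computation, and the data of the first address space cached by the processor 220 is not accessed again while the first NDP unit 212 is computing, which avoids cache-coherence problems.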
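In a possible implementation, the first register unit and the second register unit each include at least one register.
The register unit may be a high-speed storage component capable of storing, reading and writing data, with read and write speeds far higher than those of external storage devices such as hard disks and USB flash drives (external storage devices being storage devices other than the computing device's main memory and processor caches). The register unit may be implemented with registers that offer high read and write speeds.
The number of registers in each register unit must be sufficient to hold the address information.
Specifically, when the first register unit 2122 includes one register and the first physical address information contains two items, the first physical address information may be stored in that single register. For example, in a 64-bit register, 48 bits may represent the first boundary address and the remaining 16 bits the length of the first address space (the first length); alternatively, 32 bits may represent the second boundary address and the remaining 32 bits the third boundary address. When the first register unit 2122 includes two registers, the two registers may respectively store the two items of the first physical address information (the first boundary address and the first length, or the second boundary address and the third boundary address). Similarly, the second register unit may store the second physical address information in the same way as the first register unit. Details are not repeated here.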
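The single-register layout can be illustrated with plain bit operations. The sketch below assumes, for illustration only, that the 48-bit boundary address occupies the high bits and the 16-bit length the low bits; the text does not state the bit order, and the function names are hypothetical.

```python
ADDR_BITS = 48  # bits for the first boundary address
LEN_BITS = 16   # bits for the first length


def pack_address_info(boundary, length):
    """Pack (boundary, length) into one 64-bit register value."""
    if not 0 <= boundary < (1 << ADDR_BITS):
        raise ValueError("boundary address does not fit in 48 bits")
    if not 0 <= length < (1 << LEN_BITS):
        raise ValueError("length does not fit in 16 bits")
    return (boundary << LEN_BITS) | length


def unpack_address_info(register):
    """Recover (boundary, length) from the packed register value."""
    return register >> LEN_BITS, register & ((1 << LEN_BITS) - 1)


register = pack_address_info(0x123456789ABC, 0x0400)
print(hex(register))
print([hex(v) for v in unpack_address_info(register)])
```

A 16-bit length field limits the region to 65,535 address units, which is one reason an implementation might prefer two registers or the 32-bit plus 32-bit two-boundary layout.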
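In terms of technical effect, this application uses hardware registers to store the first physical address information that points to the first address space 2112 in the memory 211. Because information in hardware registers is difficult to tamper with and therefore highly secure, computing the first access address in hardware logic as described in this application offers a high level of security. In addition, with the computation logic for the first access address described in this application, the range that the first NDP unit 212 accesses in the memory 211 does not go beyond the contiguous memory space to which the first physical address information points, which keeps accesses to the memory 211 secure. The same applies when the first NDP unit 212 accesses other address spaces in the memory 211.
Optionally, this application may further add simple hardware logic to control access permissions for different blocks in the first address space and thereby strengthen security. For example, a memory protection unit (MPU) may be added to control the read and write permissions of the code block and the data block in the first address space. Similarly, different access permissions may be set for the data blocks and code blocks in other address spaces. Details are not repeated here.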
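A per-block permission check of the kind an MPU performs might look like the sketch below. Everything concrete in it is an assumption made for illustration: the block offsets, the specific permissions (code block readable and executable, data block readable and writable), and the function name `mpu_allows`. The text states only that the code block and the data block may be given different read and write permissions.

```python
# Hypothetical block table: (name, first offset, last offset, permitted accesses).
BLOCKS = [
    ("code", 0x000, 0x0FF, {"r", "x"}),
    ("data", 0x100, 0x2FF, {"r", "w"}),
]


def mpu_allows(offset, access):
    """Return True if `access` ('r', 'w' or 'x') is permitted at `offset`."""
    for _name, first, last, allowed in BLOCKS:
        if first <= offset <= last:
            return access in allowed
    return False  # offsets outside every block are always rejected


print(mpu_allows(0x010, "w"))  # write into the code block
print(mpu_allows(0x120, "w"))  # write into the data block
print(mpu_allows(0x400, "r"))  # read outside all blocks
```

Such a check would complement, not replace, the bounds check on the access address: the bounds check keeps the access inside the address space, while the permission check restricts what may be done within it.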
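In a possible implementation, the first NDP unit is further configured to send a signal to the processor through the bus after completing the near-data processing, where the signal indicates that the first NDP unit has completed near-data processing.
Specifically, after the first NDP unit 212 completes the near-data processing task that the processor 220 assigned to it (which may include one or more rounds of computation), it may send a signal to the processor 220 through the bus 230.
Optionally, the processor 220 may maintain a counter. Each time it receives a signal from an NDP unit, the processor 220 increments the counter by 1 and records the identifier of that NDP unit, where the identifier may be the number of the NDP unit.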
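The processor-side bookkeeping can be sketched as a small class. The names `CompletionTracker`, `on_signal` and `all_done` are hypothetical, and the `all_done` check against an expected number of units is an added assumption; the text describes only the counter and the recorded identifiers.

```python
class CompletionTracker:
    """Counts completion signals and records which NDP units sent them."""

    def __init__(self, expected_units):
        self.expected_units = expected_units  # assumption: number of tasks dispatched
        self.counter = 0
        self.completed_ids = []

    def on_signal(self, ndp_id):
        """Handle one completion signal received over the bus."""
        self.counter += 1
        self.completed_ids.append(ndp_id)

    def all_done(self):
        return self.counter == self.expected_units


tracker = CompletionTracker(expected_units=3)
for ndp_id in (2, 0, 1):           # signals may arrive in any order
    tracker.on_signal(ndp_id)
print(tracker.counter, tracker.completed_ids, tracker.all_done())
```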
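Refer to FIG. 6. FIG. 6 is a schematic diagram of the hardware structure of a data storage apparatus according to an embodiment of this application. The data storage apparatus 600 is a storage apparatus based on a Hybrid Memory Cube (HMC). The data storage apparatus 600 may serve as the data storage apparatus in the data processing apparatus of FIG. 1, FIG. 2, FIG. 3 or FIG. 5.
As shown in FIG. 6, the data storage apparatus 600 includes eight stacked DRAM dies (die A to die H in FIG. 6) and one logic die. The eight stacked DRAM dies serve as the memory and may correspond to the memory 211 in the foregoing embodiments. The logic die serves as the control unit and may include the first NDP unit of the embodiments of FIG. 2 and FIG. 3 or the E NDP units of the embodiment of FIG. 5. The dies may be connected to one another by through-silicon vias (TSVs). Each die in the data storage apparatus 600 may be logically divided into a number of units. As shown in FIG. 6, each layer is divided into 32 regions: for example, die A is divided into 32 regions P00A to P31A, and the logic die is divided into 32 logic units. The memory regions and the logic unit stacked in one vertical column form a vault. The data storage apparatus 600 may be divided into 32 vaults (Vault 00 to Vault 31). Vault 00 includes logic unit 00 and the eight memory regions in the same vertical column (P00A to P00H).
The vaults are connected through an internal interconnection network (crossbar network), and the interconnection network communicates with the processor (not shown in FIG. 6) through the bus (high-speed serial links). The data storage apparatus 600 may be an 8 GB storage apparatus in which each DRAM die has a capacity of 1 GB, and the interconnection network is connected to devices outside the data storage apparatus 600 through eight high-speed serial links of 40 GB/s each.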
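The organization described for FIG. 6 can be checked with a few lines of arithmetic. The sketch below uses only the figures given in the text (eight dies of 1 GB, 32 vaults, eight links of 40 GB/s); the function name `vault_partitions` is hypothetical, and the aggregate link bandwidth is a simple product that ignores protocol overhead, so it should be read as an upper bound rather than a measured value.

```python
DIES = "ABCDEFGH"        # die A to die H
VAULTS = 32              # Vault 00 to Vault 31
DIE_CAPACITY_GB = 1
LINKS = 8
LINK_BANDWIDTH_GBPS = 40  # GB/s per high-speed serial link


def vault_partitions(vault):
    """Names of the memory regions stacked in one vault, e.g. P00A..P00H."""
    return [f"P{vault:02d}{die}" for die in DIES]


total_capacity_gb = len(DIES) * DIE_CAPACITY_GB
aggregate_link_bandwidth = LINKS * LINK_BANDWIDTH_GBPS

print(vault_partitions(0))
print(total_capacity_gb, "GB total capacity")
print(aggregate_link_bandwidth, "GB/s aggregate external link bandwidth")
```

The aggregate figure is the ceiling on traffic between the storage apparatus and the processor; accesses by an NDP unit to the memory regions of its own vault do not cross these links.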
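As shown in FIG. 6, taking Vault 00 as an example, logic unit 00 inside Vault 00 contains a vault controller (VC) VC00 and NDP unit 00. VC00 is integrated in logic unit 00 and is responsible for data read and write operations within Vault 00. An NDP unit may include a near-data processing core (NDP core), a scratchpad memory (SPM), a memory protection unit (MPU) and a direct memory access (DMA) engine. In addition, NDP unit 00 may further include a first register unit, a second register unit and a cache unit, which are omitted from FIG. 6 for simplicity. It should be understood that the internal structure of the other logic units may be the same as that of logic unit 00. Details are not repeated here.
FIG. 7 is a schematic diagram of the layout of a contiguous memory space according to an embodiment of this application. The contiguous memory space 700 shown in FIG. 7 may be a contiguous physical address space within the memory of one vault. During initialization, this physical address space is allocated to the NDP unit in that vault. As shown in FIG. 7, the contiguous memory space includes a .text region, a .data region and a .stack region. The .text region holds the machine instructions compiled from the source code of the computing task and corresponds to the code block in the foregoing embodiments. The .data region holds data and corresponds to the data block in the foregoing embodiments. The .stack region is a reserved stack space.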
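The initialization process (that is, the distribution of near-data processing tasks) is described below using the contiguous memory space 700 in FIG. 7 as an example.
First, the processor may dynamically distribute computing tasks over the bus to the NDP units in the vaults according to the computing demand and load of the system. Specifically, by running a piece of initialization code, the processor may allocate a contiguous memory space to each NDP unit in a single pass and then write the corresponding data for near-data processing into the contiguous memory space of each NDP unit.
Specifically, writing the data for near-data processing into each contiguous memory space may include distributing the computation logic and distributing the computation data. Taking the contiguous memory space 700 as an example, when distributing the computation logic, the processor writes the compiled binary of the computing task into the .text region of the contiguous memory space 700 and then sets the PC register in the NDP core to store the corresponding offset address. When distributing the computation data, the processor writes the corresponding data into the .data region of the contiguous memory space 700. Note that if the data partitioning and data placement strategy chosen at memory allocation time is appropriate, the processor may already have distributed the computation data while handling other computing tasks. In that case, only the computation logic needs to be distributed.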
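The initialization flow can be sketched as follows. This is a simplified model under several assumptions that the text does not make: the functions `allocate_regions` and `dispatch` are hypothetical, the memory is a Python `bytearray`, the regions are placed back to back, the .data region directly follows the .text region, the remainder of the space is treated as .stack, and the PC offset is initialized to 0.

```python
def allocate_regions(base, sizes):
    """One-pass allocation of back-to-back contiguous regions: [(start, size), ...]."""
    regions, cursor = [], base
    for size in sizes:
        regions.append((cursor, size))
        cursor += size
    return regions


def dispatch(memory, region, text, data):
    """Write the computation logic and data into one NDP unit's region."""
    start, size = region
    if len(text) + len(data) > size:
        raise ValueError("task does not fit in the allocated region")
    memory[start:start + len(text)] = text              # logic distribution
    data_start = start + len(text)
    memory[data_start:data_start + len(data)] = data    # data distribution
    stack_start = data_start + len(data)
    return {
        "pc": 0,  # assumed initial offset of the first instruction
        ".text": (start, len(text)),
        ".data": (data_start, len(data)),
        ".stack": (stack_start, start + size - stack_start),
    }


memory = bytearray(64)
regions = allocate_regions(0, [32, 32])        # two NDP units, 32 bytes each
layout = dispatch(memory, regions[1], b"\x01\x02", b"\x0a\x0b\x0c")
print(regions)
print(layout)
```

When the computation data is already in place from an earlier task, only the .text write and the PC setup would need to be repeated.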
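After the processor has distributed the near-data processing tasks to the NDP units, each NDP unit may start computing independently. While the NDP unit shown in FIG. 6 is computing, the SPM stores the data that the NDP unit needs to run, the MPU protects the data in the memory by assigning different access permissions to different regions of the contiguous memory space, and the DMA engine moves data between the memory and the SPM.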
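Refer to FIG. 8. FIG. 8 is a schematic flowchart of a data processing method according to an embodiment of this application. The method is applied to a data storage apparatus that includes a memory and a first near-data processing (NDP) unit, where the first NDP unit is electrically connected to the memory and the data storage apparatus is connected to a processor through a bus. The method includes:
Step S810: Store first physical address information by using the first NDP unit, where the first physical address information points to a first address space in the memory and the first address space is a contiguous memory space that the first NDP unit is entitled to use.
Step S820: Store, by the memory in the first address space, first data from the processor for near-data processing.
Step S830: Read, by the first NDP unit, some or all of the first data from the first address space based on an obtained first offset address and the first physical address information, and perform computation based on some or all of the first data.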
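Steps S810 to S830 can be strung together in a toy end-to-end model. The class and method names are hypothetical, the physical address information is assumed to be a (start address, length) pair, offsets are taken relative to the start address, and a summation stands in for the unspecified computation.

```python
class DataStorageApparatus:
    """Toy model of a memory plus one NDP unit, following steps S810-S830."""

    def __init__(self, size):
        self.memory = [0] * size
        self.address_info = None  # held in the NDP unit's register unit

    def store_address_info(self, start, length):
        """S810: the NDP unit stores the first physical address information."""
        self.address_info = (start, length)

    def write_first_data(self, values):
        """S820: the memory stores first data written by the processor."""
        start, length = self.address_info
        if len(values) > length:
            raise ValueError("first data does not fit in the first address space")
        self.memory[start:start + len(values)] = values

    def compute_sum(self, count):
        """S830: read `count` items by offset, then compute on them."""
        start, length = self.address_info
        total = 0
        for offset in range(count):
            if offset > length:               # bounds check before every read
                raise ValueError("address fault")
            total += self.memory[start + offset]
        return total


apparatus = DataStorageApparatus(32)
apparatus.store_address_info(start=8, length=4)
apparatus.write_first_data([1, 2, 3, 4])
print(apparatus.compute_sum(4))
```

Note that the NDP unit never sees a virtual address: every read is the stored start address plus an offset, which is the property the method relies on to avoid address translation.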
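In a possible implementation, the first NDP unit includes a first register unit and a near-data processing core (NDP core). Storing the first physical address information by using the first NDP unit includes: storing the first physical base address and the first length in the first register unit. Reading, by the first NDP unit, some or all of the first data from the first address space based on the obtained first offset address and the first physical address information includes: obtaining, by the NDP core, the first physical address information from the first register unit, and reading some or all of the first data from the first address space based on the first physical address information and the first offset address.
In a possible implementation, the first physical address information includes a first boundary address and a first length. Reading some or all of the first data from the first address space based on the first physical address information and the first offset address includes: when the first offset address is less than or equal to the first length, computing, by the NDP core, a first access address based on the first offset address and the first boundary address, where the first boundary address is the start physical address or the end physical address of the first address space and the first length is the length of the first address space; and reading, by the NDP core, some or all of the first data from the first access address in the first address space.
In a possible implementation, the first physical address information includes a second boundary address and a third boundary address. Reading some or all of the first data from the first address space based on the first physical address information and the first offset address includes: computing, by the NDP core, a first access address based on the first offset address and the second boundary address, or based on the first offset address and the third boundary address, where the second boundary address and the third boundary address are respectively the start physical address and the end physical address of the first address space; and, when the first access address lies between the second boundary address and the third boundary address, reading, by the NDP core, some or all of the first data from the first access address.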
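In a possible implementation, the data storage apparatus further includes a second NDP unit that is electrically connected to the memory. The method further includes: storing second physical address information by using the second NDP unit, where the second physical address information points to a second address space in the memory and the second address space is a contiguous memory space that the second NDP unit is entitled to use; storing, by the memory in the second address space, second data from the processor for near-data processing; and reading, by the first NDP unit, some or all of the second data from the second address space based on an obtained second offset address and the second physical address information, and performing computation based on some or all of the second data.
In a possible implementation, the method further includes: receiving, by the memory, an instruction from the processor, where the instruction instructs the memory to allocate the first address space to the first NDP unit and to allocate the second address space to the second NDP unit.
In a possible implementation, the first NDP unit further includes a second register unit and a cache unit. Reading some or all of the second data from the second address space based on the obtained second offset address and the second physical address information includes: when the cache unit has cached the second physical address information, obtaining, through the NDP core, the second physical address information from the cache unit and updating it into the second register unit, and reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information; or, when the cache unit has not cached the second physical address information, obtaining, through the NDP core, the second physical address information from the second NDP unit and updating it into the second register unit and the cache unit, and reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information.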
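In a possible implementation, the second physical address information includes a fourth boundary address and a second length. Reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information includes: reading, through the NDP core, the fourth boundary address and the second length from the second register unit, where the fourth boundary address is the start physical address or the end physical address of the second address space and the second length is the length of the second address space; when the second offset address is less than or equal to the second length, computing, by the NDP core, a second access address based on the second offset address and the fourth boundary address; and reading, by the NDP core, some or all of the second data from the second access address in the second address space.
In a possible implementation, the second physical address information includes a fifth boundary address and a sixth boundary address. Reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information includes: computing, by the NDP core, a second access address based on the second offset address and the fifth boundary address, or based on the second offset address and the sixth boundary address, where the fifth boundary address and the sixth boundary address are respectively the start physical address and the end physical address of the second address space; and, when the second access address lies between the fifth boundary address and the sixth boundary address, reading, by the NDP core, some or all of the second data from the second access address.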
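In a possible implementation, the first register unit and the second register unit each include at least one register.
In a possible implementation, the method further includes: after the first NDP unit completes the near-data processing, sending, by the first NDP unit, a signal to the processor through the bus, where the signal indicates that the first NDP unit has completed near-data processing.
An embodiment of this application provides a data processing apparatus that includes a processor, the data storage apparatus provided in any implementation of the foregoing embodiments, and a discrete component coupled to the data storage apparatus. The data processing apparatus may be the data processing apparatus described in any of the embodiments of FIG. 2, FIG. 3 and FIG. 5.
An embodiment of this application provides a computer storage medium that stores a computer program. When the computer program is executed, the first NDP unit 212 can perform some or all of the steps of any of the foregoing method embodiments to complete the computation described above.
An embodiment of this application provides a computer program that includes instructions. When the computer program is executed by the processor or the first NDP unit 212, the first NDP unit 212 can perform some or all of the steps of any of the foregoing method embodiments.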
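Each of the foregoing embodiments is described with its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions of the other embodiments. It should be noted that, for brevity, the foregoing method embodiments are each described as a series of actions. A person skilled in the art should understand, however, that this application is not limited by the order of the actions described, because according to this application some steps may be performed in a different order or simultaneously. A person skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and that the actions and modules involved are not necessarily required by this application.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical or take other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units: they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Finally, the foregoing embodiments are intended only to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application.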
Claims (24)
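- A data storage apparatus, wherein the data storage apparatus comprises a memory and a first near-data processing (NDP) unit, the first NDP unit is electrically connected to the memory, and the data storage apparatus is connected to a processor through a bus; wherein the first NDP unit is configured to store first physical address information, the first physical address information points to a first address space in the memory, and the first address space is a contiguous memory space that the first NDP unit is entitled to use; the memory is configured to store, in the first address space, first data from the processor for near-data processing; and the first NDP unit is further configured to read some or all of the first data from the first address space based on an obtained first offset address and the first physical address information, and to perform computation based on some or all of the first data.
- The apparatus according to claim 1, wherein the first NDP unit comprises a first register unit and a near-data processing core (NDP core); and the first NDP unit is specifically configured to: store the first physical address information in the first register unit; and, through the NDP core, obtain the first physical address information from the first register unit and read some or all of the first data from the first address space based on the first physical address information and the first offset address.
- The apparatus according to claim 2, wherein the first physical address information comprises a first boundary address and a first length; and the NDP core is specifically configured to: when the first offset address is less than or equal to the first length, compute a first access address based on the first offset address and the first boundary address, wherein the first boundary address is a start physical address of the first address space or an end physical address of the first address space, and the first length is a length of the first address space; and read some or all of the first data from the first access address in the first address space.
- The apparatus according to claim 2, wherein the first physical address information comprises a second boundary address and a third boundary address; and the NDP core is specifically configured to: compute a first access address based on the first offset address and the second boundary address, or compute the first access address based on the first offset address and the third boundary address, wherein the second boundary address and the third boundary address are respectively a start physical address of the first address space and an end physical address of the first address space; and, when the first access address lies between the second boundary address and the third boundary address, read some or all of the first data from the first access address.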
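- The apparatus according to any one of claims 1 to 4, wherein the data storage apparatus further comprises a second NDP unit, and the second NDP unit is electrically connected to the memory; the second NDP unit is configured to store second physical address information, wherein the second physical address information points to a second address space in the memory, and the second address space is a contiguous memory space that the second NDP unit is entitled to use; the memory is configured to store, in the second address space, second data from the processor for near-data processing; and the first NDP unit is further configured to read some or all of the second data from the second address space based on an obtained second offset address and the second physical address information, and to perform computation based on some or all of the second data.
- The apparatus according to claim 5, wherein the memory is further configured to: receive an instruction from the processor, wherein the instruction instructs the memory to allocate the first address space to the first NDP unit and to allocate the second address space to the second NDP unit.
- The apparatus according to claim 5 or 6, wherein the first NDP unit further comprises a second register unit and a cache unit; and, with regard to the first NDP unit being further configured to read some or all of the second data from the second address space based on the obtained second offset address and the second physical address information, the first NDP unit is specifically configured to: when the cache unit has cached the second physical address information, obtain, through the NDP core, the second physical address information from the cache unit and update it into the second register unit, and read, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information; or, when the cache unit has not cached the second physical address information, obtain, through the NDP core, the second physical address information from the second NDP unit and update it into the second register unit and the cache unit, and read, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information.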
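- The apparatus according to claim 7, wherein the second physical address information comprises a fourth boundary address and a second length; and, with regard to reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: read the fourth boundary address and the second length from the second register unit, wherein the fourth boundary address is a start physical address of the second address space or an end physical address of the second address space, and the second length is a length of the second address space; when the second offset address is less than or equal to the second length, compute a second access address based on the second offset address and the fourth boundary address; and read some or all of the second data from the second access address in the second address space.
- The apparatus according to claim 7, wherein the second physical address information comprises a fifth boundary address and a sixth boundary address; and, with regard to reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information, the NDP core is specifically configured to: compute a second access address based on the second offset address and the fifth boundary address, or compute the second access address based on the second offset address and the sixth boundary address, wherein the fifth boundary address and the sixth boundary address are respectively a start physical address of the second address space and an end physical address of the second address space; and, when the second access address lies between the fifth boundary address and the sixth boundary address, read some or all of the first data from the second access address.
- The apparatus according to any one of claims 5 to 9, wherein the first register unit and the second register unit each comprise at least one register.
- The apparatus according to any one of claims 1 to 10, wherein the first NDP unit is further configured to send a signal to the processor through the bus after completing the near-data processing, and the signal indicates that the first NDP unit has completed near-data processing.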
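- A data processing method, applied to a data storage apparatus, wherein the data storage apparatus comprises a memory and a first near-data processing (NDP) unit, the first NDP unit is electrically connected to the memory, and the data storage apparatus is connected to a processor through a bus; and the method comprises: storing first physical address information by using the first NDP unit, wherein the first physical address information points to a first address space in the memory, and the first address space is a contiguous memory space that the first NDP unit is entitled to use; storing, by the memory in the first address space, first data from the processor for near-data processing; and reading, by the first NDP unit, some or all of the first data from the first address space based on an obtained first offset address and the first physical address information, and performing computation based on some or all of the first data.
- The method according to claim 12, wherein the first NDP unit comprises a first register unit and a near-data processing core (NDP core); the storing first physical address information by using the first NDP unit comprises: storing the first physical base address and the first length in the first register unit; and the reading, by the first NDP unit, some or all of the first data from the first address space based on the obtained first offset address and the first physical address information comprises: obtaining, by the NDP core, the first physical address information from the first register unit, and reading some or all of the first data from the first address space based on the first physical address information and the first offset address.
- The method according to claim 13, wherein the first physical address information comprises a first boundary address and a first length; and the reading some or all of the first data from the first address space based on the first physical address information and the first offset address comprises: when the first offset address is less than or equal to the first length, computing, by the NDP core, a first access address based on the first offset address and the first boundary address, wherein the first boundary address is a start physical address of the first address space or an end physical address of the first address space, and the first length is a length of the first address space; and reading, by the NDP core, some or all of the first data from the first access address in the first address space.
- The method according to claim 13, wherein the first physical address information comprises a second boundary address and a third boundary address; and the reading some or all of the first data from the first address space based on the first physical address information and the first offset address comprises: computing, by the NDP core, a first access address based on the first offset address and the second boundary address, or computing the first access address based on the first offset address and the third boundary address, wherein the second boundary address and the third boundary address are respectively a start physical address of the first address space and an end physical address of the first address space; and, when the first access address lies between the second boundary address and the third boundary address, reading, by the NDP core, some or all of the first data from the first access address.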
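- The method according to any one of claims 12 to 15, wherein the data storage apparatus further comprises a second NDP unit, and the second NDP unit is electrically connected to the memory; and the method further comprises: storing second physical address information by using the second NDP unit, wherein the second physical address information points to a second address space in the memory, and the second address space is a contiguous memory space that the second NDP unit is entitled to use; storing, by the memory in the second address space, second data from the processor for near-data processing; and reading, by the first NDP unit, some or all of the second data from the second address space based on an obtained second offset address and the second physical address information, and performing computation based on some or all of the second data.
- The method according to claim 16, wherein the method further comprises: receiving, by the memory, an instruction from the processor, wherein the instruction instructs the memory to allocate the first address space to the first NDP unit and to allocate the second address space to the second NDP unit.
- The method according to claim 16 or 17, wherein the first NDP unit further comprises a second register unit and a cache unit; and the reading some or all of the second data from the second address space based on the obtained second offset address and the second physical address information comprises: when the cache unit has cached the second physical address information, obtaining, through the NDP core, the second physical address information from the cache unit and updating it into the second register unit, and reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information; or, when the cache unit has not cached the second physical address information, obtaining, through the NDP core, the second physical address information from the second NDP unit and updating it into the second register unit and the cache unit, and reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information.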
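- The method according to claim 18, wherein the second physical address information comprises a fourth boundary address and a second length; and the reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information comprises: reading, through the NDP core, the fourth boundary address and the second length from the second register unit, wherein the fourth boundary address is a start physical address of the second address space or an end physical address of the second address space, and the second length is a length of the second address space; when the second offset address is less than or equal to the second length, computing, by the NDP core, a second access address based on the second offset address and the fourth boundary address; and reading, by the NDP core, some or all of the second data from the second access address in the second address space.
- The method according to claim 18, wherein the second physical address information comprises a fifth boundary address and a sixth boundary address; and the reading, through the NDP core, some or all of the second data from the second address space based on the second offset address and the second physical address information comprises: computing, by the NDP core, a second access address based on the second offset address and the fifth boundary address, or computing the second access address based on the second offset address and the sixth boundary address, wherein the fifth boundary address and the sixth boundary address are respectively a start physical address of the second address space and an end physical address of the second address space; and, when the second access address lies between the fifth boundary address and the sixth boundary address, reading, by the NDP core, some or all of the first data from the second access address.
- The method according to any one of claims 18 to 20, wherein the first register unit and the second register unit each comprise at least one register.
- The method according to any one of claims 12 to 21, wherein the method further comprises: after the first NDP unit completes the near-data processing, sending, by the first NDP unit, a signal to the processor through the bus, wherein the signal indicates that the first NDP unit has completed near-data processing.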
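- A computer storage medium, wherein the computer storage medium stores a computer program, and when the computer program is executed, the data processing method according to any one of claims 12 to 21 is implemented.
- A computer program, wherein the computer program comprises instructions, and when the computer program is executed, the data processing method according to any one of claims 12 to 21 is implemented.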
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22863291.5A EP4354309A4 (en) | 2021-08-30 | 2022-08-25 | Data storage apparatus and data processing method |
| US18/592,356 US20240281381A1 (en) | 2021-08-30 | 2024-02-29 | Data storage apparatus and data processing method |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111008166.4 | 2021-08-30 | ||
| CN202111008166.4A CN115729845A (zh) | 2021-08-30 | 2021-08-30 | 数据存储装置和数据处理方法 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/592,356 Continuation US20240281381A1 (en) | 2021-08-30 | 2024-02-29 | Data storage apparatus and data processing method |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023030153A1 true WO2023030153A1 (zh) | 2023-03-09 |
Family
ID=85291386
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2022/114735 Ceased WO2023030153A1 (zh) | 2021-08-30 | 2022-08-25 | 数据存储装置和数据处理方法 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20240281381A1 (zh) |
| EP (1) | EP4354309A4 (zh) |
| CN (1) | CN115729845A (zh) |
| WO (1) | WO2023030153A1 (zh) |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN117312330B (zh) * | 2023-11-29 | 2024-02-09 | 中国人民解放军国防科技大学 | 基于便签式存储的向量数据聚集方法、装置及计算机设备 |
| CN119759807B (zh) * | 2025-03-07 | 2025-06-06 | 山东云海国创云计算装备产业创新中心有限公司 | 缓存数据读取方法、装置、计算机设备及存储介质 |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20160350074A1 (en) * | 2015-06-01 | 2016-12-01 | Samsung Electronics Co., Ltd. | Highly efficient inexact computing storage device |
| CN109213697A (zh) * | 2017-06-30 | 2019-01-15 | 英特尔公司 | 智能存储器数据存储或加载方法和装置 |
| CN110019004A (zh) * | 2017-09-08 | 2019-07-16 | 华为技术有限公司 | 一种数据处理方法、装置及系统 |
Family Cites Families (9)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8639858B2 (en) * | 2010-06-23 | 2014-01-28 | International Business Machines Corporation | Resizing address spaces concurrent to accessing the address spaces |
| CN103140894B (zh) * | 2010-08-17 | 2017-08-22 | 技术研究及发展基金公司 | 在非易失性存储器(nvm)单元中减轻单元间耦合效应 |
| GB2547893B (en) * | 2016-02-25 | 2018-06-06 | Advanced Risc Mach Ltd | Combining part of an offset with a corresponding part of a base address and comparing with a reference address |
| KR102724161B1 (ko) * | 2016-12-16 | 2024-10-30 | 에스케이하이닉스 주식회사 | 니어-데이터 처리를 수행하는 메모리 장치 및 이를 포함하는 시스템 |
| GB2578135B (en) * | 2018-10-18 | 2020-10-21 | Advanced Risc Mach Ltd | Range checking instruction |
| US11669454B2 (en) * | 2019-05-07 | 2023-06-06 | Intel Corporation | Hybrid directory and snoopy-based coherency to reduce directory update overhead in two-level memory |
| WO2021028723A2 (en) * | 2019-08-13 | 2021-02-18 | Neuroblade Ltd. | Memory-based processors |
| CN111159094A (zh) * | 2019-12-05 | 2020-05-15 | 天津芯海创科技有限公司 | 一种基于risc-v的近数据流式计算加速阵列 |
| CN111400202A (zh) * | 2020-03-13 | 2020-07-10 | 宁波中控微电子有限公司 | 应用于片上控制系统的寻址方法、模块及片上控制系统 |
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4354309A4 |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116795494A (zh) * | 2023-08-23 | 2023-09-22 | 北京紫光芯能科技有限公司 | 内存保护单元信息的处理方法、系统以及可读介质 |
| CN116795494B (zh) * | 2023-08-23 | 2024-01-02 | 北京紫光芯能科技有限公司 | 内存保护单元信息的处理方法、系统以及可读介质 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN115729845A (zh) | 2023-03-03 |
| EP4354309A4 (en) | 2024-08-14 |
| US20240281381A1 (en) | 2024-08-22 |
| EP4354309A1 (en) | 2024-04-17 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22863291; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022863291; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022863291; Country of ref document: EP; Effective date: 20240110 |
| | NENP | Non-entry into the national phase | Ref country code: DE |