WO2023072048A1 - Network storage method, storage system, data processing unit, and computer system - Google Patents
Network storage method, storage system, data processing unit, and computer system
- Publication number
- WO2023072048A1 (PCT/CN2022/127306)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- host
- controller
- data processing
- processing unit
- storage device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4221—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0635—Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0679—Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/14—Session management
- H04L67/141—Setup of application sessions
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
- H04L67/566—Grouping or aggregating service requests, e.g. for unified processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/08—Protocols for interworking; Protocol conversion
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/14—Multichannel or multilink protocols
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the technical field of storage, and in particular to a network storage method, a storage system, a data processing unit and a computer system.
- multipath software needs to be installed on the host, and the host uses the multipath software and a host bus adapter (HBA) card to distribute input/output (I/O) requests to different controllers in the storage array.
- if the host uses open-source or third-party multipath software, that software often cooperates poorly with the controllers in the storage array, which may lead to unbalanced load among the controllers or to a large amount of I/O forwarding between controllers. If the host instead uses multipath software customized by the storage array provider, which cooperates with the controllers in the storage array to balance the distribution, corresponding customized multipath software must be developed for each host operating system. However, there are currently many types and versions of operating systems, which makes this software development very difficult.
- embodiments of the present application provide a network storage method, a storage system, a data processing unit, and a computer system.
- the host does not need to install multipath software and can still achieve balanced distribution of I/O requests among controllers, avoiding massive forwarding of I/O requests among controllers.
- the present application provides a network storage method for a storage system
- the storage system includes a host, a data processing unit, and a storage device
- the data processing unit is connected to the host through a PCIe interface
- the storage device includes multiple controllers and multiple logical units
- the network storage method includes: the host sends an input/output request to the data processing unit, where the input/output request is used to access one of the multiple logical units; the data processing unit determines, from the multiple controllers, a controller for processing the input/output request, so as to achieve load balancing among the multiple controllers, and sends the input/output request to the determined controller.
- the data processing unit can serve as an external device (PCIe device) of the host and be connected to the host through a PCIe interface on the host. Therefore, the host can send the I/O request directly to the data processing unit and does not need to be concerned with the distribution (allocation and sending) of the I/O request among the multiple controllers in the storage device, which reduces the burden on the processor in the host, such as the central processing unit (CPU).
- the data processing unit determines a controller for processing the I/O request from the multiple controllers, so that the load can be balanced among the controllers and forwarding of the I/O request among the controllers can be avoided as far as possible; the data processing unit then sends the I/O request to the determined controller.
- the data processing unit communicates with the host through the NVMe protocol, and the data processing unit communicates with the controller in the storage device through the NVMe-oF protocol.
- the data processing unit is also responsible for completing the protocol conversion between the host and the storage device.
- the host does not need to pay attention to the interaction with the storage device.
- the network protocol processing, I/O distribution, and other tasks originally completed by the processor in the host are now offloaded to the data processing unit for execution, which can reduce the pressure on the processor in the host.
- the host communicates with the data processing unit using the NVMe protocol, the host can regard the data processing unit as a local NVMe storage device, and the data processing unit is responsible for the interaction with the real storage device.
- the data processing unit communicates with storage devices (remote or in the cloud) through high-performance storage protocols such as NVMe-oF, which helps to improve the network storage efficiency of the host, does not affect the performance of the host operating system, and makes it easier to expand the host's external storage capacity.
- the data processing unit determining, from the multiple controllers, the controller used to process the input/output request includes: the data processing unit determines, according to a hash algorithm, the controller used to process the input/output request from the multiple controllers, so as to achieve load balancing among the multiple controllers.
- the data processing unit can use the hash algorithm to balance the load among the controllers.
- a consistent hash algorithm can be used in cooperation with the controllers, with the shards into which a logical unit is divided serving as the granularity of load balancing. Each I/O request issued by the host is distributed to a controller in the storage device, avoiding forwarding of the I/O request among the controllers.
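The shard-granularity consistent hashing described above can be sketched as follows. This is a minimal illustration, not the method of the application: the class name `ConsistentHashRing`, the use of SHA-256, the number of virtual nodes, and the key format `"<LUN UUID>:<shard index>"` are all assumptions made for the example.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (SHA-256 is an arbitrary choice here)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """Hypothetical ring that maps (LUN, shard) pairs to controllers."""

    def __init__(self, controllers, vnodes=64):
        # Place each controller on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{c}#{i}"), c) for c in controllers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def controller_for(self, lun_uuid: str, shard_index: int) -> str:
        h = _hash(f"{lun_uuid}:{shard_index}")
        # Pick the first ring point clockwise from h, wrapping at the end.
        idx = bisect.bisect_right(self._points, h) % len(self._points)
        return self._ring[idx][1]


def shard_of(offset: int, shard_size: int) -> int:
    """Index of the shard that contains the given byte offset."""
    return offset // shard_size


ring = ConsistentHashRing(["A", "B", "C", "D"])
target = ring.controller_for("lun1-uuid", shard_of(5 * 2**20, 2**20))
```

Because the mapping depends only on the LUN identifier and the shard index, every request for the same shard reaches the same controller, which is what removes the need for controller-to-controller forwarding in this sketch.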
- the method further includes: the data processing unit sends a report command to the storage device, where the report command instructs the storage device to send information about the logical unit corresponding to the host to the data processing unit; the data processing unit receives the information about the logical unit corresponding to the host sent by the controller in the storage device, generates a corresponding device for the logical unit according to that information, and sends the information of the generated device to the host.
- the logical unit corresponding to the host refers to the part of the logical unit allocated by the storage device to the host.
- the storage device may allocate one or more logical units in the storage device to the host according to the storage resource requirements of the host or other factors.
- the data processing unit can scan logical units on behalf of the host, so as to find the logical units allocated to the host by the storage device and then report them to the host. Since there is no affiliation relationship between the controllers and the logical units in the storage device, the storage device can send the information of the logical unit corresponding to the host to the data processing unit through each controller. After the data processing unit receives the information sent by each controller, it aggregates that information, generates a corresponding device for each logical unit corresponding to the host, and finally sends the generated device information to the host through the NVMe protocol.
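The aggregation step described above can be sketched as follows, assuming that each controller reports the same logical unit under the same UUID. The names `LunInfo` and `aggregate_lun_reports` and the `capacity_bytes` field are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LunInfo:
    uuid: str            # globally unique LUN identifier
    capacity_bytes: int  # illustrative attribute, not named in the text


def aggregate_lun_reports(reports):
    """Merge per-controller LUN reports into one entry per LUN.

    `reports` maps a controller name to the list of LunInfo it reported.
    Returns (devices, paths): `devices` maps UUID -> LunInfo, and `paths`
    maps UUID -> sorted list of controllers that reported that LUN.
    """
    devices = {}
    paths = {}
    for controller, luns in reports.items():
        for lun in luns:
            devices.setdefault(lun.uuid, lun)  # keep one device per UUID
            paths.setdefault(lun.uuid, []).append(controller)
    for uuid in paths:
        paths[uuid].sort()
    return devices, paths


lun1 = LunInfo("uuid-1", 100 * 2**30)
lun2 = LunInfo("uuid-2", 200 * 2**30)
devices, paths = aggregate_lun_reports({"A": [lun1, lun2], "B": [lun1, lun2]})
```

In this sketch the host would be told about two devices, even though four reports arrived over two controllers.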
- the method further includes: the host generates a virtual storage device corresponding to the device according to the device information sent by the data processing unit, and the virtual storage device is provided to applications in the host for access.
- the host can abstract the device into a corresponding virtual storage device, and will not directly expose the device to applications in the host.
- the present application provides a storage system
- the storage system includes a host, a data processing unit and a storage device, the data processing unit is connected to the host through a PCIe interface, and the storage device includes multiple controllers and multiple logic units;
- the host is used to send an input/output request to the data processing unit, where the input/output request is used to access one of the multiple logical units;
- the data processing unit is used to determine, from the multiple controllers, the controller for processing the input/output request, so as to achieve load balancing among the multiple controllers, and to send the input/output request to the determined controller.
- the data processing unit is further configured to: communicate with the host through the NVMe protocol; communicate with the controller in the storage device through the NVMe-oF protocol.
- determining a controller for processing an input/output request from the multiple controllers includes: determining, according to a hash algorithm, the controller for processing the input/output request from the multiple controllers, so as to achieve load balancing among the multiple controllers.
- the data processing unit is further configured to: send a report command to the storage device, where the report command instructs the storage device to send the information of the logical unit corresponding to the host to the data processing unit; receive the information of the logical unit corresponding to the host sent by the controller in the storage device; according to that information, generate a corresponding device for the logical unit corresponding to the host, and send the information of the generated device to the host.
- the host is further configured to: generate a virtual storage device corresponding to the device according to the device information sent by the data processing unit, where the virtual storage device is provided to applications in the host for access.
- the present application provides a data processing unit, the data processing unit is configured to: receive an input/output request sent by a host, where the data processing unit is connected to the host through a PCIe interface, the input/output request is used to access a logical unit in a storage device, and the storage device includes multiple logical units and multiple controllers; determine, from the multiple controllers, the controller used to process the input/output request, so as to achieve load balancing among the multiple controllers; and send the input/output request to the determined controller.
- the data processing unit is a special processor, which can be understood as a data processing unit chip here.
- as a PCIe device outside the host, the chip is connected to the host through the PCIe interface and can then provide the host with functions such as balanced distribution of I/O requests, network protocol processing, and logical unit scanning, reducing the pressure on the processor in the host.
- the data processing unit is further configured to: communicate with the host through the NVMe protocol; and communicate with the controller in the storage device through the NVMe-oF protocol.
- the data processing unit is further configured to: determine a controller for processing the input/output request from a plurality of controllers according to a hash algorithm, so as to implement load balancing among them.
- the data processing unit is further configured to: send a report command to the storage device, where the report command instructs the storage device to send the information of the logical unit corresponding to the host to the data processing unit; receive the information of the logical unit corresponding to the host sent by the controller in the storage device; according to that information, generate a corresponding device for the logical unit corresponding to the host, and send the information of the generated device to the host.
- the present application provides a computer system, including a host and the data processing unit in any one of the embodiments of the third aspect above, and the data processing unit is connected to the host through a PCIe interface.
- the embodiment of the present application configures a data processing unit for the host (the data processing unit is connected to the host through a PCIe interface as a PCIe device outside the host), and offloads tasks such as balanced distribution of I/O requests, network protocol processing, and logical unit scanning to the data processing unit, which can reduce the pressure on the processor in the host.
- the host does not need to install multi-path software, it only needs to be equipped with the data processing unit, which can avoid the difficulty of developing corresponding customized multi-path software for many types of operating systems.
- almost all types of operating systems currently support the NVMe protocol, and the data processing unit can be driven by using the NVMe driver in the host operating system. Therefore, the data processing unit is not limited by the type and version of the host operating system, has good adaptability/generality, and can be widely used.
- the data processing unit communicates with the host through the NVMe protocol, and appears to the host as a local NVMe device, so the host does not need to pay attention to the interaction with the storage device.
- the data processing unit communicates with the controller in the storage device through the NVMe-oF protocol to achieve high-performance data access. In effect, it extends the storage space of the host to storage devices located remotely or in the cloud, which can simplify the design of the host, reduce the local storage capacity required by the host as much as possible, and save costs.
- the data processing unit determines the controller used to process the I/O request from among the multiple controllers of the storage device, so as to realize load balancing among the controllers.
- a hash algorithm can be used in cooperation with the controllers, and I/O can be distributed among the controllers with shards as the granularity of load balancing, so as to avoid forwarding of I/O requests between the controllers.
- the data processing unit is also used to realize the function of scanning the logic unit, and send a report command to the storage device to instruct the storage device to send the information of the logic unit corresponding to the host to the data processing unit.
- after receiving the information of the logical unit corresponding to the host from each controller, the data processing unit generates a corresponding device for each logical unit corresponding to the host and then sends the generated device information to the host, so that the host can perceive the storage space allocated by the storage device (that is, the logical unit corresponding to the host) and can then access data in the allocated storage space.
- FIG. 1 is a schematic diagram of a host relying on multipath software to implement I/O request distribution provided by an embodiment of the present application;
- FIG. 2 is a schematic diagram of distributing I/O requests to different controllers in units of fragments provided by the embodiment of the present application;
- FIG. 3 is an architecture diagram of a storage system provided by an embodiment of the present application.
- FIG. 4 is a schematic flowchart of a network storage method provided by an embodiment of the present application.
- FIG. 5 is a schematic diagram of the DPU according to the embodiment of the present application distributing the I/O request issued by the host to each controller;
- FIG. 6 is a schematic diagram of distributing I/O requests sent by the host to different controllers by the multipath module provided by the embodiment of the present application;
- FIG. 7 is a schematic flow diagram of reporting a LUN to a host by a storage device provided in an embodiment of the present application;
- FIG. 8 is a schematic diagram of reporting LUN1 and LUN2 to the host provided by the embodiment of the present application.
- FIG. 9 is a schematic structural diagram of a computer system provided by an embodiment of the present application.
- Data processing unit (DPU)
- the DPU is a broad category of special-purpose processors. It is the third important computing chip in data center scenarios, after the central processing unit (CPU) and the graphics processing unit (GPU), and provides computing engines for high-bandwidth, low-latency, data-intensive computing scenarios.
- DPU has the versatility and programmability of CPU, but it is more specialized. DPU is distinguished from CPU by a greater degree of parallelism.
- Logical unit number (LUN)
- a LUN may have two types of identifiers with different functions, namely a LUN ID and a universally unique identifier (UUID).
- LUN ID is the identification of LUN in the device, which is used to distinguish different LUNs in the device. However, the LUN ID is only unique within the same device, and may overlap with the LUN ID of a LUN in other devices.
- the UUID is also an identifier of the LUN, and it is globally unique. It should be noted that after the LUN is mounted on the host, the LUN seen from the host is the host LUN, and its identifier is called the host LUN ID. Because the host LUN is not actual physical space but only the mapping of the LUN on the host, the host LUN ID can also be regarded as an identifier of the LUN.
- LUN has no ownership, which means that there is no ownership relationship between each LUN and (storage) controller, and each controller can access any LUN.
- the disks in the two disk enclosures together form a storage pool, and each controller in the storage system can access the storage space of the storage pool, so the storage pool is global.
- the access cache is shared between the controllers (or the caches between the controllers are mutually backed up), that is, the cache is also global.
- the storage space of a storage pool is divided into LUNs for use by hosts, and each LUN has a corresponding identifier. A LUN does not belong to any controller, and each controller can access it. For example, assume that LUN1 is divided into 12 slices, and each slice occupies a logical space of a set size in LUN1. When the host has I/O requests that access slices in LUN1, these I/O requests are issued through the storage area network (SAN) and distributed to the controllers (this is only one example of how I/O may be scattered and distributed).
- controller A is assigned the I/O requests that access slice 1, slice 3, or slice 7 in LUN1, and controller A then accesses LUN1 according to the assigned I/O requests and performs the corresponding I/O operations.
- controller B is assigned the I/O requests that access slice 2, slice 4, or slice 11 in LUN1, so controller B also accesses LUN1 according to its assigned I/O requests...
- LUN1 does not belong to any controller, and all four controllers can access LUN1.
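The 12-slice example can be made concrete with a small lookup sketch. The assignments for controllers A and B follow the example above; the assignments for controllers C and D, the equal slice size, and all function names are assumptions added so that the table is complete.

```python
SLICE_COUNT = 12

# A and B follow the example in the text; the entries for C and D are
# made up here so that every slice of LUN1 has a controller.
ASSIGNMENT = {
    "A": (1, 3, 7),
    "B": (2, 4, 11),
    "C": (5, 8, 10),
    "D": (6, 9, 12),
}
SLICE_TO_CONTROLLER = {
    slice_no: ctrl for ctrl, slices in ASSIGNMENT.items() for slice_no in slices
}


def slice_number(offset: int, lun_size: int) -> int:
    """Return the 1-based slice of the LUN that contains `offset`."""
    if lun_size <= 0 or lun_size % SLICE_COUNT != 0:
        raise ValueError("lun_size must be a positive multiple of SLICE_COUNT")
    if not 0 <= offset < lun_size:
        raise ValueError("offset is outside the LUN")
    return offset // (lun_size // SLICE_COUNT) + 1


def controller_for(offset: int, lun_size: int) -> str:
    """Controller that handles I/O for the slice containing `offset`."""
    return SLICE_TO_CONTROLLER[slice_number(offset, lun_size)]


lun1_size = 12 * 1024
first = controller_for(0, lun1_size)           # slice 1
last = controller_for(lun1_size - 1, lun1_size)  # slice 12
```

Every controller in the table reads and writes the same LUN1; the table only decides which controller services which slice.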
- FIG. 3 is an architecture diagram of a storage system 300 provided in an embodiment of the present application
- the storage system 300 includes one or more hosts 301 (only one host 301 is shown in the figure as an example), a DPU 302, and a storage device 303.
- each host 301 has a corresponding DPU 302, and the DPU 302 is connected to the host 301 through a peripheral component interconnect express (PCIe) interface (or connected to a processor in the host through a PCIe bus).
- the DPU 302 in the figure is located outside the host 301, but the DPU 302 can also be directly integrated into the host 301 or connected to the host 301 in the form of a plug-in card.
- the host 301, the DPU 302, and the storage device 303 are specifically introduced below.
- the host 301 may be a server, a notebook computer, a desktop computer, etc., which is not specifically limited in this application.
- the operating system (OS) of the host 301 includes a non-volatile memory express (NVMe) driver, and the NVMe driver is used to drive NVMe devices.
- various application programs may be installed in the host 301, and the user may trigger an I/O request through the application programs to access data.
- the DPU 302 provides one or more types of network ports for the host 301, so that the host 301 can be connected to a corresponding type of network and can then interact with the storage device 303. Therefore, the host 301 does not need a separate network interface card (NIC).
- for example, the DPU 302 provides an RDMA over Converged Ethernet (RoCE) network port for the host 301, so that the host 301 can be connected to the RoCE network and can thereby store data in or fetch data from the storage device 303.
- the DPU 302 can also access the storage device 303 through a switch to access data. Only one network port is drawn in Figure 3, but it does not mean that there is only one or one type of network port.
- the DPU 302 may include an NVMe controller, a multipath module, and an NVMe over Fabrics (referred to as NVMe-oF or NoF) initiator:
- the NVMe controller presents the DPU 302 as an NVMe device. From the perspective of the host 301, the DPU 302 is equivalent to a local NVMe device, so the host 301 can use its own NVMe driver to drive the DPU 302. The host 301 and the DPU 302 use the NVMe protocol for communication.
- the multipath module is used to aggregate the LUNs reported to the DPU 302 by the storage device 303 through multiple paths (in other words, through multiple controllers 305), and the aggregated LUNs are reported to the host 301 by the NVMe controller.
- the multipath module is also used to distribute the I/O requests delivered by the host 301 to the controllers 305 in the storage device 303 in order to achieve load balancing among the controllers 305.
- the NoF initiator (also referred to as NoF INI) is used to establish a NoF connection with a NoF target and to implement functions such as communication link establishment, LUN scanning, and I/O delivery.
- the DPU 302 can also include more or fewer modules.
- the three modules in the figure (NVMe controller, multipath module, and NoF initiator) are only an exemplary division. Each module can be further split into multiple functional modules, or multiple modules can be combined into one module, which is not limited in this application.
- the storage device 303 may be a storage array, a hard disk enclosure, a server (cluster), a desktop computer, etc., which is not specifically limited in this application.
- the storage device 303 includes a storage medium 304 and a plurality of controllers 305 .
- the storage medium 304 is configured to provide storage space.
- the storage space of the storage medium 304 is virtualized into a storage pool, and then divided into multiple LUNs. There is no affiliation relationship between the LUNs and the controller 305, and each controller 305 can control access to any LUN.
- where n is a positive integer greater than 1, the n logical units are denoted LUN1, LUN2, ..., LUNn.
- Each controller 305 in the storage device 303 can access any one of the n logical units, so there is no affiliation relationship between the controller 305 and the LUN.
- the storage medium 304 can be a single type of memory, such as one or more solid-state drives; it can also be a combination of multiple types of memory, such as a combination of solid-state drives and mechanical hard disks; it can also be one or more hard disk enclosures, each of which can hold one or more hard disks, and so on. It should be noted that this application does not specifically limit the storage medium 304.
- the controller 305 is configured to perform access control on the storage medium 304, and perform corresponding I/O operations according to received I/O requests.
- each controller 305 has a corresponding interface card so that the controller 305 can be connected to the network.
- each controller 305 has a RoCE interface card, so that the controller 305 can be connected to the RoCE network.
- besides the RoCE network, other types of networks supporting the NoF protocol can also be used, which is not limited in this application, and the original network structure does not need to be changed.
- the interface card in FIG. 3 is located outside the controller 305, but the interface card can also be directly integrated into the controller 305 or connected to the controller 305 in the form of a plug-in card, which is not limited in this application.
- each controller 305 in the storage device 303 has an internal cache, and the controllers 305 may share their caches or use them as mutual backups.
- take as an example a storage device 303 that is a storage array including two controllers 305 (controller A and controller B). There is a mirror channel between controller A and controller B. After controller A writes a piece of data to its cache, it can send a copy of the data to controller B through the mirror channel, and controller B stores the copy in its own cache. Controller A and controller B are therefore mutual backups: when controller A fails, controller B can take over the services of controller A, and when controller B fails, controller A can take over the services of controller B, which prevents a hardware failure from making the entire storage array unavailable. Similarly, if four controllers 305 are deployed in the storage array, there is a mirror channel between any two controllers 305, so any two controllers 305 are mutual backups.
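The two-controller mirroring and takeover behaviour described above can be modelled with a toy sketch. The `Controller` class, the dictionary used as a cache, and the `serving_controller` helper are illustrative assumptions; a real mirror channel and failure detection are far more involved.

```python
class Controller:
    """Toy model of a storage controller with a mirrored write cache."""

    def __init__(self, name):
        self.name = name
        self.cache = {}    # address -> data
        self.peer = None   # controller at the other end of the mirror channel
        self.alive = True

    def write(self, address, data):
        self.cache[address] = data
        # Send a copy over the (simulated) mirror channel to the peer.
        if self.peer is not None and self.peer.alive:
            self.peer.cache[address] = data


def pair(a, b):
    """Establish a mirror channel between two controllers."""
    a.peer, b.peer = b, a


def serving_controller(preferred):
    """Return the preferred controller, or its peer if it has failed."""
    if preferred.alive:
        return preferred
    if preferred.peer is not None and preferred.peer.alive:
        return preferred.peer
    raise RuntimeError("no controller available")


ctrl_a, ctrl_b = Controller("A"), Controller("B")
pair(ctrl_a, ctrl_b)
ctrl_a.write(0x10, b"payload")
ctrl_a.alive = False                   # simulate a hardware failure
survivor = serving_controller(ctrl_a)  # controller B takes over
```

Because the write was mirrored before the failure, the surviving controller already holds the cached data when it takes over.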
- FIG. 4 is a schematic flowchart of a network storage method provided by an embodiment of the present application. The method is used in the above-mentioned storage system 300, and includes the following steps:
- the host 301 sends an I/O request to the DPU 302, the I/O request is used to access one of the logic units in the storage device 303, and the DPU is connected to the host 301 through a PCIe interface.
- the host 301 can send one or more I/O requests to the DPU 302 at a time, and each I/O request can access any one of the multiple logical units or a specified logical unit.
- an I/O request is taken as an example, but it does not mean that there can only be one I/O request.
- each I/O request includes address information, and the address information indicates the address segment targeted by the I/O request, so as to store/fetch data on the address segment.
- the address information may directly indicate the address segment to be accessed by the I/O request, or may indirectly indicate the address segment to be accessed by the I/O request, which is not limited in this application.
- if the host 301 sends a read request, the address information in the read request indicates the address segment to be accessed by the read request, so that data is read from the address segment; if the host 301 sends a write request, the write request also includes the data to be written, and the address information in the write request indicates the address segment to be accessed by the write request, so that the data is stored on the address segment.
- the storage medium 304 in the storage device 303 is used to provide storage space, but the storage space of the storage medium 304 is not directly exposed to the host 301. Instead, it is virtualized into a storage pool, which can then be divided into LUNs and provided to the host 301 for use. Therefore, in some possible embodiments, the address information includes one or more types of LUN identifiers.
- the identifier of the LUN here can be the UUID of the LUN, the identifier to which the LUN in the storage medium 304 is mapped in the host 301 (such as the host LUN ID), the identifier of the block device (virtual storage device) corresponding to the LUN in the host 301, and so on.
- the specific location of data in a LUN can be determined by the start address and the length of the data.
- those skilled in the art usually call the start address a logical block address (LBA). Therefore, in a possible embodiment, the address information includes the LUN identifier, the LBA, and the length, and these three factors together identify a specific address segment. An I/O request sent by the host 301 is thus located on an address segment, so as to read data from or write data to that address segment.
- in addition to carrying the LUN ID, LBA, and length as address information in the I/O request, other logical addresses can also be used to construct the address information, such as a virtual space ID, a virtual space start address, and a length. This application does not specifically limit how the address information is expressed.
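- as a minimal sketch of the address information described above, an I/O request can be modelled as an operation plus a (LUN identifier, LBA, length) triple. The names `AddressInfo`, `IORequest`, and `segment` are hypothetical, and expressing the LBA and the length in blocks is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressInfo:
    lun_id: str  # LUN identifier, e.g. a UUID or a host LUN ID
    lba: int     # start address (logical block address), assumed to be in blocks
    length: int  # length of the address segment, assumed to be in blocks

    def segment(self):
        # The address segment covers blocks [lba, lba + length) of the LUN.
        return (self.lun_id, self.lba, self.lba + self.length)


@dataclass(frozen=True)
class IORequest:
    op: str               # "read" or "write"
    address: AddressInfo
    data: bytes = b""     # only a write request carries data to be written


req = IORequest("write", AddressInfo("LUN1", lba=2048, length=8), data=b"x" * 8)
```

Here `req.address.segment()` yields the half-open block range that the write request targets in LUN1.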
- various application programs can be installed in the host 301. A user can trigger the host 301 to generate an I/O request to access data through an application program running in the host 301, and the host 301 then sends the generated I/O request to the DPU 302 through the NVMe protocol.
- the DPU 302 determines the controller 305 for processing the I/O request from the multiple controllers 305 in the storage device 303, so as to realize load balancing among the multiple controllers 305, and sends the I/O request to the determined controller 305.
- when the DPU 302 receives one or more I/O requests issued by the host 301, the DPU 302 determines, from the multiple controllers 305 in the storage device 303, a controller 305 for processing each of the one or more I/O requests, and then sends each I/O request to the controller 305 determined for it, so as to realize load balancing among the multiple controllers 305.
- each controller 305 in the storage device 303 has a path to the DPU 302. When the DPU 302 receives one or more I/O requests sent by the host 301, the DPU 302 determines a path to the storage device 303 for each of the I/O requests, that is, the I/O requests are distributed over multiple paths, which is equivalent to indirectly determining a controller 305 for each I/O request. The DPU 302 then sends each I/O request over the path determined for it to the controller 305 corresponding to that path, so that load balancing among the multiple controllers 305 can be achieved.
- the storage device 303 includes two controllers 305, namely controller A and controller B. Both controller A and controller B have a path to the DPU 302. For ease of description, the path between controller A and the DPU 302 is called path 1, and the path between controller B and the DPU 302 is called path 2. Both path 1 and path 2 are paths between the storage device 303 and the DPU 302.
- the logical unit LUN1 in the storage medium 304 is divided into six slices, which are represented by slices 1 to 6 respectively, and each slice occupies a logical space of a set size in the LUN1.
- the host 301 generates some I/O requests, and these I/O requests respectively need to access different slices in the LUN1, and then the host 301 sends these I/O requests to the DPU 302.
- the DPU 302 determines, for each I/O request, a controller 305 for processing the I/O request, so as to realize load balancing between controller A and controller B.
- the DPU 302 includes an NVMe controller, a multipath module, and a NoF initiator. The NVMe controller is responsible for receiving one or more I/O requests sent by the host 301 through the NVMe protocol and handing them over to the multipath module in the DPU 302. The multipath module then determines, for each of the one or more I/O requests, a controller 305 for processing the I/O request, so that the load is distributed evenly among the multiple controllers 305 in the storage device 303. Finally, the NoF initiator sends the one or more I/O requests to the determined controllers 305 according to the allocation made by the multipath module.
- LUN1 is one of the logical units divided in the storage medium 304.
- the logical space of LUN1 is divided into six slices, represented by slice 1 to slice 6. Each slice occupies a logical space of a set size in LUN1. It can be understood that an I/O request to access any slice in LUN1 is an I/O request to access LUN1.
- assume that the host 301 generates some I/O requests that respectively access different slices in LUN1, and that the host 301 then sends these I/O requests to the DPU 302 through the NVMe protocol. After receiving these I/O requests, the NVMe controller in the DPU 302 sends them to the multipath module.
- the multipath module determines a path to the storage device 303 for each of these I/O requests, so as to realize load balancing between controller A and controller B. The I/O requests for accessing slice 1, slice 3, or slice 5 in LUN1 are determined to be sent to the storage device 303 through path 1, which is equivalent to allocating them to controller A; the I/O requests for accessing slice 2, slice 4, or slice 6 in LUN1 are determined to be sent to the storage device 303 through path 2, which is equivalent to allocating them to controller B. Finally, according to the allocation made by the multipath module, the NoF initiator sends the I/O requests for accessing slice 1, slice 3, or slice 5 to controller A through path 1, and sends the I/O requests for accessing slice 2, slice 4, or slice 6 to controller B through path 2.
- the controller A and the controller B will respectively access corresponding locations in the storage medium 304 according to the I/O requests they receive, and perform corresponding I/O operations.
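- the hand-off between the three DPU components in this example can be sketched as follows. This is only an illustration under stated assumptions: the classes `NVMeController`, `MultipathModule`, and `NoFInitiator` are toy stand-ins, the request is a plain dictionary that already names its slice, and the modulo rule in `pick_path` is a simple substitute for whatever hash algorithm an implementation actually uses.

```python
class MultipathModule:
    """Chooses a path (and thus a controller) for each I/O request."""

    def __init__(self, paths):
        self.paths = list(paths)

    def pick_path(self, slice_no):
        # Stand-in for the hash: with two paths, odd slices map to the
        # first path and even slices map to the second path.
        return self.paths[(slice_no - 1) % len(self.paths)]


class NoFInitiator:
    """Records what would be sent over the fabric, keyed by path."""

    def __init__(self):
        self.sent = {}

    def send(self, path, request):
        self.sent.setdefault(path, []).append(request)


class NVMeController:
    """Front end that receives requests from the host and hands them on."""

    def __init__(self, multipath, initiator):
        self.multipath = multipath
        self.initiator = initiator

    def receive(self, requests):
        for request in requests:
            path = self.multipath.pick_path(request["slice"])
            self.initiator.send(path, request)


initiator = NoFInitiator()
dpu_front_end = NVMeController(MultipathModule(["path 1", "path 2"]), initiator)
dpu_front_end.receive([{"lun": "LUN1", "slice": n} for n in range(1, 7)])
slices_by_path = {p: [r["slice"] for r in reqs] for p, reqs in initiator.sent.items()}
```

With six requests, one per slice, `slices_by_path` shows slices 1, 3, and 5 going out on path 1 (controller A) and slices 2, 4, and 6 on path 2 (controller B), matching the allocation in the example.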
- the multipath module can perform load balancing according to a hash algorithm: it determines, from the multiple controllers 305, a controller 305 for processing the I/O request, so as to realize load balancing among the controllers 305, and the I/O request is then sent to the determined controller 305.
- the hash algorithm actually adopted can cooperate with the controllers 305 to distribute I/O between the controllers 305 in units of slices, and try to avoid the forwarding of I/O requests among the controllers 305 .
- the multipath module can allocate the I/O requests for accessing slice 1, slice 3, or slice 5 in a logical unit to controller A, and allocate the I/O requests for accessing slice 2, slice 4, or slice 6 in the same logical unit to controller B. That is to say, the I/O requests for accessing different slices of the same logical unit are evenly distributed between controller A and controller B, which realizes load balancing between controller A and controller B.
- the multipath module can also use other load balancing algorithms to evenly distribute the I/O requests issued by the host 301 to each controller 305, which is not specifically limited in this application.
- the multi-path module can also determine the corresponding controller 305 for the I/O request according to factors such as the visit volume of each controller per unit time or the CPU utilization rate of each controller, so as to realize load balancing among the controllers 305 .
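- as a minimal sketch of a load-based alternative to the hash algorithm, the multipath module could pick the controller with the lowest current load. The functions `pick_least_loaded` and `dispatch` are hypothetical, and using a count of outstanding requests as the load metric is an assumption; the text only mentions access volume per unit time or CPU utilization as possible factors.

```python
def pick_least_loaded(loads):
    """Return the controller name with the smallest load value.

    `loads` maps a controller name to a load metric (assumed here to be
    the number of outstanding requests). Ties go to the lowest name.
    """
    return min(sorted(loads), key=lambda name: loads[name])


def dispatch(requests, loads):
    """Assign each request to the least-loaded controller, updating loads."""
    assignment = {}
    for request in requests:
        target = pick_least_loaded(loads)
        assignment[request] = target
        loads[target] += 1  # the chosen controller now has one more request
    return assignment


loads = {"A": 3, "B": 1}
assignment = dispatch(["r1", "r2", "r3"], loads)
```

Unlike the slice-based scheme, this policy does not keep requests for the same slice on the same controller, so whether it avoids forwarding between controllers depends on the storage device.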
- the multipath module can determine, according to the slice to be accessed by the I/O request, the controller 305 corresponding to that slice as the controller 305 for processing the I/O request, and the NoF initiator then sends the I/O request to the determined controller 305 through the NoF protocol.
- LUN1 in the storage device 303 has no ownership relationship with each controller 305 , and any controller 305 can access LUN1 , but each slice in LUN1 has a corresponding relationship with each controller 305 .
- LUN1 is divided into six slices. Slice 1, slice 3, and slice 5 correspond to controller A, while slice 2, slice 4, and slice 6 correspond to controller B. That is, all I/O requests that access slice 1, slice 3, or slice 5 in LUN1 are handled by controller A, and all I/O requests that access slice 2, slice 4, or slice 6 in LUN1 are handled by controller B.
- This correspondence allocates the I/O requests for accessing LUN1 to different controllers 305 at slice granularity, which helps to achieve load balancing between controller A and controller B. Therefore, when the multipath module receives an I/O request to access LUN1, it can first determine which slice in LUN1 the I/O request actually accesses (assume it is slice 1), then determine controller A as the controller 305 for processing the I/O request according to the correspondence between slice 1 and controller A, and finally send the I/O request to controller A.
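- the lookup described above, from an I/O request to a slice and then to a controller, can be sketched as follows. The slice size `SLICE_SIZE_BLOCKS`, the table `SLICE_TO_CONTROLLER`, and the helper names are assumptions made for illustration; the application only states that each slice occupies a logical space of a set size. The sketch also assumes that a request does not span two slices.

```python
SLICE_SIZE_BLOCKS = 1024  # assumed slice size, in blocks

# Correspondence between the six slices of LUN1 and the two controllers.
SLICE_TO_CONTROLLER = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}


def slice_of(lba):
    """Return the 1-based slice number that contains the given LBA."""
    return lba // SLICE_SIZE_BLOCKS + 1


def controller_for(lba):
    """Return the controller that handles the slice containing the LBA."""
    return SLICE_TO_CONTROLLER[slice_of(lba)]
```

For example, `controller_for(0)` falls in slice 1 and resolves to controller A, while `controller_for(1024)` falls in slice 2 and resolves to controller B.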
- the host 301 is equipped with a corresponding DPU 302, and the DPU 302 is used as an external PCIe device of the host 301, and is connected to the host 301 through a PCIe interface.
- the DPU 302 offloads tasks such as multipath distribution of I/O requests from the host 301, network protocol processing, and scanning LUNs to itself, reducing the pressure on the CPU in the host 301.
- the design of the host 301 is simplified, and the cost can be saved at the same time.
- the host 301 does not need to install multipath software, so software developers are not required to additionally develop multipath software, which avoids the difficulty of developing customized multipath software for many operating systems.
- the DPU 302 can be driven by the NVMe driver in the host operating system. Therefore, the DPU 302 is not limited by the type or version of the host operating system and can be widely used in various hosts 301.
- the host 301 can directly treat the DPU 302 as a local NVMe device, so the host 301 does not need to pay attention to the communication with the storage device 303 and only needs to send the generated I/O request directly to the DPU 302, which has less impact on the performance of the host 301.
- the NVMe controller in the DPU 302 is responsible for implementing the NVMe protocol, and performs NVMe communication with the host 301.
- after the NVMe controller receives the I/O requests sent by the host 301 through the NVMe protocol, it sends these I/O requests to the multipath module in the DPU 302.
- the multi-path module is responsible for distributing I/O requests to the controllers 305 in the storage device 303 to achieve balanced distribution of I/O requests among multiple controllers 305 .
- a hash algorithm can be used in cooperation with the controllers 305 so that I/O is distributed among the controllers 305 in units of slices, which balances the load among the controllers 305 and avoids the forwarding of a large number of I/O requests between the controllers 305.
- the NoF initiator in the DPU 302 communicates with the controllers 305 in the storage device 303 through the NVMe-oF protocol, which can achieve high-performance data access. In other words, the storage space of the host 301 is extended to a storage device 303 located at the remote end or on the cloud, so the design of the host 301 can be simplified, the local storage capacity required by the host 301 can be reduced as much as possible, and cost can be saved.
- another advantage of having the DPU 302 be responsible for the balanced distribution of I/O requests among the multiple controllers 305 is that the hardware design of the storage device 303 can be simplified: the storage device 303 side does not need an additional load balancing design (such as a custom chip that distributes I/O requests evenly among the multiple controllers 305), and each controller 305 only needs to process the I/O requests it receives.
- the network storage method further includes: each controller 305 in the storage device 303 respectively processes the I/O request received by itself.
- each controller 305 in the storage device 303 determines the address segment to be accessed by the I/O request according to the address information in the received I/O request, and then executes the corresponding I/O operation on the address segment.
- for example, an I/O request received by controller A carries the ID of LUN8, an LBA, and a length as address information. According to the ID of LUN8, the LBA, and the length, it can be determined that the I/O request is to access the address segment in LUN8 whose start address is the LBA and whose length is the given length, and the corresponding I/O operation is then performed on this address segment. If it is a write request, the corresponding data is written into this address segment; if it is a read request, the corresponding data is read from this address segment.
- after controller A determines the address segment according to the I/O request, it actually needs to determine the corresponding global address according to the address segment (an address segment can be indexed to a global address, and the space indicated by the global address is unique in the storage pool), and then access the corresponding physical location in the storage medium 304 according to the global address (the corresponding physical address can be determined from the global address, and the physical address indicates on which memory the space represented by the global address is actually located and the offset within that memory, that is, the location in physical space).
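- the two-step translation described above, from an address segment to a global address and from a global address to a physical location, can be sketched as follows. Everything in this sketch is an assumption for illustration: the index `POOL_INDEX`, the extent size `EXTENT_BLOCKS`, the per-device capacity `DEVICE_BLOCKS`, and the rule that a global address maps to a device by simple division. The application does not describe the actual index structures.

```python
EXTENT_BLOCKS = 256     # assumed size of one indexed extent, in blocks
DEVICE_BLOCKS = 0x8000  # assumed capacity of one storage device, in blocks

# Hypothetical index: (LUN id, extent number) -> base global address in the pool.
POOL_INDEX = {("LUN8", 0): 0x10000}


def to_global(lun_id, lba):
    """Map a logical block address in a LUN to a global address in the pool."""
    extent, offset = divmod(lba, EXTENT_BLOCKS)
    return POOL_INDEX[(lun_id, extent)] + offset


def to_physical(global_addr):
    """Map a global address to (device number, offset within that device)."""
    return divmod(global_addr, DEVICE_BLOCKS)


global_addr = to_global("LUN8", 10)
device_no, device_offset = to_physical(global_addr)
```

The point of the sketch is only the layering: the controller resolves the logical address to a pool-wide unique address first, and resolves that address to a device and offset second.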
- the storage device 303 needs to report the LUN corresponding to the host 301 to the host 301.
- the process of reporting the LUN by the storage device 303 to the host 301 will be introduced in conjunction with the storage system 300 .
- FIG. 7 is a schematic flowchart of a storage device 303 reporting a LUN to a host 301 according to an embodiment of the present application, which is used in the storage system 300, and includes the following steps:
- the DPU 302 sends a report command to the storage device 303, and the report command instructs the storage device 303 to send the LUN corresponding to the host 301 to the DPU 302.
- the storage medium 304 in the storage device 303 is virtualized into a storage pool, and then divided into LUNs for use by the host 301.
- the LUN corresponding to the host 301 refers to the LUN or LUNs that the storage device 303 allocates to the host 301.
- the storage device 303 may provide one or more LUNs to the host 301 according to the resource demand of the host 301 or other factors.
- reporting one or more LUNs refers to reporting the relevant information of those LUNs.
- the relevant information of the LUN may include the LUN identification, logical address, capacity, etc., which is not specifically limited in this application.
- the DPU 302 may send the report command to the storage device 303 through multiple paths respectively.
- each controller 305 in the storage device 303 can have a path to the DPU 302, and the path between any controller 305 in the storage device 303 and the DPU 302 is a path between the storage device 303 and the DPU 302. Since there are multiple controllers 305 in the storage device 303, there are multiple paths between the storage device 303 and the DPU 302. Therefore, when the DPU 302 sends the report command to the storage device 303 through multiple paths, it may send the report command to the multiple controllers 305 in the storage device 303 respectively.
- the report command sent by the DPU 302 to the storage device 303 carries the identifier of the host 301 and/or the port information of the DPU 302 that transmits the report command.
- the DPU 302 can perceive a change of the LUN configuration in the storage device 303 (for example, a LUN is newly allocated to the host 301) and actively send a report command to the storage device 303. That is, the DPU 302 can perform the function of scanning LUNs in place of the host 301, so that the host 301 can perceive changes in the logical units allocated to it by the storage device 303.
- the storage device 303 determines the LUN corresponding to the host 301 according to the report command, and sends the information of the LUN corresponding to the host 301 to the DPU 302 through the controller 305 in the storage device 303.
- the storage device 303 sends the information of the LUN corresponding to the host 301 to the DPU 302 through multiple paths between the storage device 303 and the DPU 302. It should be understood that since there is no affiliation relationship between the LUN and the controller 305, the same LUN should be reported through multiple controllers 305, that is, reported through multiple paths, so that the DPU 302 can determine multiple access paths for the same LUN .
- after the storage device 303 determines the LUN corresponding to the host 301 according to the report command, it generates LUN report information, and then sends the LUN report information to the DPU 302 through multiple paths.
- the LUN report information includes information about the LUN corresponding to the host 301 , such as the LUN identifier, logical address range, capacity, and the like. It can be understood that the storage device 303 sends the LUN report information to the DPU 302 respectively through multiple paths between the storage device 303 and the DPU 302, so the DPU 302 will receive the LUN report information from different paths. In other words, multiple controllers 305 will respectively send the LUN report information to the DPU 302, and the DPU 302 will receive the LUN report information from different controllers 305.
- each controller 305 through which the LUN report information passes adds the identifier of its own port (or the identifier of the controller 305) to the LUN report information, that is, the corresponding path information is added to the LUN report information.
- the LUN report information sent to the DPU 302 by controller A in the storage device 303 carries the identifier of a port of controller A, and the LUN report information sent to the DPU 302 by controller B carries the identifier of a port of controller B, so that LUN report information from different paths can be distinguished.
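- the reporting step can be sketched as follows, assuming that the storage device keeps a table of the LUNs allocated to each host and that each controller tags its report with its own port identifier. The table `ALLOCATIONS`, the function `handle_report_command`, the host identifier `host-301`, and the port names are hypothetical and serve only to show how the DPU can derive multiple access paths for the same LUN.

```python
# Hypothetical allocation table kept by the storage device: host id -> LUN info.
ALLOCATIONS = {
    "host-301": [
        {"lun": "LUN1", "capacity_gib": 100},
        {"lun": "LUN2", "capacity_gib": 50},
    ]
}


def handle_report_command(host_id, port_id):
    """Build LUN report information for a host, tagged with the sending port."""
    return [{**lun, "port": port_id} for lun in ALLOCATIONS.get(host_id, [])]


# The DPU sends the report command over both paths and collects both replies.
reports = (handle_report_command("host-301", "A-port0")
           + handle_report_command("host-301", "B-port0"))

# Group by LUN so that each LUN ends up with the list of paths that reported it.
paths_by_lun = {}
for entry in reports:
    paths_by_lun.setdefault(entry["lun"], []).append(entry["port"])
```

Because the same LUN arrives once per controller, `paths_by_lun` records two candidate paths for each LUN, which is what lets the multipath module balance requests across controllers later.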
- the DPU 302 generates a corresponding device for each LUN corresponding to the host 301 according to the LUN information sent by the controller in the storage device 303, and reports the generated device information to the host 301.
- the NoF initiator in the DPU 302 is responsible for receiving the LUN report information sent by each controller 305, and then generating a corresponding device for each LUN indicated in the LUN report information sent by each controller 305. The device is a virtual device.
- the multipath module aggregates the devices corresponding to the same LUN among the devices generated by the NoF initiator to obtain the aggregated devices.
- the NVMe controller represents the aggregated devices with namespaces (namespace, NS) defined in the NVMe protocol, assigns corresponding namespace identifiers, and then reports to the host 301 .
- the multipath module sets each path through which the storage device 303 reported a given LUN as a path for accessing that LUN.
- the path between controller A and DPU 302 is path 1
- the path between controller B and DPU 302 is path 2.
- the relevant information of LUN1 is sent to DPU 302 through controller A and controller B respectively, so the multi-path module in DPU 302 will set path 1 and path 2 as the paths to access LUN1.
- when the DPU 302 receives an I/O request to access LUN1, it can choose between these two paths according to a preset strategy (such as a certain load balancing strategy), and then send the I/O request through the determined path to the controller 305 corresponding to that path in the storage device 303.
- the host 301 creates a corresponding virtual storage device according to the device information reported by the DPU 302.
- the host 301 generates a corresponding virtual storage device based on the received device information, which is equivalent to abstracting the device reported by the DPU 302, and does not directly expose it to the application in the host 301.
- the application can only perceive the created virtual storage devices and access these virtual storage devices.
- after the NVMe initiator in the host 301 receives the namespace information sent by the NVMe controller in the DPU 302, it registers the namespace as a block device in the host 301 (a block device is a kind of virtual storage device that is provided to the applications in the host 301 for operation). Registration includes assigning a corresponding name to the block device, establishing a mapping relationship between the block device and the namespace, and recording other information in the block device, such as the logical address space and the capacity.
- for example, there are two controllers 305 in the storage device 303, namely controller A and controller B. Both controller A and controller B have a path to the DPU 302, and both paths are paths between the storage device 303 and the DPU 302.
- the two logical units LUN1 and LUN2 divided in the storage device 303 are both LUNs corresponding to the host 301. Since there is no ownership relationship between the LUNs and the controllers 305, any controller 305 can perform access control on LUN1 and LUN2.
- the storage device 303 can determine the LUN1 and LUN2 corresponding to the host 301 according to the identifier of the host 301 in the report command, and then report the LUN1 and LUN2 to the DPU 302 through controller A and controller B respectively, that is, through two different paths Send the information of LUN1 and LUN2 to DPU 302.
- the NoF initiator in the DPU 302 generates a corresponding device for LUN1, named NoF11, according to the information of LUN1 sent by controller A, and generates another corresponding device for LUN1, named NoF12, according to the information of LUN1 sent by controller B. It can be seen that since LUN1 is reported through two different paths, LUN1 is recognized as two devices by the NoF initiator, and both devices actually correspond to LUN1. Similarly, the NoF initiator generates two corresponding devices for LUN2, namely NoF21 and NoF22, according to the information of LUN2 sent by controller A and controller B respectively.
- the multipath module can recognize that NoF11 and NoF12 correspond to the same LUN (namely LUN1), and then aggregate these two devices into device Dev1, so Dev1 corresponds to LUN1. Similarly, the multipath module aggregates NoF21 and NoF22 into the same device Dev2. Since the NVMe controller is used to implement the NVMe protocol, the NVMe controller needs to use the namespaces defined in the NVMe protocol to represent Dev1 and Dev2, and identifies them as NS1 and NS2 respectively. NS1 and NS2 actually correspond to LUN1 and LUN2 in the storage device 303. Then, the NVMe controller reports the information of NS1 and NS2 to the host 301.
- after the NVMe driver in the host 301 receives the information of NS1 and NS2 reported by the DPU 302, it registers NS1 and NS2 in the block device layer and generates two corresponding block devices, represented by nvme0n1 and nvme0n2 respectively, which are provided to upper-layer applications for operation.
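- the chain in this example, from per-path devices to aggregated devices to namespaces to block devices, can be sketched as follows. The function names and the tuple layout are hypothetical, and the naming rules simply reproduce the names used in the example (NoF11, Dev1, NS1, nvme0n1); they are not a description of how a real NVMe driver assigns names.

```python
from collections import defaultdict

# LUN report information as (controller, LUN) pairs, one per path.
reports = [("A", "LUN1"), ("B", "LUN1"), ("A", "LUN2"), ("B", "LUN2")]


def generate_nof_devices(reports):
    """NoF initiator: create one virtual device per (LUN, path) pair."""
    seen = defaultdict(int)
    devices = []
    for controller, lun in reports:
        seen[lun] += 1
        devices.append((f"NoF{lun[3:]}{seen[lun]}", lun, controller))
    return devices


def aggregate(devices):
    """Multipath module: merge the devices that correspond to the same LUN."""
    grouped = defaultdict(list)
    for name, lun, _controller in devices:
        grouped[lun].append(name)
    return {f"Dev{lun[3:]}": names for lun, names in sorted(grouped.items())}


def to_namespaces(aggregated):
    """NVMe controller: represent each aggregated device as a namespace."""
    return {f"NS{dev[3:]}": dev for dev in aggregated}


def register_block_devices(namespaces, ctrl_index=0):
    """Host NVMe driver: register each namespace as a block device."""
    return {f"nvme{ctrl_index}n{ns[2:]}": ns for ns in namespaces}


aggregated = aggregate(generate_nof_devices(reports))
namespaces = to_namespaces(aggregated)
block_devices = register_block_devices(namespaces)
```

The host only ever sees the last mapping; the two paths behind each namespace stay hidden inside the DPU.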
- the host 301 does not need to install multipath software as long as it is equipped with the corresponding DPU 302, so software developers do not need to additionally develop multipath software, which avoids the difficulty of developing customized multipath software for numerous operating systems.
- the DPU 302 performs the function of scanning LUNs for the host 301 and sends a report command to the storage device 303; the DPU 302 is also responsible for aggregating the LUN information reported by the storage device 303 through multiple paths, and then reporting the aggregated device information to the host 301.
- the host 301 does not need to pay attention to the communication with the storage device 303, and functions such as network protocol processing and scanning LUNs are all offloaded to the DPU 302, which reduces the pressure on the processor (such as the CPU) in the host 301. It also simplifies the design of the host 301, minimizes the local storage capacity of the host 301, and saves cost.
- the embodiment of the present application also provides a data processing unit, which may be the DPU 302 in any of the foregoing embodiments.
- the data processing unit is used to: receive the input/output request sent by the host 301, wherein the data processing unit is connected to the host 301 through the PCIe interface, the input/output request is used to access a logical unit in the storage device 303, and the storage device 303 includes a plurality of logical units and a plurality of controllers 305; determine a controller 305 for processing the input/output request from the plurality of controllers 305, so as to realize load balancing among the plurality of controllers 305; and send the input/output request to the determined controller 305.
- the data processing unit is a special processor, which can be understood as a data processing unit chip here.
- the chip can be used as a PCIe device outside the host and connected to the host through the PCIe interface.
- the data processing unit is further configured to: communicate with the host 301 through the NVMe protocol; communicate with the controller 305 in the storage device 303 through the NVMe-oF protocol.
- the data processing unit is further configured to: determine the controller 305 used to process the input/output request from the plurality of controllers 305 according to the hash algorithm, so as to realize load balancing among the plurality of controllers 305.
- the data processing unit is further configured to: send a report command to the storage device 303, and the report command instructs the storage device 303 to send the information of the logic unit corresponding to the host 301 to the data processing unit;
- FIG. 9 is a schematic structural diagram of a computer system 900 provided by an embodiment of the present application.
- the computer system 900 includes a host 301 and the DPU 302 in any of the above-mentioned embodiments, and the DPU 302 is connected to the host 301 through a PCIe interface. That is to say, the computer system 900 includes the host 301 and an external device (the DPU 302 serves as the external device).
- the host 301 includes a processor 901 , a memory 902 and a communication interface 903 .
- the processor 901, the memory 902, and the communication interface 903 may be connected to each other through an internal bus 904, or may communicate through other means such as wireless transmission.
- the embodiment of the present application takes the connection through the bus 904 as an example. The bus 904 may be a peripheral component interconnect (PCI) bus, a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), or a cache coherent interconnect for accelerators (CCIX), etc.
- bus 904 may also include an address bus, a power bus, a control bus, and a status signal bus.
- the processor 901 may be formed by at least one general-purpose processor, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip.
- the aforementioned hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
- the aforementioned PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof.
- Processor 901 executes various types of digitally stored instructions, such as software or firmware programs stored in memory 902, which enable computer system 900 to provide various services.
- the memory 902 is used to store program codes, which are executed under the control of the processor 901 .
- the memory 902 can include a volatile memory, such as a random access memory (RAM); the memory 902 can also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 902 may also include a combination of the above types.
- the memory 902 may store program codes, which may specifically be used to execute any embodiment of the network storage method in FIG. 7 or any embodiment in FIG. 8 , which will not be repeated here.
- Comprising at least one PCIe interface and other communication interfaces in the communication interface 903, can be wired interface (such as Ethernet interface), can be internal interface, wired interface (such as Ethernet interface) or wireless interface (such as cellular network interface or use wireless local area network interface) for communicating with other devices or modules.
- the DPU 302 is connected to the processor 901 of the host 301 through the PCIe interface of the host 301, so as to execute any embodiment of the network storage method in FIG. 7 or any embodiment of reporting the LUN in FIG. 8 .
- FIG. 9 is only a possible implementation of the embodiment of the present application.
- the computer system 900 may include more or fewer components, which is not limited here.
- the embodiment of the present application also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a processor, the method in any one of the embodiments in FIG. 7 or FIG. 8 is implemented.
- An embodiment of the present application further provides a computer program product, and when the computer program product runs on a processor, the method in any one of the embodiments in FIG. 7 or FIG. 8 is implemented.
- the storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Computer Security & Cryptography (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Computer And Data Communications (AREA)
Abstract
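This application discloses a network storage method, a storage system, a data processing unit, and a computer system. The network storage method is applied to a storage system that includes a host, a data processing unit, and a storage device. The data processing unit is connected to the host through a PCIe interface, and the storage device includes a plurality of controllers and a plurality of logical units. The method includes: the host sends an input/output request to the data processing unit, where the input/output request is used to access one of the plurality of logical units; the data processing unit determines, from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and sends the input/output request to the determined controller. The host does not need multipath software installed. The data processing unit distributes the I/O requests delivered by the host evenly across the controllers, which relieves pressure on the CPU in the host.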
Description
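This application claims priority to Chinese Patent Application No. 202111258240.8, filed with the China Patent Office on October 27, 2021 and entitled "Network storage method, storage system, data processing unit, and computer system", which is incorporated herein by reference in its entirety.
This application relates to the field of storage technologies, and in particular, to a network storage method, a storage system, a data processing unit, and a computer system.
As shown in FIG. 1, in a conventional network storage solution, multipath software needs to be installed on the host. Through the multipath software and a host bus adapter (HBA) card, the host distributes input/output (I/O) requests to different controllers in a storage array.
If the host uses open-source or third-party multipath software, that software often cooperates poorly with the controllers in the storage array, which may cause load imbalance among the controllers or a large amount of I/O forwarding between them. If the host instead uses multipath software customized by the storage array vendor, so that the customized software cooperates with the controllers to distribute requests evenly, corresponding customized multipath software must be developed for each host operating system. Given the many operating system types and versions in use today, this makes software development very difficult.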
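Summary
To solve the foregoing problems that arise when a host relies on multipath software for I/O distribution, embodiments of this application provide a network storage method, a storage system, a data processing unit, and a computer system. The host does not need multipath software installed, I/O requests can be distributed evenly among controllers, and a large amount of forwarding of I/O requests between controllers is avoided.
According to a first aspect, this application provides a network storage method applied to a storage system. The storage system includes a host, a data processing unit, and a storage device. The data processing unit is connected to the host through a PCIe interface, and the storage device includes a plurality of controllers and a plurality of logical units. The network storage method includes: the host sends an input/output request to the data processing unit, where the input/output request is used to access one of the plurality of logical units; the data processing unit determines, from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and sends the input/output request to the determined controller.
It can be seen that the data processing unit can act as an external device (a PCIe device) of the host and connect to the host through a PCIe interface on the host. The host can therefore send I/O requests directly to the data processing unit and does not need to care about how the I/O requests are distributed (allocated and sent) among the controllers in the storage device. This reduces the burden on the processor in the host (for example, a central processing unit (CPU)) and saves CPU resources. After receiving an I/O request from the host, the data processing unit determines, from the plurality of controllers, one controller for processing the I/O request, so that load is balanced among the controllers and forwarding of I/O requests between controllers is avoided as far as possible. The data processing unit then sends the I/O request to the determined controller.
Based on the first aspect, in a possible embodiment, the data processing unit communicates with the host through the NVMe protocol, and the data processing unit communicates with the controllers in the storage device through the NVMe-oF protocol.
It can be seen that the data processing unit uses different communication protocols toward the host and toward the storage device (its controllers), so the data processing unit is also responsible for protocol conversion between the host and the storage device. The host does not need to care about interaction with the storage device. Tasks such as network protocol processing and I/O distribution, originally performed by the processor in the host, are now offloaded to the data processing unit, which relieves pressure on the processor in the host.
Because the host and the data processing unit communicate through the NVMe protocol, the host can treat the data processing unit as a local NVMe storage device, and all interaction with the real storage device is handled by the data processing unit. The data processing unit communicates with the storage device (at a remote site or in the cloud) through a high-performance storage protocol such as NVMe-oF. This helps improve the network storage efficiency of the host, does not affect the performance of the host operating system, and makes it easy to expand the external storage capacity of the host.
Based on the first aspect, in a possible embodiment, that the data processing unit determines, from the plurality of controllers, a controller for processing the input/output request includes: the data processing unit determines, according to a hash algorithm, the controller for processing the input/output request from the plurality of controllers, so as to balance load among the plurality of controllers.
In other words, the data processing unit can use a hash algorithm to balance load among the controllers. For example, a consistent hashing algorithm can be used in cooperation with the controllers, with the slices into which a logical unit is divided serving as the granularity of load balancing. Each I/O request delivered by the host is distributed to a controller in the storage device according to the slice that the request accesses, which avoids forwarding of I/O requests between controllers.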
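Based on the first aspect, in a possible embodiment, the method further includes: the data processing unit sends a report command to the storage device, where the report command instructs the storage device to send information about the logical units corresponding to the host to the data processing unit; the data processing unit receives the information about the logical units corresponding to the host that is sent by the controllers in the storage device, generates corresponding devices for the logical units corresponding to the host according to that information, and sends information about the generated devices to the host.
It should be noted that the logical units corresponding to the host are the logical units that the storage device has allocated to the host. For example, the storage device may allocate one or more of its logical units to the host according to the storage resource demand of the host or other factors.
It can be seen that the data processing unit can scan logical units on behalf of the host in order to discover the logical units that the storage device has allocated to the host, and then report them to the host. Because there is no ownership relationship between the controllers and the logical units in the storage device, the storage device can send the information about the logical units corresponding to the host to the data processing unit through each controller. After receiving the information sent by each controller, the data processing unit aggregates it, generates one corresponding device for each logical unit corresponding to the host, and finally sends information about the generated devices to the host through the NVMe protocol.
Based on the first aspect, in a possible embodiment, the method further includes: the host generates, according to the information about a device sent by the data processing unit, a virtual storage device corresponding to the device, where the virtual storage device is provided for access by applications in the host.
It should be understood that after receiving the information about a device from the data processing unit, the host can abstract the device as a corresponding virtual storage device and does not expose the device directly to the applications in the host.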
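According to a second aspect, this application provides a storage system. The storage system includes a host, a data processing unit, and a storage device. The data processing unit is connected to the host through a PCIe interface, and the storage device includes a plurality of controllers and a plurality of logical units. The host is configured to send an input/output request to the data processing unit, where the input/output request is used to access one of the plurality of logical units. The data processing unit is configured to determine, from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and to send the input/output request to the determined controller.
Based on the second aspect, in a possible embodiment, the data processing unit is further configured to: communicate with the host through the NVMe protocol; and communicate with the controllers in the storage device through the NVMe-oF protocol.
Based on the second aspect, in a possible embodiment, determining, from the plurality of controllers, a controller for processing the input/output request includes: determining, according to a hash algorithm, the controller for processing the input/output request from the plurality of controllers, so as to balance load among the plurality of controllers.
Based on the second aspect, in a possible embodiment, the data processing unit is further configured to: send a report command to the storage device, where the report command instructs the storage device to send information about the logical units corresponding to the host to the data processing unit; receive the information about the logical units corresponding to the host that is sent by the controllers in the storage device; and generate corresponding devices for the logical units corresponding to the host according to that information, and send information about the generated devices to the host.
Based on the second aspect, in a possible embodiment, the host is further configured to: generate, according to the information about a device sent by the data processing unit, a virtual storage device corresponding to the device, where the virtual storage device is provided for access by applications in the host.
According to a third aspect, this application provides a data processing unit. The data processing unit is configured to: receive an input/output request sent by a host, where the data processing unit is connected to the host through a PCIe interface, the input/output request is used to access one logical unit in a storage device, and the storage device includes a plurality of logical units and a plurality of controllers; and determine, from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and send the input/output request to the determined controller.
It should be noted that the data processing unit is a special-purpose processor and can be understood here as a data processing unit chip. As a PCIe device external to the host, the chip is connected to the host through a PCIe interface and provides the host with functions such as balanced distribution of I/O requests, network protocol processing, and logical unit scanning, which relieves pressure on the processor in the host.
Based on the third aspect, in a possible embodiment, the data processing unit is further configured to: communicate with the host through the NVMe protocol; and communicate with the controllers in the storage device through the NVMe-oF protocol.
Based on the third aspect, in a possible embodiment, the data processing unit is further configured to: determine, according to a hash algorithm, the controller for processing the input/output request from the plurality of controllers, so as to balance load among the plurality of controllers.
Based on the third aspect, in a possible embodiment, the data processing unit is further configured to: send a report command to the storage device, where the report command instructs the storage device to send information about the logical units corresponding to the host to the data processing unit; receive the information about the logical units corresponding to the host that is sent by the controllers in the storage device; and generate corresponding devices for the logical units corresponding to the host according to that information, and send information about the generated devices to the host.
According to a fourth aspect, this application provides a computer system, including a host and the data processing unit according to any embodiment of the third aspect, where the data processing unit is connected to the host through a PCIe interface.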
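In summary, the embodiments of this application equip the host with a data processing unit (a PCIe device external to the host, connected to the host through a PCIe interface) and offload tasks such as balanced distribution of I/O requests, network protocol processing, and logical unit scanning to the data processing unit, which relieves pressure on the processor in the host. Because the host only needs the data processing unit and does not need multipath software, the difficulty of developing customized multipath software for many types of operating systems is avoided. Moreover, almost all types of operating systems currently support the NVMe protocol, so the NVMe driver in the host operating system is sufficient to drive the data processing unit. The data processing unit is therefore not limited by the type or version of the host operating system, adapts well, and can be widely used.
The data processing unit communicates with the host through the NVMe protocol and appears to the host as a local NVMe device, so the host does not need to care about interaction with the storage device. The data processing unit communicates with the controllers in the storage device through the NVMe-oF protocol to achieve high-performance data access. In effect, this extends the storage space of the host to a storage device at a remote site or in the cloud, which simplifies the design of the host, minimizes the storage capacity required locally on the host, and saves cost. After receiving an I/O request from the host, the data processing unit determines, from the plurality of controllers in the storage device, the controller for processing the I/O request, so as to balance load among the controllers. Specifically, a hash algorithm can be used in cooperation with the controllers to distribute I/O among the controllers at slice granularity, which avoids forwarding of I/O requests between controllers.
The data processing unit also implements logical unit scanning. It sends a report command to the storage device to instruct the storage device to send information about the logical units corresponding to the host to the data processing unit. After receiving that information from each controller, the data processing unit generates one corresponding device for each logical unit corresponding to the host and sends information about the generated devices to the host, so that the host becomes aware of the storage space allocated to it by the storage device (that is, the logical units corresponding to the host) and can then store data in and retrieve data from that space.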
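To describe the technical solutions in the embodiments of this application more clearly, the accompanying drawings used in the description of the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram, provided in an embodiment of this application, of a host relying on multipath software to distribute I/O requests;
FIG. 2 is a schematic diagram, provided in an embodiment of this application, of scattering I/O requests to different controllers at slice granularity;
FIG. 3 is an architecture diagram of a storage system provided in an embodiment of this application;
FIG. 4 is a schematic flowchart of a network storage method provided in an embodiment of this application;
FIG. 5 is a schematic diagram, provided in an embodiment of this application, of a DPU distributing I/O requests delivered by a host to the controllers;
FIG. 6 is a schematic diagram, provided in an embodiment of this application, of a multipath module allocating I/O requests delivered by a host to different controllers;
FIG. 7 is a schematic flowchart, provided in an embodiment of this application, of a storage device reporting LUNs to a host;
FIG. 8 is a schematic diagram, provided in an embodiment of this application, of reporting LUN1 and LUN2 to a host;
FIG. 9 is a schematic structural diagram of a computer system provided in an embodiment of this application.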
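To facilitate understanding of the technical solutions in the embodiments of this application, some of the terms and concepts involved are explained first.
1. Data processing unit (DPU):
A DPU is a broad class of special-purpose processors. After the central processing unit (CPU) and the graphics processing unit (GPU), it is the third important computing chip in data center scenarios, and it provides a computing engine for high-bandwidth, low-latency, data-intensive computing scenarios. A DPU has the generality and programmability of a CPU but is more specialized, and it is distinguished from a CPU by a greater degree of parallelism.
2. Logical unit (LU):
In storage technology, the various storage media (such as hard disks) are generally not exposed directly to the host. Instead, multiple storage media are abstracted into a storage pool (or memory pool), and the storage space of the pool is divided into logical units for use by the host. Each logical unit is assigned a corresponding identifier, namely a logical unit number (LUN). It should be noted that, because the host can generally perceive the LUN directly, those skilled in the art usually use "LUN" to refer to the logical unit itself. Unless otherwise specified, "LUN" is used below to denote a logical unit.
A LUN may have two identifiers with different functions: the LUN ID and the universally unique identifier (UUID). The LUN ID identifies the LUN within a device and distinguishes the LUNs in that device. However, a LUN ID is unique only within one device and may coincide with the LUN ID of a LUN in another device. The UUID is also an identifier of the LUN, but it is globally unique. It should be noted that after a LUN is mounted on a host, the LUN seen from the host is a host LUN, and the identifier of the host LUN is called the host LUN ID. A host LUN is not an actual physical space; it is only the mapping of a LUN on the host. Therefore, the host LUN ID can also be regarded as an identifier of the LUN.
3. LUN without ownership:
A LUN without ownership means that there is no ownership relationship between LUNs and (storage) controllers: every controller can access any LUN.
For example, as shown in FIG. 2, the disks in two disk enclosures together form a storage pool, and every controller in the storage system can access the storage space of this pool, so the storage pool is global. In addition, the controllers share access to the cache (in other words, the caches of the controllers back each other up), so the cache is also global.
The storage space of the storage pool is divided into LUNs for use by hosts, and each LUN has a corresponding identifier. No LUN belongs to any single controller; every controller can access it. For example, assume that LUN1 is divided into 12 slices, each occupying a logical space of a set size in LUN1. When the host has I/O requests that access the slices of LUN1, the requests are delivered through a storage area network (SAN) and scattered across the controllers (the figure shows only one example of such scattering). Controller A is assigned the I/O requests that access slice 1, slice 3, or slice 7 of LUN1, and then accesses LUN1 and performs the corresponding I/O operations according to those requests. Controller B is assigned the I/O requests that access slice 2, slice 4, or slice 11 of LUN1, so controller B also accesses LUN1 according to its assigned requests, and so on. In short, LUN1 does not belong to any one controller, and all four controllers can access LUN1.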
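The storage system 300 involved in the embodiments of this application is described below.
Refer to FIG. 3, which is an architecture diagram of a storage system 300 provided in an embodiment of this application. The storage system 300 includes one or more hosts 301 (the figure shows only one host 301 as an example), a DPU 302, and a storage device 303. Each host 301 has a corresponding DPU 302, and the DPU 302 is connected to the host 301 through a peripheral component interconnect express (PCIe) interface (in other words, it is connected to the processor in the host through a PCIe bus). It should be noted that the DPU 302 in the figure is located outside the host 301, but the DPU 302 can also be integrated directly into the host 301 or plugged into the host 301.
The host 301, the DPU 302, and the storage device 303 are described in detail below.
1. The host 301 may be a server, a laptop computer, a desktop computer, or a similar device, which is not specifically limited in this application. As shown in FIG. 3, the operating system (OS) of the host 301 includes a non-volatile memory express (NVMe) driver, which is used to drive NVMe devices.
In a possible embodiment, various applications may be installed on the host 301, and a user can trigger I/O requests through the applications to store and retrieve data.
2. The DPU 302 provides the host 301 with one or more types of network ports, so that the host 301 can connect to the corresponding type of network and interact with the storage device 303. The host 301 therefore does not need a separate network interface card (NIC).
For example, when the storage front-end network is a remote direct memory access over converged Ethernet (RoCE) network, the DPU 302 provides the host 301 with a RoCE network port, so that the host 301 can connect to the RoCE network and store data in or retrieve data from the storage device 303.
It should be noted that, besides communicating with the storage device 303 directly over the network, the DPU 302 can also access the storage device 303 through a switch. Only one network port is drawn in FIG. 3, but this does not mean that there is only one network port or one type of network port.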
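In a possible embodiment, the DPU 302 may include an NVMe controller, a multipath module, and an NVMe over Fabrics (NVMe-oF or NoF for short) initiator:
1) The NVMe controller implements the NVMe protocol so that the DPU 302 appears to the host 301 as an NVMe device. In other words, the NVMe controller can make the DPU 302 emulate an NVMe device. From the perspective of the host 301, the DPU 302 is equivalent to a local NVMe device, so the host 301 can drive the DPU 302 with its own NVMe driver, and the host 301 and the DPU 302 communicate through the NVMe protocol.
2) The multipath module aggregates the LUNs that the storage device 303 reports to the DPU 302 over multiple paths (that is, through multiple controllers 305) and hands the aggregated LUNs to the NVMe controller for reporting to the host 301. The multipath module also scatters the I/O requests delivered by the host 301 across the controllers 305 in the storage device 303, so as to balance load among the controllers 305.
3) The NoF initiator (NoF INI for short) establishes NoF connections with NoF targets and implements functions such as establishing communication links, scanning LUNs, and delivering I/O.
It should be noted that the DPU 302 may include more or fewer modules. The three modules in the figure (the NVMe controller, the multipath module, and the NoF initiator) are only an exemplary division. Each module may be further split into multiple functional modules, or multiple modules may be combined into one, which is not limited in this application.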
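3. The storage device 303 may be a storage array, a disk enclosure, a server (cluster), a desktop computer, or the like, which is not specifically limited in this application. The storage device 303 includes a storage medium 304 and a plurality of controllers 305.
1) The storage medium 304 provides storage space. The storage space of the storage medium 304 is virtualized into a storage pool and then divided into a plurality of LUNs. There is no ownership relationship between the LUNs and the controllers 305, and every controller 305 can control access to any LUN.
For example, as shown in FIG. 3, assume that the storage space of the storage medium 304 is divided into n logical units (n is a positive integer greater than 1), denoted LUN1, LUN2, through LUNn. Every controller 305 in the storage device 303 can access any of these n logical units, so there is no ownership relationship between the controllers 305 and the LUNs.
In the embodiments of this application, the storage medium 304 may be a single type of storage, such as one or more solid-state drives; a combination of multiple types of storage, such as solid-state drives and mechanical hard disks; or one or more disk enclosures, each of which may hold one or more hard disks; and so on. It should be noted that this application does not specifically limit the storage medium 304.
2) The controller 305 controls access to the storage medium 304 and performs the corresponding I/O operations according to the I/O requests it receives. Each controller 305 contains a NoF target, which is a software module used for NoF communication with the NoF initiator. One NoF target represents one node that receives NoF commands.
In a possible embodiment, each controller 305 has a corresponding interface card so that the controller 305 can connect to the network. For example, when the storage front-end network is a RoCE network, each controller 305 has a RoCE interface card so that the controller 305 can connect to the RoCE network. Besides a RoCE network, other types of networks that support the NoF protocol can also be used, which is not limited in this application, and the original networking structure does not need to be changed. It should be noted that the interface card in FIG. 3 is outside the controller 305, but the interface card can also be integrated directly into the controller 305 or connected to the controller 305 as a plug-in card, which is not limited in this application.
In a possible embodiment, each controller 305 in the storage device 303 has an internal cache, and the controllers 305 can share their caches and back each other up.
Take a storage array whose storage device 303 contains two controllers 305 (controller A and controller B) as an example. There is a mirror channel between controller A and controller B. After controller A writes a piece of data to its cache, it can send a copy of the data to controller B through the mirror channel, and controller B stores the copy in its own cache. Controller A and controller B therefore back each other up: when controller A fails, controller B can take over the services of controller A, and when controller B fails, controller A can take over the services of controller B, so that a hardware fault does not make the whole storage array unavailable. Similarly, if four controllers 305 are deployed in the storage array, there is a mirror channel between any two controllers 305, so any two controllers 305 back each other up.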
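Based on the storage system 300, embodiments of the network storage method provided in this application are described below.
Refer to FIG. 4, which is a schematic flowchart of a network storage method provided in an embodiment of this application. The method is applied to the storage system 300 described above and includes the following steps:
S401. The host 301 sends an I/O request to the DPU 302, where the I/O request is used to access one of the plurality of logical units in the storage device 303, and the DPU is connected to the host 301 through a PCIe interface.
It should be understood that the host 301 can send one or more I/O requests to the DPU 302 at a time, and each I/O request can access any one of the plurality of logical units or a specified logical unit. Step S401 is described with one I/O request as an example, but this does not mean that there can be only one I/O request.
In a possible embodiment, each I/O request contains address information that indicates the address segment targeted by the I/O request, so that data can be stored in or retrieved from that address segment. The address information may indicate the address segment directly or indirectly, which is not limited in this application.
For example, if the host 301 sends a read request (that is, an O request), the address information in the read request indicates the address segment that the read request accesses, so that data can be read from that address segment. If the host 301 sends a write request (that is, an I request), the write request also includes the data to be written, and the address information in the write request indicates the address segment that the write request accesses, so that the data can be stored in that address segment.
As described above, the storage medium 304 in the storage device 303 provides storage space, but that storage space is not exposed directly to the host 301. Instead, it is virtualized into a storage pool and can then be divided into LUNs for use by the host 301. Therefore, in some possible embodiments, the address information includes one or more kinds of LUN identifiers. The LUN identifier here may be the UUID of the LUN, or the identifier of the mapping in the host 301 of a LUN in the storage medium 304, such as the host LUN ID or the identifier of the block device (virtual storage device) corresponding to the LUN in the host 301.
It can be understood that the specific location of data within a LUN can be determined by a start address and the length of the data. Those skilled in the art usually call the start address the logical block address (LBA). Therefore, in a possible embodiment, the address information includes the LUN identifier, the LBA, and the length, and these three elements together identify a definite address segment. Every I/O request sent by the host 301 is thus located on an address segment, so that data can be read from or written to that address segment.
It should be noted that, besides carrying the LUN ID, LBA, and length as address information, an I/O request can also build its address information from other logical addresses, such as a virtual space ID together with the start address and length of the virtual space. This application does not specifically limit how address information is represented.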
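The address information described above (LUN identifier, LBA, and length) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class name `AddressInfo`, the constant `SLICE_BLOCKS`, the choice to measure `length` in logical blocks, and the zero-based slice numbering are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass

# Assumed slice size, in logical blocks (hypothetical and tiny, for illustration only).
SLICE_BLOCKS = 8


@dataclass(frozen=True)
class AddressInfo:
    """Address information carried in one I/O request (illustrative only)."""
    lun_id: str   # identifier of the LUN being accessed
    lba: int      # starting logical block address inside the LUN
    length: int   # assumed to be a positive number of logical blocks

    def segment(self) -> tuple:
        """Return the half-open block range [start, end) that the request targets."""
        return (self.lba, self.lba + self.length)

    def slices(self) -> list:
        """Return the zero-based indices of the slices that the segment touches."""
        first = self.lba // SLICE_BLOCKS
        last = (self.lba + self.length - 1) // SLICE_BLOCKS
        return list(range(first, last + 1))
```

With these assumed values, a request for 4 blocks starting at LBA 6 covers blocks 6 through 9 and therefore touches two slices, which is the kind of information a slice-granularity dispatcher would need.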
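In a possible embodiment, various applications can be installed on the host 301. A user can, through an application running on the host 301, trigger the host 301 to generate I/O requests to store and retrieve data, and the host 301 then sends the generated I/O requests to the DPU 302 through the NVMe protocol.
S402. The DPU 302 determines, from the plurality of controllers 305 in the storage device 303, the controller 305 for processing the I/O request, so as to balance load among the plurality of controllers 305, and sends the I/O request to the determined controller 305.
In a specific embodiment, when the DPU 302 receives one or more I/O requests delivered by the host 301, the DPU 302 determines, for each of these I/O requests, one controller 305 among the plurality of controllers 305 in the storage device 303 to process that request, and then sends each I/O request to its determined controller 305, so as to balance load among the controllers 305. Put another way, each controller 305 in the storage device 303 has a path to the DPU 302. When the DPU 302 receives one or more I/O requests delivered by the host 301, it determines for each request one path to the storage device 303, that is, it scatters the requests over multiple paths, which indirectly determines a controller 305 for each I/O request. The DPU 302 then sends each I/O request over its determined path to the controller 305 corresponding to that path, so that load is balanced among the controllers 305.
For example, as shown in FIG. 5, the storage device 303 contains two controllers 305, controller A and controller B, each of which has a path to the DPU 302. For ease of description, the path between controller A and the DPU 302 is called path 1, and the path between controller B and the DPU 302 is called path 2. Both path 1 and path 2 are paths between the storage device 303 and the DPU 302. The logical unit LUN1 in the storage medium 304 is divided into six slices, denoted slice 1 through slice 6, each occupying a logical space of a set size in LUN1.
Assume that the host 301 generates some I/O requests that access different slices of LUN1 and sends them to the DPU 302. After receiving these I/O requests, the DPU 302 determines, for each I/O request, a controller 305 to process it, so as to balance load between controller A and controller B. Assume that the I/O requests accessing slice 1, slice 3, or slice 6 of LUN1 are determined to be processed by controller A (that is, to be sent to the storage device 303 over path 1), and that the I/O requests accessing slice 2, slice 4, or slice 5 of LUN1 are determined to be processed by controller B (that is, to be sent to the storage device 303 over path 2). The DPU 302 then sends these I/O requests over the corresponding paths to the different controllers 305 in the storage device 303: the I/O requests accessing slice 1, slice 3, or slice 6 of LUN1 are sent to controller A over path 1, and the I/O requests accessing slice 2, slice 4, or slice 5 of LUN1 are sent to controller B over path 2.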
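In a possible embodiment, the DPU 302 includes an NVMe controller, a multipath module, and a NoF initiator. The NVMe controller receives the one or more I/O requests that the host 301 sends through the NVMe protocol and hands them to the multipath module in the DPU 302. The multipath module then determines, for each of these I/O requests, a controller 305 to process it, so that load is distributed evenly among the controllers 305 in the storage device 303. Finally, the NoF initiator sends each I/O request to its determined controller 305 according to the allocation made by the multipath module.
For example, as shown in FIG. 6, assume that the storage device 303 contains two controllers 305, controller A and controller B, each of which has a path to the DPU 302. The path between controller A and the DPU 302 is called path 1, and the path between controller B and the DPU 302 is called path 2. Both can be called paths between the storage device 303 and the DPU 302. LUN1 is one of the logical units divided from the storage medium 304. The logical space of LUN1 is split into six slices, denoted slice 1 through slice 6, each occupying a logical space of a set size in LUN1. It can be understood that an I/O request accessing any slice of LUN1 is an I/O request accessing LUN1.
Assume that the host 301 generates some I/O requests that access different slices of LUN1 and sends them to the DPU 302 through the NVMe protocol. After the NVMe controller in the DPU 302 receives these I/O requests, it passes them to the multipath module. The multipath module determines, for each I/O request, a path to the storage device 303, so as to balance load between controller A and controller B. The I/O requests accessing slice 1, slice 3, or slice 6 of LUN1 are determined to be sent to the storage device 303 over path 1, which is equivalent to allocating them to controller A. The I/O requests accessing slice 2, slice 4, or slice 5 of LUN1 are determined to be sent to the storage device 303 over path 2, which is equivalent to allocating them to controller B. Finally, according to the allocation made by the multipath module, the NoF initiator sends the I/O requests accessing slice 1, slice 3, or slice 6 to controller A over path 1, and sends the I/O requests accessing slice 2, slice 4, or slice 5 to controller B over path 2. Controller A and controller B each access the corresponding locations in the storage medium 304 according to the I/O requests they receive and perform the corresponding I/O operations.
In a possible embodiment, the multipath module can perform load balancing according to a hash algorithm: it determines, from the plurality of controllers 305, the controller 305 for processing the I/O request, so as to balance load among the controllers 305, and then sends the I/O request to the determined controller 305. The hash algorithm actually used can cooperate with the controllers 305 to distribute I/O among them at slice granularity, so that forwarding of I/O requests between controllers 305 is avoided as far as possible.
For example, assume that each logical unit is divided into six slices, each occupying a logical space of a set size in the logical unit. According to a consistent hashing algorithm, the multipath module can allocate the I/O requests accessing slice 1, slice 3, or slice 5 of a logical unit to controller A, and allocate the I/O requests accessing slice 2, slice 4, or slice 6 of the same logical unit to controller B. In other words, the I/O requests accessing different slices of the same logical unit are spread evenly between controller A and controller B, which balances load between them. Besides a hash algorithm, the multipath module can also use other load balancing algorithms to distribute the I/O requests delivered by the host 301 evenly across the controllers 305, which is not specifically limited in this application. The multipath module can also determine the controller 305 for an I/O request according to factors such as the number of accesses to each controller per unit time or the CPU utilization of each controller, so as to balance load among the controllers 305.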
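One way to picture slice-granularity selection with consistent hashing is a hash ring with virtual nodes. The sketch below is a generic textbook construction, not the algorithm of the application: the class `ControllerRing`, the `vnodes` parameter, the key format, and the use of SHA-256 are all assumptions made for illustration, and a real ring would not necessarily reproduce the odd/even split used in the example above.

```python
import bisect
import hashlib


def _point(key: str) -> int:
    """Map a string key to a 64-bit position on the hash ring."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ControllerRing:
    """Consistent-hash ring over controller names (illustrative only)."""

    def __init__(self, controllers, vnodes=64):
        # Each controller contributes `vnodes` virtual points to smooth the spread.
        ring = sorted(
            (_point(f"{name}#{i}"), name)
            for name in controllers
            for i in range(vnodes)
        )
        self._points = [p for p, _ in ring]
        self._owners = [name for _, name in ring]

    def pick(self, lun_uuid: str, slice_index: int) -> str:
        """Return the controller owning the first ring point after the slice's hash."""
        p = _point(f"{lun_uuid}/{slice_index}")
        i = bisect.bisect_right(self._points, p) % len(self._points)
        return self._owners[i]
```

The design point of the ring is stability: every request for the same slice always lands on the same controller, and if one controller is removed, slices owned by the surviving controllers keep their owner, so only the removed controller's slices move.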
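In a possible embodiment, there is no ownership relationship between LUNs and controllers, but there can be a correspondence between the slices of a logical unit and the controllers. The multipath module can, according to the slice that the I/O request accesses and the correspondence between that slice and a controller 305, determine the controller 305 corresponding to the slice as the controller 305 for processing the I/O request. The NoF initiator then sends the I/O request to the determined controller 305 through the NoF protocol.
For example, there is no ownership relationship between LUN1 in the storage device 303 and the controllers 305, and any controller 305 can access LUN1, but there is a correspondence between the slices of LUN1 and the controllers 305. Assume that LUN1 is divided into six slices, that slice 1, slice 3, and slice 5 correspond to controller A, and that slice 2, slice 4, and slice 6 correspond to controller B. That is, all I/O requests accessing slice 1, slice 3, or slice 5 of LUN1 are processed by controller A, and all I/O requests accessing slice 2, slice 4, or slice 6 of LUN1 are processed by controller B. This correspondence allocates the I/O requests accessing LUN1 to different controllers 305 at slice granularity, which helps balance load between controllers A and B. Therefore, when the multipath module receives an I/O request accessing LUN1, it can first determine which slice of LUN1 the request actually accesses (say, slice 1), then determine controller A as the controller 305 for processing the request according to the correspondence between slice 1 and controller A, and finally send the I/O request to controller A.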
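The slice-to-controller correspondence in this example can be sketched as a plain lookup table. The table contents mirror the six-slice example above; the names `SLICE_TO_CONTROLLER` and `dispatch` are hypothetical, and the sketch only groups slice numbers rather than sending anything over a network.

```python
from collections import defaultdict

# Correspondence taken from the example: odd slices -> controller A, even slices -> controller B.
SLICE_TO_CONTROLLER = {1: "A", 2: "B", 3: "A", 4: "B", 5: "A", 6: "B"}


def dispatch(slice_numbers):
    """Group the slice numbers targeted by incoming I/O requests by controller."""
    per_controller = defaultdict(list)
    for slice_no in slice_numbers:
        per_controller[SLICE_TO_CONTROLLER[slice_no]].append(slice_no)
    return dict(per_controller)
```

For requests that target slices 1 through 6 once each, this yields three requests per controller, which is the balanced outcome the example describes.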
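In summary, in the storage system 300 provided in the embodiments of this application, the host 301 is equipped with a corresponding DPU 302, which is an external PCIe device of the host 301 and is connected to the host 301 through a PCIe interface. As the hub between the host 301 and the storage device 303, the DPU 302 takes over tasks such as multipath distribution of the I/O requests of the host 301, network protocol processing, and LUN scanning. This relieves pressure on the CPU in the host 301, simplifies the design of the host 301, and saves cost.
Because the host 301 does not need multipath software installed, software developers do not need to develop additional multipath software, which avoids the difficulty of developing customized multipath software for many operating systems. Moreover, almost all types of operating systems currently support the NVMe protocol, and the NVMe driver in the host operating system is sufficient to drive the DPU 302. The DPU 302 is therefore not limited by the type or version of the host operating system, has good applicability, and can be widely used in all kinds of hosts 301. The host 301 can treat the DPU 302 as a local NVMe device, so the host 301 does not need to care about communication with the storage device 303 and only needs to send the generated I/O requests directly to the DPU 302, which has little impact on the performance of the host 301.
The NVMe controller in the DPU 302 implements the NVMe protocol and performs NVMe communication with the host 301. When the NVMe controller receives I/O requests sent by the host 301 through the NVMe protocol, it passes them to the multipath module in the DPU 302. The multipath module allocates the I/O requests to the controllers 305 in the storage device 303, so that the I/O requests are distributed evenly among the controllers 305. Specifically, a hash algorithm can be used in cooperation with the controllers 305 to distribute I/O among them at slice granularity, so that load is balanced among the controllers 305 and a large amount of forwarding of I/O requests between controllers 305 is avoided. The NoF initiator in the DPU 302 communicates with the controllers in the storage device through the NVMe-oF protocol to achieve high-performance data access. In effect, this extends the storage space of the host 301 to the storage device 303 at a remote site or in the cloud, which simplifies the design of the host 301, minimizes the storage capacity required locally on the host 301, and saves cost.
Having the DPU 302 handle the balanced distribution of I/O requests among the controllers 305 has another benefit: it simplifies the hardware design of the storage device 303. No load balancing design needs to be added on the storage device 303 side (for example, a custom chip that distributes I/O requests evenly among the controllers 305), and each controller 305 simply processes the I/O requests it receives.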
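In a possible embodiment, after step S402, the network storage method further includes: each controller 305 in the storage device 303 processes the I/O requests it receives.
Specifically, each controller 305 in the storage device 303 determines, according to the address information in a received I/O request, the address segment that the request accesses, and then performs the corresponding I/O operation on that address segment.
For example, an I/O request received by controller A carries the identifier of LUN8, an LBA, and a length as address information. From these, it can be determined that the I/O request accesses the address segment in LUN8 that starts at the LBA and has the given length, and the corresponding I/O operation is then performed on this address segment. For a write request, the corresponding data is written to this address segment; for a read request, the corresponding data is read from this address segment.
It should be noted that this application does not specifically limit how a controller 305 accesses the storage medium 304 according to an I/O request. For example, after controller A determines the address segment from the I/O request, it actually also determines the corresponding global address from that address segment (an address segment can be indexed to a global address, and the space indicated by a global address is unique within the storage pool), and then accesses the corresponding physical location in the storage medium 304 according to the global address (the physical address corresponding to a global address can be determined from it; the physical address indicates which storage medium actually holds the space represented by the global address and the offset within that storage medium, that is, the location of the physical space).
It should be understood that before the host 301 delivers I/O requests to the storage device 303 through the DPU 302, the storage device 303 needs to report the LUNs corresponding to the host 301 to the host 301. With reference to the storage system 300, the process in which the storage device 303 reports LUNs to the host 301 is described below.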
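Refer to FIG. 7, which is a schematic flowchart, provided in an embodiment of this application, of the storage device 303 reporting LUNs to the host 301. The procedure is applied to the storage system 300 and includes the following steps:
S701. The DPU 302 sends a report command to the storage device 303, where the report command instructs the storage device 303 to send the LUNs corresponding to the host 301 to the DPU 302.
It should be noted that the storage medium 304 in the storage device 303 is virtualized into a storage pool and then divided into LUNs for use by the host 301. The LUNs corresponding to the host 301 are the LUNs that the storage device 303 has allocated to the host 301. For example, the storage device 303 may provide one or more LUNs to the host 301 according to the resource demand of the host 301 or other factors.
It should also be noted that reporting a LUN means reporting information about the LUN. The information about a LUN may include the LUN identifier, logical addresses, capacity, and so on, which is not specifically limited in this application.
In a possible embodiment, the DPU 302 can send the report command to the storage device 303 over each of multiple paths.
It should be noted that each controller 305 in the storage device 303 can have a path to the DPU 302, and the path between any controller 305 in the storage device 303 and the DPU 302 is a path between the storage device 303 and the DPU 302. Because the storage device 303 has multiple controllers 305, there are multiple paths between the storage device 303 and the DPU 302. Sending the report command to the storage device 303 over multiple paths can therefore mean sending the report command to each of the multiple controllers 305 in the storage device 303.
In a possible embodiment, the report command that the DPU 302 sends to the storage device 303 carries the identifier of the host 301 and/or information about the port of the DPU 302 that transmits the report command.
In a possible embodiment, the DPU 302 can sense changes in the LUN configuration of the storage device 303 (for example, a LUN newly allocated to the host 301) and proactively send a report command to the storage device 303. In other words, the DPU 302 can scan LUNs on behalf of the host 301, so that the host 301 becomes aware of changes in the logical units that the storage device 303 has allocated to it.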
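S702. The storage device 303 determines the LUNs corresponding to the host 301 according to the report command, and sends information about those LUNs to the DPU 302 through the controllers 305 in the storage device 303.
In a specific embodiment, the storage device 303 sends the information about the LUNs corresponding to the host 301 to the DPU 302 over each of the multiple paths between the storage device 303 and the DPU 302. It should be understood that, because there is no ownership relationship between LUNs and controllers 305, the same LUN is reported through each of multiple controllers 305, that is, over multiple paths, so that the DPU 302 can determine multiple access paths for the same LUN.
In a possible embodiment, after determining the LUNs corresponding to the host 301 according to the report command, the storage device 303 generates LUN report information and sends it to the DPU 302 over each of the multiple paths. The LUN report information includes information about the LUNs corresponding to the host 301, such as the LUN identifier, logical address range, and capacity. It can be understood that, because the storage device 303 sends the LUN report information to the DPU 302 over each of the multiple paths between them, the DPU 302 receives LUN report information from different paths. In other words, multiple controllers 305 each send the LUN report information to the DPU 302, and the DPU 302 receives LUN report information from different controllers 305.
In a possible embodiment, during transmission of the LUN report information, each controller 305 that the information passes through adds the identifier of its own port (or the identifier of the controller 305) to the LUN report information, that is, it adds the corresponding path information. For example, the LUN report information sent to the DPU 302 through controller A in the storage device 303 carries the port identifier of controller A, and the LUN report information sent to the DPU 302 through controller B carries the port identifier of controller B, so that LUN report information from different paths can be told apart.
S703. The DPU 302 generates a corresponding device for each LUN corresponding to the host 301 according to the LUN information sent by the controllers in the storage device 303, and reports information about the generated devices to the host 301.
In a possible embodiment, the NoF initiator in the DPU 302 receives the LUN report information sent by each controller 305 and generates one corresponding device (a virtual device) for each LUN indicated in the LUN report information from each controller 305. Next, the multipath module aggregates, according to the LUN UUID, the devices generated by the NoF initiator that correspond to the same LUN, to obtain aggregated devices. The NVMe controller represents each aggregated device as a namespace (NS) as defined in the NVMe protocol, assigns it a namespace identifier, and reports it to the host 301.
In a possible embodiment, the multipath module sets each path over which the storage device 303 reports a LUN as a path for accessing that LUN.
For example, assume that the path between controller A and the DPU 302 is path 1, and the path between controller B and the DPU 302 is path 2. The information about LUN1 is sent to the DPU 302 through both controller A and controller B, so the multipath module in the DPU 302 sets both path 1 and path 2 as paths for accessing LUN1. Later, when the DPU 302 receives an I/O request accessing LUN1, it can choose between these two paths according to a preset policy (such as a load balancing policy) and send the I/O request over the chosen path to the controller 305 in the storage device 303 that corresponds to that path.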
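The "preset policy" for choosing between the registered paths is left open in the text. As one hedged possibility, the sketch below picks the path with the fewest in-flight requests and breaks ties by path name. The class `PathSelector` and the least-in-flight policy are assumptions made for illustration, not the policy of the application.

```python
class PathSelector:
    """Choose among the paths registered for one LUN (illustrative policy only)."""

    def __init__(self, paths):
        # Number of requests sent on each path that have not completed yet.
        self._inflight = {path: 0 for path in paths}

    def pick(self) -> str:
        """Return the path with the fewest in-flight requests; ties go to the lowest name."""
        path = min(self._inflight, key=lambda p: (self._inflight[p], p))
        self._inflight[path] += 1
        return path

    def complete(self, path: str) -> None:
        """Record that one request on `path` has finished."""
        self._inflight[path] -= 1
```

A selector built with `PathSelector(["path1", "path2"])` alternates between the two paths under steady load and shifts toward whichever path drains faster.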
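S704. The host 301 creates corresponding virtual storage devices according to the device information reported by the DPU 302.
It should be understood that when the host 301 generates virtual storage devices from the received device information, it abstracts the devices reported by the DPU 302 and does not expose them directly to the applications in the host 301. The applications can perceive only the created virtual storage devices and access those virtual storage devices.
In a possible embodiment, after the NVMe driver in the host 301 receives the namespace information sent by the NVMe controller in the DPU 302, it registers the namespace as a block device in the host 301 (a block device is a kind of virtual storage device that is provided for applications in the host 301 to operate on). This includes assigning a name to the block device and establishing the mapping between the block device and the namespace. Other information, such as the logical address space and capacity, can also be recorded in the block device.
The LUN reporting procedure in FIG. 7 is illustrated below with reference to FIG. 8.
As shown in FIG. 8, assume that the storage device 303 (not labeled, to keep the figure readable) has two controllers 305, controller A and controller B, each of which has a path to the DPU 302. Both paths are paths between the storage device 303 and the DPU 302. The two logical units LUN1 and LUN2 divided in the storage device 303 are both LUNs corresponding to the host 301. Because there is no ownership relationship between LUNs and controllers 305, any controller 305 can control access to LUN1 and LUN2.
Assume that the DPU 302 sends a report command to the storage device 303. According to the identifier of the host 301 in the report command, the storage device 303 can determine that LUN1 and LUN2 correspond to the host 301, and it then reports LUN1 and LUN2 to the DPU 302 through both controller A and controller B, that is, it sends the information about LUN1 and LUN2 to the DPU 302 over two different paths.
As shown in FIG. 8, the NoF initiator in the DPU 302 generates a corresponding device for LUN1, named NoF11, according to the LUN1 information sent by controller A, and generates another corresponding device for LUN1, named NoF12, according to the LUN1 information sent by controller B. It can be seen that, because LUN1 is reported over two different paths, the NoF initiator identifies LUN1 as two devices, both of which actually correspond to LUN1. Similarly, the NoF initiator generates two corresponding devices for LUN2, NoF21 and NoF22, according to the LUN2 information sent by controller A and controller B respectively.
According to the LUN UUID, the multipath module can recognize that NoF11 and NoF12 actually correspond to the same LUN (LUN1) and aggregates these two devices into the device Dev1, so Dev1 also corresponds to LUN1. Similarly, the multipath module aggregates NoF21 and NoF22 into one device, Dev2. Because the NVMe controller implements the NVMe protocol, it needs to represent Dev1 and Dev2 as namespaces as defined in the NVMe protocol and identifies them as NS1 and NS2, which actually correspond to LUN1 and LUN2 in the storage device 303 respectively. The NVMe controller then reports the information about NS1 and NS2 to the host 301.
After receiving the information about NS1 and NS2 reported by the DPU 302, the NVMe driver in the host 301 registers NS1 and NS2 with the block device layer and generates two corresponding block devices, denoted nvme0n1 and nvme0n2, for upper-layer applications to operate on.
It should be noted that the names of the devices in the above example are only examples and do not constitute a limitation.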
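The aggregation step in this example (per-path devices with the same LUN UUID merged into one device and exposed as a namespace) can be sketched as follows. The function name `aggregate`, the tuple layout, the placeholder UUID strings, and the sequential namespace IDs starting at 1 are assumptions made for illustration; the device names NoF11 through NoF22 come from the example above.

```python
def aggregate(path_devices):
    """Merge per-path devices that share a LUN UUID and number the results as namespaces.

    `path_devices` is an iterable of (device_name, lun_uuid, path) tuples, one per
    device created by the initiator for each path (hypothetical layout).
    """
    by_uuid = {}
    for name, lun_uuid, path in path_devices:
        entry = by_uuid.setdefault(lun_uuid, {"members": [], "paths": []})
        entry["members"].append(name)
        entry["paths"].append(path)

    # Assign namespace IDs in first-seen order (assumed numbering scheme).
    namespaces = {}
    for nsid, (lun_uuid, entry) in enumerate(by_uuid.items(), start=1):
        namespaces[nsid] = {"uuid": lun_uuid, **entry}
    return namespaces
```

Feeding in the four devices from the example (NoF11 and NoF12 for LUN1, NoF21 and NoF22 for LUN2) produces two namespaces, each of which remembers both paths that can later be used to reach its LUN.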
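In summary, in the storage system 300 provided in the embodiments of this application, the host 301 does not need multipath software installed and only needs to be equipped with the corresponding DPU 302. Software developers therefore do not need to develop additional multipath software, which avoids the difficulty of developing customized multipath software for many operating systems. In the embodiments of this application, the DPU 302 scans LUNs on behalf of the host 301 by sending a report command to the storage device 303. The DPU 302 also aggregates the LUNs (the LUN information) that the storage device 303 reports over multiple paths and then reports information about the aggregated devices to the host 301. The host 301 therefore does not need to care about communication with the storage device 303. Functions such as network protocol processing and LUN scanning are offloaded to the DPU 302, which relieves pressure on the processor (such as the CPU) in the host 301, simplifies the design of the host 301, minimizes the local storage capacity of the host 301, and saves cost.
An embodiment of this application further provides a data processing unit, which may be the DPU 302 in any of the foregoing embodiments. The data processing unit is configured to: receive an input/output request sent by the host 301, where the data processing unit is connected to the host 301 through a PCIe interface, the input/output request is used to access one logical unit in the storage device 303, and the storage device 303 includes a plurality of logical units and a plurality of controllers 305; and determine, from the plurality of controllers 305, the controller 305 for processing the input/output request, so as to balance load among the plurality of controllers 305, and send the input/output request to the determined controller 305.
It should be noted that the data processing unit is a special-purpose processor and can be understood here as a data processing unit chip. The chip can act as a PCIe device external to the host and connect to the host through a PCIe interface.
In a possible embodiment, the data processing unit is further configured to: communicate with the host 301 through the NVMe protocol; and communicate with the controllers 305 in the storage device 303 through the NVMe-oF protocol.
In a possible embodiment, the data processing unit is further configured to: determine, according to a hash algorithm, the controller 305 for processing the input/output request from the plurality of controllers 305, so as to balance load among the plurality of controllers 305.
In a possible embodiment, the data processing unit is further configured to: send a report command to the storage device 303, where the report command instructs the storage device 303 to send information about the logical units corresponding to the host 301 to the data processing unit; receive the information about the logical units corresponding to the host 301 that is sent by each controller 305 in the storage device 303; and generate corresponding devices for the logical units corresponding to the host 301 according to that information, and send information about the generated devices to the host 301.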
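FIG. 9 is a schematic structural diagram of a computer system 900 provided in an embodiment of this application. The computer system 900 includes the host 301 and the DPU 302 in any of the foregoing embodiments, and the DPU 302 is connected to the host 301 through a PCIe interface. In other words, the computer system 900 includes the host 301 part and an external device (the DPU 302 acts as the external device).
The host 301 includes a processor 901, a memory 902, and a communication interface 903. The processor 901, the memory 902, and the communication interface 903 may be connected to one another through an internal bus 904, or may communicate by other means such as wireless transmission. The embodiments of this application take connection through the bus 904 as an example. The bus 904 may be a peripheral component interconnect (PCI) bus, a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. In addition to a data bus, the bus 904 may also include an address bus, a power bus, a control bus, a status signal bus, and so on. For clarity, however, only one thick line is used in the figure and all the buses are labeled as the bus 904, which does not mean that there is only one bus or one type of bus.
The processor 901 may consist of at least one general-purpose processor, such as a central processing unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), or any combination thereof. The processor 901 executes various types of digitally stored instructions, such as software or firmware programs stored in the memory 902, which enable the computer system 900 to provide a variety of services.
The memory 902 is used to store program code, and execution of the program code is controlled by the processor 901.
The memory 902 may include a volatile memory, such as a random access memory (RAM). The memory 902 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD). The memory 902 may also include a combination of the above types. The memory 902 may store program code, which can specifically be used to execute any embodiment of the network storage method in FIG. 7 or any embodiment in FIG. 8. Details are not repeated here.
The communication interface 903 includes at least one PCIe interface and other communication interfaces, which may be internal interfaces, wired interfaces (such as an Ethernet interface), or wireless interfaces (such as a cellular network interface or a wireless local area network interface), and is used to communicate with other devices or modules. The DPU 302 is connected to the processor 901 of the host 301 through the PCIe interface of the host 301, so as to execute any embodiment of the network storage method in FIG. 7 or any embodiment of reporting LUNs in FIG. 8.
It should be noted that FIG. 9 is only one possible implementation of the embodiments of this application. In practice, the computer system 900 may include more or fewer components, which is not limited here. For content not shown or not described in this embodiment, refer to the related description in any embodiment of FIG. 7 or FIG. 8. Details are not repeated here.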
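An embodiment of this application further provides a computer-readable storage medium. The computer-readable storage medium stores instructions, and when the instructions are run on a processor, the method in any embodiment of FIG. 7 or FIG. 8 is implemented.
An embodiment of this application further provides a computer program product. When the computer program product runs on a processor, the method in any embodiment of FIG. 7 or FIG. 8 is implemented.
A person of ordinary skill in the art can understand that all or part of the procedures in the methods of the foregoing embodiments can be completed by a computer program instructing related hardware. The program can be stored in a computer-readable storage medium, and when executed, the program may include the procedures of the embodiments of the foregoing methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
What is disclosed above is merely a preferred embodiment of this application and certainly cannot be used to limit the scope of rights of this application. A person of ordinary skill in the art can understand all or part of the procedures for implementing the foregoing embodiments, and equivalent changes made in accordance with the claims of this application still fall within the scope covered by the invention.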
Claims (13)
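- A network storage method, applied to a storage system, wherein the storage system comprises a host, a data processing unit, and a storage device, the data processing unit is connected to the host through a peripheral component interconnect express (PCIe) interface, and the storage device comprises a plurality of controllers and a plurality of logical units; and the method comprises: sending, by the host, an input/output request to the data processing unit, wherein the input/output request is used to access one of the plurality of logical units; and determining, by the data processing unit from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and sending the input/output request to the determined controller.
- The method according to claim 1, wherein the method further comprises: the data processing unit communicates with the host through the NVMe protocol, and the data processing unit communicates with the controllers in the storage device through the NVMe-oF protocol.
- The method according to claim 1 or 2, wherein the determining, by the data processing unit from the plurality of controllers, a controller for processing the input/output request comprises: determining, by the data processing unit according to a hash algorithm, the controller for processing the input/output request from the plurality of controllers, so as to balance load among the plurality of controllers.
- A storage system, wherein the storage system comprises a host, a data processing unit, and a storage device, the data processing unit is connected to the host through a PCIe interface, and the storage device comprises a plurality of controllers and a plurality of logical units; the host is configured to send an input/output request to the data processing unit, wherein the input/output request is used to access one of the plurality of logical units; and the data processing unit is configured to determine, from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and to send the input/output request to the determined controller.
- The system according to claim 4, wherein the data processing unit is further configured to: communicate with the host through the NVMe protocol; and communicate with the controllers in the storage device through the NVMe-oF protocol.
- The system according to claim 4 or 5, wherein the determining, from the plurality of controllers, a controller for processing the input/output request comprises: determining, according to a hash algorithm, the controller for processing the input/output request from the plurality of controllers, so as to balance load among the plurality of controllers.
- The system according to claim 4 or 5, wherein the data processing unit is further configured to: send a report command to the storage device, wherein the report command instructs the storage device to send information about the logical units corresponding to the host to the data processing unit; receive the information about the logical units corresponding to the host that is sent by the controllers in the storage device; and generate corresponding devices for the logical units corresponding to the host according to the information about the logical units corresponding to the host, and send information about the generated devices to the host.
- The system according to claim 7, wherein the host is further configured to: generate, according to the information about the device sent by the data processing unit, a virtual storage device corresponding to the device, wherein the virtual storage device is provided for access by applications in the host.
- A data processing unit, wherein the data processing unit is configured to: receive an input/output request sent by a host, wherein the data processing unit is connected to the host through a PCIe interface, the input/output request is used to access one logical unit in a storage device, and the storage device comprises a plurality of logical units and a plurality of controllers; and determine, from the plurality of controllers, a controller for processing the input/output request, so as to balance load among the plurality of controllers, and send the input/output request to the determined controller.
- The data processing unit according to claim 9, wherein the data processing unit is further configured to: communicate with the host through the NVMe protocol; and communicate with the controllers in the storage device through the NVMe-oF protocol.
- The data processing unit according to claim 9 or 10, wherein the data processing unit is further configured to: determine, according to a hash algorithm, the controller for processing the input/output request from the plurality of controllers, so as to balance load among the plurality of controllers.
- The data processing unit according to claim 9 or 10, wherein the data processing unit is further configured to: send a report command to the storage device, wherein the report command instructs the storage device to send information about the logical units corresponding to the host to the data processing unit; receive the information about the logical units corresponding to the host that is sent by the controllers in the storage device; and generate corresponding devices for the logical units corresponding to the host according to the information about the logical units corresponding to the host, and send information about the generated devices to the host.
- A computer system, comprising a host and the data processing unit according to any one of claims 9 to 12, wherein the data processing unit is connected to the host through a PCIe interface.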
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP22885925.2A EP4407952A4 (en) | 2021-10-27 | 2022-10-25 | NETWORK STORAGE METHOD, STORAGE SYSTEM, DATA PROCESSING UNIT AND COMPUTER SYSTEM |
| US18/647,189 US12602344B2 (en) | 2021-10-27 | 2024-04-26 | Network storage method, storage system, data processing unit, and computer system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202111258240.8A CN116032930A (zh) | 2021-10-27 | 2021-10-27 | Network storage method, storage system, data processing unit, and computer system |
| CN202111258240.8 | 2021-10-27 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US18/647,189 Continuation US12602344B2 (en) | 2021-10-27 | 2024-04-26 | Network storage method, storage system, data processing unit, and computer system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2023072048A1 (zh) | 2023-05-04 |
Family
ID=86071119
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2022/127306 Ceased WO2023072048A1 (zh) | 2021-10-27 | 2022-10-25 | Network storage method, storage system, data processing unit, and computer system |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US12602344B2 (zh) |
| EP (1) | EP4407952A4 (zh) |
| CN (1) | CN116032930A (zh) |
| WO (1) | WO2023072048A1 (zh) |
Families Citing this family (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116301663B (zh) * | 2023-05-12 | 2024-06-21 | 新华三技术有限公司 | Data storage method, apparatus, and host |
| US20250004831A1 (en) * | 2023-06-30 | 2025-01-02 | Vmware, Inc. | System and method for operating a hardware watchdog timer in a data processing unit |
| CN116880759B (zh) * | 2023-07-13 | 2024-08-09 | 北京大禹智芯科技有限公司 | DPU-based NVMe system and startup method thereof |
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20100115174A1 (en) * | 2008-11-05 | 2010-05-06 | Aprius Inc. | PCI Express Load Sharing Network Interface Controller Cluster |
| US20190004942A1 (en) * | 2016-01-21 | 2019-01-03 | Hitachi, Ltd. | Storage device, its controlling method, and storage system having the storage device |
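| CN109298839A (zh) * | 2018-10-26 | 2019-02-01 | 深圳大普微电子科技有限公司 | PIS-based storage device controller, storage device, system, and method |
| CN110572459A (zh) * | 2019-09-10 | 2019-12-13 | 苏州浪潮智能科技有限公司 | Storage device control method and apparatus, electronic device, and storage medium |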
| US20200073840A1 (en) * | 2018-09-05 | 2020-03-05 | Fungible, Inc. | Dynamically changing configuration of data processing unit when connected to storage device or computing device |
| CN112820337A (zh) * | 2019-11-18 | 2021-05-18 | 三星电子株式会社 | Memory controller, memory system, and operating method of memory system |
Family Cites Families (11)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| JP5246157B2 (ja) * | 2007-04-04 | 2013-07-24 | 富士通株式会社 | 負荷分散システム |
| US9323658B2 (en) * | 2009-06-02 | 2016-04-26 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Multi-mapped flash RAID |
| US9229654B2 (en) * | 2013-08-29 | 2016-01-05 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Input/output request shipping in a storage system with multiple storage controllers |
| US10275160B2 (en) * | 2015-12-21 | 2019-04-30 | Intel Corporation | Method and apparatus to enable individual non volatile memory express (NVME) input/output (IO) Queues on differing network addresses of an NVME controller |
| US10929309B2 (en) * | 2017-12-19 | 2021-02-23 | Western Digital Technologies, Inc. | Direct host access to storage device memory space |
| US11687277B2 (en) * | 2018-12-31 | 2023-06-27 | Micron Technology, Inc. | Arbitration techniques for managed memory |
| US11652746B1 (en) * | 2020-03-27 | 2023-05-16 | Amazon Technologies, Inc. | Resilient consistent hashing for a distributed cache |
| US11494115B2 (en) * | 2020-05-13 | 2022-11-08 | Alibaba Group Holding Limited | System method for facilitating memory media as file storage device based on real-time hashing by performing integrity check with a cyclical redundancy check (CRC) |
| US11151071B1 (en) * | 2020-05-27 | 2021-10-19 | EMC IP Holding Company LLC | Host device with multi-path layer distribution of input-output operations across storage caches |
| US11467994B2 (en) * | 2020-12-11 | 2022-10-11 | Hewlett Packard Enterprise Development Lp | Identifiers for connections between hosts and storage devices |
| US11656987B2 (en) * | 2021-10-18 | 2023-05-23 | Dell Products L.P. | Dynamic chunk size adjustment for cache-aware load balancing |
-
2021
- 2021-10-27 CN CN202111258240.8A patent/CN116032930A/zh active Pending
-
2022
- 2022-10-25 EP EP22885925.2A patent/EP4407952A4/en active Pending
- 2022-10-25 WO PCT/CN2022/127306 patent/WO2023072048A1/zh not_active Ceased
-
2024
- 2024-04-26 US US18/647,189 patent/US12602344B2/en active Active
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4407952A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| US12602344B2 (en) | 2026-04-14 |
| CN116032930A (zh) | 2023-04-28 |
| US20240273050A1 (en) | 2024-08-15 |
| EP4407952A4 (en) | 2024-12-25 |
| EP4407952A1 (en) | 2024-07-31 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22885925; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2022885925; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2022885925; Country of ref document: EP; Effective date: 20240424 |
| | NENP | Non-entry into the national phase | Ref country code: DE |