WO2018059077A1 - Computer system and storage access device - Google Patents
Computer system and storage access device
- Publication number
- WO2018059077A1 (application PCT/CN2017/092816)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- storage
- access device
- storage access
- resource
- request
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0806—Multiuser, multiprocessor or multiprocessing cache systems
- G06F12/0813—Multiuser, multiprocessor or multiprocessing cache systems with a network or matrix configuration
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
- G06F3/0611—Improving I/O performance in relation to response time
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0629—Configuration or reconfiguration of storage systems
- G06F3/0631—Configuration or reconfiguration of storage systems by allocating resources to storage systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0662—Virtualisation aspects
- G06F3/0665—Virtualisation aspects at area level, e.g. provisioning of virtual or logical volumes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5061—Partitioning or combining of resources
- G06F9/5077—Logical partitioning of resources; Management or configuration of virtualized resources
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45579—I/O management, e.g. providing access to device drivers or storage
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/44—Arrangements for executing specific programs
- G06F9/455—Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
- G06F9/45533—Hypervisors; Virtual machine monitors
- G06F9/45558—Hypervisor-specific management and integration aspects
- G06F2009/45595—Network integration; Enabling network access in virtual machine instances
Definitions
- The present invention relates to storage technology, and more particularly to a computer system and a storage access device.
- Virtualization technology is increasingly widely used. There is a growing need to improve resource utilization through network and storage virtualization and to improve the performance of virtual machines accessing networks and storage.
- Existing virtualization technology manages virtual storage resources through a virtualization layer (hypervisor) or a virtual machine manager (VMM), which encapsulates the mounted storage resources into virtual hard disks and assigns them to different VMs.
- The path by which a virtual machine (VM) accesses its allocated storage resources is complex. A request must travel from the front-end access interface deployed on the VM to the back-end access interface on the virtualization layer or VMM (the access interfaces generally run in kernel mode). The back-end access interface then forwards it to the storage resource scheduling module deployed on the virtualization layer or VMM, which performs the actual scheduling or locating of physical storage resources (the storage resource scheduling module generally runs in user mode). Only then can the storage access request be forwarded to the physical storage resource.
- The storage resource access method described above has a long, complicated access path and a large delay. Moreover, every access request must pass through the front-end access interface of the virtual machine, the back-end access interface of the virtualization layer or VMM, and the storage resource scheduling module, all of which consume CPU resources on the host and affect the host's CPU usage.
- The embodiments of the invention provide a computer system and a storage access device that give the virtual machine direct access to storage resources, shorten the path and delay of storage access, and reduce the CPU resources consumed on the computing node.
- An embodiment of the present invention provides a computer system including n computing nodes, n storage access devices, and m network storage devices, where n and m are integers greater than or equal to 1. At least one virtual machine (VM) runs on each computing node, and the network storage devices provide distributed storage resources for the at least one virtual machine. Each computing node includes a processor, a memory, and a storage access device.
- Each storage access device includes a hardware processing unit, a Peripheral Component Interconnect Express (PCIe) interface, and a network interface. One end of the storage access device is connected to the processor of at least one computing node through the PCIe interface, and the other end is connected to at least one network storage device through the network interface.
- The storage access device in the present application supports single-root I/O virtualization (SR-IOV): it configures at least one virtual function (VF) through the physical function (PF) of the SR-IOV function and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, where one VM corresponds to one VF. The storage access device further supports a distributed storage resource scheduling function: it acquires, through the network interface, the data block resources provided by the connected at least one network storage device, composes the acquired data block resources into multiple virtual volumes, and configures the association between VFs and virtual volumes, where one VF corresponds to at least one virtual volume.
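The two association relationships described above (one VM to one VF, one VF to at least one virtual volume) can be pictured as a pair of lookup tables. The following is only a minimal sketch under stated assumptions: the class name `AssociationTable`, its methods, and the string and integer identifiers are hypothetical and do not come from the application.

```python
from dataclasses import dataclass, field


@dataclass
class AssociationTable:
    """Hypothetical record of VM-to-VF and VF-to-volume associations."""
    vm_to_vf: dict = field(default_factory=dict)        # VM id -> VF index
    vf_to_volumes: dict = field(default_factory=dict)   # VF index -> volume ids

    def associate_vm(self, vm_id, vf_index):
        # One VM corresponds to one VF: refuse a second VF for the same VM
        # and refuse a VF that is already bound to another VM.
        if vm_id in self.vm_to_vf:
            raise ValueError(f"{vm_id} already has a VF")
        if vf_index in self.vm_to_vf.values():
            raise ValueError(f"VF {vf_index} is already associated")
        self.vm_to_vf[vm_id] = vf_index
        self.vf_to_volumes.setdefault(vf_index, [])

    def attach_volume(self, vf_index, volume_id):
        # One VF corresponds to at least one virtual volume, so a VF may
        # accumulate several volumes over time.
        if vf_index not in self.vf_to_volumes:
            raise KeyError(f"VF {vf_index} is not associated with any VM")
        self.vf_to_volumes[vf_index].append(volume_id)

    def volumes_for_vm(self, vm_id):
        # Resolve VM -> VF -> volumes in two table lookups.
        return list(self.vf_to_volumes[self.vm_to_vf[vm_id]])
```

Keeping the two tables separate mirrors the split of responsibilities in the text: the SR-IOV side owns the VM-to-VF table, and the resource scheduling side owns the VF-to-volume table.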
- The storage access device provided by the present application can establish a pass-through channel directly between the storage resource and the virtual machine.
- The storage access method it supports eliminates the front-end software stack through which a VM accesses storage resources in existing cloud computing virtualization technology. This shortens the software stack path, reduces delay, and improves performance, and it avoids consuming excessive host resources (the CPU in the computing node), which improves the resource utilization of the host.
- The PF back-end driver is deployed in the storage access device provided by the application, and the PF front-end driver is deployed in the computing node connected to the storage access device. After the storage access device is started, it loads the PF back-end driver.
- The computing node connected to the storage access device loads the PF front-end driver, acquires resource information of the storage access device through the PF front-end driver, and, according to that resource information, issues a configuration command to the storage access device so that the storage access device performs resource configuration and allocates corresponding hardware resources to the PF and each VF.
- The storage access device provided by the present application performs the association between a VM and a VF as follows: after receiving a first VM association command sent by an upper-layer application, the computing node connected to the storage access device forwards the command through the PF front-end driver module to the PF back-end driver module; the storage access device receives the first VM association command through the PF back-end driver module, configures a corresponding first VF for the first VM specified in the command, and records the association between the first VM and the first VF.
- The storage access device performs the association between a VF and a virtual volume as follows: the storage access device receives an allocation request for allocating storage resources to the first VM (the storage access device may provide a management interface to the management layer, such as a command line interface (CLI) or a web user interface (Web UI)), determines the first VF associated with the first VM, allocates at least one virtual volume from the plurality of virtual volumes to the first VM, establishes an association between the allocated at least one virtual volume and the first VF, and returns an allocation response that includes information about the at least one virtual volume allocated to the first VM.
- The storage access device handles a VM's read/write request as follows: the first VF of the storage access device receives the I/O request of the first VM through the pass-through channel and places the I/O request in the I/O queue of the first VF; the storage access device then determines, according to the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request, and performs a read or write operation on that data block.
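The read/write flow just described (receive on the VF, enqueue, resolve via the associated virtual volume, then read or write) can be sketched in a few lines. This is an illustrative model only: `IORequest`, `VirtualFunction`, `process_queue`, and the dictionary standing in for the network storage devices are all assumptions, and a real device would do this in hardware or firmware rather than Python.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class IORequest:
    op: str            # "read" or "write"
    block: int         # block index within the virtual volume
    data: bytes = b""  # payload for writes


class VirtualFunction:
    """Hypothetical VF with its own I/O queue and one associated volume."""

    def __init__(self, volume_id):
        self.volume_id = volume_id
        self.queue = deque()

    def submit(self, request):
        # The VM hands the request to its VF over the pass-through channel;
        # the VF places it in its own I/O queue.
        self.queue.append(request)


def process_queue(vf, backend):
    """Drain the VF queue against `backend`, a dict keyed by
    (volume_id, block) that stands in for the network storage devices."""
    results = []
    while vf.queue:
        req = vf.queue.popleft()
        key = (vf.volume_id, req.block)  # resolve via the associated volume
        if req.op == "write":
            backend[key] = req.data
            results.append(None)
        elif req.op == "read":
            results.append(backend.get(key, b""))
        else:
            raise ValueError(f"unsupported operation: {req.op}")
    return results
```

For example, submitting a write of `b"hello"` to block 7 followed by a read of block 7 yields `[None, b"hello"]` from `process_queue`.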
- The present application further provides a storage access device that includes a hardware processing unit, a PCIe interface, and a network interface. One end of the storage access device is connected to the processor of a computing node through the PCIe interface, and the other end is connected to at least one network storage device through the network interface.
- The storage access device includes a pass-through module that supports single-root I/O virtualization (SR-IOV). The pass-through module virtualizes at least one virtual function (VF) through the physical function (PF) of the SR-IOV function and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, where one VM corresponds to one VF.
- The storage access device further includes a resource scheduler that supports a distributed storage resource scheduling function. The resource scheduler acquires, through the network interface, the data block resources provided by the connected at least one network storage device, composes the acquired data block resources into multiple virtual volumes, and configures the association between VFs and virtual volumes, where one VF corresponds to at least one virtual volume.
- The present application further provides a storage access device including a hardware processing unit, a first interface (e.g., a PCIe interface), and a second interface (e.g., a network interface).
- The hardware processing unit in the storage access device provided by the application is configured to support SR-IOV and the distributed storage resource scheduling function. It virtualizes at least one VF through the PF of the SR-IOV function and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, where one VM corresponds to one VF. It also acquires, through the network interface, the data block resources provided by the connected at least one network storage device, composes the acquired data block resources into a plurality of virtual volumes, and configures the association between VFs and virtual volumes, where one VF corresponds to at least one virtual volume.
- The hardware processing unit in the storage access device provided by the application is used to execute the storage access method specific to the application: after the first VM issues an I/O request, the first VF of the storage access device receives the I/O request through the pass-through channel and places it in the I/O queue of the first VF; the processing unit determines, according to the virtual volume associated with the first VF, the data block in the network storage device corresponding to the I/O request, and performs a read or write operation on that data block.
- The storage access method implemented by the storage access device of the present application eliminates the need for the front-end and back-end interfaces in the virtualization middle layer, which shortens the VM access path and the software stack path, reduces access delay, and improves storage access performance. At the same time, it does not consume excessive host resources (the CPU in the computing node), which improves the resource utilization of the host.
- FIG. 1 is a schematic structural diagram of a computer system provided by the present application.
- FIG. 2 is a schematic diagram showing the details of the composition of the computer system provided by the present application.
- FIG. 3 is a schematic structural diagram of a storage access apparatus provided by the present application.
- FIG. 1 is a schematic diagram of a computer system for performing the storage access method of the present application, showing the hardware layer and the software layer of the computer system. The hardware layer includes n computing nodes (in actual service, n may be any positive integer greater than or equal to 1), n storage access devices, and m network storage devices (in actual service, m may be any positive integer greater than or equal to 1). Each computing node can be a blade server, a rack server, or another type of server that provides computing resources. Each network storage device can be a SAN storage device, including a storage array or HDD or SSD hard disks, and provides storage resources.
- The software layer of the computer system includes a virtualization layer (which may be distributed across the operating system (OS) of each computing node, or installed independently in the operating system of one of the computing nodes) and virtual machines (VMs).
- The composition of a computing node is described below, taking the computing node 11 in FIG. 1 as an example.
- The computing node 11 includes a CPU 11-1 and a memory 11-2, and may also include other hardware components, such as a network interface card (not shown).
- The CPU 11-1 may be one or more Intel processor chips. The memory 11-2 provides storage capacity for the CPU 11-1, temporarily storing the programs and data run by the CPU 11-1, and may be a random access memory (RAM), a read-only memory (ROM), a cache, or the like.
- The storage access device 11-3 is a hardware device newly provided by the present application and plays the key role in implementing the storage access method of the present application.
- The storage access device 11-3 includes a hardware processing unit, a PCIe interface, and a network interface. One end of the storage access device 11-3 is connected to the CPU 11-1 via a PCIe (Peripheral Component Interconnect Express) bus, so the device can be regarded as attached to the computing node. The other end of the storage access device 11-3 is connected to network storage devices (for example, network storage devices 21, 22, ... 2n) through a network interface (ETH/RoCE/IB, etc.).
- The processing chip of the hardware processing unit in the storage access device 11-3 can be implemented as a system-on-chip (SoC) or an application-specific integrated circuit (ASIC), or can be implemented with a CPU.
- The storage access device 11-3 may further include firmware and OS boot media, and other hardware such as a power supply and a clock.
- The storage access device 11-3 supports a storage-oriented single-root I/O virtualization (SR-IOV) function: its internal processing chip provides an interface that presents the device as an SR-IOV-capable PCIe endpoint device.
- The storage access device 11-3 includes a physical function (PF) and virtual functions (VFs).
- The PF is a PCIe function that supports the SR-IOV extended capability and is used to configure and manage the SR-IOV function. The PF is a full-featured PCIe function that can be discovered, managed, and handled like any other PCIe device.
- The PF has full configuration resources and can be used to configure or control the storage access device 11-3.
- A VF is a function associated with a physical function.
- A VF is a lightweight PCIe function that shares one or more physical resources with the physical function and with the other VFs associated with the same physical function.
- Each SR-IOV device has at least one PF, and each PF can have one or more VFs associated with it.
- The PF can create VFs through registers, which are presented in the PCIe configuration space, and each VF has its own PCIe configuration space.
- By emulating the configuration space, each VF can be presented as if it were a physically present PCIe device, and one or more virtual functions can be allocated to a virtual machine.
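The application does not say which host operating system the computing node runs. Assuming a Linux host, the kernel exposes a PF's SR-IOV capability through the sysfs attributes `sriov_totalvfs` and `sriov_numvfs` in the PCI device directory, and writing a count to `sriov_numvfs` asks the PF driver to create that many VFs. The sketch below reads and sets those attributes; the helper names are hypothetical, and the device directory is passed in so the functions can be exercised against any directory containing the two files.

```python
from pathlib import Path


def read_sriov_counts(device_dir):
    """Return (total_vfs, enabled_vfs) from a PCI device's sysfs directory."""
    device_dir = Path(device_dir)
    total = int((device_dir / "sriov_totalvfs").read_text().strip())
    enabled = int((device_dir / "sriov_numvfs").read_text().strip())
    return total, enabled


def request_vfs(device_dir, count):
    """Ask the PF driver to enable `count` VFs by writing sriov_numvfs."""
    total, enabled = read_sriov_counts(device_dir)
    if not 0 <= count <= total:
        raise ValueError(f"requested {count} VFs, device supports 0..{total}")
    numvfs = Path(device_dir) / "sriov_numvfs"
    if enabled and count and enabled != count:
        # The kernel rejects a direct change between two nonzero VF counts,
        # so disable the existing VFs first.
        numvfs.write_text("0")
    numvfs.write_text(str(count))
```

On a real host the directory would be something like `/sys/bus/pci/devices/<domain:bus:device.function>` for the PF, and writing to it requires root privileges and a PF driver that implements SR-IOV configuration.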
- The storage access device 11-3 supports SR-IOV: it configures at least one virtual function (VF) through the physical function (PF) of the SR-IOV function and configures the association between VMs and VFs to establish a pass-through channel between each associated VM and VF, where one VM corresponds to one VF.
- The storage access device 11-3 further supports a distributed storage resource scheduling function: it acquires, through the network interface, the data block resources provided by the connected at least one network storage device, composes the acquired data block resources into multiple virtual volumes, and configures the association between VFs and virtual volumes, where one VF corresponds to at least one virtual volume.
- The storage resources in the network storage devices are encapsulated into virtual volumes, which are passed through directly to the VM via the VF.
- The virtual volume in the present application may also be referred to as a virtual disk. It is a storage resource provided to the virtual machine and is generally obtained by integrating physical storage resources.
- The storage access device 11-3 provided by the present application can establish a pass-through channel directly between the storage resource and the VM. The storage access method it supports does not need the front-end software stack through which a VM accesses storage resources in existing cloud computing virtualization technology. This shortens the software stack path, reduces latency, and improves performance, and it avoids consuming excessive host resources (the CPU in the computing node), which improves the resource utilization of the host.
- The devices 21 to 2n are network storage devices, to which the storage access device 11-3 is connected through its network interface.
- The network storage devices 21 to 2n can form a distributed storage resource pool, and the capacity of the storage resource pool can be expanded as needed.
- Each network storage device includes a storage controller and a storage medium, implements hard disk management and underlying data management, and presents data blocks and a data block call interface upward. Each network storage device can be deployed remotely rather than locally at the computing node.
- The network storage devices 21 to 2n support distributed storage modes, including multiple copies or erasure coding (EC), to ensure the reliability of storage resources.
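The application does not specify which erasure code is used. As a purely illustrative example of the idea, the sketch below uses the simplest possible scheme, a single XOR parity block, which tolerates the loss of any one block in a stripe; production systems more commonly use Reed-Solomon style codes that tolerate several losses. The function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a non-empty list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("blocks must be non-empty and equally sized")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def encode(data_blocks):
    """Return the stripe: the data blocks followed by one parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def recover(stripe, lost_index):
    """Rebuild the block at `lost_index` from the surviving blocks."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

Compared with keeping multiple full copies, a parity scheme like this stores less redundant data (one extra block per stripe here) at the cost of extra computation on rebuild.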
- FIG. 2 is a schematic diagram of the composition of the software modules of the computer system of the present application. Only one storage access device is illustrated in FIG. 2 as an example.
- The storage access device 11-3 includes an operating system 11-30, a resource scheduler 11-31, and a pass-through module 11-32. A PF back-end driver 11-a is loaded in the operating system 11-30, and the pass-through module 11-32 includes a PF and multiple VFs.
- The PF front-end driver 11-b is provided in the operating system of the computing node 11.
- The VF driver module is loaded in the guest operating system of the virtual machine 31.
- The pass-through module 11-32 of the storage access device 11-3 supports single-root I/O virtualization (SR-IOV): it configures at least one virtual function (VF) through the physical function (PF) of the SR-IOV function and configures the association between VMs and VFs so as to establish a pass-through channel between each associated VM and VF.
- The storage access device 11-3 loads the PF back-end driver 11-a to initialize the SR-IOV function. The computing node 11 loads the PF front-end driver 11-b and, through the PF front-end driver 11-b, acquires resource information of the storage access device, configures queue or interrupt resources for the storage access device, and implements communication with the storage access device 11-3.
- The resource scheduler 11-31 of the storage access device 11-3 further supports a distributed storage resource scheduling function: it acquires, through the network interface, the data block resources provided by the connected at least one network storage device, composes the acquired data block resources into a plurality of virtual volumes, configures the virtual volume corresponding to each VF, and records the association between the VF and its corresponding virtual volume.
- The resource scheduler 11-31 encapsulates data block resources into virtual volumes by accessing and calling the data blocks of the network storage devices and the metadata of those data blocks, and configures the VF so that a mapping is formed between the VF and the virtual volume.
- The resource scheduler 11-31 is also responsible for managing the metadata of the virtual volumes and can, for a given virtual volume, query the specific physical address of a data block in the network storage device.
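One way to picture the volume metadata described above is as an ordered list of extents, each naming a network storage device and a run of physical blocks. The sketch below translates a logical block number within a virtual volume into a (device, physical block) pair. The extent layout, the `Extent` and `VirtualVolume` names, and the device labels are assumptions for illustration; the application does not define the metadata format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    device: str        # network storage device holding the blocks
    start_block: int   # first physical block of the run on that device
    length: int        # number of consecutive blocks in the run


class VirtualVolume:
    """Hypothetical virtual volume composed of extents from several devices."""

    def __init__(self, extents):
        self.extents = list(extents)

    @property
    def size(self):
        return sum(e.length for e in self.extents)

    def locate(self, logical_block):
        """Map a logical block number to (device, physical block)."""
        if logical_block < 0:
            raise IndexError("logical block must be non-negative")
        offset = logical_block
        for extent in self.extents:
            if offset < extent.length:
                return extent.device, extent.start_block + offset
            offset -= extent.length
        raise IndexError(f"block {logical_block} is beyond the volume")
```

With a volume built from `Extent("dev-21", 100, 4)` followed by `Extent("dev-22", 0, 8)`, logical block 3 resolves to `("dev-21", 103)` and logical block 4 resolves to `("dev-22", 0)`.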
- A management agent unit (not shown in FIG. 2) may also be deployed on the storage access device 11-3. The management agent unit provides a management interface connected to the management layer of the computer system (the management layer may be a cloud management component, a storage management component, or another component for managing the computer system). The management layer of the computer system may determine the allocation of virtual machine resources according to a command from an upper-layer application or according to a preset management policy, and may initiate, through the management interface, an allocation request for allocating storage resources to a VM.
- The allocation request for allocating storage resources to a VM may be a request to create a volume or to mount a volume.
- The present application uses a first virtual machine to denote an arbitrary virtual machine.
- The storage access device 11-3 receives, through the management interface, an allocation request for allocating storage resources to the first VM, determines the first VF associated with the first VM, allocates at least one virtual volume from the plurality of virtual volumes to the first VM, establishes an association between the allocated at least one virtual volume and the first VF, and returns an allocation response that includes information about the at least one virtual volume allocated to the first VM.
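The allocation exchange described above (request in, VF lookup, volume selection, association, response out) could be handled roughly as follows. This is a sketch under assumptions: the request and response are modelled as plain dictionaries with hypothetical field names (`vm`, `count`, `status`, `volumes`), and free volumes are simply taken from the front of a list.

```python
def handle_allocation_request(request, vm_to_vf, free_volumes, vf_to_volumes):
    """Allocate virtual volumes to a VM and return an allocation response.

    request       -- dict with "vm" and an optional "count" (default 1)
    vm_to_vf      -- dict mapping VM id to its associated VF index
    free_volumes  -- list of unallocated virtual volume ids (mutated)
    vf_to_volumes -- dict mapping VF index to its volume ids (mutated)
    """
    vm_id = request["vm"]
    count = request.get("count", 1)
    if vm_id not in vm_to_vf:
        return {"status": "error", "reason": "no VF associated with VM"}
    if count < 1 or count > len(free_volumes):
        return {"status": "error", "reason": "insufficient virtual volumes"}
    vf = vm_to_vf[vm_id]                        # determine the VM's VF
    allocated = [free_volumes.pop(0) for _ in range(count)]
    vf_to_volumes.setdefault(vf, []).extend(allocated)  # record association
    return {"status": "ok", "vm": vm_id, "vf": vf, "volumes": allocated}
```

Returning the allocated volume identifiers in the response matches the text's requirement that the allocation response carry information about the virtual volumes allocated to the first VM.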
- Because the pass-through module 11-32 and the resource scheduler 11-31 of the storage access device respectively implement the association between the VM and the VF and the association between the VF and the virtual volume, pass-through between the VM and the virtual volume is effectively achieved.
- When the first VM issues an I/O request, the first VF of the storage access device 11-3 receives the I/O request of the first VM through the pass-through channel, and the I/O request is placed in the I/O queue of the first VF. The resource scheduler of the storage access device 11-3 determines, according to the virtual volume associated with the first VF, the data block in the network storage device corresponding to the I/O request, and performs a read or write operation on that data block.
- The foregoing storage access method of the present application eliminates the need for the front-end interface in the virtualization middle layer, which shortens the VM access path and the software stack path, reduces access delay, and improves storage access performance. At the same time, it does not consume excessive host resources (the CPU in the computing node), which improves the resource utilization of the host.
- VF driver: each VM is deployed with a VF driver.
- The VF driver can be a SCSI driver or an NVMe driver and is deployed inside the VM.
- Through the VF driver, an I/O request is passed directly between the VM and the VF associated with the VM. When the storage access device has performed the operation for the I/O request, the result is likewise returned through the VF driver to the VM that issued the I/O request.
- Process 1, initial configuration of the storage access device and the computing node to establish communication: after the storage access device 11-3 boots, it loads the PF back-end driver 11-a and performs initialization.
- Initialization may include register initialization for the PF and VFs, configuration of the in-band/out-of-band address translation unit mappings, doorbell address configuration, DMA (Direct Memory Access) initialization, and Base Address Register (BAR) initialization.
- The purpose of this initialization is to ensure that the computing node 11 can enumerate and recognize the storage access device 11-3 as one of its PCIe endpoint devices, so that interrupts can be delivered and memory accessed between the device and the hardware of the computing node 11.
- The computing node 11 loads the PF front-end driver 11-b, obtains resource information from the storage access device 11-3, and configures resources such as interrupts, queues, and base address registers for the PF or VFs in the storage access device 11-3, thereby establishing communication with the device.
- The PF in the storage access device 11-3 can be used to enable single-root I/O virtualization, to query and allocate the storage media in the storage device, and to maintain the resource allocation table.
- The computing node 11 can load the PF front-end driver 11-b, enable the SR-IOV function, create a management queue, and send a resource query command to the PF through that management queue.
- After receiving the query command, the PF can return to the computing node 11 the state of the storage media, interrupt resources, and queue resources of the storage device.
- After receiving that state, the computing node 11 may send an allocation command to the PF. According to the command, the PF divides the unified storage resource into multiple storage sub-resources and allocates the storage sub-resources, queue resources, and interrupt resources to the PF or the VFs.
- Process 2, configuring the association between a VM and a VF: the association may be made when the VM is created or at any time afterwards.
- The VM management module or an upper-layer application sends a VM association command through the PF front-end driver 11-b on the computing node 11 to the PF back-end driver 11-a of the storage access device 11-3. The pass-through module 11-32 of the storage access device 11-3 selects a VF, associates it with the VM (that is, establishes a mapping between the VF and the VM), records the association between the specified VM and the corresponding VF, and returns an association-success message to the PF front-end driver 11-b.
- The PF front-end driver 11-b then returns the association-success message to the VM management module or the upper-layer application.
- Process 3, composing the data blocks on the network storage devices into virtual volumes: this process has no required ordering relative to process 2 (VM-to-VF association).
- The resource scheduler 11-31 obtains, through the network interface, the storage resources provided by the distributed network storage devices, that is, the data block resources of each network storage device. It composes the obtained data block resources into multiple virtual volumes and records the correspondence between each virtual volume and the physical addresses of its data blocks (metadata management of the virtual volume).
- Process 4, configuring the association between a VF and a virtual volume: the following takes association during volume mounting as an example (the association can also be made during volume creation).
- A user or an upper-layer application issues a command to mount a volume for the first VM. The storage access device receives the mount command through the management interface and determines the first VF associated with the first VM. The resource scheduler 11-31 selects at least one virtual volume for the first VM from the virtual volumes composed in process 3, establishes an association between the selected virtual volume(s) and the first VF, and returns a mount response.
- The mount response includes information about the at least one virtual volume allocated to the first VM.
- Process 5, handling a VM's I/O request: the first VM issues an I/O request. The first VF receives the request through the pass-through channel and places it in the I/O queue of the first VF. The resource scheduler 11-31 determines, from the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request, and performs the read or write operation on that data block.
- After the resource scheduler 11-31 completes the operation, the result is returned through the VF driver to the first VM that issued the I/O request.
- As shown in FIG. 3, an embodiment of the present invention further provides a storage access device 300.
- The storage access device 300 includes a first interface 301, a second interface 302, and a processing unit 303.
- The first interface 301 is a PCIe interface for connecting to the CPU of the computing node.
- The second interface 302 is a network interface for connecting to the network storage devices.
- The processing unit 303 can be implemented as a hardware processing circuit, such as a system on chip (SoC), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
- The processing unit 303 may also be implemented as a combination of a CPU and memory (containing the corresponding code).
- The storage access device shown in FIG. 3 implements the functions of the storage access device 11-3 described with reference to FIG. 1 and FIG. 2; the details are not repeated here.
- The disclosed systems, devices, and methods may be implemented in other ways.
- The device embodiments described above are merely illustrative.
- The division into units is only a division by logical function.
- In an actual implementation there may be other divisions: for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
- The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.
- Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
- The functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit.
- An integrated unit can be implemented in hardware or as a software functional unit.
- If an integrated unit is implemented as a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium.
- The technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product stored in a storage medium.
- The software product includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
- The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
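A computer system and a storage access device. The storage access device configures at least one virtual function (VF) through the physical function (PF) of its SR-IOV capability and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF. Through its network interface, it obtains the data block resources provided by at least one connected network storage device, composes the obtained data block resources into multiple virtual volumes, and configures the association between VFs and virtual volumes. Because the storage access device supports single-root I/O virtualization (SR-IOV) and distributed storage resource scheduling, it can establish a pass-through channel directly between storage resources and virtual machines. The storage access method it supports shortens the storage access path and latency and reduces the load on the compute node's CPU.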
Description
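The present invention relates to storage technologies, and in particular to a computer system and a storage access device.
Virtualization is being applied ever more widely, and there is growing demand to improve resource utilization through network and storage virtualization and to improve the performance with which virtual machines access networks and storage.
In existing virtualization technology, virtual storage resources are managed by the virtualization layer (hypervisor) or virtual machine manager (VMM), which packages the attached storage resources as virtual disks and assigns them to different VMs. The path by which a virtual machine (VM) accesses its allocated storage is complex. A front-end access interface deployed in the VM connects to a back-end access interface on the virtualization layer or VMM (the back-end interface generally runs in kernel mode). The back-end interface then forwards the request to a storage resource scheduling module deployed on the virtualization layer or VMM, which schedules or locates the actual physical storage resource (the scheduling module generally runs in user mode). Only then can the storage access request be forwarded to the physical storage resource.
This access method has a complex, long path and high latency. In addition, every access request must pass through the VM's front-end access interface, the back-end access interface of the virtualization layer or VMM, and the storage resource scheduling module, all of which consume host CPU resources and raise the host's CPU utilization.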
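Summary of the Invention
Embodiments of the present invention provide a computer system and a storage access device that give virtual machines pass-through access to storage resources, shortening the storage access path and latency and reducing the load on the compute node's CPU.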
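In a first aspect, an embodiment of the present invention provides a computer system comprising n computing nodes, n storage access devices, and m network storage devices. At least one virtual machine (VM) runs on each computing node, the m network storage devices provide distributed storage resources for the at least one virtual machine, each computing node includes a processor, memory, and a storage access device, and n and m are integers greater than or equal to 1.
Each storage access device includes a hardware processing unit, a Peripheral Component Interconnect Express (PCIe) interface, and a network interface. One end of the storage access device connects to the processor of the at least one computing node through the PCIe interface, and the other end connects to the at least one network storage device through the network interface.
The storage access device of this application supports single-root I/O virtualization (SR-IOV): it configures at least one virtual function (VF) through the physical function (PF) of the SR-IOV capability, and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, with one VM corresponding to one VF. The storage access device also supports distributed storage resource scheduling: through the network interface it obtains the data block resources provided by the at least one connected network storage device, composes the obtained data block resources into multiple virtual volumes, and configures the association between VFs and virtual volumes, with one VF corresponding to at least one virtual volume.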
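Because the storage access device of this application can establish a pass-through channel directly between storage resources and virtual machines, the storage access method it supports does not need the front-end/back-end software stack through which VMs access storage in existing cloud virtualization technology. This shortens the software stack path, reduces latency, and improves performance. It also avoids consuming much of the host's resources (the CPU in the computing node), improving host resource utilization.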
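Optionally, a PF back-end driver is deployed in the storage access device, and a PF front-end driver is deployed in the computing node connected to it. After the storage access device starts, it loads the PF back-end driver and initializes. The connected computing node loads the PF front-end driver, obtains the resource information of the storage access device through it, and, based on that information, issues a configuration command to the storage access device so that the device configures its resources and allocates corresponding hardware resources to the PF and to each VF.
Optionally, the storage access device associates a VM with a VF as follows. After receiving a first VM association command from an upper-layer application, the connected computing node forwards the command through the PF front-end driver module to the PF back-end driver module. After receiving the command through the PF back-end driver module, the storage access device configures a corresponding first VF for the first VM specified in the command and records the association between the first VM and the first VF.
Optionally, the storage access device associates a VF with a virtual volume as follows. The storage access device (which can provide a management interface, for example a command-line interface (CLI) or web user interface (web UI) exposed to the management layer) receives an allocation request to allocate storage resources for the first VM, determines the first VF associated with the first VM, allocates at least one virtual volume to the first VM from the multiple virtual volumes, establishes an association between the allocated virtual volume(s) and the first VF, and returns an allocation response containing information about the at least one virtual volume allocated to the first VM.
Optionally, the storage access device handles a VM's read and write requests as follows. The first VF of the storage access device receives the first VM's I/O request through the pass-through channel and places it in the I/O queue of the first VF. Based on the virtual volume associated with the first VF, the device determines the data block in the network storage device that corresponds to the I/O request and performs the read or write operation on that data block.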
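In a second aspect, this application further provides a storage access device comprising a hardware processing unit, a PCIe interface, and a network interface. One end of the storage access device connects to the processor of a computing node through the PCIe interface, and the other end connects to at least one network storage device through the network interface.
At least one virtual machine (VM) runs on the computing node, and the at least one network storage device provides distributed storage resources for the at least one virtual machine.
The storage access device includes a pass-through module that supports SR-IOV. The pass-through module virtualizes at least one VF through the PF of the SR-IOV capability and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, with one VM corresponding to one VF. The storage access device also includes a resource scheduler that supports distributed storage resource scheduling. Through the network interface, the resource scheduler obtains the data block resources provided by the at least one connected network storage device, composes the obtained data block resources into multiple virtual volumes, and configures the association between VFs and virtual volumes, with one VF corresponding to at least one virtual volume.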
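In a third aspect, this application further provides a storage access device comprising a hardware processing unit, a first interface (for example, a PCIe interface), and a second interface (for example, a network interface).
The hardware processing unit supports SR-IOV and distributed storage resource scheduling. It virtualizes at least one VF through the PF of the SR-IOV capability and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, with one VM corresponding to one VF. Through the network interface it obtains the data block resources provided by the at least one connected network storage device, composes them into multiple virtual volumes, and configures the association between VFs and virtual volumes, with one VF corresponding to at least one virtual volume. The hardware processing unit also performs the storage access method specific to this application: after the first VM issues an I/O request, the first VF of the storage access device receives the request through the pass-through channel and places it in the I/O queue of the first VF. The processing unit then determines, from the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request and performs the read or write operation on that data block.
With the storage access method implemented by the storage access device, I/O no longer passes through the front-end and back-end interfaces in the virtualization middle layer. This shortens the VM access path and the software stack path, reduces access latency, and improves storage access performance. It also avoids consuming much of the host's resources (the CPU in the computing node), improving host resource utilization.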
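To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed for that description are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of the architecture of the computer system provided by this application;
FIG. 2 is a schematic diagram showing the components of the computer system provided by this application in detail;
FIG. 3 is a schematic structural diagram of the storage access device provided by this application.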
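The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.
Refer to FIG. 1, a schematic diagram of a computer system for performing the storage access method of this application. FIG. 1 shows the hardware layer and the software layer of the computer system. The hardware layer includes n computing nodes (in practice, n can be any positive integer), n storage access devices, and m network storage devices (in practice, m can be any positive integer). Each computing node may be a blade server, a rack server, or another type of server and provides computing resources. Each network storage device may be a SAN storage device containing a storage array or HDD or SSD drives and provides storage resources. The software layer includes a virtualization layer that performs virtualization (it may be distributed across the operating systems (OS) of the computing nodes, or deployed on its own in the OS of one computing node) and multiple virtual machines (VMs).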
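The composition of a computing node is described below using computing node 11 in FIG. 1 as an example. Computing node 11 includes a CPU 11-1 and memory 11-2, and may include other hardware such as a network interface card (not shown). The CPU 11-1 may be one or more Intel processor chips. The memory 11-2 provides storage for the CPU 11-1, temporarily holding the programs and data it runs, and may be random access memory (RAM), read-only memory (ROM), cache, and so on.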
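The storage access device 11-3 is a hardware device newly provided by this application and plays the key role in implementing its storage access method. The storage access device 11-3 includes a hardware processing unit, a PCIe (Peripheral Component Interconnect Express) interface, and a network interface. One end connects to the CPU 11-1 over the PCIe bus (the device can be regarded as a PCIe endpoint device attached to the CPU 11-1 of the computing node). The other end connects to the network storage devices (for example, network storage devices 21, 22 … 2n) through the network interface (Ethernet, RoCE, InfiniBand, and so on). The processing chip inside the storage access device 11-3 can be implemented as a system on chip (SoC), an application-specific integrated circuit (ASIC), or the like, or as a CPU. The device may also contain firmware and OS boot media, as well as other hardware such as a power supply and clock.
The storage access device 11-3 supports storage-oriented single-root I/O virtualization (SR-IOV); its internal processing chip exposes the interface of an SR-IOV-capable PCIe endpoint device. A storage access device 11-3 that supports SR-IOV includes a physical function (PF) and virtual functions (VFs). The PF is a PCIe function that supports the SR-IOV extended capability and is used to configure and manage SR-IOV features. It is a full-featured PCIe function that can be discovered, managed, and handled like any other PCIe device, and it has full configuration resources that can be used to configure or control the storage access device 11-3. A VF is a lightweight PCIe function associated with a physical function; it can share one or more physical resources with the physical function and with other VFs associated with the same physical function. Every SR-IOV device has at least one PF, and each PF can be configured with one or more associated VFs. The PF can create VFs through registers. VFs appear in PCIe configuration space, and each VF has its own PCIe configuration space. A VF appears externally as a real PCIe device, and one or more virtual functions can be assigned to a virtual machine through an emulated configuration space.
The storage access device 11-3 supports SR-IOV: it configures at least one VF through the PF of the SR-IOV capability, and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, with one VM corresponding to one VF.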
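As a concrete illustration of a PF creating VFs, a Linux host usually enables the VFs of an SR-IOV-capable PCIe device by writing to the PF's `sriov_numvfs` attribute in sysfs. The sketch below assumes a Linux host; the patent does not name a host operating system or this mechanism, and `enable_vfs`, its parameters, and the PCI address in the usage note are hypothetical. The `sysfs_root` parameter exists only so that the function can be tried against a scratch directory.

```python
from pathlib import Path


def enable_vfs(bdf, num_vfs, sysfs_root="/sys"):
    """Request `num_vfs` virtual functions on the PF at PCI address `bdf`.

    Relies on two standard Linux sysfs attributes of an SR-IOV PF:
    `sriov_totalvfs` (maximum VF count) and `sriov_numvfs` (enabled VF count).
    Returns the number of VFs requested.
    """
    dev = Path(sysfs_root) / "bus" / "pci" / "devices" / bdf
    total = int((dev / "sriov_totalvfs").read_text().strip())
    if not 0 <= num_vfs <= total:
        raise ValueError(f"device supports at most {total} VFs, asked for {num_vfs}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text().strip())
    if current == num_vfs:
        return num_vfs
    # The kernel refuses to change one non-zero VF count into another,
    # so the count is reset to 0 before the new value is written.
    if current != 0:
        numvfs.write_text("0")
    if num_vfs != 0:
        numvfs.write_text(str(num_vfs))
    return num_vfs
```

On a real host this would be called with root privileges, for example `enable_vfs("0000:3b:00.0", 4)`, where the address is a placeholder for the PF of the device in question.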
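The storage access device 11-3 also supports distributed storage resource scheduling: through the network interface it obtains the data block resources provided by the at least one connected network storage device, composes them into multiple virtual volumes, and configures the association between VFs and virtual volumes, with one VF corresponding to at least one virtual volume. Put simply, the storage resources of the network storage devices are packaged as virtual volumes and passed through to VMs via VFs.
A virtual volume in this application may also be called a virtual disk. In contrast to a physical disk, it refers to a storage resource provided to a virtual machine, generally obtained by consolidating and partitioning physical storage resources.
Because the storage access device 11-3 of this application can establish a pass-through channel directly between storage resources and VMs, the storage access method it supports does not need the front-end/back-end software stack through which VMs access storage in existing cloud virtualization technology. This shortens the software stack path, reduces latency, and improves performance. It also avoids consuming much of the host's resources (the CPU in the computing node), improving host resource utilization.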
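Devices 21 to 2n are network storage devices, to which the storage access device 11-3 connects through the network interface. Network storage devices 21 to 2n can form a distributed storage resource pool whose capacity can be expanded freely. Each network storage device includes a storage controller and storage media, handles disk management and low-level data management, and exposes data blocks and a data block access interface upward. Each network storage device can be located remotely rather than locally at the computing node. Network storage devices 21 to 2n support distributed storage schemes such as multiple replicas or erasure coding (EC), which ensure the reliability of the storage resources.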
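FIG. 2 is a schematic diagram of the software modules of the computer system of this application. FIG. 2 shows only one storage access device as an example. The storage access device 11-3 includes an operating system 11-30, a resource scheduler 11-31, and a pass-through module 11-32. The PF back-end driver 11-a is loaded in the operating system 11-30, and the pass-through module 11-32 includes a PF and multiple VFs. A PF front-end driver 11-b is provided in the operating system of the computing node 11. A VF driver module is loaded in the guest operating system of virtual machine 31.
The pass-through module 11-32 of the storage access device 11-3 supports SR-IOV: it configures at least one VF through the PF of the SR-IOV capability and configures the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF.
Specifically, after the storage access device 11-3 starts, it loads the PF back-end driver 11-a to initialize the SR-IOV function. The computing node 11 loads the PF front-end driver 11-b, obtains the resource information of the storage access device through it, and configures queue or interrupt resources for the storage access device, thereby establishing communication with the storage access device 11-3.
The resource scheduler 11-31 of the storage access device 11-3 supports distributed storage resource scheduling: through the network interface it obtains the data block resources provided by the at least one connected network storage device, composes them into multiple virtual volumes, configures the virtual volumes corresponding to each VF, and records the association between each VF and its virtual volumes.
Specifically, the resource scheduler 11-31 accesses and invokes the data blocks of the network storage devices and their metadata, packages the data block resources into virtual volumes, and configures the mapping between VFs and virtual volumes. The resource scheduler 11-31 is also responsible for managing the metadata of the virtual volumes, so that for a given virtual volume it can look up the specific physical addresses of the data blocks on the network storage devices.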
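A management agent unit (not shown in FIG. 2) may also be deployed on the storage access device 11-3. It provides a management interface that connects to the management layer of the computer system (which may be a cloud management component, a storage management component, or another component used to manage the computer system). The management layer can decide to allocate virtual machine resources based on a command from an upper-layer application or on a preset management policy. When it decides that allocation is needed, it can issue, through the management interface, an allocation request to allocate storage resources for a VM; specifically, this request may be a request to create a volume or to mount a volume. This application uses "first virtual machine" to denote any virtual machine. The storage access device 11-3 receives, through the management interface, the allocation request to allocate storage resources for the first VM, determines the first VF associated with the first VM, allocates at least one virtual volume to the first VM from the multiple virtual volumes, establishes an association between the allocated virtual volume(s) and the first VF, and returns an allocation response containing information about the at least one virtual volume allocated to the first VM.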
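Once the pass-through module 11-32 and the resource scheduler 11-31 of the storage access device have established the VM-to-VF association and the VF-to-virtual-volume association, respectively, the VM and the virtual volume are effectively connected by a pass-through path. After the first VM issues an I/O request, the first VF of the storage access device 11-3 receives the request through the pass-through channel and places it in the I/O queue of the first VF. The resource scheduler of the storage access device 11-3 then determines, from the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request and performs the read or write operation on that data block.
With the storage access method described above, I/O no longer passes through the front-end and back-end interfaces in the virtualization middle layer. This shortens the VM access path and the software stack path, reduces access latency, and improves storage access performance. It also avoids consuming much of the host's resources (the CPU in the computing node), improving host resource utilization.
Each VM is deployed with a VF driver, which can be a SCSI driver or an NVMe driver and runs inside the VM. When the VM issues an I/O request, the request is sent through the VF driver directly to the VF associated with the VM. After the storage access device completes the operation, the result of the I/O request is likewise returned through the VF driver to the VM that issued the request.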
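The flows of the operations performed by the storage access device are described in detail below with reference to FIG. 2.
1. Initial configuration of the storage access device and the computing node to establish communication: after the storage access device 11-3 starts, it loads the PF back-end driver 11-a and performs initialization. Initialization may include register initialization for the PF and VFs, configuration of the in-band/out-of-band address translation unit mappings, doorbell address configuration, DMA (Direct Memory Access) initialization, and Base Address Register (BAR) initialization. The purpose of this initialization is to ensure that the computing node 11 can enumerate and recognize the storage access device 11-3 as one of its PCIe endpoint devices, so that interrupts can be delivered and memory accessed between the device and the hardware of the computing node 11. The computing node 11 loads the PF front-end driver 11-b, obtains resource information from the storage access device 11-3, and configures resources such as interrupts, queues, and base address registers for the PF or VFs in the storage access device 11-3, thereby establishing communication with the device.
In this application, the PF in the storage access device 11-3 can be used to enable single-root I/O virtualization, to query and allocate the storage media in the storage device, and to maintain the resource allocation table. Specifically, the computing node 11 can load the PF front-end driver 11-b, enable the SR-IOV function, create a management queue, and send a resource query command to the PF through that management queue. After receiving the query command, the PF can return to the computing node 11 the state of the storage media, interrupt resources, and queue resources of the storage device. After receiving that state, the computing node 11 can send an allocation command to the PF. According to the command, the PF divides the unified storage resource into multiple storage sub-resources and allocates the storage sub-resources, queue resources, and interrupt resources to the PF or the VFs.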
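The allocation step in this paragraph, in which the PF divides the unified storage resource into sub-resources and hands out queue and interrupt resources to itself and the VFs, can be sketched as follows. This is an illustrative model only: the even-split policy, the `FunctionResources` record, and the `partition_resources` function are assumptions, not the allocation scheme defined by the patent.

```python
from dataclasses import dataclass


@dataclass
class FunctionResources:
    """Resources assigned to one PCIe function (hypothetical record)."""
    name: str             # "PF" or "VF<i>"
    capacity_blocks: int  # size of the storage sub-resource, in blocks
    queues: list          # queue ids owned by this function
    interrupts: list      # interrupt vector ids owned by this function


def partition_resources(total_blocks, num_queues, num_interrupts, num_vfs):
    """Split resources evenly across the PF and `num_vfs` VFs.

    Assumed policy: every function gets an equal share and any
    remainder stays unallocated.
    """
    functions = ["PF"] + [f"VF{i}" for i in range(num_vfs)]
    n = len(functions)
    q_each, irq_each = num_queues // n, num_interrupts // n
    if q_each == 0 or irq_each == 0:
        raise ValueError("not enough queues or interrupts for every function")
    blocks_each = total_blocks // n

    table = {}
    for idx, name in enumerate(functions):
        table[name] = FunctionResources(
            name=name,
            capacity_blocks=blocks_each,
            queues=list(range(idx * q_each, (idx + 1) * q_each)),
            interrupts=list(range(idx * irq_each, (idx + 1) * irq_each)),
        )
    return table  # plays the role of the resource allocation table
```

A weighted or on-demand policy would fit the same interface; the even split is chosen here only because it keeps the bookkeeping easy to follow.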
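2. Configuring the association between a VM and a VF: the association may be made when the VM is created or at any time afterwards. The VM management module or an upper-layer application sends a VM association command through the PF front-end driver 11-b on the computing node 11 to the PF back-end driver 11-a of the storage access device 11-3. The pass-through module 11-32 of the storage access device 11-3 selects a VF, associates it with the VM (that is, establishes a mapping between the VF and the VM), records the association between the specified VM and the corresponding VF, and returns an association-success message to the PF front-end driver 11-b. The PF front-end driver 11-b then returns the association-success message to the VM management module or the upper-layer application.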
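The bookkeeping behind flow 2 can be pictured as a small table that maps each VM to exactly one VF. The sketch below is a minimal model under stated assumptions: the class name `PassthroughModule`, the first-free selection policy, and the string VM identifiers are illustrative and not taken from the patent, which does not say how the VF is chosen.

```python
class PassthroughModule:
    """Tracks which VF is bound to which VM (one VM maps to one VF)."""

    def __init__(self, num_vfs):
        self.free_vfs = list(range(num_vfs))  # VF indexes not yet bound
        self.vm_to_vf = {}                    # recorded associations

    def associate(self, vm_id):
        """Handle a VM association command and return the chosen VF index."""
        if vm_id in self.vm_to_vf:
            # Repeating the command for the same VM returns the same VF.
            return self.vm_to_vf[vm_id]
        if not self.free_vfs:
            raise RuntimeError("no free VF available")
        vf = self.free_vfs.pop(0)  # assumed policy: lowest free index first
        self.vm_to_vf[vm_id] = vf
        return vf

    def lookup(self, vm_id):
        """Return the VF bound to `vm_id`; raises KeyError if none is recorded."""
        return self.vm_to_vf[vm_id]
```

The `lookup` method corresponds to the later step in which the device determines the first VF associated with the first VM before binding a volume.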
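3. Composing the data blocks on the network storage devices into virtual volumes: this flow has no required ordering relative to flow 2; once the storage access device 11-3 has started, either flow can run at any time. The resource scheduler 11-31 obtains, through the network interface, the storage resources provided by the distributed network storage devices, that is, the data block resources of each network storage device. It composes the obtained data block resources into multiple virtual volumes and records the correspondence between each virtual volume and the physical addresses of its data blocks (metadata management of the virtual volume).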
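The volume metadata described in flow 3 amounts to an ordered list of physical block addresses per virtual volume. The following sketch assumes a fixed 4096-byte block size, a round-robin placement across devices, and the hypothetical names `BlockAddress`, `VirtualVolume`, and `compose_volumes`; the patent does not prescribe any of these details.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4096  # assumed data block size in bytes


@dataclass(frozen=True)
class BlockAddress:
    device_id: str  # which network storage device holds the block
    block_no: int   # block index on that device


@dataclass
class VirtualVolume:
    volume_id: str
    blocks: list = field(default_factory=list)  # metadata: ordered BlockAddress list

    def resolve(self, offset):
        """Map a byte offset in the volume to (BlockAddress, offset within block)."""
        index, within = divmod(offset, BLOCK_SIZE)
        if offset < 0 or index >= len(self.blocks):
            raise IndexError("offset outside volume")
        return self.blocks[index], within


def compose_volumes(device_blocks, blocks_per_volume):
    """Group the blocks offered by each device into fixed-size volumes.

    `device_blocks` maps a device id to the number of blocks it offers.
    Blocks are taken round-robin across devices, so each volume is spread
    over several devices (an assumed placement policy).
    """
    pool = []
    remaining = dict(device_blocks)
    next_block = {dev: 0 for dev in device_blocks}
    while any(remaining.values()):
        for dev in device_blocks:
            if remaining[dev] > 0:
                pool.append(BlockAddress(dev, next_block[dev]))
                next_block[dev] += 1
                remaining[dev] -= 1

    volumes = []
    for i in range(0, len(pool) - blocks_per_volume + 1, blocks_per_volume):
        volumes.append(VirtualVolume(f"vol-{len(volumes)}", pool[i:i + blocks_per_volume]))
    return volumes
```

With two devices offering four blocks each and four blocks per volume, this yields two volumes, and `resolve` turns a byte offset into the device and block that hold it.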
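4. Configuring the association between a VF and a virtual volume: the following takes association during volume mounting as an example (the association can also be made during volume creation). A user or an upper-layer application issues a command to mount a volume for the first VM. The storage access device receives the mount command through the management interface and determines the first VF associated with the first VM. The resource scheduler 11-31 selects at least one virtual volume for the first VM from the virtual volumes composed in flow 3, establishes an association between the selected virtual volume(s) and the first VF, and returns a mount response containing information about the at least one virtual volume allocated to the first VM.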
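The mount handling in flow 4 combines the two tables built earlier: the VM-to-VF map and the pool of unassigned volumes. The sketch below shows one possible shape for it. The function `mount_volume`, the dictionary-based tables, and the fields of the response are hypothetical; the patent states only that the response carries information about the allocated volumes.

```python
def mount_volume(vm_id, vm_to_vf, free_volumes, vf_to_volumes, count=1):
    """Bind `count` free virtual volumes to the VF associated with `vm_id`.

    vm_to_vf:      dict of VM id -> VF index (the result of flow 2)
    free_volumes:  list of unassigned volume ids (the result of flow 3)
    vf_to_volumes: dict of VF index -> list of volume ids, updated in place
    Returns a response dict describing the outcome.
    """
    if vm_id not in vm_to_vf:
        return {"status": "error", "reason": "VM has no associated VF"}
    if len(free_volumes) < count:
        return {"status": "error", "reason": "not enough free virtual volumes"}

    vf = vm_to_vf[vm_id]
    chosen = [free_volumes.pop(0) for _ in range(count)]
    # One VF may carry several volumes, so the association is a list.
    vf_to_volumes.setdefault(vf, []).extend(chosen)
    return {"status": "ok", "vm": vm_id, "vf": vf, "volumes": chosen}
```

Returning an error response instead of raising keeps the sketch close to a request/response management interface, which is how the text describes the exchange.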
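5. Handling a VM's I/O request: the first VM issues an I/O request. The first VF receives the request through the pass-through channel and places it in the I/O queue of the first VF. The resource scheduler 11-31 determines, from the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request, and performs the read or write operation on that data block. After the resource scheduler 11-31 completes the operation, the result is returned through the VF driver to the first VM that issued the I/O request.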
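Flow 5 can be modeled end to end with a queue per VF and a scheduler that drains it. The sketch below is a simplified, single-threaded model: `NetworkStorage` is an in-memory stand-in for the network storage devices, requests address whole blocks, and all class, function, and field names are assumptions made for illustration.

```python
from collections import deque

BLOCK_SIZE = 4096  # assumed block size in bytes


class NetworkStorage:
    """In-memory stand-in for the network storage devices."""

    def __init__(self):
        self.blocks = {}  # (device_id, block_no) -> bytes

    def read(self, addr):
        # Blocks never written read back as zeros.
        return self.blocks.get(addr, bytes(BLOCK_SIZE))

    def write(self, addr, data):
        self.blocks[addr] = data


class VirtualFunction:
    def __init__(self, volume_map):
        self.io_queue = deque()       # I/O queue of this VF
        self.volume_map = volume_map  # volume id -> list of (device_id, block_no)

    def submit(self, request):
        """Called on the pass-through path when the VM issues a request."""
        self.io_queue.append(request)


def run_scheduler(vf, storage):
    """Drain the VF queue, resolve each request to a data block, and execute it."""
    results = []
    while vf.io_queue:
        req = vf.io_queue.popleft()
        addr = vf.volume_map[req["volume"]][req["block"]]
        if req["op"] == "write":
            storage.write(addr, req["data"])
            results.append({"id": req["id"], "status": "ok"})
        else:
            results.append({"id": req["id"], "status": "ok", "data": storage.read(addr)})
    return results  # in the described system, results return through the VF driver
```

Nothing in this path touches a hypervisor-side front end or back end, which is the property the description emphasizes: the VM's request goes to its VF queue, and the device-side scheduler resolves and executes it.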
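As shown in FIG. 3, an embodiment of the present invention further provides a storage access device 300, which includes a first interface 301, a second interface 302, and a processing unit 303. Specifically, the first interface 301 is a PCIe interface for connecting to the CPU of the computing node, and the second interface 302 is a network interface for connecting to the network storage devices. The processing unit 303 can be implemented as a hardware processing circuit, such as a system on chip (SoC), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA). The processing unit 303 can also be implemented as a combination of a CPU and memory (containing the corresponding code).
The storage access device shown in FIG. 3 implements the functions of the storage access device 11-3 described with reference to FIG. 1 and FIG. 2; the details are not repeated here.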
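A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms according to function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, devices, and units described above can be found in the corresponding processes of the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function; in an actual implementation there may be other divisions. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the embodiments of the present invention.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. An integrated unit can be implemented in hardware or as a software functional unit.
If an integrated unit is implemented as a software functional unit and sold or used as a standalone product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing is only a specific implementation of the present invention, and the scope of protection of the present invention is not limited to it. Any person skilled in the art can readily conceive of equivalent modifications or replacements within the technical scope disclosed by the present invention, and all such modifications or replacements shall fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be governed by the scope of protection of the claims.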
Claims (10)
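- A computer system, characterized in that the computer system comprises n computing nodes, n storage access devices, and m network storage devices; at least one virtual machine (VM) runs on each computing node; the m network storage devices provide distributed storage resources for the at least one virtual machine; each computing node comprises a processor, memory, and a storage access device; and n and m are integers greater than or equal to 1; each storage access device comprises a hardware processing unit, a PCIe interface, and a network interface, one end of the storage access device being connected to the processor of the at least one computing node through the PCIe interface and the other end being connected to the at least one network storage device through the network interface; the storage access device supports single-root I/O virtualization (SR-IOV) and is configured to configure at least one virtual function (VF) through the physical function (PF) of the SR-IOV capability, and to configure the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, wherein one VM corresponds to one VF; the storage access device supports distributed storage resource scheduling and is configured to obtain, through the network interface, the data block resources provided by the at least one connected network storage device, to compose the obtained data block resources into multiple virtual volumes, and to configure the association between VFs and virtual volumes, wherein one VF corresponds to at least one virtual volume.
- The computer system according to claim 1, characterized in that a PF back-end driver is deployed in the storage access device and a PF front-end driver is deployed in the computing node connected to the storage access device; after the storage access device starts, it loads the PF back-end driver to perform initial configuration; the computing node connected to the storage access device loads the PF front-end driver, obtains the resource information of the storage access device through the PF front-end driver, and issues a configuration command to the storage access device according to that resource information, so that the storage access device configures its resources and allocates corresponding hardware resources to the PF and to each VF.
- The computer system according to claim 2, characterized in that, after receiving a first VM association command sent by an upper-layer application, the computing node connected to the storage access device forwards the first VM association command through the PF front-end driver module to the PF back-end driver module; the storage access device is specifically configured to, after receiving the first VM association command through the PF back-end driver module, configure a corresponding first VF for the first VM specified in the first VM association command and record the association between the first VM and the first VF.
- The computer system according to any one of claims 1 to 3, characterized in that the storage access device is specifically configured to receive an allocation request to allocate storage resources for a first VM, determine the first VF associated with the first VM, allocate at least one virtual volume to the first VM from the multiple virtual volumes, establish an association between the allocated at least one virtual volume and the first VF, and return an allocation response, the allocation response containing information about the at least one virtual volume allocated to the first VM.
- The computer system according to claim 4, characterized in that the allocation request to allocate storage resources for the first VM is a request to create a virtual volume for the first VM or a request to mount a virtual volume for the first VM.
- The computer system according to claim 4 or 5, characterized in that the first VF of the storage access device is further configured to receive an I/O request of the first VM through the pass-through channel, place the I/O request in the I/O queue of the first VF, determine, according to the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request, and perform a read or write operation on the data block in the network storage device that corresponds to the I/O request.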
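- A storage access device, characterized in that the storage access device comprises a hardware processing unit, a PCIe interface, and a network interface, one end of the storage access device being connected to the processor of a computing node through the PCIe interface and the other end being connected to at least one network storage device through the network interface; at least one virtual machine (VM) runs on the computing node, and the at least one network storage device provides distributed storage resources for the at least one virtual machine; the storage access device comprises a pass-through module that supports single-root I/O virtualization (SR-IOV) and is configured to virtualize at least one virtual function (VF) through the physical function (PF) of the SR-IOV capability, and to configure the association between VMs and VFs so that a pass-through channel is established between each associated VM and VF, wherein one VM corresponds to one VF; the storage access device further comprises a resource scheduler that supports distributed storage resource scheduling and is configured to obtain, through the network interface, the data block resources provided by the at least one connected network storage device, to compose the obtained data block resources into multiple virtual volumes, and to configure the association between VFs and virtual volumes, wherein one VF corresponds to at least one virtual volume.
- The storage access device according to claim 7, characterized in that the resource scheduler is specifically configured to receive an allocation request to allocate storage resources for a first VM, allocate at least one virtual volume to the first VM from the multiple virtual volumes, establish, according to the first VF that the pass-through module has determined to be associated with the first VM, an association between the allocated at least one virtual volume and the first VF, and return an allocation response, the allocation response containing information about the at least one virtual volume allocated to the first VM.
- The storage access device according to claim 8, characterized in that the allocation request to allocate storage resources for the first VM is a request to create a virtual volume for the first VM or a request to mount a virtual volume for the first VM.
- The storage access device according to claim 8 or 9, characterized in that the first VF in the pass-through module is further configured to receive an I/O request of the first VM through the pass-through channel and place the I/O request in the I/O queue of the first VF; the resource scheduler determines, according to the virtual volume associated with the first VF, the data block in the network storage device that corresponds to the I/O request, and performs a read or write operation on the data block in the network storage device that corresponds to the I/O request.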
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP17854531.5A EP3457288B1 (en) | 2016-09-30 | 2017-07-13 | Computer system and storage access device |
| US16/258,575 US20190155548A1 (en) | 2016-09-30 | 2019-01-26 | Computer system and storage access apparatus |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610878406.9A CN107894913B (zh) | 2016-09-30 | 2016-09-30 | 一种计算机系统和存储访问装置 |
| CN201610878406.9 | 2016-09-30 |
Related Child Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US16/258,575 Continuation US20190155548A1 (en) | 2016-09-30 | 2019-01-26 | Computer system and storage access apparatus |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2018059077A1 true WO2018059077A1 (zh) | 2018-04-05 |
Family
ID=61763741
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2017/092816 Ceased WO2018059077A1 (zh) | 2016-09-30 | 2017-07-13 | 一种计算机系统和存储访问装置 |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20190155548A1 (zh) |
| EP (1) | EP3457288B1 (zh) |
| CN (1) | CN107894913B (zh) |
| WO (1) | WO2018059077A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN116583824A (zh) * | 2020-08-19 | 2023-08-11 | 华为技术有限公司 | 可互换接口角色重新映射 |
Families Citing this family (29)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10990445B2 (en) * | 2017-08-04 | 2021-04-27 | Apple Inc. | Hardware resource allocation system for allocating resources to threads |
| CN108959127B (zh) * | 2018-05-31 | 2021-02-09 | 华为技术有限公司 | 地址转换方法、装置及系统 |
| US11003372B2 (en) | 2018-05-31 | 2021-05-11 | Portworx, Inc. | Protecting volume namespaces from corruption in a distributed container orchestrator |
| CN110825485A (zh) | 2018-08-07 | 2020-02-21 | 华为技术有限公司 | 数据处理的方法、设备和服务器 |
| US11074013B2 (en) * | 2018-08-07 | 2021-07-27 | Marvell Asia Pte, Ltd. | Apparatus and methods for providing quality of service over a virtual interface for solid-state storage |
| JP6957431B2 (ja) * | 2018-09-27 | 2021-11-02 | 株式会社日立製作所 | Hci環境でのvm/コンテナおよびボリューム配置決定方法及びストレージシステム |
| CN112148418B (zh) * | 2019-06-26 | 2026-02-24 | 昆仑芯(北京)科技股份有限公司 | 用于访问数据的方法、装置、设备和介质 |
| CN112395071B (zh) * | 2019-08-12 | 2026-01-02 | 昆仑芯(北京)科技有限公司 | 用于资源管理的方法、装置、电子设备和存储介质 |
| KR102910539B1 (ko) * | 2020-02-18 | 2026-01-09 | 삼성전자주식회사 | 다중-호스트를 지원하도록 구성된 스토리지 장치 및 그것의 동작 방법 |
| CN113312137B (zh) * | 2020-04-09 | 2025-06-13 | 阿里巴巴集团控股有限公司 | 数据处理方法、装置、设备及系统 |
| CN113657069B (zh) * | 2020-05-12 | 2024-07-12 | 北京东土科技股份有限公司 | 片上系统soc仿真验证方法、装置、验证服务器及存储介质 |
| US11822964B2 (en) * | 2020-06-03 | 2023-11-21 | Baidu Usa Llc | Data protection with static resource partition for data processing accelerators |
| CN111949371B (zh) | 2020-08-14 | 2022-07-22 | 苏州浪潮智能科技有限公司 | 一种命令信息传输方法、系统、装置及可读存储介质 |
| CN113312143B (zh) * | 2021-03-03 | 2024-01-23 | 阿里巴巴新加坡控股有限公司 | 云计算系统、命令处理方法及虚拟化仿真装置 |
| US12039357B2 (en) | 2021-04-23 | 2024-07-16 | Samsung Electronics Co., Ltd. | Mechanism for distributed resource-based I/O scheduling over storage device |
| US12050807B2 (en) * | 2021-04-23 | 2024-07-30 | EMC IP Holding Company, LLC | Memory management system and method |
| CN115469962A (zh) * | 2021-06-10 | 2022-12-13 | 宏碁股份有限公司 | 虚拟机的装置直通方法及其服务器 |
| CN113626139B (zh) * | 2021-06-30 | 2023-03-24 | 济南浪潮数据技术有限公司 | 一种高可用的虚拟机存储方法及装置 |
| CN114691298A (zh) * | 2022-03-16 | 2022-07-01 | 阿里云计算有限公司 | 数据处理方法、装置、设备和存储介质 |
| CN114756332B (zh) * | 2022-05-19 | 2025-04-25 | 阿里巴巴(中国)有限公司 | 基于虚拟机设备直通的数据访问方法、设备以及系统 |
| CN114860387B (zh) * | 2022-06-08 | 2023-04-18 | 无锡众星微系统技术有限公司 | 一种面向虚拟化存储应用的hba控制器i/o虚拟化方法 |
| CN115185880B (zh) * | 2022-09-09 | 2022-12-09 | 南京芯驰半导体科技有限公司 | 一种数据存储方法及装置 |
| CN115913953B (zh) * | 2022-11-04 | 2024-06-04 | 陕西浪潮英信科技有限公司 | 一种云资源加速方法、装置及其介质 |
| CN116048658A (zh) * | 2022-12-30 | 2023-05-02 | 中国长城科技集团股份有限公司 | 数据访问模块的部署方法、数据访问方法及板卡 |
| CN115827168B (zh) * | 2023-02-01 | 2023-05-12 | 南京芯传汇电子科技有限公司 | 一种二进制仿真环境下虚拟机通信的优化方法 |
| CN116069451B (zh) * | 2023-03-13 | 2023-06-16 | 苏州浪潮智能科技有限公司 | 一种虚拟化方法、装置、设备、介质、加速器及系统 |
| CN116627883B (zh) * | 2023-06-08 | 2024-10-01 | 北京大禹智芯科技有限公司 | PCIe设备的片上空间的优化方法及设备 |
| CN117369734B (zh) * | 2023-12-08 | 2024-03-08 | 浪潮电子信息产业股份有限公司 | 一种存储资源管理系统、方法及存储虚拟化系统 |
| CN120821504A (zh) * | 2024-04-11 | 2025-10-21 | 成都华为技术有限公司 | 一种配置pci设备资源的方法及装置 |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20110179414A1 (en) * | 2010-01-18 | 2011-07-21 | Vmware, Inc. | Configuring vm and io storage adapter vf for virtual target addressing during direct data access |
| CN102255962A (zh) * | 2011-07-01 | 2011-11-23 | 成都市华为赛门铁克科技有限公司 | 一种分布式存储方法、装置和系统 |
| CN104461958A (zh) * | 2014-10-31 | 2015-03-25 | 杭州华为数字技术有限公司 | 支持sr-iov的存储资源访问方法、存储控制器及存储设备 |
| CN105808167A (zh) * | 2016-03-10 | 2016-07-27 | 深圳市杉岩数据技术有限公司 | 一种基于sr-iov的链接克隆的方法、存储设备及系统 |
Family Cites Families (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US8489699B2 (en) * | 2010-08-13 | 2013-07-16 | Vmware, Inc. | Live migration of virtual machine during direct access to storage over SR IOV adapter |
| US9135044B2 (en) * | 2010-10-26 | 2015-09-15 | Avago Technologies General Ip (Singapore) Pte. Ltd. | Virtual function boot in multi-root I/O virtualization environments to enable multiple servers to share virtual functions of a storage adapter through a MR-IOV switch |
| TW201235946A (en) * | 2011-02-18 | 2012-09-01 | Hon Hai Prec Ind Co Ltd | Method and system for configuring USB device in virtual environment |
| CN102650976B (zh) * | 2012-04-01 | 2014-07-09 | 中国科学院计算技术研究所 | 一种支持单根io虚拟化用户级接口控制装置及其方法 |
| CN103870311B (zh) * | 2012-12-10 | 2016-12-21 | 华为技术有限公司 | 通过半虚拟化驱动访问硬件的方法、后端驱动及前端驱动 |
| US9665309B2 (en) * | 2014-06-27 | 2017-05-30 | International Business Machines Corporation | Extending existing storage devices in virtualized environments |
- 2016-09-30: CN application CN201610878406.9A, publication CN107894913B (zh), status: Active
- 2017-07-13: WO application PCT/CN2017/092816, publication WO2018059077A1 (zh), status: Ceased (not active)
- 2017-07-13: EP application EP17854531.5A, publication EP3457288B1 (en), status: Active
- 2019-01-26: US application US16/258,575, publication US20190155548A1 (en), status: Abandoned (not active)
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP3457288A4 * |
Also Published As
| Publication number | Publication date |
|---|---|
| EP3457288A4 (en) | 2019-07-17 |
| EP3457288B1 (en) | 2021-11-17 |
| US20190155548A1 (en) | 2019-05-23 |
| CN107894913A (zh) | 2018-04-10 |
| CN107894913B (zh) | 2022-05-13 |
| EP3457288A1 (en) | 2019-03-20 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| WO2018059077A1 (zh) | 一种计算机系统和存储访问装置 | |
| US10768862B2 (en) | Extending existing storage devices in virtualized environments | |
| US9529773B2 (en) | Systems and methods for enabling access to extensible remote storage over a network as local storage via a logical storage controller | |
| TWI621023B (zh) | 用於支持對經由nvme控制器、通過網路存取的遠端存儲設備的熱插拔的系統和方法 | |
| US11922072B2 (en) | System supporting virtualization of SR-IOV capable devices | |
| US12248708B2 (en) | System supporting virtualization of SR-IOV capable devices | |
| US8141093B2 (en) | Management of an IOV adapter through a virtual intermediary in an IOV management partition | |
| US9146766B2 (en) | Consistent unmapping of application data in presence of concurrent, unquiesced writers and readers | |
| US12273283B2 (en) | Packet forwarding method, computer device, and intermediate device | |
| WO2021000717A1 (zh) | 一种io处理的方法和装置 | |
| US20170102874A1 (en) | Computer system | |
| US11016817B2 (en) | Multi root I/O virtualization system | |
| CN105739930A (zh) | 一种存储架构及其初始化方法和数据存储方法及管理装置 | |
| CN105556473A (zh) | 一种i/o任务处理的方法、设备和系统 | |
| US10437495B1 (en) | Storage system with binding of host non-volatile memory to one or more storage devices | |
| CN110704163A (zh) | 一种服务器及其虚拟化存储方法和装置 | |
| JP6760579B2 (ja) | ネットワークラインカード(lc)のホストオペレーティングシステム(os)への統合 | |
| JP2018181305A (ja) | プールされた物理リソースのローカルディスク消去メカニズム | |
| CN116048658A (zh) | 数据访问模块的部署方法、数据访问方法及板卡 | |
| CN115910191A (zh) | 用于虚拟机环境的磁盘测试方法、装置、设备及介质 | |
| CN112131166A (zh) | 轻量桥接器电路及其操作方法 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 17854531 Country of ref document: EP Kind code of ref document: A1 |
|
| ENP | Entry into the national phase |
Ref document number: 2017854531 Country of ref document: EP Effective date: 20181210 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |