WO2023040504A1 - 数据处理系统、数据处理方法及相关装置 - Google Patents

数据处理系统、数据处理方法及相关装置 Download PDF

Info

Publication number
WO2023040504A1
WO2023040504A1 PCT/CN2022/110102 CN2022110102W WO2023040504A1 WO 2023040504 A1 WO2023040504 A1 WO 2023040504A1 CN 2022110102 W CN2022110102 W CN 2022110102W WO 2023040504 A1 WO2023040504 A1 WO 2023040504A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
storage
virtual bus
message queue
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/110102
Other languages
English (en)
French (fr)
Inventor
兰龙文
程卓
程桢
周文
苏毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to EP22868881.8A priority Critical patent/EP4391463A4/en
Publication of WO2023040504A1 publication Critical patent/WO2023040504A1/zh
Priority to US18/606,527 priority patent/US12568140B2/en
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1097Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/28Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40Bus networks
    • H04L12/40006Architecture of a communication node
    • H04L12/40013Details regarding a bus controller
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/28Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/40Bus networks
    • H04L12/40052High-speed IEEE 1394 serial bus
    • H04L12/40104Security; Encryption; Content protection
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066Session management
    • H04L65/1073Registration or de-registration
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/50Network services
    • H04L67/55Push-based network services
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00Data switching networks
    • H04L12/28Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L12/46Interconnection of networks
    • H04L12/4641Virtual LANs, VLANs, e.g. virtual private networks [VPN]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications

Definitions

  • the present application relates to the field of storage technologies, and in particular to a data processing system, a data processing method and related devices.
  • Embodiments of the present application provide a data processing system, a data processing method, and related devices, which can realize the interconnection between devices and meet the requirements for data flow between devices.
  • an embodiment of the present application provides a data processing system, including: a storage layer, a virtual bus, and a service layer;
  • the storage layer includes a plurality of storage devices, and the storage layer is used to provide storage space;
  • the virtual bus is configured to receive the registration of the storage device to form a virtual network including multiple storage devices
  • the service layer is used to provide services for users based on the virtual network.
  • the registered storage devices can be connected to the virtual bus, and the interconnection between the storage devices is realized based on the virtual bus, so that multiple devices can discover and communicate with each other without being directly connected.
  • the services in the service layer can also be connected to the virtual bus, so as to realize data flow management, orchestration and other services based on the virtual bus.
  • the storage devices and services connected to the virtual bus may be heterogeneous, for example, the hardware model, software version, manufacturer, location, or file system of the storage devices may be different from each other, and the service content, form, Or architecture etc. can be different from each other. Heterogeneous devices can also be connected to each other through the layout specification provided by the virtual bus, so as to establish a loosely coupled and distributed data collaboration relationship.
  • the aforementioned multiple storage devices include centralized storage devices, distributed storage devices, virtual storage devices, and physical storage devices.
  • the multiple storage devices include one or more of a storage array, a storage server, a public cloud, a private cloud, or a hybrid cloud.
  • hybrid multi-cloud users may have multiple sets of storage devices, which may be distributed in different regions or in different clouds.
  • various storage devices can be connected to each other through the virtual bus to meet the user's connection requirements for different storage devices.
  • one storage device among the multiple storage devices is configured to submit registration information to the virtual bus.
  • the virtual bus can record the registration information of the storage device, thereby completing the registration of the storage device.
  • the virtual bus can provide multiple access points, and the storage device can perform device registration through the access points. Since there are usually multiple access points and distributed in different regions, the registration service is decentralized and serverless. In addition, the storage device can also choose an access point that is close to itself or meets its own business needs to access the virtual bus, so as to improve the registration efficiency of the storage device and improve the service using the virtual bus.
  • the registration information includes device information of the storage device (or referred to as configuration information of the storage device).
  • the registration information includes data metadata.
  • the registration information includes storage pool information.
  • the registration information written into the virtual bus is written in a registration format.
  • the registration format may be a pre-defined, pre-configured or protocol-specified data format, or the registration format may be a data format defined by the virtual bus, or the registration format may be a data format negotiated between the virtual bus and the storage device.
  • the data format may include a key-value format.
  • Design 1 The storage device writes the registration information according to the registration format.
  • Design 2 After the storage device uploads the registration information to the virtual bus, it converts the registration information into a unified format and writes it into the virtual bus through other services.
  • one of the multiple storage devices is configured to submit the registration information to the virtual bus after obtaining authorization.
  • the sending end and the receiving end in the plurality of storage devices are not directly connected, and the sending end and the receiving end exchange data or transfer data through the virtual bus information.
  • the virtual bus further includes a message queue, and the message queue is configured to store messages about any one of the storage devices.
  • the multiple storage devices include a sending end and a receiving end;
  • the message queue of the virtual bus is used to record a data push request from the receiving end, and the data push request is used to request the sending end to push target data;
  • the sending end is used to push the target data to a preset location
  • the message queue of the virtual bus is also used to record a data preparation message about the receiving end, and the data preparation message is used to notify the receiving end to pull the target data.
  • the embodiment of the present application provides a data processing method, including:
  • a storage layer is provided, the storage layer includes a plurality of storage devices, and the storage layer is used to provide storage space;
  • the virtual bus is provided, and the virtual bus is used to receive the registration of the storage device to form a virtual network including a plurality of storage devices;
  • a service layer is provided, and the service layer is used to provide services for users based on the virtual network.
  • the registered devices can be connected to the virtual bus, and the interconnection between storage devices can be realized based on the virtual bus, so that multiple devices can discover and communicate with each other without being directly interconnected.
  • the services in the service layer can also be connected to the virtual bus, so as to realize data flow management, orchestration and other services based on the virtual bus.
  • the storage devices and services connected to the virtual bus may be heterogeneous, for example, the hardware model, software version, manufacturer, location, or file system of the storage devices may be different from each other, and the service content, form, Or architecture etc. can be different from each other. Heterogeneous devices can also be connected to each other through the layout specification provided by the virtual bus, so as to establish a loosely coupled and distributed data collaboration relationship.
  • the aforementioned multiple storage devices include centralized storage devices, distributed storage devices, virtual storage devices, and physical storage devices.
  • the multiple storage devices include one or more of a storage array, a storage server, a public cloud, a private cloud, or a hybrid cloud.
  • users may have multiple sets of storage devices, which may be distributed in different regions or in different clouds.
  • various storage devices can be connected to each other through a virtual bus to meet the user's connection requirements for different storage devices.
  • the method further includes: one storage device among the multiple storage devices submits registration information to the virtual bus.
  • the virtual bus can record the registration information of the storage device, thereby completing the registration of the storage device.
  • the virtual bus can provide multiple access points, and the storage device can perform device registration through the access points. Since there are usually multiple access points and distributed in different regions, the registration service is decentralized and serverless. In addition, the storage device can also choose an access point that is close to itself or meets its own business needs to access the virtual bus, so as to improve the registration efficiency of the storage device and improve the service using the virtual bus.
  • the registration information includes device information of the storage device (or referred to as configuration information of the storage device).
  • the registration information includes data metadata.
  • the registration information includes storage pool information.
  • the registration information written into the virtual bus is written in a registration format.
  • the registration format may be a pre-defined, pre-configured or protocol-specified data format, or the registration format may be a data format defined by the virtual bus, or the registration format may be a data format negotiated between the virtual bus and the storage device.
  • the data format may include a key-value format.
  • Design 1 The storage device writes the registration information according to the registration format.
  • Design 2 After the storage device uploads the registration information to the virtual bus, it converts the registration information into a unified format and writes it into the virtual bus through other services.
  • the method further includes: submitting the registration information to the virtual bus by one of the multiple storage devices after obtaining the authorization.
  • the sending end and the receiving end in the plurality of storage devices are not directly connected, and the sending end and the receiving end exchange data or transfer data through the virtual bus information.
  • the virtual bus further includes a message queue, and the message queue is configured to store messages about any one of the storage devices.
  • the multiple storage devices include a sending end and a receiving end; the method further includes:
  • the virtual bus records a data push request from the receiving end through the message queue, and the data pushing request is used to request the sending end to push target data;
  • the sending end pushes the target data to a preset location
  • the virtual bus records a data preparation message about the receiving end through the message queue, and the data preparation message is used to notify the receiving end to fetch the target data.
  • the embodiment of the present application provides a computing node, the computing node includes a processor and a memory; at least one computer instruction is stored in the memory; the instruction is loaded and executed by the processor, so as to realize any one of the aforementioned second aspect The method operation performed by the virtual bus in the item.
  • an embodiment of the present application provides a storage device, the storage device includes a processor and a memory; at least one computer instruction is stored in the memory; the instruction is loaded and executed by the processor, so as to realize any one of the aforementioned second aspects.
  • the embodiment of the present application provides a computer-readable storage medium, the computer-readable storage medium stores instructions, and when the instructions are run on at least one processor, any one of the aforementioned second aspects is implemented. method described.
  • the present application provides a computer program product, the computer program product includes computer instructions, and when the instructions are run on at least one processor, the method described in any one of the aforementioned second aspects is implemented.
  • the computer program product may be a software installation package. If the aforementioned method needs to be used, the software installation package may be downloaded and the computer instructions formed by the software installation package may be executed on the computing device.
  • FIG. 1 is a schematic diagram of a data flow scenario provided by an embodiment of the present application.
  • Fig. 2 is a schematic diagram of a data processing system provided by an embodiment of the present application.
  • Fig. 3 is a schematic diagram of a device provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an object storage pool provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a view of a device on a virtual bus provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a usage scenario of a data processing system provided by an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a device directory of a virtual bus provided by an embodiment of the present application.
  • FIG. 8 is a schematic diagram of an online state provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another online state provided by the embodiment of the present application.
  • FIG. 10 is a schematic diagram of a catalog of contents stored in a storage pool provided by an embodiment of the present application.
  • Fig. 11 is a schematic flowchart of another data processing method provided by the embodiment of the present application.
  • FIG. 12 is a schematic diagram of another directory in a storage pool provided by an embodiment of the present application.
  • Fig. 13 is a schematic flowchart of another data processing method provided by the embodiment of the present application.
  • Fig. 14 is a schematic diagram of mapping blocks into objects provided by the embodiment of the present application.
  • the object storage service is an object-based storage service, in which an object contains file data and its related attribute information.
  • Object storage services are usually provided by service providers, for example, service providers may include Huawei, Amazon, Facebook, or Microsoft.
  • OBS Object Storage Service
  • the object includes one or more items of data (data), metadata (metedata), or key (key).
  • data is the data content of the file.
  • the key can be regarded as the identifier of the object (or the name of the object).
  • metedata that is, the description information of the object, usually exists in the form of key-value pairs (Key-Value).
  • a bucket is a container for storing objects.
  • a bucket contains attributes such as access domain name, storage class, access permission, or the region it belongs to.
  • Object storage service users can access the bucket through the domain name of the bucket.
  • a message queue is a communication method that can be understood as a list containing one or more messages.
  • the message is stored on the message queue before being processed and deleted, and the message sender can interact with the message receiver through the message queue service.
  • the present application collectively refers to the data structure containing multiple messages as a message queue, and does not intend to limit the realization of the message queue by means of queues.
  • the message queue can also be implemented by means of a list, a heap, a linked list, or a stack.
  • the device can obtain messages that need to be processed by itself by periodically or aperiodically reading (or listening to) the message queue.
  • the message can be notified to the receiver through the message queue service.
  • the message queue service can notify the receiver to process the message. For example, remind the receiver to read the message through the doorbell mechanism, etc.
  • Message queues are suitable for serverless architectures, microservice architectures, and server architectures as well.
  • a user is an individual or an organization with independent resources (eg, one or more of computing resources, storage resources, or network resources, etc.). In some scenarios, it can also be called a tenant or a member.
  • independent resources eg, one or more of computing resources, storage resources, or network resources, etc.
  • Redirection refers to the process of redirecting a network request to another location for processing.
  • the content originally in address 1 is migrated to address 2.
  • the user's access to address 1 can be redirected, that is, the user's access to address 1 is redirected to address 2. Allow users to access the content they need.
  • the way of redirection can include 301 redirect, 302 redirect, or meta fresh method, etc.
  • Logical unit number (logical unit number, LUN)
  • a LUN is a division of storage units, which can specifically include numbers, symbols, and character strings.
  • LUN is used to identify a logical unit. That is, the storage system partitions the physical hard disk into various parts with logical addresses, and then allows users to access the storage system. Such a partition is called a LUN.
  • LUN also refers to logical disks created on SAN storage.
  • Devices often need to communicate, such as data migration, data push, data reading, or data writing between storage devices.
  • the sending end and the receiving end of the communication may be distributed in different regions, or in different clouds (such as different public clouds, or different private clouds, or different hybrid clouds), and it is difficult for the sending end and the receiving end to Direct access through the Internet makes it impossible for devices to be interconnected, resulting in the inability of data to flow.
  • users may have multiple sets of storage devices and/or multiple sets of services. These storage devices (and/or services) may be located in different regions offline, and distributed in different regions online. In the cloud, communication connections cannot be made due to communication area restrictions or security controls.
  • FIG. 1 is a schematic diagram of a possible data flow scenario provided by an embodiment of the present application.
  • the storage device 1 is located behind the firewall 101
  • the storage device 2 is located behind the firewall 102
  • the service 1 is located behind the firewall 103 . Since the storage device 1 and the storage device 2 are located in different firewalls, if the storage device 1 initiates a request (such as a connection request or a data flow request) to the storage device 2, it will be rejected by the firewall 102 of the storage device 2 .
  • the storage device 2 initiates a request to the storage device 1, it will be rejected by the firewall 101 of the storage device 1 .
  • storage device 1 and storage device 2 cannot establish a connection, making it difficult for data to flow between storage device 1 and data 2 .
  • the connection and data flow between the service 1 and the storage device 2 are also restricted by the firewall 103 and/or the firewall 102 .
  • storage device 1 is a data center of enterprise A, which stores a large amount of business data.
  • Storage device 2 is a public cloud virtual machine purchased by enterprise A.
  • Enterprise A needs to push the business data in storage device 1 to the public cloud for data analysis. Since the connection between storage device 1 and storage device 2 cannot be established, the business data in storage device 1 cannot be pushed to the public cloud. It cannot meet the data flow requirements of enterprise A.
  • service 1 may be a data management service in a private cloud established by enterprise A, and since service 1 is located behind the firewall 103, it cannot meet the service usage requirements of enterprise A.
  • the embodiments of the present application provide a data processing system, a data processing method, and related devices, which can realize the interconnection between devices and meet the requirements for data flow between devices.
  • FIG. 2 is a schematic diagram of a possible data processing system 20 provided by an embodiment of the present application.
  • the data processing system 20 may include a storage layer 201 , a virtual bus 203 and a service layer 202 .
  • the storage layer 201 includes one or more storage devices, and the storage layer is used to provide storage space.
  • the storage device is a device for providing storage space.
  • This application does not limit the organization form (such as centralized or distributed, etc.), existence form (such as physical device or virtual device, etc.), storage space size, location, provider, etc. of storage devices.
  • the storage device may include a centralized storage device, such as a storage array or a storage server, and may also include a public cloud, a private cloud, a hybrid cloud, etc., or part or all of the storage space in the cloud.
  • a centralized storage device such as a storage array or a storage server
  • a public cloud such as a public cloud, a private cloud, a hybrid cloud, etc., or part or all of the storage space in the cloud.
  • the storage device may include a physical device, such as a storage device of a solid-state hard disk, a mechanical hard disk, or other types of storage media, or a storage server, etc., and may also include a virtual device, such as a container, a virtual machine, or a virtualized physical device The storage pool formed after processing, etc.
  • a physical device such as a storage device of a solid-state hard disk, a mechanical hard disk, or other types of storage media, or a storage server, etc.
  • a virtual device such as a container, a virtual machine, or a virtualized physical device The storage pool formed after processing, etc.
  • the service layer 202 may contain one or more services.
  • a service is a resource provided by a service provider or a series of activities completed on a resource, and these resources or activities can solve user problems or satisfy user needs.
  • the services here can include the following categories:
  • Resources For example, resources such as storage resources (such as object storage services), computing resources (such as elastic cloud servers, cloud phones, etc.), network resources (such as virtual private clouds, or elastic load balancing, etc.).
  • storage resources such as object storage services
  • computing resources such as elastic cloud servers, cloud phones, etc.
  • network resources such as virtual private clouds, or elastic load balancing, etc.
  • Category 2 Activities performed on resources. Further, these services may include basic services, management services, user-oriented services, and so on.
  • the basic service is the service that the virtual bus depends on, and the virtual bus maintains operation based on the basic service.
  • Basic services may include, for example, security services, message queue services, and the like.
  • the management service is used to manage the virtual bus itself, or manage the services of the users of the virtual bus, such as data query service, data view service, data flow orchestration and management service, identity authentication service, cloud supervision service, or cloud audit service.
  • User-oriented services may include, for example, registration services, data processing services (such as data query, data analysis, big data insights, etc.), software development platform services (such as development platforms, project management, project deployment, code hosting, or mirroring services, etc.) , artificial intelligence services (such as natural language processing, content review, image recognition, etc.), application running services (such as cloud performance testing services), etc.
  • data processing services such as data query, data analysis, big data insights, etc.
  • software development platform services such as development platforms, project management, project deployment, code hosting, or mirroring services, etc.
  • artificial intelligence services such as natural language processing, content review, image recognition, etc.
  • application running services such as cloud performance testing services
  • Category 3 Software that the user registers in the virtual bus. For example, users can register their developed data analysis software on the virtual bus to form services.
  • service layer may include more or fewer service categories during specific implementation.
  • Part of the services in the service layer can be provided to users in a paid mode.
  • some services in the service layer may be user-oriented value-added services, such as identity authentication services, cloud monitoring services, and so on.
  • the virtual bus 203 may also be called a virtual bus layer or an intermediate layer.
  • Users of the virtual bus are devices connected to the bus (hereinafter referred to as devices for convenience of description), such as one or more of storage devices in the storage layer or services in the service layer.
  • the virtual bus can receive device registrations (or releases, hereinafter referred to as registrations for ease of description), thereby forming a virtual network including multiple devices. Registered devices can be discovered by other devices. Correspondingly, users of the virtual bus can discover other devices through the virtual bus, thereby realizing the interconnection between devices.
  • the device submits registration information to the virtual bus for device registration.
  • the registration information is used to describe the configuration of the device, or describe the data stored in the device, or describe the type of service provided by the device, and the like.
  • the registration information includes one or more of configuration information of the device, metadata of data stored in the device, or information such as a storage pool of the device.
  • the configuration information includes an identifier (identifier, ID, for example, a unique identifier (universally unique identifier, UUID)), a name (name), a model (model), a version number, a geographical area (for example, it can be Western China, Central China) , North China region, etc.), location (location, which can be city name, building name, computer room number, or rack number, etc., or described by latitude and longitude), (physical) address (address), Internet protocol (internet protocol, One or more of IP address, media access control (media access control, MAC) address, port number, management IP, or application programming interface.
  • the device may be managed by a certain management device or through a certain management port, and the aforementioned management IP is the IP address of the management device or the management port.
  • the metadata of the data can include information about the storage pool where the data resides, the LUN corresponding to the data, information about the file system, directory information shared to the global file system, directory information mounted from other devices, and status information (For example, the online status of the file system where the data is located, whether it is faulty, space usage, etc.), the name of the data, the shared address (if the data is shared, other devices can access the data through the shared address of the data), or permission One or more of certificates, etc., wherein the shared address may include an IP address, a MAC address, or a uniform resource locator (uniform resource locator, URL) address, etc.
  • the shared address may include an IP address, a MAC address, or a uniform resource locator (uniform resource locator, URL) address, etc.
  • the information of the storage pool includes an identifier of the storage pool, an identifier of a hard disk included in the storage pool, free storage space in the storage pool, capacity of the storage pool, minimum access unit, etc., or multiple.
  • the storage pool may be obtained through virtualization. For example, a tenant can rent a segment of storage space from a storage service provider, and the storage space can be virtualized to obtain storage pool information.
  • the virtual bus includes one or more access points, as shown by dot 204 in FIG. 2 .
  • the access point refers to the path through which the device accesses the virtual bus.
  • the path which may also be called a resource address, indicates the location of the access point, and the device can find the access point through the path.
  • the path can be a URL, such as: https://virtualbus.s3.cn north1.amazonaws.com.cn.
  • the path of the access point is the directory of the bucket.
  • the access point may include two types of access points: a basic access point and a derived access point.
  • the path of the basic access point is consistent with the address requested by the user (hereinafter referred to as the access address), but the path of the derived access point is not consistent with the access address.
  • the virtual device accesses the basic access point
  • the user can be redirected to the derivative access point through the redirection mechanism, so that a new access point can be introduced on the basic access point to form an access point network .
  • the number of access points can be regulated and the flexibility of the system can be improved.
  • some basic access points may not be able to continue to be used due to failure, business adjustment and other reasons.
  • the access point of the device can be redirected to other available access points through the redirection mechanism, so as to adapt to the change of the access point and improve system flexibility and stability.
  • the device has a bus connector (or other functional units that can realize access to the access point, which will be described in detail below), and the device corresponds to the access point through the bus connector.
  • Write the registration information in the path that is, you can connect to the virtual bus.
  • the device registration process can be decentralized and serverless since there are usually multiple access points and they are distributed in different regions.
  • the device can also access the virtual bus through an access point that is relatively close or meets its own business needs, thereby improving the registration efficiency of storage devices and improving the experience of using the virtual bus.
  • An access point has its own information to describe the access point, which is referred to as information of the access point in this embodiment.
  • the information of the access point includes the access address, number, region, provider, etc. of the access point.
  • the device When registering, the device can acquire information of multiple access points, and write the above registration information to one or more of the access points, thereby realizing registration on the virtual bus.
  • This application does not limit the storage form of the information of the access point, for example, the information of the access point may be provided to the device in the form of a queue or a table.
  • Table 1 shows a list of possible access point information, for example, the access point numbered 1 is provided by provider P1, the location is China, and the access address is "https://virtualbus.s3.cn north 1.amazonaws.com.cn”; and so on for other access points.
  • the above access point information can be provided to the device through a white list.
  • the above-mentioned whitelist may be pre-configured or pre-defined in the device, for example, the whitelist is configured in the storage device when the device leaves the factory. Or optionally, the device can obtain a whitelist at a specified address.
  • the connection function module enables the device to connect to the virtual bus in a compliant connection process, and perform services related to the virtual bus (such as registering the device on the bus, pushing data to the object storage pool on the virtual bus, receiving/sending messages wait).
  • the connection function module can be implemented by software, hardware, or a combination of software and hardware.
  • the connection function module includes one or more of instructions for executing the bus connection process, security certificates, security contexts, keys, connection protocols, etc., to support devices to connect to the virtual bus in a compliant connection process middle.
  • FIG. 3 is a schematic diagram of a possible device provided by an embodiment of the present application.
  • the device 301 (such as the storage device shown in FIG. 1, or the service shown in FIG. 1, etc.) includes a bus connector.
  • the bus connector is a connection function module in the device 301, and is used for connecting an access point, thereby connecting to a virtual bus.
  • the bus connector can be realized by hardware (for example, a communication interface, or a transceiver, etc.), software modules, or a combination of software modules and hardware modules.
  • Bus connectors are also used to support traffic between device implementations and virtual buses.
  • the device 301 implements services such as bus registration, bus data release, or bus message processing through the bus connector.
  • the bus registration means that the device registers itself on the virtual bus.
  • the bus data release is used to push the metadata of the data stored in the registered device to the virtual bus, and is also used to push the data stored in the device to be registered to the virtual bus. After data or metadata is pushed to the virtual bus, it can be stored in the object storage resource pool of the virtual bus.
  • Bus message processing is used for devices to complete message-related functions, such as: writing messages to message queues in the form of virtual bus support, or reading messages in message queues, etc. (described below).
  • the registration format may be a pre-defined, pre-configured or protocol-specified data format, or the registration format may be a data format defined by the management service in the virtual bus, or the registration format may be the management service and storage via the virtual bus
  • the data format negotiated by the device is a key-value (KV) format.
  • KV key-value
  • the device uses its own identifier (such as UUID) as the prefix (prefix) of the key in the registration information.
  • Table 2 shows a possible registration information format by taking the registration information as device configuration information as an example, where the key of the configuration information is the path of the configuration information, and the value is the value corresponding to the key.
  • the key of the configuration information of the device is prefixed with the location address of the access point and the UUID of the device.
  • ⁇ vbus_access_point ⁇ is the path for the device to connect to the access point
  • ⁇ device-uuid ⁇ is the UUID of the device.
  • the access point connected to the device is the access point numbered 1 shown in Table 1, the UUID of the device is "cloud_0" as an example, and the key of the device name (name) is "https://virtualbus. s3.cn north 1.amazonaws.com.cn/cloud_0/name", the value is "shanghai-dorado88" (example).
  • the key of the device template (model) is "https://virtualbus.s3.cn north 1.amazonaws.com.cn/cloud_0/model”, and the value is "OceanS tor Dorado" (example).
  • the device registers uses the resource location address of the access point and the identifier of the device itself as a prefix.
  • Table 3 takes the metadata of the file system as an example, and shows a schematic table of metadata in a possible KV format.
  • the key of the registration information is prefixed with the access point, the type of data (such as file system, snapshot, etc.), and the identifier of the device.
  • “ ⁇ vbus_access_point ⁇ ” indicates the access point to be written by the storage device to be registered
  • "filesystem” indicates that the type of data is a file system
  • “ ⁇ fs-uuid ⁇ ” is the unique identifier of the file system.
  • the key is "https:// ⁇ vbus_access_point ⁇ /filesystem/ ⁇ fs-uuid ⁇ /name", and the value is "finance”. And so on for the rest of the metadata.
  • the device to be registered writes the registration information according to the set format.
  • the embodiment of the present application also provides another implementation mode.
  • this implementation mode after the device uploads the registration information to the virtual bus, other services (for example, the format conversion function provided by the virtual bus, or the service provided by the service layer, etc.) converts the registration information into a unified format and writes it into the virtual bus.
  • the transformed format is the same as or similar to the preset format described above, and will not be repeated here.
  • the virtual bus contains storage space, and the registration information of the device can be stored in the storage space of the virtual bus.
  • the virtual bus provides storage space through the object storage service as an example, and the object storage service contains multiple object buckets.
  • the path of the access point of the virtual bus is the directory of the object bucket, and the registration information of the device can be stored in the directory of the corresponding object bucket.
  • the storage space of the virtual bus may be called an object storage pool.
  • An object storage service usually includes multiple object buckets, which are the storage space provided by the object storage service. That is: an object storage pool can contain multiple object buckets provided by one or more object storage services.
  • FIG. 4 shows a schematic diagram of a possible object storage pool.
  • the object storage pool 401 includes m (m>0) object buckets, and the path of the access point 1 is the directory of the object bucket 1, wherein the access There may be n (n>0) points.
  • n n>0 points.
  • the registered device as cloud_0 and the registered access point as access point 1 as an example, when cloud_0 registers, its registration information is stored in the object bucket in KV format, and the key is "path of access point 1/cloud_0 /" as a prefix.
  • the key corresponding to the name of the device cloud_0 is "access point address 1/cloud_0/name", and the value is written into the storage space corresponding to the path.
  • object storage services may include public cloud object storage services, private cloud object storage services, or hybrid cloud object storage services, etc., or object storage services may be deployed locally, for example, enterprises use computing devices and storage devices , to establish an offline object storage service.
  • virtual bus provides storage space based on the object storage service of the public cloud, based on the global accessibility of the public cloud, a virtual network of devices covering a wide area can be realized.
  • the virtual bus can provide device discovery and sharing functions, and a device can discover other registered devices through the virtual bus.
  • the device may discover devices connected to the virtual bus through the access point by scanning the access point.
  • FIG. 5 is a schematic diagram of a view of a possible device on a virtual bus according to an embodiment of the present application. As shown in FIG. 5 , the device scans the access point 1 so as to obtain information about the device connected to the virtual bus from the access point 1.
  • the device information may include the aforementioned device registration information.
  • security services may include security protection in one or more of the following three aspects:
  • the first aspect the grant and withdrawal of authority.
  • the permission may include viewing permission, access permission, communication permission (such as the permission to send a message to a specified device, or the permission to receive a message from a certain device, etc.), data push permission, and the like.
  • the party requesting permission is called the requester.
  • the security service may determine whether to grant permission to the requester based on conditions such as whether the requester's identity is authenticated, whether the requester purchases related services, and the like.
  • Case 1 The discoverer needs to obtain access authorization first to access storage devices and/or services. For example, before accessing the storage device, the discoverer can request authentication information (for example: Access Key) from the security service, and the discoverer can request to access the storage device based on the authentication information.
  • authentication information for example: Access Key
  • Case 2 The discoverer should return the authorization when it does not need to use the authorization. For example, after the discoverer finishes accessing the storage device, he can return the requested Access Key to the security service.
  • Case 3 A security service can revoke granted permissions.
  • the security service may revoke the granted authentication information, or the security service may invalidate the issued Access Key.
  • ACL access control lists
  • ACL is an access control technology based on conditional filtering, which can filter a device's request for certain data according to set conditions, allowing it to pass or discarding it.
  • an ACL is established for the registration information of the storage device 1 in the virtual bus, and only users who satisfy a certain condition can read the registration information of the storage device 1 .
  • the ACL may check the read request of the storage device 2 according to the set conditions.
  • the storage device 2 is allowed to read the registration information of the storage device 1 .
  • the ACL may also be based on a single object. For example, use the ACL to perform access control on the IP address in the registration information of a certain storage device.
  • the security service can encrypt the data in the KV format stored in the virtual bus, so as to further ensure the security of the data. Furthermore, encrypted keys can be managed through the key service.
  • the key service can be integrated in the security service, or it can be an independent key service.
  • the virtual bus can receive the registration of devices, and the registered devices can be discovered by other devices based on the virtual bus.
  • the virtual bus can be regarded as an intermediary to realize the indirect connection between devices, so that multiple devices can be connected without direct interconnection. discover and communicate with each other.
  • the storage devices and services connected to the virtual bus can be heterogeneous, for example, the hardware model, software version, manufacturer, location, or file system of the storage devices can be different from each other, and the content of the service , form, or structure etc. can be different from each other.
  • Heterogeneous devices can also be connected to each other through the layout specification provided by the virtual bus, so as to establish a loosely coupled and distributed data collaboration relationship.
  • FIG. 6 is a schematic diagram of a usage scenario of a possible data processing system provided by the embodiment of the present application.
  • the storage devices in the storage layer can perform device registration on the virtual bus 603, and the registered storage devices can be discovered by other devices.
  • the storage devices in the storage layer may include online storage devices, such as cloud 601a, cloud 601b, or cloud 601c, etc., or offline physical devices, such as storage device 602a, storage device 602b, storage device 602c, or storage device 602d and so on.
  • the multiple online storage devices may come from different providers, for example, cloud 601a may be Huawei Cloud, cloud 601b may be Amazon Cloud, and cloud 601c may be Facebook Cloud.
  • storage device 602a may be a mass storage service OceanStor
  • storage device 602b may be a mass storage service OceanStor Cube
  • storage device 602c may be a mass storage service OceanStor Pacific
  • storage device 602d may be a mass storage service OceanStor Dorado.
  • the service layer can provide services based on the virtual bus 603, for example, the services in the service layer can include service 604a.
  • the service description in the service layer refer to the description of the service layer in FIG. 2 .
  • the service 604a may be implemented through a data management engine (data management engine, DME).
  • DME data management engine
  • DME can be used to present a global view, set device status, or manage data flow in a device, etc.
  • the DME can be managed by the administrator 605, for example, the DME 604a responds to the input and operation of the administrator 605, and executes corresponding services.
  • the administrator 605 here may be a person, or may be a management service implemented by a management device, a computer program, or the like.
  • users may have multiple sets of storage devices, which may be distributed in different regions or in different clouds.
  • multiple storage devices can be connected to each other through a virtual bus, which can meet the user's connection requirements for devices in different regions and with different architectures.
  • the structure of the virtual bus is introduced above, and some possible functions provided by the virtual bus are described below.
  • the virtual bus is essentially the virtualization of storage services and message services. Among them, the storage services are used to store registration information and data after virtualization processing, and the message services are used to realize communication between devices, thus forming a comprehensive and available A virtual network of storage resources and message service resources.
  • the function of the virtual bus can be realized through the following four exemplary designs:
  • the virtual bus can implement data storage through the object storage service.
  • the type of object storage service may be one type, or may be multiple types, and may even include multiple types of object storage services from multiple providers.
  • the object storage service can include one or more of AWSS3 provided by Amazon, HuaweiOBS provided by Huawei, or Azure Blob provided by Microsoft.
  • the storage space provided by the object storage service can form an object storage pool.
  • the object storage pool may be used to store access point information, device registration information, data pushed by the storage device, or data metadata, and the like.
  • the object storage service provides persistent object storage
  • the content stored in the storage pool is persistent and can always be kept in the object storage pool.
  • the user of the virtual bus can view the content stored in the object storage pool through the virtual bus at any time, such as discovering other devices.
  • FIG. 7 is a schematic diagram of a possible virtual bus device directory provided by an embodiment of the present application.
  • cloud_0 is the UUID of the device
  • proc directory is the configuration information of the device, such as capabilities (capabilities), contacts (contacts), Internet protocol version 4 (internet protocol version 4, ipv4) related information, Internet protocol version 6 (internet protocol version 6, ipv6) related information, location (location), model (model), port (ports), or version (version) and other information.
  • the capabilities include data used to describe the functions of the device, such as communication capabilities, storage capabilities, and supported file systems of the device.
  • the contacts include information about the user or the device maintaining the device, for example, the contact information, physical address, IP address, etc. of the device administrator corresponding to the device.
  • the ipv4 and ipv6 shown in Figure 6 are used to store related files and keys of the communication protocol.
  • the registration information may include storage pool information.
  • the pool0 directory is the information of the storage pool in the device, for example, it can include the information of the disks (disks) used to obtain the storage pool (that is, disk), the free space (free) of the storage pool, and the information of the disk slices. Grain size (grain_size), or storage pool size (size), etc.
  • the registration information shown in Figure 7 is persistently stored in the resource storage pool, and the user of the virtual bus can read (or write, edit) the information through the path at any time when the device is accessible and the permission permits.
  • the information in the directory, or write related information For example, after the registration information of the device is stored in the directory shown in FIG. 7 , the discoverer can access the registration information of the device through the path.
  • the path for the discoverer to access contacts can be: https:// ⁇ vbus_access_point ⁇ /cloud_0/proc/contacts, and so on for the rest of the information.
  • the content information stored in the storage pool is persistent, when the configuration of the device changes, it may cause the problem that the device cannot be accessed through the original registration information. Therefore, at some point, the content stored in the storage pool may be updated, or by adding additional information for the discoverer (the discoverer here refers to users of other virtual buses except the discovered target device) to judge Timeliness of stored content.
  • the discoverer can judge whether the target device is still in the active state.
  • Solution 1 The online status is the time when the device updated its own status (or information) last time. It should be understood that the time here may be a moment, a time stamp, or a time length, and the like.
  • FIG. 8 is a schematic diagram of a possible online status provided by the embodiment of the present application. It can be seen from area 801 that the storage device 1 was last updated at 15:50:00 on October 12, 2021. Similarly, the last update time of storage device 2 is "minutes ago" The last update time of storage device 2 is "1634025000", where "1634025000" is the timestamp representation method, which can correspond to 15:50 on October 12, 2021 minutes 00 seconds.
  • the discoverer can determine whether the target device is in an active state according to the time when the target device updated its own status last time.
  • the target device is in an active state if the time when the target device updated its own state last time is within a preset first time period (for example, the first time period may be half an hour, 1 hour, or 1 day, etc.). Taking the first duration as 1 hour as an example, the last update time of the storage device 2 is 5 minutes ago, and if it falls within the range of the first duration, the storage device 2 is in an active state.
  • a preset first time period for example, the first time period may be half an hour, 1 hour, or 1 day, etc.
  • the target device is in an inactive state.
  • the last update time of storage device 1 is "2021.10.12 15:50:00"
  • the current time is "2021.10.14 10:00:00”
  • storage device 1 is inactive.
  • the above only takes two active states as examples for illustration, and more or fewer types may be set in specific implementations, which are not listed here.
  • the above-mentioned online status setting of the storage device is also applicable to the service, as shown in Figure 8, the service in the service layer can also correspond to the update time, which is used by the discoverer to judge whether it is in an active state.
  • the online state may be an identifier indicating whether the device is online or not, or indicating the status of the connection of the device, such as the following identifiers: online, offline, or in the process of being connected.
  • FIG. 9 is a schematic diagram of a possible online state provided by the embodiment of the present application. It can be seen from area 901 that the corresponding online state of the storage device 1 is "online”. Similarly, the online status corresponding to the storage device 2 is "offline”.
  • the discoverer can determine whether the target device is still active by sending an inquiry message to the target device and receiving a response message from the target device.
  • the inquiry message may be implemented through a message mechanism.
  • the message mechanism may be pre-defined or pre-configured.
  • inactive devices or offline devices may correspond to different management policies or communication policies. For example, if a device is inactive, it can be flagged in the data view. For another example, if the device is not online, the management service can remind the administrator to restore its online status.
  • the discoverer determines the communication strategy for the target device based on the online status of the target device. Taking Figure 9 as an example, if the target device that the finder wants to communicate with is storage device 1, and the online status of storage device 1 is "online”, the finder can communicate with storage device 1; if the finder wants to communicate with The target device is storage device 2. Since the online status of storage device 2 is "offline”, the discoverer cannot communicate with storage device 1 at this time; The presence status is "Connecting", at which point the finder can revisit the presence status of the target status after a certain time interval.
  • some public background services can also periodically scan the bus and clean up old, inactive registration information.
  • the virtual bus can provide metadata service, which is used to define the format of metadata, or convert the registration information of the device into a certain data format.
  • the management service of the virtual bus includes the metadata service.
  • the device When the device registers, it can upload the original registration information to the metadata service, and the metadata service converts the original registration information of the device into registration information in a fixed format and writes it into the access point.
  • the metadata service may be implemented by the storage resources and computing resources of the virtual bus itself, or may be implemented through an independent service.
  • the metadata service is implemented by an independent service, the metadata service belongs to a service in the service layer.
  • the virtual bus can provide a data tunnel to realize the indirect data flow capability between multiple devices.
  • storage device 1 when storage device 1 is the sender and storage device 2 is the receiver, storage device 1 can push data to a designated location, and storage device 2 can obtain data from the designated location.
  • the designated location may be a predefined location, a location negotiated between the sending end and the receiving end, or a location indicated by the receiving end.
  • the storage space of the virtual bus can contain a pre-divided cache space, which is used as a location for temporarily storing data. At this time, the sending end can push the data to a certain location in the buffer space, and the receiving end can Pull data from this cache space.
  • the virtual bus provides message services, and multiple devices can provide message services through the virtual bus for message communication.
  • the message service may be realized by the storage resources and computing resources of the virtual bus, or may be realized by independent storage resources and computing resources.
  • the message service can belong to a service in the service layer.
  • Implementation mode 1 the message service is provided through the message queue service.
  • the message queue service includes storage resources and computing resources, and can complete one or more of the following functions: store message queues, receive written messages, notify message receivers to obtain messages, and so on.
  • a message queue is used to store one or more messages.
  • the messages in the message queue can be actively written by the device, or triggered when an operation is performed.
  • storage devices or services in the virtual bus can actively write messages to the message queue.
  • a user of the virtual bus performs an operation on an object in the virtual bus (for example, operations such as adding, deleting, editing, or searching), a message may be triggered, and the message may be recorded in a message queue.
  • the message queue service can be used to implement various types of message queues, for example, it can include control message queues, notification message queues, or broadcast message queues.
  • first message queue is used to store messages sent to the storage device 1
  • message queue 2 is used to store messages sent to the storage device 2 . It should be understood that the first message queue and the second message queue may be the same message queue or different message queues.
  • the storage device 2 When performing data flow, the storage device 2 writes a request message into the first message queue, and the request message indicates that the storage device 1 is requested to push certain data.
  • the message queue service can notify the storage device 1 to read the message.
  • the storage device 1 obtains the request message by reading the first message queue, so as to push the data to the specified location.
  • the storage device 1 may write a data ready message to the second message queue.
  • the message queue service can notify the storage device 2 to read the message.
  • the storage device 2 reads the second message queue to obtain the data ready message, so as to pull the data pushed by the storage device 1 at the specified location. In this way, the data flow process is completed through the message queue service and the virtual bus.
  • the message queue is implemented based on the storage space of the virtual bus itself, or in other words, the message queue is stored in the storage space of the virtual bus.
  • the message queue can contain one or more messages, and the messages in the message queue can be actively written by the device or triggered when an operation is performed.
  • the message receiver obtains the message by periodically or aperiodically reading (or listening) the message queue.
  • the message queue is stored in the object storage pool of the virtual bus in KV format.
  • the key of the message queue may have a fixed format, and the format may be pre-defined or pre-configured.
  • the key of the message queue can contain the specified key prefix. Exemplarily, taking the data processing system shown in FIG.
  • the prefix of the key of the message queue of the storage device 1 can be: https:// ⁇ vbus_access_point ⁇ /cloud_0/sys/massage/, wherein, “ ⁇ vbus_access_point ⁇ " is the path of the access point connected to device cloud0, "cloud_0" is the UUID of storage device 1, "sys” is the directory where the system files of storage device 1 are located, and “massage” is the location of the message queue under the sys directory . It should be understood that the UUID, file name, etc. here are only examples, and are not intended to limit the storage location of the message queue. For example, “massage” may also be replaced with other names, such as "cli".
  • FIG. 10 shows a schematic diagram of a directory of content stored in a possible storage pool, as shown in area 901 , where the message queue related to device cloud0 is stored.
  • path/cloud_0/sys/maggese/request/dorado_0/1 of access point 1 indicates a request (request) message about device cloud0, and the sender of the message is UUID "dorado_0" device, the number of the request message is 1.
  • the path of access point 1/cloud_0/sys/maggese/response/dorado_0/1 indicates the response (response) message sent by the device cloud0, and the receiver of the response message is the device whose UUID is "dorado_0" , the response message is the response to the request message numbered 1.
  • the device can obtain messages to be processed by periodically or aperiodically reading the message queue.
  • This method can be implemented based on the object storage service, without the need to establish a message queue service, and has good independence.
  • the storage device 1 when storage device 1 (whose UUID is, for example, “cloud_0”) is the sending end, and storage device 2 (whose UUID is, for example, “dorado_0") is the receiving end.
  • the storage device 2 writes a request message (number of the request message is 1) to the request message queue of the storage device 1, and the request message indicates that the storage device 1 is requested to push certain data.
  • the key of the request message written by the storage device 2 is: "path/cloud_0/sys/maggese/request/dorado_0/1 of the access point 1".
  • the storage device 1 can obtain the request message by reading the request message queue, that is, read the content stored in the path "path/cloud_0/sys/maggese/request of access point 1", and push the data to the specified location.
  • the storage device 1 may write a data ready message to the response message queue.
  • the key of the data prepared message is: "path/cloud_0/sys/maggese/response/dorado_0/1 of access point 1".
  • the storage device 2 can obtain the data ready message from the response message queue, so as to pull the data pushed by the storage device 1 at a specified location. In this way, the data flow process is completed through the virtual bus.
  • the above two implementation methods only use data flow between storage devices and storage devices as an example to illustrate the process of data flow, and are not intended to limit data to only flow between storage devices.
  • the virtual bus can also support data flow between services or between storage devices and services.
  • the data flow methods in the above two scenarios can also refer to the data flow methods between devices, which are not discussed here. Let me repeat them one by one.
  • the flow of data across the virtual bus is done in a push-pull fashion.
  • the sender pushes the data to the specified location, and the receiver pulls (downloads) the data from the specified location.
  • the virtual bus controls the sequence of data flow through message services.
  • FIG. 11 is a schematic flowchart of a possible data processing method provided by the embodiment of the present application.
  • the sending ends may be devices of the same type, or devices of different types. That is, the data flow process shown in FIG. 11 may specifically be a data flow between storage devices, a flow between a storage device and a service, or a flow between a storage device and a service.
  • the method can be applied to the data processing system shown in FIG. 2 .
  • the data processing method shown in FIG. 11 includes steps S1101 to S1105.
  • Step S1101 the receiving end acquires the indication of the message queue of the sending end of the data.
  • the receiving end is the destination point of the data
  • the sending end is the source of the data.
  • FIG. 2 when the data in the storage device 1 needs to be migrated to the storage device 2, the storage device 1 is the sending end, and the storage device 2 is the receiving end.
  • the storage device 1 when the data stored by the user in the storage device 1 needs to be sent to the service 1 for data analysis, the storage device 1 is the sending end, and the service 1 is the receiving end.
  • the number of receiving end and sending end may be one or more.
  • storage device 1 may copy data to storage device 2, storage device 3, and storage device 4. At this time, storage device 1 is the sending end, and storage device 2, storage device 3, and storage device 4 are receiving ends.
  • the sender's message queue is used to record messages about the sender. There can be one or more message queues at the sender. If the sender has multiple message queues, the multiple message queues can belong to different types of message queues.
  • the message queue at the sending end may include a request message queue, a response message queue, and the like.
  • the request message queue is used to store messages sent by other devices to the sender
  • the response message is used to store messages sent by the sender to other devices.
  • message queues may include point-to-point message queues, multicast message queues, broadcast message queues, and so on.
  • the number of receivers of the message in the point-to-point message queue is usually 1, and the receiver of the message in the multicast message queue belongs to a communication group (there can be multiple receivers in the group, also can have 1 receiver, even There may be no receivers in the communication group), and the message receivers in the broadcast message queue are all receivers in a range.
  • the scope here may cover the entire virtual bus, or cover some pre-divided devices on the virtual bus.
  • the message queue may include a control message queue, a data message queue, and the like.
  • the message queue indication is used to mark a certain message queue.
  • the indication of the message queue may include the number of the message queue, the ID of the message queue, the path of the message queue, or the URL of the message queue.
  • the receiving end can find the message queue of the sending end according to the instruction of the message queue.
  • the virtual bus provides a message queue service, in which message queues corresponding to multiple devices are stored, and different message queues are distinguished by message queue IDs.
  • the receiver can obtain the message queue ID of the sender from the message queue service based on the virtual bus.
  • the message queue corresponding to the device is stored in the resource storage pool of the virtual bus.
  • the sender is a device whose UUID is "cloud_0”
  • the indication of the message queue at the sender may be: "path of access point 1/cloud_0/sys/maggese/”.
  • the indication of the message queue at the sending end may be included in the metadata.
  • the receiving end scans the access point from the virtual bus, and can obtain the metadata under the access point, so as to obtain the indication of the message queue of the sending end. For example, as shown in FIG. 10 , the receiving end scans the access point 1 to obtain the metadata under the access point, so as to obtain the indication of the message queue of the device whose UUID is "cloud_0".
  • Step S1102 the receiver device adds a data push request in the message queue.
  • the receiving end may add a new message in the message queue according to the instruction of the message queue. Since the receiving end needs to obtain the data of the sending end, a data push request can be added in the message queue.
  • the receiving end device can add a data push request to the message queue of the sending end through the message queue service according to the instruction of the message queue.
  • the sending end device can add a data push request under the path corresponding to the message queue according to the instruction of the message queue.
  • the sending end when the sending end is a device with UUID "cloud_0", the receiving end can add Data push request.
  • the data push request for example, the number is 1
  • the data push request can be specifically added to "the path of access point 1/cloud_0/sys/maggese/request/dorado_0/1".
  • the sending end writes a data push request in the message queue, and the receiving end can obtain the data push request.
  • the message queue service can notify the sender that there are messages to be read through the doorbell mechanism.
  • the sender can obtain the data push request from the message queue.
  • the sender can periodically or aperiodically read its own message queue, so as to obtain the data push request.
  • This method can also be called a polling (polling) mechanism, that is, the sender periodically or aperiodically reads the message queue in turn to find unprocessed messages.
  • the receiving end may include a bus connector 301 as shown in FIG. 3 , and the receiving end implements message-related services through the bus connector. For example, the receiving end writes a data push message to the message queue, reads messages in the message queue, etc. in a predefined format or according to a communication protocol through the bus connector.
  • the sending end may also include a bus connector 301 as shown in FIG. 3 .
  • the sender reads the message in the message queue through the bus connector, writes the data and prepares the message to the message queue, and so on.
  • Step S1103 The sender pushes the data to a specified location in response to the data push request.
  • the specified location provides storage space for storing the data pushed by the sender.
  • the specified location can be described by means of path, URL or the like.
  • the specified location may be in the storage space of the virtual bus.
  • the specified location may be located in the resource storage pool of the virtual bus.
  • FIG. 12 shows a schematic diagram of directories in a possible storage pool.
  • the object storage pool 1201 includes k (k>0) object buckets, wherein object bucket 1 and object bucket m are used to store registration information, etc., and then Entry point k and so on are used to cache data.
  • the sending end is a device with UUID "cloud_0” and the receiving end is a device with UUID "dorado_0”
  • the specified location can be the following path: "object bucket number m/temp_0/cloud_0/data_1".
  • the sender can push the data to the storage space corresponding to the above path for storage.
  • the specified location may also be other storage locations located in the non-virtual bus.
  • the sender When pushing data, the sender first determines the specified location to push the data, and then pushes the data to the specified location. According to the date, the sender can determine the designated location of the data to be pushed in the following ways:
  • Mode 1 It may be a pre-configured, pre-defined storage location. For example, pre-allocate storage space in the object storage pool of the virtual bus to cache the data pushed by the sender.
  • Mode 2 The sender selects a section of storage space from available storage locations as a designated location for cached data.
  • Mode 3 The sending end and the receiving end jointly negotiate to obtain the specified location.
  • Mode 4 The receiving end selects a section of storage space from available storage locations as a designated location for caching data. Further, in this case, the receiver can include the specified location in the data push request, so that the sender can know where the data is pushed according to the data push request.
  • Step S1104 the receiving end acquires a data ready message.
  • the receiving end can obtain the data ready message through the message queue.
  • the message queue here may be the reply message queue of the sender, or a point-to-point message queue, a multicast message queue, and the like.
  • the sender after the sender pushes the data to the specified location, it can write a data ready message (the message number is 1) in the following path: "Path of access point 1 /cloud_0/sys/ maggese/response/dorado_0".
  • the receiver can periodically or aperiodically read the message queue in the path to obtain the data ready message.
  • the message queue service includes a broadcast message queue, and the receiver can accept messages in the broadcast message queue. After the sender pushes the data to the specified location, it can write a data ready message in the broadcast message queue. Correspondingly, the receiving end obtains the data ready message by reading the broadcast message queue.
  • the virtual bus can use the message queue service in the form of a doorbell mechanism to remind the receiving end to read the data and prepare the message.
  • the receiving end can obtain the data ready message.
  • Step S1105 the receiving end acquires data.
  • the receiving end may pull (download) data (or data fragments) from the aforementioned specified location.
  • the sender can push data in the form of multiple data fragments. At this point, each time the sender uploads a data segment, it writes a data ready message. The receiving end receives multiple data fragments respectively through multiple data ready messages, and combines them to obtain complete data.
  • the sender can also pull data from a specified location on demand.
  • Pulling on demand means that the receiving end starts to pull only the metadata, or pulls part of the data, or pulls both the metadata and part of the data.
  • the unpulled data needs to be used, the remaining part of the data or all the remaining data is pulled through the virtual bus.
  • the virtual bus can be regarded as a data tunnel, and data flow between devices can be realized based on the virtual bus.
  • the process of data flow can be controlled.
  • the message queue service can also actively remind the receiver to process the message, shorten the waiting time for the message to be processed, and improve the efficiency of data flow.
  • the device that initiates the data push request and the device that receives the data may be different devices.
  • FIG. 13 is a schematic flowchart of another possible data processing method provided by the embodiment of the present application, including steps S1301 to S1305 .
  • steps S1101 to S1105 please refer to the relevant descriptions of steps S1101 to S1105 .
  • the requesting end is a device that initiates a data push request
  • the sending end is a data outflow party
  • the receiving end is a data inflow party.
  • the requesting end may obtain an indication of the message queue of the sending end, and add a data push request to the message queue.
  • the sender can actively read the data push request from the message queue, or the message queue service reminds the sender to read the data push request. After the sender obtains the data push request, it can push the data to the specified location.
  • the receiving end can actively read the data ready message from the message queue, or the message queue service reminds the receiving end to read the data ready message.
  • the receiving end can obtain data from the specified location.
  • the offline data center of enterprise A has a file system snapshot, and now enterprise A needs to push the data in the file system to the virtual machine (virtual machine, VM) of the public cloud for data analysis.
  • virtual machine virtual machine
  • Step S01 The data center registers with the virtual bus, and the registration information includes metadata of the snapshot of the file system.
  • the access address of snapshot metadata on the virtual bus is as follows: https:// ⁇ vbus_access_point ⁇ /file/ ⁇ fs_uuid ⁇ /snapshots/2021-09-07.
  • ⁇ vbus_access_point ⁇ is the identifier of the access point
  • ⁇ fs_uuid ⁇ is the unique identifier of the file system.
  • the metadata of the snapshot includes the information of the message queue of the data center, such as the address and key of the message queue.
  • Step S02 The VM acquires bus access and acquires metadata through the virtual bus.
  • the VM can obtain metadata of snapshots registered in the data center.
  • Step S03 The VM obtains the message queue of the data center from the metadata of the snapshot.
  • a data center can have one or more associated message queues.
  • a data center may include a control message queue and a notification message queue, wherein the control message queue contains messages sent to the data center; the notification message queue contains messages sent by the data center.
  • the notification message queue there may be one recipient of the message (for example, point-to-point sending), or multiple recipients (for example, group sending or group sending, etc.).
  • a possible message queue control information (which may include information such as message queue metadata, message queue address, or message queue key) on the virtual bus can be as follows: https:// ⁇ vbus_access_point ⁇ /file/ ⁇ fs_uuid ⁇ /snapshots/2021-09-07/device/message_queue/ctrl-endpoint.
  • information of a possible notification message queue may be as follows: https:// ⁇ vbus_access_point ⁇ /file/ ⁇ f s_uuid ⁇ /snapshots/2021-09-07/device/message_queue/broadcast-endpoint.
  • information of a possible secret file of a message queue may be as follows: https:// ⁇ vbus_access_point ⁇ /file/ ⁇ fs_uuid ⁇ /snapshots/2021-09-07/device/message_queue/secrets.
  • Step S04 The VM sends a data push request to the control message queue.
  • the data push request may contain a location indication, which is used to specify the cache location of the data on the bus.
  • a possible location indication may be as follows: https:// ⁇ vbus_access_point ⁇ /file/ ⁇ fs_uuid ⁇ /snapshots/2021-09-07/relay/buffer.
  • Step S05 The data center receives the data push request, and pushes the data to a designated location.
  • the data center can periodically or non-periodically read the message queue, so as to obtain the data push request.
  • the message queue service of the virtual bus can remind the data center that there is a new message to be processed in the form of a doorbell, and correspondingly, the data center can obtain the data push request.
  • the data center when it pushes data, it may push all the data, or push data multiple times, and push a part of data each time.
  • the following uses the data center to push data in the form of multiple pushes as an example for illustration. This application is also applicable to the case of pushing all data at one time.
  • Step S06 The data center sends a notification to the notification message queue every time a part of the data is sent.
  • the data center may proactively write a data ready message to the notification message queue.
  • Step S07 The VM receives the data ready message by notifying the message queue, and downloads the data segment.
  • the VM can periodically or non-periodically read the message queue, so as to acquire the data ready message.
  • the message queue service of the virtual bus can remind the VM that there is a new message to be processed in the form of a doorbell, and accordingly, the VM can obtain the data ready message.
  • the data center and the VM may execute steps S06 and S07 multiple times, so that the VM receives the required data.
  • Step S08 The VM performs data analysis based on the received data.
  • the devices interconnected using the virtual bus are located within the security boundary of the enterprise or institution, and the security boundary here may be a security protection service such as a firewall.
  • the virtual bus does not send data to or read any information from the device or service connected to itself, so the firewall of the device or service will not affect the normal operation of the virtual bus.
  • Object storage and message queues in the virtual bus are only used to store data.
  • Each device or service connected to the virtual bus reads data from the virtual bus or writes data to the virtual bus according to its own business requirements, so as to realize interconnection and data transmission in a loosely coupled manner.
  • the use of virtual bus for interconnection can realize the interconnection between data and the data flow in the form of non-direct connection without destroying the security boundary of each device or service, so the data of devices and services within the security boundary can be guaranteed safety.
  • the virtual bus can aggregate metadata of devices, or metadata of data, etc., so as to provide a global view of devices and data.
  • information such as device metadata and data metadata, you can indicate a variety of valuable content about the data, such as: how to access the data, where the data is stored on the bus (for example, which object bucket, Store which access point, access prefix, etc.), which devices are currently sharing access, related message queue information, etc.
  • the information, or content can be collected by the management service (or the device that manages the virtual bus) to present a data-centric data management view.
  • the service 604a can view the data and metadata in the virtual bus through the DME, and generate a data-centric data management management view.
  • the management view can be presented in the form shown in FIG. 4 , based on the data management view, the administrator can determine the storage location of the data on the bus, related message queues, and the like.
  • information such as the degree of hotness and coldness of data can be marked, so as to facilitate data sharing and data security.
  • management device can select a subset of them according to business requirements to establish a federated file system (data plane).
  • the file system may be a file system established for a specific application scenario and a certain use requirement.
  • the life cycle of data is different from that of devices, and in most cases the life cycle of data is longer. For example, when a device fails or the device is decommissioned, the data needs to be migrated to other devices.
  • the device can be used as the medium for storing data and the channel (Access Point) for data access, and the two functions of "media for storing data” and "channel for providing data access” can also be separated of.
  • Design 1 the file system and device exist independently and do not belong to any device.
  • the file systems are distinguished by the identity information of the file system, and the identity information of the file system may be an ID, a serial number, etc. of the file system.
  • the identity information of the file system is globally unique, also known as the unique identifier of the file system. For example, a UUID can be used to uniquely identify a filesystem.
  • the unique identifier of the file system is defined when the file system is created and cannot be changed.
  • the file system has at least one device as the primary copy point (copy), and the primary copy can be changed.
  • the primary copy point For example, taking FIG. 2 as an example, when the storage device 1 (the original primary copy) is decommissioned, the file system can be migrated to the storage device 2, and at this time, the device 2 can serve as the new primary copy.
  • the file system has at least one device to provide an access point (Access Point).
  • Access Point an access point
  • the device providing the access point does not necessarily have a complete data copy, and the data can be streamed from any device with a file system data copy as required.
  • the metadata of the file system indicates which devices currently have a complete copy of the data.
  • the mount point of the file system on the device can be unified as ⁇ device ⁇ / ⁇ uuid ⁇ .
  • Primary file system The device where the current primary copy of the file system is located is called the primary file system (Primary FS).
  • Primary FS There can be multiple primary file systems, such as active-active file systems), and other access points that do not contain complete data are called shadow file systems (Shadow FS).
  • Shadow FS For any device, it has two file system types: primary file system and shadow file system.
  • Shadow file system There is a corresponding relationship between the shadow file system and the main file system.
  • main file system For a shadow file system, its corresponding main file system can be found through the global metadata.
  • the file system is registered to the bus with a UUID as an identifier, and other shadow file systems can obtain the master device by querying the bus records to complete the business process.
  • Each node (which can be a cluster) of the federated file system can share and obtain metadata through a virtual bus.
  • the target node does not care where the source file system is and which device provides it.
  • the required data can be obtained through the metadata.
  • the data layout of the file system provides the format of the metadata on the bus, so that all parties involved in the global file system can update the metadata in an orderly manner to complete file collaboration between multiple sites.
  • the storage space in the user's storage device may be large. At this time, registering a large storage space as an object will affect the efficiency of access and reading, and it is difficult to meet the user's needs.
  • the embodiment of the present application provides a flexible mapping method to realize the mapping from blocks to objects, so as to support various business requirements of the enterprise.
  • a block is a finite linear space.
  • a block can have multiple configuration types, for example, a thin provisioned block, or a full allocated block, etc.
  • the block of Thin provision is a sparse linear space.
  • a fully allocated block device can form a de facto sparse linear space by removing all zero intervals. Considering the logic of snapshots, multiple sparse spaces are related to each other.
  • the linear address of the LUN may be mapped as an object. For example, map each 4MB of space in a block as an object.
  • the key of the object includes prefix, LUN UID (unique ID), and logical block addressing (Logical Block Addressing, LBA) address, etc.
  • LBA address can also be replaced by the LBA address shifted to the right by N bits, N is an integer, For example: N can be 22.
  • a snapshot ID can be added between the LUN UID and the address, and the snapshot data is the superposition of snapshot 0 to current snapshot data.
  • This mapping table is a separate file, which is convenient for one-time download.
  • bitmap bitmap
  • a simple bitmap file only contains the data modified by the current snapshot.
  • FIG. 14 is a schematic diagram of a possible mapping of blocks into objects provided by an embodiment of the present application.
  • the key of each object file includes the prefix, LUNIUD and LBA, and the LBA is as shown in 1401, and the value is the space obtained by segmenting, and the size is 4MB.
  • segmentation may also be determined according to services, input/output (input/output, IO) size, etc., so as to meet user requirements as much as possible.
  • the segment size As an example, the following problem may occur: the user has to upload the entire segment after modifying a sector.
  • a mapping table can be added at the head of each segment (the size of the mapping table is within 4MB), so that the actual content of each segment is less than 4MB, which will make the snapshot mapping table larger.
  • the segment header map is also easily extensible to support compression and deduplication.
  • the base (base) data In order to support cascading snapshots (writable snapshots), the base (base) data should be placed under the LUN UID or the root directory of the snapshot. A snapshot data mapping table needs to go back to the parent directory until the level where the LUN UID is located.
  • a LOG space including multiple small files is established in the object storage, where the small files are leased by one or more fixed physical spaces.
  • the LOG space formed in this way can grow wirelessly.
  • Small files can use the LOG segment address as the key.
  • the actual size of the small file is smaller than the segment size.
  • it can be regarded as the mapping from the fixed-size linear space (logical address) of the LUN to the linear space (physical address) of the persistent layer, and the LUN is mapped in the form of "index + LOG".
  • the LOG space can be garbage collected through public cloud computing resources to reduce the waste of storage space and bandwidth. For example, through a preset strategy, LOGs within a certain period of time are merged to generate increments of continuous time periods. Exemplarily, keep all IOs of the last 15 minutes, merge every minute after 15 minutes, merge every 5 minutes after half an hour, merge every hour after 1 hour, merge every day after 1 day, etc.
  • the index also needs to be mapped to an object storage service (such as S3), which can be mapped through a checkpoint (checkpoint) or a LOG-based mapping.
  • object storage service such as S3
  • checkpoint checkpoint
  • LOG-based mapping For writable snapshots, you can create a separate LOG. The starting point of the LOG is not 0, but the latest physical address of the source LUN when the snapshot is formed. This address was previously read from the source LUN and then read from the snapshot LUN.
  • the embodiment of the present application also provides a computing node, the computing node includes a processor and a memory; at least one computer instruction is stored in the memory; the instruction is loaded and executed by the processor, so as to realize the method operation performed by the aforementioned virtual bus.
  • the embodiment of the present application also provides a storage device, the storage device includes a processor and a memory; at least one computer instruction is stored in the memory; the instruction is loaded and executed by the processor, so as to realize the execution of the aforementioned storage device or the storage layer. method of operation.
  • the present application also provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on at least one processor, the aforementioned version management method is implemented, for example, as shown in FIG. 11 described method.
  • the present application also provides a computer program product, which includes computer instructions, and when executed by a computing device, implements the aforementioned version management method, such as the method described in FIG. 11 .
  • words such as “exemplary” or “for example” are used as examples, illustrations or descriptions. Any embodiment or design described herein as “exemplary” or “for example” is not to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner.
  • At least one refers to one or more, and the “multiple” refers to two or more.
  • At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items.
  • at least one item (piece) of a, b, or c may represent: a, b, c, (a and b), (a and c), (b and c), or (a and b and c), where a, b, c can be single or multiple.
  • “And/or” describes the association relationship of associated objects, indicating that there may be three types of relationships, for example, A and/or B, which may indicate: A exists alone, A and B exist simultaneously, and B exists alone. A and B may be singular or plural.
  • the character "/" generally indicates that the contextual objects are an "or” relationship.
  • first and second use ordinal numerals such as "first" and “second” to distinguish multiple objects, and are not used to limit the order, timing, priority or importance of multiple objects degree.
  • first message queue and the second message queue are only for the convenience of description, and do not represent the difference in the structure and importance of the first message queue and the second message queue.
  • first message The queue and the second message queue can also be the same device.
  • the program can be stored in a computer-readable storage medium.
  • the above-mentioned The storage medium mentioned may be a read-only memory, a magnetic disk or an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

本申请实施例提供一种数据处理系统、数据处理方法及相关装置,应用于存储技术领域。本申请实施例中的虚拟总线系统包含存储层、虚拟总线和服务层,其中,存储层包括多个存储设备,服务层包含多个服务;存储设备和/或服务经过注册可以将注册信息写入虚拟总线中,而经过注册的存储设备或服务可以发现其他设备或服务;存储设备与存储设备之间、存储设备与服务之间、服务与服务之间可以经过虚拟总线进行数据流动,实现多方设备能够在不直接互联的情况下彼此发现和交流。本申请还提供了数据处理方法及相关的装置。通过本申请实施例,能够实现设备之间的流动,满足用户对数据流动的需求。

Description

数据处理系统、数据处理方法及相关装置
本申请要求于2021年09月17日提交中国专利局、申请号为202111091182.4、申请名称为“一种数据处理方法及系统”的中国专利申请的优先权,以及于2021年11月27日提交中国专利局、申请号为202111426817.1、申请名称为“数据处理系统、数据处理方法及相关装置”的中国专利申请的优先权,前述二者的全部内容通过引用结合在本申请中。
技术领域
本申请涉及存储技术领域,尤其涉及数据处理系统、数据处理方法及相关装置。
背景技术
随着数据存储技术的发展,越来越多的数据可以存储多个存储设备中,这些存储设备可以分布在不同的地区、不同的云(例如不同的公有云、或不同的私有云、或不同的混合云)中。
当数据需要在不同的存储设备中进行流动时,由于不同的存储设备具有不同的组织形式、安全管理(例如防火墙),使得数据的流动变得困难。一些方案中通过建立站点间(site-2-site)虚拟专用网(site-to-site virtual private network,VPN)、数据缓冲区来实现设备互联,但是VPN、数据缓冲区等建立过程复杂,不满足用户的使用需求。
发明内容
本申请实施例提供了数据处理系统、数据处理方法及相关装置,能够实现设备之间的互联,满足对设备之间的数据流动的需求。
第一方面,本申请实施例提供了一种数据处理系统,包括:存储层、虚拟总线和服务层;
所述存储层包括多个存储设备,所述存储层用于提供存储空间;
所述虚拟总线,用于接收所述存储设备的注册,形成一个包含多个存储设备的虚拟网络;
所述服务层,用于基于所述虚拟网络为用户提供服务。
本申请实施例中,经过注册的存储设备可以连接到虚拟总线,基于虚拟总线实现存储设备之间的互联,实现多方设备能够在不直接互联的情况下彼此发现和交流。而服务层中的服务也可以连接到虚拟总线中,从而基于虚拟总线实现数据流动管理、编排等服务。
进一步的,连接到虚拟总线的存储设备、服务等可以是异构的,例如,存储设备的硬件型号、软件版本、厂商、所在位置、或文件系统等可以互不相同,服务的内容、形式、或架构等可以互不相同。异构的设备通过虚拟总线所提供的布局规范,也可以实现互相连接,从而建立松耦合、分布式的数据协作关系。
可选地,前述多个存储设备包含可以是集中式的存储设备,也可以是分布式的存储设备,可以是虚拟的存储设备,也可以是实体存储设备。
在第一方面的一种可能的实施方式中,所述多个存储设备包含存储阵列、存储服务器、公有云、私有云、或混合云等中的一项或者多项。
在混合多云的情况下,用户可能具有多套存储设备,其分布的位置可能在不同的区域,也可能分布在不同云中。而本申请实施例中,多种存储设备通过虚拟总线就可以实现互相连 接,满足用户对于不同存储设备的连接需求。
在第一方面的一种可能的实施方式中,所述多个存储设备中的一个存储设备用于向所述虚拟总线提交注册信息。
相应的,虚拟总线可以记录存储设备的注册信息,从而完成存储设备的注册。
进一步的,虚拟总线可以提供多个接入点,存储设备可以通过接入点进行设备注册。由于接入点通常有多个且分布在不同的区域,因此注册服务是去中心化、无服务器化的。此外,存储设备也可以选择靠近自身、或符合自身业务需求的接入点来接入虚拟总线,提高存储设备的注册效率,提升使用虚拟总线的服务。
在第一方面的一种可能的实施方式中,所述注册信息包含所述存储设备的设备信息(或称为存储设备的配置信息)。
在第一方面的一种可能的实施方式中,所述注册信息包含数据的元数据。
在第一方面的一种可能的实施方式中,所述注册信息包含存储池的信息。
在第一方面的一种可能的实施方式中,写入虚拟总线的注册信息为注册格式写入。其中,注册格式可以是预先定义、预先配置或者协议规定的数据格式,或者,注册格式可以是由虚拟总线定义的数据格式,或者,注册格式可以是经过虚拟总线和存储设备协商得到的数据格式。
例如,所述数据格式可以包含键值key-value格式。
在第一方面的一种可能的实施方式中,在以注册格式写入存储设备的信息时,可以有以下两种可能的设计:
设计一:存储设备按照注册格式写入注册信息。
设计二:存储设备将注册信息上传至虚拟总线后,通过其他服务将注册信息转化为统一的格式写入虚拟总线中。
在第一方面的一种可能的实施方式中,所述多个存储设备中的一个存储设备用于在获取授权之后向所述虚拟总线提交所述注册信息。
在第一方面的一种可能的实施方式中,所述多个存储设备中的发送端和接收端之间不直接相连,所述发送端和所述接收端通过所述虚拟总线交互数据或者传递消息。
在第一方面的一种可能的实施方式中,所述虚拟总线还包括消息队列,所述消息队列,所述消息队列用于存储关于任意一个所述存储设备的消息。
在第一方面的一种可能的实施方式中,所述多个存储设备包含发送端和接收端;
所述虚拟总线的消息队列用于记录来自所述接收端的数据推送请求,所述数据推送请求用于请求所述发送端推送目标数据;
所述发送端用于向预设位置推送所述目标数据;
所述虚拟总线的消息队列还用于记录关于所述接收端的数据准备消息,所述数据准备消息用于通知所述接收端拉取所述目标数据。
第二方面,本申请实施例提供了一种数据处理方法,包括:
提供存储层,所述存储层包括多个存储设备,所述存储层用于提供存储空间;
提供所述虚拟总线,所述虚拟总线用于接收所述存储设备的注册,形成一个包含多个存储设备的虚拟网络;
提供服务层,所述服务层用于基于所述虚拟网络为用户提供服务。
本申请实施例中,经过注册的设备可以连接到虚拟总线,基于虚拟总线实现存储设备之间的互联,实现多方设备能够在不直接互联的情况下彼此发现和交流。而服务层中的服务也可以连接到虚拟总线中,从而基于虚拟总线实现数据流动管理、编排等服务。
进一步的,连接到虚拟总线的存储设备、服务等可以是异构的,例如,存储设备的硬件型号、软件版本、厂商、所在位置、或文件系统等可以互不相同,服务的内容、形式、或架构等可以互不相同。异构的设备通过虚拟总线所提供的布局规范,也可以实现互相连接,从而建立松耦合、分布式的数据协作关系。
可选地,前述多个存储设备包含可以是集中式的存储设备,也可以是分布式的存储设备,可以是虚拟的存储设备,也可以是实体存储设备。
在第二方面的一种可能的实施方式中,所述多个存储设备包含存储阵列、存储服务器、公有云、私有云、或混合云等中的一项或者多项。
在混合多云的情况下,用户可能具有多套存储设备,其分布的位置可能在不同的区域,也可能分布在不同云中。而本申请实施例中,多种存储设备通过虚拟总线就可以实现互相连接,满足用户对于不同存储设备的连接需求。
在第二方面的一种可能的实施方式中,所述方法还包括:所述多个存储设备中的一个存储设备向所述虚拟总线提交注册信息。
相应的,虚拟总线可以记录存储设备的注册信息,从而完成存储设备的注册。
进一步的,虚拟总线可以提供多个接入点,存储设备可以通过接入点进行设备注册。由于接入点通常有多个且分布在不同的区域,因此注册服务是去中心化、无服务器化的。此外,存储设备也可以选择靠近自身、或符合自身业务需求的接入点来接入虚拟总线,提高存储设备的注册效率,提升使用虚拟总线的服务。
在第二方面的一种可能的实施方式中,所述注册信息包含所述存储设备的设备信息(或称为存储设备的配置信息)。
在第二方面的一种可能的实施方式中,所述注册信息包含数据的元数据。
在第二方面的一种可能的实施方式中,所述注册信息包含存储池的信息。
在第二方面的一种可能的实施方式中,写入虚拟总线的注册信息为注册格式写入。其中,注册格式可以是预先定义、预先配置或者协议规定的数据格式,或者,注册格式可以是由虚拟总线定义的数据格式,或者,注册格式可以是经过虚拟总线和存储设备协商得到的数据格式。
例如,所述数据格式可以包含键值key-value格式。
在第二方面的一种可能的实施方式中,在以注册格式写入存储设备的信息时,可以有以下两种可能的设计:
设计一:存储设备按照注册格式写入注册信息。
设计二:存储设备将注册信息上传至虚拟总线后,通过其他服务将注册信息转化为统一的格式写入虚拟总线中。
在第二方面的一种可能的实施方式中,所述方法还包括:多个存储设备中的一个存储设备在获取授权之后向所述虚拟总线提交所述注册信息。
在第二方面的一种可能的实施方式中,所述多个存储设备中的发送端和接收端之间不直接相连,所述发送端和所述接收端通过所述虚拟总线交互数据或者传递消息。
在第二方面的一种可能的实施方式中,所述虚拟总线还包括消息队列,所述消息队列, 所述消息队列用于存储关于任意一个所述存储设备的消息。
在第二方面的一种可能的实施方式中,所述多个存储设备包含发送端和接收端;所述方法还包括:
所述虚拟总线通过所述消息队列记录来自所述接收端的数据推送请求,所述数据推送请求用于请求所述发送端推送目标数据;
所述发送端向预设位置推送所述目标数据;
所述虚拟总线通过所述消息队列记录关于所述接收端的数据准备消息,所述数据准备消息用于通知所述接收端拉取所述目标数据。
第三方面,本申请实施例提供一种计算节点,该计算节点包括处理器和存储器;存储器中存储有至少一条计算机指令;该指令由该处理器加载并执行,以实现前述第二方面任一项中虚拟总线所执行的方法操作。
第四方面,本申请实施例提供一种存储设备,该存储设备包括处理器和存储器;存储器中存储有至少一条计算机指令;该指令由该处理器加载并执行,以实现前述第二方面任一项中存储层所执行的方法操作。
第五方面,本申请实施例提供一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在至少一个处理器上运行时,实现前述第二方面任一项所描述的方法。
第六方面,本申请提供了一种计算机程序产品,计算机程序产品包括计算机指令,当所述指令在至少一个处理器上运行时,实现前述第二方面任一项所描述的方法。
可选地,该计算机程序产品可以为一个软件安装包,在需要使用前述方法的情况下,可以下载该软件安装包并在计算设备上执行该软件安装包所形成的计算机指令。
本申请第二至第五方面所提供的技术方案,其有益效果可以参考第一方面的技术方案的有益效果,此处不再赘述。
附图说明
下面将对实施例描述中所需要使用的附图作简单的介绍。
图1是本申请实施例提供的一种数据流动的场景示意图;
图2是本申请实施例提供的一种数据处理系统的示意图;
图3是本申请实施例提供的一种设备的示意图;
图4是本申请实施例提供的一种对象存储池的示意图;
图5是本申请实施例提供的一种设备在虚拟总线上的视图的示意图;
图6是本申请实施例提供的一种数据处理系统的使用场景示意图;
图7是本申请实施例提供的一种虚拟总线的设备目录的示意图;
图8是本申请实施例提供的一种在线状态的示意图;
图9是本申请实施例提供的又一种在线状态的示意图;
图10是本申请实施例提供的一种存储池中存储的内容的目录示意图;
图11是本申请实施例提供的又一种数据处理方法的流程示意图;
图12是本申请实施例提供的又一种存储池中的目录的示意图;
图13是本申请实施例提供的又一种数据处理方法的流程示意图;
图14是本申请实施例提供的一种将块映射为对象的示意图。
具体实施方式
下面结合附图对本申请实施例进行详细介绍。
为了便于理解,以下示例地给出了部分与本申请实施例相关概念的说明以供参考。如下所述:
1.对象存储服务
对象存储服务是基于对象的存储服务,其中,一个对象包含文件的数据和与其相关的属性信息。对象存储服务通常由服务供应商提供,例如,服务提供商可以包含华为、亚马逊、阿里巴巴、或微软等。
以华为云的对象存储服务(Object Storage Service,OBS)为例,OBS包含对象和桶(bucket)。
其中,对象包含数据(data)、元数据(metedata)、或键(key)等中的一项或者多项。data是文件的数据内容。key可以看作对象的标识(或对象的名称),一般来说,一个桶里的每个对象拥有唯一的键。metedata,即对象的描述信息,通常以键值对(Key-Value)的形式存在。
桶是存储对象的容器。桶包含访问域名、存储类别、访问权限、或所属区域等属性。对象存储服务的使用者可以通过桶的访问域名来访问桶。
2.消息队列
消息队列是一种通信方式,可以理解为包含一条或者多条消息的列表。消息在被处理和删除之前存储在消息队列上,消息发送方通过消息队列服务可以与消息接收方进行交互。应理解,本申请为了便于描述,故将包含多个消息的数据结构统一称为消息队列,并不旨在限定通过队列的方式实现消息队列。例如,具体实施过程中,消息队列还可以通过列表、堆、链表、或栈等方式来实现。
一种设计中,设备通过周期或者非周期性的读取(或侦听)消息队列,可以获取到需要自身处理的消息。
又一种设计中,通过消息队列服务可以将消息通知给接收方。消息发送方(或者消息触发方)向消息中写入消息后,消息队列服务可以通知接收方对消息进行处理。例如,通过门铃机制提醒接收方读取消息等。
消息队列适用于无服务器架构、微服务架构,对于服务器架构同样适用。
3.用户
用户是具有独立资源(例如:计算资源、存储资源、或网络资源等中的一个或者多个)的个人、或组织。部分场景中,也可以称为租户、或成员(member)。
4.重定向
重定向(redirect)是指将网络请求重新确定请求方向,转到其它位置进行处理的过程。
例如,由于网络资源的调整,原本在地址1的内容迁移到地址2中,此时可以将用户对地址1的访问进行重定向,即:将用户对地址1的访问重定向到地址2中,使得用户可以访问所需的内容。
其中,重定向的方式可以有301 redirect、302 redirect、或meta fresh方法等。
5.逻辑单元号(logical unit number,LUN)
LUN是一种对存储单元的划分,具体可以包含数字、符号、字符串等。
在一些存储系统中,例如存储区域网络(Storage Area Network,SAN)中,LUN用来 标识一个逻辑单元。即:存储系统将物理硬盘进行分区,成为拥有逻辑地址的各个部分,进而允许使用者对存储系统进行访问,这样的一个分区便称为一个LUN。
一些场景中,LUN也指在SAN存储上创建的逻辑磁盘。
上述对概念的示例性说明可以应用在下文的实施例中。
设备之间经常需要进行通信,例如存储设备之间的数据迁移、数据推送、读数据、或写数据等。但是,通信的发送端和接收端可能会分布在不同的地区、或不同的云(例如不同的公有云、或不同的私有云、或不同的混合云)中,此时发送端和接收端难以通过Internet直接访问,使得设备之间无法实现互联,从而导致数据无法流动。
例如,在混合多云的场景中,用户可能拥有多套存储设备和/或多套服务,这些存储设备(和/或服务)可能在线下会位于不同的区域,线上又可能会分布在不同的云中,由于通信区域限制、或安全控制等原因,无法进行通信连接。
请参见图1,图1是本申请实施例提供的一种可能的数据流动的场景示意图。存储设备1位于防火墙101后,存储设备2位于防火墙102后,服务1位于防火墙103后。由于存储设备1和存储设备2位于不同的防火墙内,若存储设备1向存储设备2发起请求(例如连接请求、或数据流动请求等),则会被存储设备2的防火墙102拒绝。相应的,若存储设备2向存储设备1发起请求,则会被存储设备1的防火墙101拒绝。因此存储设备1和存储设备2无法建立连接,使得数据在存储设备1和数据2之间流动变得困难。类似的,服务1与存储设备2之间的连接、数据流动,也受到防火墙103和/或防火墙102的限制。
一种可能的场景中,存储设备1为企业A的数据中心,存储了大量业务数据。存储设备2为企业A购买的公有云虚拟机。而企业A需要将存储设备1中的业务数据推送到公有云中进行数据分析,由于存储设备1和存储设备2之间无法建立连接,因此存储设备1中的业务数据无法推送到公有云,从而无法满足企业A的数据流动需求。
进一步的,服务1可以是企业A建立的私有云中的数据管理服务,由于服务1位于防火墙103后,也从而无法满足企业A的服务使用需求。
有鉴于此,本申请实施例提供一种数据处理系统、数据处理方法及相关设备,能够实现设备之间的互联,满足对设备之间的数据流动的需求。
请参见图2,图2所示为本申请实施例提供的一种可能的数据处理系统20的示意图。参见图2,数据处理系统20可以包含存储层201、虚拟总线203和服务层202。
(1)存储层201包含一个或者多个存储设备,存储层用于提供存储空间。
其中,存储设备是用于提供存储空间的装置。本申请对于存储设备的组织形式(例如集中式、或分布式等)、存在形式(例如实体装置、或虚拟装置等)、存储空间大小、所在位置、提供商等不不作限制。
示例性地,存储设备可以包含集中式的存储设备,例如存储阵列、或存储服务器等,也可以包含公有云、私有云、混合云等,或者云中部分或者全部存储空间。
示例性地,存储设备可以包含实体装置,例如固态硬盘、机械硬盘或其他类型存储介质的存储设备、或存储服务器等,也可以包含虚拟装置,例如容器、虚拟机、或者将实体装置进行虚拟化处理后形成的存储池等。
(2)服务层202可以包含一个或者多个服务。其中,服务是由服务提供商提供的资源或在资源上完成的一系列活动,这些资源或活动可以解决用户问题、或满足用户需求。
示例性地,这里的服务可以包含以下几种类别:
类别1:资源。例如,存储资源(例如对象存储服务)、计算资源(如弹性云服务器、云手机等)、网络资源(例如虚拟私有云、或弹性负载均衡等)等资源。
类别2:在资源上进行的活动。进一步的,这些服务可以包含基础服务、管理服务、面向用户的服务等。
基础服务是虚拟总线所依赖的服务,虚拟总线基于基础服务来维持运行。基础服务例如可以包含安全服务、消息队列服务等。
管理服务用于管理虚拟总线本身、或者管理虚拟总线的使用者的服务,例如数据查询服务、数据视图服务、数据流动编排与管理服务、身份认证服务、云监管服务、或云审计服务等。
面向用户的服务例如可以包含注册服务、数据处理服务(例如数据查询、数据分析、大数据洞察等)、软件开发平台服务(例如开发平台、项目管理、项目部署、代码托管、或镜像服务等)、人工智能服务(如自然语言处理、内容审核、图像识别等)、应用运行服务(例如云性能测试服务)等。
类别3:用户在虚拟总线中注册的软件。例如,用户可以将其开发的数据分析软件在虚拟总线上进行注册,从而形成服务。
应理解,上述服务类别仅为示例,具体实施过程中服务层可以包含更多或者更少的服务类别。
服务层中的部分服务可以以付费模式向用户提供。例如,服务层中的部分服务可以是面向用户的增值服务,例如身份认证服务、云监管服务等。
应理解,图1所示的存储层中的存储设备的数量、以及服务层中服务的数量、连接点的数量等仅为示意,不作为对本申请的数据处理系统的限定,在实现过程中,存储设备、服务、或连接点等可以包含一个或者多个。
(3)虚拟总线203,也可以称为虚拟总线层、或中间层。虚拟总线的使用者是连接到总线的设备(便于描述以下简称为设备),例如存储层中的存储设备、或服务层中的服务等中的一个或者多个。
虚拟总线可以接收设备的注册(或发布,便于描述以下称为注册),从而形成包含多个设备的虚拟网络。经过注册的设备可以被其他设备发现,相应的,虚拟总线的使用者可以通过虚拟总线发现其他设备,从而实现设备之间的互联。
下面介绍设备向虚拟总线注册的过程。
首先,设备向虚拟总线提交注册信息,进行设备注册。其中,注册信息用于描述该设备的配置、或者描述设备中存储的数据、或者描述设备所提供的服务类型等。
示例性地,注册信息包含设备的配置信息、设备中存储的数据的元数据、或设备的存储池等信息等中的一项或者多项。其中,配置信息包含身份标识(identifier,ID,例如,唯一识别码(universally unique identifier,UUID))、名称(name)、型号(model)、版本号、地理区域(例如可以为中国西部、华中区域、华北区域等)、位置(location,可以为城市名、建筑名称、机房号、或机架号等,或通过经纬度等描述)、(物理)地址(address)、网际互连协议(internet protocol,IP)地址、介质访问控制(mediaaccesscontrol,MAC)地址、端口号、管理IP、或应用程序接口(application programming interface)等中的一项或者多项。其中,设备可以由某一管理设备进行管理、或者通过某一管理端口进行管理,前述的管理IP 为管理设备或管理端口的IP地址。
示例性地,数据的元数据可以包含数据所在的存储池的信息、数据对应的LUN、文件系统的信息、共享到全局文件系统的目录信息、挂载的源自其他设备的目录信息、状态信息(例如,数据所在的文件系统的在线状态、是否故障、空间使用情况等)、数据的名称、分享地址(数据如果被共享,则其他设备可以通过数据的共享地址访问到该数据)、或许可证等中的一项或者多项,其中,分享地址可以包含IP地址、MAC地址、或统一资源定位符(uniform resource locator,URL)地址等。示例性地,存储池的信息包括存储池的标识、所述存储池所包含的硬盘的标识、所述存储池中空闲存储空间、所述存储池的容量、最小访问单元等中的一项或者多项。可选的,存储池可以是经过虚拟化得到的。例如,租户可以在存储服务提供商处租用一段存储空间,存储空间经过虚拟化处理后得到存储池的信息。
虚拟总线包含一个或者多个接入点,如图2所示的圆点204所示。接入点是指设备接入虚拟总线的路径。具体的,路径,也可称为资源地址,指示了接入点的所在的位置,设备可以通过路径来找到接入点。可选的,路径可以为URL,如:https://virtualbus.s3.cn north1.amazonaws.com.cn。或者可选的,在接入点为对象桶(存储对象的容器)时,接入点的路径是桶的目录。
本申请涉及的接入点的类型可以有一种也可以有多种。示例性地,接入点可以包含基础接入点和衍生接入点这两种类型的接入点。其中,基础接入点的路径与用户所请求访问的地址(以下称为访问地址)一致,而衍生接入点的路径与访问地址不一致。
一种设计中,在虚拟设备访问基础接入点时,可以通过重定向机制将用户重新向至衍生接入点,从而可以在基础接入点上引入新的接入点,构成接入点网络。一方面,通过衍生接入点可以调控接入点的数量、提升系统的弹性。另一方面,由于一些基础接入点可能会因为故障、业务调整等原因而无法继续使用。在设备访问的接入点无法使用时,通过重定向机制可以将设备的访问重定向至其他可使用的接入点,从而适应接入点的变化,提高系统灵活性和稳定性。
虚拟总线中设有一个或者多个接入点,设备中具有总线连接器(或者其他能够实现接入接入点的功能单元,下文中进行详细描述),设备通过总线连接器向接入点对应的路径中写入注册信息,即可以连接到虚拟总线。由于接入点通常有多个且分布在不同的区域,因此设备注册过程可以是去中心化、无服务器化的。此外,设备还能够通过距离较近、或符合自身业务需求的接入点,来接入虚拟总线,从而提高存储设备的注册效率,提升使用虚拟总线的体验。
一个接入点拥有它自己的信息以描述该接入点,本实施例将这些信息称为接入点的信息。接入点的信息包括该接入点的访问地址、编号、所在地区、提供商等。
在进行注册时,设备可以获取多个接入点的信息,向其中一个或者多个接入点写入上述注册信息,从而实现在虚拟总线上的注册。本申请对于接入点的信息的存储形式不做限定,例如接入点的信息可以以队列、或表格等形式提供给设备等。表1示出了一种可能的接入点的信息的列表,例如编号为1的接入点,由提供商P1提供,所在地区为中国,访问地址为“https://virtualbus.s3.cn north 1.amazonaws.com.cn”;其余接入点以此类推。
表1 接入点的信息列表
编号 提供商 地区 访问地址
1 P1 中国 https://virtualbus.s3.cn north 1.amazonaws.com.cn
2 P1 美国西部 https://virtualbus.s3.us-west-2.amazonaws.com
3 P1 美国东部 https://virtualbus.s3.us-east-3.amazonaws.com
4 P2 - https://virtualbus.obs.cn-north-4.myhuaweicloud.com
…… …… …… ……
上述接入点的信息可以通过白名单向设备提供。可选地,设备中可以预先被配置、或者预先定义上述白名单,例如,在设备出厂时即在存储设备中配置白名单。或者可选地,设备可以在指定地址获取白名单。
设备获取接入点的信息后,通过连接功能模块来接入接入点。连接功能模块能够让设备以合规的连接流程连接至虚拟总线中,并执行与虚拟总线相关的业务(例如设备在总线上的注册、向虚拟总线上的对象存储池推送数据、接收/发送消息等)。连接功能模块可以通过软件、硬件、或者软硬结合的方式实现。通常来说,连接功能模块中包含执行总线连接流程的指令、安全证书、安全上下文、密钥、连接协议等中的一项或者多项,用于支持设备以合规的连接流程连接至虚拟总线中。
示例性地,请参见图3,图3是本申请实施例提供的一种可能的设备的示意图。设备301(例如图1所示的存储设备、或图1所示的服务等)包含总线连接器。总线连接器是设备301中的连接功能模块,用于连接接入点,从而连接到虚拟总线。总线连接器可以通过硬件(例如,通信接口、或收发器等)、软件模块、或者软件模块与硬件模块的结合体来实现。
总线连接器还用于支持设备实现与虚拟总线之间的业务。设备301通过总线连接器,实现总线注册、总线数据发布、或总线消息处理等业务。其中,总线注册是指设备将自己注册在虚拟总线上。总线数据发布用于将已注册的设备中存储的数据的元数据推送到虚拟总线上,也用于将待注册的设备中存储的数据推送到虚拟总线上。数据或元数据推送至虚拟总线后,可存储在虚拟总线的对象存储资源池中。总线消息处理用于设备完成与消息相关的功能,例如:以虚拟总线支持的形式向消息队列中写消息、或读取消息队列中的消息等(下文中进行说明)。
设备在写入注册信息时,可以按照一定格式写入虚拟总线。其中,注册格式可以是预先定义、预先配置或者协议规定的数据格式,或者,注册格式可以是由虚拟总线中的管理服务定义的数据格式,或者,注册格式可以是经过虚拟总线的管理服务和存储设备协商得到的数据格式。例如,注册格式是键值(key-value,KV)格式。再如,在注册信息为KV格式的情况下,设备以自己的标识(例如UUID)作为注册信息中的key的前缀(prefix)。示例性地,表2以注册信息为设备的配置信息为例,示出了一种可能的注册信息格式,其中,配置信息的key为配置信息的路径,value为key对应的值。设备的配置信息的key以接入点的定位地址和设备的UUID为前缀。如表1的key所示,{vbus_access_point}为设备连接接入点的路径,{device-uuid}为设备的UUID。
示例性地,以设备连接的接入点为表1所示的编号为1的接入点,设备的UUID为“cloud_0”为例,设备名称(name)的key为“https://virtualbus.s3.cn north 1.amazonaws.com.cn/cloud_0/name”,value为“shanghai-dorado88”(示例)。类似的,设备模板(model)的key为“https://virtualbus.s3.cn north 1.amazonaws.com.cn/cloud_0/model”,value为“OceanS tor Dorado”(示例)。设备在注册时,将接入点的资源定位地址和设备本身的标识作为前缀。
表2 KV格式的设备的配置信息
键(key) 值(value)
https://{vbus_access_point}/{device-uuid}/name shanghai-dorado88
https://{vbus_access_point}/{device-uuid}/model OceanStor Dorado
https://{vbus_access_point}/{device-uuid}/version 6.5.3
https://{vbus_access_point}/{device-uuid}/location Pudong,Shanghai
https://{vbus_access_point}/{device-uuid}/management_ip 10.2.2.88
…… ……
再如,表3以文件系统的元数据为例,示出了一种可能的KV格式的元数据的示意表。可以看出,注册信息的key以接入点、数据的类别(例如文件系统、快照等)、设备的标识为前缀。其中,“{vbus_access_point}”指示待注册存储设备写入的接入点,“filesystem”指示数据的类别是文件系统,“{fs-uuid}”为文件系统的唯一标识符。以“数据的名称”这一元数据为例,key为“https://{vbus_access_point}/filesystem/{fs-uuid}/name”,value为“finance”。其余元数据以此类推。
表3 KV格式的元数据
键(key) 值(value)
https://{vbus_access_point}/filesystem/{fs-uuid}/name finance
https://{vbus_access_point}/filesystem/{fs-uuid}/shares/nfs/name finance
https://{vbus_access_point}/filesystem/{fs-uuid}/shares/nfs/permission rw
https://{vbus_access_point}/filesystem/{fs-uuid}/shares/nfs/IP 10.2.2.9
…… ……
上述例子均是待注册设备按照设定的格式写入注册信息,本申请实施例还提供另一种实现方式,在这种实现方式中,设备将注册信息上传至虚拟总线后,通过其他服务(例如虚拟总线所提供的格式转换功能、或服务层提供的服务等)将注册信息转化为统一的格式写入虚拟总线中。转化后的格式与上面描述的预先设定的格式相同或类似,这里不再赘述。
虚拟总线中包含了存储空间,设备的注册信息可以在虚拟总线的存储空间中进行存储。例如,虚拟总线通过对象存储服务提供存储空间为例,对象存储服务包含多个对象桶。虚拟总线的接入点的路径即为对象桶的目录,设备的注册信息可以存储在对应的对象桶的目录下。
在虚拟总线以对象存储服务提供存储空间的情况下,虚拟总线所具有的存储空间可以称为对象存储池。对象存储服务通常包含多个对象桶,该对象桶即对象存储服务提供的存储空间。即:对象存储池可以包含一个或者多个对象存储服务提供的多个对象桶。
示例性地,图4示出了一种可能的对象存储池的示意图,对象存储池401包含m(m>0)个对象桶,接入点1的路径为对象桶1的目录,其中接入点可以为n(n>0)个。以注册的设备是cloud_0、注册的接入点是接入点1为例,cloud_0在注册时,其注册信息的以KV格式存储在对象桶中,且key以“接入点1的路径/cloud_0/”为前缀。如图4所示,设备cloud_0的名称对应的key为“接入点地址1/cloud_0/name”,值则写入该路径对应的存储空间中。类似的,通过接入点2接入的设备,其配置信息、元数据等也可以写入接入点2的路径下。上述对象存储服务可以包含公有云的对象存储服务、私有云的对象存储服务、或混合云的对象存储服务等,或者,对象存储服务可以是本地化部署的,例如,企业通过计算装置和存储装置,建立线下的对象存储服务。当虚拟总线基于公有云的对象存储服务提供存储空间时,基于公有云的全球可访问性,可以实现覆盖广泛区域的设备虚拟网络。
虚拟总线可以提供设备发现和共享功能,设备通过虚拟总线可以发现其他已注册的设备。 示例性地,设备通过扫描接入点,可以发现通过接入点连接到虚拟总线的设备。示例性地,请参见图5,图5是本申请实施例提供的一种可能的设备在虚拟总线上的视图的示意图。如图5所示,设备扫描接入点1从而可以获取从接入点1接入虚拟总线的设备的信息,示例性地,设备的信息可以包含前述的设备的注册信息。
为了提升系统安全性和稳定性,虚拟总线通过安全服务来提供安全防护。示例性地,安全服务可以包含以下三个方面中的一个或者多个方面的安全防护:
第一方面:权限的授予和收回。其中,权限可以包含查看权限、访问权限、通信权限(例如向指定设备发送消息的权限、或接收某一设备的消息的权限等)、数据推送权限等。
为了便于描述,在权限授予场景中,将请求授予权限的一方称为请求者。作为一种可能的实施方式,安全服务可以基于请求者的身份的认证是否通过、请求者是否购买相关服务等条件,来确定是否授予请求者权限。
为了便于理解,下面示例性地说明几个关于授予权限和收回权限的案例:
案例1:发现者访问存储设备和/或者服务需要先获取访问授权。例如,发现者在访问存储设备之前,可以向安全服务请求鉴权信息(例如:Access Key),发现者基于鉴权信息可以请求访问该存储设备。
案例2:发现者在不需要使用权限时,应该归还授权。例如,发现者在结束对存储设备的访问后,可以向安全服务归还申请的Access Key。
案例3:安全服务可以收回已授予的权限。例如,安全服务可以收回授予的鉴权信息,或者,安全服务使得颁发的Access Key无效。
第二方面:通过访问控制列表(access control lists,ACL)实现对虚拟总线中的数据或信息的访问控制。具体地,ACL是一种基于条件过滤的访问控制技术,可以根据设定的条件将设备对于某一数据的请求进行过滤,允许其通过或丢弃。
以图2为例,对虚拟总线中的存储设备1的注册信息建立ACL,只有满足某一条件的用户可以读取存储设备1的注册信息。当存储设备2请求读取存储设备1的注册信息时,ACL可以根据设定的条件,对存储设备2的读取请求进行检查。在存储设备2的读取请求满足设定的条件时,允许存储设备2读取存储设备1的注册信息。
作为一种可能的实施方案中,ACL还可以以单个对象为单位。例如,通过ACL对某一存储设备的注册信息中的IP地址进行访问控制。
第三方面:安全服务可以对虚拟总线中存储的KV格式的数据进行加密,从而进一步保证数据的安全。进一步的,加密的密钥可以通过密钥服务,来进行管理。该密钥服务可以是集成在安全服务中,也可以是独立的密钥服务。
本申请实施例中,虚拟总线可以接收设备的注册,已注册的设备可以基于虚拟总线供其他设备发现,虚拟总线可以看作中间者实现设备之间的间接连接,使得多方设备能够在不直接互联的情况下彼此发现和交流。
应理解,本申请部分示例,为了方便理解故以存储设备为例对设备注册、设备发现、设备之间的数据流动进行说明,并不旨在限定虚拟总线的使用对象,本申请中的注册、发现、数据流动等功能,对于服务、软件等同样适用。
本申请实施例中,连接到虚拟总线的存储设备、服务等可以是异构的,例如,存储设备的硬件型号、软件版本、厂商、所在位置、或文件系统等可以互不相同,服务的内容、形式、 或架构等可以互不相同。异构的设备通过虚拟总线所提供的布局规范,也可以实现互相连接,从而建立松耦合、分布式的数据协作关系。
例如,图6所示为本申请实施例提供的一种可能的数据处理系统的使用场景示意图。
其中,存储层中的存储设备可以在虚拟总线603进行设备注册,已注册的存储设备可以被其他设备发现。而存储层中的存储设备,既可以包含线上存储设备,例如云601a、云601b、或云601c等,也可以包含线下实体设备,例如存储设备602a、存储设备602b、存储设备602c、或存储设备602d等。
可选的,存储层中的线上存储设备的数量可以有一个或者多个。在线上存储设备的数量为多个的情况下,多个线上存储设备可以来自不同的提供商,例如:云601a可以为华为云,云601b可以为亚马逊云,云601c可以为阿里云。
类似的,存储层中的线下存储设备的数量可以有一个或多个。在线下存储设备的数量为多个的情况下,多个线下存储设备的所在地区、组织形式、供应商可以相同,也可以不同。例如,存储设备602a可以是海量存储服务OceanStor,存储设备602b可以是海量存储服务OceanStor Cube,存储设备602c可以是海量存储服务OceanStor Pacific,存储设备602d可以是海量存储服务OceanStor Dorado。
服务层可以基于虚拟总线603提供服务,例如服务层中的服务可以包含服务604a。关于服务层中的服务描述可以参见对图2中服务层的描述。
作为一种可能的实施方案,服务604a可以通过数据管理引擎(datamanagementengine,DME)实现。其中,DME可以用于呈现全局视图、设置设备状态、或管理设备见的数据流动等。进一步的,在服务604a为DME时,DME可以由管理员605进行管理,例如,DME604a响应管理员605的输入、操作等,执行相应的业务。应理解,此处的管理员605可以是人,也可以是通过管理设备、计算机程序等实现的管理服务。
如图6所示的场景,在混合多云的情况下,用户可能具有多套存储设备,其分布的位置可能在不同的区域,也可能分布在不同云中。而本申请实施例中,多种存储设备通过虚拟总线就可以实现互相连接,可以满足用户对于不同区域、不同架构的设备的连接需求。
以上对虚拟总线的架构进行介绍,以下对虚拟总线所提供的一些可能的功能进行说明。虚拟总线本质上是对存储服务和消息服务的虚拟化,其中,存储服务经过虚拟化处理后用于存储注册信息、数据等,消息服务用于实现设备之间的通信,从而形成一个囊括可用的存储资源和消息服务资源的虚拟网络。
作为一种可能的方案虚拟总线的功能可以通过以下4种示例性的设计来实现:
设计1,虚拟总线可以通过对象存储服务来实现数据存储。其中,对象存储服务的种类可以是一种,也可以是多种,甚至可以包含来自多个提供商的多种对象存储服务。一种可选方案中,对象存储服务可以包含亚马逊提供的AWSS3、华为提供的HuaweiOBS、或微软提供的Azure Blob等中的一项或者多项。
对象存储服务所提供的存储空间可以形成对象存储池。如图2所示,对象存储池可以用于存储接入点的信息、设备的注册信息、存储设备推送的数据、或数据的元数据等。
由于对象存储服务提供持久化的对象存储,因此存储池中存储的内容是持久化的,可以一直保持在对象存储池中。在设备可访问以及权限允许的情况下,虚拟总线的使用者可以随时通过虚拟总线查看对象存储池中存储的内容,例如发现其他设备等。
示例性地,请参见图7,图7是本申请实施例提供的一种可能的虚拟总线的设备目录的示意图。设备cloud_0经过注册后,其注册信息在虚拟总线中通过如图6所示的目录进行存储,而且图7所示的目录是持久化的。如图7所示的目录中,cloud_0为设备的UUID,proc目录下为设备的配置信息,例如可以包含能力(capabilities)、联系人(contacts)、网际协议版本4(internet protocol version 4,ipv4)的相关信息、网际协议版本6(internet protocol version 6,ipv6)的相关信息、位置(location)、型号(model)、端口(ports)、或版本(version)等信息。其中,capabilities中包含了用于描述设备的功能的数据,例如设备的通信能力、存储能力、支持何种文件系统等。contacts中包含了维护所述设备的用户、或者设备的信息,例如,该设备对应的设备管理员的联系方式、物理地址、IP地址等。图6所示的ipv4、ipv6用于存储通信协议的相关文件、密钥等。
在设备中包含存储池的情况下,注册信息中可能包含存储池的信息。如图6所示,pool0目录下为设备中的存储池的信息,例如可以包含用于得到存储池的磁盘(disks)的信息(即disk)、存储池的空闲空间(free)、磁盘切片的颗粒大小(grain_size)、或存储池的大小(size)等。
如图7所示的注册信息在资源存储池中是持久化存储的,虚拟总线的使用者在设备可访问和权限允许的情况下,可以随时通过路径来读取(或写入、编辑)该目录下的信息,或者写入相关信息。例如,设备的注册信息经过如图7所示的目录存储后,发现者可以通过路径访问设备的注册信息。示例性的,发现者访问contacts的路径可以为:https://{vbus_access_p oint}/cloud_0/proc/contacts,其余信息以此类推。
一些场景中,由于存储中池中存储的内容息是持久化的,在设备的配置产生变化时,可能会导致通过原本的注册信息无法访问设备的问题。因此,某些时候,存储池中存储的内容可能被更新,或者,通过附加额外信息以供发现者(这里的发现者是指除了被发现的目标设备之外的其他虚拟总线的使用者)判断存储的内容的时效性。
一种可能的设计中,通过周期性或者非周期性写入与设备相关的在线状态(例如时间戳、更新时间等信息),例如心跳协议,发现者可以根据在线状态来判断目标设备是否还处于活跃状态。
为了便于理解,以下列举三种可能的在线状态的方案:
方案1:在线状态为设备上次更新自身状态(或信息)的时间。应理解,此处的时间可以是时刻、时间戳、或者时间长度等。
图8所示为本申请实施例提供的一种可能的在线状态的示意图,从区域801看出,存储设备1上次更新的时间为2021年10月12日15时50分00秒。类似的,存储设备2上次更新的时间为“分钟前”存储设备2上次更新的时间为“1634025000”,其中“1634025000”为时间戳表示方法,可以对应2021年10月12日15时50分00秒。
在方案1中,发现者可以根据目标设备上次更新自身状态的时间,来确定目标设备是否为活跃状态。
例如,目标设备上次更新自身状态的时间在预设的第一时长(例如,第一时长可以为半小时、1小时、或1天等)内,则目标设备为活跃状态。以第一时长为1小时为例,存储设备2上次更新的时间为5分钟前,落入第一时长的范围内,则存储设备2为活跃状态。
再如,目标设备上次更新自身状态时间在预设的第二时长内,则目标设备为不活跃状态。以第二时长为24小时以上为例,存储设备1上次更新的时间为“2021.10.12 15:50:00”,若 当前时间为“2021.10.14 10:00:00”,则存储设备1为不活跃状态。
应理解,以上仅以两种活跃状态为例进行说明,具体实施或称可以设置更多或者更少的种类,此处不在一一例举。上述关于存储设备的在线状态设置,对于服务同样适用,如图8所示,服务层中服务也可以对应更新时间,用于发现者判断其是否处于活跃状态。
方案2:在线状态可以为指示设备是否在线、或指示设备的连接的情况的标识,例如以下标识:在线、离线、或连接中等。图9所示为本申请实施例提供的一种可能的在线状态的示意图,从区域901看出,存储设备1可以对应的在线状态为“在线”。类似的,存储设备2对应的在线状态为“离线”。
方案3:发现者可以通过向目标设备发送询问消息以及接收来自目标设备的应答消息,来判断目标设备是否还处于活跃状态。可选地,该询问消息可以通过消息机制实现。进一步可选的,该消息机制可以是预先定义、或预先配置的。
一种可能的方案中,对于不活跃的设备、或者不在线的设备,可以对应不同的管理策略或者通信策略。例如,若设备处于不活跃状态,则在数据视图中可以对其进行标记。再如,若设备处于不在线状态,通过管理服务可以提醒管理员恢复其在线状态。
一些场景中,发现者根据目标设备的在线状态来确定对目标设备的通信策略。以图9为例,若发现者想要通信的目标设备为存储设备1,而存储设备1的在线状态为“在线”,因此发现者可以与存储设备1进行通信;若发现者想要通信的目标设备为存储设备2,由于存储设备2的在线状态为“离线”,因此发现者此时无法与存储设备1进行通信;若发现者相同通信的目标设备为存储设备3,由于存储设备2的在线状态为“连接中”,此时发现者可以在某一时间间隔后重新查看目标状态的在线状态。
可选地,一些公有的后台服务也可以周期性扫描总线并清理老旧的、非活跃的注册信息。
设计2,虚拟总线可以提供元数据服务,用于定义元数据的格式、或者将设备的注册信息转换为某一数据格式。
作为一种可选的方案,虚拟总线的管理服务中包含元数据服务。设备在进行注册时,可以将原始的注册信息上传到元数据服务中,由元数据服务将设备的原始的注册信息转换为固定格式的注册信息,写入到接入点中。
可选的,元数据服务可以由虚拟总线的本身的存储资源和计算资源来实现,也可以通过独立的服务来实现。当元数据服务通过独立的服务来实现时,元数据服务属于服务层中的一项服务。
设计3,虚拟总线可以提供数据隧道,以实现多设备间非直连的数据流动能力。
以图2所示的系统为例,在存储设备1为发送端、存储设备2为接收端时,存储设备1可以将数据推送到指定位置,而存储设备2可以从该指定位置获取数据。
其中,指定位置可以是预先定义的位置、发送端和接收端协商的位置、或者接收端指示的位置等。作为一种可能的方案,虚拟总线的存储空间中可以包含预先划分的缓存空间,用于作为临时存储数据的位置,此时发送端可以将数据推送到缓存空间的某一位置中,接收端则从该缓存空间拉取数据。
设计4,虚拟总线提供消息服务,多个设备之间可以通过虚拟总线提供消息服务进行消息沟通。其中,消息服务可以由虚拟总线的存储资源和计算资源来实现,也可以通过独立的存储资源和计算资源来实现。当消息服务通过独立的存储资源和计算资源来实现时,消息服务可以属于服务层中的一项服务。
为了便于理解,以下列举两种可能的实现方式:
实现方式1,通过消息队列服务来提供消息服务。其中,消息队列服务包含了存储资源和计算资源,可以完成以下功能中的一项或者多项:存储消息队列、接收写入的消息、通知消息接收方获取消息等。一个消息队列用于存储一个或者多个消息。可选的,消息队列中的消息可以是设备主动写入的,也可以是进行操作时触发的。例如,虚拟总线中的存储设备、或服务等,可以主动写入消息到消息队列中。再如,虚拟总线的使用者对虚拟总线中的对象进行操作(例如添加、删除、编辑、或查找等操作)时,可以触发消息,该消息可以被记录到消息队列中。
消息队列服务可以用于实现多种类型的消息队列,例如可以包含控制消息队列、或通知消息队列、或广播消息队列等。
以图2所示的数据处理系统为例,在存储设备1为发送端、存储设备2为接收端时,存储设备1和存储设备2之间数据流动时,可以使用消息队列服务提供的一个或者多个消息队列,以下以使用第一消息队列和第二消息队列为例进行描述,第一消息队列用于存储向存储设备1发送的消息,消息队列2用于存储向存储2发送的消息。应理解,第一消息队列和第二消息队列可以是同一个消息队列,也可以是不同的消息队列。
在进行数据流动时,存储设备2向第一消息队列中写入请求消息,该请求消息指示请求存储设备1推送某数据。而消息队列服务可以通知存储设备1读取消息,相应的,存储设备1通过读取第一消息队列获取该请求消息,从而将数据推送到指定位置。在推送数据后,存储设备1可以向第二消息队列写入数据已准备消息。而消息队列服务可以通知存储设备2读取消息,相应的,存储设备2读取第二消息队列获取该数据已准备消息,从而在指定位置拉取存储设备1推送的数据。如此,通过消息队列服务和虚拟总线完成数据流动过程。
实现方式2,基于虚拟总线本身的存储空间实现消息队列,或者说,消息队列存储在虚拟总线的存储空间中。消息队列可以包含一个或者多个消息,消息队列中的消息可以是设备主动写入的,也可以是进行操作时触发的。可选的,消息接收方通过周期或者非周期的读取(或侦听)消息队列,获取消息。
作为一种可能的实施方式,消息队列以KV格式存储在虚拟总线的对象存储池中。进一步的,消息队列的key可以有固定的格式,该格式可以是预先定义、或预先配置的。例如,消息队列的key可以包含指定的key前缀。示例性地,以图2所示的数据处理系统为例,存储设备1的消息队列的key的前缀可以为:https://{vbus_access_point}/cloud_0/sys/massage/,其中,“{vbus_access_point}”为设备cloud0所连接的接入点的路径,“cloud_0”为存储设备1的UUID,“sys”为存储设备1的系统文件所在的目录,“massage”为sys目录下的消息队列所在的位置。应理解,此处的UUID、文件名称等仅为示例,不作为对消息队列存储位置的限定,例如,“massage”也可以替换为其他名称,例如“cli”等。
图10示出了一种可能的存储池中存储的内容的目录示意图,如区域901所示设备cloud0相关的消息队列存储的位置。如图10所示,“接入点1的路径/cloud_0/sys/maggese/request/dorado_0/1”,表示关于设备cloud0的请求(request)消息,且消息的发送方为UUID为“dorado_0”的设备,该请求消息的编号为1。再如,“接入点1的路径/cloud_0/sys/maggese/response/dorado_0/1”,表示设备cloud0所发出的应答(response)消息,该应答消息的接收方为UUID为“dorado_0”的设备,该应答消息为编号为1的请求消息的应答。
在实现方式2中,设备通过周期或者非周期的读取消息队列,可以获得需要处理的消息。 这种方式基于对象存储服务即可实现,无需建立消息队列服务,独立性好。
以图2所示的数据处理系统为例,在存储设备1(其UUID例如为“cloud_0”)为发送端、存储设备2(其UUID例如为“dorado_0”)为接收端时。存储设备2向存储设备1的请求消息队列写入请求消息(该请求消息的编号为1),该请求消息指示请求存储设备1推送某数据。以图9为例,存储设备2写入的请求消息的key为:“接入点1的路径/cloud_0/sys/maggese/request/dorado_0/1”。而存储设备1通过读取请求消息队列,即读取路径“接入点1的路径/cloud_0/sys/maggese/request”中存储的内容,可以获取该请求消息,从而将数据推送到指定位置。
在推送数据后,存储设备1可以向应答消息队列写入数据已准备消息。以图9为例,该数据已准备消息的key为:“接入点1的路径/cloud_0/sys/maggese/response/dorado_0/1”。而存储设备2可以从应答消息队列获取该数据已准备消息,从而在指定位置拉取存储设备1推送的数据。如此,通过虚拟总线完成数据流动过程。
以上两个实现方式仅以存储设备和存储设备之间的数据流动为例说明数据流动的过程,并不旨在限定数据只能在存储设备之间流动。具体实施过程中,虚拟总线还可以支持服务与之间、或存储设备与服务之间的数据流动,上述两种场景的数据流动方式也可以参考设备与设备之间的数据流动方式,此处不在一一赘述。
总之,经过虚拟总线的数据流动通过推拉的方式实现。发送端将数据推送到指定位置,然后接收端从指定位置拉取(下载)数据。而虚拟总线通过消息服务来控制数据流动的顺序。
请参见图11,图11是本申请实施例提供的一种可能的数据处理方法的流程示意图,图11所示的实施例中,数据流出的一方称为发送端,将数据流入的一方称为接收端。其中,发送端可以是相同类型的设备,也可以是不同的类型的设备。即,图11所示的数据流动过程具体可以是存储设备与存储设备之间的数据流动、也可以是存储设备与服务之间的流动,还可以是存储设备与服务之间的流动。
该方法可以应用于图2所示的数据处理系统。图11所示的数据处理方法包含步骤S1101至步骤S1105。
步骤S1101:接收端获取数据的发送端的消息队列的指示。
其中,接收端是数据的目的点,发送端是数据的来源。以图2为例,当存储设备1中的数据需要迁移到存储设备2中的情况下,存储设备1为发送端,存储设备2则为接收端。再如,用户存储在存储设备1中的数据需要发送到服务1中进行数据分析时,存储设备1为发送端,服务1则为接收端。
本申请实施例中,接收端和发送端的数量可以是一个,也可以是多个。例如,存储设备1可以将数据复制到存储设备2、存储设备3和存储设备4中,此时存储设备1为发送端,存储设备2、存储设备3和存储设备4均为接收端。
发送端的消息队列,用于记录关于发送端的消息。发送端的消息队列可以有一个,也可以有多个,在发送端拥有多个消息队列的情况下,多个消息队列可以属于不同类别的消息队列。
一种分类方式中,按照消息的发送方向的不同,发送端的消息队列可以包含请求消息队列、应答消息队列等。其中,请求消息队列用于存储其他设备向发送端发送的消息,应答消 息用于存储发送端向其他设备发送的消息。
又一种分类方式中,按照消息的接收方的数量的不同,消息队列可以包含点对点消息队列、组播消息队列、广播消息队列等等。其中,点对点消息队列中的消息的接收方数量通常为1个,组播消息队列中的消息的接收方属于一个通信组(组内可以有多个接收方,也可以有1个接收方,甚至通信组内可以没有接收方),广播消息队列中的消息接收方为一个范围的所有接收方。应理解,此处的范围可以覆盖整个虚拟总线,或者覆盖在虚拟总线的预先划分得到的部分设备。
又一种可能的分类方式中,按照消息所起的功能不同,消息队列可以包含控制消息队列、数据消息队列等。消息队列的指示用于标记某一消息队列。消息队列的指示可以包含消息队列的编号、消息队列的ID、消息队列的路径、或消息队列的URL。接收端根据消息队列的指示,可以找到发送端的消息队列。
示例性的,虚拟总线提供消息队列服务,消息队列服务中存储了多个设备分别对应的消息队列,并通过消息队列ID来区分不同的消息队列。接收端可以基于虚拟总线从消息队列服务中获取发送端的消息队列ID。
示例性的,虚拟总线的资源存储池中存储了设备对应的消息队列。以图10所示的资源存储池为例,发送端是UUID为“cloud_0”的设备,则发送端的消息队列的指示可以为:“接入点1的路径/cloud_0/sys/maggese/”。
一种可能的设计中,发送端的消息队列的指示可以包含于元数据中。接收端从虚拟总线扫描接入点,可以获取该接入点下的元数据,从而可以获取发送端的消息队列的指示。例如,如图10所示,接收端扫描接入点1,可以获取接入点下的元数据,从而可以获取到UUID为“cloud_0”的设备的消息队列的指示。
步骤S1102:接收端设备在消息队列中添加数据推送请求。
具体地,接收端根据消息队列的指示,可以在消息队列中添加新的消息。由于接收端需要获取发送端的数据,因此可以在消息队列中添加数据推送请求。
在发送端使用消息队列服务存储消息队列的情况下,接收端设备可以根据消息队列的指示,通过消息队列服务可以在发送端的消息队列中添加数据推送请求。
在发送端使用虚拟总线本身的存储空间来存储消息队列的情况下,发送端设备可以根据消息队列的指示,在消息队列对应的路径下添加数据推动请求。以图10所示的资源存储池为例,当发送端是UUID为“cloud_0”的设备时,接收端可以在“接入点1的路径/cloud_0/sys/maggese/request”这一路径下添加数据推送请求。例如,接收端是UUID为“dorado_0”的设备时,数据推送请求(例如编号为1)具体可以添加在“接入点1的路径/cloud_0/sys/maggese/request/dorado_0/1”。
可理解的,发送端在消息队列中写入数据推送请求,接收端可以通过获取该数据推送请求。
例如,在使用消息队列服务存储消息队列的情况下,消息队列服务可以通过门铃机制告知发送端存在消息待读取。相应的,发送端可以从消息队列中获取该数据推送请求。
再如,在使用虚拟总线的存储空间存储消息队列的情况下,发送端可以周期或者非周期的读取自身的消息队列,从而可以获取该数据推送请求。这种方式也可以称为轮询(polling)机制,即发送端周期或者非周期的轮流读取消息队列,从而发现未处理的消息。
作为一种可能的方案,接收端可以包含如图3所示的总线连接器301,接收端通过总线 连接器来实现消息相关的业务。例如,接收端通过总线连接器,以预先定义的格式、或者按照通信协议向消息队列中写入数据推送消息、读取消息队列中的消息等。
类似地,发送端也可以包含如图3所示的总线连接器301。例如,发送端通过总线连接器读取消息队列中的消息、向消息队列中写入数据已准备消息等。
步骤S1103:发送端响应于数据推送请求,将数据推送到指定位置。
该指定位置提供了存储空间,用于存储发送端所推送的数据。该指定位置可以通过路径、或URL等方式描述。
指定位置可以在虚拟总线的存储空间中。示例性地,指定位置可以位于虚拟总线的资源存储池。图12示出了一种可能的存储池中的目录的示意图,对象存储池1201包含k(k>0)个对象桶,其中,对象桶1、对象桶m等用于存储注册信息等,接入点k等用于缓存数据。以发送端是UUID为“cloud_0”的设备、接收端是UUID为“dorado_0”的设备为例,指定位置可以为以下路径:“对象桶编号m/temp_0/cloud_0/data_1”。此时,发送端可以将数据推送到上述路径对应的存储空间中进行存储。
当然,指定位置也可以是位于非虚拟总线中的其他存储位置。
在推动数据时,发送端先确定推送数据的指定位置,然后再向指定位置推送数据。据日的,发送端可以通过以下方式,确定要推送数据的指定位置:
方式1:可以是预先配置的、预先定义的存储位置。例如,在虚拟总线的对象存储池中预先划分存储空间,用于缓存发送端推送的数据。
方式2:由发送端从可使用的存储位置中选择一段存储空间作为缓存数据的指定位置。
方式3:发送端和接收端共同协商得到该指定位置。
方式4:由接收端从可使用的存储位置中选择一段存储空间作为缓存数据的指定位置。进一步的,在这种情况下,接收端可以将指定位置包含于数据推送请求中,从而使得发送端根据数据推送请求可以了解数据推送到何处。
步骤S1104:接收端获取数据已准备消息。
具体地,接收端可以通过消息队列获取数据已准备消息。此处的消息队列可以是发送端的应答消息队列、或者点对点消息队列、组播消息队列等。
示例性地,以图10为例,发送端将数据推送到指定位置后,可以在以下路径中写入数据已准备消息(消息编号为1):“接入点1的路径/cloud_0/sys/maggese/response/dorado_0”。接收端可以周期或者非周期的读取该路径中的消息队列,从而获取该数据已准备消息。
示例性地,消息队列服务中包含广播消息队列,接收端可以接受该广播消息队列的消息。发送端将数据推送到指定位置后,可以在广播消息队列中写入数据已准备消息。相应的,接收端通过读取广播消息队列,从而获取数据已准备消息。
示例性地,若该指定位置位于虚拟总线中,发送端将数据推送到指定位置时,自动触发数据已准备消息。在使用消息队列服务存储消息队列时,虚拟总线可以通过消息队列服务以门铃机制的形式,提醒接收端读取该数据已准备消息。相应的,接收端可以获取该数据已准备消息。
步骤S1105:接收端获取数据。
具体地,接收端可以从前述指定位置拉取(下载)数据(或数据片段)。
在一种可能的设计中,发送端可以以多个数据片段的形式推送数据。此时,发送端每上传一个数据片段,则写入一个数据已准备消息。接收端通过多个数据已准备消息分别接收多 个数据片段,合并得到完整数据。
在又一种可能的设计中,发送端还可以对指定位置中的数据进行按需拉取。按需拉取是指,接收端开始仅拉取元数据,或者拉取部分数据,或者,拉取元数据和部分数据。当需要使用未拉取的数据时,再通过虚拟总线拉取剩余的部分数据或者剩余的全部数据。
在图11所示的实施例中,虚拟总线可以看作是数据隧道,基于虚拟总线可以实现设备之间的数据流动。通过消息队列,可以控制数据流动的过程。在通过消息队列服务来控制消息队列的情况下,消息队列服务还可以主动提醒接收方处理消息,缩短消息等待处理的时间,提升数据流动效率。
在一种可能的设计中,发起数据推送请求的设备与接收数据的设备可以是不同的设备。示例性的,请参见图13,图13是本申请实施例提供的又一种可能的数据处理方法的流程示意图,包含步骤S1301至步骤S1305,详细描述可以参见步骤S1101至步骤S1105的相关描述。其中,请求端是发起数据推送请求的设备,发送端是数据流出的一方,接收端是数据流入的一方。
在图13所示的实例中,请求端可以获取发送端的消息队列的指示,向消息队列添加数据推送请求。相应的,发送端可以主动从消息队列读取数据推送请求,或者消息队列服务提醒发送端读取数据推送请求。发送端获取数据推送请求后,可以将数据推送到指定位置。
数据推送到指定位置后,接收端可以主动从消息队列读取数据已准备消息,或者消息队列服务提醒接收端读取数据已准备消息。接收端可以从指定位置获取数据。
为了便于理解,以下列举一种可能的数据流动过程。一种可能的设计中,企业A的线下的数据中心有文件系统快照,现企业A需要将文件系统中的数据推送到公有云的虚拟机(virtual machine,VM)中做数据分析。线下虚拟机和云端VM之间无网络连接,但都可以通过防火墙配置访问华为云对象存储服务。
步骤S01:数据中心在虚拟总线进行注册,注册信息中包含文件系统的快照的元数据。
示例性的,快照的元数据在虚拟总线上的访问地址如下:https://{vbus_access_point}/file/{fs_uuid}/snapshots/2021-09-07。其中,{vbus_access_point}为接入点的标识,{fs_uuid}为文件系统的唯一标识。
其中,快照的元数据中包含数据中心的消息队列的信息,例如消息队列的地址、密钥等。
步骤S02:VM获取总线访问通过虚拟总线获取元数据。
可选的,VM可以获取对数据中心的访问授权后,获取数据中心注册的快照的元数据。
步骤S03:VM从快照元数据获知数据中心的消息队列。
可选的,数据中心可以有一个或者多个相关的消息队列。
例如,数据中心可以包含控制消息队列和通知消息队列,其中,控制消息队列中包含向数据中心发送的消息;通知消息队列中包含由数据中心发送的消息。可选的,通知消息队列中,消息的接收方可以是一个(例如点对点发送),也可以是多个(例如组发、或者群发等)。
示例性地,一种可能的控制消息队列的信息(其中可以包含消息队列的元数据、消息队列的地址、或消息队列的密钥等信息)在虚拟总线上的地址可以如下:https://{vbus_access_point}/file/{fs_uuid}/snapshots/2021-09-07/device/message_queue/ctrl-endpoint。
示例性地,一种可能的通知消息队列的信息可以如下:https://{vbus_access_point}/file/{f s_uuid}/snapshots/2021-09-07/device/message_queue/broadcast-endpoint。
示例性地,一种可能的消息队列的秘密文件的信息可以如下:https://{vbus_access_point}/file/{fs_uuid}/snapshots/2021-09-07/device/message_queue/secrets。
步骤S04:VM向控制消息队列发送数据推送请求。
可选的,数据推送请求中可以包含位置指示,用于指定数据在总线的缓存位置。
例如,一种可能的位置指示可以如下:https://{vbus_access_point}/file/{fs_uuid}/snapshots/2021-09-07/relay/buffer。
步骤S05:数据中心接收数据推送请求,将推送数据到指定位置。
可选地,数据中心可以周期或者非周期性地读取消息队列,从而可以获取数据推送请求。
或者可选地,虚拟总线的消息队列服务可以以门铃形式提醒数据中心存在新消息待处理,相应的,数据中心可以获取该数据推送请求。
可选地,数据中心推送数据时,可以是推送全部数据,也可以是进行多次数据推送,每次推送一部分数据。以下以数据中心以多次推送到形式推送数据为例进行说明,对于一次推送全部数据的情况,本申请同样适用。
步骤S06:数据中心每发送完一部分数据,则发送通知到通知消息队列。
具体的,数据中心可以主动向通知消息队列中写入数据已准备消息。
步骤S07:VM通过通知消息队列,接收数据已准备消息,下载数据片段。
可选地,VM可以周期或者非周期性地读取消息队列,从而可以获取数据已准备消息。
或者可选地,虚拟总线的消息队列服务可以以门铃形式提醒VM存在新消息待处理,相应的,VM可以获取该数据已准备消息。
数据中心和VM可以多次执行步骤S06、步骤S07,以使得VM收到所需数据。
步骤S08:VM基于收到的数据进行数据分析。
如此,通过虚拟总线可以使得企业A的线下设备和线上设备之间的非直连的数据流动,无需额外建设数据隧道,可以减少维护、管理数据隧道的成本,提升数据流动的效率。
应理解,上述设备注册、设备发现、数据流动过程中,使用虚拟总线进行互联的设备都位于企业或者机构的安全边界内,这里的安全边界可以是防火墙等安全防护服务。而虚拟总线并不会向连接到自身的设备或服务发送数据,或者从设备或服务读取任何信息,因此,设备或服务的防火墙并不会影响虚拟总线的正常工作。虚拟总线中的对象存储和消息队列只用来存储数据。连接到虚拟总线的各个设备或服务,根据自身的业务需求从虚拟总线上读取数据或者向虚拟总线写入数据,从而实现以松耦合的方式进行互联及数据传输。
因此,使用虚拟总线进行互联,可以实现数据之间的互联,以非直连的形式进行数据流动,且不会破坏各个设备或服务的安全边界,因此可以保障安全边界内的设备和服务的数据安全性。
在一些可能的设计中,虚拟总线可以汇聚设备的元数据、或数据的元数据等,从而可以提供设备和数据的全局视图。通过设备的元数据、数据的元数据等信息,可以指示关于数据的多种有价值的内容,例如:如何对数据进行访问、数据在总线上存放的位置(例如,存储与哪一个对象bucket、存储与哪一个接入点、访问前缀prefix等)、目前被哪些设备共享访问、相关的消息队列的信息等。
一些可能的设计中,这些信息、或内容,可以被管理服务(或者管理虚拟总线的设备)收集,用于呈现以数据为中心的数据管理视图。例如,如图5所示的数据处理系统中,服务604a可以通过DME来查看虚拟总线中的数据、元数据,生成以数据为中心的数据管理管理视图。示例性地,该管理视图可以以图4所示的形式来呈现,管理员基于数据管理视图可以数据在总线上的存放位置、相关的消息队列等。
作为一种可能的方案,在以数据为中心的数据管理视图中,数据的冷热程度等信息可以被标记,从而为数据分享、数据安全提供便利。
进一步的,管理设备可以根据业务的需求,选择其中的子集,以建立联邦文件系统(数据面)。
可选地,文件系统可以是针对某一特定应用场景、某一使用需求建立的文件系统。
从生命周期的角度来看,数据的生命周期和设备是不一样的,大多数情况下数据的生命周期更长。例如,在设备存在故障或者设备退役时,数据需要搬迁到别的设备里。通过以数据为中心的管理视图,可以将设备作为存放数据的介质和提供数据访问的通道(Access Point),且“存放数据的介质”和“提供数据访问的通道”这两种功能也是可以分离的。
例如,通过以数据为中心的数据管理视图,可以实现以下3种设计:
设计1,文件系统与设备是独立的存在,不属于任何设备。
文件系统以文件系统的身份信息进行区分,文件系统的身份信息可以是文件系统的ID、编号等。文件系统的身份信息是全局唯一的,也称为文件系统的唯一标识。例如,可以使用UUID来唯一标识一个文件系统。
通常来说,文件系统创建时则定义文件系统的唯一标识,且不可改变。
设计2,文件系统有至少一个设备作为主复制点(copy),主copy可以变更。例如,以图2为例,在存储设备1(原本的主copy)退役时,可以将文件系统迁移至存储设备2中,此时设备2可以作为新的主copy。
设计3,文件系统有至少一个设备提供访问点(Access Point)。在联邦文件系统中,可以有多个设备提供多个访问点,以供相关设备访问文件系统。
应理解,提供访问点的设备不一定有完整的数据copy,数据可以按需从有文件系统数据copy的任意设备流(streaming)过来。文件系统的元数据指明当前哪些设备有数据的完整copy。对任意一个设备来说,文件系统在该设备的挂载点都可以统一为{device}/{uuid}。
文件系统的当前主copy所在设备称为主文件系统(Primary FS)。主文件系统可以为多个,比如双活文件系统),其他不包含完整数据的访问点称为影子文件系统(Shadow FS)。对任意设备来说,它有两种文件系统类型:主文件系统和影子文件系统。
影子文件系统与主文件系统存在对应关系。一般来说,对于一个影子文件系统,通过全局元数据则可以找到其对应的主文件系统。
在有虚拟总线的环境下,文件系统以UUID为标识注册到总线,其他影子文件系统可以通过查询总线记录得到主设备来完成业务流程。
联邦文件系统的各个节点(可以是集群)之间可以通过虚拟总线来共享和获取元数据,目标节点可以不关心源文件系统在哪里,由哪个设备提供,通过元数据可以获取需要的数据,来完成对文件系统访问。文件系统的数据布局提供了元数据在总线上的格式,使得全局文件系统的参与各方能够有序地对元数据进行更新完成多站点间的文件协作。
一种可能的场景中,用户的存储设备中的存储空间可能较大,此时将一个较大存储空间注册为一个对象,在访问、读取时效率都会受到影响,难以满足用户的需求。
示例性地,在企业的存储中心中可能具有多个块。这些块通过LUN来区分,一个LUN可以对应一个或者多个块。当多个块需要注册到虚拟总线时,将一个LUN映射为一个对象无法满足用户的需求。而本申请实施例提供一种灵活的映射方法,以实现块到对象的映射,以支持企业的各种业务的需求。
具体地,块是一个有限线性空间。块可以有多种配置类型,例如,精简配置(thin provision)的块、或全分配(full allocated)的块等。其中,Thin provision的块是一个稀疏的线性空间。而全分配(Full allocated)的块设备,可以通过去除全零的区间来形成事实上的稀疏线性空间。考虑到快照的逻辑,多个稀疏空间就有了相互的关联。
一种可能的方案中,可以将LUN的线性地址映射为一个对象。例如,比如将块中的每个4MB的空间映射为一个对象。对象的key包含前缀、LUN UID(unique ID)、和逻辑块编址(Logical Block Addressing,LBA)地址等,可选的,LBA地址也可以替换为右移N位的LBA地址,N为整数,例如:N可以为22。
为了支持快照,可以在LUN UID和地址之间加入快照ID,快照的数据就是快照0到当前快照数据的叠加。如此,通过给每一个快照创建一个映射表来加速数据的访问,这个映射表是一个单独的文件,方便一次性下载。为了方便这个映射表的创建,在上传每个快照的数据的时,可以在上传一个当前快照数据的位图表(bitmap),一个简单的bitmap文件,只包含被当前快照修改过的数据。
例如,请参见图14,图14是本申请实施例提供的一种可能的将块映射为对象的示意图。其中,每个对象文件的key为包含前缀、LUNIUD和LBA,LBA如1401所示,而value中为分段得到的空间,大小为4MB。
应理解,上述4MB仅为示例,实际过程中还可以有其他的分段大小。进一步的,还可以根据业务、输入输出(input/output,IO)大小等确定分段,以尽可能满足用户需求。
而在使用某一个分段大小时,可能出现各不满足各种输入输出(input/output,IO)大小大小业务的情况。以分段大小是4MB为例,可能会以下问题:用户修改了一个扇区却不得不上传整个段。为了解决上述问题,可以在每个段的头部添加一个映射表(映射表的大小在4MB内),这样每个分段实际的内容就小于4MB,这会让快照映射表变大。分段头部映射表也很容易扩展来支持压缩和重删。为了支持级联快照(可写快照),基础(base)数据放到LUN UID或快照根目录下面就好。一个快照数据的映射表需要回溯父目录直到LUN UID所在的层级。
又一种可能的方案中,在对象存储中建立包含多个小文件的LOG空间,其中,小文件由一个或者多个固定物理空间租成。如此形成的LOG空间可以是无线增长的。
小文件可以使用LOG分段地址作为Key。在支持压缩重删的情况下,小文件的实际大小小于分段大小。本质上可以看作LUN的固定大小线性空间(逻辑地址)到持久层线性空间(物理地址)的映射,将LUN映射为“索引+LOG”的形式。
通过LOG,可以保留写入的历史数据,支持IO级快照。而对于新写入的数据,累积到分段大小就可以上传。
由于LOG保留了写入的历史数据,可能会存在浪费存储空间和带宽的问题。而通过公有云计算资源可以对LOG空间进行垃圾回收,从来减缓存储空间和带宽的浪费。例如,通过预 先设置的策略,合并一段时长内的LOG,生成连续时间段的增量。示例性地,保留最后15分钟的所有IO,15分钟之后每分钟合并,半小时之后每5分钟合并,1小时之后每小时合并,1天之后每天合并等。
索引也需要映射到对象存储服务(例如S3)中,可以是通过检查点(checkpoint)进行映射,也可能是基于LOG的映射。对于可写快照的,可以建一个单独的LOG,LOG的起点不是0,而是形成快照时刻源LUN的最新物理地址。该地址之前从源LUN读取,之后从快照LUN读取。
应理解,具体使用时,不是所有的应用场景都需要这种级别的历史数据。在文件跨云实时协作方面,这种基于LOG的方案也许会带来更多的好处。
本申请实施例还提供一种计算节点,该计算节点包括处理器和存储器;存储器中存储有至少一条计算机指令;该指令由该处理器加载并执行,以实现前述虚拟总线所执行的方法操作。
本申请实施例还提供一种存储设备,该存储设备包括处理器和存储器;存储器中存储有至少一条计算机指令;该指令由该处理器加载并执行,以实现前述存储设备、或存储层所执行的方法操作。
本申请还提供了一种算机可读存储介质,所述计算机可读存储介质中存储有指令,当所述指令在至少一个处理器上运行时,实现前述的版本管理方法,例如图11所述的方法。
本申请还提供了一种计算机程序产品,该计算机程序产品包括计算机指令,在被计算设备执行时,实现前述的版本管理方法,例如图11所述的方法。
本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其他实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念。
本申请中实施例提到的“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a、b、或c中的至少一项(个),可以表示:a、b、c、(a和b)、(a和c)、(b和c)、或(a和b和c),其中a、b、c可以是单个,也可以是多个。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A、同时存在A和B、单独存在B这三种情况,其中A、B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。
以及,除非有相反的说明,本申请实施例使用“第一”、“第二”等序数词是用于对多个对象进行区分,不用于限定多个对象的顺序、时序、优先级或者重要程度。例如,第一消息队列和第二消息队列,只是为了便于描述,而并不是表示这第一消息队列和第二消息队列的结构、重要程度等的不同,在某些实施例中,第一消息队列和第二消息队列还可以是同样的设备。
上述实施例中所用,根据上下文,术语“当……时”可以被解释为意思是“如果……”或“在……后”或“响应于确定……”或“响应于检测到……”。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。

Claims (23)

  1. 一种数据处理系统,其特征在于,包括:存储层、虚拟总线和服务层;
    所述存储层包括多个存储设备,所述存储层用于提供存储空间;
    所述虚拟总线,用于接收所述存储设备的注册,形成一个包含多个存储设备的虚拟网络;
    所述服务层,用于基于所述虚拟网络为用户提供服务。
  2. 根据权利要求1所述的系统,其特征在于,所述多个存储设备包含存储阵列、存储服务器、公有云、私有云、或混合云中的一项或者多项。
  3. 根据权利要求1或2所述的系统,其特征在于,所述多个存储设备中的一个存储设备用于向所述虚拟总线提交注册信息。
  4. 根据权利要求3所述的系统,其特征在于,所述注册信息包含所述存储设备的设备信息。
  5. 根据权利要求3或4所述的系统,其特征在于,所述注册信息包含数据的元数据。
  6. 根据权利要求3-5任一项所述的系统,其特征在于,所述注册信息包含存储池的信息。
  7. 根据权利要求3-6任一项所述的系统,其特征在于,所述注册信息满足预设的数据格式,所述数据格式包含键值key-value格式。
  8. 根据权利要求3-7任一所述的系统,其特征在于,所述多个存储设备中的一个存储设备用于在获取授权之后向所述虚拟总线提交所述注册信息。
  9. 根据权利要求1-8任一项所述的系统,其特征在于,所述多个存储设备中的发送端和接收端之间不直接相连,所述发送端和所述接收端通过所述虚拟总线交互数据或者传递消息。
  10. 根据权利要求1-9任一项所述的系统,其特征在于,所述虚拟总线还包括消息队列,所述消息队列,所述消息队列用于存储关于任意一个所述存储设备的消息。
  11. 根据权利要求10所述的系统,其特征在于,所述多个存储设备包含发送端和接收端;
    所述消息队列用于记录来自所述接收端的数据推送请求,所述数据推送请求用于请求所述发送端推送目标数据;
    所述发送端用于向预设位置推送所述目标数据;
    所述消息队列还用于记录关于所述接收端的数据准备消息,所述数据准备消息用于通知所述接收端拉取所述目标数据。
  12. 一种数据处理方法,其特征在于,包括:
    提供存储层,所述存储层包括多个存储设备,所述存储层用于提供存储空间;
    提供所述虚拟总线,所述虚拟总线用于接收所述存储设备的注册,形成一个包含多个存储设备的虚拟网络;
    提供服务层,所述服务层用于基于所述虚拟网络为用户提供服务。
  13. 根据权利要求12所述的方法,其特征在于,所述多个存储设备包含存储阵列、存储服务器、公有云、私有云、或混合云中的一项或者多项。
  14. 根据权利要求12或13所述的方法,其特征在于,所述方法还包括:
    所述多个存储设备中的一个存储设备向所述虚拟总线提交注册信息。
  15. 根据权利要求14所述的方法,其特征在于,所述注册信息包含所述存储设备的设备信息。
  16. 根据权利要求14或15所述的方法,其特征在于,所述注册信息包含数据的元数据。
  17. 根据权利要求14-16任一项所述的方法,其特征在于,所述注册信息包含存储池的信 息。
  18. 根据权利要求12-17任一项所述的方法,其特征在于,所述注册信息满足预设的数据格式,所述数据格式包含键值key-value格式。
  19. 根据权利要求14-18任一所述的方法,其特征在于,所述方法还包括:
    多个存储设备中的一个存储设备在获取授权之后向所述虚拟总线提交所述注册信息。
  20. 根据权利要求12-19任一项所述的方法,其特征在于,所述多个存储设备中的发送端和接收端之间不直接相连,所述发送端和所述接收端通过所述虚拟总线交互数据或者传递消息。
  21. 根据权利要求12-20任一项所述的方法,其特征在于,所述虚拟总线还包括消息队列,所述消息队列,所述消息队列用于存储关于任意一个所述存储设备的消息。
  22. 根据权利要求21所述的方法,其特征在于,所述多个存储设备包含发送端和接收端,所述方法还包括:
    所述虚拟总线通过所述消息队列记录来自所述接收端的数据推送请求,所述数据推送请求用于请求所述发送端推送目标数据;
    所述发送端向预设位置推送所述目标数据;
    所述虚拟总线通过所述消息队列记录关于所述接收端的数据准备消息,所述数据准备消息用于通知所述接收端拉取所述目标数据。
  23. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当所述指令在至少一个处理器上运行时,实现如权利要求12-22中任一项所述的方法。
PCT/CN2022/110102 2021-09-17 2022-08-03 数据处理系统、数据处理方法及相关装置 Ceased WO2023040504A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22868881.8A EP4391463A4 (en) 2021-09-17 2022-08-03 DATA PROCESSING SYSTEM, DATA PROCESSING METHOD AND ASSOCIATED DEVICE
US18/606,527 US12568140B2 (en) 2021-09-17 2024-03-15 Using virtual bus to form virtual network

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN202111091182 2021-09-17
CN202111091182.4 2021-09-17
CN202111426817.1 2021-11-27
CN202111426817.1A CN115834276A (zh) 2021-09-17 2021-11-27 数据处理系统、数据处理方法及相关装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/606,527 Continuation US12568140B2 (en) 2021-09-17 2024-03-15 Using virtual bus to form virtual network

Publications (1)

Publication Number Publication Date
WO2023040504A1 true WO2023040504A1 (zh) 2023-03-23

Family

ID=85515492

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110102 Ceased WO2023040504A1 (zh) 2021-09-17 2022-08-03 数据处理系统、数据处理方法及相关装置

Country Status (4)

Country Link
US (1) US12568140B2 (zh)
EP (1) EP4391463A4 (zh)
CN (1) CN115834276A (zh)
WO (1) WO2023040504A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025016239A1 (zh) * 2023-07-17 2025-01-23 中兴通讯股份有限公司 多网元之间的通信方法和系统、消息总线系统、存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115834276A (zh) * 2021-09-17 2023-03-21 华为技术有限公司 数据处理系统、数据处理方法及相关装置
US12437034B2 (en) * 2023-05-18 2025-10-07 Microsoft Technology Licensing, Llc Multiple targeted intranet and employee engagement experiences within a tenant
CN118409716B (zh) * 2024-07-02 2024-09-10 成都山莓科技有限公司 一种基于服务器超融合的数据写入管理方法、设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753617A (zh) * 2009-12-11 2010-06-23 中兴通讯股份有限公司 一种云存储系统和方法
CN106598498A (zh) * 2016-12-13 2017-04-26 郑州信息科技职业学院 一种兼容整合异构存储设备的存储虚拟化系统和方法
CN106686140A (zh) * 2017-03-06 2017-05-17 郑州云海信息技术有限公司 一种网络虚拟化存储方法、设备和系统
US9930115B1 (en) * 2014-12-18 2018-03-27 EMC IP Holding Company LLC Virtual network storage function layer comprising one or more virtual network storage function instances
US20210289026A1 (en) * 2020-03-13 2021-09-16 Amazon Technologies, Inc. Multi-service storage layer for storing application-critical data

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101191544B1 (ko) * 2011-01-21 2012-10-15 엔에이치엔(주) 캐시 클라우드 구조를 이용한 캐시 시스템 및 캐싱 서비스 제공 방법
US9563480B2 (en) * 2012-08-21 2017-02-07 Rackspace Us, Inc. Multi-level cloud computing system
WO2014101218A1 (zh) * 2012-12-31 2014-07-03 华为技术有限公司 一种计算存储融合的集群系统
US9471360B2 (en) * 2014-03-14 2016-10-18 International Business Machines Corporation Returning terminated virtual machines to a pool of available virtual machines to be reused thereby optimizing cloud resource usage and workload deployment time
US20150381736A1 (en) * 2014-06-30 2015-12-31 Chris Timothy Seltzer Distributed cloud storage
BR112017001840A2 (pt) * 2014-08-04 2017-11-21 Huawei Tech Co Ltd método e aparelho para desenvolvimento de operação, administração e manutenção virtual, e sistema de rede virtualizada
US9531814B2 (en) * 2014-09-23 2016-12-27 Nuvem Networks, Inc. Virtual hosting device and service to provide software-defined networks in a cloud environment
CN105893139B (zh) * 2015-01-04 2020-09-04 伊姆西Ip控股有限责任公司 在云存储环境中用于向租户提供存储服务的方法和装置
US9609025B1 (en) * 2015-11-24 2017-03-28 International Business Machines Corporation Protection of sensitive data from unauthorized access
US10892942B2 (en) * 2016-01-22 2021-01-12 Equinix, Inc. Container-based cloud exchange disaster recovery
US11405423B2 (en) * 2016-03-11 2022-08-02 Netskope, Inc. Metadata-based data loss prevention (DLP) for cloud resources
US10007459B2 (en) * 2016-10-20 2018-06-26 Pure Storage, Inc. Performance tuning in a storage system that includes one or more storage devices
US10491698B2 (en) * 2016-12-08 2019-11-26 International Business Machines Corporation Dynamic distribution of persistent data
US10970309B2 (en) * 2019-06-05 2021-04-06 Advanced New Technologies Co., Ltd. Data storage method and apparatus
CN114270306A (zh) * 2019-08-27 2022-04-01 西门子股份公司 应用程序开发部署方法、装置和计算机可读介质
CN115834276A (zh) * 2021-09-17 2023-03-21 华为技术有限公司 数据处理系统、数据处理方法及相关装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753617A (zh) * 2009-12-11 2010-06-23 中兴通讯股份有限公司 一种云存储系统和方法
US9930115B1 (en) * 2014-12-18 2018-03-27 EMC IP Holding Company LLC Virtual network storage function layer comprising one or more virtual network storage function instances
CN106598498A (zh) * 2016-12-13 2017-04-26 郑州信息科技职业学院 一种兼容整合异构存储设备的存储虚拟化系统和方法
CN106686140A (zh) * 2017-03-06 2017-05-17 郑州云海信息技术有限公司 一种网络虚拟化存储方法、设备和系统
US20210289026A1 (en) * 2020-03-13 2021-09-16 Amazon Technologies, Inc. Multi-service storage layer for storing application-critical data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4391463A4

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2025016239A1 (zh) * 2023-07-17 2025-01-23 中兴通讯股份有限公司 多网元之间的通信方法和系统、消息总线系统、存储介质

Also Published As

Publication number Publication date
US20240223655A1 (en) 2024-07-04
EP4391463A4 (en) 2024-10-16
EP4391463A1 (en) 2024-06-26
US12568140B2 (en) 2026-03-03
CN115834276A (zh) 2023-03-21

Similar Documents

Publication Publication Date Title
CN101449559B (zh) 分布式存储器
US10782880B2 (en) Apparatus and method for providing storage for providing cloud services
WO2023040504A1 (zh) 数据处理系统、数据处理方法及相关装置
TWI221220B (en) Virtual one-dimensional method and device of multiple network storages
KR101366220B1 (ko) 분산형 저장소
CN112445570A (zh) 一种云平台资源迁移方法及其装置、存储介质
US10503693B1 (en) Method and system for parallel file operation in distributed data storage system with mixed types of storage media
CN109314721B (zh) 分布式文件系统的多个集群的管理
US8627446B1 (en) Federating data between groups of servers
CN111385325B (zh) 基于p2p的文件分发系统和方法
US9817832B1 (en) Unified framework for policy-based metadata-driven storage services
CN116615891B (zh) 发布-订阅系统上的密钥轮换
US11853616B2 (en) Identity-based access to volume objects
CN105740469A (zh) 存储服务器和元数据访问方法
WO2021047227A1 (zh) 跨区域共享服务的方法、装置、管理设备及存储介质
US20190121899A1 (en) Apparatus and method for managing integrated storage
WO2025025694A1 (zh) 权限校验方法、装置、设备及集群
US20050234961A1 (en) Systems and Methods for providing a proxy for a shared file system
CN114745397A (zh) 一种基于私有云的在线存储方法、系统
CN119854324B (zh) 一种基于ceph的网盘存储方法
TWI537750B (zh) 支援實體檔案系統之檔案管理的方法及應用該方法的檔案伺服器
JP2023509903A (ja) ハイブリッドクラウド非同期データ同期
EP1860846B1 (en) Method and devices for managing distributed storage
CN119201850A (zh) 数据访问方法、装置及系统
CN118784630A (zh) 一种数据访问方法及相关装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22868881

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022868881

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022868881

Country of ref document: EP

Effective date: 20240321

NENP Non-entry into the national phase

Ref country code: DE