WO2024140375A1 - Storage device, data communication method, and system - Google Patents
Storage device, data communication method, and system
- Publication number
- WO2024140375A1 (PCT/CN2023/140344)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- sent
- chip
- otn
- storage device
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04J—MULTIPLEX COMMUNICATION
- H04J3/00—Time-division multiplex systems
- H04J3/16—Time-division multiplex systems in which the time allocation to individual channels within a transmission cycle is variable, e.g. to accommodate varying complexity of signals, to vary number of channels transmitted
- H04J3/1605—Fixed allocated frame structures
- H04J3/1652—Optical Transport Network [OTN]
- H04J3/1664—Optical Transport Network [OTN] carrying hybrid payloads, e.g. different types of packets or carrying frames and packets in the payload
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04J—MULTIPLEX COMMUNICATION
- H04J3/00—Time-division multiplex systems
- H04J3/16—Time-division multiplex systems in which the time allocation to individual channels within a transmission cycle is variable, e.g. to accommodate varying complexity of signals, to vary number of channels transmitted
- H04J3/1605—Fixed allocated frame structures
- H04J3/1652—Optical Transport Network [OTN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/173—Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
- G06F15/17306—Intercommunication techniques
- G06F15/17331—Distributed shared memory [DSM], e.g. remote direct memory access [RDMA]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/0604—Improving or facilitating administration, e.g. storage management
- G06F3/0607—Improving or facilitating administration, e.g. storage management by facilitating the process of upgrading existing storage systems, e.g. for improving compatibility between host and storage device
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0655—Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
- G06F3/0661—Format or protocol conversion arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/067—Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L1/00—Arrangements for detecting or preventing errors in the information received
- H04L1/0001—Systems modifying transmission characteristics according to link quality, e.g. power backoff
- H04L1/0002—Systems modifying transmission characteristics according to link quality, e.g. power backoff by adapting the transmission rate
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q11/00—Selecting arrangements for multiplex systems
- H04Q11/0001—Selecting arrangements for multiplex systems using optical switching
- H04Q11/0005—Switch and router aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q11/00—Selecting arrangements for multiplex systems
- H04Q11/0001—Selecting arrangements for multiplex systems using optical switching
- H04Q11/0062—Network aspects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q11/00—Selecting arrangements for multiplex systems
- H04Q11/0001—Selecting arrangements for multiplex systems using optical switching
- H04Q11/0005—Switch and router aspects
- H04Q2011/0007—Construction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q11/00—Selecting arrangements for multiplex systems
- H04Q11/0001—Selecting arrangements for multiplex systems using optical switching
- H04Q11/0062—Network aspects
- H04Q2011/0073—Provisions for forwarding or routing, e.g. lookup tables
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04Q—SELECTING
- H04Q2213/00—Indexing scheme relating to selecting arrangements in general and for multiplex systems
- H04Q2213/1301—Optical transmission, optical switches
Definitions
- the present application relates to the field of optical communication technology, and in particular to a storage device, a data communication method and a system.
- in the Data Communication Network (DCN), the Optical Transport Network (OTN) has become the mainstream transport network technology because of its high bandwidth, large capacity, high reliability, and low latency. It is widely used in backbone, metropolitan, core, and aggregation networks. OTN frames carry various kinds of service data and provide rich management and monitoring functions.
- DCN Data Communication Network
- OTN Optical Transport Network
- the server is deployed with Remote Direct Memory Access (RDMA) applications.
- RDMA Remote Direct Memory Access
- the RDMA application calls the Ethernet card in the storage device to read the service data in the storage device, and the Ethernet card encapsulates the service data and sends it to the Ethernet switch.
- the Ethernet switch forwards the encapsulated data to the OTN device.
- the OTN device adds an OTN frame header to the encapsulated data to generate an OTN frame and sends the OTN frame to other devices in the OTN. It can be seen that the service data must be transmitted and encapsulated multiple times before the OTN frame is produced. When the amount of service data is large, the end-to-end communication efficiency between devices in the OTN is low.
- the present application provides a storage device, a data communication method and a system, which solve the problem that service data needs to be transmitted and encapsulated multiple times through different devices before an OTN frame is produced, thereby improving the end-to-end communication efficiency between devices in the OTN.
- the network card in the storage device completes the OTN encapsulation of the data, so that an OTN frame carrying the data can be generated without the data being transmitted and encapsulated through multiple devices, reducing the transmission delay of the data from the storage device to the optical transceiver and improving the data communication efficiency.
- the protocol stack used by the storage device to encapsulate the data into the OTN frame does not need to use the Ethernet (ETH) protocol, reducing the encapsulation process required to generate the OTN frame and the amount of data contained in the OTN frame, which is conducive to further improving the data communication efficiency.
- the optical transceiver is further used to receive a data write response from the target storage device.
- the data write response is used to indicate that the data to be sent has been written into the target storage device.
- the memory is further used to maintain at least one remote direct memory access (RDMA) sending queue, the at least one RDMA sending queue includes a first sending queue, the first sending queue stores work queue elements (WQEs) of one or more pieces of data, and the one or more pieces of data include the aforementioned data to be sent.
- the optical transceiver is also used to provide multiple OTN channels, wherein one OTN channel is used to transmit data corresponding to one RDMA sending queue.
- the aforementioned data processing chip is also used to: read the WQE of the data to be sent from the first sending queue, and establish a mapping relationship between the WQE and the first OTN channel among the multiple OTN channels. The mapping relationship is used to indicate that the data to be sent can be transmitted through the first OTN channel.
- the RDMA queue in the storage device can establish a mapping relationship with the OTN channel provided by the optical transceiver, so that data of different RDMA queues are transmitted via different OTN channels.
- the mapping relationship is established by the network card based on the WQE of the data recorded in the RDMA queue, which prevents the optical transceiver from sending the OTN frame corresponding to the data to an OTN channel that does not match the WQE of the data, thereby improving the accuracy of data communication.
- the network card can reuse the mapping relationship to transmit the OTN frame mapped with the other data through the first OTN channel, further improving the data communication efficiency in the optical communication network.
- the data processing chip includes: a first chip and a second chip.
- the first chip is used to read the WQE of the data to be sent from the first sending queue. And the first chip is also used to: write the data to be sent from the memory to the storage medium according to the source address indicated by the WQE.
- the second chip is used to: establish a mapping relationship between the WQE and the first OTN channel among the multiple OTN channels. And the second chip is also used to: read the data to be sent in the storage medium, and map the data to be sent to the payload area of the OTN frame generated by the second chip.
- the second chip can map data to the payload area of different OSU frames, so that the data to be sent can be communicated with finer time slot granularity.
- OSU technology considers the need for lossless adjustment from the beginning, and there is no compatibility problem with OTN communication, so that the communication process of the data to be sent can support a larger lossless bandwidth adjustment range, which is conducive to improving data communication efficiency.
- the lossless bandwidth adjustment here includes at least one of bandwidth increase, bandwidth reduction and bandwidth fallback. The bandwidth fallback is used to indicate the operation of restoring the original state after a problem occurs.
- the second chip is further used to determine whether the data flow rate of the first queue is greater than or equal to a set rate threshold.
- the data flow rate is the amount of data written to the first queue by the first chip per unit time. If the data flow rate is greater than or equal to the set rate threshold, the second chip is further used to instruct the first chip to lower the data flow rate of writing data to the first queue.
- when data arrives faster than the second chip is expected to process it, for example when the data flow rate of the first queue is greater than or equal to the set rate threshold, the second chip can instruct the first chip to lower the data flow rate of writing data to the first queue. This reduces the amount of data the second chip must process (for example, encapsulate) per unit time and so reduces its communication load, which helps avoid packet loss at the second chip and improves the communication performance of the optical communication network.
- an embodiment of the present application provides an optical communication system.
- the optical communication system includes: a storage device and an optical network device.
- the storage device includes: a processor, a memory and a network card, and the network card includes a data processing chip, a storage medium and an optical transceiver.
- the memory is used to store the data to be sent written by the processor.
- the data processing chip is used to write the data to be sent from the memory to the storage medium.
- the data processing chip is also used to read the data to be sent in the storage medium and map the data to be sent to the payload area of the OTN frame generated by the data processing chip.
- the optical transceiver is used to send the OTN frame to the aforementioned optical network device.
- the memory 1213 stores software programs, and the processor 1212 runs the software programs in the memory 1213 to manage the hard disk.
- the hard disk is abstracted into a storage resource pool, and the storage resource pool is provided to the server in the form of a logical unit number (LUN).
- LUN here is actually the hard disk seen on the server.
- some centralized storage systems are also file servers themselves, which can provide shared file services for the server.
- the engine may not have a hard disk slot, and the hard disk needs to be placed in a hard disk frame, and the back-end interface 1214 communicates with the hard disk frame.
- the back-end interface 1214 exists in the engine in the form of an adapter card, and two or more back-end interfaces 1214 can be used simultaneously on one engine to connect multiple hard disk frames.
- the adapter card can also be integrated on the motherboard, in which case the adapter card can communicate with the processor 1212 via a Peripheral Component Interconnect Express (PCIe) bus.
- PCIe Peripheral Component Interconnect Express
- FIG. 2A shows only one engine.
- the storage system may include two or more engines, and redundancy or load balancing may be performed between the multiple engines.
- the first chip 231 may be a DPU or other processors with data processing functions, etc.
- the first chip 231 is used to write data stored in the memory 22 into the cache 233.
- the cache 233 may be used to temporarily store data read by the first chip 231, or to temporarily store data received by the optical transceiver 24, etc.
- the cache 233 may refer to a cache.
- the cache 233 may also be replaced by other types of storage media, such as DRAM, SCM, mechanical hard disk or SSD, etc.
- FIG. 2A and FIG. 2B are only possible implementations of the two storage devices provided in the embodiment of the present application.
- the devices included in the two storage devices can be interchanged.
- the names used in different diagrams are different, but both can realize the functions of the storage devices provided in the embodiment of the present application to realize the optical signal transmission function in the optical communication system.
- the network card 23 can be used to realize the functions of the network card 1226, and the network card 1226 can also include the various chips and caches included in the network card 23, and the present application does not limit this.
- the data access request is used to request the data to be sent stored in the memory 22.
- the set threshold is 100 megabytes (MB), 500 MB or other values.
- when an RDMA application initiates a data transmission task and identifies it as a long-distance, large-volume data transmission task (for example, 10 GB of data transmitted over 1,000 kilometers), it notifies the data processing chip to start the long-distance data transmission.
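As a rough illustration of the threshold check described above, the following Python sketch picks a transmission path from the data volume alone. The function name `choose_path`, the two path labels, and the use of decimal megabytes are assumptions made for this example; the application only states that the set threshold may be 100 MB, 500 MB or another value.

```python
MB = 10**6  # assumption: decimal megabytes


def choose_path(data_volume_bytes: int, threshold_bytes: int = 100 * MB) -> str:
    """Return a hypothetical path label for a transfer of the given size."""
    if data_volume_bytes < 0:
        raise ValueError("data volume must be non-negative")
    # Data volume greater than or equal to the set threshold: the network
    # card encapsulates the data into OTN frames itself.
    if data_volume_bytes >= threshold_bytes:
        return "otn_direct"
    # Smaller transfers keep using the Ethernet switch path.
    return "ethernet_switch"
```

With the default 100 MB threshold, a 10 GB transfer would be labelled `otn_direct`, while a 10 MB transfer would stay on `ethernet_switch`.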
- the data processing chip writes the data to be sent from the memory 22 into the cache 233.
- the data processing chip here may include the first chip 231 and the second chip 232 shown in FIG. 2B.
- the first chip 231 may write the to-be-sent data from the memory 22 into the cache 233.
- the specific implementation process of the first chip writing the data from the memory into the cache may refer to the relevant description of FIG. 5 below, which will not be described in detail here.
- the data processing chip reads the data to be sent in the cache 233, and maps the data to be sent to the payload area of the OTN frame generated by the data processing chip.
- the second chip 232 maps the data to be sent to multiple OSU frames, and the data to be sent is carried in the payload areas of the multiple OSU frames; and the second chip 232 maps the multiple OSU frames to OTN frames.
- the second chip can map data to the payload area of different OSU frames, so that the data to be sent can be communicated with finer time slot granularity.
- the OSU technology considers the need for lossless adjustment from the beginning, and there is no compatibility issue with OTN communication, so that the communication process of the data to be sent can support a larger lossless bandwidth adjustment range, which is conducive to improving data communication efficiency.
- the lossless bandwidth adjustment here includes at least one of: bandwidth increase, bandwidth reduction and bandwidth rollback.
- the bandwidth rollback is used to indicate the operation of restoring the original state after a problem occurs.
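The three adjustment operations can be pictured with a small state model. The sketch below is only an illustration of "restore the original state after a problem occurs"; the class `ChannelBandwidth`, the `apply` callback, and the Mbit/s unit are hypothetical, and the sketch does not model the signalling that makes a real OSU adjustment lossless.

```python
class ChannelBandwidth:
    """Tracks the bandwidth of one channel (hypothetical model)."""

    def __init__(self, initial_mbps: int):
        self.current_mbps = initial_mbps

    def adjust(self, new_mbps: int, apply) -> bool:
        """Try to move to new_mbps; restore the previous value if apply fails.

        `apply` is an assumed callback that returns True when the new
        bandwidth was applied successfully along the path.
        """
        previous = self.current_mbps
        self.current_mbps = new_mbps  # increase or reduction
        try:
            ok = bool(apply(new_mbps))
        except Exception:
            ok = False
        if not ok:
            self.current_mbps = previous  # bandwidth rollback
        return ok
```

An increase and a reduction both go through `adjust`; a failed attempt leaves `current_mbps` at the value it had before the attempt.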
- S350: The optical transceiver 24 sends the OTN frame generated in S340.
- the network card in the storage device implements the OTN encapsulation function of the data, so that the data can generate an OTN frame mapped with the data without being transmitted and encapsulated through multiple devices, thereby reducing the transmission delay of the data from the storage to the optical transceiver and improving the data communication efficiency.
- the protocol stack used by the storage device to encapsulate the data into OTN frames does not need to use the Ethernet protocol, which reduces the encapsulation process required to generate OTN frames and the amount of data contained in the OTN frames, which is conducive to further improving data communication efficiency.
- the storage device can directly output the OTN frame on the end-side without forwarding it through the Ethernet switch, thus realizing the hard pipe transmission capability from end-side device to end-side device in the optical communication network, avoiding packet loss during data communication, and improving data communication efficiency.
- FCS frame check sequence
- FCS is the tail field of the protocol data unit (frame) of the computer network data link layer, which is a 4-byte cyclic redundancy check code.
- FCS is also called the frame tail.
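To make the role of the 4-byte frame tail concrete, here is a minimal Python sketch that appends and checks an FCS. It assumes the cyclic redundancy check is the CRC-32 computed by `zlib.crc32` and that the four bytes are appended least-significant byte first; both choices are assumptions for illustration, since the text above only says the FCS is a 4-byte CRC code.

```python
import struct
import zlib


def append_fcs(frame_body: bytes) -> bytes:
    """Return frame_body followed by a 4-byte CRC-32 tail (assumed format)."""
    crc = zlib.crc32(frame_body) & 0xFFFFFFFF
    return frame_body + struct.pack("<I", crc)


def fcs_ok(frame: bytes) -> bool:
    """Recompute the CRC over the body and compare it with the tail."""
    if len(frame) < 4:
        return False
    body, tail = frame[:-4], frame[-4:]
    (received,) = struct.unpack("<I", tail)
    return received == (zlib.crc32(body) & 0xFFFFFFFF)
```

A receiver would call `fcs_ok` on each frame and discard frames for which it returns `False`.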
- IB payload is used to carry message payloads, such as RDMA messages or data.
- IB BTH is the InfiniBand base transport header (IB BTH) provided by the IB protocol.
- the IB BTH field is used to indicate the destination QP, operation code, packet sequence numbers (PSN) and partition.
- the OpCode field in the BTH field determines the start and end of the SEND message.
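The fields named above can be illustrated by packing and parsing a 12-byte header. This is a sketch, not a reference implementation: the bit layout and the SEND opcode values (first, middle, last, only) follow the BTH as it is commonly documented for the reliable-connection service and should be checked against the InfiniBand specification; the function names and the returned dictionary are hypothetical.

```python
import struct

BTH_LEN = 12
# Assumed opcode values for SEND messages on a reliable connection.
SEND_POSITION = {0x00: "first", 0x01: "middle", 0x02: "last", 0x04: "only"}


def pack_bth(opcode: int, partition_key: int, dest_qp: int, psn: int) -> bytes:
    """Build a 12-byte BTH; flag bits and reserved fields are left at zero."""
    flags = 0
    return struct.pack(
        "!BBHII", opcode, flags, partition_key, dest_qp & 0xFFFFFF, psn & 0xFFFFFF
    )


def parse_bth(raw: bytes) -> dict:
    """Extract opcode, partition key, destination QP and PSN from a BTH."""
    if len(raw) < BTH_LEN:
        raise ValueError("BTH needs at least 12 bytes")
    opcode, _flags, pkey, qp_word, psn_word = struct.unpack("!BBHII", raw[:BTH_LEN])
    return {
        "opcode": opcode,
        "partition_key": pkey,
        "dest_qp": qp_word & 0xFFFFFF,   # low 24 bits of the second word
        "psn": psn_word & 0xFFFFFF,      # low 24 bits of the third word
        "send_position": SEND_POSITION.get(opcode),  # None for other opcodes
    }
```

A receiver could use `send_position` to tell where a packet sits within a multi-packet SEND message.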
- the user datagram protocol (UDP) field is used to indicate that the payload of the message is an RDMA message.
- IP internet protocol
- the internet protocol (IP) field is used for layer 3 forwarding through the switch.
- the ETH header field is used to indicate additional fields in the Ethernet transmission process, etc.
- the OTN header field is used to indicate the frame header for processing the optical signal in the optical transmission network process.
- as shown in the structural diagram of the protocol stack provided in the example, the protocol stack used by the OTN frame in the technical solution provided in this application includes: the RDMA application layer protocol (RDMA application protocol), the IB transmission protocol, the OSU link layer protocol, and the OSU physical layer protocol (OSU PHY layer).
- RDMA application protocol RDMA application layer protocol
- OSU PHY layer OSU physical layer protocol
- the storage device can instruct that small data transmission tasks between different DCs still take the switch transmission path, such as the path shown in Figure 1: storage device 121, the network card included in storage device 121, Ethernet switch 21, optical network device 31.
- large batches of data to be transmitted over long distances are sent directly through the OTN network card of the storage device, which is connected to the OTN transmission device.
- the storage device can directly output OTN frames on the end side without forwarding through the Ethernet switch, realizing the hard pipe transmission capability from end-side device to end-side device in the optical communication network, avoiding packet loss during data communication, and improving data communication efficiency.
- the optical transceiver 24 may also receive a data write response from the target storage device.
- the data write response is used to indicate that the data to be sent has been written into the target storage device.
- the target storage device may refer to the storage device 122 in FIG. 1.
- the data write response may be an OTN frame generated and sent by a network card included in the storage device 122.
- on receiving the data write response, the storage device determines that the current data transmission is complete, thereby avoiding the resource consumption caused by the storage device continuing to reserve hardware resources (such as computing resources or storage resources) for this data transmission, which helps the storage device use its limited hardware resources to execute other services.
- the memory 22 maintains one or more RDMA send queues (send queue, SQ), which may include SQ 1 to SQ N, etc., and each SQ stores a plurality of WQEs of data.
- WQE includes: source address of data, target address, memory address of data storage, destination storage device identifier, transmission completion time, data volume, etc.
- SQE includes: source address of data, target address, memory address of data storage, destination storage device identifier, transmission completion time, data volume, etc.
- RQ receive queue
- SQ 1 can also be called the first sending queue.
- SQ 1 stores WQEs of one or more data, and the one or more data include the aforementioned data to be sent.
- the optical transceiver 24 is also used to provide multiple OTN channels (such as channel 1 and channel 2 in FIG. 5), wherein one OTN channel is used to transmit data corresponding to one SQ.
- channel 1 is used to transmit data corresponding to WQE1 and WQE2 in SQ 1.
- Step 2: The DPU (first chip 231) reads the WQE of the data to be sent from SQ 1, and writes the data to be sent from the memory 22 to the cache 233 according to the source address indicated by the WQE.
- Step 3: The data processing chip establishes a mapping relationship between the WQE of the data to be sent and the first OTN channel among the multiple OTN channels.
- Step 4: The DPU writes the data to be sent from the memory 22 into the cache of the network card.
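The way a WQE drives the copy from memory into the network card cache can be sketched as follows. The `WQE` dataclass keeps only a few of the fields listed earlier, and the names `dpu_copy_to_cache`, `host_memory` and `cache_queue` are hypothetical; a real DPU would perform this copy with DMA rather than Python slicing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class WQE:
    source_address: int   # offset of the data in host memory
    data_volume: int      # number of bytes to send
    target_address: int   # address on the destination device
    dest_device_id: str   # identifier of the destination storage device


def dpu_copy_to_cache(send_queue: deque, host_memory: bytearray, cache_queue: deque):
    """Take one WQE from the send queue and copy its data into the cache queue."""
    if not send_queue:
        return None
    wqe = send_queue.popleft()
    start = wqe.source_address
    data = bytes(host_memory[start:start + wqe.data_volume])
    cache_queue.append((wqe, data))  # data now waits in the network card cache
    return wqe
```

For example, with `host_memory = bytearray(b"....hello-otn....")` and a WQE whose source address is 4 and data volume is 9, the cache queue receives the bytes `b"hello-otn"`.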
- the cache 233 in the network card maintains multiple queues, including the first queue (QM1), and the storage space corresponding to the first queue is used to store data to be sent.
- the OTN chip establishes a mapping relationship between QM1 and channel 1.
- the data corresponding to WQE1 and WQE2 is transmitted through channel 1 corresponding to QM1, and the data corresponding to WQEm is transmitted through channel 2 corresponding to QM2.
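A minimal sketch of the queue-to-channel mapping described above follows. The class `ChannelMap` and its first-free allocation policy are assumptions for illustration; the application only requires that one OTN channel carries the data of one queue and that an established mapping can be reused for later data of the same queue.

```python
class ChannelMap:
    """Binds each queue to one OTN channel (hypothetical allocation policy)."""

    def __init__(self, channels):
        self._free = list(channels)
        self._by_queue = {}

    def bind(self, queue_id: str) -> str:
        """Return the channel for queue_id, creating the mapping on first use."""
        if queue_id in self._by_queue:
            return self._by_queue[queue_id]  # reuse the existing mapping
        if not self._free:
            raise RuntimeError("no free OTN channel")
        channel = self._free.pop(0)
        self._by_queue[queue_id] = channel
        return channel
```

Binding `"QM1"` and then `"QM2"` against the channels `["channel 1", "channel 2"]` gives each queue its own channel, and binding `"QM1"` again returns the same channel as before.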
- different chips in the network card implement different functions: the DPU implements the interaction between the network card and the application layer, and the OTN chip implements the interaction between the network card and the optical communication network. Therefore, hard pipe transmission of data from the storage device to the optical communication network can be realized by coordinating between different chips in the network card, which is conducive to improving data communication efficiency.
- Step 5: The OTN chip reads the data to be sent in the cache, and maps the data to be sent to the payload area of the OTN frame generated by the OTN chip.
- the OTN chip includes multiple modules: a queue management (QM) unit, an optical line packet processing (OLPKT) module, a customer exchange (CXC) module, and an OTN Lite Line Node (OLLN) module.
- the QM unit implements multi-queue management, data back-pressure processing and other functions.
- the OLPKT module encodes the data to be sent into 256/257B format and cuts it into fixed-length OSU cells (OSU frames).
- the CXC module maps OSU frames into and out of channels.
- the OLLN module completes OTN maintenance signal insertion and analysis, etc.
- the OTN chip divides the data to be sent into multiple 192B data units, maps one data unit to the payload area of an OSU frame, and after multiple data units are mapped to the payload area of the OSU frame, multiple OSU frames are mapped to the OTN frame.
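The segmentation step can be sketched in a few lines of Python. The sketch reads "192B" as 192 bytes and zero-pads the last unit, and it represents an OSU frame as a plain dictionary with `channel`, `seq` and `payload` keys; the padding rule and the frame representation are assumptions, because the text above does not describe the OSU overhead or how a short final unit is handled.

```python
UNIT_BYTES = 192  # assumption: "192B" means 192 bytes


def split_into_units(data: bytes, unit_size: int = UNIT_BYTES) -> list:
    """Cut data into fixed-length units, zero-padding the last one (assumed)."""
    units = []
    for offset in range(0, len(data), unit_size):
        chunk = data[offset:offset + unit_size]
        if len(chunk) < unit_size:
            chunk = chunk + b"\x00" * (unit_size - len(chunk))
        units.append(chunk)
    return units


def map_to_osu_frames(data: bytes, channel_id: int) -> list:
    """Place one unit in the payload of each (hypothetical) OSU frame."""
    return [
        {"channel": channel_id, "seq": seq, "payload": unit}
        for seq, unit in enumerate(split_into_units(data))
    ]
```

For 400 bytes of input this yields three frames: two full units and a third carrying the remaining 16 bytes followed by padding.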
- the format of the OTN frame can be referred to in Table 2, which is not described here.
- the data communication method provided in the embodiment of the present application further includes: the OTN chip determines whether the data flow rate of QM1 is greater than or equal to a set rate threshold, and if the data flow rate is greater than or equal to the set rate threshold, the OTN chip instructs the DPU to lower the data flow rate of writing data to QM1.
- the data flow rate is the amount of data written by the DPU to QM1 per unit time.
- the above-set rate threshold can be set according to the hardware characteristics of the OTN chip and the DPU. In some optional situations, the set rate threshold can also be set by the user according to the data communication requirements between different DCs, which is not limited in this application.
- the rate threshold can be 5GB/s, 500MB/s or other values.
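The rate check can be illustrated with a short Python sketch. It computes the data flow rate as bytes written to QM1 divided by the measurement interval, compares it with the set rate threshold, and returns a lowered write limit when the threshold is reached. The function names, the decimal gigabyte, and the halving `backoff` factor are assumptions; the application does not say by how much the DPU lowers its rate.

```python
GB = 10**9  # assumption: decimal gigabytes


def flow_rate(bytes_written: int, interval_s: float) -> float:
    """Amount of data written to the queue per unit time, in bytes per second."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return bytes_written / interval_s


def adjust_write_limit(current_limit: float, bytes_written: int, interval_s: float,
                       threshold: float = 5 * GB, backoff: float = 0.5) -> float:
    """Return the write limit the DPU should use for the next interval."""
    if flow_rate(bytes_written, interval_s) >= threshold:
        return current_limit * backoff  # OTN chip instructs the DPU to slow down
    return current_limit
```

With a 5 GB/s threshold, 6 GB written in one second would halve the limit, while 4 GB written in one second would leave it unchanged.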
- Step 6: The optical transceiver 24 sends the OTN frame corresponding to the data to be sent to the target storage device through channel 1.
- the target storage device is also referred to as the peer storage device of the source storage device (referred to as peer).
- Step 7: The peer OTN chip in the target storage device parses the received OTN frame and writes the aforementioned data to be sent into the cache of the peer network card.
- the OLLN module parses the OTN maintenance signal to determine whether the OTN frame has an alarm. If there is no alarm, the CXC module demaps the data from the OTN channel, parses the RDMA data (the data to be sent), and writes it into the QM queue maintained in the cache of the peer network card.
- Step 9: After the peer DPU writes the data to be sent into the peer storage, the peer OTN chip sends a data write response to the network card of the source storage device through the optical transceiver.
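The receive side can be sketched in the same spirit. The example below treats a received frame as a dictionary with an `alarm` flag and a `payload`, and returns a dictionary standing in for the data write response; these representations and the function name `peer_receive` are hypothetical, and the sketch skips OSU demapping details.

```python
def peer_receive(otn_frame: dict, peer_cache_queue: list, peer_storage: bytearray):
    """Handle one received frame on the peer and return a write response or None."""
    # The maintenance-signal check comes first: frames with an alarm are not demapped.
    if otn_frame.get("alarm"):
        return None
    # The demapped RDMA data is queued in the peer network card cache.
    peer_cache_queue.append(otn_frame["payload"])
    # The peer DPU then moves the data from the cache queue into peer storage.
    data = peer_cache_queue.pop(0)
    peer_storage.extend(data)
    # Once the write is done, a response goes back to the source network card.
    return {"type": "data_write_response", "bytes_written": len(data)}
```

On the source side, receiving such a response is the signal that the transfer is complete and that resources held for it can be released.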
- the RDMA queues in the storage device can establish a mapping relationship with the OTN channels provided by the optical transceiver, so that data in different RDMA queues are transmitted via different OTN channels.
- the mapping relationship is established by the network card based on the WQE of the data recorded in the RDMA queue, which avoids the optical transceiver sending the OTN frame corresponding to the data to the OTN channel that does not match the WQE of the data, thereby improving the accuracy of data communication.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Computer Hardware Design (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Quality & Reliability (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
- Optical Communication System (AREA)
Claims (19)
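- A storage device, comprising: a processor, a memory, and a network card, wherein the network card comprises a data processing chip, a storage medium, and an optical transceiver; the memory is configured to store data to be sent that is written by the processor; the data processing chip is configured to write the data to be sent from the memory into the storage medium; the data processing chip is further configured to read the data to be sent in the storage medium and map the data to be sent to a payload area of an optical transport network (OTN) frame generated by the data processing chip; and the optical transceiver is configured to send the OTN frame.
- The storage device according to claim 1, wherein the memory is further configured to maintain at least one remote direct memory access (RDMA) sending queue, the at least one RDMA sending queue comprises a first sending queue, the first sending queue stores work queue elements (WQEs) of one or more pieces of data, and the one or more pieces of data comprise the data to be sent; the optical transceiver is further configured to provide a plurality of OTN channels, wherein one OTN channel is used to transmit data corresponding to one RDMA sending queue; and the data processing chip is further configured to: read the WQE of the data to be sent from the first sending queue, and establish a mapping relationship between the WQE and a first OTN channel in the plurality of OTN channels, wherein the mapping relationship indicates that the data to be sent can be transmitted through the first OTN channel.
- The storage device according to claim 2, wherein the data processing chip comprises a first chip and a second chip; the first chip is configured to read the WQE of the data to be sent from the first sending queue; the first chip is further configured to write the data to be sent from the memory into the storage medium according to a source address indicated by the WQE; the second chip is configured to establish the mapping relationship between the WQE and the first OTN channel in the plurality of OTN channels; and the second chip is further configured to read the data to be sent in the storage medium and map the data to be sent to the payload area of the OTN frame generated by the second chip.
- The storage device according to claim 3, wherein the storage medium maintains a plurality of queues; the first chip is specifically configured to write, according to the source address indicated by the WQE, the data to be sent from the memory into storage space corresponding to a first queue in the plurality of queues; and the second chip is specifically configured to establish the mapping relationship between the first queue and the first OTN channel.
- The storage device according to claim 4, wherein the second chip is further configured to determine whether a data flow rate of the first queue is greater than or equal to a set rate threshold, the data flow rate being the amount of data written by the first chip into the first queue per unit time; and if the data flow rate is greater than or equal to the set rate threshold, the second chip is further configured to instruct the first chip to lower the data flow rate of writing data to the first queue.
- The storage device according to any one of claims 3 to 5, wherein the second chip is specifically configured to: map the data to be sent to a plurality of optical service unit (OSU) frames, the data to be sent being carried in payload areas of the plurality of OSU frames; and map the plurality of OSU frames to the OTN frame.
- The storage device according to any one of claims 1 to 6, wherein the processor is configured to obtain a data access request, the data access request being used to request the data to be sent; the processor is further configured to determine that the data volume of the data to be sent is greater than or equal to a set threshold; and the data processing chip is specifically configured to: if the data volume of the data to be sent is greater than or equal to the set threshold, write the data to be sent from the memory into the storage medium.
- The storage device according to any one of claims 1 to 6, wherein the optical transceiver is further configured to receive a data write response from a target storage device, the data write response indicating that the data to be sent has been written into the target storage device.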
- 一种数据通信方法,其特征在于,所述方法由存储设备执行,所述存储设备包括:处理器、存储器和网卡,所述存储器用于存储所述处理器写入的待发送数据,所述网卡包括数据处理芯片、存储介质 和光收发器;所述方法包括:所述数据处理芯片将所述待发送数据从所述存储器写入所述存储介质;所述数据处理芯片读取所述存储介质中的待发送数据,并将所述待发送数据映射到所述数据处理芯片生成的光传送网络OTN帧的净荷区;所述光收发器发送所述OTN帧。
- 根据权利要求9所述的方法,其特征在于,所述存储器还用于维护至少一个远程直接存储访问RDMA发送队列,所述至少一个RDMA发送队列包括第一发送队列,所述第一发送队列中存储有一个或多个数据的工作队列元素WQE,所述一个或多个数据包括所述待发送数据;所述光收发器用于提供多个OTN通道,其中,一个OTN通道用于传输一个RDMA发送队列对应的数据;所述数据处理芯片读取所述存储介质中的待发送数据,包括:所述数据处理芯片从所述第一发送队列中读取所述待发送数据的WQE,并根据所述WQE指示的源地址,将所述待发送数据从所述存储器写入所述存储介质;在所述光收发器发送所述OTN帧之前,所述方法还包括:所述数据处理芯片为所述WQE和所述多个OTN通道中的第一OTN通道建立映射关系;其中,所述映射关系用于指示:所述待发送数据能够通过所述第一OTN通道进行传输。
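- The method according to claim 10, wherein the data processing chip comprises a first chip and a second chip; the reading, by the data processing chip, of the WQE of the to-be-sent data from the first send queue and the writing, according to the source address indicated by the WQE, of the to-be-sent data from the memory into the storage medium comprise: reading, by the first chip, the WQE of the to-be-sent data from the first send queue, and writing, according to the source address indicated by the WQE, the to-be-sent data from the memory into the storage medium; the establishing, by the data processing chip, of the mapping relationship between the WQE and the first OTN channel of the plurality of OTN channels comprises: establishing, by the second chip, the mapping relationship between the WQE and the first OTN channel of the plurality of OTN channels; and the mapping, by the data processing chip, of the to-be-sent data to the payload area of the OTN frame generated by the data processing chip comprises: reading, by the second chip, the to-be-sent data from the storage medium, and mapping the to-be-sent data to a payload area of an OTN frame generated by the second chip.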
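- The method according to claim 11, wherein the storage medium maintains a plurality of queues, the plurality of queues comprise a first queue, and a storage space corresponding to the first queue is used to store the to-be-sent data; and the establishing, by the second chip, of the mapping relationship between the WQE and the first OTN channel of the plurality of OTN channels comprises: establishing, by the second chip, the mapping relationship between the first queue and the first OTN channel.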
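- The method according to claim 12, wherein the method further comprises: determining, by the second chip, whether a data flow rate of the first queue is greater than or equal to a set rate threshold, the data flow rate being the amount of data written by the first chip into the first queue per unit time; and if the data flow rate is greater than or equal to the set rate threshold, instructing, by the second chip, the first chip to lower the data flow rate at which it writes data into the first queue.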
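- The method according to any one of claims 11 to 12, wherein the mapping, by the data processing chip, of the to-be-sent data to the payload area of the OTN frame generated by the data processing chip comprises: mapping, by the second chip, the to-be-sent data to a plurality of optical service unit (OSU) frames, the to-be-sent data being carried in payload areas of the plurality of OSU frames; and mapping, by the second chip, the plurality of OSU frames to an OTN frame.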
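- The method according to any one of claims 9 to 14, wherein before the data processing chip writes the to-be-sent data from the memory into the storage medium, the method further comprises: obtaining, by the processor, a data access request, the data access request being used to request the to-be-sent data; determining, by the processor, that the data volume of the to-be-sent data is greater than or equal to a set threshold; and if the data volume of the to-be-sent data is greater than or equal to the set threshold, writing, by the data processing chip, the to-be-sent data from the memory into the storage medium.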
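- The method according to any one of claims 9 to 15, wherein the method further comprises: receiving, by the optical transceiver, a data write response from a target storage device, the data write response indicating that the to-be-sent data has been written to the target storage device.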
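- An optical communication system, comprising a storage device and an optical network device, wherein the storage device comprises a processor, a memory, and a network interface card (NIC), and the NIC comprises a data processing chip, a storage medium, and an optical transceiver; the memory is configured to store to-be-sent data written by the processor; the data processing chip is configured to write the to-be-sent data from the memory into the storage medium; the data processing chip is further configured to read the to-be-sent data from the storage medium and map the to-be-sent data to a payload area of an OTN frame generated by the data processing chip; and the optical transceiver is configured to send the OTN frame to the optical network device.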
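- A computer-readable storage medium, comprising computer software instructions, wherein when the computer software instructions are run in a storage device, the storage device performs the method according to any one of claims 9 to 16.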
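- A computer program product, wherein when the computer program product is run in a storage device, the storage device performs the method according to any one of claims 9 to 16.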
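The send path recited in the claims above (a WQE points at data in host memory, the first chip stages it into a NIC queue, the second chip binds that queue to an OTN channel, checks the write rate, and maps the data into OSU payloads carried by an OTN frame) can be sketched roughly as follows. This is only an illustrative model, not the applicant's implementation: the class and method names (`WQE`, `NicQueue`, `Nic`, `bind`, `stage`, `frame`), the dictionary used as a stand-in for an OTN frame, and the 8-byte `OSU_PAYLOAD_BYTES` value are all assumptions made for the example and do not reflect real OSU or OTN frame formats.

```python
from collections import deque
from dataclasses import dataclass, field

# Hypothetical payload size chosen for readability; not the real OSU payload size.
OSU_PAYLOAD_BYTES = 8


@dataclass
class WQE:
    """Work queue element: where the to-be-sent data lives in host memory."""
    source_addr: int
    length: int


@dataclass
class NicQueue:
    """One queue in the NIC storage medium, mapped to exactly one OTN channel."""
    channel_id: int
    buf: deque = field(default_factory=deque)
    bytes_in_window: int = 0  # data written during the current unit of time


class Nic:
    def __init__(self, host_memory: bytes, rate_threshold: int):
        self.host_memory = host_memory
        self.rate_threshold = rate_threshold  # bytes per unit time (assumed unit)
        self.queues = {}        # RDMA send queue id -> NicQueue
        self.throttled = set()  # send queues the first chip was told to slow down

    def bind(self, sq_id: int, channel_id: int) -> None:
        """Second chip: map one RDMA send queue (via its NIC queue) to one OTN channel."""
        self.queues[sq_id] = NicQueue(channel_id)

    def stage(self, sq_id: int, wqe: WQE) -> None:
        """First chip: copy data from host memory into the NIC queue using the WQE."""
        q = self.queues[sq_id]
        data = self.host_memory[wqe.source_addr:wqe.source_addr + wqe.length]
        q.buf.append(data)
        q.bytes_in_window += len(data)
        # Second chip: compare the write rate against the set threshold.
        if q.bytes_in_window >= self.rate_threshold:
            self.throttled.add(sq_id)

    def tick(self) -> None:
        """Start a new unit of time: reset the rate counters and throttle flags."""
        for q in self.queues.values():
            q.bytes_in_window = 0
        self.throttled.clear()

    def frame(self, sq_id: int) -> dict:
        """Second chip: read staged data, split it into OSU payloads, wrap them in a frame."""
        q = self.queues[sq_id]
        data = b"".join(q.buf)
        q.buf.clear()
        osu_payloads = [
            data[i:i + OSU_PAYLOAD_BYTES]
            for i in range(0, len(data), OSU_PAYLOAD_BYTES)
        ]
        return {"channel": q.channel_id, "osu_payloads": osu_payloads}
```

In this sketch a caller would `bind` a send queue to a channel once, call `stage` for each WQE, and call `frame` when the optical transceiver is ready to send; a send queue that appears in `throttled` corresponds to the case where the second chip instructs the first chip to lower its write rate.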
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23910304.7A EP4625853A4 (en) | 2022-12-27 | 2023-12-20 | STORAGE DEVICE, AND METHOD AND SYSTEM FOR DATA COMMUNICATION |
| US19/245,593 US20260106684A1 (en) | 2022-12-27 | 2025-06-23 | Storage device, data communication method, and system |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211691501.X | 2022-12-27 | ||
| CN202211691501.XA CN118264353A (zh) | 2022-12-27 | 2022-12-27 | Storage device, data communication method, and system |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/245,593 Continuation US20260106684A1 (en) | 2022-12-27 | 2025-06-23 | Storage device, data communication method, and system |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024140375A1 (zh) | 2024-07-04 |
Family
ID=91606340
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/140344 Ceased WO2024140375A1 (zh) | 2022-12-27 | 2023-12-20 | Storage device, data communication method, and system |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20260106684A1 (zh) |
| EP (1) | EP4625853A4 (zh) |
| CN (1) | CN118264353A (zh) |
| WO (1) | WO2024140375A1 (zh) |
Cited By (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
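| CN119835239A (zh) * | 2024-12-31 | 2025-04-15 | Wuxi Zhongxing Microsystem Technology Co., Ltd. | RDMA-based system and method for performance optimization in single-QP scenarios |
| WO2026066110A1 (zh) * | 2024-09-26 | 2026-04-02 | Huawei Technologies Co., Ltd. | Storage node, storage array, and data access method |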
Citations (6)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20030123493A1 (en) * | 2001-12-19 | 2003-07-03 | Nec Corporation | Network, switching apparatus and OTN frame processing method for use therein; its circuit and integrated circuit |
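| CN101155006A (zh) * | 2006-09-30 | 2008-04-02 | Huawei Technologies Co., Ltd. | Method and apparatus for transmitting fixed-rate services |
| CN109491809A (zh) * | 2018-11-12 | 2019-03-19 | Xi'an Microelectronics Technology Institute | Communication method for reducing high-speed bus latency |
| US20200026656A1 (en) * | 2018-07-20 | 2020-01-23 | International Business Machines Corporation | Efficient silent data transmission between computer servers |
| CN113900972A (zh) * | 2020-07-06 | 2022-01-07 | Huawei Technologies Co., Ltd. | Data transmission method, chip, and device |
| CN115499084A (zh) * | 2022-09-19 | 2022-12-20 | China Telecom Corp., Ltd. | Ethernet service transmission method and apparatus, electronic device, and storage medium |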
Family Cites Families (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
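| CN111356037B (zh) * | 2018-12-21 | 2021-08-20 | Shenzhen HiSilicon Semiconductor Co., Ltd. | Method and apparatus for switching optical transport network line bandwidth |
| CN113395613B (zh) * | 2020-03-11 | 2022-08-19 | Huawei Technologies Co., Ltd. | Service bearing method, apparatus, and system |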
- 2022-12-27: CN, application CN202211691501.XA, publication CN118264353A (zh), active (Pending)
- 2023-12-20: EP, application EP23910304.7A, publication EP4625853A4 (en), active (Pending)
- 2023-12-20: WO, application PCT/CN2023/140344, publication WO2024140375A1 (zh), not active (Ceased)
- 2025-06-23: US, application US19/245,593, publication US20260106684A1 (en), active (Pending)
Also Published As
| Publication number | Publication date |
|---|---|
| EP4625853A4 (en) | 2026-03-18 |
| EP4625853A1 (en) | 2025-10-01 |
| US20260106684A1 (en) | 2026-04-16 |
| CN118264353A (zh) | 2024-06-28 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
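| US20240388545A1 (en) | | Packet processing method, gateway device, and storage system |
| US7917682B2 (en) | | Multi-protocol controller that supports PCIe, SAS and enhanced Ethernet |
| US20260106684A1 (en) | | Storage device, data communication method, and system |
| US8725879B2 (en) | | Network interface device |
| TW202016744A (zh) | | Host, NVMe solid-state drive, and storage service method |
| CN115701063B (zh) | | Packet transmission method and communication apparatus |
| US20130138758A1 (en) | | Efficient data transfer between servers and remote peripherals |
| CN115858146A (zh) | | Memory expansion system and computing node |
| CN117221225A (zh) | | Network congestion notification method, apparatus, and storage medium |
| CN116250220A (zh) | | Multi-chip package structure and switch |
| CN102843435A (zh) | | Method and system for accessing and responding to storage media in a cluster system |
| CN113783808A (zh) | | Data forwarding method and apparatus with adaptive switching of forwarding mode |
| WO2022042396A1 (zh) | | Data transmission method and system, and chip |
| WO2015055008A1 (zh) | | Storage control chip and disk packet transmission method |
| US7580410B2 (en) | | Extensible protocol processing system |
| US12093571B1 (en) | | Accelerating request/response protocols |
| CN118590566A (zh) | | Packet forwarding apparatus and method applied to a DPU, DPU device, and storage medium |
| CN118555034A (zh) | | Data transmission method and computer-readable storage medium |
| CN1731730A (zh) | | Core storage switching platform system for mass storage systems |
| CN120881177B (zh) | | Communication method, apparatus, and storage medium |
| CN115529275B (zh) | | Packet processing system and method |
| CN118101605B (zh) | | High-performance data exchange method and system for astronomical radio telescope terminals |
| CN118283135B (zh) | | TOE component and method for processing data thereof |
| CN120803988B (zh) | | RoCEv2-based high-bandwidth, low-latency data processing method |
| CN118283134B (zh) | | TOE component and method for processing data thereof |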
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 23910304; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2023910304; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023910304; Country of ref document: EP; Effective date: 20250624 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | Wipo information: published in national office | Ref document number: 2023910304; Country of ref document: EP |