WO2024254894A1 - Système de traitement de données de graphe à grande échelle basé sur une unité unique - Google Patents
Système de traitement de données de graphe à grande échelle basé sur une unité unique Download PDFInfo
- Publication number
- WO2024254894A1 WO2024254894A1 PCT/CN2023/101407 CN2023101407W WO2024254894A1 WO 2024254894 A1 WO2024254894 A1 WO 2024254894A1 CN 2023101407 W CN2023101407 W CN 2023101407W WO 2024254894 A1 WO2024254894 A1 WO 2024254894A1
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- module
- sub
- graph
- subgraph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/24569—Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/901—Indexing; Data structures therefor; Storage structures
- G06F16/9024—Graphs; Linked lists
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/90335—Query processing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/544—Buffers; Shared memory; Pipes
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
- G06F9/546—Message passing systems or structures, e.g. queues
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Definitions
- the present application relates to the field of data processing technology, and in particular to a large-scale graph data processing system based on a single machine.
- graph data has become a highly valued topic in the field of data science and engineering, as it is easy to abstract entities and relationships in the real world. It has been widely used in many fields such as social network analysis, recommendation systems, financial fraud detection, and drug discovery. At the same time, graph data has high flexibility, and many problems that were originally modeled using matrices, relationships, or other data structures can also be converted to graph data processing, further highlighting the importance of graph data.
- the scale of abstract graph data generated or collected by computer systems is growing rapidly. This growth in magnitude has posed an extremely sharp challenge to the large-scale data storage, analysis, and mining capabilities of modern computer systems.
- a large-scale graph data processing system based on a single machine to solve the above problem includes:
- a large-scale graph data processing system based on a single machine, comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk;
- the disk stores large-scale graph data consisting of a plurality of subgraphs;
- the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;
- the data loading module is used for acquiring the sub-graph in the active state from the disk, and transmitting the sub-graph to the data calculation module;
- the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module;
- the data calculation module is further used to transmit the sub-graph to the data release module;
- the data release module is used to write the sub-graph into the disk
- the storage management module When the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to converged.
- the data calculation module is further used to write the sub-graph into the disk.
- the data releasing module is further used to transmit the sub-graph to the data loading module.
- the storage management module is further used to set the state of the subgraph that has received the message to active.
- the data calculation module is also used to aggregate all the sub-graphs to obtain updated large-scale graph data.
- the storage management module includes a message storage unit and a state management unit; the state management unit stores the state information;
- the data calculation module is used to transmit the updated message to the message storage unit
- the state management unit is used to update the subgraph received the message Set the status to Active.
- the data calculation module includes an aggregation calculation unit
- the aggregation calculation unit is used to aggregate all the subgraphs to obtain updated large-scale graph data.
- the storage management module is further used to set the state of the subgraph to waiting for calculation.
- the storage management module is further used to set the state of the subgraph to being calculated.
- the storage management module is further used to set the state of the sub-graph to being released.
- the present application provides a solution for applying a sub-graph-centric computing model to a single machine system and establishing a pipeline processing architecture, specifically: "A large-scale graph data processing system based on a single machine, comprising a data loading module, a data computing module, a data release module, a storage management module and a disk; the disk stores large-scale graph data consisting of a number of sub-graphs; the storage management module stores status information corresponding to each of the sub-graphs; in an initial state, the sub-graph is active; The data loading module is used to obtain the subgraph with active status from the disk and transmit the subgraph to the data calculation module; the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module; when the updated subgraph has changes, the data calculation module is also used to transmit the subgraph to the data release module; when the subgraph is
- the architecture can overlap data I/O and CPU operations, thereby reducing the I/O cost of the traditional vertex-centric computing model while improving CPU utilization and promoting sequential access to the disk.
- the architecture uses shared memory data structures for message passing and efficient synchronization, which can separate computing from memory management and scheduling. out, thus providing new opportunities for optimization.
- FIG1 is a schematic diagram of the connected component calculation process on the vertex center model and the subgraph center model
- FIG2 is a schematic diagram of a processing architecture of a large-scale graph data processing system provided by an embodiment of the present application
- FIG3 is a schematic diagram of a state management and optimization strategy for a large-scale graph data processing system provided in an embodiment of the present application.
- the vertex-centric computing model will undoubtedly bring additional communication overhead or I/O overhead due to the transmission of messages between vertices.
- the subgraph-centric computing model (which allows information in the calculation process to be freely transmitted within the subgraph) has significantly fewer computing steps than the vertex-centric computing model.
- the traditional subgraph-centric computing model is designed for multi-machine systems, and no work has yet introduced the subgraph-centric computing model into a single-machine environment. Therefore, some issues remain unclear. For example, when the communication cost is potentially converted into I/O cost, can the introduction of the subgraph-centric computing model systematically reduce the I/O cost of the out-of-core graph system and improve multi-core parallelism?
- the traditional subgraph-centric computing model requires a finer-grained partitioning of the graph to improve parallelism, but this The cost is more redundant control information, such as the mapping of global vertex IDs to local vertex IDs. This problem can be solved in a distributed environment by allocating enough memory to each computing node, but in a single-machine multi-core environment, finer-grained graph partitioning will occupy the already precious memory resources.
- auxiliary storage such as hard
- a large-scale graph data processing system based on a single machine, comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk;
- the disk stores large-scale graph data consisting of a plurality of subgraphs;
- the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;
- the data loading module is used for acquiring the sub-graph in the active state from the disk, and transmitting the sub-graph to the data calculation module;
- the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module;
- the data calculation module is further used to transmit the sub-graph to the data release module;
- the data release module is used to write the sub-graph into the disk
- the storage management module When the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to converged.
- the present application compared with the existing large-scale graph processing system based on a single machine, In order to solve the problem of high pin or I/O overhead, the present application applies the subgraph-centric computing model to a single-machine system and establishes a pipeline processing architecture.
- the pipeline processing architecture uses the subgraph ⁇ F0, F1, F2, F3, ..., Fn-1 ⁇ of graph G as the minimum input and output unit, and updates the large graph G iteratively with a pipeline.
- the architecture decomposes the out-of-core processing of the subgraph Fi into three consecutive stages: reading Fi into memory, calculating and updating Fi, and if necessary, writing the updated Fi back to external memory.
- These stages are completed through three modules, namely, a data loading module, a data calculation module, and a data release module.
- these modules work asynchronously through two task queues "input queue” and "output queue”.
- the pipeline processing architecture effectively overlaps subgraph I/O and CPU operations, performs calculations on memory subgraphs while loading suspended subgraphs from disk, which can reduce the I/O cost of traditional vertex-centric computing models, while improving CPU utilization by reducing idle waits and enabling continuous access to disks; in addition, the architecture uses shared memory data structures for message passing and efficient synchronization, which can separate computing from memory management and scheduling, thereby providing new opportunities for optimization.
- the system adopts APIs based on a hybrid computing model.
- the APIs adopt a unified PIE+ interface, which integrates vertex-centric and subgraph-centric programming models. Users can not only parallelize sequential graph algorithms under the subgraph-centric computing model to simplify parallel programming (inter-subgraph parallelism), but also further explore the parallelism within the subgraph based on the vertex-centric computing model through new interfaces.
- the hybrid model supports both inter-subgraph parallelism of the "subgraph-centric computing model” and intra-subgraph parallelism of the "vertex-centric computing model”. Under limited memory, it can better utilize multi-core resources and avoid fragmentation of the input graph; in addition, it provides a unified interface from which users can choose the interface that best suits their applications and graphics.
- the system further includes a scheduler.
- the scheduler is used to track and allocate threads in a thread pool, where each thread corresponds to a physical CPU core. It decides to allocate physical threads to virtual worker threads to perform (parallel) calculations on subgraphs. It also actively adjusts to support two levels of parallelism: when threads are available, the scheduler allocates them to new computing units by consuming the "input queue" to speed up the parallelism between subgraphs, or through the running worker cores. To improve the parallelism within a subgraph.
- the storage management module includes a message storage unit; the data calculation module is used to transmit the updated messages to the message storage unit.
- the message storage unit is used to realize message synchronization between parallel computing units.
- the message storage unit is implemented as a data structure in memory. In order to improve space efficiency, it can be implemented as a compact variable-length array. It should be noted that the space complexity of the message storage unit is closely related to the partitioning strategy. The more boundary vertices/edges there are, the larger the space consumed by the message storage unit. Compared with the message passing strategy of a multi-machine system, the use of a message storage unit is more efficient in a shared memory environment.
- the storage management module further includes a state management unit; the state management unit stores the state information and can update the state information at a specific time.
- the state management unit is used to maintain a state machine to model the state of the subgraph.
- the state management unit is implemented as a lightweight data structure, and each subgraph only maintains a few states, and the memory space occupied can be ignored.
- the state management unit uses the state management unit to record the state information of the subgraph to record the state information of the subgraph to record the state information of the subgraph. It only requires a tag list M to help track the message exchange between computing units, and a lightweight state machine to model the work progress of each computing unit. Specifically, the state management unit constructs a tag list M, and each subgraph corresponds to a tag to indicate whether it has received any messages in this round of iteration. If a subgraph has at least one pending update to be extracted from the message storage unit, its corresponding M[i] is true, otherwise, M[i] is false. In actual operation, a finite state machine can be used to model the progress of each subgraph, and the flag M[i] can be used to trigger the state transition of the subgraph.
- the states of the subgraphs include “active”, “waiting for calculation”, “calculating”, “releasing” and “converging”. At any time, the subgraph is in one of the five states, where the first two states indicate that the subgraph is on the disk, and the remaining states indicate that the subgraph is in the memory.
- the initial state of each subgraph is "active", which means that the subgraph is waiting for the data loading module to load it into the memory; when the subgraph is obtained by the data loading module, the state management unit is used to set the state of the subgraph to "waiting for calculation", which means It means that the subgraph has been resident in the memory and the subgraph is waiting to be assigned a processing core; when the subgraph is transmitted to the data computing module, the state management unit is used to set the state of the subgraph to "computing", which means that the subgraph is being processed by the processing core; when the subgraph is transmitted to the data release module (that is, the subgraph generates a message that needs to be sent to other subgraphs in the current round of update), the state management unit is used to set the state of the subgraph to "releasing"; when the subgraph is written to the disk, the state management unit is used to set the state of the subgraph to "converging"; when the current round of update ends, the state management unit is also used to set the state of the subgraph participating in the
- the system can skip certain states in a round of computation without affecting correctness, that is, it can take some "shortcuts" in state transitions and reduce unnecessary computation and I/O.
- the storage management module when the current round of updates is completed, the storage management module is used to set the state of the subgraph that received the message to "active" ("shortcut A").
- the state of the subgraph with a "convergence” state needs to be reset to "active”. If M[i] corresponding to a subgraph is true at this time, the subgraph can be kept in the "convergence” state so that it does not participate in the next round of updates. This allows the processing of subgraphs that do not need to be updated to be skipped without affecting the correctness of the program.
- "Shortcut A” is often used when the input graph is not well connected and a subgraph is "isolated”, and effectively reduces I/O costs.
- the data release module is used to transfer the subgraph to the data loading module ("shortcut B"). After all the subgraphs have completed the current round of updates, a new round of updates will begin. If a subgraph is still in the "releasing" state, that is, it has not been completely saved to the disk, the state of the subgraph can be directly set to "waiting for calculation", thereby starting a new round of updates to the subgraph without going through the disk. "Shortcut B" can be used at the end of each round and effectively reduces I/O costs.
- the data calculation module when there is no change in the updated sub-graph, the data calculation module is used to write the sub-graph to the disk ("shortcut C"). There is no change compared to before calculation. You can skip the "releasing" state and directly set it to “converging", which can effectively reduce redundant disk writes.
- the data calculation module includes an aggregation calculation unit.
- the aggregation calculation unit is used to call a preset aggregation function to aggregate all the sub-graphs to obtain updated large-scale graph data.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
La présente invention concerne un système de traitement de données de graphe à grande échelle basé sur une unité unique, comprenant un module de chargement de données, un module de calcul de données, un module de libération de données, un module de gestion de stockage et un disque ; le module de chargement de données est utilisé pour acquérir des sous-graphes dans un état actif à partir du disque, et transmettre les sous-graphes au module de calcul de données ; le module de calcul de données est utilisé pour mettre à jour les sous-graphes, et transmettre des messages générés par la mise à jour au module de gestion de stockage ; le module de calcul de données est en outre utilisé pour transmettre les sous-graphes au module de libération de données ; le module de libération de données est utilisé pour écrire les sous-graphes dans le disque ; lorsque les sous-graphes sont écrits dans le disque, le module de gestion de stockage est utilisé pour régler les états des sous-graphes de façon qu'ils soient convergents. L'application d'un modèle de calcul centré sur les sous-graphes à un système à unité unique et l'établissement d'un ensemble d'architecture de traitement à pipeline unique pouvant faire se chevaucher des opérations d'E/S de données et de CPU réduisent les coûts d'E/S de modèles de calcul classiques centrés sur les sommets tout en augmentant le taux d'utilisation de CPU, et favorisent un accès séquentiel au disque.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202310695465.2 | 2023-06-12 | ||
| CN202310695465.2A CN116680296B (zh) | 2023-06-12 | 2023-06-12 | 一种基于单机的大规模图数据处理系统 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024254894A1 true WO2024254894A1 (fr) | 2024-12-19 |
Family
ID=87790518
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/CN2023/101407 Ceased WO2024254894A1 (fr) | 2023-06-12 | 2023-06-20 | Système de traitement de données de graphe à grande échelle basé sur une unité unique |
Country Status (2)
| Country | Link |
|---|---|
| CN (1) | CN116680296B (fr) |
| WO (1) | WO2024254894A1 (fr) |
Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109308327A (zh) * | 2018-09-19 | 2019-02-05 | 浙江天猫技术有限公司 | 基于子图模型兼容点中心模型的图计算方法装置介质设备 |
| US20200249998A1 (en) * | 2019-02-01 | 2020-08-06 | Alibaba Group Holding Limited | Scheduling computation graph heterogeneous computer system |
| CN111859027A (zh) * | 2019-04-24 | 2020-10-30 | 华为技术有限公司 | 图计算方法及装置 |
| CN112988064A (zh) * | 2021-02-09 | 2021-06-18 | 华中科技大学 | 一种面向并发多任务的磁盘图处理方法 |
Family Cites Families (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| AU2003287815A1 (en) * | 2002-12-02 | 2004-06-23 | Mount Sinai Hospital | Methods and products for representing and analyzing complexes of biological molecules |
| CN112352234B (zh) * | 2018-06-15 | 2024-03-08 | 华为云计算技术有限公司 | 用于处理并发属性图查询的系统 |
| WO2020019313A1 (fr) * | 2018-07-27 | 2020-01-30 | 浙江天猫技术有限公司 | Procédé de mise à jour de données de graphe, système, support d'informations lisible par ordinateur et dispositif |
| CN111241353B (zh) * | 2020-01-16 | 2023-08-22 | 支付宝(杭州)信息技术有限公司 | 一种图数据的分区方法、装置以及设备 |
| CN113392280B (zh) * | 2021-06-10 | 2023-08-04 | 东北大学 | 一种面向跨区域的多主模型分布式图计算方法 |
| CN113434702A (zh) * | 2021-07-27 | 2021-09-24 | 支付宝(杭州)信息技术有限公司 | 一种用于图计算的自适应控制方法和系统 |
| CN114756483A (zh) * | 2022-03-31 | 2022-07-15 | 深圳清华大学研究院 | 基于核间存储访问的子图分段优化方法及应用 |
-
2023
- 2023-06-12 CN CN202310695465.2A patent/CN116680296B/zh active Active
- 2023-06-20 WO PCT/CN2023/101407 patent/WO2024254894A1/fr not_active Ceased
Patent Citations (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN109308327A (zh) * | 2018-09-19 | 2019-02-05 | 浙江天猫技术有限公司 | 基于子图模型兼容点中心模型的图计算方法装置介质设备 |
| US20200249998A1 (en) * | 2019-02-01 | 2020-08-06 | Alibaba Group Holding Limited | Scheduling computation graph heterogeneous computer system |
| CN111859027A (zh) * | 2019-04-24 | 2020-10-30 | 华为技术有限公司 | 图计算方法及装置 |
| CN112988064A (zh) * | 2021-02-09 | 2021-06-18 | 华中科技大学 | 一种面向并发多任务的磁盘图处理方法 |
Non-Patent Citations (1)
| Title |
|---|
| MEI ZHENJIE: "Research on Optimization Method of Graph-oriented Computation", CHINA MASTER’S THESES FULL-TEXT DATABASE, 24 May 2017 (2017-05-24), XP093247168 * |
Also Published As
| Publication number | Publication date |
|---|---|
| CN116680296B (zh) | 2026-01-13 |
| CN116680296A (zh) | 2023-09-01 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN110704360B (zh) | 一种基于异构fpga数据流的图计算优化方法 | |
| CN101950282B (zh) | 一种多处理器系统及其同步引擎 | |
| CN105653204A (zh) | 一种基于磁盘的分布式图计算方法 | |
| CN103455371B (zh) | 用于优化的管芯内小节点间消息通信的方法和系统 | |
| US20110265093A1 (en) | Computer System and Program Product | |
| CN111190735A (zh) | 一种基于Linux的片上CPU/GPU流水化计算方法及计算机系统 | |
| WO2023274278A1 (fr) | Procédé et dispositif d'ordonnancement de ressources et nœud informatique | |
| CN116719646A (zh) | 热点数据处理方法、装置、电子装置和存储介质 | |
| CN107168795A (zh) | 基于cpu‑gpu异构复合式并行计算框架的密码子偏差系数模型方法 | |
| CN106095552A (zh) | 一种基于i/o去重的多任务图处理方法及系统 | |
| CN109144749A (zh) | 一种使用处理器实现多处理器间通信的方法 | |
| CN116340024B (zh) | 仿真模型组件进程间的数据共享方法、计算机设备及介质 | |
| CN117539598A (zh) | 任务处理方法、装置、电子设备及存储介质 | |
| CN116841952A (zh) | 核间通信系统、方法、装置、设备、芯片及可读存储介质 | |
| CN107528871A (zh) | 存储系统中的数据分析 | |
| CN114792186A (zh) | 生产排产仿真方法及其装置 | |
| WO2024254894A1 (fr) | Système de traitement de données de graphe à grande échelle basé sur une unité unique | |
| CN118277490A (zh) | 数据处理系统、数据同步方法、电子设备和存储介质 | |
| CN114741166B (zh) | 一种分布式任务的处理方法、分布式系统及第一设备 | |
| CN119807082A (zh) | 一种基于Stream write指令的GPU写存储系统及其方法 | |
| CN116820713A (zh) | 流水控制方法、加速器和电子设备 | |
| Singh | Communication Coroutines For Parallel Program Using DW26010 Many Core Processor | |
| CN110515729B (zh) | 基于图形处理器的图计算节点向量负载平衡方法及装置 | |
| CN108958904A (zh) | 嵌入式多核中央处理器的轻量级操作系统的驱动程序框架 | |
| CN119861974B (zh) | 基于数据流核心架构的任务处理方法、装置、设备及介质 |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 23941127 Country of ref document: EP Kind code of ref document: A1 |
|
| NENP | Non-entry into the national phase |
Ref country code: DE |