WO2024254894A1 - Système de traitement de données de graphe à grande échelle basé sur une unité unique - Google Patents

Système de traitement de données de graphe à grande échelle basé sur une unité unique Download PDF

Info

Publication number
WO2024254894A1
WO2024254894A1 PCT/CN2023/101407 CN2023101407W WO2024254894A1 WO 2024254894 A1 WO2024254894 A1 WO 2024254894A1 CN 2023101407 W CN2023101407 W CN 2023101407W WO 2024254894 A1 WO2024254894 A1 WO 2024254894A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
module
sub
graph
subgraph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2023/101407
Other languages
English (en)
Chinese (zh)
Inventor
朱筱可
刘阳
刘书昊
樊文飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Computing Sciences
Original Assignee
Shenzhen Institute of Computing Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Computing Sciences filed Critical Shenzhen Institute of Computing Sciences
Publication of WO2024254894A1 publication Critical patent/WO2024254894A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/24569Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of data processing technology, and in particular to a large-scale graph data processing system based on a single machine.
  • graph data has become a highly valued topic in the field of data science and engineering, as it is easy to abstract entities and relationships in the real world. It has been widely used in many fields such as social network analysis, recommendation systems, financial fraud detection, and drug discovery. At the same time, graph data has high flexibility, and many problems that were originally modeled using matrices, relationships, or other data structures can also be converted to graph data processing, further highlighting the importance of graph data.
  • the scale of abstract graph data generated or collected by computer systems is growing rapidly. This growth in magnitude has posed an extremely sharp challenge to the large-scale data storage, analysis, and mining capabilities of modern computer systems.
  • a large-scale graph data processing system based on a single machine to solve the above problem includes:
  • a large-scale graph data processing system based on a single machine, comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk;
  • the disk stores large-scale graph data consisting of a plurality of subgraphs;
  • the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;
  • the data loading module is used for acquiring the sub-graph in the active state from the disk, and transmitting the sub-graph to the data calculation module;
  • the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module;
  • the data calculation module is further used to transmit the sub-graph to the data release module;
  • the data release module is used to write the sub-graph into the disk
  • the storage management module When the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to converged.
  • the data calculation module is further used to write the sub-graph into the disk.
  • the data releasing module is further used to transmit the sub-graph to the data loading module.
  • the storage management module is further used to set the state of the subgraph that has received the message to active.
  • the data calculation module is also used to aggregate all the sub-graphs to obtain updated large-scale graph data.
  • the storage management module includes a message storage unit and a state management unit; the state management unit stores the state information;
  • the data calculation module is used to transmit the updated message to the message storage unit
  • the state management unit is used to update the subgraph received the message Set the status to Active.
  • the data calculation module includes an aggregation calculation unit
  • the aggregation calculation unit is used to aggregate all the subgraphs to obtain updated large-scale graph data.
  • the storage management module is further used to set the state of the subgraph to waiting for calculation.
  • the storage management module is further used to set the state of the subgraph to being calculated.
  • the storage management module is further used to set the state of the sub-graph to being released.
  • the present application provides a solution for applying a sub-graph-centric computing model to a single machine system and establishing a pipeline processing architecture, specifically: "A large-scale graph data processing system based on a single machine, comprising a data loading module, a data computing module, a data release module, a storage management module and a disk; the disk stores large-scale graph data consisting of a number of sub-graphs; the storage management module stores status information corresponding to each of the sub-graphs; in an initial state, the sub-graph is active; The data loading module is used to obtain the subgraph with active status from the disk and transmit the subgraph to the data calculation module; the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module; when the updated subgraph has changes, the data calculation module is also used to transmit the subgraph to the data release module; when the subgraph is
  • the architecture can overlap data I/O and CPU operations, thereby reducing the I/O cost of the traditional vertex-centric computing model while improving CPU utilization and promoting sequential access to the disk.
  • the architecture uses shared memory data structures for message passing and efficient synchronization, which can separate computing from memory management and scheduling. out, thus providing new opportunities for optimization.
  • FIG1 is a schematic diagram of the connected component calculation process on the vertex center model and the subgraph center model
  • FIG2 is a schematic diagram of a processing architecture of a large-scale graph data processing system provided by an embodiment of the present application
  • FIG3 is a schematic diagram of a state management and optimization strategy for a large-scale graph data processing system provided in an embodiment of the present application.
  • the vertex-centric computing model will undoubtedly bring additional communication overhead or I/O overhead due to the transmission of messages between vertices.
  • the subgraph-centric computing model (which allows information in the calculation process to be freely transmitted within the subgraph) has significantly fewer computing steps than the vertex-centric computing model.
  • the traditional subgraph-centric computing model is designed for multi-machine systems, and no work has yet introduced the subgraph-centric computing model into a single-machine environment. Therefore, some issues remain unclear. For example, when the communication cost is potentially converted into I/O cost, can the introduction of the subgraph-centric computing model systematically reduce the I/O cost of the out-of-core graph system and improve multi-core parallelism?
  • the traditional subgraph-centric computing model requires a finer-grained partitioning of the graph to improve parallelism, but this The cost is more redundant control information, such as the mapping of global vertex IDs to local vertex IDs. This problem can be solved in a distributed environment by allocating enough memory to each computing node, but in a single-machine multi-core environment, finer-grained graph partitioning will occupy the already precious memory resources.
  • auxiliary storage such as hard
  • a large-scale graph data processing system based on a single machine, comprising a data loading module, a data calculation module, a data release module, a storage management module and a disk;
  • the disk stores large-scale graph data consisting of a plurality of subgraphs;
  • the storage management module stores state information corresponding to each of the subgraphs; in an initial state, the state of the subgraph is active;
  • the data loading module is used for acquiring the sub-graph in the active state from the disk, and transmitting the sub-graph to the data calculation module;
  • the data calculation module is used to update the subgraph and transmit the message generated by the update to the storage management module;
  • the data calculation module is further used to transmit the sub-graph to the data release module;
  • the data release module is used to write the sub-graph into the disk
  • the storage management module When the subgraph is written to the disk, the storage management module is used to set the state of the subgraph to converged.
  • the present application compared with the existing large-scale graph processing system based on a single machine, In order to solve the problem of high pin or I/O overhead, the present application applies the subgraph-centric computing model to a single-machine system and establishes a pipeline processing architecture.
  • the pipeline processing architecture uses the subgraph ⁇ F0, F1, F2, F3, ..., Fn-1 ⁇ of graph G as the minimum input and output unit, and updates the large graph G iteratively with a pipeline.
  • the architecture decomposes the out-of-core processing of the subgraph Fi into three consecutive stages: reading Fi into memory, calculating and updating Fi, and if necessary, writing the updated Fi back to external memory.
  • These stages are completed through three modules, namely, a data loading module, a data calculation module, and a data release module.
  • these modules work asynchronously through two task queues "input queue” and "output queue”.
  • the pipeline processing architecture effectively overlaps subgraph I/O and CPU operations, performs calculations on memory subgraphs while loading suspended subgraphs from disk, which can reduce the I/O cost of traditional vertex-centric computing models, while improving CPU utilization by reducing idle waits and enabling continuous access to disks; in addition, the architecture uses shared memory data structures for message passing and efficient synchronization, which can separate computing from memory management and scheduling, thereby providing new opportunities for optimization.
  • the system adopts APIs based on a hybrid computing model.
  • the APIs adopt a unified PIE+ interface, which integrates vertex-centric and subgraph-centric programming models. Users can not only parallelize sequential graph algorithms under the subgraph-centric computing model to simplify parallel programming (inter-subgraph parallelism), but also further explore the parallelism within the subgraph based on the vertex-centric computing model through new interfaces.
  • the hybrid model supports both inter-subgraph parallelism of the "subgraph-centric computing model” and intra-subgraph parallelism of the "vertex-centric computing model”. Under limited memory, it can better utilize multi-core resources and avoid fragmentation of the input graph; in addition, it provides a unified interface from which users can choose the interface that best suits their applications and graphics.
  • the system further includes a scheduler.
  • the scheduler is used to track and allocate threads in a thread pool, where each thread corresponds to a physical CPU core. It decides to allocate physical threads to virtual worker threads to perform (parallel) calculations on subgraphs. It also actively adjusts to support two levels of parallelism: when threads are available, the scheduler allocates them to new computing units by consuming the "input queue" to speed up the parallelism between subgraphs, or through the running worker cores. To improve the parallelism within a subgraph.
  • the storage management module includes a message storage unit; the data calculation module is used to transmit the updated messages to the message storage unit.
  • the message storage unit is used to realize message synchronization between parallel computing units.
  • the message storage unit is implemented as a data structure in memory. In order to improve space efficiency, it can be implemented as a compact variable-length array. It should be noted that the space complexity of the message storage unit is closely related to the partitioning strategy. The more boundary vertices/edges there are, the larger the space consumed by the message storage unit. Compared with the message passing strategy of a multi-machine system, the use of a message storage unit is more efficient in a shared memory environment.
  • the storage management module further includes a state management unit; the state management unit stores the state information and can update the state information at a specific time.
  • the state management unit is used to maintain a state machine to model the state of the subgraph.
  • the state management unit is implemented as a lightweight data structure, and each subgraph only maintains a few states, and the memory space occupied can be ignored.
  • the state management unit uses the state management unit to record the state information of the subgraph to record the state information of the subgraph to record the state information of the subgraph. It only requires a tag list M to help track the message exchange between computing units, and a lightweight state machine to model the work progress of each computing unit. Specifically, the state management unit constructs a tag list M, and each subgraph corresponds to a tag to indicate whether it has received any messages in this round of iteration. If a subgraph has at least one pending update to be extracted from the message storage unit, its corresponding M[i] is true, otherwise, M[i] is false. In actual operation, a finite state machine can be used to model the progress of each subgraph, and the flag M[i] can be used to trigger the state transition of the subgraph.
  • the states of the subgraphs include “active”, “waiting for calculation”, “calculating”, “releasing” and “converging”. At any time, the subgraph is in one of the five states, where the first two states indicate that the subgraph is on the disk, and the remaining states indicate that the subgraph is in the memory.
  • the initial state of each subgraph is "active", which means that the subgraph is waiting for the data loading module to load it into the memory; when the subgraph is obtained by the data loading module, the state management unit is used to set the state of the subgraph to "waiting for calculation", which means It means that the subgraph has been resident in the memory and the subgraph is waiting to be assigned a processing core; when the subgraph is transmitted to the data computing module, the state management unit is used to set the state of the subgraph to "computing", which means that the subgraph is being processed by the processing core; when the subgraph is transmitted to the data release module (that is, the subgraph generates a message that needs to be sent to other subgraphs in the current round of update), the state management unit is used to set the state of the subgraph to "releasing"; when the subgraph is written to the disk, the state management unit is used to set the state of the subgraph to "converging"; when the current round of update ends, the state management unit is also used to set the state of the subgraph participating in the
  • the system can skip certain states in a round of computation without affecting correctness, that is, it can take some "shortcuts" in state transitions and reduce unnecessary computation and I/O.
  • the storage management module when the current round of updates is completed, the storage management module is used to set the state of the subgraph that received the message to "active" ("shortcut A").
  • the state of the subgraph with a "convergence” state needs to be reset to "active”. If M[i] corresponding to a subgraph is true at this time, the subgraph can be kept in the "convergence” state so that it does not participate in the next round of updates. This allows the processing of subgraphs that do not need to be updated to be skipped without affecting the correctness of the program.
  • "Shortcut A” is often used when the input graph is not well connected and a subgraph is "isolated”, and effectively reduces I/O costs.
  • the data release module is used to transfer the subgraph to the data loading module ("shortcut B"). After all the subgraphs have completed the current round of updates, a new round of updates will begin. If a subgraph is still in the "releasing" state, that is, it has not been completely saved to the disk, the state of the subgraph can be directly set to "waiting for calculation", thereby starting a new round of updates to the subgraph without going through the disk. "Shortcut B" can be used at the end of each round and effectively reduces I/O costs.
  • the data calculation module when there is no change in the updated sub-graph, the data calculation module is used to write the sub-graph to the disk ("shortcut C"). There is no change compared to before calculation. You can skip the "releasing" state and directly set it to “converging", which can effectively reduce redundant disk writes.
  • the data calculation module includes an aggregation calculation unit.
  • the aggregation calculation unit is used to call a preset aggregation function to aggregate all the sub-graphs to obtain updated large-scale graph data.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

La présente invention concerne un système de traitement de données de graphe à grande échelle basé sur une unité unique, comprenant un module de chargement de données, un module de calcul de données, un module de libération de données, un module de gestion de stockage et un disque ; le module de chargement de données est utilisé pour acquérir des sous-graphes dans un état actif à partir du disque, et transmettre les sous-graphes au module de calcul de données ; le module de calcul de données est utilisé pour mettre à jour les sous-graphes, et transmettre des messages générés par la mise à jour au module de gestion de stockage ; le module de calcul de données est en outre utilisé pour transmettre les sous-graphes au module de libération de données ; le module de libération de données est utilisé pour écrire les sous-graphes dans le disque ; lorsque les sous-graphes sont écrits dans le disque, le module de gestion de stockage est utilisé pour régler les états des sous-graphes de façon qu'ils soient convergents. L'application d'un modèle de calcul centré sur les sous-graphes à un système à unité unique et l'établissement d'un ensemble d'architecture de traitement à pipeline unique pouvant faire se chevaucher des opérations d'E/S de données et de CPU réduisent les coûts d'E/S de modèles de calcul classiques centrés sur les sommets tout en augmentant le taux d'utilisation de CPU, et favorisent un accès séquentiel au disque.
PCT/CN2023/101407 2023-06-12 2023-06-20 Système de traitement de données de graphe à grande échelle basé sur une unité unique Ceased WO2024254894A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202310695465.2 2023-06-12
CN202310695465.2A CN116680296B (zh) 2023-06-12 2023-06-12 一种基于单机的大规模图数据处理系统

Publications (1)

Publication Number Publication Date
WO2024254894A1 true WO2024254894A1 (fr) 2024-12-19

Family

ID=87790518

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/101407 Ceased WO2024254894A1 (fr) 2023-06-12 2023-06-20 Système de traitement de données de graphe à grande échelle basé sur une unité unique

Country Status (2)

Country Link
CN (1) CN116680296B (fr)
WO (1) WO2024254894A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308327A (zh) * 2018-09-19 2019-02-05 浙江天猫技术有限公司 基于子图模型兼容点中心模型的图计算方法装置介质设备
US20200249998A1 (en) * 2019-02-01 2020-08-06 Alibaba Group Holding Limited Scheduling computation graph heterogeneous computer system
CN111859027A (zh) * 2019-04-24 2020-10-30 华为技术有限公司 图计算方法及装置
CN112988064A (zh) * 2021-02-09 2021-06-18 华中科技大学 一种面向并发多任务的磁盘图处理方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2003287815A1 (en) * 2002-12-02 2004-06-23 Mount Sinai Hospital Methods and products for representing and analyzing complexes of biological molecules
CN112352234B (zh) * 2018-06-15 2024-03-08 华为云计算技术有限公司 用于处理并发属性图查询的系统
WO2020019313A1 (fr) * 2018-07-27 2020-01-30 浙江天猫技术有限公司 Procédé de mise à jour de données de graphe, système, support d'informations lisible par ordinateur et dispositif
CN111241353B (zh) * 2020-01-16 2023-08-22 支付宝(杭州)信息技术有限公司 一种图数据的分区方法、装置以及设备
CN113392280B (zh) * 2021-06-10 2023-08-04 东北大学 一种面向跨区域的多主模型分布式图计算方法
CN113434702A (zh) * 2021-07-27 2021-09-24 支付宝(杭州)信息技术有限公司 一种用于图计算的自适应控制方法和系统
CN114756483A (zh) * 2022-03-31 2022-07-15 深圳清华大学研究院 基于核间存储访问的子图分段优化方法及应用

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109308327A (zh) * 2018-09-19 2019-02-05 浙江天猫技术有限公司 基于子图模型兼容点中心模型的图计算方法装置介质设备
US20200249998A1 (en) * 2019-02-01 2020-08-06 Alibaba Group Holding Limited Scheduling computation graph heterogeneous computer system
CN111859027A (zh) * 2019-04-24 2020-10-30 华为技术有限公司 图计算方法及装置
CN112988064A (zh) * 2021-02-09 2021-06-18 华中科技大学 一种面向并发多任务的磁盘图处理方法

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MEI ZHENJIE: "Research on Optimization Method of Graph-oriented Computation", CHINA MASTER’S THESES FULL-TEXT DATABASE, 24 May 2017 (2017-05-24), XP093247168 *

Also Published As

Publication number Publication date
CN116680296B (zh) 2026-01-13
CN116680296A (zh) 2023-09-01

Similar Documents

Publication Publication Date Title
CN110704360B (zh) 一种基于异构fpga数据流的图计算优化方法
CN101950282B (zh) 一种多处理器系统及其同步引擎
CN105653204A (zh) 一种基于磁盘的分布式图计算方法
CN103455371B (zh) 用于优化的管芯内小节点间消息通信的方法和系统
US20110265093A1 (en) Computer System and Program Product
CN111190735A (zh) 一种基于Linux的片上CPU/GPU流水化计算方法及计算机系统
WO2023274278A1 (fr) Procédé et dispositif d'ordonnancement de ressources et nœud informatique
CN116719646A (zh) 热点数据处理方法、装置、电子装置和存储介质
CN107168795A (zh) 基于cpu‑gpu异构复合式并行计算框架的密码子偏差系数模型方法
CN106095552A (zh) 一种基于i/o去重的多任务图处理方法及系统
CN109144749A (zh) 一种使用处理器实现多处理器间通信的方法
CN116340024B (zh) 仿真模型组件进程间的数据共享方法、计算机设备及介质
CN117539598A (zh) 任务处理方法、装置、电子设备及存储介质
CN116841952A (zh) 核间通信系统、方法、装置、设备、芯片及可读存储介质
CN107528871A (zh) 存储系统中的数据分析
CN114792186A (zh) 生产排产仿真方法及其装置
WO2024254894A1 (fr) Système de traitement de données de graphe à grande échelle basé sur une unité unique
CN118277490A (zh) 数据处理系统、数据同步方法、电子设备和存储介质
CN114741166B (zh) 一种分布式任务的处理方法、分布式系统及第一设备
CN119807082A (zh) 一种基于Stream write指令的GPU写存储系统及其方法
CN116820713A (zh) 流水控制方法、加速器和电子设备
Singh Communication Coroutines For Parallel Program Using DW26010 Many Core Processor
CN110515729B (zh) 基于图形处理器的图计算节点向量负载平衡方法及装置
CN108958904A (zh) 嵌入式多核中央处理器的轻量级操作系统的驱动程序框架
CN119861974B (zh) 基于数据流核心架构的任务处理方法、装置、设备及介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23941127

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE