WO2021027331A1 - Appareil et procédé de calcul de relation complète faisant appel à des données de graphiques, dispositif et support d'enregistrement - Google Patents

Appareil et procédé de calcul de relation complète faisant appel à des données de graphiques, dispositif et support d'enregistrement Download PDF

Info

Publication number
WO2021027331A1
WO2021027331A1 PCT/CN2020/087619 CN2020087619W WO2021027331A1 WO 2021027331 A1 WO2021027331 A1 WO 2021027331A1 CN 2020087619 W CN2020087619 W CN 2020087619W WO 2021027331 A1 WO2021027331 A1 WO 2021027331A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
data
identifier
degree
relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2020/087619
Other languages
English (en)
Chinese (zh)
Inventor
邓强
张娟
屠宁
赵之砚
施奕明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Smart Technology Co Ltd
Original Assignee
OneConnect Smart Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Smart Technology Co Ltd filed Critical OneConnect Smart Technology Co Ltd
Publication of WO2021027331A1 publication Critical patent/WO2021027331A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists

Definitions

  • This application relates to the field of big data technology, and in particular to a method, device, device, and storage medium for calculating a full relationship based on graph data.
  • Graph data mining is an important method in relationship mining and group profiling.
  • Graph data is composed of nodes and edges.
  • the nodes in the graph data are used to represent the connected subjects, and the edges in the graph data are used to represent the association between the subjects.
  • a node is associated with other nodes through the edges connected to it.
  • One of the typical applications in graph calculation is to find the full relationship of a certain node and analyze the statistical characteristics. Among them, the calculation of the second-degree relationship and the third-degree relationship has become a difficult point in the graph calculation due to the large amount of calculation and computational resource consumption.
  • the current typical environment for graph computing is the GraphX environment in the Spark project open sourced by the Apache Software Foundation.
  • GraphX uses a memory computing strategy to achieve fast iterative calculations; however, the inventor realized that memory computing consumes a huge amount of memory resources, and it is difficult to support the processing of massive data.
  • index calculation of second-degree and third-degree associated nodes using GraphX to calculate graphs on 50 million nodes and 400 million edges will consume 2000GB of memory. The calculation efficiency of the full relationship is low. On social networks, such calculation efficiency is difficult to meet the application of billions to billions of nodes.
  • This application provides a method, device, device, and storage medium for calculating a full-scale relationship based on graph data, which are used to merge node attributes into node identifiers using bit operations to avoid duplication of node data, reduce memory resource consumption, and improve computing efficiency .
  • the first aspect of the embodiments of the present application provides a full-scale relationship calculation method based on graph data, including: obtaining preprocessed graph data, the preprocessed graph data including node data and edge data of each node; Perform a bit operation to generate the synthetic node ID of each node; divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group includes a synthetic node of the current node ID and the edge data connected to the current node; send a single node ID list of each node to all adjacent nodes, the single node ID list is used to store the composite node ID of the adjacent node; according to each node The received single node identification list generates the second-degree relationship of each node.
  • a second aspect of the embodiments of the present application provides a full-scale relationship calculation device based on graph data, including: a first obtaining unit configured to obtain preprocessed graph data, the preprocessed graph data including node data of each node and Edge data; an operation generating unit for performing bit operations on the node data to generate a synthetic node identifier for each node; a division generating unit for centering the node data and the edge data on each node data Divide, generate multiple data groups, each data group includes a composite node ID of the current node and edge data connected to the current node; the first sending unit is used to send the single node ID list of each node to all Adjacent nodes, the single node identification list is used to store the composite node identification of the adjacent nodes; the first generating unit is used to generate the second degree of each node according to the single node identification list received by each node relationship.
  • the third aspect of the embodiments of the present application provides a computer device, including: one or more processors; a memory; one or more computer programs, wherein the one or more computer programs are stored in the memory and Is configured to be executed by the one or more processors, and the one or more computer programs are configured to execute a full relationship calculation method based on graph data, wherein the full relationship calculation method based on graph data includes :
  • the preprocessed graph data including node data and edge data of each node;
  • the second-degree relationship of each node is generated.
  • a fourth aspect of the embodiments of the present application provides a computer-readable storage medium having a computer program stored on the computer-readable storage medium, and when the computer program is executed by a processor, a method for calculating a full relationship based on graph data is implemented, wherein , The method for calculating the full relationship based on graph data includes the following steps:
  • the preprocessed graph data including node data and edge data of each node;
  • the second-degree relationship of each node is generated.
  • preprocessed graph data is obtained, and the preprocessed graph data includes node data and edge data of each node; bit operations are performed on the node data to generate a synthetic node identifier for each node ; Divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group includes a synthetic node identifier of the current node and the edge data connected to the current node; The single-node identification list of each node is sent to all adjacent nodes, and the single-node identification list is used to store the composite node identification of the adjacent nodes; and each node is generated according to the single-node identification list received by each node. The second-degree relationship of the node. Use bit operation to merge node attributes into node identifiers, and eliminate self-connection, avoid node data duplication, reduce memory resource consumption, and improve calculation efficiency.
  • FIG. 1 is a schematic diagram of an embodiment of a method for calculating a full relationship based on graph data in an embodiment of the application;
  • FIG. 2 is a schematic diagram of another embodiment of a method for calculating a full relationship based on graph data in an embodiment of the application;
  • Fig. 3 is a schematic diagram of an embodiment of a full relation calculation device based on graph data in an embodiment of the application;
  • FIG. 4 is a schematic diagram of another embodiment of a full relationship calculation device based on graph data in an embodiment of the application;
  • FIG. 5 is a schematic diagram of an embodiment of a full relationship calculation device based on graph data in an embodiment of the application.
  • This application provides a method, device, device, and storage medium for calculating a full-scale relationship based on graph data, which are used to merge node attributes into node identifiers using bit operations to avoid duplication of node data, reduce memory resource consumption, and improve computing efficiency .
  • FIG. 1 a flowchart of a method for calculating a full relationship based on graph data provided by an embodiment of the present application, which specifically includes:
  • preprocessed graph data where the preprocessed graph data includes node data and edge data of each node.
  • the server obtains the preprocessed graph data, and the preprocessed graph data includes node data and edge data of each node. Specifically, the server loads the preprocessed graph data through Spark.
  • the preprocessed graph data is composed of node data and edge data.
  • the node data in the graph data is used to indicate the subject that is connected, and the edge data is used to indicate the association between the subjects. .
  • a node is related to other nodes through its connected edges.
  • the node data of a target node includes the target node attribute and a target node identifier
  • the target edge data includes a target edge identifier and two node identifiers connected to the target edge.
  • the node attribute contains several label data of the node.
  • the node label data may include an ID number, a mobile phone number, and three Boolean variables A, B, and C.
  • variable A is used to indicate the gender of the user, 1 is used to indicate male, 0 is used to indicate female;
  • variable B is used to indicate whether it is dishonest, 1 is used to indicate failure, and 0 is used to indicate no failure;
  • variable C is used to indicate whether there is a university degree, with 1 means a university degree, and 0 means no university degree.
  • the preprocessed graph data means that a large amount of original graph data has been deduplicated, and the graph data that meets the requirements have been selected.
  • the execution subject of this application may be a full-scale relational computing device based on graph data, or may be a terminal or a server, which is not specifically limited here.
  • the embodiment of the present application takes the server as the execution subject as an example for description.
  • the server performs bit operations on the node data to generate a synthetic node ID for each node. Specifically, the server determines multiple nodes in the node data; the server obtains the node attributes of each node and the initial node identifier corresponding to each node; the server obtains preset rules, which include the total storage bits for each node identifier Number and the starting and ending ordinal number of the storage location occupied by each variable; the server performs bit operations on the node attributes and initial node identification of each node according to the total number of storage bits for each node identification and the starting and ending ordinal number of the storage location occupied by each variable ; The server generates a synthetic node ID for each node.
  • the preset rules include: (1) the total number of storage bits for each node identification; (2) the starting and ending ordinal numbers of the storage bits occupied by each variable.
  • storage of ID card number, mobile phone number, and the aforementioned three Boolean variables A, B, and C are taken as examples for description.
  • the preset rules are: (1) Each node allocates 101 bits of storage; (2) 1-61 bits are used to store the ID number, the 62nd-98th bits store the mobile phone number, and the 99th bit stores the variable A, the 100th bit stores variable B, and the 101st bit stores variable C.
  • the server saves a lot of memory resources by using bit storage instead of individually assigning double integer storage space to each variable.
  • the initial node identification is collected by the user in the preprocessing stage of the graph data and stored on the external memory (hard disk) of the computer. After starting the calculation, the server loads the initial node identifier into the internal memory. The data identified by the initial node belongs to the original data, and its collection occurs before the bit operation.
  • each data group including a synthetic node identifier of the current node and edge data connected to the current node.
  • the server divides the node data and the edge data with each node data as the center, and generates multiple data groups.
  • Each data group includes a synthetic node identifier of the current node and the edge data connected to the current node.
  • the graph data contains node data and edge data. After being loaded into the memory, in order to perform distributed calculations, the graph data needs to be divided into small processing units, which are referred to as "data groups" here. By establishing a node-centric “data group”, it is ensured that each node has only one copy, which will only appear once, and avoiding multiple node data replication.
  • a data group contains the node ID and all the edge data on the node.
  • One edge data contains the composite node ID of the node and another node connected to it, but does not include any attribute data of the connected node.
  • the server sends the single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of each node's adjacent nodes.
  • the single node identification list is collected based on the edge data on the data group. Because the edge data contains the synthetic node identification of the current node and the once-associated node connected to the current node, all connected nodes can be collected by traversing all the edge data.
  • the single node identifier list contains only the composite node identifiers of adjacent nodes.
  • node a and nodes b, c, and d are adjacent nodes. Then first collect the identification list of all adjacent nodes (ie, single-node identification list), where the node identification is the four letters a, b, c, d (here is a simple example, in actual situations it contains hundreds of millions of nodes, you need Use more complex node identification to indicate, for example, when the device is a node, use the production information of the device, such as the date of generation, serial number, etc., as the node identification). For node a, the obtained node identification list is the list of [b, c, d]; then, node a passes the list of [b, c, d] to the three nodes b, c, d, respectively.
  • the server generates the second-degree relationship of each node according to the single node identification list received by each node. Specifically, the server receives the list of single node identifiers of each node; the server determines its own synthetic node identifier for each node; the server separately receives the same node identifier as its own synthetic node identifier in the list of single node identifiers received by each node Delete; the server generates a second-degree relationship for each node, and the second-degree relationship is used to indicate that there is an interval between the second-degree associated node and the current node.
  • node a is connected to nodes b, c, and d, and nodes b, c, and d are not directly connected, then after node b receives the list [b, c, d] sent by node a, Delete b from the list, and there are two nodes [c, d] left in the list. These two nodes are the second-degree relationship of b.
  • a first-degree relationship refers to the connection between two nodes, that is, adjacent nodes; a second-degree relationship refers to a node between two nodes.
  • node b and node c are separated by node a, so node b and node c are second-degree related, that is, node b and node c have a second-degree relationship.
  • the method further includes: the server obtains the node attribute of each node according to the second-degree relationship of each node, and performs statistical analysis according to the node attribute of each node to generate an analysis result.
  • the server reads the second-degree relationship of each node; the server determines the node attribute of each node from the second-degree relationship; the server separates the node attribute of each node from the synthetic node identification according to preset rules; Perform statistical analysis on the node attributes of each node to generate analysis results.
  • FIG. 2 another flowchart of the full relationship calculation method based on graph data provided by the embodiment of the present application, which specifically includes:
  • the preprocessed graph data includes node data and edge data of each node.
  • the server obtains the preprocessed graph data, and the preprocessed graph data includes the node data and edge data of each node. Specifically, the server loads the preprocessed graph data through Spark.
  • the preprocessed graph data is composed of node data and edge data.
  • the node data in the graph data is used to indicate the subject that is connected, and the edge data is used to indicate the association between the subjects. .
  • a node is related to other nodes through its connected edges.
  • the node data of a target node includes the target node attribute and a target node identifier
  • the target edge data includes a target edge identifier and two node identifiers connected to the target edge.
  • the node attribute contains several label data of the node.
  • the node label data may include an ID number, a mobile phone number, and three Boolean variables A, B, and C.
  • variable A is used to indicate the gender of the user, 1 is used to indicate male, 0 is used to indicate female;
  • variable B is used to indicate whether it is dishonest, 1 is used to indicate failure, and 0 is used to indicate no failure;
  • variable C is used to indicate whether there is a university degree, with 1 means a university degree, and 0 means no university degree.
  • the preprocessed graph data means that a large amount of original graph data has been deduplicated, and the graph data that meets the requirements have been selected.
  • the execution subject of this application may be a full-scale relational computing device based on graph data, or may be a terminal or a server, which is not specifically limited here.
  • the embodiment of the present application takes the server as the execution subject as an example for description.
  • the server performs bit operations on the node data to generate a synthetic node ID for each node. Specifically, the server determines multiple nodes in the node data; the server obtains the node attributes of each node and the initial node identifier corresponding to each node; the server obtains preset rules, which include the total storage bits for each node identifier Number and the starting and ending ordinal number of the storage location occupied by each variable; the server performs bit operations on the node attributes and initial node identification of each node according to the total number of storage bits for each node identification and the starting and ending ordinal number of the storage location occupied by each variable ; The server generates a synthetic node ID for each node.
  • the preset rules include: (1) the total number of storage bits for each node identification; (2) the starting and ending ordinal numbers of the storage bits occupied by each variable.
  • storage of ID card number, mobile phone number, and the aforementioned three Boolean variables A, B, and C are taken as examples for description.
  • the preset rules are: (1) Each node allocates 101 bits of storage; (2) 1-61 bits are used to store the ID number, the 62nd-98th bits store the mobile phone number, and the 99th bit stores the variable A, the 100th bit stores variable B, and the 101st bit stores variable C.
  • the server saves a lot of memory resources by using bit storage instead of individually assigning double integer storage space to each variable.
  • the initial node identification is collected by the user in the preprocessing stage of the graph data and stored on the external memory (hard disk) of the computer. After starting the calculation, the server loads the initial node identifier into the internal memory. The data identified by the initial node belongs to the original data, and its collection occurs before the bit operation.
  • each data group including a synthetic node identifier of the current node and the edge data connected to the current node.
  • the server divides the node data and edge data with each node data as the center, and generates multiple data groups.
  • Each data group includes a synthetic node identifier of the current node and the edge data connected to the current node.
  • the graph data contains node data and edge data. After being loaded into the memory, in order to perform distributed calculations, the graph data needs to be divided into small processing units, which are referred to as "data groups" here. By establishing a node-centric “data group”, it is ensured that each node has only one copy, which will only appear once, and avoiding multiple node data replication.
  • a data group contains the node ID and all the edge data on the node.
  • One edge data contains the composite node ID of the node and another node connected to it, but does not include any attribute data of the connected node.
  • the server sends the single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of the adjacent nodes of each node.
  • the single node identification list is collected based on the edge data on the data group. Because the edge data contains the synthetic node identification of the current node and the once-associated node connected to the current node, all connected nodes can be collected by traversing all the edge data.
  • the single node identifier list contains only the composite node identifiers of adjacent nodes.
  • node a and nodes b, c, and d are adjacent nodes. Then first collect the identification list of all adjacent nodes (ie, single-node identification list), where the node identification is the four letters a, b, c, d (here is a simple example, in actual situations it contains hundreds of millions of nodes, you need Use more complex node identification to indicate, for example, when the device is a node, use the production information of the device, such as the date of generation, serial number, etc., as the node identification). For node a, the obtained node identification list is the list of [b, c, d]; then, node a passes the list of [b, c, d] to the three nodes b, c, d, respectively.
  • each node According to the single node identification list received by each node, generate a second-degree relationship of each node.
  • the server generates the second-degree relationship of each node according to the single node identification list received by each node. Specifically, the server receives the list of single node identifiers of each node; the server determines its own synthetic node identifier for each node; the server separately receives the same node identifier as its own synthetic node identifier in the list of single node identifiers received by each node Delete; the server generates a second-degree relationship for each node, and the second-degree relationship is used to indicate that there is an interval between the second-degree associated node and the current node.
  • node a is connected to nodes b, c, and d, and nodes b, c, and d are not directly connected, then after node b receives the list [b, c, d] sent by node a, Delete b from the list, and there are two nodes [c, d] left in the list. These two nodes are the second-degree relationship of b.
  • the first-degree relationship refers to the connection between two nodes, that is, adjacent nodes;
  • the second-degree relationship refers to the distance between two nodes by one node.
  • node b and node c are separated by node a, so node b and node c are second-degree related, that is, node b and node c have a second-degree relationship.
  • the server generates a second-degree relationship identifier list according to the second-degree relationship of each node, and the second-degree relationship identifier list is used to store the second-degree relationship of each node.
  • the server sends the second-degree relationship identifier list of each node to all neighboring nodes, and the second-degree relationship identifier list is used to store the composite node identifier of the neighboring nodes of each node.
  • the three-degree relationship identifier list is used to store the three-degree relationship of each node, and the third-degree relationship is used for Indicates that there is a first-degree associated node and a second-degree associated node between the three-degree associated node and the current node.
  • the server generates a three-degree relationship identifier list for each node according to the second-degree relationship identifier list received by each node from neighboring nodes.
  • the three-degree relationship identifier list is used to store the three-degree relationship of each node, and the third-degree relationship is used to indicate Between the third-degree associated node and the current node, there is a first-degree associated node and a second-degree associated node.
  • the server obtains the second-degree relationship identifier list of each node; the server determines each node's own synthetic node identifier; the server separately receives the second-degree relationship identifier list received by each node that is the same as its own synthetic node identifier
  • the node ID is deleted; the server determines the three-degree relationship of each node, and the three-degree relationship is used to indicate the interval between two nodes (that is, the interval between the three-degree associated node and the current node is a one-degree associated node and a two-degree associated node.
  • Degree-related nodes generate a three-degree relationship identifier list, and the three-degree relationship identifier list is used to store the three-degree relationship of each node.
  • node a is connected to nodes b, c, and d
  • node e is connected to node b
  • nodes b, c, and d are not directly connected
  • node b receives the list [b, c, d] sent by node a, Delete b from the list, and there are two nodes [c, d] left in the list.
  • the node b continues to send the list [c, d] to the node e, and then obtains the three-degree relationship list of the node e, that is, the node e and the node c, and the e and d form a three-degree relationship.
  • node b will also send the node ID list [c, d] to node a.
  • the list [c, d] also exists in the one-degree relationship list [b, c, d] of the node a, c and d need to be excluded and do not form a three-degree relationship with a.
  • the method further includes: the server obtains the node attributes of each node according to the three-degree relationship identification list of each node, and performs statistical analysis according to the node attributes of each node.
  • the separation here refers to the process of reading the node attributes from the synthetic node identifier. It corresponds to the previous data reading process and follows the unified node attribute preset rules. Using the aforementioned example, use 101 bits to store the ID number, mobile phone number, and three Boolean variables of A, B, and C.
  • the separated node attributes are statistically analyzed according to business needs, which improves the calculation efficiency. For example, count all the friends worth recommending in the three-degree relationship, or count all the users with good credit, and so on.
  • the method further includes: the server obtains the original graph data of each node; the server performs deduplication and verification processing on the original graph data; and the server generates preprocessed graph data that meets the requirements.
  • An embodiment of the relational computing device includes:
  • the first obtaining unit 301 is configured to obtain preprocessed graph data, where the preprocessed graph data includes node data and edge data of each node;
  • the operation generating unit 302 is configured to perform a bit operation on the node data to generate a synthetic node identifier of each node;
  • the dividing and generating unit 303 is configured to divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group including a composite node identifier of the current node and connection with the current node Edge data;
  • the first sending unit 304 is configured to send a single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of the adjacent nodes;
  • the first generating unit 305 is configured to generate the second degree relationship of each node according to the single node identification list received by each node.
  • FIG. 4 another embodiment of the full relation calculation device based on graph data in the embodiment of the present application includes:
  • the first obtaining unit 301 is configured to obtain preprocessed graph data, where the preprocessed graph data includes node data and edge data of each node;
  • the operation generating unit 302 is configured to perform a bit operation on the node data to generate a synthetic node identifier of each node;
  • the dividing and generating unit 303 is configured to divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group including a composite node identifier of the current node and connection with the current node Edge data;
  • the first sending unit 304 is configured to send a single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of the adjacent nodes;
  • the first generating unit 305 is configured to generate the second degree relationship of each node according to the single node identification list received by each node.
  • the operation generating unit 302 is specifically configured to:
  • the full relationship calculation device based on graph data further includes:
  • the statistical analysis unit 306 is configured to obtain the node attribute of each node according to the second-degree relationship of each node, and perform statistical analysis according to the node attribute of each node to generate an analysis result.
  • the statistical analysis unit 306 is specifically used to:
  • the first generating unit 305 is specifically configured to:
  • Receive the single-node identification list of each node determine the connection status of each node; delete the synthetic node identification that is the same as itself in the received single-node identification list; generate the second degree relationship of each node, the second degree The relationship is used to indicate the second degree associated node and the current node, and there is an interval of one degree associated node.
  • the full relationship calculation device based on graph data further includes:
  • the second generating unit 307 is configured to generate a second-degree relationship identifier list according to the second-degree relationship of each node, and the second-degree relationship identifier list is used to store the second-degree relationship of each node;
  • the second sending unit 308 is configured to send the second-degree relationship identifier list of each node to all adjacent nodes;
  • the third generating unit 309 is configured to generate a third-degree relationship identifier list of each node according to the second-degree relationship identifier list sent by neighboring nodes received by each node, and the third-degree relationship identifier list is used to store the three-degree relationship identifier list of each node.
  • Degree relationship the three-degree relationship is used to indicate that there is an interval between a first-degree associated node and a second-degree associated node between the third-degree associated node and the current node.
  • the full relationship calculation device based on graph data further includes:
  • the second obtaining unit 310 is configured to obtain the original graph data of each node
  • the processing unit 311 is configured to perform deduplication processing and verification processing on the original image data
  • the fourth generating unit 312 is used to generate preprocessed image data that meets the requirements.
  • FIG. 5 is a schematic structural diagram of a full relational computing device based on graph data provided by an embodiment of the present application.
  • the full relational computing device 500 based on graph data may have relatively large differences due to different configurations or performances, and may include one or One or more processors (central processing units, CPU) 501 (for example, one or more processors) and a memory 509, and one or more storage media 508 (for example, one or one storage device with a large amount of storage) storing application programs 507 or data 506.
  • the memory 509 and the storage medium 508 may be short-term storage or persistent storage.
  • the program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the full relational computing device based on graph data. Further, the processor 501 may be configured to communicate with the storage medium 508, and execute a series of instruction operations in the storage medium 508 on the full relationship computing device 500 based on graph data.
  • the full relational computing device 500 based on graph data may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input and output interfaces 504, and/or, one or more operating systems 505 , Such as Windows Serve, MacOS X, Unix, Linux, FreeBSD, etc.
  • operating systems 505 Such as Windows Serve, MacOS X, Unix, Linux, FreeBSD, etc.
  • FIG. 5 does not constitute a limitation on the full relationship computing device based on graph data, and may include more or less components than shown in the figure. Or combine certain components, or different component arrangements.
  • the processor 501 can execute the first acquisition unit 301, the operation generation unit 302, the division generation unit 303, the first generation unit 305, the statistical analysis unit 306, the second generation unit 307, the third generation unit 309, and the second generation unit 301 in the above embodiments. Functions of the acquiring unit 310, the processing unit 311, and the fourth generating unit 312.
  • the processor 501 is the control center of the full relationship calculation device based on graph data, and can perform processing in accordance with the set full relationship calculation method based on graph data.
  • the processor 501 uses various interfaces and lines to connect various parts of the entire graph data-based full relational computing device, by running or executing software programs and/or modules stored in the memory 509, and calling data stored in the memory 509, Perform various functions and processing data of full relational computing equipment based on graph data.
  • the storage medium 508 and the memory 509 are both carriers for storing data.
  • the storage medium 508 may refer to an internal memory with a small storage capacity but high speed, and the storage 509 may have a large storage capacity but a slow storage speed. External memory.
  • the memory 509 may be used to store software programs and modules.
  • the processor 501 executes various functional applications and data processing of the full relational computing device 500 based on graph data by running the software programs and modules stored in the memory 509.
  • the memory 509 may mainly include a program storage area and a data storage area.
  • the storage program area may store an operating system and at least one application program required by a function (for example, sending a single node identification list of each node to all adjacent nodes, The single node identifier list is used to store the composite node identifiers of adjacent nodes, etc.; the storage data area can store data created according to the use of the full relationship computing device based on graph data (such as the composite node identifier of each node).
  • the memory 509 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • a non-volatile memory such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium, the storage medium being a volatile storage medium or a non-volatile storage medium,
  • the computer instructions can be sent from one website site, computer, server, or data center to another website site, through wired (such as coaxial cable, optical fiber, twisted pair) or wireless (such as infrared, wireless, microwave, etc.) Computer, server or data center for transmission.
  • the computer-readable storage medium may be any available medium that can be stored by a computer or a data storage device such as a server or data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state disk (SSD)).
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division, and there may be other divisions in actual implementation, for example, multiple units or components can be combined or It can be integrated into another system, or some features can be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • the technical solution of this application essentially or the part that contributes to the existing technology or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , Including several instructions to make a computer device (which can be a personal computer, a server, or a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application.
  • the aforementioned storage media include: U disk, mobile hard disk, read-only memory (read-only memory, ROM), random access memory (random access memory, RAM), magnetic disk or optical disk and other media that can store program code .

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

L'invention concerne un appareil et un procédé de calcul de relation complète faisant appel à des données graphiques, un dispositif et un support d'enregistrement, utilisés pour fusionner des attributs de nœuds dans des identifiants de nœuds à l'aide d'opérations de bits, éviter la duplication de données de nœuds, réduire la consommation de ressources de mémoire et augmenter l'efficacité de calcul. Le procédé consiste à : acquérir des données graphiques prétraitées, les données graphiques prétraitées comprenant des données de nœuds et des données de bord de chaque nœud (101) ; réaliser des opérations de bits sur les données de nœuds pour générer un identifiant de nœud synthétisé de chaque nœud (102) ; diviser les données de nœuds et les données de bord avec chacune des données de nœuds au centre pour générer de multiples groupes de données, chaque groupe de données comprenant l'identifiant de nœud synthétisé d'un nœud courant et des données de bord connectées au nœud courant (103) ; envoyer une liste d'identifiants de nœuds uniques de chaque nœud à tous les nœuds adjacents, la liste d'identifiants de nœuds uniques étant utilisée pour mémoriser les identifiants de nœuds synthétisés des nœuds adjacents (104) ; et générer, sur la base de la liste d'identifiants de nœuds uniques reçue par chaque nœud, les relations de second degré de chaque nœud (105).
PCT/CN2020/087619 2019-08-15 2020-04-28 Appareil et procédé de calcul de relation complète faisant appel à des données de graphiques, dispositif et support d'enregistrement Ceased WO2021027331A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910751784.4A CN110609924A (zh) 2019-08-15 2019-08-15 基于图数据的全量关系计算方法、装置、设备及存储介质
CN201910751784.4 2019-08-15

Publications (1)

Publication Number Publication Date
WO2021027331A1 true WO2021027331A1 (fr) 2021-02-18

Family

ID=68889757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087619 Ceased WO2021027331A1 (fr) 2019-08-15 2020-04-28 Appareil et procédé de calcul de relation complète faisant appel à des données de graphiques, dispositif et support d'enregistrement

Country Status (2)

Country Link
CN (1) CN110609924A (fr)
WO (1) WO2021027331A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110609924A (zh) * 2019-08-15 2019-12-24 深圳壹账通智能科技有限公司 基于图数据的全量关系计算方法、装置、设备及存储介质
CN111984828B (zh) * 2020-06-30 2024-07-23 联想(北京)有限公司 一种邻居节点检索方法和装置
CN112528090B (zh) * 2020-12-11 2023-08-04 北京百度网讯科技有限公司 图数据的存储方法和存储装置
CN116308825B (zh) * 2023-03-15 2026-04-28 平安科技(深圳)有限公司 基于孪生节点的图数据时序处理方法、装置、设备及介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952032A (zh) * 2015-06-19 2015-09-30 清华大学 图的处理方法、装置以及栅格化表示及存储方法
CN106919628A (zh) * 2015-12-28 2017-07-04 阿里巴巴集团控股有限公司 一种图数据的处理方法和装置
US20180203897A1 (en) * 2017-01-18 2018-07-19 Oracle International Corporation Fast graph query engine optimized for typical real-world graph instances whose small portion of vertices have extremely large degree
CN109446362A (zh) * 2018-09-05 2019-03-08 北京费马科技有限公司 基于外存的图数据库结构、图数据存储方法、装置
CN110609924A (zh) * 2019-08-15 2019-12-24 深圳壹账通智能科技有限公司 基于图数据的全量关系计算方法、装置、设备及存储介质

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952032A (zh) * 2015-06-19 2015-09-30 清华大学 图的处理方法、装置以及栅格化表示及存储方法
CN106919628A (zh) * 2015-12-28 2017-07-04 阿里巴巴集团控股有限公司 一种图数据的处理方法和装置
US20180203897A1 (en) * 2017-01-18 2018-07-19 Oracle International Corporation Fast graph query engine optimized for typical real-world graph instances whose small portion of vertices have extremely large degree
CN109446362A (zh) * 2018-09-05 2019-03-08 北京费马科技有限公司 基于外存的图数据库结构、图数据存储方法、装置
CN110609924A (zh) * 2019-08-15 2019-12-24 深圳壹账通智能科技有限公司 基于图数据的全量关系计算方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN110609924A (zh) 2019-12-24

Similar Documents

Publication Publication Date Title
US20220237184A1 (en) Dynamically assigning queries to secondary query processing resources
CN110543586B (zh) 多重用户身份融合方法、装置、设备及存储介质
US20220269680A1 (en) Context dependent execution time prediction for redirecting queries
CN102236581B (zh) 用于数据中心的映射化简方法和系统
CN104809242B (zh) 一种基于分布式结构的大数据聚类方法和装置
WO2021027331A1 (fr) Appareil et procédé de calcul de relation complète faisant appel à des données de graphiques, dispositif et support d'enregistrement
CN110147357A (zh) 一种基于大数据环境下的多源数据聚合抽样方法及系统
CN104820708B (zh) 一种基于云计算平台的大数据聚类方法和装置
CN112925859A (zh) 数据存储方法和装置
CN112148693A (zh) 一种数据处理方法、装置及存储介质
CN113687964B (zh) 数据处理方法、装置、电子设备、存储介质及程序产品
CN107832407A (zh) 用于生成知识图谱的信息处理方法、装置和可读存储介质
US20220253222A1 (en) Data reduction method, apparatus, computing device, and storage medium
US11768814B2 (en) Data transmissions between two databases
US20200104425A1 (en) Techniques for lossless and lossy large-scale graph summarization
US20200159594A1 (en) Systems and methods for dynamic partitioning in distributed environments
CN114398520A (zh) 数据检索方法、系统、装置、电子设备及存储介质
CN106156319A (zh) 可伸缩的分布式的资源描述框架数据存储方法及装置
Elagib et al. Big data analysis solutions using MapReduce framework
CN111737206B (zh) 一种文件重删处理方法、系统、终端及存储介质
Marcu et al. Storage and ingestion systems in support of stream processing: a survey
Cao et al. LogKV: Exploiting key-value stores for event log processing
EP3559797A1 (fr) Indices meta-join et meta-group-by pour données volumineuses
CN108090186A (zh) 一种大数据平台上的电力数据去重方法
US11249901B2 (en) Ownership-based garbage collection of data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20852001

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20852001

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 03.08.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 20852001

Country of ref document: EP

Kind code of ref document: A1