WO2024103714A1 - Data processing method, system, apparatus, and related device - Google Patents
Data processing method, system, apparatus, and related device
- Publication number
- WO2024103714A1 (PCT/CN2023/100673)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- metadata
- model
- management device
- permission
- mapping relationship
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6209—Protecting access to data via a platform, e.g. using keys or access control rules to a single file or object, e.g. in a secure envelope, encrypted and accessed using a key, or with access control rules appended to the object itself
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/258—Data format conversion from or to a database
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/604—Tools and structures for managing or administering access control systems
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/629—Protecting access to data via a platform, e.g. using keys or access control rules to features or functions of an application
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2141—Access rights, e.g. capability lists, access control lists, access tables, access matrices
Definitions
- the present application relates to the field of big data technology, and in particular to a data processing method, system, device and related equipment.
- Storage-computing separation architecture refers to a layered architecture that separates storage capacity from computing capacity and interconnects them through a network. It has become one of the mainstream technology trends in recent years. Among them, the storage-computing separation architecture includes a storage layer and a computing layer.
- the storage layer includes at least one storage hardware for persistent storage of data; in actual applications, the amount of data stored in the storage layer is large, forming a data lake.
- the computing layer includes at least one computing engine for reading and writing data on the storage layer and performing corresponding calculations.
- the metadata corresponding to the data in the storage layer is deployed in the computing layer. This means that when the computing layer includes multiple computing engines, the metadata must be copied and configured in each computing engine so that the computing engines can share the data in the same storage layer. However, copying and migrating metadata between different computing engines creates redundant data and easily causes data inconsistency.
- a management layer (or data analysis layer) can be added to the storage-computing separation architecture.
- the management layer is connected to the computing layer and the storage layer through the network to achieve unified management of the metadata corresponding to the data in the storage layer.
- Each computing engine in the computing layer obtains metadata through the metadata model and permission model in the management layer, so as to use the metadata to implement operations such as reading and writing data in the storage layer.
- the metadata model refers to the metadata structure adopted by the management layer
- the permission model refers to the permission definition corresponding to the metadata structure.
- When the management layer connects to a computing engine of the computing layer, the computing engine must be able to adapt to the metadata model and permission model fixed in the management layer. Some computing engines cannot adapt to these built-in models and therefore have difficulty accessing the storage layer, which limits the scalability of the management layer in connecting to computing engines.
- the embodiment of the present application provides a data processing method to improve the scalability of the data processing system docking with the computing engine.
- the present application also provides a corresponding data processing system, a management device, a computing device cluster, a computer-readable storage medium, and a computer program product.
- an embodiment of the present application provides a data processing method, which is applied to a data processing system, wherein the data processing system includes a computing engine, a management device, and a storage device, and the management device is equipped with a first metadata model and a first permission model, and the first metadata model and the first permission model are adapted to the storage device.
- the management device receives an access request, sent by the computing engine, for metadata of the target data and, in response to the access request, determines the metadata of the target data according to a first mapping relationship. The first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, and the determined metadata of the target data satisfies the second metadata model. The management device also authenticates the access request according to a second mapping relationship, which is a mapping relationship between the first permission model and a second permission model, the second permission model being adapted to the computing engine.
- after the access request passes authentication, the management device sends the metadata of the target data to the computing engine.
- the computing engine can adapt to the metadata model and permission model built into the management device through the mapping between metadata models and the mapping between permission models. This removes the restriction that a computing engine must adapt directly to the models built into the management device, and improves the scalability of the data processing system in docking with computing engines.
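The metadata translation described above can be illustrated with a minimal Python sketch. It assumes the first mapping relationship can be represented as a dictionary from field names of the first metadata model to field names of the second metadata model; the application does not specify a data structure, and all field names and values here (`database`, `schema`, `location`, `path`, and so on) are hypothetical.

```python
# Hypothetical first mapping relationship: field names of the first metadata
# model (built into the management device) mapped to field names of the
# second metadata model (adapted to the computing engine).
FIRST_MAPPING = {
    "database": "schema",
    "table": "table",
    "location": "path",
}


def to_second_model(first_metadata: dict, mapping: dict) -> dict:
    """Translate metadata satisfying the first model into the second model.

    Fields without an entry in the mapping are dropped.
    """
    return {
        mapping[field]: value
        for field, value in first_metadata.items()
        if field in mapping
    }


# Metadata as stored by the management device (illustrative values).
stored = {"database": "sales", "table": "orders", "location": "/lake/sales/orders"}
translated = to_second_model(stored, FIRST_MAPPING)
```

A real mapping between models would likely also need to handle nested structures and value conversions, which this sketch leaves out.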
- the management device can also receive a metadata update request sent by the computing engine and, in response to the metadata update request, translate the original metadata carried in the request into target metadata according to the first mapping relationship, where the original metadata satisfies the second metadata model and the target metadata satisfies the first metadata model. The management device authenticates the metadata update request according to the second mapping relationship and, after the request passes authentication, updates the target metadata to the management device; for example, the target metadata can be persistently stored.
- the computing engine can update the metadata in the management device through the mapping relationship between the metadata model and the permission model, so that the computing engine can subsequently write new data to the storage device.
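The update path can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the mappings are plain dictionaries, the grant table and the names `handle_update`, `PermissionDenied`, `MODIFY`, and `ALTER` are all hypothetical, and an in-memory dictionary stands in for persistent storage.

```python
class PermissionDenied(Exception):
    """Raised when a metadata update request fails authentication."""


# Hypothetical mappings, read in the engine-to-management-device direction.
FIELD_MAP_2_TO_1 = {"schema": "database", "table": "table", "path": "location"}
PERMISSION_MAP_2_TO_1 = {"MODIFY": "ALTER"}

# Hypothetical grants expressed in the first (built-in) permission model.
GRANTS = {("alice", "ALTER")}


def handle_update(store: dict, user: str, engine_permission: str,
                  original_metadata: dict) -> dict:
    # Translate original metadata (second model) into target metadata (first model).
    # An unmapped field raises KeyError in this sketch.
    target = {FIELD_MAP_2_TO_1[k]: v for k, v in original_metadata.items()}
    # Translate the engine-side permission and authenticate the request.
    builtin_permission = PERMISSION_MAP_2_TO_1.get(engine_permission)
    if (user, builtin_permission) not in GRANTS:
        raise PermissionDenied(f"{user} may not {engine_permission}")
    # Update the target metadata only after authentication passes.
    store[(target["database"], target["table"])] = target
    return target


store = {}
handle_update(store, "alice", "MODIFY",
              {"schema": "sales", "table": "orders", "path": "/lake/sales/orders_v2"})
```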
- the management device may also output a configuration interface, which may be presented to the user through a client provided to the outside by the data processing system, so that the management device responds to a first operation performed by the user on the configuration interface, establishes a first mapping relationship between the first metadata model and the second metadata model, and responds to a second operation performed by the user on the configuration interface, establishes a second mapping relationship between the first permission model and the second permission model.
- the management device may also generate an access control policy for the second metadata model and the second permission model in response to a third operation performed by the user on the configuration interface, so as to utilize the access control policy to implement constraints on access operations of the computing engine. In this way, the convenience of user configuration can be improved through interface interaction.
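The three configuration operations can be pictured as updates to a per-engine configuration object. The sketch below is an assumption-laden illustration: the class names `EngineConfig` and `Policy`, the method names, and the (principal, resource, action) shape of a policy are all hypothetical, since the application does not define the format of the mappings or of the access control policy.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Policy:
    """Hypothetical access control policy, expressed in engine-side terms."""
    principal: str
    resource: str  # named according to the second metadata model
    action: str    # named according to the second permission model


@dataclass
class EngineConfig:
    metadata_mapping: dict = field(default_factory=dict)
    permission_mapping: dict = field(default_factory=dict)
    policies: set = field(default_factory=set)

    def first_operation(self, mapping: dict) -> None:
        # Establish the first mapping relationship (metadata models).
        self.metadata_mapping = dict(mapping)

    def second_operation(self, mapping: dict) -> None:
        # Establish the second mapping relationship (permission models).
        self.permission_mapping = dict(mapping)

    def third_operation(self, policy: Policy) -> None:
        # Generate an access control policy for the second models.
        self.policies.add(policy)

    def allows(self, principal: str, resource: str, action: str) -> bool:
        return Policy(principal, resource, action) in self.policies


config = EngineConfig()
config.first_operation({"database": "schema"})
config.second_operation({"SELECT": "READ"})
config.third_operation(Policy("alice", "sales.orders", "READ"))
```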
- when the data processing system includes multiple computing engines, the management device includes a metadata model and a permission model adapted to each of the multiple computing engines. In this way, the management device can dock with multiple computing engines with less adaptation effort, further improving the scalability of the data processing system in docking with computing engines.
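One way to picture per-engine models is a registry keyed by engine, so that the same stored metadata is presented differently to each engine. This is only a sketch: the engine names, field names, and the dictionary-based registry are hypothetical and are not taken from the application.

```python
# Hypothetical registry: one metadata mapping and one permission mapping per
# computing engine. All names are illustrative only.
ENGINE_MODELS = {
    "sql_engine": {
        "metadata": {"database": "schema", "table": "table"},
        "permission": {"READ": "SELECT"},
    },
    "ai_engine": {
        "metadata": {"database": "dataset_group", "table": "dataset"},
        "permission": {"LOAD": "SELECT"},
    },
}


def translate_for(engine: str, first_metadata: dict) -> dict:
    """Translate built-in metadata into the model adapted to the given engine."""
    mapping = ENGINE_MODELS[engine]["metadata"]
    return {mapping[f]: v for f, v in first_metadata.items() if f in mapping}


record = {"database": "sales", "table": "orders"}
sql_view = translate_for("sql_engine", record)
ai_view = translate_for("ai_engine", record)
```

Because each engine only sees its own view, the metadata itself is kept once in the management device rather than copied per engine.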
- when the management device determines the metadata of the target data according to the first mapping relationship, it may first read first metadata according to the access request, the first metadata satisfying the first metadata model, and then translate the first metadata, according to the first mapping relationship, into second metadata that satisfies the second metadata model (that is, the aforementioned metadata of the target data). When authenticating the access request, the management device may translate first permission information in the access request, which satisfies the second permission model, into second permission information that satisfies the first permission model according to the second mapping relationship, and then authenticate the second permission information. The management device then sends the second metadata to the computing engine after the second permission information passes authentication.
- the management device can determine the metadata required for the computing engine based on the first mapping relationship, and authenticate the computing engine based on the second mapping relationship, so as to achieve adaptation between the management device and the computing engine, thereby improving the scalability of the data processing system docking with the computing engine.
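The order of steps in this implementation (read first metadata, translate it, translate the permission information, authenticate, then send) can be sketched as below. The catalog contents, the grant table, the permission names `READ` and `SELECT`, and the function `handle_access` are hypothetical; returning `None` on a failed authentication is likewise an assumption of this sketch.

```python
from typing import Optional

# Hypothetical mapping relationships.
FIRST_MAPPING = {"database": "schema", "table": "table", "location": "path"}
SECOND_MAPPING = {"READ": "SELECT"}  # second permission model -> first permission model

# Hypothetical metadata held by the management device (first metadata model).
CATALOG = {
    ("sales", "orders"): {
        "database": "sales", "table": "orders", "location": "/lake/sales/orders",
    },
}
# Hypothetical grants expressed in the first permission model.
GRANTS = {("alice", "SELECT", ("sales", "orders"))}


def handle_access(user: str, first_permission: str, key: tuple) -> Optional[dict]:
    # 1. Read the first metadata according to the access request.
    first_metadata = CATALOG[key]
    # 2. Translate it into second metadata via the first mapping relationship.
    second_metadata = {FIRST_MAPPING[f]: v for f, v in first_metadata.items()}
    # 3. Translate the first permission information (second permission model)
    #    into second permission information (first permission model).
    second_permission = SECOND_MAPPING.get(first_permission)
    # 4. Authenticate the second permission information.
    if (user, second_permission, key) not in GRANTS:
        return None
    # 5. Send the second metadata only after authentication passes.
    return second_metadata


result = handle_access("alice", "READ", ("sales", "orders"))
```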
- the present application provides a data processing system, which includes a computing engine, a management device, and a storage device.
- the management device has a first metadata model and a first permission model built in.
- the first metadata model and the first permission model are adapted to the storage device;
- the computing engine is used to generate an access request for the metadata of the target data and send the access request to the management device;
- the management device is used to respond to the access request, determine the metadata of the target data according to the first mapping relationship, and authenticate the access request according to the second mapping relationship; after the access request is authenticated, the metadata of the target data is sent to the computing engine, wherein the first mapping relationship is a mapping relationship between the first metadata model and the second metadata model, the second metadata model is adapted to the computing engine, the metadata of the target data satisfies the second metadata model, and the second mapping relationship is a mapping relationship between the first permission model and the second permission model; the computing engine is also used to read the target data stored in the storage device according to the metadata of the target data.
- the computing engine is further used to generate a metadata update request and send the metadata update request to the management device; the management device is further used to respond to the metadata update request, translate original metadata carried in the metadata update request into target metadata according to the first mapping relationship, the original metadata satisfies the second metadata model, the target metadata satisfies the first metadata model, authenticate the metadata update request according to the second mapping relationship, and update the target metadata to the management device after the metadata update request passes the authentication.
- the management device is further configured to output a configuration interface, and establish a first mapping relationship in response to a first operation performed by a user on the configuration interface, and establish a second mapping relationship in response to a second operation performed by a user on the configuration interface.
- the management device is further configured to generate an access control policy for the second metadata model and the second permission model in response to a third operation performed by the user on the configuration interface.
- the data processing system includes multiple computing engines
- the management device includes a metadata model and a permission model adapted for each of the multiple computing engines.
- the metadata of the target data is the second metadata
- the management device is specifically used to read the first metadata according to the access request, the first metadata satisfies the first metadata model, translate the first metadata into the second metadata that satisfies the second metadata model according to the first mapping relationship, translate the first permission information in the access request into the second permission information that satisfies the first permission model according to the second mapping relationship, the first permission information satisfies the second permission model, and authenticate the second permission information; and after the second permission information is authenticated, send the second metadata to the computing engine.
- the present application provides a management device, which is applied to a data processing system.
- the data processing system also includes a computing engine and a storage device.
- the management device is equipped with a first metadata model and a first permission model, and the first metadata model and the first permission model are adapted to the storage device;
- the management device includes: an interaction module, which is used to receive an access request for metadata of target data sent by the computing engine, and the target data is stored in the storage device; a metadata determination module, which is used to respond to the access request and determine the metadata of the target data according to a first mapping relationship, the first mapping relationship is a mapping relationship between the first metadata model and the second metadata model, the second metadata model is adapted to the computing engine, and the metadata of the target data satisfies the second metadata model; an authentication module, which is used to authenticate the access request according to the second mapping relationship, the second mapping relationship is a mapping relationship between the first permission model and the second permission model, and the second permission model is adapted to the computing engine;
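- the interaction module is further used to send the metadata of the target data to the computing engine after the access request passes authentication.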
- the interaction module is further used for the management device to receive a metadata update request sent by the computing engine;
- the metadata determination module is further used for responding to the metadata update request, translating the original metadata carried in the metadata update request into target metadata according to the first mapping relationship, the original metadata satisfying the second metadata model, and the target metadata satisfying the first metadata model;
- the authentication module is further used to authenticate the metadata update request according to the second mapping relationship;
- the interaction module is further used to update the target metadata to the management device after the metadata update request passes the authentication.
- the interaction module is also used to output the configuration interface;
- the management device also includes: a mapping module, used to respond to a first operation performed by the user on the configuration interface, establish a first mapping relationship, and respond to a second operation performed by the user on the configuration interface, establish a second mapping relationship.
- the mapping module is further configured to generate an access control policy for the second metadata model and the second permission model in response to a third operation performed by the user on the configuration interface.
- the data processing system includes multiple computing engines
- the management device includes a metadata model and a permission model adapted for each of the multiple computing engines.
- the metadata of the target data is the second metadata
- the metadata determination module is specifically used to read the first metadata according to the access request, the first metadata satisfying the first metadata model, and to translate the first metadata into the second metadata that satisfies the second metadata model according to the first mapping relationship;
- the authentication module is specifically used to translate the first permission information in the access request into the second permission information that satisfies the first permission model according to the second mapping relationship, the first permission information satisfying the second permission model, and to authenticate the second permission information;
- the interaction module is specifically used to send the second metadata to the computing engine after the second permission information is authenticated.
- the management device provided in the third aspect corresponds to the data processing method provided in the first aspect. Therefore, the technical effects of the third aspect and any implementation method of the third aspect can refer to the technical effects of the first aspect or the corresponding implementation method of the first aspect.
- the present application provides a computing device cluster, wherein the computing device cluster includes at least one computing device, and the at least one computing device includes at least one processor and at least one memory; the at least one memory is used to store instructions, and the at least one processor executes the instructions stored in the at least one memory, so that the computing device cluster executes the data processing method in the above-mentioned first aspect or any possible implementation of the first aspect.
- the memory can be integrated into the processor or can be independent of the processor.
- the at least one computing device may also include a bus.
- the processor is connected to the memory via a bus.
- the memory may include a readable memory and a random access memory.
- the present application provides a computer-readable storage medium, wherein instructions are stored in the computer-readable storage medium, and when the instructions are run on at least one computing device, the at least one computing device executes the method described in the first aspect or any one of the implementations of the first aspect.
- the present application provides a computer program product comprising instructions, which, when executed on at least one computing device, enables the at least one computing device to execute the method described in the first aspect or any one of the implementations of the first aspect.
- FIG. 1 is a schematic diagram of the structure of an exemplary data processing system provided by the present application;
- FIG. 2 is a schematic diagram of the data structure of the metadata model 1 built into the management device 102 provided by the present application;
- FIG. 3 is a schematic diagram of a permission model 1 built into the management device 102 provided by the present application;
- FIG. 4 is a schematic diagram of an exemplary access control strategy provided by the present application;
- FIG. 5 is a schematic diagram of a data structure of a metadata model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 6 is a schematic diagram of the data structure of another metadata model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 7 is a schematic diagram of a permission model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 8 is a schematic diagram of another permission model 2 adapted to the computing engine 1012 provided by the present application;
- FIG. 9 is a schematic diagram of establishing a mapping between metadata model 1 and metadata model 2 provided in the present application;
- FIG. 10 is a schematic diagram of establishing a mapping between a permission model 1 and a permission model 2 provided by the present application;
- FIG. 11 is a schematic diagram of establishing a mapping between another permission model 1 and a permission model 2 provided in the present application;
- FIG. 12 is a schematic diagram of an access control policy defined for metadata model 2 provided by the present application;
- FIG. 13 is a schematic diagram of another access control policy defined for metadata model 2 provided by the present application;
- FIG. 14 is a flow chart of an exemplary data processing method provided by the present application;
- FIG. 15 is a schematic diagram of the structure of a computing device provided by the present application;
- FIG. 16 is a schematic diagram of the structure of a computing device cluster provided in the present application.
- the data processing system 100 includes a computing device 101, a management device 102, and a storage device 103, and the computing device 101, the management device 102, and the storage device 103 can be connected to each other through a network.
- the computing device 101 may include one or more computing engines, such as a structured query language (SQL) computing engine, an artificial intelligence (AI) computing engine, and a third-party computing engine. A computing engine may come from an open source community or may be a commercial computing engine launched by a cloud vendor. Taking the SQL computing engine as an example, it may specifically be a Presto engine, a Hive engine, a Spark engine, a Clickhouse engine, etc., and these types of computing engines all have open source community versions and commercial versions. For ease of understanding, FIG. 1 takes the computing device 101 including computing engine 1011 and computing engine 1012 as an example. The computing engines 1011 and 1012 belong to different types of computing engines. In other embodiments, the computing device 101 may include any number and any type of computing engines. The computing device 101 is used to read and write data in the storage device 103 through the computing engines it includes.
- the management device 102 has a built-in fixed metadata model 1 and a permission model 1.
- the metadata model 1 and the permission model 1 are adapted to the storage device 103, so that the management device 102 can use the metadata model 1 to manage the metadata corresponding to the data stored in the storage device 103, and use the permission model 1 to authenticate the processing operations on the metadata.
- the metadata is used to describe the attribute information of the data stored in the storage device 103, such as the directory to which the data belongs, the database to which the data belongs, the storage location of the data, the storage format, the compression algorithm used, the partition to which the data belongs, and other information.
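The attributes listed above can be pictured as a simple record type. The sketch below is illustrative only: the class name `DataMetadata`, the field names, and the example values are assumptions, while the set of attributes follows the list in the text.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass(frozen=True)
class DataMetadata:
    """Hypothetical record holding the attribute information of stored data."""
    directory: str                     # directory to which the data belongs
    database: str                      # database to which the data belongs
    storage_location: str              # where the data is stored
    storage_format: str                # format of the stored data
    compression: Optional[str] = None  # compression algorithm used, if any
    partition: Optional[str] = None    # partition to which the data belongs


# Illustrative values only.
example = DataMetadata(
    directory="catalog_a",
    database="sales",
    storage_location="/lake/sales/orders",
    storage_format="parquet",
    compression="snappy",
    partition="dt=2023-06-16",
)
as_dict = asdict(example)
```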
- the storage device 103 is used for persistent storage of data, such as storing data uploaded by one or more users.
- the storage device 103 can store data in a data block format, a file format, or an object format, or adopt other storage methods such as column storage and message queues, which are not limited in this embodiment.
- When a computing engine (such as SQL computing engine 1011 or AI computing engine 1012) in computing device 101 needs to access the data in storage device 103, it usually needs to first obtain the metadata corresponding to the data through management device 102, so as to access the data in storage device 103 based on the metadata. Consequently, a computing engine deployed in computing device 101 is required to adapt to the metadata model 1 and permission model 1 built into management device 102; otherwise, the computing engine cannot interact with storage device 103 through management device 102. In actual application scenarios, some computing engines may therefore need a complex adaptation process to dock with management device 102, which is difficult and time-consuming. Some computing engines cannot adapt to the metadata model 1 and permission model 1 built into management device 102 at all, which limits the scalability of docking computing engines with management device 102 in data processing system 100.
- the present application realizes the connection between the computing engine and the management device 102 by mapping the metadata model 2 and the permission model 2, to which the computing engine can adapt, onto the metadata model 1 and the permission model 1 in the management device 102, respectively.
- the metadata model 2 and the permission model 2 that the computing engine can adapt to can be deployed in the management device 102, and a mapping relationship 1 between the metadata model 2 and the metadata model 1 and a mapping relationship 2 between the permission model 2 and the permission model 1 can be established.
- the computing engine in the computing device 101 can indirectly adapt to the metadata model 1 built into the management device 102 through the metadata model 2 that it adapts to, and indirectly adapt to the permission model 1 built into the management device 102 through the permission model 2 that it adapts to, so that the computing engine can access the management device 102.
- the data processing system 100 can be deployed in the cloud to provide users with cloud services for data processing, such as cloud services for data computing and data storage.
- the computing device 101, the management device 102, and the storage device 103 in the data processing system 100 can be implemented by a computing device or a computing device cluster in the cloud, respectively; the computing device 101, the management device 102, and the storage device 103 can also be deployed on the same computing device, or deployed in the same computing device cluster.
- the data processing system 100 can be deployed locally, so as to provide users with local data processing services.
- the computing device 101 and the management device 102 in the above data processing system 100 can be implemented by software or hardware respectively.
- computing device 101 may include code running on a computing instance.
- the computing instance may include at least one of a host, a virtual machine, and a container.
- the above-mentioned computing instance may be one or more.
- computing device 101 may include code running on multiple hosts/virtual machines/containers.
- the multiple hosts/virtual machines/containers used to run the code may be distributed in the same region or in different regions.
- the multiple hosts/virtual machines/containers used to run the code may be distributed in the same availability zone (AZ) or in different AZs.
- each AZ includes one data center or multiple geographically close data centers.
- a region can include multiple AZs.
- multiple hosts/virtual machines/containers used to run the code can be distributed in the same virtual private cloud (VPC) or in multiple VPCs.
- a VPC is set up in a region.
- a communication gateway needs to be set up in each VPC to achieve interconnection between VPCs through the communication gateway.
- Taking the computing device 101 as an example of a hardware functional unit, the computing device 101 may include at least one computing device, such as a server. Alternatively, the computing device 101 may also be a device implemented by an application-specific integrated circuit (ASIC) or a programmable logic device (PLD).
- the PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), a generic array logic (GAL), a data processing unit (DPU), or any combination thereof.
- the multiple computing devices may be distributed in the same region or in different regions.
- the multiple computing devices included in the computing device 101 may be distributed in the same AZ or in different AZs.
- the multiple computing devices included in the computing device 101 may be distributed in the same VPC or in multiple VPCs.
- the multiple computing devices may be any combination of computing devices such as servers, ASICs, PLDs, CPLDs, FPGAs, and GALs.
- the management device 102 is similar to the computing device 101.
- when implemented by software, the management device 102 may be code running on a computing instance; when implemented by hardware, the management device 102 may include one or more computing devices.
- the storage device 103 is implemented by hardware and may include at least one storage device with data storage capability, such as one or more storage servers, or may include a device with a persistent storage medium, etc.
- the persistent storage medium may be, for example, a hard disk (such as an SSD or an HDD).
- the data processing system 100 shown in FIG. 1 is only used as an exemplary illustration. In actual application, the data processing system 100 may also have other implementations.
- the management device 102 in the data processing system 100 may include more numbers or types of metadata models and permission models; or, the metadata model 2 and the permission model 2 adapted to the computing engine in the computing device 101 may also be configured outside the management device 102, such as configured in the computing device 101, etc., and this embodiment does not limit this.
- the management device 102 in the data processing system 100 is fixedly configured with a metadata model 1 and a permission model 1, which may be, for example, a metadata model and a permission model adapted to an open source Hive engine.
- the metadata model 1 built into the management device 102 adopts the metadata structure shown in Figure 2.
- the metadata model includes information such as catalog, database, data table, function, column, partition, row, and location information.
- the directory is the top-level data structure resource in the storage device 103, that is, the largest namespace.
- a directory may include N1 databases, where N1 is a natural number.
- the database is the data structure resource at the next level of the directory.
- the lower-level data structure resources of the database include data tables and functions, and a database may include N2 data tables and N3 functions, where N2 and N3 are both natural numbers.
- a data table, which usually includes a view and an index, is a lower-level data structure resource of a database.
- the lower-level data structure resources of the data table span two dimensions: one is the vertical data organization “column”, and the other is the horizontal data organization “partition” or “row”.
- the data table and its lower-level data structure resource “partition” are mapped to a “location”, which indicates where the underlying data of the data table and the partition is stored in the storage device 103.
- functions, including built-in functions and user-defined scalar functions (UDFs), are lower-level data structure resources of the database and are also mapped to a “location”. This “location” indicates where the software package (JAR package) of the function's implementation class (such as a Java implementation class) is stored.
- the metadata model 1 may also adopt other metadata structures adapted to the storage device 103, and this embodiment does not limit this.
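The hierarchy described above for the metadata model 1 can be sketched as a map from each level to the levels directly below it. This is only an illustrative sketch: the names `METADATA_MODEL_1`, `LOCATION_MAPPED`, and `levels_below` are hypothetical and do not come from the application.

```python
from typing import Dict, List

# Hypothetical sketch of the metadata model 1 hierarchy (FIG. 2).
# Each level maps to the levels directly below it; identifiers are illustrative.
METADATA_MODEL_1: Dict[str, List[str]] = {
    "catalog": ["database"],
    "database": ["table", "function"],
    "table": ["column", "partition", "row"],
    "function": [],
    "column": [],
    "partition": [],
    "row": [],
}

# Levels that the text describes as being mapped to a storage location.
LOCATION_MAPPED = {"table", "partition", "function"}


def levels_below(level: str) -> List[str]:
    """Return every level nested under `level`, in depth-first order."""
    nested: List[str] = []
    for child in METADATA_MODEL_1[level]:
        nested.append(child)
        nested.extend(levels_below(child))
    return nested
```

For instance, `levels_below("database")` walks the data table branch first and then the function branch.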
- the permission model 1 built into the management device 102 may be as shown in FIG. 3 .
- the permissions corresponding to the directory may include “all”, “create database”, “modify (alter)”, “create catalog”, “list all databases”, etc.
- the permissions corresponding to the database may include: “all”, “create table”, “modify”, “delete (drop, delete structure)”, “describe”, “list table”, “list function”, “list database”, “list all database”, etc.
- the permissions corresponding to the data table may include: “All”, “Modify”, “Delete (structure)”, “Describe”, “Update (update)”, “Insert (insert)”, “Delete (delete, delete data)”, “Query (select)”, etc.
- the permissions corresponding to the columns may include: “All”, “Query”, etc.
- the permissions corresponding to the function may include: “all”, “create”, “execute”, “delete (structure)”, etc.
- the permissions corresponding to the location may include: “read”, “write”, etc.
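One way to picture the permission model 1 is as a table from metadata level to the set of permissions defined for that level. The sketch below lists only the permissions named in the text; the identifier spellings (for example `create_database`, `select`) and the helper `is_valid_permission` are assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical encoding of permission model 1 (FIG. 3).
# Only permissions named in the text are listed; the text ends each list with "etc.".
PERMISSION_MODEL_1: Dict[str, Set[str]] = {
    "catalog": {"all", "create_database", "alter", "create_catalog",
                "list_all_databases"},
    "database": {"all", "create_table", "alter", "drop", "describe",
                 "list_table", "list_function", "list_database",
                 "list_all_databases"},
    "table": {"all", "alter", "drop", "describe", "update", "insert",
              "delete", "select"},
    "column": {"all", "select"},
    "function": {"all", "create", "execute", "drop"},
    "location": {"read", "write"},
}


def is_valid_permission(level: str, permission: str) -> bool:
    """Check whether `permission` is defined for `level` in this sketch."""
    return permission in PERMISSION_MODEL_1.get(level, set())
```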
- the management device 102 may also include an access control policy for the metadata model 1 and the permission model 1, which is used to indicate the permission content required for the computing engine to call the application programming interface (API) to access the metadata in the management device 102.
- the access control policy may be as shown in FIG. 4 .
- the required permission is the "list” permission or the "all” permission.
- the required permission is the global "create directory" permission.
- the permission policies indicate the operations allowed, the objects of the operations, and the subjects who request the operations.
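The relationship between an access control policy (which permission an API requires) and permission policies (which subject may perform which operation on which object) can be sketched as follows. FIG. 4 is not reproduced in the text, so the API names `list_databases` and `create_catalog`, the `PermissionPolicy` type, and the function `is_allowed` are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple, Set


class PermissionPolicy(NamedTuple):
    subject: str     # who requests the operation, e.g. a computing engine
    resource: str    # object of the operation, e.g. a catalog name
    permission: str  # operation that is allowed


# API name -> permissions, any one of which is sufficient (assumed semantics).
ACCESS_CONTROL_POLICY: Dict[str, Set[str]] = {
    "list_databases": {"list", "all"},
    "create_catalog": {"create_catalog"},
}


def is_allowed(subject: str, api: str, resource: str,
               policies: Iterable[PermissionPolicy]) -> bool:
    """Return True if `subject` holds a permission that `api` requires."""
    required = ACCESS_CONTROL_POLICY[api]
    granted = {p.permission for p in policies
               if p.subject == subject and p.resource == resource}
    return bool(required & granted)
```

In this sketch a request is allowed as soon as one granted permission matches one required permission; a real deployment might additionally treat "all" as implying every other permission.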
- the computing engine 1011 can access the management device 102 based on the metadata model 1 and the permission model 1, so that the computing engine 1011 can be connected to the management device 102 without additional configuration operations.
- when the computing engine 1012 is deployed in the computing device 101, it is not compatible with the metadata model 1 and the permission model 1 built into the management device 102.
- the computing engine 1012 may be a Presto engine, etc.
- a metadata model 2 and a permission model 2 compatible with the computing engine 1012 are deployed in the management device 102, and a mapping relationship 1 is established between the metadata model 2 and the metadata model 1, and a mapping relationship 2 is established between the permission model 2 and the permission model 1.
- the deployed metadata model 2 and permission model 2 may be user-defined models, or may be known models adapted to the computing engine 1012 .
- a user may request to deploy a computing engine 1012 in the data processing system 100 , and provide the data processing system 100 with a metadata model 2 and a permission model 2 customized for the computing engine 1012 .
- the user-defined metadata model may adopt the metadata structure shown in FIG5, where the metadata model 2 includes a directory, a schema, a data table, a UDF (custom scalar function), a column, a partition, a row, and a position.
- the user-defined metadata model may adopt the metadata structure shown in FIG6, where the metadata model 2 includes a database, a data table, a view, a row, and a position.
- the user can also define the permission model 2 shown in Figure 7.
- the permissions corresponding to the directory can include “all” or “administrator”.
- the permissions corresponding to the schema include “use”, “create”, “delete (structure)”, etc.
- the permissions corresponding to the UDF include “all”, “create”, “delete (structure)”, “modify”, “query”, etc.
- the permissions corresponding to the data table include “all”, “query”, “insert”, “delete (data)”, “modify”, “update”, etc. (the permissions corresponding to the remaining metadata are not shown).
- the permissions corresponding to the database may include “create”.
- the permissions corresponding to the view may include “query”, “create”, etc.
- the permissions corresponding to the data table may include “query”, “insert”, etc. (the permissions corresponding to the other metadata are not shown).
- the user may define a mapping relationship 1 between the metadata model 2 and the metadata model 1 built into the management device 102 .
- the data processing system 100 may present a client to the outside, and the client may be, for example, an application running on a user-side device, or a web browser provided to the outside by the data processing system 100.
- the management device 102 may include an interaction module 1021 and a mapping module 1022, wherein the interaction module 1021 may output a configuration interface to the client, and the client may present the configuration interface to the user.
- the user may perform a first operation on the configuration interface to establish a mapping relationship 1 between the metadata model 2 and the metadata model 1.
- the interaction module 1021 may feed back the first operation performed by the user to the mapping module 1022, and the mapping module 1022 may map a metadata level in the metadata model 2 to the corresponding metadata level in the metadata model 1 according to the first operation.
- mapping module 1022 may map the "schema" level in the metadata model 2 with the "database” level in the metadata model 1 based on the first operation performed by the user, or may map the "UDF" level in the metadata model 2 with the "function” level in the metadata model 1.
- mapping module 1022 can map the “database” level in metadata model 2 with the “directory” level and the “database” level in metadata model 1, respectively, based on the first operation performed by the user, or can map the “view” level in metadata model 2 with the “data table” level in metadata model 1. Then, mapping module 1022 also maps the remaining metadata levels in the two metadata models based on the already mapped levels.
- the mapping module 1022 may directly map the attribute that exists in the metadata model 2 but not in the metadata model 1 to an existing attribute in the metadata model 1 (that is, an existing metadata level, the same below) during the process of establishing the mapping relationship 1; and may map the attribute that does not exist in the metadata model 2 but exists in the metadata model 1 to an existing attribute in the metadata model 2.
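The mapping relationship 1 for the FIG. 5 style model can be sketched as a level-to-level dictionary that relabels each element of a metadata path. The sketch assumes a one-to-one mapping, so it does not cover the FIG. 6 case in which “database” in the metadata model 2 maps to both the “directory” and “database” levels. The names `LEVEL_MAP_2_TO_1`, `LEVEL_MAP_1_TO_2`, and `translate_path` are hypothetical.

```python
from typing import Dict, List, Tuple

Path = List[Tuple[str, str]]  # ordered (metadata level, object name) pairs

# Hypothetical mapping relationship 1: model 2 level -> model 1 level.
LEVEL_MAP_2_TO_1: Dict[str, str] = {
    "catalog": "catalog",
    "schema": "database",
    "table": "table",
    "udf": "function",
    "column": "column",
    "partition": "partition",
    "row": "row",
    "location": "location",
}

# The reverse direction is well defined because the map above is one-to-one.
LEVEL_MAP_1_TO_2: Dict[str, str] = {v: k for k, v in LEVEL_MAP_2_TO_1.items()}


def translate_path(path: Path, level_map: Dict[str, str]) -> Path:
    """Relabel every level in `path` using `level_map`; names are unchanged."""
    return [(level_map[level], name) for level, name in path]
```

Translating a path with `LEVEL_MAP_2_TO_1` and then with `LEVEL_MAP_1_TO_2` returns the original path, which is what lets the same stored metadata be presented in either model.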
- mapping module 1022 also establishes a mapping relationship 2 between the permission model 2 and the permission model 1 built into the management device 102 .
- the user can perform a second operation on the mapping relationship between the permission models on the configuration interface presented by the client.
- the interaction module 1021 can feed back the second operation to the mapping module 1022, and the mapping module 1022 establishes a mapping relationship 2 between the operation permissions corresponding to each metadata level in the metadata model 2 and the operation permissions corresponding to each metadata level in the metadata model 1 according to the second operation.
- the mapping module 1022 can establish a mapping relationship between the permission model 2 shown in FIG7 and the permission model 1.
- the operation permissions "all" and “administrator” for the directory in the permission model 2 can be mapped with the operation permissions "all", “create database”, “modify” and the like in the permission model 1, and a one-to-one mapping relationship, a many-to-one mapping relationship, or a one-to-many mapping relationship can be established between the operation permissions for the directory in the permission model 2 and the operation permissions for the directory in the permission model 1.
- the specific setting can be based on the needs of the actual application, and this embodiment does not limit this.
- the mapping relationship between the operation permissions corresponding to the other metadata levels can also be established in a similar manner.
- the mapping module 1022 can establish a mapping relationship between the permission model 2 shown in FIG8 and the permission model 1.
- the operation permissions "query” and “insert” of the data table in the permission model 2 can be mapped with the operation permissions "query”, “insert”, “delete (structure)", “describe” and other operation permissions in the permission model 1.
- all the operation permissions in the permission model 1 can be mapped according to the needs of the actual application, or only part of the operation permissions can be mapped, such as only establishing a one-to-one mapping relationship between the operation permissions "query” and “insert” of the data table in the permission model 2 and the operation permissions "query” and “insert” in the permission model 1.
- the mapping relationship between the operation permissions corresponding to the remaining metadata levels can also be established in a similar manner.
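The mapping relationship 2 can be sketched in the same spirit, except that one permission in the permission model 2 may map to several permissions in the permission model 1. The particular pairings below (for example, “administrator” mapping to “create database” and “modify”) are assumptions chosen for illustration; the text leaves the concrete pairing to the needs of the actual application. `PERMISSION_MAP_2_TO_1` and `translate_permission` are hypothetical names.

```python
from typing import Dict, Set, Tuple

Permission = Tuple[str, str]  # (metadata level, permission name)

# Hypothetical mapping relationship 2: model 2 permission -> model 1 permissions.
PERMISSION_MAP_2_TO_1: Dict[Permission, Set[Permission]] = {
    ("catalog", "all"): {("catalog", "all")},                      # one-to-one
    ("catalog", "administrator"): {("catalog", "create_database"),
                                   ("catalog", "alter")},          # one-to-many
    ("table", "query"): {("table", "select")},
    ("table", "insert"): {("table", "insert")},
}


def translate_permission(level: str, permission: str) -> Set[Permission]:
    """Return the model 1 permissions mapped from a model 2 permission.

    An unmapped permission yields an empty set, which a caller can treat
    as "nothing to grant or check".
    """
    return set(PERMISSION_MAP_2_TO_1.get((level, permission), set()))
```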
- the user can also define access control policies for calling metadata model 2 and permission model 2 for the computing engine 1012 to be deployed.
- the user can perform a third operation for the access control policy on the configuration interface presented by the client.
- the interaction module 1021 can feed back the third operation to the mapping module 1022, and the mapping module 1022 creates a corresponding access control policy based on the third operation.
- for each API that adds, deletes, modifies, or queries a metadata level in the metadata model 2, the mapping module 1022 can define, based on the permission model 2 corresponding to the metadata model 2, the permission that the API requires at that metadata level or at the level above it, so as to generate the corresponding access control policy.
- the mapping module 1022 can also define, based on the permission model 2, the permission requirements of the APIs that add, delete, modify, or query the permission policies corresponding to the permission model 2, so as to generate the corresponding access control policy.
- the access control policy generated by the mapping module 1022 based on the third operation may be as shown in FIG. 12 .
- the required permission is the "all" permission or the "administrator" permission.
- the access control policy generated by the mapping module 1022 based on the third operation may be as shown in FIG. 13 .
- the required permission is the global "Create" permission.
- the required permission for the database to which the data table belongs is the "query" permission.
- the access control policy defined by the mapping module 1022 based on the third operation performed by the user may also be an access control policy of other implementation modes, which is not limited in this embodiment.
- the management device 102 can provide the computing engine 1012 with metadata corresponding to the data in the storage device 103, or save the metadata corresponding to the data written by the computing engine 1012 to the storage device 103. The following describes these two processes in detail.
- when the computing engine 1012 needs to read data stored in the storage device 103 (hereinafter referred to as the target data), the computing engine 1012 can generate an access request for the metadata of the target data and send the access request to the management device 102. Specifically, it can call the API of the metadata model 2 and the permission model 2 in the management device 102, so that the management device 102 receives the access request through that API.
- the management device 102 includes an interaction module 1021, a metadata determination module 1023, and an authentication module 1024, as shown in FIG1 .
- the interaction module 1021 can provide the access request to the metadata determination module 1023 and the authentication module 1024.
- the metadata determination module 1023 responds to the access request, determines the metadata corresponding to the target data according to the metadata model 2 and the metadata model 1, and feeds the metadata back to the interaction module 1021.
- the authentication module 1024 authenticates the access request according to the permission model 2 and the permission model 1, and feeds back the authentication result to the interaction module 1021.
- the interaction module 1021 sends the metadata corresponding to the target data to the computing engine 1012.
- the access request received by the interaction module 1021 carries indication information for the target data, an execution operation, and an identifier of the computing engine 1012.
- the metadata determination module 1023 may specifically determine the metadata to be accessed by the computing engine 1012 according to the indication information carried in the access request, which is referred to as the first metadata below for the sake of distinction. Since the first metadata satisfies the metadata model 1, and the metadata model 1 is not compatible with the computing engine 1012, the metadata determination module 1023 may translate the first metadata into the second metadata that satisfies the metadata model 2 according to the mapping relationship 1 between the metadata model 1 and the metadata model 2. It can be understood that the metadata model 2 is compatible with the computing engine 1012, and therefore, the computing engine 1012 can identify the second metadata based on the data structure adopted by the metadata model 2.
- after generating the second metadata, the management device 102 does not send it to the computing engine 1012 directly, but sends it only when the access request passes the authentication.
- the authentication module 1024 in the management device 102 determines the first permission information carried in the access request. The first permission information may include the identifier of the computing engine 1012, the requested operation (query), and the indication information of the target data (that is, information indicating the metadata corresponding to the target data); in general, the first permission information is permission information that satisfies the permission model 2.
- the authentication module 1024 can translate the first permission information into the second permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, and use the pre-configured permission policy (and access control policy) to authenticate the second permission information to determine whether the computing engine 1012 has the permission to perform the operation on the metadata, generate an authentication result for the access request and feed it back to the interaction module 1021.
- when the authentication result indicates that the access request has passed the authentication, the interaction module 1021 sends the second metadata translated by the metadata determination module 1023 to the computing engine 1012; when the authentication result indicates that the access request has not passed the authentication, the interaction module 1021 can feed back to the computing engine 1012 response information indicating that the request failed or the authentication failed.
- the above description takes as an example the case where the metadata determination module 1023 and the authentication module 1024 generate the second metadata and authenticate the access request in parallel.
- the authentication module 1024 may first authenticate the access request and feed back the authentication result to the metadata determination module 1023; the metadata determination module 1023 performs the process of generating the second metadata only when it determines that the authentication result indicates that the access request has been authenticated.
- the computing engine 1012 can access the target data stored in the storage device 103 according to the second metadata.
- the computing engine 1012 can directly access the storage device 103 according to the second metadata to obtain the target data to be read; or the computing engine 1012 can call the data API interface provided by the management device 102 to the outside according to the second metadata, so as to indirectly access the target data stored in the storage device 103 by using the data API interface, which is not limited in this embodiment.
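The read path described above (look up the first metadata, translate it into the second metadata, translate and check the permission, and return the metadata only on success) can be sketched as below. The stored metadata, the granted permissions, and all identifiers such as `handle_access_request` are hypothetical stand-ins for the modules of the management device 102.

```python
from typing import Dict, List, Optional, Set, Tuple

Path = List[Tuple[str, str]]

# Hypothetical mapping relationship 1 (model 1 -> model 2) and 2 (model 2 -> model 1).
LEVEL_MAP_1_TO_2: Dict[str, str] = {"catalog": "catalog", "database": "schema",
                                    "table": "table"}
PERMISSION_MAP_2_TO_1: Dict[str, str] = {"query": "select"}

# Metadata kept by the management device in model 1 form, keyed by target name.
STORED_METADATA: Dict[str, Path] = {
    "orders": [("catalog", "c1"), ("database", "sales"), ("table", "orders")],
}
# Permission policies expressed in model 1 terms: (engine, target, permission).
GRANTED: Set[Tuple[str, str, str]] = {("engine_1012", "orders", "select")}


def handle_access_request(engine_id: str, operation: str,
                          target: str) -> Optional[Path]:
    """Return model 2 metadata for `target`, or None if authentication fails."""
    # Metadata determination: first metadata (model 1) -> second metadata (model 2).
    first_metadata = STORED_METADATA[target]
    second_metadata = [(LEVEL_MAP_1_TO_2[lvl], name) for lvl, name in first_metadata]
    # Authentication: first permission info (model 2) -> second (model 1), then check.
    model1_permission = PERMISSION_MAP_2_TO_1.get(operation)
    authenticated = (engine_id, target, model1_permission) in GRANTED
    return second_metadata if authenticated else None
```

Returning `None` here stands in for the response information indicating that the request or the authentication failed.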
- the computing engine 1012 may also request access to the permission policy in the management device 102.
- the computing engine 1012 may generate an access request for the target permission policy in the management device 102, the access request including the identifier of the computing engine 1012, the indication information of the target permission policy, and the operation (query) for the target permission policy, and send the access request to the interaction module 1021, which provides the access request to the authentication module 1024.
- the authentication module 1024 may authenticate the access request based on the permission configuration for the permission policy to determine whether the computing engine 1012 has the permission to access the target permission policy built into the management device 102.
- the authentication module 1024 may translate the target permission policy into a permission policy that satisfies the permission model 2, and feed it back to the interaction module 1021, so that the interaction module 1021 sends the permission policy to the computing engine 1012.
- the computing engine 1012 can generate metadata corresponding to the new data according to the storage plan for the new data, which is referred to as original metadata for ease of description, and the original metadata is usually metadata that satisfies the metadata model 2. Then, the computing engine 1012 can generate a metadata update request including the original metadata (which can also include the identifier of the computing engine 1012 and the operation requested to be performed), and send the metadata update request to the interaction module 1021 in the management device 102.
- the interaction module 1021 may provide the received metadata update request to the authentication module 1024 and the metadata determination module 1023 .
- the authentication module 1024 may first authenticate the metadata update request, specifically by determining the third permission information carried in the metadata update request, which may include the identifier of the computing engine 1012, the requested operation (such as modification, creation, etc.), and the original metadata. Then, the authentication module 1024 may translate the third permission information into fourth permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, and authenticate the fourth permission information using the pre-configured permission policy (and access control policy) to determine whether the computing engine 1012 has the permission to perform the operation on the original metadata, generate an authentication result for the metadata update request, and feed it back to the metadata determination module 1023.
- when the metadata determination module 1023 determines that the authentication result indicates that the metadata update request has passed the authentication, it translates the original metadata carried in the metadata update request into target metadata that satisfies the metadata model 1 according to the mapping relationship 1 between the metadata model 2 and the metadata model 1, and updates the target metadata to the management device 102.
- the target metadata can be persistently stored in the management device 102.
- the above description takes as an example the case where the metadata determination module 1023 translates the original metadata after determining that the metadata update request has passed the authentication.
- the metadata determination module 1023 can also translate the metadata in parallel with the authentication of the request by the authentication module 1024, and this embodiment does not limit this.
- the computing engine 1012 may write the new data into the storage device 103 according to the original metadata or the target metadata.
- the computing engine 1012 may directly access the storage device 103 and write the new data into the storage device 103; or the computing engine 1012 may call the data API interface provided by the management device 102 to write the new data into the storage device 103 indirectly through the management device 102.
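The write path can be sketched in the same way: the update request is authenticated first, and only then is the original metadata (model 2) translated into target metadata (model 1) and stored. The class `ManagementDevice`, its in-memory `metadata_store`, and the permission names are hypothetical; persistent storage is reduced to a list for the purpose of the sketch.

```python
from typing import Dict, List, Set, Tuple

Path = List[Tuple[str, str]]

# Hypothetical mapping relationships 1 and 2, both in the model 2 -> model 1 direction.
LEVEL_MAP_2_TO_1: Dict[str, str] = {"catalog": "catalog", "schema": "database",
                                    "table": "table"}
PERMISSION_MAP_2_TO_1: Dict[str, str] = {"create": "create_table"}
# Permission policies in model 1 terms: (engine, permission).
GRANTED: Set[Tuple[str, str]] = {("engine_1012", "create_table")}


class ManagementDevice:
    """Minimal stand-in for the authentication and metadata determination modules."""

    def __init__(self) -> None:
        self.metadata_store: List[Path] = []  # target metadata, model 1 form

    def handle_update_request(self, engine_id: str, operation: str,
                              original_metadata: Path) -> bool:
        # Third permission info (model 2) -> fourth permission info (model 1).
        fourth_permission = PERMISSION_MAP_2_TO_1.get(operation)
        if (engine_id, fourth_permission) not in GRANTED:
            return False  # authentication failed, nothing is stored
        # Original metadata (model 2) -> target metadata (model 1).
        target_metadata = [(LEVEL_MAP_2_TO_1[lvl], name)
                           for lvl, name in original_metadata]
        self.metadata_store.append(target_metadata)
        return True
```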
- the computing engine 1012 deployed in the computing device 101 can utilize the metadata model 2 and permission model 2 adapted to it to achieve adaptation with the metadata model 1 and permission model 1 built into the management device 102, so that the computing engine 1012 can read and write data to the storage device 103 through the management device 102.
- the computing device 101 includes a plurality of computing engines (such as the computing engine 1011 and the computing engine 1012 in FIG. 1 ), and the management device 102 includes metadata models and permission models adapted for each of the plurality of computing engines, such as the metadata models 1 and 2 and the permission models 1 and 2 mentioned above, and by establishing mappings between different metadata models and mappings between different permission models, data sharing between different computing engines can be achieved.
- the computing engine 1012 can use the metadata model 2 and the permission model 2 to write new data to the storage device 103, and the computing engine 1011 can use the metadata model 1 and the permission model 1 to write new data to the storage device 103.
- computing engine 1011 can access the data written by computing engine 1012 (or the created data table) based on mapping relationship 1 between metadata models and mapping relationship 2 between permission models; computing engine 1012 can access the data written by computing engine 1011 (or the created data table) based on mapping relationship 1 between metadata models and mapping relationship 2 between permission models.
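Data sharing between the two engines follows from keeping a single copy of the metadata in model 1 form and relabeling it on the way in and out. The sketch below shows only the metadata side and omits permission checks; `SharedCatalog`, `ENGINE_MODEL`, and the level maps are hypothetical.

```python
from typing import Dict, List, Tuple

Path = List[Tuple[str, str]]

# Hypothetical mapping relationship 1; levels not listed keep the same name.
LEVEL_MAP_2_TO_1: Dict[str, str] = {"schema": "database", "udf": "function"}
LEVEL_MAP_1_TO_2: Dict[str, str] = {v: k for k, v in LEVEL_MAP_2_TO_1.items()}
# Which metadata model each engine is adapted to.
ENGINE_MODEL: Dict[str, int] = {"engine_1011": 1, "engine_1012": 2}


def _relabel(path: Path, level_map: Dict[str, str]) -> Path:
    return [(level_map.get(level, level), name) for level, name in path]


class SharedCatalog:
    """Stores every path in model 1 form and serves it in the reader's model."""

    def __init__(self) -> None:
        self._paths: List[Path] = []

    def write(self, engine: str, path: Path) -> None:
        if ENGINE_MODEL[engine] == 2:
            path = _relabel(path, LEVEL_MAP_2_TO_1)
        self._paths.append(list(path))

    def read_all(self, engine: str) -> List[Path]:
        if ENGINE_MODEL[engine] == 2:
            return [_relabel(p, LEVEL_MAP_1_TO_2) for p in self._paths]
        return [list(p) for p in self._paths]
```

Each engine sees every table, including those created by the other engine, expressed in the levels of its own metadata model.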
- FIG 14 is a flow chart of a data processing method in an embodiment of the present application.
- the method can be applied to the data processing system 100 shown in Figure 1 above, or it can also be applied to other applicable application scenarios.
- the following is an example of application to the data processing system 100 shown in Figure 1.
- the operations performed by the computing device 101 are specifically performed by the computing engine 1012 in the computing device 101; the operations performed by the management device 102 are performed by multiple functional modules included in the management device 102.
- the data processing method shown in FIG14 may specifically include:
- the interaction module 1021 provides the access request to the metadata determination module 1023 and the authentication module 1024 respectively.
- the metadata determination module 1023 responds to the access request, determines the metadata of the target data according to the mapping relationship 1 between the metadata model 1 and the metadata model 2, and feeds the metadata of the target data back to the interaction module 1021, wherein:
- the metadata model 2 is adapted to the computing engine 1012 , and the metadata of the determined target data satisfies the metadata model 2 .
- there is a mapping relationship 1 between the metadata model 1 and the metadata model 2, so that after the metadata determination module 1023 determines the metadata corresponding to the target data according to the access request, it can translate the metadata satisfying the data structure of the metadata model 1 into the metadata satisfying the data structure of the metadata model 2 (that is, the metadata of the target data) according to the mapping relationship 1.
- the mapping relationship 1 between the metadata model 1 and the metadata model 2 can be established in advance by the mapping module 1022 in the management device 102.
- the specific implementation process of establishing the mapping relationship 1 can refer to the relevant description of the above-mentioned embodiment, which will not be repeated here.
- the authentication module 1024 authenticates the access request according to the mapping relationship 2 between the permission model 1 and the permission model 2, and feeds back the authentication result to the interaction module 1021, wherein the permission model 2 is adapted to the computing engine 1012.
- the authentication module 1024 can determine the first permission information carried in the access request, and the first permission information can include, for example, the identifier of the computing engine 1012, the requested operation, and the indication information of the target data, and the first permission information is the permission information that satisfies the permission model 2. Therefore, the authentication module 1024 can translate the first permission information into the second permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, and authenticate the second permission information using the pre-configured permission policy to determine whether the computing engine 1012 has the permission to perform the operation on the metadata, generate the authentication result for the access request and feed it back to the interaction module 1021.
- the mapping relationship 2 between the permission model 1 and the permission model 2 can be established in advance by the mapping module 1022, and the specific implementation process of establishing the mapping relationship 2 can refer to the relevant description of the aforementioned embodiment, which will not be repeated here.
- the computing engine 1012 reads the target data stored in the storage device 103 according to the metadata of the target data.
- the computing engine 1012 can directly access the storage device 103 according to the metadata to read the target data in the storage device 103; or, the computing engine 1012 can call the data API interface provided by the management device 102 to the outside and indirectly read the target data in the storage device 103 through the management device 102.
- step S1403 and step S1404 may be executed simultaneously; or, the authentication module 1024 may first authenticate the access request and provide the generated authentication result to the metadata determination module 1023; then, after determining that the access request has passed the authentication according to the authentication result, the metadata determination module 1023 translates the metadata that satisfies the metadata model 1 into the metadata that satisfies the metadata model 2.
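The two orderings mentioned above (translation and authentication running simultaneously, or authentication first and translation only on success) can be compared with a small sketch. It uses Python's standard `concurrent.futures.ThreadPoolExecutor` for the parallel variant; the request format and the functions `translate`, `authenticate`, `handle_parallel`, and `handle_sequential` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Optional, Set, Tuple

Path = List[Tuple[str, str]]
Request = Dict[str, str]

# Hypothetical stored metadata (model 1 form) and granted (engine, target) pairs.
STORED: Dict[str, Path] = {"orders": [("database", "sales"), ("table", "orders")]}
GRANTED: Set[Tuple[str, str]] = {("engine_1012", "orders")}


def translate(request: Request) -> Path:
    """Translate model 1 metadata of the target into model 2 form."""
    rename = {"database": "schema"}
    return [(rename.get(lvl, lvl), name) for lvl, name in STORED[request["target"]]]


def authenticate(request: Request) -> bool:
    return (request["engine"], request["target"]) in GRANTED


def handle_parallel(request: Request) -> Optional[Path]:
    """Run translation and authentication at the same time, then combine."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        metadata_future = pool.submit(translate, request)
        auth_future = pool.submit(authenticate, request)
        metadata = metadata_future.result()
        authenticated = auth_future.result()
    return metadata if authenticated else None


def handle_sequential(request: Request) -> Optional[Path]:
    """Authenticate first; translate only if authentication passes."""
    if not authenticate(request):
        return None
    return translate(request)
```

Both variants return the same result. The sequential variant avoids translation work for rejected requests, while the parallel variant can reduce latency for requests that pass.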
- the data processing method provided in this embodiment corresponds to the data processing system 100 shown in FIG. 1 above. Therefore, the specific implementation process of steps S1401 to S1406 can refer to the relevant description of the aforementioned embodiment and will not be repeated here.
- the above description uses the computing engine 1012 reading the target data in the storage device 103 as an example.
- when the computing engine 1012 writes new data to the storage device 103, the metadata of the new data can be updated to the management device 102 as follows.
- the computing engine 1012 can generate the original metadata corresponding to the new data according to the storage plan for the new data, and the original metadata satisfies the metadata model 2 adapted to the computing engine 1012.
- the computing engine 1012 can generate a metadata update request including the original metadata (the request can also include the identifier of the computing engine 1012 and the operation requested to be performed), and send the metadata update request to the interaction module 1021 in the management device 102.
- the interaction module 1021 can provide the received metadata update request to the authentication module 1024 and the metadata determination module 1023.
- the authentication module 1024 may first authenticate the metadata update request. Specifically, it determines the third permission information carried in the metadata update request; the third permission information may include the identifier of the computing engine 1012, the requested operation (such as modification or creation) and the original metadata, and the third permission information satisfies the permission model 2. Then, the authentication module 1024 may translate the third permission information into the fourth permission information that satisfies the permission model 1 according to the mapping relationship 2 between the permission model 2 and the permission model 1, authenticate the fourth permission information using the pre-configured permission policy to determine whether the computing engine 1012 has the permission to perform the operation on the original metadata, generate the authentication result for the metadata update request, and feed it back to the metadata determination module 1023.
- when the metadata determination module 1023 determines that the authentication result indicates that the metadata update request has passed the authentication, it translates the original metadata carried in the metadata update request into target metadata that satisfies the metadata model 1 according to the mapping relationship 1 between the metadata model 2 and the metadata model 1, and updates the target metadata to the management device 102; specifically, the target metadata may be persistently stored in the management device 102.
- the computing engine 1012 may write new data into the storage device 103 according to the original metadata or the target metadata.
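The metadata update path described above can be sketched in the same spirit. Everything below is an illustrative assumption rather than the disclosed implementation: the `ManagementDevice` class, its in-memory `catalog` (standing in for persistent storage), the field and operation names, and the policy contents are hypothetical.

```python
# Hypothetical mapping relationship 1, applied in the model 2 -> model 1 direction.
METADATA_MAPPING_2_TO_1 = {"database": "db_name", "table": "tbl_name", "path": "location"}

# Hypothetical mapping relationship 2: permission model 2 operation -> permission model 1 operation.
PERMISSION_MAPPING_2_TO_1 = {"CREATE": "create", "ALTER": "modify"}

# Hypothetical pre-configured policy in permission model 1: allowed (engine, operation) pairs.
POLICY = {("engine-1012", "create"), ("engine-1012", "modify")}


class ManagementDevice:
    """Toy stand-in for the management device; `catalog` mimics persistent storage."""

    def __init__(self):
        self.catalog = {}

    def handle_update(self, engine_id: str, operation: str, original_metadata: dict) -> bool:
        # Third permission information (model 2) -> fourth permission information (model 1).
        operation_model_1 = PERMISSION_MAPPING_2_TO_1.get(operation)
        if operation_model_1 is None or (engine_id, operation_model_1) not in POLICY:
            return False  # authentication failed, nothing is stored

        # Original metadata (model 2) -> target metadata (model 1).
        target_metadata = {
            METADATA_MAPPING_2_TO_1[key]: value
            for key, value in original_metadata.items()
            if key in METADATA_MAPPING_2_TO_1
        }

        # Store the target metadata, keyed by database and table name.
        key = (target_metadata.get("db_name"), target_metadata.get("tbl_name"))
        self.catalog[key] = target_metadata
        return True
```

Only the translated target metadata is stored, so the catalog stays in metadata model 1 regardless of which engine issued the update.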
- the management device 102 (including the interaction module 1021, mapping module 1022, metadata determination module 1023 and authentication module 1024) involved in the data processing process may be software configured on a computing device or a computing device cluster, and by running the software on the computing device or computing device cluster, the computing device or computing device cluster may implement the functions of the management device 102.
- the following describes in detail the management device 102 involved in the data processing process.
- Figure 15 shows a structural diagram of a computing device, on which the management device 102 can be deployed.
- the computing device can be a computing device in a cloud environment (such as a server), or a computing device in an edge environment, or a terminal device, etc., which can be specifically used to implement the functions of the interaction module 1021, mapping module 1022, metadata determination module 1023 and authentication module 1024 in the embodiment shown in Figure 1 above.
- the computing device 1500 includes a processor 1510, a memory 1520, a communication interface 1530, and a bus 1540.
- the processor 1510, the memory 1520, and the communication interface 1530 communicate with each other through the bus 1540.
- the bus 1540 may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus.
- the bus may be divided into an address bus, a data bus, a control bus, and the like.
- for ease of representation, the bus is represented by only one thick line in FIG. 15, but this does not mean that there is only one bus or one type of bus.
- the communication interface 1530 is used to communicate with the outside, such as obtaining access requests, feeding back metadata of target data, and the like.
- the processor 1510 can be a central processing unit (CPU), an application specific integrated circuit (ASIC), a graphics processing unit (GPU) or one or more integrated circuits.
- the processor 1510 can also be an integrated circuit chip with signal processing capabilities.
- the functions of each module in the management device 102 can be completed by the hardware integrated logic circuit in the processor 1510 or the instructions in the form of software.
- the processor 1510 can also be a general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic devices, discrete gates or transistor logic devices, or discrete hardware components, and can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application.
- the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc., and the method disclosed in the embodiments of the present application can be directly embodied as a hardware decoding processor for execution, or it can be executed by a combination of hardware and software modules in the decoding processor.
- the software module can be located in a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, register, or other mature storage media in the art.
- the storage medium is located in the memory 1520, and the processor 1510 reads the information in the memory 1520 and completes part or all of the functions in the management device 102 in combination with its hardware.
- the memory 1520 may include a volatile memory, such as a random access memory (RAM).
- the memory 1520 may also include a non-volatile memory, such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
- the memory 1520 stores executable code, and the processor 1510 executes the executable code to perform the method executed by the aforementioned management device 102.
- when the interaction module 1021, mapping module 1022, metadata determination module 1023 and authentication module 1024 described in the embodiment shown in Figure 1 are implemented by software, the software or program code required to execute the functions of these modules is stored in the memory 1520, the interaction between the interaction module 1021 and other devices is realized through the communication interface 1530, and the processor 1510 is used to execute the instructions in the memory 1520 to implement the method executed by the management device 102.
- FIG. 16 is a schematic diagram showing the structure of a computing device cluster.
- the computing device cluster 160 shown in FIG. 16 includes multiple computing devices, and the management device 102 can be deployed in a distributed manner on multiple computing devices in the computing device cluster 160.
- the computing device cluster 160 includes multiple computing devices 1600, each computing device 1600 includes a memory 1620, a processor 1610, a communication interface 1630 and a bus 1640, wherein the memory 1620, the processor 1610, and the communication interface 1630 are connected to each other through the bus 1640.
- the processor 1610 may be a CPU, a GPU, an ASIC, or one or more integrated circuits.
- the processor 1610 may also be an integrated circuit chip with signal processing capabilities. In the implementation process, some functions of the management device 102 may be completed by the hardware integrated logic circuit or software instructions in the processor 1610.
- the processor 1610 may also be a DSP, an FPGA, a general processor, other programmable logic devices, discrete gates or transistor logic devices, discrete hardware components, and may implement or execute some methods, steps, and logic block diagrams disclosed in the embodiments of the present application.
- the general processor may be a microprocessor or the processor may also be any conventional processor, etc.
- the steps of the method disclosed in the embodiments of the present application may be directly embodied as a hardware decoding processor for execution, or may be executed by a combination of hardware and software modules in the decoding processor.
- the software module may be located in a mature storage medium in the art such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, or an electrically erasable programmable memory, a register, etc.
- the storage medium is located in the memory 1620.
- the processor 1610 reads the information in the memory 1620, and in combination with its hardware, some functions of the management device 102 may be completed.
- the memory 1620 may include ROM, RAM, static storage device, dynamic storage device, hard disk (such as SSD, HDD), etc.
- the memory 1620 may store program codes, for example, part or all of the program codes for implementing the interaction module 1021, part or all of the program codes for implementing the mapping module 1022, part or all of the program codes for implementing the metadata determination module 1023, part or all of the program codes for implementing the authentication module 1024, etc.
- the processor 1610 executes part of the method executed by the management device 102 based on the communication interface 1630. For example, some of the computing devices 1600 may be used to execute the method executed by the interaction module 1021, some may be used to execute the method executed by the mapping module 1022, some may be used to execute the method executed by the metadata determination module 1023, and some may be used to execute the method executed by the authentication module 1024.
- the memory 1620 can also store data. For example: intermediate data or result data generated by the processor 1610 during the execution process, such as the above-mentioned first metadata, second metadata, first permission information, second permission information, etc.
- the communication interface 1630 in each computing device 1600 is used for external communication, such as interacting with other computing devices 1600.
- the bus 1640 may be a peripheral component interconnect standard bus or an extended industry standard architecture bus, etc.
- for ease of representation, the bus 1640 in each computing device 1600 in FIG. 16 is represented by only one thick line, but this does not mean that there is only one bus or one type of bus.
- the plurality of computing devices 1600 establish communication paths through a communication network to implement the functions of the management device 102.
- any computing device may be a computing device in a cloud environment (e.g., a server), or a computing device in an edge environment, or a terminal device.
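As an illustration of distributing the modules of the management device 102 across the computing devices 1600, the sketch below assigns modules to devices in round-robin order. The placement policy, the device identifiers, and the function `assign_modules` are assumptions made for this example; the description only states that different devices may execute the methods of different modules.

```python
MODULES = [
    "interaction_module_1021",
    "mapping_module_1022",
    "metadata_determination_module_1023",
    "authentication_module_1024",
]


def assign_modules(modules, device_ids):
    """Assign each module to a device in round-robin order (a hypothetical policy)."""
    if not device_ids:
        raise ValueError("at least one computing device is required")
    placement = {device_id: [] for device_id in device_ids}
    for index, module in enumerate(modules):
        placement[device_ids[index % len(device_ids)]].append(module)
    return placement
```

With a single device, all four modules land on that device, which corresponds to the single computing device deployment of FIG. 15.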
- an embodiment of the present application also provides a computer-readable storage medium, which stores instructions.
- when the instructions are run on one or more computing devices, the one or more computing devices execute the methods executed by the various modules of the management device 102 of the above embodiment.
- the embodiment of the present application further provides a computer program product, and when the computer program product is executed by one or more computing devices, the one or more computing devices execute any of the aforementioned data processing methods.
- the computer program product may be a software installation package, and when any of the aforementioned data processing methods is required, the computer program product may be downloaded and executed on a computer.
- the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
- the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
- the technical solution of the present application, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk or an optical disc, and which includes a number of instructions to enable a computer device (which can be a personal computer, a training device, or a network device, etc.) to execute the methods described in each embodiment of the present application.
- all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
- all or part of the embodiments may be implemented in the form of a computer program product.
- the computer program product includes one or more computer instructions.
- the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
- the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
- the computer instructions may be transmitted from one website, computer, training equipment, or data center to another website, computer, training equipment, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.
- the computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more available media.
- the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid-state drive (SSD)).
Claims (20)
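- A data processing method, wherein the method is applied to a data processing system, the data processing system comprises a computing engine, a management device, and a storage device, the management device has a built-in first metadata model and a built-in first permission model, the first metadata model and the first permission model are adapted to the storage device, and the method comprises: receiving, by the management device, an access request that is sent by the computing engine and that is for metadata of target data, wherein the target data is stored in the storage device; determining, by the management device in response to the access request, the metadata of the target data according to a first mapping relationship, wherein the first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, and the metadata of the target data satisfies the second metadata model; authenticating, by the management device, the access request according to a second mapping relationship, wherein the second mapping relationship is a mapping relationship between the first permission model and a second permission model, and the second permission model is adapted to the computing engine; and sending, by the management device, the metadata of the target data to the computing engine after the access request passes the authentication.
- The method according to claim 1, wherein the method further comprises: receiving, by the management device, a metadata update request sent by the computing engine; translating, by the management device in response to the metadata update request, original metadata carried in the metadata update request into target metadata according to the first mapping relationship, wherein the original metadata satisfies the second metadata model, and the target metadata satisfies the first metadata model; authenticating, by the management device, the metadata update request according to the second mapping relationship; and updating, by the management device, the target metadata to the management device after the metadata update request passes the authentication.
- The method according to claim 1 or 2, wherein the method further comprises: outputting, by the management device, a configuration interface; and establishing, by the management device, the first mapping relationship in response to a first operation performed by a user on the configuration interface, and establishing the second mapping relationship in response to a second operation performed by the user on the configuration interface.
- The method according to claim 3, wherein the method further comprises: generating, by the management device in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
- The method according to any one of claims 1 to 4, wherein the data processing system comprises a plurality of types of computing engines, and the management device comprises a metadata model and a permission model to which each of the plurality of types of computing engines is adapted.
- The method according to any one of claims 1 to 5, wherein the metadata of the target data is second metadata, and the determining, by the management device in response to the access request, the metadata of the target data according to a first mapping relationship comprises: reading, by the management device, first metadata according to the access request, wherein the first metadata satisfies the first metadata model; and translating, by the management device according to the first mapping relationship, the first metadata into the second metadata that satisfies the second metadata model; the authenticating, by the management device, the access request according to a second mapping relationship comprises: translating, by the management device according to the second mapping relationship, first permission information in the access request into second permission information that satisfies the first permission model, wherein the first permission information satisfies the second permission model; and authenticating, by the management device, the second permission information; and the sending, by the management device, the metadata of the target data to the computing engine after the access request passes the authentication comprises: sending, by the management device, the second metadata to the computing engine after the second permission information passes the authentication.
- A data processing system, wherein the data processing system comprises a computing engine, a management device, and a storage device, the management device has a built-in first metadata model and a built-in first permission model, and the first metadata model and the first permission model are adapted to the storage device; the computing engine is configured to generate an access request for metadata of target data, and send the access request to the management device; the management device is configured to: in response to the access request, determine the metadata of the target data according to a first mapping relationship, and authenticate the access request according to a second mapping relationship; and after the access request passes the authentication, send the metadata of the target data to the computing engine, wherein the first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, the metadata of the target data satisfies the second metadata model, the second mapping relationship is a mapping relationship between the first permission model and a second permission model, and the second permission model is adapted to the computing engine; and the computing engine is further configured to read, according to the metadata of the target data, the target data stored in the storage device.
- The data processing system according to claim 7, wherein the computing engine is further configured to generate a metadata update request, and send the metadata update request to the management device; and the management device is further configured to: in response to the metadata update request, translate original metadata carried in the metadata update request into target metadata according to the first mapping relationship, wherein the target metadata satisfies the first metadata model, and the original metadata satisfies the second metadata model; authenticate the metadata update request according to the second mapping relationship; and after the metadata update request passes the authentication, update the target metadata to the management device.
- The data processing system according to claim 7 or 8, wherein the management device is further configured to output a configuration interface, establish the first mapping relationship in response to a first operation performed by a user on the configuration interface, and establish the second mapping relationship in response to a second operation performed by the user on the configuration interface.
- The data processing system according to claim 9, wherein the management device is further configured to generate, in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
- The data processing system according to any one of claims 7 to 10, wherein the data processing system comprises a plurality of types of computing engines, and the management device comprises a metadata model and a permission model to which each of the plurality of types of computing engines is adapted.
- The data processing system according to any one of claims 7 to 11, wherein the metadata of the target data is second metadata; and the management device is specifically configured to: read first metadata according to the access request, wherein the first metadata satisfies the first metadata model; translate, according to the first mapping relationship, the first metadata into the second metadata that satisfies the second metadata model; translate, according to the second mapping relationship, first permission information in the access request into second permission information that satisfies the first permission model, wherein the first permission information satisfies the second permission model; authenticate the second permission information; and after the second permission information passes the authentication, send the second metadata to the computing engine.
- A management device, wherein the management device is applied to a data processing system, the data processing system further comprises a computing engine and a storage device, the management device has a built-in first metadata model and a built-in first permission model, the first metadata model and the first permission model are adapted to the storage device, and the management device comprises: an interaction module, configured to receive an access request that is sent by the computing engine and that is for metadata of target data, wherein the target data is stored in the storage device; a metadata determination module, configured to determine, in response to the access request, the metadata of the target data according to a first mapping relationship, wherein the first mapping relationship is a mapping relationship between the first metadata model and a second metadata model, the second metadata model is adapted to the computing engine, and the metadata of the target data satisfies the second metadata model; and an authentication module, configured to authenticate the access request according to a second mapping relationship, wherein the second mapping relationship is a mapping relationship between the first permission model and a second permission model, and the second permission model is adapted to the computing engine; wherein the interaction module is further configured to send the metadata of the target data to the computing engine after the access request passes the authentication.
- The management device according to claim 13, wherein the interaction module is further configured to receive a metadata update request sent by the computing engine; the metadata determination module is further configured to translate, in response to the metadata update request, original metadata carried in the metadata update request into target metadata according to the first mapping relationship, wherein the original metadata satisfies the second metadata model, and the target metadata satisfies the first metadata model; the authentication module is further configured to authenticate the metadata update request according to the second mapping relationship; and the interaction module is further configured to update the target metadata to the management device after the metadata update request passes the authentication.
- The management device according to claim 13 or 14, wherein the interaction module is further configured to output a configuration interface; and the management device further comprises: a mapping module, configured to establish the first mapping relationship in response to a first operation performed by a user on the configuration interface, and establish the second mapping relationship in response to a second operation performed by the user on the configuration interface.
- The management device according to any one of claims 13 to 15, wherein the mapping module is further configured to generate, in response to a third operation performed by the user on the configuration interface, an access control policy for the second metadata model and the second permission model.
- The management device according to any one of claims 13 to 16, wherein the data processing system comprises a plurality of types of computing engines, and the management device comprises a metadata model and a permission model to which each of the plurality of types of computing engines is adapted.
- The management device according to any one of claims 13 to 17, wherein the metadata of the target data is second metadata; the metadata determination module is specifically configured to read first metadata according to the access request, wherein the first metadata satisfies the first metadata model, and translate, according to the first mapping relationship, the first metadata into the second metadata that satisfies the second metadata model; the authentication module is specifically configured to translate, according to the second mapping relationship, first permission information in the access request into second permission information that satisfies the first permission model, wherein the first permission information satisfies the second permission model, and authenticate the second permission information; and the interaction module is specifically configured to send the second metadata to the computing engine after the second permission information passes the authentication.
- A computing device cluster, comprising at least one computing device, wherein each computing device comprises a processor and a memory; and the processor is configured to execute instructions stored in the memory, so that the computing device cluster performs the method according to any one of claims 1 to 6.
- A computer-readable storage medium, wherein the computer-readable storage medium stores instructions, and when the instructions are run on at least one computing device, the at least one computing device is enabled to perform the method according to any one of claims 1 to 6.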
Priority Applications (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| EP23890156.5A EP4614336A4 (en) | 2022-11-18 | 2023-06-16 | DATA PROCESSING METHOD AND ASSOCIATED SYSTEM, APPARATUS, AND DEVICE |
| US19/210,999 US20250278501A1 (en) | 2022-11-18 | 2025-05-16 | Data processing method and system, apparatus, and related device |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202211446388.9A CN118093500A (zh) | 2022-11-18 | 2022-11-18 | Data processing method, system, apparatus, and related device |
| CN202211446388.9 | 2022-11-18 |
Related Child Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US19/210,999 Continuation US20250278501A1 (en) | 2022-11-18 | 2025-05-16 | Data processing method and system, apparatus, and related device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2024103714A1 (zh) | 2024-05-23 |
Family
ID=91083715
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| PCT/CN2023/100673 Ceased WO2024103714A1 (zh) | 2022-11-18 | 2023-06-16 | Data processing method, system, apparatus, and related device |
Country Status (4)
| Country | Link |
|---|---|
| US (1) | US20250278501A1 (zh) |
| EP (1) | EP4614336A4 (zh) |
| CN (1) | CN118093500A (zh) |
| WO (1) | WO2024103714A1 (zh) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN118410513A (zh) * | 2024-07-04 | 2024-07-30 | 北京国电通网络技术有限公司 | DPU-embedded fine-grained access method and system for database middleware |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2006026636A2 (en) * | 2004-08-31 | 2006-03-09 | Ascential Software Corporation | Metadata management |
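| CN112307122A (zh) * | 2020-10-30 | 2021-02-02 | 杭州海康威视数字技术股份有限公司 | Data lake-based data management system and method |
| CN112364110A (zh) * | 2020-11-17 | 2021-02-12 | 深圳前海微众银行股份有限公司 | Metadata management method, apparatus, device, and computer storage medium |
| CN113468166A (zh) * | 2020-03-31 | 2021-10-01 | 广州虎牙科技有限公司 | Metadata processing method, apparatus, storage medium, and server |
| CN113761294A (zh) * | 2021-09-10 | 2021-12-07 | 北京火山引擎科技有限公司 | Data management method, apparatus, storage medium, and electronic device |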
Family Cites Families (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20050234969A1 (en) * | 2003-08-27 | 2005-10-20 | Ascential Software Corporation | Services oriented architecture for handling metadata in a data integration platform |
| US20180373781A1 (en) * | 2017-06-21 | 2018-12-27 | Yogesh PALRECHA | Data handling methods and system for data lakes |
| CN109408689B (zh) * | 2018-10-24 | 2020-11-24 | 北京金山云网络技术有限公司 | Data acquisition method, apparatus, system, and electronic device |
| CN113568931A (zh) * | 2020-04-29 | 2021-10-29 | 盛趣信息技术(上海)有限公司 | Routing and parsing system and method for data access requests |
- 2022-11-18: CN, CN202211446388.9A, patent CN118093500A (zh), active, Pending
- 2023-06-16: WO, PCT/CN2023/100673, patent WO2024103714A1 (zh), not active, Ceased
- 2023-06-16: EP, EP23890156.5A, patent EP4614336A4 (en), active, Pending
- 2025-05-16: US, US19/210,999, patent US20250278501A1 (en), active, Pending
Non-Patent Citations (1)
| Title |
|---|
| See also references of EP4614336A4 |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118093500A (zh) | 2024-05-28 |
| EP4614336A1 (en) | 2025-09-10 |
| EP4614336A4 (en) | 2025-11-19 |
| US20250278501A1 (en) | 2025-09-04 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11675746B2 (en) | | Virtualized server systems and methods including domain joining techniques |
| JP7090606B2 (ja) | | Formation and manipulation of test data in a database system |
| CN110799960B (zh) | | System and method for database tenant migration |
| CN103518364B (zh) | | Data update method and server for a distributed storage system |
| CN107480237B (zh) | | Data fusion method and system for heterogeneous desktop cloud platforms |
| CN106202452A (zh) | | Unified data resource management system and method for a big data platform |
| US11561937B2 (en) | | Multitenant application server using a union file system |
| CN105227672B (zh) | | Method and system for data storage and access |
| CN108427677B (zh) | | Object access method, apparatus, and electronic device |
| JP5248912B2 (ja) | | Server computer, computer system, and file management method |
| US20250278501A1 (en) | | Data processing method and system, apparatus, and related device |
| WO2025025694A1 (zh) | | Permission verification method, apparatus, device, and cluster |
| US11803568B1 (en) | | Replicating changes from a database to a destination and modifying replication capacity |
| CN117539398A (zh) | | Volume mapping management method, apparatus, device, and medium |
| CN114911574B (zh) | | Data processing method and apparatus |
| US9336232B1 (en) | | Native file access |
| CN120523810A (zh) | | Data management method, apparatus, and readable storage medium |
| WO2025209186A1 (zh) | | Data transmission method |
| CN113590309B (zh) | | Data processing method, apparatus, device, and storage medium |
| US11068500B1 (en) | | Remote snapshot access in a replication setup |
| CN119106035A (zh) | | Block storage method and related device |
| CN115794164A (zh) | | Cloud database upgrade method, apparatus, device, and storage medium |
| WO2023273803A1 (zh) | | Authentication method, apparatus, and storage system |
| CN114547055A (zh) | | Data processing method and apparatus |
| US20250184393A1 (en) | | Object Storage Service Configuration Method and Apparatus Based on Cloud Computing Technology |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 23890156; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | WIPO information: entry into national phase | Ref document number: 2023890156; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2023890156; Country of ref document: EP; Effective date: 20250603 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | WWP | WIPO information: published in national office | Ref document number: 2023890156; Country of ref document: EP |