EP4710213A2 - Erzeugung und verbrauch von datenmodellen, die mehrere faktenmengen spannen - Google Patents
Erzeugung und verbrauch von datenmodellen, die mehrere faktenmengen spannenInfo
- Publication number
- EP4710213A2 EP4710213A2 EP24733056.6A EP24733056A EP4710213A2 EP 4710213 A2 EP4710213 A2 EP 4710213A2 EP 24733056 A EP24733056 A EP 24733056A EP 4710213 A2 EP4710213 A2 EP 4710213A2
- Authority
- EP
- European Patent Office
- Prior art keywords
- data field
- data
- icon
- dimension
- implementations
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2428—Query predicate definition using graphical user interfaces, including menus and forms
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/26—Visual data mining; Browsing structured data
Definitions
- the disclosed implementations relate generally to data visualization and more specifically to systems and methods that facilitate building object models and validating relationships between objects in object models of a data source.
- Data visualization applications enable a user to understand a data set visually, including distribution, trends, outliers, and other factors that are important to making business decisions.
- Some data visualization applications provide a user interface that enables users to build visualizations from a data source by selecting data fields and placing them into specific user interface regions to indirectly define a data visualization.
- An object is a collection of named attributes.
- An object often corresponds to a real -world object, event, or concept, such as a Store.
- the attributes are descriptions of the object that are conceptually at a 1 : 1 relationship with the object.
- a Store object may have a single [Manager Name] or [Employee Count] associated with it.
- an object is often stored as a row in a relational table, or as an object in JSON.
- a class is a collection of objects that share the same attributes. It must be analytically meaningful to compare objects within a class and to aggregate over them. At a physical level, a class is often stored as a relational table, or as an array of objects in JSON.
- An object model is a set of classes and a set of many-to-one relationships between them. Classes that are related by 1-to-l relationships are conceptually treated as a single class, even if they are meaningfully distinct to a user. In addition, classes that are related by 1-to-l relationships may be presented as distinct classes in the data visualization user interface. Many-to-many relationships are conceptually split into two many-to-one relationships by adding an associative table capturing the relationship.
- a data visualization application can assist a user in various ways. In some implementations, based on data fields already selected and placed onto shelves in the user interface, the data visualization application can recommend additional fields or limit what actions can be taken to prevent unusable combinations. In some implementations, the data visualization application allows a user considerable freedom in selecting fields, and uses the object model to build one or more data visualizations according to what the user has selected.
- Some implementations of the present disclosure are directed to a computing device having an improved user interface that facilitates authoring of a multi-fact data model.
- the data models disclosed herein are displayed in the user interface in a more compact manner compared to existing data models (see, e.g., Figure 1C).
- the user interface facilitates selection and disambiguation of relationships in the object model.
- the computing device in response to user interaction with (e.g., hover over) an object in the object model, the computing device highlights that object and traces other objects in the object model that are shared with that object, thus providing improved visual feedback to the user.
- Some implementations of the present disclosure are directed to a computing device having an improved user interface that facilitates analysis of a multi-fact data model.
- analysts cannot easily identify the relevant fields to be used together. Once they start their analysis, the analysts can easily lose sight of fields that are relevant and those that are not relevant. There are multiple perspectives on how to utilize a complex data model and these perspectives need to adapt to the analyst's analytic workflow.
- the user interface grays out fields that are not relevant to the current analysis (e.g., not relevant to the fields that are currently in use in the analysis).
- the user interface infoscents grayed out fields and provides an explanation as to why they are not related and the consequence of using them.
- An analyst obtains sufficient information from the tooltips that are displayed in the user interface, to decide whether to proceed.
- the relatability of fields also adapts to user input.
- the user interface preserves relevant reminders in the fields that have been used, and whether they are related or unrelated fields, so that the analyst can always go back and refine the analysis.
- Some implementations of the present disclosure are directed to improved query semantics that support multi-fact data model analysis.
- the disclosed query semantics are fully compatible with Tableau’s VizQL, which provides flexible interactivity, and answers sophisticated analytic questions in an iterative approach.
- a method for generating object models that span multiple fact tables is performed at computing device having a display, one or more processors and memory storing one or more programs configured for execution by the one or more processors.
- the method includes displaying, in a user interface, a first object icon and a second object icon positioned to the right of the first object icon.
- the first object icon represents a first object of a first data source.
- the second object icon represents a second object of the first data source.
- the first object icon is connected to the second object icon via a first connector representing a relationship between the first object and the second object.
- the relationship between the first object and the second object has a first cardinality.
- the method includes, in response to receiving a first user input to add a third object, displaying, in the user interface, a third object icon representing the third object.
- the method includes, in response to receiving a second user input on the third object icon, when the second object and the third object include at least one common data field, generating and displaying, in the user interface, a second connector connecting the third object icon to the second object icon.
- the second connector represents a relationship between the third object and the second object.
- the relationship between the third object and the second object has a second cardinality.
- the first cardinality is one of a many-to- many relationship, a many-to-one relationship, or a one-to-many relationship.
- the second cardinality is one of a many-to-many relationship, a many-to-one relationship, or a one-to-many relationship.
- the second user input comprises user selection of at least a portion of the third object icon.
- the method further includes, in response to the user selection: generating and displaying, in the user interface, a freeform line. A first end of the freeform line is connected to the third object icon and a second end of the freeform line corresponds to a position of a mouse cursor in the user interface.
- the method further includes, in response to receiving a user interaction with the second connector, displaying an identification of the at least one common data field.
- the method further includes, after connecting the third object icon to the second object icon via the second connector, in response to receiving user selection of the first object icon, displaying, in the user interface, a plurality of data rows and data columns representing information corresponding to one or more data fields in the first object.
- the method further includes, after connecting the third object icon to the second object icon via the second connector, vertically aligning the first object icon and the third object icon for display in the user interface.
- the method further includes, after connecting the third object icon to the second object icon via the second connector, arranging the first object icon and the third object icon in an alphabetical order for display in the user interface.
- displaying the second connector connecting the third object icon to the second object icon comprises converting the second object from a subtree of the first object to a shared object.
- the shared object is a dimension logical table consisting of one or more dimension data fields.
- the first object comprises a first fact table and the third object comprises a second fact table that is unrelated to the first fact table.
- the at least one common data field comprises a geographic data field.
- the at least one common data field comprises a date/time data field.
- the third object is an object of the first data source.
- the third object is an object of a second data source, distinct from the first data source.
- the method further includes, displaying, in the user interface, a fourth object icon representing a fourth object.
- the fourth object icon is connected to the second object icon via a third connector representing a relationship between the fourth object and the second object.
- the relationship between the fourth object and the second object has a third cardinality.
- the fourth object icon is connected to a fifth object icon, representing a fifth object, via a fourth connector.
- the fourth connector represents a relationship between the fourth object and the fifth object.
- the relationship between the fourth object and the fifth object has a fourth cardinality.
- the third connector and the fourth connector include an overlapping portion.
- the method includes, in response to receiving a user interaction with the overlapping portion of the third connector and the fourth connector, concurrently displaying (i) an identification of a first related data field relating the fourth object and the second object and (ii) an identification of a second related data field relating the fourth object and the fifth object.
- the first object icon, the second object icon, the third object icon, the fourth object icon, and the fifth object icon are distinct icons.
- the first related data field and the second related data field are distinct data fields.
- the method further includes, in response to user selection of the identification of the first related data field relating the fourth object and the second object, simultaneously visually emphasizing the fourth object, the second object, and the third connector.
- the third cardinality is one of: a many-to-many relationship, a many-to-one relationship, or a one-to-many relationship.
- the fourth cardinality is one of: a many-to-many relationship, a many-to-one relationship, or a one-to-many relationship.
- the method further comprises displaying, in the user interface, (i) a fourth object icon representing a fourth object; (ii) a fifth object icon representing a fifth object; and (iii) a third connector connecting the fourth object icon and the fifth object icon.
- the third connector represents a many-to-many relationship between the fourth object and the fifth object.
- the fourth object icon, the fifth object icon, and the third connector are not connected to any of the first object icon, the second object icon, or the third object icon.
- the method includes, in response to receiving a third user input on the fifth object icon, generating and displaying, in the user interface, a freeform line.
- a first end of the freeform line is connected to the fifth object icon and a second end of the freeform line corresponds to the position of a mouse cursor in the user interface.
- the method includes, in response to receiving an interaction between the second end of the freeform line and the second object icon: converting the freeform line into a third connector connecting the fifth object icon and the second object icon, the third connector representing a many-to-many relationship between the fifth object and the second object.
- the first object icon, the second object icon, and the third object icon are displayed in a first portion of the user interface.
- the fourth object icon and the fifth object icon are displayed in a second portion of the user interface.
- Converting the freeform line into a third connector connecting the fifth object icon and the second object icon includes redisplaying the fourth object icon and the fifth object icon in the first portion of the user interface.
- a method for performing guided analysis using multi -fact object model is performed at computing device having a display, one or more processors and memory storing one or more programs configured for execution by the one or more processors.
- the method includes displaying, in a user interface, a plurality of data field icons corresponding to a plurality of data fields. Each of the data fields is associated with a respective object of a plurality of objects in an object model.
- the method includes, in response to (i) user selection of a first data field icon, from the plurality of data field icons, corresponding to a first data field, and (ii) placement of the first data field icon in a shelf region of the user interface, where the first data field is associated with a first object of the plurality of objects: (1) generating and displaying a first data visualization in the user interface and (2) updating a visual characteristic, of a subset of the plurality of data field icons that are displayed in the user interface, from a first visual characteristic to a second visual characteristic.
- Each data field icon in the subset of data field icons is associated with a second object of the plurality of objects, distinct from the first object.
- the data field icons in the subset are user-selectable independently of the first or second visual characteristic.
- updating the visual characteristic, of the subset of data field icons, from the first visual characteristic to the second visual characteristic includes visually de-emphasizing the subset of data field icons relative to other data field icons in the plurality of data field icons while maintaining user-selectability of the subset of data field icons.
- the method further comprises, while the visual characteristic of the first subset of data fields is the second visual characteristic: in response to a user interaction with a second data field icon from the subset of data field icons, corresponding to a second data field of the plurality of data fields: displaying information that the second data field is unrelated to the first data field.
- the method further comprises, while the visual characteristic of the first subset of data fields is the second visual characteristic: in response to receiving (i) user selection of a second data field icon from the subset of data field icons, corresponding to a second data field of the plurality of data fields, and (ii) user placement of the second data field icon in the shelf region: generating and displaying a second data visualization in the user interface.
- generating the first data visualization includes executing a first query that specifies an aggregation of data values of the first data field.
- generating the second data visualization includes executing a second query that duplicates, for each data value of the third data field, the aggregated data values of the first data field.
- the method comprises, concurrently while displaying the second data visualization: displaying, in the shelf region, a warning visual indicator adjacent to the first data field icon. In response to a user interaction with the warning visual indicator, the method displays information that the second data field is unrelated to the first data field.
- the method comprises, after updating the visual characteristic of the subset of data field icons to the second visual characteristic: in response to receiving (i) user selection of a third data field icon from the plurality of data field icons, where the third data field icon corresponds to a third data field and is not a data field icon from the subset of data field icons and (ii) placement of the third data field icon in the shelf region, executing a second query that specifies an aggregation of data values of the first data field according to the third data field to generate a third data visualization, then displaying, in the user interface, the third data visualization.
- the method further comprises concurrently while displaying the third data visualization, updating a visual characteristic of the subset of data fields from the second visual characteristic to the first visual characteristic.
- the third data field is a shared data field that is shared between the first object and the second object.
- the third data field is associated with a dimension logical table.
- the third data field is a dimension data field.
- the third data field is a geographic data field.
- the third data field is a date/time data field.
- the method further comprises, after displaying the third data visualization: in response to receiving (i) user selection of a fourth data field icon from the subset of data field icons, corresponding to a fourth data field, and (ii) placement of the fourth data field icon in the shelf region: executing a third query that specifies an aggregation of data values of the fourth data field according to the third data field to generate a fourth data visualization, and displaying, in the user interface, the fourth data visualization.
- the fourth data visualization is concurrently displayed with the third data visualization in the user interface.
- a method for generating data visualizations using multi-fact object models is performed at computing device having a display, one or more processors and memory storing one or more programs configured for execution by the one or more processors.
- the method includes receiving a first user input specifying a first dimension data field and a second dimension data field for generating a first data visualization.
- the method includes determining that the first dimension data field belongs to a first object of an object model and the second dimension data field belongs to a second object of the object model, distinct from the first object.
- the method includes constructing a dimension subquery according to characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object.
- the constructing includes determining a join type for combining (i) first data rows that include data values of the first dimension data field and (ii) second data rows that include data values of the second dimension data field; and constructing the dimension subquery according to the determined join type, the dimension subquery referencing the first object and the second object.
- the method includes executing the dimension subquery against one or more data sources corresponding to the first dimension data field and the second dimension data field to retrieve first tuples that comprise unique ordered combinations of data values for the first dimension data field and the second dimension data field.
- the method includes constructing one or more measure subqueries, each of the measure subqueries referencing one or more measure data fields in the object model.
- the method includes executing the one or more measure subqueries to retrieve second tuples.
- the method includes forming extended tuples by combining the retrieved first tuples and the retrieved second tuples.
- the method also includes generating and displaying the first data visualization according to the extended tuples.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the object determining includes: when (i) the first dimension data field can be traced to one root object and (ii) the second dimension data field can be traced to the same root object, combining data columns of the first dimension data field and the second dimension data field using an inner join.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: when (i) the first dimension data field can be traced to a first root object and (ii) the second dimension data field can be traced to a second root object that is distinct from the first root object: (a) forming a first object tree that includes the first object and the first root object, and combining data columns from objects of the first object tree according to data values of the first dimension data field using an inner join to form a first table; (b) forming a second object tree that includes the second object and the second root object, and combining data columns from objects of the second object tree according to data values of the second dimension data field using an inner join to form a second table; and (c) combining data columns of the first table and the second table via a cross join.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: when the first dimension data field and the second dimension data field belong to the same object that is shared by two or more root objects, combining data columns of the first dimension data field and the second dimension data field using an inner join.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: when (i) the first object is shared by a first set of root objects and (ii) the second object is shared by a second set of root objects, combining data columns of the first dimension data field and the second dimension data field using a cross join.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: when (i) the first object is a first root object, (ii) the second object can be traced to the first root object, and the (iii) the second dimension data field is not shared by another root object, combining data columns of the first dimension data field and the second dimension data field using an inner join.
- a first dimension data field and/or the second dimension data field is a geographic data field.
- a first dimension data field and/or the second dimension data field is a date/time data field.
- the one or more data sources comprise a plurality of data sources.
- a computing device includes one or more processors, memory, and one or more programs stored in the memory.
- the programs are configured for execution by the one or more processors.
- the one or more programs include instructions for performing any of the methods described herein.
- a non-transitory computer-readable storage medium stores one or more programs configured for execution by a computing device having one or more processors and memory.
- the one or more programs include instructions for performing any of the methods described herein.
- Figure 1 A illustrates a workflow for creating and consuming a data model that spans multiple sets of facts, in accordance with some implementation.
- Figure IB illustrates a data model, in accordance with some implementations.
- Figure 1C illustrates visual differences between existing data models and data models of the present disclosure, in accordance with some implementations.
- Figure 2 is a block diagram of a computing device according to some implementations.
- Figure 3 is a block diagram of a server system according to some implementations.
- Figures 4A and 4B illustrate separate fact tables that share dimension logical tables, in accordance with some implementations.
- Figures 5 A - 5D illustrate adding a new tree to an existing object model and adding a new relationship, in accordance with some implementations.
- Figures 6A and 6B illustrate adding a new relationship between objects in an existing object model, in accordance with some implementations.
- Figure 7 illustrates removing a relationship between obj ects in an existing obj ect model, in accordance with some implementations.
- Figures 8 A - 8C illustrate changing a relationship between obj ects in an existing object model, in accordance with some implementations.
- Figures 9 A - 9C illustrate rearrangement of a fact subtree, in accordance with some implementations.
- Figure 10 illustrates visualizing objects that are connected to a root object, in accordance with some implementations.
- Figure 11 illustrates visualizing unshared obj ects that are connected to a selected object, in accordance with some implementations.
- Figure 12 illustrates swapping unshared tables with a base table, in accordance with some implementations.
- Figures 13 A - 13U provide a series of screenshots that illustrate user interactions with a data modeling graphical user interface for building (e.g., constructing) multi-fact data models, in accordance with some implementations.
- Figures 14A and 14B illustrate a data model with seven logical tables, in accordance with some implementations.
- Figures 15A - 15C illustrate a data model, in accordance with some implementations.
- Figures 16A - 16H provide a series of screenshots that illustrate user interactions with a data analytics user interface, in accordance with some implementations.
- Figures 17A - 17E provide a series of screenshots that illustrate user interactions with a data analytics user interface, in accordance with some implementations.
- Figures 18 A - 181 provide a series of screenshots that illustrate user interactions with a data analytics user interface, in accordance with some implementations.
- Figures 19A - 19G illustrate steps of a query generation algorithm, in accordance with some implementations.
- Figures 20A - 201 illustrate an example of generating a query, in accordance with some implementations.
- Figures 21 A - 21C illustrate dimension-measure subgraphs, in accordance with some implementations.
- Figure 22 illustrates a multi-fact object model, in accordance with some implementations.
- Figure 23 illustrates a multi-fact object model, in accordance with some implementations.
- Figures 24A - 24H illustrate exemplary data tables or data visualizations that are generated from multi -fact object model analysis.
- Figure 25 illustrates a multi-fact object model, in accordance with some implementations.
- Figure 26 illustrates creating a join logical operation for tree subqueries with shared nodes, in accordance with some implementations.
- Figure 27 illustrates creating a join logical operation for tree subqueries with no shared nodes, in accordance with some implementations.
- Figures 28A - 28E provide a flowchart of a method for generating object models (e.g., data models) that span multiple fact tables, in accordance with some implementations.
- object models e.g., data models
- Figures 29A - 29D provide a flowchart of a method for performing guided analysis using multi-fact object models (e.g., data models), in accordance with some implementations.
- multi-fact object models e.g., data models
- Figures 30A - 30D provide a flowchart of a method for generating data visualizations using multi-fact object models (e.g., data models), in accordance with some implementations.
- multi-fact object models e.g., data models
- Enterprises need a holistic understanding of their data to effectively manage their businesses. Oftentimes, their data exists as “silos,” in the form of separate fact tables, with common dimensions (e.g., dimension data fields or dimension fields) such as time and geography. For example, an organization may have a marketing department that controls a marketing campaign for products and a sales department that owns the sales of these products. Having the ability to combine data from a marketing fact table and a sales fact table can provide a complete picture about the effectiveness of the marketing campaign and its impact on sales.
- Some data analytics applications restrict analysis to a single set of facts, limiting the questions analysts can ask and imposing a maintenance burden on data stewards who must support workarounds.
- the disclosed implementations address deficiencies in current systems by providing improved methods, devices, systems, and user interfaces that enable the creation of and consumption of data models that span multiple fact tables.
- FIG. 1 A illustrates a workflow 100 for creating and consuming a data model that spans multiple sets of facts, in accordance with some implementations.
- the workflow 100 can be thought of as including three phases, namely a data modeling phase (step 102), a deployment phase (step 106), and an analytics phase (step 108).
- the data modeling phase comprises constructing (104) (e.g., generating) (e.g., by a data modeler) a multi -fact object model.
- a multi-fact object model refers to an object model that includes multiple (e.g., at least two) fact tables.
- a computing device executes a data visualization application 230 that includes a data modeling user interface 240 for generating multi-fact object models.
- Figure 1 A illustrates a data model 120 that is generated and displayed by the data modeling user interface 240, in accordance with some implementations.
- the workflow 100 includes an analytics phase 108.
- a computing device executes a data visualization application 230 that includes a data analytics user interface 250 for performing the analytics phase.
- Figure 1 A shows that in some implementations, the computing device receives (110) (e.g., via the user interface 250), user specification of one or more data fields for generating a data visualization (or a visual specification 252).
- the computing device is configured to apply (112) an object model (e.g., a multi-fact object model that is generated in the data modeling phase 102) to determine query semantics.
- the computing device constructs (114) and executes one or more queries according to the determined query semantics.
- the computing device then generates (116) and displays a data visualization.
- Figure 1A illustrates an example data visualization 130 that is generated and displayed on the data analytics user interface 250, in accordance with some implementations.
- Figure IB illustrates an object model 150 in accordance with some implementations.
- An object model can be thought of as a diagram that informs a data visualization application how it should query data in the connected database tables.
- an object model is a simple model with a single table.
- an object model is a complex model with multiple tables that use different combinations of relationships, joins, and unions.
- the object model 150 has two layers, namely a logical layer 160 and a physical layer 170.
- the physical layer 170 is referred to as the “Data Model”
- the logical layer 160 is referred to as the “Object Model.” In most cases here the focus is on the logical layer 160.
- the default view that a user (e.g., a data modeler) sees in a data modeling user interface 240 of the data visualization application 230 is the logical layer 160.
- the logical layer 160 includes a logical table A 162-1 (e.g., an object) and a logical table B 162-2 (e.g., another object). Data in the logical layer 160 is combined using relationships 164 (also referred to as “noodles” in the present disclosure).
- relationships 164 also referred to as “noodles” in the present disclosure.
- the physical layer 170 data is combined between tables using joins and unions.
- Each logical table 162 in the logical layer 160 contains data from at least one physical table in the physical layer 170.
- the logical table A 162-1 is made of four tables 172-1, 172-2, 172-3, and 172-4 whereas the logical table B 162-2 is made of one table 172-5.
- Tables at the logical layer are not merged in the data source; they remain distinct (e.g., normalized) and maintain their native level of detail.
- the terms “object model” and “data model” are generally used interchangeably.
- the logical layer 160 is also referred to as a semantic layer.
- the data modeling capabilities disclosed herein create flexible data sources built around relationships. Relationships combine data from different tables by looking at what columns (fields) those tables have in common and using that information to bring information from each table together in the analysis. Unlike joins or unions, relationships form a data source without flattening multiple tables into a single table. Because of this, related data sources know which table each field is from. That means each field keeps its context, or level of detail. Related data sources can therefore handle tables with different granularity without issues of duplication or data loss. In a related data source, the joins are not fixed up front. Instead of merging all the data (and having to work with all the data regardless of what each visualization requires), only the relevant data is combined as necessary (e.g., per data visualization).
- the data visualization application evaluates the relationships of the relevant fields and tables. Those relationships are used to write queries with the correct join types, aggregations, and null handling. Users can think about how the data fits together and what questions they want to answer, rather than how to combine the data or compensate for artifacts from the data source. Relationships do not replace the previous ways of combining data, such as via joins, unions, and blends. Rather, relationships are a novel, flexible way to bring data together from multiple sources.
- Figure 1C illustrates conceptually differences between existing data models (top diagram) and data models (bottom diagram) of the present disclosure, in accordance with some implementations. Compared to existing data models, the layout of the multi-fact data models disclosed herein is simpler and easier to understand.
- FIG. 2 is a block diagram of a computing device 200 (e.g., a client device) that can execute a data visualization application 230 or a data visualization web application to display a data visualization.
- the computing device displays a graphical user interface 232 for the data visualization application 230.
- the graphical user interface 232 includes a data modeling user interface 240.
- the graphical user interface 232 includes a data analytics user interface 250.
- the computing device 200 may be a desktop computer, a laptop computer, a tablet computer, or other computing device with a display and a processor capable of running a data visualization application 230.
- the data visualization application 230 may include a data source generator for database organization (e.g., generating object models for databases) as well as generating new data sources using existing databases.
- a computing device 200 typically includes one or more processing units/cores (CPUs) 202 for executing modules, programs, and/or instructions stored in the memory 214 and thereby performing processing operations, one or more network or other communications interfaces 204, memory 214, and one or more communication buses 212 for interconnecting these components.
- the communication buses 212 may include circuitry that interconnects and controls communications between system components.
- a computing device 200 includes a user interface 206 comprising a display 208 and one or more input devices or mechanisms 210. In some implementations, the input device/mechanism includes a keyboard.
- the input device/mechanism includes a “soft” keyboard, which is displayed as needed on the display 208, enabling a user to “press keys” that appear on the display 208.
- the display 208 and input device/mechanism 210 comprise a touch screen display (also called a touch sensitive display).
- the display is an integrated part of the computing device 200. In some implementations, the display is a separate display device.
- the memory 214 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM or other random-access solid-state memory devices.
- the memory 214 includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
- the memory 214 includes one or more storage devices remotely located from the CPUs 202.
- the memory 214, or alternatively the non-volatile memory devices within the memory 214 comprises a non- transitory computer readable storage medium.
- the memory 214, or the computer readable storage medium of the memory 214 stores the following programs, modules, and data structures, or a subset thereof:
- an operating system 216 which includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a communication module 218, which is used for connecting the computing device 200 to other computers and devices via the one or more communication network interfaces 204 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a web browser 220 (or other client application), which enables a user to communicate over a network with remote computers or devices;
- a data visualization application 230 which provides a graphical user interface 232 for a user to perform data analysis, including constructing databases, constructing object models, and constructing visual graphics (e.g., an individual data visualization or a dashboard with a plurality of related data visualizations).
- the data visualization application 230 executes as a standalone application (e.g., a desktop application).
- the data visualization application 230 executes within the web browser
- the data visualization application 230 includes: o a graphical user interface 232, which enables a user to access or build object models and data sources, and also provides a graphical view to build data visualizations by specifying elements visually.
- the graphical user interface 232 includes a data modeling user interface 240 for accessing or building object models and data sources.
- the graphical user interface 232 includes a data analytics user interface 250 for building data visualizations by specifying elements (e.g., data fields) visually; o an object model generator 234, which generates object models that include a plurality of objects (e.g., object classes).
- the object model can be generated from one or more databases, and each object in the object model can be generated from one or more data tables (e.g., physical tables) or one or more data fields.
- An object icon represents a logical combination of one or more data tables.
- an object that is represented by an object icon may include one or more data fields from a data table.
- an object that is represented by an object icon may be constructed by combining (e.g., a left join, a right join, an inner join, a union, or an intersection) two data tables to one another.
- the object model generator 234 includes a relationship cardinality module 236, which determines the cardinality of a relationship between two object classes.
- the relationship cardinality module 236 may determine that a many-to-many relationship has been detected or that a many- to-one relationship has been detected.
- the object model generator 234 also includes a relationship referential integrity module 238, which analyzes and compares data values in linking fields (e.g., linking data fields) from two object classes that are joined in a relationship to identify matched data values and unmatched data values; o an object model visualization generator 242, which generates (e.g., constructs) a visualization of an object model 268 for display in the object model visualization region 120.
- the object model visualization generator 242 generates object icons 1322 (e.g., Figure 13) corresponding to objects in the object models 268, and generates visual representations of relationships between two objects in the object models 268; a query semantics module 244, which uses object models to determine query semantics in response to receiving user specification of one or more data fields for generating a data visualization or a visual specification 252. Query semantics specify the database queries to retrieve data from the physical layer 170; a visualization generation module 246, which generates data visualizations and/or data dashboards. In some implementations, the visualization generation module 246 generates and displays data visualizations according to visual specifications.
- object icons 1322 e.g., Figure 13
- Query semantics module 244 which uses object models to determine query semantics in response to receiving user specification of one or more data fields for generating a data visualization or a visual specification 252.
- Query semantics specify the database queries to retrieve data from the physical layer 170; a visualization generation module 246, which generates data visualizations and/or data dashboards.
- the visualization generation module 246 uses an object model to generate queries (e.g., dimension subqueries, aggregated measure subqueries, and/or final queries) and/or optimize queries using query optimizers; and visual specifications 252, which are used to define characteristics of a desired data visualization.
- a visual specification 252 is built using the user interface 250.
- a visual specification includes identified data sources 262 (i.e., specifies what the data sources are), which provide enough information to find the data sources 262 (e.g., a data source name or network full path name).
- a visual specification 252 also includes visual variables and the assigned data fields for each of the visual variables.
- a visual specification 252 has visual variables corresponding to each of the shelf regions 1612 (e.g., the columns shelf 1612-1 and the rows shelf 1612-2 in Figure 16A).
- the visual variables include other information such as context information about the computing device 200, user preference information, or other data visualization features that are not implemented as shelf regions (e.g., analytic features),
- the data visualization application 230 or web application 330 groups together the user-selected data fields according to the object model. Such groups are called data field sets. In many cases, all of the user-selected data fields are in a single data field set. In some instances, there are two or more data field sets. Each measure m is in exactly one data field set, but each dimension d may be in more than one data field set;
- Each data source 262 includes one or more data tables 264, and each of the data tables includes one or more data fields 266.
- Each object model 268 includes a plurality of objects 270 (e.g., logical tables) that are connected to one another by relationship(s) 272 (e.g., noodles).
- Each of the above identified executable modules, applications, or set of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
- the above identified modules or programs i.e., sets of instructions
- the memory 214 stores a subset of the modules and data structures identified above.
- the memory 214 stores additional modules or data structures not described above.
- Figure 2 shows a computing device 200
- Figure 2 is intended more as a functional description of the various features that may be present rather than as a structural schematic of the implementations described herein.
- items shown separately could be combined and some items could be separated.
- FIG. 3 is a block diagram of a server system 300 in accordance with some implementations.
- a server system 300 may host one or more databases 260 or may provide various executable applications or modules.
- a server 300 typically includes one or more processing units/cores (CPUs) 302, one or more network interfaces 304, memory 314, and one or more communication buses 312 for interconnecting these components.
- the server 300 includes a user interface 306, which includes a display 308 and one or more input devices 310, such as a keyboard and a mouse.
- the communication buses 312 include circuitry (sometimes called a chipset) that interconnects and controls communications between system components.
- the memory 314 includes high-speed random-access memory, such as DRAM, SRAM, DDR RAM, or other random-access solid-state memory devices.
- the memory includes non-volatile memory, such as one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, or other non-volatile solid-state storage devices.
- the memory 314 includes one or more storage devices remotely located from the CPU(s) 302.
- the memory 314, or alternatively the non-volatile memory devices within the memory 314, comprises a non- transitory computer readable storage medium.
- the memory 314, or the computer readable storage medium of the memory 314, stores the following programs, modules, and data structures, or a subset thereof:
- an operating system 316 which includes procedures for handling various basic system services and for performing hardware dependent tasks;
- a network communication module 318 which is used for connecting the server 300 to other computers via the one or more communication network interfaces 304 (wired or wireless) and one or more communication networks, such as the Internet, other wide area networks, local area networks, metropolitan area networks, and so on;
- a web server 320 (such as an HTTP server), which receives web requests from users and responds by providing responsive web pages or other resources;
- a data visualization web application 330 which may be downloaded and executed by a web browser 220 on a user’s computing device 200.
- a data visualization web application 330 has the same functionality as a desktop data visualization application 230, but provides the flexibility of access from any device at any location with network connectivity, and does not require installation and maintenance.
- the data visualization web application 330 includes various software modules to perform certain tasks.
- the data visualization web application 330 includes a user interface module 332, which provides the user interface for all aspects of the data visualization web application 330.
- the user interface module 332 includes a data modeling interface module 340.
- the user interface module 332 includes a data analytics interface module 350.
- the data visualization web application 330 includes an object model generator 334 and an object model visualization generator 342, as described above with respect to the object model generator 234 and the object model visualization generator 242 in Figure 2.
- the object model generator 334 includes a relationship cardinality module 336 and a relationship referential integrity module 338, both of which are described above with respect to the relationship cardinality module 236 and the relationship referential integrity module 238 in Figure 2.
- the data visualization web application 330 includes a query semantics module 344 and a visualization generation module 346, as described above with respect to the query semantics module 244 and the visualization generation module 246 in Figure 2;
- a data retrieval module 348 which builds and executes queries to retrieve data from one or more databases 260.
- the databases 260 may be stored locally on the server 300 or stored at an external database system.
- the data retrieval module 348 may retrieve data from a database 260 that stores one or more data sources 262 such that the data tables 264 and the data fields 266 from the data source 262 can be used to build an object model;
- the databases 260 may store data sources 262, which provide the data used in the generated data visualizations.
- a database 260 may store an object model 268 that includes a plurality of objects 270 that are linked to one another via one or more relationships 272, and the objects 270 may be formed from data fields 266 from the one or more databases 260 or data sources 262.
- the databases 260 may store data in many different formats, and commonly include many distinct tables 264, each with a plurality of data fields 266. Some databases 260 comprise a single table.
- the data fields 266 in the database 260 include both raw fields from the database 260 (e.g., a column from a database table or a column from a spreadsheet) as well as derived data fields, which may be computed or constructed from one or more other data fields.
- derived data fields include computing a month or quarter from a date field, computing a span of time between two date fields, computing cumulative totals for a quantitative field, computing percent growth, and so on.
- derived data fields are accessed by stored procedures or views in the database.
- the definitions of derived data fields 266 are stored separately from the data source 262.
- the database 260 stores a set of user preferences for each user.
- the user preferences may be used when the data visualization web application 330 (or desktop data visualization application 230) makes recommendations about how to view a set of data fields 266.
- the database 260 stores a data visualization history log, which stores information about each data visualization generated. [00120] In some implementations, the database 260 stores other information, including other information used by the data visualization application 230 or data visualization web application 330.
- the databases 260 may be separate from the server system 300 or may be included with the server system (or both).
- Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
- the above identified modules or programs i.e., sets of instructions
- the memory 314 stores a subset of the modules and data structures identified above.
- the memory 314 stores additional modules or data structures not described above.
- Figure 3 shows a server system 300
- Figure 3 is intended more as a functional description of the various features that may be present rather than as a structural schematic of the implementations described herein.
- items shown separately could be combined and some items could be separated.
- some of the programs, functions, procedures, or data shown above with respect to a server 300 may be stored or executed on a computing device 200.
- the functionality and/or data may be allocated between a computing device 200 and one or more servers 300.
- Figure 3 need not represent a single physical device.
- the server functionality is allocated across multiple physical devices that comprise a server system.
- references to a “server” or “data visualization server” include various groups, collections, or arrays of servers that provide the described functionality, and the physical servers need not be physically collocated (e.g., the individual physical devices could be spread throughout the United States or throughout the world).
- a data model contains an object graph.
- the nodes in the graph are called objects and the links are called relationships.
- the graph is also referred to as a “tree.”
- object model and “data model” are generally used interchangeably.
- An object is a logical table in an object model. Objects are built through physical modeling. For example, an object may contain a join of two Oracle tables. When considering semantics, objects are treated as opaque, meaning that it is only necessary to be able to query an object and get its schema. In this disclosure, the terms “object” and “logical table” are used interchangeably.
- a relationship is a link for combining objects. Relationships link the objects on one or more pairs of object fields.
- a data model with shared objects can block filter flow across shared objects. This is in contrast to existing data models, in which there exists one type of relationship and a filter that is applied to one object flows across all relationships between objects. Relationships will eventually get compiled into joins of various types.
- information about the relationship cardinality (e.g., many-to-one, many-to-many, one-to-many, or one-to-one) is known to the data visualization application 230 and/or the data visualization web application 330.
- the data visualization application 230 assumes all relationships are many-to-many if their cardinality is not known.
- the referential integrity of relationships is known.
- the data visualization application may know that inner joining one logical table to another will not cause the first to lose rows due to unmatched keys. Without this information, a data visualization application does not assume any guarantees about matches.
- a query includes calculations and/or filters, which are defined in terms of object fields or other calculations.
- a calculation includes a field name and a formula. “Layering on a calculation” means applying the formula on top of a query to output a column with the given field name defined by the formula. If a field with that name already exists, the calculation will overwrite it.
- Every tree has a root table.
- the root table always starts from the left-most side of the object model.
- a shared tree is a tree that can be traced back to two or more root tables. A shared tree does not include the root table.
- a shared object is an object that is in common with two or more related trees. For example, in Figure 4A, the data fields “date” (represented as “DimDate” in Figure 4A, or dimension field “Date”) and “sites” (represented as “DimSites” in Figure 4A, or dimension field “Sites”) are shared objects.
- Tree traversal is directional. When we trace an object back to its root, we trace only in the leftward direction
- a shared object exists in context. For example, if there are two or more facttrees that are used to generate a data visualization, but a shared object is used with only one fact tree, that shared object would use the same semantics as if it is unshared. This is discussed in greater detail in Section IV.
- Shared objects can be connected together into their own subtree. Only one shared object in a shared subtree (see below) can be connected to one unshared object in a facttree.
- Fact (sub)tree Some implementations of the present disclosure enable a data model to support multiple fact trees. Fact trees are combined at the shared objects. In the example of Figure 4A, there are two fact trees: (1) Inventory, DimDate, DimSites, and (2) Sales, DimDate, DimSites
- Unshared subtree An unshared subtree is composed of all related (e.g., connected) objects in a fact tree that are not shared between fact trees.
- the unshared subtrees are (1) Inventory and (2) Sales.
- Shared subtree A shared subtree is composed of all related (e.g., connected) objects in a fact tree that are shared between fact trees.
- a dimension is a field that can be considered an independent variable.
- a dimension data field contains qualitative or categorical information.
- a dimension data field cannot be aggregated except for counting. Some examples of dimensions are “date,” “region,” “customer name,” “sales type,” “order ID,” “age,” and “longitude.”
- a measure is a field that is a dependent variable. That is, its value is a function of one or more dimensions (e.g., dimension data field).
- a measure field is one that contains numerical (e.g., quantitative) information. Examples of measure fields are “sales,” “revenue,” “price,” and “spend.”
- the present disclosure improves existing data modeling experience by enabling analysts to (i) create a data model (also referred to as an object model) that has multiple related trees; (ii) create relationships independently of adding an object in order to share an object between trees; (iii) identify objects (e.g., logical table(s)) and relationships within a tree; and (iv) identify connected trees and objects related to a shared object.
- a data model also referred to as an object model
- FIG. 4A shows an inventory logical table 402 (a fact table) and a Sales logical table 404 (another fact table).
- the inventory table 402 and the Sales table 404 have separate facts that do not directly relate to each other. However, they both share a dimensional date (DimDate) object 406 (a logical table) and a dimensional sites (DimSites) object 408 (another logical table).
- DIDate dimensional date
- DISites dimensional sites
- Figure 4B illustrates that, in existing data modeling tools, in order to aggregate measures from the inventory fact table and the sales fact table using common dimensions, an analyst has to (i) generate (412) a first visualization by aggregating measures from the inventory table 402 using the dimensions date and sites, (ii) generate (414) a second visualization by aggregating measures from the sales table 404 using the dimensions date and sites; and (iii) juxtapose (416) the first and second data visualizations onto a data dashboard.
- the same object can be connected to multiple fact trees as a “shared object.”
- Some aspects of the present disclosure support existing relationship semantics within the same fact tree (i.e., relationships between logical tables, join/uni on between physical tables)
- Some aspects of the present disclosure support shared objects between some (and not all) fact trees. [00147] Some aspects of the present disclosure support shared objects that can have their own subtrees.
- characteristics / properties of the multi-tree approach include:
- the left-most object is the root (e.g., root table or root object). There can be one and only one root per tree at the far-left side of the data model, which will be used to describe the fact (sub)tree.
- Each tree is derived by starting from the root and then traversing all relationships going away from the root to the end of every branch, but never going backwards.
- Each object must belong to at least one tree.
- a single object can be a tree by itself.
- Each object can have zero or one relationship path back to every root object.
- Each shared object is defined as relating back to two or more root objects.
- Root objects cannot be directly related to each other.
- Scenario 1 Create a new data model knowing that it is multi-fact using shared objects
- Scenario 2 Modify a data model (not using shared objects) to have multi -fact and using shared objects
- Scenario 3 Convert multiple single-fact data sources into one data source that is multi-fact and has shared tables
- Scenario 4 Remove multi-fact and shared objects to from a single-fact data model.
- Scenario 1 Two (or more) fact subtrees with two (or more) shared objects.
- adding an object would also add a relationship.
- some implementations of the present disclosure enable adding relationships independently of objects. For example, in some implementations, relationships can be added across (sub)trees, thereby changing (e.g., converting) an object into a shared object. In some implementations, a relationship can be added to link another tree to an existing shared object.
- Figures 5 A, 5B, 5C, and 5D illustrate adding a new root to an object model 510, in accordance with some embodiments.
- the object model 510 is displayed in data modeling user interface 240 or data modeling user interface 340.
- Figure 5A shows the object model 510 includes inventory object 512, a date object 514, and a sites object 516.
- Data in the inventory object 512 can be combined with data in the date object 514 via a first relationship 522.
- Data in the inventory object 512 can be combined with data in the sites object 516 via a second relationship 524.
- the inventory object 512, the date object 514, and the sites object 516 form an “Inventory” tree.
- the data modeling user interface 240 or data modeling user interface 340 can display an icon 518 (e.g., a drop zone “+ New Tree”) for creating a new tree.
- Figure 5A illustrates placement (523) of a sales object 520 over the icon 518 to add a new tree (e.g., “Sales”) to the object model.
- Figure 5B shows that a new “Sales” tree has been added. Sales can become its own separate tree or it can be related to the Inventory tree via shared objects.
- FIG. 5B illustrates that the Sales tree, which consists of the sales object 520, is independent of the Inventory tree.
- Figure 5B also shows that in some implementations, in response to user interaction with the sales object 520 (e.g., at its right), a freeform line 526 is generated and displayed.
- Figure 5C illustrates the capability to add relationships independently of objects from the sales tree to another tree (e.g., the inventory tree).
- the data model in response to the user bringing the freeform line 526 to the date object 514, the data model creates a relationship 528 between the sales object 520 and the date object 514.
- the newly added relationship 528 makes the date object 514 a shared object between the Inventory fact tree (or fact subtree) and the Sales fact tree (or fact subtree).
- Figure 6A illustrates an object model 610, which includes an inventory root object 612 and a sales root object 614.
- the date object 616 is shared with the Inventory object 612 and the Sales object 614 via relationships 624 and 626, respectively.
- the products object 618 is related to the Sales 614 via another relationship 630.
- the sites object 620 is related to the Inventory 612 via the relationship 628.
- a context menu on an object can be utilized to add a new relationship.
- Figure 6 A illustrates that a context menu 622, corresponding to the Products 618, can be accessed in response to user selection of an icon 621 on the Products object 618.
- FIG. 6B the Inventory object 612 and the Sites object 620 are highlighted because either object can accept a relationship from the Products object 618.
- Other objects cannot relate to the Products object 618 because the Sales object 614 is already related to the Products object 618 and the Date object 616 is already related to Sales (cross branch within the same subtree).
- a freeform line 632 can extend from the right of the Products object 618 to connect to the Sites object 620 to create a relationship between these two objects.
- FIG. 7 illustrates that a relationship in an object model 700 can be removed if there are two or more relationships to the left of a current object.
- the Sites object 710 has, to its left, (i) a first relationship 718 with the budget object 712, (ii) a second relationship 720 with the inventory object 714, and (iii) a third relationship 722 with the products object 716.
- Figure 7 illustrates that, in response to user selection of the relationship 720 (e.g., via a mouse click), a menu 724 is displayed that enables the user to remove the relationship.
- any of the relationships can be deleted.
- the option to remove a relationship is not available when there is only one subtree.
- a shared object becomes “unshared” when its last remaining relationship is related to one tree. In some implementations, the last remaining relationship to the left of a current object cannot be deleted.
- objects that are downstream of a current object become unshared if they are related to only one (sub)tree.
- a relationship is a link for combining objects.
- a relationship can be changed by modifying either end of the link.
- Figure 8A shows a data model 800 that includes an inventory object 802 and a date object 804 connected by a relationship 806.
- the data modeling user interface 240 displays a menu 807 that includes options for removing the relationship, changing the left table, or changing the right table.
- FIG. 8B shows that in response to user selection at the right end of the relationship 806, the connection between Inventory 802 and Date 804 is removed and a freeform line 808 is created.
- Figure 8C illustrates a connection 810 (e.g., relationship) is created between the Inventory object 802 and the Finance object 812 via manipulation of the freeform line 808.
- the finance object 812 becomes a shared object between the Budget object 814 subtree and the Inventory object 802 subtree.
- Core scenario 2 Enable rearrangement of a fact subtree so a data modeler can share an object that is currently set as root.
- a user e.g., data modeler
- Figure 9A illustrates an exemplary complex data model 900.
- the budget object 902 and the products object 904 are not shared in the data model 900, they can become a root in their respective subtrees.
- Figure 9B shows that the Budget object 902 has a Finance branch 906 and a Sites branch 908.
- Figure 9C shows that making Budget 902 a new root causes Finance to be a new branch as well as Sites and its shared subtree with Employees and States.
- Core scenario 3 Data models can become complex and users would need to be able to take different perspectives to see the extent of a fact subtree or to see what facts are already related to the current shared objects.
- Figure 10 illustrates an object model 1000 according to some implementations.
- Figure 10 shows that in some implementations, when a user selects the Appointments object 1002, all objects that are connected to the Appointments object (e.g., Appointment Type 1004, Billing Category 1006, MonthFK 1008, ProviderNumber 1010, Room 1012, DemographicNumber 1014, FiscalYear 1016, ServiceTypelD 1018, CostCode 1020, SubCategory 1022, and Category 1024), as well as the respective relationship connectors, are visually emphasized.
- Appointment Type 1004 Billing Category 1006, MonthFK 1008, ProviderNumber 1010, Room 1012, DemographicNumber 1014, FiscalYear 1016, ServiceTypelD 1018, CostCode 1020, SubCategory 1022, and Category 1024
- an unshared object of an object model 1000 is selected (e.g., Ratings 1026, Channel 1028, and/or Feedback 1030)
- unshared objects e.g., such as Ratings 1026, Channel 1028, and/or Feedback 1030
- shared objects e.g., DemographicNumber 1024, ServiceTypelD 1018, CostCode 1020, SubCategory 1022, Category 1024, MonthFK 1008, and FiscalYear 1016).
- FIG 11 illustrates the data model 1000 that is shown in Figure 10, from the perspective of MonthFK 1018.
- a user can identify that root objects (i.e., the objects on the leftmost column) Ratings 1022, Spend 1034, Invoice 1028, and Appointments 1036, and the FiscalYear object 1016 are shared with MonthFK 1018 whereas the other objects in the object model 1000 are unshared.
- the data modeler can bring in additional unshared object(s), or to remove relationship(s) with existing object(s).
- Figure 11 illustrates that in some implementations, the data modeling user interface 240 includes an affordance 1102 that enables a user to select which root objects (e.g., Ratings 1026, Spend 1034, Invoice 1028, and Appointments 1036) the user would like to view on the user interface.
- root objects e.g., Ratings 1026, Spend 1034, Invoice 1028, and Appointments 1036
- a user can swap an unshared table of an object model with a base table of the object model.
- Figure 12 illustrates a data model 1200 where there are enough tables that swap with base tables and reparent have the freedom to work.
- the base tables are the leftmost tables: Inventory 1202, Marketing 1204, Sales 1206, and Support 1208.
- the unshared tables are those that can be traced back to only one base table: Inventory 1202, Marketing 1204, Sales 1206, Store 1214, Region 1222, Support 1208, and Parts 1216.
- the shared trees of tables are those that group together within their own trees starting with one table that is shared with multiple base tables.
- the single shared trees are Supplier 1218, Dates 1210, and Customer 1212.
- the multiple shared trees are Product 1220, Subcategory 1224, Category 1226, and Components 1228.
- any unshared table can swap with its base table.
- Store 1214 and Region 1222 can swap with Sales 1206; Parts 1216 can swap with Support 1208.
- a user can reparent a table within its own tree.
- an unshared table that can swap with its base table can reparent itself to any other unshared table that stems from the same base table.
- Region 1222 can be reparented to Sales 1206.
- a shared table can be reparented within its own shared tree.
- Category 1226 can be reparented to Components 1228; Components 1228 can be reparented to Subcategory 1224 or Category 1226.
- Figures 13 A - 13U provide a series of screenshots that illustrate user interactions with a data modeling graphical user interface 240 for building (e.g., constructing) multi-fact data models, in accordance with some implementations.
- the graphical user interface 240 is used for validating relationships between objects (also referred to as object classes or logical tables) in an object model.
- the user interface 240 includes a connections region 1302, a tables region 1304, an object model visualization region 1306, a data field/metadata region 1308, and a data region 1310, in accordance with some example implementations.
- the connections region 1302 includes one or more user-selectable data sources 1312.
- the tables region 1304 includes one or more icons 1314. Each of the icons 1314 represents a respective data table from one or more data sources 1312 that are selected in the connections region 1302.
- the object model visualization region 1306 shows an object model 1320 that includes a plurality of object icons 1322 (e.g., 1322-1 to 1322-9 in Figure 13A).
- Each object icon 1322 represents a respective object (e.g., logical table or object class) in the object model 1320, and each object includes respective one or more data fields.
- the object model visualization region 1306 also displays visual connections 1324 (e.g., connected links or connectors) between connected object icons 1322. Each of the visual connections 1324 represents a respective relationship between the connected objects corresponding to the object icons 1322..
- the object model 1320 is a multi -fact model that includes multiple fact tables “Inventory,” “Marketing,” and “Sales,” corresponding to object icons 1332-1 to 1332-3.
- the fact tables are also referred to as root tables (or root objects).
- object icons corresponding to fact tables are located in the left-most portion of the object model visualization region 1306.
- the object icons corresponding to the fact tables are arranged in an alphabetical order in the user interface 240.
- the data field/metadata region 1308 displays information regarding the data fields and/or metadata of a selected object that is represented in the object model visualization region 130 as an object icon 1306.
- Figure 13A shows that in response to user selection of object icon 1322-1, corresponding to “Inventory” object, the data field/metadata region 1308 displays a table 1326 that provides information about the fields in the Inventory logical table, such as their field names, field type, the physical table in which a respective field is located, and metadata information such as the remote field name.
- the data region 1310 displays information regarding data fields and data values of a selected object that is represented in the object model visualization region 130 as an object icon 1306.
- Figure 13A also shows that in response to user selection of object icon 1322-1, the data region 1310 displays a table 1328 that includes information regarding data fields corresponding to the Inventory logical table and their respective data values.
- Figure 13B shows that, in response to a user interaction (e.g., user selection or mouse hover) with a visual connection 1324-1, the user interface 240 displays a tooltip 1330 that provides information about the logical table(s) that are connected by the visual connection 1324-1, the cardinality (e.g., many-to-many, many-to-one, one-to-many), and the related fields.
- a user interaction e.g., user selection or mouse hover
- the user interface 240 displays a tooltip 1330 that provides information about the logical table(s) that are connected by the visual connection 1324-1, the cardinality (e.g., many-to-many, many-to-one, one-to-many), and the related fields.
- Figure 13C illustrates a user hovering over (1332) (e.g., via a mouse) an object icon 1322-2 corresponding to the “Marketing” object.
- Figure 13C shows that, in response to the user interaction, the user interface 240 displays a subset of object icons 1322-4, 1322-5, 1322-8, and 1322-9 of the object model, corresponding to objects Dates, Products, Subcategory, and Category, that are connected to the “Marketing” object.
- the user interface 240 also displays the visual connections 1324 of the subset of object icons.
- the other object icons and connectors that are not connected to the “Marketing” object are visually de-emphasized in the user interface 240.
- the user interface 240 displays a tooltip 1332 that invites the user to double click on the object icon 1322-2 to view the physical table.
- Figures 13D to 13S illustrate the process of adding a “Support’ fact table to the data model 1320, in accordance with some implementations.
- Figure 13D shows that before adding the “Support” fact table to the object model, the “Customer” object (represented by object icon 1322-7) is not shared because it only connects to one root table Sales (represented by object icon 1322-3).
- Figure 13E illustrates user selection (1334) of an icon 1314-1 corresponding to the Support data table (e.g., fact table).
- Figure 13F illustrates a user drag-and-drop action that drags the icon 1314-1 from the tables region 1304 to the object model visualization region 1306.
- Figure 13F also illustrates that as the icon 1314-1 crosses the tables region 1304 into the object model visualization region 1306, the user interface 240 displays an affordance 1336 (e.g., an icon or a drop zone) (e.g., “+ New Base Table”) for adding a new table (or creating a new tree, as described in with reference to Figure 5 A, element 518).
- an affordance 1336 e.g., an icon or a drop zone
- a table e.g., a logical table or an object
- the first way of adding the table to the object model is by relationships (e.g., “noodles”).
- Figures 13G and 13H show that as the user moves the icon 1314-1 toward any of the existing object icons 1322 in the data model 1320, a freeform line 1338 with one end connected to the icon 1314-1 is automatically generated.
- a user can connect the other end of the freeform line 1338 to an object icon of an object to form a relationship between the “Support” logical table and that object.
- Figure 13G shows that it is possible to form a relationship between the Support table (object) and the Sales object.
- Figure 13H shows that it is possible to form a relationship between the Support table (object) and the Customer logical table.
- Figure 131 illustrates that if the icon 1314-1 is placed underneath an existing object (e.g., “Customer”), the data visualization application provides a union option (1340) to combine the two objects (e.g., “Customer” and “Support”).
- an existing object e.g., “Customer”
- the data visualization application provides a union option (1340) to combine the two objects (e.g., “Customer” and “Support”).
- FIGS 13 J and 13K illustrate that placement of the icon 1314-1 on the affordance 1336 ( Figure 13K) adds a new object icon 1322-10, corresponding to the “Support” logical table, to the data model 1320.
- the “Support” logical table exists as a standalone table in the object model because the object icon 1322-10 is not connected to other object icons in the object model 1320.
- Figure 13K shows that, in some implementations, when the data visualization application 230 detects the presence of standalone objects (that are not connected to any other objects) in an object model, the data visualization application 230 causes an alert icon 1342 to be displayed on the user interface.
- Figure 13L illustrates that, in response to user selection (1344) of the alert icon 1342, the user interface 240 displays a tooltip that informs the user of presence of disconnected table in the graph (object model).
- the tooltip includes an alert feature 1348 that can be activated by a user.
- Figure 13M shows that in response to user activation (1350) of the alert feature 1348, the user interface 240 displays the object icon(s) 1322 corresponding to standalone objects (e.g., disconnected objects) in a disconnected tables region 1352 that is separate from the object model visualization region 1306.
- the disconnected tables region 1352 assists the user in the process of identifying disconnected objects and makes the data modeling process more efficient.
- Figures 13N to 13Q illustrate the process of adding a relationship to an object in accordance with some implementations.
- Figure 13N shows that when a user selects a portion (e.g., an edge, a side, or an icon) of the object icon 1322-10, a freeform line 1354 (e.g., a “noodle”) is generated and displayed.
- a freeform line 1354 e.g., a “noodle”
- One end 1356 of the line 1354 is connected to the object icon 1322-10 and the other end 1358 of the line 1354 corresponds to a position of a mouse cursor (e.g., within a predefined margin) in the user interface 240.
- the user interface 240 precents a user from forming a relationship between a root table and one or more shared objects because of limitations in tree traversal strategy, and/or if the relationship would create a nested shared tree.
- Figure 13R illustrates a user interaction to connect the other end 1358 of the freeform line 1354 to the object icon 1322-7 corresponding to the object “Customer.”
- Figure 13S illustrates that, in response to the user interaction, the data visualization application 230 displays a visual connection 1324-2 between the object icon 1322-7 and the object icon 1324- 10.
- the visual connection 1324-2 represents a relationship between these two connected objects. Because the object icon 1324-10 is now connected, there are no longer any standalone objects in the object model. Accordingly, the disconnected tables region 1352 is no longer displayed in the user interface. In some instances, a user can go on to create other relationships.
- the “Support” table may have dates and the user can drag another freeform line 1360 (e.g., by interacting with a portion of the object icon 1324-10) and link it to the “Dates” logical table ( Figure 13T) to form a connection between the “Support” object and the “Dates” object.
- This is illustrated by the visual connector 1324-3 between the object icon 1322-10 and the object icon 1322-7 in Figure 13U.
- a user can remove a relationship between two objects by clicking on the visual connector corresponding to the relationship and by clicking the “Remove Relationship” icon 1362 in Figure 13U.
- Figure 13U shows an updated object model that includes the Support fact table.
- cross-fact analysis may present data fields that are related, unrelated, and/or ambiguously related
- Multi-fact dimensional cardinality warning unrelated dimensions in different facts are crossed / cartesian joined thus introducing potential high cardinality product which requires user intervention
- Inventory 402 e.g., the inventory object
- Sales 404 e.g., the sales object
- DimDate 406 and DimSites 408 objects e.g., logical tables
- the present disclosure introduces new semantics for unrelated fields.
- Some implementations of the present disclosure also provide an improved user interface that provides more direct feedback to help analysts understand the underlying semantics, take appropriate action to keep their analysis along shared dimensions, or resolve ambiguities.
- the improved user interface abstracts away from the complex data model and presents a simple yet clear analytic experience for data analysts. Should the analysts “wander off track” during the analysis, they are notified of the off-track analysis and can take action to get back on-track.
- One aspect of the present disclosure improves existing analytic experience by presenting to a user reachable data fields.
- the disclosed user interface shows fields when using single tree versus multi-tree semantic.
- Another aspect of the present disclosure introduces additional user interface features to resolve ambiguities. For example, multiple unrelated dimensions are cross-joined, leading to high cardinality. Ambiguously relatable paths are resolved.
- Another aspect of the present disclosure enhances existing pill UX (e.g., in a shelf region of the user interface) to help users understand the underlying semantics when fields are unrelated or ambiguously related, and/or actions a user can take to be on track.
- the current Tableau data model contains a single tree graph of logical tables. Columns within logical tables appear as dimension and measure fields parented to their logical table in the schema viewer. Calculated and aggregate fields that span multiple tables appear outside of the table hierarchy. This logical grouping provides the necessary context for analysts to decide what to use in their analysis. Between any dimension and measure field, there is always an aggregation path.
- the aggregation path between dimensions and measures can be none (unrelated), one (related), or many (ambiguously related).
- analysts do not have the additional context of what fields belong to what trees and which fields should be used together. Teams that rely on the Tableau multi-fact data model also may not have this additional context for their features to function correctly.
- Figure 14A illustrates a multi-fact data model with 7 logical tables (Tables 1, 2, 3, 4, 5, 6, and 7).
- Tables 1, 2, 3, 4, 5, 6, and 7 there are three fact trees that share logical tables (4, 5, 7, 6) and three shared trees of logical tables (4), (5, 7), and (6).
- Figure 14B shows that the multi-fact data model is make up of separate trees, Tree 1, Tree 2, and Tree 3.
- Single tree although the data model contains multiple facts (1 to 3), analysts may focus their analysis on one fact at a time. If analysts are using fields that have only one fact in common, the underlying query semantics will use only this fact and this would maintain backward compatibility with the current version of the object modeling tool. Examples: a. Simple tree 1 using the base table: dimensions from table 1, measures from tables 4, 5, and/or 7. b. Simple tree 2 not using the base table: dimensions from table 6, measures from tables 4, 5, and/or 7
- static and dynamic aspects there are two aspects to tree membership: static and dynamic aspects.
- the static aspect is the tree membership of fields (as they appear on the schema viewer) using the underlying data model
- the dynamic aspect is what trees are being used (i.e., what fields are being used on shelves and the current marks card).
- the disclosed schema viewer user interface (illustrated in the examples of Figures 16, 17, and 18) provides info-scenting of fields that are relatable based on the underlying data model and what fields are already in use.
- the first of these user interface hints is to gray out (e.g., visually deemphasize) fields that are unrelated and/or ambiguously related to fields on shelves.
- Unrelated fields are fields that belong to logical tables that are in separate trees. The simplest case is when fields belong to separate base tables (e.g., root tables), i.e., fields in different base tables of trees are always unrelated to each other.
- base tables e.g., root tables
- unrelated measures from tables 1 and 2 can become relatable when a shared dimension is used (from Tables 4, 5, or 7).
- Unrelated measures from tables 1 and 3 can become relatable when dimensions from these separate trees are used. But the measures are only aggregated to their dimensions in their respective tree.
- Analysts can disambiguate by creating a level of detail (LOD) calculation that includes a field in either Table 1 or 2.
- LOD level of detail
- the data analytics user interface 250 includes a UI component to generate these LOD to simplify the disambiguation of the aggregation path.
- the computing device 200 or the server 300 is configured to execute an algorithm for field relatability.
- the algorithm includes:
- Step 1 Are there dimensions on shelves? If no, then no need to gray out field; measures are aggregated within their respective table
- Step 2 Do dimensions have in common one tree? If yes, then use single tree evaluation (from object model vl): show inner join dimensions, and identify all trees that these dimensions belong to. a. Related, dimensions of the one tree, measures (can belong to multiple trees, but they have one tree is actively in use), and aggregate calcs that belongs to this tree in use b. Unrelated. Gray out dimensions and measures of trees not in use c. Ambiguously related. Gray out measures (and row-level calcs) that belong to related trees but not the single tree being evaluated d. Partially related.
- Step 3 Group dimensions with one or more trees in common, then use single tree evaluation described in step 2 for each tree in use.
- relatability logic in step 2 a.
- Gray out measures (and row-level calcs) that belong to related trees that overlap with trees of dimensions in use
- Step 4 For groups of dimensions that have one or more trees in common, outer join their tree-based tuples together. Use the same relatability logic described in step 3 above.
- Step 5 For groups of dimensions that have no tree in common, cross their dimension tuples. Use the same relatability logic described in step 3 above.
- Figures 15 A, 15B, and 15C illustrate an object model 1500 in accordance with some implementations.
- Figure 15A shows the inventory tree of the object model 1000.
- Figure 15B shows the support tree of the object model 1500.
- Figure 15C shows the dates object.
- Figures 16A - 16H provide a series of screenshots that illustrate user interactions with a data analytics user interface 250, in accordance with some implementations.
- FIG. 16A shows a data analytics graphical user interface 250 (e.g., data visualization user interface) in accordance with some implementations.
- the user interface 250 includes a schema region 1610 (also sometimes referred to as a “schema viewer”), a plurality of shelf regions 1612 (in this example, two shelf regions 1612-1 and 1612-2 are shown), and a data visualization region 1614.
- the schema region 1610 shows data field icons 1620 (or object field icon) corresponding to data fields (or object fields) of an object model.
- Each shelf region 1612 is configured to define a respective characteristic of a displayed data visualization according to placement of data fields icons from the schema region 1610 into the respective shelf region 1612.
- the data visualization region 1614 is configured to display data visualizations.
- the object model is a multi -fact object model that includes fact tables “Marketing,” “Product,” “Sales,” and “Territory.”
- each data field icon 1620 in the schema region 1610 is associated with a respective object 1616 (e.g., objects 1616-1 to 1616-4) of the object model.
- object 1616 e.g., objects 1616-1 to 1616-4.
- the data field icons “Marketing name,” “Marketing type,” “Marketing manager,” “Product ID,” “Territory ID,” “Campaign Spend,” “Campaign Budget,” and “Marketing (Count)” are all included in the Marketing object 1616-1.
- Figure 16B illustrates user selection of a data field icon 1620-1, corresponding to the data field “Campaign Spend,” from the schema region 1610, and placement of the data field icon 1620-1 into the column shelf 1612-1.
- the user selection and placement comprise a drag-and-drop action.
- Figure 16C shows that in response to user placement of the data field icon 1620- 1 into the column shelf 1612-1, the user interface 250 displays a data visualization 1622 (e.g., a bar chart) in the data visualization region 1614.
- the data visualization 1622 is generated (e.g., automatically and without user intervention) by aggregating (e.g., summing) all data values (e.g., campaign expenditures) of the data field “Campaign Spend.”
- Figure 16C shows that in some implementations, concurrently with displaying the data visualization 1622, the computing device updates a visual characteristic (e.g., visual appearance) of one or more data field icons in the schema region 1610.
- the data field icons 1620-2, 1620-3, and 1620-4 are visually de-emphasized (e.g., grayed out) relative to other data field icons 1620 in the schema region 1610.
- the data field icon 1620-2 corresponds to the data field “Product name” (e.g., a dimension field) in the Sales object 1616-3.
- the data field icon 1620-3 corresponds to the data field “Sales Type” (e.g., a dimension field) in the Sales object 1616-3.
- the data field icon 1620-4 corresponds to the data field “Region name” (e.g., a dimension field) in the Sales object 1616-3.
- the data field icons 1620-2, 1620-3, and 1620-4, corresponding to the “Product Name,” “Sales type,” and “Region name” dimension fields in the Sales object are grayed out when the user places the data field icon 1620-1 “Campaign Spend” on the shelf regions 1612.
- the grayed out fields - or info-scenting fields - are hints that the data visualization application leaves for the user to try to steer the user toward a certain direction for their analysis.
- Figure 16C presents a simple yet informative way of guiding an analyst in fully utilizing the multi-fact data model, by visually de-emphasizing fields that are not relevant to the current analysis.
- the shelf region 1612 contains only measure fields (e.g., Campaign Spend), dimension fields in unrelated facts are grayed out.
- Figure 16D illustrates user selection of the data field icon 1620-5, corresponding to the object field “Product name” in the Product object 1616-2, and placement of the object field icon 1620-5 into the shelf region 1612.
- Figure 16E shows that in response to user placement of the object field icon 1620-5 into the shelf region 1612, the user interface 250 displays an updated data visualization 1624 that comprises a bar chart where the sum of marketing spend is broken down by the products “phone,” “laptop,” and “charger.”
- Figure 16E also shows that concurrently with displaying the data visualization 1624, the visual characteristic of the data field icons 1620-2, 1620-3, and 1620-4 are updated from the grayed- out appearance in Figure 16C to having a regular typeface like the rest of the data field icons in the schema region 1610.
- the Sales dimensions are no longer grayed out because one can break down sales by product type.
- Figure 16F illustrates user selection of a data field icon 1620-6, corresponding to the measure field “Sales Total” from the Sales object 1616-3, and placement of the data field icon 1620-5 into the shelf region 1612.
- Figure 16G shows that in response to user placement of the object field icon 1620-6 into the shelf region 1612, the user interface 250 displays an updated data visualization 1626.
- the data visualization comprises two bar graphs showing sum of campaign spend by product name and sum of sales total by product name. These two bar graphs share a common vertical axis Product name.
- Figure 16H illustrates a data model 1626 corresponding to the examples of Figures 16A to 16G.
- the data model 1626 includes a Marketing root table (e.g., fact table) (i.e., Marketing object 1616-1) and a Sales root table (e.g., fact table) (i.e., Sales object 1616- 3). which are related by the Products logical table (i.e., Product object 1616-2) and the Territories logical table (i.e., Territory object 1616-4).
- Marketing can be broken down by products (i.e., Product object 1616-2) and territory (i.e., Territory object 1616-4). Sales can also be broken down by products and territories, but marketing and sales are not related to each other. Therefore, when an analyst specifies fields from Marketing, the data visualization displays fields from sales in a grayed-out manner because Marketing and Sales are separate fact tables. However, as soon as an analyst specifies fields that are common between marketing and sales, then it becomes possible to connect these two root tables by the common fields.
- Figures 17A - 17E provide a series of screenshots that illustrate user interactions with a data analytics user interface, in accordance with some implementations.
- Figure 17A shows a view of the user interface 250. Details of the user interface 250, including the schema region 1610, the shelf region 1612, the data visualization region 1614, the objects 1616 and the data field icons 1620 are described in Figure 16A and are not repeated for the sake of brevity.
- Figure 17B illustrates user selection of the data field icon 1620-5, corresponding to the field “Product name” (e.g., a dimension field) from the Product object 1612-5, and placement of the data field icon 1620-5 into the shelf region 1612.
- the data visualization displays a data visualization 1702 in the data visualization region 1614.
- the data visualization is a text table with rows corresponding to data values (e.g., names of products, such as “Phone,” “Laptop,” and “Charger”) of the object field “Product name.”
- Figure 17B also shows that concurrently with displaying the data visualization 1702, the computing device updates a visual characteristic (e.g., visual appearance) of data field icons 1620-7 to 1620-11, corresponding to the Territory object 1616-4, in the schema region 1610.
- the data field icons 1620-7 to 1620-11 are visually de-emphasized (e.g., grayed out) relative to other data field icons 1620 in the schema region 1610.
- the grayed out dimension fields are “Region name” and “Territory ID.”
- the grayed out measure fields are “Population,” “Region size,” and “Territory (Count) .”
- Figure 17C illustrates that, in response to a user interaction (e.g., user selection or mouse hover) with the data field icon 1620-11 (e.g., “Territory (Count)”), the user interface 250 displays a tooltip 1704 that includes information (e.g., guidance) indicating that the field “Product name” is not related to the field “Territory (Count).”
- the tooltip 1704 also includes a comment that “Measures related by multiple paths show repeated value,” meaning that if a user places the field “Territory (Count)” in the shelf region 1612, all possible combinations between data values of the field “Product name” and data values of the field “Territory (Count)” will be shown.
- the disclosed data visualization application explains to the analyst the reason why certain fields are grayed out.
- some implementations of the present disclosure provide an improved user interface that manages user expectation, thereby ensuring that a user does not get confused or frustrated by the result after selecting certain data fields.
- data field icons of fields that have been grayed out continue to be user-selectable.
- Figure 17D shows that in response to user selection of the data field icon 1620-11 and placement of the icon 1620-11 onto the shelf region, the data visualization displays an updated visualization 1706.
- the visualization 1706 is a bar graph of count of territory by product name, where each of the product names “Phone,” “Laptop,” and “Charger” shows the same territory count.
- the value for territory count is replicated (e.g., duplicated) for each of the product names because count of territory cannot be broken down by product name.
- FIGs 17D and 17E also shows that in some implementations, pill(s) in the shelf region 1612, such as pill 1710 and pill 1712, include respective indicators 1708 that, when interacted with by the analyst, displays information of which fields used in the analysis are related fields and which fields are unrelated fields, so that the analyst can go back and refine their analysis.
- Figure 17E shows that when an analyst interacts with (e.g., hovers over) the indicator 1708-1, corresponding to the unrelated field “Territory (Count),” the user interface 250 displays a tooltip 1714 that preserves a relevant reminder (e.g., similar to the information provided by the tooltip 1704) to the analyst that the field “Territory (Count)” is not related to the field “Product name.”
- a relevant reminder e.g., similar to the information provided by the tooltip 1704
- an analyst has access to all the field information in the tooltips to inform them about a particular field and its relevance to their analysis. [00247] In some implementations, the analyst is also informed of cardinality issues if the domain is beyond the limit that can result from the domain size and/or its usage with unrelated fields.
- Figures 18 A - 181 provide a series of screenshots that illustrate user interactions with a data analytics user interface, in accordance with some implementations.
- Figure 18 A shows a view of the user interface 250. Details of the user interface 250, including the schema region 1610, the shelf region 1612, the data visualization region 1614, the objects 1616 and the data field icons 1620 are described in Figure 16A and are not repeated for the sake of brevity.
- Figure 18B illustrates user selection of a data field icon 1620-12, corresponding to the data field (e.g., object field) “Marketing type” in the Marketing object 1616-1, and placement of the data field icon 1620-12 into the shelf region 1612.
- data field e.g., object field
- Figure 18C shows that in response to placement of the data field icon 1620-12 into the shelf region 1612, the user interface 250 displays a data visualization 1802 (e.g., a text table).
- the rows of the text table are data values (e.g., “Online” and “Print”) of the data field marketing type.
- the computing device visually de-emphasizes all the data field icons (i.e., data field icons 1620-2, 1620-3, 1620-4, 1620-13, 1620-14, and 1620-15) corresponding to the Sales object 1612-3.
- Figure 18D illustrates that, in response to user selection of the data field icon 1620-1, corresponding to the data field “Campaign Spend,” from the schema region 1610, and placement of the data field icon 1620-1 into the shelf region 1612, the user interface 250 displays an updated data visualization 1804 (e.g., a bar chart).
- the data visualization 1804 shows a breakdown of campaign spend amount by marketing type (e.g., online and print).
- Figure 18D shows that all the data field icons for the Sales object 1612-3 continue to be grayed out.
- Figure 18E illustrates that, in response to a user interaction (e.g., user selection or mouse hover) with the data field icon 1620-3 corresponding to the data field “Sales Type,” the user interface 250 displays a tooltip 1806 that includes information (e.g., guidance) indicating that the field “Sales Type” is not related to the fields “Campaign Spend” and “Marketing type,” and information that all possible combinations will be displayed should the field “Sales Type” be used.
- Figure 18F shows that when the user selects the data field icon 1620-3 in the schema region 1610 places it in the shelf region, the user interface 250 displays an updated data visualization 1808.
- the data visualization 1808 is a bar graph of campaign spend by marketing type and sales type.
- the computing device generates the data visualization 1808 by (i) duplicating (e.g., replicating) the campaign spend for online marketing for each of the sales type “cash,” “credit card (CC),” and “check” and (ii) duplicating (e.g., replicating) the campaign spend for print marketing for each of the sales type “cash,” “credit card (CC),” and “check.”
- Figure 18F also shows that after the data field icon 1620-3 is placed into the shelf region, the data field icons for the Sales object 1612-3 are no longer grayed out.
- Figures 18F and 18G show that when a user hovers over an indicator 1708-3 that is displayed next to the pill 1810 for sales type, the user interface 250 displays a tooltip 1812 reminding the user that the fields “Sales Type” and “Campaign Spend” are unrelated, and repeating values are shown.
- Figure 18H shows a user interacting with (e.g., selecting or hovering over) the data field icon 1610-13, corresponding to the field “Sales total.”
- the user interface displays a tooltip 1814 with guidance/warning information.
- Figure 181 shows that, in response to user selection of the data field icon 1610- 13 from the schema region 1610 and placement of the icon into the shelf region, the computing device generates a data visualization 1816.
- the data bars on the left portion of the visualization 1816 are the same as the data bars of the data visualization 1808.
- the right portion 1818 of the data visualization 1816 shows sales total broken down by marketing type and by sales type. Because sales type and sales total are related (i.e., they are both data fields of the Marketing logical table), the computing device is able to determine a respective sales total for each of the sales type (cash, credit card, and check). Because sales total is unrelated to marketing type, the computing device duplicates the respective sales total that it obtained a respective sales type for each data value of the field “marketing types.”
- Some aspects of the disclosed implementations extend the current Tableau data model semantics to support multi-fact analysis, by enabling aggregation of measures from multiple fact tables to shared dimensions in different tables in the same visualization (see Figure 4A). This feature enables analysis of data models with multiple snowflake schemas where they share common objects.
- FIG. 4A one of the drawbacks of existing data models is that analysts cannot aggregate measures from different, separate fact tables (e.g., Inventory 402 and Sales 404) to common dimensions (e.g., DimDate 406 and DimSites 408). To compare data from separate fact tables using existing data models, analysts have to generate visualizations corresponding to each fact table and juxtapose them onto a data dashboard.
- the proposed technical solution to this problem preserves the current flexibility of the existing data model while extending its capabilities.
- the proposed solution balances the amount of work that the data modeler needs to do, limiting the amount of additional properties assigned to the new multi-fact data model, thus adding more analytic capabilities without much more user input.
- Some implementations of the present disclosure extend current object model semantics to support multiple snowflake schemas where they share common objects that can be created with data models.
- Some implementations of the present disclosure update query generation to enable separate tree-based queries.
- row-level measures are evaluated by tree.
- Some implementations support all existing row-based calculations, such as level of detail (LOD) calculations and calculations using combined fields and/or multidimensional sets.
- LOD level of detail
- Some implementations of the present disclosure add query generation to allow for consolidating separate tree-based queries together.
- the disclosed devices, methods, and/or user interfaces enable (i) aggregating measures that span multiple trees, (ii) outer joining shared dimensions between trees, and/or (iii) cross joining unshared dimensions between trees.
- Some implementations of the present disclosure impose/present limits on query generation for cross-join of unrelated fields.
- the disclosed devices, methods, and/or user interfaces caution a user against using unrelated dimensions from different sets of facts.
- the disclosed devices, methods, and/or user interfaces caution a user against using unreachable dimensions from measures.
- One of the goals of the data visualization application is to generate a query that comprises dimensions, aggregated measures and/or filters. Some of these fields and filter inputs may be calculations, for which the data visualization application has the formulae.
- Figure 19A illustrates the query generation algorithm.
- the query generation algorithm includes:
- Step 1 - Constructing the Dimension Subquery Create a table consisting of the dimensions. We call this query the dimension subquery.
- Step 2 Constructing Measure Subqueries: For each aggregated measure, create a subquery consisting of the dimensions and the single aggregated measure with the filters applied. We call these queries the measure subqueries.
- Step 3 Combining the Subqueries: Combine the dimension and measure subqueries
- Figure 19B illustrates the dimension subquery construction process (step 1 of the query generation algorithm).
- the objects needed for a set of dimensions, filters and/or measure is the minimum subgraph containing all the objects which contain at least one object field needed to compute a dimension, measure or filter.
- the purpose of the dimension subquery is to ensure we preserve all the dimension values that would appear in a dimension-only query.
- a special case worth noting is a query with no dimensions.
- the query we generate is Table Dee - the table with one row and an empty schema.
- Roughly speakingjoining a table to Table Dee yields the original table.
- a measure query consists of the set of the dimensions and a single aggregated measure.
- the crux of the object model algorithm is to create a table (referred to as a “preaggregation table”) containing the measure and dimensions - with the filters applied - for which it is safe to apply the aggregation.
- the process to construct the pre-aggregation table is the trickiest part of the object model algorithm as it also strives to keep all measure values and recover unmatched dimension values when possible.
- De-duplicating the Results De-duplicate the object join tree by the primary keys of the measure objects and the dimensions. The goal of this step is to ensure the measures are at their native level of granularity.
- the query we compute is to GROUP BY the de-duplication fields and perform an ANY aggregation on the rest of the fields.
- the measure subquery construction process is illustrated in Figure 19C. i. Creating the Object Join Tree [00283]
- the measure core to be the set of objects needed to get all the object fields for the subquery’s measure. Currently, we inner join the objects in the measure core.
- the measure core defines both the granularity of the pre-aggregation table as well as the set of measure rows that we want to keep.
- Figure 19E illustrates the subqueries combination step.
- Figure 19F illustrates performing an outer join on two tables. Unmatched measure values default to nulls (except for COUNT/COUNTD, which evaluate to 0),
- Step 1 Get all the object fields needed for the dimensions, measure and filters. Define the object field subgraph to be the minimum subgraph that contains all these fields.
- the object field in play is [Order ID] from Orders.
- the object field in play is [Customer Age Bracket] from
- the object fields that are needed to compute the calculation inside are [Order Amount] and [State Tax Rate] from Orders and States, respectively.
- the object field subgraph is therefore ⁇ Orders, Customers, Addresses, States ⁇ , illustrated in Figure 20B. For the rest of this example case, we can ignore the existence of ⁇ Line Items, Products ⁇ .
- Step 2 Define the measure core to be the minimum subgraph that contains all the object fields needed to compute the measure.
- the measure core is important because it both encodes the measure’ s granularity as well the set of measure rows that we need to keep.
- Step 1 By our analysis in Step 1, the measure only requires the Orders object - which is the measure core.
- Step 3 For all the dimensions and filters not entirely contained in the measure core, compute the minimum subgraph that:
- the goal of the dimension-measure subgraph will be to add all the dimensions and filters not in the measure core to the measure core in a controlled manner.
- the dimensionmeasure subgraph is ⁇ Orders, Customers, Addresses, States ⁇ . Since this graph shares an object with the measure core (Orders), this graph is enough.
- Step 4 Create the compiled measure subgraph by inner joining all the objects in the measure core. Then, add on the calculations and filters that depend only on objects in the measure core.
- the compiled measure subgraph is just the query representation for Orders.
- Step 5 Create the compiled dimension-measure subgraph by inner joining all the objects that come from the measure core. Then, left join in the rest of the objects.
- Orders is the only object from the measure core. We left join the rest of the objects against Orders. We then layer on the calculated field by creating a new field with its formula. Finally, we add on the filter.
- Step 6 Define the linking fields as the union of: The relationship keys that join the measure objects in the dimension-measure subgraph with the rest of this subgraph. We pick the key from the measure-object side.
- the filter’s input calculation has an input field that falls in Orders. This field is ⁇ [Order Amount] ⁇ .
- the linking keys are ⁇ [Customer FK], [Address FK], [Order Amount] ⁇ .
- Step 7 De-duplicate the compiled dimension-measure subgraph by the dimensions and the linking fields.
- Figure 20D shows the de-duplicated compiled dimension-measure subgraph.
- Step 8 In the simplified algorithm, Step 7 is analogous to the de-duplication step. Unlike with the simplified algorithm, we could not put all the measure core objects underneath this de-duplication step. Without primary keys, we might not be able to craft a group by that also preserves the granularity of the measure core.
- this step acts like a self-join between measure objects that appear in the measure core and those that appear in the dimension-measure subgraph.
- the dimension-measure subquery is at the granularity of the measure objects that it contains.
- Figures 21A, 21B, and 21C illustrate dimension-measure subgraphs in accordance with some implementations.
- Figure 21 A starts with an object field subgraph of ⁇ A,B,C ⁇ and a measure core ⁇ A ⁇ .
- the dimensions depend on ⁇ B ⁇ and ⁇ C ⁇ , which aren’t in the measure core.
- the minimum subgraph that contains these dimensions is ⁇ A,B,C ⁇ . Since this subgraph contains an object from the measure core, we are done.
- Figure 21B starts start with an object field subgraph of ⁇ A,B,C,D ⁇ and a measure core ⁇ A,D ⁇ .
- the dimensions depend on ⁇ B ⁇ and ⁇ C ⁇ , which aren’t in the measure core.
- the minimum subgraph that contains these dimensions is ⁇ A,B,C ⁇ . Since this subgraph contains an object from the measure core, we are done. In this case, we didn’t need to bring in the entire measure core; we need only bring in A.
- Figure 21C starts with an object field subgraph of ⁇ A,B,C,D ⁇ and a measure core ⁇ A,B ⁇ .
- the dimensions depend on ⁇ D ⁇ , which isn't in the measure core.
- the minimum subgraph that contains these dimensions is ⁇ D ⁇ .
- This subgraph doesn’t include an object from the measure core.
- the minimum subgraph that contains the dimension and the measure core is ⁇ B,C,D ⁇ .
- every tree has a root table.
- the root table always starts from the left-most side of an object model.
- a shared tree is a tree that can be traced back to two or more roots.
- a shared tree does not contain any root table.
- Products (logical table 7) is a shared tree because Products can be traced back to the Marketing root table (logical table 1), the Sales root table (logical table 2), and the Support root table (logical table 3).
- any object that is to the right of that shared object is part of a shared tree with that object.
- Products (logical table 7) is a shared object
- Sub-Category (logical table 9) and Category (logical table 10) are to its right
- Products (logical table 7), Sub-Category (logical table 9), and Category (logical table 10) belong to their own tree (e.g., a shared) tree.
- a shared tree refers to a tree that does not include a root object.
- Dates (logical table 4) is also a shared tree. In this example, Dates is also its own tree.
- Shared or “unshared” is an intrinsic property of the object model. Every object in the object model is either shared or unshared. One exception to this is when the dimensions can be collapsed to one tree (see example in Scenario 1 in Table 1 below), so as to maintain backward compatibility. Dimensions that collapse to one tree are considered to be unshared. [00369] Tree traversal is directional. When we trace back to the root, we can only traverse in the left direction. In Figure 22, Parts (logical table 6) is not a shared tree because it can only be traced back to one root when traversed in the leftward direction.
- Scenario 2 Unshared dimensions from one tree: collapse to same semantics as Object Model vl.
- D4 is a shared dimension, if DI and D4 are used together, they would collapse into a single tree.
- Inner join o e.g., inner join DI and D4; or inner join DI, D4, and D7.
- Scenario 2 Unshared dimensions from multiple trees: inner join dimensions from the same tree first, then cross-join from different trees.
- example unshared dimensions from multiple trees are: DI, D2, D3, and D6
- Scenario 2 Shared dimensions from a single shared tree: inner join within the shared dimensions. • From Figure 22, example shared dimensions from a single shared tree are: D7, D9, and DIO
- Scenario 2 Shared dimensions from multiple shared trees: cross-join across trees if they are evaluated to be sharing among different trees.
- Scenario 2 Shared dimensions and unshared dimensions in one tree: inner join within trees (same as Object model semantics vl)
- Join type Inner join DI and D4. This collapses to a single tree.
- Scenario 2,6 Shared dimensions and unshared dimensions in multiple trees: inner join with unshared dimensions in each tree, then outer join shared dimensions between trees.
- Step 1 Constructing the Dimension Subquery (to obtain a dimension core).
- Step 2 Constructing Measure Subqueries
- Step 3 Combining the Subqueries
- Scenarios 2.1 to 2.6 described above are directed to the dimension subquery construction (Step 1 of the query generation algorithm). If a measure is specified (e.g., in the visual specification), a measure sub-query is generated for the measures. That sub-query depends on whether the measure is shared or unshared.
- Scenario 2.8 Shared measure.
- a shared measure is a measure that belongs to different trees.
- a filter is applied to “customer names” (e.g., D5 in Figure 22). Because “customer names” is a dimension field that is is shared with both Sales (logical table 2) and Support (logical table 3), that filter is applied every time a measure is brought in from Sales or Support.
- Figure 24A illustrates a table corresponding to the object model 2200 based on the shared dimension Date.
- First table Shared dimension is date. Sales by Category occur in the months of January, March, and April. Marketing spend is incurred in the months of January, February, March, and May. Outer join semantics is used in this example.
- Figure 24B illustrates a data visualization (e.g., a text table) that is generated when the query consists only of measure fields (i.e., no dimension fields). Notice that there is no further breakdown of the measure values.
- a data visualization e.g., a text table
- Figure 24C illustrates a data visualization (e.g., a text table) that is generated when the query consists only of dimension fields (i.e., no measure fields).
- the query specifies “month” and “category.” Notice that data values for Category (i.e., “Device” and “Transportation”) are replicated for each of the months, each other. This is an example Scenario 2.4 described above.
- Figure 24D illustrates a data visualization (e.g., a text table) from the single fact table Sales. This is essentially a re-capitulation of Object Model VI. Notice that the data visualization shows r sum of sales for month(s) and category where sales is a not a null value. In this example, because there is no sale in the month of February, the data visualization in Figure 24 does not have a data row for the month of February.
- a data visualization e.g., a text table
- Figure 24E illustrates a data visualization (e.g., a text table) when a dimension is added to measures from many trees.
- a data visualization e.g., a text table
- Figure 24F illustrates a data visualization (e.g., a text table) when a filter is applied to filter data values of the dimension field Category to “Device.” Because “Category” is a shared object, applying a filter on this shared object cause it to be applied to all connected trees that are being uses in the visualization.
- a data visualization e.g., a text table
- Figure 24G illustrates a data visualization (e.g., a text table) when a filter is applied to filter data values of the dimension field Marketing Type to “Online Marketing.”
- a filter is applied to filter data values of the dimension field Marketing Type to “Online Marketing.”
- Marketing Type is an unshared dimension
- this filter is only limited to facts in the Marketing table.
- Figure 24H illustrates a scenario where analysts can evaluate aggregate measures across multiple fact trees at the shared dimension level of details.
- analysts can normalize SUM(Sales) by SUM(Spend).
- Figure 24H shows that there can be missing values in either the SUM (Sales) column or the SUM(Spend) column, and the aggregated calculation has to verified and adjusted accordingly.
- Each tree starts with a root object.
- root objects cannot relate to each other.
- all roots must be connected (via shared objects).
- All objects have only one path back to any related tree root. Objects with more than one tree roots are “shared” where its dimensions are shared dimensions. In some implementations, a shared object must belong to at least two trees of the object model (i.e., a shared object does not have to belong to all trees). In some implementations, a shared object must belong to all trees of the object model
- FIG. 25 illustrates an object model according to some implementations.
- Product - Subcategory - Category are related, Marketing is related to the Category object but it cannot resolve to other shared objects: Product - Subcategory.
- Shared objects are context dependent.
- Product - Subcategory are considered shared if their dimensions are used to evaluate measures in Inventory and Sales, but between Inventory and Marketing, Product - Subcategory, dimensions are in the Inventory tree.
- Measure results are evaluated by their single tree membership. For example, measures in shared object(s) (i.e., can belong to multiple trees) need to be identified by tree (e.g., via level of detail calculations). Measures spanning multiple trees are aggregated and their components come from individual trees. Filters are also applied by tree.
- An object that tracks back to two or more root objects are defined as a shared object within the data model.
- a shared object is not a special object, but rather it is defined by the context of analysis, i.e., what else is in the visualization specification or the query.
- Any unshared object can become the root object of its subtree; the layout is determined by the data modeler. Any unshared object within a subtree can be the root.
- Measures in shared objects may need additional information to identify which tree they would aggregate through.
- Filter scope on shared objects is propagated to all affected subtrees.
- Filter scope on unshared objects is limited to their respective subtrees. However, if shared dimension (i.e., from shared objects) are used, and their domain is affected by filter within subtrees in play, then shared dimensions domain is removed from the final overlay results. [00407] Calculations spanning multiple subtrees would require their shared objects.
- Object Model Query was created as an intermediate query representation (between Abstract Query and Logical Query).
- the desired semantics for data models with shared dimensions require that the computer device takes a full Abstract Query that may possibly span multiple trees and compute and combines the Object Model vl subqueries for each tree (“tree subqueries”) and then combines tree subqueries to get the final result.
- tree subqueries Object Model vl subqueries
- the computing device works with a SQLQuery object instead of an Abstract Query object. For the purposes of this disclosure, there are no meaningful differences between them.
- some implementations split up the full SQLQuery into separate SQLQuery objects for each tree. This is accomplished by first computing the Shared Dimension Tree View structure, determining which trees are active, and then for each active tree, the computing device creates a SQLQuery containing all of the objects that reference fields in that tree.
- the computing device may end up duplicating some objects (e.g., if a select column is shared between two trees, that select column should appear in the SQLQuery object for both trees it belongs to. Any tree-agnostic settings should be copied from the full SQLQuery object.
- Compute Tree Subqueries [00412] Some implementations resolve each tree-scoped SQLQuery into the Object Model vl subqueries by looping over the set of tree-scoped SQLQuery objects. Where we use the full Obj ectGraph today, we use the appropriate tree subgraph instead.
- each tree subquery and query component e.g., Order Bys, Top N
- Some implementations hold the tree ID for the tree that thge computing device is currently working with.
- ConstructQueryWithObj ectModel Semanticslmpl and pass it down to where we create the subqueries, adding subqueries and other query components.
- Some implementations modify the API for lObj ectModel QueryBuilder to facilitate adding subqueries for a particular tree, as well as other query components.
- Some implementations modify the reconstruction actions so that they can be associated with a specific tree.
- the nodes must be shared nodes (i.e., adjacent to all active root nodes they are related to). [00419] In some implementations, to determine the set of shared active nodes, the following algorithm is used:
- Figures 28A - 28E provide a flowchart of a method 2600 for generating object models (e.g., data models) that span multiple fact tables according to some implementations.
- the method 2600 is also called a process.
- the method 2600 is performed (2602) at a computing device 200 having a display 208, one or more processors 202, and memory 214.
- the memory 214 stores (2604) one or more programs configured for execution by the one or more processors 202.
- the operations shown in Figures 1 A to 1C, 4A, 4B, 5A to 5D, 6A, 6B, 7, 8A to 8C, 9A to 9C, 10, 11, 12, 13A to 13U, 14A, 14B, 15A to 15C, 16A to 16H, 17A to 17E, 18A to 181, 19A to 19G, 20A to 201, 21 A to 21C, 22, 23, 24A to 24H, 25, 26, and 27 correspond to instructions stored in the memory 214 or other non-transitory computer-readable storage medium.
- the computer-readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices.
- the instructions stored on the computer-readable storage medium include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in the method 2600 may be combined and/or the order of some operations may be changed. In some implementations, some of the operations in the method 2600 can be combined with other operations in the method 2700 and/or the method 2800.
- the computing device displays (2606), in a user interface (e.g., the UI displays the logical layer of data source(s)), a first object icon and a second object icon positioned to the right of the first object icon.
- the first object icon represents a first object (e.g., first logical table) of a first data source.
- the second object icon represents a second object (e.g., second logical table) of the first data source.
- the first object icon is connected to the second object icon via a first connector (e.g., link) representing a relationship between the first object and the second object.
- the relationship between the first object and the second object has a first cardinality.
- the first cardinality is (2608) one of: a many-to-many relationship, a many-to-one relationship, and a one-to-many relationship. In some implementations, if the relationship cardinality is unknown, the computing device assumes all relationships are many-to-many.
- the computing device in response to receiving user selection of the first object icon: displays (2610), in the user interface, a plurality of data rows and data columns representing information corresponding to one or more data fields in the third object.
- the first object comprises (2612) a first fact table (e.g., a logical table or a fact subtree).
- a first fact table e.g., a logical table or a fact subtree
- the computing device in response to receiving (2614) a first user input to add a third object (e.g., third logical table), displays, in the user interface, a third object icon representing the third object.
- a third object e.g., third logical table
- the third object comprises (2616) a second fact table that is unrelated to the first fact table (e.g., there is no aggregation path for dimensions and measures between the first fact table and the second for table, or the first fact table and the second fact table are different base tables).
- the third object is (2618) an object of the first data source.
- the third object is (2620) an object of a second data source, distinct from the first data source.
- the computing device in response to receiving (2622) a second user input on the third object icon: in accordance with a determination that the second object and the third object include at least one common data field, generates and displays, in the user interface, a second connector connecting the third object icon to the second object icon.
- the second connector represents a relationship between the third object and the second object.
- the relationship between the third object and the second object has a second cardinality.
- the second cardinality is (2624) one of: a many-to- many relationship, a many-to-one relationship, and a one-to-many relationship. In some implementations, if the relationship cardinality is unknown, the computing device assumes all relationships are many-to-many.
- the second user input comprises (2626) user selection of at least a portion (e.g., an edge or a side) (e.g., a circular icon that user can “drag” a line out of) of the third object icon.
- the computing device in response to (2628) the user selection, generates and displays, in the user interface, a freeform line.
- a first end of the line is connected to the third object icon and a second end of the line corresponds to a position of a mouse cursor in the user interface.
- the user can” search” the existing object model that is displayed in the user interface, to determine if there are relevant/related objects that the second object can relate to.
- the freeform line becomes a connector line (e.g., the second connector) connecting two object icons when the computing device determines that the two object models corresponding to the two object icons include at least one related (e.g., common) data field.
- generating and displaying the second connector further comprises converting (2630) the second object from a subtree of the first object to a shared object, (e.g., that is shared between a first tree to which the first object belongs and a second tree to which the second object belongs)
- the shared object comprises a logical table consisting (2632) of one or more dimension data fields.
- a dimension table is a logical table that consists of just dimension data fields (i.e., there are no measure data fields in a dimension table).)
- the shared object comprises a logical table consisting of dimension fields and measure fields.
- the at least one common data field comprises (2634) a geographic data field.
- geographic data field examples include country, region, state, province, city, postal code, longitude, or latitude.
- the at least one common (e.g., related) data field comprises (2636) a date/time data field (e.g., month, date, year, or day).
- the computing device after connecting the third object icon to the second object icon via the second connector, and in response to receiving a user interaction (e.g., a hover action) with the second connector, displays (2638) an identification of the at least one common data field.
- a user interaction e.g., a hover action
- the computing device after connecting the third object icon to the second object icon via the second connector, vertically aligns (2640) (e.g., arranges the icons in a column) the first object icon and the third object icon for display in the user interface.
- the computing device after connecting the third object icon to the second object icon via the second connector, arranges (2642) the first object icon and the third object icon in an alphabetical order for display in the user interface.
- the computing device displays (2644), in the user interface, a fourth object icon representing a fourth object.
- the fourth object icon is connected to the second object icon via a third connector representing a relationship between the fourth object and the second object.
- the relationship between the fourth object and the second object has a third cardinality.
- the fourth object icon is connected to a fifth object icon, representing a fifth object, via a fourth connector representing a relationship between the fourth object and the fifth object.
- the relationship between the fourth obj ect and the fifth obj ect has a fourth cardinality.
- the third connector and the fourth connector include an overlapping portion.
- the third cardinality is (2646) one of: a many-to-many relationship, a many-to-one relationship, and a one-to-many relationship. In some implementations, if the relationship cardinality is unknown, the computing device assumes all relationships are many-to-many.
- the fourth cardinality is (2648) one of: a many-to- many relationship, a many-to-one relationship, and a one-to-many relationship. In some implementations, if the relationship cardinality is unknown, the computing device assumes all relationships are many-to-many.
- the computing device in response to receiving (2650) a user interaction (e.g., a hover action) with the overlapping portion of the third connector and the fourth connector, concurrently displays (i) an identification of a first related data field relating the fourth object and the second object and (ii) an identification of a second related data field relating the fourth object and the fifth object.
- the first object icon, the second object icon, the third object icon, the fourth object icon, and the fifth object icon are (2652) distinct icons.
- the first related data field and the second related data field are (2654) distinct data fields.
- the computing device in response to user selection (2656) of the identification of the first related data field relating the fourth object and the second object, the computing device simultaneously visually emphasizes the fourth object, the second object, and the third connector.
- the computing device displays (2658), in the user interface: a fourth object icon representing a fourth object, a fifth object icon representing a fifth object, and a third connector connecting the fourth object icon and the fifth object icon.
- the third connector represents a many-to-many relationship between the fourth object and the fifth object.
- the fourth object icon, the fifth object icon, and the third connector are (2660) not connected to any of the first object icon, the second object icon, and the third object icon.
- the computing device in response to receiving (2662) a third user input on the fifth object icon, generates and displays, in the user interface, a freeform line.
- a first end of the line is connected to the fifth object icon and a second end of the line corresponds to the position of a mouse cursor in the user interface.
- the computing device in response to receiving (2664) an interaction between the second end of the line and the second object icon: converts the freeform line into a third connector connecting the fifth object icon and the second object icon.
- the third connector represents a many-to-many relationship between the fifth object and the second object.
- the first object icon, the second object icon, and the third object icon are (2666) displayed in a first portion of the user interface.
- the fourth object icon and the fifth object icon are displayed in a second portion of the user interface.
- Converting the freeform line into a third connector connecting the fifth object icon and the second object icon includes: redisplaying the fourth object icon and the fifth object icon in the first portion of the user interface.
- Figures 29A - 29D provide a flowchart of a method 2700 for performing guided analysis using multi-fact object models (e.g., data models), in accordance with some implementations.
- the 2700 is also called a process.
- the method 2700 is performed (2702) at a computing device 200 having a display 208, one or more processors 202, and memory 214.
- the memory 214 stores (2704) one or more programs configured for execution by the one or more processors 202.
- the operations shown in Figures 1 A to 1C, 4A, 4B, 5A to 5D, 6A, 6B, 7, 8A to 8C, 9A to 9C, 10, 11, 12, 13A to 13U, 14A, 14B, 15A to 15C, 16A to 16H, 17A to 17E, 18A to 181, 19A to 19G, 20A to 201, 21 A to 21C, 22, 23, 24A to 24H, 25, 26, and 27 correspond to instructions stored in the memory 214 or other non-transitory computer-readable storage medium.
- the computer-readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices.
- the instructions stored on the computer-readable storage medium include one or more of source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in the method 2700 may be combined and/or the order of some operations may be changed. In some implementations, some of the operations in the method 2700 can be combined with other operations in the method 2600 and/or the method 2800.
- the computing device displays (2706), in a user interface (e.g., in a schema region of the user interface), a plurality of data field icons corresponding to a plurality of data fields.
- a user interface e.g., in a schema region of the user interface
- Each of the data fields is associated with a respective object (e.g., a logical table) of a plurality of objects (e.g., plurality of logical tables) in an object model.
- the computing device in response (2708) to receiving (i) user selection of a first data field icon, from the plurality of data field icons, corresponding to a first data field, and (ii) placement of the first data field icon in a shelf region of the user interface, generates and displays a first data visualization in the user interface.
- the first data field is associated with a first object of the plurality of objects.
- generating the first data visualization includes executing (2710) a first query that specifies an aggregation of data values of the first data field (or aggregation of data values of the first data field according to a first dimension data field) (e.g., aggregate campaign spend, or aggregate campaign spend by marketing type).
- a first query specifies an aggregation of data values of the first data field (or aggregation of data values of the first data field according to a first dimension data field) (e.g., aggregate campaign spend, or aggregate campaign spend by marketing type).
- the computing device updates (2712) a visual characteristic (e.g., a visual appearance) of a subset of (one or more) the plurality of data field icons (e.g., the subset of data fields are associated with a third object of the plurality of objects) that are displayed in the user interface from a first visual characteristic to a second visual characteristic.
- Each data field icon in the subset of data field icons is (2714) associated with a second object of the plurality of objects, distinct from the first object.
- the subset of data field icons are (2716) user-selectable independent of the first or second visual characteristic. In other words, the data field icons selectable when their appearance corresponds to either the first visual characteristic or the second visual characteristic (e.g., user-selectable when the subset of data field icons have the first visual characteristic or the second visual characteristic).
- updating the visual characteristic of the subset of data field icons from the first visual characteristic to the second visual characteristic includes (2718) visually de-emphasizing (e.g., graying out) the subset of data field icons relative to other data field icons in the plurality of data field icons while maintaining user-selectability (e.g., clickable) of the subset of data field icons.
- the computing device in response to a user interaction (e.g., hover over) with a second data field icon from the subset of data field icons, corresponding to a second data field [e.g., “Sales type”] of the plurality of data fields (the second data field is associated with a second object), displays (2720) information that the second data field is unrelated to the first data field.
- a user interaction e.g., hover over
- a second data field icon e.g., “Sales type”
- the second data field is associated with a second object
- the computing device in response to receiving (i) user selection of a second data field icon (e.g., “Sales type”) from the subset of data field icons, corresponding to a second data field of the plurality of data fields, and (ii) user placement of the second data field icon in the shelf region, generates (2722) and displays a second data visualization in the user interface.
- generating the second data visualization includes executing (2724) a first query that duplicates (e.g., replicates or reproduces), for each data value of the third data field, the aggregated data values of the first data field.
- the computing device concurrently while displaying the second data visualization, displays (2726), in the shelf region, a warning visual indicator adjacent to the first data field icon (and/or the second date field icon).
- the computing device displays (2728) information that the second data field is unrelated to the first data field.
- the method 2700 further includes, after updating (2730) the visual characteristic of the subset of data field icons to the second visual characteristic: in response to receiving (i) user selection of a third data field icon from the plurality of data field icons, wherein the third data field icon corresponds to a third data field and is not a data field icon from the subset of data field icons and (ii) placement of the third data field icon in the shelf region: executing a second query that specifies an aggregation of data values of the first data field according to the third data field to generate a third data visualization.
- the third data field is (2732) a shared data field between the first object and the second object.
- the third data field is (2734) associated with a dimension logical table that consists of one or more dimension data fields.
- a dimension logical table is a logical table that contains only dimension data fields (i.e., it does not contain any measure data field.)
- the third data field is (2734) associated with a logical table that includes one or more dimension fields and one or more measure fields.
- the third data field is (2736) a dimension data field.
- the third data field is (2738) a geographic data field.
- the third data field is (2740) a date/time data field.
- the method 2700 includes displaying (2742), in the user interface, the third data visualization.
- the computing device concurrently while displaying the third data visualization, the computing device updates (2744) (e.g., restores) a visual characteristic of the subset of data fields from the second visual characteristic to the first visual characteristic.
- updates (2744) e.g., restores
- updating the visual characteristic of the subset of data fields from the first characteristic to the second characteristic comprises visually deemphasizing (e.g., graying out) the subset of data field icons relative to other data field icons of the plurality of data field icons.
- updating (restoring) a visual characteristic of the subset of data fields from the second characteristic to the first characteristic includes restoring a view of the user interface to a state prior to the visual de-emphasis.
- the method 2700 further includes after displaying the third data visualization: in response to receiving (i) user selection of a fourth data field icon from the subset of data field icons, corresponding to a fourth data field, and (ii) placement of the fourth data field icon in the shelf region: executing (2746) a third query that specifies an aggregation of data values of the fourth data field [“Sales total”] according to the third data field [“Product name”] to generate a fourth data visualization.
- the method includes displaying (2748), in the user interface, the fourth data visualization.
- the fourth data visualization is (2750) concurrently displayed with the third data visualization in the user interface.
- the third data visualization and the fourth data visualization share (2752) a common data axis.
- Figures 30A - 30D provide a flowchart of a method for generating data visualizations using multi-fact object models (e.g., data models), in accordance with some implementations.
- the 2800 is also called a process.
- the method 2800 is performed (2802) at a computing device 200 having a display 208, one or more processors 202, and memory 214.
- the memory 214 stores (22804) one or more programs configured for execution by the one or more processors 202.
- the operations shown in Figures 1 A to 1C, 4A, 4B, 5A to 5D, 6A, 6B, 7, 8A to 8C, 9A to 9C, 10, 11, 12, 13A to 13U, 14A, 14B, 15A to 15C, 16A to 16H, 17A to 17E, 18A to 181, 19A to 19G, 20A to 201, 21 A to 21C, 22, 23, 24A to 24H, 25, 26, and 27 correspond to instructions stored in the memory 214 or other non-transitory computer-readable storage medium.
- the computer-readable storage medium may include a magnetic or optical disk storage device, solid state storage devices such as Flash memory, or other non-volatile memory device or devices.
- the instructions stored on the computer-readable storage medium include one or more of: source code, assembly language code, object code, or other instruction format that is interpreted by one or more processors. Some operations in the method 2800 may be combined and/or the order of some operations may be changed. In some implementations, some of the operations in the method 2800 may be combined with other operations in the method 2600 and/or the method 2700.
- the computing device receives (2806) (e.g., via a user interface, such as user interface 2332) a first user input specifying a first dimension data field and a second dimension data field for generating a first data visualization.
- At least one of the first dimension data field or the second dimension data field is (2808) a geographic data field.
- At least one of the first dimension data field or the second dimension data field is (2810) a date/time data field.
- the computing device determines (2812) that the first dimension data field belongs to a first object (e.g., a first logical table) of an object model and the second dimension data field belongs to a second object (e.g., a second logical table) of the object model, distinct from the first object.
- a first object e.g., a first logical table
- a second object e.g., a second logical table
- the computing device constructs (2814) a dimension subquery according to characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object.
- the characteristics include whether the first dimension data field is a shared dimension or an unshared dimension, and/or whether the second dimension data field is a shared dimension or an unshared dimension, whether the first object can be traced back to a single root table or to multiple root tables, and/or whether the second object can be traced back to a single root table or to multiple root tables.
- the computing device determines (2816) a join type (e.g., inner join, cross join, outer join left join, right join) for combining (i) first data rows that includes data values of the first dimension data field and (ii) second data rows that includes data values of the second dimension data field.
- a join type e.g., inner join, cross join, outer join left join, right join
- the computing device constructs (2824) the dimension subquery according to the determined join type.
- the dimension subquery references the first object and the second object;
- the computing device executes (2820) the dimension subquery against one or more data sources corresponding to the first dimension data field and the second dimension data field to retrieve first tuples that comprise unique ordered combinations of data values for the first dimension data field and the second dimension data field.
- the one or more data sources comprise (2822) a plurality of data sources.
- the computing device constructs (2824) one or more measure subqueries.
- Each of the measure subqueries references one or more measure data fields in the object model;
- the computing device executes (2826) the one or more measure subqueries to retrieve second tuples;
- the computing device forms (2828) extended tuples by combining the retrieved first tuples and the retrieved second tuples.
- the computing device generates (2830) and displays the first data visualization according to the extended tuples.
- the first dimension data field and the second dimension data field are unshared dimensions from one tree (e.g., Scenario 2.1 in Section IV.C.l.b).
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the object includes: in accordance with a determination (2832) by the computing device that (i) the first dimension data field can be traced to one (e.g., one and only one) (i.e., a single) root object (e.g., a fact table) (e.g., by traversing in the leftward direction in the object model) and (ii) the second dimension data field can be traced to the same root object (i.e., the first and objects belong to the same root object), the computing device combines data columns of the first dimension data field and the second dimension data field using an inner join.
- a dimension data field that can be traced to only one root object is an unshared dimension data field (i.e., it is not shared by other fact tables).
- the first dimension data field can be DI (e.g., Marketing Type)
- the second dimension data field can be D4 (e.g., dates).
- DI and D4 are unshared dimensions from one tree, the query semantics for constructing the dimension query are same semantics as Object Model vl (see, e.g., Section IV.B).
- the first dimension data field and the second dimension data field are unshared dimensions from multiple trees, as described in Scenario 2.2 in Section IV.C.1.b.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: in accordance with a determination by the computing device that (i) the first dimension data field can be traced to a first root object and (ii) the second dimension data field can be traced to a second root object that is distinct from the first root object (e.g., and the second object is not a shared object of the first root object, the computing device forms (2834) a first object tree that includes the first object and the first root object, and combines data columns from objects of the first object tree according to data values of the first dimension data field using an inner join to form a first table.
- the computing device forms (2836) a second object tree that includes the second object and the second root object, and combines data columns from objects of the second object tree according to data values of the second dimension data field using an inner join to form a second table.
- the computing device combines (2838) data columns of the first table and the second table via a cross join.
- DI e.g., Marketing type
- D6 e.g., Parts
- the computing device forms a marketing object tree (in this case, the marketing tree consists of just the marketing object) (result 1), (ii) forms a support tree that includes the parts object (logical table 6) and the Support object (i.e., the root object) and inner joins all the dimensions using Parts (result 2), and cross joins result 1 and result 2.
- the first object is a first root object.
- the second object is a second root object that is distinct from the first root object.
- the first dimension data field and the second data dimension data field are shared dimensions from a single shared tree, as described in Scenario 2.3 in Section IV.C. l.b.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: in accordance with a determination by the computing device that the first dimension data field and the second dimension data field belong to the same object (e.g., a shared tree) that is shared by two or more root objects (e.g., and the first object is not a root object, and the second object is not a root object): the computing device combines data columns of the first dimension data field and the second dimension data field using an inner join.
- the first dimension data field can be D7 and the second dimension data field can be D9.
- the first dimension data field and the second dimension data field are shared dimensions from multiple shared trees, as discussed with respect to Scenario 2.4 in Section IV.C. l.b.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes: in accordance with a determination by the computing device that (i) the first object (of which the first dimension data field belongs) is shared by a first set of root objects (e.g., two or more root objects) and (ii) the second object (of which the second dimension data field belongs) is shared by a second set of root objects (e.g., two or more root objects), the computing device combines data columns of the first dimension data field and the second dimension data field using a cross join.
- the first set of root objects is the same as the second set of root objects. In some implementations, the first set of root objects and the second set of root objects has at least one distinct root object.
- the first dimension data field can be D4 and the second dimension data field can be D5.
- the first dimension data field is an unshared dimension data field
- the second dimension data field is a shared dimension data field
- the first and second dimension data fields belong to the same tree, as discussed with respect to Scenario 2.5 in Section IV.C. l.b.
- constructing the dimension subquery according to the characteristics of the first dimension data field, the second dimension data field, the first object, and/or the second object includes, in accordance with a determination by the computing device that (i) the first object (of which the first dimension data field belongs) is a first root object (meaning that the first object (e.g., first root object), and therefore the first dimension data field, is not shared), (ii) the second object can be traced to the first root object, and the (iii) the second dimension data field is not shared by another root object, the computing device combines (2844) data columns of the first dimension data field and the second dimension data field using an inner join.
- the first dimension data field can be DI and the second dimension data field can be D4.
- Each of the above identified executable modules, applications, or sets of procedures may be stored in one or more of the previously mentioned memory devices, and corresponds to a set of instructions for performing a function described above.
- the above identified modules or programs i.e., sets of instructions
- the memory stores a subset of the modules and data structures identified above.
- the memory may store additional modules or data structures not described above.
- the phrase “based on” does not mean “based only on,” unless expressly specified otherwise. In other words, the phrase “based on” describes both “based only on” and “based at least on.”
- exemplary means “serving as an example, instance, or illustration,” and does not necessarily indicate any preference or superiority of the example over any other configurations or implementations.
- the term “and/or” encompasses any combination of listed elements.
- “A, B, and/or C” includes the following sets of elements: A only, B only, C only, A and B without C, A and C without B, B and C without A, and a combination of all three elements, A, B, and C.
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Computer Interaction (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- User Interface Of Digital Computer (AREA)
Applications Claiming Priority (5)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US202363464911P | 2023-05-08 | 2023-05-08 | |
| US18/424,505 US12505128B2 (en) | 2023-05-08 | 2024-01-26 | Creation and consumption of data models that span multiple sets of facts |
| US18/424,592 US12411872B2 (en) | 2023-05-08 | 2024-01-26 | Infoscenting fields for multi-fact data model analysis using shared dimensions |
| US18/424,619 US12373467B2 (en) | 2023-05-08 | 2024-01-26 | Query semantics for multi-fact data model analysis using shared dimensions |
| PCT/US2024/028206 WO2024233578A2 (en) | 2023-05-08 | 2024-05-07 | Creation and consumption of data models that span multiple sets of facts |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| EP4710213A2 true EP4710213A2 (de) | 2026-03-18 |
Family
ID=98827764
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| EP24733056.6A Pending EP4710213A2 (de) | 2023-05-08 | 2024-05-07 | Erzeugung und verbrauch von datenmodellen, die mehrere faktenmengen spannen |
Country Status (1)
| Country | Link |
|---|---|
| EP (1) | EP4710213A2 (de) |
-
2024
- 2024-05-07 EP EP24733056.6A patent/EP4710213A2/de active Pending
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US12050598B2 (en) | Dynamic dashboard with guided discovery | |
| JP7179981B2 (ja) | 選択したデータソースのオブジェクトモデルに従ったデータ視覚化の生成 | |
| US12488052B2 (en) | System and method for visually analyzing row-level calculations for data visualizations across multiple data tables including displaying separate tabs for the row-level calculations and visual data marks summary | |
| US9619581B2 (en) | Constructing queries for execution over multi-dimensional data structures | |
| US12373467B2 (en) | Query semantics for multi-fact data model analysis using shared dimensions | |
| US7734619B2 (en) | Method of presenting lineage diagrams representing query plans | |
| US20090024940A1 (en) | Systems And Methods For Generating A Database Query Using A Graphical User Interface | |
| CN111542813A (zh) | 使用异构数据的对象模型来便于建立数据可视化 | |
| US20070260582A1 (en) | Method and System for Visual Query Construction and Representation | |
| US11868406B2 (en) | Smart interactions for a digital duplicate | |
| US12314233B2 (en) | Processes and systems for onboarding data for a digital duplicate | |
| US11599533B2 (en) | Analyzing data using data fields from multiple objects in an object model | |
| US11232120B1 (en) | Schema viewer searching for a data analytics platform | |
| JP2006172446A (ja) | 複合データアクセス | |
| US11809459B2 (en) | Using objects in an object model as database entities | |
| EP4710213A2 (de) | Erzeugung und verbrauch von datenmodellen, die mehrere faktenmengen spannen | |
| WO2024233578A2 (en) | Creation and consumption of data models that span multiple sets of facts | |
| CA3141742C (en) | Digital duplicate | |
| CA3141742A1 (en) | Digital duplicate |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: UNKNOWN |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE |
|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
| STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
| 17P | Request for examination filed |
Effective date: 20251029 |
|
| AK | Designated contracting states |
Kind code of ref document: A2 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC ME MK MT NL NO PL PT RO RS SE SI SK SM TR |