Multidimensional data model, CACHE as an object-oriented database...

Multidimensional Data Model

Generally speaking, MMD can also interact with a relational database, therefore this section is also relevant in Ch. 5. However, interaction with the object-oriented model is preferable, and therefore consider MMD here.

A multidimensional model is used, as noted earlier, in data warehouses. There are three types of MMD: MOLAP, ROLAP, HOLAP.

MOLAP assumes the formation of the so-called multidimensional cube (hypercube). A three-dimensional cube is shown in Fig. 7.4.

Each of the cube's coordinates is called measurement. Measurements are implemented using indexes, which are known to dramatically increase the speed of data access. According to various estimates, this speed is 10-100 times higher than in the relational data models. The result value in the multidimensional model is placed at the intersection (in the cell) of the corresponding measurement values.

Three-dimensional cube

Fig. 7.4. Three-dimensional cube

An analogue of such a model can be considered the assignment of a function, for example, Z = f ( x, y ). In MMD, the coordinates x and y are measurements, and Z is the result value. As in the function of several variables, MMD must have the ability to "move" both within a single dimension and a transition from one dimension to another (for example, from the lines to the plane ).

For increased access speed, you have to pay an increased amount of memory. The ratio between the useful and required memory sizes in a multidimensional data model is 5-10 times greater than in the relational data model. In addition, there are a lot of empty cells in MMD, and special effective storage programs require NULL values. It is desirable to delete empty cells. In the multidimensional model, four basic operations are performed: aggregation (convolution) of data; disaggregation (detailing) of data; Section (for fixed values ​​of one or several measurements); Rotation (for the two-dimensional case it is an analog of the matrix transposition).

ROLAP uses relational components & quot ;. Two structure schemes are possible: star (Figure 7.5) and snowflake (Figure 7.6).

Scheme

Fig. 7.5. Scheme Star

Scheme

Fig. 7.6. Scheme & "Snowflake

In any scheme, one factual table is selected, in which, in fact, the data is stored, and several reference tables, each of which characterizes one of the dimensions of the cube. When using the star Denialization is required.

ROLAP allows to give a characteristic of the dimension of the data warehouse (Table 7.3).

Table 7.3

Data Warehouse Dimension

Dimension

Limit Volume, Gbyte

Number of rows in the fact table, mln

Small

3

1-10

Average

25

11-100

Large

200

101-999

Extra large

Over 200

1000 and more

Comparison of MOLAP and ROLAP gives the following results:

• MOLAP has a high performance, but there are problems with storing large amounts of data;

• ROLAP has no limitations on the amount of data, but it has a much slower performance.

Hence the idea of ​​building HOLAP, combining the merits of MOLAP and ROLAP.

HOLAP. The fact is that all HD data is never required at the same time. Each time only part of it is used. In this connection, it is expedient to divide the data into subject sub-areas, which are called kiosks (stores, display cases) of data. Data Kiosk [17] - a specialized thematic repository serving one of the company's activities.

In this case, the central repository can be implemented using a relational database, and the master data is stored in multiple kiosks.

CACHE as an object-oriented database management system

Let's show some features of the construction of an object-oriented database on the example of CACHE.

First of all, let's pay attention to the object-oriented (object-based) construction of the database itself within the CACHE database, since the object-based construction of the user interface and the application algorithm was discussed in detail in Ch. 1.

The CACHE database uses the object specific terminology, somewhat different from the system of definitions of the object-oriented approach.

To make it clear what is at issue, let's summarize this terminology in the table (Table 7.4), from which the features of the OODB structure (OODBMS) are visible.

1. The cells of the tables, as in the extended ORDB, are not atomic. The cells can be collections (lists, arrays) that correspond to the List and multidimensional sets (MULTISET) in the extended object-relational database.

2. Links between classes are set via pointers (OID, OREF).

3. Common subtable tables of several tables are formed as built-in objects (analog of abstract classes ROW, and first of all - ROW TYPE of object-relational databases).

4. It is possible to store large objects.

Table 7.4

Comparative Terminology

CACHE

Relational databases

Object-oriented approach

Class

Table

Class

Instance instance (object)

Line

-

OID ID

Key

-

Property

Column, field

Property

Link to a stored object

Foreign key

-

Embedded object

Duplicate fields used in multiple tables

-

Class method

Stored procedure

Method

Collection-list

Column with a combo box

-

Array collection

Subtable

-

Request

A stored procedure or view

-

Data Flow

BLOB

-

Consider building the database itself.

In accordance with the object-oriented approach, we should consider the concept of class and associated terms (property, method).

The concept of class in the CACHE DBMS it is rather multifaceted (Figure 7.7). First of all, the data type classes and the data classes themselves (object classes, tables) are distinguished.

Types of classes

Fig. 7.7. Types of classes

The term class in the CACHE DBMS has two understandings: the object; set (a subset).

Data type classes (including those created by users) are attributed to attributes (fields, properties) of objects (tables) that receive data of one type or another as values. These classes can not contain properties and create instances, do not have their own identification (for references), but they have methods. Each data type is a class. Classes of data types can also be created by users.

There are three types of data types: storage in a storage, logical in the computer's memory, display on the monitor screen. Formats can be converted into each other.

Data type classes are provided through the data type interface (with their parameters). We list the main types of data (in logical format):% Library.String,% Library.Binary,% Library.Boolean, %% Library.List (list format), Library.Numeric,% Library. Integer,% Library.Float,% Library.Name (name in the format Last Name Name ),% Library. Date,% Library.Time,% Library.Currency (money).

An object is called an instance of a class (string), in contrast to the implementation of a component in the corresponding container (as, for example, in the Delphi software product).

The object class defines the data structure and behavior of objects of the same type. A class of objects is characterized by the class name, properties, methods.

In the object classes, unregistered and registered classes are distinguished by the nature of the processes.

Unregistered classes do not support polymorphism and do not have automatic control. The assignment of identifiers (OIDs) and object references (OREF) is performed by the user.

Registered classes support polymorphism and are managed automatically from the% Library system class. RegisteredObject (similar to the TObject class in Delphi). Instances of this class exist temporarily in the process memory (temporary objects).

In terms of data storage, registered and unregistered classes are divided into stored and embedded.

Stored classes of objects are inherited from the system class% Library. Persistent and able to be permanently stored in the object's memory. Instances of such classes have single-valued object identifiers OID. A stored object can also be used as a class property (column of a table) of another object. In other words, a reference to this object is possible, which corresponds to the connection I: M of two tables of a relational database.

Embedded classes of objects inherit their behavior from the system class % Library.SerialObject and can only be saved as part of the corresponding stored objects. The built-in object in memory is characterized by an object reference, stored in a database in a sequential form (a kind of collection-array) as part of the stored object, with no OID.

Building object classes is possible in dialog mode or with programming languages ​​(command line, program). In dialog mode, it is possible to use a visual language (an analogue of QBE for queries) or the accompanying CACHE Definition Language (CDL), an analog of SQL in relational databases.

Constants (regardless of class type), built-in objects, object references, BLOB data streams, collections, multidimensional variables, bidirectional links between stored objects appear as properties (Figure 7.8)

BLOB data flows have two varieties for character (CHARACTERSTREAM) and binary (BINARYSTREAM) data (similar to CLOB and BLOB for hybrid ADRs).

Collections can be a list (List Collection) and an array (Array Collection). They can include constants, built-in objects and references, which are specified by % Library. ListOfDataTypes,% Library.ListOfObjects,% Library.ArrayOfDataTypes,% Library.ArrayOf Objects.

Property Types>

Fig. 7.8. Property Types

In the array-array, the ordering is performed on the field accepted as the key.

In the collection-list, the item's position in the list is the key.

The relationship is bi-directional - reciprocal links between tables. They guarantee referential integrity.

There are temporary (buffer memory) and computed properties.

Methods, as noted, are related to the data type. The default data type is String.

The arguments to the method are passed by default, by value, and the argument must be preceded by a & to pass by reference. Methods, like properties, can be defined as public or private.

Select the methods of the class and the methods of the instances of the class (reference is ## this). The latter are used more often.

The following are distinguished in the object model:

• method-code (code of programs in CACHE ObjectScript);

• an expression method (one program code expression), in which parameters are passed by reference, using macros, embedded SQL fragments are not allowed;

• method-call (subroutines);

• The method generator used to compile the program and classes (so that classes can be used).

Properties are characterized by class name, property name, data type, keywords, parameters for data types. Properties can be public and private.

The property is associated with a set of methods that has two sources:

1) a class of properties (access methods Get () and Set ());

2) the data type.

For CACHE classes, along with concepts such as the "property," the method, "are specific concepts" class parameters "," queries "," indexes. "

Class parameters are used when compiling.

Query - operations with sets of instances of classes. The result of the query is available through a special interface for processing ResultSet query results. Requests can take the form of stored procedures.

Index is the path to the instance of the class. It is used to improve the speed of query execution. An index is created based on one or more properties (fields).

The interaction of the described components of the DBMS (DB) is provided by programming languages.

There are three types of access in CACHE (see Figure 7.9).

Types of access to the CACHE database

Fig. 7.9. Types of access to the CACHE database

Object access is done in a dialog or using the CACHE ObjectScript programming language.

A multidimensional data model (MMD) is used in direct access. It is in the multi-dimensional structure of the database core that data is stored.

The essence of MMD actually consists in setting the values ​​of some function of a set of coordinates.

Let the coordinates have the current values ​​ i , j, k, l respectively, and the value of the functions in this point - A .. kG Then this value can be defined by a multidimensional matrix or set

i , j, k, l , A ijkl & gt;.

Within the CACHE database, this fact is written in the form A ( i, j, k, l ) = A ijkl.

The question arises of the realization of such a model. To do this, use the B * -tree-hierarchical B-tree, in each node of which there is a block. In this case, the data in the nodes is represented by coordinates and the values ​​(Figure 7.10).

In * -tree

Fig. 7.10. In the * -tree

As coordinates it is convenient to use indexes, and within the framework of CACHE - global variables ( globals ).

MMD is not considered as a tool for implementing applications, but serves as the basis for data storage and is used in the construction of data warehouses (OLAP systems).

Of particular interest from the pumping data from a relational DBMS into an object-oriented representation of SQL-access. The basis of SQL-access is the programming language SQL2 (or SQL92). This language is for relational databases. DBMS CACHE, as shown in Ch. 2, differs in structure from relational DBMS. The main differences are related to non-atomicity of the DB cells. In such cells, there may be a built-in class (table) or collection.

In addition, it is necessary to implement inheritance procedures.

Thus, you should transform the SQL2 language so that it is possible to operate with non-atomic cells and provide an inheritance procedure.

This is the path (and not the way to create a new SQL3 language), the creators of the CACHE database went.

First of all, it should be noted that within a single CACHE architecture, each object representation class is a table with the same name, and each relational paradigm table is an object. The correlation of objects and relational analogs is given in Table. 7.4.

Note that in the relational database there are no exact analogs of methods and parameters of classes and instances of classes, temporary properties. On the other hand, there are triggers in the relational representation that are not needed in the object environment.

Instances of objects correspond to records identified by the primary key. For stored objects, it is possible to use the ID of the object (an approximate analog of the field with the data type counter ) as such a key.

Inheritance in the object view is transformed into a set of source and inheritable tables.

By default, the names of the object property and the relational field are the same. Renaming is possible with the SQLFIELDNAME keyword in the definition of the object property.

Object properties-constants (atomic cells) can be temporal, multidimensional and computable. The calculated properties are not automatically displayed in the tables. In the relational interpretation for this, it is necessary to provide calculated fields. The latter can be transformed into computed properties.

It's more convenient to place the computations in class methods, which can be called from both the methods for calculating properties, and how the fields are calculated.

References to stored objects are performed by a foreign key.

Embedded objects are characterized by a compound field name using an underscore (& lt; name of the class in which the object is embedded & gt; _ & lt; property name & gt;), i.e. Addres_Ulica.

The list collection is displayed as a separate field containing a list of values.

The collection-array is represented as a separate subtable, associated with the main table through a foreign key. The subtable name is the same as the composite: & lt; class name & gt; _ & lt; collection name & gt;.

The stored procedures and views correspond to the object queries. Keywords - SQLPROC, SQLVIEW, SQLVIEWNAME.

Class methods can be displayed in code in the form of calculated fields or as stored procedures. Triggers defined in the object representation in the CDL language or within the CACHE Object Architect can also be used.

Highlight embedded and embedded SQL languages. The structure of the SQL2 language (interface and nested) is displayed in many sources.

In this regard, we focus on the differences in the SQL language in CACHE (CACHE SQL) from SQL2. The extension of the nested SQL2 language takes place in the following directions:

1) additional operators;

2) the CURSOR object;

3) list fields;

4) connections (links, dependency relations).

1. Additional operators are:

= * - external connection;

- & gt; - implicit connection;

_, # - concatenation (names) and integer remainder of division (module);

? - the operator of verification by template;

(is an occurrence statement;

& amp; - AND operator;

! - OR operator;

] - the following statement.

It is possible to use both single and double quotes; prefixes not before logical operators (not =, not & lt ;, not & gt;).

2. In addition to objects such as tables, views, stored procedures, indexes, constraints, generators, triggers, CACHE uses an SQL2 object called CURSOR.

CURSOR is some data set, formed most often by the select statement. It contains several entries and differs from stored procedures that only compute a set of data, and the type that is evaluated each time a query is made to it. CURSOR is evaluated once and exists until it is destroyed.

Let's show the procedure for creating a cursor, providing it with appropriate comments.

/* Cursor creation */

DECLARE PersCur CURSOR

FOR SELECT Familia, DateRogd

FROM Person

WHERE Familia-Petrov '

/* for the declared cursor it is possible to use the OPEN, FETCH, CLOSE */

OPEN PersCur

/'Reading field contents in local variables, as in stored procedures */

FETCH PersCur INTO: Familia,: DateRogd

/* it should be noted that local variables could be specified when defining the cursor */

DECLARE PersCur CURSOR

FOR SELECT Familia, DateRogd INTO: Familia,: DateRogd

FROM Person

WHERE Familia = 'neTpoB'

/* in this case the data extraction operator changes */FETCH PersCur

/* the unused cursor is closed */

CLOSE PersCur

Query results can also be read into indexed variables

FETCH PersCur INTO: a ('Query result T)

in this case - in a three-level variable; in objects, for example an object with OREF = provider

FETCH PersCur INTO: Familia

Set provider.Familia = Familia.

The data can be inserted into the cursor (query by the INSERT statement) from variables and indexed variables (as parameters).

1. The appeal to the list fields was considered earlier.

2. Connections can be external and implicit.

In an external connection (as opposed to an internal connection) connected the fields of the first table are reflected even if the corresponding rows are not found for the field in the second table.

In the external CACHE connection in the WHERE clause, the symbol = * is used instead of the symbol =.

Implicit connection is not determined by the user's request, but is supported by the database. In implicit connections, links and dependency relationships are highlighted.

In the reference , the referenced table field contains the primary key of the table entry record (pointer) that it refers to.

SELECT Name, Manufacturer - & gt; Surname FROM Tovar

WHERE Tour = 'tool',

which corresponds to

SELECT Tovar.Name, Manufacturer.Surname FROM Tovar, Manufacturer

WHERE Tour = "tool"

ANDTovar.Manufacturer = Manufacturer.lD.

The dependency relation is a ratio of 1: M from the parent to the child table. Each row of the child table references a row of the parent table (internal join option). This often refers to the built-in objects: let the parent table have the name Invoice, the child - Position, and the reference to it - Invoice Position.

There are two types of dependencies: from the child table to the parent table and from the parent to the child table.

In the first case, the

SELECT lnvoice-> Date

FROM lnvoice_Position

WHERE Price & gt; 100,000.

is similar to

SELECT Invoice.Date

FROM lnvoice_Position, Invoice

WHERE Price & gt; 100000

AND lnvoice_Position.lnvoice = lnvoice.ID.

In the second case, the operator

SELECT 1nvoice_Position - & gt; Price

FROM Invoice WHERE Price & gt; 100,000

AND lnvoiceNumber = 1003274

matches

SELECT lnvoice_Position_ & gt; Price

FROM Invoice, lnvoice_Position

WHERE Invoice.lnvoiceNumber = 1003274

AND Invoice.ID = lnvoice_Position.Invoice.

In embedded language SQL statements are embedded in a CACHE ObjectScript program using the preprocessor function & sql, for example,

& amp; sql (SELECT ID INTO: ID

FROM PERSON

WHERE Familia =: Familia)

Quit ..% Openld (ID).

The SQL statement can be the update, object creation (CREATE), and the previously described CURSOR.

SQL data can have the following formats: Logical (default), ODBC, Display (not related to constant properties and variable values).

In the embedded SQL language, macro calls can also be used, for which the following conventions are accepted: # - preprocessor, ## or & amp; - extension of the program code, $$$ - macro call according to the previously made macro definition.

For example,

#define TABLE Person

#define FIELDS Familia, Telephon

#define VARS: Familia,: Telephon

#define COND Familia% STARTWITH "A"

& sql (SELECT $$$ FIELDS

INTO $$$ VARS

FROM $$$ TABLE

WHERE $$$ COND).

On the other hand, macros can be used for

Add & amp; sql:

#define GETNEXT & amp; sql (FETCH xcur INTO: a)

FOR $$$ GETNEXT Quit: SQLCODE = 100 Do abc,

which is equivalent to

FOR & amp; sql (FETCH xcur INTO: a)

Quit: SQLCODE = 100 Do abc.

Here SQLCODE is a variable characterizing the execution of & sql: 0 - successfully completed; 100 - completed successfully, but no records were found for the specified conditions; less than zero - there is an error.

In conclusion, we note that it is most convenient to perform SQL-access from external systems, which allows using relational objects, such as reports. This external relational access is provided by the CACHE SQL server and the ODBC interface.

OODB Development Outlook

Compared to relational databases, OODBs have the following advantages:

1. The best possibilities for modeling systems from blocks with arbitrary connections.

2. Easy extensibility of the structure by creating new types of data (properties), inheritance, establishing new relationships and correcting methods.

3. The possibility of using recursive methods with a navigation method for accessing large amounts of data.

4. Increase productivity by 10-30 times.

5. A wider scope of applications (eg, use in multimedia systems).

The advantages of OODB [3] probably lead to a very wide spread of them. However, it is necessary to solve a number of tasks to eliminate OODB deficiencies: create a flexible database structure; build a clear programming language; to work out the syntax for parsing queries, including nested queries; identify several methods for accessing data; to work out the issues of simultaneous access (resolution of conflicts with multiple legacies); to define a complex search of data; to work out protection and data recovery; to clarify the semantics (actions) of operators under dynamic changes; Embed a change in the attributes of child objects.

However, even after eliminating the mentioned shortcomings, the transition to OODB will evidently have an evolutionary character, since it will be impossible to immediately abandon a significant number of operating relational databases. Such a painless transition will be possible if initially not only the object, but also the relational component will be present in the OODBMS. Moreover, the OODBMS should also introduce a multidimensional model for the formation of data warehouses, the paradigm of which is in good agreement with the OODB paradigm. This is the approach used in the CACHE OODBMS [38, 40].

thematic pictures

Also We Can Offer!

Other services that we offer

If you don’t see the necessary subject, paper type, or topic in our list of available services and examples, don’t worry! We have a number of other academic disciplines to suit the needs of anyone who visits this website looking for help.

How to ...

We made your life easier with putting together a big number of articles and guidelines on how to plan and write different types of assignments (Essay, Research Paper, Dissertation etc)