The use and functioning of the RDB, Queries - Databases

Using and Operating the RDB

The specifics of using a distributed database (RDB) stem from the way queries are processed. Its operation is characterized by how simultaneous access, protection, and recovery of data are provided, which in turn is determined by the DBMS used.

Let us examine these processes in detail.


If the query structure is known in advance (a standard query), the efficiency of query processing is determined at the previously described stages of fragmentation and localization [11].

If the query structure is not known in advance, the query process needs to be optimized. This is especially true when data is processed in parallel.

The optimization objectives can be:

• minimum data transfer time;

• minimum processing time in nodes;

• increased parallelism in processing and data transfers;

• improved traffic and processor load on the network;

• the minimum cost (delay) of data transfer only (if there are low-speed communication channels);

• the minimum cost of local processing (if the data transfer rate is commensurable with the speed of the processor);

• computer load balancing.

To characterize the cost of queries, we introduce two parameters: the cost of processing and the delay of data.

The cost TC(x) of transferring data of volume x from node j to node i, associated with the operation of physical communication channels, is

$$TC(x) = C^{0}_{ij} + C^{1}_{ij}\,x,$$

where $C^{0}_{ij}$ and $C^{1}_{ij}$ are the matrices of the cost of establishing communication and of transmitting a unit of information, determined by the system.

The delay TD(x) is the time between the beginning and the end of request processing:

$$TD(x) = D^{0}_{ij} + D^{1}_{ij}\,x,$$

where $D^{0}_{ij}$ and $D^{1}_{ij}$ are the matrices of the time for establishing communication and for transmitting a unit of information.

We accept these assumptions:

1) communication channels are uniform (the cost and delay coefficients are the same for every pair of nodes);

2) the cost of transmission is high, and the cost of local processing can be neglected (only data transmission parameters are taken into account).
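The two parameters and the assumptions above can be sketched in a few lines of Python. This is only an illustration: it assumes the usual linear form (a fixed setup term plus a per-unit term), and the class name `LinearModel`, the helper functions, and all coefficient values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearModel:
    """Linear transfer model: fixed setup term plus a per-unit term (assumed form)."""
    setup: float      # cost (or time) of establishing communication
    per_unit: float   # cost (or time) of transmitting one unit of information

    def __call__(self, volume: float) -> float:
        # Nothing is sent, so no communication is established either.
        if volume <= 0:
            return 0.0
        return self.setup + self.per_unit * volume


# Uniform channels: one pair of coefficients for every pair of nodes.
# The numbers are made up for illustration.
transfer_cost = LinearModel(setup=1.0, per_unit=1.0)
transfer_delay = LinearModel(setup=2.0, per_unit=0.5)


def plan_cost(volumes):
    """Total cost of a plan; local processing is neglected (assumption 2)."""
    return sum(transfer_cost(v) for v in volumes)


def plan_delay_sequential(volumes):
    """Delay of a plan whose transfers happen one after another."""
    return sum(transfer_delay(v) for v in volumes)


one_big_transfer = [100]
two_small_transfers = [30, 20]
print(plan_cost(one_big_transfer), plan_cost(two_small_transfers))
print(plan_delay_sequential(one_big_transfer),
      plan_delay_sequential(two_small_transfers))
```

Because each transfer pays the setup term, a plan with more but smaller transfers wins only when the saved volume outweighs the extra setups.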

Optimizing the query can be divided into two phases:

1) global optimization, which will be referred to simply as optimization;

2) local optimization, which is carried out by the methods described in Ch. 4 and is not considered here.

The essence of optimization is described using the apparatus of relational algebra (RA) and its equivalence transformations, which are applied to reduce the response time of a query. The optimization variables can be the choice of physical copies and the assignment of nodes for performing RA operations.

Again, we use a query tree in which each leaf corresponds to a relation and each internal vertex to an RA operation. A leaf corresponds to a node of the RDB (a local DB), and the root corresponds to the answer to the query.

The operations of relational algebra (selection S or σ, projection P or π, join J, union UN, difference DF, Cartesian product CP) and the laws that connect them are described in Ch. 4 of this work. The semi-join SJ is discussed in this chapter.

A rigorous solution to the optimization problem is possible; it is most often reduced to an integer programming problem. However, solving it runs into serious difficulties (a large volume of input data, the need for significant free resources, and a shortage of time for the solution), and the gain compared with heuristic algorithms is, as a rule, small. In addition, the solution is influenced by factors that are difficult to account for, such as network topology, data transfer protocols, data models, and data placement.

Therefore, we use the heuristic approach to optimization.

The cost and time of data transfer can be reduced by reducing the amount of information transmitted. In this regard, the following recommendations are possible.

1. Unary operations must be performed as early as possible (at the nodes where the data resides), as shown in Fig. 12.1; selection S and projection P are the unary operations.
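The effect of performing unary operations early can be seen in a small Python sketch. The relation `r`, its attributes, and the helper names are hypothetical and serve only to count how many values cross the network under each plan.

```python
def select(rel, pred):
    """Selection S: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def project(rel, positions):
    """Projection P: keep the given attribute positions and drop duplicates."""
    return sorted({tuple(t[i] for i in positions) for t in rel})


def values_sent(rel):
    """Number of attribute values that would be transmitted."""
    return sum(len(t) for t in rel)


def is_sales(t):
    return t[1] == "sales"


# Hypothetical relation r(id, dept, salary) stored at the source node.
r = [(i, "sales" if i % 4 == 0 else "other", 1000 + i) for i in range(20)]

# Plan (a): ship r unchanged, apply S and P at the destination.
shipped_a = r
result_a = project(select(shipped_a, is_sales), (0, 2))

# Plan (b): apply S and P at the source, ship only the reduced relation.
shipped_b = project(select(r, is_sales), (0, 2))
result_b = shipped_b

print(values_sent(shipped_a), values_sent(shipped_b))  # 60 10
```

Both plans produce the same answer, but plan (b) transmits only the tuples and attributes that the query actually needs.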

2. Expressions that occur more than once are best evaluated once. To do this, identical leaf relations are merged in the query tree, and then identical intermediate operations are merged (Fig. 12.2). Here we use the obvious expression

3. Use semi-joins. The idea of a semi-join is shown in Fig. 12.3. In the traditional scheme, either R must be sent to node 2 or S to node 1. Instead, S reduced by a projection can be transferred once to node 1, where a semi-join is then performed. Let us illustrate this with an example.

Fig. 12.1. Using the unary operator at the end (a) and at the beginning (b) of the transformation

Fig. 12.2. Selecting common subexpressions: a - the original query; b - the intermediate query; c - the converted query

Let the following relations be given:

The following options are available:

a) forward S from node 2 to node 1 (24 values);

b) compute r' = Pb(r) at node 1 and send it to node 2; there compute s' = J(r', s) and send it to node 1; at node 1 find J(r, s) as J(r, s'). This is the semi-join approach, and only 9 + 6 = 15 values are transferred.

Fig. 12.3. Using a semi-join

Recall that the semi-join of r and s, SJ(r, s), is PR(J(r, s)), where R is the scheme (attribute set) of r. If the relations r and s are at different nodes, computing the semi-join reduces the amount of data transferred. We note that J(SJ(r, s), s) = J(r, s) for the semi-join.

Sometimes a semi-join can completely replace a join.

Note that the semi-join is more often used for vertical fragmentation.
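The semi-join strategy can be traced in a short Python sketch. The relations below are hypothetical (they are not the ones from the example above), and `Relation`, `relation`, `project`, `join`, `semijoin`, and `size` are illustrative helpers, not the API of any DBMS.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: tuple      # attribute names
    rows: frozenset   # set of value tuples


def relation(attrs, rows):
    return Relation(tuple(attrs), frozenset(tuple(row) for row in rows))


def project(rel, attrs):
    """Projection P: keep the named attributes, duplicates disappear."""
    idx = [rel.attrs.index(a) for a in attrs]
    return relation(attrs, (tuple(row[i] for i in idx) for row in rel.rows))


def join(r, s):
    """Natural join J on the attributes common to r and s."""
    common = [a for a in r.attrs if a in s.attrs]
    extra = [a for a in s.attrs if a not in r.attrs]
    r_idx = [r.attrs.index(a) for a in common]
    s_idx = [s.attrs.index(a) for a in common]
    e_idx = [s.attrs.index(a) for a in extra]
    rows = [
        rr + tuple(sr[i] for i in e_idx)
        for rr in r.rows
        for sr in s.rows
        if all(rr[i] == sr[j] for i, j in zip(r_idx, s_idx))
    ]
    return relation(r.attrs + tuple(extra), rows)


def semijoin(r, s):
    """SJ(r, s): the join of r and s projected back onto the attributes of r."""
    return project(join(r, s), r.attrs)


def size(rel):
    """Number of values that would be transmitted for this relation."""
    return len(rel.attrs) * len(rel.rows)


# r(A, B) lives at node 1, s(B, C) at node 2 (made-up data).
r = relation(("A", "B"), [(1, "x"), (2, "y"), (3, "z"), (4, "x")])
s = relation(("B", "C"), [("x", 10), ("x", 11), ("w", 12), ("v", 13),
                          ("u", 14), ("t", 15), ("q", 16), ("p", 17)])

# Option (a): ship all of s to node 1.
sent_a = size(s)

# Option (b): ship the projection of r to node 2, reduce s there, ship it back.
r_proj = project(r, ("B",))        # node 1 -> node 2
s_reduced = semijoin(s, r_proj)    # node 2 -> node 1
sent_b = size(r_proj) + size(s_reduced)

print(sent_a, sent_b)  # 16 7
```

In this sketch the reduced relation still yields the full join at node 1, and `join(semijoin(r, s), s)` equals `join(r, s)`. The saving depends on how selective the join attribute is: if almost every tuple of s finds a partner, the extra round trip does not pay off.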

4. The copy usage (access) strategy. It is applied at the physical level. When there are many copies, the best option can be determined only through modeling. Suppose that only data transmission costs are taken into account and that matrices of transmission cost or time are available. The matrix values can be determined exactly, whereas the sizes of the relations are known only approximately. Approximate methods therefore have to be used here.

We introduce the concept of a relation "profile", which includes:

• the cardinality (number of tuples) of the relation R;

• the width (number of bytes) of each attribute;

• the number of distinct values of each attribute in R.

The sizes of intermediate relations are then estimated from the profiles for each operation of relational algebra.
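A profile and two size estimates can be sketched as follows. The formulas are common textbook estimates that assume uniformly distributed, independent attribute values; they are not necessarily the exact method the source has in mind. The class `Profile`, the function names, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    card: int        # cardinality: number of tuples
    width: dict      # attribute name -> width in bytes
    distinct: dict   # attribute name -> number of distinct values

    def size_bytes(self):
        return self.card * sum(self.width.values())


def estimate_selection(p, attr):
    """Profile of 'attr = constant', assuming uniformly distributed values."""
    card = round(p.card / p.distinct[attr])
    distinct = {a: min(v, card) for a, v in p.distinct.items()}
    distinct[attr] = min(1, card)
    return Profile(card, dict(p.width), distinct)


def estimate_join(p, q, attr):
    """Profile of a join on 'attr', assuming uniformity and independence."""
    card = round(p.card * q.card / max(p.distinct[attr], q.distinct[attr]))
    width = {**p.width, **q.width}
    distinct = {**p.distinct, **q.distinct}
    distinct[attr] = min(p.distinct[attr], q.distinct[attr])
    distinct = {a: min(v, card) for a, v in distinct.items()}
    return Profile(card, width, distinct)


# Made-up profiles of r(A, B) and s(B, C).
r = Profile(card=1000, width={"A": 4, "B": 8}, distinct={"A": 1000, "B": 50})
s = Profile(card=200, width={"B": 8, "C": 16}, distinct={"B": 40, "C": 200})

selected = estimate_selection(r, "B")
joined = estimate_join(r, s, "B")
print(selected.card, selected.size_bytes())  # 20 240
print(joined.card, joined.size_bytes())      # 4000 112000
```

Such estimates are what the transfer cost and delay are evaluated on when candidate plans are compared, so their accuracy bounds the quality of the chosen plan.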

To select copies and assign execution nodes, it is convenient to build an optimization graph that does not contain the unary operations performed at the nodes. Cartesian product and difference operations are rare, so the graph mainly contains union and join operations.

Thus, the query optimization problem does not have a single unambiguous solution.

It should be noted that the optimization problem should be solved regularly and promptly. This requires the developer to build special programs and therefore is not always done.

More often, the optimization algorithm is built directly into the management system by the developers of the distributed DBMS, or is supplied as special software in the form of applications or utilities.
