Creating RDB, Ensuring Integrity, Fragmentation and Localization - Databases

Creating an RDB

The stages specific to creating a distributed database (RDB) are fragmentation and localization. Design decisions at these stages are not unique, and in some cases (when choosing the RDB structure) mathematical modeling is useful.

The RDB consists of interconnected local databases (LBDs). There are two ways to create an RDB: designing it "from scratch", or integrating ready-made LBDs. In the first case the data models of the LBDs are usually the same (homogeneous LBDs); in the second the data models may differ (heterogeneous LBDs).

The theory of integration for homogeneous and heterogeneous LBDs is considered as a set of processes for describing and manipulating data, together with the rules for converting data models into one another.

Ensuring Integrity

The presentation follows the order of the design stages (see Figure 2.6).

The following design principles can be used:

• maximum localization of data and reduction of the amount of data sent, over the shortest path: it is recommended to keep up to 90% of the data in the node's own local DB (LBD) and about 10% in the LBDs of other nodes;

• the locality of data placement should be determined with respect to the largest number of applications.

The criteria for designing an RDB [16] can be: 1) the minimum amount of data and messages sent; 2) the minimum cost of traffic; 3) the minimum total time required to service database requests.

Two cases can be distinguished when considering an RDB: work with a single application and work with a system of applications. Both top-down and bottom-up design are possible.

Bottom-up design is usually used when the RDB is created from local databases that are already running. Its particular features are brought out in the problem of integrating homogeneous and heterogeneous databases.

Bottom-up design can also be a stage of top-down design. Because top-down design is used much more often, it is discussed in more detail.

In general, data integrity may be violated for the following main reasons:

• errors in creating the structure of the local databases and in populating them;

• miscalculations in constructing the RDB structure (the fragmentation and localization procedures);

• system errors in the software that handles the interaction of the local databases (simultaneous access);

• emergency situations (hardware malfunction) and recovery of the RDB.

The first cause was described in detail earlier and is not considered here. The RDB-specific aspects of the remaining three causes can be captured as a set of problems (Figure 11.1). The fourth cause is examined in Ch. 12; the second and third are discussed in detail here.

Fig. 11.1. Problems of RDB

Intentional distortion of information, i.e. unauthorized access, is sometimes also treated as a violation of integrity. These questions are studied in Ch. 12.

Fragmentation and localization

Recall (see Figure 2.6) that the overall sequence of RDB design stages resembles the sequence used when creating a centralized database; the difference lies only in the stages of fragmentation (partitioning) and localization (placement).

The main factors that determine the partitioning method are: the permissible size of each partition; the model and frequency of application use; structural compatibility; and database performance factors. The relationship between a database partition and the applications is characterized by the application type identifier, the network node identifier, the frequency of application use and its usage model.

The complexity of the placement stage stems from the large number of possible variants. In practice, therefore, it is recommended first to consider assumptions that simplify the functions of the distributed DBMS (for example, tolerating a temporary mismatch between databases, or performing database updates from a single node).

Fragmentation [2], as noted earlier, can be horizontal or vertical. A fragment can be defined by a sequence of selection and projection operations of relational algebra (see Chapter 5). The decomposition must satisfy a number of conditions.

1. Completeness: all data of the global relation R must be mapped to its fragments.

2. Reconstruction: it must always be possible to restore the global relation from its fragments.

3. Disjointness: it is best that the fragments do not overlap (duplication is introduced at the localization stage).
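For a horizontal fragmentation, these three conditions can be checked mechanically. The following is a minimal Python sketch rather than a procedure taken from the source: the relation, its attributes, the predicates and all function names are hypothetical, and reconstruction is assumed to be a plain union of the fragments.

```python
# Hypothetical global relation: each tuple is (employee_id, city, salary).
GLOBAL_R = {
    (1, "Paris", 3000),
    (2, "Rome", 2500),
    (3, "Paris", 4100),
    (4, "Oslo", 3900),
}


def horizontal_fragments(relation, predicates):
    """Build one fragment per selection predicate (relational selection)."""
    return {
        name: {t for t in relation if pred(t)}
        for name, pred in predicates.items()
    }


def is_complete(relation, fragments):
    """Completeness: every tuple of the global relation is in some fragment."""
    return all(any(t in frag for frag in fragments.values()) for t in relation)


def is_reconstructible(relation, fragments):
    """Reconstruction: the union of the fragments equals the global relation."""
    return set().union(*fragments.values()) == relation


def is_disjoint(fragments):
    """Disjointness: no tuple belongs to more than one fragment."""
    total = sum(len(frag) for frag in fragments.values())
    return total == len(set().union(*fragments.values()))


# Assumed predicates: split the relation by the value of the city attribute.
PREDICATES = {
    "paris": lambda t: t[1] == "Paris",
    "other": lambda t: t[1] != "Paris",
}
FRAGMENTS = horizontal_fragments(GLOBAL_R, PREDICATES)
```

With these predicates all three checks succeed. Replacing the second predicate with one that overlaps the first (for example, a salary threshold) makes the disjointness and completeness checks fail.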

In horizontal fragmentation, performed by selection, each subset of tuples is united by common properties defined by the description of the subject domain.

Vertical fragmentation, performed by projection, divides the global relation (scheme R) by application (or by geographic location).

Fragmentation is correct if every attribute of the global relation (scheme R) is present in at least one subset of attributes and the global relation is restored by a natural join.
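The correctness condition for vertical fragmentation can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the attribute names, the sample rows and the helper functions are hypothetical, and relations are modeled as lists of dictionaries.

```python
# Hypothetical global relation with scheme R = (emp_id, name, city, salary).
ATTRS = ("emp_id", "name", "city", "salary")
GLOBAL_R = [
    {"emp_id": 1, "name": "Ann", "city": "Paris", "salary": 3000},
    {"emp_id": 2, "name": "Bob", "city": "Rome", "salary": 2500},
]


def project(relation, attrs):
    """Relational projection with duplicate elimination."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(left, right):
    """Natural join on every attribute the two relations have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**lrow, **rrow}
        for lrow in left
        for rrow in right
        if all(lrow[a] == rrow[a] for a in common)
    ]


def covers_all_attributes(attrs, fragment_schemes):
    """Every attribute of R appears in at least one fragment scheme."""
    return set(attrs) == set().union(*map(set, fragment_schemes))


def same_relation(a, b):
    """Compare two relations as sets of rows, ignoring row order."""
    def canon(rel):
        return {tuple(sorted(row.items())) for row in rel}
    return canon(a) == canon(b)


# Both fragment schemes keep the key emp_id, so the join is lossless.
SCHEMES = [("emp_id", "name"), ("emp_id", "city", "salary")]
FRAGS = [project(GLOBAL_R, scheme) for scheme in SCHEMES]
RESTORED = natural_join(FRAGS[0], FRAGS[1])
```

If the second scheme drops `emp_id`, the two fragments share no attribute, the join degenerates into a Cartesian product, and the global relation is no longer restored.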

Fragmentation together with localization (Figure 11.2) ultimately determines how quickly the RDB responds to a query.

Fig. 11.2. Scheme of fragmentation and localization of data

Denote the nodes by the index j and the applications by the index k. If there is a relation r with the scheme R, then node j holds a fragment Ri of that relation. If a copy of fragment Ri is located at node j, it is denoted with both indices, i and j. The fragmentation and localization scheme can then be represented in the form shown in Fig. 11.2.

In other words, the decomposition of the global relation into fragments must have the lossless-join property.

Mixed fragmentation is also possible; it corresponds to a combination of selection and projection operations

where ; m is the number of records in the horizontal fragment.

After fragmentation, localization is performed. As a localization criterion [2], it is convenient to use the effect of localization (allocation) on query optimization when the query structure is known. This requires modeling and optimizing all applications for each placement option.

We use the notation introduced in this chapter. Let fkj be the frequency with which application k is activated at node j; rki, the number of retrieval references of application k to fragment i; uki, the number of update references of application k to fragment i.

The allocation problem has two main varieties: without copies and with copies.

Consider the first variety. Both heuristic and exact mathematical algorithms are possible here.

First we discuss one of the heuristic placement algorithms, consisting of several steps.

Step 1. Use the most suitable placement (best fit): fragment Ri is placed at the node j where the number of references to it is maximal. The number of local references is

(11.1)

From expression (11.1) we determine the node j' where the fragment should be placed.
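Step 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the source's formula: the number of local references to fragment i at node j is assumed to be the sum, over applications k, of the activation frequency of k at j times the retrieval plus update references of k to i. The tables `F`, `R`, `U` and the function names are hypothetical.

```python
def local_references(f, r, u, i, j):
    """Assumed count of local references to fragment i if it is stored at node j.

    f[k][j]: activation frequency of application k at node j
    r[k][i]: retrieval references of application k to fragment i
    u[k][i]: update references of application k to fragment i
    """
    return sum(f[k][j] * (r[k][i] + u[k][i]) for k in range(len(f)))


def best_fit_node(f, r, u, i):
    """Pick the node where fragment i receives the most local references."""
    n_nodes = len(f[0])
    return max(range(n_nodes), key=lambda j: local_references(f, r, u, i, j))


# Hypothetical workload: 2 applications, 3 nodes, 2 fragments.
F = [[10, 0, 2],   # application 0 at nodes 0, 1, 2
     [0, 5, 3]]    # application 1 at nodes 0, 1, 2
R = [[3, 0],       # application 0: retrievals on fragments 0, 1
     [1, 4]]       # application 1: retrievals on fragments 0, 1
U = [[1, 0],       # application 0: updates on fragments 0, 1
     [0, 2]]       # application 1: updates on fragments 0, 1

PLACEMENT = {i: best_fit_node(F, R, U, i) for i in range(2)}
```

For this workload fragment 0 goes to node 0 and fragment 1 to node 1. Ties are broken in favor of the lowest node index, which is an arbitrary choice of this sketch.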

Step 2. For redundant placement, select all profitable nodes: place a copy of the fragment at every node where the cost of the references made by applications retrieving data is greater than the cost of the references made by applications, at any node, updating the data of fragment Ri: Bij > 0, or

(11.2)

where C is the ratio of the cost of an update to the cost of a retrieval.
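A minimal Python sketch of Step 2 follows. The exact expression is assumed, not taken from the source: the benefit of a copy of fragment i at node j is taken to be the retrieval references issued at j minus C times the update references issued at all other nodes. The tables and function names are hypothetical.

```python
def replication_benefit(f, r, u, i, j, c=1.0):
    """Assumed benefit of keeping a copy of fragment i at node j.

    Gain: retrievals issued at node j become local.
    Cost: updates issued at every other node must also reach this copy,
    weighted by c, the update-to-retrieval cost ratio.
    """
    n_apps, n_nodes = len(f), len(f[0])
    retrieval_gain = sum(f[k][j] * r[k][i] for k in range(n_apps))
    update_cost = sum(
        f[k][other] * u[k][i]
        for k in range(n_apps)
        for other in range(n_nodes)
        if other != j
    )
    return retrieval_gain - c * update_cost


def beneficial_nodes(f, r, u, i, c=1.0):
    """All nodes where a copy of fragment i has a strictly positive benefit."""
    return [
        j for j in range(len(f[0]))
        if replication_benefit(f, r, u, i, j, c) > 0
    ]


# Hypothetical workload: 2 applications, 3 nodes, 2 fragments.
F = [[10, 0, 2], [0, 5, 3]]   # f[k][j]
R = [[3, 0], [1, 4]]          # r[k][i]
U = [[1, 0], [0, 2]]          # u[k][i]
```

With C = 1 this workload keeps fragment 0 only at node 0 and replicates fragment 1 at nodes 1 and 2. Lowering C makes updates cheaper relative to retrievals, so more nodes become profitable.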

Step 3. Use the method of incrementally adding copies. Let di be the degree of redundancy of Ri and Fi the benefit of placing a copy of Ri at each node of the RDB. Modifying expression (11.2), we obtain

(11.3)
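Step 3 can be sketched as a greedy loop in Python. The source's expression is not available here, so the sketch assumes one form found in the distributed-database literature: the benefit used in Step 2 plus a bonus (1 - 2^(1 - d)) * F that grows with the degree of redundancy d. The choice to evaluate the bonus at the degree reached after adding the copy, and all names and numbers, are assumptions of this sketch.

```python
def replication_benefit(f, r, u, i, j, c=1.0):
    """Assumed Step 2 benefit: local retrievals minus c times remote updates."""
    n_apps, n_nodes = len(f), len(f[0])
    gain = sum(f[k][j] * r[k][i] for k in range(n_apps))
    cost = sum(
        f[k][other] * u[k][i]
        for k in range(n_apps)
        for other in range(n_nodes)
        if other != j
    )
    return gain - c * cost


def redundancy_bonus(d, full_benefit):
    """Assumed bonus: zero for a single copy, approaching full_benefit as d grows."""
    return (1 - 2 ** (1 - d)) * full_benefit


def add_copies(f, r, u, i, start_node, full_benefit, c=1.0):
    """Greedily add copies of fragment i while the best candidate stays profitable."""
    placed = [start_node]
    n_nodes = len(f[0])
    while len(placed) < n_nodes:
        d = len(placed) + 1  # degree of redundancy if one more copy is added
        scores = {
            j: replication_benefit(f, r, u, i, j, c) + redundancy_bonus(d, full_benefit)
            for j in range(n_nodes)
            if j not in placed
        }
        best = max(scores, key=scores.get)
        if scores[best] <= 0:
            break
        placed.append(best)
    return placed


# Hypothetical workload: 2 applications, 3 nodes, 2 fragments.
F = [[10, 0, 2], [0, 5, 3]]   # f[k][j]
R = [[3, 0], [1, 4]]          # r[k][i]
U = [[1, 0], [0, 2]]          # u[k][i]
```

Starting from the best-fit node 0 for fragment 0, a full-replication benefit of 4 is enough to justify a second copy at node 2 but not a third at node 1. With a benefit of 0 no copy is added.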

Consider vertical fragmentation.

Suppose that the scheme Ri, located at node w, is decomposed into Rs and Rt, which are placed at nodes s and t.

We use the following reasoning.

1. If there are two applications Ps and Pt, local to nodes s and t, that use only the attributes of Rs and Rt respectively, then after fragmentation and localization they make no remote references.

2. If there is an application (or a set of applications) Pq1, local to node w, that refers to either Rs or Rt, then one remote reference appears.

3. If there is an application Pq2, local to w, that refers to the attributes of both Rs and Rt, then two remote references appear.

4. If there is an application Pq3 at a node other than w, s or t that refers to both Rs and Rt, then one more remote reference appears.

In general, the benefit of fragmentation and localization (for C = 1) is

(11.4)
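The four cases above suggest a simple tally of saved and added remote references. The Python sketch below is one possible reading of them, not a reproduction of expression (11.4): it assumes that each application k makes n[k] references to the fragment per activation, that C = 1, and that the applications have already been sorted into the groups named in the cases. All identifiers and numbers are hypothetical.

```python
def vertical_split_benefit(f, n, groups, w, s, t):
    """Saved minus added remote references when a fragment at node w is split
    into Rs (placed at node s) and Rt (placed at node t).

    f[k][j]: activation frequency of application k at node j
    n[k]:    references of application k to the fragment per activation
    groups:  application indices per case:
             "Ps", "Pt": use only Rs / only Rt and run at s / t (become local)
             "Pq1":      local to w, use one of Rs, Rt (one new remote reference)
             "Pq2":      local to w, use both (two new remote references)
             "Pq3":      run elsewhere, use both (one more remote reference)
    """
    elsewhere = [j for j in range(len(f[0])) if j not in (w, s, t)]
    saved = (
        sum(f[k][s] * n[k] for k in groups["Ps"])
        + sum(f[k][t] * n[k] for k in groups["Pt"])
    )
    added = (
        sum(f[k][w] * n[k] for k in groups["Pq1"])
        + 2 * sum(f[k][w] * n[k] for k in groups["Pq2"])
        + sum(f[k][j] * n[k] for k in groups["Pq3"] for j in elsewhere)
    )
    return saved - added


# Hypothetical workload: 5 applications, 4 nodes (w = 0, s = 1, t = 2, other = 3).
F = [
    [0, 8, 0, 0],  # application 0: Ps, runs at s
    [0, 0, 6, 0],  # application 1: Pt, runs at t
    [3, 0, 0, 0],  # application 2: Pq1, runs at w
    [1, 0, 0, 0],  # application 3: Pq2, runs at w
    [0, 0, 0, 2],  # application 4: Pq3, runs at node 3
]
N = [2, 1, 1, 1, 1]
GROUPS = {"Ps": [0], "Pt": [1], "Pq1": [2], "Pq2": [3], "Pq3": [4]}

BENEFIT = vertical_split_benefit(F, N, GROUPS, w=0, s=1, t=2)
```

A positive result means the split saves more remote references than it introduces for this workload; a negative one argues for keeping the fragment whole at node w.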

The heuristic algorithms described may fail to give a rational solution, so we now consider some exact algorithms based on integer programming.

We introduce the notation for: the independent data files; the nodes; Li, the volume of file i; Bj, the amount of memory at node j (for files); dsj, coefficients that take into account the distance between nodes s and j; the transfer cost; the intensity of requests to file i from node j; the intensity of update messages; the volume of a request to file i from node j; the amount of data requested when a query to file i is executed from node j.

These quantities determine the amount of data arriving at the node that contains file i when requests to this file are executed, with the request intensity taken into account, as well as the amount of data that makes up the queries and the answers.

The following criteria are possible: the amount of data transferred; total cost of traffic.

Using the first criterion, we get the following integer programming problem:

Similar expressions are obtained using the second criterion.
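The source's integer programming formulation is not reproduced here. As an illustration only, the following Python sketch solves a tiny placement problem of this kind by exhaustive search over the 0-1 assignment, not with an integer programming solver. It assumes that each file is stored at exactly one node, that node memory is limited, and that the objective is the total volume of requests and answers that cross the network. All names and numbers are hypothetical.

```python
from itertools import product


def transferred_volume(assignment, intensity, request_size, answer_size):
    """Total data moved for requests issued at nodes that do not hold the file."""
    total = 0
    for i, home in enumerate(assignment):
        for j in range(len(intensity[i])):
            if j != home:
                total += intensity[i][j] * (request_size[i][j] + answer_size[i][j])
    return total


def best_allocation(file_size, node_capacity, intensity, request_size, answer_size):
    """Try every file-to-node assignment and keep the cheapest feasible one."""
    n_files, n_nodes = len(file_size), len(node_capacity)
    best, best_cost = None, None
    for assignment in product(range(n_nodes), repeat=n_files):
        # Memory constraint: files stored at a node must fit into it.
        used = [0] * n_nodes
        for i, home in enumerate(assignment):
            used[home] += file_size[i]
        if any(used[j] > node_capacity[j] for j in range(n_nodes)):
            continue
        cost = transferred_volume(assignment, intensity, request_size, answer_size)
        if best_cost is None or cost < best_cost:
            best, best_cost = assignment, cost
    return best, best_cost


# Hypothetical instance: 2 files, 2 nodes.
FILE_SIZE = [4, 4]
INTENSITY = [[10, 1],   # requests to file 0 from nodes 0, 1
             [6, 2]]    # requests to file 1 from nodes 0, 1
REQUEST_SIZE = [[1, 1], [1, 1]]
ANSWER_SIZE = [[3, 3], [3, 3]]

TIGHT = best_allocation(FILE_SIZE, [5, 5], INTENSITY, REQUEST_SIZE, ANSWER_SIZE)
LOOSE = best_allocation(FILE_SIZE, [10, 10], INTENSITY, REQUEST_SIZE, ANSWER_SIZE)
```

With a capacity of 5 per node the files must be separated; with a capacity of 10 both files fit at node 0, where most requests originate. Exhaustive search grows exponentially with the number of files, so it is only suitable for illustrating the structure of the problem.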

By generalizing the problems presented, one can solve the placement problem and determine the number of copies.

More detailed results were obtained with the help of queuing theory and integer programming (see [21-23]).
