Information storage - Information technology

Storing Information

Storage and accumulation are among the main operations performed on information and the main means of ensuring its availability for a certain period of time. Currently, the dominant way of implementing this operation is the concept of a database or a data warehouse.

A database can be defined as a set of interrelated data that is used by several users and stored with controlled redundancy. The stored data does not depend on user programs, and a common control method is used to modify the data and apply changes.

A data bank is a system that provides certain services for storing and searching data for a certain group of users on a certain topic.

A database system is the combination of a management system, application software, a database, an operating system and hardware that together provide information services to users.

A data warehouse (the terms information warehouse and information storage are also used) is a database that stores data aggregated along many dimensions. The main differences between a data warehouse and an ordinary database are: the data is aggregated; data is never deleted from the warehouse; the warehouse is replenished on a periodic basis; new aggregates that depend on the old ones are formed automatically; access to the warehouse is based on a multidimensional cube, or hypercube.
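To make aggregation along many dimensions more concrete, the following minimal Python sketch precomputes the sums for every combination of dimensions of a tiny fact set, which is one simple way to picture a hypercube. The records, the dimension names and the build_cube function are illustrative assumptions and do not describe any particular warehouse product.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records: (region, product, month, amount).
FACTS = [
    ("North", "Video", "Jan", 100),
    ("North", "Audio", "Jan", 50),
    ("South", "Video", "Jan", 70),
    ("North", "Video", "Feb", 30),
]
DIMENSIONS = ("region", "product", "month")


def build_cube(facts):
    """Sum the amount for every subset of dimensions (every cuboid)."""
    cube = defaultdict(int)
    positions = range(len(DIMENSIONS))
    for *coords, amount in facts:
        for size in range(len(DIMENSIONS) + 1):
            for kept in combinations(positions, size):
                # Dimensions that are not kept are rolled up, marked "*".
                key = tuple(
                    coords[i] if i in kept else "*" for i in positions
                )
                cube[key] += amount
    return dict(cube)


cube = build_cube(FACTS)
print(cube[("North", "*", "*")])    # total for the North region
print(cube[("*", "Video", "Jan")])  # Video in January, all regions
print(cube[("*", "*", "*")])        # grand total
```

Storing the rolled-up cells in advance trades memory for query speed, which matches the idea that new aggregates are formed automatically from existing data.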

An alternative to the data warehouse is the concept of data marts. Data marts are a set of thematic databases containing information related to particular aspects of the subject area.

Another important direction of database development is repositories. A repository, in simplified form, can be viewed as a database designed to store system data rather than user data. Repository technology derives from data dictionaries, which, as they are enriched with new functions and capabilities, acquire the features of a metadata management tool.

Each of the participants in the process (user, user group, physical memory) has its own view of the information.

In relation to users, a three-level representation is used to describe the subject area: conceptual, logical and internal (physical) (Fig. 4.7).

The conceptual level is associated with the particular view of the data held by a group of users, united by the commonality of the information they use, in the form of an external schema. Each specific user works with a part of the database and sees it in the form of an external model. This level is characterized by a variety of models: the entity-relationship model (ER model, Chen model), binary and infological models, and semantic networks. Fig. 4.8 shows a fragment of the "Sales" subject database and one of its possible conceptual representations, which reflects not only the objects and their properties but also the relationships between them.

The logical level is a generalized representation of the data of all users in an abstract form. Three types of models are used: hierarchical, network and relational.

The network model is a model of objects and relationships that allows only binary many-to-one relationships and uses directed graphs to describe the model.

The hierarchical model is a special case of the network model: a collection of trees (a forest).
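The difference between the two models can be sketched in a few lines of Python. Here each named link type is a many-to-one mapping from a member record to its owner; the record names, the link names and the helper functions are hypothetical. The data forms a hierarchy only when every record has at most one owner and the links contain no cycles.

```python
# Hypothetical many-to-one link types: member record -> owner record.
LINKS = {
    "placed_by": {
        "order_1": "customer_A",
        "order_2": "customer_A",
        "order_3": "customer_B",
    },
    "handled_by": {
        "order_1": "employee_X",
        "order_2": "employee_Y",
        "order_3": "employee_X",
    },
}


def owners(record, links):
    """Return the owner of a record under each link type it takes part in."""
    return {
        name: mapping[record]
        for name, mapping in links.items()
        if record in mapping
    }


def is_forest(links):
    """True if the links form a collection of trees (a hierarchy)."""
    parent = {}
    for mapping in links.values():
        for member, owner in mapping.items():
            if member in parent:
                return False  # a second owner: a network, not a tree
            parent[member] = owner
    for start in parent:
        seen = {start}
        node = start
        while node in parent:
            node = parent[node]
            if node in seen:
                return False  # a cycle cannot occur in a tree
            seen.add(node)
    return True


print(owners("order_1", LINKS))
print(is_forest(LINKS))                                # two owners per order
print(is_forest({"placed_by": LINKS["placed_by"]}))    # one owner per order
```

With both link types an order has two owners, so the structure is a general network; with a single link type it reduces to a forest.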

Fig. 4.7. Description of the subject area

Fig. 4.8. Fragment of the "Sales" subject database and one of its possible conceptual representations

The relational model represents data in the form of tables (relations). It is based on the mathematical concept of a set-theoretic relation, on relational algebra and on the theory of relations.
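The set-theoretic view can be illustrated with a small Python sketch in which a relation is literally a set of rows and the basic operations of relational algebra are ordinary set operations. The row helper, the sample relations and the function names are assumptions made for the illustration; a real DBMS implements these operations very differently.

```python
def row(**attrs):
    """A row is an immutable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate."""
    return {r for r in relation if predicate(dict(r))}


def project(relation, attrs):
    """Projection: keep the named attributes; duplicates vanish."""
    return {frozenset((a, dict(r)[a]) for a in attrs) for r in relation}


def natural_join(left, right):
    """Natural join: combine rows that agree on all shared attributes."""
    joined = set()
    for l_row in left:
        for r_row in right:
            ld, rd = dict(l_row), dict(r_row)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


customers = {
    row(customer_id=1, name="Alpha"),
    row(customer_id=2, name="Beta"),
}
sales = {
    row(sale_id=1, customer_id=1, amount=120),
    row(sale_id=2, customer_id=1, amount=80),
    row(sale_id=3, customer_id=2, amount=50),
}

large = select(natural_join(customers, sales), lambda r: r["amount"] > 60)
names = project(large, ["name"])
print(names)
```

Because a relation is a set, projecting the two large sales of the same customer onto the name attribute yields a single row.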

The representation of the "Sales" subject database at the logical level for the different models is shown in Fig. 4.9.

The physical (internal) level concerns the way data is actually stored in the physical memory of the computer. It is largely determined by the specific management method. The main components of the physical level are stored records combined into blocks; pointers needed for data retrieval; overflow data; gaps between blocks; and service information.

Fig. 4.9. Representation of the "Sales" subject database at the logical level for different models

Databases can be classified by their most characteristic features as follows:

by the way information is stored:

• integrated;

• distributed;

by user type:

• single-user;

• multi-user;

by the nature of data use:

• application;

• subject.

Currently, two approaches are used in database design. The first is based on the stability of the data, which provides the greatest flexibility and adaptability to the applications used. This approach is advisable when there are no strict requirements for operating efficiency (memory size and search time) and there is a large number of diverse tasks with variable and unpredictable queries.

The second approach is based on the stability of the database query procedures and is preferable when efficiency requirements are strict, especially with regard to speed.

Another important aspect of database design is the problem of data integration and distribution. The concept of data integration, which prevailed until recently, proved untenable once data volumes grew sharply. This fact, together with the growing capacity and falling cost of external storage devices and the wide adoption of data transmission networks, encouraged the introduction of distributed databases. Data can be distributed to the places where it is used in various ways:

1. Copied data. Identical copies of the data are stored at the different places of use, since this is cheaper than transferring the data. Data modification is controlled centrally;

2. A subset of the data. Groups of data compatible with the source database are stored separately for local processing;

3. Reorganized data. The data in the system is integrated when it is transferred to a higher level;

4. Partitioned data. Different sites use the same structure but store different data;

5. Data with separate subschemas. Different sites use different data structures, combined into an integrated system;

6. Incompatible data. Independent databases designed without coordination, which require integration.
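Two of the methods listed above, copied data (1) and partitioned data (4), can be contrasted in a short Python sketch. The Node and Coordinator classes, the catalog and order records, and the node names are hypothetical and only show the shape of the idea: a centrally controlled update reaches every copy, while each node keeps its own rows in a shared structure.

```python
class Node:
    """One place of use, holding a replicated copy and a local partition."""

    def __init__(self, name):
        self.name = name
        self.catalog = {}  # copied data: identical at every node
        self.orders = []   # partitioned data: same structure, local rows


class Coordinator:
    """Hypothetical central point that controls replicated updates."""

    def __init__(self, nodes):
        self.nodes = nodes

    def update_catalog(self, item, price):
        # Central control: the change is applied to every copy.
        for node in self.nodes.values():
            node.catalog[item] = price

    def add_order(self, node_name, item, quantity):
        # A partitioned row is stored only where it is used.
        self.nodes[node_name].orders.append((item, quantity))

    def total_quantity(self, item):
        # A global answer requires visiting every partition.
        return sum(
            quantity
            for node in self.nodes.values()
            for ordered, quantity in node.orders
            if ordered == item
        )


nodes = {"north": Node("north"), "south": Node("south")}
hq = Coordinator(nodes)
hq.update_catalog("steel", 12.5)
hq.add_order("north", "steel", 3)
hq.add_order("south", "steel", 4)
print(nodes["north"].catalog == nodes["south"].catalog)
print(hq.total_quantity("steel"))
```

Reads of the catalog are local and cheap at every node, whereas a query over all orders has to combine the partitions.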

The internal content of the information has an important influence on the process of creating a database. There are two directions:

• application databases, oriented to specific applications; for example, a database can be created to account for and control the receipt of materials;

• subject databases, oriented to a specific class of data; for example, a "Materials" subject database that can be used by various applications.

The concrete implementation of a database system is determined, on the one hand, by the specifics of the domain data reflected in the conceptual model and, on the other hand, by the type of the specific database management system (DBMS) that establishes the logical and physical organization.

To work with the database, a special generalized tool is used in the form of the DBMS, which is designed to manage the database and provide the user interface.

Basic requirements for a DBMS:

• data independence at the conceptual, logical, physical levels;

• universality (in relation to the conceptual and logical levels, the type of computer);

• compatibility, non-redundancy;

• data security and integrity;

• relevance and manageability.

There are two main ways of implementing a DBMS: in software and in hardware.

The software implementation (hereinafter simply the DBMS) is a set of software modules that runs under the control of a specific operating system and performs the following functions:

• describing the data at the conceptual and logical levels;

• loading data;

• storing data;

• searching and responding to queries (transactions);

• making changes;

• ensuring security and integrity.

It provides the user with the following language tools:

• a data description language (DDL);

• a data manipulation language (DML);

• an application (embedded) data language.
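The three kinds of language tools can be seen side by side in a minimal sketch that uses SQLite through Python's standard sqlite3 module. SQLite is chosen here only as a convenient example of a DBMS; the material table, its columns and the stock_of helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data description: declare the structure of the stored data.
conn.execute(
    "CREATE TABLE material ("
    " id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " stock INTEGER NOT NULL)"
)

# Data manipulation: load the data and make changes.
conn.executemany(
    "INSERT INTO material (name, stock) VALUES (?, ?)",
    [("steel", 40), ("copper", 15)],
)
conn.execute(
    "UPDATE material SET stock = stock - ? WHERE name = ?",
    (5, "steel"),
)
conn.commit()


def stock_of(connection, name):
    """Embedded use: the host program passes parameters, gets rows back."""
    found = connection.execute(
        "SELECT stock FROM material WHERE name = ?", (name,)
    ).fetchone()
    return None if found is None else found[0]


print(stock_of(conn, "steel"))  # 35
```

The SQL statements play the roles of the description and manipulation languages, while the Python function shows a query embedded in an application language.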

The hardware implementation involves the use of so-called database machines. Their appearance was prompted by growing volumes of information and stricter requirements for access speed. The word "machine" in the term "database machine" means an auxiliary peripheral processor. The term "database computer" means a standalone database processor or a processor that supports the DBMS. The main directions in database machine design are:

• parallel processing;

• distributed logic;

• associative memory;

• pipelined storage;

• data filters, etc.

Fig. 4.10 presents a set of database design procedures that can be grouped into four stages. At the stage of requirements formulation and analysis, the objectives of the organization are established and the requirements for the database are determined. These requirements are documented in a form accessible to both the end user and the database designer. Usually, the technique of interviewing personnel at various levels of management is used.

The conceptual design stage consists in describing and synthesizing the users' information requirements into an initial database design. The result of this stage is a high-level representation of the users' information requirements based on different approaches.

Fig. 4.10. A set of database design procedures

In the logical design process, the high-level representation of the data is transformed into the structure of the database being used. The resulting logical structure of the database can be assessed quantitatively using various characteristics (number of accesses to logical records, amount of data in each application, total amount of data, etc.). Based on these estimates, the logical structure can be improved to achieve greater efficiency.

At the physical design stage, issues related to system performance are addressed, and data storage structures and access methods are defined.

The whole process of database design is iterative, and each stage is itself treated as a set of iterative procedures that result in the construction of the corresponding model.

The interaction between the design stages and the data dictionary system must be considered separately. The design procedures can be used independently when there is no dictionary system. The dictionary system itself can be regarded as an element of design automation.

The database partitioning stage involves dividing the database into sections and synthesizing various applications based on the model. The main factors that determine the partitioning method, in addition to those shown in Fig. 4.10, are: the size of each section (permissible sizes); the model and frequency of application use; structural compatibility; and database performance factors. The connection between a database section and the applications is characterized by the application type identifier, the network node identifier, the frequency of application use and its model.

Application models can be classified as follows:

1. Applications that use a single file;

2. Applications that use multiple files, including:

• allowing independent parallel processing;

• allowing synchronized processing.

The complexity of implementing the database placement stage is determined by the large number of possible variants. Therefore, in practice it is recommended to consider first whether certain assumptions can be made that simplify the functions of the DBMS, for example, that temporary inconsistency of the database is permissible, or that the database is updated from a single node. Such assumptions have a great influence on the choice of DBMS and on the design stage under consideration.

Design tools and evaluation criteria are used at all stages of development. Any design method (analytical, heuristic, procedural), implemented as a program, becomes a design tool that is virtually unaffected by the design style.

Currently, uncertainty in the choice of criteria is the weakest point in database design. This is due to the difficulty of describing and identifying the practically unlimited number of alternative solutions. It should be borne in mind that many attributes of optimality cannot be measured: they are difficult to quantify or to express as an objective function. Therefore, the evaluation criteria are divided into quantitative and qualitative. The criteria most commonly used to evaluate a database, grouped into these categories, are presented below.

Quantitative criteria: the time required to respond to a query, the cost of modification, the cost of memory, the creation time, the cost of reorganization.

Qualitative criteria: flexibility, adaptability, availability for new users, compatibility with other systems, the ability to convert to another computing environment, the ability to restore, the ability to distribute and expand.

The difficulty of evaluating design decisions is also related to the differing sensitivity and time span of the criteria. For example, the efficiency criterion is usually short-term and extremely sensitive to ongoing changes, whereas qualities such as adaptability and convertibility manifest themselves over long time intervals and are less sensitive to environmental influences.

The purpose of a data warehouse is to provide information support for decision-making, not operational data processing. For this reason, a database and a data warehouse are not the same concept. The architecture of a data warehouse is shown in Fig. 4.11.

The basic principles of data warehousing are the following [44,45].

Fig. 4.11. Data warehouse architecture

1. Subject orientation. The operational database usually supports several subject areas, each of which can serve as a source of data for the data warehouse. For example, for a store selling video and music products, the following subject areas are of interest: clients, video cassettes, CDs and audio cassettes, employees, suppliers. There is a clear analogy between the subject areas of a data warehouse and the object classes in object-oriented databases. This suggests that design methods from object-oriented DBMSs can be used.

2. Means of integration. Different representations of the same entities are brought to a common form.

3. Persistence of data. A data warehouse does not support modification operations in the sense of traditional databases. Instead, data is loaded in bulk at specified times according to established rules, in contrast to the traditional model of individual modifications.

4. Chronology of the data. Thanks to the integration tools, the contents of the data warehouse acquire a definite chronological aspect.
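Principles 2 to 4 can be pictured together in a small Python sketch: source records are brought to a common form, loaded in bulk into an append-only store, and stamped with the load date so that the contents can be viewed as of a given day. The Warehouse class, the gender codes and the record fields are hypothetical and chosen only for illustration.

```python
from datetime import date

# Hypothetical source codes for one attribute of the same entity.
GENDER_CODES = {
    "m": "M", "male": "M", "1": "M",
    "f": "F", "female": "F", "0": "F",
}


def integrate(record):
    """Bring differing source representations to one common form."""
    return {
        "client": record["client"].strip().title(),
        "gender": GENDER_CODES[str(record["gender"]).strip().lower()],
    }


class Warehouse:
    """Append-only store: bulk loads only, no in-place modification."""

    def __init__(self):
        self._rows = []

    def bulk_load(self, records, load_date):
        # Every loaded row carries the date of its load.
        batch = [{**integrate(r), "loaded_on": load_date} for r in records]
        self._rows.extend(batch)
        return len(batch)

    def as_of(self, day):
        """Rows that were already in the warehouse on the given day."""
        return [r for r in self._rows if r["loaded_on"] <= day]


dw = Warehouse()
dw.bulk_load(
    [{"client": " ann lee ", "gender": "female"}], date(2024, 1, 31)
)
dw.bulk_load(
    [{"client": "BOB RAY", "gender": 1}], date(2024, 2, 29)
)
print(dw.as_of(date(2024, 2, 1)))
```

Because rows are only ever appended with a load date, earlier states of the warehouse remain reconstructible, which is what gives its contents a chronological aspect.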

Basic functions of repositories:

• the check-in/check-out paradigm and some formal procedures for objects;

• support for multiple versions of objects and configuration management procedures for objects;

• notification of tools and working systems about events of interest;

• context management and different ways of viewing repository objects;

• definition of workflows.

Let us briefly consider the main directions of scientific research in the field of databases:

• development of the theory of relational databases;

• data modeling and development of specific models for various purposes;

• mapping of data models, aimed at creating methods for their transformation and constructing commutative mappings, developing architectural aspects of data model mapping and mapping definition specifications for specific data models;

• creation of DBMSs with a multi-model external level, able to map widely used models;

• development, selection and evaluation of access methods;

• creation of self-describing databases that allow uniform access methods to be applied to data and metadata;

• concurrency control;

• development of a database and knowledge-base programming system that would provide a single efficient environment for both application development and data management;

• improvement of database machines;

• development of deductive databases based on mathematical logic and logic programming tools, as well as spatio-temporal databases;

• integration of heterogeneous information resources.
