The concept of the information retrieval system and its structure

The structure and functioning of a specific information retrieval system (IPS) depend on the type and composition of its information sources and on the methods used to implement information retrieval. At the same time, there are some general principles for the construction and operation of an IPS, which are briefly discussed in this chapter.

Analysis of IPS definitions

The information retrieval system was initially understood as a collection of interrelated parts intended to identify, within some set of information elements (documents, information, etc.), those that respond to an information request submitted to the system [14].

Given the above description of the information retrieval process, information retrieval (IP) can be defined as follows:

$$IP = \langle D,\ Q,\ R \rangle$$

where D is some set of documents or a library (the search array); Q is the set of information requests; R is the set of relations or properties under which any query $q_i \in Q$ corresponds to a subset $D'$ of D; $D'$ is the response to the information request.
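The role of R can be illustrated with a minimal Python sketch. It assumes that R is given as a predicate over (query, document) pairs; the function names and the word-containment rule are hypothetical and are not taken from the cited sources.

```python
from typing import Callable, Set

Document = str
Query = str


def respond(documents: Set[Document], query: Query,
            relation: Callable[[Query, Document], bool]) -> Set[Document]:
    """Return D', the subset of D whose members stand in relation R to the query."""
    return {d for d in documents if relation(query, d)}


def contains_all_words(query: Query, document: Document) -> bool:
    """Hypothetical relation R: the document contains every word of the query."""
    return set(query.lower().split()) <= set(document.lower().split())


# Illustrative search array D and one request q.
D = {"boiling point of water", "melting point of iron", "water supply report"}
answer = respond(D, "water point", contains_all_words)
print(answer)
```

Any other predicate (for example, one that compares descriptor sets) can be passed in place of `contains_all_words` without changing `respond`.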

With this in mind, A. I. Cherny proposed representing the information retrieval system (IPS) as a set of four main components [24, p. 18]:

$$IPS = \langle LS,\ D,\ TS,\ N \rangle$$

where LS is the logical-semantic apparatus (one or more information retrieval languages, the indexing rules, and the delivery criteria); D is the search array (a set of documents provided with search images, among which the required ones are sought); TS is the technical means (the devices needed to record and store search images, to store documents, and to match document search images against a search query); N is the people who interact with the system (those who use the IPS and those who maintain it: they index documents and information requests, choose a search strategy, and perform the other intellectual operations without which information retrieval is impossible).

Then, to make it possible to automate the information retrieval procedure, it was proposed [14, 24] to consider the IPS at two levels: abstract and concrete.

The abstract IPS is the set consisting of the information retrieval language (IPY, denoted RL), the indexing rules (IND), and the delivery criterion, also called the criterion of semantic correspondence (KSS):

$$\langle RL,\ IND,\ KSS \rangle$$

The concrete IPS is a practically implemented system that includes the document array D in which the search is performed, the technical means TS that implement the IPS, and the people N who interact with it.

The structure of the IPS in this sense is shown in Fig. 6.2.

In accordance with this separation of the abstract and concrete levels of the IPS, and taking into account the way documentary information is stored (libraries, archives, and similar storages), it was proposed to divide the procedure for retrieving documentary information into two contours [17]:

1) semantic interpretation of the request and issuing the addresses (codes) of the documents that match the request; in Fig. 6.2 this contour is shown by solid lines;

2) finding the documents themselves (manually or with specialized technical means, if the storage is equipped with them); in Fig. 6.2 this contour is shown by dashed lines.

The second contour concerns the development of specialized technical means for storing large document arrays and the re-equipment of storages; the problems of information retrieval proper are solved in the first contour.
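The division into two stages can be sketched in Python as follows. The addresses, search images, and function names are hypothetical; the sketch only assumes that the first stage returns addresses and the second stage fetches documents by address.

```python
from typing import Dict, List, Set

# Stage 1 data: search images (descriptor sets) keyed by storage address.
SEARCH_IMAGES: Dict[str, Set[str]] = {
    "shelf-3/box-12": {"archive", "storage"},
    "shelf-7/box-02": {"retrieval", "indexing"},
}

# Stage 2 data: the storage itself, mapping an address to a document.
STORAGE: Dict[str, str] = {
    "shelf-3/box-12": "Report on re-equipping archive storages",
    "shelf-7/box-02": "Manual on indexing for information retrieval",
}


def first_contour(query_terms: Set[str]) -> List[str]:
    """Interpret the request and return the addresses of matching documents."""
    return sorted(addr for addr, image in SEARCH_IMAGES.items()
                  if query_terms <= image)


def second_contour(addresses: List[str]) -> List[str]:
    """Fetch the documents themselves from storage by address."""
    return [STORAGE[addr] for addr in addresses]


addresses = first_contour({"indexing"})
documents = second_contour(addresses)
print(addresses, documents)
```

Keeping the two functions separate mirrors the point made above: the retrieval logic lives entirely in the first stage, while the second stage is a storage lookup that could equally be performed by hand.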

Fig. 6.2. The structure of the functioning of the IPS

In accordance with the foregoing, the first contour of the IPS is its logical-semantic apparatus, which consists of three main blocks (Fig. 6.3):

information retrieval language;

a system for translation (indexing) into this language;

the logic that supports the search, which in turn can be detailed and implemented in different ways.
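The interplay of the three blocks can be shown with a short Python sketch. It assumes a toy retrieval language (a small controlled vocabulary of descriptors) and a delivery criterion that requires every query descriptor to appear in the document's search image; the vocabulary and all names are hypothetical.

```python
from typing import Dict, FrozenSet, List

# Block 1: a toy retrieval language, mapping natural-language words
# to descriptors (all entries are illustrative).
RETRIEVAL_LANGUAGE: Dict[str, str] = {
    "search": "RETRIEVAL", "retrieval": "RETRIEVAL",
    "index": "INDEXING", "indexing": "INDEXING",
    "archive": "STORAGE", "library": "STORAGE",
}


def index_text(text: str) -> FrozenSet[str]:
    """Block 2: translate a text into its search image (a set of descriptors)."""
    words = text.lower().split()
    return frozenset(RETRIEVAL_LANGUAGE[w] for w in words
                     if w in RETRIEVAL_LANGUAGE)


def matches(query_image: FrozenSet[str], doc_image: FrozenSet[str]) -> bool:
    """Block 3: delivery criterion. A non-empty query image must be
    fully contained in the document image."""
    return bool(query_image) and query_image <= doc_image


def search(query: str, documents: List[str]) -> List[str]:
    """Index the query, index each document, and apply the criterion."""
    query_image = index_text(query)
    return [d for d in documents if matches(query_image, index_text(d))]


docs = ["Indexing rules for a library", "Search strategies", "Annual budget"]
hits = search("library index", docs)
print(hits)
```

The same indexing function is applied to documents and to requests, so both are compared in the retrieval language rather than in natural language.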

The representation of the IPS as two contours is currently the most common.

In some systems, the contours can be combined.

Conversely, it is sometimes necessary to distinguish not two but more contours, which helps to organize a progressively deeper analysis of document texts.

Such options are implemented, for example, in documentary-factographic systems for normative-legal and normative-methodical documents.

Fig. 6.3. The composition of the logical-semantic apparatus of the IPS

In the symbolic form adopted above, the abstract IPS (the first contour) is the set consisting of the information retrieval language (RL), the indexing rules (IND), and the logic (LOG), which includes, along with the criteria of semantic correspondence, the basic relations:

$$\langle RL,\ IND,\ LOG \rangle$$

Other definitions of the IPS have also been proposed.

To organize the design of information systems, Yu. F. Telny [21] offers a definition containing seven components:


where G denotes the goals; $E_{li}$, the internal elements; $E_n$, the external elements; T, the period of existence of the system; F, the functions (processes, operations); R, the relations, including dynamic interactions; Z, the regularities that determine the structure of the system and its interaction with the external environment.

The choice of IPS definition depends on the specific object for which the system is developed, on its purpose, and on the conditions of its development and operation.

In theory and practice, different types of IPS are distinguished.

Documentary IPS (DIPS), in response to the information requests submitted to them, return originals, copies, or addresses of documents containing the required information.

Factographic IPS (FIPS) are intended to return the required information directly: for example, the boiling point of a liquid or the statistical indicators contained in the relevant reporting documents.
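The difference between the two kinds of answer can be sketched in Python. Both arrays and both function names are hypothetical; the sketch only assumes that a documentary system answers with document addresses, while a factographic system answers with the stored value itself.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical documentary array: address -> document text.
DOCUMENTS: Dict[str, str] = {
    "doc-001": "Handbook of physical constants: boiling points of liquids",
    "doc-002": "Annual statistical report",
}

# Hypothetical factual array: (object, attribute) -> value.
FACTS: Dict[Tuple[str, str], str] = {
    ("water", "boiling point"): "100 °C at standard pressure",
    ("ethanol", "boiling point"): "78.4 °C at standard pressure",
}


def documentary_search(term: str) -> List[str]:
    """DIPS-style answer: addresses of documents that may hold the information."""
    return sorted(addr for addr, text in DOCUMENTS.items()
                  if term.lower() in text.lower())


def factographic_search(obj: str, attribute: str) -> Optional[str]:
    """FIPS-style answer: the requested value itself, or None if not stored."""
    return FACTS.get((obj, attribute))


print(documentary_search("boiling"))
print(factographic_search("water", "boiling point"))
```

With the documentary answer the user still has to open `doc-001` and read it; with the factographic answer the value is returned directly.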

There are two types of factographic systems:

1) systems in which the factual arrays are formed directly, alongside the documentary arrays;

2) systems in which the factual arrays are formed on the basis of the documentary arrays.

Information systems of the second kind can, in turn, be built as documentary-factographic systems (DFIPS and ADFIPS), which contain two types of arrays: documentary arrays and the factual arrays associated with them.
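One way to picture the link between the two arrays is the following Python sketch, in which each fact keeps the address of the document it was extracted from. The document texts, the extraction pattern, and the `Fact` record are hypothetical and serve only to illustrate the idea; real systems use far richer analysis of document texts.

```python
import re
from typing import Dict, List, NamedTuple


class Fact(NamedTuple):
    subject: str
    value: str
    source: str  # address of the document the fact was taken from


# Hypothetical documentary array: address -> text.
DOCUMENTS: Dict[str, str] = {
    "doc-001": "The boiling point of water is 100 C. "
               "The boiling point of ethanol is 78 C.",
    "doc-002": "Minutes of the annual meeting.",
}

# Illustrative extraction pattern for one kind of fact.
PATTERN = re.compile(r"boiling point of (\w+) is (\d+ C)")


def build_factual_array(documents: Dict[str, str]) -> List[Fact]:
    """Derive the factual array from the documentary array, keeping the link."""
    facts = []
    for address, text in documents.items():
        for subject, value in PATTERN.findall(text):
            facts.append(Fact(subject, value, address))
    return facts


factual_array = build_factual_array(DOCUMENTS)
print(factual_array)
```

Because every `Fact` carries its `source`, a user who receives a value can still be pointed to the originating document.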


In contrast to documentary, factographic, and documentary-factographic IPS, which can return only information that was previously entered into them, information-logical systems belong to a higher class: they must not only return previously entered information but also, when necessary, process it logically to obtain new information that was never explicitly entered into the system.
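The logical processing described above can be illustrated with forward chaining over if-then rules. This is a minimal Python sketch under the assumption that facts are plain strings and that each rule is a pair (premises, conclusion); this rule format is hypothetical and is not the formalism of the cited sources.

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (premises, conclusion)


def infer(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Apply the rules until nothing new appears; return only derived facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known - facts


# Information explicitly entered into the system.
stored = {"A is part of B", "B is part of C"}

# One illustrative inference rule (transitivity for this specific pair).
rules = [
    (frozenset({"A is part of B", "B is part of C"}), "A is part of C"),
]

new_information = infer(stored, rules)
print(new_information)
```

The returned set contains only statements that were not entered explicitly, which is exactly what separates such a system from one that merely returns stored information.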

The information-logical system (ILS) can be defined as the set consisting of the information retrieval language (RL), the rules for translating from natural language into the retrieval language, i.e. the indexing rules (IND), and the inference rules (LV), which are intended for the algorithmic derivation of new information ($I_n$) [14, 24]:


Developing the idea of information systems that can obtain new information, Yu. I. Shemakin proposes the concept of the information-semantic system (ISS) [25, p. 60]:

$$ISS = \langle a,\ St,\ tp_{iss},\ \omega,\ t_i \rangle, \qquad tp_{iss} \in TP \qquad (6.7)$$

where $a$ is the goal; $St$ is the structure; $tp_{iss} \in TP$ is the subset of technological processes for the given ISS; $\omega$ denotes the conditions; $t_i$ is the time.

The constituent elements in the definition (6.7) can be detailed taking into account the specific implementation of the IPS.

It is especially important to clarify the composition of technological processes:


where met denotes the methods; re, the means; SemSI, the semantic processing of semantic information.

