# Prospects for the Development of Informetry - Theory...

## Prospects for the development of informetry

Building on the Zipf-Mandelbrot and Bradford-Vickery laws and on the concentration-scattering regularities formulated by V. I. Gorkova, methods are being developed that automate the indexing and analysis of texts by assigning weight coefficients to terms.

Consider how weight measures are assigned to keywords.

For example, Spärck Jones showed experimentally that if $N$ is the number of documents and $n$ is the number of documents containing a given index term (keyword), then weighting the term as

$$w = \log_2 \frac{N}{n} + 1$$

yields more effective retrieval than ignoring the significance of index terms. In other words, what matters is not only how often a word occurs in a particular document, but also how many documents it occurs in.
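The effect of this weight can be seen in a minimal Python sketch. It assumes the commonly cited form of Spärck Jones's weight, $w = \log_2(N/n) + 1$, and a toy collection in which each document is a set of index terms; the function name `sparck_jones_weight` and the data are hypothetical.

```python
import math

def sparck_jones_weight(term, documents):
    """Weight of an index term: log2(N / n) + 1 (assumed form).

    N is the number of documents; n is the number of documents
    containing the term. Returns 0.0 for a term that occurs nowhere.
    """
    N = len(documents)
    n = sum(1 for doc in documents if term in doc)
    if n == 0:
        return 0.0
    return math.log2(N / n) + 1

# Toy collection: each document is a set of index terms (illustrative data).
docs = [
    {"retrieval", "index", "weight"},
    {"retrieval", "zipf"},
    {"retrieval", "bradford", "scattering"},
    {"retrieval", "index"},
]

for t in ("retrieval", "index", "zipf"):
    print(t, sparck_jones_weight(t, docs))
```

A term present in every document gets the minimum weight of 1, while a term found in only one document gets the largest weight, which is the behaviour described in the text.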

Logarithmic measures are also used.

For example, to suppress uninformative words and at the same time raise the rank of significant ones, the inverse document frequency of a term is introduced:

$$\mathrm{idf}_i = \log \frac{N}{n_i},$$

where $N$ is the number of documents in the database and $n_i$ is the number of documents containing term $i$.
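The inverse document frequency can be computed for a whole collection in a few lines. This sketch assumes documents are already tokenized into word lists and uses the natural logarithm (the base only rescales the values); `inverse_document_frequency` is a hypothetical helper name.

```python
import math
from collections import Counter

def inverse_document_frequency(documents):
    """Return {term: log(N / n_i)} for every term in the collection.

    documents: list of token lists; N = len(documents);
    n_i = number of documents that contain term i at least once.
    """
    N = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    return {term: math.log(N / n) for term, n in doc_freq.items()}

docs = [
    "the law of scattering".split(),
    "the zipf law".split(),
    "the bradford law of scattering".split(),
]
idf = inverse_document_frequency(docs)
print(sorted(idf.items(), key=lambda kv: kv[1]))
```

Words such as "the" that occur in every document receive $\mathrm{idf} = 0$ and effectively drop out, while rarer words such as "zipf" are promoted.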

Each term is then assigned a weighting factor that reflects its significance:

$$w_{ij} = x_{ij} \cdot \mathrm{idf}_i,$$

where $w_{ij}$ is the weight of term $i$ in document $j$, $x_{ij}$ is the frequency of term $i$ in document $j$, and $\mathrm{idf}_i$ is the inverse document frequency of term $i$.
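Combining in-document frequency with inverse document frequency gives one weight per term per document. The sketch below assumes $x_{ij}$ is the raw count of term $i$ in document $j$ (a relative frequency would work equally well) and that $\mathrm{idf}_i = \log(N/n_i)$; the function name `term_weights` is hypothetical.

```python
import math
from collections import Counter

def term_weights(documents):
    """Weight w_ij = x_ij * idf_i for every term i in every document j.

    x_ij: raw count of term i in document j (assumed frequency measure);
    idf_i = log(N / n_i), where n_i is the number of documents with term i.
    Returns one {term: weight} dict per document.
    """
    N = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    idf = {term: math.log(N / n) for term, n in doc_freq.items()}
    weights = []
    for tokens in documents:
        counts = Counter(tokens)  # x_ij for this document
        weights.append({term: x * idf[term] for term, x in counts.items()})
    return weights

docs = [
    "scattering law scattering core".split(),
    "zipf law".split(),
]
w = term_weights(docs)
print(w[0])
```

"law" appears in both documents, so its weight is 0 despite being used, whereas "scattering" is both repeated and specific to one document and therefore gets the highest weight.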

The notion of a "core" has also taken on a new meaning.

At a 1995 workshop held in Dublin (Ohio), a useful idea for improving information retrieval was put forward: the Dublin Core. It is based on metadata defined by a standard specification, and it represents the $k$-th document as a set of pairs

$$D_k = \{(N_{ik}, V_{ik})\},$$

where $N_{ik}$ is the name of the $i$-th Dublin Core metadata element in the description of the $k$-th document and $V_{ik}$ is the value of that element. A query is represented in the same way.
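The pair representation can be illustrated with a small matcher. This is a sketch under stated assumptions: element names are taken from the Dublin Core element set (title, subject, language, date), but the records are invented, each element holds a single value (Dublin Core elements are actually repeatable), and the matching rule in the hypothetical `match_score` is simply the fraction of query pairs found in a description.

```python
def as_pairs(record):
    """Represent a description as a set of (element name, value) pairs,
    i.e. D_k = {(N_ik, V_ik)}; values are lower-cased for matching."""
    return {(name, value.strip().lower()) for name, value in record.items()}

def match_score(query, document):
    """Hypothetical matching rule: the fraction of query pairs that
    also appear in the document description."""
    q, d = as_pairs(query), as_pairs(document)
    return len(q & d) / len(q) if q else 0.0

# Element names follow Dublin Core; the records themselves are made up.
catalog = {
    "doc1": {"title": "Scattering of publications", "subject": "informetry",
             "language": "en", "date": "1995"},
    "doc2": {"title": "Text indexing", "subject": "information retrieval",
             "language": "ru"},
}
query = {"subject": "Informetry", "language": "en"}

ranked = sorted(catalog, key=lambda k: match_score(query, catalog[k]), reverse=True)
print(ranked)
```

Because documents and queries share one representation, matching reduces to comparing sets of pairs; a production system would also need to handle repeated elements and controlled vocabularies for values.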

Using the Dublin Core to identify concentration-scattering regularities appears promising.

Interest in evaluating texts is growing. For example, in the work of H. P. Luhn, the sentences of a text are scored by the parameter

$$V = \frac{N_k^2}{N_c},$$

where $V$ is the significance of the sentence, $N_k$ is the number of significant words in the sentence, and $N_c$ is the total number of words in the sentence.

Using this criterion, the highest-scoring sentences can be selected from any document, although they will clearly not form a coherent text. Note also that the significant words must be taken from a subject thesaurus or chosen by an expert. For this reason, the technique can assist a person but not replace one (at least at the present stage of development of computer technology).
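Sentence selection by significance can be sketched as follows. The sketch assumes the squared form $V = N_k^2 / N_c$ used in H. P. Luhn's sentence-scoring method, applied here to the whole sentence as the text describes (Luhn himself scored clusters of words within a sentence). Sentence splitting by a simple regular expression, the helper names `sentence_significance` and `extract`, and the small "thesaurus" set are all illustrative assumptions.

```python
import re

def sentence_significance(sentence, significant_words):
    """V = N_k**2 / N_c (assumed Luhn-style form): N_k is the number of
    significant words in the sentence, N_c the total number of words."""
    words = re.findall(r"\w+", sentence.lower())
    if not words:
        return 0.0
    n_k = sum(1 for w in words if w in significant_words)
    return n_k ** 2 / len(words)

def extract(text, significant_words, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    ranked = sorted(range(len(sentences)),
                    key=lambda i: sentence_significance(sentences[i], significant_words),
                    reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]

# In practice the significant words would come from a subject thesaurus
# or an expert; this small set is purely illustrative.
thesaurus = {"scattering", "concentration", "bradford", "zipf"}
text = ("Bradford described the scattering of articles. "
        "The weather was fine that year. "
        "Concentration and scattering follow Zipf and Bradford.")
print(extract(text, thesaurus, k=2))
```

The result is a list of relevant sentences, not a connected abstract, and its quality depends entirely on the significant-word list, which is why a human still has to supply and check it.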

The regularities governing the organization of documentary information flows (DIP), together with quantitative measures for terms, sentences, and other text components, are useful at every stage of building information retrieval systems (IPS): acquiring information collections; creating information retrieval languages and the logical-semantic apparatus of an IPS; organizing reference and information services in libraries and in scientific and technical information departments; creating and improving classification systems; identifying trends in the growth and aging of DIP; and the analytical-synthetic processing of textual information.

Recently, methods based on the concentration-scattering regularity have been developed to identify the information core of a subject domain when building information systems for business process reengineering and for creating virtual enterprises.
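One common way to make concentration-scattering operational is Bradford zoning: rank sources (journals, documents, or terms) by how many relevant items they yield, then split the ranking into zones of roughly equal total yield, the first zone being the concentrated core. Whether the methods mentioned above use exactly this procedure is an assumption; the function name `bradford_zones` and the counts are hypothetical.

```python
def bradford_zones(counts, zones=3):
    """Split sources into Bradford zones of roughly equal total yield.

    counts: {source: number of relevant items}. Sources are ranked by
    decreasing yield; each source goes to the zone in which its first
    item falls, so every zone holds about 1/zones of all items.
    Zone 0 is the concentrated 'core'.
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(counts.values())
    result = [[] for _ in range(zones)]
    cumulative = 0
    for source, c in ranked:
        zone = min(cumulative * zones // total, zones - 1)
        result[zone].append(source)
        cumulative += c
    return result

# Illustrative yields per source (90 relevant items in total).
counts = {"A": 30, "B": 12, "C": 10, "D": 8, "E": 5, "F": 5,
          "G": 5, "H": 5, "I": 4, "J": 3, "K": 3}
z = bradford_zones(counts)
print(z[0], z[1], z[2])
```

Each zone carries about a third of the items, but the number of sources grows from zone to zone (here 1, 3, and 7), which is the concentration-scattering pattern: a small core supplies as much as a large scattered periphery.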

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

[...]

