Modeling and development of tools and technology for searching for documentary information

Structural and methodological foundations of information retrieval systems

Information retrieval tasks comprise two qualitatively different kinds of components: conceptual and technological.

The conceptual components include, first of all, systems for representing the information (knowledge) itself, as well as means of presenting information about the information being processed. Both serve as the basis of the retrieval mechanism and of the organization of user interaction with an automated information retrieval system (AIPS). The technological components include user interface tools; algorithms for information processing, indexing and search; integration of information from various sources; query languages; and so on.

Depending on how "intelligent" the search tools need to be, on the nature of the information, and on the developer's capabilities, a more or less complex AIPS can be built on one of the following search technologies: literal search, that is, substring search that uses no knowledge of the lexical, grammatical or semantic structure of the processed material; search that uses lexical and grammatical information, that is, linguistic dictionaries and programs for morphological text analysis; and semantic search, carried out on the basis of knowledge about the relationships between the concepts of the subject area (SbA), expressed by means of natural-language words.
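
The difference between the three levels can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the toy collection DOCS, the crude suffix-stripping stem function (a stand-in for real morphological analysis with linguistic dictionaries), and the one-entry THESAURUS are hypothetical and do not describe any particular AIPS.

```python
import re

# Hypothetical toy collection: document id -> text.
DOCS = {
    1: "Indexing algorithms for document retrieval",
    2: "The system indexes documents on request",
    3: "A thesaurus relates the concepts of a subject area",
}

# Hypothetical thesaurus: stem -> related terms of the subject area.
THESAURUS = {"search": {"retrieval"}}

SUFFIXES = ("ing", "es", "s")  # crude stand-in for morphological analysis


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def literal_search(query, docs):
    """Level 1: plain substring match, no linguistic knowledge."""
    q = query.lower()
    return sorted(i for i, text in docs.items() if q in text.lower())


def morphological_search(query, docs):
    """Level 2: every query word must match a document word by stem."""
    q_stems = {stem(w) for w in tokenize(query)}
    return sorted(
        i for i, text in docs.items()
        if q_stems <= {stem(w) for w in tokenize(text)}
    )


def semantic_search(query, docs, thesaurus):
    """Level 3: a query word may also match through a related concept."""
    groups = []
    for word in tokenize(query):
        s = stem(word)
        groups.append({s} | {stem(r) for r in thesaurus.get(s, ())})
    hits = []
    for i, text in docs.items():
        doc_stems = {stem(w) for w in tokenize(text)}
        if all(group & doc_stems for group in groups):
            hits.append(i)
    return sorted(hits)


print(literal_search("indexing", DOCS))            # [1]
print(morphological_search("indexing", DOCS))      # [1, 2]
print(literal_search("search", DOCS))              # []
print(semantic_search("search", DOCS, THESAURUS))  # [1]
```

The query "indexing" finds document 2 only once word forms are normalized, and the query "search" finds document 1 only once the thesaurus relation to "retrieval" is taken into account.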

In the latter case, such knowledge is carried, in particular, by thesauri, which have been used in information retrieval for more than three decades. Simpler but more varied vocabulary structures also play a major role in organizing the dialogue between the user and the retrieval system. With their help the user can develop a search by modifying the request (the expression of his or her information need) to match the way the search object is represented in a particular IS and database.

1.1. Information in the systems of core and information activities

User interaction with a complex of heterogeneous information resources should be considered as a process that depends on two groups of factors. On the one hand, these are the properties of information and the patterns of information transformation in the field of core activity (OD), taking into account how a person perceives and processes both the basic (target) information and the technological information that provides the conditions for interacting with the information environment. On the other hand, the organization of the information space should be treated as a task of information resource (IR) management in which the user's personal LIS would allow working with the resources as a single resource. This requires resolving the issue of resource identification and, at the level of the information consumer, developing interfaces and access tools that personalize the presentation of information objects.

Let us consider the proposed generalized scheme of information reproduction. It is based on the representation of an aggregate information system (generator and consumer of information) which, in the context of the interdependence of core activity and information activity proper, determines the objects and automation processes under study.

From the point of view of flow management, two sets of processes can be distinguished here: forming a flow of information (documents) with given characteristics (topics, completeness of coverage, etc.), and distributing the input and output flows and their components in accordance with information needs. Whereas core activity deals with obtaining and meaningfully processing scientific information (i.e., messages describing some properties of the object under study), scientific information activity is, as far as possible, invariant with respect to meaning: it transforms the text into a form suitable for automated identification, storage and retrieval.
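
The two sets of processes can be sketched as two small functions. This is only an illustration under stated assumptions: the Document record, the form_flow and distribute functions, the topic labels and the consumer profiles are all hypothetical, and topic overlap is used here as the simplest possible matching criterion. Note that both functions work only with formal descriptors (topics) and never with the meaning of the documents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical document record: an id plus a set of topic labels."""
    doc_id: int
    topics: frozenset


def form_flow(documents, flow_topics):
    """Formation: keep documents that share a topic with the flow's given topics."""
    return [d for d in documents if d.topics & flow_topics]


def distribute(flow, profiles):
    """Distribution: route each document to every consumer whose
    information need (a set of topics) overlaps the document's topics."""
    return {
        user: [d.doc_id for d in flow if d.topics & needs]
        for user, needs in profiles.items()
    }


documents = [
    Document(1, frozenset({"indexing"})),
    Document(2, frozenset({"thesauri", "indexing"})),
    Document(3, frozenset({"chemistry"})),
]
flow = form_flow(documents, frozenset({"indexing", "thesauri"}))
profiles = {"ann": {"thesauri"}, "bob": {"indexing"}}  # hypothetical consumers

print([d.doc_id for d in flow])    # [1, 2]
print(distribute(flow, profiles))  # {'ann': [2], 'bob': [1, 2]}
```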

The result of core activity is usually embodied in a message: a document that implements the transformation of meaning into text. Such "materialization" of "ideal" knowledge provides a unified form of alienation and, because replication is relatively cheap, significantly widens the circle of potential consumers. On the other hand, as an inevitable contradiction, the low cost of publication (compared with the cost of obtaining the result itself) leads to a colossal and ever-growing volume of publications, and the unified manner of presentation makes messages outwardly impersonal. In addition, for a published message to become a stimulus in core activity, the subject who perceives it must also perform semantic transformations: the message must be perceived (singled out among others), understood (its meaning extracted) and actually or potentially applied (that is, at least fitted into the system of available knowledge).

Accordingly, to make "recognition", the first phase of use, effective, messages must have "signal" features. Such features can be formed, for example, according to the "genus and specific difference" scheme, i.e. by introducing an explicit systematization. This is quite natural: scientific knowledge is always systemic, because it is created within the framework of the system of concepts of the corresponding branch of knowledge.

A characteristic feature of this scheme is the cyclical nature of reproduction: the objects that form it are the results of purposeful activity, and they are created in order to be used themselves to obtain new results. This predetermines the "naturalness" of information activity, whose aim is to transform the information environment so that it can be used fully and effectively in the process of knowledge reproduction. Such transformations can be either a direct ordering of messages or their "virtual" ordering: the creation of additional (reference) information messages (for example, thematic rubricators, classifiers, thesauri, etc.) that provide alternative "direct" entry points into the set of messages related to the problem being solved.
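
"Virtual" ordering can be illustrated with a minimal sketch in which the messages themselves stay in arrival order and a separate reference structure gives entry points into them. The message set, the rubric labels and the build_rubricator function are hypothetical and serve only as an example of such a reference structure.

```python
from collections import defaultdict

# Hypothetical message set: id -> (text, rubrics assigned to the message).
messages = {
    10: ("Thesauri in document retrieval", ["retrieval", "thesauri"]),
    11: ("Substring search algorithms", ["retrieval"]),
    12: ("Library classification schemes", ["classification"]),
}


def build_rubricator(msgs):
    """Create a reference structure mapping each rubric to message ids.

    The messages are not moved or reordered; only an additional
    index over them is built.
    """
    index = defaultdict(list)
    for msg_id, (_text, rubrics) in msgs.items():
        for rubric in rubrics:
            index[rubric].append(msg_id)
    return dict(index)


rubricator = build_rubricator(messages)
print(rubricator["retrieval"])  # [10, 11]
print(list(messages))           # [10, 11, 12]
```

A classifier or thesaurus would add further entry points of the same kind over the same unchanged set of messages.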

To determine the requirements for linguistic support (LO), the main means both of identifying semantic objects and of supporting user interaction with an AIS, we will consider it as a set of systems that make it possible to represent (identify) flows and arrays of information, as well as individual documents and queries, in a way adequate to the nature of the need and the cognitive state of the user.