Multilingual structure modeling - Databases: design

Modeling a multilingual structure

With globalization, information systems increasingly aim to support several languages so that information can be presented to each user in his or her own language. The need for multilingual support arose with the introduction of online technologies, when data had to be presented not only to the specific user or organization for which an information system was developed, but also to users outside the country where the system is mainly used. Multilingual support is especially critical when developing an organization's web presence or a transnational information system that operates over a network or the Internet.

Suppose that the e-shop being developed must let citizens of different countries order goods, and that their main language may be not only the shop's primary language but also English, German, Korean, and so on. In this case, the product catalog must be presented in the user's national language, which means that product names and descriptions must be available in every language version.

The simplest solution that suggests itself is to introduce a separate language attribute for each language (Figure 4.66); each attribute stores the product name in the language it is intended for.

Fig. 4.66. Multilingual example based on language attributes


When data must be presented in national alphabets, the choice of data type deserves attention. As the example shows, one of the product-name attributes, the one intended for an Asian language, has a different data type from the other, similar attributes. The reason is that Arabic and Asian languages are written not with the familiar Latin or Cyrillic letters but with other scripts, including syllabic blocks and ideographic characters (hieroglyphs). The ordinary character data type is not suitable for storing such text, because it is based on a single-byte code table of 256 symbols that covers only the Latin alphabet plus one additional alphabet such as Cyrillic. Text in other language groups needs far more distinct characters than those 256 positions allow. Therefore, two-byte character elements are used for such text, and separate data types, denoted by ...., are provided for them. These data types are used for Korean, Chinese, or Japanese, and a string of a given length requires twice as much storage as the same number of Latin or Cyrillic characters.
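The storage argument can be checked directly. The sketch below is only an illustration, assuming the Windows-1252 code page (cp1252) as the example of a 256-symbol single-byte table and UTF-16 as the two-byte encoding; the helper names are hypothetical. It shows that a Korean product name cannot be represented in the single-byte table at all, and that in UTF-16 every character takes two bytes.

```python
from typing import Optional


def single_byte_size(text: str) -> Optional[int]:
    """Bytes needed in the single-byte cp1252 code page, or None if the
    text contains characters outside its 256 positions."""
    try:
        return len(text.encode("cp1252"))
    except UnicodeEncodeError:
        return None


def two_byte_size(text: str) -> int:
    """Bytes needed in UTF-16 (little-endian, no byte-order mark)."""
    return len(text.encode("utf-16-le"))


latin_name = "Shoes"
korean_name = "신발"  # Korean for "shoes"

for name in (latin_name, korean_name):
    print(name, single_byte_size(name), two_byte_size(name))
# Shoes 5 10
# 신발 None 4
```

Characters outside Unicode's Basic Multilingual Plane occupy four bytes in UTF-16, so "two bytes per character" is a simplification that holds for Korean, Chinese, and Japanese text in everyday use.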

Multilingual representation in this form is clearly a poor solution, because each language requires a separate attribute. Adding another language means restructuring the database by adding an attribute and reworking any program logic that depends on the set of attributes, which is a complicated and expensive procedure. This option is therefore appropriate only when the set of languages is guaranteed not to change.
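A minimal sketch of the attribute-per-language design of Figure 4.66 makes the drawback concrete. It uses SQLite through Python's sqlite3 module as a stand-in DBMS, and the table and column names (products, name_en, name_de, name_ko) are hypothetical. The lookup function has to build the column name into the SQL text, and supporting one more language requires ALTER TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One column per language (hypothetical names modeled on Figure 4.66).
conn.execute("""
    CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name_en    TEXT,
        name_de    TEXT,
        name_ko    TEXT
    )
""")
conn.execute("INSERT INTO products VALUES (1, 'Shoes', 'Schuhe', '신발')")


def product_name(product_id: int, lang: str) -> str:
    # The column must be spliced into the SQL text, so program logic
    # depends on the exact set of language columns in the table.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(products)")}
    column = f"name_{lang}"
    if column not in columns:
        raise ValueError(f"no column for language {lang!r}")
    return conn.execute(
        f"SELECT {column} FROM products WHERE product_id = ?", (product_id,)
    ).fetchone()[0]


# Supporting French means restructuring the table itself.
conn.execute("ALTER TABLE products ADD COLUMN name_fr TEXT")
conn.execute("UPDATE products SET name_fr = 'Chaussures' WHERE product_id = 1")
print(product_name(1, "fr"))  # Chaussures
```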

Another way to solve the problem of multilingual data is to create a database structure that accounts for the available language alphabets and guarantees correct presentation of information through the structure of the data itself. First of all, working with languages requires a "Languages" entity (Figure 4.67), which lets the user receive data in the desired language and separates all text data by language. This entity also makes it possible to add new languages without restructuring the database.
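The "Languages" entity can be sketched as an ordinary table. This is an illustrative sketch, again with SQLite as a stand-in; the columns (a surrogate key, a language code, and a display name) are assumptions rather than the exact attributes of Figure 4.67. The point it shows is that a new language is just a new row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "Languages" entity: surrogate key, code, and display name.
conn.execute("""
    CREATE TABLE languages (
        language_id INTEGER PRIMARY KEY,
        code        TEXT NOT NULL UNIQUE,  -- e.g. an ISO 639-1 code such as 'en'
        name        TEXT NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO languages (code, name) VALUES (?, ?)",
    [("en", "English"), ("de", "German"), ("ko", "Korean")],
)

# Adding a language is an ordinary INSERT; the schema does not change.
conn.execute("INSERT INTO languages (code, name) VALUES ('fr', 'French')")

codes = [row[0] for row in conn.execute("SELECT code FROM languages ORDER BY language_id")]
print(codes)  # ['en', 'de', 'ko', 'fr']
```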

Fig. 4.67. Example of the entity Languages


This entity can also be obtained by normalizing the "Products" entity, to which a "Language" attribute would otherwise have to be added. In this case, normalization produces a junction (link) entity between products and languages (Figure 4.68).

Fig. 4.68. Normalization of multilingual representation of the goods


This multilingual design for a particular entity is quite workable and solves the problem of presenting data in different languages. When the junction entity is created, the attribute "Item name" must be moved into it; each instance of the Products entity then remains unique, while the junction entity holds all the language variants of its name.
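The normalized structure of Figure 4.68 can be sketched as three tables, with the product name moved into the junction table whose composite key is (product, language). SQLite stands in for the target DBMS; the names product_names and price_cents are hypothetical.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (
        language_id INTEGER PRIMARY KEY,
        code        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE products (
        product_id  INTEGER PRIMARY KEY,
        price_cents INTEGER NOT NULL      -- hypothetical non-language attribute
    );
    -- Junction entity: one row per (product, language); the name lives here.
    CREATE TABLE product_names (
        product_id  INTEGER NOT NULL REFERENCES products(product_id),
        language_id INTEGER NOT NULL REFERENCES languages(language_id),
        name        TEXT NOT NULL,
        PRIMARY KEY (product_id, language_id)
    );
    INSERT INTO languages VALUES (1, 'en'), (2, 'de');
    INSERT INTO products VALUES (10, 4990);
    INSERT INTO product_names VALUES (10, 1, 'Shoes'), (10, 2, 'Schuhe');
""")


def product_name(product_id: int, lang_code: str) -> Optional[str]:
    """Name of a product in the requested language, or None if missing."""
    row = conn.execute("""
        SELECT pn.name
        FROM product_names AS pn
        JOIN languages AS l ON l.language_id = pn.language_id
        WHERE pn.product_id = ? AND l.code = ?
    """, (product_id, lang_code)).fetchone()
    return row[0] if row else None


print(product_name(10, "de"))  # Schuhe
```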

However, language support for character data often needs to be made universal. Databases frequently contain a great deal of information that should be available in several language versions, not only in the national alphabet. Such data may be held, for example, in classifiers (reference tables), of which there are usually far more than functional entities containing national-language information. Solving this with a separate junction entity for each table would greatly complicate both the database model and the queries that retrieve data. Developers therefore store multilingual data centrally.

To do this, you need an entity whose unique records identify each individual character data element. Because this entity is purely technical, it can contain a single attribute, the surrogate primary key, and this is evidently the only case where that is acceptable. In addition, if language string values need to be found not by the numeric key, whose value is not known in advance, but by a symbolic code, a corresponding attribute is added to the entity (see Figure 4.69).
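A sketch of the "String identifier" entity of Figure 4.69, assuming SQLite and hypothetical names (string_ids, code). The surrogate key alone is enough for a row; the optional symbolic code allows lookups when the numeric key is not known in advance. Note that SQLite does not enforce the VARCHAR length, which is shown only to mirror the developer-chosen code length.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
# Hypothetical "String identifier" entity: surrogate key plus optional code.
conn.execute("""
    CREATE TABLE string_ids (
        string_id INTEGER PRIMARY KEY,
        code      VARCHAR(50) UNIQUE   -- length not enforced by SQLite
    )
""")
conn.execute("INSERT INTO string_ids (code) VALUES ('PRODUCT_SHOES')")
conn.execute("INSERT INTO string_ids DEFAULT VALUES")  # key only, no code


def string_id_by_code(code: str) -> Optional[int]:
    """Resolve a symbolic code to its surrogate key, or None if unknown."""
    row = conn.execute(
        "SELECT string_id FROM string_ids WHERE code = ?", (code,)
    ).fetchone()
    return row[0] if row else None


print(string_id_by_code("PRODUCT_SHOES"))  # 1
```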

Fig. 4.69. Example of an entity with character elements


The length of the "Character string code" attribute is chosen by the developer according to the values it will need to store.

The string identifier defines a language element that must be represented in different languages. Examining the relationship between string identifiers and languages shows that it is multi-valued: one identifier has forms in many languages, and one language has forms for many identifiers. This many-to-many relationship needs to be normalized.

As a result of this normalization, and given that all the entities involved are generic, the developer obtains a model that can be associated with any functional entity or classifier to provide language support for names (Figure 4.70). Note that the junction entity contains not one but two character attributes that store language values. As explained earlier, this allows short data to be stored as a single line and large text data to be stored in multi-line form. Since strings must also be stored in Asian and other languages, both attributes should use two-byte (national character) data types: one for variable-length strings and one for large text.
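The model of Figure 4.70 can be sketched as a junction table keyed by (string identifier, language) with two text attributes, one for a single-line value and one for multi-line text. This is an illustrative sketch: SQLite is a stand-in, the names language_forms, short_text, and long_text are hypothetical, and SQLite's TEXT type already stores Unicode, so the national character types a server DBMS would need have no direct counterpart here.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (
        language_id INTEGER PRIMARY KEY,
        code        TEXT NOT NULL UNIQUE
    );
    CREATE TABLE string_ids (
        string_id INTEGER PRIMARY KEY,
        code      TEXT UNIQUE
    );
    -- Junction ("language forms") resolving the many-to-many relationship.
    CREATE TABLE language_forms (
        string_id   INTEGER NOT NULL REFERENCES string_ids(string_id),
        language_id INTEGER NOT NULL REFERENCES languages(language_id),
        short_text  TEXT,   -- single-line value, e.g. a name
        long_text   TEXT,   -- multi-line value, e.g. a description
        PRIMARY KEY (string_id, language_id)
    );
    INSERT INTO languages VALUES (1, 'en'), (2, 'ko');
    INSERT INTO string_ids VALUES (1, 'PRODUCT_SHOES');
    INSERT INTO language_forms VALUES
        (1, 1, 'Shoes', 'Leather shoes.' || char(10) || 'Sizes 38-46.'),
        (1, 2, '신발', NULL);
""")


def localized(code: str, lang: str) -> Optional[tuple]:
    """(short_text, long_text) for a string code in a language, or None."""
    return conn.execute("""
        SELECT f.short_text, f.long_text
        FROM language_forms AS f
        JOIN string_ids AS s ON s.string_id = f.string_id
        JOIN languages  AS l ON l.language_id = f.language_id
        WHERE s.code = ? AND l.code = ?
    """, (code, lang)).fetchone()


print(localized("PRODUCT_SHOES", "ko"))  # ('신발', None)
```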

Fig. 4.70. Language representation of the symbolic identifier


Because the "Language forms" entity is a junction entity that provides the language representation of character data, it is not practical to use it as the link to other entities. The only entity that remains to be associated with the functional entities, and that was previously identified as the identifying component, is the "String identifier" entity.

When defining the relationship between the Products and String identifier entities, you need to decide how products are named. If each product is assumed to have a unique name, the relationship between these entities is one-to-one (1:1). This relationship should nevertheless not be eliminated by merging the entities, because the "String identifier" entity cannot be removed: it is also associated with other entities that require language support (Figure 4.71).
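One way to express the one-to-one variant while keeping both entities is a foreign key from Products to the string identifier with a UNIQUE constraint. The sketch assumes SQLite and hypothetical names (name_string_id, price_cents); it shows that a second product cannot reuse the same name identifier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE string_ids (
        string_id INTEGER PRIMARY KEY,
        code      TEXT UNIQUE
    );
    -- 1:1 variant: UNIQUE makes each name identifier belong to one product.
    CREATE TABLE products (
        product_id     INTEGER PRIMARY KEY,
        name_string_id INTEGER NOT NULL UNIQUE REFERENCES string_ids(string_id),
        price_cents    INTEGER NOT NULL
    );
    INSERT INTO string_ids VALUES (1, 'PRODUCT_SHOES');
    INSERT INTO products VALUES (10, 1, 4990);
""")

try:
    # A second product may not reuse the same name identifier.
    conn.execute("INSERT INTO products VALUES (11, 1, 3990)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

The string_ids table stays separate, so other entities can still reference their own identifiers in it.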

Fig. 4.71. Linking a language representation to a functional entity


If the same names can appear more than once in the product catalog, for example when the catalog lists not only goods currently on sale but also goods sold earlier at a different price, the relationship between the entities is one-to-many (1:M), with the many side at the "Products" entity, as reflected in the model (see Figure 4.71).
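Dropping the UNIQUE constraint turns the link into one-to-many, so several products can share one localized name. The sketch below (SQLite, hypothetical names and prices) stores the same product at a current and a former price and retrieves both with their German name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE languages  (language_id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    CREATE TABLE string_ids (string_id INTEGER PRIMARY KEY, code TEXT UNIQUE);
    CREATE TABLE language_forms (
        string_id   INTEGER NOT NULL REFERENCES string_ids(string_id),
        language_id INTEGER NOT NULL REFERENCES languages(language_id),
        short_text  TEXT,
        PRIMARY KEY (string_id, language_id)
    );
    -- 1:M variant: no UNIQUE, so several products may share one name.
    CREATE TABLE products (
        product_id     INTEGER PRIMARY KEY,
        name_string_id INTEGER NOT NULL REFERENCES string_ids(string_id),
        price_cents    INTEGER NOT NULL
    );
    INSERT INTO languages  VALUES (1, 'en'), (2, 'de');
    INSERT INTO string_ids VALUES (1, 'PRODUCT_SHOES');
    INSERT INTO language_forms VALUES (1, 1, 'Shoes'), (1, 2, 'Schuhe');
    -- Same product name at the current and a former price.
    INSERT INTO products VALUES (10, 1, 4990), (11, 1, 3990);
""")

rows = conn.execute("""
    SELECT p.product_id, f.short_text, p.price_cents
    FROM products AS p
    JOIN language_forms AS f ON f.string_id = p.name_string_id
    JOIN languages      AS l ON l.language_id = f.language_id
    WHERE l.code = ?
    ORDER BY p.product_id
""", ("de",)).fetchall()
print(rows)  # [(10, 'Schuhe', 4990), (11, 'Schuhe', 3990)]
```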

The final step in making language handling universal is to add string and text attributes that do not depend on language (Figure 4.72). This is usually done to link language-specific information with character data that is stored without regard to language, i.e., always in one chosen language. These attributes can be omitted from the "String identifier" entity, but then string and text data that is not tied to a language will have to be duplicated alongside the character data assigned to each specific language.
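The sketch below adds hypothetical language-neutral attributes (neutral_text, neutral_long) to the string identifier, in SQLite. The lookup rule, which returns the language form when one exists and otherwise falls back to the neutral value, is an assumption made for illustration and is not prescribed by the model itself; it shows how such data avoids being duplicated for every language.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE languages (language_id INTEGER PRIMARY KEY, code TEXT NOT NULL UNIQUE);
    -- String identifier with language-neutral values held once, in one chosen language.
    CREATE TABLE string_ids (
        string_id    INTEGER PRIMARY KEY,
        code         TEXT UNIQUE,
        neutral_text TEXT,   -- single-line, language-independent value
        neutral_long TEXT    -- multi-line, language-independent value
    );
    CREATE TABLE language_forms (
        string_id   INTEGER NOT NULL REFERENCES string_ids(string_id),
        language_id INTEGER NOT NULL REFERENCES languages(language_id),
        short_text  TEXT,
        long_text   TEXT,
        PRIMARY KEY (string_id, language_id)
    );
    INSERT INTO languages VALUES (1, 'en'), (2, 'de');
    -- A model number needs no translation, so it has no language forms.
    INSERT INTO string_ids VALUES (1, 'MODEL_NO', 'XR-200', NULL);
    INSERT INTO string_ids VALUES (2, 'PRODUCT_SHOES', 'Shoes', NULL);
    INSERT INTO language_forms VALUES (2, 2, 'Schuhe', NULL);
""")


def text_for(code: str, lang: str) -> Optional[str]:
    # Assumed rule: prefer the language form, else the neutral value.
    row = conn.execute("""
        SELECT COALESCE(f.short_text, s.neutral_text)
        FROM string_ids AS s
        LEFT JOIN language_forms AS f
               ON f.string_id = s.string_id
              AND f.language_id = (SELECT language_id FROM languages WHERE code = ?)
        WHERE s.code = ?
    """, (lang, code)).fetchone()
    return row[0] if row else None


print(text_for("PRODUCT_SHOES", "de"))  # Schuhe
print(text_for("PRODUCT_SHOES", "en"))  # Shoes
print(text_for("MODEL_NO", "de"))       # XR-200
```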

Fig. 4.72. Using non-language elements


The use of language forms enables the database developer to build a model and, subsequently, a physical database that take into account how information must be presented in the languages the customer requires, without any further structural modifications.
