Compiling of the Electronic Dictionary of Models of the Ukrainian Language Multicomponent Complex Sentences

Natalia Darchuk, Taras Shevchenko National University of Kyiv, Ukraine

DOI: https://doi.org/10.17721/um/49(2019).117-129

Abstract

The purpose of this study is to construct an automatic syntactic analysis (ASA) procedure and, as a result, to compile a dictionary of models of multicomponent complex sentences for studying the features of the linear structure of Ukrainian text. The process has two stages: the first is an automatic syntactic analysis of the hierarchical type, which results in a dependency tree (DT); in the second, the sentence structure information is automatically extracted from the obtained graph. ASA is a package of operations performed on a string of morphological information (the output of automatic morphological analysis, AMA) that represents the incoming text, in order to determine the syntactic relations between text units. The outgoing text for the ASA is a string of information reduced to word forms after the AMA. We studied the features of the linear structure of 2,000 Ukrainian sentences in the journalistic genre (a sample of 52,000 word usages). Based on the results, we constructed real models of the syntactic structure of sentences, in which the relations between simple clauses are represented. All grammatical situations of the linear context were possible manifestations of the models in the text. Based on these data, an algorithm for the automatic generation of a complex sentence model was created. These models constitute a linear syntax grammar. All types of syntactic connection between the main and subordinate clauses are recorded algorithmically. Thus, it is possible to build interpretations of the linear structure of a Ukrainian sentence using almost no lexical-semantic information. The theoretical value of the paper lies in extending our knowledge about the structure of the syntactic level of the language and the variety of mechanisms functioning at that level.
The applied value lies, first of all, in the creation of a dictionary of the compatibility of compound (coordinated) and complex (subordinated) sentences, and in the possibility of constructing queries to the Ukrainian Language Corpus in order to extract sentences of specific models from the text and to create custom dictionaries of authors and styles.
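The second stage described above, extracting clause-level structure from a dependency tree, can be illustrated with a minimal sketch. It is not the authors' algorithm. The `Token` fields, the `is_predicate` flag, the `coord`/`subord` labels, and the English-glossed toy sentence are all assumptions made for illustration, and none of the data comes from the study's corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    idx: int                     # 1-based linear position in the sentence
    form: str                    # word form
    head: int                    # idx of the governing token, 0 for the root
    is_predicate: bool = False   # hypothetical flag: token heads a simple clause
    link: Optional[str] = None   # hypothetical arc label: "coord" or "subord"


def clause_of(tok, by_idx):
    """Return the idx of the nearest predicate at or above `tok` in the tree."""
    cur = tok
    while not cur.is_predicate:
        if cur.head == 0:
            raise ValueError("token is not dominated by any predicate")
        cur = by_idx[cur.head]
    return cur.idx


def clause_model(tokens):
    """Return (parent_clause, link_type, child_clause) triples.

    Clauses are numbered by the linear order of their predicates,
    so the result is a linear model of the inter-clause relations.
    """
    by_idx = {t.idx: t for t in tokens}
    predicates = sorted(t.idx for t in tokens if t.is_predicate)
    number = {p: n for n, p in enumerate(predicates, start=1)}
    links = []
    for p in predicates:
        tok = by_idx[p]
        if tok.head == 0:        # root predicate: the main clause has no parent
            continue
        parent = clause_of(by_idx[tok.head], by_idx)
        links.append((number[parent], tok.link, number[p]))
    return links


# Toy dependency tree for "He said that she left and he stayed".
example = [
    Token(1, "He", 2),
    Token(2, "said", 0, is_predicate=True),
    Token(3, "that", 5),
    Token(4, "she", 5),
    Token(5, "left", 2, is_predicate=True, link="subord"),
    Token(6, "and", 8),
    Token(7, "he", 8),
    Token(8, "stayed", 5, is_predicate=True, link="coord"),
]

model = clause_model(example)
print(model)
```

In this sketch only the tree shape and the arc labels are consulted, never word meanings, which mirrors the abstract's point that the linear structure can be interpreted with almost no lexical-semantic information.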

Key words

dependency tree, automatic syntactic analysis, models of multicomponent complex sentences, phrase, frequency dictionary.
