Abstract
We live in a period of rapid growth in the amount of available information, which produces large arrays of thematically and spatially distributed data. Processing such arrays manually is complex and labor-intensive. They can instead be processed with modern information-analytical systems, in particular lexicographic ones. However, a significant part of the information in such arrays may be weakly structured or unstructured (for example, natural language text), so these systems need high-quality linguistic support as an element of applied linguistics. Many methods and tools exist for processing natural language texts, based on dictionaries, machine learning, statistical indicators, and so on. The results of all these methods, however, must be further processed and converted into the format required by the system in use. For this processing, the method of recursive reduction is proposed, which can serve as a component of the linguistic support of information-analytical systems. The method involves creating a formalized description of the data format required by the system; the results of analyzing the input natural language text are then transformed according to this description, producing an ontology. The resulting ontology can be used to build an ontology-driven lexicographic system.
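The abstract does not spell out the reduction procedure, so the following is only an illustrative sketch of the general idea, not the authors' implementation. It assumes that the formalized format description can be written as rules mapping a target kind to a sequence of child kinds, and that the text analysis step yields a flat list of tagged fragments. The names `Node`, `RULES`, `recursive_reduce`, and `to_triples`, as well as the sample data, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                     # category assigned by text analysis or by a rule
    text: str                                     # surface text covered by this node
    children: list = field(default_factory=list)  # nodes this one was reduced from


# Hypothetical format description: target kind -> required sequence of child kinds.
RULES = {
    "Measurement": ["Quantity", "Unit"],
    "Observation": ["Place", "Measurement"],
}


def reduce_once(nodes, rules):
    """Replace the first run of nodes that matches a rule with one parent node."""
    for kind, pattern in rules.items():
        n = len(pattern)
        for i in range(len(nodes) - n + 1):
            window = nodes[i:i + n]
            if [w.kind for w in window] == pattern:
                parent = Node(kind, " ".join(w.text for w in window), window)
                return nodes[:i] + [parent] + nodes[i + n:], True
    return nodes, False


def recursive_reduce(nodes, rules):
    """Apply reductions recursively until no rule matches."""
    reduced, changed = reduce_once(nodes, rules)
    return recursive_reduce(reduced, rules) if changed else reduced


def to_triples(node):
    """Flatten a reduced tree into (subject, relation, object) triples."""
    triples = [(node.text, "is_a", node.kind)]
    for child in node.children:
        triples.append((node.text, "has_part", child.text))
        triples.extend(to_triples(child))
    return triples


# Made-up analysis output for illustration only.
tagged = [Node("Place", "Black Sea"), Node("Quantity", "18"), Node("Unit", "psu")]
roots = recursive_reduce(tagged, RULES)
triples = [t for root in roots for t in to_triples(root)]
```

In this sketch the three tagged fragments are first reduced to a `Measurement` and then to a single `Observation`, and the resulting tree is flattened into triples that an ontology store could ingest. A real format description would likely need optional elements, attributes, and ambiguity handling, which are omitted here.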