THE PROBLEM OF NATURALNESS OF INTONATION IN AI SPEECH SYSTEMS
Abstract
This article explores the issue of naturalness in intonation within AI-based speech systems. While artificial intelligence can generate grammatically accurate speech, it still struggles to reproduce the emotional and pragmatic depth inherent to human communication. The paper discusses the role of intonation as a cognitive and communicative marker, emphasizing its function in conveying meaning, rhythm, and emotion. It analyzes the limitations of current prosodic models, the challenges posed by cultural and linguistic variability, and the ethical concerns surrounding voice cloning and identity. Finally, the article outlines future directions for enhancing naturalness in AI-generated speech, including contextual prosodic planning, emotional modeling, and multimodal integration.
References
1. Chafe, W. (2018). The Relationship between Thought and Language: Intonation as a Cognitive Structure. New York: Routledge.
2. Levelt, W. J. M. (1999). Models of Speech Production. Cambridge University Press.
3. Makarov, D., & Belousov, A. (2021). Artificial Intelligence in Speech Synthesis: Challenges and Prosody Modeling. Moscow: RUDN Press.
4. McKeown, G. (2023). Emotion in Speech Technology: Towards Humanized AI Voices. London: Springer.
5. Zhang, X., & Li, H. (2022). Prosodic Modelling for Natural Speech Synthesis in Multilingual Contexts. IEEE Transactions on Audio, Speech, and Language Processing.
6. Patel, A. D. (2010). Music, Language, and the Brain. Oxford: Oxford University Press.
7. Taylor, P. (2009). Text-to-Speech Synthesis. Cambridge: Cambridge University Press.
8. Yuldasheva, D. N., Yusupova, A. Sh., & Chullieva, G. T. (2023). On Nonverbal Means of Speech [in Russian]. BuxDU Ilmiy axboroti, No. 4, pp. 108-115.