Abstract
Humanoid robots are being introduced in settings where people do not share a common language and expect quick, natural responses. In such settings, speech interaction cannot afford noticeable delays. Most current speech-to-text systems rely on cloud servers, which introduces latency, dependence on reliable connectivity, and failures during real-time use. These shortcomings become especially evident when robots are expected to interact with human users autonomously and continuously. To address these limitations, this project proposes a new edge-centric speech-to-text framework tailored to multilingual humanoid robots. Instead of sending audio data to the cloud, the framework performs speech processing directly on the robot. It combines lightweight neural models for real-time streaming, an onboard mechanism for real-time identification of the target language, and local caching for quicker retrieval of repeated or known speech patterns. Together, these components yield faster, more reliable transcription with little demand on network resources. Because speech is handled locally at the wireless edge, the system substantially reduces communication delays while providing transcription in multiple languages. In experiments, the overall response time was more than 60% lower than that of cloud-based systems. More critically, the framework copes well with fluctuating network bandwidth, packet loss, and background noise. It is concluded that edge-based multilingual speech-to-text systems will be important for making humanoid robots more responsive and context-aware. Faster understanding leads to faster reactions, smoother conversations, and interactions that feel more natural, which is a major step toward practical and reliable communication between humans and robots in real-world workplaces.
