
Research Paper | Computer Science & Engineering | India | Volume 13 Issue 8, August 2024

Enhancing Speech-to-Text Conversion with Convolutional Reinforcement Learning Algorithms

Pichika Ravikiran, Midhun Chakkaravarthy

Abstract: Speech-to-Text (STT) conversion has become a critical component in various applications, ranging from virtual assistants to real-time transcription services. Traditional models, while effective, often struggle with accuracy and robustness in diverse acoustic environments. This paper introduces a novel approach to STT conversion by leveraging Convolutional Neural Networks (CNNs) for feature extraction and Reinforcement Learning (RL) for optimizing transcription accuracy. Our proposed method employs CNNs to capture local temporal and spectral features from raw audio signals, transforming them into high-dimensional representations suitable for sequential processing. These features are then fed into a Sequence-to-Sequence (Seq2Seq) model, which translates the audio features into textual output. To enhance the performance of the Seq2Seq model, we integrate a reinforcement learning agent that dynamically adjusts model parameters based on a reward function that incentivizes correct transcriptions. We evaluate our model on a benchmark speech recognition dataset, demonstrating significant improvements in accuracy and robustness compared to traditional STT systems. Our results indicate that the convolutional reinforcement learning approach not only enhances the model's ability to generalize across different speakers and acoustic conditions but also reduces the error rate in noisy environments. This study underscores the potential of combining CNNs and RL to create more efficient and accurate speech recognition systems, paving the way for future advancements in voice-activated technologies and applications.
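The abstract states only that the reward "incentivizes correct transcriptions"; it does not give the reward formula or the update rule. The sketch below shows one common way such a reward can drive a Seq2Seq model, assuming a reward of 1 - WER (word error rate) and a self-critical REINFORCE update in which a greedy-decoded transcript serves as the baseline. These choices and every function name here (`transcription_reward`, `self_critical_advantage`, `reinforce_loss`) are hypothetical illustrations, not the authors' method.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    # prev[j] holds the distance between the previous ref prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete ref[i-1]
                          curr[j - 1] + 1,     # insert hyp[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("WER is undefined for an empty reference")
    return edit_distance(ref, hyp) / len(ref)


def transcription_reward(reference: str, hypothesis: str) -> float:
    """Hypothetical reward: 1 - WER, so a perfect transcript earns 1.0.
    WER can exceed 1 (many insertions), so the reward can be negative."""
    return 1.0 - word_error_rate(reference, hypothesis)


def self_critical_advantage(reference: str, sampled: str, greedy: str) -> float:
    """Reward of a sampled transcript minus the greedy-decoding baseline."""
    return (transcription_reward(reference, sampled)
            - transcription_reward(reference, greedy))


def reinforce_loss(advantage: float, token_log_probs: Sequence[float]) -> float:
    """REINFORCE loss for one sampled sequence: -advantage * log p(sequence).
    Minimizing it raises the sequence's probability when advantage > 0."""
    return -advantage * sum(token_log_probs)


if __name__ == "__main__":
    ref = "turn on the lights"
    adv = self_critical_advantage(ref, "turn on the light", "turn the light")
    print(adv, reinforce_loss(adv, [-0.5, -1.5]))
```

For the reference "turn on the lights", the sampled hypothesis "turn on the light" has WER 0.25 (reward 0.75) and the greedy hypothesis "turn the light" has WER 0.5 (reward 0.5), so the advantage is 0.25 and gradient descent on the loss increases the sampled sequence's log-probability. Using the model's own greedy output as the baseline means no separate value network is needed, which is one reason this variant is popular for sequence-level rewards.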

Keywords: Speech-to-Text (STT), Convolutional Neural Networks (CNNs), Reinforcement Learning (RL), Sequence-to-Sequence (Seq2Seq) model

Edition: Volume 13 Issue 8, August 2024

Pages: 1118 - 1122
