International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064

Research Paper | Computer Science & Engineering | India | Volume 13 Issue 8, August 2024


Enhancing Speech-to-Text Conversion with Convolutional Reinforcement Learning Algorithms

Pichika Ravikiran | Midhun Chakkaravarthy


Abstract: Speech-to-Text (STT) conversion has become a critical component in applications ranging from virtual assistants to real-time transcription services. Traditional models, while effective, often struggle with accuracy and robustness in diverse acoustic environments. This paper introduces an approach to STT conversion that uses Convolutional Neural Networks (CNNs) for feature extraction and Reinforcement Learning (RL) for optimizing transcription accuracy. The proposed method employs CNNs to capture local temporal and spectral features from raw audio signals, transforming them into high-dimensional representations suitable for sequential processing. These features are then fed into a Sequence-to-Sequence (Seq2Seq) model, which translates them into textual output. To improve the performance of the Seq2Seq model, we integrate a reinforcement learning agent that dynamically adjusts model parameters according to a reward function that incentivizes correct transcriptions. We evaluate the model on a benchmark speech recognition dataset and observe significant improvements in accuracy and robustness compared to traditional STT systems. Our results indicate that the convolutional reinforcement learning approach both improves the model's ability to generalize across speakers and acoustic conditions and reduces the error rate in noisy environments. This study underscores the potential of combining CNNs and RL to build more efficient and accurate speech recognition systems for voice-activated technologies and applications.
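
The abstract does not specify the form of the reward function. As an illustration only, one common way to reward correct transcriptions is the negative word error rate (WER) between a reference transcript and the model's hypothesis. The sketch below assumes that choice; the function names `edit_distance`, `word_error_rate`, and `transcription_reward` are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Assumed convention: an empty reference is only matched by an empty hypothesis.
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def transcription_reward(reference, hypothesis):
    """Assumed reward: higher (closer to 0) for more accurate transcriptions."""
    return -word_error_rate(reference, hypothesis)
```

Under this assumption, a perfect transcription receives the maximum reward of 0, and each word-level error lowers the reward in proportion to the length of the reference.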


Keywords: Speech-to-Text (STT), Convolutional Neural Networks (CNNs), Reinforcement Learning (RL), Sequence-to-Sequence (Seq2Seq) model


Edition: Volume 13 Issue 8, August 2024


Pages: 1118 - 1122