Voice agent development for healthcare applications
Built production-grade healthcare voice agents using the LiveKit framework.
Benchmarked STT, LLM, and TTS models for latency, cost, and quality.
Implemented post-call quality analysis using an “LLM-as-a-Judge” evaluation pipeline.
Speech Large Language Model exploration
Explored Speech Large Language Models for speech understanding tasks.
Created and prepared datasets for training and evaluation.
Built speech recognition and speech translation workflows using Speech LLMs.
Text-to-speech synthesis using diffusion models
Trained diffusion-based TTS models for natural speech synthesis.
Built multi-speaker TTS for voice cloning use cases.
Developed TTS support for 8 Indian languages with latency under 100 ms.
Speech recognition module for telephone use case
Developed ASR modules optimized for telephone audio (8 kHz, noisy channels).
Performed self-supervised training using 10k+ hours of telephone speech data.
Fine-tuned ASR models for Indian languages with domain-specific adaptation.
E2E ASR model exploration
Explored end-to-end ASR model architectures and training recipes.
Built a production-ready speech recognition module with reliable evaluation.
Applied model compression and pruning to reduce latency and cost.
Implemented multilingual ASR and domain adaptation using LoRA-style methods.
End-to-end AI solution for voice audit analytics
Sentiment analysis of call recordings and profiling of agent and customer tones
Automation platform for training and testing deep learning models
E2E ASR with support for automatic punctuation in transcription, confidence scores, etc.
Neural-network-based language models
End-to-end AI solution for voice bot
Context-based ASR and low-resource ASR
Natural speech synthesis with control over speed, pitch, and speaking style
Voice cloning for any speaker from a limited amount of data
Speech enhancement using generative adversarial networks (GANs).
Neural-network-based speech enhancement using GANs under high-SNR conditions.
Developing a TTS system under noisy conditions.
Non-parallel system using CycleGAN.
Data augmentation for Automatic Speech Recognition.
Data augmentation using generative adversarial networks (GANs).
Verification of GAN-based data augmentation in an automatic speech recognition system.
Statistical Vocoder and its application to Text-to-Speech (TTS).
Different acoustic features and data size variation for WaveNet vocoder.
Multi-speaker WaveNet vocoder.
Parallel WaveGAN for Indian languages.
Automatic Speech Recognition for Children.
Prosody modification to reduce mismatched conditions.
Exploring different acoustic features.
Data augmentation for improving training data.
Pathological speech processing.
Enhancement of pathological speech.
Analysis of voice disorders.
Classification of different categories of pathological speech.
Applications of Riesz transform
Pitch estimation using Riesz transform.
Riesz transform for statistical parametric speech synthesis.
Speech recognition using Riesz transform.
Glottal activity region based processing for speech synthesis
Glottal activity detection using source features.
Improved voicing decision for statistical parametric speech synthesis.
Source modeling using features derived from glottal activity region
Development of text-to-speech (TTS) systems in Assamese and Manipuri languages.