Nagaraj Adiga - Projects

Projects

I have been involved in developing the following techniques:

- Voice agent development for healthcare applications
  1. Built production-grade healthcare voice agents using the LiveKit framework.
  2. Benchmarked STT, LLM, and TTS models for latency, cost, and quality.
  3. Implemented post-call quality analysis using an “LLM-as-a-Judge” evaluation pipeline.

- Speech large language model exploration
  1. Explored Speech Large Language Models for speech understanding tasks.
  2. Created and prepared datasets for training and evaluation.
  3. Built speech recognition and speech translation workflows using Speech LLMs.

- Text-to-speech synthesis using diffusion models
  1. Trained diffusion-based TTS models for natural speech synthesis.
  2. Built multi-speaker TTS for voice cloning use cases.
  3. Developed TTS support for 8 Indian languages with latency under 100 ms.

- Speech recognition module for telephone use cases
  1. Developed ASR modules optimized for telephone audio (8 kHz, noisy channels).
  2. Performed self-supervised training using 10k+ hours of telephone speech data.
  3. Fine-tuned ASR models for Indian languages with domain-specific adaptation.

- E2E ASR model exploration
  1. Explored end-to-end ASR model architectures and training recipes.
  2. Built a production-ready speech recognition module with reliable evaluation.
  3. Applied model compression and pruning to reduce latency and cost.
  4. Implemented multilingual ASR and domain adaptation using LoRA-style methods.

- End-to-end AI solution for voice audit analytics
  1. Sentiment analysis of call recordings and profiling of agent and customer tones.
  2. Automation platform for training and testing deep learning models.
  3. E2E ASR with support for automatic punctuation in transcription, confidence scores, etc.
  4. Neural-network-based language models.

- End-to-end AI solution for voice bots
  1. Context-based ASR and low-resource ASR.
  2. Natural speech synthesis with control over speed, pitch, and speaking style.
  3. Voice cloning for any speaker with a limited amount of data.

- Speech enhancement using generative adversarial networks (GANs)
  1. Neural-network-based speech enhancement using GANs under high SNR.
  2. Developing TTS systems under noisy conditions.
  3. Non-parallel system using CycleGAN.

- Data augmentation for automatic speech recognition
  1. Data augmentation using generative adversarial networks (GANs).
  2. Verification of GAN-based data augmentation in automatic speech recognition systems.

- Statistical vocoders and their application to text-to-speech (TTS)
  1. Different acoustic features and data size variation for the WaveNet vocoder.
  2. Multi-speaker WaveNet vocoder.
  3. Parallel WaveGAN for Indian languages.

- Automatic speech recognition for children
  1. Prosody modifications to reduce mismatch conditions.
  2. Exploring different acoustic features.
  3. Data augmentation for improving training data.

- Pathological speech processing
  1. Enhancement of pathological speech.
  2. Analysis of voice disorders.
  3. Classification of different categories of pathological speech.

- Applications of the Riesz transform
  1. Pitch estimation using the Riesz transform.
  2. Riesz transform for statistical parametric speech synthesis.
  3. Speech recognition using the Riesz transform.

- Glottal activity region based processing for speech synthesis
  1. Glottal activity detection using source features.
  2. Improved voicing decision for statistical parametric speech synthesis.
  3. Source modeling using features derived from the glottal activity region.
  4. Development of text-to-speech (TTS) systems in the Assamese and Manipuri languages.
