SONATA Features 🎵🔊

SONATA offers a comprehensive suite of audio transcription and analysis features. This document provides details on each major feature.

🎙️ High-Accuracy Speech Recognition

SONATA uses WhisperX, an enhanced version of Whisper that provides:

  • State-of-the-art transcription accuracy across multiple languages
  • Word-level timestamps for precise text alignment
  • Support for various Whisper models (tiny, base, small, medium, large, large-v2, large-v3)
  • Automatic language detection capabilities
  • Model optimization for various hardware (CPU, CUDA, MPS)

Advanced Audio Event Detection

SONATA identifies non-speech sounds, from laughter and crying to ambient noises such as traffic or music. The system can detect over 523 different audio events and reports a confidence score for each detection.
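As a rough illustration of how per-event confidence scores might be used downstream, the sketch below keeps only detections above a threshold. The `AudioEvent` record, its field names, and the `filter_events` helper are hypothetical and are not part of SONATA's API; the threshold of 0.5 is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass
class AudioEvent:
    # Hypothetical record for one detected non-speech sound.
    label: str         # e.g. "laughter"
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # score in [0, 1]


def filter_events(events, threshold=0.5):
    """Keep events whose confidence meets the threshold, ordered by start time."""
    kept = [e for e in events if e.confidence >= threshold]
    return sorted(kept, key=lambda e: e.start)


events = [
    AudioEvent("music", 4.0, 9.5, 0.81),
    AudioEvent("traffic", 1.0, 2.0, 0.32),
    AudioEvent("laughter", 2.5, 3.1, 0.93),
]
confident = filter_events(events, threshold=0.5)
print([e.label for e in confident])
```

Raising the threshold trades recall for precision: fewer events are reported, but those that remain are less likely to be false detections.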

🔊 See complete list of detectable audio events

🌍 Multi-Language Support

SONATA supports 10 languages:

  • English (en)
  • Korean (ko)
  • Chinese (zh)
  • Japanese (ja)
  • French (fr)
  • German (de)
  • Spanish (es)
  • Italian (it)
  • Portuguese (pt)
  • Russian (ru)
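When a language is passed explicitly rather than auto-detected, it can help to validate the code before starting a long transcription job. The following is a minimal sketch built from the list above; `SUPPORTED_LANGUAGES` and `normalize_language` are illustrative names, not SONATA identifiers.

```python
# Language codes taken from the list above.
SUPPORTED_LANGUAGES = {
    "en": "English",
    "ko": "Korean",
    "zh": "Chinese",
    "ja": "Japanese",
    "fr": "French",
    "de": "German",
    "es": "Spanish",
    "it": "Italian",
    "pt": "Portuguese",
    "ru": "Russian",
}


def normalize_language(code):
    """Return the lowercase language code, or raise if it is not in the list."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        supported = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(f"Unsupported language {code!r}; expected one of: {supported}")
    return normalized


print(normalize_language(" KO "))
```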

👥 Speaker Diarization

  • Identify and label different speakers in multi-speaker audio
  • Set minimum and maximum speaker constraints
  • Integrated with PyAnnote's diarization models
  • Speaker-attributed transcripts with formatting options
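One common way to build a speaker-attributed transcript is to give each timestamped word the speaker whose turn overlaps it the most. The sketch below shows that idea on plain dictionaries; it is not necessarily how SONATA, WhisperX, or PyAnnote perform the assignment, and the dictionary keys and `SPEAKER_00`-style labels are assumptions made for the example.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most.

    Words that overlap no turn get speaker None.
    """
    labeled = []
    for word in words:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(word["start"], word["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**word, "speaker": best_speaker})
    return labeled


words = [
    {"word": "hi", "start": 0.0, "end": 0.4},
    {"word": "there", "start": 1.1, "end": 1.5},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 1.0},
    {"speaker": "SPEAKER_01", "start": 1.0, "end": 2.0},
]
for item in assign_speakers(words, turns):
    print(item["speaker"], item["word"])
```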

⏱️ Rich Timestamp Information

  • Word-level timestamps for all transcribed content
  • Precise timing for audio events
  • Multiple output formats with varying levels of timestamp detail
  • Support for extracting specific time ranges
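To illustrate what extracting a time range from word-level timestamps can look like, here is a small sketch that selects the words lying fully inside a window and formats times for display. The word dictionaries and both helper names are hypothetical; SONATA's own data structures may differ.

```python
def words_in_range(words, start, end):
    """Return the words that lie entirely within [start, end] seconds."""
    return [w for w in words if w["start"] >= start and w["end"] <= end]


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


words = [
    {"word": "good", "start": 0.2, "end": 0.5},
    {"word": "morning", "start": 0.6, "end": 1.1},
    {"word": "everyone", "start": 1.3, "end": 1.9},
]
for w in words_in_range(words, 0.5, 2.0):
    print(format_timestamp(w["start"]), w["word"])
```

Whether a word that straddles the boundary should be included is a design choice; this sketch excludes it.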

🔄 Audio Preprocessing

  • Audio format conversion for maximum compatibility
  • Silence detection and trimming to improve transcription quality
  • Audio segmentation for long files
  • Custom segment length and overlap controls
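The relationship between segment length and overlap can be made concrete with a short calculation: each new segment starts `segment_length - overlap` seconds after the previous one. The sketch below computes segment boundaries only; the function name, parameter names, and default values are assumptions for illustration and do not reflect SONATA's actual options.

```python
def segment_bounds(duration, segment_length=30.0, overlap=5.0):
    """Return (start, end) pairs in seconds covering a file of the given duration."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    if not 0 <= overlap < segment_length:
        raise ValueError("overlap must be non-negative and smaller than segment_length")

    step = segment_length - overlap  # distance between consecutive segment starts
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + segment_length, duration)
        bounds.append((start, end))
        if end >= duration:
            break  # the last segment reached the end of the file
        start += step
    return bounds


for start, end in segment_bounds(70.0, segment_length=30.0, overlap=5.0):
    print(f"{start:.1f}s to {end:.1f}s")
```

Overlapping segments reduce the chance that a word is cut in half at a boundary, at the cost of transcribing the shared region twice and having to merge the duplicates afterwards.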

📊 Multiple Output Formats

  • Concise: Simple text with integrated audio event tags
  • Default: Text with timestamps
  • Extended: Includes confidence scores
  • JSON output with comprehensive metadata
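The difference between these formats is mainly how much detail accompanies each piece of text. The sketch below renders the same items at each level; the exact layout (bracketed event tags, the timestamp style, the `confidence` label) is an assumption made for illustration and may not match SONATA's real output.

```python
import json


def render(items, fmt="default"):
    """Render transcript items in one of four illustrative formats."""
    if fmt == "json":
        return json.dumps(items, indent=2)

    lines = []
    for item in items:
        # Audio events are shown as bracketed tags, speech as plain text.
        text = f"[{item['text']}]" if item["type"] == "event" else item["text"]
        span = f"[{item['start']:.2f}s - {item['end']:.2f}s]"
        if fmt == "concise":
            lines.append(text)
        elif fmt == "default":
            lines.append(f"{span} {text}")
        elif fmt == "extended":
            lines.append(f"{span} {text} (confidence: {item['confidence']:.2f})")
        else:
            raise ValueError(f"Unknown format: {fmt!r}")

    # Concise output is running text; the other formats use one line per item.
    return " ".join(lines) if fmt == "concise" else "\n".join(lines)


items = [
    {"type": "speech", "text": "hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"type": "event", "text": "laughter", "start": 0.6, "end": 1.2, "confidence": 0.87},
]
for fmt in ("concise", "default", "extended"):
    print(render(items, fmt))
```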

📱 Convenient Interfaces

  • Python API for integration into other applications
  • Command-line interface for quick usage
  • Batch processing capabilities
  • Progress indicators for long-running operations

Copyright © 2024 SONATA. Distributed under GPLv3 license.