🔊 Audio Event Detection
SONATA's audio event detection can identify over 500 different types of sounds in your audio files.
Full List of Supported Audio Events
SONATA can detect over 523 different audio event types, grouped into categories including human sounds, animal sounds, music, natural sounds, and more.
For a complete list of detectable audio events, see the AudioEventType enum in constants.py.
Detection Sensitivity
SONATA applies a separate sensitivity threshold to each type of audio event. The default threshold for most events is 0.5, but certain subtle events such as laughter, sighs, or breathing use lower thresholds so that they are not missed.
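The thresholding rule described above can be sketched as follows. This is an illustration only, not SONATA's actual implementation: the `filter_events` helper, the shape of the score dictionary, the lower values chosen for subtle events, and the use of `>=` for the comparison are all assumptions made for the example. Only the 0.5 default comes from the text.

```python
DEFAULT_THRESHOLD = 0.5  # default stated in this document

# Hypothetical lower thresholds for subtle events (illustrative values only).
SUBTLE_EVENT_THRESHOLDS = {"laughter": 0.2, "sigh": 0.2, "breathing": 0.2}


def filter_events(scores, overrides=None):
    """Return the names of events whose score meets their threshold, sorted."""
    # Overrides take precedence over the built-in per-event values.
    thresholds = {**SUBTLE_EVENT_THRESHOLDS, **(overrides or {})}
    return sorted(
        event
        for event, score in scores.items()
        if score >= thresholds.get(event, DEFAULT_THRESHOLD)
    )


# Example scores, as a classifier might produce for one audio segment.
scores = {"laughter": 0.3, "music": 0.4, "applause": 0.6}
detected = filter_events(scores)
```

In this sketch, laughter passes because of its lower threshold, while music at 0.4 is dropped unless an override lowers its threshold.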
Custom Audio Event Thresholds
You can customize the detection sensitivity for specific audio events to better suit your particular use case. This is useful when you need to:
- Increase sensitivity for certain events you’re particularly interested in
- Decrease sensitivity for events that might be triggering false positives
- Fine-tune the detection based on your specific audio environment
Using Custom Thresholds in Python
from sonata.core.transcriber import IntegratedTranscriber

# Define custom thresholds (values between 0.0 and 1.0)
custom_thresholds = {
    "laughter": 0.05,  # More sensitive (lower threshold)
    "cough": 0.2,      # Less sensitive (higher threshold)
    "music": 0.7,      # Much less sensitive
    "dog": 0.3,        # Custom threshold for dog sounds
}

# Initialize transcriber with custom thresholds
transcriber = IntegratedTranscriber(
    asr_model="large-v3",
    device="cpu",
    custom_audio_thresholds=custom_thresholds,
)

# Process audio with custom thresholds applied
result = transcriber.process_audio("path/to/audio.wav", language="en")
Using Custom Thresholds with CLI
Create a JSON file with your custom thresholds:
{
    "laughter": 0.05,
    "giggle": 0.05,
    "cough": 0.2,
    "sneeze": 0.2,
    "music": 0.7
}
Then pass the file with the --custom-thresholds parameter:
sonata-asr path/to/audio.wav --custom-thresholds path/to/thresholds.json
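Before handing a thresholds file to the CLI, it can help to check that it is well formed. The following is a minimal sketch, assuming the file is a flat JSON object mapping event names to numbers between 0.0 and 1.0, as in the example above. The `load_thresholds` helper is hypothetical and not part of SONATA, and it does not check event names against the AudioEventType enum.

```python
import json


def load_thresholds(path):
    """Load a thresholds JSON file and check every value lies in [0.0, 1.0]."""
    with open(path, encoding="utf-8") as f:
        thresholds = json.load(f)
    if not isinstance(thresholds, dict):
        raise ValueError("thresholds file must contain a JSON object")
    for event, value in thresholds.items():
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{event!r}: threshold must be a number")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{event!r}: threshold {value} is outside 0.0 to 1.0")
    return thresholds
```

The returned dictionary has the same shape as the one passed to custom_audio_thresholds in the Python example.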
Example
Here’s a sample of how audio events appear in the transcript:
[00:05.234] The presenter walked onto the stage [applause]
[00:10.567] Thank you everyone for coming today [laughter]
[00:15.123] Let me share some exciting news about our latest product [music]
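If you need to post-process a transcript in this form, the event tags can be pulled out with a regular expression. This is a rough sketch that assumes the layout shown in the sample above (a `[MM:SS.mmm]` timestamp, then text, with events as lowercase bracketed tags). The `parse_line` helper is hypothetical, and SONATA's other output formats may differ.

```python
import re

# Timestamp prefix in the assumed [MM:SS.mmm] form, followed by the rest of the line.
LINE_PATTERN = re.compile(r"^\[(\d+):(\d+\.\d+)\]\s*(.*)$")
# Event tags: lowercase words (optionally with spaces or underscores) in brackets.
EVENT_PATTERN = re.compile(r"\[([a-z_ ]+)\]")


def parse_line(line):
    """Split one transcript line into start time (seconds), text, and events."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None  # line does not follow the assumed layout
    minutes, seconds, rest = match.groups()
    events = EVENT_PATTERN.findall(rest)
    # Remove the tags, then collapse any doubled spaces left behind.
    text = " ".join(EVENT_PATTERN.sub(" ", rest).split())
    return {
        "start": int(minutes) * 60 + float(seconds),
        "text": text,
        "events": events,
    }


parsed = parse_line("[00:10.567] Thank you everyone for coming today [laughter]")
```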
Advanced Use Cases
Custom thresholds can be particularly useful in scenarios like:
- Podcast Analysis: Increase sensitivity for laughter to capture audience reactions
- Meeting Transcription: Reduce sensitivity for keyboard/typing sounds that might be prevalent
- Nature Recordings: Customize thresholds for specific bird or animal sounds
- Music Analysis: Fine-tune detection of specific instruments or musical elements
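One way to organize these scenarios is a small table of presets that you merge with per-run overrides. The sketch below is illustrative only: the `PRESETS` table and `thresholds_for` helper are hypothetical, the numeric values are arbitrary, and event labels other than "laughter" and "music" are assumptions that should be checked against the AudioEventType enum.

```python
# Hypothetical scenario presets; values are illustrative, not recommendations.
PRESETS = {
    "podcast": {"laughter": 0.05},  # capture audience reactions
    "meeting": {"typing": 0.8},     # assumed label for keyboard noise
    "nature": {"bird": 0.2},        # assumed label for bird sounds
    "music": {"music": 0.3},
}


def thresholds_for(scenario, extra=None):
    """Return a copy of the preset for a scenario, with extra overrides applied."""
    merged = dict(PRESETS.get(scenario, {}))
    merged.update(extra or {})
    return merged


podcast_thresholds = thresholds_for("podcast", {"music": 0.7})
```

The resulting dictionary can be passed as custom_audio_thresholds when constructing IntegratedTranscriber, as in the Python example earlier in this section.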
For a complete example of using custom thresholds, see the custom_thresholds_example.py script.