Speaker identification improvements
Mavio’s speaker identification (diarization) engine has been rebuilt with a new model that delivers significantly better accuracy, especially in challenging scenarios.
Accuracy improvements
| Scenario | Previous accuracy | New accuracy |
|---|---|---|
| 2 speakers | 91% | 97% |
| 3-4 speakers | 85% | 94% |
| 5-8 speakers | 76% | 89% |
| 9+ speakers | 62% | 81% |
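
One way to read the table is in terms of errors removed rather than accuracy points gained. The short sketch below does that arithmetic for each row. It assumes that "accuracy" means the share of speech attributed to the correct speaker, so the error rate is simply 100 minus the accuracy; the release notes do not define the metric, so treat the result as illustrative.

```python
# Accuracy figures copied from the table above, in percent: (previous, new).
ACCURACY = {
    "2 speakers": (91, 97),
    "3-4 speakers": (85, 94),
    "5-8 speakers": (76, 89),
    "9+ speakers": (62, 81),
}


def relative_error_reduction(previous: float, new: float) -> float:
    """Return the fraction of previous errors eliminated.

    Assumption: error rate = 100 - accuracy (both in percent).
    """
    previous_error = 100 - previous
    new_error = 100 - new
    return (previous_error - new_error) / previous_error


for scenario, (previous, new) in ACCURACY.items():
    reduction = relative_error_reduction(previous, new)
    print(
        f"{scenario}: error {100 - previous}% -> {100 - new}%, "
        f"{reduction:.0%} fewer errors"
    )
```

Under that assumption, the 2-speaker case drops from a 9% error rate to 3%, which removes two thirds of the errors, and the 9+ speaker case drops from 38% to 19%, which halves them.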
What changed
- Voice profile learning — Mavio now builds a voice profile for each participant over time. The more meetings someone attends, the more accurately they are identified in future recordings.
- Cross-meeting consistency — the same speaker is now identified consistently across different meetings, even without manual correction. If Mavio learns that “Speaker 2” is Sarah Chen in one meeting, it applies that label automatically in the next.
- Overlapping speech handling — significantly improved accuracy when two or more people speak simultaneously. The model now separates overlapping voices instead of attributing the segment to a single speaker.
- Accent robustness — the new model performs equally well across a wide range of accents and speaking styles.
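To make voice profile learning and cross-meeting consistency more concrete, here is a minimal sketch of one common way such a feature can work: each speaker segment is reduced to a numeric voice embedding, compared against stored profiles by cosine similarity, and the matching profile is updated as a running average. This is not Mavio's implementation. The function names, the three-dimensional embeddings, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_speaker(embedding, profiles, threshold=0.8):
    """Return the name of the most similar stored profile, or None.

    `profiles` maps a speaker name to a stored embedding. The threshold
    is a hypothetical value; a real system would tune it.
    """
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def update_profile(profile, embedding, meetings_seen):
    """Fold a new embedding into a profile as a running mean.

    `meetings_seen` is how many meetings already contributed to `profile`.
    """
    return [
        (p * meetings_seen + e) / (meetings_seen + 1)
        for p, e in zip(profile, embedding)
    ]


# Toy profiles learned from earlier meetings (made-up numbers).
profiles = {
    "Sarah Chen": [0.9, 0.1, 0.0],
    "John Smith": [0.0, 0.2, 0.95],
}

new_segment = [0.85, 0.15, 0.05]
name = match_speaker(new_segment, profiles)
if name is not None:
    # Assume the stored profile came from one earlier meeting.
    profiles[name] = update_profile(profiles[name], new_segment, meetings_seen=1)
print(name)
```

In this toy setup the new segment is labeled with Sarah Chen's stored name without manual correction, and her profile shifts slightly toward the new sample, which is the intuition behind accuracy improving as someone attends more meetings. A segment that resembles no stored profile returns `None` and would be treated as a new speaker.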
Speaker management
A new Speakers section in the dashboard lets you manage identified speakers:
- View all detected speakers across your meeting library
- Merge duplicate speaker profiles (e.g., “John” and “John Smith”)
- Assign photos from your Google Contacts or workspace directory
- Correct misidentifications with one click — the correction is applied retroactively and improves future accuracy
40+ language support
Mavio now supports transcription and AI summaries in over 40 languages.
Newly supported languages
European
French, German, Spanish, Italian, Portuguese, Dutch, Polish, Swedish, Norwegian, Danish, Finnish, Czech, Romanian, Hungarian, Greek, Ukrainian, Croatian, Slovak, Bulgarian, Lithuanian
Asian
Japanese, Korean, Mandarin Chinese, Cantonese, Hindi, Thai, Vietnamese, Indonesian, Malay, Tagalog, Tamil, Bengali, Turkish
Other
Arabic, Hebrew, Russian, Swahili, Persian
Language features
- Auto-detection — Mavio automatically detects the spoken language. No manual configuration needed.
- Multi-language meetings — when participants speak different languages in the same meeting, Mavio identifies and transcribes each language correctly, switching inline as the conversation shifts.
- Translated summaries — regardless of the language spoken, you can generate summaries in any supported language. Record in Japanese, get your summary in English.
- Localized action items — action items are extracted in the original language and can be translated on demand.
Transcription accuracy varies by language. English, Spanish, French, German, Portuguese, Japanese, Korean, and Mandarin achieve 95%+ accuracy. Less common languages achieve 88-93% accuracy and will improve over time as the models are refined.
Additional improvements
- Faster processing — transcription pipeline optimized, reducing average processing time from 5 minutes to 3 minutes for a 60-minute meeting
- Improved timestamps — word-level timestamps are now accurate to within 200ms (previously 500ms)
- Custom vocabulary — add industry-specific terms, product names, and acronyms in Settings > AI > Custom vocabulary so Mavio transcribes them correctly
- API language parameter — specify the expected language in API calls for faster processing when the language is known in advance
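As a rough illustration of the API language parameter, the sketch below builds (but does not send) a transcription request that includes a language hint only when the caller knows the language in advance. The endpoint URL, the `recording_url` and `language` field names, the bearer-token header, and the `"ja"` code format are all assumptions made for this example; check the Mavio API reference for the real names.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only. Not a documented Mavio URL.
API_URL = "https://api.example.com/v1/transcriptions"


def build_transcription_request(recording_url, api_key, language=None):
    """Build a POST request for a transcription job without sending it.

    If `language` is None the field is omitted, leaving the service to
    auto-detect the spoken language. Field names are assumptions.
    """
    payload = {"recording_url": recording_url}
    if language is not None:
        payload["language"] = language  # assumed field name and code format
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Language known in advance: include the hint.
hinted = build_transcription_request(
    "https://example.com/recordings/standup.mp4", "YOUR_API_KEY", language="ja"
)
# Language unknown: omit the hint and rely on auto-detection.
auto = build_transcription_request(
    "https://example.com/recordings/standup.mp4", "YOUR_API_KEY"
)
print(json.loads(hinted.data))
print(json.loads(auto.data))
```

Keeping the parameter optional mirrors the notes above: auto-detection remains the default, and the hint is an optimization for callers who already know the language.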