Speaker identification improvements

Mavio’s speaker identification (diarization) engine has been rebuilt with a new model that delivers significantly better accuracy, especially in challenging scenarios such as large meetings and overlapping speech.

Accuracy improvements

Scenario        Previous accuracy   New accuracy
2 speakers      91%                 97%
3-4 speakers    85%                 94%
5-8 speakers    76%                 89%
9+ speakers     62%                 81%
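
The accuracy figures above can also be read as error rates, which makes the size of the change easier to compare across scenarios. The sketch below assumes that "accuracy" is the share of speech attributed to the correct speaker, so the error rate is simply 100% minus accuracy. That definition is an assumption, not something these notes state.

```python
# Accuracy figures copied from the table above: (previous, new), in percent.
ACCURACY = {
    "2 speakers": (91, 97),
    "3-4 speakers": (85, 94),
    "5-8 speakers": (76, 89),
    "9+ speakers": (62, 81),
}


def error_reduction(previous_pct: float, new_pct: float) -> float:
    """Return the relative drop in error rate, as a fraction of the old error.

    Assumes error rate = 100 - accuracy (an assumption about the metric).
    """
    old_error = 100 - previous_pct
    new_error = 100 - new_pct
    return (old_error - new_error) / old_error


for scenario, (previous, new) in ACCURACY.items():
    gain = new - previous
    reduction = error_reduction(previous, new)
    print(f"{scenario}: +{gain} points, {reduction:.0%} fewer errors")
```

Under that assumption, the relative error reduction is 67% for 2 speakers, 60% for 3-4 speakers, 54% for 5-8 speakers, and 50% for 9+ speakers.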

What changed

  • Voice profile learning — Mavio now builds a voice profile for each participant over time. The more meetings someone attends, the more accurately they are identified in future recordings.
  • Cross-meeting consistency — the same speaker is now identified consistently across different meetings, even without manual correction. If Mavio learns that “Speaker 2” is Sarah Chen in one meeting, it applies that label automatically in the next.
  • Overlapping speech handling — significantly improved accuracy when two or more people speak simultaneously. The model now separates overlapping voices instead of attributing the segment to a single speaker.
  • Accent robustness — the new model performs equally well across a wide range of accents and speaking styles.
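
Voice profile learning and cross-meeting consistency can be pictured with a small sketch. These notes do not describe how Mavio implements them, so the following is only one common approach, not Mavio's actual method: each speaker is represented by a numeric voice embedding, a new voice is matched to stored profiles by cosine similarity, and a profile is refined as a running mean over meetings. The function names, the threshold, and the three-dimensional vectors are all hypothetical.

```python
import math

# Hypothetical threshold; a real system would tune this on labeled data.
MATCH_THRESHOLD = 0.8


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(embedding, profiles, threshold=MATCH_THRESHOLD):
    """Return the best-matching profile name, or None if nothing reaches the threshold."""
    best_name, best_score = None, threshold
    for name, profile in profiles.items():
        score = cosine_similarity(embedding, profile)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def update_profile(profile, embedding, meetings_seen):
    """Fold a new embedding into a running mean over the meetings seen so far."""
    return [
        (p * meetings_seen + e) / (meetings_seen + 1)
        for p, e in zip(profile, embedding)
    ]


# Illustrative data only: one known profile, two voices from a new meeting.
profiles = {"Sarah Chen": [0.9, 0.1, 0.2]}
print(identify([0.88, 0.12, 0.25], profiles))  # close to the stored profile
print(identify([0.1, 0.9, 0.1], profiles))     # too different, left unlabeled
```

In this toy example the first voice matches "Sarah Chen" and the second stays unlabeled. The running mean illustrates why, in such a design, identification would get more reliable as a participant attends more meetings.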

Speaker management

A new Speakers section in the dashboard lets you manage identified speakers:
  • View all detected speakers across your meeting library
  • Merge duplicate speaker profiles (e.g., “John” and “John Smith”)
  • Assign photos from your Google Contacts or workspace directory
  • Correct misidentifications with one click — the correction is applied retroactively and improves future accuracy

After your first few meetings, review the Speakers section and merge any duplicates. Merging trains Mavio’s voice profiles and improves identification accuracy in subsequent recordings.
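
To make the effect of a merge concrete, here is a minimal sketch of merging two duplicate profiles and relabeling past transcript segments retroactively. The data model (SpeakerProfile, Segment, merge_profiles) is hypothetical and exists only for illustration. It is not Mavio's internal representation or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class SpeakerProfile:
    name: str
    meeting_ids: Set[str] = field(default_factory=set)


@dataclass
class Segment:
    meeting_id: str
    speaker: str
    text: str


def merge_profiles(
    profiles: Dict[str, SpeakerProfile],
    segments: List[Segment],
    keep: str,
    duplicate: str,
) -> SpeakerProfile:
    """Merge `duplicate` into `keep` and relabel existing segments in place."""
    merged = profiles[keep]
    # The surviving profile inherits every meeting the duplicate appeared in.
    merged.meeting_ids |= profiles.pop(duplicate).meeting_ids
    # Retroactive correction: past segments now point at the surviving profile.
    for segment in segments:
        if segment.speaker == duplicate:
            segment.speaker = keep
    return merged


# Illustrative data only.
profiles = {
    "John": SpeakerProfile("John", {"standup-01"}),
    "John Smith": SpeakerProfile("John Smith", {"review-02"}),
}
segments = [
    Segment("standup-01", "John", "Morning, everyone."),
    Segment("review-02", "John Smith", "Let's start with the roadmap."),
]
merged = merge_profiles(profiles, segments, keep="John Smith", duplicate="John")
print(sorted(merged.meeting_ids), [s.speaker for s in segments])
```

After the merge, only "John Smith" remains, it covers both meetings, and every earlier segment carries the merged label.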

40+ language support

Mavio now supports transcription and AI summaries in over 40 languages.

Newly supported languages

European

French, German, Spanish, Italian, Portuguese, Dutch, Polish, Swedish, Norwegian, Danish, Finnish, Czech, Romanian, Hungarian, Greek, Ukrainian, Croatian, Slovak, Bulgarian, Lithuanian

Asian

Japanese, Korean, Mandarin Chinese, Cantonese, Hindi, Thai, Vietnamese, Indonesian, Malay, Tagalog, Tamil, Bengali, Turkish

Other

Arabic, Hebrew, Russian, Swahili, Persian

Language features

  • Auto-detection — Mavio automatically detects the spoken language. No manual configuration needed.
  • Multi-language meetings — when participants speak different languages in the same meeting, Mavio identifies and transcribes each language correctly, switching inline as the conversation shifts.
  • Translated summaries — regardless of the language spoken, you can generate summaries in any supported language. Record in Japanese, get your summary in English.
  • Localized action items — action items are extracted in the original language and can be translated on demand.

Transcription accuracy varies by language. English, Spanish, French, German, Portuguese, Japanese, Korean, and Mandarin achieve 95%+ accuracy. The remaining supported languages achieve 88-93% accuracy, and this will improve over time as the models are refined.
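
If you want to decide programmatically which transcripts deserve a manual review pass, the accuracy bands above can be captured in a small lookup. This is a sketch, not a Mavio feature: the function names and the review policy are hypothetical, and it assumes the language passed in is one Mavio supports. Language names are written as they appear in the accuracy note ("Mandarin", not "Mandarin Chinese").

```python
# Languages these notes list as reaching 95%+ transcription accuracy.
HIGH_ACCURACY_LANGUAGES = {
    "English", "Spanish", "French", "German",
    "Portuguese", "Japanese", "Korean", "Mandarin",
}


def expected_accuracy(language: str) -> str:
    """Return the accuracy band stated in the notes for a supported language."""
    if language in HIGH_ACCURACY_LANGUAGES:
        return "95%+"
    return "88-93%"


def needs_review(language: str) -> bool:
    """Hypothetical policy: flag transcripts outside the 95%+ band for a manual pass."""
    return language not in HIGH_ACCURACY_LANGUAGES


for language in ("Japanese", "Swahili"):
    print(language, expected_accuracy(language), needs_review(language))
```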

Additional improvements

  • Faster processing — transcription pipeline optimized, reducing average processing time from 5 minutes to 3 minutes for a 60-minute meeting
  • Improved timestamps — word-level timestamps are now accurate to within 200ms (previously 500ms)
  • Custom vocabulary — add industry-specific terms, product names, and acronyms in Settings > AI > Custom vocabulary so Mavio transcribes them correctly
  • API language parameter — specify the expected language in API calls for faster processing when the language is known in advance
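
As an illustration of the API language parameter, the sketch below builds a request body that includes a language hint only when the language is known in advance. The endpoint URL, the field names ("recording_url", "language"), and the use of ISO 639-1 codes such as "ja" are all assumptions made for this example. Check the Mavio API reference for the real names.

```python
import json
from typing import Optional

# Hypothetical endpoint, shown only to give the example some context.
API_URL = "https://api.example.com/v1/transcriptions"


def build_transcription_request(
    recording_url: str, language: Optional[str] = None
) -> dict:
    """Build a request body, adding a language hint only when one is known."""
    body = {"recording_url": recording_url}
    if language is not None:
        # Assumed field name and code format, e.g. "ja" for Japanese.
        body["language"] = language
    return body


# Language known in advance: include the hint.
print(json.dumps(build_transcription_request("https://example.com/standup.mp4", "ja")))

# Language unknown: omit the field and rely on auto-detection.
print(json.dumps(build_transcription_request("https://example.com/standup.mp4")))
```

Leaving the field out instead of sending an empty value keeps the auto-detection path as the default, which matches the behavior described under Language features.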