How speaker identification works
Mavio uses a two-stage process to identify speakers:
Stage 1: Speaker diarization
AI models analyze the audio and detect when the speaker changes. Continuous speech segments are then grouped into clusters, one for each distinct speaker. At this stage, speakers are labeled generically (Speaker 1, Speaker 2, etc.).
Stage 2: Speaker matching
Mavio matches the anonymous speaker clusters to known identities using multiple signals:
| Signal | How it helps |
|---|---|
| Meeting participants | Calendar events include attendee names, and Mavio maps speakers to the participant list |
| Voice profiles | If a speaker has been identified in previous meetings, their voice profile is used for matching |
| Meeting bot metadata | The meeting bot receives participant join/leave events with names from the platform |
| Manual corrections | When you correct a speaker label, Mavio learns that voice for future meetings |
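
The matching stage can be pictured as a nearest-profile search. The sketch below is illustrative only and is not Mavio's implementation: it assumes each speaker cluster and each voice profile is a fixed-length embedding vector, compares them with cosine similarity, and limits the candidates to names on the participant list. The function names, the `threshold` value, and the toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_clusters(clusters, profiles, participants, threshold=0.7):
    """Map generic cluster labels to known names where the match is strong.

    clusters:     {"Speaker 1": embedding, ...}  (assumed diarization output)
    profiles:     {"Ana": embedding, ...}        (assumed stored voice profiles)
    participants: names taken from calendar or meeting bot metadata
    """
    # Only consider profiles of people who are expected in this meeting.
    candidates = {n: p for n, p in profiles.items() if n in participants}

    # Score every (cluster, candidate) pair, best matches first.
    scored = sorted(
        (
            (cosine_similarity(emb, prof), label, name)
            for label, emb in clusters.items()
            for name, prof in candidates.items()
        ),
        reverse=True,
    )

    assigned, used_names = {}, set()
    for score, label, name in scored:
        if score < threshold:
            break  # remaining pairs are all weaker than the cutoff
        if label in assigned or name in used_names:
            continue  # each cluster and each name is used at most once
        assigned[label] = name
        used_names.add(name)

    # Clusters without a confident match keep their generic label.
    return {label: assigned.get(label, label) for label in clusters}


clusters = {"Speaker 1": [1.0, 0.0], "Speaker 2": [0.0, 1.0], "Speaker 3": [-1.0, 0.0]}
profiles = {"Ana": [0.9, 0.1], "Ben": [0.1, 0.9], "Cy": [-1.0, 0.0]}
labels = match_clusters(clusters, profiles, participants={"Ana", "Ben"})
print(labels)
```

In this toy run, Speaker 1 and Speaker 2 resolve to Ana and Ben, while Speaker 3 keeps its generic label because the only close profile (Cy) is not on the participant list.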
Accuracy expectations
| Scenario | Typical accuracy |
|---|---|
| Meeting bot with calendar sync | 95-98% |
| Meeting bot without calendar | 90-95% |
| System audio with known speakers | 85-92% |
| Mobile recording, first time | 75-85% |
| Mobile recording, known speakers | 85-92% |
Voice profiles
Mavio builds a voice profile for each speaker it encounters. Voice profiles are acoustic representations: they capture the unique characteristics of a person’s voice (pitch, cadence, formant patterns) without storing actual audio.
How profiles are created
Profiles are created automatically when a speaker is identified for the first time. The more meetings a speaker appears in, the more robust their profile becomes.
Managing voice profiles
Go to Settings > AI > Speaker profiles to view and manage your organization’s voice profiles:
- Merge profiles: combine duplicate profiles that represent the same person
- Rename profiles: correct the name associated with a profile
- Delete profiles: remove profiles for people who are no longer relevant
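
Merging duplicates can be pictured as combining two profile vectors into one. The following is a minimal sketch under stated assumptions, not Mavio's actual storage format: it assumes a profile is an averaged embedding plus a count of the speech segments that contributed to it, and it merges two profiles with a count-weighted mean. `VoiceProfile` and `merge_profiles` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class VoiceProfile:
    name: str
    embedding: list  # averaged acoustic embedding (assumed representation)
    samples: int     # number of segments averaged into the embedding


def merge_profiles(keep, duplicate):
    """Merge `duplicate` into `keep` using a count-weighted mean."""
    if len(keep.embedding) != len(duplicate.embedding):
        raise ValueError("profiles must have embeddings of equal length")
    total = keep.samples + duplicate.samples
    if total == 0:
        raise ValueError("cannot merge two profiles with no samples")
    merged = [
        (a * keep.samples + b * duplicate.samples) / total
        for a, b in zip(keep.embedding, duplicate.embedding)
    ]
    # The surviving profile keeps the name of the profile being kept.
    return VoiceProfile(keep.name, merged, total)


ana = VoiceProfile("Ana", [1.0, 0.0], samples=3)
ana_dup = VoiceProfile("Ana (2)", [0.0, 1.0], samples=1)
merged = merge_profiles(ana, ana_dup)
print(merged)
```

Weighting by sample count means a profile built from many segments is not overwhelmed by a duplicate created from a single short recording.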
Correcting speaker labels
If the AI assigns the wrong name to a speaker, correct the speaker label. When you correct a speaker label, Mavio updates the voice profile to improve future identification. Corrections are the single most effective way to improve speaker accuracy over time.
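
One way to picture why repeated corrections help is an incremental mean: each corrected segment nudges the stored profile toward that speaker's true voice. This is a sketch under the assumption that a profile is an averaged embedding with a sample count; `update_profile` is a hypothetical helper and does not describe how Mavio actually learns from corrections.

```python
def update_profile(embedding, samples, corrected_segment):
    """Fold one corrected segment into a profile with an incremental mean.

    Returns the updated embedding and the new sample count.
    """
    if len(embedding) != len(corrected_segment):
        raise ValueError("embedding lengths must match")
    new_samples = samples + 1
    updated = [
        old + (new - old) / new_samples
        for old, new in zip(embedding, corrected_segment)
    ]
    return updated, new_samples


# A profile built from one segment, then corrected with a second segment.
profile, count = update_profile([1.0, 0.0], 1, [0.0, 1.0])
print(profile, count)  # [0.5, 0.5] 2
```

With this kind of update, early corrections shift the profile the most, and later ones refine it in smaller steps.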
Handling challenging scenarios
Multiple speakers talking at once
Overlapping speech (crosstalk) is the hardest scenario for diarization. Mavio handles brief overlaps well, but extended crosstalk may result in segments being attributed to the wrong speaker. Encourage participants to take turns for cleaner transcripts.
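
To make the difference between brief and extended overlap concrete, the sketch below measures how long two speakers talk at the same time. It assumes diarization output as `(start, end, speaker)` tuples in seconds and uses a hypothetical 1.0-second cutoff for "extended" overlap. Neither the segment format nor the cutoff comes from Mavio.

```python
def find_overlaps(segments, extended_after=1.0):
    """Find time ranges where two different speakers talk at once.

    Returns a list of (start, end, speaker_a, speaker_b, is_extended).
    """
    ordered = sorted(segments)  # sort by start time
    overlaps = []
    for i, (start_a, end_a, spk_a) in enumerate(ordered):
        for start_b, end_b, spk_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later segment overlaps this one
            if spk_a == spk_b:
                continue
            end = min(end_a, end_b)
            is_extended = (end - start_b) >= extended_after
            overlaps.append((start_b, end, spk_a, spk_b, is_extended))
    return overlaps


segments = [
    (0.0, 5.0, "Speaker 1"),
    (4.5, 8.0, "Speaker 2"),
    (7.0, 10.0, "Speaker 1"),
]
overlaps = find_overlaps(segments)
for overlap in overlaps:
    print(overlap)
```

Here the first overlap lasts 0.5 seconds and is treated as brief, while the second lasts a full second and is flagged as extended.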
Very similar voices
Occasionally two speakers have very similar vocal characteristics. Calendar metadata and meeting participant data help resolve ambiguity. If the AI still confuses them, manual corrections will train the system to distinguish them.
Phone or dial-in participants
Phone audio is lower quality (8 kHz narrowband), which reduces diarization accuracy. If possible, have participants join via the computer client for better audio fidelity.
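
Assuming "8 kHz" refers to the sample rate, such audio can only represent frequencies up to 4 kHz (the Nyquist limit), so less speaker-specific detail survives. As a rough pre-check for your own recordings, the sketch below reads a WAV file's sample rate with Python's standard `wave` module and flags narrowband audio. The `is_narrowband` and `nyquist_hz` helpers and the 8000 Hz cutoff are illustrative assumptions, not a Mavio feature.

```python
import wave


def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def is_narrowband(path, cutoff_hz=8000):
    """Return True if the WAV file's sample rate is at or below the cutoff."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate() <= cutoff_hz
```

For example, `nyquist_hz(8000)` gives `4000.0`, compared with `8000.0` for a 16 kHz recording.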
Large meetings (10+ speakers)
Accuracy decreases slightly as the number of speakers increases. For large meetings, using the meeting bot with calendar sync is strongly recommended, as participant metadata helps disambiguate speakers.