Speaker Identification

Mavio automatically identifies individual speakers in your recordings and labels each transcript segment with the speaker’s name. This turns a wall of text into a readable conversation with clear attribution.

How speaker identification works

Mavio uses a two-stage process to identify speakers:

Stage 1: Speaker diarization

AI models analyze the audio and detect when the speaker changes. Each continuous speech segment is grouped and assigned to a distinct speaker cluster. At this stage, speakers are labeled generically (Speaker 1, Speaker 2, etc.).
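
As an illustration only, the grouping and generic labeling described above can be sketched in a few lines of Python. This is not Mavio's implementation: the function name label_segments, the Segment type, and the input format (time-ordered tuples that already carry a cluster ID from some upstream model) are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float   # seconds from the beginning of the recording
    end: float     # seconds from the beginning of the recording
    speaker: str   # generic label such as "Speaker 1"


def label_segments(frames):
    """Merge consecutive frames from the same cluster into segments.

    `frames` is a hypothetical input: a list of (start, end, cluster_id)
    tuples sorted by start time, produced by some upstream model.
    """
    labels = {}     # cluster_id -> generic label, numbered by first appearance
    segments = []
    for start, end, cluster_id in frames:
        label = labels.setdefault(cluster_id, f"Speaker {len(labels) + 1}")
        if segments and segments[-1].speaker == label:
            # Same speaker keeps talking: extend the current segment.
            segments[-1].end = end
        else:
            # Speaker change detected: start a new segment.
            segments.append(Segment(start, end, label))
    return segments
```

Speakers are numbered in the order they first appear, so the first voice heard in the recording becomes Speaker 1 in this sketch.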

Stage 2: Speaker matching

Mavio matches the anonymous speaker clusters to known identities using multiple signals:

Signal	How it helps
Meeting participants	Calendar events include attendee names — Mavio maps speakers to the participant list
Voice profiles	If a speaker has been identified in previous meetings, their voice profile is used for matching
Meeting bot metadata	The meeting bot receives participant join/leave events with names from the platform
Manual corrections	When you correct a speaker label, Mavio learns that voice for future meetings

The meeting bot provides the most accurate speaker identification because it receives participant metadata directly from the meeting platform. Calendar sync also improves accuracy by providing attendee names upfront.
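
To make the matching idea concrete, here is a minimal sketch of how a voice profile signal and a participant list could be combined. It assumes, purely for illustration, that each speaker cluster and each voice profile is a fixed-length numeric vector compared by cosine similarity. The names match_speaker and threshold, and the 0.75 cutoff, are hypothetical and do not describe Mavio's actual models.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_speaker(cluster_vec, profiles, participants=None, threshold=0.75):
    """Return the best-matching known name, or None to keep the generic label.

    profiles:     dict of name -> profile vector (assumed representation)
    participants: optional set of attendee names, e.g. from a calendar event;
                  when given, only those names are considered
    threshold:    hypothetical minimum similarity to accept a match
    """
    best_name, best_score = None, threshold
    for name, vec in profiles.items():
        if participants is not None and name not in participants:
            continue  # the participant list narrows the candidate set
        score = cosine(cluster_vec, vec)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Restricting candidates to the attendee list is one simple way participant names could reduce confusion between similar voices: a profile that is acoustically close but belongs to someone who was not invited is never considered.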

Accuracy expectations

Scenario	Typical accuracy
Meeting bot with calendar sync	95-98%
Meeting bot without calendar	90-95%
System audio with known speakers	85-92%
Mobile recording, first time	75-85%
Mobile recording, known speakers	85-92%

Accuracy improves over time as Mavio builds voice profiles for your frequent contacts.

Voice profiles

Mavio builds a voice profile for each speaker it encounters. Voice profiles are acoustic representations — they capture the unique characteristics of a person’s voice (pitch, cadence, formant patterns) without storing actual audio.

How profiles are created

Profiles are created automatically when a speaker is identified for the first time. The more meetings a speaker appears in, the more robust their profile becomes.
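
One way to picture why more meetings make a profile more robust is a running average. The sketch below assumes a profile is the mean of per-meeting voice vectors. That representation, the VoiceProfile class, and its fields are assumptions for illustration, since the actual profile format is not documented here.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    name: str
    centroid: list = field(default_factory=list)  # running mean of voice vectors
    samples: int = 0                              # meetings folded into the mean

    def update(self, embedding):
        """Fold one more voice vector into the running mean."""
        if self.samples == 0:
            self.centroid = list(embedding)
        else:
            n = self.samples
            self.centroid = [
                (c * n + e) / (n + 1) for c, e in zip(self.centroid, embedding)
            ]
        self.samples += 1
```

With each update, a single noisy recording has less influence on the averaged vector, which is the intuition behind profiles improving with repeated appearances.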

Managing voice profiles

Go to Settings > AI > Speaker profiles to view and manage your organization’s voice profiles:

Merge profiles — combine duplicate profiles that represent the same person
Rename profiles — correct the name associated with a profile
Delete profiles — remove profiles for people who are no longer relevant

Correcting speaker labels

If the AI assigns the wrong name to a speaker:

1. Open the transcript. Navigate to the meeting and view the transcript.

2. Click the speaker name. Click the speaker label on any misidentified segment.

3. Select the correct speaker. Choose from the list of meeting participants or type a new name.

4. Apply to all segments. Mavio will ask if you want to relabel all segments from this speaker in the current meeting. Click "Yes, update all" to fix the entire transcript at once.

When you correct a speaker label, Mavio updates the voice profile to improve future identification. Corrections are the single most effective way to improve speaker accuracy over time.
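
The effect of applying a correction to all segments can be sketched as a simple relabeling pass over the transcript. The data shape (a list of dictionaries with "speaker" and "text" keys) and the function relabel_speaker are hypothetical and are not a Mavio API. The sketch covers only the transcript update, not the voice profile update.

```python
def relabel_speaker(segments, clicked_index, new_name, update_all=True):
    """Rename the speaker on the clicked segment.

    With update_all=True, every segment in the meeting that carries the
    same label as the clicked segment is renamed too. Returns the number
    of segments changed.
    """
    old_name = segments[clicked_index]["speaker"]
    changed = 0
    for i, seg in enumerate(segments):
        if i == clicked_index or (update_all and seg["speaker"] == old_name):
            seg["speaker"] = new_name
            changed += 1
    return changed


transcript = [
    {"speaker": "Speaker 1", "text": "Hi everyone."},
    {"speaker": "Speaker 2", "text": "Hello."},
    {"speaker": "Speaker 1", "text": "Shall we start?"},
]
relabel_speaker(transcript, 0, "Priya")  # renames both "Speaker 1" segments
```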

Handling challenging scenarios

Multiple speakers talking at once

Overlapping speech (crosstalk) is the hardest scenario for diarization. Mavio handles brief overlaps well, but extended crosstalk may cause segments to be attributed to the wrong speaker. Encourage participants to take turns for cleaner transcripts.

Very similar voices

Occasionally, two speakers have very similar vocal characteristics. Calendar metadata and meeting participant data help resolve the ambiguity. If the AI still confuses them, manual corrections will train the system to tell them apart.

Phone or dial-in participants

Phone audio is lower quality (narrowband, typically sampled at 8 kHz), which reduces diarization accuracy. If possible, have participants join through the computer client for better audio fidelity.

Large meetings (10+ speakers)

Accuracy decreases slightly as the number of speakers increases. For large meetings, using the meeting bot with calendar sync is strongly recommended, as participant metadata helps disambiguate speakers.

Privacy

Voice profiles are stored securely and encrypted at rest. They are used only for speaker identification within your organization’s Mavio workspace. Voice profiles are never shared with other organizations or used for purposes other than speaker labeling. You can delete all voice profiles at any time from Settings > AI > Speaker profiles.
