LOW
The issue is assessed as LOW severity because it affects functionality rather than security or privacy. No sensitive data leaks are indicated, but inaccurate transcription could lead to miscommunication among users.

The open-source meeting bot Vexa merges all participants' audio streams into a single stream before transcription, which undermines speaker diarization. Speech segments can be attributed to the wrong speaker, reducing the accuracy and reliability of transcripts.
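
The failure mode can be illustrated with a toy model. The sketch below assumes that a segment transcribed from the merged stream is attributed to whichever speaker's active window overlaps it the most. The source does not state how Vexa attributes segments, so the types, function names, and the overlap rule are hypothetical.

```typescript
// Toy model only: these types and the overlap rule are assumptions,
// not taken from Vexa's code.
interface TimeSpan {
  start: number; // seconds
  end: number; // seconds
}

interface SpeakerWindow extends TimeSpan {
  speaker: string;
}

// Length in seconds of the intersection of two time spans (0 if disjoint).
function overlapSeconds(a: TimeSpan, b: TimeSpan): number {
  return Math.max(0, Math.min(a.end, b.end) - Math.max(a.start, b.start));
}

// Attribute one transcript segment to the speaker whose active window
// overlaps it the most, or null if no window overlaps at all.
function attributeSegment(
  segment: TimeSpan,
  speakerWindows: SpeakerWindow[]
): string | null {
  let best: string | null = null;
  let bestOverlap = 0;
  for (const w of speakerWindows) {
    const o = overlapSeconds(segment, w);
    if (o > bestOverlap) {
      bestOverlap = o;
      best = w.speaker;
    }
  }
  return best;
}

// Alice and Bob talk over each other between t=2s and t=4s.
const activeWindows: SpeakerWindow[] = [
  { speaker: "alice", start: 0, end: 4 },
  { speaker: "bob", start: 2, end: 6 },
];

// The merged stream yields one segment spanning the overlap, so only one
// speaker label can be attached to words that came from both people.
const attributedTo = attributeSegment({ start: 1, end: 4.5 }, activeWindows);
```

In this toy case the segment is labeled with a single speaker even though both were talking, which is the kind of misattribution the merged-stream design makes possible.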

Affected Systems
  • Vexa (open-source meeting bot)
Affected Versions: All versions using the current audio merging architecture
Remediation
  • Implement a multi-stream audio processing pipeline where each participant's stream is processed separately before transcription to improve speaker diarization accuracy.
  • Update Vexa's codebase to separate WebRTC streams and handle them individually in different AudioContext instances.
  • Refactor the SPEAKER_START/SPEAKER_END event handling logic to work more reliably with individual streams.
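
The per-participant approach above can be sketched as follows. This is a minimal illustration, not Vexa's implementation: `PerParticipantPipeline`, `LabeledSegment`, and the `Transcribe` callback are hypothetical names, the callback stands in for a call to a service such as Whisper (which would be asynchronous in practice), and the capture of each WebRTC stream into sample chunks is assumed to happen elsewhere.

```typescript
// Hypothetical types: none of these names come from Vexa's codebase.
type ParticipantId = string;

interface LabeledSegment {
  speaker: ParticipantId;
  text: string;
}

// Stand-in for a call to a transcription service such as Whisper.
type Transcribe = (samples: Float32Array) => string;

class PerParticipantPipeline {
  // One list of chunks per participant, so audio is never mixed across speakers.
  private buffers = new Map<ParticipantId, Float32Array[]>();
  private transcribe: Transcribe;

  constructor(transcribe: Transcribe) {
    this.transcribe = transcribe;
  }

  // Called with each audio chunk captured from a single participant's stream.
  push(speaker: ParticipantId, chunk: Float32Array): void {
    const chunks = this.buffers.get(speaker) ?? [];
    chunks.push(chunk);
    this.buffers.set(speaker, chunks);
  }

  // Transcribe each participant's buffered audio separately, then clear the buffers.
  flush(): LabeledSegment[] {
    const segments: LabeledSegment[] = [];
    this.buffers.forEach((chunks, speaker) => {
      const total = chunks.reduce((n, c) => n + c.length, 0);
      if (total === 0) return;
      // Concatenate this participant's chunks into one contiguous buffer.
      const samples = new Float32Array(total);
      let offset = 0;
      for (const c of chunks) {
        samples.set(c, offset);
        offset += c.length;
      }
      segments.push({ speaker, text: this.transcribe(samples) });
    });
    this.buffers.clear();
    return segments;
  }
}
```

Because each buffer holds audio from exactly one participant, the speaker label comes from the stream itself rather than from timing heuristics applied to a merged stream.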
Stack Impact

This issue affects Vexa's use of WebRTC, the JavaScript AudioContext API, and transcription services such as Whisper. It does not directly affect nginx, Docker, the Linux kernel, OpenSSH, curl, OpenSSL, Python, or homelab components.
