Automatic transcription of conversation situations
Overview
Multi-talker conversational speech recognition is concerned with transcribing audio recordings of formal meetings or informal get-togethers into machine-readable form using distant microphones. Current solutions are far from reaching human performance. The difficulty of the task can be attributed to three factors. First, the recording conditions are challenging: the speech signal captured by distant microphones is noisy and reverberant and often contains nonstationary acoustic distortions, which makes it hard to decode. Second, a significant percentage of the time contains overlapped speech, where multiple speakers talk at the same time. Finally, the interaction dynamics of the scenario are challenging, because speakers talk intermittently, with alternating segments of speech inactivity, single-talker, and multi-talker speech. We aim to develop a transcription system that operates on input of arbitrary length, correctly handles segments of overlapped as well as non-overlapped speech, and transcribes the speech of different speakers consistently into separate output streams. Existing approaches that use separately trained subsystems for diarization, separation, and recognition fall far short of human performance. We believe that the missing piece is a formulation that encapsulates all aspects of meeting transcription and allows a joint approach to be designed under a single optimization criterion. This project aims at such a coherent formulation.
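The contrast between the two paradigms can be sketched in code. The following is a purely illustrative, hypothetical Python sketch, not the project's implementation: all names (diarize, separate, recognize, JointMeetingTranscriber) are placeholders that stand in for the cascaded subsystems and the envisioned jointly optimized model described above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SpeakerStream:
    """One output stream: all speech of one speaker, transcribed."""
    speaker_id: int
    text: str


# --- Conventional approach: separately trained subsystems ----------------

def diarize(audio: List[float]) -> List[Tuple[float, float, int]]:
    """Stub: estimate (start, end, speaker) activity segments."""
    return [(0.0, 1.0, 0)]  # placeholder output


def separate(audio: List[float],
             segments: List[Tuple[float, float, int]]) -> List[List[float]]:
    """Stub: extract one enhanced signal per speaker from overlapped audio."""
    return [audio]  # placeholder: a single stream


def recognize(stream: List[float]) -> str:
    """Stub: transcribe a single-speaker stream."""
    return "..."  # placeholder transcript


def cascaded_transcription(audio: List[float]) -> List[SpeakerStream]:
    # Each stage is trained with its own, mismatched objective, and
    # errors made early (e.g., missed overlap in diarization) propagate
    # unrecoverably to the later stages.
    segments = diarize(audio)
    streams = separate(audio, segments)
    return [SpeakerStream(i, recognize(s)) for i, s in enumerate(streams)]


# --- Joint formulation: one model, one optimization criterion ------------

class JointMeetingTranscriber:
    """Hypothetical single model mapping raw meeting audio of arbitrary
    length directly to one transcript stream per speaker, trained
    end-to-end under a single criterion instead of stage-wise objectives."""

    def transcribe(self, audio: List[float]) -> List[SpeakerStream]:
        raise NotImplementedError  # the subject of this project's research
```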
Key Facts
- Grant Number: 448568305
- Project duration: 05/2021 - 12/2024
- Funded by: DFG
- Website: GEPRIS (DFG project database)