Variance based time-frequency mask estimation for unsupervised speech enhancement

Saleem, N. and Khattak, M.I. and Witjaksono, G. and Ahmad, G. (2019) Variance based time-frequency mask estimation for unsupervised speech enhancement. Multimedia Tools and Applications, 78 (22). pp. 31867-31891.

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

A variance-based two-dimensional time-frequency mask estimation method for unsupervised speech enhancement is proposed to improve speech quality and intelligibility by reducing low-frequency residual noise distortion in noisy speech signals. Unlike conventional speech enhancement methods, the proposed method reduces residual noise distortion by combining a less aggressive Wiener gain with a variance-based two-dimensional time-frequency mask in a two-stage scheme. In the first stage, the less aggressive Wiener gain with a modified a priori signal-to-noise ratio (SNR) estimate is applied to the input noisy speech to obtain a pre-processed speech signal with reduced noise. In the second stage, variance-based features are extracted from the pre-processed speech and compared to a nonparametric adaptive threshold to construct a two-dimensional time-frequency mask. The estimated mask is then applied to the pre-processed speech from the first stage to suppress the annoying residual noise distortion. A comparative performance study demonstrates the effectiveness of the proposed method in various noisy conditions. The experimental results showed large improvements in perceptual evaluation of speech quality (PESQ), segmental SNR (SegSNR), residual noise distortion (BAK) and speech distortion (SIG) over competing methods at different input SNRs. To measure the intelligibility of the enhanced speech in different noisy conditions, the short-time objective intelligibility (STOI) measure is used, which confirmed the better performance of the proposed method in terms of speech intelligibility. Time-varying spectral analysis validated a significant reduction of the residual noise components in the enhanced speech. © 2019, Springer Science+Business Media, LLC, part of Springer Nature.
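The two-stage structure described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract does not give the modified a priori SNR estimate, the variance features, or the adaptive threshold, so the sketch substitutes a standard decision-directed a priori SNR estimate, a floored Wiener gain, a local variance of the log-power spectrum over a small time-frequency window, and a global median threshold. All function names and parameters (alpha, gain_floor, win, mask_floor) and the direction of the threshold comparison are assumptions made for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def wiener_gain(noisy_power, noise_power, alpha=0.98, gain_floor=0.1):
    """Stage 1 (assumed form): floored Wiener gain per time-frequency bin.

    noisy_power: array of shape (n_freq, n_frames), noisy power spectrogram.
    noise_power: array of shape (n_freq,), noise power estimate per bin.
    The a priori SNR uses the standard decision-directed rule, which is a
    stand-in for the paper's unspecified "modified" estimate.
    """
    n_freq, n_frames = noisy_power.shape
    gain = np.empty_like(noisy_power, dtype=float)
    prev_clean = np.zeros(n_freq)  # clean power estimate of previous frame
    for t in range(n_frames):
        post_snr = noisy_power[:, t] / noise_power
        ml_term = np.maximum(post_snr - 1.0, 0.0)
        prio_snr = alpha * prev_clean / noise_power + (1.0 - alpha) * ml_term
        # The floor keeps the gain from being too aggressive (assumption).
        g = np.maximum(prio_snr / (1.0 + prio_snr), gain_floor)
        gain[:, t] = g
        prev_clean = (g ** 2) * noisy_power[:, t]
    return gain


def variance_mask(pre_mag, win=3, mask_floor=0.1, eps=1e-12):
    """Stage 2 (assumed form): mask from local log-power variance.

    The variance is taken over a win x win time-frequency neighbourhood and
    compared with the global median, used here as a simple nonparametric
    threshold. Bins above the threshold are kept, the rest are attenuated.
    """
    log_power = np.log(pre_mag ** 2 + eps)
    pad = win // 2
    padded = np.pad(log_power, pad, mode="edge")
    local_var = sliding_window_view(padded, (win, win)).var(axis=(-2, -1))
    threshold = np.median(local_var)
    return np.where(local_var > threshold, 1.0, mask_floor)


def two_stage_enhance(noisy_power, noise_power):
    """Apply the gain, then the mask, to the noisy magnitude spectrogram."""
    gain = wiener_gain(noisy_power, noise_power)
    pre_mag = gain * np.sqrt(noisy_power)   # stage 1 output
    mask = variance_mask(pre_mag)           # stage 2 mask
    return mask * pre_mag, gain, mask
```

In this sketch the mask is applied to the stage-one output rather than to the original noisy spectrum, matching the order given in the abstract. The inputs are assumed to come from a short-time Fourier transform computed elsewhere, with the noise power estimated from speech-free frames.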

Item Type: Article
Impact Factor: cited By 2
Uncontrolled Keywords: Frequency estimation; Quality control; Signal to noise ratio; Spectrum analysis; Speech communication; Speech enhancement; Priori SNR; Speech quality; Time-Frequency Masking; Variance-based features; Wiener gain; Speech intelligibility
Depositing User: Ms Sharifah Fahimah Saiyed Yeop
Date Deposited: 27 Aug 2021 08:45
Last Modified: 27 Aug 2021 08:45
URI: http://scholars.utp.edu.my/id/eprint/24872
