Mehran University Research Journal of Engineering & Technology

Article Information

Automatic Speaker Identification Using Clinically Depressed Speech Content

Keywords: Speaker Recognition, Depression, Clinical Environment, Gaussian Mixture Model.

Mehran University Research Journal of Engineering & Technology

Volume 31, Issue 2

Sheeraz Memon, Faisal Karim Shaikh, Javed Ali Baloch

Abstract

The environment largely affects the performance of automatic speaker recognition. This work investigates the effects of a clinical environment on the task of speaker recognition. For this task we used two sets of speakers: a clinical set consisting of speech samples from 70 clinically depressed speakers, and a control set comprising 68 clinically non-depressed speakers. MFCCs (Mel Frequency Cepstral Coefficients) are used for feature extraction, and several modeling methods are applied: GMM-EM (Gaussian Mixture Models based on Expectation Maximization), GMM based on K-means (GMM-Kmeans), GMM based on the Linde-Buzo-Gray algorithm (GMM-LBG), and GMM based on Information Theoretic Vector Quantization (GMM-ITVQ). These modeling methods are evaluated on the novel speech corpus. The results suggest that speaker recognition rates for the depressed speakers are lower (60-71%) than for the non-depressed speakers (79-89%). This paper further investigates the performance of VQ (Vector Quantization) based Gaussian modeling and proposes a novel approach called GMM-ITVQ. The results suggest that GMM-EM achieves the highest recognition rates; however, the performance of GMM-ITVQ is comparable to that of GMM-EM.
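As a rough illustration of how GMM-based closed-set speaker identification generally works, the following minimal sketch trains one diagonal-covariance GMM per enrolled speaker with plain EM. It then assigns a test utterance to the speaker whose model gives the highest average per-frame log-likelihood. This is not the authors' implementation. It assumes MFCC frames have already been extracted, and synthetic random vectors stand in for them here. The class `DiagGMM`, the helpers `identify` and `fake_frames`, the evenly spaced initialisation, the component and iteration counts, and the variance floor are all hypothetical choices. The K-means, LBG and ITVQ initialisations compared in the paper are not implemented.

```python
import math
import random

VAR_FLOOR = 1e-3  # hypothetical variance floor so components cannot collapse


def _log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return sum(-0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
               for xi, m, v in zip(x, mean, var))


def _logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


class DiagGMM:
    """Diagonal-covariance GMM trained with plain EM (illustrative sketch)."""

    def __init__(self, n_components=4, n_iter=20):
        self.k = n_components
        self.n_iter = n_iter

    def _component_logs(self, x):
        # log(w_k) + log N(x; mu_k, diag(var_k)) for every component k
        return [math.log(w) + _log_gauss_diag(x, m, v)
                for w, m, v in zip(self.weights, self.means, self.vars)]

    def fit(self, frames):
        n, dim = len(frames), len(frames[0])
        # Assumed initialisation: evenly spaced frames as means, global variance.
        step = max(1, n // self.k)
        self.means = [list(frames[(i * step) % n]) for i in range(self.k)]
        mu = [sum(f[d] for f in frames) / n for d in range(dim)]
        var = [max(VAR_FLOOR, sum((f[d] - mu[d]) ** 2 for f in frames) / n)
               for d in range(dim)]
        self.vars = [list(var) for _ in range(self.k)]
        self.weights = [1.0 / self.k] * self.k
        for _ in range(self.n_iter):
            # E-step: posterior responsibility of each component for each frame.
            resp = []
            for x in frames:
                logs = self._component_logs(x)
                total = _logsumexp(logs)
                resp.append([math.exp(l - total) for l in logs])
            # M-step: re-estimate weight, mean, then variance of each component.
            for k in range(self.k):
                nk = sum(r[k] for r in resp) + 1e-10
                self.weights[k] = nk / n
                self.means[k] = [sum(r[k] * x[d] for r, x in zip(resp, frames)) / nk
                                 for d in range(dim)]
                self.vars[k] = [max(VAR_FLOOR,
                                    sum(r[k] * (x[d] - self.means[k][d]) ** 2
                                        for r, x in zip(resp, frames)) / nk)
                                for d in range(dim)]
        return self

    def score(self, frames):
        """Average per-frame log-likelihood of an utterance under this model."""
        return sum(_logsumexp(self._component_logs(x)) for x in frames) / len(frames)


def identify(models, frames):
    """Closed-set identification: the enrolled speaker whose GMM scores highest."""
    return max(models, key=lambda spk: models[spk].score(frames))


rng = random.Random(0)


def fake_frames(center, n=200):
    """Synthetic stand-in for MFCC frames (hypothetical, for demonstration only)."""
    return [[rng.gauss(c, 1.0) for c in center] for _ in range(n)]


train = {"spk_A": fake_frames([0.0, 0.0, 0.0]), "spk_B": fake_frames([4.0, 4.0, 4.0])}
models = {spk: DiagGMM(n_components=2, n_iter=10).fit(f) for spk, f in train.items()}
print(identify(models, fake_frames([4.0, 4.0, 4.0], n=50)))  # prints spk_B
```

Scoring by average rather than summed log-likelihood keeps utterances of different lengths on a comparable scale. In a real system the synthetic frames would be replaced by per-utterance MFCC matrices, and the initialisation step is where K-means, LBG or ITVQ codebooks would be substituted.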