Differentiable Measures for Speech Spectral Modeling

Arjona Ramírez, Miguel; Beccaro, Wesley; Rodríguez, Demóstenes Zegarra; Rosa, Renata Lopes

dc.creator	Arjona Ramírez, Miguel
dc.creator	Beccaro, Wesley
dc.creator	Rodríguez, Demóstenes Zegarra
dc.creator	Rosa, Renata Lopes
dc.date.accessioned	2022-07-18T20:53:34Z
dc.date.available	2022-07-18T20:53:34Z
dc.date.issued	2022-02
dc.description.abstract	Autoregressive models for the envelope of speech power spectral densities (PSDs) are refined by the self-supervised spectral learning machine (S3LM) provided with differentiable spectral objective functions, including the Itakura-Saito divergence (ISD), the Kullback-Leibler divergence (KLD), the reverse KLD (RKLD) and the log spectral distortion (LSD), which display more significant results. However, to assess the models in a more perceptually relevant way, a method is proposed based on perturbations around perfect-reconstruction analysis-synthesis configurations. In the cross-excitation analysis-synthesis assessment (CEASA) method, the residual signals generated by the analysis filters of the spectral models are injected as excitation into the synthesis filters derived from the same and other models, and the outputs are evaluated by the perceptual evaluation of speech quality (PESQ) and the Itakura divergence (ID), which are averaged over a set of models obtained using the objective functions mentioned above. The results show superior performance when the RKLD is used as the loss function for estimating the spectral models, with the ISD ranking close behind. The focus of these divergences on the spectral peaks is argued to be the most important factor for this behavior. Specifically, using the PESQ scores obtained with CEASA, the RKLD loss is found to improve the performance by 1.0%, 4.0% and 19.3% with respect to the open-loop analysis, the KLD and the LSD models, respectively, while the corresponding improvements for the ISD loss are 0.1%, 3.0% and 18.2%, and the RKLD models outperform the ISD models by 1.0% on average. Even though the spectral measures alone are not able to unequivocally distinguish the better of the two, CEASA is shown to have enough sensitivity to distinguish their performances.
In summary, the learning machine S3LM fits models for the short-term spectral envelope of speech and, for the evaluation of its performance under several differentiable loss...	pt_BR
dc.description.provenance	Submitted by Daniele Faria (danielefaria@ufla.br) on 2022-07-18T13:30:12Z No. of bitstreams: 2 ARTIGO_Differentiable Measures for Speech Spectral Modeling.pdf: 1505846 bytes, checksum: 1488a8f07316e8665e457b10e473b355 (MD5) license_rdf: 907 bytes, checksum: c07b6daef3dbee864bf87e6aa836cde2 (MD5)	en
dc.description.provenance	Approved for entry into archive by Eliana Bernardes (eliana@biblioteca.ufla.br) on 2022-07-18T20:53:34Z (GMT) No. of bitstreams: 2 ARTIGO_Differentiable Measures for Speech Spectral Modeling.pdf: 1505846 bytes, checksum: 1488a8f07316e8665e457b10e473b355 (MD5) license_rdf: 907 bytes, checksum: c07b6daef3dbee864bf87e6aa836cde2 (MD5)	en
dc.description.provenance	Made available in DSpace on 2022-07-18T20:53:34Z (GMT). No. of bitstreams: 2 ARTIGO_Differentiable Measures for Speech Spectral Modeling.pdf: 1505846 bytes, checksum: 1488a8f07316e8665e457b10e473b355 (MD5) license_rdf: 907 bytes, checksum: c07b6daef3dbee864bf87e6aa836cde2 (MD5) Previous issue date: 2022-02	en
dc.identifier.citation	ARJONA RAMÍREZ, M. et al. Differentiable Measures for Speech Spectral Modeling. IEEE Access, [S.l.], v. 10, p. 17609-17618, 2022. DOI: 10.1109/ACCESS.2022.3150728.	pt_BR
dc.identifier.uri	https://repositorio.ufla.br/handle/1/50638
dc.language	en	pt_BR
dc.publisher	Institute of Electrical and Electronics Engineers (IEEE)	pt_BR
dc.rights	open access	pt_BR
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/	*
dc.source	IEEE Access	pt_BR
dc.subject	Autoregressive processes	pt_BR
dc.subject	Machine learning algorithms	pt_BR
dc.subject	Prediction methods	pt_BR
dc.subject	Self-supervised learning	pt_BR
dc.subject	Speech analysis	pt_BR
dc.subject	Spectral analysis	pt_BR
dc.title	Differentiable Measures for Speech Spectral Modeling	pt_BR
dc.type	Article	pt_BR
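
The four objective functions named in the abstract can be illustrated with a minimal sketch. The definitions below are common textbook forms evaluated on a uniform frequency grid, not necessarily the exact normalizations used in the paper; the function names, the generalized (unnormalized) form of the KLD, the sign convention of the AR polynomial and the example coefficients are all assumptions made for illustration.

```python
import cmath
import math


def ar_envelope(a, gain, n_bins):
    """AR spectral envelope gain^2 / |A(e^{jw})|^2 on n_bins points in [0, pi).

    `a` holds a_1..a_p of A(z) = 1 + sum_k a_k z^-k (assumed sign convention).
    """
    env = []
    for i in range(n_bins):
        w = math.pi * i / n_bins
        A = 1.0 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
        env.append(gain ** 2 / abs(A) ** 2)
    return env


def isd(p, q):
    """Itakura-Saito divergence: mean of p/q - log(p/q) - 1."""
    return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(p, q)) / len(p)


def kld(p, q):
    """Generalized Kullback-Leibler divergence for unnormalized spectra (assumed form)."""
    return sum(x * math.log(x / y) - x + y for x, y in zip(p, q)) / len(p)


def rkld(p, q):
    """Reverse KLD: the KLD with its arguments swapped."""
    return kld(q, p)


def lsd(p, q):
    """Log spectral distortion: RMS difference of the log spectra, in dB."""
    return math.sqrt(sum((10.0 * math.log10(x / y)) ** 2 for x, y in zip(p, q)) / len(p))


# Hypothetical first-order example: a reference envelope and a slightly detuned model.
reference = ar_envelope([-0.9], 1.0, 64)
model = ar_envelope([-0.8], 1.0, 64)

for name, measure in (("ISD", isd), ("KLD", kld), ("RKLD", rkld), ("LSD", lsd)):
    print(name, measure(reference, model))
```

Each measure is zero when the model matches the reference exactly and positive otherwise. The ISD, KLD and RKLD are asymmetric in their arguments, whereas the LSD is symmetric.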

Files

Original bundle

Name: ARTIGO_Differentiable Measures for Speech Spectral Modeling.pdf
Size: 1.44 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 953 B
Format: Item-specific license agreed to upon submission

Collections

DCC - Articles published in journals