Designing a New Non-parallel Training Method to Voice Conversion with Better Performance than Parallel Training

Ghorbandoost, Mostafa; Saba, Valiallah

Volume 10, Issue 2 (Autumn 2015) | Paramedical Sciences and Military Health 2015, 10(2): 6-16

Ghorbandoost M, Saba V. Designing a New Non-parallel Training Method to Voice Conversion with Better Performance than Parallel Training. Paramedical Sciences and Military Health 2015; 10(2):6-16
URL: http://jps.ajaums.ac.ir/article-1-52-en.html

Mostafa Ghorbandoost, Valiallah Saba*¹

1- vsaba@aut.aut.ac.ir

Abstract:

Introduction: Computer-based voice mimicry has been one of the most challenging topics in speech processing in recent years. A voice conversion system has two sides: the source speaker, whose voice is modified, and the target speaker, whose voice is imitated. Voice conversion can be trained with either parallel or non-parallel methods. In parallel training, the source and target speakers utter the same sentences, whereas in non-parallel training they utter different sentences. Most voice conversion researchers prefer parallel training; however, collecting parallel data is not always possible. Non-parallel methods are therefore needed.
Methods and Materials: In the training stage, the voices of the source and target speakers were recorded and analyzed, and the voice features of both speakers were extracted by signal processing. The features were then aligned and the voice conversion function was obtained. In the conversion stage, the source voice was analyzed and its features were extracted. The voice conversion function obtained in the training stage was applied to the extracted features. The converted features were then transformed back by the inverse of the feature extraction step, and finally the voice was synthesized. The synthesized voice is the voice of the target speaker.
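
The training and conversion stages described above can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the alignment or conversion details (the keywords mention INCA and GMM), so the sketch assumes a single nearest-neighbour alignment pass and a per-dimension affine conversion function as simple stand-ins. All function names and the toy feature values are hypothetical, and feature extraction and synthesis are omitted.

```python
import math

def nearest_neighbor_align(source_frames, target_frames):
    """Pair every source frame with its closest target frame (Euclidean distance).

    Assumption: a single nearest-neighbour pass stands in for the alignment step;
    iterative schemes would repeat this after converting the source frames.
    """
    pairs = []
    for s in source_frames:
        nearest = min(target_frames, key=lambda t: math.dist(s, t))
        pairs.append((s, nearest))
    return pairs

def fit_conversion(pairs):
    """Fit y = a*x + b independently for each feature dimension by least squares.

    Assumption: this affine map is a simple stand-in for a statistical
    conversion function such as a GMM-based mapping.
    """
    n = len(pairs)
    dims = len(pairs[0][0])
    params = []
    for d in range(dims):
        xs = [s[d] for s, _ in pairs]
        ys = [t[d] for _, t in pairs]
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        a = cov_xy / var_x if var_x > 0 else 1.0  # fall back to identity slope
        b = mean_y - a * mean_x
        params.append((a, b))
    return params

def convert(frames, params):
    """Apply the fitted per-dimension affine map to each feature frame."""
    return [tuple(a * x + b for x, (a, b) in zip(frame, params)) for frame in frames]

# Toy, hypothetical 2-D feature frames; the two speakers have different
# numbers of frames because the training sentences are not parallel.
source = [(0.0, 1.0), (1.0, 0.0), (2.0, 2.0)]
target = [(0.1, 1.2), (2.1, 1.9), (0.9, 0.2), (3.0, 3.0)]

pairs = nearest_neighbor_align(source, target)   # training: alignment
params = fit_conversion(pairs)                   # training: conversion function
converted = convert(source, params)              # conversion stage
```

In this sketch the alignment needs no shared sentences, since each source frame is simply matched to whichever target frame is closest in feature space.
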
Results: The results of both numerical and objective experiments showed that the proposed method performs better than parallel training methods. This superiority held for training sets of 5 to 40 sentences, in terms of both quality and similarity to the target speaker.
Discussion and Conclusion: The proposed non-parallel method appears to be a serious competitor to parallel training methods.

Keywords: Voice conversion, Speech analysis/synthesis, Non-parallel training system, INCA algorithm, Gaussian Mixture Model (GMM), Universal Background Model (UBM), Real-time voice conversion

Type of Study: Research | Subject: article abstracts
Received: 2015/09/19 | Accepted: 2015/12/13 | Published: 2015/12/21

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
