RT - Journal Article
T1 - Designing a New Non-parallel Training Method to Voice Conversion with Better Performance than Parallel Training
JF - ajaums-jps
YR - 2015
JO - ajaums-jps
VO - 10
IS - 2
UR - http://jps.ajaums.ac.ir/article-1-52-en.html
SP - 6
EP - 16
K1 - Voice conversion
K1 - Speech analysis/synthesis
K1 - Non-parallel training system
K1 - INCA algorithm
K1 - Gaussian Mixture Model (GMM)
K1 - Universal Background Model (UBM)
K1 - Real-time voice conversion
AB - Introduction: The art of voice mimicking by computer has been one of the most challenging topics in speech processing in recent years. A voice conversion system has two sides: on one side is the source speaker, whose voice is modified to mimic the voice of the target speaker on the other side. Two training methods, parallel and non-parallel, are used for voice conversion. In the parallel method, both the source and target speakers utter the same sentences, whereas in the non-parallel method they utter different sentences. Most voice conversion researchers prefer parallel training; however, collecting parallel data is not always possible, so non-parallel methods are needed. Methods and Materials: The voices of the source and target speakers were recorded and analyzed, and the voice features of both speakers were extracted by signal processing. Alignment was then performed and the voice conversion function was obtained. To convert the source voice to the target voice, the source voice was analyzed and its features were extracted, and the conversion function from the previous step was applied to these features. The features were then inverted and the voice was synthesized; the synthesized voice is that of the target speaker. Results: The results of both numerical and objective experiments demonstrated that the proposed method outperforms parallel training methods. This superiority holds for training sets ranging from 5 to 40 sentences, in terms of both quality and similarity to the target speaker. Discussion and Conclusion: The proposed method appears to be a serious competitor to the parallel training method.
LA - eng
UL - http://jps.ajaums.ac.ir/article-1-52-en.html
M3 -
ER -