Text to Speech Synthesis

New Paradigms and Advances

Shrikanth Narayanan, Abeer Alwan

Aug 2004, Hardback, 288 pages
ISBN13: 9780131456617
ISBN10: 013145661X
This title is no longer available.
£71.99


Recent advances in speech synthesis will enable the development of high-quality natural voice systems with broad application in education, business, entertainment, and medicine. Text to Speech Synthesis is the first book to comprehensively document these new research trends and paradigms, balancing coverage of research and applications. It brings together seminal research by leaders in the field, drawn from both academic and industrial laboratories worldwide.

The authors and editors offer broad coverage of several key areas, including new unit selection approaches, speech representations and modeling, data-driven synthesis schemes, and expressive speech synthesis.

Coverage includes:

  • Unit Selection Methods: Reducing discontinuities at synthesis time in corpus-based speech processing, voice quality variation, and join costs
  • Hidden Markov Model (HMM)-Based Synthesis: Advanced uses of speech recognition technology, HMM-based multilingual speech synthesis, and new prosody control techniques
  • Expressive Speech Synthesis: Challenges, questions, and avenues of research, including diphone transplantation and minimization of pitch modification
  • Speech Representation and Models: A new articulatory modeling paradigm for controlling synthesis quality

This is an essential resource for all researchers working in speech synthesis and related areas such as multimedia signal processing, linguistics, and spoken user interfaces. It will also be valuable to any engineer, developer, or manager who must evaluate the latest speech technologies or integrate them into practical applications.





Preface.

Foreword.

1. Reducing Discontinuities at Synthesis Time for Corpus-Based Speech Synthesis.

Baris Bozkurt, Thierry Dutoit, Romain Prudon, Christophe D'Alessandro and Vincent Pagel.

Introduction.

Shift-Only F0 Smoothing.

Improving Quality of MBROLA Synthesis.

Evaluation.

Discussions and Conclusion.

Bibliography.

2. Voice Quality Variation in a Long-Term Recording of a Single Speaker Speech Corpus.

Hisashi Kawai and Minoru Tsuzaki.

Introduction.

Perceptual Experiment.

Factors of Voice Quality Variation.

Candidates of Acoustic Correlates.

Prediction of Voice Quality Difference Scores.

Summary.

Bibliography.

3. Join Cost for Unit Selection Speech Synthesis.

Jithendra Vepa and Simon King.

Introduction.

Previous Work.

Spectral Distances.

Perceptual Listening Tests.

Results and Discussion.

Conclusions.

Bibliography.

4. Articulatory Modeling: A Role in Concatenative Text to Speech Synthesis.

M. Mohan Sondhi and Daniel J. Sinder.

Introduction.

Articulatory Modeling.

Rule-Based Control of the Parameters.

Concatenative Articulatory Synthesis.

Concluding Remarks.

Bibliography.

5. Minimizing The Amount of Pitch Modification in Speech Synthesis.

Esther Klabbers, Jan van Santen and Johan Wouters.

Introduction.

Speech Corpus Analysis.

Text Corpus Analysis.

Perceptual Experiment.

Conclusion.

Bibliography.

6. The Use of Speech Recognition Technology in Speech Synthesis.

Mari Ostendorf and Ivan Bulyko.

Introduction.

Speech Recognition.

ASR in Synthesis.

Limitations.

Speculations.

Bibliography.

7. An HMM-Based Approach to Multilingual Speech Synthesis.

Keiichi Tokuda, Heiga Zen and Alan W. Black.

Introduction.

HMM-Based Speech Synthesis System.

F0 Pattern Modeling by HMM.

Speech-Parameter Generation from an HMM.

Implementation on Festival Architecture.

Discussion.

Conclusion.

Bibliography.

8. Prosody Control For HMM-Based Japanese TTS.

Koji Iwano, Masahiro Yamada, Taro Togawa and Sadaoki Furui.

Introduction.

Outline of HMM-Based TTS System.

Prosody Generation Using the Quantification Theory (Type 1).

Speech-Rate-Variable Synthesis Method.

Conclusions.

Bibliography.

9. Synthesizing Expressive Speech: Overview, Challenges, and Open Questions.

Murtaza Bulut, Shrikanth Narayanan and Lewis Johnson.

Introduction.

Theories of Emotion.

Dimensions of Emotional Space.

Speech Synthesis Methods.

Emotional Speech Data Collection.

Experimental Evaluation of Expressive Speech.

Presentation of Results From Case Studies.

Conclusion.

Open Questions and Future Directions.

Bibliography.

10. Unit Selection Synthesis of Prosody: Evaluation Using Diphone Transplantation.

Romain Prudon, Christophe D'Alessandro and Philippe Boula de Mareüil.

Introduction.

Computing Prosody by Selection.

Comparative Evaluation.

Results.

Conclusion.

Bibliography.

11. Toward Expressive Synthetic Speech.

Ellen Eide, Raimo Bakis, Wael Hamza and John F. Pitrelli.

Introduction.

A Pilot Study For Generating Expressive Speech.

Generating Expressive Speech with Limited Resources.

Rule-Based Methods for Generating Expressive Speech.

Use of an Expressive TTS System.

Assessing Performance.

Conclusions.

Bibliography.

Footnotes.

Copyright Forms.

References.

Index.

Dr. Shrikanth Narayanan is an associate professor at the Signal and Image Processing Institute of USC's Electrical Engineering Department. He founded and directs USC's Speech Analysis and Interpretation Laboratory, and serves as research area director of the Integrated Media Systems Center, an NSF Engineering Research Center. He is an associate editor of IEEE Transactions on Speech and Audio Processing, serves on the speech communication technical committee of the Acoustical Society of America, and was Principal Member of Technical Staff at AT&T Laboratories.

Dr. Abeer Alwan, a professor of electrical engineering at UCLA, established and directs the Speech Processing and Auditory Perception Laboratory there. Her research interests include modeling human speech production and perception mechanisms and applying these models to speech-processing applications such as noise-robust automatic speech recognition, compression, and synthesis. She is a Fellow of the Acoustical Society of America and recently served as editor-in-chief of the journal Speech Communication.


