What is it about?
The paper documents the long-term research that led to the current system for unrestricted spoken English output from computers, which is based on an acoustic "tube model" of the human vocal tract. It describes how various problems were overcome, links the work to the research of others, and explains the methods and tools used, as well as their potential for speech research in general. Because the tube model is a reasonably accurate representation of the acoustics of the real vocal tract, and may be controlled in real time, voices equivalent to different speakers can be imitated, producing speech on demand from normally punctuated text. The speech is more natural because the energy balances between the oral and nasal passages, losses from the cheeks and throat, the radiation characteristics of the mouth and nose, and the like are directly included. The existing system ("Gnuspeech") uses databases we generated that are appropriate for spoken English. The tools used to produce these databases are described and illustrated along with their use, and links to the relevant on-line databases are provided. Links are also provided to samples of the speech produced, to the manuals for the tools, and to the source code for all the components involved. The system could be used to develop databases for other languages using the approaches and tools we describe. The material is available under a Free Software Foundation "General Public Licence" ("GPL") that allows the work to be used by all comers. The system runs under Apple's OS X, and the synthesis system is also available under Linux, though the tools are still being ported to Linux.
Why is it important?
The work is important because it is the first complete real-time text-to-speech system based on low-level articulatory synthesis of speech using a reasonably accurate model of the vocal tract; because it can be used in psychophysical experiments on speech; and because it can be used to produce databases for other languages and then speak them. Extensions would be needed to deal with languages involving implosives, clicks, and the like, and the terms of the GPL allow such extensions. The system can provide a new starting point for research on speech and for speech synthesis by computer.
Perspectives
I was really pleased that the reviewers and editors were able to see the real merit in this paper, and I thank them for their improvements and careful work. What is described represents an important segment of what I spent my career doing: computer-human interaction in general, with a particular focus on computer speech recognition and synthesis. The work is original and has yet to be duplicated. People who are interested and would like to develop the system and tools further would be most welcome.
David Hill
University of Calgary
Read the Original
This page is a summary of: Low-level articulatory synthesis: A working text-to-speech solution and a linguistic tool, The Canadian Journal of Linguistics / La revue canadienne de linguistique, June 2017, Cambridge University Press,
DOI: 10.1017/cnj.2017.15.
Resources
Gnuspeech-related papers
A linked list of Gnuspeech-related papers held on the first author's university website
Free Software Foundation page explaining the Gnuspeech project
A Free Software Foundation page that explains the Gnuspeech project from another perspective and includes acknowledgements
Free Software Foundation Resource Page for Gnuspeech
The page where interested parties, including those working on the Gnuspeech project, can access Gnuspeech resources
The North Wind and the Sun
A short video with the soundtrack of the fable "The North Wind and the Sun", illustrating the speech synthesised from text by Gnuspeech. The fable was used in 1964 as a specimen phonetic text by Professor David Abercrombie at the University of Edinburgh, who said such a text should: ■ contain all the symbols; ■ exemplify the chief phenomena of weakening, shortening, stress, word linking, etc.; ■ make sense; ■ be as short as possible.
She was Ninety-eight When She Died
A video providing a pair-wise comparison between natural speech and speech produced by Gnuspeech, showing the equivalent spectrograms of the phrases spoken.