Interview about Text-To-Speech Support in KDE
One of the important topics in the KDE Accessibility Project is the integration of speech synthesis solutions into KDE. This interview introduces the people behind the great improvements of recent months and years.
The interviewed developers are working on several speech synthesis related applications in KDE:
- Gary Cramblit and Paul Giannaros: authors of KTTS, the KDE Text-To-Speech System started by José Pablo Ezequiel Fernández
- Gunnar Schmidt: author of KMouth, an application for speech impaired people
- Robert Vogl: author of KSayIt, an application for reading longer texts aloud
The interviewer was Olaf Schmidt, who is also a member of the KDE Accessibility Project and who has contributed to some of these projects. The questions and answers were exchanged by email and later combined into one long text.
Please introduce yourself!
Paul: Hi, I'm Paul Giannaros, currently living in London, England.
Robert: Hi, I'm Robert Vogl, 37, married, living south of Munich, Germany, and a graduate communications engineer.
Gary: Hello, my name is Gary Cramblit, a.k.a. PhantomsDad (my wife Nora and I have a pet Schipperke named Phantom). I'm 49 and I've been programming for a living for 28 years, but I've only been a Linux and KDE user and programmer for 2 years.
Gunnar: I am a student of computer science at the University of Paderborn. I like listening to classical music, singing in a choir and working with the computer. With respect to KDE, I am the author of KMouth and one of the co-maintainers of the KDE Accessibility module. I have also written the Command and Hadifix plug-ins for KTTS.
How did you become involved with KDE?
Paul: Not sure exactly. I just thought it would be fun to start developing for real people (giving back some of what I've taken), and as Qt is such a wonderful toolkit, KDE was the natural choice.
Gunnar: Well, I like writing programs and my favorite desktop is KDE, so writing programs for KDE is a logical step.
Robert: I started my computer life with a funny Schneider/Amstrad CPC464, followed by two Amigas, which I used primarily for video processing to finance my studies. Unfortunately, Amigas were not very useful for text processing, which was becoming a more and more important task for me, so I switched over to a PC with Windows. I felt as if I had been dropped into the dark Middle Ages of single-tasking, especially when I started using LaTeX for my seminar papers. At the university Unix machines were ubiquitous, so I took my first steps into Linux, motivated by the certainty that even a "Hacker-OS" couldn't be worse than my hated Windows machine. At least I had a professional LaTeX environment (Emacs :-) ), but an ugly GUI (mostly Motif). KDE was the next logical step. My first KDE programming project was a Konqueror plug-in to access my digicam.
Gary: Like a lot of people, I'm an escapee from Windows. I was tired of Microsoft and everybody else in the world controlling my computer and telling me what I could and could not do. I was attracted to KDE because it gave me all the capabilities I needed, but gave me complete control.
How did you become involved with speech synthesis and accessibility?
Gunnar: In her last days my mother had an illness which led to her being able to do less and less as time went on. In her case the illness began with her tongue, so that she had problems speaking. In order to help her I wrote KMouth, an application where you can type in sentences which are then passed on to a speech synthesizer.
Robert: Because of my job I lived in the U.S. for a while, and on each trip to visit family and friends in Germany I spent a lot of very boring hours sitting in an aircraft 30,000 ft above the Atlantic. As programming is a much more efficient way to kill time, I took my laptop and began to write KSayIt, primarily intended as a GUI for Mbrola, and later released it on kde-apps.org. I was surprised by the positive feedback, especially from handicapped people. This was (and still is) a great motivation for me to keep the project alive.
Paul: I joined #kde-devel at irc.kde.org and asked if anyone knew of some projects in need of help. Up comes Gary (PhantomsDad) and tells me about KTTSD and fills me in. It seemed like a good project at the time, and KDE is in need of a powerful yet simple API for speaking text.
Gary: To be honest, initially I had no interest in accessibility at all. I like e-books and was looking for a convenient way to read out my e-books without spending long hours in front of a monitor screen. I played around with Festival a bit, but wanted something that was better integrated with the desktop, could read out a variety of formats, and permitted me to pause, stop, rewind, and advance through the file. I happened upon a project in kdenonbeta called "KTTSD" -- KDE Text-to-Speech Daemon. It was started by Pupeno but was dormant. From the description, it seemed like just what I was looking for and after checking with Pupeno, I decided to take it on as maintainer. Since then, KTTS has evolved to be quite useful, even if you're not accessibility challenged.
The goals of KTTSD are 1) to provide a common API for all KDE applications to use for text-to-speech (TTS) synthesis with a minimum of fuss and bother, 2) to support as many synths and languages as possible, 3) to speak a variety of text formats, such as web pages and PDFs, 4) to provide a GUI that permits users to configure and control speech output -- sort of a combination of KMix and kcmshell printmgr, and 5) to provide a TTS subsystem to support accessibility in KDE. You can learn more about KTTSD at http://accessibility.kde.org/developer/kttsd/.
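Goals 1) and 2) amount to one entry point for applications sitting in front of interchangeable synthesizer plug-ins. The Python sketch below illustrates that idea only; it is not KTTSD's actual interface (which is C++ and accessed over DCOP), and every class and method name in it (`SynthPlugin`, `SpeechService`, `say_text`) is hypothetical. The language lists come from Gunnar's answer later in this interview.

```python
from abc import ABC, abstractmethod


class SynthPlugin(ABC):
    """Hypothetical back-end interface: one subclass per synthesizer."""

    name = "abstract"
    languages = frozenset()

    @abstractmethod
    def speak(self, text):
        """Hand text to the underlying synthesizer."""


class RecordingPlugin(SynthPlugin):
    """Stand-in for a real plug-in: it only records what it was asked to say."""

    def __init__(self, name, languages):
        self.name = name
        self.languages = frozenset(languages)
        self.spoken = []

    def speak(self, text):
        self.spoken.append(text)


class SpeechService:
    """Single entry point for applications; routes text to a suitable plug-in."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def say_text(self, text, language="en"):
        """Speak text with the first plug-in that supports the language; return its name."""
        for plugin in self._plugins:
            if language in plugin.languages:
                plugin.speak(text)
                return plugin.name
        raise LookupError(f"no synthesizer registered for language {language!r}")


# Coverage as described in the interview: Festival (English, Spanish, Welsh), Epos (Czech, Slovak).
festival = RecordingPlugin("festival", {"en", "es", "cy"})
epos = RecordingPlugin("epos", {"cs", "sk"})
service = SpeechService()
service.register(festival)
service.register(epos)
service.say_text("Dobrý den", language="cs")
```

The calling application never names a synthesizer; adding support for a new engine or language means registering one more plug-in, which is the "minimum of fuss and bother" the first goal asks for.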
What is your motivation for working on it?
Robert: Just for fun.
Gunnar: I like working with the computer, and I see the need for accessibility.
Paul: People need these tools. Now that such a daemon is in place, plugins can be created for applications like Konqueror and Kate to speak the text of web pages or files respectively. People with visual impairments can therefore use TTS from the comfortable environment of applications they're familiar with.
Gary: It's fun and allows me to give back to the community. And speech synthesis seemed to be an area of need in KDE.
What are currently the biggest obstacles for speech synthesis in KDE?
Gunnar: The biggest obstacle is that for most languages, no free speech synthesizers are available. Currently there are Festival, FreeTTS, Flite and Epos.
Festival supports English, Spanish and Welsh. Epos supports Czech and Slovak. With some patches they support additional languages, but to my knowledge these patches are not under a free license. As far as I know, FreeTTS and Flite only support English.
There is also MBrola, which can produce voice output for many languages if you have already pre-processed the text into a list of phonemes, but the project was abandoned several years ago. MBrola can be used free of charge for non-commercial and non-military purposes, but it is only available in the form of outdated binaries.
Unfortunately the situation is not much better regarding proprietary speech synthesis support on Linux.
Gary: There are numerous challenges. Probably the biggest problem is a lack of good open source speech synthesizers for languages other than English or Spanish. One of the things I'm trying to do with KTTSD is add support for as many synth engines as I can get my hands on. We'd also like to support commercial synths, but to support them properly, vendors must be willing to donate a copy of their product to us.
Even when synths are available, working around their quirks and limitations has been a challenge. Festival, for example, is simply not designed to be stopped in mid-speech. Give it a huge block of text and the only way to stop it is to abort the process. The accessibility group at freedesktop.org is working on a common specification for speech synthesizers which, when adopted, would certainly make my life easier.
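One general way around a synthesizer that cannot be interrupted is to never hand it a huge block of text: split the text into sentences and feed them one at a time, checking for a stop request in between. The Python sketch below shows that technique under stated assumptions; it is not taken from KTTSD, the regex splitter is deliberately naive, and `speak_sentence` stands for a hypothetical blocking call that synthesizes one sentence (for example by running Festival on it).

```python
import re
import threading


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


class SentenceQueue:
    """Feeds a synthesizer one sentence at a time so speech can stop between sentences."""

    def __init__(self, speak_sentence):
        # speak_sentence: hypothetical blocking callable that speaks one sentence.
        self._speak = speak_sentence
        self._stop = threading.Event()

    def stop(self):
        """Request a stop; it takes effect once the current sentence has finished."""
        self._stop.set()

    def speak(self, text):
        """Speak text sentence by sentence; return how many sentences were spoken."""
        self._stop.clear()
        spoken = 0
        for sentence in split_sentences(text):
            if self._stop.is_set():
                break
            self._speak(sentence)
            spoken += 1
        return spoken


# Usage with a fake synthesizer that "presses Stop" while speaking the second sentence.
heard = []


def fake_synth(sentence):
    heard.append(sentence)
    if len(heard) == 2:
        reader.stop()


reader = SentenceQueue(fake_synth)
count = reader.speak("First sentence. Second one! Third? Fourth.")
```

The cost of this design is that a stop request only takes effect at the next sentence boundary; stopping in the middle of a sentence would still mean aborting the synthesizer process.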
BTW, if you haven't heard the new MultiSyn voices in the latest version of Festival, you are missing out. They are very natural sounding.
KDE doesn't have a Screen Reader for the blind or low-sighted. In this regard, KDE is way behind GNOME, which has Gnopernicus. Qt4 will provide a GUI framework to support a Screen Reader, and KTTSD will provide a common speech module, but there is still a lot of work to be done before KDE will have a usable Screen Reader.
Paul: Awareness is needed. Once KTTSD is established, the major KDE applications should try to provide a method for reading their documents or other data aloud. KTTSD is also in desperate need of translators.
Robert: Obstacles? As an engineer I think in solutions, not in obstacles ;-). In general I think there is a lack of free speech synthesizers for Linux. Concerning the development of KSayIt, my limited time is one of the biggest obstacles.
Which future plans do you have for yourself, for KDE, and for accessibility solutions?
Robert: Right now I think KSayIt is at a very early stage of its development. My plan is to establish KSayIt as a more or less useful player alongside the other interesting accessibility tools and projects. Nevertheless, I don't see it strictly as a tool to enhance the accessibility of KDE. Currently I'm working on a DocBook interface to handle structured texts and to allow a bookmarking system. A just-for-fun task with low priority is to enhance the effect stack for audio post-processing so that LADSPA plugins can be integrated. I like it, because it's useless.
Gary: Up until now, KTTSD has been a "fringe" application with few users or contributors. I expect that to change rapidly over the next few months. For starters, we think it is stable enough now for public release. Qt4 will give a huge push towards enabling accessibility within KDE apps, and KTTSD will be there to provide the speech capabilities.
Something exciting we are working on right now is Speech Synthesis Markup Language (SSML) support in KTTSD. The idea is that you can right-click on a web page and, by using suitable stylesheets, have it spoken using a variety of genders, volumes, and pitches. For example, links might be spoken in a female voice while the body of the page is spoken in a male voice.
Paul: Currently I'm working mainly on providing an SSML implementation (or as much of the SSML spec as possible) for TTS via KTTSD, which should keep me busy for the near future. This implementation allows you, via an SSML document, to have greater control over certain talkers (e.g. Festival). You can dynamically control the speed of the text, the pitch, the volume, insert breaks, pronounce certain words in different ways, etc.
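To make Gary's web-page example and Paul's list of controls concrete, the Python sketch below builds a small SSML 1.0 document with the standard library's `xml.etree.ElementTree`. The elements used (`speak`, `voice`, `prosody`, `break`) and their attributes come from the W3C SSML 1.0 specification; the `STYLES` table, the `page_to_ssml` function and the idea of pre-splitting a page into `("body", text)` / `("link", text)` pairs are assumptions for illustration, not how KTTSD's stylesheets actually work.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Hypothetical style table: how each kind of page element should sound.
STYLES = {
    "body": {"gender": "male", "rate": "medium", "pitch": "medium", "volume": "medium"},
    "link": {"gender": "female", "rate": "slow", "pitch": "high", "volume": "loud"},
}


def page_to_ssml(segments, lang="en"):
    """Turn (kind, text) pairs taken from a web page into an SSML 1.0 document string."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    for index, (kind, text) in enumerate(segments):
        if index:
            # Short pause between consecutive segments.
            ET.SubElement(speak, "break", {"time": "300ms"})
        style = STYLES[kind]
        # <voice gender=...> picks the speaker; <prosody> sets rate, pitch and volume.
        voice = ET.SubElement(speak, "voice", {"gender": style["gender"]})
        prosody = ET.SubElement(
            voice,
            "prosody",
            {"rate": style["rate"], "pitch": style["pitch"], "volume": style["volume"]},
        )
        prosody.text = text
    return ET.tostring(speak, encoding="unicode")


print(page_to_ssml([("body", "Welcome to the KDE Accessibility Project."),
                    ("link", "Read more")]))
```

Alternative pronunciations, the last item in Paul's list, would use SSML's `sub` (with an `alias` attribute) or `phoneme` elements in the same way.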
Gunnar: In 2005 I will finish my studies. After that I hope to find a job where I can work on KDE or accessibility. What concrete solutions will come out of that I cannot say yet.
In several countries (including the USA and several EU countries), there are laws stating that if computer hardware or software is bought with public money, then it must also be accessible to people with physical handicaps. What do you think about such laws?
Robert: Now that using computers, e.g. for accessing the internet, is becoming more and more an everyday part of our lives, I think such a law makes absolute sense. It's a very important characteristic of democracy to take care of the needs of minorities and let them participate in "normal" society as much as possible. As there is so much software around that is inaccessible even to non-handicapped users, we will all hopefully benefit from such laws.
Gary: Laws and committees don't write software; programmers do. And that is what is needed: lots of new software and lots of rewrites of existing software. Here in the U.S., it's called Section 508 compliance, and it's an unfunded mandate. I believe this law has done more harm than good. Writing software so that it is 100% accessible is very difficult and expensive, given the current state of technology. Even the definition of what "accessible" means is subject to debate. The net effect for most U.S. government agencies has been to not release software they develop to the public, lest it be found non-compliant. Everybody loses. I believe a carrot approach, rather than a stick, is needed. Perhaps tax incentives for entities that write accessible software, or budget incentives for government agencies that develop or use accessible software?
Paul: Taxpayers generally don't like paying for something that will not benefit them. These countries have therefore created such laws to try to cater for everyone. They sound pretty good to me.
Gunnar: From the viewpoint of a person with disabilities, such a law is a very good thing, as it leads to more programs being accessible. On the other hand, you need to make sure that it is possible for all programmers to make their applications accessible. Currently the development of AT-SPI is leading in the right direction on Linux/Unix, but if some of the techniques needed to achieve accessibility were protected by patents, vendors of free software would have a really big problem.
(December 12, 2004)