Abstract:
This project involves the design and implementation of a tool
for visually handicapped Turkish speakers. Using this tool,
visually handicapped people will be able to read, edit and print
documents, browse the Internet, and send and receive email
messages. As part of this project, a text-to-speech (TTS) engine
suitable for the Turkish language will also be designed and developed.
Introduction:
As the cost of personal computers has decreased, a need has arisen for
software tools that enable visually handicapped people to read, edit
and print documents. We have realized that visually handicapped people
are not able to communicate privately with other people, since they
are not able to read and/or write handwritten or typed documents.
Also, as the cost of Internet access has decreased, the demand for
Internet access among visually handicapped people has been increasing
rapidly. They would like to communicate with other people through
email messages. The demand from visually handicapped Turkish
speakers has been the motivation for the current project.
Previous Work:
Having realized the need for such a tool, we carried out some
experiments under the name Oku ("read" in Turkish). The
first versions, Oku.1 and Oku.2, were only editors. They used an
experimental TTS engine that relied on the phonetic
characteristics of the Turkish language. Turkish words are
composed of syllables, and a word can be broken down into a
sequence of syllables using a simple algorithm. The pronunciation of a
syllable is the same regardless of the word in which it occurs. Making
use of this fact, the TTS engine breaks a word into its syllables and
then plays the corresponding sound files from its database. The
database contained about one thousand syllables.
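The syllable-based approach described above can be illustrated with a minimal sketch. This is not the code used in Oku.1 or Oku.2; it assumes the common rule that every Turkish syllable contains exactly one vowel and that, when consonants stand between two vowels, the last of them opens the next syllable. The function names and the ".wav" file naming are hypothetical. Some loanwords (for example "elektrik") do not follow this rule exactly, so the output should be treated as an approximation.

```python
# Minimal sketch of rule-based Turkish syllabification (illustrative only).
VOWELS = set("aeıioöuü")


def turkish_lower(text):
    # Python's str.lower() maps "I" to "i"; Turkish needs "I" -> "ı" and "İ" -> "i".
    return text.replace("I", "ı").replace("İ", "i").lower()


def syllabify(word):
    """Split a single Turkish word into a list of syllables."""
    word = turkish_lower(word)
    vowel_positions = [i for i, ch in enumerate(word) if ch in VOWELS]
    if not vowel_positions:
        return [word] if word else []
    starts = [0]  # the first syllable always starts at the beginning of the word
    for prev, cur in zip(vowel_positions, vowel_positions[1:]):
        # If at least one consonant separates two vowels, the last consonant
        # opens the next syllable; otherwise the syllable starts at the vowel.
        starts.append(cur - 1 if cur - prev > 1 else cur)
    ends = starts[1:] + [len(word)]
    return [word[s:e] for s, e in zip(starts, ends)]


def sound_files(word):
    """Map each syllable to a hypothetical sound file name."""
    return [syllable + ".wav" for syllable in syllabify(word)]


if __name__ == "__main__":
    for w in ["kitap", "okul", "merhaba", "bilgisayar", "İstanbul"]:
        print(w, syllabify(w))
```

A playback routine would then play the files returned by sound_files in order. With a database of about one thousand syllables, a lookup for a syllable that has no recording would need a fallback, which this sketch does not cover.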
We made these programs publicly available, free of charge, through a web page. We also sent copies on CD-ROM to those who did not have Internet access. The interest in the program encouraged us to continue working on this tool. A large number of users asked for an extension to the program that would allow them to surf the Internet.
Considering the feedback from the users, we developed the latest version, called Oku.3. It includes a web browser and an email client, in addition to all the features of the previous versions. An important difference in Oku.3 is that it uses Mbrola, a multilingual speech synthesizer developed outside of our group, for speech output. Mbrola is free for non-commercial applications. We obtained Mbrola as a DLL and incorporated it into our system. Mbrola performed slightly better than the TTS engine used in the previous versions of Oku. The latest version, Oku.3.0.1, is available, again free of charge, from http://www.cs.bilkent.edu.tr/~guvenir/Oku/.
With the addition of the Internet browser, the Oku.3 program received a lot of attention from the community of visually handicapped people in Turkey. They gave us very important feedback and asked for many improvements in the program. However, we have not had a chance to incorporate these suggestions into the Oku.3 program, due to our limited resources.
Current Project:
Taking into consideration the feedback from the community of visually
handicapped people in Turkey, we are redesigning the Oku program. In the
current project, we aim to develop a professional-quality tool. The web
browser of Oku.3 is an experimental tool that cannot handle many
features of web pages, including forms. This is an important deficiency,
because without forms the user cannot use search tools such as
Google and Yahoo. Also, web-based email services such as Hotmail are
not accessible in Oku.3. In the proposed project all features of web
pages, except pictures, will be handled.
The most important work in the proposed project is the design and implementation of a new TTS system suitable for the Turkish language. Mbrola, which we used in Oku.3, is a speech synthesizer based on the concatenation of diphones. It takes a list of phonemes as input, together with prosodic information (the duration of each phoneme and a piecewise linear description of pitch), and produces 16-bit linear speech samples at the sampling frequency of the diphone database used. It is therefore NOT a text-to-speech (TTS) synthesizer, since it does not accept raw text as input.
The TTS system to be designed and developed in this project will accept raw text as input and produce high-quality speech samples. We are investigating new speech synthesis techniques suitable for the Turkish language. All other feedback from the users will be incorporated into the new version of the Oku program. This feedback includes requests for improvements in the handling of dates, times, numbers, abbreviations and units.
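To make the handling of numbers concrete, the following is a minimal sketch of one normalization step that a raw-text TTS front end might perform: expanding digit strings into Turkish words before synthesis. It is not the project's implementation. The function names are hypothetical, and the sketch covers only whole numbers from 0 to 999,999; dates, times, abbreviations and units would need their own rules.

```python
import re

# Minimal sketch: expand whole numbers into Turkish words (illustrative only).
ONES = ["", "bir", "iki", "üç", "dört", "beş", "altı", "yedi", "sekiz", "dokuz"]
TENS = ["", "on", "yirmi", "otuz", "kırk", "elli",
        "altmış", "yetmiş", "seksen", "doksan"]


def _below_thousand(n):
    """Return the words for 0 < n < 1000 as a list (empty list for 0)."""
    words = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        if hundreds > 1:          # 100 is "yüz", not "bir yüz"
            words.append(ONES[hundreds])
        words.append("yüz")
    tens, ones = divmod(rest, 10)
    if tens:
        words.append(TENS[tens])
    if ones:
        words.append(ONES[ones])
    return words


def number_to_turkish(n):
    """Spell out an integer in the range 0..999999 in Turkish."""
    if not 0 <= n < 1_000_000:
        raise ValueError("only 0..999999 is supported in this sketch")
    if n == 0:
        return "sıfır"
    thousands, rest = divmod(n, 1000)
    words = []
    if thousands:
        if thousands > 1:         # 1000 is "bin", not "bir bin"
            words.extend(_below_thousand(thousands))
        words.append("bin")
    words.extend(_below_thousand(rest))
    return " ".join(words)


def expand_numbers(text):
    """Replace standalone runs of 1 to 6 ASCII digits with Turkish words."""
    return re.sub(r"\b[0-9]{1,6}\b",
                  lambda m: number_to_turkish(int(m.group())), text)


if __name__ == "__main__":
    print(expand_numbers("2003 yılında 45 kişi"))
```

Longer digit strings are left untouched by expand_numbers, so a fuller front end would need to decide how to read them, for example digit by digit for telephone numbers.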
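The kind of input that Mbrola expects can be sketched as follows. Mbrola reads a phoneme file in which each line gives a phoneme symbol, its duration in milliseconds, and optionally pairs of values giving a position within the phoneme (as a percentage) and a pitch target in Hz, which together form the piecewise linear pitch description. The helper below is only an illustration; the function name is hypothetical, and the phoneme symbols and prosody values in the example are assumptions, since the valid symbols depend on the diphone database in use.

```python
# Minimal sketch: format phonemes and prosody as Mbrola-style input lines.
def format_pho(segments):
    """Build phoneme-file text from (phoneme, duration_ms, pitch_points) tuples.

    pitch_points is a list of (position_percent, pitch_hz) pairs and may be empty.
    """
    lines = []
    for phoneme, duration_ms, pitch_points in segments:
        fields = [phoneme, str(duration_ms)]
        for position_percent, pitch_hz in pitch_points:
            fields.append(str(position_percent))
            fields.append(str(pitch_hz))
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Assumed symbols and values for the word "oku"; "_" denotes silence.
    segments = [
        ("_", 100, []),
        ("o", 90, [(0, 120), (100, 130)]),
        ("k", 70, []),
        ("u", 110, [(50, 110)]),
        ("_", 100, []),
    ]
    print(format_pho(segments), end="")
```

A complete TTS system has to produce such phoneme and prosody data from raw text itself, which is the step that Mbrola leaves to the caller.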
Results:
This version of the project will be called Oku4.
The results of this project will be made available to the public, freely
and without restrictions, through this web site
(http://www.cs.bilkent.edu.tr/~guvenir/Oku4).
Principal Investigator:
H. Altay Guvenir, Ph.D.
Investigator:
Engin Demir, MSc.
Investigators:
Celal Ziftci, Firat Kart, Orhan Uctepe.
Duration: June 2003 - June 2004.
Sponsor: Microsoft Research Ltd.
Contract No: 2003-239