Objective(s)
The goal of this project is to develop content-based
retrieval techniques for historical Ottoman documents stored as textual
images.
Project Members
Faculty
Ph.D./M.S. Students
Undergraduates
Project Description
There is an accelerating demand for access to the
visual content of documents stored in historical and cultural archives.
The availability of electronic imaging tools and effective image processing
techniques makes it feasible to process such multimedia data in large databases.
In this project, a framework for content-based retrieval of historical documents
in the Ottoman Empire archives is presented. The documents are stored as textual
images, which are compressed by constructing a library (codebook) of the symbols
occurring in each document and then replacing every symbol in the original image
with a pointer into the codebook, yielding a compressed representation of the
image. Symbols are extracted using features in the wavelet and spatial domains
that are based on the angular and distance spans of shapes. To perform
content-based retrieval in historical archives, a query is specified as a
rectangular region in an input image, and the same symbol-extraction process is
applied to the query region. Queries are processed on the codebooks of the
documents, and the query images are located in the resulting documents using the
pointers in the textual images. Because querying works on the codebooks and
pointers, it does not require decompression of the images. The framework is also
applicable to many other document archives that use different scripts.
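The codebook compression and pointer-based querying described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: symbols are assumed to be already segmented into binary bitmaps with (x, y) positions; the features are a simplified histogram of ink pixels over concentric rings (distance span) and angular sectors (angular span) around the center of mass; the wavelet-domain features are omitted; a document is reported if any query symbol matches; and the names `span_features`, `Codebook`, `compress`, `query` and the 0.05 matching threshold are hypothetical.

```python
import math
from dataclasses import dataclass, field


def span_features(bitmap, n_rings=4, n_sectors=8):
    """Simplified distance-span and angular-span histogram of a binary symbol.

    Hypothetical stand-in for the project's features: ink pixels are counted in
    n_rings concentric rings and n_sectors angular sectors around the center of
    mass, and each count is normalized by the total number of ink pixels.
    """
    pts = [(r, c) for r, row in enumerate(bitmap) for c, v in enumerate(row) if v]
    if not pts:
        return [0.0] * (n_rings + n_sectors)
    cy = sum(r for r, _ in pts) / len(pts)
    cx = sum(c for _, c in pts) / len(pts)
    dists = [math.hypot(r - cy, c - cx) for r, c in pts]
    max_d = max(dists) or 1.0  # avoid division by zero for a single pixel
    rings = [0] * n_rings
    sectors = [0] * n_sectors
    for (r, c), d in zip(pts, dists):
        rings[min(int(d / max_d * n_rings), n_rings - 1)] += 1
        angle = math.atan2(r - cy, c - cx) % (2 * math.pi)
        sectors[min(int(angle / (2 * math.pi) * n_sectors), n_sectors - 1)] += 1
    return [count / len(pts) for count in rings + sectors]


def feature_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Codebook:
    """Per-document library of distinct symbols, stored as feature vectors."""
    threshold: float = 0.05  # hypothetical matching tolerance
    entries: list = field(default_factory=list)

    def lookup(self, feats):
        """Index of the closest entry within the threshold, or None."""
        best, best_d = None, self.threshold
        for i, entry in enumerate(self.entries):
            d = feature_distance(feats, entry)
            if d <= best_d:
                best, best_d = i, d
        return best

    def add_or_match(self, feats):
        """Reuse a matching entry, or append the symbol as a new entry."""
        idx = self.lookup(feats)
        if idx is None:
            self.entries.append(feats)
            idx = len(self.entries) - 1
        return idx


def compress(symbols, threshold=0.05):
    """Compress one document given as [(x, y, bitmap), ...].

    Returns (codebook, pointers), where each pointer is (codebook_index, x, y).
    """
    book = Codebook(threshold)
    pointers = [(book.add_or_match(span_features(bm)), x, y) for x, y, bm in symbols]
    return book, pointers


def query(query_symbols, archive):
    """Locate query symbols in compressed documents without decompressing them.

    archive maps doc_id -> (codebook, pointers); returns doc_id -> [(x, y), ...].
    """
    q_feats = [span_features(bm) for _, _, bm in query_symbols]
    hits = {}
    for doc_id, (book, pointers) in archive.items():
        # Match against the codebook only, then follow the pointers.
        wanted = {book.lookup(f) for f in q_feats} - {None}
        locations = [(x, y) for idx, x, y in pointers if idx in wanted]
        if locations:
            hits[doc_id] = locations
    return hits


if __name__ == "__main__":
    bar_v = [[1], [1], [1]]  # toy vertical stroke
    bar_h = [[1, 1, 1]]      # toy horizontal stroke
    archive = {
        "a": compress([(0, 0, bar_v), (10, 0, bar_h), (20, 0, bar_v)]),
        "b": compress([(0, 0, bar_h)]),
    }
    print(query([(0, 0, bar_v)], archive))  # {'a': [(0, 0), (20, 0)]}
```

In this sketch, `query` compares the query features only against each document's codebook entries and then scans the pointer list, so no document image is reconstructed; this mirrors the no-decompression property described in the project description.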
Publications
The refereed
publications for this research include:
-
E. Saykol, A. K. Sinop, U. Gudukbay, Ö. Ulusoy, E. Cetin.
Content-Based Retrieval of Historical Ottoman Documents Stored as Textual Images.
IEEE Transactions on Image Processing, vol. 13, no. 3, 2004.
-
I. Z. Yalniz, I. S. Altingovde, U. Gudukbay, Ö. Ulusoy.
Ottoman Archives Explorer: A Retrieval System for Digital Ottoman Archives.
ACM Journal on Computing and Cultural Heritage, vol. 2, no. 3, article no. 8, 20 pages, 2009.
-
I. Z. Yalniz, I. S. Altingovde, U. Gudukbay, Ö. Ulusoy.
Integrated Segmentation and Recognition of Connected Ottoman Script.
Optical Engineering, vol. 48, no. 11, article no. 117205, 12 pages, 2009.
Demo
The prototype system developed for content-based
retrieval of digital Ottoman Archives can be accessed
here.
Data
Two data sets containing Ottoman document scans were used in our studies:
Set 1 (38 documents, 118 MB) and Set 2 (43 documents, 15 MB). Set 1 was
obtained from the following resources (see our publications for detailed
information).
-
A. Alparslan, Osmanlı Hat Sanatı Tarihi, Yapı Kredi Yayınları, 1999.
-
M. U. Derman, Hat
Koleksiyonundan Seçmeler, Sakıp Sabancı Müzesi, Sabancı Üniversitesi, 2002.
-
H. M. H. Hakkak-zade Mustafa Hilmi Efendi, Mizanü’l-hatt, Istanbul, 1986.
-
M. Ülker, Başlangıçtan Günümüze Türk Hat Sanatı, Türkiye İş Bankası Kültür Yayınları, 1987.
-
E. Çetin, Ö. N. Gerek, and A. H. Tewfik, “The Topkapi Palace Museum,”
Museum Int., vol. 1, no. 1, pp. 22–25, 2000.
Set 2 was obtained from a textbook for teaching Ottoman Turkish.