All technical report files are in compressed (gzip) postscript format.
The file names are of the sort BU-CE-YYXX.ps.gz or BU-CE-YYXX.pdf.
BU-CE-0101
TITLE: Stereoscopic View-Dependent Visualization of Terrain Height Fields
AUTHORS: Ugur Güdükbay and Türker Yilmaz
ABSTRACT: Visualization of large geometric environments has always been an
important problem of computer graphics. In this paper, we present a
framework for the stereoscopic view-dependent visualization of large
scale terrain models. We use a quadtree based multiresolution
representation for the terrain data. This structure is queried to
obtain the view-dependent approximations of the terrain model at
different levels of detail. In order not to loose depth information,
which is crucial for the stereoscopic visualization, we make use of a
different simplification criterion, namely distance-based angular
error threshold. We also present an algorithm for the construction of
stereo pairs in order to speed up the view-dependent stereoscopic
visualization. The approach we use is the simultaneous generation of
the triangles for two stereo images using a single draw-list so that
the view frustum culling and vertex activation is done only once for
each frame. The cracking problem is solved using the dependency
information stored for each vertex. We eliminate the popping artifacts
that can occur while switching between different resolutions of the
data using morphing. We implemented the proposed algorithms on
personal computers and graphics workstations. Performance experiments
show that the second eye image can be produced approximately 45 \%
faster than drawing the two images separately and a smooth
stereoscopic visualization can be achieved at interactive frame rates
using continuous multi-resolution representation of height fields.
Keywords: Stereoscopic visualization, terrain height fields,
multiresolution rendering, quadtrees.
BU-CE-0102
TITLE: Analysis and Presentation of Interesting Rules
AUTHORS: Turker Yilmaz and H. Altay Güvenir
ABSTRACT: Finding interesting rules from a large rule set is one of
the most important aims of data mining research. In this paper, we
present an algorithm and its application to finding and presenting
interesting rules from a large set of previously discovered rules. Our
domain of experimentation is the differential diagnosis of
erythemato-squamous diseases. The application domain contains records
patients with known diagnoses.
Keywords: Data mining, rule interestingness, rule surprisingness
BU-CE-0103
TITLE: Categorization in a Hierarchically Structured Text Database
AUTHOR: Ferhat Kutlu
ABSTRACT:
Over the past two decades there has been a huge increase in the amount
of data being stored in databases and the on-line flow of data by
the effects of improvements in Internet. This huge increase brought
out the needs for intelligent tools to manage that size of data and
its flow. Hierarchical approach is the best way to satisfy these needs
and it is so widespread among people dealing with databases and Internet.
Usenet newsgroups system is one of the on-line databases that have
built-in hierarchical structures. Our point of departure is this
hierarchical structure which makes categorization tasks easier and faster.
In fact most of the search engines in Internet also exploit inherent
hierarchy of Internet. Growing size of data makes most of the
traditional categorization algorithms obsolete. Thus we developed a
brand-new categorization learning algorithm which constructs an index
tree out of Usenet news database and then decides the related
newsgroups of a new news by categorizing it over the index tree. In
learning phase it has an agglomerative and bottom-up hierarchical
approach. In categorization phase it does an overlapping and
supervised categorization. K-Nearest Neighbor categorization
algorithm is used to compare the complexity measure and accuracy of
our algorithm. This comparison does not only mean comparing two
different algorithms but also means comparing hierarchical approach
vs. flat approach, similarity measure vs. distance measure and
importance of accuracy vs. importance of speed. Our algorithm prefers
hierarchical approach and similarity measure, and greatly outperforms
k-Nearest Neighbor categorization algorithm in speed with
minimal loss of accuracy.
Keywords: Learning, categorization, clustering, hierarchy, Usenet,
newsgroup, top-level, header-line, posting, frequency, norm-scaling,
similarity measure, distance measure, agglomerative, bottom-up,
stemming, stopword, index.
BU-CE-0104
TITLE: Application of K-NN and FPTC Based Text Categorization
Algorithms to Turkish News Reports
AUTHOR: Ufuk Ilhan
ABSTRACT:
New technological developments, such as easy access to Internet,
optical character readers, high-speed networks and inexpensive massive
storage facilities, have resulted in a dramatic increase in the
availability of on-line text-newspaper articles, incoming (electronic)
mail, technical reports, etc. The enormous growth of on-line
information has led to a comparable growth in the need for methods
that help users organize such information. Text Categorization may be
the remedy of increased need for advanced techniques. Text
Categorization is the classification of units of natural language
texts with respect to a set of pre-existing categories. Categorization
of documents is challenging, as the number of discriminating words can
be very large. This thesis presents compilation of a Turkish dataset,
called Anadolu Agency Newsgroup in order to study in Text Categorization.
Turkish is an agglutinative language in which words contain no direct
indication where the morpheme boundaries are, furthermore, morphemes
take a shape dependent on the morphological and phonological
context. In Turkish, the process of adding one suffix to another can
result in a relatively long word, furthermore, a single Turkish word can
give rise to a very large number of variants. Due to this complex
morphological structure, Turkish requires text processing techniques
different than English and similar languages. Therefore,
besides converting all words to lower case and removing punctuation
marks, some preliminary work is required such as stemming, removal of
stop words and formation of a keyword list.
Keywords: Text categorization, classification, feature projections,
stemming, wild card matching, stopword.
BU-CE-0105
TITLE: Quasi-Birth-and-Death Processes with Level-Geometric Distribution
AUTHOR: Tugrul Dayar and Franck Quessette
ABSTRACT: A special class of homogeneous continuous-time
quasi-birth-and-death (QBD) Markov Chains (MCs) which possess
level-geometric (LG) stationary distribution are considered. Assuming
that the stationary vector is partitioned by levels into subvectors,
in an LG distribution all stationary subvectors beyond a finite level
number are multiples of each other. Specifically, each pair of
stationary subvectors that belong to consecutive levels are related by
the same scalar, hence the term level-geometric. Necessary and
sufficient conditions are specified for the existence of such a
distribution and the results are elaborated on three examples.
Keywords: Markov chains, QBD processes, geometric distributions
BU-CE-0106
TITLE: An Eager Regression Method Based On Best Feature Projections
AUTHOR: Tolga Aydin and H. Altay Güvenir
ABSTRACT: This paper describes a machine learning method, called
Regression by Selecting Best Feature Projections (RSBFP). In the training
phase, RSBFP projects the training data on each feature dimension and
aims to find the predictive power of each feature attribute by
constructing simple linear regression lines, one per each continuous
feature and number of categories per each categorical feature. Because,
although the predictive power of a continuous feature is constant, it
varies for each distinct value of categorical features. Then the simple
linear regression lines are sorted according to their predictive power.
In the querying phase of learning, the best linear regression line and
thus the best feature projection are selected to make predictions.
Keywords: Prediction, feature projection, regression
BU-CE-0107
TITLE: Detection of Abnormal ECG Recordings using Feature Intervals
AUTHOR: H. Altay Güvenir
ABSTRACT:
A new classification algorithm, called CFI (for {\it Classification on
Feature Intervals}), is developed and applied to problem of detecting
abnormal ECG signals. The domain contains records of patients with
known diagnosis. Given a training set of such records the CFI
algorithm learns how to detect arrhythmia. CFI represents a concept in
the form of {\it feature intervals} on each feature dimension
separately. Classification in the CFI algorithm is based on a
real-valued voting. A genetic algorithm is used to select the set of
relevant features. The CFI algorithm can decline to make a prediction
its confidence level is low. The performance of the CFI classifier is
evaluated empirically in terms of classification accuracy and running
time.
Keywords: ECG, arrhythmia detection, feature intervals, feature
selection, classification, machine learning
BU-CE-0108
TITLE: SQL-TC: A Topic-Centric Query Language for Web-Based
Information Resources
AUTHOR: İsmail Sengör Altıngövde, Selma Ayşe Özel, Özgür Ulusoy,
Gültekin Özsoyoğlu, Zehra Meral Özsoyoğlu
ABSTRACT:
This report deals with the problem of modeling web information
resources using expert knowledge and personalized user information,
and querying them in terms of topics and topic relationships. We
propose a "web information space" metadata model for web information
resources, and a query language SQL-TC (Topic-Centric SQL) to query
the model. The web information space model is composed of web-based
information resources (XML or HTML documents on the web), expert
advice repositories (domain-expert-specified metadata for information
resources as XTM documents), and personalized information about users
(captured as user profiles, that indicate users' preferences as to
which expert advice they would like to follow, and which to ignore,
etc., as XML documents). Expert advice is specified using topics and
relationships among topics (called metalinks), along the lines of the
recently proposed topic maps. Experts attach importance values and
domains to topics and metalinks that they specify. And, users declare
to accept, reject or "don-t-care" about the choices that experts
make. The query language SQL-TC makes use of the metadata information
provided in expert advice repositories and embedded in information
resources, and employs user preferences to further refine the query
output. Query output objects/tuples are ranked with respect to the
(expert-judged and user-preference-revised) importance values of
requested topics/metalinks, and the query output is limited by either
top n-ranked objects/tuples, or objects/tuples with importance values
above a given threshold, or both.Therefore, the query output of SQL-TC
is expected to produce highly relevant and semantically related
responses to user queries within short amounts of time.
BU-CE-0109
TITLE: Fast Stereoscopic View-Dependent Visualization of Terrain
Height Fields
AUTHOR: Türker Yilmaz
ABSTRACT:
Visualization of large geometric environments has always
been an important problem of computer graphics. In this thesis, we
present a framework for the stereoscopic view-dependent
visualization of large scale terrain models. We use a quadtree
based multiresolution representation for the terrain data. This
structure is queried to obtain the view-dependent approximations
of the terrain model at different levels of detail. In order not
to loose depth information, which is crucial for the stereoscopic
visualization, we make use of a different simplification
criterion, namely distance-based angular error threshold. We also
present an algorithm for the construction of stereo pairs in order
to speed up the view-dependent stereoscopic visualization. The
approach we use is the simultaneous generation of the triangles
for two stereo images using a single draw-list so that the view
frustum culling and vertex activation is done only once for each
frame. The cracking problem is solved using the dependency
information stored for each vertex. We eliminate the popping
artifacts that can occur while switching between different
resolutions of the data using morphing. We implemented the
proposed algorithms on personal computers and graphics
workstations. Performance experiments show that the second eye
image can be produced approximately 45% faster than drawing the
two images separately and a smooth stereoscopic visualization can
be achieved at interactive frame rates using continuous
multi-resolution representation of height fields.
Keywords: Stereoscopic visualization, terrain height fields,
multiresolution rendering, quadtrees.
BU-CE-0110
TITLE: On-Line New Event Detection and Tracking in a Multi-Resource
Environment
AUTHOR: Hakan Kurt
ABSTRACT:
As the amount of electronically available information resources
increase, the need for information also increases. Today, it is
almost impossible for a person to keep track all the information
resources and find new events as soon as possible. In this
thesis, we present an on-line new event detection and tracking
system, which automatically detects new events from multiple news
resources and immediately start tracking events as they evolve.
Since we implemented the on-line version of event detection
approach, the novelty decision about a news story is done before
processing the next one. We also present a new threshold, called
support threshold, used in detection process to decrease the
number of new event alarms that are caused by informative and
one-time-only news. The support threshold can be used to tune
the weights of news resources. We implemented the tracking phase
as an unsupervised learning process, that is, detected events
are automatically tracked by training the system using the first
news story of an event. Since events evolve over time, an
unsupervised adaptation is used to retrain the tracking system
in order to increase the tracking system performance. Adaptation
is achieved by adding predicted documents to the training
process. From the corpus observations, we conclude that one news
story can be associated to more than one event. For this reason,
the tracking system can relate a news story to more than one
event. The on-line new event detection and tracking system has
been tested on the Reuters news feed, available on the Internet.
The Reuters news feed, that we used, comprises four independent
news resources. The news stories are in Turkish.
Keywords: Event detection, event tracking, information retrieval.
BU-CE-0111
TITLE: A Rule-Based Video Database System Architecture
AUTHOR: Mehmet Emin Dönderler, Özgür Ulusoy and
Ugur Güdükbay
ABSTRACT:
We propose a novel architecture for a video database system
incorporating both spatio-temporal and semantic (annotation,
event/activity and category-based) query facilities. The originality
of our approach lies in the fact that we intend to provide full
support for spatio-temporal, relative object-motion and
similarity-based object-trajectory queries by a rule-based system
utilizing a knowledge-base while using an object-relational database
to answer semantic-based queries. Our method of extracting and
modeling spatio-temporal relations is also a unique one such that we
segment video clips into shots using spatial relationships between
objects in video frames rather than applying a traditional scene
detection algorithm. The technique we use is simple, yet novel and
powerful in terms of effectiveness and user query satisfaction: video
clips are segmented into shots whenever the current set of relations
between objects changes and the video frames, where these changes
occur, are chosen as key frames. The directional, topological and
third-dimension relations used for shots are those of the key frames
selected to represent the shots and this information is kept, along
with frame numbers of the key frames, in a knowledge-base as Prolog
facts. The system has a comprehensive set of inference rules to reduce
the number of facts stored in the knowledge-base because a
considerable number of facts, which otherwise would have to be stored
explicitly, can be derived by rules with some extra effort.
Keywords: Content-Based Retrieval, spatio-temporal relations, video
databases, knowledge representation, rule-based video modeling,
inference rules, information systems.
BU-CE-0112
TITLE: Mining User Access Patterns and Identity Information
from Web Logs for Effective Personalization
AUTHOR: Esra Satiroglu
ABSTRACT:
Web is a huge source of data in terms of its usage as a result of
being visited by millions of people on each day. Its usage data is
stored in web server logs which contain a detailed description of
every single hit taken by the corresponding web server. Recently, it
has been discovered that analysis of this data for understanding the
user behavioral patterns may have critical implications. Understanding
the behavioral patterns of visitors is especially important for
e-commerce companies which try to gain customers and sell products
through the web. Interactive sites that recognize their customer and
customize themselves accordingly may save lots of money to the
companies. Usage Based Personalization is a study on designing such
personalized sites. In this thesis, we present a new usage based
personalization system. The system we designed and implemented is
capable of guessing the web pages that may be requested by the on-line
visitors during the rest of their visits. The system shows the subset
of these pages with highest scores as recommendations to the visitors
as being attached to the original pages. The system has two major
modules. The off-line module mines the log files off-line for
determining the behavioral patterns of the previous visitors of the
web site considered. The information obtained by the off-line module
is utilized by the on-line module of the system for recognizing new
visitors and producing on-line recommendations. The first criterion
for identifying on-line visitors is the paths followed by them. A path
of a particular visitor consists of pages retrieved by him throughout
his visit to the web site. Another criterion considered by the system
is the identity information (IP address or domain name) of the
visitors. By using identity information, it is possible to learn old
preferences of the visitor himself or visitors from similar domains.
The similarity between domains is determined by the help of the domain
name hierarchy which is represented by a hierarchical tree
structure. The leaves of the tree representing domain name hierarchy
contain the domain names of the machines connecting to the Internet
while the inner nodes contain the general domains such as
com, edu.tr, etc. In domain name hierarchy, the similarity between two
domains increases as the number of common parents of them
increases. In the light of these observations, for guessing the
navigational behavior of a particular visitor, our system makes use of
the common behavioral trends of the visitors whose machines belong to
the parent domains of the domain of that visitor. We have tested the
system on the web site of CS department of Bilkent University. The
results of the experiments show the efficiency and applicability of
the system.
Keywords: Web log mining, access pattern, personalization.
BU-CE-0113
TITLE: Active and Mobile Data Management Through Event History Mining
AUTHOR: Yucel Saygin
ABSTRACT:
An event history is a collection of events that have occurred in an
event-based system over a period of time. There can be various types
of events, among which are the temperature changes and power demands
in a power management system, client requests for data items in a
broadcast system, price increase of a stock in stock market, and so
on. There is a lot of interesting information that can be extracted
from an event history via data mining techniques. Our purpose in this
thesis is to propose methods for extracting this useful information in
the form of event sequences and event associations from a single or
two correlated event histories. We also aim to show how the results of
the mining process can be used for active and mobile data
management. The results of the mining process demonstrates the
relationships among the events which are generally captured as
associations or sequences. The relationships among the events are
shown to be a useful tool to enhance an event-based system via event
organization, predictive event detection, and proactive rule
execution.
We consider the mining of both a single event history and
two correlated event histories. We first propose a method for mining
binary relationships from a single event history. The binary
relationships among events are used to organize the events into
related groups of event. This organization is important because the
number of events in an event-driven system may become very high and
unmanageable. The groups of related events and the relationships
among the events are exploited for predictive event detection and
proactive rule execution in active database systems. We also consider
the mining of two correlated event histories which are disjoint and
the events in one history are related to the events in the other
history. We describe how we can efficiently extract associations
among the events spanning different event histories, which we call
cross associations.
We have chosen data broadcast in mobile computing environments as a
case study for active data management. One of the important facts in
mobile computing environments with wireless communication medium is
that the server-to-client (downlink) communication bandwidth is much
higher than the client-to-server (uplink) communication
bandwidth. This asymmetry makes the dissemination of data to client
machines a desirable approach. However, the dissemination of data by
broadcasting may induce high access latency in case the number of
broadcast data items is large. Our purpose is to show how active data
management can be used to improve mobile data management through
broadcast data organization and prefetching from the broadcast
medium. In order to achieve this, the client requests of data items at
the server are considered as events and the chronological sequence of
items that have been requested by clients is considered as an event
history. We call the event history in broadcast medium a broadcast
history. The first step in our work is to analyze the broadcast
history to discover sequential patterns describing the client access
behavior. The sequential patterns are used to organize the data items
in the broadcast disk in such a way that the items requested
subsequently are placed close to each other. Then we utilize the
predictive event detection techniques to improve the cache hit ratio
to be able to decrease the access latency. Prediction of future
client access behavior enables clients to prefetch the data from the
broadcast disk based on the rules extracted from the broadcast
history.
BU-CE-0114
TITLE:
Animation of Human Motion with Inverse Kinematics using Nonlinear Programming
AUTHOR: A. Sezgin Abali
ABSTRACT:
Animation of articulated figures has always been an interesting
subject of computer graphics due to a wide range of applications,
like military, ergonomic design etc. An articulated figure is
usually modelled as a set of segments linked with joints.
Changing the joint angles brings the articulated figure to a new
posture. An animator can define the joint angles for a new posture
(forward kinematics). However, it is difficult to estimate the
exact joint angles needed to place the articulated figure to a
predefined position. Instead of this, an animator can specify
the desired position for an end-effector, and then an algorithm
computes the joint angles needed (inverse kinematics). In this
thesis, we present the implementation of an inverse kinematics
algorithm using nonlinear optimization methods. This algorithm
computes a potential function value between the end-effector
and the desired posture of the end-effector called goal.
Then, it tries to minimize the value of the function.
If the goal cannot be reached due to constraints then
an optimum solution is found and applied by the algorithm.
The user may assign priority to the joint angles by scaling
initial values estimated by the algorithm. In this way, the
joint angles change according to the animator?s priority.
Keywords: animation, human motion, inverse kinematics,
nonlinear programming, optimization
BU-CE-0115
TITLE:
Realistic Rendering of Human Models
AUTHOR: Gultekin Arabaci
ABSTRACT:
Realistic rendering of human models has an increasing importance in
computer graphics. Simulation of muscle bulging and proper
deformations of skin at joints, makes human animation more
realistic. In this thesis, we describe a layered human animation
system in which muscles move with respect to the bones and the skin
deforms according to the muscles. Muscles are modelled as ellipsoids
and their shapes are deformed with the movements of the bones
represented by sticks in the skeleton. The skin is anchored to the
muscles and it changes its shape as the muscles bulge. Every muscle
may have different user defined bulging ratios according to the joint
movements.
Keywords:
Human animation, skin and muscle deformation, 3D modelling,
rendering.
BU-CE-0116
TITLE:
Web-based User Interface for Query Specification in a Video Database System
AUTHOR: Ediz Saykol
ABSTRACT:
The enhancements in multimedia technology have accelerated the
development of new techniques for querying and retrieval in video
databases. Based on these new techniques, effective user
interfaces are designed especially for query specification. Since
the Internet is a suitable multiple client environment, most of
the video database systems are designed to work over the Internet.
In this thesis, a web-based user interface for a video database
system is presented. The interface is composed of a hierarchical
windows-based structure where logically separable sub-queries are
handled in separate windows. For example, spatial and motion
queries are handled separately. The separation of the interface
aims to facilitate the query specification process. Furthermore,
the user is allowed to get partial results for the sub-queries.
The retrieval and result presentation process displays the query
results in the form of video frame sequences on a separate window
with the capability of playing each result separately. As another
contribution of this thesis, considering the fact that most of the
systems include querying by object features, a novel approach for
color and shape querying and retrieval is proposed. The proposed
approach is histogram-based and it is shown through the
performance experiments that it overcomes the deficiencies of the
existing methods. This new approach is supported by an auxiliary
object extraction tool in the query-by-feature sub-system of the
video database system. The object extraction tool is
semi-automatic, hence it can successfully capture salient object
regions for most of the images and/or video frames.
Keywords:
Web-based query specification interface, object
extraction, flood fill for extraction, distance histogram, color
histogram, kinematics model for polygon simplification.