All technical reports files are in compressed (gzip) postscript format.
The file names are of the sort BU-CEIS-YYXX.ps.z
BU-CEIS-9701
TITLE: Design and Implementation of a Computational Lexicon for
Turkish
(M.Sc. Thesis)
AUTHOR: Kurtulus Yorulmaz
ABSTRACT:
All natural language processing systems (such as parsers, generators,
taggers) need to have access to a lexicon about the words in the
language. This thesis presents a lexicon architecture for natural
language processing in Turkish. Given a query form consisting of a
surface form and other features acting as restrictions, the lexicon
produces feature structures containing morphosyntactic, syntactic, and
semantic information for all possible interpretations of the surface
form satisfying those restrictions. The lexicon is based on
contemporary approaches like feature-based representation,
inheritance, and unification. It makes use of two information
sources: a morphological processor and a lexical database containing
all the open and closed-class words of Turkish. The system has been
implemented in SICStus Prolog as a standalone module for use in
natural language processing applications.
BU-CEIS-9702
TITLE: Inductive Synthesis of Recursive Logic Programs: Achievements and Prospects
AUTHORS: Pierre Flener and Serap Yilmaz
ABSTRACT:
The synthesis of recursive logic programs from incomplete
information, such as input/output examples, is a challenging subfield
both of ILP (Inductive Logic Programming) and of the synthesis (in
general) of logic programs from formal specifications. We first
survey
past and present achievements, focusing on the techniques that mostly
aim at the synthesis of recursive logic programs, dispensing thus with
many more general ILP techniques that can also induce non-recursive
hypotheses. Then we analyze the prospects of all inductive techniques
in this task, whether they are tailored or not for the synthesis of
recursive programs, investigating their applicability to software
engineering and to knowledge acquisition and discovery.
BU-CEIS-9703
TITLE: Specifications are Necessarily Informal or: The Ultimate Myths of Formal Methods
AUTHOR(S): Baudouin Le Charlier (University of Namur, Belgium)
Pierre Flener (Bilkent University, Turkey)
ABSTRACT:
We reconsider the concept of specification in order to bring new
insights into the debate of formal versus non-formal methods in
computer science. In our view, a specification is the link between
the program (formality) and its meaning (informality). Since the
purpose of a program must be something directly understandable,
specifications are the essential tool for constructing, in practice,
correct real-world programs through explicit, yet not completely
automatable reasoning. This allows us to explain why formal
specifications cannot meet the demands of specifications in our sense,
since this would be a contradiction in terms. Our view of
specifications does not imply a rejection of all ideas put forward in
the literature on formal methods. On the contrary, we agree with the
proponents of formal methods on most of their arguments, except on the
fact that specifications had better be written in a formal
specification language. And, inevitably, we also disagree with
other arguments that are a consequence of this assumption.
Finally, we examine why the role and nature of specifications are
so often misunderstood.
BU-CEIS-9704
TITLE: Tagging English by Path Voting Constraints
AUTHOR(S): Gokhan Tur, Kemal Oflazer and Nihat Ozkan
ABSTRACT: We present a constraint-based tagging approach where
individual constraints vote on sequences of matching tokens and
tags. Disambiguation of all the tokens in a sentence is performed at
the very end by selecting tags that appear on the path that receives
highest vote. This constraint application paradigm makes the outcome
of the disambiguation independent of the rule sequence, and
hence relieves the rule developer from worrying about potentially
conflicting rule sequencing found in other systems. We have applied
this approach to tagging English text from the Energy Abstracts
section of the ACL/DCI CD. Our preliminary results indicate that using
about 400 constraint rules, we can attain a recall of 98.2% and a
precision of 97.1 with about 1.01 parses per token on previously
seen text, and recall of 95.9% and a precision of 93.5% with about
1.025 parses per token on previously unseen text. Our current system
is a prototype implementation, and we propose an efficient
implementation based on finite state transducers.
BU-CEIS-9705
TITLE: Program Schemas as Steadfast Programs and their Usage in Deductive Synthesis
AUTHOR(S): Pierre Flener (Bilkent University, Turkey),
Kung-Kiu Lau (University of Manchester, UK)
ABSTRACT:
A program schema is an abstraction of a class of
actual programs, in the sense that it represents their data-flow
and control-flow, but does not contain (all) their actual
computations nor (all) their actual data structures. We show
that schemas can be expressed as first-order open programs, that
is where some of the used relations are left undefined. Compared
to higher-order representations, this considerably simplifies the
semantics and manipulations of schemas. Actually, our schemas
are steadfast open programs, expressed in the first-order sorted
language of an axiomatisation (called framework) of the
application domain. We give correctness and steadfastness
(parametric correctness) criteria, the latter entailing
a-priori-correct reusability. All this is illustrated by means
of a schema capturing the divide-and-conquer methodology, and we
derive the abstract conditions under which it is steadfast wrt an
arbitrary specification in an arbitrary framework. Finally, we
show how to use schemas for effectively guiding the deductive
synthesis of steadfast programs from complete specifications
(i.e. complete axiomatisations of the problem), and illustrate
this by developing a strategy for synthesising divide-and-conquer
programs as well as by actually synthesising a Quicksort program.
BU-CEIS-9706
TITLE: Generalized Generalization Generalizers
AUTHOR(S): Halime Buyukyildiz and Pierre Flener
ABSTRACT: Problem generalization can be effectively used in
program transformation. Schema-based program transformation,
embodying any of the tupling, descending, or simultaneous
tupling-and-descending generalization techniques, gives rise to
full automation of the generalization process. We generalize
known generalization schemas (or: generalizers), and validate
them, based on the notions of correctness of a program,
steadfastness of a program in a set of specifications, and
equivalence of two programs. Finally, we evaluate our
generalization schemas by performance tests done on the input and
output programs of each generalization schema.
BU-CEIS-9707
TITLE: Concurrent Rule Execution In Active Databases
AUTHOR(S): Yucel Saygin (Bilkent University, Turkey)
Ozgur Ulusoy (Bilkent University, Turkey)
Sharma Chakravarthy(University of Florida, USA)
ABSTRACT: An active DBMS is expected to support concurrent as well as
sequential rule execution in an efficient manner. Nested transaction
model is a suitable tool to implement rule execution as it can handle
nested rule firing and concurrent rule execution well. In this paper,
we describe a concurrent rule execution model based on parallel nested
transactions. We discuss implementation details of how the flat
transaction model of OpenOODB has been extended by using Solaris
threads in order to support concurrent execution of rules. We also
provide a formal specification of the rule execution model using the
ACTA framework.
BU-CEIS-9708
TITLE: Classification by Voting Feature Intervals
AUTHORS: Gülsen Demiröz and H. Altay Güvenir
ABSTRACT:
A new classification algorithm called VFI (for Voting Feature
Intervals) is proposed. A concept is represented by a set of
feature intervals on each feature dimension separately.
Each feature participates in the classification by distributing
real-valued votes among classes. The class receiving the highest
vote is declared to be the predicted class. VFI is compared with the
Naive Bayesian Classifier, which also considers each feature
separately. Experiments on real-world datasets show that VFI achieves
comparably and even better than NBC in terms of classification
accuracy. Moreover, VFI is faster than NBC on all datasets.
BU-CEIS-9709
TITLE: Differential Diagnosis of Erythemato-Squamous Diseases
using Voting Feature Intervals
AUTHORS: Gülsen Demiröz, H. Altay Güvenir and
Nilsel Ilter
ABSTRACT:
A new classification algorithm, called VFI (for Voting Feature
Intervals), is developed and applied to Differential Diagnosis of
Erythemato-Squamous Diseases. The domain contains records of patients
with known diagnosis. Given a training set of such records the VFI
classifier learns how to differentiate a new case in the domain.
VFI represents a concept in the form of feature intervals on
each feature dimension separately.
Classification in the VFI algorithm is based on a real-valued
voting. Each feature equally participates in the voting process and
the class that receives the maximum amount of votes is declared to be
the predicted class. The performance of the VFI classifier is
evaluated empirically in terms of classification accuracy and running
time.
BU-CEIS-9710
TITLE: Design and Performance Evaluation of Indexing Metods for Dynamic
Attributes in Mobile Database Management Systems
(M.Sc. Thesis)
AUTHOR: Jamel Tayeb
ABSTRACT:
In traditional databases, we deal with static attributes
which change very infrequently over time and their change is handled
with an explicit update operation. In temporal databases, the time of
change of attributes is also important and every update creates a new
version. Attributes, typically change more frequently over time. A
more agitated category of attributes are the so-called dynamic
attributes whose value changes continuously over time, thus making it
impractical to explicitly update them as they change. In this thesis,
we conduct a performance evaluation study of two indexing methods for
dynamic attributes. These are based on the key idea of using a linear
function that describes the way the attribute changes over time and
allows us to predict its value in the future based on any value of it
in the past. The problem is rooted in the context of mobile data
management and draws upon the fields of spatial and temporal indexing.
We contribute various experimental results, mathematical analyses, and
improvement and optimization algorithms. Finally, inspired by a few of
the observed shortcomings of both of the studied techniques, we
propose a novel indexing method which we call the FP-Index which
is shown analytically to have promising prospects and to beat both
methods over most performance parameters.
BU-CEIS-9711-part1,
BU-CEIS-9711-part2,
BU-CEIS-9711-part3
TITLE: Experiments with Two-Stage Iterative Solvers and Preconditioned
Krylov Subspace Methods On Nearly Completely Decomposable Markov
Chains
(M.Sc. Thesis)
AUTHOR: Wail Gueaieb
ABSTRACT:
Preconditioned Krylov subspace methods are state-of-the-art iterative
solvers developed mostly in the last fifteen years that may be used,
among other things, to solve for the stationary distribution of Markov
chains. Assuming Markov chains of interest are irreducible, the problem
amounts to computing a positive solution vector to a homogeneous system
of linear algebraic equations with a singular coefficient matrix under
a normalization constraint. That is, the (n x 1) unknown
stationary vector x in
Ax=0, ||x||1=1
(1)
is sought. Hence A=I-PT, an n x n
singular M-matrix, and P is the one-step stochastic transition
probability matrix.
Albeit the recent advances, practicing performance analysts still
widely prefer iterative methods based on splittings when they want to
compare the performance of newly devised algorithms against existing
ones, or when they need candidate solvers to evaluate the performance
of a system model at hand. In fact, experimental results with Krylov
subspace methods on Markov chains, especially the ill-conditioned
nearly completely decomposable (NCD) ones, are few. We believe there
is room for research in this area specifically to help us understand
the effect of the degree of coupling of NCD Markov chains and their
nonzero structure on the convergence characteristics and space
requirements of preconditioned Krylov subspace methods.
The work of several researchers have raised important and
interesting questions that led to research in another, yet related
direction. These questions are the following: "How must one go
about partitioning the global coefficient matrix A in equation
(1) into blocks if the system is NCD and a two-stage iterative solver
(such as block successive overrelaxation---SOR) is to be employed? Are
block partitionings dictated by the NCD normal form of P
necessarily superior to others? Is it worth investing alternative
partitionings? Better yet, for a fixed labeling and partitioning of
the states, how does the performance of block SOR (or even that of
point SOR) compare to the performance of the iterative
aggregation-disaggregation (IAD) algorithm? Finally, is there any
merit in using two-stage iterative solvers when preconditioned Krylov
subspace methods are available?"
Experimental results show that in most of the test cases two-stage
iterative solvers are superior to Krylov subspace methods with the
chosen preconditioners, on NCD Markov chains. For two-stage iterative
solvers, there are cases in which a straightforward partitioning of
the coefficient matrix gives a faster solution than can be obtained
using the NCD normal form.
BU-CEIS-9712
TITLE: Iterative Methods Based on Splittings for Stochastic Automata Networks
(M.Sc. Thesis)
AUTHOR: Ertugrul Uysal
ABSTRACT:
This thesis presents iterative methods based on splittings (Jacobi,
Gauss--Seidel, Successive Over Relaxation) and their block versions
for Stochastic Automata Networks (SANs). These methods prove to be
better than the power method that has been used to solve SANs until
recently. Through the help of three examples we show that the time it
takes to solve a system modeled as a SAN is still substantial and it
does not seem to be possible to solve systems with tens of millions of
states on standard desktop workstations with the current state of
technology. However, the SAN methodology enables one to solve much
larger models than those could be solved by explicitly storing the
global generator in the core of a target architecture especially if
the generator is reasonably dense.
BU-CEIS-9713
TITLE: Correctness Proofs of Transformation Schemas
AUTHOR: Halime Buyukyildiz and Pierre Flener
ABSTRACT:
Schema-based logic program transformation has proven to be
an effective technique for the optimization of programs. Some
transformation schemas were given in the M.Sc. thesis of Buyukyildiz;
they pre-compile some widely used transformation techniques from an
input program schema that abstracts a particular family of programs
into an output program schema that abstracts another family of
programs. This report presents the correctness proofs of these
transformation schemas, based on a correctness definition of
transformation schemas. A transformation schema is correct iff the
templates of its input and output program schemas are equivalent wrt
the specification of the top-level relation defined in these program
schemas, under the applicability conditions of this transformation
schema.
BU-CEIS-9714
TITLE:
Schema-based Logic Program Transformation
(M.Sc. Thesis)
AUTHOR: Halime Buyukyildiz
ABSTRACT:
In traditional programming methodology, developing a
correct and efficient program is divided into two phases: in the
first phase, called the synthesis phase, a correct, but maybe
inefficient program is constructed, and in the second phase, called
the transformation phase, the constructed program is transformed into
a more efficient equivalent program. If the synthesis phase is guided
by a schema that embodies the algorithm design knowledge abstracting
the construction of a particular family of programs, then the
transformation phase can also be done in a schema-guided fashion using
transformation schemas, which encode the transformation techniques
from input program schemas to output program schemas by defining the
conditions that have to be verified to have a more efficient
equivalent program.
Seven program schemas are proposed, which capture sub-families of
divide-and-conquer programs and the programs that are constructed
using some generalization methods. The proposed transformation
schemas either automate transformation strategies, such as accumulator
introduction and tupling generalization, which is a special case of
structural generalization, or simulate and extend a basic theorem in
functional programming (the first duality law of the fold operators)
for logic programs. A prototype transformation system is presented
that can transform programs, using the proposed transformation
schemas.
BU-CEIS-9715
TITLE:
Non-Incremental Classification Learning Algorithms Based on
Voting Feature Intervals
(M.Sc. Thesis)
AUTHOR: Gülsen Demiröz
ABSTRACT:
Learning is one of the necessary abilities of an intelligent
agent. This thesis proposes several learning algorithms for
multi-concept descriptions in the form of feature intervals,
called Voting Feature Intervals (VFI) algorithms.
These algorithms are non-incremental classification learning
algorithms, and use feature projection based knowledge representation
for the classification knowledge induced from a set of preclassified
examples. The concept description learned is a set of intervals
constructed separately for each feature. Each interval carries
classification information for all classes. The classification of an
unseen instance is based on a voting scheme, where each feature
distributes its vote among all classes. Empirical evaluation of the
VFI algorithms have shown that they are the best performing algorithms
among other previously developed feature projection based methods in
terms of classification accuracy. In order to further improve the
accuracy, genetic algorithms are developed to learn the optimum
feature weights for any given classifier. Also a new crossover
operator, called continuous uniform crossover, to be used in
this weight learning genetic algorithm is proposed and developed
during this thesis. Since the explanation ability of a learning system
is as much important as its accuracy, VFI classifiers are supplemented
with a facility to convey what they have learned in a comprehensible
way to humans.
BU-CEIS-9716
TITLE:
Verb Instantiation by Concept Coherence and Refinement
AUTHOR: Abolfazl Fatholahzadeh (SUPELEC, France)
H. Altay Güvenir (Bilkent University, Turkey)
ABSTRACT:
This paper outlines a new approach for the verb instantiation problem
in the translation of a narrative text using both the theory of
event/state concept coherence and refinement with operators.
Our hypothesis is that sentences of a narrative source text are
conceptually coherent, and its appropriate translation should have the
same concept coherence properties.
The approach has been implemented and called TSGR (Target Sequence
Generation by Refinement).
TSGR transforms the input, a set of event/state concepts, into a
plausible sequence using refinement and concept coherence in
the target language.
A strategy for disambiguation of each verb containing multiple senses
is described.
TSGR has been tested for verb instantiations in English to French and
English to Turkish translations.
BU-CEIS-9717
TITLE:
Inductive Synthesis of Recursive Logic Programs
(M.Sc. Thesis)
AUTHOR: Serap Yilmaz
ABSTRACT:
The learning of recursive logic programs (i.e. the class
of logic programs where at least one clause is recursive) from
incomplete information, such as input/output examples, is a
challenging subfield both of ILP (Inductive Logic Programming) and of
the synthesis (in general) of logic programs from formal
specifications. This is an extremely important class of logic
programs, as the recent work on constructive induction shows that
necessarily invented predicates have recursive programs, and it even
turns out that their induction is much harder than the one of
non-recursive programs. We call this inductive program synthesis. We
introduce a system called DIALOGS-II (Dialogue-based Inductive and
Abductive LOGic Program Synthesizer-II) whose ancestor is DIALOGS. It
is a schema-guided, interactive, and non-incremental synthesizer of
recursive logic programs that takes the initiative and queries a
(possibly naive) specifier for evidence in her/his conceptual
language. It can be used by any learner (including itself) that
detects, or merely conjectures, the necessity of invention of a new
predicate. Moreover, due to its powerful codification of
"recursion-theory" into program schemata and schematic constraints, it
needs very little evidence and is very fast.
BU-CEIS-9718
TITLE:
A Redefinition of Least Generalizations and its Application to
Inductive Logic Program Synthesis
AUTHOR: Esra Erdem (University of Texas at Austin, USA)
Pierre Flener (Bilkent University, Turkey)
ABSTRACT:
The `classical' definition of the concept of least
generalization (under \theta-subsumption) of a clause set C is that it
is a single clause. Since such a unique clause is sometimes
over-general, we re-define this concept as being a minimal-sized set
of clauses, each member of this set being the least generalization
(under `classical' \theta-subsumption) of some subset of C. The
elements of these subsets are two by two compatible, in the sense that
their least generalizations (under `classical' \theta-subsumption) are
not too general. We show an algorithm for computing this redefined
concept.
The criterion for over-generality is of course problem-specific. We
design such a criterion for a problem frequently occurring in the
inductive synthesis of recursive logic programs, namely the closing of
an open program that was synthesized in a schema-biased way, but that
has one of the place-holders of the used schema still undefined.
After evidence for this undefined relation has been abduced, it needs
to be inductively generalized. A least generalization is over-general
if it is not admissible wrt a construction mode capturing the required
dataflow of the place-holder. We design a language for expressing
such construction modes (for any place-holder of any schema), and we
define powerful admissibility and compatibility criteria. We also
prove a few theorems relating the problem-independent concept of
compatibility with the problem-specific concept of admissibility:
these theorems show how to speed up certain computations for this
specific problem.
BU-CEIS-9719
TITLE:
Weighted K Nearest Neighbor Classification on Feature Projections
AUTHOR:
H. A. Guvenir and A. Akkus
ABSTRACT:
This paper proposes an extension to the k Nearest Neighbor algorithm
on Feature Projections, called kNNFP. The kNNFP algorithm has
been shown to achieve comparable accuracy with the well-known kNN
algorithm. However, kNNFP algorithm has a very low time complexity
compared to kNN.
The extension to kNNFP introduced here assigns weights to features,
therefore it is called WkNNFP, for {\it Weighted k Nearest
Neighbor on Feature Projections}. The paper also
introduces a weight learning algorithm, called SFA, for Single
Feature Accuracy. It is based on the assumption that the weight of a
feature is proportional with the accuracy that will be obtained by
considering only that feature. The SFA algorithm is not specific to
WkNNFP, so it can be used with many other classification algorithms.
An empirical evaluation of the SFA algorithm on real-world datasets
shows that it achieves an important improvement in the classification
accuracy of the WkNNFP algorithm.
BU-CEIS-9720
TITLE: A Quadtree Based Dynamic Attribute Indexing Method
AUTHORS: Jamel Tayeb, Ozgur Ulusoy, and Ouri Wolfson
ABSTRACT:
Dynamic attributes are attributes that change continuously over time
making it impractical to issue explicit updates for every change. In
this paper, we adapt a variant of the quadtree structure to solve the
problem of indexing dynamic attributes. The approach is based on the
key idea of using a linear function of time for each dynamic attribute
that allows us to predict its value in the future. We contribute an
algorithm for regenerating the quadtree-based index periodically that
is optimal in CPU and disk access cost. We also provide an
experimental study of performance focusing on query processing and
index update overheads.