All technical report files are in compressed (gzip) postscript format.
The file names are of the sort BU-CE-YYXX.pdf
BU-CE-0501
TITLE: GUAP - A Strong User Authentication Protocol for GSM
AUTHORS: Özer Aydemir
ABSTRACT:
Traditionally, the authentication protocols for cellular phone
networks have been designed for device authentication rather than user
authentication, which brings limitations and restrictions on the
functionality of the system. In this thesis we propose a user
authentication protocol for Global Standards for Mobile (GSM). Our
protocol permits the use of weak secrets (e.g. passwords or PINs) for
authentication and provides certain flexibilities for GSM users. The
simulation results on currently established user authentication
protocols and GUAP are presented. The protocol also has a capture
resilience extension to disable captured cellular phones securely.
Keywords:
Wireless network security, user authentication, GSM, strong password
protocols, capture resilient
BU-CE-0502
TITLE: Harbinger Machine Learning Toolkit Manual
AUTHORS: Berkant Barla Cambazoglu and Cevdet Aykanat
ABSTRACT:
This manual is the primary guide to the Harbinger Machine Learning
Toolkit (HMLT), which provides implementations for some well-known and
frequently used machine learning classifiers. The main concerns in
development of HMLT are correctness, effectiveness, transparency,
modularity, and re-usability. At the moment, efficiency is not claimed
to be a primary concern in any part of the toolkit. This is basically
due to the fact that all supported classifier implementations use
common representations and data structures, preventing further
utilization and employment of some classifier-specific
optimizations. However, we believe that the toolkit is quite robust
and supposed to successfully execute under most circumstances.
Keywords:
Machine learning, toolkit, classification, validation.
BU-CE-0503
TITLE: Hypergraph-Partitioning-Based Remapping Models for
Image-Space-Parallel Direct Volume Rendering of Unstructured Grids
AUTHORS: B. Barla Cambazoglu and Cevdet Aykanat
ABSTRACT:
In this work, image-space-parallel direct volume rendering (DVR) of
unstructured grids is investigated for distributed-memory
architectures.
A hypergraph-partitioning-based model is proposed for the adaptive
screen partitioning problem in this context. The proposed model aims
to balance the rendering loads of processors while trying to minimize
the amount of data replication. In the parallel DVR framework we
adopted, each data primitive is statically owned by its home
processor, which is responsible from replicating its primitives on
other processors. Two appropriate remapping models are proposed by
enhancing the above model for use within this framework. These two
remapping models aim to minimize the total volume of communication in
data replication while balancing the rendering loads of
processors. Based on the proposed models, a parallel DVR algorithm is
developed. The experiments conducted on a PC cluster show that the
proposed remapping models achieve better speedup values compared to
the remapping models previously suggested for image-space-parallel DVR.
Keywords:
Direct volume rendering, unstructured grids, ray casting, image space
parallelization, hypergraph partitioning, screen partitioning,
remapping.
BU-CE-0504
TITLE:
Processing Count Queries over Event Streams at Multiple Time Granularities
AUTHORS: Aykut Ünal, Yücel Saygın, Özgür Ulusoy
ABSTRACT:
Management and analysis of streaming data has become crucial with
its applications in web, sensor data, network traffic data, and
stock market. Data streams consist of mostly numeric data but what
is more interesting is the events derived from the numerical data
that need to be monitored. The events obtained from streaming data
form event streams. Event streams have similar properties to data
streams, i.e., they are seen only once in a fixed order as a
continuous stream. Events appearing in the event stream have time
stamps associated with them in a certain time granularity, such as
second, minute, or hour. One type of frequently asked queries over
event streams is count queries, i.e., the frequency of an event
occurrence over time. Count queries can be answered over event
streams easily, however, users may ask queries over different time
granularities as well. For example, a broker may ask how many
times a stock increased in the same time frame, where the time
frames specified could be hour, day, or both. This is crucial
especially in the case of event streams where only a window of an
event stream is available at a certain time instead of the whole
stream. In this paper, we propose a technique for predicting the
frequencies of event occurrences in event streams at multiple time
granularities. The proposed approximation method efficiently
estimates the count of events with a high accuracy in an event
stream at any time granularity by examining the distance
distributions of event occurrences. The proposed method has been
implemented and tested on different real data sets and the results
obtained are presented to show its effectiveness.
Keywords:
Count Queries, Data Streams, Event Streams, Time Granularity, Association Rules, Data Mining
BU-CE-0505
TITLE:
GnuSim: A General Purpose Simulator for Gnutella and Unstructured P2P Networks
AUTHORS: Murat Karakaya, Ibrahim Korpeoglu, Özgür Ulusoy
ABSTRACT:
Peer-to-peer networks have attracted a significant amount of interest
as a popular and successful alternative to traditional client-server
networks for resource sharing and content distribution. Since a P2P
network consists of many nodes, thousands or millions, it is hard to
evaulate the performance of a P2P network and some related protocols
analytically using only mathematical models. Most often we need to do
simulations. Therefore it is important to have a simulation tool
specifically designed for P2P network and protocol simulation. We
developed such a tool, and in this report we present the design and
implementation of this simulation tool, which we call GnuSim. The
report includes both the simulation model and the code details. This
work will be useful for researchers doing simulation study in P2P
systems domain.
BU-CE-0506
TITLE:
A Library for Parallel Sparse Matrix-Vector Multiplies
AUTHORS: Bora Ucar and Cevdet Aykanat
ABSTRACT:
We provide parallel matrix-vector multiply routines for 1D and
2Dpartitioned sparse square and rectangular matrices.
We clearly give pseudocodes that perform necessary
initializations for parallel execution. We show how to maximize
overlapping between communication and computation through the
proper usage of compressed sparse row and compressed sparse
column formats of the sparse matrices. We give pseudocodes for
multiplication routines which benefit from such overlaps.
BU-CE-0507
TITLE:
Reducing Query Overhead Through Route Learning In Unstructured
Peer-to-Peer Networks
AUTHOR: Selim Ciraci
ABSTRACT:
We provide parallel matrix-vector multiply routines for 1D and
2Dpartitioned sparse square and rectangular matrices.
We clearly give pseudocodes that perform necessary
initializations for parallel execution. We show how to maximize
overlapping between communication and computation through the
proper usage of compressed sparse row and compressed sparse
column formats of the sparse matrices. We give pseudocodes for
multiplication routines which benefit from such overlaps.
BU-CE-0508
TITLE:
An Ontology for Computer-Aided Modeling of Cellular Processes
AUTHOR: Emek Demir
ABSTRACT:
Cellular processes form the hardware layer of living organisms.
Malfunctions in cellular processes are responsible for most of the
currently incurable diseases. Not surprisingly, knowledge about
cellular processes are growing at an enormous rate. However, today's
molecular biology suffers from lack of a formal representation system
for cellular processes. Most of the knowledge is locked in literature,
that are not accessible to computational analysis and modeling. Given
the complexity of the system we are attacking, the need for a
representation system and modeling tools for cellular processes are
clear.
In this dissertation, we describe an ontology for modeling processes.
Our ontology possesses several unique features, including ability to
represent abstractions and multiple levels of detail, cellular
compartments and molecular states. Furthermore, it was designed to
meet several user and system requirements, including ease of
integration, querying, analysis and visualization.
Based on this ontology we also implemented a set of software tools
within the PATIKA project. Primary use cases of PATIKA are
integration, querying and visualization, and we have obtained
satisfactory results proving the feasibility of our ontology.
Compared with existing alternative methods of representing and
querying information about cellular processes, PATIKA provides several
advantages, including a regular representation system, powerful
querying options, an advanced visualization. Moreover PATIKA models
can be analyzed by computational methods such as flux analysis or
pathway activity inference. Although it has a more steep learning
curve compared to existing ad hoc representation systems, we believe
that tools like PATIKA will be essential for molecular biology
research in the future.
Keywords:
Bioinformatics, ontology, pathway.
BU-CE-0509
TITLE:
Parallel Sparse Matrix-Vector Multiplies and Iterative Solvers
AUTHOR: Bora Uçar
ABSTRACT:
Sparse matrix-vector multiply (SpMxV) operations are in the kernel of many
scientific computing applications. Therefore, efficient
parallelization of SpMxV operations is of prime importance to
scientific computing community. Previous works on parallelizing SpMxV
operations consider maintaining the load balance among processors and
minimizing the total message volume. We show that the total message
latency (start-up time) may be more important than the total message
volume. We also stress that the maximum message volume and latency
handled by a single processor are important communication cost metrics
that should be minimized. We propose hypergraph models and hypergraph
partitioning methods to minimize these four communication cost metrics
in one dimensional and two dimensional partitioning of sparse
matrices. Iterative methods used for solving linear systems appear to
be the most common context in which SpMxV operations arise. Usually,
these iterative methods apply a technique called preconditioning.
Approximate inverse preconditioning—which can be applied to a large class of
unsymmetric and symmetric matrices—replaces an SpMxV operation by a series
of SpMxV operations. That is, a single SpMxV operation is only a piece of a
larger computation in the iterative methods that use approximate
inverse preconditioning. In these methods, there are interactions in
the form of dependencies between the successive SpMxV
operations. These interactions necessitate partitioning the matrices
simultaneously in order to parallelize a full step of the subject
class of iterative methods efficiently. We show that the simultaneous
partitioning requirement gives rise to various matrix partitioning
models depending on the iterative method used. We list the
partitioning models for a number of widely used iterative methods. We
propose operations to build a composite hypergraph by combining the
previously proposed hypergraph models and show that partitioning the
composite hypergraph models addresses the simultaneous matrix
partitioning problem. We strove to demonstrate how the proposed
partitioning methods—both the one that addresses multiple
communication cost metrics and the other that addresses the
simultaneous partitioning problem—help in practice.
We implemented a library and investigated the performances of the partitioning
methods. These practical investigations revealed a problem that we call message
ordering problem. The problem asks how to organize the send operations
to minimize the completion time of a certain class of parallel
programs. We show how to solve the message ordering problem optimally
under reasonable assumptions.
Keywords:
Sparse matrices, parallel matrix-vector multiplication, iterative methods,
preconditioning, approximate inverse preconditioner, hypergraph partitioning.
BU-CE-0510
TITLE:
A Comprehensive Performance Evaluation of a Novel Framework
Against Free Riding in Peer-to-Peer Networks.
AUTHOR: Murat Karakaya, Ibrahim Korpeoglu and Ozgur Ulusoy
ABSTRACT:
Peer-to-peer computing paradigm has attracted a signcant amount of
interest as a popular and successful alternative to traditional
client-server paradigm for resource sharing and content distribution.
However, the existence of high degrees of free riding may be an
important threat against the proper operation and high-performance of
P2P networks.
In this report, we propose a distributed framework to reduce the
degree of free riding and its adverse affects on P2P networks. Our
solution primarily focuses on locating free riders and taking
counter-actions against them. As part of our solution, we propose an
approach in which each peer monitors its neighbors, makes decisions if
they are free riders or not, and takes appropriate actions if it
decides that one of the neighbors is a free rider. To be able to
identify the peers as free riders, we define sample free riding types
and describe how they are related with the protocol messages of
existing P2P protocols such as Gnutella. As a result, we employ sample
formulas to determine if a neighboring peer exhibits any kind of free
riding or not. Furthermore, we present example counter action schemes
that can be applied to neighbors that are detected as free riders. We
then combine the mechanisms to detect free riders and the actions that
can be taken against them into a practical framework that can be used
in an unstructured P2P network. Unlike other distributed mechanisms
against free riding, our framework does not require any permanent
identcation of peers or security infrastructures for maintaining a
global reputation system. In the work, we also discuss the probable
attacks to the proposed framework and their effects on it.
We also performed a sample implementation of the proposed framework
as part of a simulation model, and conducted extensive simulation
tests. Our simulations results show that by reducing the amount of
free riding in a P2P network, we can accomplish an increase in the
performance of the P2P networks for contributor peers.
Keywords:
Free Riding, Peer-to-Peer Networks, Gnutella, Distributed Computing,
Performance Evaluation.
BU-CE-0511
TITLE:
Pattern Information Extraction From Crystal Structures
AUTHOR: Erhan Okuyan
ABSTRACT:
Determining crystal structure parameters of a material is a quite
important issue in crystallography. Knowing the crystal structure
parameters helps to understand physical behavior of material. For
complex structures, particularly for materials which also contain
local symmetry as well as global symmetry, obtaining crystal
parameters can be quite hard. This work provides a tool that will
extract crystal parameters such as primitive vectors, basis vectors
and space group from atomic coordinates of crystal structures. A
visualization tool for examining crystals is also provided.
Accordingly, this work presents a useful tool that help
crystallographers, chemists and material scientists to analyze crystal
structures efficiently.
Keywords:
crystal, crystallography, chemistry, material science, pattern
recognition, primitive vectors, basis vectors, space group, symmetry.
BU-CE-0512
TITLE:
Automatic Detection of Salient Objects for a Video Database System
AUTHOR: Tarkan Sevilmis
ABSTRACT:
Recently, the increase in the amount of multimedia data has unleashed
the development of storage techniques. Multimedia databases is one of
the most popular of these techniques because of its scalability and
ability to be queried by the media features. One downside of these
databases is the necessity for processing of the media for feature
extraction prior to storage and querying. Ever growing pile of media
makes this processing harder to be completed manually. This is the
case with BilVideo Video Database System, as well. Improvements on
computer vision techniques for object detection and tracking have made
automation of this tedious manual task possible. In this thesis, we
propose a tool for the automatic detection of objects of interest and
deriving spatio-temporal relations between them in video frames. The
proposed framework covers the scalable architecture for video
processing and the stages for cut detection, object detection and
tracking. We use color histograms for cut detection. Based on detected
shots, the system detects salient objects in the scene, by making use
of color regions and camera focus estimation. Then, the detected
objects are tracked based on their location, shape and estimated
speed.
Keywords:
Video Databases, Video Object Detection, Object Tracking, Camera Focus Estimation.