Original Project Proposal

CS513 Project Proposal
Ontology and Automation Methods for Categorization of Text Repositories on Internet

Eray Özkural


Date: January 3, 2001

In this project, the problem of categorizing the multitude of text resources on Internet sites will be tackled. Among these resources are plain text files and HTML files, which are abundant on the net and continue to multiply. Although search engines can be used to access information on the Internet, ontologies and automatic text categorization provide alternative approaches. A categorization of the whole Internet would be similar to Internet directories such as the Open Directory Project. Automatic text categorization is applied to Computer Science papers on the net by the Cora CS Research Paper Search Engine.

Following the explosion of text data, many sites have accumulated countless text resources. Often, these sites offer HTTP-based navigation or search engines for data access. Many of them have numerous contributors, including anonymous posters. For such repositories, amending the categorization of data is a prominent issue. While data can be classified by hand-editing navigation indices, formal ontology and automatic text categorization tools may give better results: they offer categorization developers a dedicated environment, with languages and interfaces tailored to this specific task, and they automate laborious steps.

Within the scope of this project, papers and technologies that use these approaches will be surveyed, and the results will be presented as a web site. In addition, a sample categorization tool using one of the two approaches will be developed. The tool will run on the server side to help repository maintainers present an intuitive interface to users. It will then be used to categorize a sample text repository.
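
To give a concrete picture of the automatic text categorization approach, the following is a minimal sketch of a multinomial Naive Bayes categorizer. It is only an illustration: the proposal does not commit to this algorithm, and the class name, the tokenizer, the category labels and the training snippets are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesCategorizer:
    """Hypothetical multinomial Naive Bayes text categorizer (illustration only)."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocabulary = set()                   # all words seen during training

    def train(self, text, category):
        """Record one labeled document."""
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocabulary.add(word)

    def categorize(self, text):
        """Return the most probable category, or None if nothing was trained."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        best_category, best_score = None, float("-inf")
        for category, n_docs in self.doc_counts.items():
            # Log prior: fraction of training documents in this category.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[category].values())
            # Add-one smoothing, with one extra slot reserved for unseen words.
            denom = total_words + len(self.vocabulary) + 1
            for word in words:
                score += math.log((self.word_counts[category][word] + 1) / denom)
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Hypothetical training data standing in for a hand-labeled sample repository.
categorizer = NaiveBayesCategorizer()
categorizer.train("sorting algorithms and graph search complexity", "algorithms")
categorizer.train("tcp packets routing and network protocols", "networking")

print(categorizer.categorize("routing protocols for packets"))
```

In a server-side setting, a maintainer would train such a model on a small hand-labeled subset of the repository and use its predictions as suggested categories for new submissions. An ontology-based tool would replace the statistical model with explicitly defined concepts and relations.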