DSL Sponsored Research
Computational Genomics
Sponsor: Dept. of Biotechnology, Govt. of India
Investigators: M. Bansal (MBU), S. Visweswara (MBU), N. Balakrishnan (SERC), J. Haritsa
Summary:
The goal of this project is to gain insights into biological processes
through computational techniques. Firstly, the vast amount of
genomics data will be analyzed with a variety of tools for establishing
the sequence-structure-function relationships in proteins and DNA.
Secondly, new mathematical and computational tools will be developed to
gain new insights from the genomics data. Thirdly, the results of the
investigations will be stored in custom-designed databases that can be
accessed by scientists through public domain platforms. Finally, system
architectures that are specifically tuned for computational genomics
applications will be developed.
Status: Completed
Duration: June 2002 - May 2007
Database Engines for Sequences
Sponsor: Dept. of Science and Technology, Govt. of India
Investigator: J. Haritsa
Summary:
Biological sequence data is not well supported by current database
technology, especially with regard to index structures, data storage,
query optimization, query operators and preservation of privacy. Due to
these shortcomings, biologists are forced to store their sequences
in flat files, resulting in long running times for their queries.
These running times will only become worse in the future, given the
exponential increase in sequence data size and in the number of queries.
In this project, we aim to tackle the above-mentioned shortcomings and
design a sequence-friendly database engine. Overall, the goal is to ensure
that the standard biological queries of today, which typically take a
couple of hours to process, can instead be handled in a much shorter
time. This will require addressing the problem in a holistic manner
across all components of the database system and will involve algorithmic,
structural, architectural and mathematical innovations.
Status: Completed
Duration: January 2003 - December 2007
OSHADHI: Design and Implementation of a Bio-diversity Database Management System
Sponsor: Dept. of Biotechnology, Govt. of India
Investigators: J. Haritsa, M. Gadgil (CES), V. Nanjundiah (CES)
Summary:
Conserving the biodiversity of the large number of plant species of
India has become essential, given the rapid growth in the number of
plant species that strengthen the genetic base of India and their
rich economic importance. Several organizations that undertake
conservation measures, such as identifying species and monitoring
climatic conditions, are finding it necessary to have efficient and
natural access to a variety of biodiversity data.
The goal of the OSHADHI project is to adapt state-of-the-art database
technology to the biodiversity domain and develop a comprehensive
database management system for the biodiversity community.
The technology inputs that will be used include object-oriented
modeling, hierarchical access methods, spatial access techniques,
extensible database systems, and client-server architectures.
Status: Completed
Duration: September 1998 - March 2001
Design and Analysis of Database Mining Algorithms
Sponsor: HITACHI Ltd., Japan
Investigator: J. Haritsa
Summary:
The problem addressed by this project is to design and analyze
sampling-based algorithms for database mining. In particular, we wish
to clarify the relationship between knowledge extracted from full-scale
data and that from sampled data, using statistical methods. The
advantage of sampling is that inferences about an entire population can
be made based on characteristics exhibited by a representative subset
of the population. This is achieved, however, at some cost in the
accuracy of the results. We will attempt to quantify the performance
versus accuracy tradeoff explicitly, and thereby determine the sample
sizes needed to achieve the level of accuracy desired by the user. We
will also investigate the possibility of deriving theoretical bounds on
the accuracy of the sampling algorithms. Apart from theoretical
results, we will attempt to develop incremental data mining
algorithms, wherein sampling is used as a first-cut technique for
narrowing down the search space, and fine-grained techniques are then
used to evaluate the reduced search space over larger data sets.
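As an illustration of the performance versus accuracy tradeoff described
above, the following is a minimal sketch and not the project's actual
algorithm. It assumes that the quantity being mined is the support of an
itemset (the fraction of transactions containing it), that the sample is
drawn uniformly with replacement, and that Hoeffding's inequality is used
to relate sample size to accuracy. The function names and the toy data are
hypothetical.

```python
import math
import random


def hoeffding_sample_size(epsilon, delta):
    """Smallest n for which a sampled support estimate lies within
    +/- epsilon of the true support with probability >= 1 - delta.

    Assumption: Hoeffding's inequality, n >= ln(2/delta) / (2 * epsilon^2).
    """
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))


def estimate_support(transactions, itemset, n, rng):
    """Estimate the fraction of transactions that contain `itemset`,
    using n transactions drawn uniformly at random with replacement."""
    sample = [rng.choice(transactions) for _ in range(n)]
    hits = sum(1 for t in sample if itemset <= t)  # subset test
    return hits / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: {"a", "b"} occurs in 30% of 10,000 transactions.
    transactions = [frozenset({"a", "b"})] * 3000 + [frozenset({"c"})] * 7000
    n = hoeffding_sample_size(epsilon=0.02, delta=0.05)
    estimate = estimate_support(transactions, frozenset({"a", "b"}), n, rng)
    print("sample size:", n)
    print("estimated support:", estimate)
```

Under this bound the required sample size depends only on the requested
accuracy and confidence, not on the size of the full database, which is
where the performance gain of sampling comes from. Halving epsilon
roughly quadruples the sample size.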
Status: Completed
Duration: December 1997 - May 1998
MINTO: A Software Tool for Mining Manufacturing Databases
Sponsor: Dept. of Science and Technology, Govt. of India
Investigator: J. Haritsa
Summary:
The goal of Database Mining is to discover information in historical
organizational databases that can be used to improve business
decisions. Developing efficient algorithms for mining has become an
active area of research in the database community in the last few
years. However, all the current research prototypes are targeted
at commercial retail data, not manufacturing data, which is far
more complex, larger in size and different in nature. The problem
addressed by this project is to develop a database mining package for
mining manufacturing data. This mining software tool, called
MINTO, will be customized for manufacturing databases. In
particular, we wish to include constructs that support both tabular and
complex data, devise sampling techniques that will allow for pattern
generation without scanning through the entire database, quantify
the error introduced by such sampling techniques, and devise parallel
mining algorithms that will speed up the analysis process. Finally,
we wish to develop a graphical user interface that facilitates
usage of the tool.
Status: Completed
Duration: January 1997 - March 1999
MIDAS: A Database Design for Flexible Manufacturing Systems
Sponsor: Dept. of Science and Technology, Govt. of India
Investigators: J. Haritsa and V. Rajaraman
Summary:
Flexible manufacturing systems (FMS) cater to recent manufacturing
trends such as continuous variability in product mix, frequent design
changes and just-in-time inventory control. In order to achieve the
required degree of flexibility, FMS need real-time access to
information about the plant organization and operation. The MIDAS
project aims to develop an object-oriented database system for use in
flexible manufacturing environments. The system will support complex
objects, active mechanisms, decision support, and embedded control.
Status: Completed
Duration: February 1995 - January 1997
A Distributed OO Simulation Testbed for Manufacturing Systems
Sponsor: Dept. of Science and Technology, Govt. of India
Investigators: M. Jacob and J. Haritsa
Summary:
Simulation has become the primary method of studying modern
manufacturing systems involving computer control. It is being
increasingly used not only at the time of design of a new manufacturing
system, but also in real-time or operational environments for decision
support. Traditional simulation languages have been used for this
purpose, but they do not provide the flexibility required for detailed
simulation of complex systems. Object-oriented simulation
environments are viewed as a good solution to this problem. Our aim
in this project, therefore, is to develop an object-oriented simulation
testbed for manufacturing systems on a distributed computing platform.
Status: Completed
Duration: December 1994 - November 1996
DIAS: An Object-Oriented Database for Interconnect Analysis
Sponsor: Texas Instruments India Pvt. Ltd.
Investigator: J. Haritsa
Summary:
Due to drastic reductions in feature sizes of VLSI chips, device
interconnect parasitics have begun to have a significant adverse impact
on chip performance. Therefore, the electrical characteristics of
these parasitics have to be taken into account in the design of IC
chips. This involves processing data that is both large in size and
complex in nature, calling for effective data management. The DIAS
project aims to develop an object-oriented database system for
addressing the parasitic data management problem.
Status: Completed
Duration: July 1995 - December 1996