formerly University of Missouri-Rolla
Missouri S&T






Computer Science

500 West 15th Street
325 Computer Science Bldg.
Rolla, MO 65409
(573) 341-4491
csdept@mst.edu

Current Research

Bioinformatics

Critical Infrastructure Protection

Software Engineering

 

BIOINFORMATICS

Bioinformatics is any application of computational methods to address biological problems. Although often used to refer to analysis of genomic information, Bioinformatics is defined broadly by the NSF and NIH as "research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data."

BioMiner: Data Mining Tools for Biological Data

Students and faculty in the Bioinformatics Lab are currently working on several research projects that use computational techniques to analyze biological data. These projects include parallel EST clustering algorithms, automation of web tools to analyze genomic data, evaluation of the quality of generated EST clusters, data mining techniques to identify character non-independence in phylogenetic data, and ontology and database development to facilitate access to a digital library of 3D reconstructions of anatomy.

Amphibian Ontology

The goal of this project is to create an ontology of amphibian morphology by enlisting a blend of manual and semi-automated approaches to mine electronic media for instances of potential concepts and properties. This research is carried out jointly with the University of Kansas. The ontology will enable disassociated research groups to overcome differences in three commonly used anatomical lexicons derived from research on each distinct order. The resulting ontology will be accessed through the web and through an application programming interface. An advisory board and outreach workshops will be organized to promote community engagement in the development and ultimate maintenance of the ontology. Because ontologies can be understood by both software agents and humans, they are an ideal vehicle for information exchange, creating the foundation for integrating knowledge from multiple disciplines. Currently, the development of most ontologies requires large amounts of manual effort. Semi-automated generation of ontologies will substantially decrease the amount of human effort required in the process. This project will demonstrate this approach in the domain of amphibian morphology with the expectation that it will be potentially applicable to many other domains. Because the meta-data inherent in ontologies allow for universal information exchange, expediting the process of ontology development will be instrumental to fully realizing a Semantic Web.

MorphologyNet

MorphologyNet is an NSF-funded, freely available interactive library of three-dimensional digital reconstructions of animal anatomy. It includes an easy-to-use interface that allows users to manipulate images just as they would real biological objects, including rotation in all planes simultaneously and "dissection" of tissues and structures.

Any researcher who generates 3D reconstructions of animal anatomy is welcome to deposit images into the MorphologyNet library. We can accept images from nearly any data source or file format.

Bioinformatics Working Group

The BWG was created to facilitate communication among students and scientists from different disciplines in an effort to: 1) increase collaborative relationships in Bioinformatics, 2) provide educational opportunities in Bioinformatics, and 3) encourage the pursuit of external funds for Bioinformatics-related projects.

Automated Gene Family Identification Using Computational Methods

A gene family is a set of genes defined by common ancestry (presumed homology). A significant proportion of the genes that make up a genome are part of larger families of related genes resulting from duplications of individual genes, genomic segments, or even whole genomes. The study of the molecular processes by which functional innovation occurs interests not only evolutionary biologists but also protein engineers and medical and agricultural biologists. A clearer understanding of the extent to which gene families contribute to the selected traits in our most important crop species will help guide decisions regarding future improvements.

A significant proportion of genes in plants are members of multigene families. However, only a fraction have been discovered and characterized. Research aimed at the identification of specific gene families and their constituent members has increased significantly in the last few decades. Although experimental approaches generally produce the most reliable results, they are time-consuming and labor-intensive. Most strategies for gene family identification are computational approaches that take advantage of database mining and analysis tools to improve capability and efficiency in dealing with large amounts of sequenced data.

Computational methods using EST datasets have been successful at identifying one or a few families at a time. What is needed is a less family-specific strategy that can identify many gene families at a time. In our research, we are developing automated techniques to search the Glycine max (soybean) dbEST using seed genes and to identify new gene families in soybean. Our method is based on Negative Selection Patterns (NSP). To verify the accuracy of our techniques, we are using the Arabidopsis genome, which is fully sequenced and publicly available.

CRITICAL INFRASTRUCTURE PROTECTION

Infrastructures such as the electric power grid, oil and gas distribution and pipelines, transportation systems, telecommunications systems, and information systems are critical for our nation's operation. Computer Science plays a vital role in protecting these infrastructures from harm. These systems are naturally distributed and thus very complex; in fact, they are typically interconnected sets of systems. The distributed structure makes them vulnerable to attack at many places and in many forms, including physical and cyber-based attacks, singly or in combination with each other.

Because of their complexity, assuring the continued functioning of these systems requires expertise in all aspects of possible attacks. This includes not only protecting against physical attack and damage but also ensuring the integrated reliability and security of such large-scale systems, where an attack at one point can have drastic consequences over a much broader target area, leading to cascading failures. Further, the diversity of these systems requires expertise in many different domain areas, including "hard" engineering such as civil engineering, electrical and computer engineering, and petroleum engineering, as well as computer science, software engineering, economics, social issues, and cyber security.

Advanced Protection and Control of the Power Grid

One of the main vulnerabilities of modern power distribution grids is their susceptibility to cascading failures from successive losses of transmission lines. Recent developments in power research have led to “Flexible AC Transmission System” (FACTS) devices that modify the power flow locally within a power grid. Embedded computers within the FACTS devices form a distributed computing system that can make coordinated, rapid changes to the power flow in the grid. If a particular transmission line becomes overloaded due to a failure of a power source, the embedded computers can re-balance the power flow before a massive, cascading power failure can occur and result in a blackout.

Possible threats to the survivability of the system can come not only from physical disruptions but also from security intrusions in which a hacker may attempt to confuse the distributed control algorithms. These threats are minimized by enforcing correct operation of the computing system, that is, by ensuring that its actions correspond to the physical rules that govern power flow. Integrating computer control with a complex physical system requires expertise in both the computing research and power research fields. The exploratory research is an interdisciplinary collaboration. The overall scope of the project is to examine evolving system stability, economic issues, certification of the control systems, and power grid design. Our current progress and technical information on the project can be found at the project web page. Our current effort is in constructing a Hardware In the Loop (HIL) FACTS interaction laboratory to study FACTS interactions and response to computer and power system failures.

Attacks and key management for secure data streaming and Mobile Data Management

Attacks in Sensor Networks: Many sensor network applications, such as border security, emergency response operations in disaster environments, and battlefield monitoring, run in untrustworthy environments and require communication that is secure against different types of attacks. Attacks such as the black hole attack and the wormhole attack cause an existing route to be broken or prevent a new route from being established. We propose a hierarchical secure routing protocol for detecting and defending against black hole attacks and are working on detecting collaborative attacks.

Secure Aggregation in Sensor Networks: Wireless sensor networks (WSN) create a constant stream of data which flows from the sensing location towards an interface with the world, usually a more powerful computer called the base station. Since all communication is done via wireless radio links, security is an especially important topic. Most sensors run from a non-renewable energy source such as batteries, and ways to increase the life of the network are constantly sought after. Aggregation, the combining of several readings along the routing path, has been shown to decrease the number of radio transmissions, generally the most expensive operation in a WSN. How to handle aggregation when security is required poses a new problem. There are two central issues for secure aggregation in WSN. First, at each aggregation point, it is important to ensure that the actual readings were used to calculate the aggregate. Due to the nature of WSN, infiltration of malicious sensors is possible, and they could falsify an aggregate result. Second, if data security is required and standard encryption schemes are used, security and aggregation can be combined only by repeatedly decrypting, aggregating, and re-encrypting along the path. This slows down the data collection process and consumes additional energy. Encryption schemes are needed that allow for aggregation without decryption, so that only the base station needs to be able to decrypt the aggregate result. We are proposing some algorithms to handle secure aggregation in WSN.
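The idea of aggregating without decrypting can be illustrated with a minimal sketch. It does not reproduce the algorithms proposed in this project; it assumes a simple additively homomorphic masking scheme in which every sensor shares a one-time key with the base station. The modulus, function names, and sample readings are all hypothetical.

```python
import secrets

MODULUS = 2**32  # assumption: large enough to hold the sum of all readings


def encrypt(reading: int, key: int) -> int:
    """Mask a reading with a per-sensor one-time key (addition mod MODULUS)."""
    return (reading + key) % MODULUS


def aggregate(ciphertexts: list[int]) -> int:
    """An intermediate node adds ciphertexts without ever decrypting them."""
    return sum(ciphertexts) % MODULUS


def decrypt_sum(aggregate_ct: int, keys: list[int]) -> int:
    """The base station subtracts the sum of all keys to recover the sum of readings."""
    return (aggregate_ct - sum(keys)) % MODULUS


# Hypothetical readings from four sensors, each with its own fresh key.
readings = [21, 23, 19, 22]
keys = [secrets.randbelow(MODULUS) for _ in readings]
ciphertexts = [encrypt(r, k) for r, k in zip(readings, keys)]
total = decrypt_sum(aggregate(ciphertexts), keys)
```

This sketch addresses only confidentiality along the path. It does nothing to stop a malicious aggregator from falsifying the result, which is the first of the two issues described above.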

Key Management in WSN: Sensor nodes have limited computation and battery power, and are not very reliable. A sensor network needs to be secure against eavesdropping when it is deployed in hostile environments. In order to provide security at low cost, symmetric key based approaches have been proposed. An elliptic curve cryptography based approach has been implemented to facilitate public-key cryptography. However, the scheme becomes ineffective in terms of memory usage, communication time, and energy required as the network size grows rapidly. We propose an Energy and Communication Efficient Group key management (ECEG) scheme which reduces the usage of memory, communication, and energy in sensors.

Data Stream Security in Wireless Sensor Networks: Wireless sensor networks can generate large amounts of data, and that data needs to be secured. Sensors can become corrupted due to the physical environment in which they are deployed, so one important goal in wireless sensor networks is ensuring that all data is correct. Data security in wireless sensor networks encompasses data confidentiality, data integrity, and data availability. Since data transmission is via a wireless medium, anybody tuned to the same frequency can intercept messages; an attacker who simply listens to the transmissions is eavesdropping. Having certain information, an attacker can inject false messages into the network. Additionally, an attacker can spoof messages by first intercepting a message, modifying it, and then re-inserting it into the network. In addition, when data are generated in sensor networks, high-speed data streams travel through the network. Traditional security approaches are often unable to keep up with the rates of the streams, or they introduce overhead which shortens the life of the network. We are particularly interested in providing a secure data processing environment which is lightweight in computational and time complexity to allow for fast processing of data, yet still provides a reasonable amount of protection against a variety of attacks, such as changing data in midstream and overhearing transmissions of packets.

Integrity Preserving Aggregation: Data aggregation is a main factor in reducing energy consumption by eliminating data redundancy and reducing communications overhead. Secure data aggregation in sensor networks aims to provide data aggregation and its energy savings while ensuring data security. Security in sensor networks requires new approaches due to the limitations of sensors and their limited computing power. Implementations in hostile environments face the additional problem of malicious corruption by attackers. When in-network aggregation is used, the base station needs to be able to trust that any corruption during the aggregation process is detectable or preventable, and all non-corrupt sensor nodes need to be sure that their readings were properly applied to an aggregate reading. The focus of this work is on the question “How can a sensor network calculate an in-network aggregate so that the base station is either assured that the aggregate is correct or able to identify an aggregator which reports an incorrect result?” We propose the use of a secure multiparty computation (MPC) protocol. An MPC protocol allows the secure computation of almost any function with a few additional properties.
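As a hedged illustration of what a secure multiparty computation looks like, the sketch below computes a sum with additive secret sharing, a textbook MPC building block for honest-but-curious parties. It is not the protocol proposed in this work, and it does not by itself detect a cheating aggregator. The field modulus and function names are assumptions made for the example.

```python
import secrets

PRIME = 2_147_483_647  # assumed field modulus (2**31 - 1)


def make_shares(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(readings: list[int]) -> int:
    """Each node shares its reading; only sums of shares are ever revealed."""
    n = len(readings)
    # Row i holds the shares produced by node i; column j is what node j receives.
    share_matrix = [make_shares(r, n) for r in readings]
    # Node j publishes only the sum of the shares it received.
    partial_sums = [
        sum(share_matrix[i][j] for i in range(n)) % PRIME for j in range(n)
    ]
    return sum(partial_sums) % PRIME
```

For example, `secure_sum([21, 23, 19, 22])` recovers the total of the four readings, while no single node sees another node's individual reading, because each share on its own is a uniformly random value.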

Power-aware Secure Routing Protocols: Secure routing is one of the most important aspects of sensor networks. There are several examples of attacks on routing in sensor networks: a routing packet could be captured, the information in a packet could be tampered with, or an adversary might insert spurious messages into the network. However, much research has focused on making sensor networks feasible and useful rather than on security, and traditional route discovery algorithms are therefore assumed to be used in a trusted environment. The performance of a protocol will be measured based on the degree of overhead associated with a given security measure and the energy consumed. The power-aware protocol will be modified to include secure key management and trust levels. The result will be a secure, trusted, adaptive, and scalable routing protocol.

Incentive based routing protocol in MANETs: The focus of this research is to handle routing issues in Mobile Ad Hoc Networks. The main idea is to use an Incentive Based Routing Protocol to discourage selfishness among mobile hosts by providing incentives to pass information among them. By modeling the network as a directed weighted graph, we are designing a virtual currency function to find the cost/weight between two Mobile Hosts (MHs) and use this cost as an incentive for the intermediate nodes to route the information packet to the destination requester node. We will use game theoretical models to optimize the cost.
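The directed weighted graph model can be made concrete with a small sketch. It assumes, purely for illustration, that the weight on edge (u, v) is the virtual-currency price host u charges to forward a packet to v, and that the route is chosen with Dijkstra's algorithm. The actual currency function and the game-theoretic optimization are not specified here, so the graph, function names, and payment rule are hypothetical.

```python
import heapq


def cheapest_route(graph, source, destination):
    """Dijkstra's algorithm over a directed weighted graph {node: {neighbor: cost}}."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = node
                heapq.heappush(queue, (new_cost, neighbor))
    if destination not in best:
        return None, float("inf")
    # Walk the predecessor links back from the destination to the source.
    path = [destination]
    while path[-1] != source:
        path.append(previous[path[-1]])
    path.reverse()
    return path, best[destination]


def incentives(graph, path):
    """Credit each intermediate host with the price of the hop it forwards."""
    return {path[i]: graph[path[i]][path[i + 1]] for i in range(1, len(path) - 1)}


# Hypothetical four-host network.
network = {"A": {"B": 2, "C": 5}, "B": {"C": 1, "D": 4}, "C": {"D": 1}, "D": {}}
route, total_cost = cheapest_route(network, "A", "D")
payments = incentives(network, route)
```

In this toy network the cheapest route from A to D passes through B and C, and each of those intermediate hosts is credited with the price of the hop it forwards.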

Privacy Ensured Service Discovery in MANETs: This research involves developing a protocol that ensures that private details of the mobile hosts remain secure. We are considering an ad hoc network with ubiquitous services and users. Users will not reveal their private details in the process of discovering a service in the network. Also, the service provider will not publish its services. The protocol also considers the trust issues among the participating nodes in the network.

Replication issue in P2P: Here we are working on designing replication and caching schemes to handle the dynamic behavior of mobile peers. We also address the resource constraints at these peers in order to design adaptive and dynamic replica allocation schemes. We have proposed some replication schemes which use incentive based models for data replication.

SOFTWARE ENGINEERING

Research projects in software engineering focus on virtually all aspects of the software and systems development life cycle, from understanding and capturing customer requirements to the generation of high-performance implementations. Particular attention is also paid to systems that comprise both software and hardware components as well as ensuring the correctness and safety of developed systems.

Verification of system requirements

Experience in industrial practice has shown that roughly half of system defects trace to errors in the capture of requirements; in addition, these defects are typically discovered late in the life cycle and are the most costly defects to fix at that point. This research project aims at developing techniques to verify system requirements specifications so that they can be corrected in the early development phases. While formal techniques to address this problem are well known, they have not been able to scale to industrial-size projects. We have developed automated reasoning techniques, based on a novel algebraic representation of requirements, which are able to verify very large specifications. These techniques have been demonstrated to be effective in industrial applications in the telecommunications domain. The current focus of this project is to extend verification to security properties of systems, to continue to increase the performance and scope of our verification system, and to refine the specification language to appeal to industrial system developers.

Development and standardization of specification languages

We have contributed to significant portions of the design of UML 2, the current generation of the most widely used software specification notation. While we are continuing to participate in the standardization of this work within the OMG and the ITU, we are also developing UML profiles providing novel specification concepts to the software engineering community. For example, we are currently focusing on extensions to UML to represent real-time systems, on representations that efficiently capture system requirements, and on a realization of the aspect-oriented paradigm for software and systems modeling. 

Derivation of implementations from high-level designs

We have developed a program transformation environment (an expert system) that allows program translators and compilers to be instantiated quickly, based on the application of rewrite rules to representations of the input programs. This environment has been used to develop compilers that translate high-level specifications (expressed as UML profiles) into efficient code. These compilers are currently in use in large-scale industrial software development efforts and have generated a number of shipping products, from subscriber devices to telecommunications network infrastructure. We are working on further improving the capabilities of our program transformation environment based on lessons learned from application in industrial practice. The current focus is on the automation of scheduling of transformations (or compilation phases), the extension of the automation of data structure selection to general purpose collection types, the implementation of incrementalization and finite differencing techniques for algorithm optimization, and the enhancement of the analysis infrastructure to improve the inference of properties of the program under development.
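To make the notion of applying rewrite rules to a program representation concrete, here is a toy sketch, unrelated to the actual environment described above. It assumes expressions are nested tuples of the form (operator, left, right) and applies a few algebraic simplification rules bottom-up until none matches. The rule set and function names are hypothetical.

```python
def rewrite_once(term):
    """Apply the first matching rule at the root of the term, or return it unchanged."""
    if isinstance(term, tuple) and term[0] in ("+", "*"):
        op, left, right = term
        if op == "+" and right == 0:
            return left                      # x + 0 -> x
        if op == "+" and left == 0:
            return right                     # 0 + x -> x
        if op == "*" and (left == 0 or right == 0):
            return 0                         # x * 0 -> 0
        if op == "*" and right == 1:
            return left                      # x * 1 -> x
        if op == "*" and left == 1:
            return right                     # 1 * x -> x
        if isinstance(left, int) and isinstance(right, int):
            # Constant folding for the two supported operators.
            return left + right if op == "+" else left * right
    return term


def simplify(term):
    """Rewrite subterms bottom-up, then the root, until no rule applies."""
    if isinstance(term, tuple):
        op, left, right = term
        term = (op, simplify(left), simplify(right))
    reduced = rewrite_once(term)
    return reduced if reduced == term else simplify(reduced)
```

For instance, `simplify(("+", ("*", "x", 1), ("*", "y", 0)))` reduces to `"x"`. A production transformation system would also need pattern variables, rule scheduling, and program analyses, which this sketch leaves out.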

Hardware/software co-design for power systems

An advanced electric power system is a complex hybrid distributed real-time system which contains physical devices, hardware, and software components. Its development needs to be based on a solid engineering methodology. A structured object-oriented development methodology based on the High Order Object Model Technique (HOOMT) will be developed for its analysis, design, and implementation. It is used for integrated analysis and specification of the functionalities, non-functional requirements, and constraints of the system's hardware components, software components, and physical devices, based on hierarchical decomposition through the object-oriented development method. In this method, the functional requirements, non-functional requirements, and constraints of a high order object are analyzed and decomposed based on those of its component objects.

Security, safety, liveness, and vulnerability are critical in power systems controlled by a network of FACTS devices. However, the corresponding requirements for such a system are not well analyzed and understood. Most requirement engineering methods address only high-level security, safety, and vulnerability requirements at the system level. These requirements should be dealt with for physical devices, hardware components, and software components, level by level and functionality by functionality. The HOOMT model will be extended to analyze those requirements and incorporate them in the development of the system.

HOOMT is a top-down structured object-oriented analysis and design methodology. Unlike objects in UML, its high order object encapsulates its component objects and their relationships. In addition, UML is very hard to use for analysis of non-functional requirements and constraints. HOOMT is easy to expand for their analysis by incorporating non-functional requirements and constraints into objects.

This research has two objectives: 1) use the proposed distributed hybrid real-time system, which consists of a network of FACTS devices, as an application domain and test bed for HOOMT so that an innovative object-oriented modeling method can be developed for this type of system; 2) use HOOMT as a tool for the analysis, specification, and validation of the system and to facilitate the integration of its physical devices, hardware, and software components.
