Opportunities for Undergraduate Research Experiences
The OURE program was established to expand opportunities for students to engage in more active forms of learning and to encourage interaction between undergraduate students and faculty. It has helped increase the level of research activity on campus and has demonstrated that teaching and research are compatible and mutually reinforcing. The program also aims to help recruit superior students into our graduate programs.
Students participating in the OURE program should gain a foundational understanding of how research is conducted in their disciplines, along with a greater understanding of the information resources available and how to use them. They should also become familiar with how to interpret research outcomes and learn the fundamentals of experimental design.
For more in-depth OURE information, please visit the OURE website.
First, find a faculty sponsor:
Each listed research faculty member has a research web page:
https://cs.mst.edu/people/faculty-directory/
The student should read through the research web page of each faculty member they are interested in, and then reach out to those faculty members to discuss the possibility of doing undergraduate research with them.
Each participant and faculty sponsor is to:
*The faculty sponsor agrees to supervise the work and certifies the educational value of the proposed project
Read the instructions below carefully. Applications close May 1st.
After reaching an agreement with a faculty member about the research work, the student and faculty sponsor can fill out this application:
*Projects start at the beginning of the fall semester and end at the conclusion of the spring semester of each academic year.
Pick out one of our esteemed faculty members to get in contact with!
Dr. Arif Zaman
Developing Data Transfer Solutions for Next-Generation High-Performance Networks
Please send project inquiries along with a resume/CV to marifuzzaman@mst.edu.
Research Background
Scientific applications and experiments in all areas of science are becoming increasingly network-intensive, as generated data must be moved to remote locations for processing, collaboration, and archival purposes. Despite the availability of high-speed research networks of up to 400 Gbps, users often experience poor performance, mainly because existing data transfer applications do not scale well. Besides performance, existing transfer applications also suffer from reliability issues that can cause corrupted data to be accepted as authentic. This motivated us to study the limitations of state-of-the-art methods and develop transformative solutions. So far, we have developed several transfer optimization algorithms that outperform existing solutions by up to an order of magnitude. To complement the data transfer optimization process, we have also developed network measurement and network performance anomaly detection algorithms.
Research Project: Multi-Objective Data Transfer Optimization
Our previously proposed solutions used black-box optimization at the application level, where the optimizer operates without access to, or assumptions about, system utilization statistics. While this approach offers high portability, allowing deployment on any system regardless of administrative privileges, it also requires continuous optimization of application parameters. We are now designing an optimization algorithm that incorporates real-time performance metrics from end hosts and networks. Lightweight monitoring agents will gather a wide range of system metrics, such as I/O and network throughput and availability, background traffic, I/O contention, and CPU and memory usage. These metrics will then be used to estimate the optimal transfer parallelism, striking a balance between performance and overhead. This data-driven approach aims to skip the time-consuming search for the optimal concurrency level by deriving it from the availability of system resources. Collecting real-time I/O and network usage metrics presents significant research and development challenges, but it promises to substantially accelerate the optimization process, which in turn would increase network utilization.
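To make metric-driven concurrency estimation concrete, here is a minimal Python sketch. It is not the group's algorithm: the `HostMetrics` fields, the `estimate_concurrency` function, and the rule "streams needed = bottleneck bandwidth divided by per-stream throughput, capped by idle cores and a safety limit" are hypothetical assumptions chosen for illustration. Real transfers rarely scale linearly with the number of streams.

```python
import math
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Hypothetical snapshot reported by a lightweight monitoring agent."""
    link_capacity_gbps: float       # nominal capacity of the network path
    background_traffic_gbps: float  # traffic from other flows on the path
    per_stream_gbps: float          # observed throughput of a single stream
    disk_read_gbps: float           # I/O read bandwidth currently available
    cpu_idle_cores: int             # cores not busy with other work
    max_concurrency: int = 64       # safety cap to bound overhead


def estimate_concurrency(m: HostMetrics) -> int:
    """Pick a stream count from current metrics instead of searching online."""
    # Bandwidth left over for this transfer on the network path.
    available_net = max(m.link_capacity_gbps - m.background_traffic_gbps, 0.0)
    # The transfer cannot run faster than the slower of network and storage.
    bottleneck = min(available_net, m.disk_read_gbps)
    if bottleneck <= 0 or m.per_stream_gbps <= 0:
        return 1
    # Streams needed to fill the bottleneck, assuming linear scaling.
    needed = math.ceil(bottleneck / m.per_stream_gbps)
    # Never start more streams than idle cores or the cap allow.
    return max(1, min(needed, m.cpu_idle_cores, m.max_concurrency))
```

For example, with a 100 Gbps link carrying 20 Gbps of background traffic, 60 Gbps of available disk read bandwidth, 4 Gbps per stream, and 32 idle cores, the sketch returns 15 streams. A model trained on such metrics could replace the linear rule while keeping the same inputs.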
Dr. Sanjay Madria
Please send all inquiries along with a resume/CV to madrias@mst.edu.
Project Title: GPS-Free Localization, Object Tracking, and Path-Planning for Secure Navigation in GPS-Denied Battlefield Scenarios Using Computer Vision and Deep Learning
Description-
This project involves designing a path-planning algorithm for the secure navigation of moving military forces in GPS-denied environments, where GPS is unavailable for tactical reasons or only intermittently available because of unusual terrain, signal blockage, or interference. We will study the current state-of-the-art path-planning algorithms and design a novel algorithm for battlefield applications. A significant component of this work is GPS-free localization of moving objects, which is crucial for tracking and navigation. We will therefore also study GPS-free object localization algorithms and design and implement a solution suited to our application. Specifically, we will use a stereo vision system and a deep learning model to build a landmark-based localization system: the model will detect and identify geographical landmarks in the surroundings, and the stereo vision system will determine the distance to each detected landmark. To achieve these objectives, we will use our existing real-world landmark datasets and extend them by gathering, processing, and labeling new images of real-world landmarks. This comprehensive approach aims to produce a practical solution to the unique challenges of navigating in GPS-denied environments.
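The two geometric steps in this pipeline, estimating the distance to a detected landmark with a stereo pair and then localizing from several such distances, can be sketched in Python as follows. This is a simplified illustration, not the project's system. It assumes a rectified stereo pair (so depth equals focal length times baseline divided by disparity), treats that depth as the horizontal range to the landmark, and assumes the landmarks' map coordinates are known. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def trilaterate(landmarks: List[Point], ranges: List[float]) -> Point:
    """Least-squares 2D position from >= 3 landmarks with known coordinates.

    Subtracting the first circle equation (x - xi)^2 + (y - yi)^2 = ri^2
    from the others cancels the quadratic terms, leaving rows of a linear
    system A p = b that we solve through the 2x2 normal equations.
    """
    (x0, y0), r0 = landmarks[0], ranges[0]
    s_xx = s_xy = s_yy = t_x = t_y = 0.0
    for (xi, yi), ri in zip(landmarks[1:], ranges[1:]):
        ax, ay = 2.0 * (xi - x0), 2.0 * (yi - y0)
        b = r0 ** 2 - ri ** 2 + (xi ** 2 + yi ** 2) - (x0 ** 2 + y0 ** 2)
        # Accumulate A^T A and A^T b.
        s_xx += ax * ax
        s_xy += ax * ay
        s_yy += ay * ay
        t_x += ax * b
        t_y += ay * b
    det = s_xx * s_yy - s_xy * s_xy
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear; position is ambiguous")
    # Cramer's rule on the 2x2 normal equations.
    x = (t_x * s_yy - s_xy * t_y) / det
    y = (s_xx * t_y - s_xy * t_x) / det
    return x, y
```

For instance, landmarks at (0, 0), (10, 0), and (0, 10) with ranges 5, sqrt(65), and sqrt(45) yield a position of about (3, 4). With more than three landmarks, the same normal equations give a least-squares fit that averages out range noise.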
Dr. Satish Puri
Project: Nearest Neighbor Similarity Search for Shapes like Polygons and Trajectories
Please send all inquiries along with a resume to satish.puri@mst.edu.
Description-
Similarity search is a critical task in data mining. Nearest neighbor similarity search over geometric shapes (polygons and trajectories) is used in various domains such as digital pathology, solar physics, and geospatial intelligence. Given a geometry or shape as a query object, the goal is to return approximately similar shapes from an input collection of billions of objects with low latency. Given the ever-increasing size of datasets, exact nearest neighbor searches that require a scan of the entire dataset quickly become impractical, which leads to approximate nearest neighbor search. Traditional tree-based methods suffer from the curse of dimensionality. Approximate similarity search is required to scale to large numbers of queries, to build indexes over big spatial data, and to handle the dynamic nature of the data itself.
Shape similarity is measured with a distance metric such as the Jaccard distance. In geospatial intelligence, similarity search is used to geo-locate a shape or contour in global reference datasets. The current literature, while rich in methods for text and image datasets, offers far fewer methods for geometric datasets. This project will develop scalable similarity search systems for polygon and trajectory datasets.
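As a rough illustration of Jaccard-based shape similarity and approximate search, the Python sketch below rasterizes polygons onto a grid, computes the exact Jaccard distance between the resulting cell sets, and uses MinHash signatures with locality-sensitive hashing (LSH) banding to find candidates without scanning every shape. This is a swapped-in textbook technique chosen for illustration, not the project's system; the grid resolution, the number of hashes, the band/row split, and all function and class names are assumptions. Rasterization only approximates the area-based Jaccard distance between polygons, and trajectories would need a different cell-coverage rule.

```python
import random
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]
Polygon = List[Tuple[float, float]]
_PRIME = (1 << 61) - 1  # Mersenne prime for the (a * k + b) mod p hash family


def point_in_polygon(x: float, y: float, poly: Polygon) -> bool:
    """Ray casting: count edge crossings of a ray from (x, y) toward +x."""
    inside = False
    n = len(poly)
    for k in range(n):
        (x1, y1), (x2, y2) = poly[k], poly[(k + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y level
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly: Polygon, cell: float = 1.0) -> Set[Cell]:
    """Approximate a polygon by the grid cells whose centers lie inside it."""
    xs = [p[0] for p in poly]
    ys = [p[1] for p in poly]
    cells = set()
    for i in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
        for j in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
            if point_in_polygon((i + 0.5) * cell, (j + 0.5) * cell, poly):
                cells.add((i, j))
    return cells


def jaccard_distance(a: Set[Cell], b: Set[Cell]) -> float:
    """1 - |A intersect B| / |A union B|; 0 means identical cell sets."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def make_minhasher(num_hashes: int = 128, seed: int = 0):
    """Return a function mapping a non-empty cell set to a MinHash signature."""
    rng = random.Random(seed)
    params = [(rng.randrange(1, _PRIME), rng.randrange(_PRIME))
              for _ in range(num_hashes)]

    def signature(cells: Set[Cell]) -> List[int]:
        # Encode each cell as one integer (assumes |j| < 1_000_000).
        keys = [i * 2_000_003 + j for i, j in cells]
        return [min((a * k + b) % _PRIME for k in keys) for a, b in params]

    return signature


def estimated_similarity(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of matching MinHash slots estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


class MinHashLSH:
    """Banded LSH index: shapes sharing any band become search candidates."""

    def __init__(self, bands: int = 32, rows: int = 4):
        self.bands, self.rows = bands, rows
        self.tables = [defaultdict(list) for _ in range(bands)]
        self.sigs: Dict[str, List[int]] = {}

    def _band_keys(self, sig: List[int]):
        if len(sig) != self.bands * self.rows:
            raise ValueError("signature length must equal bands * rows")
        for band in range(self.bands):
            yield band, tuple(sig[band * self.rows:(band + 1) * self.rows])

    def insert(self, key: str, sig: List[int]) -> None:
        self.sigs[key] = sig
        for band, chunk in self._band_keys(sig):
            self.tables[band][chunk].append(key)

    def query(self, sig: List[int], k: int = 5) -> List[str]:
        candidates = set()
        for band, chunk in self._band_keys(sig):
            candidates.update(self.tables[band].get(chunk, []))
        # Rank only the candidates, not the whole collection.
        return sorted(candidates,
                      key=lambda c: estimated_similarity(sig, self.sigs[c]),
                      reverse=True)[:k]
```

With two 4 x 4 squares offset by half their width, 8 of the 24 covered cells are shared, so the exact Jaccard distance is 2/3 and the MinHash estimate of their similarity should land near 1/3. Using more bands with fewer rows per band raises recall at the cost of more candidates to rank.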
Background needed: Data Structures and Algorithms, Programming in C/C++/Python/Java.
Dr. Suman Maity
Please send inquiries for these projects and a resume/CV to smaity@mst.edu
Ideal candidates are proficient in Python or R, have an interest in data science, and are eager to learn new skills.
You will gain experience with data science and meta-science methods, help advance the research, and have the opportunity to contribute to academic publications and open-source software.
Project Option 1: Understanding Biases in Large Language Models (LLMs)
Description-
Large Language Models (LLMs) such as GPT-3.5 and GPT-4 have demonstrated remarkable capabilities in natural language processing tasks and have recently gained significant attention. For instance, ChatGPT reached over 100 million monthly active users within just two months of its launch, making it one of the fastest-growing consumer internet applications in history. While these models hold great promise for enhancing productivity and stimulating creativity, they are not without pitfalls. One glaring concern is their propensity to reproduce biases ingrained in their training data, potentially resulting in harmful and inequitable outputs. We plan to conduct an in-depth analysis of biases inherent in LLMs, which involves identifying and categorizing biases, understanding their origins, and assessing their impact on various demographic groups. Some prior research has already addressed biases in LLMs along various dimensions, so this project will conduct a comprehensive survey of the existing literature and provide a comparative analysis of biases across publicly accessible models (ChatGPT, Bard, etc.). Additionally, we plan to examine how the evolution of these models over time mitigates or exacerbates these biases.
Project Option 2: Building a Comprehensive Dataset for Mentor-Mentee Relationships and Career Trajectories of Computer Scientists
Description-
Mentorship plays a vital role in scientific pursuits, influencing critical aspects such as the selection of research topics, career decisions, and the overall achievements of both mentees and mentors. Conventionally, scholars investigating mentorship rely on datasets derived from article co-authorship and doctoral dissertations. However, existing datasets of this nature tend to focus narrowly on specific fields, omitting interactions in the early stages of careers and those unrelated to formal publications. In this project, we want to develop a comprehensive mentorship dataset tailored to the domain of computer science. To achieve this, we plan to use diverse, publicly accessible datasets, including OpenAlex (the successor to the Microsoft Academic Graph) for publication records, ProQuest for doctoral dissertations, and the Academic Family Tree. We also want to curate publicly available information from other sources (CVs, LinkedIn profiles, etc.). To enrich the dataset, we plan to incorporate semantic representations of research produced by state-of-the-art LLMs. Given the increasing significance of gender and race in scrutinizing disparities within science, we also aim to provide estimates of these demographic attributes. Our strategy includes validating the accuracy of matching profiles to publications, ensuring the fidelity of the semantic content representations, and confirming the precision of demographic inferences. Ultimately, we aim to unravel the role that mentorship plays in shaping scientists' careers.
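One simple way to seed such a dataset from publication records is to flag co-author pairs in which a much more senior author repeatedly publishes with someone early in their career. The Python sketch below illustrates that heuristic; it is hypothetical, not the project's method. The `Paper` record, the thresholds (a five-year early-career window, eight years of seniority, two joint papers), and the use of an author's first publication year as a proxy for career start are all assumptions, and real data would first require matching author profiles to publications.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Paper:
    year: int
    authors: List[str]  # assumed to be already disambiguated author IDs


def career_start(papers: List[Paper]) -> Dict[str, int]:
    """Proxy for career start: each author's earliest publication year."""
    start: Dict[str, int] = {}
    for p in papers:
        for a in p.authors:
            start[a] = min(start.get(a, p.year), p.year)
    return start


def candidate_mentorships(papers: List[Paper], early_window: int = 5,
                          min_seniority: int = 8,
                          min_joint: int = 2) -> Set[Tuple[str, str]]:
    """Return hypothetical (mentor, mentee) pairs from early co-authorship."""
    start = career_start(papers)
    joint: Dict[Tuple[str, str], int] = defaultdict(int)
    for p in papers:
        for mentee in set(p.authors):
            if p.year - start[mentee] >= early_window:
                continue  # only the mentee's first years count
            for mentor in set(p.authors):
                if mentor != mentee and start[mentee] - start[mentor] >= min_seniority:
                    joint[(mentor, mentee)] += 1
    # Keep pairs that co-authored often enough during the mentee's early career.
    return {pair for pair, n in joint.items() if n >= min_joint}
```

Pairs flagged this way are only candidates; dissertation records such as those from ProQuest or the Academic Family Tree could then be used to confirm or reject them.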
Project Option 3: Understanding Citation Imbalances and Gendered Citation Practices
Description-
Science has long shown large gender imbalances in academic participation. Such inequalities have also been found in compensation, grant funding, hiring and promotion, authorship, and citations. Despite recent progress in these areas, disparities in scholarly engagement may result in long-term inequities elsewhere. This imbalance can be attributed to the ‘Matilda effect’, in which men’s contributions are seen as more central and valuable, whereas women’s contributions are underappreciated and underdiscussed. The study of citation dynamics is important for understanding and addressing biases in science because of the potential downstream effects of inequitable engagement with women-led and men-led work. In this project, we are interested in studying how citation imbalances might be amplified or reduced by online visibility. We plan to use the Altmetric dataset to measure online visibility. Our long-term goal is to understand the role that social media (Twitter) plays in citation dynamics.
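A common starting point for measuring citation imbalance is to compare the observed share of citations going to each author-gender category with the share expected under some baseline. The Python sketch below computes that relative over- or under-citation; it is an illustrative assumption, not the project's method. The category labels (for example "WM" for a woman first author and a man last author), the expected shares, and the function name are hypothetical, and choosing the baseline is itself a key modeling decision.

```python
from collections import Counter
from typing import Dict, List


def citation_imbalance(cited: List[str],
                       expected_share: Dict[str, float]) -> Dict[str, float]:
    """Relative over- (+) or under- (-) citation per category.

    cited: one category label per cited paper (hypothetical labels such as
        "MM" = man first author and man last author).
    expected_share: baseline proportion of citations each category would
        receive absent bias; values should be positive and sum to 1.
    """
    if not cited:
        raise ValueError("need at least one citation")
    counts = Counter(cited)  # missing categories count as 0
    total = len(cited)
    # (observed share - expected share) / expected share
    return {cat: (counts[cat] / total - share) / share
            for cat, share in expected_share.items()}
```

For example, if 6 of 10 citations go to "MM" papers when the baseline expects 5, that category is over-cited by 20%. Following the project's framing, the same calculation could be repeated separately for cited papers with high and low online attention (for example, grouped by Altmetric score) to see whether visibility narrows or widens the gap.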