OURE Opportunity

Opportunities for Undergraduate Research Experiences

OURE Program Overview

The OURE program was established to expand opportunities for a more active form of learning and to encourage interaction between undergraduate students and faculty. It has helped expand the level of research activity on campus and has demonstrated that teaching and research are compatible and mutually reinforcing. The program also aims to help recruit superior students into our graduate programs.

Students participating in the OURE program should gain a foundational understanding of how research is conducted in their disciplines, along with a greater understanding of the information resources available and how to use them. They should also learn how to interpret research outcomes and the fundamentals of experimental design.

For more in-depth OURE information, please visit the OURE website.

OURE Application Process

First find a faculty sponsor:

Each listed research faculty member has a research web page:

https://cs.mst.edu/people/faculty-directory/

The student should read through the research web page of each faculty member they are interested in, then reach out to that faculty member to discuss the possibility of doing undergraduate research with them.

Each participant and faculty sponsor is to:

Discuss their project
Come to an agreement
A cumulative grade point average of 2.50 is required to participate in this program
The student must be a full-time student with at least 12 credit hours per semester
Submit the completed application

*The faculty sponsor agrees to supervise the work and certifies the educational value of the proposed project

Read Instructions Below Carefully - Closes May 1st -

After coming to an agreement about the research work with a faculty member, the student and advisor can fill out this application:

Click Here For Application

Previous OURE participants are eligible
Interested students should contact an OURE Departmental Coordinator
Late applications will not be accepted

*Projects start at the beginning of the Fall semester and end at the conclusion of the Spring semester each academic year

OURE: CS Department

Pick out one of our esteemed faculty members to get in contact with!

Dr. Arif Zaman

Developing Data Transfer Solutions for Next-Generation High-Performance Networks

Please send your inquiries for a project along with a resume/CV to marifuzzaman@mst.edu

Research Background

Scientific applications and experiments in all areas of science are becoming increasingly network intensive as generated data needs to be moved to remote locations for processing, collaboration, and archival purposes. Despite the existence of high-speed research networks of up to 400 Gbps, users often experience performance issues, mainly due to the scalability limits of existing data transfer applications. Beyond performance, existing transfer applications also suffer from reliability issues that can cause corrupt data to be accepted as authentic. These problems motivated us to study the limitations of state-of-the-art methods and develop transformative solutions. So far, we have developed several transfer optimization algorithms that outperform existing solutions by up to an order of magnitude. To complement the data transfer optimization process, we have also developed network measurement and network performance anomaly detection algorithms.

Research Project: Multi-Objective Data Transfer Optimization

Our previously proposed solutions used black-box optimization at the application level, where the optimizer operates without access to or assumptions about system utilization statistics. While this approach offers high portability, allowing deployment on any system regardless of administrative privileges, it also requires continuous optimization of application parameters. We are now designing an optimization algorithm that incorporates real-time performance metrics from end hosts and networks. Lightweight monitoring agents will gather a wide range of system metrics, such as I/O and network throughput/availability, background traffic, I/O contention, and CPU and memory usage. These metrics will then be used to estimate the transfer parallelism that best balances performance and overhead. This data-driven approach aims to skip the time-consuming search for the optimal concurrency level by estimating it from the availability of system resources. Collecting real-time I/O and network usage metrics presents significant research and development challenges, yet it holds the promise of substantially accelerating the optimization process, which in turn would increase network utilization.
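To make the idea of estimating parallelism from system metrics concrete, here is a minimal sketch in Python. It is not the algorithm under development in this project: the `HostMetrics` fields, the `estimate_concurrency` function, and the "fill the bottleneck resource" heuristic are all hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class HostMetrics:
    """Hypothetical snapshot from a monitoring agent (all rates in Gbps)."""
    network_available_gbps: float  # link capacity left after background traffic
    io_available_gbps: float       # storage bandwidth left after I/O contention
    per_stream_gbps: float         # throughput one transfer stream achieves
    idle_cpu_cores: int            # cores not busy with other work


def estimate_concurrency(m: HostMetrics, max_streams: int = 64) -> int:
    """Estimate the fewest streams expected to fill the bottleneck resource."""
    if m.per_stream_gbps <= 0:
        raise ValueError("per_stream_gbps must be positive")
    # The slower of network and storage bounds the achievable throughput.
    bottleneck = min(m.network_available_gbps, m.io_available_gbps)
    needed = math.ceil(bottleneck / m.per_stream_gbps)
    # Cap by idle cores and a hard limit to bound overhead; use at least one stream.
    return max(1, min(needed, m.idle_cpu_cores, max_streams))


if __name__ == "__main__":
    snapshot = HostMetrics(
        network_available_gbps=40.0,
        io_available_gbps=25.0,
        per_stream_gbps=4.0,
        idle_cpu_cores=16,
    )
    print(estimate_concurrency(snapshot))
```

In this sketch the estimate comes from a single metrics snapshot, so no search over concurrency levels is needed. A real system would have to handle noisy and changing measurements.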

References

[1] Md Arifuzzaman and Engin Arslan. Online optimization of file transfers in high-speed networks.
In Proceedings of the International Conference for High Performance Computing, Networking,
Storage and Analysis, pages 1–13, 2021.
[2] Md Arifuzzaman and Engin Arslan. Use only what you need: Judicious parallelism for file transfers
in high performance networks. In Proceedings of the 37th ACM International Conference
on Supercomputing, 2023.
[3] Md Arifuzzaman, Brian Bockelman, James Basney, and Engin Arslan. Falcon: Fair and efficient
online file transfer optimization. IEEE Transactions on Parallel and Distributed Systems, 2023.
[4] Md Arifuzzaman and Engin Arslan. Swift and accurate end-to-end throughput measurements
for high speed networks. In The Network Traffic Measurement and Analysis Conference, 2022.
[5] Hemanta Sapkota, Md Arifuzzaman, and Engin Arslan. Sample transfer optimization with
adaptive deep neural network. In 2019 IEEE/ACM Innovating the Network for Data-Intensive
Science (INDIS), pages 69–76. IEEE, 2019.
[6] Md Arifuzzaman, Shafkat Islam, and Engin Arslan. Towards generalizable network anomaly
detection models. In 2021 IEEE 46th Conference on Local Computer Networks (LCN), pages
375–378, 2021.
[7] Md Arifuzzaman and Engin Arslan. Learning transfers via transfer learning. In 2021 IEEE
Workshop on Innovating the Network for Data-Intensive Science (INDIS), pages 34–43, 2021.
[8] Md Arifuzzaman, Masudul Bhuiyan, Mehmet Gumus, and Engin Arslan. Be smart, save i/o:
Probabilistic approach to avoid uncorrectable errors in storage systems. In IEEE International
Conference on Cluster Computing, 2022.

Dr. Sanjay Madria

Please send all inquiries along with a resume/CV to madrias@mst.edu.

Project Title: GPS-Free Localization, Object Tracking, and Path-Planning for Secure Navigation in GPS-Denied Battlefield Scenarios Using Computer Vision and Deep Learning

Description-

This project includes designing a path-planning algorithm for secure navigation of moving military forces in a GPS-denied environment, where GPS use is prohibited for tactical reasons or is only intermittently available due to unusual terrain, signal blockage, or interference. We will study the current state-of-the-art path-planning algorithms and design a novel algorithm for battlefield applications. A significant component of this work involves GPS-free localization of moving objects, which is crucial for tracking and navigation applications. We will therefore also study GPS-free object localization algorithms and design and implement a solution that is feasible for our application. In this project, we will use a stereo vision system and a deep learning model to build a landmark-based localization system. The model will detect and identify geographical landmark objects in the surroundings, and the stereo vision system will determine the distance to the detected landmarks. To achieve these objectives, we will incorporate our real-world landmark datasets and enhance them by gathering, processing, and labeling new image data of real-world landmarks. This comprehensive approach aims to devise a practical solution that addresses the unique challenges of navigating in GPS-denied environments.
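As background on how a stereo vision system can yield distance, the standard relation for a rectified stereo pair is Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity of the landmark between the left and right images. The sketch below assumes an idealized, rectified pinhole camera pair. The function name and the example values are illustrative and are not taken from the project.

```python
def stereo_distance_m(focal_length_px: float, baseline_m: float, disparity_px: float) -> float:
    """Distance to a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity; negative is invalid.
        raise ValueError("disparity_px must be positive")
    return focal_length_px * baseline_m / disparity_px


if __name__ == "__main__":
    # Illustrative values: 800 px focal length, 12.5 cm baseline, 10 px disparity.
    print(stereo_distance_m(800.0, 0.125, 10.0))
```

Because distance is inversely proportional to disparity, a small disparity error on a far-away landmark produces a large distance error. This is one reason the baseline and image resolution matter in such a system.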

Dr. Satish Puri

Project: Nearest Neighbor Similarity Search for Shapes like Polygons and Trajectories

Please send all inquiries along with a resume to satish.puri@mst.edu.

Description-

Similarity search is a critical task in data mining. Nearest neighbor similarity search over geometrical shapes (polygons and trajectories) is used in various domains such as digital pathology, solar physics, and geospatial intelligence. Given a geometry/shape as a query object, the goal is to return approximately similar shapes from an input collection of billions of objects with low latency. Given the ever-increasing size of datasets, exact nearest neighbor searches requiring a scan of the entire dataset quickly become impractical, leading to approximate nearest neighbor searches. Traditional methods, such as tree-based indexes, suffer from the constraints of dimensionality. Approximate similarity search is required for scalability in processing large numbers of queries, for index construction over big spatial data, and to address the dynamic nature of the data itself.

Shape similarity is guided by a distance metric called Jaccard distance. In geospatial intelligence, similarity search is used to geo-locate a shape or a contour in global reference datasets. The current literature, while rich in methods for textual and image datasets, is lacking for geometric datasets. This project will develop scalable similarity search systems on polygonal and trajectory datasets.
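For two shapes A and B, the Jaccard distance is 1 - area(A ∩ B) / area(A ∪ B). The following minimal sketch computes it for axis-aligned rectangles only, a simplifying assumption that keeps the intersection trivial. General polygons would need a geometry library. The `Rect` type and the `nearest` helper are hypothetical, and the exact linear scan in `nearest` is precisely the approach that becomes impractical at the scale described above.

```python
from typing import NamedTuple, Sequence


class Rect(NamedTuple):
    """Axis-aligned rectangle, used here as a stand-in for a general polygon."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def area(r: Rect) -> float:
    return max(0.0, r.xmax - r.xmin) * max(0.0, r.ymax - r.ymin)


def jaccard_distance(a: Rect, b: Rect) -> float:
    """1 - |A ∩ B| / |A ∪ B|, measured by area."""
    overlap_w = min(a.xmax, b.xmax) - max(a.xmin, b.xmin)
    overlap_h = min(a.ymax, b.ymax) - max(a.ymin, b.ymin)
    inter = max(0.0, overlap_w) * max(0.0, overlap_h)
    union = area(a) + area(b) - inter
    if union == 0.0:
        return 0.0  # two empty shapes are treated as identical by convention
    return 1.0 - inter / union


def nearest(query: Rect, shapes: Sequence[Rect]) -> Rect:
    """Exact nearest neighbor by linear scan (baseline only, O(n) per query)."""
    return min(shapes, key=lambda s: jaccard_distance(query, s))


if __name__ == "__main__":
    q = Rect(0, 0, 2, 2)
    candidates = [Rect(5, 5, 6, 6), Rect(0, 0, 2, 3), Rect(1, 1, 3, 3)]
    print(nearest(q, candidates))
```

A distance of 0 means the shapes coincide and a distance of 1 means they do not overlap at all.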

Background needed: Data Structures and Algorithms, Programming in C/C++/Python/Java.

Dr. Suman Maity

Please send inquiries for these projects and a resume/CV to smaity@mst.edu

Ideal candidates are proficient in Python/R, have an interest in data science, and are eager to learn new skills.
You will gain experience with methods in data science and meta-science, advance our understanding, and have the opportunity to contribute to academic publications and open-source software.

Project Option 1: Understanding Biases in Large Language Models (LLMs)

Description-

Large Language Models (LLMs) like GPT-3.5 and GPT-4 have demonstrated remarkable capabilities in natural language processing tasks and have gained significant attention in recent times. For instance, ChatGPT amassed over 100 million monthly active users within a mere two months of its launch, making it one of the fastest-growing consumer internet applications in history. While these formidable models hold great promise in enhancing productivity and stimulating creativity, they are not without their pitfalls. One glaring concern is their propensity to manifest biases ingrained in the training data, potentially resulting in harmful and inequitable outputs. We plan to conduct an in-depth analysis of biases inherent in LLMs. This involves identifying and categorizing biases, understanding their origins, and assessing their impact on various demographic groups. It is important to note that some prior research has already addressed biases in LLMs along various dimensions. This project will conduct a comprehensive survey of the existing literature and provide a comparative analysis of the biases among various publicly accessible models (ChatGPT, Bard, etc.). Additionally, we plan to discern how the temporal evolution of these models contributes to the mitigation or exacerbation of these biases.

Project Option 2: Building Comprehensive Dataset for Mentor-Mentee Relationship and Career Trajectory of Computer Scientists

Description-

Mentorship plays a vital role in the realm of scientific pursuits, influencing critical aspects such as the selection of research topics, career trajectory decisions, and the overall achievements of both mentees and mentors. Conventionally, scholars investigating mentorship lean on datasets derived from article co-authorship and doctoral dissertations. However, the existing datasets of this nature tend to have a narrow focus on specific fields, omitting interactions in the early stages of careers and those unrelated to formal publications. In this project, we want to develop a comprehensive mentorship dataset tailored to the domain of computer science. To achieve this, we plan to use diverse, publicly accessible datasets, including the Microsoft Academic Graph (now succeeded by OpenAlex) for publication records, ProQuest for doctoral dissertations, and the Academic Family Tree. Further, we want to curate publicly available information from various sources (CVs, LinkedIn profiles, etc.). To enhance the richness of our dataset, we plan to incorporate semantic representations of research by leveraging state-of-the-art representations provided by LLMs. Given the increasing significance of gender and race dimensions in scrutinizing disparities within the scientific realm, we also aim to provide estimations concerning these aspects. Our strategy encompasses validating the accuracy of matching profiles to publications, ensuring the fidelity of semantic content representation, and confirming the precision of demographic inferences. In essence, we are driven by the curiosity to unravel the intricate role that mentorship plays in shaping the trajectories of scientists' careers.

Project Option 3: Understanding Citation Imbalances and Gendered Citation Practices

Description-

Science has been experiencing vast gender imbalances in academic participation. Such inequalities have also been found in compensation, grant funding, hiring and promotions, authorship, and citations. Despite recent progress in these areas, the presence of disparities in scholarship engagement may result in long-term inequities in other areas. This imbalance can be attributed to the ‘Matilda effect’, in which men’s contributions are seen as more central and valued, whereas women's contributions are underappreciated and under-discussed. The study of citation dynamics is an important endeavor for understanding and addressing biases in science because of the potential downstream effects of inequitable engagement with women-led and men-led work. In this project, we are interested in studying how citation imbalances might be amplified or reduced due to online visibility. We plan to leverage the Altmetric dataset to measure online visibility. Our long-term goal is to understand the role social media (Twitter) plays in citation dynamics.

Learn About the Project Mentor

Suman Kalyan Maity joined the Department of Computer Science at Missouri University of Science and Technology as an Assistant Professor in Fall 2023. Prior to this, he was a postdoctoral research associate at the MIT Brain and Cognitive Sciences (BCS) and MIT Center for Research on Equitable and Open Scholarship (CREOS), hosted by Prof. Roger Levy. He received his PhD in Computer Science and Engineering from the Indian Institute of Technology Kharagpur. He was also the recipient of the IBM PhD Fellowship and the Microsoft Research India PhD Fellowship Award. His research interests lie in the interdisciplinary area of Social Data Science, where he investigates the mechanisms and dynamics of complex social systems. His research on social media focuses on understanding how various sociolinguistic phenomena emerge and are adopted, used, and misused. He has contributed to the detection of hate content and misinformation and to the adoption of mitigation strategies to stop the spread of negativity on social media platforms. He is also interested in the broad area of Science of Science, where he studies the global landscape of scientific research funding, publications, and their interplay; risk-taking behavior; the impact of success or failure on professional careers; and various aspects of Open Science Communication: citation bias and inclusivity, Open Peer Review, etc.
