Multiclass Datasets, Their Predictions, and Their Visualization
Wallace Brown and Alexander Morrow with Kevin Winner
Senior, Computer Science
Sophomore, Computer Science

Many datasets contain a wealth of information. For example, a person may be described by their race, age, gender, income, marital status, nationality, and level of education. By analyzing this data, we can form educated and accurate predictions about individuals. We can, for instance, determine that a person with a particular race, age, nationality, and income is likely to be a college undergraduate. Our goal is to develop ways to visualize these predictions and the uncertainty associated with them. A scatterplot is a standard means of displaying two-dimensional information. However, displaying high-dimensional data (i.e., data that includes many attributes, such as age, race, and income) is significantly more challenging. We present a means of visualizing high-dimensional datasets and the predictive models derived from them, using existing dimension reduction techniques and novel glyph-based displays.
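As a rough illustration of what "uncertainty associated with a prediction" can mean for a multiclass model, the sketch below scores a single prediction by the Shannon entropy of its class probabilities. This is an assumption made for illustration only: the abstract does not say which uncertainty measure the project uses, and the function names here are hypothetical.

```python
import math


def prediction_entropy(probs):
    """Shannon entropy (in bits) of one multiclass prediction.

    `probs` holds one non-negative score per class. The scores are
    renormalized to sum to 1 before the entropy is computed.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("class scores must sum to a positive value")
    normalized = [p / total for p in probs]
    # Zero-probability classes contribute nothing to the entropy.
    return -sum(p * math.log2(p) for p in normalized if p > 0)


def normalized_uncertainty(probs):
    """Entropy rescaled to [0, 1] by dividing by log2(number of classes)."""
    num_classes = len(probs)
    if num_classes < 2:
        return 0.0  # a single class leaves nothing to be uncertain about
    return prediction_entropy(probs) / math.log2(num_classes)


# Hypothetical predictions over three classes.
confident = [0.90, 0.05, 0.05]
ambiguous = [0.40, 0.30, 0.30]
print(normalized_uncertainty(confident))
print(normalized_uncertainty(ambiguous))
```

A score near 0 means the model strongly favors one class, and a score near 1 means the classes are almost equally likely. A value like this could, for example, drive the size or color of a glyph in a two-dimensional display.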


In their own words:

We asked Wallace and Alexander a few questions about their research project, which was funded by an NSF EAGER award and an associated Research Experiences for Undergraduates (REU) supplement. Here's what they said:

Briefly summarize your research in non-technical terms: This project uses visualization both to simplify data and to discover new underlying trends in it. The tools and methods used to analyze data often require a high level of technical expertise to understand and implement. In this work, we developed a framework that simplifies data analysis, and we are exploring how it can best give users useful insight: helping them identify the inherent structure in the data and showing them where various forms of uncertainty exist.

When did you start this research? We began before the summer of 2011, after our interest in research led us to get involved with the MAPLE Lab. Once we had joined the project, we took summer positions to continue working on it.

Why is your research important? One of the most important aspects we are exploring is making tools more accessible and easier to use. In addition, the framework we have been developing for this research will eventually be released into the public domain. We hope it will be useful to people who want to process and explore large, complex datasets in an intuitive and user-friendly way.

What has been the biggest challenge so far? One of the biggest challenges has been finding ways to quantify results. Because we are working with a visualization, it can be apparent that a particular technique gives a user insight, yet difficult to quantify just how large or useful the improvement is. A user study is in development that will give us feedback on these techniques and help quantify the improvements.

The biggest reward? The excitement of being involved in research and education. It’s very interesting to develop experimental products with professors rather than work on established class material.

What advice would you give to students interested in pursuing undergraduate research? Talk to your professors often, put good effort into your courses, and let them know you’re putting in that effort. They may have opportunities for you if you stay in contact and develop a strong relationship with them. Additionally, don’t assume there is no place for you in research. Even if you know little about the area, there is usually a task that benefits both the project and your own understanding. These tasks are a great way to get started with research.


Don't forget to see Wallace and Alexander's presentation at URCAD on Wednesday, April 25, in University Center (UC) 312 from 1:15 p.m. to 1:30 p.m.