Big Data

Cultural Activity Diversity and Community Characteristics: An Exploratory Study

Cultural diversity has been conceptualized and studied in diverse ways. On the one hand, cultural diversity can be conceptualized based on people’s ethnic and national backgrounds. On the other hand, cultural dimensions are defined based on individuals' behaviors and traits. Sociologists further categorize the latter depending on the degree of typicality in cultural artifacts/activities and individuals’ omnivorousness over cultural tastes.

Local Information Landscapes: Theory, Measures, and Evidence

To understand information accessibility within communities, researchers have taken a sociotechnical view and examined human, social, and technical factors. While this view provides a profound understanding of how people seek, use, and access information, it tends to overlook the impact of the larger structures of information landscapes that constantly shape people's access to information.

Virtual Observatory of Innovation Communities and Ecosystems (VOICE): Advancing Big Data with Ecology Theory and Data Science

Today's digital revolution is fostering remarkable innovations. The landscape of innovation is rapidly changing and thus difficult to navigate or study. Understanding broader participation in innovation requires large datasets on the innovation participants and their activities. As organizational and industry boundaries become fuzzy in the digitized world, small data analysis cannot fully explain how boundaries shift and evolve. Open innovation leads to overlapping roles of designer and user, making it inadequate to examine innovation development and adoption separately.

Making Information Deserts Visible: Computational Models, Disparities in Civic Technology Use, and Urban Decision Making

This research will develop a foundational tool for understanding how civic technologies are used and how information inequalities manifest in a city. User data from new civic technologies, which reveals inequalities in the information environments of citizens, has only recently become available. Since a large portion of this data is demographically or geospatially biased due to varying human-data relationships, computational social scientists have used data modeling and algorithmic techniques to adjust the data and remove biases during data processing.

A Tool for Estimating and Visualizing Poverty Maps

"Poverty maps" are designed to simultaneously display the spatial distribution of welfare and different dimensions of poverty determinants. The plotting of such information on maps heavily relies on data that is collected through infrequent national household surveys and censuses. However, due to the high cost associated with this type of data collection process, poverty maps are often inaccurate in capturing the current deprivation status.

MapReduce Framework for Swarm Robot Systems

This was part of the research projects conducted under the Basic Research Laboratory Grant from the Ministry of Education, Science, and Technology in South Korea. The project is twofold: (1) simulating an application of swarm robot systems; and (2) designing a software framework for swarm robot systems that reduces the complexity of developing applications while minimizing the amount of transmitted data by adopting the MapReduce paradigm. The video above is a simulation of a swarm robot system application that searches for red pillars (foraging).
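As a rough illustration of how a MapReduce-style design can cut down on transmitted data in the foraging scenario, the sketch below has each robot map its own sensor log to (cell, count) pairs and combine them locally before anything is sent. It is a minimal sketch, not the project's framework: the function names, the grid-cell keys, and the colour labels are all hypothetical.

```python
from collections import Counter


def map_observations(observations):
    """Map step, run on a single robot (hypothetical log format).

    Each observation is a (cell, colour) pair; emit (cell, 1) for every
    red-pillar sighting and drop everything else.
    """
    for cell, colour in observations:
        if colour == "red":
            yield cell, 1


def combine(pairs):
    """Local combiner: pre-aggregate on the robot before transmission,
    so one count per cell is sent instead of one pair per sighting."""
    counts = Counter()
    for key, value in pairs:
        counts[key] += value
    return counts


def reduce_counts(partials):
    """Reduce step: merge the partial counts received from all robots."""
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts together
    return totals
```

A base station (or an elected robot) would call `reduce_counts` on the partial counters it receives, for example `reduce_counts(combine(map_observations(log)) for log in robot_logs.values())`, where `robot_logs` is an assumed mapping from robot id to its observation list.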

Large-scale News Image Analysis with MapReduce-based LSH and VisualRank

Hao Li (a Ph.D. student in CS) and I conducted a big-data analysis project using the MapReduce framework (Hadoop) as the final project for INFM718G (Data-Intensive Computing with MapReduce, taught by Dr. Jimmy Lin). Targeting all the news images in April 2013, we tried to rank news images by their importance and popularity. To do so, we extracted image features using SIFT (Scale-Invariant Feature Transform) and constructed a graph of images using LSH (Locality-Sensitive Hashing) as a means of approximating the similarity of images.
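The pipeline described above (features, then an LSH-built similarity graph, then a ranking over that graph) can be sketched on a single machine as follows. This is a simplified illustration under several assumptions: each image is summarized by one feature vector (real SIFT output is a set of 128-dimensional descriptors per image), the LSH family is sign random projection, and the ranking is a PageRank-style power iteration in the spirit of VisualRank. The function names and parameters are hypothetical and do not reflect the Hadoop implementation.

```python
import random
from collections import defaultdict
from itertools import combinations


def lsh_signature(vec, hyperplanes):
    """Sign-random-projection signature: one bit per hyperplane."""
    return tuple(
        1 if sum(v * h for v, h in zip(vec, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_similarity_graph(features, num_bits=4, num_tables=8, seed=0):
    """Link two images if they share a bucket in at least one hash table."""
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    edges = defaultdict(set)
    for _ in range(num_tables):
        planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]
        buckets = defaultdict(list)
        for image_id, vec in features.items():
            buckets[lsh_signature(vec, planes)].append(image_id)
        for members in buckets.values():
            for a, b in combinations(members, 2):
                edges[a].add(b)
                edges[b].add(a)
    # Include isolated images so every image receives a rank.
    return {image_id: edges[image_id] for image_id in features}


def visual_rank(graph, damping=0.85, iterations=50):
    """PageRank-style power iteration over the undirected image graph."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by images with no neighbours is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new_rank = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank
```

Images that land in the same bucket as many others end up with more edges and therefore a higher rank, which is the intuition behind using graph centrality as a proxy for importance and popularity.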