College of Engineering and Applied Science at UC Denver

Laboratories

Big Data Management and Mining Laboratory (BDLab)


At the Big Data Management and Mining Laboratory (BDLab), we organize our research and education around two tracks: a Data Science track and a Data Management and Mining track. In the Data Science track, we engage with real-world problems that can benefit from data-driven solutions (covering all components of the data science life cycle), given various combinations of the Big Data 5V challenges. Toward this end, we have gained experience with a number of data-driven decision-making systems (DDSs) from various application areas, such as health informatics, oil recovery optimization, real-time surveillance, intelligent transportation, and scientific computing. The Data Science track complements the Data Management and Mining track by providing practical real-world problems, which we generalize, formalize, and rigorously study as novel data management and mining problems. In particular, we have a special interest in the following areas (among others): spatiotemporal data management and mining, graph data management and mining, high-throughput data management and mining using modern hardware, and next-generation database engines (NewSQL). Our research at BDLab has been supported by grants from both government agencies (NSF/CENS, NIH/CTSI, DOT/METRANS, DOJ/NIJ, NASA/JPL) and industry (Google, IBM, Chevron, NGC).

For more information, visit the BDLab website.

For research and education purposes, BDLab is equipped with an extensive computing platform consisting of numerous data management and mining software tools as well as supporting equipment, namely a PowerEdge R920 database server (2x Intel Xeon E7-4820 v2 processors, 16GB RDIMM memory, and 4TB of SSD and SAS storage) and five XPS 8700 workstations, each with 16GB dual-channel DDR3 memory and a 2TB SATA HDD. More importantly, the lab is staffed with students who are skilled and knowledgeable in both data science and data management/mining.

  • Farnoush Banaei-Kashani (Faculty)

  • Mrutunjayya (MJay) Fnu (MS Student)

  • Shahab Helmi (PhD Student)

  • Ashkan Malekloo (PhD Student)

  • Zohreh Raghebi (PhD Student)

Activity Pattern Mining from Movement/Trajectory Datasets 

Real-time detection and monitoring of activities based on movement data derived from numerous and varied data sources (namely, video, imagery, text, and sensor data) are essential enablers for many applications, such as multi-INT and intelligent transportation. With this project, we research and develop analytical tools for detecting incidents (i.e., simple events) from individual data sources of various modalities independently. The detected incidents are then stored, indexed, and cross-referenced using novel spatiotemporal index structures, and queried on a common time-space coordinate system to identify activities (i.e., complex events) based on the data received from all sources. The key justification for using content for incident detection and spatiotemporal features for cross-referencing is that, while computers can extract incidents from individual data streams quite efficiently, relating these incidents based on their content has proven to be hard, particularly when events occur in large spatial and temporal contexts.
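
The cross-referencing step described above can be illustrated with a small sketch. It is not BDLab's index structure: it swaps in a plain uniform space-time grid, and the `Incident` fields, the `GridIndex` class, and all labels and thresholds are hypothetical names chosen for illustration. It assumes every detector reports incidents in one shared planar coordinate system and on one shared clock.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Incident:
    source: str  # modality, e.g. "video" or "sensor" (illustrative labels)
    label: str   # what the per-source detector reported
    x: float     # position in a shared planar coordinate system (meters)
    y: float
    t: float     # timestamp on a shared clock (seconds)


class GridIndex:
    """Uniform space-time grid: a simple stand-in for a spatiotemporal index."""

    def __init__(self, cell_size, time_slot):
        self.cell_size = cell_size
        self.time_slot = time_slot
        self.cells = defaultdict(list)

    def _key(self, inc):
        return (int(inc.x // self.cell_size),
                int(inc.y // self.cell_size),
                int(inc.t // self.time_slot))

    def insert(self, inc):
        self.cells[self._key(inc)].append(inc)

    def nearby(self, inc, radius, max_dt):
        """Incidents from other sources within `radius` and `max_dt` of `inc`."""
        cx, cy, ct = self._key(inc)
        # Number of neighboring cells to scan in each direction.
        rs = int(radius // self.cell_size) + 1
        rt = int(max_dt // self.time_slot) + 1
        hits = []
        for dx, dy, dt in product(range(-rs, rs + 1),
                                  range(-rs, rs + 1),
                                  range(-rt, rt + 1)):
            for other in self.cells.get((cx + dx, cy + dy, ct + dt), ()):
                if other.source == inc.source:
                    continue  # only relate incidents across sources
                close = (other.x - inc.x) ** 2 + (other.y - inc.y) ** 2 <= radius ** 2
                if close and abs(other.t - inc.t) <= max_dt:
                    hits.append(other)
        return hits


index = GridIndex(cell_size=50.0, time_slot=60.0)
incidents = [
    Incident("video", "vehicle stops", 10.0, 10.0, 100.0),
    Incident("sensor", "door opens", 30.0, 20.0, 130.0),
    Incident("text", "report filed", 900.0, 900.0, 120.0),
]
for inc in incidents:
    index.insert(inc)

# Candidate components of one activity: incidents near the first one.
matches = index.nearby(incidents[0], radius=40.0, max_dt=60.0)
```

Here the incidents are related purely by where and when they occurred, never by comparing their content, which is the design choice the paragraph above argues for.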

Human Mobility Analysis

There is a critical need to develop monitoring technologies for the management of individuals aging with and into disabilities that affect mobility (e.g., cerebrovascular accident/stroke, spinal cord injury, aging-related balance and gait disorders, Parkinson’s disease, and traumatic brain injuries). With this project, we research and develop algorithms and tools for effective collection, analysis, and mining of multi-modal human mobility data acquired by various sensors (including 3D visual sensors such as Microsoft Kinect, as well as motion sensors). Our data collection algorithms focus on effective deployment of the data acquisition tools for accurate and efficient mobility data collection. Our analytics, in turn, aim to capture and mine the distinguishing signatures of mobility disorders from the multi-modal mobility data to enable automated 24/7 at-home monitoring.

Management and Mining of Large-Scale Dynamic Spatial Graphs 

With the advent of reliable positioning technologies and the prevalence of location-based services, it is now feasible to accurately study the propagation of items such as infectious viruses, sensitive pieces of information, and malware through a population of moving objects, e.g., individuals, mobile devices, and vehicles. In such application scenarios, an item passes between two objects when the objects are sufficiently close (i.e., when they are said to be in contact). Hence, once an item is initiated, it can penetrate the object population through the evolving network of contacts among objects, termed the contact graph. With this project, we study and develop solutions to manage and mine large (i.e., disk-resident) evolving contact graphs, which record the movement of a (potentially large) set of objects moving in a spatial environment over an extended time period.
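
To make the notion of a contact graph concrete, here is a minimal in-memory sketch. It is not the project's method: it uses brute-force pairwise distance checks on tiny snapshots, whereas the project targets disk-resident graphs. The function names, the snapshot layout, and the rule that an item travels at most one hop per time step are all assumptions made for illustration.

```python
from itertools import combinations
from math import hypot


def contacts_at(snapshot, contact_dist):
    """Return the set of object pairs that are in contact in one snapshot.

    `snapshot` maps an object id to its (x, y) position at one time step.
    """
    pairs = set()
    for (a, pa), (b, pb) in combinations(sorted(snapshot.items()), 2):
        if hypot(pa[0] - pb[0], pa[1] - pb[1]) <= contact_dist:
            pairs.add((a, b))
    return pairs


def build_contact_graph(snapshots, contact_dist):
    """Evolving contact graph as a list of edge sets, one per time step."""
    return [contacts_at(s, contact_dist) for s in snapshots]


def reachable(contact_graph, source, start=0):
    """Objects an item can reach from `source`, starting at step `start`.

    Assumption: the item moves at most one hop per time step.
    """
    carriers = {source}
    for edges in contact_graph[start:]:
        newly = set()
        for a, b in edges:
            if a in carriers:
                newly.add(b)
            if b in carriers:
                newly.add(a)
        carriers |= newly
    return carriers


snapshots = [
    {"a": (0, 0), "b": (1, 0), "c": (10, 0)},  # a and b in contact
    {"a": (0, 0), "b": (9, 0), "c": (10, 0)},  # b and c in contact
    {"a": (0, 0), "b": (5, 0), "c": (20, 0)},  # no contacts
]
graph = build_contact_graph(snapshots, contact_dist=2.0)

from_a = reachable(graph, "a")  # a -> b at step 0, then b -> c at step 1
from_c = reachable(graph, "c")  # c -> b at step 1, but a-b contact was earlier
```

The two queries show why the time order of contacts matters: an item starting at `a` reaches all three objects, while one starting at `c` never reaches `a`, because the only contact between `a` and `b` happened before `b` became a carrier.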

 

High-Performance Data Management and Mining with Modern Hardware

Hardware technology has undergone major advancements over the past decade. The number of cores on a chip has grown rapidly, allowing abundant parallel processing, while modern storage devices (e.g., SSD and NVM) enable significantly more efficient data storage and access. With this project, we explore opportunities to leverage modern data processing and storage hardware to introduce novel and custom high-performance data management and mining solutions that go well beyond.

 

Next Generation Database Engines

The database field has been addressing problems of scale for some time, but the exponential growth in data volumes over the last decade has stumped even the traditional database providers. In other words, the products that the big database firms have been selling are not up to the task. Couple this with some interesting new hardware opportunities, and you get a moment in time in which many new and radical approaches to database management systems have been proposed. At BDLab, we study and develop such modern database engines, dubbed NewSQL data stores, which rely on distribution and parallelism to scale out. In particular, we focus on data stores that support custom data types such as spatiotemporal data, graph data, array data, and stream data, with common workloads including visualization, ML, and analytics.

Data Collection

  • F. Banaei-Kashani et al., “Monitoring Mobility Disorders at Home using 3D Visual Sensors and Mobile Sensors (demo paper)”, Wireless Health 2013.

  • H. Shirani-Mehr, F. Banaei-Kashani and C. Shahabi, “Users Plan Optimization for Participatory Urban Texture Documentation”, GeoInformatica, Vol. 17, Issue 1, January 2013.

  • F. Banaei-Kashani, H. Shirani-Mehr, B. Pan, N. Bopp, L. Nocera, C. Shahabi, “GeoSIM: A GeoSpatial Data Collection System for Participatory Urban Texture Documentation”, Special Issue of IEEE Data Engineering Bulletin on Spatial and Spatiotemporal Databases, June 2010.

  • H. Shirani-Mehr, F. Banaei-Kashani and C. Shahabi, “Efficient Viewpoint Assignment for Urban Texture Documentation”, ACMGIS 2009.

  • H. Shirani-Mehr, F. Banaei-Kashani and C. Shahabi, “Efficient Viewpoint Selection for Urban Texture Documentation”, GSN 2009.

Data Transfer

  • H. Shirani-Mehr, F. Banaei-Kashani and C. Shahabi, “A Case Study of Participatory Data Transfer for Urban Temperature Monitoring”, W2GIS 2011.

  • H. Shirani-Mehr, F. Banaei-Kashani and C. Shahabi, “Using Location-based Social Networks for Quality-Aware Participatory Data Transfer”, LBSN 2010 in conjunction with ACMGIS 2010.

  • F. Banaei-Kashani and C. Shahabi, “Case study: Scoop up data from peer-to-peer databases”, Book chapter in Handbook of Peer-to-Peer Networking (Eds.: X. Shen, H. Yu, J. Buford, M. Akon), Springer, March 2009.

  • F. Banaei-Kashani and C. Shahabi, “Fixed-precision approximate continuous aggregate queries in peer-to-peer databases (poster paper)”, ICDE 2008.

  • F. Banaei-Kashani and C. Shahabi, “Partial read from peer-to-peer databases”, Journal of Computer Communications, Vol. 31, No. 2, February 2008.

  • C. Shahabi and F. Banaei-Kashani, “Modeling peer-to-peer data networks under complex system theory”, International Journal of Computational Science and Engineering (IJCSE), Vol. 3, No. 2, November 2007.

  • F. Banaei-Kashani and C. Shahabi, “Partial selection query in peer-to-peer databases (poster paper)”, ICDE 2006.

  • F. Banaei-Kashani and C. Shahabi, “Searchable querical data networks”, International Workshop on Databases, Information Systems and Peer-to-Peer Computing (DBISP2P) in conjunction with VLDB'03, September 2003.

  • F. Banaei-Kashani and C. Shahabi, “Efficient flooding in power-law networks”, PODC 2003.

  • F. Banaei-Kashani and C. Shahabi, “Criticality-based analysis and design of unstructured peer-to-peer networks as complex systems”, 3rd International Workshop on Global and Peer-to-Peer Computing (GP2PC) in conjunction with CC-Grid, May 2003.

  • C. Shahabi and F. Banaei-Kashani, “Decentralized resource management for a distributed continuous media server”, IEEE Transactions on Parallel and Distributed Systems (IEEE TPDS), Vol. 13, No. 7, July 2002.

Data Preprocessing

  • F. Banaei-Kashani, M. Asghari, M. Rahmani, C. Shahabi, L. Brenskelle, “SDPF: A Framework for Online, Real-time Cleansing of Upstream Operating Data”, SPE Western Regional Meeting, April 2013.

Data Storage for Querying

  • P. Ghaemi, K. Shahabi, J. Wilson, F. Banaei-Kashani, “A Comparative Study of Two Approaches for Supporting Optimal Network Location Queries”, GeoInformatica, Vol. 18, Issue 2, April 2014.

  • Seyed Jalal Kazemitabar, F. Banaei-Kashani, Seyed Jalil Kazemitabar, Dennis McLeod, “Efficient Batch Processing of Proximity Queries by Optimized Probing”, ACMGIS 2013.

  • P. Ghaemi, K. Shahabi, J. Wilson, F. Banaei-Kashani, “Continuous Maximal Reverse Nearest Query on Spatial Networks”, ACMGIS 2012.

  • H. Shirani-Mehr, F. Banaei-Kashani and C. Shahabi, “Reachability Query in Large Evolving Contact Networks”, VLDB 2012.

  • U. Demiryurek, F. Banaei-Kashani, C. Shahabi and Anand Ranganathan, “Online Computation of Fastest Path in Time-Dependent Spatial Networks”, SSTD 2011.

  • A. Akdogan, U. Demiryurek, F. Banaei-Kashani and C. Shahabi, “Voronoi-based Geospatial Query Processing with MapReduce”, IEEE CloudCom 2010 [Best Paper Award].

  • P. Ghaemi, K. Shahabi, J. Wilson, F. Banaei-Kashani, “Optimal Network Location Queries”, ACMGIS 2010.

  • U. Demiryurek, F. Banaei-Kashani and C. Shahabi, “A Case for Time-Dependent Shortest Path Computation in Spatial Networks (poster)”, ACMGIS 2010.

  • B. Pan, U. Demiryurek, F. Banaei-Kashani and C. Shahabi, “Spatiotemporal Summarization of Traffic Data Streams”, IWGS 2010 in conjunction with ACMGIS 2010.

  • U. Demiryurek, F. Banaei-Kashani and C. Shahabi, “Efficient K-Nearest Neighbor Search in Time-Dependent Spatial Networks”, DEXA 2010.

  • L. Kazemi, F. Banaei-Kashani, C. Shahabi and R. Jain, “Efficient Approximate Visibility Query in Large Dynamic Environments”, DASFAA 2010.

  • U. Demiryurek, F. Banaei-Kashani and C. Shahabi, “TransDec: A Spatiotemporal Query Processing Framework for Transportation Systems (demo paper)”, ICDE 2010.

  • F. Banaei-Kashani and C. Shahabi, “Fixed-precision approximate continuous aggregate queries in peer-to-peer databases”, CollaborateCom 2010.

  • U. Demiryurek, B. Pan, F. Banaei-Kashani and C. Shahabi, “Temporal Modeling of Spatiotemporal Networks”, IWCTS 2009 in conjunction with ACMGIS 2009.

  • L. Nocera, A. Rihan, S. Xing, A. Khodaei, A. Khoshgozaran, C. Shahabi, F. Banaei-Kashani, “GeoDec: A Multi-Layered Query Processing Framework for Spatiotemporal Data (demo paper)”, ACMGIS 2009.

  • U. Demiryurek, F. Banaei-Kashani and C. Shahabi, “ER-CkNN: Efficient continuous nearest neighbor query in spatial networks using Euclidean Restriction”, SSTD 2009.

  • F. Banaei-Kashani and C. Shahabi, “Applications of sensor network data management”, Book chapter in Encyclopedia of Database Systems (Eds.: T. Ozsu and L. Liu), Springer, 2009.

  • C. Shahabi, M. Jahangiri and F. Banaei-Kashani, “ProDA: An end-to-end wavelet-based OLAP system for massive datasets”, IEEE Computer, Vol. 41, No. 4, April 2008.

  • F. Banaei-Kashani and C. Shahabi, “SWAM: A family of access methods for similarity-search in peer-to-peer data networks”, CIKM 2004.

  • F. Banaei-Kashani, C. Chen and C. Shahabi, “WSPDS: Web Services Peer-to-peer Discovery Service”, ISWS 2004.

Data Analysis and Mining

  • F. Banaei-Kashani, C. Shahabi and B. Pan, “Discovering Traffic Patterns in Traffic Sensor Data”, IWGS 2011 in conjunction with ACMGIS 2011.

  • C. Shahabi and F. Banaei-Kashani, “Efficient and anonymous web usage mining for web personalization”, INFORMS Journal on Computing Special Issue on Mining Web-Based Data for e-Business Applications, Vol. 15, No. 2, Spring 2003.

  • C. Shahabi and F. Banaei-Kashani, “A framework for efficient and anonymous web usage mining based on client-side tracking”, Book Chapter: Lecture Notes in Computer Science, Vol. 2356, 2001.

  • C. Shahabi, F. Banaei-Kashani, Y. Chen and D. McLeod, “Yoda: An accurate and scalable web-based recommendation system”, CoopIS 2001.

  • C. Shahabi, F. Banaei-Kashani, J. Faruque and A. Faisal, “Feature Matrices: A model for efficient and anonymous web usage mining”, ECWeb 2001.

  • C. Shahabi, F. Banaei-Kashani and J. Faruque, “A reliable, efficient, and scalable system for web usage data acquisition”, Workshop on Web Mining and Web Usage Analysis (WebKDD) in conjunction with KDD Conference, August 2001.

Data Visualization

  • C. Shahabi, F. Banaei-Kashani, A. Khoshgozaran and S. Xing, “GeoDec: A framework to effectively visualize and query geospatial data for decision-making”, IEEE Multimedia Magazine, July 2010.

  • C. Shahabi, F. Banaei-Kashani and K. Song, “On-the-fly Visualization of Scientific Geospatial Data Using Wavelets”, Microsoft e-Science Workshop, December 2008.

Courses

  • Introduction to Data Science

  • Database System Concepts (CSCI 3287)

  • Database Systems (CSCI 5999)

  • Data Mining and Analytics (CSCI 5702/7702)

  • Advanced Database Systems

  • Advanced Data Stores (CSCI 5800)