Uncertain Spatial Data Mining


Problem

Trajectories are sequences of (ObjectID, Location, Time) triples. In most applications, trajectories are sparse and uncertain for many reasons:

  • Locations may be captured only at discrete places, such as check-ins in a location-based social network or RFID sensor readings,

  • Location updates may be infrequent to preserve the battery of sensors,

  • Captured locations may be inaccurate due to sensor uncertainty (such as in GPS readings),

  • Uncertainty may be added deliberately to preserve the privacy of users.

User Check-ins in Los Angeles.

Dataset provided by: E. Cho, S. A. Myers and J. Leskovec. Friendship and Mobility: User Movement in Location-Based Social Networks. SIGKDD 2011.

Challenge

Several models exist to capture this uncertainty. Discrete and continuous models describe the uncertain location of an object using a finite or infinite set of alternatives, respectively, called possible worlds.
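As a minimal sketch of the discrete model, the following Python snippet enumerates possible worlds for a toy dataset. The object names, locations, and probabilities are made up for illustration. The snippet assumes that objects are independent and takes a possible world to be one choice of alternative per object, so the probability of a world is the product of the chosen alternatives' probabilities.

```python
from itertools import product

# Hypothetical discrete uncertain objects: each object maps to a list of
# (location, probability) alternatives whose probabilities sum to 1.
uncertain_objects = {
    "a": [((0.0, 0.0), 0.5), ((1.0, 0.0), 0.5)],
    "b": [((0.0, 1.0), 0.25), ((5.0, 5.0), 0.75)],
}


def possible_worlds(objects):
    """Yield (world, probability) pairs, assuming independent objects."""
    ids = sorted(objects)
    # One alternative per object; the Cartesian product covers all worlds.
    for combo in product(*(objects[i] for i in ids)):
        world = {i: loc for i, (loc, _) in zip(ids, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        yield world, prob


worlds = list(possible_worlds(uncertain_objects))
for world, prob in worlds:
    print(world, prob)
```

The number of possible worlds grows exponentially with the number of objects, so exhaustive enumeration like this is only feasible for very small examples.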

The goal of mining uncertain data is to use uncertainty information directly in the mining process, so that data mining results are enriched with probabilities. For example: What are the possible clusters of an uncertain dataset? What is the probability of each possible cluster?
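The two questions above can be illustrated with a small, brute-force Python sketch. It is not one of the methods from the publications listed on this page. It assumes independent objects with discrete alternatives, and it uses a simple stand-in clustering (connected components of the graph linking objects within distance eps). The data, the `eps` value, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import product
from math import dist

# Hypothetical uncertain dataset: object "b" is near "a" or near "c",
# each with probability 0.5; "a" and "c" have certain locations.
objects = {
    "a": [((0.0, 0.0), 1.0)],
    "b": [((1.0, 0.0), 0.5), ((9.0, 0.0), 0.5)],
    "c": [((10.0, 0.0), 1.0)],
}


def cluster(world, eps):
    """Cluster one world: connected components of the eps-distance graph."""
    ids = sorted(world)
    parent = {i: i for i in ids}

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for x in ids:
        for y in ids:
            if x < y and dist(world[x], world[y]) <= eps:
                parent[find(y)] = find(x)
    groups = defaultdict(list)
    for i in ids:
        groups[find(i)].append(i)
    return frozenset(frozenset(g) for g in groups.values())


def clustering_probabilities(objects, eps):
    """Sum the probabilities of all possible worlds per resulting clustering."""
    ids = sorted(objects)
    result = defaultdict(float)
    for combo in product(*(objects[i] for i in ids)):
        world = {i: loc for i, (loc, _) in zip(ids, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p
        result[cluster(world, eps)] += prob
    return dict(result)


probs = clustering_probabilities(objects, eps=2.0)
for clustering, prob in probs.items():
    print(sorted(sorted(c) for c in clustering), prob)
```

In this toy dataset there are two possible clusterings, {a, b} with {c} and {a} with {b, c}, each with probability 0.5. Because the sketch enumerates every possible world, its cost is exponential in the number of uncertain objects; realistic datasets would need sampling or pruning instead.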

Prior Work

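Data mining and query processing problems that I've studied on uncertain data in the past include:

  • Probabilistic frequent itemset mining [1], [2], [3],

  • Clustering of uncertain data [4], [5],

  • Probabilistic nearest neighbor queries [6], [7],

  • Probabilistic similarity ranking [8],

  • Probabilistic reverse nearest neighbor queries [9], [10].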

For more details, please see my tutorials on uncertain data management and mining, presented and published at top conferences (ICDE [11], [12], VLDB [13], MDM [14]).

Research Directions

The number of publications above (and this is just a small sample of top-tier publications) shows that uncertain data mining is a great field to publish in. Still, I think the most promising direction for publications (and citations) is uncertain clustering and outlier detection. A particularly important direction is to enhance clustering and outlier detection results with p-values that assess the likelihood that these clusters or outliers are spurious (false positives).

Funding

This project has no extramural funding agency; it is funded through my own funds. Funding is available for 1-2 PhD students.

[1] Bernecker, T., Kriegel, H.P., Renz, M., Verhein, F. and Züfle, A., 2009, June. Probabilistic frequent itemset mining in uncertain databases. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 119-128).

[2] Bernecker, T., Cheng, R., Cheung, D.W., Kriegel, H.P., Lee, S.D., Renz, M., Verhein, F., Wang, L. and Züfle, A., 2013. Model-based probabilistic frequent itemset mining. Knowledge and Information Systems, 37(1), pp.181-217.

[3] Bernecker, T., Kriegel, H.P., Renz, M., Verhein, F. and Züfle, A., 2012, June. Probabilistic frequent pattern growth for itemset mining in uncertain databases. In International Conference on Scientific and Statistical Database Management (pp. 38-55). Springer, Berlin, Heidelberg.

[4] Schubert, E., Koos, A., Emrich, T., Züfle, A., Schmid, K.A. and Zimek, A., 2015. A framework for clustering uncertain data. Proceedings of the VLDB Endowment, 8(12), pp.1976-1979.

[5] Züfle, A., Emrich, T., Schmid, K.A., Mamoulis, N., Zimek, A. and Renz, M., 2014, August. Representative clustering of uncertain data. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 243-252).

[6] Niedermayer, J., Züfle, A., Emrich, T., Renz, M., Mamoulis, N., Chen, L. and Kriegel, H.P., 2013. Probabilistic nearest neighbor queries on uncertain moving object trajectories. arXiv preprint arXiv:1305.3407.

[7] Zhang, P., Cheng, R., Mamoulis, N., Renz, M., Züfle, A., Tang, Y. and Emrich, T., 2013, April. Voronoi-based nearest neighbor search for multi-dimensional uncertain databases. In 2013 IEEE 29th International Conference on Data Engineering (ICDE) (pp. 158-169). IEEE.

[8] Bernecker, T., Kriegel, H.P., Mamoulis, N., Renz, M. and Züfle, A., 2010. Scalable probabilistic similarity ranking in uncertain databases. IEEE Transactions on Knowledge and Data Engineering, 22(9), pp.1234-1246.

[9] Bernecker, T., Emrich, T., Kriegel, H.P., Renz, M., Zankl, S. and Züfle, A., 2011. Efficient probabilistic reverse nearest neighbor query processing on uncertain data. Proceedings of the VLDB Endowment, 4(10), pp.669-680.

[10] Emrich, T., Kriegel, H.P., Mamoulis, N., Niedermayer, J., Renz, M. and Züfle, A., 2014, April. Reverse-nearest neighbor queries on uncertain moving object trajectories. In International Conference on Database Systems for Advanced Applications (pp. 92-107). Springer, Cham.

[11] Cheng, R., Emrich, T., Kriegel, H.P., Mamoulis, N., Renz, M., Trajcevski, G. and Züfle, A., 2014, March. Managing uncertainty in spatial and spatio-temporal data. In 2014 IEEE 30th International Conference on Data Engineering (pp. 1302-1305). IEEE.

[12] Züfle, A., Trajcevski, G., Pfoser, D., Renz, M., Rice, M.T., Leslie, T., Delamater, P. and Emrich, T., 2017, April. Handling uncertainty in geo-spatial data. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE) (pp. 1467-1470). IEEE.

[13] Renz, M., Cheng, R., Kriegel, H.P., Züfle, A. and Bernecker, T., 2010. Similarity search and mining in uncertain databases. Proceedings of the VLDB Endowment, 3(1-2), pp.1653-1654.

[14] Züfle, A., Trajcevski, G., Pfoser, D. and Kim, J.S., 2020, June. Managing uncertainty in evolving geo-spatial data. In 2020 21st IEEE International Conference on Mobile Data Management (MDM) (pp. 5-8). IEEE.