Reverse Nearest Neighbor Query Processing


Problem

The k-Nearest Neighbor (kNN) Query returns the k closest (most similar) objects to a query object and is one of the most commonly queries used for data mining. For example, the k-means algorithm requires finding the (1-)nearest object of each cluster centroid and the DBSCAN algorithm requires checking, for each object, if the distance to the minPts-nearest neighbor is at most epsilon.

The Reverse k-Nearest Neighbor (RkNN) Query finds all those points that have a query point among their kNNs. Finding RKNNs is not trivial, as the nearest neighbor relationship is not symmetric. For example, in the figure above, p2 is the nearest neighbor of p1, but p2 is not the nearest neighbor of p1.

Answering RkNN queries efficiently is important for updating data mining tasks. For example, think of a DBSCAN clustering, but now imagine that a new data point p is added to the dataset. In that case, all the objects that have p among their minPts-nearest neighbors may become upgraded from noise/border points to core points. These are the RkNNs of p.

Prior Work

RkNNs have been studied thoroughly in the past using the notion of influence sets on spatial data [1] as well as on spatiotemporal data [2]. Recent work on efficient RkNN processing using spatial index structures includes some of my own work [3], [4], [5], [6], [7], [8], [9], [10], [11]. Other work also considers RkNN queries on graphs [12], [13].

Research Directions

The list of existing work on efficient RkNN processing on spatial, spatiotemporal, graph, and other data types is exhaustive. I'd like to work with an eager student on a survey paper. Writing a survey paper would require reading 100-200 papers which is a very time consuming effort. But the student would not be alone - I'm able to point to most of the important works - and this survey would also be in collaboration with Muhammad Aamir Cheema at Monash University, Melbourne, Australia, and possible another PhD student there.

Another possible research direction is the use of a learned index for RkNN processing. Learned indexes use a prediction model (from simple regression to deep models) to predict the query result without the use of traditional hierarchical index structure.

Funding

There is no extramural funding agency for this project - this project is funded through my own funds. Funding is available for 1 PhD student (fully funded for three years). I had a student exchange grant (from the German Exchange Service, DAAD) with Monash University, Australia when I was back at LMU Munich, Germany. However, this grant could not be transferred to the U.S.

Collaborations

This work will be in collaboration Dr. Muhammad Aamir Cheema at Monash University, Melbourne, Australia.

[1] Korn, F. and Muthukrishnan, S., 2000. Influence sets based on reverse nearest neighbor queries. ACM Sigmod Record, 29(2), pp.201-212.

[2] Benetis, R., Jensen, C.S., Karĉiauskas, G. and Ŝaltenis, S., 2006. Nearest and reverse nearest neighbor queries for moving objects. The VLDB Journal, 15(3), pp.229-249.

[3] Achtert, E., Kriegel, H.P., Kröger, P., Renz, M. and Züfle, A., 2009, March. Reverse k-nearest neighbor search in dynamic and general metric databases. In Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (pp. 886-897).

[4] Bernecker, T., Emrich, T., Kriegel, H.P., Renz, M., Zankl, S. and Züfle, A., 2011. Efficient probabilistic reverse nearest neighbor query processing on uncertain data. Proceedings of the VLDB Endowment, 4(10), pp.669-680.

[5] Emrich, T., Kriegel, H.P., Kröger, P., Renz, M. and Züfle, A., 2009, July. Incremental reverse nearest neighbor ranking in vector spaces. In International Symposium on Spatial and Temporal Databases (pp. 265-282). Springer, Berlin, Heidelberg.

[6] Kriegel, H.P., Kröger, P., Renz, M., Züfle, A. and Katzdobler, A., 2009, March. Incremental reverse nearest neighbor ranking. In 2009 IEEE 25th International Conference on Data Engineering (pp. 1560-1567). IEEE.

[7] Emrich, T., Kriegel, H.P., Mamoulis, N., Niedermayer, J., Renz, M. and Züfle, A., 2014, April. Reverse-nearest neighbor queries on uncertain moving object trajectories. In International Conference on Database Systems for Advanced Applications (pp. 92-107). Springer, Cham.

[8] Emrich, T., Kriegel, H.P., Kröger, P., Niedermayer, J., Renz, M. and Züfle, A., 2015. On reverse-k-nearest-neighbor joins. GeoInformatica, 19(2), pp.299-330.

[9] Kriegel, H.P., Kröger, P., Renz, M., Züfle, A. and Katzdobler, A., 2009, June. Reverse k-nearest neighbor search based on aggregate point access methods. In International Conference on Scientific and Statistical Database Management (pp. 444-460). Springer, Berlin, Heidelberg.

[10] Emrich, T., Kriegel, H.P., Kröger, P., Renz, M. and Züfle, A., 2009, November. Constrained reverse nearest neighbor search on mobile objects. In Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (pp. 197-206).

[11] Emrich, T., Kriegel, H.P., Kröger, P., Niedermayer, J., Renz, M. and Züfle, A., 2013, August. Reverse-k-nearest-neighbor join processing. In International Symposium on Spatial and Temporal Databases (pp. 277-294). Springer, Berlin, Heidelberg.

[12] Yiu, M.L., Papadias, D., Mamoulis, N. and Tao, Y., 2006. Reverse nearest neighbors in large graphs. IEEE Transactions on Knowledge and Data Engineering, 18(4), pp.540-553.

[13] Cheema, M.A., Zhang, W., Lin, X., Zhang, Y. and Li, X., 2012. Continuous reverse k nearest neighbors queries in euclidean space and in spatial networks. The VLDB Journal, 21(1), pp.69-95.