In this paper, we introduce a method for survival analysis on data streams. Survival analysis (also known as event history analysis) is an established statistical method for the study of temporal “events” or, more specifically, questions regarding the temporal distribution of the occurrence of events and their dependence on covariates of the data sources. To make this method applicable in the setting of data streams, we propose an adaptive variant of a model that is closely related to the well-known Cox proportional hazards model. Adopting a sliding window approach, our method continuously updates its parameters based on the event data in the current time window. As a proof of concept, we present two case studies in which our method is used for different types of spatio-temporal data analysis, namely, the analysis of earthquake data and Twitter data. In an attempt to explain the frequency of events by the spatial location of the data sources, both studies use the location of each source as covariates.
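The abstract describes the sliding-window idea only at a high level. The following is a minimal sketch of that general idea, not the authors' algorithm: it swaps in a parametric proportional hazards model with a constant baseline, λ(x) = exp(βᵀx), fitted by Poisson likelihood rather than Cox's partial likelihood. The class name `SlidingWindowHazard`, the parameters `window`, `lr` and `steps`, and the assumption that every source is observed for the whole window are all hypothetical choices made for illustration.

```python
import math
from collections import deque


class SlidingWindowHazard:
    """Hypothetical sketch of an adaptive proportional hazards model.

    Each source s with covariate vector x_s is assumed to emit events at a
    constant rate exp(beta . x_s); an intercept feature plays the role of the
    baseline hazard. Only events from the last `window` time units are used,
    and every source is assumed to be observed for the whole window.
    """

    def __init__(self, covariates, window, lr=0.5, steps=200):
        self.covariates = covariates  # source id -> list of floats
        self.window = window
        self.lr = lr
        self.steps = steps
        dim = len(next(iter(covariates.values())))
        self.beta = [0.0] * dim
        self.events = deque()  # (time, source id), in arrival order

    def _rate(self, x):
        # Hazard of a source with covariates x under the current parameters.
        return math.exp(sum(b * xi for b, xi in zip(self.beta, x)))

    def observe(self, t, source):
        # Append the new event and evict events that have left the window.
        self.events.append((t, source))
        while self.events and self.events[0][0] <= t - self.window:
            self.events.popleft()

    def refit(self):
        # Gradient ascent on the Poisson log-likelihood of the window
        # (normalised by window length and number of sources), warm-started
        # from the previous beta so the model adapts as the window slides.
        counts = dict.fromkeys(self.covariates, 0)
        for _, s in self.events:
            counts[s] += 1
        n = len(self.covariates)
        for _ in range(self.steps):
            grad = [0.0] * len(self.beta)
            for s, x in self.covariates.items():
                # Observed minus expected event rate for this source.
                resid = counts[s] / self.window - self._rate(x)
                for j, xj in enumerate(x):
                    grad[j] += resid * xj / n
            self.beta = [b + self.lr * g for b, g in zip(self.beta, grad)]
        return self.beta
```

Because `refit` starts from the previous estimate, few iterations are typically needed after each slide when the underlying rates change slowly. A Cox-type model would instead maximise the partial likelihood and leave the baseline hazard unspecified.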
@article{bwmeta1.element.bwnjournal-article-amcv24i1p199bwm, author = {Ammar Shaker and Eyke H\"ullermeier}, title = {Survival analysis on data streams: Analyzing temporal events in dynamically changing environments}, journal = {International Journal of Applied Mathematics and Computer Science}, volume = {24}, year = {2014}, pages = {199-212}, zbl = {1295.62093}, language = {en}, url = {http://dml.mathdoc.fr/item/bwmeta1.element.bwnjournal-article-amcv24i1p199bwm} }
Ammar Shaker; Eyke Hüllermeier. Survival analysis on data streams: Analyzing temporal events in dynamically changing environments. International Journal of Applied Mathematics and Computer Science, Volume 24 (2014) pp. 199-212. http://gdmltest.u-ga.fr/item/bwmeta1.element.bwnjournal-article-amcv24i1p199bwm/