Temporal Data Stream Mining by Using Incrementally Optimized Very Fast Decision Forest (iOVFDF)

Very Fast and Optimized Decision Forest for Temporal Data Stream Mining (iOVFDF)

iOVFDT (Incrementally Optimized Very Fast Decision Tree) is a new data stream mining model that balances compact tree size against prediction accuracy.

iOVFDT was developed and released as open source by my PhD student, who was supported by my previous MYRG. In this sequel project, we extend iOVFDT into iOVFDF (‘F’ for a forest of trees) for temporal data stream mining.

A major limitation of current temporal data mining algorithms is their reliance on batch learning. In real life, however, the hidden concepts in data streams may change rapidly, and the data may be unbounded in volume.

In the Big Data era, incremental learning is attractive because it does not require processing the full dataset at once. We propose to research and develop a new breed of temporal data stream algorithms: iOVFDF.

The main features of iOVFDT:

• Loss function: replaces the Hoeffding Tree for higher accuracy

• Auxiliary classifier: addresses imperfect data stream problems

• ROC optimizer: relieves concept drift problems
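For background on the Hoeffding Tree named in the first feature: its split test relies on the Hoeffding bound, epsilon = sqrt(R^2 ln(1/delta) / (2n)). The sketch below shows only that standard test. It is not the iOVFDT loss function, which is not specified in this description, and the function and parameter names are hypothetical.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound epsilon after n observations, with confidence 1 - delta.

    value_range is R, the range of the split metric (assumption: 1.0 for
    information gain on a two-class problem).
    """
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float,
                 value_range: float, delta: float, n: int) -> bool:
    """Split only when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)
```

With more observations the bound shrinks, so a gap that is too small to justify a split after 100 instances may be large enough after 1,000.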

We integrate a collection of iOVFDT classifiers into a “meta-classifier” called iOVFDForest. The new iOVFDForest can incrementally learn temporal associations across multiple time-series in real time, while each underlying iOVFDTree learns and recognizes sub-sequence patterns dynamically.
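The forest-over-trees arrangement described above could be sketched roughly as follows. This is an illustration under strong assumptions, not the iOVFDForest design: `WindowLearner` is a hypothetical stand-in for an iOVFDTree that simply counts labels per up/down sub-sequence pattern, and `StreamForest` combines the per-series learners with a plain majority vote.

```python
from collections import Counter, defaultdict, deque


class WindowLearner:
    """Hypothetical per-series learner (a stand-in, not iOVFDT)."""

    def __init__(self, window: int = 3):
        # window + 1 values give `window` consecutive differences.
        self.values = deque(maxlen=window + 1)
        self.counts = defaultdict(Counter)  # pattern -> label counts

    def _pattern(self):
        v = list(self.values)
        if len(v) < self.values.maxlen:
            return None  # not enough history yet
        # Encode the sub-sequence as a tuple of rise (1), fall (-1), flat (0).
        return tuple(1 if b > a else -1 if b < a else 0
                     for a, b in zip(v, v[1:]))

    def observe(self, value: float) -> None:
        self.values.append(value)

    def predict(self):
        pattern = self._pattern()
        if pattern is None or pattern not in self.counts:
            return None
        return self.counts[pattern].most_common(1)[0][0]

    def learn(self, label) -> None:
        pattern = self._pattern()
        if pattern is not None:
            self.counts[pattern][label] += 1


class StreamForest:
    """Hypothetical meta-classifier: one learner per time-series, majority vote."""

    def __init__(self, series_names, window: int = 3):
        self.learners = {name: WindowLearner(window) for name in series_names}

    def update(self, sample: dict, label):
        """Process one time step; return the vote made before learning the label."""
        votes = Counter()
        for name, learner in self.learners.items():
            learner.observe(sample[name])
            guess = learner.predict()
            if guess is not None:
                votes[guess] += 1
            learner.learn(label)
        return votes.most_common(1)[0][0] if votes else None
```

Each call to `update` touches only the newest value of each series, so no past data needs to be reloaded. A majority vote is the simplest possible combiner and does not capture associations across series the way the proposed model intends to.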

This project extends our previous MYRG grant, “Adaptive OVFDT with Incremental Pruning and ROC Corrective Learning for Data Stream Mining” (Grant no. MYRG073(Y3-L2)-FST12-FCC). We want to reuse its research outputs and extend them to a more significant level of application research.

• To research and develop a new breed of data stream algorithms and tools for temporal data stream mining

• The new model, iOVFDForest, extends the iOVFDTree developed by the PI and his PhD student under the previous MYRG, and targets a new application domain: mining multiple time-series

• The significance of the new model, iOVFDForest, is that it makes it possible to mine temporal patterns from streams in real time. Applications include, but are not limited to, real-time sentiment analysis, opinion mining from social media, wireless sensor network data, financial market information, physiological data streams and bioinformatics.

Traditional temporal data mining algorithms exist, but they all belong to classical batch-mode machine learning, which requires all the data to be loaded and processed each time the model is refreshed.

In this project we focus on incremental learning, which makes real-time temporal data stream mining possible for the new breed of applications that by nature feed on continuous data streams.
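To make the contrast with batch learning concrete, here is a minimal sketch of incremental learning with test-then-train (prequential) evaluation. The running-mean classifier is a deliberately simple, hypothetical example and is unrelated to iOVFDT. The point is only that each instance is used once and then discarded.

```python
class RunningMeanClassifier:
    """Keeps one running mean per class; each update is O(1) in time and memory."""

    def __init__(self):
        self.count = {}
        self.mean = {}

    def learn_one(self, x: float, label) -> None:
        n = self.count.get(label, 0) + 1
        m = self.mean.get(label, 0.0)
        self.count[label] = n
        # Incremental mean update: no past instances are stored.
        self.mean[label] = m + (x - m) / n

    def predict_one(self, x: float):
        if not self.mean:
            return None  # nothing learned yet
        # Predict the class whose running mean is closest to x.
        return min(self.mean, key=lambda c: abs(x - self.mean[c]))


def prequential_accuracy(stream, model) -> float:
    """Test on each instance first, then train on it (test-then-train)."""
    correct = total = 0
    for x, label in stream:
        if model.predict_one(x) == label:
            correct += 1
        total += 1
        model.learn_one(x, label)
    return correct / total if total else 0.0
```

Because the model is updated after every instance, it is always ready to predict and never needs a full reload of the data to refresh.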

To the best of our knowledge, data stream mining of temporal patterns is a relatively new and popular research domain, driven by the prevalence of ubiquitous computing, data collection and Big Data applications.