A Scalable Data Stream Mining Methodology: Stream-based Holistic Analytics and Reasoning in Parallel


Research Objective: The objective of this research is to design and develop a data stream mining system that is "holistic", meaning that it can produce a decision-support model of the highest possible accuracy when mining data streams.


Research Content: In this project, a scalable data stream mining methodology called Stream-based Holistic Analytics and Reasoning in Parallel (SHARP) is proposed. SHARP is holistic because it consists of several components, each of which targets a different aspect of the data mining process: smoothing the input data streams, reducing the feature search space, finding the optimal feature subset, optimizing the parameter values of the classifiers, and allowing incremental classifiers to form an ensemble by spawning different classifiers in parallel. Three of these components have been tested individually in preliminary experiments, which demonstrated their superiority over existing methods, as reported in our previously published work. In this project, all the components will be fully integrated and tested as a holistic data stream mining system that can achieve the best possible performance. It is anticipated that SHARP will be capable of eliminating some of the key problems in Big Data, especially those associated with high dimensionality and with infinite, continuous data streams.
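To make the idea of "incremental classifiers forming an ensemble" concrete, the following is a minimal sketch that assumes standard online bagging (each incoming instance is shown to each ensemble member k ~ Poisson(1) times) as a stand-in technique; it is not SHARP's own method, which is not specified here. The base learner `IncrementalCentroid` and the class `OnlineBagging` are hypothetical names chosen for illustration, and the members are updated sequentially here, whereas SHARP is described as spawning them in parallel.

```python
import math
import random
from collections import Counter


def poisson1(rng):
    """Draw k ~ Poisson(1) using Knuth's multiplication method."""
    threshold = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


class IncrementalCentroid:
    """Hypothetical incremental base learner: a running weighted mean per class."""

    def __init__(self):
        self.sums = {}    # class label -> per-feature weighted sum
        self.counts = {}  # class label -> total weight seen so far

    def learn(self, x, y, weight):
        if weight == 0:
            return  # this instance was not sampled for this member
        s = self.sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += weight * v
        self.counts[y] = self.counts.get(y, 0) + weight

    def predict(self, x):
        if not self.counts:
            return None  # nothing learned yet

        def sq_dist(label):
            c = self.counts[label]
            return sum((v - s / c) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.counts, key=sq_dist)


class OnlineBagging:
    """Hypothetical ensemble: online bagging over incremental base learners."""

    def __init__(self, n_models=10, seed=0):
        self.rng = random.Random(seed)
        self.models = [IncrementalCentroid() for _ in range(n_models)]

    def learn_one(self, x, y):
        # Each member sees the instance k ~ Poisson(1) times, which
        # approximates bootstrap resampling on an unbounded stream.
        for m in self.models:
            m.learn(x, y, poisson1(self.rng))

    def predict_one(self, x):
        # Majority vote over members that have learned something.
        votes = Counter(m.predict(x) for m in self.models)
        votes.pop(None, None)
        return votes.most_common(1)[0][0] if votes else None


if __name__ == "__main__":
    stream_rng = random.Random(1)
    model = OnlineBagging(n_models=10, seed=0)
    for _ in range(200):  # a synthetic two-class stream, processed one instance at a time
        label = stream_rng.randint(0, 1)
        centre = 5.0 * label
        model.learn_one([stream_rng.gauss(centre, 1.0), stream_rng.gauss(centre, 1.0)], label)
    print(model.predict_one([0.2, -0.1]), model.predict_one([5.1, 4.8]))
```

Because every update is a single pass over one instance, memory stays constant in the length of the stream, which is the property that makes this style of ensemble suitable for potentially infinite data streams.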


Expected Outcomes and Problems to Be Addressed: The end result is anticipated to be a new breed of data stream mining system that is holistic by design, as it embraces incremental feature selection, parameter optimization, ensemble bagging, and factor analysis/understanding, all aimed at achieving the best possible data mining performance. In particular, this data stream mining system is intended to solve at least two Big Data problems: high data dimensionality and a potentially infinite data volume.