About

Data stream mining has gained a lot of attention in recent years as an exciting research topic. However, there is still a gap between pure research proposals and practical applications to real-world machine learning problems. The main goal of this tutorial is to introduce attendees to data stream mining theory and practice. We will use the river library to illustrate concepts and to demonstrate how data stream mining can be easily performed in Python.

River is a Python library for online machine learning. Its tools and algorithms are specifically designed for the data stream setting. Due to the large amount of data that is created, and must be processed, in real-time streams, such methods need to be time-efficient while using very small amounts of memory. The library is designed for research and implementation of data stream methods, as well as deployment in real-world applications. River serves both researchers and practitioners as a platform to easily design and run experiments and to extend existing methods.
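As a rough illustration of what "time-efficient while using very small amounts of memory" means in the stream setting, the sketch below keeps a running mean and variance that are updated one observation at a time (Welford's method). It is written in plain Python and is not river's API; the `RunningStats` class and the `sensor_stream` generator are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Running mean and sample variance, updated one observation at a time."""

    n: int = 0
    mean: float = 0.0
    _m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> "RunningStats":
        # Welford's update: constant time and constant memory per observation.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)
        return self

    @property
    def variance(self) -> float:
        # Sample variance; reported as 0.0 until two observations have been seen.
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


def sensor_stream():
    """Hypothetical data stream that yields readings one by one."""
    for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield value


stats = RunningStats()
for reading in sensor_stream():
    # Each reading is seen once and then discarded; only three numbers are kept.
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

The point of the sketch is the access pattern rather than the statistic itself: the stream is never stored, each item is processed exactly once, and the state does not grow with the number of observations.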

Target audience

The target audience of this tutorial includes researchers and practitioners, especially those interested in learning from big data streams, evolving data and/or IoT applications. No previous experience with stream learning is required, but familiarity with traditional batch learning concepts and frameworks (such as scikit-learn) will be a plus.

Announcement

In November 2020, scikit-multiflow and creme, the two most popular open-source libraries for stream learning, merged into a new library named river. Accordingly, the notebooks are based on river. The title of the tutorial was not changed because the conference's program had already been announced.

Organizers

Jacob Montiel

Jacob is a research fellow at the University of Waikato in New Zealand and the core developer and maintainer of river. His research interests are in the field of machine learning for evolving data streams. Prior to focusing on research, Jacob led development work on onboard software for aircraft and engine prognostics at GE Aviation, working on the development of GE’s Brilliant Machines, part of the IoT and GE’s approach to Industrial Big Data.

Website: https://jacobmontiel.github.io/

Heitor Murilo Gomes

Heitor is a senior research fellow at the University of Waikato in New Zealand. His main research area is Machine Learning, especially Evolving Data Streams, Concept Drift, Ensemble methods and Big Data Streams. He is an active contributor to the open data stream mining project MOA and a co-leader of the StreamDM project, a real-time analytics open-source software library built on top of Spark Streaming.

Website: https://www.heitorgomes.com

Jesse Read

Jesse is a Professor in the DaSciM team at LIX, Ecole Polytechnique, in France. His research interests are in the areas of Artificial Intelligence, Machine Learning, and Data Science and Mining. Jesse is the maintainer of the open-source software MEKA, a multi-label/multi-target extension to Weka.

Website: https://jmread.github.io/

Albert Bifet

Albert is a Professor at the University of Waikato and Télécom Paris. His research focuses on Data Stream mining, Big Data Machine Learning and Artificial Intelligence. The problems he investigates are motivated by large-scale data, the Internet of Things (IoT), and Big Data Science. He co-leads the open source projects MOA (Massive On-line Analysis), Apache SAMOA (Scalable Advanced Massive Online Analysis) and StreamDM.

Website: http://albertbifet.com