Improving IoT data stream analytics using summarization techniques

Abstract : With the evolution of technology, the use of smart Internet-of-Things (IoT) devices, sensors, and social networks results in an overwhelming volume of IoT data streams, generated daily by many applications, that can be transformed into valuable information through machine learning tasks. In practice, several critical issues arise when extracting useful knowledge from these evolving data streams, chiefly that the stream must be handled and processed efficiently. In this context, this thesis aims to improve the performance (in terms of memory and time) of existing data mining algorithms on streams. We focus on the classification task in the streaming framework. The task is challenging on streams, principally because of the high (and increasing) data dimensionality, in addition to the potentially infinite amount of data. Both aspects make the classification task harder.

The first part of the thesis surveys the current state of the art of classification and dimensionality reduction techniques as applied to the stream setting, providing an updated view of the most recent works in this vibrant area.

In the second part, we detail our contributions to the field of classification in streams, developing novel approaches based on summarization techniques that aim to reduce the computational resources required by existing classifiers with no, or only minor, loss of classification accuracy. To address high-dimensional data streams and make classifiers efficient, we incorporate an internal preprocessing step that incrementally reduces the dimensionality of the input data before feeding them to the learning stage. We present several approaches applied to several classifiers: Naive Bayes, which is enhanced with sketches and the hashing trick; k-NN, using compressed sensing and UMAP; and we also integrate them into ensemble methods.
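As a rough illustration of the idea of reducing dimensionality inside the classifier before the learning stage, the following minimal Python sketch combines a signed hashing trick with an incrementally updated Gaussian Naive Bayes. It is not the thesis's implementation: the names `hash_features` and `StreamingGaussianNB`, the CRC32-based hash, the bucket count, and the variance smoothing constant are all assumptions chosen for the example.

```python
import math
import zlib
from collections import defaultdict


def hash_features(x, n_buckets):
    """Hashing trick (illustrative): fold a vector of any length into n_buckets values."""
    out = [0.0] * n_buckets
    for i, v in enumerate(x):
        # Two salted CRC32 hashes: one picks the bucket, one picks the sign.
        bucket = zlib.crc32(b"bucket:%d" % i) % n_buckets
        sign = 1.0 if zlib.crc32(b"sign:%d" % i) & 1 == 0 else -1.0
        out[bucket] += sign * v
    return out


class StreamingGaussianNB:
    """Gaussian Naive Bayes updated one instance at a time (hypothetical helper)."""

    def __init__(self, n_features, smoothing=1e-3):
        self.smoothing = smoothing  # assumed constant, keeps variances positive
        self.count = defaultdict(int)
        self.mean = defaultdict(lambda: [0.0] * n_features)
        self.m2 = defaultdict(lambda: [0.0] * n_features)

    def learn_one(self, x, y):
        # Welford's online update of per-class, per-feature mean and variance.
        self.count[y] += 1
        c = self.count[y]
        mean, m2 = self.mean[y], self.m2[y]
        for j, v in enumerate(x):
            delta = v - mean[j]
            mean[j] += delta / c
            m2[j] += delta * (v - mean[j])

    def predict_one(self, x):
        total = sum(self.count.values())
        best, best_score = None, -math.inf
        for y, c in self.count.items():
            score = math.log(c / total)  # log prior
            for j, v in enumerate(x):
                var = self.m2[y][j] / c + self.smoothing
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (v - self.mean[y][j]) ** 2 / (2 * var)
            if score > best_score:
                best, best_score = y, score
        return best


# Usage: each incoming instance is compressed first, then learned from.
N_BUCKETS = 8  # assumed target dimensionality
model = StreamingGaussianNB(N_BUCKETS)
stream = [([1.0] * 50, 1), ([0.0] * 50, 0), ([0.9] * 50, 1), ([0.1] * 50, 0)]
for x, y in stream:
    model.learn_one(hash_features(x, N_BUCKETS), y)
prediction = model.predict_one(hash_features([1.0] * 50, N_BUCKETS))
```

Memory use depends on the number of buckets and classes rather than on the original dimensionality, which is the resource saving the abstract refers to. Hash collisions are the price paid, and they may cost some accuracy.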
Document type :
Theses
Cited literature : 139 references

https://tel.archives-ouvertes.fr/tel-02865982
Contributor : Abes Star
Submitted on : Friday, June 12, 2020 - 10:47:08 AM
Last modification on : Tuesday, September 21, 2021 - 2:16:03 PM

File

90368_BAHRI_2020_archivage.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02865982, version 1

Citation

Maroua Bahri. Improving IoT data stream analytics using summarization techniques. Machine Learning [cs.LG]. Institut Polytechnique de Paris, 2020. English. ⟨NNT : 2020IPPAT017⟩. ⟨tel-02865982⟩

Metrics

Record views

339

Files downloads

679