Alla Sapronova
Uni Research Computing, Norway
Title: Prune the inputs, increase data volume, or select a different classification method – a strategy to improve accuracy of classification
Abstract
Classification, the process of assigning data to labeled groups, is one of the most common operations in data mining. In predictive modeling, classification can be used to learn the relation between the desired feature vector and labeled classes. When a data set contains an arbitrarily large number of missing values and/or the number of data samples is inadequate for the complexity of the data, it is important to define a strategy that reaches the highest possible classification accuracy. In this work, the authors present results on the accuracy of a classification-based predictive model for three different strategies: input pruning, semi-automatic selection among various classification methods, and an increase in data volume. The authors suggest that a satisfactory level of model accuracy can be reached when preliminary input pruning is used.
The presented model connects fishing data with environmental variables. Even with a limited number of samples, the model is able to resolve the type of fish with up to 92% accuracy.
The results of using various classification methods are shown, and suggestions are made towards defining an optimal strategy for building an accurate predictive model, as opposed to the common trial-and-error method. Different strategies for input pruning that ensure the preservation of information are described.
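As a rough illustration of what preliminary input pruning can look like, the following minimal Python sketch drops inputs with too many missing values before classification. It is not the authors' method: the toy data, the column meanings (temperature, salinity, a sparse third input), the 30% missing-value threshold, the mean imputation, and the nearest-centroid classifier are all assumptions chosen only to keep the example small.

```python
from statistics import mean

def prune_inputs(rows, max_missing=0.3):
    """Return indices of input columns whose fraction of missing (None)
    values does not exceed max_missing (threshold is an assumption)."""
    n = len(rows)
    keep = []
    for j in range(len(rows[0])):
        missing = sum(1 for r in rows if r[j] is None)
        if missing / n <= max_missing:
            keep.append(j)
    return keep

def impute(rows, cols):
    """Keep only the columns in cols; fill remaining gaps with the column mean."""
    means = {j: mean(r[j] for r in rows if r[j] is not None) for j in cols}
    return [[r[j] if r[j] is not None else means[j] for j in cols] for r in rows]

def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class centroid is closest to x."""
    centroids = {}
    for label in set(train_y):
        members = [v for v, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )

def loo_accuracy(xs, ys):
    """Leave-one-out accuracy of the nearest-centroid classifier."""
    hits = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        hits += nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]
    return hits / len(xs)

# Hypothetical toy data: [temperature, salinity, sparse_input]; None = missing.
rows = [
    [4.0, 34.0, None],
    [5.0, 35.0, 90.0],
    [4.5, None, None],
    [5.5, 34.5, None],
    [10.0, 30.0, None],
    [11.0, 31.0, 5.0],
    [10.5, 30.5, None],
    [9.5, None, None],
]
labels = ["cod", "cod", "cod", "cod",
          "herring", "herring", "herring", "herring"]

kept = prune_inputs(rows)            # column indices that survive pruning
features = impute(rows, kept)        # pruned and imputed feature vectors
accuracy = loo_accuracy(features, labels)
print("kept columns:", kept)
print("leave-one-out accuracy:", accuracy)
```

In this sketch the sparse third column is discarded because most of its values are missing, while the two remaining inputs are enough to separate the toy classes. For simplicity the imputation uses all samples before the leave-one-out split; a careful evaluation would compute the column means on the training part only.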