Abdulmohsen Algarni
King Khalid University, Saudi Arabia
Title: Selecting Training Documents for Better Learning
Biography
Abstract
In general, there are two types of feedback documents: positive and negative. Term-based approaches can extract many features from text documents, but most of these features include noise. All feedback documents contain some noisy knowledge that affects the quality of the extracted features, and the amount of noise differs from one document to another. Therefore, reducing the noisy data in the training documents helps to reduce noise in the extracted features. Moreover, we believe that removing some training documents (those that contain more noisy data than useful data) can help to improve the effectiveness of a classifier. Based on that observation, we found that short documents are more important than long documents. Testing that idea, we found that using the advantages of short training documents to improve the quality of the extracted features gives promising results. We also found that not all training documents are useful for training the classifier.
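As a rough illustration only, the idea of preferring short training documents before extracting term features might be sketched as follows. The abstract does not specify the selection rule, so the length threshold `max_tokens`, the helper names `select_short_documents` and `term_frequencies`, and the toy documents are all assumptions made for this sketch, not the author's method.

```python
from collections import Counter


def select_short_documents(docs, max_tokens=50):
    """Keep documents with at most max_tokens whitespace-separated tokens.

    The threshold is a hypothetical stand-in for whatever selection
    criterion is actually used; the abstract does not state one.
    """
    return [d for d in docs if len(d.split()) <= max_tokens]


def term_frequencies(docs):
    """Count lowercased terms across the given documents."""
    counts = Counter()
    for d in docs:
        counts.update(d.lower().split())
    return counts


# Toy positive feedback documents: one short, one padded with noisy terms.
positive_docs = [
    "stock market rally lifts shares",
    "shares climb as market rally continues " + "filler " * 100,
]

selected = select_short_documents(positive_docs, max_tokens=20)
features = term_frequencies(selected)
print(len(selected), sorted(features))
```

In this toy case the long document is dropped, so its repeated noisy term never enters the extracted features. A real system would need a principled way to choose the threshold, or a different noise measure altogether.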