As the number of users grows, so does the need for automatic classification techniques with high classification accuracy, because search engines depend on previously classified web pages stored in classified directories to retrieve relevant results. Preprocessing is an important step in the web page classification problem, as most web pages contain more irrelevant information than details that are useful for determining their category. To reduce time and space requirements, classification uses the representative words in a web page, called features, rather than the entire page. Selecting relevant features that reduce high dimensionality and redundancy is an active research topic. In this paper, selecting relevant features from web pages is treated as an optimization problem, and we propose an algorithm to find the optimal features for web page classification. Machine learning techniques for automatic classification are attracting growing interest because a classifier improves its performance with experience. In this paper we use the Naïve Bayes, Kstar, Random Forest and Bagging machine learning classifiers.
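The abstract does not describe the proposed feature-selection algorithm itself, so the following is only a minimal sketch of the general idea it names: treating feature selection as an optimization problem in which candidate feature subsets are scored with one of the listed classifiers. It assumes pages have already been preprocessed into lists of word tokens. It uses a hand-written multinomial Naïve Bayes as the evaluator and swaps in a simple bit-flip hill climber as the optimizer. The functions `train_nb`, `predict_nb`, `fitness` and `hill_climb`, the size penalty of 0.01 per feature, and the toy data are all hypothetical illustrations, not the authors' method.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels, features):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to `features`."""
    vocab = set(features)
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        for word in doc:
            if word in vocab:
                word_counts[label][word] += 1
    model = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_prior = math.log(class_counts[c] / len(labels))
        log_lik = {w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
                   for w in vocab}
        model[c] = (log_prior, log_lik)
    return model


def predict_nb(model, doc):
    """Return the class with the highest log-posterior; unselected words are ignored."""
    def score(c):
        log_prior, log_lik = model[c]
        return log_prior + sum(log_lik[w] for w in doc if w in log_lik)
    return max(model, key=score)


def fitness(mask, vocab, train_docs, train_labels, val_docs, val_labels, penalty=0.01):
    """Hypothetical objective: validation accuracy minus a penalty per selected feature."""
    features = [w for w, keep in zip(vocab, mask) if keep]
    if not features:
        return -1.0  # an empty feature set cannot classify anything
    model = train_nb(train_docs, train_labels, features)
    correct = sum(predict_nb(model, d) == y for d, y in zip(val_docs, val_labels))
    return correct / len(val_docs) - penalty * len(features)


def hill_climb(vocab, data, iterations=200, seed=0):
    """Stand-in optimizer: flip one feature in or out, keep the flip if fitness does not drop."""
    rng = random.Random(seed)
    mask = [True] * len(vocab)  # start from the full vocabulary
    best = fitness(mask, vocab, *data)
    for _ in range(iterations):
        i = rng.randrange(len(vocab))
        mask[i] = not mask[i]
        score = fitness(mask, vocab, *data)
        if score >= best:
            best = score
        else:
            mask[i] = not mask[i]  # revert a worsening move
    return [w for w, keep in zip(vocab, mask) if keep], best


# Toy, made-up data: two categories of already-tokenized "pages".
train_docs = [["goal", "match", "the", "team"], ["vote", "election", "the", "party"],
              ["team", "score", "the", "goal"], ["party", "policy", "the", "vote"]]
train_labels = ["sports", "politics", "sports", "politics"]
val_docs = [["match", "score", "the"], ["election", "policy", "the"]]
val_labels = ["sports", "politics"]

vocab = sorted({w for d in train_docs for w in d})
selected, score = hill_climb(vocab, (train_docs, train_labels, val_docs, val_labels))
print(selected, round(score, 3))
```

The per-feature penalty makes the search trade accuracy against dimensionality: a subset survives only if it keeps validation accuracy while using fewer words. The fitness function is independent of the search strategy, so another optimizer could replace `hill_climb`, and another classifier could replace the Naïve Bayes evaluator.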