This project presents a novel feature selection algorithm based on Ant Colony Optimization (ACO). It extracts feature words from a given web page and then generates an optimal feature set using the ACO metaheuristic and a normalized weight, defined as a learning function of each feature's learned weight, position, and frequency in the web page.
To ascertain the validity of the proposed measure, we performed experiments on web document categorization and compared the results obtained with the proposed measure against those obtained with other commonly used measures.
The WebKB datasets (CMU Machine Learning Repository) were adopted in our simulation experiments. Each dataset, along with noise, was first fed into the feature selectors, which generated feature subsets from it. The newly selected features were then passed to external learning algorithms to assess classification performance. A Naive Bayes classifier (NBC) and a Support Vector Machine (SVM) classifier were chosen to test the predictive capability of the selected subsets. All tests were run on the Weka experimental platform. To obtain impartial results, 10-fold cross-validation was performed on the datasets with both classifiers.
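The 10-fold cross-validation protocol can be sketched in a few lines of Python. This is only an illustration of the evaluation loop, not Weka's implementation: the function names `k_fold_indices` and `cross_validate`, the `train_fn`/`predict_fn` callbacks, the plain shuffled split, and the use of accuracy as the score are all assumptions made for the example.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumes n_samples >= k so that no fold is empty.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(samples, labels, train_fn, predict_fn, k=10):
    """Return the mean accuracy over k folds.

    train_fn(samples, labels) -> model and predict_fn(model, sample) -> label
    are hypothetical callbacks standing in for a classifier such as NBC or SVM.
    """
    accuracies = []
    for train, test in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict_fn(model, samples[i]) == labels[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Each sample is used for testing exactly once and for training in the other nine folds, which is what keeps the reported scores from depending on a single train/test split.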
- Developed a novel feature selection algorithm based on the Ant Colony Optimization (ACO) metaheuristic. A web page classification task on the WebKB dataset was performed to validate the proposed method. Achieved significant improvement in the classification performance of the Naive Bayes (NBC) and Support Vector Machine (SVM) classifiers. Completed the thesis under the guidance of Dr. Shine N Das.
- Experimental results, in the order IR precision, IR recall, F-measure, area under the precision-recall curve (AUC): NBC 0.853, 0.788, 0.814, 0.937; SVM 0.760, 0.873, 0.807, 0.936.
- Awarded Best Undergraduate Thesis Project by the university.
Web feature selection is an essential step in web page classification systems. It is commonly used to reduce the dimensionality of datasets with tens or hundreds of thousands of features, which would otherwise be impossible to process further. A significant problem in web featuring is the high dimensionality of the feature space, which makes web feature selection the most critical step in web featuring. However, reducing the dimensionality of the feature space alone will not lead to efficient web page categorization.
We therefore present a novel web feature selection algorithm based on Ant Colony Optimization (ACO), which optimizes the features extracted from web pages using a population-based metaheuristic. ACO is inspired by the behavior of real ants. It comprises a parallel search over several constructive computational threads, based on local problem data and on a dynamic memory structure that holds information on the quality of previously obtained results. The collective behavior emerging from the interaction of the different search threads has proved useful in solving combinatorial optimization (CO) problems. To apply ACO, the combinatorial optimization problem is transformed into the problem of finding the best path on a weighted graph. The artificial ants (software counterparts of real ants) incrementally build solutions by moving on the graph. The solution construction process is stochastic and is biased by a pheromone model, that is, a set of parameters associated with graph components (either nodes or edges) whose values are modified at runtime by the ants. The proposed algorithm is easy to implement, and because it uses a simple classifier, its computational complexity is very low.
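The construction-and-update loop described above can be illustrated with a minimal Python sketch. It is not the thesis implementation: pheromone is placed on nodes (features), each ant picks a fixed-size subset, and the scoring function is a toy sum of made-up relevance values. The names `aco_select`, `score`, `relevance`, and all parameter defaults are assumptions chosen for the example; the normalized weight and the classifier used in the actual work are not reproduced here.

```python
import random

def aco_select(features, score, subset_size, n_ants=10, n_iters=20,
               evaporation=0.2, seed=0):
    """Pick a feature subset with a simplified node-based ACO (illustrative only)."""
    rng = random.Random(seed)
    pheromone = {f: 1.0 for f in features}  # one pheromone value per feature
    best_subset, best_score = None, float("-inf")

    for _ in range(n_iters):
        iteration = []
        for _ in range(n_ants):
            # Each ant builds a subset step by step; the choice is stochastic
            # and biased toward features with more pheromone.
            candidates = list(features)
            subset = []
            while len(subset) < subset_size and candidates:
                weights = [pheromone[f] for f in candidates]
                pick = rng.choices(candidates, weights=weights, k=1)[0]
                subset.append(pick)
                candidates.remove(pick)
            s = score(subset)
            iteration.append((s, subset))
            if s > best_score:
                best_score, best_subset = s, sorted(subset)

        # Evaporation: older information fades.
        for f in pheromone:
            pheromone[f] *= 1.0 - evaporation
        # Deposit: features in better subsets receive more pheromone.
        for s, subset in iteration:
            for f in subset:
                pheromone[f] += max(s, 0.0)

    return best_subset, best_score

# Toy usage with hypothetical relevance values (not taken from the thesis).
relevance = {"course": 0.9, "professor": 0.8, "homework": 0.7,
             "the": 0.05, "and": 0.05, "click": 0.1}
best, best_score = aco_select(list(relevance),
                              lambda subset: sum(relevance[f] for f in subset),
                              subset_size=3)
print(best, round(best_score, 3))
```

In a real setting the `score` callback would be replaced by the quality of a classifier trained on the candidate subset, which is where the choice of a simple classifier keeps the cost of each evaluation low.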
We presented a novel technique that incorporates web page feature extraction and optimization of the extracted feature set using the Ant Colony Optimization algorithm, to create an efficient web page categorization process.
Ant Colony Optimization has been shown to be more efficient than the Genetic Algorithm, Chi-Square statistic, and Information Gain methods for text feature selection. Because a web page is treated merely as text after the extraction process, web feature selection with ACO is expected to yield the same excellent results as text feature selection using ACO.
The proposed system creates an optimal feature set for web page categorization. As future work, we would like to incorporate Dynamic Mutual Information along with ACO, creating a logical and optimal feature set for categorizing web pages, which in turn will help improve web page categorization efficiency.
Incorporating Dynamic Mutual Information along with ACO will result in a variant of the current ACO algorithm called multi-objective ACO. In multi-objective ACO, the heuristic is applied to a problem with multiple objectives; that is, several objectives are considered at once when a solution is constructed.
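How the objectives would be combined is left open here. One common way to compare solutions under several objectives is Pareto dominance, sketched below in Python. The functions `dominates` and `pareto_front`, and the idea of scoring each candidate subset as a tuple such as (classification quality, mutual information), are assumptions made for illustration rather than part of the proposed method.

```python
def dominates(a, b):
    """True if objective tuple a is at least as good as b everywhere and better somewhere.

    All objectives are assumed to be maximized.
    """
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def pareto_front(solutions):
    """Keep only the objective tuples that no other tuple dominates."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions)]

# Hypothetical (quality, mutual information) scores for three candidate subsets.
candidates = [(0.9, 0.1), (0.8, 0.3), (0.7, 0.2)]
print(pareto_front(candidates))
```

A multi-objective variant could then reinforce pheromone only for subsets on the current front, instead of ranking subsets by a single score.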