TY - JOUR
T1 - A fuzzy ontology and SVM–based web content classification system
AU - Ali, Farman
AU - Khan, Pervez
AU - Riaz, Kashif
AU - Kwak, Daehan
AU - Abuhmed, Tamer
AU - Park, Daeyoung
AU - Kwak, Kyung Sup
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/11/1
Y1 - 2017/11/1
N2 - The volume of adult content on the World Wide Web is increasing rapidly, which makes automatically detecting such content, and blocking access to unsuitable websites, increasingly challenging. Most pornographic webpage-filtering systems are based on n-gram, naïve Bayes, K-nearest neighbor, and keyword-matching mechanisms, which cannot fully extract useful data from unstructured web content. These systems also lack the reasoning capability needed to distinguish medical webpages from adult content webpages. In addition, because adult content is freely available on the Internet, children can easily access pornographic webpages, which creates a problem for parents wishing to protect their children from such content. To address these problems, this paper presents a semantic knowledge system based on a support vector machine (SVM) and a fuzzy ontology that systematically filters web content and identifies and blocks access to pornography. The proposed system classifies URLs into adult URLs and medical URLs, using a blacklist of censored webpages to provide accuracy and speed. The proposed fuzzy ontology then extracts web content to determine the website type (adult content, normal, or medical) and block pornographic content. To examine the efficiency of the proposed system, the fuzzy ontology and the intelligent tools were developed using Protégé 5.1 and Java, respectively. Experimental analysis shows that the proposed system is efficient at automatically detecting and blocking adult content.
KW - Adult content identification
KW - Data mining
KW - Fuzzy ontology
KW - Semantic knowledge
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=85033368808&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2017.2768564
DO - 10.1109/ACCESS.2017.2768564
M3 - Article
AN - SCOPUS:85033368808
SN - 2169-3536
VL - 5
SP - 25781
EP - 25797
JO - IEEE Access
JF - IEEE Access
M1 - 8094233
ER -