IJIRST (International Journal for Innovative Research in Science & Technology), ISSN (online): 2349-6010


Visual Webpage Content Segmentation and Retrieval Based on n-Grams


International Journal for Innovative Research in Science & Technology
Volume 2, Issue 4
Year of Publication: 2015
Authors: Kumud Jaglan; Dr. Kulvinder Singh; Vipul Jaglan

BibTeX:

@article{IJIRSTV2I4025,
     title={Visual Webpage Content Segmentation and Retrieval Based on n-Grams},
     author={Kumud Jaglan and Kulvinder Singh and Vipul Jaglan},
     journal={International Journal for Innovative Research in Science \& Technology},
     volume={2},
     number={4},
     pages={42--49},
     year={2015},
     url={http://www.ijirst.org/articles/IJIRSTV2I4025.pdf},
     publisher={IJIRST (International Journal for Innovative Research in Science \& Technology)},
}



Abstract:

Web documents are often viewed as complex objects that frequently contain multiple entities, each of which may represent a separate unit. However, most information-processing applications for the web treat a web page as the smallest indivisible component, and information extraction from web pages has traditionally relied on extensive human involvement in the form of hand-crafted extraction algorithms or scripts using regular expressions. Previous work usually ignores the underlying content segments that consist of unimportant information, such as web advertisements and content irrelevant to users. To address these issues, this paper proposes an n-gram-based webpage segmentation algorithm that uses density to segment the page without relying on the DOM tree.



