IJIRST (International Journal for Innovative Research in Science & Technology), ISSN (online): 2349-6010

Content based Document Retrieval using Content Extraction


International Journal for Innovative Research in Science & Technology
Volume 4, Issue 2
Year of Publication: 2017
Author: Ajaykumar Ashok Awad

BibTeX:

@article{IJIRSTV4I2005,
     title={Content based Document Retrieval using Content Extraction},
     author={Ajaykumar Ashok Awad},
     journal={International Journal for Innovative Research in Science & Technology},
     volume={4},
     number={2},
     pages={61--66},
     year={2017},
     url={http://www.ijirst.org/articles/IJIRSTV4I2005.pdf},
     publisher={IJIRST (International Journal for Innovative Research in Science & Technology)},
}



Abstract:

The rapid growth of information on the web has made it hard to find relevant information, and the need for practical Information Retrieval (IR) strategies has grown accordingly. Documents contain large amounts of information, yet users can usually retrieve them only by their title and keywords. We propose a fast and effective content-based document information retrieval system that retrieves information from the actual content of a document. The proposed system uses the Latent Dirichlet Allocation (LDA) model to extract the major keywords of a given document. To improve performance, it stores and indexes the documents in a MongoDB database. MongoDB's B-tree based indexing makes the system more flexible, more effective, and faster than the previous system.
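The abstract describes a pipeline in which LDA extracts a document's major keywords and an index maps those keywords back to documents. The following is a minimal, self-contained Python sketch of that idea under stated assumptions, not the author's implementation: it implements LDA by collapsed Gibbs sampling (no topic-modelling library), ranks each document's words by p(w | d), and uses an in-memory inverted index plus a sorted-key prefix scan as a stand-in for MongoDB's B-tree indexes. The tokenizer, stopword list, hyperparameters (`num_topics`, `alpha`, `beta`, `iters`), and all function names are hypothetical.

```python
import random
import re
from bisect import bisect_left
from collections import defaultdict

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "by", "with"}


def tokenize(text):
    """Lowercase, keep alphanumeric tokens, drop stopwords and 1-char tokens."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if len(t) > 1 and t not in STOPWORDS]


def lda_gibbs(docs, num_topics=2, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA. Returns (ndk, nkw, nk, vocab):
    per-document topic counts, per-topic word counts, per-topic totals, vocabulary."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    K, V = num_topics, len(vocab)
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = [[rng.randrange(K) for _ in doc] for doc in docs]  # random initial topics

    def update(d, v, k, delta):
        ndk[d][k] += delta
        nkw[k][v] += delta
        nk[k] += delta

    for d, doc in enumerate(docs):
        for w, k in zip(doc, z[d]):
            update(d, w_id[w], k, +1)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = w_id[w]
                update(d, v, z[d][i], -1)  # take this token out of the counts
                # p(z = t | rest) is proportional to (n_dt + alpha)(n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                z[d][i] = rng.choices(range(K), weights=weights)[0]
                update(d, v, z[d][i], +1)
    return ndk, nkw, nk, vocab


def extract_keywords(docs, num_topics=2, top_n=3, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Rank each document's words by p(w | d) = sum_k theta[d][k] * phi[k][w]."""
    ndk, nkw, nk, vocab = lda_gibbs(docs, num_topics, alpha, beta, iters, seed)
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    keywords = []
    for d, doc in enumerate(docs):
        theta = [(ndk[d][k] + alpha) / (len(doc) + num_topics * alpha)
                 for k in range(num_topics)]
        scores = {w: sum(theta[k] * (nkw[k][w_id[w]] + beta) / (nk[k] + V * beta)
                         for k in range(num_topics))
                  for w in set(doc)}
        keywords.append(sorted(scores, key=lambda w: (-scores[w], w))[:top_n])
    return keywords


def build_index(keywords_per_doc):
    """Inverted index: keyword -> sorted list of document ids."""
    index = defaultdict(set)
    for doc_id, kws in enumerate(keywords_per_doc):
        for kw in kws:
            index[kw].add(doc_id)
    return {kw: sorted(ids) for kw, ids in index.items()}


def search(index, query_terms):
    """AND query: ids of documents whose keywords include every query term."""
    postings = [set(index.get(t, ())) for t in query_terms]
    return sorted(set.intersection(*postings)) if postings else []


def prefix_lookup(index, prefix):
    """Prefix scan over ordered keys, the kind of range query a B-tree index supports."""
    keys = sorted(index)  # a real B-tree keeps keys ordered; here we sort on demand
    out = []
    i = bisect_left(keys, prefix)
    while i < len(keys) and keys[i].startswith(prefix):
        out.append(keys[i])
        i += 1
    return out
```

For example, `extract_keywords([tokenize(t) for t in texts], num_topics=2, top_n=2)` returns a keyword list per document, `build_index` turns those lists into posting lists, and `search(index, ["mongodb"])` returns the ids of documents whose extracted keywords include "mongodb". In MongoDB itself, the comparable step would be to store each document's keyword array and create an index on that field (a multikey index); the sketch does not use the MongoDB driver.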


Keywords:

Information Retrieval, CBDIR, Inverted Indexing, B-tree Indexing, MongoDB

