Apache Lucene

Apache Lucene is a free and open-source search engine software library, originally written in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene is widely used as a standard foundation for non-research search applications.[2][3][4]

Lucene
Developer(s): Apache Software Foundation
Initial release: 1999
Stable release: 9.4.2 / November 21, 2022[1]
Repository: github.com/apache/lucene
Written in: Java
Operating system: Cross-platform
Type: Search and index
License: Apache License 2.0
Website: lucene.apache.org

Lucene has been ported to other programming languages including Object Pascal, Perl, C#, C++, Python, Ruby and PHP.[5]

History

Doug Cutting originally wrote Lucene in 1999.[6] It was his fifth search engine: he had previously written two at Xerox PARC, one at Apple, and a fourth at Excite.[7] Lucene was initially available for download from its home at the SourceForge web site. It joined the Apache Software Foundation's Jakarta family of open-source Java products in September 2001 and became its own top-level Apache project in February 2005. The name Lucene is Doug Cutting's wife's middle name and her maternal grandmother's first name.[8]

Lucene formerly included a number of sub-projects, such as Lucene.NET, Mahout, Tika and Nutch. These are now independent top-level projects.

In March 2010, the Apache Solr search server joined as a Lucene sub-project, merging the developer communities.

Version 4.0 was released on October 12, 2012.[9]

In March 2021, Lucene changed its logo, and Apache Solr became a top-level Apache project again, independent of Lucene.

Features and common use

Lucene is suitable for any application that requires full-text indexing and search, and it is best known for its use in implementing Internet search engines and local, single-site search.[10][11]
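
Full-text indexing of this kind is commonly built on an inverted index, a map from each term to the documents that contain it. The following is a minimal sketch of that idea in plain Java. It does not use Lucene's API, and the class name TinyInvertedIndex, its methods, and the naive lowercase-and-split tokenizer are all hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of an inverted index; not part of Lucene.
class TinyInvertedIndex {
    // term -> sorted ids of the documents that contain the term
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    /** Indexes a document and returns its id (assigned in insertion order). */
    int add(String text) {
        int docId = nextDocId++;
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
        }
        return docId;
    }

    /** Returns the ids of documents that contain every term of the query. */
    Set<Integer> search(String query) {
        List<String> terms = tokenize(query);
        TreeSet<Integer> result = new TreeSet<>();
        for (int i = 0; i < terms.size(); i++) {
            TreeSet<Integer> ids = postings.get(terms.get(i));
            if (ids == null) {
                return new TreeSet<>(); // an unknown term matches nothing
            }
            if (i == 0) {
                result.addAll(ids);     // start from the first term's postings
            } else {
                result.retainAll(ids);  // intersect with each further term
            }
        }
        return result;
    }

    // Naive analysis: lowercase, then split on anything that is not a letter or digit.
    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add("Lucene is a search library");              // id 0
        index.add("Solr is a search server built on Lucene"); // id 1
        index.add("Nutch crawls the web");                    // id 2
        System.out.println(index.search("search lucene"));    // prints [0, 1]
        System.out.println(index.search("web"));              // prints [2]
    }
}
```

A production library adds much more on top of this, such as relevance scoring, configurable text analysis, and on-disk index structures, none of which the sketch attempts.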

Lucene includes a feature to perform a fuzzy search based on edit distance.[12]
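
Edit distance counts the single-character changes needed to turn one string into another. The sketch below computes the classic Levenshtein distance (insertions, deletions and substitutions) in plain Java and uses it for a simple fuzzy match. It illustrates the concept only: the class EditDistance and the helper fuzzyMatch are hypothetical names, and Lucene's own implementation is more elaborate and is not reproduced here.

```java
// Hypothetical illustration of edit-distance matching; not Lucene's implementation.
class EditDistance {

    /** Levenshtein distance: the minimum number of single-character
     *  insertions, deletions and substitutions that turn a into b. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1), // insert, delete
                        prev[j - 1] + cost);                    // substitute or keep
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** True if candidate is within maxEdits edits of term. */
    static boolean fuzzyMatch(String term, String candidate, int maxEdits) {
        return levenshtein(term, candidate) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("roam", "foam"));      // prints 1
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
        System.out.println(fuzzyMatch("roam", "roams", 1));   // prints true
    }
}
```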

Lucene has also been used to implement recommendation systems.[13] For example, Lucene's MoreLikeThis class can generate recommendations for similar documents. In a comparison of the term-vector-based similarity approach of MoreLikeThis with citation-based document similarity measures, such as co-citation and co-citation proximity analysis, Lucene's approach excelled at recommending documents with very similar structural characteristics and narrower relatedness.[14] Citation-based measures, in contrast, tended to be better suited to recommending more broadly related documents,[14] which suggests they may be the better choice for generating serendipitous recommendations, provided the documents to be recommended contain in-text citations.
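
The general idea behind term-vector-based similarity can be sketched as follows: each document is reduced to a vector of term counts, and two documents are compared by the cosine of the angle between their vectors. This is a simplified illustration in plain Java, not the algorithm used by MoreLikeThis. The class TermVectorSimilarity, its methods, and the choice of raw term counts with cosine similarity are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of term-vector similarity; not Lucene's MoreLikeThis.
class TermVectorSimilarity {

    /** Builds a term -> count map using naive lowercase-and-split tokenization. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tf.merge(t, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine similarity of two term-count vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * other; // only shared terms contribute
            }
        }
        for (int v : b.values()) {
            normB += v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> a = termFrequencies("open source search library");
        Map<String, Integer> b = termFrequencies("open source search server");
        Map<String, Integer> c = termFrequencies("cooking recipes for dinner");
        System.out.println(cosine(a, b)); // prints 0.75 (three of four terms shared)
        System.out.println(cosine(a, c)); // prints 0.0 (no shared terms)
    }
}
```

Because the score depends only on shared vocabulary, a measure of this kind favours documents with closely overlapping wording, which is consistent with the narrow relatedness reported in the comparison above.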

Lucene-based projects

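Lucene itself is just an indexing and search library and does not contain crawling and HTML parsing functionality. However, several projects extend Lucene's capability:

  • Apache Nutch, which provides web crawling and HTML parsing[citation needed]
  • Apache Solr, an enterprise search server
  • CrateDB, an open-source, distributed SQL database built on Lucene[15]
  • DocFetcher, a multiplatform desktop search application[citation needed]
  • Elasticsearch, an enterprise search server released in 2010[16][17]
  • Kinosearch, a search engine written in Perl and C[18] and a loose port of Lucene.[19] The Socialtext wiki software uses this search engine,[18] and so does the MojoMojo wiki.[20] It is also used by the Human Metabolome Database (HMDB)[21] and the Toxin and Toxin Target Database (T3DB).[22]
  • MongoDB Atlas Search, a cloud-native enterprise search application based on MongoDB and Apache Lucene
  • OpenSearch, an open-source enterprise search server based on a fork of Elasticsearch 7
  • Swiftype, an enterprise search startup based on Lucene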

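See also

  • Enterprise search
  • Information extraction
  • List of information retrieval libraries
  • Text mining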

References

  1. "Welcome to Apache Lucene". Lucene™ News section. Archived from the original on 12 February 2020. Retrieved 12 February 2020.
  2. Kamphuis, Chris; de Vries, Arjen P.; Boytsov, Leonid; Lin, Jimmy (2020), Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo (eds.), "Which BM25 Do You Mean? A Large-Scale Reproducibility Study of Scoring Variants", Advances in Information Retrieval, Cham: Springer International Publishing, 12036: 28–34, doi:10.1007/978-3-030-45442-5_4, ISBN 978-3-030-45441-8, PMC 7148026
  3. Grand, Adrien; Muir, Robert; Ferenczi, Jim; Lin, Jimmy (2020), Jose, Joemon M.; Yilmaz, Emine; Magalhães, João; Castells, Pablo (eds.), "From MAXSCORE to Block-Max Wand: The Story of How Lucene Significantly Improved Query Evaluation Performance", Advances in Information Retrieval, Cham: Springer International Publishing, 12036: 20–27, doi:10.1007/978-3-030-45442-5_3, ISBN 978-3-030-45441-8, PMC 7148045
  4. Azzopardi, Leif; Moshfeghi, Yashar; Halvey, Martin; Alkhawaldeh, Rami S.; Balog, Krisztian; Di Buccio, Emanuele; Ceccarelli, Diego; Fernández-Luna, Juan M.; Hull, Charlie; Mannix, Jake; Palchowdhury, Sauparna (2017-02-14). "Lucene4IR: Developing Information Retrieval Evaluation Resources using Lucene". ACM SIGIR Forum. 50 (2): 58–75. doi:10.1145/3053408.3053421. ISSN 0163-5840. S2CID 212416159.
  5. "LuceneImplementations". apache.org. Archived from the original on 6 October 2015. Retrieved 23 September 2015.
  6. KeywordAnalyzer "Better Search with Apache Lucene and Solr" (PDF). 19 November 2007. Archived from the original (PDF) on 31 January 2012.
  7. Cutting, Doug (2019-06-07). "I wrote a couple of search engines at Xerox PARC, then V-Twin at Apple, then re-wrote Excite's search, then Lucene. So, Lucene might be considered V-Twin 3.0? Almost 25 years later, V-Twin still lives on as Mac OS X Search Kit!". @cutting. Retrieved 2019-06-19.
  8. Barker, Deane (2016). Web Content Management. O'Reilly. p. 233. ISBN 978-1491908105.
  9. "Apache Lucene - Welcome to Apache Lucene". apache.org. Archived from the original on 4 February 2016. Retrieved 4 February 2016.
  10. McCandless, Michael; Hatcher, Erik; Gospodnetić, Otis (2010). Lucene in Action, Second Edition. Manning. p. 8. ISBN 978-1933988177.
  11. "GNU/Linux Semantic Storage System" (PDF). glscube.org. Archived from the original (PDF) on 2010-06-01.
  12. "Apache Lucene - Query Parser Syntax". lucene.apache.org. Archived from the original on 2017-05-02.
  13. J. Beel, S. Langer, and B. Gipp, "The Architecture and Datasets of Docear's Research Paper Recommender System," in Proceedings of the 3rd International Workshop on Mining Scientific Publications (WOSP 2014) at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2014), London, UK, 2014.
  14. M. Schwarzer, M. Schubotz, N. Meuschke, C. Breitinger, V. Markl, and B. Gipp, "Evaluating Link-based Recommendations for Wikipedia," in Proceedings of the 16th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL), New York, NY, USA, 2016, pp. 191-200. https://www.gipp.com/wp-content/papercite-data/pdf/schwarzer2016.pdf
  15. Wayner, Peter. "11 cutting-edge databases worth exploring now". InfoWorld. Archived from the original on 21 September 2015. Retrieved 21 September 2015.
  16. "Elasticsearch: RESTful, Distributed Search & Analytics - Elastic". elastic.co. Archived from the original on 8 October 2015. Retrieved 23 September 2015.
  17. "The Future of Compass & Elasticsearch". the dude abides. Archived from the original on 2015-10-15. Retrieved 2015-10-14.
  18. Natividad, Angela. "Socialtext Updates Search, Goes Kino". CMS Wire. Archived from the original on 2012-09-29. Retrieved 2011-05-31.
  19. Marvin Humphrey. "KinoSearch - Search engine library. - metacpan.org". p3rl.org. Retrieved 23 September 2015.
  20. Diment, Kieren; Trout, Matt S (2009). "Catalyst Cookbook". The Definitive Guide to Catalyst. Apress. p. 280. ISBN 978-1-4302-2365-8.
  21. Wishart, D. S.; et al. (January 2009). "HMDB: a knowledgebase for the human metabolome". Nucleic Acids Res. 37 (Database issue): D603–10. doi:10.1093/nar/gkn810. PMC 2686599. PMID 18953024.
  22. Lim, Emilia; Pon, Allison; Djoumbou, Yannick; Knox, Craig; Shrivastava, Savita; Guo, An Chi; Neveu, Vanessa; Wishart, David S. (January 2010). "T3DB: a comprehensively annotated database of common toxins and their targets". Nucleic Acids Res. 38 (Database issue): D781–6. doi:10.1093/nar/gkp934. PMC 2808899. PMID 19897546.

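Bibliography

  • Gospodnetić, Otis; Erik Hatcher; Michael McCandless (28 June 2009). Lucene in Action (2nd ed.). Manning Publications. ISBN 978-1-9339-8817-7.
  • Gospodnetić, Otis; Erik Hatcher (1 December 2004). Lucene in Action (1st ed.). Manning Publications. ISBN 978-1-9323-9428-3.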

External links

  • Official website
