Lemur Project

The Lemur Project is a collaboration between the Center for Intelligent Information Retrieval at the University of Massachusetts Amherst and the Language Technologies Institute at Carnegie Mellon University. The Lemur Project develops search engines, browser toolbars, text analysis tools, and data resources that support research and development of information retrieval and text mining software. The project is best known for its Indri and Galago search engines, the ClueWeb09 and ClueWeb12 datasets, and the RankLib learning-to-rank library. The software and datasets are used widely in scientific and research applications, as well as in some commercial applications.

The Lemur Project's software development philosophy emphasizes state-of-the-art accuracy, flexibility, and efficiency. For example, the Indri search engine provides accurate search for large text collections 'out of the box', and data is stored in an accessible manner to support development of new retrieval strategies. Software from the Lemur Project is distributed under open-source licenses that provide flexibility to scientists and software developers.

Lemur is written in C, C++, and Java, and is distributed with its source files and build instructions. The source code can be modified to develop new libraries. It runs on several operating systems, including Linux and Windows.

Features

Lemur supports the following features:

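  • Indexing:
    • English, Chinese, and Arabic text
    • Word stemming
    • Stop words
    • Tokenization
    • Passage and incremental indexing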
  • Retrieval:
    • Ad hoc retrieval (TF-IDF and InQuery)
    • Passage and cross-lingual retrieval
    • Language modeling
      • Query model updating
      • Two stage smoothing
    • Relevance feedback
    • Structured query language
    • Wildcard term matching
  • Distributed IR:
    • Query-based sampling
    • Database based ranking (CORI)
    • Results merging
  • Document clustering
  • Summarization
  • Simple text processing
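
As an illustration of the "Two stage smoothing" item above, the following is a minimal sketch of query-likelihood scoring with two-stage smoothing: a Dirichlet prior is applied to the document model first, and the result is then interpolated with a background model. It is not taken from the Lemur source code. The function name, the parameter defaults (`mu`, `lam`), and the use of the collection model as the background model are assumptions made for this example.

```python
import math
from collections import Counter


def two_stage_log_likelihood(query, doc, collection_tf, collection_len,
                             mu=2500.0, lam=0.5):
    """Return log p(query | doc) under two-stage smoothing.

    query, doc      -- lists of tokens
    collection_tf   -- mapping term -> count over the whole collection
    collection_len  -- total number of tokens in the collection
    mu, lam         -- illustrative defaults, not values used by Lemur
    """
    doc_tf = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for term in query:
        # Background model; here the collection model stands in for it.
        p_coll = collection_tf.get(term, 0) / collection_len
        # Stage 1: Dirichlet-prior smoothing of the document model.
        p_dirichlet = (doc_tf[term] + mu * p_coll) / (doc_len + mu)
        # Stage 2: linear interpolation with the background model.
        p = (1.0 - lam) * p_dirichlet + lam * p_coll
        if p == 0.0:
            # The term never occurs in the collection.
            return float("-inf")
        score += math.log(p)
    return score


if __name__ == "__main__":
    docs = [["indri", "search", "engine"], ["clueweb", "dataset", "crawl"]]
    coll = Counter(t for d in docs for t in d)
    total = sum(coll.values())
    for d in docs:
        print(d, two_stage_log_likelihood(["search"], d, coll, total))
```

With these toy documents, the document that contains the query term receives the higher score; the absolute values depend entirely on the assumed `mu` and `lam`.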

Components

The Lemur Project has the following components:

  • Indri search engine in C++
  • Galago search engine research framework in Java
  • RankLib learning-to-rank library
  • Sifaka data mining application
  • ClueWeb09 and ClueWeb12 datasets
  • Query Log Toolbar

Latest Version

Updates to the Lemur Project components are made twice a year, in June and December. The latest versions are Indri 5.17, Galago 3.18, RankLib 2.14, and Sifaka 1.8.

Indri Search Engine

The Indri search engine is an open-source component of the Lemur Project. Researchers can index data with simple command-line instructions, and its query language supports structured documents. Indri can be adapted to a variety of applications and can be distributed across a cluster of nodes for high performance. It handles large collections of data and understands data formats such as HTML and XML.

The Indri API supports various programming and scripting languages like C++, Java, C#, and PHP.

Features of Indri Search Engine

  • Can make use of multiple document representations
  • Explicit term weighting
  • Robust query language
  • Formally well-grounded
  • Highly effective
  • Can be efficiently implemented
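
To make "explicit term weighting" concrete, here is a small sketch that builds a weighted query string. It assumes the `#weight( w1 t1 w2 t2 ... )` operator form described in the Indri query language documentation; the helper name `weighted_query` is hypothetical and is not part of the Indri API. The sketch only formats text and does not call Indri.

```python
def weighted_query(term_weights):
    """Build an Indri-style weighted query string from a term -> weight mapping.

    Assumes the "#weight( w1 t1 w2 t2 ... )" operator syntax; this helper is
    illustrative only and is not provided by Indri.
    """
    parts = []
    for term, weight in term_weights.items():
        if weight < 0:
            raise ValueError(f"negative weight for term {term!r}")
        # Each weight precedes the term it applies to.
        parts.append(f"{float(weight)} {term}")
    return "#weight( " + " ".join(parts) + " )"


if __name__ == "__main__":
    print(weighted_query({"lemur": 2.0, "project": 1.0}))
    # prints: #weight( 2.0 lemur 1.0 project )
```

A string built this way would be passed to whichever Indri interface runs queries; that step is omitted because it depends on the installed bindings.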

See also

  • List of information retrieval libraries

External links

  • The Lemur Project website

