
Sphinx (search engine)

Not to be confused with Sphinx documentation generator or CMU Sphinx.

Sphinx is a full-text search engine that provides text search functionality to client applications.

Sphinx
Developer(s): Andrew Aksyonoff
Initial release: 2001
Stable release: 3.5.1[1] / 3 February 2023
Written in: C++
Operating system: Linux, Windows, Solaris, FreeBSD, NetBSD, Mac OS, AIX
Type: Search and index
License: GPLv2 until version 2 and commercial; proprietary since version 3
Website: sphinxsearch.com

Overview

Sphinx can be used either as a stand-alone server or as a storage engine ("SphinxSE") for the MySQL family of databases. When run as a stand-alone server, Sphinx operates similarly to a DBMS and can communicate with MySQL, MariaDB and PostgreSQL through their native protocols, or with any ODBC-compliant DBMS via ODBC. MariaDB, a fork of MySQL, is distributed with SphinxSE.[2]

SphinxAPI

If Sphinx is run as a stand-alone server, an application can connect to it through SphinxAPI. Official implementations of the API are available for PHP, Java, Perl, Ruby and Python. Unofficial implementations for other languages, as well as various third-party[3] plugins and modules, are also available. Other data sources can be indexed via a pipe in a custom XML format.[4]
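As a rough illustration of feeding a non-SQL data source through the XML pipe, the sketch below renders a few documents in the xmlpipe2 layout (a docset containing a schema and a sequence of documents). The function name to_xmlpipe2, the field names "title" and "body", and the attribute "year" are assumptions made for this example; consult the xmlpipe2 documentation for the authoritative element list.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xmlpipe2(docs, fields, attrs):
    """Render documents as an xmlpipe2-style stream (illustrative sketch).

    docs:   iterable of dicts with an integer "id" plus field/attribute values
    fields: names of full-text fields (hypothetical names in this example)
    attrs:  mapping of attribute name -> declared type, e.g. "int"
    """
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", "<sphinx:schema>"]
    for name in fields:
        out.append(f"<sphinx:field name={quoteattr(name)}/>")
    for name, kind in attrs.items():
        out.append(f"<sphinx:attr name={quoteattr(name)} type={quoteattr(kind)}/>")
    out.append("</sphinx:schema>")
    for doc in docs:
        out.append(f'<sphinx:document id="{int(doc["id"])}">')
        for name in list(fields) + list(attrs):
            # Escape &, < and > so arbitrary text stays well-formed XML.
            out.append(f"<{name}>{escape(str(doc.get(name, '')))}</{name}>")
        out.append("</sphinx:document>")
    out.append("</sphinx:docset>")
    return "\n".join(out)


sample = [{"id": 1, "title": "Tom & Jerry", "body": "cat chases mouse", "year": 1940}]
xml_stream = to_xmlpipe2(sample, fields=["title", "body"], attrs={"year": "int"})
print(xml_stream)
```

A script like this would typically write the stream to standard output so that the indexer can read it from a pipe.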

SphinxQL

The Sphinx search daemon supports the MySQL binary network protocol and can be accessed with the regular MySQL API and clients. Sphinx supports a subset of SQL known as SphinxQL, which covers standard querying of all index types with SELECT, modification of RealTime indexes with INSERT, REPLACE, and DELETE, and more.
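The statements described above can be sketched as plain strings that an application would hand to an ordinary MySQL client library. The helpers below only build the statements and do not open a connection. The index name rt_articles and the columns title and content are hypothetical, and the %s placeholders assume a driver that uses the DB-API "format" parameter style.

```python
def _check_index_name(index):
    # Index names cannot be passed as parameters, so validate them by hand.
    if not index.replace("_", "").isalnum():
        raise ValueError("index name must contain only letters, digits, underscores")


def build_select(index, match_expr, limit=10):
    """Return (sql, params) for a full-text SELECT using MATCH()."""
    _check_index_name(index)
    sql = f"SELECT id FROM {index} WHERE MATCH(%s) LIMIT {int(limit)}"
    return sql, (match_expr,)


def build_replace(index, doc_id, title, content):
    """Return (sql, params) for a REPLACE into a RealTime index."""
    _check_index_name(index)
    sql = f"REPLACE INTO {index} (id, title, content) VALUES (%s, %s, %s)"
    return sql, (int(doc_id), title, content)


select_sql, select_params = build_select("rt_articles", "hello world", limit=5)
replace_sql, replace_params = build_replace("rt_articles", 7, "Hello", "Hello world text")
print(select_sql, select_params)
print(replace_sql, replace_params)
```

With a MySQL driver, each pair would be passed to something like cursor.execute(sql, params) on a connection opened against the search daemon rather than against a MySQL server.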

SphinxSE

Sphinx can also provide a special storage engine for MariaDB and MySQL databases. This allows MySQL and MariaDB to communicate with Sphinx's searchd to run queries and obtain results. Sphinx indices are treated like regular SQL tables. The SphinxSE storage engine is shipped with MariaDB.

Full-text fields and indexing

Sphinx is configured to examine a data set via its Indexer. The Indexer process creates a full-text index (a special data structure that enables quick keyword searches) from the given text. Full-text fields are the resulting content that is indexed by Sphinx; they can be quickly searched for keywords. Fields are named, and you can limit your searches to a single field (e.g. search through "title" only) or a subset of fields (e.g. "title" and "abstract" only). Sphinx's index format generally supports up to 256 fields. Note that the original data is not stored in the Sphinx index but is discarded during indexing; Sphinx assumes that you store those contents elsewhere.
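The idea of a full-text index over named fields can be illustrated with a toy inverted index. This is a teaching sketch, not Sphinx's actual index format: the class name ToyFieldIndex and the whitespace tokenizer are assumptions made for the example. As in the description above, only keyword postings are kept and the original text is thrown away.

```python
from collections import defaultdict


class ToyFieldIndex:
    """Tiny inverted index keyed by (field name, keyword)."""

    def __init__(self):
        # (field, keyword) -> set of document ids containing that keyword
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        """Index a document given as {field name: text}; the text is not stored."""
        for field, text in fields.items():
            for word in text.lower().split():
                self.postings[(field, word)].add(doc_id)

    def search(self, keyword, fields=None):
        """Return sorted ids matching keyword, optionally limited to some fields."""
        keyword = keyword.lower()
        hits = set()
        for (field, word), ids in self.postings.items():
            if word == keyword and (fields is None or field in fields):
                hits |= ids
        return sorted(hits)


idx = ToyFieldIndex()
idx.add(1, {"title": "Sphinx search", "abstract": "A full-text engine"})
idx.add(2, {"title": "Database notes", "abstract": "Search with SQL"})

print(idx.search("search"))                    # all fields
print(idx.search("search", fields={"title"}))  # "title" only
```

Restricting the lookup to a set of field names mirrors limiting a search to "title" only, or to "title" and "abstract" only.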

Attributes

Attributes are additional values associated with each document that can be used to perform additional filtering and sorting during search. Attributes are named. Attribute names are case insensitive. Attributes are not full-text indexed; they are stored in the index as is. Currently supported attribute types are:

  • unsigned integers (1-bit to 32-bit wide);
  • UNIX timestamps;
  • floating point values (32-bit, IEEE 754 single precision);
  • string ordinals (specially computed integers);
  • strings (since 1.10-beta);
  • JSON (since 2.1.1-beta);[5][6]
  • MVA, multi-value attributes (variable-length lists of 32-bit unsigned integers).

JSON attributes in Sphinx

Sphinx, like classic SQL databases, works with a so-called fixed schema, that is, a set of predefined attribute columns. These work well when most of the data stored actually has values, but mapping sparse data to static columns can be cumbersome. Assume, for example, that you're running a price comparison or an auction site with many different product categories. Some attributes, like the price or the vendor, are common to all goods. But from there, for laptops, you also need to store the weight, screen size, HDD type, RAM size, etc. And, say, for shovels, you probably want to store the color, the handle length, and so on. So it's manageable within a single category, but the distinct fields needed for all the goods across all the categories are legion. The JSON field can be used to overcome this. Inside the JSON attribute you don't need a fixed structure. You can have various keys which may or may not be present in all documents. When you try to filter on one of these keys, Sphinx will ignore documents that don't have the key in the JSON attribute and will work only with those documents that have it.
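The filtering behaviour described above can be mimicked in a few lines. This is a sketch of the semantics only, not of Sphinx internals: the catalog data, the attribute name props, and keys such as weight_kg are invented for the example.

```python
import json


def filter_by_json_key(docs, key, predicate):
    """Return ids of documents whose JSON attribute has `key` and passes `predicate`.

    Documents that lack the key are skipped rather than treated as errors.
    """
    matches = []
    for doc in docs:
        props = json.loads(doc["props"])
        if key not in props:
            continue  # e.g. a shovel has no "weight_kg"
        if predicate(props[key]):
            matches.append(doc["id"])
    return matches


# Fixed columns (id, price) are shared; the sparse details live in JSON.
catalog = [
    {"id": 1, "price": 900, "props": '{"weight_kg": 1.3, "ram_gb": 16}'},
    {"id": 2, "price": 25, "props": '{"color": "red", "handle_cm": 120}'},
    {"id": 3, "price": 1200, "props": '{"weight_kg": 2.4, "ram_gb": 32}'},
]

light_laptops = filter_by_json_key(catalog, "weight_kg", lambda w: w < 2.0)
red_items = filter_by_json_key(catalog, "color", lambda c: c == "red")
print(light_laptops, red_items)
```

The shovel (id 2) never enters the weight comparison, and the laptops never enter the color comparison, because each lacks the key being filtered on.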

License

Up until version 3, Sphinx was dual-licensed under either:

  1. the GNU General Public License version 2, or
  2. a proprietary license, available for use cases that fall outside the terms of the GNU GPLv2.

Since version 3, Sphinx has been proprietary, with a promise to release its source code in the future.[7]

Sphinx use examples

  • Craigslist.org[8]
  • Recruitment.aleph-graymatter.com[9]
  • Tradebit.com[10]
  • vBulletin.com[11]
  • MediaWiki extension[12]
  • Boardreader.com[13]
  • OMBE.com[14]
  • Limundo.com[14]

Feature list

  • Batch and incremental (soft real-time) full-text indexing.
  • Support for non-text attributes (scalars, strings, sets, JSON).
  • Direct indexing of SQL databases. Native support for MySQL, MariaDB, PostgreSQL, MSSQL, plus ODBC connectivity.
  • XML document indexing support.
  • Distributed searching support out-of-the-box.
  • Integration via access APIs.
  • SQL-like syntax support via MySQL protocol (since 0.9.9)
  • Full-text searching syntax.
  • Database-like result set processing.
  • Relevance ranking utilizing additional factors besides standard BM25.
  • Text processing support for SBCS and UTF-8 encodings, stopwords, indexing of words known not to appear in the database ("hitless"), stemming, word forms, tokenizing exceptions, and "blended characters" (dual-indexing as both a real character and a word separator).
  • Supports UDF (since 2.0.1).

Performance and scalability

  • Indexing speed of up to 10-15 MB/sec per core and HDD.
  • Searching speed of over 500 queries/sec against 1,000,000 document/1.2 GB collection using a 2-core desktop system with 2 GB of RAM.[15]
  • The biggest known installation using Sphinx, Boardreader.com, indexes 16 billion documents.[16]
  • The busiest known installation, Craigslist, serves over 300,000,000 queries/day[16] and more than 50 billion page views/month.[17]
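To put the figures above on a common scale, the following back-of-the-envelope arithmetic converts the daily query count into an average per-second rate and estimates single-core indexing time for the 1.2 GB collection. It assumes queries are spread evenly over the day and that 1 GB = 1000 MB; real traffic peaks and real indexing runs will differ.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Busiest known installation: 300,000,000 queries per day.
avg_queries_per_sec = 300_000_000 / SECONDS_PER_DAY

# Indexing a 1.2 GB collection at the quoted 10-15 MB/sec per core.
collection_mb = 1200  # assumption: 1 GB = 1000 MB
slowest_sec = collection_mb / 10
fastest_sec = collection_mb / 15

print(f"average load: about {avg_queries_per_sec:.0f} queries/sec")
print(f"indexing time: {fastest_sec:.0f} to {slowest_sec:.0f} seconds on one core")
```

On these assumptions the daily figure averages out to roughly 3,500 queries per second, and the benchmark collection would take on the order of one to two minutes to index on a single core.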

Fork

In 2017, key members of the original Sphinx team forked the project under the name Manticore[18], with the intention of fixing bugs and developing new features[19]. Unlike Sphinx, Manticore continues to be released as open source under version 3 of the GPL[20].

See also

  • List of information retrieval libraries

References

  1. ^ "Feb 3, 2023. Sphinx 3.5.1 released". Retrieved 15 June 2023.
  2. ^ "AskMonty: About SphinxSE". kb.askmonty.org. Monty Program AB. Retrieved 16 August 2013.
  3. ^ "Sphinx Wiki: Third Party Tools". sphinxsearch.com. Sphinx Search Wiki. Retrieved 16 August 2013.
  4. ^ "xmlpipe2". sphinxsearch.com. Sphinx Search Documentation. Retrieved 16 August 2013.
  5. ^ "JSON Attributes in Sphinx 2.1.1". sphinxsearch.com. Sphinx Search Blog. 7 February 2013. Retrieved 16 August 2013.
  6. ^ "Full JSON Support in Trunk". sphinxsearch.com. Sphinx Search Blog. 8 August 2013. Retrieved 16 August 2013.
  7. ^ "Sphinx | Open Source Search Server".
  8. ^ "Sphinx at Craigslist". craigslist.org. Craigslist. Retrieved 17 August 2013.
  9. ^ "GM Recruitment". aleph-networks.com. Aleph-networks. Retrieved 1 October 2012.
  10. ^ "Lightning Fast PHP Site Search". tradebit.com. Tradebit. Retrieved 17 August 2013.
  11. ^ "Sphinx Search beta for Vbulletin 4.0". vbulletin.com. Vbulletin. Retrieved 17 August 2013.
  12. ^ "Sphinx Search Extension for MediaWiki". mediawiki.org. MediaWiki: Svemir Brkic, Paul Grinberg. Retrieved 17 August 2013.
  13. ^ "Powered by Sphinx Search: Boardreader". sphinxsearch.com. Sphinx Search. Retrieved 17 August 2013.
  14. ^ a b "Powered by Sphinx". sphinxsearch.com.
  15. ^ "About Sphinx". sphinxsearch.com. Sphinx Search. Retrieved 16 August 2013.
  16. ^ a b "Powered by Sphinx". sphinxsearch.com. Sphinx Search. Retrieved 10 May 2015.
  17. ^ "Craigslist Factsheet". craigslist.org. Craigslist. Archived from the original on 5 August 2012. Retrieved 16 August 2013.
  18. ^ "About Manticore Search". manticoresearch.com. Retrieved 24 April 2023.
  19. ^ "Manticore Search: 3 years after forking from Sphinx". manticoresearch.com. Retrieved 2 May 2024.
  20. ^ "manticoresoftware/manticoresearch". GitHub. Retrieved 2 May 2024.

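Further reading

  • Aksyonoff, Andrew (2011). Introduction to Search with Sphinx: From installation to relevance tuning. O'Reilly Media. ISBN 978-0-596-80955-3.
  • Ali, Abbas (2011). Sphinx Search Beginner's Guide. Birmingham, England: Packt Publishing. ISBN 978-1-84951-254-1.
  • "No more open source" (2017). sphinxsearch.com blog.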

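External links

  • Official website (sphinxsearch.com)
  • SphinxSE in MariaDB KnowledgeBase
  • Manticore open-source fork site (manticoresearch.com)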
