
Learned sparse retrieval

Learned sparse retrieval or sparse neural search is an approach to text search which uses a sparse vector representation of queries and documents.[1] It borrows techniques both from lexical bag-of-words and vector embedding algorithms, and is claimed to perform better than either alone. The best-known sparse neural search systems are SPLADE[2] and its successor SPLADE v2.[3] Others include DeepCT,[4] uniCOIL,[5] EPIC,[6] DeepImpact,[7] TILDE and TILDEv2,[8] SPARTA,[9] SPLADE-max, and DistilSPLADE-max.[3]
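As an illustration of the sparse vector representation, the following minimal Python sketch stores each document as a mapping from terms to weights, builds an inverted index, and scores a query by a sparse dot product. The term weights, document identifiers, and function names are hypothetical and chosen only for this example; in a real system a trained model such as SPLADE would predict the weights, and the scoring function may differ.

```python
from collections import defaultdict

# Hypothetical term weights for two toy documents.
# A learned sparse model would predict these; here they are made up.
docs = {
    "d1": {"sparse": 1.2, "retrieval": 0.9, "search": 0.4},
    "d2": {"dense": 1.1, "embedding": 0.8, "search": 0.3},
}


def build_inverted_index(doc_vectors):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vec in doc_vectors.items():
        for term, weight in vec.items():
            index[term].append((doc_id, weight))
    return index


def score(query_vec, index):
    """Rank documents by the dot product of sparse query and document vectors."""
    scores = defaultdict(float)
    for term, q_weight in query_vec.items():
        # Only terms with non-zero query weight are looked up,
        # so cost depends on posting-list lengths, not vocabulary size.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


index = build_inverted_index(docs)
query = {"sparse": 1.0, "search": 0.5}  # hypothetical query weights
ranking = score(query, index)
print(ranking)
```

Because only the terms with non-zero weight are stored and looked up, this kind of representation can be served from the same inverted-index structures used by lexical search engines.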

Some implementations of SPLADE have latency similar to Okapi BM25 lexical search while matching the results of state-of-the-art neural rankers on in-domain data.[10]

The official SPLADE model weights and training code are released under a Creative Commons NonCommercial license.[11] However, there are independent implementations of SPLADE++ (a variant of the SPLADE models) that are released under permissive licenses.

SPRINT is a toolkit for evaluating neural sparse retrieval systems.[12]

External links

  • SPLADE code base at GitHub

Notes

  1. ^ Nguyen, Thong; MacAvaney, Sean; Yates, Andrew (2023). "A Unified Framework for Learned Sparse Retrieval". In Kamps, Jaap; Goeuriot, Lorraine; Crestani, Fabio; Maistro, Maria; Joho, Hideo; Davis, Brian; Gurrin, Cathal; Kruschwitz, Udo; Caputo, Annalina (eds.). Advances in Information Retrieval. Lecture Notes in Computer Science. Vol. 13982. Cham: Springer Nature Switzerland. pp. 101–116. arXiv:2303.13416. doi:10.1007/978-3-031-28241-6_7. ISBN 978-3-031-28241-6. S2CID 257585074.
  2. ^ Formal, Thibault; Piwowarski, Benjamin; Clinchant, Stéphane (2021-07-11). "SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 2288–2292. arXiv:2107.05720. doi:10.1145/3404835.3463098. ISBN 978-1-4503-8037-9. S2CID 235792467.
  3. ^ a b Formal, Thibault; Piwowarski, Benjamin; Lassance, Carlos; Clinchant, Stéphane (21 September 2021). "SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval". arXiv:2109.10086v1 [cs.IR].
  4. ^ Dai, Zhuyun; Callan, Jamie (2020-04-20). "Context-Aware Document Term Weighting for Ad-Hoc Search". Proceedings of the Web Conference 2020. New York, NY, USA: ACM. pp. 1897–1907. doi:10.1145/3366423.3380258. ISBN 9781450370233. S2CID 218521094.
  5. ^ Lin, Jimmy; Ma, Xueguang (28 June 2021). "A few brief notes on DeepImpact, COIL, and a conceptual framework for information retrieval techniques". arXiv:2106.14807 [cs.IR].
  6. ^ MacAvaney, Sean; Nardini, Franco Maria; Perego, Raffaele; Tonellotto, Nicola; Goharian, Nazli; Frieder, Ophir (2020-07-25). "Expansion via Prediction of Importance with Contextualization". Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '20. New York, NY, USA: Association for Computing Machinery. pp. 1573–1576. arXiv:2004.14245. doi:10.1145/3397271.3401262. ISBN 978-1-4503-8016-4. S2CID 216641912.
  7. ^ Mallia, Antonio; Khattab, Omar; Suel, Torsten; Tonellotto, Nicola (2021-07-11). "Learning Passage Impacts for Inverted Indexes". Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '21. New York, NY, USA: Association for Computing Machinery. pp. 1723–1727. arXiv:2104.12016. doi:10.1145/3404835.3463030. ISBN 978-1-4503-8037-9. S2CID 233394068.
  8. ^ Zhuang, Shengyao; Zuccon, Guido (13 September 2021). "Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion". arXiv:2108.08513 [cs.IR].
  9. ^ Zhao, Tiancheng; Lu, Xiaopeng; Lee, Kyusong (28 September 2020). "SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval". arXiv:2009.13013 [cs.CL].
  10. ^ Lassance, Carlos; Clinchant, Stéphane (2022-07-07). "An Efficiency Study for SPLADE Models". Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '22. New York, NY, USA: Association for Computing Machinery. pp. 2220–2226. arXiv:2207.03834. doi:10.1145/3477495.3531833. ISBN 978-1-4503-8732-3. S2CID 250340284.
  11. ^ "splade/LICENSE at main · naver/splade". GitHub. Retrieved 2023-08-25.
  12. ^ Thakur, Nandan; Wang, Kexin; Gurevych, Iryna; Lin, Jimmy (2023-07-18). "SPRINT: A Unified Toolkit for Evaluating and Demystifying Zero-shot Neural Sparse Retrieval". Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR '23. New York, NY, USA: Association for Computing Machinery. pp. 2964–2974. arXiv:2307.10488. doi:10.1145/3539618.3591902. ISBN 978-1-4503-9408-6. S2CID 259949923.
