fbpx
Wikipedia

Trigram search

Trigram search is a method of searching for text when the exact syntax or spelling of the target object is not precisely known[1] or when queries may be regular expressions.[2] It finds objects which match the maximum number of three consecutive character strings (i.e. trigrams) in the entered search terms, which are generally near matches.[3] Two strings with many shared trigrams can be expected to be very similar.[4] Trigrams also allow for efficiently creating indexes for searches that are regular expressions or match the text inexactly. Indexes can significantly accelerate searches.[5][6] A threshold for number of trigram matches can be specified as a cutoff point, after which a result is no longer considered a match.[4]

Using trigrams for accelerating searches is a technique used in some systems for code searching, in situations in which queries that are regular expressions may be useful,[5][2][7] in search engines such as Elasticsearch,[8] as well as in databases such as PostgreSQL.[4]

Examples

Consider the string "alice". The trigrams of the string would be "ali", "lic", and "ice," not including spaces.[5] Searching for this string in a database with a trigram-based index would involve finding which objects contain as many of the three trigrams as possible.

As a concrete example of using trigram search to search for a regular expression query, consider searching for the string ab[cd]e, where the brackets denote that the third character in the string being searched for could be c or d. In this situation, one could query the index for objects that have the two trigrams abc and bce or the two trigrams abd and bde. Thus, finding this query would involve no string matching, and could just query the index directly, which can be faster in practice.[2]

See also

References

  1. ^ Hardarson, Omar (1997). "Interactive coding of economic activity using trigram search in BLAISE III" (PDF). International Blaise User Group. Note: This article discusses trigram search as a way of efficiently encoding certain kinds of economic data, and finds that this technique is particularly useful when the users of the system don't have a lot of context for the structure of the data.
  2. ^ a b c Cox, Russ (January 2012). "Regular Expression Matching with a Trigram Index or How Google Code Search Worked".
  3. ^ Adams, Elizabeth; Meltzer, Arnold (1 March 1993). "Trigrams as index element in full text retrieval: observations and experimental results". CSC '93: Proceedings of the 1993 ACM Conference on Computer Science. doi:10.1145/170791.170891. S2CID 16701550.
  4. ^ a b c "F.33. pg_trgm". PostgreSQL Documentation. 2022-05-12. Retrieved 2022-05-28.
  5. ^ a b c "Fast Search Using PostgreSQL Trigram Text Indexes". GitLab. 2016-03-18. Retrieved 2022-05-28.
  6. ^ Zobel, Justin; Moffat, Alistair; Sacks-Davis, Ron (1993). "Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files" (PDF). Conference on Very Large Databases (VLDB). Note: This research paper does not use the term "trigram search" but does seem to be the first instance in the literature of using n-grams as indices, and is cited in the Russ Cox article as the first instance of the structure of a reverse index based on trigrams. The paper also cited successful performance results from using this style of search.
  7. ^ "Big Grep". resources.sei.cmu.edu. Retrieved 2022-06-12. Also called BigGrep. Uses n-grams (not always 3),
  8. ^ "N-gram tokenizer | Elasticsearch Guide [8.2] | Elastic". www.elastic.co. Retrieved 2022-05-28.

trigram, search, method, searching, text, when, exact, syntax, spelling, target, object, precisely, known, when, queries, regular, expressions, finds, objects, which, match, maximum, number, three, consecutive, character, strings, trigrams, entered, search, te. Trigram search is a method of searching for text when the exact syntax or spelling of the target object is not precisely known 1 or when queries may be regular expressions 2 It finds objects which match the maximum number of three consecutive character strings i e trigrams in the entered search terms which are generally near matches 3 Two strings with many shared trigrams can be expected to be very similar 4 Trigrams also allow for efficiently creating indexes for searches that are regular expressions or match the text inexactly Indexes can significantly accelerate searches 5 6 A threshold for number of trigram matches can be specified as a cutoff point after which a result is no longer considered a match 4 Using trigrams for accelerating searches is a technique used in some systems for code searching in situations in which queries that are regular expressions may be useful 5 2 7 in search engines such as Elasticsearch 8 as well as in databases such as PostgreSQL 4 Examples EditConsider the string alice The trigrams of the string would be ali lic and ice not including spaces 5 Searching for this string in a database with a trigram based index would involve finding which objects contain as many of the three trigrams as possible As a concrete example of using trigram search to search for a regular expression query consider searching for the string ab cd e where the brackets denote that the third character in the string being searched for could be c or d In this situation one could query the index for objects that have the two trigrams abc and bce or the two trigrams abd and bde Thus finding this query would involve no string matching and could just query the index directly which can be faster in practice 2 See also EditSearch engine indexing Approximate string matching Trigram N gram Regular expression Google Code SearchReferences Edit Hardarson Omar 1997 Interactive coding of economic activity using trigram search in BLAISE III PDF International Blaise User Group Note This article discusses trigram search as a way of efficiently encoding certain kinds of economic data and finds that this technique is particularly useful when the users of the system don t have a lot of context for the structure of the data a b c Cox Russ January 2012 Regular Expression Matching with a Trigram Index or How Google Code Search Worked Adams Elizabeth Meltzer Arnold 1 March 1993 Trigrams as index element in full text retrieval observations and experimental results CSC 93 Proceedings of the 1993 ACM Conference on Computer Science doi 10 1145 170791 170891 S2CID 16701550 a b c F 33 pg trgm PostgreSQL Documentation 2022 05 12 Retrieved 2022 05 28 a b c Fast Search Using PostgreSQL Trigram Text Indexes GitLab 2016 03 18 Retrieved 2022 05 28 Zobel Justin Moffat Alistair Sacks Davis Ron 1993 Searching Large Lexicons for Partially Specified Terms using Compressed Inverted Files PDF Conference on Very Large Databases VLDB Note This research paper does not use the term trigram search but does seem to be the first instance in the literature of using n grams as indices and is cited in the Russ Cox article as the first instance of the structure of a reverse index based on trigrams The paper also cited successful performance results from using this style of search Big Grep resources sei cmu edu Retrieved 2022 06 12 Also called BigGrep Uses n grams not always 3 N gram tokenizer Elasticsearch Guide 8 2 Elastic www elastic co Retrieved 2022 05 28 Retrieved from https en wikipedia org w index php title Trigram search amp oldid 1136417568, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.