fbpx
Wikipedia

String-to-string correction problem

In computer science, the string-to-string correction problem refers to determining the minimum cost sequence of edit operations necessary to change one string into another (i.e., computing the shortest edit distance). Each type of edit operation has its own cost value.[1] A single edit operation may be changing a single symbol of the string into another (cost WC), deleting a symbol (cost WD), or inserting a new symbol (cost WI).[2]

If all edit operations have the same unit costs (WC = WD = WI = 1) the problem is the same as computing the Levenshtein distance of two strings.

Several algorithms exist to provide an efficient way to determine string distance and specify the minimum number of transformation operations required.[3][4] Such algorithms are particularly useful for delta creation operations where something is stored as a set of differences relative to a base version. This allows several versions of a single object to be stored much more efficiently than storing them separately. This holds true even for single versions of several objects if they do not differ greatly, or anything in between. Notably, such difference algorithms are used in molecular biology to provide some measure of kinship between different kinds of organisms based on the similarities of their macromolecules (such as proteins or DNA).

Extension edit

The extended variant of the problem includes a new type of edit operation: swapping any two adjacent symbols, with a cost of WS.

This version can be solved in polynomial time under certain restrictions on edit operation costs.[2][5]

Robert A. Wagner (1975) showed that the general problem is NP-complete. In particular, he proved that when WI < WC = WD = ∞ and 0 < WS < ∞ (or equivalently, changing and deletion are not permitted), the problem is NP-complete.[5]

References edit

  1. ^ Wagner, Robert A.; Fischer, Michael J. (1974). "The String-to-String Correction Problem". Journal of the ACM. 21 (1): 168–173. doi:10.1145/321796.321811. S2CID 13381535.
  2. ^ a b Lowrance, Roy; Wagner, Robert A. (April 1975). "An Extension of the String-to-String Correction Problem". Journal of the ACM. 22 (2): 177–183. doi:10.1145/321879.321880. S2CID 18892193.
  3. ^ Edit distance#Computation
  4. ^ Tichy, Walter F. (1984). "The string-to-string correction problem with block moves". ACM Transactions on Computer Systems. 2 (4): 309–321. doi:10.1145/357401.357404. S2CID 14034845.
  5. ^ a b Wagner, Robert A. (May 1975). "On the complexity of the Extended String-to-String Correction Problem". Proceedings of seventh annual ACM symposium on Theory of computing - STOC '75. pp. 218–223. doi:10.1145/800116.803771. ISBN 9781450374194. S2CID 18705107.

string, string, correction, problem, computer, science, string, string, correction, problem, refers, determining, minimum, cost, sequence, edit, operations, necessary, change, string, into, another, computing, shortest, edit, distance, each, type, edit, operat. In computer science the string to string correction problem refers to determining the minimum cost sequence of edit operations necessary to change one string into another i e computing the shortest edit distance Each type of edit operation has its own cost value 1 A single edit operation may be changing a single symbol of the string into another cost WC deleting a symbol cost WD or inserting a new symbol cost WI 2 If all edit operations have the same unit costs WC WD WI 1 the problem is the same as computing the Levenshtein distance of two strings Several algorithms exist to provide an efficient way to determine string distance and specify the minimum number of transformation operations required 3 4 Such algorithms are particularly useful for delta creation operations where something is stored as a set of differences relative to a base version This allows several versions of a single object to be stored much more efficiently than storing them separately This holds true even for single versions of several objects if they do not differ greatly or anything in between Notably such difference algorithms are used in molecular biology to provide some measure of kinship between different kinds of organisms based on the similarities of their macromolecules such as proteins or DNA Extension editThe extended variant of the problem includes a new type of edit operation swapping any two adjacent symbols with a cost of WS This version can be solved in polynomial time under certain restrictions on edit operation costs 2 5 Robert A Wagner 1975 showed that the general problem is NP complete In particular he proved that when WI lt WC WD and 0 lt WS lt or equivalently changing and deletion are not permitted the problem is NP complete 5 References edit Wagner Robert A Fischer Michael J 1974 The String to String Correction Problem Journal of the ACM 21 1 168 173 doi 10 1145 321796 321811 S2CID 13381535 a b Lowrance Roy Wagner Robert A April 1975 An Extension of the String to String Correction Problem Journal of the ACM 22 2 177 183 doi 10 1145 321879 321880 S2CID 18892193 Edit distance Computation Tichy Walter F 1984 The string to string correction problem with block moves ACM Transactions on Computer Systems 2 4 309 321 doi 10 1145 357401 357404 S2CID 14034845 a b Wagner Robert A May 1975 On the complexity of the Extended String to String Correction Problem Proceedings of seventh annual ACM symposium on Theory of computing STOC 75 pp 218 223 doi 10 1145 800116 803771 ISBN 9781450374194 S2CID 18705107 Retrieved from https en wikipedia org w index php title String to string correction problem amp oldid 1170418673, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.