fbpx
Wikipedia

Optimal matching

Optimal matching is a sequence analysis method used in social science, to assess the dissimilarity of ordered arrays of tokens that usually represent a time-ordered sequence of socio-economic states two individuals have experienced. Once such distances have been calculated for a set of observations (e.g. individuals in a cohort) classical tools (such as cluster analysis) can be used. The method was tailored to social sciences[1] from a technique originally introduced to study molecular biology (protein or genetic) sequences (see sequence alignment). Optimal matching uses the Needleman-Wunsch algorithm.

Algorithm edit

Let   be a sequence of states   belonging to a finite set of possible states. Let us denote   the sequence space, i.e. the set of all possible sequences of states.

Optimal matching algorithms work by defining simple operator algebras that manipulate sequences, i.e. a set of operators  . In the most simple approach, a set composed of only three basic operations to transform sequences is used:

  • one state   is inserted in the sequence  
  • one state is deleted from the sequence   and
  • a state   is replaced (substituted) by state  ,  .

Imagine now that a cost   is associated to each operator. Given two sequences   and  , the idea is to measure the cost of obtaining   from   using operators from the algebra. Let   be a sequence of operators such that the application of all the operators of this sequence   to the first sequence   gives the second sequence  :   where   denotes the compound operator. To this set we associate the cost  , that represents the total cost of the transformation. One should consider at this point that there might exist different such sequences   that transform   into  ; a reasonable choice is to select the cheapest of such sequences. We thus call distance
 
that is, the cost of the least expensive set of transformations that turn   into  . Notice that   is by definition nonnegative since it is the sum of positive costs, and trivially   if and only if  , that is there is no cost. The distance function is symmetric if insertion and deletion costs are equal  ; the term indel cost usually refers to the common cost of insertion and deletion.

Considering a set composed of only the three basic operations described above, this proximity measure satisfies the triangular inequality. Transitivity however, depends on the definition of the set of elementary operations.

Criticism edit

Although optimal matching techniques are widely used in sociology and demography, such techniques also have their flaws. As was pointed out by several authors (for example L. L. Wu[2]), the main problem in the application of optimal matching is to appropriately define the costs  .

Software edit

  • TDA is a powerful program, offering access to some of the latest developments in transition data analysis.
  • STATA has implemented a package to run optimal matching analysis.
  • TraMineR is an open source R-package for analyzing and visualizing states and events sequences, including optimal matching analysis.

References and notes edit

  1. ^ A. Abbott and A. Tsay, (2000) Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect Sociological Methods & Research], Vol. 29, 3-33. doi:10.1177/0049124100029001001
  2. ^ L. L. Wu. (2000) Some Comments on "Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect" 2006-10-24 at the Wayback Machine Sociological Methods & Research, 29 41-64. doi:10.1177/0049124100029001003

optimal, matching, confused, with, maximum, matching, graph, theory, statistical, problem, finding, optimal, match, causal, inference, sequence, analysis, method, used, social, science, assess, dissimilarity, ordered, arrays, tokens, that, usually, represent, . Not to be confused with maximum matching in graph theory or the statistical problem of finding an optimal match for causal inference Optimal matching is a sequence analysis method used in social science to assess the dissimilarity of ordered arrays of tokens that usually represent a time ordered sequence of socio economic states two individuals have experienced Once such distances have been calculated for a set of observations e g individuals in a cohort classical tools such as cluster analysis can be used The method was tailored to social sciences 1 from a technique originally introduced to study molecular biology protein or genetic sequences see sequence alignment Optimal matching uses the Needleman Wunsch algorithm Contents 1 Algorithm 2 Criticism 3 Software 4 References and notesAlgorithm editLet S s1 s2 s3 sT displaystyle S s 1 s 2 s 3 ldots s T nbsp be a sequence of states si displaystyle s i nbsp belonging to a finite set of possible states Let us denote S displaystyle mathbf S nbsp the sequence space i e the set of all possible sequences of states Optimal matching algorithms work by defining simple operator algebras that manipulate sequences i e a set of operators ai S S displaystyle a i mathbf S rightarrow mathbf S nbsp In the most simple approach a set composed of only three basic operations to transform sequences is used one state s displaystyle s nbsp is inserted in the sequence as Ins s1 s2 s3 sT s1 s2 s3 s sT displaystyle a s rm Ins s 1 s 2 s 3 ldots s T s 1 s 2 s 3 ldots s ldots s T nbsp one state is deleted from the sequence as2Del s1 s2 s3 sT s1 s3 sT displaystyle a s 2 rm Del s 1 s 2 s 3 ldots s T s 1 s 3 ldots s T nbsp and a state s1 displaystyle s 1 nbsp is replaced substituted by state s1 displaystyle s 1 nbsp as1 s1 Sub s1 s2 s3 sT s1 s2 s3 sT displaystyle a s 1 s 1 rm Sub s 1 s 2 s 3 ldots s T s 1 s 2 s 3 ldots s T nbsp Imagine now that a cost c ai R0 displaystyle c a i in mathbf R 0 nbsp is associated to each operator Given two sequences S1 displaystyle S 1 nbsp and S2 displaystyle S 2 nbsp the idea is to measure the cost of obtaining S2 displaystyle S 2 nbsp from S1 displaystyle S 1 nbsp using operators from the algebra Let A a1 a2 an displaystyle A a 1 a 2 ldots a n nbsp be a sequence of operators such that the application of all the operators of this sequence A displaystyle A nbsp to the first sequence S1 displaystyle S 1 nbsp gives the second sequence S2 displaystyle S 2 nbsp S2 a1 a2 an S1 displaystyle S 2 a 1 circ a 2 circ ldots circ a n S 1 nbsp where a1 a2 displaystyle a 1 circ a 2 nbsp denotes the compound operator To this set we associate the cost c A i 1nc ai displaystyle c A sum i 1 n c a i nbsp that represents the total cost of the transformation One should consider at this point that there might exist different such sequences A displaystyle A nbsp that transform S1 displaystyle S 1 nbsp into S2 displaystyle S 2 nbsp a reasonable choice is to select the cheapest of such sequences We thus call distance d S1 S2 minA c A such that S2 A S1 displaystyle d S 1 S 2 min A left c A rm such that S 2 A S 1 right nbsp that is the cost of the least expensive set of transformations that turn S1 displaystyle S 1 nbsp into S2 displaystyle S 2 nbsp Notice that d S1 S2 displaystyle d S 1 S 2 nbsp is by definition nonnegative since it is the sum of positive costs and trivially d S1 S2 0 displaystyle d S 1 S 2 0 nbsp if and only if S1 S2 displaystyle S 1 S 2 nbsp that is there is no cost The distance function is symmetric if insertion and deletion costs are equal c aIns c aDel displaystyle c a rm Ins c a rm Del nbsp the term indel cost usually refers to the common cost of insertion and deletion Considering a set composed of only the three basic operations described above this proximity measure satisfies the triangular inequality Transitivity however depends on the definition of the set of elementary operations Criticism editAlthough optimal matching techniques are widely used in sociology and demography such techniques also have their flaws As was pointed out by several authors for example L L Wu 2 the main problem in the application of optimal matching is to appropriately define the costs c ai displaystyle c a i nbsp Software editTDA is a powerful program offering access to some of the latest developments in transition data analysis STATA has implemented a package to run optimal matching analysis TraMineR is an open source R package for analyzing and visualizing states and events sequences including optimal matching analysis References and notes edit A Abbott and A Tsay 2000 Sequence Analysis and Optimal Matching Methods in Sociology Review and Prospect Sociological Methods amp Research Vol 29 3 33 doi 10 1177 0049124100029001001 L L Wu 2000 Some Comments on Sequence Analysis and Optimal Matching Methods in Sociology Review and Prospect Archived 2006 10 24 at the Wayback Machine Sociological Methods amp Research 29 41 64 doi 10 1177 0049124100029001003 Retrieved from https en wikipedia org w index php title Optimal matching amp oldid 1122251933, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.