fbpx
Wikipedia

Set cover problem

The set cover problem is a classical question in combinatorics, computer science, operations research, and complexity theory.

Example of an instance of set cover problem.

Given a set of elements {1, 2, …, n} (called the universe) and a collection S of m subsets whose union equals the universe, the set cover problem is to identify the smallest sub-collection of S whose union equals the universe. For example, consider the universe U = {1, 2, 3, 4, 5} and the collection of sets S = { {1, 2, 3}, {2, 4}, {3, 4}, {4, 5} }. Clearly the union of S is U. However, we can cover all elements with only two sets: { {1, 2, 3}, {4, 5} }, see picture. Therefore, the solution to the set cover problem has size 2.

More formally, given a universe and a family of subsets of , a set cover is a subfamily of sets whose union is .

  • In the set cover decision problem, the input is a pair and an integer ; the question is whether there is a set cover of size or less.
  • In the set cover optimization problem, the input is a pair , and the task is to find a set cover that uses the fewest sets.

The decision version of set covering is NP-complete. It is one of Karp's 21 NP-complete problems shown to be NP-complete in 1972. The optimization/search version of set cover is NP-hard.[1] It is a problem "whose study has led to the development of fundamental techniques for the entire field" of approximation algorithms.[2]

Variants edit

In the weighted set cover problem, each set is assigned a positive weight (representing its cost), and the goal is to find a set cover with a smallest weight. The usual (unweighted) set cover corresponds to all sets having a weight of 1.

In the fractional set cover problem, it is allowed to select fractions of sets, rather than entire sets. A fractional set cover is an assignment of a fraction (a number in [0,1]) to each set in  , such that for each element x in the universe, the sum of fractions of sets that contain x is at least 1. The goal is to find a fractional set cover in which the sum of fractions is as small as possible. Note that a (usual) set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1; therefore, the size of the smallest fractional cover is at most the size of the smallest cover, but may be smaller. For example, consider the universe U = {1, 2, 3} and the collection of sets S = { {1, 2}, {2, 3}, {3, 1} }. The smallest set cover has a size of 2, e.g. { {1, 2}, {2, 3} }. But there is a fractional set cover of size 1.5, in which a 0.5 fraction of each set is taken.

Linear program formulation edit

The set cover problem can be formulated as the following integer linear program (ILP).[3]

minimize   (minimize the number of sets)
subject to   for all   (cover every element of the universe)
  for all  . (every set is either in the set cover or not)

For a more compact representation of the covering constraint, one can define an incidence matrix  , where each row corresponds to an element and each column corresponds to a set, and   if element e is in set s, and   otherwise. Then, the covering constraint can be written as  .

Weighted set cover is described by a program identical to the one given above, except that the objective function to minimize is  , where   is the weight of set  .

Fractional set cover is described by a program identical to the one given above, except that   can be non-integer, so the last constraint is replaced by  .

This linear program belongs to the more general class of LPs for covering problems, as all the coefficients in the objective function and both sides of the constraints are non-negative. The integrality gap of the ILP is at most   (where   is the size of the universe). It has been shown that its relaxation indeed gives a factor-  approximation algorithm for the minimum set cover problem.[4] See randomized rounding#setcover for a detailed explanation.

Hitting set formulation edit

Set covering is equivalent to the hitting set problem. That is seen by observing that an instance of set covering can be viewed as an arbitrary bipartite graph, with the universe represented by vertices on the left, the sets represented by vertices on the right, and edges representing the membership of elements to sets. The task is then to find a minimum cardinality subset of left-vertices that has a non-trivial intersection with each of the right-vertices, which is precisely the hitting set problem.

In the field of computational geometry, a hitting set for a collection of geometrical objects is also called a stabbing set or piercing set.[5]

Greedy algorithm edit

There is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule: at each stage, choose the set that contains the largest number of uncovered elements. This method can be implemented in time linear in the sum of sizes of the input sets, using a bucket queue to prioritize the sets.[6] It achieves an approximation ratio of  , where   is the size of the set to be covered.[7] In other words, it finds a covering that may be   times as large as the minimum one, where   is the  -th harmonic number:

 

This greedy algorithm actually achieves an approximation ratio of   where   is the maximum cardinality set of  . For  dense instances, however, there exists a  -approximation algorithm for every  .[8]

 
Tight example for the greedy algorithm with k=3

There is a standard example on which the greedy algorithm achieves an approximation ratio of  . The universe consists of   elements. The set system consists of   pairwise disjoint sets   with sizes   respectively, as well as two additional disjoint sets  , each of which contains half of the elements from each  . On this input, the greedy algorithm takes the sets  , in that order, while the optimal solution consists only of   and  . An example of such an input for   is pictured on the right.

Inapproximability results show that the greedy algorithm is essentially the best-possible polynomial time approximation algorithm for set cover up to lower order terms (see Inapproximability results below), under plausible complexity assumptions. A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly  .[9]

Low-frequency systems edit

If each element occurs in at most f sets, then a solution can be found in polynomial time that approximates the optimum to within a factor of f using LP relaxation.

If the constraint   is replaced by   for all S in   in the integer linear program shown above, then it becomes a (non-integer) linear program L. The algorithm can be described as follows:

  1. Find an optimal solution O for the program L using some polynomial-time method of solving linear programs.
  2. Pick all sets S for which the corresponding variable xS has value at least 1/f in the solution O.[10]

Inapproximability results edit

When   refers to the size of the universe, Lund & Yannakakis (1994) showed that set covering cannot be approximated in polynomial time to within a factor of  , unless NP has quasi-polynomial time algorithms. Feige (1998) improved this lower bound to   under the same assumptions, which essentially matches the approximation ratio achieved by the greedy algorithm. Raz & Safra (1997) established a lower bound of  , where   is a certain constant, under the weaker assumption that P NP. A similar result with a higher value of   was recently proved by Alon, Moshkovitz & Safra (2006). Dinur & Steurer (2013) showed optimal inapproximability by proving that it cannot be approximated to   unless P NP.

Weighted set cover edit

Relaxing the integer linear program for weighted set cover stated above, one may use randomized rounding to get an  -factor approximation. Non weighted set cover can be adapted to the weighted case.[11]

Fractional set cover edit

Related problems edit

  • Hitting set is an equivalent reformulation of Set Cover.
  • Vertex cover is a special case of Hitting Set.
  • Edge cover is a special case of Set Cover.
  • Geometric set cover is a special case of Set Cover when the universe is a set of points in   and the sets are induced by the intersection of the universe and geometric shapes (e.g., disks, rectangles).
  • Set packing
  • Maximum coverage problem is to choose at most k sets to cover as many elements as possible.
  • Dominating set is the problem of selecting a set of vertices (the dominating set) in a graph such that all other vertices are adjacent to at least one vertex in the dominating set. The Dominating set problem was shown to be NP complete through a reduction from Set cover.
  • Exact cover problem is to choose a set cover with no element included in more than one covering set.
  • Red Blue Set Cover.[12]
  • Set-cover abduction.
  • Monotone dualization is a computational problem equivalent to either listing all minimal hitting sets or listing all minimal set covers of a given set family.[13]

Notes edit

  1. ^ Korte & Vygen 2012, p. 414.
  2. ^ Vazirani (2001, p. 15)
  3. ^ Vazirani (2001, p. 108)
  4. ^ Vazirani (2001, pp. 110–112)
  5. ^ Nielsen, Frank (2000-09-06). "Fast stabbing of boxes in high dimensions" (PDF). Theoretical Computer Science. 246 (1): 53–72. doi:10.1016/S0304-3975(98)00336-3. ISSN 0304-3975.
  6. ^ Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990], "Exercise 35.3-3", Introduction to Algorithms (3rd ed.), MIT Press and McGraw-Hill, p. 1122, ISBN 0-262-03384-4
  7. ^ Chvatal, V. (August 1979), "A Greedy Heuristic for the Set-Covering Problem", Mathematics of Operations Research, 4 (3): 233–235, doi:10.1287/moor.4.3.233, JSTOR 3689577
  8. ^ Karpinski & Zelikovsky 1998
  9. ^ Slavík Petr A tight analysis of the greedy algorithm for set cover. STOC'96, Pages 435-441, doi:10.1145/237814.237991
  10. ^ Vazirani (2001, pp. 118–119)
  11. ^ Vazirani (2001, Chapter 14)
  12. ^ Information., Sandia National Laboratories. United States. Department of Energy. United States. Department of Energy. Office of Scientific and Technical (1999). On the Red-Blue Set Cover Problem. United States. Dept. of Energy. OCLC 68396743.
  13. ^ Gainer-Dewar, Andrew; Vera-Licona, Paola (2017), "The minimal hitting set generation problem: algorithms and computation", SIAM Journal on Discrete Mathematics, 31 (1): 63–100, arXiv:1601.02939, doi:10.1137/15M1055024, MR 3590650, S2CID 9240467

References edit

External links edit

  • Benchmarks with Hidden Optimum Solutions for Set Covering, Set Packing and Winner Determination
  • A compendium of NP optimization problems - Minimum Set Cover

cover, problem, cover, problem, classical, question, combinatorics, computer, science, operations, research, complexity, theory, example, instance, cover, problem, given, elements, called, universe, collection, subsets, whose, union, equals, universe, cover, p. The set cover problem is a classical question in combinatorics computer science operations research and complexity theory Example of an instance of set cover problem Given a set of elements 1 2 n called the universe and a collection S of m subsets whose union equals the universe the set cover problem is to identify the smallest sub collection of S whose union equals the universe For example consider the universe U 1 2 3 4 5 and the collection of sets S 1 2 3 2 4 3 4 4 5 Clearly the union of S is U However we can cover all elements with only two sets 1 2 3 4 5 see picture Therefore the solution to the set cover problem has size 2 More formally given a universe U displaystyle mathcal U and a family S displaystyle mathcal S of subsets of U displaystyle mathcal U a set cover is a subfamily C S displaystyle mathcal C subseteq mathcal S of sets whose union is U displaystyle mathcal U In the set cover decision problem the input is a pair U S displaystyle mathcal U mathcal S and an integer k displaystyle k the question is whether there is a set cover of size k displaystyle k or less In the set cover optimization problem the input is a pair U S displaystyle mathcal U mathcal S and the task is to find a set cover that uses the fewest sets The decision version of set covering is NP complete It is one of Karp s 21 NP complete problems shown to be NP complete in 1972 The optimization search version of set cover is NP hard 1 It is a problem whose study has led to the development of fundamental techniques for the entire field of approximation algorithms 2 Contents 1 Variants 2 Linear program formulation 3 Hitting set formulation 4 Greedy algorithm 5 Low frequency systems 6 Inapproximability results 7 Weighted set cover 8 Fractional set cover 9 Related problems 10 Notes 11 References 12 External linksVariants editIn the weighted set cover problem each set is assigned a positive weight representing its cost and the goal is to find a set cover with a smallest weight The usual unweighted set cover corresponds to all sets having a weight of 1 In the fractional set cover problem it is allowed to select fractions of sets rather than entire sets A fractional set cover is an assignment of a fraction a number in 0 1 to each set in S displaystyle mathcal S nbsp such that for each element x in the universe the sum of fractions of sets that contain x is at least 1 The goal is to find a fractional set cover in which the sum of fractions is as small as possible Note that a usual set cover is equivalent to a fractional set cover in which all fractions are either 0 or 1 therefore the size of the smallest fractional cover is at most the size of the smallest cover but may be smaller For example consider the universe U 1 2 3 and the collection of sets S 1 2 2 3 3 1 The smallest set cover has a size of 2 e g 1 2 2 3 But there is a fractional set cover of size 1 5 in which a 0 5 fraction of each set is taken Linear program formulation editThe set cover problem can be formulated as the following integer linear program ILP 3 minimize s S x s displaystyle sum s in mathcal S x s nbsp minimize the number of sets subject to s e s x s 1 displaystyle sum s colon e in s x s geqslant 1 nbsp for all e U displaystyle e in mathcal U nbsp cover every element of the universe x s 0 1 displaystyle x s in 0 1 nbsp for all s S displaystyle s in mathcal S nbsp every set is either in the set cover or not For a more compact representation of the covering constraint one can define an incidence matrix A displaystyle A nbsp where each row corresponds to an element and each column corresponds to a set and A e s 1 displaystyle A e s 1 nbsp if element e is in set s and A e s 0 displaystyle A e s 0 nbsp otherwise Then the covering constraint can be written as A x 1 displaystyle Ax geqslant 1 nbsp Weighted set cover is described by a program identical to the one given above except that the objective function to minimize is s S w s x s displaystyle sum s in mathcal S w s x s nbsp where w s displaystyle w s nbsp is the weight of set s S displaystyle s in mathcal S nbsp Fractional set cover is described by a program identical to the one given above except that x s displaystyle x s nbsp can be non integer so the last constraint is replaced by 0 x s 1 displaystyle 0 leq x s leq 1 nbsp This linear program belongs to the more general class of LPs for covering problems as all the coefficients in the objective function and both sides of the constraints are non negative The integrality gap of the ILP is at most log n displaystyle scriptstyle log n nbsp where n displaystyle scriptstyle n nbsp is the size of the universe It has been shown that its relaxation indeed gives a factor log n displaystyle scriptstyle log n nbsp approximation algorithm for the minimum set cover problem 4 See randomized rounding setcover for a detailed explanation Hitting set formulation editSet covering is equivalent to the hitting set problem That is seen by observing that an instance of set covering can be viewed as an arbitrary bipartite graph with the universe represented by vertices on the left the sets represented by vertices on the right and edges representing the membership of elements to sets The task is then to find a minimum cardinality subset of left vertices that has a non trivial intersection with each of the right vertices which is precisely the hitting set problem In the field of computational geometry a hitting set for a collection of geometrical objects is also called a stabbing set or piercing set 5 Greedy algorithm editThere is a greedy algorithm for polynomial time approximation of set covering that chooses sets according to one rule at each stage choose the set that contains the largest number of uncovered elements This method can be implemented in time linear in the sum of sizes of the input sets using a bucket queue to prioritize the sets 6 It achieves an approximation ratio of H s displaystyle H s nbsp where s displaystyle s nbsp is the size of the set to be covered 7 In other words it finds a covering that may be H n displaystyle H n nbsp times as large as the minimum one where H n displaystyle H n nbsp is the n displaystyle n nbsp th harmonic number H n k 1 n 1 k ln n 1 displaystyle H n sum k 1 n frac 1 k leq ln n 1 nbsp This greedy algorithm actually achieves an approximation ratio of H s displaystyle H s prime nbsp where s displaystyle s prime nbsp is the maximum cardinality set of S displaystyle S nbsp For d displaystyle delta nbsp dense instances however there exists a c ln m displaystyle c ln m nbsp approximation algorithm for every c gt 0 displaystyle c gt 0 nbsp 8 nbsp Tight example for the greedy algorithm with k 3There is a standard example on which the greedy algorithm achieves an approximation ratio of log 2 n 2 displaystyle log 2 n 2 nbsp The universe consists of n 2 k 1 2 displaystyle n 2 k 1 2 nbsp elements The set system consists of k displaystyle k nbsp pairwise disjoint sets S 1 S k displaystyle S 1 ldots S k nbsp with sizes 2 4 8 2 k displaystyle 2 4 8 ldots 2 k nbsp respectively as well as two additional disjoint sets T 0 T 1 displaystyle T 0 T 1 nbsp each of which contains half of the elements from each S i displaystyle S i nbsp On this input the greedy algorithm takes the sets S k S 1 displaystyle S k ldots S 1 nbsp in that order while the optimal solution consists only of T 0 displaystyle T 0 nbsp and T 1 displaystyle T 1 nbsp An example of such an input for k 3 displaystyle k 3 nbsp is pictured on the right Inapproximability results show that the greedy algorithm is essentially the best possible polynomial time approximation algorithm for set cover up to lower order terms see Inapproximability results below under plausible complexity assumptions A tighter analysis for the greedy algorithm shows that the approximation ratio is exactly ln n ln ln n 8 1 displaystyle ln n ln ln n Theta 1 nbsp 9 Low frequency systems editIf each element occurs in at most f sets then a solution can be found in polynomial time that approximates the optimum to within a factor of f using LP relaxation If the constraint x S 0 1 displaystyle x S in 0 1 nbsp is replaced by x S 0 displaystyle x S geq 0 nbsp for all S in S displaystyle mathcal S nbsp in the integer linear program shown above then it becomes a non integer linear program L The algorithm can be described as follows Find an optimal solution O for the program L using some polynomial time method of solving linear programs Pick all sets S for which the corresponding variable xS has value at least 1 f in the solution O 10 Inapproximability results editWhen n displaystyle n nbsp refers to the size of the universe Lund amp Yannakakis 1994 showed that set covering cannot be approximated in polynomial time to within a factor of 1 2 log 2 n 0 72 ln n displaystyle tfrac 1 2 log 2 n approx 0 72 ln n nbsp unless NP has quasi polynomial time algorithms Feige 1998 improved this lower bound to 1 o 1 ln n displaystyle bigl 1 o 1 bigr cdot ln n nbsp under the same assumptions which essentially matches the approximation ratio achieved by the greedy algorithm Raz amp Safra 1997 established a lower bound of c ln n displaystyle c cdot ln n nbsp where c displaystyle c nbsp is a certain constant under the weaker assumption that P displaystyle not nbsp NP A similar result with a higher value of c displaystyle c nbsp was recently proved by Alon Moshkovitz amp Safra 2006 Dinur amp Steurer 2013 showed optimal inapproximability by proving that it cannot be approximated to 1 o 1 ln n displaystyle bigl 1 o 1 bigr cdot ln n nbsp unless P displaystyle nbsp NP Weighted set cover editThis section needs expansion You can help by adding to it November 2017 Relaxing the integer linear program for weighted set cover stated above one may use randomized rounding to get an O log n displaystyle O log n nbsp factor approximation Non weighted set cover can be adapted to the weighted case 11 Fractional set cover editThis section needs expansion You can help by adding to it September 2023 Related problems editHitting set is an equivalent reformulation of Set Cover Vertex cover is a special case of Hitting Set Edge cover is a special case of Set Cover Geometric set cover is a special case of Set Cover when the universe is a set of points in R d displaystyle mathbb R d nbsp and the sets are induced by the intersection of the universe and geometric shapes e g disks rectangles Set packing Maximum coverage problem is to choose at most k sets to cover as many elements as possible Dominating set is the problem of selecting a set of vertices the dominating set in a graph such that all other vertices are adjacent to at least one vertex in the dominating set The Dominating set problem was shown to be NP complete through a reduction from Set cover Exact cover problem is to choose a set cover with no element included in more than one covering set Red Blue Set Cover 12 Set cover abduction Monotone dualization is a computational problem equivalent to either listing all minimal hitting sets or listing all minimal set covers of a given set family 13 Notes edit Korte amp Vygen 2012 p 414 Vazirani 2001 p 15 Vazirani 2001 p 108 Vazirani 2001 pp 110 112 Nielsen Frank 2000 09 06 Fast stabbing of boxes in high dimensions PDF Theoretical Computer Science 246 1 53 72 doi 10 1016 S0304 3975 98 00336 3 ISSN 0304 3975 Cormen Thomas H Leiserson Charles E Rivest Ronald L Stein Clifford 2009 1990 Exercise 35 3 3 Introduction to Algorithms 3rd ed MIT Press and McGraw Hill p 1122 ISBN 0 262 03384 4 Chvatal V August 1979 A Greedy Heuristic for the Set Covering Problem Mathematics of Operations Research 4 3 233 235 doi 10 1287 moor 4 3 233 JSTOR 3689577 Karpinski amp Zelikovsky 1998 Slavik Petr A tight analysis of the greedy algorithm for set cover STOC 96 Pages 435 441 doi 10 1145 237814 237991 Vazirani 2001 pp 118 119 Vazirani 2001 Chapter 14 Information Sandia National Laboratories United States Department of Energy United States Department of Energy Office of Scientific and Technical 1999 On the Red Blue Set Cover Problem United States Dept of Energy OCLC 68396743 Gainer Dewar Andrew Vera Licona Paola 2017 The minimal hitting set generation problem algorithms and computation SIAM Journal on Discrete Mathematics 31 1 63 100 arXiv 1601 02939 doi 10 1137 15M1055024 MR 3590650 S2CID 9240467References editAlon Noga Moshkovitz Dana Safra Shmuel 2006 Algorithmic construction of sets for k restrictions ACM Trans Algorithms 2 2 153 177 CiteSeerX 10 1 1 138 8682 doi 10 1145 1150334 1150336 ISSN 1549 6325 S2CID 11922650 Cormen Thomas H Leiserson Charles E Rivest Ronald L Stein Clifford 2001 Introduction to Algorithms Cambridge Mass MIT Press and McGraw Hill pp 1033 1038 ISBN 978 0 262 03293 3 Feige Uriel 1998 A threshold of ln n for approximating set cover Journal of the ACM 45 4 634 652 CiteSeerX 10 1 1 70 5014 doi 10 1145 285055 285059 ISSN 0004 5411 S2CID 52827488 Karpinski Marek Zelikovsky Alexander 1998 Approximating dense cases of covering problems Proceedings of the DIMACS Workshop on Network Design Connectivity and Facilities Location vol 40 American Mathematical Society pp 169 178 ISBN 9780821870846 Lund Carsten Yannakakis Mihalis 1994 On the hardness of approximating minimization problems Journal of the ACM 41 5 960 981 doi 10 1145 185675 306789 ISSN 0004 5411 S2CID 9021065 Raz Ran Safra Shmuel 1997 A sub constant error probability low degree test and a sub constant error probability PCP characterization of NP STOC 97 Proceedings of the twenty ninth annual ACM symposium on Theory of computing ACM pp 475 484 ISBN 978 0 89791 888 6 Dinur Irit Steurer David 2013 Analytical approach to parallel repetition STOC 14 Proceedings of the forty sixth annual ACM symposium on Theory of computing ACM pp 624 633 Vazirani Vijay V 2001 Approximation Algorithms PDF Springer Verlag ISBN 978 3 540 65367 7 Korte Bernhard Vygen Jens 2012 Combinatorial Optimization Theory and Algorithms 5 ed Springer ISBN 978 3 642 24487 2 Cardoso Nuno Abreu Rui 2014 An Efficient Distributed Algorithm for Computing Minimal Hitting Sets PDF Proceedings of the 25th International Workshop on Principles of Diagnosis Graz Austria doi 10 5281 zenodo 10037 a href Template Citation html title Template Citation citation a CS1 maint location missing publisher link External links edit nbsp Wikimedia Commons has media related to Set cover problem Benchmarks with Hidden Optimum Solutions for Set Covering Set Packing and Winner Determination A compendium of NP optimization problems Minimum Set Cover Retrieved from https en wikipedia org w index php title Set cover problem amp oldid 1202519318, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.