fbpx
Wikipedia

Shattered set

A class of sets is said to shatter another set if it is possible to "pick out" any element of that set using intersection. The concept of shattered sets plays an important role in Vapnik–Chervonenkis theory, also known as VC-theory. Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory.

Definition edit

Suppose A is a set and C is a class of sets. The class C shatters the set A if for each subset a of A, there is some element c of C such that

 

Equivalently, C shatters A when their intersection is equal to A's power set: P(A) = { cA | cC }.

We employ the letter C to refer to a "class" or "collection" of sets, as in a Vapnik–Chervonenkis class (VC-class). The set A is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points.

Example edit

We will show that the class of all discs in the plane (two-dimensional space) does not shatter every set of four points on the unit circle, yet the class of all convex sets in the plane does shatter every finite set of points on the unit circle.

Let A be a set of four points on the unit circle and let C be the class of all discs.

 
The set A of four particular points on the unit circle (the unit circle is shown in purple).

To test where C shatters A, we attempt to draw a disc around every subset of points in A. First, we draw a disc around the subsets of each isolated point. Next, we try to draw a disc around every subset of point pairs. This turns out to be doable for adjacent points, but impossible for points on opposite sides of the circle. As visualized below:

Because there is some subset which can not be isolated by any disc in C, we conclude then that A is not shattered by C. And, with a bit of thought, we can prove that no set of four points is shattered by this C.

However, if we redefine C to be the class of all elliptical discs, we find that we can still isolate all the subsets from above, as well as the points that were formerly problematic. Thus, this specific set of 4 points is shattered by the class of elliptical discs. Visualized below:

With a bit of thought, we could generalize that any set of finite points on a unit circle could be shattered by the class of all convex sets (visualize connecting the dots).

Shatter coefficient edit

To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as the growth function). For a collection C of sets  ,   being any space, often a sample space, we define the nth shattering coefficient of C as

 

where   denotes the cardinality of the set and   is any set of n points,.

  is the largest number of subsets of any set A of n points that can be formed by intersecting A with the sets in collection C.

Here are some facts about  :

  1.   for all n because   for any  .
  2. If  , that means there is a set of cardinality n, which can be shattered by C.
  3. If   for some   then   for all  .

The third property means that if C cannot shatter any set of cardinality N then it can not shatter sets of larger cardinalities.

Vapnik–Chervonenkis class edit

The VC dimension of a class C is defined as

 

or, alternatively, as

 

Note that  

If for any n there is a set of cardinality n which can be shattered by C, then   for all n and the VC dimension of this class C is infinite.

A class with finite VC dimension is called a Vapnik–Chervonenkis class or VC class. A class C is uniformly Glivenko–Cantelli if and only if it is a VC class.

See also edit

References edit

  • Wencour, R. S.; Dudley, R. M. (1981), "Some special Vapnik–Chervonenkis classes", Discrete Mathematics, 33 (3): 313–318, doi:10.1016/0012-365X(81)90274-0.
  • Steele, J. M. (1975), Combinatorial Entropy and Uniform Limit Laws, Ph.D. thesis, Stanford University, Mathematics Department
  • Steele, J. M. (1978), "Empirical discrepancies and subadditive processes", Annals of Probability, 6 (1): 118–227, doi:10.1214/aop/1176995615, JSTOR 2242865.

External links edit

  • Origin of "Shattered sets" terminology, by J. Steele

shattered, look, shattered, wiktionary, free, dictionary, class, sets, said, shatter, another, possible, pick, element, that, using, intersection, concept, shattered, sets, plays, important, role, vapnik, chervonenkis, theory, also, known, theory, shattering, . Look up shattered set in Wiktionary the free dictionary A class of sets is said to shatter another set if it is possible to pick out any element of that set using intersection The concept of shattered sets plays an important role in Vapnik Chervonenkis theory also known as VC theory Shattering and VC theory are used in the study of empirical processes as well as in statistical computational learning theory Contents 1 Definition 2 Example 3 Shatter coefficient 4 Vapnik Chervonenkis class 5 See also 6 References 7 External linksDefinition editSuppose A is a set and C is a class of sets The class C shatters the set A if for each subset a of A there is some element c of C such that a c A displaystyle a c cap A nbsp Equivalently C shatters A when their intersection is equal to A s power set P A c A c C We employ the letter C to refer to a class or collection of sets as in a Vapnik Chervonenkis class VC class The set A is often assumed to be finite because in empirical processes we are interested in the shattering of finite sets of data points Example editWe will show that the class of all discs in the plane two dimensional space does not shatter every set of four points on the unit circle yet the class of all convex sets in the plane does shatter every finite set of points on the unit circle Let A be a set of four points on the unit circle and let C be the class of all discs nbsp The set A of four particular points on the unit circle the unit circle is shown in purple To test where C shatters A we attempt to draw a disc around every subset of points in A First we draw a disc around the subsets of each isolated point Next we try to draw a disc around every subset of point pairs This turns out to be doable for adjacent points but impossible for points on opposite sides of the circle As visualized below nbsp Each individual point can be isolated with a disc showing all four nbsp Each subset of adjacent points can be isolated with a disc showing one of four nbsp A subset of points on opposite sides of the unit circle can not be isolated with a disc Because there is some subset which can not be isolated by any disc in C we conclude then that A is not shattered by C And with a bit of thought we can prove that no set of four points is shattered by this C However if we redefine C to be the class of all elliptical discs we find that we can still isolate all the subsets from above as well as the points that were formerly problematic Thus this specific set of 4 points is shattered by the class of elliptical discs Visualized below nbsp Opposite points of A are now separable by some ellipse showing one of two nbsp Each subset of three points in A is also separable by some ellipse showing one of four With a bit of thought we could generalize that any set of finite points on a unit circle could be shattered by the class of all convex sets visualize connecting the dots Shatter coefficient editMain article growth function To quantify the richness of a collection C of sets we use the concept of shattering coefficients also known as the growth function For a collection C of sets s W displaystyle s subset Omega nbsp W displaystyle Omega nbsp being any space often a sample space we define the nth shattering coefficient of C as S C n max x 1 x 2 x n W card x 1 x 2 x n s s C displaystyle S C n max forall x 1 x 2 dots x n in Omega operatorname card x 1 x 2 dots x n cap s s in C nbsp where card displaystyle operatorname card nbsp denotes the cardinality of the set and x 1 x 2 x n W displaystyle x 1 x 2 dots x n in Omega nbsp is any set of n points S C n displaystyle S C n nbsp is the largest number of subsets of any set A of n points that can be formed by intersecting A with the sets in collection C Here are some facts about S C n displaystyle S C n nbsp S C n 2 n displaystyle S C n leq 2 n nbsp for all n because s A s C P A displaystyle s cap A s in C subseteq P A nbsp for any A W displaystyle A subseteq Omega nbsp If S C n 2 n displaystyle S C n 2 n nbsp that means there is a set of cardinality n which can be shattered by C If S C N lt 2 N displaystyle S C N lt 2 N nbsp for some N gt 1 displaystyle N gt 1 nbsp then S C n lt 2 n displaystyle S C n lt 2 n nbsp for all n N displaystyle n geq N nbsp The third property means that if C cannot shatter any set of cardinality N then it can not shatter sets of larger cardinalities Vapnik Chervonenkis class editMain article VC dimension The VC dimension of a class C is defined as V C C min n n S C n lt 2 n displaystyle VC C underset n min n S C n lt 2 n nbsp or alternatively as V C 0 C max n n S C n 2 n displaystyle VC 0 C underset n max n S C n 2 n nbsp Note that V C C V C 0 C 1 displaystyle VC C VC 0 C 1 nbsp If for any n there is a set of cardinality n which can be shattered by C then S C n 2 n displaystyle S C n 2 n nbsp for all n and the VC dimension of this class C is infinite A class with finite VC dimension is called a Vapnik Chervonenkis class or VC class A class C is uniformly Glivenko Cantelli if and only if it is a VC class See also editSauer Shelah lemma relating the cardinality of a family of sets to the size of its largest shattered setReferences editWencour R S Dudley R M 1981 Some special Vapnik Chervonenkis classes Discrete Mathematics 33 3 313 318 doi 10 1016 0012 365X 81 90274 0 Steele J M 1975 Combinatorial Entropy and Uniform Limit Laws Ph D thesis Stanford University Mathematics Department Steele J M 1978 Empirical discrepancies and subadditive processes Annals of Probability 6 1 118 227 doi 10 1214 aop 1176995615 JSTOR 2242865 External links editOrigin of Shattered sets terminology by J Steele Retrieved from https en wikipedia org w index php title Shattered set amp oldid 1216839839, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.