Posted by: yglima | December 24, 2011

Szemerédi’s regularity lemma

Szemerédi’s regularity lemma is an important tool in discrete mathematics, specially in graph theory and additive combinatorics. It says that, in some sense, all graphs can be approximated by random-looking graphs. The lemma helps in proving theorems for arbitrary graphs whenever the corresponding result is easy for random graphs. One of its applications is the triangle removal lemma which, as observed by Ruzsa and Szemerédi in the paper Triple systems with no six points carrying three triangles, gives a proof of Roth’s theorem on the existence of arithmetic progressions of length {3} in subsets of the integers with positive density (see ERT13 for an ergodic theoretical proof).

In this first of two posts, we prove Szemerédi’s regularity lemma. The second post will give some applications of this lemma: the triangle removal lemma and Roth’s theorem. Some of the content has intersection with the Ergodic Ramsey Theory posts, whose interested reader may check here: ERT0, ERT1, ERT2, ERT3, ERT4, ERT5, ERT6, ERT7, ERT8, ERT9, ERT10, ERT11, ERT12, ERT13, ERT14, ERT15, ERT16.

1. Additive combinatorics

Additive combinatorics is the theory of
counting additive structures in sets.
T. Tao and V. Vu.

This theory has seen exciting developments and dramatic changes in direction in recent years, thanks to its connections with areas such as number theory, ergodic theory and graph theory. This section gives a brief historic introduction on the main results.

Van der Waerden’s theorem (see ERT6 for a topological dynamical proof), one of Kintchine’s Three Pearls of Number Theory, states that whenever the natural numbers are finitely partitioned (or, as it is customary to say, finitely colored), one of the cells of the partition contains arbitrarily long arithmetic progressions. In other words, the structure of the natural numbers can not be destroyed by partitions: arbitrarily large parts of {\mathbb N} persist inside some component of the partition. This result was first proved in {1927} and represents the first great result on additive combinatorics. Afterwards, in the mid-thirties, Erdös and Turán conjectured a density version of van der Waerden’s theorem. To present it, let us define what is the notion of density in the natural numbers.

Definition 1 Given a set {A\subset\mathbb N}, the upper density of {A} is

\displaystyle \overline{\rm d}(A)=\limsup_{n\rightarrow\infty}\dfrac{|A\cap\{1,2,\ldots,n\}|}{n}\,\cdot

If the limit exists, we say that {A} has density, and denote it by {{\rm d}(A)}. As pointed out by Erdös and Turán, having positive upper density is a notion of largeness and it is natural to ask if sets with this property have arbitrarily long arithmetic progressions. This quite recalcitrant question was only settled in {1975} by Szemerédi in the paper On sets of integers containing no k elements in arithmetic progressions . Meanwhile, the first partial result was obtained by Roth in {1953}.

Theorem 2 (Roth) If {A\subset\mathbb N} has positive upper density, then it contains an arithmetic progression of length {3}.

His proof relied on a Fourier-analytic argument of energy increment for functions: one decomposes a function {f} as {g+b}, where {g} is good and {b} is bad in a specific sense (this follows the same philosophy of Calderón-Zygmund’s theory on harmonic analysis). If the effect of {b} is large, it is possible to break it into good and bad parts again and so on. In each step, the “energy” of {b} increases a fixed amount. Being bounded, it must stop after a finite number of steps. At the end, {g} controls the behavior of {f} and for it the result is straightforward. See The remarkable effectiveness of ergodic theory in number theory for further details.

Sixteen years later, in the paper On sets of integers containing no four elements in arithmetic progression, Szemerédi  extended Roth’s theorem to

Theorem 3 (Szemerédi) If {A\subset\mathbb N} has positive upper density, then it contains an arithmetic progression of length {4}.

Finally, in {1975}, Szemerédi settled the conjecture in its full generality.

Theorem 4 (Szemerédi) If {A\subset\mathbb N} has positive upper density, then it contains arbitrarily long arithmetic progression.

His proof required a complicated combinatorial argument and relied on a graph-theoretical result, known as Szemerédi’s regularity lemma, which turned out to be an important result in graph theory. It asserts, roughly speaking, that any graph can be decomposed into a relatively small number of disjoint subgraphs, most of which behave pseudo-randomly. This is the main topic of this post.

It is worth to mention Erdös also conjectured that if {A\subset\mathbb N} satisfies

\displaystyle \sum_{n\in A}\dfrac{1}{n}=\infty\,,

then it contains arbitrarily long arithmetic progressions. This question is wide open: nobody knows even if {A} contains arithmetic progressions of length {3}. On the other hand, a remarkable result of Green and Tao states the conjecture for the particular case of the prime numbers.

Theorem 5 (Green and Tao) The prime numbers contain arbitrarily long arithmetic progressions.

2. Setting notation

{G=(V,E)} is a graph, where {V} is a finite set of vertices and {E} is the set of edges, each of them joining two distinct elements of {V}. For disjoint {A,B\subset V}, {e(A,B)} is the number of edges between {A} and {B} and

\displaystyle d(A,B)=\dfrac{e(A,B)}{|A|\cdot |B|}

is the density of the pair {(A,B)}.

Definition 6 For {\varepsilon>0} and disjoint subsets {A,B\subset V}, the pair {(A,B)} is {\varepsilon}-regular if, for every {X\subset A} and {Y\subset B} satisfying

\displaystyle |X|\ge\varepsilon\cdot|A|\ \text{ and }\ |Y|\ge\varepsilon\cdot|B|

we have

\displaystyle |d(X,Y)-d(A,B)|<\varepsilon.

A partition {\mathcal U=\{V_0,V_1,\ldots,V_k\}} of {V} into pairwise disjoint sets in which {V_0} is called the exceptional set is an equipartition if {|V_1|=\cdots=|V_k|}. We view the exceptional set as {|V_0|} distinct parts, each consisting of a single vertex, and its role is purely technical: to make all other classes have exactly the same cardinality.

Definition 7 An equipartition {V=V_0\cup V_1\cup\cdots\cup V_k} is {\varepsilon}-regular if

  1. {|V_0|\le \varepsilon\cdot|V|},
  2. all but at most {\varepsilon k^2} of the pairs {(V_i,V_j)} are {\varepsilon}-regular.

The classes {V_i} are called clusters or groups. Given two partitions {\mathcal U,\mathcal W} of {V}, we say {\mathcal U} refines {\mathcal W} if every cluster of {\mathcal W} is equal to the union of some clusters of {\mathcal U}.

3. Szemerédi’s regularity lemma

Szemerédi’s regularity lemma says that every graph with many vertices can be partitioned into a small number of clusters with the same cardinality, most of the pairs being {\varepsilon}-regular, and a few leftover edges. In my point of view, this result allows the decomposition of every graph with a sufficiently large number of vertices into many components uniformly (every component has the same number of vertices) in such a way the relation of the clusters is at the same time

uniform: the densities do not vary too much, and

randomic: even controlling the density, nothing can be said about the distribution of the edges.

As a toy model, let {0\le p\le 1} and consider the complete random graph {G=(V,E)} with {n} vertices in which every edge belongs to {E} with probability {p}. If {A,B} are disjoint subsets of {V}, the expected value of {d(A,B)} is {p}, and the same happens for subsets {X\subset A}, {Y\subset B}. Szemerédi’s regularity lemma says that, approximately, this is indeed the universal behavior.

Theorem 8 (Szemerédi’s regularity lemma) For every {\varepsilon>0} and every integer {t}, there exist integers {T(\varepsilon,t)} and {N(\varepsilon,t)} for which every graph with at least {N(\varepsilon,t)} vertices has an {\varepsilon}-regular equipartition {(V_0,V_1,\ldots,V_k)}, where {t\le k\le T(\varepsilon,t)}.

Note the importance of having an upper bound for the number of clusters. Otherwise, we could just take each of them to be a singleton.

The idea in the proof is similar to Roth’s approach. Start with an arbitrary partition of {V} into {t} disjoint classes {V_1,\ldots,V_t} of equal sizes. Proceed by showing that, as long as the partition is not {\varepsilon}-regular, it can be refined in a way to distribute the density deviation. This is done by introducing a bounded energy function that increases a fixed amount every time the refinement is made. After a finite number of steps, the resulting partition is {\varepsilon}-regular.

We now discuss what should be the energy function. The natural way of looking for it is to identify the obstruction for a pair {(U,W)} to be {\varepsilon}-regular. This means there are subsets {U_1\subset U} and {W_1\subset W} such that {|U_1|\ge\varepsilon\cdot|U|}, {|W_1|\ge\varepsilon\cdot|W|} and

\displaystyle |d(U_1,W_1)-d(U,W)|>\varepsilon.

Consider the partitions {\mathcal U=\{U_1,U\backslash U_1\}} and {\mathcal W=\{W_1,U\backslash W_1\}}. The above inequality has the following probabilistic interpretation. Consider the random variable {Z} defined on the product {U\times W} by: let {u} be a uniformly random element of {U} and {w} a uniformly random element of {W}, let {A\in\mathcal U} and {B\in\mathcal W} be those members of the respective partitions for which {u\in A} and {w\in B}, and take

\displaystyle Z(u,w)\doteq d(A,B)\,.

The expectation of {Z} is equal to

\displaystyle \begin{array}{rcl} \mathbb E[Z]&=&\displaystyle\sum_{A\in\mathcal U\atop{B\in\mathcal W}}\dfrac{|A|}{|U|}\cdot\dfrac{|B|}{|W|}\cdot d(A,B)\\ &&\\ &=&\dfrac{1}{|U|\cdot|W|}\displaystyle\sum_{A\in\mathcal U\atop{B\in\mathcal W}}e(A,B)\\ &&\\ &=&d(U,W). \end{array}

By assumption, {Z} deviates from {\mathbb E[Z]} at least {\varepsilon} whenever {u\in U_1}, {w\in W_1} and this event has probability

\displaystyle \dfrac{|U_1|}{|U|}\cdot\dfrac{|W_1|}{|W|}\ge \varepsilon^2.

Then {{\rm Var}[Z]\ge \varepsilon^4}. Noting that the expectation of {Z^2} is

\displaystyle \begin{array}{rcl} \mathbb E[Z^2]&=&\displaystyle\sum_{A\in\mathcal U\atop{B\in\mathcal W}}\dfrac{|A|}{|U|}\cdot\dfrac{|B|}{|W|}\cdot d^2(A,B)\\ &&\\ &=&\dfrac{1}{|U|\cdot|W|}\displaystyle\sum_{A\in\mathcal U\atop{B\in\mathcal W}}\dfrac{e^2(A,B)}{|A|\cdot|B|}\,, \end{array}

we conclude that

\begin{array}{rcl} \mathbb E[Z^2]&\ge &\mathbb E[Z]^2+\varepsilon^4\\ & &\\ \dfrac{1}{|U|\cdot|W|}\displaystyle\sum_{A\in\mathcal U\atop{B\in\mathcal W}}\dfrac{e^2(A,B)}{|A|\cdot|B|}&\ge& \dfrac{1}{|U|\cdot|W|}\cdot\dfrac{e^2(U,W)}{|U|\cdot|W|}+\varepsilon^4.\end{array}

The fractions containing e^2 above represent the energy function we are looking for: given two disjoint subsets {A,B\subset V}, define

\displaystyle q(A,B)=\dfrac{1}{n^2}\cdot\dfrac{e^2(A,B)}{|A|\cdot|B|}=\dfrac{|A|\cdot|B|}{n^2}\cdot d^2(A,B)\,.

For partitions {\mathcal U,\mathcal W}, let

\displaystyle q(\mathcal U,\mathcal W)=\sum_{A\in\mathcal U\atop{B\in\mathcal W}}q(A,B)\,.

Definition 9 Given a partition {\mathcal U} of {V} with exceptional set {V_0}, the index of {\mathcal U} is

\displaystyle q(\mathcal U)=\sum_{A,B\in\mathcal U}q(A,B),

where the sum ranges over all unordered pairs of distinct parts {A,B} of {\mathcal U}, with each vertex of {V_0} forming a singleton part in its own.

Note that {q(\mathcal U)} is a sum of {{k+|V_0|}\choose 2} terms of the form {q(A,B)}. The first good property it must have is boundedness.

Property 1. {q(\mathcal U)\le 1/2}.

In fact, as {d(A,B)\le 1},

\displaystyle \begin{array}{rcl} q(\mathcal U)&\le&\dfrac{1}{n^2}\displaystyle\sum_{A,B\in\mathcal U\atop{A\not=B}}|A|\cdot|B|\\ & & \\ &\le&\dfrac{1}{2n^2}\cdot\left(\displaystyle\sum_{A\in\mathcal U}|A|\right)\cdot\left(\displaystyle\sum_{B\in\mathcal U}|B|\right)\\ & & \\ &=&\dfrac{1}{2}\,\cdot \end{array}

It is also monotone increasing with respect to refinements. This is the content of the next two properties.

Property 2. If {U,W} are subsets of {V} and {\mathcal U,\mathcal W} are partitions of {U,V}, respectively, then

\displaystyle q(\mathcal U,\mathcal W)\ge q(U,W)\,.

This property follows easily from Cauchy-Schwarz inequality (the interested reader may check it in the survey Szemerédi’s regularity lemma and its applications in graph theory), but this analytical argument is not so clear. A soft way of proving it is to consider the probabilistic point of view, with the aid of the random variable {Z}. According to the above calculations,

\displaystyle \mathbb E[Z]^2=\dfrac{n^2}{|U|\cdot|W|}\cdot q(U,W)\ \ \text{ and }\ \ \mathbb E[Z^2]=\dfrac{n^2}{|U|\cdot|W|}\cdot q(\mathcal U,\mathcal W)

and so, by Jensen’s inequality (which in this case is just Cauchy-Schwarz inequality),

\displaystyle \begin{array}{rcl} \mathbb E[Z^2]&\ge&\mathbb E[Z]^2\\ &&\\ \Longrightarrow\hspace{1.2cm}q(\mathcal U,\mathcal W)&\ge&q(U,W)\,. \end{array}

Property 3. If {\mathcal U'} refines {\mathcal U}, then

\displaystyle q(\mathcal U')\ge q(\mathcal U)\,.

This is a direct consequence of Property 2 by breaking {q(\mathcal U')} according to {\mathcal U}:

\displaystyle \begin{array}{rcl} q(\mathcal U')&=&\displaystyle\sum_{A',B'\in\mathcal U'}q(A',B')\\ & & \\ &=&\displaystyle\sum_{A,B\in\mathcal U}\displaystyle\sum_{A'\subset A\atop{B'\subset B}}q(A',B')\\ & & \\ &=&\displaystyle\sum_{A,B\in\mathcal U}q(\mathcal U'\cap A,\mathcal U'\cap B)\\ & & \\ &\ge&\displaystyle\sum_{A,B\in\mathcal U}q(A,B)\\ & & \\ &=&q(\mathcal U)\,. \end{array}

The next property grows the index of non {\varepsilon}-regular partitions and reflects the right choice of the energy function. In a few words, it says that

“The lack of uniformity implies energy increment”

and this idea permeates many results in recent developments in combinatorics, harmonic analysis, ergodic theory and others areas. Actually, all known proofs of Szemerédi’s theorem use this principle at some stage. To mention some of them:

  1. the original proof of Roth considers good and bad parts of functions.
  2. Furstenberg’s approach: every non-compact system has a weak mixing factor.
  3. the Fourier-analytic proof of Gowers identifies arithmetic progressions via the nowadays called Gowers norms.
  4. the construction of characteristic factors for multiple ergodic averages uses the Gowers-Host-Kra seminorms.

These two last results are still being developed to generate what is being called higher-order Fourier analysis. See this post of Terence Tao for a discussion about this topic. Going back to what matters, let’s prove the

Proposition 10 (Lack of uniformity implies energy increment 1) Suppose {\varepsilon>0} and {U,W} are disjoint nonempty subsets of {V} and the pair {(U,W)} is not {\varepsilon}-regular. Then there are partitions {\mathcal U=\{U_1,U_2\}} of {U} and {\mathcal W=\{W_1,W_2\}} of {W} such that

\displaystyle q(\mathcal U,\mathcal W)>q(U,W)+\varepsilon^4\cdot\dfrac{|U|\cdot|W|}{n^2}\,\cdot

Proof: The reader must convince himself that this is exactly relation (3). For those still not convinced, let’s do it again. Assume {U_1\subset U} and {W_1\subset W} are such that {|U_1|\ge\varepsilon\cdot|U|}, {|W_1|\ge\varepsilon\cdot|W|} and

\displaystyle |d(U_1,W_1)-d(U,W)|>\varepsilon.

Consider {\mathcal U=\{U_1,U\backslash U_1\}} and {\mathcal W=\{W_1,U\backslash W_1\}}. The evaluation of the variation {{\rm Var}[Z]} will prove the proposition. On one hand, by the calculations in Property 2,

\displaystyle {\rm Var}[Z]=\dfrac{n^2}{|U|\cdot|W|}\cdot\left(q(\mathcal U,\mathcal W)-q(U,W)\right). \ \ \ \ \ (1)

On the other, {Z} deviates from {\mathbb E[Z]} at least {\varepsilon} whenever {u\in U_1}, {w\in W_1} and this event has probability

\displaystyle \dfrac{|U_1|}{|U|}\cdot\dfrac{|W_1|}{|W|}\ge \varepsilon^2.

Then {{\rm Var}[Z]\ge \varepsilon^4} which, together with (1), gives that

\displaystyle \begin{array}{rcl} q(\mathcal U,\mathcal W)-q(U,W)&\ge&\varepsilon^4\cdot\dfrac{|U|\cdot|W|}{n^2}\\ & & \\ \Longrightarrow\hspace{1.9cm}q(\mathcal U,\mathcal W)&\ge&q(U,W)+\varepsilon^4\cdot\dfrac{|U|\cdot|W|}{n^2}\,\cdot \end{array}


Proposition 11 (Lack of uniformity implies energy increment 2) Suppose {0<\varepsilon<1/4} and let {\mathcal U=\{V_0,V_1,\ldots,V_k\}} be a non {\varepsilon}-regular equipartition of {V}, where {V_0} is the exceptional set. Then there exists a refinement {\mathcal U'=\{V_0',V_1',\ldots,V_l'\}} of {\mathcal U} with the following properties:

  1. {\mathcal U'} is an equipartition of {V},
  2. {k<l<k\cdot 8^k},
  3. {|V_0'|\le|V_0|+n/2^k} and
  4. {q(\mathcal U')\ge q(\mathcal U)+\varepsilon^5/2}.

Proof: The idea is to apply the previous proposition to every non-regular pair. As there are at least {\varepsilon k^2} of them, the index will increase the fixed amount. Let {c} be the cardinality of every {V_i}, {i=1,\ldots,k}. Saying that {\mathcal U} is not {\varepsilon}-regular means that, for at least {\varepsilon k^2} pairs {(i,j)}, {1\le i<j\le k}, {(V_i,V_j)} is not {\varepsilon}-regular. For each of these, let {\mathcal U_{ij}}, {\mathcal U_{ji}} be the partitions of {V_i,V_j}, respectively, given by Proposition 10 and consider {\mathcal W} the smallest partition that refines {\mathcal U} and all {\mathcal U_{ij}}, {\mathcal U_{ji}}. By Proposition 10,

\displaystyle \begin{array}{rcl} q(\mathcal W)&\ge& q(\mathcal U)+\varepsilon k^2\cdot \left(\varepsilon^4\cdot\dfrac{c^2}{n^2}\right)\\ & & \\ &=&q(\mathcal U)+\varepsilon^5\cdot\left(\dfrac{kc}{n}\right)^2\\ & & \\ &\ge&q(\mathcal U)+\dfrac{\varepsilon^5}{2}\, \end{array}

as {kc=n-|V_0|\ge n/2}. This proves that {\mathcal W} (and any of its refinements) satisfies (iv). The problem is that {\mathcal W} is not necessarily an equipartition. We adjust this by defining {b=\lfloor c/4^k\rfloor}, splitting every part of {\mathcal W} arbitrarily into disjoint sets of size {b} and throwing the remaining vertices of each part, if any, to the exceptional set. This new partition {\mathcal U'} satisfies (i), (ii) and (iii), as we’ll verify below.

(i) {\mathcal U'} is an equipartition by definition.

(ii) To get {\mathcal W}, every cluster of {\mathcal U} is divided in at most {2^{k-1}} parts. After, every element of {\mathcal W} is divided in at most {4^k} non-exceptional parts. This implies that

\displaystyle l\le k\cdot 2^{k-1}\cdot 4^k<k\cdot 8^k.

(iii) Each cluster of {\mathcal W} contributes with at most {b} vertices to {V_0'} and so

\displaystyle |V_0'|\le |V_0|+b\cdot\left(k\cdot 2^{k-1}\right)\le |V_0|+kc\cdot\dfrac{2^{k-1}}{4^k}<|V_0|+\dfrac{n}{2^k}\,\cdot


Finally, we are able to prove the regularity lemma.

Proof: First, note that if the result is true for {(\varepsilon,t)} and {\varepsilon'>\varepsilon}, {t'<t}, then the result is also true for the pair {(\varepsilon',t')}. This allows us to assume that {\varepsilon<1/4} and {t/\varepsilon} is arbitrarily large.

Begin with an arbitrary partition {\mathcal U=\left\{V_0,V_1,\ldots,V_t\right\}} of {V} such that {|V_0|\le\lfloor n/t\rfloor} and {|V_1|=\cdots=|V_t|=\lfloor n/t\rfloor}. Apply Proposition 11 at most {\varepsilon^{-5}} times to obtain an equipartition {\mathcal U'}. Let {T(\varepsilon,t)} be the largest number obtained by iterating the map {x\mapsto x\cdot 8^x} at most {\varepsilon^{-5}} times, starting from {t}. Then {\mathcal U'} has at most {T(\varepsilon,t)} clusters. In addition, the cardinality of its exceptional set {V_0'} is bounded by

\displaystyle |V_0'|\le |V_0|+\dfrac{1}{\varepsilon^5}\cdot\dfrac{n}{2^t}\le\left\lfloor\dfrac{n}{t}\right\rfloor+\dfrac{n}{2^t\varepsilon^5}\,,

which is smaller than {\varepsilon n} if {t} is large. This concludes the proof. \Box

There is a large literature about Szemerédi’s regularity lemma. We refer the reader to four references: my lecture notes available at my homepage, the book The probabilistic method of Alon and Spencer, the survey of Komlós and M. Simonovits and Tao’s perspective via random partitions. Merry Christmas!!


  1. […] Disquisitiones Mathematicae starts a series on Szemerédi’s regularity Lemma. […]

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.


%d bloggers like this: