On January 8, 2020, Jialun Li gave the talk “Decrease of Fourier coefficients of stationary measures on the circle” in the “flat seminar” that I co-organize with Anton Zorich once per month.

In this post, I’ll transcribe my notes of this nice talk (while taking full responsibility for any errors/mistakes in what follows).

1. Introduction

1.1. Stationary measures

The linear action of {G=SL_2(\mathbb{R})} on {\mathbb{R}^2} induces an action on the projective space {X=\mathbb{P}(\mathbb{R}^2)}. For later use, recall that {X\simeq\mathbb{T}^1=\mathbb{R}/\pi\mathbb{Z}} via

\displaystyle \mathbb{T}^1\ni\theta\mapsto [\cos\theta:\sin\theta]=\mathbb{R}\cdot(\cos\theta,\sin\theta)\in X.

Given a probability measure {\mu} on {G}, we can build a Markov chain / random walk whose steps consist of taking points {y\in X} to {gy}, where {g\in G} is chosen according to the law {\mu}.
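To make this Markov chain concrete, here is a minimal sketch in Python (not taken from the talk) simulating the walk in the angle coordinate {\theta\in\mathbb{R}/\pi\mathbb{Z}}. The matrices `G_EX`, `H_EX` and the helper names `act` and `walk` are hypothetical choices made purely for illustration.

```python
import math
import random

def act(g, theta):
    """Projective action of a 2x2 matrix g on the angle theta in R/(pi Z)."""
    (a, b), (c, d) = g
    x, y = math.cos(theta), math.sin(theta)
    u, v = a * x + b * y, c * x + d * y
    # A line is determined by a direction up to sign, hence the reduction mod pi.
    return math.atan2(v, u) % math.pi

def walk(support, theta0, n, rng):
    """Run n steps of the chain, picking g uniformly from `support` at each step."""
    path = [theta0]
    for _ in range(n):
        path.append(act(rng.choice(support), path[-1]))
    return path

# Hypothetical example: mu = (delta_g + delta_h)/2 for two matrices of determinant 1.
G_EX = ((2.0, 1.0), (1.0, 1.0))
H_EX = ((1.0, 1.0), (1.0, 2.0))

if __name__ == "__main__":
    print([round(t, 4) for t in walk([G_EX, H_EX], 0.3, 10, random.Random(0))])
```

A histogram of a long path gives a rough picture of how the chain distributes itself on {X}, although this sketch makes no attempt to quantify the convergence.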

In the absence of hypotheses on {\mu}, the random walk might be uninteresting: for instance, if a point {x\in X} is stabilized by two elements {g,h\in Stab_x(G)}, then the random walk starting at {x} associated to {\mu=\frac{1}{2}(\delta_g+\delta_h)} never leaves {x}.

For this reason, we shall assume that

Hypothesis (i): the support of {\mu} generates a Zariski-dense semigroup {\langle \textrm{supp}(\mu)\rangle}.

Remark 1 By the Tits alternative, in our current setting of {G=SL_2(\mathbb{R})}, the hypothesis (i) can be reformulated by replacing “Zariski-dense” with “not solvable”.

As it was famously established by Furstenberg, the random walks associated to {\mu} have a well-defined asymptotic behaviour whenever (i) is fulfilled:

Theorem 1 (Furstenberg) Under (i), there exists a (unique) probability measure {\nu} on {X} such that, for all {x\in X},

\displaystyle \mu^n\ast\delta_x\rightharpoonup\nu

as {n\rightarrow\infty}. Here, the convolution of {\mu} with a probability measure {\eta} on {X} is a probability measure {\mu\ast\eta} on {X} defined as

\displaystyle \mu\ast\eta = \int_{g\in G} g_*\eta \, \, d\mu(g),

so that {\mu^n\ast\delta_x=\underbrace{\mu\ast\dots\ast\mu}_{n \textrm{ times}}\ast\delta_x} is the distribution of points obtained from {x} after {n} steps of the Markov chain associated to {\mu}.

In the literature, {\nu} is called the Furstenberg measure, and it is an important example of a {\mu}-stationary measure, i.e., a probability measure on {X} which is “invariant on average”:

\displaystyle \nu=\int_{g\in G}g_*\nu \, \, d\mu(g) := \mu\ast\nu.

1.2. Lyapunov exponents

The stationary measure {\nu} can be used to describe the growth of the norms of random products {g_n\dots g_1} associated to {\mu^{\otimes\mathbb{N}}}-almost every {(g_1,\dots, g_n,\dots)\in G^{\mathbb{N}}} whenever {\mu} satisfies (i) and its first moment is finite:

Theorem 2 (Furstenberg, Guivarch–Raugi) If {\mu} has finite first moment, i.e.,

\displaystyle \int_G\log\|g\|\ d\mu(g)<\infty

and {\mu} satisfies (i), then

\displaystyle \lim\limits_{n\rightarrow\infty}\frac{1}{n}\log\|g_n\dots g_1\|= \sigma_{\mu}:=\int_{x=\mathbb{R}v\in X}\int_{g\in G}\log\frac{\|gv\|}{\|v\|} d\mu(g)\,d\nu(x)>0

for {\mu^{\otimes\mathbb{N}}}-almost every {(g_1,\dots, g_n, \dots)\in G^{\mathbb{N}}}.

The quantity {\sigma_{\mu}} is called the Lyapunov exponent.
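As a numerical illustration (a sketch under my own assumptions, not part of the talk), one can estimate {\sigma_{\mu}} by following the growth of a single vector under a random product, renormalising at each step to avoid overflow. The function name `lyapunov_estimate` and the matrices `G_EX`, `H_EX` are hypothetical; the sketch tracks {\|g_n\dots g_1 v\|} for a fixed {v} rather than the operator norm, which is expected to have the same exponential growth rate for almost every product.

```python
import math
import random

# Hypothetical example: mu = (delta_g + delta_h)/2 with two positive matrices in SL_2(Z).
G_EX = ((2.0, 1.0), (1.0, 1.0))
H_EX = ((1.0, 1.0), (1.0, 2.0))

def lyapunov_estimate(support, n, rng):
    """Return (1/n) log ||g_n ... g_1 v|| for v = (1, 0) and a random product of length n."""
    v = (1.0, 0.0)
    total = 0.0
    for _ in range(n):
        (a, b), (c, d) = rng.choice(support)
        w = (a * v[0] + b * v[1], c * v[0] + d * v[1])
        norm = math.hypot(w[0], w[1])
        total += math.log(norm)          # one-step growth ||g v|| / ||v||, with ||v|| = 1
        v = (w[0] / norm, w[1] / norm)   # renormalise to keep the numbers bounded
    return total / n

if __name__ == "__main__":
    print(lyapunov_estimate([G_EX, H_EX], 100000, random.Random(0)))
```

For this particular pair, each step at least doubles the sum of the coordinates of a non-negative vector, so the estimate is necessarily (essentially) at least {\log 2}, in agreement with the positivity asserted in Theorem 2.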

1.3. Regularity of stationary measures

The Furstenberg measure dictates the distribution of the Markov chains associated to {\mu} and, for this reason, it is natural to inquire about the regularity properties of stationary measures.

In this direction, Guivarch showed that the Furstenberg measures have a certain regularity when {\mu} satisfies (i) and its exponential moment is finite:

Hypothesis (ii): there exists {\theta>0} with {\int_{g\in G} \|g\|^{\theta}\,\,d\mu(g)<\infty}.

Theorem 3 (Guivarch) Under (i) and (ii), there are {\alpha>0} and {C>0} such that

\displaystyle \nu(B(x,r))\leq C r^{\alpha}

for all {x\in X} and {r>0} (where {B(x,r)} is the interval of radius {r} centered at {x\in X\simeq\mathbb{T}^1}). In particular, {\nu} has no atoms.

More recently, Jialun Li established in this article here another regularity result by showing the decay of the Fourier coefficients

\displaystyle \widehat{\nu}(k):=\int_X e^{2ikx}d\nu(x), \quad k\in\mathbb{Z},

(where {X=\mathbb{P}(\mathbb{R}^2)\simeq\mathbb{T}^1=\mathbb{R}/\pi\mathbb{Z}}). More concretely, he proved that:

Theorem 4 (Li) Under (i) and (ii), we have {\lim\limits_{|k|\rightarrow\infty}\widehat{\nu}(k)=0}. In other words, {\nu} is a Rajchman measure.

In a certain sense, the role of assumption (i) in the previous theorem is to avoid the following kind of example:

Example 1 Let {\mu=\frac{1}{2}(\delta_g+\delta_h)} with

\displaystyle g=\left(\begin{array}{cc} \frac{1}{\sqrt{3}} & 0 \\ 0 & \sqrt{3}\end{array}\right) \quad \textrm{and} \quad h=\left(\begin{array}{cc} \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{3}} \\ 0 & \sqrt{3}\end{array}\right).

Note that the semigroup generated by {\textrm{supp}(\mu)=\{g,h\}} is not Zariski dense in {G} (as {g} and {h} are upper-triangular). We affirm that there is no decay of Fourier coefficients in this situation. Indeed, recall that if we identify {X} with {\mathbb{R}\cup\{\infty\}} via {X\ni x=\mathbb{R}\cdot (v_1,v_2)\mapsto v_1/v_2\in\mathbb{R}\cup\{\infty\}}, then {G} acts on {\mathbb{R}\cup\{\infty\}} via Möbius transformations, i.e., an element {\left(\begin{array}{cc} a & b \\ c & d\end{array}\right)\in G} acts on {x\in\mathbb{R}\cup\{\infty\}} as

\displaystyle \left(\begin{array}{cc} a & b \\ c & d\end{array}\right)x=\frac{ax+b}{cx+d}.

In particular, {gx=x/3}, {hx=(x+2)/3}, and the Fourier coefficients of the stationary measure given by the standard Hausdorff measure on the middle-third Cantor set do not decay to zero. In a similar vein, if {0<\lambda<1} is a real number such that {1/\lambda} is a Pisot number, then {\mu=\frac{1}{2}(\delta_{g_{\lambda}}+\delta_{h_{\lambda}})} with

\displaystyle g_{\lambda}=\left(\begin{array}{cc} \sqrt{\lambda} & -\frac{1}{\sqrt{\lambda}} \\ 0 & \frac{1}{\sqrt{\lambda}}\end{array}\right) \quad \textrm{and} \quad h_{\lambda}=\left(\begin{array}{cc} \sqrt{\lambda} & \frac{1}{\sqrt{\lambda}} \\ 0 & \frac{1}{\sqrt{\lambda}}\end{array}\right)

admits a stationary measure {\nu_{\lambda}} (called the Bernoulli convolution of parameter {\lambda} describing the distribution of the points {\sum\limits_{j\geq0}\varepsilon_j \lambda^j} where {\varepsilon_j=\pm1} with probability {1/2}) whose Fourier coefficients do not decay.
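The Cantor case of Example 1 can be checked numerically. The sketch below works in the real-line coordinate (not the circle coordinate {\theta} used for {\widehat{\nu}(k)} above, which is a simplifying assumption on my part) and uses the classical product formula {\left|\int e^{i\xi x}\,d\nu(x)\right|=\prod_{j\geq 1}|\cos(\xi/3^j)|} for the law of {\sum_{j\geq1}\varepsilon_j 3^{-j}} with {\varepsilon_j\in\{0,2\}} equiprobable. The function name `cantor_hat_abs` and the truncation depth are hypothetical choices.

```python
import math

def cantor_hat_abs(xi, depth=60):
    """Modulus of the Fourier transform of the middle-third Cantor measure at xi,
    computed from the truncated product prod_{j=1..depth} |cos(xi / 3^j)|."""
    p = 1.0
    for j in range(1, depth + 1):
        p *= abs(math.cos(xi / 3.0 ** j))
    return p

if __name__ == "__main__":
    # Along the frequencies xi = 2 pi 3^m the modulus does not change with m.
    for m in range(6):
        print(m, cantor_hat_abs(2 * math.pi * 3 ** m))
```

Along {\xi=2\pi 3^m} the first {m} factors equal {1} and the remaining ones do not depend on {m}, so the modulus stays at the same non-zero constant: this is the lack of decay mentioned in the example.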

The proof of Theorem 4 is based on a renewal theorem. More concretely, given a function {f\in C^{\infty}_c(\mathbb{R})}, let

\displaystyle Rf(t):=\sum\limits_{n=1}^{\infty} \int f(\log\|g\|-t) d\mu^n(g).

By thinking of {f} as a smooth version of the characteristic function of an interval {I\subset \mathbb{R}}, we see that {Rf(t)} is “counting random products {g_n\dots g_1} with norm in the interval {\exp(I+t)}”. In this context, Guivarch and Le Page established the following renewal theorem:

Theorem 5 (Guivarch–Le Page) Under (i) and (ii), one has

\displaystyle Rf(t)\rightarrow \frac{1}{\sigma_{\mu}}\int_{\mathbb{R}} f(u) \, du

as {t\rightarrow\infty}.

Remark 2 Another important fact in the proof of Theorem 4 is the non-arithmeticity of the Jordan projections {\lambda(g)=\log (\textrm{top eigenvalue of }g)} of the elements of {\langle \textrm{supp}(\mu)\rangle}, i.e., the fact that these Jordan projections generate a dense subgroup of {\mathbb{R}} (whenever (i) is satisfied).

Since we will come back later to the discussion of deriving the decay of Fourier coefficients (e.g., Theorem 4) from a renewal theorem, let us now move forward in order to introduce the main result of this post, namely, a quantitative version of Theorem 4.

2. Quantitative decay of Fourier coefficients

The central result of this post is inspired by the following theorem of Bourgain and Dyatlov.

Theorem 6 (Bourgain–Dyatlov) If {\nu_{PS}} is the Patterson–Sullivan measure associated to a Schottky subgroup of {SL_2(\mathbb{R})}, then there exists {\varepsilon>0} (depending only on the dimension of {\nu_{PS}}, i.e., the Hausdorff dimension of the limit set of the Schottky subgroup) such that

\displaystyle |\widehat{\nu_{PS}}(k)|=O(|k|^{-\varepsilon})

for all {k\in\mathbb{Z}}.

The method of proof of this result is based on the so-called discretized sum-product estimates from additive combinatorics.

Interestingly enough, this result can be interpreted as a decay of Fourier coefficients of certain stationary measures thanks to the following theorem:

Theorem 7 (Furstenberg, Sullivan, …) The Patterson–Sullivan measure {\nu_{PS}} of a Schottky subgroup coincides with the stationary measure {\nu} of some probability measure {\mu} on {G} satisfying (i) and (ii).

Remark 3 We saw the proof of a version of this result for cocompact lattices of {SL_2(\mathbb{R})} in Proposition 14 of this blog post here.

The previous theorems suggest a polynomial decay of the Fourier coefficients of the Furstenberg measure associated to any probability measure {\mu} on {G} satisfying (i) and (ii). This statement was recently proved by Jialun Li in this article here.

Theorem 8 (Li) If {\mu} is a probability measure on {G=SL_2(\mathbb{R})} satisfying (i) and (ii), then there exists {\varepsilon>0} such that the Furstenberg measure {\nu} associated to {\mu} verifies

\displaystyle |\widehat{\nu}(k)|=O(|k|^{-\varepsilon})

for all {k\in\mathbb{Z}}.

Remark 4 Actually, Li’s theorem is stated in his article for any real split semisimple Lie group {G}.

The proof of this result is also based on a discretized sum-product estimate. Moreover, this statement is closely related to spectral gap of transfer operators and a renewal theorem:

Theorem 9 (Li) Let {\mu} be a probability measure on {G} verifying (i) and (ii). Given {b\in \mathbb{R}}, consider the transfer operator

\displaystyle P_{ib} f(x) := \int_G \exp(i b \log\frac{\|gv\|}{\|v\|}) f(gx) \, d\mu(g), \quad x=\mathbb{R}\cdot v\in X,

acting on {f\in C^{\gamma}(X)} (with {\gamma>0} small enough). Then, we have the following spectral gap property: there exists {\rho<1} such that the spectral radius of {P_{ib}} satisfies

\displaystyle \rho(P_{ib})<\rho

for all {|b|>1}.

Theorem 10 (Li) Under (i) and (ii), there exists {\varepsilon>0} such that the renewal operator {Rf(t)=\sum\limits_{n=1}^{\infty} \int f(\log\|g\|-t) d\mu^n(g)} satisfies

\displaystyle Rf(t) = \frac{1}{\sigma_{\mu}}\int_{\mathbb{R}} f(u) \, du + O(e^{-\varepsilon t} |f|_{C^2})

for all {f\in C^{\infty}_c(\mathbb{R})}.

In his article, Li establishes first Theorem 8 from a discretized sum-product estimate, and subsequently Theorems 9 and 10 are deduced from Theorem 8.

Nevertheless, Li pointed out in his talk that Theorems 8, 9 and 10 are “morally equivalent” to each other. In fact,

  • Theorem 8 {\implies} Theorem 9: the Fourier decay can be used to prove spectral gap for transfer operators via the so-called Dolgopyat method (which was discussed in this blog post here);
  • Theorem 9 {\implies} Theorem 10: the spectral gap for transfer operators allows one to deduce the renewal theorem because some elementary calculations reveal that {Rf(t)} is related to {(Id-P_{ib})^{-1}};
  • Theorem 10 {\implies} Theorem 8: let us finally fulfil our promise made at the end of the previous section by briefly explaining the idea of the derivation of the Fourier decay in Theorem 8 from the renewal theorem in Theorem 10; since {\mu^n\ast\nu=\nu}, the {k}th Fourier coefficient of the Furstenberg measure is

    \displaystyle \widehat{\nu}(k) = \int_X e^{2ikx}\,d\nu(x) = \int_G \int_X e^{2ik\,gx}\,d\nu(x)\,d\mu^n(g);

    by Cauchy–Schwarz inequality, the control of {|\widehat{\nu}(k)|} is reduced to the study of

    \displaystyle \int_G e^{2ik(gx-gy)} d\mu^n(g);

    since {gx-gy\sim \|g\|^{-1} d(x,y)}, we see that the size of the integral above depends on the “number of random products {g=g_n\dots g_1} with norm {\|g\|} in a given interval”, and the answer to this kind of “counting problem” is encoded in the asymptotic property of the renewal operator {Rf(t)} provided by Theorem 10.
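For the reader's convenience, the Cauchy–Schwarz step in the last item can be spelled out as follows. This is only a sketch of the standard manipulation (Cauchy–Schwarz with respect to the probability measure {\mu^n}, followed by Fubini), and not necessarily the exact form used in Li's article.

```latex
\begin{aligned}
|\widehat{\nu}(k)|^2
 &= \left|\int_G\left(\int_X e^{2ik\,gx}\,d\nu(x)\right)d\mu^n(g)\right|^2
 \leq \int_G\left|\int_X e^{2ik\,gx}\,d\nu(x)\right|^2 d\mu^n(g) \\
 &= \int_X\int_X\left(\int_G e^{2ik(gx-gy)}\,d\mu^n(g)\right)d\nu(x)\,d\nu(y).
\end{aligned}
```

The inner integral over {G} is the oscillatory quantity whose size is controlled via the renewal theorem.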

Remark 5 The analog of Theorem 10 in Abelian settings is false: the random walks driven by a finitely supported law {\lambda} on {\mathbb{R}} which is not arithmetic (i.e., its support generates a dense subgroup) verify a renewal theorem

\displaystyle Rf(t)=\sum\limits_{n=1}^{\infty}\int f(x-t) d\lambda^n(x)\rightarrow \frac{1}{\mathbb{E}(\lambda)}\int f(u) \, du \quad \textrm{as } t\rightarrow\infty

for {f\in C^{\infty}_c(\mathbb{R})}, but the error term is never exponential because {\#\textrm{supp}(\lambda^n)} grows polynomially with {n}. (Of course, this phenomenon is avoided in the context of {SL_2(\mathbb{R})} thanks to the fact that the Zariski-density assumption (i) on {\mu} ensures an exponential growth of {\#\textrm{supp}(\mu^n)} with {n}.)
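The contrast between polynomial and exponential growth of supports in Remark 5 can be observed on small examples. The sketch below is an illustration under my own assumptions: the Abelian law is {\frac{1}{2}(\delta_1+\delta_{\sqrt{2}})}, and the matrices `A_EX`, `B_EX` are the classical pair generating a free group (Sanov), chosen here only because exact integer arithmetic is available; all function names are hypothetical.

```python
def mat_mul(p, q):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def support_sizes_sl2(gens, n_max):
    """#supp(mu^n) for n = 1..n_max, where mu is uniform on `gens` (exact integer arithmetic)."""
    level = set(gens)
    sizes = [len(level)]
    for _ in range(n_max - 1):
        level = {mat_mul(g, w) for g in gens for w in level}
        sizes.append(len(level))
    return sizes

def support_sizes_abelian(n_max):
    """#supp of the n-fold convolution of (delta_1 + delta_sqrt2)/2 for n = 1..n_max.
    The point i + j*sqrt(2) is stored as the pair (i, j); this is exact as sqrt(2) is irrational."""
    level = {(1, 0), (0, 1)}
    sizes = [len(level)]
    for _ in range(n_max - 1):
        level = {(i + di, j + dj) for (i, j) in level for (di, dj) in ((1, 0), (0, 1))}
        sizes.append(len(level))
    return sizes

# Hypothetical example in SL_2(Z).
A_EX = ((1, 2), (0, 1))
B_EX = ((1, 0), (2, 1))

if __name__ == "__main__":
    print(support_sizes_sl2([A_EX, B_EX], 8))
    print(support_sizes_abelian(8))
```

In the Abelian case, the order of the steps is irrelevant, so only the number of steps of each kind matters; in the matrix case, distinct words give distinct products.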

Posted by: matheuscmss | January 2, 2020

Breuillard–Sert’s joint spectrum (III)

Last time, we saw that if {S\subset G} is a compact subset of a reductive real linear algebraic group {G} such that the monoid {\langle S\rangle} generated by {S} is Zariski dense in {G}, then the Cartan projections {\frac{1}{n}\kappa(S^n)} and the Jordan projections {\frac{1}{n}\lambda(S^n)} associated to {S} converge in the Hausdorff topology to the same limit {J(S)}, an object baptised “joint spectrum of {S}” by Breuillard and Sert.

Today, I’ll transcribe below my notes of a talk by Romain Dujardin explaining to the participants of our groupe de travail some basic convexity and continuity properties of the joint spectrum. After that, we close the post with a brief discussion of the question of prescribing the joint spectrum.

As usual, all mistakes in what follows are my sole responsibility.

1. Preliminaries

Let us warm up by reviewing the setting of the previous posts of this series.

Let {G} be a reductive real linear algebraic group and denote its rank by {d}. By definition, a maximal torus {A\subset G} is isomorphic to {(\mathbb{R}^*_+)^d}.

The Cartan decomposition {G=KAK} (with {K} a maximal compact subgroup of {G}) allows us to write any {g\in G} as {g\in K\exp(\kappa(g))K} for a unique {\kappa(g)\in\mathfrak{a}^+} where {\mathfrak{a}^+} is a choice of Weyl chamber in the Lie algebra {\mathfrak{a}} of {A}. The interior of the Weyl chamber {\mathfrak{a}^+} is denoted by {\mathfrak{a}^{++}}.

Example 1 For {G=GL_n}, we can take {\mathfrak{a}^+\simeq \{(x_1,\dots,x_n)\in\mathbb{R}^n: x_1\geq\dots\geq x_n\}} in {\mathfrak{a}\simeq \mathbb{R}^n}, so that {\mathfrak{a}^{++}\simeq\{(x_1,\dots,x_n)\in\mathbb{R}^n: x_1>\dots>x_n\}}.

The element {\kappa(g)\in\mathfrak{a}^+} is called the Cartan projection of {g}.

Example 2 For {G=GL_n}, {\kappa(g)=(\log a_1(g),\dots, \log a_n(g))\in\mathfrak{a}^+}, where {a_1(g)\geq \dots\geq a_n(g)>0} are the singular values of {g}.

Similarly, the Jordan projection {\lambda(g)} is defined in terms of the Jordan-Chevalley decomposition. For {G=GL_n}, this amounts to writing the Jordan normal form {g=d+n = d(1+d^{-1}n)} with {d} diagonalisable and {n} nilpotent, so that {g=d(1+d^{-1}n) = \widetilde{g}_s g_u = g_e g_s g_u} with {g_u=1+d^{-1}n} unipotent, {g_e\in O(n)}, and {g_s=\exp(\lambda(g))} has eigenvalues {|\lambda_1(g)|\geq\dots\geq|\lambda_n(g)|} where {\lambda_i(g)} are the eigenvalues of {g} (ordered by decreasing sizes of their moduli).

The group {G} has a family {\rho_1,\dots, \rho_d} of distinguished representations such that the components of the vectors {\kappa(g)}, resp. {\lambda(g)}, are linear combinations of {\log\|\rho_i(g)\|}, resp. {\log|\lambda_1(\rho_i(g))|}. In particular, the usual formula for the spectral radius implies that {\frac{1}{n}\kappa(g^n)\rightarrow\lambda(g)} as {n\rightarrow\infty} (and, as it turns out, this fact is important in establishing the coincidence of the limits of the sequences {\frac{1}{n}\kappa(S^n)} and {\frac{1}{n}\lambda(S^n)}).
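For {G=GL_2(\mathbb{R})}, both projections have closed formulas, and the convergence {\frac{1}{n}\kappa(g^n)\rightarrow\lambda(g)} can be observed numerically. The following is a minimal sketch restricted to {2\times 2} matrices; the names `cartan`, `jordan` and `mat_pow` are hypothetical helpers introduced for illustration.

```python
import math

def mat_mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def mat_pow(g, n):
    """g^n for n >= 1 by repeated multiplication."""
    p = g
    for _ in range(n - 1):
        p = mat_mul(p, g)
    return p

def cartan(g):
    """Cartan projection of an invertible 2x2 matrix: logs of the singular values a_1 >= a_2 > 0."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d            # trace of g^T g = a_1^2 + a_2^2
    det = abs(a * d - b * c)                      # |det g| = a_1 * a_2
    a1 = math.sqrt((t + math.sqrt(max(t * t - 4 * det * det, 0.0))) / 2)
    return (math.log(a1), math.log(det / a1))

def jordan(g):
    """Jordan projection: logs of the moduli of the eigenvalues, in decreasing order."""
    (a, b), (c, d) = g
    tr, det = a + d, a * d - b * c
    disc = tr * tr - 4 * det
    if disc < 0:                                  # complex conjugate pair of modulus sqrt(det)
        m = 0.5 * math.log(det)
        return (m, m)
    l1 = (abs(tr) + math.sqrt(disc)) / 2          # largest modulus of an eigenvalue
    return (math.log(l1), math.log(abs(det) / l1))

if __name__ == "__main__":
    g = ((2.0, 1.0), (0.0, 0.5))
    for n in (1, 5, 10, 20):
        k = cartan(mat_pow(g, n))
        print(n, (k[0] / n, k[1] / n), jordan(g))
```

The exponents are kept moderate on purpose: for large {n} the matrix {g^n} becomes too ill-conditioned for the smallest singular value to be computed reliably in floating point.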

Example 3 For {G=GL_n}, the representations {\rho_i} of {G} on {\wedge^i \mathbb{R}^n}, {1\leq i \leq n}, have the property that the eigenvalue of {\rho_i(g)} with the largest modulus is {\lambda_1(g)\dots\lambda_i(g)}.

The rank {d} of {G} can be written as {d=d_s+(d-d_s)} where {d-d_s} is the dimension of the center {Z(G)} of {G}. In the literature, {d_s} is called the semi-simple rank of {G}. In general, we have {d_s} “truly” distinguished representations which are completed by a choice of {d-d_s} characters of {G/[G,G]}.

Example 4 For {G=GL_n}, {n=(n-1)+1}, and the representations {\rho_i} from the previous example have the property that {\rho_i} with {1\leq i<n} is “truly” distinguished and the determinant representation {\rho_n} comes from the center.

Remark 1 Recall that a weight {\chi=\exp\circ\overline{\chi}\circ\log} of a representation {(V,\rho)} of {G} is a generalized eigenvalue associated to a non-trivial {A}-invariant subspace, i.e., {\chi} is a weight whenever

\displaystyle \{0\}\neq V_{\chi}=\{v\in V: \rho(a)v=\chi(a)v:=\exp(\overline{\chi}(\log a))v \, \, \,\, \forall\, \, a\in A\}.

The weights are partially ordered via {\overline{\chi}_1\leq\overline{\chi}_2} if and only if {\overline{\chi}_1(\log a)\leq \overline{\chi}_2(\log a)} for all {a\in A}, and any irreducible representation {(V,\rho)} possesses a unique maximal weight {\overline{\chi}_{\rho}} (and, as it turns out, {V_{\chi_{\rho}}} is one-dimensional). In this context, the distinguished representations {\rho_1,\dots,\rho_{d_s}} form a family of representations whose maximal weights {\overline{\chi}_{\rho_i}} provide a basis of {Hom(\mathfrak{a},\mathbb{R})}.

A matrix {T\in GL_n} is proximal when its projective action on {\mathbb{P}(\mathbb{R}^n)} possesses an attracting fixed point {v_T^+} and a repulsive hyperplane {H_T^<}. Also, an element {g\in G} is called {G}-proximal if and only if the matrices {\rho_i(g)} are proximal for all {1\leq i\leq d_s} (or, equivalently, {\lambda(g)\in \mathfrak{a}^{++}}).

A matrix {T\in GL_n} is {(r,\varepsilon)}-proximal whenever {T} is proximal, {d(v_T^+, H_T^<)\geq 2r}, and {d(Tx, Ty)\leq \varepsilon d(x,y)} for all {d(x,H_T^<)\geq\varepsilon}, {d(y,H_T^<)\geq\varepsilon} (where {d} is the Fubini-Study metric on the projective space {\mathbb{P}(\mathbb{R}^n)}). Moreover, {g\in G} is {(G,r,\varepsilon)}-proximal if and only if the matrices {\rho_i(g)} are {(r,\varepsilon)}-proximal for all {1\leq i\leq d_s}.

A beautiful theorem of Abels–Margulis–Soifer asserts that {(G,r,\varepsilon)}-proximal elements are really abundant: given a Zariski-dense monoid {\Gamma} of {G}, there exists {r=r(\Gamma)>0} such that for all {0<\varepsilon<r}, one can find a finite subset {F=F(\Gamma, r, \varepsilon)\subset\Gamma} with the property that for any {g\in G}, one can find {f\in F} with {gf} {(G,r,\varepsilon)}-proximal.

In the previous post of this series, we saw that the Abels–Margulis–Soifer theorem was at the heart of Breuillard–Sert's proof of the following result:

Theorem 1 If {S\subset G} is compact and the monoid {\langle S\rangle} generated by {S} is Zariski-dense in {G}, then the sequences {\frac{1}{n}\kappa(S^n)} and {\frac{1}{n}\lambda(S^n)} converge in Hausdorff topology to a compact subset {J(S)\subset \mathfrak{a}^+} called the joint spectrum of {S}.

After this brief review of the definition of the joint spectrum, let us now study some of its basic properties.

2. Convexity of the joint spectrum

Theorem 2 {J(S)} is a convex subset of {\mathfrak{a}^+}.

Remark 2 Later, we will see some sufficient conditions to get {\textrm{int}(J(S))\neq\emptyset}.

Similarly to the proof of Theorem 1, some important ideas behind the proof of Theorem 2 are:

  • the Jordan projection {\lambda} behaves well under powers: {\lambda(g^k)=k\lambda(g)};
  • the Cartan projection {\kappa} is subadditive: {\|\kappa(gh)-\kappa(g)\|=O_G(\|h\|)};
  • the Cartan and Jordan projections of proximal elements are comparable: there is a constant {C_r>0} such that {\|\lambda(g)-\kappa(g)\|\leq C_r} for all {(G,r,\varepsilon)}-proximal elements {g\in G};
  • Abels–Margulis–Soifer provides a huge supply of proximal elements.

We start to formalize these ideas with the following lemma:

Lemma 3 If {g} and {h} are {(G,r,\varepsilon)}-proximal elements, then there are {u\in\langle S\rangle} and {M>0} such that {\|\lambda(g^k u h^k u)-k\lambda(g)-k\lambda(h)\|\leq M} for all {k\in\mathbb{N}}.

Proof: After replacing {g} by the matrix {\rho_i(g)}, our task is reduced to studying the behaviour of the eigenvalues {\lambda_1(g)} of largest moduli of proximal matrices {g}.

By definition of proximality, the matrices {\frac{1}{\lambda_1(g^k)}g^k} converge to a projection {\pi_g} on {\mathbb{R}v_g^+} parallel to {H_g^<} as {k\rightarrow\infty}. Also, an analogous statement is valid for {h}. In particular, for any {u}, one has

\displaystyle \frac{g^k u h^k u}{\lambda_1(g^k)\lambda_1(h^k)}\rightarrow \pi_g u \pi_h u

as {k\rightarrow\infty}.
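The convergence of the normalised powers to a rank-one projection is easy to observe on a {2\times 2} example. The sketch below is an illustration only, assuming a matrix with real eigenvalues of distinct moduli; the helper name `normalised_power` and the sample matrix are hypothetical.

```python
import math

def mat_mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def normalised_power(g, k):
    """g^k / lambda_1(g)^k for a 2x2 matrix with real eigenvalues of distinct moduli."""
    (a, b), (c, d) = g
    tr, det = a + d, a * d - b * c
    root = math.sqrt(tr * tr - 4 * det)
    lam1 = (tr + root) / 2 if tr >= 0 else (tr - root) / 2   # eigenvalue of largest modulus
    scaled = ((a / lam1, b / lam1), (c / lam1, d / lam1))
    p = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(k):
        p = mat_mul(p, scaled)                               # uses lambda_1(g^k) = lambda_1(g)^k
    return p

if __name__ == "__main__":
    for k in (1, 5, 20):
        print(k, normalised_power(((2.0, 1.0), (1.0, 1.0)), k))
```

The limit matrix is idempotent with trace one and determinant zero, as expected from a projection onto the line {\mathbb{R}v_g^+}.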

It is not hard to show that there exists {u\in \langle S\rangle} such that {\pi_g u \pi_h u} is not nilpotent: in fact, this happens because {\langle S\rangle} is Zariski-dense and the nilpotency condition can be described in polynomial terms. In particular, {|\lambda_1(\pi_g u \pi_h u)|>0} and, by continuity, there exists {M>1} with

\displaystyle \left|\log\frac{|\lambda_1(g^k u h^k u)|}{|\lambda_1(g^k)|\cdot|\lambda_1(h^k)|}\right|\leq \log M

for all {k\in\mathbb{N}}. This ends the proof. \Box

At this point, we are ready to prove Theorem 2. Since {J(S)} is a compact subset of {\mathfrak{a}^+}, the proof of its convexity reduces to showing that {\frac{x+y}{2}\in J(S)} for all {x, y\in J(S)}.

For this sake, we begin by applying the Abels–Margulis–Soifer theorem in order to fix {r>0} and a finite subset {F\subset\langle S\rangle} so that for any {w\in G} we can find {f\in F} with {wf} {(G,r,\varepsilon)}-proximal. By definition, there exists {m_0\in\mathbb{N}} such that any {f\in F} satisfies {f\in S^{n_f}} for some {n_f\leq m_0}.

Next, we consider {x, y\in J(S)} and we recall that {J(S)=\lim\frac{1}{n}\lambda(S^n)=\lim\frac{1}{n}\kappa(S^n)}. Hence, given {\delta>0}, we have that for all {n\in \mathbb{N}} sufficiently large, there are {g, h\in S^n} with

\displaystyle |\frac{1}{n}\kappa(g)-x|<\delta \quad \textrm{and} \quad |\frac{1}{n}\kappa(h)-y|<\delta.

Now, we select {f_g, f_h\in F} with {gf_g} and {h f_h} {(G,r,\varepsilon)}-proximal. Recall that, by proximality, there exists a constant {C_r>0} with

\displaystyle \|\lambda(gf_g)-\kappa(gf_g)\|\leq C_r \quad \textrm{and} \quad \|\kappa(gf_g)-\kappa(g)\|\leq C_r

(and an analogous statement is also true for {hf_h}). Furthermore, by Lemma 3, there are {u\in\langle S\rangle}, say {u\in S^{p(n)}}, and {M>0} with

\displaystyle \|\lambda((g f_g)^k u (h f_h)^k u) - k\lambda(g f_g) - k\lambda(h f_h)\|\leq M

for all {k\in\mathbb{N}}. Observe that {(g f_g)^k u (h f_h)^k u\in S^{2kn+k(n_{f_g}+n_{f_h})+2p(n)}}.

By dividing by {2kn+k(n_{f_g}+n_{f_h})+2p(n)}, by taking {n} large (so that {n\gg m_0\geq n_{f_g}, n_{f_h}}) and by letting {k\rightarrow \infty} (so that {k\gg p(n)}), we see that

\displaystyle \left\|\frac{1}{2kn+k(n_{f_g}+n_{f_h})+2p(n)}\lambda((g f_g)^k u (h f_h)^k u)-\frac{x+y}{2}\right\|\leq 2\delta

for {n} and {k} sufficiently large.

Since {\delta>0} is arbitrary and {J(S)} is closed, this proves that {(x+y)/2\in J(S)}. This completes the proof of Theorem 2.
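For the reader who wants to see where the final bound comes from, here is a sketch of the bookkeeping (my own reconstruction, with non-optimised constants). Write {w=(gf_g)^k u (hf_h)^k u} and let {N} be the length of {w} as a word in {S}; since the elements of {F} have length at most {m_0}, one has {2kn\leq N\leq 2kn+2km_0+2p(n)}.

```latex
\begin{aligned}
\|\lambda(gf_g)-nx\| &\leq \|\lambda(gf_g)-\kappa(gf_g)\|+\|\kappa(gf_g)-\kappa(g)\|+\|\kappa(g)-nx\|
 \leq 2C_r+n\delta, \\
\|\lambda(w)-kn(x+y)\| &\leq M+4kC_r+2kn\delta, \\
\left\|\frac{\lambda(w)}{N}-\frac{x+y}{2}\right\|
 &\leq \frac{M+4kC_r+2kn\delta}{N}+\left|\frac{kn}{N}-\frac{1}{2}\right|\,\|x+y\| \\
 &\leq \frac{M}{2kn}+\frac{2C_r}{n}+\delta+\left(\frac{m_0}{2n}+\frac{p(n)}{2kn}\right)\|x+y\|.
\end{aligned}
```

The first line also holds with {(h, f_h, y)} in place of {(g, f_g, x)}, and all terms other than {\delta} in the last line become smaller than {\delta} once {n} is large and then {k} is large.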

3. Continuity properties of the joint spectrum

3.1. Domination and continuity

Definition 4 We say that {S\subset GL_d(\mathbb{R})} is {1}-dominated if there exists {\delta>0} such that

\displaystyle \frac{a_2(g)}{a_1(g)}\leq (1-\delta)^n

for all {n} sufficiently large and {g\in S^n}. (Recall that {a_1(g)\geq a_2(g)\geq\dots\geq a_d(g)} are the singular values of {g}.)
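The definition can be probed numerically by listing all words of a given length. The sketch below does this for a hypothetical pair of positive matrices in {SL_2(\mathbb{Z})}; it only inspects finitely many {n}, so it illustrates the definition rather than proving domination. The names `singular_ratio` and `worst_ratio` are assumptions made for this example.

```python
import itertools
import math

def mat_mul(p, q):
    (a, b), (c, d) = p
    (e, f), (g, h) = q
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def singular_ratio(g):
    """a_2(g)/a_1(g) for an invertible 2x2 matrix."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d             # a_1^2 + a_2^2
    det = abs(a * d - b * c)                       # a_1 * a_2
    a1_sq = (t + math.sqrt(max(t * t - 4 * det * det, 0.0))) / 2
    return det / a1_sq                             # (a_1 a_2) / a_1^2

def worst_ratio(gens, n):
    """Maximum of a_2/a_1 over all words of length n in `gens`."""
    worst = 0.0
    for word in itertools.product(gens, repeat=n):
        p = word[0]
        for g in word[1:]:
            p = mat_mul(p, g)
        worst = max(worst, singular_ratio(p))
    return worst

# Hypothetical example.
G_EX = ((2.0, 1.0), (1.0, 1.0))
H_EX = ((1.0, 1.0), (1.0, 2.0))

if __name__ == "__main__":
    for n in range(1, 9):
        print(n, worst_ratio([G_EX, H_EX], n))
```

For this pair, every word of length {n} has norm at least {2^n/\sqrt{2}} (each letter at least doubles the sum of the coordinates of a non-negative vector), so the worst ratio is at most {2\cdot 4^{-n}\leq (1/2)^n}.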

Definition 5 We say that {S\subset G} is {G}-dominated if {\rho_i(S)} is {1}-dominated for all {1\leq i\leq d_s}.

Remark 3 If {S} is {G}-dominated, then it is possible to show that the joint spectrum {J(S)} is well-defined even when {S} is not Zariski dense in {G}.

The next proposition asserts that the notion of {G}-domination generalizes the concept of matrices with simple spectrum (i.e., matrices all of whose eigenvalues have distinct moduli and multiplicity one).

Proposition 6 {S} is {G}-dominated if and only if {J(S)\subset\mathfrak{a}^{++}}.

On the other hand, the notion of {1}-domination is related to Schottky families.

Definition 7 We say that {E\subset GL_d(\mathbb{R})} is a {(r,\varepsilon)}-Schottky family if

  • (a) any {\gamma\in E} is {(r,\varepsilon)}-proximal;
  • (b) {d(v_{\gamma}^+, H_{\gamma'}^<)\geq 6\varepsilon} for all {\gamma, \gamma'\in E}.

Proposition 8 {S\subset GL_d(\mathbb{R})} is {1}-dominated {\iff} there are {n\in\mathbb{N}} and {0<\varepsilon<r} so that {S^n} is a {(r,\varepsilon)}-Schottky family.

Proof: Let us first establish the implication {\Longleftarrow}. It is not hard to see that if {S^n} is {1}-dominated, then {S} is {1}-dominated. Therefore, we can assume that {S} is a {(r,\varepsilon)}-Schottky family. At this point, we invoke the following lemma due to Breuillard–Gelander:

Lemma 9 (Breuillard–Gelander) If {g\in GL_d(\mathbb{R})} is {\varepsilon}-Lipschitz on a non-empty open subset {\Omega} of {\mathbb{P}(\mathbb{R}^d)}, then {a_2(g)/a_1(g)\leq \varepsilon/\sqrt{1-\varepsilon^2}}.

Proof: Thanks to the {KAK} decomposition, we can assume that {g=\textrm{diag}(a_1,\dots, a_d)}. Given {[v]\in\Omega} and {\delta>0} sufficiently small, our assumption on {g} implies that {d([gv], [gv+\delta ge_1])<\varepsilon d([v],[v+\delta e_1])} and {d([gv], [gv+\delta ge_2])<\varepsilon d([v],[v+\delta e_2])}. These inequalities imply the desired fact that {a_2(g)/a_1(g)\leq \varepsilon/\sqrt{1-\varepsilon^2}} after some computations with the Fubini-Study metric {d}. \Box

If {S} is a {(r,\varepsilon)}-Schottky family, then all elements of {S^n} are {\varepsilon^n}-Lipschitz on a neighborhood of any fixed {v_s^+}, {s\in S} for all {n\in\mathbb{N}}. By the previous lemma, we conclude that {a_2(g)/a_1(g)\leq 2\varepsilon^n} for all {n} sufficiently large and {g\in S^n}. Thus, {S} is {1}-dominated.

Let us now prove the implication {\Longrightarrow}. For this sake, we use a result of Bochi–Gourmelon (justifying the nomenclature “domination”): {S} is {1}-dominated if and only if there is a dominated splitting for a natural linear cocycle over the full shift dynamics on {S^{\mathbb{Z}}}, i.e.,

  • Splitting condition: there are continuous maps {E^u:S^{\mathbb{Z}}\rightarrow\mathbb{P}(\mathbb{R}^d)} and {E^s:S^{\mathbb{Z}}\rightarrow\textrm{Gr}(d-1,\mathbb{R})} such that {\mathbb{R}^d=E^u(x)\oplus E^s(x)} for all {x\in S^{\mathbb{Z}}} (here, {\textrm{Gr}(d-1,\mathbb{R})} is the Grassmannian of hyperplanes of {\mathbb{R}^d});
  • Invariance condition: {E^u(\sigma x) = x_0 (E^u(x))} and {E^s(\sigma x)=x_0(E^s(x))} for all {x=(\dots, x_{-1},x_0,x_1,\dots)\in S^{\mathbb{Z}}} (here, {\sigma} denotes the left shift dynamics {\sigma((x_i)_{i\in\mathbb{Z}}) = (x_{i+1})_{i\in\mathbb{Z}}});
  • Domination condition: the weakest contraction along {E^u} dominates the strongest expansion along {E^s}, that is, there are {C>0} and {0<\tau<1} such that {\|x_{n-1}\dots x_0|_{E^s(x)}\| \leq C\tau^n \|x_{n-1}\dots x_0|_{E^u(x)}\|} {\forall} {x=(\dots, x_{-1},x_0,x_1,\dots)\in S^{\mathbb{Z}}}.

Remark 4 For {S\subset SL(2,\mathbb{R})}, the equivalence between {1}-domination and the presence of dominated splittings was established by Yoccoz.

An important metaprinciple in Dynamics (going back to the classical proofs of the stable manifold theorem) asserts that “stable spaces depend only on the future orbit”. In our present context, this is reflected by the fact that one can show that {E^s(x)} depends only on {x_0, x_1,\dots} and {E^u(x)} depends only on {x_{-1}, x_{-2}, \dots} for all {x=(\dots, x_{-1},x_0,x_1,\dots)\in S^{\mathbb{Z}}}.

An interesting consequence of this fact is the following statement about the “non-existence of tangencies between {E^u} and {E^s}”: if {S} is {1}-dominated, then {E^u(x)\notin E^s(y)} for all {x, y\in S^{\mathbb{Z}}}. Indeed, this statement can be easily obtained by contradiction: if {E^u(x)\in E^s(y)} for some {x=(\dots, x_{-1}, x_0, x_1,\dots)} and {y=(\dots, y_{-1}, y_0, y_1,\dots)}, then {z:=(\dots, x_{-2}, x_{-1}, y_0, y_1,\dots)} has the property that {E^u(z)=E^u(x)} and {E^s(z)=E^s(y)}. Hence, {E^u(z) + E^s(z)=E^s(z)\neq \mathbb{R}^d}, a contradiction with the splitting condition above.

At this stage, we are ready to show that if {S} is {1}-dominated, then {S^n} is a {(r,\varepsilon)}-Schottky family for some {n\in\mathbb{N}} and {0<\varepsilon<r}. In fact, given {g\in S^n}, let {x(g)\in S^{\mathbb{Z}}} be the periodic sequence obtained by infinite concatenation of the word {g}. We affirm that, for {n} sufficiently large, {g} is proximal with {v_g^+= E^u(x(g))} and {H_g^<=E^s(x(g))}, and {\varepsilon}-Lipschitz outside the {\varepsilon}-neighborhood of {H_g^<}. This happens because the compactness of {S^{\mathbb{Z}}} and the non-existence of tangencies between {E^u} and {E^s} provide a uniform transversality between {E^u} and {E^s}. By combining this information with the domination condition above (and the fact that {C\tau^n\ll 1} for {n} sufficiently large), a small linear-algebraic computation reveals that any {g\in S^n} is proximal and {\varepsilon}-Lipschitz outside the {\varepsilon}-neighborhood of {H_g^<=E^s(x(g))} for adequate choices of {\varepsilon>0} and {n\in\mathbb{N}}. \Box

The proof of the previous proposition gave a clear link between {1}-domination and the notion of dominated splittings. Since dominated splittings are robust under small perturbations (because they are detected by variants of the so-called cone field criterion), a direct consequence of the proof of the proposition above is:

Corollary 10 The {G}-domination property is open: if {S} is {G}-dominated, then any {S'} included in a sufficiently small neighborhood of {S} is also {G}-dominated.

The previous proposition also links {1}-domination to Schottky families and, as it turns out, this is a key ingredient to obtain the continuity of the joint spectrum in the presence of domination.

Theorem 11 If {S_0} is {G}-dominated, then the map {S\mapsto J(S)} is continuous at {S_0}.

Very roughly speaking, the proof of this result relies on the fact that if a matrix is “very Schottky” (like a huge power of a proximal matrix), then this matrix is quite close to a rank 1 operator and, in this regime, the Jordan projection {\lambda} behaves in an “almost additive” way.

3.2. Examples of discontinuity

3.2.1. Calculation of a joint spectrum in {SL_2(\mathbb{R})}

Recall that {SL_2(\mathbb{R})} acts on the Poincaré disk {\mathbb{D}} by isometries of the hyperbolic metric. Consider {S=\{a,b\}}, where {a} and {b} are loxodromic elements of {SL_2(\mathbb{R})} acting by translations along disjoint oriented geodesic axes {\rho_a} and {\rho_b} on {\mathbb{D}} from {x_a^-\in\partial\mathbb{D}} to {x_a^+\in\partial\mathbb{D}} and from {x_b^-\in\partial\mathbb{D}} to {x_b^+\in\partial\mathbb{D}}. We assume that the endpoints of the axes {\rho_a} and {\rho_b} are cyclically ordered on {\partial\mathbb{D}} as {x_a^-, x_b^-, x_b^+, x_a^+}, and we denote by {\tau_a=2\log\lambda_1(a)} and {\tau_b=2\log\lambda_1(b)} the translation lengths of {a} and {b} along {\rho_a} and {\rho_b}.

In the sequel, we want to compute {J(S)\subset\mathbb{R}} and, for this sake, we need to understand {\frac{1}{n}\log\lambda_1(w(a,b))} where {w(a,b)} is a word of length {n} on {a} and {b}.

Proposition 12 If {a} and {b} are elements of {SL_2(\mathbb{R})} as above, {d>0} denotes the distance between the axes of {a} and {b}, and {\tau_b=\tau_a+2d+1}, then {J(S)} is the interval

\displaystyle J(S)=[\tau_a/2, \tau_b/2].

Proof: One can show (using hyperbolic geometry) that {ab} is a loxodromic element whose axis stays between the axes of {a} and {b} while going from a point in {[x_a^-, x_b^-]} to a point in {[x_b^+, x_a^+]}, and the translation length of {ab} satisfies

\displaystyle \cosh(\tau_{ab}/2) = \cosh(d)\sinh(\tau_a/2)\sinh(\tau_b/2)+\cosh(\tau_a/2)\cosh(\tau_b/2).

In particular, {\tau_a+\tau_b\leq \tau_{ab}\leq \tau_a+\tau_b+2d}.
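The two-sided bound can be read off from the displayed formula: since {1\leq\cosh(d)}, the right-hand side lies between {\cosh((\tau_a+\tau_b)/2)} and {\cosh(d)\cosh((\tau_a+\tau_b)/2)\leq\cosh(d+(\tau_a+\tau_b)/2)}. As a quick numerical sanity check (a sketch which is not part of the talk, with sample values of {\tau_a}, {\tau_b}, {d} chosen arbitrarily), one can evaluate the formula directly:

```python
import math

def tau_ab(tau_a, tau_b, d):
    """Translation length of ab, solved from the displayed cosh formula."""
    c = (math.cosh(d) * math.sinh(tau_a / 2) * math.sinh(tau_b / 2)
         + math.cosh(tau_a / 2) * math.cosh(tau_b / 2))
    return 2 * math.acosh(c)

def check_bounds(tau_a, tau_b, d, tol=1e-9):
    """True when tau_a + tau_b <= tau_ab <= tau_a + tau_b + 2d (up to rounding)."""
    t = tau_ab(tau_a, tau_b, d)
    return tau_a + tau_b - tol <= t <= tau_a + tau_b + 2 * d + tol

# Arbitrary sample values (tau_a, tau_b, d), only for illustration.
samples = [(0.5, 1.0, 0.3), (1.0, 4.0, 1.0), (2.0, 2.0, 0.01), (0.1, 7.0, 3.0)]
for tau_a, tau_b, d in samples:
    print(tau_a, tau_b, d, tau_ab(tau_a, tau_b, d), check_bounds(tau_a, tau_b, d))
```

When {d=0} the formula reduces to the addition formula for {\cosh}, so that {\tau_{ab}=\tau_a+\tau_b} in that degenerate case.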

We claim that if {w(a,b)} is a word on {a} and {b} and {\widetilde{w}(a,b)} is a word obtained from {w(a,b)} by replacing some letter {a} by {b}, then {\tau_{\widetilde{w}}\geq \tau_w+1}. In fact, by performing a conjugation if necessary, we can assume that {w=w'a} and {\widetilde{w}=w'b}, so that {\tau_{\widetilde{w}}\geq\tau_{w'}+\tau_b=\tau_{w'}+\tau_a+2d+1} and {\tau_w\leq\tau_{w'}+\tau_a+2d}.

Therefore, if we start in {S^n} with {a^n} and we successively replace {a} by {b} until we reach {b^n}, then we see from the claim in the previous paragraph that {\frac{1}{n}\lambda(S^n)} becomes denser in {[\tau_a/2, \tau_b/2]} as {n\rightarrow\infty}. This proves that {J(S)=[\tau_a/2,\tau_b/2]}. \Box

3.2.2. Some joint spectra in {GL_2(\mathbb{R})}

Let {a, b\in SL_2(\mathbb{R})} be as above and fix {\alpha>0}. We assume that there exists {k\in\mathbb{N}} such that {b^{-1}=Ra^kR} where {R} is the rotation by {\pi/2}.

The joint spectrum {J(S_{\infty})} of {S_{\infty}=\{\alpha\cdot\textrm{Id}, a, b\}} in the plane with axes {\log\lambda_1} and {\log\lambda_2} is a triangle with vertex at {(\log\alpha, \log\alpha)}, intersecting the {\log\lambda_1}-axis on the interval {[\log\lambda_1(a), \log\lambda_1(b)]}, and the side opposite to the vertex {(\log\alpha, \log\alpha)} contained in the line {\log\lambda_2=-\log\lambda_1}. Indeed, one eventually gets this description of {J(S_{\infty})} because {S_{\infty}^n}, {n\in\mathbb{N}}, can be computed explicitly in terms of the joint spectrum of {\{a,b\}} thanks to the fact that {\alpha\cdot\textrm{Id}} commutes with {a} and {b}. Note that {(0,0)\notin J(S_{\infty})}.

Let us now consider {S_m=\{\alpha\cdot R_m, a, b\}}, where {R_m} denotes the rotation by {\pi/2m}. We claim that {(0,0)\in J(S_m)} for all {m\in\mathbb{N}} and, a fortiori, {J(.)} is discontinuous at {S_{\infty}} (because {S_m\rightarrow S_{\infty}} as {m\rightarrow\infty}). In fact, given {m\in\mathbb{N}}, since {b^{-1}=Ra^kR}, the word

\displaystyle w_n = b^n(\alpha\cdot R_m)^m a^{kn} (\alpha\cdot R_m)^m\in S_m^{(k+1)n+2m}

is equal to {\alpha^{2m}\cdot \textrm{Id}}. Therefore,

\displaystyle \frac{2m(\log\alpha,\log\alpha)}{(k+1)n+2m}=\frac{1}{(k+1)n+2m}\lambda(w_n)\in\frac{1}{(k+1)n+2m}\lambda(S_m^{(k+1)n+2m})

and, by letting {n\rightarrow\infty}, we conclude that {(0,0)\in J(S_m)}, as desired.
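The cancellation behind {w_n} can be tested numerically. The sketch below is not from the talk: the matrix {a} and the values of {k}, {m}, {\alpha} are hypothetical choices, and I take {R} to be the matrix of the rotation by {\pi/2}, so that {R^2=-\textrm{Id}}. Under this assumption a sign {(-1)^{n-1}} shows up in front of {\alpha^{2m}\cdot\textrm{Id}}; it does not affect the moduli of the eigenvalues, hence it is harmless for {\lambda(w_n)}.

```python
import math

def mul(A, B):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def power(A, n):
    """n-th power of a 2x2 matrix (n >= 0) by repeated multiplication."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        result = mul(result, A)
    return result

def inverse(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

# Hypothetical data, chosen only for illustration.
a = ((2.0, 1.0), (1.0, 1.0))              # hyperbolic: det = 1, trace = 3
k, m, alpha = 2, 3, 1.5
R = ((0.0, -1.0), (1.0, 0.0))             # rotation by pi/2, so R*R = -Id
b = inverse(mul(mul(R, power(a, k)), R))  # chosen so that b^{-1} = R a^k R
theta = math.pi / (2 * m)
alpha_R_m = ((alpha * math.cos(theta), -alpha * math.sin(theta)),
             (alpha * math.sin(theta), alpha * math.cos(theta)))

def w(n):
    """The word b^n (alpha R_m)^m a^{kn} (alpha R_m)^m of length (k+1)n + 2m."""
    block = power(alpha_R_m, m)           # equals alpha^m R up to rounding
    return mul(mul(mul(power(b, n), block), power(a, k * n)), block)

def normalized_jordan(n):
    """Common value of both coordinates of lambda(w_n) / ((k+1)n + 2m)."""
    return 2 * m * math.log(alpha) / ((k + 1) * n + 2 * m)

for n in (1, 2, 3):
    print(n, w(n), normalized_jordan(n))
```

The printed matrices are scalar (up to rounding errors), and the normalized Jordan projections decrease to {0} as {n} grows, in agreement with the conclusion {(0,0)\in J(S_m)}.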

4. Prescribing the joint spectrum

We close this post with a brief sketch of the following result:

Theorem 13

  • (1) If {\mathcal{C}} is a convex body in {\mathfrak{a}^+}, there exists a compact subset {S} of {G} generating a Zariski-dense monoid such that {J(S)=\mathcal{C}}.
  • (2) Moreover, if {\mathcal{C}} is a convex polyhedron with a finite number of vertices, then there exists a finite subset {S\subset G} generating a Zariski-dense monoid such that {J(S)=\mathcal{C}}.

Proof: (1) If we forget about the Zariski-denseness condition, then we could take simply {S=\exp(\mathcal{C})}. In order to respect the Zariski-density constraint, we fix {a_0\in\textrm{int}(\mathcal{C})} and we set {S=\exp(\mathcal{C})\cup \exp(a_0)V} where {V} is a small neighborhood of the identity. In this way, the monoid generated by {S} is Zariski-dense and it is possible to check that {J(S)=\mathcal{C}} whenever {V} is sufficiently small.

(2) Given a finite set {\mathcal{C}_0} whose convex hull is {\mathcal{C}}, we can take {S=\exp(\mathcal{C}_0)\cup \exp(a_0) F} where {F\subset V} is a finite set with sufficiently many points so that the monoid generated by {S} is Zariski-dense. \Box

Posted by: matheuscmss | December 22, 2019

Dartyge’s talk on ellipsephic integers

Last November, I attended the beautiful conference Prime Numbers, Determinism and Pseudorandomness at CIRM. This conference was originally planned to celebrate the 60th birthday of Christian Mauduit, but unfortunately, after a tragic event during the summer of 2019, it ended up becoming a celebration of the memory of Christian.

The links to the titles, abstracts, slides and videos for the talks of this excellent meeting can be found here.

In this blog post, I would like to transcribe my notes for the amazing survey talk “On ellipsephic integers” by Cécile Dartyge on one of Christian’s favorite topics in Analytic Number Theory, namely, the statistics of integers missing some digits.

Of course, all mistakes in the sequel are my sole responsibility.

1. Introduction

The term “ellipsephic integers” refers to integers with missing digits in a certain basis (e.g., all integers whose representation in basis 10 doesn’t contain the digit 9). Christian Mauduit proposed this nomenclature partly because ellipsis = missing and psiphic = digit in Greek.

Formally, we consider a basis {r\in\mathbb{N}}, {r\geq 3}, and a subset {\mathcal{D}} of {\{0, 1, \dots, r-1\}} of cardinality {2\leq\#\mathcal{D}\leq r-1}. The corresponding set of ellipsephic integers {W_{\mathcal{D}}} is

\displaystyle W_{\mathcal{D}}:=\left\{n=\sum\limits_{j=0}^k a_j r^j: a_j\in\mathcal{D} \, \, \,\forall\,0\leq j\leq k\right\}.

The subset of ellipsephic integers below a certain threshold {x} is denoted by

\displaystyle W_{\mathcal{D}}(x):=\{n\in W_{\mathcal{D}}: n< x\}.
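To make the definition concrete, here is a minimal brute-force sketch listing {W_{\mathcal{D}}(x)}. The function names are mine, and I use the convention that {0} belongs to the set as soon as {0\in\mathcal{D}}.

```python
def is_ellipsephic(n, r, digits):
    """True when every base-r digit of n belongs to `digits`."""
    if n == 0:
        return 0 in digits
    while n > 0:
        n, d = divmod(n, r)
        if d not in digits:
            return False
    return True

def ellipsephic_below(x, r, digits):
    """Sorted list of the ellipsephic integers n < x."""
    return [n for n in range(x) if is_ellipsephic(n, r, digits)]

# Integers below 100 without the digit 9 in basis 10.
print(len(ellipsephic_below(100, 10, set(range(9)))))
# Integers below 27 whose digits in basis 3 are 0 or 1.
print(ellipsephic_below(27, 3, {0, 1}))
```

When {0\in\mathcal{D}} and {x=r^N}, each of the {N} digits can be chosen freely in {\mathcal{D}}, so that {\#W_{\mathcal{D}}(r^N)=(\#\mathcal{D})^N}; this is the sparseness alluded to in the next section.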

For the sake of exposition, we shall assume from now on that {0\in\mathcal{D}} and

\displaystyle gcd(d\in\mathcal{D})=1

unless it is explicitly said otherwise.

2. Ellipsephic integers on arithmetic progressions

Let {W_{\mathcal{D}}(x,a,q) := \{n\in W_{\mathcal{D}}(x): n\equiv a \, (\textrm{mod } q)\}}. Despite their sparseness, it was proved by Erdös, Mauduit and Sárközy that ellipsephic integers behave well (i.e., “à la Siegel-Walfisz”) along arithmetic progressions:

Theorem 1 (Erdös–Mauduit–Sárközy) There are two constants {c_1=c_1(r,\mathcal{D})} and {c_2=c_2(r,\mathcal{D})} such that

\displaystyle \left|\#W_{\mathcal{D}}(x,a,q)-\frac{\#W_{\mathcal{D}}(x)}{q}\right|\ll_{r,\mathcal{D}}\frac{\#W_{\mathcal{D}}(x)}{q}\exp\left(-c_2\frac{\log x}{\log q}\right)

for all {a\in\mathbb{N}}, {gcd(q,r(r-1))=1}, {q\leq \exp(c_1\sqrt{\log x})} and {x} sufficiently large.

Proof: As is usual in this kind of counting problem, one relies on exponential sums. More precisely, note that

\displaystyle \#W_{\mathcal{D}}(x,a,q) = \frac{1}{q}\sum\limits_{h=1}^q\sum\limits_{n\in W_{\mathcal{D}}(x)} e\left(\frac{h(n-a)}{q}\right)

where {e(t):=\exp(2\pi i t)}. The “main term” {\#W_{\mathcal{D}}(x)/q} comes from the case {h=q}, so that our task consists in estimating the “error term”. To this end, one has essentially to study

\displaystyle F_{N,\mathcal{D}}(h/q):=\sum\limits_{n\in W_{\mathcal{D}}(x)} e\left(\frac{hn}{q}\right)

where {x:=r^N}. Observe that

\displaystyle F_{N,\mathcal{D}}(h/q)=\prod\limits_{j=0}^{N-1}\left(\sum\limits_{d\in\mathcal{D}}e\left(\frac{hdr^j}{q}\right)\right):=\prod\limits_{j=0}^{N-1} u_{\mathcal{D}}(hr^j/q).
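This factorization reflects the fact that, when {0\in\mathcal{D}}, an element of {W_{\mathcal{D}}(r^N)} is the same as a free choice of {N} digits in {\mathcal{D}} (with leading zeros allowed), one for each position {j=0,\dots,N-1}. The following sketch (with hypothetical parameters {r=4}, {\mathcal{D}=\{0,1,3\}}, {N=4}, {q=7}) compares the direct sum with the product over all {N} digit positions, and also recovers the number of ellipsephic integers in a residue class from the exponential sums:

```python
import cmath

def e(t):
    return cmath.exp(2j * cmath.pi * t)

def has_digits_in(n, r, digits):
    """True when all base-r digits of n lie in `digits` (assumes 0 in digits)."""
    while n > 0:
        n, d = divmod(n, r)
        if d not in digits:
            return False
    return True

def F_direct(N, r, digits, h, q):
    """Sum of e(hn/q) over the ellipsephic integers n < r^N."""
    return sum(e(h * n / q) for n in range(r ** N) if has_digits_in(n, r, digits))

def F_product(N, r, digits, h, q):
    """Product of the digit sums u_D(h r^j / q), one factor per position j = 0..N-1."""
    value = 1
    for j in range(N):
        value *= sum(e(h * d * r ** j / q) for d in digits)
    return value

def count_via_sums(N, r, digits, a, q):
    """#W_D(r^N, a, q) computed as (1/q) * sum over h = 1..q of e(-ha/q) F(h/q)."""
    total = sum(e(-h * a / q) * F_product(N, r, digits, h, q) for h in range(1, q + 1))
    return (total / q).real

N, r, digits, q = 4, 4, (0, 1, 3), 7
for h in range(1, q + 1):
    print(h, abs(F_direct(N, r, digits, h, q) - F_product(N, r, digits, h, q)))
print([round(count_via_sums(N, r, digits, a, q)) for a in range(q)])
```

The term {h=q} of the last sum contributes {\#W_{\mathcal{D}}(r^N)/q=81/7} to each residue class, and the remaining terms account for the deviation from this average.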

The terms {u_{\mathcal{D}}(hr^j/q)} are controlled thanks to the following lemma (giving some saving over the trivial bound {|u_{\mathcal{D}}(\alpha)|\leq \#\mathcal{D}} for all {\alpha\in\mathbb{R}}):

Lemma 2 (Erdös–Mauduit–Sárközy) Let {t:=\#\mathcal{D}}. For any {\alpha\in\mathbb{R}}, one has

\displaystyle |u_{\mathcal{D}}(\alpha)|\leq t\left(1-\frac{\|\alpha\|^2}{(r-1)^5}\right)

where {\|\alpha\|:=\min\limits_{n\in\mathbb{Z}}|\alpha-n|}.

In order to take full advantage of the saving on the right-hand side of the inequality, one needs the following lemma:

Lemma 3 (Mauduit–Sárközy) For any {\beta\in\mathbb{R}} and {q\leq r^{(N-8)/2}}, one has

\displaystyle \sum\limits_{0\leq k<N}\left\|\beta+\frac{hr^k}{q}\right\|\geq \frac{(r-1)^2}{128}\frac{N}{\log q}

The details of the derivation of the desired theorem from the two lemmas above are explained in Section 4 of the Erdös–Mauduit–Sárközy paper. \Box

The methods of Erdös–Mauduit–Sárközy above paved the way to further results about ellipsephic integers. For instance, similarly to the Bombieri–Vinogradov theorem, it is natural to expect that the distribution result of Erdös–Mauduit–Sárközy gets better on average: as it turns out, this was done independently by C. Dartyge and C. Mauduit, and S. Konyagin (circa 2000):

Theorem 4 (Dartyge–Mauduit, Konyagin) There exists {\alpha=\alpha(r,t)>0} such that for all {B>0} there exists {A>0} with the property that

\displaystyle \sum\limits_{\substack{q\leq x^{\alpha}/(\log x)^A,\\ \textrm{gcd}(q,r(r-1))=1}}\left|\#W_{\mathcal{D}}(x,a,q)-\frac{\#W_{\mathcal{D}}(x)}{q}\right|\ll \frac{\#W_{\mathcal{D}}(x)}{(\log x)^B}.

Proof: One uses Lemmas 2 and 3 above, a large sieve method, and some bounds on the moments

\displaystyle \int_0^1 |F_{N,\mathcal{D}}(s)|^m \, ds

of the function {F_{N,\mathcal{D}}}. \Box

Remark 1 More recently, K. Aloui, C. Mauduit and M. Mkaouar improved (in 2017) some of the results of Erdös–Mauduit–Sárközy to obtain some distribution results for ellipsephic and palindromic integers.

3. Ellipsephic primes and almost primes

By pursuing sieve methods, Dartyge and Mauduit obtained in 2001 the following result about ellipsephic almost primes:

Theorem 5 (Dartyge–Mauduit) There exists {k=k(r, t)\in\mathbb{N}} such that

\displaystyle \#\{n\in W_{\mathcal{D}}(x): \Omega(n)\leq k\}\gg\frac{\#W_{\mathcal{D}}(x)}{\log x},

where {\Omega(n)} stands for the number of prime factors of {n} (counted with multiplicity).

A natural question motivated by this theorem concerns the determination of explicit values of {k} in the previous statement. The answer to this question is somewhat related to the value of {\alpha} in the last theorem of the previous section and, in this direction, it is possible to show that

  • if {\mathcal{D}=\{0,1\}}, then one can take
    • {k=3} (and {\alpha\sim1/3}) for {r=3},
    • {k=5} (and {\alpha\sim 1/4}) for {r=4}, …,
    • {k=23} for {r=10}, and
    • {k=\frac{8}{\pi}(1+o(1))r} as {r\rightarrow\infty}
  • if {\mathcal{D}=\{0,\dots,r-2\}}, {r\geq 5}, then one can take {k=2}.

In 2009 and 2010, C. Mauduit and J. Rivat proved two conjectures of Gelfond on sums of digits of primes and squares. The methods in these articles gave hope to reach the case {k=1} (of ellipsephic primes) in the Dartyge–Mauduit theorem above. This was recently accomplished by J. Maynard in 2016: if {\mathcal{D}=\{0,\dots,r-1\}\setminus\{a_0\}} and {r\geq 10}, then

\displaystyle \#\{p\in W_{\mathcal{D}}(x): p \textrm{ prime}\}\gg \frac{\#W_{\mathcal{D}}(x)}{\log x}

Remark 2 In his thesis, A. Irving got analogous results for palindromic ellipsephic integers with {3} digits in basis {r\geq 4} with two prime factors.

After this brief discussion of ellipsephic almost primes, let us now talk about ellipsephic integers possessing only small prime factors.

4. Friable ellipsephic integers

Recall that a friable integer is an integer without large prime factors. For later reference, we denote the largest prime factor of {n} by

\displaystyle p^+(n):=\max\limits_{\substack{p|n\\p\textrm{ prime}}} p.

It was shown by Erdös–Mauduit–Sárközy that, for any fixed {a\in\mathcal{D}\setminus\{0\}} and for all {\varepsilon>0}, there are infinitely many ellipsephic integers {n\in W_{\mathcal{D}}} of the form {n=\sum\limits_{i=0}^{k-1}ar^i} whose largest prime factor is {p^+(n)\leq n^{\varepsilon}}.

Naturally, this result motivates the question of establishing the existence of a positive proportion of friable ellipsephic integers. This seems to be a hard task for arbitrary {\varepsilon>0}, but the problem becomes more tractable for small values of {\varepsilon} when the basis {r} is large enough.

In fact, S. Col showed that there exists {0<\alpha=\alpha(r,\mathcal{D})<1} such that

\displaystyle \#\{n\in W_{\mathcal{D}}(x): p^+(n)<n^{\alpha}\}\gg\#W_{\mathcal{D}}(x).

Moreover, if {\mathcal{D}=\{0,1\}}, then it is possible to take {\alpha=1-\frac{\pi}{4r}\left(1-\frac{3\pi}{4r}\right)} (which is close to one for {r} large). On the other hand, {\alpha} can be taken very small when {\mathcal{D}=\{0,\dots,r-1\}\setminus\{a_0\}} and {r} sufficiently large.

5. Ellipsephic solutions to Vinogradov systems

A Vinogradov system is a system of equations on the variables {x_1,\dots, x_s, y_1, \dots, y_s} of the form:

\displaystyle x_1^j+\dots+x_s^j = y_1^j+\dots+y_s^j, \quad 1\leq j\leq k.

A major breakthrough on counting solutions to Vinogradov systems was famously obtained by J. Bourgain, C. Demeter and L. Guth (see also the text and the video of L. Pierce’s Bourbaki seminar talk on this subject).

Concerning ellipsephic solutions to Vinogradov systems (i.e., solutions with {x_i, y_i\in W_{\mathcal{D}}} for all {1\leq i\leq s}), Kirsty Briggs showed that for {k=2}, {r=p} prime and {\mathcal{D}=\{d^2: d<\sqrt{p}\}}, the trivial bound {\leq \#W_{\mathcal{D}}(x)^{2s}} on the number of solutions of the Vinogradov system

\displaystyle x_1+\dots+ x_s=y_1+\dots+y_s,

\displaystyle x_1^2+\dots+x_s^2=y_1^2+\dots+y_s^2

with {x_i, y_i\in W_{\mathcal{D}}(x)} can be improved into {\ll_{\varepsilon} \#W_{\mathcal{D}}(x)^{2s-6+\varepsilon}} {\forall \,\varepsilon>0} whenever {s\geq 6}. (In particular, this result is saying that in the case {s=6}, the main contribution to the number of ellipsephic solutions of the corresponding Vinogradov system comes from the trivial solutions {x_i=y_i\in W_{\mathcal{D}}(x)}.)
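For very small parameters, the number of solutions of this quadratic system can be counted by brute force. The sketch below is only an illustration (it is not Briggs's method): it groups the {s}-tuples according to the pair (sum, sum of squares) and adds up the squares of the sizes of the groups. The choice {p=5}, {\mathcal{D}=\{0,1,4\}} and {x=25} is a hypothetical toy example.

```python
from collections import Counter
from itertools import product

def has_digits_in(n, r, digits):
    """True when all base-r digits of n lie in `digits` (assumes 0 in digits)."""
    while n > 0:
        n, d = divmod(n, r)
        if d not in digits:
            return False
    return True

def count_solutions(values, s):
    """Number of (x, y) in values^s x values^s with equal sums and equal sums of squares."""
    signatures = Counter(
        (sum(x), sum(v * v for v in x)) for x in product(values, repeat=s)
    )
    return sum(c * c for c in signatures.values())

# Toy example: r = p = 5 and D = {d^2 : d < sqrt(5)} = {0, 1, 4}.
values = [n for n in range(25) if has_digits_in(n, 5, {0, 1, 4})]
print(len(values), count_solutions(values, 2), count_solutions(values, 3))
```

For {s=2}, the two equations force {\{x_1,x_2\}=\{y_1,y_2\}}, so that a set with {M} elements gives exactly {2M^2-M} solutions. In general, the solutions with {x_i=y_i} for all {i} already provide {M^s} solutions.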

6. Ellipsephic numbers in finite fields

The notion of finite-field analogs of ellipsephic numbers was studied by several authors including Dartyge, Mauduit and Sárközy.

In order to explain some results in this direction, let us set up some notation. Let {q=p^r} be a power of a prime number {p} and denote by {z\in \mathbb{F}_q} a primitive element generating a basis {\mathcal{B}=\{1,z,\dots, z^{r-1}\}} of {\mathbb{F}_q} over {\mathbb{F}_p}. In this way, we can represent a number {x\in\mathbb{F}_q} as

\displaystyle x=\sum\limits_{i=0}^{r-1} c_i z^i \quad \textrm{ with }0\leq c_i<p.

Given a set of digits {\mathcal{D}\subset\{0,\dots, p-1\}} with {2\leq \#\mathcal{D}\leq p-1}, the associated subset of ellipsephic numbers is

\displaystyle W_{\mathcal{D}}:=\left\{\sum\limits_{i=0}^{r-1} c_iz^i: c_i\in\mathcal{D} \,\,\,\forall\, 0\leq i\leq r-1\right\}

Given a polynomial {f(x)\in\mathbb{F}_q[x]}, we can study the set of its ellipsephic values via the set

\displaystyle W_{\mathcal{D}}(f):=\{a\in\mathbb{F}_q: f(a)\in W_{\mathcal{D}}\}.

The size of {W_{\mathcal{D}}(f)} is described by the following theorem:

Theorem 6 (Dartyge–Mauduit–Sárközy) If {\textrm{deg}(f)=n}, then

\displaystyle \left|\#W_{\mathcal{D}}(f)-\#W_{\mathcal{D}}\right|\leq \frac{n-1}{\sqrt{q}}\left(\#\mathcal{D}+p\sqrt{p-\#\mathcal{D}}\right)

This result is especially interesting when {\mathcal{D}} contains a positive proportion of {\mathbb{F}_p}. Moreover, it can be improved when {\mathcal{D}} contains consecutive digits.

More recently, a better result was obtained by R. Dietmann, C. Elsholtz and I. Shparlinski for the case {f(x)=x^2}. Finally, the reader can consult the work of C. Swaenepoel for further results.

Posted by: matheuscmss | December 21, 2019

Breuillard–Sert’s joint spectrum (II)

In the previous post of this series, we gave the statements of some of the results of Breuillard and Sert on the definition and basic properties of the joint spectrum, and we promised to discuss the proofs in subsequent posts.

Today, after a long hiatus, I’ll try to fulfill part of this promise. More precisely, I’ll transcribe below my notes for the two talks (by Rodolfo Gutiérrez-Romo and myself) aiming to explain to the participants of our groupe de travail the proof of the first portion of Theorem 5 in the previous post, i.e., the convergence of {\frac{1}{n}\kappa(S^n)} (cf. Theorem 5 below), the convergence of {\frac{1}{n}\lambda(S^n)} (cf. Theorem 7 below), and the equality of the limits (cf. Theorem 9 below).

Evidently, all mistakes in what follows are my sole responsibility.

1. Spectral radius formula revisited

Let {G} be a reductive linear algebraic group. Recall that the Cartan projection {\kappa:G\rightarrow\mathfrak{a}^+} and the Jordan projection {\lambda:G\rightarrow\mathfrak{a}^+} were defined in the previous post via the Cartan decomposition {g\in K\exp(\kappa(g))K} and the Jordan–Chevalley decomposition {g=g_e g_h g_u} with {g_e} elliptic, {g_u} unipotent, and {g_h} “hyperbolic” conjugate to {\exp(\lambda(g))}.

The semisimple rank {d_s} of {G} is {d_s=\textrm{dim}(A)-\textrm{dim}(Z(G))} where {A} is a maximal torus of {G} and {Z(G)} is the center of {G}. We denote by {\overline{\alpha}_i\in\textrm{Hom}(\mathfrak{a},\mathbb{R})}, {1\leq i\leq d:=\textrm{dim}(A)}, a system of roots such that {\Pi:=\{\overline{\alpha}_1,\dots, \overline{\alpha}_{d_s}\}} is a base of simple roots.

Each {\overline{\alpha}\in\Pi} induces a weight {\omega_{\overline{\alpha}}\in\textrm{Hom}(\mathfrak{a},\mathbb{R})} satisfying {\omega_{\overline{\alpha}}|_{\mathfrak{a}_Z}=0} and {\langle\omega_{\overline{\alpha}},\overline{\beta}\rangle=0} for all {\overline{\beta}\in\Pi\setminus\{\overline{\alpha}\}}, where {\mathfrak{a}_Z} is the Lie subalgebra of {A\cap Z(G)} and {\langle.,.\rangle} is a fixed extension of the Killing form on the Lie subalgebra {\mathfrak{a}_S} of {A\cap [G,G]} to the Lie algebra {\mathfrak{a}} of {A} such that {\mathfrak{a}=\mathfrak{a}_S\oplus\mathfrak{a}_Z} becomes an orthogonal decomposition.

The weights {\omega_{\overline{\alpha}_i}}, {1\leq i\leq d_s}, are the highest weights of distinguished representations {\rho_i}, {1\leq i\leq d_s} of {G}. One has

\displaystyle \omega_{\overline{\alpha}_i}(\kappa(g)) = \log \|\rho_i(g)\|_i \textrm{ and } \omega_{\overline{\alpha}_i}(\lambda(g)) = \log |\lambda_1(\rho_i(g))|

where {\|.\|_i} is a choice of {\rho_i(K)}-invariant norm with {\rho_i(A)} diagonalisable in an orthonormal basis and {\rho_i(G)} stable under the adjoint operation, and {\lambda_1(M)} denotes the top eigenvalue of a matrix {M}. In particular, the Cartan projection {\kappa(g)} is represented by a vector of logarithms of norms of matrices, the Jordan projection {\lambda(g)} is represented by a vector of the logarithms of the moduli of top eigenvalues of matrices, and, a fortiori, the usual formula for the spectral radius implies that:

Lemma 1 One has

\displaystyle \lim\limits_{n\rightarrow\infty}\frac{1}{n}\kappa(g^n) = \lambda(g)

for every {g\in G}.
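In the model case {G=GL_2(\mathbb{R})}, one can identify {\kappa(g)} with the logarithms of the singular values of {g} and {\lambda(g)} with the logarithms of the moduli of the eigenvalues of {g} (this identification is the standard one, but it is an assumption of the sketch below and not a statement taken from the talks). With a hypothetical non-normal matrix {g}, the convergence in Lemma 1 can be observed as follows:

```python
import math

def mul(A, B):
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def power(A, n):
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        result = mul(result, A)
    return result

def top_singular_value(A):
    """Largest singular value of a 2x2 matrix (closed formula)."""
    (a, b), (c, d) = A
    s = a * a + b * b + c * c + d * d   # squared Frobenius norm = s1^2 + s2^2
    det = a * d - b * c                 # |det| = s1 * s2
    return math.sqrt((s + math.sqrt(max(s * s - 4 * det * det, 0.0))) / 2)

def cartan_of_power(g, n):
    """kappa(g^n)/n as (log s1, log s2)/n, using s1 * s2 = |det g|^n."""
    (a, b), (c, d) = g
    log_s1 = math.log(top_singular_value(power(g, n)))
    log_s2 = n * math.log(abs(a * d - b * c)) - log_s1
    return (log_s1 / n, log_s2 / n)

def jordan(g):
    """lambda(g) as the logarithms of the moduli of the two eigenvalues."""
    (a, b), (c, d) = g
    t, det = a + d, a * d - b * c
    disc = t * t - 4 * det
    if disc >= 0:
        big = (abs(t) + math.sqrt(disc)) / 2
        small = abs(det) / big
    else:                               # complex conjugate eigenvalues
        big = small = math.sqrt(det)
    return (math.log(big), math.log(small))

g = ((2.0, 5.0), (0.0, 0.5))            # hypothetical example, eigenvalues 2 and 1/2
for n in (1, 10, 100):
    print(n, cartan_of_power(g, n), jordan(g))
```

For this particular {g}, the gap between {\frac{1}{n}\kappa(g^n)} and {\lambda(g)} decays like {1/n}, which is the rate one expects from the spectral radius formula for a diagonalisable matrix.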

2. Proximal elements and Cartan projections

As we indicated in § 2.1 of the previous post of this series, the convergence of {\frac{1}{n}\kappa(S^n)} relies on the notion of proximal matrices.

Definition 2 Let {d([x],[y]):=\frac{\|x\wedge y\|}{\|x\|\cdot\|y\|}} be the Fubini-Study metric on the projective space {\mathbb{P}(V)} of a finite-dimensional real vector space {V} equipped with a Euclidean norm {\|.\|}. Given {0<\varepsilon\leq r}, we say that {g\in GL(V)} is an {(r,\varepsilon)}-proximal matrix whenever:

  • {g} has a unique eigenvalue of maximal modulus with eigendirection {v_g^+\in\mathbb{P}(V)} and {g}-invariant supplementary hyperplane {H_g^{<}\subset\mathbb{P}(V)};
  • {d(v_g^+, H_g^{<})\geq 2r};
  • {d(gx, gy)\leq \varepsilon d(x,y)} for all {x, y\in\mathbb{P}(V)} with {d(x,H_g^{<})\geq \varepsilon} and {d(y,H_g^{<})\geq \varepsilon}.

In general, we say that an element {g\in G} of a reductive linear algebraic group {G} with distinguished representations {\rho_i}, {1\leq i\leq d_s}, is {(G,r,\varepsilon)}-proximal when the matrices {\rho_i(g)} are {(r,\varepsilon)}-proximal for all {1\leq i\leq d_s}.

A basic feature of proximal elements is the fact that their Cartan and Jordan projections are comparable (cf. Lemmas 2.15 and 2.16 of the Breuillard–Sert paper, extracted from Benoist’s paper).

Lemma 3 There is a constant {C_G>0} such that

\displaystyle \|\kappa(h_1\dots h_n)\|\leq C_G(\|\kappa(h_1)\|+\dots+\|\kappa(h_n)\|)

and

\displaystyle \|\kappa(h_1 g h_2) - \kappa(g)\|\leq C_G(\|\kappa(h_1)\|+\|\kappa(h_2)\|)

for all {g,h_1,h_2,\dots, h_n\in G}. For each {r>0}, there is a constant {C_r>0} such that the Cartan and Jordan projections of any {(G,r,\varepsilon)}-proximal element {g\in G} satisfy

\displaystyle \|\kappa(g)-\lambda(g)\|\leq C_r.

Another crucial feature of proximal elements (discovered by Abels, Margulis and Soifer, see Theorem 4.1 of their paper) is their ubiquity in Zariski dense monoids:

Theorem 4 (Abels–Margulis–Soifer) Let {G} be a connected, reductive, real Lie group. Suppose that {\Gamma\subset G} is a Zariski dense monoid. Then, there exists {r=r(\Gamma)>0} such that, for all {0<\varepsilon\leq r}, there exists a finite subset {F=F(\Gamma, r,\varepsilon)\subset \Gamma} with the property: given {g\in G}, there exists {f\in F} so that {gf} is {(G,r,\varepsilon)}-proximal.

At this point, we are ready to prove the convergence of Cartan projections:

Theorem 5 Let {G} be a connected reductive real Lie group and suppose that {S\subset G} is a compact subset generating a Zariski dense subgroup. Then,

\displaystyle \frac{1}{n}\kappa(S^n)

converges in the Hausdorff topology as {n\rightarrow\infty}.

Proof: By Lemma 2 of the previous post, our task reduces to showing that {\frac{1}{m}\kappa(S^m)}, {m\in\mathbb{N}}, stays in a compact region of {\mathfrak{a}}, and that for each {\delta>0}, there exists {n_0\in\mathbb{N}} such that {\limsup\limits_{m\rightarrow\infty}d(x,\frac{1}{m}\kappa(S^m))\leq\delta} for all {x\in\frac{1}{n}\kappa(S^n)} and {n\geq n_0}.

By Lemma 3, there exists a constant {C_G>0} such that

\displaystyle \frac{1}{m}\|\kappa(g)\|\leq C_G\sup\limits_{s\in S}\|\kappa(s)\|:=C_{G,S}

for all {g\in S^m}. It follows that {\frac{1}{m}\kappa(S^m)\subset B(0,C_{G,S})} for all {m\in\mathbb{N}}, that is, {\frac{1}{m}\kappa(S^m)} is confined in a compact region of {\mathfrak{a}}.

Let us now estimate {d(x,\frac{1}{m}\kappa(S^m))} for {x\in\frac{1}{n}\kappa(S^n)}, say {x=\frac{1}{n}\kappa(g)} with {g\in S^n}. By the Abels–Margulis–Soifer theorem (Theorem 4), we can select a finite subset {F} of the monoid generated by {S} such that for each {h\in G}, there exists {a\in F} so that {ha} is {(G,r,\varepsilon)}-proximal. In particular, we can take {f\in F} such that {(gf)^k} is {(G,r,\varepsilon)}-proximal for all {k\in\mathbb{N}}. By Lemma 3, we have

\displaystyle \|k\lambda(gf)-\kappa((gf)^k)\| = \|\lambda((gf)^k)-\kappa((gf)^k)\|\leq C_r \quad \forall \, \, k\geq 1

and

\displaystyle \|\kappa(gf)-\kappa(g)\|\leq C_G\|\kappa(f)\|.

Since {\lambda(gf)=\frac{1}{k}\lambda((gf)^k)}, it follows from the triangle inequality that

\displaystyle \begin{array}{rcl} \|\kappa(g)-\frac{1}{k}\kappa((gf)^k)\|&\leq& \|\kappa(g)-\kappa(gf)\|+\|\kappa(gf)-\lambda(gf)\|+\|\frac{1}{k}\lambda((gf)^k)-\frac{1}{k}\kappa((gf)^k)\| \\ &\leq& C_G\|\kappa(f)\|+C_r+\frac{1}{k}C_r = C_G\|\kappa(f)\|+\frac{k+1}{k}C_r. \end{array}

Therefore, if we fix {h_0\in S} and we write {f\in S^{n_f}}, {n_f\in\mathbb{N}}, we can use the Euclidean division {m=k(n+n_f)+j}, {0\leq j<n+n_f} to obtain an element of {S^m} via the formula

\displaystyle g_m:=h_0^j(gf)^k.

It follows from the definitions and Lemma 3 that

\displaystyle \begin{array}{rcl} d(x,\frac{1}{m}\kappa(S^m))&\leq& \|x-\frac{1}{m}\kappa(g_m)\|=\|\frac{1}{n}\kappa(g)-\frac{1}{m}\kappa(g_m)\| \\ &\leq& \frac{1}{n}\|\kappa(g)-\frac{1}{k}\kappa((gf)^k)\|+\|\frac{1}{nk}\kappa((gf)^k)-\frac{1}{m}\kappa(g_m)\| \\ &\leq& \frac{1}{n}\left(C_G\|\kappa(f)\|+\frac{(k+1)C_r}{k}\right)+\left|\frac{1}{nk}-\frac{1}{m}\right|\|\kappa((gf)^k)\|+\frac{1}{m}\|\kappa((gf)^k)-\kappa(g_m)\| \\ &\leq& \frac{1}{n}\left(C_G\|\kappa(f)\|+\frac{(k+1)C_r}{k}\right)+C_G\left(k(n+n_f)\left|\frac{1}{nk}-\frac{1}{m}\right|+\frac{j}{m}\right) \sup_{s\in S}\|\kappa(s)\|. \end{array}

Since

\displaystyle k(n+n_f)\left|\frac{1}{nk}-\frac{1}{m}\right| = \left|\frac{n+n_f}{n}-\frac{k(n+n_f)}{m}\right|=\frac{n_f}{n}+\frac{j}{m},

by taking {m\rightarrow\infty} (or equivalently {k\rightarrow\infty}) we derive that

\displaystyle \limsup\limits_{m\rightarrow\infty}d(x,\frac{1}{m}\kappa(S^m))\leq \frac{1}{n}\left(C_G\sup\limits_{f\in F}\|\kappa(f)\|+C_r\right)+\frac{1}{n}\left(C_G\sup\limits_{f\in F}n_f \sup_{s\in S}\|\kappa(s)\|\right).

Hence, given {\delta>0}, there exists {n_0\in\mathbb{N}} such that

\displaystyle \limsup\limits_{m\rightarrow\infty}d(x,\frac{1}{m}\kappa(S^m))\leq \delta

for all {x\in\frac{1}{n}\kappa(S^n)}, {n\geq n_0}. This completes the proof. \Box
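The combinatorial bookkeeping in this proof (the Euclidean division {m=k(n+n_f)+j}, the word {g_m=h_0^j(gf)^k\in S^m}, and the identity {k(n+n_f)\left|\frac{1}{nk}-\frac{1}{m}\right|=\frac{n_f}{n}+\frac{j}{m}}) can be checked mechanically. In the sketch below, the elements of {S} are modelled by letters and the words {g}, {f}, {h_0} are hypothetical strings; only the lengths matter.

```python
from fractions import Fraction

def build_word(g, f, h0, m):
    """Return (k, j, word) where word = h0^j (g f)^k and m = k(|g| + |f|) + j."""
    period = len(g) + len(f)
    k, j = divmod(m, period)
    return k, j, h0 * j + (g + f) * k

def identity_holds(n, n_f, m):
    """Exact check of k(n+n_f)|1/(nk) - 1/m| = n_f/n + j/m for m >= n + n_f."""
    k, j = divmod(m, n + n_f)
    lhs = k * (n + n_f) * abs(Fraction(1, n * k) - Fraction(1, m))
    rhs = Fraction(n_f, n) + Fraction(j, m)
    return lhs == rhs

# Hypothetical words over the alphabet S = {a, b}: g in S^4, f in S^2, h0 in S.
g, f, h0 = "abba", "ab", "a"
k, j, word = build_word(g, f, h0, 23)
print(k, j, word, len(word))
print(all(identity_holds(len(g), len(f), m) for m in range(6, 200)))
```

Since {0\leq j<n+n_f} stays bounded, the term {j/m} disappears as {m\rightarrow\infty}, and only the contribution {n_f/n} survives in the final estimate.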

3. Twisting and Jordan projections

A Zariski dense monoid of matrices is twisting in the sense that it always contains an element putting a finite configuration of lines and hyperplanes in general position:

Lemma 6 Let {G} be a connected, reductive Lie group and suppose that {\Gamma\subset G} is a Zariski dense monoid. Given a finite collection {(\rho_i, V_i)}, {1\leq i\leq D}, of irreducible representations of {G} and finite configurations {v_i^j} and {H_i^j}, {1\leq j\leq t}, of points and hyperplanes in {\mathbb{P}(V_i)}, there is an element {\gamma\in \Gamma} such that

\displaystyle \rho_i(\gamma)v_i^j\notin H_i^j

for all {1\leq i\leq D} and {1\leq j\leq t}.

Proof: Since {(\rho_i, V_i)} are irreducible, the sets {\{g\in G:\rho_i(g)v_i^j\notin H_i^j\}} are non-empty and Zariski open in {G}. Thus,

\displaystyle \bigcap\limits_{\substack{1\leq i\leq D \\ 1\leq j\leq t}} \{g\in G:\rho_i(g)v_i^j\notin H_i^j\}

is Zariski open in {G} and non-empty (because {G} is connected). Since {\Gamma} is Zariski dense,

\displaystyle \Gamma\cap \bigcap\limits_{\substack{1\leq i\leq D \\ 1\leq j\leq t}} \{g\in G:\rho_i(g)v_i^j\notin H_i^j\}\neq\emptyset.

This completes the argument. \Box

Remark 1 The conclusion of this lemma can be reinforced as follows (cf. Remark 2.22 of Breuillard–Sert paper): it is possible to select {\gamma} from a finite subset of {\Gamma} depending only on {D} and {t} (but not on {v_i^j} and {H_i^j}).

At this stage, we can start the discussion of the convergence of Jordan projections:

Theorem 7 Let {G} be a connected reductive real Lie group and suppose that {S\subset G} is a compact subset generating a Zariski dense subgroup. Then,

\displaystyle \frac{1}{n}\lambda(S^n)

converges in the Hausdorff topology as {n\rightarrow\infty}.

Proof: Similarly to the previous section (on convergence of Cartan projections), our task consists in showing that for each {\delta>0}, there exists {n_0\in\mathbb{N}} such that

\displaystyle \limsup\limits_{m\rightarrow\infty} d(x,\frac{1}{m}\lambda(S^m))\leq \delta

for all {x\in\frac{1}{n}\lambda(S^n)}, {n\geq n_0}. In this direction, let us fix {\delta>0} and let us take {x=\frac{1}{n}\lambda(g)}, {g\in S^n}. By the formula for the spectral radius (cf. Lemma 1), we can fix {\ell\in\mathbb{N}} with

\displaystyle \|\frac{1}{\ell}\kappa(g^{\ell})-\lambda(g)\|<\delta.

By Abels–Margulis–Soifer theorem 4, we can fix {F} a finite subset of the monoid {\Gamma} generated by {S} such that for some {f\in F}, we have that {g^{\ell} f} is {(G,r,\varepsilon)}-proximal.

By Lemma 3,

\displaystyle \|\kappa(g^{\ell}f)-\kappa(g^{\ell})\|\leq C_G\|\kappa(f)\|\leq C_G n_f\sup\limits_{s\in S}\|\kappa(s)\|

where {f\in S^{n_f}}, and

\displaystyle \|\lambda((g^{\ell}f)^k)-\kappa((g^{\ell}f)^k)\|\leq C_r

for all {k\geq 1}.

Consider the distinguished representations {(\rho_i, V_i)}, {1\leq i\leq d_s}, of {G}. Note that the dominant eigendirection {v_i^+} and the dominated hyperplane {H_i^<} for the actions of the proximal matrices {\rho_i((g^{\ell}f)^k)} on {\mathbb{P}(V_i)} are the same for all {k\geq 1}.

We fix {h_0\in S}. By the twisting property in Lemma 6, there exists {\gamma\in\Gamma}, say {\gamma\in S^{n_{\gamma}}}, such that

\displaystyle \rho_i(\gamma)\rho_i(h_0^j)v_i^+\notin H_i^<

for all {1\leq i\leq d_s} and {0\leq j<n\ell+n_f}.

The dynamics of projective actions of the iterates of a proximal matrix {a} is easy to describe: any direction transverse to {H_a^<} is attracted towards {v_a^+}. By rendering this argument slightly more quantitative (with the aid of the so-called Tits proximality criterion), Breuillard and Sert proved in Lemma 3.6 of their paper that

Lemma 8 If {a\in G} is {(G,r,\varepsilon)}-proximal and {T\subset G} is a finite subset such that

\displaystyle \rho_i(t)v_{\rho_i(a)}^+\notin H_{\rho_i(a)}^<

for all {t\in T} and {1\leq i\leq d_s}, then there exists {\widehat{r}>0} such that for all {0<\widehat{\varepsilon}<\widehat{r}} and {t\in T}, one has that {ta^k} is {(G,\widehat{r},\widehat{\varepsilon})}-proximal for all {k\geq k_0=k_0(\widehat{\varepsilon})}.
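The attraction mechanism behind this lemma is easy to visualize in {\mathbb{P}(\mathbb{R}^2)}. In the sketch below, the symmetric matrix {a} is a hypothetical example of a proximal matrix (it is not taken from the paper of Breuillard and Sert), and the distance is the one from Definition 2. A direction transverse to the repelling direction is pushed towards {v_a^+} at a geometric rate given by the ratio of the moduli of the two eigenvalues.

```python
import math

def apply(A, x):
    (a, b), (c, d) = A
    return (a * x[0] + b * x[1], c * x[0] + d * x[1])

def fs_distance(x, y):
    """d([x],[y]) = |x ^ y| / (|x| |y|) on the projective line P(R^2)."""
    wedge = abs(x[0] * y[1] - x[1] * y[0])
    return wedge / (math.hypot(x[0], x[1]) * math.hypot(y[0], y[1]))

# Hypothetical proximal matrix with eigenvalues (3 +- sqrt(5))/2.
a = ((2.0, 1.0), (1.0, 1.0))
lam_top = (3 + math.sqrt(5)) / 2
lam_low = (3 - math.sqrt(5)) / 2
v_plus = (1.0, lam_top - 2.0)   # attracting eigendirection: a v_plus = lam_top v_plus

def orbit_distances(x, steps):
    """Distances d(a^k x, v_plus) for k = 0, ..., steps - 1."""
    distances = []
    for _ in range(steps):
        distances.append(fs_distance(x, v_plus))
        x = apply(a, x)
        norm = math.hypot(x[0], x[1])
        x = (x[0] / norm, x[1] / norm)   # renormalize to avoid overflow
    return distances

ds = orbit_distances((0.0, 1.0), 11)
print(ds)
print(ds[9] / ds[8], lam_low / lam_top)
```

The last line compares the observed contraction ratio with {\lambda_2(a)/\lambda_1(a)}. The quantitative statement of Lemma 8 for products {ta^k} requires the Tits proximality criterion and is not reproduced by this toy computation.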

By applying this lemma with {T:=\{\gamma h_0^j: 0\leq j<n\ell+n_f\}}, we can select {0<\widehat{\varepsilon}<\widehat{r}} such that {\gamma h_0^j(g^{\ell}f)^k} is {(G,\widehat{r},\widehat{\varepsilon})}-proximal for all {1\leq i\leq d_s}, {0\leq j<n\ell+n_f} and {k\geq k_0(\widehat{\varepsilon})}.

Once again, it follows from Lemma 3 that

\displaystyle \|\kappa(\gamma h_0^j(g^{\ell}f)^k)-\kappa((g^{\ell}f)^k)\|\leq C_G\sup\limits_{t\in T}\|\kappa(t)\|

and

\displaystyle \|\kappa(\gamma h_0^j(g^{\ell}f)^k)-\lambda(\gamma h_0^j(g^{\ell}f)^k)\|\leq C_{\widehat{r}}

for all {0\leq j<n\ell+n_f} and {k\geq k_0(\widehat{\varepsilon})}.

By Euclidean division, we can write {m-n_{\gamma}=k(n\ell+n_f)+j} with {0\leq j<n\ell+n_f} and define

\displaystyle g_m:= \gamma h_0^j (g^{\ell}f)^k\in S^m.

From our discussion above, we derive that

\displaystyle \begin{array}{rcl} \|x-\frac{1}{m}\lambda(g_m)\| &\leq& \frac{1}{n}\|\lambda(g)-\frac{1}{\ell}\kappa(g^{\ell})\| + \|\frac{1}{n\ell}\kappa(g^{\ell})-\frac{1}{m}\lambda(g_m)\| \\ &\leq& \frac{\delta}{n}+ \frac{1}{n\ell}\|\kappa(g^{\ell})-\kappa(g^{\ell}f)\| + \|\frac{1}{n\ell}\kappa(g^{\ell}f)-\frac{1}{m}\lambda(g_m)\| \\ &\leq& \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\|\frac{1}{n\ell}\kappa(g^{\ell}f)-\frac{1}{m}\lambda(g_m)\| \\ &\leq& \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\frac{C_r}{n\ell}+ \|\frac{1}{n\ell}\lambda(g^{\ell}f)-\frac{1}{m}\lambda(g_m)\| \\ &=& \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\frac{C_r}{n\ell}+ \|\frac{1}{kn\ell}\lambda((g^{\ell}f)^k)-\frac{1}{m}\lambda(g_m)\| \\ &\leq& \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\frac{C_r}{n\ell}+\frac{C_r}{kn\ell}+ \|\frac{1}{kn\ell}\kappa((g^{\ell}f)^k)-\frac{1}{m}\lambda(g_m)\| \\ &\leq& \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\frac{C_r}{n\ell}+\frac{C_r+C_G\sup\limits_{t\in T}\|\kappa(t)\|}{kn\ell} +\|\frac{1}{kn\ell}\kappa(g_m)-\frac{1}{m}\lambda(g_m)\| \\ &\leq& \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\frac{C_r}{n\ell}+\frac{C_r+C_G\sup\limits_{t\in T}\|\kappa(t)\|+C_{\widehat{r}}}{kn\ell} +\|\frac{1}{kn\ell}\lambda(g_m)-\frac{1}{m}\lambda(g_m)\|. \end{array}

Since {\sup\limits_{t\in T}\|\kappa(t)\|\leq C_G(n\ell+n_f+n_{\gamma})\sup\limits_{s\in S}\|\kappa(s)\|} and

\displaystyle \frac{1}{kn\ell}-\frac{1}{m} = \frac{kn_f+j+n_{\gamma}}{mkn\ell},

by letting {m\rightarrow\infty} (or equivalently, {k\rightarrow\infty}) we conclude that

\displaystyle \limsup\limits_{m\rightarrow\infty} d(x,\frac{1}{m}\lambda(S^m))\leq \frac{\delta}{n}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|+\frac{C_r}{n\ell}+C_G \frac{n_f}{n\ell}\sup\limits_{s\in S}\|\kappa(s)\|

for all {x\in\frac{1}{n}\lambda(S^n)}. This completes the proof. \Box

4. Coincidence of the limits

Let {G} be a connected, reductive real Lie group and let {S\subset G} be a compact subset generating a Zariski dense monoid. By Theorems 5 and 7, we have that

\displaystyle \frac{1}{n}\kappa(S^n)\rightarrow J_{Cartan} \quad \textrm{and} \quad \frac{1}{n}\lambda(S^n)\rightarrow J_{Jordan}

as {n\rightarrow\infty}.

Theorem 9 We have {J_{Cartan}=J_{Jordan}}.

Proof: By the formula for the spectral radius (cf. Lemma 1), for all {g\in G}, one has {\frac{1}{n}\kappa(g^n)\rightarrow \lambda(g)} as {n\rightarrow\infty}. In particular, {J_{Jordan}\subset J_{Cartan}}.

In order to derive the other inclusion, we recall that the proof of Theorem 5 about the convergence of Cartan projections revealed that there exists {i_0=i_0(S)\in\mathbb{N}} and a constant {C_S>0} such that for all {n\in\mathbb{N}} and {g\in S^n}, there exists {f\in S^i} with {i\leq i_0} and

\displaystyle \|\kappa(g)-\lambda(gf)\|\leq C_S.

Therefore,

\displaystyle \begin{array}{rcl} \|\frac{1}{n}\kappa(g)-\frac{1}{n+i}\lambda(gf)\|&\leq& \|\frac{1}{n}\kappa(g)-\frac{1}{n}\lambda(gf)\|+\left|\frac{1}{n}-\frac{1}{n+i}\right|\|\lambda(gf)\| \\ &\leq& \frac{1}{n}C_S + \frac{i_0}{n}\frac{\|\lambda(gf)\|}{n+i}. \end{array}

Since {gf\in S^{n+i}}, we have that {\|\lambda(gf)\|/(n+i)} is bounded and, a fortiori,

\displaystyle \|\frac{1}{n}\kappa(g)-\frac{1}{n+i}\lambda(gf)\|\rightarrow 0

as {n\rightarrow\infty}. This shows that {J_{Cartan}\subset J_{Jordan}}, as desired. \Box
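The first step of the proof (the spectral radius formula {\frac{1}{n}\kappa(g^n)\rightarrow\lambda(g)}) can be checked numerically in the simplest case. The sketch below assumes {G=SL_2(\mathbb{R})}, where the Cartan projection is identified with the logarithm of the operator norm and the Jordan projection with the logarithm of the spectral radius; the matrix `g` and the helper names are illustrative choices, not taken from the text.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def matpow(g, n):
    """n-th power of a 2x2 matrix (exact when the entries are integers)."""
    result = ((1, 0), (0, 1))
    for _ in range(n):
        result = matmul(result, g)
    return result

def cartan(g):
    """Cartan projection in SL_2(R): log of the largest singular value."""
    t = sum(x * x for row in g for x in row)  # trace of g^T g
    # The squared singular values s and 1/s are the roots of s^2 - t*s + 1 = 0.
    return 0.5 * math.log((t + math.sqrt(max(t * t - 4, 0))) / 2)

def jordan(g):
    """Jordan projection in SL_2(R): log of the spectral radius."""
    tr = abs(g[0][0] + g[1][1])
    if tr <= 2:  # elliptic or parabolic element: spectral radius 1
        return 0.0
    return math.log((tr + math.sqrt(tr * tr - 4)) / 2)

# Illustrative hyperbolic element of SL_2(Z) (determinant 3*2 - 5*1 = 1).
g = ((3, 5), (1, 2))
for n in (1, 5, 10, 20, 40):
    print(n, cartan(matpow(g, n)) / n, jordan(g))
```

The printed values of {\frac{1}{n}\kappa(g^n)} approach {\lambda(g)} from above at rate {O(1/n)}, which is the behaviour used to get {J_{Jordan}\subset J_{Cartan}}.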

5. Realization of the joint spectrum by sequences

Closing this post, let us further discuss Theorem 5 from the previous post by showing that {x=\lim\limits_{n\rightarrow\infty}\frac{1}{n}\kappa(a_n)} with {a_n\in S^n} is realized by a single sequence {b=(b_1,b_2,\dots)\in S^{\mathbb{N}}} in the sense that {x=\lim\limits_{n\rightarrow\infty}\frac{1}{n}\kappa(b_1\dots b_n)}.

To this end, we use the Abels–Margulis–Soifer theorem (Theorem 4) and the strong version in Remark 1 of the twisting property in Lemma 6 to select a finite subset {F} of {\Gamma} and some constants {0<\varepsilon<r} such that for each {n\in\mathbb{N}} there are {f_n, \gamma_n\in F} with the property that the elements {g_n:=\gamma_n a_n f_n} form a Schottky family in the sense that each {g_n} is {(G,r,\varepsilon)}-proximal, and {d(v_{g_n}^+,H_{g_{n+1}}^<)\geq 6r} and {d(v_{g_1}^+,H_{g_{n}}^<)\geq 6r} for all {n\in\mathbb{N}}. (This nomenclature comes from the fact that the projective actions of the elements in a Schottky family resemble the classical Schottky groups.) Note that {g_n\in S^{|g_n|}} where {|g_n|=n+O(1)}.

Let us now choose a rapidly increasing sequence {(\ell_n)_{n\in\mathbb{N}}} so that

\displaystyle \sum\limits_{i=1}^{n-1} i\ell_i=o(\ell_n)

as {n\rightarrow\infty}, and let us define {(b_1,b_2,\dots)\in S^{\mathbb{N}}} by

\displaystyle b_1 b_2\dots b_k\dots:=g_1^{\ell_1} g_2^{\ell_2}\dots g_n^{\ell_n}\dots

By definition, any finite word {b_1\dots b_k} has the form {g_1^{\ell_1} g_2^{\ell_2}\dots g_n^{\ell_n} g_{n+1}^{\ell}\overline{g}} where {\ell\leq\ell_{n+1}} and {\overline{g}} is a prefix of {g_{n+1}}. Observe that

\displaystyle \begin{array}{rcl} k&=&\sum\limits_{i=1}^n |g_i|\ell_i + |g_{n+1}|\ell+|\overline{g}| = |g_n|\ell_n+|g_{n+1}|\ell+o(\ell_n) \\ &=& n\ell_n + (n+1)\ell+O(1)(\ell_n+\ell) \end{array}

By Lemma 3, {\|\kappa(\overline{g})\|=O(n)}. Moreover, Lemma 2.17 in the Breuillard–Sert paper ensures that, thanks to the Schottky property of the family {(g_n)_{n\in\mathbb{N}}}, the element {g_1^{\ell_1}\dots g_n^{\ell_n} g_{n+1}^{\ell}} is {(G,2r,2\varepsilon)}-proximal with

\displaystyle \|\lambda(g_1^{\ell_1}\dots g_n^{\ell_n} g_{n+1}^{\ell})-\sum\limits_{i=1}^n\ell_i\lambda(g_i)-\ell\lambda(g_{n+1})\|=O(n).

Therefore, it follows from Lemma 3 that

\displaystyle \begin{array}{rcl} \kappa(b_1\dots b_k) &=& \kappa(g_1^{\ell_1} g_2^{\ell_2}\dots g_n^{\ell_n} g_{n+1}^{\ell}\overline{g}) = \kappa(g_1^{\ell_1} g_2^{\ell_2}\dots g_n^{\ell_n} g_{n+1}^{\ell})+O(n) \\ &=& \lambda(g_1^{\ell_1} g_2^{\ell_2}\dots g_n^{\ell_n} g_{n+1}^{\ell})+O(n) = \sum\limits_{i=1}^n\ell_i\lambda(g_i)+\ell\lambda(g_{n+1})+O(n) \\ &=& \sum\limits_{i=1}^n\ell_i\kappa(g_i)+O(1)\sum\limits_{i=1}^n\ell_i+\ell\kappa(g_{n+1})+O(1)\ell+O(n) \\ &=& \sum\limits_{i=1}^{n-1}(i+O(1))\ell_i+\ell_n\kappa(a_n)+O(1)\ell_n+\ell\kappa(a_{n+1})+O(1)\ell+O(n) \\ &=& n\ell_n\frac{1}{n}\kappa(a_n)+(n+1)\ell\frac{1}{n+1}\kappa(a_{n+1})+O(1)(\ell_n+\ell). \end{array}

Since {\frac{1}{n}\kappa(a_n)} converges to {x}, we conclude that {\frac{1}{k}\kappa(b_1\dots b_k)} converges to {x} as {k\rightarrow\infty}.
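The growth condition {\sum_{i=1}^{n-1} i\ell_i=o(\ell_n)} imposed on {(\ell_n)} is easy to satisfy. As a minimal sketch, assume the choice {\ell_n=(n!)^2} (an illustrative choice that is not in the text; any sufficiently fast growth works). The ratio {r_n=\frac{1}{\ell_n}\sum_{i<n} i\ell_i} then satisfies {r_{n+1}=(r_n+n)/(n+1)^2}, so that it behaves like {1/n}.

```python
from fractions import Fraction
from math import factorial

def ell(n):
    """Hypothetical choice of the rapidly increasing sequence: (n!)^2."""
    return factorial(n) ** 2

def ratio(n):
    """Exact value of (sum_{i<n} i * ell(i)) / ell(n)."""
    return Fraction(sum(i * ell(i) for i in range(1, n)), ell(n))

for n in (3, 5, 10, 20, 50):
    print(n, float(ratio(n)))
```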

Yuri Lima asked me to announce in this blog the following opportunity for a post-doctoral position in Dynamical Systems and Ergodic Theory at Universidade Federal do Ceará (located in Fortaleza, Brazil):

Serrapilheira Postdoctoral Fellowship – UFC

The Department of Mathematics at Universidade Federal do Ceará (UFC) invites applications for a Serrapilheira Postdoctoral Fellowship in dynamical systems and ergodic theory. The position is for one year with start date at any moment between March 2020 and September 2020, with possibility of extension for another year.

Qualifications and expectations

The position is part of the project “Jangada Dinâmica – boosting dynamical systems in Brazil’s Northeastern region”, which is funded by Instituto Serrapilheira and aims to boost dynamical systems and ergodic theory in the mathematical community of universities located in the Northeastern region of Brazil. The applicant must have completed a PhD and be qualified to conduct research in dynamical systems and/or ergodic theory. There are NO teaching duties. As part of the program, and to foster interaction, the fellow shall visit another department of Mathematics in the Northeast for one month each semester or two months per year. Applications from underrepresented groups in Mathematics are highly encouraged.


The salary will range from 5000 to 6000 Brazilian Reais monthly, tax free, on a twelve-month basis, according to the applicant’s qualifications. There will be an extra 5000 Brazilian Reais for each of the two months of visits to another institution in the Northeast. The salary is more attractive than those offered by regular Brazilian funding agencies.

Department of Mathematics at UFC

The Department of Mathematics at UFC currently holds the highest rank among Brazilian Mathematics departments. Having a strong history in the field of differential geometry, during the last 15 years it has developed new research groups in analysis, graph optimization and, more recently, in dynamical systems. Currently, the group of dynamical systems has two members, with expertise on nonuniform hyperbolicity, partial hyperbolicity, and symbolic dynamics.


UFC is located in the city of Fortaleza, which has approximately 2.5 million inhabitants and is the fifth largest city of Brazil. Located in the Northeastern region of Brazil, Fortaleza is becoming a common port of entry to the country, with many direct flights to the US and Europe. Historically known as a tourist destination, it is close to beaches with warm water and white sand dunes, and its cost of living is lower than that of bigger cities like Rio de Janeiro and São Paulo, so that the monthly stipend goes further.

Documentation required

– CV with publication list

– Research statement

– Two (or more) letters of recommendation.

All documents must be sent to The applicant must send the first two documents, and ask two (or more) professors to directly send their letters of recommendation.

Deadline: December 31, 2019.

More information:

In this previous post here (from 2018), I described some “back of the envelope calculations” (based on private conversations with Scott Wolpert) indicating that some sectional curvatures of the Weil–Petersson (WP) metric could be at least exponentially small in terms of the distance to the boundary divisor of Deligne–Mumford compactification.

Very roughly speaking, this heuristic computation went as follows: the WP sectional curvature of any {2}-plane can be written as the sum of three terms; for the {2}-planes considered in the previous post, the main term among those three seemed to be a kind of {L^4}-norm of Beltrami differentials with essentially disjoint supports; finally, this {L^4}-type norm was shown to be really small once a certain Green propagator is ignored.

Last April 2019, I met Scott during an event at Simons Center for Geometry and Physics, and I took the opportunity to tell him that one could perhaps show that the measure of the set of {2}-planes leading to tiny WP curvatures is very small using the real-analyticity of the WP metric.

More concretely, my idea was very simple: since the Grassmannian {G} of {2}-planes in the tangent space at a point {p} is a compact space, the WP sectional curvature defines a real-analytic function {c:G\rightarrow(-\infty, 0)}, and we have good upper bounds for {|c|} and all of its derivatives in terms of the distance of {p} to the boundary (see this article here), we can hope to get reasonable estimates for the measure of the sets {\{P\in G: |c(P)|\leq\varepsilon\}} using the techniques of these articles here and here (which are close in spirit to the classical fact [explained in Lemma 3.2 of Kleinbock–Margulis paper, for instance] that the measure of the set {\{|P|\leq\varepsilon\}} is small whenever {P} is a polynomial function on {[0,1]} whose degree and {C^0}-norm are bounded).

As it turns out, Scott thought that this strategy made some sense and, in particular, he promised to use my suggestion as a motivation to review his arguments concerning WP sectional curvatures.

After several email exchanges with Howard Masur and me, Scott announced that there were some mistakes in the construction of tiny WP sectional curvature: in a nutshell, one should not restrict the analysis to a single “main term” in the formula for WP sectional curvatures as a sum of three expressions, and one cannot ignore the effect of the Green propagator. More importantly, Scott made a detailed study of these mistakes which ultimately led him to establish polynomial upper bounds for WP sectional curvatures at the heart of his newest preprint available here.

In this post, we will follow closely Scott’s preprint in order to give an outline of the proof of a polynomial upper bound for WP sectional curvatures:

Theorem 1 (Wolpert) Given two integers {g\geq 0} and {n\geq 0} with {3g-3+n\geq 1}, there exists a constant {C(g,n)>0} with the following property. If {\sigma(X)} denotes the product of the lengths of the short geodesics of a hyperbolic surface {X} of genus {g} with {n} cusps whose systole is sufficiently small, then the sectional curvatures of the Weil–Petersson metric at {X} are at most

\displaystyle -C(g,n)\cdot\sigma(X)^7

Remark 1 As it was pointed out by Scott in his preprint, it is likely that this estimate is not optimal: indeed, one expects that the best exponent should be {3} rather than {7}.

In what follows, we’ll assume some familiarity with some basic aspects of the geometry of the Weil–Petersson metric (such as those described in these posts here and here).

1. Weil–Petersson sectional curvatures

Let {X} be a hyperbolic surface of genus {g\geq0} with {n\geq0} cusps. If we write {X=\mathbb{H}/\Gamma}, where {\mathbb{H}} is the usual hyperbolic plane and {\Gamma} is a group of isometries of {\mathbb{H}} describing the fundamental group of {X}, then the holomorphic tangent space at {X} to the moduli space {\mathcal{M}_{g,n}} of Riemann surfaces of genus {g} with {n} punctures is naturally identified with the space {B(\Gamma)} of harmonic Beltrami differentials on {X} (and the cotangent space is related to quadratic differentials).

In this setting, the Weil–Petersson metric is the Riemannian metric {ds^2=2\sum g_{\alpha\overline{\beta}} dt_{\alpha}\overline{dt_{\beta}}} induced by the Hermitian inner product

\displaystyle g_{\alpha\overline{\beta}} = \langle\mu_{\alpha},\mu_{\beta}\rangle := \int_X \mu_{\alpha}\overline{\mu_{\beta}} \, dA

where {\mu_{\alpha}, \mu_{\beta}\in B(\Gamma)} and {dA} is the hyperbolic area form on {X}.

Remark 2 Note that {\langle.,.\rangle} is well-defined: if {\mu=\mu(z)\overline{dz}/dz} and {\nu=\nu(z)\overline{dz}/dz} are Beltrami differentials, then {\mu\overline{\nu}} is a function on {X}.

The Riemann tensor of the Weil–Petersson metric was computed by Wolpert in 1986:

\displaystyle R_{\alpha\overline{\beta}\gamma\overline{\delta}} = (\alpha\overline{\beta}, \gamma\overline{\delta}) + (\alpha\overline{\delta}, \gamma\overline{\beta})

where {(a\overline{b},c\overline{d}) := \int_X (\mu_a\overline{\mu_b}) \mathcal{D}(\mu_c\overline{\mu_d})\, dA} and {\mathcal{D}:=-2(\Delta-2)^{-1}} is an operator related to the Laplace–Beltrami operator {\Delta} on {L^2(X)}.

Remark 3 Our choice of notation here differs from Wolpert’s preprint! Indeed, he denotes the Laplace–Beltrami operator by {D} and he writes {\Delta=-2(D-2)^{-1}}.

The Riemann tensor gives access to nice formulas for the sectional curvatures thanks to the work of Bochner. More concretely, given {v_1} and {v_2} spanning a {2}-plane {P} in the real tangent space to {\mathcal{M}_{g,n}} at {X}, let us take Beltrami differentials {\mu_1} and {\mu_2} such that {v_1=\mu_1+\overline{\mu_2}}, {v_2=\mu_1-\overline{\mu_2}}, and {\{\mu_1,\mu_2\}} is orthonormal. Then, Bochner showed that the sectional curvature of {P} is

\displaystyle K(P)=\frac{R_{1\overline{2}1\overline{2}}-R_{1\overline{2}2\overline{1}}-R_{2\overline{1}1\overline{2}}+R_{2\overline{1}2\overline{1}}}{4g_{1\overline{1}}g_{2\overline{2}}-2|g_{1\overline{2}}|^2-2\textrm{Re}(g_{1\overline{2}})^2} = \frac{R_{1\overline{2}1\overline{2}}-R_{1\overline{2}2\overline{1}}-R_{2\overline{1}1\overline{2}}+R_{2\overline{1}2\overline{1}}}{4}

Hence, by Wolpert’s formula for the Riemann tensor of the WP metric, we see that

\displaystyle K(P) = \frac{2(1\overline{2}, 1\overline{2})-(1\overline{2}, 2\overline{1})-(1\overline{1}, 2\overline{2})-(2\overline{1}, 1\overline{2})-(2\overline{2}, 1\overline{1})+2(2\overline{1}, 2\overline{1})}{4} \ \ \ \ \ (1)

2. Spectral theory of {\mathcal{D}}

Wolpert’s formula for the Riemann tensor of the WP metric hints that the spectral theory of {\mathcal{D}} plays an important role in the study of the WP sectional curvatures.

For this reason, let us review some key properties of {\mathcal{D}} (and we refer to Section 3 of Wolpert’s preprint for more details and references). First, {\mathcal{D}=-2(\Delta-2)^{-1}} is a positive operator on {L^2(X)} whose norm is {1}: these facts follow by integration by parts. Secondly, {\Delta} is essentially self-adjoint on {L^2(X)}, so that {\mathcal{D}} is self-adjoint on {L^2(X)}. Moreover, the maximum principle allows one to show that {\mathcal{D}} is also a positive operator on {C_0(X)} with unit norm. Finally, {\mathcal{D}} has a positive symmetric integral kernel: indeed,

\displaystyle \mathcal{D}f(p) = \int_X G(p,q) f(q) \, dA

where the Green propagator {G} is the Poincaré series

\displaystyle G(p,q)=-2\sum\limits_{\gamma\in\Gamma} Q_1(d(p,\gamma(q)))

associated to an appropriate Legendre function {Q_1}. (Here, {d(.,.)} stands for the hyperbolic distance on {\mathbb{H}}.) For later reference, we recall that {Q_1} has a logarithmic singularity at {0} and {-Q_1(x)\sim e^{-2x}} whenever {x} is large.

3. Negativity of the WP sectional curvatures

Interestingly enough, as it was first noticed by Wolpert in 1986, the spectral features of {\mathcal{D}} described above are sufficient to derive the negativity of WP sectional curvatures from the Cauchy–Schwarz inequality. More precisely, since {\mathcal{D}} is self-adjoint, i.e.,

\displaystyle (a\overline{b},c\overline{d}) := \int_X (\mu_a\overline{\mu_b}) \, \mathcal{D}(\mu_c\overline{\mu_d}) \, dA = \int_X \mathcal{D}(\mu_a\overline{\mu_b}) \, \mu_c\overline{\mu_d}\,dA

and its integral kernel {G} is a real function, a straightforward computation reveals that the equation (1) for the sectional curvature {K(P)} of a {2}-plane {P} can be rewritten as

\displaystyle \begin{array}{rcl} K(P) &=& \frac{2(1\overline{2}, 1\overline{2})-(1\overline{2}, 2\overline{1})-(1\overline{1}, 2\overline{2})-(2\overline{1}, 1\overline{2})-(2\overline{2}, 1\overline{1})+2(2\overline{1}, 2\overline{1})}{4} \\ &=& \frac{4\textrm{Re}(1\overline{2}, 1\overline{2}) -2(1\overline{2}, 2\overline{1}) -2(1\overline{1}, 2\overline{2})}{4}. \end{array}

If we decompose the function {\mu_1\overline{\mu_2}} into its real and imaginary parts, say {\mu_1\overline{\mu_2} = f+ih}, then we see that

\displaystyle \begin{array}{rcl} \textrm{Re}(1\overline{2}, 1\overline{2}) - (1\overline{2}, 2\overline{1}) &=& \left[\int_X f\,\mathcal{D}f \, dA - \int_X h\, \mathcal{D}h \, dA\right] - \left[\int_X f\,\mathcal{D}f \, dA + \int_X h\, \mathcal{D}h \, dA\right] \\ &=& -2\int_X h\, \mathcal{D}h \, dA. \end{array}

Since {\mathcal{D}} is a positive operator, we conclude that {\textrm{Re}(1\overline{2}, 1\overline{2}) - (1\overline{2}, 2\overline{1})\leq 0} and, a fortiori,

\displaystyle K(P)\leq \frac{\textrm{Re}(1\overline{2}, 1\overline{2})-(1\overline{1}, 2\overline{2})}{2} \ \ \ \ \ (2)

The non-positivity of the right-hand side of (2) can be established in three steps. First, the positivity of {\mathcal{D}} also implies that

\displaystyle \textrm{Re}(1\overline{2}, 1\overline{2})\leq \int_X |f|\,\mathcal{D}|f|\,dA\leq \int_X |f|\,\mathcal{D}|\mu_1\overline{\mu_2}|\,dA.

Secondly, the fact that {\mathcal{D}} has a positive integral kernel {G} allows us to apply the Cauchy–Schwarz inequality to get that {\mathcal{D}|uv| =\int G |u v| = \int G^{1/2}|u| G^{1/2}|v| \leq (\mathcal{D}|u|^2)^{1/2} (\mathcal{D}|v|^2)^{1/2}}. Therefore,

\displaystyle \int_X |f|\,\mathcal{D}|\mu_1\overline{\mu_2}|\,dA\leq \int_X |f|\,(\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA\leq \int_X |\mu_1\overline{\mu_2}| \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA

Finally, the Cauchy–Schwarz inequality also says that

\displaystyle \int_X |\mu_1\overline{\mu_2}| \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA\leq \left(\int_X |\mu_1|^2\,(\mathcal{D}|\mu_2|^2)\, dA\right)^{1/2}\left(\int_X |\mu_2|^2\,(\mathcal{D}|\mu_1|^2)\, dA\right)^{1/2}=(1\overline{1},2\overline{2})

In summary, we showed that

\displaystyle (I)\leq (II)\leq (III)\leq (IV)\leq (V)\leq (VI) \ \ \ \ \ (3)

where

\displaystyle \begin{array}{rcl} & &(I):=\textrm{Re}(1\overline{2}, 1\overline{2}), \quad (II):=\int_X |f|\,\mathcal{D}|f|\,dA, \quad (III):=\int_X |f|\,\mathcal{D}|\mu_1\overline{\mu_2}|\,dA, \\ & & (IV):=\int_X |f|\,(\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA, \quad (V):=\int_X |\mu_1\overline{\mu_2}| \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA, \\ & & (VI):= (1\overline{1}, 2\overline{2}) \end{array}

In particular, {(I)\leq (VI)}, so that it follows from (2) that all sectional curvatures {K(P)} of the WP metric are non-positive, i.e., {K(P)\leq 0}.

Actually, it is not hard to derive that {K(P)<0} at this stage: indeed, {K(P)=0} would force a case of equality in the Cauchy–Schwarz inequality and this is not possible in our context because {\{\mu_1,\mu_2\}} is orthonormal.
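The chain (3) only uses that {\mathcal{D}} is a positive self-adjoint operator with a positive symmetric kernel, so it can be tested in a finite toy model. The sketch below is an illustration under strong assumptions: {X} is replaced by three points with counting measure, {\mathcal{D}} by a hypothetical symmetric, positive semidefinite matrix `G` with positive entries, and {\mu_1,\mu_2} by arbitrary complex vectors. It is not the actual Green propagator.

```python
import math

# Toy kernel G = (Id + J)/4 with J the all-ones matrix: symmetric, positive
# entries, eigenvalues 1, 1/4, 1/4 (hence positive semidefinite), unit row sums.
G = [[0.50, 0.25, 0.25],
     [0.25, 0.50, 0.25],
     [0.25, 0.25, 0.50]]

def apply_kernel(u):
    """Discrete analogue of D u(p) = sum_q G(p,q) u(q)."""
    return [sum(G[p][q] * u[q] for q in range(len(u))) for p in range(len(u))]

def pair(u, v):
    """Discrete analogue of the integral of u*v over X."""
    return sum(a * b for a, b in zip(u, v))

def chain(mu1, mu2):
    """Return the six quantities (I), ..., (VI) of (3) in the toy model."""
    z = [a * b.conjugate() for a, b in zip(mu1, mu2)]  # mu_1 * conj(mu_2)
    f = [w.real for w in z]
    h = [w.imag for w in z]
    abs_f = [abs(x) for x in f]
    abs_z = [abs(w) for w in z]
    sq1 = [abs(a) ** 2 for a in mu1]
    sq2 = [abs(b) ** 2 for b in mu2]
    d1, d2 = apply_kernel(sq1), apply_kernel(sq2)
    root = [math.sqrt(a * b) for a, b in zip(d1, d2)]
    t1 = pair(f, apply_kernel(f)) - pair(h, apply_kernel(h))  # Re(1 2bar, 1 2bar)
    t2 = pair(abs_f, apply_kernel(abs_f))
    t3 = pair(abs_f, apply_kernel(abs_z))
    t4 = pair(abs_f, root)
    t5 = pair(abs_z, root)
    t6 = pair(sq1, d2)                                        # (1 1bar, 2 2bar)
    return [t1, t2, t3, t4, t5, t6]

values = chain([1 + 2j, 0.5 - 1j, -1 + 0.3j], [0.2 - 1j, 2 + 0j, 1 + 1j])
print(values)
```

The six printed numbers are non-decreasing, mirroring {(I)\leq\dots\leq(VI)}; the gaps between consecutive terms are exactly the defects in the inequalities used above.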

Remark 4 Philosophically speaking, the “analog” to this argument in the realm of Teichmüller dynamics is Forni’s proof of the spectral gap property {\lambda_2<1} for the Lyapunov exponents of the Teichmüller geodesic flow. In fact, after some computations with variational formulas for the so-called Hodge norm, Forni establishes that {\lambda_2<1} by ruling out an equality case in a certain Cauchy-Schwarz estimate.

4. Reduction of Theorem 1 to bounds on the kernel of {\mathcal{D}}

The discussion in the previous section says that small WP sectional curvatures correspond to almost equalities in certain Cauchy-Schwarz inequalities.

Hence, a natural strategy towards the proof of Theorem 1 consists in showing that an almost equality in (3) is impossible. In this direction, Wolpert establishes the following result:

Theorem 2 (Wolpert) There are two constants {c_1(g,n)>0} and {c_2(g,n)>0} with the following property. If we have an almost equality

\displaystyle (V)-(I)\leq c_1(g,n)\cdot\sigma(X)^7,

between the terms {(I)} and {(V)} in (3), then {(VI)} and {(I)} can not be almost equal:

\displaystyle (VI)-(I)\geq c_2(g,n)\cdot\sigma(X)^3

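Of course, Theorem 1 is an immediate consequence of Theorem 2 (in view of (2) and the estimate {(VI)-(I)\geq (V)-(I)} [implied by (3)]). Indeed, either {(V)-(I)> c_1(g,n)\cdot\sigma(X)^7}, so that {(VI)-(I)> c_1(g,n)\cdot\sigma(X)^7}, or Theorem 2 gives {(VI)-(I)\geq c_2(g,n)\cdot\sigma(X)^3\geq c_2(g,n)\cdot\sigma(X)^7} as soon as {\sigma(X)\leq 1}; in both cases, (2) yields {K(P)\leq -\frac{1}{2}\min\{c_1(g,n),c_2(g,n)\}\cdot\sigma(X)^7}.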

Thus, it remains only to prove Theorem 2. To this end, we need further spectral information on {\mathcal{D}}, namely, some lower bounds on its kernel {G(p,q)}. In order to illustrate this point, let us now show Theorem 2 assuming the following statement.

Proposition 3 There exists a constant {c_3(g,n)>0} such that

\displaystyle G(p,q)\geq c_3(g,n)\cdot \sigma(X)^3

whenever {p} and {q} do not belong to the cusp region {X_{cusps}} of {X}.

Remark 5 We recall that the cusp region {X_{cusps}} of {X} is a finite union of portions of {X} which are isometric to a punctured disk {\{0<|w|<c_4(g,n)\}} (equipped with the hyperbolic metric {ds^2=(|dw|/(|w|\log|w|))^2}).

For the sake of exposition, let us first establish Theorem 2 when {X} is compact, i.e., {X_{cusps}=\emptyset}, before explaining the extra ingredient needed to treat the general case.

4.1. Proof of Theorem 2 modulo Proposition 3 when {X_{cusps}=\emptyset}

Suppose that {(V)-(I)\leq c_1(g,n)\sigma(X)^7} for a constant {c_1(g,n)} to be chosen later. In this regime, our goal is to show that {(VI)} is “big” and {(II)} is “small”, so that {(VI)-(I)} is necessarily “big”.

We start by quickly showing that {(VI)} is “big”. Since {\mu_1} and {\mu_2} are unit tangent vectors, it follows from Proposition 3 that

\displaystyle (VI)=\int_X |\mu_1|^2\,\mathcal{D}|\mu_2|^2\,dA\geq c_3(g,n) \sigma(X)^3 \ \ \ \ \ (4)

Let us now focus on proving that {(II)} is “small”. Since {(II)-(I)\leq (V)-(I)} (cf. (3)), if we write {\mu_1\overline{\mu_2} = f+ih=f^+-f^-+ih} (where {f^+} and {f^-} are the positive and negative parts of the real part {f} of {\mu_1\overline{\mu_2}}), then we obtain that

\displaystyle \begin{array}{rcl} c_1(g,n)\,\sigma(X)^7\geq (II)-(I) &=& \int_X |f|\,\mathcal{D}|f|\,dA - \textrm{Re}\int_X\mu_1\overline{\mu_2}\,\mathcal{D}(\mu_1\overline{\mu_2})\, dA \\ &=& \int_X f^+\,\mathcal{D}f^+\,dA + 2\int_X f^+\,\mathcal{D}f^-\,dA+\int_X f^-\,\mathcal{D}f^-\,dA \\ & &- \int_X f\,\mathcal{D}f\,dA+\int_X h\,\mathcal{D}h\,dA \\ &=&4\int_X f^+\,\mathcal{D}f^-\,dA+\int_X h\,\mathcal{D}h\,dA. \end{array}

Since {\mathcal{D}} is positive, we derive that {4\int_X f^+\,\mathcal{D}f^-\,dA\leq c_1(g,n)\,\sigma(X)^7}. Thus, if {X} is compact, i.e., {X_{cusps}=\emptyset}, then Proposition 3 says that {G(p,q)\geq c_3(g,n)\,\sigma(X)^3} for all {p,q\in X}. It follows that

\displaystyle 4\,c_3(g,n)\,\sigma(X)^3\int_X f^+\,dA\int_X f^-\,dA\leq c_1(g,n)\,\sigma(X)^7

By orthogonality of {\{\mu_1,\mu_2\}}, we have that {\textrm{Re}\int_X\mu_1\overline{\mu_2}\,dA=0}, i.e., {\int_X f^+\,dA = \int_X f^-\,dA = (1/2) \int_X |f|\,dA}. By plugging this information into the previous inequality, we obtain the estimate

\displaystyle c_3(g,n)\,\left(\int_X |f|\,dA\right)^2\leq c_1(g,n)\,\sigma(X)^4 \ \ \ \ \ (5)

Next, we observe that {(V)-(IV)\leq (V)-(I)} (cf. (3)) in order to obtain that

\displaystyle c_1(g,n)\,\sigma(X)^7\geq (V)-(IV)=\int_X (|\mu_1\overline{\mu_2}|-|f|) \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA

On the other hand, Proposition 3 ensures that {\mathcal{D}|\mu_{\ast}|^2\geq c_3(g,n)\,\sigma(X)^3\,\int_X|\mu_{\ast}|^2\,dA} for {\ast=1,2}. Since {\mu_1} and {\mu_2} are unit tangent vectors, one has {\mathcal{D}|\mu_{\ast}|^2\geq c_3(g,n)\,\sigma(X)^3} for {\ast=1,2}. By inserting this inequality into the previous estimate, we derive that

\displaystyle c_1(g,n)\,\sigma(X)^7\geq c_3(g,n)\,\sigma(X)^3\,\int_X (|\mu_1\overline{\mu_2}|-|f|)\,dA \ \ \ \ \ (6)

From (5) and (6), we see that

\displaystyle \int_X |\mu_1\overline{\mu_2}|\,dA\leq \sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)^2+\frac{c_1(g,n)}{c_3(g,n)}\sigma(X)^4\leq 2\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)^2 \ \ \ \ \ (7)

whenever {X} has a sufficiently small systole.

This {L^1} bound on {|\mu_1\overline{\mu_2}|} can be converted into a {C^0} bound thanks to the Cauchy integral formula. More concretely, as it is explained in Section 2 of Wolpert’s preprint, after observing that {|\mu_1\overline{\mu_2}| = |\mu_1\mu_2|} and replacing Beltrami differentials {\mu_1} and {\mu_2} by the dual objects {q_1} and {q_2} (namely, quadratic differentials), we are led to study quartic differentials {q_1q_2}. By the Cauchy integral formula on {\mathbb{H}}, one has

\displaystyle |q_1q_2(ds^2)^{-2}|(p)\leq \frac{1}{\pi}\int_{B(p,1)}|q_1q_2(ds^2)^{-2}|\,dA

On the other hand, if {X=\mathbb{H}/\Gamma} has systole {\rho(X)} and the cusp region {X_{cusps}} is empty, then the injectivity radius at any {p\in X} is {\geq \rho(X)/2}. Thus, there exists a universal constant {c_0>0} such that

\displaystyle |q_1q_2(ds^2)^{-2}|(p)\leq \frac{1}{\pi}\int_{B(p,1)}|q_1q_2(ds^2)^{-2}|\,dA\leq c_0\frac{1}{\rho(X)}\|q_1q_2\|_{L^1(X)}

for all {p\in X}. By plugging this inequality into (7), we conclude that

\displaystyle |\mu_1\overline{\mu_2}(p)|\leq 2c_0\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\frac{1}{\rho(X)}\sigma(X)^2\leq 2c_0\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)

for all {p\in X}.

Since {\mathcal{D}} is a positive operator on {C_0(X)} with unit norm (cf. Section 2 above) and {|f|\leq |\mu_1\overline{\mu_2}|}, we have that the previous inequality implies the following {C^0} bound on {\mathcal{D}|f|}:

\displaystyle \mathcal{D}|f|(p)\leq 2c_0\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)

for all {p\in X}. By combining this estimate with (7), we conclude that

\displaystyle (II)=\int_X |f|\,\mathcal{D}|f|\,dA\leq 4c_0\frac{c_1(g,n)}{c_3(g,n)}\sigma(X)^3 \ \ \ \ \ (8)

In summary, (4) and (8) imply that

\displaystyle (VI)-(I)\geq (VI)-(II)\geq \frac{c_3(g,n)}{2}\cdot\sigma(X)^3=:c_2(g,n)\cdot\sigma(X)^3

for the choice of constant {c_1(g,n):=\frac{c_3(g,n)^2}{8c_0}}. This proves Theorem 2 in the absence of cusp regions.
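The bookkeeping of constants in this last step can be double-checked with exact arithmetic. In the sketch below, `c0` and `c3` are arbitrary sample values (assumptions made only for illustration), `bound_II` is the coefficient of {\sigma(X)^3} in (8), and `c2` is the coefficient of {\sigma(X)^3} obtained by subtracting (8) from (4).

```python
from fractions import Fraction

def constants(c0, c3):
    """Return (c1, coefficient in (8), c2) for the choice c1 = c3^2 / (8*c0)."""
    c0, c3 = Fraction(c0), Fraction(c3)
    c1 = c3 ** 2 / (8 * c0)
    bound_II = 4 * c0 * c1 / c3  # (II) <= bound_II * sigma^3, cf. (8)
    c2 = c3 - bound_II           # (VI) - (II) >= c2 * sigma^3, cf. (4)
    return c1, bound_II, c2

# Sample values: whatever c0 and c3 are, one gets bound_II = c2 = c3 / 2.
print(constants(3, Fraction(1, 5)))
```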

4.2. Proof of Theorem 2 modulo Proposition 3 when {X_{cusps}\neq\emptyset}

The arguments above for the case {X_{cusps}=\emptyset} also work in the case {X_{cusps}\neq\emptyset} because the cusp regions carry only a tiny fraction of the mass of the relevant functions, Beltrami differentials, etc.

More precisely, as it is explained in Section 2 of Wolpert’s preprint, if the constant {c_4(g,n)>0} is chosen correctly, then the Cauchy integral formula and the Schwarz lemma can be used to prove that

\displaystyle \int_{X_{cusps}}|\varphi (ds^2)^{-2}|\,dA\leq \frac{1}{8}\|\varphi\|_{L^1(X)}

for all holomorphic quartic differentials {\varphi}.

In particular, we do not lose too much information after truncating {\mu_1}, {\mu_2}, etc. to {X\setminus X_{cusps}} and this allows us to repeat the arguments of the case {X_{cusps}=\emptyset} for the corresponding truncated objects {\widetilde{\mu_1}}, {\widetilde{\mu_2}}, etc. without any extra difficulty: see Section 5 of Wolpert’s preprint for more details.

5. Proof of Proposition 3

Closing this post, let us give an idea of the proof of Proposition 3 (and we refer the reader to Section 4 of Wolpert’s preprint for more details).

Since {G(p,q)=-2\sum\limits_{\gamma\in\Gamma} Q_1(d(p,\gamma(q)))} and {-Q_1(x)\sim e^{-2x}} for large {x} (cf. Section 2 above), our task reduces to giving lower bounds on the Poincaré series

\displaystyle K(p,q)=\sum\limits_{\gamma\in\Gamma} e^{-2d(p,\gamma(q))}

To this end, let us first recall that a hyperbolic surface {X} has a thick-thin decomposition: the thick portion is the region where the injectivity radius is bounded away from zero by a uniform constant and the thin portion is the complement of the thick region. Geometrically, the thin region is the disjoint union of the cusp region {X_{cusps}} and a finite number of collars around short simple closed geodesics: roughly speaking, a collar consists of the points at distance {\leq w(\alpha)=\log(1/\ell_{\alpha})+O(1)} from a short simple closed geodesic {\alpha} of length {\ell_{\alpha}}.

We can provide lower bounds on {K(p,q)} in terms of the behaviours of simple geodesic arcs connecting {p} and {q} on {X}.

More concretely, let {\theta_{pq}} be the shortest geodesic connecting {p} and {q}. Since {\theta_{pq}} is simple, for certain adequate choices of the constants defining the collars, {\theta_{pq}} cannot “back track” after entering a collar, i.e., it must connect the two boundary components (rather than going out via the same boundary component). Furthermore, {\theta_{pq}} cannot go very high into a cusp. Thus, if we decompose {\theta_{pq}} according to its visits to the thick region, the collars and the cusps, then the fact that {p,q\in X\setminus X_{cusps}} allows one to check that it suffices to study the passages of {\theta_{pq}} through collars in order to get a lower bound on {K(p,q)}.

Next, if {\eta} is a subarc of {\theta_{pq}} crossing a collar around a short closed geodesic {\alpha}, then we can apply Dehn twists to {\eta} to get a family of simple arcs indexed by {\mathbb{Z}} giving a “contribution” to {K(p,q)} of

\displaystyle \sum\limits_{n\in\mathbb{Z}}e^{-2(2w(\alpha)+|n|\ell_{\alpha})}\geq c_5(g,n)\cdot \ell_{\alpha}^3

for some constant {c_5(g,n)>0} depending only on the topology of {X}. In this way, the desired result follows by putting all “contributions” together.
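The exponent {3} in this “contribution” can be read off from a geometric series: writing {w(\alpha)=\log(1/\ell_{\alpha})+c}, the sum equals {e^{-4c}\ell_{\alpha}^4\coth(\ell_{\alpha})\geq e^{-4c}\ell_{\alpha}^3}. The following sketch compares a truncated numerical sum with this closed form; the parameter `shift` stands for the unspecified {O(1)} term and is an assumption made for illustration.

```python
import math

def collar_sum(length, shift=0.0):
    """Truncated sum over n in Z of exp(-2 * (2*w + |n| * length))."""
    w = math.log(1.0 / length) + shift
    terms = int(40.0 / length) + 1  # the neglected tail is of relative size e^{-80}
    total = math.exp(-4.0 * w)      # term n = 0
    for n in range(1, terms + 1):
        total += 2.0 * math.exp(-2.0 * (2.0 * w + n * length))  # terms n and -n
    return total

def closed_form(length, shift=0.0):
    """exp(-4*shift) * length^4 * coth(length)."""
    return math.exp(-4.0 * shift) * length ** 4 / math.tanh(length)

for length in (0.2, 0.05, 0.01):
    print(length, collar_sum(length), closed_form(length), length ** 3)
```

As {\ell_{\alpha}\rightarrow 0}, the ratio between the sum and {\ell_{\alpha}^3} tends to {e^{-4c}}, which is consistent with a lower bound of the form {c_5(g,n)\cdot\ell_{\alpha}^3}.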

The celebrated works of several mathematicians (including Poincaré, Denjoy, …, Arnold, Herman, Yoccoz, …) provide a very satisfactory picture of the dynamics of smooth circle diffeomorphisms:

  • each {C^r}-diffeomorphism {f\in\textrm{Diff}^r(\mathbb{T})} of the circle {\mathbb{T}:=\mathbb{R}/\mathbb{Z}} has a well-defined rotation number {\alpha=\rho(f)} (which can be defined using the cyclic order of its orbits, for instance);
  • {f\in\textrm{Diff}^r(\mathbb{T})} is topologically semi-conjugated to the rigid rotation {R_{\alpha}(x)=x+\alpha} (i.e., {h\circ f=R_{\alpha}\circ h} for a surjective continuous map {h:\mathbb{T}\rightarrow \mathbb{T}}) whenever its rotation number {\alpha=\rho(f)} is irrational;
  • if {f\in\textrm{Diff}^2(\mathbb{T})} has irrational rotation number {\alpha}, then {f} is topologically conjugated to {R_{\alpha}} (i.e., there is a homeomorphism {h:\mathbb{T}\rightarrow\mathbb{T}} such that {h\circ f = R_{\alpha}\circ h});
  • if {f\in\textrm{Diff}^r(\mathbb{T})}, {r\geq 3}, has an irrational rotation number {\alpha} satisfying a Diophantine condition of the form {|\alpha-p/q|\geq c/q^{2+\beta}} for some {c>0}, {(r-1)/2>\beta\geq 0}, and all {p/q\in\mathbb{Q}}, then there exists {h\in\textrm{Diff}^{r-1-\beta-}(\mathbb{T}):= \bigcap\limits_{\varepsilon>0}\textrm{Diff}^{r-1-\beta-\varepsilon}(\mathbb{T})} conjugating {f} and {R_{\alpha}} (i.e., {h\circ f = R_{\alpha}\circ h});
  • etc.

In particular, if {\alpha} has Roth type (i.e., for all {\varepsilon>0}, there exists {c_{\varepsilon}>0} such that {|\alpha-p/q|\geq c_{\varepsilon}/q^{2+\varepsilon}} for all {p/q\in\mathbb{Q}}), then any {f\in\textrm{Diff}^r(\mathbb{T})} with rotation number {\alpha} is {C^{r-1-}} conjugated to {R_{\alpha}} whenever {r>3}. (The nomenclature is motivated by Roth’s theorem saying that any irrational algebraic number has Roth type, and it is well-known that the set of Roth type numbers has full Lebesgue measure in {\mathbb{R}}.)
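For a concrete feeling of these Diophantine conditions, one can look at the golden mean {\alpha=(\sqrt{5}-1)/2}, which satisfies the condition with {\beta=0} (and, in particular, has Roth type). The sketch below evaluates {q^2|\alpha-p/q|} along the continued fraction convergents {p/q} of {\alpha}, which are ratios of consecutive Fibonacci numbers; it only inspects these best approximations and works in floating point, so it is a numerical illustration rather than a proof. The function name and the cutoff `max_q` are illustrative.

```python
import math

def golden_mean_constants(max_q=10 ** 5):
    """List of (q, q^2 * |alpha - p/q|) along the convergents of the golden mean."""
    alpha = (math.sqrt(5) - 1) / 2
    p, q = 1, 1  # consecutive Fibonacci numbers: 1/1, 1/2, 2/3, 3/5, ...
    values = []
    while q <= max_q:
        values.append((q, q * q * abs(alpha - p / q)))
        p, q = q, p + q
    return values

for q, value in golden_mean_constants():
    print(q, value)
```

The printed values stay bounded away from zero and accumulate at {1/\sqrt{5}}, i.e., {|\alpha-p/q|\geq c/q^2} along the convergents for a suitable {c>0}.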

In the last twenty years, many authors gave important contributions towards the extension of this beautiful theory.

In this direction, a particularly successful line of research consists in thinking of circle rotations {R_{\alpha}} as standard interval exchange transformations on 2 intervals and trying to build smooth conjugations between generalized interval exchange transformations (g.i.e.t.) and standard interval exchange transformations. In fact, Marmi–Moussa–Yoccoz studied the notion of standard i.e.t. of restricted Roth type (a concept designed so that the circle rotation {R_{\alpha}} has restricted Roth type [when viewed as an i.e.t. on 2 intervals] if and only if {\alpha} has Roth type) and proved that, for any {r\geq 2}, the {C^{r+3}} g.i.e.t.s {T} close to a standard i.e.t. {T_0} of restricted Roth type such that {T} is {C^r}-conjugated to {T_0} form a {C^1}-submanifold of codimension {(g-1)(2r+1)+s} where {T_0} is the first return map to an interval transverse to a translation flow on a translation surface of genus {g\geq 1} and {T_0} is an i.e.t. on {d=2g+s-1} intervals.

An interesting consequence of this result of Marmi–Moussa–Yoccoz is the fact that local conjugacy classes behave differently for circle rotations and arbitrary i.e.t.s. Indeed, a circle rotation is an i.e.t. on 2 intervals associated to the first return map of a translation flow on the torus {\mathbb{T}^2=\mathbb{R}^2/\mathbb{Z}^2}, so that {R_{\alpha}} has genus {g=1} and also {s=1}. Hence, the Marmi–Moussa–Yoccoz theorem says that the local conjugacy class of {R_{\alpha}} with {\alpha} of Roth type has codimension {(g-1)(2r+1)+s=1} regardless of the differentiability scale {r}. Of course, this fact was previously known from the theory of circle diffeomorphisms: by the results of Herman and Yoccoz, the sole obstruction to obtain a smooth conjugation between {f} and {R_{\alpha}} (with {\alpha} of Roth type) is described by a single parameter, namely, the rotation number of {f}. On the other hand, the Marmi–Moussa–Yoccoz theorem says that the codimension

\displaystyle (g-1)(2r+1)+s

of the local conjugacy class of an i.e.t. of restricted Roth type with genus {g\geq 2} grows linearly with the differentiability scale {r}.
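To make the contrast concrete, here is a minimal Python sketch that simply evaluates the formula {(g-1)(2r+1)+s} for a few values of {r}. The function name `conjugacy_codimension` is a hypothetical helper introduced only for this illustration, and the genus-2 example with {s=1} is an arbitrary choice of parameters.

```python
def conjugacy_codimension(g: int, s: int, r: int) -> int:
    """Evaluate (g-1)(2r+1)+s, the codimension quoted in the text.

    Hypothetical helper: it only evaluates the formula and checks the
    ranges g >= 1 and r >= 2 mentioned in the statement.
    """
    if g < 1 or r < 2:
        raise ValueError("expected g >= 1 and r >= 2")
    return (g - 1) * (2 * r + 1) + s


# Circle rotation: g = 1 and s = 1, so the codimension is 1 for every r.
rotation = [conjugacy_codimension(1, 1, r) for r in range(2, 6)]

# Illustrative genus-2 choice with s = 1: the codimension grows linearly in r.
genus_two = [conjugacy_codimension(2, 1, r) for r in range(2, 6)]

print(rotation)   # [1, 1, 1, 1]
print(genus_two)  # [6, 8, 10, 12]
```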

Remark 1 This indicates that KAM theoretical approaches to the study of the dynamics of g.i.e.t.s might be delicate because the “loss of regularity” in the usual KAM schemes forces the analysis of cohomological equations (linearized versions of the conjugacy problem) in several differentiability scales, and the Marmi–Moussa–Yoccoz theorem says that these changes of differentiability scale produce non-trivial effects on the numbers of obstructions (“codimensions”) to solve cohomological equations.

In any case, this interesting phenomenon concerning the codimension of local conjugacy classes of i.e.t.s of genus {g\geq 2} led Marmi–Moussa–Yoccoz to make a series of conjectures (cf. Section 1.2 of their paper) in order to further compare the local conjugacy classes of circle rotations and i.e.t.s of genus {g\geq 2}.

Among these fascinating conjectures, the second open problem in Section 1.2 of Marmi–Moussa–Yoccoz paper asks whether, for almost all i.e.t.s {T_0}, any {C^4} g.i.e.t. {T} with trivial conjugacy invariants (e.g., “simple deformations”) and {C^0} conjugated to {T_0} is also {C^1} conjugated to {T_0}. In other words, the {C^0} and {C^1} conjugacy classes of a typical i.e.t. {T_0} coincide.

In this short post, I would like to transcribe some remarks made during recent conversations with Pascal Hubert showing that the hypothesis “for almost all i.e.t.s {T_0}” cannot be removed from the conjecture above. In a nutshell, we will see in the sequel that the self-similar standard interval exchange transformations associated to two special translation surfaces (called Eierlegende Wollmilchsau and Ornithorynque) of genera {3} and {4} are {C^0} but not {C^1} conjugated to a rich family of piecewise affine interval exchange transformations. Of course, I think that these examples are probably well-known to experts (and Jean-Christophe Yoccoz was probably aware of them by the time Marmi–Moussa–Yoccoz wrote down their conjectures), but I’m including some details of the construction of these examples here mostly for my own benefit.

Disclaimer: As usual, even though the content of this post arose from conversations with Pascal, all mistakes/errors in the sequel are my sole responsibility.

1. Preliminaries

1.1. Rauzy–Veech algorithm

The notion of “irrational rotation number” for generalized interval exchange transformations relies on the so-called Rauzy–Veech algorithm.

More concretely, given a {C^r}-g.i.e.t. {f:I\rightarrow I} sending a finite partition (modulo zero) {I=\bigcup\limits_{\alpha\in\mathcal{A}} I_{\alpha}^t} of {I} into closed subintervals {I_{\alpha}^t} arranged according to a bijection {\pi_t:\mathcal{A}\rightarrow\{1,\dots,d\}} to a finite partition (modulo zero) {I=\bigcup\limits_{\alpha\in\mathcal{A}} I_{\alpha}^b} of {I} into closed subintervals {I_{\alpha}^b} arranged according to a bijection {\pi_b:\mathcal{A}\rightarrow\{1,\dots,d\}} (via {C^r}-diffeomorphisms {f|_{I_{\alpha}^t}:I_{\alpha}^t\rightarrow I_{\alpha}^b}), an elementary step of the Rauzy–Veech algorithm produces a new {C^r}-g.i.e.t. {\mathcal{R}(f)} by taking the first return map of {f} to the interval {I\setminus J}, where {J=I_{\pi_t^{-1}(d)}^t}, resp. {I_{\pi_b^{-1}(d)}^b}, whenever {|I_{\pi_t^{-1}(d)}^t|<|I_{\pi_b^{-1}(d)}^b|}, resp. {|I_{\pi_t^{-1}(d)}^t|>|I_{\pi_b^{-1}(d)}^b|} (and {\mathcal{R}(f)} is not defined when {|I_{\pi_t^{-1}(d)}^t| = |I_{\pi_b^{-1}(d)}^b|}).
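For standard i.e.t.s (where each branch is a translation), an elementary step can be written down purely combinatorially. The following Python sketch is a minimal illustration under that assumption: it does not treat general {C^r}-g.i.e.t.s, and the names `rauzy_veech_step`, `top`, `bottom` and `lengths` are hypothetical choices made here (the rows `top` and `bottom` list the letters of {\mathcal{A}} in the orders given by {\pi_t} and {\pi_b}).

```python
def rauzy_veech_step(top, bottom, lengths):
    """One elementary Rauzy-Veech step for a *standard* i.e.t. (sketch).

    top, bottom: lists of letters in the orders given by pi_t and pi_b.
    lengths: dict mapping each letter to the length of its interval.
    The shorter of the two rightmost intervals is removed, and the first
    return map to the remaining interval is described by the new data.
    """
    a, b = top[-1], bottom[-1]  # last letters of the top and bottom rows
    if lengths[a] == lengths[b]:
        raise ValueError("step undefined: the two last intervals have equal length")
    new_lengths = dict(lengths)
    if lengths[a] > lengths[b]:
        # The top letter a is longer: cut b's length off a, and move b
        # right after a in the bottom row.
        new_lengths[a] = lengths[a] - lengths[b]
        new_bottom = bottom[:-1]
        new_bottom.insert(new_bottom.index(a) + 1, b)
        return list(top), new_bottom, new_lengths
    # The bottom letter b is longer: cut a's length off b, and move a
    # right after b in the top row.
    new_lengths[b] = lengths[b] - lengths[a]
    new_top = top[:-1]
    new_top.insert(new_top.index(b) + 1, a)
    return new_top, list(bottom), new_lengths


# A rotation viewed as an i.e.t. on 2 intervals, with integer lengths 3 and 2.
step1 = rauzy_veech_step(["A", "B"], ["B", "A"], {"A": 3, "B": 2})
step2 = rauzy_veech_step(*step1)
print(step1[2], step2[2])
```

On this two-letter example the lengths evolve as in Euclid's algorithm, and a third step is not defined because both intervals end up with the same length.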

We say that a {C^r}-g.i.e.t. {f} has irrational rotation number whenever the Rauzy–Veech algorithm {\mathcal{R}} can be iterated indefinitely. This nomenclature is partly justified by the fact that Yoccoz generalized the proof of Poincaré’s theorem in order to establish that a {C^r}-g.i.e.t. {f} with irrational rotation number is topologically semi-conjugated to a standard, minimal i.e.t. {T_0}.

1.2. Denjoy counterexamples

Similarly to Denjoy’s theorem in the case of circle diffeomorphisms, the obstruction to promote topological semi-conjugations between {f} and {T_0} as above into {C^0}-conjugations is the presence of wandering intervals for {f}, i.e., non-trivial intervals {A} whose iterates under {f} are pairwise disjoint (i.e., {f^i(A)\cap f^j(A)=\emptyset} for all {i,j\in\mathbb{Z}}, {i\neq j}).

Moreover, as it was also famously established by Denjoy, a little bit of smoothness (e.g., {C^1} with derivative of bounded variation) suffices to preclude the existence of wandering intervals for circle diffeomorphisms, and, actually, some smoothness is needed because there are several examples of {C^1}-diffeomorphisms with any prescribed irrational rotation number and possessing wandering intervals. Nevertheless, as it was pointed out by several authors (including Camelier–Gutierrez, Bressaud–Hubert–Maass, Marmi–Moussa–Yoccoz, …), a high amount of smoothness is not enough to avoid wandering intervals for arbitrary {C^r}-g.i.e.t.s: indeed, there are many examples of piecewise affine interval exchange transformations possessing wandering intervals.

Remark 2 The facts mentioned in the previous two paragraphs partly justify the nomenclature Denjoy counterexample for a {C^r}-g.i.e.t. with irrational rotation number possessing wandering intervals.

In the context of piecewise affine i.e.t.s, the Denjoy counterexamples are also characterized by the behavior of certain Birkhoff sums. More concretely, let {T} be a piecewise affine i.e.t. with irrational rotation number, say {T} is semi-conjugated to a standard i.e.t. {T_0:\bigcup I_{\alpha}^t\rightarrow \bigcup I_{\alpha}^b}. By definition, the logarithm {\log DT} of the slope of {T} is constant on the continuity intervals of {T} and, hence, it allows us to naturally define a function {w} taking a constant value {w_{\alpha}} on each continuity interval {I_{\alpha}^t} of {T_0}. In this setting, it is possible to prove (see, e.g., Subsection 3.3.2 of the Marmi–Moussa–Yoccoz paper) that {T} has wandering intervals if and only if there exists a point {x^*\in I=\bigcup I_{\alpha}^t} with bi-infinite {T_0}-orbit such that

\displaystyle \sum\limits_{n\in\mathbb{Z}} \exp(S_n w(x^*))<\infty

where the Birkhoff sum {S_nw(x^*)} at a point {x^*} with orbit {T_0^j(x^*)\in \textrm{int}(I_{\alpha_j}^t)} for all {j\in\mathbb{Z}} is defined as {S_nw(x^*)=\sum\limits_{j=0}^{n-1}w_{\alpha_j}}, resp. {-\sum\limits_{j=n}^{-1}w_{\alpha_j}}, for {n\geq 0}, resp. {n<0}.

For our subsequent purposes, it is worth recording the following interesting (direct) consequence of this “Birkhoff sums” characterization of piecewise affine Denjoy counterexamples:

Proposition 1 Let {T} be a piecewise affine i.e.t. topologically semi-conjugated to a standard, minimal i.e.t. {T_0}. Denote by {w} the piecewise constant function associated to the logarithms of the slopes of {T}. If {\liminf\limits_{n\rightarrow\infty} |S_n w(y)|<\infty} for all {y} with bi-infinite {T_0}-orbit, then {T} is topologically conjugated to {T_0} (i.e., {T} is not a Denjoy counterexample).

1.3. Special Birkhoff sums and the Kontsevich–Zorich cocycle

An elementary step of the Rauzy–Veech algorithm {\mathcal{R}} replaces a standard, minimal i.e.t. {T_0} on an interval {I=\bigcup\limits_{\alpha\in\mathcal{A}} I_{\alpha}^t} by a standard, minimal i.e.t. {\mathcal{R}(T_0)} given by the first return map of {T_0} on an appropriate subinterval {J=\bigcup\limits_{\alpha\in\mathcal{A}} J_{\alpha}^t\subset I}.

The special Birkhoff sum {\mathcal{S}} associated to an elementary step {\mathcal{R}} is the operator mapping a function {\phi:I\rightarrow \mathbb{R}} to the function {\mathcal{S}\phi(x)=S_{r(x)}\phi(x):=\sum\limits_{j=0}^{r(x)-1}\phi(T_0^j(x))}, {x\in J}, where {r(x)} stands for the first return time of {x} to {J}.

The special Birkhoff sum operator {\mathcal{S}} preserves the space of piecewise constant functions in the sense that {\mathcal{S}\phi} is constant on each {J_{\alpha}^t} whenever {\phi} is constant on each {I_{\beta}^t}. In particular, the restriction of {\mathcal{S}} to the space of such piecewise constant functions gives rise to a matrix {B:\mathbb{R}^{\mathcal{A}}\rightarrow \mathbb{R}^{\mathcal{A}}}. The family of matrices obtained from the successive iterates of the Rauzy–Veech algorithm provides a concrete description of the so-called Kontsevich–Zorich cocycle.
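For a single elementary step on a standard i.e.t., the matrix {B} is easy to describe: the return time to {J} equals {1} except on one subinterval, where it equals {2}. The Python sketch below illustrates this under the usual "winner/loser" labelling (the longer of the two rightmost intervals is the winner, the shorter one is the loser); the function name `special_birkhoff_step` and this labelling convention are assumptions made for the illustration, not notation taken from the text above.

```python
def special_birkhoff_step(top, bottom, lengths, phi):
    """Special Birkhoff sum of a piecewise constant function (sketch).

    top, bottom: letters in the orders given by pi_t and pi_b.
    lengths: dict letter -> interval length of the standard i.e.t.
    phi: dict letter -> constant value of the function on that interval.
    Returns the values of S(phi) on the intervals of the induced i.e.t.
    In matrix terms this is B = Id + E_{loser, winner}.
    """
    a, b = top[-1], bottom[-1]
    if lengths[a] == lengths[b]:
        raise ValueError("step undefined: the two last intervals have equal length")
    winner, loser = (a, b) if lengths[a] > lengths[b] else (b, a)
    new_phi = dict(phi)
    # Points labelled by the loser visit the winner's interval once
    # before returning, so their return time is 2.
    new_phi[loser] = phi[loser] + phi[winner]
    return new_phi


# Rotation-like example on [0, 5): A = [0, 3) and B = [3, 5) on top.
lengths = {"A": 3, "B": 2}
phi = {"A": 1, "B": 10}
s_phi = special_birkhoff_step(["A", "B"], ["B", "A"], lengths, phi)
print(s_phi)  # {'A': 1, 'B': 11}
```

In this example the induced map lives on {[0,3)} with new lengths {1} and {2}, and one can check by hand that the integral is preserved: {3\cdot 1+2\cdot 10 = 1\cdot 1+2\cdot 11 = 23}.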

In summary, the behaviour of special Birkhoff sums (i.e., Birkhoff sums at certain “return” times) of piecewise constant functions is described by the Kontsevich–Zorich cocycle. Therefore, in view of Proposition 1, it is probably not surprising to the reader at this point that the Lyapunov exponents of the Kontsevich–Zorich cocycle will have something to do with the presence or absence of piecewise affine Denjoy counterexamples.

1.4. Eierlegende Wollmilchsau and Ornithorynque

The Eierlegende Wollmilchsau and Ornithorynque are two remarkable translation surfaces {M_{EW}} and {M_{O}} of genera {3} and {4} obtained from finite branched covers of the torus {\mathbb{T}^2}. Among their several curious features, we would like to point out the following fact proved by Jean-Christophe Yoccoz and myself: if {T_0} is a standard i.e.t. on {\#\mathcal{A}=9} or {10} intervals (resp.) associated to the first return map of the translation flow {V} in a typical direction on {M_{EW}} or {M_{O}} (resp.), then there are vectors {q_V}, {p_{T_0}} and a {(\#\mathcal{A}-2)}-dimensional vector subspace {H} such that {\mathbb{R}^{\mathcal{A}} = \mathbb{R} q_V\oplus H\oplus \mathbb{R} p_{T_0}} is an equivariant decomposition with respect to the matrices of the Kontsevich–Zorich cocycle with the following properties:

  • (a) {q_V} generates the Oseledets direction of the top Lyapunov exponent {\theta_1>0};
  • (b) {p_{T_0}} generates the Oseledets direction of the smallest Lyapunov exponent {-\theta_1};
  • (c) the matrices of the Kontsevich–Zorich cocycle act on {H} through a finite group.

In the literature, the Lyapunov exponents {\pm\theta_1} are usually called the tautological exponents of the Kontsevich–Zorich cocycle. In this terminology, the third item above is saying that all non-tautological Lyapunov exponents of the Kontsevich–Zorich cocycle associated to {M_{EW}} and {M_{O}} vanish.

In the next two sections, we will see that this curious behaviour of the Kontsevich–Zorich cocycle of {M_{EW}} or {M_{O}} along {H} allows us to construct plenty of piecewise affine i.e.t.s which are {C^0} but not {C^1} conjugated to standard (and uniquely ergodic) i.e.t.s.

2. “Il n’y a pas de contre-exemple de Denjoy affine par morceaux issu de {M_{EW}} et {M_{O}}”

In this section (whose title is an obvious reference to a famous article by Jean-Christophe Yoccoz), we will see that the Eierlegende Wollmilchsau and Ornithorynque never produce piecewise affine Denjoy counterexamples with irrational rotation number of “bounded type”.

More precisely, let us consider a piecewise affine i.e.t. {T} topologically semi-conjugated to {T_0} coming from (the first return map of the translation flow in the direction of a pseudo-Anosov homeomorphism of) {M_{EW}} or {M_{O}}. It is well-known that the piecewise constant function {w} associated to the logarithms of the slopes {DT} of {T} belongs to {H\oplus \mathbb{R} p_{T_0}} (see, e.g., Section 3.4 of the Marmi–Moussa–Yoccoz paper). In order to simplify the exposition, we assume that the “irrational rotation number” {T_0} has “bounded type”, that is, {T_0} is self-similar in the sense that some iterate {\mathcal{R}^k(T_0)} of {T_0} under the Rauzy–Veech algorithm actually coincides with {T_0} up to scaling.

If {w\in H}, then the item (c) from Subsection 1.4 above implies that all special Birkhoff sums of {w} (in the future and in the past) are bounded. From this fact, we conclude that {\liminf\limits_{n\rightarrow\infty} |S_nw(y)|\leq C} for all {y} with bi-infinite {T_0}-orbit: indeed, as it is explained in detail in the Bressaud–Bufetov–Hubert article, if {T_0} is self-similar, then the orbits of {T_0} can be described by a substitution on a finite alphabet {\mathcal{A}} and this allows us to select a bounded subsequence of {S_nw(y)} thanks to the repetition of certain words in the prefix-suffix decomposition.

In particular, it follows from Proposition 1 above that there is no Denjoy counterexample among the piecewise affine i.e.t.s {T} topologically semi-conjugated to a self-similar standard i.e.t. {T_0} coming from {M_{EW}} or {M_O} such that {w\in H}.

Remark 3 Actually, it is possible to exploit the fact that {p_{T_0}} is a stable vector (i.e., it generates the Oseledets space of a negative Lyapunov exponent) to remove the constraint “{w\in H}” from the statement of the previous paragraph.

In other words, we showed that any {w\in H} always provides a piecewise affine i.e.t. {C^0}-conjugated to {T_0}. Note that this is a relatively rich family of piecewise affine i.e.t.s because {H} is a vector space of dimension {7}, resp. {8}, when {T_0} is a self-similar standard i.e.t. coming from {M_{EW}}, resp. {M_O}.

3. Cohomological obstructions to {C^1} conjugations

Closing this post, we will show that the elements {w\in H\setminus\{0\}} always lead to piecewise affine i.e.t.s which are not {C^1} conjugated to self-similar standard i.e.t.s of {M_{EW}} or {M_O}. Of course, this shows that the {C^0} and {C^1} conjugacy classes of a self-similar standard i.e.t. of {M_{EW}} or {M_O} are distinct and, a fortiori, the Marmi–Moussa–Yoccoz conjecture about the coincidence of {C^0} and {C^1} conjugacy classes of standard i.e.t.s becomes false if we remove “for almost all standard i.e.t.s” from its statement.

Suppose that {T} is a piecewise affine i.e.t. {C^1}-conjugated to a self-similar standard i.e.t. {T_0} of {M_{EW}} or {M_O}, say {T\circ h = h\circ T_0} for some {C^1}-diffeomorphism {h}. By taking derivatives, we get

\displaystyle (DT\circ h) \cdot h' = h'\circ T_0

since {T_0} is an isometry. Of course, we recognize the slope of {T} on the left-hand side of the previous equation. So, by taking logarithms, we obtain

\displaystyle w=\Psi\circ T_0 - \Psi

where {\Psi:=\log h'} is a {C^0} function. In other terms, {\Psi} is a solution of the cohomological equation and {w} is a {C^0}-coboundary. Hence, the Birkhoff sums {S_nw=\Psi\circ T_0^n-\Psi} are bounded and, by continuity of {\Psi}, the special Birkhoff sums {\mathcal{S}w} of {w} converge to zero. Equivalently, {w\in\mathbb{R}^{\mathcal{A}}} belongs to the weak stable space of the Kontsevich–Zorich cocycle (compare with Remark 3.9 of Marmi–Moussa–Yoccoz paper).
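The telescoping identity {S_nw=\Psi\circ T_0^n-\Psi} behind this argument can be checked numerically in the simplest case. The Python sketch below uses a circle rotation (an i.e.t. on 2 intervals) in place of {T_0} and the function {\sin(2\pi x)} as a stand-in for {\Psi}; both choices, as well as the names `T0`, `psi`, `w` and `birkhoff_sum`, are assumptions made only for this illustration.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # rotation angle (arbitrary irrational choice)


def T0(x):
    """Rotation by ALPHA on R/Z, a stand-in for the standard i.e.t. T_0."""
    return (x + ALPHA) % 1.0


def psi(x):
    """Continuous function on the circle, a stand-in for Psi = log h'."""
    return math.sin(2 * math.pi * x)


def w(x):
    """The coboundary w = psi o T0 - psi."""
    return psi(T0(x)) - psi(x)


def birkhoff_sum(x, n):
    """S_n w(x) = w(x) + w(T0 x) + ... + w(T0^{n-1} x) for n >= 0."""
    total = 0.0
    for _ in range(n):
        total += w(x)
        x = T0(x)
    return total


x0, n = 0.1, 1000
xn = x0
for _ in range(n):
    xn = T0(xn)

# The sum telescopes: S_n w(x0) = psi(T0^n x0) - psi(x0), up to rounding.
gap = abs(birkhoff_sum(x0, n) - (psi(xn) - psi(x0)))
print(gap < 1e-9)
```

In particular, the Birkhoff sums of this {w} never exceed {2\sup|\Psi|=2} in absolute value, which is the boundedness used in the argument.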

However, the item (c) from Subsection 1.4 above tells us that the Kontsevich–Zorich cocycle acts on {w\in H\setminus\{0\}} through a finite group of matrices and, thus, {w\in H\setminus\{0\}} cannot converge to zero under the Kontsevich–Zorich cocycle.

This contradiction proves that {T} is not {C^1}-conjugated to {T_0}, as desired.

Patrice Le Calvez and Jean-Christophe Yoccoz showed in 1997 that there are no minimal homeomorphisms on the infinite annulus {\mathbb{R}/\mathbb{Z}\times\mathbb{R}}.

Their beautiful paper was motivated by the quest of finding minimal homeomorphisms on punctured spheres {\mathbb{S}^2\setminus\{p_1,\dots,p_k\}}. More concretely, the non-existence of such homeomorphisms was previously known when {k=0} (as an easy application of the features of Lefschetz indices), {k=1} (thanks to the works of Brouwer and Guillou), and {k\geq 3} (thanks to the work of Handel), so that the main result in Jean-Christophe and Patrice's paper ensures the non-existence of minimal homeomorphisms in the remaining (harder) case of {k=2}.

A key step in Jean-Christophe and Patrice's proof of their theorem above is to establish the following result about the sequence of Lefschetz indices {i(f^k,z)} of the iterates {f^k} of a local homeomorphism {f} of the plane at a fixed point {z} of {f}: if {z} is neither a sink nor a source, then there are integers {q, r\geq 1} such that

i(f^k,z) = \left\{\begin{array}{cc} 1-rq & \textrm{ if }k\in q\mathbb{Z} \\ 1 & \textrm{ otherwise } \end{array}\right.
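As a small illustration, the following Python sketch tabulates this sequence of indices for given integers {q} and {r}. The function name `lefschetz_index` and the sample values {q=3}, {r=2} are hypothetical choices made only for this example.

```python
def lefschetz_index(k: int, q: int, r: int) -> int:
    """Index i(f^k, z) according to the formula above: 1 - r*q if q divides k, else 1."""
    if k < 1 or q < 1 or r < 1:
        raise ValueError("expected k, q, r >= 1")
    return 1 - r * q if k % q == 0 else 1


# Sample values (arbitrary): q = 3 and r = 2.
sequence = [lefschetz_index(k, 3, 2) for k in range(1, 7)]
print(sequence)  # [1, 1, -5, 1, 1, -5]
```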

As it turns out, Jean-Christophe and Patrice planned a sequel to this paper with the idea of extending their techniques to compute the sequences of Lefschetz indices of periodic points of {f} belonging to any given Jordan domain {U} such that {K=\bigcap\limits_{n\in\mathbb{Z}} f^{-n}(U)} is compact.

In fact, this plan was already known when the review of Jean-Christophe and Patrice's paper came out (see here), and, as Patrice told me, some arguments from this promised subsequent work were used in the literature as a sort of folklore.

Nevertheless, a final version of this preprint was never released, and, even worse, some portions of the literature were invoking some arguments from a version of the preprint which was available only to Jean-Christophe (but not to Patrice).

Of course, this situation became slightly problematic when Jean-Christophe passed away, but fortunately Patrice and I were able to locate the final version of the preprint in Jean-Christophe’s mathematical archives. (Here, the word “final” means that all mathematical arguments are present, but the preprint has no abstract, introduction, or other “cosmetic” details.)

After doing some editing (to correct minor typos, add better figures [with the aid of Aline Cerqueira], etc.), Patrice and I are happy to announce that the folklore preprint by Jean-Christophe and Patrice (entitled “Suite des indices de Lefschetz des itérés pour un domaine de Jordan qui est un bloc isolant“) is finally publicly available here. We hope that you will enjoy reading this text (written in French)!

Posted by: matheuscmss | June 26, 2019

Yoccoz book collection at ICTP

The mathematical books of Michel Herman were donated to IMPA’s library by Jean-Christophe Yoccoz in the early 2000s: it amounts to more than 700 books and the complete list of titles can be found here.

This beautiful gesture of donating the books of a great mathematician to a developing country helped in the training of several mathematicians. In particular, I remember that reading Herman’s books during my PhD at IMPA was a singular experience in two aspects: intellectually, it gave me access to many high level mathematical topics, and olfactively, it was curious to get a smell of cigarette smoke out of old books (rather than the “usual” smell). (As I learned later, this experience was fully justified by the facts that Herman was an avid reader and a heavy smoker.)

Of course, this attitude of Jean-Christophe prompted me to discuss with Stefano Marmi an appropriate destination in Africa for Yoccoz’s mathematical books. After some conversations, we contacted ICTP (and, in particular, Stefano Luzzatto) to inquire about the possibility of sending Yoccoz’s books to Senegal (as a sort of “retribution” for the good memories that Jean-Christophe had during his visit to AIMS-Senegal and University of Dakar in December 2011) or Rwanda.

Unfortunately, some organisational difficulties obliged us to split this plan into two parts. More concretely, rather than taking unnecessary risks by rushing to send Yoccoz’s books directly to Africa, last Thursday I sent all of them (a total of 13 boxes weighing approximately 35kg each) to the ICTP library, so that they can already be useful to all ICTP visitors (in particular those coming from developing countries) instead of staying locked up in my office (where they were only sporadically read by me). In this way, we get some extra time to think carefully about the definitive transfer of Yoccoz’s books to Africa while making them already publicly available.

Anyhow, the next time you visit ICTP, I hope that Yoccoz’s books will help you in some way!


Let {S_{g,n}} be a surface of genus {g\geq 0} with {n\geq 0} punctures. Given a Lie group {G}, the {G}-character variety of {S_{g,n}} is the space {X(S_{g,n},G)} of representations {\pi_1(S_{g,n})\rightarrow G} modulo conjugations by elements of {G}.

The mapping class group {\textrm{Mod}(S_{g,n})} of isotopy classes of orientation-preserving diffeomorphisms of {S_{g,n}} acts naturally on {X(S_{g,n},G)}.

The dynamics of mapping class groups on character varieties was systematically studied by Goldman in 1997: in his landmark paper, he showed that the {\textrm{Mod}(S_{g,0})}-action on {X(S_{g,0},SU(2))} is ergodic with respect to Goldman–Huebschmann measure whenever {g\geq 1}.

Remark 1 This nomenclature is not standard: we use it here because Goldman showed here that {X(S_{g,0},SU(2))} has a volume form coming from a natural symplectic structure and Huebschmann proved here that this volume form has finite mass.

The ergodicity result above partly motivates the question of understanding the dynamics of individual elements of mapping class groups acting on {SU(2)}-character varieties.

In this direction, Brown studied in 1998 the actions of elements of {SL(2,\mathbb{Z}) = \textrm{Mod}(S_{1,1})} on the character variety {X(S_{1,1}, SU(2))}. As it turns out, if {\gamma\in \pi_1(S_{1,1})} is a small loop around the puncture, then the {SL(2,\mathbb{Z})}-action on {X(S_{1,1},SU(2))} preserves each level set {\kappa^{-1}(k)}, {k\in\mathbb{R}}, of the function {\kappa:X(S_{1,1},SU(2))\rightarrow\mathbb{R}} sending {[\rho]\in X(S_{1,1},SU(2))} to the trace of the matrix {\rho(\gamma)}. Here, Brown noticed that the dynamics of elements of {SL(2,\mathbb{Z})} on level sets {\kappa^{-1}(k)} with {k} close to {-2} fit the setting of the celebrated KAM theory (assuring the stability of non-degenerate elliptic periodic points of smooth area-preserving maps). In particular, Brown tried to employ Moser’s twisting theorem to conclude that no element of {SL(2,\mathbb{Z})} can act ergodically on all level sets {\kappa^{-1}(k)}, {k\in[-2,2]}.

Strictly speaking, Brown’s original argument is not complete because Moser’s theorem is used without checking the twist condition.

In the sequel, we revisit Brown’s work in order to show that his conclusions can be derived once one replaces Moser’s twisting theorem by a KAM stability theorem from 2002 due to Rüssmann.

1. Statement of Brown’s theorem

1.1. {SU(2)}-character variety of a punctured torus

Recall that the fundamental group {\pi_1(S_{1,1})} of a once-punctured torus is naturally isomorphic to a free group {F_2} on two generators {\alpha} and {\beta} such that the commutator {[\alpha, \beta]} corresponds to a loop {\gamma} around the puncture of {S_{1,1}}.

Therefore, a representation {\rho:\pi_1(S_{1,1})\rightarrow SU(2)} is determined by a pair of matrices {\rho(\alpha), \rho(\beta)\in SU(2)}, and an element {[\rho]\in X(S_{1,1},SU(2))} of the {SU(2)}-character variety of {S_{1,1}} is determined by the simultaneous conjugacy class {(\phi\rho(\alpha)\phi^{-1}, \phi\rho(\beta)\phi^{-1})}, {\phi\in SU(2)}, of a pair of matrices {(\rho(\alpha), \rho(\beta))\in SU(2)\times SU(2)}.

The traces {x=\textrm{tr}(\rho(\alpha))}, {y=\textrm{tr}(\rho(\beta))} and {z=\textrm{tr}(\rho(\alpha\beta))} of the matrices {\rho(\alpha)}, {\rho(\beta)} and {\rho(\alpha\beta)} provide a useful system of coordinates on {X(S_{1,1}, SU(2))}: algebraically, this is an incarnation of the fact that the ring {\mathbb{R}[SU(2)\times SU(2)]^{SU(2)}} of invariants of {(A,B)\in SU(2)\times SU(2)} is freely generated by the traces of {A}, {B} and {AB}.

In particular, the following proposition expresses the trace of {\rho(\gamma)=\rho([\alpha,\beta])} in terms of {x=\textrm{tr}(\rho(\alpha))}, {y=\textrm{tr}(\rho(\beta))} and {z=\textrm{tr}(\rho(\alpha\beta))}.

Proposition 1 Given {A, B\in SL(2,\mathbb{C})}, one has

\displaystyle \textrm{tr}(ABA^{-1}B^{-1}) = \textrm{tr}(A)^2 + \textrm{tr}(B)^2 + \textrm{tr}(AB)^2 - \textrm{tr}(A)\textrm{tr}(B)\textrm{tr}(AB)-2

Proof: By the Cayley–Hamilton theorem (or a direct calculation), any {M\in SL(2,\mathbb{C})} satisfies {M^2-\textrm{tr}(M) M + \textrm{Id}=0}, i.e., {M+M^{-1} = \textrm{tr}(M) \textrm{Id}}.

Hence, for any {X, Y\in SL(2,\mathbb{C})}, one has

\displaystyle XY+Y^{-1}X^{-1} = \textrm{tr}(XY) \textrm{Id} \quad \textrm{and} \quad XY^{-1}+YX^{-1} = \textrm{tr}(XY^{-1}) \textrm{Id},

so that

\displaystyle \textrm{tr}(XY)+\textrm{tr}(XY^{-1}) = \textrm{tr}(X)\textrm{tr}(Y).

It follows that, for any {A, B\in SL(2,\mathbb{C})}, one has

\displaystyle \textrm{tr}(ABA^{-1}B^{-1})+\textrm{tr}(ABA^{-1}B) = \textrm{tr}(ABA^{-1})\textrm{tr}(B) = \textrm{tr}(B)^2

and

\displaystyle \textrm{tr}(ABA^{-1}B)+\textrm{tr}(AB(A^{-1}B)^{-1}) = \textrm{tr}(AB)\textrm{tr}(A^{-1}B).

Since {\textrm{tr}(AB(A^{-1}B)^{-1}) = \textrm{tr}(A^2) = \textrm{tr}(A)^2-2} and {\textrm{tr}(A^{-1}B) +\textrm{tr}(AB) = \textrm{tr}(A)\textrm{tr}(B)}, the proof of the proposition is complete. \Box
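As a sanity check, the identity of Proposition 1 can be tested on explicit integer matrices of determinant {1}. The Python sketch below does this with exact integer arithmetic; the helper names (`mul`, `inv`, `tr`, `commutator_trace`, `trace_polynomial`) and the sample matrices are hypothetical choices made only for this illustration.

```python
def mul(X, Y):
    """Product of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = X
    (e, f), (g, h) = Y
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def inv(X):
    """Inverse of a 2x2 matrix of determinant 1."""
    (a, b), (c, d) = X
    if a * d - b * c != 1:
        raise ValueError("expected a matrix of determinant 1")
    return ((d, -b), (-c, a))


def tr(X):
    """Trace of a 2x2 matrix."""
    return X[0][0] + X[1][1]


def commutator_trace(A, B):
    """Left-hand side: tr(A B A^{-1} B^{-1})."""
    return tr(mul(mul(A, B), mul(inv(A), inv(B))))


def trace_polynomial(A, B):
    """Right-hand side: x^2 + y^2 + z^2 - xyz - 2 with x, y, z the traces of A, B, AB."""
    x, y, z = tr(A), tr(B), tr(mul(A, B))
    return x * x + y * y + z * z - x * y * z - 2


A = ((2, 1), (1, 1))
B = ((1, 3), (0, 1))
print(commutator_trace(A, B), trace_polynomial(A, B))  # 11 11
```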

1.2. Basic dynamics of {SL(2,\mathbb{Z})} on character varieties

Recall that the mapping class group {\textrm{Mod}(S_{1,1})} is generated by Dehn twists {\tau_{\alpha}} and {\tau_{\beta}} about the generators {\alpha} and {\beta} of {\pi_1(S_{1,1})}. In appropriate coordinates on the once-punctured torus {S_{1,1}}, the isotopy classes of these Dehn twists are represented by the actions of the matrices

\displaystyle \tau_{\alpha} = \left(\begin{array}{cc}1&1\\0&1\end{array}\right), \tau_{\beta} = \left(\begin{array}{cc}1&0\\1&1\end{array}\right) \in SL(2,\mathbb{Z})

on the flat torus {\mathbb{R}^2/\mathbb{Z}^2}. In particular, at the homotopy level, the actions of {\tau_{\alpha}} and {\tau_{\beta}} on {\pi_1(S_{1,1})} are given by the Nielsen transformations

\displaystyle \tau_{\alpha}(\alpha)=\alpha, \quad \tau_{\alpha}(\beta)=\beta\alpha, \quad \tau_{\beta}(\alpha) = \alpha\beta, \quad \tau_{\beta}(\beta)=\beta. \ \ \ \ \ (1)

Since the elements of {\textrm{Mod}(S_{1,1})=SL(2,\mathbb{Z})} fix the puncture of {S_{1,1}}, they preserve the homotopy class {\gamma=[\alpha,\beta]\in\pi_1(S_{1,1})} of a small loop around the puncture. Therefore, the {\textrm{Mod}(S_{1,1})}-action on the character variety {X(S_{1,1}, SU(2))} respects the level sets {\kappa^{-1}(k)}, {k\in[-2,2]}, of the function {\kappa: X(S_{1,1}, SU(2))\rightarrow [-2,2]} given by

\displaystyle \kappa([\rho]) := \textrm{tr}(\rho(\gamma)).

Furthermore, each level set {\kappa^{-1}(k)}, {-2<k\leq 2}, carries a finite (Goldman–Huebschmann) measure coming from a natural {\textrm{Mod}(S_{1,1})}-invariant symplectic structure.

In this context, the level set {\kappa^{-1}(2)} corresponds to imposing the restriction {\rho(\gamma)=\textrm{Id}\in SU(2)}, so that {\kappa^{-1}(2)} is naturally identified with the character variety {X(S_{1,0}, SU(2))}.

In terms of the coordinates {x=\textrm{tr}(\rho(\alpha))}, {y=\textrm{tr}(\rho(\beta))} and {z=\textrm{tr}(\rho(\alpha\beta))} on {X(S_{1,1}, SU(2))}, we can use Proposition 1 (and its proof) and (1) to check that

\displaystyle \kappa(x,y,z)=x^2+y^2+z^2-xyz-2 \ \ \ \ \ (2)

and

\displaystyle \tau_{\alpha}(x,y,z)= (x,z,xz-y), \quad \tau_{\beta}^{-1}(x,y,z)=(xy-z,y,x). \ \ \ \ \ (3)

Hence, we see from (2) that:

  • the level set {\kappa^{-1}(-2)} consists of a single point {(0,0,0)};
  • the level sets {\kappa^{-1}(k)}, {-2<k<2}, are diffeomorphic to {2}-spheres;
  • the character variety {X(S_{1,1},SU(2))} is a {3}-dimensional orbifold whose boundary {\kappa^{-1}(2)} is a topological sphere with 4 singular points (of coordinates {2(\varepsilon_1,\varepsilon_2,\varepsilon_3)\in\{-2,2\}^3} with {\varepsilon_1\varepsilon_2\varepsilon_3=1}) corresponding to the character variety {X(S_{1,0}, SU(2))}.
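The formulas (2) and (3) are easy to test by machine. The following Python sketch checks, with exact rational arithmetic, that the two polynomial maps in (3) preserve {\kappa}, that {\kappa(0,0,0)=-2}, and that exactly four points of {\{-2,2\}^3} lie on {\kappa^{-1}(2)}. The function names and the sample point `p` are hypothetical choices made only for this illustration.

```python
from fractions import Fraction
from itertools import product


def kappa(x, y, z):
    """The function kappa from (2) in trace coordinates."""
    return x * x + y * y + z * z - x * y * z - 2


def tau_alpha(x, y, z):
    """Action of tau_alpha from (3)."""
    return (x, z, x * z - y)


def tau_beta_inv(x, y, z):
    """Action of tau_beta^{-1} from (3)."""
    return (x * y - z, y, x)


# An arbitrary rational sample point: the invariance is a polynomial identity.
p = (Fraction(1, 2), Fraction(-1, 3), Fraction(3, 4))
invariant = (kappa(*tau_alpha(*p)) == kappa(*p) == kappa(*tau_beta_inv(*p)))

# Points 2*(e1, e2, e3) with e1*e2*e3 = 1: the four singular points of kappa^{-1}(2).
corners = [
    (2 * e1, 2 * e2, 2 * e3)
    for e1, e2, e3 in product((-1, 1), repeat=3)
    if e1 * e2 * e3 == 1
]

print(invariant, kappa(0, 0, 0), [kappa(*c) for c in corners])
```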

After this brief discussion of some geometrical aspects of {X(S_{1,1}, SU(2))}, we are ready to begin the study of the dynamics of {\textrm{Mod}(S_{1,1})}. For this sake, recall that the elements of {\textrm{Mod}(S_{1,1})=SL(2,\mathbb{Z})} are classified into three types:

  • {g\in SL(2,\mathbb{Z})} is called elliptic whenever {|\textrm{tr}(g)|<2};
  • {g\in SL(2,\mathbb{Z})} is called parabolic whenever {|\textrm{tr}(g)|=2};
  • {g\in SL(2,\mathbb{Z})} is called hyperbolic whenever {|\textrm{tr}(g)|>2}.

The elliptic elements {g\in SL(2,\mathbb{Z})} have finite order (because {\textrm{tr}(g)= 0, \pm 1} and {g^2-\textrm{tr}(g) g + \textrm{Id}=0}) and the parabolic elements {g\in SL(2,\mathbb{Z})} are conjugated to {\pm\tau_{\alpha}^n} for some {n\in\mathbb{Z}}.
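The trichotomy above only depends on the trace, so it can be sketched in a few lines of Python. The function name `classify` is a hypothetical helper, and the three sample matrices are chosen here just to exhibit one element of each type.

```python
def classify(g):
    """Classify an element of SL(2, Z) by the absolute value of its trace."""
    (a, b), (c, d) = g
    if a * d - b * c != 1:
        raise ValueError("expected an integer matrix of determinant 1")
    t = abs(a + d)
    if t < 2:
        return "elliptic"
    if t == 2:
        return "parabolic"
    return "hyperbolic"


print(classify(((0, -1), (1, 0))))  # elliptic
print(classify(((1, 1), (0, 1))))   # parabolic
print(classify(((2, 1), (1, 1))))   # hyperbolic
```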

In particular, if {g\in SL(2,\mathbb{Z})} is elliptic, then {g} leaves invariant non-trivial open subsets of each level set {\kappa^{-1}(k)}, {-2<k\leq 2}. Moreover, if {g\in SL(2,\mathbb{Z})} is parabolic, then {g} preserves a non-trivial and non-peripheral element {\delta\in\pi_1(S_{1,1})} and, a fortiori, {g} preserves the level sets of the function {f_{\delta}: X(S_{1,1}, SU(2))\rightarrow [-2,2]}, {f_{\delta}([\rho]) := \textrm{tr}(\rho(\delta))}. Since any such function {f_{\delta}} has a non-constant restriction to any level set {\kappa^{-1}(k)}, {-2<k\leq 2}, Brown concluded that:

Proposition 2 (Proposition 4.3 of Brown’s paper) If {g\in SL(2,\mathbb{Z})} is not hyperbolic, then its action on {\kappa^{-1}(k)} is not ergodic whenever {-2<k\leq 2}.

On the other hand, Brown observed that the action of any hyperbolic element of {SL(2,\mathbb{Z})} on {\kappa^{-1}(2)} can be understood via a result of Katok.

Proposition 3 (Theorem 4.1 of Brown’s paper) Any hyperbolic element of {SL(2,\mathbb{Z})} acts ergodically on {\kappa^{-1}(2)}.

Proof: The level set {\kappa^{-1}(2)} is the character variety {X(S_{1,0}, SU(2))}. In other words, a point in {\kappa^{-1}(2)} represents the simultaneous conjugacy class of a pair {(\rho(\alpha), \rho(\beta))} of commuting matrices in {SU(2)}.

Since a maximal torus of {SU(2)} is a conjugate of the subgroup

\displaystyle T = \left\{\left(\begin{array}{cc} e^{2\pi i\theta} & 0 \\ 0 & e^{-2\pi i\theta}\end{array}\right):\theta\in\mathbb{R}/\mathbb{Z}\right\},

we have that {X(S_{1,0}, SU(2))} is the set of simultaneous conjugacy classes of elements of {T\times T}. In view of the action by conjugation

\displaystyle \left(\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right) \left(\begin{array}{cc} e^{i\theta} & 0 \\ 0 & e^{-i\theta}\end{array}\right)\left(\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right)^{-1} = \left(\begin{array}{cc} e^{-i\theta} & 0 \\ 0 & e^{i\theta}\end{array}\right)

of the element {w=\left(\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right)} of the Weyl subgroup of {SU(2)}, we have

\displaystyle X(S_{1,0}, SU(2)) = (T\times T)/w.

In terms of the coordinates {(\theta,\phi)\in\mathbb{R}^2/\mathbb{Z}^2} given by the phases of the elements

\displaystyle (\left(\begin{array}{cc} e^{2\pi i\theta} & 0 \\ 0 & e^{-2\pi i\theta}\end{array}\right), \left(\begin{array}{cc} e^{2\pi i\phi} & 0 \\ 0 & e^{-2\pi i\phi}\end{array}\right))\in T\times T,

the element {w} acts by {(\theta,\phi)\mapsto(-\theta,-\phi)}, so that {X(S_{1,0}, SU(2))} is the topological sphere obtained from the quotient of {\mathbb{R}^2/\mathbb{Z}^2} by its hyperelliptic involution {\iota} (and {X(S_{1,0}, SU(2))} has only four singular points located at the subset {\{0, 1/2\}^2} of fixed points of the hyperelliptic involution). Moreover, an element {\left(\begin{array}{cc} a & b \\ c & d\end{array}\right)\in SL(2,\mathbb{Z})} acts on {T\times T} by mapping {(\theta,\phi)} to {(a\theta+c\phi, b\theta+d\phi)}.

In summary, the action of {SL(2,\mathbb{Z})} on {\kappa^{-1}(2)} is given by the usual {SL(2,\mathbb{Z})}-action on the topological sphere {(\mathbb{R}^2/\mathbb{Z}^2)/\iota} induced from the standard {SL(2,\mathbb{Z})}-action on the torus {\mathbb{R}^2/\mathbb{Z}^2}.

By a result of Katok, it follows that the action of any hyperbolic element of {SL(2,\mathbb{Z})} on {\kappa^{-1}(2)} is ergodic (and actually Bernoulli). \Box
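The identification used in this proof can be tested numerically: a pair of diagonal matrices with phases {(\theta,\phi)} has trace coordinates {(2\cos 2\pi\theta, 2\cos 2\pi\phi, 2\cos 2\pi(\theta+\phi))}, which should lie on {\kappa^{-1}(2)} and be unchanged by {(\theta,\phi)\mapsto(-\theta,-\phi)}. The Python sketch below checks this in floating point; the function names and the sample phases are assumptions made only for this illustration.

```python
import math


def trace_coordinates(theta, phi):
    """Traces (x, y, z) of the diagonal pair with phases (theta, phi) in T x T."""
    x = 2 * math.cos(2 * math.pi * theta)
    y = 2 * math.cos(2 * math.pi * phi)
    z = 2 * math.cos(2 * math.pi * (theta + phi))
    return x, y, z


def kappa(x, y, z):
    """The function kappa in trace coordinates."""
    return x * x + y * y + z * z - x * y * z - 2


theta, phi = 0.123, 0.456  # arbitrary sample phases
on_level_set = abs(kappa(*trace_coordinates(theta, phi)) - 2) < 1e-9

# The involution (theta, phi) -> (-theta, -phi) does not change the traces.
same_point = all(
    abs(u - v) < 1e-9
    for u, v in zip(trace_coordinates(theta, phi), trace_coordinates(-theta, -phi))
)

# The fixed points {0, 1/2}^2 of the involution give the four singular points.
corners = sorted(
    tuple(round(c) for c in trace_coordinates(t, p))
    for t in (0.0, 0.5)
    for p in (0.0, 0.5)
)
print(on_level_set, same_point, corners)
```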

1.3. Brown’s theorem

The previous two propositions raise the question of the ergodicity of the action of hyperbolic elements of {SL(2,\mathbb{Z})} on the level sets {\kappa^{-1}(k)}, {-2<k<2}. The following theorem of Brown provides an answer to this question:

Theorem 4 Let {g} be a hyperbolic element of {SL(2,\mathbb{Z})}. Then, there exists {-2<k<2} such that {g} does not act ergodically on {\kappa^{-1}(k)}.

Very roughly speaking, Brown establishes Theorem 4 along the following lines. One starts by performing a blowup at the origin {\kappa^{-1}(-2)=\{(0,0,0)\}} in order to think of the action of {g} on {X(S_{1,1},SU(2))} as a one-parameter family {g_{(k)}}, {-2\leq k\leq 2}, of area-preserving maps of the {2}-sphere such that {g_{(-2)}} is a finite order element of {SO(3)}. In this way, we have that {g_{(k)}} is a non-trivial one-parameter family going from a completely elliptic behaviour at {k=-2} to a non-uniformly hyperbolic behaviour at {k=2}. This scenario suggests that the conclusion of Theorem 4 can be derived via KAM theory in the elliptic regime.

In the next (and last) section of this post, we revisit Brown’s ideas leading to Theorem 4 (with a special emphasis on its KAM theoretical aspects).

2. Revisited proof of Brown’s theorem

2.1. Blowup of the origin

The origin {\kappa^{-1}(-2)} of the character variety {X(S_{1,1}, SU(2))} can be blown up into a sphere of directions {S_{-2}}. The action of {SL(2,\mathbb{Z})} on {S_{-2}} factors through an octahedral subgroup of {SO(3)}: this follows from the fact that (3) implies that the generators {\tau_{\alpha}} and {\tau_{\beta}} of {SL(2,\mathbb{Z})} act on {S_{-2}} as

\displaystyle \tau_{\alpha}|_{S_{-2}}(\dot{x},\dot{y},\dot{z}) = (\dot{x},\dot{z},-\dot{y}), \quad \tau_{\beta}^{-1}|_{S_{-2}}(\dot{x},\dot{y},\dot{z})=(-\dot{z},\dot{y},\dot{x}).

In this way, each element {g\in SL(2,\mathbb{Z})} is related to a root of unity

\displaystyle \lambda_{-2}(g)\in U(1)=\{w\in \mathbb{C}: |w|=1\}

of order {\leq 4} coming from the eigenvalues of the derivative of {g|_{S_{-2}}} at any of its fixed points.

Example 1 The hyperbolic element {\left(\begin{array}{cc} 2 & 1 \\ 1 & 1\end{array}\right) = \tau_{\alpha}\tau_{\beta}} acts on {S_{-2}} via the element {(\dot{x},\dot{y},\dot{z})\mapsto (\dot{z},-\dot{x},-\dot{y})} of {SO(3)} of order {3}.
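
The claims about the sphere of directions can be checked by brute force. The following Python sketch (all names are ours) encodes the two linear maps displayed above as signed permutation matrices, generates the finite group they span, and computes the orders of its elements, including the element of Example 1. If the formulas above are transcribed correctly, it should report a group with {24} elements whose orders are all {\leq 4}.

```python
def matmul(a, b):
    """Product of two 3x3 integer matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# Each row expresses one output coordinate in terms of (x', y', z').
A = ((1, 0, 0), (0, 0, 1), (0, -1, 0))      # tau_alpha:     (x', y', z') -> (x', z', -y')
B_INV = ((0, 0, -1), (0, 1, 0), (1, 0, 0))  # tau_beta^{-1}: (x', y', z') -> (-z', y', x')

def generated_group(generators):
    """Closure of a finite set of finite-order matrices under multiplication."""
    group, frontier = {IDENTITY}, [IDENTITY]
    while frontier:
        m = frontier.pop()
        for gen in generators:
            new = matmul(gen, m)
            if new not in group:
                group.add(new)
                frontier.append(new)
    return group

def order(m):
    """Smallest n >= 1 such that m^n is the identity."""
    n, power = 1, m
    while power != IDENTITY:
        power, n = matmul(power, m), n + 1
    return n

group = generated_group([A, B_INV])
EXAMPLE_1 = ((0, 0, 1), (-1, 0, 0), (0, -1, 0))  # (x', y', z') -> (z', -x', -y')
print(len(group), sorted({order(m) for m in group}), order(EXAMPLE_1))
```

Since the generators have finite order, closing under products alone already produces their inverses, so no separate inversion step is needed.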

2.2. Bifurcations of fixed points

A hyperbolic element {g\in SL(2,\mathbb{Z})} induces a non-trivial polynomial automorphism of {\mathbb{R}^3} whose restriction to {\kappa^{-1}([-2,2])} describes the action of {g} on {X(S_{1,1}, SU(2))}. In particular, the set {L_g} of fixed points of this polynomial automorphism in {\kappa^{-1}([-2,2])} is a semi-algebraic set of dimension {< 3}.

Actually, it is not hard to exploit the fact that {g} acts on the level sets {\kappa^{-1}(k)}, {k\in[-2,2]}, through area-preserving maps to compute the Zariski tangent space to {L_{g}} in order to verify that {L_g} is one-dimensional (cf. Proposition 5.1 in Brown’s work).

Moreover, this calculation of the Zariski tangent space can be combined with the fact that any hyperbolic element {g\in SL(2,\mathbb{Z})} has a discrete set of fixed points in {\mathbb{R}^2/\mathbb{Z}^2} and, a fortiori, in {\kappa^{-1}(2)=X(S_{1,0}, SU(2))} to get that {L_g} is transverse to the level sets of {\kappa} except at its discrete subset of singular points and, hence, {L_g\cap \kappa^{-1}(k)} is discrete for all {-2\leq k\leq 2} (cf. Proposition 5.2 in Brown’s work).

Example 2 The hyperbolic element {\left(\begin{array}{cc} 2 & 1 \\ 1 & 1\end{array}\right) = \tau_{\alpha}\tau_{\beta}} acts on {X(S_{1,1}, SU(2))} via the polynomial automorphism {(x,y,z)\mapsto (z, zy-x, z(zy-x)-y)} (cf. (3)). Thus, the corresponding set of fixed points is given by the equations

\displaystyle x=z, \quad y=zy-x, \quad z=z(zy-x)-y

describing an embedded curve in {\mathbb{R}^3}.
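
For the reader who wants to experiment with Example 2, here is a small Python sketch with exact rational arithmetic. It assumes the formula {\kappa(x,y,z)=x^2+y^2+z^2-xyz-2}, which is consistent with {\kappa^{-1}(-2)=\{(0,0,0)\}}. It also uses the parametrization {(t, t/(t-1), t)}, {t\neq 1}, obtained by solving the three equations above by hand: the first two give {y=x/(x-1)}, and the third is then automatically satisfied. The function names are ours.

```python
from fractions import Fraction

def kappa(x, y, z):
    """Assumed formula for kappa: x^2 + y^2 + z^2 - xyz - 2."""
    return x * x + y * y + z * z - x * y * z - 2

def g(x, y, z):
    """Polynomial automorphism of Example 2."""
    w = z * y - x
    return (z, w, z * w - y)

def fixed_point(t):
    """Point (t, t/(t-1), t) solving the fixed-point equations for t != 1."""
    t = Fraction(t)
    return (t, t / (t - 1), t)

samples = [
    (Fraction(1, 3), Fraction(-2, 5), Fraction(7, 4)),
    (Fraction(3), Fraction(1, 2), Fraction(-1)),
]
# g maps each level set of kappa to itself.
preserves_kappa = all(kappa(*g(*p)) == kappa(*p) for p in samples)
# Points of the parametrized curve are fixed by g.
on_curve = [fixed_point(t) for t in (0, Fraction(1, 2), 2, Fraction(-3, 2))]
all_fixed = all(g(*p) == p for p in on_curve)
print(preserves_kappa, all_fixed, kappa(*fixed_point(0)), kappa(*fixed_point(2)))
```

The values {t=0} and {t=2} give the origin (where {\kappa=-2}) and the point {(2,2,2)} (where {\kappa=2}), so the fixed curve meets both extreme level sets.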

In general, the eigenvalues {\lambda(p), \lambda(p)^{-1}} of the derivative at {p\in L_g} of the action of a hyperbolic element {g\in SL(2, \mathbb{Z})} on {\kappa^{-1}(\kappa(p))} can be continuously followed along any irreducible component {\ell_g\ni p} of {L_g}.

Furthermore, it is not hard to check that {\lambda} is not constant on {\ell_g} (cf. Lemma 5.3 in Brown’s work). Indeed, this happens because there are only two cases: the first possibility is that {\ell_g} connects {\kappa^{-1}(-2)} and {\kappa^{-1}(2)} so that {\lambda} varies from {\lambda_{-2}(g)\in U(1)} to the unstable eigenvalue of {g} acting on {\mathbb{R}^2/\mathbb{Z}^2}; the second possibility is that {\ell_g} becomes tangent to {\kappa^{-1}(k)} for some {-2<k<2} so that the Zariski tangent space computation mentioned above reveals that {\lambda} varies from {1} (at {\ell_g\cap\kappa^{-1}(k)}) to some value {\neq 1} (at any point of transverse intersection between {\ell_g} and a level set of {\kappa}).

2.3. Detecting Brjuno elliptic periodic points

The discussion of the previous two subsections allows us to show that some portions of the action of a hyperbolic element {g\in SL(2,\mathbb{Z})} fit the assumptions of KAM theory.

Before entering into this matter, recall that {e^{2\pi i\theta}\in U(1)} is Brjuno whenever {\theta} is an irrational number whose continued fraction expansion has convergents {(p_k/q_k)_{k\in\mathbb{N}}} satisfying

\displaystyle \sum\limits_{k=1}^{\infty} \frac{\log q_{k+1}}{q_k}<\infty.

For our purposes, it is important to note that the set of Brjuno elements has full Lebesgue measure in {U(1)}.
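
To get a feeling for the Brjuno condition, one can compute partial sums of the series above from the partial quotients of {\theta}. The sketch below (function names are ours) uses the standard recurrence {q_k=a_kq_{k-1}+q_{k-2}}, {q_{-1}=0}, {q_0=1}, and takes the golden mean, whose partial quotients are all equal to {1}, as a test case. A finite computation of this kind is only an illustration: it cannot decide whether a given number is Brjuno.

```python
import math

def denominators(partial_quotients):
    """Denominators q_1, q_2, ... of the convergents of [0; a_1, a_2, ...],
    via q_k = a_k * q_{k-1} + q_{k-2} with q_{-1} = 0 and q_0 = 1."""
    q_prev, q = 0, 1
    result = []
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        result.append(q)
    return result

def brjuno_partial_sum(partial_quotients):
    """Sum of log(q_{k+1}) / q_k over the available indices k >= 1."""
    qs = denominators(partial_quotients)
    return sum(math.log(q_next) / q for q, q_next in zip(qs, qs[1:]))

# Golden mean: all partial quotients equal 1, so the q_k are Fibonacci numbers.
golden_50 = brjuno_partial_sum([1] * 50)
golden_80 = brjuno_partial_sum([1] * 80)
print(golden_50, golden_80)
```

The two printed values agree to many decimal places, reflecting the fact that the terms {\log q_{k+1}/q_k} decay exponentially when the {q_k} grow like Fibonacci numbers.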

Let {g\in SL(2,\mathbb{Z})} be a hyperbolic element. We have three possibilities for the limiting eigenvalue {\lambda_{-2}(g)\in U(1)}: it is not real, it equals {1} or it equals {-1}.

If the limiting eigenvalue {\lambda_{-2}(g)\in U(1)} is not real, then we take an irreducible component {\ell_g} intersecting the origin {\kappa^{-1}(-2)}. Since {\lambda} varies continuously and is not constant on {\ell_g}, the image {\lambda(\ell_{g})} contains an open subset of {U(1)}. Thus, we can find some {-2<k<2} such that {\{p\}=\ell_g\cap \kappa^{-1}(k)} has a Brjuno eigenvalue {\lambda(p)}, i.e., the action of {g} on {\kappa^{-1}(k)} has a Brjuno fixed point.

If the limiting eigenvalue is {\lambda_{-2}(g)=1}, we use the Lefschetz fixed point theorem on the sphere {\kappa^{-1}(k)} with {k} close to {-2} to locate an irreducible component {\ell_g} of {L_g} such that {\{p_k\}=\ell_g\cap\kappa^{-1}(k)} is a fixed point of positive index of {g|_{\kappa^{-1}(k)}} for {k} close to {-2}. On the other hand, it is known that an isolated fixed point of an orientation-preserving surface homeomorphism which preserves area has index {<2}. Therefore, {p_k} is a fixed point of {g|_{\kappa^{-1}(k)}} of index {1} with multipliers {\lambda(p_k), \lambda(p_k)^{-1}} close to {1} whenever {k} is close to {-2}. Since a hyperbolic fixed point with positive multipliers has index {-1}, it follows that {p_k} is a fixed point with {\lambda(p_k)\in U(1)\setminus\{1\}} when {k} is close to {-2}. In particular, {\lambda(\ell_g)} contains an open subset of {U(1)} and, hence, we can find some {-2<k<2} such that {p_k} has a Brjuno multiplier {\lambda(p_k)}.
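
The index bookkeeping in this paragraph can be made concrete with the standard formula for non-degenerate fixed points: the index is the sign of {\det(I-Df)}, which equals {2-\textrm{tr}(Df)} for an area-preserving map of a surface. The short Python sketch below (the function name is ours) evaluates this sign for the three types of multipliers. It is only a reminder of why positive index and multipliers close to {1} force an elliptic fixed point.

```python
import math

def fixed_point_index(trace):
    """Index of a non-degenerate fixed point of an area-preserving surface map
    whose derivative has the given trace (and determinant 1).

    The index is the sign of det(I - Df) = 2 - trace.
    """
    value = 2 - trace
    if value == 0:
        raise ValueError("degenerate fixed point: 1 is a multiplier")
    return 1 if value > 0 else -1

elliptic = fixed_point_index(2 * math.cos(0.3))     # multipliers exp(+-0.3i)
hyperbolic_pos = fixed_point_index(1.5 + 1 / 1.5)   # multipliers 1.5 and 1/1.5
hyperbolic_neg = fixed_point_index(-1.5 - 1 / 1.5)  # multipliers -1.5 and -1/1.5
print(elliptic, hyperbolic_pos, hyperbolic_neg)
```

Hyperbolic points with negative multipliers also have index {1}, but they are excluded in the argument above because the multipliers of {p_k} are close to {1}.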

If the limiting eigenvalue is {\lambda_{-2}(g)=-1}, then {g^2} is a hyperbolic element with limiting eigenvalue {\lambda_{-2}(g^2)=1}. From the previous paragraph, it follows that we can find some {-2<k<2} such that {\kappa^{-1}(k)} contains a Brjuno elliptic fixed point of {g^2|_{\kappa^{-1}(k)}}.

In any event, the arguments above give the following result (cf. Theorem 4.4 in Brown’s work):

Theorem 5 Let {g\in SL(2,\mathbb{Z})} be a hyperbolic element. Then, there exists {-2<k<2} such that {g|_{\kappa^{-1}(k)}} has a periodic point of period one or two with a Brjuno multiplier.

2.4. Moser’s twisting theorem and Rüssmann’s stability theorem

At this point, the idea to derive Theorem 4 is to combine Theorem 5 with KAM theory ensuring the stability of certain types of elliptic periodic points.

Recall that a periodic point is called stable whenever there are arbitrarily small neighborhoods of its orbit which are invariant. In particular, the presence of a stable periodic point implies the non-ergodicity of an area-preserving map.

A famous stability criterion for fixed points of area-preserving maps is Moser’s twisting theorem. This result can be stated as follows. Suppose that {f} is an area-preserving {C^r}, {r\geq 4}, map having an elliptic fixed point at the origin {(0,0)\in \mathbb{R}^2} with multipliers {e^{2\pi i\theta}}, {e^{-2\pi i\theta}} such that {n\theta\notin\mathbb{Z}} for {n=1, 2, 3, \dots, r}. After performing an appropriate area-preserving change of variables (tangent to the identity at the origin), one can bring {f} into its Birkhoff normal form, i.e., {f} has the form

\displaystyle \left(\begin{array}{c}\xi \\ \eta\end{array}\right) \mapsto \left(\begin{array}{c} \xi\cos(\sum\limits_{n=0}^s\gamma_n(\xi^2+\eta^2)^n)-\eta\sin(\sum\limits_{n=0}^s \gamma_n(\xi^2+\eta^2)^n) \\ \xi\sin(\sum\limits_{n=0}^s \gamma_n(\xi^2+\eta^2)^n)+\eta\cos(\sum\limits_{n=0}^s \gamma_n(\xi^2+\eta^2)^n)\end{array}\right) + h(\xi,\eta)

where {s=[r/2]-1}, {\gamma_0=2\pi\theta}, {\gamma_1, \dots, \gamma_s} are uniquely determined Birkhoff constants and {h(\xi,\eta)} denotes higher order terms.

Theorem 6 (Moser twisting theorem) Let {f} be an area-preserving map as in the previous paragraph. If {\gamma_n\neq0} for some {1\leq n\leq s}, then the origin {(0,0)\in\mathbb{R}^2} is a stable fixed point.

The nomenclature “twisting” comes from the fact that {\gamma_1\neq 0} when {f} is a twist map, i.e., {f} has the form {f(r,\theta)=(r,\theta+\mu(r))} in polar coordinates where {\mu} is a smooth function with {\mu'(0)\neq 0}. In the literature, the condition “{\gamma_n\neq0} for some {n}” is called the twist condition.
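
To visualize what the twist condition means, one can iterate the Birkhoff normal form with the remainder {h} dropped. The Python sketch below (names and numerical values are ours and purely hypothetical) implements this truncated map and checks that it preserves each circle centered at the origin while rotating it by an angle depending on the radius as soon as {\gamma_1\neq 0}.

```python
import math

def birkhoff_normal_form(xi, eta, gammas):
    """Truncated Birkhoff normal form (the remainder h is dropped):
    rotate (xi, eta) by the angle sum_n gammas[n] * (xi^2 + eta^2)^n."""
    rho2 = xi * xi + eta * eta
    angle = sum(gamma * rho2 ** n for n, gamma in enumerate(gammas))
    c, s = math.cos(angle), math.sin(angle)
    return (xi * c - eta * s, xi * s + eta * c)

def rotation_angle(radius, gammas):
    """Angle by which the invariant circle of the given radius is rotated."""
    return sum(gamma * radius ** (2 * n) for n, gamma in enumerate(gammas))

# Hypothetical constants: gamma_0 = 2*pi*theta with theta = 0.1, and gamma_1 = 0.5.
gammas = [2 * math.pi * 0.1, 0.5]
point = (0.3, 0.4)
image = birkhoff_normal_form(*point, gammas)
radius_before = math.hypot(*point)
radius_after = math.hypot(*image)
print(radius_before, radius_after, rotation_angle(0.1, gammas), rotation_angle(0.5, gammas))
```

In this truncated model every circle is invariant. Roughly speaking, the content of KAM theory is that many of these circles survive, slightly deformed, when the remainder {h} is put back.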

Example 3 The Dehn twist {\tau_{\alpha}} induces the polynomial automorphism {\tau_{\alpha}(x,y,z)= (x,z,xz-y)} on {X(S_{1,1}, SU(2))=\kappa^{-1}([-2,2])}. Each level set {\kappa^{-1}(k)}, {-2<k<2}, is a smooth {2}-sphere which is swept out by the {\tau_{\alpha}}-invariant ellipses {C_{k,x_0}} obtained from the intersections between {\kappa^{-1}(k)} and the planes of the form {\{x_0\}\times \mathbb{R}^2}. Goldman observed that, after an appropriate change of coordinates, each {C_{k,x_0}} becomes a circle where {\tau_{\alpha}} acts as a rotation by angle {\cos^{-1}(x_0/2)}. In particular, the restriction of {\tau_{\alpha}} to each level set {\kappa^{-1}(k)} is a twist map near its fixed points {(\pm\sqrt{2+k},0,0)}.
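
The statements of Example 3 are easy to test numerically. The sketch below again assumes the formula {\kappa(x,y,z)=x^2+y^2+z^2-xyz-2} (consistent with the fixed points {(\pm\sqrt{2+k},0,0)} above) and uses names of our choosing. For fixed {x=x_0}, the map {\tau_{\alpha}} acts linearly on {(y,z)} with trace {x_0} and determinant {1}, so that its eigenvalues are {e^{\pm i\cos^{-1}(x_0/2)}} when {|x_0|<2}.

```python
import cmath
import math

def kappa(x, y, z):
    """Assumed formula for kappa: x^2 + y^2 + z^2 - xyz - 2."""
    return x * x + y * y + z * z - x * y * z - 2

def tau_alpha(x, y, z):
    """Polynomial automorphism induced by the Dehn twist of Example 3."""
    return (x, z, x * z - y)

def rotation_angle(x0):
    """Argument of an eigenvalue of the linear map (y, z) -> (z, x0*z - y),
    which has trace x0 and determinant 1."""
    eigenvalue = (x0 + cmath.sqrt(x0 * x0 - 4)) / 2
    return abs(cmath.phase(eigenvalue))

k = 1.0
fixed = (math.sqrt(2 + k), 0.0, 0.0)
p = (0.5, -0.7, 1.1)
# The point (sqrt(2+k), 0, 0) is fixed and lies on the level set kappa = k.
print(tau_alpha(*fixed) == fixed, kappa(*fixed))
# kappa is invariant, and the rotation angle matches acos(x0 / 2).
print(kappa(*p), kappa(*tau_alpha(*p)))
print(rotation_angle(1.0), math.acos(1.0 / 2))
```

Since the angle {\cos^{-1}(x_0/2)} varies with {x_0}, nearby invariant ellipses are rotated by different amounts, which is the twisting behaviour described in the example.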

In his original argument, Brown deduced Theorem 4 from (a weaker version of) Theorem 5 and Moser’s twisting theorem. However, Brown employed Moser’s theorem with {r=4} while checking only the conditions on the multipliers of the elliptic fixed point but not the twist condition {\gamma_1\neq 0}.

As it turns out, it is not obvious how to check the twist condition in Brown’s setting (especially because it is not satisfied at the sphere of directions {S_{-2}}).

Fortunately, Rüssmann discovered that a Brjuno elliptic fixed point of a real-analytic area-preserving map is always stable (independently of twisting conditions):

Theorem 7 (Rüssmann) Any Brjuno elliptic periodic point of a real-analytic area-preserving map is stable.

Remark 2 Actually, Rüssmann obtained the previous result by showing that a real-analytic area-preserving map with a Brjuno elliptic fixed point and vanishing Birkhoff constants (i.e., {\gamma_n=0} for all {n\in\mathbb{N}}) is analytically linearisable. Note that the analogue of this statement in the {C^{\infty}} category is false (as a counterexample is given by {(r,\theta)\mapsto (r,\theta+\rho+e^{-1/r})}).

In any case, at this stage, the proof of Theorem 4 is complete: it suffices to put together Theorems 5 and 7.
