In this previous post here (from 2018), I described some “back of the envelope calculations” (based on private conversations with Scott Wolpert) indicating that some sectional curvatures of the Weil–Petersson (WP) metric could be at least exponentially small in terms of the distance to the boundary divisor of the Deligne–Mumford compactification.

Very roughly speaking, this heuristic computation went as follows: the WP sectional curvature of any {2}-plane can be written as the sum of three terms; for the {2}-planes considered in the previous post, the main term among those three seemed to be a kind of {L^4}-norm of Beltrami differentials with essentially disjoint supports; finally, this {L^4}-type norm was shown to be really small once a certain Green propagator is ignored.

In April 2019, I met Scott during an event at the Simons Center for Geometry and Physics, and I took the opportunity to tell him that one could perhaps show that the measure of the set of {2}-planes leading to tiny WP curvatures is very small using the real-analyticity of the WP metric.

More concretely, my idea was very simple: since the Grassmannian {G} of {2}-planes tangent at a point {p} is a compact space, the WP sectional curvature defines a real-analytic function {c:G\rightarrow(-\infty, 0)}, and we have good upper bounds for {|c|} and all of its derivatives in terms of the distance of {p} to the boundary (see this article here), we can hope to get reasonable estimates for the measure of the sets {\{P\in G: |c(P)|\leq\varepsilon\}} using the techniques of these articles here and here (which are close in spirit to the classical fact [explained in Lemma 3.2 of the Kleinbock–Margulis paper, for instance] that the measure of the set {\{|P|\leq\varepsilon\}} is small whenever {P} is a polynomial function on {[0,1]} whose degree is bounded and whose {C^0}-norm is bounded away from zero).
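To make the classical polynomial fact concrete, here is a minimal numerical sketch (it is not taken from the cited articles) estimating the Lebesgue measure of {\{x\in[0,1]: |P(x)|\leq\varepsilon\}} on a uniform grid. The names `sublevel_measure` and `sample_poly`, as well as the choice of polynomial, are illustrative assumptions.

```python
def sublevel_measure(poly, eps, samples=200_000):
    """Estimate the Lebesgue measure of {x in [0,1] : |poly(x)| <= eps}.

    Midpoint-grid approximation: the fraction of sample points where the
    inequality holds. `poly` is any callable; nothing here is specific to
    the Weil-Petersson setting.
    """
    hits = 0
    for k in range(samples):
        x = (k + 0.5) / samples  # midpoint of the k-th grid cell
        if abs(poly(x)) <= eps:
            hits += 1
    return hits / samples


def sample_poly(x):
    # Hypothetical test polynomial of degree 2 with C^0-norm 1/4 on [0,1].
    return (x - 0.5) ** 2


# For this polynomial the sublevel set is |x - 1/2| <= sqrt(eps),
# so its measure is 2*sqrt(eps) as long as sqrt(eps) <= 1/2.
measures = {eps: sublevel_measure(sample_poly, eps) for eps in (1e-2, 1e-4)}
```

In this toy case the measure decays like {\varepsilon^{1/2}}, which is the kind of power decay in {\varepsilon} that one hopes to extract for the sublevel sets of {c}.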

As it turns out, Scott thought that this strategy made some sense and, in particular, he promised to use my suggestion as a motivation to review his arguments concerning WP sectional curvatures.

After several email exchanges with Howard Masur and me, Scott announced that there were some mistakes in the construction of tiny WP sectional curvatures: in a nutshell, one should not restrict the analysis to a single “main term” in the formula for WP sectional curvatures as a sum of three expressions, and one cannot ignore the effect of the Green propagator. More importantly, Scott made a detailed study of these mistakes which ultimately led him to establish polynomial upper bounds for WP sectional curvatures at the heart of his newest preprint available here.

In this post, we will follow closely Scott’s preprint in order to give an outline of the proof of a polynomial upper bound for WP sectional curvatures:

Theorem 1 (Wolpert) Given two integers {g\geq 0} and {n\geq 0} with {3g-3+n\geq 1}, there exists a constant {C(g,n)>0} with the following property. If {\sigma(X)} denotes the product of the lengths of the short geodesics of a hyperbolic surface {X} of genus {g} with {n} cusps whose systole is sufficiently small, then the sectional curvatures of the Weil–Petersson metric at {X} are at most

\displaystyle -C(g,n)\cdot\sigma(X)^7

Remark 1 As it was pointed out by Scott in his preprint, it is likely that this estimate is not optimal: indeed, one expects that the best exponent should be {3} rather than {7}.

In what follows, we’ll assume some familiarity with some basic aspects of the geometry of the Weil–Petersson metric (such as those described in these posts here and here).

1. Weil–Petersson sectional curvatures

Let {X} be a hyperbolic surface of genus {g\geq0} with {n\geq0} cusps. If we write {X=\mathbb{H}/\Gamma}, where {\mathbb{H}} is the usual hyperbolic plane and {\Gamma} is a group of isometries of {\mathbb{H}} describing the fundamental group of {X}, then the holomorphic tangent space at {X} to the moduli space {\mathcal{M}_{g,n}} of Riemann surfaces of genus {g} with {n} punctures is naturally identified with the space {B(\Gamma)} of harmonic Beltrami differentials on {X} (and the cotangent space is related to quadratic differentials).

In this setting, the Weil–Petersson metric is the Riemannian metric {ds^2=2\sum g_{\alpha\overline{\beta}} dt_{\alpha}\overline{dt_{\beta}}} induced by the Hermitian inner product

\displaystyle g_{\alpha\overline{\beta}} = \langle\mu_{\alpha},\mu_{\beta}\rangle := \int_X \mu_{\alpha}\overline{\mu_{\beta}} \, dA

where {\mu_{\alpha}, \mu_{\beta}\in B(\Gamma)} and {dA} is the hyperbolic area form on {X}.

Remark 2 Note that {\langle.,.\rangle} is well-defined: if {\mu=\mu(z)\overline{dz}/dz} and {\nu=\nu(z)\overline{dz}/dz} are Beltrami differentials, then {\mu\overline{\nu}} is a function on {X}.

The Riemann tensor of the Weil–Petersson metric was computed by Wolpert in 1986:

\displaystyle R_{\alpha\overline{\beta}\gamma\overline{\delta}} = (\alpha\overline{\beta}, \gamma\overline{\delta}) + (\alpha\overline{\delta}, \gamma\overline{\beta})

where {(a\overline{b},c\overline{d}) := \int_X (\mu_a\overline{\mu_b}) \mathcal{D}(\mu_c\overline{\mu_d})\, dA} and {\mathcal{D}:=-2(\Delta-2)^{-1}} is an operator related to the Laplace–Beltrami operator {\Delta} on {L^2(X)}.

Remark 3 Our choice of notation here differs from Wolpert’s preprint! Indeed, he denotes the Laplace–Beltrami operator by {D} and he writes {\Delta=-2(D-2)^{-1}}.

The Riemann tensor gives access to nice formulas for the sectional curvatures thanks to the work of Bochner. More concretely, given vectors {v_1} and {v_2} spanning a {2}-plane {P} in the real tangent space to {\mathcal{M}_{g,n}} at {X}, let us take Beltrami differentials {\mu_1} and {\mu_2} such that {v_1=\mu_1+\overline{\mu_1}}, {v_2=\mu_2+\overline{\mu_2}}, and {\{\mu_1,\mu_2\}} is orthonormal. Then, Bochner showed that the sectional curvature of {P} is

\displaystyle K(P)=\frac{R_{1\overline{2}1\overline{2}}-R_{1\overline{2}2\overline{1}}-R_{2\overline{1}1\overline{2}}+R_{2\overline{1}2\overline{1}}}{4g_{1\overline{1}}g_{2\overline{2}}-2|g_{1\overline{2}}|^2-2\textrm{Re}(g_{1\overline{2}})^2} = \frac{R_{1\overline{2}1\overline{2}}-R_{1\overline{2}2\overline{1}}-R_{2\overline{1}1\overline{2}}+R_{2\overline{1}2\overline{1}}}{4}

Hence, by Wolpert’s formula for the Riemann tensor of the WP metric, we see that

\displaystyle K(P) = \frac{2(1\overline{2}, 1\overline{2})-(1\overline{2}, 2\overline{1})-(1\overline{1}, 2\overline{2})-(2\overline{1}, 1\overline{2})-(2\overline{2}, 1\overline{1})+2(2\overline{1}, 2\overline{1})}{4} \ \ \ \ \ (1)

2. Spectral theory of {\mathcal{D}}

Wolpert’s formula for the Riemann tensor of the WP metric hints that the spectral theory of {\mathcal{D}} plays an important role in the study of the WP sectional curvatures.

For this reason, let us review some key properties of {\mathcal{D}} (and we refer to Section 3 of Wolpert’s preprint for more details and references). First, {\mathcal{D}=-2(\Delta-2)^{-1}} is a positive operator on {L^2(X)} whose norm is {1}: these facts follow by integration by parts. Secondly, {\Delta} is essentially self-adjoint on {L^2(X)}, so that {\mathcal{D}} is self-adjoint on {L^2(X)}. Moreover, the maximum principle allows one to show that {\mathcal{D}} is also a positive operator on {C_0(X)} with unit norm. Finally, {\mathcal{D}} has a positive symmetric integral kernel: indeed,

\displaystyle \mathcal{D}f(p) = \int_X G(p,q) f(q) \, dA

where the Green propagator {G} is the Poincaré series

\displaystyle G(p,q)=-2\sum\limits_{\gamma\in\Gamma} Q_1(d(p,\gamma(q)))

associated to an appropriate Legendre function {Q_1}. (Here, {d(.,.)} stands for the hyperbolic distance on {\mathbb{H}}.) For later reference, we recall that {Q_1} has a logarithmic singularity at {0} and {-Q_1(x)\sim e^{-2x}} whenever {x} is large.

3. Negativity of the WP sectional curvatures

Interestingly enough, as it was first noticed by Wolpert in 1986, the spectral features of {\mathcal{D}} described above are sufficient to derive the negativity of WP sectional curvatures from the Cauchy–Schwarz inequality. More precisely, since {\mathcal{D}} is self-adjoint, i.e.,

\displaystyle (a\overline{b},c\overline{d}) := \int_X (\mu_a\overline{\mu_b}) \, \mathcal{D}(\mu_c\overline{\mu_d}) \, dA = \int_X \mathcal{D}(\mu_a\overline{\mu_b}) \, \mu_c\overline{\mu_d}\,dA

and its integral kernel {G} is a real function, a straightforward computation reveals that the equation (1) for the sectional curvature {K(P)} of a {2}-plane {P} can be rewritten as

\displaystyle \begin{array}{rcl} K(P) &=& \frac{2(1\overline{2}, 1\overline{2})-(1\overline{2}, 2\overline{1})-(1\overline{1}, 2\overline{2})-(2\overline{1}, 1\overline{2})-(2\overline{2}, 1\overline{1})+2(2\overline{1}, 2\overline{1})}{4} \\ &=& \frac{4\textrm{Re}(1\overline{2}, 1\overline{2}) -2(1\overline{2}, 2\overline{1}) -2(1\overline{1}, 2\overline{2})}{4}. \end{array}

If we decompose the function {\mu_1\overline{\mu_2}} into its real and imaginary parts, say {\mu_1\overline{\mu_2} = f+ih}, then we see that

\displaystyle \begin{array}{rcl} \textrm{Re}(1\overline{2}, 1\overline{2}) - (1\overline{2}, 2\overline{1}) &=& \left[\int_X f\,\mathcal{D}f \, dA - \int_X h\, \mathcal{D}h \, dA\right] - \left[\int_X f\,\mathcal{D}f \, dA + \int_X h\, \mathcal{D}h \, dA\right] \\ &=& -2\int_X h\, \mathcal{D}h \, dA. \end{array}

Since {\mathcal{D}} is a positive operator, we conclude that {\textrm{Re}(1\overline{2}, 1\overline{2}) - (1\overline{2}, 2\overline{1})\leq 0} and, a fortiori,

\displaystyle K(P)\leq \frac{\textrm{Re}(1\overline{2}, 1\overline{2})-(1\overline{1}, 2\overline{2})}{2} \ \ \ \ \ (2)

The non-positivity of the right-hand side of (2) can be established in three steps. First, the positivity of {\mathcal{D}} also implies that

\displaystyle \textrm{Re}(1\overline{2}, 1\overline{2})\leq \int_X |f|\,\mathcal{D}|f|\,dA\leq \int_X |f|\,\mathcal{D}|\mu_1\overline{\mu_2}|\,dA.

Secondly, the fact that {\mathcal{D}} has a positive integral kernel {G} allows us to apply the Cauchy–Schwarz inequality to get that {\mathcal{D}|uv| =\int G |u v| = \int G^{1/2}|u| G^{1/2}|v| \leq (\mathcal{D}|u|^2)^{1/2} (\mathcal{D}|v|^2)^{1/2}}. Therefore,

\displaystyle \int_X |f|\,\mathcal{D}|\mu_1\overline{\mu_2}|\,dA\leq \int_X |f|\,(\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA\leq \int_X |\mu_1\overline{\mu_2}| \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA

Finally, the Cauchy–Schwarz inequality also says that

\displaystyle \int_X |\mu_1\overline{\mu_2}| \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA\leq \left(\int_X |\mu_1|^2\,(\mathcal{D}|\mu_2|^2)\, dA\right)^{1/2}\left(\int_X |\mu_2|^2\,(\mathcal{D}|\mu_1|^2)\, dA\right)^{1/2}=(1\overline{1},2\overline{2})

In summary, we showed that

\displaystyle (I)\leq (II)\leq (III)\leq (IV)\leq (V)\leq (VI) \ \ \ \ \ (3)

where

\displaystyle \begin{array}{rcl} & &(I):=\textrm{Re}(1\overline{2}, 1\overline{2}), \quad (II):=\int_X |f|\,\mathcal{D}|f|\,dA, \quad (III):=\int_X |f|\,\mathcal{D}|\mu_1\overline{\mu_2}|\,dA, \\ & & (IV):=\int_X |f|\,(\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA, \quad (V):=\int_X |\mu_1\overline{\mu_2}| \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA, \\ & & (VI):= (1\overline{1}, 2\overline{2}) \end{array}

In particular, {(I)\leq (VI)}, so that it follows from (2) that all sectional curvatures {K(P)} of the WP metric are non-positive, i.e., {K(P)\leq 0}.

Actually, it is not hard to derive that {K(P)<0} at this stage: indeed, {K(P)=0} would force a case of equality in the Cauchy–Schwarz inequality and this is not possible in our context because {\{\mu_1,\mu_2\}} is orthonormal.
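The chain of inequalities (3) only uses that the kernel is symmetric, positive as an operator and pointwise positive, so it can be sanity-checked in a finite-dimensional toy model. The sketch below is such a check under loud assumptions: the surface is replaced by {12} points, the Green propagator by the hypothetical kernel {e^{-0.3|i-j|}}, and {\mu_1}, {\mu_2} by random complex vectors (no orthonormality is imposed, since the chain does not need it). All function names are made up for this illustration.

```python
import math
import random


def make_kernel(n, step=0.3):
    # Toy kernel G[i][j] = exp(-step*|i-j|): symmetric, entrywise positive
    # and positive semi-definite. It merely stands in for the Green
    # propagator; it is NOT the actual kernel of the operator D.
    return [[math.exp(-step * abs(i - j)) for j in range(n)] for i in range(n)]


def apply_kernel(G, u):
    # Discrete analogue of (D u)(p) = sum over q of G(p,q) u(q).
    return [sum(g * x for g, x in zip(row, u)) for row in G]


def pair(G, u, v):
    # Discrete analogue of the integral of u * D(v) over X.
    return sum(a * b for a, b in zip(u, apply_kernel(G, v)))


def chain(G, mu1, mu2):
    """Return the six quantities (I), ..., (VI) of (3) in the toy model."""
    prod = [a * b.conjugate() for a, b in zip(mu1, mu2)]  # mu_1 * conj(mu_2)
    absf = [abs(z.real) for z in prod]                    # |f|, f = Re(prod)
    absprod = [abs(z) for z in prod]
    sq1 = [abs(a) ** 2 for a in mu1]
    sq2 = [abs(b) ** 2 for b in mu2]
    d1, d2 = apply_kernel(G, sq1), apply_kernel(G, sq2)
    root = [math.sqrt(x * y) for x, y in zip(d1, d2)]
    return [
        pair(G, prod, prod).real,                    # (I)
        pair(G, absf, absf),                         # (II)
        pair(G, absf, absprod),                      # (III)
        sum(a * r for a, r in zip(absf, root)),      # (IV)
        sum(a * r for a, r in zip(absprod, root)),   # (V)
        pair(G, sq1, sq2),                           # (VI)
    ]


rng = random.Random(0)
n = 12
G = make_kernel(n)
mu1 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
mu2 = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
terms = chain(G, mu1, mu2)
```

The list `terms` should be non-decreasing (up to rounding), and the gaps between consecutive entries show which of the steps are far from equality for a given pair of vectors.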

Remark 4 Philosophically speaking, the “analog” to this argument in the realm of Teichmüller dynamics is Forni’s proof of the spectral gap property {\lambda_2<1} for the Lyapunov exponents of the Teichmüller geodesic flow. In fact, after some computations with variational formulas for the so-called Hodge norm, Forni establishes that {\lambda_2<1} by ruling out an equality case in a certain Cauchy-Schwarz estimate.

4. Reduction of Theorem 1 to bounds on {\mathcal{D}}‘s kernel

The discussion in the previous section says that small WP sectional curvatures correspond to almost equalities in certain Cauchy-Schwarz inequalities.

Hence, a natural strategy towards the proof of Theorem 1 consists in showing that an almost equality in (3) is impossible. In this direction, Wolpert establishes the following result:

Theorem 2 (Wolpert) There are two constants {c_1(g,n)>0} and {c_2(g,n)>0} with the following property. If we have an almost equality

\displaystyle (V)-(I)\leq c_1(g,n)\cdot\sigma(X)^7,

between the terms {(I)} and {(V)} in (3), then {(VI)} and {(I)} cannot be almost equal:

\displaystyle (VI)-(I)\geq c_2(g,n)\cdot\sigma(X)^3

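Of course, Theorem 1 is an immediate consequence of Theorem 2 in view of (2) and the estimate {(VI)-(I)\geq (V)-(I)} (implied by (3)). Indeed, (2) says that {K(P)\leq -\frac{1}{2}((VI)-(I))}. If {(V)-(I)> c_1(g,n)\cdot\sigma(X)^7}, then {(VI)-(I)> c_1(g,n)\cdot\sigma(X)^7} by (3); otherwise, Theorem 2 gives {(VI)-(I)\geq c_2(g,n)\cdot\sigma(X)^3\geq c_2(g,n)\cdot\sigma(X)^7} as soon as {\sigma(X)\leq 1}. In both cases, {K(P)\leq -\frac{1}{2}\min\{c_1(g,n),c_2(g,n)\}\cdot\sigma(X)^7}.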

Thus, it remains only to prove Theorem 2. To this end, we need further spectral information on {\mathcal{D}}, namely, some lower bounds on its kernel {G(p,q)}. In order to illustrate this point, let us now show Theorem 2 assuming the following statement.

Proposition 3 There exists a constant {c_3(g,n)>0} such that

\displaystyle G(p,q)\geq c_3(g,n)\cdot \sigma(X)^3

whenever {p} and {q} do not belong to the cusp region {X_{cusps}} of {X}.

Remark 5 We recall that the cusp region {X_{cusps}} of {X} is a finite union of portions of {X} which are isometric to a punctured disk {\{0<|w|<c_4(g,n)\}} (equipped with the hyperbolic metric {ds^2=(|dw|/(|w|\log|w|))^2}).

For the sake of exposition, let us first establish Theorem 2 when {X} is compact, i.e., {X_{cusps}=\emptyset}, before explaining the extra ingredient needed to treat the general case.

4.1. Proof of Theorem 2 modulo Proposition 3 when {X_{cusps}=\emptyset}

Suppose that {(V)-(I)\leq c_1(g,n)\sigma(X)^7} for a constant {c_1(g,n)} to be chosen later. In this regime, our goal is to show that {(VI)} is “big” and {(II)} is “small”, so that {(VI)-(I)} is necessarily “big”.

We start by quickly showing that {(VI)} is “big”. Since {\mu_1} and {\mu_2} are unit tangent vectors, it follows from Proposition 3 that

\displaystyle (VI)=\int_X |\mu_1|^2\,\mathcal{D}|\mu_2|^2\,dA\geq c_3(g,n) \sigma(X)^3 \ \ \ \ \ (4)

Let us now focus on proving that {(II)} is “small”. Since {(II)-(I)\leq (V)-(I)} (cf. (3)), if we write {\mu_1\overline{\mu_2} = f+ih=f^+-f^-+ih} (where {f^+} and {f^-} are the positive and negative parts of the real part {f} of {\mu_1\overline{\mu_2}}), then we obtain that

\displaystyle \begin{array}{rcl} c_1(g,n)\,\sigma(X)^7\geq (II)-(I) &=& \int_X |f|\,\mathcal{D}|f|\,dA - \textrm{Re}\int_X\mu_1\overline{\mu_2}\,\mathcal{D}(\mu_1\overline{\mu_2})\, dA \\ &=& \int_X f^+\,\mathcal{D}f^+\,dA + 2\int_X f^+\,\mathcal{D}f^-\,dA+\int_X f^-\,\mathcal{D}f^-\,dA \\ & &- \int_X f\,\mathcal{D}f\,dA+\int_X h\,\mathcal{D}h\,dA \\ &=&4\int_X f^+\,\mathcal{D}f^-\,dA+\int_X h\,\mathcal{D}h\,dA. \end{array}

Since {\mathcal{D}} is positive, we derive that {4\int_X f^+\,\mathcal{D}f^-\,dA\leq c_1(g,n)\,\sigma(X)^7}. Thus, if {X} is compact, i.e., {X_{cusps}=\emptyset}, then Proposition 3 says that {G(p,q)\geq c_3(g,n)\,\sigma(X)^3} for all {p,q\in X}. It follows that

\displaystyle 4\,c_3(g,n)\,\sigma(X)^3\int_X f^+\,dA\int_X f^-\,dA\leq c_1(g,n)\,\sigma(X)^7

By orthogonality of {\{\mu_1,\mu_2\}}, we have that {\textrm{Re}\int_X\mu_1\overline{\mu_2}\,dA=0}, i.e., {\int_X f^+\,dA = \int_X f^-\,dA = (1/2) \int_X |f|\,dA}. By plugging this information into the previous inequality, we obtain the estimate

\displaystyle c_3(g,n)\,\left(\int_X |f|\,dA\right)^2\leq c_1(g,n)\,\sigma(X)^4 \ \ \ \ \ (5)

Next, we observe that {(V)-(IV)\leq (V)-(I)} (cf. (3)) in order to obtain that

\displaystyle c_1(g,n)\,\sigma(X)^7\geq (V)-(IV)=\int_X (|\mu_1\overline{\mu_2}|-|f|) \, (\mathcal{D}|\mu_1|^2)^{1/2} (\mathcal{D}|\mu_2|^2)^{1/2}\,dA

On the other hand, Proposition 3 ensures that {\mathcal{D}|\mu_{\ast}|^2\geq c_3(g,n)\,\sigma(X)^3\,\int_X|\mu_{\ast}|^2\,dA} for {\ast=1,2}. Since {\mu_1} and {\mu_2} are unit tangent vectors, one has {\mathcal{D}|\mu_{\ast}|^2\geq c_3(g,n)\,\sigma(X)^3} for {\ast=1,2}. By inserting this inequality into the previous estimate, we derive that

\displaystyle c_1(g,n)\,\sigma(X)^7\geq c_3(g,n)\,\sigma(X)^3\,\int_X (|\mu_1\overline{\mu_2}|-|f|)\,dA \ \ \ \ \ (6)

From (5) and (6), we see that

\displaystyle \int_X |\mu_1\overline{\mu_2}|\,dA\leq \sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)^2+\frac{c_1(g,n)}{c_3(g,n)}\sigma(X)^4\leq 2\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)^2 \ \ \ \ \ (7)

whenever {X} has a sufficiently small systole.

This {L^1} bound on {|\mu_1\overline{\mu_2}|} can be converted into a {C^0} bound thanks to the Cauchy integral formula. More concretely, as it is explained in Section 2 of Wolpert’s preprint, after observing that {|\mu_1\overline{\mu_2}| = |\mu_1\mu_2|} and replacing Beltrami differentials {\mu_1} and {\mu_2} by the dual objects {q_1} and {q_2} (namely, quadratic differentials), we are led to study quartic differentials {q_1q_2}. By the Cauchy integral formula on {\mathbb{H}}, one has

\displaystyle |q_1q_2(ds^2)^{-2}|(p)\leq \frac{1}{\pi}\int_{B(p,1)}|q_1q_2(ds^2)^{-2}|\,dA

On the other hand, if {X=\mathbb{H}/\Gamma} has systole {\rho(X)} and the cusp region {X_{cusps}} is empty, then the injectivity radius at any {p\in X} is {\geq \rho(X)/2}. Thus, there exists a universal constant {c_0>0} such that

\displaystyle |q_1q_2(ds^2)^{-2}|(p)\leq \frac{1}{\pi}\int_{B(p,1)}|q_1q_2(ds^2)^{-2}|\,dA\leq c_0\frac{1}{\rho(X)}\|q_1q_2\|_{L^1(X)}

for all {p\in X}. By combining this inequality with (7), we conclude that

\displaystyle |\mu_1\overline{\mu_2}(p)|\leq 2c_0\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\frac{1}{\rho(X)}\sigma(X)^2\leq 2c_0\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)

for all {p\in X}.

Since {\mathcal{D}} is a positive operator on {C_0(X)} with unit norm (cf. Section 2 above) and {|f|\leq |\mu_1\overline{\mu_2}|}, we have that the previous inequality implies the following {C^0} bound on {\mathcal{D}|f|}:

\displaystyle \mathcal{D}|f|(p)\leq 2c_0\sqrt{\frac{c_1(g,n)}{c_3(g,n)}}\sigma(X)

for all {p\in X}. By combining this estimate with (7), we conclude that

\displaystyle (II)=\int_X |f|\,\mathcal{D}|f|\,dA\leq 4c_0\frac{c_1(g,n)}{c_3(g,n)}\sigma(X)^3 \ \ \ \ \ (8)

In summary, (4) and (8) imply that

\displaystyle (VI)-(I)\geq (VI)-(II)\geq \frac{c_3(g,n)}{2}\cdot\sigma(X)^3:=c_2(g,n)\cdot\sigma(X)^3

for the choice of constant {c_1(g,n):=\frac{c_3(g,n)^2}{8c_0}}. This proves Theorem 2 in the absence of cusp regions.
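The bookkeeping of constants in this last step is easy to double-check with exact arithmetic. The following sketch does so with rational numbers; the function name and the sample values of {c_0} and {c_3} are hypothetical, since the text does not make these constants explicit.

```python
from fractions import Fraction


def gap_coefficient(c0, c3):
    """Coefficient of sigma(X)^3 in the lower bound for (VI) - (II).

    Uses (4): (VI) >= c3 * sigma^3 and (8): (II) <= 4*c0*(c1/c3) * sigma^3,
    together with the choice c1 = c3^2 / (8*c0) made in the text.
    """
    c1 = c3 ** 2 / (8 * c0)
    upper_II = 4 * c0 * c1 / c3  # coefficient appearing in (8)
    lower_VI = c3                # coefficient appearing in (4)
    return lower_VI - upper_II


# Hypothetical sample values of c0 and c3, chosen only for illustration.
sample = gap_coefficient(Fraction(7), Fraction(3, 5))
```

For any positive inputs the result is {c_3/2}, in agreement with the definition of {c_2(g,n)}.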

4.2. Proof of Theorem 2 modulo Proposition 3 when {X_{cusps}\neq\emptyset}

The arguments above for the case {X_{cusps}=\emptyset} also work in the case {X_{cusps}\neq\emptyset} because the cusp regions carry only a tiny fraction of the mass of the relevant functions, Beltrami differentials, etc.

More precisely, as it is explained in Section 2 of Wolpert’s preprint, if the constant {c_4(g,n)>0} is chosen correctly, then the Cauchy integral formula and the Schwarz lemma can be used to prove that

\displaystyle \int_{X_{cusps}}|\varphi (ds^2)^{-2}|\,dA\leq \frac{1}{8}\|\varphi\|_{L^1(X)}

for all holomorphic quartic differentials {\varphi}.

In particular, we do not lose too much information after truncating {\mu_1}, {\mu_2}, etc. to {X\setminus X_{cusps}} and this allows us to repeat the arguments of the case {X_{cusps}=\emptyset} for the corresponding truncated objects {\widetilde{\mu_1}}, {\widetilde{\mu_2}}, etc. without any extra difficulty: see Section 5 of Wolpert’s preprint for more details.

5. Proof of Proposition 3

Closing this post, let us give an idea of the proof of Proposition 3 (and we refer the reader to Section 4 of Wolpert’s preprint for more details).

Since {G(p,q)=-2\sum\limits_{\gamma\in\Gamma} Q_1(d(p,\gamma(q)))} and {-Q_1\sim e^{-2x}} (cf. Section 2 above), our task is reduced to giving lower bounds on the Poincaré series

\displaystyle K(p,q)=\sum\limits_{\gamma\in\Gamma} e^{-2d(p,\gamma(q))}

To this end, let us first recall that a hyperbolic surface {X} has a thick-thin decomposition: the thick portion is the region where the injectivity radius is bounded away from zero by a uniform constant and the thin portion is the complement of the thick region. Geometrically, the thin region is the disjoint union of the cusp region {X_{cusps}} and a finite number of collars around short simple closed geodesics: roughly speaking, a collar consists of the points at distance {\leq w(\alpha)=\log(1/\ell_{\alpha})+O(1)} from a short simple closed geodesic {\alpha} of length {\ell_{\alpha}}.

We can provide lower bounds on {K(p,q)} in terms of the behaviours of simple geodesic arcs connecting {p} and {q} on {X}.

More concretely, let {\theta_{pq}} be the shortest geodesic connecting {p} and {q}. Since {\theta_{pq}} is simple, for adequate choices of the constants defining the collars, {\theta_{pq}} cannot “back track” after entering a collar, i.e., it must connect the two boundary components (rather than going out via the same boundary component). Furthermore, {\theta_{pq}} cannot go very high into a cusp. Thus, if we decompose {\theta_{pq}} according to its visits to the thick region, the collars and the cusps, then the fact that {p,q\in X\setminus X_{cusps}} allows us to check that it suffices to study the passages of {\theta_{pq}} through collars in order to get a lower bound on {K(p,q)}.

Next, if {\eta} is a subarc of {\theta_{pq}} crossing a collar around a short closed geodesic {\alpha}, then we can apply Dehn twists to {\eta} to get a family of simple arcs indexed by {\mathbb{Z}} giving a “contribution” to {K(p,q)} of

\displaystyle \sum\limits_{n\in\mathbb{Z}}e^{-2(2w(\alpha)+|n|\ell_{\alpha})}\geq c_5(g,n)\cdot \ell_{\alpha}^3

for some constant {c_5(g,n)>0} depending only on the topology of {X}. In this way, the desired result follows by putting all “contributions” together.
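The cubic lower bound for the contribution of a collar can be checked numerically. In the sketch below I make the simplifying assumption {w(\alpha)=\log(1/\ell_{\alpha})} (i.e., the {O(1)} term is dropped), so that {e^{-4w(\alpha)}=\ell_{\alpha}^4} and the geometric series sums to {\ell_{\alpha}^4\coth(\ell_{\alpha})}. The function names are illustrative.

```python
import math


def collar_sum(ell, terms=None):
    """Truncation of sum_{n in Z} exp(-2*(2*w + |n|*ell)) with w = log(1/ell).

    Assumption: the O(1) term in the collar width w(alpha) is dropped.
    """
    w = math.log(1.0 / ell)
    if terms is None:
        terms = int(40 / ell)  # the neglected tail is below exp(-80)
    total = math.exp(-4 * w)   # term n = 0
    for n in range(1, terms + 1):
        total += 2 * math.exp(-2 * (2 * w + n * ell))  # terms n and -n
    return total


def collar_sum_closed_form(ell):
    # sum_{n in Z} exp(-2|n|*ell) = coth(ell) and exp(-4w) = ell^4.
    return ell ** 4 / math.tanh(ell)


# Ratio of the collar contribution to ell^3 for two sample lengths.
ratios = {ell: collar_sum(ell) / ell ** 3 for ell in (0.1, 0.01)}
```

Under this assumption the ratio equals {\ell_{\alpha}\coth(\ell_{\alpha})\geq 1}, so the contribution is indeed bounded from below by a multiple of {\ell_{\alpha}^3}; restoring the {O(1)} term only changes the constant.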


The celebrated works of several mathematicians (including Poincaré, Denjoy, …, Arnold, Herman, Yoccoz, …) provide a very satisfactory picture of the dynamics of smooth circle diffeomorphisms:

  • each {C^r}-diffeomorphism {f\in\textrm{Diff}^r(\mathbb{T})} of the circle {\mathbb{T}:=\mathbb{R}/\mathbb{Z}} has a well-defined rotation number {\alpha=\rho(f)} (which can be defined using the cyclic order of its orbits, for instance);
  • {f\in\textrm{Diff}^r(\mathbb{T})} is topologically semi-conjugated to the rigid rotation {R_{\alpha}(x)=x+\alpha} (i.e., {h\circ f=R_{\alpha}\circ h} for a surjective continuous map {h:\mathbb{T}\rightarrow \mathbb{T}}) whenever its rotation number {\alpha=\rho(f)} is irrational;
  • if {f\in\textrm{Diff}^2(\mathbb{T})} has irrational rotation number {\alpha}, then {f} is topologically conjugated to {R_{\alpha}} (i.e., there is an homeomorphism {h:\mathbb{T}\rightarrow\mathbb{T}} such that {h\circ f = R_{\alpha}\circ h});
  • if {f\in\textrm{Diff}^r(\mathbb{T})}, {r\geq 3}, has an irrational rotation number {\alpha} satisfying a Diophantine condition of the form {|\alpha-p/q|\geq c/q^{2+\beta}} for some {c>0}, {(r-1)/2>\beta\geq 0}, and all {p/q\in\mathbb{Q}}, then there exists {h\in\textrm{Diff}^{r-1-\beta-}(\mathbb{T}):= \bigcap\limits_{\varepsilon>0}\textrm{Diff}^{r-1-\beta-\varepsilon}(\mathbb{T})} conjugating {f} and {R_{\alpha}} (i.e., {h\circ f = R_{\alpha}\circ h});
  • etc.

In particular, if {\alpha} has Roth type (i.e., for all {\varepsilon>0}, there exists {c_{\varepsilon}>0} such that {|\alpha-p/q|\geq c_{\varepsilon}/q^{2+\varepsilon}} for all {p/q\in\mathbb{Q}}), then any {f\in\textrm{Diff}^r(\mathbb{T})} with rotation number {\alpha} is {C^{r-1-}} conjugated to {R_{\alpha}} whenever {r>3}. (The nomenclature is motivated by Roth’s theorem saying that any irrational algebraic number has Roth type, and it is well-known that the set of Roth type numbers has full Lebesgue measure in {\mathbb{R}}.)
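As a quick illustration of the Diophantine conditions above, the sketch below evaluates {q^{2+\beta}|\alpha-p/q|} along the continued fraction convergents of the golden mean {\alpha=(\sqrt{5}-1)/2} (a standard example satisfying the condition with {\beta=0}, hence of Roth type). Only convergents are tested, which is the relevant case because any {p/q} with {|\alpha-p/q|<1/(2q^2)} is a convergent; the function names are assumptions made for this example.

```python
import math


def fibonacci_convergents(count):
    """First `count` continued-fraction convergents p/q of (sqrt(5)-1)/2.

    They are ratios of consecutive Fibonacci numbers: 0/1, 1/1, 1/2, 2/3, ...
    """
    p, q = 0, 1
    out = []
    for _ in range(count):
        out.append((p, q))
        p, q = q, p + q
    return out


def diophantine_constants(alpha, convergents, beta=0.0):
    # q^(2+beta) * |alpha - p/q| for each p/q in the list: a positive lower
    # bound over all rationals is the Diophantine condition with exponent beta.
    return [q ** (2 + beta) * abs(alpha - p / q) for p, q in convergents]


golden = (math.sqrt(5) - 1) / 2
constants = diophantine_constants(golden, fibonacci_convergents(20))
```

The values in `constants` stay bounded away from zero and approach {1/\sqrt{5}}, so one can take {\beta=0} for this rotation number.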

In the last twenty years, many authors gave important contributions towards the extension of this beautiful theory.

In this direction, a particularly successful line of research consists in thinking of circle rotations {R_{\alpha}} as standard interval exchange transformations on 2 intervals and trying to build smooth conjugations between generalized interval exchange transformations (g.i.e.t.) and standard interval exchange transformations. In fact, Marmi–Moussa–Yoccoz studied the notion of standard i.e.t. of restricted Roth type (a concept designed so that the circle rotation {R_{\alpha}} has restricted Roth type [when viewed as an i.e.t. on 2 intervals] if and only if {\alpha} has Roth type) and proved that, for any {r\geq 2}, the {C^{r+3}} g.i.e.t.s {T} close to a standard i.e.t. {T_0} of restricted Roth type such that {T} is {C^r}-conjugated to {T_0} form a {C^1}-submanifold of codimension {(g-1)(2r+1)+s} where {T_0} is the first return map to an interval transverse to a translation flow on a translation surface of genus {g\geq 1} and {T_0} is an i.e.t. on {d=2g+s-1} intervals.

An interesting consequence of this result of Marmi–Moussa–Yoccoz is the fact that local conjugacy classes behave differently for circle rotations and arbitrary i.e.t.s. Indeed, a circle rotation is an i.e.t. on 2 intervals associated to the first return map of a translation flow on the torus {\mathbb{T}^2=\mathbb{R}^2/\mathbb{Z}^2}, so that {R_{\alpha}} has genus {g=1} and also {s=1}. Hence, the Marmi–Moussa–Yoccoz theorem says that the local conjugacy class of {R_{\alpha}} with {\alpha} of Roth type has codimension {(g-1)(2r+1)+s=1} regardless of the differentiability scale {r}. Of course, this fact was previously known from the theory of circle diffeomorphisms: by the results of Herman and Yoccoz, the sole obstruction to obtain a smooth conjugation between {f} and {R_{\alpha}} (with {\alpha} of Roth type) is described by a single parameter, namely, the rotation number of {f}. On the other hand, the Marmi–Moussa–Yoccoz theorem says that the codimension

\displaystyle (g-1)(2r+1)+s

of the local conjugacy class of an i.e.t. of restricted Roth type with genus {g\geq 2} grows linearly with the differentiability scale {r}.
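The contrast between genus {1} and higher genus is visible by simply tabulating the formula. The sketch below does this; the helper names are mine, and the genus {2} example with {s=1} is a hypothetical choice of parameters made only for illustration.

```python
def codimension(g, r, s):
    """Codimension (g-1)(2r+1)+s of the local C^r-conjugacy class."""
    return (g - 1) * (2 * r + 1) + s


def number_of_intervals(g, s):
    """Number d = 2g+s-1 of intervals exchanged by the i.e.t."""
    return 2 * g + s - 1


# Circle rotations: g = 1 and s = 1, so the codimension is 1 for every r.
circle = [codimension(1, r, 1) for r in range(2, 7)]

# Hypothetical genus-2 example with s = 1: the codimension is 2r+2.
genus_two = [codimension(2, r, 1) for r in range(2, 7)]
```

The first list is constant while the second one increases by {2(g-1)=2} each time {r} increases by one.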

Remark 1 This indicates that KAM theoretical approaches to the study of the dynamics of g.i.e.t.s might be delicate because the “loss of regularity” in the usual KAM schemes forces the analysis of cohomological equations (linearized versions of the conjugacy problem) in several differentiability scales and the Marmi–Moussa–Yoccoz theorem says that these changes of differentiability scale produce non-trivial effects on the numbers of obstructions (“codimensions”) to solve cohomological equations.

In any case, this interesting phenomenon concerning the codimension of local conjugacy classes of i.e.t.s of genus {g\geq 2} led Marmi–Moussa–Yoccoz to make a series of conjectures (cf. Section 1.2 of their paper) in order to further compare the local conjugacy classes of circle rotations and i.e.t.s of genus {g\geq 2}.

Among these fascinating conjectures, the second open problem in Section 1.2 of Marmi–Moussa–Yoccoz paper asks whether, for almost all i.e.t.s {T_0}, any {C^4} g.i.e.t. {T} with trivial conjugacy invariants (e.g., “simple deformations”) and {C^0} conjugated to {T_0} is also {C^1} conjugated to {T_0}. In other words, the {C^0} and {C^1} conjugacy classes of a typical i.e.t. {T_0} coincide.

In this short post, I would like to transcribe below some remarks made during recent conversations with Pascal Hubert showing that the hypothesis “for almost all i.e.t.s {T_0}” cannot be removed from the conjecture above. In a nutshell, we will see in the sequel that the self-similar standard interval exchange transformations associated to two special translation surfaces (called Eierlegende Wollmilchsau and Ornithorynque) of genera {3} and {4} are {C^0} but not {C^1} conjugated to a rich family of piecewise affine interval exchange transformations. Of course, I think that these examples are probably well-known to experts (and Jean-Christophe Yoccoz was probably aware of them by the time Marmi–Moussa–Yoccoz wrote down their conjectures), but I’m including some details of the construction of these examples here mostly for my own benefit.

Disclaimer: As usual, even though the content of this post arose from conversations with Pascal, all mistakes/errors in the sequel are my sole responsibility.

1. Preliminaries

1.1. Rauzy–Veech algorithm

The notion of “irrational rotation number” for generalized interval exchange transformations relies on the so-called Rauzy–Veech algorithm.

More concretely, given a {C^r}-g.i.e.t. {f:I\rightarrow I} sending a finite partition (modulo zero) {I=\bigcup\limits_{\alpha\in\mathcal{A}} I_{\alpha}^t} of {I} into closed subintervals {I_{\alpha}^t} arranged according to a bijection {\pi_t:\mathcal{A}\rightarrow\{1,\dots,d\}} to a finite partition (modulo zero) {I=\bigcup\limits_{\alpha\in\mathcal{A}} I_{\alpha}^b} of {I} into closed subintervals {I_{\alpha}^b} arranged according to a bijection {\pi_b:\mathcal{A}\rightarrow\{1,\dots,d\}} (via {C^r}-diffeomorphisms {f|_{I_{\alpha}^t}:I_{\alpha}^t\rightarrow I_{\alpha}^b}), an elementary step of the Rauzy–Veech algorithm produces a new {C^r}-g.i.e.t. {\mathcal{R}(f)} by taking the first return map of {f} to the interval {I\setminus J} where {J=I_{\pi_t^{-1}(d)}^t}, resp. {I_{\pi_b^{-1}(d)}^b} whenever {|I_{\pi_t^{-1}(d)}^t|<|I_{\pi_b^{-1}(d)}^b|}, resp. {|I_{\pi_t^{-1}(d)}^t|>|I_{\pi_b^{-1}(d)}^b|} (and {\mathcal{R}(f)} is not defined when {|I_{\pi_t^{-1}(d)}^t| = |I_{\pi_b^{-1}(d)}^b|}).

We say that a {C^r}-g.i.e.t. {f} has irrational rotation number whenever the Rauzy–Veech algorithm {\mathcal{R}} can be iterated indefinitely. This nomenclature is partly justified by the fact that Yoccoz generalized the proof of Poincaré’s theorem in order to establish that a {C^r}-g.i.e.t. {f} with irrational rotation number is topologically semi-conjugated to a standard, minimal i.e.t. {T_0}.
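For standard i.e.t.s, an elementary step of the Rauzy–Veech algorithm can be written purely in terms of the combinatorial data and the lengths. Here is a minimal sketch under that restriction (it does not handle genuine g.i.e.t.s), following the usual combinatorial description of the step, which I am assuming rather than deriving from the definition above; the function names are illustrative.

```python
from fractions import Fraction


def rauzy_veech_step(top, bottom, lengths):
    """One elementary Rauzy-Veech step for a *standard* i.e.t.

    top, bottom: lists of letters giving the order of the intervals before
    and after applying the map (they encode the bijections pi_t and pi_b).
    lengths: dict letter -> length (for a standard i.e.t. the top and bottom
    intervals with the same letter have equal length).
    Returns the new (top, bottom, lengths), or None when the two last
    intervals have equal length and the step is not defined.
    """
    a_t, a_b = top[-1], bottom[-1]
    if lengths[a_t] == lengths[a_b]:
        return None
    new_lengths = dict(lengths)
    if lengths[a_t] > lengths[a_b]:
        # The last top interval is longer: it is shortened, and a_b is
        # moved right after a_t in the bottom row.
        new_lengths[a_t] = lengths[a_t] - lengths[a_b]
        rest = [x for x in bottom if x != a_b]
        k = rest.index(a_t)
        return top[:], rest[:k + 1] + [a_b] + rest[k + 1:], new_lengths
    # The last bottom interval is longer: same rule with the rows swapped.
    new_lengths[a_b] = lengths[a_b] - lengths[a_t]
    rest = [x for x in top if x != a_t]
    k = rest.index(a_b)
    return rest[:k + 1] + [a_t] + rest[k + 1:], bottom[:], new_lengths


def iterate(top, bottom, lengths, max_steps):
    """Apply the step until it is undefined or max_steps is reached."""
    steps = 0
    while steps < max_steps:
        nxt = rauzy_veech_step(top, bottom, lengths)
        if nxt is None:
            break
        top, bottom, lengths = nxt
        steps += 1
    return steps, top, bottom, lengths


# A rotation with rationally dependent lengths: the algorithm stops,
# mimicking the subtractive Euclidean algorithm on (3, 5).
euclid = iterate(["A", "B"], ["B", "A"],
                 {"A": Fraction(3), "B": Fraction(5)}, 100)
```

On {2} intervals the step is the subtractive Euclidean algorithm, so it can be iterated indefinitely exactly when the ratio of the lengths is irrational, which matches the terminology “irrational rotation number”.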

1.2. Denjoy counterexamples

Similarly to Denjoy’s theorem in the case of circle diffeomorphisms, the obstruction to promote topological semi-conjugations between {f} and {T_0} as above into {C^0}-conjugations is the presence of wandering intervals for {f}, i.e., non-trivial intervals {A} whose iterates under {f} are pairwise disjoint (i.e., {f^i(A)\cap f^j(A)=\emptyset} for all {i,j\in\mathbb{Z}}, {i\neq j}).

Moreover, as it was also famously established by Denjoy, a little bit of smoothness (e.g., {C^1} with derivative of bounded variation) suffices to preclude the existence of wandering intervals for circle diffeomorphisms, and, actually, some smoothness is needed because there are several examples of {C^1}-diffeomorphisms with any prescribed irrational rotation number and possessing wandering intervals. Nevertheless, as it was pointed out by several authors (including Camelier–Gutierrez, Bressaud–Hubert–Maass, Marmi–Moussa–Yoccoz, …), a high amount of smoothness is not enough to avoid wandering intervals for arbitrary {C^r}-g.i.e.t.s: indeed, there are many examples of piecewise affine interval exchange transformations possessing wandering intervals.

Remark 2 The facts mentioned in the previous two paragraphs partly justify the nomenclature Denjoy counterexample for a {C^r}-g.i.e.t. with irrational rotation number possessing wandering intervals.

In the context of piecewise affine i.e.t.s, the Denjoy counterexamples are also characterized by the behavior of certain Birkhoff sums. More concretely, let {T} be a piecewise affine i.e.t. with irrational rotation number, say {T} is semi-conjugated to a standard i.e.t. {T_0:\bigcup I_{\alpha}^t\rightarrow \bigcup I_{\alpha}^b}. By definition, the logarithm {\log DT} of the slope of {T} is constant on the continuity intervals of {T} and, hence, it allows us to naturally define a function {w} taking a constant value {w_{\alpha}} on each continuity interval {I_{\alpha}^t} of {T_0}. In this setting, it is possible to prove (see, e.g., Subsection 3.3.2 of the Marmi–Moussa–Yoccoz paper) that {T} has wandering intervals if and only if there exists a point {x^*\in I=\bigcup I_{\alpha}^t} with bi-infinite {T_0}-orbit such that

\displaystyle \sum\limits_{n\in\mathbb{Z}} \exp(S_n w(x^*))<\infty

where the Birkhoff sum {S_nw(x^*)} at a point {x^*} with orbit {T_0^j(x^*)\in \textrm{int}(I_{\alpha_j}^t)} for all {j\in\mathbb{Z}} is defined as {S_nw(x^*)=\sum\limits_{j=0}^{n-1}w_{\alpha_j}}, resp. {-\sum\limits_{j=n}^{-1}w_{\alpha_j}}, for {n\geq 0}, resp. {n<0}.

For our subsequent purposes, it is worth recording the following interesting (direct) consequence of this “Birkhoff sums” characterization of piecewise affine Denjoy counterexamples:

Proposition 1 Let {T} be a piecewise affine i.e.t. topologically semi-conjugated to a standard, minimal i.e.t. {T_0}. Denote by {w} the piecewise constant function associated to the logarithms of the slopes of {T}. If {\liminf\limits_{n\rightarrow\infty} |S_n w(y)|<\infty} for all {y} with bi-infinite {T_0}-orbit, then {T} is topologically conjugated to {T_0} (i.e., {T} is not a Denjoy counterexample).

1.3. Special Birkhoff sums and the Kontsevich–Zorich cocycle

An elementary step of the Rauzy–Veech algorithm {\mathcal{R}} replaces a standard, minimal i.e.t. {T_0} on an interval {I=\bigcup\limits_{\alpha\in\mathcal{A}} I_{\alpha}^t} by a standard, minimal i.e.t. {\mathcal{R}(T_0)} given by the first return map of {T_0} on an appropriate subinterval {J=\bigcup\limits_{\alpha\in\mathcal{A}} J_{\alpha}^t\subset I}.

The special Birkhoff sum {\mathcal{S}} associated to an elementary step {\mathcal{R}} is the operator mapping a function {\phi:I\rightarrow \mathbb{R}} to the function {\mathcal{S}\phi(x)=S_{r(x)}\phi(x):=\sum\limits_{j=0}^{r(x)-1}\phi(T_0^j(x))}, {x\in J}, where {r(x)} stands for the first return time of {x} to {J}.

The special Birkhoff sum operator {\mathcal{S}} preserves the space of piecewise constant functions in the sense that {\mathcal{S}\phi} is constant on each {J_{\alpha}^t} whenever {\phi} is constant on each {I_{\beta}^t}. In particular, the restriction of {\mathcal{S}} to the space of such piecewise constant functions gives rise to a matrix {B:\mathbb{R}^{\mathcal{A}}\rightarrow \mathbb{R}^{\mathcal{A}}}. The family of matrices obtained from the successive iterates of the Rauzy–Veech algorithm provides a concrete description of the so-called Kontsevich–Zorich cocycle.

In summary, the behaviour of special Birkhoff sums (i.e., Birkhoff sums at certain “return” times) of piecewise constant functions is described by the Kontsevich–Zorich cocycle. Therefore, in view of Proposition 1, it is probably not surprising to the reader at this point that the Lyapunov exponents of the Kontsevich–Zorich cocycle will have something to do with the presence or absence of piecewise affine Denjoy counterexamples.

1.4. Eierlegende Wollmilchsau and Ornithorynque

The Eierlegende Wollmilchsau and Ornithorynque are two remarkable translation surfaces {M_{EW}} and {M_{O}} of genera {3} and {4} obtained from finite branched covers of the torus {\mathbb{T}^2}. Among their several curious features, we would like to point out the following fact proved by Jean-Christophe Yoccoz and myself: if {T_0} is a standard i.e.t. on {\#\mathcal{A}=9} or {10} intervals (resp.) associated to the first return map of the translation flow {V} in a typical direction on {M_{EW}} or {M_{O}} (resp.), then there are vectors {q_V}, {p_{T_0}} and a {(\#\mathcal{A}-2)}-dimensional vector subspace {H} such that {\mathbb{R}^{\mathcal{A}} = \mathbb{R} q_V\oplus H\oplus \mathbb{R} p_{T_0}} is an equivariant decomposition with respect to the matrices of the Kontsevich–Zorich cocycle with the following properties:

  • (a) {q_V} generates the Oseledets direction of the top Lyapunov exponent {\theta_1>0};
  • (b) {p_{T_0}} generates the Oseledets direction of the smallest Lyapunov exponent {-\theta_1};
  • (c) the matrices of the Kontsevich–Zorich cocycle act on {H} through a finite group.

In the literature, the Lyapunov exponents {\pm\theta_1} are usually called the tautological exponents of the Kontsevich–Zorich cocycle. In this terminology, the third item above is saying that all non-tautological Lyapunov exponents of the Kontsevich–Zorich cocycle associated to {M_{EW}} and {M_{O}} vanish.

In the next two sections, we will see that this curious behaviour of the Kontsevich–Zorich cocycle of {M_{EW}} or {M_{O}} along {H} allows us to construct plenty of piecewise affine i.e.t.s which are {C^0} but not {C^1} conjugated to standard (and uniquely ergodic) i.e.t.s.

2. “Il n’y a pas de contre-exemple de Denjoy affine par morceaux issu de {M_{EW}} et {M_{O}}”

In this section (whose title is an obvious reference to a famous article by Jean-Christophe Yoccoz), we will see that the Eierlegende Wollmilchsau and Ornithorynque never produce piecewise affine Denjoy counterexamples with irrational rotation number of “bounded type”.

More precisely, let us consider a piecewise affine i.e.t. {T} topologically semi-conjugated to {T_0} coming from (the first return map of the translation flow in the direction of a pseudo-Anosov homeomorphism of) {M_{EW}} or {M_{O}}. It is well-known that the piecewise constant function {w} associated to the logarithms of the slopes {DT} of {T} belongs to {H\oplus \mathbb{R} p_{T_0}} (see, e.g., Section 3.4 of the Marmi–Moussa–Yoccoz paper). In order to simplify the exposition, we assume that the “irrational rotation number” {T_0} has “bounded type”, that is, {T_0} is self-similar in the sense that some iterate {\mathcal{R}^k(T_0)} of {T_0} under the Rauzy–Veech algorithm actually coincides with {T_0} up to scaling.

If {w\in H}, then the item (c) from Subsection 1.4 above implies that all special Birkhoff sums of {w} (in the future and in the past) are bounded. From this fact, we conclude that {\liminf\limits_{n\rightarrow\infty} |S_nw(y)|\leq C} for all {y} with bi-infinite {T_0}-orbit: indeed, as explained in detail in the Bressaud–Bufetov–Hubert article, if {T_0} is self-similar, then the orbits of {T_0} can be described by a substitution on a finite alphabet {\mathcal{A}} and this allows us to select a bounded subsequence of {S_nw(y)} thanks to the repetition of certain words in the prefix-suffix decomposition.

In particular, it follows from Proposition 1 above that there is no Denjoy counterexample among the piecewise affine i.e.t.s {T} topologically semi-conjugated to a self-similar standard i.e.t. {T_0} coming from {M_{EW}} or {M_O} such that {w\in H}.

Remark 3 Actually, it is possible to exploit the fact that {p_{T_0}} is a stable vector (i.e., it generates the Oseledets space of a negative Lyapunov exponent) to remove the constraint “{w\in H}” from the statement of the previous paragraph.

In other words, we showed that any {w\in H} always provides a piecewise affine i.e.t. {C^0}-conjugated to {T_0}. Note that this is a relatively rich family of piecewise affine i.e.t.s because {H} is a vector space of dimension {7}, resp. {8}, when {T_0} is a self-similar standard i.e.t. coming from {M_{EW}}, resp. {M_O}.

3. Cohomological obstructions to {C^1} conjugations

Closing this post, we will show that the elements {w\in H\setminus\{0\}} always lead to piecewise affine i.e.t.s which are not {C^1} conjugated to self-similar standard i.e.t.s of {M_{EW}} or {M_O}. Of course, this shows that the {C^0} and {C^1} conjugacy classes of a self-similar standard i.e.t. of {M_{EW}} or {M_O} are distinct and, a fortiori, the Marmi–Moussa–Yoccoz conjecture about the coincidence of {C^0} and {C^1} conjugacy classes of standard i.e.t.s becomes false if we remove “for almost all standard i.e.t.s” from its statement.

Suppose that {T} is a piecewise affine i.e.t. {C^1}-conjugated to a self-similar standard i.e.t. {T_0} of {M_{EW}} or {M_O}, say {T\circ h = h\circ T_0} for some {C^1}-diffeomorphism {h}. By taking derivatives, we get

\displaystyle (DT\circ h) \cdot h' = h'\circ T_0

since {T_0} is an isometry. Of course, we recognize the slope of {T} on the left-hand side of the previous equation. So, by taking logarithms, we obtain

\displaystyle w=\Psi\circ T_0 - \Psi

where {\Psi:=\log h'} is a {C^0} function. In other terms, {\Psi} is a solution of the cohomological equation and {w} is a {C^0}-coboundary. Hence, the Birkhoff sums {S_nw=\Psi\circ T_0^n-\Psi} are bounded and, by continuity of {\Psi}, the special Birkhoff sums {\mathcal{S}w} of {w} converge to zero. Equivalently, {w\in\mathbb{R}^{\mathcal{A}}} belongs to the weak stable space of the Kontsevich–Zorich cocycle (compare with Remark 3.9 of Marmi–Moussa–Yoccoz paper).
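The telescoping behind the boundedness of {S_nw} can be checked numerically. The following is only a minimal sketch under simplifying assumptions: {T_0} is replaced by a circle rotation (a standard i.e.t. on two intervals), {\Psi} is an arbitrary continuous function chosen for illustration, and the names `rotation`, `psi`, `birkhoff_sum` are hypothetical. In particular, the function {w} below is not piecewise constant, so this illustrates the coboundary identity rather than the actual setting of {M_{EW}} or {M_O}.

```python
import math

# Illustrative irrational rotation angle (an assumption, not from the post).
ALPHA = (math.sqrt(5) - 1) / 2


def rotation(x):
    """Circle rotation x -> x + ALPHA mod 1, standing in for T_0."""
    return (x + ALPHA) % 1.0


def psi(x):
    """An arbitrary continuous function on the circle, standing in for log h'."""
    return math.cos(2 * math.pi * x)


def w(x):
    """The coboundary w = psi o T_0 - psi."""
    return psi(rotation(x)) - psi(x)


def birkhoff_sum(x, n):
    """S_n w(x) = sum_{j=0}^{n-1} w(T_0^j x) for n >= 0."""
    total = 0.0
    for _ in range(n):
        total += w(x)
        x = rotation(x)
    return total


def iterate(x, n):
    """T_0^n(x) for n >= 0."""
    for _ in range(n):
        x = rotation(x)
    return x


x0 = 0.1234
# Gap between S_n w(x0) and psi(T_0^n x0) - psi(x0): should vanish up to rounding.
gaps = [abs(birkhoff_sum(x0, n) - (psi(iterate(x0, n)) - psi(x0))) for n in range(200)]
# Largest Birkhoff sum in absolute value: bounded by 2 * max|psi| = 2.
largest = max(abs(birkhoff_sum(x0, n)) for n in range(200))
print(max(gaps), largest)
```

In this toy model, the Birkhoff sums never exceed twice the sup-norm of {\Psi}, which is exactly the boundedness used in the argument above.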

However, the item (c) from Subsection 1.4 above tells us that the Kontsevich–Zorich cocycle acts on {w\in H\setminus\{0\}} through a finite group of matrices and, thus, {w\in H\setminus\{0\}} cannot converge to zero under the Kontsevich–Zorich cocycle.

This contradiction proves that {T} is not {C^1}-conjugated to {T_0}, as desired.

Patrice Le Calvez and Jean-Christophe Yoccoz showed in 1997 that there are no minimal homeomorphisms on the infinite annulus \mathbb{R}/\mathbb{Z}\times\mathbb{R}.

Their beautiful paper was motivated by the quest of finding minimal homeomorphisms on punctured spheres \mathbb{S}^2\setminus\{p_1,\dots,p_k\}. More concretely, the non-existence of such homeomorphisms was previously known when k=0 (as an easy application of the features of Lefschetz indices), k=1 (thanks to the works of Brouwer and Guillou), and k\geq 3 (thanks to the work of Handel), so that the main result in Jean-Christophe and Patrice’s paper ensures the non-existence of minimal homeomorphisms in the remaining (harder) case of k=2.

A key step in Jean-Christophe and Patrice’s proof of their theorem above is to establish the following result about the sequence of Lefschetz indices i(f^k,z) of iterates f^k of a local homeomorphism f of the plane at a fixed point z of f: if z is neither a sink nor a source, then there are integers q, r\geq 1 such that

i(f^k,z) = \left\{\begin{array}{cc} 1-rq & \textrm{ if }k\in q\mathbb{Z} \\ 1 & \textrm{ otherwise } \end{array}\right.

As it turns out, Jean-Christophe and Patrice planned a sequel to this paper with the idea of extending their techniques to compute the sequences of Lefschetz indices of periodic points of f belonging to any given Jordan domain U such that K=\bigcap\limits_{n\in\mathbb{Z}} f^{-n}(U) is compact.

In fact, this plan was already known when the review of Jean-Christophe and Patrice’s paper came out (see here), and, as Patrice told me, some arguments from this promised subsequent work were used in the literature as a sort of folklore.

Nevertheless, a final version of this preprint was never released, and, even worse, some portions of the literature were invoking some arguments from a version of the preprint which was available only to Jean-Christophe (but not to Patrice).

Of course, this situation became slightly problematic when Jean-Christophe passed away, but fortunately Patrice and I were able to locate the final version of the preprint in Jean-Christophe’s mathematical archives. (Here, the word “final” means that all mathematical arguments are present, but the preprint has no abstract, introduction, or other “cosmetic” details.)

After doing some editing (to correct minor typos, add better figures [with the aid of Aline Cerqueira], etc.), Patrice and I are happy to announce that the folklore preprint by Jean-Christophe and Patrice (entitled “Suite des indices de Lefschetz des itérés pour un domaine de Jordan qui est un bloc isolant“) is finally publicly available here. We hope that you will enjoy reading this text (written in French)!

Posted by: matheuscmss | June 26, 2019

Yoccoz book collection at ICTP

The mathematical books of Michel Herman were donated to IMPA’s library by Jean-Christophe Yoccoz in the early 2000s: it amounts to more than 700 books and the complete list of titles can be found here.

This beautiful gesture of donating the books of a great mathematician to a developing country helped in the training of several mathematicians. In particular, I remember that reading Herman’s books during my PhD at IMPA was a singular experience in two aspects: intellectually, it gave me access to many high level mathematical topics, and olfactively, it was curious to get a smell of cigarette smoke out of old books (rather than the “usual” smell). (As I learned later, this experience was fully justified by the facts that Herman was an avid reader and a heavy smoker.)

Of course, this attitude of Jean-Christophe prompted me to discuss with Stefano Marmi an appropriate destination in Africa to send Yoccoz’s mathematical books. After some conversations, we contacted ICTP (and, in particular, Stefano Luzzatto) to inquire about the possibility of sending Yoccoz’s books to Senegal (as a sort of “retribution” for the good memories that Jean-Christophe had during his visit to AIMS-Senegal and University of Dakar in December 2011) or Rwanda.

Unfortunately, some organisational difficulties obliged us to split this plan into two parts. More concretely, rather than taking unnecessary risks by rushing to send Yoccoz’s books directly to Africa, last Thursday I sent all of them (a total of 13 boxes weighing approximately 35kg each) to the ICTP library, so that they can already be useful to all ICTP visitors (in particular those coming from developing countries) instead of staying locked up in my office (where they were only sporadically read by me). In this way, we get some extra time to think carefully about the definitive transfer of Yoccoz’s books to Africa while making them already publicly available.

Anyhow, the next time you visit ICTP, I hope that Yoccoz’s books will help you in some way!


Let {S_{g,n}} be a surface of genus {g\geq 0} with {n\geq 0} punctures. Given a Lie group {G}, the {G}-character variety of {S_{g,n}} is the space {X(S_{g,n},G)} of representations {\pi_1(S_{g,n})\rightarrow G} modulo conjugations by elements of {G}.

The mapping class group {\textrm{Mod}(S_{g,n})} of isotopy classes of orientation-preserving diffeomorphisms of {S_{g,n}} acts naturally on {X(S_{g,n},G)}.

The dynamics of mapping class groups on character varieties was systematically studied by Goldman in 1997: in his landmark paper, he showed that the {\textrm{Mod}(S_{g,0})}-action on {X(S_{g,0},SU(2))} is ergodic with respect to the Goldman–Huebschmann measure whenever {g\geq 1}.

Remark 1 This nomenclature is not standard: we use it here because Goldman showed here that {X(S_{g,0},SU(2))} has a volume form coming from a natural symplectic structure and Huebschmann proved here that this volume form has finite mass.

The ergodicity result above partly motivates the question of understanding the dynamics of individual elements of mapping class groups acting on {SU(2)}-character varieties.

In this direction, Brown studied in 1998 the actions of elements of {SL(2,\mathbb{Z}) = \textrm{Mod}(S_{1,1})} on the character variety {X(S_{1,1}, SU(2))}. As it turns out, if {\gamma\in \pi_1(S_{1,1})} is a small loop around the puncture, then the {SL(2,\mathbb{Z})}-action on {X(S_{1,1},SU(2))} preserves each level set {\kappa^{-1}(k)}, {k\in\mathbb{R}}, of the function {\kappa:X(S_{1,1},SU(2))\rightarrow\mathbb{R}} sending {[\rho]\in X(S_{1,1},SU(2))} to the trace of the matrix {\rho(\gamma)}. Here, Brown noticed that the dynamics of elements of {SL(2,\mathbb{Z})} on level sets {\kappa^{-1}(k)} with {k} close to {-2} fit the setting of the celebrated KAM theory (assuring the stability of non-degenerate elliptic periodic points of smooth area-preserving maps). In particular, Brown tried to employ Moser’s twisting theorem to conclude that no element of {SL(2,\mathbb{Z})} can act ergodically on all level sets {\kappa^{-1}(k)}, {k\in[-2,2]}.

Strictly speaking, Brown’s original argument is not complete because Moser’s theorem is used without checking the twist condition.

In the sequel, we revisit Brown’s work in order to show that his conclusions can be derived once one replaces Moser’s twisting theorem by a KAM stability theorem from 2002 due to Rüssmann.

1. Statement of Brown’s theorem

1.1. {SU(2)}-character variety of a punctured torus

Recall that the fundamental group {\pi_1(S_{1,1})} of a once-punctured torus is naturally isomorphic to a free group {F_2} on two generators {\alpha} and {\beta} such that the commutator {[\alpha, \beta]} corresponds to a loop {\gamma} around the puncture of {S_{1,1}}.

Therefore, a representation {\rho:\pi_1(S_{1,1})\rightarrow SU(2)} is determined by a pair of matrices {\rho(\alpha), \rho(\beta)\in SU(2)}, and an element {[\rho]\in X(S_{1,1},SU(2))} of the {SU(2)}-character variety of {S_{1,1}} is determined by the simultaneous conjugacy class {(\phi\rho(\alpha)\phi^{-1}, \phi\rho(\beta)\phi^{-1})}, {\phi\in SU(2)}, of a pair of matrices {(\rho(\alpha), \rho(\beta))\in SU(2)\times SU(2)}.

The traces {x=\textrm{tr}(\rho(\alpha))}, {y=\textrm{tr}(\rho(\beta))} and {z=\textrm{tr}(\rho(\alpha\beta))} of the matrices {\rho(\alpha)}, {\rho(\beta)} and {\rho(\alpha\beta)} provide a useful system of coordinates on {X(S_{1,1}, SU(2))}: algebraically, this is an incarnation of the fact that the ring {\mathbb{R}[SU(2)\times SU(2)]^{SU(2)}} of invariants of {(A,B)\in SU(2)\times SU(2)} is freely generated by the traces of {A}, {B} and {AB}.

In particular, the following proposition expresses the trace of {\rho(\gamma)=\rho([\alpha,\beta])} in terms of {x=\textrm{tr}(\rho(\alpha))}, {y=\textrm{tr}(\rho(\beta))} and {z=\textrm{tr}(\rho(\alpha\beta))}.

Proposition 1 Given {A, B\in SL(2,\mathbb{C})}, one has

\displaystyle \textrm{tr}(ABA^{-1}B^{-1}) = \textrm{tr}(A)^2 + \textrm{tr}(B)^2 + \textrm{tr}(AB)^2 - \textrm{tr}(A)\textrm{tr}(B)\textrm{tr}(AB)-2

Proof: By Cayley–Hamilton theorem (or a direct calculation), any {M\in SL(2,\mathbb{C})} satisfies {M^2-\textrm{tr}(M) M + \textrm{Id}=0}, i.e., {M+M^{-1} = \textrm{tr}(M) \textrm{Id}}.

Hence, for any {X, Y\in SL(2,\mathbb{C})}, one has

\displaystyle XY+Y^{-1}X^{-1} = \textrm{tr}(XY) \textrm{Id} \quad \textrm{and} \quad XY^{-1}+YX^{-1} = \textrm{tr}(XY^{-1}) \textrm{Id},

so that

\displaystyle \textrm{tr}(XY)+\textrm{tr}(XY^{-1}) = \textrm{tr}(X)\textrm{tr}(Y).

It follows that, for any {A, B\in SL(2,\mathbb{C})}, one has

\displaystyle \textrm{tr}(ABA^{-1}B^{-1})+\textrm{tr}(ABA^{-1}B) = \textrm{tr}(ABA^{-1})\textrm{tr}(B) = \textrm{tr}(B)^2

and

\displaystyle \textrm{tr}(ABA^{-1}B)+\textrm{tr}(AB(A^{-1}B)^{-1}) = \textrm{tr}(AB)\textrm{tr}(A^{-1}B).

Since {\textrm{tr}(AB(A^{-1}B)^{-1}) = \textrm{tr}(A^2) = \textrm{tr}(A)^2-2} and {\textrm{tr}(A^{-1}B) +\textrm{tr}(AB) = \textrm{tr}(A)\textrm{tr}(B)}, the proof of the proposition is complete. \Box
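The identity of Proposition 1 is easy to test numerically. The sketch below (with hypothetical helper names `mul`, `inv`, `tr`, `random_su2`) samples random pairs in {SU(2)} and measures the gap between the two sides; it is of course a sanity check rather than a proof.

```python
import random


def mul(A, B):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )


def inv(A):
    """Inverse of a 2x2 matrix of determinant one."""
    return ((A[1][1], -A[0][1]), (-A[1][0], A[0][0]))


def tr(A):
    """Trace of a 2x2 matrix."""
    return A[0][0] + A[1][1]


def random_su2(rng):
    """Random element ((a, b), (-conj(b), conj(a))) of SU(2), |a|^2 + |b|^2 = 1."""
    v = [rng.gauss(0, 1) for _ in range(4)]
    norm = sum(t * t for t in v) ** 0.5
    a = complex(v[0], v[1]) / norm
    b = complex(v[2], v[3]) / norm
    return ((a, b), (-b.conjugate(), a.conjugate()))


def commutator_trace_gap(A, B):
    """|tr(A B A^-1 B^-1) - (x^2 + y^2 + z^2 - xyz - 2)| with x, y, z the traces of A, B, AB."""
    lhs = tr(mul(mul(A, B), mul(inv(A), inv(B))))
    x, y, z = tr(A), tr(B), tr(mul(A, B))
    rhs = x * x + y * y + z * z - x * y * z - 2
    return abs(lhs - rhs)


rng = random.Random(0)
max_gap = max(commutator_trace_gap(random_su2(rng), random_su2(rng)) for _ in range(1000))
print(max_gap)
```

The printed gap should be of the order of the floating-point rounding error.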

1.2. Basic dynamics of {SL(2,\mathbb{Z})} on character varieties

Recall that the mapping class group {\textrm{Mod}(S_{1,1})} is generated by Dehn twists {\tau_{\alpha}} and {\tau_{\beta}} about the generators {\alpha} and {\beta} of {\pi_1(S_{1,1})}. In appropriate coordinates on the once-punctured torus {S_{1,1}}, the isotopy classes of these Dehn twists are represented by the actions of the matrices

\displaystyle \tau_{\alpha} = \left(\begin{array}{cc}1&1\\0&1\end{array}\right), \tau_{\beta} = \left(\begin{array}{cc}1&0\\1&1\end{array}\right) \in SL(2,\mathbb{Z})

on the flat torus {\mathbb{R}^2/\mathbb{Z}^2}. In particular, at the homotopy level, the actions of {\tau_{\alpha}} and {\tau_{\beta}} on {\pi_1(S_{1,1})} are given by the Nielsen transformations

\displaystyle \tau_{\alpha}(\alpha)=\alpha, \quad \tau_{\alpha}(\beta)=\beta\alpha, \quad \tau_{\beta}(\alpha) = \alpha\beta, \quad \tau_{\beta}(\beta)=\beta. \ \ \ \ \ (1)

Since the elements of {\textrm{Mod}(S_{1,1})=SL(2,\mathbb{Z})} fix the puncture of {S_{1,1}}, they preserve the homotopy class {\gamma=[\alpha,\beta]\in\pi_1(S_{1,1})} of a small loop around the puncture. Therefore, the {\textrm{Mod}(S_{1,1})}-action on the character variety {X(S_{1,1}, SU(2))} respects the level sets {\kappa^{-1}(k)}, {k\in[-2,2]}, of the function {\kappa: X(S_{1,1}, SU(2))\rightarrow [-2,2]} given by

\displaystyle \kappa([\rho]) := \textrm{tr}(\rho(\gamma)).

Furthermore, each level set {\kappa^{-1}(k)}, {-2<k\leq 2}, carries a finite (Goldman–Huebschmann) measure coming from a natural {\textrm{Mod}(S_{1,1})}-invariant symplectic structure.

In this context, the level set {\kappa^{-1}(2)} corresponds to imposing the restriction {\rho(\gamma)=\textrm{Id}\in SU(2)}, so that {\kappa^{-1}(2)} is naturally identified with the character variety {X(S_{1,0}, SU(2))}.

In terms of the coordinates {x=\textrm{tr}(\rho(\alpha))}, {y=\textrm{tr}(\rho(\beta))} and {z=\textrm{tr}(\rho(\alpha\beta))} on {X(S_{1,1}, SU(2))}, we can use Proposition 1 (and its proof) and (1) to check that

\displaystyle \kappa(x,y,z)=x^2+y^2+z^2-xyz-2 \ \ \ \ \ (2)

and

\displaystyle \tau_{\alpha}(x,y,z)= (x,z,xz-y), \quad \tau_{\beta}^{-1}(x,y,z)=(xy-z,y,x). \ \ \ \ \ (3)

Hence, we see from (2) that:

  • the level set {\kappa^{-1}(-2)} consists of a single point {(0,0,0)};
  • the level sets {\kappa^{-1}(k)}, {-2<k<2}, are diffeomorphic to {2}-spheres;
  • the character variety {X(S_{1,1},SU(2))} is a {3}-dimensional orbifold whose boundary {\kappa^{-1}(2)} is a topological sphere with 4 singular points (of coordinates {2(\varepsilon_1,\varepsilon_2,\varepsilon_3)\in\{-2,2\}^3} with {\varepsilon_1\varepsilon_2\varepsilon_3=1}) corresponding to the character variety {X(S_{1,0}, SU(2))}.
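The formulas (2) and (3) can be sanity-checked with exact integer arithmetic. In the sketch below, `tau_alpha` and `tau_beta_inv` are copied from (3), while `tau_alpha_inv` and `tau_beta` are obtained by inverting them by hand (so they are derived here, not quoted from the text); the script checks that all four maps preserve {\kappa} at random integer points.

```python
import random


def kappa(p):
    """The function (2): kappa(x, y, z) = x^2 + y^2 + z^2 - xyz - 2."""
    x, y, z = p
    return x * x + y * y + z * z - x * y * z - 2


def tau_alpha(p):
    """First map in (3)."""
    x, y, z = p
    return (x, z, x * z - y)


def tau_alpha_inv(p):
    """Inverse of tau_alpha, derived by solving (X, Y, Z) = (x, z, xz - y)."""
    x, y, z = p
    return (x, x * y - z, y)


def tau_beta_inv(p):
    """Second map in (3)."""
    x, y, z = p
    return (x * y - z, y, x)


def tau_beta(p):
    """Inverse of tau_beta_inv, derived by solving (X, Y, Z) = (xy - z, y, x)."""
    x, y, z = p
    return (z, y, y * z - x)


rng = random.Random(1)
points = [tuple(rng.randint(-50, 50) for _ in range(3)) for _ in range(500)]
maps = [tau_alpha, tau_alpha_inv, tau_beta, tau_beta_inv]

# kappa is invariant under each map (exact check on integer points).
all_invariant = all(kappa(f(p)) == kappa(p) for f in maps for p in points)
# The hand-derived inverses really are inverses.
all_inverse = all(
    tau_alpha_inv(tau_alpha(p)) == p and tau_beta(tau_beta_inv(p)) == p for p in points
)
print(all_invariant, all_inverse)
```

Since {\kappa\circ\tau-\kappa} is a polynomial of small degree, agreement on so many integer points is strong evidence for the polynomial identity, although the direct expansion by hand takes only a couple of lines.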

After this brief discussion of some geometrical aspects of {X(S_{1,1}, SU(2))}, we are ready to begin the study of the dynamics of {\textrm{Mod}(S_{1,1})}. For this sake, recall that the elements of {\textrm{Mod}(S_{1,1})=SL(2,\mathbb{Z})} are classified into three types:

  • {g\in SL(2,\mathbb{Z})} is called elliptic whenever {|\textrm{tr}(g)|<2};
  • {g\in SL(2,\mathbb{Z})} is called parabolic whenever {|\textrm{tr}(g)|=2};
  • {g\in SL(2,\mathbb{Z})} is called hyperbolic whenever {|\textrm{tr}(g)|>2}.

The elliptic elements {g\in SL(2,\mathbb{Z})} have finite order (because {\textrm{tr}(g)= 0, \pm 1} and {g^2-\textrm{tr}(g) g + \textrm{Id}=0}) and the parabolic elements {g\in SL(2,\mathbb{Z})} are conjugate to {\pm\tau_{\alpha}^n} for some {n\in\mathbb{Z}}.
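As a quick illustration of this trichotomy, the following sketch classifies a few sample matrices by their traces and searches for their orders. The sample matrices and the helper names `classify` and `order` are my own choices for illustration.

```python
def mat_mul(A, B):
    """Product of two 2x2 integer matrices stored as ((a, b), (c, d))."""
    return (
        (A[0][0] * B[0][0] + A[0][1] * B[1][0], A[0][0] * B[0][1] + A[0][1] * B[1][1]),
        (A[1][0] * B[0][0] + A[1][1] * B[1][0], A[1][0] * B[0][1] + A[1][1] * B[1][1]),
    )


IDENTITY = ((1, 0), (0, 1))


def classify(g):
    """Elliptic / parabolic / hyperbolic according to |tr(g)| < 2, = 2, > 2."""
    t = abs(g[0][0] + g[1][1])
    if t < 2:
        return "elliptic"
    if t == 2:
        return "parabolic"
    return "hyperbolic"


def order(g, bound=12):
    """Smallest n in [1, bound] with g^n = Id, or None if there is none."""
    power = g
    for n in range(1, bound + 1):
        if power == IDENTITY:
            return n
        power = mat_mul(power, g)
    return None


samples = {
    "trace 0": ((0, -1), (1, 0)),
    "trace 1": ((0, -1), (1, 1)),
    "trace -1": ((-1, -1), (1, 0)),
    "tau_alpha": ((1, 1), (0, 1)),
    "tau_alpha tau_beta": ((2, 1), (1, 1)),
}
for name, g in samples.items():
    print(name, classify(g), order(g))
```

The three elliptic samples have orders {4}, {6} and {3}, in agreement with the relation {g^2-\textrm{tr}(g) g + \textrm{Id}=0}, while the search finds no finite order for the parabolic and hyperbolic samples.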

In particular, if {g\in SL(2,\mathbb{Z})} is elliptic, then {g} leaves invariant non-trivial open subsets of each level set {\kappa^{-1}(k)}, {-2<k\leq 2}. Moreover, if {g\in SL(2,\mathbb{Z})} is parabolic, then {g} preserves a non-trivial and non-peripheral element {\delta\in\pi_1(S_{1,1})} and, a fortiori, {g} preserves the level sets of the function {f_{\delta}: X(S_{1,1}, SU(2))\rightarrow [-2,2]}, {f_{\delta}([\rho]) := \textrm{tr}(\rho(\delta))}. Since any such function {f_{\delta}} has a non-constant restriction to any level set {\kappa^{-1}(k)}, {-2<k\leq 2}, Brown concluded that:

Proposition 2 (Proposition 4.3 of Brown’s paper) If {g\in SL(2,\mathbb{Z})} is not hyperbolic, then its action on {\kappa^{-1}(k)} is not ergodic whenever {-2<k\leq 2}.

On the other hand, Brown observed that the action of any hyperbolic element of {SL(2,\mathbb{Z})} on {\kappa^{-1}(2)} can be understood via a result of Katok.

Proposition 3 (Theorem 4.1 of Brown’s paper) Any hyperbolic element of {SL(2,\mathbb{Z})} acts ergodically on {\kappa^{-1}(2)}.

Proof: The level set {\kappa^{-1}(2)} is the character variety {X(S_{1,0}, SU(2))}. In other words, a point in {\kappa^{-1}(2)} represents the simultaneous conjugacy class of a pair {(\rho(\alpha), \rho(\beta))} of commuting matrices in {SU(2)}.

Since a maximal torus of {SU(2)} is a conjugate of the subgroup

\displaystyle T = \left\{\left(\begin{array}{cc} e^{2\pi i\theta} & 0 \\ 0 & e^{-2\pi i\theta}\end{array}\right):\theta\in\mathbb{R}/\mathbb{Z}\right\},

we have that {X(S_{1,0}, SU(2))} is the set of simultaneous conjugacy classes of elements of {T\times T}. In view of the action by conjugation

\displaystyle \left(\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right) \left(\begin{array}{cc} e^{2\pi i\theta} & 0 \\ 0 & e^{-2\pi i\theta}\end{array}\right)\left(\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right)^{-1} = \left(\begin{array}{cc} e^{-2\pi i\theta} & 0 \\ 0 & e^{2\pi i\theta}\end{array}\right)

of the element {w=\left(\begin{array}{cc} 0 & 1 \\ -1 & 0\end{array}\right)} of the Weyl subgroup of {SU(2)}, we have

\displaystyle X(S_{1,0}, SU(2)) = (T\times T)/w.

In terms of the coordinates {(\theta,\phi)\in\mathbb{R}^2/\mathbb{Z}^2} given by the phases of the elements

\displaystyle (\left(\begin{array}{cc} e^{2\pi i\theta} & 0 \\ 0 & e^{-2\pi i\theta}\end{array}\right), \left(\begin{array}{cc} e^{2\pi i\phi} & 0 \\ 0 & e^{-2\pi i\phi}\end{array}\right))\in T\times T,

the element {w} acts by {(\theta,\phi)\mapsto(-\theta,-\phi)}, so that {X(S_{1,0}, SU(2))} is the topological sphere obtained from the quotient of {\mathbb{R}^2/\mathbb{Z}^2} by its hyperelliptic involution {\iota} (and {X(S_{1,0}, SU(2))} has only four singular points located at the subset {\{0, 1/2\}^2} of fixed points of the hyperelliptic involution). Moreover, an element {\left(\begin{array}{cc} a & b \\ c & d\end{array}\right)\in SL(2,\mathbb{Z})} acts on {T\times T} by mapping {(\theta,\phi)} to {(a\theta+c\phi, b\theta+d\phi)}.

In summary, the action of {SL(2,\mathbb{Z})} on {\kappa^{-1}(2)} is given by the usual {SL(2,\mathbb{Z})}-action on the topological sphere {(\mathbb{R}^2/\mathbb{Z}^2)/\iota} induced from the standard {SL(2,\mathbb{Z})}-action on the torus {\mathbb{R}^2/\mathbb{Z}^2}.

By a result of Katok, it follows that the action of any hyperbolic element of {SL(2,\mathbb{Z})} on {\kappa^{-1}(2)} is ergodic (and actually Bernoulli). \Box
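The identification of {\kappa^{-1}(2)} with {(\mathbb{R}^2/\mathbb{Z}^2)/\iota} can be illustrated in trace coordinates. The sketch below (with a hypothetical helper `trace_coordinates`) assumes the parametrization of {T\times T} by the phases {(\theta,\phi)} used in the proof, so that the traces are {x=2\cos(2\pi\theta)}, {y=2\cos(2\pi\phi)}, {z=2\cos(2\pi(\theta+\phi))}. It checks that these points satisfy {\kappa=2}, that they are invariant under {(\theta,\phi)\mapsto(-\theta,-\phi)}, and that the four points of {\{0,1/2\}^2} are sent to the four singular points of {\kappa^{-1}(2)}.

```python
import math
import random


def kappa(x, y, z):
    """The function (2)."""
    return x * x + y * y + z * z - x * y * z - 2


def trace_coordinates(theta, phi):
    """Traces (x, y, z) of rho(alpha), rho(beta), rho(alpha beta) for diagonal matrices
    with phases theta and phi (the product has phase theta + phi)."""
    return (
        2 * math.cos(2 * math.pi * theta),
        2 * math.cos(2 * math.pi * phi),
        2 * math.cos(2 * math.pi * (theta + phi)),
    )


rng = random.Random(2)
angles = [(rng.random(), rng.random()) for _ in range(1000)]

# Commuting pairs land on the level set kappa = 2 (up to rounding).
level_gap = max(abs(kappa(*trace_coordinates(t, p)) - 2) for t, p in angles)

# The involution (theta, phi) -> (-theta, -phi) does not change the traces.
involution_gap = max(
    abs(a - b)
    for t, p in angles
    for a, b in zip(trace_coordinates(t, p), trace_coordinates(-t, -p))
)

# Images of the four fixed points of the involution.
singular = {
    tuple(round(c) for c in trace_coordinates(t, p)) for t in (0, 0.5) for p in (0, 0.5)
}
print(level_gap, involution_gap, sorted(singular))
```

The four points obtained in this way are {(2,2,2)}, {(2,-2,-2)}, {(-2,2,-2)} and {(-2,-2,2)}, i.e., the points {2(\varepsilon_1,\varepsilon_2,\varepsilon_3)} with {\varepsilon_1\varepsilon_2\varepsilon_3=1}.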

1.3. Brown’s theorem

The previous two propositions raise the question of the ergodicity of the action of hyperbolic elements of {SL(2,\mathbb{Z})} on the level sets {\kappa^{-1}(k)}, {-2<k<2}. The following theorem of Brown provides an answer to this question:

Theorem 4 Let {g} be a hyperbolic element of {SL(2,\mathbb{Z})}. Then, there exists {-2<k<2} such that {g} does not act ergodically on {\kappa^{-1}(k)}.

Very roughly speaking, Brown establishes Theorem 4 along the following lines. One starts by performing a blowup at the origin {\kappa^{-1}(-2)=\{(0,0,0)\}} in order to think of the action of {g} on {X(S_{1,1},SU(2))} as a one-parameter family {g_{(k)}}, {-2\leq k\leq 2}, of area-preserving maps of the {2}-sphere such that {g_{(-2)}} is a finite order element of {SO(3)}. In this way, we have that {g_{(k)}} is a non-trivial one-parameter family going from a completely elliptic behaviour at {k=-2} to a non-uniformly hyperbolic behaviour at {k=2}. This scenario suggests that the conclusion of Theorem 4 can be derived via KAM theory in the elliptic regime.

In the next (and last) section of this post, we revisit Brown’s ideas leading to Theorem 4 (with a special emphasis on its KAM theoretical aspects).

2. Revisited proof of Brown’s theorem

2.1. Blowup of the origin

The origin {\kappa^{-1}(-2)} of the character variety {X(S_{1,1}, SU(2))} can be blown up into a sphere of directions {S_{-2}}. The action of {SL(2,\mathbb{Z})} on {S_{-2}} factors through an octahedral subgroup of {SO(3)}: this follows from the fact that (3) implies that the generators {\tau_{\alpha}} and {\tau_{\beta}} of {SL(2,\mathbb{Z})} act on {S_{-2}} as

\displaystyle \tau_{\alpha}|_{S_{-2}}(\dot{x},\dot{y},\dot{z}) = (\dot{x},\dot{z},-\dot{y}), \quad \tau_{\beta}^{-1}|_{S_{-2}}(\dot{x},\dot{y},\dot{z})=(-\dot{z},\dot{y},\dot{x}).

In this way, each element {g\in SL(2,\mathbb{Z})} is related to a root of unity

\displaystyle \lambda_{-2}(g)\in U(1)=\{w\in \mathbb{C}: |w|=1\}

of order {\leq 4} coming from the eigenvalues of the derivative of {g|_{S_{-2}}} at any of its fixed points.

Example 1 The hyperbolic element {\left(\begin{array}{cc} 2 & 1 \\ 1 & 1\end{array}\right) = \tau_{\alpha}\tau_{\beta}} acts on {S_{-2}} via the element {(\dot{x},\dot{y},\dot{z})\mapsto (\dot{z},-\dot{x},-\dot{y})} of {SO(3)} of order {3}.
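The claims of this subsection about the action on {S_{-2}} can be verified by brute force. The sketch below encodes the two linear maps displayed above as {3\times 3} integer matrices (called `A` and `B` here, hypothetical names), generates the group they span, and checks the element of Example 1.

```python
def compose(A, B):
    """Matrix product of 3x3 integer matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def transpose(A):
    """Transpose (equal to the inverse for the orthogonal matrices used here)."""
    return tuple(tuple(A[j][i] for j in range(3)) for i in range(3))


I3 = ((1, 0, 0), (0, 1, 0), (0, 0, 1))
# Rows read off from (x, y, z) -> (x, z, -y) and (x, y, z) -> (-z, y, x).
A = ((1, 0, 0), (0, 0, 1), (0, -1, 0))   # tau_alpha on S_{-2}
B = ((0, 0, -1), (0, 1, 0), (1, 0, 0))   # tau_beta^{-1} on S_{-2}
# Example 1: (x, y, z) -> (z, -x, -y).
E = ((0, 0, 1), (-1, 0, 0), (0, -1, 0))


def generated_group(generators):
    """All products of the generators (a group, since the generators have finite order)."""
    group = {I3}
    frontier = [I3]
    while frontier:
        g = frontier.pop()
        for s in generators:
            h = compose(s, g)
            if h not in group:
                group.add(h)
                frontier.append(h)
    return group


def order(g):
    """Order of an element of a finite matrix group."""
    n, power = 1, g
    while power != I3:
        power = compose(power, g)
        n += 1
    return n


G = generated_group([A, B])
print(len(G), max(order(g) for g in G), order(E), compose(A, transpose(B)) == E)
```

The generated group has {24} elements, all of order {\leq 4}, and the matrix of Example 1 is the product of `A` with the inverse of `B`, of order {3}.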

2.2. Bifurcations of fixed points

A hyperbolic element {g\in SL(2,\mathbb{Z})} induces a non-trivial polynomial automorphism of {\mathbb{R}^3} whose restriction to {\kappa^{-1}([-2,2])} describes the action of {g} on {X(S_{1,1}, SU(2))}. In particular, the set {L_g} of fixed points of this polynomial automorphism in {\kappa^{-1}([-2,2])} is a semi-algebraic set of dimension {< 3}.

Actually, it is not hard to exploit the fact that {g} acts on the level sets {\kappa^{-1}(k)}, {k\in[-2,2]}, through area-preserving maps to compute the Zariski tangent space to {L_{g}} in order to verify that {L_g} is one-dimensional (cf. Proposition 5.1 in Brown’s work).

Moreover, this calculation of the Zariski tangent space can be combined with the fact that any hyperbolic element {g\in SL(2,\mathbb{Z})} has a discrete set of fixed points in {\mathbb{R}^2/\mathbb{Z}^2} and, a fortiori, in {\kappa^{-1}(2)=X(S_{1,0}, SU(2))} to get that {L_g} is transverse to {\kappa} except at its discrete subset of singular points and, hence, {L_g\cap \kappa^{-1}(k)} is discrete for all {-2\leq k\leq 2} (cf. Proposition 5.2 in Brown’s work).

Example 2 The hyperbolic element {\left(\begin{array}{cc} 2 & 1 \\ 1 & 1\end{array}\right) = \tau_{\alpha}\tau_{\beta}} acts on {X(S_{1,1}, SU(2))} via the polynomial automorphism {(x,y,z)\mapsto (z, zy-x, z(zy-x)-y)} (cf. (3)). Thus, the corresponding set of fixed points is given by the equations

\displaystyle x=z, \quad y=zy-x, \quad z=z(zy-x)-y

describing an embedded curve in {\mathbb{R}^3}.
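In this example, the fixed point equations can be solved by hand: the second equation with {z=x} gives {y=x/(x-1)} (and forces {x\neq 1}), and the third equation is then automatically satisfied. This elimination is my own direct computation rather than a quotation of Brown’s paper, and the sketch below double-checks it with exact rational arithmetic (the names `g` and `fixed_point` are hypothetical).

```python
from fractions import Fraction


def kappa(x, y, z):
    """The function (2)."""
    return x * x + y * y + z * z - x * y * z - 2


def g(p):
    """The polynomial automorphism of Example 2."""
    x, y, z = p
    return (z, z * y - x, z * (z * y - x) - y)


def fixed_point(t):
    """Point (t, t / (t - 1), t) of the candidate curve of fixed points, t != 1."""
    t = Fraction(t)
    return (t, t / (t - 1), t)


# Rational parameters in [-20/7, 20/7], skipping the excluded value t = 1.
samples = [Fraction(n, 7) for n in range(-20, 21) if n != 7]
all_fixed = all(g(fixed_point(t)) == fixed_point(t) for t in samples)

print(all_fixed, kappa(*fixed_point(0)), kappa(*fixed_point(2)))
```

Note that the parameter {t=0} gives the origin {\kappa^{-1}(-2)}, while {t=2} gives the singular point {(2,2,2)} of {\kappa^{-1}(2)}.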

In general, the eigenvalues {\lambda(p), \lambda(p)^{-1}} of the derivative at {p\in L_g} of the action of a hyperbolic element {g\in SL(2, \mathbb{Z})} on {\kappa^{-1}(\kappa(p))} can be continuously followed along any irreducible component {\ell_g\ni p} of {L_g}.

Furthermore, it is not hard to check that {\lambda} is not constant on {\ell_g} (cf. Lemma 5.3 in Brown’s work). Indeed, this happens because there are only two cases: the first possibility is that {\ell_g} connects {\kappa^{-1}(-2)} and {\kappa^{-1}(2)} so that {\lambda} varies from {\lambda_{-2}(g)\in U(1)} to the unstable eigenvalue of {g} acting on {\mathbb{R}^2/\mathbb{Z}^2}; the second possibility is that {\ell_g} becomes tangent to {\kappa^{-1}(k)} for some {-2<k<2} so that the Zariski tangent space computation mentioned above reveals that {\lambda} varies from {1} (at {\ell_g\cap\kappa^{-1}(k)}) to some value {\neq 1} (at any point of transverse intersection between {\ell_g} and a level set of {\kappa}).

2.3. Detecting Brjuno elliptic periodic points

The discussion of the previous two subsections allows us to show that some portions of the action of a hyperbolic element {g\in SL(2,\mathbb{Z})} fit the assumptions of KAM theory.

Before entering into this matter, recall that {e^{2\pi i\theta}\in U(1)} is Brjuno whenever {\theta} is an irrational number whose continued fraction has partial convergents {(p_k/q_k)_{k\in\mathbb{N}}} satisfying

\displaystyle \sum\limits_{k=1}^{\infty} \frac{\log q_{k+1}}{q_k}<\infty.

For our purposes, it is important to note that the set of Brjuno numbers has full Lebesgue measure on {U(1)}.
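To get a feeling for the Brjuno condition, one can compute partial sums of the series above from the partial quotients of {\theta}. The sketch below uses the standard recursion {q_k=a_kq_{k-1}+q_{k-2}} (with {q_{-1}=0}, {q_0=1}) and takes as an illustrative example the golden mean, whose partial quotients are all equal to {1}; the function names are hypothetical.

```python
import math


def denominators(partial_quotients):
    """Denominators q_0, q_1, ..., q_n of the convergents of [a_0; a_1, ..., a_n],
    computed from a_1, ..., a_n via q_{-1} = 0, q_0 = 1, q_k = a_k q_{k-1} + q_{k-2}."""
    q_prev, q = 0, 1
    result = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        result.append(q)
    return result


def brjuno_partial_sum(partial_quotients):
    """Sum of log(q_{k+1}) / q_k over all k >= 1 for which q_{k+1} is available."""
    q = denominators(partial_quotients)
    return sum(math.log(q[k + 1]) / q[k] for k in range(1, len(q) - 1))


# Golden mean: a_k = 1 for all k, so the q_k are Fibonacci numbers.
golden_q = denominators([1] * 6)
s30 = brjuno_partial_sum([1] * 30)
s40 = brjuno_partial_sum([1] * 40)
print(golden_q, s30, s40 - s30)
```

For the golden mean, the denominators grow exponentially, so the terms of the series decay exponentially and the partial sums stabilize very quickly, which is the behaviour expected from a Brjuno number.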

Let {g\in SL(2,\mathbb{Z})} be a hyperbolic element. We have three possibilities for the limiting eigenvalue {\lambda_{-2}(g)\in U(1)}: it is not real, it equals {1} or it equals {-1}.

If the limiting eigenvalue {\lambda_{-2}(g)\in U(1)} is not real, then we take an irreducible component {\ell_g} intersecting the origin {\kappa^{-1}(-2)}. The fact that {\lambda} is not constant on {\ell_g} implies that {\lambda(\ell_{g})} contains an open subset of {U(1)}. Thus, we can find some {-2<k<2} such that {\{p\}=\ell_g\cap \kappa^{-1}(k)} has a Brjuno eigenvalue {\lambda(p)}, i.e., the action of {g} on {\kappa^{-1}(k)} has a Brjuno fixed point.

If the limiting eigenvalue is {\lambda_{-2}(g)=1}, we use the Lefschetz fixed point theorem on the sphere {\kappa^{-1}(k)} with {k} close to {-2} to locate an irreducible component {\ell_g} of {L_g} such that {\{p_k\}=\ell_g\cap\kappa^{-1}(k)} is a fixed point of positive index of {g|_{\kappa^{-1}(k)}} for {k} close to {-2}. On the other hand, it is known that an isolated fixed point of an orientation-preserving surface homeomorphism which preserves area has index {<2}. Therefore, {p_k} is a fixed point of {g|_{\kappa^{-1}(k)}} of index {1} with multipliers {\lambda(p_k), \lambda(p_k)^{-1}} close to {1} whenever {k} is close to {-2}. Since a hyperbolic fixed point with positive multipliers has index {-1}, it follows that {p_k} is a fixed point with {\lambda(p_k)\in U(1)\setminus\{1\}} when {k} is close to {-2}. In particular, {\lambda(\ell_g)} contains an open subset of {U(1)} and, hence, we can find some {-2<k<2} such that {p_k} has a Brjuno multiplier {\lambda(p_k)}.

If the limiting eigenvalue is {\lambda_{-2}(g)=-1}, then {g^2} is a hyperbolic element with limiting eigenvalue {\lambda_{-2}(g^2)=1}. From the previous paragraph, it follows that we can find some {-2<k<2} such that {\kappa^{-1}(k)} contains a Brjuno elliptic fixed point of {g^2|_{\kappa^{-1}(k)}}.

In any event, the arguments above give the following result (cf. Theorem 4.4 in Brown’s work):

Theorem 5 Let {g\in SL(2,\mathbb{Z})} be a hyperbolic element. Then, there exists {-2<k<2} such that {g|_{\kappa^{-1}(k)}} has a periodic point of period one or two with a Brjuno multiplier.

2.4. Moser’s twisting theorem and Rüssmann’s stability theorem

At this point, the idea to derive Theorem 4 is to combine Theorem 5 with KAM theory ensuring the stability of certain types of elliptic periodic points.

Recall that a periodic point is called stable whenever there are arbitrarily small neighborhoods of its orbit which are invariant. In particular, the presence of a stable periodic point implies the non-ergodicity of an area-preserving map.

A famous stability criterion for fixed points of area-preserving maps is Moser’s twisting theorem. This result can be stated as follows. Suppose that {f} is an area-preserving {C^r}, {r\geq 4}, map having an elliptic fixed point at the origin {(0,0)\in \mathbb{R}^2} with multipliers {e^{2\pi i\theta}}, {e^{-2\pi i\theta}} such that {n\theta\notin\mathbb{Z}} for {n=1, 2, 3, \dots, r}. After performing an appropriate area-preserving change of variables (tangent to the identity at the origin), one can bring {f} into its Birkhoff normal form, i.e., {f} has the form

\displaystyle \left(\begin{array}{c}\xi \\ \eta\end{array}\right) \mapsto \left(\begin{array}{c} \xi\cos(\sum\limits_{n=0}^s\gamma_n(\xi^2+\eta^2)^n)-\eta\sin(\sum\limits_{n=0}^s \gamma_n(\xi^2+\eta^2)^n) \\ \xi\sin(\sum\limits_{n=0}^s \gamma_n(\xi^2+\eta^2)^n)+\eta\cos(\sum\limits_{n=0}^s \gamma_n(\xi^2+\eta^2)^n)\end{array}\right) + h(\xi,\eta)

where {s=[r/2]-1}, {\gamma_0=2\pi\theta}, {\gamma_1, \dots, \gamma_s} are uniquely determined Birkhoff constants and {h(\xi,\eta)} denotes higher order terms.

Theorem 6 (Moser twisting theorem) Let {f} be an area-preserving map as in the previous paragraph. If {\gamma_n\neq0} for some {1\leq n\leq s}, then the origin {(0,0)\in\mathbb{R}^2} is a stable fixed point.

The nomenclature “twisting” comes from the fact that {\gamma_1\neq 0} when {f} is a twist map, i.e., {f} has the form {f(r,\theta)=(r,\theta+\mu(r))} in polar coordinates where {\mu} is a smooth function with {|\mu'(0)|\neq 0}. In the literature, the condition “{\gamma_n\neq0} for some {n}” is called the twist condition.

Example 3 The Dehn twist {\tau_{\alpha}} induces the polynomial automorphism {\tau_{\alpha}(x,y,z)= (x,z,xz-y)} on {X(S_{1,1}, SU(2))=\kappa^{-1}([-2,2])}. Each level set {\kappa^{-1}(k)}, {-2<k<2}, is a smooth {2}-sphere which is swept out by the {\tau_{\alpha}}-invariant ellipses {C_{k,x_0}} obtained from the intersections between {\kappa^{-1}(k)} and the planes of the form {\{x_0\}\times \mathbb{R}^2}. Goldman observed that, after an appropriate change of coordinates, each {C_{k,x_0}} becomes a circle where {\tau_{\alpha}} acts as a rotation by angle {\cos^{-1}(x_0/2)}. In particular, the restriction of {\tau_{\alpha}} to each level set {\kappa^{-1}(k)} is a twist map near its fixed points {(\pm\sqrt{2+k},0,0)}.
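The invariance properties used in Example 3 can be checked by a short exact computation. The sketch below assumes the usual formula {\kappa(x,y,z)=x^2+y^2+z^2-xyz-2} for the trace of the commutator (this formula is not restated in the present section, so it is an assumption here), and the function names are hypothetical.

```python
from fractions import Fraction

def kappa(x, y, z):
    # Assumed formula for the trace of the commutator in trace coordinates.
    return x * x + y * y + z * z - x * y * z - 2

def dehn_twist(x, y, z):
    # The polynomial automorphism induced by the Dehn twist in Example 3.
    return (x, z, x * z - y)

# Exact arithmetic: the twist preserves both kappa and the x-coordinate,
# hence each ellipse obtained by slicing a level set with {x_0} x R^2.
p = (Fraction(1, 2), Fraction(-3, 5), Fraction(7, 4))
q = dehn_twist(*p)
print(kappa(*p) == kappa(*q), p[0] == q[0])
```

Points of the form {(x,0,0)} are fixed by the twist and lie on the level {k=x^2-2}, which matches the fixed points {(\pm\sqrt{2+k},0,0)} mentioned above.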

In his original argument, Brown deduced Theorem 4 from (a weaker version of) Theorem 5 and Moser’s twisting theorem. However, Brown employed Moser’s theorem with {r=4} while checking only the conditions on the multipliers of the elliptic fixed point but not the twist condition {\gamma_1\neq 0}.

As it turns out, it is not obvious how to check the twist condition in Brown’s setting (especially because it is not satisfied at the sphere of directions {S_{-2}}).

Fortunately, Rüssmann discovered that a Brjuno elliptic fixed point of a real-analytic area-preserving map is always stable (independently of twisting conditions):

Theorem 7 (Rüssmann) Any Brjuno elliptic periodic point of a real-analytic area-preserving map is stable.

Remark 2 Actually, Rüssmann obtained the previous result by showing that a real-analytic area-preserving map with a Brjuno elliptic fixed point and vanishing Birkhoff constants (i.e., {\gamma_n=0} for all {n\in\mathbb{N}}) is analytically linearisable. Note that the analogue of this statement in the {C^{\infty}} category is false (as a counterexample is given by {(r,\theta)\mapsto (r,\theta+\rho+e^{-1/r})}).

In any case, at this stage, the proof of Theorem 4 is complete: it suffices to put together Theorems 5 and 7.

Posted by: matheuscmss | February 8, 2019

Breuillard-Sert’s joint spectrum (I)

Last November 2018, Romain Dujardin, Charles Favre, Thomas Gauthier, Rodolfo Gutiérrez-Romo and I started a groupe de travail around the preprint The joint spectrum by Emmanuel Breuillard and Cagri Sert.

My plan is to transcribe my notes from this groupe de travail in a series of posts starting today with a summary of the first meeting where an overview of the whole article was provided. As usual, all mistakes/errors in the sequel are my sole responsibility.

1. Introduction

Let {M_d(\mathbb{C})} be the set of {d\times d} matrices with complex entries. Given {A\in M_d(\mathbb{C})}, recall that its spectral radius {r(A)} is given by Gelfand’s formula

\displaystyle r(A) = \lim\limits_{n\rightarrow\infty}\|A^n\|^{1/n}

More generally, given a compact subset {S\subset M_d(\mathbb{C})}, recall that the joint spectral radius of {S} (introduced by Rota–Strang in 1960) is the quantity

\displaystyle R(S) := \lim\limits_{n\rightarrow\infty} \sup\limits_{g_1,\dots, g_n\in S} \|g_1\dots g_n\|^{1/n} = \lim\limits_{n\rightarrow\infty} \sup\limits_{g\in S^n} \|g\|^{1/n}

where {S^n:=\{g_1\dots g_n: g_1,\dots, g_n\in S\}}.

Remark 1 By submultiplicativity (or, more precisely, Fekete’s lemma), the limit defining {R(S)} always exists.

Remark 2 {R(S)} is independent of the choice of {\|.\|}. In particular, {R(S) = R(g S g^{-1})} for all {g\in GL_d(\mathbb{C})}.
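To make the definition concrete, here is a minimal brute-force sketch for a finite set {S} of {2\times 2} real matrices. It computes {\sup_{g\in S^n}\|g\|^{1/n}} for the max-row-sum norm; since this norm is submultiplicative, Fekete’s lemma says that each of these numbers is an upper bound for {R(S)}. The function names and the example pair of matrices are illustrative assumptions, and the cost grows like {|S|^n}.

```python
from itertools import product

def mat_mul(a, b):
    """Product of two 2x2 matrices stored as tuples of rows."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def max_row_sum_norm(m):
    """Operator norm induced by the sup norm (submultiplicative)."""
    return max(abs(row[0]) + abs(row[1]) for row in m)

def jsr_upper_estimate(S, n):
    """sup over all products g_1 ... g_n with g_i in S of ||g_1 ... g_n||^(1/n)."""
    best = 0.0
    for word in product(S, repeat=n):
        g = word[0]
        for m in word[1:]:
            g = mat_mul(g, m)
        best = max(best, max_row_sum_norm(g))
    return best ** (1.0 / n)

A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))
for n in (1, 2, 4, 8):
    print(n, jsr_upper_estimate([A, B], n))
```

For this pair, the product {AB} has spectral radius {(3+\sqrt{5})/2}, so that {R(S)\geq (1+\sqrt{5})/2} and the printed values must stay above the golden ratio.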

The joint spectral radius appears naturally in several areas of Mathematics (such as wavelets and control theory), and my first contact with this notion occurred through a subfield of Dynamical Systems called ergodic optimization (where one considers an observable {f} and one seeks to maximize {\int f d\mu} among all invariant probability measures {\mu} of a given dynamical system).

The goal of the Breuillard–Sert article is two-fold: they introduce a notion of joint spectrum of {S} and they show that it vastly refines previous related concepts such as joint spectral radius, Benoist cone, etc.

Today, our plan is to provide an overview of some of the main results obtained by Breuillard–Sert. For this sake, we divide this post into two sections: the first one contains a potpourri of prototypical versions of Breuillard–Sert’s theorems, and the last section provides the precise statements whose proofs will be discussed in subsequent posts in this series.

Posted by: matheuscmss | February 6, 2019

Examples of Rauzy classes (after Yoccoz)

This week I attended the mini-conference Autour des surfaces de translation organized by Corentin Boissy and Slavyana Geninska at Toulouse.

One of the main objectives of this meeting was to discuss in detail a somewhat long (66 pages) text by Jean-Christophe Yoccoz containing new notions and tools that allow one to efficiently describe certain combinatorial objects known as Rauzy diagrams.

In fact, this text was still in preliminary format when Jean-Christophe passed away and, for this reason, Corentin and I spent a certain time discussing the insertion of footnotes in order to clarify several portions of Jean-Christophe’s text. After Corentin and I found that the text was finally “accessible” (to anyone with a certain familiarity with Jean-Christophe’s survey here, say), it was decided that we should “celebrate” the occasion with a meeting around this matter.

In any case, one of the outcomes of the mini-conference is that Jean-Christophe’s text entitled Examples of Rauzy classes with footnotes by Corentin is finally publicly available here.

In a nutshell, the first part of Jean-Christophe’s text is devoted to the notions of height, bi-monotonous cycles, etc., which allow one to explore a given Rauzy diagram starting from a certain subgraph whose vertices consist of the so-called standard permutations; then, the second part of Jean-Christophe’s text is a sort of “proof of concept” where several Rauzy diagrams are described (including some containing several thousands of vertices). Here, it is worth noting that he did the corresponding calculations by hand (mostly during winter vacations at Loctudy as he told me)! Corentin wrote a few Sage programs to double check some of these calculations and, as expected, they turned out to be correct.

Closing this short post, let me try to explain below some of Jean-Christophe’s motivations to get a systematic description of Rauzy diagrams.

First of all, recall that the study of the dynamics of interval exchange transformations and translation flows often relies on a renormalization scheme (“continued fraction algorithm”) called Rauzy–Veech induction: for a detailed exposition of this topic, the reader can consult Yoccoz’s survey here.

Roughly speaking, the Rauzy–Veech induction serves to encode the renormalization of interval exchange transformations and translation flows via topological Markov shifts induced by Rauzy diagrams: more concretely, a Rauzy diagram is a special type of oriented graph {\mathcal{D}} and the dynamics of the renormalization procedure is described by the topological Markov shift consisting of the shift dynamics {(e_n)_{n\in\mathbb{Z}}\mapsto (e_{n+1})_{n\in\mathbb{Z}}} on the space of bi-infinite paths (i.e., concatenations of edges) on {\mathcal{D}}.

In general, Rauzy diagrams are defined as follows. We take an abstract finite alphabet {\mathcal{A}} on {d\geq 2} letters. A permutation {\pi=(\pi_t, \pi_b)} is a pair of bijections {\pi_t,\pi_b:\mathcal{A}\rightarrow\{1,\dots,d\}} (normally we would like to say that {\pi_b\circ\pi_t^{-1}} is a permutation of {\{1,\dots, d\}}, but the data {\pi=(\pi_t,\pi_b)} provides a more “symmetric” way to describe permutations). In the literature, {\pi} is often denoted as a list of the form

\displaystyle \pi=\left(\begin{array}{ccc} \pi_t^{-1}(1) & \dots & \pi_t^{-1}(d) \\ \pi_b^{-1}(1) & \dots & \pi_b^{-1}(d) \end{array}\right)

and the first, resp. last letter of the top and bottom rows are denoted {_{t}\alpha=\pi_t^{-1}(1)} and {_{b}\alpha = \pi_b^{-1}(1)}, resp. {\alpha_{t}=\pi_t^{-1}(d)} and {\alpha_{b}= \pi_b^{-1}(d)}.

The top operation {\mathcal{R}_t} maps a permutation {\pi=(\pi_t,\pi_b)} to {\mathcal{R}_t(\pi)=(\pi_t,\pi_b')} where {\pi_b'} is obtained from {\pi_b} by performing a cyclic permutation of the letters appearing after {\alpha_t} on the bottom row of {\pi}. Similarly, one can define the bottom operation {\mathcal{R}_b} by symmetry (i.e., essentially by exchanging the roles of top and bottom rows). In this setting, a Rauzy diagram {\mathcal{D}} is the oriented graph whose vertices correspond to the orbit of a given permutation {\pi} under the top and bottom operations, and whose oriented edges have the form {\kappa\rightarrow\mathcal{R}_t(\kappa)} and {\kappa\rightarrow\mathcal{R}_b(\kappa)}.

Exercise 1 Draw the three Rauzy diagrams associated to the following three permutations: {\left(\begin{array}{cc} A & B \\ B & A \end{array}\right)}, {\left(\begin{array}{ccc} A & B & C \\ C & B & A \end{array}\right)}, {\left(\begin{array}{cccc} A & B & C & D \\ D & C & B & A \end{array}\right)}.
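For readers who want to check their drawings, the following minimal sketch enumerates the vertices and edges of a Rauzy diagram by a breadth-first search. Rows are encoded as strings, and the direction of the cyclic permutation is an assumption (the usual convention): the top operation moves the last letter of the bottom row to the position immediately after {\alpha_t}. All function names are hypothetical.

```python
from collections import deque

def top_operation(top, bottom):
    """Move the last bottom letter right after alpha_t (last top letter) in the bottom row."""
    i = bottom.index(top[-1])
    if i == len(bottom) - 1:
        raise ValueError("both rows end with the same letter: operation undefined")
    return top, bottom[:i + 1] + bottom[-1] + bottom[i + 1:-1]

def bottom_operation(top, bottom):
    """Same recipe with the roles of the two rows exchanged."""
    same_bottom, new_top = top_operation(bottom, top)
    return new_top, same_bottom

def rauzy_diagram(top, bottom):
    """Vertices and labelled edges of the orbit of (top, bottom) under both operations."""
    start = (top, bottom)
    vertices, edges = {start}, []
    queue = deque([start])
    while queue:
        pi = queue.popleft()
        for label, op in (("t", top_operation), ("b", bottom_operation)):
            image = op(*pi)
            edges.append((pi, label, image))
            if image not in vertices:
                vertices.add(image)
                queue.append(image)
    return vertices, edges

for rows in (("AB", "BA"), ("ABC", "CBA"), ("ABCD", "DCBA")):
    vertices, edges = rauzy_diagram(*rows)
    print(rows, len(vertices), "vertices,", len(edges), "edges")
```

Every vertex has exactly one outgoing edge of each type, so the number of edges is always twice the number of vertices.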

Among many other results in this topic, our recent work with Avila and Yoccoz on the partial solution of the so-called Zorich conjecture (previously discussed in this post here) relies upon the precise knowledge of the geometry of Rauzy diagrams.

For this reason, right after partially solving Zorich’s conjecture, Jean-Christophe started his detailed study of arbitrary Rauzy diagrams in the hope of solving Zorich’s conjecture in full generality.

As it turns out, Zorich’s conjecture was recently solved in full generality by Rodolfo Gutiérrez-Romo while bypassing many fine aspects of Rauzy diagrams (see the original article here and/or this post here), but it is clear that Jean-Christophe’s text on Rauzy diagrams will pave the way for further applications of these fascinating combinatorial objects.

Posted by: matheuscmss | January 3, 2019

Romain Dujardin’s Bourbaki seminar talk 2018

Last October, Romain Dujardin gave a nice talk at the Bourbaki seminar about the equidistribution of Fekete points, pluripotential theory and the works of Robert Berman, Sébastien Boucksom and David Witt Nyström (including this article here). The video of Dujardin’s talk (in French) is available here and the corresponding lecture notes (also in French) are available here.

In the sequel, I will transcribe my notes for Dujardin’s talk (while referring to his text for all omitted details). In particular, we will follow his path, that is, we will describe how a question related to polynomial interpolation was solved by complex geometry methods, but we will not discuss the relationship of the material below with point processes.

Remark 1 As usual, any errors/mistakes in this post are my sole responsibility.

1. Polynomial interpolation and logarithmic potential theory in one complex variable

1.1. Polynomial interpolation

Let {\mathcal{P}_k(\mathbb{C})} be the vector space of polynomials of degree {\leq k} in one complex variable. By definition, {\dim\mathcal{P}_k(\mathbb{C})=k+1}.

The classical polynomial interpolation problem can be stated as follows: given {k+1} points {z_0,\dots, z_k} on {\mathbb{C}}, can we find a polynomial with prescribed values at the {z_j}’s? In other terms, can we invert the evaluation map {ev(z_0,\dots,z_k):\mathcal{P}_k(\mathbb{C})\rightarrow \mathbb{C}^{k+1}}, {P\mapsto (P(z_0),\dots, P(z_k))}?

The solution to this old question is well-known: in particular, the problem can be explicitly solved (whenever the points are distinct).

What about the effectiveness and/or numerical stability of these solutions? It is also well-known that they might be “unstable” in many aspects: for instance, the inverse of {ev(z_0,\dots, z_k)} starts to behave badly when some of the points {z_0,\dots, z_k} get close together, a small error on the values {P(z_0),\dots, P(z_k)} might lead to huge errors in the polynomial {P}, Runge’s phenomenon shows that certain interpolations at equidistant points in {[-1,1]} are highly oscillating, etc.

This motivates the following question: are there “optimal” choices for the points (leading to “minimal instabilities” in the solution of the interpolation problem)?

This vague question can be formalized in several ways. For instance, the interpolation problem turns out to be a linear algebra question asking to invert an appropriate Vandermonde matrix {V(z_0,\dots, z_k)} and, a fortiori, the calculations will eventually oblige us to divide by an adequate determinant {\det V(z_0,\dots, z_k)}. Hence, if we denote by {e_i(z)=z^i}, {i=0, \dots, k}, the basis of monomials of {\mathcal{P}_k(\mathbb{C})}, then we can say that an optimal configuration maximizes the modulus of Vandermonde’s determinant

\displaystyle \det V(z_0,\dots, z_k) = \det(e_i(z_j))_{0\leq i, j\leq k} = \prod\limits_{0\leq i<j\leq k} (z_j-z_i).

Of course, this optimization problem has a trivial solution if we do not impose constraints on {z_0,\dots, z_k}. For this reason, we shall fix some compact subset {K\subset \mathbb{C}} and we will assume that {z_0,\dots, z_k\in K}.

Definition 1 A Fekete configuration {(z_0,\dots,z_k)\in K^{k+1}} is a maximum of

\displaystyle K^{k+1}\ni(w_0,\dots, w_k)\mapsto \prod\limits_{0\leq i<j\leq k} |w_j-w_i|

Definition 2 The {(k+1)}-diameter of {K} is

\displaystyle d_{k+1}(K) := \prod\limits_{0\leq i<j\leq k} |z_j-z_i|^{2/k(k+1)}

where {(z_0,\dots, z_k)} is a Fekete configuration.

It is not hard to see that {d_{k+1}(K)\leq d_k(K)}, i.e., {d_k(K)} is a decreasing sequence.

Definition 3 {d_{\infty}(K)=\lim\limits_{k\rightarrow\infty} d_k(K)} is the transfinite diameter.
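The definitions above can be explored numerically on small examples. The sketch below replaces {K} by a finite grid (an assumption made purely so that the maximum can be found by exhaustive search) and computes a Fekete configuration and the corresponding {(k+1)}-diameter of that grid; the function names are hypothetical.

```python
from itertools import combinations
from math import prod

def vandermonde_modulus(points):
    """Product of |w_j - w_i| over all pairs i < j."""
    return prod(abs(w - z) for z, w in combinations(points, 2))

def fekete_on_grid(grid, n_points):
    """Exhaustive search for a configuration of n_points grid points maximizing the product."""
    return max(combinations(grid, n_points), key=vandermonde_modulus)

def diameter(grid, n_points):
    """The (k+1)-diameter with k + 1 = n_points, i.e. the product raised to 2 / (k (k + 1))."""
    k = n_points - 1
    return vandermonde_modulus(fekete_on_grid(grid, n_points)) ** (2.0 / (k * (k + 1)))

# Grid of step 1/10 inside K = [-1, 1].
grid = [i / 10 for i in range(-10, 11)]
for n_points in (2, 3, 4):
    print(n_points, fekete_on_grid(grid, n_points), diameter(grid, n_points))
```

For three points the search returns {(-1,0,1)}, whose product is {2}, and the printed diameters decrease with the number of points, as expected. The same functions accept complex grid points since only moduli of differences are used.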

The transfinite diameter is related to the logarithmic potential of {K}.

1.2. Logarithmic potential

In the one-dimensional setting, the equidistribution of Fekete configurations towards an equilibrium measure was established by Fekete and Szegö.

Theorem 4 (Fekete, Szegö) If {d_{\infty}(K)>0} and {F_k} is a sequence of Fekete configurations, then the sequence of probability measures

\displaystyle \frac{1}{k+1}\sum\limits_{z\in F_k} \delta_z =:\frac{1}{k+1}[F_k]

converges in the weak-* topology to the so-called equilibrium measure {\mu_K} of {K}.

Proof: Let us introduce the following “continuous” version of Fekete configurations. Given a measure {\mu} on {K}, its “energy” is

\displaystyle I(\mu) := \int\log|z-w| \, d\mu(z) \, d\mu(w),

so that if we forget about the “diagonal terms” {\log|z_i-z_i|}, then {I([F_k]) = \log d_{k+1}(K)}. Recall that the capacity of {K} is {\textrm{cap}(K)=\exp(V(K))} where {V(K) = \sup \{I(\mu): \mu\in\mathcal{M}(K)\}} (and {\mathcal{M}(K)} stands for the space of probability measures on {K}).

Theorem 5 (Frostman) Either {K} is polar, i.e., {I(\mu)=-\infty} for all {\mu\in\mathcal{M}(K)} or there is a unique {\mu_K\in\mathcal{M}(K)} with {I(\mu_K)=V(K)}.

We are not going to prove this result here. Nevertheless, let us mention that an important ingredient in the proof of Frostman’s theorem is the logarithmic potential {u_{\mu}(z)=\int\log|z-w|\,d\mu(w)} associated to {\mu\in\mathcal{M}(K)}: it is a subharmonic function whose (distributional) Laplacian is {\Delta u_{\mu} = 2\pi\mu}. A key feature of the logarithmic potential is the fact that if {I(\mu) = V(K)}, then {u_{\mu}(z)=V(K)} for {\mu}-almost every {z}: observe that this allows one to conclude the uniqueness of {\mu_K} because it would follow from {I(\mu)=V(K)=I(\nu)} that {u_{\mu}-u_{\nu}} is harmonic, “basically” zero on {K}, and {u_{\mu}-u_{\nu} = O(1)} near infinity.

Anyhow, it is not hard to deduce the equidistribution of Fekete configurations towards {\mu_K} from Frostman’s theorem. Indeed, let {\mu_k} be {\frac{1}{k+1}[F_k]} and consider the modified energy {\widetilde{I}(\mu_k) = \int_{z\neq w} \log|z-w|\,d\mu_k(z)\,d\mu_k(w) = \frac{k}{k+1}\log d_{k+1}(K)}. A straightforward calculation (cf. the proof of Théorème 1.1 in Dujardin’s text) reveals that if {\mu_{k_j}} is a converging subsequence, say {\mu_{k_j}\rightarrow\nu}, then {I(\nu)\geq \log d_{\infty}(K) \geq V(K)}, so that {\nu=\mu_K} by Frostman’s theorem. \Box

1.3. Two remarks

The capacity of {K} admits several equivalent definitions: for instance, the quantities

\displaystyle \tau_k(K)=\inf\{\|P\|_K:=\sup\limits_{z\in K}|P(z)|: P \textrm{ is a monic polynomial of degree }\leq k\}

form a submultiplicative sequence (i.e., {\tau_{k+l}(K)\leq\tau_k(K)\tau_l(K)}) and the so-called Chebyshev constant

\displaystyle \tau_{\infty}(K) := \lim\limits_{n\rightarrow\infty}\tau_n(K)^{1/n}

coincides with {\textrm{cap}(K)}. In other terms, the capacity of {K} is the limit of certain geometrical quantities {\tau_k(K)} associated to a natural norm {\|.\|_K} on the spaces of polynomials {\mathcal{P}_k(\mathbb{C})}.

Also, it is interesting to consider the maximization problem for weighted versions

\displaystyle I_Q(\mu) = I(\mu)+\int Q\,d\mu

of the energy {I(\mu)} of measures.

As it turns out, these ideas play a role in the higher-dimensional context discussed below.

2. Pluripotential theory on {\mathbb{C}^n}

Denote by {\mathcal{P}_k(\mathbb{C}^n)} the space of polynomials of degree {\leq k} in {n} complex variables: it is a vector space of dimension {\binom{n+k}{k}:=N_k\sim k^n/n!} as {k\rightarrow \infty}.

Let {K} be a compact subset of {\mathbb{C}^n} and consider {z_1,\dots, z_{N_k}\in K}. Similarly to the case {n=1}, the interpolation problem of inverting the evaluation map {\mathcal{P}_k(\mathbb{C}^n)\ni P\mapsto (P(z_1),\dots, P(z_{N_k}))\in\mathbb{C}^{N_k}} involves the computation of the determinant {\det(e_i(z_j))} where {(e_i)} is the basis of monomials. Once again, we say that a collection {(z_1,\dots, z_{N_k})} of {N_k} points in {K} maximizing the quantity {|\det(e_i(z_j))|} is a Fekete configuration and the transfinite diameter of {K} is {d_{\infty}(K)=\limsup\limits_{k\rightarrow\infty} d_k(K)} where

\displaystyle d_k(K) = \max\limits_{(z_1,\dots, z_{N_k})\in K^{N_k}}|\det(e_i(z_j))|^{1/kN_k}

is the {k}-diameter of {K}.

Given the discussion of the previous section, it is natural to ask the following questions: do Fekete configurations equidistribute? what about the relation of the transfinite diameter and pluripotential theory?

A first difficulty in solving these questions comes from the fact that it is not easy to produce a “continuous” version of Fekete configurations via a natural concept of energy of measures having all properties of the quantity {I(\mu)} in the case {n=1}.

A second difficulty towards the questions above is the following: besides the issues coming from pairs of points which are too close together, our new interpolation problem has new sources of instability such as the case of a configuration of points lying in an algebraic curve. In particular, this hints that some techniques coming from complex geometry will help us here.

The next result provides an answer (comparable to Frostman’s theorem above) to the second question:

Theorem 6 (Zaharjuta (1975)) The limit of {(d_k(K))_{k\in\mathbb{N}}} exists. Moreover, if {K} is not pluripolar, then {d_{\infty}(K)>0}.

Here, we recall that a pluripolar set is defined in the context of pluripotential theory as follows. First, a function {u:\Omega\rightarrow[-\infty,+\infty)} on an open subset {\Omega\subset\mathbb{C}^n} is called plurisubharmonic (psh) whenever {u} is upper semicontinuous (usc) and {u|_C} is subharmonic for any {C\subset\Omega} holomorphic curve. Equivalently, if {u\not\equiv-\infty}, then {u} is psh when {u} is usc and the matrix of distributions {\left(\frac{\partial^2 u}{\partial z_j\partial\overline{z_k}}\right)} is positive semi-definite Hermitian, i.e., {dd^cu:=\frac{i}{\pi}\sum\frac{\partial^2 u}{\partial z_j\partial \overline{z_k}} dz_j\wedge d\overline{z_k}\geq 0}. Next, {E} is pluripolar whenever {E\subset \{u=-\infty\}} where {u\not\equiv-\infty} is a psh function.

An important fact in pluripotential theory is that the positive currents {dd^c u} can be multiplied: if {u_1,\dots, u_m} are bounded psh functions, then the exterior product {dd^c u_1\wedge\dots\wedge dd^c u_m} can be defined as a current. In particular, we can define the Monge-Ampère operator {MA(u)=(dd^c u)^n = \frac{n!}{\pi^n}\det\left(\frac{\partial^2 u}{\partial z_j\partial\overline{z_k}}\right)idz_1\wedge d\overline{z_1}\wedge\dots\wedge i dz_n\wedge d\overline{z_n}} on the space of bounded psh functions.

Note that {MA(u)} is a positive current of maximal degree, i.e., a positive measure. This allows us to define a candidate for the equilibrium measure in higher dimensions in the following way. Let

\displaystyle \mathcal{L} = \{u \textrm{ psh on } \mathbb{C}^n \textrm{ with } u(z)\leq \log|z|+O(1) \textrm{ as }z\rightarrow\infty\}

be the so-called Lelong class of psh functions. Given a compact subset {K\subset\mathbb{C}^n}, let

\displaystyle V_K(z) = \sup\{u(z): u\in\mathcal{L}, u|_K\leq 0\}.

Observe that {V_K(z)} is a natural object: for instance, it differs only by an additive constant from the logarithmic potential of the equilibrium measure of {K} when {n=1}. Indeed, this follows from the key property of the logarithmic potential (“{I(\mu)=V(K)} implies {u_{\mu}(z)=V(K)} for {\mu}-almost every {z}”) mentioned earlier and the fact that {V_K} is a subharmonic function which essentially vanishes on {K}.

In general, {V_K(z)} is not psh. So, let us consider the psh function given by its usc regularization {V_K^*(z):=\inf\{u(z): u \textrm{ usc, } u\geq V_K\}}. Note that {V_K^*\geq V_K\geq 0}, so that we can define the equilibrium measure of {K} as

\displaystyle \mu_K:=(d d^c V_K^*)^n.

It is worth pointing out that {V_K} can be recovered from the study of polynomials (in a similar way to our discussion of the Chebyshev constant in the previous section). More concretely, we have {\frac{1}{k}\log |P|\in\mathcal{L}} for all {P\in\mathcal{P}_k(\mathbb{C}^n)} and a result of Siciak ensures that

\displaystyle V_K(z)=\sup\left\{\frac{1}{k}\log |P(z)|: k\in\mathbb{N}, P\in\mathcal{P}_k(\mathbb{C}^n), |P|\leq 1 \textrm{ on } K\right\}

Finally, note that this discussion admits a weighted version where the usual Euclidean norm {|.|} on {\mathbb{C}^n} is replaced by {|.| \exp(-Q)}, the determinant {|\det (e_i(z_j))|} is replaced by {|\det (e_i(z_j))| \exp(-Q(z_1))\dots \exp(-Q(z_{N_k}))}, etc.

Posted by: matheuscmss | December 30, 2018

Continued fractions, binary quadratic forms and Markov spectrum

In Number Theory, the study of the so-called Markov spectrum often relies upon the following classical fact relating continued fractions and the values of real indefinite binary quadratic forms at integral points:

Theorem 1 The sets

\displaystyle M_1:=\left\{\frac{\sqrt{b^2-4ac}}{\inf\limits_{(x,y)\in\mathbb{Z}^2\setminus\{(0,0)\}}|ax^2+bxy+cy^2|}<\infty: a,b,c\in\mathbb{R}, b^2-4ac>0\right\}

and

\displaystyle M_2:=\left\{\sup\limits_{n\in\mathbb{Z}}([g_n; g_{n+1},\dots]+[0;g_{n-1},g_{n-2},\dots])<\infty: (g_m)_{m\in\mathbb{Z}}\in(\mathbb{N}^*)^{\mathbb{Z}}\right\}

coincide. (Here, {[t_0;t_1,\dots] := t_0+\frac{1}{t_1+\frac{1}{\ddots}}} stands for continued fraction expansions.)
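The elements of {M_2} coming from periodic sequences are easy to approximate numerically. The sketch below is restricted to periodic sequences {(g_m)_{m\in\mathbb{Z}}} (an assumption made so that the supremum reduces to a maximum over one period), truncates both continued fractions at a finite depth, and uses floating-point arithmetic; the function names are hypothetical.

```python
def continued_fraction(terms):
    """Value of the finite continued fraction [t_0; t_1, ..., t_m], evaluated from the tail."""
    value = float(terms[-1])
    for t in reversed(terms[:-1]):
        value = t + 1.0 / value
    return value

def markov_value(period, depth=60):
    """Approximate sup_n ([g_n; g_{n+1}, ...] + [0; g_{n-1}, g_{n-2}, ...]) for a periodic sequence."""
    p = len(period)
    best = 0.0
    for n in range(p):
        forward = [period[(n + j) % p] for j in range(depth)]
        backward = [0] + [period[(n - j) % p] for j in range(1, depth)]
        best = max(best, continued_fraction(forward) + continued_fraction(backward))
    return best

print(markov_value([1]))  # close to sqrt(5)
print(markov_value([2]))  # close to sqrt(8)
```

For the constant sequence {g_m=1} one gets {\frac{1+\sqrt{5}}{2}+\frac{\sqrt{5}-1}{2}=\sqrt{5}}, and for {g_m=2} one gets {(1+\sqrt{2})+(\sqrt{2}-1)=\sqrt{8}}.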

Remark 1 It is worth noting that the set {M_1} concerns only the real binary quadratic forms {q(x,y)=ax^2+bxy+cy^2} with {b^2-4ac>0} such that the quantity

\displaystyle \frac{\sqrt{b^2-4ac}}{\inf\limits_{(x,y)\in\mathbb{Z}^2\setminus\{(0,0)\}}|ax^2+bxy+cy^2|}

is finite. In particular, we are excluding real binary quadratic forms {q} with {0\in q(\mathbb{Z}^2\setminus\{(0,0)\})}.

In this post, we follow the books of Dickson and Cusick–Flahive in order to give a proof of this result via the classical reduction theory of binary quadratic forms.

1. Generalities on binary quadratic forms

A real binary quadratic form is {q(x,y)=ax^2+bxy+cy^2} with {a,b,c\in\mathbb{R}}. From the point of view of linear algebra, a binary quadratic form is

\displaystyle q(x,y) = ax^2+bxy+cy^2 =\langle Mv,v\rangle = v^t M v \ \ \ \ \ (1)

where {\langle .,.\rangle} is the usual Euclidean inner product of {\mathbb{R}^2}, {v} is the column vector {v=\left(\begin{array}{c} x \\ y \end{array}\right)}, and {M} is the matrix {M=\left(\begin{array}{cc} a & b/2 \\ b/2 & c \end{array}\right)}.

The discriminant of {q(x,y)=ax^2+bxy+cy^2 = \langle Mv, v\rangle} is {d:=b^2-4ac = -4\det(M)}.

Remark 2 The values taken by {q(x,y)=bxy} are very easy to describe: in particular, {0\in q(\mathbb{Z}^2\setminus\{(0,0)\})} in this context. Hence, by Remark 1, we can focus on {q(x,y)=ax^2+bxy+cy^2} with {a\neq 0} or {c\neq0}. Moreover, by symmetry (i.e., exchanging the roles of the variables {x} and {y}), we can assume that {a\neq0}. So, from now on, we shall assume that {a\neq0}.

Note that {4aq(x,y)=(2ax+by)^2-dy^2}. Therefore, {q} is semi-definite (i.e., its values are all non-negative or all non-positive) whenever the discriminant is {d\leq 0}. Conversely, when {a\neq 0}, {q} is indefinite (i.e., it takes both positive and negative values) whenever {d>0}. In view of the definition of {M_1} in the statement of Theorem 1 above, we will restrict ourselves from now on to the indefinite case {d>0}.

Observe also that {q(x,y)=a\prod\limits_{a\omega^2+b\omega+c=0}(x-\omega y) = a(x-fy)(x-sy)} where

\displaystyle f:=\frac{\sqrt{d}-b}{2a} \quad \textrm{and} \quad s := \frac{-\sqrt{d}-b}{2a} \ \ \ \ \ (2)

are the first and second roots of {a\omega^2+b\omega+c=0}. In particular, {0\in q(\mathbb{Z}^2\setminus\{(0,0)\})} when {f\in\mathbb{Q}} or {s\in\mathbb{Q}}. So, by Remark 1, we will suppose from now on that {f, s\notin\mathbb{Q}}.

In summary, from now on, our standing assumptions on {q(x,y)=ax^2+bxy+cy^2} are {a\neq0}, {d:=b^2-4ac>0} and {f:=\frac{\sqrt{d}-b}{2a}\notin\mathbb{Q}}, {s:= \frac{-\sqrt{d}-b}{2a}\notin\mathbb{Q}}.

Remark 3 The first and second roots {f}, {s} and the discriminant {d>0} determine the coefficients {a}, {b}, {c} of the binary quadratic form {q(x,y)}. Indeed, the formula {q(x,y)=a(x-fy)(x-sy)} says that it suffices to determine {a}. On one hand, the modulus {|a|} is given by {4a^2(f-s)^2 = d}. On the other hand, the fact that {f} is the 1st root and {s} is the 2nd root determines the sign of {a}: if we replace {a} by {-a} in the formula {q(x,y)=a(x-fy)(x-sy)}, then we obtain the binary quadratic form {-q} whose 1st root is {s} and 2nd root is {f}; hence, an ambiguity on the sign of {a} could only occur when {f=s}, i.e., {d=0}, in contradiction with our assumption {d>0}.

2. Action of {GL_2(\mathbb{R})} on binary quadratic forms

A key idea (going back to Lagrange) to study the values of a fixed binary quadratic form {q(x,y)=ax^2+bxy+cy^2} is to investigate the equivalent problem of describing the values of a family of binary quadratic forms on a fixed vector.

More precisely, if a vector {v=\left(\begin{array}{c} x \\ y \end{array}\right)=T(V)} is obtained from a fixed vector {V=\left(\begin{array}{c} X \\ Y \end{array}\right)} via a matrix {T=\left(\begin{array}{cc} \alpha & \beta \\ \gamma & \delta \end{array}\right)\in GL_2(\mathbb{R})}, i.e., {x=\alpha X + \beta Y}, {y=\gamma X+ \delta Y}, then the value of {q} at {v} equals the value of {q\circ T} at {V}. By (1), the binary quadratic forms {q} and {q\circ T} are related by:

\displaystyle q\circ T (V) = (T(V))^t\cdot M \cdot T(V) = V^t (T^t M T) V \ \ \ \ \ (3)

where {M=\left(\begin{array}{cc} a & b/2 \\ b/2 & c \end{array}\right)}. In other terms, {q\circ T(X,Y)=AX^2+BXY+CY^2} where

\displaystyle \left(\begin{array}{cc} A & B/2 \\ B/2 & C \end{array}\right) = T^t M T = \left(\begin{array}{cc} \alpha & \gamma \\ \beta & \delta \end{array}\right)\left(\begin{array}{cc} a & b/2 \\ b/2 & c \end{array}\right)\left(\begin{array}{cc} \alpha & \beta \\ \gamma & \delta \end{array}\right), \ \ \ \ \ (4)

i.e., {A=a\alpha^2+b\alpha\gamma+c\gamma^2}, {B=2a\alpha\beta+b(\alpha\delta+\beta\gamma)+ 2c\gamma\delta}, and {C=a\beta^2+b\beta\delta+c\delta^2}.

Observe that (3) implies that the discriminant of {q\circ T} is {\det(T)^2\cdot d}. Thus, {q\circ T} and {q} have the same discriminant whenever {\det(T)=\pm1}.
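The formulas for {A}, {B}, {C} and the behaviour of the discriminant can be verified on integer examples. In the sketch below, a form is encoded as a triple {(a,b,c)} and a matrix {T} as a pair of rows; these encodings and the function names are illustrative assumptions.

```python
def compose(form, T):
    """Coefficients (A, B, C) of q o T, following the formulas derived from (4)."""
    a, b, c = form
    (alpha, beta), (gamma, delta) = T
    A = a * alpha ** 2 + b * alpha * gamma + c * gamma ** 2
    B = 2 * a * alpha * beta + b * (alpha * delta + beta * gamma) + 2 * c * gamma * delta
    C = a * beta ** 2 + b * beta * delta + c * delta ** 2
    return (A, B, C)

def evaluate(form, x, y):
    a, b, c = form
    return a * x * x + b * x * y + c * y * y

def discriminant(form):
    a, b, c = form
    return b * b - 4 * a * c

q = (1, 1, -1)
T = ((2, 1), (1, 1))            # determinant 1
X, Y = 3, -2
x, y = 2 * X + 1 * Y, 1 * X + 1 * Y   # v = T(V)
print(compose(q, T), evaluate(q, x, y) == evaluate(compose(q, T), X, Y))
print(discriminant(q), discriminant(compose(q, T)))
```

With a matrix of determinant {-2}, such as the one with rows {(1,2)} and {(3,4)}, the discriminant gets multiplied by {4}.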

2.1. Action of {SL_2(\mathbb{Z})} on binary quadratic forms

The quantity {\inf\{|z|: z\in q(\mathbb{Z}^2\setminus\{(0,0)\})\}} appearing in the definition of {M_1} in the statement of Theorem 1 leads us to restrict our attention to the action of {SL_2(\mathbb{Z})} on {q(x,y)=ax^2+bxy+cy^2}. In fact, since {q(\lambda x,\lambda y) = \lambda^2 q(x,y)}, we have that

\displaystyle \inf\{|z|: z\in q(\mathbb{Z}^2\setminus\{(0,0)\})\} = \inf\{|z|: z\in q(\mathbb{Z}^2_{prim})\} \ \ \ \ \ (5)

where {\mathbb{Z}^2_{prim}:=\{(p,q)\in\mathbb{Z}^2: gcd(p,q)=1\}} is the set of primitive vectors of {\mathbb{Z}^2}. So, it is natural to concentrate on {SL_2(\mathbb{Z})} because it acts transitively on {\mathbb{Z}^2_{prim}}.

Definition 2 Two binary quadratic forms {q} and {Q} are equivalent whenever {Q=q\circ T} for some {T\in SL_2(\mathbb{Z})}.

Note that two equivalent binary quadratic forms have the same discriminant. Furthermore, if {(x-\omega y)} is a factor of {q(x,y)=ax^2+bxy+cy^2} and {x=\alpha X+\beta Y}, {y=\gamma X+\delta Y} with {T=\left(\begin{array}{cc}\alpha & \beta\\ \gamma&\delta\end{array}\right)\in SL_2(\mathbb{Z})}, then {(\alpha X+\beta Y - \omega(\gamma X+\delta Y))} is a factor of {q\circ T(X,Y)=AX^2+BXY+CY^2} and, a fortiori, {(X-\frac{-\beta+\omega\delta}{\alpha-\omega\gamma}Y)} is a factor of {q\circ T}. In particular, the roots of {A\Omega^2+B\Omega+C=0} are related to the roots of {a\omega^2+b\omega+c=0} via

\displaystyle \Omega=\frac{-\beta+\omega\delta}{\alpha-\omega\gamma} \ \ \ \ \ (6)

In other words, {\omega} and {\Omega} are related to each other via the action of {T^{-1} = \left(\begin{array}{cc} \delta & -\beta \\ -\gamma & \alpha \end{array}\right)} by Möbius transformations. Actually, a direct calculation with the formulas for {A} and {B} in (4) and the fact that {\alpha\delta-\beta\gamma=1} show that (6) respects the order of the roots, i.e.,

\displaystyle F=\frac{-\beta+f\delta}{\alpha-f\gamma} \quad \textrm{and} \quad S=\frac{-\beta+s\delta}{\alpha-s\gamma} \ \ \ \ \ (7)

where {f} and {F} are the first roots, and {s} and {S} are the second roots.

Remark 4 Note that {\alpha-\omega\gamma\neq 0} because {(\alpha, \gamma)\in\mathbb{Z}^2_{prim}} and we are assuming that the roots of {a\omega^2+b\omega+c=0} are irrational.
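The relations (6) and (7) are easy to test numerically. The following sketch uses floating-point arithmetic and the convention {f=(-b+\sqrt{d})/2a}, {s=(-b-\sqrt{d})/2a} for the first and second roots (consistent with Definition 4 below); the function names are ours:

```python
from math import sqrt


def compose(form, T):
    """Coefficients (A, B, C) of q∘T, as in (4)."""
    a, b, c = form
    (al, be), (ga, de) = T
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)


def roots(form):
    """First root (-b + sqrt(d)) / 2a and second root (-b - sqrt(d)) / 2a."""
    a, b, c = form
    d = b * b - 4 * a * c
    return ((-b + sqrt(d)) / (2 * a), (-b - sqrt(d)) / (2 * a))


def moebius_inverse(T, w):
    """Right-hand side of (6): the Moebius action of T^{-1} on w."""
    (al, be), (ga, de) = T
    return (-be + w * de) / (al - w * ga)


q = (1, 1, -1)
T = ((2, 1), (1, 1))      # an element of SL_2(Z)
f, s = roots(q)
F, S = roots(compose(q, T))
print(abs(F - moebius_inverse(T, f)) < 1e-9)   # first root goes to first root
print(abs(S - moebius_inverse(T, s)) < 1e-9)   # second root goes to second root
```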

Lemma 3 Under our standing assumptions, any binary quadratic form {q} is equivalent to some {ax^2+bxy+cy^2} with {|b|\leq |a|\leq \sqrt{d/3}}.

Proof: We will proceed in two steps: first, we apply certain elements of {SL_2(\mathbb{Z})} to ensure that {|a|\leq \sqrt{d/3}}, and, after that, we use a parabolic matrix to obtain {|b|\leq |a|}.

By (4), the matrix {H_0=\left(\begin{array}{cc} h_0 & 1 \\ -1 & 0 \end{array}\right)\in SL_2(\mathbb{Z})} converts {q(x,y)=a_0x^2+b_0xy+c_0y^2} into {q\circ H_0(X,Y) = a_1X^2+b_1XY+a_0Y^2} with {b_1=2a_0h_0-b_0}.

If {|a_0|>\sqrt{d/3}}, the choice of {h_0\in\mathbb{Z}} with {|2a_0h_0-b_0|\leq |a_0|} in the discussion above leads to {q\circ H_0(X,Y)=a_1X^2+b_1XY+a_0Y^2} with

\displaystyle 4a_1a_0 = b_1^2-d\leq b_1^2 = (2a_0h_0-b_0)^2\leq a_0^2 \quad \textrm{and} \quad -4a_1a_0 = d-b_1^2\leq d<3a_0^2.

In other terms, if {|a_0|>\sqrt{d/3}}, then {q} is equivalent to {a_1X^2+b_1XY+a_0Y^2} with {|4a_1a_0|<3a_0^2}, i.e., {|a_1|<(3/4)|a_0|}.

By iterating this process, we find a sequence {a_nX^2+b_nXY+a_{n-1}Y^2} of binary quadratic forms equivalent to {q} such that {|a_n|<(3/4)^n |a_0|} whenever {|a_{n-1}|>\sqrt{d/3}}. It follows that {q} is equivalent to some {ax^2+\widetilde{b}xy+\widetilde{c}y^2} with {|a|\leq \sqrt{d/3}}.

Finally, by (4), the matrix {P=\left(\begin{array}{cc} 1 & k \\ 0 & 1 \end{array}\right)\in SL_2(\mathbb{Z})} converts {ax^2+\widetilde{b}xy+\widetilde{c}y^2} into {aX^2+bXY+cY^2} with {b=\widetilde{b}+2ka}. Hence, the choice of {k\in\mathbb{Z}} with {|\widetilde{b}+2ka|\leq |a|} gives that {q} is equivalent to {ax^2+bxy+cy^2} with {|b|\leq |a|\leq \sqrt{d/3}}. \Box
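The two steps of this proof translate directly into an algorithm. The sketch below is a hedged illustration restricted to forms with integer coefficients and positive non-square discriminant (an assumption made here so that all comparisons are exact and {a\neq 0} throughout); the names `nearest_int` and `lemma3_reduce` are ours.

```python
from math import isqrt


def compose(form, T):
    """Coefficients (A, B, C) of q∘T, as in (4)."""
    a, b, c = form
    (al, be), (ga, de) = T
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)


def nearest_int(n, m):
    """An integer h with |h - n/m| <= 1/2 (m != 0), in exact integer arithmetic."""
    if m < 0:
        n, m = -n, -m
    return (2 * n + m) // (2 * m)


def lemma3_reduce(form):
    """Return an equivalent form with |b| <= |a| <= sqrt(d/3)."""
    a, b, c = form
    d = b * b - 4 * a * c
    if d <= 0 or isqrt(d) ** 2 == d:
        raise ValueError("expected a positive non-square discriminant")
    # Step 1: while |a| > sqrt(d/3), i.e. 3a^2 > d, apply H_0 = [[h, 1], [-1, 0]].
    while 3 * a * a > d:
        h = nearest_int(b, 2 * a)            # guarantees |2ah - b| <= |a|
        a, b, c = compose((a, b, c), ((h, 1), (-1, 0)))
    # Step 2: apply the parabolic matrix P = [[1, k], [0, 1]] with |b + 2ka| <= |a|.
    k = -nearest_int(b, 2 * a)
    return compose((a, b, c), ((1, k), (0, 1)))


print(lemma3_reduce((10, 17, 5)))   # (-2, -1, 11), discriminant 89
```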

2.2. Reduction theory (I)

In general, the study of the dynamics of the action of a group {G} on a certain space {\mathcal{M}} is greatly improved in the presence of a nice fundamental domain, i.e., a portion {\mathcal{D}\subset \mathcal{M}} with good geometrical properties capturing all orbits in the sense that the {G}-orbit of any {x\in \mathcal{M}} intersects {\mathcal{D}}.

In the setting of {SL_2(\mathbb{Z})} acting on binary quadratic forms, the role of fundamental domain is played by the notion of reduced binary quadratic form:

Definition 4 We say that {q(x,y)=ax^2+bxy+cy^2} is reduced whenever

\displaystyle \frac{|\sqrt{d}-b|}{2|a|}=|f|<1, \quad \frac{|\sqrt{d}+b|}{2|a|}=|s|>1 \quad \textrm{and} \quad f\cdot s<0

Remark 5 Suppose that {q} is reduced. Then, {d-b^2=(\sqrt{d}-b)(\sqrt{d}+b)=-4a^2fs>0} thanks to the condition {f\cdot s<0}. In particular, {|b|<\sqrt{d}}, so that {\sqrt{d}- b, \sqrt{d}+b > 0}. Hence, {0 < \sqrt{d}-b < 2|a| < \sqrt{d}+b} thanks to the condition {|f|<1<|s|}. Note that the inequality {\sqrt{d}-b<\sqrt{d}+b} says that {b>0}. In other words, {q} reduced implies ({0<b<\sqrt{d}} and) {0<\sqrt{d}-b<2|a|< \sqrt{d}+b}. Conversely, the inequality {0<\sqrt{d}-b<2|a|< \sqrt{d}+b} implies that ({0<b<\sqrt{d}} and) {q} is reduced.

Therefore, {q} is reduced if and only if

\displaystyle 0<\sqrt{d}-b<2|a|< \sqrt{d}+b

Furthermore, the identity {|\sqrt{d}-b|\cdot|\sqrt{d}+b|=4|ac|} implies that {q} is reduced if and only if

\displaystyle 0<\sqrt{d}-b<2|c|< \sqrt{d}+b

In particular, {q} is reduced if and only if {q\circ R} is reduced, where {R\in GL_2(\mathbb{Z})} is the matrix {R=\left(\begin{array}{cc} 0&1\\ 1 & 0\end{array}\right)} inducing the change of variables {x=Y}, {y=X}. Moreover, if {q} is reduced, then {f\cdot a = (\sqrt{d}-b)/2>0} and {c/a = f\cdot s<0}, i.e., the sign of {c} is opposite to the signs of {f} and {a}.
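The equivalent characterizations in Remark 5 can be compared on a computer. The sketch below assumes integer coefficients and a positive non-square discriminant (our assumption, made so that the inequalities can be tested exactly via the integer square root); it checks the root conditions of Definition 4 in floating point against the inequality {0<\sqrt{d}-b<2|a|<\sqrt{d}+b}.

```python
from math import isqrt, sqrt


def is_reduced(a, b, c):
    """Exact test of 0 < sqrt(d) - b < 2|a| < sqrt(d) + b for integer forms."""
    d = b * b - 4 * a * c
    if d <= 0 or isqrt(d) ** 2 == d:
        raise ValueError("expected a positive non-square discriminant")
    s = isqrt(d)  # s < sqrt(d) < s + 1 because d is not a square
    return b <= s and s < 2 * abs(a) + b and 2 * abs(a) - b <= s


def is_reduced_by_roots(a, b, c):
    """Floating-point test of Definition 4: |f| < 1 < |s| and f*s < 0."""
    d = b * b - 4 * a * c
    f = (-b + sqrt(d)) / (2 * a)
    s = (-b - sqrt(d)) / (2 * a)
    return abs(f) < 1 < abs(s) and f * s < 0


print(is_reduced(1, 1, -1), is_reduced(1, 0, -7))   # True False
```

Swapping the roles of {a} and {c} in `is_reduced` gives the same answer, in agreement with the characterization involving {2|c|}.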

The next result says that reduced forms are a fundamental domain for the {SL_2(\mathbb{Z})} action on binary quadratic forms:

Theorem 5 Under our standing assumptions, any binary quadratic form {q} is equivalent to some reduced binary quadratic form.

Proof: By Lemma 3, we can assume that {q(x,y)=ax^2+bxy+cy^2} with {|b|\leq |a|\leq \sqrt{d/3}\leq\sqrt{d}}, so that {|4ac|=d-b^2\leq d}. Thus, {\min\{2|a|, 2|c|\}\leq\sqrt{d}}. By performing the change of variables {x=Y}, {y=-X} (i.e., applying the matrix {\left(\begin{array}{cc} 0&1 \\ -1&0\end{array}\right)\in SL_2(\mathbb{Z})}) if necessary, we can suppose that {2|a|\leq \sqrt{d}}.

Since we are assuming that {a\neq 0}, it is possible to choose {k\in\mathbb{Z}} such that {0\leq\sqrt{d}-2|a|\leq \widetilde{b}:=b+2ak\leq \sqrt{d}}. So, if we apply the matrix {\left(\begin{array}{cc} 1&k \\ 0&1\end{array}\right)\in SL_2(\mathbb{Z})} (i.e., {x=X+kY}, {y=Y}) to {q}, then we get the equivalent binary quadratic form {ax^2+\widetilde{b}xy+\widetilde{c}y^2} with {0\leq\sqrt{d}-\widetilde{b}\leq 2|a|\leq \sqrt{d}\leq \sqrt{d}+\widetilde{b}}.

We affirm that {ax^2+\widetilde{b}xy+\widetilde{c}y^2} is reduced. In fact, by Remark 5, our task is to show that we have strict inequalities

\displaystyle 0<\sqrt{d}-\widetilde{b}< 2|a|< \sqrt{d}+\widetilde{b}

However, {0=\sqrt{d}-\widetilde{b}} or {\sqrt{d}-\widetilde{b}=2|a|} or {2|a|=\sqrt{d}+\widetilde{b}} would imply that {0} or {\pm1} is a root of {a\omega^2+\widetilde{b}\omega+\widetilde{c}=0}, a contradiction with our assumption that none of its roots is rational. Hence, {ax^2+\widetilde{b}xy+\widetilde{c}y^2} is reduced. \Box
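The proof above is constructive, and one possible transcription is sketched below. It is only an illustration under the assumption of integer coefficients and positive non-square discriminant; the input is expected to satisfy {\min\{2|a|,2|c|\}\leq\sqrt{d}} (as is the case for the forms produced by Lemma 3), and the function names are ours.

```python
from math import isqrt


def compose(form, T):
    """Coefficients (A, B, C) of q∘T, as in (4)."""
    a, b, c = form
    (al, be), (ga, de) = T
    return (a * al * al + b * al * ga + c * ga * ga,
            2 * a * al * be + b * (al * de + be * ga) + 2 * c * ga * de,
            a * be * be + b * be * de + c * de * de)


def is_reduced(a, b, c):
    """Exact test of 0 < sqrt(d) - b < 2|a| < sqrt(d) + b (d non-square)."""
    s = isqrt(b * b - 4 * a * c)
    return b <= s and s < 2 * abs(a) + b and 2 * abs(a) - b <= s


def reduce_form(form):
    """Follow the proof of Theorem 5 to produce an equivalent reduced form."""
    a, b, c = form
    d = b * b - 4 * a * c
    if d <= 0 or isqrt(d) ** 2 == d:
        raise ValueError("expected a positive non-square discriminant")
    s = isqrt(d)
    if 2 * abs(a) > s and 2 * abs(c) > s:
        raise ValueError("expected min(2|a|, 2|c|) <= sqrt(d)")
    if 2 * abs(a) > s:
        # change of variables x = Y, y = -X, which exchanges a and c
        a, b, c = compose((a, b, c), ((0, 1), (-1, 0)))
    m = 2 * abs(a)
    b_new = s - ((s - b) % m)       # the integer in (sqrt(d) - 2|a|, sqrt(d)) congruent to b mod 2|a|
    k = (b_new - b) // (2 * a)      # exact division
    return compose((a, b, c), ((1, k), (0, 1)))


r = reduce_form((-2, -1, 11))
print(r, is_reduced(*r))   # (-2, 7, 5) True
```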


In this last post of this series, we want to complete the discussion of the Oh–Benoist–Miquel theorem by giving a sketch of its proof in the cases not covered in previous posts.

More precisely, let us recall that the Oh–Benoist–Miquel theorem (answering a conjecture of Margulis) asserts that:

Theorem 1 Let {G} be a semisimple algebraic Lie group of real rank {\textrm{rank}_{\mathbb{R}}(G)\geq 2}. Denote by {U\subset G} a horospherical subgroup of {G}. If {\Gamma\subset G} is a discrete Zariski-dense and irreducible subgroup such that {\Gamma\cap U} is cocompact, then {\Gamma} is commensurable to an arithmetic lattice {G_{\mathbb{Z}}}.

Moreover, we recall that the proof of this result was worked out in the previous two posts of this series for {G = SL(4,\mathbb{R})} and {U = \left\{\left(\begin{array}{cccc} 1 & 0 & \ast & \ast \\ 0 & 1 & \ast & \ast \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1\end{array}\right)\right\}}. Furthermore, we observed en passant that these arguments can be generalized without too much effort to yield a proof of Theorem 1 when

  • {U} is commutative;
  • {U} is reflexive (i.e., {U} is conjugate to an opposite horospherical subgroup {U^-});
  • {S} is not compact (where {P=N_G(U)}, {P^-=N_G(U^-)}, {L=P\cap P^-} and {S=[L,L]}).

Today, we will divide our discussion below into five sections discussing prototypical examples covering all possible remaining cases for {U} and {S}.

Remark 1 The fact that Theorem 1 holds for the examples in Sections 1, 2 and 3 below is originally due to Oh. Similarly, the example in Section 4 was originally treated by Selberg. Finally, the original proof of Theorem 1 for the example in Section 5 is due to Benoist–Oh. Nevertheless, except for Section 3, the arguments discussed below are some particular examples illustrating the general strategy of Benoist–Miquel and, hence, they provide some proofs which are different from the original ones.

1. {U} is not reflexive

The prototype example of this case is {G=SL(3,\mathbb{R})} and {U = \left\{\left(\begin{array}{ccc} 1 & \ast & \ast \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right\}}.

The corresponding parabolic subgroup {P=N_G(U)} is the stabilizer of the line {\mathbb{R}\, e_1}:

\displaystyle P=\left\{\left(\begin{array}{ccc} \ast & \ast & \ast \\ 0 & \ast & \ast \\ 0 & \ast & \ast \end{array}\right)\right\} = \{T\in G: T(e_1)\in\mathbb{R}e_1\}.

Equivalently, {P} is the stabilizer of the flag {\{0\}\subset\mathbb{R}e_1\subset \mathbb{R}^3}. Therefore, {U} is not reflexive because the opposite parabolic subgroup {P^-} is the stabilizer of a plane (rather than of a line).

Since {\Gamma} is Zariski-dense in {G}, we can find {g, h\in\Gamma} such that {\{e_1, g(e_1), h(e_1)\}} is a basis of {\mathbb{R}^3}. Hence, there is no loss in generality in assuming that {g(e_1)=e_2} and {h(e_1)=e_3}. In this setting,

\displaystyle U'=gUg^{-1} = \left\{\left(\begin{array}{ccc} 1 & 0 & 0 \\ \ast & 1 & \ast \\ 0 & 0 & 1 \end{array}\right)\right\}, \quad U''=hUh^{-1} = \left\{\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & 0 \\ \ast & \ast & 1 \end{array}\right)\right\}.
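The shapes of {U'} and {U''} can be confirmed by a direct matrix computation. In the sketch below, {g} is taken to be the cyclic permutation matrix with {g(e_1)=e_2} and {h=g^2} (so that {h(e_1)=e_3}); this particular choice of {g} and {h} is just an illustrative assumption, since only the images of {e_1} matter.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def u(x, y):
    """Element of U: the identity plus (x, y) in the first row."""
    return [[1, x, y], [0, 1, 0], [0, 0, 1]]


# Cyclic permutation matrix: g(e_1) = e_2, g(e_2) = e_3, g(e_3) = e_1.
g = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
g_inv = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # transpose of g
h = matmul(g, g)                            # h(e_1) = e_3
h_inv = g                                   # because g^3 = Id

u_prime = matmul(matmul(g, u(2, 5)), g_inv)
u_second = matmul(matmul(h, u(2, 5)), h_inv)
print(u_prime)    # [[1, 0, 0], [5, 1, 2], [0, 0, 1]]
print(u_second)   # [[1, 0, 0], [0, 1, 0], [2, 5, 1]]

# A product of elements of U and U' keeps the third row (0, 0, 1).
print(matmul(u(1, 1), u_prime)[2])   # [0, 0, 1]
```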

Also, we know that {U'/(U'\cap\Gamma)} and {U''/(U''\cap\Gamma)} are compact. Moreover, {\Delta = \langle\Gamma\cap U, \Gamma\cap U'\rangle} is a discrete and Zariski dense subgroup of the semi-direct product

\displaystyle H=\langle U, U'\rangle = \left\{ \left(\begin{array}{ccc} a & b & v_1 \\ c & d & v_2 \\ 0 & 0 & 1 \end{array}\right): \left(\begin{array}{cc} a & b \\ c & d \end{array}\right)\in S, \left(\begin{array}{c} v_1 \\ v_2 \end{array}\right)\in V \right\}

of {S=SL(2,\mathbb{R})} and {V=\mathbb{R}^2}.

In this context, a key fact is the following result of Auslander (compare with Proposition 4.17 in Benoist–Miquel paper):

Theorem 2 (Auslander) Let {H} be an algebraic subgroup obtained from a semi-direct product of {S} semisimple and {V} solvable, and denote by {p:H\rightarrow S} the natural projection. If {\Delta\subset H} is discrete and Zariski dense, then {p(\Delta)} is also discrete and Zariski dense in {S}.

The information about the discreteness of the projection {p(\Delta)} in the previous statement is extremely precious for our purposes. Indeed, Auslander's theorem implies that the projections {p(\Gamma\cap U)} and {p(\Gamma\cap U')} are discrete. Using these facts, one checks that

\displaystyle \Gamma\cap\left\{\left(\begin{array}{ccc} 1 & 0 & \ast \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right\} \neq \{\textrm{Id}\} \quad \textrm{ and } \quad \Gamma\cap\left\{\left(\begin{array}{ccc} 1 & 0 & 0 \\ 0 & 1 & \ast \\ 0 & 0 & 1 \end{array}\right)\right\} \neq \{\textrm{Id}\}

By repeating this argument with {(U, U'')} and {(U', U'')} in the place of {(U,U')}, one can “fill all non-diagonal entries”, that is, one essentially gets that {\Gamma} contains finite-index subgroups of

\displaystyle \left\{ \left(\begin{array}{ccc} 1 & \ast & \ast \\ 0 & 1 & \ast \\ 0 & 0 & 1 \end{array}\right)\in SL(3,\mathbb{Z})\right\} \textrm{ and } \left\{ \left(\begin{array}{ccc} 1 & 0 & 0 \\ \ast & 1 & 0 \\ \ast & \ast & 1 \end{array}\right)\in SL(3,\mathbb{Z})\right\},

so that the Raghunathan–Venkataramana–Oh theorem (stated in the previous post of this series) guarantees that {\Gamma} is commensurable to {SL(3,\mathbb{Z})}.

This completes our sketch of proof of Theorem 1 for our prototype of non-reflexive subgroup {U} above.

2. {U} is Heisenberg and {S} is not compact

A Heisenberg horospherical subgroup {U} is a {2}-step nilpotent group whose associated parabolic group {P=N_G(U)} acts by similarities (of some Euclidean norm) on the center of the Lie algebra {\mathfrak{u}} of {U}.

A prototypical example of {U} Heisenberg and {S} non-compact is {G=SL(4,\mathbb{R})} and

\displaystyle U=\left\{ \left( \begin{array}{cccc} 1 & \ast & \ast & \ast \\ 0 & 1 & 0 & \ast \\ 0 & 0 & 1 & \ast \\ 0 & 0 & 0 & 1 \end{array} \right) \right\}
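At the level of Lie algebras, the {2}-step nilpotency of this example can be checked by hand or by machine: the bracket of two elements of {\mathfrak{u}} is supported on the upper-right corner, which in turn commutes with everything in {\mathfrak{u}}. The sketch below (with our own coordinate names {x_1, x_2, y_1, y_2, z}) verifies this on integer matrices.

```python
def matmul(A, B):
    """Product of two 4x4 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def bracket(A, B):
    """Lie bracket [A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(4)] for i in range(4)]


def heis(x1, x2, y1, y2, z):
    """Element of the Lie algebra u of U in the coordinates (x1, x2, y1, y2, z)."""
    return [[0, x1, x2, z],
            [0, 0, 0, y1],
            [0, 0, 0, y2],
            [0, 0, 0, 0]]


X = heis(1, 2, 3, 4, 5)
Y = heis(6, 7, 8, 9, 10)
Z = bracket(X, Y)
print(Z[0][3])   # x1*y1' + x2*y2' - x1'*y1 - x2'*y2 = 26 - 46 = -20
print(bracket(Z, X) == heis(0, 0, 0, 0, 0))   # True: brackets are central
```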

As it turns out, any Heisenberg {U} is reflexive. Thus, we have that {U^-=\gamma_0 U \gamma_0^{-1}} is opposite to {U} for some adequate choice {\gamma_0\in\Gamma}.

In particular, it is tempting to mimic the arguments from the second and third posts of this series, namely, one introduces the lattices

\displaystyle \Lambda = \log(\Gamma\cap U)\in X_{\mathfrak{u}}, \quad \Lambda^- = \log(\Gamma\cap U^-)\in X_{\mathfrak{u}^-},

so that the arithmeticity of {\Gamma} follows from the closedness of the {\textrm{Ad}_L}-orbit of {(\Lambda, \Lambda^-)} in {X_{\mathfrak{u}}\times X_{\mathfrak{u}^-}} when {S} is not compact; moreover, the closedness of {\textrm{Ad}_L(\Lambda, \Lambda^-)} is basically a consequence of the closedness and discreteness of {F(\Lambda)} in {\mathbb{R}} for an appropriate choice of polynomial function {F}.

In the case of {U} commutative, we took {F(X) = \Phi(\exp(X)\gamma_0)}, where {\Phi(g) = \textrm{det}_{\mathfrak{u}}(M(g))}, {M(g)=\pi\circ \textrm{Ad}(g)\circ \pi} and {\pi:\mathfrak{g}\rightarrow \mathfrak{u}} was the natural projection with respect to the decomposition {\mathfrak{g} = \mathfrak{u}\oplus \mathfrak{l} \oplus \mathfrak{u}^-}.

As it turns out, the case of {U} Heisenberg can be dealt with by slightly modifying the construction in the previous paragraph. More precisely, one considers a natural grading

\displaystyle \mathfrak{g} = \underbrace{\mathfrak{g}_{-2}\oplus \mathfrak{g}_{-1}}_{=\mathfrak{u}^-} \oplus \underbrace{\mathfrak{g}_0}_{=\mathfrak{l}} \oplus \underbrace{\mathfrak{g}_1 \oplus \overbrace{\mathfrak{g}_2}^{=\textrm{ center of }\mathfrak{u}}}_{=\mathfrak{u}}

and one sets {F(X)=\Phi(\exp(X)\gamma_0)}, where {\Phi(g) = \textrm{det}(M(g))}, {M(g)=\pi\circ\textrm{Ad}(g)\circ \pi}, and {\pi} is the natural projection {\pi:\mathfrak{g}\rightarrow \mathfrak{g}_2}. In our prototypical example, the polynomial function {F(X) = \Phi(\exp(X)\gamma_0)} is very explicit:

\displaystyle F(\left(\begin{array}{cccc} 0 & x_1 & x_2 & z \\ 0 & 0 & 0 & y_1 \\ 0 & 0 & 0 & y_2 \\ 0 & 0 & 0 & 0 \end{array}\right)) = (x_1 y_1 + x_2 y_2)^2 - z^2

This completes our sketch of proof of Theorem 1 when {U} is Heisenberg and {S} is not compact.

3. {U} is not commutative and {U} is not Heisenberg

Our prototype of non-commutative and non-Heisenberg {U} is the subgroup

\displaystyle U=\left\{ \left(\begin{array}{cccc} 1 & \ast & \ast & \ast \\ 0 & 1 & \ast & \ast \\ 0 & 0 & 1 & \ast \\ 0 & 0 & 0 & 1 \end{array}\right) \right\}

of {G=SL(4,\mathbb{R})}.

In this context, we will explore some well-known results from the theory of lattices in nilpotent groups to reduce our task to the case of {U} commutative and reflexive.

More concretely, the properties of nilpotent groups together with our hypothesis that {\Delta=\Gamma\cap U} is a lattice in {U} allow us to conclude that {\Delta':=[\Delta,\Delta]} is a lattice in

\displaystyle U':=[U, U] = \left\{ \left( \begin{array}{cccc} 1 & 0 & \ast & \ast \\ 0 & 1 & 0 & \ast \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right) \right\},

and, consequently, the centralizer {\Delta_0} of {\Delta'} in {\Delta} is a lattice in the centralizer

\displaystyle U_0= \left\{ \left( \begin{array}{cccc} 1 & 0 & \ast & \ast \\ 0 & 1 & \ast & \ast \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{array} \right) \right\}

of {U'} in {U}. Therefore, we have reduced matters to the case of {U_0} commutative and reflexive, which was discussed in the previous two posts of this series.
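The descriptions of {U'=[U,U]} and {U_0} can be double-checked at the level of Lie algebras using the elementary matrices {E_{ij}}, {i<j}, and the rule {[E_{ij},E_{kl}]=\delta_{jk}E_{il}-\delta_{li}E_{kj}}. The sketch below is only a Lie-algebra heuristic for the group-level statements; it uses the fact that, in this example, a linear combination of the {E_{ij}} centralizes the derived algebra exactly when each {E_{ij}} in its support does.

```python
from itertools import combinations

N = 4
# Pairs (i, j) with i < j stand for the elementary matrices E_ij spanning u.
BASIS = list(combinations(range(1, N + 1), 2))


def bracket(p, q):
    """[E_ij, E_kl] as a dict {pair: coefficient}, with zero terms omitted."""
    (i, j), (k, l) = p, q
    out = {}
    if j == k:
        out[(i, l)] = out.get((i, l), 0) + 1
    if l == i:
        out[(k, j)] = out.get((k, j), 0) - 1
    return {e: v for e, v in out.items() if v != 0}


# Derived algebra [u, u]: spanned by all E_il appearing in some bracket.
derived = sorted({e for p in BASIS for q in BASIS for e in bracket(p, q)})
# Basis elements commuting with the whole derived algebra.
centralizer = [p for p in BASIS if all(not bracket(p, q) for q in derived)]
# The centralizer is abelian when all brackets among its elements vanish.
abelian = all(not bracket(p, q) for p in centralizer for q in centralizer)

print(derived)       # [(1, 3), (1, 4), (2, 4)]
print(centralizer)   # [(1, 3), (1, 4), (2, 3), (2, 4)]
print(abelian)       # True
```

The output matches the entries marked by {\ast} in the matrices describing {U'} and {U_0}.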

In particular, our sketch of proof of Theorem 1 when {U} is non-commutative and non-Heisenberg is complete.

4. {U} is commutative and {S} is compact

The basic example of {U} commutative and {S} compact is {G=SL(2,\mathbb{R})\times SL(2,\mathbb{R})} and

\displaystyle U= \left\{ (\left(\begin{array}{cc} 1 & \ast \\ 0 & 1 \end{array}\right), \left(\begin{array}{cc} 1 & \ast \\ 0 & 1 \end{array}\right)) \right\}.

In this setting, we consider

\displaystyle L = \left\{ (\left(\begin{array}{cc} \lambda_1 & 0 \\ 0 & \lambda_1^{-1} \end{array}\right), \left(\begin{array}{cc} \lambda_2 & 0 \\ 0 & \lambda_2^{-1} \end{array}\right)): \lambda_1, \lambda_2\in \mathbb{R}^*\right\} = P\cap P^-

the common Levi subgroup of {P=N_G(U)} and the parabolic subgroup {P^-} normalizing an opposite of {U}, and the “unimodular Levi subgroup”

\displaystyle \begin{array}{rcl} L_0 &=& \{\ell\in L: \textrm{det}_{\mathfrak{u}} \textrm{Ad}(\ell)=1\} \\ &=& \left\{ (\left(\begin{array}{cc} \lambda_1 & 0 \\ 0 & \lambda_1^{-1} \end{array}\right), \left(\begin{array}{cc} \lambda_2 & 0 \\ 0 & \lambda_2^{-1} \end{array}\right)): \lambda_1 \lambda_2=\pm1\right\} \\ &\simeq& \mathbb{R}^*\times\{\pm1\} \end{array}

The discussion in the second post of this series ensures that the {\textrm{Ad}_{L_0}}-orbit of {(\Lambda, \Lambda^-)} is closed in {X_{\mathfrak{u}}\times X_{\mathfrak{u}^-}}.

We affirm that {\textrm{Ad}_{L_0}(\Lambda, \Lambda^-)} is compact. Indeed, this fact can be proved via Mahler’s compactness criterion: more concretely, recall from the second post of this series that the proof of the closedness of {\textrm{Ad}_L(\Lambda, \Lambda^-)} produced a polynomial {F} on {\mathfrak{u}} which is {\textrm{Ad}_{L_0}}-invariant and whose values {F(\Lambda)} on {\Lambda} form a closed and discrete subset of {\mathbb{R}}; in our prototypical example, a direct computation shows that

\displaystyle F(\left(\begin{array}{cc} 0 & x_1 \\ 0 & 0 \end{array}\right), \left(\begin{array}{cc} 0 & x_2 \\ 0 & 0 \end{array}\right)) = x_1^2 x_2^2;

in particular, {\inf\limits_{\ell\in L_0} \inf\limits_{X\in\Lambda\setminus\{0\}} \|\textrm{Ad}(\ell) X\|^4 > \inf\limits_{\ell\in L_0} \inf\limits_{X\in\Lambda\setminus\{0\}} F(\textrm{Ad}(\ell) X)}; therefore, the {\textrm{Ad}_{L_0}}-invariance of {F} together with the closedness and discreteness of {F(\Lambda)} imply that

\displaystyle \begin{array}{rcl} \inf\limits_{\ell\in L_0} \inf\limits_{X\in\Lambda\setminus\{0\}} \|\textrm{Ad}(\ell) X\|^4 &>& \inf\limits_{\ell\in L_0} \inf\limits_{X\in\Lambda\setminus\{0\}} F(\textrm{Ad}(\ell) X) \\ &=& \inf\limits_{X\in\Lambda\setminus\{0\}} F(X) =\min\limits_{X\in\Lambda\setminus\{0\}} F(X); \end{array}

since {\Lambda} is irreducible, {\min\limits_{X\in\Lambda\setminus\{0\}} F(X)>0}, and, a fortiori, there are no arbitrarily short non-trivial vectors in the closed family of lattices {\textrm{Ad}_{L_0}(\Lambda)}; hence, we can apply Mahler’s compactness criterion to complete the proof of our affirmation.
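To get a feeling for this argument, it may help to look at a concrete irreducible lattice. The sketch below takes {\Lambda=\{(m+n\sqrt{2}, m-n\sqrt{2}): m,n\in\mathbb{Z}\}} (an illustrative choice on our part, not taken from the discussion above), for which {F} is the square of the norm {m^2-2n^2}. It checks that {F} is {\textrm{Ad}_{L_0}}-invariant and bounded away from zero on {\Lambda\setminus\{0\}}, even though individual coordinates of lattice vectors can be very small.

```python
from fractions import Fraction
from math import sqrt


def F(x1, x2):
    """The polynomial F(x1, x2) = x1^2 * x2^2 on u."""
    return x1 * x1 * x2 * x2


def F_on_lattice(m, n):
    """F at (m + n*sqrt(2), m - n*sqrt(2)), computed exactly as a squared norm."""
    return (m * m - 2 * n * n) ** 2


def ad(lam1, lam2, x1, x2):
    """Ad of (diag(lam1, 1/lam1), diag(lam2, 1/lam2)) on u: x_i -> lam_i^2 * x_i."""
    return (lam1 ** 2 * x1, lam2 ** 2 * x2)


# Invariance of F under L_0 (here lam1 * lam2 = 1), in exact rational arithmetic.
lam1, lam2 = Fraction(3, 2), Fraction(2, 3)
x = (Fraction(5), Fraction(-7, 3))
print(F(*ad(lam1, lam2, *x)) == F(*x))   # True

# F >= 1 on nonzero lattice points of a box, although a coordinate can be tiny.
values = [F_on_lattice(m, n) for m in range(-30, 31) for n in range(-30, 31)
          if (m, n) != (0, 0)]
print(min(values))                        # 1
print(99 - 70 * sqrt(2), F_on_lattice(99, 70))
```

In this example, the lower bound on {F} comes from the fact that {m^2-2n^2} is a non-zero integer whenever {(m,n)\neq(0,0)}.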

At this point, we observe that {L_0\simeq \mathbb{R}^*\times \{\pm1\}} is not compact (because {\textrm{rank}_{\mathbb{R}}(G)\geq 2}), so that the compactness of {\textrm{Ad}_{L_0}(\Lambda,\Lambda^-)} means that the stabilizer of {(\Lambda,\Lambda^-)} in {L_0} is infinite. Consequently, {\Gamma\cap L} is infinite, and a quick inspection of the previous post reveals that this is precisely the information needed to apply Margulis’ construction of {\mathbb{Q}}-forms and the Raghunathan–Venkataramana theorem in order to derive the arithmeticity of {\Gamma}. This completes our sketch of proof of Theorem 1 when {U} is commutative and {S} is compact (and the reader is invited to consult Section 4.6 of the Benoist–Miquel paper for more details).

5. {U} is Heisenberg and {S} is compact

Closing this series of posts, let us discuss the remaining case of {U} Heisenberg and {S} compact. A concrete example of this situation is {G=SL(3,\mathbb{R})} and

\displaystyle U=\left\{ \left(\begin{array}{ccc} 1 & \ast & \ast \\ 0 & 1 & \ast \\ 0 & 0 & 1 \end{array}\right) \right\}.

In this context, {L=\left\{ \left(\begin{array}{ccc} a & 0 & 0 \\ 0 & b & 0 \\ 0 & 0 & c \end{array}\right): abc=1 \right\}} and a unimodular Levi subgroup is

\displaystyle L_0 = \left\{ \left(\begin{array}{ccc} a & 0 & 0 \\ 0 & 1/a^2 & 0 \\ 0 & 0 & a \end{array}\right): a\in\mathbb{R}^* \right\}

Once again, let us recall that we know that {\textrm{Ad}_{L_0}(\Lambda, \Lambda^-)} is closed, where

\displaystyle \Lambda= \left\{ u=u(x,y,z)=\left(\begin{array}{ccc} 0 & x & z \\ 0 & 0 & y \\ 0 & 0 & 0 \end{array}\right)\in \mathfrak{u} \right\}

We affirm that there is no loss of generality in assuming that {x\neq 0} and {y\neq 0} for all {u=u(x,y,z)\in \Lambda\setminus (\Lambda\cap [U, U])}. Indeed, if this is not the case (say {y=0} for some {u\in\Lambda\setminus (\Lambda\cap [U,U])}), then we are back to the setting of Section 1 above (of the horospherical subgroup {\left\{\left(\begin{array}{ccc} 1 & \ast & \ast \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{array}\right)\right\}}).

Here, we can derive the arithmeticity of {\Gamma} along the same lines as in Section 4 above (where it sufficed to study an appropriate polynomial {F(\left(\begin{array}{cc} 0 & x_1 \\ 0 & 0 \end{array}\right), \left(\begin{array}{cc} 0 & x_2 \\ 0 & 0 \end{array}\right))=x_1^2 x_2^2} to employ Mahler’s compactness criterion). More precisely, one uses the fact that {x\neq0} and {y\neq0} for all {u\in\Lambda\setminus (\Lambda\cap [U,U])} to prove that {\textrm{Ad}_{L_0}(\Lambda, \Lambda^-)} is compact, so that {\Gamma\cap L} is infinite and, thus, by Margulis’ construction of {\mathbb{Q}}-forms and the Raghunathan–Venkataramana theorem, {\Gamma} is arithmetic.
