Posted by: matheuscmss | July 30, 2009

Gugu’s theorem on the Markov and Lagrange spectrum

Hi! A few months ago, my friend Carlos Gustavo (Gugu) Moreira posted at the IMPA’s preprint server an article entitled “Geometric properties of the Markov and Lagrange spectra” explaining the proofs of his results on the Markov and Lagrange spectrum (see this previous post for an introduction of these spectra and the statements of Gugu’s results). Today, we’ll discuss the dynamical aspects of Gugu’s results. However, before starting the discussion of this preprint, let me take the opportunity to congratulate Gugu: he managed to post the first version of his interesting article at the same time of his first son’s birth! Moreover, let me thank my wife Aline Gomes Cerqueira whose nice comments helped me to clarify my thoughts about Gugu’s argument (and also helped the improvement of this work by her PhD advisor) .

[Update: Ops, I forgot to congratulate Gugu also for his recent UMALCA Prize 2009!]

[Update (August 11, 2009): The mistakes pointed out by Yuri are now fixed.]

Quick review of the Markov and Lagrange spectrum

During this section, we review the discussion of the first section of the this post. Given an irrational number {\alpha\in\mathbb{R}-\mathbb{Q}}, Dirichlet’s theorem claims that the inequality

\displaystyle \left|\alpha-\frac{p}{q}\right|<\frac{1}{q^2}

has infinitely many rational solutions {p/q} (in fact, it is a simple exercise to the reader to check that this is a direct consequence of Dirichlet’s pigeonhole principle). Furthermore, Markov and Hurwitz proved that the inequality

\displaystyle \left|\alpha-\frac{p}{q}\right|<\frac{1}{\sqrt{5}}\cdot\frac{1}{q^2}

has infinitely many rational solutions {p/q} for all irrational {\alpha} and {\sqrt{5}} is the biggest constant with this property, namely, for every {\varepsilon>0},

\displaystyle \left|\frac{1+\sqrt{5}}{2}-\frac{p}{q}\right|<\frac{1}{\sqrt{5}+\varepsilon}\cdot\frac{1}{q^2}

has only a finite number of rational solutions.

Nevertheless, during the study of Diophantine properties of specific irrational numbers {\alpha}, it is interesting to introduce the function

\displaystyle k(\alpha):=\sup\{k>0: |\alpha-p/q|<1/kq^2 \textrm{ has } \infty \textrm{ many rational solutions}\}

assigning to each irrational number {\alpha} its best rational approximation constant {k(\alpha)}. Observe that Khintchine’s theorem says that the set of irrational numbers {\alpha} with {k(\alpha)=\infty} has full Lebesgue measure.

In any case, we can consider the Lagrange spectrum

\displaystyle L=\{k(\alpha): \alpha\in\mathbb{R}-\mathbb{Q} \textrm{ and } k(\alpha)<\infty\}

formed by the collection of finite best constants {k(\alpha)}.

Concerning the structure of {L}, Markov (1879) showed that

\displaystyle L\cap (-\infty,3) = \{k_1:=\sqrt{5} < k_2:=2\sqrt{2} < k_3:=\frac{\sqrt{221}}{5} < \dots\}

where {k_n} are explicit quadratic irrationals (i.e., {k_n=p_n+q_n\sqrt{d_n}\notin\mathbb{Q}}, {p_n,q_n,d_n\in\mathbb{N}}, i.e., by Lagrange theorem, {k_n} has periodic continued fraction expansion). In particular, the beginning of {L} is discrete. On the other hand, M. Hall (1947) showed that {[6,+\infty)\subset L} and G. Freiman (1975) determined the biggest half-line {[c,+\infty)} contained {L}, namely, he proved that {c} is

\displaystyle c=4+\frac{253589820+283748\sqrt{462}}{491993569}.

Therefore, it remains only to understand the structure of middle of {L}. This is the content of Gugu’s preprint.

However, before entering this issue, let me mention that the arguments of Hall and Freiman use the study of arithmetic sums of dynamically defined Cantor sets. The interaction between the dynamically defined (regular) Cantor sets and the Lagrange spectrum occurs via the continued fraction expansion. We’ll explain this relationship later (when we introduce the dynamically defined Cantor sets associated to the Gauss map). In any case, we can reformulate the Lagrange spectrum in the language of Dynamical Systems as follows. We observe that, for any irrational number {\alpha=[a_0;a_1,a_2,\dots]} (where {[a_0;a_1,a_2,\dots]} ({a_i\in\mathbb{N}}) is the usual continued fraction notation), it holds

\displaystyle k(\alpha)=\limsup\limits_{n\rightarrow\infty}(\alpha_n+\beta_n)

where {\alpha_n:=[a_n;a_{n+1},\dots]} and {\beta_n:=[0;a_{n-1},\dots,a_1]}. Indeed, this fact follows from the identity {\left|\alpha-p_n/q_n\right|=\frac{1}{(\alpha_{n+1}+\beta_{n+1})} \frac{1}{q_n^2}} (which can be checked by induction). Keeping this notation in mind, we can give an alternative definition of the Lagrange spectrum:

Definition 1 Denote by {\Sigma=\mathbb{N}^{\mathbb{Z}}} the set of bi-infinite sequences of natural numbers. For each {\theta=(a_n)_{n\in\mathbb{Z}} \in\Sigma}, we put

\displaystyle \alpha_n(\theta):=[a_n;a_{n+1},\dots] \textrm{ and } \beta_n(\theta)=[0;a_{n-1},\dots],

and we introduce the function {f:\Sigma\rightarrow\mathbb{R}^+} given by

\displaystyle f(\theta)=\alpha_0(\theta)+\beta_0(\theta)=[a_0;a_1,\dots] + [0;a_{-1},a_{-2},\dots].

In this setting, the Lagrange spectrum {L} is

\displaystyle L=\{\limsup\limits_{n\rightarrow\infty}f(\sigma^n(\theta)): \theta\in\Sigma\},

where {\sigma:\Sigma\rightarrow\Sigma} is the shift dynamics {\sigma((a_n)_{n\in\mathbb{Z}})=(a_{n+1})_{n\in\mathbb{Z}}}.

Of course, this alternative definition of the Lagrange spectrum motivates the introduction of the Markov spectrum {M}:

Definition 2 {M=\{\sup\limits_{n\in\mathbb{Z}}f(\sigma^n(\theta)): \theta\in\Sigma\}}.

Remark 1 {M} admits the following arithmetic characterization:

\displaystyle M=\left\{\left(\inf\limits_{(x,y)\in\mathbb{Z}^2-\{(0,0)\}} ax^2+bxy+cy^2\right)^{-1}: b^2-4ac=1\right\}.

Remark 2 It is possible to prove that {L} and {M} are closed subsets of {\mathbb{R}} such that {L\subset M}.

A nice reference for the theory of continued fractions and the Markov and Lagrange spectrum containing the proofs of all facts quoted above is the book of Cusick and Flahive.

At this point, let us state the main theorems of Gugu’s preprint (concerning about the middle part of {L}).

Statements of the main results

The main three results of Gugu’s preprint are:

Theorem 3 Given {t\in\mathbb{R}}, we have {HD(L\cap(-\infty,t))=HD(M\cap(-\infty,t))} (where {HD(A)} stands for the Hausdorff dimension of the set {A}). Furthermore, {d(t):=HD(L\cap(-\infty,t))=HD(M\cap(-\infty,t))} is a continuous and surjective function from {\mathbb{R}} to {[0,1]}. Moreover,

  • {d(t)=\min\{1,2\cdot D(t)\}} where {D(t):=HD(k^{-1}((-\infty,t)))} is a continuous function from {\mathbb{R}} to {[0,1)};
  • {\max\{t\in\mathbb{R}: d(t)=0\}=3};
  • {d(\sqrt{12})=1}.

Remark 3 A very interesting consequence of theorem 3 is the fact that {d(t)} is not Hölder continuous (with respect to any exponent {\theta>0}). Indeed, {d} sends the subset {L\cap(-\infty,3+\varepsilon)} of Hausdorff dimension {d(3+\varepsilon)} onto the interval {[0,d(3+\varepsilon)]} for every {\varepsilon>0}. Thus, if {d} is {\theta}-Hölder continuous for some {\theta>0}, one would have {\theta\leq d(3+\varepsilon)} for all {\varepsilon>0}, a contradiction (for a sufficiently small {\varepsilon>0}).

Theorem 4 {\lim\limits_{t\rightarrow\infty} HD(k^{-1}(t))=1}.

Theorem 5 The set {L'} of accumulation points of {L} is a perfect set, i.e., {L''=L'}.

In the sequel, we’ll content ourselves with the discussion of the first part of theorem 3 only (i.e., the continuity of {d(t)}). The starting point of Gugu’s argument is inspired by the proofs of the theorems of M. Hall and G. Freiman, namely, one consider the dynamically defined Cantor sets related to the Gauss map. Then, the basic idea during the proof of the continuity properties of the function {d(t)} is the combination of some approximation arguments of the Lagrange spectrum by the arithmetic sum of two regular Cantor sets with the fact that the Hausdorff dimension of these arithmetic sums are well-behaved due to the so-called dimension formula. Now, let’s turn to the details.

The dimension formula

A fundamental tool in Gugu’s article is the so-called dimension formula (which will appear in a forthcoming paper by Gugu). Basically, it says that the Hausdorff dimension of the arithmetic sum {K+K'} of generic Cantor sets {K}, {K'} is {\min\{1, HD(K)+HD(K')\}}. In order to properly state this theorem, we introduce the notion of non essentially affine regular Cantor sets. We recall that a {C^l}-regular (i.e., dynamically defined) Cantor set {K} is the maximal invariant set of a transitive expanding map {\psi\in C^l} from a disjoint finite union of compact intervals {I_1,\dots,I_r} to the real line verifying the Markov partition property (see this post for more details). Observe that, for every periodic point {p} of period {n} of {\psi}, one can find a {C^l} diffeomorphism of the convex hull {I} of {K} such that {h^{-1}\circ\psi\circ h} is affine in {h^{-1}(J)}, where {J} is the connected component of the domain of {\psi} containing {p}.

Definition 6 We say that {K} is non essentially affine whenever we can find a periodic point {p} (as above) such that the map {\widetilde{\psi}:= h^{-1}\circ\psi\circ h} verifies {(\widetilde{\psi}^r)''(x)\neq 0} for some {x\in h^{-1}(K)}.

In other words, {K} is non essentially affine if it is not possible to perform a conjugation of the dynamics {\psi} so that it becomes affine near the corresponding Cantor set. Using this notion, Gugu will show (in a forthcoming article) the following theorem:

Theorem 7 (Dimension formula) If {K} and {K'} are {C^2}-regular Cantor sets and {K} is non essentially affine, then

\displaystyle HD(K+K')=\min\{1, HD(K)+HD(K')\}.

Remark 4 The proof of this result uses in a crucial way the previous work of Gugu and J.C. Yoccoz about the stable intersections of {C^2}-Cantor sets. More precisely, Gugu plans to use the so-called Scale Recurrence Lemma of this paper to prove the dimension formula. In fact, the curious reader can consult the article of P. Shmerink (15 pages) for a complete proof (along these lines) of Gugu’s dimension formula under slightly stronger assumptions (of non-resonance).

Remark 5 The appearance of the term {HD(K)+HD(K')} on the right-hand side of the dimension formula has a natural explanation: as we saw in a previous post about Marstrand’s theorem, the arithmetic sum {K+K'} is (essentially) the orthogonal projection of the product set {K\times K'\subset\mathbb{R}^2} into the diagonal {\ell_{\pi/4}} of {\mathbb{R}^2} (making angle of {\pi/4} with the {x}-axis). Hence, the dimension formula says that, for generic Cantor sets, this projection has the correct dimension, i.e., {HD(K)+HD(K')} (the Hausdorff dimension of {K\times K'}).

Assuming the validity of the dimension formula, we can begin the discussion of the proof of theorem 3.

Regular Cantor sets associated to the Gauss map

In Number Theory, the Gauss map {g:(0,1]\rightarrow[0,1]} is {g(x)=\left\{\frac{1}{x}\right\}} the fractional part of {1/x} (this should not be confused with the Gauss map from the Differential Geometry). Of course, the iterates of {g} are intimately related to the continued fraction algorithm. A central role in the subsequent arguments will be played by the regular Cantor sets associated to complete shifts via the Gauss map. More precisely, given a finite set {B=\{\beta_1,\dots,\beta_m\}} ({m\geq 2}) of finite sequences {\beta_j\in (\mathbb{N}^*)^{r_j}} ({r_j\in\mathbb{N}^*}) of positive integers such that {\beta_i} doesn’t start with {\beta_j} for every {i\neq j}, we introduce

Definition 8 The complete shift {\Sigma(B)} associated to {B} is the subset of {(\mathbb{N}^*)^{\mathbb{N}}} formed by the sequences {(\beta_{i_j})_{j\in\mathbb{N}} } obtained by concatenations of the elements {\beta_{i_j}\in B} for every {j\in\mathbb{N}}.

Using the complete shift {\Sigma(B)} and the Gauss map, we can construct a regular Cantor set:

Definition 9 We denote by {K(B)\subset [0,1]} the Cantor set of real numbers whose continued fractions are sequences of {\Sigma(B)}, that is,

\displaystyle K(B)=\{[0;\gamma_1,\gamma_2,...]: \gamma_j\in B\, \forall \, j\geq 1\}.

Exercise 1 Show that {K(B)} is a regular Cantor set. (Hint: For each {\beta_j\in B}, {\beta_j\in(\mathbb{N}^*)^{r_j}}, consider the interval {I_j=[a_j,b_j]} where {a_j=\inf\{x\in K(B): x=[0;\beta_j,\dots]\}} and {b_j=\sup\{x\in K(B): x=[0;\beta_j,\dots]\}} and define {\psi |_{I_j}=g^{r_j}})

In view of the shape of the graph of the Gauss map, the following proposition is very natural:

Proposition 10 {K(B)} is non essentially affine.

Proof: For each {\beta_j\in B} (say {\beta_j\in(\mathbb{N}^*)^{r_j}}), let us denote by {x_j:=[0;\beta_j,\beta_j,\dots]\in I_j=[a_j,b_j]} be the fixed point of {\psi |_{I_j}=g^{r_j}}. From the theory of continued fractions (see Cusick and Flahive), we know that

\displaystyle \psi |_{I_j}(x) = \frac{q_{r_j-1}^{(j)}x-p_{r_j-1}^{(j)}}{p_{r_j}^{(j)}-q_{r_j}^{(j)}x}

where {p_n^{(j)}/q_n^{(j)}} is the {n}-th rational approximation of {x_j} (for {1\leq n\leq r_j}). It follows that {x_j} is a positive solution of {q_{r_j}^{(j)}x^2 - (q_{r_j-1}^{(j)}-p_{r_j}^{(j)})x - p_{r_j-1}^{(j)} = 0}.

On the other hand, since {\psi |_{I_j}} is a Möbius transformation with an expanding fixed point {x_j}, we can find a Möbius function {\alpha_j(x)=(a_jx+b_j)/(c_jx+d_j)} such that {\alpha_j(x_j)=x_j}, {\alpha_j'(x_j)=x_j} and {\alpha_j\circ(\psi |_{I_j})\circ\alpha_j^{-1}} is affine. This reduces our task to show that {\alpha_1\circ(\psi |_{I_2})\circ\alpha_1^{-1}} is not affine (because the second derivative of a non-affine Möbius transformation never vanishes).

Assuming that {\alpha_1\circ(\psi |_{I_2})\circ\alpha_1^{-1}} is affine, we see that {\alpha_1^{-1}(\infty)=-d_1/c_1} is a common fixed point of {\psi |_{I_1}} and {\psi |_{I_2}} (since the point {\infty} at infinity is a common fixed point of the affine maps {\alpha_1\circ(\psi |_{I_1})\circ\alpha_1^{-1}} and {\alpha_1\circ(\psi |_{I_2})\circ\alpha_1^{-1}}) and, a fortiori, it is a common solution of the quadratic equations {q_{r_j}^{(j)}x^2 - (q_{r_j-1}^{(j)}-p_{r_j}^{(j)})x - p_{r_j-1}^{(j)} = 0} for {j=1,2}. Because these two quadratic polynomials of {\mathbb{Z}[x]} are irreducible (since {x_1} and {x_2} are irrational numbers), they must coincide. In particular, this forces {x_1=x_2}, a contradiction with {\beta_1\neq\beta_2}. \Box

Once we know that the regular Cantor sets {K(B)} are non essentially affine, we’ll try to combine this information with the dimension formula (for the arithmetic sum of Cantor sets) in order to show the continuity of {d(t)}. Note that this is a natural approach because the Lagrange and Markov spectra are naturally related to the values of a certain (sum) function over the orbits of bi-infinite sequences by the shift map. Before starting the discussion of this topic, let us close this section with some remarks and interesting facts.

Remark 6 Given a sequence {\beta=(b_1,\dots,b_n)}, we define its transposition {\beta^t} by {\beta^t=(b_n,\dots,b_1)}. Of course, this notion extends to finite sets of sequences {B} (via {B^t=\{\beta^t: \beta\in B\}}. In this notation, we have the classical fact {q_n(\beta)=q_n(\beta^t)} about the continuants of the continued fractions associated to {\beta} and {\beta^t} (see the appendix of Cusick and Flahive). Using this fact, it is not hard to prove that {HD(K(B)) = HD(K(B^t))}.

Putting together the previous remark 6 with the proposition 10, we conclude:

Corollary 11 {HD(K(B)+K(B^t))=\min\{1,2\cdot HD(K(B))\}}.

Remark 7 This corollary is related to the fact that {d(t)=\min\{1,2\cdot D(t)\}} (see the theorem 3).

Lagrange/Markov spectrum of complete shifts

Denote by {\Sigma=(\mathbb{N}^*)^{\mathbb{Z}}=\Sigma^-\times\Sigma^+} where {\Sigma^-=(\mathbb{N}^*)^{\mathbb{Z}_-}} and {\Sigma^+= (\mathbb{N}^*)^{\mathbb{N}}} and let {\sigma:\Sigma\rightarrow\Sigma} be the shift operator. Recall that the Lagrange spectrum is {L=\{l(\theta):\theta\in\Sigma\}} and the Markov spectrum is {M=\{m(\theta): \theta\in\Sigma\}}, where {l(\theta)=\limsup\limits_{n\rightarrow\infty} (\alpha_n(\theta)+\beta_n(\theta))} and {m(\theta)=\sup\limits_{n\in\mathbb{N}}(\alpha_n(\theta)+\beta_n(\theta))} (see the definition 1 for the definition of {\alpha_n} and {\beta_n}).

Remark 8 If {\theta=(a_n)_{n\in\mathbb{Z}}\in\Sigma}, then {m(\theta)\geq\sup\{a_n:n\in\mathbb{Z}\}}. In particular, if {\sup\{a_n:n\in\mathbb{Z}\}\geq 5}, we have that {m(\theta)} belongs to M. Hall’s ray (since {5>4,527...}).

Therefore, since we are interested in the continuity properties of the function {d(t)}, the previous remark allows us to make the following assumption:

From now on, we’ll always assume that {a_n\leq 4} for every {n\in\mathbb{Z}}, {m(\theta)<5} and {t\in [3,5]}.

In our approach to {L} and {M}, it is convenient to introduce the Lagrange and Markov spectrum associated to a complete shift {\Sigma(B)}:

Definition 12 Let {\Sigma(B)} be a complete shift. Its Lagrange spectrum is {l(\Sigma(B))\subset L} and its Markov spectrum is {m(\Sigma(B))\subset M}.

A nice feature of the Lagrange and Markov spectrum of complete shifts is:

Lemma 13 (lemma 2 of Gugu’s paper) Let {\Sigma(B)\subset\{1,2,3,4\}^{\mathbb{N}}} be a complete shift. Then, it holds

\displaystyle HD(l(\Sigma(B)))=HD(m(\Sigma(B)))=\min\{1, 2\cdot HD(K(B))\}.

Proof: The reader clearly sees that {l(\Sigma(B))\subset m(\Sigma(B))\subset \bigcup\limits_{a=1}^4\bigcup\limits_{i,j=1}^R (a+g^i(K(B))+g^j(K(B^t)))}, where $R$ is the length of the biggest word of $B$, so that

\displaystyle HD(l(\Sigma(B)))\leq HD(m(\Sigma(B)))\leq\min\{1, 2\cdot K(B)\}.

On the other hand, given {\varepsilon>0}, it suffices to construct two regular Cantor sets {K} and {K'} such that {HD(K),HD(K')>HD(K(B))-\varepsilon} and {K+K'\subset l(\Sigma(B))}. In this direction, we note that, up to replacing {B} by {B^n:=\{(\gamma_1,\dots,\gamma_n): \gamma_j\in B\}} for a large {n}, one can assume that {HD(K(B-A))>HD(K(B))-\varepsilon} for any {A\subset B} with {|A|\leq 2}. Next, we order the elements of {B} (resp. {B^t}) as follows: given {\gamma,\widetilde{\gamma}\in B} (resp. {\gamma,\widetilde{\gamma}\in B^t}), we say that {\gamma<\widetilde{\gamma}} if and only if {[0;\gamma]<[0;\widetilde{\gamma}]}. Using this total order, we can select {\min B} and {\max B} (resp. {\min B^t} and {\max B^t}) the minimal and maximal elements of {B} (resp. {B^t}). We define

\displaystyle B^*:= B -\{\min B, \max B\}


\displaystyle (B^t)^*:= B^t-\{\min B^t, \max B^t\}.

Observe that {HD(K(B^*))>HD(K(B))-\varepsilon} and {HD(K((B^t)^*))>HD(K(B^t))-\varepsilon=HD(K(B))-\varepsilon} (here we used the remark 6). Of course, since we removed the minimal and maximal elements of {B} and {B^t}, we expect that the values of {l} on {\Sigma((B^t)^*)\times\Sigma(B^*)} decrease in view of the following classical comparison result of the theory of continued fractions:

\displaystyle [a_0;a_1,\dots,a_n,\dots]<[b_0;b_1,\dots,b_n,\dots]

if and only if {(-1)^k(a_k-b_k)<0}, where {k} is the smallest integer such that {a_k\neq b_k} (this parity issue during the comparison of two continued fractions justifies the exclusion of the minimal and maximal elements of {B}).

Unfortunately, the exclusion of the minimal and maximal elements of {B} and {B^t} is not sufficient to ensure that {K(B^*)+K((B^t)^*)\subset l(\Sigma(B))} (in fact, although the values of {l} on {\Sigma((B^t)^*)\times\Sigma(B^*)} decrease, this doesn’t guarantees that they belong to {l(\Sigma(B))}. However, this technical problem can be solved by considering some smaller replicas of {K(B^*)} and {K((B^t)^*)}. Pick {\widetilde{\theta}=(\dots,\widetilde{\gamma}_{-1}, \widetilde{\gamma}_0, \widetilde{\gamma}_{1}, \dots))}, {\gamma_i\in B}, a point of {\Sigma(B)} where the maximum of {m(\Sigma(B))} is attained (in a position associated to {\widetilde{\gamma}_0}). Using {\widetilde{\theta}}, for each {m\geq 1}, we construct the subsets {C^m} formed of the sequences

\displaystyle (\dots,\gamma_{-m-2},\gamma_{-m-1},\widetilde{\gamma}_{-m},\dots,\widetilde{\gamma}_0,\dots,\widetilde{\gamma}_m,\gamma_{m+1},\gamma_{m+2},\dots)

where {\gamma_k\in (B^t)^*} for every {k\leq -m-1} and {\gamma_k\in B^*} for every {k\geq m+1}. By a compactness argument (involving the fact that we can eventually increase the value of {m} by replacing some of its elements by {\min B} or {\max B}), Gugu shows that there exists {\eta>0} and a large {m\in\mathbb{N}} such that, for every {\theta\in C^m}, the value {m(\theta):=\sup(\alpha_n(\theta)+\beta_n(\theta))} is attained at a position {n} associated to the central


\displaystyle \tau:=(\widetilde{\gamma}_{-m},\dots,\widetilde{\gamma}_0,\dots,\widetilde{\gamma}_m)

and {\alpha_n(\theta)+\beta_n(\theta)< m(\theta)-\eta} at any position outside {\tau}. Next, we pick an arbitrary {\mu\in B} such that {\mu^t\in (B^t)^*} and we associated to each {x=[0;\gamma_1(x),\dots]\in K(B^*)} the sequence

\displaystyle \theta(x)=(\dots,\mu^t,\mu^t,\tau,\gamma_1(x),\gamma_2(x),\dots).

Observe that, for each position {n} of the central block, {\beta_n(\theta(x))} doesn’t depend on {x}, so that {\tau}, the function {g_n(x):=\alpha_n(\theta(x)) + \beta_n(\theta(x))} is a Möbius transformation (for such positions n). Moreover, these functions are mutually distinct. In particular, the values {g_n(x)} (for the positions {n} of {\tau}) are distinct except for a finitely many choices of {x}. This allows us to take {x^\#=[0;\gamma_1^\#,\dots]} such that {g_n(x^\#)\neq g_m(x^\#)} for every two distinct positions {n} and {m} of {\tau}. Let’s denote by {n_0} the position of {\tau} where the value {g_{n_0}(x^\#)} is maximum.

Now, we take {N} large such that the enlarged block {\tau^\#:=(\underbrace{(\mu^t)),\dots,(\mu^t)}_{N \textrm{ times}},\tau,\gamma_1^\#,\dots,\gamma_N^\#)} verifies the following property: we have

\displaystyle m(\theta)=[\overline{a}_0;\overline{a}_1,\dots,\overline{a}_{N_2},\gamma_1,\dots] + [0;\overline{a}_{-1},\dots,\overline{a}_{-N_1},\gamma_{-1},\dots]

for any {\theta=(\dots,\gamma_{-1},\tau^\#,\gamma_1,\dots)} with {\gamma_k\in B^*} and {(\gamma_{-k})^t\in (B^t)^*} ({k\geq 1}), where

\displaystyle (\overline{a}_{-N_1},\dots,\overline{a}_{-1},\overline{a}_{0},\dots,\overline{a}_{N_2}):=\sigma^{n_0}(\tau^\#).

The existence of {N} is a consequence of the choice of {\tau} (so that the {m(\theta)} is attained at some position {n} of {\tau}), the fact that the values of {g_n(x^\#)} are mutually distinct at the positions {n} associated to {\tau} (so that {g_{n_0}(x^\#)>g_n(x^\#)} for every position {n\neq n_0} of {\tau}) and the fact that the values of {\alpha_n(\theta)+\beta_n(\theta)} are very close to {g_n(x^\#)} for a sufficiently large {N}.

Finally, we define

\displaystyle K=\{[\overline{a}_0;\overline{a}_1,\dots,\overline{a}_{N_2},\gamma_1',\dots]: \gamma_j'\in B^*\,\forall\, j\geq 1\}


\displaystyle K'=\{[0;\overline{a}_{-1},\dots,\overline{a}_{-N_1},(\gamma_{1}'')^t,\dots]: (\gamma_j'')^t\in (B^t)^*\,\forall\, j\geq 1\}.

Observe that {K} is a (diffeomorphic) copy of {K(B^*)} and {K'} is a (diffeomorphic) copy of {K((B^t)^*)}, so that {HD(K)>HD(K(B))-\varepsilon} and {HD(K')>HD(K(B))- \varepsilon}.

We claim that {K+K'\subset l(\Sigma(B))\subset m(\Sigma(B))}. Indeed, given

\displaystyle x=[\overline{a}_0;\overline{a}_1,\dots,\overline{a}_{N_2},\gamma_1',\dots]\in K


\displaystyle y=[0;\overline{a}_{-1},\dots,\overline{a}_{-N_1},(\gamma_{1}'')^t,\dots]\in K',

it is not hard to check that {m(\widehat{\theta}(x,y))=x+y} where

\displaystyle \widehat{\theta}(x,y):=(\dots,\gamma_2'',\gamma_1'',\tau^\#,\gamma_1,\gamma_2,\dots)

and {l(\theta^*(x,y))=x+y} where

\displaystyle \theta^*(x,y):=(\dots,\mu,\mu,\tau^{(1)},\tau^{(2)},\dots)

and {\tau^{(m)}:=(\gamma_m'',\dots,\gamma_1'',\tau^\#,\gamma_1',\dots,\gamma_m')}. This ends the proof of the lemma. \Box

Approximation of Lagrange/Markov spectrum by complete shifts

During the study of the function {d(t)}, it is interesting to introduce the subshifts {\Sigma_t\subset \Sigma} given by

\displaystyle \Sigma_t:=\{\theta\in\Sigma: m(\theta)\leq t\}

and the associated Cantor sets

\displaystyle K_t=\{[0;\gamma]:\gamma\in\pi_+(\Sigma_t)\},

where {\pi_+:\Sigma\rightarrow\Sigma^+} is the natural projection. For later use (in the arguments related to Hausdorff dimension), we’ll say that the size {s(\alpha)} of a finite sequence {\alpha=(a_1,\dots,a_n)} of strictly positive integers is the length of the interval

\displaystyle I(\alpha)=\{x=[0;a_1,\dots,a_n,\alpha_{n+1}]: \alpha_n\geq 1\}.

Note that the extremities of {I(\alpha)} are {[0;a_1,\dots,a_n]=p_n/q_n} and {[0;a_1,\dots,a_n+1]=(p_n+p_{n-1})/(q_n+q_{n-1})}, so that {s(\alpha)=1/q_n(q_n+q_{n-1})}.

We also introduce {r(\alpha)=\lfloor\log(1/s(\alpha))\rfloor},

\displaystyle P_r=\{\alpha=(a_1,\dots,a_n): r(\alpha)\geq r \textrm{ and } r((a_1,\dots,a_{n-1}))<r\}

for each {r\in\mathbb{N}}, and

\displaystyle C(t,r)=\{\alpha\in P_r: I(\alpha)\cap K_t\neq\emptyset\}.

The cardinatily of {C(t,r)} is denoted by {N(t,r)}. In this setting, it is possible to show that the box dimension {\Delta(t)} of {K_t} is

\displaystyle \Delta(t)=\lim\limits_{m\rightarrow\infty}\frac{1}{m}\log N(t,m).

This is expected since the collection {C(t,r)} corresponds to the natural intervals (with respect to the continued fraction algorithm) of scale {r} covering the set {K_t}. For more details see Gugu’s preprint.

Proposition 14 {\Delta} is a continuous function.

Proof: In fact, from the theory of continued fractions, we have that {r(\alpha\beta k)\geq r(\alpha)+r(\beta)} for any finite sequences {\alpha,\beta} and any {k\in\{1,2,3,4\}}. It follows that one can cover {K_t} using the {4N(t,r)N(t,s)} intervals of the form {I(\alpha\beta k)}, where {\alpha\in C(t,r)}, {\beta\in C(t,s)}, {1\leq k\leq 4}. Furthermore, these intervals verifies {r(\alpha\beta k)\geq r+s}. In particular, {N(t,r+s)\leq 4N(t,r)N(t,s)}, so that

\displaystyle \log(4N(t,r+s))\leq \log(4N(t,r)) + \log(4N(t,s)).

In other words, the sequence {\log(4N(t,m))} is sub-aditive. Hence,

\displaystyle \lim\limits_{m\rightarrow\infty}\frac{1}{m}\log N(t,m) = \lim\limits_{m\rightarrow\infty}\frac{1}{m}\log(4N(t,m))=\inf\limits_{m\rightarrow\infty}\frac{1}{m}\log(4N(t,m)).

Using this information, we can conclude that {\Delta(t)} is continuous: otherwise, one can find some {t_0} and {\eta>0} such that {\Delta(t)> \Delta(t_0)+\eta} for every {t>t_0} (since {\Delta} is monotone). In view of the previous identity, it follows that we can select some large integer {r_0} such that

\displaystyle \frac{1}{r}\log N(t,r)>\Delta(t_0)+\eta

for every {t>t_0} and {r\geq r_0}. On the other hand, because {C(t,r)\subset C(s,r)} for any {t\leq s} and {C(t_0,r)=\bigcap\limits_{t>t_0}C(t,r)} (by a simple compactness argument), by taking the limit when {r\rightarrow\infty}, we see that the previous estimate implies

\displaystyle \Delta(t_0)>\Delta(t_0)+\eta,

a contraction. This completes the proof of the proposition. \Box

Now, we are able to state the following lemma about the approximation of the Lagrange/Markov spectrum by regular Cantor sets associated to complete shifts:

Lemma 15 (lemma 1 of Gugu’s paper) Given {3\leq t\leq 5} and {0<\eta<1}, we can find {\delta>0} and a regular Cantor set {K(B)} associated to a complete shift {\Sigma(B)\subset \{1,2,3,4\}^{\mathbb{N}}} such that

  • {\Sigma(B)\subset \Sigma_{t-\delta}} and
  • {HD(K(B))>(1-\eta)\Delta(t)}.

Proof: Of course, it is not hard to approximate {\Sigma_t} by a complete shift {\Sigma(\widetilde{B})} (in order to ensure {HD(K(\widetilde{B}))} is close to {\Delta(t)}). Indeed, consider {\tau=\eta/40} and fix {r_0\in\mathbb{N}} large so that

\displaystyle \left|\frac{\log N(t,r)}{r}-\Delta(t)\right|<\frac{\tau}{2}\cdot\Delta(t)

for any {r\geq r_0}. Now, define {B_0:=C(t,r_0)} and put {N_0:=|B_0|}. Take {k=8N_0^2\cdot\lceil 2/\tau\rceil} and we introduce

\displaystyle \widetilde{B}=\{\beta:=(\beta_1,\dots,\beta_k): \beta_j\in B_0 \textrm{ and } K_t\cap I(\beta)\neq\emptyset\}.

It is not hard to check that the Hausdorff dimension of {K(\widetilde{B})} is close to {\Delta}: firstly, a simple counting argument shows that {|\widetilde{B}|>2 N_0^{(1-\tau)k}}; in particular, since the intervals {I(\beta)}, {\beta\in\widetilde{B}} are a covering of {K_t} such that {|I(\beta)|\geq e^{-100r_0\cdot k}} (exercise), we see that {HD(K(\widetilde{B}))> (1-2\tau)\Delta(t)}, but the main problem is the fact that we don’t have any control on the values of {m} on {\Sigma(\widetilde{B})} (since we would like to decrease them from {t} to {t-\delta}). In order to overcome this difficulty, Gugu introduces the notion of lef-good and right-good positions of a given {\beta=(\beta_1,\dots,\beta_k)\in\widetilde{B}}.

More precisely, we say that {1\leq j\leq k} is a right-good position of {\beta} whenever one can find two elements {\beta^{(s)}=\beta_1\dots\beta_{j-1}\beta_j^{(s)}\dots\beta_k^{(s)}} ({s=1,2}) of {\widetilde{B}} such that the following inequality holds

\displaystyle [0;\beta_j^{(1)}]<[0;\beta_j]<[0;\beta_j^{(2)}].

Similarly, {j} is a left-good position of {\beta} whenever one can find two elements {\beta^{(s)}=\beta_1\dots\beta_{j-1}\beta_j^{(s)}\dots\beta_k^{(s)}} ({s=3,4}) of {\widetilde{B}} such that

\displaystyle [0;(\beta_j^{(3)})^t]<[0;\beta_j]<[0;(\beta_j^{(4)})^t].

Finally, we say that {j} is a good position of {\beta} if it is both a left-good and a right-good position.

By a crude bound on the number of bad positions (namely, given a position, there are only two choices of {\beta_j} making {j} a bad position: {\beta_j\in\{\min B_0,\max B_0\}}), it is possible to prove that the quantity of words of {\widetilde{B}} with {\geq 9k/10} good positions is {\geq N_0^{(1-\tau)k}}. Such words are called excellent words by Gugu. Of course, we can hope to decrease the values of {m} on any excellent words (by some fixed amount {\delta>0}), but a new problem emerges: it may happen that the subset of {\widetilde{B}} formed by excellent words doesn’t give a complete subshift (e.g., it may be not possible to concatenate excellent words). At this stage, Gugu employs a exclusion procedure combined with a double counting argument to show that we can build a complete shift by focusing at a large portion of the good positions of {\widetilde{B}} and fixing the words of {B_0} appearing into these good positions. Namely, Gugu shows that there are special good positions {j_1,j_{s+1},\dots,j_{3N_0^2}} (i.e., they are good positions such that {j_1+1,j_2+1\dots,j_{3N_0^2}+1} are also good positions) such that the subset {X} of excellent words {\beta= (\beta_1,\dots,\beta_k)} whose entries at these good positions are equal to conveniently chosen words {\widehat{\beta}_{j_s}} (i.e., {\beta_{j_s}=\widehat{\beta}_{j_s}} and {\beta_{j_s+1}=\widehat{\beta}_{j_s+1}} verifies

\displaystyle |X|>N_0^{(1-2\tau)k}

Furthermore, Gugu proves that there are two integers {s2\lceil2/tau\rceil} such that the image {\pi_{s,t}(X):=B} of the projection {\pi_{s,t}:X\rightarrow B_0^{j_t-j_s}} of the form

\displaystyle \pi_{s,t}(\beta_1,\dots,\beta_k)=(\beta_{j_s},\beta_{j_s+1},\dots,\beta_{j_t})

is the desired subset of words such that {\Sigma(B)\subset\Sigma_{t-\delta}} (essentially by the definition of good position) and {HD(K(B))>(1-\eta)\Delta(t)} (since {I(\alpha)}, {\alpha\in B}, is a covering of {K_t} composed of {|B|\geq N_0^{(1-10\tau)(j_t-j_s)}} intervals verifying {|I(\alpha)|>e^{(j_t-j_s)(r_0+5)}}).

This completes the proof of the lemma. \Box

Once we have the lemmas 13, 15 and the proposition 14 in our toolbox, we can prove the first part of theorem 3.

End of the proof of theorem 3

We claim that {d(t)=\min\{1,2\cdot\Delta(t)\}}. Indeed, using the lemma 13 to the complete shift {K(B)} constructed in the lemma 15, we see that

\displaystyle \min\{1,2(1-\eta)\Delta(t)\}\leq \min\{1,2\cdot HD(K(B))\}=HD(l(\Sigma(B)))


\displaystyle HD(l(\Sigma(B)))\leq HD(l(\Sigma_{t-\delta}))

On the other hand,

\displaystyle HD(l(\Sigma_{t-\delta}))\leq HD(L\cap(-\infty,t-\delta))\leq HD(M\cap(-\infty,t-\delta))


\displaystyle HD(M\cap(-\infty,t-\delta))\leq HD(M\cap(-\infty,t))\leq \min\{1,2\cdot HD(K_t)\}


\displaystyle \min\{1,2\cdot HD(K_t)\}\leq \min\{1,2\cdot\Delta(t)\}

Putting these four estimates together and letting {\eta\rightarrow0}, we conclude the proof of the claim. Observe that this argument also shows that {d(t):=HD(L\cap(-\infty,t))=HD(M\cap(-\infty,t))}.

Finally, the continuity of {d(t)} follows from the formula {d(t)=\min\{1,2\cdot\Delta(t)\}} and the fact that {\Delta(t)} is continuous (see the proposition 14).

Remark 9 From the identity {d(t)=\min\{1,2\cdot\Delta(t)\}}, we see that the first item of theorem 3 is equivalent to {\Delta(t)=D(t)} (where {D(t):=HD(k^{-1}((-\infty,t)))}). Although this is not terribly difficult to show, we are not going to prove this fact here.


  1. Dear Matheus,

    I do think that the dimension formula for sums of non-essentially affine Cantor sets does not have the minus 1. In fact, the projection l_{\pi/4} should not necessarily make the Hausdorff Dimension smaller. It must have total Hausdorff Dimension or the expected one for products of Cantor sets, which is just the sum HD(K)+HD(K’).

  2. In the proof of Lemma 13, you must saturate the first inclusion by sufficiently large iterates of g. Specifically, the inclusion should be
    $$m(\sigma(B))\subset \cup_{1\le a\le 4 and 1\le i,j\le R} (a+g^i(K(B)))+g^j(K(B^t)),$$
    where R is the lenght of the biggest word of B. This happens because we don’t know for sure that the supremum is attained at the beggining of each word \beta_i.

  3. In the following inequality, the correct term is HD(K(B)).

  4. After considering B^n, the correct term is HD(K(B)) instead of HD(HD(B)).

    Observe that, FOR EACH POSITION {n} OF THE CENTRAL BLOCK, {\beta_n(\theta(x))} doesn’t depend on {x}, so that the function {g_n(x):=\alpha(\theta(x)) + \beta(\theta(x))}.
    (on the above definition, \alpha and \beta must be indexed by n).

    In the section -Approximation of Lagrange/Markov spectrum by complete shifts-, it remains to say that r(a_1,\ldots,a_n)<r.

  5. Dear Yuri,

    thank you very much for the careful list of corrections!

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: