Posted by: matheuscmss | November 9, 2012

Decay of correlations for flows and Dolgopyat’s estimate

Let {T:M\rightarrow M} (resp. {T^t:M\rightarrow M}, {t\in\mathbb{R}}, {T^{t+s}=T^t\circ T^s}) be a discrete-time (resp. continuous-time) dynamical system preserving a probability measure {\mu}.

We say that {T} (resp. {T^t}) is mixing when {\mu(A\cap T^{-n}(B))\rightarrow\mu(A)\mu(B)} as {n\rightarrow\infty} (resp. {\mu(A\cap T^{-t}(B))\rightarrow \mu(A)\mu(B)} as {t\rightarrow\infty}). Intuitively, a dynamical system is mixing whenever the events {A} and {T^{-n}(B)} become “independent” as {n\rightarrow\infty}.

Equivalently, {T}, resp. {T^t}, is mixing when, given two observables {f:M\rightarrow\mathbb{R}}, {g:M\rightarrow\mathbb{R}} in {L^2(M,\mu)}, the correlation function

\displaystyle C_n(f,g):=\int f\circ T^n \cdot g \, d\mu - \left(\int f d\mu\right)\cdot\left(\int g d\mu\right),

resp.

\displaystyle C_t(f,g):=\int f\circ T^t \cdot g \, d\mu - \left(\int f d\mu\right)\cdot\left(\int g d\mu\right)

goes to zero as {n\rightarrow\infty}, resp. {t\rightarrow\infty}.

Remark 1 The equivalence between the two definitions of mixing can be easily seen by noticing that {C_n(\chi_A,\chi_B)=\mu(A\cap T^{-n}(B))-\mu(A)\mu(B)} (where {\chi_X} is the characteristic function of {X\subset M}) and by recalling that measurable observables are always nicely approximated by linear combinations of characteristic functions.

Example 1 The Bernoulli shift {\sigma:\{0,1\}^{\mathbb{Z}}\rightarrow\{0,1\}^{\mathbb{Z}}}, {\sigma((a_i)_{i\in\mathbb{Z}})=(a_{i+1})_{i\in\mathbb{Z}}} equipped with the Bernoulli (product) measure {\mu=\nu^{\mathbb{Z}}}, {\nu(\{0\})=\nu(\{1\})=1/2} is a mixing dynamical system. Indeed, since the cylinders {\Sigma([x_{-n},\dots,x_m])=\{(a_i)_{i\in\mathbb{Z}}: a_i=x_i \textrm{ for }i=-n,\dots,m\}} generate the Borel {\sigma}-algebra, it suffices to study the correlation function in the case {A=\Sigma([x_{-n},\dots,x_m])} and {B=\Sigma([y_{-k},\dots, y_l])}. In this situation, the set {\sigma^{-N}(B)} only constrains the coordinates {a_{N-k},\dots,a_{N+l}}, so that the events {A} and {\sigma^{-N}(B)} are independent (i.e., {\mu(A\cap \sigma^{-N}(B))=\mu(A)\mu(B)}) for all {N>m+k}.
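The independence of cylinders in Example 1 can be checked by hand on a computer. The following is only a minimal sketch: the helper names `cylinder`, `shift_preimage` and `measure`, as well as the two sample words, are hypothetical choices made for this illustration. It encodes a cylinder as a dictionary of constraints "position {\mapsto} symbol" and uses the fact that {\sigma^{-N}(B)=\{a:\sigma^N(a)\in B\}} constrains the positions {N-k,\dots,N+l}.

```python
from fractions import Fraction

def cylinder(start, word):
    """Constraints {position: symbol} of the cylinder fixing a_start, a_{start+1}, ... = word."""
    return {start + i: symbol for i, symbol in enumerate(word)}

def shift_preimage(constraints, N):
    """Constraints of sigma^{-N}(B): a lies in it iff a_{i+N} = y_i for every constrained i."""
    return {i + N: symbol for i, symbol in constraints.items()}

def measure(*cylinders):
    """Bernoulli(1/2, 1/2) measure of an intersection of cylinders (exact rational number)."""
    merged = {}
    for constraints in cylinders:
        for position, symbol in constraints.items():
            if merged.setdefault(position, symbol) != symbol:
                return Fraction(0)  # incompatible constraints: empty intersection
    return Fraction(1, 2 ** len(merged))

# Hypothetical cylinders: A fixes a_{-1}, a_0, a_1 and B fixes a_{-2}, ..., a_1.
A = cylinder(-1, (1, 0, 1))      # n = 1, m = 1
B = cylinder(-2, (0, 0, 1, 1))   # k = 2, l = 1

for N in range(0, 8):
    gap = measure(A, shift_preimage(B, N)) - measure(A) * measure(B)
    print(N, gap)  # the gap vanishes once the two blocks of positions are disjoint
```

In this toy case the correlation is not merely small but exactly zero for large {N}, which is much stronger than what one can expect for general observables.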

Given a mixing dynamical system, it is natural to ask oneself about the speed of decay of the correlation function {C_n(f,g)}, that is, how fast (polynomially? exponentially?) does it go to zero as {n\rightarrow\infty}. In fact, besides its intrinsic interest, this question has interesting applications in other areas: for example, the exponential (or at least sufficiently high degree polynomial) decay of correlation functions (for “smooth” observables) of the geodesic flow on hyperbolic manifolds was recently used by J. Kahn and V. Markovic to study fundamental groups of hyperbolic {3}-manifolds (and, in its turn, this work of J. Kahn and V. Markovic was used by I. Agol in his solution of the so-called virtual Haken conjecture).

The long-term goal of this post is the discussion of the so-called Dolgopyat estimate and its application to exponential decay of correlations of certain flows, i.e., continuous-time dynamical systems. In this direction, we will divide this post into two sections. In the next section, we will first discuss exponential decay of correlations for certain discrete-time dynamical systems. Then, in the final section, we will see that, as far as decay of correlations is concerned, the case of flows is quite different from the discrete-time case, and we will discuss how Dolgopyat’s estimate enters into the game.

Remark 2 For a recent application of Dolgopyat’s estimate in a number-theoretical setting, see the paragraph after Theorem 4.6 of this article of J. Bourgain and A. Kontorovich on Zaremba’s conjecture.

Closing this introduction, let us make the following technically useful remark. In general, while studying correlation functions {C_n(f,g)} (or {C_t(f,g)}), we can assume (without loss of generality) that {\int g d\mu=0}. Indeed, this is a consequence of the following identities:

\displaystyle \begin{array}{rcl} C_n(f,g) &=& \int f\circ T^n \cdot \left(g-\int g d\mu + \int g d\mu\right) \, d\mu - \left(\int f d\mu\right)\cdot\left(\int g d\mu\right) \\ &=& \int f\circ T^n \cdot \left(g-\int g d\mu\right) d\mu \\ &+& \left(\int f\circ T^n d\mu\right) \left(\int g d\mu\right) - \left(\int f d\mu\right)\cdot\left(\int g d\mu\right) \\ &=& \int f\circ T^n \cdot \left(g-\int g d\mu\right) d\mu \\ &=& C_n(f,g-\int g d\mu) \end{array}

Here, in the third equality, we used the fact that {T} preserves {\mu} to deduce that {\int f\circ T^n d\mu=\int f d\mu}. Thus, we can (and do) systematically assume that {\int g d\mu=0} in what follows.

1. Correlations of certain discrete-time systems

Let’s warm up by studying decay of correlations of the following baby example. Consider {T:S^1=\mathbb{R}/\mathbb{Z}\rightarrow S^1=\mathbb{R}/\mathbb{Z}}, {T(x)=d\cdot x}, {d\geq 2} integer, a linear (uniformly) expanding map of the circle {S^1=\mathbb{R}/\mathbb{Z}}. Denote by {\mu} the Lebesgue measure on {S^1}. The reader is invited to check that the Lebesgue measure {\mu} is {T}-invariant.

Consider now {f, g:S^1\rightarrow\mathbb{R}} two analytic observables, say {f(x)=\sum\limits_{n\in\mathbb{Z}}a_n e^{2\pi inx}} and {g(x)=\sum\limits_{m\in\mathbb{Z}}b_m e^{2\pi imx}} such that, for some fixed {\rho>0}, we have {|a_n|<e^{-\rho |n|}} and {|b_m|<e^{-\rho |m|}} for all {|n|, |m|} sufficiently large.

Let us now study the correlation {C_k(f,g)}. As we mentioned above, we can assume that {b_0=\int g d\mu=0}, so that {C_k(f,g)=\int f\circ T^k(x) \cdot g(x) \, d\mu(x)}. In this setting, we have

\displaystyle C_k(f,g)=\int\left(\sum\limits_{n\in\mathbb{Z}} a_n e^{2\pi i n d^k x}\right)\left(\sum\limits_{m\in\mathbb{Z}-\{0\}}b_m e^{2\pi imx}\right) d\mu=\sum\limits_{n\in\mathbb{Z}-\{0\}} a_n b_{-n d^k}

It follows that

\displaystyle \begin{array}{rcl} |C_k(f,g)|&=&\left|\sum\limits_{n\in\mathbb{Z}-\{0\}} a_n b_{-n d^k}\right|\leq \left(\sup\limits_{n\in\mathbb{Z}-\{0\}} |b_{-n d^k}|\right)\left(\sum\limits_{n\in\mathbb{Z}} |a_n|\right) \\ &<& e^{-\rho\cdot d^k}\cdot\sum\limits_{n\in\mathbb{Z}} |a_n| \end{array}

for all {k} sufficiently large, that is, since {\sum\limits_{n\in\mathbb{Z}}|a_n|<\infty} for the analytic observable {f}, the correlation {C_k(f,g)} decays super-exponentially as {k\rightarrow\infty}.
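The super-exponential decay can also be observed numerically. The sketch below makes several assumptions that are not in the text: it takes {d=2}, {\rho=1/2}, the particular coefficients {a_n=b_n=e^{-\rho|n|}} for {0<|n|\leq 40} (and {a_0=b_0=0}), and it normalizes the Fourier modes as {e^{2\pi i n x}} so that they are {1}-periodic. It compares a direct quadrature of {\int f\circ T^k\cdot g\,d\mu} with the Fourier sum {\sum_{n\neq 0}a_n b_{-nd^k}}.

```python
import numpy as np

def observable(x, rho, n_max):
    """Real observable with Fourier coefficients exp(-rho*|n|) for 0 < |n| <= n_max, zero mean."""
    n = np.arange(1, n_max + 1)
    modes = np.cos(2 * np.pi * np.outer(n, x))
    return 2.0 * np.exp(-rho * n) @ modes

def correlation_quadrature(k, d=2, rho=0.5, n_max=40, grid=4096):
    """C_k(f, g) for T(x) = d*x mod 1 and f = g = observable, by an equispaced Riemann sum."""
    x = np.arange(grid) / grid
    # f(T^k x) = f(d^k x) because f is 1-periodic, so no reduction mod 1 is needed.
    return np.mean(observable(d**k * x, rho, n_max) * observable(x, rho, n_max))

def correlation_fourier(k, d=2, rho=0.5, n_max=40):
    """The same correlation from sum_{n != 0} a_n b_{-n d^k}, truncated at |n| d^k <= n_max."""
    n = np.arange(1, n_max // d**k + 1)
    return 2.0 * np.sum(np.exp(-rho * n) * np.exp(-rho * n * d**k))

for k in range(5):
    c = correlation_quadrature(k)
    # Last column: ratio to exp(-rho * d^k), which stays bounded as k grows.
    print(k, c, correlation_fourier(k), c / np.exp(-0.5 * 2**k))
```

The equispaced quadrature is used because it integrates trigonometric polynomials of degree smaller than the grid size exactly (up to rounding), so the two columns should agree to many digits.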

Despite its simplicity, this calculation contains some important ideas in the study of correlations of hyperbolic systems.

Firstly, the constant {\sup\limits_{n\in\mathbb{Z}-\{0\}} |b_{-n d^k}|} “responsible” for the decay of {C_k(f,g)} is strongly related to the “expanding” (“hyperbolic”) nature of {T(x)=d\cdot x}, {d\geq 2}: indeed, by composing {f} with {T^k} and testing against {g}, we are “shifting” each Fourier mode {b_{n\cdot d^k}} from its original spot {n\cdot d^k} to the spot {n}; geometrically, this means that the iterates {T^k} of the expanding map {T} are “spreading” the support of {f} around {S^1}, so that the variations (derivatives) of {f\circ T^k} at any given scale become weaker as {k} increases. This “spreading” and “smoothing” process of {f} due to the expanding features of {T} is illustrated in the following picture:

Secondly, the decay of the correlations is measured by the quantity {\sup\limits_{n\in\mathbb{Z}-\{0\}} |b_{-n d^k}|} related to the “oscillations” (sizes of “derivatives”) of {f\circ T^k}. In particular, it is important to impose some regularity on the observables {f} and {g} and then try to estimate {C_k(f,g)} in terms of certain norms capturing the fine structure (oscillations) of {f} and {g} at small scales (such as Hölder norms, {C^r} norms, Sobolev norms, bounded variation norms, analytic norms, etc.). Of course, there is no unique choice of such norms but the main point here is that some regularity “better than {C^0} or {L^{\infty}}” is certainly needed: for instance, in Figure 1 above, we see that, despite the fact that {f} “oscillates” more than {f\circ T^k}, their {C^0} norms are equal (to {1}), i.e., the {C^0} norm doesn’t see the difference between the fine structures of {f} and {f\circ T^k}. Actually, as it turns out, even for dynamical systems such as the Bernoulli shift and the map {T(x)=dx} (mod 1) that we would like to call “exponentially mixing”, it is possible to select observables {f} and {g} with low regularity (for instance not Hölder) such that the correlation {C_k(f,g)} decays “slowly”. We refer the reader to Subsection 1.4 of V. Baladi’s book for more comments on such examples.

After our warm up with the baby example {T(x)=d\cdot x}, {d\geq 2} on the circle {S^1=\mathbb{R}/\mathbb{Z}}, let us mention that a Fourier-analytic (Harmonic Analysis) approach to the study of decay of correlations normally works well in algebraic contexts (hyperbolic automorphisms/flows on nilmanifolds), but, in general, it is not well-adapted to treat non-algebraic hyperbolic systems. For this reason, one needs other methods to study the decay of correlations of hyperbolic systems. In the sequel, we will loosely follow the excellent book by Viviane Baladi to describe a popular functional-analytic method for the decay of correlations of uniformly expanding discrete-time systems.

Let {M} be a connected compact Riemannian manifold without boundary, and let {T:M\rightarrow M} be a uniformly expanding {C^1} map, i.e., for some {\lambda>1}, one has {\|DT(x)\cdot v\|\geq \lambda\|v\|} for all {x\in M}, {v\in T_xM}.

Denote by {m} the Lebesgue measure on {M} induced by its Riemannian structure. In the sequel, we will illustrate the application of methods of Functional Analysis to the study of decay of correlations by giving an outline of the proof of the following result:

Theorem 1 Suppose that {T} is a {C^{1+\alpha}} expanding map for some {0<\alpha<1}. Then, {T} preserves a unique probability {\mu=\phi_0\cdot m} absolutely continuous with respect to the Lebesgue measure {m} whose Radon-Nikodym derivative {d\mu/dm=\phi_0} is {C^{\alpha}} (i.e., {\alpha}-Hölder continuous) and positive. Moreover, the correlation functions {C_n(f,g)} decay exponentially as {n\rightarrow \infty} whenever {f} and {g} are {C^{\alpha}} observables.

Remark 3 The assumption that {T} is {C^{1+\alpha}} can’t be removed: indeed, it is not hard to construct {C^1} expanding maps where the existence of absolutely continuous invariant probabilities fails (see, e.g., here and here). Actually, A. Avila and J. Bochi showed here that the non-existence of absolutely continuous invariant probabilities is a generic property among {C^1} maps.

Historically, the first part of this theorem (concerning the existence and uniqueness of an absolutely continuous invariant probability with Hölder density) is due to K. Krzyzewski and W. Szlenk, while the presentation below of the exponential decay of correlations is due to C. Liverani.

The first step towards the proof of Theorem 1 is the following observation. Suppose that {\nu=\phi\cdot m} is an absolutely continuous measure. Then, the pullback {T^*(\nu)} of {\nu} by {T} (defined as {T^*(\nu)(A):=\nu(T^{-1}(A))}) satisfies

\displaystyle T^*(\nu)=\mathcal{L}(\phi)\cdot m

where

\displaystyle \mathcal{L}(\phi)(x):=\sum\limits_{y\in T^{-1}(x)} \frac{1}{|\det DT(y)|}\phi(y)

is the so-called Ruelle’s transfer operator. The verification of this identity is a simple application of the change of variables theorem and it is left to the reader as an exercise.

Remark 4 In the previous identity it is implicit that, as {T} is a local diffeomorphism of a compact manifold {M}, the function {|\det DT(x)|} is bounded away from {0} and {\infty}, and the quantity {\#T^{-1}(x)} is finite for every {x\in M} (actually, the quantity {\#T^{-1}(x)} is also independent of {x} and it is the so-called degree {\textrm{deg}(T)} of {T}).

Remark 5 More generally, given a function {g:M\rightarrow\mathbb{R}}, we can define the transfer operator {\mathcal{L}_g(\phi)(x):=\sum\limits_{y\in T^{-1}(x)} e^{g(y)}\phi(y)}. In this language, the transfer operator {\mathcal{L}} defined above is {\mathcal{L}_{-\log|\det DT|}}. In the literature, these slightly generalized transfer operators are important: for instance, they allow one to construct a class of dynamically relevant invariant probabilities called equilibrium states (see V. Baladi’s book for more explanations).

In words, this identity says that the transfer operator {\mathcal{L}} keeps track of what happens with the density {\phi} when we pull back the measure {\nu=\phi\cdot m} (in other words, the pullback operator {T^*} on measures and the transfer operator {\mathcal{L}} are in a sort of duality).
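The duality between {T^*} and {\mathcal{L}} amounts to the identity {\int f\circ T\cdot \phi\, dm=\int f\cdot\mathcal{L}(\phi)\, dm}, which is easy to test on the baby example. The following sketch assumes the map {T(x)=2x} (mod 1) on the circle, whose preimages of {x} are {(x+j)/2}, {j=0,1}, with {|\det DT|=2}; the density `phi`, the observable `f` and the helper `integral` are hypothetical choices made only for this illustration.

```python
import numpy as np

D = 2  # degree of the hypothetical example T(x) = D*x mod 1

def transfer(phi):
    """Transfer operator of T: the preimages of x are (x + j)/D and |det DT| = D."""
    return lambda x: sum(phi((x + j) / D) for j in range(D)) / D

def integral(h, grid=1024):
    """Equispaced quadrature on R/Z (exact for low-degree trigonometric polynomials)."""
    x = np.arange(grid) / grid
    return np.mean(h(x))

phi = lambda x: 1 + 0.5 * np.cos(4 * np.pi * x) + 0.3 * np.sin(2 * np.pi * x)
f = lambda x: np.cos(2 * np.pi * x)

lhs = integral(lambda x: f(D * x) * phi(x))        # integral of f against T^*(nu), nu = phi*m
rhs = integral(lambda x: f(x) * transfer(phi)(x))  # integral of f against L(phi)*m
print(lhs, rhs)  # both are 0.25 up to rounding
```

Here one can also compute by hand that {\mathcal{L}(\phi)(x)=1+\frac{1}{2}\cos(2\pi x)}: the mode of frequency {2} is moved to frequency {1}, while the mode of odd frequency is killed.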

In terms of the transfer operator, an absolutely continuous probability {\mu=\phi_0\cdot m} that is {T}-invariant (i.e., {T^*(\mu)=\mu}) corresponds to a fixed point {\phi_0=\mathcal{L}(\phi_0)} of {\mathcal{L}}. In particular, this hints that functional-analytic methods might be useful to find {\phi_0}.

In our setting, the basic idea is very simple: we will find the fixed point {\phi_0} by iterating {\mathcal{L}}, or, more precisely, we will show that the sequence {\mathcal{L}^n(1)} “converges” to {\phi_0} (where {1} is the constant function). However, before talking about “convergence”, we need to specify a nice functional space where this convergence will take place.

In fact, there is no unique/standard choice of adequate functional spaces for the study of the transfer operator, and this explains why there is a vast literature (see, e.g., these papers of V. Baladi, V. Baladi and S. Gouezel, S. Gouezel and C. Liverani, V. Baladi and M. Tsujii) on the selection of Banach spaces adapted to certain classes of dynamical systems.

Fortunately, in our current setting, the choice is not hard: we will consider the transfer operator {\mathcal{L}} acting on the Banach space {C^{\alpha}} of {\alpha}-Hölder continuous functions {\phi:M\rightarrow\mathbb{R}} equipped with the norm

\displaystyle \|\phi\|_{C^{\alpha}}=\sup\limits_{x\in M}|\phi(x)| + \sup\limits_{z\neq w}\frac{|\phi(z)-\phi(w)|}{d(z,w)^{\alpha}}=:\|\phi\|_{C^0}+|\phi|_{C^{\alpha}}

where {d(.,.)} is the distance induced by the Riemannian structure on {M}.

Note that {\mathcal{L}} preserves {C^{\alpha}}: indeed, as {T} is a {C^{1+\alpha}} expanding map of a compact manifold {M}, we have that each term of the finite sum

\displaystyle \sum\limits_{y\in T^{-1}(x)}\frac{1}{|\det DT(y)|}\phi(y) \quad \quad (=\mathcal{L}(\phi)(x))

is a locally uniformly Hölder continuous function of {x}.

Once we have selected the functional space on which {\mathcal{L}} acts, we need some sort of “contraction property” in order to check that the sequence {\mathcal{L}^n(1)} is actually converging to some {\phi_0}. For this purpose, we recall below some of Garrett Birkhoff’s results on Hilbert (projective) metrics on positive cones in Banach spaces.

Let {B} be a Banach space and {\Lambda\subset B-\{0\}} a closed convex cone (i.e., {\Lambda\cup\{0\}} is a closed subset of {B} such that {\phi+\psi\in\Lambda} if {\phi} and {\psi} belong to {\Lambda}, and {\lambda \phi\in\Lambda} whenever {\lambda\in\mathbb{R}^+} and {\phi\in\Lambda}). A convex cone {\Lambda} defines a partial order {\preceq_{\Lambda}}: we say that {\phi\preceq_{\Lambda}\psi} when {\psi-\phi\in\Lambda\cup\{0\}}. This partial order is compatible with the vector space structure of {B} (i.e., it is respected by additions and multiplication by positive scalars), and it is continuous (i.e., if {\phi_n\rightarrow\phi}, {\psi_n\rightarrow\psi} and {\phi_n\preceq_{\Lambda}\psi_n} for all {n\in\mathbb{N}}, then {\phi\preceq_{\Lambda}\psi}) when the cone {\Lambda} is closed.

Using the partial order {\preceq_{\Lambda}}, we can define the so-called Hilbert projective metric on {\Lambda} as follows. Given {\phi,\psi\in\Lambda}, we put

\displaystyle \alpha(\phi,\psi):=\sup\{s\in\mathbb{R}^+:s\phi\preceq_{\Lambda}\psi\}, \, \beta(\phi,\psi):=\inf\{r\in\mathbb{R}^+:\psi\preceq_{\Lambda}r\phi\}

(with the convention that {\alpha(\phi,\psi)=0} and/or {\beta(\phi,\psi)=\infty} when these sets are empty), and we define the Hilbert projective metric as

\displaystyle \Theta_{\Lambda}(\phi,\psi)=\log\frac{\beta(\phi,\psi)}{\alpha(\phi,\psi)}

It is worth pointing out that {\Theta_{\Lambda}} is a natural (and classical) object in Geometry because it induces a metric on the projectivization {\Lambda/\sim} obtained by quotienting {\Lambda} by the relation {\phi\sim\psi} iff {\phi=\lambda\psi} for some {\lambda\in\mathbb{R}^+}.

A key result of Garrett Birkhoff in this context is the following theorem:

Theorem 2 Let {\Lambda} be a convex cone in a Banach space {B} and let {\mathcal{L}:B\rightarrow B} be a linear operator such that {\mathcal{L}(\Lambda)\subset \Lambda}. Then,

\displaystyle \Theta_{\Lambda}(\mathcal{L}(\phi), \mathcal{L}(\psi))\leq \tanh\left(\frac{\textrm{diam}_{\Theta_{\Lambda}}(\mathcal{L}(\Lambda))}{4}\right)\cdot\Theta_{\Lambda}(\phi,\psi)

for all {\phi,\psi\in\Lambda}.

In a nutshell, this theorem says that any linear operator {\mathcal{L}} sending a convex cone {\Lambda} inside itself doesn’t increase the Hilbert distance, and, furthermore, {\mathcal{L}} contracts strictly the Hilbert distance if the {\Theta_{\Lambda}}-diameter of {\mathcal{L}(\Lambda)} is finite.
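A finite-dimensional toy model may help to digest Birkhoff’s theorem. In the sketch below (which is an illustration only, not the Hölder cones used later), the Banach space is {\mathbb{R}^4}, the cone consists of the vectors with positive entries, and the operator is a hypothetical random matrix {A} with positive entries. In this cone one has {\alpha(\phi,\psi)=\min_i\psi_i/\phi_i} and {\beta(\phi,\psi)=\max_i\psi_i/\phi_i}, and the diameter of the image of the cone is given by the classical formula for positive matrices, namely the largest Hilbert distance between two columns of {A}.

```python
import numpy as np

def hilbert_metric(phi, psi):
    """Theta(phi, psi) = log(beta/alpha) in the cone of positive vectors of R^n.

    Here alpha = min_i psi_i/phi_i and beta = max_i psi_i/phi_i.
    """
    ratios = psi / phi
    return np.log(ratios.max() / ratios.min())

def image_diameter(A):
    """Theta-diameter of A(cone) for a matrix A with positive entries.

    Classical formula: the largest Hilbert distance between two columns of A.
    """
    cols = A.shape[1]
    return max(hilbert_metric(A[:, k], A[:, l]) for k in range(cols) for l in range(cols))

rng = np.random.default_rng(0)
A = rng.uniform(0.5, 2.0, size=(4, 4))   # hypothetical positive matrix
phi = rng.uniform(0.1, 10.0, size=4)
psi = rng.uniform(0.1, 10.0, size=4)

rate = np.tanh(image_diameter(A) / 4)    # Birkhoff's contraction coefficient, < 1
print(hilbert_metric(A @ phi, A @ psi), "<=", rate * hilbert_metric(phi, psi))
```

Note that `hilbert_metric` is invariant under positive rescalings of its arguments, in agreement with the fact that {\Theta_{\Lambda}} only sees the projectivization of the cone.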

In the context of transfer operators {\mathcal{L}} associated to a {C^{1+\alpha}} expanding map {T:M\rightarrow M}, the plan is to apply Birkhoff’s result to the following family {\Lambda_{L}}, {L\in\mathbb{R}}, of cones in {B=C^{\alpha}(M)}:

\displaystyle \Lambda_L:=\{\phi\in C^{\alpha}(M): \phi(z)>0,\, |\log\phi(z)-\log\phi(w)|\leq L d(z,w)^{\alpha}\,\forall\, z,w\in M\}

In plain terms, {\Lambda_L} is the cone of positive {\alpha}-Hölder functions {\phi} such that {\log\phi} has {\alpha}-Hölder constant at most {L}. As the reader can check, {\Lambda_L} is a family of closed convex cones in {C^{\alpha}}.

The family {\Lambda_L} is well-adapted for our current goals because of the following result:

Lemma 3 Let {T} be a {C^{1+\alpha}} expanding map, say {\|DT(x)\cdot v\|\geq \lambda\|v\|} for some {\lambda>1}. Then, given {\lambda^{-\alpha}<\gamma<1}, there exists {0<L_0=L_0(\gamma,\alpha,T)<\infty} such that

\displaystyle \mathcal{L}(\Lambda_L)\subset \Lambda_{\gamma L}

for every {L>L_0}.

Proof: We start the argument with the following auxiliary estimate. Given {z, w\in M}, we have that

\displaystyle \begin{array}{rcl} \frac{|\det DT(z)|}{|\det DT(w)|}&=&\exp(\log|\det DT(z)|-\log|\det DT(w)|)\\ &\leq& \exp\left(\frac{1}{\inf\limits_{x\in M} |\det DT(x)|}||\det DT(z)|-|\det DT(w)||\right) \\ &\leq&\exp\left(\lambda^{-\ell}\cdot|\det DT|_{C^{\alpha}}\cdot d(z,w)^{\alpha}\right)=\exp(C(\alpha,T)\cdot d(z,w)^{\alpha}). \end{array}

where {C(\alpha,T):=\lambda^{-\ell}\cdot|\det DT|_{C^{\alpha}}}. Here, we used that {|\det DT|} is {C^{\alpha}} (because {T} is {C^{1+\alpha}}), and {|\det DT(x)|\geq \lambda^\ell} for all {x\in M} where {\ell} is the dimension of {M} (because {\|DT(x)\cdot v\|\geq \lambda\|v\|}).

After this preliminary estimate, let us consider {\phi\in\Lambda_L} and let us try to show that {\mathcal{L}\phi\in\Lambda_{\gamma L}} for {L} sufficiently large. Denote by {k=\textrm{deg}(T)} the degree of {T}. Given {x\in M}, let us fix a numbering {y_1,\dots, y_k} of the elements of {T^{-1}(x)}. From the expanding features of {T}, we have that for any {x'\in M} close to {x}, the elements of {T^{-1}(x')} can be numbered as {y_1',\dots, y_k'} in such a way that {d(y_i,y_i')\leq \lambda^{-1}d(x,x')}. Now, let us study the quantity

\displaystyle \log\mathcal{L}\phi(x)-\log\mathcal{L}\phi(x')

The proof of the lemma will be complete if we show that this quantity is bounded from above by {\gamma\cdot L\cdot d(x,x')^{\alpha}} (for {L} sufficiently large). In this direction, let us note that, by definition,

\displaystyle \mathcal{L}\phi(x')=\sum\limits_{j=1}^k\frac{1}{|\det DT(y_j')|}\phi(y_j')

From our auxiliary estimate and the fact that {\phi\in\Lambda_L}, we deduce that

\displaystyle \mathcal{L}\phi(x')\leq\sum\limits_{j=1}^k \exp\left(C(\alpha,T) d(y_j,y_j')^{\alpha}\right)\frac{1}{|\det DT(y_j)|} \exp(Ld(y_j,y_j')^{\alpha})|\phi(y_j)|

Since {d(y_j,y_j')\leq\lambda^{-1}d(x,x')}, we obtain from this inequality that

\displaystyle \begin{array}{rcl} \mathcal{L}\phi(x')&\leq& \exp((C(\alpha,T)+L)\lambda^{-\alpha}d(x,x')^{\alpha})\sum\limits_{j=1}^{k}\frac{1}{|\det DT(y_j)|}\phi(y_j) \\ &=&\exp((C(\alpha,T)+L)\lambda^{-\alpha}d(x,x')^{\alpha}) \mathcal{L}\phi(x) \end{array}

Because {x} and {x'} play symmetric roles, we conclude that

\displaystyle |\log\mathcal{L}\phi(x)-\log\mathcal{L}\phi(x')|\leq (C(\alpha,T)+L)\lambda^{-\alpha}d(x,x')^{\alpha},

that is, {\mathcal{L}\phi\in\Lambda_{(C(\alpha,T)+L)\lambda^{-\alpha}}} when {\phi\in\Lambda_L}. In particular, given {\lambda^{-\alpha}<\gamma<1}, since

\displaystyle (C(\alpha,T)+L)\lambda^{-\alpha}< \gamma L

for

\displaystyle L> (C(\alpha,T)\lambda^{-\alpha})/(\gamma-\lambda^{-\alpha})=:L_0(\gamma,\alpha,T),

the proof of the lemma is complete. \Box
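The arithmetic at the end of the proof is easy to play with. The sketch below simply evaluates {C(\alpha,T)=\lambda^{-\ell}\cdot|\det DT|_{C^{\alpha}}} and {L_0(\gamma,\alpha,T)} and tests the inequality {(C(\alpha,T)+L)\lambda^{-\alpha}<\gamma L}; the function names and the numerical values ({\lambda=1.7}, {\ell=1}, {|\det DT|_{C^{\alpha}}=1}, {\alpha=1/2}, {\gamma=0.9}) are hypothetical and serve only as an example.

```python
def lasota_yorke_threshold(lam, dim, det_holder, alpha, gamma):
    """Return (C, L_0) with C = lam**(-dim) * det_holder and
    L_0 = C * lam**(-alpha) / (gamma - lam**(-alpha))."""
    contraction = lam ** (-alpha)
    if not contraction < gamma < 1:
        raise ValueError("gamma must satisfy lam**(-alpha) < gamma < 1")
    C = lam ** (-dim) * det_holder
    return C, C * contraction / (gamma - contraction)

def maps_into_smaller_cone(L, C, lam, alpha, gamma):
    """Check (C + L) * lam**(-alpha) < gamma * L, the condition used to get L(Lambda_L) in Lambda_{gamma L}."""
    return (C + L) * lam ** (-alpha) < gamma * L

# Hypothetical numbers: expansion 1.7 on a circle (dim = 1), Hoelder seminorm 1.0 of |det DT|.
C, L0 = lasota_yorke_threshold(lam=1.7, dim=1, det_holder=1.0, alpha=0.5, gamma=0.9)
print(C, L0)
print(maps_into_smaller_cone(1.01 * L0, C, 1.7, 0.5, 0.9))  # slightly above the threshold
print(maps_into_smaller_cone(0.99 * L0, C, 1.7, 0.5, 0.9))  # slightly below the threshold
```

As the formula for {L_0} shows, the threshold blows up when {\gamma} approaches {\lambda^{-\alpha}}: asking for the best possible contraction of the cones forces us to work with very large cones.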

In the literature, this type of “contraction estimate” for transfer operators is called a Lasota-Yorke inequality (after A. Lasota and J. Yorke, who first introduced this kind of inequality in this paper here). From Lemma 3 above (i.e., the Lasota-Yorke inequality), we can outline (leaving the details to Baladi’s book) the end of the proof of Theorem 1 as follows.

By (directly) computing the quantities {\alpha(\phi,\psi)} and {\beta(\phi,\psi)} in the definition of Hilbert projective metric in the case of the family of cones {\Lambda_L} (see Equations (2.13) and (2.14) in Baladi’s book), it is possible to calculate the Hilbert projective metric {\Theta_{\Lambda_L}} on the cone {\Lambda_L} (see Equation (2.15) in Baladi’s book). From this calculation, one can show that the cone {\Lambda_{\gamma L}\subset \Lambda_L} has {\Theta_{\Lambda_L}}-diameter

\displaystyle \textrm{diam}_{\Theta_{\Lambda_L}}(\Lambda_{\gamma L})\leq 2\log\left(\frac{1+\gamma}{1-\gamma}\right)+2\gamma L (\textrm{diam}(M))^{\alpha}:=K(\gamma,L)<\infty

(see Equation (2.16) in Baladi’s book). Therefore, by combining the Lasota-Yorke inequality (Lemma 3), this diameter estimate, and Birkhoff’s theorem (cf. Theorem 2), we have that, for {L>L_0(\gamma,\alpha,T)}, {\psi,\phi\in\Lambda_L} and {n,k\in\mathbb{N}}:

\displaystyle \begin{array}{rcl} \Theta_{\Lambda_L}(\mathcal{L}^n(\phi),\mathcal{L}^{n+k}(\psi))&\leq& \tanh(K(\gamma,L)/4)^{n-1}\Theta_{\Lambda_L}(\mathcal{L}(\phi),\mathcal{L}^{k+1}(\psi)) \\ &\leq& \tanh(K(\gamma,L)/4)^{n-1}\cdot K(\gamma,L) \end{array}

In particular, since the constant function {1} belongs to {\Lambda_L} for all {L}, we deduce that the sequence {\mathcal{L}^n(1)} is Cauchy for the Hilbert metric {\Theta_{\Lambda_L}} (for {L>L_0}). In principle, this may not sound very interesting because we would like to extract a limit {\phi_0\in C^{\alpha}} from the sequence {\mathcal{L}^n(1)} and, for this purpose, it is more relevant to get the Cauchy property with respect to the Hölder norm {\|.\|_{C^{\alpha}}}. Fortunately, it is possible to compare the {\Theta_{\Lambda_L}}-distance and the {C^0}-distance for elements in {\Lambda_L} (see Lemmas 2.2 and 2.3 in Baladi’s book), so that we have a fixed point {\phi_0=\lim\limits_{n\rightarrow\infty}\mathcal{L}^n(1)\in\Lambda_L\subset C^{\alpha}} (for {L>L_0}). In particular, from the very definition of {\mathcal{L}}, it follows that {T} has an absolutely continuous invariant probability, namely {\mu=\phi_0\cdot m} with {\phi_0\in C^{\alpha}}. Furthermore, it is not hard to see that {\phi_0} is unique. Indeed, if {\phi_1\cdot m}, {\phi_1\in C^{\alpha}} ({\phi_1>0}), is another absolutely continuous {T}-invariant probability, then {\phi_1} is another fixed point of {\mathcal{L}}. Since {\phi_1>0} and {\phi_1\in C^{\alpha}}, we have that {\phi_1\in\Lambda_L} for some {L(>L_0)}. Thus, from the previous (Cauchy) estimate for the {\Theta_{\Lambda_L}}-distance, we would have

\displaystyle \Theta_{\Lambda_L}(\phi_0, \phi_1)=\Theta_{\Lambda_L}(\mathcal{L}^n(\phi_0),\mathcal{L}^{n}(\phi_1))\leq \tanh(K(\gamma,L)/4)^{n-1}\cdot K(\gamma,L)\stackrel{n\rightarrow\infty}{\rightarrow} 0

and thus {\phi_1=\phi_0}: actually, from the fact that {\Theta_{\Lambda_L}(\phi_0, \phi_1)=0} we only get that {\phi_1\in\mathbb{R}_+\cdot\phi_0} (as {\Theta_{\Lambda_L}} is a projective metric), but in our case we can deduce the equality because {\phi_0\cdot m} and {\phi_1\cdot m} are probability measures (i.e., {\phi_0} and {\phi_1} are normalized so that {\int \phi_0\, dm=1=\int\phi_1\, dm}).
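The convergence of {\mathcal{L}^n(1)} towards the invariant density can be watched on a computer. The sketch below is only a rough numerical experiment under several assumptions that are not in the text: the expanding circle map is the hypothetical example {T(y)=2y+\frac{\varepsilon}{2\pi}\sin(2\pi y)} (mod 1) with {\varepsilon=0.3}, the functions are discretized on a grid with linear interpolation, the two inverse branches are computed by Newton’s method, and the iterates are renormalized to have total mass {1} at each step (which is harmless from the projective point of view).

```python
import numpy as np

EPS = 0.3  # hypothetical perturbation size; any 0 <= EPS < 1 keeps T expanding

def lift(y):
    """Lift F of T(y) = F(y) mod 1; F is increasing and F(y + 1) = F(y) + 2."""
    return 2.0 * y + EPS * np.sin(2 * np.pi * y) / (2 * np.pi)

def dlift(y):
    """Derivative T'(y) = 2 + EPS*cos(2*pi*y), bounded below by 2 - EPS > 1."""
    return 2.0 + EPS * np.cos(2 * np.pi * y)

def preimages(x, newton_steps=30):
    """The two preimages of each x in [0, 1): Newton's method on F(y) = x + j, j = 0, 1."""
    branches = []
    for j in (0, 1):
        y = (x + j) / 2.0
        for _ in range(newton_steps):
            y = y - (lift(y) - (x + j)) / dlift(y)
        branches.append(y)
    return branches

def invariant_density(grid=2048, iterations=40):
    """Iterate phi -> L(phi) starting from phi = 1, on a grid with linear interpolation."""
    x = np.arange(grid) / grid
    branches = preimages(x)
    phi = np.ones(grid)
    sup_changes = []
    for _ in range(iterations):
        new = sum(np.interp(y, x, phi, period=1.0) / dlift(y) for y in branches)
        new /= new.mean()  # renormalise the total mass, absorbing discretisation error
        sup_changes.append(np.max(np.abs(new - phi)))
        phi = new
    return x, phi, sup_changes

x, phi0, sup_changes = invariant_density()
print(sup_changes[:5], sup_changes[-1])  # sup-distances between consecutive iterates
print(phi0.min(), phi0.max())            # the limit density stays positive
```

Of course, this experiment proves nothing, but the successive sup-distances between consecutive iterates should shrink geometrically, as the contraction of the Hilbert metric predicts.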

At this point, it remains only to justify why the correlation functions of Hölder observables decay exponentially fast to complete the proof of Theorem 1. Below we will just sketch how this is proven (referring to Theorem 2.3 in Baladi’s book for more details). One writes

\displaystyle \begin{array}{rcl} C_n(f,g)&=&\int f\circ T^n g d\mu - \int f d\mu \int g d\mu = \int f\circ T^n g \phi_0 dm - \int f \phi_0 dm \int g \phi_0 dm \\ &=& \int f \left(\mathcal{L}^n(g\phi_0)-\phi_0\left(\int g\phi_0 dm\right)\right)dm \end{array}

Thus, it is clear that {C_n(f,g)} decays exponentially if we can show that {\mathcal{L}^n(g\phi_0)} converges towards the point {\left(\int g\phi_0 dm\right)\phi_0} in the line {\mathbb{R}\cdot\phi_0}.

Here, this convergence takes place for {g\in\Lambda_L} ({L>L_0}) in the {C^0} norm because, as we vaguely mentioned, this norm can be compared with the {\Theta_{\Lambda_L}}-distance and for the latter we have at our disposal the Cauchy estimate {\Theta_{\Lambda_L}(\mathcal{L}^n(\phi),\mathcal{L}^{n+k}(\psi))\leq \tanh(K(\gamma,L)/4)^{n-1}\cdot K(\gamma,L)}. In general, the case {g\in C^{\alpha}} can be reduced to the case {g\in\Lambda_L} because the cones {\Lambda_L} are big enough to contain open balls in the {C^{\alpha}}-norm (formally, one replaces {g} by {(g+c)\phi_0} where {c>0} is a sufficiently large constant so that {(g+c)\phi_0} belongs to some {\Lambda_L}). Anyhow, this is essentially all we can say about the proof of Theorem 1 without entering into technical details. So, let’s close this section with the following remarks.

Remark 6 The Cauchy estimate for {\mathcal{L}} also allows one to deduce its quasi-compactness in the space {C^{\alpha}} and the spectral gap property, i.e., its spectral radius is {1}, a simple eigenvalue of {\mathcal{L}} with eigenspace {\mathbb{R}\cdot\phi_0}, and all other elements of the spectrum of {\mathcal{L}} belong to the ball {B(0,r)\subset\mathbb{C}} centered at {0} and radius {r=\tanh(K(\gamma,L)/4)<1}. In general, the exponential decay of correlations and the quasi-compactness of the transfer operator are intimately related: for example, one can deduce exponential decay from quasi-compactness, and, for this reason, some authors try to prove first quasi-compactness before approaching the question of decay of correlation functions. However, as we will see in the next section, in the case of flows, one doesn’t necessarily have to show quasi-compactness of {\mathcal{L}} to deduce exponential decay of correlations, although some control of the spectrum of {\mathcal{L}} is needed in some way.

Remark 7 In this section we focused exclusively on expanding maps, but the theory of exponential decay of correlations exists also for hyperbolic diffeomorphisms, e.g., {C^{1+\alpha}} diffeomorphisms {f:M\rightarrow M} such that the tangent bundle {TM} decomposes into two complementary subbundles {TM=E^u\oplus E^s} where {f} expands along {E^u} and contracts along {E^s}. However, one has to be careful about how to construct the functional spaces along the stable direction: indeed, the fact that {f} contracts along {E^s} means that, by iterating forward, we see exactly the inverse of Figure 1! In particular, the transfer operator tends to behave “badly” on smooth functions in the direction {E^s}, and this is why normally one has to either “reduce” the dynamics to an expanding one by “quotienting modulo stable manifolds” (cf. Baladi’s book) or consider the transfer operators on “anisotropic” functional spaces, i.e., a space of observables that are “smooth along the unstable direction {E^u}” and “distributions along the stable direction {E^s}” (cf. these articles here for more discussion on the “anisotropic approach”).

2. Dolgopyat’s estimate and correlations for flows

In this section we consider a volume-preserving continuous-time hyperbolic dynamical system {T_t:M\rightarrow M}, e.g., we assume that {TM=E^s\oplus\mathbb{R}X\oplus E^u} where {E^s} is contracted by {DT_t}, {E^u} is expanded by {DT_t} and {X=(\partial T_t/\partial t)|_{t=0}} is the flow direction. A prominent class of hyperbolic (Anosov) flows are the geodesic flows on negatively curved manifolds (see, e.g., Katok-Hasselblatt’s book). In this setting, we denote by {\mu} the volume (“Lebesgue measure”) preserved by {T_t}, and we wish to study the correlation functions

\displaystyle C_t(f,g)=\int f\circ T_t(x) g(x) d\mu(x)-\int f(x) d\mu(x)\int g(x) d\mu(x)

via the (spectral) properties of the transfer operator

\displaystyle \mathcal{L}_t(\phi)(x):=\phi(T_{-t}(x))

At first sight, it is tempting to conjecture (by analogy with the case of diffeomorphisms) that the correlation functions of smooth observables for hyperbolic flows decay exponentially. However, one must be careful here because of a subtle but fundamental difference between hyperbolic diffeomorphisms and hyperbolic flows: while the derivatives of hyperbolic diffeomorphisms have a definite (hyperbolic, i.e., contracting or expanding) behavior along all directions in {TM}, the derivatives of hyperbolic flows have a definite hyperbolic behavior only along directions that are transverse to the flow direction {X}. Indeed, it is not hard to see that the behavior of {T_t} along the flow direction is neutral (“almost isometric”), i.e., the fact that the distance between {x} and {T_t(x)} is of order {t} remains almost unchanged after flowing these points by any time {s\in\mathbb{R}} (at least for {|t|} small).

In order to see how “nasty” the presence of the neutral (flow) direction might be (for the decay of correlations), let us consider the following “prototype” of hyperbolic flows.

Let {T:M\rightarrow M} be a discrete-time dynamical system, e.g., {T=\left(\begin{array}{cc} 2 & 1 \\ 1 & 1\end{array}\right)} acting on {\mathbb{T}^2=\mathbb{R}^2/\mathbb{Z}^2} (a.k.a. Arnold’s cat map). Given a positive “reasonable” function {g:M\rightarrow\mathbb{R}_+}, we can define a suspension flow {T_t} from the following recipe. We consider {N=(M\times\mathbb{R})/\sim} where {\sim} is the equivalence relation obtained by gluing the points {(x,g(x))} and {(T(x),0)}, and we let {T_t} be the flow induced by the flow {(x,s)\in M\times\mathbb{R}\mapsto (x,s+t)\in M\times\mathbb{R}}. In the literature, {T_t} is called the suspension of {T} with roof function {g}. In the case that the “base dynamics” {T} is hyperbolic, it is not hard to see that {T_t} is a hyperbolic flow. The suspensions of hyperbolic maps represent all nice hyperbolic flows (because hyperbolic flows admit nice Markov partitions; see, e.g., Dolgopyat’s paper and references therein for more comments).
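The recipe defining the suspension flow is easy to turn into a small program. The sketch below is only an illustration for forward times {t\geq 0}: the names `cat_map`, `suspension_flow` and `constant_roof` are hypothetical, a point of {N} is represented by a pair {(x,s)} with {0\leq s<g(x)}, and the roof function is assumed to be bounded away from zero (so that the loop terminates).

```python
def cat_map(p):
    """Arnold's cat map (x, y) -> (2x + y, x + y) on the torus R^2/Z^2."""
    x, y = p
    return ((2 * x + y) % 1.0, (x + y) % 1.0)

def suspension_flow(p, s, t, base_map, roof):
    """Flow the point (p, s), with 0 <= s < roof(p), for a time t >= 0.

    The height grows at unit speed; on reaching the roof, (p, roof(p)) is glued to (T(p), 0).
    """
    if t < 0:
        raise ValueError("this sketch only flows forward in time")
    s += t
    while s >= roof(p):
        s -= roof(p)
        p = base_map(p)
    return p, s

constant_roof = lambda p: 1.0   # hypothetical roof function
p0 = (0.1, 0.2)
print(suspension_flow(p0, 0.0, 2.25, cat_map, constant_roof))
```

With the constant roof function equal to {1}, flowing for time {2.25} from height {0} applies the base map twice and stops at height {0.25}, as expected from the gluing rule.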

Now, let us consider a hyperbolic diffeomorphism {T} displaying a horseshoe whose dynamics is defined by two rectangles {R_0} and {R_1} in {M}, i.e., {R_0} and {R_1} are a Markov partition of the horseshoe (in Figure 1 of this post here, {R_0} and {R_1} are the connected components of {R\cap f^N(R)}). Next, we fix a roof function {g:M\rightarrow\mathbb{R}_+} such that {g|_{R_0}\equiv c_0}, {g|_{R_1}\equiv c_1} and {c_1/c_0\notin\mathbb{Q}}, that is, {g} is a piecewise constant roof function whose values on different parts of the horseshoe are rationally independent. In this setting, it is possible to check that the suspension flow {T_t} obtained from the data {T} and {g} is mixing (i.e., correlation functions of smooth observables decay to zero). But, as was shown by D. Ruelle in this article here, the correlation functions do not decay exponentially fast! Intuitively, the decay of correlations gets slower for suspensions of hyperbolic maps {T} with piecewise constant roof functions {g} because, despite the fast mixing imposed by {T} in the “{M}-direction”, a set of the form {A\times[0,\varepsilon]}, {A\subset M}, {0<\varepsilon\ll \min\{c_0,c_1\}} travels with unit linear speed under {T_t} and hence it can’t hit the level {M\times \{\min\{c_0,c_1\}\}} before a time of order {\sim \min\{c_0,c_1\}} has passed (that is, the set {A\times[0,\varepsilon]} has to wait a little before visiting other levels such as {B\times[\min\{c_0,c_1\}-\varepsilon,\min\{c_0,c_1\}]}).

Once we know that the decay of correlation functions may fail to be exponential for hyperbolic flows, it is natural to ask oneself about the largest class of hyperbolic flows with exponential decay of correlations. In this direction, after the notable works of M. Ratner, M. Pollicott and P. Collet, H. Epstein and G. Gallavotti, we know that geodesic flows on manifolds of constant negative curvature of dimension 2 and 3 (and, in dimension 2, some small perturbations of such flows) exhibit exponential decay of correlations. In these works, the techniques were algebraic (and/or perturbative), and thus they are hard to generalize.

About 6 years after these first works, N. Chernov came up with the first dynamical approach and he showed that geodesic flows on negatively curved surfaces have sub-exponential decay of correlations. Then, D. Dolgopyat developed a crucial estimate (here) allowing him to deduce exponential decay of correlations for Anosov flows with smooth ({C^1}) invariant foliations. In principle, the regularity assumption imposed by D. Dolgopyat is quite restrictive, but, a posteriori, some authors (such as C. Liverani, V. Baladi and B. Vallée, and A. Avila, S. Gouëzel and J.-C. Yoccoz) saw that his estimate could work under less restrictive assumptions. In naive terms, the estimates by Dolgopyat can be used to prove exponential decay of correlations for suspension flows over hyperbolic maps with roof functions that are not integrable, i.e., they don’t look like piecewise constant functions even after a change of variables.

In the remainder of this post, we will try to briefly (and vaguely) explain from a geometrical point of view the meaning of Dolgopyat’s estimate and its relationship to suspension flows over hyperbolic maps with non-integrable roof functions. Here, we will loosely follow this excellent article of C. Liverani on contact Anosov flows (and we strongly recommend that the reader interested in the details consult this nicely written paper).

Firstly, while C. Liverani works with Anosov flows, I will somewhat simplify our lives by forgetting the stable direction, or, if one wishes, I will think of my flow {T_t} as a suspension of a {C^2} expanding map (instead of a hyperbolic map) via some smooth (say {C^1}) roof function {g}. Next, I will assume that {g} is not integrable, i.e., {g} is not of the form {\psi+h\circ T-h} where {\psi} is piecewise constant and {h\circ T-h} is a smooth coboundary, i.e., {h\in C^1}. In simpler terms, {g} is integrable if it is possible to “change variables” (with the aid of the coboundary term {h\circ T-h}) so that our flow is conjugate to a suspension of an expanding map with a piecewise constant roof function {\psi}. In other words, the non-integrability says that our flow {T_t} is not one of the “bad flows” of D. Ruelle in disguise, and hence we have some chance of getting exponential mixing to start off. In C. Liverani’s paper, the analog of the non-integrability condition is “hidden” in the fact that he considers contact Anosov flows.
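
A concrete consequence of integrability (a standard remark, stated here under the definition just given) concerns periodic orbits. If {g=\psi+h\circ T-h} and {T^n(x)=x}, then the coboundary terms telescope and

\displaystyle \sum_{k=0}^{n-1} g(T^k(x)) = \sum_{k=0}^{n-1}\psi(T^k(x)) + h(T^n(x))-h(x) = \sum_{k=0}^{n-1}\psi(T^k(x)).

The left-hand side is a period of the closed orbit of {T_t} through {(x,0)}, so integrability forces the periods of all closed orbits to be sums of values of the piecewise constant function {\psi}. For instance, if {\psi} takes only the two values {c_0} and {c_1}, then every such period has the form {n_0c_0+n_1c_1} with {n_0,n_1} non-negative integers.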

In this setting, given {f} a {C^1} observable with zero mean (i.e., {\int f=0}), our goal is to show that

\displaystyle \|\mathcal{L}_t f\|\leq C e^{-\sigma t}\|f\|_{C^1}

where {\|.\|} is some norm trying to measure the oscillations of {f} along the unstable direction, and {C,\sigma>0}. In fact, this estimate is sufficient to derive exponential decay of correlations for {C^1} observables because

\displaystyle \int f\cdot \phi\circ T_t=\int \mathcal{L}_t\left(f-\int f\right)\cdot \phi +\left(\int f\right)\int \mathcal{L}_t 1\cdot \phi = O_{f,\phi}(e^{-\sigma t}) + \int f \int \phi

Remark 8 The estimate {\|\mathcal{L}_t f\|\leq C e^{-\sigma t}\|f\|_{C^1}} doesn’t imply quasi-compactness of {\mathcal{L}_1} nor the spectral gap property (compare with Remark 6). In fact, this happens because the current technology doesn’t allow one to analyze directly the time-one map {T_1} of the flow {T_t} (except in some rare cases such as in this paper here). In some sense, this is an incarnation of the fundamental differences between maps and flows as far as correlations are concerned.

In simple terms, we wish to take advantage of the expansion effect of {T} in the unstable direction (“{M}-direction”) in order to show the estimate {\|\mathcal{L}_t f\|\leq C e^{-\sigma t}\|f\|_{C^1}}, and, hence, it is a nice idea to consider a (Hölder or) {C^1} norm along this direction such as

\displaystyle \|f\|= \sup\limits_{x\in M}\sup\limits_{s\in\mathbb{R}}\left|\int_{s}^{s+\delta}f(x,t)dt\right| + \sup\limits_{s\in\mathbb{R}}\int_{s}^{s+\delta}\sup\limits_{x,y\in M, d(x,y)\leq \delta}\frac{|f(x,t)-f(y,t)|}{d(x,y)}dt

where {\delta>0} is a sufficiently small constant (for practical purposes, one may think that {\delta=1}).

Remark 9 Of course, {\|.\|} is not the norm used by C. Liverani in his article (instead, he deals with a slightly more technical version of it). However, we will stick to this norm because it is “good enough” to illustrate the cancellation idea behind Dolgopyat’s estimate.

By mimicking the argument of the previous section, we wish to understand how the norm of {\mathcal{L}_t f} evolves as {t\rightarrow\infty}. In this direction, we fix some piece of size {\delta} of “(fake) unstable manifold” of {T_t}, i.e., a set of the form {W^u_{\delta}(x,s)=B(x,\delta)\times\{s\}} (where {B(x,\delta)\subset M} is the ball of center {x} and radius {\delta}) and we let it evolve under {T_t}. Since the base dynamics {T} of the suspension flow is an expanding map, we see that {T_t(W^u_{\delta}(x,s))} consists of a union of an exponentially large (in {t}) number of pieces of size {\delta}.

In particular, the function {\mathcal{L}_t (f|_{W^u_{\delta}(x,s)})} is nicely decomposed into an exponential number of functions {f_i} supported on (fake) unstable manifolds {W^u_{\delta}(x_i,t_{k,i})} crossing a set of the form {B(x_i,\delta)\times [s_i,s_i+\delta]}, whose graphs are obtained from the one of {f} after applying {T} a certain number of times. At this stage, the naive idea to estimate the norm {\|\mathcal{L}_t f\|} is: we estimate the norms of each {f_i} and we sum everything up. Here, since the base dynamics {T} is expanding, the {C^1} part of the norm of {f_i} along the unstable direction (“{M}-direction”) gets better (for the reason explained in Figure 1). However, it is possible to see that the trivial bounds on the “{C^0} part” {\sup\limits_{s\in\mathbb{R}}\int_{s}^{s+\delta}\sup\limits_{x\in M}|f(x,t)|dt} of the norm {\|.\|} are not sufficient to get the desired bound {\|\mathcal{L}_t f\|\leq Ce^{-\sigma t}\|f\|_{C^1}} because of the large (exponential) number of terms (i.e., functions {f_i}). In other words, the naive idea doesn’t work unless we get some sort of cancellation making the norms of the {f_i}’s somewhat smaller than the trivial bound.
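
The tension between the exponential number of pieces and the trivial bound can already be seen in a toy model. The sketch below is a hypothetical illustration (not taken from C. Liverani’s paper): it assumes the doubling map {T(x)=2x} mod 1 on the circle with Lebesgue measure, whose transfer operator averages over the {2^n} inverse branches of {T^n} with weights {2^{-n}}. Bounding each branch by the size of {|f|} gives a quantity that never drops below {\int |f|>0}, while the true iterates of the zero-mean observable chosen here shrink thanks to cancellations between branches.

```python
def transfer(f):
    """Transfer operator of the doubling map T(x) = 2x mod 1 (Lebesgue measure)."""
    return lambda x: 0.5 * (f(x / 2) + f((x + 1) / 2))

def iterate(f, n):
    """n-th iterate of the transfer operator: an average over 2**n inverse branches."""
    for _ in range(n):
        f = transfer(f)
    return f

def sup_norm(f, grid=200):
    """Crude approximation of sup |f| on a uniform grid of [0, 1)."""
    return max(abs(f(k / grid)) for k in range(grid))

def f(x):
    """A zero-mean observable: x(1 - x) - 1/6."""
    return x * (1 - x) - 1 / 6

# Trivial bound: averaging |f| over the branches stays above the mean of |f|.
trivial = [sup_norm(iterate(lambda x: abs(f(x)), n)) for n in range(5)]
# Actual size: the branches cancel and the sup norm shrinks by 1/4 per step.
actual = [sup_norm(iterate(f, n)) for n in range(5)]
print(trivial)
print(actual)
```

For this particular {f} one can check by hand that the transfer operator maps {f} to {f/4}, which explains the factor in the last comment. In this toy model the cancellation comes for free from the regularity of {f} in the expanding direction; for the suspension flow, the analogous cancellation in the {C^0} part is not automatic, which is the point of the discussion that follows.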

In order to get the desired cancellation, we need essentially to compare (using {T_t}) the values of {f_i} on the several pieces {W^u_{\delta}(x_i,t_{k,i})} of unstable manifolds intersecting a given {B(x_i,\delta)\times [s_i,s_i+\delta]}. Here, the fact that the roof function {g} is not integrable plays a key role. Indeed, if {g} is (piecewise) constant (or more generally {g} is integrable), then the graphs of the restrictions of {f_i} to the several pieces {W^u_{\delta}(x_i,t_{k,i})} match “perfectly” (possibly after a change of variables) as it is shown in the left side of the figure below. On the other hand, if {g} is not integrable, the graphs of the restrictions of {f_i} to the several pieces {W^u_{\delta}(x_i,t_{k,i})} will look somewhat shifted with respect to each other. In particular, a sum (or more generally an integral) of the form {\sum\limits_k f(x_i, t_{k,i})} exhibits a certain amount of cancellation (as the signs of the {f(x_i,t_{k,i})} really change) and, by quantifying this, one obtains a Dolgopyat estimate saying that the sum of the {C^0} parts of the norm {\|.\|} of the functions {f_i} is {\leq Ce^{-\sigma t}\|f\|_{C^1}} (compare with Lemma 5.2 in C. Liverani’s paper). This cancellation phenomenon is illustrated in the right-hand side of the figure below.
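
The mechanism can be caricatured numerically. In the following sketch (a toy model with hypothetical names, not the actual estimate), the observable oscillates in the flow direction like {\cos(2\pi t)}, and we average its values at the heights {t_k} at which {N} pieces of unstable manifold cross a given flow line. When the heights coincide, as one expects for an integrable roof function, nothing cancels; when they are shifted with respect to each other, the average is much smaller than the trivial bound 1.

```python
import math

def branch_average(heights):
    """Average of cos(2*pi*t) over the heights t at which the pieces cross."""
    return sum(math.cos(2 * math.pi * t) for t in heights) / len(heights)

N = 64
aligned = [0.0] * N                   # all pieces at the same height
shifted = [k / N for k in range(N)]   # heights spread over one period (an assumption)

print(abs(branch_average(aligned)))   # no cancellation
print(abs(branch_average(shifted)))   # almost complete cancellation
```

The equally spaced heights are of course an idealization chosen to make the cancellation as visible as possible; the actual estimate has to quantify how much shifting the non-integrability of {g} guarantees.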

Closing this post, let us reiterate that this description is a very crude (first order) approximation of the beautiful ideas of Dolgopyat (as several details are missing: the precise definition of the norms is not the same, one studies {\mathcal{L}_t} in terms of its resolvent and not directly, etc.), but I hope that the sacrifices in rigor are compensated by the fact that the previous picture explains (at least intuitively) why the words “Dolgopyat estimate” and “cancellation due to oscillation and non-integrability” appear together in the papers on the fascinating subject of decay of correlations for hyperbolic flows.

Update [November 10, 2012]: The reader might find it interesting to consult also this survey article by C. Liverani written on the occasion when the Brin prize 2009 was awarded to D. Dolgopyat.

Update [November 12, 2012]: As was pointed out to me by Artur Avila today, there are some situations where the transfer operator {\mathcal{L}_1} of the time-one map of the flow {T_t} can be analyzed directly (contrary to what was said in a previous version of Remark 8 above). Indeed, this is the case in this article here of Masato Tsujii.


Responses

  1. The topic in the second section is a really involved one.

  2. Hi Matheus! Would you give some more explanation about Remark 8, i.e., why the spectral gap does not follow from that estimate? Is it just because the two sides involve two different norms?

    • Hi Pengfei! You’re absolutely right. The norm \|.\| is a sort of C^1 norm only along the unstable direction. But, since \|.\| doesn’t try to measure regularity in the flow direction, and, even worse, in the Anosov flow case, it is “dual to C^1” (in some sense), it is not possible to directly compare \|.\| and \|.\|_{C^1}, and, in particular, it is not possible to deduce quasi-compactness/spectral gap. Best, Matheus

      • Thank you!

  3. Dear Matheus, some further articles on this topic include “Nonlinearity 24 (2011), 1089” and a recent paper in arXiv:1301.6855, which amongst other things proves exponential decay of correlations for contact Anosov flows with respect to arbitrary Gibbs measures determined by Hölder continuous potentials. This covers the case of geodesic flows on Riemann manifolds of any dimension.
    Luchezar

