1 Introduction
 Consider the (multiplicative) group $G:=PSL_2(\mathbb {R})$ with a Haar measure $\mu _G$. A lattice $\Gamma \subset G$ is a discrete subgroup such that the quotient $X:=\Gamma \backslash G$ has a fundamental domain in G of finite Haar measure. The Haar measure then descends to a finite measure $\mu _X$. We define the matrices
$$ \begin{align*} h(x):=\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}, \quad a(y):=\begin{pmatrix} y^{{1}/{2}} & 0 \\ 0 & y^{-{1}/{2}} \end{pmatrix}. \end{align*} $$
The geodesic flow at time t of $p \in X$ is defined by $g_t(p):=p a(e^t)$ and the horocycle flow at time t is defined by $h_t(p):=p h(t)$.
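A direct matrix multiplication (a standard renormalization identity, not spelled out in the original) shows how the two flows interact:
$$ \begin{align*} h(x)\,a(y)=\begin{pmatrix} y^{{1}/{2}} & x y^{-{1}/{2}} \\ 0 & y^{-{1}/{2}} \end{pmatrix}=a(y)\,h(x/y), \quad\text{so that}\quad g_{\log T}(p h(x))=g_{\log T}(p)\, h(x/T). \end{align*} $$
In particular, the geodesic flow at time $\log T$ maps the time-T horocycle segment through p to a unit-length segment through $g_{\log T}(p)$; this is the reason the equidistribution parameter $r(p, T)$ introduced below is measured at the point $g_{\log T}(p)$.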
 While the orbit $g_t(p)$ for $t \to \infty $ can behave quite irregularly depending on the initial point, the horocycle orbit $h_t(p)$ is known to behave much more rigidly. Before we detail the known results, we pin down some notation. We say that the orbit $h_t(p)$ equidistributes with respect to $\mu _X$ if for any compactly supported, continuous function f on X,
$$ \begin{align*} \lim_{T \to \infty} \frac{1}{T} \int_{0}^T f(ph(t)) \; dt = \int f \; d\mu_X. \end{align*} $$
Similarly, we say that the orbit equidistributes along a sequence $a_n \in \mathbb {R}$ with respect to $\mu _X$ if
$$ \begin{align*} \lim_{N \to \infty} \frac{1}{N} \sum_{n=0}^{N-1} f(ph(a_n)) = \int f \; d\mu_X. \end{align*} $$
Lastly, a point $p \in X$ is called periodic if there is a $t_0 \in \mathbb {R}$ such that $p =p h(t_0)$. In this case, the horocycle orbit is trapped in the periodic orbit and never equidistributes with respect to $\mu _X$; the system $t \mapsto p h(t)$ is then isomorphic to the circle rotation ${x \mapsto x+t_0^{-1}}$ on the torus $\mathbb {R}/\mathbb {Z}$. Below, we use ‘$p h(a_n)$ equidistributes’ as a shorthand for ‘for all non-periodic $p \in X$, $p h(a_n)$ equidistributes with respect to $\mu _X$’. It was shown by Dani and Smillie that both $p h(t)$ for $t \in \mathbb {R}$ and $p h(n)$ for $n \in \mathbb {N}$ equidistribute.
 It was subsequently asked what happens for sequences other than $\mathbb {N}$. Margulis conjectured that $p h(p_n)$, where $p_n$ is the nth prime number, should also equidistribute. Shah conjectured that for any $\gamma \geq 0$, $p h(n^{1+\gamma })$ would equidistribute. We remark that these results follow for $\mu _X$-almost every $p \in X$ from the work of Bourgain in a much more general context [Reference Bourgain1]. The challenge is really to establish equidistribution for all non-periodic $p \in X$.
 Venkatesh made progress on Shah’s conjecture by showing that for co-compact $\Gamma $, there is a small $c=c(\Gamma )>0$ such that for all $0 \leq \gamma <c$ and all $p \in X$, $p h(n^{1+\gamma })$ equidistributes [Reference Venkatesh9]. Venkatesh’s proof operates by controlling arithmetic sequences of the type $p h(sn)$ for $n \in \{0, \ldots , N-1\}$ with s small compared with n. Controlling these sparse sequences also means that the almost-primes equidistribute for co-compact $\Gamma $; that is, for sufficiently big R, $p h(q)$ equidistributes, where q runs over all numbers having at most R many prime factors. That controlling sparse sequences is enough to control the almost-primes can be seen either using sieve methods or using the pseudo-random measure $\nu $, introduced by Goldston and Yıldırım, and subsequently used by Green and Tao to show that the primes contain arbitrarily long arithmetic progressions [Reference Goldston and Yildirim3, Reference Green and Tao4] (see [Reference Sarnak and Ubis6] for a proof of the equidistribution of almost-primes using sieve methods and [Reference Streck7] for a proof using the pseudo-random measure $\nu $).
 Using Venkatesh’s method in the case of a non-compact lattice, one can show that $p h(n^{1+\gamma })$ and $ph(q)$, q almost prime, equidistribute under the assumption of a Diophantine condition on p [Reference McAdam5, Reference Zheng10, Reference Zheng11]. This Diophantine condition ensures that the horocycle orbit $ph(t)$ equidistributes with rate $T^{-\varepsilon }$ for all T, which is needed for Venkatesh’s argument. Using the fact that for any point p, there are some times $T_i \to \infty $ such that $p h(t), t \leq T_i$ equidistributes with error $T_i^{-\varepsilon }$, one can also deduce with the same method that the orbits $p h(n^{1+\gamma })$ and $ph(q)$, q almost prime, are dense.
 However, showing equidistribution for all p is significantly harder, as there are points p and times T for which the equidistribution of $p h(t), t \leq T$ is far worse than polynomial. In this case, Venkatesh’s method cannot be applied.
 Sarnak and Ubis were the first to show such a sparse equidistribution result for all initial p. They showed that the almost-primes equidistribute for $\Gamma =PSL_2(\mathbb {Z})$, which is not co-compact [Reference Sarnak and Ubis6]. It was subsequently proved by the author that the almost-primes equidistribute for all lattices $\Gamma $ in $PSL_2(\mathbb {R})$ [Reference Streck7].
 In this paper, the equidistribution of $p h(n^{1+\gamma })$ is established for small $\gamma $ in the setting of a general lattice. This generalises Venkatesh’s result from co-compact $\Gamma $ to all lattices $\Gamma $ in $PSL_2(\mathbb {R})$ and makes (modest) progress on the conjecture of Shah.
 We make this precise in the result below, which is the main result of this paper. For this, we need some more notation and start by defining the metric $d_X$. The group $G=PSL_2(\mathbb {R})$ comes with a natural left-invariant metric $d_G$ (see for example [Reference Einsiedler and Ward2, Ch. 9]). This metric descends to X via $d_X(\Gamma g, \Gamma h):=\inf _{\gamma \in \Gamma } d_G(g, \gamma h)$. We also fix a point $p_0 \in X$ and define $\mathrm {dist}(p):=d_X(p, p_0)$.
 For two functions $f, g \colon U \to \mathbb {R}$ on some domain U, we write $f \ll g$ or $f=O(g)$ if there is a constant C such that $|f(x)| \leq C |g(x)|$ for all $x \in U$. In this paper, the constant C implicit in this definition is always allowed to depend on the lattice $\Gamma $ and the choice of $\gamma $, but nothing else. We write $f \sim g$ if both $f \ll g$ and $g \ll f$.
 For a function $f \in C^4(X)$, let $\Vert f \Vert _{W^4}$ be its Sobolev norm in the Hilbert space $W^{4, 2}$ involving the fourth derivative and let $\Vert f \Vert _{\infty , j}$ be the supremum norm of the jth derivatives. Define
$$ \begin{align*} \Vert f \Vert:=\Vert f \Vert_{W^4}+\Vert f \Vert_{\infty, 1}+\Vert f \Vert_{\infty, 0}; \end{align*} $$
this norm is the same one Strömbergsson used to show his equidistribution result [Reference Strömbergsson8]. In his result, the equidistribution properties of a horocycle piece $\{p h(t), 0 \leq t \leq T\}$ are measured in terms of the parameter
$$ \begin{align*} r(p, T):=T \exp(-\mathrm{dist}(g_{\log T}(p))), \end{align*} $$
which will be an important quantity for measuring equidistribution throughout this paper; its significance and role in the proof will be discussed below in more detail. It is well known that $r(p, T) \to \infty $ as $T \to \infty $ for any non-periodic p.
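To get a feeling for this parameter, consider, for instance, $\Gamma =PSL_2(\mathbb {Z})$ and the periodic point $p=\Gamma e$ (periodic because $h(1) \in \Gamma $, so $p h(1)=p$). Assuming the base point $p_0$ is chosen at bounded distance from $\Gamma e$, the point $g_{\log T}(p)=\Gamma a(T)$ projects to $Ti \in \mathbb {H}$, so that
$$ \begin{align*} \mathrm{dist}(g_{\log T}(p))=\log T+O(1) \quad\text{and hence}\quad r(p, T)=T \exp(-\mathrm{dist}(g_{\log T}(p))) \sim 1. \end{align*} $$
For this periodic point, $r(p, T)$ thus stays bounded, consistent with the fact that $r(p, T) \to \infty $ for non-periodic p.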
 We let $\beta $ be the constant in Theorem 1.2; it ultimately comes from the rate of effective mixing. The constant in Theorem 1.1 can be taken to be $c={\beta }/{600}$.
Theorem 1.1. For any lattice $\Gamma \subset PSL_2(\mathbb {R})$, there is a constant $c=c(\Gamma )>0$ such that for any $0 \leq \gamma \leq c$, any non-periodic $p \in X$ and any function $f \in C^4(X)$ with $\Vert f \Vert =1$,
$$ \begin{align*} \bigg| \frac{1}{T} \sum_{n \leq T} f(p h(n^{1+\gamma})) -\int f \; d\mu_X \bigg| \ll r^{-{\beta}/{4}}, \end{align*} $$
where $r=r(p, T^{1+\gamma })$.
 To prove Theorem 1.1, we will split the range into different intervals and use Taylor expansion on each one. On an interval $[T_0, T_1]$, the function $t^{1+\gamma }$ will be approximately equal to $T_0^{1+\gamma } + (1+\gamma ) T_0^\gamma (t-T_0)$, provided that $T_0$ is not too small and that the range is not too long. The question thus becomes how well $ph(ns)$ for $s \sim T^{\gamma }$ equidistributes. To control these sparse arithmetic sequences, we need two results.
The first one is the following theorem, which is a straightforward consequence of combining Strömbergsson’s equidistribution result [Reference Strömbergsson8] with Venkatesh’s method [Reference Venkatesh9], as performed, for example, by Zheng [Reference Zheng10].
Theorem 1.2. [Reference Zheng10, Theorem 1.2]
Let $\Gamma $ be a non-compact lattice in G. Let $f \in C^4(X)$ with $\Vert f \Vert < \infty $ and $1 \leq s<T$. Then,

for any initial point $p \in X$, where $r=r(p, T)$. The parameter $\tfrac 16>\beta >0$ and the implied constant depend only on $\Gamma $.
 In the cases where r is big compared with T (say $r \geq T^\varepsilon $ for some absolute $\varepsilon $), this theorem in itself is enough to show equidistribution of the sequence $ph(n^{1+\gamma })$.
The result below will be used to deal with the case in which the equidistribution is bad. It was proved by the author in [Reference Streck7] to show equidistribution of almost-primes. Its proof uses ideas of Sarnak and Ubis [Reference Sarnak and Ubis6] and has parallels to [Reference Strömbergsson8], whose proof in turn uses ideas going back to Marina Ratner. This result encompasses the dichotomy mentioned in the abstract.
Lemma 1.3. [Reference Streck7, Lemma 1.3]
Let $\Gamma $ be a lattice in $G=PSL_2(\mathbb {R})$ and let $X=\Gamma \backslash G$. Let $p \in X$ and $T \geq 0$. Let $\delta>0$ and $K \leq T$.
 There is an interval $I_0 \subset [0,T]$ of size $|I_0| \leq \delta ^{-1} K^2$ such that: for all $t_0 \in [0,T] \backslash I_0$, there is a segment $\{\xi h(t), t \leq K\}$ of a closed horocycle approximating $\{ph(t_0+t), 0 \leq t \leq K\}$ of order $\delta $, in the sense that
$$ \begin{align*} \text{ for all } 0 \leq t \leq K, \quad d_X(ph(t_0+t), \xi h(t)) \leq \delta. \end{align*} $$
The period $P=P(t_0, p)$ of this closed horocycle satisfies $P \ll r(p, T)$.
 Moreover, one can ensure $P \gg \eta ^2 r$ for some $\eta>0$ by weakening the bound on $I_0$ to $|I_0| \leq \max (\delta ^{-1} K^2, \eta T)$.
2 On the behaviour of the equidistribution parameter in Theorem 1.2
Besides Lemma 1.3 itself, we will also need some of the other material in [Reference Streck7, Ch. 4] to prove Theorem 1.1. We recall some of this material, going slightly beyond what is presented in [Reference Streck7].
 It is well known that $G \cong T_1 \mathbb {H}$, where $\mathbb {H}$ is the upper half-plane with the hyperbolic metric. Then, $X=\Gamma \backslash G$ has as fundamental domain a set $T_1F$, where F is a geodesic polygon in $\mathbb {H}$, that is, a polygon with finitely many vertices whose edges are pieces of geodesics [Reference Einsiedler and Ward2]. This fundamental polygon F has finitely many vertices touching the boundary of the upper half-plane, either at the axis with imaginary part equal to zero or at infinity. After identifying vertices that are in the same orbit under the action of $\Gamma $, one gets the cusps of X, which we will denote by $r_1, \ldots , r_n$. Any such cusp $r_i$ is in 1–1 correspondence with an element $\gamma _i \in \Gamma $ with the property that $\gamma _i$ fixes $r_i$ and that $\gamma _i$ is conjugate to $h(1)$ (see [Reference Streck7, Lemma 3.1]). For each cusp, there are elements $\sigma _i \in G$ such that $\sigma _i r_i=\infty $ and $\sigma _i \gamma _i \sigma _i^{-1}=h(1)$.
 For $g \in G$, we define $Y^0_i(g):=\mathrm {Im}(\sigma _i g)$, where
$$ \begin{align*} \mathrm{Im}\left(\begin{pmatrix} a & b \\ c & d \end{pmatrix} \right):=\frac{1}{c^2+d^2} \end{align*} $$
is the imaginary part of the matrix projected to $\mathbb {H}$. We also set, for $p=\Gamma g_p \in X$, $y_i^0(p):=\max _{\gamma \in \Gamma } Y^0_i(\gamma g_p)$.
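This is indeed the imaginary part of the point of $\mathbb {H}$ obtained by acting on the base point i by the Möbius transformation associated to the matrix; a one-line check, using $ad-bc=1$, reads:
$$ \begin{align*} \mathrm{Im}\bigg(\frac{ai+b}{ci+d}\bigg)=\frac{\mathrm{Im}((ai+b)(\overline{ci+d}))}{|ci+d|^2}=\frac{ad-bc}{c^2+d^2}=\frac{1}{c^2+d^2}. \end{align*} $$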
 It was shown in [Reference Streck7, Lemma 4.1] that there exist disjoint neighbourhoods $C_i \subset X$ of each cusp $r_i$, with $K=X \backslash \cup C_i$ compact, such that for any $p \in C_i$, $\exp (\mathrm {dist}(p)) \sim y_i^0(p)$ (while of course $\exp (\mathrm {dist}(p)) \sim 1$ for $p \in K$). Arguing as in the proof of part 1 of [Reference Streck7, Lemma 4.1], one also sees that if $p=\Gamma g_p \in C_i$ and $g_p$ is such that $Y_i^0(g_p)=y_i^0(p)$, then for any $\gamma \in \Gamma $, either $Y_i^0(\gamma g_p) \ll 1$ or $Y_i^0(\gamma g_p)=Y_i^0(g_p)$ (which is the case in which ${\sigma _i \gamma g_p=h(n) \sigma _i g_p}$ and $\gamma =(\gamma _i)^n$ for some n). This implies in particular that there is an absolute constant $C=C(\Gamma )$ such that if $g_p$ is such that $Y_i^0(g_p) \geq C$, then
$$ \begin{align*} Y_i^0(g_p) \sim y_i^0(p) \sim\exp(\mathrm{dist}(p)), \end{align*} $$
where the second equivalence holds because $y_i^0(p) \geq C$ implies that $p \in C_i$ for C sufficiently big.
From the expression above, the reader sees the relation to the equidistribution parameters
 $$ \begin{align*} r(q, K):=K \exp(-\mathrm{dist}(g_{\log K}(q))) \end{align*} $$
appearing in Theorems 1.1 and 1.2.
Observation 2.1. There is an absolute $c_0=c_0(\Gamma )>0$ such that for any T and any p, if there is a representative $g_p$ of p and an i such that for $\sigma _i g_p=:(\begin {smallmatrix} a & b \\ c & d \end {smallmatrix})$, $\max (T^2 c^2, d^2) \leq c_0 T$, then $r(p, T) \sim \max (T^2 c^2, d^2)$.
Proof. We have that
$$ \begin{align*} 2 \max(T^2 c^2, d^2) \geq T (c^2T+d^2T^{-1})=Y_i^0(g_{\log T}(g_p))^{-1} T. \end{align*} $$
Thus, $Y_i^0(g_{\log T}(g_p)) \geq \tfrac 12 c_0^{-1}$, which shows that
$$ \begin{align*} \exp(\mathrm{dist}(g_{\log T}(p))) \sim Y_i^0(g_{\log T}(g_p)) \end{align*} $$
by the argument above, provided that $c_0$ is sufficiently small. Hence, $r(p, T)=T \exp(-\mathrm{dist}(g_{\log T}(p))) \sim T Y_i^0(g_{\log T}(g_p))^{-1}=c^2T^2+d^2 \sim \max(T^2 c^2, d^2)$.
3 Proof of Theorem 1.1
 We start by approximating $t^{1+\gamma }$ with sparse arithmetic sequences. More precisely, we write
$$ \begin{align*} t^{1+\gamma}=T_0^{1+\gamma}+(1+\gamma) T_0^\gamma (t-T_0)+O(T^{-{1}/{6}}) \end{align*} $$
on $[T_0, T_0+T^{1/3}]$ for $T_0 \geq T^{5/6}$ using Taylor expansion.
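The Taylor estimate behind this expansion can be sketched as follows: for $t \in [T_0, T_0+T^{1/3}]$ with $T_0 \geq T^{5/6}$, the second-order remainder satisfies
$$ \begin{align*} |t^{1+\gamma}-T_0^{1+\gamma}-(1+\gamma) T_0^\gamma (t-T_0)| \leq \gamma(1+\gamma) \sup_{u \in [T_0, t]} u^{\gamma-1} (t-T_0)^2 \ll T^{\gamma} T^{-{5}/{6}} T^{{2}/{3}}=T^{\gamma-{1}/{6}}, \end{align*} $$
which is still a negative power of T for the small values of $\gamma $ considered here.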
 We will split into several cases. To determine which case we are in, we fix some $\varepsilon>0$ and impose that $\gamma < ({\varepsilon \beta }/{6})$. We will see at the end which value of $\varepsilon $ makes everything work (which will turn out to be $\varepsilon ={1}/{100})$.
To apply the results about sparse equidistribution, we are thus tasked with evaluating expressions of the form
$$ \begin{align*} \bigg| \frac{1}{K} \sum_{n \leq K} f(q h((1+\gamma) T_0^\gamma n )) -\int f \; d\mu_X \bigg| \end{align*} $$
for $q=ph(T_0^{1+\gamma })$ and $T^{1/6} \leq K \leq T^{1/3}$, given some $T_0 \leq T$. In the case where $r(q, K) \geq T^\varepsilon $, Theorem 1.2 is enough to deduce good equidistribution.
 If $r(q, K) \leq T^\varepsilon $, then $g_{\log K}(q)$ must lie in the neighbourhood $C_i$ of some cusp $r_i$, as explained in the previous section. In this case, there is an (essentially unique) representative $g_q$ of q such that $r(q, K) \sim \max (K^2 c^2, d^2)$, where we set
$$ \begin{align*} \begin{pmatrix} a & b \\ c & d \end{pmatrix}:=\sigma_i g_q, \end{align*} $$
now and for the next couple of pages.
One then has to split into two more cases. The distinction between these cases is governed by
$$ \begin{align*} W_q:=\bigg| \frac{d}{c} \bigg|. \end{align*} $$
The relevance of this $W_q$ is that it measures the time it takes until one gets from bad to good equidistribution again. More precisely, by Observation 2.1,
$$ \begin{align} r(q, K) \sim \begin{cases} d^2,& K \leq W_q, \\ d^2 \dfrac{K^2}{W_q^2},& K \geq W_q, \end{cases} \end{align} $$
as long as $r(q, K) \leq c_0 K$.
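Both cases of this asymptotic follow directly from the relation $r(q, K) \sim \max (K^2 c^2, d^2)$ of Observation 2.1 together with the definition of $W_q$; spelled out:
$$ \begin{align*} K \leq W_q=\bigg|\frac{d}{c}\bigg| \implies K^2 c^2 \leq d^2 \implies r(q, K) \sim d^2; \qquad K \geq W_q \implies r(q, K) \sim K^2 c^2=d^2 \frac{K^2}{W_q^2}. \end{align*} $$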
 This means that even if q and K are such that $r(q, K) \leq T^{\varepsilon }$, one has that $r(q, T^\varepsilon W_q) \geq T^{2\varepsilon }$. Together with Theorem 1.2, this will be good enough to show effective equidistribution under all assumptions except for those of Proposition 3.1. Under those assumptions, which encompass the most interesting case, almost the entire horocycle orbit $\{ph(t), t \leq T^{1+\gamma }\}$ is close to periodic horocycle orbits of small period. In this case, one will need Lemma 1.3 to conclude.
Proposition 3.1. Let $\Gamma $ and $\gamma <c$ be as in Theorem 1.1, and let $\varepsilon ={1}/{100}$. Let $p \in X$ and T be such that $r(p, T^{1+\gamma }) \leq T^{4\varepsilon }$ and $W_p \geq T^{1-\varepsilon }$. Then, for f as in Theorem 1.1,
$$ \begin{align*} \bigg|\frac{1}{T} \sum_{n \leq T} f(ph(n^{1+\gamma})) - \int f \; d\mu_X \bigg| \ll r^{-{\beta}/{4}}. \end{align*} $$
To prove Theorem 1.1, we will first show how one can reduce its proof to Proposition 3.1 using Observation 2.1 and Theorem 1.2. We will then prove Proposition 3.1.
Proof of Theorem 1.1 assuming Proposition 3.1
 Say we are given some $t_0$ and set ${q=ph(t_0^{1+\gamma })}$. If $r:=r(q, T^{1/6}) \geq T^{\varepsilon }$, then we know by Theorem 1.2 that for any f with $\Vert f \Vert \leq 1$,
$$ \begin{align*} \bigg| \frac{1}{T^{{1}/{6}}} \sum_{n \leq T^{{1}/{6}}} f(q h((1+\gamma) t_0^\gamma n )) -\int f \; d\mu_X \bigg| \ll T^{{\gamma}/{2}} r^{-{\beta}/{2}} \leq r^{-{\beta}/{4}}, \end{align*} $$
where we recall $\gamma \leq ({\varepsilon \beta }/{6})$. We are thus done unless there is a q such that ${r=r(q, T^{1/6}) \leq T^{\varepsilon }}$. As we saw in §2, with c and d as defined previously,
$$ \begin{align} r \sim \max(T^{{2}/{6}} c^2, d^2). \end{align} $$
If $c^2T^{2/6}$ attains the maximum in (3.2), or equivalently, if $W_q \leq T^{1/6}$, then $r(q, T^{1/4}) \sim T^{1/6} r \geq T^{1/6}$ by (3.1) and we are done by Theorem 1.2. We can thus assume $W_q \geq T^{1/6}$. The following claim shows how one can improve the lower bound on $W_q$ further.
Claim 3.2. Let $q=p h(t_0^{1+\gamma })$ be such that $r \leq T^\varepsilon $. Set $W:=W_q$. If $W \leq T^{1-\varepsilon }$, then for $K=W^{1+\varepsilon }$ and for f with $\Vert f \Vert \leq 1$,
$$ \begin{align*} \bigg| \frac{1}{K} \sum_{0 \leq n \leq K} f(p h((t_0+n)^{1+\gamma})) -\int f \; d\mu_X \bigg| \ll r^{-{\beta}/{4}}. \end{align*} $$
Proof of Claim 3.2
Fix some $W^{1+\varepsilon } \geq s \geq W^{1+{\varepsilon }/{2}}$ and note that then $c^2s^2 \sim W^{-2} s^2 d^2 \gg d^2$. Thus,
$$ \begin{align*} r(qh(s), T^{{1}/{3}}) &\sim \max (T^{{2}/{6}} c^2, (d+cs)^2 ) \sim \max (T^{{2}/{6}} c^2, c^2 s^2 )\\ &=c^2 s^2 \sim \bigg(\frac{s}{W}\bigg)^2 r \geq r W^\varepsilon \geq r T^{{\varepsilon}/{6}}, \end{align*} $$
where the first equivalence is due to Observation 2.1, which is applicable because $ ({s}/{W})^2 r \ll T^{3\varepsilon }$. Applying Theorem 1.2 shows that
$$ \begin{align*} &\bigg| \frac{1}{T^{{1}/{3}}} \sum_{n \leq T^{{1}/{3}}} f(p h(t_0^{1+\gamma}+s) h((1+\gamma) (t_0+s)^\gamma n )) -\int f \; d\mu_X \bigg| \\ &\quad\ll T^{{\gamma}/{2}} T^{-{\varepsilon \beta}/{12}} r^{-{\beta}/{2}}\leq r^{-{\beta}/{2}}. \end{align*} $$
Now, we use Taylor approximation as above to split the orbit of $(t_0+n)^{1+\gamma }$ with $n \leq K$ into different ranges $[s, s+T^{1/3}]$ and note that for all but a $W^{-{\varepsilon }/{2}} T^{\gamma }$ proportion of s, one has $(t_0+s)^{1+\gamma }-t_0^{1+\gamma } \geq W^{1+{\varepsilon }/{2}}$. As $W^{-{\varepsilon }/{2}} T^{\gamma } \leq T^{-{\varepsilon }/{12}} \leq r^{-{\beta }/{4}}$, the claim is shown.
We have thus shown the conclusion of Theorem 1.1 unless there is a $q=p h(t_0^{1+\gamma })$ such that $r(q, T^{1/6}) \leq T^\varepsilon $ and $W_q \geq T^{1-\varepsilon }$. We let c and d be as defined above and note that in the case considered, $r(q, T^{1/6}) \sim \max (c^2 T^{1/3}, d^2)=d^2$ by definition of $W_q$. By (3.1), this implies that
$$ \begin{align*} r(q, T^{1+\gamma}) \ll d^2 \frac{T^{2(1+\gamma)}}{W^2_q}\ll T^{4\varepsilon}. \end{align*} $$
Lastly, to get an error term in $r(p, T^{1+\gamma })$ instead of $r(q, T^{1+\gamma })$, we note that $g_q h(-t_0^{1+\gamma })$ is a representative of p and that because $(d-ct_0^{1+\gamma })^2 \ll d^2 T^{2(1+\gamma )} W^{-2} \ll T^{4\varepsilon }$,
$$ \begin{align*} r(p, T^{1+\gamma}) \sim \max(c^2 T^{2(1+\gamma)}, (d-ct_0^{1+\gamma})^2 ) \ll T^ {4 \varepsilon}, \end{align*} $$
where the first equivalence is due to Observation 2.1. We have thus reduced the proof of Theorem 1.1 to the assumptions of Proposition 3.1.
It now only remains to show Proposition 3.1, which is the main part of the proof of Theorem 1.1.
Proof of Proposition 3.1
Let p and T be given such that $r:=r(p, T^{1+\gamma }) \leq T^{4\varepsilon }$ and $W:=W_p \geq T^{1-\varepsilon }$. Here, $W=| {d}/{c}|$, with c and d as defined in Observation 2.1. We also let $g:=g_p$ and $\sigma _i$ be as in Observation 2.1. We invoke Lemma 1.3 to split the orbit $[0, T^{1+\gamma }]$ into pieces of length $K=T^{1/3}$. As in [Reference Streck7, proof of Lemma 1.3 in Ch. 4], we now parametrize the orbit using the equation
$$ \begin{align*} \sigma_i g h(W+s)=lh(s)=h\bigg(\alpha-\frac{Rs}{s^2+1}\bigg)a\bigg(\frac{R}{s^2+1}\bigg)k(-\mathrm{arccot} \; s), \end{align*} $$
where $l:=\sigma _i g h(W)=:(\alpha +iR, -i)$ is the highest point of the horocycle orbit and
$$ \begin{align*} k(\theta)=\begin{pmatrix} \cos(\theta) & \sin(\theta) \\ -\sin(\theta) & \cos(\theta) \end{pmatrix} \end{align*} $$
is the (subsequently unimportant) rotation component.
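This parametrization can be sanity-checked numerically. In the sketch below we realize a point at height R above $\alpha$ with downward-pointing direction as the matrix $h(\alpha)a(R)k(\pi/2)$; this realization of l is a convention-dependent assumption of ours, not taken from the text. Since $k(\theta)$ fixes i under the Möbius action, the right-hand side projects to the claimed point of the upper half-plane, which is what we verify.

```python
import math

# Hypothetical sanity check of the orbit parametrization; the realization
# l = h(alpha) a(R) k(pi/2) is an assumed convention for "(alpha + iR, -i)".
def h(x): return [[1.0, x], [0.0, 1.0]]
def a(y): return [[math.sqrt(y), 0.0], [0.0, 1.0 / math.sqrt(y)]]
def k(t): return [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]

def mul(m, n):  # 2x2 matrix product
    return [[sum(m[i][r] * n[r][j] for r in range(2)) for j in range(2)]
            for i in range(2)]

def act(m, z):  # Moebius action on the upper half-plane
    return (m[0][0] * z + m[0][1]) / (m[1][0] * z + m[1][1])

alpha, R, s = 0.7, 3.0, 2.5
l = mul(mul(h(alpha), a(R)), k(math.pi / 2))
# l h(s) should project to the point h(alpha - Rs/(s^2+1)) a(R/(s^2+1)) . i,
# because the k-factor fixes i:
z = act(mul(l, h(s)), 1j)
expected = alpha - R * s / (s**2 + 1) + 1j * R / (s**2 + 1)
assert abs(z - expected) < 1e-9
```

As s varies, the projected points trace the Euclidean circle of diameter R tangent to the real axis at $\alpha$, which is exactly the stable horocycle through $\alpha + iR$.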
Given an $M \leq T$, we then have that $p h(M^{1+\gamma }+t), t \leq T^{1/3}$ is at distance at most $O(T^{-1/6})$ from the orbit on a periodic horocycle $\xi h(t), t \leq T^{1/3}$ with its period being equal to $y^{-1}$, where
$$ \begin{align*} y:=\frac{R}{(M^{1+\gamma}-W)^2+1}. \end{align*} $$
By the second clause in Lemma 1.3, we can assume $r \gg y^{-1} \gg \delta ^2 r$ except on an interval of proportion $\delta $, where $\delta $ is to be chosen later. Using Taylor approximation on $t^{1+\gamma }$, we thus want to bound
$$ \begin{align*} \bigg| \frac{(1+\gamma) M^\gamma}{T^{{1}/{3}}}\sum_{(1+\gamma) M^\gamma n \leq T^{{1}/{3}}} f(\xi h((1+\gamma)M^\gamma n)) - \int f \,d\mu_X \bigg|. \end{align*} $$
However, we may run into problems here: if, for example, $y^{-1}=(1+\gamma )M^\gamma $, the points do not equidistribute at all in the periodic horocycle. To deal with this and related obstructions, we proceed similarly to [Reference Streck7, proof of Claim 5.2]. For notational convenience, we set $s:=(1+\gamma ) M^\gamma $. Let $ q \in \mathbb {N}$ with $y^{-1} \leq q \leq y s^{-1} T^{1/3}$ be such that
$$ \begin{align*} \bigg| s y - \frac{a}{q} \bigg| \leq \frac{y^{-1} s}{q T^{{1}/{3}}} \end{align*} $$
for some a coprime to q (such a q exists by the pigeonhole principle). The problem case occurs if q is small compared with $y^{-1}$. If, however, q is sufficiently big, there are so many distinct points in the interval $[0, y^{-1}]$ that they cannot help but be dense enough to approximate $\int _0^{1} f(\xi h(t y^{-1})) \,dt$ by force, as we show now.
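The pigeonhole step here is Dirichlet's approximation theorem: for any real $\alpha$ and integer $Q \geq 1$, there exist coprime a, q with $1 \leq q \leq Q$ and $|\alpha - a/q| \leq 1/(qQ)$. A brute-force sketch (the function name and test values are our own):

```python
from math import gcd, pi

def dirichlet_approx(alpha, Q):
    """Return coprime (a, q) with 1 <= q <= Q and |alpha - a/q| <= 1/(q*Q).
    Such a pair exists by the pigeonhole principle (Dirichlet's theorem)."""
    for q in range(1, Q + 1):
        a = round(alpha * q)  # nearest integer minimizes |alpha*q - a|
        if abs(alpha - a / q) <= 1.0 / (q * Q):
            g = gcd(abs(a), q) or 1  # reduce; the bound only improves
            return a // g, q // g
    raise AssertionError("unreachable: Dirichlet guarantees a solution")

# For alpha = pi and Q = 10, the classical approximation 22/7 is found.
a, q = dirichlet_approx(pi, 10)
assert (a, q) == (22, 7) and abs(pi - a / q) <= 1 / (q * 10)
```

In the proof above, $\alpha = sy$ and $Q \sim y s^{-1} T^{1/3}$, which gives exactly the displayed bound $y^{-1}s/(qT^{1/3})$.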
Claim 3.3. If $q \geq y^{-3}$, then
$$ \begin{align*} \bigg| \frac{s}{T^{{1}/{3}}}\sum_{s n \leq T^{{1}/{3}}} f(\xi h(s n)) -\int_0^{1} f(\xi h(t y^{-1})) \,dt \bigg| \ll y \ll \delta^{-2} r^{-1}, \end{align*} $$
where $q, s, y$ and $\xi $ all depend on M.
Proof of Claim 3.3
(The argument in the proof of this claim was suggested by Adrián Ubis.) We set $F(t):=f(\xi h(t y^{-1}))$, which is one-periodic. Because the function f is $1$-Lipschitz with respect to the hyperbolic metric, the function F is $y^{-1}$-Lipschitz. We wish to show
$$ \begin{align*} \bigg| \frac{s}{T^{{1}/{3}}} \sum_{sn \leq T^{{1}/{3}}} F(nsy)-\int_0^1 F(t) \,dt \bigg| \ll y. \end{align*} $$
For this, we note that since, for any n,
$$ \begin{align*} \bigg|sny-n\frac{a}{q} \bigg| \leq n \frac{y^{-1} s}{q T^{{1}/{3}}}, \end{align*} $$
we have
$$ \begin{align*} \frac{s}{T^{{1}/{3}}} \sum_{sn \leq T^{{1}/{3}}} F(nsy) &= O\bigg(\frac{y^{-2}}{q}\bigg)+\frac{s}{T^{{1}/{3}}} \sum_{sn \leq T^{{1}/{3}}} F\bigg(n \frac{a}{q}\bigg)\\ &=O\bigg(\frac{y^{-2}}{q}\bigg)+O\bigg(\frac{qs}{T^{{1}/{3}}} \bigg)+\frac{1}{q} \sum_{j=0}^{q-1} F\bigg(\frac{ja}{q}\bigg) \end{align*} $$
by the periodicity of F. As a is coprime to q, it does not play a role in the last average and can be dropped. Furthermore, for any $t \leq ({1}/{q})$,
$$ \begin{align*} F\bigg(\frac{j}{q}\bigg)=O\bigg(\frac{y^{-1}}{q}\bigg)+F\bigg(\frac{j}{q}+t\bigg), \end{align*} $$
so
$$ \begin{align*} \frac{1}{q} \sum_{j=0}^{q-1} F\bigg(\frac{j}{q}\bigg) &= O\bigg(\frac{y^{-1}}{q}\bigg)+\frac{1}{q} \sum_{j=0}^{q-1} \int_0^1 F\bigg(\frac{j+t}{q} \bigg) \,dt \\ &= O\bigg(\frac{y^{-1}}{q}\bigg)+\int_0^1 F(t) \,dt. \end{align*} $$
As both $y^{-2}q^{-1}$ and $qsT^{-1/3}$ are $O(y)$, this implies the claim.
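The mechanism in the last two displays is the standard Riemann-sum estimate: averaging an L-Lipschitz one-periodic function over the q equally spaced points $j/q$ approximates its integral to within $O(L/q)$. A small numerical illustration (the test function and the helper name are our own choices):

```python
import math

def riemann_gap(F, q, n_int=200000):
    """|average of F over {j/q : 0 <= j < q} - integral of F over [0, 1]|.
    For an L-Lipschitz one-periodic F this is at most L/q."""
    avg = sum(F(j / q) for j in range(q)) / q
    # midpoint-rule approximation of the integral on a much finer grid
    integral = sum(F((i + 0.5) / n_int) for i in range(n_int)) / n_int
    return abs(avg - integral)

# One-periodic test function with Lipschitz constant L = 2*pi.
F = lambda t: abs(math.sin(2 * math.pi * t))
for q in (5, 50, 500):
    assert riemann_gap(F, q) <= 2 * math.pi / q
```

In the proof, $L = y^{-1}$ and $q \geq y^{-3}$, so the error $L/q \leq y^{2}$ is dominated by the other $O(y)$ terms.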
By Strömbergsson’s result [Reference Strömbergsson8],
$$ \begin{align*} \bigg| y \int_0^{y^{-1}} f(\xi h(t)) \,dt - \int f \; d\mu_X \bigg| \ll y^\beta \ll (\delta^{-2} r^{-1})^\beta, \end{align*} $$
so we see from Claim 3.3 that
$$ \begin{align*} \bigg| \frac{(1+\gamma) M^\gamma}{T^{{1}/{3}}}\sum_{n \leq T^{{1}/{3}}} f(\xi h((1+\gamma)M^\gamma n)) - \int f \,d\mu_X \bigg| \ll (\delta^{2} r)^{-\beta} \end{align*} $$
unless there is a $q \leq y^{-3} \leq r^{3} $ and a coprime to q such that
$$ \begin{align*} \bigg| (1+\gamma)M^\gamma y - \frac{a}{q} \bigg| \ll M^{\gamma}y^{-1}T^{-{1}/{3}} \leq r T^{-{1}/{3}+\gamma}. \end{align*} $$
To conclude the proof of Theorem 1.1, we just have to show that this is a very exceptional occurrence. Fortunately, this is what one would expect: if we let
$$ \begin{align*} I_{q, a}:=\bigg\{v \in \mathbb{R}: \bigg| v - \frac{a}{q} \bigg| \leq r T^{-{1}/{3}+\gamma} \bigg\} \end{align*} $$
denote the problem intervals for $q \leq r^3$ and $(a, q)=1$, we note that their length is proportional to $r T^{-1/3+\gamma }$. Moreover, given distinct intervals $ I_{q_1, a_1}, I_{q_2, a_2}$, the gap between them is at least of order $r^{-6}$, as
$$ \begin{align*} \bigg|\frac{a_1}{q_1}-\frac{a_2}{q_2}\bigg|\geq \frac{1}{q_1 q_2} \geq r^{-6}. \end{align*} $$
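The separation used here is the elementary fact that two distinct reduced fractions with denominators at most N differ by at least $1/N^2$, since the numerator of their difference is a nonzero integer. A quick exhaustive check over a small hypothetical N:

```python
from fractions import Fraction
from itertools import combinations

N = 12
# All reduced fractions a/q in [0, 1] with q <= N (the Farey sequence F_N);
# Fraction reduces to lowest terms and the set removes duplicates.
farey = sorted({Fraction(a, q) for q in range(1, N + 1) for a in range(q + 1)})
for x, y in combinations(farey, 2):
    # |a1/q1 - a2/q2| = |a1*q2 - a2*q1| / (q1*q2) >= 1/(q1*q2) >= 1/N^2
    assert abs(x - y) >= Fraction(1, N * N)
```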
As $r \ll T^{4\varepsilon }$, this means that the set $E:=\bigcup _{q \leq r^3, (a, q)=1} I_{q, a}$ makes up only a tiny proportion of the entire range. Unless the function
$$ \begin{align*} G(t):= \frac{t^\gamma R}{(t^{1+\gamma}-W)^2+1}=t^\gamma y \end{align*} $$
is highly concentrated on a small part of its range, our problem case $\{t \leq T: (1+\gamma ) G(t) \in E \}$ will thus only occur on a negligible proportion of $[0,T]$. The following claim shows that G does not behave in this unusual manner.
Claim 3.4. For all but an $O(\delta + \delta ^{-5} r^7 T^{-1/3+\gamma })$ proportion of $t \leq T$, there do not exist $q \leq r^3$ and a coprime to q such that
$$ \begin{align*} \bigg|(1+\gamma) G(t)-\frac{a}{q}\bigg| \leq r T^{-{1}/{3}+\gamma}. \end{align*} $$
Before we show the claim, we show how it implies Proposition 3.1. The claim implies that at most a small proportion of the intervals we split $[0,T]$ into when applying Taylor approximation will be bad; for the others, we know equidistribution from Claim 3.3. Collecting all the different error terms together,
$$ \begin{align*} \bigg|\frac{1}{T} \sum_{n \leq T} f(ph(n^{1+\gamma})) - \int f \; d\mu_X \bigg| \ll \delta + \delta^{-5} r^7 T^{-{1}/{3}+\gamma} + (\delta^{-2} r^{-1})^\beta, \end{align*} $$
where the error terms come from, in that order: Lemma 1.3 and Claim 3.4; the contribution of the problem intervals $I_{q, a}$ on which the sequence $\xi h((1+\gamma ) M^\gamma n) $ does not equidistribute in the periodic horocycle; and the comparison with $\int f \; d\mu _X$ on the good intervals. Setting $\delta =r^{-{1}/{10}}$ takes care of the first and third terms, while, recalling that $r \ll T^{4\varepsilon }$, we can control the second term by setting $\varepsilon ={1}/{100}$. This concludes the proof of Proposition 3.1 (and thus also the proof of Theorem 1.1) with only Claim 3.4 left to be shown.
Proof of Claim 3.4
To show this claim, we use the following simple lemma, whose proof is left to the reader as an exercise.
Lemma 3.5. Let $I \subset \mathbb {R}$ be an open interval and let $G \colon I \to \mathbb {R}$ be continuously differentiable such that $0<c \leq |G^\prime (t)| \leq C$ for all $t \in I$. Let $\theta>0$ and let $a_1<b_1<a_2<\cdots <a_{n-1}<b_{n-1}<a_n$ be real numbers with the property that $b_i-a_i \leq \theta (a_{i+1}-b_i)$ for all $1 \leq i<n$. Then, for $E:=(a_1, b_1) \cup \cdots \cup (a_{n-1}, b_{n-1})$,
$$ \begin{align*} |\{t \in I: G(t) \in E \}| \leq 2 \theta C c^{-1} |I| \end{align*} $$
provided that $|I| \geq \theta C c^{-1}$.
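As a plausibility check of Lemma 3.5 (not a proof), one can estimate the measure of the preimage on a fine grid for a sample G and a union of short, well-separated intervals; the particular G, θ and grid size below are our own choices, chosen so that the hypotheses hold.

```python
import math

# Sample data: G with 0.7 <= |G'| <= 1.3 on I = (0, 10), and intervals
# (i, i + width) of width theta/(1+theta) separated by gaps 1/(1+theta),
# so that b_i - a_i = theta * (a_{i+1} - b_i) holds exactly.
theta, c, C = 0.1, 0.7, 1.3
G = lambda t: t + 0.3 * math.sin(t)      # G'(t) = 1 + 0.3 cos(t)
width = theta / (1 + theta)
in_E = lambda v: 0 < (v % 1) < width     # v lies in some interval (i, i+width)

n = 1_000_000
# grid estimate of |{t in (0, 10): G(t) in E}|
measure = sum(in_E(G(10 * (i + 0.5) / n)) for i in range(n)) * 10 / n
assert measure <= 2 * theta * C / c * 10  # the bound of Lemma 3.5
```

Heuristically, each interval of E pulls back to a set of measure about $\text{width}/|G'|$, and the gaps between the intervals are $\theta^{-1}$ times longer, which is where the factor $\theta$ in the bound comes from.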
To apply this to the function
$$ \begin{align*} G(t)= \frac{t^\gamma R}{(t^{1+\gamma}-W)^2+1}=t^\gamma y \end{align*} $$
in which we are interested, we need to calculate its derivative. We see that
$$ \begin{align*} \frac{dy}{dt}(t)=-\frac{2(1+\gamma) t^\gamma (t^{1+\gamma} - W)R}{((t^{1+\gamma}-W)^2+1)^2}=-\frac{2y(1+\gamma) t^\gamma (t^{1+\gamma} - W)}{(t^{1+\gamma}-W)^2+1} \end{align*} $$
and thus
$$ \begin{align*} G^\prime(t)= y t^{\gamma-1} \bigg(\gamma - \frac{2(1+\gamma) t^{1+\gamma} (t^{1+\gamma} -W) }{(t^{1+\gamma}-W)^2+1} \bigg). \end{align*} $$
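This derivative computation can be double-checked numerically by comparing the stated closed form against a central finite difference; the parameter values below are arbitrary sample choices of ours.

```python
# Numerical check of the formula for G'(t) at arbitrary sample values.
R, W, gamma = 2.0, 5.0, 0.03

def G(t):
    return t**gamma * R / ((t**(1 + gamma) - W)**2 + 1)

def G_prime(t):
    """The claimed closed form G'(t) = y t^(gamma-1) (gamma - ...)."""
    D = (t**(1 + gamma) - W)**2 + 1
    y = R / D
    return y * t**(gamma - 1) * (
        gamma - 2 * (1 + gamma) * t**(1 + gamma) * (t**(1 + gamma) - W) / D)

for t in (0.5, 1.7, 4.0, 9.0):
    step = 1e-6
    numeric = (G(t + step) - G(t - step)) / (2 * step)  # central difference
    assert abs(numeric - G_prime(t)) < 1e-5 * max(1.0, abs(G_prime(t)))
```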
We recall that in Lemma 1.3, we exclude an interval $J_0$ of proportion $\delta $ to assure ${r^{-1} \ll y \ll \delta ^{-2} r^{-1}}$. We also exclude a set $J_1$ comprising two intervals of proportion $\delta $ to assure $t \geq \delta T$ and $|{W}/{t^{1+\gamma }}-1| \geq \delta $. This assures that $r^{-1} T^{\gamma -1} \ll y t^{\gamma -1} \ll \delta ^{-3} r^{-1} T^{\gamma -1}$ on the range $[0,T] \backslash (J_0 \cup J_1)$. If we can bound the expression in the bracket in a similar manner up to factors of powers of $\delta ^{-1}$, the claim will follow from Lemma 3.5.
To do this, we note that for $t \in [0,T] \backslash J_1$,
$$ \begin{align*} \bigg| \frac{1}{(t^{1+\gamma}-W)^2+1}-\frac{1}{(t^{1+\gamma}-W)^2} \bigg|=O(\delta^{-4} T^{-4(1+\gamma)}), \end{align*} $$
which implies
$$ \begin{align*} G^\prime(t)=y t^{\gamma-1} \bigg(\gamma + \frac{2(1+\gamma)}{{W}/{t^{1+\gamma}}-1} + O(\delta^{-4} T^{-2}) \bigg). \end{align*} $$
We set $J_2:=\{t: |{W}/{t^{1+\gamma }}-(1-{2(1+\gamma )}/{\gamma })| \leq \delta \}$, which is the interval of proportion $\delta $ on which the second term roughly cancels out the first. We then have that
$$ \begin{align*} \delta \ll \gamma \bigg|\frac{W}{t^{1+\gamma}}-1\bigg|^{-1} \bigg|\frac{W}{t^{1+\gamma}}-1+\frac{2(1+\gamma)}{\gamma}\bigg|=\bigg|\gamma + \frac{2(1+\gamma)}{{W}/{t^{1+\gamma}}-1} \bigg| \ll \delta^{-1} \end{align*} $$
on $[0,T] \backslash (J_1 \cup J_2)$, which implies that
$$ \begin{align*} \delta r^{-1} T^{\gamma-1} \ll |G^\prime(t)| \ll \delta^{-4} r^{-1} T^{\gamma-1} \end{align*} $$
on $[0,T] \backslash (J_0 \cup J_1 \cup J_2)$. We can now apply Lemma 3.5 to each of the intervals left. Recalling that each problem interval $I_{q, a}$ is of length $r T^{-1/3+\gamma }$ and the gap between any two successive intervals is of size at least $0.9 r^{-6}$, we find that
$$ \begin{align*} \frac{1}{T}| \{t \in [0,T] \backslash (J_0 \cup J_1 \cup J_2): (1+\gamma)G(t) \in E \}| \ll \delta^{-5} r^7 T^{-{1}/{3}+\gamma}, \end{align*} $$
where, as before, $E=\bigcup _{q \leq r^3, (a, q)=1} I_{q, a}$. This shows Claim 3.4, which was the last missing piece in the proof of Theorem 1.1.
Acknowledgements
This is a follow-up paper to [Reference Streck7], which is based on the master’s thesis I did at the Hebrew University of Jerusalem in 2020. As such, I am thankful for the support of my thesis advisor Tamar Ziegler and of Elon Lindenstrauss, who also suggested that the result in the present paper should be achievable with the ideas in [Reference Streck7]. I thank my PhD supervisor Péter Varjú for giving me the freedom to finish the work on these two papers while doing my PhD with him. Above all, I am grateful to Adrián Ubis, who suggested the argument used in the proof of Claim 3.3 in his review of the previous paper, simplifying the proof in [Reference Streck7] considerably. Without getting this new perspective on the material two years later, I would not even have thought of revisiting the problem solved in this paper. The author received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. 803711).