1. Introduction
Compressed sensing is a modern data acquisition technique at the intersection of mathematics, electrical engineering, computer science, and physics, and it has grown tremendously in recent years. Mathematically, we model an unknown signal as a vector
$\boldsymbol{{x}}\in \mathbb{R}^d$
, and we have access to linear measurements: that is, for any vector
$\boldsymbol{{a}}\in \mathbb{R}^d$
, we have access to
$\boldsymbol{{a}}\cdot \boldsymbol{{x}}=\sum _{i=1}^da_ix_i$
. In particular, if
$\boldsymbol{{a}}^{(1)},\ldots,\boldsymbol{{a}}^{(n)}\in \mathbb{R}^d$
are the measurements we make, then we have access to the vector
$\boldsymbol{{y}}=A\boldsymbol{{x}}\in \mathbb{R}^n$
, where
$A$
is the
$n\times d$
matrix whose
$i$
th row is
$\boldsymbol{{a}}^{(i)}$
.
The tasks of compressed sensing are:
to recover
$\boldsymbol{{x}}$
as accurately as possible, and
to do so in an efficient way. In practice, one would like to recover a high-dimensional signal (that is,
$d$
is large) from as few measurements as possible (that is,
$n$
is small). In this regime, for an arbitrary vector
$\boldsymbol{{x}}\in \mathbb{R}^d$
the problem is ill-posed: for any given
$\boldsymbol{{y}}$
, the solution set of
$A\boldsymbol{{x}}=\boldsymbol{{y}}$
, if it is nonempty, forms a translate of a linear subspace of dimension at least
$d-n$
, and therefore there is no way to uniquely recover the original
$\boldsymbol{{x}}$
.
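A key quantity for guaranteeing the success of (unique) recovery is the sparsity of the vector
$\boldsymbol{{x}}$
: we say that a vector is
$s$
-sparse if its support is of size at most
$s$
. That is, if
\begin{equation*} |\operatorname{supp}\!(\boldsymbol{{x}})|\;:\!=\;|\{i\in [d]\;:\;x_i\neq 0\}|\leq s. \end{equation*}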
A neat observation is that, for an
$n\times d$
matrix
$A$
, having at most one
$s$
-sparse solution to
$A\boldsymbol{{x}}=\boldsymbol{{y}}$
for every
$\boldsymbol{{y}}$
is equivalent to saying that
$A$
is
$2s$
-robust (that is, every
$2s$
columns of
$A$
are linearly independent). Indeed, if we have two
$s$
-sparse vectors
$\boldsymbol{{x}}\neq \boldsymbol{{y}}$
such that
$A\boldsymbol{{x}}=A\boldsymbol{{y}}$
, then
$\boldsymbol{{x}}-\boldsymbol{{y}}$
is a nonzero
$2s$
-sparse vector in the kernel of
$A$
. For the other direction, if there is a nonzero
$2s$
-sparse vector in the kernel of
$A$
, one can split its support into two disjoint sets of size at most
$s$
each and consider the vectors restricted to these sets, one of which is multiplied by
$-1$
.
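To make the notion of robustness concrete, the following minimal Python sketch checks whether every $s$ columns of a small matrix are linearly independent, by brute force over all $s$-subsets of columns and with exact rational arithmetic. The helper names `column_rank` and `is_robust` are illustrative and do not come from the paper; the running time is exponential in the number of columns, so this is only meant for toy examples.

```python
from fractions import Fraction
from itertools import combinations


def column_rank(vectors):
    """Rank over the rationals of a list of equal-length integer vectors."""
    m = [[Fraction(x) for x in v] for v in vectors]
    if not m:
        return 0
    rank = 0
    for j in range(len(m[0])):
        # Find a row at or below position `rank` with a nonzero entry in coordinate j.
        pivot = next((i for i in range(rank, len(m)) if m[i][j] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(rank + 1, len(m)):
            if m[i][j] != 0:
                factor = m[i][j] / m[rank][j]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank


def is_robust(A, s):
    """Return True if every s columns of A (a list of rows) are linearly independent."""
    n, d = len(A), len(A[0])
    columns = [[A[i][j] for i in range(n)] for j in range(d)]
    return all(column_rank(list(c)) == s for c in combinations(columns, s))
```

For instance, a matrix with two equal columns is $1$-robust but not $2$-robust, so by the observation above it cannot distinguish all pairs of $1$-sparse signals.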
If we take
$A$
to be a random Gaussian matrix
(or any other matrix drawn from some ‘nice’ continuous distribution), then we clearly have that with probability one
$A$
is
$s$
-robust for
$s=n$
and any
$d\in \mathbb{N}$
(and in particular, one can uniquely recover
$\lfloor n/2\rfloor$
-sparse vectors). Moreover, in their seminal work, Candes and Tao [Reference Candes and Tao3] showed that it is possible to efficiently reconstruct
with very high accuracy by solving a simple linear programme if we take
In this paper, we are interested in the compressed sensing problem with integer-valued measurement matrices whose entries have magnitude at most
$k$
. Integer-valued measurement matrices have found applications in measuring gene regulatory expressions, wireless communications, and natural images [Reference Abdi, Fekri and Zhang1, Reference Eldar, Haimovich and Rossi4, Reference Haseyama, He and Ogawa12], and they are quick to generate and easy to store in practice [Reference Iwen13, Reference Jiang, Liu, Xia and Zheng14]. Under this setting, for an integer-valued signal
$\boldsymbol{{x}}\in \mathbb{Z}^d$
, we can have exact recovery even if we allow some noise
$\boldsymbol{{e}}$
with
$\|\boldsymbol{{e}}\|_{\infty }\lt 1/2$
(for more details, see [Reference Fukshansky, Needell and Sudakov10]).
The first step is to understand when the compressed sensing problem is well-posed for given
$s,n, k$
, and
$d$
. Namely, for which values of
$d$
does an
$s$
-robust
$n\times d$
integer-valued matrix with entries in
$\{-k,\ldots, k\}$
exist? For
$s=n$
, observe that if
$d\geq (2k+1)^2n$
, then by the pigeonhole principle, one can find
$n$
columns that agree in their first two coordinates; the first two rows of the corresponding
$n\times n$
submatrix are then proportional, and therefore these columns are not linearly independent. In particular, we have
$d=O(k^2n)$
. In [Reference Fukshansky, Needell and Sudakov10], Fukshansky, Needell, and Sudakov showed that there exists an
$d=\Omega (\sqrt{k} n)$
, using the result of Bourgain, Vu and Wood [Reference Bourgain, Vu and Wood2] on the singularity of discrete random matrices (in fact, the more recent result by Tikhomirov [Reference Tikhomirov17] gives a better bound for
). Konyagin and Sudakov [Reference Konyagin and Sudakov15] improved the upper bound to
$d=O(k\sqrt{\log k}n)$
, and they gave a deterministic construction with
$d\geq \frac{1}{2} k^{n/(n-1)}\gt n$
. For
$1\leq s\leq n-1$
, Fukshansky and Hsu [Reference Fukshansky and Hsu9] gave a deterministic construction such that
$d\geq \left (\frac{n+2}{2}\right )^{1+\frac{2}{3s-2}}$
. When
$s=o(\!\log n)$
, this implies we can take
$d=\omega (n)$
. This result hints that if we allow
$s$
to be ‘separated away’ from
$n$
, then one could take
$d$
to be ‘very large’. A natural and nontrivial step towards understanding the
$s$
-robustness property of matrices is to investigate the typical behaviour. For convenience, we will focus on the case
$k=1$
(even though our argument can be generalized to all fixed
$k$
), and we define, for all
$n,d\in \mathbb{N}$
, the random variable
$M_{n,d}$
, which corresponds to an
$n\times d$
matrix with independent entries chosen uniformly from
$\{\pm 1\}$
. For
$1\leq s\leq n$
, we would like to investigate the threshold behaviour of
$M_{n,d}$
with respect to being
$s$
-robust. That is, we wish to find some
such that

It is trivial to show (deterministically) that if an
$n\times d$
matrix with
$\pm 1$
entries is
$n$
-robust, then
$d\leq 2n$
. What if we allow
$s$
to be ‘separated away’ from
$n$
? That is, what if
$s=(1-\delta )n$
for some
$0\lt \delta \lt 1$
? It is not hard to show (and it follows from the proof of Lemma 3.3) that the probability for a random
$n\times n$
matrix to have rank at least
$(1-\delta )n$
is at least
$1-2^{-\Omega (\delta ^2n^2)}$
. Therefore, one could think that a typical
might be
$(1-\delta )n$
-robust for some
. This turns out to be wrong as we show in the following simple theorem:
Theorem 1.1.
For any fixed
$0\lt \delta \lt 1$
there exists
$C\gt 0$
such that for sufficiently large
$n\in \mathbb{N}$
the following holds. If
$s= (1-\delta )n$
and
$d\ge Cn^{1+1/(1-\delta )}$
, then no
$n\times d$
matrix with
$\pm 1$
entries is
$s$
-robust.
Proof. Given any
$s/2$
-subset of column vectors
$\boldsymbol{{v}}_1,\ldots,\boldsymbol{{v}}_{s/2}\in \{\pm 1\}^n$
, by Spencer’s ‘six standard deviations suffice’ [Reference Spencer16], there exist some
$x_1,\ldots,x_{s/2} \in \{\pm 1\}$
for which
$\|\sum _{i=1}^{s/2}x_i\boldsymbol{{v}}_i\|_{\infty }\leq C'\sqrt{n}$
for a universal constant
$C'\gt 0$
(a simple Chernoff bound suffices if one is willing to lose a
$\sqrt{\log n}$
factor). Fix such a combination
$\sum _{i=1}^{s/2}x_i\boldsymbol{{v}}_i$
for each
$s/2$
-subset of column vectors. Since there are at most
$\left (3C'\sqrt{n}\right )^{n}$
integer-valued vectors in the box
$[-C'\sqrt{n},C'\sqrt{n}]^n$
, and since
\begin{equation*} \binom{d}{s/2}\gt \left (3C'\sqrt{n}\right )^{n} \end{equation*}
by the pigeonhole principle, as long as
$C$
is large enough, there are two
$s/2$
-subsets whose corresponding combinations of column vectors are the same. Subtracting one of these combinations from the other leads to a nonzero
$s$
-sparse kernel vector of the matrix (it is nonzero since the index sets of the two
$s/2$
-subsets are not the same), proving the result.
In our main result, we determine the (typical) asymptotic behaviour up to a window of
$(\!\log n)^{\omega (1)}$
.
Theorem 1.2.
For any fixed
$0\lt \delta \lt 1$
, let
$n\in \mathbb{N}$
be sufficiently large, let
$s= (1-\delta )n$
, and let
$\varepsilon =\omega (\!\log \log n/\log n)$
. We have that:
(1) If
$d\leq n^{1+1/(2-2\delta )-\varepsilon }$ then with high probability
$M_{n,d}$ is
$s$ -robust.
(2) If
$d\geq n^{1+1/(2-2\delta )+\varepsilon }$ then with high probability
$M_{n,d}$ is not
$s$ -robust.
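To compare the two results numerically, the short Python sketch below evaluates the exponent of $n$ in the typical threshold of Theorem 1.2 and in the deterministic bound of Theorem 1.1. The function names are hypothetical and chosen only for this illustration; the sketch ignores the constant $C$ and the $n^{\pm \varepsilon }$ window.

```python
def typical_threshold_exponent(delta):
    """Exponent 1 + 1/(2 - 2*delta) of n in Theorem 1.2, for s = (1 - delta) * n."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 1 + 1 / (2 - 2 * delta)


def deterministic_bound_exponent(delta):
    """Exponent 1 + 1/(1 - delta) of n in Theorem 1.1, for s = (1 - delta) * n."""
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return 1 + 1 / (1 - delta)


if __name__ == "__main__":
    for delta in (0.1, 0.25, 0.5, 0.75, 0.9):
        print(delta, typical_threshold_exponent(delta), deterministic_bound_exponent(delta))
```

For every $\delta$ the typical exponent is smaller than the deterministic one, since $1/(2-2\delta )$ is exactly half of $1/(1-\delta )$.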
We believe that by optimizing our bounds/similar methods, one would be able to push the bounds in Theorem 1.2 up to a constant factor of
$n^{1+1/(2-2\delta )}$
(though we did not focus on this aspect). It would be interesting to obtain the
multiplicative threshold behaviour.
2. Proof outline
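We first outline the proof of Theorem 1.2. We will prove part (1) of Theorem 1.2 over
$\mathbb{F}_p$
for some prime
$p=e^{\omega (\!\log ^2 n)}$
to be chosen later (a stronger statement, since integer columns that are linearly independent over
$\mathbb{F}_p$
are also linearly independent over
$\mathbb{R}$
). Our strategy, at large, is to generate
\begin{equation*} M=\begin{pmatrix} M_1\\ M_2 \end{pmatrix}, \end{equation*}
where
$M_1$
consists of the first
$n_1$
rows and
$M_2$
of the remaining
$n-n_1$
rows, with
$n_1\approx n$
. The proof consists of the following two phases: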
(1) Phase 1: Given any nonzero vector
$\boldsymbol{{a}}\in \mathbb{F}_p^d$ , we let
\begin{equation} \rho _{\mathbb{F}_p}(\boldsymbol{{a}})=\max _{x\in \mathbb{F}_p}\mathbb{P}\left [\sum _{i=1}^da_i\xi _i=x\right ], \tag{2.1} \end{equation}
where the
$\xi _i$ s are i.i.d. Rademacher random variables. In this phase, we will show that:
(a)
$M_1$ is with high probability such that for all nonzero
$\boldsymbol{{a}}\in \mathbb{F}_p^d$ , if
$|\textrm{supp}\,{\boldsymbol{{a}}}|\leq s\;:\!=\;(1-\delta )n$ and
$M_1\boldsymbol{{a}}=\textbf{0}$ , then
$\rho _{\mathbb{F}_p}(\boldsymbol{{a}})=e^{-\omega (\!\log ^2 n)}$ , and
(b)
$M_1$ is with high probability such that every
$s$ -subset of its columns has rank
$s-o(s)$ .
(2) Phase 2: Conditioned on the above properties, we will use the extra randomness of
$M_2$ to show that for a specific set of
$s$ columns, after exposing
$M_2$ , the probability that it does not have full rank is
$o\left (1/\binom{d}{s}\right )$ , and hence a simple union bound will give us the desired result.
In this strategy, it turns out that Phase 1(a) is the limiting factor, that is, ruling out structured kernel vectors.
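For small examples, the quantity in (2.1) can be computed exactly. The following Python sketch does so by propagating the distribution of the partial sums modulo $p$; the function name `rho_fp` is an illustrative choice, and the method is only practical when $p$ or the number of entries is small, since it may track up to $\min (p,2^d)$ residues.

```python
from fractions import Fraction


def rho_fp(a, p):
    """Exact value of max_x P[sum_i a_i * xi_i = x (mod p)] for i.i.d. Rademacher xi_i."""
    half = Fraction(1, 2)
    dist = {0: Fraction(1)}  # distribution of the empty sum
    for coeff in a:
        c = coeff % p
        new_dist = {}
        for value, prob in dist.items():
            # xi_i = +1 or -1, each with probability 1/2
            for shifted in ((value + c) % p, (value - c) % p):
                new_dist[shifted] = new_dist.get(shifted, 0) + prob * half
        dist = new_dist
    return max(dist.values())
```

A structured vector such as the all-ones vector has a large value of $\rho _{\mathbb{F}_p}$, whereas a vector with rapidly growing entries such as $(1,2,4)$ spreads its mass over many residues; the latter, unstructured behaviour is what Phase 1(a) establishes for every sparse kernel vector.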
For the proof of the upper bound in Theorem 1.2 (that is, part (2)), we exploit this observation. We show using the second-moment method that, with high probability, some
$2\lfloor (1-\delta )n/2\rfloor$
columns sum to the zero vector (corresponding to an all-
$1$
s, highly structured kernel vector).
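As a rough first-moment heuristic for where the threshold should lie, one can compute the expected number of $s$-subsets of columns of a random $\pm 1$ matrix that sum to zero: each of the $n$ coordinates vanishes independently with probability $\binom{s}{s/2}2^{-s}$. The Python sketch below evaluates the logarithm of this expectation; the function names are hypothetical, and the sketch is not the second-moment argument itself.

```python
import math


def log_prob_columns_sum_to_zero(n, s):
    """log of P[s independent uniform +-1 columns of length n sum to zero], s even."""
    if s % 2 != 0:
        raise ValueError("s must be even for the columns to be able to cancel")
    # One coordinate sums to zero with probability C(s, s/2) / 2^s.
    log_single = math.lgamma(s + 1) - 2 * math.lgamma(s // 2 + 1) - s * math.log(2)
    return n * log_single


def log_expected_zero_sum_subsets(n, d, s):
    """log of E[number of s-subsets of the d columns that sum to zero], for s <= d."""
    log_binom = math.lgamma(d + 1) - math.lgamma(s + 1) - math.lgamma(d - s + 1)
    return log_binom + log_prob_columns_sum_to_zero(n, s)
```

By Stirling's formula, for $s=(1-\delta )n$ this expectation passes through $1$ when $d$ is of order $n^{1+1/(2-2\delta )}$, up to a constant factor depending on $\delta$, which matches the exponent in Theorem 1.2.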
3. Proof of the lower bound in Theorem 1.2
In this section we prove part (1) of Theorem 1.2. Let (say)
$p\approx e^{\log ^3 n}$
be a prime, let
$d=n^{1+1/(2-2\delta )-\varepsilon }$
and
$s=(1-\delta )n$
be as given, and let
$n_1=(1-\beta )n$
, where
$\beta =\omega (1/\log n)$
and
$\beta =o(\!\log \log n/\log n)$
. As described in Section 2, our proof consists of two phases, each of which will be handled separately.
3.1 Phase 1: no sparse structured vectors in the kernel of $M_1$
Our first goal is to prove the following proposition.
Proposition 3.1.
$M_1$
is with high probability such that for every
$(1-\delta )n$
-sparse vector
$\boldsymbol{{a}}\in \mathbb{F}_p^d\setminus \{\textbf{0}\}$
, if
$M_1\boldsymbol{{a}}=\textbf{0}$
, then
$\rho _{\mathbb{F}_p}(\boldsymbol{{a}})=e^{-\omega (\!\log ^2 n)}.$
In order to prove the above proposition, we need some auxiliary results.
Lemma 3.2.
$M_1$
is with high probability
$n/\log ^4n$
-robust over
$\mathbb{F}_p$
.
Proof. Observe that for any
$\boldsymbol{{a}}\in \mathbb{F}_p^d\setminus \{\textbf{0}\}$
we trivially have that
$\mathbb{P}[M_1\boldsymbol{{a}}=\textbf{0}]\leq 2^{-n_1}=2^{-\Theta (n)}$
. Since there are at most
\begin{equation*} \binom{d}{n/\log ^4n}\,p^{n/\log ^4n}=2^{o(n)} \end{equation*}
$n/\log ^4n$
-sparse vectors
$\boldsymbol{{a}}\in \mathbb{F}_p^d$
, by a simple union bound we obtain that the probability for some such nonzero
$\boldsymbol{{a}}$
to satisfy
$M_1\boldsymbol{{a}}=\textbf{0}$
is
$o(1)$
. This completes the proof.
In particular, by combining the above lemma with the Erdős-Littlewood-Offord inequality [Reference Erdös5], we conclude that if a nonzero
$\boldsymbol{{a}}\in \mathbb{F}_p^d$
is
$(1-\delta )n$
-sparse and
$M_1\boldsymbol{{a}}=\textbf{0}$
, then
$\rho _{\mathbb{F}_p}(\boldsymbol{{a}})=O(\!\log ^2 n/n^{1/2})$
. However, to prove Proposition 3.1, we need a stronger estimate.
The following lemma asserts that every subset of
$s$
columns in
$M_1$
has large rank. It will be crucial in Phase 2.
Lemma 3.3.
Let
$t = \omega (\!\log n)$
. Then, with high probability
$M_1$
is such that every subset of
$s$
columns contains at least
$s-t$
linearly independent columns.
Proof. Consider the event that one such subset has rank at most
$s-t$
. There are
$\binom{d}{s}\le d^s\le n^{2n}$
possible choices of columns. For each such choice, there are at most
$2^s\le 2^n$
ways to choose a spanning set of
$r\le s-t$
columns. Such a subset has span containing at most
$2^r\le 2^s$
$\{\pm 1\}$
vectors (indeed, consider a full-rank
$r\times r$
sub-block; any
$\{\pm 1\}$
vector in the span of the columns is determined by its values on these
$r$
coordinates), so the probability that the remaining at least
$t = \omega (\!\log n)$
columns are in the span is at most
$(2^s/2^{n_1})^t\le (2^{-(\delta -\beta )n})^t = n^{-\omega (n)}$
. Taking a union bound, the result follows.
Next, we state a version of Halász’s inequality [Reference Halász11, Theorem 3] as well as a ‘counting inverse Littlewood-Offord theorem’ as was developed in [Reference Ferber, Jain, Luh and Samotij7].
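Definition 3.4. Let
$\boldsymbol{{a}}\in \mathbb{F}_p^n$
and
$k\in \mathbb{N}$
. We define
$R_k^{\ast }(\boldsymbol{{a}})$
to be the number of solutions to
\begin{equation*} \pm a_{i_1}\pm a_{i_2}\pm \cdots \pm a_{i_{2k}}=0 \bmod p \end{equation*}
that satisfy
$|\{i_1,\ldots,i_{2k}\}|\gt 1.01k$
.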
Theorem 3.5. ([Reference Ferber, Jain, Luh and Samotij7, Theorem 1.4]). Given an odd prime
$p$
, integer
$n$
, and vector
$\boldsymbol{{a}}=(a_1,\ldots,a_n)\in \mathbb{F}_p^{n}\setminus \{\textbf{0}\}$
, suppose that an integer
$0\le k \le n/2$
and positive real
$L$
satisfy
$30L \le |\textrm{supp}{(\boldsymbol{{a}})}|$
and
$80kL \le n$
. Then

We write
$\boldsymbol{{b}} \subset \boldsymbol{{a}}$
if
$\boldsymbol{{b}}$
is a subvector of
$\boldsymbol{{a}}$
, and let
$| \boldsymbol{{b}}|$
be the size of the support of a vector
$\boldsymbol{{b}}$
.
Theorem 3.6. ([Reference Ferber, Jain, Luh and Samotij7, Theorem 1.7]). Let
be a prime, let
$k, n \in \mathbb{N}$
$s\in [n]$
$t\in [p]$
. Define
${\boldsymbol{{B}}}_{k,m,\geq t}(s,d)$
as the following set:

We have

We now are in position to prove Proposition 3.1. The proof is quite similar to the proofs in [Reference Ferber and Jain6–Reference Ferber, Luh and McKinley8].
Proof of Proposition
3.1. Let
$k = \log ^3 n$
$m = n/\log ^4 n, p\approx e^{\log ^3 n}$
First we use Lemma 3.2 to rule out vectors
with a support of size less than
$n/\log ^4n$
. Next, let (say)
$L = n/\log ^{10}n$
and let
$\sqrt{L}\leq t\leq p$
Consider a fixed
$\boldsymbol{{a}}\in{\boldsymbol{{B}}}_{k,m,\geq t}(s,d)\setminus{\boldsymbol{{B}}}_{k,m,\geq 2t}(s,d)$
and we wish to bound the probability that
. By definition, there is a set
$S\subseteq \operatorname{supp}\!(\boldsymbol{{a}})$
of size at least
such that

Since the rows are independent and since
$\rho _{\mathbb{F}_p}\!(\boldsymbol{{a}})\leq \rho _{\mathbb{F}_p}\!(\boldsymbol{{a}}|_S)$
, the probability that
is at most
$\rho _{\mathbb{F}_p}(\boldsymbol{{a}}|_S)^{n_1}$
. Furthermore, by Theorem 3.5 and the given conditions, which guarantee
$30L\le m\le |\operatorname{supp}\!(\boldsymbol{{a}}|_S)|$
$80kL\le m\le |S|$
, and by
$\sqrt{L}\le t\le p$
, we have

for all sufficiently large
by equation (3.1). All in all, taking a union bound over all the possible choices of
(Theorem 3.6), and using the fact that
$s= (1-\delta )n$
$n_1=(1-\beta )n$
$\beta =\omega (1/\log n)$
, we obtain the bound

on the probability
has such a kernel vector for sufficiently large
. Here we used the bounds
$d\leq n^{1+1/(2-2\delta )-\varepsilon }$
$\varepsilon =\omega (\!\log \log n/\log n)$
$\beta =o(\varepsilon )$
. Union bounding over all possible values of
shows that there is an appropriately small chance of having such a vector for any
$t\ge \sqrt{L}$
Finally, note that
$B_{k,m,\ge p}(s,d)$
is empty and thus the above shows that kernel vectors
cannot be in
$\boldsymbol{{B}}_{k,m,\ge \sqrt{L}}(s,d)$
. A similar argument as in equations (3.1) and (3.2) shows that

and the result follows.
3.2 Phase 2: boosting the rank using $M_2$
Here we show that, conditioned on the conclusions of Proposition 3.1 and Lemma 3.3, after exposing
$M_2$
with high probability
$M=\begin{pmatrix} M_1\\ M_2 \end{pmatrix}$
is
$s$
-robust.
To analyse the probability that a given subset of
$s$
columns is not of full rank, we will use the following procedure:
Fix any subset of
$s$
columns in
$M$
, and let
$(\boldsymbol{{c}}_1,\ldots,\boldsymbol{{c}}_s)$
be the submatrix in
$M_1$
that consists of those columns. We reveal
$M_2$
row by row according to the following steps:
(1) Let
$I\subseteq [s]$ be a largest subset of indices such that the columns
$\{\boldsymbol{{c}}_i \mid i\in I\}$ are linearly independent. By Lemma 3.3 we have that
$T\;:\!=\;|I|\geq s-t= (1-\delta )n-t$ , where
$t=\omega (\!\log n)$ . Without loss of generality we may assume that
$I=\{1,\ldots,T\}$ and
$T\leq s-1$ (otherwise we have already found
$s$ independent columns of
$M$ ). By maximality, we know that
$\boldsymbol{{c}}_{T+1}$ can be written (uniquely) as a linear combination of
$\boldsymbol{{c}}_1,\ldots, \boldsymbol{{c}}_{T}$ . That is, there exists a unique combination for which
$\sum ^{T}_{i=1} x_i \boldsymbol{{c}}_i= \boldsymbol{{c}}_{T+1}$ . In particular, this means that
\begin{align*} \sum _{i=1}^Tx_i \boldsymbol{{c}}_i-\boldsymbol{{c}}_{T+1}=\textbf{0}, \end{align*}
that is, the vector
$\boldsymbol{{x}}=(0,\ldots, x_1,\ldots,x_{T}, -1, \ldots,0)\in \mathbb{F}_{p}^d$ is
$(T+1)$ -sparse and satisfies
$M_1\boldsymbol{{x}}=\textbf{0}$ . Since
$T+1\leq s$ , by Proposition 3.1 we know that
$\rho _{\mathbb{F}_p}(\boldsymbol{{x}})=e^{-\omega (\!\log ^2 n)}$ .
(2) Expose the row vector of dimension
$T+1$ from
$M_2$ below the matrix
$(\boldsymbol{{c}}_1,\ldots, \boldsymbol{{c}}_{T+1})$ . We obtain a matrix of size
$(n_1+1)\times (T+1)$ . Denote the new row as
$(y_1,\ldots,y_{T+1})$ .
(3) If the new matrix is of rank
$T+1$ , then consider this step as a ‘success’, expose the entire row and start over from
$(1)$ . Otherwise, consider this step as a ‘failure’ (as we failed to increase the rank) and observe that if
$\begin{bmatrix} \boldsymbol{{c}}_1 &\ldots & \boldsymbol{{c}}_{T+1}\\ y_1 &\ldots &y_{T+1} \end{bmatrix}$ is not of full rank, then we must have
\begin{equation*} x_1y_1+x_2y_2+\cdots +x_Ty_T-y_{T+1}=0. \end{equation*}
The probability that a uniformly random
$\pm 1$ row
$(y_1,\ldots,y_{T+1})$ satisfies this equation is at most
$\rho _{\mathbb{F}_p}(\boldsymbol{{x}})=e^{-\omega (\!\log ^2 n)}$ .
(4) All in all, the probability for more than
$\beta n-t$ failures is at most
$\binom{\beta n}{t}\left (e^{-\omega (\!\log ^2 n)}\right )^{\beta n-t}=e^{-\omega (n\log n)}=o\left (\binom{d}{s}^{-1}\right )$ . Therefore, by the union bound we obtain that with high probability
$M$ is
$s$ -robust.
This completes the proof.
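The row-exposure idea can be simulated on a toy scale. The Python sketch below is a simplified illustration rather than the exact procedure above: it fixes $s$ columns, starts from $n_1$ random $\pm 1$ rows, and then appends fresh random rows one at a time, counting a ‘failure’ whenever the rank over $\mathbb{F}_p$ does not increase. The function names, the seeding, and the parameter `extra` (playing the role of the number of rows of $M_2$) are assumptions made for this example, and `p` is assumed to be an odd prime.

```python
import random


def rank_mod_p(rows, p):
    """Rank over F_p (p an odd prime) of an integer matrix given as a list of rows."""
    m = [[x % p for x in row] for row in rows]
    ncols = len(m[0]) if m else 0
    rank = 0
    for col in range(ncols):
        pivot = next((i for i in range(rank, len(m)) if m[i][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % p for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col]:
                f = m[i][col]
                m[i] = [(a - f * b) % p for a, b in zip(m[i], m[rank])]
        rank += 1
    return rank


def expose_rows(n1, extra, s, p, seed=0):
    """Append up to `extra` random +-1 rows to an n1 x s matrix; return (rank, failures)."""
    rng = random.Random(seed)

    def random_row():
        return [rng.choice((-1, 1)) for _ in range(s)]

    rows = [random_row() for _ in range(n1)]
    rank = rank_mod_p(rows, p)
    failures = 0
    for _ in range(extra):
        if rank == s:  # the s columns are already independent
            break
        rows.append(random_row())
        new_rank = rank_mod_p(rows, p)
        if new_rank == rank:
            failures += 1  # this row did not increase the rank
        rank = new_rank
    return rank, failures
```

In the proof, each failure has probability $e^{-\omega (\!\log ^2 n)}$, which is why so many failures among the rows of $M_2$ are unlikely enough to survive the union bound over all $\binom{d}{s}$ column subsets.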
4. Proof of the upper bound in Theorem 1.2
We first perform preliminary computations to compute a certain correlation. This boils down to estimating binomial sums. Let
$\xi _i,\xi' _{\!\!i}$
be independent Rademacher variables and define

$\alpha (n,m)\le \alpha (n,n)\le 10\sqrt{n}$
by [Reference Erdös5].
Lemma 4.1.
$\lambda \gt 0$
. If
is even and
$0\le m\le (1-\varepsilon )n$
we have

Proof. We have

We will also need a more refined bound when
is small.
Lemma 4.2.
is even and
$0\le m\le n^{1/2}$
, we have

Proof. Using the approximation
$|x|\le 1/2$
we see that if
is an integer satisfying
$1\le y\le x/2$

We now apply this to the situation at hand. We see
$\alpha (n,m)$
is equal to

In the third line, we used equation (4.1) and in the fourth line, we simplified the expression and used
$k\le m\le n^{1/2}$
to subsume many terms into an error of size
. The fifth line used
$|x|\le 1$
and the sixth line uses
$2^{-m}\binom{m}{k}(2k-m)^4\le 2m^2\exp\!(\!-(2k-m)^2/100)$
. Finally, this sum equals

We are ready to prove the upper bound in Theorem 1.2.
Proof of the upper bound in Theorem 1.2. We are given
$\delta \in (0,1)$
and
$\varepsilon = \omega (\!\log \log n/\log n)$
, with
$d = n^{1+1/(2-2\delta )+\varepsilon }$
. Let
$s = 2\lfloor (1-\delta )n/2\rfloor$
. We consider an
$n\times d$
random matrix with independent Rademacher entries and wish to show it is not
$s$
-robust with high probability. We may assume
$\varepsilon \lt 1/2$
as increasing
$\varepsilon$
makes the desired statement strictly easier.
For an
$s$
-tuple of columns labelled by the index set
$S\subseteq [d]$
, let
$X_S$
be the indicator of the event that these columns sum to the zero vector. Let
$X = \sum _{S\in \binom{[d]}{s}}X_S$
, and let
$(\xi _1,\ldots,\xi _d)$
be a vector of independent Rademachers. We have


For every
$\eta \gt 0$
$m\le c_\eta n^{1/2}$
, where
is a sufficiently small absolute constant in terms of
, we see
$|\alpha (s,m)^n-1|\le \eta$
by Lemma 4.2. For
$c_\eta n^{1/2} \lt m\le (1-\varepsilon/8)s$
we have
$\alpha (s,m)^n\le \exp\!(O(m/\varepsilon ))$
by Lemma 4.1. For this range we have, since
$m/s\ge n^{\delta/2}s/d$

by Chernoff–Hoeffding (the fact that
$q\ge p$
with probability at most
, where this is the KL-divergence). Thus

$\varepsilon = \omega (\!\log \log n/\log n)$
Finally for
$(1-\varepsilon/8)s\le m\le s$
we have


$d = n^{1+1/(2-2\delta )+\varepsilon }$
$s = 2\lfloor (1-\delta )n/2\rfloor$
along with
$\varepsilon = \omega (\!\log \log n/\log n)$
. We see that this is
. Thus

sufficiently large, and thus
$X\gt 0$
with probability at least