Threshold Rates of Code Ensembles: Linear Is Best

In this work, we prove new results concerning the combinatorial properties of random linear codes. By applying the thresholds framework of Mosheiff et al. (FOCS 2020) we derive fine-grained results concerning the list-decodability and -recoverability of random linear codes. Firstly, we prove a lower bound on the list-size required for random linear codes over $\mathbb{F}_q$ that are $\varepsilon$-close to capacity to list-recover with error radius $\rho$ and input lists of size $\ell$. We show that the list-size $L$ must be at least $\frac{\log_q\binom{q}{\ell}-R}{\varepsilon}$, where $R$ is the rate of the random linear code. This is analogous to a lower bound for list-decoding that was recently obtained by Guruswami et al. (IEEE TIT 2021B). As a comparison, we also pin down the list size of random codes, which is $\frac{\log_q\binom{q}{\ell}}{\varepsilon}$. This result almost closes the $O\left(\frac{q\log L}{L}\right)$ gap left by Guruswami et al. (IEEE TIT 2021A). This leaves open the possibility (which we consider likely) that random linear codes perform better than random codes for list-recoverability, in contrast to a recent gap shown for the case of list-recovery from erasures (Guruswami et al., IEEE TIT 2021B). Next, we consider list-decoding with constant list-sizes. Specifically, we obtain new lower bounds on the rate required for: 1) list-of-3 decodability of random linear codes over $\mathbb{F}_2$; 2) list-of-2 decodability of random linear codes over $\mathbb{F}_q$ (for any $q$). This expands upon Guruswami et al. (IEEE TIT 2021A), which only studied list-of-2 decodability of random linear codes over $\mathbb{F}_2$. Further, in both cases we are able to show that the rate is larger than that which is possible for uniformly random codes. A conclusion we draw from our work is that, for many combinatorial properties of interest, random linear codes actually perform better than uniformly random codes, in contrast to the apparently standard intuition that uniformly random codes are best.


Introduction
Coding theory is concerned with developing efficient means to make data robust to noise. The mathematical objects used for this purpose are (error-correcting) codes, which are just subsets C ⊆ Σ^n, where Σ is a finite alphabet of size q. It is often convenient to set Σ = F_q, where F_q is the finite field of order q, in which case we can insist that C be a subspace of F_q^n. We call such a code linear and denote it C ≤ F_q^n. As we are mostly concerned with linear codes in the sequel, we will always set Σ = F_q. In order for a code to be useful for information transmission in noisy environments, we require C to satisfy noise-resilience properties, which amounts to insisting that the codewords are "difficult to confuse." A basic way to do this is to define a distance metric on F_q^n and then insist that the codewords are not too clustered. The standard choice is the (relative) Hamming distance, defined as d(x, y) = (1/n)|{i ∈ [n] : x_i ≠ y_i}| for x, y ∈ F_q^n: in words, it is the fraction of coordinates on which the vectors x and y differ. The minimum distance of a code is then the minimum distance between two distinct codewords, i.e., δ := min{d(x, y) : x, y ∈ C, x ≠ y}.
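To make the definitions concrete, here is a small Python sketch (ours, not from the paper) that computes the relative Hamming distance and the minimum distance of a toy code, with symbols represented as integers in {0, ..., q−1}.

```python
from itertools import combinations

def relative_hamming_distance(x, y):
    """d(x, y) = (1/n) |{i : x_i != y_i}|, the fraction of disagreeing coordinates."""
    assert len(x) == len(y)
    return sum(a != b for a, b in zip(x, y)) / len(x)

def minimum_distance(code):
    """delta = min d(x, y) over all pairs of distinct codewords."""
    return min(relative_hamming_distance(x, y) for x, y in combinations(code, 2))

# Toy example over F_2 with block length 4.
code = [(0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 1, 1)]
print(minimum_distance(code))  # 0.5, attained by (0,0,0,0) vs (0,0,1,1)
```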
Beyond the minimum distance, there are other proxies for a code's noise-resilience that are widely studied. First and foremost, a popular relaxed notion of noise-resilience is provided by list-decodability, which informally asks that the code not be "too" clustered around any one point. More precisely, a code is said to be (ρ, L)-list-decodable if there are never L or more codewords that are all within distance ρ of some vector z ∈ F_q^n, i.e., ∀ z ∈ F_q^n, |{x ∈ C : d(x, z) ≤ ρ}| < L. The integer L is called the list-size. This notion, originally introduced by Elias and Wozencraft [Eli57, Woz58], finds uses within coding theory and beyond in, e.g., complexity theory [Lip90, BFNW90, STV01], cryptography [KM93], and learning theory [GL89].
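As an illustration, the following brute-force check of (ρ, L)-list-decodability (our own sketch, exponential in n and only sensible for toy parameters) reuses relative_hamming_distance from the snippet above.

```python
from itertools import product

def is_list_decodable(code, q, n, rho, L):
    """(rho, L)-list-decodable: every Hamming ball of relative radius rho
    contains fewer than L codewords."""
    for z in product(range(q), repeat=n):
        if sum(relative_hamming_distance(x, z) <= rho for x in code) >= L:
            return False
    return True

print(is_list_decodable(code, q=2, n=4, rho=0.25, L=2))
# False: the ball around (0,0,0,1) of radius 1/4 contains both (0,0,0,0) and (0,0,1,1)
```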
We will also investigate another relaxation of list-decoding: list-recovery. Here, we are given a collection of input lists S_1, ..., S_n ⊆ F_q of bounded size, and the requirement is that there are not too many codewords that agree too much with these input lists. More precisely, for an integer ℓ ≤ q we require that ∀ S = (S_1, ..., S_n) ∈ (F_q choose ℓ)^n, |{x ∈ C : d(x, S) ≤ ρ}| < L.
In the above, we denote by (F_q choose ℓ) the family of all ℓ-element subsets of F_q, and we extend the Hamming distance notation d(·, ·) via d(x, S) := (1/n)|{i ∈ [n] : x_i ∉ S_i}|. Note that (ρ, 1, L)-list-recovery is equivalent to list-decoding, demonstrating that list-recoverability is indeed a generalization of list-decodability. While list-recovery was initially introduced as a stepping stone towards list-decoding, it has since become a useful primitive in its own right. As for list-decoding, there is a capacity theorem for list-recovery; in particular, the following converse holds: • If R ≥ 1 − h_{q,ℓ}(ρ) + ε, there do not exist (ρ, ℓ, L)-list-recoverable codes with L = o(q^{εn}).
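Continuing the sketch above, list-recovery replaces the center z by a tuple of input lists; the following (again exponential, illustration only) mirrors the definition.

```python
from itertools import combinations, product

def recovery_distance(x, S):
    """d(x, S) = fraction of coordinates i with x_i not in S_i."""
    return sum(a not in Si for a, Si in zip(x, S)) / len(x)

def is_list_recoverable(code, q, n, rho, ell, L):
    """(rho, ell, L)-list-recoverable: for every tuple of ell-element input
    lists, fewer than L codewords are rho-close to the tuple."""
    subsets = list(combinations(range(q), ell))   # all ell-subsets of F_q
    for S in product(subsets, repeat=n):          # all tuples of input lists
        if sum(recovery_distance(x, S) <= rho for x in code) >= L:
            return False
    return True
```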
In the above, the function h_{q,ℓ}(·) is the (q, ℓ)-entropy function; its precise definition is not important at the moment, so we defer it to Section 2. Informally, when studying codes of rate ε below capacity for a small ε > 0, we refer to them as capacity-approaching and call ε the gap-to-capacity. This already tells us that the capacity for (ρ, ℓ, L)-list-recovery is 1 − h_{q,ℓ}(ρ) if we insist that L be subexponential in n. However, we can ask for more fine-grained information: in particular, exactly how large must the list-size L be as a function of ε and the other parameters?
For random linear codes, we prove the following lower bound.

Theorem 1.1 (informal; see Theorem 3.1). A random linear code over F_q of rate R = 1 − h_{q,ℓ}(ρ) − ε is with high probability not (ρ, ℓ, L)-list-recoverable unless L ≥ (log_q (q choose ℓ) − R)/ε.
For context, we consider the case of uniformly random codes. In this case, we obtain a tight result.

Theorem 1.2 (informal; see Theorem 3.5). A random code in F_q^n of rate 1 − h_{q,ℓ}(ρ) − ε is with high probability not (ρ, ℓ, L)-list-recoverable for L roughly log_q (q choose ℓ)/ε. On the other hand, for any ε > 0 and n sufficiently large, a random code in F_q^n of rate 1 − h_{q,ℓ}(ρ) − ε is whp (ρ, ℓ, ⌈log_q (q choose ℓ)/ε⌉ + 1)-list-recoverable.
In this way, we pin down the list-recoverability for random codes to one of two or three possible values: ⌊log_q (q choose ℓ)/ε + 0.99⌋, ⌈log_q (q choose ℓ)/ε⌉ (if it is different), or ⌈log_q (q choose ℓ)/ε⌉ + 1. Comparing Theorems 1.1 and 1.2, we see that our lower bound for random linear codes is less than the precise bound we have for random codes. One could potentially draw the conclusion that Theorem 1.1 should be improved. However, we believe that it is in fact tight. For the case of list-decoding binary codes, it has already been shown that random linear codes perform better than uniformly random codes, and the bounds we obtain are the natural generalizations of the (tight) results for that case. We therefore conjecture that Theorem 1.1 is indeed tight. This stands in stark contrast to erasure list-recovery: for this model, it is known that random linear codes can require lists of size ℓ^{Ω(1/ε)} [GLM+21] (at least, if the field has large characteristic), whereas the lists for random codes can be just O(ℓ/ε). A summary of the state-of-the-art for list-recovery of RLCs and RCs is provided in Figure 1.
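The gap under discussion can be eyeballed numerically. The sketch below evaluates the two bounds — the random linear lower bound (log_q (q choose ℓ) − R)/ε of Theorem 1.1 and the random-code value log_q (q choose ℓ)/ε of Theorem 1.2 — using the expression for h_{q,ℓ} given in Section 2; the h_ql formula is our reconstruction, so treat the exact numbers as illustrative.

```python
from math import comb, log

def h_ql(q, ell, rho):
    """(q, ell)-entropy function (as reconstructed in Section 2)."""
    return rho * log((q - ell) / rho, q) + (1 - rho) * log(ell / (1 - rho), q)

def bounds(q, ell, rho, eps):
    rate = 1 - h_ql(q, ell, rho) - eps     # capacity-approaching rate
    c = log(comb(q, ell), q)               # log_q binom(q, ell)
    return (c - rate) / eps, c / eps       # RLC lower bound, RC value

rlc, rc = bounds(q=8, ell=2, rho=0.1, eps=0.01)
print(f"RLC list-size lower bound ~ {rlc:.1f}, RC list size ~ {rc:.1f}")
```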
Remark. It might appear that our conjecture that random linear codes outperform random codes for list-recovery is contradicted by the result of [GLM+21]. However, we emphasize that the capacity for erasure list-recovery is larger, so if a code is ε-close to capacity for list-recovery from erasures for small ε > 0, it is above capacity for list-recovery from errors, the model we study. Hence, this lower bound does not contradict our conjecture. One can also consider the model where ρ approaches the limit 1 − ℓ/q, as is done in [RW18]; in this case we still suspect that random linear codes outperform uniformly random codes, but this is just speculation and further investigation is required.

Figure 1: This table summarizes much of the work on the list-recoverability of random linear codes (RLC) and random codes (RC). The lower bound of [GLM+21] only applies when q = p^{Ω(1/ε)} for a prime p, and in [RW18] η > 0 is viewed as a small constant. [GLM+21] also offers a similar lower bound for the case of list-recovery from erasures.
List-decoding with small lists. Next, we turn our attention to the challenge of list-decoding when the output list-size L is a (small) constant. Thus, we are no longer in the regime where we can expect to approach the list-decoding capacity, and we are interested to know by how much we are required to back off if, say, L = 3, 4. First, we consider the case where L = 4 over the binary field, which we also refer to as list-of-3 decoding. Here and throughout, we also use the following notation (which is slightly abusive): for q ≥ 2 and nonnegative reals x_1, ..., x_t with x_1 + ... + x_t ≤ 1, we write H_q(x_1, ..., x_t) := Σ_i x_i log_q(1/x_i) + (1 − Σ_i x_i) log_q(1/(1 − Σ_i x_i)). We first prove the following possibility result for random linear codes.

Theorem 1.3 (List-of-3 decoding Random Linear Binary Codes). Let ρ ∈ (0, 5/16) and suppose R is at most the bound of Theorem 4.1. Then a random linear code over F_2 of rate R is whp (ρ, 4)-list-decodable.
For context, we also study the list-of-3 decodability of random codes over the binary alphabet. In this case, we can prove the following:

Theorem 1.4 (List-of-3 decoding Random Binary Codes). Let ρ ∈ (0, 5/16) and suppose R exceeds the threshold of Theorem 4.2. Then a random code over {0, 1} of rate R is whp not (ρ, 4)-list-decodable.
As (1 + F)/4 ≥ F/3 whenever F ≤ 3, we see that the bound in Theorem 1.3 is greater than the bound from Theorem 1.4. Using terminology that we later make precise, we see that the threshold rate for list-of-3 decoding binary random linear codes strictly exceeds that of binary random codes.
Next, we study list-of-2 decoding over alphabets of size q > 2, and again our theorems demonstrate that random linear codes strictly outperform random codes.

Theorem 1.5 (List-of-2 decoding Random Linear q-ary Codes). Let ρ ∈ (0, 1/3) and suppose R is at most the bound of Theorem 4.3. Then a random linear code over F_q of rate R is whp (ρ, 3)-list-decodable.
Theorem 1.6 (List-of-2 decoding Random q-ary Codes). Let ρ ∈ (0, 1/3) and suppose R exceeds the threshold of Theorem 4.4. Then a random code over F_q of rate R is whp not (ρ, 3)-list-decodable.
On the other hand, if R is below that threshold, then a random code over F_q is whp (ρ, 3)-list-decodable.
Again, we can see that the bound from Theorem 1.5 is greater than the bound from Theorem 1.6. We therefore conjecture that this phenomenon of random linear codes outperforming random codes extends to more values of L. To provide more evidence for this conjecture, we extend an argument for binary random linear codes of [GHSZ02, LW18] to larger values of L, and by comparing it to a computation of the threshold rate for random binary codes, show that for many parameter regimes of interest we do indeed have random linear codes outperforming random codes.

Techniques
In order to obtain our results, we rely on a recently developed toolkit for proving threshold rates for combinatorial properties of random (linear) codes. The toolkit for random linear codes was developed by Mosheiff et al. [MRRZ+20] on the way to proving that LDPC codes achieve list-decoding capacity; recent works [GLM+21, GM21] have found further uses for the techniques in investigating combinatorial properties of random linear codes. An analogous threshold toolkit for random codes was provided in [GMR+21].
Broadly speaking, the techniques of [MRRZ+20, GMR+21] apply when considering a property of codes defined by forbidding a family of "bad" subsets, each of which has constant cardinality (independent of n). For example, the property of (ρ, L)-list-decodability is defined by forbidding all L-element subsets B = {x_1, ..., x_L} of a Hamming ball B(z, ρ) = {x ∈ F_q^n : d(x, z) ≤ ρ} from appearing in the code. In [MRRZ+20], it is proved that for any such local property there is a threshold rate R* such that: • If R < R*, a random linear code satisfies the property with high probability; • If R > R*, a random linear code fails to satisfy the property with high probability.
The theorem furthermore characterizes the threshold rate R* as the solution to a certain optimization problem. In this work, we endeavour to compute new bounds on the threshold rate R* for various properties of interest.
In the remainder, we provide intuition for the characterization of the threshold rate from [MRRZ+20]. First, we identify subsets B ⊆ F_q^n of size L with the matrix in F_q^{n×L} whose columns are given by B (the choice of ordering is immaterial), and we say that a matrix M is contained in a code C if C contains all of M's columns. For a collection of matrices M ⊆ F_q^{n×L}, we would like to compute the threshold rate R* for "M-freeness," i.e., the code property of not containing a matrix in M.
As we are interested in list-decoding/recovery, we define a set of matrices M such that if C contains a matrix from M then C is not list-decodable/recoverable. We say that the collection M is "bad" for list-decoding/recovery. As intuition, for list-decoding we can just take the set of matrices where each column lies in some ball B(z, ρ). Next, we would like to show that M is "abundant" in the sense that it is very likely that C contains a matrix M ∈ M. In other words, if X_M denotes the indicator random variable for the event M ⊆ C, then we should expect Σ_{M∈M} X_M to be large. It is relatively easy to compute E[X_M] and see when it exceeds 1; however, to conclude that Σ_M X_M is likely to be large one needs a concentration bound. Such a bound is often provided by estimating the variance. Broadly construed, [MRRZ+20] applies the second moment method to demonstrate that there is really only one reason that Σ_M X_M would fail to be concentrated: it is because for some compressing matrix A ∈ F_q^{L×L′} with L′ ≤ L the set {MA : M ∈ M} is too small.

List-Recovery. First, we endeavour to prove a lower bound on the list-size for list-recovery. This means that we need to show that if the list-size is too small then the random linear code quite likely contains a matrix from a set M of bad matrices for list-recovery. In light of the above, to conclude our argument we need to show that for any compressing matrix A, the set {MA : M ∈ M} remains large.
To do this, we use information-theoretic techniques: we identify each of our bad matrices M ∈ M with an appropriate type, which is a distribution τ over F_q^L defined as the empirical distribution of M's rows. A lower bound on |{MA : M ∈ M}| is then implied by a lower bound on the entropy of the random variable A u for u ∼ τ. We are also free to choose the type τ which is "bad" for a certain property, in the sense that if a code contains a matrix of type τ then it fails to satisfy the property.
For the case of (ρ, ℓ, L)-list-recovery, the following type is bad: one samples a uniform S ∈ (F_q choose ℓ) and then outputs u = (u_1, ..., u_L) ∈ F_q^L, where each u_i is independently uniform over S with probability 1 − ρ and uniform over F_q \ S otherwise. It thus follows that a lower bound on |{MA : M ∈ M}| is implied by a lower bound on the entropy of the random variable A u for u ∼ τ.
Obtaining this lower bound requires a rather lengthy argument; we overview the main ideas now. We begin by partitioning the coordinates of A u into subsets J_1, ..., J_k ⊆ [L′], where each J_i depends on at least 2 "fresh" coordinates from u, along with (perhaps) a set of leftover coordinates J_{k+1}. We then provide two arguments depending on the maximum size of a part. If, say, |J_1| is large, then we can show that (A u)_{J_1} already experiences a large entropy increase. This is shown by demonstrating that these coordinates alone already allow us to nontrivially guess the subset S. Otherwise, we argue that all the parts provide a nontrivial increase in the entropy, and since there must be a large number of parts in this case, by summing over all the parts we obtain an adequate lower bound.
This result generalizes the list-decoding lower bound that was provided in [GLM+21, Theorem IV.1]. The argument in that paper exploited the fact that a sample from the bad type for list-decoding has a simpler structure: it looks like v + α·1, where v is a q-ary Bernoulli random vector and α ∈ F_q is uniformly random. In our case, we do not have this nice linear structure, making the analysis more intricate.
List-Decoding with Small Lists. For our results concerning list-decoding with small lists, we again use the thresholds framework. In this case, we need to consider any type that is bad for (ρ, 3)- or (ρ, 4)-list-decoding. For these small values of L, we are able to identify the linear map A which leads to the maximum relative entropy H_q(Aτ)/dim(Aτ): in each case, it is given by the map sending (x_1, ..., x_L) → (x_1 − x_L, ..., x_{L−1} − x_L). To provide the proof, we break up the vector spaces based on the number of distinct coordinates of the entries, and observe that a type which is bad for list-decodability can only put so much probability mass on each of these parts. To conclude, we rely on the concavity of the entropy function as well as some combinatorial reasoning concerning the subspaces of F_2^4 and F_q^3. Even for these small values of L we need to be quite careful to avoid a massive explosion in the number of cases to consider, as we must look at all compressing linear maps A.
Random Codes. For the case of random codes, we can compute the threshold rates for all the properties of interest in a relatively straightforward way, as the characterization from [GMR+21] does not require us to consider any sort of compressing mapping on the types. Quite notably, in all cases we see that random linear codes appear to perform better than random codes. This is perhaps in contrast to commonly held beliefs: a main goal of our work is to disseminate this counterintuitive phenomenon.

Related Work
In Section 1.2 we outlined the works [MRRZ+20, GLM+21, GMR+21] which developed and studied the thresholds toolkit that we apply. In this section, we provide more context for the study of random linear codes and their list-decodability/recoverability. In what follows, q always denotes the alphabet size and ε the "gap-to-capacity" for a capacity-approaching code.
List Size Lower Bounds for Random (Linear) Codes. As we provide lower bounds for list-recovery of random linear codes, we briefly survey the known lower bounds for list-decoding. First, Guruswami and Narayanan [GV10] showed that capacity-approaching random (linear) codes require lists of size Ω_{ρ,q}(1/ε); by inspecting the proof one can note that the implied constant tends to 0 as ρ → 1 − 1/q, or if q → ∞. While on the surface their approach appears very different from ours, their use of a second-moment method is akin to the proofs underlying the thresholds framework from [MRRZ+20], so the approaches are in fact somewhat similar. Later, Li and Wootters [LW18] gave a ∼1/ε list-size lower bound for capacity-approaching random codes. Again, the argument relies on the second-moment method.
In [GLM+21], a lower bound for the list-decodability of capacity-approaching random linear codes is given, showing that lists of size ∼h_q(ρ)/ε are required; our list-recovery list-size lower bound is a generalization of this result. Lastly, in [GMR+21] the threshold rate for (ρ, 2)-list-decodability is computed, providing a lower bound and an upper bound: this segues nicely into a discussion of the work on computing upper bounds on list-sizes.

List Size Upper Bounds for Random Linear Codes.
There has been a long line of work [ZP81, GHSZ02, GHK11, CGV13, Woo13, RW14, RW18, LW18, GMR+21] studying the list-decodability of capacity-approaching random linear codes, and we now highlight some relevant results. First, Zyablov and Pinsker [ZP81] demonstrated that capacity-approaching RLCs are indeed (ρ, L)-list-decodable, albeit with L = q^{Ω(1/ε)}. Subsequent work has endeavoured to prove list-decodability with L = O(1/ε). The existence of such linear codes over F_2 was first demonstrated by [GHSZ02]; later, [LW18] showed that this holds with high probability for randomly sampled linear codes, and subsequently [GLM+21] showed this is true for average-radius list-decoding.
As for larger alphabets, [GHK11] showed that lists of size O_{ρ,q}(1/ε) do indeed suffice for random linear codes. We further remark that their argument uses a certain Ramsey-theoretic concept called a 2-increasing sequence to choose the order in which to reveal coordinates, which is vaguely reminiscent of the "fresh" coordinates that we have defined by the J_i's in our list-recovery lower bound argument. A drawback of this work is that the implied constant in the O_{ρ,q}(·) notation degrades as ρ → 1 − 1/q or if q grows too large. In light of this restriction, a line of works [CGV13, Woo13, RW14] has studied the "high noise regime," where ρ = 1 − 1/q − η and one endeavours to show that lists of size O(1/η²) suffice for codes of rate Ω(η²). These results are still not quite optimal in the sense that the implied constants (even for the rate) lag behind the parameters achievable by random codes. Lastly, for list-recoverability with input list-size ℓ, it appears that the best upper bound on the list-size is due to [RW18], where it is shown that lists of size (qℓ)^{O(log(ℓ)/ε)} suffice.

Open Problems
In this work, we have progressed our understanding of combinatorial properties of random (linear) codes. A main conclusion of our work is that for list-decoding/recovery, random linear codes perform better. There are many open problems which remain to be studied, and we list some below.
• Provide the corresponding upper bounds on the threshold rate for (ρ, 4)-list-decoding binary random linear codes, and the threshold rate for (ρ, 3)-list-decoding q-ary random linear codes.
• Provide the corresponding lower bound on the threshold rate for (ρ, ℓ, L)-list-recovery in the capacity-approaching regime. In fact, for q > 2, the threshold rate for (ρ, L)-list-decoding is still open. This is quite likely a very challenging problem; the only tight argument we have is due to [GHSZ02, LW18] (see also [GLM+21]), which only applies to list-decoding over the binary field, and this argument appears too "rigid" to apply in more generality.
• Get a better understanding of worst-case codes. In particular, to the best of our knowledge the Plotkin points for (ρ, L)-list-decoding for q > 2 are not known. That is, compute the minimum value ρ* such that for all ρ > ρ*, there are no q-ary (ρ, L)-list-decodable code families with positive rate. (Recent work [ZBJ20] expresses the Plotkin point as a solution to a certain optimization problem, but we do not see how to extract a simple expression from this.)

Organization
In the subsequent section, we introduce the necessary notation and definitions that we will use in this work, along with the tools from [MRRZ+20, GMR+21] that we apply. In Section 3, we provide our lower bound on the list-size for the list-recoverability of random linear codes which approach capacity. In Section 4, we lower bound the threshold rate for list-of-2 decoding (for general q) and list-of-3 decoding (in the binary case). We also compare random linear codes to random codes over the binary alphabet for more values of L.

Preliminaries
Miscellaneous Notations. For an integer n ≥ 1, we denote [n] := {1, 2, ..., n}. For a set X we denote by (X choose ℓ) the family of all subsets of X with ℓ elements, and similarly (X choose ≤ ℓ) denotes the family of all subsets of X with at most ℓ elements. Throughout, F_q denotes the finite field with q elements, for q a prime power.
For clarity, vectors are typically denoted with an arrow overtop. Given a vector x ∈ F_q^n and a subset I ⊆ [n], we denote by x_I the length-|I| vector (x_i : i ∈ I) ∈ F_q^{|I|}. We reserve 1 for the all-1's vector; if we wish to emphasize its length we subscript it, i.e., 1_D is the all-1's vector of length D. Random variables are typically written in boldface, e.g., x, y, etc. In particular, random vectors are denoted, e.g., u.

Coding Theory Terminology.
A code C is a subset of F_q^n for F_q the finite field of order q, a prime power. Elements c ∈ C are called codewords, the integer n is the block-length, and the integer q is the alphabet size; such a code is also called q-ary. When q = 2 the code is deemed binary. We are typically interested in linear codes, which satisfy C ≤ F_q^n, i.e., they are subspaces. The rate of a code is R := log_q|C|/n, and d(x, y) := (1/n)|{i ∈ [n] : x_i ≠ y_i}| denotes the (relative) Hamming distance from x to y. We also slightly extend this notation as follows: for a vector x ∈ F_q^n and a tuple of subsets S = (S_1, ..., S_n), S_i ⊆ F_q, we define d(x, S) := (1/n)|{i ∈ [n] : x_i ∉ S_i}|, i.e., the fraction of coordinates i for which x "disagrees" with the corresponding subset of S.
A random linear code of rate R is a uniformly random subspace of F_q^n of dimension Rn. (Formally, a uniformly random parity-check matrix H ∈ F_q^{(1−R)n×n} is sampled and we output C = ker(H). There is a small chance C has rate larger than R, but as this probability is exponentially small in n it is immaterial to our conclusions. We also briefly use the model where a random G ∈ F_q^{Rn×n} is sampled and we output C = im(G).) As this concept will arise regularly in this work, we occasionally use the abbreviation RLC. A random code of rate R is a random subset of F_q^n obtained by including each element independently with probability q^{(R−1)n}. (By Chernoff bounds, such a code has rate R ± o(1) with high probability.) For this concept, we use the abbreviation RC.
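Both sampling models are easy to mirror in code. A minimal sketch (ours; binary for the RLC to keep the arithmetic simple, using the im(G) variant mentioned above):

```python
import itertools
import random

def random_linear_code(n, k):
    """Sample an RLC over F_2 as im(G) for a uniformly random G in F_2^{k x n}.
    (With exponentially small probability G is not full rank and the rate
    falls below k/n, mirroring the caveat in the text.)"""
    G = [[random.randrange(2) for _ in range(n)] for _ in range(k)]
    return {tuple(sum(m[i] * G[i][j] for i in range(k)) % 2 for j in range(n))
            for m in itertools.product(range(2), repeat=k)}

def random_code(n, q, R):
    """Sample an RC by including each word of F_q^n independently with
    probability q^{(R-1)n}."""
    p = q ** ((R - 1) * n)
    return {w for w in itertools.product(range(q), repeat=n)
            if random.random() < p}

print(len(random_linear_code(10, 4)))  # 16 unless G happens to be rank-deficient
print(len(random_code(8, 2, 0.5)))     # concentrated around 2^{0.5 * 8} = 16
```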

List-decodability and List-recoverability
In this work, we study combinatorial properties of linear codes. Of primary interest to us are list-decodability and list-recoverability, which we now define.
The list-decoding capacity is the value R*(ρ) such that for any R < R*(ρ) there exists L > 1 such that infinite families of (ρ, L)-list-decodable codes of rate at least R exist, but for any R > R*(ρ) no such infinite family exists. It is known that R*(ρ) = 1 − h_q(ρ), where h_q is the q-ary entropy function. In analogy to the list-decoding capacity, the list-recovery capacity is the value R*(ρ, ℓ) such that for any R < R*(ρ, ℓ) there exists L > 1 such that infinite families of (ρ, ℓ, L)-list-recoverable codes of rate at least R exist, but for any R > R*(ρ, ℓ) no such infinite family exists. It is known that R*(ρ, ℓ) = 1 − h_{q,ℓ}(ρ), where h_{q,ℓ}(ρ) := ρ log_q((q − ℓ)/ρ) + (1 − ρ) log_q(ℓ/(1 − ρ)) is the (q, ℓ)-entropy function.

Information-Theoretic Concepts
For a random variable x over a domain X we denote its entropy by H(x) := Σ_{x∈X} Pr[x = x] log(1/Pr[x = x]), where we use the convention 0 log(1/0) = 0. If τ is a distribution then we define H(τ) to be the entropy of a random variable distributed according to τ.
Given another random variable y supported on a set Y, the conditional entropy of x given y is H(x|y) := E_{y∼y}[H(x | y = y)]. Intuitively, this is the expected amount of entropy remaining in x after revealing y. Conditional entropy satisfies the chain rule H(x, y) = H(x|y) + H(y), which can be extended by induction to larger collections of random variables.
We also use the notion of mutual information, which is a measure of the amount of information one random variable gives about another and is defined as follows: I(x; y) := H(x) − H(x|y) = H(y) − H(y|x). (The equalities are justified by the chain rule.) We also consider the conditional mutual information, defined as I(x; y|z) := H(x|z) − H(x|y, z), where z is another random variable.
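For concreteness, here is a small self-contained sketch (our own illustration, entropies in bits) computing these quantities from a joint pmf and checking the chain rule on a toy example.

```python
from math import log2

def H(p):
    """Shannon entropy of a pmf given as {outcome: probability}."""
    return sum(-pr * log2(pr) for pr in p.values() if pr > 0)

def marginal(joint, axis):
    m = {}
    for (x, y), pr in joint.items():
        key = x if axis == 0 else y
        m[key] = m.get(key, 0.0) + pr
    return m

def conditional_entropy(joint):
    """H(x|y) = H(x, y) - H(y), by the chain rule."""
    return H(joint) - H(marginal(joint, 1))

def mutual_information(joint):
    """I(x; y) = H(x) - H(x|y)."""
    return H(marginal(joint, 0)) - conditional_entropy(joint)

# Tiny check on a correlated pair of bits.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
assert abs(H(joint) - (conditional_entropy(joint) + H(marginal(joint, 1)))) < 1e-12
print(mutual_information(joint))  # ~0.278 bits
```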
Conditional entropy, mutual information and conditional mutual information all satisfy the data-processing inequality: for any function f supported on Y (the support of y), we have I(x; f(y)) ≤ I(x; y). We will also use Fano's inequality, which makes precise the intuition that if y allows one to guess the value of x with good probability, then the conditional entropy H(x|y) cannot be too large.
Theorem 2.3 (Fano's Inequality). Let x be a random variable supported on X, y a random variable supported on Y and f : Y → X. Define p_err := Pr[f(y) ≠ x]. Then H(x|y) ≤ h(p_err) + p_err · log(|X| − 1), where h denotes the binary entropy function. When we wish to change the base of the logarithm with which the entropy or mutual information is computed, the desired base is subscripted. That is, H_q(x) := H(x)/log q, I_q(x; y) := I(x; y)/log q, and similarly for the conditional versions of these quantities. Finally, as a slight abuse of notation, for nonnegative reals x_1, ..., x_t with x_1 + ... + x_t ≤ 1 we also write H_q(x_1, ..., x_t) := Σ_i x_i log_q(1/x_i) + (1 − Σ_i x_i) log_q(1/(1 − Σ_i x_i)).
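A quick numerical illustration of Fano's inequality; since the displayed statement above is reconstructed, take this as the textbook form rather than the paper's exact formulation.

```python
from math import log2

def h2(p):
    """Binary entropy function."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def fano_bound(p_err, domain_size):
    """H(x|y) <= h(p_err) + p_err * log2(|X| - 1)."""
    return h2(p_err) + p_err * log2(domain_size - 1)

# A guesser that errs with probability 1% over a 16-element domain pins
# the conditional entropy below ~0.12 bits.
print(fano_bound(0.01, 16))
```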

Thresholds
We now introduce the specialized notations and tools that we will need in order to apply the machinery of [MRRZ+20]. First, for a distribution τ over F_q^b and a linear map A : F_q^b → F_q^c, we let Aτ denote the distribution of the random vector A u for u ∼ τ. In more detail, Aτ has the probability mass function Pr_{Aτ}[v] = Pr_{u∼τ}[A u = v]. While we are generally concerned with understanding the probability that certain "bad sets" lie in our code, it is in fact more convenient to work with matrices. For a matrix M ∈ F_q^{n×b} and a code C ⊆ F_q^n, we say that C contains M if the columns of M are contained in C.
Every matrix is assigned a type, which is determined by the matrix's empirical row distribution, as follows:

Definition 2.4 (τ_M, dim(τ), M_{n,τ}). For a matrix M ∈ F_q^{n×b}, we define its type τ_M to be the empirical distribution of M's rows. That is, for all v ∈ F_q^b we have τ_M(v) = |{i ∈ [n] : the i-th row of M equals v}|/n. For a distribution τ on F_q^b, dim(τ) denotes the dimension of the span of τ's support, i.e., dim(τ) := dim(span(supp(τ))).
We denote by M_{n,τ} the set of all matrices in F_q^{n×b} with empirical row distribution τ. We call a type τ b-local if it is a distribution over F_q^b; note that a b-local type has dim(τ) ≤ b.

Remark. Technically, for a distribution τ over F_q^b it could be the case that M_{n,τ} is empty just because, for some v ∈ F_q^b, τ(v) ∉ {0, 1/n, 2/n, ..., (n − 1)/n, 1}. For such τ, we can define M_{n,τ} to consist of those matrices which contain either ⌊n · τ(v)⌋ or ⌈n · τ(v)⌉ copies of v. As we are always dealing with the setting where n is assumed to be sufficiently large compared to all other parameters, this does not affect the analysis. Hence, we may safely ignore this technicality, which we do for the clarity of exposition.
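The type and its dimension are straightforward to compute; the sketch below does so over F_2 (our own illustration — the paper works over general F_q, but F_2 keeps the rank computation elementary).

```python
from collections import Counter
from fractions import Fraction

def type_of(M):
    """tau_M(v) = (number of rows of M equal to v) / n, with M a list of rows."""
    n = len(M)
    return {v: Fraction(c, n) for v, c in Counter(map(tuple, M)).items()}

def dim_of_type(tau):
    """dim(tau) = dim(span(supp(tau))), computed over F_2 by bitmask elimination."""
    basis = {}  # leading-bit position -> reduced basis vector
    for v in tau:
        x = int(''.join(map(str, v)), 2)
        while x:
            lead = x.bit_length() - 1
            if lead in basis:
                x ^= basis[lead]
            else:
                basis[lead] = x
                break
    return len(basis)

M = [(0, 0, 1), (0, 0, 1), (1, 1, 0), (1, 1, 1)]  # a 4 x 3 matrix over F_2, as rows
tau = type_of(M)         # {(0,0,1): 1/2, (1,1,0): 1/4, (1,1,1): 1/4}
print(dim_of_type(tau))  # 2, since (1,1,1) = (0,0,1) + (1,1,0)
```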
Our target is an understanding of the threshold rate for a combinatorial property of random linear codes. The combinatorial properties that we will study are those that are defined by excluding a set of types, as follows.

Definition 2.5 (τ-freeness, local properties). Given a code C and a type τ, we say that C is τ-free if C does not contain any matrix M ∈ M_{n,τ}, i.e., no matrix M of type τ.

For a set T of types, where each τ is a distribution over F_q^b for some b ∈ N, we say that C is T-free if it is τ-free for all τ ∈ T. We refer to T-freeness as a b-local property of codes.
For a more in-depth discussion of the definition, we refer the reader to [MRRZ+20, Section 2] or [Res20, Chapter 3]. To provide some intuition, we demonstrate how (ρ, ℓ, L)-list-recoverability may be described as an L-local property. We define T to be the set of all types τ over F_q^L such that for some (correlated) random set S ∼ (F_q choose ℓ) we have Pr[u_i ∉ S] ≤ ρ for every i ∈ [L], and furthermore we require Pr_{u∼τ}[u_i = u_j] < 1 for all i ≠ j. (This second condition amounts to requiring that any matrix of type τ has distinct columns.) We refer to the collection of all these types as T_{ρ,ℓ,L}. We now characterize (up to o(1) terms) the threshold rate of a property.
Theorem 2.6 ([Res20], Theorem 3.3.9: Thresholds for Random Linear Codes). Fix b ∈ N and let T be a set of b-local types. Then the threshold rate for T-freeness is 1 − max_{τ∈T} min_A H_q(Aτ)/dim(Aτ), where the minimum is taken over all surjective linear maps A : F_q^b → F_q^{b′} with 1 ≤ b′ ≤ b.

Let us specialize to the case of τ-freeness for a single type τ. Suppose that R > 1 − min_A H_q(Aτ)/dim(Aτ). Theorem 2.6 tells us that it is unlikely that a RLC of rate R is τ-free. Stated differently, we can expect that there is at least one matrix of type τ contained in such an RLC. In fact, while we do not prove this, it is in fact likely that there will be many such matrices. For this reason, we use the following terminology for types τ satisfying R > 1 − min_A H_q(Aτ)/dim(Aτ): we call them abundant.
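For tiny parameters, the characterization can be evaluated by brute force. The sketch below enumerates all surjective maps A over F_2 and computes min_A H_2(Aτ)/dim(Aτ) for a 2-local toy type (a hypothetical example of ours; real applications of Theorem 2.6 bound this minimum analytically).

```python
from itertools import product
from math import log2

def entropy(p):
    return sum(-pr * log2(pr) for pr in p.values() if pr > 0)

def rank_f2(rows):
    """Rank over F_2 of a list of 0/1 tuples, by bitmask elimination."""
    basis = {}
    for r in rows:
        x = int(''.join(map(str, r)), 2)
        while x:
            lead = x.bit_length() - 1
            if lead in basis:
                x ^= basis[lead]
            else:
                basis[lead] = x
                break
    return len(basis)

def min_ratio(tau, b):
    """min over surjective A of H_2(A tau) / dim(A tau), brute force."""
    best = float('inf')
    for c in range(1, b + 1):
        for entries in product(range(2), repeat=b * c):
            A = [entries[i * b:(i + 1) * b] for i in range(c)]
            if rank_f2(A) < c:
                continue                              # A must be surjective
            push = {}
            for u, pr in tau.items():
                v = tuple(sum(A[i][j] * u[j] for j in range(b)) % 2
                          for i in range(c))
                push[v] = push.get(v, 0.0) + pr
            d = rank_f2([v for v in push if any(v)])  # dim(span(supp(A tau)))
            if d:
                best = min(best, entropy(push) / d)
    return best

tau = {(0, 0): 0.5, (1, 1): 0.5}  # mass 1/2 on 00 and on 11
print(min_ratio(tau, 2))          # 1.0; the threshold for tau-freeness is 1 - 1.0
```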
In proving an upper bound R_upper on the threshold rate for a property of interest (e.g., (ρ, ℓ, L)-list-recovery), we will follow these steps. First, we define an appropriate type τ and prove that a code satisfies the property of interest only if it is τ-free. Informally, we refer to this as a proof that τ is bad for the property of interest. Next, we show that for RLCs of rate R_upper, the type τ is abundant. This is the more challenging part of the theorem, as the minimization over the set of all linear maps A is quite challenging to control. Nonetheless, we are able to carry out this program for (ρ, ℓ, L)-list-recovery, as advertised.
In proving a lower bound R_low on the threshold rate for a property of interest (e.g., (ρ, 3)-list-decoding), we need to consider any type that is bad for list-decoding, and then show that it is implicitly rare: that is, for some matrix A, there are relatively few matrices of type Aτ, and hence it is likely no matrix of that type lies in the RLC. That is, we must upper bound the ratio of the entropy of Aτ to the dimension of Aτ. Here, we have the freedom to choose A, but the argument must apply to all types τ. This is especially tricky when given a type τ whose support is contained in a strict subspace, as then the bound on the entropy must be commensurately smaller. It is for this reason that we only consider small values of L, as one suffers from a combinatorial explosion in the number of possible support spaces for the types.
Thresholds for Random Codes. For thresholds of random codes, the characterization theorem is simpler in the sense that we do not have to minimize over compressive mappings, at least if the property satisfies certain technical conditions. Fortunately, the characterization applies to list-recoverability, and hence also list-decodability.
List-Recovery Lower Bounds

Throughout this section, we fix a prime power q, an integer ℓ < q, a radius ρ ∈ (0, 1 − ℓ/q), and a slack parameter δ > 0. All these parameters are constants, independent of the growing parameter n. Our main result in this section is the following theorem.
Theorem 3.1. There exists ε_{q,ℓ,ρ,δ} > 0 such that for all 0 < ε < ε_{q,ℓ,ρ,δ} and n sufficiently large, a random linear code in F_q^n of rate R = 1 − h_{q,ℓ}(ρ) − ε is whp not (ρ, ℓ, L)-list-recoverable for any L ≤ (log_q (q choose ℓ) − R − δ)/ε.

The proof of this theorem follows the same outline as has been used in, e.g., [GLM+21]. Namely, we begin by defining an L-local type which we show is bad for (ρ, ℓ, L)-list-recovery. Later, we prove that the type is indeed abundant, which is the more challenging part of the theorem.
The bad L-local type is defined as follows.
Definition 3.2 (The bad type for (ρ, ℓ, L)-list-recoverability). Fix L ∈ N. Define the distribution τ over F_q^L via the following procedure for sampling a random vector u = (u_1, ..., u_L):
• First, S ∼ (F_q choose ℓ) is sampled uniformly at random;
• Second, for i = 1, ..., L, we sample u_i to be uniform over S with probability 1 − ρ and uniform over F_q \ S with probability ρ; conditioned on S = S, the coordinates u_1, ..., u_L are independent.
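A direct sampler for this type (a small sketch of the procedure just defined):

```python
import random

def sample_bad_type(q, ell, rho, L):
    """One draw u = (u_1, ..., u_L) from the bad type of Definition 3.2."""
    S = random.sample(range(q), ell)                  # uniform ell-subset of F_q
    complement = [a for a in range(q) if a not in S]
    return tuple(random.choice(S) if random.random() < 1 - rho
                 else random.choice(complement)       # leave S with prob. rho
                 for _ in range(L))

print(sample_bad_type(q=5, ell=2, rho=0.3, L=6))
```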
Note that such a type does indeed lie in the set T_{ρ,ℓ,L}. Indeed, taking the correlated random set to be S itself, we clearly have Pr[u_i ∉ S] = ρ for every i, and we also readily have Pr_{u∼τ}[u_i ≠ u_j] > 0 for i ≠ j. From [GMR+21], we conclude that τ is bad for (ρ, ℓ, L)-list-recovery.
We now claim that the type τ is indeed abundant, i.e., that it has sufficiently large (relative) entropy. This is the more technical part of the proof, and it is deferred to Section 3.1.

Lemma 3.3. There exists an integer L_{ρ,q,ℓ,δ} such that for all integers L ≥ L_{ρ,q,ℓ,δ}, the following holds. Let u ∼ τ, and let A ∈ F_q^{L′×L} with L′ ≤ L and rank(A) = L′. Then H_q(A u) ≥ L′·h_{q,ℓ}(ρ) + log_q (q choose ℓ) + h_{q,ℓ}(ρ) − 1 − δ ≥ L′·(h_{q,ℓ}(ρ) + (log_q (q choose ℓ) + h_{q,ℓ}(ρ) − 1 − δ)/L).

Assuming Lemma 3.3, we now show that this does indeed yield our target Theorem 3.1.
By Lemma 3.3, as L ≥ L_{ρ,q,ℓ,δ/2} we have that for all surjective linear maps A: H_q(Aτ)/L′ ≥ h_{q,ℓ}(ρ) + (log_q (q choose ℓ) + h_{q,ℓ}(ρ) − 1 − δ/2)/L. We note further that as τ has full support the same is true for Aτ, i.e., dim(Aτ) = L′. Thus, by Theorem 2.6, the threshold rate for τ-freeness is at most 1 − h_{q,ℓ}(ρ) − (log_q (q choose ℓ) + h_{q,ℓ}(ρ) − 1 − δ/2)/L + o(1) ≤ 1 − h_{q,ℓ}(ρ) − ε, where the last inequality holds for our choice of L and large enough n. In other words, a random linear code of rate 1 − h_{q,ℓ}(ρ) − ε contains a matrix M ∈ M_{n,τ} with probability 1 − o(1). As we know that a code C which contains a matrix of type τ is not (ρ, ℓ, L)-list-recoverable, our theorem is proved.

Proof of Lemma 3.3
In this section we prove Lemma 3.3.
Proof. Observe that the second inequality is trivial (it just uses that L ≥ L′), so we focus on the first one. First, note that by definition, for any i ∈ [L] we have H_q(u_i | S) = h_{q,ℓ}(ρ). On the other hand, H_q(u_i) = 1, as u_i is uniformly distributed over the randomness of S. Note that if B ∈ F_q^{L′×L′} and C ∈ F_q^{L×L} are any full-rank matrices then H_q(A u) = H_q(BAC u), so without loss of generality we may apply row operations and column permutations to A so that it has the form A = (I_{L′} | w^{(1)} | ... | w^{(k)}), where k := L − L′ and w^{(1)}, ..., w^{(k)} ∈ F_q^{L′}. When A has this form, we can write A u = u_{[L′]} + Σ_{j=1}^k α_j·w^{(j)}, where α_j := u_{L′+j}, and the sets J_1, J_2, ..., J_{k+1} form a partition of [L′] in which J_i collects the coordinates whose first dependence on a fresh coordinate is on α_i. Note that some of the sets J_i could be empty. We emphasize that if i ∉ supp(w^{(1)}) ∪ ... ∪ supp(w^{(k)}), then i ∈ J_{k+1}. Define u_{J_i} = (u_j)_{j∈J_i} and w^{(j)}_{J_i} to be the vector w^{(j)} restricted to the indices belonging to J_i. Thus, by definition, each component of w^{(i)}_{J_i} is nonzero. We also set α_{k+1} = 0, i.e., we define α_{k+1} ∈ F_q to be a random variable which is equal to 0 with probability 1.
For intuition, consider computing the entropy of the random variable A u ∈ F_q^{L′} by revealing the coordinates of J_1, then the coordinates of J_2, and so on. Every time we reveal the coordinates of a new set J_i, it will depend on a "fresh" coordinate u_{L′+i} of u, which did not influence (A u)_{J_1 ∪ ... ∪ J_{i−1}}. Thus, there is "new entropy" which we can lower bound, permitting us to incrementally lower bound the entropy of A u.
We now make the following claim. It allows us to conclude that, for coordinates in one of the J_i's with i < k + 1, the (marginal) entropy of the coordinate is strictly greater than h_{q,ℓ}(ρ) (after conditioning on S).

Claim 3.4. For any integers 1 ≤ i ≤ L and 1 ≤ j ≤ k and β ∈ F_q^×, we have H_q(u_i + β·α_j | S) ≥ λ·h_{q,ℓ}(ρ) for some λ = λ_{ρ,q,ℓ} > 1.
So as not to distract from the flow of the proof, we defer the proof of Claim 3.4 to Appendix A. We now split the proof into two cases, depending on the maximum size of the sets J_1, ..., J_{k+1}.

Case 1: max_{i∈[k+1]} |J_i| ≤ L(λ − 1)h_{q,ℓ}(ρ).

In this case, we do not expect any of the (A u)_{J_i}'s to have particularly large entropy. So we can lower bound the entropy of A u "step-by-step", lower bounding the additional entropy after revealing each of the (A u)_{J_i}'s one at a time. Claim 3.4 allows us to guarantee that we have a sufficiently large increase in entropy. We begin by applying the chain rule and the definition of mutual information to expand H_q(A u); iterating this argument expresses H_q(A u) as a sum of conditional entropy and mutual information terms, one per part, where in the final equality we use the fact that J_1, ..., J_{k+1} form a partition of [L′] and hence u_{J_1}, ..., u_{J_{k+1}}, α_1, ..., α_{k+1} determine A u. Now, we manipulate the mutual information terms in this summation: for each 1 ≤ i ≤ k we lower bound the term corresponding to J_i, and when i = k + 1 we wish to lower bound the entropy term H_q(u_{J_{k+1}}). For convenience, we relabel the random vector (A u)_{J_i} as (x_1 + y_1 z, ..., x_d + y_d z), where y_1, ..., y_d are fixed nonzero elements of F_q; x_1, ..., x_d are, conditioned on S, mutually independent random variables, each uniform over S with probability 1 − ρ and uniform over F_q \ S otherwise; and z is sampled as the other x_i's.
Recall that, in this case, d ≤ L(λ − 1)h_{q,ℓ}(ρ). We now proceed to lower bound the entropy H_q(u_{J_{k+1}}); for convenience, relabel the random vector u_{J_{k+1}} as (x_1, x_2, ..., x_d). Summing the contributions of all the parts then yields the desired lower bound on H_q(A u).

Case 2: max_{i∈[k+1]} |J_i| > L(λ − 1)h_{q,ℓ}(ρ).

For some d_{ρ,q,ℓ,δ} to be chosen later, if we require L ≥ d_{ρ,q,ℓ,δ}/((λ − 1)h_{q,ℓ}(ρ)), this implies that there exists some i ∈ [k + 1] with |J_i| > d_{ρ,q,ℓ,δ}. We will show that the entropy in these coordinates already guarantees that we have a sufficiently large increase in the entropy, even when we use a relatively simple lower bound on the entropy of the other parts. Assuming i = 1 (which is almost without loss of generality), we do this by demonstrating that u_{J_1} is informative enough to let us guess the set S with very good probability. Fano's inequality (Theorem 2.3) then implies that u_{J_1} has large entropy. The details follow.
It is useful to consider two subcases.
Subcase 1: i ≠ k + 1. To ease notation, we may reorder indices so that i = 1. Analogously to the expansion in Case 1 (but now expanding in the opposite direction), we write H_q(A u) as a sum over the parts. We begin by studying the terms in the summation with i > 1; for each such i, the data-processing inequality implies a suitable lower bound. We then consider the i = 1 term and seek an effective lower bound. This again corresponds to lower bounding H_q(x_1 + y_1 z, ..., x_d + y_d z), where the x_i's, y_i's and z are defined as above. Recall that we are assuming that d ≥ d_{ρ,q,ℓ,δ}. Once z is revealed, (x_1, x_2, ..., x_d) and (x_1 + y_1 z, x_2 + y_2 z, ..., x_d + y_d z) have the same entropy; this allows us to pass to lower bounding the entropy of (x_1, ..., x_d) together with z. We lower bound the first of the resulting terms as follows.
Here, the first equality uses the fact that z is conditionally independent of x_1, ..., x_d, given S. The second equality uses the fact that the x_i's are mutually conditionally independent given S, and each satisfies H_q(x_i | S) = h_{q,ℓ}(ρ).
Next, we look at the H_q(S|z) term. Recalling the distribution of z (it is one of the u_i's, relabeled), we may apply Bayes' rule for conditional entropy. It thus remains to upper bound H_q(S | x_1, x_2, ..., x_d), a task for which we use Fano's inequality, Theorem 2.3. In order to do this, we must find a function f : F_q^d → (F_q choose ℓ) so that p_err = Pr[f(x_1, ..., x_d) ≠ S] is very small. We define f in the most obvious way: f(x_1, ..., x_d) := {y_1, ..., y_ℓ} if y_1, ..., y_ℓ are the ℓ most frequent elements appearing in (x_1, ..., x_d) (breaking ties arbitrarily). For any α ∈ F_q, let c_α = |{i ∈ [d] : x_i = α}| be the random variable counting the number of x_i's taking on the value α.
Observe that the assumption ρ < 1 − ℓ/q is equivalent to (1 − ρ)/ℓ > ρ/(q − ℓ); that is, each element of S is strictly more likely to appear than any element outside S. By the Chernoff bound, we therefore have that for any S ∈ (F_q choose ℓ), α ∈ S and β ∉ S, the probability that c_β ≥ c_α is exponentially small in d. Thus, by applying the total probability rule and taking a union bound over all pairs (α, β) ∈ S × (F_q \ S), we may upper bound the probability of error p_err. Fano's inequality therefore yields that, for any δ > 0, there exists d_{ρ,q,ℓ,δ} such that if d ≥ d_{ρ,q,ℓ,δ} then H_q(S | x_1, ..., x_d) ≤ δ. Putting everything together, we obtain the desired lower bound on H_q(A u).
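The guessing function f and the exponential decay of p_err are easy to observe empirically; the following Monte Carlo sketch (our own, with hypothetical parameter choices) mirrors the argument.

```python
import random
from collections import Counter

def guess_S(xs, ell):
    """f(x_1, ..., x_d): the ell most frequent symbols (ties broken arbitrarily)."""
    return {a for a, _ in Counter(xs).most_common(ell)}

def empirical_p_err(q, ell, rho, d, trials=10000):
    """Estimate Pr[f(x_1, ..., x_d) != S] for the bad type of Definition 3.2."""
    errs = 0
    for _ in range(trials):
        S = random.sample(range(q), ell)
        comp = [a for a in range(q) if a not in S]
        xs = [random.choice(S) if random.random() < 1 - rho else random.choice(comp)
              for _ in range(d)]
        errs += guess_S(xs, ell) != set(S)
    return errs / trials

for d in (10, 30, 60):
    print(d, empirical_p_err(q=4, ell=2, rho=0.2, d=d))  # decays rapidly in d
```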
Subcase 2: i = k + 1. The proof is almost the same as in Subcase 1, except that we now want to prove H_q(x_1, ..., x_d) ≥ d·h_{q,ℓ}(ρ) + log_q (q choose ℓ) − δ, as α_{k+1} = 0 in this case, and z corresponds to α_{k+1}. Observe that this entropy equals H_q(x_1, ..., x_d | S) + H_q(S) − H_q(S | x_1, ..., x_d) = d·h_{q,ℓ}(ρ) + log_q (q choose ℓ) − H_q(S | x_1, ..., x_d). Applying Fano's inequality in the same manner as in the previous subcase to the last term, we conclude that H_q(S | x_1, ..., x_d) ≤ δ when d ≥ d_{ρ,q,ℓ,δ}. This completes the proof of this case, and therefore also the proof of the lemma.

List-recoverability lower bound for random codes
For context, we provide nearly matching upper and lower bounds for list-recovery for uniformly random codes. There is a similar result for list-recovery provided in [GMR+21], but it is not optimized for the case of capacity-approaching codes.
Theorem 3.5. There exists ε_{q,ℓ,ρ,δ} > 0 such that for all 0 < ε < ε_{q,ℓ,ρ,δ} and n sufficiently large, a random code in F_q^n of rate 1 − h_{q,ℓ}(ρ) − ε is not (ρ, ℓ, L)-list-recoverable for any L ≤ (log_q (q choose ℓ) − δ)/ε. On the other hand, for any ε > 0 and n sufficiently large, a random code in F_q^n of rate 1 − h_{q,ℓ}(ρ) − ε is whp (ρ, ℓ, ⌈log_q (q choose ℓ)/ε⌉ + 1)-list-recoverable.

In this way, we can essentially pin down the list size of a rate 1 − h_{q,ℓ}(ρ) − ε random code to one of three possible values. This is similar to the result on the list-decodability of binary random linear codes from [GLM+21].
Observe that if we want to prove a lower bound on the threshold rate of a random code instead of a random linear code, we can restrict to the case that the matrix A from Lemma 3.3 is the identity matrix. Thus, we are in the setting k = 0, and so we are in Subcase 2 where i = k + 1 = 1. We may reuse the lower bound on the entropy H_q(x_1, ..., x_L) from this case, yielding the following lemma.

Lemma 3.6. There exists an integer L_{ρ,q,ℓ,δ} such that for all integers L ≥ L_{ρ,q,ℓ,δ}, the following holds. Let u ∼ τ. Then H_q(u) ≥ L·h_{q,ℓ}(ρ) + log_q (q choose ℓ) − δ.

An argument analogous to that of the proof of Theorem 3.1 yields the following corollary.
Corollary 3.7. There exists ε_{q,ℓ,ρ,δ} > 0 such that for all 0 < ε < ε_{q,ℓ,ρ,δ} and n sufficiently large, a random code in F_q^n of rate 1 − h_{q,ℓ}(ρ) − ε is whp not (ρ, ℓ, L)-list-recoverable for any L ≤ (log_q (q choose ℓ) − δ)/ε.

We proceed to pin down the threshold rate of list-recovery of random codes by showing an upper bound.

Lemma 3.8. For any τ ∈ T_{ρ,ℓ,L}, we have H_q(τ) ≤ L·h_{q,ℓ}(ρ) + log_q (q choose ℓ).
Clearly, the combination of Lemmas 3.6 and 3.8 yields our target, Theorem 3.5.
Proof. It suffices to prove an upper bound on H_q(τ) for any τ ∈ T_{ρ,ℓ,L}. In particular, membership in T_{ρ,ℓ,L} means that for some correlated ν ∼ (F_q choose ℓ) we have Pr[u_i ∉ ν] ≤ ρ for all i ∈ [L]. Note that H_q(τ) ≤ H_q(τ | ν) + H_q(ν). We turn to upper bound H_q(τ | ν). Let (u, S) be jointly distributed according to (τ, ν) with u = (u_1, ..., u_L); conditioned on S, each u_i lies in S with probability at least 1 − ρ, so summing over the coordinates gives H_q(τ | ν) ≤ L·h_{q,ℓ}(ρ). The last inequality is due to the fact that h_{q,ℓ}(ρ) is the maximum entropy of a single coordinate subject to this constraint. The proof is completed.

List-Decoding with Small Lists
In this section, we investigate the list-decodability of random codes and random linear codes with constant list sizes. Specifically, for list-of-3 decoding over the binary field, we can show that the threshold rate for list-decoding of random linear codes is strictly better than that for list-decoding uniformly random codes. Further, for larger field sizes we are able to show that the threshold rate for list-of-2 decoding is strictly better for random linear codes than for uniformly random codes. This extends the result of [GMR+21], which only applies to list-of-2 decoding for binary codes.
For our lower bound on the threshold rates for RLCs, we follow the following procedure. First, we consider any type that is bad for, e.g., (ρ, 3)-list-decoding, i.e., a type from T_{ρ,1,3}. For any such type τ, we upper bound H_q(Aτ)/dim(Aτ) for the linear map A sending (x_1, x_2, x_3) → (x_1 − x_3, x_2 − x_3). This is straightforward when dim(Aτ) is full (requiring essentially only the concavity of the entropy function); when it is smaller, more careful reasoning is required.
As a final contribution, we recall that in [GLM+21] it is shown that over the binary field the threshold rate for random linear codes is strictly better than that for random codes in the capacity-approaching regime. We observe that their techniques can be extended to show that such a trend holds for any constant list size L (assuming the decoding radius ρ is not too large). To do this, we first prove a lower bound on the threshold rate of binary random linear codes by applying the argument in [LW18], and an upper bound on the threshold rate of binary random codes following the argument in [GLM+21]. Although our proof resorts to known techniques, such results were not stated before and greatly strengthen our belief that random linear codes perform better than random codes. In light of the available evidence, a reasonable conjecture would be that for all alphabet sizes, the threshold rate of random linear codes is strictly better than that of random codes.

List-of-3 Decoding for Binary Alphabet
In this section, we study the threshold rate for list-of-3 decoding binary codes. We recall that the Plotkin point for list-of-3 decoding binary codes, i.e., the maximum value of ρ for which (ρ, 4)-list-decoding with positive rate is possible, is 5/16 [ABP18]. Our main theorem is the following:

Theorem 4.1. Let ρ ∈ (0, 5/16). The threshold rate for (ρ, 4)-list-decoding a random linear code over F_2 is at least 1 − F/3, where F := max H_2(x_1, x_2) + 2x_1 + x_2 log_2 3 subject to x_1 + 2x_2 ≤ 4ρ, x_1 + x_2 ≤ 1 and x_1, x_2 ≥ 0.

Proof. Let τ ∈ T_{ρ,1,4}, which we recall means that τ is a distribution over F_2^4 and there is a distribution ν over F_2 for which Pr[u_i ≠ ν] ≤ ρ for all i ∈ [4], and furthermore Pr[u_i = u_j] < 1 for i ≠ j. Note that if we replace ν by z = MAJ(u), the majority vote of the coordinates, the disagreement probability can only decrease. Define the sets A_i to consist of those vectors at Hamming distance i from the majority of their coordinates, and set x_i := τ(A_i). This implies that x_1 + 2x_2 ≤ 4ρ. We also clearly have x_1 + x_2 ≤ 1 and x_1, x_2 ≥ 0; in the sequel, these two constraints are always assumed to hold for x_1, x_2. When dim(τ) = 4, the map A : (x_1, x_2, x_3, x_4) → (x_1 − x_4, x_2 − x_4, x_3 − x_4) together with the concavity of entropy yields H_2(Aτ)/dim(Aτ) ≤ F/3.
Otherwise, dim(τ) ≤ 3. If dim(τ) = 3, then (1, 1, 1, 1) belongs to the support of τ, and there are another two linearly independent vectors v_1, v_2 in its support. We note that it suffices to consider the linearly independent vectors so as to ensure that the matrix generated by τ has distinct columns. By symmetry, it suffices to consider vectors of weight 1 or weight 2, and it is clear that at least one of them must have weight 2. By symmetry, we assume v_1 = (1100). To generate distinct columns, the first and second components of v_2 must be different, and so must the third and fourth. This implies that v_2 = (0101) or v_2 = (1010). Due to the symmetry, we only need to consider the case v_1 = (1100), v_2 = (0101). Once the support of τ is determined, we find that it is exactly A_0 ∪ A_2. A simple calculation shows H_2(τ) ≤ H_2(x_2) + x_2 log_2 3, subject to x_2 ≤ 2ρ; as this bound clearly increases with x_2, we may take x_2 = 2ρ. To show the upper bound from the full-dimensional case is indeed larger, one can optimize on the boundary x_1 + 2x_2 = 4ρ: taking a derivative and solving for the critical point reduces to a quadratic equation in x_2. A (tedious) computation shows that the full-dimensional bound does dominate (h_2(2ρ) + 2ρ log_2 3)/3; see Figure 2. Now, we proceed to the case dim(τ) = 2. In this case, (1, 1, 1, 1) does not belong to the support of τ, and there are two linearly independent vectors v_1, v_2 in its support. By symmetry, the same argument shows that the only case is v_1 = (1100), v_2 = (0101). We conclude that H_2(τ) = H_2(0, x_2) + x_2 log_2 3, subject to x_2 ≤ 2ρ, and the same conclusion applies. The case dim(τ) = 1 would result in the matrix generated by τ not having distinct columns, so we can rule out this possibility. The proof is completed.
Next, for context, we consider the threshold rate for (ρ, 4)-list-decoding uniformly random codes.

Theorem 4.2. Let ρ ∈ (0, 5/16). The threshold rate for (ρ, 4)-list-decoding a random code over F_2 is 1 − (1 + F)/4, with F as in Theorem 4.1.

Proof. Let τ ∈ T_{ρ,1,4}, and again define the sets A_i and the masses x_i = τ(A_i); the same reasoning that we used in the proof of Theorem 4.1 tells us x_1 + 2x_2 ≤ 4ρ. Due to the concavity of the function f(x) = x log_2 x, we have H_2(τ) ≤ 1 + H_2(x_1, x_2) + 2x_1 + x_2 log_2 3. This means the threshold rate of (ρ, 4)-list-decoding a random code over F_2 is at least 1 − (1 + F)/4. On the other hand, let x_1 and x_2 be the values achieving the maximum defining F, and construct a distribution τ attaining these masses. It is easy to verify that such a τ achieves the maximum value, and thus this lower bound is indeed the threshold rate for (ρ, 4)-list-decoding a random code.
As (1 + F)/4 ≥ F/3 for all F ≤ 3, the lower bound on the threshold rate provided by Theorem 4.1 is greater than the exact value from Theorem 4.2. This demonstrates that random linear codes do indeed perform better.
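Since F is defined by a two-variable concave maximization, this comparison is easy to check numerically. The sketch below grid-searches F as reconstructed above and compares the two thresholds 1 − F/3 and 1 − (1 + F)/4; the exact objective follows our reconstruction of the theorem statements, so treat the values as illustrative.

```python
from math import log2

def H2(*xs):
    """H_2(x_1, ..., x_t) = sum_i x_i log(1/x_i) + (1 - sum) log(1/(1 - sum))."""
    rest = 1 - sum(xs)
    return sum(-x * log2(x) for x in (*xs, rest) if x > 0)

def F(rho, steps=1000):
    """max H_2(x1, x2) + 2 x1 + x2 log2(3)  s.t.  x1 + 2 x2 <= 4 rho,
    x1 + x2 <= 1, x1, x2 >= 0 (grid search)."""
    best = 0.0
    for i in range(steps + 1):
        x1 = i / steps
        for j in range(steps + 1 - i):
            x2 = j / steps
            if x1 + 2 * x2 <= 4 * rho:
                best = max(best, H2(x1, x2) + 2 * x1 + x2 * log2(3))
    return best

rho = 0.1
f = F(rho)
print(1 - f / 3, 1 - (1 + f) / 4)  # RLC bound exceeds RC threshold whenever f <= 3
```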

List-of-2 Decoding for Arbitrary Alphabets
We now study list-of-2 decoding over F_q for q ≥ 3. Here, the Plotkin point is to the best of our knowledge unknown, and we just prove our result for ρ < 1/3.

Theorem 4.3. Let ρ ∈ (0, 1/3). The threshold rate for (ρ, 3)-list-decoding a random linear code over F_q with q ≥ 3 is at least 1 − F/2, where F := max H_q(x_1, x_2) + x_1 log_q(3(q − 1)) + x_2 log_q((q − 1)(q − 2)) subject to x_1 + 2x_2 ≤ 3ρ, x_1 + x_2 ≤ 1 and x_1, x_2 ≥ 0.

Proof. Let τ ∈ T_{ρ,1,3}, and define the sets A_i and the masses x_i = τ(A_i) analogously to the proof of Theorem 4.1. Since the linear code is (ρ, 3)-list-decodable, by taking z = MAJ(u) we observe that x_1 + 2x_2 ≤ 3ρ (this is analogous to the argumentation from the proof of Theorem 4.1). Clearly, we also have the constraints x_1 + x_2 ≤ 1 and x_1, x_2 ≥ 0; in the remainder of the proof, these constraints are assumed to be satisfied.
For each distribution τ, we want to find τ′ = Aτ attaining min H_q(τ′)/dim(τ′). If dim(τ) = 3, the same argument as in Theorem 4.4 shows that H_q(τ)/dim(τ) ≤ (1 + F)/3. We now consider τ′ defined by the linear map (x, y, z) → (x − z, y − z). The kernel of this linear map is {(x, x, x) : x ∈ F_q}. Observe that the preimage of B_i under the linear map is exactly A_i, i.e., τ′(B_i) = τ(A_i). Thus, using the concavity of x log_q x, we have H_q(τ′) ≤ F. If dim(τ′) = 2, we therefore obtain H_q(τ′)/dim(τ′) ≤ F/2. This is smaller than the upper bound from the uncompressed case, as F/2 ≤ (F + 1)/3 for F ≤ 2. It remains to consider the case dim(τ′) = 1 under this linear map. We divide it into two cases.
Case 1: dim(τ) = 2. In this case, the support of τ must contain a nonzero element (a, a, a) of A_0. By the linearity of A_0, we may assume that some (b, c, d) ∉ A_0 also lies in the support of τ. First, we claim that b, c, d must be distinct. Otherwise, without loss of generality, we assume that b = c. Then the support of τ is contained in span_{F_q}{(a, a, a), (b, b, d)} ⊆ A_1 ∪ A_0, so the first two coordinates would always agree, contradicting the distinctness requirement. Thus, the support of τ′ is contained in {λ(b − d, c − d) : λ ∈ F_q} ⊆ B_0 ∪ B_2. This also implies that τ(A_1) = x_1 = 0, which leads to a smaller upper bound; its maximum value is attained at x_2 = 3ρ/2.

Case 2: dim(τ) = 1. The same argument as in Case 1 implies that the support of τ must contain an element (x, y, z) with x, y, z distinct. It is clear that τ(A_1) = x_1 = 0, and the same argument shows that H_q(τ′) ≤ H_q(0, x_2) + x_2 log_q((q − 1)(q − 2)), subject to x_2 ≤ 3ρ/2. We obtain the same function appearing in Case 1 and the same conclusion holds. It remains to compare the two upper bounds. For ρ < 1/3, plugging x_1 = 3ρ, x_2 = 0 into the first bound, we obtain max_{x_1+2x_2≤3ρ} [H_q(x_1, x_2) + x_1 log_q(3(q − 1)) + x_2 log_q((q − 1)(q − 2))]/2 ≥ [H_q(3ρ, 0) + 3ρ log_q(3(q − 1))]/2.
The proof is completed.
For context, we again consider random codes.

Theorem 4.4. Let ρ ∈ (0, 1/3). The threshold rate for (ρ, 3)-list-decoding a random code over F_q is 1 − (1 + F)/3, with F as in Theorem 4.3.
Again, by noting that (1 + F)/3 ≥ F/2 for all F ≤ 2, we conclude that random linear codes do indeed perform better: the lower bound on the threshold rate furnished by Theorem 4.3 is strictly greater than the exact threshold rate of Theorem 4.4.

List Decoding for Binary Alphabets with Larger Lists
In this subsection, we observe that the list-decodability of random linear codes is better than that of random codes over the binary field for any list size L.
We begin by stating our possibility result for random linear codes. The proof is an adaptation of the argument from [GHSZ02, LW18].
Theorem 4.5. For any fixed list size L and δ > 0, a random linear code over the binary field of rate 1 − h_2(ρ) − h_2(ρ)/(L − 1 − 2δ) − δ is (ρ, L)-list-decodable with probability 1 − 2^{−Ω_{δ,L}(n)}.

For space reasons we just show that a random linear code has positive probability of achieving the stated list-decodability, as is done in [GHSZ02]; for the "with high probability" result the ideas used by [LW21] apply.
Therefore, there exists v_i ∈ F_2^n such that S_{C_i} ≤ S_{C_{i−1}}^2. We continue in this manner to reach C_k with k = (1 − h_2(ρ) − 1/L − δ)n. On the other hand, we have that C_k is (ρ, L_max)-list-decodable, where L_max = max_{x∈F_2^n} L_{C_k}(x). We now bound L_max. Since L_{C_k}(x) = L_{C_k}(x + c) for any c ∈ C_k, we may average over the choice of center. Thus, we conclude that L_max ≤ L − 1. This completes the proof.
Next, we provide an upper bound on the list size of a random code.The proof uses the threshold framework.
From these two theorems, we note the following. If we let δ tend to 0, the upper bound provided by Theorem 4.6 is smaller than that provided by Theorem 4.5, as (3 + 1/(L − 1))·h_2(ρ) − h_2(2ρ − 2ρ²) < 1, assuming ρ is not too large.
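The inequality underlying this comparison can be checked directly; a short sketch (parameter choices hypothetical):

```python
from math import log2

def h2(x):
    return 0.0 if x in (0.0, 1.0) else -x * log2(x) - (1 - x) * log2(1 - x)

def gap(rho, L):
    """(3 + 1/(L-1)) h_2(rho) - h_2(2 rho - 2 rho^2); the comparison in the
    text requires this quantity to be below 1."""
    return (3 + 1 / (L - 1)) * h2(rho) - h2(2 * rho - 2 * rho ** 2)

for rho in (0.05, 0.10, 0.15):
    print(rho, [round(gap(rho, L), 3) for L in (4, 8, 16)])
```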

Theorem 2.7 ([GMR+21], Theorem 2: Thresholds for Random Codes). Let b ∈ N and let T be a set of b-local types. Let T′ be a convex approximation for T. Then the threshold rate for T-freeness is 1 − max_{τ∈T′} H_q(τ)/b.