A Study of Sparse Representation of Boolean Functions

ABSTRACT OF THE DISSERTATION
A STUDY OF SPARSE REPRESENTATION OF BOOLEAN FUNCTIONS
by
Yekun Xu
Florida International University, 2021
Miami, Florida
Professor Ning Xie, Major Professor
The Boolean function is one of the most fundamental computation models in theoretical computer science. The two most common representations of Boolean functions are the Fourier transform and the real polynomial form. Applying analytic tools under these representations to the study of Boolean functions has led to fruitful research in many areas, such as complexity theory, learning theory, inapproximability, pseudorandomness, metric embedding, property testing, threshold phenomena and social choice. In this thesis, we focus on sparse representations of Boolean functions in both the Fourier transform and the polynomial form, and obtain the following new results. A classical result of Rothschild and van Lint asserts that if every non-zero Fourier coefficient of a Boolean function $f$ over $\mathbb{F}_2^n$ has the same absolute value, namely $|\hat f(\alpha)| = 1/2^k$ for every $\alpha$ in the Fourier support of $f$, then $f$ must be the indicator function of some affine subspace of dimension $n - k$. Here we slightly generalize their result and show that Boolean functions whose Fourier coefficients take values in the set $\{-2/2^k, -1/2^k, 0, 1/2^k, 2/2^k\}$ are indicator functions of two disjoint affine subspaces of dimension $n - k$ or four disjoint affine subspaces of dimension $n - k - 1$. Our main technical tools are results from additive combinatorics which offer tight bounds on the affine span size of a subset of $\mathbb{F}_2^n$ when the doubling constant of the subset is small. For the polynomial representation of Boolean functions, we study the distribution of the number of non-zero coefficients of random Boolean functions. For a random Boolean


INTRODUCTION
The fundamental problem in complexity theory [AB09] is to determine which problems are easy to solve computationally and which ones require more time or space for computers to solve. During the last half century, many researchers have contributed to this area and developed fruitful theories, such as NP-completeness [Coo71, Lev73, GJ79], the PCP theorems for hardness of approximation [FGL+96, AS98, ALM+98], and parameterized complexity [FL87, ADF95, PY96, BFR98, DF12]. For a given problem, we may be able to design efficient algorithms, which yield upper bounds on its complexity, or to determine the least time needed to solve the problem, which yields lower bounds.
If the gap between the upper and lower bounds is asymptotically negligible, we have established a tight complexity result for the problem. For example, a classical result is that sorting n numbers using any comparison-based algorithm takes Θ(n log n) time.
Although different hardware may have different computational power, so that the same problem takes different amounts of time on different computers, we still have theoretical methods to measure complexity. For example, we could use the number of bit operations, the size of a circuit that computes the problem, or the simplicity of a representation of the function that computes the problem as the measure. Moreover, sometimes the model in question does not rely on a single computing device: the input data may be spread among multiple parties or computers. We can then measure the difficulty by the amount of communication between the parties, which leads to the area of communication complexity [Yao79, KN97].
Despite much success, after decades of intensive research, tight complexity bounds for numerous computational problems remain elusive. Of central importance to both computer science and mathematics is the well-known P versus NP problem: can every problem in NP be solved in time polynomial in the input size? All state-of-the-art algorithms for NP-complete problems still take exponential running time in the worst case.
A systematic approach to studying computational complexity may start by formally modelling computational problems in the most general way. Note that almost all problems that can be computed by a computer can be modelled well by one or a collection of Boolean functions.

Boolean functions
Boolean algebra, named after George Boole, is the subfield of mathematics that handles variables taking only two values, true or false, or equivalently 1 or 0, respectively. The Boolean cube of dimension n, $\{0,1\}^n$, can be considered as the set of all n-bit Boolean vectors, or binary strings. Every n-bit binary string $S \in \{0,1\}^n$ can also be identified with a subset of $[n] = \{1, 2, \ldots, n\}$, namely the set of positions at which S has a 1. A function $f : \{0,1\}^n \to \mathbb{R}$ defined on the Boolean cube assigns a value $f(S)$ to every input S.
For every function f defined on the Boolean cube, we may regard the input as n Boolean variables representing true or false, with arbitrary outputs. In computer science, we usually focus on Boolean functions, whose outputs are restricted to Boolean values as well. Naturally, we can represent such a function f as a circuit of logic gates that takes n Boolean variables as input and returns a single bit as output. The circuit may contain complicated multi-input gates, but basic 2-input AND and OR gates together with NOT gates suffice to represent every function.
A common alternative way to characterize a function is to give a polynomial that computes f. To evaluate a polynomial, we have to assign real numbers to the input bits.
The standard polynomial representation assigns 0 to false and 1 to true, so that multiplication corresponds to the logical AND gate. For each monomial, $\prod_{i \in S} x_i$ equals 1 if and only if all the $x_i$ with $i \in S$ are 1 (true).
Another widely used representation is the Fourier representation, which assigns 1 to false and $-1$ to true. In this representation, multiplication corresponds to the logical XOR gate. Hence for each monomial in the Fourier representation, $\prod_{i \in S} x_i$ equals 1 if and only if the number of $x_i$ with $i \in S$ that equal $-1$ (true) is even. By multiplying out the equations, we get the following two transformation formulas.

Relations between two representations
The two representations are essentially equivalent: they represent the same functions and can be easily converted into each other via the above formulas. Their coefficients, however, differ substantially, and the monomial basis of the Fourier representation is orthogonal, while the monomial basis of the standard polynomial representation is not. Studying the properties of one representation therefore provides a better understanding of the other.
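The conversion between the two coefficient systems can be made concrete. The transformation formulas themselves are not reproduced in this excerpt, so the sketch below is our own illustration rather than the thesis's formulas: it computes the standard (0/1) polynomial coefficients by Möbius inversion, computes the Fourier coefficients directly, and passes between the two via the substitutions $x_i = (1 - y_i)/2$ and $y_i = 1 - 2x_i$, where $x_i \in \{0,1\}$ and $y_i \in \{1,-1\}$. Inputs and monomials are encoded as bitmasks, and all helper names are hypothetical.

```python
# Illustrative sketch; all helper names here are hypothetical.
# Assumption: conversion via x_i = (1 - y_i)/2 and y_i = 1 - 2 x_i (not the thesis's own formulas).
from fractions import Fraction


def popcount(x):
    return bin(x).count("1")


def poly_coeffs(f, n):
    """Coefficients c[S] of the multilinear polynomial over 0 = false, 1 = true,
    by Moebius inversion: c[S] = sum over T subset of S of (-1)^{|S|-|T|} f(T)."""
    return [sum((-1) ** (popcount(S) - popcount(T)) * f[T]
                for T in range(2 ** n) if T & S == T)
            for S in range(2 ** n)]


def fourier_coeffs(f, n):
    """hat f(S) = 2^{-n} sum_x f(x) (-1)^{|S & x|}, i.e. inputs mapped to 1 = false, -1 = true."""
    return [Fraction(sum(f[x] * (-1) ** popcount(S & x) for x in range(2 ** n)), 2 ** n)
            for S in range(2 ** n)]


def fourier_from_poly(c, n):
    """Substitute x_i = (1 - y_i)/2: hat f(T) = (-1)^{|T|} sum over S superset of T of c[S]/2^{|S|}."""
    return [(-1) ** popcount(T) * sum(Fraction(c[S], 2 ** popcount(S))
                                      for S in range(2 ** n) if S & T == T)
            for T in range(2 ** n)]


def poly_from_fourier(fh, n):
    """Substitute y_i = 1 - 2 x_i: c[S] = (-2)^{|S|} sum over T superset of S of hat f(T)."""
    return [(-2) ** popcount(S) * sum(fh[T] for T in range(2 ** n) if T & S == S)
            for S in range(2 ** n)]


def sparsity(coeffs):
    return sum(1 for v in coeffs if v != 0)


AND3 = [1 if x == 0b111 else 0 for x in range(8)]   # AND of three bits
XOR3 = [popcount(x) % 2 for x in range(8)]           # parity of three bits
for name, f in (("AND", AND3), ("XOR", XOR3)):
    c, fh = poly_coeffs(f, 3), fourier_coeffs(f, 3)
    print(name, "polynomial sparsity:", sparsity(c), "Fourier sparsity:", sparsity(fh))
```

On three bits, AND has a single non-zero polynomial coefficient but 8 non-zero Fourier coefficients, while parity has 7 non-zero polynomial coefficients but only 2 non-zero Fourier coefficients, so the two notions of sparsity can differ sharply for the same function.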
In this thesis, we contribute to the understanding of the sparsity of both representations of Boolean functions. For the Fourier representation, we provide new results that, under specific circumstances, characterize the structure of the support of a function based only on the magnitudes of its Fourier coefficients. For the real polynomial representation, we give statistical results that improve the understanding of the distribution and concentration of the sparsity over all Boolean functions.

Introduction
One of the most fruitful approaches in functional analysis is to represent functions as sums of simple and well-structured objects, such as sine waves and polynomials. Such where $Lf(x) := f(Lx)$. Now consider the following two families of Boolean functions One can easily check that both $f_k$ and $g_k$ are indeed Boolean functions and the multisets of non-zero Fourier coefficients On the other hand, the Fourier dimension (the dimension of the subspace spanned by the vectors at which the function's Fourier coefficients are non-zero) of $f_k$ is 2, while the Fourier dimension of $g_k$ is 3. Since the Fourier spectrum transforms according to $(L^T)^{-1}$ when the function undergoes the linear transformation $L$, and an invertible linear map preserves the dimension of the span of the Fourier support, there is no invertible linear transformation $L$ that maps $f_k$ to $g_k$, i.e. they are not isomorphic to each other. Another such example is the class of address functions $f_n : \mathbb{F}_2^n \to \{-1, 1\}$, where $n = k + 2^k$ for some positive integer $k$, together with the class of functions $g_n : \mathbb{F}_2^n \to \{-1, 1\}$ formed by tensoring some bent function on $2k$ bits with a δ-function on $n - 2k$ bits. Both $f_n$ and $g_n$ have $2^{2k}$ non-zero Fourier coefficients, with $2^{2k-1} + 2^{k-1}$ of them taking the value $1/2^k$ and $2^{2k-1} - 2^{k-1}$ of them taking the value $-1/2^k$; moreover, since the Fourier dimension of $f_n$ is $n$ while the Fourier dimension of $g_n$ is $2k < n$, these two functions are not isomorphic to each other.
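The spectral claims for the address function can be checked numerically for small $k$. The sketch below computes Fourier coefficients with a fast Walsh-Hadamard transform, using a bitmask encoding and the normalization $\hat f(\alpha) = 2^{-n}\sum_x f(x)(-1)^{\langle \alpha, x\rangle}$. For $g_n$ we assume, as our own reading, that "tensoring with a δ-function" means tensoring with a function whose Fourier spectrum is a δ at 0, i.e. the inner-product bent function $(-1)^{\langle u, v\rangle}$ on $2k$ bits made constant in the remaining $n - 2k$ bits. All helper names are hypothetical.

```python
# Illustrative sketch; all helper names here are hypothetical.


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def gf2_rank(vectors):
    """Rank over F_2 of bitmask vectors (XOR basis with distinct leading bits)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def address_function(k):
    """(-1)^{y_a}: the low k bits of x form the address a, the next 2^k bits the data y."""
    n = k + 2 ** k
    vals = [(-1) ** (((x >> k) >> (x & ((1 << k) - 1))) & 1) for x in range(2 ** n)]
    return vals, n


def bent_times_constant(k):
    """(-1)^{<u, v>} with u = bits 0..k-1, v = bits k..2k-1; constant in the other bits.
    Assumption: this is our reading of 'bent function tensored with a delta-function'."""
    n = k + 2 ** k
    mask = (1 << k) - 1
    vals = [(-1) ** (bin(x & mask & (x >> k)).count("1") % 2) for x in range(2 ** n)]
    return vals, n


def spectrum_stats(vals, n):
    """(sparsity, #positive, #negative, {2^n |hat f|}, Fourier dimension)."""
    coeffs = walsh_hadamard(vals)
    support = [a for a, c in enumerate(coeffs) if c != 0]
    return (len(support),
            sum(1 for c in coeffs if c > 0),
            sum(1 for c in coeffs if c < 0),
            {abs(coeffs[a]) for a in support},
            gf2_rank(support))


for k in (2, 3):
    print(k, spectrum_stats(*address_function(k)), spectrum_stats(*bent_times_constant(k)))
```

For $k = 2$ both functions have 16 non-zero coefficients (10 equal to $+1/4$ and 6 equal to $-1/4$), but their Fourier dimensions are 6 and 4, respectively.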

Rothschild and van Lint Theorem
Rothschild and van Lint [RvL74] (see also Chapter 13, Lemma 6 in [MS77]) proved the following theorem.
Theorem 2.1.1. Let $n \ge 1$ and $0 \le k \le n$. Let $f = 1_S$ be the indicator function of a set $S \subseteq \mathbb{F}_2^n$ of size $|S| = 2^{n-k}$. If for every $\alpha \in \mathbb{F}_2^n$, $|\hat f(\alpha)|$ is equal to either zero or $1/2^k$, then $S$ is an affine subspace of dimension $n - k$.
In other words, the Rothschild and van Lint Theorem shows that, up to an invertible linear transformation, we have a complete characterization when the Fourier coefficients of a Boolean function all lie in the set $\{-1/2^k, 0, 1/2^k\}$: the Boolean function must be the indicator of some affine subspace of co-dimension $k$.
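For very small parameters, the theorem together with its easy converse (the indicator of an affine subspace of dimension $n-k$ has all Fourier coefficients in $\{0, \pm 1/2^k\}$) can be confirmed by exhaustive search. The sketch below enumerates every $S \subseteq \mathbb{F}_2^n$ of size $2^{n-k}$ for $n \le 4$; it is a sanity check under these tiny parameters, not a substitute for the proof, and its helper names are hypothetical.

```python
# Illustrative sketch; all helper names here are hypothetical.
from itertools import combinations


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def is_affine_subspace(points):
    """A non-empty set of bitmasks is an affine subspace iff s0 + S is closed under XOR."""
    s0 = next(iter(points))
    shifted = {s ^ s0 for s in points}
    return all((u ^ v) in shifted for u in shifted for v in shifted)


def check_rothschild_van_lint(n, k):
    """For every S in F_2^n with |S| = 2^(n-k): all |hat 1_S(alpha)| lie in {0, 1/2^k}
    exactly when S is an affine subspace."""
    size = 2 ** (n - k)
    for S in combinations(range(2 ** n), size):
        vals = [0] * 2 ** n
        for s in S:
            vals[s] = 1
        # walsh_hadamard returns 2^n * hat f, so |hat f| = 1/2^k  <=>  |entry| = 2^(n-k)
        flat = all(abs(c) in (0, size) for c in walsh_hadamard(vals))
        if flat != is_affine_subspace(S):
            return False
    return True


print(all(check_rothschild_van_lint(n, k) for n in range(1, 5) for k in range(n + 1)))
```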
A natural question is: how far can we extend such a nice characterization in terms of the values of the Fourier coefficients only? Following [GOS+11], for a rational number $x$, the granularity $\mathrm{gran}(x)$ of $x$ is defined to be the least nonnegative integer $k$ such that $2^k x$ is an integer. For a Boolean function, its granularity is known to be intimately correlated with its Fourier sparsity [GOS+11], that is, the number of its non-zero Fourier coefficients; see the discussion in Section 2.1.4 for more details. Therefore, one can view the Rothschild and van Lint Theorem as a characterization of $k$-granular Boolean functions with minimum support size (that is, $\hat f(0) = |\{x : f(x) = 1\}|/2^n = 1/2^k$).
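Since every Fourier coefficient of a Boolean function on $\mathbb{F}_2^n$ is an integer multiple of $2^{-n}$, its granularity is simply the exponent of its reduced denominator. A minimal sketch, assuming (as our own convention here) that the granularity of a function is the largest granularity among its Fourier coefficients; the helper names are hypothetical.

```python
# Illustrative sketch; all helper names here are hypothetical.
# Assumption: granularity of a function = max granularity of its Fourier coefficients.
from fractions import Fraction


def gran(x):
    """Least k >= 0 such that 2^k * x is an integer; defined only for dyadic rationals."""
    d = Fraction(x).denominator          # Fraction is always kept in lowest terms
    if d & (d - 1):
        raise ValueError(f"{x} is not a dyadic rational")
    return d.bit_length() - 1


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def function_granularity(vals, n):
    """Largest granularity among the Fourier coefficients hat f(alpha) = entry / 2^n."""
    return max(gran(Fraction(c, 2 ** n)) for c in walsh_hadamard(vals))


delta = [1] + [0] * 7                  # indicator of {0} in F_2^3
half_space = [1, 1, 1, 1, 0, 0, 0, 0]  # indicator of {x : x_3 = 0}
print(gran(Fraction(3, 8)), function_granularity(delta, 3), function_granularity(half_space, 3))
```

The indicator of a single point is 3-granular, while the indicator of a subspace of co-dimension 1 is 1-granular, matching the minimum-support interpretation above.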

Our results
In this chapter, we slightly generalize the Rothschild and van Lint Theorem to give a complete characterization of $k$-granular Boolean functions of support size $2^n \cdot 2/2^k = 2^{n-k+1}$.
Roughly speaking, our main theorem is the following.
Theorem 2.1.2 (Informal statement). For large enough integers $n \ge k$, if a Boolean function $f : \mathbb{F}_2^n \to \{0, 1\}$ has all its Fourier coefficients in the set $\{0, \pm 1/2^k, \pm 2/2^k\}$, then $f$ is the indicator function of a disjoint union of two affine subspaces of dimension $n - k$.
Our Main Theorem is based on the following Main Lemma, which deals with the general case $k \ge 5$, together with a case analysis for small values of $k$.
Lemma 2.1.3 (Main). Let $k \ge 5$ and $n \ge k$ be integers. Let $f : \mathbb{F}_2^n \to \{0, 1\}$ be a Boolean function such that $\hat f(0) = 1/2^{k-1}$ and all other Fourier coefficients are either zero or equal to $\pm 1/2^k$. Then $f$ is the indicator function of a disjoint union of two affine subspaces of dimension $n - k$.

Proof overview and our techniques
The original form of the Rothschild and van Lint Theorem was stated as a characterization of subspaces in affine geometry and projective geometry. For completeness, and more importantly because the first step in our proof of the Main Theorem follows a similar strategy, we present a slightly different proof using the notation of Fourier analysis.
A proof of the Rothschild and van Lint Theorem. We prove the theorem by induction on $n$. It is trivial to see that the theorem holds for $n = 1$ (for both $k = 0$ and $k = 1$). Let $n \ge 2$. Clearly there is nothing to prove for $k = 0$ and $k = n$, so we assume $0 < k < n$. Note that $\hat f(0) = |S|/2^n = 1/2^k$. By Parseval's identity, $\sum_\alpha \hat f(\alpha)^2 = \mathbb{E}[f^2] = \mathbb{E}[f] = 1/2^k$, which exceeds $\hat f(0)^2 = 1/2^{2k}$ since $k > 0$; hence there exists a non-zero $\alpha$ such that $\hat f(\alpha) = 1/2^k$ or $-1/2^k$. Assume that $\hat f(\alpha) = 1/2^k$; the case $\hat f(\alpha) = -1/2^k$ is similar. Apply an invertible linear transform $L$ that maps $\alpha$ to $e_1$, where $e_1$ stands for the standard basis vector $(1, 0, \ldots, 0)$. Since invertible linear transformations preserve both the multiset of Fourier coefficient values and the property of being an affine subspace, it suffices to argue about $g := Lf$. Now we have $\hat g(0) = \hat g(e_1) = 1/2^k$. Apply a linear restriction on the first bit of the input to get the sub-functions $g_0$ and $g_1$ (see Proposition 2.2.3 in Section 2.2.3 for details). By (2.2), $\hat g_1(0) = \hat g(0) - \hat g(e_1) = 0$; since $g_1$ is non-negative and $\hat g_1(0) = \mathbb{E}[g_1]$, this implies that $g_1$ is the zero function. Hence the support of $g$ is completely contained in the hyperplane $\{x : x_1 = 0\}$, where it coincides with the support of $g_0$; moreover, by (2.3), $\hat g_0(\beta) = 2\hat g(0, \beta)$ for every $\beta \in \mathbb{F}_2^{n-1}$. In other words, $g_0$ is a Boolean function over $\mathbb{F}_2^{n-1}$ of support size $2^{n-k} = 2^{(n-1)-(k-1)}$, and $|\hat g_0(\beta)|$ is equal to either zero or $1/2^{k-1}$; therefore the induction hypothesis applies to $g_0$. It follows that $S$ is an affine subspace of dimension $n - 1 - (k - 1) = n - k$.
This completes the proof of Theorem 2.1.1.
Reducing the dimension of the function domain. The proof of the Main Theorem is much more involved than that of Rothschild and van Lint Theorem. In fact, the proof we we are ready to present our Main theorem more precisely.
Theorem 2.1.4 (Main). Let $k \ge 1$ and $n > k$ be two integers, and let $f : \mathbb{F}_2^n \to \{0, 1\}$ be a non-constant Boolean function with all its Fourier coefficients taking values in $\{0, \pm 1/2^k, \pm 2/2^k\}$. Then we have the following complete characterization:
• If $\hat f(0) = 1/2^k$, then $f$ is the indicator function of an affine subspace of dimension $n - k$ (Rothschild and van Lint Theorem);
• If $\hat f(0) = 1/2^{k-1}$ and $f$ is irreducible, then $f$ is either the indicator function of a disjoint union of two affine subspaces of dimension $n - k$, or the indicator function of a disjoint union of four affine subspaces of dimension $n - k - 1$. Moreover, the latter case is only possible when $k = 4$.
Back to our problem: since $\hat f(0) = 1/2^{k-1}$, it is easy to see that whenever there is a non-zero $\alpha$ such that $|\hat f(\alpha)| = 1/2^{k-1}$, we can restrict $f$ either to the subspace $\langle \alpha, x \rangle = 0$ or to the affine subspace $\langle \alpha, x \rangle = 1$ while keeping the entire support of $f$. We repeat this process until we reach a Boolean function $f$ with $\hat f(0) = 1/2^{k-1}$ and all other non-zero Green and Tao [GT09] proved that, when the underlying ambient group is $\mathbb{F}_2^n$, $B$ is contained in a subspace of size $2^{2K + O(\sqrt{K}\log K)}|B|$, which is asymptotically optimal. Unfortunately, such asymptotic "high end" bounds are not accurate enough to be useful for our problem. Instead, we make crucial use of a "low end" additive combinatorics result of Even-Zohar [EZ12], which provides tight bounds on the size of the affine span of $B$ in terms of its doubling constant. It is worth noting that all the aforementioned applications of additive combinatorics in theoretical computer science employ theorems regarding the asymptotic behavior of certain combinatorial objects. We hope researchers may find further applications of such "low end" additive combinatorics results elsewhere.

Motivations and related work
To the best of our knowledge, besides the work of Rothschild and van Lint, there is no previous structural result on Boolean functions in terms of the magnitudes of their Fourier coefficients only. Friedgut [Fri98] showed that if the total influence of a Boolean function is small, then it is close to some junta, a function that depends only on a bounded Of course, Corollary 2.1.5 is still very far from showing the desired kill number bound $\mathrm{poly}(k)$, as $m$ can be as large as $2^{k-1}$, but it is hoped that further investigations along this approach may lead to more interesting results.

Organization
The rest of the chapter is organized as follows. Preliminaries and notation used throughout the chapter are summarized in Section 2.2. We prove our Main Lemma, which deals with the cases when $k$ is at least 5, in Section 2.3, while the cases of small $k$ are discussed in Section 2.4. Then, by combining these two ingredients, we prove our Main Theorem in Section 2.5. Finally, we end with a brief section of conclusions and open questions.

Preliminaries
All logarithms in this chapter are to base 2. For a natural number $n \ge 1$, $[n]$ denotes the set $\{1, \ldots, n\}$. We use $\mathbb{F}_2$ for the field with 2 elements $\{0, 1\}$, where addition and multiplication are performed modulo 2. We view elements of $\mathbb{F}_2^n$ and $n$-bit binary strings, i.e. elements of $\{0,1\}^n$, interchangeably. If $x$ and $y$ are two $n$-bit strings, then $x + y$ (or $x - y$) denotes the bitwise addition (i.e. XOR) of $x$ and $y$. For positive integers $m$ and $n$, if $y \in \mathbb{F}_2^m$ and $z \in \mathbb{F}_2^n$, then we write $x = (y, z)$ to denote the binary string $x \in \mathbb{F}_2^{m+n}$ obtained by concatenating $y$ and $z$. We view $\mathbb{F}_2^n$ as a vector space equipped with an inner product $\langle x, y \rangle$, which we take to be the standard dot product: $\langle x, y \rangle = \sum_{i=1}^n x_i y_i$, where the sum is computed in $\mathbb{F}_2$.

Boolean functions and Fourier analysis
We often use $f$ to denote a real function defined on $\mathbb{F}_2^n$ and write $\mathrm{supp}(f) := \{x \in \mathbb{F}_2^n : f(x) \ne 0\}$ for the support of $f$. Sometimes we view $f$ as a $2^n$-dimensional vector, e.g. we write $f = 0$ and $f = 1$ to denote the trivial all-zero function and all-one function, respectively. In this chapter, a function $f$ is Boolean if its range is $\{0, 1\}$. For every $\alpha \in \mathbb{F}_2^n$, one can define a linear function (or parity function) $\ell_\alpha : \mathbb{F}_2^n \to \{0, 1\}$ as $\ell_\alpha(x) = \langle \alpha, x \rangle$. The functions $\chi_\alpha = (-1)^{\ell_\alpha}$ are commonly known as characters.
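For functions $f, g : \mathbb{F}_2^n \to \mathbb{R}$, the inner product is defined as $\langle f, g \rangle := \mathbb{E}_{x \in \mathbb{F}_2^n}[f(x)g(x)]$. For $\alpha = (\alpha_1, \ldots, \alpha_n) \in \mathbb{F}_2^n$, the corresponding character function $\chi_\alpha$ is defined as $\chi_\alpha(x) = (-1)^{\sum_{i=1}^n \alpha_i x_i}$. For $\alpha, \beta \in \mathbb{F}_2^n$, the inner product between $\chi_\alpha$ and $\chi_\beta$ is 1 if $\alpha = \beta$, and 0 otherwise. Therefore the characters form an orthonormal basis for real-valued functions over $\mathbb{F}_2^n$, and we can expand any $f$ defined on $\mathbb{F}_2^n$ using $\{\chi_\alpha\}_{\alpha \in \mathbb{F}_2^n}$ as a basis.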
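The Fourier coefficient of $f$ at $\alpha$ is $\hat f(\alpha) := \langle f, \chi_\alpha \rangle$, and the Fourier inversion formula is given by $f(x) = \sum_{\alpha \in \mathbb{F}_2^n} \hat f(\alpha)\chi_\alpha(x)$. The Fourier sparsity of $f$, denoted by $\|\hat f\|_0$ or $\mathrm{spar}(f)$, is the number of non-zero Fourier coefficients of $f$.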

Fourier characterization of Boolean functions
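Our proof crucially relies on the following characterization of Boolean functions in terms of their Fourier spectra. We give a proof for completeness.
Proposition 2.2.2. A function $f : \mathbb{F}_2^n \to \mathbb{R}$ is Boolean if and only if, for every $\gamma \in \mathbb{F}_2^n$,
$$\hat f(\gamma) = \sum_{\beta \in \mathbb{F}_2^n} \hat f(\beta)\,\hat f(\gamma + \beta). \qquad (2.1)$$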
Proof. This follows from the fact that $f$ is Boolean if and only if $f^2(x) - f(x) = 0$ for every $x$. Now expand the left-hand side in terms of Fourier coefficients, using that the Fourier coefficient of $f^2$ at $\gamma$ is $\sum_{\beta} \hat f(\beta)\hat f(\gamma + \beta)$, and notice that, since the right-hand side is the zero function, all of its Fourier coefficients are zero.
Comparing each pair of corresponding Fourier coefficients on both sides gives the desired equality.
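The identity $\hat f(\gamma) = \sum_{\beta} \hat f(\beta)\hat f(\gamma + \beta)$ for every $\gamma$ is just the statement $f^2 = f$ read coefficient by coefficient. The sketch below tests it with exact rational arithmetic for every Boolean function on $\mathbb{F}_2^3$ and for a couple of non-Boolean functions; it is a brute-force illustration for tiny $n$, and its helper names are hypothetical.

```python
# Illustrative sketch; all helper names here are hypothetical.
from fractions import Fraction
from itertools import product


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def fourier(vals, n):
    """Exact hat f(alpha) = 2^{-n} sum_x f(x) (-1)^{<alpha, x>}."""
    return [Fraction(c, 2 ** n) for c in walsh_hadamard(vals)]


def satisfies_boolean_identity(vals, n):
    """True iff hat f(g) = sum_b hat f(b) hat f(g ^ b) for every g (addition in F_2^n is XOR)."""
    fh = fourier(vals, n)
    size = 2 ** n
    return all(fh[g] == sum(fh[b] * fh[g ^ b] for b in range(size)) for g in range(size))


boolean_ok = all(satisfies_boolean_identity(list(v), 3) for v in product((0, 1), repeat=8))
print(boolean_ok, satisfies_boolean_identity([2, 0, 0, 0, 0, 0, 0, 0], 3))
```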

Linear restrictions
The following is a folklore theorem regarding the effect of linear restrictions on the Fourier spectrum of a function defined over the Boolean hypercube.
Proposition 2.2.3. Let f : F n 2 → R be a function defined on the Boolean hypercube.
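Let $f_0, f_1 : \mathbb{F}_2^{n-1} \to \mathbb{R}$ be the "sub-functions" obtained by restricting the first bit of the input to 0 and 1, respectively; that is, $f_0(y) := f(0, y)$ and $f_1(y) := f(1, y)$ for all $y \in \mathbb{F}_2^{n-1}$. Then the Fourier spectra of $f_0$ and $f_1$ satisfy, for all $\beta \in \mathbb{F}_2^{n-1}$,
$$\hat f_0(\beta) = \hat f(0, \beta) + \hat f(1, \beta), \qquad \hat f_1(\beta) = \hat f(0, \beta) - \hat f(1, \beta). \qquad (2.2)$$
Conversely, the Fourier spectrum of $f$ satisfies
$$\hat f(0, \beta) = \tfrac{1}{2}\big(\hat f_0(\beta) + \hat f_1(\beta)\big), \qquad \hat f(1, \beta) = \tfrac{1}{2}\big(\hat f_0(\beta) - \hat f_1(\beta)\big). \qquad (2.3)$$
Proof. We prove the first part of (2.3); the second part follows analogously, and (2.2) follows by solving (2.3) for $\hat f_0$ and $\hat f_1$. By the definition of the Fourier transform,
$$\hat f(0, \beta) = \frac{1}{2^n}\sum_{y \in \mathbb{F}_2^{n-1}} \big(f(0, y) + f(1, y)\big)(-1)^{\langle \beta, y \rangle} = \frac{1}{2}\big(\hat f_0(\beta) + \hat f_1(\beta)\big).$$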
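The restriction identities, in the form $\hat f_0(\beta) = \hat f(0,\beta) + \hat f(1,\beta)$ and $\hat f_1(\beta) = \hat f(0,\beta) - \hat f(1,\beta)$ together with their inverses, can be checked numerically. The sketch below stores the first coordinate in the most significant bit, so that $(x_1, y)$ has index $x_1 2^{n-1} + y$; this encoding and the helper names are our own assumptions.

```python
# Illustrative sketch; all helper names here are hypothetical.
# Assumption: the first coordinate x_1 is stored in the most significant bit.
import random
from fractions import Fraction


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def fourier(vals, n):
    """Exact hat f(alpha) = 2^{-n} sum_x f(x) (-1)^{<alpha, x>}."""
    return [Fraction(c, 2 ** n) for c in walsh_hadamard(vals)]


def check_restriction(vals, n):
    """Check the restriction identities and their inverses for every beta."""
    half = 2 ** (n - 1)
    fh = fourier(vals, n)
    f0h = fourier(vals[:half], n - 1)    # f_0(y) = f(0, y)
    f1h = fourier(vals[half:], n - 1)    # f_1(y) = f(1, y)
    for beta in range(half):
        at0, at1 = fh[beta], fh[half + beta]          # hat f(0, beta), hat f(1, beta)
        if f0h[beta] != at0 + at1 or f1h[beta] != at0 - at1:
            return False
        if at0 != (f0h[beta] + f1h[beta]) / 2 or at1 != (f0h[beta] - f1h[beta]) / 2:
            return False
    return True


rng = random.Random(0)
print(all(check_restriction([rng.randint(-3, 3) for _ in range(16)], 4) for _ in range(50)))
```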

Tensor product
The statement as well as the proof of the Main Theorem requires the standard notion of tensor products of functions.
Definition 2.2.4 (Tensor Product of Boolean Functions). Let $f : \mathbb{F}_2^{n_1} \to \{0, 1\}$ and $g : \mathbb{F}_2^{n_2} \to \{0, 1\}$ be two Boolean functions on $n_1$ and $n_2$ variables, respectively. Then the tensor product of $f$ and $g$, denoted by $f \otimes g$, is the Boolean function over $\mathbb{F}_2^{n_1 + n_2}$ such that $(f \otimes g)(x, y) = f(x) \cdot g(y)$ for all $x \in \mathbb{F}_2^{n_1}$ and $y \in \mathbb{F}_2^{n_2}$.
It is easy to verify the following fact.
Fact 2.2.5. If $h = f \otimes g$ is the tensor product of two Boolean functions as defined above, then the Fourier spectrum of $h$ satisfies $\hat h(\alpha, \beta) = \hat f(\alpha) \cdot \hat g(\beta)$ for every $\alpha \in \mathbb{F}_2^{n_1}$ and $\beta \in \mathbb{F}_2^{n_2}$.
Given a Boolean function $f : \mathbb{F}_2^{n_1} \to \{0, 1\}$, two commonly used functions to tensor with $f$ are the all-one function $g_1 = 1$, whose Fourier spectrum is $\hat g_1(0) = 1$ and $\hat g_1(\alpha) = 0$ for every $\alpha \ne 0$; and the "δ-function" $g_2$ defined by $g_2(x) = 1$ if and only if $x = 0^{n_2}$, whose Fourier spectrum is $\hat g_2(\alpha) = 1/2^{n_2}$ for every $\alpha$. Note that tensoring $f$ with $g_1$ is equivalent to setting each of the $2^{n_2}$ sub-functions, defined by restricting $y$ to the different values in $\mathbb{F}_2^{n_2}$, to $f$; and tensoring $f$ with $g_2$ is equivalent to setting the sub-function with $y = 0$ to $f$ and all other sub-functions to the all-zero function.
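Fact 2.2.5 and the two spectra quoted for the all-one function and the δ-function can be confirmed on small examples. In the sketch below, a pair $(x, y)$ is stored at index $(x \ll n_2) \mid y$, so that $\langle (\alpha,\beta), (x,y) \rangle$ is computed by a single bitwise AND; this encoding and the helper names are our own.

```python
# Illustrative sketch; all helper names here are hypothetical.
import random
from fractions import Fraction


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def fourier(vals, n):
    """Exact hat f(alpha) = 2^{-n} sum_x f(x) (-1)^{<alpha, x>}."""
    return [Fraction(c, 2 ** n) for c in walsh_hadamard(vals)]


def tensor(f, n1, g, n2):
    """(f tensor g)(x, y) = f(x) g(y), where (x, y) is stored at index (x << n2) | y."""
    return [f[x] * g[y] for x in range(2 ** n1) for y in range(2 ** n2)]


def check_fact_2_2_5(f, n1, g, n2):
    fh, gh = fourier(f, n1), fourier(g, n2)
    hh = fourier(tensor(f, n1, g, n2), n1 + n2)
    return all(hh[(a << n2) | b] == fh[a] * gh[b]
               for a in range(2 ** n1) for b in range(2 ** n2))


n2 = 2
all_one = [1] * 2 ** n2
delta = [1] + [0] * (2 ** n2 - 1)
rng = random.Random(1)
f = [rng.randint(0, 1) for _ in range(8)]
print(check_fact_2_2_5(f, 3, all_one, n2), check_fact_2_2_5(f, 3, delta, n2))
print(fourier(all_one, n2), fourier(delta, n2))
```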

Invertible linear transformations and linear shifts
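Let $L : \mathbb{F}_2^n \to \mathbb{F}_2^n$ be an invertible linear transformation. If $f : \mathbb{F}_2^n \to \{0, 1\}$ is a Boolean function, then define $g := Lf$, the function obtained by applying the linear transformation $L$ to the input, that is, $g(x) = f(Lx)$ for every $x \in \mathbb{F}_2^n$. The Fourier spectrum of $g$ is given by $\hat g(\alpha) = \hat f\big((L^T)^{-1}\alpha\big)$ for every $\alpha \in \mathbb{F}_2^n$.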
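The transformation rule can be checked without inverting $L$: for $g(x) = f(Lx)$, the change of variables $y = Lx$ gives $\hat g(L^T\alpha) = \hat f(\alpha)$ for every $\alpha$, which is the same statement as $\hat g(\alpha) = \hat f((L^T)^{-1}\alpha)$. In the sketch below (hypothetical helper names), $L$ is stored by its columns as bitmasks and is drawn at random until it is invertible.

```python
# Illustrative sketch; all helper names here are hypothetical.
import random
from fractions import Fraction


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def fourier(vals, n):
    """Exact hat f(alpha) = 2^{-n} sum_x f(x) (-1)^{<alpha, x>}."""
    return [Fraction(c, 2 ** n) for c in walsh_hadamard(vals)]


def apply_matrix(cols, x):
    """L x over F_2, where cols[i] is the image of the (i+1)-th standard basis vector."""
    y = 0
    for i, c in enumerate(cols):
        if (x >> i) & 1:
            y ^= c
    return y


def apply_transpose(cols, a):
    """L^T a over F_2: coordinate i equals <cols[i], a> mod 2."""
    return sum((bin(c & a).count("1") % 2) << i for i, c in enumerate(cols))


def random_invertible(n, rng):
    while True:
        cols = [rng.randrange(2 ** n) for _ in range(n)]
        if len({apply_matrix(cols, x) for x in range(2 ** n)}) == 2 ** n:
            return cols


rng = random.Random(2)
n = 4
ok = True
for _ in range(20):
    cols = random_invertible(n, rng)
    f = [rng.randint(0, 1) for _ in range(2 ** n)]
    g = [f[apply_matrix(cols, x)] for x in range(2 ** n)]      # g = Lf
    fh, gh = fourier(f, n), fourier(g, n)
    ok = ok and all(gh[apply_transpose(cols, a)] == fh[a] for a in range(2 ** n))
print(ok)
```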

Additive combinatorics
Additive combinatorics is the subfield of mathematics concerned with subsets of the integers or, more generally, of abelian groups; it studies the interplay between the structural properties of a subset and combinatorial estimates associated with arithmetic operations on it.
Recently, additive combinatorics has found many applications in computer science; see the excellent exposition [Lov17] and the textbook [TV06] for comprehensive treatments.
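Throughout this chapter, $G$ is the abelian group $\mathbb{F}_2^n$ for some positive integer $n$, and the underlying field is $\mathbb{F}_2$. For a subset $A \subseteq G$ we write $2A = A + A := \{a + a' : a, a' \in A\}$ and call $\sigma[A] := |2A|/|A|$ the doubling constant of $A$; note that $A - A = A + A$ in $\mathbb{F}_2^n$. The following lemma of Łaba is useful for our proofs.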
Lemma 2.2.6 ([Łab01], Theorem 2.5). Let $G$ be an abelian group and let $A \subset G$ be a subset such that $|A - A| < \frac{3}{2}|A|$. Then $A - A$ is a subgroup of $G$.
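In $\mathbb{F}_2^n$ we have $A - A = A + A$, so the lemma can be tested exhaustively in tiny dimensions. The sketch below (hypothetical helper names) enumerates every non-empty $A \subseteq \mathbb{F}_2^3$ and, for those with $|A + A| < \frac{3}{2}|A|$, confirms that $A + A$ is a subgroup, i.e. a linear subspace. It only illustrates the statement and is not a proof.

```python
# Illustrative sketch; all helper names here are hypothetical.
from itertools import combinations


def sumset(A):
    """A + A in F_2^n (equal to A - A, since every element is its own inverse)."""
    return {a ^ b for a in A for b in A}


def is_subgroup(S):
    return 0 in S and all((u ^ v) in S for u in S for v in S)


def check_laba(n):
    """Every non-empty A in F_2^n with |A - A| < (3/2)|A| has A - A a subgroup."""
    for r in range(1, 2 ** n + 1):
        for A in combinations(range(2 ** n), r):
            D = sumset(A)
            if 2 * len(D) < 3 * len(A) and not is_subgroup(D):
                return False
    return True


print(check_laba(3))
```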

Proof of the Main Lemma
First recall our Main Lemma states the following.
Lemma 2.1.3. Let $k \ge 5$ and $n \ge k$ be integers. Let $f : \mathbb{F}_2^n \to \{0, 1\}$ be a Boolean function such that $\hat f(0) = 1/2^{k-1}$ and all other Fourier coefficients are either zero or equal to $\pm 1/2^k$. Then $f$ is the indicator function of a disjoint union of two affine subspaces of dimension $n - k$.
In Section 2.6, we compute the Fourier spectrum of a Boolean function supported on two disjoint affine subspaces of the same dimension whose Fourier spectra have minimum intersection. Our strategy for proving the Main Lemma is to show that if the Fourier coefficients of a Boolean function satisfy the condition prescribed in the Main Lemma, then its Fourier spectrum matches the one computed in Section 2.6.

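Let us define
$$A := \{\alpha \in \mathbb{F}_2^n : \hat f(\alpha) = 1/2^k\} \quad \text{and} \quad B := \{\alpha \in \mathbb{F}_2^n : \hat f(\alpha) = -1/2^k\}.$$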
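Without loss of generality, from now on we may assume $f(0) = 1$. We begin by calculating the cardinalities of sets $A$ and $B$. By Parseval's identity, $\sum_\alpha \hat f(\alpha)^2 = \mathbb{E}[f] = \hat f(0)$, that is, $\frac{4}{2^{2k}} + \frac{|A| + |B|}{2^{2k}} = \frac{2}{2^k}$, which gives $|A| + |B| = 2^{k+1} - 4$. On the other hand, by the Fourier inversion formula at $x = 0$, $1 = f(0) = \sum_\alpha \hat f(\alpha) = \frac{2}{2^k} + \frac{|A| - |B|}{2^k}$, which gives $|A| - |B| = 2^k - 2$. Therefore we have $|A| = 3(2^{k-1} - 1)$ and $|B| = 2^{k-1} - 1$.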

Some additive properties of sets A and B
We now study the additive properties of sets A and B.
where the inequality in the second-to-last line becomes an equality if and only if the following two conditions hold: 1) for every $1 \le j \le t$ with $j \ne i$, $\beta_i + \beta_j \in A$; and 2) there is no triangle of the form $(\beta_i, \alpha_j, \alpha_\ell)$. Hence the lemma follows.
Proof. This follows directly from Lemma 2.3.3 and the fact that sets $A$ and $B$ are disjoint.
These five elements must be distinct, as otherwise they would give rise to a triangle in $B$.
Let us define Note that $L$ and $R$ are disjoint and $A = L \cup R$. For any $\rho \in R$, let be the set of points in $B$ that have a triangle passing through $\rho$. Define a set $\Gamma \subset \mathbb{F}_2^n$ as Observe that $\Gamma$ is nonempty: for every $\rho \in R$, all its $\beta$-neighbors can be paired together, so $|N(\rho)|$ is even, whereas $|B| = 2^{k-1} - 1$ is odd.
It is easy to see that Γ is disjoint from the Fourier support of f .
Claim 2.3.7. For every element $\gamma \in \Gamma$, we have $\hat f(\gamma) = 0$.
Freiman's theorem asserts that if a set of integers has a small doubling constant, then the set is well-structured. Ruzsa [Ruz99] established an analog of Freiman's theorem for finite abelian groups with torsion $r$. Specifically, he proved that any subset $A$ with doubling constant $K$ is contained in a subgroup of $G$ of size at most $K^2 r^{K^4}|A|$. The question for the groups $\mathbb{F}_2^n$ was first studied by Green and Ruzsa [GR06], and the bound was later improved by Sanders [San08]. An asymptotically tight bound was first proved in [GT09] and [Kon08].

Proof. Recall that the support of $\hat f$ is
$$\mathrm{supp}(\hat f) = \{0\} \cup A \cup B. \qquad (2.5)$$

Characterizing 2B and span(B)
Note that the doubling constant of set $B$ satisfies (2.6), and recall that $t = 2^{k-1} - 1$. Therefore, when $k \ge 5$, $K = \sigma[B] \le 46/15$. Plugging this $K$ into (2.4) gives $s \le 5$ and consequently $F(K) \le 2K < 7$. That is, we have
The most important step in our proof is establishing the following lemma, which almost completely characterizes the structure of set $B$.
Lemma 2.3.9. If $k \ge 5$, then $|\mathrm{span}(B)| = 2^k = 2(|B| + 1)$ and $2B$ is a subspace of dimension $k - 1$.
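As a consistency check, the conclusions of Lemma 2.3.9 can be evaluated on the configuration of Section 2.6 (two disjoint affine subspaces of dimension $n-k$ whose orthogonal complements meet in $\{0, e_k\}$), taken here with the smallest ambient dimension $n = 2k-1$. Taking $B$ to be the set of $\alpha$ with $\hat f(\alpha) = -1/2^k$ and the doubling constant to be $|2B|/|B|$, the sketch below (hypothetical helper names) reports $|B|$, $|2B|$, $\sigma[B]$, $|\mathrm{span}(B)|$, whether $2B$ is closed under addition, and the dimension of $2B$.

```python
# Illustrative sketch; all helper names here are hypothetical.
from fractions import Fraction


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def gf2_rank(vectors):
    """Rank over F_2 of bitmask vectors (XOR basis with distinct leading bits)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def two_subspace_indicator(k):
    """Indicator of (e_k + V_1) union V_2 with n = 2k - 1, where V_1^perp = span(e_1..e_k)
    and V_2^perp = span(e_k..e_{2k-1}); coordinate i is stored in bit i - 1."""
    n = 2 * k - 1
    low = (1 << k) - 1
    vals = [int((x & low) == 1 << (k - 1) or ((x >> (k - 1)) & low) == 0)
            for x in range(2 ** n)]
    return vals, n


def b_statistics(k):
    vals, n = two_subspace_indicator(k)
    coeffs = walsh_hadamard(vals)                      # 2^n * hat f
    B = [a for a, c in enumerate(coeffs) if c == -(2 ** (n - k))]
    two_b = {b ^ c for b in B for c in B}              # 2B = B + B
    closed = all((u ^ v) in two_b for u in two_b for v in two_b)
    return (len(B), len(two_b), Fraction(len(two_b), len(B)),
            2 ** gf2_rank(B), closed, gf2_rank(two_b))


print(b_statistics(5))
```

For $k = 5$ this gives $|B| = 15$, $|2B| = 16$, so $\sigma[B] = 16/15$, comfortably below the bound $46/15$, with $|\mathrm{span}(B)| = 32 = 2^k$ and $2B$ a subspace of dimension $k - 1 = 4$.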
We prove Lemma 2.3.9 in the following two subsections, distinguishing between the case when B is an affine subspace and the case when B is a subspace.

If B is an affine subspace
In the case that $B$ is an affine subspace, let $B = a + H$ be the affine subspace, where so $\Gamma \subseteq a + H$ and is disjoint from set $A$. It follows that for any $\gamma \in \Gamma$, $\hat f(\gamma) = 0$ (this also follows directly from Claim 2.3.7). However, consider applying Proposition 2.2.2 to $\hat f(\gamma)$: by the definition of set $\Gamma$, $\gamma = \rho + \beta$ with $\rho \in A$, $\beta \in B$ and $\beta \notin N(\rho)$. Hence there is at least one negative term on the right-hand side of (2.1) for $\hat f(\gamma)$, but since both $2B$ and $2A$ are disjoint from $\Gamma$, there is no positive term on the right-hand side of (2.1), a contradiction.
In the following we exclude the first case.

Completing the proof of the Main Lemma
By Lemma 2.3.9, $2B$ is a subspace of dimension $k - 1$; without loss of generality, we may assume that $H = 2B = \mathrm{span}(e_1, \ldots, e_{k-1})$. (γ)f (e k + γ) where equality holds in the second-to-last line only if $\hat f(e_k + \lambda) = 1/2^k$ for every $\lambda \in L$, that is, $e_k + \lambda \in A\ (= L \cup R)$. As each element of $R$ has already been taken into account in the first summation in the second line, we necessarily have $e_k + \lambda \in L$.
Claim 2.3.13. For any $\lambda \in L$ and $\rho \in R$, $\hat f(\lambda + \rho) = 0$.
Proof. Applying Proposition 2.2.2 to $\hat f(\rho)$, where $\rho$ is an arbitrary element of $R$, we have
where we have a factor of $(t - 1)$ in the second line because $\rho + e_k \in B$, and equality holds in the last line only if $\hat f(\lambda + \rho) = 0$ for every $\lambda \in L$ and every $\rho \in R$.
Proof. Applying Proposition 2.2.2 to $\hat f(\lambda)$, where $\lambda$ is an arbitrary element of $L$, we have
where equality holds in the second-to-last line only if $\lambda + \lambda' \in L$ for every $\lambda' \in L$, except when $\lambda'$ is equal to $\lambda$ or $\lambda + e_k$. (The second term vanishes because the only triangles passing through a point $\beta_i \in B$ are of the type $(\beta_i, \beta_j, \rho)$ with $\rho \in R$; the third term vanishes because of Claim 2.3.13.) Putting Claim 2.3.12, Claim 2.3.13 and Claim 2.3.14 together, and since $|L| = 2^k - 2$, we conclude that $H' := L \cup \{0, e_k\}$ is a subspace of dimension $k$. Moreover, as $\mathrm{span}(B) = \mathrm{span}(e_1, \ldots, e_k)$ is a subspace of dimension $k$ and $L \cap \mathrm{span}(B) = \emptyset$, we have $H' \cap \mathrm{span}(B) = \{0, e_k\}$. Therefore, without loss of generality, we may take $H' = \mathrm{span}(e_k, \ldots, e_{2k-1})$ and consequently obtain
$$L = \mathrm{span}(e_k, \ldots, e_{2k-1}) \setminus \{0, e_k\}. \qquad (2.10)$$
It is straightforward to check that the Fourier spectrum calculated in Section 2.6 for a disjoint union of two affine subspaces of dimension $n - k$ is identical to the Fourier spectrum of $f$, which is completely specified by the sets in (2.8), (2.9) and (2.10). Therefore the proof of the Main Lemma is complete.
Without loss of generality, assume $\hat f(\beta + \alpha_1) = 0$ and denote $\beta + \alpha_1$ by $\gamma$. Now applying Proposition 2.2.2 to $\gamma$ gives
where equality holds in the last line only if $\gamma = \alpha_2 + \alpha_3$, so that After taking an invertible linear transformation if necessary, we may take $\alpha_1 = e_1$, $\beta = e_1 + e_2$, $\alpha_2 = e_3$ and $\alpha_3 = e_2 + e_3$; it is then easy to verify that this is identical to the Fourier spectrum in (2.11) for the case $k = 2$.
Hence Lemma 2.3.9 is established and the rest of the proof is identical to that of the Main Lemma in Section 2.3.4 for the general k ≥ 5 case.

Proof of the case k = 4
First of all, it is easy to see that when $k = 4$, the indicator function of a disjoint union of two affine subspaces of dimension $n - k = n - 4$ is still a Boolean function with the desired Fourier spectrum, for every $n \ge 4$. Next we construct another Boolean function, which demonstrates that the Main Lemma is no longer valid for $k = 4$.
Construction 2.4.2. Let $G = \mathbb{F}_2^6$ with $e_1, \ldots, e_6$ as the standard basis and let $A, B \subset G$ be two disjoint subsets given as follows: $B = \{e_1, \ldots, e_6\} \cup \{\sum_{i=1}^6 e_i\}$ and $A = \{e_i + e_j : 1 \le i < j \le 6\} \cup \{\sum_{j \ne i} e_j : 1 \le i \le 6\}$. Clearly $A = 2B \setminus \{0\}$, $|B| = 2^{4-1} - 1 = 7$ and $|A| = \binom{7}{2} = 21 = 3|B|$, which satisfy the size requirements for $A$ and $B$ for $k = 4$. To see that sets $A$ and $B$ in Construction 2.4.2 satisfy all the additive properties imposed by Proposition 2.2.2, one can explicitly compute a "core" function $f_{CE} : \mathbb{F}_2^6 \to \mathbb{R}$ with $A \cup B \cup \{0\}$ as its Fourier support, and verify that it is indeed a Boolean function with $\mathrm{supp}(f_{CE}) = \{x \in \mathbb{F}_2^6 : \mathrm{wt}(x) \in \{0, 5, 6\}\}$. That is, $f_{CE}$ is equal to 1 on vectors of weights 0, 5 and 6, and is equal to 0 on all other vectors. Note that $\mathrm{supp}(f_{CE})$ consists of 8 distinct vectors and is a disjoint union of four affine subspaces of dimension $n - 4 - 1 = 1$ each. Moreover, it can be checked that $\mathrm{supp}(f_{CE})$ is not the union of any two disjoint affine subspaces of dimension 2.
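The claims about $f_{CE}$ can be verified mechanically. The sketch below (hypothetical helper names) builds the indicator of the vectors of weight 0, 5 and 6 in $\mathbb{F}_2^6$, checks that $\hat f_{CE}(0) = 1/8 = 1/2^{k-1}$ with $k = 4$ and that every other coefficient lies in $\{0, \pm 1/16\}$, recovers $A$ (coefficient $+1/16$) and $B$ (coefficient $-1/16$), and searches all 4-element subsets of the support for a split into two disjoint affine subspaces of dimension 2.

```python
# Illustrative sketch; all helper names here are hypothetical.
from itertools import combinations


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def weight(x):
    return bin(x).count("1")


def is_affine_subspace(points):
    """A non-empty set of bitmasks is an affine subspace iff s0 + S is closed under XOR."""
    s0 = next(iter(points))
    shifted = {s ^ s0 for s in points}
    return all((u ^ v) in shifted for u in shifted for v in shifted)


n = 6
support = [x for x in range(2 ** n) if weight(x) in (0, 5, 6)]
vals = [1 if weight(x) in (0, 5, 6) else 0 for x in range(2 ** n)]
coeffs = walsh_hadamard(vals)                          # 64 * hat f_CE
A = {a for a in range(1, 2 ** n) if coeffs[a] == 4}    # hat f_CE = +1/16
B = {a for a in range(2 ** n) if coeffs[a] == -4}      # hat f_CE = -1/16
two_way_splits = [Q for Q in combinations(support, 4)
                  if is_affine_subspace(Q) and is_affine_subspace(set(support) - set(Q))]
print(coeffs[0], len(A), len(B), len(two_way_splits))
```

The search finds no valid split, consistent with the claim that $\mathrm{supp}(f_{CE})$ is not a union of two disjoint affine subspaces of dimension 2.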
Our next claim shows that, up to an invertible linear transformation, Construction 2.4.2 is essentially the only counterexample to the Main Lemma. Hence, the counterexample is possible only when the dimension of $\mathrm{span}(B)$ is at least 6. Without loss of generality, we may assume $B = \{e_i \mid 1 \le i \le 6\} \cup \{\beta\}$. We determine the vector $\beta$ next.
Note that every weight-2 vector $e_i + e_j$, $1 \le i < j \le 6$, is in $A$. On the other hand, since $|A| = \binom{|B|}{2}$, it follows that for every $\alpha \in A$ there exists a unique pair $\beta_i, \beta_j \in B$ such that $\beta_i + \beta_j = \alpha$. Combining these two facts, we conclude that no weight-3 vector of the form $e_i + e_j + e_l$, $1 \le i < j < l \le 6$, is in $B$, as it would give two ways to obtain a vector such as $e_i + e_j$ by adding two vectors from $B$, thus making $|A| < \binom{|B|}{2}$. By Claim 2.3.5, none of the weight-4 vectors can be in $B$ either, which leaves only the possibility of a weight-5 or weight-6 vector for $\beta$.
If $\beta$ is a weight-5 vector, without loss of generality, we may assume $\beta = \sum_{i=1}^5 e_i$.
Then $B$ would contain vectors of weight 1 and weight 5 only; consequently, $A$ would contain vectors of weight 2, 4 and 6 only. Now applying Proposition 2.2.2 to the vector $e_1 + e_2 + e_3$ yields $\hat f(e_1 + e_2 + e_3) < 0$, contradicting the fact that $\hat f(e_1 + e_2 + e_3) = 0$ as $e_1 + e_2 + e_3 \notin A \cup B$. Therefore, we have $\beta = \sum_{i=1}^6 e_i$, completing the proof of the claim.

Proof of the Main Theorem
Clearly, if $\hat f(0) = 1/2^k$, then, because $|\hat f(\alpha)| \le \hat f(0)$ for every $\alpha$ (as $f$ is non-negative), all non-zero Fourier coefficients of $f$ have absolute value $1/2^k$. Therefore, the Rothschild and van Lint Theorem applies and $f$ is the indicator function of an affine subspace of dimension $n - k$. Hence, from now on, we assume $\hat f(0) = 1/2^{k-1}$. The first step in our proof of the Main Theorem is to follow a procedure similar to the one employed in the proof of Theorem 2.1.1; that is, whenever possible, we reduce the values of $n$ and $k$ simultaneously. This proceeds as follows. Suppose there exists a non-zero $\alpha$ with $\hat f(\alpha) = 1/2^{k-1}$ or $-1/2^{k-1}$. Without loss of generality, assume that $\hat f(\alpha) = 1/2^{k-1}$.
Apply an invertible linear transform $L$ that maps $\alpha$ to $e_1$ and let $g := Lf$. Now we have $\hat g(0) = \hat g(e_1) = 1/2^{k-1}$. Apply the restriction on the first bit of the input to get the sub-functions $g_0$ and $g_1$. Then by (2.2), $\hat g_1(0) = \hat g(0) - \hat g(e_1) = 0$, which implies that $g_1 \equiv 0$. Hence the support of $g$ is completely contained in the support of $g_0$ (viewed inside the hyperplane $x_1 = 0$), and moreover, by (2.3), $\hat g_0(\beta) = 2\hat g(0, \beta)$ for every $\beta \in \mathbb{F}_2^{n-1}$. In other words, $g_0$ is a Boolean function over $\mathbb{F}_2^{n-1}$ and $|\hat g_0(\beta)|$ is equal to either zero, or $1/2^{k-1}$, or $1/2^{k-2}$. That is, by performing a linear restriction, we reduce both the dimension $n$ and the parameter $k$ by one, so that the Main Theorem holds for Boolean functions over $\mathbb{F}_2^n$ as long as it holds for Boolean functions over $\mathbb{F}_2^{n-1}$. When we arrive at a point where such a linear restriction is no longer possible (equivalently, $f$ is irreducible), $\hat f(0)$ is the only Fourier coefficient whose absolute value is $1/2^{k-1}$. Therefore, the Main Lemma for $k \ge 5$ or Lemma 2.4.1 for $2 \le k \le 4$ applies.
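Because $\hat f(\alpha) = 2^{-n}\sum_{x \in \mathrm{supp}(f)} (-1)^{\langle \alpha, x \rangle}$ for a Boolean $f$, we have $|\hat f(\alpha)| = \hat f(0)$ exactly when the parity $\langle \alpha, x \rangle$ is constant on $\mathrm{supp}(f)$; together with 0, these $\alpha$ form the subspace orthogonal to all differences of support points. So restricting until no such non-zero $\alpha$ remains amounts to passing to the affine hull of the support. The sketch below (hypothetical helper names; we assume "irreducible" means that no such non-zero $\alpha$ exists) checks this count on the two-subspace configuration of Section 2.6 with $k = 3$, before and after tensoring with a δ-function on one extra coordinate.

```python
# Illustrative sketch; all helper names here are hypothetical.
# Assumption: "irreducible" = no non-zero alpha with |hat f(alpha)| = hat f(0).


def walsh_hadamard(vals):
    """Unnormalized transform: out[a] = sum_x vals[x] * (-1)^{popcount(a & x)}."""
    out = list(vals)
    h = 1
    while h < len(out):
        for i in range(0, len(out), 2 * h):
            for j in range(i, i + h):
                u, v = out[j], out[j + h]
                out[j], out[j + h] = u + v, u - v
        h *= 2
    return out


def gf2_rank(vectors):
    """Rank over F_2 of bitmask vectors (XOR basis with distinct leading bits)."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)
        if v:
            basis.append(v)
    return len(basis)


def two_subspace_indicator(k):
    """Indicator of (e_k + V_1) union V_2 with n = 2k - 1 (coordinate i in bit i - 1)."""
    n = 2 * k - 1
    low = (1 << k) - 1
    vals = [int((x & low) == 1 << (k - 1) or ((x >> (k - 1)) & low) == 0)
            for x in range(2 ** n)]
    return vals, n


def constant_parities(vals, n):
    """Non-zero alpha with |hat f(alpha)| = hat f(0), i.e. <alpha, x> constant on supp(f)."""
    c = walsh_hadamard(vals)
    return [a for a in range(1, 2 ** n) if abs(c[a]) == c[0]]


def affine_hull_dim(vals):
    support = [x for x, v in enumerate(vals) if v]
    return gf2_rank([s ^ support[0] for s in support])


f, n = two_subspace_indicator(3)            # n = 5, hat f(0) = 2/2^3
f_delta = f + [0] * len(f)                  # f times the indicator of t = 0, t a new top coordinate
for vals, m in ((f, n), (f_delta, n + 1)):
    alphas = constant_parities(vals, m)
    print(m, alphas, len(alphas) + 1 == 2 ** (m - affine_hull_dim(vals)))
```

The original configuration is already irreducible, while the padded function admits exactly one restriction direction (the new coordinate), after which it returns to the original one with $n$ and $k$ each reduced by one.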

The Fourier spectrum of disjoint union of two affine subspaces
In this section we calculate the Fourier spectrum of a Boolean function whose support is the union of two disjoint affine subspaces satisfying certain properties. In particular, the two affine subspaces are of the same dimension and their Fourier spectra have minimum intersection.
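Let n ≥ 1 and 0 ≤ k < n be integers. If V is a linear subspace of $\mathbb{F}_2^n$ of dimension n − k and $a \in V^\perp$, where $V^\perp$ denotes the linear subspace that is the orthogonal complement of V, then the Fourier coefficients of the indicator function of the affine subspace a + V are
$$\hat{1}_{a+V}(\alpha) = \begin{cases} \dfrac{(-1)^{\langle \alpha, a\rangle}}{2^k} & \text{if } \alpha \in V^\perp, \\ 0 & \text{otherwise.} \end{cases}$$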
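A quick numerical check of this formula, offered as a sketch under our own conventions: points of $\mathbb{F}_2^n$ are n-bit integers, the Fourier transform is normalized as $\hat{f}(\alpha) = 2^{-n}\sum_x f(x)(-1)^{\langle\alpha,x\rangle}$, the subspace is specified by a hypothetical basis of $V^\perp$, and V is recovered as the set of x with $\langle x, w\rangle = 0$ for every basis vector w. All helper names are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    """Inner product over F_2 of two points encoded as n-bit integers."""
    return bin(u & v).count("1") % 2

def fourier(f, n):
    """Exact coefficients f_hat(alpha) = 2^-n * sum_x f(x) * (-1)^<alpha, x>."""
    size = 1 << n
    return {alpha: Fraction(sum(f(x) * (-1) ** dot(alpha, x) for x in range(size)), size)
            for alpha in range(size)}

def span(vectors):
    """All F_2-linear combinations of the given vectors."""
    out = {0}
    for v in vectors:
        out |= {w ^ v for w in out}
    return out

def affine_indicator(a, perp_basis):
    """Indicator of a + V, where V = {x : <x, w> = 0 for every w in perp_basis}."""
    return lambda x: int(all(dot(x ^ a, w) == 0 for w in perp_basis))

n, k = 4, 2
perp_basis = [0b0001, 0b0110]      # hypothetical basis of V-perp (dimension k)
a = 0b0001                         # a is chosen inside V-perp
V_perp = span(perp_basis)
assert a in V_perp and len(V_perp) == 2 ** k

spec = fourier(affine_indicator(a, perp_basis), n)
expected = {alpha: (Fraction((-1) ** dot(alpha, a), 2 ** k) if alpha in V_perp else Fraction(0))
            for alpha in range(1 << n)}
assert spec == expected
```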
Let $f: \mathbb{F}_2^n \to \{0, 1\}$ be a Boolean function whose support is the union of two disjoint affine subspaces of dimension n − k. By shifting the origin if necessary, we may assume that one of the two affine subspaces is a linear subspace. Therefore $f = 1_{a+V_1} + 1_{V_2}$, where $V_1$ and $V_2$ are two linear subspaces of dimension n − k in $\mathbb{F}_2^n$ and $a \in V_1^\perp$. For $a + V_1$ and $V_2$ to be disjoint, a necessary condition is that their orthogonal complements intersect non-trivially, that is, $V_1^\perp \cap V_2^\perp \ne \{0\}$ (indeed, $a + V_1$ and $V_2$ are disjoint if and only if $a \notin V_1 + V_2$, and $(V_1 + V_2)^\perp = V_1^\perp \cap V_2^\perp$). The special configuration we are interested in is when this intersection is minimal, that is, when $|V_1^\perp \cap V_2^\perp| = 2$. To this end, without loss of generality, we let $V_1^\perp = \mathrm{span}(e_1, \dots, e_k)$ and $V_2^\perp = \mathrm{span}(e_k, \dots, e_{2k-1})$, so that $V_1^\perp \cap V_2^\perp = \{0, e_k\}$. Then we necessarily have $\langle e_k, a\rangle = 1$ (see footnote 7). Therefore, for simplicity (and also without loss of generality), we may take $a = e_k$.
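Therefore the Fourier spectrum of f is
$$\hat{f}(\alpha) = \begin{cases} 2/2^k & \text{if } \alpha = 0, \\ 0 & \text{if } \alpha = e_k, \\ (-1)^{\alpha_k}/2^k & \text{if } \alpha \in V_1^\perp \setminus \{0, e_k\}, \\ 1/2^k & \text{if } \alpha \in V_2^\perp \setminus \{0, e_k\}, \\ 0 & \text{otherwise,} \end{cases} \tag{2.11}$$
where $\alpha_k = \langle \alpha, e_k\rangle$ denotes the k-th coordinate of α.
Footnote 7. This is because the affine subspace $a + V_1$ can be expressed as the solution set of a system of linear equations, $a + V_1 = \{x \in \mathbb{F}_2^n \mid \langle x, e_i\rangle = a_i \text{ for every } 1 \le i \le k\}$, where $\{e_1, \dots, e_k\}$ is an orthonormal basis of $V_1^\perp$ and $a_i := \langle e_i, a\rangle$, $1 \le i \le k$, are the components of a in this basis. Now if $|V_1^\perp \cap V_2^\perp| = 2$, then, because the intersection of the two orthogonal complements is a subspace, we may take $V_1^\perp \cap V_2^\perp = \{0, e_k\}$ for convenience. On the other hand, $V_2 = \{x \in \mathbb{F}_2^n \mid \langle x, e_i\rangle = 0 \text{ for every } k \le i \le 2k - 1\}$. The sets $a + V_1$ and $V_2$ are disjoint if and only if the two systems of linear equations combined have no solution, which is equivalent to the condition $\langle e_k, a\rangle = 1$.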
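Both the disjointness criterion and the spectrum of f can be verified by brute force for small parameters. The sketch below is illustrative, with assumed conventions: points of $\mathbb{F}_2^n$ are n-bit integers ($e_i$ is bit i − 1), $\hat{f}(\alpha) = 2^{-n}\sum_x f(x)(-1)^{\langle\alpha,x\rangle}$, and all helper names are hypothetical. Part 1 runs over every pair of 2-dimensional subspaces of $\mathbb{F}_2^4$ and every shift a, and checks the standard linear-algebra fact (our addition, not taken from the text) that $a + V_1$ and $V_2$ are disjoint exactly when $\langle w, a\rangle = 1$ for some $w \in V_1^\perp \cap V_2^\perp$, so disjointness forces $V_1^\perp \cap V_2^\perp \ne \{0\}$. Part 2 builds the configuration with n = 5, k = 3 and $a = e_k$, and compares the computed spectrum with the case analysis recorded in (2.11): $2/2^k$ at 0, zero at $e_k$, $(-1)^{\alpha_k}/2^k$ on $V_1^\perp \setminus \{0, e_k\}$, $1/2^k$ on $V_2^\perp \setminus \{0, e_k\}$, and zero elsewhere.

```python
from fractions import Fraction
from itertools import combinations

def dot(u, v):
    """Inner product over F_2 of two points encoded as n-bit integers."""
    return bin(u & v).count("1") % 2

def span(vectors):
    """All F_2-linear combinations of the given vectors."""
    out = {0}
    for v in vectors:
        out |= {w ^ v for w in out}
    return frozenset(out)

def perp(W, n):
    """Orthogonal complement of W inside F_2^n."""
    return frozenset(x for x in range(1 << n) if all(dot(x, w) == 0 for w in W))

def fourier(f, n):
    """Exact coefficients f_hat(alpha) = 2^-n * sum_x f(x) * (-1)^<alpha, x>."""
    size = 1 << n
    return {alpha: Fraction(sum(f(x) * (-1) ** dot(alpha, x) for x in range(size)), size)
            for alpha in range(size)}

# Part 1: in F_2^4 with k = 2, a + V1 and V2 are disjoint exactly when <w, a> = 1
# for some w in the intersection of V1-perp and V2-perp.
n, k = 4, 2
subspaces = {W for W in (span(vs) for vs in combinations(range(1, 1 << n), k))
             if len(W) == 2 ** k}              # every k-dimensional W, playing V-perp
for W1 in subspaces:
    V1 = perp(W1, n)
    for W2 in subspaces:
        V2 = perp(W2, n)
        common = W1 & W2
        for a in range(1 << n):
            disjoint = not ({a ^ v for v in V1} & V2)
            assert disjoint == any(dot(w, a) == 1 for w in common)

# Part 2: the configuration V1-perp = span(e_1..e_k), V2-perp = span(e_k..e_{2k-1}), a = e_k.
n, k = 5, 3
e = {i: 1 << (i - 1) for i in range(1, n + 1)}       # e[i] is the i-th unit vector
W1 = span([e[i] for i in range(1, k + 1)])
W2 = span([e[i] for i in range(k, 2 * k)])
assert W1 & W2 == {0, e[k]}
V1, V2 = perp(W1, n), perp(W2, n)

def f(x):
    """Indicator of e_k + V1 plus indicator of V2."""
    return int((x ^ e[k]) in V1) + int(x in V2)

def predicted(alpha):
    """The case analysis of the spectrum of f."""
    if alpha == 0:
        return Fraction(2, 2 ** k)
    if alpha == e[k]:
        return Fraction(0)
    if alpha in W1:
        return Fraction((-1) ** dot(alpha, e[k]), 2 ** k)
    if alpha in W2:
        return Fraction(1, 2 ** k)
    return Fraction(0)

assert all(f(x) in (0, 1) for x in range(1 << n))     # e_k + V1 and V2 are disjoint
spectrum = fourier(f, n)
assert all(spectrum[alpha] == predicted(alpha) for alpha in range(1 << n))
```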

Concluding Remarks and Open Problems
In this chapter, we extend a classical result of Rothschild and van Lint to give a complete characterization of Boolean functions whose Fourier coefficients take values only in the set $\{-2/2^k, -1/2^k, 0, 1/2^k, 2/2^k\}$. Our work may be regarded as a first step toward understanding the structure of Boolean functions of granularity k. A major motivation for such studies is to prove a polynomial upper bound on the kill number of any k-granular Boolean function, which would resolve the Log-rank XOR conjecture. Another interesting question is to find other sets of Fourier coefficient values that uniquely, or almost uniquely, determine the structure of the corresponding Boolean functions.

Introduction
In the previous chapter, we focused on the Fourier representation of Boolean functions and the structure of the Fourier support. In this chapter, we study the sparsity of Boolean functions in real polynomial form.
There is a long history of investigating the relation between polynomials and complexity bounds, from communication complexity to circuit complexity. In 1969, Minsky and Papert [MP88] began using real polynomial representations to prove results in computational complexity; polynomial methods were subsequently also used in the works of Razborov [Raz87] and Smolensky [Smo87].
In 1992, Nisan and Szegedy [NS94] established connections between the degree, the decision tree complexity, and the sensitivity of Boolean functions. The works of Nisan and Szegedy [NS94] and Paturi [Pat92] also gave new results on polynomials that approximate a Boolean function f.
Beigel [Bei93] studied the relationship between real polynomial representations and Fourier transform representations. More details can be found in the survey by Buhrman and de Wolf [BdW02].
Although the analysis of Boolean functions has drawn much attention over the last decades, our understanding of arbitrary functions still needs more work. A recent breakthrough by Knop et al. [KLMY20], also based on the real polynomial representation, reduced the previously exponential gap in the log-rank conjecture for AND-functions to a factor of only log n. Their work established a strong connection between sparsity, monotone block sensitivity, and AND-decision tree complexity.

Preliminaries
In this chapter, we need some additional common notation. $\mathbb{R}$ denotes the set of real numbers and $\mathbb{Z}$ denotes the set of integers.
As in the previous chapter, we also regard any $S \in \{0,1\}^n$ as a subset of [n], and from now on we denote the size of a set by the corresponding lower-case letter, e.g., $s := |S|$. For an event A, we use $1_A$ to denote the indicator function of A, that is,
$$1_A = \begin{cases} 1 & \text{if } A \text{ occurs}, \\ 0 & \text{otherwise}. \end{cases} \tag{3.1}$$

Real polynomial representation of Boolean functions
For any function $f: \{0,1\}^n \to \mathbb{R}$, there is always a multivariate polynomial that computes f, namely
$$f(x_1, x_2, \dots, x_n) = \sum_{S \in \{0,1\}^n} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i).$$
Beigel [Bei93] called this the table lookup representation: for any input S, the value f(S) is read off directly as the coefficient of the corresponding term $\prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$. We can also view these terms as indicator polynomials.
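To illustrate the table lookup representation, the following sketch (helper names hypothetical; inputs are 0/1 tuples, and exact `Fraction` values are used off the cube) builds the polynomial from a truth table. It confirms that the polynomial agrees with the table on $\{0,1\}^n$, that each term is the indicator polynomial of a single point, and it evaluates the polynomial at the center of the cube, where every indicator polynomial equals $2^{-n}$, so the value is the average of the table (1/2 for the parity function on three bits).

```python
from fractions import Fraction
from itertools import product

def table_lookup(f_table, n):
    """Polynomial sum_S f(S) * prod_{i in S} x_i * prod_{i not in S} (1 - x_i).

    f_table maps each S in {0,1}^n (a 0/1 tuple) to the real value f(S); the
    returned function can be evaluated at any point with numeric coordinates."""
    def poly(x):
        total = 0
        for S, value in f_table.items():
            term = value
            for i in range(n):
                term *= x[i] if S[i] == 1 else 1 - x[i]
            total += term
        return total
    return poly

n = 3
cube = list(product((0, 1), repeat=n))
parity = {S: sum(S) % 2 for S in cube}            # example table: f(S) = |S| mod 2
p = table_lookup(parity, n)

# On Boolean inputs the polynomial reproduces the table ...
assert all(p(S) == parity[S] for S in cube)

# ... because each term is the indicator polynomial of a single point.
def indicator_poly(S):
    return table_lookup({T: int(T == S) for T in cube}, n)

assert all(indicator_poly(S)(T) == int(S == T) for S in cube for T in cube)

# Off the cube it is a genuine real polynomial, e.g. at the center (1/2, 1/2, 1/2):
assert p((Fraction(1, 2),) * n) == Fraction(1, 2)
```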
The $2^n$ terms form a basis of the vector space of functions $f: \{0,1\}^n \to \mathbb{R}$: they are linearly independent, since each of them equals 1 at exactly one point of $\{0,1\}^n$ and 0 elsewhere, and this vector space has dimension $2^n$. Therefore every function $f: \{0,1\}^n \to \mathbb{R}$ has a unique table lookup representation.
Now suppose that $\mu(S, T) = (-1)^{s-t}$ for all S, T with $s - t \le k - 1$; we will show that $\mu(S, T) = (-1)^{s-t} = (-1)^k$ still holds when $s - t = k$.
We note that this lemma can easily be generalized to any value of $c_S$, as below, and this is the starting point of our method for measuring sparsity. If we call $c_S = 0$ the balanced state and $c_S = k$ the k-biased state, then, using the Vandermonde convolution again, we obtain the following corollary for the k-biased state.
With the help of Stirling's formula $n! \sim \sqrt{2\pi n}\,(n/e)^n$ and the resulting approximation $\binom{n}{n/2} \sim \sqrt{2/\pi}\; 2^n/\sqrt{n}$, we obtain the following corollary.
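The following sketch connects these pieces under stated assumptions. We take $c_S$ to be the coefficient of $\prod_{i\in S} x_i$ in the multilinear polynomial of f, which, with $\mu(S,T) = (-1)^{s-t}$, is given by Möbius inversion $c_S = \sum_{T \subseteq S} (-1)^{s-t} f(T)$, and we take a random Boolean function to be uniform (each f(T) an independent fair bit). Under these assumptions, for s ≥ 1 the subsets T ⊆ S split into $2^{s-1}$ with s − t even and $2^{s-1}$ with s − t odd, so $c_S$ is a difference of two independent Binomial$(2^{s-1}, 1/2)$ variables, and the Vandermonde convolution gives $\Pr[c_S = k] = \binom{2^s}{2^{s-1}+k}/2^{2^s}$. This formula is our own reconstruction and may differ in form from the statements of the lemma and corollary referred to above. The code checks Möbius inversion on the majority function, checks the formula for s ≤ 3 by enumeration, and checks the Stirling estimate $\binom{N}{N/2}/2^N \approx \sqrt{2/(\pi N)}$ at $N = 2^{10}$. Sets are encoded as bit masks and all helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb, pi, sqrt

def coefficient(f, S):
    """c_S = sum over submasks T of S of (-1)^(|S|-|T|) * f(T).

    For a submask T of S, |S| - |T| is the popcount of S ^ T."""
    total, T = 0, S
    while True:                      # walk through every submask T of S
        total += (-1) ** bin(S ^ T).count("1") * f(T)
        if T == 0:
            return total
        T = (T - 1) & S

def evaluate(coeffs, x):
    """Value of sum_S c_S prod_{i in S} x_i at the 0/1 point x (a bit mask)."""
    return sum(c for S, c in coeffs.items() if (S & x) == S)

# 1) Moebius inversion reproduces f (example: majority on three bits).
def majority(T):
    return int(bin(T).count("1") >= 2)

n = 3
coeffs = {S: coefficient(majority, S) for S in range(1 << n)}
assert all(evaluate(coeffs, x) == majority(x) for x in range(1 << n))

# 2) Exact law of c_S for |S| = s under a uniformly random f, versus
#    Pr[c_S = k] = C(2^s, 2^(s-1) + k) / 2^(2^s)  (Vandermonde convolution, s >= 1).
def c_distribution(s):
    full = (1 << s) - 1
    counts = {}
    for table in product((0, 1), repeat=1 << s):
        c = coefficient(lambda T: table[T], full)
        counts[c] = counts.get(c, 0) + 1
    return {c: Fraction(m, 2 ** (1 << s)) for c, m in counts.items()}

for s in (1, 2, 3):
    dist, half = c_distribution(s), 2 ** (s - 1)
    assert all(dist.get(k, 0) == Fraction(comb(2 ** s, half + k), 2 ** (2 ** s))
               for k in range(-half, half + 1))

# 3) Stirling: C(N, N/2) / 2^N ~ sqrt(2 / (pi * N)); with N = 2^s this is Pr[c_S = 0].
N = 2 ** 10
ratio = (comb(N, N // 2) / 2 ** N) / sqrt(2 / (pi * N))
assert abs(ratio - 1) < 1e-3
```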

Expectation
The last equality follows from the binomial theorem, $\sum_{i=0}^{n} \binom{n}{i} a^i b^{n-i} = (a+b)^n$.
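For small n the expectation can be computed exactly. This sketch assumes (our assumption, consistent with the notation zero(f) used later in the chapter) that zero(f) counts the sets S with $c_S = 0$, that $c_S = \sum_{T\subseteq S}(-1)^{s-t}f(T)$, and that f is uniformly random. By linearity of expectation, $\mathbb{E}[\mathrm{zero}(f)] = \sum_{s=0}^{n}\binom{n}{s}p_s$ with $p_0 = 1/2$ (because $c_\emptyset = f(\emptyset)$) and $p_s = \binom{2^s}{2^{s-1}}/2^{2^s}$ for s ≥ 1. The code compares this sum with a brute-force average over all $2^{2^n}$ Boolean functions for n = 3; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def coefficient(f, S):
    """c_S = sum over submasks T of S of (-1)^(|S|-|T|) * f(T) (Moebius inversion)."""
    total, T = 0, S
    while True:
        total += (-1) ** bin(S ^ T).count("1") * f(T)
        if T == 0:
            return total
        T = (T - 1) & S

def zero(f, n):
    """Hypothetical helper: number of S in {0,1}^n with c_S = 0."""
    return sum(1 for S in range(1 << n) if coefficient(f, S) == 0)

def p_balanced(s):
    """Pr[c_S = 0] for |S| = s when f is uniformly random."""
    if s == 0:
        return Fraction(1, 2)            # c_emptyset = f(emptyset) is a fair bit
    return Fraction(comb(2 ** s, 2 ** (s - 1)), 2 ** (2 ** s))

n = 3
tables = list(product((0, 1), repeat=1 << n))        # all 2^(2^n) Boolean functions
brute = Fraction(sum(zero(lambda T: t[T], n) for t in tables), len(tables))
formula = sum(comb(n, s) * p_balanced(s) for s in range(n + 1))
assert brute == formula
```

For n = 3 both quantities equal 435/128.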

Variance
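The next important quantity is the variance. Since $\mathrm{zero}(f) = \sum_{S} 1_{c_S = 0}$, we have
$$\mathrm{Var}[\mathrm{zero}(f)] = \sum_{S, T \in \{0,1\}^n} \Pr(c_S = 0, c_T = 0) - \big(\mathbb{E}[\mathrm{zero}(f)]\big)^2.$$
First, we list the probability $\Pr(c_S = 0, c_T = 0)$ for all cases in the following lemma.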
Lemma 3.3.5. Proof. Case 1 is trivial, since S and T are the same set.
Case 2 follows the proof of Lemma 3.3.2: note that the Boolean cube $B_S$ and the set $B_T \setminus B_S$ are disjoint, so the values of f on these two parts are independent. The probability that each part is balanced can be obtained from Lemma 3.3.2, and the result is the product of the two probabilities. Here we present the Fortuin-Kasteleyn-Ginibre (FKG) inequality with a self-contained folklore proof for completeness.
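Lemma 3.3.6 (FKG inequality). Let $\mu: \mathbb{Z} \to \mathbb{R}$ be a non-negative function, and let $f, g: \mathbb{Z} \to \mathbb{R}$ be two monotonically non-decreasing functions on $\mathbb{Z}$. Then, assuming all the sums below converge absolutely, we have the following inequality:
$$\Big(\sum_{x \in \mathbb{Z}} \mu(x) f(x) g(x)\Big)\Big(\sum_{x \in \mathbb{Z}} \mu(x)\Big) \ge \Big(\sum_{x \in \mathbb{Z}} \mu(x) f(x)\Big)\Big(\sum_{x \in \mathbb{Z}} \mu(x) g(x)\Big).$$
Proof. Since f and g are both non-decreasing, $(f(x) - f(y))(g(x) - g(y)) \ge 0$ for all $x, y \in \mathbb{Z}$. Multiplying by $\mu(x)\mu(y) \ge 0$ and summing over all pairs gives
$$0 \le \sum_{x, y \in \mathbb{Z}} \mu(x)\mu(y)\big(f(x) - f(y)\big)\big(g(x) - g(y)\big) = 2\Big(\sum_x \mu(x)\Big)\Big(\sum_x \mu(x) f(x) g(x)\Big) - 2\Big(\sum_x \mu(x) f(x)\Big)\Big(\sum_x \mu(x) g(x)\Big),$$
which is the claimed inequality.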
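The inequality, and the identity $\sum_{x,y}\mu(x)\mu(y)(f(x)-f(y))(g(x)-g(y)) = 2\big[(\sum\mu)(\sum\mu fg) - (\sum\mu f)(\sum\mu g)\big]$ that drives the folklore proof, can be sanity-checked numerically. The sketch below restricts µ to a finite window of $\mathbb{Z}$ (an assumption that makes every sum finite), draws random non-negative integer weights and random non-decreasing integer-valued f and g, and uses exact integer arithmetic so that no tolerance is needed; the helper names are hypothetical.

```python
import random

def fkg_sides(mu, f, g, support):
    """Return (sum mu*f*g * sum mu, sum mu*f * sum mu*g) over a finite support."""
    total = sum(mu[x] for x in support)
    lhs = sum(mu[x] * f[x] * g[x] for x in support) * total
    rhs = sum(mu[x] * f[x] for x in support) * sum(mu[x] * g[x] for x in support)
    return lhs, rhs

def pair_sum(mu, f, g, support):
    """Sum over x, y of mu(x) mu(y) (f(x) - f(y)) (g(x) - g(y)); every term is >= 0."""
    return sum(mu[x] * mu[y] * (f[x] - f[y]) * (g[x] - g[y])
               for x in support for y in support)

rng = random.Random(0)
support = list(range(-5, 6))      # finite window of Z; mu is taken to vanish outside it
for _ in range(500):
    mu = {x: rng.randint(0, 5) for x in support}                          # mu >= 0
    f = dict(zip(support, sorted(rng.randint(-9, 9) for _ in support)))   # non-decreasing
    g = dict(zip(support, sorted(rng.randint(-9, 9) for _ in support)))   # non-decreasing
    lhs, rhs = fkg_sides(mu, f, g, support)
    assert 2 * (lhs - rhs) == pair_sum(mu, f, g, support)   # identity used in the proof
    assert lhs >= rhs                                        # the FKG inequality
```

Sorting independently drawn values is a simple way to produce monotone test functions; replacing `sorted(...)` by `sorted(..., reverse=True)` for g alone would make the functions oppositely ordered, and then the inequality reverses.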

Concluding remarks and Open Problems
In this chapter, we give several bounds and concentration results on the distribution of the sparsity of the real polynomial representation of random Boolean functions. Although our bound on the expectation of the sparsity is asymptotically tight, a gap remains for the variance. We conjecture that $\mathrm{Var}[\mathrm{zero}(f)] = O\big((\tfrac{3}{2} + \tfrac{1}{\sqrt{2}})^n\big)$, which would consequently lead to better concentration results.
A major motivation for this study is to find the exact distribution of the sparsity. However, the expectation and the variance alone are not enough to characterize the distribution.
Another interesting question is to obtain an approximate distribution of spar(f ).