STOC 2021: Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing

SESSION: Keynote

Computational thinking in programming language and compiler design (keynote)

Abstractions and algorithms are at the heart of computational thinking. In this talk I will discuss the evolution of the theory and practice of programming language and compiler design through the lens of computational thinking. Many of the key concepts in this area were introduced at the ACM Symposium on the Theory of Computing.

SESSION: Invited Talk

Climbing algorithms (invited talk)

NP (search) problems allow easy correctness tests for solutions. Climbing algorithms also allow an easy assessment of how close the configuration at any stage of their run is to yielding the correct answer. This offers great flexibility, since how sensible any deviation from the standard procedure is can be assessed instantly.

An example is the Dual Matrix Algorithm (DMA) for linear programming, variations of which were considered by A.Y.Levin in 1965 and by Yamnitsky and myself in 1982. It has little sensitivity to numerical errors and to the number of inequalities. It offers substantial flexibility and, thus, potential for further developments.

SESSION: Invited Papers

A polynomial-time approximation algorithm for counting words accepted by an NFA (invited paper)

Counting the number of words of a certain length accepted by a non-deterministic finite automaton (NFA) is a fundamental problem, which has many applications in different areas such as graph databases, knowledge compilation, and information extraction. Along with this, generating such words uniformly at random is also a relevant problem, particularly in scenarios where returning varied outputs is a desirable feature.

The previous problems are formalized as follows. The input of #NFA is an NFA N and a length k given in unary (that is, given as a string 0^k), and then the task is to compute the number of strings of length k accepted by N. The input of GEN-NFA is the same as #NFA, but now the task is to generate uniformly, at random, a string accepted by N of length k.
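For concreteness, here is a minimal brute-force reference implementation of #NFA in Python: it enumerates all |alphabet|^k strings, so it serves only as a sanity check on tiny instances (the FPRAS discussed below is the actual contribution), and the NFA encoding used here (a transition dictionary plus start and accept sets) is an illustrative choice rather than anything from the paper.

    from itertools import product

    def count_accepted_words(transitions, start_states, accept_states, alphabet, k):
        """Brute-force #NFA: count the strings of length k accepted by an NFA.

        transitions maps (state, symbol) to a set of successor states.
        Runs in time |alphabet|^k times poly(NFA size): a sanity check only.
        """
        count = 0
        for word in product(alphabet, repeat=k):
            current = set(start_states)
            for symbol in word:
                nxt = set()
                for q in current:
                    nxt |= transitions.get((q, symbol), set())
                current = nxt
            if current & set(accept_states):
                count += 1
        return count

    # Tiny example: an NFA over {0,1} accepting strings that contain "11".
    trans = {
        ("a", "0"): {"a"}, ("a", "1"): {"a", "b"},
        ("b", "1"): {"c"},
        ("c", "0"): {"c"}, ("c", "1"): {"c"},
    }
    print(count_accepted_words(trans, {"a"}, {"c"}, "01", 4))  # 8 of the 16 length-4 strings contain "11"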

It is known that #NFA is #P-complete, so an efficient algorithm to compute this function exactly is not expected to exist. However, this does not preclude the existence of an efficient approximation algorithm for it. In this talk, we will show that #NFA admits a fully polynomial-time randomized approximation scheme (FPRAS). Prior to our work, it was open whether #NFA admits an FPRAS; in fact, the best randomized approximation scheme known for #NFA ran in time n^O(log(n)).

In addition, we will mention some consequences and applications of our results. In particular, from well-known results on counting and uniform generation, we obtain that GEN-NFA admits a fully polynomial-time almost uniform generator. Moreover, as #NFA is SpanL-complete under polynomial-time parsimonious reductions, we obtain that every function in the complexity class SpanL admits an FPRAS.

Chasing convex bodies with linear competitive ratio (invited paper)

The problem of chasing convex functions is easy to state: faced with a sequence of convex functions f_t over d-dimensional Euclidean space, the goal of the algorithm is to output a point x_t at each time, so that the sum of the function costs f_t(x_t), plus the movement costs ||x_t − x_{t−1}||, is minimized. This problem generalizes questions in online algorithms such as caching and the k-server problem. In 1994, Friedman and Linial posed the question of getting an algorithm with a competitive ratio that depends only on the dimension d. In this talk we give an O(d)-competitive algorithm, based on the notion of the Steiner point of a convex body.
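Written out, the online cost that the competitive ratio is measured against is

    \[
      \mathrm{cost}(x_1,\dots,x_T) \;=\; \sum_{t=1}^{T} \Big( f_t(x_t) + \lVert x_t - x_{t-1} \rVert \Big),
      \qquad x_t \in \mathbb{R}^d,
    \]

and an algorithm is O(d)-competitive if its total cost is at most O(d) times that of the best sequence of points chosen in hindsight (taking x_0 to be the algorithm's fixed starting point, a standard convention not spelled out in the abstract).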

Neural tangent kernel: convergence and generalization in neural networks (invited paper)

The Neural Tangent Kernel is a new way to understand gradient descent in deep neural networks, connecting them with kernel methods. In this talk, I'll introduce this formalism, give a number of results on the Neural Tangent Kernel, and explain how they give us insight into the dynamics of neural networks during training and into their generalization features.

Simplicity creates inequity: implications for fairness, stereotypes, and interpretability (invited paper)

Algorithms are increasingly used to aid, or in some cases supplant, human decision-making, particularly for decisions that hinge on predictions. As a result, two features in addition to prediction quality have generated interest: (i) to facilitate human interaction with and understanding of these algorithms, we desire prediction functions that are in some fashion simple or interpretable; and (ii) because they influence consequential decisions, we also want them to produce equitable allocations. We develop a formal model to explore the relationship between the demands of simplicity and equity. Although the two concepts appear to be motivated by qualitatively distinct goals, we show a fundamental inconsistency between them. Specifically, we formalize a general framework for producing simple prediction functions, and in this framework we establish two basic results. First, every simple prediction function is strictly improvable: there exists a more complex prediction function that is both strictly more efficient and also strictly more equitable. Put another way, using a simple prediction function both reduces utility for disadvantaged groups and reduces overall welfare relative to other options. Second, we show that simple prediction functions necessarily create incentives to use information about individuals' membership in a disadvantaged group --- incentives that weren't present before simplification, and that work against these individuals. Thus, simplicity transforms disadvantage into bias against the disadvantaged group. Our results are not only about algorithms but about any process that produces simple models, and as such they connect to the psychology of stereotypes and to an earlier economics literature on statistical discrimination.

The ghost in the radiation: robust encodings of the black hole interior (invited paper)

We reconsider the black hole firewall puzzle, emphasizing that quantum error-correction, computational complexity, and pseudorandomness are crucial concepts for understanding the black hole interior. We assume that the Hawking radiation emitted by an old black hole is pseudorandom, meaning that it cannot be distinguished from a perfectly thermal state by any efficient quantum computation acting on the radiation alone. We then infer the existence of a subspace of the radiation system which we interpret as an encoding of the black hole interior. This encoded interior is entangled with the late outgoing Hawking quanta emitted by the old black hole, and is inaccessible to computationally bounded observers who are outside the black hole. Specifically, efficient operations acting on the radiation, those with quantum computational complexity polynomial in the entropy of the remaining black hole, commute with a complete set of logical operators acting on the encoded interior, up to corrections which are exponentially small in the entropy. Thus, under our pseudorandomness assumption, the black hole interior is well protected from exterior observers as long as the remaining black hole is macroscopic. On the other hand, if the radiation is not pseudorandom, an exterior observer may be able to create a firewall by applying a polynomial-time quantum computation to the radiation.

A new analysis of differential privacy’s generalization guarantees (invited paper)

We give a new proof of the "transfer theorem" underlying adaptive data analysis: that any mechanism for answering adaptively chosen statistical queries that is differentially private and sample-accurate is also accurate out-of-sample. Our new proof is elementary and gives structural insights that we expect will be useful elsewhere. We show: 1) that differential privacy ensures that the expectation of any query on the conditional distribution on datasets induced by the transcript of the interaction is close to its true value on the data distribution, and 2) sample accuracy on its own ensures that any query answer produced by the mechanism is close to its conditional expectation with high probability. This second claim follows from a thought experiment in which we imagine that the dataset is resampled from the conditional distribution after the mechanism has committed to its answers. The transfer theorem then follows by summing these two bounds. An upshot of our new proof technique is that the concrete bounds we obtain are substantially better than the best previously known bounds.

Load balancing guardrails: keeping your heavy traffic on the road to low response times (invited paper)

This talk is about scheduling and load balancing in a multi-server system, with the goal of minimizing mean response time in a general stochastic setting. We will specifically concentrate on the common case of a load balancing system, where a front-end load balancer (a.k.a. dispatcher) dispatches requests to multiple back-end servers, each with their own queue. Much is known about load balancing in the case where the scheduling at the servers is First-Come-First-Served (FCFS). However, to minimize mean response time, we need to use Shortest-Remaining-Processing-Time (SRPT) scheduling at the servers. Unfortunately, there is almost nothing known about optimal dispatching when SRPT scheduling is used at the servers. To make things worse, it turns out that the traditional dispatching policies that are used in practice with FCFS servers often have poor performance in systems with SRPT servers. In this talk, we devise a simple fix that can be applied to any dispatching policy. This fix, called "guardrails", ensures that the dispatching policy yields optimal mean response time under heavy traffic, when used in a system with SRPT servers. Any dispatching policy, when augmented with guardrails, becomes heavy-traffic optimal. Our results also yield the first analytical bounds on mean response time for load balancing systems with SRPT scheduling at the servers. Load balancing and scheduling are highly studied both in the stochastic and the worst-case scheduling communities. One aim of this talk is to contrast some differences in the approaches of the two communities when tackling multi-server scheduling problems.

Learnability can be independent of set theory (invited paper)

A fundamental result in statistical learning theory is the equivalence of PAC learnability of a class with the finiteness of its Vapnik-Chervonenkis dimension. However, this clean result applies only to binary classification problems. In search of a similar combinatorial characterization of learnability in a more general setting, we discovered a surprising independence of set theory for some basic general notion of learnability. Consider the following statistical estimation problem: given a family F of real valued random variables over some domain X and an i.i.d. sample drawn from an unknown distribution P over X, find f in F such that its expectation w.r.t. P is close to the supremum expectation over all members of F. This Expectation Maximization (EMX) problem captures many well studied learning problems. Surprisingly, we show that the EMX learnability of some simple classes depends on the cardinality of the continuum and is therefore independent of the set theory ZFC axioms. Our results imply that there exists no "finitary" combinatorial parameter that characterizes EMX learnability in a way similar to the VC-dimension characterization of binary classification learnability.
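In symbols, the EMX task described above asks, given an i.i.d. sample drawn from P and an accuracy parameter ε, to output f ∈ F such that, with high probability over the sample,

    \[
      \mathbb{E}_{x \sim P}\big[f(x)\big] \;\ge\; \sup_{g \in F} \mathbb{E}_{x \sim P}\big[g(x)\big] \;-\; \varepsilon .
    \]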

SESSION: Tutorials

Log-concave polynomials in theory and applications (tutorial)

Log-concave polynomials give rise to discrete probability distributions with several nice properties. In particular, log-concavity of the generating polynomial guarantees the existence of efficient algorithms for approximately sampling from a distribution and finding the size of its support. This class of distributions contains several important examples, including uniform measures over bases or independent sets of matroids, determinantal point processes and strongly Rayleigh measures, measures defined by mixed volumes in Minkowski sums, the random cluster model in certain regimes, and more.  In this tutorial, we will introduce the theory and applications of log-concave polynomials and survey some of the recent developments in this area.

Statistical physics of random CSPs (tutorial)

I will describe recent progress in determination of asymptotic behavior in random constraint satisfaction problems, including the independent set problem on random graphs, random regular NAE-SAT, and random SAT. The results include sharp phase transitions and some understanding of solution geometry, particularly in the setting of the random regular NAE-SAT problem. In this lecture I will survey the physics heuristics, and explain how they lead to combinatorial models for the solution geometry, which form a basis of mathematical approaches to these problems. As time allows, I will discuss some of the mathematical techniques that have been introduced, particularly with regards to solving certain non-convex optimization problems that arise in moment method calculations.

SESSION: Best Student Papers

Discrepancy minimization via a self-balancing walk

We study discrepancy minimization for vectors in ℝ^n under various settings. The main result is the analysis of a new simple random process in high dimensions through a comparison argument. As corollaries, we obtain bounds which are tight up to logarithmic factors for online vector balancing against oblivious adversaries, resolving several questions posed by Bansal, Jiang, Singla, and Sinha (STOC 2020), as well as a linear time algorithm for logarithmic bounds for the Komlós conjecture.
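The abstract deliberately leaves the process unspecified; purely as an illustration of the flavor of such self-balancing processes, the Python sketch below signs each arriving vector with a random sign biased against the current prefix sum. It should not be read as the authors' exact process, and the scale parameter c is a placeholder.

    import numpy as np

    def self_balancing_signs(vectors, c=30.0, seed=None):
        """Online signing of vectors of norm at most 1 that keeps the signed
        prefix sum balanced: vector v gets sign +1 with probability
        (1 - <w, v>/c) / 2, where w is the current signed prefix sum."""
        rng = np.random.default_rng(seed)
        w = np.zeros(vectors.shape[1])
        signs = []
        for v in vectors:
            corr = float(np.clip(np.dot(w, v) / c, -1.0, 1.0))
            s = 1 if rng.random() < (1.0 - corr) / 2.0 else -1
            signs.append(s)
            w += s * v
        return np.array(signs), w

    # Example: 10000 random unit vectors in dimension 50.
    rng = np.random.default_rng(0)
    V = rng.normal(size=(10000, 50))
    V /= np.linalg.norm(V, axis=1, keepdims=True)
    signs, w = self_balancing_signs(V, seed=1)
    print(np.abs(w).max())  # coordinate-wise discrepancy of the final signed sum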

Separating words and trace reconstruction

We prove that for any distinct x,y ∈ {0,1}^n, there is a deterministic finite automaton with O(n^(1/3)) states that accepts x but not y. This improves Robson’s 1989 bound of O(n^(2/5)). Using a similar complex analytic technique, we improve the upper bound on worst case trace reconstruction, showing that any unknown string x ∈ {0,1}^n can be reconstructed with high probability from exp(O(n^(1/5))) independently generated traces.

SESSION: Best Papers

A (slightly) improved approximation algorithm for metric TSP

For some є > 10^(−36) we give a randomized (3/2−є)-approximation algorithm for metric TSP.

The complexity of gradient descent: CLS = PPAD ∩ PLS

We study search problems that can be solved by performing Gradient Descent on a bounded convex polytopal domain and show that this class is equal to the intersection of two well-known classes: PPAD and PLS. As our main underlying technical contribution, we show that computing a Karush-Kuhn-Tucker (KKT) point of a continuously differentiable function over the domain [0,1]^2 is PPAD ∩ PLS-complete. This is the first natural problem to be shown complete for this class. Our results also imply that the class CLS (Continuous Local Search) - which was defined by Daskalakis and Papadimitriou as a more “natural” counterpart to PPAD ∩ PLS and contains many interesting problems - is itself equal to PPAD ∩ PLS.
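For reference, a point x ∈ [0,1]^2 is a KKT point of a continuously differentiable f (for minimization over the box) exactly when, in each coordinate, the partial derivative either vanishes or pushes against an active bound:

    \[
      \text{for } i \in \{1,2\}: \qquad
      \partial_i f(x) = 0 \ \text{ if } 0 < x_i < 1, \qquad
      \partial_i f(x) \ge 0 \ \text{ if } x_i = 0, \qquad
      \partial_i f(x) \le 0 \ \text{ if } x_i = 1 .
    \]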

Indistinguishability obfuscation from well-founded assumptions

Indistinguishability obfuscation, introduced by [Barak et al., Crypto 2001], aims to compile programs into unintelligible ones while preserving functionality. It is a fascinating and powerful object that has been shown to enable a host of new cryptographic goals and beyond. However, constructions of indistinguishability obfuscation have remained elusive, with all other proposals relying on heuristics or newly conjectured hardness assumptions. In this work, we show how to construct indistinguishability obfuscation from subexponential hardness of four well-founded assumptions. We prove:

Informal Theorem: Let τ ∈ (0,∞), δ ∈ (0,1), є ∈ (0,1) be arbitrary constants. Assume sub-exponential security of the following assumptions:

- the Learning With Errors (LWE) assumption with subexponential modulus-to-noise ratio 2^(k^є) and noises of magnitude polynomial in k, where k is the dimension of the LWE secret,

- the Learning Parity with Noise (LPN) assumption over general prime fields ℤ_p with polynomially many LPN samples and error rate 1/ℓ^δ, where ℓ is the dimension of the LPN secret,

- the existence of a Boolean Pseudo-Random Generator (PRG) in NC0 with stretch n^(1+τ), where n is the length of the PRG seed,

- the Decision Linear (DLIN) assumption on symmetric bilinear groups of prime order.

Then, (subexponentially secure) indistinguishability obfuscation for all polynomial-size circuits exists. Further, assuming only polynomial security of the aforementioned assumptions, there exists collusion resistant public-key functional encryption for all polynomial-size circuits.

SESSION: Session 1A

Linear bandits with limited adaptivity and learning distributional optimal design

Motivated by practical needs such as large-scale learning, we study the impact of adaptivity constraints on linear contextual bandits, a central problem in online learning and decision making. We consider two popular limited adaptivity models in the literature: batch learning and rare policy switches. We show that, when the context vectors are adversarially chosen in d-dimensional linear contextual bandits, the learner needs O(d log d log T) policy switches to achieve the minimax-optimal regret, and this is optimal up to poly(log d, log log T) factors; for stochastic context vectors, even in the more restricted batch learning model, only O(log log T) batches are needed to achieve the optimal regret. Together with the known results in the literature, our results present a complete picture of the adaptivity constraints in linear contextual bandits. Along the way, we propose the distributional optimal design, a natural extension of the optimal experiment design, and provide a both statistically and computationally efficient learning algorithm for the problem, which may be of independent interest.

Efficiently learning halfspaces with Tsybakov noise

We study the problem of PAC learning homogeneous halfspaces with Tsybakov noise. In the Tsybakov noise model, the label of every example is independently flipped with an adversarially controlled probability that can be arbitrarily close to 1/2 for a fraction of the examples. We give the first polynomial-time algorithm for this fundamental learning problem. Our algorithm learns the true halfspace within any desired accuracy and succeeds under a broad family of well-behaved distributions including log-concave distributions.

This extended abstract is a merge of two papers. In an earlier work, a subset of the authors developed an efficient reduction from learning to certifying the non-optimality of a candidate halfspace and gave a quasi-polynomial time certificate algorithm. In a subsequent work, the authors of this paper developed a polynomial-time certificate algorithm.

Robust linear regression: optimal rates in polynomial time

We obtain robust and computationally efficient estimators for learning several linear models that achieve statistically optimal convergence rate under minimal distributional assumptions. Concretely, we assume our data is drawn from a k-hypercontractive distribution and an є-fraction is adversarially corrupted. We then describe an estimator that converges to the optimal least-squares minimizer for the true distribution at a rate proportional to є^(2−2/k), when the noise is independent of the covariates. We note that no such estimator was known prior to our work, even with access to unbounded computation. The rate we achieve is information-theoretically optimal and thus we resolve the main open question in Klivans, Kothari and Meka [COLT’18].

Our key insight is to identify an analytic condition that serves as a polynomial relaxation of independence of random variables. In particular, we show that when the moments of the noise and covariates are negatively-correlated, we obtain the same rate as independent noise. Further, when the condition is not satisfied, we obtain a rate proportional to є^(2−4/k), and again match the information-theoretic lower bound. Our central technical contribution is to algorithmically exploit independence of random variables in the “sum-of-squares” framework by formulating it as the aforementioned polynomial inequality.

Statistical query complexity of manifold estimation

This paper studies the statistical query (SQ) complexity of estimating d-dimensional submanifolds in ℝ^n. We propose a purely geometric algorithm called Manifold Propagation, that reduces the problem to three natural geometric routines: projection, tangent space estimation, and point detection. We then provide constructions of these geometric routines in the SQ framework. Given an adversarial STAT(τ) oracle and a target Hausdorff distance precision ε = Ω(τ^(2/(d+1))), the resulting SQ manifold reconstruction algorithm has query complexity O(n polylog(n) ε^(−d/2)), which is proved to be nearly optimal. In the process, we establish low-rank matrix completion results for SQs and lower bounds for randomized SQ estimators in general metric spaces.

When is memorization of irrelevant training data necessary for high-accuracy learning?

Modern machine learning models are complex and frequently encode surprising amounts of information about individual inputs. In extreme cases, complex models appear to memorize entire input examples, including seemingly irrelevant information (social security numbers from text, for example). In this paper, we aim to understand whether this sort of memorization is necessary for accurate learning. We describe natural prediction problems in which every sufficiently accurate training algorithm must encode, in the prediction model, essentially all the information about a large subset of its training examples. This remains true even when the examples are high-dimensional and have entropy much higher than the sample size, and even when most of that information is ultimately irrelevant to the task at hand. Further, our results do not depend on the training algorithm or the class of models used for learning.

Our problems are simple and fairly natural variants of the next-symbol prediction and the cluster labeling tasks. These tasks can be seen as abstractions of text- and image-related prediction problems. To establish our results, we reduce from a family of one-way communication problems for which we prove new information complexity lower bounds.

SESSION: Session 2A

Sample-optimal and efficient learning of tree Ising models

We show that n-variable tree-structured Ising models can be learned computationally efficiently to within total variation distance є from an optimal O(n ln n / є^2) samples, where O(·) hides an absolute constant which, importantly, does not depend on the model being learned—neither its tree nor the magnitude of its edge strengths, on which we place no assumptions. Our guarantees hold, in fact, for the celebrated Chow-Liu algorithm [1968], using the plug-in estimator for estimating mutual information. While this (or any other) algorithm may fail to identify the structure of the underlying model correctly from a finite sample, we show that it will still learn a tree-structured model that is є-close to the true one in total variation distance, a guarantee called “proper learning.”

Our guarantees do not follow from known results for the Chow-Liu algorithm and the ensuing literature on learning graphical models, including the very recent renaissance of algorithms on this learning challenge, which only yield asymptotic consistency results, or sample-suboptimal and/or time-inefficient algorithms, unless further assumptions are placed on the model, such as bounds on the “strengths” of the model’s edges. While we establish guarantees for a widely known and simple algorithm, the analysis that this algorithm succeeds and is sample-optimal is quite complex, requiring a hierarchical classification of the edges into layers with different reconstruction guarantees, depending on their strength, combined with delicate uses of the subadditivity of the squared Hellinger distance over graphical models to control the error accumulation.

Near-optimal learning of tree-structured distributions by Chow-Liu

We provide finite sample guarantees for the classical Chow-Liu algorithm (IEEE Trans. Inform. Theory, 1968) to learn a tree-structured graphical model of a distribution. For a distribution P on Σ^n and a tree T on n nodes, we say T is an ε-approximate tree for P if there is a T-structured distribution Q such that D(P || Q) is at most ε more than the best possible tree-structured distribution for P. We show that if P itself is tree-structured, then the Chow-Liu algorithm with the plug-in estimator for mutual information with O(|Σ|^3 n ε^(−1)) i.i.d. samples outputs an ε-approximate tree for P with constant probability. In contrast, for a general P (which may not be tree-structured), Ω(n^2 ε^(−2)) samples are necessary to find an ε-approximate tree. Our upper bound is based on a new conditional independence tester that addresses an open problem posed by Canonne, Diakonikolas, Kane, and Stewart (STOC, 2018): we prove that for three random variables X, Y, Z each over Σ, testing if I(X; Y | Z) is 0 or ≥ ε is possible with O(|Σ|^3/ε) samples. Finally, we show that for a specific tree T, with O(|Σ|^2 n ε^(−1)) samples from a distribution P over Σ^n, one can efficiently learn the closest T-structured distribution in KL divergence by applying the add-1 estimator at each node.
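For reference, here is a minimal Python sketch of the classical Chow-Liu procedure with the plug-in (empirical) mutual-information estimator analyzed above; the use of networkx for the maximum spanning tree and the toy Markov-chain example are illustrative choices, not part of the paper.

    import numpy as np
    import networkx as nx

    def empirical_mutual_information(a, b):
        """Plug-in estimate of I(A;B) in nats from paired samples a, b."""
        mi = 0.0
        for x in np.unique(a):
            for y in np.unique(b):
                pxy = np.mean((a == x) & (b == y))
                if pxy > 0:
                    mi += pxy * np.log(pxy / (np.mean(a == x) * np.mean(b == y)))
        return mi

    def chow_liu_tree(samples):
        """samples: (m, n) array, one row per i.i.d. sample of n variables.
        Returns the edges of a maximum-weight spanning tree under empirical
        pairwise mutual information (the Chow-Liu tree)."""
        n = samples.shape[1]
        G = nx.Graph()
        for i in range(n):
            for j in range(i + 1, n):
                G.add_edge(i, j, weight=empirical_mutual_information(samples[:, i], samples[:, j]))
        return sorted(nx.maximum_spanning_tree(G).edges())

    # Example: a 4-variable Markov chain X0 -> X1 -> X2 -> X3 over {0,1}.
    rng = np.random.default_rng(0)
    m = 5000
    flip = lambda x: np.where(rng.random(m) < 0.1, 1 - x, x)  # each edge flips its parent with prob 0.1
    x0 = rng.integers(0, 2, m)
    x1 = flip(x0)
    x2 = flip(x1)
    x3 = flip(x2)
    print(chow_liu_tree(np.column_stack([x0, x1, x2, x3])))  # expect the path edges (0, 1), (1, 2), (2, 3)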

Learning Ising models from one or multiple samples

There have been two main lines of work on estimating Ising models: (1) estimating them from multiple independent samples under minimal assumptions about the model's interaction matrix; and (2) estimating them from one sample in restrictive settings. We propose a unified framework that smoothly interpolates between these two settings, enabling significantly richer estimation guarantees from one, a few, or many samples.

Our main theorem provides guarantees for one-sample estimation, quantifying the estimation error in terms of the metric entropy of a family of interaction matrices. As corollaries of our main theorem, we derive bounds when the model's interaction matrix is a (sparse) linear combination of known matrices, or it belongs to a finite set, or to a high-dimensional manifold. In fact, our main result handles multiple independent samples by viewing them as one sample from a larger model, and can be used to derive estimation bounds that are qualitatively similar to those obtained in the afore-described multiple-sample literature. Our technical approach benefits from sparsifying a model's interaction network, conditioning on subsets of variables that make the dependencies in the resulting conditional distribution sufficiently weak. We use this sparsification technique to prove strong concentration and anti-concentration results for the Ising model, which we believe have applications beyond the scope of this paper.

A new coreset framework for clustering

Given a metric space, the (k,z)-clustering problem consists of finding k centers such that the sum, over every point, of the distance to its closest center raised to the power z is minimized. This encapsulates the famous k-median (z=1) and k-means (z=2) clustering problems. Designing small-space sketches of the data that approximately preserve the cost of the solutions, also known as coresets, has been an important research direction over the last 15 years. In this paper, we present a new, simple coreset framework that simultaneously improves upon the best known bounds for a large variety of settings, ranging from Euclidean spaces to doubling metrics, minor-free metrics, and general metrics.
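Explicitly, for a point set P in a metric space (X, d) and a set C of k centers, the objective just described is

    \[
      \mathrm{cost}_z(P, C) \;=\; \sum_{p \in P} \min_{c \in C} d(p, c)^{z},
    \]

so z = 1 recovers k-median and z = 2 recovers k-means; a coreset is a small weighted point set whose cost under every candidate set of k centers approximates cost_z(P, C).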

Sample-efficient proper PAC learning with approximate differential privacy

In this paper we prove that the sample complexity of properly learning a class of Littlestone dimension d with approximate differential privacy is Õ(d^6), ignoring privacy and accuracy parameters. This result answers a question of Bun et al. (FOCS 2020) by improving upon their upper bound of 2^(O(d)) on the sample complexity. Prior to our work, finiteness of the sample complexity for privately learning a class of finite Littlestone dimension was only known for improper private learners, and the fact that our learner is proper answers another question of Bun et al., which was also asked by Bousquet et al. (NeurIPS 2020). Using machinery developed by Bousquet et al., we then show that the sample complexity of sanitizing a binary hypothesis class is at most polynomial in its Littlestone dimension and dual Littlestone dimension. This implies that a class is sanitizable if and only if it has finite Littlestone dimension. An important ingredient of our proofs is a new property of binary hypothesis classes that we call irreducibility, which may be of independent interest.

SESSION: Session 1B

Log-rank and lifting for AND-functions

Let f: {0, 1}^n → {0, 1} be a boolean function, and let f_∧(x, y) = f(x ∧ y) denote the AND-function of f, where x ∧ y denotes bit-wise AND. We study the deterministic communication complexity of f_∧ and show that, up to a log n factor, it is bounded by a polynomial in the logarithm of the real rank of the communication matrix of f_∧. This comes within a log n factor of establishing the log-rank conjecture for AND-functions with no assumptions on f. Our result stands in contrast with previous results on special cases of the log-rank conjecture, which needed significant restrictions on f such as monotonicity or low F_2-degree. Our techniques can also be used to prove (within a log n factor) a lifting theorem for AND-functions, stating that the deterministic communication complexity of f_∧ is polynomially related to the AND-decision tree complexity of f.

The results rely on a new structural result regarding boolean functions f: {0, 1}^n → {0, 1} with a sparse polynomial representation, which may be of independent interest. We show that if the polynomial computing f has few monomials then the set system of the monomials has a small hitting set, of size poly-logarithmic in its sparsity. We also establish extensions of this result to multi-linear polynomials on {0, 1}^n with a larger range.

Automating algebraic proof systems is NP-hard

We show that algebraic proofs are hard to find: Given an unsatisfiable CNF formula F, it is NP-hard to find a refutation of F in the Nullstellensatz, Polynomial Calculus, or Sherali–Adams proof systems in time polynomial in the size of the shortest such refutation. Our work extends, and gives a simplified proof of, the recent breakthrough of Atserias and Müller (JACM 2020) that established an analogous result for Resolution.

Strong co-nondeterministic lower bounds for NP cannot be proved feasibly

We show unconditionally that Cook’s theory PV formalizing poly-time reasoning cannot prove, for any non-deterministic poly-time machine M defining a language L(M), that L(M) is inapproximable by co-nondeterministic circuits of sub-exponential size. In fact, our unprovability result holds also for a theory which supports a fragment of Jeřábek’s theory of approximate counting APC1. We also show similar unconditional unprovability results for the conjecture of Rudich about the existence of super-bits.

Iterated lower bound formulas: a diagonalization-based approach to proof complexity

We propose a diagonalization-based approach to several important questions in proof complexity. We illustrate this approach in the context of the algebraic proof system IPS and in the context of propositional proof systems more generally.

We use the approach to give an explicit sequence of CNF formulas {φ_n} such that VNP ≠ VP iff there are no polynomial-size IPS proofs for the formulas φ_n. This provides a natural equivalence between proof complexity lower bounds and standard algebraic complexity lower bounds. Our proof of this fact uses the implication from IPS lower bounds to algebraic complexity lower bounds due to Grochow and Pitassi together with a diagonalization argument: the formulas φ_n themselves assert the non-existence of short IPS proofs for formulas encoding VNP ≠ VP at a different input length. Our result also has meta-mathematical implications: it gives evidence for the difficulty of proving strong lower bounds for IPS within IPS.

For any strong enough propositional proof system R, we define the *iterated R-lower bound formulas*, which inductively assert the non-existence of short R proofs for formulas encoding the same statement at a different input length, and propose them as explicit hard candidates for the proof system R. We observe that this hypothesis holds for Resolution following recent results of Atserias and Müller and of Garlík, and give evidence in favour of it for other proof systems.

New separation results for external information

We obtain new separation results for the two-party external information complexity of Boolean functions. The external information complexity of a function f(x,y) is the minimum amount of information a two-party protocol computing f must reveal to an outside observer about the input. We prove an exponential separation between external and internal information complexity, which is the best possible; previously no separation was known. We then use this result to prove a near-quadratic separation between amortized zero-error communication complexity and external information complexity for total functions, disproving a conjecture of the first author. Finally, we prove a matching upper bound showing that our separation result is tight.

SESSION: Session 2B

Polynomial time deterministic identity testing algorithm for Σ[3]ΠΣΠ[2] circuits via Edelstein–Kelly type theorem for quadratic polynomials

In this work we resolve conjectures of Beecken, Mittmann and Saxena [BMS13] and Gupta [Gupta14], by proving an analog of a theorem of Edelstein and Kelly for quadratic polynomials. As an immediate corollary we obtain the first deterministic polynomial time black-box algorithm for testing zeroness of Σ[3]ΠΣΠ[2] circuits.

An improved derandomization of the switching lemma

We prove a new derandomization of Håstad’s switching lemma, showing how to efficiently generate restrictions satisfying the switching lemma for DNF or CNF formulas of size m using only O(log m) random bits. Derandomizations of the switching lemma have been useful in many works as a key building-block for constructing objects which are in some way provably-pseudorandom with respect to AC^0-circuits.

Here, we use our new derandomization to give an improved analysis of the pseudorandom generator of Trevisan and Xue for AC^0-circuits (CCC’13): we show that the generator ε-fools size-m, depth-D circuits with n-bit inputs using only O(log(m/ε)^D · log n) random bits. In particular, we obtain (modulo the loglog-factors hidden in the O-notation) a dependence on m/ε which is best-possible with respect to currently-known AC^0-circuit lower bounds.

Simple and fast derandomization from very hard functions: eliminating randomness at almost no cost

Extending the classical “hardness-to-randomness” line of work, Doron, Moshkovitz, Oh, and Zuckerman (FOCS 2020) recently proved that derandomization with near-quadratic time overhead is possible, under the assumption that there exists a function in DTIME[2^n] that cannot be computed by randomized SVN circuits of size 2^((1−є)·n) for a small є.

In this work we extend their inquiry and answer several open questions that arose from their work. For a time function T(n), consider the following assumption: Non-uniformly secure one-way functions exist, and for δ=δ(є) and k=k_T(є) there exists a problem in DTIME[2^(k·n)] that is hard for algorithms that run in time 2^((k−δ)·n) and use 2^((1−δ)·n) bits of advice. Under this assumption, we show that:

1. (Worst-case derandomization.) Probabilistic algorithms that run in time T(n) can be deterministically simulated in time n·T(n)^(1+є).

2. (Average-case derandomization.) For polynomial time functions T(n)=poly(n), we can improve the derandomization time to n^є·T(n) if we allow the derandomization to succeed only on average, rather than in the worst-case.

3. (Conditional optimality.) For worst-case derandomization, the multiplicative time overhead of n is essentially optimal, conditioned on a counting version of the non-deterministic strong exponential-time hypothesis (i.e., on #NSETH).

Lastly, we present an alternative proof for the result of Doron, Moshkovitz, Oh, and Zuckerman that is simpler and more versatile. In fact, we show how to simplify the analysis not only of their construction, but of any construction that “extracts randomness from a pseudoentropic string”.

Average-case hardness of NP from exponential worst-case hardness assumptions

A long-standing and central open question in the theory of average-case complexity is to base average-case hardness of NP on worst-case hardness of NP. A frontier question along this line is to prove that PH is hard on average if UP requires (sub-)exponential worst-case complexity. The difficulty of resolving this question has been discussed from various perspectives based on technical barrier results, such as the limits of black-box reductions and the non-existence of worst-case hardness amplification procedures in PH.

In this paper, we overcome these barriers and resolve the open question by presenting the following main results:

1. UP ⊄ DTIME(2^(O(n / log n))) implies DistNP ⊄ AvgP.

2. PH ⊄ DTIME(2^(O(n / log n))) implies DistPH ⊄ AvgP.

3. NP ⊄ DTIME(2^(O(n / log n))) implies DistNP ⊄ Avg_P P. Here, Avg_P P denotes P-computable average-case polynomial time, which interpolates between average-case polynomial time and worst-case polynomial time. We complement this result by showing that DistPH ⊄ AvgP if and only if DistPH ⊄ Avg_P P.

At the core of all of our results is a new notion of universal heuristic scheme, whose running time is P-computable average-case polynomial time under every polynomial-time samplable distribution. Our proofs are based on the meta-complexity of time-bounded Kolmogorov complexity: We analyze average-case complexity through the lens of worst-case meta-complexity using a new “algorithmic” proof of language compression and weak symmetry of information for time-bounded Kolmogorov complexity.

Pseudodeterministic algorithms and the structure of probabilistic time

We connect the study of pseudodeterministic algorithms to two major open problems about the structural complexity of BPTIME: proving hierarchy theorems and showing the existence of complete problems. Our main contributions can be summarised as follows.

A new pseudorandom generator and its consequences. We build on techniques developed to prove hierarchy theorems for probabilistic time with advice (Fortnow and Santhanam, FOCS 2004) to construct the first unconditional pseudorandom generator of polynomial stretch computable in pseudodeterministic polynomial time (with one bit of advice) that is secure infinitely often against polynomial-time computations. As an application of this construction, we obtain new results about the complexity of generating and representing prime numbers. For instance, we show unconditionally for each ε > 0 that infinitely many primes p_n have a succinct representation in the following sense: there is a fixed probabilistic polynomial time algorithm that generates p_n with high probability from its succinct representation of size O(|p_n|^ε). This offers an exponential improvement over the running time of previous results, and shows that infinitely many primes have succinct and efficient representations.

Structural results for probabilistic time from pseudodeterministic algorithms. Oliveira and Santhanam (STOC 2017) established unconditionally that there is a pseudodeterministic algorithm for the Circuit Acceptance Probability Problem (CAPP) that runs in sub-exponential time and is correct with high probability over any samplable distribution on circuits on infinitely many input lengths. We show that improving this running time or obtaining a result that holds for every large input length would imply new time hierarchy theorems for probabilistic time. In addition, we prove that a worst-case polynomial-time pseudodeterministic algorithm for CAPP would imply that BPP has complete problems.

Equivalence between pseudodeterministic constructions and hierarchies. We establish an equivalence between a certain explicit pseudodeterministic construction problem and the existence of strong hierarchy theorems for probabilistic time. More precisely, we show that pseudodeterministically constructing in exponential time strings of large rKt complexity (Oliveira, ICALP 2019) is possible if and only if for every constructive function T(n) ≤ exp(o(exp(n))) we have BPTIME[poly(T)] ⊈ i.o.BPTIME[T]/logT.

More generally, these results suggest new approaches for designing pseudodeterministic algorithms for search problems and for unveiling the structure of probabilistic time.

SESSION: Session 1C

Vertex connectivity in poly-logarithmic max-flows

The vertex connectivity of an m-edge n-vertex undirected graph is the smallest number of vertices whose removal disconnects the graph, or leaves only a singleton vertex. In this paper, we give a reduction from the vertex connectivity problem to a set of maxflow instances. Using this reduction, we can solve vertex connectivity in Õ(m^α) time for any α ≥ 1, if there is an m^α-time maxflow algorithm. Using the current best maxflow algorithm that runs in m^(4/3+o(1)) time (Kathuria, Liu and Sidford, FOCS 2020), this yields an m^(4/3+o(1))-time vertex connectivity algorithm. This is the first improvement in the running time of the vertex connectivity problem in over 20 years, the previous best being an Õ(mn)-time algorithm due to Henzinger, Rao, and Gabow (FOCS 1996). Indeed, no algorithm with an o(mn) running time was known before our work, even if we assume an Õ(m)-time maxflow algorithm.

Our new technique is robust enough to also improve the best Õ(mn)-time bound for directed vertex connectivity to mn^(1−1/12+o(1)) time.
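As background for the reduction above: a single s-t vertex-connectivity query can be answered with one maxflow via the classical vertex-splitting construction (each vertex becomes an in-copy and an out-copy joined by a unit-capacity arc). The Python sketch below, using networkx, shows only that building block; it does not capture the paper's contribution, which is that polylogarithmically many such maxflow calls suffice for global vertex connectivity.

    import networkx as nx

    def st_vertex_connectivity(G, s, t):
        """Minimum number of vertices (other than s and t) whose removal
        disconnects s from t in the undirected graph G, assuming s and t are
        not adjacent, computed by one maxflow on the split graph."""
        H = nx.DiGraph()
        INF = G.number_of_nodes() + 1
        for v in G.nodes():
            # unit capacity through every vertex except the terminals
            H.add_edge(("in", v), ("out", v), capacity=INF if v in (s, t) else 1)
        for u, v in G.edges():
            H.add_edge(("out", u), ("in", v), capacity=INF)
            H.add_edge(("out", v), ("in", u), capacity=INF)
        value, _ = nx.maximum_flow(H, ("out", s), ("in", t))
        return value

    # Example: opposite vertices of a 4-cycle are separated by removing 2 vertices.
    print(st_vertex_connectivity(nx.cycle_graph(4), 0, 2))  # 2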

Finding large induced sparse subgraphs in C>t-free graphs in quasipolynomial time

For an integer t, a graph G is called C>t-free if G does not contain any induced cycle on more than t vertices. We prove the following statement: for every pair of integers d and t and a statement φ, there exists an algorithm that, given an n-vertex C>t-free graph G with weights on vertices, finds in time n^(O(log^3 n)) a maximum-weight vertex subset S such that G[S] has degeneracy at most d and satisfies φ. The running time can be improved to n^(O(log^2 n)) assuming G is P_t-free, that is, G does not contain an induced path on t vertices. This expands the recent results of the authors [FOCS 2020 and SOSA 2021] on the Maximum Weight Independent Set problem on P_t-free graphs in two directions: by encompassing the more general setting of C>t-free graphs, and by being applicable to a much wider variety of problems, such as Maximum Weight Induced Forest or Maximum Weight Induced Planar Graph.

Clan embeddings into trees, and low treewidth graphs

In low distortion metric embeddings, the goal is to embed a host “hard” metric space into a “simpler” target space while approximately preserving pairwise distances. A highly desirable target space is that of a tree metric. Unfortunately, such an embedding will result in a huge distortion. A celebrated bypass to this problem is stochastic embedding with logarithmic expected distortion. Another bypass is Ramsey-type embedding, where the distortion guarantee applies only to a subset of the points. However, both these solutions fail to provide an embedding into a single tree with a worst-case distortion guarantee on all pairs. In this paper, we propose a novel third bypass called clan embedding. Here each point x is mapped to a subset of points f(x), called a clan, with a special chief point χ(x) ∈ f(x). The clan embedding has multiplicative distortion t if for every pair (x,y) some copy y′ ∈ f(y) in the clan of y is close to the chief of x: min_{y′∈f(y)} d(y′,χ(x)) ≤ t · d(x,y). Our first result is a clan embedding into a tree with multiplicative distortion O(log n/є) such that each point has 1+є copies (in expectation). In addition, we provide a “spanning” version of this theorem for graphs and use it to devise the first compact routing scheme with constant size routing tables.
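In display form, the guarantee defining multiplicative distortion t for a clan embedding f with chiefs χ into a target tree with metric d_T reads

    \[
      \min_{y' \in f(y)} d_T\big(y', \chi(x)\big) \;\le\; t \cdot d(x, y)
      \qquad \text{for every pair } x, y .
    \]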

We then focus on minor-free graphs of diameter parameterized by D, which were known to be stochastically embeddable into bounded treewidth graphs with expected additive distortion єD. We devise Ramsey-type embedding and clan embedding analogs of the stochastic embedding. We use these embeddings to construct the first (bicriteria quasi-polynomial time) approximation scheme for the metric ρ-dominating set and metric ρ-independent set problems in minor-free graphs.

Tree embeddings for hop-constrained network design

Network design problems aim to compute low-cost structures such as routes, trees and subgraphs. Often, it is natural and desirable to require that these structures have small hop length or hop diameter. Unfortunately, optimization problems with hop constraints are much harder and less well understood than their hop-unconstrained counterparts. A significant algorithmic barrier in this setting is the fact that hop-constrained distances in graphs are very far from being a metric.

We show that, nonetheless, hop-constrained distances can be approximated by distributions over “partial tree metrics.” We build this result into a powerful and versatile algorithmic tool which, similarly to classic probabilistic tree embeddings, reduces hop-constrained problems in general graphs to hop-unconstrained problems on trees. We then use this tool to give the first poly-logarithmic bicriteria approximations for the hop-constrained variants of many classic network design problems. These include Steiner forest, group Steiner tree, group Steiner forest, buy-at-bulk network design as well as online and oblivious versions of many of these problems.

Bridging the gap between tree and connectivity augmentation: unified and stronger approaches

We consider the Connectivity Augmentation Problem (CAP), a classical problem in the area of Survivable Network Design. It is about increasing the edge-connectivity of a graph by one unit in the cheapest possible way. More precisely, given a k-edge-connected graph G=(V,E) and a set of extra edges, the task is to find a minimum cardinality subset of extra edges whose addition to G makes the graph (k+1)-edge-connected. If k is odd, the problem is known to reduce to the Tree Augmentation Problem (TAP)—i.e., G is a spanning tree—for which significant progress has been achieved recently, leading to approximation factors below 1.5 (the currently best factor is 1.458). However, advances on TAP did not carry over to CAP so far. Indeed, only very recently, Byrka, Grandoni, and Ameli (STOC 2020) managed to obtain the first approximation factor below 2 for CAP by presenting a 1.91-approximation algorithm based on a method that is disjoint from recent advances for TAP.

We first bridge the gap between TAP and CAP, by presenting techniques that allow for leveraging insights and methods from TAP to approach CAP. We then introduce a new way to get approximation factors below 1.5, based on a new analysis technique. Through these ingredients, we obtain an improved approximation algorithm for CAP, and therefore also TAP. This leads to the currently best approximation result for both problems in a unified way, by significantly improving on the above-mentioned 1.91-approximation for CAP and also the previously best approximation factor of 1.458 for TAP by Grandoni, Kalaitzis, and Zenklusen (STOC 2018). Additionally, a feature we inherit from recent TAP advances is that our approach can deal with the weighted setting when the ratio between the largest and smallest cost on extra links is bounded, in which case we obtain approximation factors below 1.5.

SESSION: Session 2C

Deterministic mincut in almost-linear time

We present a deterministic (global) mincut algorithm for weighted, undirected graphs that runs in m^(1+o(1)) time, answering an open question of Karger from the 1990s. To obtain our result, we de-randomize the construction of the skeleton graph in Karger’s near-linear time mincut algorithm, which is its only randomized component. In particular, we partially de-randomize the well-known Benczur-Karger graph sparsification technique by random sampling, which we accomplish by the method of pessimistic estimators. Our main technical component is designing an efficient pessimistic estimator to capture the cuts of a graph, which involves harnessing the expander decomposition framework introduced in recent work by Goranci et al. (SODA 2021). As a side-effect, we obtain a structural representation of all approximate mincuts in a graph, which may have future applications.

Support of closed walks and second eigenvalue multiplicity of graphs

We show that the multiplicity of the second normalized adjacency matrix eigenvalue of any connected graph of maximum degree Δ is bounded by O(n Δ^(7/5)/log^(1/5−o(1)) n) for any Δ, and improve this to O(n log^(1/2) d/log^(1/4−o(1)) n) for simple d-regular graphs when d ≥ log^(1/4) n. In fact, the same bounds hold for the number of eigenvalues in any interval of width λ_2/log_Δ^(1−o(1)) n containing the second eigenvalue λ_2. The main ingredient in the proof is a polynomial (in k) lower bound on the typical support of a closed random walk of length 2k in any connected graph, which in turn relies on new lower bounds for the entries of the Perron eigenvector of submatrices of the normalized adjacency matrix.

Log-concave polynomials IV: approximate exchange, tight mixing times, and near-optimal sampling of forests

We prove tight mixing time bounds for natural random walks on bases of matroids, determinantal distributions, and more generally distributions associated with log-concave polynomials. For a matroid of rank k on a ground set of n elements, or more generally distributions associated with log-concave polynomials of homogeneous degree k on n variables, we show that the down-up random walk, started from an arbitrary point in the support, mixes in time O(k log k). Our bound has no dependence on n or the starting point, unlike the previous analyses of Anari et al. (STOC 2019) and Cryan et al. (FOCS 2019), and is tight up to constant factors. The main new ingredient is a property we call approximate exchange, a generalization of well-studied exchange properties for matroids and valuated matroids, which may be of independent interest. In particular, given a distribution µ over size-k subsets of [n], our approximate exchange property implies that a simple local search algorithm gives a k^(O(k))-approximation of max_S µ(S) when µ is generated by a log-concave polynomial, and that greedy gives the same approximation ratio when µ is strongly Rayleigh. As an application, we show how to leverage down-up random walks to approximately sample random forests or random spanning trees in a graph with n edges in time O(n log^2 n). The best known result for sampling random forests was an FPAUS with high polynomial runtime recently found by Anari et al. (STOC 2019) and Cryan et al. (FOCS 2019). For spanning trees, we improve on the almost-linear time algorithm by Schild (STOC 2018). Our analysis works on weighted graphs too, and is the first to achieve nearly-linear running time for these problems. Our algorithms can be naturally extended to support approximately sampling from random forests of size between k_1 and k_2 in time O(n log^2 n), for fixed parameters k_1, k_2.
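A minimal Python sketch of the down-up random walk on size-k subsets analyzed above: drop a uniformly random element, then re-add an element with probability proportional to the measure of the resulting set. Passing the distribution µ as an explicit dictionary (here, the uniform distribution over the spanning trees of a 4-cycle) is purely for illustration; in the matroid and forest applications one would query an oracle instead.

    import random

    def down_up_step(S, ground_set, mu, rng=random):
        """One step of the down-up walk on the support of mu, a distribution on
        size-k subsets given as a dict frozenset -> nonnegative weight."""
        S = set(S)
        S.remove(rng.choice(sorted(S)))          # down: drop a uniform element
        candidates, weights = [], []
        for x in ground_set:                     # up: re-add proportionally to mu
            if x not in S:
                w = mu.get(frozenset(S | {x}), 0.0)
                if w > 0:
                    candidates.append(x)
                    weights.append(w)
        S.add(rng.choices(candidates, weights=weights, k=1)[0])
        return frozenset(S)

    # Example: the 4 spanning trees of a 4-cycle (edges 0..3) are the 3-element subsets.
    edges = [0, 1, 2, 3]
    mu = {frozenset(edges) - {e}: 1.0 for e in edges}
    S, counts, rng = frozenset({0, 1, 2}), {}, random.Random(0)
    for _ in range(20000):
        S = down_up_step(S, edges, mu, rng)
        counts[S] = counts.get(S, 0) + 1
    print(sorted(counts.values()))  # roughly equal counts across the 4 trees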

Breaking the quadratic barrier for matroid intersection

The matroid intersection problem is a fundamental problem that has been extensively studied for half a century. In the classic version of this problem, we are given two matroids M_1 = (V, I_1) and M_2 = (V, I_2) on a common ground set V of n elements, and then we have to find the largest common independent set S ∈ I_1 ∩ I_2 by making independence oracle queries of the form “Is S ∈ I_1?” or “Is S ∈ I_2?” for S ⊆ V. The goal is to minimize the number of queries.

Beating the existing Õ(n^2) bound, known as the quadratic barrier, is an open problem that captures the limits of techniques from two lines of work. The first one is Cunningham's classic algorithm [SICOMP 1986], whose Õ(n^2)-query implementations were shown by CLS+ [FOCS 2019] and Nguyen [2019] (more generally, these algorithms take Õ(nr) queries, where r denotes the rank, which can be as big as n). The other one is the general cutting plane method of Lee, Sidford, and Wong [FOCS 2015]. The only progress towards breaking the quadratic barrier requires either approximation algorithms or a more powerful rank oracle query [CLS+ FOCS 2019]. No exact algorithm with o(n^2) independence queries was known.

In this work, we break the quadratic barrier with a randomized algorithm guaranteeing Õ(n^(9/5)) independence queries with high probability, and a deterministic algorithm guaranteeing Õ(n^(11/6)) independence queries. Our key insight is simple and fast algorithms to solve a graph reachability problem that arose in the standard augmenting path framework [Edmonds 1968]. Combining this with previous exact and approximation algorithms leads to our results.

Fractionally log-concave and sector-stable polynomials: counting planar matchings and more

We show fully polynomial time randomized approximation schemes (FPRAS) for counting matchings of a given size, or more generally sampling/counting monomer-dimer systems in planar, not-necessarily-bipartite, graphs. While perfect matchings on planar graphs can be counted exactly in polynomial time, counting non-perfect matchings was shown to be #P-hard by Jerrum (J Stat Phys 1987), who also raised the question of whether efficient approximate counting is possible. We answer this affirmatively by showing that the multi-site Glauber dynamics on the set of monomers in a monomer-dimer system always mixes rapidly, and that this dynamics can be implemented efficiently on downward-closed families of graphs where counting perfect matchings is tractable. As further applications of our results, we show how to sample efficiently using multi-site Glauber dynamics from partition-constrained strongly Rayleigh distributions, and nonsymmetric determinantal point processes. In order to analyze mixing properties of the multi-site Glauber dynamics, we establish two notions for generating polynomials of discrete set-valued distributions: sector-stability and fractional log-concavity. These notions generalize well-studied properties like real-stability and log-concavity, but, unlike them, robustly degrade under useful transformations applied to the distribution. We relate these notions to pairwise correlations in the underlying distribution and the notion of spectral independence introduced by Anari et al. (FOCS 2020), providing a new tool for establishing spectral independence based on geometry of polynomials. As a byproduct of our techniques, we show that polynomials avoiding roots in a sector of the complex plane must satisfy what we call fractional log-concavity; this generalizes a classic result established by Gårding (J Math Mech 1959) who showed homogeneous polynomials that have no roots in a half-plane must be log-concave over the positive orthant.
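For orientation, here is a textbook single-site add/remove chain on the matchings (monomer-dimer configurations) of a graph, written in Python. This is not the multi-site Glauber dynamics analyzed in the paper; the dimer weight lam and the grid example are illustrative choices.

    import random
    import networkx as nx

    def matching_chain_step(G, matching, lam=1.0, rng=random):
        """One step of a single-site add/remove chain on matchings of G, where
        matching is a set of frozenset({u, v}) edges.  A uniformly random edge is
        proposed for toggling; acceptance probabilities min(1, lam) for adding and
        min(1, 1/lam) for removing make the stationary distribution proportional
        to lam ** len(matching)."""
        u, v = rng.choice(list(G.edges()))
        e = frozenset((u, v))
        if e in matching:
            if rng.random() < min(1.0, 1.0 / lam):
                matching.remove(e)
        else:
            covered = set().union(*matching) if matching else set()
            # adding is legal only if both endpoints are currently monomers
            if u not in covered and v not in covered and rng.random() < min(1.0, lam):
                matching.add(e)
        return matching

    # Example: track the average matching size on a 4x4 grid graph.
    G = nx.grid_2d_graph(4, 4)
    rng = random.Random(0)
    M, sizes = set(), []
    for _ in range(50000):
        M = matching_chain_step(G, M, lam=1.0, rng=rng)
        sizes.append(len(M))
    print(sum(sizes[10000:]) / len(sizes[10000:]))  # average number of dimers after burn-in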

SESSION: Session 3A

Adversarial laws of large numbers and optimal regret in online classification

Laws of large numbers guarantee that given a large enough sample from some population, the measure of any fixed sub-population is well-estimated by its frequency in the sample. We study laws of large numbers in sampling processes that can affect the environment they are acting upon and interact with it. Specifically, we consider the sequential sampling model proposed by Ben-Eliezer and Yogev (2020), and characterize the classes which admit a uniform law of large numbers in this model: these are exactly the classes that are online learnable. Our characterization may be interpreted as an online analogue to the equivalence between learnability and uniform convergence in statistical (PAC) learning. The sample-complexity bounds we obtain are tight for many parameter regimes, and as an application, we determine the optimal regret bounds in online learning, stated in terms of Littlestone’s dimension, thus resolving the main open question from Ben-David, Pál, and Shalev-Shwartz (2009), which was also posed by Rakhlin, Sridharan, and Tewari (2015).

Stronger calibration lower bounds via sidestepping

We consider an online binary prediction setting where a forecaster observes a sequence of T bits one by one. Before each bit is revealed, the forecaster predicts the probability that the bit is 1. The forecaster is called well-calibrated if for each p ∈ [0, 1], among the n_p bits for which the forecaster predicts probability p, the actual number of ones, m_p, is indeed equal to p · n_p. The calibration error, defined as ∑_p |m_p − p · n_p|, quantifies the extent to which the forecaster deviates from being well-calibrated. It has long been known that an O(T^{2/3}) calibration error is achievable even when the bits are chosen adversarially, and possibly based on the previous predictions. However, little is known on the lower bound side, except an Ω(√T) bound that follows from the trivial example of independent fair coin flips.
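
As a concrete illustration (not taken from the paper), the following short Python sketch computes the calibration error of a fixed prediction sequence directly from the definition above; predictions are assumed to lie on a finite grid so that the sum over p is finite, and the example data are made up.

from collections import defaultdict

def calibration_error(predictions, outcomes):
    # sum_p |m_p - p * n_p|, where n_p counts rounds with prediction p
    # and m_p counts how many of those rounds had outcome 1.
    n_p = defaultdict(int)
    m_p = defaultdict(int)
    for p, b in zip(predictions, outcomes):
        n_p[p] += 1
        m_p[p] += b
    return sum(abs(m_p[p] - p * n_p[p]) for p in n_p)

# Toy example: always predict 0.5 on ten bits, six of which are ones.
print(calibration_error([0.5] * 10, [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]))  # |6 - 5| = 1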

In this paper, we prove an Ω(T^{0.528}) bound on the calibration error, which is the first super-√T lower bound for this setting to the best of our knowledge. The technical contributions of our work include two lower bound techniques, early stopping and sidestepping, which circumvent the obstacles that have previously hindered strong calibration lower bounds. We also propose an abstraction of the prediction setting, termed the Sign-Preservation game, which may be of independent interest. This game has a much smaller state space than the full prediction setting and allows simpler analyses. The Ω(T^{0.528}) lower bound follows from a general reduction theorem that translates lower bounds on the game value of Sign-Preservation into lower bounds on the calibration error.

Hardness of learning DNFs using halfspaces

The problem of learning t-term DNF formulas (for t = O(1)) has been studied extensively in the PAC model since its introduction by Valiant (STOC 1984). A t-term DNF can be efficiently learnt using a t-term DNF only if t = 1, i.e., when it is an AND, while even weakly learning a 2-term DNF using a constant-term DNF was shown to be NP-hard by Khot and Saket (FOCS 2008). On the other hand, Feldman, Guruswami, Raghavendra and Wu (FOCS 2009) showed the hardness of weakly learning a noisy AND using a halfspace (the latter being a generalization of an AND), while Khot and Saket (STOC 2008) showed that an intersection of two halfspaces is hard to weakly learn using any function of constantly many halfspaces. The question of whether a 2-term DNF is efficiently learnable using 2 or constantly many halfspaces remained open.

In this work we answer this question in the negative by showing the hardness of weakly learning a 2-term DNF, as well as a noisy AND, using any function of a constant number of halfspaces. In particular we prove the following. For any constants ν, ζ > 0 and ℓ ∈ N, given a distribution over point-value pairs {0,1}^n × {0,1}, it is NP-hard to decide whether: (i) YES Case: There is a 2-term DNF that classifies all the points of the distribution, and an AND that classifies at least a 1−ζ fraction of the points correctly. (ii) NO Case: Any boolean function depending on at most ℓ halfspaces classifies at most a 1/2 + ν fraction of the points of the distribution correctly.

Our result generalizes and strengthens the previous best results mentioned above on the hardness of learning a 2-term DNF, learning an intersection of two halfspaces, and learning a noisy AND.

Boosting simple learners

Boosting is a celebrated machine learning approach which is based on the idea of combining weak and moderately inaccurate hypotheses into a strong and accurate one. We study boosting under the assumption that the weak hypotheses belong to a class of bounded capacity. This assumption is inspired by the common convention that weak hypotheses are “rules-of-thumb” from an “easy-to-learn class” (Schapire and Freund ’12, Shalev-Shwartz and Ben-David ’14). Formally, we assume the class of weak hypotheses has a bounded VC dimension. We focus on two main questions:

(i) Oracle Complexity: How many weak hypotheses are needed in order to produce an accurate hypothesis? We design a novel boosting algorithm and demonstrate that it circumvents a classical lower bound by Freund and Schapire (’95, ’12). Whereas the lower bound shows that Ω(1/γ^2) weak hypotheses with γ-margin are sometimes necessary, our new method requires only Õ(1/γ) weak hypotheses, provided that they belong to a class of bounded VC dimension. Unlike previous boosting algorithms, which aggregate the weak hypotheses by majority votes, the new boosting algorithm uses more complex (“deeper”) aggregation rules. We complement this result by showing that complex aggregation rules are in fact necessary to circumvent the aforementioned lower bound.

(ii) Expressivity: Which tasks can be learned by boosting weak hypotheses from a bounded VC class? Can complex concepts that are “far away” from the class be learned? Towards answering the first question we identify a combinatorial-geometric parameter which captures the expressivity of base-classes in boosting. As a corollary we provide an affirmative answer to the second question for many well-studied classes, including half-spaces and decision stumps. Along the way, we establish and exploit connections with Discrepancy Theory.

Algorithmic foundations for the diffraction limit

For more than a century and a half it has been widely believed (but never rigorously shown) that the physics of diffraction imposes certain fundamental limits on the resolution of an optical system. However, our understanding of what exactly can and cannot be resolved has never risen above heuristic arguments which, even worse, appear contradictory. In this work we remedy this gap by studying the diffraction limit as a statistical inverse problem and, based on connections to provable algorithms for learning mixture models, we rigorously prove upper and lower bounds on the statistical and algorithmic complexity needed to resolve closely spaced point sources. In particular, we show that there is a phase transition where the sample complexity goes from polynomial to exponential. Surprisingly, we show that this does not occur at the Abbe limit, which has long been presumed to be the true diffraction limit.

SESSION: Session 4A

VC dimension and distribution-free sample-based testing

We consider the problem of determining which classes of functions can be tested more efficiently than they can be learned, in the distribution-free sample-based model that corresponds to the standard PAC learning setting. Our main result shows that while VC dimension by itself does not always provide tight bounds on the number of samples required to test a class of functions in this model, it can be combined with a closely-related variant that we call “lower VC” (or LVC) dimension to obtain strong lower bounds on this sample complexity.

We use this result to obtain strong and in many cases nearly optimal bounds on the sample complexity for testing unions of intervals, halfspaces, intersections of halfspaces, polynomial threshold functions, and decision trees. Conversely, we show that two natural classes of functions, juntas and monotone functions, can be tested with a number of samples that is polynomially smaller than the number of samples required for PAC learning.

Finally, we also use the connection between VC dimension and property testing to establish new lower bounds for testing radius clusterability and testing feasibility of linear constraint systems.

Settling the robust learnability of mixtures of Gaussians

This work represents a natural coalescence of two important lines of work – learning mixtures of Gaussians and algorithmic robust statistics. In particular we give the first provably robust algorithm for learning mixtures of any constant number of Gaussians. We require only mild assumptions on the mixing weights (bounded fractionality) and that the total variation distance between components is bounded away from zero. At the heart of our algorithm is a new method for proving dimension-independent polynomial identifiability through applying a carefully chosen sequence of differential operations to certain generating functions that not only encode the parameters we would like to learn but also the system of polynomial equations we would like to solve. We show how the symbolic identities we derive can be directly used to analyze a natural sum-of-squares relaxation.

A theory of universal learning

How quickly can a given class of concepts be learned from examples? It is common to measure the performance of a supervised machine learning algorithm by plotting its “learning curve”, that is, the decay of the error rate as a function of the number of training examples. However, the classical theoretical framework for understanding learnability, the PAC model of Vapnik-Chervonenkis and Valiant, does not explain the behavior of learning curves: the distribution-free PAC model of learning can only bound the upper envelope of the learning curves over all possible data distributions. This does not match the practice of machine learning, where the data source is typically fixed in any given scenario, while the learner may choose the number of training examples on the basis of factors such as computational resources and desired accuracy.

In this paper, we study an alternative learning model that better captures such practical aspects of machine learning, but still gives rise to a complete theory of the learnable in the spirit of the PAC model. More precisely, we consider the problem of universal learning, which aims to understand the performance of learning algorithms on every data distribution, but without requiring uniformity over the distribution. The main result of this paper is a remarkable trichotomy: there are only three possible rates of universal learning. More precisely, we show that the learning curves of any given concept class decay at either an exponential, linear, or arbitrarily slow rate. Moreover, each of these cases is completely characterized by appropriate combinatorial parameters, and we exhibit optimal learning algorithms that achieve the best possible rate in each case.

For concreteness, we consider in this paper only the realizable case, though analogous results are expected to extend to more general learning scenarios.

Optimal testing of discrete distributions with high probability

We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property P, and parameters 0< є, δ <1, we want to distinguish with probability at least 1−δ whether these distributions satisfy P or are є-far from P in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to δ = Ω(1)), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds.

Here we study the following broad question: For a given property P, can we characterize the sample complexity of testing P as a function of all relevant problem parameters, including the error probability δ? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.

SESSION: Session 3B

Information theoretic limits of cardinality estimation: Fisher meets Shannon

Estimating the cardinality (number of distinct elements) of a large multiset is a classic problem in streaming and sketching, dating back to Flajolet and Martin’s Probabilistic Counting (PCSA) algorithm from 1983.

In this paper we study the intrinsic tradeoff between the space complexity of the sketch and its estimation error in the random oracle model. We define a new measure of efficiency for cardinality estimators called the Fisher-Shannon (Fish) number H/I. It captures the tension between the limiting Shannon entropy (H) of the sketch and its normalized Fisher information (I), which characterizes the variance of a statistically efficient, asymptotically unbiased estimator.
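
To make the object being measured concrete, here is a toy Flajolet-Martin-style estimator in Python (an illustration only; it is neither PCSA nor the Fishmonger sketch analyzed below, and the "random oracle" is simulated with a salted hash). Real sketches apply a bias correction and a more careful estimator; this toy version is accurate only up to a constant-factor bias.

import hashlib, random

def estimate_cardinality(stream, num_registers=32):
    salts = [str(random.random()) for _ in range(num_registers)]
    max_rho = [0] * num_registers
    for x in stream:
        for i, salt in enumerate(salts):
            h = int(hashlib.sha256((salt + str(x)).encode()).hexdigest(), 16)
            rho = (h & -h).bit_length()  # 1 + number of trailing zero bits
            max_rho[i] = max(max_rho[i], rho)
    # 2^(average max_rho) tracks the cardinality up to a constant-factor bias,
    # which sketches like PCSA and (Hyper)LogLog correct for explicitly.
    return 2 ** (sum(max_rho) / num_registers)

stream = [random.randrange(5_000) for _ in range(20_000)]
print(len(set(stream)), round(estimate_cardinality(stream)))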

Our results are as follows.

(i) We prove that all base-q variants of Flajolet and Martin’s PCSA sketch have Fish-number H_0/I_0 ≈ 1.98016 and that every base-q variant of (Hyper)LogLog has Fish-number worse than H_0/I_0, but that they tend to H_0/I_0 in the limit as q → ∞. Here H_0, I_0 are precisely defined constants.

(ii) We describe a sketch called Fishmonger that is based on a smoothed, entropy-compressed variant of PCSA with a different estimator function. It is proved that with high probability, Fishmonger processes a multiset of [U] such that at all times, its space is O(log^2 log U) + (1+o(1))(H_0/I_0)b ≈ 1.98b bits and its standard error is 1/√b.

(iii) We give circumstantial evidence that H_0/I_0 is the optimum Fish-number of mergeable sketches for Cardinality Estimation. We define a class of linearizable sketches and prove that no member of this class can beat H_0/I_0. The popular mergeable sketches are, in fact, also linearizable.

Almost optimal super-constant-pass streaming lower bounds for reachability

We give an almost quadratic n^{2−o(1)} lower bound on the space consumption of any o(√log n)-pass streaming algorithm solving the (directed) s-t reachability problem. This means that any such algorithm must essentially store the entire graph. As corollaries, we obtain almost quadratic space lower bounds for additional fundamental problems, including maximum matching, shortest path, matrix rank, and linear programming.

Our main technical contribution is the definition and construction of set hiding graphs, that may be of independent interest: we give a general way of encoding a set S ⊆ [k] as a directed graph with n = k^{1+o(1)} vertices, such that deciding whether i ∈ S boils down to deciding whether t_i is reachable from s_i, for a specific pair of vertices (s_i, t_i) in the graph. Furthermore, we prove that our graph “hides” S, in the sense that no low-space streaming algorithm with a small number of passes can learn (almost) anything about S.

Robust testing of low dimensional functions

A natural problem in high-dimensional inference is to decide if a classifier f: ℝ^n → {−1,1} depends on a small number of linear directions of its input data. Call a function g: ℝ^n → {−1,1} a linear k-junta if it is completely determined by some k-dimensional subspace of the input space. A recent work of the authors showed that linear k-juntas are testable. Thus there exists an algorithm to distinguish between: (1) f: ℝ^n → {−1,1} which is a linear k-junta with surface area s; (2) f is є-far from any linear k-junta with surface area (1+є)s. The query complexity of the algorithm is independent of the ambient dimension n.

Following the surge of interest in noise-tolerant property testing, in this paper we prove a noise-tolerant (or robust) version of this result. Namely, we give an algorithm which, given any c>0, є>0, distinguishes between: (1) f: ℝ^n → {−1,1} has correlation at least c with some linear k-junta with surface area s; (2) f has correlation at most c−є with any linear k-junta with surface area at most s. The query complexity of our tester is k^{poly(s/є)}. Using our techniques, we also obtain a fully noise tolerant tester with the same query complexity for any class C of linear k-juntas with surface area bounded by s. As a consequence, we obtain a fully noise tolerant tester with query complexity k^{O(poly(log k/є))} for the class of intersections of k halfspaces (for constant k) over the Gaussian space. Our query complexity is independent of the ambient dimension n. Previously, no non-trivial noise tolerant testers were known even for a single halfspace.

Towards tight bounds for spectral sparsification of hypergraphs

Cut and spectral sparsification of graphs have numerous applications, including e.g. speeding up algorithms for cuts and Laplacian solvers. These powerful notions have recently been extended to hypergraphs, which are much richer and may offer new applications. However, the current bounds on the size of hypergraph sparsifiers are not as tight as the corresponding bounds for graphs.

Our first result is a polynomial-time algorithm that, given a hypergraph on n vertices with maximum hyperedge size r, outputs an є-spectral sparsifier with O*(nr) hyperedges, where O* suppresses (є^{−1} log n)^{O(1)} factors. This size bound improves the two previous bounds: O*(n^3) [Soma and Yoshida, SODA’19] and O*(nr^3) [Bansal, Svensson and Trevisan, FOCS’19]. Our main technical tool is a new method for proving concentration of the nonlinear analogue of the quadratic form of the Laplacians for hypergraph expanders.

We complement this with lower bounds on the bit complexity of any compression scheme that (1+є)-approximates all the cuts in a given hypergraph, and hence also on the bit complexity of every є-cut/spectral sparsifier. These lower bounds are based on Ruzsa-Szemerédi graphs, and a particular instantiation yields an Ω(nr) lower bound on the bit complexity even for fixed constant є. In the case of hypergraph cut sparsifiers, this is tight up to polylogarithmic factors in n, due to a recent result of [Chen, Khanna and Nagda, FOCS’20]. For spectral sparsifiers it narrows the gap to O*(r).

Finally, for directed hypergraphs, we present an algorithm that computes an є-spectral sparsifier with O*(n^2 r^3) hyperarcs, where r is the maximum size of a hyperarc. For small r, this improves over the O*(n^3) bound known from [Soma and Yoshida, SODA’19], and is getting close to the trivial lower bound of Ω(n^2) hyperarcs.

Graph streaming lower bounds for parameter estimation and property testing via a streaming XOR lemma

We study space-pass tradeoffs in graph streaming algorithms for parameter estimation and property testing problems such as estimating the size of maximum matchings and maximum cuts, the weight of minimum spanning trees, or testing if a graph is connected or cycle-free versus being far from these properties. We develop a new lower bound technique that proves that for many problems of interest, including all the above, obtaining a (1+є)-approximation requires either n^{Ω(1)} space or Ω(1/є) passes, even on highly restricted families of graphs such as bounded-degree planar graphs. For several of these problems, this bound matches those of existing algorithms and is thus (asymptotically) optimal.

Our results considerably strengthen prior lower bounds even for arbitrary graphs: starting from the influential work of [Verbin, Yu; SODA 2011], there has been a plethora of lower bounds for single-pass algorithms for these problems; however, the only multi-pass lower bounds, proven very recently in [Assadi, Kol, Saxena, Yu; FOCS 2020], rule out sublinear-space algorithms with an exponentially smaller number of passes, o(log(1/є)), for these problems.

One key ingredient of our proofs is a simple streaming XOR Lemma, a generic hardness amplification result, that we prove: informally speaking, if a p-pass s-space streaming algorithm can only solve a decision problem with advantage δ > 0 over random guessing, then it cannot solve the XOR of ℓ independent copies of the problem with advantage much better than δ. This result can be of independent interest and useful for other streaming lower bounds as well.

SESSION: Session 4B

Decremental all-pairs shortest paths in deterministic near-linear time

We study the decremental All-Pairs Shortest Paths (APSP) problem in undirected edge-weighted graphs. The input to the problem is an undirected n-vertex m-edge graph G with non-negative edge lengths, which undergoes an online sequence of edge deletions. The goal is to support approximate shortest-path queries: given a pair x,y of vertices of G, return a path P connecting x to y whose length is within factor α of the length of the shortest x-y path, in time Õ(|E(P)|), where α is the approximation factor of the algorithm. APSP is one of the most basic and extensively studied dynamic graph problems.

A long line of work culminated in the algorithm of [Chechik, FOCS 2018] with near-optimal guarantees: for any constant 0 < є ≤ 1 and parameter k ≥ 1, the algorithm achieves approximation factor (2+є)k−1 and total update time O(m n^{1/k+o(1)} log(nL)), where L is the ratio of longest to shortest edge lengths. Unfortunately, as much of the prior work, the algorithm is randomized and needs to assume an oblivious adversary; that is, the input edge-deletion sequence is fixed in advance and may not depend on the algorithm’s behavior. In many real-world scenarios, and in applications of APSP to static graph problems, it is crucial that the algorithm works against an adaptive adversary, where the edge deletion sequence may depend on the algorithm’s past behavior arbitrarily; ideally, such an algorithm should be deterministic. Unfortunately, unlike the oblivious-adversary setting, its adaptive-adversary counterpart is still poorly understood.

For unweighted graphs, the algorithm of [Henzinger, Krinninger and Nanongkai, FOCS ’13, SICOMP ’16] achieves a (1+є)-approximation with total update time Õ(mn/є); the best current total update time guarantee of n^{2.5+O(є)} is achieved by the recent deterministic algorithm of [Chuzhoy, Saranurak, SODA’21], with 2^{O(1/є)}-multiplicative and 2^{O(log^{3/4} n/є)}-additive approximation. To the best of our knowledge, for arbitrary non-negative edge weights, the fastest current adaptive-update algorithm has total update time O(n^3 log L/є), achieving a (1+є)-approximation. Even if we are willing to settle for any o(n)-approximation factor, no currently known algorithm has a better than Θ(n^3) total update time in weighted graphs and better than Θ(n^{2.5}) total update time in unweighted graphs. Several conditional lower bounds suggest that no algorithm with a sufficiently small approximation factor can achieve an o(n^3) total update time.

Our main result is a deterministic algorithm for decremental APSP in undirected edge-weighted graphs that, for any Ω(1/log log m) ≤ є < 1, achieves approximation factor (log m)^{2^{O(1/є)}}, with total update time O(m^{1+O(є)} · (log m)^{O(1/є^2)} · log L). In particular, we obtain a (polylog m)-approximation in time Õ(m^{1+є}) for any constant є, and, for any slowly growing function f(m), we obtain a (log m)^{f(m)}-approximation in time m^{1+o(1)}. We also provide an algorithm with similar guarantees for decremental Sparse Neighborhood Covers.

Improved dynamic algorithms for longest increasing subsequence

We study dynamic algorithms for the longest increasing subsequence (LIS) problem. A dynamic LIS algorithm maintains a sequence subject to operations of the following form arriving one by one: insert an element, delete an element, or substitute an element for another. After each update, the algorithm must report the length of the longest increasing subsequence of the current sequence.
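
For contrast with the sublinear-time results below, here is a naive baseline in Python that supports the same operations by recomputing the LIS length from scratch with patience sorting after every update, i.e., in O(n log n) time per operation (an illustration of the interface only, not the paper's algorithm).

from bisect import bisect_left

class NaiveDynamicLIS:
    def __init__(self):
        self.seq = []

    def insert(self, pos, value):
        self.seq.insert(pos, value)
        return self._lis_length()

    def delete(self, pos):
        self.seq.pop(pos)
        return self._lis_length()

    def substitute(self, pos, value):
        self.seq[pos] = value
        return self._lis_length()

    def _lis_length(self):
        tails = []  # tails[k] = smallest tail of an increasing subsequence of length k+1
        for x in self.seq:
            i = bisect_left(tails, x)
            if i == len(tails):
                tails.append(x)
            else:
                tails[i] = x
        return len(tails)

d = NaiveDynamicLIS()
for i, v in enumerate([3, 1, 4, 1, 5, 9, 2, 6]):
    d.insert(i, v)
print(d._lis_length())  # 4, e.g. the subsequence 1, 4, 5, 9
print(d.delete(4))      # removing the 5 drops the LIS length to 3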

Our main contribution is the first exact dynamic LIS algorithm with sublinear update time. More precisely, we present a randomized algorithm that performs each operation in time Õ(n^{4/5}) and, after each update, reports the answer to the LIS problem correctly with high probability. We use several novel techniques and observations for this algorithm that may find applications in future work.

In the second part of the paper, we study approximate dynamic LIS algorithms, which are allowed to underestimate the solution size within a bounded multiplicative factor. In this setting, we give a deterministic (1−o(1))-approximation algorithm with update time O(n^{o(1)}). This result improves upon the previous work of Mitzenmacher and Seddighin (STOC’20) that provides an Ω(є^{O(1/є)})-approximation algorithm with update time Õ(n^є) for any є > 0.

Fully dynamic approximation of LIS in polylogarithmic time

We revisit the problem of maintaining the longest increasing subsequence (LIS) of an array under (i) inserting an element, and (ii) deleting an element of the array. In a recent breakthrough, Mitzenmacher and Seddighin [STOC 2020] designed an algorithm that maintains an O((1/є)^{O(1/є)})-approximation of LIS under both operations with worst-case update time Õ(n^є), for any constant є>0 (Õ hides factors polynomial in log n, where n is the length of the input). We exponentially improve on their result by designing an algorithm that maintains a (1+є)-approximation of LIS under both operations with worst-case update time Õ(є^{−5}). Instead of working with the grid packing technique introduced by Mitzenmacher and Seddighin, we take a different approach, building on a new tool that might be of independent interest: LIS sparsification.

A particularly interesting consequence of our result is an improved solution for the so-called Erdős-Szekeres partitioning, in which we seek a partition of a given permutation of {1,2,…,n} into O(√n) monotone subsequences. This problem has been repeatedly stated as one of the natural examples in which we see a large gap between the decision-tree complexity and algorithmic complexity. The result of Mitzenmacher and Seddighin implies an O(n^{1+є}) time solution for this problem, for any є>0. Our algorithm (in fact, its simpler decremental version) further improves this to Õ(n).

A framework for dynamic matching in weighted graphs

We introduce a new framework for computing approximate maximum weight matchings. Our primary focus is on the fully dynamic setting, where there is a large gap between the guarantees of the best known algorithms for computing weighted and unweighted matchings. Indeed, almost all current weighted matching algorithms that reduce to the unweighted problem lose a factor of two in the approximation ratio. In contrast, in other sublinear models such as the distributed and streaming models, recent work has largely closed this weighted/unweighted gap.

For bipartite graphs, we almost completely settle the gap with a general reduction that converts any algorithm for α-approximate unweighted matching into an algorithm for (1−є)α-approximate weighted matching, while only increasing the update time by an O(log n) factor for constant є. We also show that our framework leads to significant improvements for non-bipartite graphs, though not in the form of a universal reduction. In particular, we give two algorithms for weighted non-bipartite matching:

1. A randomized (Las Vegas) fully dynamic algorithm that maintains a (1/2−є)-approximate maximum weight matching in worst-case update time O(polylog n) with high probability against an adaptive adversary. Our bounds are essentially the same as those of the unweighted algorithm of Wajc [STOC 2020].

2. A deterministic fully dynamic algorithm that maintains a (2/3−є)-approximate maximum weight matching in amortized update time O(m^{1/4}). Our bounds are essentially the same as those of the unweighted algorithm of Bernstein and Stein [SODA 2016].

A key feature of our framework is that it uses existing algorithms for unweighted matching as black-boxes. As a result, our framework is simple and versatile. Moreover, our framework easily translates to other models, and we use it to derive new results for the weighted matching problem in streaming and communication complexity models.

Online stochastic matching, poisson arrivals, and the natural linear program

We study the online stochastic matching problem. Consider a bipartite graph with offline vertices on one side, and with i.i.d. online vertices on the other side. The offline vertices and the distribution of online vertices are known to the algorithm beforehand. The realization of the online vertices, however, is revealed one at a time, upon which the algorithm immediately decides how to match it. For maximizing the cardinality of the matching, we give a 0.711-competitive online algorithm, which improves the best previous ratio of 0.706. When the offline vertices are weighted, we introduce a 0.7009-competitive online algorithm for maximizing the total weight of the matched offline vertices, which improves the best previous ratio of 0.662.
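
The arrival model can be simulated directly; the Python sketch below runs a plain greedy policy (match each arriving vertex to an arbitrary free neighbor) on a made-up instance, purely to illustrate the i.i.d. setting described above. It is a baseline only and has nothing to do with the 0.711- and 0.7009-competitive algorithms of the paper.

import random

def greedy_online_matching(offline, type_neighbors, type_probs, T, rng):
    # type_neighbors[t]: offline neighbors of online type t;
    # type_probs[t]: probability that an arriving vertex has type t;
    # T: number of i.i.d. online arrivals.
    free = set(offline)
    matched = 0
    types = list(type_neighbors)
    weights = [type_probs[t] for t in types]
    for _ in range(T):
        t = rng.choices(types, weights=weights)[0]
        available = [u for u in type_neighbors[t] if u in free]
        if available:
            free.remove(rng.choice(available))
            matched += 1
    return matched

rng = random.Random(0)
offline = [0, 1, 2, 3]
type_neighbors = {"a": [0, 1], "b": [1, 2], "c": [2, 3]}
type_probs = {"a": 0.5, "b": 0.25, "c": 0.25}
runs = [greedy_online_matching(offline, type_neighbors, type_probs, T=4, rng=rng)
        for _ in range(10_000)]
print(sum(runs) / len(runs))  # average matching size of the greedy baseline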

Conceptually, we find that the analysis of online algorithms simplifies if the online vertices follow a Poisson process, and establish an approximate equivalence between this Poisson arrival model and online stochastic matching. Technically, we propose a natural linear program for the Poisson arrival model, and demonstrate how to exploit its structure by introducing a converse of Jensen’s inequality. Moreover, we design an algorithmic amortization to replace the analytic one in previous work, and as a result get the first vertex-weighted online stochastic matching algorithm that improves the results in the weaker random arrival model.

SESSION: Session 3C

Continuous LWE

We introduce a continuous analogue of the Learning with Errors (LWE) problem, which we name CLWE. We give a polynomial-time quantum reduction from worst-case lattice problems to CLWE, showing that CLWE enjoys similar hardness guarantees to those of LWE. Alternatively, our result can also be seen as opening new avenues of (quantum) attacks on lattice problems. Our work resolves an open problem regarding the computational complexity of learning mixtures of Gaussians without separability assumptions (Diakonikolas 2016, Moitra 2018). As an additional motivation, (a slight variant of) CLWE was considered in the context of robust machine learning (Diakonikolas et al., FOCS 2017), where hardness in the statistical query (SQ) model was shown; our work addresses the open question regarding its computational hardness (Bubeck et al., ICML 2019).

SNARGs for bounded depth computations and PPAD hardness from sub-exponential LWE

We construct a succinct non-interactive publicly-verifiable delegation scheme for any log-space uniform circuit under the sub-exponential Learning With Errors (LWE) assumption. For a circuit C: {0,1}^N → {0,1} of size S and depth D, the prover runs in time poly(S), the communication complexity is D · polylog(S), and the verifier runs in time (D+N) · polylog(S). To obtain this result, we introduce a new cryptographic primitive: a lossy correlation-intractable hash function family. We use this primitive to soundly instantiate the Fiat-Shamir transform for a large class of interactive proofs, including the interactive sum-check protocol and the GKR protocol, assuming the sub-exponential hardness of LWE.

Additionally, by relying on the result of Choudhuri et al. (STOC 2019), we establish (sub-exponential) average-case hardness of PPAD, assuming the sub-exponential hardness of LWE.

Cryptography from sublinear-time average-case hardness of time-bounded Kolmogorov complexity

Let MK^tP[s] be the set of strings x such that K^t(x) ≤ s(|x|), where K^t(x) denotes the t-bounded Kolmogorov complexity of the truth table described by x. Our main theorem shows that for an appropriate notion of mild average-case hardness, for every ε>0, polynomial t(n) ≥ (1+ε)n, and every “nice” class F of super-polynomial functions, the following are equivalent: (i) the existence of some function T ∈ F such that T-hard one-way functions (OWF) exist (with non-uniform security); (ii) the existence of some function T ∈ F such that MK^tP[T^{−1}] is mildly average-case hard with respect to sublinear-time non-uniform algorithms (with running time n^δ for some 0<δ<1). For instance, the existence of subexponentially-hard (resp. quasi-polynomially-hard) OWFs is equivalent to mild average-case hardness of MK^tP[poly log n] (resp. MK^tP[2^{O(√log n)}]) w.r.t. sublinear-time non-uniform algorithms. We additionally note that if we want to deduce T-hard OWFs where security holds w.r.t. uniform T-time probabilistic attackers (i.e., uniformly-secure OWFs), it suffices to assume sublinear-time hardness of MK^tP w.r.t. uniform probabilistic sublinear-time attackers. We complement this result by proving lower bounds that come surprisingly close to what is required to unconditionally deduce the existence of (uniformly-secure) OWFs: MK^tP[polylog n] is worst-case hard w.r.t. uniform probabilistic sublinear-time algorithms, and MK^tP[n − log n] is mildly average-case hard for all O(t(n)/n^3)-time deterministic algorithms.

Indistinguishability obfuscation from circular security

We show the existence of indistinguishability obfuscators (iO) for general circuits assuming subexponential security of: (a) the Learning with Errors (LWE) assumption (with subexponential modulus-to-noise ratio); (b) a circular security conjecture regarding the Gentry-Sahai-Waters' (GSW) encryption scheme and a Packed version of Regev's encryption scheme. The circular security conjecture states that a notion of leakage-resilient security, that we prove is satisfied by GSW assuming LWE, is retained in the presence of an encrypted key-cycle involving GSW and Packed Regev.

Fiat–Shamir via list-recoverable codes (or: parallel repetition of GMW is not zero-knowledge)

In a seminal work, Goldreich, Micali and Wigderson (CRYPTO ’86) demonstrated the wide applicability of zero-knowledge proofs by constructing such a proof system for the NP-complete problem of graph 3-coloring. A long-standing open question has been whether parallel repetition of their protocol preserves zero knowledge. In this work, we answer this question in the negative, assuming a standard cryptographic assumption (i.e., the hardness of learning with errors (LWE)).

Leveraging a connection observed by Dwork, Naor, Reingold, and Stockmeyer (FOCS ’99), our negative result is obtained by making positive progress on a related fundamental problem in cryptography: securely instantiating the Fiat-Shamir heuristic for eliminating interaction in public-coin interactive protocols. A recent line of work has shown how to instantiate the heuristic securely, albeit only for a limited class of protocols.

Our main result shows how to instantiate Fiat-Shamir for parallel repetitions of much more general interactive proofs. In particular, we construct hash functions that, assuming LWE, securely realize the Fiat-Shamir transform for the following rich classes of protocols:

1) The parallel repetition of any “commit-and-open” protocol (such as the GMW protocol mentioned above), when a specific (natural) commitment scheme is used. Commit-and-open protocols are a ubiquitous paradigm for constructing general purpose public-coin zero knowledge proofs.

2) The parallel repetition of any base protocol that (1) satisfies a stronger notion of soundness called round-by-round soundness, and (2) has an efficient procedure, using a suitable trapdoor, for recognizing “bad verifier randomness” that would allow the prover to cheat.

Our results are obtained by establishing a new connection between the Fiat-Shamir transform and list-recoverable codes. In contrast to the usual focus in coding theory, we focus on a parameter regime in which the input lists are extremely large, but the rate can be small. We give a (probabilistic) construction based on Parvaresh-Vardy codes (FOCS ’05) that suffices for our applications.

SESSION: Session 4C

Inverse-exponential correlation bounds and extremely rigid matrices from a new derandomized XOR lemma

In this work we prove that there is a function f ∈ E^NP such that, for every sufficiently large n and d = √n/log n, f_n (f restricted to n-bit inputs) cannot be (1/2 + 2^{−d})-approximated by F_2-polynomials of degree d. We also observe that a minor improvement (e.g., improving d to n^{1/2+ε} for any ε > 0) over our result would imply that E^NP cannot be computed by depth-3 AC^0-circuits of size 2^{n^{1/2+ε}}, which is a notoriously hard open question in complexity theory.

Using the same proof techniques, we are also able to construct extremely rigid matrices over F_2 in P^NP. More specifically, we show that for every constant ε ∈ (0,1), there is a P^NP algorithm which on input 1^n outputs an n × n F_2-matrix H_n satisfying R_{H_n}(2^{log^{1−ε} n}) ≥ (1/2 − exp(−log^{2/3 · ε} n)) · n^2, for every sufficiently large n. This improves the recent P^NP constructions of rigid matrices in [Alman and Chen, FOCS 2019] and [Bhangale et al., FOCS 2020], which only give Ω(n^2) rigidity.

The key ingredient in the proof of our new results is a new derandomized XOR lemma based on approximate linear sums, which roughly says the following: given an n-input function f which cannot be 0.99-approximated by a certain linear sum of s many functions in F within ℓ_1-distance, one can construct a new function Amp^f with O(n) input bits, which cannot be (1/2 + s^{−Ω(1)})-approximated by F-functions. Taking F to be a function collection containing low-degree F_2-polynomials or low-rank F_2-matrices, our results are then obtained by first using the algorithmic method to construct a function which is weakly hard against linear sums of F in the above sense, and then applying the derandomized XOR lemma to f.

We obtain our new derandomized XOR lemma by giving a generalization of the famous hardcore lemma by Impagliazzo. Our generalization in some sense constructs a non-Boolean hardcore of a weakly hard function f with respect to F-functions, from the weak inapproximability of f by any linear sum of F with bounded ℓ_p-norm. This generalization recovers the original hardcore lemma by considering the ℓ_∞-norm. Surprisingly, when we switch to the ℓ_1-norm, we immediately rediscover Levin’s proof of Yao’s XOR Lemma. That is, these first two proofs of Yao’s XOR Lemma can be unified with our new perspective. For proving the correlation bounds, our new derandomized XOR lemma indeed works with the ℓ_{4/3}-norm.

Kronecker products, low-depth circuits, and matrix rigidity

For a matrix M and a positive integer r, the rank r rigidity of M is the smallest number of entries of M which one must change to make its rank at most r. There are many known applications of rigidity lower bounds to a variety of areas in complexity theory, but fewer known applications of rigidity upper bounds. In this paper, we use rigidity upper bounds to prove new upper bounds in a few different models of computation. Our results include:

- For any d>1, and over any field F, the N × N Walsh-Hadamard transform has a depth-d linear circuit of size O(d · N^{1+0.96/d}). This circumvents a known lower bound of Ω(d · N^{1+1/d}) for circuits with bounded coefficients over ℂ by Pudlák (2000), by using coefficients of magnitude polynomial in N. Our construction also generalizes to linear transformations given by a Kronecker power of any fixed 2 × 2 matrix.

- The N × N Walsh-Hadamard transform has a linear circuit of size ≤ (1.81 + o(1)) N log_2 N, improving on the bound of ≈ 1.88 N log_2 N which one obtains from the standard fast Walsh-Hadamard transform (a textbook implementation of the standard transform is sketched after this list).

- A new rigidity upper bound, showing that the following classes of matrices are not rigid enough to prove circuit lower bounds using Valiant’s approach: (1) for any field F and any function f : {0,1}^n → F, the matrix V_f ∈ F^{2^n × 2^n} given by, for any x,y ∈ {0,1}^n, V_f[x,y] = f(x ∧ y), and (2) for any field F and any fixed-size matrices M_1, …, M_n ∈ F^{q × q}, the Kronecker product M_1 ⊗ M_2 ⊗ ⋯ ⊗ M_n. This generalizes recent results on non-rigidity, using a simpler approach which avoids needing the polynomial method.

- New connections between recursive linear transformations like Fourier and Walsh-Hadamard transforms, and circuits for matrix multiplication.
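
Since the second item above is stated relative to the standard fast Walsh-Hadamard transform, here is a textbook in-place implementation in Python, included only to make the N × N transform concrete; it is the classical algorithm, not the smaller circuits constructed in the paper.

def fwht(a):
    # In-place Walsh-Hadamard transform; the length of a must be a power of two.
    n = len(a)
    assert n > 0 and n & (n - 1) == 0
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y  # one 2x2 Hadamard butterfly
        h *= 2
    return a

print(fwht([1, 0, 1, 0]))  # [2, 2, 0, 0]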

Lower bounds for monotone arithmetic circuits via communication complexity

Valiant (1980) showed that general arithmetic circuits with negation can be exponentially more powerful than monotone ones. We give the first improvement to this classical result: we construct a family of polynomials P_n in n variables, each of whose monomials has a non-negative coefficient, such that P_n can be computed by a polynomial-size depth-three formula but every monotone circuit computing it has size 2^{Ω(n^{1/4}/log(n))}.

The polynomial P_n embeds the SINK ∘ XOR function devised recently by Chattopadhyay, Mande and Sherif (2020) to refute the Log-Approximate-Rank Conjecture in communication complexity. To prove our lower bound for P_n, we develop a general connection between corruption of combinatorial rectangles by any function f ∘ XOR and corruption of product polynomials by a certain polynomial P_f that is an arithmetic embedding of f. This connection should be of independent interest.

Using further ideas from communication complexity, we construct another family of set-multilinear polynomials f_{n,m} such that both F_{n,m} − є·f_{n,m} and F_{n,m} + є·f_{n,m} have monotone circuit complexity 2^{Ω(n/log(n))} if є ≥ 2^{−Ω(m)}, where F_{n,m} = ∏_{i=1}^{n} (x_{i,1} + ⋯ + x_{i,m}) and m = O(n/log n). The polynomials f_{n,m} have 0/1 coefficients and are in VNP. Proving such lower bounds for monotone circuits has been advocated recently by Hrubeš (2020) as a first step towards proving lower bounds against general circuits via his new approach.

Structure vs. randomness for bilinear maps

We prove that the slice rank of a 3-tensor (a combinatorial notion introduced by Tao in the context of the cap-set problem), the analytic rank (a Fourier-theoretic notion introduced by Gowers and Wolf), and the geometric rank (a recently introduced algebro-geometric notion) are all equivalent up to an absolute constant. As a corollary, we obtain strong trade-offs on the arithmetic complexity of a biased bilinear map, and on the separation between computing a bilinear map exactly and on average. Our result settles open questions of Haramaty and Shpilka [STOC 2010], and of Lovett [Discrete Anal., 2019] for 3-tensors.

Reconstruction algorithms for low-rank tensors and depth-3 multilinear circuits

We give new and efficient black-box reconstruction algorithms for some classes of depth-3 arithmetic circuits. As a consequence, we obtain the first efficient algorithm for computing the tensor rank and for finding the optimal tensor decomposition as a sum of rank-one tensors when the input is a constant-rank tensor. More specifically, we provide efficient learning algorithms that run in randomized polynomial time over general fields and in deterministic polynomial time over certain fields, for the following classes:

1) Set-multilinear depth-3 circuits of constant top fan-in k. As a consequence of our algorithm, we obtain the first polynomial-time algorithm for tensor rank computation and optimal tensor decomposition of constant-rank tensors. This result holds for d-dimensional tensors for any d, but is interesting even for d=3.

2) Sums of powers of constantly many linear forms. As a consequence, we obtain the first polynomial-time algorithm for tensor rank computation and optimal tensor decomposition of constant-rank symmetric tensors.

3) Multilinear depth-3 circuits of constant top fan-in k. Our algorithm works over all fields of characteristic 0 or large enough characteristic. Prior to our work the only efficient algorithms known were over polynomially-sized finite fields (see Karnin-Shpilka ’09).

Prior to our work, the only polynomial-time or even subexponential-time algorithms known (deterministic or randomized) for subclasses of such depth-3 circuits that also work over large/infinite fields were for the setting where the top fan-in k is at most 2 (see Sinha ’16 and Sinha ’20).

SESSION: Session 5A

A faster algorithm for solving general LPs

The fastest known LP solver for general (dense) linear programs is due to [Cohen, Lee and Song ’19] and runs in O*(n^ω + n^{2.5−α/2} + n^{2+1/6}) time. A number of follow-up works [Lee, Song and Zhang ’19, Brand ’20, Song and Yu ’20] obtain the same complexity through different techniques, but none of them can go below n^{2+1/6}, even if ω=2. This leaves a polynomial gap between the cost of solving linear systems (n^ω) and the cost of solving linear programs, and as such, improving the n^{2+1/6} term is crucial toward establishing an equivalence between these two fundamental problems. In this paper, we reduce the running time to O*(n^ω + n^{2.5−α/2} + n^{2+1/18}), where ω and α are the fast matrix multiplication exponent and its dual. Hence, under the common belief that ω ≈ 2 and α ≈ 1, our LP solver runs in O*(n^{2.055}) time instead of O*(n^{2.16}).
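
A quick sanity check of the stated exponents in Python (the values of ω and α below are assumptions for illustration, not claims from the paper): the overall exponent is the maximum of the three terms in the bound.

def lp_exponent(omega, alpha, last_term=2 + 1/18):
    # Exponent of O*(n^omega + n^(2.5 - alpha/2) + n^last_term).
    return max(omega, 2.5 - alpha / 2, last_term)

print(round(lp_exponent(2.0, 1.0), 3))           # ~2.056, the n^2.055-ish regime
print(round(lp_exponent(2.0, 1.0, 2 + 1/6), 3))  # ~2.167, the previous n^(2+1/6) barrier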

Combinatorial Bernoulli factories: matchings, flows, and other polytopes

A Bernoulli factory is an algorithmic procedure for exact sampling of certain random variables having only Bernoulli access to their parameters. Bernoulli access to a parameter p ∈ [0,1] means the algorithm does not know p, but has sample access to independent draws of a Bernoulli random variable with mean equal to p. In this paper, we study the problem of Bernoulli factories for polytopes: given Bernoulli access to a vector x ∈ P for a given polytope P ⊂ [0,1]^n, output a randomized vertex such that the expected value of the i-th coordinate is exactly equal to x_i. For example, for the special case of the perfect matching polytope, one is given Bernoulli access to the entries of a doubly stochastic matrix [x_{ij}] and asked to sample a matching such that the probability of each edge (i,j) being present in the matching is exactly equal to x_{ij}.
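
As a self-contained warm-up (a classical example, not the polytope construction of this paper), a Bernoulli factory for p^2 can be obtained by flipping the hidden p-coin twice and taking the AND; the Python sketch below illustrates what "Bernoulli access" means and checks the factory empirically.

import random

def make_hidden_coin(p, rng):
    return lambda: rng.random() < p  # the factory never reads p directly

def factory_p_squared(coin):
    return coin() and coin()  # an exact Bernoulli(p^2) sample

rng = random.Random(1)
coin = make_hidden_coin(0.3, rng)
samples = [factory_p_squared(coin) for _ in range(200_000)]
print(sum(samples) / len(samples))  # empirically close to 0.3^2 = 0.09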

We show that a polytope P admits a Bernoulli factory if and only if P is the intersection of [0,1]^n with an affine subspace. Our construction is based on an algebraic formulation of the problem, involving identifying a family of Bernstein polynomials (one per vertex) that satisfy a certain algebraic identity on P. The main technical tool behind our construction is a connection between these polynomials and the geometry of zonotope tilings.

We apply these results to construct an explicit factory for the perfect matching polytope. The resulting factory is deeply connected to the combinatorial enumeration of arborescences and may be of independent interest. For the k-uniform matroid polytope, we recover a sampling procedure known in statistics as Sampford sampling.

Capacity lower bounds via productization

We give a sharp lower bound on the capacity of a real stable polynomial, depending only on the value of its gradient at x = 1. This result implies a sharp improvement to a similar inequality proved by Linial-Samorodnitsky-Wigderson in 2000, which was crucial to the analysis of their permanent approximation algorithm. Such inequalities have played an important role in the recent work on operator scaling and its generalizations and applications, and in fact we use our bound to construct a new scaling algorithm for real stable polynomials. Our bound is also quite similar to one used very recently by Karlin-Klein-Oveis Gharan to give an improved approximation factor for metric TSP.

The new technique we develop to prove this bound is productization, which says that any real stable polynomial can be approximated at any point in the positive orthant by a product of linear forms. Beyond the results of this paper, our main hope is that this new technique will allow us to avoid “frightening technicalities”, in the words of Laurent and Schrijver, that often accompany combinatorial lower bounds. In particular, we believe that this technique will be useful towards simplifying and improving further the approximation factor given in the fantastic work of Karlin-Klein-Oveis Gharan on metric TSP.

Minimum cost flows, MDPs, and ℓ1-regression in nearly linear time for dense instances

In this paper we provide new randomized algorithms with improved runtimes for solving linear programs with two-sided constraints. In the special case of the minimum cost flow problem on n-vertex m-edge graphs with integer polynomially-bounded costs and capacities, we obtain a randomized method which solves the problem in Õ(m + n^{1.5}) time. This improves upon the previous best runtime of Õ(m√n) [Lee-Sidford ’14] and, in the special case of unit-capacity maximum flow, improves upon the previous best runtimes of m^{4/3+o(1)} [Liu-Sidford ’20, Kathuria ’20] and Õ(m√n) [Lee-Sidford ’14] for sufficiently dense graphs.

In the case of ℓ_1-regression in a matrix with n columns and m rows, we obtain a randomized method which computes an є-approximate solution in Õ(mn + n^{2.5}) time. This yields a randomized method which computes an є-optimal policy of a discounted Markov Decision Process with S states and A actions per state in time Õ(S^2 A + S^{2.5}). These methods improve upon the previous best runtimes of methods which depend polylogarithmically on problem parameters, which were Õ(mn^{1.5}) [Lee-Sidford ’15] and Õ(S^{2.5} A) [Lee-Sidford ’14, Sidford-Wang-Wu-Ye ’18] respectively.

To obtain this result we introduce two new algorithmic tools of possible independent interest. First, we design a new general interior point method for solving linear programs with two-sided constraints which combines techniques from [Lee-Song-Zhang ’19, Brand et al. ’20] to obtain a robust stochastic method with iteration count nearly the square root of the smaller dimension. Second, to implement this method we provide dynamic data structures for efficiently maintaining approximations to variants of Lewis weights, a fundamental importance measure for matrices which generalizes leverage scores and effective resistances.

A framework for quadratic form maximization over convex sets through nonconvex relaxations

We investigate the approximability of the following optimization problem. The input is an n × n matrix A = (A_{ij}) with real entries and an origin-symmetric convex body K ⊂ ℝ^n that is given by a membership oracle. The task is to compute (or approximate) the maximum of the quadratic form ∑_{i=1}^{n} ∑_{j=1}^{n} A_{ij} x_i x_j = ⟨x, Ax⟩ as x ranges over K. This is a rich and expressive family of optimization problems; for different choices of matrices A and convex bodies K it includes a diverse range of optimization problems like max-cut, Grothendieck/non-commutative Grothendieck inequalities, small set expansion and more. While the literature studied these special cases using case-specific reasoning, here we develop a general methodology for treatment of the approximability and inapproximability aspects of these questions.
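
To fix notation, the following brute-force Python sketch evaluates the maximum of ⟨x, Ax⟩ over the vertices {−1,1}^n of the hypercube, the discrete special case behind max-cut-type instances mentioned above; it is exponential-time and unrelated to the oracle-based algorithms developed in the paper.

from itertools import product

def max_quadratic_form_on_hypercube_vertices(A):
    n = len(A)
    best = float("-inf")
    for x in product((-1, 1), repeat=n):
        val = sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))
        best = max(best, val)
    return best

A = [[0, 1, -1],
     [1, 0, 1],
     [-1, 1, 0]]
print(max_quadratic_form_on_hypercube_vertices(A))  # 2 for this A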

The underlying geometry of K plays a critical role; we show under commonly used complexity assumptions that polytime constant-approximability necessitates that K has type-2 constant that grows slowly with n. However, we show that even when the type-2 constant is bounded, this problem sometimes exhibits strong hardness of approximation. Thus, even within the realm of type-2 bodies, the approximability landscape is nuanced and subtle.

However, the link that we establish between optimization and geometry of Banach spaces allows us to devise a generic algorithmic approach to the above problem. We associate to each convex body a new (higher dimensional) auxiliary set that is not convex, but is approximately convex when K has a bounded type-2 constant. If our auxiliary set has an approximate separation oracle, then we design an approximation algorithm for the original quadratic optimization problem, using an approximate version of the ellipsoid method. Even though our hardness result implies that such an oracle does not exist in general, this new question can be solved in specific cases of interest by implementing a range of classical tools from functional analysis, most notably the deep factorization theory of linear operators.

Beyond encompassing the scenarios in the literature for which constant-factor approximation algorithms were found, our generic framework implies that for convex sets with bounded type-2 constant, constant-factor approximability is preserved under the following basic operations: (a) Subspaces, (b) Quotients, (c) Minkowski Sums, (d) Complex Interpolation. This yields a rich family of new examples where constant-factor approximations are possible, which were beyond the reach of previous methods. We also show (under commonly used complexity assumptions) that for symmetric norms and unitarily invariant matrix norms the type-2 constant nearly characterizes the approximability of quadratic maximization.

SESSION: Session 5B

The randomized communication complexity of randomized auctions

We study the communication complexity of incentive compatible auction-protocols between a monopolist seller and a single buyer with a combinatorial valuation function over n items. Motivated by the fact that revenue-optimal auctions are randomized (as well as by an open problem of Babaioff, Gonczarowski, and Nisan), we focus on the randomized communication complexity of this problem (in contrast to most prior work on deterministic communication). We design simple, incentive compatible, and revenue-optimal auction-protocols whose expected communication complexity is much (in fact infinitely) more efficient than their deterministic counterparts. We also give nearly matching lower bounds on the expected communication complexity of approximately-revenue-optimal auctions. These results follow from a simple characterization of incentive compatible auction-protocols that allows us to prove lower bounds against randomized auction-protocols. In particular, our lower bounds give the first approximation-resistant, exponential separation between communication complexity of incentivizing vs implementing a Bayesian incentive compatible social choice rule, settling an open question of Fadel and Segal.

Greedy adversarial equilibrium: an efficient alternative to nonconvex-nonconcave min-max optimization

Min-max optimization of an objective function f: ℝ^d × ℝ^d → ℝ is an important model for robustness in an adversarial setting, with applications to many areas including optimization, economics, and deep learning. In many applications f may be nonconvex-nonconcave, and finding a global min-max point may be computationally intractable. There is a long line of work that seeks computationally tractable algorithms for alternatives to the min-max optimization model. However, many of the alternative models have solution points which are only guaranteed to exist under strong assumptions on f, such as convexity, monotonicity, or special properties of the starting point. We propose an optimization model, the ε-greedy adversarial equilibrium, and show that it can serve as a computationally tractable alternative to the min-max optimization model. Roughly, we say that a point (x, y) is an ε-greedy adversarial equilibrium if y is an ε-approximate local maximum for f(x,·), and x is an ε-approximate local minimum for a “greedy approximation” to the function max_z f(x, z) which can be efficiently estimated using second-order optimization algorithms. We prove the existence of such a point for any smooth function which is bounded and has Lipschitz Hessian. To prove existence, we introduce an algorithm that converges from any starting point to an ε-greedy adversarial equilibrium in a number of evaluations of the function f, the max-player’s gradient ∇_y f(x,y), and its Hessian ∇_y^2 f(x,y), that is polynomial in the dimension d, 1/ε, and the bounds on f and its Lipschitz constant.

Contextual search in the presence of irrational agents

We study contextual search, a generalization of binary search in higher dimensions, which captures settings such as feature-based dynamic pricing. Standard game-theoretic formulations of this problem assume that agents act in accordance with a specific behavioral model. In practice, some agents may not subscribe to the dominant behavioral model or may act in ways that are seemingly arbitrarily irrational. Existing algorithms heavily depend on the behavioral model being (approximately) accurate for all agents and have poor performance even with a few arbitrarily irrational agents.

We initiate the study of contextual search when some of the agents can behave in ways inconsistent with the underlying behavioral model. In particular, we provide two algorithms, one based on multidimensional binary search methods and one based on gradient descent. Our techniques draw inspiration from learning theory, game theory, high-dimensional geometry, and convex analysis.

How much data is sufficient to learn high-performing algorithms? generalization guarantees for data-driven algorithm design

Algorithms often have tunable parameters that impact performance metrics such as runtime and solution quality. For many algorithms used in practice, no parameter settings admit meaningful worst-case bounds, so the parameters are made available for the user to tune. Alternatively, parameters may be tuned implicitly within the proof of a worst-case guarantee. Worst-case instances, however, may be rare or nonexistent in practice. A growing body of research has demonstrated that data-driven algorithm design can lead to significant improvements in performance. This approach uses a training set of problem instances sampled from an unknown, application-specific distribution and returns a parameter setting with strong average performance on the training set.

We provide a broadly applicable theory for deriving generalization guarantees that bound the difference between the algorithm’s average performance over the training set and its expected performance on the unknown distribution. Our results apply no matter how the parameters are tuned, be it via an automated or manual approach. The challenge is that for many types of algorithms, performance is a volatile function of the parameters: slightly perturbing the parameters can cause a large change in behavior. Prior research (e.g., Gupta and Roughgarden, SICOMP’17; Balcan et al., COLT’17, ICML’18, EC’18) has proved generalization bounds by employing case-by-case analyses of greedy algorithms, clustering algorithms, integer programming algorithms, and selling mechanisms. We uncover a unifying structure which we use to prove extremely general guarantees, yet we recover the bounds from prior research. Our guarantees, which are tight up to logarithmic factors in the worst case, apply whenever an algorithm’s performance is a piecewise-constant, -linear, or—more generally—piecewise-structured function of its parameters. Our theory also implies novel bounds for voting mechanisms and dynamic programming algorithms from computational biology.
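As a minimal sketch of this pipeline (under illustrative assumptions about the instance distribution and the parameter grid), the snippet below tunes the classic rho-greedy knapsack family, in which items are ranked by value / weight^rho, by picking the parameter with the best average performance on a training set of sampled instances.

import random

def rho_greedy_value(items, capacity, rho):
    """Greedily add items in decreasing value / weight**rho order; return total value packed."""
    total, used = 0.0, 0.0
    for value, weight in sorted(items, key=lambda it: -it[0] / it[1] ** rho):
        if used + weight <= capacity:
            total += value
            used += weight
    return total

def random_instance(n=50):
    return [(random.random(), random.uniform(0.1, 1.0)) for _ in range(n)], 5.0

training = [random_instance() for _ in range(200)]
grid = [i / 10 for i in range(31)]                 # candidate parameter values 0.0 .. 3.0
avg = {rho: sum(rho_greedy_value(items, cap, rho) for items, cap in training) / len(training)
       for rho in grid}
print("parameter with best average training performance:", max(avg, key=avg.get))

For any fixed instance, the packed value is a piecewise-constant function of rho, which is exactly the kind of piecewise structure the guarantees above exploit.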

The communication complexity of payment computation

Let (f,P) be an incentive compatible mechanism where f is the social choice function and P is the payment function. In many important settings, f uniquely determines P (up to a constant) and therefore a common approach is to focus on the design of f and neglect the role of the payment function. Fadel and Segal [JET, 2009] question this approach through the lens of communication complexity: can it be that the communication complexity of an incentive compatible mechanism that implements f (that is, computes both the output and the payments) is much larger than the communication complexity of computing the output? I.e., can it be that ccIC(f)>>cc(f)? Fadel and Segal show that for every f, ccIC(f)≤ exp(cc(f)). They also show that fully computing the incentive compatible mechanism is strictly harder than computing only the output: there exists a social choice function f such that ccIC(f)=cc(f)+1. In a follow-up work, Babaioff, Blumrosen, Naor, and Schapira [EC’08] provide a social choice function f such that ccIC(f)=Θ(n· cc(f)), where n is the number of players. The question of whether the exponential upper bound of Fadel and Segal is tight remained wide open. In this paper we solve this question by explicitly providing a function f such that ccIC(f)= exp(cc(f)). In fact, we establish this via two very different proofs. In contrast, we show that if the players are risk-neutral, then compromising on a randomized truthful-in-expectation implementation (rather than a deterministic ex-post implementation) gives ccTIE(f)=poly(n, cc(f)) for every function f, as long as the domain of f is single parameter or a convex multi-parameter domain. We also provide efficient algorithms for deterministic computation of payments in several important domains.

Exponential communication separations between notions of selfishness

We consider the problem of implementing a fixed social choice function between multiple players (which takes as input a type t_i from each player i and outputs an outcome f(t_1, …, t_n)), in which each player must be incentivized to follow the protocol. In particular, we study the communication requirements of a protocol which: (a) implements f, (b) implements f and computes payments that make it ex-post incentive compatible (EPIC) to follow the protocol, and (c) implements f and computes payments in a way that makes it dominant-strategy incentive compatible (DSIC) to follow the protocol.

We show exponential separations between all three of these quantities, already for just two players. That is, we first construct an f such that f can be implemented in communication c, but any EPIC implementation of f (with any choice of payments) requires communication exp(c). This answers an open question of [Fadel and Segal, 2009; Babaioff et. al., 2013]. Second, we construct an f such that an EPIC protocol implements f with communication C, but all DSIC implementations of f require communication exp(C).

SESSION: Session 5C

Reducing isotropy and volume to KLS: an O*(n^3 ψ^2) volume algorithm

We show that the volume of a convex body in R^n in the general membership oracle model can be computed to within relative error ε using O(n^3 ψ^2/ε^2) oracle queries, where ψ is the KLS constant. With the current bound of ψ = O(n^o(1)), this gives an O(n^(3+o(1))/ε^2) algorithm, the first improvement on the Lovász-Vempala O(n^4/ε^2) algorithm from 2003. The main new ingredient is an O(n^3 ψ^2) algorithm for isotropic transformation, following which we can apply the O(n^3/ε^2) volume algorithm of Cousins and Vempala for well-rounded convex bodies. A positive resolution of the KLS conjecture would imply an O(n^3/ε^2) volume algorithm. We also give an efficient implementation of the new algorithm for convex polytopes defined by m inequalities in R^n: polytope volume can be estimated in time O(mn^c/ε^2) where c<3.2 depends on the current matrix multiplication exponent and improves on the previous best bound.
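As a toy illustration of the rounding step only (not the paper's isotropic transformation procedure), the sketch below estimates the mean and covariance of a body from samples and applies the whitening map, after which the body is nearly isotropic; the rejection sampler and the two-dimensional ellipse are stand-ins for illustration.

import numpy as np

def sample_body(membership_oracle, box_radius, n, num_samples, rng):
    """Rejection-sample points of the body from an enclosing box (illustrative stand-in)."""
    pts = []
    while len(pts) < num_samples:
        x = rng.uniform(-box_radius, box_radius, size=n)
        if membership_oracle(x):
            pts.append(x)
    return np.array(pts)

def isotropic_transform(points):
    """Return the affine map T(x) = A(x - mu) that whitens the empirical covariance."""
    mu = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    A = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return lambda x: A @ (x - mu)

rng = np.random.default_rng(0)
ellipse = lambda x: x @ np.diag([1.0, 25.0]) @ x <= 1.0      # a stretched ellipse in R^2
pts = sample_body(ellipse, box_radius=1.0, n=2, num_samples=2000, rng=rng)
T = isotropic_transform(pts)
print(np.cov(np.array([T(p) for p in pts]), rowvar=False))   # approximately the identity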

A new algorithm for Euclidean shortest paths in the plane

Given a set of pairwise disjoint polygonal obstacles in the plane, finding an obstacle-avoiding Euclidean shortest path between two points is a classical problem in computational geometry and has been studied extensively. Previously, Hershberger and Suri [SIAM J. Comput. 1999] gave an algorithm of O(nlogn) time and O(nlogn) space, where n is the total number of vertices of all obstacles. Recently, by modifying Hershberger and Suri’s algorithm, Wang [SODA 2021] reduced the space to O(n) while the runtime of the algorithm is still O(nlogn). In this paper, we present a new algorithm of O(n+hlogh) time and O(n) space, provided that a triangulation of the free space is given, where h is the number of obstacles. Our algorithm builds a shortest path map for a source point s, so that given any query point t, the shortest path length from s to t can be computed in O(logn) time and a shortest s-t path can be produced in additional time linear in the number of edges of the path.

Stronger bounds for weak epsilon-nets in higher dimensions

Given a finite point set P in ℝ^d, and ε>0, we say that N ⊆ ℝ^d is a weak ε-net if it pierces every convex set K with |K ∩ P| ≥ ε |P|.

Let d ≥ 3. We show that for any finite point set in ℝ^d, and any ε>0, there exists a weak ε-net of cardinality O(1/ε^(d−1/2+γ)), where γ>0 is an arbitrarily small constant.

This is the first improvement of the bound of O*(1/ε^d) that was obtained in 1993 by Chazelle, Edelsbrunner, Grigni, Guibas, Sharir, and Welzl for general point sets in dimension d ≥ 3.

Dynamic planar point location in optimal time

In this paper we describe a fully-dynamic data structure that supports point location queries in a connected planar subdivision with n edges. Our data structure uses O(n) space, answers queries in O(logn) time, and supports updates in O(logn) time. Our solution is based on a data structure for vertical ray shooting queries that supports queries and updates in O(logn) time.

SESSION: Session 6A

When is approximate counting for conjunctive queries tractable?

Conjunctive queries are one of the most common classes of queries used in database systems, and among the best studied in the literature. A seminal result of Grohe, Schwentick, and Segoufin (STOC 2001) demonstrates that for every class G of graphs, the evaluation of all conjunctive queries whose underlying graph is in G is tractable if, and only if, G has bounded treewidth. In this work, we extend this characterization to the counting problem for conjunctive queries. Specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first fully polynomial-time randomized approximation scheme (FPRAS) for counting answers to a query in C, and the first polynomial-time algorithm for sampling answers uniformly from a query in C. As a corollary, it follows that for every class G of graphs, the counting problem for conjunctive queries whose underlying graph is in G admits an FPRAS if, and only if, G has bounded treewidth (unless BPP is different from P). In fact, our FPRAS is more general, and also applies to conjunctive queries with bounded hypertree width, as well as unions of such queries.

The key ingredient in our proof is the resolution of a fundamental counting problem from automata theory. Specifically, we demonstrate the first FPRAS and polynomial time sampler for the set of trees of size n accepted by a tree automaton, which improves the prior quasi-polynomial time randomized approximation scheme (QPRAS) and sampling algorithm of Gore, Jerrum, Kannan, Sweedyk, and Mahaney ’97. We demonstrate how this algorithm can be used to obtain an FPRAS for many open problems, such as counting solutions to constraint satisfaction problems (CSP) with bounded hypertree width, counting the number of error threads in programs with nested call subroutines, and counting valid assignments to structured DNNF circuits.

Near-linear time approximation schemes for Steiner tree and forest in low-dimensional spaces

We give an algorithm that computes a (1+ε)-approximate Steiner forest in near-linear time n · 2^((1/ε)^O(ddim^2)) · (log log n)^2, where ddim is the doubling dimension of the metric space. This improves upon the best previous result due to Chan et al. (SIAM J. Comput. 4 (2018)), who gave a runtime of about n^(2^O(ddim)) · 2^((ddim/ε)^O(ddim)) · √(log n). For Steiner tree our methods achieve an even better runtime n (log n)^((1/ε)^O(ddim^2)).

A (2 + ε)-approximation algorithm for preemptive weighted flow time on a single machine

Weighted flow time is a fundamental and very well-studied objective function in scheduling. In this paper, we study the setting of a single machine with preemptions.

The input consists of a set of jobs, characterized by their processing times, release times, and weights and we want to compute a (possibly preemptive) schedule for them. The objective is to minimize the sum of the weighted flow times of the jobs, where the flow time of a job is the time between its release date and its completion time.

It had been a long-standing open problem to find a polynomial time O(1)-approximation algorithm for this setting. In a recent breakthrough result, Batra, Garg, and Kumar (FOCS 2018) found such an algorithm if the input data are polynomially bounded integers, and Feige, Kulkarni, and Li (SODA 2019) presented a black-box reduction to this setting. The resulting approximation ratio is a (not explicitly stated) constant which is at least 10000. In this paper we improve this ratio to 2+ε. The algorithm by Batra, Garg, and Kumar (FOCS 2018) reduces the problem to Demand MultiCut on trees and solves the resulting instances via LP-rounding and a dynamic program. Instead, we first reduce the problem to a (different) geometric problem while losing only a factor 1+ε, and then solve its resulting instances up to a factor of 2+ε by a dynamic program. In particular, our reduction ensures certain structural properties, thanks to which we do not need LP-rounding methods.

We believe that our result makes substantial progress towards finding a PTAS for weighted flow time on a single machine.
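To make the objective concrete, the sketch below computes the total weighted flow time of the preemptive schedule produced by the classical highest-density-first rule, assuming integer release and processing times so that unit time steps suffice; this is only an illustration of the objective, not the (2+ε)-approximation algorithm above.

def weighted_flow_time(jobs):
    """jobs: list of (release, processing, weight) with integer times; simulate unit steps."""
    remaining = {j: p for j, (r, p, w) in enumerate(jobs)}
    completion, t = {}, 0
    while remaining:
        available = [j for j in remaining if jobs[j][0] <= t]
        if not available:
            t = min(jobs[j][0] for j in remaining)          # idle until the next release
            continue
        j = max(available, key=lambda j: jobs[j][2] / jobs[j][1])   # highest density first
        remaining[j] -= 1
        t += 1
        if remaining[j] == 0:
            completion[j] = t
            del remaining[j]
    return sum(w * (completion[j] - r) for j, (r, p, w) in enumerate(jobs))

print(weighted_flow_time([(0, 3, 1.0), (1, 1, 5.0), (2, 4, 2.0)]))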

A quasipolynomial (2 + ε)-approximation for planar sparsest cut

The (non-uniform) sparsest cut problem is the following graph-partitioning problem: given a “supply” graph, and demands on pairs of vertices, delete some subset of supply edges to minimize the ratio of the supply edges cut to the total demand of the pairs separated by this deletion. Despite much effort, there are only a handful of nontrivial classes of supply graphs for which constant-factor approximations are known.

We consider the problem for planar graphs, and give a (2+ε)-approximation algorithm that runs in quasipolynomial time. Our approach defines a new structural decomposition of an optimal solution using a “patching” primitive. We combine this decomposition with a Sherali-Adams-style linear programming relaxation of the problem, which we then round. This should be compared with the polynomial-time approximation algorithm of Rao (1999), which uses the metric linear programming relaxation and ℓ1-embeddings, and achieves an O(√(log n))-approximation in polynomial time.
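For intuition about the objective (and nothing more), the ratio can be evaluated by brute force over all vertex subsets on a toy instance; this is exponential time and unrelated to the algorithm above.

from itertools import combinations

def sparsest_cut(n, supply_edges, demands):
    """supply_edges, demands: dicts {(u, v): weight}; minimize cut supply / separated demand."""
    best = float("inf")
    for k in range(1, n):
        for side in combinations(range(n), k):
            S = set(side)
            cut = sum(w for (u, v), w in supply_edges.items() if (u in S) != (v in S))
            sep = sum(d for (u, v), d in demands.items() if (u in S) != (v in S))
            if sep > 0:
                best = min(best, cut / sep)
    return best

supply = {(0, 1): 1, (1, 2): 1, (2, 3): 1, (3, 0): 1}    # a 4-cycle of supply edges
demand = {(0, 2): 1, (1, 3): 1}                          # demands across the diagonals
print(sparsest_cut(4, supply, demand))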

Flow time scheduling with uncertain processing time

We consider the problem of online scheduling on a single machine in order to minimize weighted flow time. The existing algorithms for this problem (STOC ’01, SODA ’03, FOCS ’18) all require exact knowledge of the processing time of each job. This assumption is crucial, as even a slight perturbation of the processing time would lead to a polynomial competitive ratio. However, this assumption very rarely holds in real-life scenarios.

In this paper, we present the first algorithm for weighted flow time which does not require exact knowledge of the processing times of jobs. Specifically, we introduce the Scheduling with Predicted Processing Time (SPPT) problem, where the algorithm is given a prediction for the processing time of each job, instead of its real processing time. For the case of a constant factor distortion between the predictions and the real processing times, our algorithms match all the best known competitiveness bounds for weighted flow time – namely O(log P), O(log D) and O(log W), where P, D, W are the maximum ratios of processing times, densities, and weights, respectively. For larger errors, the competitiveness of our algorithms degrades gracefully.

SESSION: Session 7A

The limits of pan privacy and shuffle privacy for learning and estimation

There has been a recent wave of interest in intermediate trust models for differential privacy that eliminate the need for a fully trusted central data collector, but overcome the limitations of local differential privacy. This interest has led to the introduction of the shuffle model (Cheu et al., EUROCRYPT 2019; Erlingsson et al., SODA 2019) and revisiting the pan-private model (Dwork et al., ITCS 2010). The message of this line of work is that, for a variety of low-dimensional problems—such as counts, means, and histograms—these intermediate models offer nearly as much power as central differential privacy. However, there has been considerably less success using these models for high-dimensional learning and estimation problems.

In this work we prove the first non-trivial lower bounds for high-dimensional learning and estimation in both the pan-private model and the general multi-message shuffle model. Our lower bounds apply to a variety of problems—for example, we show that private agnostic learning of parity functions over d bits requires Ω(2^(d/2)) samples in these models, and privately selecting the most common attribute from a set of d choices requires Ω(d^(1/2)) samples, both of which are exponential separations from the central model.

Outcome indistinguishability

Prediction algorithms assign numbers to individuals that are popularly understood as individual “probabilities”—what is the probability of 5-year survival after cancer diagnosis?—and which increasingly form the basis for life-altering decisions. Drawing on an understanding of computational indistinguishability developed in complexity theory and cryptography, we introduce Outcome Indistinguishability. Predictors that are Outcome Indistinguishable (OI) yield a generative model for outcomes that cannot be efficiently refuted on the basis of real-life observations.

We investigate a hierarchy of OI definitions, whose stringency increases with the degree to which distinguishers may access the predictor in question. Our findings reveal that OI behaves qualitatively differently than previously studied notions of indistinguishability. First, we provide constructions at all levels of the hierarchy. Then, leveraging recently-developed machinery for proving average-case fine-grained hardness, we obtain lower bounds on the complexity of the more stringent forms of OI. This hardness result provides the first scientific grounds for the political argument that, when inspecting algorithmic risk prediction instruments, auditors should be granted oracle access to the algorithm, not simply historical predictions.

Optimal labelling schemes for adjacency, comparability, and reachability

We construct asymptotically optimal adjacency labelling schemes for every hereditary class containing 2^Ω(n^2) n-vertex graphs as n → ∞. This regime contains many classes of interest, for instance perfect graphs or comparability graphs, for which we obtain an adjacency labelling scheme with labels of n/4+o(n) bits per vertex. This implies the existence of a reachability labelling scheme for digraphs with labels of n/4+o(n) bits per vertex and a comparability labelling scheme for posets with labels of n/4+o(n) bits per element. All these results are best possible, up to the lower order term.

Bipartite perfect matching as a real polynomial

We obtain a description of the Bipartite Perfect Matching decision problem as a multilinear polynomial over the reals. We show that it has full degree and (1 − o_n(1)) · 2^(n^2) monomials with non-zero coefficients. In contrast, we show that in the dual representation (switching the roles of 0 and 1) the number of monomials is only exponential in Θ(n log n). Our proof relies heavily on the fact that the lattice of graphs which are “matching-covered” is Eulerian.
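The multilinear representation in question can be computed explicitly for very small n by Möbius inversion over edge subsets, as sketched below; note that the (1 − o_n(1)) · 2^(n^2) monomial count is asymptotic, so tiny n still produce sparse polynomials.

from itertools import combinations, permutations

n = 3
edges = [(i, j) for i in range(n) for j in range(n)]      # edge slots of the n x n bipartite graph

def has_pm(edge_set):
    """Does the bipartite graph with this edge set contain a perfect matching?"""
    return any(all((i, p[i]) in edge_set for i in range(n)) for p in permutations(range(n)))

coeff = {}
for k in range(len(edges) + 1):
    for S in combinations(edges, k):
        # Mobius inversion: coeff(S) = sum over T subseteq S of (-1)^(|S|-|T|) * has_pm(T)
        c = sum((-1) ** (k - t) * sum(1 for T in combinations(S, t) if has_pm(set(T)))
                for t in range(k + 1))
        if c != 0:
            coeff[frozenset(S)] = c

print(len(coeff), "nonzero monomials out of", 2 ** len(edges))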

Efficient and near-optimal algorithms for sampling connected subgraphs

We study the graphlet sampling problem: given an integer k ≥ 3 and a graph G=(V,E), sample a connected induced k-node subgraph of G (also called k-graphlet) uniformly at random. This is a fundamental graph mining primitive, with applications in social network analysis and bioinformatics. The two state-of-the-art techniques are random walks and color coding. The random walk is elegant, but the current upper bounds and lower bounds on its mixing time suffer a gap of Δ^(k−1), where Δ is the maximum degree of G. Color coding is better understood, but requires a 2^O(k) · m time preprocessing over the entire graph. Moreover, no efficient algorithm is known for sampling graphlets uniformly: random walks and color coding yield only ε-uniform samples. In this work, we provide the following results:

(i) A near-optimal mixing time bound for the classic k-graphlet random walk, as a function of the mixing time of G (a minimal sketch of this walk appears after the list). In particular, ignoring k^O(k) factors, we show that the k-graphlet random walk mixes in Θ(t(G) · ρ(G)^(k−1)) steps, where t(G) is the mixing time of G and ρ(G) is the ratio between its maximum and minimum degree, and on some graphs this is tight up to lg n factors.

(ii) The first efficient algorithm for uniform graphlet sampling. The algorithm has a preprocessing phase that uses time O(n k^2 lg k + m) and space O(n), and a sampling phase that takes k^O(k) lg Δ time per sample. It is based on ordering G in a simple way, so as to virtually partition the graphlets into buckets, and then sampling from those buckets using rejection sampling. The algorithm can also be used for counting, with additive guarantees.

(iii) A near-optimal algorithm for ε-uniform graphlet sampling, with a preprocessing phase that runs in time O(k^6 ε^(−1) n lg n) and space O(n), and a sampling phase that takes k^O(k) (1/ε)^10 lg(1/ε) expected time per sample. The algorithm is based on a nontrivial sketching of the ordering of G, followed by emulating uniform sampling through coupling arguments.
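As a minimal sketch of the walk analyzed in item (i) (for intuition only): states are connected induced k-vertex sets, and one step moves to a uniformly random neighboring graphlet, i.e. a connected k-set sharing k−1 vertices with the current one. The plain walk is biased toward high-degree graphlets, which is exactly why items (ii) and (iii) need extra machinery to obtain uniform or ε-uniform samples.

import random

def is_connected(G, S):
    S = set(S)
    stack, seen = [next(iter(S))], set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(u for u in G[v] if u in S and u not in seen)
    return seen == S

def graphlet_neighbors(G, S):
    """All connected k-sets obtained by swapping one vertex of S for an outside neighbor."""
    nbrs = []
    for v in S:
        rest = S - {v}
        boundary = {u for w in rest for u in G[w]} - S
        for u in boundary:
            T = frozenset(rest | {u})
            if is_connected(G, T):
                nbrs.append(T)
    return nbrs

def graphlet_walk(G, k, steps, rng=random):
    S = {next(iter(G))}
    while len(S) < k:                               # grow an arbitrary connected starting k-set
        S.add(rng.choice([u for w in S for u in G[w] if u not in S]))
    S = frozenset(S)
    for _ in range(steps):
        nbrs = graphlet_neighbors(G, S)
        if nbrs:
            S = rng.choice(nbrs)
    return S

G = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1], 4: [2]}
print(graphlet_walk(G, k=3, steps=100))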

SESSION: Session 6B

Distributed weighted min-cut in nearly-optimal time

Minimum-weight cut (min-cut) is a basic measure of a network’s connectivity strength. While the min-cut can be computed efficiently in the sequential setting [Karger STOC’96], there was no efficient way for a distributed network to compute its own min-cut without limiting the input structure or dropping the output quality: In the standard CONGEST model, existing algorithms with nearly-optimal time (e.g. [Ghaffari, Kuhn, DISC’13; Nanongkai, Su, DISC’14]) can guarantee a solution that is (1+ε)-approximate at best, while the exact Õ(n^0.8 D^0.2 + n^0.9)-time algorithm [Ghaffari, Nowicki, Thorup, SODA’20] works only on simple networks (no weights and no parallel edges). Throughout, n and D denote the network’s number of vertices and hop-diameter, respectively. For the weighted case, the best bound was Õ(n) [Daga, Henzinger, Nanongkai, Saranurak, STOC’19]. In this paper, we provide an exact Õ(√n + D)-time algorithm for computing min-cut on weighted networks. Our result improves even the previous algorithm that works only on simple networks. Its time complexity matches the known lower bound up to polylogarithmic factors. At the heart of our algorithm are a routing trick and two structural lemmas regarding the structure of a minimum cut of a graph. These two structural lemmas considerably strengthen and generalize the framework of Mukhopadhyay-Nanongkai [STOC’20] and can be of independent interest.

A deterministic algorithm for the MST problem in constant rounds of congested clique

In this paper we show that the Minimum Spanning Tree problem (MST) can be solved deterministically in O(1) rounds of the Congested Clique model.

In the Congested Clique model there are n players that perform computation in synchronous rounds. Each round consists of a phase of local computation and a phase of communication, in which each pair of players is allowed to exchange O(log n)-bit messages. The study of this model began with the MST problem: in the paper by Lotker, Pavlov, Patt-Shamir, and Peleg [SPAA’03, SICOMP’05] that defines the Congested Clique model, the authors give a deterministic O(log log n) round algorithm that improved over a trivial O(log n) round adaptation of Borůvka’s algorithm.

There was a sequence of gradual improvements to this result: an O(log log log n) round algorithm by Hegeman, Pandurangan, Pemmaraju, Sardeshmukh, and Scquizzato [PODC’15], an O(log* n) round algorithm by Ghaffari and Parter [PODC’16], and an O(1) round algorithm by Jurdziński and Nowicki [SODA’18], but all those algorithms were randomized. Therefore, the question about the existence of any deterministic o(log log n) round algorithm for the Minimum Spanning Tree problem remained open since the seminal paper by Lotker, Pavlov, Patt-Shamir, and Peleg [SPAA’03, SICOMP’05].

Our result resolves this question and establishes that O(1) rounds is enough to solve the MST problem in the Congested Clique model, even if we are not allowed to use any randomness. Furthermore, the amount of communication needed by the algorithm makes it applicable to a variant of the MPC model using machines with local memory of size O(n).
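For reference, the sequential version of the Borůvka baseline mentioned above is easy to state: every component picks its lightest outgoing edge and the chosen edges are contracted, so the number of components at least halves per phase (hence the trivial O(log n)-round adaptation). The sketch below assumes a connected graph with distinct edge weights and is, of course, not the O(1)-round Congested Clique algorithm.

def boruvka_mst(n, edges):
    """edges: list of (weight, u, v) with distinct weights; returns the total MST weight."""
    parent = list(range(n))

    def find(x):                         # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, components = 0, n
    while components > 1:
        cheapest = {}                    # lightest outgoing edge of each component
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                for r in (ru, rv):
                    if r not in cheapest or w < cheapest[r][0]:
                        cheapest[r] = (w, ru, rv)
        for w, ru, rv in cheapest.values():
            if find(ru) != find(rv):     # merge the two components along this edge
                parent[find(ru)] = find(rv)
                total += w
                components -= 1
    return total

print(boruvka_mst(4, [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 3, 0), (5, 0, 2)]))   # prints 6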

Universally-optimal distributed algorithms for known topologies

Many distributed optimization algorithms achieve existentially-optimal running times, meaning that there exists some pathological worst-case topology on which no algorithm can do better. Still, most networks of interest allow for exponentially faster algorithms. This motivates two questions:

(i) What network topology parameters determine the complexity of distributed optimization?

(ii) Are there universally-optimal algorithms that are as fast as possible on every topology?

We resolve these 25-year-old open problems in the known-topology setting (i.e., supported CONGEST) for a wide class of global network optimization problems including MST, (1+ε)-min cut, various approximate shortest paths problems, sub-graph connectivity, etc.

In particular, we provide several (equivalent) graph parameters and show they are tight universal lower bounds for the above problems, fully characterizing their inherent complexity. Our results also imply that algorithms based on the low-congestion shortcut framework match the above lower bound, making them universally optimal if shortcuts are efficiently approximable.

Efficient randomized distributed coloring in CONGEST

Distributed vertex coloring is one of the classic problems and probably also one of the most widely studied problems in the area of distributed graph algorithms. We present a new randomized distributed vertex coloring algorithm for the standard CONGEST model, where the network is modeled as an n-node graph G, and where the nodes of G operate in synchronous communication rounds in which they can exchange O(log n)-bit messages over all the edges of G. For graphs with maximum degree Δ, we show that the (Δ+1)-list coloring problem (and therefore also the standard (Δ+1)-coloring problem) can be solved in O(log^5 log n) rounds. Previously such a result was only known for the significantly more powerful LOCAL model, where in each round, neighboring nodes can exchange messages of arbitrary size. The best previous (Δ+1)-coloring algorithm in the CONGEST model had a running time of O(log Δ + log^6 log n) rounds. As a function of n alone, the best previous algorithm therefore had a round complexity of O(log n), which is a bound that can also be achieved by a naive folklore algorithm. For large maximum degree Δ, our algorithm hence is an exponential improvement over the previous state of the art.
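The basic randomized building block behind such algorithms is the folklore one-round color trial sketched below: every uncolored node proposes a random color from its list and keeps it if no neighbor holds or proposed the same color, after which lists are pruned. This is shown only to make the setting concrete; the paper's contribution is carrying out the whole computation in O(log^5 log n) CONGEST rounds.

import random

def one_round(G, palette, color):
    """One synchronous round of the random color trial for (Delta+1)-list coloring."""
    trial = {v: random.choice(sorted(palette[v])) for v in G if v not in color}
    for v, c in trial.items():
        if all(trial.get(u) != c and color.get(u) != c for u in G[v]):
            color[v] = c                                           # no conflict: keep the color
    for v in G:
        if v not in color:
            palette[v] -= {color[u] for u in G[v] if u in color}   # prune taken colors

G = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}         # max degree Delta = 3
palette = {v: set(range(4)) for v in G}                  # lists of size Delta + 1
color = {}
while len(color) < len(G):
    one_round(G, palette, color)
print(color)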

The communication complexity of multiparty set disjointness under product distributions

In the multiparty number-in-hand set disjointness problem, we have k players, with private inputs X_1, …, X_k ⊆ [n]. The players’ goal is to check whether X_1 ∩ ⋯ ∩ X_k = ∅. It is known that in the shared blackboard model of communication, set disjointness requires Ω(n log k + k) bits of communication, and in the coordinator model, it requires Ω(kn) bits. However, these two lower bounds require that the players’ inputs can be highly correlated. We study the communication complexity of multiparty set disjointness under product distributions, and ask whether the problem becomes significantly easier, as it is known to become in the two-party case. Our main result is a nearly-tight bound of Θ̃(n^(1−1/k) + k) for both the shared blackboard model and the coordinator model. This shows that in the shared blackboard model, as the number of players grows, having independent inputs helps less and less; but in the coordinator model, when k is very large, having independent inputs makes the problem much easier. Both our upper and our lower bounds use new ideas, as the original techniques developed for the two-party case do not scale to more than two players.

SESSION: Session 7B

Hop-constrained oblivious routing

We prove the existence of an oblivious routing scheme that is poly(logn)-competitive in terms of (congestion + dilation), thus resolving a well-known question in oblivious routing.

Concretely, consider an undirected network and a set of packets each with its own source and destination. The objective is to choose a path for each packet, from its source to its destination, so as to minimize (congestion + dilation), defined as follows: The dilation is the maximum path hop-length, and the congestion is the maximum number of paths that include any single edge. The routing scheme obliviously and randomly selects a path for each packet independent of (the existence of) the other packets. Despite this obliviousness, the selected paths have (congestion + dilation) within a poly(logn) factor of the best possible value. More precisely, for any integer hop-bound h, this oblivious routing scheme selects paths of length at most h · poly(logn) and is poly(logn)-competitive in terms of congestion in comparison to the best possible congestion achievable via paths of length at most h hops. These paths can be sampled in polynomial time.

This result can be viewed as an analogue of the celebrated oblivious routing results of Räcke [FOCS 2002, STOC 2008], which are O(log n)-competitive in terms of congestion, but are not competitive in terms of dilation.

Efficient randomized DCAS

Double Compare-And-Swap (DCAS) is a tremendously useful synchronization primitive, which is also notoriously difficult to implement efficiently from objects that are provided by hardware. We present a randomized implementation of DCAS with O(logn) expected amortized step complexity against the oblivious adversary, where n is the number of processes in the system. This is the only algorithm to-date that achieves sub-linear step complexity. We achieve that by first implementing two novel algorithms as building blocks. One is a mechanism that allows processes to repeatedly agree on a random value among multiple proposed ones, and the other one is a restricted bipartite version of DCAS.

Optimal error resilience of adaptive message exchange

We study the error resilience of the message exchange task: Two parties, each holding a private input, want to exchange their inputs. However, the channel connecting them is governed by an adversary that may corrupt a constant fraction of the transmissions. What is the maximum fraction of corruptions that still allows the parties to exchange their inputs?

For the non-adaptive channel, where the parties must agree in advance on the order in which they communicate, the maximum error resilience was shown to be 1/4 (see Braverman and Rao, STOC 2011). The problem was also studied over the adaptive channel, where the order in which the parties communicate may not be predetermined (Ghaffari, Haeupler, and Sudan, STOC 2014; Efremenko, Kol, and Saxena, STOC 2020). These works show that the adaptive channel admits much richer set of protocols but leave open the question of finding its maximum error resilience.

In this work, we show that the maximum error resilience of a protocol for message exchange over the adaptive channel is 5/16, thereby settling the above question. Our result requires improving both the known upper bounds and the known lower bounds for the problem.

How asymmetry helps buffer management: achieving optimal tail size in cup games

The cup game on n cups is a multi-step game with two players, a filler and an emptier. At each step, the filler distributes 1 unit of water among the cups, and then the emptier selects a single cup to remove (up to) 1 unit of water from.

There are several objective functions that the emptier might wish to minimize. One of the strongest guarantees would be to minimize tail size, which is defined to be the number of cups with fill 2 or greater. However, a simple lower-bound construction shows that the optimal tail size for deterministic emptying algorithms is Θ(n).

We present a simple randomized emptying algorithm that achieves tail size Õ(logn) with high probability in n for poly n steps. Moreover, we show that this is tight up to doubly logarithmic factors. We also extend our results to the multi-processor cup game, achieving tail size Õ(logn + p) on p processors with high probability in n. We show that the dependence on p is near optimal for any emptying algorithm that achieves polynomial-bounded backlog.

A natural question is whether our results can be extended to give unending guarantees, which apply to arbitrarily long games. We give a lower bound construction showing that no monotone memoryless emptying algorithm can achieve an unending guarantee on either tail size or the related objective function of backlog. On the other hand, we show that even a very small (i.e., 1 / poly n) amount of resource augmentation is sufficient to overcome this barrier.
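A toy simulation, shown only to make the tail-size objective concrete: the filler below is a fixed randomized strategy rather than the worst-case adversary the guarantees are stated against, and the emptier simply empties a uniformly random cup among the fullest ones.

import random

def cup_game(n=100, steps=10_000, rng=random):
    fill = [0.0] * n
    worst_tail = 0
    for _ in range(steps):
        for c in rng.sample(range(n), 4):            # filler: 1 unit split over 4 random cups
            fill[c] += 0.25
        top = max(fill)
        fullest = [i for i, f in enumerate(fill) if f >= top - 1e-9]
        i = rng.choice(fullest)                      # emptier: random cup among the fullest
        fill[i] = max(0.0, fill[i] - 1.0)
        worst_tail = max(worst_tail, sum(f >= 2 for f in fill))
    return worst_tail

print("largest tail size observed:", cup_game())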

Load balancing with dynamic set of balls and bins

In dynamic load balancing, we wish to distribute balls into bins in an environment where both balls and bins can be added and removed. We want to minimize the maximum load of any bin but we also want to minimize the number of balls and bins that are affected when adding or removing a ball or a bin. We want a hashing-style solution where, given the ID of a ball, we can find its bin efficiently.

We are given a user-specified balancing parameter c=1+ε, where ε∈ (0,1). Let n and m be the current number of balls and bins. Then we want no bin with load above C=⌈ c n/m⌉, referred to as the capacity of the bins.

We present a scheme where we can locate a ball checking 1 + O(log(1/ε)) bins in expectation. When inserting or deleting a ball, we expect to move O(1/ε) balls, and when inserting or deleting a bin, we expect to move O(C/ε) balls. Previous bounds were off by a factor 1/ε.

The above bounds are best possible when C=O(1) but for larger C, we can do much better: We define f = ε C when C ≤ log(1/ε), f = ε √C · √(log(1/(ε √C))) when log(1/ε) ≤ C < 1/(2ε^2), and f = 1 when C ≥ 1/(2ε^2). We show that we expect to move O(1/f) balls when inserting or deleting a ball, and O(C/f) balls when inserting or deleting a bin. Moreover, when C ≥ log(1/ε), we can search for a ball checking only O(1) bins in expectation.

For the bounds with larger C, we first have to resolve a much simpler probabilistic problem. Place n balls in m bins of capacity C, one ball at a time. Each ball picks a uniformly random non-full bin. We show that in expectation and with high probability, the fraction of non-full bins is Θ(f). Then the expected number of bins that a new ball would have to visit to find one that is not full is Θ(1/f). As it turns out, this is also the complexity of an insertion in our more complicated scheme where both balls and bins can be added and removed.
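The simpler process from the last paragraph is easy to simulate, as below; according to the statement above, the printed fraction of non-full bins should behave like Θ(f).

import math
import random

def fraction_non_full(n, m, eps, rng=random):
    """Throw n balls, one at a time, into uniformly random non-full bins of capacity C."""
    C = math.ceil((1 + eps) * n / m)
    load, non_full = [0] * m, list(range(m))
    for _ in range(n):
        idx = rng.randrange(len(non_full))
        b = non_full[idx]
        load[b] += 1
        if load[b] == C:                              # bin is full: remove it from the pool
            non_full[idx] = non_full[-1]
            non_full.pop()
    return len(non_full) / m

for m in (1000, 100):                                 # larger m means smaller capacity C
    print("m =", m, "fraction of non-full bins:", fraction_non_full(n=100_000, m=m, eps=0.1))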

SESSION: Session 6C

Fiber bundle codes: breaking the n^(1/2) polylog(n) barrier for Quantum LDPC codes

We present a quantum LDPC code family that has distance Ω(N^(3/5)/polylog(N)) and Θ(N^(3/5)) logical qubits, where N is the code length. This is the first quantum LDPC code construction that achieves distance greater than N^(1/2) polylog(N). The construction is based on generalizing the homological product of codes to a fiber bundle.

An optimal separation of randomized and Quantum query complexity

We prove that for every decision tree, the absolute values of the Fourier coefficients of given order t ≥ 1 sum to at most (cd/t)^(t/2) (1+log n)^((t−1)/2), where n is the number of variables, d is the tree depth, and c>0 is an absolute constant. This bound is essentially tight and settles a conjecture due to Tal (arxiv 2019; FOCS 2020). The bounds prior to our work degraded rapidly with t, becoming trivial already at t = √d.

As an application, we obtain, for every integer k ≥ 1, a partial Boolean function on n bits that has bounded-error quantum query complexity at most ⌈k/2⌉ and randomized query complexity Ω(n^(1−1/k)). This separation of bounded-error quantum versus randomized query complexity is best possible, by the results of Aaronson and Ambainis (STOC 2015). Prior to our work, the best known separation was polynomially weaker: O(1) versus Ω(n^(2/3−ε)) for any ε>0 (Tal, FOCS 2020).

As another application, we obtain an essentially optimal separation of O(log n) versus Ω(n^(1−ε)) for bounded-error quantum versus randomized communication complexity, for any ε>0. The best previous separation was polynomially weaker: O(log n) versus Ω(n^(2/3−ε)) (implicit in Tal, FOCS 2020).
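The quantity being bounded, the level-t ℓ1 Fourier weight of a decision tree, can be checked by brute force on a tiny example; the sketch below does this for a depth-2 tree on three ±1-valued variables.

from itertools import combinations, product
from math import prod

def tree(x):                        # depth-2 decision tree: query x0, then x1 or x2
    return x[1] if x[0] == 1 else x[2]

n = 3
points = list(product([-1, 1], repeat=n))

def fourier_coeff(S):
    return sum(tree(x) * prod(x[i] for i in S) for x in points) / len(points)

for t in range(n + 1):              # l1 weight of each Fourier level
    print(t, sum(abs(fourier_coeff(S)) for S in combinations(range(n), t)))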

k-forrelation optimally separates Quantum and classical query complexity

Aaronson and Ambainis (SICOMP ‘18) showed that any partial function on N bits that can be computed with an advantage δ over a random guess by making q quantum queries, can also be computed classically with an advantage δ/2 by a randomized decision tree making O_q(N^(1−1/(2q)) δ^(−2)) queries. Moreover, they conjectured the k-Forrelation problem — a partial function that can be computed with q = ⌈ k/2 ⌉ quantum queries — to be a suitable candidate for exhibiting such an extremal separation.

We prove their conjecture by showing a tight lower bound of Ω(N^(1−1/k)) for the randomized query complexity of k-Forrelation, where δ = 2^(−O(k)). By standard amplification arguments, this gives an explicit partial function that exhibits an O_ε(1) vs Ω(N^(1−ε)) separation between bounded-error quantum and randomized query complexities, where ε>0 can be made arbitrarily small. Our proof also gives the same bound for the closely related but non-explicit k-Rorrelation function introduced by Tal (FOCS ‘20).

Our techniques rely on classical Gaussian tools, in particular, Gaussian interpolation and Gaussian integration by parts, and in fact, give a more general statement. We show that to prove lower bounds for k-Forrelation against a family of functions, it suffices to bound the ℓ1-weight of the Fourier coefficients between levels k and (k−1)k. We also prove new interpolation and integration by parts identities that might be of independent interest in the context of rounding high-dimensional Gaussian vectors.

New cosystolic expanders from tensors imply explicit Quantum LDPC codes with Ω(√n log^k n) distance

In this work we introduce a new notion of expansion in higher dimensions that is stronger than the well-studied notion of cosystolic expansion, termed collective cosystolic expansion.

We show that tensoring two cosystolic expanders yields a new cosystolic expander, assuming one of the complexes in the product is not only a cosystolic expander but a collective cosystolic expander.

We then show that the well-known bounded-degree cosystolic expanders, the Ramanujan complexes, are in fact collective cosystolic expanders. This enables us to construct new bounded-degree cosystolic expanders by tensoring Ramanujan complexes.

Using our newly constructed bounded-degree cosystolic expanders, we construct explicit quantum LDPC codes of distance √n log^k n for any k, improving a recent result of Evra et al. [FOCS 2020] and setting a new record for the distance of explicit quantum LDPC codes.

The work of Evra et al. [FOCS 2020] took advantage of the high-dimensional expansion notion known as cosystolic expansion that occurs in Ramanujan complexes. Our improvement is achieved by considering tensor products of Ramanujan complexes and using their newly derived property, collective cosystolic expansion.

Degree vs. approximate degree and Quantum implications of Huang’s sensitivity theorem

Based on the recent breakthrough of Huang (2019), we show that for any total Boolean function f,

deg(f) = O(adeg(f)^2): The degree of f is at most quadratic in the approximate degree of f. This is optimal as witnessed by the OR function.

D(f) = O(Q(f)^4): The deterministic query complexity of f is at most quartic in the quantum query complexity of f. This matches the known separation (up to log factors) due to Ambainis, Balodis, Belovs, Lee, Santha, and Smotrovs (2017).

We apply these results to resolve the quantum analogue of the Aanderaa–Karp–Rosenberg conjecture. We show that if f is a nontrivial monotone graph property of an n-vertex graph specified by its adjacency matrix, then Q(f)=Ω(n), which is also optimal. We also show that the approximate degree of any read-once formula on n variables is Θ(√n).

Eliminating intermediate measurements in space-bounded Quantum computation

A foundational result in the theory of quantum computation, known as the "principle of safe storage," shows that it is always possible to take a quantum circuit and produce an equivalent circuit that makes all measurements at the end of the computation. While this procedure is time efficient, meaning that it does not introduce a large overhead in the number of gates, it uses extra ancillary qubits, and so is not generally space efficient. It is quite natural to ask whether it is possible to eliminate intermediate measurements without increasing the number of ancillary qubits. We give an affirmative answer to this question by exhibiting a procedure to eliminate all intermediate measurements that is simultaneously space efficient and time efficient. In particular, this shows that the definition of a space-bounded quantum complexity class is robust to allowing or forbidding intermediate measurements. A key component of our approach, which may be of independent interest, involves showing that the well-conditioned versions of many standard linear-algebraic problems may be solved by a quantum computer in less space than seems possible by a classical computer.

SESSION: Session 7C

(Sub)Exponential advantage of adiabatic Quantum computation with no sign problem

We demonstrate the possibility of (sub)exponential quantum speedup via a quantum algorithm that follows an adiabatic path of a gapped Hamiltonian with no sign problem. The Hamiltonian that exhibits this speed-up comes from the adjacency matrix of an undirected graph whose vertices are labeled by n-bit strings, and we can view the adiabatic evolution as an efficient O(poly(n))-time quantum algorithm for finding a specific “EXIT” vertex in the graph given the “ENTRANCE” vertex. On the other hand we show that if the graph is given via an adjacency-list oracle, there is no classical algorithm that finds the “EXIT” with probability greater than exp(−n^δ) using at most exp(n^δ) queries for δ = 1/5 − o(1). Our construction of the graph is somewhat similar to the “welded-trees” construction of Childs et al., but uses additional ideas of Hastings for achieving a spectral gap and a short adiabatic path.

Succinct blind Quantum computation using a random oracle

In the universal blind quantum computation problem, a client wants to make use of a single quantum server to evaluate C|0⟩ where C is an arbitrary quantum circuit while keeping C secret. The client’s goal is to use as few resources as possible. This problem, first raised by Broadbent, Fitzsimons and Kashefi [FOCS 2009], has become fundamental to the study of quantum cryptography, not only because of its own importance, but also because it provides a testbed for new techniques that can be later applied to related problems (for example, quantum computation verification). Known protocols on this problem are mainly either information-theoretically (IT) secure or based on trapdoor assumptions (public key encryptions). In this paper we study how the availability of symmetric-key primitives, modeled by a random oracle, changes the complexity of universal blind quantum computation. We give a new universal blind quantum computation protocol. Similar to previous works on IT-secure protocols (for example, Broadbent-Fitzsimons-Kashefi), our protocol can be divided into two phases. In the first phase the client prepares some quantum gadgets with relatively simple quantum gates and sends them to the server, and in the second phase the client is entirely classical — it does not even need quantum storage. Crucially, the protocol’s first phase is succinct, that is, its complexity is independent of the circuit size. Given the security parameter κ, its complexity is only a fixed polynomial of κ, and can be used to evaluate any circuit (or several circuits) of size up to a subexponential of κ. In contrast, known schemes either require the client to perform quantum computations that scale with the size of the circuit, or require trapdoor assumptions.

Sampling matrices from Harish-Chandra–Itzykson–Zuber densities with applications to Quantum inference and differential privacy

Given two Hermitian matrices Y and Λ, the Harish-Chandra–Itzykson–Zuber (HCIZ) distribution is given by the density e^(Tr(U Λ U* Y)) with respect to the Haar measure on the unitary group. Random unitary matrices distributed according to the HCIZ distribution are important in various settings in physics and random matrix theory, but the problem of sampling efficiently from this distribution has remained open. We present two algorithms to sample matrices from distributions that are close to the HCIZ distribution. The first produces samples that are ξ-close in total variation distance, and the number of arithmetic operations required depends on poly(log 1/ξ). The second produces samples that are ξ-close in infinity divergence, but with a poly(1/ξ) dependence. Our results have the following applications: 1) an efficient algorithm to sample from complex versions of matrix Langevin distributions studied in statistics, 2) an efficient algorithm to sample from continuous maximum entropy distributions over unitary orbits, which in turn implies an efficient algorithm to sample a pure quantum state from the entropy-maximizing ensemble representing a given density matrix, and 3) an efficient algorithm for differentially private rank-k approximation that comes with improved utility bounds for k>1.
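For intuition only, a naive Metropolis random walk targeting the HCIZ density relative to Haar measure is sketched below (proposals are small rotations exp(i·eps·H)·U with H drawn from the GUE, a symmetric proposal); such heuristics carry no accuracy guarantees, which is precisely the gap the algorithms above close. Dimensions and parameters are illustrative.

import numpy as np

rng = np.random.default_rng(0)
d = 4
Y = np.diag(rng.uniform(0, 1, d))
Lam = np.diag(rng.uniform(0, 1, d))

def gue(d):
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (A + A.conj().T) / 2

def rotate(U, eps=0.1):
    H = gue(d)
    w, V = np.linalg.eigh(H)                              # exp(i*eps*H) via the spectral theorem
    W = (V * np.exp(1j * eps * w)) @ V.conj().T
    return W @ U

def log_density(U):                                       # log of exp(Tr(U Lam U* Y))
    return np.real(np.trace(U @ Lam @ U.conj().T @ Y))

U = np.eye(d, dtype=complex)
for _ in range(5000):
    proposal = rotate(U)
    if np.log(rng.uniform()) < log_density(proposal) - log_density(U):
        U = proposal                                      # Metropolis accept/reject
print(log_density(U))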

Improved Quantum data analysis

We provide more sample-efficient versions of some basic routines in quantum data analysis, along with simpler proofs. Particularly, we give a quantum “Threshold Search” algorithm that requires only O((log^2 m)/ε^2) samples of a d-dimensional state ρ. That is, given observables 0 ≤ A_1, A_2, …, A_m ≤ 1 such that tr(ρ A_i) ≥ 1/2 for at least one i, the algorithm finds j with tr(ρ A_j) ≥ 1/2 − ε. As a consequence, we obtain a Shadow Tomography algorithm requiring only O((log^2 m)(log d)/ε^4) samples, which simultaneously achieves the best known dependence on each parameter m, d, ε. This yields the same sample complexity for quantum Hypothesis Selection among m states; we also give an alternative Hypothesis Selection method using O((log^3 m)/ε^2) samples.

SESSION: Session 8A

Approximating Nash social welfare under rado valuations

We consider the problem of approximating maximum Nash social welfare (NSW) while allocating a set of indivisible items to n agents. The NSW, defined as the weighted geometric mean of the agents’ valuations, is a popular objective that provides a balanced tradeoff between the often conflicting requirements of fairness and efficiency. For the symmetric additive case of the problem, where agents have the same weight with additive valuations, the first constant-factor approximation algorithm was obtained in 2015. Subsequent work has obtained constant-factor approximation algorithms for the symmetric case under mild generalizations of additive, and O(n)-approximation algorithms for subadditive valuations and for the asymmetric case.

In this paper, we make significant progress towards both symmetric and asymmetric NSW problems. We present the first constant-factor approximation algorithm for the symmetric case under Rado valuations. Rado valuations form a general class of valuation functions that arise from maximum cost independent matching problems, including as special cases assignment (OXS) valuations and weighted matroid rank functions. Furthermore, our approach also gives the first constant-factor approximation algorithm for the asymmetric case under Rado valuations, provided that the maximum ratio between the weights is bounded by a constant.
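A tiny brute-force illustration of the symmetric additive case of the objective, the geometric mean of agents' values, maximized over all allocations of the items; the instance is made up and the enumeration is exponential, unlike the approximation algorithms above.

from itertools import product
from math import prod

values = [            # values[i][j] = value of item j to agent i (additive valuations)
    [4, 1, 2],
    [1, 3, 1],
]
n, m = len(values), len(values[0])

def nsw(assignment):  # assignment[j] = agent who receives item j
    totals = [sum(values[i][j] for j in range(m) if assignment[j] == i) for i in range(n)]
    return prod(totals) ** (1 / n)

best = max(product(range(n), repeat=m), key=nsw)
print(best, nsw(best))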

Settling the complexity of Nash equilibrium in congestion games

We consider (i) the problem of finding a (possibly mixed) Nash equilibrium in congestion games, and (ii) the problem of finding an (exponential precision) fixed point of the gradient descent dynamics of a smooth function f: [0,1]^n → ℝ. We prove that these problems are equivalent. Our result holds for various explicit descriptions of f, ranging from (almost general) arithmetic circuits to degree-5 polynomials. By a very recent result of [Fearnley et al., STOC 2021], this implies that these problems are PPAD ∩ PLS-complete. As a corollary, we also obtain the following equivalence of complexity classes:

CLS = PPAD ∩ PLS.
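For contrast with the hardness above, which concerns mixed equilibria, pure equilibria of congestion games always exist and can be found by best-response dynamics, with Rosenthal's potential guaranteeing termination (possibly after exponentially many steps). A minimal sketch on a made-up two-player game with three resources, where a player's delay on a resource equals its load:

strategies = [[{0}, {1, 2}], [{0, 1}, {2}]]     # strategies[player] = list of resource subsets

def cost(player, strat_idx, profile):
    """Delay of `player` when it plays strat_idx and everyone else follows `profile`."""
    load = {}
    for p in range(len(strategies)):
        s = strat_idx if p == player else profile[p]
        for r in strategies[p][s]:
            load[r] = load.get(r, 0) + 1
    return sum(load[r] for r in strategies[player][strat_idx])

profile, improved = [0, 0], True
while improved:
    improved = False
    for p in range(len(strategies)):
        best = min(range(len(strategies[p])), key=lambda s: cost(p, s, profile))
        if cost(p, best, profile) < cost(p, profile[p], profile):
            profile[p], improved = best, True
print("pure Nash equilibrium:", profile)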

Revelation gap for pricing from samples

This paper considers prior-independent mechanism design, in which a single mechanism is designed to achieve approximately optimal performance on every prior distribution from a given class. Most results in this literature focus on mechanisms with truthtelling equilibria, a.k.a., truthful mechanisms. Feng and Hartline [FOCS 2018] introduce the revelation gap to quantify the loss of the restriction to truthful mechanisms. We solve a main open question left in Feng and Hartline [FOCS 2018]; namely, we identify a non-trivial revelation gap for revenue maximization.

Our analysis focuses on the canonical problem of selling a single item to a single agent with only access to a single sample from the agent's valuation distribution. We identify the sample-bid mechanism (a simple non-truthful mechanism) and upper-bound its prior-independent approximation ratio by 1.835 (resp. 1.296) for regular (resp. MHR) distributions. We further prove that no truthful mechanism can achieve prior-independent approximation ratio better than 1.957 (resp. 1.543) for regular (resp. MHR) distributions. Thus, a non-trivial revelation gap is shown as the sample-bid mechanism outperforms the optimal prior-independent truthful mechanism. On the hardness side, we prove that no (possibly non-truthful) mechanism can achieve prior-independent approximation ratio better than 1.073 even for uniform distributions.

Efficient two-sided markets with limited information

A celebrated impossibility result by Myerson and Satterthwaite (1983) shows that any truthful mechanism for two-sided markets that maximizes social welfare must run a deficit, resulting in a necessity to relax welfare efficiency and the use of approximation mechanisms. Such mechanisms in general make extensive use of the Bayesian priors. In this work, we investigate a question of increasing theoretical and practical importance: how much prior information is required to design mechanisms with near-optimal approximations?

Our first contribution is a more general impossibility result stating that no meaningful approximation is possible without any prior information, expanding the famous impossibility result of Myerson and Satterthwaite.

Our second contribution is that one single sample (one number per item), arguably a minimum-possible amount of prior information, from each seller distribution is sufficient for a large class of two-sided markets. We prove matching upper and lower bounds on the best approximation that can be obtained with one single sample for subadditive buyers and additive sellers, regardless of computational considerations.

Our third contribution is the design of computationally efficient blackbox reductions that turn any one-sided mechanism into a two-sided mechanism with a small loss in the approximation, while using only one single sample from each seller. On the way, our blackbox-type mechanisms deliver several interesting positive results in their own right, often beating even the state of the art that uses full prior information.

The complexity of constrained min-max optimization

Despite its important applications in Machine Learning, min-max optimization of objective functions that are nonconvex-nonconcave remains elusive. Not only are there no known first-order methods converging to even approximate local min-max equilibria (a.k.a. approximate saddle points), but the computational complexity of identifying them is also poorly understood. In this paper, we provide a characterization of the computational complexity as well as of the limitations of first-order methods in this problem. Specifically, we show that in linearly constrained min-max optimization problems with nonconvex-nonconcave objectives an approximate local min-max equilibrium of large enough approximation is guaranteed to exist, but computing such a point is PPAD-complete. The same is true of computing an approximate fixed point of the (Projected) Gradient Descent/Ascent update dynamics, which is computationally equivalent to computing approximate local min-max equilibria. An important byproduct of our proof is to establish an unconditional hardness result in the Nemirovsky-Yudin 1983 oracle optimization model, where we are given oracle access to the values of some function f : P → [−1, 1] and its gradient ∇ f, where P ⊆ [0, 1]d is a known convex polytope. We show that any algorithm that uses such first-order oracle access to f and finds an ε-approximate local min-max equilibrium needs to make a number of oracle queries that is exponential in at least one of 1/ε, L, G, or d, where L and G are respectively the smoothness and Lipschitzness of f. This comes in sharp contrast to minimization problems, where finding approximate local minima in the same setting can be done with Projected Gradient Descent using O(L/ε) many queries. Our result is the first to show an exponential separation between these two fundamental optimization problems in the oracle model.

SESSION: Session 9A

On codes decoding a constant fraction of errors on the BSC

We strengthen the results from a recent work by the second author, achieving bounds on the weight distribution of binary linear codes that are successful under block-MAP (as well as bit-MAP) decoding on the BEC. We conclude that a linear code that is successful on the BEC can also decode over a range of binary memoryless symmetric (BMS) channels. In particular, applying the result of Kudekar, Kumar, Mondelli, Pfister, Şaşoğlu and Urbanke from STOC 2016, we prove that a Reed–Muller code of positive rate R decodes errors on BSC_p with high probability if p < 1/2 − √(2^(−R)(1 − 2^(−R))).

Decoding multivariate multiplicity codes on product sets

The multiplicity Schwartz-Zippel lemma bounds the total multiplicity of zeroes of a multivariate polynomial on a product set. This lemma motivates the multiplicity codes of Kopparty, Saraf and Yekhanin [J. ACM, 2014], who showed how to use this lemma to construct high-rate locally-decodable codes. However, the algorithmic results about these codes crucially rely on the fact that the polynomials are evaluated on a vector space and not an arbitrary product set.

In this work, we show how to decode multivariate multiplicity codes of large multiplicities in polynomial time over finite product sets (over fields of large characteristic and zero characteristic). Previously such decoding algorithms were not known even for a positive fraction of errors. In contrast, our work goes all the way to the distance of the code and in particular exceeds both the unique decoding bound and the Johnson radius. For errors exceeding the Johnson radius, even combinatorial list-decodability of these codes was not known.

Our algorithm is an application of the classical polynomial method directly to the multivariate setting. In particular, we do not rely on a reduction from the multivariate to the univariate case as is typical of many of the existing results on decoding codes based on multivariate polynomials. However, a vanilla application of the polynomial method in the multivariate setting does not yield a polynomial upper bound on the list size. We obtain a polynomial bound on the list size by taking an alternative view of multivariate multiplicity codes. In this view, we glue all the partial derivatives of the same order together using a fresh set of variables. We then apply the polynomial method by viewing this as a problem over the field of rational functions in these fresh variables.

Efficient list-decoding with constant alphabet and list sizes

We present an explicit and efficient algebraic construction of capacity-achieving list decodable codes with both constant alphabet and constant list sizes. More specifically, for any R ∈ (0,1) and ε>0, we give an algebraic construction of an infinite family of error-correcting codes of rate R, over an alphabet of size (1/ε)^O(1/ε^2), that can be list decoded from a (1−R−ε)-fraction of errors with list size at most exp(poly(1/ε)). Moreover, the codes can be encoded in time poly(1/ε, n), the output list is contained in a linear subspace of dimension at most poly(1/ε), and a basis for this subspace can be found in time poly(1/ε, n). Thus, both encoding and list decoding can be performed in fully polynomial-time poly(1/ε, n), except for pruning the subspace and outputting the final list which takes time exp(poly(1/ε)) · poly(n). In contrast, prior explicit and efficient constructions of capacity-achieving list decodable codes either required a much higher complexity in terms of 1/ε (and were additionally much less structured), or had super-constant alphabet or list sizes.

Our codes are quite natural and structured. Specifically, we use algebraic-geometric (AG) codes with evaluation points restricted to a subfield, and with the message space restricted to a (carefully chosen) linear subspace. Our main observation is that the output list of AG codes with subfield evaluation points is contained in an affine shift of the image of a block-triangular-Toeplitz (BTT) matrix, and that the list size can potentially be reduced to a constant by restricting the message space to a BTT evasive subspace, which is a large subspace that intersects the image of any BTT matrix in a constant number of points. We further show how to explicitly construct such BTT evasive subspaces, based on the explicit subspace designs of Guruswami and Kopparty (Combinatorica, 2016), and composition.

Explicit uniquely decodable codes for space bounded channels that achieve list-decoding capacity

We consider codes for space bounded channels. This is a model for communication under noise that was introduced by Guruswami and Smith (J. ACM 2016) and lies between the Shannon (random) and Hamming (adversarial) models. In this model, a channel is a space bounded procedure that reads the codeword in one pass, and modifies at most a p fraction of the bits of the codeword.

(1) Explicit uniquely decodable codes for space bounded channels: Our main result is that for every 0 ≤ p < 1/4, there exists a constant δ>0 and a uniquely decodable code with rate 1−H(p) for channels with space n^δ. This code is explicit (meaning that encoding and decoding are in poly-time). This improves upon previous explicit codes by Guruswami and Smith, and Kopparty, Shaltiel and Silbak (FOCS 2019). Specifically, we obtain the same space and rate as earlier works, even though prior work gave only list-decodable codes (rather than uniquely decodable codes).

(2) Complete characterization of the capacity of space bounded channels: Together with a result by Guruswami and Smith showing the impossibility of unique decoding for p ≥ 1/4, our techniques also give a complete characterization of the capacity R(p) of space n^{1−o(1)} channels, specifically: for 0 ≤ p < 1/4, R(p) = 1−H(p), and for p ≥ 1/4, R(p) = 0. This capacity is strictly larger than the capacity of Hamming channels for every 0 < p < 1/4, and matches the capacity of list decoding and of binary symmetric channels in this range. Curiously, this shows that R(·) is not continuous at p=1/4.

Our results are incomparable to recent work on causal channels (these are stronger channels that read the codeword in one pass, but there is no space restriction). The best known codes for causal channels, due to Chen, Jaggi and Langberg (STOC 2015), are shown to exist by the probabilistic method, and no explicit codes are known.

A key new ingredient in our construction is a new notion of “evasiveness” of codes, which is concerned with whether a decoding algorithm rejects a word that is obtained when a channel induces few errors to a uniformly chosen (or pseudorandom) string. We use evasiveness (as well as several additional new ideas related to coding theory and pseudorandomness) to identify the “correct” message in the list obtained by previous list-decoding algorithms.

Near-linear time decoding of Ta-Shma’s codes via splittable regularity

The Gilbert–Varshamov bound non-constructively establishes the existence of binary codes of distance 1/2−є/2 and rate Ω(є^2). In a breakthrough result, Ta-Shma [STOC 2017] constructed the first explicit family of nearly optimal binary codes with distance 1/2−є/2 and rate Ω(є^{2+α}), where α → 0 as є → 0. Moreover, the codes in Ta-Shma’s construction are є-balanced, where the distance between distinct codewords is not only bounded from below by 1/2−є/2, but also from above by 1/2+є/2.

Polynomial time decoding algorithms for (a slight modification of) Ta-Shma’s codes appeared in [FOCS 2020], and were based on the Sum-of-Squares (SoS) semidefinite programming hierarchy. The running times for these algorithms were of the form N^{O_α(1)} for unique decoding, and N^{O_{є,α}(1)} for the setting of “gentle list decoding”, with large exponents of N even when α is a fixed constant. We derive new algorithms for both these tasks, running in time Õ_є(N). Our algorithms also apply to the general setting of decoding direct-sum codes.

Our algorithms follow from new structural and algorithmic results for collections of k-tuples (ordered hypergraphs) possessing a “structured expansion” property, which we call splittability. This property was previously identified and used in the analysis of SoS-based decoding and constraint satisfaction algorithms, and is also known to be satisfied by Ta-Shma’s code construction. We obtain a new weak regularity decomposition for (possibly sparse) splittable collections W ⊆ [n]^k, similar to the regularity decomposition for dense structures by Frieze and Kannan [FOCS 1996]. These decompositions are also computable in near-linear time Õ(|W|), and form a key component of our algorithmic results.

SESSION: Session 8B

Optimal mixing of Glauber dynamics: entropy factorization via high-dimensional expansion

We prove an optimal mixing time bound for the single-site update Markov chain known as the Glauber dynamics or Gibbs sampling in a variety of settings. Our work presents an improved version of the spectral independence approach of Anari et al. (2020) and shows O(n log n) mixing time on any n-vertex graph of bounded degree when the maximum eigenvalue of an associated influence matrix is bounded. As an application of our results, for the hard-core model on independent sets weighted by a fugacity λ, we establish O(n log n) mixing time for the Glauber dynamics on any n-vertex graph of constant maximum degree Δ when λ < λ_c(Δ), where λ_c(Δ) is the critical point for the uniqueness/non-uniqueness phase transition on the Δ-regular tree. More generally, for any antiferromagnetic 2-spin system we prove O(n log n) mixing time of the Glauber dynamics on any bounded degree graph in the corresponding tree uniqueness region. Our results apply more broadly; for example, we also obtain O(n log n) mixing for q-colorings of triangle-free graphs of maximum degree Δ when the number of colors satisfies q > αΔ where α ≈ 1.763, and O(m log n) mixing for generating random matchings of any graph with bounded degree and m edges.

Our approach is based on two steps. First, we show that the approximate tensorization of entropy (i.e., factorizing entropy into single vertices), which is a key step for establishing the modified log-Sobolev inequality in many previous works, can be deduced from entropy factorization into blocks of fixed linear size. Second, we adapt the local-to-global scheme of Alev and Lau (2020) to establish such block factorization of entropy in a more general setting of pure weighted simplicial complexes satisfying local spectral expansion; this also substantially generalizes the result of Cryan et al. (2019).
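
To make the single-site dynamics concrete, here is a minimal Python sketch of the heat-bath Glauber dynamics for the hard-core model with fugacity λ; the example graph, the fugacity value, and the number of steps are illustrative placeholders, and the sketch says nothing about the mixing-time analysis described above.

import random

def glauber_hard_core(adj, lam, steps, rng=random.Random(0)):
    """Single-site (heat-bath) Glauber dynamics for the hard-core model.

    adj: dict mapping each vertex to a list of its neighbours.
    lam: fugacity lambda > 0.
    Returns the occupation vector (an independent set) after `steps` updates.
    """
    sigma = {v: 0 for v in adj}          # start from the empty independent set
    vertices = list(adj)
    for _ in range(steps):
        v = rng.choice(vertices)         # pick a uniformly random vertex
        if any(sigma[u] for u in adj[v]):
            sigma[v] = 0                 # an occupied neighbour forces v to be empty
        else:
            # heat-bath update: occupy v with probability lambda / (1 + lambda)
            sigma[v] = 1 if rng.random() < lam / (1.0 + lam) else 0
    return sigma

# toy usage on a 4-cycle
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(glauber_hard_core(cycle, lam=0.5, steps=1000))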

Entropy decay in the Swendsen–Wang dynamics on ℤ^d

We study the mixing time of the Swendsen-Wang dynamics for the ferromagnetic Ising and Potts models on the integer lattice ℤ^d. This dynamics is a widely used Markov chain that has largely resisted sharp analysis because it is non-local, i.e., it changes the entire configuration in one step. We prove that, whenever strong spatial mixing (SSM) holds, the mixing time on any n-vertex cube in ℤ^d is O(log n), and we prove this is tight by establishing a matching lower bound. The previous best known bound was O(n). SSM is a standard condition corresponding to exponential decay of correlations with distance between spins on the lattice and is known to hold in d=2 dimensions throughout the high-temperature (single phase) region. Our result follows from a modified log-Sobolev inequality, which expresses the fact that the dynamics contracts relative entropy at a constant rate at each step. The proof of this fact utilizes a new factorization of the entropy in the joint probability space over spins and edges that underlies the Swendsen-Wang dynamics, which extends to general bipartite graphs of bounded degree. This factorization leads to several additional results, including mixing time bounds for a number of natural local and non-local Markov chains on the joint space, as well as for the standard random-cluster dynamics.
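
For reference, a single step of the Swendsen-Wang dynamics for the ferromagnetic Ising model with ±1 spins can be sketched as below; the inverse temperature β, the toy graph, and the edge-opening probability 1 − e^{−2β} are assumptions of this illustration (one standard parameterization), not details taken from the paper.

import math, random

def swendsen_wang_step(adj, spins, beta, rng=random.Random(0)):
    """One Swendsen-Wang update for the ferromagnetic Ising model with +-1 spins.

    adj: dict vertex -> list of neighbours (vertices assumed comparable, e.g. integers).
    spins: dict vertex -> +1 or -1, updated in place and returned.
    """
    p = 1.0 - math.exp(-2.0 * beta)      # probability of opening an edge between equal spins
    parent = {v: v for v in adj}         # union-find forest for the open-edge components

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    # open each monochromatic edge independently with probability p
    for u in adj:
        for v in adj[u]:
            if u < v and spins[u] == spins[v] and rng.random() < p:
                parent[find(u)] = find(v)

    # assign a fresh uniform spin to every connected component of open edges
    new_spin = {}
    for v in adj:
        r = find(v)
        if r not in new_spin:
            new_spin[r] = rng.choice([-1, 1])
        spins[v] = new_spin[r]
    return spins

# toy usage on a 4-vertex path, starting from the all-plus configuration
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(swendsen_wang_step(path, {v: 1 for v in path}, beta=0.5))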

Sampling constraint satisfaction solutions in the local lemma regime

We give a Markov chain based algorithm for sampling almost uniform solutions of constraint satisfaction problems (CSPs). Assuming a canonical setting for the Lovász local lemma, where each constraint is violated by a small number of forbidden local configurations, our sampling algorithm is accurate in a local lemma regime, and the running time is a fixed polynomial whose dependency on n is close to linear, where n is the number of variables. Our main approach is a new technique called state compression, which generalizes the “mark/unmark” paradigm of Moitra, and can give fast local-lemma-based sampling algorithms. As concrete applications of our technique, we give the current best almost-uniform samplers for hypergraph colorings and for CNF solutions.

Frozen 1-RSB structure of the symmetric Ising perceptron

We prove, under an assumption on the critical points of a real-valued function, that the symmetric Ising perceptron exhibits the “frozen 1-RSB” structure conjectured by Krauth and Mézard in the physics literature; that is, typical solutions of the model lie in clusters of vanishing entropy density. Moreover, we prove this in a very strong form conjectured by Huang, Wong, and Kabashima: a typical solution of the model is isolated with high probability and the Hamming distance to all other solutions is linear in the dimension. The frozen 1-RSB scenario is part of a recent and intriguing explanation of the performance of learning algorithms by Baldassi, Ingrosso, Lucibello, Saglietti, and Zecchina. We prove this structural result by comparing the symmetric Ising perceptron model to a planted model and proving a comparison result between the two models. Our main technical tool towards this comparison is an inductive argument for the concentration of the logarithm of the number of solutions in the model.

Perfectly sampling k ≥ (8/3 + o(1))Δ-colorings in graphs

We present a randomized algorithm which takes as input an undirected graph G on n vertices with maximum degree Δ, and a number of colors k ≥ (8/3 + o_Δ(1))Δ, and returns – in expected time Õ(nΔ^2 log k) – a proper k-coloring of G distributed perfectly uniformly on the set of all proper k-colorings of G. Notably, our sampler breaks the barrier at k = 3Δ encountered in recent work of Bhandari and Chakraborty [STOC 2020]. We also discuss how our methods may be modified to relax the restriction on k to k ≥ (8/3 − є_0)Δ for an absolute constant є_0 > 0.

As in the work of Bhandari and Chakraborty, and the pioneering work of Huber [STOC 1998], our sampler is based on Coupling from the Past [Propp & Wilson, Random Struct. Algorithms, 1995] and the bounding chain method [Huber, STOC 1998; Häggström & Nelander, Scand. J. Statist., 1999]. Our innovations include a novel bounding chain routine inspired by Jerrum’s analysis of the Glauber dynamics [Random Struct. Algorithms, 1995], as well as a preconditioning routine for bounding chains which uses the algorithmic Lovász Local Lemma [Moser & Tardos, J. ACM, 2010].
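
Coupling from the Past itself is short to state; the following is a minimal sketch of vanilla CFTP on a small finite state space driven by a deterministic update function, not the bounding-chain sampler developed in the paper (the state space, the update rule, and the toy chain at the end are hypothetical).

import random

def cftp(states, update, seed=0):
    """Vanilla Coupling from the Past on a small finite state space.

    update(state, u): deterministic transition driven by a number u in [0, 1).
    Returns a sample distributed exactly according to the stationary distribution.
    """
    rng = random.Random(seed)
    past_u = []      # past_u[k] drives the update at time -(k+1); it is fixed once and for all
    T = 1
    while True:
        while len(past_u) < T:                    # extend further into the past, keeping old randomness
            past_u.append(rng.random())
        current = {s: s for s in states}          # run every starting state from time -T
        for k in range(T - 1, -1, -1):            # apply the updates at times -T, ..., -1
            current = {s: update(x, past_u[k]) for s, x in current.items()}
        if len(set(current.values())) == 1:       # all trajectories have coalesced by time 0
            return next(iter(current.values()))
        T *= 2

# toy usage: a lazy biased walk on {0, 1, 2}
step = lambda x, u: min(x + 1, 2) if u < 0.6 else max(x - 1, 0)
print(cftp(range(3), step))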

SESSION: Session 9B

The metric relaxation for 0-extension admits an Ω(log^{2/3} k) gap

We consider the 0-Extension problem, where we are given an undirected graph G=(V,E) equipped with non-negative edge weights w:E→ ℝ+, a collection T={t_1,…,t_k}⊆ V of k special vertices called terminals, and a semi-metric D over T. The goal is to assign every non-terminal vertex to a terminal while minimizing the sum over all edges of the weight of the edge multiplied by the distance in D between the terminals to which the endpoints of the edge are assigned. 0-Extension admits two known algorithms, achieving approximations of O(log k) [Călinescu-Karloff-Rabani SICOMP ’05] and O(log k / log log k) [Fakcharoenphol-Harrelson-Rao-Talwar SODA ’03]. Both known algorithms are based on rounding a natural linear programming relaxation called the metric relaxation, in which D is extended from T to all of V. The current best known integrality gap for the metric relaxation is Ω(√(log k)). In this work we present an improved integrality gap of Ω(log^{2/3} k) for the metric relaxation. Our construction is based on the randomized extension of one graph by another, a notion that captures lifts of graphs as a special case and might be of independent interest. Inspired by algebraic topology, our analysis of the gap instance is based on proving that no continuous section (in the topological sense) exists in the randomized extension.
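
For orientation, the metric relaxation referred to above is usually written as the following linear program over semi-metrics δ on V extending D; this is the standard formulation from the cited works, and the exact presentation in the paper may differ.

\begin{aligned}
\min\quad & \sum_{(u,v) \in E} w(u,v)\, \delta(u,v) \\
\text{s.t.}\quad & \delta \text{ is a semi-metric on } V, \\
& \delta(t_i, t_j) = D(t_i, t_j) \quad \text{for all } t_i, t_j \in T .
\end{aligned}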

Optimal inapproximability of satisfiable k-LIN over non-abelian groups

A seminal result of Håstad (2001) shows that it is NP-hard to find an assignment that satisfies 1/|G|+ε fraction of the constraints of a given k-LIN instance over an abelian group, even if there is an assignment that satisfies (1−ε) fraction of the constraints, for any constant ε>0. Engebretsen, Holmerin and Russell (2004) later showed that the same hardness result holds for k-LIN instances over any finite non-abelian group.

Unlike the abelian case, where we can efficiently find a solution if the instance is satisfiable, in the non-abelian case, it is NP-complete to decide if a given system of linear equations is satisfiable or not, as shown by Goldmann and Russell (1999).

Surprisingly, for certain non-abelian groups G, given a satisfiable k-LIN instance over G, one can in fact do better than just outputting a random assignment using a simple but clever algorithm. The approximation factor achieved by this algorithm varies with the underlying group. In this paper, we show that this algorithm is optimal by proving a tight hardness of approximation for satisfiable k-LIN instances over any non-abelian group G, assuming P ≠ NP. As a corollary, we also get 3-query probabilistically checkable proofs with perfect completeness over large alphabets with improved soundness.

Our proof crucially uses the quasi-random properties of the non-abelian groups defined by Gowers (2008).

Playing unique games on certified small-set expanders

We give an algorithm for solving unique games (UG) instances whenever low-degree sum-of-squares proofs certify good bounds on the small-set-expansion of the underlying constraint graph via a hypercontractive inequality. Our algorithm is in fact more versatile, and succeeds even when the constraint graph is not a small-set expander as long as the structure of non-expanding small sets is (informally speaking) “characterized” by a low-degree sum-of-squares proof. Our results are obtained by rounding low-entropy solutions — measured via a new global potential function — to sum-of-squares (SoS) semidefinite programs. This technique adds to the (currently short) list of general tools for analyzing SoS relaxations for worst-case optimization problems.

As corollaries, we obtain the first polynomial-time algorithms for solving any UG instance where the constraint graph is either the noisy hypercube, the short code or the Johnson graph. The prior best algorithm for such instances was the eigenvalue enumeration algorithm of Arora, Barak, and Steurer (2010) which requires quasi-polynomial time for the noisy hypercube and nearly-exponential time for the short code and Johnson graphs. All of our results achieve an approximation of 1−є vs δ for UG instances, where є>0 and δ > 0 depend on the expansion parameters of the graph but are independent of the alphabet size.

Expander random walks: a Fourier-analytic approach

In this work we ask the following basic question: assume the vertices of an expander graph are labelled by 0 and 1. What “test” functions f : {0,1}^t → {0,1} cannot distinguish t independent samples from those obtained by a random walk? The expander hitting property due to Ajtai, Komlós and Szemerédi (STOC 1987) is captured by the AND test function, whereas the fundamental expander Chernoff bound due to Gillman (SICOMP 1998) and Healy (Computational Complexity 2008) is about test functions indicating whether the weight is close to the mean. In fact, it is known that all threshold functions are fooled by a random walk (Kipnis and Varadhan, Communications in Mathematical Physics 1986). Recently, it was shown that even the highly sensitive PARITY function is fooled by a random walk (Ta-Shma, STOC 2017).

We focus on balanced labels. Our first main result is proving that all symmetric functions are fooled by a random walk. Put differently, we prove a central limit theorem (CLT) for expander random walks with respect to the total variation distance, significantly strengthening the classic CLT for Markov chains that is established with respect to the Kolmogorov distance (Kipnis and Varadhan, Communications in Mathematical Physics 1986). Our approach significantly deviates from prior works. We first study how well a Fourier character χ_S is fooled by a random walk as a function of S. Then, given a test function f, we expand f in the Fourier basis and combine the above with known results on the Fourier spectrum of f.

We also proceed further and consider general test functions, not necessarily symmetric ones. As our approach is Fourier-analytic, it is general enough to analyze such versatile test functions. For our second result, we prove that random walks on sufficiently good expander graphs fool test functions computed by AC^0 circuits, read-once branching programs, and functions with bounded query complexity.
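
The distinguishing question can also be played with empirically; the Python sketch below compares the acceptance probability of the PARITY test on t i.i.d. balanced labels with that on t labels read along a random walk, using networkx and a random regular graph merely as a stand-in for an explicit expander (the graph size, degree, walk length, and trial count are arbitrary choices, not parameters from the paper).

import random
import networkx as nx

def parity_bias(n=2000, d=8, t=16, trials=20000, seed=0):
    """Estimate Pr[PARITY = 1] under i.i.d. labels vs. labels along an expander walk."""
    rng = random.Random(seed)
    G = nx.random_regular_graph(d, n, seed=seed)          # stand-in for an explicit expander
    label = {v: i % 2 for i, v in enumerate(G.nodes())}   # balanced 0/1 labelling

    walk_hits = iid_hits = 0
    for _ in range(trials):
        v = rng.randrange(n)                              # stationary (uniform) start vertex
        bits = []
        for _ in range(t):
            bits.append(label[v])
            v = rng.choice(list(G.neighbors(v)))          # one step of the random walk
        walk_hits += sum(bits) % 2
        iid_hits += sum(rng.randrange(2) for _ in range(t)) % 2

    return walk_hits / trials, iid_hits / trials

print(parity_bias())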

Local concentration inequalities and Tomaszewski’s conjecture

We prove Tomaszewski’s conjecture (1986): Let f:{−1,1}^n → ℝ be of the form f(x) = ∑_{i=1}^{n} a_i x_i. Then Pr[|f(x)| ≤ √Var[f]] ≥ 1/2. Our main novel tools are local concentration inequalities and an improved Berry-Esseen inequality for first-degree functions on the discrete cube. These tools are of independent interest, and may be useful in the study of linear threshold functions and of low degree Boolean functions.
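
The inequality is easy to sanity-check numerically: since the x_i are independent and uniform on {−1,1}, Var[f] = ∑_i a_i^2, so a Monte Carlo estimate of Pr[|f(x)| ≤ √Var[f]] can be compared against 1/2. The coefficient vector and sample size in the Python sketch below are arbitrary.

import random

def tomaszewski_estimate(a, trials=200000, seed=0):
    """Estimate Pr[|sum_i a_i x_i| <= sqrt(Var f)] for x uniform in {-1, 1}^n."""
    rng = random.Random(seed)
    sigma = sum(ai * ai for ai in a) ** 0.5     # sqrt(Var[f]) for independent +-1 variables
    hits = 0
    for _ in range(trials):
        f = sum(ai * (1 if rng.random() < 0.5 else -1) for ai in a)
        hits += abs(f) <= sigma
    return hits / trials

# the conjectured (now proven) bound is 1/2; e.g. for an arbitrary coefficient vector:
print(tomaszewski_estimate([3.0, 1.0, 1.0, 1.0, 0.5, 0.5]))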

SESSION: Session 8C

Improving Schroeppel and Shamir’s algorithm for subset sum via orthogonal vectors

We present an O(2^{0.5n}) time and O(2^{0.249999n}) space randomized algorithm for solving worst-case Subset Sum instances with n integers. This is the first improvement over the long-standing O(2^{n/2}) time and O(2^{n/4}) space algorithm due to Schroeppel and Shamir (FOCS 1979).
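
For context, the simpler Horowitz-Sahni meet-in-the-middle algorithm already achieves O(2^{n/2}) time, but with O(2^{n/2}) space; Schroeppel and Shamir match its running time using only O(2^{n/4}) space. The Python sketch below shows the meet-in-the-middle baseline only (not Schroeppel-Shamir and not the new algorithm), with an arbitrary toy instance.

def subset_sum_mitm(nums, target):
    """Horowitz-Sahni meet in the middle: O(2^(n/2)) time and space, up to polynomial factors."""
    half = len(nums) // 2
    left, right = nums[:half], nums[half:]

    def all_subset_sums(arr):
        sums = {0: ()}                     # subset sum -> one witness subset
        for x in arr:
            sums.update({s + x: subset + (x,) for s, subset in list(sums.items())})
        return sums

    left_sums = all_subset_sums(left)
    right_sums = all_subset_sums(right)
    for s, subset in left_sums.items():    # look up the complementary sum on the other side
        if target - s in right_sums:
            return subset + right_sums[target - s]
    return None

print(subset_sum_mitm([3, 34, 4, 12, 5, 2], 9))   # prints one witness subset, e.g. (4, 5)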

We breach this gap in two steps: (1) We present a space efficient reduction to the Orthogonal Vectors Problem (OV), one of the most central problems in Fine-Grained Complexity. The reduction is established via an intricate combination of the method of Schroeppel and Shamir, and the representation technique introduced by Howgrave-Graham and Joux (EUROCRYPT 2010) for designing Subset Sum algorithms for the average case regime. (2) We provide an algorithm for OV that detects an orthogonal pair among N given vectors in {0,1}^d with support size d/4 in time Õ(N · 2^d / (d choose d/4)). Our algorithm for OV is based on and refines the representative families framework developed by Fomin, Lokshtanov, Panolan and Saurabh (J. ACM 2016).

Our reduction uncovers a curious tight relation between Subset Sum and OV, because any improvement of our algorithm for OV would imply an improvement over the runtime of Schroeppel and Shamir, which is also a long standing open problem.

Settling SETH vs. approximate sparse directed unweighted diameter (up to (NU)NSETH)

We prove several tight results on the fine-grained complexity of approximating the diameter of a graph. First, we prove that, for any ε>0, assuming the Strong Exponential Time Hypothesis (SETH), there are no near-linear time (2−ε)-approximation algorithms for the Diameter of a sparse directed graph, even in unweighted graphs. This result shows that a simple near-linear time 2-approximation algorithm for Diameter is optimal under SETH, answering a question from a survey of Rubinstein and Vassilevska-Williams (SIGACT ’19) for the case of directed graphs.

In the same survey, Rubinstein and Vassilevska-Williams also asked if it is possible to show that there are no (2−ε)-approximation algorithms for Diameter in a directed graph in O(n^{1.499}) time. We show that, assuming a hypothesis called NSETH, one cannot use a deterministic SETH-based reduction to rule out the existence of such algorithms.

Extending the techniques in these two results, we characterize whether a (2−ε)-approximation algorithm running in time O(n^{1+δ}) for the Diameter of a sparse directed unweighted graph can be ruled out by a deterministic SETH-based reduction for every δ∈(0,1) and essentially every ε∈(0,1), assuming NSETH. This settles the SETH-hardness of approximating the diameter of sparse directed unweighted graphs for deterministic reductions, up to NSETH. We make the same characterization for randomized SETH-based reductions, assuming another hypothesis called NUNSETH.

We prove additional hardness and non-reducibility results for undirected graphs.

Tight conditional lower bounds for approximating diameter in directed graphs

Among the most fundamental graph parameters is the Diameter, the largest distance between any pair of vertices in a graph. Computing the Diameter of a graph with m edges requires m^{2−o(1)} time under the Strong Exponential Time Hypothesis (SETH), which can be prohibitive for very large graphs, so efficient approximation algorithms for Diameter are desired.

There is a folklore algorithm that gives a 2-approximation for Diameter in Õ(m) time (where Õ notation suppresses logarithmic factors). Additionally, a line of work [SODA’96, STOC’13, SODA’14] concludes with a 3/2-approximation algorithm for Diameter in weighted directed graphs that runs in Õ(m^{3/2}) time. For directed graphs, these are the only known approximation algorithms for Diameter.
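
The folklore 2-approximation mentioned above amounts to two breadth-first searches: for any vertex v, the triangle inequality gives D ≤ ecc_in(v) + ecc_out(v) ≤ 2·max(ecc_in(v), ecc_out(v)), while both eccentricities are themselves at most D. A minimal Python sketch for unweighted digraphs, assuming finite diameter (i.e. strong connectivity); the example graph is arbitrary.

from collections import deque

def bfs_ecc(adj, src):
    """Eccentricity of src, i.e. max over u of dist(src, u), in an unweighted digraph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

def diameter_2_approx(adj):
    """Folklore estimate: max(out-ecc(v), in-ecc(v)) lies in [D/2, D]."""
    rev = {u: [] for u in adj}
    for u in adj:
        for w in adj[u]:
            rev[w].append(u)                 # reverse all edges to get in-eccentricities
    v = next(iter(adj))                      # any vertex works
    return max(bfs_ecc(adj, v), bfs_ecc(rev, v))

# directed 4-cycle: the true diameter is 3, and so is the estimate here
print(diameter_2_approx({0: [1], 1: [2], 2: [3], 3: [0]}))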

The 3/2-approximation algorithm is known to be tight under SETH: Roditty and Vassilevska W. [STOC’13] proved that under SETH any (3/2−ε)-approximation algorithm for Diameter in undirected unweighted graphs requires m^{2−o(1)} time, and then Backurs, Roditty, Segal, Vassilevska W., and Wein [STOC’18] and the follow-up work of Li proved that under SETH any (5/3−ε)-approximation algorithm for Diameter in undirected unweighted graphs requires m^{3/2−o(1)} time.

Whether or not the folklore 2-approximation algorithm is tight, however, is unknown, and has been explicitly posed as an open problem in numerous papers. Towards this question, Bonnet recently proved that under SETH, any (7/4−ε)-approximation requires m^{4/3−o(1)} time, but only for directed weighted graphs.

We completely resolve this question for directed graphs by proving that the folklore 2-approximation algorithm is conditionally optimal. In doing so, we obtain a series of conditional lower bounds that, together with prior work, give a complete time-accuracy trade-off that is tight with the three known algorithms for directed graphs. Specifically, we prove that under SETH, for any integer k ≥ 2 and any δ>0, a ((2k−1)/k − δ)-approximation algorithm for Diameter on directed unweighted graphs requires m^{k/(k−1)−o(1)} time.

Sparse nonnegative convolution is equivalent to dense nonnegative convolution

Computing the convolution A ⋆ B of two length-n vectors A,B is a ubiquitous computational primitive, with applications in a variety of disciplines. Within theoretical computer science, applications range from string problems to Knapsack-type problems, and from 3SUM to All-Pairs Shortest Paths. These applications often come in the form of nonnegative convolution, where the entries of A,B are nonnegative integers. The classical algorithm to compute A ⋆ B uses the Fast Fourier Transform (FFT) and runs in time O(n log n).
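
For reference, the dense FFT baseline is only a few lines with numpy; the final rounding step is exact for nonnegative integer inputs as long as the floating-point error stays below 1/2 (this is a generic illustration, not code from the paper).

import numpy as np

def dense_nonneg_convolution(A, B):
    """Convolution of two nonnegative integer vectors via FFT, in O(n log n) time."""
    n = len(A) + len(B) - 1
    size = 1 << (n - 1).bit_length()               # next power of two for the FFT length
    fa = np.fft.rfft(np.asarray(A, dtype=float), size)
    fb = np.fft.rfft(np.asarray(B, dtype=float), size)
    conv = np.fft.irfft(fa * fb, size)[:n]
    return np.rint(conv).astype(np.int64)          # entries are integers, so round them back

print(dense_nonneg_convolution([1, 2, 3], [4, 5]))   # [ 4 13 22 15]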

However, in many cases A and B might satisfy sparsity conditions, and hence one could hope for significant gains compared to the standard FFT algorithm. The ideal goal would be an O(k log k)-time algorithm, where k is the number of non-zero elements in the output, i.e., the size of the support of A ⋆ B. This problem is referred to as sparse nonnegative convolution, and has received a considerable amount of attention in the literature; the fastest algorithms to date run in time O(k log^2 n).

The main result of this paper is the first O(k log k)-time algorithm for sparse nonnegative convolution. Our algorithm is randomized and assumes that the length n and the largest entry of A and B are subexponential in k. Surprisingly, we can phrase our algorithm as a reduction from the sparse case to the dense case of nonnegative convolution, showing that, under some mild assumptions, sparse nonnegative convolution is equivalent to dense nonnegative convolution for constant-error randomized algorithms. Specifically, if D(n) is the time to convolve two nonnegative length-n vectors with success probability 2/3, and S(k) is the time to convolve two nonnegative vectors with output size k with success probability 2/3, then S(k) = O(D(k) + k (log log k)^2).

Our approach uses a variety of new techniques in combination with some old machinery from linear sketching and structured linear algebra, as well as new insights on linear hashing, the most classical hash function.

Subcubic algorithms for Gomory–Hu tree in unweighted graphs

Every undirected graph G has a (weighted) cut-equivalent tree T, commonly named after Gomory and Hu who discovered it in 1961. Both T and G have the same node set, and for every node pair s,t, the minimum (s,t)-cut in T is also an exact minimum (s,t)-cut in G.

We give the first subcubic-time algorithm that constructs such a tree for a simple graph G (unweighted with no parallel edges). Its time complexity is Õ(n^{2.5}), for n=|V(G)|; previously, only Õ(n^3) was known, except for restricted cases like sparse graphs. Consequently, we obtain the first algorithm for All-Pairs Max-Flow in simple graphs that breaks the cubic-time barrier.

Gomory and Hu compute this tree using n−1 queries to (single-pair) Max-Flow; the new algorithm can be viewed as a fine-grained reduction to Õ(√n) Max-Flow computations on n-node graphs.
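
As a concrete illustration of the object being computed (not of the new subcubic algorithm), networkx ships the classical construction via n−1 max-flow calls, and every pairwise min-cut value can then be read off the tree as the minimum edge weight on the corresponding tree path; the small weighted graph below is an arbitrary example.

import networkx as nx

# small weighted undirected graph; 'capacity' is the attribute networkx's flow routines expect
G = nx.Graph()
G.add_edge(0, 1, capacity=3)
G.add_edge(1, 2, capacity=2)
G.add_edge(2, 0, capacity=1)
G.add_edge(2, 3, capacity=4)

T = nx.gomory_hu_tree(G)                 # classical construction using n-1 max-flow computations

def min_cut_value(tree, s, t):
    """Min (s,t)-cut value in G = minimum edge weight on the s-t path in the Gomory-Hu tree."""
    path = nx.shortest_path(tree, s, t)
    return min(tree[u][v]["weight"] for u, v in zip(path, path[1:]))

print(min_cut_value(T, 0, 3))            # value of a minimum cut separating 0 from 3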

Approximate Gomory–Hu tree is faster than n – 1 max-flows

The Gomory-Hu tree or cut tree (Gomory and Hu, 1961) is a classic data structure for reporting st mincuts (and by duality, the values of st maxflows) for all pairs of vertices s and t in an undirected graph. Gomory and Hu showed that it can be computed using n−1 exact maxflow computations. Surprisingly, this remains the best algorithm for Gomory-Hu trees more than 50 years later, even for approximate mincuts. In this paper, we break this longstanding barrier and give an algorithm for computing a (1+є)-approximate Gomory-Hu tree using log(n) maxflow computations. Specifically, we obtain the runtime bounds we describe below.

We obtain a randomized (Monte Carlo) algorithm for undirected, weighted graphs that runs in Õ(m + n^{3/2}) time and returns a (1+є)-approximate Gomory-Hu tree w.h.p. Previously, the best running time known was Õ(n^{5/2}), which is obtained by running Gomory and Hu’s original algorithm on a cut sparsifier of the graph.

Next, we obtain a randomized (Monte Carlo) algorithm for undirected, unweighted graphs that runs in m^{4/3+o(1)} time and returns a (1+є)-approximate Gomory-Hu tree w.h.p. This improves on our first result for sparse graphs, namely when m = o(n^{9/8}). Previously, the best running time known for unweighted graphs was Õ(mn) for an exact Gomory-Hu tree (Bhalgat et al., STOC 2007); no better result was known if approximations are allowed.

As a consequence of our Gomory-Hu tree algorithms, we also solve the (1+є)-approximate all pairs mincut and single source mincut problems in the same time bounds. (These problems are simpler in that the goal is to only return the st mincut values, and not the mincuts.) This improves on the recent algorithm for these problems in Õ(n^2) time due to Abboud et al. (FOCS 2020).

SESSION: Session 9C

Constant approximating k-clique is W[1]-hard

For every graph G, let ω(G) be the largest size of a complete subgraph in G. This paper presents a simple algorithm which, on input a graph G, a positive integer k and a small constant є>0, outputs a graph G′ and an integer k′ in 2^{Θ(k^5)} · |G|^{O(1)} time such that (1) k′ ≤ 2^{Θ(k^5)}, (2) if ω(G) ≥ k, then ω(G′) ≥ k′, and (3) if ω(G) < k, then ω(G′) < (1−є)k′. This implies that no f(k) · |G|^{O(1)}-time algorithm can distinguish between the cases ω(G) ≥ k and ω(G) < k/c for any constant c ≥ 1 and computable function f, unless FPT = W[1].

Vertex deletion parameterized by elimination distance and even less

We study the parameterized complexity of various classic vertex-deletion problems such as Odd cycle transversal, Vertex planarization, and Chordal vertex deletion under hybrid parameterizations. Existing FPT algorithms for these problems either focus on the parameterization by solution size, detecting solutions of size k in time f(k) · n^{O(1)}, or width parameterizations, finding arbitrarily large optimal solutions in time f(w) · n^{O(1)} for some width measure w like treewidth. We unify these lines of research by presenting FPT algorithms for parameterizations that can simultaneously be arbitrarily much smaller than the solution size and the treewidth.

The first class of parameterizations is based on the notion of elimination distance of the input graph to the target graph class, which intuitively measures the number of rounds needed to obtain a graph in the target class by removing one vertex from each connected component in each round. The second class of parameterizations consists of a relaxation of the notion of treewidth, allowing arbitrarily large bags that induce subgraphs belonging to the target class of the deletion problem as long as these subgraphs have small neighborhoods. Both kinds of parameterizations have been introduced recently and have already spawned several independent results.

Our contribution is twofold. First, we present a framework for computing approximately optimal decompositions related to these graph measures. Namely, if the cost of an optimal decomposition is k, we show how to find a decomposition of cost k^{O(1)} in time f(k) · n^{O(1)}. This is applicable to any target class for which we can solve the so-called separation problem. Second, we exploit the constructed decompositions for solving vertex-deletion problems by extending ideas from algorithms using iterative compression and the finite state property. For the three mentioned vertex-deletion problems, and all problems which can be formulated as hitting a finite set of connected forbidden (a) minors or (b) (induced) subgraphs, we obtain FPT algorithms with respect to both studied parameterizations. For example, we present an algorithm running in time n^{O(1)} + 2^{k^{O(1)}} · (n+m) and polynomial space for Odd cycle transversal parameterized by the elimination distance k to the class of bipartite graphs.

A full complexity dichotomy for immanant families

Given an integer n ≥ 1 and an irreducible character χ_λ of S_n for some partition λ of n, the immanant imm_λ : ℂ^{n×n} → ℂ maps matrices A ∈ ℂ^{n×n} to imm_λ(A) = ∑_{π ∈ S_n} χ_λ(π) ∏_{i=1}^{n} A_{i,π(i)}. Important special cases include the determinant and permanent, which are obtained from the sign and trivial character, respectively.
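
From this definition, the two special cases can be evaluated directly by brute force over S_n; the Python sketch below is exponential-time and is only meant to make the formula concrete (the helper names are ours).

from itertools import permutations

def immanant(A, character):
    """imm_lambda(A) = sum over permutations pi of character(pi) * prod_i A[i][pi(i)]."""
    n = len(A)
    total = 0
    for pi in permutations(range(n)):
        prod = 1
        for i in range(n):
            prod *= A[i][pi[i]]
        total += character(pi) * prod
    return total

def sign(pi):
    """Sign character: (-1) to the number of inversions of pi."""
    inv = sum(1 for i in range(len(pi)) for j in range(i + 1, len(pi)) if pi[i] > pi[j])
    return -1 if inv % 2 else 1

A = [[1, 2], [3, 4]]
print(immanant(A, sign))             # determinant: 1*4 - 2*3 = -2
print(immanant(A, lambda pi: 1))     # permanent:   1*4 + 2*3 = 10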

It is known that immanants can be evaluated in polynomial time for characters that are “close” to the sign character: Given a partition λ of n with s parts, let b(λ) := n − s count the boxes to the right of the first column in the Young diagram of λ. For a family of partitions Λ, let b(Λ) := max_{λ∈Λ} b(λ) and write Imm(Λ) for the problem of evaluating imm_λ(A) on input A and λ∈Λ. On the positive side, if b(Λ)<∞, then Imm(Λ) is known to be polynomial-time computable. This subsumes the case of the determinant. Conversely, if b(Λ)=∞, then previously known hardness results suggest that Imm(Λ) cannot be solved in polynomial time. However, these results only address certain restricted classes of families Λ.

In this paper, we show that the assumption FPT ≠ #W[1] from parameterized complexity rules out polynomial-time algorithms for Imm(Λ) for any computationally reasonable family of partitions Λ with b(Λ)=∞. We give an analogous result in algebraic complexity under the assumption VFPT ≠ VW[1]. Furthermore, if b(λ) even grows polynomially in Λ, we show that Imm(Λ) is hard for #P and VNP. This concludes a series of partial results on the complexity of immanants obtained over the last 35 years.

A nearly-linear time algorithm for linear programs with small treewidth: a multiscale representation of robust central path

Arising from structural graph theory, treewidth has become a focus of study in fixed-parameter tractable algorithms. Many NP-hard problems are known to be solvable in O(n · 2^{O(τ)}) time, where τ is the treewidth of the input graph. Analogously, many problems in P should be solvable in O(n · τ^{O(1)}) time; however, due to the lack of appropriate tools, only a few such results are currently known. In our paper, we show this holds for linear programs: Given a linear program of the form min_{Ax=b, ℓ ≤ x ≤ u} c^⊤x whose dual graph G_A has treewidth τ, and a corresponding width-τ tree decomposition, we show how to solve it in time

O(n · τ^2 log(1/ε)),

where n is the number of variables and ε is the relative accuracy. When a tree decomposition is not given, we use existing techniques in vertex separators to obtain algorithms with O(n · τ^4 log(1/ε)) and O(n · τ^2 log(1/ε) + n^{1.5}) run-times. Besides being the first of its kind, our algorithm has run-time nearly matching the fastest run-time for solving the sub-problem Ax=b (under the assumption that no fast matrix multiplication is used). We obtain these results by combining recent techniques in interior-point methods (IPMs), sketching, and a novel representation of the solution under a multiscale basis similar to the wavelet basis. This representation further yields the first IPM with o(rank(A)) time per iteration when the treewidth is small.
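
The LP shape above, minimizing c^⊤x subject to Ax = b and ℓ ≤ x ≤ u, is exactly the standard form accepted by off-the-shelf solvers. Purely as an illustration of that input format (and not of the treewidth-exploiting interior-point method of the paper), a toy instance can be solved with scipy as follows.

import numpy as np
from scipy.optimize import linprog

# minimize c^T x subject to A x = b and l <= x <= u (a tiny toy instance)
c = np.array([1.0, 2.0, 0.0])
A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
lower = np.zeros(3)
upper = np.ones(3)

res = linprog(c, A_eq=A, b_eq=b, bounds=list(zip(lower, upper)), method="highs")
print(res.x, res.fun)    # an optimal point and its objective value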