This page contains proposed challenges for theoretical computer science to be discussed at the upcoming Workshop on Challenges for Theoretical Computer Science and/or included in the resulting report. If you would like to submit a proposed challenge, send a paragraph-length description (html, raw text, latex source) to David Johnson (dsj@research.att.com).

For each proposal, we list the proposer and the date that the proposed challenge was added to the website (to help repeat visitors identify what is new since the last time they visited).

Note that these draft challenges are presented at various levels of informality, with possible overlaps and imperfect classification. They also do not yet cover the full range of topics that will be discussed at the workshop on May 20. Between now and then there will be daily updates. As the contents increase, we will begin organizing the material into separate web pages based on area, with a separate page for the most recent additions.

After the workshop we will begin the process of refining the list, improving the formulations, and filling in the gaps.

**Challenge: A Theory of Local Search and Metaheuristics**

David Johnson - 11 May 2000

Many real-world optimization problems are not only NP-hard but also hard-to-approximate. Thus practitioners for the most part seem to be turning to variants on local search, either straight local optimization or one of its "meta-heuristic" variants such as simulated annealing, genetic algorithms, go-with-the-winners, GRASP, tabu search, neural nets, etc. Currently the only way to tell whether an algorithm of this sort will be effective for a given problem is to implement and run it. We need a theory that will address both the design of search neighborhoods and the effectiveness of various search strategies. Initial work has given insight in narrow special cases, but no useful general theory has yet been developed. We need one.
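As a concrete instance of the local-search heuristics mentioned above, here is a minimal simulated-annealing sketch for number partitioning (an NP-hard problem). The neighborhood, cooling schedule, and parameters are illustrative, not tuned:

```python
import math
import random

def anneal_partition(nums, steps=20000, t0=10.0, seed=0):
    """Simulated annealing for number partitioning: choose signs so the
    signed sum of nums is as close to 0 as possible. The neighborhood is
    'flip one sign'; the cooling schedule is a simple linear ramp. t0
    should be on the scale of typical cost changes."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in nums]
    diff = sum(s * x for s, x in zip(signs, nums))   # current signed imbalance
    best = abs(diff)
    for step in range(steps):
        t = t0 * (1.0 - step / steps) + 1e-9         # temperature: linear cooling
        i = rng.randrange(len(nums))
        new_diff = diff - 2 * signs[i] * nums[i]     # imbalance after flipping item i
        delta = abs(new_diff) - abs(diff)
        # Accept improving moves always; worsening moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            signs[i] = -signs[i]
            diff = new_diff
            best = min(best, abs(diff))
    return best

# Partition 50 random integers; the residual imbalance is typically far
# smaller than the total sum.
data_rng = random.Random(1)
nums = [data_rng.randrange(1, 10**6) for _ in range(50)]
print(anneal_partition(nums, t0=1e5))
```

Whether (and why) such schemes beat plain local optimization on a given problem is exactly the question the challenge raises.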

**Challenge: Use Finance Ideas to Solve Computer Science Problems**

Ming Kao - 18 May 2000

A number of researchers have proposed to use auction mechanisms to solve optimization problems. I have seen at least one paper that uses the notion of call options to evaluate software design choices.

**Challenge: Exploiting Stability/Robustness**

Naftali Tishby - 12 May 2000

Some hard computational problems, such as data clustering, seem "hard only when they are not interesting". In other words, when there is a "stable/useful" solution, or when the solution is robust in some sense, it is often not too hard to find or approximate, and many "heuristics" or "meta-heuristics" do it very well. This clearly indicates a fundamental theoretical link between stability/robustness of the solution and hardness. What is this link?

**Challenge: Executing Queries on Large-Scale Data**

Jeff Ullman - 12 May 2000

Invent new ways of executing queries on large-scale data, using the secondary-storage model of cost. Especially, consider complex queries involving grouping and aggregation and multiway joins.

**Challenge: Constructing Query Plans**

Jeff Ullman - 12 May 2000

Find better ways of searching the exponential space of query plans in order to find adequate plans in subexponential time. Again, the interesting cases are when the complex operations mentioned above are present, and when the availability of materialized views allows substitutions of views for certain portions of the query.

Background can be found in Garcia-Molina, Ullman, and Widom, "Database System Implementation," Prentice Hall, 1999: Ch. 2 for the model, Ch. 6 for query execution, and Ch. 7 for query optimization.
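To make the search space concrete, here is a sketch of Selinger-style dynamic programming over join orders. The cost model (sum of intermediate result sizes, standing in for a real secondary-storage cost model) and all relation names, cardinalities, and selectivities are illustrative:

```python
from itertools import combinations

def best_join_order(rels, card, sel):
    """Dynamic programming over subsets of relations (Selinger-style).
    rels: relation names; card[r]: row count of r; sel[(a, b)]: join
    selectivity between a and b (1.0 if no predicate). Cost: sum of
    intermediate result sizes. Runs in O(2^n * n) subproblems."""
    def size(subset):
        rows = 1.0
        for r in subset:
            rows *= card[r]
        for a, b in combinations(sorted(subset), 2):
            rows *= sel.get((a, b), sel.get((b, a), 1.0))
        return rows

    # best[S] = (cheapest cost to join the set S, a join order achieving it)
    best = {frozenset([r]): (0.0, (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            for r in subset:
                rest = subset - {r}
                cost = best[rest][0] + size(subset)   # pay for the new intermediate
                if subset not in best or cost < best[subset][0]:
                    best[subset] = (cost, best[rest][1] + (r,))
    return best[frozenset(rels)]

cost, order = best_join_order(
    ["R", "S", "T"],
    {"R": 1000, "S": 100, "T": 10},
    {("R", "S"): 0.01, ("S", "T"): 0.05},
)
print(cost, order)
```

Even this classical algorithm is exponential in the number of relations, which is why the challenge asks for subexponential ways to find merely adequate plans.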

**Challenge: Efficient Modelling of Web Pages**

Jeff Ullman - 12 May 2000

Develop tools for describing the structure of Web pages and other information sources and efficiently integrating them into a relational database or other regular structure. Some interesting approaches include the way Sergey Brin mined the Web for book-author facts, in

http://www-db.stanford.edu/~sergey/extract.ps

and the technology of Levy, Mendelzon, Sagiv, and Srivastava ("answering queries using views"), Duschka, and others for using Datalog to build integrated products. These references, others, and an overview of the subject can be found through

http://www-db.stanford.edu/~ullman/cs345-notes.html

**Challenge: Algorithmic Models for Modern Processors**

Uzi Vishkin - 19 May 2000

Recent processors harness more and more hardware to improve performance in a variety of ways, including instruction-level parallelism. The VTune package, http://developer.intel.com/vtune, allows performance tuning of application programs for each of Intel's recent processors, and AMD has similar tools.

(i) Evaluate how well such processor-specific tuning fits the algorithmic models we use, or could use.

(ii) Could PRAM algorithmics, or another algorithmic approach, lead toward a research agenda for new computer systems in which the responsibility of the programmer could be cleanly stated? Proceed to validate whatever you propose through modeling and various modes of quantitative analysis.

**Challenge: Computational Problems from Biology**

Naftali Tishby - 12 May 2000

Modern biology poses several important computational problems of a "cryptographic nature". These include the problem of neural coding, the *true* reading of the genetic code (finding the phenotype from the genes), the connection between the sequence and the folding of proteins, etc. Can TCS contribute to understanding or solving these problems?

**Challenge: The Power of Quantum Computers**

Peter Shor - 10 May 2000

What can be computed in polynomial time by a quantum computer? In particular, can one solve NP-hard problems?

**Challenge: Using Small Quantum Computers**

Peter Shor - 10 May 2000

It is likely that if quantum computers can be built, the first ones will be limited to a small number of qubits. Determine interesting problems that can be solved with such small (e.g., 100 qubit) quantum computers.

**Challenge: Complexity Theory for Phase Transitions**

Naftali Tishby - 12 May 2000

One thing we learn from the statistical mechanics of several hard combinatorial optimization/decision problems (e.g., K-SAT, learning in binary perceptrons, clustering) is that there are *phases* (regions in the parameter space of the problem) where the problem seems to be hard on average, and other phases where the problem seems easy. What is the corresponding computational complexity theory behind this important phenomenon?
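The phenomenon can be observed directly in small experiments. The sketch below estimates the probability that a random 3-SAT formula is satisfiable at clause-to-variable ratios below and above the empirically observed threshold (around 4.27); the brute-force check stands in for a real SAT solver and is feasible only for tiny instances:

```python
import itertools
import random

def random_3sat(n_vars, n_clauses, rng):
    """Random 3-SAT: each clause has 3 distinct variables with random signs."""
    return [[(v, rng.choice((True, False))) for v in rng.sample(range(n_vars), 3)]
            for _ in range(n_clauses)]

def satisfiable(n_vars, clauses):
    """Brute force over all 2^n_vars assignments -- feasible only for tiny
    instances; a real SAT solver would be used at scale."""
    return any(
        all(any(bits[v] == sign for v, sign in clause) for clause in clauses)
        for bits in itertools.product((True, False), repeat=n_vars)
    )

def sat_fraction(n, ratio, trials, rng):
    """Fraction of random instances at the given clause/variable ratio
    that are satisfiable."""
    return sum(satisfiable(n, random_3sat(n, int(ratio * n), rng))
               for _ in range(trials)) / trials

rng = random.Random(0)
below = sat_fraction(12, 2.0, 20, rng)   # well below the ~4.27 threshold
above = sat_fraction(12, 6.0, 20, rng)   # well above it
print(below, above)
```

The sharp drop in the satisfiable fraction as the ratio crosses the threshold is the "phase" structure the challenge asks complexity theory to explain.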

**Challenge: Complexity Theory for Physical Systems**

Naftali Tishby - 12 May 2000

Many physical systems, even classical, have behind them computational theories that seem to be hard in the TCS sense. Examples include general relativity and other classical field theories, turbulence and classical chaos, statistical mechanics of disordered systems, etc.

What does TCS tell the physicists about the fundamental computational limitations of their "fancy" theories? How is this related to the computational "phases" issue?

**Challenge: Consensus, Voting, and Elections**

Fred Roberts - 12 May 2000

Problems of group consensus, voting, and elections increasingly involve daunting computational issues. Concepts of TCS such as computational complexity are not new to voting: there is already a whole body of results showing the computational intractability of finding a group consensus under certain voting rules.

Alternative election procedures such as approval voting carry with them the need for efficient computational methods for calculating the solution.

Related problems involve measures of "power" such as the Shapley and Banzhaf indices. These power indices have to be calculated for huge voting "games" and the challenge is to develop new methods for calculating them, at least approximately.
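Computing Banzhaf indices exactly is #P-hard, which is one reason approximation matters. A simple Monte Carlo estimator (the example game and sample count are illustrative):

```python
import random

def banzhaf_mc(weights, quota, samples=20000, seed=0):
    """Monte Carlo estimate of Banzhaf power indices for a weighted voting
    game: player i's index is the probability that i is *critical* in a
    uniformly random coalition (the coalition wins with i, loses without)."""
    rng = random.Random(seed)
    n = len(weights)
    critical = [0] * n
    for _ in range(samples):
        member = [rng.random() < 0.5 for _ in range(n)]   # random coalition
        total = sum(w for w, m in zip(weights, member) if m)
        for i in range(n):
            with_i = total + (0 if member[i] else weights[i])
            if with_i >= quota and with_i - weights[i] < quota:
                critical[i] += 1
    return [c / samples for c in critical]

# Classic example: [50, 49, 1] with quota 51 -- the 1-vote player turns out
# to be exactly as powerful as the 49-vote player.
print(banzhaf_mc([50, 49, 1], 51))
```

The challenge is to do far better than naive sampling for games with huge numbers of players, with provable accuracy guarantees.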

**Challenge: Political Districting**

Fred Roberts - 12 May 2000

There are many different criteria involved in finding a "fair" political districting. Are the districts approximately equal in size? Homogeneous? Geographically coherent? Do they keep natural regions together?

Finding solutions that even approximate optimality on a few of these criteria turns out to be a complex computational problem. Recent developments in large-scale discrete optimization help both theoretically and algorithmically, and some computational experiments indicate that there are promising heuristics for solving these problems. However, the problems are only now coming to be defined precisely enough for an approach by methods of TCS to be useful.

**Challenge: Modelling Conflict/Competition/Coalition**

Fred Roberts - 12 May 2000

Game Theory was developed to model conflict/competition/coalition behavior. Today's business and military operations commonly involve coalitions. The coalition partners are sometimes also competitors. Information sharing is needed, but vital interests of partners must be protected.

Game theory has traditionally provided a framework for analyzing competitive decision making, but major new developments are needed to handle these new competitive realities. Challenges for TCS include modeling security and interoperability in the conflict/competition context of game theory.

**Challenge: Distributed Decision Making**

Fred Roberts - 12 May 2000

Today, decision making by the corporate, government, or military decision maker requires flawless, efficient coordination among many different "players". Such decision making is increasingly dominated by the interconnection of different computing devices, global communications and information sharing, and the interconnection of multiple networks and software systems.

Communication and coordination among agents has been studied in the framework of distributed computing. This work addresses reliability, synchronization, and communication complexity, but often involves such unrealistic assumptions as essentially infinite computing power by agents. Procedures for merging or aggregating information have been widely studied by the decision science community, but without emphasis on the reliability of communication or the computational complexity of the protocol.

The computational characteristics of the agents, especially when they include sensors, are extremely relevant, as is the structure of the communication network. We will need to combine algorithmic and decision science approaches, simultaneously taking into account requirements for good group decisions, constraints on the computational power of the agents, and the structure of the communication network.

**Challenge: Auctions and Bidding**

Fred Roberts - 12 May 2000

Increasingly, economists are turning to models that involve individual "actors" who each have partial information or where economic decisions have to be made in the face of computational limits or costly computation. New foundational models are needed to develop these ideas.

Auctions are a prime example where these new models are needed. Auctions and bidding have enormous impact (FCC auctions to telecommunications companies for parts of the radio spectrum, oil leases, etc.). Theoretical approaches to auctions and bidding are already heavily mathematical. New models taking account of partial information or limits on computation would be extremely important.

**Challenge: Randomness of Stock Prices**

Ming Kao - 18 May 2000

How randomly do stock prices move? Is it possible that stock prices are deterministic to a significant degree if we may process them in exponential time? One possible approach: construct communities formed by very simple agents that nevertheless exhibit NP-hard collective behavior.

**Challenge: Risk Management and Derivative Pricing for Very Large Portfolios**

Ming Kao - 18 May 2000

The statistical behavior of a single stock is considered simple. Consequently, it is relatively easy to price options written on a single stock. However, if a portfolio has a large number of stocks, then to price options on the portfolio, we have to deal with high-dimensional computation. For instance, pricing a call option on n stocks would involve n binomial trees, each accounting for one dimension. How are the time complexity and accuracy affected by the dimension? The same questions can be asked about Risk Management. Source: Peter Carr at Bank of America.
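For reference, the one-dimensional case is indeed easy: a Cox-Ross-Rubinstein binomial tree prices a European call on a single stock in time quadratic in the number of steps, whereas a naive extension to a portfolio of n stocks multiplies the state space by a factor of the step count per dimension. A minimal sketch (parameters illustrative):

```python
import math

def crr_call(s0, strike, r, sigma, t, steps=200):
    """Cox-Ross-Rubinstein binomial-tree price of a European call on a
    single stock -- the one-dimensional case the passage calls simple.
    s0: spot price, r: risk-free rate, sigma: volatility, t: maturity."""
    dt = t / steps
    u = math.exp(sigma * math.sqrt(dt))       # up factor per step
    d = 1.0 / u                               # down factor per step
    p = (math.exp(r * dt) - d) / (u - d)      # risk-neutral up probability
    disc = math.exp(-r * dt)                  # per-step discount factor
    # Terminal payoffs at each of the steps+1 leaves, then backward induction.
    values = [max(s0 * u**j * d**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]
    for _ in range(steps):
        values = [disc * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

# Converges to the Black-Scholes value (about 10.45 for these parameters).
print(crr_call(s0=100, strike=100, r=0.05, sigma=0.2, t=1.0))
```

The challenge is precisely how time complexity and accuracy degrade when one tree per stock must be combined across many dimensions.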

**Challenge: Mechanism Design**

Joan Feigenbaum - 10 May 2000

Develop a computational complexity theory of mechanism design, especially distributed mechanisms. Find the right computational model(s), the right resource(s) to measure, the right notion(s) of reduction, canonical hard problem(s), etc.

References (to preliminary work along these lines):

- Nisan-Ronen, "Algorithmic Mechanism Design," STOC 1999.
- Feigenbaum-Papadimitriou-Shenker, "Sharing the Cost of Multicast Transmissions," STOC 2000.
- Nisan-Ronen, "Computationally Feasible VCG Mechanisms," Games 2000 (to appear).

**Challenge: Massive Data Set Algorithmics with Data Faults**

Joan Feigenbaum - 10 May 2000

[This challenge originally came from Sergey Brin and Monika Henzinger of Google.com.]

Searching and other web applications lead to data sets that are massive enough to make memory faults a problem, even with the extremely reliable components that we have today. Develop sorting algorithms and, more generally, a basic algorithmic toolkit that are robust in the presence of memory faults.

Example: Merge Sort. If you write a value v and later read a value v', you can wind up with two subarrays that should have been merged but weren't (because it is assumed [based on the erroneous read] that the least element in one exceeds the greatest element in the other). This is not a hypothetical problem: It occurs in Google's day-to-day operations.

Notes: (1) A robust algorithm need not produce a perfectly sorted sequence. If the value on which the memory fault occurred winds up out of order, that's ok; the goal is that it not cause a large subset to be out of order. More generally, the result just has to be "close to sorted," by some reasonable definition of "close." (2) The number of faults will be small in comparison to the length n of the sequence. An algorithm would be considered "robust" if it produced an "almost sorted" sequence when faced with O(log n) memory faults or perhaps even O(log log n).
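The failure mode in the merge-sort example is easy to reproduce: a single corrupted read during a merge can dislocate a constant fraction of the output. A small simulation (the fault model, corrupting one read upward by a large value, is illustrative):

```python
def merge(a, b, corrupt_at=None):
    """Textbook merge of two sorted runs. If corrupt_at is set, every read
    of a[corrupt_at] is corrupted upward by a large value, modeling the
    memory fault described above: the comparison goes the wrong way and
    the rest of b is emitted before a's remainder."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        x = a[i] + (10**9 if i == corrupt_at else 0)   # possibly faulty read
        if x <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    return out + a[i:] + b[j:]

def dislocated(seq):
    """Number of positions whose value differs from the correctly sorted order."""
    return sum(s != g for s, g in zip(seq, sorted(seq)))

a = list(range(0, 100, 2))   # sorted evens 0..98
b = list(range(1, 100, 2))   # sorted odds 1..99
print(dislocated(merge(a, b)))                  # fault-free merge: 0 dislocations
print(dislocated(merge(a, b, corrupt_at=10)))   # one fault: most of the output misplaced
```

A "robust" algorithm in the sense of the notes would confine the damage of each fault to O(1) misplaced elements rather than a large suffix.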

**Challenge: One-Pass Massive Data Set Algorithmics**

Joan Feigenbaum - 10 May 2000

Massive data sets are increasingly important in a wide range of applications, including observational sciences, product marketing, and monitoring and operations of large systems. In network operations, raw data typically arrive in "streams," and decisions must be made by algorithms that make one pass over each stream, throw much of the raw data away, and produce small data structures for further processing. Moreover, network-generated massive data streams are often "distributed": Several different, physically separated network elements may receive or generate data streams that, together, comprise one logical data set; to be of use in operations, the streams must be analyzed locally and their synopses sent to a central operations facility. Develop a comprehensive algorithmic toolkit to deal with the enormous scale, distributed nature, and one-pass processing requirements on network-generated massive data streams.
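As one example of the small synopses this paragraph calls for, the Misra-Gries heavy-hitters summary makes one pass over a stream with at most k - 1 counters and retains every item occurring more than n/k times (a sketch; the stream and k are illustrative):

```python
def misra_gries(stream, k):
    """One-pass heavy-hitters summary using at most k - 1 counters.
    Guarantee: every item occurring more than len(stream)/k times is
    present in the returned summary (counts are underestimates)."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k - 1:
            counters[x] = 1
        else:
            # No free counter: decrement everyone, dropping counters at zero.
            for y in list(counters):
                counters[y] -= 1
                if counters[y] == 0:
                    del counters[y]
    return counters

# 100 items; with k = 4, anything above 25 occurrences must survive.
stream = ["a"] * 40 + ["b"] * 30 + ["c"] * 20 + list("defghijklm")
print(misra_gries(stream, k=4))
```

Summaries like this one also merge cleanly, which matters for the distributed setting described above: each network element can sketch its own stream and ship only the counters to the central facility.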

References (to preliminary work along these lines):

- Alon-Matias-Szegedy, "The space complexity of approximating frequency moments," STOC 1996
- Alon-Gibbons-Matias-Szegedy, "Tracking Join and Self-Join Sizes in Limited Storage," PODS 1999
- Broder-Charikar-Frieze-Mitzenmacher, "Min-wise independent permutations," STOC 1998
- Feigenbaum-Kannan-Strauss-Viswanathan, "An Approximate L1-Difference Algorithm for Massive Data Streams," FOCS 1999
- Feigenbaum-Kannan-Strauss-Viswanathan, "Testing and Spot-Checking of Data Streams," SODA 2000
- Henzinger-Raghavan-Rajagopalan, "Computing on Data Streams," DEC SRC TR 1998-011
- Kleinberg, "Authoritative Sources in a Hyperlinked Environment," SODA 1998
- Drineas-Frieze-Kannan-Vempala-Vinay, "Clustering in large graphs and matrices," SODA 1999

**Challenge: Public-Key Watermarking**

Joan Feigenbaum - 10 May 2000

Develop a public-key watermarking scheme, or prove that none can exist.

For an overview of and introduction to watermarking, see Matheson, Mitchell, Shamoon, Tarjan, and Zane, "Security of Digital Watermarking," Financial Cryptography 1998.

**Challenge: Real-World Multiparty Function Computation**

Joan Feigenbaum - 10 May 2000

Develop efficient, secure, multiparty protocols for specific functions of interest. Take the network topology (i.e., communication paths between parties) into account in designing the protocol.

References (to two recent protocols of this form):

- Naor-Pinkas-Sumner, "Privacy Preserving Auctions and Mechanism Design," ACM Conf. on E-Commerce 1999
- Lindell-Pinkas, "Privacy Preserving Datamining," Crypto 2000 (to appear).

**Challenge: "Good Enough" Intellectual Property Management**

Joan Feigenbaum - 10 May 2000

Critical to the development of electronic commerce is the management of digital Intellectual Property (IP). Technology has challenged the status quo of IP-management in many ways. Widespread use of personal computers and Internet communication creates vast opportunities for producers, distributors, and consumers of digital IP of all forms, but it also threatens to make copying and modification of that IP completely uncontrollable. Oft-repeated, accepted wisdom dictates that it is, of course, impossible to prevent copying of digital material that is delivered to ordinary PCs, because a determined adversary can always "capture the bits at the rendering stage," but that, of course, it's not necessary to protect the material perfectly -- a protection scheme that is "good enough for the business model" will do. Develop the tools to make rigorous statements about "good enough" IP-management.

For a high-level introduction to the interplay of technology and business models in digital copyright challenges, see Ch. 5 of: "The Digital Dilemma: Intellectual Property in the Information Age," http://books.nap.edu/html/digital_dilemma/

**Challenge: Electronic Trading Systems**

Ming Kao - 18 May 2000

Design Internet-based trading systems or protocols with the following properties: (1) Far-away traders are not disadvantaged by communication delay (fairness). (2) Traders can verify that their orders are executed according to their instructions (anti-manipulation). (3) Traders can detect whether someone's trading program is winning excessively (market efficiency). Source: Shyam Sunder at Yale School of Management. Possible theoretical computer science areas: distributed computing, cryptography, information dispersal, and computational learning.

**Challenge: Semantic Zero-Knowledge Proofs**

David Karger - 18 May 2000

Web searchers are typically looking for information "about X", where aboutness is a rather fuzzy notion. Any human can verify that a given document satisfies their information need, and a cooperative information retrieval system can do a pretty good job of selecting documents that satisfy a specified information need. But suppose the given document costs money: how do you convince the human to pay for it without showing it to them and eliminating the incentive? Zero-knowledge proofs suggest a means, but they work only when the language of discourse is Boolean logic. They might be used to prove that a document contains certain words, but this proves nothing about whether the document even makes sense in English. We need something fuzzier. Develop a protocol that lets a server with a given document convince a client that the document is "about" topic X, without revealing the document to the client.

**Challenge: Protecting the Web from Quantum Computers**

David Johnson - 11 May 2000

The currently used public-key cryptosystems depend for their security on the intractability of factoring or computing discrete logarithms. The proposed public-key cryptosystems based on the hardness of the shortest vector problem do not appear to be practical. Are there practical public-key cryptosystems that do not rely on the hardness of problems known to be solvable in polynomial time by quantum computers?