CMPS 260
P vs. NP; NP-Completeness

In the most recent part of the course, we have been studying the distinction between (decision) problems (equivalently, languages) according to whether or not they are decidable/recursive. Among those languages that are not decidable/recursive, we split them further according to whether or not they are recursively enumerable (aka semi-decidable). This branch of theoretical computer science (TCS) is called computability theory.

Another branch of TCS, complexity theory, attempts to distinguish between decidable/recursive languages/problems according to whether or not they are tractable. An intractable problem would be one for which there is no algorithmic solution that, except when applied to a very small amount of input, would produce a result in a reasonable amount of time.

Although it is a somewhat crude measure, the distinction is based upon whether or not a problem/language has a (deterministic) algorithm that runs in polynomial time, meaning one in the complexity class O(n^k), for some k. Problems for which a "best" algorithm runs in exponential time, O(cⁿ) for some constant c, are considered to be intractable.

This is not a perfect distinction, because it would seem to be odd to classify a problem solvable in no better than O(n¹⁰⁰) time as being tractable, as n¹⁰⁰ is a huge number even when n is small. But such problems do not seem to arise in practice. Indeed, one does not often see exponents of more than, say 4.

On the other hand, it might be unfair to classify a problem that has a O(1.00000001ⁿ)-time solution as being intractable, even though that is an exponential running time (and thus, for some threshold n₀, 1.00000001ⁿ > n¹⁰⁰ for all n≥n₀. Again, such problems do not seem to arise in practice.

It turns out that there is an interesting class of problems that is widely believed to be intractable, but no one has been able to prove it. These are the NP-Complete problems.

Definition: P is the set of decision problems (i.e., languages) for which there exist deterministic polynomial-time algorithms, i.e., ones whose runtime complexity is O(N^k), for some k.

Definition: NP is the set of decision problems (i.e., languages) for which there exist nondeterministic polynomial-time algorithms, i.e., ones whose runtime complexity is O(N^k), for some k.
To clarify, such an algorithm that decides language L would, when given string w as input, halt in no more f(|w|) steps (where f(n) is in the class O(n^k)) and output either YES or NO. In the case that w ∉ L, it will always output NO. In the case that w ∈ L, there must be at least one computation path that results in it producing YES. (Recall that we have this weird notion that a nondeterministic algorithm will always make a sequence of choices leading to acceptance, if that is possible.)

Of course, every deterministic algorithm is a special case of a nondeterministic algorithm, and thus P ⊆ NP.

There are many interesting problems/languages that are easily shown to be members of NP, but for which we do not know whether they are members of P. Frequently, such problems can be stated in these terms:

Given as input the string x, does the entity/object encoded by x satisfy some particular property of interest?

The language being decided is, of course,

L = { x | the entity encoded by x has the property of interest }

Examples:

CLIQUE: Given a string x encoding the ordered pair <G,k>, where G is an undirected graph and k is a natural number, does there exist a clique of size k in G? (A clique is a collection of vertices each of which is connected to all the others by an edge.)
HAMILTONIAN PATH: Given a string x encoding an undirected graph G, does G have a Hamiltonian Path (a path that visits every vertex exactly once)? (Recall that a path cannot "traverse" any edge more than once.)
SATISFIABILITY': Given a boolean expression E (involving variables and the boolean operators ∧, ∨, and ¬) does there exists a satisfying assignment to the variables, i.e., one that makes E evaluate to TRUE?
E.g., (a ∨ b) ∧ (¬a ∨ ¬b) is satisfiable, but (a ∨ ¬b) ∧ ¬a ∧ (a ∨ b) is not.

Nondeterministic algorithms for such problems are typically expressed as having two phases:

Guess a certificate
(Deterministically) use the certificate to try to verify that the entity described by the input string has the property of interest.

For example, with respect to the problems listed above, a nondeterministic algorithm for CLIQUE can, during its first phase, "guess" a subset of size k of the vertices of G; then, in phase 2, determine whether or not that set of vertices actually forms a clique. If it does, the algorithm answers YES; otherwise it answers NO. If a clique of size k exists in G, there is at least one guess that will lead to YES being produced; hence the algorithm satisfies the requirement of nondeterministically deciding the language.

Note also that both phases can be accomplished in polynomial time (with respect to the length of the encoding <G,k>), so that this algorithm satisfies its running time bounds.

Nondeterministic algorithms of a similar nature exist for the HAMILTONIAN PATH and SATISFIABILITY' problems.

So these three problems, and a number of others, are in the class NP, but are not known to be in P.

Problem Reduction

Problem A is said to be reducible to problem B if A can be solved in this way:

Given string x (an instance of problem A), convert it to x', an instance of problem B.
Feed x' to an algorithm that solves problem B.
Convert the solution produced by the previous step into the solution for the original instance (i.e., x) of problem A.

For decision problems (in which the question to be answered is whether or not x is a member of some language L), reductions often take this form:

Given a string x to be tested for membership in L, convert it to x'.
Feed x' to a decision algorithm for language L'.
Convert the answer to the question "Is x' ∈ L'?" (supplied in the previous step) into the answer to the question "Is x ∈ L?".

For this to work, it is necessary for there to exist a computable "L to L' conversion function" f : Σ^* ⟶ Σ'^* (Σ and Σ' being the alphabets of languages L and L', respectively) such that, for all x ∈ Σ^*, x ∈ L iff f(x) ∈ L'.

If such a solution to language L exists, we say that L is reducible to L', sometimes written L ≤ L'.

Example: As a simple example, take the languages L₁ = { x ∈ {a,b}^* | #_a(x) = #_b(x) } and L₂ = { aⁿbⁿ | n≥0 }. Let f() be the function that, applied to a string composed of a's and b's, rearranges them so that it is of the form a^kb^m. Clearly, f is a computable function.

Given a string x ∈ {a,b}^*, we can answer the question "Is x ∈ L₁?" by answering the question "Is f(x) ∈ L₂?". Hence L₁ ≤ L₂. End of example.

Theorem: Suppose that L ≤ L'. Then

If L' is recursive/decidable, so is L
If L is not recursive/decidable, neither is L'

Thus, to prove that a language L' is undecidable, the standard approach is to show that L ≤ L' for some (known-to-be) undecidable L.

Polynomial-time reducibility:

If the "conversion function" f() mentioned above has the property that it is computable in polynomial time, then we say that L is polynomial-time reducible to L', often denoted L ≤_p L'.

Significantly, L ≤_p L' implies that if L' is solvable in polynomial time, then so is L. Another significant fact is that ≤_p is a transitive relation. That is, if L₁ ≤_p L₂ and L₂ ≤_p L₃, then L₁ ≤_p L₃.

Stephen Cook and (Leonid?) Levin proved that every problem/language in NP is polynomial-time reducible to SAT! Cook did so by showing that, for any fixed TM M that is guaranteed to halt in polynomial time, given a string w, you can (in polynomial time in |w|) construct a CNF formula E that is satisfiable iff M accepts w (i.e., there is an accepting computation in M of w).

Thus, SAT is in the class NP-Complete, which includes any language that is both in NP and is such that every problem in NP is polynomial-time reducible to it.

Among the conclusions that can be drawn are

Let L be a language in NP. If SAT ≤_p L, then, by transitivity of ≤_p, every language L' in NP is such that L' ≤_p L (making L an NP-complete language).
Let L be any NP-Complete language, and let L' be in NP. Then if L ≤_p L', it follows that L' is NP-Complete, too (by transitivity of ≤_p). The point here is that any NP-Complete language (as opposed to only SAT) can be used to show that other languages are also NP-Complete.
If any NP-Complete language is in P, then all of them are!

Over the years, thousands of NP-Complete languages have been identified, many of practical interest. But no one has ever devised a polynomial-time algorithm for any of them. Yet the question as to whether P = NP is still unsolved.

In 1972, Richard Karp identified about 20 NP-Complete problems, which set the ball rolling.

Examples of polynomial-time reductions

showing languages to be NP-Complete.

CNF-SAT ≤_p 3-SAT (MacCormick pp 288-289, Linz p. 371): Given a CNF boolean expression, replace each clause (x₁ ∨ x₂ ∨ ·· ∨ x_m) (where m>3) by

(x₁ ∨ x₂ ∨ z₁) ∧ (¬z₁ ∨ x₃ ∨ z₂) ∧ (¬z₂ ∨ x₄ ∨ z₃) ∧ ··· (¬z_m-3 ∨ x_m-1 ∨ x_m)

3-SAT ≤_p CLIQUE (MacCormick p. 362, Linz p. 372):

CLIQUE ≤_p VERTEX COVER (MacCormick, p. 363-364): Recall that a clique in a graph G = (V,E) is a subset C⊆V of its vertices such that every pair of vertices in C is connected by an edge. A vertex cover of a graph is a subset S of its vertices such that at least one endpoint of every edge is a member of S.
Theorem: Let G be a graph having N vertices. Then G has a clique of size k if and only if G^c (the complement of G) has a vertex cover of size N-k.

CMPS 260 P vs. NP; NP-Completeness

Problem Reduction

Polynomial-time reducibility:

Examples of polynomial-time reductions

CMPS 260
P vs. NP; NP-Completeness