Saturday, 28 December 2013

Class NP and NP-Completeness

Class NP


  • The class of decision problems for which there is a polynomially bounded nondeterministic algorithm.


  • The class of problems that are "slightly harder" than P.


  • In general, an exponential number of subproblems, each of which can be solved or verified in polynomial time.


  • Being in class NP constitutes an upper bound on problem complexity.


Nondeterministic Turing Machine (NDTM)



  • Model a problem as a tree in which the nodes represent states, with time flowing from the root to the children. A leaf represents a solution.


  • In an NDTM, the computation splits whenever a guess is needed.
    • The tree has polynomial depth but is exponentially bushy.


  • "Guess" nodes are like OR nodes: if either of the children returns T, the "guess" node returns T.


  • NP-hard



  • A problem Q is NP-hard if every problem in NP is reducible to Q.


  • A lower bound on a problem: "at least as hard as any problem in NP."


  • NP-Complete



  • The hardest decision problems in NP.


  • If there were a polynomially bounded algorithm for an NP-complete problem, then there would be a polynomially bounded algorithm for each problem in NP.


  • Establishes both a lower and an upper bound (although it may be that P=NP).


  • If any NP-complete problem is in P, then P=NP.


  • Satisfiability (SAT)



  • Instance: Given a set U = {u1, ..., um} of variables and a collection C = {c1, ..., cn} of clauses over U.


  • Question: Is there a truth assignment to U that satisfies C?


  • A logical formula in conjunctive normal form (CNF): a conjunction of disjunctions
    e.g., (x1 + x2 + ¬x3) · (x1 + x3) · (¬x2)

    • easy to falsify: make all terms in one clause fail.


    • hard to satisfy: terms in clauses interact.



  • Cook's Theorem: SAT is NP-complete.

    • uses SAT to simulate a machine



  • Once the primordial NP-complete problem is found, we can prove other problems NP-complete without reference to machines:

    • To prove that problem Q is NP-complete, reduce SAT (or any other NP-complete problem) to Q:
            SAT <=P Q


    • E.g., 3SAT: like SAT, but | ci | = 3 for 1 <= i <= n.


    • Need to show that 3SAT is NP-hard:
      Since X <=P SAT for any problem X in NP, we need only show SAT <=P 3SAT.


    • Proof:
      General form of clauses in SAT is cj = (z1 + z2 + ... + zk), where each zi is either an input or the negation of an input.
      k = 1
      {z1 + z1 + z1}
      k = 2
      {z1 + z2 + z2}
      k = 3
      {z1 + z2 + z3}
      k >= 4
      Add auxiliary variables that transmit information between pieces.
      E.g., {z1 + z2 + z3 + z4 + z5} becomes {(z1 + z2 + y1) (¬y1 + z3 + y2) (¬y2 + z4 + z5)}
      The transformation is polynomial in the length of the SAT instance.

    • Note: 2SAT is solvable in polynomial time.



  • Other NP-Complete Problems


    • Graph coloring
      A coloring of a graph is the assignment of a "color" to each vertex of the graph such that adjacent vertices are not assigned the same color. The chromatic number of a graph G, X(G) [Chi(G)], is the smallest number of colors needed to color G.
      Optimization problem: Given an undirected graph G, determine X(G) (and produce an optimal coloring).
      Decision problem: Given G and a positive integer k, is there a coloring of G using at most k colors (i.e., is G k-colorable)?


    • Job scheduling with penalties
      Suppose we are given n jobs to be executed one at a time and, for each job, an execution time, deadline, and penalty for missing the deadline.
      Optimization problem: Determine the minimum possible penalty (and find an optimal schedule).
      Decision problem: Given a nonnegative integer k, is there a schedule with penalty < k?
      Note: Job scheduling without penalties such that at most k jobs miss their deadlines is in P.


    • Multiprocessor scheduling
      Suppose we are given a deadline and a set of tasks of varying length to be performed on two identical processors.
      Optimization problem: Determine an allocation of the tasks to the two processors such that the deadline can be met.
      Decision problem: Can the tasks be arranged so that the deadline is met?


    • Bin packing
      Optimization problem: Given an unlimited number of bins, each of size 1, and n objects with sizes s1, ..., sn where 0 < si < 1, determine the smallest number of bins into which the objects can be packed (and find an optimal packing).
      Decision problem: Given, in addition to the inputs described above, an integer k, do the objects fit in k bins?


    • Knapsack problem
      Optimization problem: Given a knapsack of capacity C and n objects with sizes s1, ..., sn and "profits" p1, ..., pn, find the largest total profit of any subset of the objects that fits in the knapsack ( and find a subset that achieves the maximum profit).
      Decision problem: Given k, is there a subset of the objects that fits in the knapsack and has total profit at least k?


    • Subset sum problem
      Optimization problem: Given a positive integer C and n objects with positive sizes s1, ..., sn, among subsets of the objects with sum at most C, what is the largest subset sum?
      Decision problem: Is there a subset of the objects whose sizes add up to exactly C?
      The subset sum problem is a simpler version of the knapsack problem, in which the profit for each object is the same as its size.


    • Partition
      Suppose we are given a set of integers.
      Decision problem: Can the integers be partitioned into two sets whose sum is equal?


    • Hamiltonian cycles and Hamiltonian paths
      A Hamiltonian cycle (path) is a simple cycle (path) that passes through every vertex of a graph exactly once.
      Decision problem: Does a given undirected graph have a Hamiltonian cycle (path)?
      Note: An Euler tour -- a cycle that traverses each edge of a graph exactly once -- can be found in O(e) time.


    • Traveling salesperson problem (TSP) or minimum tour problem
      Optimization problem: Given a complete, weighted graph, find a minimum-weight Hamiltonian cycle.
      Decision problem: Given a complete, weighted graph and an integer k, is there a Hamiltonian cycle with total weight at most k?
      A complete graph has an edge between each pair of vertices.


    • Vertex cover
      A vertex cover for an undirected graph G is a subset V' of vertices such that each edge in G is incident upon some vertex in V'.
      Optimization problem: Find a vertex cover for G with as few vertices as possible.
      Decision problem: Given an integer k, does G have a vertex cover consisting of k vertices?
      Note: The edge cover problem is in P.


    • Clique
      A clique is a subset V' of vertices in an undirected graph G such that every pair of distinct vertices in V' is joined by an edge in G (i.e., the subgraph induced by V' is complete). A clique with k vertices is called a k-clique.
      Optimization problem: Find a clique with as many vertices as possible.
      Decision problem: Given an integer k, does G have a k-clique?


    • Independent set
      An independent set is a subset V' of vertices in an undirected graph G such that no pair of vertices in V' is joined by an edge of G.
      Optimization problem: Find an independent set with as many vertices as possible.
      Decision problem: Given an integer k, does G have an independent set consisting of k vertices?


    • Feedback edge set
      A feedback edge set in a digraph G is a subset E' of edges such that every cycle in G has an edge in E'.
      Optimization problem: Find a feedback edge set with as few edges as possible.
      Decision problem: Given an integer k, does G have a feedback edge set consisting of k edges?
      Note: The feedback edge set for undirected graphs is in P.


    • Longest simple path problem
      Optimization problem: What is the longest simple path between any two vertices in graph G?
      Decision problem: Given an integer k, is there a path in graph G longer than k?
      Note: The shortest path between any two vertices can be found in O(ne) time.


    • Subgraph isomorphism
      Decision problem: Given two graphs G1 and G2, determine if G1 is isomorphic to a subgraph of G2.


    • Integer linear programming
      Linear programming is a method of analyzing systems of linear inequalities to arrive at optimal solutions to problems
      Decision problem: Given a linear program, is there an integral solution?


    References
    • Baase and Van Gelder, Computer Algorithms, 3e (Addison Wesley Longman, 2000)
    • Sedgewick, Algorithms (Addison Wesley, 1983)
    • Cormen, Leiserson, Rivest, and Stein, Introduction to Algorithms, 2e (MIT Press/McGraw Hill, 2001) The authors provide NP-completeness proofs reducing 3SAT to CLIQUE to VERTEX-COVER to HAMILTONIAN-CYCLE to TSP, and 3SAT to SUBSET-SUM.