Speeding up dynamic programming with applications to molecular biology

Title:

Speeding up dynamic programming with applications to molecular biology

Author(s):

Galil, Zvi
Giancarlo, Raffaele

Date:

1987

Type:

Technical reports

Department:

Computer Science

Permanent URL:

http://hdl.handle.net/10022/AC:P:11870

Series:

Columbia University Computer Science Technical Reports

Part Number:

CUCS30087

Abstract:

Consider the problem of computing E[j] = mit:! {D[k] + w(k, j)}, j = 1, ... , n, O~k~]l where w is a given weight function, D[D] is given and for every k = 1, ... , n, D[k] is easily computable from E[k]. This problem appears as a subproblem in dynamic programming solutions to various problems. Obviously, it can be solved in time O( n2 ), and for a general weight function no better algorithm is possible. We consider two dual cases that arise in applications: In the concave case, the weight function satisfies the quadrangle inequality: w(k,j) + w(l,j') ~ w(l,j) +w(k,j'), for all k ~ 1 ~ j ~ j'. In the convex case, the weight function satisfies the inverse quadrangle inequality. In both cases we show how to use the assumed property of w to derive an O( n log n) algorithm. Even better, lineartime algorithms are obtained if w satisfies the following additional closest zero property: for every two integers 1 and k, 1 < k, and real number a, the smallest zero of f(x) = w(l,x)  w(k,x)  a which is larger than 1 can be found in constant time. Surprisingly, the two algorithms are also dual in the following sense: Both work in stages. In the jth stage they compute Elj]. They maintain a set of candidates which satisfies the property that Elj] depends only on D[k] + w(k, j) for k's in the set. Moreover, each algorithm discards candidates from the set, and discarded candidates never rejoin the set. To be able to maintain such a set of candidates efficiently one uses the following "dual" data structures: a queue in the concave case and a stack in the convex case. The two algorithms speed up several dynamic programming routines that solve as a subproblem the problem above. The speedup is from O(n3 ) to O(n2Iogn) or O(n2 ). Applications include algorithms for comparing DNA sequences, algorithms for determining the secondary structure of RNA, and algorithms used in speech recognition and geology. One typical problem is the following: Given the cost of substituting any pair of symbols and a convex cost function g for gaps (where g(r) is the cost of a gap of size r), compute the modified edit distance between the two given sequences.

Subject(s):

Computer science
 Item views:

160
 Metadata:

text  xml