1987 Reports

# Speeding up dynamic programming with applications to molecular biology

Consider the problem of computing E[j] = mit:! {D[k] + w(k, j)}, j = 1, ... , n, O~k~]-l where w is a given weight function, D[D] is given and for every k = 1, ... , n, D[k] is easily computable from E[k]. This problem appears as a subproblem in dynamic programming solutions to various problems. Obviously, it can be solved in time O( n2 ), and for a general weight function no better algorithm is possible. We consider two dual cases that arise in applications: In the concave case, the weight function satisfies the quadrangle inequality: w(k,j) + w(l,j') ~ w(l,j) +w(k,j'), for all k ~ 1 ~ j ~ j'. In the convex case, the weight function satisfies the inverse quadrangle inequality. In both cases we show how to use the assumed property of w to derive an O( n log n) algorithm. Even better, linear-time algorithms are obtained if w satisfies the following additional closest zero property: for every two integers 1 and k, 1 < k, and real number a, the smallest zero of f(x) = w(l,x) - w(k,x) - a which is larger than 1 can be found in constant time. Surprisingly, the two algorithms are also dual in the following sense: Both work in stages. In the j-th stage they compute Elj]. They maintain a set of candidates which satisfies the property that Elj] depends only on D[k] + w(k, j) for k's in the set. Moreover, each algorithm discards candidates from the set, and discarded candidates never rejoin the set. To be able to maintain such a set of candidates efficiently one uses the following "dual" data structures: a queue in the concave case and a stack in the convex case. The two algorithms speed up several dynamic programming routines that solve as a subproblem the problem above. The speed-up is from O(n3 ) to O(n2Iogn) or O(n2 ). Applications include algorithms for comparing DNA sequences, algorithms for determining the secondary structure of RNA, and algorithms used in speech recognition and geology. One typical problem is the following: Given the cost of substituting any pair of symbols and a convex cost function g for gaps (where g(r) is the cost of a gap of size r), compute the modified edit distance between the two given sequences.

## Subjects

## Files

- CUCS-300-87.pdf application/pdf 542 KB Download File

## More About This Work

- Academic Units
- Computer Science
- Publisher
- Department of Computer Science, Columbia University
- Series
- Columbia University Computer Science Technical Reports, CUCS-300-87
- Published Here
- December 2, 2011