## Zvi Galil; Raffaele Giancarlo; Columbia University. Computer Science

- Title:
- Speeding up dynamic programming with applications to molecular biology
- Author(s):
- Galil, Zvi; Giancarlo, Raffaele; Columbia University. Computer Science
- Date:
- 1987
- Type:
- Technical reports
- Department:
- Computer Science
- Permanent URL:
- http://hdl.handle.net/10022/AC:P:11870
- Series:
- Columbia University Computer Science Technical Reports
- Part Number:
- CUCS-300-87
- Abstract:
- Consider the problem of computing E[j] = mit:! {D[k] + w(k, j)}, j = 1, ... , n, O~k~]-l where w is a given weight function, D[D] is given and for every k = 1, ... , n, D[k] is easily computable from E[k]. This problem appears as a subproblem in dynamic programming solutions to various problems. Obviously, it can be solved in time O( n2 ), and for a general weight function no better algorithm is possible. We consider two dual cases that arise in applications: In the concave case, the weight function satisfies the quadrangle inequality: w(k,j) + w(l,j') ~ w(l,j) +w(k,j'), for all k ~ 1 ~ j ~ j'. In the convex case, the weight function satisfies the inverse quadrangle inequality. In both cases we show how to use the assumed property of w to derive an O( n log n) algorithm. Even better, linear-time algorithms are obtained if w satisfies the following additional closest zero property: for every two integers 1 and k, 1 < k, and real number a, the smallest zero of f(x) = w(l,x) - w(k,x) - a which is larger than 1 can be found in constant time. Surprisingly, the two algorithms are also dual in the following sense: Both work in stages. In the j-th stage they compute Elj]. They maintain a set of candidates which satisfies the property that Elj] depends only on D[k] + w(k, j) for k's in the set. Moreover, each algorithm discards candidates from the set, and discarded candidates never rejoin the set. To be able to maintain such a set of candidates efficiently one uses the following "dual" data structures: a queue in the concave case and a stack in the convex case. The two algorithms speed up several dynamic programming routines that solve as a subproblem the problem above. The speed-up is from O(n3 ) to O(n2Iogn) or O(n2 ). Applications include algorithms for comparing DNA sequences, algorithms for determining the secondary structure of RNA, and algorithms used in speech recognition and geology. One typical problem is the following: Given the cost of substituting any pair of symbols and a convex cost function g for gaps (where g(r) is the cost of a gap of size r), compute the modified edit distance between the two given sequences.
- Subject(s):
- Computer science

- Item views:
- 127