2025 Theses Doctoral
New Frontiers in String Algorithms and Nearest Neighbour Search
This thesis revolves arounds two main frameworks of problems: (i) String Problems, such as the Longest Increasing Subsequence (LIS), Longest Common Subsequence (LCS) and the Edit Distance (ED); and (ii) Approximate Nearest Neighbour Search Problems under different metrics.
For String Problems, we show the following results:* For ED, a constant approximation algorithm which runs in near-linear time. In particular, an algorithm for approximating the edit distance between two strings of length ๐ in time ๐ยนโบ^๐ up to a constant factor, for any ๐ > 0.
* For parametrized LIS, the first sub-polynomial (in the parameter) approximation. In particular, we show that for any ๐ โ โ and ๐ = ๐(1), there exists a (randomized) ๐ฏ๐ฐ๐ฏ-๐ข๐ฅ๐ข๐ฑ๐ต๐ช๐ท๐ฆ algorithm that, given a sequence of length ๐ with LIS โฅ ๐ ๐, approximates the LIS up to a factor of 1/๐โฐโฝยนโพ in ๐โฐโฝยนโพ ๐ time.
* For LCS, we give the first sub-polynomial approximation in linear time.
For Approximate Nearest Neighbour Search Problems, we show the following results:* The first instance of a~simple, practical algorithm that provably leverages data-dependent hashing to improve upon data-oblivious LSH.
* We introduce the notion of metric embeddings into similarity measures over โโ^๐, and use such new embeddings to develop Approximate Nearest Neighbor Search (ANN) algorithms which improve the approximation factor exponentially on the (longstanding) state-of-the-art on two important metrics:
* Edit distance over length-๐ strings: poly(log ๐) approximation;
* ๐_๐ over โ^๐, for ๐ > 2: ๐(log ๐) approximation (known to be asymptotically optimal in the relevant models of computation).
While the two frameworks might apriori feel unrelated, recent progress have shown they are ultimately intertwined. First and foremost, both frameworks have proven to be core problems in the area of fine-grained complexity, where conditional hardness has been shown under the Strong Exponential Time Hypothesis (SETH), among other conjectures. Second, the algorithmic techniques for solving the frameworks are inter-related, and we will use Similarity Search techniques to solve string problems and vice versa.
Subjects
Files
-
ShekelNosatzki_columbia_0054D_19433.pdf
application/pdf
2.38 MB
Download File
More About This Work
- Academic Units
- Computer Science
- Thesis Advisors
- Andoni, Alexandr
- Degree
- D.E.S., Columbia University
- Published Here
- November 12, 2025