Theses Doctoral

New Frontiers in String Algorithms and Nearest Neighbour Search

Shekel Nosatzki, Negev

This thesis revolves arounds two main frameworks of problems: (i) String Problems, such as the Longest Increasing Subsequence (LIS), Longest Common Subsequence (LCS) and the Edit Distance (ED); and (ii) Approximate Nearest Neighbour Search Problems under different metrics.

For String Problems, we show the following results:* For ED, a constant approximation algorithm which runs in near-linear time. In particular, an algorithm for approximating the edit distance between two strings of length ๐‘› in time ๐‘›ยนโบ^๐œ€ up to a constant factor, for any ๐œ€ > 0.

* For parametrized LIS, the first sub-polynomial (in the parameter) approximation. In particular, we show that for any ๐‘› โˆˆ โ„• and ๐œ† = ๐‘œ(1), there exists a (randomized) ๐˜ฏ๐˜ฐ๐˜ฏ-๐˜ข๐˜ฅ๐˜ข๐˜ฑ๐˜ต๐˜ช๐˜ท๐˜ฆ algorithm that, given a sequence of length ๐‘› with LIS โ‰ฅ ๐œ† ๐‘›, approximates the LIS up to a factor of 1/๐œ†โฐโฝยนโพ in ๐‘›โฐโฝยนโพ ๐œ† time.

* For LCS, we give the first sub-polynomial approximation in linear time.

For Approximate Nearest Neighbour Search Problems, we show the following results:* The first instance of a~simple, practical algorithm that provably leverages data-dependent hashing to improve upon data-oblivious LSH.

* We introduce the notion of metric embeddings into similarity measures over โ„โ‚Š^๐‘š, and use such new embeddings to develop Approximate Nearest Neighbor Search (ANN) algorithms which improve the approximation factor exponentially on the (longstanding) state-of-the-art on two important metrics:

* Edit distance over length-๐‘˜ strings: poly(log ๐‘˜) approximation;

* ๐“_๐‘ over โ„^๐‘‘, for ๐‘ > 2: ๐‘‚(log ๐‘) approximation (known to be asymptotically optimal in the relevant models of computation).

While the two frameworks might apriori feel unrelated, recent progress have shown they are ultimately intertwined. First and foremost, both frameworks have proven to be core problems in the area of fine-grained complexity, where conditional hardness has been shown under the Strong Exponential Time Hypothesis (SETH), among other conjectures. Second, the algorithmic techniques for solving the frameworks are inter-related, and we will use Similarity Search techniques to solve string problems and vice versa.

Files

  • thumbnail for ShekelNosatzki_columbia_0054D_19433.pdf ShekelNosatzki_columbia_0054D_19433.pdf application/pdf 2.38 MB Download File

More About This Work

Academic Units
Computer Science
Thesis Advisors
Andoni, Alexandr
Degree
D.E.S., Columbia University
Published Here
November 12, 2025