Paweł Gawrychowski: "String algorithms in the read-only random access model"

×

Error message

  • Deprecated function: Creation of dynamic property LdapUserConf::$createLDAPAccounts is deprecated in LdapUserConf->load() (line 265 of /var/lib/drupal7/modules/ldap/ldap_user/LdapUserConf.class.php).
  • Deprecated function: Creation of dynamic property LdapUserConf::$createLDAPAccountsAdminApproval is deprecated in LdapUserConf->load() (line 266 of /var/lib/drupal7/modules/ldap/ldap_user/LdapUserConf.class.php).

A clean model for designing algorithms for processing strings is to assume a read-only random access to the input and measure the working space. This is particularly attractive when we consider big data, where even a linear time complexity might be not enough if the algorithm uses substantial amount of additional space on top of the input (the streaming model is another possibility, but it appears too restrictive for some natural problems). We consider three basic questions in such a model:

  1. multiple pattern matching (locate the occurrences of multiple patterns in a single text),
  2. computing the Lempel-Ziv factorization (factorize a given text into phrases such that every phrase is either a single letter or appears before),
  3. constructing the sparse suffix array (lexicographically sort a given subset of suffixes), and provide efficient (linear time or almost linear time) randomized solutions to all of them.

For multiple pattern matching, we generalize the well-known Karp-Rabin string matching to handle multiple patterns in $O(nlog(n)+m)$ time and $O(s)$ space, where n is the length of the text and m is the total length of the s patterns. Then we show how to use this to approximate the LZ77 factorization. For sparse suffix array, it is natural to seek a construction algorithm using only $O(b)$ words of space, where b is the number of interesting suffixes of the input that we would like to lexicographically sort. It has been an open problem if there exists a linear-time Monte Carlo algorithm for this problem, with the best known time bound being $O(nlog(b))$. We resolve it by designing such an algorithm, and also provide a deterministic verification procedure working in $O(n(log(b))^{\frac{1}{2}})$ time.

Joint work with Johannes Fischer, Travis Gagie, and Tomasz Kociumaka.

Date and Time: 
Thursday, April 27, 2017 - 13:30 to 14:30
Speaker: 
Pawel Gawrychowski
Location: 
IDC, C.110
Speaker Bio: 

Paweł Gawrychowski, Haifa University