Recently, there has been growing interest in harnessing the power of big datasets and modern machine learning to design new scalable algorithms. This invites us to rethink the role of data in algorithm design: not just as the input to pre-designed algorithms, but also as a factor that enters the algorithm design process itself, driving it in a strong and possibly automated manner. This talk will show how to leverage data and learning for better algorithm design in two fundamental areas: high-dimensional similarity search and efficient linear algebra. In particular, I will show the following:

1. Using data-dependent compression, we obtain optimal compressed representations of high-dimensional Euclidean distances. This result marks the first improvement over classical data-oblivious compression schemes, which provably cannot match its performance.

2. Using neural networks, we show how to learn to construct better data structures for high-dimensional similarity search.

3. We then show how these techniques also give rise to fast algorithms for low-rank approximation of matrices.

Our algorithms are both proven analytically and implemented and validated empirically, showing improvements over previous methods on standard benchmark datasets.
Tal Wagner is a postdoctoral researcher in the Machine Learning Foundations group at Microsoft Research Redmond. His research interests are in designing algorithms for massive high-dimensional datasets and large-scale machine learning. He received his PhD from the EECS department at MIT in September 2020. Previously, he earned his MSc from the Weizmann Institute and his BSc from the Technion. During his PhD, he spent time as a research intern at Microsoft, Amazon, and VMware.