In this talk, I'll present examples of mathematical structures emerging from theoretical and empirical observations in machine learning.
1. Theoretical:
Machine learning and information-theoretic tasks are, in some sense, equivalent: both involve identifying patterns and regularities in data. To recognize an elephant, a child (or a neural network) observes the repeating pattern of big ears, a trunk, and grey skin. To compress a book, a compression algorithm searches for frequently repeated letters or words. So the high-level question we answer rigorously is:
When is learning equivalent to compression?
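As illustrative background only (one standard way to formalize "compression" for a learner, not necessarily the formalization used in the talk), one can consider sample compression schemes in the sense of Littlestone and Warmuth; the symbols kappa, rho, and k below belong to that standard definition and are not taken from this abstract.

    A sample compression scheme of size $k$ for a hypothesis class $H$ is a pair of
    maps $(\kappa, \rho)$ such that, for every labeled sample $S$ realizable by $H$,
    \[
      \kappa(S) \subseteq S, \qquad |\kappa(S)| \le k, \qquad
      \rho(\kappa(S))(x) = y \quad \text{for all } (x, y) \in S .
    \]

In this language, "learning is equivalent to compression" asks when learnability of a class implies the existence of such a scheme with small k, and vice versa.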
2. Empirical:
Neural networks (NNs) are an empirically successful phenomenon. Toward a better understanding, it is practical to start with simple experiments. For example, consider an NN at initialization. We ask:
How does the geometric representation of a dataset change after the application of each randomly initialized layer of an NN?
For fully connected NNs, the evolution of the representation does not depend on the nature of the input given to the NN. For convolutional NNs, however, we empirically observe two distinct behaviors: for natural images, the geometry is almost preserved; for artificial data (e.g., Gaussian noise), the geometry roughly collapses to a single point after enough layers. To explain this discrepancy, we extend the celebrated Johnson--Lindenstrauss lemma.
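To make the kind of experiment concrete, here is a minimal sketch (my own illustration, not code from the talk) of the fully connected case: it tracks the average pairwise distance of a toy dataset as randomly initialized ReLU layers are applied. The He-style weight scaling, the layer width, and the Gaussian toy data are assumptions chosen for illustration.

    import numpy as np

    rng = np.random.default_rng(0)

    def random_relu_layer(x, width):
        # One randomly initialized fully connected layer: Gaussian weights + ReLU.
        w = rng.normal(0.0, np.sqrt(2.0 / x.shape[1]), size=(x.shape[1], width))
        return np.maximum(x @ w, 0.0)

    def mean_pairwise_distance(x):
        # Average Euclidean distance over all pairs of points in x.
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        return d[np.triu_indices(len(x), k=1)].mean()

    # Toy dataset: 50 Gaussian points in 100 dimensions (an assumption for illustration).
    x = rng.normal(size=(50, 100))
    print("depth 0:", mean_pairwise_distance(x))
    for depth in range(1, 11):
        x = random_relu_layer(x, width=100)
        print(f"depth {depth}:", mean_pairwise_distance(x))

Swapping random_relu_layer for a randomly initialized convolution, and the Gaussian points for flattened natural images, would let one compare the two behaviors described above.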
3. In the last part of the talk, I'll show how the theoretical and empirical perspectives interact in a future line of research that studies the generalization power of NNs.
Ido Nachum holds a BSc in aerospace engineering and worked at RAFAEL Ltd. (Atuda military service) while completing his MSc in pure mathematics, studying measured group theory. He continued in pure mathematics for his PhD, studying learning theory questions from an information-theoretic perspective. He is currently a postdoctoral researcher in the School of Computer Science at EPFL, focusing on mathematical questions that arise from artificial or biological neural computation.