Alon Kipnis

About

I am a Senior Lecturer (Assistant Professor) at the School of Computer Science at Reichman University. I received my PhD in electrical engineering from Stanford University in 2017 under the supervision of Andrea Goldsmith. From 2017 to 2021, I was a postdoctoral research scholar and a lecturer at the Department of Statistics at Stanford University, advised by David Donoho.

My research is in the areas of mathematical statistics and information theory. I am particularly interested in applications of my research in machine learning and in ambitious data science projects.

I have held several industry positions alongside academia. Since 2020, I have been the Head of AI at SenseIP. In 2017, I helped co-found Tarseir Systems after taking part in the first-ever iteration of the Hacking for Defence class at Stanford. I was also a research intern at Huawei in the summer of 2015 and an algorithm developer at SanDisk (now part of Western Digital) from 2011 to 2012.

I am looking for graduate students and postdocs with strong analytical and programming skills who are interested in working on theoretical and applied research projects in the broad areas of machine learning and natural language processing. Specific topics include: authorship attribution, transfer learning, data compression, searching in large databases, feature/variable/model selection, and characterizing the distribution of artificial neural responses.

This coming fall (2022-2023) I will be teaching a new class titled "Information-theoretic analysis of neural language models."

Recent Research Projects

Mean Estimation from One-bit Measurements

This project considers the problem of estimating the mean of a symmetric log-concave distribution under the constraint that only a single bit per sample from this distribution is available to the estimator. We study the mean squared error as a function of the sample size (and hence the number of bits) in three settings. In a centralized setting, where an encoder may release n bits given a sample of size n, there is no asymptotic penalty for quantization. In an adaptive setting, in which each bit is a function of the current observation and previously recorded bits, the optimal relative efficiency compared to the sample mean is precisely the efficiency of the median. In a distributed setting, where each bit is only a function of a local sample, no estimator can achieve optimal efficiency uniformly over the parameter space. We complement the results in the adaptive setting by showing that one round of adaptivity is sufficient to achieve the optimal mean squared error.
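
To make the adaptive setting concrete, here is a minimal numerical sketch under assumptions of my own choosing: a Gaussian sample with known scale (the project itself treats general symmetric log-concave distributions), a small pilot batch of sign bits that fixes the threshold, and the remaining bits, taken relative to that adapted threshold, yielding the final estimate. The function name and the pilot size are illustrative, not part of the project.

```python
import numpy as np
from scipy.stats import norm

def one_bit_mean_estimate(x, sigma=1.0, t0=0.0):
    """Mean estimate from one bit per sample with one round of threshold adaptation.

    Illustrative sketch for N(mu, sigma^2) data with known sigma: since
    P(X > t) = Phi((mu - t) / sigma), a batch of bits 1{x_i > t} recorded at a
    fixed threshold t can be inverted to estimate mu.
    """
    n = len(x)
    m = int(np.sqrt(n))                      # small pilot batch
    pilot, rest = x[:m], x[m:]

    # Round 1: bits relative to the initial threshold t0 give a crude estimate.
    q1 = np.clip(np.mean(pilot > t0), 1 / m, 1 - 1 / m)
    t1 = t0 + sigma * norm.ppf(q1)

    # Round 2: bits relative to the adapted threshold t1 give the final estimate.
    q2 = np.clip(np.mean(rest > t1), 1 / n, 1 - 1 / n)
    return t1 + sigma * norm.ppf(q2)

rng = np.random.default_rng(0)
mu, n, reps = 0.7, 10_000, 200
est = np.array([one_bit_mean_estimate(rng.normal(mu, 1.0, n)) for _ in range(reps)])
print("empirical n * MSE:", n * np.mean((est - mu) ** 2))   # close to pi/2
print("sample-median benchmark pi/2:", np.pi / 2)
```

With n = 10,000 the rescaled error lands near pi/2, the asymptotic variance factor of the sample median, in line with the efficiency statement above.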

Discriminating Word-Frequency Tables and Testing Authorship

In this project, we adapt the Higher Criticism (HC) goodness-of-fit test to measure the closeness between word-frequency tables and apply the resulting test to authorship attribution challenges, where the goal is to identify the author of a document using other documents whose authorship is known. The method is simple yet performs well without handcrafting or tuning, attaining state-of-the-art accuracy in several current challenges. As an inherent side effect, the HC calculation identifies a subset of discriminating words. In practice, the identified words have low variance across documents belonging to a corpus of homogeneous authorship. We conclude that, in comparing a new document to a corpus by a single author, HC is mostly affected by words characteristic of the author and is relatively unaffected by topic structure.
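
As a concrete sketch of this recipe (my own minimal reading of it, not the project's exact implementation): compute, for each word, an exact binomial allocation P-value comparing its counts in the two tables, then combine the P-values with the Higher Criticism statistic; words whose P-values fall below the maximizing index are the discriminating ones. The function name and the tuning constant gamma are illustrative assumptions, and this uses the basic HC variant rather than any particular refinement.

```python
import numpy as np
from scipy.stats import binomtest

def hc_discrepancy(counts1, counts2, gamma=0.2):
    """Higher-Criticism discrepancy between two word-frequency tables."""
    T1, T2 = counts1.sum(), counts2.sum()
    # Exact binomial allocation P-value per word: given the combined count,
    # the count in table 1 is Binomial(k1 + k2, T1 / (T1 + T2)) under the null.
    pvals = np.array([
        binomtest(int(k1), int(k1 + k2), T1 / (T1 + T2)).pvalue
        for k1, k2 in zip(counts1, counts2) if k1 + k2 > 0
    ])
    N = len(pvals)
    p = np.clip(np.sort(pvals), 1e-12, 1 - 1e-12)
    ii = np.arange(1, N + 1) / N
    scores = np.sqrt(N) * (ii - p) / np.sqrt(p * (1 - p))
    # maximize over the smallest gamma * N P-values
    i_star = np.argmax(np.where(ii <= gamma, scores, -np.inf))
    return scores[i_star], p[i_star]     # HC statistic and the P-value threshold

# Toy usage: two documents over a five-word vocabulary
doc1 = np.array([50, 30, 10, 7, 3])
doc2 = np.array([48, 31, 12, 1, 9])
hc, p_star = hc_discrepancy(doc1, doc2)
print(f"HC = {hc:.2f}; words with P-values <= {p_star:.3g} drive the discrepancy")
```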

Models for Detecting Rare and Weak Effects and Log-Chisquared P-values

Rare/Weak models for multiple hypothesis testing assume that only a small proportion of the tested hypotheses concern non-null effects and that the individual effects are only moderately large, so that they generally do not stand out individually, for example in a Bonferroni analysis. Such models have been studied in quite a few settings; some studies assume an underlying Gaussian means model for the hypotheses being tested, others a Poisson model. It seems not to have been noticed before that these seemingly different models share the following asymptotic structure. Summarizing the evidence each test provides by the negative logarithm of its P-value, previous Rare/Weak settings are asymptotically equivalent to a detection problem in which most negative log P-values follow a standard exponential distribution, while a small fraction may follow an alternative distribution that is moderately larger; we do not know which individual tests those might be, or even whether there are any. Moreover, the alternative distribution is approximately noncentral chisquared on one degree of freedom.

In this work, we characterize the asymptotic performance of several global testing procedures that combine these P-values, using a phase diagram analysis involving the log-chisquared mixture parameters. Interestingly, the log-chisquared approximation for P-values used here differs from Bahadur's log-normal approximation, which would be unsuitable for understanding Rare/Weak multiple testing models.
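
For concreteness, in the Gaussian means instance the common structure can be written as follows; the display is my schematic paraphrase of the model described above, with the sparsity and effect-size parameters calibrated in the usual rare/weak convention:

$$
-2\log P_i \;\overset{\text{iid}}{\sim}\; (1-\epsilon)\,\chi^2_2 \;+\; \epsilon\,\chi^2_1(\mu^2),
\qquad \epsilon = N^{-\beta}, \quad \mu = \sqrt{2r\log N}, \quad 0 < \beta, r < 1,
$$

so that $-\log P_i \sim \mathrm{Exp}(1)$ exactly for the null coordinates, while the non-null coordinates follow the (approximately) noncentral chisquared component on one degree of freedom.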

Comparing Large Frequency Tables with Sensitivity to Possible Rare and Weak Differences

My recent work shows that the HC test has interesting optimality properties when applied to detect differences between discrete distributions. In the most challenging situation, when the difference between the distributions is sparse and weak, the adapted HC test exhibits a phase transition phenomenon.

It turns out that HC performs better than any other known test in terms of this phase transition analysis. Based on this insight, I developed and applied an HC-based test to classify text documents and detect authorship; see this page for more details. When applied to authorship attribution challenges, this technique performs as well as state-of-the-art methods, but without handcrafting or tuning.

Estimation from compressed data

This project considers the performance of estimation and learning from datasets that have undergone bit-level data compression. In general, it is difficult to apply existing results from statistics and learning theory to this situation because of the non-linearity introduced by the compression algorithm. To address this issue, we characterize the distributional connection between the lossy compressed representation of a high-dimensional signal using a random spherical code and the observation of this signal under additive white Gaussian noise (AWGN). Our characterization shows that, for a large class of estimators, the risk in estimating from the lossy compressed data can be obtained by analyzing the risk in estimating from an AWGN-corrupted version of the data. We demonstrate the usefulness of this connection by deriving many novel results for inference problems under compression constraints, including standard regression, sparse regression, and compressed sensing under quantization or communication constraints.
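
The following toy simulation sketches the flavor of this connection under assumptions of my own choosing (a short block length, a standard Gaussian signal, and rate R = 1 bit per sample), so the agreement is only approximate: a random spherical codebook compresses the signal, and the second-moment structure of the (signal, reproduction) pair is compared with what the Gaussian test channel, i.e., an AWGN-style observation model, would predict.

```python
import numpy as np

rng = np.random.default_rng(1)

n, R, reps = 16, 1.0, 100        # block length, rate in bits per sample, trials
M = int(2 ** (n * R))            # codebook size 2^{nR}
D = 2.0 ** (-2 * R)              # Gaussian distortion-rate level D = 2^{-2R}

errs, corrs = [], []
for _ in range(reps):
    x = rng.normal(size=n)       # unit-variance Gaussian signal
    # random spherical codebook: M points uniform on the sphere of radius sqrt(n(1-D))
    C = rng.normal(size=(M, n))
    C *= np.sqrt(n * (1 - D)) / np.linalg.norm(C, axis=1, keepdims=True)
    xhat = C[np.argmax(C @ x)]   # encode by maximum correlation (nearest codeword)
    errs.append(np.mean((x - xhat) ** 2))
    corrs.append(np.mean(x * xhat))

print("empirical distortion  :", round(float(np.mean(errs)), 3), " test-channel value:", D)
print("empirical E[X * Xhat] :", round(float(np.mean(corrs)), 3), " test-channel value:", 1 - D)
```

At this small block length the numbers only roughly match the test-channel values; the project's results make the correspondence precise in high dimensions and, crucially, let the risk of estimators applied to compressed data be read off from the corresponding AWGN problem.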

Analog-to-digital compression

My PhD work provided a unified treatment of processing data under three fundamental information-inhibiting processes: sampling, bit-level data compression, and random noise. While the effect of each of these processes on its own was well understood, my work shows that their combination undermines standard conceptions. The highlight of this work is a fundamental result showing that sampling at a rate smaller than the Nyquist rate is optimal when the compression/quantization bitrate of the digital representation is restricted. This restriction may be the result of limited memory or communication rates, as in self-driving cars, or of limited processing power, as in most modern data science applications.

In other words, while sampling at or above the Nyquist rate is needed for optimality when the bitrate is unlimited (which is never the case in practice), for each finite bitrate there exists a critical sampling rate, smaller than the Nyquist rate and dependent on the bitrate, at which sampling already leads to optimal performance under the bitrate constraint.
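
Schematically, and in my own notation rather than the papers': for a stationary source with Nyquist rate $f_{\mathrm{Nyq}}$, write $D(f_s, R)$ for the minimal mean squared error attainable when sampling at rate $f_s$ and encoding the samples at $R$ bits per unit time. The result described above says that for every finite $R$ there is a critical rate $f_R \le f_{\mathrm{Nyq}}$ such that

$$
D(f_s, R) \;=\; \min_{f_s'} D(f_s', R) \;=\; D(R) \qquad \text{for all } f_s \ge f_R,
$$

where $D(R)$ is the distortion-rate function of the source; sampling faster than $f_R$ buys nothing once the bitrate is fixed.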

Contact

C.121b
Efi Arazi School of Computer Science
Reichman University (IDC Herzliya)
Herzliya, Israel

alon.kipnis@runi.ac.il
kipnisal@alumni.stanford.edu