Ofir Lindenbaum - Machine Learning for Scientific Discovery


The growth of computational resources in the natural sciences motivates the use of machine learning for automated scientific discovery. However, unstructured empirical datasets are often high dimensional, unlabeled, and imbalanced. Therefore, discarding irrelevant (i.e., noisy and information-poor) features is essential for the automated discovery of governing parameters in scientific environments. To address this challenge, I will present Stochastic Gates (STG), which rely on a Gaussian-based probabilistic relaxation of the L0 norm, i.e., the number of selected features. By applying the stochastic gates to a neural network's input layer, I will derive a flexible, fully differentiable model that simultaneously identifies the most relevant features and learns a complex nonlinear model. The STG neural network outperforms state-of-the-art feature selection methods, both in predictive power and in its ability to recover the correct subset of informative features. The model was successfully applied to critical biological tasks such as the Cox proportional hazards model and differential expression analysis on HIV and melanoma patients. Next, using a linear model, I will provide a theoretical basis for optimizing the STG objective using small batches (i.e., SGD). In particular, I will present an approximation bound for estimating an unknown signal from noisy observations. Finally, I will show an extension of the STG model to unsupervised feature selection. The new model is trained to select features that correlate strongly with the leading eigenvectors of a gated graph Laplacian. The gating mechanism allows us to re-evaluate the Laplacian for different subsets of features and unmask informative structures buried by nuisance features. I will demonstrate that the proposed approach outperforms several unsupervised feature selection baselines.
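To give a sense of the mechanism, here is a minimal NumPy sketch of the Gaussian-based gating described above. It is an illustration written for this announcement, not the speaker's implementation: each feature d gets a learnable parameter mu_d, a gate z_d is sampled as a clipped Gaussian, and the expected number of open gates serves as the differentiable surrogate for the L0 penalty. The function names, the fixed sigma = 0.5, and the 0.5 shift inside the clipping are assumptions of this sketch.

```python
import numpy as np
from math import erf, sqrt

def gaussian_cdf(x):
    """Standard normal CDF, Phi(x)."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def sample_gates(mu, sigma=0.5, rng=None):
    """Sample one gate per feature: z_d = clip(mu_d + 0.5 + eps_d, 0, 1),
    with eps_d ~ N(0, sigma^2). Gates near 1 keep a feature, near 0 drop it.
    The clipped-Gaussian form is a relaxation of a Bernoulli gate that
    admits gradients with respect to mu via the reparameterization trick."""
    rng = np.random.default_rng(0) if rng is None else rng
    eps = rng.normal(0.0, sigma, size=np.shape(mu))
    return np.clip(np.asarray(mu) + 0.5 + eps, 0.0, 1.0)

def expected_open_gates(mu, sigma=0.5):
    """Differentiable surrogate for the L0 norm:
    sum_d P(z_d > 0) = sum_d Phi((mu_d + 0.5) / sigma)."""
    return sum(gaussian_cdf((m + 0.5) / sigma) for m in np.asarray(mu))

# Hypothetical usage: gate the input layer of a network f, then minimize
#   task_loss(f(x * z), y) + lam * expected_open_gates(mu)
mu = np.array([5.0, -5.0, 0.0])   # strongly open, strongly closed, undecided
z = sample_gates(mu)
reg = expected_open_gates(mu)
```

During training, gradients of the regularizer push each mu_d toward closing its gate, while gradients of the task loss keep gates open for features the predictor actually needs; at test time the gates can be thresholded to obtain a hard feature subset.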

Date and Time: 
Thursday, January 21, 2021 - 13:30 to 14:30
Speaker: 
Ofir Lindenbaum
Location: 
Zoom
Speaker Bio: 

Ofir Lindenbaum is a term assistant professor at Yale University, working with Prof. Ronald R. Coifman. He received his B.Sc. in Electrical Engineering and Physics (both summa cum laude) from the Technion. He earned his M.Sc. and Ph.D. in Electrical Engineering from Tel Aviv University. His research focuses on the theory and practice of machine learning, with the goal of enabling the practical use of machine learning algorithms for scientific discovery. He is currently working on problems related to feature selection, feature extraction, and generative modeling. Ofir is the recipient of several awards, including the Weinstein prize for graduate studies and the Trotsky foundation award for outstanding Ph.D. students.