Alon Kipnis - Detecting Some Human Edits of AI-Generated Text with Information Theory and Higher Criticism

We address the question of whether a given article is entirely the output of a generative language model or includes significant edits by a different author, possibly a human. For this problem, we develop a detection method that applies a perplexity test for the origin of each individual sentence and combines these many tests into a global test of significance using Higher Criticism (HC). As a by-product, we can identify sentences or other text chunks suspected of having been generated by a mechanism other than the language model. Our method is motivated by a statistical model for edited text in which most sentences are generated by sampling from a specific language model, except perhaps for a few sentences that originated via a different mechanism. We use synthetic data and real-world examples to demonstrate that our method effectively distinguishes between machine-generated and human-edited documents and identifies the parts of documents that were edited by humans.
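To make the combining step concrete, here is a minimal sketch of one common form of the Higher Criticism statistic applied to per-sentence P-values. It is an illustration under stated assumptions, not the speaker's implementation: the function name `higher_criticism`, the `gamma` parameter, and the choice of normalization are assumptions, and the P-values are taken as given (how to calibrate a perplexity test into a P-value is not shown here).

```python
import math


def higher_criticism(pvalues, gamma=0.5):
    """Combine per-sentence P-values into one Higher Criticism score.

    Hypothetical helper for illustration. Uses the variant
        HC = max_{1 <= i <= gamma*n} sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i)))
    where p_(1) <= ... <= p_(n) are the sorted P-values.

    Returns (hc, flagged), where `flagged` lists the original indices of the
    P-values at or below the one that attains the maximum.
    """
    n = len(pvalues)
    if n == 0:
        raise ValueError("need at least one P-value")

    # Indices of the sentences, ordered from smallest to largest P-value.
    order = sorted(range(n), key=lambda k: pvalues[k])

    # Only the smallest gamma-fraction of P-values is scanned.
    limit = max(1, int(gamma * n))

    best_hc, best_i = -math.inf, 0
    for i in range(1, limit + 1):
        p = pvalues[order[i - 1]]
        # Clip to avoid division by zero for P-values of exactly 0 or 1.
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        if z > best_hc:
            best_hc, best_i = z, i

    # Sentences whose P-values fall at or below the maximizing one are suspects.
    flagged = sorted(order[:best_i])
    return best_hc, flagged
```

A large score suggests that at least a few sentences are inconsistent with the language model, and the flagged indices point to the sentences responsible. Deciding how large is "large" requires a calibration step that this sketch leaves out.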

Date and Time: Thursday, May 18, 2023 - 13:30 to 14:30
Speaker: Alon Kipnis
Location: C110
Speaker Bio: 

Alon Kipnis is a Senior Lecturer at the School of Computer Science at Reichman University. He received his Ph.D. in electrical engineering from Stanford University in 2017. From 2017 to 2021, he was a postdoctoral research fellow and a lecturer in the Department of Statistics at Stanford University, hosted by David Donoho and funded by the Koret Foundation. His research interests include mathematical statistics, statistical learning, and information theory.