Colloquium

The IDC CS Colloquium
 

Alon Kipnis - Detecting Some Human Edits of AI-Generated Text with Information Theory and Higher Criticism

We address the question of whether a given article is entirely the output of a generative language model or includes significant edits by a different author, possibly a human. For this problem, we develop a detection method that applies many perplexity tests to the origin of individual sentences and combines these tests into a global test of significance using Higher Criticism (HC). As a by-product, we can identify sentences or other text chunks suspected of having been generated by a mechanism other than the language model.
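As a rough illustration of how many per-sentence tests can be combined into one global statistic, the sketch below computes one common variant of the Higher Criticism statistic from a list of p-values. It is not the speaker's implementation: the function name `higher_criticism`, the `gamma` cutoff, and the clamping constant are assumptions made for this example, and the per-sentence p-values are taken as given rather than derived from perplexity.

```python
import math


def higher_criticism(p_values, gamma=0.5):
    """Return (HC statistic, p-value threshold) for a list of p-values.

    Illustrative sketch of one common HC variant; `gamma` (the fraction of
    smallest p-values searched) is an assumed default, not from the talk.
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("need at least one p-value")

    sorted_p = sorted(p_values)
    limit = max(1, int(gamma * n))  # search only the smallest p-values

    best_hc = -math.inf
    best_index = 1
    for i in range(1, limit + 1):
        # Clamp to avoid division by zero for p-values of exactly 0 or 1.
        p = min(max(sorted_p[i - 1], 1e-12), 1 - 1e-12)
        # Standardized gap between the observed fraction i/n and p_(i).
        hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1 - p))
        if hc > best_hc:
            best_hc = hc
            best_index = i

    # p-values at or below this threshold are the ones driving the statistic.
    return best_hc, sorted_p[best_index - 1]


# Hypothetical per-sentence p-values: mostly unremarkable, a few very small.
hc_value, threshold = higher_criticism([1e-6, 1e-5, 1e-4, 0.2, 0.4, 0.5, 0.7, 0.9])
suspected = [p for p in [1e-6, 1e-5, 1e-4, 0.2, 0.4, 0.5, 0.7, 0.9] if p <= threshold]
```

A large HC value suggests that a small subset of sentences departs from what the language model would produce, and the returned threshold gives a simple way to flag which sentences those are.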

18/05/2023 - 13:30

Aviv Gaon - Through the Looking Glass: The Hidden Impacts of Data Regulation

Developing Artificial Intelligence systems requires access to masses of data. This notion is common knowledge for computer engineers and data analysts. Data regulation is essential to ensure our safety, privacy, and ownership rights. Regulating the amount of data, its quality, and the priority with which organizations can access it is paramount. However, as with other areas of law, regulation could produce unwanted results. One impact of data regulation is incentivizing the use of low-quality data that often exhibits bias.

01/06/2023 - 13:30

Ran Balicer - AI-Driven Healthcare: Innovation in practice

Health systems around the world are increasingly recognizing that the status quo is unsustainable and that health services must be redesigned for the current era.

Accelerating technological progress creates opportunities, but rigid systems struggle to adapt.

30/03/2023 - 13:30

Tal Shapira - FlowPic: Encrypted Internet Traffic Classification is as Easy as Image Recognition

Internet traffic classification has been intensively studied over the past decade due to its importance for traffic engineering and cyber security. However, identifying the type of a network flow or a specific application has become harder in recent years due to the use of encryption, e.g., by VPN and Tor.

11/05/2023 - 13:30

Teddy Lazebnik - A SAT-Pruned Explainable Machine Learning Model To Predict Acute Kidney Injury Following Open Partial Nephrectomy Treatment

A decision tree (DT) is one of the most popular and efficient techniques in data mining. Specifically, in the clinical domain, DTs have been widely used thanks to their relatively easy-to-explain nature, efficient computation time, and relatively accurate predictions. However, some DT construction algorithms may produce a large tree structure that is difficult to understand and often leads to misclassification of data in the testing process due to poor generalization.

10/11/2022 - 13:30

Meitar Ronen - DeepDPM: Deep Clustering With an Unknown Number of Clusters

Deep Learning (DL) has shown great promise in the unsupervised task of clustering. That said, while in classical (i.e., non-deep) clustering the benefits of the nonparametric approach are well known, most deep-clustering methods are parametric: namely, they require a predefined and fixed number of clusters, denoted by K. When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times.

26/05/2022 - 13:30