Data science has become prevalent in various fields that affect day-to-day life, such as healthcare, banking, and the job market. The process of developing data science applications usually involves several automatic systems that manipulate and prepare the data in different ways. Examples of such automatic manipulation and preparation include generating synthetic data, exploring data, repairing it, and labeling it for machine learning. These systems can be highly complex, and even data scientists can find it difficult to understand and verify their output. Moreover, uninformed use of these systems can lead to errors that may degrade the quality of the results of such applications.
In the talk, I will highlight prominent challenges in the data science process and present three approaches for addressing them. In particular, I will present a solution that generates natural language explanations for query results, a tool for generating synthetic linked data, and a solution that explains complex queries using abstract database instances.
Amir Gilad is a postdoctoral researcher in the Database group at Duke University. He received his Ph.D. in Computer Science from Tel Aviv University under the supervision of Prof. Daniel Deutch. His work focuses on developing tools and algorithms that assist users in understanding and gaining insights into data and the systems that manipulate it. His research relates to classic database tools such as data provenance, as well as natural language processing, causal inference, and privacy. Amir is the recipient of the VLDB best paper award, the SIGMOD research highlight award, and the Google Ph.D. fellowship for Structured Data and Database Management.