A decision tree (DT) is one of the most popular and efficient techniques in data mining. Specifically, in the clinical domain, DTs have been widely used thanks to their relatively easy explainable nature, efficient computation time, and relatively accurate predictions. However, some DT constriction algorithms may produce a large tree-size structure which is difficult to understand and often leads to misclassification of data in the testing process due to poor generalization. Post pruning (PP) algorithms have been introduced to reduce the size of the tree structure with a minor (or not at all) decrease in the accuracy of classification while trying to improve the model's generalization. We propose a new Boolean satisfiability (SAT) based PP algorithm called SAT-PP. Our algorithm reduces the tree size while preserving the accuracy of the unpruned tree. We implemented our algorithm on medical-related classification data sets since in medical-related tasks we emphatically try to avoid decreasing the model's performance when better training is not an option. We compared the proposed algorithm with other PP algorithms and found similar generalization capabilities. Moreover, we show an application of SAT-PP for the prediction of acute kidney injury following open partial nephrectomy treatment.
Dr. Teddy Lazebnik is a Postdoctoral Research Associative at the Cancer Biology Department, Cancer Institute, University College London (UCL), UK. He obtained a PhD in Biomathematics from Ariel University following an MSc and BSc in Applied Mathematics from Bar Ilan University, Israel. Lazebnik's research focus is personalized treatment protocols on the individual and community level. He co-authored over 20 papers on this and similar topics. He also guest-edited several special issues and served as a reviewer for dozens of journal articles.