PD-Explain: A Unified Python-native Framework for Query Explanations Over DataFrames

Abstract

Interfaces built on the Python programming language have become a popular tool for data analysis and exploration. In particular, the Pandas library allows users to query, manipulate, and visualize data easily and intuitively. However, users who perform such manipulations during exploration may struggle to justify their results, or to understand which parts (if any) of a result are interesting and why. To address such scenarios, we developed PD-Explain, a Python library that adapts multiple prevalent query explanation approaches from the literature and makes them accessible to Pandas users. PD-Explain integrates seamlessly with Pandas and provides explanation functions that let users choose an explanation approach, set its parameters, and receive the explanation in a suitable form. PD-Explain also lets users automatically detect the interesting parts of a query result and obtain a visualization of the explanation accompanied by a natural-language description. Our demonstration will include four types of query result explanations and three real-world datasets, each paired with analysis tasks that highlight the intuitive nature and usefulness of PD-Explain in data exploration.
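The abstract does not specify PD-Explain's own API or scoring functions. The following is only a generic, hypothetical sketch of one common idea behind automatically flagging the interesting parts of a query result: measure how much each column's value distribution shifts between the input DataFrame and the query output. The functions `column_deviation` and `rank_interesting_columns` are illustrative names and are not part of PD-Explain. Total variation distance is an assumed choice of shift measure.

```python
import pandas as pd


def column_deviation(before: pd.Series, after: pd.Series) -> float:
    """Hypothetical helper (not PD-Explain API): total variation distance
    between a column's value distribution before and after a query.
    0 means unchanged, 1 means the two distributions are disjoint."""
    p = before.value_counts(normalize=True)
    q = after.value_counts(normalize=True)
    # Align both distributions on the union of observed values; a value
    # missing from one side gets probability 0.
    p, q = p.align(q, fill_value=0.0)
    return float((p - q).abs().sum() / 2)


def rank_interesting_columns(df: pd.DataFrame, result: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: score each column of a filter result by how far
    its distribution moved relative to the input, most-changed first."""
    scores = {col: column_deviation(df[col], result[col]) for col in result.columns}
    return pd.Series(scores).sort_values(ascending=False)


# Usage: a toy filter query over a small DataFrame.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "tier": ["gold", "silver", "gold", "silver"],
})
result = df[df["city"] == "A"]
scores = rank_interesting_columns(df, result)
print(scores)  # "city" shifts by 0.5, "tier" is unchanged (0.0)
```

A real explanation system would normally add significance testing, handle numeric columns through binning, and render the top-ranked columns as a chart with a textual summary. This sketch shows only the ranking step.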

Publication
In PVLDB 17(12), 2024
Amir Gilad
Assistant Professor

The Hebrew University