Hubert Baniecki

University of Warsaw | MI2.AI, Warsaw University of Technology

h.baniecki (at) uw.edu.pl

I am a final-year (4th year) PhD student in Computer Science at the University of Warsaw, advised by Przemyslaw Biecek. During my PhD, I have been a visiting researcher at LMU Munich, hosted by Bernd Bischl (2024) and Eyke Hüllermeier (2025). Previously, I earned a Master’s degree in Data Science from Warsaw University of Technology.

My research focuses on machine learning interpretability and explainable AI:

  • interpreting vision–language models like CLIP (ICML 2025, NeurIPS 2025),
  • statistical foundations of explainable machine learning (ECML 2024, ICLR 2025),
  • open-source software and benchmarks (JMLR 2021, NeurIPS 2024),
  • applications in medicine and beyond (PNAS 2024, WACV 2025).

I actively contribute to the academic community by serving as a reviewer for conferences such as NeurIPS, ECML, and ICLR, including their workshops on interpretability and XAI, and for journals such as the Journal of Machine Learning Research, Machine Learning, and Nature Communications.

I am on the job market for positions starting in 2026/2027; feel free to contact me if there is an opportunity that fits.


recent news [previous]

2025 Sep Our paper "Explaining similarity in vision-language encoders with weighted Banzhaf interactions" was accepted at NeurIPS 2025.
2025 Sep Our paper "Birds look like cars: Adversarial analysis of intrinsically interpretable deep learning" was accepted for publication in the Machine Learning journal.
2025 May The Foundation for Polish Science awarded me the START scholarship for young scientists.
2025 May Our paper "Interpreting CLIP with hierarchical sparse autoencoders" was accepted at ICML 2025.
2025 Mar I am staying in Germany until April for a one-month research visit at LMU Munich, hosted by Eyke Hüllermeier.
2025 Jan Our paper "Efficient and accurate explanation estimation with distribution compression" was accepted as a Spotlight at ICLR 2025 (top 5% of submissions).
2024 Nov Our paper "Increasing phosphorus loss despite widespread concentration decline in US rivers" was published in the Proceedings of the National Academy of Sciences.

selected publications [full list]

    1. Explaining similarity in vision-language encoders with weighted Banzhaf interactions
      H. Baniecki, M. Muschalik, F. Fumagalli, B. Hammer, E. Hüllermeier, P. Biecek
      Advances in Neural Information Processing Systems (NeurIPS), 2025
      We introduce faithful interaction explanations of CLIP and SigLIP models (FIxLIP), offering a unique game-theoretic perspective on interpreting image–text similarity predictions.
    2. Interpreting CLIP with hierarchical sparse autoencoders
      V. Zaigrajew, H. Baniecki, P. Biecek
      International Conference on Machine Learning (ICML), 2025
      We introduce the Matryoshka sparse autoencoder (MSAE) that establishes a state-of-the-art Pareto frontier between reconstruction quality and sparsity for interpreting CLIP models.
    3. Efficient and accurate explanation estimation with distribution compression
      H. Baniecki, G. Casalicchio, B. Bischl, P. Biecek
      International Conference on Learning Representations (ICLR), 2025 (Spotlight)
      We introduce compress then explain (CTE) as a new paradigm for sample-efficient estimation of post-hoc explanations, including feature attributions, importance, and effects.
    4. Aggregated attributions for explanatory analysis of 3D segmentation models
      M. Chrabaszcz*, H. Baniecki*, P. Komorowski, S. Plotka, P. Biecek (* equal contribution)
      IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2025 (Oral)
      We discover knowledge acquired by the TotalSegmentator foundation model trained to segment all anatomical structures in computed tomography medical images.
    5. shapiq: Shapley interactions for machine learning
      M. Muschalik, H. Baniecki, F. Fumagalli, P. Kolpaczki, B. Hammer, E. Hüllermeier
      Advances in Neural Information Processing Systems (NeurIPS), 2024
      We develop shapiq, an open-source Python package that implements several algorithms and benchmarks for efficiently approximating game-theoretic attribution and interaction indices.
    6. Increasing phosphorus loss despite widespread concentration decline in US rivers
      W. Zhi, H. Baniecki, J. Liu, E. Boyer, C. Shen, G. Shenk, X. Liu, L. Li
      Proceedings of the National Academy of Sciences, 2024
      We reveal a paradox in US rivers using deep learning: phosphorus concentrations have declined over the last 40 years, particularly in urban areas, yet total phosphorus loss has increased due to climate change.
    7. On the robustness of global feature effect explanations
      H. Baniecki, G. Casalicchio, B. Bischl, P. Biecek
      European Conference on Machine Learning (ECML PKDD), 2024
      We derive theoretical bounds on the robustness of feature effect explanations to data and model perturbations.
    8. dalex: Responsible machine learning with interactive explainability and fairness in Python
      H. Baniecki, W. Kretowicz, P. Piatyszek, J. Wisniewski, P. Biecek
      Journal of Machine Learning Research, 2021
      Received the 2022 John M. Chambers Statistical Software Award from the American Statistical Association.
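
      A minimal sketch of how dalex is typically used, for readers unfamiliar with the package. It assumes scikit-learn is installed and uses the titanic dataset bundled with dalex; the column subset and label are illustrative choices, not part of the paper.

          # Wrap a fitted model in a dalex Explainer and compute global and local explanations.
          import dalex as dx
          from sklearn.ensemble import RandomForestClassifier

          titanic = dx.datasets.load_titanic()
          X = titanic[["age", "fare", "sibsp", "parch"]]  # illustrative numeric subset
          y = titanic["survived"]

          clf = RandomForestClassifier(random_state=0).fit(X, y)
          exp = dx.Explainer(clf, X, y, label="titanic_rf")

          exp.model_performance()         # global performance metrics
          exp.model_parts()               # permutation-based feature importance
          exp.predict_parts(X.iloc[[0]])  # local break-down attribution for one passenger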