Contrastive Explanations for Model Interpretability
Alon Jacovi, Swabha Swayamdipta, Shauli Ravfogel, Yanai Elazar, Yejin Choi, Yoav Goldberg
A paper about explaining classifier decisions contrastively against alternative decisions.
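One simple way to make an explanation contrastive is to score the input's representation along the direction separating the predicted class from an alternative ("foil") class. This is a minimal linear sketch of that idea, not necessarily the paper's exact method; the toy classifier and variable names are mine:

```python
import numpy as np

def contrastive_score(h, W, fact, foil):
    """Score how much representation h supports class `fact` over class `foil`.

    The logit gap between the two classes is h @ (W[fact] - W[foil]), so
    u = W[fact] - W[foil] is the axis along which the model distinguishes
    the two decisions.
    """
    u = W[fact] - W[foil]
    return float(h @ u)

# toy linear classifier with 3 classes over a 4-dim representation
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
h = rng.normal(size=4)

gap = contrastive_score(h, W, fact=0, foil=2)
# gap > 0 means the model prefers class 0 over class 2 for this input
assert np.isclose(gap, (W[0] - W[2]) @ h)
```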
Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI
Alon Jacovi, Ana Marasović, Tim Miller, Yoav Goldberg
In ACM FAccT 2021.
We formalize what “trust” and “trustworthiness” mean in the context of humans trusting AI (where AI is any “smart” automation which is attributed with social intent by the human). This paper is meant as a foundational definition of trust in AI, how to cause it, and how to verify that the trust should exist.
Scalable Evaluation and Improvement of Document Set Expansion via Neural Positive-Unlabeled Learning
Alon Jacovi, Gang Niu, Yoav Goldberg, Masashi Sugiyama.
In EACL 2021.
We apply state-of-the-art PU learning methods to large neural models for the document set expansion (DSE) task and find that they fail, so we propose modifications to PU learning that perform well with large models and small quantities of labeled data, with strong results. We also propose a new method for evaluating DSE models, which scales to a large number of topics.
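A representative state-of-the-art PU objective is the non-negative PU risk estimator (nnPU; Kiryo et al., 2017), which treats unlabeled data as negatives but clamps the corrected negative-risk term at zero. This is a sketch of that estimator only, not necessarily the paper's exact variant; the sigmoid surrogate loss and synthetic scores are my choices:

```python
import numpy as np

def sigmoid_loss(margins):
    # surrogate loss l(z) = sigmoid(-z); small when the margin z is large
    return 1.0 / (1.0 + np.exp(margins))

def nn_pu_risk(scores_p, scores_u, pi):
    """Non-negative PU risk (Kiryo et al., 2017) for binary scores g(x).

    scores_p: classifier outputs on labeled-positive examples
    scores_u: classifier outputs on unlabeled examples
    pi:       assumed class prior P(y = +1)
    """
    risk_p_pos = sigmoid_loss(scores_p).mean()    # positives labeled +1
    risk_p_neg = sigmoid_loss(-scores_p).mean()   # positives labeled -1
    risk_u_neg = sigmoid_loss(-scores_u).mean()   # unlabeled labeled -1
    # clamp the estimated negative risk at zero to prevent it going
    # negative, which is what lets large models overfit in plain uPU
    return pi * risk_p_pos + max(0.0, risk_u_neg - pi * risk_p_neg)

rng = np.random.default_rng(0)
risk = nn_pu_risk(rng.normal(1.0, 1.0, 100),
                  rng.normal(-0.5, 1.0, 400), pi=0.3)
assert risk >= 0.0
```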
Aligning Faithful Interpretations with their Social Attribution.
Alon Jacovi, Yoav Goldberg.
In TACL 2020.
An exciting (to me!) new formalization of explanations of artificial model decisions, with highlight explanations as a motivating case study.
Amnesic Probing: Behavioral Explanations with Amnesic Counterfactuals.
Yanai Elazar, Shauli Ravfogel, Alon Jacovi, Yoav Goldberg.
In TACL 2020.
A behavioral, causality-inspired analysis of the linguistic knowledge utilized by masked language models: we use nullspace projection to remove a property's information from the representation, and then observe the change in the model's output.
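The nullspace-projection step can be sketched in a few lines of linear algebra. Given a linear probe W trained to predict the property, projecting representations onto the nullspace of W removes every direction the probe uses. This is a minimal illustration with random data; the function and variable names are mine:

```python
import numpy as np

def nullspace_projection(W):
    """Projection matrix onto the nullspace of a linear probe W (k x d).

    Applying P to a representation zeroes out every direction the probe
    reads, so the probed property is no longer linearly recoverable.
    """
    return np.eye(W.shape[1]) - np.linalg.pinv(W) @ W

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))   # probe predicting a 2-dim property
h = rng.normal(size=8)        # some contextual representation
P = nullspace_projection(W)

h_amnesic = P @ h
# the probe reads ~0 from the projected ("amnesic") representation
assert np.allclose(W @ h_amnesic, 0.0, atol=1e-8)
```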
Exposing Shallow Heuristics of Relation Extraction Models with Challenge Data
Shachar Rosenman, Alon Jacovi, Yoav Goldberg.
In EMNLP 2020.
We show that Relation Classification models often rely on shallow heuristics, and create a challenge set of natural sentences where these heuristics do not apply or are misleading, on which current state-of-the-art models fail.
Towards Faithfully Interpretable NLP Systems: How should we define and evaluate faithfulness?
Alon Jacovi, Yoav Goldberg.
In ACL 2020.
We survey the available literature and present a position on the faithfulness attribute of artificial model interpretations. This involves guidelines on faithfulness evaluation, a survey and meta-analysis of work in the area, and an opinion on what is missing in current work and how to move forward.
Improving Task-Oriented Dialogue Systems In Production with Conversation Logs
When dialogue agents are deployed in production, their failure cases are escalated to a human agent who handles the issue. These escalated conversation logs are often recorded to help the system developer improve the system. We propose a way of using these logs, often available “for free” in production, to automatically generate system-improvement recommendations for the developer.
Neural network gradient-based learning of black-box function interfaces
Alon Jacovi, Guy Hadash, Einat Kermany, Boaz Carmeli, Ofer Lavi, George Kour, Jonathan Berant.
In ICLR 2019 and in AI Week 2019.
We propose a modular neural framework for integrating black-box (non-differentiable) oracle functions into the neural model, such that the new model can interface directly with the black-box function during inference.
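The core pattern is "estimate and replace": at training time, gradients flow through a differentiable estimator that mimics the black box, so the upstream network learns to produce valid arguments; at inference time, the true black box is swapped back in. This toy sketch hard-codes the weights a trained estimator of addition would converge to, purely for illustration; all names are mine:

```python
import numpy as np

def black_box(a, b):
    # non-differentiable oracle (e.g., an external API); here, addition
    return a + b

class Estimator:
    """Differentiable stand-in trained to mimic the black box.

    A real system would fit self.w to (input, black_box(input)) pairs;
    here we hard-code the weights such a fit would converge to.
    """
    def __init__(self):
        self.w = np.array([1.0, 1.0])

    def __call__(self, a, b):
        return self.w @ np.array([a, b])

def run(args_net_output, train=True):
    a, b = args_net_output
    # training: gradients flow through the estimator into the argument
    # extractor; inference: the true black box replaces the estimator
    return Estimator()(a, b) if train else black_box(a, b)

assert run((2.0, 3.0), train=True) == run((2.0, 3.0), train=False) == 5.0
```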
Understanding Convolutional Neural Networks for Text Classification
We analyze CNNs trained to classify text, arriving at various conclusions on the learned behavior of each of the filters in the network. This allows us to propose model interpretation and prediction interpretation techniques for such models.
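A common filter-interpretation technique in this spirit is to find, for each convolutional filter, the n-gram in a text that maximally activates it (the window the max-pooling layer selects). This is an illustrative sketch with random embeddings and filters, not the paper's exact procedure; all names are mine:

```python
import numpy as np

def top_ngrams_per_filter(embeddings, tokens, filters, width=3):
    """For each 1-D conv filter, return the n-gram that activates it most.

    embeddings: (seq_len, dim) token vectors
    filters:    (n_filters, width * dim) flattened convolution filters
    """
    seq_len, dim = embeddings.shape
    # slide a window of `width` tokens and flatten each into one vector
    windows = np.stack([embeddings[i:i + width].ravel()
                        for i in range(seq_len - width + 1)])
    acts = windows @ filters.T        # (n_windows, n_filters)
    best = acts.argmax(axis=0)        # position max-pooling would pick
    return [" ".join(tokens[i:i + width]) for i in best]

rng = np.random.default_rng(0)
toks = "the movie was surprisingly good overall".split()
emb = rng.normal(size=(len(toks), 4))
filts = rng.normal(size=(2, 3 * 4))

grams = top_ngrams_per_filter(emb, toks, filts)
# one maximally-activating trigram per filter
assert len(grams) == 2
```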