Understanding Black-box Predictions via Influence Functions

Pang Wei Koh and Percy Liang. Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 1885-1894, 2017.

How can we explain the predictions of a black-box model? In this paper, we use influence functions -- a classic technique from robust statistics -- to trace a model's prediction through the learning algorithm and back to its training data, thereby identifying the training points most responsible for a given prediction. A classic result tells us that the influence of upweighting a training point z on the parameters \hat{\theta} is given by the formula derived below.
This paper applies influence functions to neural networks, taking advantage of the accessibility of their gradients. On linear models and convolutional neural networks, the authors demonstrate that influence functions are useful for multiple purposes: understanding model behavior, debugging models, detecting dataset errors, and even creating visually-indistinguishable training-set attacks.
nimarb/pytorch_influence_functions - GitHub

Influence functions are a classic technique from robust statistics to identify the training points most responsible for a given prediction, and this repository implements them in PyTorch. Results are collected in a dict keyed by test sample (see the repository README for the exact structure). Harmful is a list of numbers: the IDs of the training data samples that most hurt the prediction outcome of the processed test sample. Helpful is the analogous list of the training samples that helped the prediction the most.
Both lists are computed per test sample, i.e. when calculating the influence of the training images on that single test image. The config object exposes the parameters of the underlying approximation, which you can change to your liking. (A Chainer implementation also exists; it requires Chainer v3 and uses FunctionHook.) A minimal usage sketch follows below.
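As a rough illustration, here is a minimal usage sketch based on the repository's README. The entry points `init_logging`, `get_default_config`, and `calc_img_wise` and their signatures are taken from the README and may differ between versions, and `get_my_model`/`get_my_dataloaders` are placeholders you must supply yourself:

```python
import pytorch_influence_functions as ptif

# Placeholders -- supply your own trained model and PyTorch dataloaders.
model = get_my_model()
trainloader, testloader = get_my_dataloaders()

ptif.init_logging()
config = ptif.get_default_config()  # exposes the tunable approximation parameters

# calc_img_wise computes influence scores per test image and returns the
# harmful/helpful ID rankings described above.
influences, harmful, helpful = ptif.calc_img_wise(config, model,
                                                  trainloader, testloader)
```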
Because influence scores rank training points by how much they matter, you can also use them to compress your dataset slightly, keeping only the images most influential for test-time predictions (see the sketch below).
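A toy version of that selection step might look like this; the `influences` array here is a random stand-in for scores produced by the library:

```python
import numpy as np

rng = np.random.default_rng(0)
influences = rng.normal(size=1000)  # stand-in for per-sample influence scores

k = 100  # keep the k most influential training samples (by magnitude)
keep_ids = np.argsort(-np.abs(influences))[:k]
print("training-sample IDs to keep:", keep_ids[:10])
```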
Data-trained predictive models see widespread use, but for the most part they are used as black boxes which output a prediction or score without explaining why. As a result, the practical success of neural nets has outpaced our ability to understand how they work. The authors provide a reproducible, executable, and Dockerized version of their experiment scripts on Codalab.
Why Use Influence Functions?

Influence functions efficiently estimate the effect of removing a single training data point on a model's learned parameters, without retraining the model. To scale influence functions up to modern machine learning settings, the authors develop a simple, efficient implementation that requires only oracle access to gradients and Hessian-vector products. The degree of influence of a single training sample z on the model parameters is derived below, where ε is the weight of z relative to the other training samples.
Let $z_1, \dots, z_n$ with $z_i = (x_i, y_i)$ be the training points and $L(z, \theta)$ the loss. The empirical risk minimizer is

$$\hat{\theta} = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta).$$

Upweighting a training point z by a small ε gives the perturbed minimizer

$$\hat{\theta}_{\epsilon,z} = \arg\min_{\theta} \frac{1}{n}\sum_{i=1}^{n} L(z_i, \theta) + \epsilon L(z, \theta).$$

The influence function measures the effect of this upweighting on the parameters:

$$\mathcal{I}_{up,params}(z) = \frac{d\hat{\theta}_{\epsilon,z}}{d\epsilon}\Big|_{\epsilon=0} = -H_{\hat{\theta}}^{-1}\nabla_{\theta}L(z,\hat{\theta}),$$

where $H_{\hat\theta} = \frac{1}{n}\sum_{i=1}^{n}\nabla_\theta^{2} L(z_i,\hat\theta)$ is the Hessian of the empirical risk. Applying the chain rule gives the influence of z on the loss at a test point $z_{test}$:

$$\mathcal{I}_{up,loss}(z,z_{test}) = \frac{dL(z_{test},\hat\theta_{\epsilon,z})}{d\epsilon}\Big|_{\epsilon=0} = \nabla_\theta L(z_{test},\hat\theta)^\top \mathcal{I}_{up,params}(z) = -\nabla_\theta L(z_{test},\hat\theta)^\top H^{-1}_{\hat\theta}\nabla_\theta L(z,\hat\theta).$$

For binary logistic regression, $p(y|x)=\sigma(y\theta^\top x)$ with $\sigma$ the sigmoid, this works out to

$$\mathcal{I}_{up,loss}(z, z_{test}) = -y_{test}\,y \cdot \sigma(-y_{test}\theta^\top x_{test}) \cdot \sigma(-y\theta^\top x) \cdot x_{test}^\top H^{-1}_{\hat\theta}\, x.$$

The factor $\sigma(-y\theta^\top x)$ is large exactly when the model fits z poorly, so outliers receive high influence; and $x_{test}^\top H^{-1}_{\hat\theta} x$ weights the similarity between $x_{test}$ and $x$ by the inverse curvature (resistance/variation) of the objective, unlike a plain dot product.

Forming $H_{\hat\theta}$ explicitly costs $O(np^2 + p^3)$ for n training points and p parameters, which is infeasible for modern models. Instead, the paper computes $s_{test} = H^{-1}_{\hat\theta}\nabla_\theta L(z_{test},\hat\theta)$ using only Hessian-vector products (HVPs; Pearlmutter, "Fast exact multiplication by the Hessian"), so that $\mathcal{I}_{up,loss}(z,z_{test}) = -s_{test} \cdot \nabla_{\theta}L(z,\hat\theta)$. Two approaches:

- Conjugate gradients: solve $H_{\hat\theta}^{-1}v = \arg\min_{t}\frac{1}{2}t^\top H_{\hat\theta}t - v^\top t$, where each CG iteration needs only one HVP at $O(np)$ cost.
- Stochastic estimation (LiSSA): use the truncated Neumann series $S_j = \sum_{i=0}^{j-1}(I-H)^{i} = \big(I-(I-H)^{j}\big)H^{-1}$, which converges to $H^{-1}$ as $j \to \infty$; at each step the Hessian is estimated from a single sampled training point via $\nabla_\theta^{2} L(z_i,\hat\theta)$, and the running estimate is applied to $\nabla_\theta L(z_{test},\hat\theta)$.

Experiments: on MNIST logistic regression, influence values computed with the approximate $H^{-1}$ closely match the actual loss differences obtained by leave-one-out retraining. Comparing an Inception network against an RBF-kernel SVM on an ImageNet task shows that the two models rely on their training data in qualitatively different ways. Finally, to detect mislabeled examples, one can check each training point's influence on its own loss, $\mathcal{I}_{up,loss}(z_i, z_i)$: prioritizing review of the points with the highest self-influence surfaces label errors far faster than random checking, even when only a fraction (e.g. 10%) of the training data can be inspected.
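To make this concrete, here is a self-contained sketch (a toy setup I am assuming: synthetic data, regularized logistic regression via scipy) that computes $\mathcal{I}_{up,loss}$ exactly with an explicit Hessian and compares it against actual leave-one-out retraining; removing a point corresponds to upweighting with $\epsilon = -1/n$, so the two numbers should approximately match:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1e-2                      # toy problem size (assumed)
X = rng.normal(size=(n, d))
y = np.where(X @ rng.normal(size=d) + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def risk(theta, X, y):
    # regularized empirical risk for p(y|x) = sigmoid(y * theta^T x)
    return np.mean(np.logaddexp(0.0, -y * (X @ theta))) + 0.5 * lam * theta @ theta

def grad_z(theta, x, yi):
    # per-sample gradient of the logistic loss
    return -yi * sigmoid(-yi * (x @ theta)) * x

def fit(X, y):
    return minimize(risk, np.zeros(d), args=(X, y), method="L-BFGS-B").x

theta_hat = fit(X, y)

# Hessian of the regularized objective at theta_hat
p = sigmoid(X @ theta_hat)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)

x_test, y_test = rng.normal(size=d), 1.0
test_loss = lambda th: np.logaddexp(0.0, -y_test * (x_test @ th))

z_id = 13                                     # training point to remove
I_up_loss = -grad_z(theta_hat, x_test, y_test) @ np.linalg.solve(
    H, grad_z(theta_hat, X[z_id], y[z_id]))

# Removing z corresponds to upweighting by eps = -1/n:
predicted_change = -I_up_loss / n

# Ground truth: retrain without z_id and measure the test-loss change.
mask = np.arange(n) != z_id
actual_change = test_loss(fit(X[mask], y[mask])) - test_loss(theta_hat)
print(f"predicted {predicted_change:+.6f}  vs  actual {actual_change:+.6f}")
```

Small discrepancies are expected: the influence estimate is a first-order approximation, and leave-one-out retraining renormalizes the empirical mean over n-1 points.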
The PyTorch implementation offers two modes. The first mode is called calc_img_wise: the two values s_test and grad_z are computed on the fly for each test image when calculating the influence of the training images on that single image. s_test is dependent on the test sample(s), whereas grad_z is only dependent on the training samples. The other mode precomputes and stores grad_z for all training images first; use it if you have a fast SSD, lots of free storage space, and want to calculate the influences on the prediction outcomes of the whole test set.
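Under the hood, s_test is obtained with Hessian-vector products rather than an explicit Hessian. Below is a minimal PyTorch sketch of the double-backward HVP trick and a LiSSA-style recursion; the model, data, and the damp/scale/steps hyperparameters are illustrative assumptions, not the library's defaults:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 2)                # tiny stand-in model
params = [p for p in model.parameters() if p.requires_grad]

def flat_grad(loss, create_graph=False):
    g = torch.autograd.grad(loss, params, create_graph=create_graph)
    return torch.cat([gi.reshape(-1) for gi in g])

def hvp(loss, v):
    # Hessian-vector product via double backward (Pearlmutter's trick):
    # differentiate grad(loss) . v instead of ever forming H.
    g = flat_grad(loss, create_graph=True)
    return flat_grad(g @ v)

def batches(batch_size=8):
    while True:                                # synthetic stand-in data
        yield torch.randn(batch_size, 10), torch.randint(0, 2, (batch_size,))

def s_test(v, data, damp=0.01, scale=25.0, steps=100):
    # LiSSA recursion  s <- v + (1 - damp) s - (H s) / scale ;
    # s/scale then approximates a (damped) H^{-1} v.
    s = v.clone()
    for _, (xb, yb) in zip(range(steps), data):
        s = v + (1.0 - damp) * s - hvp(F.cross_entropy(model(xb), yb), s) / scale
    return s / scale

# s_test for one test point ...
x_test, y_test = torch.randn(1, 10), torch.tensor([1])
s = s_test(flat_grad(F.cross_entropy(model(x_test), y_test)), batches())

# ... combined with a training point's grad_z: influence = -s . grad_z
x_tr, y_tr = torch.randn(1, 10), torch.tensor([0])
influence = -(s @ flat_grad(F.cross_entropy(model(x_tr), y_tr))).item()
print(influence)
```

The damping term keeps the recursion stable when the Hessian is ill-conditioned, and the scale factor shrinks the Hessian's spectrum so the Neumann series converges.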
The paper received a Best Paper award at ICML 2017.
Applications - Understanding model behavior. Influence functions reveal insights about how models rely on and extrapolate from the training data; the same machinery supports debugging models and detecting dataset errors, as in the self-influence check sketched below.
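Here is a toy version of the dataset-error check, a sketch under assumed synthetic data that reuses the logistic-regression formulas above: flip some labels, compute each point's self-influence $\mathcal{I}_{up,loss}(z_i, z_i) = -g_i^\top H^{-1} g_i$, and inspect the highest-magnitude points first:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, d, lam = 300, 5, 1e-2
X = rng.normal(size=(n, d))
y = np.where(X @ rng.normal(size=d) > 0, 1.0, -1.0)
flipped = rng.choice(n, size=30, replace=False)     # corrupt 10% of labels
y[flipped] *= -1.0

sigmoid = lambda t: 1.0 / (1.0 + np.exp(-t))
risk = lambda th: np.mean(np.logaddexp(0.0, -y * (X @ th))) + 0.5 * lam * th @ th
theta = minimize(risk, np.zeros(d), method="L-BFGS-B").x

# Self-influence I_up,loss(z_i, z_i) = -g_i^T H^{-1} g_i for every i
p = sigmoid(X @ theta)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)
G = (-y * sigmoid(-y * (X @ theta)))[:, None] * X   # rows are grad_z(z_i)
self_inf = -np.einsum("ij,ij->i", G, np.linalg.solve(H, G.T).T)

# Review the highest-|self-influence| points first.
suspects = np.argsort(-np.abs(self_inf))[:30]
hits = np.intersect1d(suspects, flipped).size
print(f"{hits}/30 flipped labels appear among the top-30 suspects")
```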
Appendix: Understanding Black-box Predictions via Influence Functions (Pang Wei Koh, Percy Liang). Deriving the influence function I_up,params: for completeness, the appendix provides a standard derivation of I_up,params in the context of loss minimization (M-estimation). Note also the computational structure this enables: each s_test vector is computed once (the expensive approximation) and then combined with the grad_z of every training sample, keeping even tens of thousands of influence calculations tractable.
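For reference, a compact version of that appendix derivation (a sketch; the standard M-estimation regularity conditions are assumed):

$$\hat\theta_{\epsilon,z} = \arg\min_{\theta}\; R(\theta) + \epsilon L(z,\theta), \qquad R(\theta) := \frac{1}{n}\sum_{i=1}^{n} L(z_i,\theta).$$

The first-order optimality condition at $\hat\theta_{\epsilon,z}$ is $0 = \nabla R(\hat\theta_{\epsilon,z}) + \epsilon\nabla L(z,\hat\theta_{\epsilon,z})$. Taylor-expanding around $\hat\theta$, where $\nabla R(\hat\theta) = 0$, and writing $\Delta_\epsilon = \hat\theta_{\epsilon,z} - \hat\theta$:

$$0 \approx \epsilon\nabla L(z,\hat\theta) + \big[\nabla^2 R(\hat\theta) + \epsilon\nabla^2 L(z,\hat\theta)\big]\,\Delta_\epsilon.$$

Dropping the $o(\epsilon)$ terms and solving gives $\Delta_\epsilon \approx -H_{\hat\theta}^{-1}\nabla_\theta L(z,\hat\theta)\,\epsilon$, hence

$$\mathcal{I}_{up,params}(z) = \frac{d\hat\theta_{\epsilon,z}}{d\epsilon}\Big|_{\epsilon=0} = -H_{\hat\theta}^{-1}\nabla_\theta L(z,\hat\theta).$$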
Second-Order Group Influence Functions for Black-Box Predictions

Often we want to identify an influential group of training samples behind a particular test prediction, rather than a single point. The follow-up work "On Second-Order Group Influence Functions for Black-Box Predictions" (2019) extends the first-order machinery above to such groups. See more on this video at https://www.microsoft.com/en-us/research/video/understanding-black-box-predictions-via-influence-functions/