Abstract
Graph neural networks (GNNs) have achieved excellent results in various graph-based learning tasks. However, the redundant parameters of GNNs and the large-scale graphs used as inputs prevent GNNs from scaling to real-world large-scale graph applications. To address this problem, the graph lottery ticket hypothesis claims the existence of graph lottery tickets (GLTs), combinations of a sparse core subgraph and a sparse subnetwork that can be retrained to achieve performance similar to that of the original input graph and dense network. However, the GLTs identified in existing work lose valuable information due to irreversible pruning schemes. In addition, the performance of GNNs drops significantly when graph sparsity is high. In this paper, we propose a gradual pruning and knowledge distillation (GPKD) framework to compensate for the loss caused by pruning and thereby identify GLTs efficiently. Specifically, we first prune the input graph and the model parameters according to a gradual iterative magnitude pruning strategy and then reset the remaining parameters. After each round of pruning, the pre-trained network and the pruned network are treated as the teacher and student models, respectively, and a knowledge distillation scheme allows the student to mimic the output of the teacher. Experimental results demonstrate that our proposed GPKD framework significantly outperforms the state-of-the-art unified GNN sparsification (UGS) framework.
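The procedure described above combines three ingredients: magnitude-based pruning of both the graph and the weights, rewinding the surviving weights to their initial values, and a distillation loss that ties the pruned student to the pre-trained teacher. Below is a minimal PyTorch sketch of one such round; the function names, mask handling, model forward signature, and hyperparameters (`prune_ratio`, `temperature`, `alpha`) are illustrative assumptions, not the authors' exact implementation.

```python
# A hypothetical sketch of one GPKD-style round, assuming a GNN whose
# forward signature is model(features, adj). Not the paper's actual code.
import copy
import torch
import torch.nn.functional as F

def magnitude_mask(tensor, mask, prune_ratio):
    """Zero out a fraction of the smallest-magnitude entries still alive in `mask`."""
    alive = tensor[mask.bool()].abs()
    k = int(prune_ratio * alive.numel())
    if k == 0:
        return mask
    threshold = alive.kthvalue(k).values
    return mask * (tensor.abs() > threshold).float()

def gpkd_round(model, init_state, adj, adj_mask, weight_masks, prune_ratio,
               features, labels, train_idx, temperature=2.0, alpha=0.5,
               epochs=200, lr=0.01):
    # Teacher: frozen copy of the model as trained before this round's pruning.
    teacher = copy.deepcopy(model).eval()

    # Gradual magnitude pruning: drop a fraction of the remaining graph edges
    # and model weights, then rewind surviving weights to their initial values
    # (init_state captured once via copy.deepcopy(model.state_dict())).
    adj_mask = magnitude_mask(adj, adj_mask, prune_ratio)
    for name, p in model.named_parameters():
        weight_masks[name] = magnitude_mask(p.data, weight_masks[name], prune_ratio)
    model.load_state_dict(init_state)
    for name, p in model.named_parameters():
        p.data *= weight_masks[name]  # apply sparsity mask after rewinding

    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    sparse_adj = adj * adj_mask
    for _ in range(epochs):
        optimizer.zero_grad()
        student_logits = model(features, sparse_adj)
        with torch.no_grad():
            teacher_logits = teacher(features, adj)
        # KD loss: student mimics the teacher's temperature-softened outputs,
        # combined with ordinary cross-entropy on the labeled training nodes.
        kd = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean") * temperature ** 2
        ce = F.cross_entropy(student_logits[train_idx], labels[train_idx])
        loss = alpha * kd + (1 - alpha) * ce
        loss.backward()
        optimizer.step()
        for name, p in model.named_parameters():
            p.data *= weight_masks[name]  # keep pruned weights at zero
    return model, adj_mask, weight_masks
```

Repeating `gpkd_round` while the masks still satisfy a target sparsity yields the gradual iterative schedule: each round removes only a small fraction of the remaining edges and weights, and the distillation term compensates for the information lost by pruning.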