Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks?---A Neural Tangent Kernel Perspective. K Huang, Y Wang, M Tao, T Zhao. Advances in Neural Information Processing Systems 33, 2698-2709, 2020 | 97 | 2020 |
Large learning rate tames homogeneity: Convergence and balancing effect. Y Wang, M Chen, T Zhao, M Tao. arXiv preprint arXiv:2110.03677, 2021 | 37 | 2021 |
Momentum Stiefel Optimizer, with Applications to Suitably-Orthogonal Attention, and Optimal Transport. L Kong, Y Wang, M Tao. arXiv preprint arXiv:2205.14173, 2022 | 6 | 2022 |
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult. Y Wang, Z Xu, T Zhao, M Tao. arXiv preprint arXiv:2310.17087, 2023 | 1 | 2023 |
Markov chain Monte Carlo for Gaussian: A linear control perspective. B Yuan, J Fan, Y Wang, M Tao, Y Chen. IEEE Control Systems Letters, 2023 | 1 | 2023 |