ABSTRACT: We propose an approximation of kernel ridge regression (KRR) based on random features and a multi-layer structure. KRR is popular in statistics and machine learning for nonparametric regression over reproducing kernel Hilbert spaces. We study the minimum number of random features and the layer sizes that can be chosen while preserving minimax optimality of the approximate KRR estimate. We show that the multi-layer kernel machines require only O(n^1.5 log(n)) time and O(n log(n)) memory, significantly less than the O(n^3) time and O(n^2) memory needed to compute KRR exactly. For various classes of random features, we prove that the multi-layer structure is more effective at reducing the computational complexity than the single-layer structure while retaining statistical minimax optimality. The analysis is supported by simulations and real data examples.
- Start date: 2020-02-10 09:00:00
- End date: 2020-02-10 10:00:00
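To make the complexity savings concrete, here is a minimal sketch of the standard single-layer random-feature approach (random Fourier features for a Gaussian kernel), not the paper's multi-layer construction: ridge regression is solved in a D-dimensional feature space at O(nD^2) cost instead of inverting the full n-by-n kernel matrix at O(n^3). All function names and parameter choices below are illustrative assumptions.

```python
import numpy as np

def rff_features(X, D, gamma, rng):
    # Random Fourier features approximating the Gaussian kernel
    # k(x, y) = exp(-gamma * ||x - y||^2): draw W with i.i.d.
    # N(0, 2*gamma) entries, b ~ Uniform[0, 2*pi], and map
    # x -> sqrt(2/D) * cos(x @ W + b).
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)
    return Z, (W, b)

def krr_rff_fit(X, y, D=200, gamma=1.0, lam=1e-3, seed=0):
    # Ridge regression in the feature space: solve the D x D system
    # (Z^T Z + n*lam*I) w = Z^T y, costing O(n D^2 + D^3) rather than
    # the O(n^3) of exact KRR.
    rng = np.random.default_rng(seed)
    Z, params = rff_features(X, D, gamma, rng)
    n = X.shape[0]
    w = np.linalg.solve(Z.T @ Z + n * lam * np.eye(D), Z.T @ y)
    return w, params, D

def krr_rff_predict(Xnew, w, params, D):
    W, b = params
    Z = np.sqrt(2.0 / D) * np.cos(Xnew @ W + b)
    return Z @ w

# Usage: fit a noisy sine curve and check the training error.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 1))
y = np.sin(3.0 * X[:, 0]) + 0.1 * rng.normal(size=500)
w, params, D = krr_rff_fit(X, y, D=200, gamma=5.0)
pred = krr_rff_predict(X, w, params, D)
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(rmse)
```

The paper's multi-layer variant composes such random-feature maps across layers to push the cost down further, to the stated O(n^1.5 log(n)) time and O(n log(n)) memory.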