坐标下降法

坐标下降法（英語：coordinate descent）是一种非梯度优化算法。算法在每次迭代中，在当前点处沿一个坐标方向进行一维搜索（英语：line search）以求得一个函数的局部极小值。在整个过程中循环使用不同的坐标方向。对于不可拆分的函数而言，算法可能无法在较小的迭代步数中求得最优解。为了加速收敛，可以采用一个适当的坐标系，例如通过主成分分析获得一个坐标间尽可能不相互关联的新坐标系（参考自适应坐标下降法（英语：Adaptive coordinate descent））。

算法描述

坐标下降法基于的思想是多变量函数 $F(\mathbf {x} )$ 可以通过每次沿一个方向优化来获取最小值。与通过梯度获取最速下降的方向不同，在坐标下降法中，优化方向从算法一开始就予以固定。例如，可以选择线性空间的一组基 $\mathbf {e} _{1},\mathbf {e} _{2},\dots ,\mathbf {e} _{n}$ 作为搜索方向。在算法中，循环最小化各个坐标方向上的目标函数值。亦即，如果 $\mathbf {x} ^{k}$ 已给定，那么， $\mathbf {x} ^{k+1}$ 的第 $i$ 个维度为

\mathbf {x} _{i}^{k+1}={\underset {y\in \mathbb {R} }{\operatorname {arg\,min} }}\;f(x_{1}^{k+1},...,x_{i-1}^{k+1},y,x_{i+1}^{k},...,x_{n}^{k});

因而，从一个初始的猜测值 $\mathbf {x} _{0}$ 以求得函数 $F$ 的局部最优值，可以迭代获得 $\mathbf {x} _{0},\mathbf {x} _{1},\mathbf {x} _{2},\dots$ 的序列。

通过在每一次迭代中采用一维搜索（英语：line search），可以很自然地获得不等式

F(\mathbf {x} _{0})\geq F(\mathbf {x} _{1})\geq F(\mathbf {x} _{2})\geq \cdots ,

可以知道，这一序列与最速下降具有类似的收敛性质。如果在某次迭代中，函数得不到优化，说明一个驻点已经达到。

这一过程可以用下图表示。

例子

对于非平滑函数，坐标下降法可能会遇到问题。下图展示了当函数等高线非平滑时，算法可能在非驻点中断执行。

应用

坐标下降法在机器学习中有应用，例如训练线性支持向量机^[1]（可见LIBLINEAR（英语：LIBLINEAR））以及非负矩阵分解（英语：non-negative matrix factorization）^[2]。

参见

自适应坐标下降法（英语：Adaptive coordinate descent）
共轭梯度法
梯度下降法
牛顿法
最优化
一维搜索（英语：line search）

参考

^ Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. ACM: 408–415. 2008-07-05 [2018-04-02]. ISBN 9781605582054. doi:10.1145/1390156.1390208.
^ Cho-Jui Hsieh, Inderjit S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. ACM: 1064–1072. 2011-08-21 [2018-04-02]. ISBN 9781450308137. doi:10.1145/2020408.2020577.

Bezdek, J. C.; Hathaway, R. J.; Howard, R. E.; Wilson, C. A.; Windham, M. P., Local convergence analysis of a grouped variable version of coordinate descent, Journal of Optimization theory and applications 54 (3) (Kluwer Academic/Plenum Publishers), 1987, 54 (3): 471–477, doi:10.1007/BF00940196
Bertsekas, Dimitri P. (1999). Nonlinear Programming, Second Edition Athena Scientific, Belmont, Massachusetts. ISBN 1-886529-00-0.
Canutescu, AA; Dunbrack, RL, Cyclic coordinate descent: A robotics algorithm for protein loop closure., Protein science 12 (5), 2003, 12 (5): 963–72, PMID 12717019 .
Luo, Zhiquan; Tseng, P., On the convergence of the coordinate descent method for convex differentiable minimization, Journal of Optimization theory and applications 72 (1) (Kluwer Academic/Plenum Publishers), 1992, 72 (1): 7–35, doi:10.1007/BF00939948 .
Wu, TongTong; Lange, Kenneth, Coordinate descent algorithms for Lasso penalized regression, The Annals of Applied Statistics 2 (1) (Institute of Mathematical Statistics), 2008, 2 (1): 224–244, doi:10.1214/07-AOAS147 .
Richtarik, Peter; Takac, Martin, Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function, Mathematical Programming (Springer), April 2011, doi:10.1007/s10107-012-0614-z .
Richtarik, Peter; Takac, Martin, Parallel coordinate descent methods for big data optimization, arXiv:1212.0873, December 2012 .

[1] Cho-Jui Hsieh, Kai-Wei Chang, Chih-Jen Lin, S. Sathiya Keerthi, S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. ACM: 408–415. 2008-07-05 [2018-04-02]. ISBN 9781605582054. doi:10.1145/1390156.1390208.

[2] Cho-Jui Hsieh, Inderjit S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. ACM: 1064–1072. 2011-08-21 [2018-04-02]. ISBN 9781450308137. doi:10.1145/2020408.2020577.

[1]

[2]