Classical methods such as gradient descent and Newton's method can be justified from Taylor’s theorem. It is also a handy tool for the convergence analysis of optimization methods. Here we develop the background needed to master this important tool.
Rolle’s Theorem: Let $a < b$ and let $f : [a, b] \to \mathbb{R}$ be a continuous function that is differentiable on the interval $(a, b)$. Suppose that $f(a) = f(b)$. Then there exists $c \in (a, b)$ such that $f'(c) = 0$.
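Proof: If $f$ is constant on $[a, b]$, then $f'(c) = 0$ for every $c \in (a, b)$, so assume it is not. Then, without loss of generality (w.l.o.g.), $f(x) > f(a)$ for some $x \in (a, b)$; otherwise, replace $f$ by $-f$. Since $[a, b]$ is a compact set and $f$ is continuous, $f$ attains a maximum over $[a, b]$ at some point $c$, and $c \in (a, b)$ because $f(c) > f(a) = f(b)$. At an interior maximum the difference quotients $\frac{f(c + h) - f(c)}{h}$ are non-positive for $h > 0$ and non-negative for $h < 0$, so their common limit must be zero. Therefore, $f'(c) = 0$.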
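The argument in the proof can be checked numerically. The sketch below is only an illustration under assumptions that are not part of the original text: the example function $f(x) = x(1 - x)$ on $[0, 1]$, the grid size, and the helper names `f`, `derivative`, and `argmax_on_grid` are all hypothetical choices. It mirrors the proof by locating a maximizer of $f$ on the closed interval and then checking that a finite-difference estimate of the derivative there is close to zero.

```python
def f(x):
    # Example function (an assumption for illustration): f(0) == f(1) == 0.
    return x * (1.0 - x)


def derivative(g, x, h=1e-6):
    # Central finite-difference approximation of g'(x).
    return (g(x + h) - g(x - h)) / (2.0 * h)


def argmax_on_grid(g, a, b, n=1000):
    # Approximate a maximizer of g over [a, b] using n + 1 equally spaced points.
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return max(xs, key=g)


a, b = 0.0, 1.0
assert f(a) == f(b)  # hypothesis of Rolle's theorem

c = argmax_on_grid(f, a, b)  # the maximum exists because [a, b] is compact
slope = derivative(f, c)     # should be close to zero at an interior maximum

print("c =", c)
print("approximate f'(c) =", slope)
```

A grid search only approximates the maximizer, so for a general function the estimated slope is small rather than exactly zero; a finer grid or a root finder applied to the derivative would tighten it.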