# 11.1. Basic Subgradient Method#

The theory of subgradients for convex functions is discussed in Subgradients. This section discusses the basic subgradient method for optimization. The primary reference for this section is [15].

Recall that a vector $$\bg$$ is a subgradient of $$f$$ at a point $$\bx$$ if

$f(\by) \geq f(\bx) + \langle \by - \bx, \bg \rangle \Forall \by \in \RR^n.$

## 11.1.1. Unconstrained Minimization#

We start with a simple case of minimizing a convex function $$f: \RR^n \to \RR$$.

1. The optimal value shall be denoted by $$f^*$$.

2. We shall assume that the optimal value is finite (i.e., $$f^* > -\infty$$).

3. The set of optimal minimizers shall be denoted by $$X^*$$.

4. We shall assume that $$f$$ attains its optimal value; hence $$X^*$$ is nonempty.

5. A global minimizer will be denoted by $$\bx^*$$.

The basic subgradient method uses the following iteration.

(11.1)#$\bx^{k+1} = \bx^k - t_k \bg^k.$

In this iteration

1. $$\bx^k$$ is the current ($$k$$-th) iterate.

2. $$\bg^k$$ is any subgradient of $$f$$ at $$\bx^k$$. We have $$\bg^k \in \partial f(\bx^k)$$.

3. $$t_k > 0$$ is the step size for the $$k$$-th iteration in the negative subgradient direction $$- \bg^k$$.

4. $$\bx^{k+1}$$ is the new ($$k+1$$-th) iterate.

5. $$\| t_k \bg^k \|_2$$ denotes the step-length.

Thus, in each iteration of the subgradient method, we take a step in the direction of a negative subgradient.

### 11.1.1.1. Descent Directions#

Observation 11.1 (Negative subgradient need not be a descent direction)

1. Recall from Definition 10.22 that a descent direction of $$f$$ at $$\bx$$ is a direction along which the directional derivative is negative; i.e., $$f'(\bx; \bd) < 0$$.

2. If $$\bd$$ is a descent direction at $$\bx$$, then there exists a step size $$t > 0$$ such that

$f(\bx + t \bd) < f(\bx).$
3. If $$f$$ is a differentiable function and $$\bd = - \nabla f(\bx)$$, then

$f'(\bx; - \nabla f(\bx)) = - \| \nabla f(\bx) \|^2 \leq 0.$
4. Hence, the negative gradient is always a descent direction if the gradient is nonzero.

5. However, for a nondifferentiable function $$f$$, a negative subgradient may not be a descent direction.

6. Recall from max formula (Theorem 9.219) that

$f'(\bx;\bd) = \sup \{ \langle \bd, \bg \rangle \ST \bg \in \partial f(\bx) \}.$
7. Let $$\tilde{\bg}$$ be the chosen subgradient at an iteration of the subgradient method.

8. Then

$f'(\bx; -\tilde{\bg}) = \sup_{\bg \in \partial f(\bx)} \{ \langle - \tilde{\bg}, \bg \rangle\} = - \inf_{\bg \in \partial f(\bx)} \{ \langle \tilde{\bg}, \bg \rangle\}.$
9. It is possible that the quantity $$\langle \tilde{\bg}, \bg \rangle$$ is negative for some other subgradient $$\bg$$ at $$\bx$$.

10. In that case the infimum above is negative, and the directional derivative along $$-\tilde{\bg}$$ is positive.

11. This argument shows that a negative subgradient is not necessarily a descent direction.
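
This can be checked numerically. As an illustration (the function, point, and subgradient below are our own choices, not from the text), take $$f(x_1, x_2) = |x_1| + 2 |x_2|$$ and $$\bx = (1, 0)$$. Any vector $$(1, s)$$ with $$s \in [-2, 2]$$ is a subgradient at $$\bx$$; choosing $$\tilde{\bg} = (1, 2)$$, every small step along $$-\tilde{\bg}$$ increases $$f$$.

```python
import numpy as np

# f(x) = |x1| + 2|x2| is convex but not differentiable on the coordinate axes.
def f(x):
    return abs(x[0]) + 2 * abs(x[1])

x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])   # a valid subgradient at x: any (1, s) with s in [-2, 2]

# Empirically check the subgradient inequality f(y) >= f(x) + <g, y - x>.
rng = np.random.default_rng(0)
ys = rng.uniform(-3.0, 3.0, size=(1000, 2))
assert all(f(y) >= f(x) + g @ (y - x) - 1e-12 for y in ys)

# Moving along -g increases f for every small t > 0, so -g is not a
# descent direction at x even though g is a subgradient there.
for t in [1e-3, 1e-2, 1e-1]:
    print(t, f(x - t * g) - f(x))   # prints 3*t, which is positive
```

Indeed, $$f(\bx - t \tilde{\bg}) = |1 - t| + 4 t = 1 + 3 t$$ for $$0 < t \leq 1$$, which exceeds $$f(\bx) = 1$$.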

### 11.1.1.2. Tracking of the Best Estimate#

1. Since the negative subgradient $$-\bg^k$$ may not be a descent direction, it is quite possible that $$f(\bx^{k+1}) > f(\bx^k)$$.

2. Even if the selected $$-\bg^k$$ is a descent direction at $$\bx^k$$, the step size may still be such that $$f(\bx^{k+1}) > f(\bx^k)$$.

3. This happens because a subgradient method typically does not invoke a local line search to select an appropriate step size.

4. Since the negative subgradient itself may not be a descent direction, running a local line search along it may well be futile.

5. The alternative is to track the best estimate of the optimal value so far using the following rule (a short sketch of this bookkeeping follows the list):

$f_{\best}^k = \min \{f_{\best}^{k-1}, f(\bx^k) \}.$
6. If the best estimate has decreased, we also update the estimate of the minimizer, setting $$\bx_{\best}$$ to $$\bx^k$$.

7. It is easy to see that

$f_{\best}^k = \min \{f(\bx^1), \dots, f(\bx^k) \}.$
8. We can see that the sequence $$\{ f_{\best}^k \}$$ is a nonincreasing sequence.

9. Hence it has a limit.

10. The success of the subgradient method depends on whether the limit equals the optimal value.
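
The bookkeeping above amounts to a running minimum. Here is a minimal, self-contained sketch; the sequence of objective values is made up purely for illustration.

```python
# Running minimum f_best^k = min(f(x^1), ..., f(x^k)).
# The objective values below are made up purely for illustration.
f_values = [5.0, 3.2, 4.1, 2.7, 2.9, 2.5]

f_best = float("inf")
history = []
for fk in f_values:
    f_best = min(f_best, fk)   # f_best^k = min(f_best^{k-1}, f(x^k))
    history.append(f_best)

print(history)   # [5.0, 3.2, 3.2, 2.7, 2.7, 2.5] -- nonincreasing
```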

### 11.1.1.3. Step Size Selection#

In a gradient descent method, one typically uses a line search method to select the step size. Hence, the step size depends on the current iterate and the function values in its neighborhood.

In a subgradient method, the step size selection is different. Typically, the step sizes (or step lengths) are determined beforehand and they don't depend on the data computed during the execution of the algorithm. Following are some of the common methods for step size selection.

1. Constant step size. $$t_k = t$$ for some constant $$t > 0$$ that is independent of $$k$$.

2. Constant step length.

$t_k = \frac{c}{\| \bg^k \|_2}$

where $$c > 0$$ is a predefined step length. Note that with this choice of step size, we have:

$\| \bx^{k + 1} - \bx^k \|_2 = \| t_k \bg^k \|_2 = c.$
3. Square summable but not summable. We choose step sizes that satisfy the following constraints:

$t_k > 0 \Forall k, \quad \sum_{k=1}^{\infty} t_k^2 < \infty, \quad \sum_{k=1}^{\infty} t_k = \infty.$

As an example, fix some $$a > 0$$ and $$b \geq 0$$ and let

$t_k = \frac{a}{b + k}.$
4. Nonsummable diminishing. We choose step sizes that satisfy the following constraints:

$t_k > 0 \Forall k, \quad \lim_{k \to \infty} t_k = 0, \quad \sum_{k=1}^{\infty} t_k = \infty.$

As an example, fix some $$a > 0$$ and let

$t_k = \frac{a}{\sqrt{k}}.$
5. Nonsummable diminishing step lengths. We choose step sizes as $$t_k = \frac{c_k}{\| \bg^k \|_2}$$ where

$c_k > 0, \quad \lim_{k \to \infty} c_k = 0,\quad \sum_{k=1}^{\infty} c_k = \infty.$
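
These rules are easy to express in code. The sketch below writes each rule as a Python callable mapping the iteration index $$k$$ (and, for the step-length based rules, the current subgradient) to $$t_k$$; the function names and the default constants are illustrative choices only.

```python
import numpy as np

# Each rule maps the iteration index k = 1, 2, ... and the current
# subgradient g (used only by the step-length based rules) to t_k.

def constant_step_size(k, g, t=0.1):
    return t                                     # t_k = t

def constant_step_length(k, g, c=0.1):
    return c / np.linalg.norm(g)                 # ||t_k g^k||_2 = c

def square_summable_not_summable(k, g, a=1.0, b=0.0):
    return a / (b + k)                           # sum t_k^2 < inf, sum t_k = inf

def nonsummable_diminishing(k, g, a=1.0):
    return a / np.sqrt(k)                        # t_k -> 0, sum t_k = inf

def diminishing_step_length(k, g, a=1.0):
    return (a / np.sqrt(k)) / np.linalg.norm(g)  # c_k = a / sqrt(k)
```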

### 11.1.1.4. The Subgradient Method#

We now present the overall template for the subgradient method.

Algorithm 11.1 (The subgradient method)

Inputs

1. $$f$$ : function to be minimized

2. $$\bx^1$$: initial guess for the minimizer

3. $$t_1$$: initial step size

Outputs

1. $$f_{\best}$$: best estimate of the minimum value of $$f$$

2. $$\bx_{\best}$$: best estimate of the minimizer

Initialization

1. $$f_{\best}^1 = f(\bx^1)$$.

2. $$\bx_{\best}^1 = \bx^1$$.

General iteration: for $$k=1,2,\dots$$, execute the following steps

1. Select a subgradient $$\bg^k$$ from $$\partial f(\bx^k)$$.

2. Select a step size: $$t_k$$.

3. Update minimizer estimate: $$\bx^{k+1} = \bx^k - t_k \bg^k$$.

4. $$k = k + 1$$.

5. Compute $$f(\bx^k)$$.

6. if $$f_{\best}^{k-1} > f(\bx^k)$$:

1. Update best estimate of optimal value: $$f_{\best}^k = f(\bx^k)$$.

2. Update best estimate of minimizer: $$\bx_{\best}^k = \bx^k$$.

7. Otherwise, retain current values

1. $$f_{\best}^k = f_{\best}^{k-1}$$.

2. $$\bx_{\best}^k = \bx_{\best}^{k-1}$$.

8. If stopping criterion is met, then STOP.

For a concrete implementation, we shall need the following:

1. A way to compute $$f(\bx)$$ at every $$\bx$$.

2. A way to pick a subgradient $$\bg \in \partial f(\bx)$$ at every $$\bx$$.

3. A step size selection strategy.

4. A stopping criterion.

Note that there is no need to specify the complete subdifferential at every $$\bx$$. The stopping criterion will be developed in the following as part of the convergence analysis of the algorithm.
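
As a concrete illustration, here is a minimal Python sketch of Algorithm 11.1; the caller supplies the function `f`, a subgradient oracle `subgrad`, a step-size rule, and a fixed iteration budget as the stopping criterion (all names, the toy problem, and the choice of a fixed iteration budget are our own).

```python
import numpy as np

def subgradient_method(f, subgrad, x1, step_size, max_iter=1000):
    """Basic subgradient method (a sketch of Algorithm 11.1).

    f         : callable returning f(x)
    subgrad   : callable returning some g in the subdifferential of f at x
    x1        : initial guess for the minimizer
    step_size : callable (k, g) -> t_k
    max_iter  : stopping criterion (fixed number of iterations)
    """
    x = np.asarray(x1, dtype=float)
    f_best, x_best = f(x), x.copy()
    for k in range(1, max_iter + 1):
        g = subgrad(x)                  # pick a subgradient g^k at x^k
        t = step_size(k, g)             # select the step size t_k
        x = x - t * g                   # x^{k+1} = x^k - t_k g^k
        fx = f(x)
        if fx < f_best:                 # track the best estimate so far
            f_best, x_best = fx, x.copy()
    return x_best, f_best

# Usage on a toy problem: f(x) = ||x - a||_1 with minimizer x* = a and f* = 0.
a = np.array([1.0, -2.0, 3.0])
f = lambda x: np.sum(np.abs(x - a))
subgrad = lambda x: np.sign(x - a)      # a subgradient of the l1 objective
x_best, f_best = subgradient_method(
    f, subgrad, x1=np.zeros(3), step_size=lambda k, g: 1.0 / np.sqrt(k))
print(f_best)                           # approaches the optimal value 0
```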

## 11.1.2. Convergence Analysis#

Our goal is to show that $$f_{\best}^k \downarrow f^*$$. Towards this, we will establish a suboptimality bound on the estimate of the optimal value after $$k$$ iterations.

We shall make the following assumptions in the analysis.

1. $$f$$ is a real valued convex function.

2. The optimal value $$f^*$$ is finite and the minimizer $$\bx^*$$ exists.

3. The norms of the subgradients are bounded; i.e., there exists $$G > 0$$ such that

$\| \bg \|_2 \leq G \Forall \bg \in \partial f(\bx) \Forall \bx.$

Recall from Theorem 9.232 that if a convex function is Lipschitz continuous, then its subgradients are bounded.

4. A number $$R$$ that satisfies

$R \geq \| \bx^* - \bx^1 \|_2$

for some $$\bx^* \in X^*$$ is known beforehand. $$R$$ can be interpreted as an upper bound on the distance of the initial point to the set of optimal minimizers.

$R \geq \text{dist}(\bx^1, X^*).$

Theorem 11.1 (Suboptimality bound for subgradient method)

Let $$\| \bx^1 - \bx^* \|_2 \leq R$$.

After $$k$$ iterations of the subgradient method, we have

(11.2)#$f_{\best}^k - f^* \leq \frac{R^2 + \sum_{i=1}^k t_i^2 \| \bg^i \|_2^2 }{2 \sum_{i=1}^k t_i }.$

If the subgradients of $$f$$ are bounded by $$\| \bg \|_2 \leq G$$ for all $$\bg \in \partial f(\bx)$$ for every $$\bx$$, then this simplifies to:

(11.3)#$f_{\best}^k - f^* \leq \frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i }.$

Proof. Let $$\bx^*$$ be a minimizer of $$f$$. From the subgradient inequality, we have

$f^* = f(\bx^*) \geq f(\bx^k) + \langle \bx^* - \bx^k, \bg^k \rangle.$

Hence

$\langle \bx^* - \bx^k, \bg^k \rangle \leq f^* - f(\bx^k).$

We first develop an upper bound on the distance between the $$k+1$$-th iterate and a minimizer.

$\begin{split} \| \bx^{k+1} - \bx^* \|_2^2 & = \| \bx^k - t_k \bg^k - \bx^* \|_2^2 \\ & = \| (\bx^k - \bx^*) - t_k \bg^k \|_2^2 \\ & = \| \bx^k - \bx^* \|_2^2 - 2 t_k \langle \bx^k - \bx^*, \bg^k \rangle + t_k^2 \| \bg^k \|_2^2 \\ & \leq \| \bx^k - \bx^* \|_2^2 - 2 t_k (f(\bx^k) - f^*) + t_k^2 \| \bg^k \|_2^2. \end{split}$

Applying this inequality recursively, we get

$\| \bx^{k+1} - \bx^* \|_2^2 \leq \| \bx^1 - \bx^* \|_2^2 - 2 \sum_{i=1}^k t_i (f(\bx^i) - f^*) + \sum_{i=1}^k t_i^2 \| \bg^i \|_2^2.$

Using the fact that $$\| \bx^{k+1} - \bx^* \|_2^2 \geq 0$$ and $$\| \bx^1 - \bx^* \|_2 \leq R$$, we have

$2 \sum_{i=1}^k t_i (f(\bx^i) - f^*) \leq R^2 + \sum_{i=1}^k t_i^2 \| \bg^i \|_2^2.$

On the L.H.S., we can see that

$\sum_{i=1}^k t_i (f(\bx^i) - f^*) \geq \left ( \sum_{i=1}^k t_i \right ) \min_{i=1,\dots,k} (f(\bx^i) - f^*) = \left ( \sum_{i=1}^k t_i \right ) (f_{\best}^k - f^* ).$

Combining this with the previous inequality, we get

$2 \left ( \sum_{i=1}^k t_i \right ) (f_{\best}^k - f^* ) \leq R^2 + \sum_{i=1}^k t_i^2 \| \bg^i \|_2^2.$

This can be rewritten as

$f_{\best}^k - f^* \leq \frac{R^2 + \sum_{i=1}^k t_i^2 \| \bg^i \|_2^2 }{2 \sum_{i=1}^k t_i }$

as desired.

Applying the upper bound on the subgradient $$\| \bg \|_2 \leq G$$, we have

$f_{\best}^k - f^* \leq \frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2\sum_{i=1}^k t_i }$

as desired.
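
The bound (11.3) can also be checked numerically on a toy problem. In the sketch below (our own illustration), $$f(\bx) = \| \bx \|_1$$ on $$\RR^5$$, so that $$f^* = 0$$, the subgradients satisfy $$\| \bg \|_2 \leq \sqrt{5} = G$$, and $$R = \| \bx^1 \|_2$$; the observed gap $$f_{\best}^k - f^*$$ stays below the right-hand side of (11.3) at every iteration.

```python
import numpy as np

# Toy problem: f(x) = ||x||_1 on R^5, minimizer x* = 0, optimal value f* = 0.
f = lambda x: np.sum(np.abs(x))
f_star = 0.0

x1 = np.array([2.0, -1.0, 0.5, 3.0, -2.5])
G = np.sqrt(x1.size)           # ||sign(x)||_2 <= sqrt(n) bounds the subgradients
R = np.linalg.norm(x1)         # since x* = 0, R = ||x^1 - x*||_2

x = x1.copy()
f_best = f(x)
sum_t, sum_t2 = 0.0, 0.0
for k in range(1, 5001):
    g = np.sign(x)             # a subgradient of the l1 norm at x
    t = 0.5 / np.sqrt(k)       # nonsummable diminishing step sizes
    x = x - t * g
    f_best = min(f_best, f(x))
    sum_t += t
    sum_t2 += t * t
    bound = (R**2 + G**2 * sum_t2) / (2.0 * sum_t)   # R.H.S. of (11.3)
    assert f_best - f_star <= bound + 1e-9
print(f_best, bound)           # the gap is tiny; the bound decays slowly
```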

Based on this result, we can provide upper bounds on the estimation error for different step size selection strategies.

### 11.1.2.1. Constant Step Size#

Corollary 11.1 (Convergence of subgradient method with constant step size)

If $$t_k = t$$ for all $$k$$, then we have

$f_{\best}^k - f^* \leq \frac{R^2 + G^2 t^2 k}{2 t k }.$

The subgradient method converges to within $$G^2 t / 2$$ of the optimal value $$f^*$$. In other words,

$\lim_{k \to \infty} (f_{\best}^k - f^*) \leq \frac{G^2 t}{2}.$

Also, $$f_{\best}^k - f^* \leq G^2 t$$ within at most $$R^2 / (G^2 t^2)$$ steps.

Proof. By putting $$t_k = t$$ in (11.3), we obtain the desired upper bound. Taking the limit $$k \to \infty$$, we see that

$\lim_{k \to \infty} (f_{\best}^k - f^*) \leq \frac{G^2 t}{2}.$

Now,

$\begin{split} & \frac{R^2 + G^2 t^2 k}{2 t k } \leq G^2 t \\ \iff & R^2 + G^2 t^2 k \leq 2 G^2 t^2 k \\ \iff & R^2 \leq G^2 t^2 k \\ \iff & k \geq \frac{R^2}{G^2 t^2}. \end{split}$

Since $$\{ f_{\best}^k \}$$ is a nonincreasing sequence, for all $$k \geq R^2 / (G^2 t^2)$$ we must have

$f_{\best}^k - f^* \leq G^2 t.$
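
For instance (with illustrative numbers of our own), take $$R = 1$$, $$G = 1$$ and $$t = 0.01$$. The method is then guaranteed to converge to within $$G^2 t / 2 = 0.005$$ of $$f^*$$, and to satisfy $$f_{\best}^k - f^* \leq G^2 t = 0.01$$ after at most $$R^2 / (G^2 t^2) = 10000$$ iterations. Shrinking $$t$$ tightens the final accuracy but increases the required number of iterations in proportion to $$1 / t^2$$.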

### 11.1.2.2. Constant Step Length#

Corollary 11.2 (Convergence of subgradient method with constant step length)

Let $$c$$ be the constant step length for the subgradient method. Let $$t_k = \frac{c}{\| \bg^k \|_2}$$ for every $$k$$. Then we have

$f_{\best}^k - f^* \leq \frac{R^2 + c^2 k}{2 c k / G }.$

The subgradient method converges to within $$G c / 2$$ of the optimal value $$f^*$$. In other words,

$\lim_{k \to \infty} (f_{\best}^k - f^*) \leq \frac{G c}{2}.$

Proof. By putting $$t_i = \frac{c}{\| \bg^i \|_2}$$ in (11.2), we see that

$f_{\best}^k - f^* \leq \frac{R^2 + \sum_{i=1}^k c^2 }{2 \sum_{i=1}^k t_i }.$

Also, note that

$\begin{split} & t_i = \frac{c}{\| \bg^i \|_2}\\ \implies & t_i \geq \frac{c}{ G} \\ \implies & \sum_{i=1}^k t_i \geq \frac{c k}{G} \\ \implies & \frac{1}{\sum_{i=1}^k t_i} \leq \frac{1}{c k / G}. \end{split}$

Putting this back, we have

$f_{\best}^k - f^* \leq \frac{R^2 + c^2 k }{2 c k / G }.$

Taking the limit $$k \to \infty$$, we see that

$\lim_{k \to \infty} (f_{\best}^k - f^*) \leq \frac{G c}{2}.$

### 11.1.2.3. Square Summable Step Sizes#

Corollary 11.3 (Convergence of subgradient method with square summable step sizes)

Let the step sizes satisfy the rule

$t_k > 0 \Forall k, \quad \sum_{k=1}^{\infty} t_k^2 < \infty, \quad \sum_{k=1}^{\infty} t_k = \infty.$

Further, let $$T = \sum_{k=1}^{\infty} t_k^2$$.

Then we have

$f_{\best}^k - f^* \leq \frac{R^2 + G^2 T}{2 \sum_{i=1}^k t_i}.$

The subgradient method converges to $$f^*$$.

Proof. Putting $$\sum_{i=1}^k t_i^2 \leq T$$ into (11.3), we obtain

$f_{\best}^k - f^* \leq \frac{R^2 + G^2 T}{2 \sum_{i=1}^k t_i }.$

Taking the limit $$k \to \infty$$, the denominator grows to $$\infty$$ while the numerator stays fixed at $$R^2 + G^2 T$$. Hence

$\lim_{k \to \infty} (f_{\best}^k - f^*) = 0.$

### 11.1.2.4. Diminishing Step Sizes#

Corollary 11.4 (Convergence of subgradient method with diminishing step sizes)

Let the step sizes satisfy the rule

$t_k > 0 \Forall k, \quad \lim_{k \to \infty} t_k = 0, \quad \sum_{k=1}^{\infty} t_k = \infty.$

Then the subgradient method converges to $$f^*$$.

Proof. We shall show that the R.H.S. of (11.3) converges to 0 as $$k \to \infty$$.

1. Let $$\epsilon > 0$$.

2. There exists an integer $$n_1$$ such that $$t_i \leq \epsilon / G^2$$ for all $$i > n_1$$ since $$t_k \to 0$$.

3. There also exists an integer $$n_2$$ such that

$\sum_{i=1}^{n_2} t_i \geq \frac{1}{\epsilon} \left ( R^2 + G^2 \sum_{i=1}^{n_1} t_i^2 \right)$

since $$t_i$$ is nonsummable.

4. Let $$n = \max\{n_1, n_2 \}$$.

5. Then, for every $$k > n$$, we have

$\frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i} = \frac{R^2 + G^2 \sum_{i=1}^{n_1} t_i^2}{2 \sum_{i=1}^k t_i} + \frac{G^2 \sum_{i=n_1 + 1}^k t_i^2}{2 \sum_{i=1}^{n_1} t_i + 2 \sum_{i=n_1 + 1}^k t_i}.$
6. We can see that

$\sum_{i=1}^k t_i \geq \sum_{i=1}^{n_2} t_i \geq \frac{1}{\epsilon} \left ( R^2 + G^2 \sum_{i=1}^{n_1} t_i^2 \right).$
7. Hence

$\frac{R^2 + G^2 \sum_{i=1}^{n_1} t_i^2}{2 \sum_{i=1}^k t_i} \leq \frac{\epsilon}{2}.$
8. For every $$i > n_1$$, we have $$t_i \leq \epsilon / G^2$$.

9. Hence

$G^2 \sum_{i=n_1 + 1}^k t_i^2 \leq \epsilon \sum_{i=n_1 + 1}^k t_i.$
10. We can now see that

$\begin{split} & \frac{G^2 \sum_{i=n_1 + 1}^k t_i^2}{2 \sum_{i=1}^{n_1} t_i + 2 \sum_{i=n_1 + 1}^k t_i}\\ & < \frac{G^2 \sum_{i=n_1 + 1}^k t_i^2}{2 \sum_{i=n_1 + 1}^k t_i}\\ & \leq \frac{\epsilon \sum_{i=n_1 + 1}^k t_i}{2 \sum_{i=n_1 + 1}^k t_i}\\ &= \frac{\epsilon}{2}. \end{split}$
11. Combining, we have

$\frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i} < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon$

for every $$k > n$$.

12. Thus, for every $$\epsilon > 0$$, there exists $$n$$ such that for all $$k > n$$, we have

$\frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i} < \epsilon.$
13. Hence

$\lim_{k \to \infty}\frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i} = 0.$
14. Hence the subgradient method converges.

### 11.1.2.5. Diminishing Step Lengths#

Corollary 11.5 (Convergence of subgradient method with diminishing step lengths)

Let the step sizes $$t_k = \frac{c_k}{\| \bg^k \|_2}$$ be chosen such that

$c_k > 0, \quad \lim_{k \to \infty} c_k = 0,\quad \sum_{k=1}^{\infty} c_k = \infty$

where $$c_k$$ is the step length for the $$k$$-th iteration.

Then the subgradient method converges to $$f^*$$.

Proof. From (11.2) we have

$f_{\best}^k - f^* \leq \frac{R^2 + \sum_{i=1}^k t_i^2 \| \bg^i \|_2^2 }{2 \sum_{i=1}^k t_i } = \frac{R^2 + \sum_{i=1}^k c_i^2}{2 \sum_{i=1}^k t_i}.$
1. We have

$\sum_{i=1}^k t_i = \sum_{i=1}^k \frac{c_i}{\| \bg^i \|_2} \geq \sum_{i=1}^k \frac{c_i}{G}.$
2. Hence we have

$f_{\best}^k - f^* \leq \frac{R^2 + \sum_{i=1}^k c_i^2}{(2/G) \sum_{i=1}^k c_i}.$
3. Following an argument similar to the proof of Corollary 11.4, the R.H.S. converges to 0 as $$k \to \infty$$.

4. Hence

$\lim_{k \to \infty} (f_{\best}^k - f^*) = 0.$

### 11.1.2.6. On the Suboptimality Bound#

If we run the subgradient method for $$k$$ iterations, we get a suboptimality bound given by (11.3).

$f_{\best}^k - f^* \leq \frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i }.$

A natural question is how tight this bound can be made by an appropriate selection of the step sizes $$t_1, \dots, t_k$$.

1. Note that the R.H.S. is a convex and symmetric function of $$t_1, \dots, t_k$$.

2. Hence, at an optimal choice, we may take $$t_1 = \dots = t_k$$ (by symmetry and convexity, averaging the permutations of any minimizer yields a minimizer with all entries equal).

3. Let $$t = t_1 = \dots = t_k$$.

4. Then the suboptimality bound reduces to

$\frac{R^2 + G^2 k t^2}{2 k t }.$
5. Since this expression equals $$\frac{R^2}{2 k t} + \frac{G^2 t}{2}$$, it is minimized where the two terms are equal, i.e., at $$t = \frac{R}{G \sqrt{k}}$$.

6. In other words, the finite sequence of step-sizes $$t_1, \dots, t_k$$ that minimizes the suboptimality bound in (11.3) after exactly $$k$$ iterations is given by

$t_i = \frac{R}{G \sqrt{k}}, \Forall i=1,\dots,k.$
7. It must be noted that this suboptimality bound is dependent on the number of iterations.

8. This choice of constant step size gives us the bound

$f_{\best}^k - f^* \leq \frac{RG}{\sqrt{k}}.$

Theorem 11.2 (A bound on the suboptimality bound)

Let $$t_1, \dots, t_k$$ be any choice of step sizes for the subgradient method for $$k$$ iterations. Then we must have

$\frac{R^2 + G^2 \sum_{i=1}^k t_i^2}{2 \sum_{i=1}^k t_i } \geq \frac{RG}{\sqrt{k}}$

where the L.H.S. is the step-size selection dependent suboptimality bound on $$f_{\best}^k - f^*$$.

Accordingly, the number of steps required to achieve a guaranteed accuracy of $$f_{\best}^k - f^* \leq \epsilon$$ for some $$\epsilon > 0$$ is at least $$(RG / \epsilon)^2$$ as per the suboptimality bound (11.3) irrespective of the step size selection.

Proof. We have already shown that the suboptimality bound is minimized by the constant step size selection where $$t_i = \frac{R}{G \sqrt{k}}$$ for every $$i=1,\dots,k$$ and is given by $$\frac{RG}{\sqrt{k}}$$.

1. Let $$\epsilon > 0$$ be the required guaranteed accuracy.

2. Then $$\epsilon \geq \frac{RG}{\sqrt{k}}$$ since $$\frac{RG}{\sqrt{k}}$$ is the minimum guaranteed suboptimality bound after $$k$$ iterations as per (11.3).

3. Hence we have $$\sqrt{k} \geq \frac{RG}{\epsilon}$$.

4. Hence we have $$k \geq \left ( \frac{RG}{\epsilon} \right )^2$$.

We note that the suboptimality bound of $$\epsilon$$ can be guaranteed in $$k$$ iterations only if we use the constant step size $$t = \frac{R}{G \sqrt{k}}$$.

Observation 11.2 (Interpretation of $$R G$$)

Initial uncertainty

1. Assume that $$f$$ is Lipschitz continuous with Lipschitz constant $$G$$.

2. By Lipschitz continuity, we have

$f(\bx^1) - f^* \leq G \| \bx^1 - \bx^* \|_2 \leq R G.$
3. Hence $$RG$$ is an estimate of the initial uncertainty in $$f^*$$.

Reduction in uncertainty

1. If our goal is to achieve $$f_{\best}^k - f^* \leq \epsilon$$, then $$\epsilon$$ is an estimate of the final uncertainty in $$f^*$$.

2. Hence $$RG / \epsilon$$ is the ratio of initial uncertainty in $$f^*$$ to the final uncertainty in $$f^*$$.

3. By Theorem 11.2, we require a minimum of $$(R G / \epsilon)^2$$ iterations to achieve this much reduction in uncertainty.

This analysis shows that the subgradient method is very slow. To achieve a 1000x reduction in the uncertainty of $$f^*$$, we need at least a million iterations.