Optimization in Julia

This lecture gives an overview of some optimization tools in Julia.

In [29]:
versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Flowchart

  • Statisticians solve optimization problems in daily work: maximum likelihood estimation, machine learning, ...

  • Categories of optimization problems:

    1. Problems with analytical solutions: least squares, principal component analysis, canonical correlation analysis, ...

    2. Problems amenable to disciplined convex programming (DCP): linear programming (LP), quadratic programming (QP), second-order cone programming (SOCP), semidefinite programming (SDP), and geometric programming (GP).

    3. Nonlinear programming (NLP): Newton type algorithms, Fisher scoring algorithm, EM algorithm, MM algorithms.

    4. Large scale optimization: ADMM, SGD, ...

Flowchart

Modeling tools and solvers

Getting familiar with good optimization software broadens the scope and scale of problems we are able to solve in statistics. The following table lists some of the best optimization software.

LP MILP SOCP MISOCP SDP GP NLP MINLP R Matlab Julia Python Cost
modeling tools
cvx x x x x x x x x x A
Convex.jl x x x x x x O
JuMP.jl x x x x x x x O
MathProgBase.jl x x x x x x x O
MathOptInterface.jl x x x x x x x O
convex solvers
Mosek x x x x x x x x x x x A
Gurobi x x x x x x x x A
CPLEX x x x x x x x x A
SCS x x x x x x O
COSMO.jl x x x x O
NLP solvers
NLopt x x x x x O
Ipopt x x x x x O
KNITRO x x x x x x x x $
  • O: open source
  • A: free academic license
  • $: commercial
  • Difference between modeling tools and solvers

    • Modeling tools such as cvx (for Matlab) and Convex.jl (Julia analog of cvx) implement the disciplined convex programming (DCP) paradigm proposed by Grant and Boyd (2008) http://stanford.edu/~boyd/papers/disc_cvx_prog.html. DCP prescribes a set of simple rules from which users can construct convex optimization problems easily.

    • Solvers (Mosek, Gurobi, Cplex, SCS, COSMO, ...) are concrete software implementations of optimization algorithms. My favorite ones are Mosek/Gurobi/SCS for DCP and Ipopt/NLopt for nonlinear programming. Mosek and Gurobi are commercial software but free for academic use; SCS, Ipopt, and NLopt are open source.

    • Modeling tools can typically dispatch a problem to a variety of solvers. Because they are solver agnostic, users do not have to worry about each solver's specific interface.

  • For this course, install the following tools:

    • Gurobi: 1. Download Gurobi at link. 2. Request a free academic license at link. 3. Run the grbgetkey XXXXXXXXX command in a terminal as suggested. It will retrieve a license file and put it under the home folder. 4. Set up the environment variables. On my machine, I put the following two lines in the ~/.julia/config/startup.jl file: ENV["GUROBI_HOME"] = "/Library/gurobi902/mac64" and ENV["GRB_LICENSE_FILE"] = "/Users/huazhou/Documents/Gurobi/gurobi.lic".
    • Mosek: 1. Request free academic license at link. The license file will be sent to your edu email within minutes. Check Spam folder if necessary. 2. Put the license file at the default location ~/mosek/.
    • Convex.jl, SCS.jl, Gurobi.jl, Mosek.jl, MathProgBase.jl, NLopt.jl, Ipopt.jl.
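As a small illustration of the DCP rules mentioned above, Convex.jl can report the curvature (vexity) it infers for an expression by composing atoms; expressions that violate the rules are flagged as non-DCP. A minimal sketch, assuming only that Convex.jl is installed:

```julia
using Convex

x = Variable(3)

# DCP certifies curvature compositionally from atoms:
println(vexity(sum(square(x))))          # convex: sum of convex atoms
println(vexity(-norm(x, 2)))             # concave: negation of a convex atom
println(vexity(dot([1.0, 2.0, 3.0], x))) # affine: linear in x
```

Only expressions whose curvature can be certified this way are accepted by the modeling layer.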

DCP Using Convex.jl

Standard convex problem classes like LP (linear programming), QP (quadratic programming), SOCP (second-order cone programming), SDP (semidefinite programming), and GP (geometric programming), are becoming a technology.

DCP Hierarchy

Example: microbiome regression analysis

We illustrate optimization tools in Julia using microbiome analysis as an example.

16S microbiome sequencing technology generates sequence counts of various organisms (OTUs, operational taxonomic units) in samples.

Microbiome Data

For statistical analysis, counts are normalized into proportions for each sample, resulting in a covariate matrix $\mathbf{X}$ with all rows summing to 1. For identifiability, we need to add a sum-to-zero constraint to the regression coefficients. In other words, we need to solve a constrained least squares problem
$$ \text{minimize} \frac{1}{2} \|\mathbf{y} - \mathbf{X} \beta\|_2^2 $$ subject to the constraint $\sum_{j=1}^p \beta_j = 0$. For simplicity we ignore intercept and non-OTU covariates in this presentation.

Let's first generate an artificial data set.

In [30]:
using Random, LinearAlgebra, SparseArrays

Random.seed!(123) # seed

n, p = 100, 50
X = rand(n, p)
# scale each row of X to sum to 1
lmul!(Diagonal(1 ./ vec(sum(X, dims=2))), X)
# true β is a sparse vector with about 10% non-zero entries
β = sprandn(p, 0.1) 
y = X * β + randn(n);

Sum-to-zero regression

The sum-to-zero constrained least squares problem is a standard quadratic program (QP), so it should be solved easily by any QP solver.
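Before calling any solver, note that this equality-constrained least squares problem also has a closed-form solution via its KKT linear system, which makes a handy sanity check. A minimal sketch, regenerating data the same way as above (exact numbers depend on the RNG stream of your Julia version):

```julia
using LinearAlgebra, Random, SparseArrays

Random.seed!(123)
n, p = 100, 50
X = rand(n, p)
lmul!(Diagonal(1 ./ vec(sum(X, dims=2))), X)  # rows sum to 1
y = X * sprandn(p, 0.1) + randn(n)

# KKT conditions for minimize 0.5‖y - Xβ‖² s.t. 1ᵀβ = 0:
#   [XᵀX  1] [β]   [Xᵀy]
#   [1ᵀ   0] [ν] = [ 0 ]
A = [X'X ones(p); ones(1, p) 0]
sol = A \ [X'y; 0]
β̂ = sol[1:p]
println(sum(β̂))  # ≈ 0 up to floating-point error
```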

Modeling using Convex.jl

We use the Convex.jl package to model this QP problem. For a complete list of operations supported by Convex.jl, see http://www.juliaopt.org/Convex.jl/stable/operations/.

In [31]:
using Convex

β̂cls = Variable(size(X, 2))
problem = minimize(0.5sumsquares(y - X * β̂cls)) # objective
problem.constraints += sum(β̂cls) == 0; # constraint
problem
Out[31]:
minimize
└─ * (convex; positive)
   ├─ 0.5
   └─ qol_elem (convex; positive)
      ├─ norm2 (convex; positive)
      │  └─ …
      └─ [1.0]
subject to
└─ == constraint (affine)
   ├─ sum (affine; real)
   │  └─ 50-element real variable (id: 101…509)
   └─ 0

status: `solve!` not called yet

Mosek

We first use the Mosek solver to solve this QP.

In [32]:
using MosekTools
solver = () -> Mosek.Optimizer(LOG=1)

@time solve!(problem, solver)
Problem
  Name                   :                 
  Objective sense        : min             
  Type                   : CONIC (conic optimization problem)
  Constraints            : 107             
  Cones                  : 2               
  Scalar variables       : 157             
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer started.
Presolve started.
Linear dependency checker started.
Linear dependency checker terminated.
Eliminator started.
Freed constraints in eliminator : 0
Eliminator terminated.
Eliminator - tries                  : 1                 time                   : 0.00            
Lin. dep.  - tries                  : 1                 time                   : 0.00            
Lin. dep.  - number                 : 0               
Presolve terminated. Time: 0.00    
Problem
  Name                   :                 
  Objective sense        : min             
  Type                   : CONIC (conic optimization problem)
  Constraints            : 107             
  Cones                  : 2               
  Scalar variables       : 157             
  Matrix variables       : 0               
  Integer variables      : 0               

Optimizer  - threads                : 8               
Optimizer  - solved problem         : the dual        
Optimizer  - Constraints            : 52
Optimizer  - Cones                  : 3
Optimizer  - Scalar variables       : 106               conic                  : 106             
Optimizer  - Semi-definite variables: 0                 scalarized             : 0               
Factor     - setup time             : 0.00              dense det. time        : 0.00            
Factor     - ML order time          : 0.00              GP order time          : 0.00            
Factor     - nonzeros before factor : 1328              after factor           : 1328            
Factor     - dense dim.             : 0                 flops                  : 3.08e+05        
ITE PFEAS    DFEAS    GFEAS    PRSTATUS   POBJ              DOBJ              MU       TIME  
0   3.0e+00  5.0e-01  2.0e+00  0.00e+00   0.000000000e+00   -1.000000000e+00  1.0e+00  0.00  
1   2.9e-01  4.8e-02  5.7e-01  -9.26e-01  2.915150182e+00   9.703344707e+00   9.7e-02  0.00  
2   4.4e-02  7.4e-03  1.3e-01  -7.10e-01  1.845835780e+01   3.574733771e+01   1.5e-02  0.00  
3   2.1e-03  3.5e-04  4.6e-04  7.65e-01   2.901642971e+01   2.908435432e+01   7.0e-04  0.00  
4   1.7e-06  2.9e-07  1.3e-08  1.00e+00   2.899788909e+01   2.899798941e+01   5.7e-07  0.00  
5   2.5e-10  4.2e-11  2.6e-14  1.00e+00   2.899829742e+01   2.899829744e+01   8.4e-11  0.00  
Optimizer terminated. Time: 0.01    

  0.630648 seconds (1.02 M allocations: 59.503 MiB, 98.52% compilation time)
In [33]:
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
Out[33]:
(MathOptInterface.OPTIMAL, 28.998297419888974, [13.668687462007188; -9.77274135078777; … ; 21.133785589283498; -15.115087759732402])
In [34]:
# check constraint satisfaction
sum(β̂cls.value)
Out[34]:
-1.234852220477478e-10

Gurobi

Switch to Gurobi solver:

In [35]:
using Gurobi
solver = () -> Gurobi.Optimizer(OutputFlag=1)

@time solve!(problem, solver)
Academic license - for non-commercial use only
Gurobi Optimizer version 9.0.2 build v9.0.2rc0 (mac64)
Optimize a model with 107 rows, 157 columns and 5160 nonzeros
Model fingerprint: 0xab10ce97
Model has 2 quadratic constraints
Coefficient statistics:
  Matrix range     [1e-05, 2e+00]
  QMatrix range    [1e+00, 1e+00]
  Objective range  [1e+00, 1e+00]
  Bounds range     [0e+00, 0e+00]
  RHS range        [2e-03, 3e+00]
Presolve removed 2 rows and 1 columns
Presolve time: 0.00s
Presolved: 105 rows, 156 columns, 5158 nonzeros
Presolved model has 2 second-order cone constraints
Ordering time: 0.00s

Barrier statistics:
 Free vars  : 50
 AA' NZ     : 5.154e+03
 Factor NZ  : 5.262e+03
 Factor Ops : 3.590e+05 (less than 1 second per iteration)
 Threads    : 1

                  Objective                Residual
Iter       Primal          Dual         Primal    Dual     Compl     Time
   0   1.12826689e+01 -5.01000000e-01  2.29e+01 1.00e-01  2.00e-01     0s
   1   3.22840813e+00  6.27715458e-02  4.49e+00 3.72e-05  3.88e-02     0s
   2   4.08178592e+00  5.62149847e+00  3.32e+00 3.97e-05  2.14e-02     0s
   3   1.64813094e+01  8.57926093e+00  9.91e-01 3.54e-05  9.22e-02     0s
   4   1.75521420e+01  1.58590777e+01  7.37e-01 4.78e-06  4.66e-02     0s
   5   1.92008663e+01  2.62514505e+01  6.28e-01 3.87e-06  1.63e-02     0s
   6   3.03618788e+01  2.79564255e+01  9.67e-03 1.02e-07  2.45e-02     0s
   7   2.91004528e+01  2.89455416e+01  3.83e-04 4.40e-08  1.54e-03     0s
   8   2.90061906e+01  2.89953287e+01  2.12e-05 3.54e-09  1.07e-04     0s
   9   2.89983319e+01  2.89982844e+01  2.40e-07 1.52e-09  4.70e-07     0s
  10   2.89982975e+01  2.89982974e+01  4.99e-09 3.80e-11  2.65e-09     0s

Barrier solved model in 10 iterations and 0.01 seconds
Optimal objective 2.89982975e+01


User-callback calls 61, time in user-callback 0.00 sec
  0.370650 seconds (593.40 k allocations: 34.694 MiB, 8.44% gc time, 95.54% compilation time)
┌ Warning: Passing optimizer attributes as keyword arguments to
│ Gurobi.Optimizer is deprecated. Use
│     MOI.set(model, MOI.RawParameter("key"), value)
│ or
│     JuMP.set_optimizer_attribute(model, "key", value)
│ instead.
└ @ Gurobi /Users/huazhou/.julia/packages/Gurobi/HtUHB/src/MOI_wrapper.jl:258

In [36]:
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
Out[36]:
(MathOptInterface.OPTIMAL, 28.998297507778627, [13.668678309342742; -9.7727402141536; … ; 21.133769207305104; -15.115075595252181])
In [37]:
# check constraint satisfaction
sum(β̂cls.value)
Out[37]:
1.4210854715202004e-14

COSMO

Switch to the COSMO solver (a pure Julia implementation):

In [38]:
# Use COSMO solver
using COSMO
solver = () -> COSMO.Optimizer(max_iter=5000)

@time solve!(problem, solver)
------------------------------------------------------------------
          COSMO v0.7.9 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2020
------------------------------------------------------------------

Problem:  x ∈ R^{53},
          constraints: A ∈ R^{107x53} (5056 nnz),
          matrix size to factor: 160x160,
          Floating-point precision: Float64
Sets:     SecondOrderCone) of dim: 101
          SecondOrderCone) of dim: 3
          ZeroSet) of dim: 2
          Nonnegatives) of dim: 1
Settings: ϵ_abs = 1.0e-04, ϵ_rel = 1.0e-04,
          ϵ_prim_inf = 1.0e-05, ϵ_dual_inf = 1.0e-04,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 5000,
          scaling iter = 10 (on),
          check termination every 40 iter,
          check infeasibility every 40 iter,
          KKT system solver: QDLDL
Setup Time: 1.18ms

Iter:	Objective:	Primal Res:	Dual Res:	Rho:
1	-2.4722e+00	6.0889e+00	3.8576e+02	1.0000e-01
40	2.6882e+01	3.2489e-01	5.6117e+00	1.0000e-01
80	1.5528e+01	4.4478e+00	1.2249e-02	3.7247e-03
120	2.5316e+01	1.5241e+00	2.1910e-03	3.7247e-03
160	2.2682e+01	2.7035e-01	2.0778e-03	3.7247e-03
200	2.2820e+01	2.6320e-01	5.6296e-04	3.7247e-03
240	2.3124e+01	2.6405e-01	3.0970e-05	3.7247e-03
280	2.3409e+01	2.3256e-01	3.9044e-04	2.3597e-02
320	2.3752e+01	2.1534e-01	3.9286e-04	2.3597e-02
360	2.4065e+01	1.9999e-01	3.6019e-04	2.3597e-02
400	2.4352e+01	1.8621e-01	3.3135e-04	2.3597e-02
440	2.4617e+01	1.7379e-01	3.0587e-04	2.3597e-02
480	2.4862e+01	1.6252e-01	2.8320e-04	2.3597e-02
520	2.5090e+01	1.5227e-01	2.6290e-04	2.3597e-02
560	2.5301e+01	1.4289e-01	2.4464e-04	2.3597e-02
600	2.5497e+01	1.3429e-01	2.2812e-04	2.3597e-02
640	2.5681e+01	1.2638e-01	2.1313e-04	2.3597e-02
680	2.5853e+01	1.1908e-01	1.9947e-04	2.3597e-02
720	2.6013e+01	1.1232e-01	1.8698e-04	2.3597e-02
760	2.6164e+01	1.0606e-01	1.7552e-04	2.3597e-02
800	2.6306e+01	1.0024e-01	1.6498e-04	2.3597e-02
840	2.6439e+01	9.4813e-02	1.5526e-04	2.3597e-02
880	2.6565e+01	8.9756e-02	1.4628e-04	2.3597e-02
920	2.6683e+01	8.5030e-02	1.3796e-04	2.3597e-02
960	2.6795e+01	8.0608e-02	1.3023e-04	2.3597e-02
1000	2.6900e+01	7.6463e-02	1.2305e-04	2.3597e-02
1040	2.7000e+01	7.2575e-02	1.1635e-04	2.3597e-02
1080	2.7094e+01	6.8921e-02	1.1011e-04	2.3597e-02
1120	2.7183e+01	6.5485e-02	1.0427e-04	2.3597e-02
1160	2.7268e+01	6.2250e-02	9.8811e-05	2.3597e-02
1200	2.7348e+01	5.9201e-02	9.3694e-05	2.3597e-02
1240	2.7424e+01	5.6326e-02	8.8893e-05	2.3597e-02
1280	2.7496e+01	5.3610e-02	8.4383e-05	2.3597e-02
1320	2.7564e+01	5.1045e-02	8.0144e-05	2.3597e-02
1360	2.7629e+01	4.8619e-02	7.6153e-05	2.3597e-02
1400	2.7691e+01	4.6324e-02	7.2395e-05	2.3597e-02
1440	2.7750e+01	4.4150e-02	6.8850e-05	2.3597e-02
1480	2.7806e+01	4.2091e-02	6.5506e-05	2.3597e-02
1520	2.7859e+01	4.0139e-02	6.2347e-05	2.3597e-02
1560	2.7910e+01	3.8287e-02	5.9362e-05	2.3597e-02
1600	2.7958e+01	3.6530e-02	5.6538e-05	2.3597e-02
1640	2.8004e+01	3.4861e-02	5.3866e-05	2.3597e-02
1680	2.8048e+01	3.3276e-02	5.1336e-05	2.3597e-02
1720	2.8089e+01	3.1769e-02	4.8938e-05	2.3597e-02
1760	2.8129e+01	3.0337e-02	4.6665e-05	2.3597e-02
1800	2.8167e+01	2.8975e-02	4.4509e-05	2.3597e-02
1840	2.8203e+01	2.7679e-02	4.2463e-05	2.3597e-02
1880	2.8238e+01	2.6445e-02	4.0520e-05	2.3597e-02
1920	2.8271e+01	2.5270e-02	3.8675e-05	2.3597e-02
1960	2.8302e+01	2.4151e-02	3.6921e-05	2.3597e-02
2000	2.8332e+01	2.3085e-02	3.5253e-05	2.3597e-02
2040	2.8361e+01	2.2069e-02	3.3667e-05	2.3597e-02
2080	2.8388e+01	2.1101e-02	3.2159e-05	2.3597e-02
2120	2.8415e+01	2.0177e-02	3.0723e-05	2.3597e-02
2160	2.8440e+01	1.9297e-02	2.9356e-05	2.3597e-02
2200	2.8463e+01	1.8456e-02	2.8053e-05	2.3597e-02
2240	2.8486e+01	1.7655e-02	2.6813e-05	2.3597e-02
2280	2.8508e+01	1.6889e-02	2.5631e-05	2.3597e-02
2320	2.8529e+01	1.6159e-02	2.4504e-05	2.3597e-02
2360	2.8549e+01	1.5462e-02	2.3430e-05	2.3597e-02
2400	2.8568e+01	1.4795e-02	2.2406e-05	2.3597e-02
2440	2.8586e+01	1.4159e-02	2.1429e-05	2.3597e-02
2480	2.8604e+01	1.3552e-02	2.0496e-05	2.3597e-02
2520	2.8620e+01	1.2971e-02	1.9607e-05	2.3597e-02
2560	2.8636e+01	1.2416e-02	1.8757e-05	2.3597e-02
2600	2.8652e+01	1.1886e-02	1.7947e-05	2.3597e-02
2640	2.8666e+01	1.1379e-02	1.7172e-05	2.3597e-02
2680	2.8680e+01	1.0895e-02	1.6433e-05	2.3597e-02
2720	2.8694e+01	1.0431e-02	1.5727e-05	2.3597e-02
2760	2.8706e+01	9.9882e-03	1.5052e-05	2.3597e-02
2800	2.8719e+01	9.5646e-03	1.4408e-05	2.3597e-02
2840	2.8730e+01	9.1594e-03	1.3792e-05	2.3597e-02
2880	2.8742e+01	8.7718e-03	1.3203e-05	2.3597e-02
2920	2.8752e+01	8.4010e-03	1.2640e-05	2.3597e-02
2960	2.8763e+01	8.0463e-03	1.2102e-05	2.3597e-02
3000	2.8773e+01	7.7069e-03	1.1587e-05	2.3597e-02
3040	2.8782e+01	7.3821e-03	1.1095e-05	2.3597e-02
3080	2.8791e+01	7.0713e-03	1.0625e-05	2.3597e-02
3120	2.8800e+01	6.7739e-03	1.0175e-05	2.3597e-02
3160	2.8808e+01	6.4892e-03	9.7443e-06	2.3597e-02
3200	2.8816e+01	6.2166e-03	9.3325e-06	2.3597e-02
3240	2.8824e+01	5.9558e-03	8.9385e-06	2.3597e-02

------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 3240
Optimal objective: 28.82
Runtime: 0.095s (95.5ms)

  0.650007 seconds (694.38 k allocations: 41.038 MiB, 84.76% compilation time)
In [39]:
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
Out[39]:
(MathOptInterface.OPTIMAL, 28.82361325151151, [13.668687507707883; -9.772741375075672; … ; 21.133785642955573; -15.115087804925041])

We see COSMO has a looser criterion for constraint satisfaction, resulting in a lower reported objective value.

In [40]:
sum(β̂cls.value)
Out[40]:
-3.915359769735005e-8
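If closer agreement with the other solvers is desired, COSMO's convergence tolerances can be tightened from their 1e-4 defaults shown in the log above, at the cost of more ADMM iterations. A sketch (keyword names follow COSMO.jl's settings):

```julia
using COSMO

# tighter absolute/relative tolerances than the 1e-4 defaults
solver = () -> COSMO.Optimizer(eps_abs = 1e-6, eps_rel = 1e-6, max_iter = 20_000)
# @time solve!(problem, solver)  # re-solve the QP above
```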

SCS

Switch to the open source SCS solver:

In [41]:
# Use SCS solver
using SCS
solver = () -> SCS.Optimizer(verbose=1)

@time solve!(problem, solver)
  0.498026 seconds (548.70 k allocations: 33.400 MiB, 72.70% compilation time)
----------------------------------------------------------------------------
	SCS v2.1.2 - Splitting Conic Solver
	(c) Brendan O'Donoghue, Stanford University, 2012
----------------------------------------------------------------------------
Lin-sys: sparse-indirect, nnz in A = 5056, CG tol ~ 1/iter^(2.00)
eps = 1.00e-05, alpha = 1.50, max_iters = 5000, normalize = 1, scale = 1.00
acceleration_lookback = 10, rho_x = 1.00e-03
Variables n = 53, constraints m = 107
Cones:	primal zero / dual free vars: 2
	linear vars: 1
	soc vars: 104, soc blks: 2
Setup time: 3.32e-04s
----------------------------------------------------------------------------
 Iter | pri res | dua res | rel gap | pri obj | dua obj | kap/tau | time (s)
----------------------------------------------------------------------------
     0| 9.17e+19  5.76e+19  1.00e+00 -1.04e+21  8.59e+20  1.59e+21  6.67e-04 
   100| 1.44e-05  9.44e-06  5.09e-06  2.90e+01  2.90e+01  7.55e-16  1.26e-02 
   120| 3.60e-06  2.64e-06  1.23e-06  2.90e+01  2.90e+01  3.93e-16  1.38e-02 
----------------------------------------------------------------------------
Status: Solved
Timing: Solve time: 1.38e-02s
	Lin-sys: avg # CG iterations: 6.79, avg solve time: 9.83e-05s
	Cones: avg projection time: 2.64e-07s
	Acceleration: avg step time: 1.18e-05s
----------------------------------------------------------------------------
Error metrics:
dist(s, K) = 3.5527e-15, dist(y, K*) = 4.4409e-16, s'y/|s||y| = -8.9856e-17
primal res: |Ax + s - b|_2 / (1 + |b|_2) = 3.6046e-06
dual res:   |A'y + c|_2 / (1 + |c|_2) = 2.6431e-06
rel gap:    |c'x + b'y| / (1 + |c'x| + |b'y|) = 1.2313e-06
----------------------------------------------------------------------------
c'x = 28.9983, -b'y = 28.9983
============================================================================
In [42]:
# Check the status, optimal value, and minimizer of the problem
problem.status, problem.optval, β̂cls.value
Out[42]:
(MathOptInterface.OPTIMAL, 28.99834367864642, [13.668719778517216; -9.772627374792503; … ; 21.13380901845421; -15.115084607474534])
In [43]:
# check constraint satisfaction
sum(β̂cls.value)
Out[43]:
-3.906193668967717e-6

Sum-to-zero lasso

Suppose we want to know which organisms (OTU) are associated with the response. We can answer this question using a sum-to-zero constrained lasso $$ \text{minimize} \frac 12 \|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \|\beta\|_1 $$ subject to the constraint $\sum_{j=1}^p \beta_j = 0$. Varying $\lambda$ from small to large values will generate a solution path.

In [44]:
using Convex

# Use Mosek solver
using Mosek
solver = () -> Mosek.Optimizer(LOG=0)

# # Use Gurobi solver
# using Gurobi
# solver = () -> Gurobi.Optimizer(OutputFlag=0)

# # Use CPLEX solver
# using CPLEX
# solver = () -> CPLEX.Optimizer(CPXPARAM_ScreenOutput=0)

# # Use SCS solver
# using SCS
# solver = () -> SCS.Optimizer(verbose=0)

# solve at a grid of λ
λgrid = 0:0.01:0.35

# holder for solution path
β̂path = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
# optimization variable
β̂classo = Variable(size(X, 2))
# obtain solution path using warm start
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # define optimization problem
    # objective
    problem = minimize(0.5sumsquares(y - X * β̂classo) + λ * sum(abs, β̂classo))
    # constraint
    problem.constraints += sum(β̂classo) == 0 # constraint
    solve!(problem, solver)
    β̂path[i, :] = β̂classo.value
end
  1.107936 seconds (2.54 M allocations: 231.231 MiB, 6.87% gc time, 45.16% compilation time)
In [45]:
using Plots; gr()
using LaTeXStrings

p = plot(collect(λgrid), β̂path, legend=:none)
xlabel!(p, L"\lambda")
ylabel!(p, L"\hat \beta")
title!(p, "Sum-to-Zero Lasso")
Out[45]:

Sum-to-zero group lasso

Suppose we want to do variable selection not at the OTU level, but at the Phylum level. OTUs are clustered into various Phyla. We can answer this question using a sum-to-zero constrained group lasso $$ \text{minimize} \frac 12 \|\mathbf{y} - \mathbf{X} \beta\|_2^2 + \lambda \sum_j \|\mathbf{\beta}_j\|_2 $$ subject to the constraint $\sum_{j=1}^p \beta_j = 0$, where $\mathbf{\beta}_j$ are the regression coefficients corresponding to the $j$-th phylum. This is a second-order cone programming (SOCP) problem readily modeled by Convex.jl.

Let's assume each block of 10 contiguous OTUs belongs to one Phylum.

In [46]:
# Use Mosek solver
using Mosek
solver = () -> Mosek.Optimizer(LOG=0)

# # Use Gurobi solver
# using Gurobi

# # Use Cplex solver
# using CPLEX
# solver = CplexSolver(CPXPARAM_ScreenOutput=0)

# # Use SCS solver
# using SCS
# solver = () -> SCS.Optimizer(verbose=0)

# solve at a grid of λ
λgrid = 0.0:0.005:0.5
β̂pathgrp = zeros(length(λgrid), size(X, 2)) # each row is β̂ at a λ
β̂classo = Variable(size(X, 2))
@time for i in 1:length(λgrid)
    λ = λgrid[i]
    # loss
    obj = 0.5sumsquares(y - X * β̂classo)
    # group lasso penalty term
    for j in 1:(size(X, 2) ÷ 10)  # integer division: number of phyla
        βj = β̂classo[(10(j-1)+1):10j]
        obj = obj + λ * norm(βj)
    end
    problem = minimize(obj)
    # constraint
    problem.constraints += sum(β̂classo) == 0 # constraint
    solve!(problem, solver)
    β̂pathgrp[i, :] = β̂classo.value
end
  1.109452 seconds (1.43 M allocations: 261.446 MiB, 7.66% gc time, 1.53% compilation time)

We see it took Mosek less than 2 seconds to solve this seemingly hard optimization problem at 101 different $\lambda$ values.

In [47]:
p2 = plot(collect(λgrid), β̂pathgrp, legend=:none)
xlabel!(p2, L"\lambda")
ylabel!(p2, L"\hat \beta")
title!(p2, "Sum-to-Zero Group Lasso")
Out[47]:

Example: matrix completion

Load the $128 \times 128$ Lena picture with missing pixels.

In [48]:
using FileIO

lena = load("lena128missing.png")
Out[48]:
In [49]:
# convert to real matrices
Y = Float64.(lena)
Out[49]:
128×128 Matrix{Float64}:
 0.0       0.0       0.635294  0.0       …  0.0       0.0       0.627451
 0.627451  0.623529  0.0       0.611765     0.0       0.0       0.388235
 0.611765  0.611765  0.0       0.0          0.403922  0.219608  0.0
 0.0       0.0       0.611765  0.0          0.223529  0.176471  0.192157
 0.611765  0.0       0.615686  0.615686     0.0       0.0       0.0
 0.0       0.0       0.0       0.619608  …  0.0       0.0       0.2
 0.607843  0.0       0.623529  0.0          0.176471  0.192157  0.0
 0.0       0.0       0.623529  0.0          0.0       0.0       0.215686
 0.619608  0.619608  0.0       0.0          0.2       0.0       0.207843
 0.0       0.0       0.635294  0.635294     0.2       0.192157  0.188235
 0.635294  0.0       0.0       0.0       …  0.192157  0.180392  0.0
 0.631373  0.0       0.0       0.0          0.0       0.0       0.0
 0.0       0.627451  0.635294  0.666667     0.172549  0.0       0.184314
 ⋮                                       ⋱  ⋮                   
 0.0       0.129412  0.0       0.541176     0.0       0.286275  0.0
 0.14902   0.129412  0.196078  0.537255     0.345098  0.0       0.0
 0.215686  0.0       0.262745  0.0          0.301961  0.0       0.207843
 0.345098  0.341176  0.356863  0.513725     0.0       0.0       0.231373
 0.0       0.0       0.0       0.0       …  0.0       0.243137  0.258824
 0.298039  0.415686  0.458824  0.0          0.0       0.0       0.258824
 0.0       0.368627  0.4       0.0          0.0       0.0       0.235294
 0.0       0.0       0.34902   0.0          0.0       0.239216  0.207843
 0.219608  0.0       0.0       0.0          0.0       0.0       0.2
 0.0       0.219608  0.235294  0.356863  …  0.0       0.0       0.0
 0.196078  0.207843  0.211765  0.0          0.0       0.270588  0.345098
 0.192157  0.0       0.196078  0.309804     0.266667  0.356863  0.0

We fill in the missing pixels using a matrix completion technique developed by Candès and Tao $$ \text{minimize } \|\mathbf{X}\|_* $$ $$ \text{subject to } x_{ij} = y_{ij} \text{ for all observed entries } (i, j). $$ Here $\|\mathbf{M}\|_* = \sum_i \sigma_i(\mathbf{M})$ is the nuclear norm. In words, we seek the matrix with minimal nuclear norm that agrees with the observed entries. This is a semidefinite programming (SDP) problem readily modeled by Convex.jl.
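The reduction to SDP rests on the standard semidefinite representation of the nuclear norm:

```latex
\|\mathbf{X}\|_* \le t \iff \exists\, \mathbf{W}_1, \mathbf{W}_2:
\begin{bmatrix} \mathbf{W}_1 & \mathbf{X} \\ \mathbf{X}^T & \mathbf{W}_2 \end{bmatrix} \succeq \mathbf{0},
\quad \frac{1}{2}\left(\operatorname{tr} \mathbf{W}_1 + \operatorname{tr} \mathbf{W}_2\right) \le t,
```

which is what allows a DCP modeling tool to pass the nuclearnorm objective to a conic solver as a PSD constraint.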

This example takes a while because of its high dimensionality.

In [50]:
# Use COSMO solver
using COSMO

# Linear indices of obs. entries
obsidx = findall(Y[:] .≠ 0.0)
# Create optimization variables
X = Variable(size(Y))
# Set up optimization problem
problem = minimize(nuclearnorm(X))
problem.constraints += X[obsidx] == Y[obsidx]
# Solve the problem by calling solve
# @time solve!(problem, () -> Mosek.Optimizer(LOG=1)) # Mosek takes about 20min
@time solve!(problem, () -> COSMO.Optimizer()) # fast
------------------------------------------------------------------
          COSMO v0.7.9 - A Quadratic Objective Conic Solver
                         Michael Garstka
                University of Oxford, 2017 - 2020
------------------------------------------------------------------

Problem:  x ∈ R^{49153},
          constraints: A ∈ R^{73665x49153} (73793 nnz),
          matrix size to factor: 122818x122818,
          Floating-point precision: Float64
Sets:     ZeroSet) of dim: 40769
          DensePsdConeTriangle) of dim: 32896
Settings: ϵ_abs = 1.0e-04, ϵ_rel = 1.0e-04,
          ϵ_prim_inf = 1.0e-05, ϵ_dual_inf = 1.0e-04,
          ρ = 0.1, σ = 1e-06, α = 1.6,
          max_iter = 5000,
          scaling iter = 10 (on),
          check termination every 40 iter,
          check infeasibility every 40 iter,
          KKT system solver: QDLDL
Setup Time: 60.85ms

Iter:	Objective:	Primal Res:	Dual Res:	Rho:
1	-1.4426e+03	1.1678e+01	5.9856e-01	1.0000e-01
40	1.4739e+02	1.7996e-02	2.4732e-04	1.0000e-01
80	1.4797e+02	4.6598e-04	4.2189e-05	6.8117e-01
120	1.4797e+02	3.9276e-05	2.6024e-06	6.8117e-01

------------------------------------------------------------------
>>> Results
Status: Solved
Iterations: 120
Optimal objective: 148
Runtime: 1.481s (1481.31ms)

  4.342192 seconds (5.54 M allocations: 573.937 MiB, 4.69% gc time, 56.42% compilation time)
In [51]:
using Images

Gray.(X.value)
Out[51]:

Nonlinear programming (NLP)

We use MLE of Gamma distribution to illustrate some rudiments of nonlinear programming (NLP) in Julia.

Let $x_1,\ldots,x_m$ be a random sample from the gamma density $$ f(x) = \Gamma(\alpha)^{-1} \beta^{\alpha} x^{\alpha-1} e^{-\beta x} $$ on $(0,\infty)$. The loglikelihood function is $$ L(\alpha, \beta) = m [- \ln \Gamma(\alpha) + \alpha \ln \beta + (\alpha - 1)\overline{\ln x} - \beta \bar x], $$ where $\overline{x} = \frac{1}{m} \sum_{i=1}^m x_i$ and $\overline{\ln x} = \frac{1}{m} \sum_{i=1}^m \ln x_i$.
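For reference, differentiating $L$ gives the score equations (with $\psi = \Gamma'/\Gamma$ the digamma function):

```latex
\frac{\partial L}{\partial \alpha} = m\left[-\psi(\alpha) + \ln \beta + \overline{\ln x}\right],
\qquad
\frac{\partial L}{\partial \beta} = m\left(\frac{\alpha}{\beta} - \bar{x}\right).
```

These are what the automatic differentiation below should reproduce numerically.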

In [52]:
using Random, Statistics, SpecialFunctions
Random.seed!(123)

function gamma_logpdf(x::Vector, α::Real, β::Real)
    m = length(x)
    avg = mean(x)
    logavg = sum(log, x) / m
    m * (- loggamma(α) + α * log(β) + (α - 1) * logavg - β * avg) # loggamma avoids overflow of gamma(α) for large α
end

x = rand(5)
gamma_logpdf(x, 1.0, 1.0)
Out[52]:
-3.0916184386224517

Many optimization algorithms involve taking derivatives of the objective function. The ForwardDiff.jl package implements automatic differentiation. For example, we compute the gradient and Hessian of the log-likelihood with data x at α=1.0 and β=1.0.

In [53]:
using ForwardDiff

ForwardDiff.gradient(θ -> gamma_logpdf(x, θ...), [1.0; 1.0])
Out[53]:
2-element Vector{Float64}:
 0.07828535245887835
 1.9083815613775483
In [54]:
ForwardDiff.hessian(θ -> gamma_logpdf(x, θ...), [1.0; 1.0])
Out[54]:
2×2 Matrix{Float64}:
 -8.22467   5.0
  5.0      -5.0
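A common statistical use of this Hessian: at the MLE, the inverse of the observed information $-\nabla^2 L(\hat\theta)$ estimates the asymptotic covariance of $(\hat\alpha, \hat\beta)$, giving Wald standard errors. A self-contained sketch mirroring the loglikelihood above (evaluated at $(1, 1)$ purely for illustration; in practice plug in the MLE):

```julia
using ForwardDiff, LinearAlgebra, Random, SpecialFunctions, Statistics

Random.seed!(123)
x = rand(5)

# same gamma loglikelihood as above, via loggamma
gamma_logpdf(x, α, β) = length(x) *
    (-loggamma(α) + α * log(β) + (α - 1) * mean(log, x) - β * mean(x))

# observed information = negative Hessian of the loglikelihood;
# its inverse approximates the covariance of the MLE
H = ForwardDiff.hessian(θ -> gamma_logpdf(x, θ...), [1.0; 1.0])
se = sqrt.(diag(inv(-H)))  # Wald standard errors at the plug-in point
```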

Generate data:

In [55]:
using Distributions, Random

Random.seed!(123)
(n, p) = (1000, 2)
(α, β) = 5.0 * rand(p)
x = rand(Gamma(α, β), n)
println("True parameter values:")
println("α = ", α, ", β = ", β)
True parameter values:
α = 3.8422383759828493, β = 4.7025750035759355

We use JuMP.jl to define and solve our NLP problem.

In [56]:
using JuMP, Ipopt, NLopt

m = Model(with_optimizer(Ipopt.Optimizer, print_level=3))
# m = Model(with_optimizer(NLopt.Optimizer, algorithm=:LD_MMA))

myf(a, b) = gamma_logpdf(x, a, b)
JuMP.register(m, :myf, 2, myf, autodiff=true)
@variable(m, α >= 1e-8)
@variable(m, β >= 1e-8)
@NLobjective(m, Max, myf(α, β))

print(m)
status = JuMP.optimize!(m)

println("MLE (JuMP):")
println("α = ", α, ", β = ", β)
println("Objective value: ", JuMP.objective_value(m))
println("α = ", JuMP.value(α), ", β = ", 1 / JuMP.value(β)) # 1/β̂ converts rate to scale, matching fit_mle's θ
println("MLE (Distribution package):")
println(fit_mle(Gamma, x))
Max myf(α, β)
Subject to
 α ≥ 1.0e-8
 β ≥ 1.0e-8
Total number of variables............................:        2
                     variables with only lower bounds:        2
                variables with lower and upper bounds:        0
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0


Number of Iterations....: 20

                                   (scaled)                 (unscaled)
Objective...............:   3.5872527717763427e-05   -3.5872527717763428e+03
Dual infeasibility......:   3.1718198595246968e-09    3.1718198595246966e-01
Constraint violation....:   0.0000000000000000e+00    0.0000000000000000e+00
Complementarity.........:   2.0994604946507695e-20   -2.0994604946507695e-12
Overall NLP error.......:   3.1718198595246968e-09    3.1718198595246966e-01


Number of objective function evaluations             = 38
Number of objective gradient evaluations             = 21
Number of equality constraint evaluations            = 0
Number of inequality constraint evaluations          = 0
Number of equality constraint Jacobian evaluations   = 0
Number of inequality constraint Jacobian evaluations = 0
Number of Lagrangian Hessian evaluations             = 0
Total CPU secs in IPOPT (w/o function evaluations)   =      0.139
Total CPU secs in NLP function evaluations           =      0.002

EXIT: Optimal Solution Found.
MLE (JuMP):
α = α, β = β
Objective value: -3587.252771776343
α = 3.635960189565557, β = 5.05796780575568
MLE (Distribution package):
Gamma{Float64}(α=3.635463551374468, θ=5.058746017922469)