A list of "easy" linear systems

Consider $\mathbf{A} \mathbf{x} = \mathbf{b}$, $\mathbf{A} \in \mathbb{R}^{n \times n}$, or, if you really need it, the matrix inverse. $\mathbf{A}$ can be huge. Keep massive data in mind: the 1000 Genomes Project, Netflix, Google PageRank, finance, spatial statistics, ... We should stay alert to the many kinds of easy linear systems.

Don't blindly use `A \ b` and `inv` in Julia, or the `solve` function in R. Don't waste computing resources with a bad choice of algorithm!

In [1]:
versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

Diagonal matrix

Diagonal $\mathbf{A}$: $n$ flops to solve. Use the `Diagonal` type in Julia.

In [2]:
using BenchmarkTools, LinearAlgebra, Random

# generate random data
Random.seed!(123)
n = 1000
A = diagm(0 => randn(n)) # a diagonal matrix stored as Matrix{Float64}
b = randn(n);
In [3]:
# which method does `\` dispatch to? see: https://github.com/JuliaLang/julia/blob/5b637df34396034b0dd353e603ab3d61322369fb/stdlib/LinearAlgebra/src/generic.jl#L956
@which A \ b
Out[3]:
\(A::AbstractMatrix{T} where T, B::AbstractVecOrMat{T} where T) in LinearAlgebra at /Applications/Julia-1.6.app/Contents/Resources/julia/share/julia/stdlib/v1.6/LinearAlgebra/src/generic.jl:1122
In [4]:
# check `istril(A)` and `istriu(A)` (O(n^2)), then call `Diagonal(A) \ b` (O(n))
@benchmark $A \ $b
Out[4]:
BenchmarkTools.Trial: 
  memory estimate:  15.88 KiB
  allocs estimate:  2
  --------------
  minimum time:     840.548 μs (0.00% GC)
  median time:      937.143 μs (0.00% GC)
  mean time:        961.377 μs (0.12% GC)
  maximum time:     7.117 ms (86.67% GC)
  --------------
  samples:          5198
  evals/sample:     1
In [5]:
# O(n) computation by exploiting the diagonal structure
@benchmark Diagonal($A) \ $b
Out[5]:
BenchmarkTools.Trial: 
  memory estimate:  15.88 KiB
  allocs estimate:  2
  --------------
  minimum time:     2.844 μs (0.00% GC)
  median time:      3.737 μs (0.00% GC)
  mean time:        6.461 μs (37.41% GC)
  maximum time:     1.052 ms (99.04% GC)
  --------------
  samples:          10000
  evals/sample:     9

Bidiagonal, tridiagonal, and banded matrices

Bidiagonal, tridiagonal, or banded $\mathbf{A}$: Band LU, band Cholesky, ... roughly $O(n)$ flops.

In [6]:
Random.seed!(123) 

n  = 1000
dv = randn(n)
ev = randn(n - 1)
b  = randn(n) # rhs
# symmetric tridiagonal matrix
A  = SymTridiagonal(dv, ev)
Out[6]:
1000×1000 SymTridiagonal{Float64, Vector{Float64}}:
  1.19027   -0.195171    ⋅         ⋅        …   ⋅          ⋅          ⋅ 
 -0.195171   2.04818   -1.42013    ⋅            ⋅          ⋅          ⋅ 
   ⋅        -1.42013    1.14265   1.0819        ⋅          ⋅          ⋅ 
   ⋅          ⋅         1.0819    0.459416      ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅       -0.112465      ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅        …   ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅        …   ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
  ⋮                                         ⋱                       
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅        …   ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅            ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅        …   ⋅          ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅           1.14112     ⋅          ⋅ 
   ⋅          ⋅          ⋅         ⋅           1.09938    0.597106    ⋅ 
   ⋅          ⋅          ⋅         ⋅           0.597106  -1.70143   -1.30405
   ⋅          ⋅          ⋅         ⋅            ⋅        -1.30405   -0.543953
In [7]:
# convert to a full matrix
Afull = Matrix(A)

# LU decomposition (2/3) n^3 flops!
@benchmark $Afull \ $b
Out[7]:
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  4
  --------------
  minimum time:     8.970 ms (0.00% GC)
  median time:      10.450 ms (0.00% GC)
  mean time:        11.118 ms (4.22% GC)
  maximum time:     23.517 ms (14.47% GC)
  --------------
  samples:          450
  evals/sample:     1
In [8]:
# specialized algorithm for tridiagonal matrix, O(n) flops
@benchmark $A \ $b
Out[8]:
BenchmarkTools.Trial: 
  memory estimate:  23.81 KiB
  allocs estimate:  3
  --------------
  minimum time:     13.741 μs (0.00% GC)
  median time:      15.677 μs (0.00% GC)
  mean time:        20.589 μs (16.97% GC)
  maximum time:     7.546 ms (99.49% GC)
  --------------
  samples:          10000
  evals/sample:     1

Triangular matrix

Triangular $\mathbf{A}$: $n^2$ flops to solve a linear system by forward or backward substitution.

In [9]:
Random.seed!(123)

n = 1000
A = tril(randn(n, n)) # a lower-triangular matrix stored as Matrix{Float64}
b = randn(n)

# check istril() then triangular solve
@benchmark $A \ $b
Out[9]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     531.598 μs (0.00% GC)
  median time:      621.947 μs (0.00% GC)
  mean time:        696.560 μs (0.09% GC)
  maximum time:     4.943 ms (87.20% GC)
  --------------
  samples:          7167
  evals/sample:     1
In [10]:
# triangular solve directly; saves the O(n^2) cost of istril()
@benchmark LowerTriangular($A) \ $b
Out[10]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     91.702 μs (0.00% GC)
  median time:      105.241 μs (0.00% GC)
  mean time:        114.309 μs (0.41% GC)
  maximum time:     4.913 ms (95.96% GC)
  --------------
  samples:          10000
  evals/sample:     1

Block diagonal matrix

Block diagonal: Suppose $n = \sum_b n_b$. Solving the linear system costs $O\left( \left( \sum_b n_b \right)^3 \right)$ flops without using the block diagonal structure vs $O\left( \sum_b n_b^3 \right)$ flops using it.

Julia's `blockdiag` function (in the SparseArrays standard library) generates a sparse matrix. Anyone interested in writing a BlockDiagonal.jl package?

In [11]:
using SparseArrays

Random.seed!(123)

B  = 10 # number of blocks
ni = 100
A  = blockdiag([sprandn(ni, ni, 0.01) for b in 1:B]...)
Out[11]:
1000×1000 SparseMatrixCSC{Float64, Int64} with 941 stored entries:
⣻⢼⢵⡴⡆⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⣻⢿⣝⠳⡃⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⡶⣿⣿⣬⠄⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⢾⣽⠿⢾⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠈⠁⠀⠀⢽⣳⡏⣒⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⣽⢋⣻⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠈⣷⣽⣫⢳⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢋⡹⡛⡭⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣻⡮⣿⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠏⣾⣯⡪⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡻⡿⡿⣻⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡶⣿⢶⠟⠀⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣻⣟⢹⣿⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢈⠟⢽⡽⢍⠀⠀⡀⠀⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣵⣏⢾⠟⠀⠀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⡆⣳⣺⡿⠀⡀⠀⠀⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⣿⢺⡽⣢⠀⠀⠀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢀⣿⣾⣳⣷⢀⢀⣀⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣯⣆⣿⣾
⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⢘⠿⢻⡾⡯
In [12]:
using UnicodePlots
spy(A)
Out[12]:
[UnicodePlots spy plot: "Sparsity Pattern" of the 1000×1000 block diagonal A, rows/columns 1 to 1000, entries marked > 0 / < 0; nz = 941]

Kronecker product

Use $$ \begin{eqnarray*} (\mathbf{A} \otimes \mathbf{B})^{-1} &=& \mathbf{A}^{-1} \otimes \mathbf{B}^{-1} \\ (\mathbf{C}^T \otimes \mathbf{A}) \text{vec}(\mathbf{B}) &=& \text{vec}(\mathbf{A} \mathbf{B} \mathbf{C}) \\ \text{det}(\mathbf{A} \otimes \mathbf{B}) &=& [\text{det}(\mathbf{A})]^p [\text{det}(\mathbf{B})]^m, \quad \mathbf{A} \in \mathbb{R}^{m \times m}, \mathbf{B} \in \mathbb{R}^{p \times p} \end{eqnarray*} $$ to avoid forming and doing costly computation on the potentially huge Kronecker $\mathbf{A} \otimes \mathbf{B}$.

Anyone interested in writing a package?

In [13]:
using MatrixDepot, LinearAlgebra

A = matrixdepot("lehmer", 50) # a pd matrix
include group.jl for user defined matrix generators
verify download of index files...
reading database
adding metadata...
adding svd data...
writing database
used remote sites are sparse.tamu.edu with MAT index and math.nist.gov with HTML index
Out[13]:
50×50 Matrix{Float64}:
 1.0        0.5        0.333333   0.25       …  0.0208333  0.0204082  0.02
 0.5        1.0        0.666667   0.5           0.0416667  0.0408163  0.04
 0.333333   0.666667   1.0        0.75          0.0625     0.0612245  0.06
 0.25       0.5        0.75       1.0           0.0833333  0.0816327  0.08
 0.2        0.4        0.6        0.8           0.104167   0.102041   0.1
 0.166667   0.333333   0.5        0.666667   …  0.125      0.122449   0.12
 0.142857   0.285714   0.428571   0.571429      0.145833   0.142857   0.14
 0.125      0.25       0.375      0.5           0.166667   0.163265   0.16
 0.111111   0.222222   0.333333   0.444444      0.1875     0.183673   0.18
 0.1        0.2        0.3        0.4           0.208333   0.204082   0.2
 0.0909091  0.181818   0.272727   0.363636   …  0.229167   0.22449    0.22
 0.0833333  0.166667   0.25       0.333333      0.25       0.244898   0.24
 0.0769231  0.153846   0.230769   0.307692      0.270833   0.265306   0.26
 ⋮                                           ⋱                        
 0.025641   0.0512821  0.0769231  0.102564      0.8125     0.795918   0.78
 0.025      0.05       0.075      0.1           0.833333   0.816327   0.8
 0.0243902  0.0487805  0.0731707  0.097561   …  0.854167   0.836735   0.82
 0.0238095  0.047619   0.0714286  0.0952381     0.875      0.857143   0.84
 0.0232558  0.0465116  0.0697674  0.0930233     0.895833   0.877551   0.86
 0.0227273  0.0454545  0.0681818  0.0909091     0.916667   0.897959   0.88
 0.0222222  0.0444444  0.0666667  0.0888889     0.9375     0.918367   0.9
 0.0217391  0.0434783  0.0652174  0.0869565  …  0.958333   0.938776   0.92
 0.0212766  0.0425532  0.0638298  0.0851064     0.979167   0.959184   0.94
 0.0208333  0.0416667  0.0625     0.0833333     1.0        0.979592   0.96
 0.0204082  0.0408163  0.0612245  0.0816327     0.979592   1.0        0.98
 0.02       0.04       0.06       0.08          0.96       0.98       1.0
In [14]:
B = matrixdepot("oscillate", 100) # pd matrix
Out[14]:
100×100 Matrix{Float64}:
  0.732819      0.204116     -0.0358704    …   6.36245e-16  -5.2958e-16
  0.204116      0.655435      0.1325          -1.13504e-15   9.44756e-16
 -0.0358704     0.1325        0.504374         2.29508e-15  -1.91032e-15
  0.00463216   -0.0168494     0.0978287       -2.85981e-15   2.38037e-15
 -0.00363204    0.00442701   -0.0232426        5.63353e-15  -4.68908e-15
 -0.00284163    0.00162401    0.00616123   …  -3.46992e-14   2.8882e-14
  1.91114e-5   -7.74183e-5    0.000177805      1.59942e-12  -1.33128e-12
  2.8273e-5    -1.64103e-5   -4.67529e-5      -6.25191e-12   5.20379e-12
 -4.02336e-5    5.58245e-5   -3.72592e-5       2.64575e-11  -2.2022e-11
  2.13929e-5   -3.54843e-5    3.77951e-5      -3.12201e-11   2.59861e-11
  3.48406e-6   -4.80497e-6    2.89002e-6   …   4.97718e-10  -4.14277e-10
  1.3064e-7    -1.67631e-7    7.01213e-8      -2.31569e-10   1.92746e-10
 -1.18051e-9    3.238e-9     -7.85849e-9       9.60367e-11  -7.99363e-11
  ⋮                                        ⋱                
 -1.47428e-15   2.63007e-15  -5.31806e-15     -3.60796e-5    3.0031e-5
  4.87598e-16  -8.69861e-16   1.75888e-15      1.19448e-5   -9.94153e-6
 -1.98529e-16   3.54171e-16  -7.16141e-16  …  -4.88131e-6    4.06676e-6
  1.91074e-16  -3.40871e-16   6.89249e-16      4.70708e-6   -3.9154e-6
  1.64013e-16  -2.92595e-16   5.91633e-16     -0.00122316    0.000937246
 -2.94416e-16   5.2523e-16   -1.06203e-15      0.00236155   -0.00193024
  4.66012e-16  -8.31352e-16   1.68101e-15     -0.00418716    0.00321704
 -2.05008e-15   3.65729e-15  -7.39512e-15  …   0.0206542    -0.0161171
  9.09071e-16  -1.62176e-15   3.27923e-15     -0.0162885     0.0109771
 -1.96949e-15   3.51351e-15  -7.10439e-15      0.108375     -0.045411
  6.36245e-16  -1.13504e-15   2.29508e-15      0.628243      0.23527
 -5.2958e-16    9.44756e-16  -1.91032e-15      0.23527       0.539612
In [15]:
M = kron(A, B)
c = ones(size(M, 2)) # rhs
# Method 1: form Kronecker product and Cholesky solve
x1 = cholesky(Symmetric(M)) \ c;
In [16]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
m, p = size(A, 1), size(B, 1)
x2 = vec(transpose(cholesky(Symmetric(A)) \ 
    transpose(cholesky(Symmetric(B)) \ reshape(c, p, m))));
In [17]:
# relative error
norm(x1 - x2) / norm(x1)
Out[17]:
1.423978030835246e-7
In [18]:
using BenchmarkTools

# Method 1: form Kronecker and Cholesky solve
@benchmark cholesky(Symmetric(kron($A, $B))) \ c
Out[18]:
BenchmarkTools.Trial: 
  memory estimate:  381.51 MiB
  allocs estimate:  9
  --------------
  minimum time:     555.949 ms (0.24% GC)
  median time:      591.758 ms (4.64% GC)
  mean time:        618.437 ms (7.30% GC)
  maximum time:     707.333 ms (3.93% GC)
  --------------
  samples:          9
  evals/sample:     1
In [19]:
# Method 2: use (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
@benchmark vec(transpose(cholesky(Symmetric($A)) \ 
    transpose(cholesky(Symmetric($B)) \ reshape($c, p, m))))
Out[19]:
BenchmarkTools.Trial: 
  memory estimate:  176.36 KiB
  allocs estimate:  16
  --------------
  minimum time:     127.807 μs (0.00% GC)
  median time:      302.715 μs (0.00% GC)
  mean time:        327.201 μs (7.93% GC)
  maximum time:     22.702 ms (97.96% GC)
  --------------
  samples:          10000
  evals/sample:     1

Sparse matrix

Sparsity: use a sparse matrix decomposition or an iterative method.

  • The most easily recognizable structure. Familiarize yourself with the sparse matrix computation tools in Julia, Matlab, R (Matrix package), MKL (sparse BLAS), ... as much as possible.
In [20]:
using MatrixDepot

Random.seed!(123)

# a 7701-by-7701 sparse pd matrix
A = matrixdepot("wathen", 50)
# random generated rhs
b = randn(size(A, 1))
Afull = Matrix(A)
count(!iszero, A) / length(A) # sparsity
Out[20]:
0.001994776158751544
In [21]:
using UnicodePlots
spy(A)
Out[21]:
[UnicodePlots spy plot: "Sparsity Pattern" of the 7701×7701 Wathen matrix A, rows/columns 1 to 7701, entries marked > 0 / < 0; nz = 118301]

Matrix-vector multiplication

In [22]:
# dense matrix-vector multiplication
@benchmark $Afull * $b
Out[22]:
BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     19.893 ms (0.00% GC)
  median time:      21.651 ms (0.00% GC)
  mean time:        22.494 ms (0.00% GC)
  maximum time:     31.167 ms (0.00% GC)
  --------------
  samples:          223
  evals/sample:     1
In [23]:
# sparse matrix-vector multiplication
@benchmark $A * $b
Out[23]:
BenchmarkTools.Trial: 
  memory estimate:  60.27 KiB
  allocs estimate:  2
  --------------
  minimum time:     121.508 μs (0.00% GC)
  median time:      149.204 μs (0.00% GC)
  mean time:        160.218 μs (3.53% GC)
  maximum time:     56.831 ms (99.47% GC)
  --------------
  samples:          10000
  evals/sample:     1

Solve linear equation

In [24]:
# solve via dense Cholesky
xchol = cholesky(Symmetric(Afull)) \ b
@benchmark cholesky($(Symmetric(Afull))) \ $b
Out[24]:
BenchmarkTools.Trial: 
  memory estimate:  452.52 MiB
  allocs estimate:  6
  --------------
  minimum time:     1.419 s (0.05% GC)
  median time:      1.466 s (0.95% GC)
  mean time:        1.562 s (1.50% GC)
  maximum time:     1.896 s (3.47% GC)
  --------------
  samples:          4
  evals/sample:     1
In [25]:
# solve via sparse Cholesky
xcholsp = cholesky(Symmetric(A)) \ b
@show norm(xchol - xcholsp)
@benchmark cholesky($(Symmetric(A))) \ $b
norm(xchol - xcholsp) = 2.951350597311586e-15
Out[25]:
BenchmarkTools.Trial: 
  memory estimate:  12.43 MiB
  allocs estimate:  50
  --------------
  minimum time:     12.182 ms (0.00% GC)
  median time:      14.598 ms (0.00% GC)
  mean time:        15.083 ms (0.21% GC)
  maximum time:     31.102 ms (4.08% GC)
  --------------
  samples:          332
  evals/sample:     1
In [26]:
# sparse solve via conjugate gradient
using IterativeSolvers

xcg = cg(A, b)
@show norm(xcg - xchol)
@benchmark cg($A, $b)
norm(xcg - xchol) = 1.6977624055764347e-7
Out[26]:
BenchmarkTools.Trial: 
  memory estimate:  241.86 KiB
  allocs estimate:  16
  --------------
  minimum time:     23.349 ms (0.00% GC)
  median time:      24.155 ms (0.00% GC)
  mean time:        24.372 ms (0.00% GC)
  maximum time:     27.847 ms (0.00% GC)
  --------------
  samples:          206
  evals/sample:     1

Easy plus low rank

Easy plus low rank: $\mathbf{U}, \mathbf{V} \in \mathbb{R}^{n \times r}$, $r \ll n$. Woodbury formula \begin{eqnarray*} (\mathbf{A} + \mathbf{U} \mathbf{V}^T)^{-1} &=& \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{U} (\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U})^{-1} \mathbf{V}^T \mathbf{A}^{-1} \\ \text{det}(\mathbf{A} + \mathbf{U} \mathbf{V}^T) &=& \text{det}(\mathbf{A}) \text{det}(\mathbf{I}_r + \mathbf{V}^T \mathbf{A}^{-1} \mathbf{U}). \end{eqnarray*}

  • Keep HW2 Q2 (multivariate density) and the PageRank problem in mind.
  • The WoodburyMatrices.jl package can be useful.
In [27]:
using BenchmarkTools, Random, WoodburyMatrices

Random.seed!(123)
n = 1000
r = 5

A = Diagonal(rand(n))
B = randn(n, r)
D = Diagonal(rand(r))
b = randn(n)
# Woodbury structure: W = A + B * D * B'
W = SymWoodbury(A, B, D)
Wfull = Matrix(W) # stored as a Matrix{Float64}
Out[27]:
1000×1000 Matrix{Float64}:
  1.00751     0.0942481  -0.89675    …  -0.215749   0.867569    0.110269
  0.0942481   3.96651    -1.23243       -0.222615   1.55709     2.25474
 -0.89675    -1.23243     4.74782        0.174786  -3.85127    -0.98163
  0.781968   -2.85065    -1.44214       -3.07727    0.415963   -2.60972
 -0.22913    -0.333805    1.02916        0.222734  -0.902749   -0.198271
 -0.592582    0.282854    2.21757    …   0.661326  -1.85877     0.276912
  0.514081   -0.557125   -1.77699       -0.346809   1.67041    -0.277478
 -0.0756023   1.76631    -0.303411       0.838138   0.938665    1.67483
  0.181564   -0.42357    -0.845054       2.51877    2.04249     0.886271
  0.193806   -1.55235     0.274039      -1.66764   -0.608694   -1.3487
 -0.27576     0.875352    1.05925    …   1.55017    0.433278    1.69237
 -0.120843   -0.419892    1.20975       -0.189021  -0.466013    0.187383
  0.137803   -2.34569    -0.222221       1.73242    0.436486   -1.18657
  ⋮                                  ⋱                         
 -0.244003   -0.825899    1.63995       -0.633534  -1.4448     -0.556889
  0.086005   -1.33356    -0.071707       0.492142   0.0295225  -0.83359
  0.152855   -0.805283   -0.139264   …   0.275684   0.534003   -0.169895
  1.44618    -3.86181    -3.79254       -1.23254    3.80313    -2.10821
  0.651458    1.50427    -2.26143       -2.70592    1.94592     0.768484
 -0.291346   -1.6475      1.4919         1.72771   -0.808014   -0.548643
  0.199968    0.425293   -0.615276      -1.1191     0.473018    0.155778
  0.164382   -1.16224     0.649759   …  -0.60271    0.236174   -0.136834
 -0.0390954   0.744044   -0.0749459      1.10033    0.818488    1.10986
 -0.215749   -0.222615    0.174786       4.05625    0.886098    0.860332
  0.867569    1.55709    -3.85127        0.886098   5.27085     2.02414
  0.110269    2.25474    -0.98163        0.860332   2.02414     2.59729
In [28]:
# compare storage: Woodbury representation vs full matrix
Base.summarysize(W), Base.summarysize(Wfull)
Out[28]:
(64720, 8000040)

Solve linear equation

In [29]:
# solve via Cholesky
@benchmark cholesky($(Symmetric(Wfull))) \ $b
Out[29]:
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  5
  --------------
  minimum time:     6.865 ms (0.00% GC)
  median time:      12.046 ms (0.00% GC)
  mean time:        12.541 ms (7.53% GC)
  maximum time:     63.468 ms (79.87% GC)
  --------------
  samples:          399
  evals/sample:     1
In [30]:
# solve using Woodbury formula
@benchmark $W \ reshape($b, n, 1) # hack; need to file an issue
Out[30]:
BenchmarkTools.Trial: 
  memory estimate:  32.09 KiB
  allocs estimate:  8
  --------------
  minimum time:     12.066 μs (0.00% GC)
  median time:      22.729 μs (0.00% GC)
  mean time:        21.819 μs (0.00% GC)
  maximum time:     193.880 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

Matrix-vector multiplication

In [31]:
# multiplication without using Woodbury structure
@benchmark $Wfull * $b
Out[31]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     84.059 μs (0.00% GC)
  median time:      106.483 μs (0.00% GC)
  mean time:        148.375 μs (0.00% GC)
  maximum time:     1.538 ms (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1
In [32]:
# multiplication using Woodbury structure
@benchmark $W * $b
Out[32]:
BenchmarkTools.Trial: 
  memory estimate:  7.94 KiB
  allocs estimate:  1
  --------------
  minimum time:     2.657 μs (0.00% GC)
  median time:      4.030 μs (0.00% GC)
  mean time:        5.160 μs (13.35% GC)
  maximum time:     6.895 ms (99.88% GC)
  --------------
  samples:          10000
  evals/sample:     9

Determinant

In [33]:
# determinant without using Woodbury structure
@benchmark det($Wfull)
Out[33]:
BenchmarkTools.Trial: 
  memory estimate:  7.64 MiB
  allocs estimate:  3
  --------------
  minimum time:     8.655 ms (0.00% GC)
  median time:      12.902 ms (0.00% GC)
  mean time:        14.250 ms (6.72% GC)
  maximum time:     67.765 ms (77.56% GC)
  --------------
  samples:          351
  evals/sample:     1
In [34]:
# determinant using Woodbury structure
@benchmark det($W)
Out[34]:
BenchmarkTools.Trial: 
  memory estimate:  416 bytes
  allocs estimate:  2
  --------------
  minimum time:     467.485 ns (0.00% GC)
  median time:      497.090 ns (0.00% GC)
  mean time:        601.926 ns (9.63% GC)
  maximum time:     298.711 μs (99.77% GC)
  --------------
  samples:          10000
  evals/sample:     194

Easy plus border

Easy plus border: For $\mathbf{A}$ pd and $\mathbf{V}$ full row rank, $$ \begin{pmatrix} \mathbf{A} & \mathbf{V}^T \\ \mathbf{V} & \mathbf{0} \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} - \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & \mathbf{A}^{-1} \mathbf{V}^T (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \\ (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \mathbf{V} \mathbf{A}^{-1} & - (\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T)^{-1} \end{pmatrix}. $$ Anyone interested in writing a package?
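
As a quick illustration, here is a block-elimination sketch (not part of the original notes; all names are illustrative) that solves the bordered system with right-hand side $(\mathbf{b}, \mathbf{d})$ by reusing cheap solves with $\mathbf{A}$ and factoring only the small $r \times r$ Schur complement $\mathbf{V} \mathbf{A}^{-1} \mathbf{V}^T$:

using LinearAlgebra, Random

Random.seed!(123)
n, r = 1000, 10
A = Diagonal(rand(n) .+ 1)      # an "easy" pd matrix: O(n) solves
V = randn(r, n)                 # full row rank border
b, d = randn(n), randn(r)

AinvVt = A \ transpose(V)       # A⁻¹Vᵀ via r cheap diagonal solves
S = V * AinvVt                  # r×r Schur complement V A⁻¹ Vᵀ
y = S \ (V * (A \ b) - d)       # small dense solve
x = A \ b - AinvVt * y          # then x = A⁻¹(b - Vᵀy)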

Orthogonal matrix

Orthogonal $\mathbf{A}$: at most $n^2$ flops, because $\mathbf{A}^{-1} = \mathbf{A}^T$ and solving reduces to the matrix-vector multiplication $\mathbf{x} = \mathbf{A}^T \mathbf{b}$. Permutation matrices, Householder matrices, Jacobi rotations, ... take even less.
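
A minimal sketch (not in the original notes): solving an orthogonal system is just one matrix-vector multiplication.

using LinearAlgebra, Random

Random.seed!(123)
n = 500
Q = Matrix(qr(randn(n, n)).Q)   # a random orthogonal matrix
b = randn(n)
x = transpose(Q) * b            # x = Qᵀb: about 2n² flops, no factorization
norm(Q * x - b)                 # ≈ 0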

Toeplitz matrix

Toeplitz systems (constant diagonals): $$ \mathbf{T} = \begin{pmatrix} r_0 & r_1 & r_2 & r_3 \\ r_{-1} & r_0 & r_1 & r_2 \\ r_{-2} & r_{-1} & r_0 & r_1 \\ r_{-3} & r_{-2} & r_{-1} & r_0 \end{pmatrix}. $$ $\mathbf{T} \mathbf{x} = \mathbf{b}$, where $\mathbf{T}$ is pd and Toeplitz, can be solved in $O(n^2)$ flops: the Durbin algorithm (Yule-Walker equations), the Levinson algorithm (general $\mathbf{b}$), and the Trench algorithm (inverse). These matrices occur in autoregressive models and econometrics.
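
A sketch of the Durbin algorithm (following Golub and Van Loan, Algorithm 4.7.1; the function name is made up, and this is illustrative rather than optimized) for the Yule-Walker system $\mathbf{T} \mathbf{y} = -(r_1, \ldots, r_n)^T$, where $\mathbf{T}$ is pd Toeplitz with diagonals $r_0 = 1, r_1, \ldots, r_{n-1}$:

using LinearAlgebra

# Durbin algorithm: O(n²) flops, O(n) storage beyond the solution vector
function durbin(r::AbstractVector{T}) where T <: Real
    n = length(r)
    y = zeros(T, n)
    y[1] = -r[1]
    β, α = one(T), -r[1]
    for k in 1:n-1
        β *= one(T) - α^2
        α  = -(r[k+1] + dot(reverse(r[1:k]), y[1:k])) / β
        y[1:k] .+= α .* reverse(y[1:k])
        y[k+1] = α
    end
    y
end

# check against a dense solve: with r_k = 0.5^k we have T = [0.5^|i-j|]
r = 0.5 .^ (1:5)
Tfull = [0.5 ^ abs(i - j) for i in 1:5, j in 1:5]
norm(durbin(r) - (Tfull \ (-r)))  # ≈ 0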

Circulant matrix

Circulant systems: a Toeplitz matrix with wraparound, $$ C(\mathbf{z}) = \begin{pmatrix} z_0 & z_4 & z_3 & z_2 & z_1 \\ z_1 & z_0 & z_4 & z_3 & z_2 \\ z_2 & z_1 & z_0 & z_4 & z_3 \\ z_3 & z_2 & z_1 & z_0 & z_4 \\ z_4 & z_3 & z_2 & z_1 & z_0 \end{pmatrix}, $$ is amenable to FFT-type algorithms: DCT (discrete cosine transform) and DST (discrete sine transform).
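
Since a circulant matrix is diagonalized by the DFT, $C(\mathbf{z}) \mathbf{x} = \mathbf{b}$ can be solved in $O(n \log n)$ flops. A sketch assuming the FFTW.jl package (not part of the original notes):

using FFTW, LinearAlgebra

n = 8
z = randn(n)                     # first column of C(z)
C = [z[mod(i - j, n) + 1] for i in 1:n, j in 1:n]  # dense C(z), only to verify
b = randn(n)
# eigenvalues of C(z) are fft(z), so solve by division in the Fourier domain
x = real(ifft(fft(b) ./ fft(z)))
norm(C * x - b)                  # ≈ 0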

Vandermonde matrix

Vandermonde matrix: arises in interpolation and approximation problems, $$ \mathbf{V}(x_0,\ldots,x_n) = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ x_0 & x_1 & \cdots & x_n \\ \vdots & \vdots & & \vdots \\ x_0^n & x_1^n & \cdots & x_n^n \end{pmatrix}. $$ $\mathbf{V} \mathbf{x} = \mathbf{b}$ or $\mathbf{V}^T \mathbf{x} = \mathbf{b}$ can be solved in $O(n^2)$ flops.
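
For example, the Björck-Pereyra algorithm solves the dual system $\mathbf{V}^T \mathbf{a} = \mathbf{f}$ (polynomial interpolation) in $O(n^2)$ flops via Newton's divided differences. A sketch, with a made-up function name:

# Björck-Pereyra: find coefficients a with Σⱼ aⱼ xᵢ^(j-1) = fᵢ at distinct points x
function vandersolve(x::AbstractVector{T}, f::AbstractVector{T}) where T <: Real
    n = length(x)
    a = copy(f)
    # forward pass: Newton divided differences
    for k in 1:n-1, i in n:-1:k+1
        a[i] = (a[i] - a[i-1]) / (x[i] - x[i-k])
    end
    # backward pass: Newton form to monomial coefficients
    for k in n-1:-1:1, i in k:n-1
        a[i] -= a[i+1] * x[k]
    end
    a
end

x = [0.0, 1.0, 2.0]
vandersolve(x, 1.0 .+ x .^ 2)    # recovers [1.0, 0.0, 1.0], i.e. p(t) = 1 + t²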

Cauchy-like matrix

Cauchy-like matrices: $$ \Omega \mathbf{A} - \mathbf{A} \Lambda = \mathbf{R} \mathbf{S}^T, $$ where $\Omega = \text{diag}(\omega_1,\ldots,\omega_n)$ and $\Lambda = \text{diag}(\lambda_1,\ldots, \lambda_n)$. $O(n)$ flops for LU and QR.
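
For instance, the prototype Cauchy matrix $a_{ij} = 1/(\omega_i - \lambda_j)$ satisfies the displacement equation with $\mathbf{R} = \mathbf{S} = \mathbf{1}_n$, i.e. displacement rank 1. A quick numerical check (not in the original notes):

using LinearAlgebra

n = 5
ω = collect(1.0:n)               # assume ωᵢ ≠ λⱼ throughout
λ = collect(1.0:n) .+ 0.5
A = [1 / (ω[i] - λ[j]) for i in 1:n, j in 1:n]   # Cauchy matrix
Ω, Λ = Diagonal(ω), Diagonal(λ)
norm(Ω * A - A * Λ - ones(n) * ones(n)')          # ≈ 0: rank-1 displacement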

Structured-rank matrix

Structured-rank problems: semiseparable matrices (LU and QR take $O(n)$ flops), quasiseparable matrices, ...