Introduction to Julia

System information (for reproducibility)

In [1]:
versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.6.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 4

What's Julia?

Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments.

  • History:

    • Project started in 2009. First public release in 2012
    • Creators: Jeff Bezanson, Alan Edelman, Stefan Karpinski, Viral Shah
    • First major release v1.0 was released on Aug 8, 2018
    • Current stable release v1.6.0
  • Aims to solve the notorious two-language problem: prototype code goes into high-level languages like R/Python, while production code goes into low-level languages like C/C++.

    Julia aims to:

    Walk like Python. Run like C.

See https://julialang.org/benchmarks/ for details of the benchmarks.

  • Write high-level, abstract code that closely resembles mathematical formulas

    • yet produces fast, low-level machine code that has traditionally only been generated by static languages.
  • Julia is more than just "Fast R" or "Fast Matlab"

    • Performance comes from features that work well together.
    • You can't just take the magic dust that makes Julia fast and sprinkle it on [language of choice].

Learning resources

  1. The (free) online course Introduction to Julia, by Jane Herriman.

  2. Cheat sheet: The Fast Track to Julia.

  3. Browse the Julia documentation.

  4. For Matlab users, read Noteworthy Differences From Matlab.

    For R users, read Noteworthy Differences From R.

    For Python users, read Noteworthy Differences From Python.

  5. The Learning page on Julia's website has pointers to many other learning resources.

Julia REPL (Read-Eval-Print Loop)

The Julia REPL, or Julia shell, has at least five modes.

  1. Default mode is the Julian prompt julia>. Type backspace in other modes to enter default mode.

  2. Help mode help?>. Type ? to enter help mode. ?search_term does a fuzzy search for search_term.

  3. Shell mode shell>. Type ; to enter shell mode.

  4. Package mode (@v1.6) pkg>. Type ] to enter package mode for managing Julia packages (install, uninstall, update, ...).

  5. Search mode (reverse-i-search). Press Ctrl+R to enter search mode.

  6. With the RCall.jl package installed, we can enter R mode by typing $ (shift+4) at the Julia REPL.

Some survival commands in Julia REPL:

  1. exit() or Ctrl+D: exit Julia.

  2. Ctrl+C: interrupt execution.

  3. Ctrl+L: clear screen.

  4. Append ; (semicolon) to suppress displaying the output of a command, same as in Matlab.

  5. include("filename.jl") to source a Julia code file.

Seek help

Which IDE?

  • Julia homepage lists many choices: Juno, VS Code, Vim, ...

  • Unfortunately, there is no mature RStudio- or Matlab-like IDE for Julia yet.

  • For dynamic documents, e.g., homework, I recommend Jupyter Notebook or JupyterLab.

  • For extensive Julia coding, I have been happily using the editor VS Code with the extensions Julia and VS Code Jupyter Notebook Previewer installed.

Julia package system

  • Each Julia package is a Git repository. Each Julia package name ends with .jl. E.g., Distributions.jl package lives at https://github.com/JuliaStats/Distributions.jl.
    Google search with PackageName.jl usually leads to the package on github.com.

  • The package ecosystem is rapidly maturing; a complete list of registered packages (which are required to have a certain level of testing and documentation) is at http://pkg.julialang.org/.

  • For example, the package called Distributions.jl is added with

    # in Pkg mode
    (@v1.6) pkg> add Distributions
    

    and "removed" (although not completely deleted) with

    # in Pkg mode
    (@v1.6) pkg> rm Distributions
    
  • The package manager provides a dependency solver that determines which packages are actually required to be installed.

  • Non-registered packages are added by cloning the relevant Git repository. E.g.,

    # in Pkg mode
    (@v1.6) pkg> add https://github.com/OpenMendel/SnpArrays.jl
    
  • A package needs to be added only once, at which point it is downloaded into the .julia/packages directory under your home directory.

In [2]:
readdir(Sys.islinux() ? ENV["JULIA_PATH"] * "/pkg/packages" : ENV["HOME"] * "/.julia/packages")
Out[2]:
639-element Vector{String}:
 ".DS_Store"
 "ADMIXTURE"
 "AMD"
 "ASL_jll"
 "AbstractFFTs"
 "AbstractMCMC"
 "AbstractTrees"
 "AccurateArithmetic"
 "Adapt"
 "AdvancedHMC"
 "AdvancedMH"
 "AdvancedVI"
 "AlgebraicMultigrid"
 ⋮
 "Zygote"
 "ZygoteRules"
 "ghr_jll"
 "libass_jll"
 "libfdk_aac_jll"
 "libpng_jll"
 "libsodium_jll"
 "libvorbis_jll"
 "nghttp2_jll"
 "x264_jll"
 "x265_jll"
 "xkbcommon_jll"
  • The directory of a specific package can be queried with pathof():
In [3]:
using Distributions

pathof(Distributions)
Out[3]:
"/Users/huazhou/.julia/packages/Distributions/cNe2C/src/Distributions.jl"
  • If you start having problems with packages that seem to be unsolvable, you may try just deleting your .julia directory and reinstalling all your packages.

  • Periodically, one should run update in Pkg mode, which checks for, downloads and installs updated versions of all the packages you currently have installed.

  • status lists the status of all installed packages.
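    For example:

    # in Pkg mode
    (@v1.6) pkg> update
    (@v1.6) pkg> status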

  • Using functions in a package.

    using Distributions
    

    This pulls all of the exported functions in the module into your local namespace; names(Distributions) lists which names are exported. An alternative is

    import Distributions
    

    Now, the functions from the Distributions package are available only using

    Distributions.<FUNNAME>
    

    All functions, not only exported functions, are always available like this.
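    A minimal sketch contrasting the two styles (assuming Distributions.jl is installed):

    import Distributions             # only the module name enters scope
    d = Distributions.Normal(0, 1)   # qualified access required

    using Distributions              # exported names enter scope
    d = Normal(0, 1)                 # Normal can now be used directly
    mean(d)                          # so can exported functions like mean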

Calling R from Julia

  • The RCall.jl package allows us to embed R code inside Julia.

  • There are also PyCall.jl, MATLAB.jl, JavaCall.jl, CxxWrap.jl packages for interfacing with other languages.

In [4]:
using RCall

x = randn(1000)
# $ is the interpolation operator
R"""
hist($x, main="I'm plotting a Julia vector")
"""
Out[4]:
RObject{VecSxp}
$breaks
 [1] -3.0 -2.5 -2.0 -1.5 -1.0 -0.5  0.0  0.5  1.0  1.5  2.0  2.5  3.0  3.5  4.0

$counts
 [1]   4  12  46  90 136 200 173 180 104  41   7   4   2   1

$density
 [1] 0.008 0.024 0.092 0.180 0.272 0.400 0.346 0.360 0.208 0.082 0.014 0.008
[13] 0.004 0.002

$mids
 [1] -2.75 -2.25 -1.75 -1.25 -0.75 -0.25  0.25  0.75  1.25  1.75  2.25  2.75
[13]  3.25  3.75

$xname
[1] "`#JL`$x"

$equidist
[1] TRUE

attr(,"class")
[1] "histogram"
In [5]:
R"""
library(ggplot2)
qplot($x)
"""
Out[5]:
RObject{VecSxp}
┌ Warning: RCall.jl: `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
└ @ RCall /Users/huazhou/.julia/packages/RCall/eRsxl/src/io.jl:160
In [6]:
x = R"""
rnorm(10)
"""
Out[6]:
RObject{RealSxp}
 [1] -0.6593045  0.2219417 -1.4704323 -0.8410344  0.2447009 -0.5804216
 [7]  0.8104607  1.5540820  0.6305695  2.1678757
In [7]:
# collect R variable into Julia workspace
y = collect(x)
Out[7]:
10-element Vector{Float64}:
 -0.6593044530311302
  0.22194171544582944
 -1.470432300725073
 -0.8410343858775778
  0.2447009076423775
 -0.5804216213278303
  0.8104607407360201
  1.55408197052369
  0.6305694657699336
  2.167875663382325
  • Access Julia variables in R REPL mode:

    julia> x = rand(5) # Julia variable
    R> y <- $x
    
  • Pass Julia expression in R REPL mode:

    R> y <- $(rand(5))
    
  • Put Julia variable into R environment:

    julia> @rput x
    R> x
    
  • Get R variable into Julia environment:

    R> r <- 2
    julia> @rget r
    
  • If you want to call Julia within R, check out the XRJulia package by John Chambers.

Some basic Julia code

In [8]:
# an integer, same as int in R
y = 1
Out[8]:
1
In [9]:
typeof(y)
Out[9]:
Int64
In [10]:
# a Float64 number, same as double in R
y = 1.0
Out[10]:
1.0
In [11]:
typeof(y)
Out[11]:
Float64
In [12]:
# Greek letters:  `\pi<tab>`
π
Out[12]:
π = 3.1415926535897...
In [13]:
typeof(π)
Out[13]:
Irrational{:π}
In [14]:
# Greek letters:  `\theta<tab>`
θ = y + π
Out[14]:
4.141592653589793
In [15]:
# emoji! `\:kissing_cat:<tab>`
😽 = 5.0
Out[15]:
5.0
In [16]:
# `\alpha<tab>\hat<tab>`
α̂ = π
Out[16]:
π = 3.1415926535897...
In [17]:
# vector of Float64 0s
x = zeros(5)
Out[17]:
5-element Vector{Float64}:
 0.0
 0.0
 0.0
 0.0
 0.0
In [18]:
# vector of Int64 0s
x = zeros(Int, 5)
Out[18]:
5-element Vector{Int64}:
 0
 0
 0
 0
 0
In [19]:
# matrix of Float64 0s
x = zeros(5, 3)
Out[19]:
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
In [20]:
# matrix of Float64 1s
x = ones(5, 3)
Out[20]:
5×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
In [21]:
# define array without initialization
x = Matrix{Float64}(undef, 5, 3)
Out[21]:
5×3 Matrix{Float64}:
 2.2811e-314  2.2811e-314   2.28111e-314
 2.2811e-314  2.2816e-314   2.2816e-314
 2.2811e-314  2.28158e-314  2.2816e-314
 2.2811e-314  2.28158e-314  2.28161e-314
 2.2811e-314  2.28156e-314  2.28161e-314
In [22]:
# fill a matrix with 0s
fill!(x, 0)
Out[22]:
5×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
In [23]:
# initialize an array to be constant 2.5
fill(2.5, (5, 3))
Out[23]:
5×3 Matrix{Float64}:
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
 2.5  2.5  2.5
In [24]:
# rational number
a = 3//5
Out[24]:
3//5
In [25]:
typeof(a)
Out[25]:
Rational{Int64}
In [26]:
b = 3//7
Out[26]:
3//7
In [27]:
a + b
Out[27]:
36//35
In [28]:
# uniform [0, 1) random numbers
x = rand(5, 3)
Out[28]:
5×3 Matrix{Float64}:
 0.196633  0.25007    0.246755
 0.975534  0.831166   0.326326
 0.536742  0.416409   0.424956
 0.695618  0.156913   0.598685
 0.316543  0.0706881  0.145683
In [29]:
# uniform random numbers (in half precision)
x = rand(Float16, 5, 3)
Out[29]:
5×3 Matrix{Float16}:
 0.0947  0.6445   0.5186
 0.3857  0.3662   0.4414
 0.871   0.2822   0.9355
 0.21    0.06055  0.2793
 0.583   0.1318   0.659
In [30]:
# random numbers from {1,...,5}
x = rand(1:5, 5, 3)
Out[30]:
5×3 Matrix{Int64}:
 1  4  5
 3  4  1
 3  2  5
 3  5  3
 1  2  2
In [31]:
# standard normal random numbers
x = randn(5, 3)
Out[31]:
5×3 Matrix{Float64}:
  0.186185   0.957214  -1.57553
  0.432648  -0.630868   0.348913
 -1.47336   -0.707431   0.734245
  0.90118   -0.446944  -1.52815
  1.07925    1.47781   -0.571975
In [32]:
# range
1:10
Out[32]:
1:10
In [33]:
typeof(1:10)
Out[33]:
UnitRange{Int64}
In [34]:
1:2:10
Out[34]:
1:2:9
In [35]:
typeof(1:2:10)
Out[35]:
StepRange{Int64, Int64}
In [36]:
# integers 1-10
x = collect(1:10)
Out[36]:
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
In [37]:
# or equivalently
[1:10...]
Out[37]:
10-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
In [38]:
# Float64 numbers 1-10
x = collect(1.0:10)
Out[38]:
10-element Vector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
In [39]:
# convert to a specific type
convert(Vector{Float64}, 1:10)
Out[39]:
10-element Vector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0

Matrices and vectors

Dimensions

In [40]:
x = randn(5, 3)
Out[40]:
5×3 Matrix{Float64}:
 -0.214486    0.0806611   1.27571
 -0.424663   -1.54493     0.469783
 -0.514114   -0.287893   -0.233671
 -0.496689    1.57703    -0.0242696
  0.0551647   1.05996     0.416554
In [41]:
size(x)
Out[41]:
(5, 3)
In [42]:
size(x, 1) # nrow() in R
Out[42]:
5
In [43]:
size(x, 2) # ncol() in R
Out[43]:
3
In [44]:
# total number of elements
length(x)
Out[44]:
15

Indexing

In [45]:
# 5 × 5 matrix of random Normal(0, 1)
x = randn(5, 5)
Out[45]:
5×5 Matrix{Float64}:
 -0.768742   1.37251   -0.94028    1.85472    0.329705
 -0.193331  -1.51501   -0.37093    0.139382   1.1934
  2.57975   -1.73623   -0.117903  -1.88187   -0.206465
  0.75469    1.16957   -0.26015    0.984381   0.268466
  1.13697    0.514303   1.0669     0.661268  -1.21845
In [46]:
# first column
x[:, 1]
Out[46]:
5-element Vector{Float64}:
 -0.7687419026222582
 -0.19333107496656124
  2.579745994697399
  0.7546904213447839
  1.1369742220845698
In [47]:
# first row
x[1, :]
Out[47]:
5-element Vector{Float64}:
 -0.7687419026222582
  1.37251008513823
 -0.9402799661512303
  1.8547226377996413
  0.32970481743486413
In [48]:
# sub-array
x[1:2, 2:3]
Out[48]:
2×2 Matrix{Float64}:
  1.37251  -0.94028
 -1.51501  -0.37093
In [49]:
# getting a subset of a matrix creates a copy, but you can also create "views"
z = view(x, 1:2, 2:3)
Out[49]:
2×2 view(::Matrix{Float64}, 1:2, 2:3) with eltype Float64:
  1.37251  -0.94028
 -1.51501  -0.37093
In [50]:
# same as
@views z = x[1:2, 2:3]
Out[50]:
2×2 view(::Matrix{Float64}, 1:2, 2:3) with eltype Float64:
  1.37251  -0.94028
 -1.51501  -0.37093
In [51]:
# change in z (view) changes x as well
z[2, 2] = 0.0
x
Out[51]:
5×5 Matrix{Float64}:
 -0.768742   1.37251   -0.94028    1.85472    0.329705
 -0.193331  -1.51501    0.0        0.139382   1.1934
  2.57975   -1.73623   -0.117903  -1.88187   -0.206465
  0.75469    1.16957   -0.26015    0.984381   0.268466
  1.13697    0.514303   1.0669     0.661268  -1.21845
In [52]:
# y points to same data as x
y = x
Out[52]:
5×5 Matrix{Float64}:
 -0.768742   1.37251   -0.94028    1.85472    0.329705
 -0.193331  -1.51501    0.0        0.139382   1.1934
  2.57975   -1.73623   -0.117903  -1.88187   -0.206465
  0.75469    1.16957   -0.26015    0.984381   0.268466
  1.13697    0.514303   1.0669     0.661268  -1.21845
In [53]:
# x and y point to same data
pointer(x), pointer(y)
Out[53]:
(Ptr{Float64} @0x0000000111399250, Ptr{Float64} @0x0000000111399250)
In [54]:
# changing y also changes x
y[:, 1] .= 0
x
Out[54]:
5×5 Matrix{Float64}:
 0.0   1.37251   -0.94028    1.85472    0.329705
 0.0  -1.51501    0.0        0.139382   1.1934
 0.0  -1.73623   -0.117903  -1.88187   -0.206465
 0.0   1.16957   -0.26015    0.984381   0.268466
 0.0   0.514303   1.0669     0.661268  -1.21845
In [55]:
# create a new copy of data
z = copy(x)
Out[55]:
5×5 Matrix{Float64}:
 0.0   1.37251   -0.94028    1.85472    0.329705
 0.0  -1.51501    0.0        0.139382   1.1934
 0.0  -1.73623   -0.117903  -1.88187   -0.206465
 0.0   1.16957   -0.26015    0.984381   0.268466
 0.0   0.514303   1.0669     0.661268  -1.21845
In [56]:
pointer(x), pointer(z)
Out[56]:
(Ptr{Float64} @0x0000000111399250, Ptr{Float64} @0x0000000147c59490)

Concatenate matrices

In [57]:
# 3-by-1 vector
[1, 2, 3]
Out[57]:
3-element Vector{Int64}:
 1
 2
 3
In [58]:
# 1-by-3 array
[1 2 3]
Out[58]:
1×3 Matrix{Int64}:
 1  2  3
In [59]:
# multiple assignment by tuple
x, y, z = randn(5, 3), randn(5, 2), randn(3, 5)
Out[59]:
([0.5349148928465062 -0.3382398970389121 -0.9594186631338332; 0.6894282403870624 0.09051368736236846 -0.895222900999779; … ; 0.37397098515937177 0.9435997512811811 0.34228780757734595; -0.6541382520670976 0.7572595427640615 -1.7539779998392409], [-2.0718940597540834 -0.9689459023183032; -0.333270564303245 -0.06423628918631011; … ; 1.4056517162118447 -1.6076218083511646; -1.0011670788250695 -0.11034455615992884], [0.2720028723593593 0.23144778671418487 … -1.104283516836434 -1.5623643520783892; 0.5381660680155546 -0.8841946593053203 … 0.636524755987337 -0.9132392345243479; -0.12679368122253004 0.198703443836664 … 0.535602833441876 0.9342379236743564])
In [60]:
[x y] # 5-by-5 matrix
Out[60]:
5×5 Matrix{Float64}:
  0.534915  -0.33824    -0.959419  -2.07189   -0.968946
  0.689428   0.0905137  -0.895223  -0.333271  -0.0642363
  1.01488    1.23035     0.078198  -1.11647    1.14552
  0.373971   0.9436      0.342288   1.40565   -1.60762
 -0.654138   0.75726    -1.75398   -1.00117   -0.110345
In [61]:
[x y; z] # 8-by-5 matrix
Out[61]:
8×5 Matrix{Float64}:
  0.534915  -0.33824    -0.959419  -2.07189   -0.968946
  0.689428   0.0905137  -0.895223  -0.333271  -0.0642363
  1.01488    1.23035     0.078198  -1.11647    1.14552
  0.373971   0.9436      0.342288   1.40565   -1.60762
 -0.654138   0.75726    -1.75398   -1.00117   -0.110345
  0.272003   0.231448    1.10148   -1.10428   -1.56236
  0.538166  -0.884195    0.821874   0.636525  -0.913239
 -0.126794   0.198703   -0.984517   0.535603   0.934238

Dot operation

Dot operations in Julia are elementwise operations, similar to those in Matlab.

In [62]:
x = randn(5, 3)
Out[62]:
5×3 Matrix{Float64}:
  0.0494776  -1.13422    -0.292431
 -1.83966     0.0102203   1.29322
  0.370799   -0.17178     0.157387
 -0.972744   -0.815663    0.540777
  0.591524    0.701522    1.23933
In [63]:
y = ones(5, 3)
Out[63]:
5×3 Matrix{Float64}:
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
 1.0  1.0  1.0
In [64]:
x .* y # same as x * y in R
Out[64]:
5×3 Matrix{Float64}:
  0.0494776  -1.13422    -0.292431
 -1.83966     0.0102203   1.29322
  0.370799   -0.17178     0.157387
 -0.972744   -0.815663    0.540777
  0.591524    0.701522    1.23933
In [65]:
x .^ (-2)
Out[65]:
5×3 Matrix{Float64}:
 408.492        0.777335  11.6937
   0.295479  9573.49       0.597932
   7.27316     33.8889    40.3705
   1.05682      1.50307    3.41951
   2.85795      2.03197    0.651071
In [66]:
sin.(x)
Out[66]:
5×3 Matrix{Float64}:
  0.0494574  -0.906203   -0.288281
 -0.964075    0.0102202   0.961724
  0.36236    -0.170936    0.156738
 -0.826434   -0.72818     0.514802
  0.557627    0.645381    0.945565

Basic linear algebra

In [67]:
x = randn(5)
Out[67]:
5-element Vector{Float64}:
 -0.7900017339182867
 -0.0018383852521659263
 -0.9355297751284029
 -0.028246296211044238
  2.0265139209294984
In [68]:
using LinearAlgebra
# vector L2 norm
norm(x)
Out[68]:
2.367884837650836
In [69]:
# same as
sqrt(sum(abs2, x))
Out[69]:
2.367884837650836
In [70]:
y = randn(5) # another vector
# dot product
dot(x, y) # x' * y
Out[70]:
-1.3953153730071963
In [71]:
# same as
x'y
Out[71]:
-1.3953153730071963
In [72]:
x, y = randn(5, 3), randn(3, 2)
# matrix multiplication, same as %*% in R
x * y
Out[72]:
5×2 Matrix{Float64}:
 -0.53943    -0.821845
  0.462859   -1.31398
 -0.798802   -0.772382
  0.821015   -1.37641
  0.0837496   0.084106
In [73]:
x = randn(3, 3)
Out[73]:
3×3 Matrix{Float64}:
 -0.817818   -0.0158858   0.402896
  0.0483322   1.25873    -1.05049
  0.676956   -0.335275   -0.992148
In [74]:
# conjugate transpose
x'
Out[74]:
3×3 adjoint(::Matrix{Float64}) with eltype Float64:
 -0.817818    0.0483322   0.676956
 -0.0158858   1.25873    -0.335275
  0.402896   -1.05049    -0.992148
In [75]:
b = rand(3)
x'b # same as x' * b
Out[75]:
3-element Vector{Float64}:
  0.5749306585787634
  0.33461090642999347
 -1.4052467447041679
In [76]:
# trace
tr(x)
Out[76]:
-0.5512363398956722
In [77]:
det(x)
Out[77]:
0.9700626443415385
In [78]:
rank(x)
Out[78]:
3

Sparse matrices

In [79]:
using SparseArrays

# 10-by-10 sparse matrix with sparsity 0.1
X = sprandn(10, 10, .1)
Out[79]:
10×10 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
  ⋅     ⋅           ⋅          ⋅        …    ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅         -0.315539    ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅             ⋅         ⋅        ⋅   -0.138429
  ⋅     ⋅           ⋅          ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅        …    ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅             ⋅        1.08481   ⋅     ⋅ 
  ⋅   -0.0821398    ⋅          ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅           -0.414476   ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅        -0.828599       ⋅         ⋅        ⋅     ⋅ 
In [80]:
# dump() in Julia is like str() in R
dump(X)
SparseMatrixCSC{Float64, Int64}
  m: Int64 10
  n: Int64 10
  colptr: Array{Int64}((11,)) [1, 1, 2, 3, 4, 7, 8, 9, 10, 10, 11]
  rowval: Array{Int64}((10,)) [8, 2, 10, 1, 4, 10, 7, 9, 7, 3]
  nzval: Array{Float64}((10,)) [-0.08213976430923124, -0.31553932700421344, -0.8285987703142571, -0.5112830922873857, 0.30367185812513176, 0.008618303951553729, -1.0342664766145304, -0.41447629561975424, 1.084812624186494, -0.13842919122189926]
In [81]:
# convert to dense matrix; be cautious when dealing with big data
Xfull = convert(Matrix{Float64}, X)
Out[81]:
10×10 Matrix{Float64}:
 0.0   0.0         0.0        0.0       …   0.0       0.0      0.0   0.0
 0.0   0.0        -0.315539   0.0           0.0       0.0      0.0   0.0
 0.0   0.0         0.0        0.0           0.0       0.0      0.0  -0.138429
 0.0   0.0         0.0        0.0           0.0       0.0      0.0   0.0
 0.0   0.0         0.0        0.0           0.0       0.0      0.0   0.0
 0.0   0.0         0.0        0.0       …   0.0       0.0      0.0   0.0
 0.0   0.0         0.0        0.0           0.0       1.08481  0.0   0.0
 0.0  -0.0821398   0.0        0.0           0.0       0.0      0.0   0.0
 0.0   0.0         0.0        0.0          -0.414476  0.0      0.0   0.0
 0.0   0.0         0.0       -0.828599      0.0       0.0      0.0   0.0
In [82]:
# convert a dense matrix to sparse matrix
sparse(Xfull)
Out[82]:
10×10 SparseMatrixCSC{Float64, Int64} with 10 stored entries:
  ⋅     ⋅           ⋅          ⋅        …    ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅         -0.315539    ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅             ⋅         ⋅        ⋅   -0.138429
  ⋅     ⋅           ⋅          ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅        …    ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅             ⋅        1.08481   ⋅     ⋅ 
  ⋅   -0.0821398    ⋅          ⋅             ⋅         ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅          ⋅           -0.414476   ⋅        ⋅     ⋅ 
  ⋅     ⋅           ⋅        -0.828599       ⋅         ⋅        ⋅     ⋅ 
In [83]:
# syntax for sparse linear algebra is the same as for dense linear algebra
β = ones(10)
X * β
Out[83]:
10-element Vector{Float64}:
 -0.5112830922873857
 -0.31553932700421344
 -0.13842919122189926
  0.30367185812513176
  0.0
  0.0
  0.05054614757196352
 -0.08213976430923124
 -0.41447629561975424
 -0.8199804663627034
In [84]:
# many functions apply to sparse matrices as well
sum(X)
Out[84]:
-1.927630131108092

Control flow and loops

  • if-elseif-else-end
if condition1
    # do something
elseif condition2
    # do something
else
    # do something
end
  • for loop
for i in 1:10
    println(i)
end
  • Nested for loop:
for i in 1:10
    for j in 1:5
        println(i * j)
    end
end

Same as

for i in 1:10, j in 1:5
    println(i * j)
end
  • Exit loop:
for i in 1:10
    # do something
    if condition1
        break # exit the loop early
    end
end
  • Exit iteration:
for i in 1:10
    # do something
    if condition1
        continue # skip to next iteration
    end
    # do something
end
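A concrete, runnable sketch combining these constructs:

for i in 1:10
    if i == 7
        break        # stop the loop entirely once i reaches 7
    elseif iseven(i)
        continue     # skip even values of i
    end
    println(i)       # prints 1, 3, 5
end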

Functions

  • In Julia, all arguments to functions are passed by reference, in contrast to R and Matlab.

  • Function names ending with ! indicate that the function mutates at least one of its arguments, typically the first.

    sort!(x) # vs sort(x)
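
    A quick sketch of both points (set_first! is a made-up example following the ! convention):

    # mutate the caller's array in place: arguments are not copied
    function set_first!(v::Vector, val)
        v[1] = val
        return v
    end

    x = zeros(3)
    set_first!(x, 5.0)
    x # [5.0, 0.0, 0.0] -- the caller sees the change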
    
  • Function definition

function func(req1, req2; key1=dflt1, key2=dflt2)
    # do stuff
    return out1, out2, out3
end

    • Required arguments are separated with commas and use positional notation.
    • Keyword arguments follow the semicolon and need a default value in the signature.
    • The semicolon is not required when calling the function.
    • The return statement is optional.
    • Multiple outputs can be returned as a tuple, e.g., return out1, out2, out3.
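
A concrete sketch of this signature (the function and argument names are illustrative):

function scaled_extrema(x, digits; scale = 1.0, shift = 0.0)
    lo = round((minimum(x) + shift) * scale; digits = digits)
    hi = round((maximum(x) + shift) * scale; digits = digits)
    return lo, hi                        # multiple outputs returned as a tuple
end

lo, hi = scaled_extrema(randn(100), 2)               # keywords take defaults
lo, hi = scaled_extrema(randn(100), 2; scale = 10.0)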

  • Anonymous functions, e.g., x -> x^2, are commonly used in collection functions or comprehensions.

    map(x -> x^2, y) # square each element in y
    
  • Functions can be nested:

function outerfunction()
    # do some outer stuff
    function innerfunction()
        # do inner stuff
        # can access prior outer definitions
    end
    # do more outer stuff
end
  • Functions can be vectorized using the dot syntax:
In [85]:
# defined for scalar
function myfunc(x)
    return sin(x^2)
end

x = randn(5, 3)
myfunc.(x)
Out[85]:
5×3 Matrix{Float64}:
  0.980998   0.594273  -0.97791
 -0.865159   0.995027   0.144067
  0.0843822  0.597199   0.0654844
  0.356943   0.138678   0.108388
  0.659464   0.926429  -0.587781
  • Collection functions (think of these as the apply family of functions in R).

    Apply a function to each element of a collection:

map(f, coll) # or
map(coll) do elem
    # do stuff with elem
    # the last expression is the return value
end
In [86]:
map(x -> sin(x^2), x)
Out[86]:
5×3 Matrix{Float64}:
  0.980998   0.594273  -0.97791
 -0.865159   0.995027   0.144067
  0.0843822  0.597199   0.0654844
  0.356943   0.138678   0.108388
  0.659464   0.926429  -0.587781
In [87]:
map(x) do elem
    elem = elem^2
    return sin(elem)
end
Out[87]:
5×3 Matrix{Float64}:
  0.980998   0.594273  -0.97791
 -0.865159   0.995027   0.144067
  0.0843822  0.597199   0.0654844
  0.356943   0.138678   0.108388
  0.659464   0.926429  -0.587781
In [88]:
# Mapreduce
mapreduce(x -> sin(x^2), +, x)
Out[88]:
3.220483448019637
In [89]:
# same as
sum(x -> sin(x^2), x)
Out[89]:
3.220483448019637
  • List comprehension
In [90]:
[sin(2i + j) for i in 1:5, j in 1:3] # similar to Python
Out[90]:
5×3 Matrix{Float64}:
  0.14112   -0.756802  -0.958924
 -0.958924  -0.279415   0.656987
  0.656987   0.989358   0.412118
  0.412118  -0.544021  -0.99999
 -0.99999   -0.536573   0.420167

Type system

  • Every variable in Julia has a type.

  • When thinking about types, think about sets.

  • Everything is a subtype of the abstract type Any.

  • An abstract type defines a set of types.

    • For example, the numeric types in Julia form a tree rooted at the abstract type Number; a sketch that prints this tree follows the cells below.

  • We can explore type hierarchy with typeof(), supertype(), and subtypes().
In [91]:
typeof(1.0), typeof(1)
Out[91]:
(Float64, Int64)
In [92]:
supertype(Float64)
Out[92]:
AbstractFloat
In [93]:
subtypes(AbstractFloat)
Out[93]:
4-element Vector{Any}:
 BigFloat
 Float16
 Float32
 Float64
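A small sketch (using only the standard library) that walks the type tree with subtypes(), as promised above:

    using InteractiveUtils # provides subtypes(); loaded by default in the REPL

    # recursively print a type and all of its subtypes, indented by depth
    function print_type_tree(T, depth = 0)
        println("  "^depth, T)
        for S in subtypes(T)
            print_type_tree(S, depth + 1)
        end
    end

    print_type_tree(Number)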
In [94]:
# Is Float64 a subtype of AbstractFloat?
Float64 <: AbstractFloat
Out[94]:
true
In [95]:
# On a 64-bit machine, Int == Int64
Int == Int64
Out[95]:
true
In [96]:
# convert to Float64
convert(Float64, 1)
Out[96]:
1.0
In [97]:
# same as
Float64(1)
Out[97]:
1.0
In [98]:
# Float32 vector
x = randn(Float32, 5)
Out[98]:
5-element Vector{Float32}:
  0.41623175
  2.094286
 -1.7258859
  0.13745666
  0.0074228114
In [99]:
# convert to Float64
convert(Vector{Float64}, x)
Out[99]:
5-element Vector{Float64}:
  0.41623175144195557
  2.0942859649658203
 -1.7258858680725098
  0.13745665550231934
  0.007422811351716518
In [100]:
# same as
Float64.(x)
Out[100]:
5-element Vector{Float64}:
  0.41623175144195557
  2.0942859649658203
 -1.7258858680725098
  0.13745665550231934
  0.007422811351716518
In [101]:
# convert Float64 to Int64
convert(Int, 1.0)
Out[101]:
1
In [102]:
convert(Int, 1.5) # should use round(1.5)
InexactError: Int64(1.5)

Stacktrace:
 [1] Int64
   @ ./float.jl:723 [inlined]
 [2] convert(#unused#::Type{Int64}, x::Float64)
   @ Base ./number.jl:7
 [3] top-level scope
   @ In[102]:1
 [4] eval
   @ ./boot.jl:360 [inlined]
 [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094
In [103]:
round(Int, 1.5)
Out[103]:
2

Multiple dispatch

  • Multiple dispatch lies in the core of Julia design. It allows built-in and user-defined functions to be overloaded for different combinations of argument types.

  • Let's consider a simple "doubling" function:

In [104]:
g(x) = x + x
Out[104]:
g (generic function with 1 method)
In [105]:
g(1.5)
Out[105]:
3.0

This definition is too broad, since some things, e.g., strings, cannot be added.

In [106]:
g("hello world")
MethodError: no method matching +(::String, ::String)
Closest candidates are:
  +(::Any, ::Any, ::Any, ::Any...) at operators.jl:560
  +(::ChainRulesCore.Composite{P, T} where T, ::P) where P at /Users/huazhou/.julia/packages/ChainRulesCore/JWrYo/src/differential_arithmetic.jl:132
  +(::ChainRulesCore.AbstractThunk, ::Any) at /Users/huazhou/.julia/packages/ChainRulesCore/JWrYo/src/differential_arithmetic.jl:108
  ...

Stacktrace:
 [1] g(x::String)
   @ Main ./In[104]:1
 [2] top-level scope
   @ In[106]:1
 [3] eval
   @ ./boot.jl:360 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094
  • This definition is correct but too restrictive, since any Number can be added.
In [107]:
g(x::Float64) = x + x
Out[107]:
g (generic function with 2 methods)
  • This definition will automatically work on the entire type tree above!
In [108]:
g(x::Number) = x + x
Out[108]:
g (generic function with 3 methods)

This is a lot nicer than

function g(x)
    if isa(x, Number)
        return x + x
    else
        throw(ArgumentError("x should be a number"))
    end
end
  • The methods(func) function displays all methods defined for func.
In [109]:
methods(g)
Out[109]:
# 3 methods for generic function g:
  • g(x::Float64) in Main at In[107]:1
  • g(x::Number) in Main at In[108]:1
  • g(x) in Main at In[104]:1
  • When calling a function with multiple definitions, Julia dispatches to the method with the narrowest (most specific) signature that matches the arguments.

  • The @which func(x) macro tells which method is being used for the given argument signature.

In [110]:
# an Int64 input
@which g(1)
Out[110]:
g(x::Number) in Main at In[108]:1
In [111]:
# a Vector{Float64} input
@which g(randn(5))
Out[111]:
g(x) in Main at In[104]:1

Just-in-time compilation (JIT)

The following figures are taken from Arch D. Robinson's slides Introduction to Writing High Performance Julia.

[Figure: the Julia compilation toolchain]
  • Julia's efficiency results from its capability to infer the types of all variables within a function and then call LLVM to generate optimized machine code at run-time.

Consider the g (doubling) function defined earlier. This function will work on any type which has a method for +.

In [112]:
g(2), g(2.0)
Out[112]:
(4, 4.0)

Step 1: Parse Julia code into an abstract syntax tree (AST) and lower it.

In [113]:
@code_lowered g(2)
Out[113]:
CodeInfo(
1 ─ %1 = x + x
└──      return %1
)

Step 2: Type inference according to input type.

In [114]:
@code_warntype g(2)
Variables
  #self#::Core.Const(g)
  x::Int64

Body::Int64
1 ─ %1 = (x + x)::Int64
└──      return %1
In [115]:
@code_warntype g(2.0)
Variables
  #self#::Core.Const(g)
  x::Float64

Body::Float64
1 ─ %1 = (x + x)::Float64
└──      return %1

Step 3: Compile into LLVM bitcode (the counterpart of the R bytecode generated by the compiler package).

In [116]:
@code_llvm g(2)
;  @ In[108]:1 within `g'
define i64 @julia_g_5795(i64 signext %0) {
top:
; ┌ @ int.jl:87 within `+'
   %1 = shl i64 %0, 1
; └
  ret i64 %1
}
In [117]:
@code_llvm g(2.0)
;  @ In[107]:1 within `g'
define double @julia_g_5818(double %0) {
top:
; ┌ @ float.jl:326 within `+'
   %1 = fadd double %0, %0
; └
  ret double %1
}

We didn't provide a type annotation. But different LLVM code was generated depending on the argument type!

In R or Python, g(2) and g(2.0) would run the same code.

In Julia, g(2) and g(2.0) dispatch to optimized code for Int64 and Float64, respectively.

For an integer input x, the LLVM compiler is smart enough to know that x + x is simply a left shift of x by 1 bit, which is faster than addition.

Step 4: The lowest level is the assembly code, which is machine dependent.
In [118]:
@code_native g(2)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ In[108]:1 within `g'
; │┌ @ int.jl:87 within `+'
	leaq	(%rdi,%rdi), %rax
; │└
	retq
	nopw	%cs:(%rax,%rax)
; └
In [119]:
@code_native g(2.0)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ In[107]:1 within `g'
; │┌ @ float.jl:326 within `+'
	vaddsd	%xmm0, %xmm0, %xmm0
; │└
	retq
	nopw	%cs:(%rax,%rax)
; └

Profiling Julia code

Julia has several built-in tools for profiling. The @time macro outputs run time and heap allocations.

In [120]:
# a simple function that sums the elements of an array
function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

using Random
Random.seed!(123)
a = rand(20_000_000)
@time tally(a) # first run: includes compile time
  0.037080 seconds (4.36 k allocations: 228.751 KiB, 19.71% compilation time)
Out[120]:
1.0000233387279043e7
In [121]:
@time tally(a)
  0.027201 seconds (1 allocation: 16 bytes)
Out[121]:
1.0000233387279043e7

For more robust benchmarking, the BenchmarkTools.jl package is highly recommended.

In [122]:
using BenchmarkTools

@benchmark tally($a)
Out[122]:
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     24.846 ms (0.00% GC)
  median time:      25.112 ms (0.00% GC)
  mean time:        25.250 ms (0.00% GC)
  maximum time:     31.004 ms (0.00% GC)
  --------------
  samples:          198
  evals/sample:     1

The Profile module gives line-by-line profile results.

In [123]:
using Profile

Profile.clear()
@profile tally(a)
Profile.print(format=:flat)
 Count  Overhead File                    Line Function
 =====  ======== ====                    ==== ========
    22         0 In[120]                    5 tally(x::Vector{Float64})
    22         0 @Base/array.jl           777 iterate
    23         1 @Base/boot.jl            360 eval
    23         0 @Base/essentials.jl      708 #invokelatest#2
    23         0 @Base/essentials.jl      706 invokelatest
    22        22 @Base/int.jl             441 <
    22         0 @Base/int.jl             448 <
    23         0 @Base/loading.jl        1094 include_string(mapexpr::typeof(...
    23         0 @Base/task.jl            406 (::IJulia.var"#15#18")()
    23         0 ...ia/src/eventloop.jl     8 eventloop(socket::ZMQ.Socket)
    23         0 .../execute_request.jl    67 execute_request(socket::ZMQ.Soc...
    23         0 .../SoftGlobalScope.jl    65 softscope_include_string(m::Mod...
Total snapshots: 96

One can use the ProfileView.jl package for better visualization of profile data:

using ProfileView

ProfileView.view()
In [124]:
# check type stability
@code_warntype tally(a)
Variables
  #self#::Core.Const(tally)
  x::Vector{Float64}
  @_3::Union{Nothing, Tuple{Float64, Int64}}
  s::Float64
  v::Float64

Body::Float64
1 ─ %1  = Main.eltype(x)::Core.Const(Float64)
       (s = Main.zero(%1))
 %3  = x::Vector{Float64}
       (@_3 = Base.iterate(%3))
 %5  = (@_3 === nothing)::Bool
 %6  = Base.not_int(%5)::Bool
└──       goto #4 if not %6
2 ┄ %8  = @_3::Tuple{Float64, Int64}::Tuple{Float64, Int64}
       (v = Core.getfield(%8, 1))
 %10 = Core.getfield(%8, 2)::Int64
       (s = s + v)
       (@_3 = Base.iterate(%3, %10))
 %13 = (@_3 === nothing)::Bool
 %14 = Base.not_int(%13)::Bool
└──       goto #4 if not %14
3 ─       goto #2
4 ┄       return s
In [125]:
# check LLVM bitcode
@code_llvm tally(a)
;  @ In[120]:2 within `tally'
define double @julia_tally_6446({}* nonnull align 16 dereferenceable(40) %0) {
top:
;  @ In[120]:4 within `tally'
; ┌ @ array.jl:777 within `iterate' @ array.jl:777
; │┌ @ array.jl:197 within `length'
    %1 = bitcast {}* %0 to { i8*, i64, i16, i16, i32 }*
    %2 = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %1, i64 0, i32 1
    %3 = load i64, i64* %2, align 8
; │└
; │┌ @ int.jl:448 within `<' @ int.jl:441
    %.not = icmp eq i64 %3, 0
; │└
   br i1 %.not, label %L43, label %L18

L18:                                              ; preds = %top
; │┌ @ array.jl:801 within `getindex'
    %4 = bitcast {}* %0 to double**
    %5 = load double*, double** %4, align 8
    %6 = load double, double* %5, align 8
; └└
;  @ In[120]:5 within `tally'
; ┌ @ float.jl:326 within `+'
   %7 = fadd double %6, 0.000000e+00
; └
; ┌ @ array.jl:777 within `iterate'
; │┌ @ int.jl:448 within `<' @ int.jl:441
    %.not1319.not = icmp eq i64 %3, 1
; │└
   br i1 %.not1319.not, label %L43, label %L37.lr.ph

L37.lr.ph:                                        ; preds = %L18
   %8 = icmp ugt i64 %3, 2
   %umax = select i1 %8, i64 %3, i64 2
   br label %L37

L37:                                              ; preds = %L37, %L37.lr.ph
   %9 = phi i64 [ 1, %L37.lr.ph ], [ %value_phi420, %L37 ]
   %10 = phi double [ %7, %L37.lr.ph ], [ %14, %L37 ]
   %value_phi420 = phi i64 [ 2, %L37.lr.ph ], [ %13, %L37 ]
; │┌ @ array.jl:801 within `getindex'
    %11 = getelementptr inbounds double, double* %5, i64 %9
    %12 = load double, double* %11, align 8
; │└
; │┌ @ int.jl:87 within `+'
    %13 = add nuw nsw i64 %value_phi420, 1
; └└
; ┌ @ float.jl:326 within `+'
   %14 = fadd double %12, %10
; └
; ┌ @ array.jl:777 within `iterate'
; │┌ @ int.jl:448 within `<' @ int.jl:441
    %exitcond.not = icmp eq i64 %value_phi420, %umax
; │└
   br i1 %exitcond.not, label %L43, label %L37

L43:                                              ; preds = %L37, %L18, %top
   %value_phi9 = phi double [ 0.000000e+00, %top ], [ %7, %L18 ], [ %14, %L37 ]
; └
;  @ In[120]:7 within `tally'
  ret double %value_phi9
}
In [126]:
@code_native tally(a)
	.section	__TEXT,__text,regular,pure_instructions
; ┌ @ In[120]:4 within `tally'
; │┌ @ array.jl:777 within `iterate' @ array.jl:777
; ││┌ @ array.jl:197 within `length'
	movq	8(%rdi), %rdx
; ││└
; ││┌ @ int.jl:448 within `<' @ int.jl:441
	testq	%rdx, %rdx
; ││└
	je	L62
; ││┌ @ array.jl:801 within `getindex'
	movq	(%rdi), %rax
	vxorpd	%xmm0, %xmm0, %xmm0
; │└└
; │ @ In[120]:5 within `tally'
; │┌ @ float.jl:326 within `+'
	vaddsd	(%rax), %xmm0, %xmm0
; │└
; │┌ @ array.jl:777 within `iterate'
; ││┌ @ int.jl:448 within `<' @ int.jl:441
	cmpq	$1, %rdx
; ││└
	je	L61
	cmpq	$2, %rdx
	movl	$2, %ecx
	cmovaq	%rdx, %rcx
	movl	$1, %edx
	nopl	(%rax)
; │└
; │┌ @ float.jl:326 within `+'
L48:
	vaddsd	(%rax,%rdx,8), %xmm0, %xmm0
; │└
; │┌ @ array.jl:777 within `iterate'
; ││┌ @ int.jl:448 within `<' @ int.jl:441
	incq	%rdx
	cmpq	%rdx, %rcx
; ││└
	jne	L48
; │└
; │ @ In[120]:7 within `tally'
L61:
	retq
; │ @ In[120] within `tally'
L62:
	vxorps	%xmm0, %xmm0, %xmm0
; │ @ In[120]:7 within `tally'
	retq
	nopw	%cs:(%rax,%rax)
; └

Exercise: Annotate the loop in the tally function with @simd and look for differences in the LLVM bitcode and machine code.
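
A sketch to get started (tally_simd is a made-up name; @simd loops conventionally iterate over indices, with bounds checks removed by @inbounds):

function tally_simd(x::Array)
    s = zero(eltype(x))
    @simd for i in eachindex(x)
        @inbounds s += x[i] # permits reordering the floating-point sums
    end
    s
end

@code_llvm tally_simd(a) # compare against @code_llvm tally(a)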

Memory profiling

Detailed memory profiling requires a detour. First let's write a script bar.jl, which contains the workload function tally and a wrapper for profiling.

In [127]:
;cat bar.jl
using Profile

function tally(x::Array)
    s = zero(eltype(x))
    for v in x
        s += v
    end
    s
end

# call workload from wrapper to avoid misattribution bug
function wrapper()
    y = rand(10000)
    # force compilation
    println(tally(y))
    # clear allocation counters
    Profile.clear_malloc_data()
    # run compiled workload
    println(tally(y))
end

wrapper()

Next, in the terminal, we run the script with the --track-allocation=user option.

In [128]:
#;julia --track-allocation=user bar.jl

The profiler outputs a file bar.jl.51116.mem.

In [129]:
;cat bar.jl.51116.mem
        - using Profile
        - 
        - function tally(x::Array)
        -     s = zero(eltype(x))
        -     for v in x
        -         s += v
        -     end
        -     s
        - end
        - 
        - # call workload from wrapper to avoid misattribution bug
        - function wrapper()
        0     y = rand(10000)
        -     # force compilation
        0     println(tally(y))
        -     # clear allocation counters
        0     Profile.clear_malloc_data()
        -     # run compiled workload
      528     println(tally(y))
        - end
        - 
        - wrapper()
        -