An overview of statistical learning
Statistical learning refers to a vast set of tools for understanding
data.
These tools fall into two categories: supervised and unsupervised.
Supervised
: Build models based on known input and output data, then use the model
for prediction or estimation.
Unsupervised
: There are inputs but no supervising outputs. We can still learn
relationships and structure from such data.
Notation and simple algebra
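Let $X$ denote an $n \times p$ matrix, where $x_{ij}$ is the value in row $i$ and column $j$.
$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1p} \\
x_{21} & x_{22} & \cdots & x_{2p} \\
\vdots & \vdots & \ddots & \vdots \\
x_{n1} & x_{n2} & \cdots & x_{np}
\end{pmatrix}
$$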
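We write the rows of $X$ as $x_1, x_2, \ldots, x_n$, where each $x_i$ is a vector of length $p$:
$$
x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}
$$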
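Vectors are represented as columns by default. We use $X_1, X_2, \ldots, X_p$ to denote the columns of $X$, where each $X_j$ is a vector of length $n$:
$$
X_j = \begin{pmatrix} x_{1j} \\ x_{2j} \\ \vdots \\ x_{nj} \end{pmatrix}
$$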
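Using this notation, the matrix $X$ can be written as:
$$
X = \begin{pmatrix} X_1 & X_2 & \cdots & X_p \end{pmatrix}
$$
or
$$
X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}
$$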
The superscript $T$ denotes the transpose of a matrix or vector.
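
The row, column, and transpose notation can be illustrated with a minimal sketch in plain Python. The example matrix and the helper names `row`, `column`, and `transpose` are assumptions made for illustration, and the helpers use 1-based indices to match the notation in the text.

```python
# A hypothetical 3 x 2 matrix (n = 3 observations, p = 2 variables),
# stored as a list of rows.
X = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

def row(X, i):
    """Return x_i, the i-th row of X (1-indexed, as in the text)."""
    return list(X[i - 1])

def column(X, j):
    """Return X_j, the j-th column of X (1-indexed, as in the text)."""
    return [r[j - 1] for r in X]

def transpose(X):
    """Return X^T: the rows of X become the columns of the result."""
    return [list(col) for col in zip(*X)]

x_2 = row(X, 2)      # second row, a vector of length p
X_1 = column(X, 1)   # first column, a vector of length n
X_T = transpose(X)   # a 2 x 3 matrix
```

Transposing twice returns the original matrix, so `transpose(X_T) == X` holds for this example.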
We use $y_i$ to denote the $i$th observation of the variable on which we wish to make predictions. Hence we write the set of all $n$ observations in vector form as
$$
y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}
$$
Our observed data then consists of $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i$ is a vector of length $p$.
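
As a rough sketch of how the observed data pairs each input vector with its response, the snippet below uses made-up values; the numbers and the variable name `data` are assumptions for illustration only.

```python
# Hypothetical data: n = 3 observations, each with p = 2 predictors.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # rows are x_1, x_2, x_3
y = [10.0, 20.0, 30.0]                     # responses y_1, y_2, y_3

n = len(X)       # number of observations
p = len(X[0])    # length of each x_i

# Every observation needs exactly one response.
assert len(y) == n

# The observed data {(x_1, y_1), ..., (x_n, y_n)} as a list of pairs.
data = list(zip(X, y))
```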
Occasionally we will want to indicate the dimension of a particular object.
To indicate that an object is a scalar, we write $a \in \mathbb{R}$.
To indicate that it is a vector of length $k$, we write $a \in \mathbb{R}^k$.
To indicate that an object is an $r \times s$ matrix, we write $A \in \mathbb{R}^{r \times s}$.
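The product of matrix $A$ and matrix $B$ is denoted $AB$. It is defined only when the number of columns of $A$ equals the number of rows of $B$, and its $(i, j)$th entry is $(AB)_{ij} = \sum_{k} a_{ik} b_{kj}$. For example, let
$$
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}
\quad \text{and} \quad
B = \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
$$
Then
$$
AB = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \begin{pmatrix} 5 & 6 \\ 7 & 8 \end{pmatrix}
= \begin{pmatrix} 1 \times 5 + 2 \times 7 & 1 \times 6 + 2 \times 8 \\ 3 \times 5 + 4 \times 7 & 3 \times 6 + 4 \times 8 \end{pmatrix}
= \begin{pmatrix} 19 & 22 \\ 43 & 50 \end{pmatrix}
$$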
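
The product of the two 2 × 2 matrices $A$ and $B$ above can be checked with a minimal sketch in plain Python. The function name `matmul` is an assumption for illustration, and the code is a direct row-by-column implementation rather than an efficient one.

```python
def matmul(A, B):
    """Multiply matrices stored as lists of rows.

    Entry (i, j) of the result is the sum over k of A[i][k] * B[k][j].
    """
    inner = len(B)  # number of rows of B
    if any(len(r) != inner for r in A):
        raise ValueError("columns of A must equal rows of B")
    n_cols = len(B[0])
    return [
        [sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(n_cols)]
        for i in range(len(A))
    ]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
AB = matmul(A, B)
print(AB)  # [[19, 22], [43, 50]]
```

Matrix multiplication is not commutative in general: `matmul(B, A)` gives `[[23, 34], [31, 46]]`, which differs from `AB`.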
get the R package