An overview of statistical learning
Statistical learning refers to a vast set of tools for understanding data.
These tools fall into two broad categories: supervised and unsupervised.
Supervised
: Build a model from data with known inputs and known outputs, then use the model
to predict or estimate the output for new inputs.
Unsupervised
: There are inputs but no supervising output. We can still learn
relationships and structure from such data.
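To make the distinction concrete, here is a minimal R sketch. The data and methods are our own choice, not the source's: it uses R's built-in mtcars and iris data sets, fits a linear regression with lm() as the supervised example, and runs k-means clustering with kmeans() as the unsupervised one.

```r
# Supervised: we observe both an input (car weight, wt, in 1000 lbs) and an
# output (fuel efficiency, mpg), fit a model, then predict mpg for a new input.
fit <- lm(mpg ~ wt, data = mtcars)
new_car <- data.frame(wt = 3)          # hypothetical new car weighing 3000 lbs
predicted_mpg <- predict(fit, newdata = new_car)

# Unsupervised: we keep only the inputs (the four flower measurements in iris,
# with the Species label dropped) and look for structure, here 3 clusters.
set.seed(1)                            # k-means uses random starting centers
inputs <- iris[, 1:4]
clusters <- kmeans(inputs, centers = 3, nstart = 20)

predicted_mpg                          # a single predicted output
table(clusters$cluster)                # how many flowers fell in each cluster
```

The regression can be judged against observed mpg values, while the clustering step never sees a label to check against, which is what makes it unsupervised.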
Notation and simple algebra
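Let $X$ denote an $n \times p$ matrix, where $n$ is the number of observations and $p$ is the number of variables. $X_{ij}$ represents the value in row $i$ and column $j$, that is, the value of the $j$th variable for the $i$th observation.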
We write the rows of $X$ as $x_1, x_2, \ldots, x_n$; each $x_i$ is a vector of length $p$ holding the $p$ variable values for the $i$th observation.
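Vectors are represented as columns by default. We use $X_1, X_2, \ldots, X_p$ to denote the columns of $X$; each $X_j$ is a vector of length $n$ holding the values of the $j$th variable for all $n$ observations.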
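In R this notation maps directly onto matrix indexing. The sketch below assumes a small made-up $3 \times 2$ matrix (so $n = 3$, $p = 2$); the variable names are illustrative only. Note that R vectors carry no row or column orientation, so both a row and a column come back as plain vectors.

```r
# A made-up n x p data matrix: n = 3 observations (rows), p = 2 variables (columns).
X <- matrix(c(5.1, 4.9, 4.7,
              3.5, 3.0, 3.2),
            nrow = 3, ncol = 2)   # matrix() fills column by column

n <- nrow(X)   # number of observations
p <- ncol(X)   # number of variables

X_21 <- X[2, 1]   # element X_{21}: variable 1 for observation 2
x_2  <- X[2, ]    # row x_2: all p variable values for observation 2
X_1  <- X[, 1]    # column X_1: variable 1 across all n observations
```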
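Using this notation, the matrix $X$ can be written as

$$X = \begin{pmatrix} X_1 & X_2 & \cdots & X_p \end{pmatrix}$$

or

$$X = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}.$$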
The superscript $T$ denotes the transpose of a matrix or vector. Since $x_i$ is a column vector, $x_i^T$ holds the same values laid out as a row.
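Both ways of writing $X$ can be checked in R. This is a minimal sketch on a made-up $3 \times 2$ matrix: cbind() places the columns $X_j$ side by side, rbind() stacks the transposed rows $x_i^T$, and t() computes the transpose.

```r
# Made-up 3 x 2 matrix used only for illustration.
X <- matrix(c(5.1, 4.9, 4.7,
              3.5, 3.0, 3.2), nrow = 3, ncol = 2)

# Column form: X = (X_1 X_2), the p column vectors bound side by side.
by_columns <- cbind(X[, 1], X[, 2])

# Row form: stack x_1^T, x_2^T, x_3^T on top of each other.
# t() applied to a plain vector returns a 1 x length(x) row matrix.
by_rows <- rbind(t(X[1, ]), t(X[2, ]), t(X[3, ]))

# Transposing an n x p matrix gives a p x n matrix.
Xt <- t(X)
```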
We use $y_i$ to denote the $i$th observation of the variable on which we wish to make predictions. Hence we write the set of all $n$ observations in vector form as

$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}.$$
Our observed data then consist of $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i$ is a vector of length $p$.
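A minimal R sketch of the observed data, assuming a made-up $3 \times 2$ input matrix and a made-up response vector y. The list obs_pairs mirrors the set of $(x_i, y_i)$ pairs literally; the data frame is the layout R's modelling functions usually expect.

```r
# Made-up inputs (n = 3, p = 2) and responses, for illustration only.
X <- matrix(c(5.1, 4.9, 4.7,
              3.5, 3.0, 3.2), nrow = 3, ncol = 2)
y <- c(1.4, 1.4, 1.3)   # y_i: the value we want to predict for observation i

# The observed data {(x_1, y_1), ..., (x_n, y_n)}: one (x_i, y_i) pair per observation.
obs_pairs <- lapply(seq_len(nrow(X)), function(i) list(x = X[i, ], y = y[i]))

# The same data as a data frame: one row per observation, inputs plus response.
training <- data.frame(X, y = y)
```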
Occasionally we will want to indicate the dimension of a particular object.
To indicate that an object $a$ is a scalar, we write $a \in \mathbb{R}$.
To indicate that it is a vector of length $k$, we write $a \in \mathbb{R}^k$.
To indicate that an object $A$ is an $r \times s$ matrix, we write $A \in \mathbb{R}^{r \times s}$.
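R has no separate scalar type, so the closest analogue of these statements is inspecting length() and dim(). A minimal sketch with made-up values:

```r
a_scalar <- 2.5                      # a in R: stored by R as a length-1 vector
a_vector <- c(1, 0, -3, 2)           # a in R^k with k = 4
A <- matrix(0, nrow = 2, ncol = 3)   # A in R^{r x s} with r = 2, s = 3

length(a_scalar)  # 1
length(a_vector)  # 4
dim(A)            # 2 3
```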
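Suppose that $A \in \mathbb{R}^{r \times d}$ and $B \in \mathbb{R}^{d \times s}$. The product of $A$ and $B$ is denoted $AB$; it is an $r \times s$ matrix whose $(i,j)$th element is obtained by multiplying each element of the $i$th row of $A$ by the corresponding element of the $j$th column of $B$ and summing:

$$(AB)_{ij} = \sum_{k=1}^{d} A_{ik} B_{kj}.$$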
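In R the matrix product is the %*% operator. As a check on the element-wise definition $(AB)_{ij} = \sum_k A_{ik} B_{kj}$, the sketch below also computes the product with explicit loops; manual_product() is a hypothetical helper written for illustration, not something you would use in practice.

```r
A <- matrix(1:6, nrow = 2, ncol = 3)    # A in R^{2 x 3}
B <- matrix(1:12, nrow = 3, ncol = 4)   # B in R^{3 x 4}

AB <- A %*% B                           # built-in matrix product: a 2 x 4 matrix

# Hypothetical helper: the product built from (AB)_{ij} = sum_k A_{ik} B_{kj}.
manual_product <- function(A, B) {
  stopifnot(ncol(A) == nrow(B))         # inner dimensions d must agree
  C <- matrix(0, nrow = nrow(A), ncol = ncol(B))
  for (i in seq_len(nrow(A))) {
    for (j in seq_len(ncol(B))) {
      C[i, j] <- sum(A[i, ] * B[, j])   # i-th row of A times j-th column of B, summed
    }
  }
  C
}
```

Here $A_{11}, A_{12}, A_{13} = 1, 3, 5$ and the first column of $B$ is $1, 2, 3$, so $(AB)_{11} = 1 + 6 + 15 = 22$.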
To install the ISLR package in R, run:
install.packages("ISLR")