Introduction to Statistical Learning with applications in R

Chapter6: Linear Model Selection and Regularization

Posted on March 6, 2019 | 30 minutes | 6342 words | S Wang

In the regression setting, the standard linear model $$ \tag{6.1}Y = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p + \epsilon $$ is commonly used to describe the relationship between a response $Y$ and a set of variables $X_1, X_2, \ldots, X_p$. We have seen in Chapter 3 that one typically fits this model using least squares. In the chapters that follow, we consider some approaches for extending the linear model framework. [Read More]

Chapter5: Resampling Methods

Posted on March 1, 2019 | 10 minutes | 2011 words | S Wang

Resampling methods are an indispensable tool in modern statistics. They involve repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. For example, in order to estimate the variability of a linear regression fit, we can repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. [Read More]
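The resampling idea in this excerpt can be sketched in a few lines. This is a minimal illustration, not the chapter's own code: it assumes the samples are drawn with replacement (a bootstrap), uses made-up synthetic data, and the helper names `least_squares_slope` and `bootstrap_slopes` are hypothetical.

```python
import random
from statistics import mean, stdev


def least_squares_slope(x, y):
    """Slope of the least squares line through the points (x[i], y[i])."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def bootstrap_slopes(x, y, n_resamples=1000, seed=0):
    """Refit the slope on samples drawn with replacement from (x, y)."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_resamples):
        # Assumption: each new sample has the same size as the training set.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # slope is undefined when every x in the sample is equal
        slopes.append(least_squares_slope(xs, ys))
    return slopes


# Synthetic training data (an assumption): y = 1 + 2x plus Gaussian noise.
data_rng = random.Random(42)
x = [data_rng.uniform(0, 10) for _ in range(50)]
y = [1.0 + 2.0 * xi + data_rng.gauss(0, 1) for xi in x]

slopes = bootstrap_slopes(x, y)
spread = stdev(slopes)  # how much the refitted slopes differ across samples
```

The standard deviation of the refitted slopes, `spread`, is one way to summarize "the extent to which the resulting fits differ".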

Chapter4: Classification

Posted on February 18, 2019 | 17 minutes | 3522 words | S Wang

In this chapter, we study approaches for predicting qualitative responses, a process that is known as classification. Predicting a qualitative response for an observation can be referred to as classifying that observation, since it involves assigning the observation to a category, or class. On the other hand, the methods used for classification often first predict the probability of each of the categories of a qualitative variable, as the basis for making the classification. [Read More]
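As a small illustration of using predicted probabilities as the basis for classification, the sketch below assigns an observation to its most probable class. The function name `classify`, the class labels, and the probabilities are all made up for the example; choosing the highest-probability class is one common rule, assumed here rather than taken from the chapter.

```python
def classify(probabilities):
    """Return the class label with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical predicted probabilities for a single observation.
predicted = {"default": 0.12, "no default": 0.88}
label = classify(predicted)
```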

Chapter3: Linear Regression

Posted on February 7, 2019 | 30 minutes | 6271 words | S Wang

Linear Regression: Linear regression is a very simple supervised learning method, though still a very useful one. Simple Linear Regression: Simple linear regression is a straightforward approach for predicting a quantitative response $Y$ on the basis of a single predictor variable $X$. It assumes that there is approximately a linear relationship between $X$ and $Y$. $$ Y \approx \beta_{0} + \beta_{1}X $$ In this equation, $\beta_0$ and $\beta_1$ are two unknown constants that represent the intercept and slope terms in the linear model. [Read More]
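A minimal sketch of estimating the intercept and slope in $Y \approx \beta_0 + \beta_1 X$ follows. It assumes the coefficients are fit by least squares, uses toy data invented for the example, and the function name `fit_simple_linear_regression` is hypothetical.

```python
from statistics import mean


def fit_simple_linear_regression(x, y):
    """Return least squares estimates (beta0_hat, beta1_hat)."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sxy / sxx
    # Intercept: the fitted line passes through the point (x_bar, y_bar).
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# Toy data (an assumption) lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
beta0_hat, beta1_hat = fit_simple_linear_regression(x, y)
```

Because the toy points lie exactly on a line, the estimates recover its intercept and slope; with noisy data they would only approximate the unknown constants.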

Chapter2: Basic concepts of Statistical Learning

Posted on February 1, 2019 | 14 minutes | 2923 words | S Wang

Statistical Learning: What is statistical learning? Suppose we observe a quantitative response $Y$ and $p$ different predictors, $X_1, X_2, \ldots, X_p$. We assume that there is a relationship between $Y$ and $X=(X_1, X_2,\ldots,X_p)$, which can be written as $$ Y=f(X)+\epsilon $$ Here $f$ is some fixed but unknown function of $X_1,X_2,\ldots,X_p$, and $\epsilon$ is a random error term, which is independent of $X$ and has mean zero. In this formulation, $f$ represents the systematic information that $X$ provides about $Y$. [Read More]

islr statistics learning R

Introduction

Posted on January 30, 2019 | 2 minutes | 403 words | S Wang

An overview of statistical learning: Statistical learning refers to a vast set of tools for understanding data. These tools fall into two categories: supervised and unsupervised. Supervised: build models based on known input and output data, then use the model for prediction or estimation. Unsupervised: there are inputs but no supervising outputs; we can learn relationships and structure from such data. Notation and simple algebra: Let $X$ denote a matrix. $X_{ij}$ represents the entry in row $i$ and column $j$. [Read More]