
Regression


Regression? 

In statistics, regression analysis is a method that builds a model of the relationship between observed continuous variables and then measures how well the model fits. Regression analysis is used for statistical prediction, such as modeling data that changes over time, effects, hypothetical experiments, and causal relationships. (Wikipedia)

 

[Regression model]

Regression means finding a relationship between independent variables Xi and a dependent variable Y through a model f.

For this relationship, we want to see how strong the relationship is between each independent variable Xi and Y.

We also want to evaluate how accurately the fitted relationship predicts Y.

 

[X and Y]

Y is known as the response, target, or outcome that we wish to predict, whereas X is called the features, inputs, or predictors.

If there is only one input X, it is called simple regression; if there is more than one input, it is called multiple regression.

If it is multiple regression, we can collect the input variables into a single input vector (feature vector), X = (X1, X2, ..., Xn).

 

[Y]

The model can be represented as Y = f(X1, X2, ..., Xn) + errors = f(X) + errors.

(The error term captures measurement errors and other discrepancies.)
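As a tiny sketch of this data-generating view (assuming NumPy; the particular f and noise level below are illustrative, not from the post):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x1, x2):
    # Hypothetical true relationship between the inputs and Y
    return 4 + 1.5 * x1 - 0.5 * x2

x1 = rng.uniform(0, 10, 50)
x2 = rng.uniform(0, 10, 50)
errors = rng.normal(0, 1.0, 50)  # measurement errors / discrepancies
y = f(x1, x2) + errors           # Y = f(X1, X2) + errors
```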

 


 

 

[True regression function: f(x) = E(Y | X = x), the expected value of Y at X = x]

For simple regression, Y = f(X1) + errors, we predict the value of Y from X1 using the regression function.

The regression function is given by f(x) = E(Y|X=x), where x is a valid value of X1. This regression function (the expected value of Y at X = x) is the ideal predictor of Y.

The true regression function might not be linear.

From the regression function f(x), which might not be linear, we can construct a linear regression estimate, hat{f}(x).
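A rough sketch of the conditional-expectation idea (assuming NumPy; the data is synthetic): averaging Y over the samples at each value of X approximates f(x) = E(Y | X = x), even when f is non-linear.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 5, size=10_000)          # X takes the values 0..4
y = np.sin(x) + rng.normal(0, 0.5, x.size)   # non-linear f(x) = sin(x), plus noise

# Approximate f(v) = E(Y | X = v) by averaging Y over the samples where X = v
for v in range(5):
    print(v, round(y[x == v].mean(), 3), round(np.sin(v), 3))  # empirical mean vs true f
```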

 

[Different Regression Methods]

Among the different regression methods, such as linear regression, non-linear regression, similarity-based regression (nearest-neighbors regression), and tree-based regression, we choose the linear regression model here.

Of course, we cannot know in advance which regression method is best (the No Free Lunch theorem: without any prior information, we cannot choose a specific regression model that will always be better). However, linear regression works well in many cases, even though there is still no guarantee.

 

[Linear Regression Model, linear regression estimate hat{f}(x)]

Because the data we saw in the lectures tend to have linear associations between X and Y, the linear model can be preferred.

Among regression models, linear regression is a supervised learning model that finds a linear relationship between X and Y.

From the true regression function f(x) = E(Y|X=x), which might not be linear, we can construct the linear regression estimate hat{f}(x) = Beta0 + Beta1*X1 + Beta2*X2 + ... + Betan*Xn.

For simple linear regression, it can be represented as 'Y = hat{f}(x) + errors = Beta0 + Beta1*X1 + errors'.

For multiple linear regression, it can be represented as 'Y = hat{f}(x) + errors = Beta0 + Sum of (Betai * Xi) + errors'.
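As a sketch of how such an estimate can be fit in practice (assuming NumPy and scikit-learn are available; the data and parameter values below are synthetic, not from the post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data with hypothetical true parameters Beta0=2, Beta1=3, Beta2=-1
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = 2 + 3 * X[:, 0] - 1 * X[:, 1] + rng.normal(0, 1, size=100)

# Fit the linear regression estimate hat{f}(x) = Beta0 + Beta1*X1 + Beta2*X2
model = LinearRegression().fit(X, y)
print(model.intercept_)  # estimate of Beta0
print(model.coef_)       # estimates of Beta1 and Beta2
```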

 

 

[Best linear predictor, fL(x)]

Among linear regression estimates hat{f}1(x), hat{f}2(x), ..., we can find the best linear predictor fL(x), the one closest to the true regression function f(x). Because the true regression function f(x) is the target we are trying to approximate, we search among all the linear estimates for the one most similar to f(x).

 

 


[Values estimated from the data: hat{y}, RSS, MSE, Gradient Descent]

 

To obtain the best linear predictor fL(x), we can minimize the RSS or MSE with the gradient descent method.

Given a linear regression estimate, we can compute the predicted value hat{y} from its parameters.

hat{y} = hat{f}(x). (The prediction itself contains no error term; the errors belong to the model Y = hat{f}(x) + errors.)

RSS is the Sum of (hat{yi} - yi)^2, and MSE is (1/n) * Sum of (hat{yi} - yi)^2, which we minimize to obtain the best parameters of the linear regression estimate.
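As a small sketch (assuming NumPy; y and y_hat below are hypothetical arrays):

```python
import numpy as np

y = np.array([3.0, 5.0, 7.5, 9.0])      # hypothetical observed values yi
y_hat = np.array([2.8, 5.3, 7.1, 9.4])  # hypothetical predictions hat{yi}

rss = np.sum((y_hat - y) ** 2)  # RSS = Sum of (hat{yi} - yi)^2
mse = rss / len(y)              # MSE = (1/n) * RSS
print(rss, mse)
```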

To minimize the MSE, we can use gradient descent, in which we update the parameters of the linear regression estimate using the partial derivatives of the MSE with respect to each Betai (see the sketch after the list below).

 

 

[Gradient Descent]

1. Iterative optimization algorithm.

2. Finds a local minimum.

3. Requires a differentiable function.

4. Repeats steps in the opposite direction of the gradient (updates move against the gradient):

   - If the derivative is positive, the parameter is decreased (-).

   - If the derivative is negative, the parameter is increased (+).

5. Learning rate:

   - If the learning rate is too small, convergence is very slow (it may not converge within a practical number of steps).

   - If the learning rate is too large, it can diverge.

6. Repeat until the derivative becomes (approximately) zero (convergence).
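Putting items 1-6 together, here is a minimal sketch of gradient descent for a simple linear regression estimate (assuming NumPy; the synthetic data, learning rate, and iteration count are illustrative choices, not from the post):

```python
import numpy as np

# Hypothetical data generated from Y = 2 + 3*X + noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 200)
y = 2 + 3 * x + rng.normal(0, 0.1, 200)

beta0, beta1 = 0.0, 0.0  # initial parameters
lr = 0.1                 # learning rate

for _ in range(2000):
    y_hat = beta0 + beta1 * x
    # Partial derivatives of MSE = (1/n) * Sum of (y - y_hat)^2
    grad0 = -2 * np.mean(y - y_hat)        # d(MSE)/d(Beta0)
    grad1 = -2 * np.mean((y - y_hat) * x)  # d(MSE)/d(Beta1)
    # Update in the opposite direction of the gradient
    beta0 -= lr * grad0
    beta1 -= lr * grad1

print(beta0, beta1)  # should be close to the true values 2 and 3
```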

 


 


[Interpretation of Linear Regression Estimate’s Parameters]

Y = hat{f}(x) + errors = Beta0 + Beta1*X1 + Beta2*X2 + Beta3*X3 + errors

 

 

The coefficient Beta1 estimates:

1. the expected change in Y when X1 changes by one unit,

2. with all other predictors held fixed.

 

 

The coefficient Betai estimates:

1. the expected change in Y when Xi changes by one unit,

2. with all other predictors (Xj, j ≠ i) held fixed.

 

 

In practice, however, the predictors tend to change together (as more predictors are added to the model, the estimated parameters can change), as the sketch below illustrates.
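A small sketch of this effect (assuming NumPy and scikit-learn; the correlated data is synthetic): when X1 and X2 move together, the estimate of Beta1 changes once X2 enters the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical correlated predictors: X2 depends on X1
rng = np.random.default_rng(0)
x1 = rng.normal(size=300)
x2 = 0.8 * x1 + 0.2 * rng.normal(size=300)
y = 1 + 2 * x1 + 3 * x2 + rng.normal(0, 0.5, size=300)

m1 = LinearRegression().fit(x1.reshape(-1, 1), y)
m2 = LinearRegression().fit(np.column_stack([x1, x2]), y)
print(m1.coef_)  # Beta1 alone absorbs part of X2's effect (about 2 + 3*0.8 = 4.4)
print(m2.coef_)  # with X2 included, Beta1 returns to about 2
```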

 

[making the model more useful & less wrong]

 

- More useful

1. solving more useful problems

2. interpreting models in meaningful, practical terms

 

- Less wrong

1. If it is linear regression

-> two assumptions: 1. linearity, 2. additivity (note: 'additive', not 'addictive')

 

 

2. If the relationship is not linear:

  • Polynomials (raising the dimension of the feature space; see the sketch below)
  • Step functions
  • Splines (piecewise polynomial curves)
  • Local regression (a moving-average idea extended to polynomial regression)
  • Generalized additive models (an extension of polynomial regression) -> Logistic Regression
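As a sketch of the polynomial approach from the list above (assuming scikit-learn; the quadratic data is synthetic), a linear model can fit a non-linear relationship once the features are expanded:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data: y = x^2 + noise
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = x[:, 0] ** 2 + rng.normal(0, 0.3, size=200)

# Raise the dimension with polynomial features, then fit a linear model
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
model = LinearRegression().fit(X_poly, y)
print(model.coef_)  # coefficient on the x^2 column should be close to 1
```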

 

 

3. If the additivity assumption doesn't hold:

  • incorporate interaction terms (interaction effects: the combination of X1 and X2 has an important effect on Y; see the sketch below)
  • these give increased flexibility and interpretability
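A minimal sketch of adding an interaction term (assuming NumPy and scikit-learn; the data and coefficient values are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data where X1 and X2 interact: y depends on x1*x2
rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 300)
x2 = rng.uniform(0, 1, 300)
y = 1 + 2 * x1 + 3 * x2 + 5 * x1 * x2 + rng.normal(0, 0.1, 300)

# Add the interaction term x1*x2 as an extra column of the design matrix
X = np.column_stack([x1, x2, x1 * x2])
model = LinearRegression().fit(X, y)
print(model.coef_)  # approximately [2, 3, 5]
```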
