1. Simple Linear Regression

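Simple linear regression is a linear regression model with a single independent variable and a single dependent variable. The goal is to find a linear function of the independent variable that predicts the value of the dependent variable as accurately as possible. The usual approach is the method of least squares, which minimizes the sum of the squared residuals, where a residual is the vertical distance between a data point and the fitted line. The line with the smallest sum of squared residuals is the result we are looking for.
Suppose the fitted line is: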

<script type="math/tex; mode=display" id="MathJax-Element-8063">y=\beta_0+\beta_1x</script>
Our goal is then to find the slope <script type="math/tex" id="MathJax-Element-8064">\beta_1</script> and the <script type="math/tex" id="MathJax-Element-8065">y</script>-axis intercept <script type="math/tex" id="MathJax-Element-8066">\beta_0</script>. Stated mathematically, we want the <script type="math/tex" id="MathJax-Element-8067">\beta_0</script> and <script type="math/tex" id="MathJax-Element-8068">\beta_1</script> that minimize the following expression:
<script type="math/tex; mode=display" id="MathJax-Element-8069">\min_{\beta_0,\beta_1} \sum_{i=1}^{n}\{y_i -(\beta_0+\beta_1x_i)\}^2</script>
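The least-squares objective can be evaluated directly for any candidate line. The following is a minimal sketch; the function name `sum_squared_residuals` and the toy data are assumptions made for illustration and do not come from the article.

```python
def sum_squared_residuals(xs, ys, beta0, beta1):
    """Sum of squared vertical distances between the points (x_i, y_i)
    and the line y = beta0 + beta1 * x."""
    return sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# The true line leaves no residual; shifting the intercept makes the fit worse.
exact_fit = sum_squared_residuals(xs, ys, beta0=1.0, beta1=2.0)
worse_fit = sum_squared_residuals(xs, ys, beta0=0.0, beta1=2.0)
print(exact_fit, worse_fit)
```

Least squares picks the pair of coefficients for which this quantity is smallest.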
The solution proceeds as follows. Start from the sum of squared residuals:
<script type="math/tex; mode=display" id="MathJax-Element-8070">\sum_{i=1}^{n}\{y_i -(\beta_0+\beta_1x_i)\}^2</script>
<script type="math/tex; mode=display" id="MathJax-Element-8071">=\sum_{i=1}^{n}\{y_i -\beta_1x_i-\beta_0\}^2</script>
Letting <script type="math/tex" id="MathJax-Element-8072">y_i^*=y_i-\beta_1x_i</script>, the expression simplifies to
<script type="math/tex; mode=display" id="MathJax-Element-8073">=\sum_{i=1}^{n}\{y_i^*-\beta_0\}^2</script>
For any fixed slope, a sum of squared deviations from a constant is smallest when that constant is the mean, so the expression is minimized only when <script type="math/tex" id="MathJax-Element-8074">\beta_0</script> equals the mean of the <script type="math/tex" id="MathJax-Element-8075">y_i^*</script>:
<script type="math/tex; mode=display" id="MathJax-Element-8076">\beta_0=\frac{\sum y_{i}^{*}}{n}=\frac{\sum (y_{i}-\beta_1x_i)}{n}=\overline{y}-\beta_1\overline{x}</script>
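The claim that the mean of the slope-adjusted values is the best intercept can be checked numerically. This is a small sketch under assumed toy data; the names `best_intercept` and `sse`, and the arbitrary fixed slope of 1.5, are illustrative choices rather than anything taken from the article.

```python
def best_intercept(xs, ys, beta1):
    """For a fixed slope beta1, return the mean of y_i - beta1 * x_i."""
    return sum(y - beta1 * x for x, y in zip(xs, ys)) / len(xs)


def sse(xs, ys, beta0, beta1):
    """Sum of squared residuals for the line y = beta0 + beta1 * x."""
    return sum((y - beta0 - beta1 * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical toy data and an arbitrary fixed slope.
xs = [1.0, 2.0, 4.0, 5.0]
ys = [2.0, 3.0, 7.0, 8.0]
beta1 = 1.5

b0 = best_intercept(xs, ys, beta1)
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# The intercept equals y_bar - beta1 * x_bar, and nudging it in either
# direction increases the sum of squared residuals.
print(b0, y_bar - beta1 * x_bar)
print(sse(xs, ys, b0, beta1), sse(xs, ys, b0 + 0.1, beta1), sse(xs, ys, b0 - 0.1, beta1))
```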
Substituting <script type="math/tex" id="MathJax-Element-8077">\beta_0</script> into the original expression gives
<script type="math/tex; mode=display" id="MathJax-Element-8078">\sum_{i=1}^{n}\{y_i -\beta_1x_i-\overline{y}+\beta_1\overline{x}\}^2</script>
<script type="math/tex; mode=display" id="MathJax-Element-8079">=\sum_{i=1}^{n}\{y_i -\overline{y}-(x_i-\overline{x})\beta_1\}^2</script>
Letting <script type="math/tex" id="MathJax-Element-8080">\hat {y_{i}}=y_i-\overline{y}</script> and <script type="math/tex" id="MathJax-Element-8081">\hat {x_{i}}=x_i-\overline{x}</script> denote the centered values, this becomes
<script type="math/tex; mode=display" id="MathJax-Element-8082">=\sum_{i=1}^{n}\{\hat {y_{i}}-\hat {x_{i}}\beta_1\}^2</script>
The mean argument used for <script type="math/tex" id="MathJax-Element-8083">\beta_0</script> does not apply directly here, because <script type="math/tex" id="MathJax-Element-8084">\hat {x_{i}}\beta_1</script> is not constant across observations. Setting the derivative with respect to the slope to zero gives <script type="math/tex" id="MathJax-Element-8085">\sum \hat {x_{i}}(\hat {y_{i}}-\hat {x_{i}}\beta_1)=0</script>, and solving for <script type="math/tex" id="MathJax-Element-8086">\beta_1</script> yields:
<script type="math/tex; mode=display" id="MathJax-Element-8087">\beta_1=\frac { \sum \hat {y_i} \hat {x_i} } {\sum \hat {x_i}^2}=\frac{\sum (y_i-\overline{y})(x_i-\overline{x})}{\sum (x_i-\overline{x})^2}</script>
Dividing both the numerator and the denominator by n - 1 turns them into the sample covariance and the sample variance:
<script type="math/tex; mode=display" id="MathJax-Element-8088">\beta_1=\frac{\sum (y_i-\overline{y})(x_i-\overline{x})/(n-1)}{\sum (x_i-\overline{x})^2 /(n-1)}</script>
<script type="math/tex; mode=display" id="MathJax-Element-8089">\beta_1=\frac {cov(y,x)}{cov(x,x)}=\frac {cov(y,x)}{var(x)}</script>
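The closed-form result, slope equal to cov(y, x) / var(x) and intercept equal to the mean of y minus the slope times the mean of x, translates into a few lines of code. The sketch below uses only plain Python; the function name `fit_simple_linear_regression` and the sample data are assumptions for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")

    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample covariance and sample variance, both with the n - 1 divisor.
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    if var_x == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    beta1 = cov_xy / var_x            # slope
    beta0 = y_bar - beta1 * x_bar     # intercept
    return beta0, beta1


# Hypothetical noisy data scattered around y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.2, 4.1, 6.2, 7.9, 10.1]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(f"intercept={beta0:.4f}, slope={beta1:.4f}")
```

Because the n - 1 factors cancel in the ratio, dividing by n instead would give the same slope; the divisor only matters if the covariance or variance is reported on its own.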

2. Linear Regression

Given a data set <script type="math/tex" id="MathJax-Element-1944">\{y_i,x_{i1},...,x_{ip}\}_{i=1}^{n}</script>, a linear regression model seeks the linear relationship between the variable <script type="math/tex" id="MathJax-Element-1945">y_i</script> and the vector of regressors <script type="math/tex" id="MathJax-Element-1946">X</script>.
This relationship is modeled through a disturbance term or error variable εi, an unobserved random variable that adds noise to the linear relationship between the dependent variable and the regressors. Thus the model takes the form
<script type="math/tex; mode=display">y_i=\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip}+\varepsilon_i</script>
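To make the role of the disturbance term concrete, the following minimal sketch generates each response as an intercept plus a linear combination of the regressors plus random noise. The function name `simulate_linear_model`, its parameters, the Gaussian choice for the noise, and the toy inputs are all assumptions made for illustration; the article does not specify a distribution for εi.

```python
import random


def simulate_linear_model(rows, betas, intercept, noise_sd, seed=0):
    """Generate y_i = intercept + sum_j betas[j] * rows[i][j] + eps_i,
    where eps_i is drawn from a normal distribution with mean 0 and
    standard deviation noise_sd (an assumed noise model)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ys = []
    for row in rows:
        signal = intercept + sum(b * x for b, x in zip(betas, row))
        ys.append(signal + rng.gauss(0.0, noise_sd))
    return ys


# Hypothetical regressors with p = 2 columns.
rows = [[1.0, 2.0], [0.0, 1.0], [3.0, 0.5]]
noiseless = simulate_linear_model(rows, betas=[2.0, 3.0], intercept=1.0, noise_sd=0.0)
noisy = simulate_linear_model(rows, betas=[2.0, 3.0], intercept=1.0, noise_sd=0.5)
print(noiseless)
print(noisy)
```

With `noise_sd=0.0` the responses fall exactly on the linear relationship; any positive value scatters them around it, which is the situation regression is meant to handle.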

To be continued...

References:
1. https://en.wikipedia.org/wiki/Simple_linear_regression
2. https://en.wikipedia.org/wiki/Linear_regression
