GWR(地理加权回归)预测分析中国各省份开关窗情况(R语言)
英文版链接:https://blog.csdn.net/qq_42686550/article/details/103882263本文章,使用GIS中的GWR(地理加权回归)(Geographically weighted regression)来预测95个城市的楼宇开关窗情况.有以下几个步骤:气泡图,OLS,Moran’s I(莫兰系数)和GWR气泡图首先是制作气泡地图的代码 (R语言...
英文版链接:https://blog.csdn.net/qq_42686550/article/details/103882263
本文章,使用GIS中的GWR(地理加权回归)(Geographically weighted regression)来预测95个城市的楼宇开关窗情况.
有以下几个步骤:
气泡图,OLS,Moran’s I(莫兰系数)和GWR
气泡图
首先是制作气泡地图的代码 (R语言):
library(ggplot2)
library(ggmap)
library(maptools)
library(maps)
library(dplyr)
library(viridis)
library(mapproj)
library(readr)
China <- map_data("world") %>% filter(region=="China") #Region of China
myData <- read.csv('cw/ex.csv') # Read file
myBreaks <- c(0.1,0.3, 0.5,0.7, 1)
# Plot map
map <-
ggplot() +
geom_polygon(data=China,aes(long, y = lat, group = group), fill="white", alpha=0.3) +
geom_point(data=myData, aes(x=longitude, y = latitude, size=open.rate, color=open.rate, alpha=open.rate), shape=20, stroke=FALSE) +
scale_size_continuous(name="Opening rate", trans="sqrt", range=c(1,12), breaks=myBreaks) +
scale_alpha_continuous(name="Opening rate", trans="sqrt", range=c(0.1, 0.6), breaks=myBreaks) +
scale_color_viridis(option="magma", trans="sqrt", breaks=myBreaks, name="Opening rate" ) + #Transformation "sqrt"
theme_void() + coord_map() + xlim(70,149)+ ylim(18,55)+
guides( colour = guide_legend()) +
ggtitle("Window opening rate in China") +
theme(
legend.position = c(0.85, 0.8),
text = element_text(color = "#f5f5f2"), # Choose colour
plot.background = element_rect(fill = "#4e4d47", color = NA),
panel.background = element_rect(fill = "#4e4d47", color = NA),
legend.background = element_rect(fill = "#4e4d47", color = NA),
plot.title = element_text(size= 16, hjust=0.1, color = "#f5f5f2", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
)
plot(map)
气泡图展示了全国95个城市全年开关窗的百分比,有四个自变量(温度湿度等)和一个因变量(开窗率)
OLS模型
首先是我们需要的lib:
library(tmap)
library(readr)
library(rgdal)
library(ggplot2)
library(ggmap)
library(maptools)
library(maps)
library(dplyr)
library(viridis)
library(mapproj)
library(spdep)
library(car)
library(spgwr)
如果你需要下载中国的shapefile可以从这里下载.
首先来看一下传统的OLS模型的代码和效果:
chinaSHP <- readOGR("exe/gadm36_CHN_1.shp")# Read SHP file
dataWindow <- read_csv("cw/data.csv") # Read data
model <- lm(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
+`Atmospheric pressure`,data=dataWindow) # OLS model
summary(model)
plot(model) # Plot result
durbinWatsonTest(model) #Autocorrelation
lm(formula = `open rate` ~ `Dry-bulb temperature` + `Wind speed` +
`External relative humidity` + `Atmospheric pressure`, data = dataWindow)
Residuals:
Min 1Q Median 3Q Max
-0.153347 -0.035615 0.009812 0.031448 0.251990
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.2491505 0.0600809 -4.147 7.62e-05 ***
`Dry-bulb temperature` 0.0569267 0.0024493 23.242 < 2e-16 ***
`Wind speed` 0.0036693 0.0069773 0.526 0.60026
`External relative humidity` 0.0005466 0.0005036 1.085 0.28060
`Atmospheric pressure` -0.0027743 0.0008336 -3.328 0.00127 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06105 on 90 degrees of freedom
Multiple R-squared: 0.8957, Adjusted R-squared: 0.8911
F-statistic: 193.3 on 4 and 90 DF, p-value: < 2.2e-16
可以看出有些变量 并不是显著的(风速等),但是这些变量是在OLS模型中的也就是在global模型中的,再局部模型中这些变量可能会变成显著的,因此在此我们将这些变量通通保留。这时我们再验证一下遍历将相关性是否独立:
lag Autocorrelation D-W Statistic p-value
1 0.06111587 1.846829 0.376
Alternative hypothesis: rho != 0
可以看出D-W测试结果为1.84,似乎这是OK的。但是!!! 我们要用的是GWR,这个独立性可能在空间上是非独立的,因此我们需要检测残差在空间上的分布:
dataWindow$residuals <- residuals(model)
#Function to plot residuals
bubblefunc <- function(myData,size,name,myBreaks){
China <- map_data("world") %>% filter(region=="China")
map <-
ggplot() +
geom_polygon(data=China,aes(long, y = lat, group = group), fill="white", alpha=0.3) +
geom_point(data=myData, aes(x=longitude, y = latitude, size=size, color=size, alpha=size), shape=20, stroke=FALSE) +
scale_size_continuous(name=name, trans="identity",range=c(1,12),breaks=myBreaks) +
scale_alpha_continuous(name=name, trans="identity", range=c(0.1, 0.6),breaks=myBreaks) +
scale_color_viridis(option="magma", trans="identity", name=name,breaks=myBreaks ) +
theme_void() + coord_map() + xlim(70,149)+ ylim(18,55)+
guides( colour = guide_legend()) +
ggtitle(paste("Window opening rate" ,name,"in China")) +
theme(
legend.position = c(0.85, 0.8),
text = element_text(color = "#f5f5f2",size=15), # Choose colour
plot.background = element_rect(fill = "#4e4d47", color = NA),
panel.background = element_rect(fill = "#4e4d47", color = NA),
legend.background = element_rect(fill = "#4e4d47", color = NA),
plot.title = element_text(size= 16, hjust=0.1, color = "#f5f5f2", margin = margin(b = -0.1, t = 0.4, l = 2, unit = "cm")),
)
plot(map)
bubblefunc(dataWindow,dataWindow$residuals,"Residuals",c(-0.2,-0.1,0,0.1,0.3)) # Plot
可以看出似乎南部地区的残差存在聚集性,也就是说高残差靠近附近的高残差,可能存在空间非独立性。这时我们用莫兰系数来测试一下.
Morans’I
xy=dataWindow[,c(13,14)]
chinasf <- st_as_sf(x=xy,coords = c("longitude", "latitude"))
chinasp <- as(chinasf, "Spatial") # Transform to "spatial"
coordsW <- coordinates(chinasp) # calculate the centroids of all Wards in London
plot(coordsW)
knn_wards <- knearneigh(coordsW, k=4)# nearest neighbours
LWard_knn <- knn2nb(knn_wards)
plot(LWard_knn, coordinates(coordsW), col="blue") #Plot
#create a spatial weights matrix object from weight
Lward.knn_4_weight <- nb2listw(LWard_knn, style="C")
#moran's I test on the residuals
moran.test(dataWindow$residuals, Lward.knn_4_weight)
Moran I test under randomisation
data: dataWindow$residuals
weights: Lward.knn_4_weight
Moran I statistic standard deviate = 2.7069, p-value = 0.003396
alternative hypothesis: greater
sample estimates:
Moran I statistic Expectation Variance
0.166649889 -0.010638298 0.004289717
可以看出global Morans’ I为0.167. 莫兰系数一般在-1到1之间,0表示没用空间聚集。而0.167说明我们在这个模型中,有着较弱的空间相关性,因此我们需要使用GWR来解决这个问题。
GWR模型
#calculate kernel bandwidth
GWRbandwidth <- gwr.sel(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
+`Atmospheric pressure`,data=dataWindow,coords=coordsW,adapt=T)
#run the gwr model
GWRModel <- gwr(`open rate` ~ `Dry-bulb temperature`+`Wind speed`+`External relative humidity`
+`Atmospheric pressure`,data=dataWindow,coords=coordsW,adapt=GWRbandwidth,
hatmatrix = TRUE,se.fit = TRUE)
GWRModel #print the results of the model
results<-as.data.frame(GWRModel$SDF)
names(results)
gwr(formula = `open rate` ~ `Dry-bulb temperature` +
`Wind speed` + `External relative humidity` +
`Atmospheric pressure`, data = dataWindow, coords = coordsW,
adapt = GWRbandwidth, hatmatrix = TRUE, se.fit = TRUE)
Kernel function: gwr.Gauss
Adaptive quantile: 0.0333945 (about 3 of 95 data points)
Summary of GWR coefficient estimates at data points:
Min. 1st Qu. Median 3rd Qu. Max. Global
X.Intercept. -1.55295044 -0.56679395 -0.30426784 -0.12435906 0.40684111 -0.2492
X.Dry.bulb.temperature. 0.02109119 0.05192010 0.06482929 0.06990573 0.08456590 0.0569
X.Wind.speed. -0.09409310 -0.00837505 -0.00171803 0.00572094 0.02819445 0.0037
X.External.relative.humidity. -0.00539565 -0.00085167 0.00093522 0.00260762 0.01414508 0.0005
X.Atmospheric.pressure. -0.02081367 -0.00529484 -0.00299928 0.00036854 0.01054743 -0.0028
Number of data points: 95
Effective number of parameters (residual: 2traceS - traceS'S): 55.28598
Effective degrees of freedom (residual: 2traceS - traceS'S): 39.71402
Sigma (residual: 2traceS - traceS'S): 0.03666693
Effective number of parameters (model: traceS): 44.0667
Effective degrees of freedom (model: traceS): 50.9333
Sigma (model: traceS): 0.03237767
Sigma (ML): 0.02370744
AICc (GWR p. 61, eq 2.33; p. 96, eq. 4.21): -266.3886
AIC (GWR p. 96, eq. 4.22): -397.3086
Residual sum of squares: 0.05339408
Quasi-global R2: 0.9834058
可以看出,一些因变量存在正负数差异,并随着空间变化而变化,我们用地图来可视化一下。
dataWindow$coeTemperature <- results$X.Dry.bulb.temperature.
bubblefunc(dataWindow,dataWindow$coeTemperature,'coeTemperature',c(0.02,0.05, 0.06,0.07, 0.08))
可以看出,温度这个因变量再南部有着较高的比重。但是我们仍需考虑这些系数的显著性。在文本中,如果一个系数比标准大2倍,那么我们就认为此时这个系数具有显著性。
#statistically significant test
sigTest_Temp = abs(GWRModel$SDF$"`Dry-bulb temperature`") - 2 * GWRModel$SDF$"`Dry-bulb temperature`_se"
dataWindow$GWRTemp <- sigTest_Temp
bubblefunc(subset(dataWindow,dataWindow$GWRTemp>0),subset(dataWindow$coeTemperature,dataWindow$GWRTemp>0),'GWRTemp ',c(0.02,0.05, 0.06,0.07, 0.08))
看上去没有太大变化,说明大多数地区的温度变量都是显著的,此时同理我们再来看一下其他变量的分布。
很明显有些变量在一些地区并不是显著的,并且存在地区差异,因此我们认为在只考虑这四个因变量的时候,开关窗行为存在地区上的差异,并具体表现如上图所示。
Github链接:https://github.com/Connor666/Geographic-Information-Systems-and-Science-assessment
更多推荐
所有评论(0)