Predicting Parking Demand
1. Introduction
1.1. Research Background
Since the Ford Model T was introduced on October 1, 1908, the automobile has become an integral part of society, and parking has been a persistent problem ever since. Difficulty in parking first imposes additional time costs on individuals. Vehicles circling the streets in search of a space add to road congestion and, in turn, create more exhaust emissions and environmental pollution. To address these problems, some local governments have tried to ensure the availability of parking spaces by dynamically adjusting parking prices. One example is the San Francisco Municipal Transportation Agency (SFMTA), the subject of our study, which seeks to implement a dynamic pricing policy for on-street parking meters between 9 a.m. and 6 p.m., Monday through Saturday. The intent is to use higher parking prices to ensure that one or two spaces are always available on each street, making it easier for drivers to find a space.
1.2. User & Use Case
This study is intended to help the City of San Francisco evaluate and make better use of its existing dynamic pricing policy, that is, to predict and regulate parking occupancy from dynamic prices and various spatial and temporal factors so that parking spaces remain available. First, we download the relevant parking meter usage data from the SFMTA website. Second, we use US Census data to select variables related to parking occupancy. After exploring and cleaning these data, we build a regression model to analyze and predict the occupancy rate of parking spaces, and we evaluate the validity of the model using cross-validation.
2. Data Wrangling
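As a rough illustration of the evaluation step described above, the following is a minimal sketch of k-fold cross-validation for a linear regression in base R. The data are synthetic and the variable names (`price`, `hour`, `occupancy`) are assumptions made for the example; this is not the model built in this study.

```r
# Hypothetical toy data: occupancy as a noisy linear function of price and hour.
set.seed(42)
n <- 200
toy <- data.frame(
  price = runif(n, 0.5, 8),
  hour  = sample(9:17, n, replace = TRUE)
)
toy$occupancy <- 0.9 - 0.05 * toy$price + 0.01 * toy$hour + rnorm(n, sd = 0.05)

# Assign each row to one of k folds at random.
k <- 5
fold_id <- sample(rep(seq_len(k), length.out = n))

# For each fold: fit on the other folds, predict the held-out fold,
# and record the mean absolute error (MAE).
fold_mae <- sapply(seq_len(k), function(i) {
  train <- toy[fold_id != i, ]
  test  <- toy[fold_id == i, ]
  fit   <- lm(occupancy ~ price + hour, data = train)
  mean(abs(predict(fit, newdata = test) - test$occupancy))
})

# Average the per-fold errors to get the cross-validated MAE.
cv_mae <- mean(fold_mae)
print(round(fold_mae, 3))
print(round(cv_mae, 3))
```

Comparing the per-fold errors, rather than only their mean, gives a sense of how stable the model is across different subsets of the data.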
2.1. Setup
Let’s load relevant libraries and some graphic themes.
library(prettydoc)
library(rmdformats)
library(tidyverse)
library(ggplot2)
library(tidycensus)
library(sf)
library(spdep)
library(caret)
library(ckanr)
library(FNN)
library(grid)
library(gridExtra)
library(ggcorrplot)
library(osmdata)
library(tigris)
library(osmextract)
library(curl)
library(reshape2)
library(glue)
library(dismo)
library(spgwr)
library(MASS)
library(lme4)
library(data.table)
library(kableExtra)
library(stargazer)
library(RSocrata)
library(knitr)
library(gifski)
library(rjson)
library(riem)
library(gganimate)
library(viridis)
library(lubridate)
library(geojsonio)
library(magrittr)
#-----Import external functions & a palette-----
root.dir = "https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/DATA/"
source("https://raw.githubusercontent.com/urbanSpatial/Public-Policy-Analytics-Landing/master/functions.r")

plotTheme <- theme(
  plot.title = element_text(size=12),
plot.subtitle = element_text(size=8),
plot.caption = element_text(size = 6),
axis.text.x = element_text(size = 10, angle = 45, hjust = 1),
axis.text.y = element_text(size = 10),
axis.title.y = element_text(size = 10),
# Set the entire chart region to blank
panel.background=element_blank(),
plot.background=element_blank(),
#panel.border=element_rect(colour="#F0F0F0"),
# Format the grid
panel.grid.major=element_line(colour="#D0D0D0",size=.2),
axis.ticks=element_blank())

mapTheme1 <- function(base_size = 35, title_size = 45) {
  theme(
text = element_text( color = "black"),
plot.title = element_text(hjust = 0.5, size = title_size, colour = "black",face = "bold"),
plot.subtitle = element_text(hjust = 0.5,size=base_size,face="italic"),
plot.caption = element_text(size=base_size,hjust=0),
axis.ticks = element_blank(),
panel.spacing = unit(6, 'lines'),
panel.background = element_blank(),
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
strip.background = element_rect(fill = "grey80", color = "white"),
strip.text = element_text(size=base_size),
axis.title = element_text(face = "bold",size=base_size),
axis.text = element_blank(),
plot.background = element_blank(),
legend.background = element_blank(),
legend.title = element_text(size=base_size,colour = "black", face = "italic"),
legend.text = element_text(size=base_size,colour = "black", face = "italic"),
strip.text.x = element_text(size = base_size,face = "bold")
)
}

mapTheme <- theme(plot.title = element_text(size=12),
plot.subtitle = element_text(size=8),
plot.caption = element_text(size = 6),
axis.line=element_blank(),
axis.text.x=element_blank(),
axis.text.y=element_blank(),
axis.ticks=element_blank(),
axis.title.x=element_blank(),
axis.title.y=element_blank(),
panel.background=element_blank(),
panel.border=element_blank(),
panel.grid.major=element_line(colour = 'transparent'),
panel.grid.minor=element_blank(),
legend.direction = "vertical",
legend.position = "right",
plot.margin = margin(1, 1, 1, 1, 'cm'),
legend.key.height = unit(1, "cm"), legend.key.width = unit(0.2, "cm"))

palette5 <- c("#eff3ff","#bdd7e7","#6baed6","#3182bd","#08519c")
palette4 <- c("#D2FBD4","#92BCAB","#527D82","#123F5A")
palette2 <- c("#6baed6","#08519c")
palette1_main <- "#174C4F"
palette1_assist <- '#F9B294'
2.2. Import Parking Data
Parking data from 2020 onward can be downloaded from the official website of the San Francisco Municipal Transportation Agency (SFMTA). Because parking demand in San Francisco is high, the full dataset contains 105 million rows. We therefore filtered the API request to retrieve only sessions between 9 a.m. on May 1, 2022 and 6 p.m. on May 14, 2022. These two weeks of data form the base dataset for our model.
## This part variable has been saved locally as dat, location
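# Timestamps are zero-padded ISO 8601; the window ends at 6 p.m. on May 14
dat <- read.socrata("https://data.sfgov.org/resource/imvp-dq3v.csv?$where=SESSION_START_DT%20between%20%272022-05-01T09:00:00%27%20and%20%272022-05-14T18:00:00%27")

location <- read.socrata("https://data.sfgov.org/resource/8vzz-qzz9.csv") %>%
  # st_as_sf() expects coords in (x, y) order, i.e. longitude first
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326, agr = "constant") %>%
  # Use the same projected CRS as the census data so that st_join() works
  st_transform('ESRI:102243') %>%
  distinct()
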
glimpse(dat)
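The filtered request above relies on a URL-encoded SoQL `$where` clause. As a minimal sketch of how such a URL could be assembled rather than typed by hand, the hypothetical helper `build_session_query()` below (not part of RSocrata or of this project) formats the clause and percent-encodes it with base R's `utils::URLencode()`. Note that it also encodes the colons in the timestamps as `%3A`, which is an equivalent form of the same query.

```r
# Dataset endpoint taken from the import code above.
base_url <- "https://data.sfgov.org/resource/imvp-dq3v.csv"

# Hypothetical helper: build a SoQL $where clause for a time window
# and append it, percent-encoded, to the endpoint URL.
build_session_query <- function(base_url, start, end) {
  where <- sprintf("SESSION_START_DT between '%s' and '%s'", start, end)
  paste0(base_url, "?$where=", utils::URLencode(where, reserved = TRUE))
}

query_url <- build_session_query(
  base_url,
  start = "2022-05-01T09:00:00",
  end   = "2022-05-14T18:00:00"
)
print(query_url)
```

The resulting string can be passed to `read.socrata()` in place of the hard-coded URL, which makes it easier to change the study window later.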
By summing the time each parking meter is in use during the day, we can calculate the occupancy of each meter, identified by its post_id. At this stage we only prepare the data: session start and end times are rounded down to 15-minute intervals, and the hourly rate paid in each session is derived. The analysis itself follows later.
## This part variable has been saved locally as dat2, parking_rate
dat2 <- dat %>%
  left_join(location, by = "post_id") %>%
  dplyr::select(post_id, street_block, session_start_dt, session_end_dt, meter_event_type, gross_paid_amt, on_offstreet_type, ms_space_num, old_rate_area, street_name, shape, geometry) %>%
  # Round session start and end down to the nearest 15 minutes
  mutate(end_interval15 = floor_date(ymd_hms(session_end_dt), unit = "15 mins"),
         start_interval15 = floor_date(ymd_hms(session_start_dt), unit = "15 mins"),
         ms_space_num = ifelse(ms_space_num == 0, 1, ms_space_num))

parking_rate <- dat2 %>%
  # Request hours explicitly so the result does not depend on difftime's automatic units
  mutate(length = difftime(end_interval15, start_interval15, units = "hours"),
         parking_hour = as.numeric(length)) %>%
  dplyr::select(start_interval15, street_block, length, gross_paid_amt, parking_hour, post_id) %>%
  # Drop zero-length sessions before dividing by the duration
  filter(parking_hour != 0) %>%
  mutate(rate = gross_paid_amt / parking_hour) %>%
  dplyr::select(post_id, start_interval15, rate, street_block)
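To make the interval and rate calculation concrete, here is a small worked example on three made-up sessions, written in base R so it runs without lubridate. The helper `floor_15min()` and the sample values are assumptions for illustration only; the column names mirror those used above.

```r
# Three made-up sessions (values are illustrative, not SFMTA data).
sessions <- data.frame(
  post_id = c("A-1", "A-1", "B-2"),
  session_start_dt = as.POSIXct(
    c("2022-05-02 09:07:10", "2022-05-02 10:02:00", "2022-05-02 09:31:45"),
    tz = "UTC"),
  session_end_dt = as.POSIXct(
    c("2022-05-02 10:22:30", "2022-05-02 10:09:00", "2022-05-02 11:05:00"),
    tz = "UTC"),
  gross_paid_amt = c(3.75, 0.50, 6.00)
)

# Hypothetical helper: round a POSIXct down to the nearest 15 minutes (900 s).
floor_15min <- function(x) {
  as.POSIXct(floor(as.numeric(x) / 900) * 900, origin = "1970-01-01", tz = "UTC")
}

sessions$start_interval15 <- floor_15min(sessions$session_start_dt)
sessions$end_interval15   <- floor_15min(sessions$session_end_dt)

# Duration in hours, with the unit requested explicitly.
sessions$parking_hour <- as.numeric(
  difftime(sessions$end_interval15, sessions$start_interval15, units = "hours")
)

# Drop sessions that collapse to zero length, then compute dollars per hour.
rated <- sessions[sessions$parking_hour > 0, ]
rated$rate <- rated$gross_paid_amt / rated$parking_hour
print(rated[, c("post_id", "start_interval15", "parking_hour", "rate")])
```

The second session starts and ends inside the same 15-minute interval, so its rounded duration is zero and it is dropped, which is the same effect the `filter(parking_hour != 0)` step has in the pipeline.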
2.3. Import Census Data
From the US Census data we selected the following variables: TotalPop, Whites, AfricanAmericans, Asians, MedHHInc, and MedRent. First, we believe population size is directly related to parking occupancy, following the basic logic of supply and demand. Second, we expect resident income and rent to affect parking occupancy as well.
CensusData <-
  get_acs(geography = "block group",
          variables = c("B01003_001E","B02001_002E","B19013_001E","B25058_001E",'B02001_003E','B02001_005E'),
          year = 2019, state = "CA", county = "SAN FRANCISCO", geometry = TRUE, output = "wide") %>%
  st_transform('ESRI:102243') %>%
  rename(Census_TotalPop = B01003_001E,
         Census_Whites = B02001_002E,
         Census_AfricanAmericans = B02001_003E,
         Census_Asians = B02001_005E,
         Census_MedHHInc = B19013_001E,
         Census_MedRent = B25058_001E) %>%
  dplyr::select(-NAME, -starts_with("B")) %>%
  mutate(Census_pctWhite = ifelse(Census_TotalPop > 0, 100*Census_Whites / Census_TotalPop, 0),
         Census_pctAfricanAmericans = ifelse(Census_TotalPop > 0, 100*Census_AfricanAmericans / Census_TotalPop, 0),
         Census_pctAsians = ifelse(Census_TotalPop > 0, 100*Census_Asians / Census_TotalPop, 0),
         Census_blockgroupareasqm = as.numeric(st_area(.)),
         # Guard on population, which is the denominator here
         Census_areaperpeople = ifelse(Census_TotalPop > 0, Census_blockgroupareasqm / Census_TotalPop, 0),
         year = "2019") %>%
  dplyr::select(-Census_Whites, -Census_AfricanAmericans, -Census_Asians, -GEOID)
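The derived census variables above all follow the same pattern: divide by total population, but return 0 for block groups with no residents. A minimal sketch of that pattern on made-up numbers is shown below; `safe_ratio()` and `area_sqm` are hypothetical names introduced for this example.

```r
# Made-up block groups, one of them unpopulated.
block_groups <- data.frame(
  Census_TotalPop = c(1200, 0, 800),
  Census_Asians   = c(300, 0, 200),
  area_sqm        = c(60000, 15000, 20000)
)

# Hypothetical helper: scaled ratio that returns 0 when the denominator is not positive.
safe_ratio <- function(num, den, scale = 1) {
  ifelse(den > 0, scale * num / den, 0)
}

block_groups$Census_pctAsians <- safe_ratio(
  block_groups$Census_Asians, block_groups$Census_TotalPop, scale = 100
)
block_groups$Census_areaperpeople <- safe_ratio(
  block_groups$area_sqm, block_groups$Census_TotalPop
)
print(block_groups)
```

Coding empty block groups as 0 keeps them in the dataset, but it also makes them indistinguishable from populated areas with a true value of 0; using `NA` instead would be an alternative design choice.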
# Geometries
CensusData2 <-
  get_acs(geography = "tract",
          variables = c("B01003_001E"),
          year = 2019, state = "CA", county = "SAN FRANCISCO", geometry = TRUE, output = "wide") %>%
  st_transform('ESRI:102243') %>%
  dplyr::select(-NAME, -starts_with("B"))

#-----Join Parking Meter data to census data-----
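trainData <- st_join(location, dplyr::select(CensusData, -Census_blockgroupareasqm, -year, -geometry, -Census_TotalPop))
# Join the tract GEOID onto trainData (not location) so the block-group variables are kept
trainData <- st_join(trainData, dplyr::select(CensusData2, GEOID, -geometry))
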
## plot parking meters
boundary = st_read("D:/Upenn/Upenn Lec/05-MUSA-508/Assign-Final/DATAEXPO/SF/trim.shp") %>%
  # Project to the same CRS as the census layers
  st_transform('ESRI:102243') %>%
  filter(objectid == "32")

CensusData3 <- CensusData2 %>%
  filter(GEOID != "06075980401") %>%
  filter(GEOID != "06075017902")

ggplot()+
geom_sf(data = CensusData3, color="gray50",fill = "transparent",size=1,linetype = "dashed")+
geom_sf(data = boundary,fill = "transparent", color="black",size=100000000)+
geom_sf(data = trainData, size = 1,color=palette1_main )+
  labs(title = "Map of Parking Meters in San Francisco",
       subtitle = "By census tract") +
  # mapTheme is defined above as a theme object, so it is added without parentheses
  mapTheme