::p_load(sf, tidyverse, funModeling) pacman
In Class Exercise 2: Geospatial Data Wrangling
1 Scenario:
Water is an important resource to mankind. Clean and accessible water is critical to human health. It provides a healthy environment, a sustainable economy, reduces poverty and ensures peace and security. Yet over 40% of the global population does not have access to sufficient clean water. By 2025, 1.8 billion people will be living in countries or regions with absolute water scarcity, according to UN-Water. The lack of water poses a major threat to several sectors, including food security. Agriculture uses about 70% of the world’s accessible freshwater.
Developing countries are most affected by water shortages and poor water quality. Up to 80% of illnesses in the developing world are linked to inadequate water and sanitation. Despite technological advancement, providing clean water to the rural community is still a major development issues in many countries globally, especially countries in the Africa continent.
To address the issue of providing clean and sustainable water supply to the rural community, a global Water Point Data Exchange (WPdx) project has been initiated. The main aim of this initiative is to collect water point related data from rural areas at the water point or small water scheme level and share the data via WPdx Data Repository, a cloud-based data library. What is so special of this project is that data are collected based on WPDx Data Standard.
1.1 The Task:
The specific tasks of this take-home exercise are as follows:
Using appropriate sf method, import the shapefile into R and save it in a simple feature data frame format. Note that there are three Projected Coordinate Systems of Nigeria, they are: EPSG: 26391, 26392, and 26303. You can use any one of them.
Using appropriate tidyr and dplyr methods, derive the proportion of functional and non-functional water point at LGA level.
Combining the geospatial and aspatial data frame into simple feature data frame.
Visualising the distribution of water point by using appropriate analytical visualisation methods.
2 Getting Started: Load packages
3 Handling Geospatial Data
##Importing Geospatial
geoBoundaries Data set
Show code
<- st_read(dsn = "data/geospatial", layer = "geoBoundaries-NGA-ADM2") |>
geo_nga st_transform(crs = 26392)
Reading layer `geoBoundaries-NGA-ADM2' from data source
`/Users/junhaoteo/Documents/junhao2309/IS415/In-class_Ex/In-class_Ex02/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 5 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
NGA Data Set
<- st_read("data/geospatial/",
NGA layer = "nga_admbnda_adm2_osgof_20190417") %>%
st_transform(crs = 26392)
Reading layer `nga_admbnda_adm2_osgof_20190417' from data source
`/Users/junhaoteo/Documents/junhao2309/IS415/In-class_Ex/In-class_Ex02/data/geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 774 features and 16 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2.668534 ymin: 4.273007 xmax: 14.67882 ymax: 13.89442
Geodetic CRS: WGS 84
3.1 Importing Aspatial data
Use filter to extract only “Nigeria”
Show code
<- read_csv(file = "data/aspatial/WPdx.csv") |>
wp_nga filter(`#clean_country_name` == "Nigeria")
4 Converting Aspatial Data into Geospatial
Changes only the “New Georeferenced Column” but maintains wp_nga as a tibble dataframe Method in Hands-On_Ex1 also works
Show code
$Geometry = st_as_sfc(wp_nga$`New Georeferenced Column`)
wp_nga
wp_nga
# A tibble: 97,478 × 75
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
<dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 158721 Federal Minis… 5.07 6.62 02/19/… Yes Boreho… Well Mechan…
2 158892 Federal Minis… 5.09 7.09 02/06/… Yes Boreho… Well Hand P…
3 323117 Federal Minis… 5.91 8.77 08/31/… Yes Boreho… Well Hand P…
4 300176 Federal Minis… 5.23 7.32 05/17/… Yes Boreho… Well Mechan…
5 324346 Federal Minis… 6.88 3.36 08/17/… Yes Boreho… Well Mechan…
6 297273 Federal Minis… 6.59 3.29 05/26/… Yes Boreho… Well Mechan…
7 296853 Federal Minis… 6.60 3.26 06/02/… Yes Boreho… Well Mechan…
8 323866 Federal Minis… 6.20 6.73 09/18/… Yes Boreho… Well Mechan…
9 297044 Federal Minis… 6.61 3.30 05/26/… Yes Boreho… Well Mechan…
10 324321 Federal Minis… 6.96 3.60 08/16/… Yes Boreho… Well Mechan…
# … with 97,468 more rows, 66 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <chr>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
Convert wp_NGA into an sf data.frame and transforming it into the Nigeria projected coordinate system
Show code
<- st_sf(wp_nga, crs = 4326) |>
wp_sf st_transform(crs = 26392)
wp_sf
Simple feature collection with 97478 features and 74 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 28907.91 ymin: 33736.93 xmax: 1293293 ymax: 1092883
Projected CRS: Minna / Nigeria Mid Belt
# A tibble: 97,478 × 75
row_id `#source` #lat_…¹ #lon_…² #repo…³ #stat…⁴ #wate…⁵ #wate…⁶ #wate…⁷
* <dbl> <chr> <dbl> <dbl> <chr> <chr> <chr> <chr> <chr>
1 158721 Federal Minis… 5.07 6.62 02/19/… Yes Boreho… Well Mechan…
2 158892 Federal Minis… 5.09 7.09 02/06/… Yes Boreho… Well Hand P…
3 323117 Federal Minis… 5.91 8.77 08/31/… Yes Boreho… Well Hand P…
4 300176 Federal Minis… 5.23 7.32 05/17/… Yes Boreho… Well Mechan…
5 324346 Federal Minis… 6.88 3.36 08/17/… Yes Boreho… Well Mechan…
6 297273 Federal Minis… 6.59 3.29 05/26/… Yes Boreho… Well Mechan…
7 296853 Federal Minis… 6.60 3.26 06/02/… Yes Boreho… Well Mechan…
8 323866 Federal Minis… 6.20 6.73 09/18/… Yes Boreho… Well Mechan…
9 297044 Federal Minis… 6.61 3.30 05/26/… Yes Boreho… Well Mechan…
10 324321 Federal Minis… 6.96 3.60 08/16/… Yes Boreho… Well Mechan…
# … with 97,468 more rows, 66 more variables: `#water_tech_category` <chr>,
# `#facility_type` <chr>, `#clean_country_name` <chr>, `#clean_adm1` <chr>,
# `#clean_adm2` <chr>, `#clean_adm3` <chr>, `#clean_adm4` <chr>,
# `#install_year` <dbl>, `#installer` <chr>, `#rehab_year` <lgl>,
# `#rehabilitator` <chr>, `#management_clean` <chr>, `#status_clean` <chr>,
# `#pay` <chr>, `#fecal_coliform_presence` <chr>,
# `#fecal_coliform_value` <dbl>, `#subjective_quality` <chr>, …
5 Geospatial Data Cleaning
##Exclude redundent fields
Show code
<- NGA %>%
NGA select(c(3:4, 8:9))
5.1 Checking duplicate name and amending them
$ADM2_EN[duplicated(NGA$ADM2_EN) == TRUE] NGA
[1] "Bassa" "Ifelodun" "Irepodun" "Nasarawa" "Obi" "Surulere"
Correct the areas as they are located in different states
$ADM2_EN[94] <- "Bassa, Kogi"
NGA$ADM2_EN[95] <- "Bassa, Plateau"
NGA$ADM2_EN[304] <- "Ifelodun, Kwara"
NGA$ADM2_EN[305] <- "Ifelodun, Osun"
NGA$ADM2_EN[355] <- "Irepodun, Kwara"
NGA$ADM2_EN[356] <- "Irepodun, Osun"
NGA$ADM2_EN[519] <- "Nasarawa, Kano"
NGA$ADM2_EN[520] <- "Nasarawa, Nasarawa"
NGA$ADM2_EN[546] <- "Obi, Benue"
NGA$ADM2_EN[547] <- "Obi, Nasarawa"
NGA$ADM2_EN[693] <- "Surulere, Lagos"
NGA$ADM2_EN[694] <- "Surulere, Oyo" NGA
Rerun code to check
Show code
$ADM2_EN[duplicated(NGA$ADM2_EN) == TRUE] NGA
character(0)
6 Data Wrangling for Water Point Data
freq() showcase the distribution of waterpoint status visually.
freq(data = wp_sf,
input = '#status_clean')
#status_clean frequency percentage cumulative_perc
1 Functional 47228 48.45 48.45
2 Non-Functional 30638 31.43 79.88
3 <NA> 10154 10.42 90.30
4 Functional, needs repair 4792 4.92 95.22
5 Non-Functional, dry 2473 2.54 97.76
6 Functional, not in use 1775 1.82 99.58
7 Abandoned/Decommissioned 321 0.33 99.91
8 Functional but not in use 86 0.09 100.00
9 Non-Functional due to dry season 7 0.01 100.01
10 Functional but needs repair 4 0.00 100.00
replace_na(status_clean, “unknown”) : replaces all NA to unknown within status_clean variable
<- wp_sf %>%
wp_sf_nga rename(status_clean = '#status_clean') %>%
select(status_clean) %>%
mutate(status_clean = replace_na(
'unknown')) status_clean,
6.1 Extracting Water Point Data
Filter based on functional, non_functional and unknown respectively.
<- wp_sf_nga %>%
wp_functional filter(status_clean %in%
c("Functional",
"Functional but not in use",
"Functional but needs repair"))
<- wp_sf_nga %>%
wp_nonfunctional filter(status_clean %in%
c("Abandoned/Decommissioned",
"Abandoned",
"Non-Functional due to dry season",
"Non-Functional",
"Non functional due to dry season"))
<- wp_sf_nga %>%
wp_unknown filter(status_clean %in% "unknown")
freq(data = wp_functional,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 Functional 47228 99.81 99.81
2 Functional but not in use 86 0.18 99.99
3 Functional but needs repair 4 0.01 100.00
freq(data = wp_nonfunctional,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 Non-Functional 30638 98.94 98.94
2 Abandoned/Decommissioned 321 1.04 99.98
3 Non-Functional due to dry season 7 0.02 100.00
freq(data = wp_unknown,
input = 'status_clean')
status_clean frequency percentage cumulative_perc
1 unknown 10154 100 100
6.2 Performing Point-in-Polygon Count
<- NGA %>%
NGA_wp mutate(`total_wp` = lengths(
st_intersects(NGA, wp_sf_nga))) %>%
mutate(`wp_functional` = lengths(
st_intersects(NGA, wp_functional))) %>%
mutate(`wp_nonfunctional` = lengths(
st_intersects(NGA, wp_nonfunctional))) %>%
mutate(`wp_unknown` = lengths(
st_intersects(NGA, wp_unknown)))
6.3 Visualing attributes by using statistical graph
Show code
ggplot(data = NGA_wp,
aes(x = total_wp)) +
geom_histogram(bins = 20,
color = "black",
fill ="light blue") +
geom_vline(aes(xintercept=mean(total_wp, na.rm=T)),
color = "red", linetype="dashed", size = 0.8) +
xlab("No of water points") +
ylab("No. of\nLGAs") +
theme(axis.title.y = element_text(angle = 0))
Show code
write_rds(NGA_wp, "data/rds/NGA_wp.rds")