import pandas as pd
import geopandas as gpd
import xarray as xr
import contextily as ctx
Appendix C — Air quality
This notebook contains the code used to derive the air quality/pollution index.
C.1 Exploration of air quality data from Newcastle urban observatory
Downloaded from https://archive.dev.urbanobservatory.ac.uk/agg/900/ aggregated for 15 minutes blocks. We can also get coarser aggregation (hour, 6 hours, day) or even raw minute data.
Specify paths to existing files, some of which need to be downloaded manually (see Notes).
= "/Users/martin/Library/CloudStorage/OneDrive-SharedLibraries-TheAlanTuringInstitute/Daniel Arribas-Bel - demoland_data" data_folder
= (
lsoa_list_file f"{data_folder}/processed/tynewear_lsoas_list.csv" # list of LSOAs within AoI
)= f"{data_folder}/processed/authorities/"
aoi_file
# national level files
= f"{data_folder}/raw/LSOA_(Dec_2011)_Boundaries_Super_Generalised_Clipped_(BSC)_EW_V3.geojson" # administrative boundaries from gov.uk OS LSOA_boundaries_file
Load the data for CO for December to see the structure
= pd.read_csv(f"{data_folder}/raw/air_quality/2022-12-CO-900.0.csv.zip") co_dec22
Generate actual geometry
= gpd.GeoDataFrame(
co_dec22 =gpd.GeoSeries.from_wkt(co_dec22["Location (WKT)"], crs=4326)
co_dec22, geometry27700) ).to_crs(
co_dec22.head()
Sensor Name | Variable | Units | Timestamp | Mean Value | Min Value | Max Value | Median Value | Standard Dev | Count Records | Location (WKT) | Ground Height Above Sea Level | Sensor Height Above Ground | Broker Name | Third Party | Sensor Centroid Longitude | Sensor Centroid Latitude | Raw ID | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | PER_AIRMON_MESH1972150 | CO | ugm -3 | 2022-12-01 00:00:00 | 417.865689 | 406.879185 | 491.198130 | 410.974850 | 20.274449 | 15.0 | POINT (-1.617121 54.974193) | 47.290001 | 2.0 | aq_mesh_api | False | -1.617121 | 54.974193 | 79208 | POINT (424607.499 564463.931) |
1 | PER_AIRMON_MESH1907150 | CO | ugm -3 | 2022-12-01 00:00:00 | 630.707678 | 614.071515 | 678.104495 | 627.741670 | 17.546116 | 15.0 | POINT (-1.603448 54.965679) | 31.170000 | 2.0 | aq_mesh_api | False | -1.603448 | 54.965679 | 79150 | POINT (425488.062 563521.339) |
2 | PER_AIRMON_MESH1968150 | CO | ugm -3 | 2022-12-01 00:00:00 | 1.257286 | 1.196525 | 1.766735 | 1.217135 | 0.138612 | 15.0 | POINT (-1.614848 54.97149) | 42.040001 | 2.0 | aq_mesh_api | False | -1.614848 | 54.971490 | 79204 | POINT (424754.639 564163.933) |
3 | PER_AIRMON_MESH1969150 | CO | ugm -3 | 2022-12-01 00:00:00 | 343.280313 | 337.130365 | 364.159235 | 342.825595 | 6.512988 | 15.0 | POINT (-1.593537 55.039309) | 64.440002 | 2.0 | aq_mesh_api | False | -1.593537 | 55.039309 | 79205 | POINT (426075.104 571718.592) |
4 | PER_AIRMON_MONITOR1157100 | CO | ugm -3 | 2022-12-01 00:00:00 | 0.179765 | 0.179765 | 0.179765 | 0.179765 | 0.000000 | 1.0 | POINT (-1.623043 54.970093) | 45.270000 | 2.0 | aq_mesh_api | False | -1.623043 | 54.970093 | 79565 | POINT (424230.858 564005.621) |
We have classic summary stats for the CO values, each per 15 minute slot. We can get those for every month back to roughly 2016.
The same can be done for the other metrics:
- CO (this one)
- NO
- NO2
- NOx
- O3
- PM 4
- PM 1
- PM 10
- PM 2.5
- Particle count
Given we are dealing with rather static scenarios, the target indicator in the modelling process needs to be an aggregation of some sort per a longer period of time (a year?).
Further, we may not want to model each metric individually, so there may be a scope for a creation of an index of air quality combining all these into a single metric that will be then predicted. Or we can predict them all…
= gpd.read_file(aoi_file) aoi
= co_dec22.loc[~co_dec22["Location (WKT)"].duplicated()].explore(marker_type="marker")
m =m, color="black")
aoi.boundary.explore(m m
The main issue with the dataset is its coverage. As shown above, it is very limited and even if we wanted to interpolate the data we wouldn’t be able to do so south of the city. UO has other sensors there but not those for air quality.
C.2 EEA-based index
EEA specifies the relationship between concentration of different pollutants to get an index of hazard for human health (here). We can use their multipliers to create a single index based on the UK AIR data. From EEA:
The bands are based on the relative risks associated to short-term exposure to PM2.5, O3 and NO2, as defined by the World Health Organization in its report on the Health Risks of Air Pollution in Europe project (HRAPIE project report).
The relative risk of exposure to PM2.5 is taken as basis for driving the index, specifically the increase in the risk of mortality per 10 µg/m3 increase in the daily mean concentration of PM2.5.
Assuming linearity across the relative risks functions for O3 and NO2, we calculate the concentrations of these pollutants that pose an equivalent relative risk to a 10 µg/m3 increase in the daily mean of PM2.5.
For PM10 concentrations, a constant ratio between PM10 and PM2.5 of 1:2 is assumed, in line with the World Health Organization´s air quality guidelines for Europe.
For SO2, the bands reflect the limit values set under the EU Air Quality Directive.
The relationship between \(PM_{2.5}\) : \(PM_{10}\) : \(NO_{2}\) : \(O_{3}\) : \(SO_{2}\) is then equal to 1 : 2 : 4 : 5 : 10. The combined index can then be computed as \[Q_{air} = \frac{PM_{2.5}}{1} + \frac{PM_{10}}{2} + \frac{NO_{2}}{4} + \frac{O_{3}}{5} + \frac{SO_{2}}{10}\]
Except for the \(O_{3}\), UK AIR reports all as a concentration in \(\mu g \cdot m^{-3}\). \(O_{3}\) is reported as a number of days above a threshold of 120 \(\mu g \cdot m^{-3}\) and cannot be used in this formula but even EEA omits data when unavailable so we shall be able to create the index based on 4 remaning measurements.
It shall also be condsidered that UK AIR is not representing a direct measurements but a model.
C.2.1 Data
Data is available from DEFRA (https://uk-air.defra.gov.uk/data/pcm-data) as CSVs in a UK grid.
The data begins on row 6 of each file and contains 4 columns: - a unique code (ukgridcode) for each 1x1km cell in the map - the x coordinates for the centroid of each grid cell - the y coordinates for the centroid of each grid cell - the values for the metric itself.
The coordinate system is OSGB and the coordinates represent the centre of each 1x1km cell.
There are some grid squares in each data set that do not have values associated with them and are labelled as ‘MISSING’
We can use pandas to read grids to Xarray for efficient array computation of the index.
= (
pm10_21
pd.read_csv("https://uk-air.defra.gov.uk/datastore/pcm/mappm102021g.csv",
=5,
header=["MISSING"],
na_values
)"x", "y"])
.set_index([="gridcode")
.drop(columns
.to_xarray()
)= (
pm25_21
pd.read_csv("https://uk-air.defra.gov.uk/datastore/pcm/mappm252021g.csv",
=5,
header=["MISSING"],
na_values
)"x", "y"])
.set_index([="gridcode")
.drop(columns
.to_xarray()
)= (
no2_21
pd.read_csv("https://uk-air.defra.gov.uk/datastore/pcm/mapno22021.csv",
=5,
header=["MISSING"],
na_values
)"x", "y"])
.set_index([="gridcode")
.drop(columns
.to_xarray()
)= (
so2_21
pd.read_csv("https://uk-air.defra.gov.uk/datastore/pcm/mapso22021.csv",
=5,
header=["MISSING"],
na_values
)"x", "y"])
.set_index([="gridcode")
.drop(columns
.to_xarray()
)= xr.merge([pm10_21, pm25_21, no2_21, so2_21])
pollutants_2021 pollutants_2021
We can compute the index based on the formula above.
= (
aqi
pollutants_2021.pm252021g+ pollutants_2021.pm102021g / 2
+ pollutants_2021.no22021 / 4
+ pollutants_2021.so22021 / 10
)= pollutants_2021.assign(aqi=aqi) pollutants_2021
We can filter only the area of interest before converting to a GeoDataFrame.
= aoi.total_bounds
bds = pollutants_2021.sel(
pollutants_2021_tyne =slice(bds[0], bds[2]), y=slice(bds[1], bds[3])
x )
We convert the array to a GeoDataFrame with polygons representing grid area. This will be needed for areal interpolation to LSOA/MSOA.
= pollutants_2021_tyne.to_dataframe().reset_index()
pollutants_2021_tyne_df = gpd.GeoDataFrame(
pollutants_2021_tyne_df
pollutants_2021_tyne_df,=gpd.points_from_xy(
geometry=27700
pollutants_2021_tyne_df.x, pollutants_2021_tyne_df.y, crsbuffer(500, cap_style=3),
). )
We can check how the derived index looks.
"aqi") pollutants_2021_tyne_df.explore(
pollutants_2021_tyne_df.to_file(f"{data_folder}/processed/air_quality/air_quality_grid_2021.gpkg"
)