Appendix Q — Air quality and house price model inference

Each fitted model is wrapped in a custom class that handles the spatial lag computation.

import pickle
import geopandas as gpd
import libpysal

from demoland_engine.indicators import Model
data_folder = "/Users/martin/Library/CloudStorage/OneDrive-SharedLibraries-TheAlanTuringInstitute/Daniel Arribas-Bel - demoland_data"

Load the data.

data = gpd.read_parquet(f"{data_folder}/processed/interpolated/all_oa.parquet")

Keep only the explanatory variables by dropping the identifier, the geometry, and the four indicator columns.

exvars = data.drop(
    columns=[
        "geo_code",
        "geometry",
        "air_quality_index",
        "house_price_index",
        "jobs_accessibility_index",
        "greenspace_accessibility_index",
    ]
)

Q.1 Air quality

Load the fitted scikit-learn model.

with open(f"{data_folder}/models/air_quality_model.pickle", "rb") as f:
    air_quality = pickle.load(f)

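Create spatial weights: the union of Queen contiguity and a distance band with a threshold of 2000 (in the units of the data's CRS), row-standardised.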

queen = libpysal.weights.Queen.from_dataframe(data)
_2k = libpysal.weights.DistanceBand.from_dataframe(data, 2000)
W = libpysal.weights.w_union(queen, _2k)
W.transform = "r"
UserWarning: The weights matrix is not fully connected: There are 3 disconnected components.
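
The effect of combining two neighbour definitions and row-standardising the result can be illustrated without libpysal. The following is a minimal sketch in plain Python, assuming binary neighbour sets; the area ids, neighbour sets and values are made up for illustration, and the helper functions are hypothetical rather than part of any library. With row-standardised binary weights, the spatial lag of a variable is simply the mean of that variable over an area's neighbours.

```python
# Toy neighbour sets keyed by area id (hypothetical data, not the study areas).
contiguity = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}, "d": set()}
within_band = {"a": {"c"}, "b": set(), "c": {"a", "d"}, "d": {"c"}}


def union_neighbours(first, second):
    """Merge two neighbour dictionaries key by key."""
    keys = first.keys() | second.keys()
    return {k: first.get(k, set()) | second.get(k, set()) for k in keys}


def row_standardise(neighbours):
    """Give each neighbour of an area an equal weight so every row sums to 1."""
    return {
        area: {n: 1.0 / len(nbrs) for n in nbrs}
        for area, nbrs in neighbours.items()
    }


def spatial_lag(weights, values):
    """Weighted sum of neighbouring values for every area."""
    return {
        area: sum(w * values[n] for n, w in row.items())
        for area, row in weights.items()
    }


combined = union_neighbours(contiguity, within_band)
weights = row_standardise(combined)
values = {"a": 10.0, "b": 20.0, "c": 30.0, "d": 40.0}
lag = spatial_lag(weights, values)
```

Area "d" has no contiguity neighbours but gains one from the distance band, which is one reason to take the union of the two definitions.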

Create the predictor object.

aqm = Model(W, air_quality)
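
The implementation of Model is not shown in this appendix. As a rough illustration of what a lag-aware wrapper could look like, here is a minimal sketch in plain Python. It assumes that the wrapped estimator expects each area's own variables followed by the spatial lag of those variables; that layout, the class name LagPredictorSketch and the toy SumModel are all hypothetical and are not taken from demoland_engine.

```python
class LagPredictorSketch:
    """Hypothetical stand-in for a lag-aware wrapper; not the demoland_engine code."""

    def __init__(self, weights, model):
        self.weights = weights  # row-standardised {area: {neighbour: weight}}
        self.model = model      # any object exposing predict(rows)

    def predict(self, features):
        """features maps area -> list of variables; returns area -> prediction."""
        areas = list(features)
        rows = []
        for area in areas:
            own = features[area]
            # Spatial lag of every variable: weighted sum over the neighbours.
            lagged = [
                sum(w * features[n][k] for n, w in self.weights[area].items())
                for k in range(len(own))
            ]
            # Assumed layout: own variables first, then their lags.
            rows.append(own + lagged)
        return dict(zip(areas, self.model.predict(rows)))


class SumModel:
    """Toy estimator used only for this sketch: predicts the sum of each row."""

    def predict(self, rows):
        return [sum(r) for r in rows]


weights = {"a": {"b": 1.0}, "b": {"a": 0.5, "c": 0.5}, "c": {"b": 1.0}}
features = {"a": [1.0], "b": [2.0], "c": [4.0]}
predictor = LagPredictorSketch(weights, SumModel())
predictions = predictor.predict(features)
```

Storing the weights inside the object means a caller only has to supply the explanatory variables; the lag is recomputed on every call, so a change in one area also feeds into the predictions for its neighbours.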

Save the predictor object to a pickle.

with open(f"{data_folder}/models/air_quality_predictor.pickle", "wb") as f:
    pickle.dump(aqm, f)

Q.1.1 England-wide model

Load the fitted scikit-learn model.

with open(f"{data_folder}/models/air_quality_model_nc_urbanities.pickle", "rb") as f:
    air_quality = pickle.load(f)

Create spatial weights: Queen contiguity extended to the fifth order, lower orders included, row-standardised.

queen = libpysal.weights.Queen.from_dataframe(data)
W = libpysal.weights.higher_order(queen, k=5, lower_order=True, silence_warnings=True)
W.transform = "r"
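
With lower_order=True, a higher-order contiguity of order k treats every area reachable in at most k contiguity steps as a neighbour. The idea can be sketched as a breadth-first search over first-order neighbours; the function neighbours_within and the seven-area chain below are hypothetical and only illustrate the concept, not how libpysal computes it internally.

```python
from collections import deque


def neighbours_within(neighbours, start, k):
    """Return every area reachable from `start` in 1..k contiguity steps."""
    seen = {start}
    found = set()
    queue = deque([(start, 0)])
    while queue:
        area, depth = queue.popleft()
        if depth == k:
            continue  # do not expand beyond order k
        for n in neighbours[area]:
            if n not in seen:
                seen.add(n)
                found.add(n)
                queue.append((n, depth + 1))
    return found


# Toy chain of seven areas: 0 - 1 - 2 - 3 - 4 - 5 - 6
chain = {i: {j for j in (i - 1, i + 1) if 0 <= j <= 6} for i in range(7)}

order_two = neighbours_within(chain, 0, 2)
order_five = neighbours_within(chain, 0, 5)
from_middle = neighbours_within(chain, 3, 5)
```

In the chain, area 0 reaches areas 1 to 5 at order five, while area 3 in the middle reaches every other area.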

Create the predictor object.

aqm = Model(W, air_quality)

Save the predictor object to a pickle.

with open(
    f"{data_folder}/models/air_quality_predictor_nc_urbanities.pickle", "wb"
) as f:
    pickle.dump(aqm, f)

Q.2 House price

Load the fitted scikit-learn model.

with open(f"{data_folder}/models/house_price_model.pickle", "rb") as f:
    house_price = pickle.load(f)

Create spatial weights from the Queen contiguity built above: fifth order, lower orders included, row-standardised.

q5 = libpysal.weights.higher_order(queen, k=5, lower_order=True)
q5.transform = "r"

Create the wrapper object that computes the lag.

hpm = Model(q5, house_price)

Save the predictor object to a pickle.

with open(f"{data_folder}/models/house_price_predictor.pickle", "wb") as f:
    pickle.dump(hpm, f)

Q.2.1 England-wide model

Load the fitted scikit-learn model.

with open(
    f"{data_folder}/models/house_price_model_england_no_london.pickle", "rb"
) as f:
    house_price = pickle.load(f)

Create the wrapper object that computes the lag, reusing the fifth-order Queen weights W from Q.1.1.

hpm = Model(W, house_price)

Save the predictor object to a pickle.

with open(
    f"{data_folder}/models/house_price_predictor_england_no_london.pickle", "wb"
) as f:
    pickle.dump(hpm, f)

Q.3 Using the class for prediction

To use the class for prediction, load the pickle and call predict on a data frame with explanatory variables (either default or reflecting a scenario).

with open(f"{data_folder}/models/air_quality_predictor.pickle", "rb") as f:
    aqm2 = pickle.load(f)
aqm2.predict(exvars)
array([17.19278662, 16.43954378, 17.48423016, ..., 16.7559517 ,
       12.60627689, 17.31309272])

The house price model is used in exactly the same way.
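
A scenario can be compared against the default explanatory variables by predicting on both and taking the difference. The sketch below shows that pattern with a placeholder: StandInPredictor is a hypothetical class standing in for an unpickled predictor, and the rows of numbers are invented, so only the workflow carries over, not the values.

```python
import copy


class StandInPredictor:
    """Hypothetical placeholder for an unpickled predictor; averages each row."""

    def predict(self, rows):
        return [sum(r) / len(r) for r in rows]


# One row of explanatory variables per area (made-up numbers).
baseline = [[1.0, 3.0], [2.0, 4.0], [5.0, 7.0]]

# Copy before editing so the default variables stay untouched.
scenario = copy.deepcopy(baseline)
scenario[1][0] = 6.0  # hypothetical intervention in the second area

predictor = StandInPredictor()
change = [
    s - b
    for s, b in zip(predictor.predict(scenario), predictor.predict(baseline))
]
```

The placeholder ignores spatial lags, so only the edited area changes here. A lag-aware predictor would presumably also shift the predictions of neighbouring areas.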