Appendix F — Sentinel 2 latent representation

Process the data from the paper at https://doi.org/10.1016/j.compenvurbsys.2022.101802 so it can be used in modelling: aggregate the latent representation of Sentinel 2 imagery to output areas (OAs).

import geopandas as gpd
import tobler
data_folder = "/Users/martin/Library/CloudStorage/OneDrive-SharedLibraries-TheAlanTuringInstitute/Daniel Arribas-Bel - demoland_data"

Load the OA geometries and the smoothed latent vectors of the Sentinel 2 samples.

oa = gpd.read_parquet(
    f"{data_folder}/processed/interpolated/all_oa.parquet",
    columns=["geo_code", "geometry"],
)
postcodes = gpd.read_parquet(
    f"{data_folder}/raw/sentinel_latent/latent_smoothed.parquet"
)

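Recreate the footprint of the original samples. Buffering each sample point by 80 m with square caps (cap_style=3) turns it into a 160 m × 160 m square; the data are in EPSG:27700, so distances are in metres.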

postcodes.geometry = postcodes.buffer(80, cap_style=3)
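A minimal sketch of what the buffer does to a single sample, assuming the samples are points in a metric CRS such as EPSG:27700. The coordinates are made up for illustration. With cap_style=3 (the square cap) an 80 m buffer gives an axis-aligned square with 160 m sides and an area of 25,600 m², rather than a disc.

```python
from shapely.geometry import Point

# Hypothetical sample location in British National Grid metres (EPSG:27700)
sample = Point(530000, 180000)

# cap_style=3 is the "square" cap: the point becomes an axis-aligned square
# whose half-width equals the buffer distance
square = sample.buffer(80, cap_style=3)

minx, miny, maxx, maxy = square.bounds
print(maxx - minx, maxy - miny)  # side lengths in metres
print(square.area)               # area in square metres
```

The square is centred on the original point, so each sample keeps its location while gaining an area that can intersect OA polygons.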

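Keep only the samples that fall within the bounding box of the OAs, then spatially join them to the OAs. Each sample is matched to every OA its square intersects, and samples matching no OA are dropped.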

xmin, ymin, xmax, ymax = oa.total_bounds
postcodes_aoi = postcodes.cx[xmin:xmax, ymin:ymax]
postcodes_aoi = postcodes_aoi.sjoin(oa, how="inner")
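To illustrate the filter and the join, here is a toy sketch with two hypothetical adjacent OAs (codes "A" and "B") and three hypothetical sample points, one far outside the OAs. It assumes geopandas and shapely are installed. The default sjoin predicate is "intersects", so a square straddling the OA boundary is matched to both OAs and appears twice in the result, while the distant sample is removed by .cx before the join.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical adjacent OAs, each 200 m wide, sharing the edge x = 200
oa_toy = gpd.GeoDataFrame(
    {"geo_code": ["A", "B"]},
    geometry=[box(0, 0, 200, 200), box(200, 0, 400, 200)],
    crs="EPSG:27700",
)

# Three hypothetical samples: inside A, near the A/B edge, and far away
samples = gpd.GeoDataFrame(
    {"latent_0": [1.0, 3.0, 9.0]},
    geometry=[Point(100, 100), Point(210, 100), Point(5000, 5000)],
    crs="EPSG:27700",
)
samples.geometry = samples.buffer(80, cap_style=3)

# Bounding-box filter, then an inner spatial join (predicate "intersects")
xmin, ymin, xmax, ymax = oa_toy.total_bounds
samples_aoi = samples.cx[xmin:xmax, ymin:ymax]
joined = samples_aoi.sjoin(oa_toy, how="inner")

# Sample 1 appears twice: once for OA "A" and once for OA "B"
print(joined[["geo_code", "index_right"]])
```

The index of the result is the sample index, so duplicated labels flag samples that were counted towards more than one OA.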

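Average the latent vectors of all samples matched to each OA, giving one 64-dimensional vector per OA. The index_right column holds the index of the matching row in oa.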

latent_oa = (
    postcodes_aoi.drop(columns=["geo_code", "geometry"]).groupby("index_right").mean()
)
latent_oa
0 1 2 3 4 5 6 7 8 9 ... 54 55 56 57 58 59 60 61 62 63
index_right
0 -0.682062 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.451931 -0.211763 0.387425 0.928424 ... -1.021918 -1.021918 0.440444 1.743971 0.320842 1.474930 1.151639 -0.132747 1.498061 0.871063
1 -0.663786 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.349480 -0.311341 0.504530 0.902771 ... -1.021918 -1.021918 0.577340 1.883114 0.293000 1.414315 1.145569 -0.093327 1.454991 0.880657
2 -0.657851 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.414436 -0.291362 0.512188 0.857220 ... -1.021918 -1.021918 0.562472 1.878608 0.301625 1.483689 1.187823 -0.086438 1.513144 0.918716
3 -0.666984 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.427887 -0.262477 0.397210 0.949204 ... -1.021918 -1.021918 0.464777 1.763278 0.334995 1.434697 1.128427 -0.116929 1.466723 0.861476
4 -0.679580 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.461694 -0.156182 0.458418 0.925219 ... -1.021918 -1.021918 0.490356 1.797309 0.374750 1.505300 1.117723 -0.139840 1.487528 0.887797
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3790 -0.646544 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.448320 -0.247494 0.457923 0.925764 ... -1.021918 -1.021918 0.596297 1.882795 0.314640 1.466138 1.180296 -0.069748 1.481510 0.899854
3791 -0.641375 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.533849 -0.143489 0.441434 0.966423 ... -1.021918 -1.021918 0.556728 1.898314 0.299499 1.554656 1.178569 -0.087546 1.507623 0.944229
3792 -0.642435 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.508919 -0.193876 0.475717 0.867998 ... -1.021918 -1.021918 0.583808 1.872691 0.381800 1.518266 1.206397 -0.212747 1.641260 0.999635
3793 -0.626392 -1.021918 -1.021918 -1.021918 -1.021918 -1.021918 0.570091 -0.141022 0.387859 0.993209 ... -1.021918 -1.021918 0.558174 1.881344 0.309562 1.597636 1.203249 -0.086112 1.549585 0.941391
3794 -0.630401 -1.021918 -1.021918 -1.021909 -1.021918 -1.021918 0.507333 -0.214628 0.472556 0.940042 ... -1.021918 -1.021918 0.606770 1.923926 0.276992 1.529824 1.194195 -0.077461 1.518514 0.937542

3795 rows × 64 columns
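The averaging is an unweighted mean: every sample matched to an OA counts equally, however much of its square lies inside that OA. A toy sketch with two hypothetical latent columns ("0" and "1") and made-up values reproduces the step; the second sample intersects both OAs, so it contributes to both means.

```python
import pandas as pd

# Hypothetical result of the spatial join: one row per (sample, OA) match.
# Sample 1 intersects both OAs, so it appears twice with the same latent values.
joined = pd.DataFrame(
    {
        "0": [1.0, 3.0, 3.0],
        "1": [0.0, 2.0, 2.0],
        "geo_code": ["A", "A", "B"],
        "index_right": [0, 0, 1],
    },
    index=[0, 1, 1],
)

# Drop the non-numeric column, then average each latent dimension per OA
latent = joined.drop(columns=["geo_code"]).groupby("index_right").mean()
print(latent)
```

OA 0 gets the mean of samples 0 and 1 (2.0 and 1.0), while OA 1 only receives sample 1 (3.0 and 2.0).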

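Attach the OA geometries to the mean vectors and save the result.

# index_right holds the index labels of oa, so set_geometry aligns each
# mean vector with the geometry of its OA
latent_oa = latent_oa.set_geometry(oa.geometry)
latent_oa.to_parquet(f"{data_folder}/processed/sentinel/latent_oa.parquet")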
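Because the join uses how="inner", an OA that no sample square touches would be absent from latent_oa. A minimal sketch of a coverage check, using a hypothetical helper missing_oas and made-up indices; on the real data one would pass oa.index and latent_oa.index.

```python
import pandas as pd


def missing_oas(oa_index: pd.Index, latent_index: pd.Index) -> pd.Index:
    """Return the OA index labels that received no latent vector."""
    return oa_index.difference(latent_index)


# Hypothetical example: four OAs, only three of which were matched by a sample
oa_index = pd.RangeIndex(4)
latent_index = pd.Index([0, 1, 3], name="index_right")
print(list(missing_oas(oa_index, latent_index)))  # [2]
```

An empty result means every OA has a latent vector and a geometry after set_geometry.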