The synoptReg package implements two matrix modes for performing a synoptic classification. One of them is the S-mode.

Introduction to the S-mode classification method

The S-mode classification is the one in which the grid points (lon-lat) are the variables (columns) and the days are the observations (rows). Applying a PCA to this type of matrix establishes the linear relationships between the time series of all the grid points. The scores obtained in this type of classification therefore express the degree of representativeness of each day for each of the principal components (PCs). However, the scores do not allow us to obtain the circulation types (also known as “weather types”) directly, since the same day can be represented by several PCs. For example, 2018/01/23 may have a certain degree of representation in PC1, PC3 and PC6. For this reason, we use the scores (degrees of representation) as coordinates in a multivariate space and apply a clustering method that groups together the days that are most similar to each other. In addition, a VARIMAX rotation is applied to the retained PCs in order to redistribute their explained variance. After rotation, the variance of the PCs will probably no longer follow a decreasing order; for example, PC1 is not necessarily the one that accumulates the highest explained variance. Note that this rotation is not applied in the pca_decision() function.
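
The mechanics above can be sketched in a few lines of base R on toy data (this is an illustration, not the synoptReg internals; the matrix dimensions and the number of retained PCs are hypothetical):

```r
# S-mode sketch: days as rows, grid points as columns.
set.seed(1)
x <- matrix(rnorm(100 * 12), nrow = 100)    # 100 days x 12 grid points
pca <- prcomp(x, scale. = TRUE)             # PCA on the standardized matrix
ncomp <- 3                                  # hypothetical number of retained PCs
rot <- varimax(pca$rotation[, 1:ncomp])     # stats::varimax redistributes variance
scores <- scale(x) %*% rot$loadings         # rotated scores: one value per day and PC
dim(scores)                                 # one row per day, one column per rotated PC
```

Each row of `scores` is a day's coordinates in the reduced multivariate space, which is exactly what the clustering step operates on.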

The clustering method implemented in the synoptReg package is the one proposed by Esteban et al. (2005), based on extreme scores. Esteban et al. suggest that the extreme scores (positive and negative) of each PC (> 2 or < -2) can be used as the multivariate coordinates of the cluster centers when applying the k-means algorithm. Using both the positive and the negative scores splits each component into two potential groups (provided that these extreme scores are actually reached). So, if we retain 4 PCs, our final classification can contain up to 8 weather types/circulation types.
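
As a rough base-R illustration of this seeding strategy (toy data; not the synoptclas() internals), the +2/-2 extreme-score coordinates of 4 retained PCs define up to 8 candidate centers for kmeans():

```r
# Extreme-score seeding sketch (Esteban et al., 2005 idea, toy data).
set.seed(1)
scores <- matrix(rnorm(365 * 4), ncol = 4)   # 365 days x 4 rotated PCs
centers <- rbind(diag(2, 4), diag(-2, 4))    # +2 and -2 on each PC: 8 seed centers
km <- kmeans(scores, centers = centers)      # each day assigned to a weather type
table(km$cluster)                            # days per weather type
```

In the real classification the seeds come from the observed extreme scores, so a center (and hence a weather type) only exists if its extreme scores are actually reached.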

Synoptic classification procedure

In this example we will use the data previously downloaded with the download_ncep() function. Specifically, we will use the mslp and z500 objects included in the package, referring to the mean sea level pressure and 500 hPa geopotential height variables, respectively. The atmospheric variables cover the period from 2000-01-01 to 2002-12-31, a total of 3 years.

library(synoptReg)
#> 
#> ****
#> Welcome to synoptReg! 
#> Using synoptReg for research publication?  Please cite it!
#> I'm an early career researcher and every citation matters.
#> ****
# check ?download_ncep() to see how to download reanalysis data.
data(mslp)
data(z500)

The first step before executing the synoptic classification is to run our data through the tidy_nc() function, which joins the two atmospheric variables so that the classification can be executed later. It is also possible to subset the time period and the geographical extent. In addition, regardless of the arguments specified, this function computes the anomaly of each atmospheric variable. The anomaly is computed with respect to the monthly average. If we subset the time period using the time_subset argument, the anomaly will be computed over that subset period.
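
The monthly anomaly described above can be sketched in base R on synthetic data (an illustration of the idea, not the tidy_nc() internals):

```r
# Monthly anomaly sketch: value minus the mean of its calendar month,
# per grid point (here a single toy grid point over one year).
set.seed(1)
df <- data.frame(
  lon = 0, lat = 40,
  time = seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day"),
  value = rnorm(366, mean = 101300, sd = 500)   # toy mslp in Pa
)
mon <- format(df$time, "%m")
df$anom_value <- df$value - ave(df$value, df$lon, df$lat, mon)
round(mean(df$anom_value), 10)   # anomalies average to zero within each month
```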

In any case, for now we will only group our atmospheric variables. The variables must be supplied as a list.

vars <- tidy_nc(x = list(mslp,z500), name_vars = c("mslp","z500"))
vars
#> # A tibble: 484,432 x 6
#>    var     lon   lat time         value anom_value
#>    <chr> <dbl> <dbl> <date>       <dbl>      <dbl>
#>  1 mslp  -10      60 2000-01-01 100010      -380. 
#>  2 mslp   -7.5    60 2000-01-01 100292.     -186. 
#>  3 mslp   -5      60 2000-01-01 100508.      -48.1
#>  4 mslp   -2.5    60 2000-01-01 100675        43.8
#>  5 mslp    0      60 2000-01-01 100810       103. 
#>  6 mslp    2.5    60 2000-01-01 100915       139. 
#>  7 mslp    5      60 2000-01-01 101000       146. 
#>  8 mslp    7.5    60 2000-01-01 101115       138. 
#>  9 mslp   10      60 2000-01-01 101260       177. 
#> 10 mslp   12.5    60 2000-01-01 101365       294. 
#> # ... with 484,422 more rows

Once the two variables are joined in the same tibble, we can explore our data set to decide how many components to retain. For this decision we will use the pca_decision() function. It is important to request a generous number of components, since what we want to know is how many components explain most of the variance in our data. Also, since we are working with 2 variables that have different units of measurement, it is important to normalize/standardize them.
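
As a quick base-R illustration of why standardization matters when mixing units (toy data; the column names mslp/z500 are just labels here):

```r
# Without scaling, the variable with the larger numeric magnitude
# dominates PC1; with scale. = TRUE each column is standardized first.
set.seed(1)
x <- cbind(mslp = rnorm(50) * 1e4,   # large-magnitude variable (e.g. Pa)
           z500 = rnorm(50))         # small-magnitude variable
v_raw    <- summary(prcomp(x))$importance["Proportion of Variance", 1]
v_scaled <- summary(prcomp(x, scale. = TRUE))$importance["Proportion of Variance", 1]
c(raw = v_raw, scaled = v_scaled)    # unscaled PC1 is almost entirely mslp
```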

info <- pca_decision(vars,ncomp = 30,norm = T,matrix_mode = "S-mode")
info$screeplot #scree test

info$summary[,1:10] # first 10 PCs
#>                        1          2         3          4          5         6
#> sdev          12.0054706 10.0443840 8.0009129 6.62004843 4.66172083 3.6935655
#> prop.variance  0.3263868  0.2284656 0.1449617 0.09924223 0.04921151 0.0308934
#> cum.variance   0.3263868  0.5548523 0.6998140 0.79905626 0.84826777 0.8791612
#>                        7          8          9          10
#> sdev          3.08389484 2.36012591 2.29847366 2.092658247
#> prop.variance 0.02153641 0.01261376 0.01196336 0.009916782
#> cum.variance  0.90069759 0.91331135 0.92527471 0.935191490

After taking a look at the scree test, it seems reasonable to retain 5 PCs, which could yield up to 10 weather types (WTs). These 5 PCs account for around 85% of the total variance. Now we can compute the synoptic classification:

cl <- synoptclas(vars,ncomp = 5,norm = T,matrix_mode = "S-mode")
#> Warning: The `x` argument of `as_tibble.matrix()` must have unique column names if `.name_repair` is omitted as of tibble 2.0.0.
#> Using compatibility `.name_repair`.
cl # list object containing the classification and the classified grid
#> $clas
#> # A tibble: 1,096 x 2
#>    time          WT
#>    <date>     <int>
#>  1 2000-01-01     6
#>  2 2000-01-02     6
#>  3 2000-01-03     9
#>  4 2000-01-04     9
#>  5 2000-01-05     9
#>  6 2000-01-06     9
#>  7 2000-01-07     9
#>  8 2000-01-08     9
#>  9 2000-01-09     6
#> 10 2000-01-10     5
#> # ... with 1,086 more rows
#> 
#> $grid_clas
#> # A tibble: 3,978 x 9
#>    var     lon   lat time          WT mean_WT_value mean_WT_anom_val~ pval_ttest
#>    <chr> <dbl> <dbl> <date>     <int>         <dbl>             <dbl>      <dbl>
#>  1 mslp  -10      60 2000-01-01     6       100625.              64.3   6.20e- 1
#>  2 mslp   -7.5    60 2000-01-01     6       100838.             215.    8.74e- 2
#>  3 mslp   -5      60 2000-01-01     6       101048.             373.    2.42e- 3
#>  4 mslp   -2.5    60 2000-01-01     6       101265.             535.    1.05e- 5
#>  5 mslp    0      60 2000-01-01     6       101490.             694.    9.92e- 9
#>  6 mslp    2.5    60 2000-01-01     6       101706.             840.    4.12e-12
#>  7 mslp    5      60 2000-01-01     6       101914.             958.    2.54e-15
#>  8 mslp    7.5    60 2000-01-01     6       102129.            1040.    1.41e-17
#>  9 mslp   10      60 2000-01-01     6       102287.            1085.    6.45e-19
#> 10 mslp   12.5    60 2000-01-01     6       102318.            1112.    1.78e-20
#> # ... with 3,968 more rows, and 1 more variable: cv_WT_value <dbl>

The object returned by synoptclas() is a list that contains a tibble with each date and its corresponding WT, and another tibble with a grid of the average values of the atmospheric variables for each WT. With this last result we can visualize the synoptic patterns:

library(ggplot2) 
library(dplyr)
library(rnaturalearth)
library(metR)

borders <- ne_countries(continent = c("europe","africa"),
                                       returnclass = "sf")

ggplot()+
    geom_contour_fill(filter(cl$grid_clas, var == "z500"), 
              mapping = aes(lon,lat,z = mean_WT_anom_value),
              binwidth = 20)+
    stat_subset(data = filter(cl$grid_clas,var == "z500"),
              aes(lon,lat, subset = var == "z500" & pval_ttest  <0.01), # skip = 1
              size = 0.2)+
    geom_sf(data = borders, fill = "transparent")+
    geom_contour2(data = filter(cl$grid_clas,var == "mslp"),
                  aes(x=lon,y=lat,z=mean_WT_value/100),
                  binwidth = 4, color = "black") +
    geom_text_contour(data= filter(cl$grid_clas, var == "mslp"),
                      aes(x=lon,y=lat,z=mean_WT_value/100), 
                      stroke = 0.15,binwidth = 4) +
    guides(fill = guide_colourbar(barwidth = 9, barheight = 0.5))+
    facet_wrap(~WT, ncol = 5) +
    scale_fill_divergent(name = "Z anomaly (m)") +
    scale_x_continuous(limits = c(-10,30), expand = c(0, 0))+
    scale_y_continuous(limits = c(30,60), expand = c(0,0))+
    theme_bw() +
    theme(
      panel.grid.major = element_blank(), 
      panel.grid.minor = element_blank(),
      panel.background = element_blank(),
      text = element_text(size = 10),
      strip.background = element_rect(fill = "transparent", color = NA),
      axis.title = element_blank(),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      legend.position = "bottom")

From circulation types to environment

synoptReg has a function called ct2env() that allows us to establish the link between the synoptic classification created previously and any environmental variable. In this example we use daily precipitation data from the Balearic Islands (Spain) at a resolution of 5x5 km.

data(pcp)
pcp
#> # A tibble: 847,798 x 4
#>       lon     lat time       value
#>     <dbl>   <dbl> <date>     <int>
#>  1 865672 4313822 2000-01-01     0
#>  2 870672 4313822 2000-01-01     0
#>  3 870672 4318822 2000-01-01     0
#>  4 875672 4313822 2000-01-01     0
#>  5 875672 4318822 2000-01-01     0
#>  6 875672 4323822 2000-01-01     0
#>  7 875672 4328822 2000-01-01     0
#>  8 880672 4308822 2000-01-01     0
#>  9 880672 4313822 2000-01-01     0
#> 10 880672 4318822 2000-01-01     0
#> # ... with 847,788 more rows

As you can observe, the structure of the tibble is always the same (variables: lon, lat, time, value) and must be respected so that all the functions of the package work correctly. How do we compute the interaction between the WTs and precipitation?

ct2pcp <- ct2env(pcp,clas = cl$clas,fun = mean,out = "data.frame")
ct2pcp
#> # A tibble: 1,899 x 4
#>       lon     lat    WT  calc
#>     <dbl>   <dbl> <int> <dbl>
#>  1 865672 4313822     6 15.1 
#>  2 870672 4313822     6 10.9 
#>  3 870672 4318822     6 14.7 
#>  4 875672 4313822     6  8.05
#>  5 875672 4318822     6 11.4 
#>  6 875672 4323822     6 14.4 
#>  7 875672 4328822     6 19.4 
#>  8 880672 4308822     6  7.83
#>  9 880672 4313822     6  8.22
#> 10 880672 4318822     6 10.5 
#> # ... with 1,889 more rows

The result is a tibble containing the variables lon, lat, WT and calc. The calc column holds the average precipitation for each grid point and WT.
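
The operation behind this result can be sketched in base R on a tiny made-up grid (an assumed simplification of what ct2env() does with fun = mean, not its actual code):

```r
# Join each day's precipitation to its WT, then average per grid point and WT.
env <- data.frame(
  lon = rep(c(0, 1), each = 4), lat = 40,
  time = rep(as.Date("2000-01-01") + 0:3, times = 2),
  value = c(0, 10, 2, 8, 1, 9, 3, 7)          # toy daily precipitation
)
clas <- data.frame(time = as.Date("2000-01-01") + 0:3, WT = c(1, 1, 2, 2))
joined <- merge(env, clas, by = "time")        # attach the WT of each day
ct_means <- aggregate(value ~ lon + lat + WT, data = joined, FUN = mean)
ct_means                                       # one row per grid point and WT
```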

To visualize the result:

ggplot() + 
  geom_tile(data=ct2pcp, aes(x=lon, y=lat,fill=calc/10)) + # original data stored as mm*10
  scale_fill_gradientn(colours = rev(pals::linearl(100)), 
  name = "rainfall (mm)",na.value="transparent") + 
  coord_equal()+
  scale_colour_gradient(guide = 'none') + facet_wrap(~WT, ncol = 4) +
  guides(fill = guide_colourbar(barwidth = 9, barheight = 0.5))+
    theme_bw() +
    theme(
      panel.grid.major = element_blank(), 
      panel.grid.minor = element_blank(),
      panel.background = element_blank(),
      text = element_text(size = 10),
      strip.background = element_rect(fill = "transparent", color = NA),
      axis.title = element_blank(),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      legend.position = "bottom")

Creating categorized regions: the last step

Finally, it may be interesting to classify our environmental variable into different regions, each with different characteristics based on the precipitation linked to each WT. To carry out this categorization process, we need the ct2env() output in raster format:

ct2pcp_r <- ct2env(pcp,clas = cl$clas,fun = mean,out = "raster")
ct2pcp_r
#> class      : RasterBrick 
#> dimensions : 35, 53, 1855, 9  (nrow, ncol, ncell, nlayers)
#> resolution : 5000, 5000  (x, y)
#> extent     : 863172, 1128172, 4286322, 4461322  (xmin, xmax, ymin, ymax)
#> crs        : NA 
#> source     : memory
#> names      :        X1,        X2,        X3,        X4,        X5,        X6,        X7,        X8,        X9 
#> min values :  3.492537,  5.072340, 20.135338,  2.532468, 20.445652,  7.546512,  2.378049,  1.329545,  1.493506 
#> max values :  39.08955,  25.92340,  95.32331,  23.29870, 182.06522,  52.48837,  14.07317,  26.28409,  19.89610

Now it is a matter of extracting the main spatial patterns of precipitation linked to WTs. For this purpose, we will use the raster_pca() function.

ct2pcp_r_pca <- raster_pca(ct2pcp_r)
raster::plot(ct2pcp_r_pca$rasterPCA)

ct2pcp_r_pca$summary
#>                  Comp.1    Comp.2     Comp.3     Comp.4     Comp.5     Comp.6
#> sdev          2.3172708 1.1168311 0.88643901 0.75487256 0.61850549 0.47118991
#> prop.variance 0.5966382 0.1385902 0.08730824 0.06331473 0.04250545 0.02466888
#> cum.variance  0.5966382 0.7352284 0.82253667 0.88585140 0.92835685 0.95302573
#>                   Comp.7     Comp.8     Comp.9
#> sdev          0.42711209 0.36097577 0.33172308
#> prop.variance 0.02026942 0.01447817 0.01222669
#> cum.variance  0.97329514 0.98777331 1.00000000

The first 4 PCs accumulate almost 90% of the variance of our original data. Thus, we will use these 4 PCs to run our regionalization by means of the k-means algorithm implemented in the regionalization() function. For more details on how the regionalization is performed, please read Lemus-Canovas et al. (2019).
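
Conceptually, the regionalization amounts to running k-means on the per-cell scores of the retained PCs; a minimal base-R sketch on random stand-in data (not the regionalization() internals):

```r
# Each raster cell is described by its scores on the first 4 PCs;
# k-means then assigns every cell to one of 4 regions.
set.seed(1)
pc_scores <- matrix(rnorm(1855 * 4), ncol = 4)   # one row per raster cell
km <- kmeans(pc_scores, centers = 4, nstart = 25)
table(km$cluster)                                # cells per region
```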

# Regionalization
ct2pcp_reg <- regionalization(ct2pcp_r_pca$rasterPCA[[1:4]],centers = 4) 

precp_region_df <- raster::as.data.frame(ct2pcp_reg$regionalization, 
                                 xy = T)
# Visualization:

labels <- c("Region 1","Region 2",
           "Region 3", "Region 4")
colours <- c("mediumseagreen","darkgoldenrod1",
              "khaki1", "dodgerblue3")

ggplot() + 
  geom_tile(data=precp_region_df, aes(x=x, y=y,fill=layer)) +
  scale_fill_gradientn(colours = colours,
                       guide = "legend", breaks = c(1,2,3,4),
                       na.value = alpha("white", alpha = 0.0),
                       labels = labels, name= "")+
  theme_bw() +
  coord_equal()+
  theme(axis.text = element_blank(),axis.title = element_blank(),
        legend.position = "bottom", axis.ticks = element_blank())

To be more rigorous, we should run regionalization() iteratively and evaluate the clustering error using the pseudoMAE metric included in its output.

Method references

Synoptic classification method:

  • Esteban, P., Jones, P.D., Martín-Vide, J., Mases, M., 2005. Atmospheric circulation patterns related to heavy snowfall days in Andorra, Pyrenees. Int. J. Climatol. 25, 319–329. https://doi.org/10.1002/joc.1103

Clustering method using the precipitation patterns linked to the weather types:

  • Lemus-Canovas, M., Lopez-Bustins, J.A., Trapero, L., Martin-Vide, J., 2019. Combining circulation weather types and daily precipitation modelling to derive climatic precipitation regions in the Pyrenees. Atmos. Res. 220, 181–193. https://doi.org/10.1016/J.ATMOSRES.2019.01.018

  • Lemus-Canovas, M., Lopez-Bustins, J.A., Martin-Vide, J., Royé, D., 2019. synoptReg: An R package for computing a synoptic climate classification and a spatial regionalization of environmental data. Environ. Model. Softw. 118, 114–119. https://doi.org/10.1016/J.ENVSOFT.2019.04.006