Getting started with pfocal • pfocal

library(pfocal)

Introduction

Moving window calculations are common in many fields, from image processing to raster data manipulation. This package provides functions to create and apply discretized convolution kernels to rectangular data (matrix, raster). The calculations are implemented in parallel to reduce processing time, providing a fast alternative to existing single thread implementations of focal moving window methods.

A simple example

Let’s start with a simple 1000 by 1000 square matrix.

size <- 1000
data <- matrix(nrow = size, ncol = size, 
               data = runif(n = size*size, min = 0, max = 10))
image(data, asp = 1)

The package contains two types of functions: kernel generation functions and kernel application functions. With the package’s kernel generation functions, it is easy to create distance, exponential, circular as well as Gaussian discretized kernels, returned in matrix form. Here we generate a Chebyshev distance kernel and show its shape.

kernel <- chebyshev_distance_kernel(10)
image(kernel, asp = 1)

The second type of function in the package, kernel applications functions, apply a kernel to a matrix, raster object or terra object (although the most efficient approach is to pass a matrix to the function). The main application function is pfocal.

convoluted <- pfocal(data = data, kernel = kernel, edge_value = 0)
image(convoluted, asp = 1)

Convolution functions

The pfocal function is not as flexible as other existing focal calculation functions which can accommodate any functional treatment of the values covered by the kernel (for example raster::focal can take on any R function). However, it is optimized for a few common combinations of “transform” and “reduce” functions. The transform function dictates how the values d covered by the kernel k are treated with regard to the focal cell. To get the list of possible transform functions:

pfocal_info_transform()
#>      [,1]       [,2]
#> [1,] "MULTIPLY" "0" 
#> [2,] "ADD"      "1" 
#> [3,] "R_EXP"    "2" 
#> [4,] "L_EXP"    "3" 
#>      [,3]                                                                   
#> [1,] "For data value 'd' and kernal value 'k', intermediate_value = (d * k)"
#> [2,] "For data value 'd' and kernal value 'k', intermediate_value = (d + k)"
#> [3,] "For data value 'd' and kernal value 'k', intermediate_value = (d ^ k)"
#> [4,] "For data value 'd' and kernal value 'k', intermediate_value = (k ^ d)"

The reduce function, on the other hand, dictates how the transformed values are reduced, with the most common option being sum, product, minimum and maximum. See the list of reduce functions:

pfocal_info_reduce()
#>      [,1]          [,2]
#> [1,] "SUM"         "0" 
#> [2,] "ABS_SUM"     "1" 
#> [3,] "PRODUCT"     "2" 
#> [4,] "ABS_PRODUCT" "3" 
#> [5,] "MIN"         "4" 
#> [6,] "MAX"         "5" 
#>      [,3]                                                                                                                          
#> [1,] "Accumulator starts at 0. For each intermediate value, in no particular order, acc = ( acc + iv )"                            
#> [2,] "Accumulator starts at 0. For each intermediate value, in no particular order, acc = ( acc + abs(iv) )"                       
#> [3,] "Accumulator starts at 1. For each intermediate value, in no particular order, acc = ( acc * iv )"                            
#> [4,] "Accumulator starts at 1. For each intermediate value, in no particular order, acc = ( acc * abs(iv) )"                       
#> [5,] "Accumulator starts at the highest possible value. For each intermediate value, in no particular order, acc = min( acc , iv )"
#> [6,] "Accumulator starts at the lowest possible value. For each intermediate value, in no particular order, acc = max( acc , iv )"

Finally, there is the option of dividing the final result (a method for scaling). The default is no division. The different options can be seen with:

pfocal_info_mean_divisor()
#>       [,1]                    [,2]
#>  [1,] "ONE"                   "0" 
#>  [2,] "KERNEL_SIZE"           "1" 
#>  [3,] "KERNEL_COUNT"          "2" 
#>  [4,] "KERNEL_SUM"            "3" 
#>  [5,] "KERNEL_ABS_SUM"        "4" 
#>  [6,] "KERNEL_PROD"           "5" 
#>  [7,] "KERNEL_ABS_PROD"       "6" 
#>  [8,] "DYNAMIC_COUNT"         "7" 
#>  [9,] "DYNAMIC_SUM"           "8" 
#> [10,] "DYNAMIC_ABS_SUM"       "9" 
#> [11,] "DYNAMIC_PROD"          "10"
#> [12,] "DYNAMIC_ABS_PROD"      "11"
#> [13,] "DYNAMIC_DATA_SUM"      "12"
#> [14,] "DYNAMIC_DATA_ABS_SUM"  "13"
#> [15,] "DYNAMIC_DATA_PROD"     "14"
#> [16,] "DYNAMIC_DATA_ABS_PROD" "15"
#>       [,3]                                                                                                                            
#>  [1,] "Does not divide the final value by anything"                                                                                   
#>  [2,] "Divide the final value at each point by nrow(k)*ncol(k)"                                                                       
#>  [3,] "Divide the final value at each point by sum(+!is.na(k))"                                                                       
#>  [4,] "Divide the final value at each point by sum(k[!is.na(k)])"                                                                     
#>  [5,] "Divide the final value at each point by sum(abs(k[!is.na(k)]))"                                                                
#>  [6,] "Divide the final value at each point by prod(k[!is.na(k)])"                                                                    
#>  [7,] "Divide the final value at each point by prod(abs(k[!is.na(k)]))"                                                               
#>  [8,] "Divide the final value at each point by sum(!is.na( intermediate_data )), recalculated at every point"                         
#>  [9,] "Divide the final value at each point by sum(intermediate_data[!is.na( intermediate_data )]), recalculated at every point"      
#> [10,] "Divide the final value at each point by sum(abs(intermediate_data[!is.na( intermediate_data )])), recalculated at every point" 
#> [11,] "Divide the final value at each point by prod(intermediate_data[!is.na( intermediate_data )]), recalculated at every point"     
#> [12,] "Divide the final value at each point by prod(abs(intermediate_data[!is.na( intermediate_data )])), recalculated at every point"
#> [13,] "Divide the final value at each point by sum(local_data[!is.na( intermediate_data )]), recalculated at every point"             
#> [14,] "Divide the final value at each point by sum(abs(local_data[!is.na( intermediate_data )])), recalculated at every point"        
#> [15,] "Divide the final value at each point by prod(local_data[!is.na( intermediate_data )]), recalculated at every point"            
#> [16,] "Divide the final value at each point by prod(abs(local_data[!is.na( intermediate_data )])), recalculated at every point"

Shortcut functions

Other kernel applications are available in the package, bearing the pfocal_fast, and wrapping pfocal as a shortcut to some default arguments.

fast_convoluted <- pfocal_fast_abs_rectangle(data = data, height = 10, width = 10)
image(fast_convoluted, asp = 1)

Comparison with raster::focal

To compare with raster::focal, we will make use of the raster and microbenchmark packages:

library(raster)
#> Loading required package: sp
library(microbenchmark)

We will run the functions 100 times on the same data grid above. We show here how the mean() function can be replicated in pfocal.

data_raster <- raster::raster(data)

mbm <- microbenchmark(
  "pfocal" = {
    pfocal_results <- pfocal::pfocal(data = data_raster, kernel = kernel, 
                                     transform_function = "MULTIPLY",
                                     reduce_function = "SUM", 
                                     mean_divider = "KERNEL_COUNT")
    gc()
  },
  "raster_focal" = {
    raster_focal_results <- raster::focal(x = data_raster, w = kernel, 
                                          fun = mean, pad = TRUE, padValue = 0)
    gc()
  },
  times = 100,
  unit = "s")
mbm
#> Unit: seconds
#>          expr       min        lq      mean    median        uq      max neval
#>        pfocal 0.5498404 0.5558339 0.5627698 0.5589692 0.5651258 0.626185   100
#>  raster_focal 0.7848192 0.8120337 0.8282878 0.8258618 0.8389689 1.094305   100

plot(mbm)