Introduction: PASS

The CDO Data Science team at Employment and Social Development Canada has developed a geographic information system called the Potential Accessibility Software Service (PASS). PASS offers a quantitative approach to measuring how spatially accessible a given service is to the population that may demand it.

Spatial accessibility is the consideration of how physical and social space and place affect how a population can traverse through it to access a given service. Though abstract in nature, it can be measured through considerations like where potential population demand is located, the geographic distance to get from the population location to the service locations offered, the supply at the service locations, as well as the probability of a population going to one service location over another based on the capacity. PASS uses the enhanced 3-Step Floating Catchment Area (3SFCA) methodology to accomplish this, which is further explained below.

PASS lets you select a geographic area of interest by panning and zooming on the interactive map, and lets you define the parameters to model spatial accessibility to better reflect Canada’s diverse society. For example, individuals living in urban areas versus rural areas have different assumptions and considerations for how to access a service.

Spatial Accessibility

To understand how a phenomenon traverses through space, spatial interaction methodologies model spatial and non-spatial physical and social observations. For example, road networks and driving observations are leveraged to predict traffic flows, while demographic and social media data might be used to understand information flows. Spatial interaction theory can also be used to quantify access by measuring spatial accessibility, that is, the availability and accessibility of a given service, often presented as an index (Ma et al. 2018).

Access is a multidimensional construct that depends on both objective (e.g., financial, social and geographical) and subjective (e.g., local knowledge) determinants. Moreover, there are two perspectives of access, potential and revealed, which can be measured by assessing non-spatial and spatial barriers. Revealed access relies on data actually collected from a service, for example so that a provider can better locate its services for its known customers (Bauer and Groneberg 2016). Potential access, on the other hand, relies on population data (e.g., Census) that represents those who likely need or want access to a service, for example so that a provider can ensure its service covers the clients of interest (Joseph and Bantock 1982). Whether potential or revealed, barriers to access include the following: availability, accessibility, accommodation, affordability and acceptability. Spatial accessibility is commonly described as the availability and accessibility of a given service (Bauer and Groneberg 2016).

The use of population distributions as a proxy for demand, instead of actual use, can result in inaccurate estimates of demand (as not everyone makes use of the service, or uses it at the same frequency). However, since utilization data is rarely available and quality assured, the literature on calculating spatial accessibility often focuses on population data, resulting in estimates of potential accessibility. Moreover, in the context of government services, though it is important to assess whether a given service is accessible to its current consumers, it is arguably more important to ensure that a government service is accessible to those who should and need to be receiving it.

Literature Review: Current Methods

Public health research has led the development and application of methods for quantifying potential spatial accessibility, identifying population areas (i.e., Census geographic boundaries like Census Tracts) that are underserved by health practitioners. This research, however, has expanded into other socioeconomic applications, such as businesses determining where to develop a new service location.

Regional Availability Model (Provider-to-Population Ratios)

Though quantifying access is a complex problem, there is a simpler mechanism for measuring potential spatial accessibility through using the regional availability model. This model calculates the ratio of total supply provided by a given service to the total population in a given geographic unit, also known as provider-to-population ratios (PPRs) (Bauer and Groneberg 2016; Paez et al. 2019). The geographic unit for calculating these ratios is usually a Census boundary, like the Census Subdivision. Though computationally simple to calculate and potentially informative for aggregated analyses (hence regional), this model assumes the population served by a site is completely contained within the given area and that all people within the area have equal access to the service. This does not reflect reality, however, as individuals move across space non-uniformly, especially when utilizing granular geographic areas like Census Tracts (ibid.; Ma et al. 2018). Furthermore, the modifiable areal unit problem (MAUP) occurs, which is when a modification of the areal unit (e.g., Census Tracts to Postal Code areas) yields different results as geographic units are designed for their intended uses (e.g., Statistics Canada Census and Canada Post).
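As a minimal sketch of the regional availability model, the Python snippet below sums supply per geographic unit and divides by that unit's population. The function name, the site and region identifiers, and all numbers are hypothetical and chosen only for illustration; they are not PASS data.

```python
from collections import defaultdict


def provider_to_population_ratios(supply_by_site, site_region, population_by_region):
    """Return total supply divided by total population for each region."""
    supply_by_region = defaultdict(float)
    for site, supply in supply_by_site.items():
        supply_by_region[site_region[site]] += supply
    # Regions without any site get a ratio of 0.0; regions with no population are skipped.
    return {
        region: supply_by_region[region] / population
        for region, population in population_by_region.items()
        if population > 0
    }


# Hypothetical inputs: three service sites spread over two of three regions.
supply_by_site = {"site_1": 4.0, "site_2": 6.0, "site_3": 5.0}
site_region = {"site_1": "CSD_A", "site_2": "CSD_A", "site_3": "CSD_B"}
population_by_region = {"CSD_A": 2000, "CSD_B": 500, "CSD_C": 1000}

ppr = provider_to_population_ratios(supply_by_site, site_region, population_by_region)
```

In this sketch the region without a site scores zero even if a site sits just across its boundary, which is the containment assumption criticized above.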

Gravity Model

To avoid confining the analysis within geographic units, potential spatial accessibility has been quantified through modifications of Newton’s model of gravity and the use of floating catchment areas to account for how distance influences demand (Hansen 1959; Joseph and Bantock 1982; Wang and Luo 2003). The gravity model assumes a given population’s accessibility to a service decreases as travel distance to that service increases; moreover, for a given distance, accessibility will rise as the magnitude of service available at the site increases. A key advantage of gravity-based approaches over the simpler regional availability models is that distance decay is considered, meaning your accessibility to a service decreases the further you are from it (which should result in more realistic estimates). The equation below details the basic gravity-based spatial accessibility model (Wan et al. 2012; Luo et al. 2014):

\[ A_i = \sum_{j=1}^n \frac{S_j f(d_{i,j})}{\sum_{k=1}^m P_k f(d_{k,j})} \quad (1) \]

\(A_i\) = spatial accessibility at population demand location \(i\)

\(m\) = total number of demand locations

\(n\) = total number of service locations

\(S_j\) = supply at a given service location \(j\) (e.g., Service Canada)

\(P_k\) = population count at a given demand area location \(k\)

\(d_{i,j}\) = travel distance from location \(i\) to \(j\)

\(f(d)\) = the generalized distance decay function

The gravity model is essentially the PPR, but with a generalized distance decay function \(f(d)\) (also known as the distance impedance function) that determines how distance influences accessibility. The distance decay function has three common forms: the inverse-power function \(d^{-β}\), the exponential function \(e^{-βd}\) and the Gaussian function \(e^{-d^2/β}\) (Kwan 1998). Each of these functions takes, as input, the distance between two objects. The distance calculation can be as simple as a Euclidean (straight-line) distance from population location \(i\) to service location \(j\), or it can be more complex, considering physical barriers (e.g., rivers, elevation) and/or different transportation networks (e.g., road, public transit).
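The sketch below writes the three decay functions and equation 1 in plain Python. It assumes a dense distance matrix `dist[i][j]` from demand location \(i\) to service location \(j\); the function names, the \(β\) value and all numbers are illustrative assumptions rather than values used by PASS.

```python
import math


def inverse_power(d, beta):
    """Inverse-power decay d^(-beta); undefined at d = 0."""
    return d ** -beta


def exponential(d, beta):
    """Exponential decay e^(-beta * d)."""
    return math.exp(-beta * d)


def gaussian(d, beta):
    """Gaussian decay e^(-d^2 / beta)."""
    return math.exp(-(d ** 2) / beta)


def gravity_accessibility(supply, population, dist, decay):
    """Equation 1: A_i = sum_j S_j f(d_ij) / sum_k P_k f(d_kj)."""
    n_demand, n_sites = len(population), len(supply)
    # Denominator for each site j: decay-weighted population competing for it.
    demand_on_site = [
        sum(population[k] * decay(dist[k][j]) for k in range(n_demand))
        for j in range(n_sites)
    ]
    return [
        sum(supply[j] * decay(dist[i][j]) / demand_on_site[j] for j in range(n_sites))
        for i in range(n_demand)
    ]


# Hypothetical example: two service sites, three demand locations.
supply = [10.0, 5.0]
population = [100, 200, 300]
dist = [[1.0, 4.0], [2.0, 2.0], [5.0, 1.0]]  # dist[i][j], arbitrary units
scores = gravity_accessibility(supply, population, dist, lambda d: exponential(d, 0.5))
```

A useful sanity check on any implementation of equation 1 is that the population-weighted sum of the scores equals the total supply, since each site's supply is fully distributed across the demand locations.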

Floating Catchment Area (FCA) Methods: 2SFCA and Enhanced 3SFCA Models

Although the gravity model accounts for demand and supply and for how travel distance between potential consumers and providers can influence accessibility, it cannot be interpreted intuitively, making it difficult to select a suitable distance decay function and decay coefficient (Wan et al. 2012; Luo et al. 2014; Bauer and Groneberg 2016). As such, floating catchment area methods were introduced as additional steps to implement the gravity-based model more accurately as a solution for measuring spatial accessibility; notably, the 2-Step Floating Catchment Area (2SFCA) methodology, introduced by Wang and Luo (2003). A catchment area is a buffer zone surrounding a given point location, defined by a distance or time threshold. A popular example is a school catchment that determines students’ attendance eligibility. A catchment could be a simple circular buffer around a given point location (calculated by Euclidean distance) or a much more complex polygon that represents actual travel time or distance based on transportation networks. Figure 1 demonstrates both kinds of catchment area, each representing, for example, a travel time of 60 minutes by car. Catchment areas can represent distance and distance decay through a uniform, piece-wise or continuous time/distance surface; moreover, if distance is calculated with transportation networks, the catchments better represent the physical and social landscape.

Figure 1: Example of catchment areas, either calculated with Euclidean distance to make a circular buffer or, to better reflect the geographic landscape, with transportation network distance to make an irregular buffer. The different colours for the irregular catchment area demonstrate sub-zones based on the same commute time thresholds, such as all areas within 5, 15, 25, and 45 minutes. This method attempts to account for distance decay with a piece-wise surface.


Essentially, the 2SFCA methodology breaks the calculation of potential spatial accessibility into two steps: 1) calculate the PPR for each service location, and 2) sum the PPRs within a given population location’s catchment area to obtain the measure of accessibility. Each step is further explained below.

Step 1: For each service location \(j\), generate its catchment area \((D_j)\) by finding all population points that are within the travel time or distance threshold \(d_0\), that is, all population demand locations \(k\) such that \(d_{k,j} < d_0\). Then calculate the PPR (\(R_j\)) for that service site as the ratio of its supply to the sum of the populations within its catchment area.

\[ R_j = \frac{S_j}{\sum_{k∈D_j}P_k} (2) \]

Step 2: For each population demand location \(i\), generate its catchment area (\(D_i\)) using the same threshold (\(d_0\)). Then calculate the accessibility for that location (\(A_i^F\)) as the sum of the PPRs of all service sites within the catchment, that is, all \(j\) such that \(d_{i,j} < d_0\).

\[ A_i^F = \sum_{j∈D_i}R_j (3) \]
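The two steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the PASS implementation: it assumes a dense travel-time matrix `dist[i][j]` in minutes, and the function name, the 60-minute threshold and all numbers are hypothetical.

```python
def two_step_fca(supply, population, dist, d0):
    """2SFCA with a binary catchment: dist[i][j] < d0 means i and j are in reach."""
    n_demand, n_sites = len(population), len(supply)

    # Step 1 (equation 2): provider-to-population ratio R_j for each site.
    ratios = []
    for j in range(n_sites):
        pop_in_catchment = sum(
            population[k] for k in range(n_demand) if dist[k][j] < d0
        )
        ratios.append(supply[j] / pop_in_catchment if pop_in_catchment > 0 else 0.0)

    # Step 2 (equation 3): sum R_j over the sites inside each demand catchment.
    return [
        sum(ratios[j] for j in range(n_sites) if dist[i][j] < d0)
        for i in range(n_demand)
    ]


# Hypothetical example: two sites, three demand locations, 60-minute threshold.
supply = [10.0, 5.0]
population = [100, 200, 300]
dist = [[10, 70], [30, 40], [90, 20]]  # travel time in minutes
access = two_step_fca(supply, population, dist, d0=60)
```

With these made-up numbers the second demand location lies inside both catchments, so it receives the sum of both ratios, while the other two locations each receive only one.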

Though this approach is relatively simple to implement with the right geographic information system in place, this model neglects to account for the distance decay within each catchment area because it considers the decay as binary. Hence, population locations have equal access within a catchment, while those outside of the catchment area are considered inaccessible (Wan et al. 2012; Luo et al. 2014; Paez et al. 2019). Although limitations exist, the 2SFCA paved the way for various modifications, enhancements or additional steps for calculating potential spatial accessibility, known as the floating catchment area (FCA) family (Bauer and Groneberg 2018). For example, the enhanced 2SFCA (E2SFCA) attempts to account for distance decay by adding sub-zones within the catchment area (Luo and Qi 2009). (Sub-zones are demonstrated in Figure 1.) In another case, to reduce demand inflation, the 3SFCA method was introduced by Wan et al. (2012). This method assumes that a population’s demand on a service site is influenced by the availability of other nearby sites. In other words, when more options are available, an individual’s demand on a single site decreases. To account for this, Wan et al. introduced the use of selection weights of a potential population demand location on a service location. Figure 2 below demonstrates how the selection weight is considered for the 3SFCA method.

Figure 2: Example scenario to illustrate the limitations of the 3SFCA model.


Each service location (A, B, C) is shown with its supply value in parentheses, and the value of its distance decay function is shown beside the line connecting it to population location \(i\). The selection weight of \(i\) for service site A would be 0.3 / (0.5 + 0.3 + 0.4) = 0.25, B would be 0.33 and C would be 0.42. Then, to calculate the adjusted demand, A would be 0.25 x 0.3 x \(P_i\) = 0.075\(P_i\), while B would be 0.13\(P_i\) and C would be 0.21\(P_i\). Service site A has the smallest adjusted demand yet the largest capacity, a factor that certainly plays a role in someone’s decision-making.
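The arithmetic in this example can be reproduced with a short Python sketch. The decay weight of 0.3 for site A is stated in the text; the values 0.4 for B and 0.5 for C are inferred from the selection weights quoted above, and the variable names are illustrative only.

```python
# Distance decay weights between population location i and each site.
# A = 0.3 is given in the text; B = 0.4 and C = 0.5 are inferred from it.
decay_weight = {"A": 0.3, "B": 0.4, "C": 0.5}

# 3SFCA selection weight: each site's decay weight as a share of the total.
total = sum(decay_weight.values())
selection_weight = {site: w / total for site, w in decay_weight.items()}

# Adjusted demand per unit of population P_i: selection weight times decay weight.
adjusted_demand = {
    site: selection_weight[site] * decay_weight[site] for site in decay_weight
}
```

Because the selection weight depends only on distance decay, supply never enters the calculation, which is the limitation the example is meant to expose.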

Though this modification addresses the demand inflation observed in the 2SFCA approach by incorporating selection weights, the 3SFCA has limitations, mainly that the selection weight calculations do not consider how a site’s supply can influence an individual’s decision. With this in mind, the enhanced 3SFCA integrates the Huff model (Luo 2014). It is this method that is currently applied in PASS. For population location \(i\) visiting service location \(j\), the model is as follows:

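\[ Prob_{ij} = \frac{C_j d_{i,j}^{-β}}{\sum_{s∈D_i} C_s d_{i,s}^{-β}} \quad (4) \]

\(Prob_{ij}\) = probability of population location \(i\) selecting service location \(j\)

\(C_j\) = capacity/attractiveness of service location \(j\)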

\(β\) = decay coefficient

The probability of population location \(i\) visiting service location \(j\) depends on the attractiveness/capacity of the other service locations within population location \(i\)’s catchment area, with consideration of distance decay. The Huff model is usually implemented with the inverse-power distance decay function, \(d^{-β}\) (ibid.). For the \(β\) parameter, the distance decay coefficient, if business and local knowledge suggests that the potential population demand is willing to commute further for the service, then \(β\) should be smaller. (CDO has been reviewing further literature to determine a more quantitative approach for selecting the distance decay function and coefficient for the enhanced 3SFCA method.) \(C_j\) represents the attractiveness/capacity of service location \(j\). For childcare, this could be the number of available seats or the number of employees. Since the Huff model incorporates the distance decay function, the surface is assumed to be continuous rather than stepwise; furthermore, capacity or other factors that consumers could find attractive can be accounted for, providing a more realistic estimation of demand. With the probability values serving as the selection weights, and \(W\) representing the distance decay weight, equations 2 and 3 (which become steps 2 and 3 of the enhanced 3SFCA method) are modified as follows:

\[ R_j = \frac{S_j}{\sum_{k∈D_j} Prob_{kj} P_k W_{kj}} \quad (5) \]

\[ A_i^F = \sum_{j∈D_i} Prob_{ij} R_j W_{ij} \quad (6) \]
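Equations 4 to 6 can be combined into one short Python sketch. This is not the PASS implementation: it assumes a dense matrix of strictly positive distances `dist[i][j]`, and it assumes the distance decay weight \(W\) uses the same inverse-power form as the Huff model, which the text does not specify. The function name and all numbers are hypothetical.

```python
def enhanced_three_step_fca(supply, capacity, population, dist, d0, beta):
    """Enhanced 3SFCA: Huff selection probabilities plus decay-weighted 2SFCA steps."""
    n_demand, n_sites = len(population), len(supply)

    # Assumed decay weight W_ij: inverse power inside the catchment, zero outside.
    W = [
        [dist[i][j] ** -beta if dist[i][j] < d0 else 0.0 for j in range(n_sites)]
        for i in range(n_demand)
    ]

    # Step 1 (equation 4): Huff probability of location i selecting site j.
    prob = []
    for i in range(n_demand):
        attraction = [capacity[j] * W[i][j] for j in range(n_sites)]
        total = sum(attraction)
        prob.append([a / total if total > 0 else 0.0 for a in attraction])

    # Step 2 (equation 5): supply-to-demand ratio R_j with adjusted demand.
    ratios = []
    for j in range(n_sites):
        demand = sum(prob[k][j] * population[k] * W[k][j] for k in range(n_demand))
        ratios.append(supply[j] / demand if demand > 0 else 0.0)

    # Step 3 (equation 6): accessibility A_i for each demand location.
    return [
        sum(prob[i][j] * ratios[j] * W[i][j] for j in range(n_sites))
        for i in range(n_demand)
    ]


# Hypothetical example: two sites, three demand locations, 60-minute threshold.
supply = [10.0, 5.0]
capacity = [20.0, 10.0]
population = [100, 200, 300]
dist = [[10, 70], [30, 40], [90, 20]]  # travel time in minutes, all > 0
access = enhanced_three_step_fca(supply, capacity, population, dist, d0=60, beta=1.0)
```

As with the simpler models, the population-weighted sum of the scores equals the total supply of all sites that have at least one demand location in reach, which gives a quick check that the three steps are wired together consistently.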

So, referring back to Figure 2 and using the Huff model to calculate the weights, service location A would have a probability of (20 x 0.3) / (20 x 0.3 + 10 x 0.4 + 4 x 0.5) = 0.5, while B would have a probability of 0.33 and C a probability of 0.17. The adjusted demand for A would then be 0.5 x 0.3 x \(P_i\) = 0.15\(P_i\), whereas B would be 0.132\(P_i\) and C would be 0.085\(P_i\); thus site A has the highest adjusted demand and C the lowest, because capacity is now considered. This highlights how demand is more accurately represented than in the 3SFCA demonstration above. Across the academic evaluations and modifications of the various FCA methods, which test different uses of weights and distance decay, the Huff model offers a more realistic measure for calculating spatial accessibility from population locations to points of service (POS).
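The Huff calculation for Figure 2 can be checked with a few lines of Python. The capacities (20, 10, 4) and decay weights (0.3, 0.4, 0.5) are taken from the expression above; the variable names are illustrative only.

```python
# Capacities and distance decay weights for sites A, B and C, as used above.
capacity = {"A": 20, "B": 10, "C": 4}
decay_weight = {"A": 0.3, "B": 0.4, "C": 0.5}

# Huff probability: capacity-weighted decay as a share of the total attraction.
attraction = {site: capacity[site] * decay_weight[site] for site in capacity}
total = sum(attraction.values())
huff_probability = {site: a / total for site, a in attraction.items()}

# Adjusted demand per unit of population P_i: probability times decay weight.
adjusted_demand = {
    site: huff_probability[site] * decay_weight[site] for site in capacity
}
```

Carrying the unrounded probabilities through gives roughly 0.133 for B and 0.083 for C, slightly different from the figures obtained by first rounding the probabilities to two decimals; the ranking of the three sites is the same either way.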

Data

Regardless of the trade-offs between the various FCA methods, all approaches require the same input data:

  1. The geographic locations of the population that represents the potential demand, as longitude and latitude points, which could be derived from the centroids of geographic boundaries (e.g., Dissemination Area boundaries and centroids);
  2. The population count at each population location;
  3. The geographic locations of the service sites, as longitude and latitude points;
  4. At least one variable to represent the supply \(S_j\) and capacity \(C_j\) of each service location (this could be a uniform value);
  5. A distance matrix, which holds the distance or travel time between every population and service location pair within a distance or time threshold.
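As a rough illustration of the last input, the sketch below builds a sparse distance matrix from longitude and latitude points. It uses the great-circle (haversine) distance purely as a stand-in, since the text does not say how PASS computes distances, and a road-network travel time would normally replace it. The identifiers, coordinates and 50 km threshold are hypothetical.

```python
import math


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in kilometres between two lon/lat points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def build_distance_matrix(demand_points, service_points, threshold_km):
    """Return {(demand_id, service_id): km} for every pair within the threshold."""
    matrix = {}
    for demand_id, (d_lon, d_lat) in demand_points.items():
        for service_id, (s_lon, s_lat) in service_points.items():
            km = haversine_km(d_lon, d_lat, s_lon, s_lat)
            if km < threshold_km:
                matrix[(demand_id, service_id)] = km
    return matrix


# Hypothetical inputs: one demand centroid and two service sites, as (lon, lat).
demand_points = {"DA_1": (-75.7013, 45.4765)}
service_points = {"site_near": (-75.6972, 45.4215), "site_far": (-79.3832, 43.6532)}
matrix = build_distance_matrix(demand_points, service_points, threshold_km=50.0)
```

Storing only the pairs within the threshold keeps the matrix small, because most population and service location pairs in a country-wide dataset are far outside any plausible catchment.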

The remainder of this report describes implementation of this enhanced 3SFCA methodology for Service Canada, and how it can be repurposed and improved for other use cases.

Reflection: Limitations

With a modifiable 3SFCA model available for use and scale, we can start investigating further literature to consider different factors that impact spatial accessibility, such as demographics, modes of transportation, time, and regional differences (e.g., urban versus rural), and to improve how a distance decay function and its coefficients are selected.

It is undecided how best to include demographic data. For instance, multiple accessibility scores could be produced individually, one for each group of interest, based on the count values per geographic unit. Alternatively, based on the literature, a weighted population metric could be used to summarize accessibility over the entire population. This approach essentially biases the model towards specific subpopulations, so policy should inform what gets weighted.

Aside from demographics, different modes of transportation could also be considered, particularly within urban areas, where public infrastructure for biking, walking, and public transit exists. Literature on this topic has not been reviewed yet, but CDO is investigating and testing the inclusion of public transportation, which would allow scores to be calculated for each mode of transportation. The time or distance threshold would have to change for each mode, because how long an average person is willing to commute will vary by mode of transportation as well as by whether they are within an urban or rural area.

Work Cited