How Optimized Hot Spot Analysis Works

Optimized Hot Spot Analysis executes the Hot Spot Analysis (Getis-Ord Gi*) tool using parameters derived from characteristics of your input data. Similar to the way that the automatic setting on a digital camera will use lighting and subject versus ground readings to determine an appropriate aperture, shutter speed, and focus, the Optimized Hot Spot Analysis tool interrogates your data to obtain the settings that will yield optimal hot spot results. If, for example, the Input Features dataset contains incident point data, the tool will aggregate the incidents into weighted features. Using the distribution of the weighted features, the tool will identify an appropriate scale of analysis. The statistical significance reported in the Output Features will be automatically adjusted for multiple testing and spatial dependence using the False Discovery Rate (FDR) correction method.

Each of the decisions the tool makes in order to give you the best results possible is reported to the Results window and an explanation for these decisions is documented below. Right-clicking on the Messages entry in the Results window and selecting View will display this tool runtime information in a Message dialog box.

Just like your camera has a manual mode that allows you to override the automatic settings, the Hot Spot Analysis (Getis-Ord Gi*) tool gives you full control over all parameter options. Running the Optimized Hot Spot Analysis tool and noting the parameter settings it uses may help you refine the parameters you provide to the full control Hot Spot Analysis (Getis-Ord Gi*) tool.

The workflow for the Optimized Hot Spot Analysis tool includes the following components. The calculations and algorithms used within each of these components are described below.

Initial data assessment

In this component, the Input Features and the optional Analysis Field, Bounding Polygons Defining Where Incidents Are Possible, and Polygons For Aggregating Incidents Into Points are scrutinized to ensure there are sufficient features and adequate variation in the values to be analyzed. If the tool encounters records with corrupt or missing geometry, or if an Analysis Field is specified and null values are present, the associated records will be listed as bad records and excluded from analysis.

The Optimized Hot Spot Analysis tool uses the Getis-Ord Gi* (pronounced Gee Eye Star) statistic and, as with many statistical methods, the results are not reliable when there are fewer than 30 features. If you provide polygon Input Features, or point Input Features with an Analysis Field, you will need a minimum of 30 features to use this tool. The minimum number of Polygons For Aggregating Incidents Into Points is also 30. The feature layer representing Bounding Polygons Defining Where Incidents Are Possible may include one or more polygons.

The Gi* statistic also requires values to be associated with each feature it analyzes. When the Input Features you provide represent incident data (when you don't provide an Analysis Field), the tool will aggregate the incidents, and the incident counts will serve as the values to be analyzed. After the aggregation process completes, there still must be a minimum of 30 features, so with incident data you will want to start with more than 30 features. The table below documents the minimum number of features for each Incident Data Aggregation Method:

Minimum Number of Incidents | Aggregation Method | Minimum Number of Features After Aggregation
60 | COUNT INCIDENTS WITHIN FISHNET POLYGONS, without specifying Bounding Polygons Defining Where Incidents Are Possible | 30
30 | COUNT INCIDENTS WITHIN FISHNET POLYGONS, when you do provide a feature class for the Bounding Polygons Defining Where Incidents Are Possible parameter | 30
30 | COUNT INCIDENTS WITHIN AGGREGATION POLYGONS | 30
60 | SNAP NEARBY INCIDENTS TO CREATE WEIGHTED POINTS | 30
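The minimum-count requirements above can be sketched as a simple lookup-and-check. This is an illustrative sketch only; the method keys below are hypothetical shorthand, not the tool's actual parameter values:

```python
# Hypothetical shorthand keys for each Incident Data Aggregation Method:
# (minimum incidents before aggregation, minimum features after aggregation)
MINIMUMS = {
    "FISHNET_NO_BOUNDING": (60, 30),
    "FISHNET_WITH_BOUNDING": (30, 30),
    "AGGREGATION_POLYGONS": (30, 30),
    "SNAP_NEARBY_INCIDENTS": (60, 30),
}

def check_minimums(method, n_incidents, n_aggregated):
    """Return True when both minimum-feature requirements are met."""
    min_incidents, min_features = MINIMUMS[method]
    return n_incidents >= min_incidents and n_aggregated >= min_features
```

For example, 45 incidents would be too few for the snap-nearby-incidents method, which requires 60 incidents before aggregation.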

The Gi* statistic was also designed for an Analysis Field with a variety of different values. The statistic is not appropriate for binary data, for example. The Optimized Hot Spot Analysis tool will check the Analysis Field to make sure that the values have at least some variation.

If you specify a path for the Density Surface, this component of the tool workflow will also check the raster analysis mask environment setting. If no raster analysis mask is set, it will construct a convex hull around the incident points to use for clipping the output Density Surface raster layer. The Density Surface parameter is only enabled when your Input Features are points and you have the ArcGIS Spatial Analyst extension. It is disabled for all but the SNAP_NEARBY_INCIDENTS_TO_CREATE_WEIGHTED_POINTS Incident Data Aggregation Method.

Locational outliers are features that are much farther away from neighboring features than the majority of features in the dataset. Think of an urban environment with large, densely populated cities in the center and smaller, less densely populated cities at the periphery. If you computed the average nearest neighbor distance for these cities, you would find that the result would be smaller if you excluded the peripheral locational outliers and focused only on the cities near the urban center. This is an example of how locational outliers can have a strong impact on spatial statistics such as Average Nearest Neighbor.

Because the Optimized Hot Spot Analysis tool uses the average and median nearest neighbor calculations both for aggregation and to identify an appropriate scale of analysis, the Initial Data Assessment component will also identify any locational outliers in the Input Features or Polygons For Aggregating Incidents Into Points and report the number it encounters. To do this, the tool computes each feature's nearest neighbor distance and evaluates the distribution of all of these distances. Features whose distance to their closest noncoincident neighbor is more than three standard deviations above the mean nearest neighbor distance are considered locational outliers.
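The outlier test can be sketched in Python. The tool's internal computation is not published, so this assumes a straightforward mean-plus-three-standard-deviations cutoff on the nearest neighbor distances, with a brute-force neighbor search:

```python
import math

def nearest_neighbor_distances(points):
    """For each point, the distance to its closest noncoincident neighbor
    (brute force, O(n^2); fine for a small illustration)."""
    dists = []
    for i, (x1, y1) in enumerate(points):
        best = min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i and (x2, y2) != (x1, y1)
        )
        dists.append(best)
    return dists

def locational_outliers(points):
    """Indices of features whose nearest neighbor distance exceeds the
    mean by more than three standard deviations (assumed cutoff)."""
    d = nearest_neighbor_distances(points)
    mean = sum(d) / len(d)
    sd = math.sqrt(sum((v - mean) ** 2 for v in d) / len(d))
    cutoff = mean + 3 * sd
    return [i for i, v in enumerate(d) if v > cutoff]
```

Run against a tight grid of points plus one distant point, the distant point is flagged while the grid points are not.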

Incident Aggregation

For incident data, the next component in the workflow aggregates your data. There are three possible approaches based on the Incident Data Aggregation Method you select. The algorithms for each of these approaches are described below.

Scale of analysis

This next component of the Optimized Hot Spot Analysis workflow is applied to weighted features either because you provided Input Features with an Analysis Field or because the Incident Aggregation procedure has created weights from incident counts. The next step is to identify an appropriate scale of analysis. The ideal scale of analysis is a distance that matches the scale of the question you are asking (if you are looking for hot spots of a disease outbreak and know that the mosquito vector has a range of 10 miles, for example, using a 10-mile distance would be most appropriate). When you can't justify any specific distance to use for your scale of analysis, there are some strategies to help with this. The Optimized Hot Spot Analysis tool employs these strategies.

The first strategy tried is Incremental Spatial Autocorrelation. Whenever you see spatial clustering in the landscape, you are seeing evidence of underlying spatial processes at work. The Incremental Spatial Autocorrelation tool performs the Global Moran's I statistic for a series of increasing distances, measuring the intensity of spatial clustering for each distance. The intensity of clustering is determined by the z-score returned. Typically, as the distance increases, so does the z-score, indicating intensification of clustering. At some particular distance, however, the z-score generally peaks. Peaks reflect distances where the spatial processes promoting clustering are most pronounced. The Optimized Hot Spot Analysis tool looks for peak distances using Incremental Spatial Autocorrelation. If a peak distance is found, this distance becomes the scale for analysis. If multiple peak distances are found, the first peak distance is selected.
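The first-peak selection can be sketched as follows, assuming the z-scores have already been computed by Incremental Spatial Autocorrelation at each test distance (a local maximum relative to its immediate neighbors is treated as a peak; the tool's exact peak criterion may differ):

```python
def first_peak_distance(distances, z_scores):
    """Return the first distance whose z-score is higher than the z-scores
    at the neighboring test distances, or None when no peak is found.

    distances and z_scores are parallel lists produced by running the
    clustering statistic at a series of increasing distances."""
    for i in range(1, len(z_scores) - 1):
        if z_scores[i] > z_scores[i - 1] and z_scores[i] > z_scores[i + 1]:
            return distances[i]
    return None  # z-scores never peaked (e.g., strictly increasing)
```

A strictly increasing z-score series yields no peak, which is the case handled by the fallback strategy described next.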

When no peak distance is found, Optimized Hot Spot Analysis examines the spatial distribution of the features and computes the average distance that would yield K neighbors for each feature. K is computed as 0.05 * N, where N is the number of features in the Input Features layer. K will be adjusted so that it is never smaller than three or larger than 30. If the average distance that would yield K neighbors exceeds one standard distance, the scale of analysis will be set to one standard distance; otherwise, it will reflect the K neighbor average distance.
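The K computation is a clamped percentage of the feature count. A minimal sketch (the rounding behavior is an assumption; the source states only K = 0.05 * N, adjusted to stay between 3 and 30):

```python
def neighbors_k(n_features):
    """K = 5% of the feature count, clamped to the range [3, 30].
    Truncation toward zero is assumed for the 5% computation."""
    return min(30, max(3, int(0.05 * n_features)))
```

So a dataset of 40 features uses K = 3, a dataset of 200 features uses K = 10, and anything with 600 or more features uses the ceiling of K = 30.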

The Incremental Spatial Autocorrelation step can take a long time to finish for large, dense datasets. Consequently, when a feature with 500 or more neighbors is encountered, the incremental analysis is skipped, and the average distance that would yield 30 neighbors is computed and used for the scale of analysis.

The distance reflecting the scale of analysis will be reported to the Results window and will be used to perform the hot spot analysis. If you provide a path for the Density Surface parameter, this optimal distance will also serve as the search radius with the Kernel Density tool. This distance corresponds to the Distance Band or Threshold Distance parameter used by the Hot Spot Analysis (Getis-Ord Gi*) tool.

Hot spot analysis

At this point in the Optimized Hot Spot Analysis workflow all of the checks and parameter settings have been made. The next step is to run the Getis-Ord Gi* statistic. Details about the mathematics for this statistic are outlined in How Hot Spot Analysis (Getis-Ord Gi*) works. Results from the Gi* statistic will be automatically corrected for multiple testing and spatial dependence using the False Discovery Rate (FDR) correction method. Messages to the Results window summarize the number of features identified as statistically significant hot or cold spots, after the FDR correction is applied.
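The FDR correction limits the share of false positives among the features reported as significant. The exact procedure the tool uses is not spelled out here; the sketch below is a standard Benjamini-Hochberg style check, shown for illustration:

```python
def fdr_significant(p_values, alpha=0.05):
    """Benjamini-Hochberg style FDR check: sort p-values ascending and find
    the largest rank k with p_(k) <= (k / m) * alpha; every test at or below
    that rank is declared significant. Returns a parallel list of booleans."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * alpha:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            significant[i] = True
    return significant
```

Note the per-rank threshold (k / m) * alpha is stricter than a flat alpha for low ranks, which is how the correction compensates for running many tests at once.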

Output

The last component of the Optimized Hot Spot Analysis tool is to create the Output Features and, if specified, the Density Surface raster layer. If the Input Features represent incident data requiring aggregation, the Output Features will reflect the aggregated weighted features (fishnet polygon cells, the aggregation polygons you provided for the Polygons For Aggregating Incidents Into Points parameter, or weighted points). Each feature will have a z-score, p-value, and Gi_Bin result.

When specified, the Density Surface is created using the Kernel Density tool. The search radius for this tool is the same as the scale of analysis distance used for hot spot analysis. The default rendering is stretched values along a gray scale color ramp. If a raster analysis mask is specified in the environment settings, the output Density Surface will be clipped to the analysis mask. If the raster analysis mask isn't specified, the Density Surface will be clipped to a convex hull around the Input Features centroids.

License:

The Kernel Density tool is used to create the density surface; because this tool is part of the ArcGIS Spatial Analyst extension, the Density Surface parameter remains disabled if you don't have this extension.

8/26/2014