Create lists of data
One of the foremost tasks in a batch processing script is cataloging the available data so that the script can iterate through it during processing. ArcPy has a number of functions built specifically for creating such lists.
List function | Description |
---|---|
ListFields(dataset, wild_card, field_type) | Returns a list of fields found in the input value |
ListIndexes(dataset, wild_card) | Returns a list of attribute indexes found in the input value |
ListDatasets(wild_card, feature_type) | Returns the datasets in the current workspace |
ListFeatureClasses(wild_card, feature_type) | Returns the feature classes in the current workspace |
ListFiles(wild_card) | Returns the files in the current workspace |
ListRasters(wild_card, raster_type) | Returns a list of rasters found in the current workspace |
ListTables(wild_card, table_type) | Returns a list of tables found in the current workspace |
ListWorkspaces(wild_card, workspace_type) | Returns a list of workspaces found in the current workspace |
ListVersions(sde_workspace) | Returns a list of versions the connected user has permission to use |
The result of each of these functions is a Python list, an ordered collection of values. A Python list can hold any type of data; most of these functions return strings, such as the names of datasets, files, or workspaces, while ListFields and ListIndexes return Field and Index objects. Once the list has been created with the values you want, you can loop through it in your script to work with each individual value.
List function parameters
The parameters of these functions are similar. A few, such as ListFields, require an input dataset, since the items they list reside within a particular dataset. Other functions do not require an input dataset because they list data in the current workspace, which is set in the environment settings (arcpy.env.workspace). Most functions have a wildcard parameter, which restricts the listed objects or datasets by name. A wildcard defines a name filter, and every item in the resulting list must pass that filter. For example, you may want to list all the feature classes in a workspace that start with the letter G. The following example shows how this is done:
import arcpy
# Set the workspace. List all of the feature classes that start with 'G'
#
arcpy.env.workspace = "D:/St_Johns/data.gdb"
fcs = arcpy.ListFeatureClasses("G*")
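ArcPy performs the wildcard matching itself, but the idea of a name filter can be sketched in plain Python. The snippet below uses the standard library's fnmatch module as a stand-in for ArcPy's matcher; the feature class names are hypothetical, and fnmatchcase is case-sensitive, which may not match how ArcPy treats case in every workspace type.

```python
from fnmatch import fnmatchcase

# Hypothetical names standing in for the contents of a workspace
names = ["Golf_Courses", "Greenways", "Parcels", "Roads", "Gates"]

# Keep only the names that pass the "G*" filter, as a wildcard in a
# List function would (fnmatchcase is a stand-in, not ArcPy's matcher)
g_names = [name for name in names if fnmatchcase(name, "G*")]
print(g_names)
```

Every name in the result starts with G; everything else was filtered out.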
The list can also be restricted to match certain data properties, such as only polygon feature classes, integer fields, or coverage datasets. This is what the type parameter (for example, feature_type or field_type) is used for in the functions that have one. In the next example, the feature classes in a workspace are filtered using both a wildcard and a data type, so only polygon feature classes that start with the letter G are in the resulting list:
# Set the workspace. List all of the polygon feature classes that
# start with 'G'
#
arcpy.env.workspace = "D:/St_Johns/data.gdb"
fcs = arcpy.ListFeatureClasses("G*", "polygon")
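To see how the type filter narrows the result further, here is a plain-Python sketch that applies both a name filter and a shape-type filter to hypothetical (name, shape type) pairs. The helper list_feature_classes is hypothetical and only mimics the filtering idea; it does not call ArcPy, and fnmatchcase again stands in for ArcPy's own wildcard matching.

```python
from fnmatch import fnmatchcase

# Hypothetical workspace contents: (feature class name, shape type)
workspace_contents = [
    ("Golf_Courses", "Polygon"),
    ("Greenways", "Line"),
    ("Gates", "Point"),
    ("Grasslands", "Polygon"),
    ("Parcels", "Polygon"),
]

def list_feature_classes(contents, wild_card="*", feature_type="All"):
    """Hypothetical helper: return names passing both the wildcard and type filters."""
    result = []
    for name, shape_type in contents:
        # Name filter: skip anything the wildcard rejects
        if not fnmatchcase(name, wild_card):
            continue
        # Type filter: "All" keeps every type; otherwise compare case-insensitively
        if feature_type.lower() != "all" and shape_type.lower() != feature_type.lower():
            continue
        result.append(name)
    return result

print(list_feature_classes(workspace_contents, "G*", "polygon"))
```

Only names that pass both filters survive, which mirrors what the wildcard and type arguments do together in the ArcPy call.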
Using your list
ArcPy returns the results of all its list functions as Python lists, since lists are flexible, support simple data access, and can hold multiple data types. A for loop is ideal for working with a Python list because it steps through the list one item at a time. Below is an example of a for loop used to iterate through the list generated in the previous example:
# For each feature class in the list of feature classes
#
for fc in fcs:
    # Copy the features from the workspace to a folder
    #
    arcpy.CopyFeatures_management(fc, "D:/St_Johns/Shapefiles/" + fc)
The following is another example of how to use a list function. The script creates raster pyramids for all rasters in a folder that are Tagged Image File Format (TIFF) images.
import arcpy
# Set the workspace. List all of the TIFF files
#
arcpy.env.workspace = "D:/St_Johns/images"
# For each raster in the list of rasters
#
for tiff in arcpy.ListRasters("*", "TIF"):
    # Create pyramids
    #
    arcpy.BuildPyramids_management(tiff)
A Python list provides the opportunity to use and manage the results of a list function in a variety of ways. Lists are a versatile Python type and provide a number of methods (append, count, extend, index, insert, pop, remove, reverse, and sort) that can be used to manipulate and extract information.
For instance, if you want to know how many feature classes you have in a workspace, you can use Python's built-in len function to provide that number.
import arcpy
arcpy.env.workspace = "c:/St_Johns/Shapefiles"
fcs = arcpy.ListFeatureClasses()
# Use Python's built-in function len to reveal the number of feature classes
# in the workspace
#
fcCount = len(fcs)
print fcCount
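The other list methods mentioned above work the same way on a list returned by a List function. The following sketch exercises several of them on a list of hypothetical feature class names; no ArcPy call is involved.

```python
# Hypothetical result of a List function
fcs = ["water_pipes", "water_stations"]

fcs.append("water_services")          # add one item to the end
fcs.extend(["hydrants", "valves"])    # add several items at once
fcs.remove("valves")                  # drop an item by value
fcs.sort()                            # sort in place, alphabetically
print(fcs)
print(fcs.index("water_pipes"))       # position of an item in the list
print(fcs.count("hydrants"))          # how many times an item occurs
```

Because sort, append, extend, and remove all change the list in place, they return None; assign the list itself, not the method's return value, when you need the result.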
Printing a list displays its contents, and list methods such as sort can reorder it in place, as in this interactive session:
>>> import arcpy
>>> arcpy.env.workspace = "c:/data/water.gdb"
>>> fcs = arcpy.ListFeatureClasses()
>>> print fcs
[u'water_pipes', u'water_services', u'water_stations']
>>> fcs.sort(reverse=True)
>>> print fcs
[u'water_stations', u'water_services', u'water_pipes']
Because lists are ordered collections, they also support indexing and slicing.
>>> print fcs[0]
water_stations
>>> print fcs[1:]
[u'water_services', u'water_pipes']
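Slicing is also handy in batch scripts that should work through a long list a few items at a time. A minimal sketch, assuming a hypothetical helper named batches, a batch size of 2, and the feature class names from the session above:

```python
def batches(items, size):
    """Hypothetical helper: yield consecutive slices of items, each at most size long."""
    for start in range(0, len(items), size):
        # Slicing past the end of a list is safe; the last batch is just shorter
        yield items[start:start + size]

fcs = ["water_stations", "water_services", "water_pipes"]
for batch in batches(fcs, 2):
    print(batch)
```

This prints two batches: the first two names, then the remaining one.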
List function type keywords
The default behavior for all list functions is to list all supported types. A keyword is used to restrict the returned list to a specific type. The type keywords for each function are listed in the table below.
Function | Type keywords |
---|---|
ListDatasets | All, Feature, Coverage, RasterCatalog, CAD, VPF, TIN, Topology |
ListFeatureClasses | All, Point, Label, Node, Line, Arc, Route, Polygon, Region |
ListFields | All, SmallInteger, Integer, Single, Double, String, Date, OID, Geometry, BLOB |
ListTables | All, dBASE, INFO |
ListRasters | All, ADRG, BIL, BIP, BSQ, BMP, CADRG, CIB, ERS, GIF, GIS, GRID, STACK, IMG, JPEG, LAN, SID, SDE, TIFF, RAW, PNG, NITF |
ListWorkspaces | All, Coverage, Access, SDE, Folder |
Searching directories and subdirectories
ArcPy list functions can be used to iterate over a single directory or workspace, but in some cases it is necessary to iterate through subfolders and other workspaces as well. For files, this can be achieved with Python's os.walk function, which walks through a folder tree to find subfolders and files. However, os.walk is strictly file based and does not recognize databases and non-file-based data types that are important in ArcGIS. For instance, os.walk will not see raster datasets or other contents within a file geodatabase workspace or a feature dataset.
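For purely file-based data, an os.walk search can be sketched as follows. It collects every file with a given extension below a top folder. The helper find_files and the folder path are hypothetical, and, as noted above, this approach would not see rasters stored inside a geodatabase.

```python
import os

def find_files(top, extension):
    """Hypothetical helper: return full paths of files under top ending with extension."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # Compare extensions case-insensitively so .tif and .TIF both match
            if filename.lower().endswith(extension.lower()):
                matches.append(os.path.join(dirpath, filename))
    return matches

# Hypothetical folder; os.walk silently yields nothing if it does not exist
tiffs = find_files("D:/St_Johns/images", ".tif")
print(len(tiffs))
```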
In the arcpy.da module, the Walk function can also be used to iterate through a directory tree, and unlike os.walk, it can look into databases and identify ArcGIS data types.
Function | Description |
---|---|
Walk(top, topdown, onerror, followlinks, datatype, type) | Generates data names in a Catalog tree by walking the tree top-down or bottom-up. Each directory/workspace in the tree yields a tuple of three: (dirpath, dirnames, filenames). |
Unlike the List functions, Walk does not use the workspace environment to identify its starting workspace. Instead, the first starting (or top) workspace that Walk traverses is specified in its first argument, top.
In the following example, the Walk function is used to iterate through a Catalog tree and identify all Polygon feature classes contained within.
Use the Walk function to catalog polygon feature classes.
import arcpy
import os
workspace = "c:/data"
feature_classes = []
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,
                                                  datatype="FeatureClass",
                                                  type="Polygon"):
    # Append all Polygon feature classes to a list for further processing
    for filename in filenames:
        feature_classes.append(os.path.join(dirpath, filename))
In some cases, there may be subdirectories that should be avoided when traversing the Catalog tree, for example, directories of backed-up files. If the topdown argument is True or unspecified, the list of workspace names (dirnames) can be modified in place to skip any undesired workspaces or to add workspaces as they are created.
Use the Walk function to catalog raster data. Any rasters in a folder named back_up will be ignored.
import arcpy
import os
workspace = "c:/data"
rasters = []
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,
                                                  topdown=True,
                                                  datatype="RasterDataset"):
    # Disregard any folder named 'back_up' in creating list
    # of rasters
    if "back_up" in dirnames:
        dirnames.remove('back_up')
    for filename in filenames:
        rasters.append(os.path.join(dirpath, filename))
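arcpy.da.Walk yields the same kind of (dirpath, dirnames, filenames) tuples as Python's os.walk, so the in-place pruning technique can be tried on an ordinary folder tree with the standard library alone. This sketch uses os.walk, not arcpy.da.Walk; the helper walk_skipping and the top folder are hypothetical. The slice assignment dirnames[:] = ... changes the list in place, which is what keeps the walk from descending into the skipped folder; rebinding dirnames to a new list would have no effect.

```python
import os

def walk_skipping(top, skip="back_up"):
    """Hypothetical helper: return file paths under top, ignoring any folder named skip."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Modify dirnames in place so os.walk never descends into skipped folders
        dirnames[:] = [d for d in dirnames if d != skip]
        for filename in filenames:
            paths.append(os.path.join(dirpath, filename))
    return paths

# Hypothetical folder; os.walk yields nothing if it does not exist
files = walk_skipping("c:/data")
print(len(files))
```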
Example: Using arcpy.da.Walk to analyze data
The Walk function (as well as List functions) is commonly used for processing data in bulk. The following script takes advantage of the arcpy.da.Walk function to analyze all datasets in an SDE workspace.
import arcpy
import os
# SDE workspace to be used
admin_workspace = "Database Connections/tenone@sde.sde"
analyze_contents = []
for dirpath, workspaces, datatypes in arcpy.da.Walk(
        admin_workspace,
        followlinks=True,
        datatype=['Table', 'FeatureClass', 'RasterDataset']):
    # Create full path, and add tables, feature classes, raster datasets
    analyze_contents += [
        os.path.join(dirpath, datatype) for datatype in datatypes]
    # Create full path, add the feature datasets of the .sde file
    analyze_contents += [
        os.path.join(dirpath, workspace) for workspace in workspaces]
# Execute Analyze Datasets on the complete list
arcpy.AnalyzeDatasets_management(admin_workspace,
                                 "SYSTEM",
                                 analyze_contents,
                                 "ANALYZE_BASE",
                                 "ANALYZE_DELTA",
                                 "ANALYZE_ARCHIVE")
By default, Walk does not iterate into .sde connection files, which prevents remote databases from being opened unintentionally. To deliberately iterate through an .sde connection file, set the followlinks argument to True, as the script above does.