Create lists of data

One of the foremost tasks in a batch processing script is cataloging the available data so that the script can iterate through it during processing. ArcPy has a number of functions built specifically for creating such lists.

ListFields(dataset, wild_card, field_type)

Returns a list of fields found in the input dataset

ListIndexes(dataset, wild_card)

Returns a list of attribute indexes found in the input dataset

ListDatasets(wild_card, feature_type)

Returns a list of datasets found in the current workspace

ListFeatureClasses(wild_card, feature_type)

Returns a list of feature classes found in the current workspace

ListFiles(wild_card)

Returns a list of files found in the current workspace

ListRasters(wild_card, raster_type)

Returns a list of rasters found in the current workspace

ListTables(wild_card, table_type)

Returns a list of tables found in the current workspace

ListWorkspaces(wild_card, workspace_type)

Returns a list of workspaces found in the current workspace

ListVersions(sde_workspace)

Returns a list of versions the connected user has permission to use

List functions

The result of each of these functions is a Python list, which is an ordered collection of values. A list can contain any type of data: most of these functions return strings, such as the names of datasets, files, or versions, while ListFields and ListIndexes return Field and Index objects. Once the list has been created with the values you want, you can loop through it in your script to work with each individual value.


List function parameters

The parameters of these functions are similar. A few, such as ListFields and ListIndexes, require an input dataset, because the items they list reside within a particular object or dataset. Most of the others do not require an input dataset because they list data in the current workspace, which is defined in the environment settings. Every function except ListVersions has a wildcard parameter, which is used to restrict the objects or datasets listed by name. A wildcard defines a name filter, and every item in the returned list must pass that filter. For example, you may want to list all the feature classes in a workspace that start with the letter G. The following example shows how this is done:

import arcpy

# Set the workspace. List all of the feature classes that start with 'G'
#
arcpy.env.workspace = "D:/St_Johns/data.gdb"
fcs = arcpy.ListFeatureClasses("G*")
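ArcPy performs the wildcard matching itself. To try a name filter without ArcGIS installed, the idea can be approximated in plain Python, as in the sketch below. It assumes that * matches any run of characters and that matching ignores case; both are assumptions for illustration, not documented ArcPy behavior. The helper match_wildcard and the sample names are hypothetical.

```python
import re


def match_wildcard(names, wild_card="*"):
    """Hypothetical helper: keep the names that match a '*'-style wildcard.

    Assumption: '*' matches any run of characters (including none) and
    matching is case-insensitive. Every other character is literal.
    """
    # Escape everything except '*', then turn each '*' into '.*'
    pattern = ".*".join(re.escape(part) for part in wild_card.split("*"))
    regex = re.compile(pattern + r"\Z", re.IGNORECASE)
    return [name for name in names if regex.match(name)]


# Hypothetical feature class names standing in for a geodatabase's contents
names = ["Gardens", "geology", "Roads", "GasLines", "Parcels"]
print(match_wildcard(names, "G*"))  # ['Gardens', 'geology', 'GasLines']
```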

The list can also be restricted to match certain data properties, such as only polygon feature classes, integer fields, or coverage datasets. This is what the type parameter (feature_type, field_type, and so on) is used for in the functions that have one. In the next example, the feature classes in a workspace are filtered using a wildcard and a data type, so only polygon feature classes that start with the letter G are in the resulting list:

import arcpy

# Set the workspace. List all of the polygon feature classes that
#   start with 'G'
#
arcpy.env.workspace = "D:/St_Johns/data.gdb"
fcs = arcpy.ListFeatureClasses("G*", "polygon")
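The two filters work together: a name must pass the wildcard and the data must match the type keyword. The sketch below mimics that combination over a hypothetical dictionary that stands in for a workspace. The helper list_feature_classes is not ArcPy's function, and treating * as the only wildcard character and comparing both filters case-insensitively are assumptions.

```python
import re

# Hypothetical stand-in for a workspace: feature class name -> shape type
catalog = {
    "Gardens": "Polygon",
    "GasLines": "Line",
    "Geology": "Polygon",
    "GpsPoints": "Point",
    "Parcels": "Polygon",
}


def list_feature_classes(catalog, wild_card="*", feature_type="All"):
    """Hypothetical helper mimicking the two filters: name, then type.

    Assumptions: '*' is the only wildcard character, and both the
    wildcard and the type keyword are compared case-insensitively.
    """
    pattern = ".*".join(re.escape(p) for p in wild_card.split("*"))
    regex = re.compile(pattern + r"\Z", re.IGNORECASE)
    wanted = feature_type.lower()
    # Keep a name only if it passes the wildcard AND the type filter
    return [name for name, shape in sorted(catalog.items())
            if regex.match(name) and (wanted == "all" or shape.lower() == wanted)]


print(list_feature_classes(catalog, "G*", "polygon"))  # ['Gardens', 'Geology']
```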

Using your list

ArcPy uses a Python list as the return type for all its List function results, because lists offer the flexibility needed for simple data access and can hold multiple data types. A for loop is ideal for working with a Python list because it steps through the list one item at a time. Below is an example of a for loop used to iterate through the list generated in the previous example:

# For each feature class in the list of feature classes
#
for fc in fcs: 
    # Copy the features from the workspace to a folder
    #
    arcpy.CopyFeatures_management(fc, "D:/St_Johns/Shapefiles/" + fc)
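Before running a tool in a loop over many inputs, it can help to build and review the list of output paths first. The sketch below does only that, in plain Python, and runs no geoprocessing tool. It assumes each output should be a shapefile named after its feature class, with the .shp extension added explicitly (an assumption, not something the loop above requires); the helper plan_copies and the input names are hypothetical.

```python
import os


def plan_copies(feature_classes, out_folder):
    """Hypothetical helper: return (input, output) pairs for review only."""
    plan = []
    for fc in feature_classes:
        # os.path.join inserts the separator instead of string concatenation
        out_path = os.path.join(out_folder, fc + ".shp")
        plan.append((fc, out_path))
    return plan


# Hypothetical names, as ListFeatureClasses("G*", "polygon") might return them
for src, dst in plan_copies(["Gardens", "Geology"], "D:/St_Johns/Shapefiles"):
    print(src, "->", dst)
```

Once the planned paths look right, the same loop body can call the tool instead of printing.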

Following is another example of how to use a List function. The script is used to create raster pyramids for all rasters that are Tagged Image File Format (TIFF) images within a folder.

import arcpy

# Set the workspace. List all of the TIFF files
#
arcpy.env.workspace = "D:/St_Johns/images"

# For each raster in the list of rasters
#
for tiff in arcpy.ListRasters("*", "TIF"): 
    # Create pyramids
    #
    arcpy.BuildPyramids_management(tiff)

A Python list provides the opportunity to use and manage the results of a list function in a variety of ways. Lists are a versatile Python type and provide a number of methods (append, count, extend, index, insert, pop, remove, reverse, and sort) that can be used to manipulate and extract information.
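The following sketch exercises several of those methods on a list of names like the ones a List function returns. It is plain Python, so it runs without ArcGIS; the names are hypothetical.

```python
# Hypothetical feature class names, as a List function might return them
fcs = ["water_pipes", "water_services"]

fcs.append("water_stations")        # add one item to the end
fcs.extend(["hydrants", "valves"])  # add several items to the end
fcs.insert(0, "mains")              # add an item at a given position
fcs.remove("valves")                # drop the first matching item
last = fcs.pop()                    # remove and return the last item
fcs.sort()                          # sort in place, alphabetically

print(fcs)  # ['mains', 'water_pipes', 'water_services', 'water_stations']
print(fcs.index("water_pipes"), fcs.count("mains"), last)
```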

For instance, if you want to know how many feature classes you have in a workspace, you can use Python's built-in len function to provide that number.

import arcpy

arcpy.env.workspace = "c:/St_Johns/Shapefiles"

fcs = arcpy.ListFeatureClasses()

# Use Python's built-in function len to reveal the number of feature classes
#   in the workspace
#
fcCount = len(fcs)
print fcCount
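len gives a single total. When names follow a naming convention, the count can be broken down by group with collections.Counter. In this sketch the convention (the text before the first underscore identifies the theme) and the names are hypothetical.

```python
from collections import Counter

# Hypothetical feature class names from several themes
fcs = ["water_pipes", "water_services", "sewer_mains", "sewer_manholes", "roads"]

print(len(fcs))  # 5

# Count by the text before the first underscore (hypothetical convention)
by_prefix = Counter(name.split("_")[0] for name in fcs)
print(by_prefix["water"], by_prefix["sewer"], by_prefix["roads"])  # 2 2 1
```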

Tip:

Printing a Python list shows its contents. Python lists can also be manipulated with a number of methods, including sort, append, and reverse.

>>> import arcpy
>>> arcpy.env.workspace = "c:/data/water.gdb"
>>> fcs = arcpy.ListFeatureClasses()
>>> print fcs
[u'water_pipes', u'water_services', u'water_stations']

>>> fcs.sort(reverse=True)
>>> print fcs
[u'water_stations', u'water_services', u'water_pipes']

Because lists are ordered collections, they also permit indexing and slicing.

>>> print fcs[0]
water_stations

>>> print fcs[1:]
[u'water_services', u'water_pipes']

List function type keywords

The default behavior for all list functions is to list all supported types. A keyword is used to restrict the returned list to a specific type. The type keywords for each function are listed in the table below.

Function              Type keywords
ListDatasets          All, Feature, Coverage, RasterCatalog, CAD, VPF, TIN, Topology
ListFeatureClasses    All, Point, Label, Node, Line, Arc, Route, Polygon, Region
ListFields            All, SmallInteger, Integer, Single, Double, String, Date, OID, Geometry, BLOB
ListTables            All, dBASE, INFO
ListRasters           All, ADRG, BIL, BIP, BSQ, BMP, CADRG, CIB, ERS, GIF, GIS, GRID, STACK, IMG, JPEG, LAN, SID, SDE, TIFF, RAW, PNG, NITF
ListWorkspaces        All, Coverage, Access, SDE, Folder

Type keywords for List functions
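The table can also be carried into a script as a dictionary, so that a keyword can be checked before it is passed to a List function. This sketch copies only a few rows of the table. VALID_TYPES and check_type are hypothetical names, and the case-insensitive comparison is an assumption, not documented ArcPy behavior.

```python
# A few rows of the type keyword table, copied into a dictionary
VALID_TYPES = {
    "ListFeatureClasses": ["All", "Point", "Label", "Node", "Line",
                           "Arc", "Route", "Polygon", "Region"],
    "ListTables": ["All", "dBASE", "INFO"],
    "ListWorkspaces": ["All", "Coverage", "Access", "SDE", "Folder"],
}


def check_type(function_name, keyword):
    """Hypothetical helper: return the keyword as spelled in the table.

    Assumption: keywords are compared case-insensitively.
    Raises ValueError when the keyword is not listed for the function.
    """
    for valid in VALID_TYPES[function_name]:
        if valid.lower() == keyword.lower():
            return valid
    raise ValueError("{0} is not a type keyword for {1}".format(keyword, function_name))


print(check_type("ListFeatureClasses", "polygon"))  # Polygon
```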

Searching directories and subdirectories

ArcPy List functions iterate over a single directory or workspace, but in some cases you also need to iterate through subfolders and other workspaces. For files, this can be done with Python's os.walk function, which walks through folders to find additional subfolders and files. However, os.walk is strictly file based and does not recognize databases and other non-file-based data types that are important in ArcGIS. For instance, os.walk will not see raster datasets or other contents in a file geodatabase workspace or a feature dataset.
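To see the limitation concretely, the sketch below builds a throwaway folder containing a shapefile's component files and a folder named like a file geodatabase, then walks it with os.walk. The file name inside the .gdb folder is invented for the example (a real file geodatabase holds many internal files); the point is only that os.walk reports raw files and folders, not feature classes or rasters.

```python
import os
import shutil
import tempfile

top = tempfile.mkdtemp()
try:
    # Hypothetical layout: one shapefile's parts, plus a folder named like a
    # file geodatabase that holds a made-up internal file
    os.makedirs(os.path.join(top, "data.gdb"))
    for name in ["roads.shp", "roads.shx", "roads.dbf",
                 os.path.join("data.gdb", "a00000009.gdbtable")]:
        open(os.path.join(top, name), "w").close()

    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            # Record paths relative to the top folder for readability
            found.append(os.path.relpath(os.path.join(dirpath, filename), top))
    found.sort()
    print(found)  # raw files only; nothing identifies a feature class
finally:
    shutil.rmtree(top)
```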

In the arcpy.da module, the Walk function can also be used to iterate through a directory tree, and unlike os.walk, it can look into databases and identify ArcGIS data types.

Walk(top, topdown, onerror, followlinks, datatype, type)

Generates data names in a Catalog tree by walking the tree top-down or bottom-up. Each directory or workspace in the tree yields a tuple of three items (dirpath, dirnames, filenames).

arcpy.da.Walk function
Note:

Unlike the List functions, Walk does not use the workspace environment to identify its starting workspace. Instead, the first starting (or top) workspace that Walk traverses is specified in its first argument, top.

In the following example, the Walk function is used to iterate through a Catalog tree and identify all Polygon feature classes contained within.

Use Walk function to catalog polygon feature classes.

import arcpy
import os
workspace = "c:/data"
feature_classes = []
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,
                                                  datatype="FeatureClass",
                                                  type="Polygon"):

    # Append all Polygon feature classes to a list for further processing
    for filename in filenames:
        feature_classes.append(os.path.join(dirpath, filename))
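Because Walk yields (dirpath, dirnames, filenames) tuples, the collecting logic in the loop above can be tested without ArcGIS by feeding it a stand-in generator. In this sketch fake_walk and its contents are hypothetical: it does no filtering by datatype or type the way Walk does, it simply yields what a polygon-only walk might produce.

```python
import os


def fake_walk():
    """Hypothetical stand-in for arcpy.da.Walk(..., datatype="FeatureClass", type="Polygon")."""
    yield "c:/data", ["data.gdb"], ["parcels.shp"]
    yield "c:/data/data.gdb", [], ["Gardens", "Geology"]


def collect(walk_results):
    """Join each yielded name to its directory, as the Walk example does."""
    feature_classes = []
    for dirpath, dirnames, filenames in walk_results:
        for filename in filenames:
            feature_classes.append(os.path.join(dirpath, filename))
    return feature_classes


print(collect(fake_walk()))
```

In a script, collect(arcpy.da.Walk(...)) would use the same loop on real results.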

In some cases, there may be subdirectories that should be avoided when traversing the Catalog tree, for example, directories of backed-up files. If the topdown argument is True or unspecified, the list of workspaces (dirnames) can be modified in place to skip undesired workspaces, or to add workspaces that are created during the walk.

Use the Walk function to catalog raster data. Any rasters in a folder named back_up will be ignored.

import arcpy
import os
workspace = "c:/data"
rasters = []
for dirpath, dirnames, filenames in arcpy.da.Walk(workspace,
                                                  topdown=True,
                                                  datatype="RasterDataset"):
    # Disregard any folder named 'back_up' in creating list 
    #  of rasters
    if "back_up" in dirnames:
        dirnames.remove('back_up')
    for filename in filenames:
        rasters.append(os.path.join(dirpath, filename))
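The in-place requirement follows the same rule as Python's os.walk, which the sketch below uses so it can run without ArcGIS: with topdown=True, editing the dirnames list itself stops the walk from entering a folder, while rebinding the variable to a new list has no effect on os.walk. Slice assignment (dirnames[:] = ...) is an in-place edit that can drop several folders at once. The folder and file names are hypothetical.

```python
import os
import shutil
import tempfile

top = tempfile.mkdtemp()
try:
    # Hypothetical tree: current imagery plus a back_up folder to skip
    for sub in ["current", "back_up"]:
        os.makedirs(os.path.join(top, sub))
        open(os.path.join(top, sub, "dem.tif"), "w").close()

    kept = []
    for dirpath, dirnames, filenames in os.walk(top, topdown=True):
        # Slice assignment edits the list os.walk will descend into;
        # 'dirnames = [...]' would only rebind the local name
        dirnames[:] = [d for d in dirnames if d != "back_up"]
        for filename in filenames:
            kept.append(os.path.relpath(os.path.join(dirpath, filename), top))
    print(kept)
finally:
    shutil.rmtree(top)
```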

Example: Using arcpy.da.Walk to analyze data

The Walk function (as well as List functions) is commonly used for processing data in bulk. The following script takes advantage of the arcpy.da.Walk function to analyze all datasets in an SDE workspace.

import arcpy
import os

# SDE workspace to be used
admin_workspace = "Database Connections/tenone@sde.sde"

analyze_contents = []

for dirpath, workspaces, datatypes in arcpy.da.Walk(
    admin_workspace,
    followlinks=True,
    datatype=['Table', 'FeatureClass', 'RasterDataset']):

    # Create full path, and add tables, feature classes, raster datasets
    analyze_contents += [
        os.path.join(dirpath, datatype) for datatype in datatypes]

    # create full path, add the feature datasets of the .sde file
    analyze_contents += [
        os.path.join(dirpath, workspace) for workspace in workspaces]

# Execute Analyze Datasets on the complete list
arcpy.AnalyzeDatasets_management(admin_workspace,
                                 "SYSTEM",
                                 analyze_contents,
                                 "ANALYZE_BASE",
                                 "ANALYZE_DELTA",
                                 "ANALYZE_ARCHIVE")

Tip:

By default, Walk does not iterate through .sde connection files, so that remote databases are not opened unintentionally. To iterate through an .sde connection file deliberately, set the followlinks argument to True.


4/12/2013