How raster data is stored and managed

Raster data structures and storage models

Image and raster data is usually stored in its original form. You will rarely edit individual pixel values the way you might edit a feature in a vector dataset. Instead, you typically process the data to create new products, either on the fly or saved as a new dataset. These datasets, and collections of them, are often very large, so good data management capabilities are critical, and ArcGIS is designed to provide them.

There are three ways to store image and raster data: as files in a file system, within a geodatabase, or managed from within the geodatabase but stored in a file system. You must also decide whether to store all the data in a single dataset or in a catalog of potentially many datasets. A file system stores raster datasets, whereas a geodatabase can store either raster datasets or mosaic datasets. A third geodatabase option, the raster catalog, is not discussed below because it has been superseded by the mosaic dataset, which has many more capabilities, uses, and functions.

Raster datasets

Most imagery and raster data (such as an orthophoto or DEM) is provided as a raster dataset. The term raster dataset refers to any raster data model that is stored on disk or in a geodatabase. It is the most basic raster data storage model, on which the others are built; for example, mosaic datasets manage raster datasets. It is also the output of many geoprocessing tools that process raster data. Below is an example of a raster dataset.

Raster dataset example

A raster dataset is any valid raster format organized into one or more bands. Each band consists of an array of pixels (cells), and each pixel has a value. A raster dataset has at least one band. ArcGIS supports more than 70 different file formats for raster datasets, including TIFF, JPEG 2000, Esri Grid, and MrSID.
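The band and pixel structure described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not how ArcGIS or any file format stores data; the class name `RasterDataset`, its fields, and the assumption that all bands share the same dimensions are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import List

Band = List[List[int]]  # one band: rows of pixel (cell) values


@dataclass
class RasterDataset:
    """Hypothetical in-memory model: one or more bands of equal size."""
    bands: List[Band]

    def __post_init__(self) -> None:
        if not self.bands:
            raise ValueError("a raster dataset has at least one band")
        rows, cols = len(self.bands[0]), len(self.bands[0][0])
        for band in self.bands:
            # Assumption for this sketch: every band has the same rows and columns.
            if len(band) != rows or any(len(r) != cols for r in band):
                raise ValueError("all bands must have the same dimensions")

    @property
    def shape(self):
        """Return (band count, rows, columns)."""
        return len(self.bands), len(self.bands[0]), len(self.bands[0][0])

    def pixel(self, band: int, row: int, col: int) -> int:
        """Return the value of one pixel in one band."""
        return self.bands[band][row][col]


# A 2 x 2 raster with three bands (for example red, green, and blue).
rgb = RasterDataset(bands=[
    [[10, 20], [30, 40]],
    [[11, 21], [31, 41]],
    [[12, 22], [32, 42]],
])
```

Each pixel location has one value per band, so `rgb.pixel(2, 1, 0)` reads the third band's value at row 1, column 0.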

Learn about the supported raster dataset file formats

Mosaic datasets

A mosaic dataset is a collection of raster datasets (images) stored as a catalog and viewed or accessed as a single mosaicked image or as individual images (rasters). These collections can be extremely large, both in total file size and in number of datasets. The raster datasets in a mosaic dataset can remain in their native format on disk or be stored in the geodatabase. Metadata can be managed within each raster's record as well as in fields of the attribute table. Storing metadata as attributes makes parameters such as sensor orientation data easy to manage and allows fast queries for selecting rasters.

Mosaic dataset diagram

The data in a mosaic dataset does not have to be adjoining or overlapping but can exist as unconnected, discontinuous datasets. For example, you can have images that completely cover an area or you can have many strips of images that may not join together to form a continuous image (such as along pipelines).

Contiguous data coverage / Discontinuous data coverage

The data can even be completely or partially overlapping but be captured over different dates. The mosaic dataset is an ideal dataset for storing temporal data. You can query the mosaic dataset for the images you need based on time or dates and use a mosaic method to display the mosaicked image according to a time or date attribute.
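The idea of selecting images by date and then ordering them for display can be sketched as follows. This is a plain-Python illustration, not the ArcGIS API; the `catalog` rows, the `acquired` field name, and both helper functions are hypothetical.

```python
from datetime import date

# Hypothetical attribute table: one row per raster in the collection.
catalog = [
    {"name": "scene_2009.tif", "acquired": date(2009, 7, 14)},
    {"name": "scene_2011.tif", "acquired": date(2011, 7, 2)},
    {"name": "scene_2013.tif", "acquired": date(2013, 6, 28)},
]


def select_by_date(rows, start, end):
    """Keep only the rasters acquired within [start, end]."""
    return [row for row in rows if start <= row["acquired"] <= end]


def order_latest_on_top(rows):
    """Sort so the most recent raster comes first (drawn on top)."""
    return sorted(rows, key=lambda row: row["acquired"], reverse=True)


selected = order_latest_on_top(
    select_by_date(catalog, date(2010, 1, 1), date(2013, 12, 31))
)
names = [row["name"] for row in selected]
```

Sorting by the date attribute stands in for a mosaic method here: where images overlap, the order decides which one is visible.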

Mosaic datasets are not limited to one particular type of raster data. You can add raster data in different projections, resolutions, pixel depths, and number of bands. Overviews (like pyramids) can be generated for the entire data collection. This allows for faster viewing of the data and allows you to easily serve these datasets. There are also many additional properties for viewing, including setting a mosaicking method, that make these datasets unique and functional in many situations. You can also query a mosaic dataset based on your spatial and nonspatial query constraints. The results of that query can be a set of images that you could process one by one, or it could be a dynamically generated mosaicked image.

In addition to raster data, you can store and manage lidar data in a mosaic dataset in the same way as raster datasets, and even together with raster datasets. The lidar data can be stored in the file system as LAS files or LAS datasets, or in a geodatabase as a terrain dataset.

License:

Mosaic datasets are versioned. The following table describes how they are supported between versions:

10.0 client
  • 10.0 mosaic dataset: Full (read/modify/create)
  • 10.1 mosaic dataset: Not supported

10.1 client
  • 10.0 mosaic dataset: Read-only
  • 10.1 mosaic dataset: Full (read/modify/create)

Learn more about the mosaic dataset

Comparing raster data storage models

Storing the raster datasets individually is often the best method when the datasets are not adjacent to each other or are rarely used on the same project. Mosaicking your inputs together to form one large, single extent of raster data is appropriate for many applications, but a mosaic dataset may be desired for any of these reasons:

Comparing the raster data storage models

Description
  • Raster dataset: A single picture of an object or a seamless image covering a spatially continuous area. This may be a single original image or the result of many images appended (mosaicked) together.
  • Mosaic dataset: A collection of raster datasets stored as a catalog that allows you to store, manage, view, and query collections of raster and lidar data. It is viewed as a mosaicked image, but you have access to each dataset in the collection.

Storage
  • Raster dataset: As a file on disk or within a geodatabase.
  • Mosaic dataset: Within a geodatabase, but can have a reference stored as a file on disk.

Map layers
  • Raster dataset: One map layer.
  • Mosaic dataset: One map layer.

Homogeneous or heterogeneous data
  • Raster dataset: Homogeneous data: a single format, data type, and file.
  • Mosaic dataset: Heterogeneous data: multiple formats, data types, file sizes, and coordinate systems.

Metadata
  • Raster dataset: Stored once and applies to the complete dataset.
  • Mosaic dataset: Can be stored within the raster record and as attributes in the attribute table.

Downsampled datasets
  • Raster dataset: A single pyramid on the entire raster dataset.
  • Mosaic dataset: Pyramids for each raster dataset, as well as overviews (such as a pyramid) for the entire collection.

Geoprocessing and image analysis
  • Raster dataset: Can be used as a data source in many geoprocessing and analysis tools; can be used in the Image Analysis window.
  • Mosaic dataset: Can be used as a data source in many geoprocessing and analysis tools; can be used in the Image Analysis window.

Pros
  • Raster dataset: Fast to display at any scale; a mosaic saves space, since there is no overlapping data.
  • Mosaic dataset: Manages large collections of raster data; fast to display at any scale; no loss of data to create the mosaic; user has access to the full content of the collection; properties can be set to control the mosaicked display; on-the-fly processing.

Cons
  • Raster dataset: File and personal geodatabase raster datasets are slower to update because the entire file has to be rewritten.
  • Mosaic dataset: Overviews can take time to generate.

Serving
  • Raster dataset: Can be served directly as an image service.
  • Mosaic dataset: Can be served directly as an image service.

Recommendations
  • Raster dataset: Use raster datasets when overlaps between mosaicked images do not need to be retained and for fast display of large quantities of raster data.
  • Mosaic dataset: Use a mosaic dataset for managing and visualizing raster and lidar data. It's good for multidimensional data, querying, storing metadata, and overlapping data, and it provides a good hybrid solution.

A comparison of the raster data storage models

Raster data storage in the geodatabase

Store raster data in the geodatabase when you want to manage rasters, add behavior, and control the schema; when you want to manage a well-defined set of raster datasets as part of your DBMS; and when you require a single architecture for managing all your content. There are three main types of geodatabases: enterprise, personal, and file.

The enterprise geodatabase uses ArcSDE and can support multiple operations within its DBMS. File geodatabases (like the personal geodatabases) are designed to be edited by a single user and do not support versioning. They reside in your file system directory; thus they do not require a password for access. The file geodatabases and enterprise geodatabases share the same basic storage schema.

Note:

The functional behavior of each geodatabase is basically the same; however, there are some exceptions for specific tools or procedures. For information about differences in behavior, refer to the topic for the specific tool or procedure within this help system.

Comparing raster storage in file, enterprise, and personal geodatabases

Size limit
  • File geodatabase: 1 TB for each raster dataset
  • Enterprise geodatabase: Unlimited; limit dependent on DBMS limits
  • Personal geodatabase: 2 gigabytes (GB) per geodatabase (this is a table size limit, not a limit on the raster dataset size)

Raster dataset file format
  • File geodatabase: File geodatabase raster dataset
  • Enterprise geodatabase: ArcSDE raster dataset
  • Personal geodatabase: ERDAS IMAGINE, JPEG, or JPEG 2000

Storage
  • File geodatabase: Raster dataset: managed; mosaic dataset: unmanaged; raster as attribute: managed or unmanaged. Stored in the file system.
  • Enterprise geodatabase: Raster dataset: managed; mosaic dataset: unmanaged; raster as attribute: managed. Stored in an RDBMS.
  • Personal geodatabase: Raster dataset: managed; mosaic dataset: unmanaged; raster as attribute: managed or unmanaged. Stored in Microsoft Access.

Compression
  • File geodatabase: LZ77, JPEG, JPEG 2000, or None
  • Enterprise geodatabase: LZ77, JPEG, JPEG 2000, or None
  • Personal geodatabase: LZ77, JPEG, JPEG 2000, or None

Pyramids
  • File geodatabase: Supports partial pyramiding
  • Enterprise geodatabase: Supports partial pyramiding
  • Personal geodatabase: Rebuilds entire pyramid

Mosaicking
  • File geodatabase: Allows you to append to a raster dataset when mosaicking
  • Enterprise geodatabase: Allows you to append to a raster dataset when mosaicking
  • Personal geodatabase: Rewrites a new dataset every time you mosaic to a raster dataset

Updating
  • File geodatabase: Allows incremental updating
  • Enterprise geodatabase: Allows incremental updating

Number of users
  • File geodatabase: Single user and small workgroups; some readers and one writer
  • Enterprise geodatabase: Multiuser; many users and many writers
  • Personal geodatabase: Single user and small workgroups; some readers and one writer

File vs. personal vs. enterprise geodatabases

File geodatabase

The storage model of the file geodatabases is a hybrid of the enterprise geodatabase and the personal geodatabase where managed raster data follows the storage model of the enterprise geodatabase and unmanaged raster data follows the storage model of the personal geodatabase. File geodatabases are also similar to personal geodatabases because they are designed to be edited by a single user and do not support versioning. They reside in your file system directory; thus they do not require a password for access. The file geodatabases and enterprise geodatabases share the same basic storage schema.

A file geodatabase has several advantages over a personal geodatabase. Like the enterprise geodatabase, the file geodatabase stores data in blocks, which provides more efficient access to data, especially during mosaic operations. When you mosaic data in a file geodatabase, only overlapping blocks are updated. If an overlapping block does not exist, a new block is inserted. Partial blocks are padded with NoData pixels. In addition, the file geodatabase (and enterprise geodatabase) storage model employs partial pyramid updates, which saves time. Finally, because the data structure of the file and enterprise geodatabases is the same, fast copy technology is used to copy and paste data between them.
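The block-update behavior described above can be illustrated with a small sketch. It is a conceptual model only, not the geodatabase's actual implementation: the dictionary of blocks, the `mosaic_into` function, the `NODATA` marker, and the tiny 4 by 4 block size are all assumptions chosen to keep the example readable.

```python
NODATA = None
BLOCK = 4  # demo block size (assumption); real blocks are much larger


def mosaic_into(blocks, pixels, row_off, col_off):
    """Write a pixel grid into block storage at a non-negative offset.

    Only blocks overlapped by the new pixels are touched. A missing block
    is inserted pre-filled with NoData, so partially covered blocks stay
    padded. Returns the set of (block_row, block_col) keys that changed.
    """
    touched = set()
    for r, row in enumerate(pixels):
        for c, value in enumerate(row):
            global_row, global_col = row_off + r, col_off + c
            key = (global_row // BLOCK, global_col // BLOCK)
            if key not in blocks:
                # Insert a new block, padded with NoData.
                blocks[key] = [[NODATA] * BLOCK for _ in range(BLOCK)]
            blocks[key][global_row % BLOCK][global_col % BLOCK] = value
            touched.add(key)
    return touched


blocks = {}
# First input exactly fills block (0, 0).
first = mosaic_into(blocks, [[1] * 4 for _ in range(4)], 0, 0)
# Second input is 2 x 2 and straddles the corner shared by four blocks.
second = mosaic_into(blocks, [[2] * 2 for _ in range(2)], 3, 3)
```

After the second call, block (0, 0) has one pixel overwritten, and three new blocks exist that are mostly NoData. Nothing outside the touched blocks needs to be rewritten, which is the efficiency the paragraph refers to.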

The file geodatabase also accepts configuration keywords, but unlike the enterprise geodatabase, the configuration keywords have a standard predefined value. For more information about configuration keywords, see Configuration keywords for file geodatabases.

Enterprise geodatabase

When raster data is stored in an enterprise geodatabase, it gains enterprise-level functionality, such as security, multiuser access, and data sharing. The following are three main reasons to store your raster data in an enterprise geodatabase:

  • It will not be updated very regularly (such as every two or three years or longer).
  • It will be accessed in read-only use cases (such as using it as basemap data under vector data).
  • There are hundreds of users (or more) that will access it as a basemap.

Because of its storage structure, the raster data is said to be managed, or fully controlled, by the geodatabase. Enterprise geodatabases always store all the raster information (pixels, spatial reference, any associated table, and other metadata) for raster datasets and raster attributes within the associated relational database. This means that all input raster information is loaded into the database and can be thought of as a format conversion.

The enterprise geodatabase evenly tiles the bands into blocks of pixels according to a user-defined dimension (the default is 128 by 128). Tiling the raster band data enables efficient storage and retrieval of the raster data. The pyramid information is stored according to a declining resolution. The height of the pyramid is determined by the number of levels specified by the application or user.

The raster blocks table (the largest table and the one that stores the actual pixel information and pyramids) stores one row per block (tile) per band in a raster dataset and per pyramid level. For example, a three-band raster divided into 12 blocks with no pyramids built will have 36 rows in the BLK table—12 separate blocks for each of the bands. The column containing the pixel data for the block is a binary large object (BLOB).
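The row count in the example can be reproduced with a short calculation. The sketch below is an estimate under stated assumptions, not a description of the actual ArcSDE schema: the function names are hypothetical, and it assumes that each pyramid level halves the width and height (rounding up) and is tiled the same way as the base level.

```python
import math


def blocks_per_band(width, height, tile=128):
    """Number of tile-by-tile blocks needed to cover one band."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def blk_rows(width, height, bands, pyramid_levels=0, tile=128):
    """Estimate block-table rows: one row per block, per band, per level."""
    rows = 0
    w, h = width, height
    for _ in range(pyramid_levels + 1):  # first pass is the full-resolution data
        rows += bands * blocks_per_band(w, h, tile)
        # Assumption: each pyramid level halves the dimensions, rounding up.
        w, h = max(1, math.ceil(w / 2)), max(1, math.ceil(h / 2))
    return rows


# A 512 x 384 pixel raster with 128 x 128 tiles has 4 x 3 = 12 blocks per band.
no_pyramids = blk_rows(512, 384, bands=3)
one_level = blk_rows(512, 384, bands=3, pyramid_levels=1)
```

With three bands and no pyramids this gives the 36 rows from the text. Adding one pyramid level (256 by 192 pixels, or 4 blocks per band) adds 12 more rows under the assumptions above.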

Learn about raster data storage in these DBMSs

DBMS and data storage model
  • DB2: Raster datasets, Mosaic datasets
  • Informix: Raster datasets, Mosaic datasets
  • Oracle: Raster datasets, Mosaic datasets
  • PostgreSQL: Raster datasets, Mosaic datasets
  • SQL Server: Raster datasets, Mosaic datasets

Personal geodatabase

In a personal geodatabase, the raster dataset is converted to an IMAGINE (.img) file and stored inside an image database (IDB) folder. The IDB folder is located in the directory next to the personal geodatabase. When you delete a raster dataset, the raster in the IDB folder is permanently deleted.

When storing a mosaic dataset in a personal geodatabase, the mosaic dataset is a table that points to the stored raster datasets it contains. In a mosaic dataset, the raster datasets are stored as unmanaged; therefore, it contains the path location where the raster datasets are stored. Each row in the business table points to the stored raster dataset. The operations on a mosaic dataset do not affect the stored raster files; therefore, if you delete the raster datasets in a mosaic dataset, they will only be deleted from the mosaic dataset and not from the disk.

When a raster dataset is stored as an attribute, it is either converted to an IMG file in a system-defined location (managed) or left as is in the file system (unmanaged).

Compression, pyramids, and tile size

There are other storage structures to consider when storing and managing raster data, including compression, downsampled datasets (pyramids and overviews), and tile size.

Compression

There are two types of compression: lossless and lossy. Lossless compression means the values of pixels in the raster dataset are not changed, whereas lossy compression results in altered pixel values. The amount of compression depends on the type of pixel data; the more homogeneous the image, the higher the compression ratio. Use lossless compression for data that will be used for analysis, not just display.

The primary benefit of compressing your data is that it requires less storage space; the savings depend on the compression method and the redundancy in the data. An added benefit is improved performance, because less data is transferred. For example, when raster data is accessed over a low-bandwidth network, compression significantly reduces the amount of information to be transferred, making it possible to store large, seamless raster datasets and serve them quickly to a client for display.
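The link between homogeneity and compression ratio can be demonstrated with Python's standard `zlib` module, whose DEFLATE format combines LZ77 with Huffman coding. This is a stand-in for illustration only and is not the encoder ArcGIS uses; the pixel buffers and the `compression_ratio` helper are made up for the example.

```python
import random
import zlib


def compression_ratio(pixels: bytes) -> float:
    """Uncompressed size divided by losslessly compressed size."""
    return len(pixels) / len(zlib.compress(pixels))


random.seed(0)  # fixed seed so the noisy buffer is reproducible

# 256 x 256 single-band, 8-bit "images" as flat byte buffers.
homogeneous = bytes([42]) * (256 * 256)  # every pixel has the same value
noisy = bytes(random.randrange(256) for _ in range(256 * 256))  # no redundancy

ratio_homogeneous = compression_ratio(homogeneous)
ratio_noisy = compression_ratio(noisy)

# Lossless means decompression returns exactly the original pixel values.
round_trip_ok = zlib.decompress(zlib.compress(noisy)) == noisy
```

The uniform buffer compresses far better than the random one, while both round-trip to identical pixel values. That exact round trip is why lossless methods are preferred for data used in analysis.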

Learn more about raster compression

Mosaic datasets also have a compression setting. It does not apply to the storage of the raster datasets being managed; it is the compression applied to the mosaicked image generated for display. This also helps when data is accessed over a network, because it reduces the size of the file that is transferred.

Learn about the Allowed Compression Method property

Downsampled datasets

Downsampled datasets are rasters created from the original data for either raster datasets or mosaic datasets. They are generated to improve display speed and performance. When they are created for raster datasets, they are called pyramids, and when they are created for mosaic datasets, they are called overviews.

Pyramids versus overviews

Created for
  • Pyramids: Raster datasets
  • Overviews: Mosaic datasets

Format
  • Pyramids: Writes .ovr files, with a few exceptions. Reads pyramids stored externally as .ovr or .rrd, or internally.
  • Overviews: Writes as .tif files.

Storage
  • Pyramids: In a single file that generally resides next to the source raster dataset using the same name.
  • Overviews: By default, in a folder next to the geodatabase with a *.overviews extension, or internally for ArcSDE. Storage location is customizable.

Storage size
  • 2 to 10% (compared to original raster datasets)

Downsampling factor
  • Pyramids: 2
  • Overviews: 3

Extent
  • Pyramids: Each pyramid level covers the entire raster dataset. You can specify the number of levels to generate.
  • Overviews: Can cover part of or all of a mosaic dataset. Each level can consist of one or more images.

Options when building
  • Pyramids: Number of levels to create; resampling method; compression method and quality
  • Overviews: Number of levels to create; tile size; base pixel size; resampling method; compression method and quality; output location; extent; sampling factor

Pyramids vs. overviews

Learn more about raster pyramids

Learn more about mosaic dataset overviews
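To see what the downsampling factors of 2 (pyramids) and 3 (overviews) mean for the size of each level, consider the sketch below. It only computes pixel dimensions; the function name and the round-up rule are assumptions for illustration, and it does not model how ArcGIS chooses the number of levels.

```python
def level_sizes(width, height, factor, levels):
    """Pixel dimensions of each successive downsampled level.

    Each level divides the previous width and height by `factor`,
    rounding up (assumption) and never going below one pixel.
    """
    sizes = []
    w, h = width, height
    for _ in range(levels):
        w = max(1, -(-w // factor))  # integer ceiling division
        h = max(1, -(-h // factor))
        sizes.append((w, h))
    return sizes


# Factor 2, as listed for pyramids.
pyramid_levels = level_sizes(8192, 8192, factor=2, levels=3)
# Factor 3, as listed for overviews.
overview_levels = level_sizes(8100, 8100, factor=3, levels=3)
```

A larger factor reaches a small display size in fewer levels, so fewer downsampled images need to be generated and stored.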

Tile size

In an enterprise geodatabase, raster data is stored in a structure where the data is tiled, indexed, pyramided, and most often compressed. Because of tiling, indexing, and pyramiding, each time the raster data is queried, only the tiles necessary to satisfy the extent and resolution of the query are returned instead of the whole dataset. The tile size controls the number of pixels you want to store in each database memory block. This is specified as a number of pixels in x and y. The default tile size is 128 by 128 pixels, and most applications do not warrant deviating from these default values. In an enterprise geodatabase, the tiles of raster data are compressed before storing them in the geodatabase.
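The benefit of tiling for queries can be sketched as a lookup of which tiles intersect a requested pixel window. This is a simplified illustration that ignores pyramid levels and the database index; the function name and the inclusive window convention are assumptions made for the example.

```python
def tiles_for_window(col_min, row_min, col_max, row_max, tile=128):
    """Return (tile_row, tile_col) indices that intersect a pixel window.

    The window bounds are inclusive pixel coordinates, and the default
    tile size matches the 128 by 128 default mentioned in the text.
    """
    return [
        (tile_row, tile_col)
        for tile_row in range(row_min // tile, row_max // tile + 1)
        for tile_col in range(col_min // tile, col_max // tile + 1)
    ]


# A 1024 x 1024 raster has 8 x 8 = 64 tiles in total.
all_tiles = tiles_for_window(0, 0, 1023, 1023)
# A window spanning columns 100 to 300 in the top 128 rows needs only 3 of them.
needed = tiles_for_window(100, 0, 300, 127)
```

Only the tiles in `needed` would have to be read to satisfy that query, rather than all 64.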

5/18/2014