HCID: Grid Databases at Multiple Spatial Resolutions

To facilitate the exchange of HarvestChoice-developed datasets and analysis results with the broader geospatial community, a standard, systematic global grid database was developed at multiple spatial resolutions (from 1 degree to 30 arc-seconds). The new grid database, called HCID, can be used as a key identifier that links and harmonizes raster datasets on various themes and aggregates them across resolutions. This helps not only GIS analysts but also researchers who need to handle the datasets in relational database management systems.

A new set of grid databases, HCID, was developed by HarvestChoice in collaboration with Robert Hijmans to facilitate data exchange and analysis across the HarvestChoice projects and beyond. HCID can also be used in any discipline where much of the spatial analysis relies on grid (raster) data.

Grid datasets typically have a continental or global extent and are stored and processed in different formats. Some files cover the world; others cover a particular region (e.g., Africa). The global grids available here provide cell number identifiers, country identifiers, the fraction of each cell's area that is land, and the area covered by each cell. The grids are generated consistently at five resolutions: 30 seconds; 5, 10, and 30 minutes; and 1 degree.

Grids and Resolution

We define five grids with a global extent using a "geographic" (latitude/longitude) projection. Thus the corner coordinates are (in decimal degrees):

  • Upper-left corner is at lon = -180.0, lat = 90.0
  • Lower-right corner is at lon = 180.0, lat = -90.0

Different "resolutions" have different cell sizes. The one degree global grid has 360 columns and 180 rows, thus (360 x 180 =) 64800 cells (0, 1, 2, ..., 64799).

To avoid confusion between cell numbers on grids of different resolutions, we refer to the cell numbers of these grids as: cell1d, cell30m, cell10m, cell5m, and cell30s. For example, cell5m = 720 refers to column 720 and row 0, while cell1d = 720 refers to column 0 and row 2 (the third row, since rows are counted from zero) on their respective grids.
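The grid dimensions and the two cell-number examples above can be checked with a short Python sketch. The dictionary and function names (CELL_SIZE_ARCSEC, grid_shape, row_col_from_cell) are hypothetical and not part of any HCID download; the sketch only assumes the global extent and the zero-based, row-by-row numbering described in this document.

```python
# Cell size in arc-seconds for each HCID resolution (keys follow the text;
# the dictionary itself is an illustrative assumption, not an HCID file).
CELL_SIZE_ARCSEC = {
    "cell1d": 3600,
    "cell30m": 1800,
    "cell10m": 600,
    "cell5m": 300,
    "cell30s": 30,
}


def grid_shape(name):
    """Return (ncols, nrows) of the global grid for a resolution name."""
    size = CELL_SIZE_ARCSEC[name]
    ncols = 360 * 3600 // size  # 360 degrees of longitude
    nrows = 180 * 3600 // size  # 180 degrees of latitude
    return ncols, nrows


def row_col_from_cell(cell, name):
    """Return the zero-based (row, col) of a cell number on the named grid."""
    ncols, nrows = grid_shape(name)
    if not 0 <= cell < ncols * nrows:
        raise ValueError("cell number outside the grid")
    return divmod(cell, ncols)


if __name__ == "__main__":
    for name in CELL_SIZE_ARCSEC:
        ncols, nrows = grid_shape(name)
        print(name, ncols, nrows, ncols * nrows)
    print(row_col_from_cell(720, "cell5m"))
    print(row_col_from_cell(720, "cell1d"))
```

The same cell number lands in a different place on each grid, which is why the resolution-specific names matter.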

Countries

In the country grids, cell values are numeric codes that identify a country. The link between the identifier and the country name can be made via an Access database or with this text file. The country grids were created by converting the GADM (Global Administrative Areas, http://gadm.org) version 0.9 polygons to a 30 second global grid, and then aggregating to the coarser resolutions using the mode (most common value).

Area

Because we deal with unprojected (latitude/longitude) grids, spatial units are in degrees, and cell resolution is constant in degrees but not in m2. This is because one degree of longitude is about 111 km at the equator but 0 at the poles. The area grids provide an estimate of the size of each cell (in km2).
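To see how cell area shrinks toward the poles, here is a minimal Python sketch that assumes a spherical Earth with a mean radius of 6371 km. This is an assumption for illustration only: the method used to produce the HCID area grids is not described here, and the function name cell_area_km2 is hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean radius of a spherical Earth


def cell_area_km2(row, cell_size_deg):
    """Approximate area (km2) of any cell in a zero-based row of a global grid.

    Row 0 touches the north pole (lat = 90). Every cell in a row has the same
    area, so the column is not needed.
    """
    lat_top = 90.0 - row * cell_size_deg
    lat_bottom = lat_top - cell_size_deg
    dlon = math.radians(cell_size_deg)
    # Area of a latitude/longitude "rectangle" on a sphere:
    # R^2 * dlon * (sin(lat_top) - sin(lat_bottom))
    band = math.sin(math.radians(lat_top)) - math.sin(math.radians(lat_bottom))
    return EARTH_RADIUS_KM ** 2 * dlon * band


if __name__ == "__main__":
    print(cell_area_km2(89, 1.0))  # a 1 degree cell just north of the equator
    print(cell_area_km2(0, 1.0))   # a 1 degree cell touching the north pole
```

Summing the area over all cells of the one degree grid recovers the surface area of the sphere, which is a useful sanity check on the formula.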

Fraction Land Area

These grids identify the fraction of each grid cell that is land area. They are derived from the 30 second country data.

Unique Cell Identifiers

For a grid of a given resolution, irrespective of its extent, it is possible to always use the same identifier for a specific grid cell, even if you are using only a subset of the data (e.g., Africa). Such a consistent numbering system ensures smooth data exchange and analysis. Grid (or raster) data consists of a rectangular area divided into rectangular (typically square) cells. In the example grid [1], the green area represents the grid and the red numbers are the cell numbers, starting at zero. There are 10 columns and 5 rows, hence 10 x 5 = 50 cells.

The sequential numbering starts in the upper-left corner, moves to the right, and then continues on the next row, ending in the lower-right corner. For computational reasons it is easier to start with 0 than with 1 (because in most computer languages arrays are indexed from 0 to n-1 and not from 1 to n). Therefore the identifier of the last cell is the total number of cells minus one, which is 10 x 5 - 1 = 49 in this example. Row and column numbers also start with zero. Cell numbers only have meaning for a specific grid (computationally, only the number of columns must be the same; semantically, the grid must also have the same spatial extent and resolution).

While a cell could be referred to by its row and column number, in most cases it is much easier to have a single unique identifier (for example, it keeps queries simpler). When the row or column number is needed, computing it from the cell number is trivial. The next page shows a number of example functions in the R language that are useful in this context. Such functions can also be easily implemented in other programming languages.
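As an illustration in Python (not the R functions referred to above), the following sketch converts between longitude/latitude and global cell numbers. The function names and the cells_per_degree parameter are hypothetical; the parameter would be 1, 2, 6, 12, or 120 for the 1 degree, 30 minute, 10 minute, 5 minute, and 30 second grids. Because the numbering always refers to the global grid, a regional subset (e.g., Africa) gets the same identifiers.

```python
def cell_from_lonlat(lon, lat, cells_per_degree):
    """Return the global cell number that contains the point (lon, lat)."""
    if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
        raise ValueError("coordinates outside the global extent")
    ncols = 360 * cells_per_degree
    nrows = 180 * cells_per_degree
    # Columns count eastward from lon = -180, rows southward from lat = 90.
    # Points exactly on the right or bottom edge go to the last column or row.
    col = min(int((lon + 180.0) * cells_per_degree), ncols - 1)
    row = min(int((90.0 - lat) * cells_per_degree), nrows - 1)
    return row * ncols + col


def lonlat_from_cell(cell, cells_per_degree):
    """Return the (lon, lat) of the center of a global cell number."""
    ncols = 360 * cells_per_degree
    nrows = 180 * cells_per_degree
    if not 0 <= cell < ncols * nrows:
        raise ValueError("cell number outside the grid")
    row, col = divmod(cell, ncols)
    lon = -180.0 + (col + 0.5) / cells_per_degree
    lat = 90.0 - (row + 0.5) / cells_per_degree
    return lon, lat


if __name__ == "__main__":
    print(cell_from_lonlat(-180.0, 90.0, 1))  # upper-left corner
    print(cell_from_lonlat(180.0, -90.0, 1))  # lower-right corner
    print(lonlat_from_cell(720, 1))           # center of cell1d = 720
```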

Download

 

  • Raster: Grid cells are in the ESRI ASCII raster format. For each resolution there are three files. One file contains the cell numbers (filename = "hc_seq_*"), i.e., the unique identifier for each cell. There is also a file indicating the country to which (the majority of) that cell belongs (filename = "hc_cnt_*"). Countries are identified with a numeric code that is linked to country names in the Access database and also here. The country grids were created by aggregating from a 30 second grid, using the mode (most common value). Finally, there is a file in which the value represents the area of that cell in km2 (filename = "hc_area_*").
  • Vector: Same as above for the raster data, but the data are stored in a shapefile format: hc_grid_shp.zip. That is, each raster cell is a rectangular polygon with the cell number as an attribute for easy (albeit perhaps inefficient) linking and displaying of data. Grid cells that only cover oceans or seas are not included.
  • Cell database: The database file (hc_grid_mdb.zip) is in the Microsoft Access format, and it has three tables: "cells", "countries", and "gridspecs". The "cells" table links the cell numbers of the different resolutions and also links these to the country numbers. This can be used to (dis)aggregate data, for example to distribute data at a 1 degree resolution to different countries. The "countries" table provides the link between the country codes ("CID") and country names. The "gridspecs" table provides some essential parameters for each grid, such as the number of rows.
  • Country boundaries: The country boundaries used to make the country grids are from GADM version 0.9.
  • Gridded data is included for all resolutions except the 30 second resolution because of the large file sizes. If you require cell numbers or country identifiers at this resolution, you can use these grids: 30 second cell numbers (hc_seq30s_asc.zip; this is a very big download) and countries (hc_cnt30s_asc.zip).
  • All of these files are provided for reference; in many cases using a raster file format and a programmatic approach to calculating cell numbers may be more efficient than linking to these files.
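As one example of the programmatic approach mentioned in the last item, the following Python sketch computes the coarser cell that contains a finer cell, which is the kind of link the "cells" table stores. The function name coarser_cell and its parameters are hypothetical. It relies on the five grids nesting exactly, with 1, 2, 6, 12, and 120 cells per degree for the 1 degree, 30 minute, 10 minute, 5 minute, and 30 second grids.

```python
def coarser_cell(cell, fine_per_degree, coarse_per_degree):
    """Map a cell number on a fine global grid to the containing coarse cell.

    Both grids are assumed to share the global extent and the zero-based,
    row-by-row numbering described in this document.
    """
    if fine_per_degree % coarse_per_degree != 0:
        raise ValueError("the fine grid must nest exactly in the coarse grid")
    factor = fine_per_degree // coarse_per_degree
    fine_ncols = 360 * fine_per_degree
    fine_nrows = 180 * fine_per_degree
    if not 0 <= cell < fine_ncols * fine_nrows:
        raise ValueError("cell number outside the fine grid")
    coarse_ncols = 360 * coarse_per_degree
    row, col = divmod(cell, fine_ncols)
    # Integer division drops the position of the fine cell inside the coarse one.
    return (row // factor) * coarse_ncols + (col // factor)


if __name__ == "__main__":
    # cell5m = 720 (row 0, column 720) lies in the cell1d in row 0, column 60.
    print(coarser_cell(720, 12, 1))
```

Grouping records by the returned coarse cell number and summing (for areas) or taking the mode (for country codes) then gives the aggregated values without a join against the database tables.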

Citation

HarvestChoice, 2010. "HCID: Grid Databases at Multiple Spatial Resolutions." International Food Policy Research Institute, Washington, DC., and University of Minnesota, St. Paul, MN. Available online at http://harvestchoice.org/node/2232.

Aug 6, 2010