Using an innovative approach that combines Geographic Information Science, remote sensing technology, and machine learning algorithms, ORNL’s LandScan is the community standard for global population distribution. At 30 arc-second (approximately 1 km) resolution, LandScan is the finest resolution global population distribution data available representing an “ambient population” (average over 24 hours). The LandScan algorithm, an R&D 100 Award Winner, uses spatial data, high-resolution imagery exploitation, and a multi-variable dasymetric modeling approach to disaggregate census counts within an administrative boundary. LandScan population data are spatially explicit, unlike tabular census data. Since no single population distribution model can account for the differences in spatial data availability, quality, scale, and accuracy as well as the differences in cultural settlement practices, LandScan population distribution models are tailored to match the data conditions and geographical nature of each individual country and region. By modeling an ambient population, LandScan Global captures the full potential activity space of people throughout the course of the day and night rather than just a residential location.
Format and extent: The data are currently distributed in ESRI grid format. The dataset has 20,880 rows and 43,200 columns covering North 84 degrees to South 90 degrees and West 180 degrees to East 180 degrees.
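The grid dimensions follow directly from the stated extent and the 30 arc-second cell size. The short Python sketch below checks that arithmetic and shows one possible way to map a longitude/latitude pair to a row and column. The function name cell_index and the assumption that row 0 is the northernmost row are illustrative choices, not taken from the LandScan documentation.

```python
# Grid geometry as stated in the text; 30 arc-seconds = 1/120 degree.
CELLS_PER_DEGREE = 120
NORTH, SOUTH, WEST, EAST = 84, -90, -180, 180

N_ROWS = (NORTH - SOUTH) * CELLS_PER_DEGREE  # 174 degrees of latitude
N_COLS = (EAST - WEST) * CELLS_PER_DEGREE    # 360 degrees of longitude


def cell_index(lon, lat):
    """Return (row, col) of the cell containing the point lon/lat.

    Assumption: row 0 is the northernmost row and column 0 the
    westernmost column (a common raster convention).
    """
    if not (WEST <= lon < EAST and SOUTH < lat <= NORTH):
        raise ValueError("point lies outside the grid extent")
    # Clamp guards against floating-point results landing on the far edge.
    row = min(int((NORTH - lat) * CELLS_PER_DEGREE), N_ROWS - 1)
    col = min(int((lon - WEST) * CELLS_PER_DEGREE), N_COLS - 1)
    return row, col
```

Here N_ROWS and N_COLS evaluate to 20,880 and 43,200, matching the dimensions given above.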
Data Values: The values of the cells are integer population counts representing an average, or ambient, population distribution. An ambient population integrates diurnal movements and collective travel habits into a single measure (Dobson et al. 2000). Since natural or man-made emergencies may occur at any time of the day, the goal of the LandScan Global model is to develop a population distribution surface in totality, not just the locations where people sleep. Because of this ambient nature, care should be taken with direct comparisons of LandScan Global data with other population distribution surfaces.
Resolution and Coordinate System: The dataset has a spatial resolution of 30 arc-seconds and is output in a geographic coordinate system on the World Geodetic System (WGS) 84 datum. The 30 arc-second cell, or 0.008333333 decimal degrees, represents approximately 1 km² near the equator. Since the data are in a spherical coordinate system, cell width decreases with the cosine of the latitude of the cell. Thus a cell at 60 degrees latitude has a width that is half that of a cell at the equator (cos 60° = 0.5). The height of the cells does not vary. The values of the cells are integer population counts, not population density, since the cells vary in size. Population counts are normalized to sum to each sub-national administrative unit estimate. For this reason, projecting the data in a raster format to a different coordinate system (including on-the-fly projections) will result in a re-sampling of the data, and the integrity of the normalized population counts will be compromised. Also, prior to any spatial analysis, users should ensure that extents are set to an exact multiple of the cell size (for example 35.25, 35.50, 35.0) to avoid “shifting” of the dataset.
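The cosine relationship and the extent-alignment advice can be illustrated with a small Python sketch. It assumes a spherical Earth with a mean radius of 6371 km, which is a simplification of the WGS 84 ellipsoid, and the helper names cell_size_km and snap_extent are hypothetical. Under that assumption a cell is roughly 0.93 km on a side at the equator.

```python
import math

CELLS_PER_DEGREE = 120      # 30 arc-second cells
EARTH_RADIUS_KM = 6371.0    # assumed mean radius; spherical approximation


def cell_size_km(lat_deg):
    """Approximate (width, height) in km of a cell centred at lat_deg."""
    height = EARTH_RADIUS_KM * math.radians(1.0 / CELLS_PER_DEGREE)
    width = height * math.cos(math.radians(lat_deg))
    return width, height


def snap_extent(xmin, ymin, xmax, ymax):
    """Expand an extent outward to the nearest cell boundaries.

    Values are rounded to 9 decimals before floor/ceil so that inputs
    already on a cell boundary are not pushed out by float noise.
    """
    n = CELLS_PER_DEGREE
    lo = lambda v: math.floor(round(v * n, 9)) / n
    hi = lambda v: math.ceil(round(v * n, 9)) / n
    return lo(xmin), lo(ymin), hi(xmax), hi(ymax)
```

Because cell area shrinks toward the poles while cell values remain counts, any density calculation would need to divide each count by the area of its own cell rather than by a constant.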
Data Revisions: The database is updated annually by incorporating new spatial data and imagery analysis into the distribution algorithms. Comparing different versions of the dataset on a cell-by-cell basis may result in misleading conclusions. Some of the differences between LandScan Global dataset versions are due to recent urban or suburban expansion. However, there are many cases where a village identified with high resolution imagery may have existed for years, but was either not represented or was identified in an incorrect location in various spatial data products.
To produce estimated population counts at 30 arc-second resolution, LandScan Global employs a multi-layered, dasymetric, spatial modeling approach. In dasymetric mapping, a source layer is converted to a surface and an ancillary data layer is added to the surface with a weighting scheme applied to cells coinciding with identified or derived density level values in the ancillary data. In the LandScan Global models, the basic dasymetric model is improved by integrating and employing multiple ancillary or indicator data layers. The modeling process uses sub-national level census counts for each country and primary geospatial input or ancillary datasets, including land cover, roads, slope, urban areas, village locations, and high resolution imagery analysis; all of which are key indicators of population distribution. Based upon the spatial data and the socioeconomic and cultural understanding of an area, cells are preferentially weighted for the possible occurrence of population during a day. Within each country, the population distribution model calculates a “likelihood” coefficient for each cell and applies the coefficients to the census counts, which are employed as control totals for appropriate areas. The total population for that area is then allocated to each cell proportionally to the calculated population coefficient. The resultant population count is an ambient or average day/night population count.
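The final allocation step described above, distributing a control total across cells in proportion to their likelihood coefficients, can be sketched in a few lines of Python. This is a minimal illustration only. The function name allocate_population is hypothetical, and the largest-remainder rounding used to keep the counts integer while preserving the control total is an assumption; the text does not describe how LandScan handles rounding.

```python
from fractions import Fraction


def allocate_population(control_total, coefficients):
    """Split an integer control_total across cells in proportion to
    their likelihood coefficients.

    Returns integer counts that sum exactly to control_total.
    Rounding uses the largest-remainder method (an assumption).
    """
    weights = [Fraction(c) for c in coefficients]  # exact arithmetic
    if any(w < 0 for w in weights):
        raise ValueError("coefficients must be non-negative")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("at least one cell needs a positive coefficient")

    shares = [control_total * w / total_weight for w in weights]
    counts = [s.numerator // s.denominator for s in shares]  # floor
    shortfall = control_total - sum(counts)

    # Give the leftover people to the cells with the largest fractions.
    by_remainder = sorted(range(len(shares)),
                          key=lambda i: shares[i] - counts[i],
                          reverse=True)
    for i in by_remainder[:shortfall]:
        counts[i] += 1
    return counts
```

For example, a control total of 1000 spread over coefficients 0, 1, 3 and 6 yields 0, 100, 300 and 600 people; a cell with a zero coefficient never receives population.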
Positional or attribute errors and anomalies are to be anticipated in large volumes of disparate spatial data. The LandScan Global methodology includes a manual verification and modification process to improve the spatial precision and relative magnitude of the population distribution. Imagery analysts identify obvious population distribution errors and create an additional spatial data layer of population likelihood coefficient modifications to correct or mitigate input data anomalies. Output cells are converted to points with an attribute field for the cell modification values. Many modifications are made to urban areas and urban extents. Derived land cover data often do not reveal urban properties such as building densities or building heights that can be readily inferred with visual inspection using high resolution imagery. Manual corrections to the likelihood coefficient file using high resolution imagery are made for each country as time and budget constraints allow.
LandScan Global was first produced in 1998 as an improved resolution global population distribution database for estimating populations at risk. The original LandScan Global algorithms integrated globally consistent, but relatively coarse, spatial data. The input data for each nation or region were assigned customized weighting factors in a spatial model to characterize diverse settlement patterns. The last decade has seen significant growth of global spatial data and a tremendous increase in the volume of high resolution satellite imagery. These data and imagery present an opportunity to improve the spatial fidelity of annual data releases. However, the production of new spatial data is often fragmented, and the LandScan Global algorithms must account for disparate input data resolutions and temporal incongruities. Selected spatial data layers used in the LandScan Global modeling process are listed below.
Census Information: LandScan Global 2010 and earlier versions used annual mid-year national population estimates from the Geographic Studies Branch, US Bureau of Census. Intermittent populations such as temporary relief workers, some military outposts, scientific expeditions or tourists are not included in these estimates. These mid-year estimates may not reflect seasonal migrations, internally displaced persons (IDPs), or refugee movements since the last official census conducted by a country.
Administrative Boundaries: Accurate administrative boundary attributes are essential to the LandScan Global models since the population projections are joined to the boundaries, which act as spatial controls for the population totals. Each year the models incorporate administrative boundary changes, refine the spatial precision of international and sub-national administrative boundaries, and reconcile temporal census information and administrative boundary inconsistencies. The administrative unit level by which the census data are distributed varies considerably in size and spatial precision from country to country. The number of administrative units per nation and the spatial fidelity of the boundaries are considered in the model parameterization process. Nations with few but very large administrative areas require different weights in the model parameters to allocate representative populations to their appropriate locations. Generally, smaller administrative boundaries lead to better population distribution, provided the boundaries are spatially accurate. However, small administrative areas that are poorly geo-referenced or spatially characterized actually induce population distribution errors. To mitigate these errors, where possible, analysts will merge poor sub-province boundaries to the province level and distribute the entire province population according to the population likelihood locations determined by the model rather than constrict population distributions to incorrect locations. Very small administrative or enumeration areas equivalent to US census blocks or block groups have unintended consequences for modeling an ambient population. Since the populations associated with census tables are tied to places of residence, commercial and industrial areas may have zero or very low populations associated with them. Thus the output would reflect a residential-only population distribution instead of an ambient population distribution.
Land Cover: Accurate land cover data is an integral component of the population distribution models. The land cover data used by the model represents an assemblage of diverse spatial data sources of assorted resolutions. Data from the National Geospatial-Intelligence Agency (NGA), federal producers of land cover data such as USGS, NASA, and NOAA, and other federal agencies, selected international sources, and land cover produced through ORNL imagery analysis are all incorporated into the final land cover database. Much of the world’s land cover has been processed using 30 meter or higher resolution imagery. Very high resolution imagery (~1 meter) is processed at ORNL using novel image processing algorithms and high-performance computers to delineate and update developed areas. These modifications are incorporated into a global land cover database. Analysts assign relative weights to each land cover type based upon the region’s cultural settlement patterns and agricultural intensity practices, and employ these weights in calculating the probability coefficient for each cell.
Other Spatial Data: Elevation and slope are also important indicators of population distribution potential. Although on occasion humans will frequent the very highest elevations on the planet, there are elevation levels above which consistent habitation is impractical. Likewise, very steep slopes inhibit settlements, agriculture, and industrial development – the most likely areas of ambient population distribution. Various vector data layers used in the modeling algorithms include roads, populated areas (urban boundaries), and populated points (towns and villages). Each data layer serves as an indicator of likely population locations. Analysts must reconcile spatial inconsistencies due to data scale, accuracy and currency for each data layer to coincide with the local settlement characteristics.
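To make the idea of combining indicator layers concrete, the Python sketch below derives a per-cell likelihood coefficient from a land cover weight, a slope penalty, an elevation cutoff, and a road proximity bonus. Everything in it is hypothetical: the weights, thresholds, multiplicative form, and the function name likelihood are invented for illustration and are not LandScan parameters, which the text says are tailored per country and region by analysts.

```python
# All values below are made up for illustration; not LandScan parameters.
LAND_COVER_WEIGHT = {
    "urban": 100.0,
    "cropland": 10.0,
    "forest": 1.0,
    "water": 0.0,
}
MAX_HABITABLE_ELEVATION_M = 5000.0  # assumed cutoff
MAX_SLOPE_DEG = 30.0                # assumed slope at which weight reaches 0
ROAD_BONUS = 2.0                    # assumed multiplier near a road


def likelihood(land_cover, slope_deg, elevation_m, near_road=False):
    """Return a relative likelihood coefficient for one cell."""
    if elevation_m > MAX_HABITABLE_ELEVATION_M:
        return 0.0
    base = LAND_COVER_WEIGHT.get(land_cover, 0.0)
    # Linear penalty: flat terrain keeps full weight, steep terrain none.
    slope_factor = max(0.0, 1.0 - slope_deg / MAX_SLOPE_DEG)
    road_factor = ROAD_BONUS if near_road else 1.0
    return base * slope_factor * road_factor
```

Coefficients produced this way are only relative: they matter as proportions within an administrative unit, where they determine how that unit's census control total is shared among its cells.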
Coastlines: Because of their intricate spatial patterns, coastlines require very high resolution to represent coastal features accurately. Since many coastal areas are dynamic landscapes, shorelines change and coastal islands may grow, shift positions, or disappear entirely. Popular coastline databases may intersect current land features thereby potentially missing populated areas. For this reason, the LandScan Global models extend all coastal boundaries several kilometers seaward to ensure all shore and small island features are encapsulated within an administrative unit boundary. Instead of a vector shoreline, the land cover data and high resolution imagery are used to capture the populated areas along the shore.
Imagery: High resolution imagery is employed in every phase of the LandScan Global population distribution modeling process. At the outset, high resolution imagery is used to identify settlement patterns and building characteristics. Imagery is used to evaluate the accuracy and precision of the different spatial data layers used in the models as well as to adapt the weighting factor for each layer in the model algorithms. Preliminary model output is superimposed on high resolution imagery to verify relative population distributions and magnitude. As new spatial data are received, iterative modifications to variable weights in the likelihood coefficient file are made and the distribution algorithms are re-calculated. Additionally, high resolution imagery is used to create or modify existing spatial data layers, especially to update or refine the land cover data related to urban boundary delineations. To speed processing of vast image archives, an automated urban boundary delineation algorithm based on texture and edge information extracted from high-resolution images is being developed.