
earthaccess API

earthaccess is a Python library that simplifies data discovery and access to NASA Earth science data by providing a higher-level abstraction over NASA's Search API (CMR), so that searching for data can be done using a simpler notation instead of low-level HTTP queries.

This library handles authentication with NASA's OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly, allowing scientists to get to their science in a simpler and faster way and reducing barriers to cloud-based data analysis.


collection_query()

Returns a query builder instance for NASA collections (datasets).

Returns:

Type Description
CollectionQuery

a query builder instance for data collections.

download(granules, local_path=None, provider=None, threads=8, *, show_progress=None, credentials_endpoint=None, pqdm_kwargs=None, force=False)

Retrieves data granules from a remote storage system. Pass a fixed local_path to avoid repeated downloads: files that already exist there are not overwritten unless force=True.

  • If we run this in AWS (us-west-2 region), we will be using S3 to move data to local_path.
  • If we run it outside AWS (us-west-2 region) and the dataset is cloud hosted, we'll use HTTP links.

Parameters:

Name Type Description Default
granules Union[DataGranule, List[DataGranule], str, List[str]]

a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP)

required
local_path Optional[Union[Path, str]]

Local directory to store the remote data granules. If not supplied, defaults to a subdirectory of the current working directory of the form data/YYYY-MM-DD-UUID, where YYYY-MM-DD is the year, month, and day of the current date, and UUID is the last 6 digits of a UUID4 value.

None
provider Optional[str]

if we download a list of URLs, we need to specify the provider.

None
credentials_endpoint Optional[str]

S3 credentials endpoint to be used for obtaining temporary S3 credentials. This is only required if the metadata doesn't include it, or we pass urls to the method instead of DataGranule instances.

None
threads int

Number of parallel threads used to download the files; adjust as necessary.

8
show_progress Optional[bool]

whether or not to display a progress bar. If not specified, defaults to True for interactive sessions (i.e., in a notebook or a python REPL session), otherwise False.

None
pqdm_kwargs Optional[Mapping[str, Any]]

Additional keyword arguments to pass to pqdm, a parallel processing library. See pqdm documentation for available options. Default is to use immediate exception behavior and the number of jobs specified by the threads parameter.

None
force bool

Force a redownload. By default, existing local files are not overwritten.

False

Returns:

Type Description
List[Path]

List of downloaded files

Raises:

Type Description
Exception

A file download failed.
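The default local_path described above can be sketched as follows. This is a minimal illustration, not earthaccess's own code: the helper name default_download_dir is hypothetical, and it assumes "last 6 digits of a UUID4" means the last six hexadecimal characters of uuid.uuid4().hex.

```python
import uuid
from datetime import date
from pathlib import Path
from typing import Optional


def default_download_dir(today: Optional[date] = None) -> Path:
    """Hypothetical helper building a directory of the form data/YYYY-MM-DD-UUID.

    The UUID part is assumed to be the last 6 hex characters of a random UUID4.
    """
    today = today or date.today()
    suffix = uuid.uuid4().hex[-6:]  # random, so each call yields a new directory
    return Path("data") / f"{today:%Y-%m-%d}-{suffix}"


# The final path component starts with "2024-01-05-" followed by 6 hex characters
print(default_download_dir(date(2024, 1, 5)))
```

Because the suffix is random, relying on the default gives a fresh directory on every call; passing the same explicit local_path across runs is what lets existing files be reused.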

get_edl_token()

Returns the current token used for EDL.

Returns:

Type Description
str

EDL token

get_fsspec_https_session()

Returns an fsspec session that can be used to access data files across many different DAACs.

Returns:

Type Description
AbstractFileSystem

An fsspec instance able to access data across DAACs.

Examples:

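import earthaccess

earthaccess.login()
fs = earthaccess.get_fsspec_https_session()

# DAAC_GRANULE: HTTPS link to a granule, here the first ATL06 search result
DAAC_GRANULE = earthaccess.search_data(short_name="ATL06", count=1)[0].data_links()[0]
with fs.open(DAAC_GRANULE) as f:
    f.read(10)  # read the first 10 bytes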

get_requests_https_session()

Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.

Returns:

Type Description
Session

An authenticated requests Session instance.

Examples:

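import earthaccess

earthaccess.login()

# granule_url: HTTPS link to a restricted granule, here the first ATL06 search result
granule_url = earthaccess.search_data(short_name="ATL06", count=1)[0].data_links()[0]

req_session = earthaccess.get_requests_https_session()
# HTTP Range bounds are inclusive, so this requests the first 101 bytes
data = req_session.get(granule_url, headers={"Range": "bytes=0-100"})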

get_s3_credentials(daac=None, provider=None, results=None)

Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can use the daac name, the provider, or a list of results from earthaccess.search_data(). If we use results, earthaccess will use the metadata on the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.

Parameters:

Name Type Description Default
daac Optional[str]

a DAAC short name, e.g. NSIDC or PODAAC.

None
provider Optional[str]

if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD etc.

None
results Optional[List[DataGranule]]

List of results from search_data()

None

Returns:

Type Description
Dict[str, Any]

a dictionary with S3 credentials for the DAAC or provider
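A common next step is handing these temporary credentials to s3fs. The sketch below assumes the dictionary uses the keys accessKeyId, secretAccessKey and sessionToken (the names returned by NASA's S3 credentials endpoints); the helper s3fs_kwargs is hypothetical and the credential values are dummies.

```python
from typing import Any, Dict, Mapping


def s3fs_kwargs(creds: Mapping[str, Any]) -> Dict[str, str]:
    """Hypothetical helper mapping temporary NASA S3 credentials to s3fs argument names.

    Assumes the keys accessKeyId, secretAccessKey and sessionToken.
    """
    return {
        "key": creds["accessKeyId"],
        "secret": creds["secretAccessKey"],
        "token": creds["sessionToken"],
    }


# Dummy values standing in for earthaccess.get_s3_credentials(daac="PODAAC")
creds = {
    "accessKeyId": "AKIA-EXAMPLE",
    "secretAccessKey": "secret-example",
    "sessionToken": "token-example",
    "expiration": "2024-01-01 01:00:00+00:00",
}
kwargs = s3fs_kwargs(creds)
# With s3fs installed and running in us-west-2, one could then do:
#   fs = s3fs.S3FileSystem(anon=False, **kwargs)
print(sorted(kwargs))  # ['key', 'secret', 'token']
```

Since the credentials expire after about an hour, long-running jobs would need to request them again; get_s3_filesystem() already returns a filesystem built this way.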

get_s3_filesystem(daac=None, provider=None, results=None, endpoint=None)

Return an s3fs.S3FileSystem for direct access when running within the AWS us-west-2 region.

Parameters:

Name Type Description Default
daac Optional[str]

Any DAAC short name e.g. NSIDC, GES_DISC

None
provider Optional[str]

Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider.

None
results Optional[DataGranule]

A list of results from search_data(). earthaccess will use the metadata from CMR to obtain the S3 Endpoint.

None
endpoint Optional[str]

URL of a cloud provider credentials endpoint to be used for obtaining AWS S3 access credentials.

None

Returns:

Type Description
S3FileSystem

An authenticated s3fs session valid for 1 hour.

get_s3fs_session(daac=None, provider=None, results=None)

Returns an fsspec-compatible s3fs file system for direct access when running in us-west-2.

Parameters:

Name Type Description Default
daac Optional[str]

Any DAAC short name e.g. NSIDC, GES_DISC

None
provider Optional[str]

Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider.

None
results Optional[DataGranule]

A list of results from search_data(). earthaccess will use the metadata from CMR to obtain the S3 Endpoint.

None

Returns:

Type Description
S3FileSystem

An s3fs.S3FileSystem authenticated for reading in-region in us-west-2 for 1 hour.

granule_query()

Returns a query builder instance for data granules.

Returns:

Type Description
GranuleQuery

a query builder instance for data granules.

login(strategy='all', persist=False, system=PROD)

Authenticate with Earthdata Login (https://urs.earthdata.nasa.gov/).

Attempt to log in via only the specified strategy, unless the "all" strategy is used, in which case each individual strategy is attempted in the following order until one succeeds: "environment", "netrc", "interactive". In that case, login fails only when all strategies fail.

Parameters:

Name Type Description Default
strategy str

An authentication method.

  • "all": try each of the following methods, in order, until one succeeds.
  • "environment": retrieve either an Earthdata login token from the EARTHDATA_TOKEN environment variable, or a username and password pair from the EARTHDATA_USERNAME and EARTHDATA_PASSWORD environment variables (specifying a token takes precedence).
  • "netrc": retrieve username and password from ~/.netrc (or ~/_netrc on Windows), or from the file specified by the NETRC environment variable.
  • "interactive": enter username and password via interactive prompts.
'all'
persist bool

if True, persist credentials to a .netrc file

False
system System

the Earthdata system to access

PROD

Returns:

Type Description
Auth

An instance of Auth.

Raises:

Type Description
LoginAttemptFailure

If the NASA Earthdata Login service rejects credentials.
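The "all" strategy and the token-over-password precedence of the "environment" strategy can be sketched in plain Python. This is an illustration under stated assumptions, not earthaccess's implementation: from_environment, login_all and the tuple credential shapes are hypothetical, the "netrc" and "interactive" strategies are stubbed out, and RuntimeError stands in for the library's own failure handling.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

# Hypothetical credential shapes: ("token", tok) or ("basic", user, password)
Credentials = Tuple[str, ...]
Strategy = Callable[[], Optional[Credentials]]


def from_environment(env: Mapping[str, str]) -> Optional[Credentials]:
    """Read credentials from environment-style variables.

    EARTHDATA_TOKEN takes precedence over EARTHDATA_USERNAME/EARTHDATA_PASSWORD.
    """
    token = env.get("EARTHDATA_TOKEN")
    if token:
        return ("token", token)
    user = env.get("EARTHDATA_USERNAME")
    password = env.get("EARTHDATA_PASSWORD")
    if user and password:
        return ("basic", user, password)
    return None


def login_all(strategies: Sequence[Tuple[str, Strategy]]) -> Tuple[str, Credentials]:
    """Try each named strategy in order and return the first that succeeds."""
    for name, strategy in strategies:
        creds = strategy()
        if creds is not None:
            return name, creds
    # Only when every strategy fails does the overall login fail
    raise RuntimeError("all login strategies failed")


env = {"EARTHDATA_USERNAME": "alice", "EARTHDATA_PASSWORD": "s3cret"}
name, creds = login_all([
    ("environment", lambda: from_environment(env)),
    ("netrc", lambda: None),        # stand-in: no matching .netrc entry
    ("interactive", lambda: None),  # stand-in: prompting is not modeled here
])
print(name, creds[0])  # environment basic
```

In real use these variables would come from os.environ, and a call such as earthaccess.login(strategy="environment") would skip the fallback chain entirely.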

open(granules, provider=None, *, credentials_endpoint=None, show_progress=None, pqdm_kwargs=None, open_kwargs=None)

Returns a list of file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.

Parameters:

Name Type Description Default
granules Union[List[str], List[DataGranule]]

a list of granule instances or list of URLs, e.g. s3://some-granule. If a list of URLs is passed, we need to specify the data provider.

required
provider Optional[str]

e.g. POCLOUD, NSIDC_CPRD, etc.

None
show_progress Optional[bool]

whether or not to display a progress bar. If not specified, defaults to True for interactive sessions (i.e., in a notebook or a python REPL session), otherwise False.

None
pqdm_kwargs Optional[Mapping[str, Any]]

Additional keyword arguments to pass to pqdm, a parallel processing library. See pqdm documentation for available options. Default is to use immediate exception behavior; the number of parallel jobs can be overridden with pqdm's n_jobs option.

None
open_kwargs Optional[Dict[str, Any]]

Additional keyword arguments to pass to fsspec.open, such as cache_type and block_size. Defaults to using blockcache with a block size determined by the file size (4 to 16 MB).

None

Returns:

Type Description
List[AbstractFileSystem]

A list of "file pointers" to remote (i.e. s3 or https) files.

search_data(count=-1, **kwargs)

Search for dataset files (granules) using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

The CMR does not permit queries across all granules in all collections in order to provide fast search responses. Granule queries must target a subset of the collections in the CMR using a condition like provider, provider_id, concept_id, collection_concept_id, short_name, version or entry_title.

Parameters:

Name Type Description Default
count int

Number of records to get, -1 = all

-1
kwargs Dict

arguments to CMR:

  • short_name: (str) Filter granules by product short name; e.g. ATL08
  • version: (str) Filter by dataset version
  • daac: (str) a DAAC code, e.g. NSIDC or PODAAC
  • data_center: (str) An alias for daac
  • provider: (str) Only match granules from a given provider. A DAAC can have more than one provider, e.g. PODAAC and POCLOUD, NSIDC_ECS and NSIDC_CPRD.
  • cloud_hosted: (bool) If True, only match granules hosted in Earthdata Cloud
  • downloadable: (bool) If True, only match granules that are downloadable. A granule is downloadable when it contains at least one RelatedURL of type GETDATA.
  • online_only: (bool) Alias of downloadable
  • orbit_number: (float) Filter granules by the orbit number in which a granule was acquired
  • granule_name: (str) Filter by granule name. Granule name can contain wildcards, e.g. MODGRNLD.*.daily.*
  • instrument: (str) Filter by instrument name, e.g. "ATLAS"
  • platform: (str) Filter by platform, e.g. satellite or plane
  • cloud_cover: (tuple) Filter by cloud cover. The tuple is a range of cloud cover values, e.g. (0, 20). Cloud cover values in metadata may be fractions (e.g. (0., 0.2)) or percentages; the CMR matches the range against the values in the metadata. Note that collections without cloud_cover in their metadata will return zero granules.
  • day_night_flag: (str) Filter for day- and night-time images, accepts 'day', 'night', 'unspecified'.
  • temporal: (tuple) A tuple representing temporal bounds in the form (date_from, date_to). Dates can be datetime objects or ISO 8601 formatted strings. Date strings can be full timestamps, e.g. YYYY-MM-DD HH:mm:ss, or truncated, e.g. YYYY-MM-DD
  • bounding_box: (tuple) Filter granules that intersect a bounding box. A tuple representing spatial bounds in the form (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
  • polygon: (list[tuples]) Filter by polygon. Polygon must be a list of tuples containing longitude-latitude pairs representing polygon vertices. Vertices must be in counter-clockwise order and the final vertex must be the same as the first vertex; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3),(lon4,lat4),(lon1,lat1)]
  • point: (tuple(float,float)) Filter granules intersecting a point, where the point is a longitude-latitude pair; e.g. (lon,lat)
  • line: (list[tuples]) Filter granules that overlap a series of connected points. Points are represented as tuples containing longitude-latitude pairs; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3)]
  • circle: (tuple(float, float, float)) Filter granules that intersect a circle defined as a center point with a radius. Circle parameters are a tuple containing longitude, latitude and radius in meters; e.g. (lon, lat, radius_m). The circle center cannot be the north or south pole. The radius must be between 10 and 6,000,000 m.
{}
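The polygon requirements above (closed ring, counter-clockwise vertices) are easy to violate by hand. A minimal sketch of a pre-processing step, assuming a planar approximation where longitude is x and latitude is y, uses the shoelace formula to detect clockwise rings and reverse them. prepare_polygon and signed_area are hypothetical helpers, not part of earthaccess, and they do not handle polygons that cross the antimeridian.

```python
from typing import List, Tuple

LonLat = Tuple[float, float]


def signed_area(ring: List[LonLat]) -> float:
    """Shoelace formula over a closed ring; positive means counter-clockwise.

    Longitude is treated as x and latitude as y (a planar approximation).
    """
    return 0.5 * sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )


def prepare_polygon(vertices: List[LonLat]) -> List[LonLat]:
    """Close the ring and orient it counter-clockwise."""
    if len(set(vertices)) < 3:
        raise ValueError("a polygon needs at least three distinct vertices")
    ring = list(vertices)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # the final vertex must repeat the first
    if signed_area(ring) < 0:
        ring.reverse()  # clockwise -> counter-clockwise; still closed
    return ring


print(prepare_polygon([(0, 0), (0, 1), (1, 1), (1, 0)]))
# [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
```

The result can then be passed as polygon=... to search_data() or search_datasets().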

Returns:

Type Description
List[DataGranule]

a list of DataGranules that can be used to access the granule files by using download() or open().

Raises:

Type Description
RuntimeError

The CMR query failed.

Examples:

import earthaccess

granules = earthaccess.search_data(
    short_name="ATL06",
    bounding_box=(-46.5, 61.0, -42.5, 63.0),
)
granules = earthaccess.search_data(
    doi="10.5067/SLREF-CDRV2",
    cloud_hosted=True,
    temporal=("2002-01-01", "2002-12-31"),
)
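Since temporal accepts either datetime objects or ISO 8601 strings, a small normalization step can catch reversed ranges before a query is sent. This is a sketch only: temporal_range is a hypothetical helper, it parses truncated dates such as "2002-12-31" as midnight at the start of that day, and mixing timezone-aware and naive datetimes would raise a TypeError on comparison.

```python
from datetime import datetime
from typing import Tuple, Union

DateLike = Union[str, datetime]


def temporal_range(date_from: DateLike, date_to: DateLike) -> Tuple[datetime, datetime]:
    """Parse ISO 8601 strings (full or truncated) and check the bounds are ordered."""
    start = datetime.fromisoformat(date_from) if isinstance(date_from, str) else date_from
    end = datetime.fromisoformat(date_to) if isinstance(date_to, str) else date_to
    if start > end:
        raise ValueError(f"temporal range is reversed: {start} > {end}")
    return start, end


# Full timestamps and truncated dates are both accepted
print(temporal_range("2002-01-01 06:00:00", "2002-12-31"))
```

The returned tuple of datetime objects can be passed directly as temporal=... to search_data() or search_datasets().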

search_datasets(count=-1, **kwargs)

Search datasets (collections) using NASA's CMR.

https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html

Parameters:

Name Type Description Default
count int

Number of records to get, -1 = all

-1
kwargs Dict

arguments to CMR:

  • keyword: (str) Filter collections by keywords. Case-insensitive and supports wildcards ? and *
  • short_name: (str) Filter collections by product short name; e.g. ATL08
  • doi: (str) Filter by DOI
  • daac: (str) Filter by DAAC; e.g. NSIDC or PODAAC
  • data_center: (str) An alias for daac
  • provider: (str) Filter by data provider; each DAAC can have more than one provider, e.g. POCLOUD, PODAAC, etc.
  • has_granules: (bool) If True, only return collections with granules. Default: True
  • temporal: (tuple) A tuple representing temporal bounds in the form (date_from, date_to). Dates can be datetime objects or ISO 8601 formatted strings. Date strings can be full timestamps, e.g. YYYY-MM-DD HH:mm:ss, or truncated, e.g. YYYY-MM-DD
  • bounding_box: (tuple) Filter collections that intersect a bounding box. A tuple representing spatial bounds in the form (lower_left_lon, lower_left_lat, upper_right_lon, upper_right_lat)
  • polygon: (List[tuples]) Filter by polygon. Polygon must be a list of tuples containing longitude-latitude pairs representing polygon vertices. Vertices must be in counter-clockwise order and the final vertex must be the same as the first vertex; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3),(lon4,lat4),(lon1,lat1)]
  • point: (Tuple[float,float]) Filter collections intersecting a point, where the point is a longitude-latitude pair; e.g. (lon,lat)
  • line: (List[tuples]) Filter collections that overlap a series of connected points. Points are represented as tuples containing longitude-latitude pairs; e.g. [(lon1,lat1),(lon2,lat2),(lon3,lat3)]
  • circle: (List[float, float, float]) Filter collections that intersect a circle defined as a center point with a radius. Circle parameters are a list containing longitude, latitude and radius in meters; e.g. [lon, lat, radius_m]. The circle center cannot be the north or south pole. The radius must be between 10 and 6,000,000 m.
  • cloud_hosted: (bool) Return only collections hosted on Earthdata Cloud. Default: True
  • downloadable: (bool) If True, only return collections that can be downloaded from an online archive
  • concept_id: (str) Filter by Concept ID; e.g. C3151645377-NSIDC_CPRD
  • instrument: (str) Filter by Instrument name; e.g. ATLAS
  • project: (str) Filter by project or campaign name; e.g. ABOVE
  • fields: (List[str]) Return only the UMM fields listed in this parameter
  • revision_date: (tuple(str, str)) Filter collections whose revision date falls within the range
  • debug: (bool) If True prints CMR request. Default: True
{}
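The circle constraints listed above can be checked locally before querying. A minimal sketch, assuming standard coordinate ranges (longitude in [-180, 180], latitude strictly between the poles); validate_circle is a hypothetical helper that returns the [lon, lat, radius_m] list form used by search_datasets().

```python
from typing import List


def validate_circle(lon: float, lat: float, radius_m: float) -> List[float]:
    """Check a circle filter against the documented constraints."""
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    if not -90.0 < lat < 90.0:
        # This also rejects the poles, which cannot be used as a circle center
        raise ValueError(f"latitude out of range or at a pole: {lat}")
    if not 10 <= radius_m <= 6_000_000:
        raise ValueError(f"radius must be between 10 and 6,000,000 m: {radius_m}")
    return [lon, lat, radius_m]


print(validate_circle(-105.0, 40.0, 50_000))  # [-105.0, 40.0, 50000]
```

For search_data(), which documents the circle as a tuple, the list can be converted with tuple(...).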

Returns:

Type Description
List[DataCollection]

A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc.

Raises:

Type Description
RuntimeError

The CMR query failed.

Examples:

datasets = earthaccess.search_datasets(
    keyword="sea surface anomaly",
    cloud_hosted=True
)
results = earthaccess.search_datasets(
    daac="NSIDC",
    bounding_box=(-73., 58., -10., 84.),
)
results = earthaccess.search_datasets(
    instrument="ATLAS",
    bounding_box=(-73., 58., -10., 84.),
    temporal=("2024-09-01", "2025-04-30"),
)

search_services(count=-1, **kwargs)

Search the NASA CMR for Services matching criteria.

See https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service.

Parameters:

Name Type Description Default
count int

maximum number of services to fetch (if less than 1, all services matching specified criteria are fetched [default])

-1
kwargs Any

keyword arguments accepted by the CMR for searching services

{}

Returns:

Type Description
List[Any]

list of services (possibly empty) matching specified criteria, in UMM JSON format

Examples:

import earthaccess

services = earthaccess.search_services(provider="POCLOUD", keyword="COG")

status(system=PROD, raise_on_outage=False)

Get the statuses of NASA's Earthdata services.

Parameters:

Name Type Description Default
system System

The Earthdata system to access, defaults to PROD.

PROD
raise_on_outage bool

If True, raises exception on errors or outages.

False

Returns:

Type Description
dict[str, str]

A dictionary containing the statuses of Earthdata services.

Examples:

>>> earthaccess.status()
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}
>>> earthaccess.status(earthaccess.UAT)
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}

Raises:

Type Description
ServiceOutage

if at least one service status is not "OK"
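The raise_on_outage behavior amounts to checking whether any status differs from "OK". The sketch below reproduces that check on a plain dictionary; check_statuses is a hypothetical helper and the ServiceOutage class here is a local stand-in, not the library's exception.

```python
from typing import Mapping


class ServiceOutage(Exception):
    """Local stand-in for the exception raised when a service is not OK."""


def check_statuses(statuses: Mapping[str, str], raise_on_outage: bool = False) -> Mapping[str, str]:
    """Return the statuses, raising if any service is not "OK" and raise_on_outage is set."""
    degraded = {name: s for name, s in statuses.items() if s != "OK"}
    if degraded and raise_on_outage:
        raise ServiceOutage(f"services not OK: {degraded}")
    return statuses


healthy = {"Earthdata Login": "OK", "Common Metadata Repository": "OK"}
print(check_statuses(healthy, raise_on_outage=True) == healthy)  # True
```

With raise_on_outage=False the statuses are simply returned, which suits dashboards; raising suits scripts that should stop early when CMR or EDL is unavailable.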

Virtual dataset utilities for cloud-native access to NASA Earthdata granules.

This subpackage provides tools for creating virtual xarray Datasets from NASA Earthdata granules without downloading data, using VirtualiZarr parsers.

Public API

  • virtualize — create a virtual (or loaded) xarray Dataset from granules.
  • SUPPORTED_PARSERS — frozenset of recognised parser name strings.
  • get_granule_credentials_endpoint_and_region — resolve S3 credentials for a granule (useful for advanced direct-access workflows).

Requires the earthaccess[virtualizarr] optional extra.

get_granule_credentials_endpoint_and_region(granule)

Return the S3 credentials endpoint and region for a granule.

The endpoint is read from the granule's UMM-G record first. If absent, a CMR collection query is performed and the information is taken from the UMM-C record.

Parameters:

Name Type Description Default
granule DataGranule

A single DataGranule object.

required

Returns:

Type Description
tuple[str, str]

A tuple of (credentials_endpoint, region). region defaults to "us-west-2" when not present in the collection record.

Raises:

Type Description
ValueError

If no S3CredentialsAPIEndpoint can be resolved from either the granule or its parent collection.
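The lookup order described above (granule record first, collection record only if needed, region defaulting to "us-west-2") can be sketched with plain dictionaries. This is an illustration under assumptions: resolve_endpoint_and_region is hypothetical, the flat key names S3CredentialsAPIEndpoint and Region are modeled on UMM-C's DirectDistributionInformation, and the CMR collection query is replaced by a callable that is invoked only when the granule lacks the endpoint.

```python
from typing import Callable, Mapping, Optional, Tuple

Record = Mapping[str, str]


def resolve_endpoint_and_region(
    granule_info: Optional[Record],
    fetch_collection_info: Callable[[], Optional[Record]],
    default_region: str = "us-west-2",
) -> Tuple[str, str]:
    """Resolve (credentials_endpoint, region), falling back to the collection."""
    granule_info = granule_info or {}
    endpoint = granule_info.get("S3CredentialsAPIEndpoint")
    region = granule_info.get("Region")
    if endpoint is None:
        # Only query the parent collection when the granule lacks the endpoint
        collection_info = fetch_collection_info() or {}
        endpoint = collection_info.get("S3CredentialsAPIEndpoint")
        region = region or collection_info.get("Region")
    if endpoint is None:
        raise ValueError("no S3CredentialsAPIEndpoint in granule or collection")
    return endpoint, region or default_region


collection = {"S3CredentialsAPIEndpoint": "https://example.invalid/s3credentials"}
print(resolve_endpoint_and_region({}, lambda: collection))
# ('https://example.invalid/s3credentials', 'us-west-2')
```

Deferring the collection lookup behind a callable keeps the common case (endpoint present in the granule) free of an extra CMR round trip.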

virtualize(granules, *, access='direct', load=False, group='/', concat_dim=None, preprocess=None, data_vars='all', coords='different', compat='no_conflicts', combine_attrs='drop_conflicts', parallel='dask', parser='DMRPPParser', reference_dir=None, reference_format='json', **xr_combine_kwargs)

Create a virtual xarray Dataset from NASA Earthdata granules.

Uses VirtualiZarr to open granules as virtual datasets backed by cloud object storage without downloading data. By default returns a virtual dataset (load=False); set load=True to return a concrete lazily-loaded xarray Dataset via a kerchunk round-trip.

The parser controls which VirtualiZarr backend reads the files. The default "DMRPPParser" is the fastest option and uses NASA pre-computed DMR++ sidecar files. When those sidecars are absent, earthaccess automatically falls back to "HDFParser" and emits a UserWarning.

Parameters:

Name Type Description Default
granules list[DataGranule]

One or more DataGranule objects from earthaccess.search_data().

required
access AccessType

Cloud access mode. "direct" uses S3 (fastest inside AWS us-west-2); "indirect" uses HTTPS (works anywhere).

'direct'
load bool

When False (default) returns a virtual dataset with ManifestArray variables. When True materialises the references via a kerchunk round-trip and returns a concrete, lazily-loaded xr.Dataset backed by dask arrays.

False
group str

HDF5/NetCDF4 group path to open. Defaults to the root group "/".

'/'
concat_dim str | None

Dimension name used to concatenate granules. Required when len(granules) > 1.

None
preprocess Callable[[Dataset], Dataset] | None

Optional callable applied to each single-granule virtual dataset before combining.

None
data_vars DataVarsType

Forwarded to xarray.combine_nested.

'all'
coords str

Forwarded to xarray.combine_nested.

'different'
compat CompatType

Forwarded to xarray.combine_nested.

'no_conflicts'
combine_attrs CombineAttrsType

Forwarded to xarray.combine_nested.

'drop_conflicts'
parallel ParallelType

Parallelism backend. "dask" (default) wraps opens in dask.delayed; "lithops" uses Lithops; False disables parallelism.

'dask'
parser ParserType

VirtualiZarr parser to use. One of "DMRPPParser" (default), "HDFParser", "NetCDF3Parser", a lowercase alias ("dmrpp", "hdf", "hdf5", "netcdf3"), or a pre-instantiated parser object.

'DMRPPParser'
reference_dir str | None

Directory for kerchunk reference files when load=True. A temporary directory is used when None.

None
reference_format ReferenceFormatType

Serialisation format when load=True. "json" (default) or "parquet".

'json'
**xr_combine_kwargs Any

Additional keyword arguments forwarded to xarray.combine_nested.

{}

Returns:

Type Description
Dataset

An xr.Dataset. With load=False the dataset contains ManifestArray variables; with load=True it contains dask arrays backed by the kerchunk reference store.

Raises:

Type Description
ValueError

If granules is empty.

ValueError

If len(granules) > 1 and concat_dim is None.

ValueError

If parser is an unrecognised string.

ImportError

If earthaccess[virtualizarr] is not installed.
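The ValueError cases above can be mirrored by a small argument check. This sketch is not earthaccess's code: check_virtualize_args and normalize_parser are hypothetical, the alias mapping (in particular "hdf" and "hdf5" both resolving to "HDFParser") is an assumption based on the parser description, and pre-instantiated parser objects are not handled.

```python
from typing import Optional, Sequence

CANONICAL_PARSERS = frozenset({"DMRPPParser", "HDFParser", "NetCDF3Parser"})
# Assumed alias mapping; the real resolution may differ
ALIASES = {"dmrpp": "DMRPPParser", "hdf": "HDFParser", "hdf5": "HDFParser", "netcdf3": "NetCDF3Parser"}


def normalize_parser(name: str) -> str:
    """Map a parser name or lowercase alias to its canonical name."""
    if name in CANONICAL_PARSERS:
        return name
    try:
        return ALIASES[name]
    except KeyError:
        raise ValueError(f"unrecognised parser: {name!r}") from None


def check_virtualize_args(granules: Sequence[object], concat_dim: Optional[str], parser: str) -> str:
    """Apply the documented argument checks and return the canonical parser name."""
    if not granules:
        raise ValueError("granules is empty")
    if len(granules) > 1 and concat_dim is None:
        raise ValueError("concat_dim is required when combining more than one granule")
    return normalize_parser(parser)


print(check_virtualize_args(["g1", "g2"], "time", "hdf5"))  # HDFParser
```

A single granule needs no concat_dim, which matches the documented rule that it is required only when len(granules) > 1.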

Examples:

import earthaccess

earthaccess.login()

granules = earthaccess.search_data(
    count=5,
    temporal=("2024-01-01", "2024-01-05"),
    short_name="MUR-JPL-L4-GLOB-v4.1",
)

# Virtual dataset (no data downloaded); "indirect" uses HTTPS and works anywhere
vds = earthaccess.virtualize(granules, access="indirect", concat_dim="time")
vds.virtualize.to_kerchunk("mur_combined.json", format="json")

# Loaded dataset (kerchunk round-trip, lazy dask arrays); "direct" uses S3
# and is intended for running inside AWS us-west-2
ds = earthaccess.virtualize(granules, access="direct", load=True, concat_dim="time")