earthaccess API
earthaccess is a Python library that simplifies discovery of and access to NASA Earth science data by providing a higher-level abstraction over NASA's Search API (CMR), so that searching for data can be done with simple notation instead of low-level HTTP queries.
This library handles authentication with NASA's OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly, allowing scientists to get to their science in a simpler and faster way and reducing barriers to cloud-based data analysis.
collection_query()
Returns a query builder instance for NASA collections (datasets).
Returns:
| Type | Description |
|---|---|
| CollectionQuery | a query builder instance for data collections. |
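A minimal sketch of chaining conditions on the returned query builder; the `keyword` and `cloud_hosted` methods and the `concept_id` accessor follow the earthaccess query API, so treat them as illustrative rather than exhaustive:

```python
import earthaccess

earthaccess.login()

# Build a collection query step by step instead of a single search_datasets() call.
query = earthaccess.collection_query()
collections = query.keyword("sea surface temperature").cloud_hosted(True).get(10)
for collection in collections:
    print(collection.concept_id())
```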
download(granules, local_path=None, provider=None, threads=8, *, show_progress=None, credentials_endpoint=None, pqdm_kwargs=None, force=False)
Retrieves data granules from a remote storage system. Provide the optional local_path argument to prevent repeated downloads.
- If we run this in the cloud (AWS us-west-2 region), data is moved from S3 to local_path.
- If we run it outside AWS and the dataset is cloud hosted, HTTP(S) links are used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granules | Union[DataGranule, List[DataGranule], str, List[str]] | a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP) | required |
| local_path | Optional[Union[Path, str]] | Local directory to store the remote data granules. If not supplied, defaults to a subdirectory of the current working directory. | None |
| provider | Optional[str] | if we download a list of URLs, we need to specify the provider. | None |
| credentials_endpoint | Optional[str] | S3 credentials endpoint to be used for obtaining temporary S3 credentials. Only required if the metadata doesn't include it, or if we pass URLs to the method instead of DataGranule objects. | None |
| threads | int | number of parallel threads used to download the files; adjust as necessary. | 8 |
| show_progress | Optional[bool] | whether to display a progress bar. | None |
| pqdm_kwargs | Optional[Mapping[str, Any]] | Additional keyword arguments to pass to pqdm, a parallel processing library. See the pqdm documentation for available options. Defaults to immediate exception behavior and the number of jobs specified by the threads parameter. | None |
| force | bool | Force a redownload. By default, existing local files are not overwritten. | False |
Returns:
| Type | Description |
|---|---|
| List[Path] | List of downloaded files |
Raises:
| Type | Description |
|---|---|
| Exception | A file download failed. |
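A short usage sketch, assuming you are logged in and have search results to download; the short_name below is illustrative:

```python
import earthaccess

earthaccess.login()  # uses environment variables or ~/.netrc if available

# The short_name is illustrative; substitute any dataset you have access to.
results = earthaccess.search_data(short_name="ATL06", count=2)
files = earthaccess.download(results, local_path="./data", threads=4)
print(files)  # list of local paths for the downloaded granules
```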
get_edl_token()
Returns the current token used for EDL.
Returns:
| Type | Description |
|---|---|
| str | EDL token |
get_fsspec_https_session()
Returns a fsspec session that can be used to access datafiles across many different DAACs.
Returns:
| Type | Description |
|---|---|
| AbstractFileSystem | An fsspec instance able to access data across DAACs. |
Examples:
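A minimal sketch of using the returned filesystem; the URL is a placeholder, so substitute an HTTPS granule link taken from search_data() results:

```python
import earthaccess

earthaccess.login()
fs = earthaccess.get_fsspec_https_session()

# Placeholder URL; use a real granule link from your search results.
url = "https://data.example.nasa.gov/granule.h5"
with fs.open(url) as f:
    header = f.read(8)  # read the first bytes of the remote file
```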
get_requests_https_session()
Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.
Returns:
| Type | Description |
|---|---|
| Session | An authenticated requests Session instance. |
Examples:
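A minimal sketch of requesting an EDL-protected resource with the authorized session; the URL is a placeholder:

```python
import earthaccess

earthaccess.login()
session = earthaccess.get_requests_https_session()

# Placeholder URL; any EDL-protected resource works here.
url = "https://example.earthdata.nasa.gov/protected/file.nc"
resp = session.get(url)
resp.raise_for_status()
data = resp.content  # raw bytes of the protected file
```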
get_s3_credentials(daac=None, provider=None, results=None)
Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can use the daac name, the provider, or a list of results from earthaccess.search_data(). If we use results, earthaccess will use the metadata on the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| daac | Optional[str] | a DAAC short_name like NSIDC or PODAAC, etc. | None |
| provider | Optional[str] | if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD, etc. | None |
| results | Optional[List[DataGranule]] | List of results from search_data() | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | a dictionary with S3 credentials for the DAAC or provider |
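A usage sketch; the credential key names shown follow NASA's S3-credentials endpoints but are an assumption here, so verify them against your DAAC's response:

```python
import earthaccess

earthaccess.login()

# Credentials expire after one hour; refresh them for long-running jobs.
creds = earthaccess.get_s3_credentials(daac="PODAAC")

# Assumed key names (accessKeyId, secretAccessKey, sessionToken); check your DAAC.
s3_args = {
    "key": creds["accessKeyId"],
    "secret": creds["secretAccessKey"],
    "token": creds["sessionToken"],
}
```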
get_s3_filesystem(daac=None, provider=None, results=None, endpoint=None)
Return an s3fs.S3FileSystem for direct access when running within the AWS us-west-2 region.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| daac | Optional[str] | Any DAAC short name, e.g. NSIDC, GES_DISC | None |
| provider | Optional[str] | Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider. | None |
| results | Optional[DataGranule] | A list of results from search_data(). | None |
| endpoint | Optional[str] | URL of a cloud provider credentials endpoint to be used for obtaining AWS S3 access credentials. | None |
Returns:
| Type | Description |
|---|---|
| S3FileSystem | An authenticated s3fs session valid for 1 hour. |
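A minimal sketch, assuming the code runs inside AWS us-west-2; the bucket path is a placeholder, so take a real s3:// link from granule metadata:

```python
import earthaccess

earthaccess.login()

# Direct S3 access only works from within the AWS us-west-2 region.
fs = earthaccess.get_s3_filesystem(daac="PODAAC")

# Placeholder bucket/prefix; use an s3:// link from your granule metadata.
listing = fs.ls("s3://podaac-ops-cumulus-protected/some-collection/")
```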
get_s3fs_session(daac=None, provider=None, results=None)
Returns a fsspec s3fs file session for direct access when we are in us-west-2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| daac | Optional[str] | Any DAAC short name, e.g. NSIDC, GES_DISC | None |
| provider | Optional[str] | Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider. | None |
| results | Optional[DataGranule] | A list of results from search_data(). | None |
Returns:
| Type | Description |
|---|---|
| S3FileSystem | An authenticated s3fs session valid for 1 hour. |
granule_query()
Returns a query builder instance for data granules.
Returns:
| Type | Description |
|---|---|
| GranuleQuery | a query builder instance for data granules. |
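A sketch of chaining granule conditions on the builder; the `short_name` and `temporal` methods follow the earthaccess/python-cmr query API, so treat the exact chain as illustrative:

```python
import earthaccess

earthaccess.login()

# Chain CMR conditions on the builder, then fetch a bounded number of results.
query = earthaccess.granule_query()
granules = (
    query.short_name("ATL06")
    .temporal("2023-01-01", "2023-01-31")
    .get(5)
)
```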
login(strategy='all', persist=False, system=PROD)
Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).
Attempt to login via only the specified strategy, unless the "all"
strategy is used, in which case each of the individual strategies is
attempted in the following order, until one succeeds: "environment",
"netrc", "interactive". In this case, only when all strategies fail
does login fail.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | str | An authentication method: "all" (default), "interactive", "netrc", or "environment". | 'all' |
| persist | bool | if True, stores the credentials in a .netrc file. | False |
| system | System | the Earthdata system to access | PROD |
Returns:
| Type | Description |
|---|---|
| Auth | An instance of Auth. |
Raises:
| Type | Description |
|---|---|
| LoginAttemptFailure | If the NASA Earthdata Login service rejects credentials. |
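A short sketch of the two common login patterns:

```python
import earthaccess

# Try environment variables, then ~/.netrc, then an interactive prompt.
auth = earthaccess.login(strategy="all")

# Or target a single strategy and persist credentials for later sessions:
# auth = earthaccess.login(strategy="interactive", persist=True)
print(auth.authenticated)
```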
open(granules, provider=None, *, credentials_endpoint=None, show_progress=None, pqdm_kwargs=None, open_kwargs=None)
Returns a list of file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granules | Union[List[str], List[DataGranule]] | a list of granule instances or a list of granule URLs | required |
| provider | Optional[str] | e.g. POCLOUD, NSIDC_CPRD, etc. | None |
| show_progress | Optional[bool] | whether to display a progress bar. | None |
| pqdm_kwargs | Optional[Mapping[str, Any]] | Additional keyword arguments to pass to pqdm, a parallel processing library. See the pqdm documentation for available options. Defaults to immediate exception behavior. | None |
| open_kwargs | Optional[Dict[str, Any]] | Additional keyword arguments to pass to the underlying fsspec open call. | None |
Returns:
| Type | Description |
|---|---|
| List[AbstractFileSystem] | A list of "file pointers" to remote (i.e. s3 or https) files. |
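A common usage sketch, passing the file-like objects straight to xarray; the short_name matches the one used in the virtualize() example below and is illustrative:

```python
import earthaccess
import xarray as xr

earthaccess.login()
results = earthaccess.search_data(short_name="MUR-JPL-L4-GLOB-v4.1", count=2)

# File-like objects work wherever a library accepts open files.
files = earthaccess.open(results)
ds = xr.open_mfdataset(files, combine="by_coords")
```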
search_data(count=-1, **kwargs)
Search for dataset files (granules) using NASA's CMR.
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
The CMR does not permit queries across all granules in all collections in order to provide fast search responses. Granule queries must target a subset of the collections in the CMR using a condition like provider, provider_id, concept_id, collection_concept_id, short_name, version or entry_title.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | Number of records to get; -1 = all | -1 |
| kwargs | Dict | keyword arguments to CMR, e.g. provider, concept_id, short_name, version, temporal. | {} |
Returns:
| Type | Description |
|---|---|
| List[DataGranule] | a list of DataGranules that can be used to access the granule files with download() or open(). |
Raises:
| Type | Description |
|---|---|
| RuntimeError | The CMR query failed. |
Examples:
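A sketch of a typical granule search; the short_name, dates, and bounding box are illustrative:

```python
import earthaccess

results = earthaccess.search_data(
    short_name="ATL06",                   # targets a single collection, as CMR requires
    temporal=("2023-01-01", "2023-01-31"),
    bounding_box=(-10, 20, 10, 50),       # lon_min, lat_min, lon_max, lat_max
    count=10,
)
print(len(results))
```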
search_datasets(count=-1, **kwargs)
Search datasets (collections) using NASA's CMR.
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | Number of records to get; -1 = all | -1 |
| kwargs | Dict | keyword arguments to CMR, e.g. keyword, short_name, doi. | {} |
Returns:
| Type | Description |
|---|---|
| List[DataCollection] | A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | The CMR query failed. |
Examples:
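A sketch of a dataset search; the keyword and cloud_hosted filters are common CMR arguments, shown here as illustrative:

```python
import earthaccess

datasets = earthaccess.search_datasets(
    keyword="sea surface temperature",
    cloud_hosted=True,
    count=5,
)
for dataset in datasets:
    print(dataset.concept_id())
```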
search_services(count=-1, **kwargs)
Search the NASA CMR for Services matching criteria.
See https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | maximum number of services to fetch (if less than 1, all services matching specified criteria are fetched [default]) | -1 |
| kwargs | Any | keyword arguments accepted by the CMR for searching services | {} |
Returns:
| Type | Description |
|---|---|
| List[Any] | list of services (possibly empty) matching specified criteria, in UMM JSON format |
Examples:
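A sketch of a service search; the keyword filter and the record layout (a UMM record with a "Name" field under "umm") are assumptions here, so check the CMR service-search documentation for the accepted parameters:

```python
import earthaccess

# "keyword" is an illustrative filter; consult the CMR docs for the full set.
services = earthaccess.search_services(keyword="harmony", count=10)
for svc in services:
    print(svc["umm"]["Name"])  # assumed UMM-S record layout
```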
status(system=PROD, raise_on_outage=False)
Get the statuses of NASA's Earthdata services.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| system | System | The Earthdata system to access, defaults to PROD. | PROD |
| raise_on_outage | bool | If True, raises exception on errors or outages. | False |
Returns:
| Type | Description |
|---|---|
| dict[str, str] | A dictionary containing the statuses of Earthdata services. |
Examples:
>>> earthaccess.status()
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}
>>> earthaccess.status(earthaccess.UAT)
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}
Raises:
| Type | Description |
|---|---|
| ServiceOutage | if at least one service status is not "OK". |
Virtual dataset utilities for cloud-native access to NASA Earthdata granules.
This subpackage provides tools for creating virtual xarray Datasets from NASA Earthdata granules without downloading data, using VirtualiZarr parsers.
Public API
- virtualize — create a virtual (or loaded) xarray Dataset from granules.
- SUPPORTED_PARSERS — frozenset of recognised parser name strings.
- get_granule_credentials_endpoint_and_region — resolve S3 credentials for a granule (useful for advanced direct-access workflows).
Requires the earthaccess[virtualizarr] optional extra.
get_granule_credentials_endpoint_and_region(granule)
Return the S3 credentials endpoint and region for a granule.
The endpoint is read from the granule's UMM-G record first. If absent, a CMR collection query is performed and the information is taken from the UMM-C record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granule | DataGranule | A single DataGranule. | required |
Returns:
| Type | Description |
|---|---|
| str | The S3 credentials endpoint. |
| str | The AWS region. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no credentials endpoint can be found in the granule or collection metadata. |
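A sketch for advanced direct-access workflows; the top-level import path for this helper is an assumption (it is listed in the subpackage's public API, but the exact export location may differ), and the short_name is illustrative:

```python
import earthaccess

earthaccess.login()
granules = earthaccess.search_data(short_name="MUR-JPL-L4-GLOB-v4.1", count=1)

# Assumed export location; adjust the import path to the subpackage if needed.
endpoint, region = earthaccess.get_granule_credentials_endpoint_and_region(granules[0])

# The resolved endpoint can then feed a credentials request:
creds = earthaccess.get_s3_credentials(results=granules)
```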
virtualize(granules, *, access='direct', load=False, group='/', concat_dim=None, preprocess=None, data_vars='all', coords='different', compat='no_conflicts', combine_attrs='drop_conflicts', parallel='dask', parser='DMRPPParser', reference_dir=None, reference_format='json', **xr_combine_kwargs)
Create a virtual xarray Dataset from NASA Earthdata granules.
Uses VirtualiZarr to open granules as virtual datasets backed by cloud
object storage without downloading data. By default returns a virtual
dataset (load=False); set load=True to return a concrete
lazily-loaded xarray Dataset via a kerchunk round-trip.
The parser controls which VirtualiZarr backend reads the files. The
default "DMRPPParser" is the fastest option and uses NASA pre-computed
DMR++ sidecar files. When those sidecars are absent earthaccess
automatically falls back to "HDFParser" and emits a UserWarning.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granules | list[DataGranule] | One or more DataGranule objects to virtualize. | required |
| access | AccessType | Cloud access mode: "direct" (in-region S3) or "indirect" (HTTPS). | 'direct' |
| load | bool | When True, returns a concrete lazily-loaded xarray Dataset via a kerchunk round-trip; when False, returns a virtual dataset. | False |
| group | str | HDF5/NetCDF4 group path to open. Defaults to the root group "/". | '/' |
| concat_dim | str \| None | Dimension name used to concatenate granules. Required when combining more than one granule. | None |
| preprocess | Callable[[Dataset], Dataset] \| None | Optional callable applied to each single-granule virtual dataset before combining. | None |
| data_vars | DataVarsType | Forwarded to the xarray combine step. | 'all' |
| coords | str | Forwarded to the xarray combine step. | 'different' |
| compat | CompatType | Forwarded to the xarray combine step. | 'no_conflicts' |
| combine_attrs | CombineAttrsType | Forwarded to the xarray combine step. | 'drop_conflicts' |
| parallel | ParallelType | Parallelism backend. | 'dask' |
| parser | ParserType | VirtualiZarr parser to use. One of "DMRPPParser" or "HDFParser". | 'DMRPPParser' |
| reference_dir | str \| None | Directory for kerchunk reference files when load=True. | None |
| reference_format | ReferenceFormatType | Serialisation format for kerchunk references. | 'json' |
| **xr_combine_kwargs | Any | Additional keyword arguments forwarded to the xarray combine step. | {} |
Returns:
| Type | Description |
|---|---|
| Dataset | An xarray.Dataset: virtual when load=False; when load=True, a lazily-loaded Dataset whose arrays are backed by the kerchunk reference store. |
Raises:
| Type | Description |
|---|---|
| ValueError | If an argument is invalid, e.g. concat_dim is missing when combining multiple granules, or parser is not a recognised parser name. |
| ImportError | If the earthaccess[virtualizarr] optional extra is not installed. |
Examples:
import earthaccess
granules = earthaccess.search_data(
count=5,
temporal=("2024-01-01", "2024-01-05"),
short_name="MUR-JPL-L4-GLOB-v4.1",
)
# Virtual dataset (no data downloaded)
vds = earthaccess.virtualize(granules, access="indirect", concat_dim="time")
vds.virtualize.to_kerchunk("mur_combined.json", format="json")
# Loaded dataset (kerchunk round-trip, lazy dask arrays)
ds = earthaccess.virtualize(granules, access="direct", load=True, concat_dim="time")