earthaccess API
earthaccess is a Python library that simplifies discovery of and access to NASA Earth science data by providing a higher-level abstraction over NASA's Search API (CMR), so that searching for data can be done with simple notation instead of low-level HTTP queries.
This library handles authentication with NASA's OAuth2 API (EDL) and provides HTTP and AWS S3 sessions that can be used with xarray and other PyData libraries to access NASA EOSDIS datasets directly, allowing scientists to get to their science in a simpler and faster way and reducing barriers to cloud-based data analysis.
collection_query()
Returns a query builder instance for NASA collections (datasets).
Returns:
| Type | Description |
|---|---|
| CollectionQuery | a query builder instance for data collections. |
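A minimal sketch of chaining conditions on the returned query builder; the `keyword` and `cloud_hosted` methods and the `concept_id` accessor follow the earthaccess query API, so treat them as illustrative rather than exhaustive:

```python
import earthaccess

earthaccess.login()

# Build a collection query step by step instead of a single search_datasets() call.
query = earthaccess.collection_query()
collections = query.keyword("sea surface temperature").cloud_hosted(True).get(10)
for collection in collections:
    print(collection.concept_id())
```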
download(granules, local_path=None, provider=None, threads=8, *, show_progress=None, credentials_endpoint=None, pqdm_kwargs=None, force=False)
Retrieves data granules from a remote storage system. Provide the optional local_path argument to prevent repeated downloads.
- If we run this in the cloud (AWS us-west-2 region), data is moved from S3 to local_path.
- If we run it outside AWS and the dataset is cloud hosted, HTTP(S) links are used.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granules | Union[DataGranule, List[DataGranule], str, List[str]] | a granule, list of granules, a granule link (HTTP), or a list of granule links (HTTP) | required |
| local_path | Optional[Union[Path, str]] | Local directory to store the remote data granules. If not supplied, defaults to a subdirectory of the current working directory. | None |
| provider | Optional[str] | if we download a list of URLs, we need to specify the provider. | None |
| credentials_endpoint | Optional[str] | S3 credentials endpoint to be used for obtaining temporary S3 credentials. Only required if the metadata doesn't include it, or if we pass URLs to the method instead of DataGranule objects. | None |
| threads | int | number of parallel threads used to download the files; adjust as necessary. | 8 |
| show_progress | Optional[bool] | whether to display a progress bar. | None |
| pqdm_kwargs | Optional[Mapping[str, Any]] | Additional keyword arguments to pass to pqdm, a parallel processing library. See the pqdm documentation for available options. Defaults to immediate exception behavior and the number of jobs specified by the threads parameter. | None |
| force | bool | Force a redownload. By default, existing local files are not overwritten. | False |
Returns:
| Type | Description |
|---|---|
| List[Path] | List of downloaded files |
Raises:
| Type | Description |
|---|---|
| Exception | A file download failed. |
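A short usage sketch, assuming you are logged in and have search results to download; the short_name below is illustrative:

```python
import earthaccess

earthaccess.login()  # uses environment variables or ~/.netrc if available

# The short_name is illustrative; substitute any dataset you have access to.
results = earthaccess.search_data(short_name="ATL06", count=2)
files = earthaccess.download(results, local_path="./data", threads=4)
print(files)  # list of local paths for the downloaded granules
```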
get_edl_token()
Returns the current token used for EDL.
Returns:
| Type | Description |
|---|---|
| str | EDL token |
get_fsspec_https_session()
Returns a fsspec session that can be used to access datafiles across many different DAACs.
Returns:
| Type | Description |
|---|---|
| AbstractFileSystem | An fsspec instance able to access data across DAACs. |
Examples:
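A minimal sketch of using the returned filesystem; the URL is a placeholder, so substitute an HTTPS granule link taken from search_data() results:

```python
import earthaccess

earthaccess.login()
fs = earthaccess.get_fsspec_https_session()

# Placeholder URL; use a real granule link from your search results.
url = "https://data.example.nasa.gov/granule.h5"
with fs.open(url) as f:
    header = f.read(8)  # read the first bytes of the remote file
```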
get_requests_https_session()
Returns a requests Session instance with an authorized bearer token. This is useful for making requests to restricted URLs, such as data granules or services that require authentication with NASA EDL.
Returns:
| Type | Description |
|---|---|
| Session | An authenticated requests Session instance. |
Examples:
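A minimal sketch of requesting an EDL-protected resource with the authorized session; the URL is a placeholder:

```python
import earthaccess

earthaccess.login()
session = earthaccess.get_requests_https_session()

# Placeholder URL; any EDL-protected resource works here.
url = "https://example.earthdata.nasa.gov/protected/file.nc"
resp = session.get(url)
resp.raise_for_status()
data = resp.content  # raw bytes of the protected file
```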
get_s3_credentials(daac=None, provider=None, results=None)
Returns temporary (1 hour) credentials for direct access to NASA S3 buckets. We can use the daac name, the provider, or a list of results from earthaccess.search_data(). If we use results, earthaccess will use the metadata on the response to get the credentials, which is useful for missions that do not use the same endpoint as their DAACs, e.g. SWOT.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| daac | Optional[str] | a DAAC short_name like NSIDC or PODAAC, etc. | None |
| provider | Optional[str] | if we know the provider for the DAAC, e.g. POCLOUD, LPCLOUD, etc. | None |
| results | Optional[List[DataGranule]] | List of results from search_data() | None |
Returns:
| Type | Description |
|---|---|
| Dict[str, Any] | a dictionary with S3 credentials for the DAAC or provider |
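A usage sketch; the credential key names shown follow NASA's S3-credentials endpoints but are an assumption here, so verify them against your DAAC's response:

```python
import earthaccess

earthaccess.login()

# Credentials expire after one hour; refresh them for long-running jobs.
creds = earthaccess.get_s3_credentials(daac="PODAAC")

# Assumed key names (accessKeyId, secretAccessKey, sessionToken); check your DAAC.
s3_args = {
    "key": creds["accessKeyId"],
    "secret": creds["secretAccessKey"],
    "token": creds["sessionToken"],
}
```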
get_s3_filesystem(daac=None, provider=None, results=None, endpoint=None)
Return an s3fs.S3FileSystem for direct access when running within the AWS us-west-2 region.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| daac | Optional[str] | Any DAAC short name, e.g. NSIDC, GES_DISC | None |
| provider | Optional[str] | Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider. | None |
| results | Optional[DataGranule] | A list of results from search_data(). | None |
| endpoint | Optional[str] | URL of a cloud provider credentials endpoint to be used for obtaining AWS S3 access credentials. | None |
Returns:
| Type | Description |
|---|---|
| S3FileSystem | An authenticated s3fs session valid for 1 hour. |
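A minimal sketch, assuming the code runs inside AWS us-west-2; the bucket path is a placeholder, so take a real s3:// link from granule metadata:

```python
import earthaccess

earthaccess.login()

# Direct S3 access only works from within the AWS us-west-2 region.
fs = earthaccess.get_s3_filesystem(daac="PODAAC")

# Placeholder bucket/prefix; use an s3:// link from your granule metadata.
listing = fs.ls("s3://podaac-ops-cumulus-protected/some-collection/")
```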
get_s3fs_session(daac=None, provider=None, results=None)
Returns a fsspec s3fs file session for direct access when we are in us-west-2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| daac | Optional[str] | Any DAAC short name, e.g. NSIDC, GES_DISC | None |
| provider | Optional[str] | Each DAAC can have a cloud provider. If the DAAC is specified, there is no need to use provider. | None |
| results | Optional[DataGranule] | A list of results from search_data(). | None |
Returns:
| Type | Description |
|---|---|
| S3FileSystem | An authenticated s3fs session valid for 1 hour. |
granule_query()
Returns a query builder instance for data granules.
Returns:
| Type | Description |
|---|---|
| GranuleQuery | a query builder instance for data granules. |
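A sketch of chaining granule conditions on the builder; the `short_name` and `temporal` methods follow the earthaccess/python-cmr query API, so treat the exact chain as illustrative:

```python
import earthaccess

earthaccess.login()

# Chain CMR conditions on the builder, then fetch a bounded number of results.
query = earthaccess.granule_query()
granules = (
    query.short_name("ATL06")
    .temporal("2023-01-01", "2023-01-31")
    .get(5)
)
```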
login(strategy='all', persist=False, system=PROD)
Authenticate with Earthdata login (https://urs.earthdata.nasa.gov/).
Attempt to login via only the specified strategy, unless the "all"
strategy is used, in which case each of the individual strategies is
attempted in the following order, until one succeeds: "environment",
"netrc", "interactive". In this case, only when all strategies fail
does login fail.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| strategy | str | An authentication method: "all" (default), "interactive", "netrc", or "environment". | 'all' |
| persist | bool | if True, stores the credentials in a .netrc file. | False |
| system | System | the Earthdata system to access | PROD |
Returns:
| Type | Description |
|---|---|
| Auth | An instance of Auth. |
Raises:
| Type | Description |
|---|---|
| LoginAttemptFailure | If the NASA Earthdata Login service rejects credentials. |
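A short sketch of the two common login patterns:

```python
import earthaccess

# Try environment variables, then ~/.netrc, then an interactive prompt.
auth = earthaccess.login(strategy="all")

# Or target a single strategy and persist credentials for later sessions:
# auth = earthaccess.login(strategy="interactive", persist=True)
print(auth.authenticated)
```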
open(granules, provider=None, *, credentials_endpoint=None, show_progress=None, pqdm_kwargs=None, open_kwargs=None)
Returns a list of file-like objects that can be used to access files hosted on S3 or HTTPS by third party libraries like xarray.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granules | Union[List[str], List[DataGranule]] | a list of granule instances or a list of granule URLs | required |
| provider | Optional[str] | e.g. POCLOUD, NSIDC_CPRD, etc. | None |
| show_progress | Optional[bool] | whether to display a progress bar. | None |
| pqdm_kwargs | Optional[Mapping[str, Any]] | Additional keyword arguments to pass to pqdm, a parallel processing library. See the pqdm documentation for available options. Defaults to immediate exception behavior. | None |
| open_kwargs | Optional[Dict[str, Any]] | Additional keyword arguments to pass to the underlying fsspec open call. | None |
Returns:
| Type | Description |
|---|---|
| List[AbstractFileSystem] | A list of "file pointers" to remote (i.e. s3 or https) files. |
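A common usage sketch, passing the file-like objects straight to xarray; the short_name matches the one used in the virtualize() example below and is illustrative:

```python
import earthaccess
import xarray as xr

earthaccess.login()
results = earthaccess.search_data(short_name="MUR-JPL-L4-GLOB-v4.1", count=2)

# File-like objects work wherever a library accepts open files.
files = earthaccess.open(results)
ds = xr.open_mfdataset(files, combine="by_coords")
```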
search_data(count=-1, **kwargs)
Search for dataset files (granules) using NASA's CMR.
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
The CMR does not permit queries across all granules in all collections in order to provide fast search responses. Granule queries must target a subset of the collections in the CMR using a condition like provider, provider_id, concept_id, collection_concept_id, short_name, version or entry_title.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | Number of records to get; -1 = all | -1 |
| kwargs | Dict | keyword arguments to CMR, e.g. provider, concept_id, short_name, version, temporal. | {} |
Returns:
| Type | Description |
|---|---|
| List[DataGranule] | a list of DataGranules that can be used to access the granule files with download() or open(). |
Raises:
| Type | Description |
|---|---|
| RuntimeError | The CMR query failed. |
Examples:
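A sketch of a typical granule search; the short_name, dates, and bounding box are illustrative:

```python
import earthaccess

results = earthaccess.search_data(
    short_name="ATL06",                   # targets a single collection, as CMR requires
    temporal=("2023-01-01", "2023-01-31"),
    bounding_box=(-10, 20, 10, 50),       # lon_min, lat_min, lon_max, lat_max
    count=10,
)
print(len(results))
```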
search_datasets(count=-1, **kwargs)
Search datasets (collections) using NASA's CMR.
https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | Number of records to get; -1 = all | -1 |
| kwargs | Dict | keyword arguments to CMR, e.g. keyword, short_name, doi. | {} |
Returns:
| Type | Description |
|---|---|
| List[DataCollection] | A list of DataCollection results that can be used to get information about a dataset, e.g. concept_id, doi, etc. |
Raises:
| Type | Description |
|---|---|
| RuntimeError | The CMR query failed. |
Examples:
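A sketch of a dataset search; the keyword and cloud_hosted filters are common CMR arguments, shown here as illustrative:

```python
import earthaccess

datasets = earthaccess.search_datasets(
    keyword="sea surface temperature",
    cloud_hosted=True,
    count=5,
)
for dataset in datasets:
    print(dataset.concept_id())
```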
search_services(count=-1, **kwargs)
Search the NASA CMR for Services matching criteria.
See https://cmr.earthdata.nasa.gov/search/site/docs/search/api.html#service.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| count | int | maximum number of services to fetch (if less than 1, all services matching specified criteria are fetched [default]) | -1 |
| kwargs | Any | keyword arguments accepted by the CMR for searching services | {} |
Returns:
| Type | Description |
|---|---|
| List[Any] | list of services (possibly empty) matching specified criteria, in UMM JSON format |
Examples:
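A sketch of a service search; the keyword filter and the record layout (a UMM record with a "Name" field under "umm") are assumptions here, so check the CMR service-search documentation for the accepted parameters:

```python
import earthaccess

# "keyword" is an illustrative filter; consult the CMR docs for the full set.
services = earthaccess.search_services(keyword="harmony", count=10)
for svc in services:
    print(svc["umm"]["Name"])  # assumed UMM-S record layout
```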
status(system=PROD, raise_on_outage=False)
Get the statuses of NASA's Earthdata services.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| system | System | The Earthdata system to access, defaults to PROD. | PROD |
| raise_on_outage | bool | If True, raises exception on errors or outages. | False |
Returns:
| Type | Description |
|---|---|
| dict[str, str] | A dictionary containing the statuses of Earthdata services. |
Examples:
>>> earthaccess.status()
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}
>>> earthaccess.status(earthaccess.UAT)
{'Earthdata Login': 'OK', 'Common Metadata Repository': 'OK'}
Raises:
| Type | Description |
|---|---|
| ServiceOutage | if at least one service status is not "OK". |
Virtual dataset utilities for cloud-native access to NASA Earthdata granules.
This subpackage provides tools for creating virtual xarray Datasets from NASA Earthdata granules without downloading data, using VirtualiZarr parsers.
Public API
- virtualize — create a virtual (or loaded) xarray Dataset from granules.
- SUPPORTED_PARSERS — frozenset of recognised parser name strings.
- get_granule_credentials_endpoint_and_region — resolve S3 credentials for a granule (useful for advanced direct-access workflows).
Requires the earthaccess[virtualizarr] optional extra.
get_granule_credentials_endpoint_and_region(granule)
Return the S3 credentials endpoint and region for a granule.
The endpoint is read from the granule's UMM-G record first. If absent, a CMR collection query is performed and the information is taken from the UMM-C record.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granule | DataGranule | A single DataGranule. | required |
Returns:
| Type | Description |
|---|---|
| str | The S3 credentials endpoint. |
| str | The AWS region. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no credentials endpoint can be found in the granule or collection metadata. |
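A sketch for advanced direct-access workflows; the top-level import path for this helper is an assumption (it is listed in the subpackage's public API, but the exact export location may differ), and the short_name is illustrative:

```python
import earthaccess

earthaccess.login()
granules = earthaccess.search_data(short_name="MUR-JPL-L4-GLOB-v4.1", count=1)

# Assumed export location; adjust the import path to the subpackage if needed.
endpoint, region = earthaccess.get_granule_credentials_endpoint_and_region(granules[0])

# The resolved endpoint can then feed a credentials request:
creds = earthaccess.get_s3_credentials(results=granules)
```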
virtualize(granules, *, access='direct', load=False, group='/', concat_dim=None, preprocess=None, data_vars='all', coords='different', compat='no_conflicts', combine_attrs='drop_conflicts', parallel='dask', parser='DMRPPParser', reference_dir=None, reference_format='json', **xr_combine_kwargs)
Create a virtual xarray Dataset from NASA Earthdata granules.
Uses VirtualiZarr to open granules as virtual datasets backed by cloud
object storage without downloading data. By default returns a virtual
dataset (load=False); set load=True to return a concrete
lazily-loaded xarray Dataset via a kerchunk round-trip.
The parser controls which VirtualiZarr backend reads the files. The
default "DMRPPParser" is the fastest option and uses NASA pre-computed
DMR++ sidecar files. When those sidecars are absent earthaccess
automatically falls back to "HDFParser" and emits a UserWarning.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| granules | list[DataGranule] | One or more DataGranule objects to virtualize. | required |
| access | AccessType | Cloud access mode: "direct" (in-region S3) or "indirect" (HTTPS). | 'direct' |
| load | bool | When True, returns a concrete lazily-loaded xarray Dataset via a kerchunk round-trip; when False, returns a virtual dataset. | False |
| group | str | HDF5/NetCDF4 group path to open. Defaults to the root group "/". | '/' |
| concat_dim | str \| None | Dimension name used to concatenate granules. Required when combining more than one granule. | None |
| preprocess | Callable[[Dataset], Dataset] \| None | Optional callable applied to each single-granule virtual dataset before combining. | None |
| data_vars | DataVarsType | Forwarded to the xarray combine step. | 'all' |
| coords | str | Forwarded to the xarray combine step. | 'different' |
| compat | CompatType | Forwarded to the xarray combine step. | 'no_conflicts' |
| combine_attrs | CombineAttrsType | Forwarded to the xarray combine step. | 'drop_conflicts' |
| parallel | ParallelType | Parallelism backend. | 'dask' |
| parser | ParserType | VirtualiZarr parser to use. One of "DMRPPParser" or "HDFParser". | 'DMRPPParser' |
| reference_dir | str \| None | Directory for kerchunk reference files when load=True. | None |
| reference_format | ReferenceFormatType | Serialisation format for kerchunk references. | 'json' |
| **xr_combine_kwargs | Any | Additional keyword arguments forwarded to the xarray combine step. | {} |
Returns:
| Type | Description |
|---|---|
| Dataset | An xarray.Dataset: virtual when load=False; when load=True, a lazily-loaded Dataset whose arrays are backed by the kerchunk reference store. |
Raises:
| Type | Description |
|---|---|
| ValueError | If an argument is invalid, e.g. concat_dim is missing when combining multiple granules, or parser is not a recognised parser name. |
| ImportError | If the earthaccess[virtualizarr] optional extra is not installed. |
Examples:
import earthaccess
granules = earthaccess.search_data(
count=5,
temporal=("2024-01-01", "2024-01-05"),
short_name="MUR-JPL-L4-GLOB-v4.1",
)
# Virtual dataset (no data downloaded)
vds = earthaccess.virtualize(granules, access="indirect", concat_dim="time")
vds.virtualize.to_kerchunk("mur_combined.json", format="json")
# Loaded dataset (kerchunk round-trip, lazy dask arrays)
ds = earthaccess.virtualize(granules, access="direct", load=True, concat_dim="time")