Dataset#

Classes and functions for defining, finding, and loading data.

Classes:

Dataset(**facets)

Define datasets, find the related files, and load them.

Data:

INHERITED_FACETS

Inherited facets.

Functions:

datasets_to_recipe(datasets[, recipe])

Create or update a recipe from datasets.

class esmvalcore.dataset.Dataset(**facets: FacetValue)[source]#

Define datasets, find the related files, and load them.

Parameters:: **facets (FacetValue) – Facets describing the dataset. See esmvalcore.esgf.facets.FACETS for the mapping between the facet names used by ESMValCore and those used on ESGF.

supplementaries#

List of supplementary datasets.

Type:: list[Dataset]

facets#

Facets describing the dataset.

Type:: esmvalcore.typing.Facets

Methods:

`add_supplementary`(**facets)	Add a supplementary dataset.
`augment_facets`()	Add additional facets.
`copy`(**facets)	Create a copy.
`find_files`()	Find files.
`from_files`()	Create datasets based on the available files.
`from_ranges`()	Create a list of datasets from short notations.
`from_recipe`(recipe, session)	Read datasets from a recipe.
`load`()	Load dataset.
`set_facet`(key, value[, persist])	Set facet.
`set_version`()	Set the `'version'` facet based on the available data.
`summary`([shorten])	Summarize the content of the dataset.

Attributes:

`files`	The files associated with this dataset.
`input_datasets`	Get input datasets.
`minimal_facets`	Return a dictionary with the persistent facets.
`session`	An `esmvalcore.config.Session` associated with the dataset.

add_supplementary(**facets: FacetValue) → None[source]#

Add a supplementary dataset.

This is a convenience function that will create a copy of the current dataset, update its facets with the values specified in **facets, and append it to Dataset.supplementaries. For more control over the creation of the supplementary dataset, first create a new Dataset describing the supplementary dataset and then append it to Dataset.supplementaries.

Parameters:: **facets (FacetValue) – Facets describing the supplementary variable.
Return type:: None

augment_facets() → None[source]#

Add additional facets.

This function will update the dataset with additional facets from various sources.

Return type:: None

copy(**facets: FacetValue) → Dataset[source]#

Create a copy.

Parameters:: **facets (FacetValue) – Update these facets in the copy. Note that for supplementary datasets attached to the dataset, the 'short_name' and 'mip' facets will not be updated with these values.
Returns:: A copy of the dataset.
Return type:: Dataset

property files: list[ESGFFile | LocalFile]#: The files associated with this dataset.

find_files() → None[source]#

Find files.

Look for files and populate the Dataset.files property of the dataset and its supplementary datasets.

Return type:: None

from_files() → Iterator[Dataset][source]#

Create datasets based on the available files.

The facet values for local files are retrieved from the directory tree, where the directories represent the facet values. See CMIP data for more information on this kind of file organization.

glob.glob() patterns can be used as facet values to select multiple datasets. If for some of the datasets not all glob patterns can be expanded (e.g. because the required facet values cannot be inferred from the directory names), these datasets will be ignored, unless this happens to be all datasets.
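
As a rough illustration of how a glob pattern used as a facet value selects several datasets, the sketch below matches patterns against a hand-written list of facet dictionaries using the standard library fnmatch module. The list available and the helper matching are hypothetical; ESMValCore itself discovers candidates from files, which this sketch does not attempt to reproduce.

```python
from fnmatch import fnmatchcase

# Hypothetical facet combinations, standing in for what was found on disk.
available = [
    {"dataset": "CanESM5", "ensemble": "r1i1p1f1"},
    {"dataset": "CanESM5", "ensemble": "r2i1p1f1"},
    {"dataset": "MIROC6", "ensemble": "r1i1p1f1"},
]


def matching(candidates: list[dict], **patterns: str) -> list[dict]:
    """Keep the candidates whose facet values match every glob pattern."""
    return [
        candidate
        for candidate in candidates
        if all(
            fnmatchcase(str(candidate.get(facet, "")), pattern)
            for facet, pattern in patterns.items()
        )
    ]


selected = matching(available, dataset="CanESM5", ensemble="r*i1p1f1")
for facets in selected:
    print(facets)
```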

If glob.glob() patterns are used in supplementary variables and multiple matching datasets are found, only the supplementary dataset that has most facets in common with the main dataset will be attached.

Supplementary datasets will inherit the facet values from the main dataset for those facets listed in INHERITED_FACETS.

This also works for derived variables. The input datasets that are necessary for derivation can be accessed via Dataset.input_datasets.

Examples

See Discovering data for example use cases.

Yields:: Dataset – Datasets representing the available files.
Return type:: Iterator[Dataset]

from_ranges() → list[Dataset][source]#

Create a list of datasets from short notations.

This expands the 'ensemble' and 'sub_experiment' facets in the dataset definition if they are ranges.

For example, 'ensemble'='r(1:3)i1p1f1' will be expanded to three datasets, with 'ensemble' values 'r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1'.

Returns:: The datasets.
Return type:: list[Dataset]
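
The expansion of the short notation can be sketched in a few lines of plain Python. The helper expand_range is hypothetical and is not the ESMValCore implementation; it assumes a single range of the form (start:end) per value and returns values without a range unchanged.

```python
import re


def expand_range(value: str) -> list[str]:
    """Expand a short notation such as 'r(1:3)i1p1f1' into individual values."""
    match = re.search(r"\((\d+):(\d+)\)", value)
    if match is None:
        # No range in the value: nothing to expand.
        return [value]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = value[: match.start()], value[match.end():]
    # The range is inclusive on both ends.
    return [f"{prefix}{number}{suffix}" for number in range(start, end + 1)]


print(expand_range("r(1:3)i1p1f1"))  # ['r1i1p1f1', 'r2i1p1f1', 'r3i1p1f1']
```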

static from_recipe(recipe: Path | str | dict, session: Session) → list[Dataset][source]#

Read datasets from a recipe.

Parameters:

recipe (Path | str | dict) – Recipe to load the datasets from. The value provided here should be either a path to a file, a recipe file that has been loaded using e.g. yaml.safe_load(), or a str that can be loaded using yaml.safe_load().
session (Session) – The session to associate with the datasets.

Returns:

A list of datasets.

Return type:

list[Dataset]

property input_datasets: list[Dataset]#

Get input datasets.

For non-derived variables (i.e., those with facet derive=False), this will simply return the dataset itself in a list.

For derived variables (i.e., those with facet derive=True), this will return the datasets required for derivation if derivation is necessary, and the dataset itself if derivation is not necessary. Derivation is necessary if the facet force_derivation=True is set or no files for the dataset itself are available.

See also esmvalcore.preprocessor.derive() for an example usage.
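
The rule for when derivation is necessary can be written as a small predicate. This is only a restatement of the two paragraphs above; the function derivation_is_necessary and its files_available argument are hypothetical names, not part of the ESMValCore API.

```python
def derivation_is_necessary(
    derive: bool,
    force_derivation: bool,
    files_available: bool,
) -> bool:
    """Decide whether the input datasets differ from the dataset itself."""
    if not derive:
        # Non-derived variables always use the dataset itself.
        return False
    # Derived variables are derived when forced, or when no files exist
    # for the dataset itself.
    return force_derivation or not files_available


print(derivation_is_necessary(derive=True, force_derivation=False, files_available=True))
```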

load() → Cube[source]#

Load dataset.

Raises:: InputFilesNotFound – When no files were found.
Returns:: An iris cube with the data corresponding to the dataset.
Return type:: iris.cube.Cube

property minimal_facets: Facets#: Return a dictionary with the persistent facets.

property session: Session#: An esmvalcore.config.Session associated with the dataset.

set_facet(key: str, value: FacetValue, persist: bool = True) → None[source]#

Set facet.

Parameters:

key (str) – The name of the facet.
value (FacetValue) – The value of the facet.
persist (bool) – When writing a dataset to a recipe, only persistent facets will get written.

Return type:

None
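
The relation between the persist argument and Dataset.minimal_facets can be sketched with a small stand-in class. The class FacetStore is hypothetical and only models the documented behaviour, on the assumption that persistent facet names are tracked separately from the facet values.

```python
class FacetStore:
    """Minimal stand-in that tracks which facets are persistent."""

    def __init__(self) -> None:
        self.facets: dict[str, object] = {}
        self._persist: set[str] = set()

    def set_facet(self, key: str, value: object, persist: bool = True) -> None:
        """Set a facet and optionally mark it as persistent."""
        self.facets[key] = value
        if persist:
            self._persist.add(key)

    @property
    def minimal_facets(self) -> dict[str, object]:
        """Return only the facets that were marked as persistent."""
        return {k: v for k, v in self.facets.items() if k in self._persist}


store = FacetStore()
store.set_facet("short_name", "tas")
store.set_facet("units", "K", persist=False)
print(store.facets)
print(store.minimal_facets)
```

Only 'short_name' would be written to a recipe in this sketch, because 'units' was set with persist=False.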

set_version() → None[source]#

Set the 'version' facet based on the available data.

Return type:: None

summary(shorten: bool = False) → str[source]#

Summarize the content of the dataset.

Parameters:: shorten (bool) – Shorten the summary.
Returns:: A summary describing the dataset.
Return type:: str

esmvalcore.dataset.INHERITED_FACETS: list[str] = ['dataset', 'domain', 'driver', 'grid', 'project', 'timerange']#

Inherited facets.

Supplementary datasets created based on the available files using the Dataset.from_files() method will inherit the values of these facets from the main dataset.
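
A minimal sketch of this inheritance, using plain dictionaries in place of Dataset objects. The helper inherit_facets is hypothetical, and the sketch assumes that an inherited value replaces any value the supplementary dataset already had for that facet.

```python
INHERITED_FACETS = ["dataset", "domain", "driver", "grid", "project", "timerange"]


def inherit_facets(main: dict, supplementary: dict) -> dict:
    """Return the supplementary facets with inherited values taken from main."""
    result = dict(supplementary)
    for facet in INHERITED_FACETS:
        if facet in main:
            # Assumption: the value from the main dataset takes precedence.
            result[facet] = main[facet]
    return result


main = {
    "short_name": "tas",
    "mip": "Amon",
    "project": "CMIP6",
    "dataset": "CanESM5",
    "grid": "gn",
}
supplementary = {"short_name": "areacella", "mip": "fx"}
print(inherit_facets(main, supplementary))
```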

esmvalcore.dataset.datasets_to_recipe(datasets: Iterable[Dataset], recipe: Path | str | dict[str, Any] | None = None) → dict[source]#

Create or update a recipe from datasets.

Parameters:

datasets (Iterable[Dataset]) – Datasets to use in the recipe.
recipe (Path | str | dict[str, Any] | None) – Recipe to update with the datasets; if None, a new recipe is created. The value provided here should be either a path to a file, a recipe file that has been loaded using e.g. yaml.safe_load(), or a str that can be loaded using yaml.safe_load().

Examples

See Composing recipes for example use cases.

Returns:

The recipe with the datasets. To convert the dict to a recipe, use e.g. yaml.safe_dump().

Return type:

dict

Raises:

RecipeError – Raised when a dataset is missing the diagnostic facet.
