The ASTR schema defines a set of naming and data conventions for creating data objects that can be subset easily with (almost) no loss of information. Such datasets and their subsets can be passed between software and humans without friction. Creating them requires reducing implicit information as far as possible (e.g. the units of a value being defined in a separate column). Moreover, data objects must be machine-readable while remaining human-readable, and must allow direct import of spreadsheet-like formats. The conventions defined below are further influenced by considerations related to the analytical data and their quality, which are explained in this vignette.
The schema is platform and programming language agnostic. You can learn about the implementation of the schema in the ASTR package in the dedicated vignette.
Data preparation
- All variable names are case sensitive.
- Decimals can only be read when indicated by a decimal point (0.5), as opposed to decimal commas (0,5).
- Apostrophes and any other special characters should be avoided in the sample/variable names.
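
As an illustration of the preparation rules above, the following minimal Python sketch rewrites decimal commas as decimal points and flags names that contain apostrophes. It is not part of the ASTR package: the helper names `normalise_decimal` and `names_with_apostrophes` are hypothetical, and the sketch assumes that a cell such as `1,234` is a decimal number rather than a value with a thousands separator.

```python
import re

# Matches a number written with a decimal comma, e.g. "0,5" or "-12,75".
# Assumption: commas are never used as thousands separators in the data.
DECIMAL_COMMA = re.compile(r"^[+-]?\d+,\d+$")


def normalise_decimal(cell: str) -> str:
    """Rewrite a decimal-comma number with a decimal point; leave other cells alone."""
    text = cell.strip()
    if DECIMAL_COMMA.match(text):
        return text.replace(",", ".")
    return cell


def names_with_apostrophes(names):
    """Return the names that contain a straight or typographic apostrophe."""
    return [name for name in names if "'" in name or "\u2019" in name]


row = ["Sample 1", "0,5", "12.3", "n.d."]
print([normalise_decimal(cell) for cell in row])
print(names_with_apostrophes(["Sample 1", "St John's"]))
```

Only apostrophes are checked here because the text does not list which other special characters to avoid; a stricter check would need an explicit list of permitted characters.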
Analytical data columns naming
All columns that contain element and oxide compositions, isotopic values, or ratios derived therefrom as well as analytical uncertainties (“errors”) should be named using the following conventions:
- Only Latin characters are supported in column names, i.e. δ and ε will not be identified.
- The names of oxides and trace elements are self-explanatory (e.g. `SiO2` or `Si`). Total iron, if given, should be expressed as `FeOtot` or `Fe2O3tot`. Loss on ignition, where known, should be expressed as `LOI`. The value “balance” provided by e.g. many pXRF instruments should be expressed as `Bal` or `Balance`.
- Units for values should follow the element or oxide, as `_<unit>`. Units supported include all SI units, as well as ppm, ppb, ppt, ppq, %, wt%, at%, ‰, counts, and cps, noted as such.
- Where known, specify wt% and at% instead of using %, and make the mixture type of concentrations explicit, e.g. write ppm(m/m) instead of ppm. For ‘per mille’, use the symbol ‰.
- For isotopic ratios, use simple forms such as `206Pb/204Pb`, `87Sr/86Sr`, `147Sm/144Nd`, or `d18O`. For isotope ratios expressed in δ-notation and similar expressions, always provide the mass of the isotope in the numerator (e.g. write `e143Nd` instead of `eNd`).
- If your dataset includes columns with additional information such as geolocation, or values derived from analytical data such as Pb isotope age model parameters, ensure the names do not contain a dash (‘-’) or underscore (‘_’) prior to import and specify these columns as context in the `read_ASTR()` function.
- Analytical precision should be indicated in the column name as `_err2SD`, `_errSD`, `_errSE`, or `_err2SE`, without indicating the unit. For absolute analytical uncertainties, the unit will be inferred from the corresponding composition column. Relative analytical uncertainties are indicated by adding a `%` sign, e.g. `_err2SD%`. We strongly recommend that the uncertainty is included in the data table and properly noted following the conventions described.
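
The naming conventions above can be read as a small grammar: an analyte name, an optional unit after the first underscore, or an uncertainty suffix. The Python sketch below illustrates that reading. It is not the implementation used by `read_ASTR()`; the functions `parse_column` and `parse_header` and the example header are hypothetical. It assumes that an uncertainty column is named `<analyte>_err…` without a unit, and that context columns are passed in explicitly.

```python
import re

# Uncertainty suffixes named in the conventions, optionally followed by "%"
# for relative uncertainties.
ERROR_SUFFIX = re.compile(
    r"^(?P<analyte>.+)_err(?P<kind>2SD|SD|2SE|SE)(?P<relative>%?)$"
)


def parse_column(name):
    """Split one analytical column name into analyte, unit and uncertainty type."""
    match = ERROR_SUFFIX.match(name)
    if match:
        return {
            "analyte": match["analyte"],
            # Relative uncertainties are in percent; absolute ones get their
            # unit from the matching composition column later on.
            "unit": "%" if match["relative"] else None,
            "error": match["kind"],
        }
    # Value column: everything after the first underscore is the unit.
    analyte, _, unit = name.partition("_")
    return {"analyte": analyte, "unit": unit or None, "error": None}


def parse_header(names, context=()):
    """Parse all non-context columns and fill in units of absolute uncertainties."""
    parsed = {n: parse_column(n) for n in names if n not in context}
    units = {p["analyte"]: p["unit"] for p in parsed.values() if p["error"] is None}
    for p in parsed.values():
        if p["error"] is not None and p["unit"] is None:
            p["unit"] = units.get(p["analyte"])
    return parsed


# Hypothetical header row following the conventions.
header = ["Sample", "SiO2_wt%", "SiO2_err2SD", "Cu_ppm(m/m)", "Cu_errSD%",
          "206Pb/204Pb", "206Pb/204Pb_err2SE"]
for name, info in parse_header(header, context={"Sample"}).items():
    print(name, info)
```

Because the first underscore separates the analyte from the rest of the name, a context column containing an underscore would be misread as an analytical column, which is consistent with the requirement to remove dashes and underscores from context column names before import.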
| Components | Accepted formats |
|---|---|
| Oxides and elements | `SiO2` ; `Si` |
| Total iron | `FeOtot` ; `Fe2O3tot` |
| Loss on ignition | `LOI` |
| Isotopic ratios | `206Pb/204Pb` ; `87Sr/86Sr` ; `147Sm/144Nd` ; `d18O` |
| Units for value | `_wt%` ; `_at%` |
| Analytical precision | `_err2SD` ; `_errSD` ; `_err2SD%` ; `_errSD%` ; `_errSE` ; `_err2SE` |