To make the most out of the features offered by the ASTR package,
input data should be imported into ASTR objects.
ASTR objects represent archaeometric and geochemical data
in R in a standardized and semantically rich format, to allow for
convenient and safe data analysis. If the input data structure adheres
to the conventions set out in the ASTR schema, then the package can
automatically build valid ASTR objects from it. When
working with the package, many of its functions require only an
ASTR object, making them convenient to use even for users
who consider themselves less experienced with R. This article highlights
some of these functionalities.
Before we start, we want to load the ASTR package:
Data import and inspection
To work with an ASTR object, we have to create one. The
function read_ASTR() imports spreadsheets in a variety of
formats, including Excel files. For this article, we use mock-up
data:
#> Group Cu_wt% Sn_wt% As_wt% Sb_wt% Ag_ppm Ni_ppm d65Cu 206Pb/204Pb
#> 1 A 87.9 9.8 2.39 0.98 136 316 0.17 NA
#> 2 A 92.9 7.3 2.19 0.95 204 455 -0.41 NA
#> 3 A 89.1 8.4 2.14 0.85 203 580 -0.12 NA
#> 4 A 93.8 7.9 2.49 0.90 192 343 -0.23 NA
#> 5 A 94.4 5.5 2.16 0.51 138 496 0.31 NA
#> 6 B 85.5 9.5 2.21 0.74 135 372 -0.05 18.7050
#> 7 B 90.3 6.2 2.04 0.88 158 345 #REF! 18.7062
#> 8 B 93.9 5.2 2.09 0.61 216 564 #REF! 18.6972
#> 9 B 90.5 6.6 1.79 0.66 166 613 #REF! 18.5199
#> 10 B 89.6 9.8 1.65 0.62 314 431 #REF! 18.6772
#> 206Pb/204Pb_err2SD 207Pb/204Pb 207Pb/204Pb_err2SD 208Pb/204Pb
#> 1 NA NA NA NA
#> 2 NA NA NA NA
#> 3 NA NA NA NA
#> 4 NA NA NA NA
#> 5 NA NA NA NA
#> 6 0.00095 15.6380 0.00080 38.492
#> 7 0.00098 15.6613 0.00073 38.447
#> 8 0.00070 15.6352 0.00084 38.443
#> 9 0.00089 15.6111 0.00063 38.088
#> 10 0.00079 15.6244 0.00073 38.065
#> 208Pb/204Pb_err2SD
#> 1 NA
#> 2 NA
#> 3 NA
#> 4 NA
#> 5 NA
#> 6 0.0017
#> 7 0.0013
#> 8 0.0017
#> 9 0.0013
#> 10 0.0012
Note how the column headers are organised according to the ASTR
conventions. This allows the package to understand some of the
semantics of the dataset in the read_ASTR() process, and
correctly represent them in the resulting ASTR object. A
closer look at the mock-up data also reveals (common) problems with the
copper isotope data: it seems that some Excel formulas did not work as
intended and left #REF! entries. read_ASTR()
will read these as NA values. Finally, every
ASTR object requires one column that acts as a unique row
identifier. In this dataset there is no such column, though, only a
Group identifier. read_ASTR() can
automatically turn such a column to a unique identifier.
We can now perform the reading process. As we are working with
mock-up data in an R vignette, we do not read from the file system. We
only need to call an essential internal function of
read_ASTR(): as_ASTR(). It turns R data.frames
to ASTR objects. If we would start from an Excel file we
would call read_ASTR() instead.
To do this in practice we not only have to submit our
data to as_ASTR(), but also set two other
arguments: 1. We have to define one of the columns as the ID column with
id_column, and 2. we explicitly have to mark any column
providing contextual information with context (i.e. no
analytical values, cf. ASTR
schema: Implementation).
data <- as_ASTR(data, id_column = "Group", context = c("Group"))
#> Warning in as_ASTR(data, id_column = "Group", context = c("Group")): Detected
#> multiple data rows with the same ID. They will be renamed consecutively using
#> the following convention: _1, _2, ... _n
#> Warning: Issue when transforming column "d65Cu" to numeric values: NAs
#> introduced by coercion
#> Warning in validate.ASTR(df5, quiet = FALSE): 34 missing values across 7
#> analytical columns
#> Warning in as_ASTR(data, id_column = "Group", context = c("Group")): See the
#> full list of validation output with: ASTR::validate(<your ASTR object>).While ASTR attempts to make data import as easy as possible, it also
aims to be transparent about it. Consequently, as_ASTR()
issues several warnings when confronted with our mock-up data, flagging
(potential) issues:
- The column for the ID did contain non-unique values but ASTR automatically created unique values from it.
- Some values in the column
d65Cuwere replaced withNA; when inspecting the data set, it is clear that these were the Excel#REF!error messages. In general, ASTR replaces by default common and user-defined indicators for missing values and also for indicators of “below detection limit” during import. - Across the dataset, 34 values are missing.
To learn more about the (potential) issues with our dataset, it
suggests to run validate():
validate(data)
#> # A tibble: 7 × 3
#> column count warning
#> <chr> <int> <chr>
#> 1 d65Cu 4 missing values
#> 2 206Pb/204Pb 5 missing values
#> 3 206Pb/204Pb_err2SD 5 missing values
#> 4 207Pb/204Pb 5 missing values
#> 5 207Pb/204Pb_err2SD 5 missing values
#> 6 208Pb/204Pb 5 missing values
#> 7 208Pb/204Pb_err2SD 5 missing valuesThis summary of where to find the missing values comes in handy for large datasets!
In our case, the dataset is fine. We might want to check on the missing values in the copper isotope data because apparently there was something wrong with the Excel file. In case of the lead isotope data, however, they are intentional (they were not measured for samples from group A).
Having a closer look on the imported data set shows, that ASTR did one more thing:
data
#> ASTR table
#> Analytical columns: Cu, Sn, As, Sb, Ag, Ni, d65Cu, 206Pb/204Pb, 206Pb/204Pb_err2SD, 207Pb/204Pb, 207Pb/204Pb_err2SD, 208Pb/204Pb, 208Pb/204Pb_err2SD
#> Contextual columns: Group
#> # A data frame: 10 × 15
#> ID Group Cu Sn As Sb Ag Ni d65Cu `206Pb/204Pb`
#> <chr> <chr> [wtP] [wtP] [wtP] [wtP] [mg/kg] [mg/kg] <dbl> <dbl>
#> 1 A_1 A 87.9 9.8 2.39 0.98 136 316 0.17 NA
#> 2 A_2 A 92.9 7.3 2.19 0.95 204 455 -0.41 NA
#> 3 A_3 A 89.1 8.4 2.14 0.85 203 580 -0.12 NA
#> 4 A_4 A 93.8 7.9 2.49 0.9 192 343 -0.23 NA
#> 5 A_5 A 94.4 5.5 2.16 0.51 138 496 0.31 NA
#> 6 B_1 B 85.5 9.5 2.21 0.74 135 372 -0.05 18.7
#> 7 B_2 B 90.3 6.2 2.04 0.88 158 345 NA 18.7
#> 8 B_3 B 93.9 5.2 2.09 0.61 216 564 NA 18.7
#> 9 B_4 B 90.5 6.6 1.79 0.66 166 613 NA 18.5
#> 10 B_5 B 89.6 9.8 1.65 0.62 314 431 NA 18.7
#> # ℹ 5 more variables: `206Pb/204Pb_err2SD` <dbl>, `207Pb/204Pb` <dbl>,
#> # `207Pb/204Pb_err2SD` <dbl>, `208Pb/204Pb` <dbl>, `208Pb/204Pb_err2SD` <dbl>Our relative unit “ppm” was converted into SI-units and the values
now have a measurement unit R can understand (note that the unit
wt% is written as wtP)1:
units(data$Ag[1])
#> $numerator
#> [1] "mg"
#>
#> $denominator
#> [1] "kg"
#>
#> attr(,"class")
#> [1] "symbolic_units"When looking into the data structure, we also see that it also labelled the different columns as e.g. concentration or isotope ratio:
str(data)
#> ASTR [10 × 15] (S3: ASTR/tbl_df/tbl/data.frame)
#> $ ID : chr [1:10] "A_1" "A_2" "A_3" "A_4" ...
#> ..- attr(*, "ASTR_class")= chr "ASTR_id"
#> $ Group : chr [1:10] "A" "A" "A" "A" ...
#> ..- attr(*, "ASTR_class")= chr "ASTR_context"
#> $ Cu : Units: [wtP] num [1:10] 87.9 92.9 89.1 93.8 94.4 85.5 90.3 93.9 90.5 89.6
#> ..- attr(*, "ASTR_class")= chr "ASTR_concentration"
#> $ Sn : Units: [wtP] num [1:10] 9.8 7.3 8.4 7.9 5.5 9.5 6.2 5.2 6.6 9.8
#> ..- attr(*, "ASTR_class")= chr "ASTR_concentration"
#> $ As : Units: [wtP] num [1:10] 2.39 2.19 2.14 2.49 2.16 2.21 2.04 2.09 1.79 1.65
#> ..- attr(*, "ASTR_class")= chr "ASTR_concentration"
#> $ Sb : Units: [wtP] num [1:10] 0.98 0.95 0.85 0.9 0.51 0.74 0.88 0.61 0.66 0.62
#> ..- attr(*, "ASTR_class")= chr "ASTR_concentration"
#> $ Ag : Units: [mg/kg] num [1:10] 136 204 203 192 138 135 158 216 166 314
#> ..- attr(*, "ASTR_class")= chr "ASTR_concentration"
#> $ Ni : Units: [mg/kg] num [1:10] 316 455 580 343 496 372 345 564 613 431
#> ..- attr(*, "ASTR_class")= chr "ASTR_concentration"
#> $ d65Cu : num [1:10] 0.17 -0.41 -0.12 -0.23 0.31 -0.05 NA NA NA NA
#> ..- attr(*, "ASTR_class")= chr [1:2] "ASTR_isotope" "ASTR_ratio"
#> $ 206Pb/204Pb : num [1:10] NA NA NA NA NA ...
#> ..- attr(*, "ASTR_class")= chr [1:2] "ASTR_isotope" "ASTR_ratio"
#> $ 206Pb/204Pb_err2SD: num [1:10] NA NA NA NA NA 0.00095 0.00098 0.0007 0.00089 0.00079
#> ..- attr(*, "ASTR_class")= chr "ASTR_error"
#> $ 207Pb/204Pb : num [1:10] NA NA NA NA NA ...
#> ..- attr(*, "ASTR_class")= chr [1:2] "ASTR_isotope" "ASTR_ratio"
#> $ 207Pb/204Pb_err2SD: num [1:10] NA NA NA NA NA 0.0008 0.00073 0.00084 0.00063 0.00073
#> ..- attr(*, "ASTR_class")= chr "ASTR_error"
#> $ 208Pb/204Pb : num [1:10] NA NA NA NA NA ...
#> ..- attr(*, "ASTR_class")= chr [1:2] "ASTR_isotope" "ASTR_ratio"
#> $ 208Pb/204Pb_err2SD: num [1:10] NA NA NA NA NA 0.0017 0.0013 0.0017 0.0013 0.0012
#> ..- attr(*, "ASTR_class")= chr "ASTR_error"Subsetting data
With this much information about our data, it is easy to subset data.
Let’s say that, for example, we want to check if our data can be
differentiated into groups based on the chemical composition by creating
biplots of all possible combinations. The
get_ ... _columns()-family comes in very handy here:
pairs(get_concentration_columns(data)[-1])
Note that we have to exclude the first column here, because the first
column of ASTR objects and their subsets is always the ID
column to ensure that data can be clearly assigned to a single sample.
Actually, we might only be interested in the main elements of the metal.
In this case, we can take advantage that these are the only values in
wtP:
pairs(get_unit_columns(data, units = "wtP")[-1])
Unit conversions
While continuing investigating the metal, we may want to estimate its
approximate melting temperature and remember that we can get this
information from a phase diagram. While our values for copper and tin
are in wt%, the phase diagram in front of us is in
at%. Converting from wt% to at%
is probably faster than searching for a version of the phase diagram in
wt%2:
library(magrittr) # import of pipe operator `%>%`
unify_concentration_unit(data, unit = "wtP") %>%
wt_to_at()
#> ASTR table
#> Analytical columns: Cu, Sn, As, Sb, Ag, Ni, d65Cu, 206Pb/204Pb, 206Pb/204Pb_err2SD, 207Pb/204Pb, 207Pb/204Pb_err2SD, 208Pb/204Pb, 208Pb/204Pb_err2SD
#> Contextual columns: Group
#> # A data frame: 10 × 15
#> ID Group Cu Sn As Sb Ag Ni d65Cu `206Pb/204Pb`
#> <chr> <chr> [atP] [atP] [atP] [atP] [atP] [atP] <dbl> <dbl>
#> 1 A_1 A 91.8 5.48 2.12 0.534 0.00837 0.0357 0.17 NA
#> 2 A_2 A 93.6 3.94 1.87 0.500 0.0121 0.0496 -0.41 NA
#> 3 A_3 A 92.9 4.69 1.89 0.462 0.0125 0.0655 -0.12 NA
#> 4 A_4 A 93.2 4.20 2.10 0.467 0.0112 0.0369 -0.23 NA
#> 5 A_5 A 94.9 2.96 1.84 0.267 0.00817 0.0540 0.31 NA
#> 6 B_1 B 92.0 5.47 2.02 0.416 0.00856 0.0434 -0.05 18.7
#> 7 B_2 B 94.2 3.46 1.81 0.479 0.00971 0.0390 NA 18.7
#> 8 B_3 B 95.0 2.82 1.79 0.322 0.0129 0.0618 NA 18.7
#> 9 B_4 B 94.3 3.68 1.58 0.359 0.0102 0.0692 NA 18.5
#> 10 B_5 B 92.7 5.43 1.45 0.335 0.0191 0.0483 NA 18.7
#> # ℹ 5 more variables: `206Pb/204Pb_err2SD` <dbl>, `207Pb/204Pb` <dbl>,
#> # `207Pb/204Pb_err2SD` <dbl>, `208Pb/204Pb` <dbl>, `208Pb/204Pb_err2SD` <dbl>We use a two-step approach here to be more accurate: The actual
conversion from wt% to at% is done by
wt_to_at(). However, because Ag and Ni are not in
wt%, they need to be converted first. That’s why we run
unify_concentration_unit() first: it detects values with
units that can be converted into the target unit and converts them all
into the same unit. While it is nice to be scientifically accurate, we
can get only a rough estimate anyway because of the significant amounts
of arsenic and antimony in the metal. Therefore, converting only copper
and tin would be sufficient:
wt_to_at(data, elements = c("Cu", "Sn"))
#> ASTR table
#> Analytical columns: Cu, Sn, As, Sb, Ag, Ni, d65Cu, 206Pb/204Pb, 206Pb/204Pb_err2SD, 207Pb/204Pb, 207Pb/204Pb_err2SD, 208Pb/204Pb, 208Pb/204Pb_err2SD
#> Contextual columns: Group
#> # A data frame: 10 × 15
#> ID Group Cu Sn As Sb Ag Ni d65Cu `206Pb/204Pb`
#> <chr> <chr> [atP] [atP] [wtP] [wtP] [mg/kg] [mg/kg] <dbl> <dbl>
#> 1 A_1 A 94.4 5.63 2.39 0.98 136 316 0.17 NA
#> 2 A_2 A 96.0 4.04 2.19 0.95 204 455 -0.41 NA
#> 3 A_3 A 95.2 4.80 2.14 0.85 203 580 -0.12 NA
#> 4 A_4 A 95.7 4.31 2.49 0.9 192 343 -0.23 NA
#> 5 A_5 A 97.0 3.02 2.16 0.51 138 496 0.31 NA
#> 6 B_1 B 94.4 5.61 2.21 0.74 135 372 -0.05 18.7
#> 7 B_2 B 96.5 3.55 2.04 0.88 158 345 NA 18.7
#> 8 B_3 B 97.1 2.88 2.09 0.61 216 564 NA 18.7
#> 9 B_4 B 96.2 3.76 1.79 0.66 166 613 NA 18.5
#> 10 B_5 B 94.5 5.53 1.65 0.62 314 431 NA 18.7
#> # ℹ 5 more variables: `206Pb/204Pb_err2SD` <dbl>, `207Pb/204Pb` <dbl>,
#> # `207Pb/204Pb_err2SD` <dbl>, `208Pb/204Pb` <dbl>, `208Pb/204Pb_err2SD` <dbl>Working with data
Another task in our investigation of the copper items is to identify the type of copper alloy our metal is made of. A first rough approach could be to use the classification suggested by Bray et al. (2015). Not surprising, we see that they all belong to the group “As+Sb”:
copper_group <- copper_group_bray(data)
copper_group
#> ASTR table
#> Analytical columns: As, Sb, Ag, Ni
#> Contextual columns: Group, copper_group_bray
#> # A data frame: 10 × 7
#> ID Group As Sb Ag Ni copper_group_bray
#> <chr> <chr> [wtP] [wtP] [wtP] [wtP] <chr>
#> 1 A_1 A 2.39 0.98 0.0136 0.0316 As+Sb
#> 2 A_2 A 2.19 0.95 0.0204 0.0455 As+Sb
#> 3 A_3 A 2.14 0.85 0.0203 0.058 As+Sb
#> 4 A_4 A 2.49 0.9 0.0192 0.0343 As+Sb
#> 5 A_5 A 2.16 0.51 0.0138 0.0496 As+Sb
#> 6 B_1 B 2.21 0.74 0.0135 0.0372 As+Sb
#> 7 B_2 B 2.04 0.88 0.0158 0.0345 As+Sb
#> 8 B_3 B 2.09 0.61 0.0216 0.0564 As+Sb
#> 9 B_4 B 1.79 0.66 0.0166 0.0613 As+Sb
#> 10 B_5 B 1.65 0.62 0.0314 0.0431 As+SbAfter we look into the chemical compositions, it is time to turn to the lead isotope data. Before plotting them, we want to calculate their age model parameters because they give us some idea about the geological setting. The model by Stacey & Kramers (1975) is still the most commonly used one. Because we have lead isotope data only for group B, we exclude data from group A:
data_pb_iso <- data[data$Group == "B", ]
age_model_params <- pb_iso_age_model(data_pb_iso, model = "SK75")
age_model_params
#> ASTR table
#> Analytical columns: 206Pb/204Pb, 207Pb/204Pb, 208Pb/204Pb
#> Contextual columns: Group, model_age_SK75, mu_SK75, kappa_SK75
#> # A data frame: 5 × 8
#> ID Group `206Pb/204Pb` `207Pb/204Pb` `208Pb/204Pb` model_age_SK75 mu_SK75
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 B_1 B 18.7 15.6 38.5 18.7 9.78
#> 2 B_2 B 18.7 15.7 38.4 67.8 9.88
#> 3 B_3 B 18.7 15.6 38.4 18.5 9.77
#> 4 B_4 B 18.5 15.6 38.1 101. 9.70
#> 5 B_5 B 18.7 15.6 38.1 10.2 9.73
#> # ℹ 1 more variable: kappa_SK75 <dbl>Data export
After finishing our analysis of the data, we want to save the copper
type and the parameters of the lead isotope model age to our hard drive.
For this task, ASTR relies on the functions already provided by R and
other R packages (e.g. readr::write_csv()). Before export,
we have to make sure that the information about the units of the values
is preserved by including it back in the column name:
data_unitless <- remove_units(data, recover_unit_names = TRUE)
data_unitless
#> ASTR table
#> Analytical columns: Cu_wt%, Sn_wt%, As_wt%, Sb_wt%, Ag_mg/kg, Ni_mg/kg, d65Cu, 206Pb/204Pb, 206Pb/204Pb_err2SD, 207Pb/204Pb, 207Pb/204Pb_err2SD, 208Pb/204Pb, 208Pb/204Pb_err2SD
#> Contextual columns: Group
#> # A data frame: 10 × 15
#> ID Group `Cu_wt%` `Sn_wt%` `As_wt%` `Sb_wt%` `Ag_mg/kg` `Ni_mg/kg` d65Cu
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 A_1 A 87.9 9.8 2.39 0.98 136 316 0.17
#> 2 A_2 A 92.9 7.3 2.19 0.95 204 455 -0.41
#> 3 A_3 A 89.1 8.4 2.14 0.85 203 580 -0.12
#> 4 A_4 A 93.8 7.9 2.49 0.9 192 343 -0.23
#> 5 A_5 A 94.4 5.5 2.16 0.51 138 496 0.31
#> 6 B_1 B 85.5 9.5 2.21 0.74 135 372 -0.05
#> 7 B_2 B 90.3 6.2 2.04 0.88 158 345 NA
#> 8 B_3 B 93.9 5.2 2.09 0.61 216 564 NA
#> 9 B_4 B 90.5 6.6 1.79 0.66 166 613 NA
#> 10 B_5 B 89.6 9.8 1.65 0.62 314 431 NA
#> # ℹ 6 more variables: `206Pb/204Pb` <dbl>, `206Pb/204Pb_err2SD` <dbl>,
#> # `207Pb/204Pb` <dbl>, `207Pb/204Pb_err2SD` <dbl>, `208Pb/204Pb` <dbl>,
#> # `208Pb/204Pb_err2SD` <dbl>