3 Quality Control Summary

Below are our quality control (QC) summary visualizations and commentary. For additional details on individual QC steps and study-specific cutoffs, please refer to the Appendix.

In summary:

  • 7 slides were profiled; all 7 passed QC and were included in downstream analysis
  • 235 ROI/AOI segments were profiled; 221 passed QC and were included in downstream analysis
  • Of the 14 ROI/AOI segments that failed QC, most failed because of low NGS read counts, which in turn resulted in few detected genes
  • 10,131 genes were detected in at least 10% of ROI/AOI segments

3.1 Segment QC

We first assess sequencing quality and the adequacy of tissue sampling for every ROI/AOI segment.

Below is a study-specific summary of segments that pass the key QC metrics.

Analyst Notes: A total of 6 segments were removed by the above QC. See the Appendix for more details on segment QC parameters.
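Segment QC combines several per-segment checks, and a segment is kept only if it passes all of them. The base R sketch below illustrates that logic. The metric names, toy values, and cutoffs are hypothetical and chosen for illustration only; the study's actual parameters are listed in the Appendix.

```r
# Hypothetical per-segment QC metrics (toy values, not from this study)
seg_qc <- data.frame(
  segment     = c("seg1", "seg2", "seg3"),
  raw_reads   = c(250000, 800, 120000),
  pct_aligned = c(92, 85, 70),
  nuclei      = c(150, 40, 10),
  stringsAsFactors = FALSE
)

# Assumed cutoffs, for illustration only
cutoffs <- list(min_reads = 1000, min_pct_aligned = 80, min_nuclei = 20)

# One logical flag per check: TRUE means the segment fails that check
flags <- data.frame(
  low_reads   = seg_qc$raw_reads   < cutoffs$min_reads,
  low_aligned = seg_qc$pct_aligned < cutoffs$min_pct_aligned,
  low_nuclei  = seg_qc$nuclei      < cutoffs$min_nuclei,
  row.names   = seg_qc$segment
)

# A segment passes only if it fails none of the checks
passing <- rownames(flags)[rowSums(flags) == 0]
print(flags)
print(passing)  # "seg1"
```

Keeping one column per check, rather than a single pass/fail value, makes it easy to tabulate why segments failed, for example how many failed on low reads.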

3.2 Probe QC

Each ROI/AOI segment has several unique negative control probes that in aggregate can be used to estimate background.

Our QC removes outlier negative control probes from the data to refine our estimation of background and downstream gene detection. We remove outlier probes either entirely from the study (global) or from specific segments (local).

Probe QC does not remove endogenous genes from the data.
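One way to flag a global outlier is to compare each negative probe's geometric mean across all segments with the geometric mean of all negative probe counts pooled together. The sketch below assumes a ratio cutoff of 0.1 and uses toy counts. It illustrates the idea and is not necessarily this study's exact pipeline; local (per-segment) outlier detection is not shown.

```r
# Geometric mean (assumes all counts are > 0)
geomean <- function(x) exp(mean(log(x)))

# Toy negative probe counts (hypothetical): rows = probes, columns = segments
neg_counts <- rbind(
  NegProbe_1 = c(100, 120,  90, 110),
  NegProbe_2 = c( 80, 100, 110,  90),
  NegProbe_3 = c(120,  90, 100, 130),
  NegProbe_4 = c(  1,   2,   1,   1)   # consistently far below the others
)

# Each probe's geometric mean across segments, relative to the pooled geometric mean
probe_ratio <- apply(neg_counts, 1, geomean) / geomean(neg_counts)

# Assumed cutoff: probes below 10% of the pooled level are global outliers
min_probe_ratio <- 0.1
global_outliers <- names(probe_ratio)[probe_ratio < min_probe_ratio]
print(global_outliers)  # "NegProbe_4"

# Drop global outliers from every segment, then estimate background per segment
neg_counts_clean <- neg_counts[!rownames(neg_counts) %in% global_outliers, , drop = FALSE]
background <- apply(neg_counts_clean, 2, geomean)
```

Only negative control probes enter this calculation, which is consistent with probe QC leaving endogenous genes untouched.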

Analyst Notes: 18,619 probes passed QC. One global outlier and 22 local outliers were detected.

3.3 Limit of Quantification

We define a limit of quantification (LOQ) per ROI/AOI segment based on the negative control probes to guide the filtering of segments and genes with low signal relative to background. The LOQ in the \(i^{th}\) segment is set \(n\) geometric standard deviations above the geometric mean of that segment's negative probe counts (\(n\) = 2 for this study):

\[\mathrm{LOQ}_{i} = \mathrm{geomean}(\mathrm{NegProbe}_{i}) \times \mathrm{geoSD}(\mathrm{NegProbe}_{i})^{n}\]
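The LOQ formula translates directly into R. In this minimal sketch, the helper names and the toy counts are hypothetical; each column holds one segment's negative probe counts, and all counts are assumed to be positive so that the logarithm is defined.

```r
# Geometric mean and geometric standard deviation (assume all counts > 0)
geomean <- function(x) exp(mean(log(x)))
geosd   <- function(x) exp(sd(log(x)))

# LOQ per segment: rows = negative probes, columns = segments;
# n is the number of geometric SDs (2 in this study)
compute_loq <- function(neg_counts, n = 2) {
  apply(neg_counts, 2, function(x) geomean(x) * geosd(x)^n)
}

# Toy negative probe counts for two segments (hypothetical values)
neg_counts <- cbind(seg1 = c(5, 5, 5), seg2 = c(2, 8, 4))
rownames(neg_counts) <- paste0("NegProbe_", 1:3)

loq <- compute_loq(neg_counts, n = 2)
print(loq)  # approximately seg1 = 5, seg2 = 16
```

Because the geometric SD is never below 1, the LOQ is never below a segment's negative probe geometric mean, and segments with noisier background receive a higher LOQ (here seg2: geomean 4, geoSD 2, so LOQ = 4 × 2²).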

3.3.1 LOQ-based Segment Filtering

Filtering is an important step to focus on the true biological signal of interest. Prior to downstream analysis, it is beneficial to remove ROI/AOI segments that are low in signal relative to background. Below is a breakdown of this study’s ROI/AOI segments with respect to the percentage of genes detected above LOQ.

Analyst Notes: In this dataset, we chose to remove segments in which fewer than 10% of genes were detected above LOQ. As a result, 8 segments were flagged and removed. In total, 14 segments were removed from the study by either segment QC or LOQ-based filtering.
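The segment filter can be sketched by comparing each count against its own segment's LOQ and keeping segments where at least 10% of genes are detected, the cutoff used in this study. The counts and LOQ values below are hypothetical toy data.

```r
# Toy data (hypothetical): 10 genes x 3 segments, plus each segment's LOQ
counts <- cbind(
  seg1 = c(10, 12, 8, 7, 9, 6, 1, 2, 3, 4),
  seg2 = c(20, 1, 2, 3, 4, 5, 1, 2, 3, 4),
  seg3 = c(30, 40, 10, 5, 1, 2, 3, 4, 5, 6)
)
rownames(counts) <- paste0("Gene", 1:10)
loq <- c(seg1 = 5, seg2 = 5, seg3 = 50)

# A gene is detected in a segment when its count is above that segment's LOQ;
# sweep() compares each column with the matching LOQ value
detected <- sweep(counts, 2, loq[colnames(counts)], FUN = ">")

# Percentage of genes detected per segment; keep segments at or above 10%
pct_detected <- 100 * colSums(detected) / nrow(counts)
keep_segments <- colnames(counts)[pct_detected >= 10]
print(pct_detected)   # seg1 60, seg2 10, seg3 0
print(keep_segments)  # "seg1" "seg2"
```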

3.3.2 LOQ-based Gene Filtering

Gene filtering improves the interpretation of true biological signal. Below we plot the number of genes detected above LOQ in at least a given percentage of segments. We typically set a cutoff based on the biological diversity of our dataset.

Analyst Notes: 10,130 targets were detected above LOQ in 10% or more of the segments. We will filter down to this number of targets.
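Gene filtering applies the same detection test along the other axis: for each gene, count the share of segments in which it is above LOQ and keep genes detected in at least 10% of segments. The sketch below uses hypothetical toy data with ten segments, so 10% corresponds to a single segment.

```r
# Toy data (hypothetical): 4 genes x 10 segments, all with an LOQ of 5
counts <- rbind(
  GeneA = c(10, 10, 10, 10, 10, 1, 1, 1, 1, 1),
  GeneB = c( 6,  1,  1,  1,  1, 1, 1, 1, 1, 1),
  GeneC = rep(2, 10),
  GeneD = rep(20, 10)
)
colnames(counts) <- paste0("seg", 1:10)
loq <- setNames(rep(5, 10), colnames(counts))

# Detection matrix: TRUE where a gene's count is above its segment's LOQ
detected <- sweep(counts, 2, loq[colnames(counts)], FUN = ">")

# Percentage of segments in which each gene is detected; keep genes at or above 10%
detection_rate <- 100 * rowSums(detected) / ncol(counts)
keep_genes <- rownames(counts)[detection_rate >= 10]
print(detection_rate)  # GeneA 50, GeneB 10, GeneC 0, GeneD 100
print(keep_genes)      # "GeneA" "GeneB" "GeneD"
```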

3.4 Normalization

The purpose of normalization is to adjust for technical variables, such as ROI/AOI surface area and tissue quality, and enable meaningful biological and statistical discoveries.

Two common methods for normalization of GeoMx® WTA/CTA data are (i) third-quartile (Q3) normalization and (ii) background normalization. Both methods estimate a normalization factor per ROI/AOI segment to bring the segment data distributions together. Q3 is typically the preferred approach.
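A minimal sketch of Q3 normalization follows. It assumes one common convention for the factor: each segment's Q3 divided by the geometric mean of all segments' Q3 values, so that the factors have a geometric mean of 1. That convention and the toy counts are assumptions, not details stated in this report.

```r
# Q3 normalization sketch: counts has one row per gene and one column per segment
q3_normalize <- function(counts) {
  # 75th percentile of each segment
  q3 <- apply(counts, 2, quantile, probs = 0.75, names = FALSE)
  # Per-segment factor, scaled so the factors have a geometric mean of 1
  norm_factor <- q3 / exp(mean(log(q3)))
  # Divide every column by its own factor
  sweep(counts, 2, norm_factor, FUN = "/")
}

# Toy data (hypothetical): seg2 is seg1 at twice the depth
counts <- cbind(seg1 = c(1, 2, 3, 4, 5), seg2 = c(2, 4, 6, 8, 10))
normed <- q3_normalize(counts)
print(apply(normed, 2, quantile, probs = 0.75))  # both about 5.657
```

After scaling, the two segments share the same Q3, which is what "bringing the distributions together" means here. Substituting each segment's negative probe geometric mean for `q3` would give a background-style factor under the same scaling convention.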

Analyst Notes: We observed a high correlation between the geometric mean of the negative control probe counts and the 75th percentile (Q3) of expression. Q3 normalization was used for downstream analysis.

```r
# Packages used below; target_data, factors_of_interest, pal_main, qc_dir and
# the plotPairs() helper are defined earlier in the analysis.
library(Biobase)   # pData(), fData(), exprs()
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Combine annotations, Q3 intensity, and background into a data.frame
df <- pData(target_data) %>% dplyr::select(all_of(factors_of_interest))

# Per-segment 75th percentile (Q3) of expression
q3_intensity <- data.frame(Q3 = apply(exprs(target_data), 2,
                                      quantile, 0.75, na.rm = TRUE))

# dplyr::filter is namespaced so stats::filter is never picked up by mistake
negative_probes <- dplyr::filter(fData(target_data), Negative == TRUE)$TargetName

if (length(negative_probes) < 1L) {
  stop("At least 1 negative probe is expected.")
}

# One column per negative probe (one per panel), one row per segment.
# drop = FALSE keeps a matrix even when a single panel is used, and
# data.frame() makes names syntactic, e.g. "NegProbe-WTX" -> "NegProbe.WTX".
neg_moment <- data.frame(t(exprs(target_data)[negative_probes, , drop = FALSE]))
negative_probes_dots <- colnames(neg_moment)

# Combine, after checking that all three tables list segments in the same order
if (!identical(row.names(df), row.names(q3_intensity)) ||
    !identical(row.names(df), row.names(neg_moment))) {
  stop("Check row names")
}
signal_intensity <- cbind(df, q3_intensity, neg_moment) %>%
  tibble::rownames_to_column(var = "Sample_ID")

signal_intensity_long <-
  tidyr::pivot_longer(signal_intensity,
                      cols = c(Q3, all_of(negative_probes_dots)),
                      names_to = "Metric", values_to = "value")

# Save pairs plots to disk for later download
facs_to_graph <- setdiff(factors_of_interest, "slide name")
pairs_plots <- list()
for (fac in facs_to_graph) {
  p <- plotPairs(dat = signal_intensity %>%
                   dplyr::select(Q3, all_of(negative_probes_dots), all_of(fac)),
                 color_by = fac, color_scale = pal_main[[fac]])
  ggsave(filename = file.path(qc_dir, paste0("pairs_", fac, ".svg")),
         plot = p, device = "svg", width = 10, height = 6)
  pairs_plots[[fac]] <- p
}
```
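The analyst note reports a high correlation between the negative probe geometric mean and Q3. Once per-segment values are available (for example, the `Q3` and negative probe columns of `signal_intensity`), one way to quantify that relationship is a rank correlation. The choice of Spearman correlation and the toy values below are assumptions for illustration; the report does not state which coefficient was used.

```r
# Toy inputs (hypothetical): negative probe counts (rows = probes,
# columns = segments) and each segment's Q3 of expression
neg_counts <- cbind(seg1 = c(2, 2), seg2 = c(4, 4), seg3 = c(3, 12), seg4 = c(8, 8))
q3 <- c(seg1 = 20, seg2 = 35, seg3 = 70, seg4 = 90)

# Background estimate per segment: geometric mean of its negative probe counts
neg_geomean <- apply(neg_counts, 2, function(x) exp(mean(log(x))))

# Rank-based correlation between Q3 and background across segments
rho <- cor(q3[colnames(neg_counts)], neg_geomean, method = "spearman")
print(rho)  # 1, up to floating-point rounding: the segment ranks agree perfectly
```

A rank-based coefficient is less sensitive than Pearson correlation to a few segments with very high counts, which is why it is used in this sketch.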