nCounter Knowledge Base: Data Analysis

nCounter Data Analysis

General

To back up analyses from nSolver software or move them to a new computer on Windows:

First, make hidden folders visible. In Windows 10 or 11, open File Explorer, click “View”, open the “Show” menu, and click “Hidden items”. In Windows 7 or 8, open the Control Panel, use the search bar to find “Folder Options”, and click the “View” tab. Within the “View” tab, select “Show hidden files, folders, and drives”, click Apply, and exit. Hidden files and folders should now be visible.
Navigate to c:\users\<username>\appdata\roaming\nSolver4.
Copy the folder, zip the copy, and save the zip file to a USB drive.
On the computer being transferred to, if nSolver software is already present, find the nSolver4 folder at the above location and rename it (or move it to another location).
Unzip the transferred nSolver4 folder and place it in the same location on the new computer.
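
The copy-and-zip step can also be scripted. The following is a minimal Python sketch, not an official tool: the function name `backup_nsolver_folder` and the demonstration paths are hypothetical, and on a real machine the source would be the nSolver4 folder at the location given above.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def backup_nsolver_folder(source_dir: Path, dest_dir: Path) -> Path:
    """Zip a copy of source_dir into dest_dir and return the archive path."""
    if not source_dir.is_dir():
        raise FileNotFoundError(f"Folder not found: {source_dir}")
    dest_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to the base name it is given.
    archive = shutil.make_archive(
        str(dest_dir / source_dir.name),
        "zip",
        root_dir=source_dir.parent,
        base_dir=source_dir.name,
    )
    return Path(archive)


# Demonstration on a throwaway folder standing in for the real nSolver4 folder.
with tempfile.TemporaryDirectory() as tmp:
    fake_source = Path(tmp) / "nSolver4"
    fake_source.mkdir()
    (fake_source / "example.txt").write_text("placeholder")
    archive_path = backup_nsolver_folder(fake_source, Path(tmp) / "usb")
    with zipfile.ZipFile(archive_path) as zf:
        archived_names = zf.namelist()
    print(archive_path.name, archived_names)
```

The original folder is left untouched, which matches the advice to zip a copy rather than move the folder itself.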

For Mac:

(A) Using Finder
Open Terminal and run the command defaults write com.apple.finder AppleShowAllFiles YES to make invisible files show up in Finder (warning: if you have never done this, expect to see many invisible files and directories).
Copy the nSolver4 directory to be transferred (/Users/<user name>/.nSolver4; don’t forget the preceding dot in .nSolver4) to a flash drive, or zip the copy for transport via file share.
On the other Mac, run the same Terminal command to show hidden folders, rename the existing .nSolver4 directory (e.g., .nSolver4a), then unzip the replacement and drag and drop it into the same location.
When finished, go back to Terminal on both systems and run defaults write com.apple.finder AppleShowAllFiles NO to hide the invisible files again.

(B) Using Terminal
You can also do all of this in Terminal, which means you do not have to tell Finder to show hidden files. Type ls -la to list all files, including hidden ones.
Use the ditto command to copy the directory (copy/paste should work as well).
Making hidden files and folders viewable for Mac:
Open a terminal window by finding “Terminal” in the Utilities folder in the Applications directory. At the prompt copy and paste the following:
defaults write com.apple.finder AppleShowAllFiles TRUE
killall Finder
Press Enter.

Multi-RLF experiments come in two different types: CrossRLF/Batch Calibration and MultiRLF Merge. The CrossRLF/Batch Calibration option allows you to consolidate datasets of primarily distinct samples that were each run on multiple CodeSets (RLFs) or on different CodeSet or reagent lots. At least one calibrator sample must be run across all CodeSets/lots. An ideal calibrator sample has robust counts (>200) for all genes of interest. Some off-the-shelf Panels can be purchased with a pool of synthetic targets that can be used for this purpose (called Panel Standards).

The Multi-RLF Merge option allows for data from a set of identical samples run across two or more CodeSets to be aggregated.

To create a CrossRLF experiment in nSolver software, both RLFs need to be uploaded before creating the experiment. nSolver software will normalize within each RLF (i.e., each batch) and then perform CodeSet calibration using the calibrator samples run in each lot. The user manual covers this process in the section titled “Multi-RLF Experiments & Batch Calibration”.

To use Advanced Analysis with a CrossRLF/batch-corrected experiment, start with the “Normalized Data,” choose a Custom Analysis, and make sure to uncheck the “Normalize mRNA” box in the Normalization module options.

To create a multi-RLF Merge experiment in the Advanced Analysis module, one must first create a basic nSolver experiment for each separate CodeSet. It is critical to ensure that sample names are well annotated so identical samples can be easily matched up in the combined dataset. Since the geNorm algorithm for automatic housekeeping gene selection is not available in the Advanced Analysis module for multi-RLF Merge experiments, normalization should be finalized in the initial experiment before proceeding to the next stage. If housekeeping genes have been validated, these can be picked manually in each respective basic nSolver experiment. Alternatively, each experiment should be run separately through the Advanced Analysis module with the sole purpose of identifying the most stably expressed targets using the included Normalization module. Thereafter, the basic nSolver experiment should be re-run using these selected genes for normalization.

Next, upload the RLF files for each CodeSet into nSolver software.

Create a multi-RLF Merge experiment in nSolver software. The user manual covers this process in the section titled “Multi-RLF Experiments & Batch Calibration”. Here you will need to align the various nSolver files (one for each sample for each CodeSet), specifically matching up the sample names across CodeSets (panels).

Lastly, from the new multi RLF experiment, select all the samples from the normalized data table, and run them through the Advanced Analysis Module as normal. The option to normalize the data here is automatically disabled, as the software expects that the data have already been normalized according to the methods outlined above.

Data QC and Normalization

The total surface area of each lane in a cartridge is scanned in multiple discrete units called fields of view (FOV). After scanning is complete, the FOV within each lane are aggregated together to generate total counts across the entire surface area within each lane. The “Imaging QC” metric quantifies the performance of this imaging process. Specifically, it is a fraction that is calculated by dividing the number of FOVs that have successfully been scanned (i.e., “FOV Counted” within nSolver software) by the number of FOVs that were attempted to be scanned (i.e., “FOV Count” within nSolver software). Significant discrepancy between the number of FOV for which imaging was attempted (“FOV Count”) and for which imaging was successful (“FOV Counted”) may indicate an issue with imaging performance.

Within nSolver software, a sample that has an Imaging QC value less than 0.75 (or 75%) will be flagged. The threshold of 0.75 was selected based on internal testing that evaluated performance over a range of FOV values. The scanner is more likely to encounter difficulties near the edge of the slide. Therefore, when the maximum scan setting is selected for MAX or FLEX systems (the SPRINT instrument has one scan setting), it is more likely that some FOV will be dropped. Reduction in number of FOV counted does not compromise data quality and is accounted for during data normalization. However, when a substantial percentage of FOVs are not successfully counted, there may be issues with the resulting data. Consistent large reductions in percentages can be indicative of an issue associated with the instrumentation.
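
As a small illustration of the metric and flag described above, the sketch below computes Imaging QC as “FOV Counted” divided by “FOV Count” and applies the 0.75 threshold. The function names are hypothetical and this is not nSolver code.

```python
def imaging_qc(fov_counted: int, fov_count: int) -> float:
    """Fraction of attempted fields of view that were successfully counted."""
    if fov_count <= 0:
        raise ValueError("fov_count must be positive")
    return fov_counted / fov_count


def is_imaging_flagged(fov_counted: int, fov_count: int, threshold: float = 0.75) -> bool:
    """A lane is flagged when its Imaging QC value is below the threshold."""
    return imaging_qc(fov_counted, fov_count) < threshold


# Hypothetical lanes: (FOV Counted, FOV Count)
for counted, attempted in [(270, 280), (200, 280)]:
    print(counted, attempted, round(imaging_qc(counted, attempted), 3),
          is_imaging_flagged(counted, attempted))
```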

If Imaging QC is greater than 0.75, then a re-scan may be performed, if desired, in an attempt to increase the number of FOV counted, though as a routine practice this is not necessary or recommended. If Imaging QC is less than 0.75, then clean the bottom of the cartridge with a lint-free wipe and re-scan the cartridge, making sure that the cartridge lies flat in the scanner. Please note that the re-scan option is currently available for MAX and FLEX systems only; it is not available for the SPRINT system (as of October 25, 2016). If re-scan does not improve imaging performance in samples with Imaging QC less than 0.75, then email the raw data (RCC files) and instrument log files to support.spatial@bruker.com. The data and logs will be examined for hardware or assay problems.

A QC flag does not necessarily mean that data from a flagged lane cannot be used. The thresholds for QC flags are set at a conservative level in order to both catch samples which may have failed, and also to identify samples with usable data which happened to experience a reduction in assay efficiency.

To determine whether a QC flag is indicating a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes, differences in expression level between samples will be caused by differences in treatment or pathology, so it may be more appropriate to determine if the expression of only the low count genes for any flagged lane falls within the range of expression values observed across a number of unflagged samples which come from different treatments or pathologies.

One can approach this potential limit of detection question in a number of ways. First, a simple visual scan of the data may suffice to detect problems in the flagged samples. This can be performed on raw data which have been background subtracted in nSolver software to identify targets that are below the background. Alternatively, outlier samples could be identified by generating a heat map of normalized data from all samples to see if the flagged samples in question are strongly divergent from other samples with similar pathology. Another option would be to examine the calculated QC metrics within nSolver software (right click or command click on one of the table columns in the raw data table, and choose ‘select hidden columns’). If these QC metrics have only exceeded the threshold by a very small margin (e.g., the FOV registration is 74% instead of 75%), then the resultant data are generally going to be quite robust and usable.

More details on QC flags can be found in the nSolver software user manual. If QC flags become more than a rare anomaly, we encourage you to contact our support team (support.spatial@bruker.com and/or your local Applications Scientist) in order to assist you in tracking down the root cause of these potential problems with the assay consistency.

Data normalization is designed to remove sources of technical variability from an experiment, so that the remaining variance can be attributed to the underlying biology of the system under study. The precision and accuracy of nCounter Gene Expression assays are dependent upon robust methods of normalization to allow direct comparison between samples. There are many sources of variability that can potentially be introduced into nCounter assays. The largest and most common categories of variability originate from either the platform or the sample. Both types of variability can be normalized using standard normalization procedures for Gene Expression assays.

Standard normalization uses a combination of Positive Control Normalization, which uses synthetic positive control targets, and CodeSet Content Normalization, which uses housekeeping genes, to apply a sample-specific correction factor to all the target probes within that sample lane. These correction factors will control for sources of variability such as pipetting errors, instrument scan resolution, and sample input variability that affect all probes equally.

Note that Positive Control Normalization will not correct for sample input variability, and thus should usually be used in combination with CodeSet Content (housekeeping gene) Normalization. Performing such a two-step normalization will usually not differ mathematically from Content Normalization alone, and thus is mathematically somewhat redundant. Nevertheless, normalizing to both target classes will provide a good indicator of how technical variability is partitioned between the two major sources of assay noise (platform and sample), and thus may provide a good tool for troubleshooting low assay performance. Normalization workflows are described below.

nCounter Reporter probes (or TagSet probes) are manufactured to contain six synthetic ssDNA control targets. The counts from these targets may be used to normalize all platform-associated sources of variation (e.g., automated purification, hybridization conditions, etc.).

The procedure is as follows:

Calculate the geometric mean of the positive controls for each lane (POS_E to POS_A).
Calculate the arithmetic mean of these geometric means for all sample lanes.
Divide this arithmetic mean by the geometric mean of each lane to generate a lane-specific normalization factor.
Multiply the counts for every gene by its lane-specific normalization factor.
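
The four steps above can be sketched in Python as follows. This is a minimal illustration, not nSolver's implementation; the lane names, gene name, and counts are hypothetical.

```python
from statistics import geometric_mean, mean


def lane_normalization_factors(controls_by_lane):
    """Return one normalization factor per lane from that lane's control counts."""
    # Step 1: geometric mean of the controls in each lane.
    lane_geomeans = {lane: geometric_mean(c) for lane, c in controls_by_lane.items()}
    # Step 2: arithmetic mean of those geometric means across all lanes.
    reference = mean(list(lane_geomeans.values()))
    # Step 3: divide the arithmetic mean by each lane's geometric mean.
    return {lane: reference / g for lane, g in lane_geomeans.items()}


def apply_factors(counts_by_lane, factors):
    """Step 4: multiply every count in a lane by that lane's factor."""
    return {
        lane: {gene: count * factors[lane] for gene, count in genes.items()}
        for lane, genes in counts_by_lane.items()
    }


# Hypothetical data: two lanes, two positive controls each, one gene.
pos_controls = {"lane1": [100, 400], "lane2": [50, 200]}
gene_counts = {"lane1": {"GENE_X": 80}, "lane2": {"GENE_X": 40}}

factors = lane_normalization_factors(pos_controls)
normalized = apply_factors(gene_counts, factors)
print(factors, normalized)
```

In this toy case the second lane has half the positive control signal of the first, so its counts are scaled up and the first lane's are scaled down until the two lanes agree.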

It is expected that some noise will be introduced into the nCounter assay due to variability in sample input. For most experiments, normalization of sample input is most effectively done using so-called housekeeping genes. These are mRNA targets included in a CodeSet which are known to or are suspected to show little-to-no variability in expression across all treatment conditions in the experiment. Because of this, these targets will ideally vary only according to how much sample RNA was loaded.

Using the geometric mean of three housekeeping genes, at minimum, to calculate normalization factors is highly recommended. This is done in order to minimize the noise from individual genes and to ensure that the calculations are not weighted towards the highest expressing housekeeping targets. It is important to note that some previously-identified housekeeping genes may, in fact, behave poorly as normalizing targets in the current experiment, and may therefore need to be excluded from normalization.

The procedure is the same as that for Positive Control Normalization:

Calculate the geometric mean of the selected housekeeping genes for each lane.
Calculate the arithmetic mean of these geometric means for all sample lanes.
Divide this arithmetic mean by the geometric mean of each lane to generate a lane-specific normalization factor.
Multiply the counts for every gene by its lane-specific normalization factor.
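
The earlier remark that a two-step normalization is mathematically somewhat redundant can be checked numerically. The sketch below, using hypothetical counts and helper names, applies Positive Control Normalization followed by housekeeping normalization and compares the result with housekeeping normalization alone. Under the procedure as described here, the two results differ only by one constant shared by all lanes and genes, so relative differences between samples are unchanged.

```python
from statistics import geometric_mean, mean


def lane_factors(controls_by_lane):
    """Mean of the lane geometric means divided by each lane's geometric mean."""
    geomeans = {lane: geometric_mean(v) for lane, v in controls_by_lane.items()}
    reference = mean(list(geomeans.values()))
    return {lane: reference / g for lane, g in geomeans.items()}


def scale(counts_by_lane, factors):
    """Multiply every count in a lane by that lane's factor."""
    return {
        lane: {gene: c * factors[lane] for gene, c in genes.items()}
        for lane, genes in counts_by_lane.items()
    }


def housekeeping_values(counts_by_lane, housekeepers):
    """Collect the housekeeping gene counts for each lane."""
    return {lane: [genes[h] for h in housekeepers] for lane, genes in counts_by_lane.items()}


# Hypothetical positive control and gene counts for three lanes.
pos = {"lane1": [120, 480], "lane2": [60, 200], "lane3": [90, 410]}
raw = {
    "lane1": {"HK1": 500, "HK2": 2000, "HK3": 900, "TARGET": 75},
    "lane2": {"HK1": 260, "HK2": 950, "HK3": 430, "TARGET": 120},
    "lane3": {"HK1": 410, "HK2": 1700, "HK3": 800, "TARGET": 30},
}
hk = ["HK1", "HK2", "HK3"]

# Housekeeping (content) normalization alone.
content_only = scale(raw, lane_factors(housekeeping_values(raw, hk)))

# Positive control normalization, then housekeeping normalization.
after_pos = scale(raw, lane_factors(pos))
two_step = scale(after_pos, lane_factors(housekeeping_values(after_pos, hk)))

# Ratio of the two results for every lane and gene: expected to be one constant.
ratios = [two_step[lane][g] / content_only[lane][g] for lane in raw for g in raw[lane]]
print(min(ratios), max(ratios))
```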

A positive control normalization flag indicates that the POS controls for the lane (sample) in question are more than three-fold different (greater or smaller) than the POS control counts from the other samples in the experiment. High POS control counts are rarely problematic, so a flag usually only indicates a problem when the POS controls are particularly low for a sample. Such low POS counts are indicative of relatively low assay efficiency at capturing and counting targets, which may lower sensitivity or introduce bias into the assay.
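
The flag rule can be expressed as a simple fold-change check on the lane's normalization factor. This is a sketch under the assumption that “more than three-fold different” means a factor above 3 or below 1/3; the function name is hypothetical, and the software's exact boundaries may differ.

```python
def is_normalization_flagged(factor: float, max_fold: float = 3.0) -> bool:
    """Flag a lane whose normalization factor deviates by more than max_fold."""
    if factor <= 0 or max_fold <= 1:
        raise ValueError("factor must be positive and max_fold greater than 1")
    # Assumption: the rule is symmetric, e.g. above 3 or below 1/3.
    return factor > max_fold or factor < 1 / max_fold


# Hypothetical positive control normalization factors.
for f in (0.2, 0.5, 1.0, 3.2):
    print(f, is_normalization_flagged(f))
```

A factor well above 1 corresponds to a lane with low POS control counts, which is the case the text identifies as the one that usually matters.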

To determine whether a POS control normalization flag is indicating a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes one should anticipate differences in expression level between samples due to differences in treatment or pathology, so it may be more appropriate to see if the expression of the low count genes for any flagged lane falls in the range of expression values observed across a number of unflagged samples which come from different treatments or pathologies.

One can approach this potential limit of detection question in a number of ways. First, a simple visual scan of the data may suffice to detect problems in the flagged samples. This can be performed on raw data which have been background subtracted in nSolver software to identify targets that are below the background. Alternatively, outlier samples could be identified by generating a heat map of normalized data from all samples to see if the flagged samples in question are strongly divergent from other samples with similar pathology. Another option would be to examine the calculated POS control normalization factors within nSolver software (found in the normalized data table on the far right). If these factors have only exceeded the threshold by a very small margin (e.g., the POS control normalization factor is 3.2), then the resultant data will usually be robust and usable for the majority of data sets.

More details on POS control normalization flags can be found in the nSolver software user manual. If POS control normalization flags become more than a rare anomaly, we encourage you to contact our support team (support.spatial@bruker.com and/or your local Applications Scientist) in order to assist you in tracking down the root cause of these potential problems with the assay consistency.

A QC flag for content normalization indicates that the flagged sample had a content (or housekeeping gene) normalization factor more than 10-fold different from the average sample in the same experiment. In other words, the flagged sample had significantly lower or higher counts in the Housekeeping genes which are used to normalize sample input. Although unusually high housekeeping gene counts would not typically be problematic, it is much more common to see samples with lower housekeeping gene counts, and these would be flagged if the content correction factor for that sample were greater than 10.

Content normalization flags can be caused by either a significant reduction in overall assay efficiency for that sample, or because of an effective reduction in quantity or quality (fragmentation) of the input RNA. The likelihood of a reduction in assay efficiency can be assessed by the presence of any other QC flags for that sample. If the lane failed the QC specifications by a large margin for any of the other QC metrics (including POS control normalization), then overall counts may be reduced enough to also cause a Content normalization flag. Essentially, in this scenario the assay is working so poorly that the counts for endogenous and housekeeping genes are dramatically reduced even if sufficient RNA targets are present. If, however, the sample had no other QC flags except that for Content normalization, this usually means that the assay is working well, but there were insufficient RNA targets to count. This can be caused either by low RNA concentrations or highly fragmented RNA, such as from an archival FFPE sample.

To determine whether a Content normalization flag is creating a critical problem, examine the raw and normalized data and check whether the flagged samples have a poorer limit of detection for low count transcripts when compared to non-flagged samples. For some genes one should anticipate differences in expression level between samples due to differences in treatment or pathology, so it may be more appropriate to see if the expression of the low count genes for any flagged lane falls in the range of expression values observed across a number of unflagged samples which come from different treatments or pathologies.

More details on Content normalization flags can be found in the nSolver software user manual. If QC flags become more than a rare anomaly, we encourage you to contact our support team (support.spatial@bruker.com and/or your local Applications Scientist) in order to assist you in tracking down the root cause of these potential problems with the assay consistency.

Housekeeping (HK) gene selection in Advanced Analysis is performed by default using the geNorm algorithm described in the following paper:

Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, et al. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome biology. 2002;3(7):research0034.

The geNorm algorithm assumes that HK gene expression does not change across all samples, irrespective of the experimental condition. Based on that assumption, geNorm expects that the ratio between HK Gene A and HK Gene B within sample 1 will be the same as the ratio between HK Gene A and HK Gene B in sample 2, and sample 3, etc. If that is not true for the dataset, the aberrant gene is not used as a HK gene for normalization. As such, geNorm looks at the different ratios between the potential HK genes and iteratively removes HK genes that do not perform as expected. In the end, it retains a set of optimal HK genes to be used for the final normalization. A more detailed explanation can be found in the Advanced Analysis User Manual (MAN-10030).
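
The ratio-based idea can be illustrated with a simplified Python sketch of the stability measure from the geNorm paper: for each candidate gene, average the standard deviation of its log2 ratios against every other candidate, then repeatedly drop the least stable gene. This is not the Advanced Analysis implementation. The fixed `keep` parameter is a simplifying assumption (the published method uses a pairwise variation criterion to choose the number of genes), and the gene names and counts are hypothetical.

```python
import math
from statistics import mean, stdev


def stability_m(counts):
    """Return the geNorm-style stability value M for each gene (lower is more stable)."""
    m_values = {}
    for gene in counts:
        pairwise_sds = []
        for other in counts:
            if other == gene:
                continue
            # Log2 ratio of the two genes within each sample.
            log_ratios = [math.log2(a / b) for a, b in zip(counts[gene], counts[other])]
            pairwise_sds.append(stdev(log_ratios))
        m_values[gene] = mean(pairwise_sds)
    return m_values


def select_housekeepers(counts, keep=3):
    """Iteratively remove the gene with the highest M until `keep` genes remain."""
    remaining = dict(counts)
    while len(remaining) > keep:
        m_values = stability_m(remaining)
        worst = max(m_values, key=m_values.get)
        del remaining[worst]
    return sorted(remaining)


# Hypothetical counts across four samples. HK_A, HK_B and HK_C track sample
# input (constant ratios to each other); HK_D does not.
candidates = {
    "HK_A": [100, 200, 50, 150],
    "HK_B": [1000, 2000, 500, 1500],
    "HK_C": [400, 800, 200, 600],
    "HK_D": [300, 100, 600, 250],
}
selected = select_housekeepers(candidates, keep=3)
print(stability_m(candidates), selected)
```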

Within a typical nSolver workflow, HK genes can be selected by the user based on their variability across samples (%CV) and/or their average expression levels. Overall, this is a relatively good approach. However, %CV can be biased by variability in the data: a less suitable HK gene can coincidentally appear stable (an artificially low %CV), and a gene that is normally stable can appear unstable because of input variability. In the latter situation, geNorm will still identify the gene as a good housekeeping gene (as it is based on ratios within samples), whereas the %CV method will discard it. Therefore, relying solely on %CV is not recommended. A consistent trend amongst the annotated housekeeping genes must also be considered.
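
A toy example makes the difference concrete. In the sketch below (hypothetical counts and function names), two housekeeping genes vary only because sample input varies. Their %CV on raw counts is high, yet the ratio between them is identical in every sample, which is the property a ratio-based method relies on.

```python
import math
from statistics import mean, stdev


def percent_cv(values):
    """Coefficient of variation across samples, as a percentage."""
    return 100 * stdev(values) / mean(values)


def log_ratio_sd(gene_a, gene_b):
    """Standard deviation of the within-sample log2 ratio of two genes."""
    return stdev(math.log2(a / b) for a, b in zip(gene_a, gene_b))


# Hypothetical raw counts: sample input differs four-fold between samples,
# but both genes scale with input in exactly the same way.
hk_a = [100, 200, 50, 150]
hk_b = [1000, 2000, 500, 1500]

cv_a = percent_cv(hk_a)
ratio_sd = log_ratio_sd(hk_a, hk_b)
print(round(cv_a, 1), ratio_sd)
```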

The best approach for normalizing miRNA data will depend mostly on the sample type they represent. For everything except biofluids (such as plasma or serum), using a “global” normalization method which normalizes to total counts of the 100 most highly expressed (on average) miRNA targets across all samples is recommended. This is called the TOP 100 method in the software. Importantly, this method does not use the Positive Control or Positive Ligation Control probes for any of these calculations.
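
A global normalization of this kind might be sketched as follows. This is an illustration only: it assumes the lane factor is the mean of the lane totals divided by each lane's total over the selected targets, which may not match exactly how the software summarizes the top targets. The function name, target names, and counts are hypothetical, and `n` is set to 2 in the example purely to keep the data small.

```python
from statistics import mean


def top_n_factors(counts, n=100):
    """Lane factors from the total counts of the n most highly expressed targets."""
    lane_count = len(next(iter(counts.values())))
    # Rank targets by their average count across all lanes and keep the top n.
    top_targets = sorted(counts, key=lambda t: mean(counts[t]), reverse=True)[:n]
    # Total count of the selected targets in each lane.
    totals = [sum(counts[t][lane] for t in top_targets) for lane in range(lane_count)]
    reference = mean(totals)
    return [reference / total for total in totals]


# Hypothetical counts: target -> one count per lane.
mirna_counts = {
    "miR-a": [100, 200],
    "miR-b": [50, 100],
    "miR-c": [5, 5],
}
factors = top_n_factors(mirna_counts, n=2)
print(factors)
```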

However, it does get more complicated with biofluids or other samples where the number of expressed targets drops below ~150-200 targets. As a frame of reference, targets expressed above background are usually identified by comparison to the Negative control probes (either the mean, mean +2 Standard Deviation, the maximum value of the NEG probes, or 100 to be conservative).
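
The background options listed above can be compared side by side. The sketch below uses hypothetical counts and function names and assumes a sample standard deviation, since the text does not say which form is used.

```python
from statistics import mean, stdev


def background_thresholds(neg_counts):
    """Candidate background cutoffs derived from the negative control probes."""
    avg = mean(neg_counts)
    return {
        "mean": avg,
        "mean_plus_2sd": avg + 2 * stdev(neg_counts),  # assumption: sample SD
        "max": max(neg_counts),
        "fixed_100": 100,
    }


def count_expressed(target_counts, threshold):
    """Number of targets with counts above the chosen background threshold."""
    return sum(1 for c in target_counts if c > threshold)


# Hypothetical negative control and target counts for one lane.
neg = [10, 20, 30, 20]
targets = [5, 25, 35, 150]

thresholds = background_thresholds(neg)
expressed = {name: count_expressed(targets, t) for name, t in thresholds.items()}
print(thresholds, expressed)
```

The more conservative the cutoff, the fewer targets count as expressed, which in turn affects whether a global normalization is still reasonable for the sample.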

When normalizing samples from biofluids, a judgement call can be made depending on how many targets are expressed above background. In the miRNA assay, background would usually be ~30 counts, but will vary from one experiment to the next. Therefore, sometimes a global approach (TOP 100 method) can still work with biofluids if samples express 100-150 miRNA targets above this cutoff.

However, if this is not the case, the identification of good “housekeeper” miRNAs will likely allow you to normalize and obtain robust results. There are not many well-characterized housekeeper miRNA targets from plasma or other biofluids, as they do seem to vary depending on extraction kits and pathologies being studied. Consequently, a literature search would not necessarily help you determine appropriate housekeepers and a more data-driven approach would be better suited. Using third party software or algorithms can identify the most stably expressed targets within the particular experiment. It is recommended that this method of identifying housekeeping genes be repeated as more data is generated to confirm these are appropriate for the entirety of the study and not just for the initial experiment.

Among published algorithms for identifying stable housekeeper genes, NormFinder is the easiest starting point because it is free and easy to use.

Claus Lindbjerg Andersen, Jens Ledet Jensen and Torben Falck Ørntoft. Cancer Res 2004;64:5245-5250.
http://cancerres.aacrjournals.org/content/64/15/5245.
Supplemental Methods
http://cancerres.aacrjournals.org/content/suppl/2004/08/24/64.15.5245.DC1.html
Software download: http://moma.dk/normfinder-software

geNorm is another program that uses slightly different principles. Specifically, NormFinder chooses targets with the lowest within and between group variance, while geNorm also picks multiple targets that give the lowest estimates of variance when they are used together (NormFinder only picks them individually or gives the best two together). geNorm can be obtained with a license.

If Spike-In synthetic miRNAs are used to normalize variance introduced in purification of samples, it is assumed and highly recommended that equal volume inputs are used across samples. Synthetic oligos must be spiked in before sample extraction, and it is strongly recommended that Spike-Ins are used for all samples in that experiment.

Three Methods for Normalization

Normalize using only the Spike-In control probes.
Normalize using only the Housekeeping miRNA targets as identified by the user.
First normalize all the endogenous counts (including the putative miRNA housekeepers) to the Spike-In control probes. Then use the Spike-In normalized miRNA housekeeper counts to normalize the endogenous miRNA targets. This option is not available in the nSolver software, so it would need to be performed in Excel. The basic workflow in Excel is:
1. For each lane calculate the geometric mean of the Spike-In controls.
2. Calculate the arithmetic mean of these geometric means across all lanes.
3. Divide this arithmetic mean by the geometric mean in each lane (calculated in #1) to get a lane-specific normalization factor.
4. Multiply all the endogenous counts in a lane by its lane-specific normalization factor.
5. Repeat 1 through 4 using the Spike-In normalized housekeeper miRNA targets.
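
For readers who prefer a script to a spreadsheet, the same five steps might look like this in Python. It is a minimal sketch of the workflow as written, not a validated tool; the function names, miRNA names, and counts are hypothetical.

```python
from statistics import geometric_mean, mean


def lane_factors(controls_per_lane):
    """Steps 1-3: geometric mean per lane, mean across lanes, then the ratio."""
    geomeans = [geometric_mean(lane) for lane in controls_per_lane]
    reference = mean(geomeans)
    return [reference / g for g in geomeans]


def scale_lanes(lanes, factors):
    """Step 4: multiply every endogenous count in a lane by that lane's factor."""
    return [{t: c * f for t, c in lane.items()} for lane, f in zip(lanes, factors)]


def spike_in_then_housekeeper(spike_ins, endogenous, housekeepers):
    """Normalize to Spike-Ins first, then to the Spike-In normalized housekeepers."""
    spike_normalized = scale_lanes(endogenous, lane_factors(spike_ins))
    # Step 5: repeat steps 1-4 using the Spike-In normalized housekeeper counts.
    hk_counts = [[lane[h] for h in housekeepers] for lane in spike_normalized]
    return scale_lanes(spike_normalized, lane_factors(hk_counts))


# Hypothetical data: one entry per lane.
spike_ins = [[100, 400], [50, 200]]
endogenous = [
    {"miR-HK1": 200, "miR-HK2": 800, "miR-X": 40},
    {"miR-HK1": 100, "miR-HK2": 400, "miR-X": 60},
]
result = spike_in_then_housekeeper(spike_ins, endogenous, ["miR-HK1", "miR-HK2"])
print(result)
```

In this toy data the second lane recovered half as much Spike-In signal, so its counts are doubled relative to the first lane before the housekeeper step, after which the housekeepers already agree and apply no further correction.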

The three methods for normalization may yield similar results. Typically, the better normalization approaches will result in overall lower variance. Below is an example graph depicting what would be expected of a typical normalization method. For each of the three methods, variance should be calculated, and the lowest variance method should be chosen. Theoretically, the third method provides the best reduction in technical and sample input variance.

Data Interpretation

The genes used for immune cell scoring comprise a subset of high confidence markers validated by co-expression patterns via a large survey of TCGA samples (N=9986), and confirmed by nCounter RNA and protein analysis (Danaher et al, 2016 http://dx.doi.org/10.1101/068940). To some extent, these markers thus already represent high confidence markers for these cell types.

An additional level of quality control is by default performed within the Advanced Analysis module, whereby correlations are calculated between the expression levels of these candidate cell type markers. Those markers which do not correlate with other cell type-specific markers are discarded from the estimates of abundance. Such markers may be expressed at low levels in another cell type, or they may show highly variable expression levels within their specific cell type, in either case making the gene a poor marker for cell type abundance.

The Advanced Analysis module will also, by default, utilize a resampling technique to generate a significance level for confidence in the individual cell type scores. Cell scores with p-values below a threshold level of confidence (e.g., 0.05) would be considered higher confidence stand-alone estimates of abundance. Note that some cell scores will only ever be based on a single literature-validated cell-specific marker, and the statistical resampling method can only ever return a p-value of ‘1.0’ for these scores (e.g., Tregs are only characterized by expression of the gene FOXP3). Importantly, the cell abundance levels for these and other cell type scores with p-values greater than 0.05 should not necessarily be ignored, nor should they be considered unrelated to immune cell abundance. Instead, cell abundance scores with high (non-significant) p-values should be considered hypotheses, with a confidence level based on the strength of these marker associations with cell type from the literature. The marker for Tregs, for example, is considered quite robust for this cell type and can therefore be reliably used as an estimate of Treg cell abundance, despite the single gene abundance score having a p-value of 1.0 in the software.

General

How do I analyze my nCounter data?

How do I install the Advanced Analysis Module on a MAC OS?

Can I compare data from my nCounter Analysis System to microarray data?

Can nCounter data be used together with qPCR data?

How do I backup my analyses from nSolver software or move them to a new computer?

Can I rename my RLF file?

How can I analyze multi-RLF experiments with the Advanced Analysis module?

Do I need to log transform my data?

Data QC and Normalization

What are the positive controls included in my CodeSet?

What are the best practices for nCounter data QC?

What is the “Imaging QC” metric and how should it be interpreted?

What is binding density? Is there an optimal binding density?

What factors influence binding density on a cartridge?

Do QC flags mean data for a flagged lane is unusable?

How is my data normalized in nSolver software?

Should I be worried about samples that pass QC but have normalization flags?

I get a positive control normalization flag for a lane(s). Is this a problem?

I got a QC flag for content normalization. What does that mean?

How do I select my reference or housekeeping genes?

What is the difference between geNorm and %CV methods for HK gene selection?

What are best practices for background subtraction?

When should I perform a reference gene normalization vs. a global normalization?

How do I access the normalized data in Advanced Analysis?

How should I normalize my miRNA data?

Data Interpretation

What is a Pathway score and how is it interpreted?

What is a GSA score and how is it interpreted?

How do GSA and Pathway scores differ?

How do I interpret Cell Type Profiling Scores?

How do I assess confidence in the Advanced Analysis Cell Type Profiling Scores?