Data Analysis Services

Quality Control and Assurance Pipeline - Illumina iScan MEGA Genotyping

Deliverables provided to the Customer

An example of the text and PDF deliverables can be found here. We currently provide two types of deliverables to our customers:

  • Data files

  • Text and PDF files

The data files comprise the raw IDAT files produced by our iScan system, which contain the raw signal intensities; the GTC files; both the raw and cleaned PLINK files; and a cleaned VCF file.

The text and PDF files serve several purposes. We provide a failing SNPs text file and a failing samples text file. These are tab-delimited files that can be opened in a spreadsheet program such as Excel or parsed in any programming language. Each file lists one SNP or sample ID per line, followed by the reason(s) for failure and the value on which the SNP or sample failed, separated by tabs. Three PDFs are also generated. The summary report gives the customer an overview of the number of SNPs and samples passing QC and the total number of SNPs and samples analyzed. It also provides HapMap trio and duplicate concordance statistics. The detailed report provides an in-depth, granular view of all the QC metrics collected at both the SNP and sample level. Finally, the glossary report defines all the criteria, pipeline information, definitions, and thresholds used in the analysis.
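As an illustration of parsing the failure files, here is a minimal Python sketch. It assumes each line holds an ID in the first column followed by tab-separated reason and value fields; the function name `parse_failures`, the example IDs, and the reason labels are hypothetical, and the real files may differ in detail.

```python
import csv
import io

def parse_failures(handle):
    """Read a tab-delimited failure file: an ID, then reason and value fields."""
    records = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or not row[0].strip():
            continue  # skip blank lines
        records.append({"id": row[0], "details": row[1:]})
    return records

# Hypothetical file content for illustration only.
example = "rs0000001\tcall_rate\t0.91\nrs0000002\thwe_p\t1e-9\n"
failures = parse_failures(io.StringIO(example))
for record in failures:
    print(record["id"], record["details"])
```

To read a delivered file instead of the inline example, pass an open file handle, for instance `open(path, newline="")`, in place of the `io.StringIO` object.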

We supply an MD5 checksum for every file. Please verify that the checksums match to confirm that no data has been lost or corrupted.
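One way to check the files is sketched below in Python. It assumes the checksums arrive as lines in the common `md5sum` layout (hex digest, whitespace, file name); that layout, the function names, and the demo file are assumptions for illustration, not a description of the delivered manifest.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def find_mismatches(manifest_lines, base_dir):
    """Return the file names whose digest differs from the manifest entry."""
    bad = []
    for line in manifest_lines:
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip().lstrip("*")  # md5sum marks binary mode with '*'
        if md5_of_file(os.path.join(base_dir, name)) != expected.lower():
            bad.append(name)
    return bad

# Self-contained demonstration using a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo.txt"), "wb") as fh:
        fh.write(b"hello")
    good_manifest = ["5d41402abc4b2a76b9719d911017c592  demo.txt"]
    bad_manifest = ["0" * 32 + "  demo.txt"]
    mismatches = find_mismatches(good_manifest, tmp)
    tampered = find_mismatches(bad_manifest, tmp)

print(mismatches, tampered)
```

Reading in chunks keeps memory use flat, which matters for large IDAT, PLINK, and VCF files.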

Quality Control and Assurance Pipeline - Illumina iScan EPIC Methylation

The Infinium MethylationEPIC BeadChip quantitatively interrogates over 850,000 methylation sites across the genome at single-nucleotide resolution.

The arrays are scanned in house on the iScan, and the resulting IDAT files are run through the QC pipeline.

Using the standard R package “minfi”, we generate a QC report that includes the plots listed below:

1) Density plot: A density plot of Beta values across samples, colored by sample group. Because most sites fall into one of two states, hypomethylation (low Beta) or hypermethylation (high Beta), this plot is expected to be bimodal.

2) Density bean plot: This plot conveys the same information as the density plot but draws a separate distribution for each sample, with the range of Beta values on the x axis.

3) Control probes plot: The methylation array contains several internal control probes that can be used to assess the quality of different sample preparation steps (bisulfite conversion, hybridization, etc.). These values are plotted as a control strip plot to reveal any technical failures. A separate plot is generated for each type of control probe.

Two additional plots are generated as part of the QC. The first is a simple QC plot of the log median intensities of the methylated and unmethylated channels, with a diagonal line as the threshold that separates samples that pass QC from those that fail. Samples below the threshold are considered QC failures.
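The logic behind this plot can be sketched in Python as follows. This is an illustration rather than the pipeline's code: it assumes a sample fails when the mean of its log2 median methylated and unmethylated intensities falls below a cutoff, which is equivalent to lying below a diagonal line on the plot. The cutoff of 10.5, the function names, and the intensity values are assumptions for illustration; the threshold actually applied is not stated here.

```python
import math
from statistics import median

def log2_median(intensities):
    """log2 of the median intensity for one channel of one sample."""
    return math.log2(median(intensities))

def flag_low_intensity(samples, cutoff=10.5):
    """samples maps name -> (methylated intensities, unmethylated intensities).

    A sample is flagged when the average of its two log2 medians is
    below the cutoff, i.e. it sits under the line x + y = 2 * cutoff.
    """
    failed = []
    for name, (meth, unmeth) in samples.items():
        mean_log = (log2_median(meth) + log2_median(unmeth)) / 2
        if mean_log < cutoff:
            failed.append(name)
    return failed

# Made-up intensities: "good" has medians of 4096 (log2 = 12),
# "weak" has medians of 512 (log2 = 9).
samples = {
    "good": ([2048, 4096, 8192], [2048, 4096, 8192]),
    "weak": ([256, 512, 1024], [256, 512, 1024]),
}
failed = flag_low_intensity(samples)
print(failed)
```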

The second is a plot that identifies sex mismatches. Probes on the X and Y chromosomes are used to predict the sex of each sample run on the array.

Plotting the reported and predicted sex of each sample helps identify mismatches, which appear as samples that cluster differently than expected.
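A minimal Python sketch of this kind of check is shown below. It assumes each sample has already been summarized as a median log2 intensity over X-chromosome probes and over Y-chromosome probes, and that a sample is called female when the Y median is well below the X median. The cutoff of -2, the function names, and the sample values are assumptions for illustration and may not match the pipeline's settings.

```python
def predict_sex(x_median, y_median, cutoff=-2.0):
    """Call 'F' when the Y-probe signal is far below the X-probe signal, else 'M'."""
    return "F" if (y_median - x_median) < cutoff else "M"

def find_sex_mismatches(samples, cutoff=-2.0):
    """samples maps name -> (reported sex, X median, Y median).

    Returns name -> (reported, predicted) for every disagreement.
    """
    mismatches = {}
    for name, (reported, x_median, y_median) in samples.items():
        predicted = predict_sex(x_median, y_median, cutoff)
        if predicted != reported:
            mismatches[name] = (reported, predicted)
    return mismatches

# Made-up values: S3 is reported male but has a female-like Y signal.
samples = {
    "S1": ("F", 13.5, 9.0),
    "S2": ("M", 13.0, 12.6),
    "S3": ("M", 13.4, 9.2),
}
mismatches = find_sex_mismatches(samples)
print(mismatches)
```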


Quality Control and Assurance Pipeline - ABI 7900


The ABI 7900 genotyping platform is used for low- to mid-throughput SNP genotyping (1-50 SNPs). Customers receive three deliverables: a PDF containing a series of cluster plots for all SNPs designed per 96-well plate; a PDF reporting all QC metrics (SNP call rate, Hardy-Weinberg equilibrium, sample call rate, concordance for Coriell controls, and number of Mendel errors in Coriell trios) along with batch and plate information; and the final PLINK files.
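Two of these metrics, call rate and Hardy-Weinberg equilibrium, can be sketched in Python as follows. This is a simplified illustration: it uses a one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium, which may differ from the test the pipeline applies, and the function names and genotype encoding (with `None` for a no-call) are assumptions.

```python
import math

def call_rate(calls):
    """Fraction of non-missing genotype calls; None marks a no-call."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) p-value for departure from Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        return 1.0
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: the test is not informative
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

rate = call_rate(["AA", "AB", None, "BB"])
balanced = hwe_chi2_pvalue(25, 50, 25)   # counts match expectation exactly
skewed = hwe_chi2_pvalue(50, 0, 50)      # no heterozygotes at all
print(rate, balanced, skewed)
```

The same `call_rate` function works for either direction: pass all calls for one SNP to get the SNP call rate, or all calls for one sample to get the sample call rate.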