CACT/Affymetrix Lab
 

Affymetrix GeneChip Analysis



Investigators are able to examine global differential gene expression in a wide variety of cells and tissues in many different organisms. This genome-wide view of RNA regulation is useful for studying complex biological pathways and for the classification and prediction of diseases and drug interactions. Please consult the CACT staff if you have any questions regarding your experiment. The CACT laboratory keeps human and mouse arrays in stock; all other arrays are ordered by the staff on request.

Contact Staff

GeneChip® probe arrays are manufactured using technologies that combine photolithographic methods and combinatorial chemistry. Over 400,000 unique oligonucleotide probes are synthesized on each GeneChip, representing anywhere from 12,000 to 23,000 genes/ESTs. Each gene/EST is represented by 22-40 distinct oligonucleotide probe sequences, half of which are perfect match (PM) probes and half of which contain a single base mismatch (MM). The mismatch probes serve as controls for nonspecific binding. The probes span an approximately 600 bp range towards the 3' end of the target gene. The detection call (present, marginal, absent) for each gene in your library, as well as the amount of message, is then determined by considering both the intensity of the signal emitted from the probe set and the number of probe pairs in which the PM member is specifically emitting signal. Each probe sequence is located in a specific area of the array known as the probe cell, and each probe cell contains more than 10 million copies of a given probe. The newest generation chips are designed so that the probe cells for a specific gene are distributed throughout the chip, making it highly unlikely that the information for an entire gene will be lost due to a technical problem affecting a small portion of the chip. Currently, Affymetrix provides GeneChips® that contain genes from Mouse, Rat, Human, Drosophila, C. elegans, S. cerevisiae, E. coli, P. aeruginosa, B. subtilis and Arabidopsis. In addition, specialized GeneChips® address mutations in cytochrome P450 and the tumor suppressor gene p53, as well as human SNP mapping. Additional information can be found at www.affymetrix.com.
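
The role of the PM/MM pairs can be illustrated with a short sketch. It assumes the discrimination score R = (PM - MM) / (PM + MM) used by the Affymetrix statistical algorithm and a threshold tau of 0.015, the value shown in the sample report on this page. The function names and intensity values are hypothetical. The sketch only counts probe pairs whose PM signal clearly exceeds the MM signal; it is not the full detection algorithm, which applies a rank test to these scores.

```python
def discrimination_scores(pm, mm):
    """Return R = (PM - MM) / (PM + MM) for each probe pair."""
    if len(pm) != len(mm):
        raise ValueError("pm and mm must have the same length")
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def count_specific_pairs(pm, mm, tau=0.015):
    """Count probe pairs whose discrimination score exceeds tau."""
    return sum(1 for r in discrimination_scores(pm, mm) if r > tau)


# Hypothetical intensities for one probe set with four probe pairs.
pm = [1200.0, 950.0, 400.0, 310.0]
mm = [300.0, 500.0, 410.0, 300.0]
print(count_specific_pairs(pm, mm))
```

A probe set in which most pairs score above tau is more likely to receive a present call than one in which PM and MM intensities are similar.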


HOW TO ORDER

You can place your order through SRM.

Quality Assurance

To obtain high quality data from your GeneChip experiment, it is essential to start with high quality total RNA. When you submit your sample, the analysts in the CACT lab will assess the quality of the RNA before starting the labeling process. If RNA is degraded or otherwise unsuitable for microarray application, you will be notified immediately.

Spectrophotometric Analysis

A small amount of your sample will be assayed with a SpectraMax Plus spectrophotometer. We require a concentration of greater than 1.5 mg/mL. The OD 260/280 ratio is calculated to estimate the purity of the RNA. A ratio close to 2.00 indicates a high percentage of ribonucleotide.
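
The spectrophotometric check can be sketched as follows. The 1.5 mg/mL minimum comes from the text above; everything else is an assumption made for illustration: the function name, the conversion of 1 A260 unit to about 40 ug/mL of RNA (1 cm path length), and the 1.8 to 2.1 window used to express "close to 2.00". The CACT lab's actual acceptance window is not stated here.

```python
def assess_rna_sample(a260, a280, dilution_factor=1.0,
                      min_conc_mg_per_ml=1.5, ratio_range=(1.8, 2.1)):
    """Estimate RNA concentration and purity from absorbance readings."""
    # Assumption: 1 A260 unit corresponds to about 40 ug/mL of RNA.
    conc_mg_per_ml = a260 * 40.0 * dilution_factor / 1000.0
    ratio = a260 / a280
    return {
        "concentration_mg_per_ml": conc_mg_per_ml,
        "ratio_260_280": ratio,
        "concentration_ok": conc_mg_per_ml > min_conc_mg_per_ml,
        # Assumption: "close to 2.00" is taken to mean inside ratio_range.
        "purity_ok": ratio_range[0] <= ratio <= ratio_range[1],
    }


# Hypothetical readings from a sample diluted 1:100 before measurement.
result = assess_rna_sample(a260=0.5, a280=0.25, dilution_factor=100)
print(result)
```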

Electrophoretic Analysis

To assess the integrity of your total RNA, we will test it with the Agilent Technologies 2100 Bioanalyzer Lab-on-a-chip system. This assay is similar to gel electrophoresis in concept, but it is cleaner, more efficient, and only requires a very small amount of sample.

Figure 1: Agilent's Lab-on-a-chip

Figure 2: The sample wells are connected to microchannels

A small amount of sample is loaded into the wells in the chip, and electrodes cause the RNA to move through gel-filled microchannels. The fluorescence signal is plotted against run time to generate an electropherogram. The data can also be visualized as a virtual gel.

Figure 3: A virtual gel image.

Figure 4: This is an example of a good electropherogram. The 18S and 28S ribosomal RNA peaks are clearly resolved and there is very little contamination or degradation.

Good quality RNA will have two strong peaks, representing the 18S and 28S ribosomal RNA fragments, with a 28S/18S ratio close to 2.00. There should be very few low molecular weight degradation products.

Figure 5: This is an electropherogram of bad RNA. The ribosomal peaks are indistinct and of low quantity. The peaks seen between 24 and 29 seconds represent degraded RNA.

Figure 6: This is an electropherogram of totally degraded RNA. The 18S and 28S ribosomal peaks are not present and there are many low molecular weight degradation products.


DATA ANALYSIS/TROUBLESHOOTING

As with the cDNA array, Affymetrix microarrays generate an enormous amount of data and therefore rely heavily on bioinformatics support for data management and analysis. The Hartwell Center's Bioinformatics section and High Performance Computing Facility provide the substantial computational resources needed.

Data from Affymetrix GeneChip arrays is acquired and analyzed using Microarray Suite. The Affymetrix Laboratory Information Management System (LIMS) is used to manage and store all data generated during analysis. Advanced data analysis is conducted using the Affymetrix Data Mining Tool (DMT). Details of these software packages can be obtained from www.affymetrix.com.

"Spotfire Decision Site for Functional Genomics" is now available to everyone for data analysis. Analytic features include hierarchical clustering, K-means clustering, self-organizing maps, and principal component analysis, among others. The researcher can even define an expression pattern, and genes showing that pattern across the samples will be identified. Data can be loaded from Affymetrix-generated metric text files or from Excel files. Direct links to the web, including the Affymetrix NetAffx site, make further exploration of selected genes convenient. Spotfire classes are given by the Hartwell Center staff. You can register over the web by going to Hartwell Center Training Classes, but if you do not already have an account, first create a new HC Training Account. For more information about Spotfire click here.

Microarray Suite Report file

After your RNA has been labeled, hybridized to a GeneChip, and scanned, the expression data is analyzed by Affymetrix Microarray Suite software. Microarray Suite (MAS) generates a report file that lists the performance parameters of the chip. We will look at all of these parameters to determine if the data passes our quality standards. If the data does not "pass," it will not be released for retrieval by HCWebFetch and you will be contacted to discuss the results of the run. Parameters that are in red print are ones that we use to determine suitability for analysis.

Report Type: Expression Report
Date: 04:59PM 09/17/2002

Date is the date the chip report was generated.

Filename: drm065-v5-u74av2.CHP
Probe Array Type: MG_U74Av2
Algorithm: Statistical
Probe Pair Thr: 8
Controls: Antisense

This section contains the name of the file and the type of chip that the sample was run on. Algorithm refers to the type of calculation that was done to determine the signal level and detection calls for the chip. The statistical algorithm is used in the most recent version of MAS (5.0).

Alpha1: 0.04
Alpha2: 0.06
Tau: 0.015
Noise (RawQ): 2.790
Scale Factor (SF): 9.415
TGT Value: 500
Norm Factor (NF): 1.000

Background: Avg: 77.67 Std: 1.72
Noise: Avg: 2.76 Std: 0.26  Min: 2.30
Corner+ Avg: 71 Count: 32
Corner- Avg: 6480  
Central- Avg: 6713  

The statistical algorithm calculates a signal level as well as a detection P-value for each gene. Alpha1 and Alpha2 are the P-value cut-offs that we set for determining the absent or present call, which is based on the signal and the detection P-value. A P-value of less than 0.04 results in a present call, greater than 0.06 is absent, and 0.04-0.06 is marginal. HG_U133 P-value cut-offs are more relaxed, with Alpha1 being 0.05 and Alpha2 being 0.065. Tau is used in the calculation of the P-values. Noise is a property of the scanner and is measured by determining pixel-to-pixel variations in signal intensity.

Currently, a target intensity value (TGT) of 500 is set for each chip that we run: the absolute signal for each gene is multiplied by an amount, the scale factor, which makes signals comparable from chip to chip. The signals that you obtain in your metric text files are the scaled signals. To determine the absolute signal, divide the scaled signal by the scale factor. Affymetrix guarantees linearity up to an absolute signal of 64,500. The scale factor is important to consider when analyzing data. A data set whose scale factors differ by more than 3-fold should be interpreted cautiously, as this may indicate the poor performance of a sample. As explained below for the percentage of genes present, it may also reflect the particular biology of your system.

Background is signal intensity caused by autofluorescence of the array surface and nonspecific binding of the target stain (streptavidin phycoerythrin).
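
The cut-offs and scaling described above can be sketched in a few lines. The function names are hypothetical, the default cut-offs are the Alpha1 and Alpha2 values from this report, and the handling of P-values that fall exactly on a cut-off follows the wording on this page rather than any Affymetrix specification.

```python
def detection_call(p_value, alpha1=0.04, alpha2=0.06):
    """Return 'P' (present), 'M' (marginal) or 'A' (absent)."""
    if p_value < alpha1:
        return "P"
    if p_value > alpha2:
        return "A"
    return "M"


def absolute_signal(scaled_signal, scale_factor):
    """Undo chip scaling: absolute signal = scaled signal / scale factor."""
    return scaled_signal / scale_factor


def scale_factors_comparable(scale_factors, max_fold=3.0):
    """True if the largest and smallest scale factors differ by at most max_fold."""
    return max(scale_factors) / min(scale_factors) <= max_fold


print(detection_call(0.01), detection_call(0.05), detection_call(0.20))
# Scaled signal and scale factor taken from the sample report.
print(absolute_signal(2053.8, 9.415))
# Hypothetical scale factors for a three-chip data set.
print(scale_factors_comparable([9.415, 4.2, 2.9]))
```

For an HG_U133 chip, the same function would be called with alpha1=0.05 and alpha2=0.065.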

The following data represents probe sets that exceed the probe pair threshold and are not called "No Call".
Total Probe Sets: 12473
Number Present: 4505     36.1%
Number Absent: 7621     61.1%
Number Marginal: 347        2.8%

Average Signal (P): 2053.8
Average Signal (A): 145.5
Average Signal (M): 374.3
Average Signal (All): 841.1

Total probe sets is the number of probe sets defined by the chip. The percentage of genes detected as present is an important parameter to consider. To date, for most samples run on most highly annotated chips, this ranges from 20-40%. The percent present is inversely related to the scale factor because the fewer probes emitting signal, the greater the scaling necessary to reach the target value of 500. It is important to remember, when interpreting these data, that your sample set may have a lower percent present than other cells because of its unique tissue-specific expression program or because of treatments of the cells. Therefore, a low percent present could mean that your RNA prep didn't label well and should be discounted, or it could reflect the biology of your particular system.
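
As a worked example, the percentages in this report can be recomputed from the call counts. This is a minimal sketch with hypothetical function names; the 20-40% window is the typical range quoted on this page, not a pass/fail rule.

```python
def call_percentages(present, absent, marginal):
    """Return the percentage of probe sets in each detection class."""
    total = present + absent + marginal
    return {
        "present": round(100.0 * present / total, 1),
        "absent": round(100.0 * absent / total, 1),
        "marginal": round(100.0 * marginal / total, 1),
    }


def percent_present_is_typical(percent_present, low=20.0, high=40.0):
    """True if percent present lies within the typical range."""
    return low <= percent_present <= high


# Counts from the sample report (12473 probe sets in total).
percentages = call_percentages(present=4505, absent=7621, marginal=347)
print(percentages)
print(percent_present_is_typical(percentages["present"]))
```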
   

Housekeeping Controls:
Probe Set            Sig(5')   Det(5')  Sig(M')   Det(M')  Sig(3')   Det(3')  Sig(all)   Sig(3'/5')
B-ACTINMUR/M12481    38099.6   P        46719.9   P        49485.2   P        44768.23   1.30
GAPDHMUR/M32599      22784.3   P        19906.3   P        17501.4   P        20063.98   0.77
TRANSRECMUR/X57349   145.4     A        85.8      A        262.6     M        154.14     2.30
PYRUCARBMUR/L09192   32693.8   A        40.5      A        52.7      A        79.52      0.36
18SRNAMUR            55.0      A        34.3      A        274.8     A        121.36     5.00

Housekeeping controls are used as indicators of whether conditions were suitable for transcription. As most probe sets span a 600 bp region of the 3' end of each gene, and PCR is a 3' anchored amplification technique, it is important to show transcription across the probe set for representative genes. A 3'/5' ratio of less than 3.0 is considered acceptable. Generally, if these ratios are very large, the percent present will be low and the scale factor will be high, indicating a generally poor quality sample. If only the 3'/5' ratio is high (independent of other parameters), this may be a tissue-specific or disease-specific event, suggesting that the data might be usable with careful analysis and cautious interpretation.
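
The 3'/5' check can be reproduced from the Sig(3') and Sig(5') columns. The sketch below uses three rows from the table above; the function name and the "acceptable"/"review" labels are hypothetical, and the 3.0 limit is the one quoted on this page.

```python
def three_prime_five_prime_ratio(sig_3p, sig_5p):
    """Return the Sig(3') / Sig(5') ratio for one control gene."""
    return sig_3p / sig_5p


# (Sig(5'), Sig(3')) pairs taken from the housekeeping control table.
housekeeping = {
    "B-ACTINMUR/M12481": (38099.6, 49485.2),
    "GAPDHMUR/M32599": (22784.3, 17501.4),
    "18SRNAMUR": (55.0, 274.8),
}

for name, (sig_5p, sig_3p) in housekeeping.items():
    ratio = three_prime_five_prime_ratio(sig_3p, sig_5p)
    status = "acceptable" if ratio < 3.0 else "review"
    print(f"{name}: 3'/5' = {ratio:.2f} ({status})")
```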

Spike Controls:
Probe Set   Sig(5')   Det(5')  Sig(M')   Det(M')  Sig(3')   Det(3')  Sig(all)   Sig(3'/5')
BIOB        1060.8    P        2127.5    P        662.0     P        1283.41    0.62
BIOC        3074.5    P                           2931.1    P        3002.78    0.95
BIODN       3688.4    P                           15548.2   P        9618.31    4.22
CREX        32693.8   P                           41256.2   P        36975.01   1.26
DAPX        93.5      A        91.3      A        10.2      A        65.01      0.11
LYSX        43.9      A        147.7     A        34.1      A        75.24      0.78
PHEX        97.0      A        64.7      A        105.3     A        88.98      1.09
THRX        27.2      A        14.9      A        28.4      A        23.50      1.05
TRPNX       31.1      A        16.6      A        9.7       A        19.14      0.31

The spike controls are multipurpose. BioB, for example, is spiked in at a concentration of 1.5 pM, the lowest mRNA concentration that Affymetrix guarantees to give accurate data. The scaled signal value of BioB, therefore, represents the minimum signal level in that data set that is in the linear range. Since some of the spike-in controls are prebiotinylated, their signals are also valuable in determining whether conditions during the hybridization were suitable for hybridization to occur. A very low percent present, together with BioB, BioC, BioDN and CreX signals that are low, or low and flat (not reflecting the concentration gradient at which they are spiked in), suggests a problem with the hybridization buffer or the chip that did not allow hybridization to occur. The remaining spike-in controls are non-biotinylated negative controls.
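
One way to check that the hybridization controls are not "flat" is to confirm that their signals rise along the spike-in gradient. The sketch below assumes the controls are spiked in at increasing concentration in the order BioB, BioC, BioDN, CreX; this page does not state that order, so treat it as an assumption. The function name is hypothetical, and no threshold for "low" signal is applied because none is given here.

```python
def gradient_preserved(signals):
    """True if each signal is strictly greater than the one before it."""
    return all(later > earlier for earlier, later in zip(signals, signals[1:]))


# Sig(all) values from the spike control table, in assumed order of
# increasing spike-in concentration.
spikes = [("BIOB", 1283.41), ("BIOC", 3002.78),
          ("BIODN", 9618.31), ("CREX", 36975.01)]

print(gradient_preserved([signal for _, signal in spikes]))
# Hypothetical flat profile, as might follow a failed hybridization.
print(gradient_preserved([110.0, 105.0, 98.0, 112.0]))
```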

INSTRUMENTATION

Affymetrix scanning laser confocal microscope (1 of 2)

Fluidics workstation (1 of 3)


FEES

Please follow this link to a general fees page.

RELATED METHODS

RNA Suitability Analysis

To obtain high quality data from your GeneChip experiment, it is essential to start with high quality total RNA. The analysts in the CACT lab will determine the quantity and assess the quality of the RNA before starting the labeling process. In addition to determining the 260/280 ratio, the analysts determine the integrity of the RNA by running the sample on the Agilent Bioanalyzer Lab-on-a-Chip.

Target Preparation

The RNA labeling procedure involves reverse transcription using a dT primer, followed by synthesis of second strand DNA. The double stranded DNA is transcribed in vitro (IVT), incorporating biotinylated ribonucleotides. The cRNA is fragmented in order to minimize steric hindrance in the hybridization step. Before hybridization, a number of spike-in controls are added for quality assurance.

Target Preparation with Amplification

We can successfully amplify 1 µg of total RNA for GeneChip® analysis and now offer amplification as a service. The RNA is amplified prior to target preparation: the entire target preparation process is performed as described above, except that the IVT step incorporates unlabeled nucleotides. Amplification adds ~$26 to the cost per sample and two days to the processing time. Amplification of less than 1 µg of RNA requires a consultation with CACT staff before proceeding. The method that we use can be viewed at the following location: http://www.affymetrix.com/support/technical/technotes/small_technote.pdf

Hybridization and Scanning

The labeled target is hybridized to the specified GeneChip® for about 18 hours at 45°C. After hybridization, the RNA hybridization cocktail is removed from the GeneChip cartridge and saved at -80°C. The arrays are then washed and stained with streptavidin conjugated to phycoerythrin, using the Affymetrix automated fluidics station. The arrays are then scanned and an image file is produced.

Expression Profile Suitability Analysis

Each chip run generates a chip report that provides information reflecting the quality of the run. There are specific parameters that we evaluate to determine if your chip "passed". These parameters include noise (Raw Q), background, scale factor, percentage of genes on the chip detected as present, and the 3' probe set signal to 5' probe set signal ratio of selected genes that are expressed abundantly in all cells. You are encouraged to request your chip reports, especially if you are accumulating a data set over a period of time, because you will want to assure yourself that the parameters for all samples are within a specific range. For a more detailed description of chip report analysis, see the quality assurance section.

Data Management

The image file is converted into expression data using Microarray Suite. Microarray Suite Software (formerly known as GeneChip® Suite Software) provides a powerful and comprehensive data collection and analysis package for users of the Affymetrix System. The suite provides automated data collection and instrument control for the GeneChip® Fluidics Station 400 and the GeneArray™ scanner. Using advanced scientific algorithms, the suite provides several different analysis applications. Analysis options include conversion of intensity data into expression results, allele detection, single nucleotide polymorphism detection, and nucleotide analysis.

Data produced in the CACT lab is managed with the Affymetrix Laboratory Information Management System (LIMS). Affymetrix LIMS is a microarray data management package for users who are generating large quantities of GeneChip® probe array data and require a data management solution. Data is published to the LIMS database (AADM) as well as St. Jude's Gene Expression Data integration (GEDi) database and can be retrieved using a web interface or loaded directly into Spotfire.

Data can be submitted to NCBI's Gene Expression Omnibus (GEO) using the Geo Submission Tool (GST), an application developed by the Hartwell Center.

Analysis Assistance

Expression data in the form of metrics text files are uploaded to a Storage Area Network. You may retrieve the results of your gene expression experiment using our web-based retrieval tool HCWebFetch. CACT staff can assist you in your data analysis using Spotfire Decision Site. The Hartwell Center also offers Spotfire training sessions.


AVAILABLE GENECHIPS

Gene Expression Analysis Arrays

DNA Analysis Arrays

Affymetrix Expression Arrays

For all other available chips please see the Affymetrix website.
