SigTerms: a bioinformatics tool for linking gene expression profiling results with gene class associations
Instructions for use
creighto@bcm.edu
Running SigTerms
Excel macros
The SigTerms software tool consists of a set of Microsoft Excel
macros for use in Excel:
In order to
run the SigTerms macros, first open the SigTerms.xls
workbook in Excel, then select Tools->Macro->Macros
from the main menu (in 32-bit, pre-Vista Excel), or press Alt+F8. The list of SigTerms
macros will be displayed. Select the
name of the macro you wish to run, and then click the "Run" button.
Important note on running Excel
macros: Your copy of Excel must be set to allow macros to be opened and run
before you can use the SigTerms software.
To allow macros in 32-bit Excel (pre-Vista), open the Tools->Options dialog from the main
menu, click the "Security" button, and set the security level to
"Medium." When opening the
SigTerms.xls spreadsheet, select "Enable macros" (not "Disable
macros"), if so prompted.
Other tips
on running Excel macros:
Selecting the Annotation workbook
The SigTerms "FindSignificantTerms"
macro takes two types of input:
The gene-to-class
associations are contained within an Excel spreadsheet referred to as the
"Annotation" worksheet. The
Annotation worksheet has the following format:
The
Annotation workbook includes the Annotation worksheet and a worksheet named
"Counts," which lists each gene class term, along with the total
number of times the term occurred in the Annotation worksheet.
The main page provides links to download pre-compiled
Annotation workbooks for several types of gene class associations of potential
interest (e.g. Gene Ontology annotations, microRNA
targeting predictions, oncogenic signatures,
etc.). Users also have the freedom to
create their own Annotation worksheets, using the format specified above. After creating a new Annotation sheet, users
need to run the "CountTermToGene" macro in
order to generate the Counts worksheet.
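As a rough illustration of what the Counts worksheet holds, the tally can be sketched in Python. This is not the "CountTermToGene" macro itself. The row layout (one gene/term pair per row) and the function name `count_terms` are assumptions made for the example, since the exact worksheet layout is not reproduced here.

```python
from collections import Counter

def count_terms(annotation_rows):
    """Tally how many times each gene class term occurs.

    annotation_rows: iterable of (gene, term) pairs; this layout is an
    assumption standing in for the rows of an Annotation worksheet.
    """
    return Counter(term for _gene, term in annotation_rows)

# Hypothetical annotation rows, for illustration only.
rows = [
    ("TP53", "apoptosis"),
    ("MYC", "apoptosis"),
    ("BRCA1", "DNA repair"),
]
counts = count_terms(rows)
print(counts["apoptosis"])  # 2
```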
Linking gene class terms to gene
sets
In order to
find all gene class terms (with significance of enrichment) for a given gene
set:
The "FindSignificantTerms" macro generates two new sheets
in the Annotation workbook:
For each
gene class that is matched with a number of genes in the selected gene set, SigTerms tests whether the class appears a disproportionate
number of times within the gene set, i.e. whether the class occurred more times
in the user-specified gene set than would be expected in a randomly selected
set of genes. The classical one-sided
Fisher’s exact test is used to assess significance of enrichment for each gene
class term.
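The enrichment test described above can be sketched as the upper tail of a hypergeometric distribution, which is the standard form of a one-sided Fisher's exact test. This is a minimal Python sketch, not the SigTerms macro code; the function and parameter names are hypothetical.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact p-value for term enrichment.

    k: genes in the gene set annotated with the term
    n: number of genes in the gene set
    K: genes in the whole population annotated with the term
    N: total number of genes in the population
    Returns P(X >= k), where X is the number of annotated genes in a
    random draw of n genes from the population.
    """
    if not (0 <= K <= N and 0 <= n <= N and 0 <= k <= min(n, K)):
        raise ValueError("counts are inconsistent with the population size")
    # Sum the probabilities of seeing k, k+1, ... annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

# All 5 selected genes carry a term held by 5 of 10 population genes.
print(enrichment_p_value(5, 5, 5, 10))  # 1/252, about 0.004
```

Exact integer arithmetic is used for the binomial coefficients, so the only rounding happens in the final division.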
For
computing the one-sided Fisher’s exact test, it is important to specify the
total gene population from which the gene set was selected. The total number of genes in the population
is used as the denominator for the enrichment calculations. There are a number of ways that one can
choose the gene population, including the following:
The number
of genes in the total population is specified in the input form of the "FindSignificantTerms" macro. If the gene population is not correctly
specified, the macro will still run, though the enrichment p-values will not be
precise.
The main page provides links to download several
pre-compiled Annotation workbooks for commonly used profiling arrays (e.g. Affymetrix or Illumina). Users of a profiling platform represented on
the main page may simply download the corresponding Annotation workbook (which
includes a list of the unique named genes represented on the array). With these array platform-specific Annotation
workbooks, the worksheet labeled "Gene Pop" has the recommended
number of genes to input as the population. Users of a profiling platform not represented
on the main page may download the "all" version of the Annotation
worksheet, remove the genes not represented on their array (using the Excel
"MATCH" function to identify them), and run the "CountTermToGene"
macro to regenerate the Counts worksheet.
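For users who prefer to prepare the data outside Excel, restricting an annotation list to the genes on a given array could look like the following Python sketch. The (gene, term) row layout and the name `restrict_to_array` are assumptions for illustration; the set lookup plays the role of the Excel "MATCH" step, and the recount stands in for rerunning "CountTermToGene".

```python
from collections import Counter

def restrict_to_array(annotation_rows, array_genes):
    """Keep only annotation rows whose gene is on the array, then recount terms.

    annotation_rows: iterable of (gene, term) pairs (assumed layout)
    array_genes: iterable of gene names represented on the array
    """
    on_array = set(array_genes)  # membership test, like MATCH in Excel
    kept = [(gene, term) for gene, term in annotation_rows if gene in on_array]
    counts = Counter(term for _gene, term in kept)
    return kept, counts

# Hypothetical data, for illustration only.
rows = [("TP53", "apoptosis"), ("BRCA1", "DNA repair"), ("MYC", "apoptosis")]
kept, counts = restrict_to_array(rows, ["TP53", "BRCA1"])
print(len(kept), counts["apoptosis"])  # 2 1
```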
Correcting enrichment p-values for
multiple testing
For each term, the one-sided Fisher’s exact p-value gives the probability that the term would occur a given number of times or more within the selected set of genes by chance. However, if many terms are considered simultaneously for enrichment, multiple testing must be taken into account when assessing the "global" significance of any particular term among the hundreds or thousands of terms that may be represented for the entire set of genes under study. One multiple comparison procedure that addresses this is to run numerous Monte Carlo simulation tests for randomness. In each test, a set of genes equal in size to the gene set used to search for terms with the "FindSignificantTerms" macro is randomly selected from a population of genes, and a set of term enrichment p-values is then calculated for these random genes. One can then examine the distribution of p-values generated from each test to estimate the number of terms that may have received a low p-value by chance alone (e.g. how many terms in each test, on average, received a p-value less than 0.01).
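The simulation procedure described above can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not the SigTerms simulation macro: the `gene_to_terms` mapping, the function names, and the fixed random seed are all hypothetical choices for the example.

```python
import random
from collections import Counter
from math import comb

def upper_tail_p(k, n, K, N):
    """One-sided Fisher's exact p-value, P(X >= k), as a hypergeometric tail."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)

def simulate_p_values(gene_to_terms, population, set_size, n_tests, seed=0):
    """Run n_tests random draws and return one sorted p-value list per draw.

    gene_to_terms: dict mapping gene name -> set of terms (assumed layout)
    population: list of gene names that random sets are drawn from
    set_size: number of genes in the real gene set of interest
    """
    rng = random.Random(seed)
    N = len(population)
    # Number of population genes carrying each term.
    term_totals = Counter(t for g in population for t in gene_to_terms.get(g, ()))
    results = []
    for _ in range(n_tests):
        sample = rng.sample(population, set_size)  # random gene set, no repeats
        hits = Counter(t for g in sample for t in gene_to_terms.get(g, ()))
        results.append(sorted(
            upper_tail_p(k, set_size, term_totals[t], N) for t, k in hits.items()
        ))
    return results

# Toy population: 20 genes, each annotated with a single hypothetical term.
gene_to_terms = {f"g{i}": {"even" if i % 2 == 0 else "odd"} for i in range(20)}
population = list(gene_to_terms)
simulated = simulate_p_values(gene_to_terms, population, set_size=5, n_tests=10, seed=1)
print(len(simulated))  # 10
```

Each inner list corresponds to one simulation test, so the lists can be compared directly against the p-values obtained for the real gene set.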
To do
simulation testing to determine global significance of term enrichment p-values:
One or more
new sheets will be generated in the current workbook. Each of the columns in these new sheets will
contain a set of p-values generated from a single simulation test
(p-values greater than 0.05 for a simulation will not be listed). You can use the simulation results
to estimate the true significance of a p-value obtained from your set of
genes of interest (e.g. count the number of terms in each test that had a p-value < 0.01).
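One way to summarize the simulation columns is to count, for each test, how many p-values fall below a cutoff and then average those counts. The Python sketch below assumes the columns have been read into lists of numbers; the function names and the example values are hypothetical.

```python
def chance_hits_per_test(simulated_tests, cutoff=0.01):
    """Count the p-values below the cutoff in each simulation test."""
    return [sum(1 for p in test if p < cutoff) for test in simulated_tests]

def mean_chance_hits(simulated_tests, cutoff=0.01):
    """Average number of terms per test that pass the cutoff by chance."""
    counts = chance_hits_per_test(simulated_tests, cutoff)
    return sum(counts) / len(counts)

# Hypothetical p-values from three simulation tests (one list per column).
simulated = [[0.002, 0.03], [0.04], [0.001, 0.008, 0.02]]
print(chance_hits_per_test(simulated))  # [1, 0, 2]
print(mean_chance_hits(simulated))      # 1.0
```

If the real gene set yields many more terms below the cutoff than this average, most of those terms are unlikely to be explained by chance alone.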
The SigTerms macros have been tested mainly with Microsoft
Excel (both 32-bit and 64-bit). In
principle, the software should work with Excel for Macintosh. We have recently noticed one issue with
running the "FindSignificantTerms" macro on
Mac OSX version 10. At the end of the
program run, the following error message is generated: "Run-time error
'1004': Method 'FreezePanes' of object window failed." This appears to have to do with the 'FreezePanes' feature not working properly when spreadsheets
are in the "Page Layout" view, which has been noted elsewhere on Mac
user forums.
When
running SigTerms on Macintosh, we recommend that the
user has all open worksheets in the "