SigTerms: a bioinformatics tool for linking gene expression profiling results with gene class associations
Instructions for use
Running SigTerms Excel macros
The SigTerms software tool consists of a set of Microsoft Excel macros for use in Excel:
In order to run the SigTerms macros, first open the SigTerms.xls workbook in Excel, then select Tools->Macro->Macros from the main menu (in 32-bit, pre-Vista versions of Excel) or press Alt+F8. The list of SigTerms macros will be displayed. Select the name of the macro you wish to run, and then click the "Run" button.
Important note on running Excel macros: To use the SigTerms software, your copy of Excel must be set to allow macros to be opened and run. To allow macros in 32-bit Excel (pre-Vista), open the Tools->Options dialog from the main menu, click the "Security" button, and set the security level to "Medium." When opening the SigTerms.xls workbook, select "Enable macros" (not "Disable macros") if prompted.
Other tips on running Excel macros:
The SigTerms "FindSignificantTerms" macro takes two types of input:
The gene-to-class associations are contained within an Excel spreadsheet referred to as the "Annotation" worksheet. The Annotation worksheet has the following format:
The Annotation workbook includes the Annotation worksheet and a worksheet named "Counts," which lists each gene class term, along with the total number of times the term occurred in the Annotation worksheet.
The main page provides links to download pre-compiled Annotation workbooks for several types of gene class associations of potential interest (e.g. Gene Ontology annotations, microRNA targeting predictions, oncogenic signatures, etc.). Users can also create their own Annotation worksheets using the format specified above. After creating a new Annotation worksheet, users need to run the "CountTermToGene" macro in order to generate the Counts worksheet.
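The Python sketch below illustrates what the Counts worksheet contains. It is not the CountTermToGene macro itself. It assumes, hypothetically, that the Annotation data has been exported as one (gene, term) pair per row; the real worksheet layout may differ. The gene and term names are made up for the example.

```python
from collections import Counter


def count_terms(associations):
    """Count how many times each term occurs in a list of (gene, term) rows.

    Assumes one (gene, term) pair per row (a hypothetical export format).
    Every row is counted, so a duplicated (gene, term) row would be counted twice.
    """
    return Counter(term for _gene, term in associations)


# Hypothetical Annotation rows: one (gene, term) pair per row
annotation = [
    ("TP53", "apoptosis"),
    ("BAX", "apoptosis"),
    ("CDK1", "cell cycle"),
    ("CCNB1", "cell cycle"),
    ("TP53", "cell cycle"),
]
counts = count_terms(annotation)
for term, total in counts.most_common():
    print(term, total)  # prints "cell cycle 3", then "apoptosis 2"
```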
In order to find all gene class terms (with significance of enrichment) for a given gene set:
The "FindSignificantTerms" macro generates two new sheets in the Annotation workbook:
For each gene class that is matched with one or more genes in the selected gene set, SigTerms tests whether the class appears a disproportionate number of times within the gene set, i.e. whether the class occurs more times in the user-specified gene set than would be expected in a randomly selected set of genes of the same size. The classical one-sided Fisher's exact test is used to assess the significance of enrichment for each gene class term.
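For a single term, the one-sided Fisher's exact test reduces to an upper-tail hypergeometric probability. The Python sketch below makes the following assumptions:

- The population has N genes, K of which carry the term (the Counts total, when the Annotation covers the whole population).
- The gene set has n genes, k of which carry the term.

The p-value is the probability that a random set of n genes would contain k or more genes with the term. The function name enrichment_p_value is hypothetical and not part of SigTerms.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value, P(X >= k).

    k: genes in the selected set annotated with the term
    n: size of the selected gene set
    K: genes in the population annotated with the term
    N: total number of genes in the population
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("inconsistent counts")
    # math.comb returns 0 when asked for more items than exist,
    # so impossible splits contribute nothing to the sum
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical example: 2 of 4 selected genes carry a term found in 5 of 20 genes
print(enrichment_p_value(2, 4, 5, 20))
```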
For computing the one-sided Fisher’s exact test, it is important to specify the total gene population from which the gene set was selected. The total number of genes in the population is used as the denominator for the enrichment calculations. There are a number of ways that one can choose the gene population, including the following:
The number of genes in the total population is specified in the input form of the "FindSignificantTerms" macro. If the gene population is not correctly specified, the macro will still run, but the enrichment p-values will be inaccurate.
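The sketch below shows why the population size matters. It uses a hypothetical helper and made-up counts to recompute the same upper-tail p-value for one term, varying only the assumed population size N. A larger N lowers the number of term genes expected in the set by chance. As a result, the same observed overlap yields a smaller p-value.

```python
from math import comb


def upper_tail_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value, P(X >= k)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


# Hypothetical counts: a 100-gene set, 5 of which carry a term found in 50 population genes
k, n, K = 5, 100, 50
for N in (5000, 10000, 20000):
    # Only the assumed population size changes between rows
    print(f"N = {N:>6}: p = {upper_tail_p(k, n, K, N):.3g}")
```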
The main page provides links to download several pre-compiled Annotation workbooks for commonly used profiling arrays (e.g. Affymetrix or Illumina). Users of a profiling platform represented on the main page may simply download the corresponding Annotation workbook (which includes a list of the unique named genes represented on the array). In these platform-specific Annotation workbooks, the worksheet labeled "Gene Pop" contains the recommended number of genes to enter as the population. Users of a profiling platform not represented on the main page may download the "all" version of the Annotation worksheet, remove the genes not represented on their array (using the Excel "MATCH" function), and run the "CountTermToGene" macro to regenerate the Counts worksheet.
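The platform-restriction step described above can be sketched in Python as a set-membership filter, which plays the role of the Excel MATCH lookup. This is an illustration, not a SigTerms macro. The gene lists and terms are hypothetical, and the sketch assumes the same one-(gene, term)-pair-per-row layout as a simple export. Repeated array genes (e.g. several probes for one gene) are collapsed, so the population size counts unique named genes.

```python
from collections import Counter


def restrict_to_platform(associations, platform_genes):
    """Keep only (gene, term) rows whose gene is represented on the array."""
    on_array = set(platform_genes)
    return [(gene, term) for gene, term in associations if gene in on_array]


# Hypothetical rows from the "all" Annotation worksheet
all_annotation = [
    ("TP53", "apoptosis"),
    ("BAX", "apoptosis"),
    ("CASP3", "apoptosis"),
    ("MYC", "cell proliferation"),
    ("CDK1", "cell cycle"),
]
# Hypothetical named genes on the array; TP53 appears twice (e.g. two probes)
array_genes = ["TP53", "BAX", "MYC", "GAPDH", "TP53"]

platform_annotation = restrict_to_platform(all_annotation, array_genes)
counts = Counter(term for _gene, term in platform_annotation)  # regenerated term counts
population_size = len(set(array_genes))  # unique named genes on the array
print(counts)
print("population size:", population_size)
```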
Correcting enrichment p-values for multiple testing
For each term, the one-sided Fisher's exact p-value gives the probability that the term would occur the observed number of times or more within the selected set of genes by chance. However, when many terms are considered simultaneously, multiple testing must be taken into account when assessing the "global" significance of any particular term among the hundreds or thousands of terms that may be represented across the entire set of genes under study. One procedure for addressing this is to run numerous Monte Carlo simulation tests. In each test, a set of genes of the same size as the set used with the "FindSignificantTerms" macro is randomly selected from the gene population, and term enrichment p-values are then calculated for that random set. Examining the distribution of p-values across tests indicates how many terms may receive a low p-value by chance alone (e.g. how many terms per test, on average, have a p-value less than 0.01).
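The Monte Carlo procedure can be sketched as follows. This Python code illustrates the idea under stated assumptions and is not the SigTerms simulation macro. It assumes a hypothetical dictionary mapping each population gene to its set of terms. It draws random gene sets of the same size as the set of interest and computes the upper-tail hypergeometric (one-sided Fisher's exact) p-value for every term matched in each random set. Unlike the SigTerms output, it keeps all p-values rather than only those of 0.05 or less. The toy population of 200 genes and 10 terms is made up.

```python
import random
from collections import defaultdict
from math import comb


def upper_tail_p(k, n, K, N):
    """One-sided Fisher's exact (hypergeometric upper-tail) p-value, P(X >= k)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def simulate_null_p_values(gene_to_terms, population, set_size, n_sims, seed=0):
    """Return one {term: p-value} dict per random gene set of size set_size.

    population must be a list of unique gene names.
    """
    rng = random.Random(seed)
    N = len(population)
    # K for each term: how many population genes carry it
    term_totals = defaultdict(int)
    for gene in population:
        for term in gene_to_terms.get(gene, ()):
            term_totals[term] += 1
    results = []
    for _ in range(n_sims):
        random_set = rng.sample(population, set_size)
        # k for each term: how many genes in the random set carry it
        hits = defaultdict(int)
        for gene in random_set:
            for term in gene_to_terms.get(gene, ()):
                hits[term] += 1
        # Only terms matched by at least one gene in the random set are tested
        results.append(
            {term: upper_tail_p(k, set_size, term_totals[term], N) for term, k in hits.items()}
        )
    return results


# Hypothetical toy population: 200 genes, each annotated with one of 10 terms
population = [f"gene{i}" for i in range(200)]
gene_to_terms = {gene: {f"term{i % 10}"} for i, gene in enumerate(population)}

sims = simulate_null_p_values(gene_to_terms, population, set_size=20, n_sims=100)
low_counts = [sum(p < 0.01 for p in sim.values()) for sim in sims]
print("average terms with p < 0.01 per random set:", sum(low_counts) / len(low_counts))
```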
To do simulation testing to determine global significance of term enrichment p-values:
One or more new sheets will be generated in the current workbook. Each column in these new sheets contains the p-values generated from a single simulation test (p-values greater than 0.05 are not listed). You can use the simulation results to estimate the true significance of a p-value obtained from your gene set of interest (e.g. by counting the number of terms in each test that had a p-value < 0.01).
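One way to turn the simulation columns into an estimate is sketched below. First, average the number of terms per simulation with p below a threshold. Then divide by the number of terms below the same threshold in the real gene set. This gives a rough empirical false discovery estimate. It is one common approach, not necessarily the one the SigTerms authors intend. The function names and p-values are hypothetical. Because SigTerms omits simulated p-values above 0.05, the threshold should be 0.05 or lower.

```python
from statistics import mean


def expected_false_positives(simulation_columns, threshold):
    """Average number of terms per simulation column with p < threshold."""
    return mean(sum(1 for p in column if p < threshold) for column in simulation_columns)


def estimated_fdr(observed_p_values, simulation_columns, threshold):
    """Expected chance hits divided by observed hits (capped at 1); None if no hits."""
    observed = sum(1 for p in observed_p_values if p < threshold)
    if observed == 0:
        return None
    return min(1.0, expected_false_positives(simulation_columns, threshold) / observed)


# Hypothetical simulation columns (p-values above 0.05 already omitted)
sim_columns = [
    [0.004, 0.02, 0.049],
    [0.03],
    [0.008, 0.009, 0.04],
    [],
]
# Hypothetical p-values from the gene set of interest
observed = [0.0001, 0.0005, 0.003, 0.007, 0.02, 0.3]
print(expected_false_positives(sim_columns, 0.01))  # 0.75
print(estimated_fdr(observed, sim_columns, 0.01))  # 0.1875
```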
The SigTerms macros have been tested mainly with Microsoft Excel (both 32-bit and 64-bit). In principle, the software should also work with Excel for Macintosh. We have recently noticed one issue with running the "FindSignificantTerms" macro on Mac OS X version 10. At the end of the program run, the following error message is generated: "Run-time error '1004': Method 'FreezePanes' of object window failed." This appears to have to do with the 'FreezePanes' feature not working properly when spreadsheets are in the "Page Layout" view, a problem that has been noted elsewhere on Mac user forums.
When running SigTerms on Macintosh, we recommend that the user have all open worksheets in the "