SigTerms: a bioinformatics tool for linking gene expression profiling results with gene class associations

 

Additional notes on searching for microRNA targeting associations

Chad Creighton, Ph.D.

Baylor College of Medicine

creighto@bcm.edu

 

 

Retrieving microRNA:mRNA pairs for a set of genes and a set of microRNAs

Running the "FindSignificantTerms" macro retrieves all microRNA:mRNA pairs involving a user-specified list of genes. In many cases, the user also has a list of microRNAs of interest (e.g. from a microRNA profiling experiment). From the "Terms with Genes" worksheet, the user can filter the retrieved microRNA:mRNA pairs down to those involving the listed microRNAs, by combining the Excel MATCH function with the Data Filter feature. First, for each entry in the "Terms with Genes" worksheet, the user looks up the entry's microRNA in the selected list using the MATCH function; entries for which the microRNA was found will give a number (its position in the list), while entries for which it was not found will give "#N/A". The user then applies the Data Filter to the MATCH column to keep only the entries with a number. For novice Excel users, screen shots of examples of using MATCH and Data Filter are available here.
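For users who prefer scripting, the same MATCH-then-filter logic can be sketched in Python. This is only an illustration and not part of SigTerms: the gene names, the list of selected microRNAs, and the helper names `match` and `filter_pairs` are all hypothetical.

```python
# Hypothetical rows from a "Terms with Genes" worksheet: (microRNA, gene).
pairs = [
    ("hsa-miR-18a", "GENE_A"),
    ("hsa-miR-21", "GENE_B"),
    ("hsa-miR-18a", "GENE_C"),
    ("hsa-miR-155", "GENE_A"),
]

# Hypothetical microRNAs of interest, e.g. from a profiling experiment.
selected_mirnas = ["hsa-miR-18a", "hsa-miR-155"]


def match(value, lookup):
    """Mimic Excel MATCH(value, lookup, 0).

    Returns the 1-based position of the first exact match,
    or None where Excel would show #N/A.
    """
    try:
        return lookup.index(value) + 1
    except ValueError:
        return None


def filter_pairs(rows, selected):
    """Keep only the rows whose microRNA is found in the selected list."""
    return [row for row in rows if match(row[0], selected) is not None]


filtered = filter_pairs(pairs, selected_mirnas)
for mirna, gene in filtered:
    print(mirna, gene)
```

Here `None` plays the role of "#N/A", and the list comprehension plays the role of the Data Filter step.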

 

Linking mirBase accession numbers to microRNA names (PicTar and miRanda)

When searching the PicTar or miRanda target predictions, the "Terms with Genes" worksheet lists each microRNA by mirBase accession number (e.g. "MIMAT0000072"), instead of the common microRNA name (e.g. "hsa-miR-18a"). The reason for this is that the accession number for a particular microRNA should not change over time, whereas the name may change from one mirBase version to the next (e.g. a hypothetical miR-XX being split into miR-XX-3p and miR-XX-5p).

 

The user may wish to list microRNA common names alongside the mirBase accession numbers. This is readily done using the Excel MATCH and INDEX functions, in conjunction with the Excel table (provided here) mapping microRNA accession numbers to common names. First, the user looks up the row position, within the mapping table, of each microRNA listed in the "Terms with Genes" worksheet, using the MATCH function. Then, using the row position, the user retrieves the common name using the INDEX function. The user can then copy the INDEX formula values and paste them into another column, using "Paste Special" ("Paste as Values"). Screen shot examples for linking name to accession number are available here.

 

Note: The "mirBase_name-to-accession.xls" table may list a given microRNA multiple times. The microRNA accession-to-name mappings are listed starting with the most recent version of mirBase (v11 as of May 2008) and ending with the oldest version (v6, which introduced the accession numbers). Because older names are retained in the table, this layout also makes it possible to retrieve the mirBase accession number (which should remain constant) given a common microRNA name (which may change from version to version). The MATCH and INDEX functions retrieve the first matching entry starting from the top, so the most current name for a given accession number will always be retrieved.
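The MATCH-plus-INDEX lookup, including its "first entry from the top wins" behavior, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the table rows are made up (the accession "MIMAT9999999" and the "miR-XX" names are hypothetical placeholders, mirroring the hypothetical split described above), and `accession_to_name` is not a SigTerms function.

```python
# Hypothetical mapping rows (accession, name), ordered newest mirBase
# version first, as described for the mapping table.
mapping_table = [
    ("MIMAT0000072", "hsa-miR-18a"),     # example accession from the text
    ("MIMAT9999999", "hsa-miR-XX-5p"),   # hypothetical, newer name
    ("MIMAT9999999", "hsa-miR-XX"),      # hypothetical, older name
]


def accession_to_name(accession, table):
    """Return the name on the first row whose accession matches.

    Scanning from the top mirrors MATCH (exact match) followed by INDEX:
    with the newest version listed first, the most current name is returned.
    Returns None where Excel would show #N/A.
    """
    for row_accession, name in table:
        if row_accession == accession:
            return name
    return None


print(accession_to_name("MIMAT9999999", mapping_table))
print(accession_to_name("MIMAT0000072", mapping_table))
```

If the table were sorted oldest version first instead, the same scan would return the outdated name, which is why the ordering of the table matters.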

 

Note: When searching the PicTar predictions for mouse, the mirBase accession numbers are for the human ortholog. To map from the human accession number to the mouse accession/name, use the table provided here; you can use the MATCH and INDEX functions (screen shot examples here).

 

Linking microRNA families to microRNA names (TargetScan)

When searching the TargetScan target predictions, the "Terms with Genes" worksheet lists each microRNA association by family. A microRNA family may include several microRNAs (e.g. the "let-7/98" family includes let-7a, let-7b, let-7c, let-7d, let-7e, let-7f, let-7g, and let-7i). The user may wish to link microRNA common names alongside the corresponding family; however, this cannot be carried out using MATCH and INDEX (as the mapping between family and name is not one-to-one but one-to-many).

 

In order to link microRNA family names from the "Terms with Genes" worksheet to common names, the user can do a table join between the worksheet named "miR_Family_info" (from the TargetScan Annotation workbook) and the "Terms with Genes" worksheet; this can be done using Microsoft Access. Novice Access users can do the following:

  1. In Excel, copy the "Terms with Genes" worksheet into a new workbook and save it.
  2. Open the "join_TargetScan_family.mdb" database in Access (available here); say "No" to the "Block unsafe expressions?" prompt and ignore any warnings.
  3. Import the "Terms with Genes" worksheet copy into Access (from File->Get External Data->Import, in 32-bit Access). Import the worksheet as a new table named "Sheet1". Replace the original "Sheet1" in the database when prompted.
  4. From the Queries view, double click the Query "link_family_human" (or "link_family_mouse" if you are working with mouse genes).
  5. Export the Query output as an Excel workbook for opening in Excel (using File->Export, in 32-bit Access, save as XLS file).
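For readers without Access, the one-to-many join can be sketched in Python. This is an illustration only, not a reimplementation of the "link_family_human" query: the rows are hypothetical, the function name `join_family_to_names` is made up, and the sketch assumes an inner join (entries whose family has no listed members are dropped), which may differ from what the Access query does.

```python
from collections import defaultdict

# Hypothetical rows standing in for the "miR_Family_info" worksheet:
# (family, microRNA common name).
family_info = [
    ("let-7/98", "let-7a"),
    ("let-7/98", "let-7b"),
    ("miR-XX", "miR-XX-5p"),
]

# Hypothetical rows standing in for "Terms with Genes": (family, gene).
terms_with_genes = [
    ("let-7/98", "GENE_A"),
    ("miR-XX", "GENE_B"),
    ("miR-YY", "GENE_C"),  # family absent from family_info
]


def join_family_to_names(terms, family_rows):
    """Join on family, emitting one output row per family member."""
    members = defaultdict(list)
    for family, name in family_rows:
        members[family].append(name)
    return [
        (family, name, gene)
        for family, gene in terms
        for name in members.get(family, [])
    ]


joined = join_family_to_names(terms_with_genes, family_info)
for row in joined:
    print(row)
```

Each family entry expands into as many rows as the family has members, which is the behavior that a single MATCH and INDEX lookup cannot reproduce.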

 

Comparing microRNA:mRNA pairs across multiple target prediction databases

After searching each target prediction database separately, the user may wish to compare the microRNA:mRNA pairs predicted from one database with those predicted from another database. This may be done as follows:

  1. In each "Terms with Genes" worksheet from the respective databases, create another column of values with both the microRNA and mRNA names for each entry, using the Excel CONCATENATE function [e.g. =CONCATENATE(microRNA_ref, ":", mRNA_ref)].
  2. The microRNA:mRNA concatenated field may be used as a way to link microRNA:mRNA pairs between sheets, using the MATCH function (in a similar manner to what is described above).
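The two steps above can be sketched in Python, with a set intersection standing in for the MATCH lookup between sheets. The pairs below are hypothetical, the variable names `database_a_pairs` and `database_b_pairs` are placeholders for two "Terms with Genes" result sets, and the sketch assumes the microRNA identifiers have already been converted to a common naming scheme (accession numbers and family names would first need to be mapped to names as described in the earlier sections).

```python
def pair_keys(pairs):
    """Build "microRNA:mRNA" keys, like CONCATENATE(microRNA_ref, ":", mRNA_ref)."""
    return {f"{mirna}:{gene}" for mirna, gene in pairs}


# Hypothetical results from two target prediction databases.
database_a_pairs = [
    ("hsa-miR-18a", "GENE_A"),
    ("hsa-miR-18a", "GENE_C"),
    ("hsa-miR-21", "GENE_B"),
]
database_b_pairs = [
    ("hsa-miR-18a", "GENE_A"),
    ("hsa-miR-21", "GENE_B"),
    ("hsa-miR-21", "GENE_D"),
]

keys_a = pair_keys(database_a_pairs)
keys_b = pair_keys(database_b_pairs)

shared = sorted(keys_a & keys_b)   # predicted by both databases
only_a = sorted(keys_a - keys_b)   # predicted by the first database only

print("Shared:", shared)
print("Only in first database:", only_a)
```

Using the concatenated string as the key keeps the comparison on whole pairs, so a microRNA or gene that appears in both databases but in different pairings is not counted as shared.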