Bio-Homology-InterologWalk version 0.12
=======================================

This document refers to version 0.12 of Bio::Homology::InterologWalk.
This version was released on January 12th, 2011.

INSTALLATION-------------------------------------------------------------------------

To install this module on your system, place the tarball archive file in a
temporary directory and run the following commands:

   % gunzip Bio-Homology-InterologWalk-0.12.tar.gz
   % tar xf Bio-Homology-InterologWalk-0.12.tar
   % cd Bio-Homology-InterologWalk-0.12
   % perl Makefile.PL
   % make
   % make test
   % make install

DEPENDENCIES-------------------------------------------------------------------------

This module requires the following modules and libraries:

===============
1. Ensembl API
===============

The Ensembl project is currently split into two sub-projects:

The Ensembl Vertebrates project
   This is of interest to you if you work with vertebrate genomes (although it
   also includes data from a few common non-vertebrate model organisms).
   See http://www.ensembl.org/index.html for further details.

The Ensembl Genomes project
   This uses the Ensembl software infrastructure (originally developed in the
   Ensembl Core project) to provide access to genome-scale data from
   non-vertebrate species. This is of interest to you if your species is a
   non-vertebrate, or if your species is a vertebrate but you *also want to
   obtain results mapped from non-vertebrates*.
   "Bio::Homology::InterologWalk" currently supports only the Metazoa sub-site
   of the Ensembl Genomes project.
   See http://metazoa.ensembl.org/index.html for further details.

IMPORTANT
You will need to decide which Ensembl DB set to use before installing
"Bio::Homology::InterologWalk". The module requires the Ensembl API version to
match the Ensembl DB set version. This means that if you install, e.g., API
v. 58, you will only be able to get data from Ensembl Vertebrates / Metazoa
databases v. 58.
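Because the API and database releases must match, it can help to check a database name against the installed API version before starting a walk. The Perl sketch below is hypothetical and is not part of Bio::Homology::InterologWalk. It assumes the usual Ensembl naming conventions, <species>_core_<release>_<assembly> for Ensembl Vertebrates and <species>_core_<eg_release>_<release>_<assembly> for Ensembl Genomes; the database names used in it are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Homology::InterologWalk).
# Returns the Ensembl release number encoded in a core database name,
# or undef if the name does not follow the assumed conventions:
#   Ensembl Vertebrates: <species>_core_<release>_<assembly>
#   Ensembl Genomes:     <species>_core_<eg_release>_<release>_<assembly>
sub ensembl_release_from_dbname {
    my ($dbname) = @_;
    # Try the Ensembl Genomes layout first (two numeric fields).
    return $1 if $dbname =~ /_core_\d+_(\d+)_[^_]+$/;
    # Fall back to the Ensembl Vertebrates layout (one numeric field).
    return $1 if $dbname =~ /_core_(\d+)_[^_]+$/;
    return;
}

# True when the release embedded in the database name equals the
# installed API version, which is the condition the module requires.
sub db_matches_api {
    my ($dbname, $api_version) = @_;
    my $release = ensembl_release_from_dbname($dbname);
    return defined $release && $release == $api_version;
}

# Illustrative check against API v. 58 (database names are examples only).
for my $db (qw(homo_sapiens_core_58_37c
               drosophila_melanogaster_core_5_58_513b
               homo_sapiens_core_60_37e)) {
    printf "%-40s %s\n", $db,
        db_matches_api($db, 58) ? 'OK' : 'version mismatch';
}
```

Run as a script, the loop flags any database whose embedded Ensembl release differs from the API version passed in; in a real setup you would pass the version of the API you actually installed.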
As Ensembl Genomes DB releases lag one version behind Ensembl Vertebrates DB
releases, if you install the bleeding-edge Ensembl Vertebrates API *a matching
Ensembl Genomes DB release might not be available yet*. In that case you will
still be able to use "Bio::Homology::InterologWalk" to run an orthology walk
using exclusively Ensembl Vertebrates DBs, but you will get an error if you
try to select metazoan databases. See "setup_ensembl_adaptor" for further
information.

Therefore, before installing "Bio::Homology::InterologWalk", you need to
choose between the following options:

a) If you are exclusively interested in vertebrates (plus the few
   non-vertebrate model organisms still present in Ensembl Vertebrates),
   obtain the APIs and set up the environment by following the steps
   described on the Ensembl Vertebrates API installation pages:

   http://www.ensembl.org/info/docs/api/api_installation.html

   or alternatively

   http://www.ensembl.org/info/docs/api/api_cvs.html

   This option gives you the most recent datasets provided by Ensembl Core.
   However, you might not be able to query EnsemblCompara data.

b) If you are interested in querying and retrieving data from both vertebrate
   and metazoan genomes (this lets you query across a wider selection of
   taxa), obtain the APIs and set up the environment by following the steps
   described on the Ensembl Metazoa API installation pages:

   http://metazoa.ensembl.org/info/docs/api/api_installation.html

   or alternatively

   http://metazoa.ensembl.org/info/docs/api/api_cvs.html

   This option will probably not give you the most recent API and DBs, but it
   guarantees functionality across both vertebrate and metazoan genomes.

Option (b) is recommended.

==========
2. Bioperl
==========

Ensembl should provide a customised Bioperl installation, v. 1.2.3, tailored
to its API. If version 1.2.3 is no longer available through Ensembl, please
obtain release 1.6.x from CPAN.
Although release 1.6.x is not officially supported by the Ensembl project, it
works fine when the API is used within the scope of this module.

=================================================

NOTE 1: All the API components ("ensembl", "ensembl-compara",
"ensembl-variation", "ensembl-functgenomics") are required.

NOTE 2: The module has been tested on Ensembl Vertebrates API & DB v. 58-60
and Ensembl Genomes API & DB v. 5-7.

EXAMPLE==========================================

For example, to install the Core API v. 58, do the following.

Log into the Ensembl CVS server at Sanger (password: CVSUSER):

   $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl login
   Logging in to :pserver:cvsuser@cvs.sanger.ac.uk:2401/cvsroot/ensembl
   CVS password: CVSUSER

Check out the Ensembl Core Perl API for version 58:

   $ cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/ensembl checkout -r branch-ensembl-58 ensembl

The other components listed in NOTE 1 are checked out the same way, replacing
"ensembl" with the component name.

=====================
3. EXTRA PERL MODULES
=====================

You will also need to install the following modules (including all their
dependencies) from CPAN:

   1. REST::Client
   2. GO::Parser
   3. DBD::CSV (requires Perl DBI)
   4. String::Approx
   5. List::Util
   6. File::Glob

The following modules are only required if you intend to compute conservation
scores for the putative PPIs retrieved:

   7. Graph
   8. Data::PowerSet
   9. URI::Escape
   10. Algorithm::Combinatorics

=====================
4. NOTE FOR MAC USERS
=====================

Please note that Ensembl REQUIRES the DBD::mysql module in order to work.
DBD::mysql in turn needs to contact a running instance of MySQL in order to
successfully complete its "make test" stage. Please check
http://www.ensembl.org/info/docs/api/api_installation.html for further
information.

SAMPLE SCRIPTS--------------------------------------------------------------------------------

The scripts/Code sub-directory provides examples of how to use the module.
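Two of the example scripts (doInterologWalk.pl and getDirectInteractions.pl) start from a flat file listing stable gene IDs. The Perl sketch below shows one way to read such a list; the subroutine name is hypothetical, it is not the scripts' actual code, and the handling of blank lines, '#' comments and duplicates is an assumption about the input format.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Homology::InterologWalk):
# read one stable gene ID per line from a flat file. Blank lines and
# lines starting with '#' are skipped (an assumption about the input
# format), and duplicate IDs are dropped while preserving input order.
sub read_id_list {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my (%seen, @ids);
    while (my $line = <$fh>) {
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace and CR/LF
        next if $line eq '' || $line =~ /^#/;
        push @ids, $line unless $seen{$line}++;
    }
    close $fh;
    return @ids;
}
```

For example, my @ids = read_id_list('ids.txt'); returns the distinct IDs in file order ('ids.txt' is a placeholder file name). Deduplicating up front avoids querying Ensembl twice for the same gene.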
The files are as follows:

-doInterologWalk.pl: example usage of the core methods. Given a flat file
 containing a list of stable FlyBase IDs, this script uses
 Bio::Homology::InterologWalk to build a TSV file containing the putative
 interactors of those IDs according to the interolog mapping method.

-getDirectInteractions.pl: generates a dataset of direct PPIs based on the
 input ID list.

-doScores.pl: given a TSV file obtained with doInterologWalk.pl, this script
 computes an aggregated score for each (ID, putative interactor) pair,
 representing a measure of the reliability of the interaction. The output of
 this script is a new TSV file containing an additional compound score column.
 REQUIRES: doInterologWalk.pl, getDirectInteractions.pl

-doNets.pl: given a TSV file obtained from doInterologWalk.pl (optionally
 processed by doScores.pl to add a compound score column), this script
 produces a .sif network file and two .noa node attribute files, suitable for
 importing into the Cytoscape (http://www.cytoscape.org/) network
 visualisation program. The files follow the definitions at
 http://cytoscape.org/cgi-bin/moin.cgi/Cytoscape_User_Manual/Network_Formats
 and have been tested on Cytoscape v. 2.6.2 - 2.7.0.
 REQUIRES: doInterologWalk.pl
 OPTIONAL: doScores.pl

scripts/Data contains a PSI-MI OBO ontology (used by doScores.pl for
interaction types and interaction detection methods) and a small sample
Mus musculus dataset.

COPYRIGHT AND LICENSE------------------------------------------------------------------------

Original author: Giuseppe Gallone
CPAN ID: GGALLONE
G.Gallone@sms.ed.ac.uk

Copyright (C) 2010-2011 by Giuseppe Gallone

This program is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.

The full text of the license can be found in the LICENSE file included with
this module.