---
title: "IMMAN"
author: "Payman Nickchi, Abdollah Safari, Minoo Ashtiani, Mohieddin Jafari"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{IMMAN}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

IMMAN is designed to retrieve the interolog protein network shared across diverse species. To this end, we first infer orthology relationships among sequences from the input species by iterating over every pair of species, using the Needleman-Wunsch alignment algorithm and a best reciprocal hit strategy to identify orthologues through all-versus-all pairwise cross-species alignments. From the orthology assignment, we derive Orthologous Protein Sets (OPSs), clusters of orthologues (at most one per species) that form the nodes of the so-called Interolog Protein Network (IPN). We then extract n species-specific protein interaction networks from the STRING database, where each node maps to a single OPS in the IPN, and define the edges of the resulting IPN by keeping only edges between IPN nodes that are also linked in at least k species-specific networks (where 'k' is set as a parameter).

The alignment process uses a scoring system, that is, a set of values quantifying the likelihood that one residue is substituted by another in an alignment. The scoring system used by the alignment procedure is called a substitution matrix, and it is derived from statistical analysis of residue substitution data in sets of reliable alignments of highly related sequences.

The identityU value, which ranges from 0 to 100, lets the user control how large the IPN will be. The higher the value of identityU, the more similar the orthologs found by the algorithm must be, and vice versa. The gapOpening and gapExtension arguments set the numeric gap penalties used when aligning candidate ortholog proteins.
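As a rough illustration of how a global alignment score responds to gap penalties, the following base-R sketch implements a toy Needleman-Wunsch scorer. It is not IMMAN's implementation: the function name `nw_score`, the match/mismatch values (used here in place of a substitution matrix such as BLOSUM62), and the single linear gap penalty (instead of separate gapOpening and gapExtension values) are simplifying assumptions made for this example.

```{r}
# Toy Needleman-Wunsch global alignment score (illustrative only).
# `match`, `mismatch` and `gap` are made-up values, not IMMAN defaults.
nw_score <- function(a, b, match = 1, mismatch = -1, gap = -2) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  n <- length(x)
  m <- length(y)

  # S[i + 1, j + 1] is the best score for aligning x[1..i] with y[1..j]
  S <- matrix(0, nrow = n + 1, ncol = m + 1)
  S[, 1] <- gap * (0:n)  # prefix of x aligned against gaps only
  S[1, ] <- gap * (0:m)  # prefix of y aligned against gaps only

  for (i in seq_len(n)) {
    for (j in seq_len(m)) {
      pair_score <- if (x[i] == y[j]) match else mismatch
      S[i + 1, j + 1] <- max(
        S[i, j] + pair_score,  # align x[i] with y[j]
        S[i, j + 1] + gap,     # x[i] against a gap
        S[i + 1, j] + gap      # y[j] against a gap
      )
    }
  }
  S[n + 1, m + 1]
}

nw_score("ACGT", "ACGT")           # 4: four matches, no gaps
nw_score("ACGT", "AGT")            # 1: three matches and one gap
nw_score("ACGT", "AGT", gap = -8)  # -5: same alignment, harsher gap penalty
```

The last two calls score the same pair of sequences; only the gap penalty changes, which shows how a more negative penalty lowers the score of any alignment that needs gaps.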
When aligning proteins, each skipped position opens a gap and incurs the gapOpening penalty. The fewer gaps an alignment contains, the more similar the aligned proteins are.

The score_threshold argument is used to evaluate the similarity values between two proteins in substitutionMatrix. It ranges from 0 to 100; however, commonly used values lie between 25 and 30.

The transfer of interactions among orthologs of different species is called the interolog approach. The BestHit argument selects the proteins with the highest similarity in the all-versus-all protein alignment. If an interaction exists between a pair of proteins belonging to two OPSs, an edge is added between those OPSs in the IPN. The coverage threshold (the coverage argument) specifies the number of species in which such an interaction must exist. It ranges from 1 to the number of species. The higher the coverage threshold, the more robust, and usually smaller, the final IPN will be. The NetworkShrinkage argument determines whether two similar OPSs that have ortholog proteins in common should be merged. If it is TRUE, the resulting IPN will be smaller.

To use this package, we assume that the "IMMAN" package has been properly installed in the R environment. After installation, the "IMMAN" package can be loaded via

```{r}
library(IMMAN)
```

For illustration, we will read two datasets from different species, which can be accessed via:

```{r}
data(Celegance)
data(FruitFly)
```

Then, we have to make a list of the species datasets and set their taxonomy IDs.

```{r}
ProteinLists = list(as.character(Celegance$V1), as.character(FruitFly$V1))

List1_Species_ID = 6239  # taxonomy ID Celegance
List2_Species_ID = 7227  # taxonomy ID FruitFly

Species_IDs = c(List1_Species_ID, List2_Species_ID)
```

To continue, set the parameters to run the analysis. Here is a description of the parameters in IMMAN. If you need more information, you can refer to the paper.
identityU: Cut-off value for selecting proteins whose alignment score is greater than or equal to identityU.

substitutionMatrix: The scoring matrix to be used for the alignment.

gapOpening and gapExtension: Gap penalties used for alignment purposes.

For NetworkShrinkage, coverage, and BestHit, refer to the paper.

STRINGversion: Indicates which version of the STRING database the program should search for the scores of PPIs.

Then, we will set the argument values:

```{r}
identityU = 30
substitutionMatrix = "BLOSUM62"
gapOpening = -8
gapExtension = -8
NetworkShrinkage = FALSE
coverage = 1
BestHit = TRUE
score_threshold = 400
STRINGversion = "10"
```

Finally, we can run the IMMAN function:

```{r}
output = IMMAN(ProteinLists, fileNames = NULL, Species_IDs,
               identityU, substitutionMatrix,
               gapOpening, gapExtension, BestHit,
               coverage, NetworkShrinkage,
               score_threshold, STRINGversion,
               InputDirectory = getwd())
```

To inspect particular parts of the result, you can use:

```{r}
output$IPNEdges
output$IPNNodes
output$Networks
output$Networks[[1]]
output$maps
output$maps[[2]]
```
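To make the best reciprocal hit and coverage ideas described earlier more concrete, here is a small base-R sketch on made-up data. It does not call IMMAN or reproduce its internals: the helper names `reciprocal_best_hits` and `filter_by_coverage`, the score matrix, and the edge labels are all hypothetical and chosen only for illustration.

```{r}
# Toy similarity scores between proteins of species A (rows) and
# species B (columns). All names and numbers are invented.
scores <- matrix(
  c(90, 20, 10,
    30, 85, 40,
    15, 80, 35),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a1", "a2", "a3"), c("b1", "b2", "b3"))
)

# A pair (a, b) is a reciprocal best hit when b is the top-scoring
# partner of a AND a is the top-scoring partner of b.
reciprocal_best_hits <- function(s) {
  best_for_row <- apply(s, 1, which.max)  # best column for each row
  best_for_col <- apply(s, 2, which.max)  # best row for each column
  keep <- best_for_col[best_for_row] == seq_len(nrow(s))
  data.frame(
    A = rownames(s)[keep],
    B = colnames(s)[best_for_row[keep]],
    stringsAsFactors = FALSE
  )
}

rbh <- reciprocal_best_hits(scores)
rbh  # a1-b1 and a2-b2; a3 prefers b2, but b2 prefers a2

# Edges observed in each species-specific network, written as
# hypothetical "OPS-OPS" labels.
edges_by_species <- list(
  sp1 = c("OPS1-OPS2", "OPS2-OPS3"),
  sp2 = c("OPS1-OPS2"),
  sp3 = c("OPS1-OPS2", "OPS2-OPS3", "OPS1-OPS3")
)

# Keep an edge only if at least `coverage` species support it.
filter_by_coverage <- function(edge_sets, coverage) {
  support <- table(unlist(lapply(edge_sets, unique)))
  names(support)[support >= coverage]
}

filter_by_coverage(edges_by_species, coverage = 1)  # all three edges
filter_by_coverage(edges_by_species, coverage = 3)  # only "OPS1-OPS2"
```

Raising `coverage` from 1 to 3 shrinks the edge set from three edges to one, which mirrors the statement that a higher coverage value yields a more robust and usually smaller IPN.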