\name{fpSim}
\alias{fpSim}
\title{
PubChem Fingerprint Search 
}
\description{
Computes structure similarities from PubChem fingerprints for pairwise compound comparisons, similarity searching and clustering.
}
\usage{
fpSim(x, y)
}
\arguments{
  \item{x}{
\code{vector} containing binary fingerprint data. Must have the same length as \code{y} if \code{y} is a \code{vector}, or as each row of \code{y} if it is a \code{matrix}.
}
  \item{y}{
\code{vector} or \code{matrix} containing binary fingerprint data; in a \code{matrix}, each row holds one fingerprint.
}
}
\details{
The function computes Tanimoto coefficients for pairwise comparisons of binary fingerprints. The coefficient is defined as c/(a+b+c), that is, the number of "on-bits" shared by the fingerprints of two compounds divided by the number of "on-bits" in their union. Here c is the number of "on-bits" common to both compounds, while a and b are the numbers of "on-bits" unique to the first and the second compound, respectively. The coefficient is 1 for identical fingerprints and 0 for fingerprints that share no "on-bits".

}
\value{
Returns a \code{numeric} vector with Tanimoto coefficients as values and compound identifiers as names.
}
\references{
Tanimoto similarity coefficient: Tanimoto TT (1957) IBM Internal Report, 17th Nov. See also Jaccard P (1901) Bulletin de la Societe Vaudoise des Sciences Naturelles 37, 241-272.

PubChem fingerprint specification: ftp://ftp.ncbi.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
}
\author{
Thomas Girke
}
\note{
Limitation: PubChem fingerprints must be provided as input, for example from PubChem's SD files.
}

\seealso{
Functions: \code{fp2bit} 
}
\examples{
## Load PubChem SDFset sample
data(sdfsample); sdfset <- sdfsample
cid(sdfset) <- sdfid(sdfset)

## Convert base64-encoded fingerprints to a character vector (type=1) or a binary matrix (type=2);
## the matrix form is the one used in the steps below
fpset <- fp2bit(x=sdfset, type=1)
fpset <- fp2bit(x=sdfset, type=2)

## Pairwise compound structure comparisons
fpSim(x=fpset[1,], y=fpset[2,]) 
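
## Minimal sketch of the formula c/(a+b+c) from the Details section, computed by
## hand on two toy bit vectors. The vectors bx and by and the counters aa, bb, cc
## are hypothetical illustration data, not PubChem fingerprints, and this is not
## necessarily how fpSim is implemented internally.
bx <- c(1, 1, 0, 1, 0, 0)
by <- c(1, 0, 0, 1, 1, 0)
cc <- sum(bx == 1 & by == 1)  # "on-bits" common to both vectors
aa <- sum(bx == 1 & by == 0)  # "on-bits" unique to bx
bb <- sum(bx == 0 & by == 1)  # "on-bits" unique to by
cc / (aa + bb + cc)           # 2 / (1 + 1 + 2) = 0.5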

## Structure similarity searching: x is query and y is fingerprint database  
fpSim(x=fpset[1,], y=fpset) 
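
## Sketch: rank the search hits by similarity, relying on the named numeric
## vector described under Value (simVec is a hypothetical variable name).
## The query compound is itself part of the database searched here.
simVec <- fpSim(x=fpset[1,], y=fpset)
head(sort(simVec, decreasing=TRUE), 5)  # five highest-scoring database entries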

## Compute fingerprint-based Tanimoto similarity matrix 
simMA <- sapply(rownames(fpset), function(x) fpSim(x=fpset[x,], fpset)) 

## Hierarchical clustering; hclust expects distances, so convert
## the similarities in simMA to distances with 1-simMA
hc <- hclust(as.dist(1-simMA), method="single")

## Plot hierarchical clustering tree
plot(as.dendrogram(hc), edgePar=list(col=4, lwd=2), horiz=TRUE) 
}
\keyword{utilities}