\name{cmp.similarity}
\alias{cmp.similarity}
\title{Compute similarity between two compounds using their descriptors}
\description{
	Given descriptors for two compounds, 'cmp.similarity' returns the
	similarity measure between the two compounds.
}
\usage{
cmp.similarity(a, b, mode = 1, worst = 0)
}
\arguments{
  \item{a}{Descriptor of the first compound.}
  \item{b}{Descriptor of the second compound.}
  \item{mode}{Mode used when computing the similarity (0, 1, 2 or 3). See
  Details below.}
  \item{worst}{The lowest similarity value of interest to the caller. If
  'cmp.similarity' finds that the upper bound of the similarity is below
  this value, it returns 0 and potentially saves some computation.}
}
\details{
	'cmp.similarity' uses descriptor information generated by 'cmp.parse'
	and 'cmp.parse1'. A descriptor is a vector of numbers that represents
	the set of structural fragment descriptors of a compound. Similarity
	is measured with the Tanimoto coefficient.

	'cmp.similarity' supports three computation modes, plus an automatic
	setting. In mode 1, the normal Tanimoto coefficient is used. Mode 2
	uses the size of the descriptor intersection over the size of the
	smaller descriptor, mainly to deal with compounds that vary a lot in
	size. Mode 3 is similar to mode 2, except that it raises the
	similarity to the power of 3 to penalize small values. When mode is 0,
	'cmp.similarity' selects mode 1 or mode 3 based on the size difference
	between the two descriptors.

	When 'cmp.similarity' is used to search compounds with a threshold
	similarity value, or in clustering with a cutoff distance, the
	threshold or cutoff can be used to choose a 'worst' value.
	'cmp.similarity' can compute an upper bound of the similarity more
	cheaply than the similarity itself. If this upper bound is below the
	'worst' value, the result would be useless to the caller, so the full
	computation is skipped and 0 is returned.
}
\value{
	A numeric value between 0 and 1 giving the similarity between the two
	compounds.
}
\references{Chen X and Reynolds CH (2002). "Performance of similarity measures
in 2D fragment-based similarity searching: comparison of structural descriptors
and similarity coefficients", in \emph{J Chem Inf Comput Sci}.

	Willett P (1998). "Chemical Similarity Searching", in \emph{J Chem Inf
	Comput Sci}.}
\author{Y. Eddie Cao, Li-Chang Cheng}
\seealso{\code{\link{cmp.parse1}}, \code{\link{cmp.parse}},
	\code{\link{cmp.search}}, \code{\link{cmp.cluster}}}
\examples{
## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom pair sample data set, as provided by the package
data(apset)

## Compute the similarity between two compounds
cmp.similarity(apset[1], apset[2])
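
## Illustrative sketch only (plain R, not the package's internal code):
## modes 1 to 3 from Details, applied to two hypothetical descriptors
## held as numeric vectors and treated as sets. The names
## 'sketch.similarity', 'd1' and 'd2' are made up for this example.
## Mode 0 is left out because the size-difference rule it applies is
## not specified in Details.
sketch.similarity <- function(x, y, mode = 1) {
    x <- unique(x)
    y <- unique(y)
    shared <- length(intersect(x, y))
    if (mode == 1) {
        ## Tanimoto coefficient: shared elements over size of the union
        shared / length(union(x, y))
    } else if (mode == 2) {
        ## Shared elements over the size of the smaller descriptor
        shared / min(length(x), length(y))
    } else if (mode == 3) {
        ## Mode 2 value raised to the power of 3
        (shared / min(length(x), length(y)))^3
    } else {
        stop("this sketch only covers modes 1, 2 and 3")
    }
}
d1 <- c(11, 12, 13, 14)
d2 <- c(13, 14, 15, 16, 17, 18)
sketch.similarity(d1, d2, mode = 1)  # 2 shared / 8 in the union
sketch.similarity(d1, d2, mode = 2)  # 2 shared / 4 in the smaller set
sketch.similarity(d1, d2, mode = 3)  # the mode 2 value cubed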

## Search apset database with a query compound
cmp.search(apset, apset[1], type = 3, cutoff = 0.3)
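
## Illustrative sketch only (plain R, not the package's internal code):
## one way an upper bound can be checked before the full computation,
## as outlined in Details for the 'worst' argument.
## Assumption: the bound min(|x|, |y|) / max(|x|, |y|) on the Tanimoto
## coefficient is used here; the bound used by 'cmp.similarity' may differ.
## 'sketch.bounded', 'small' and 'large' are hypothetical names.
sketch.bounded <- function(x, y, worst = 0) {
    x <- unique(x)
    y <- unique(y)
    sizes <- c(length(x), length(y))
    ## The intersection cannot be larger than the smaller set, and the
    ## union cannot be smaller than the larger set.
    upper <- min(sizes) / max(sizes)
    if (upper < worst) {
        return(0)  # cannot reach 'worst', so skip the set operations
    }
    length(intersect(x, y)) / length(union(x, y))
}
small <- c(1, 2)
large <- 1:10
sketch.bounded(small, large, worst = 0.5)  # bound is 2/10, so 0 is returned
sketch.bounded(small, large, worst = 0.1)  # bound passes, Tanimoto is computed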
}
\keyword{utilities}