\name{cmp.duplicated}
\alias{cmp.duplicated}
\title{Quickly detect compound duplication in a descriptor database}
\description{
    'cmp.duplicated' detects duplicated compounds in a descriptor
        database generated by 'cmp.parse'. Two compounds are considered
        duplicates of each other when their descriptors are identical.
}
\usage{
    cmp.duplicated(db, sort = FALSE, type=1)
}
\arguments{
  \item{db}{The descriptor database, in the format returned by 'cmp.parse'.}
  \item{sort}{Whether to sort the descriptors of each compound before comparison. See details.}
  \item{type}{Return the result as a logical vector (type=1) or as a data frame (type=2).}
}
\details{
    'cmp.duplicated' takes the descriptors in the descriptor database,
    concatenates all descriptors for the same compound into a string, and uses
    this string as the identification of the compound. If two compounds share
    the same identification string, they are considered duplicates of each other.

    'cmp.duplicated' assumes that the database passed in as an argument follows
    the format generated by 'cmp.parse'. That is, 'db' is a list,
    'db$descdb' is a list, and each entry of 'db$descdb' is an array of numeric
    values that gives the descriptors for one compound.

    By default, 'cmp.duplicated' assumes that the descriptors for each compound
    are already sorted; that is, each entry in 'db$descdb' is a sorted array.
    This is true for databases generated by 'cmp.parse'. If you generate the
    database with other tools, you may need to enable sorting with 'sort = TRUE'.
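
    As an illustration only, the idea can be sketched in base R with a
    hypothetical toy list standing in for 'db$descdb'. This is a minimal
    sketch under that assumption, not the package's actual implementation:
    \preformatted{
descdb <- list(c(1, 2, 3), c(4, 5), c(3, 1, 2), c(1, 2, 3))

## Identification string: all descriptors of a compound pasted together
ids <- sapply(descdb, paste, collapse = " ")
duplicated(ids)   # only the 4th entry repeats an earlier string

## Entry 3 holds the same values as entry 1 in a different order, so it
## is only detected once every entry is sorted first
ids.sorted <- sapply(lapply(descdb, sort), paste, collapse = " ")
duplicated(ids.sorted)   # now the 3rd and 4th entries are flagged
    }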
        
}
\value{
    With the default 'type=1', returns a logical array indicating whether each
    compound in the database duplicates a compound that appears earlier in the
    database. For example, if the i-th element of the array is TRUE, the i-th
    compound is a duplicate of a compound listed before it.

    The returned array can be used to remove duplicates: index the descriptor
    database with the negated array to keep only the non-duplicated compounds.

    To find out which compound a duplicate matches, search the database for
    that compound with the cutoff set to 1 (see 'cmp.search').
}
\author{Y. Eddie Cao}
\seealso{\code{\link{cmp.parse}}, \code{\link{cmp.search}}}
\examples{
## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom pair sample data set provided by the library
data(apset) 
db <- apset

## Manually create a duplicate (here compound 10 becomes a copy of compound 1)
db[10] <- db[1]

## Find duplicates and list their compound IDs
dup <- cmp.duplicated(db)
dup
cid(db[dup])
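
## The same check with the optional arguments from the usage section:
## sort = TRUE sorts each compound's descriptors before comparison, and
## type = 2 returns the result as a data frame instead of a vector
dup.sorted <- cmp.duplicated(db, sort = TRUE)
dup.df <- cmp.duplicated(db, type = 2)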

## Remove all duplicates
db <- db[!dup]
}
\keyword{utilities}