\name{cmp.parse}
\alias{cmp.parse}
\title{Parse an SDF file and compute descriptors for all compounds}
\description{
'cmp.parse' takes an SDF file, parses all the compounds
encoded in it, computes their atom-pair descriptors, and returns the descriptors as a
list. The list contains two named elements, 'descdb' and 'cids'. 'descdb' is a vector of
descriptors, and 'cids' is a list of names of the compounds found in the SDF file.
The returned list is usually used as a database, against which similarity
searches can be performed using the 'cmp.search' function. 'cmp.parse'
parses all compounds in the SDF file. To parse a single compound, use
'cmp.parse1' instead.
}
\usage{
cmp.parse(filename, quiet=FALSE, type="normal", dbname="")
}
\arguments{
  \item{filename}{The file name of the SDF file}
  \item{quiet}{Whether to suppress the output of progress information}
  \item{type}{Database type. Use the default value, or set to 'file-backed'
  when the library is large. See below.}
  \item{dbname}{Database name. Only used when the type is set to 'file-backed'.}
}
\details{
	The 'filename' can be a local file or a URL. The function is interactive
	and displays the parsing progress. Because parsing also computes the
	atom-pair descriptors, it is time consuming. At the end of parsing, you
	will be reminded to save the result for future use.

	'type' is either set to the default value 'normal' or 'file-backed'.
	When set to 'file-backed', the parsing work will be delegated to a
	separate package called 'ChemmineRpp', and the database will be stored
	in a file instead of in the primary memory. Therefore, 'file-backed'
	mode can handle larger compound libraries. In 'file-backed' mode,
	'dbname' will be used to name the database file. A suffix '.cdb' will
	be appended to the given name.

	The type of the database is transparent to the other parts of the package.
	For example, calling 'cmp.search' against a database in 'file-backed'
	mode will cause the package to load the descriptors from the database
	file progressively.
}
\value{
	Returns a list that can be used as the database against which similarity
	searches are performed. The 'cmp.search' and 'cmp.cluster' functions both
	expect a database returned by 'cmp.parse'.
  \item{descdb}{A vector containing the descriptors for all the compounds.}
  \item{cids}{Compound ID information found in the SDF file. This is the first
	  line of each compound's record in the SDF file.}
}
\references{Chen X and Reynolds CH (2002). "Performance of similarity measures
in 2D fragment-based similarity searching: comparison of structural descriptors
and similarity coefficients", in \emph{J Chem Inf Comput Sci}.}
\author{Y. Eddie Cao, Li-Chang Cheng}

\seealso{\code{\link{cmp.parse1}}, \code{\link{cmp.search}},
	\code{\link{cmp.cluster}},
\code{\link{cmp.similarity}}}
\examples{
## Load sample SD file
# data(sdfsample); sdfset <- sdfsample

## Generate atom pair descriptor database for searching
# apset <- sdf2ap(sdfset) 

## Load the same atom-pair sample data set provided by the library
data(apset) 
db <- apset
# (optionally) save the db for future use
save(db, file="db.rda", compress=TRUE)
# ...
# later, in a separate session, you can load it back:
load("db.rda")
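
## A minimal sketch of calling cmp.parse directly, following the usage
## shown above. "compounds.sdf" and "mydb" are hypothetical names used only
## for illustration, so this block is not run during checks.
\dontrun{
# Parse every compound in the SDF file and keep the descriptors in memory
db <- cmp.parse("compounds.sdf", quiet=TRUE)
head(db$cids)   # compound names read from the SDF file

# For a large library, store the descriptors in a file-backed database.
# This mode relies on the separate 'ChemmineRpp' package; per the details
# above, the database file is named "mydb.cdb".
db.fb <- cmp.parse("compounds.sdf", type="file-backed", dbname="mydb")
}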
}
\keyword{utilities}