With the improvement of sequencing techniques, chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-Seq) is becoming increasingly popular for studying genome-wide protein-DNA interactions. To address the lack of powerful ChIP-Seq analysis methods, we presented Model-based Analysis of ChIP-Seq (MACS) for identifying transcription factor binding sites. MACS captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. MACS can easily be used on ChIP-Seq data alone, or with a control sample to increase specificity. Moreover, as a general peak caller, MACS can also be applied to any “DNA enrichment assay” if the question to be asked is simply: where can we find significantly more read coverage than the random background?
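The idea of combining tag position and orientation can be illustrated with a toy example. As described in the original MACS paper, each tag's 5' end is shifted by d/2 toward its 3' direction, so plus- and minus-strand tags from the same fragment converge on the binding site. The sketch below assumes the 5' coordinates and the fragment size `d` are already known, and `shift_tags()` is a hypothetical helper written for illustration; it is not MACS3 code.

```r
# Toy sketch of MACS-style tag shifting (hypothetical helper, not MACS3 code).
# pos:    5' end of each tag (for minus-strand tags, the rightmost base)
# strand: "+" or "-"
# d:      estimated fragment size
shift_tags <- function(pos, strand, d) {
  stopifnot(length(pos) == length(strand), all(strand %in% c("+", "-")))
  half <- d %/% 2
  # Plus-strand tags move downstream, minus-strand tags move upstream.
  ifelse(strand == "+", pos + half, pos - half)
}

# Two tags from opposite strands, 300 bp apart before shifting.
shifted <- shift_tags(pos = c(1000, 1300), strand = c("+", "-"), d = 200)
shifted
#> [1] 1100 1200
```

After shifting, the two tags sit 100 bp apart instead of 300, centred on the same position, which is why the shifted pileup gives a sharper summit.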
This package is a wrapper of the MACS toolkit built on basilisk. The dependent Python library macs3 is installed automatically inside its own conda environment.
library(MACSr)
There are 13 functions imported from MACS3. Details of each function can be found in its manual page.
Functions | Description |
---|---|
callpeak | Main MACS3 function to call peaks from alignment results. |
bdgpeakcall | Call peaks from bedGraph output. |
bdgbroadcall | Call broad peaks from bedGraph output. |
bdgcmp | Compare two signal tracks in bedGraph format. |
bdgopt | Operate on the score column of a bedGraph file. |
cmbreps | Combine bedGraphs of scores from replicates. |
bdgdiff | Differential peak detection based on four paired bedGraph files. |
filterdup | Remove duplicate reads, then save in BED/BEDPE format. |
predictd | Predict d, or fragment size, from alignment results. |
pileup | Pile up aligned reads (single-end) or fragments (paired-end). |
randsample | Randomly choose a number/percentage of total reads. |
refinepeak | Take raw read alignments and refine peak summits. |
callvar | Call variants in given peak regions from the alignment BAM files. |
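The fragment size d that predictd reports comes from the offset between plus- and minus-strand tag densities. A toy way to see this offset is a strand cross-correlation: slide the minus-strand profile against the plus-strand profile and take the lag with the highest correlation. This is a simplified technique chosen for illustration and is not MACS3's exact model-building algorithm; `estimate_d()` and the synthetic profiles are hypothetical.

```r
# Toy cross-correlation estimate of the fragment size d
# (hypothetical helper; illustrates the idea behind predictd, not MACS3 code).
# plus, minus: per-base tag counts for each strand over the same region.
estimate_d <- function(plus, minus, max_lag) {
  n <- length(plus)
  stopifnot(length(minus) == n, max_lag < n)
  lags <- 0:max_lag
  cc <- vapply(lags, function(lag) {
    # Compare plus-strand counts at i with minus-strand counts at i + lag.
    suppressWarnings(cor(plus[1:(n - lag)], minus[(1 + lag):n]))
  }, numeric(1))
  lags[which.max(cc)]  # which.max() ignores NA correlations
}

# Synthetic region: plus-strand tags pile up near 100, minus-strand near 300.
plus <- numeric(1000);  plus[95:105]   <- 1
minus <- numeric(1000); minus[295:305] <- 1
d <- estimate_d(plus, minus, max_lag = 250)
d
#> [1] 200
```

Real data are noisy and contain many regions, so MACS first selects enriched regions to build its model; this sketch only shows why the strand offset recovers d.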
callpeak
We have uploaded multiple test datasets from MACS to the data package MACSdata in ExperimentHub. For example, here we download a pair of single-end BED files to run the callpeak function.
eh <- ExperimentHub::ExperimentHub()
eh <- AnnotationHub::query(eh, "MACSdata")
CHIP <- eh[["EH4558"]]
#> see ?MACSdata and browseVignettes('MACSdata') for documentation
#> loading from cache
CTRL <- eh[["EH4563"]]
#> see ?MACSdata and browseVignettes('MACSdata') for documentation
#> loading from cache
Here is an example of calling narrow and broad peaks on the single-end (SE) BED files.
cp1 <- callpeak(CHIP, CTRL, gsize = 5.2e7, store_bdg = TRUE,
name = "run_callpeak_narrow0", outdir = tempdir(),
cutoff_analysis = TRUE)
#> + /home/biocbuild/.cache/R/basilisk/1.14.0/0/bin/conda 'create' '--yes' '--prefix' '/home/biocbuild/.cache/R/basilisk/1.14.0/MACSr/1.10.0/env_macs' 'python=3.10' '--quiet' '-c' 'conda-forge'
#> + /home/biocbuild/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/home/biocbuild/.cache/R/basilisk/1.14.0/MACSr/1.10.0/env_macs' 'python=3.10'
#> + /home/biocbuild/.cache/R/basilisk/1.14.0/0/bin/conda 'install' '--yes' '--prefix' '/home/biocbuild/.cache/R/basilisk/1.14.0/MACSr/1.10.0/env_macs' '-c' 'conda-forge' 'python=3.10' 'python=3.10'
#>
cp2 <- callpeak(CHIP, CTRL, gsize = 5.2e7, store_bdg = TRUE,
name = "run_callpeak_broad", outdir = tempdir(),
broad = TRUE)
#>
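The `gsize` argument above is the effective genome size. Following the description in the original MACS paper, it sets a genome-wide background rate, and a local rate is taken as the maximum of that background and control-based estimates over larger windows (1 kb, 5 kb and 10 kb in the paper); enrichment is then scored with a Poisson upper-tail p-value. The toy sketch below assumes made-up tag counts and window estimates; the helper names are hypothetical and this is not MACS3's implementation.

```r
# Toy sketch of the Poisson significance idea behind MACS (not MACS3 code).
# Expected tags in a d-wide window if tags were spread uniformly over the
# effective genome size (the gsize argument of callpeak).
background_lambda <- function(total_tags, d, gsize) {
  total_tags * d / gsize
}

# Local lambda: the largest of the background and window-based estimates,
# so local biases seen in the control raise the bar for significance.
local_lambda <- function(lambda_bg, window_lambdas) {
  max(lambda_bg, window_lambdas)
}

# Upper-tail Poisson p-value: P(X >= k) for X ~ Poisson(lambda).
poisson_p <- function(k, lambda) {
  ppois(k - 1, lambda, lower.tail = FALSE)
}

lambda_bg <- background_lambda(total_tags = 1e6, d = 200, gsize = 5.2e7)
# Made-up control estimates standing in for the 1 kb / 5 kb / 10 kb windows.
lambda_local <- local_lambda(lambda_bg, window_lambdas = c(2.5, 4.1, 3.2))
p <- poisson_p(k = 20, lambda = lambda_local)
-log10(p)
```

Taking the maximum is a conservative choice: a region only counts as enriched if it beats the strongest of the local background estimates.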
Here are the outputs.
cp1
#> macsList class
#> $outputs:
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_control_lambda.bdg
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_cutoff_analysis.txt
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_model.r
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_peaks.narrowPeak
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_peaks.xls
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_summits.bed
#> /tmp/RtmpG8fc6E/run_callpeak_narrow0_treat_pileup.bdg
#> $arguments: tfile, cfile, gsize, outdir, name, store_bdg, cutoff_analysis
#> $log:
#>
cp2
#> macsList class
#> $outputs:
#> /tmp/RtmpG8fc6E/run_callpeak_broad_control_lambda.bdg
#> /tmp/RtmpG8fc6E/run_callpeak_broad_model.r
#> /tmp/RtmpG8fc6E/run_callpeak_broad_peaks.broadPeak
#> /tmp/RtmpG8fc6E/run_callpeak_broad_peaks.gappedPeak
#> /tmp/RtmpG8fc6E/run_callpeak_broad_peaks.xls
#> /tmp/RtmpG8fc6E/run_callpeak_broad_treat_pileup.bdg
#> $arguments: tfile, cfile, gsize, outdir, name, store_bdg, broad
#> $log:
#>
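The `.narrowPeak` and `.broadPeak` files listed above are tab-separated, BED-like tables. A minimal base-R sketch for reading them, assuming the ENCODE column layouts (narrowPeak is BED6+4, broadPeak is BED6+3) and no header line; `read_peaks()` is a hypothetical helper, not part of MACSr, and the toy rows are invented for illustration rather than taken from MACS output.

```r
# Column layouts follow the ENCODE narrowPeak (BED6+4) and broadPeak (BED6+3)
# specifications; start coordinates are 0-based as in BED.
bed6 <- c("chrom", "start", "end", "name", "score", "strand")
peak_columns <- list(
  narrowPeak = c(bed6, "signalValue", "pValue", "qValue", "peak"),
  broadPeak  = c(bed6, "signalValue", "pValue", "qValue")
)

# Hypothetical helper (not part of MACSr): read a peak file into a data.frame.
read_peaks <- function(path, format = c("narrowPeak", "broadPeak")) {
  format <- match.arg(format)
  cols <- peak_columns[[format]]
  classes <- c("character", "integer", "integer", "character", "integer",
               "character", rep("numeric", length(cols) - 6))
  read.table(path, header = FALSE, sep = "\t", quote = "",
             comment.char = "", col.names = cols, colClasses = classes)
}

# Toy file with two made-up peaks (not MACS output).
toy <- tempfile(fileext = ".narrowPeak")
writeLines(c(
  "chr1\t100\t400\ttoy_peak_1\t50\t.\t4.2\t7.1\t5.0\t150",
  "chr1\t900\t1200\ttoy_peak_2\t30\t.\t3.1\t5.2\t3.0\t120"
), toy)

peaks <- read_peaks(toy, "narrowPeak")
# In narrowPeak, `peak` is the summit offset from `start`.
peaks$summit <- peaks$start + peaks$peak
```

With a real run, the path would be the `_peaks.narrowPeak` entry of `cp1$outputs` or the `_peaks.broadPeak` entry of `cp2$outputs`.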
macsList class
The macsList class is designed to record everything about an execution, including the function, inputs, outputs and logs, for the purpose of reproducibility.
For example, we can check the function and input arguments.
cp1$arguments
#> [[1]]
#> callpeak
#>
#> $tfile
#> CHIP
#>
#> $cfile
#> CTRL
#>
#> $gsize
#> [1] 5.2e+07
#>
#> $outdir
#> tempdir()
#>
#> $name
#> [1] "run_callpeak_narrow0"
#>
#> $store_bdg
#> [1] TRUE
#>
#> $cutoff_analysis
#> [1] TRUE
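The printed arguments look like the pieces of a matched call: the function name first, then argument expressions such as `CHIP` and `tempdir()` kept unevaluated. A minimal base-R sketch of how a wrapper could record its call this way, assuming `match.call()` is used; `run_recorded()` and the `recordedRun` class are hypothetical and this is not MACSr's source.

```r
# Hypothetical wrapper (not MACSr's source) showing how a call can be
# recorded with match.call() so that it can be inspected or replayed later.
run_recorded <- function(x, scale = 1) {
  args <- as.list(match.call())  # function name + unevaluated arguments
  structure(list(outputs = x * scale, arguments = args),
            class = "recordedRun")
}

my_input <- 3
r <- run_recorded(my_input, scale = 2)
r$arguments[[1]]   # the function name, as a symbol
#> run_recorded
r$arguments$x      # the expression `my_input`, not its value
#> my_input

# Rebuild the call from the record and evaluate it again.
replayed <- eval(as.call(r$arguments))
```

Because the arguments are stored unevaluated, replaying a call depends on the referenced objects still existing; in the callpeak record above, `outdir` is stored as `tempdir()` and would resolve to a new temporary directory in a fresh session.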
The paths of all output files are collected.
cp1$outputs
#> [1] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_control_lambda.bdg"
#> [2] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_cutoff_analysis.txt"
#> [3] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_model.r"
#> [4] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_peaks.narrowPeak"
#> [5] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_peaks.xls"
#> [6] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_summits.bed"
#> [7] "/tmp/RtmpG8fc6E/run_callpeak_narrow0_treat_pileup.bdg"
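Since `outputs` is a plain character vector of paths, a specific file can be selected by its suffix rather than by position, which is less fragile if the set of outputs changes (for example, `cutoff_analysis = TRUE` adds an extra file). The sketch below uses the paths printed above as a stand-in for `cp1$outputs`; `pick_output()` is a hypothetical helper, not part of MACSr.

```r
# Stand-in for cp1$outputs (paths copied from the output above; the
# temporary directory will differ in your session).
outputs <- c(
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_control_lambda.bdg",
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_cutoff_analysis.txt",
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_model.r",
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_peaks.narrowPeak",
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_peaks.xls",
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_summits.bed",
  "/tmp/RtmpG8fc6E/run_callpeak_narrow0_treat_pileup.bdg"
)

# Hypothetical helper (not part of MACSr): return the single path ending in `suffix`.
pick_output <- function(paths, suffix) {
  hits <- paths[endsWith(paths, suffix)]
  if (length(hits) != 1L) {
    stop("expected one file ending in '", suffix, "', found ", length(hits))
  }
  hits
}

peak_file   <- pick_output(outputs, "_peaks.narrowPeak")
summit_file <- pick_output(outputs, "_summits.bed")
# Both bedGraph tracks written because store_bdg = TRUE.
bdg_files   <- outputs[endsWith(outputs, ".bdg")]
```

In a live session, replace `outputs` with `cp1$outputs`.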
The log is especially important to check for MACS. Detailed information about the run is written to the log.
cat(cp1$log)
More details about MACS3 can be found at https://macs3-project.github.io/MACS/.
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=en_GB LC_COLLATE=C
#> [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] MACSdata_1.9.0 MACSr_1.10.0 BiocStyle_2.30.0
#>
#> loaded via a namespace (and not attached):
#> [1] KEGGREST_1.42.0 dir.expiry_1.10.0
#> [3] xfun_0.40 bslib_0.5.1
#> [5] Biobase_2.62.0 lattice_0.22-5
#> [7] bitops_1.0-7 vctrs_0.6.4
#> [9] tools_4.3.1 generics_0.1.3
#> [11] stats4_4.3.1 curl_5.1.0
#> [13] parallel_4.3.1 tibble_3.2.1
#> [15] fansi_1.0.5 AnnotationDbi_1.64.0
#> [17] RSQLite_2.3.1 blob_1.2.4
#> [19] pkgconfig_2.0.3 Matrix_1.6-1.1
#> [21] dbplyr_2.3.4 S4Vectors_0.40.0
#> [23] GenomeInfoDbData_1.2.11 lifecycle_1.0.3
#> [25] compiler_4.3.1 Biostrings_2.70.0
#> [27] GenomeInfoDb_1.38.0 httpuv_1.6.12
#> [29] htmltools_0.5.6.1 sass_0.4.7
#> [31] RCurl_1.98-1.12 yaml_2.3.7
#> [33] interactiveDisplayBase_1.40.0 crayon_1.5.2
#> [35] pillar_1.9.0 later_1.3.1
#> [37] jquerylib_0.1.4 ellipsis_0.3.2
#> [39] cachem_1.0.8 mime_0.12
#> [41] ExperimentHub_2.10.0 AnnotationHub_3.10.0
#> [43] basilisk_1.14.0 tidyselect_1.2.0
#> [45] digest_0.6.33 purrr_1.0.2
#> [47] dplyr_1.1.3 bookdown_0.36
#> [49] BiocVersion_3.18.0 fastmap_1.1.1
#> [51] grid_4.3.1 cli_3.6.1
#> [53] magrittr_2.0.3 utf8_1.2.4
#> [55] withr_2.5.1 filelock_1.0.2
#> [57] promises_1.2.1 rappdirs_0.3.3
#> [59] bit64_4.0.5 XVector_0.42.0
#> [61] rmarkdown_2.25 httr_1.4.7
#> [63] bit_4.0.5 reticulate_1.34.0
#> [65] png_0.1-8 memoise_2.0.1
#> [67] shiny_1.7.5.1 evaluate_0.22
#> [69] knitr_1.44 IRanges_2.36.0
#> [71] basilisk.utils_1.14.0 BiocFileCache_2.10.0
#> [73] rlang_1.1.1 Rcpp_1.0.11
#> [75] xtable_1.8-4 glue_1.6.2
#> [77] DBI_1.1.3 BiocManager_1.30.22
#> [79] BiocGenerics_0.48.0 jsonlite_1.8.7
#> [81] R6_2.5.1 zlibbioc_1.48.0