About this document

This document is an R notebook, generated dynamically from the data extracted for the project. It lists all datasets published for the project with basic numbers, figures and a quick summary, and serves as a test case to check that all required data is present and roughly consistent with requirements. All plots and tables are computed from the actual data as provided in the downloads.

To re-execute the document, start an R session, load rmarkdown and render the page with the project ID as a parameter:

render("datasets_report.Rmarkdown", params = list(project_id = "technology.jgit"), output_format="html_document")

This website uses the blogdown R package, which provides a different output_format for the Hugo framework.

This report was generated on 2021-04-25.


All data is retrieved from Alambic, an open-source framework for development data extraction and processing.

This project’s analysis page can be found on the Alambic instance for the Eclipse forge, at

Downloads are composed of gzip’d CSV and JSON files. CSV files always have a header to name the fields, which makes it easy to import in analysis software like R:

data <- read.csv(file='myfile.csv', header=T)
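Since the downloads are gzip-compressed, it may help to note that base R can read them without unpacking them first. The following is a minimal sketch on a made-up file: the name `myfile.csv.gz` and its three columns are assumptions for illustration only. Wrapping the path in `gzfile()` makes the decompression step explicit.

```r
# Hypothetical demo file: write a tiny gzip'd CSV so the example is self-contained.
path <- file.path(tempdir(), "myfile.csv.gz")
con <- gzfile(path, "w")
writeLines(c("date,commits,authors", "2021-04-01,3,2", "2021-04-02,0,0"), con)
close(con)

# gzfile() decompresses on the fly; the header row names the columns.
data <- read.csv(gzfile(path), header = TRUE, stringsAsFactors = FALSE)
str(data)
```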

List of datasets generated for the project:

  • Git
    • Git Commits (CSV) – Full list of commits with id, message, time, author, committer, and added, deleted and modified lines.
    • Git Commits Evol (CSV) – Evolution of number of commits and authors by day.
    • Git Log (TXT) – the raw export of git log.
  • Bugzilla
  • Eclipse Forums
    • Forums Posts (CSV) – list of all forum posts for this project.
    • Forums threads (CSV) – list of all forum threads for this project.
  • Jenkins CI
  • Eclipse PMI
    • PMI Checks (CSV) – list of all checks applied to the Project Management Infrastructure entries for the project.


Git commits

Download: git_commits_evol.csv.gz

data <- read.csv(file=file_git_commits_evol, header=T)

File is git_commits_evol.csv, and has 3 columns for 2427 entries.

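library(xts)
library(dygraphs)

# Index the daily numbers by date
data.xts <- xts(x = data[,c('commits', 'authors')],
                order.by = as.POSIXct(as.character(data[,c('date')]), format="%Y-%m-%d"))

# Fill the days without any commit with zeros
time.min <- index(data.xts[1,])
time.max <- index(data.xts[nrow(data.xts)])
all.dates <- seq(time.min, time.max, by="days")
empty <- xts(order.by = all.dates)
data.xts <- merge(empty, data.xts, all=TRUE)
data.xts[is.na(data.xts)] <- 0

# Cumulative commits, computed after filling so gap days keep the running total
data.xts$commits_sum <- cumsum(data.xts$commits)

p <- dygraph(data.xts[,c('commits')],
        main = paste('Daily commits for ', project_id, sep=''),
        width = 800, height = 250 )
p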
Figure: Daily commits for technology.jgit (x-axis: date, Jan 2010 to Jan 2021).
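The gap-filling step, where days without commits are set to zero before plotting, can also be sketched without xts. This is an illustrative base-R version on made-up data; the column names `date`, `commits` and `authors` are taken from the code in this section and are not checked against the downloaded file.

```r
# Made-up sample with a two-day gap (2021-04-03 and 2021-04-04 are missing).
commits <- data.frame(
  date    = as.Date(c("2021-04-01", "2021-04-02", "2021-04-05")),
  commits = c(3, 1, 2),
  authors = c(2, 1, 1)
)

# Full daily calendar between the first and last observation.
all.dates <- data.frame(date = seq(min(commits$date), max(commits$date), by = "day"))

# Left join on the calendar, then turn the missing days into zeros.
daily <- merge(all.dates, commits, by = "date", all.x = TRUE)
daily$commits[is.na(daily$commits)] <- 0
daily$authors[is.na(daily$authors)] <- 0

# Running total of commits, computed on the gap-free series.
daily$commits_sum <- cumsum(daily$commits)
print(daily)
```

Computing the cumulative sum after the fill keeps the running total flat on days without activity instead of dropping it to zero.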

Git log

Download: git_log.txt.gz

File is git_log.txt, and full log has 113072 lines.
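A line count like the one above can be obtained directly from the compressed file. The sketch below uses a tiny made-up log written to a temporary directory; the file content is invented, and only the default `git log` convention that each commit starts with a `commit <hash>` line is relied upon.

```r
# Hypothetical stand-in for git_log.txt.gz, written to a temp dir.
path <- file.path(tempdir(), "git_log.txt.gz")
con <- gzfile(path, "w")
writeLines(c("commit 1a2b3c", "Author: Jane Doe", "", "    Fix typo"), con)
close(con)

# Read the compressed log and count its lines.
con <- gzfile(path, "r")
log.lines <- readLines(con)
close(con)
n.lines <- length(log.lines)

# Commits can be counted from the lines that open each log entry.
n.commits <- sum(grepl("^commit ", log.lines))
cat(n.lines, "lines,", n.commits, "commit(s)\n")
```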


Bugzilla issues

Download: bugzilla_issues.csv.gz

data <- read.csv(file=file_bz_issues, header=T)

File is bugzilla_issues.csv, and has 17 columns for 1475 issues.

Bugzilla open issues

Download: bugzilla_issues_open.csv.gz

data <- read.csv(file=file_bz_issues_open, header=T)

File is bugzilla_issues_open.csv, and has 17 columns for 485 issues (all open).
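Because this report doubles as a consistency check, one possible test is that the open-issues extract is a subset of the full extract with the same columns. The sketch below runs on made-up data frames; the column names `id` and `status` are hypothetical, since the 17 columns of the real files are not listed here.

```r
# Made-up stand-ins for the two Bugzilla extracts (hypothetical columns).
issues <- data.frame(
  id     = c(101, 102, 103, 104),
  status = c("RESOLVED", "NEW", "ASSIGNED", "CLOSED"),
  stringsAsFactors = FALSE
)
issues.open <- issues[issues$status %in% c("NEW", "ASSIGNED"), ]

# Each entry is TRUE when the corresponding expectation holds.
checks <- c(
  same_columns    = identical(names(issues), names(issues.open)),
  open_is_subset  = all(issues.open$id %in% issues$id),
  open_not_larger = nrow(issues.open) <= nrow(issues)
)
print(checks)
```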

Bugzilla evolution

Download: bugzilla_evol.csv.gz

data <- read.csv(file=file_bz_evol, header=T)

File is bugzilla_evol.csv, and has 3 columns for 1128 weeks.

Let’s try to plot the monthly number of submissions for the project:

Figure: Monthly issues submissions for technology.jgit.
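The evolution file is weekly, so a monthly plot needs an aggregation step first. One way to do it in base R is sketched below on made-up data; the column names `date` and `issues_created` are hypothetical, as the three columns of bugzilla_evol.csv are not named in this report.

```r
# Made-up weekly data (hypothetical column names).
weekly <- data.frame(
  date = as.Date(c("2021-03-01", "2021-03-08", "2021-03-29",
                   "2021-04-05", "2021-04-12")),
  issues_created = c(2, 1, 4, 3, 0)
)

# Label each week with its calendar month, then sum within each month.
weekly$month <- format(weekly$date, "%Y-%m")
monthly <- aggregate(issues_created ~ month, data = weekly, FUN = sum)
print(monthly)
```

Weeks that straddle two months are attributed to the month of their date stamp, which is usually precise enough for a trend plot.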


Bugzilla versions

Download: bugzilla_versions.csv.gz

data <- read.csv(file=file_bz_versions, header=T)

File is bugzilla_versions.csv, and has 2 columns for 72 versions.


Bugzilla components

Download: bugzilla_components.csv.gz

data <- read.csv(file=file_bz_components, header=T)

File is bugzilla_components.csv, and has 2 columns for 1 component.

data.sorted <- data[order(data$Bugs, decreasing = T),]

g <- gvisColumnChart(data.sorted, options=list(title='List of product components', legend="{position: 'none'}", width="automatic", height="300px"))

Eclipse Forums

Forums posts

Download: eclipse_forums_posts.csv.gz

data <- read.csv(file=file_forums_posts, header=T)

File is eclipse_forums_posts.csv, and has 6 columns for 7993 posts. The weekly evolution of the number of posts is computed as follows:

library(xts)
library(dygraphs)

# Index the posts by creation time (stored as seconds since the epoch)
data$created.date <- as.POSIXct(data$created_date, origin="1970-01-01")
posts.xts <- xts(data, order.by = data$created.date)

# Add a weekly grid so that weeks without posts are represented
time.min <- index(posts.xts[1,])
time.max <- index(posts.xts[nrow(posts.xts)])
all.dates <- seq(time.min, time.max, by="weeks")
empty <- xts(order.by = all.dates)
posts.all <- merge(empty, posts.xts$id, all=TRUE)
posts.all[is.na(posts.all)] <- 0

posts.weekly <- apply.weekly(posts.all, FUN = nrow)
names(posts.weekly) <- c("posts")

p <- dygraph(
  data = posts.weekly[-1,],
  main = paste('Weekly forum posts for ', project_id, sep=''),
  width = 800, height = 250 ) %>%
  dyAxis("x", drawGrid = FALSE) %>%
  dySeries("posts", label = "Weekly posts") %>%
  dyOptions(stepPlot = TRUE)
p
Figure: Weekly forum posts for technology.jgit (x-axis: date, Jan 2010 to Jan 2021).
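The weekly bucketing can also be illustrated without xts. The sketch below counts posts per ISO week from epoch timestamps in base R; the timestamps are made up and simply stand in for the `created_date` column used above.

```r
# Made-up epoch timestamps in seconds (one on Sunday 2021-04-11, four in the week after).
created_date <- c(1618165600, 1618237941, 1618410172, 1618417082, 1618475839)
created <- as.POSIXct(created_date, origin = "1970-01-01", tz = "UTC")

# Bucket each post into its ISO year and week, then count posts per bucket.
week <- format(created, "%G-W%V")
posts.weekly <- as.data.frame(table(week), responseName = "posts")
print(posts.weekly)
```

Unlike the grid-based approach, this only lists weeks that contain at least one post, so empty weeks would still have to be added before plotting a continuous series.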

The 10 most recent posts on the forums:

library(xtable)

data$created.date <- as.POSIXct(data$created_date, origin="1970-01-01")
# Build the links before subsetting, while the html_url column is still available
data$subject <- paste('<a href="', data$html_url, '">', data$subject, '</a>', sep='')
posts.table <- head(data[,c('id', 'subject', 'created.date', 'author_id')], 10)
posts.table$created.date <- as.character(posts.table$created.date)
names(posts.table) <- c('ID', 'Subject', 'Post date', 'Post author')

print(
    xtable(posts.table,
        caption = paste('10 most recent posts on', project_id, 'forum.', sep=" "),
        digits=0, align="lllll"), type="html",
    html.table.attributes='class="table table-striped"',
    sanitize.text.function=function(x) { x }
)
10 most recent posts on technology.jgit forum.

| ID | Subject | Post date | Post author |
|---|---|---|---|
| 1840436 | Re: Unable to clone repo in ssh or https in 5.12 | 2021-04-15 08:37:19 | 213855 |
| 1840415 | Re: Unable to clone repo in ssh or https in 5.12 | 2021-04-14 19:38:04 | 231272 |
| 1840414 | Re: Unable to clone repo in ssh or https in 5.12 | 2021-04-14 18:31:40 | 213855 |
| 1840413 | Re: Unable to clone repo in ssh or https in 5.12 | 2021-04-14 18:17:25 | 213855 |
| 1840409 | Re: Unable to clone repo in ssh or https in 5.12 | 2021-04-14 17:12:42 | 231272 |
| 1840406 | Re: Unable to clone repo in ssh or https in 5.12 | 2021-04-14 15:38:20 | 213855 |
| 1840404 | Unable to clone repo in ssh or https in 5.12 | 2021-04-14 14:22:52 | 231272 |
| 1840324 | Re: Overriding Git config file without a file based config | 2021-04-12 13:32:21 | 213855 |
| 1840313 | Re: Overriding Git config file without a file based config | 2021-04-12 10:36:40 | 231934 |
| 1840292 | Re: Warning message in JGit for pull and push | 2021-04-11 19:09:33 | 231450 |

Forums threads

Download: eclipse_forums_threads.csv.gz

data <- read.csv(file=file_forums_threads, header=T)

File is eclipse_forums_threads.csv, and has 8 columns for 1986 threads. A wordcloud with the main words used in threads is presented below.

The 10 most recently active threads on the forums:

library(xtable)

data$last.post.date <- as.POSIXct(data$last_post_date, origin="1970-01-01")
# Build the links before subsetting, while the html_url column is still available
data$subject <- paste('<a href="', data$html_url, '">', data$subject, '</a>', sep='')
threads.table <- head(data[,c('id', 'subject', 'last.post.date', 'last_post_id', 'replies', 'views')], 10)
threads.table$last.post.date <- as.character(threads.table$last.post.date)
# The fourth column holds last_post_id, so it is labelled as an ID
names(threads.table) <- c('ID', 'Subject', 'Last post date', 'Last post ID', 'Replies', 'Views')

print(
    xtable(threads.table,
        caption = paste('10 last active threads on', project_id, 'forum.', sep=" "),
        digits=0, align="lllllll"), type="html",
    html.table.attributes='class="table table-striped"',
    sanitize.text.function=function(x) { x }
)
10 last active threads on technology.jgit forum.

| ID | Subject | Last post date | Last post ID | Replies | Views |
|---|---|---|---|---|---|
| 1107674 | Unable to clone repo in ssh or https in 5.12 | 2021-04-15 08:37:19 | 1840436 | 6 | 265 |
| 1107615 | Overriding Git config file without a file based config | 2021-04-12 13:32:21 | 1840324 | 3 | 187 |
| 1107487 | Warning message in JGit for pull and push | 2021-04-11 19:09:33 | 1840292 | 7 | 571 |
| 1107389 | After upgrade to 2021-03 I cannot log in anymore using ssh | 2021-03-21 12:53:40 | 1839436 | 2 | 406 |
| 1107356 | Unable to tell when a file has changed | 2021-03-19 19:16:43 | 1839393 | 2 | 120 |
| 1107282 | ReceivePack and preReceiveHook | 2021-03-12 13:11:26 | 1839071 | 1 | 85 |
| 1107150 | change view in compare | 2021-03-12 13:38:40 | 1839077 | 4 | 262 |
| 1107138 | Applying personal access tokens | 2021-03-12 13:12:57 | 1839072 | 5 | 236 |
| 1107054 | integrated git console for EGit | 2021-03-02 18:40:54 | 1838641 | 2 | 1042 |
| 1107041 | Clone LFS exception | 2021-02-22 10:33:24 | 1838319 | 0 | 976 |



Jenkins builds

Download: jenkins_builds.csv.gz

data <- read.csv(file=file_jenkins_builds, header=T)

File is jenkins_builds.csv, and has 7 columns for 180 builds.

| ID | Name | Time (ms since epoch) | Result |
|---|---|---|---|
| 342 | jgit #342 | 1.619292e+12 | SUCCESS |
| 341 | jgit #341 | 1.618793e+12 | SUCCESS |
| 340 | jgit #340 | 1.618418e+12 | SUCCESS |
| 339 | jgit #339 | 1.618091e+12 | SUCCESS |
| 338 | jgit #338 | 1.617963e+12 | SUCCESS |
| 337 | jgit #337 | 1.617837e+12 | SUCCESS |
| 336 | jgit #336 | 1.617728e+12 | SUCCESS |
| 335 | jgit #335 | 1.617707e+12 | ABORTED |
| 334 | jgit #334 | 1.617365e+12 | SUCCESS |
| 333 | jgit #333 | 1.617362e+12 | FAILURE |
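The build times above are Jenkins timestamps in milliseconds since the Unix epoch, which is why they print in scientific notation. A minimal sketch of converting them to readable dates follows; the three values are copied from the table and are therefore rounded, and the UTC time zone is an assumption made for display.

```r
# Jenkins timestamps as shown in the table above (milliseconds since the epoch,
# rounded by the scientific notation used in the printed table).
time.ms <- c(1.619292e+12, 1.618793e+12, 1.618418e+12)

# as.POSIXct() expects seconds, so divide by 1000 first.
build.time <- as.POSIXct(time.ms / 1000, origin = "1970-01-01", tz = "UTC")
print(format(build.time, "%Y-%m-%d %H:%M"))
```

The most recent build falls on 2021-04-24, one day before this report was generated.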


Jenkins jobs

Download: jenkins_jobs.csv.gz

data <- read.csv(file=file_jenkins_jobs, header=T)

File is jenkins_jobs.csv, and has 15 columns for 11 jobs.

| Name | Colour | Last build time (ms since epoch) | Health report |
|---|---|---|---|
| jgit | blue | 1.619292e+12 | 100 |
| jgit.bazel | disabled | 1.497561e+12 | 0 |
| jgit.old | disabled | 1.582496e+12 | 100 |
| … | disabled | 0.000000e+00 | 0 |
| stable | UNKNOWN | 0.000000e+00 | 0 |
| test | blue | 1.562684e+12 | 100 |
| … | disabled | 1.552000e+12 | 40 |
| thomas-test | disabled | 1.561285e+12 | 100 |
| webmaster-test | disabled | 1.557489e+12 | 80 |
| webmaster-windows-test | disabled | 1.597747e+12 | 40 |


PMI Checks

Download: eclipse_pmi_checks.csv.gz

data <- read.csv(file=file_pmi_checks, header=T)

File is eclipse_pmi_checks.csv, and has 3 columns for 17 checks.

library(xtable)

checks.table <- head(data[,c('Description', 'Value', 'Results')], 10)

print(
    xtable(checks.table,
        caption = paste('Extract of the 10 first PMI checks for ', 
                        project_id, '.', sep=""),
        digits=0, align="llll"), type="html",
    html.table.attributes='class="table table-striped"',
    sanitize.text.function=function(x) { x }
)
Extract of the 10 first PMI checks for technology.jgit.

| Description | Value | Results |
|---|---|---|
| Checks if the URL can be fetched using a simple get query. | _bug.cgi?product=JGit | OK: Create URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | | OK: Query URL could be successfully fetched. |
| Sends a get request to the given CI URL and looks at the headers in the response (200 404..). Also checks if the URL is really a Hudson instance (through a call to its API). | | OK. Fetched CI URL. OK. CI URL is a Hudson instance. Title is [master] |
| Checks if the Dev ML URL can be fetched using a simple get query. | | OK: Dev ML URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | | OK: Documentation URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | | OK: Download URL could be successfully fetched. |
| Checks if the Forums URL can be fetched using a simple get query. | | OK. Forum [JGit and EGit forum] correctly defined. OK: Forum [JGit and EGit forum] URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for gettingstarted_url. |
| Checks if the Mailing lists URL can be fetched using a simple get query. | | OK. [jgit-build] ML correctly defined with email. OK: [jgit-build] ML URL could be successfully fetched. |
| Checks if the URL can be fetched using a simple get query. | | Failed: no URL defined for plan. |
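A quick way to summarise such checks is to count how many results start with "Failed". The sketch below works on four `Results` strings copied from the extract above; treating the "Failed" prefix as the failure marker is an assumption based on these rows only.

```r
# A few 'Results' strings copied from the extract above.
results <- c(
  "OK: Query URL could be successfully fetched.",
  "OK: Download URL could be successfully fetched.",
  "Failed: no URL defined for gettingstarted_url.",
  "Failed: no URL defined for plan."
)

# A check is counted as failing when its result starts with "Failed".
failed <- grepl("^Failed", results)
print(table(ifelse(failed, "Failed", "OK")))
print(results[failed])
```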