Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results

Building on our Fall 2017 survey, in which the R Consortium asked about opportunities, concerns, and issues facing the R community, the R Consortium conducted a new survey this past month to solicit feedback on using the Linux Foundation (LF) Core Infrastructure Initiative (CII) Best Practices Badge Program for R packages. The R Consortium will use your feedback as the basis for its recommendation on using the CII. Your feedback will also help us and the Linux Foundation evolve the CII with the needs of the R Community, and FLOSS projects in general, in mind.

Introduction

With over 12,000 R packages on CRAN alone, the choice of which package to use for a given task is challenging. While summary descriptions, documentation, download counts and word-of-mouth may help direct selection, a standard assessment of package quality can greatly help identify the suitability of a package for a given need – commercial, academic, or otherwise. Providing the R Community of package users an easily recognized badge indicating the level of quality achievement would make it easier for users to know the quality of a package along several dimensions. In addition, providing R package authors and maintainers a checklist of “best practices” can help guide package development and evolution, as well as help package users know what to look for in a package.

The R Consortium has been exploring the pros and cons of recommending that R package authors, contributors, and maintainers adopt the Linux Foundation (LF) Core Infrastructure Initiative (CII) “best practices” badge. This badge provides a means for Free/Libre and Open Source Software (FLOSS) projects to highlight to what extent package authors follow best software practices, while enabling individuals and enterprises to quickly assess a package’s strengths and weaknesses across a range of dimensions. The CII Best Practices Badge Program is voluntary and based on self-certification, and there is no cost to submit a questionnaire and earn a badge. An easy-to-use web application guides users through the process, even automating some of the steps.

More information on the CII Best Practices Badging Program, including the criteria, is available on GitHub. Project statistics, criteria statistics, and videos are also available. The projects page shows participating projects and supports queries (e.g., you can see projects that have a passing badge).

What did we learn?

Will the CII Best Practices Badge Program provide value to the R Community’s package developers or package users? 90% of survey respondents say ‘yes’, with 77% saying it benefits both developers and users. Perhaps not surprisingly, 95% of respondents had never heard of the CII before, but 74% would be willing to try it. These figures come from 41 respondents, 56% of whom have been developing R packages for 4 years or more, and over 60% of whom have developed two or more packages.

Of the six categories covered by the CII – licensing, documentation, change control, software quality, security, code analysis – over 55% of respondents found all criteria to be somewhat or highly beneficial. Over 80% found documentation and software quality criteria to be somewhat or highly beneficial. The details are provided in the table below.

Table: Expected degree of benefit for each CII criteria category

Using an open-ended question, we asked respondents why the CII is good for the R Community. Here is a summary of the responses. The CII…

helps users discover and select R packages that adhere to software development best practices.
shows R developers through the badge criteria what is possible or desirable for FLOSS, especially if developers do not have a software engineering background.
provides an additional degree of assurance to the user community around package quality, and provides a way for developers to assert more formally that they follow such best practices.
gathers and presents lessons learned from other FLOSS projects so developers don’t need to re-discover them.
creates an incentive to adopt a consistent set of practices throughout the R ecosystem.

While respondents were generally very positive about the use of the CII, concerns did arise:

Does the CII Badge have the “correct” or “best” set of criteria?
Achieving a badge does not necessarily mean a given package is well designed or implemented.
How does the CII help to ensure the validity of self-certification, e.g., through automated tools?
Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.
Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?
Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?
A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.
Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.
Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Suggestions from the respondents on how best to take advantage of the CII Badge Program include:

The CII should be sure to reflect the existing quality criteria provided through CRAN.
Integrate the CII with CRAN or Bioconductor, e.g., display badges on the respective CRAN package pages to give the CII more visibility and help users identify more easily which package to use.
Use the CII to encourage package developers to train themselves in best practices.
Develop an automatic framework that will create/enforce all the criteria whenever possible.
Make the security criterion conditional based on what the package does. If a package never goes outside the R session, does it need a dedicated security expert?
Require that packages implementing a statistical method be backed up by a peer-reviewed article.
Make it easier to recognize which criteria categories passed and by what percentage in a high level visual representation, perhaps incorporated into the badge itself.
“Not use it at all, it creates false impressions and discriminates against good domain packages in disciplines that simply use software rather than seek rewards.”
“Encourage R-Core to adopt these practices for R itself. Also, loosen the approach to LICENSE files on CRAN so as to make compliance easier.”
Keep it simple.
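
One suggestion above asks for a high-level view of which criteria categories passed and by what percentage. A minimal sketch of that calculation follows. The six category names come from this post; the `answers` data, the `pass_percentages` helper, and the text-bar layout are hypothetical and are not part of the CII web application.

```python
from typing import Dict, List

# Hypothetical self-assessment answers: True means a criterion is met.
# Category names are from the post; the answers are invented for illustration.
answers: Dict[str, List[bool]] = {
    "licensing": [True, True],
    "documentation": [True, True, False, True],
    "change control": [True, False],
    "software quality": [True, True, True, False],
    "security": [False, False, True, True],
    "code analysis": [True, False],
}


def pass_percentages(results: Dict[str, List[bool]]) -> Dict[str, int]:
    """Return the rounded percentage of criteria met in each category."""
    return {
        category: round(100 * sum(met) / len(met))
        for category, met in results.items()
        if met  # skip categories with no criteria to avoid dividing by zero
    }


if __name__ == "__main__":
    for category, pct in pass_percentages(answers).items():
        bar = "#" * (pct // 10)  # one mark per 10 percentage points
        print(f"{category:<18}{pct:>4}%  {bar}")
```

A summary like this could sit next to a badge, but whether and how the badge itself would carry it is left open here.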

As you can see, respondents expressed quite a range of sentiment about introducing such a badging program. Some concerns seem to be based on misunderstandings; for example, the badging process does not require a “dedicated security expert,” and there already is some degree of automation in the process. The R Consortium is grateful to the respondents for taking the time to provide their insightful and thoughtful responses. We will continue to work with the CII team to explore addressing the issues raised above, including clarifying misunderstandings where we can do so.

Since we initiated this survey, however, multiple packages have already taken the plunge and tried the CII badge program:

foghorn	R package to summarize CRAN Check Results in the Terminal	https://github.com/fmichonneau/foghorn
osrm	Shortest Paths and Travel Time from OpenStreetMap with R	https://rgeomatic.hypotheses.org/category/osrm
R_Matrix	R package for Sparse and Dense Matrix Classes and Methods A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, …	http://matrix.r-forge.r-project.org
base64enc	R tools for base64 encoding	https://github.com/s-u/base64enc
ggplot2	An implementation of the Grammar of Graphics in R	https://ggplot2.tidyverse.org
covr	Test coverage reports for R	https://github.com/r-lib/covr
datastructures	Implementation of core data structures for R.	https://dirmeier.github.io/datastructures
madrid.air	R package to parse air quality data published by http://datos.madrid.es/.	https://github.com/nramon/madrid.air
pandas	Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical…	http://pandas.pydata.org
An R Package for Quick Uncertainty Intervals	ciTools is an R package that makes working with model uncertainty as easy as possible. It gives the user easy access to confidence or prediction intervals…	https://github.com/jthaman/ciTools
dodgr	Distances on Directed Graphs in R	https://ATFutures.github.io/dodgr
netReg	Network-penalized generalized linear models in R and C++.	https://dirmeier.github.io/netReg
DBI	A database interface (DBI) definition for communication between R and RDBMSs	http://dbi.r-dbi.org

If you’re a package developer, we hope you’ll join the package developers above and start your own CII Best Practices Badge. The survey will remain open to collect your feedback on the experience.
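
Once a project has a badge entry, the badge is typically displayed in the package README. The sketch below builds a Markdown badge link from a numeric project ID. The site URL and the `/projects/<id>/badge` path are assumptions about the badge site's URL layout rather than details taken from this post, and `badge_markdown` and the ID `12345` are hypothetical.

```python
# Base URL of the badge site (an assumption, not stated in the post).
BADGE_SITE = "https://bestpractices.coreinfrastructure.org"


def badge_markdown(project_id: int) -> str:
    """Return a Markdown image link pointing at a project's badge entry."""
    if project_id <= 0:
        raise ValueError("project_id must be a positive integer")
    project_url = f"{BADGE_SITE}/projects/{project_id}"
    # The image is the badge; clicking it opens the project's badge entry.
    return f"[![CII Best Practices]({project_url}/badge)]({project_url})"


if __name__ == "__main__":
    # 12345 is a placeholder ID, not a real project.
    print(badge_markdown(12345))
```

The printed line can be pasted near the top of a README.md, alongside any existing build or coverage badges.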