All Posts By

Mark Hornick

CII Best Practices Badge for R Packages – responding to concerns

By | Blog, R Consortium Project

Our last post Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results summarized results from the CII Best Practices survey conducted this summer. A goal of the CII Best Practices program is to help improve open source software quality. Respondents shared several concerns to which David Wheeler, project lead for the Core Infrastructure Initiative (CII) with the Linux Foundation, and I wanted to respond.

Let’s dive in…

Concern #1: Does the CII Badge have the “correct” or “best” set of criteria?

The CII Badge criteria are the best general-purpose OSS project criteria that we, the OSS community, have developed to date.  The CII Badge criteria were developed based on the experience and recommendations of many experts, previous criteria developed by various organizations, and the examination of real-world successful OSS projects.  No doubt the criteria could be improved further, but the badge criteria are themselves open source and can be improved using the same process as any OSS code: simply propose changes for review!

Concern #2: Achieving a badge does not necessarily mean a given package is well-designed or well-implemented.

Many of the CII criteria can help push projects towards creating better- or well-designed and implemented packages.  In the “passing” level, the CII criteria include these requirements:

  • [warnings] criterion requires enabling compiler warning flags or similar
  • [static_analysis] requires the use of at least one static code analysis tool (assuming one exists)
  • [test] requires a test suite, which often nudges people towards better design and implementation
  • [test_policy] requires that you keep adding tests, especially as new functionality comes online
  • [know_secure_design] requires at least one primary developer know how to design secure software. The best practices site explains what this means under the “details” of this criterion.  In summary, this criterion requires that at least one primary developer understands the 8 principles of Saltzer and Schroeder (as explained by the CII Best Practices site) and also knowing to (1) limit the attack surface and (2) perform input validation with whitelists.  Software can be badly designed by knowledgeable people, but software is much more likely to be designed and implemented well if developers know the basics.
  • [know_common_errors] requires that at least one of the project’s primary developers must know of common kinds of errors that lead to vulnerabilities and at least one method to counter or mitigate each of them

Higher badge levels (“silver” and “gold”) offer even more.

It’s true that a badge doesn’t guarantee that a package is well-designed or implemented by some measure, but part of the problem is that it’s difficult to unambiguously determine if something is well-designed or well-implemented.  Much depends on the purpose of the package!  So instead, many criteria focus on enabling mass peer review and managing improvements, so that problems are more likely to be detected and corrected.

In short, software normally undergoes change over time.  Instead of requiring that a project be perfect at one point in time, we focus on criteria that will help projects continuously improve over the long run.

Concern #3: How does the CII help to ensure the validity of self-certification, e.g., through automated tools?

We use automated tools and reject some answers that are clearly false.  We require that replies be public and that there be URLs for some answers; that makes it easy for anyone to check answers.  In the worst case, we can override false answers, though in practice we’ve almost never found that necessary.

Concern #4: Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.

Finding a desired package or the “best” one for a given task is largely orthogonal to improving package quality, though the two can be related. The badging process can help, because one of the criteria is “The project website MUST succinctly describe what the software does (what problem does it solve?)”.  Search engines are much more effective at finding relevant packages once that kind of information is available. In another way, if using packages that state adherence to the CII criteria is important to you or your organization, the search space may be significantly reduced – at least as a starting point.

Concern #5: Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?

Our current primary approach for streamlining is to automate criteria.  That said, if you have a specific idea for streamlining things further, please file an issue on GitHub here.

Concern #6: Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?

We already use automated tools to assist in completing the form.  We’d rather not require people to install tools to fill in information, because that would be a barrier for some.  If there are tools we aren’t using and should use, let us know!

Concern #7: A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.

We’ve worked hard to make the badge “passing” criteria doable for single-person projects.  Daniel Stenberg is the author and maintainer of cURL and libcurl, and he’s been especially influential in ensuring that the “passing” badge is doable for single-person projects.  If you have no tests, cannot automatically build your software (even though it requires building), or have never run a static analysis tool of any kind, then there is some work… but it’s better for users if these are addressed.

The top “gold” level requires multiple people in a project, e.g., because the project MUST have a “bus factor” of 2 or more.  That can be a challenge for developers, but it’s a big advantage for users – users would much rather depend on software where a single death doesn’t suddenly mean that there’s no one to update the software.  No one is required to get the gold level, however, and there are many ways to resolve this.

Concern #8: Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.

We’ve done our best to minimize the risk from additional burdens.  We automate some answers, and that helps.  We reduce the risk of duplicated evaluation processes by having a single set of criteria for all OSS.  Perhaps most importantly: the criteria were developed by examining real-world successful projects, so they require actions that other projects are already doing and finding helpful.

Perhaps more importantly, keep in mind that getting a CII best practices badge is optional – a package author can decide if the benefits of adhering to the CII criteria outweigh the costs.

Concern #9: Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Sure.  Naming conventions for tests are a common way to distinguish types of tests; you can also put different kinds of tests in different directories.  From the badge perspective, we don’t focus on that distinction. For “passing” the key is that your project must have a general policy that as major new functionality is added to the software produced by the project, tests of that functionality should be added to an automated test suite.  Passing doesn’t require a perfect test suite; instead, we require that you have a test suite and that you’re committed to improving it. Since OSS is visible to the user community, a potential user may want to examine the type and quality of tests performed.  The higher-level badges do require better test suites, as you might expect.

 

We continue to receive valuable comments through the survey and are pleased to report that more R package authors are choosing to participate in the CII as evidenced by the surge in new R CII project entries.

Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results

By | Blog, R Consortium Project

Core Infrastructure Initiative Best Practices logoBased on our Fall 2017 survey, where the R Consortium asked about opportunities, concerns, and issues facing the R community, the R Consortium conducted a new survey this past month to solicit feedback on using the Linux Foundation (LF) Core Infrastructure Initiative (CII) Best Practices Badge Program for R packages. With your feedback, the R Consortium will base its recommendation for using the CII.  Your feedback will also help us and the Linux Foundation evolve the CII with the needs of the R Community, and FLOSS projects in general, in mind.

Introduction

With over 12,000 R packages on CRAN alone, the choice of which package to use for a given task is challenging. While summary descriptions, documentation, download counts and word-of-mouth may help direct selection, a standard assessment of package quality can greatly help identify the suitability of a package for a given need – commercial, academic, or otherwise. Providing the R Community of package users an easily recognized badge indicating the level of quality achievement would make it easier for users to know the quality of a package along several dimensions. In addition, providing R package authors and maintainers a checklist of “best practices” can help guide package development and evolution, as well as help package users know what to look for in a package.

The R Consortium has been exploring the pros and cons of recommending that R package authors, contributors, and maintainers adopt the Linux Foundation (LF) Core Infrastructure Initiative (CII) “best practices” badge. This badge provides a means for Free/Libre and Open Source Software (FLOSS) projects to highlight to what extent package authors follow best software practices, while enabling individuals and enterprises to assess quickly a package’s strengths and weaknesses across a range of dimensions. The CII Best Practices Badge Program is a voluntary, self-certification, at no cost to submit a questionnaire and earn a badge. An easy to use web application guides users in the process, even automating some of the steps.

More information on the CII Best Practices Badging Program is available: criteria, is available on GitHubProject statisticscriteria statistics., and videos. The projects page shows participating projects and supports queries (e.g., you can see projects that have a passing badge).

What did we learn?

Will the CII Best Practices Badge Program provide value to the R Community’s package developers or package users? 90% of survey respondents say ‘yes’ with 77% saying it has benefit for both developers and users. Perhaps not surprisingly, 95% of respondents had never heard of the CII before, but 74% would be willing to try it. This is according to 41 respondents, 56% of whom have been developing R packages 4 years or more, and over 60% who have developed two or more packages.

Of the six categories covered by the CII – licensing, documentation, change control, software quality, security, code analysis – over 55% of respondents found all criteria to be somewhat or highly beneficial. Over 80% found documentation and software quality criteria to be somewhat or highly beneficial. The details are provided in the table below.

Table: Expected degree of benefit for each CII criteria category

Using an open ended question, we asked respondents why the CII is good for the R Community? Here is a summary of the responses. The CII…

  • helps users discover and select R packages that adhere to software development best practices.
  • shows R developers through the badge criteria what is possible or desirable for FLOSS, especially if developers do not have a software engineering background.
  • provides an additional degree of assurance to the user community around package quality as well as provide a way for developers to assert more formally that they follow such best practices.
  • gathers and presents lessons learned from other FLOSS projects so developers don’t need to re-discover them.
  • creates an incentive to adopt a consistent set of practices throughout the R ecosystem.

While respondents were generally very positive about the use of the CII, concerns did arise:

  • Does the CII Badge have the “correct” or “best” set of criteria?
  • Achieving a badge does not necessarily mean a given package well designed or implemented.
  • How does the CII help to ensure the validity of self-certification, e.g., through automated tools?
  • Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.
  • Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?
  • Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?
  • A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.
  • Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.
  • Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Suggestions from the respondents on how best to take advantage of the CII Badge Program include:

  • The CII should be sure to reflect the existing quality criteria provided through CRAN.
  • Integrate the CII with CRAN or Bioconductor, e.g., display badges on respective package CRAN pages to give CII more visibility and so that users can identify more easily which package to use.
  • Use the CII to encourage package developers to train themselves in best practices.
  • Develop an automatic framework that will create/enforce all the criteria whenever possible.
  • Make the security criterion conditional based on what the package does. If a package never goes outside the R session, does it need a dedicated security expert?
  • Require packages implementing a statistical method be backed up by a peer-reviewed article.
  • Make it easier to recognize which criteria categories passed and by what percentage in a high level visual representation, perhaps incorporated into the badge itself.
  • “Not use it at all, it creates false impressions and discriminates against good domain packages in disciplines that simply use software rather than seek rewards.”
  • “Encourage R-Core to adopt these practices for R itself. Also, loosen the approach to LICENSE files on CRAN so as to make compliance easier.”
  • Keep it simple.

As you can see, there is quite a range of sentiment expressed regarding introducing such a badging program. Some concerns seem to be based on misunderstandings, for example, the badging process does not require a “dedicated security expert,” and there already is some degree of automation in the process. The R Consortium is grateful to the respondents for taking the time to provide their insightful and thoughtful responses. We will continue to work with the CII team to explore addressing the issues raised above, including clarifying misunderstandings where we can do so.

Since initiating this survey, however, multiple package have already taken the plunge to try the CII badge program:

foghorn R package to summarize CRAN Check Results in the Terminal https://github.com/fmichonneau/foghorn
osrm Shortest Paths and Travel Time from OpenStreetMap with R https://rgeomatic.hypotheses.org/category/osrm
R_Matrix R package for Sparse and Dense Matrix Classes and Methods

A rich hierarchy of matrix classes, including triangular, symmetric, and diagonal matrices, …

http://matrix.r-forge.r-project.org
base64enc R tools for base64 encoding https://github.com/s-u/base64enc
ggplot2 An implementation of the Grammar of Graphics in R https://ggplot2.tidyverse.org
covr Test coverage reports for R https://github.com/r-lib/covr
datastructures Implementation of core data structures for R. https://dirmeier.github.io/datastructures
madrid.air R package to parse air quality data published by http://datos.madrid.es/. https://github.com/nramon/madrid.air
pandas Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical… http://pandas.pydata.org
An R Package for Quick Uncertainty Intervals ciTools is an R package that makes working with model uncertainty as easy as possible. It gives the user easy access to confidence or prediction intervals… https://github.com/jthaman/ciTools
dodgr Distances on Directed Graphs in R https://ATFutures.github.io/dodgr
netReg Network-penalized generalized linear models in R and C++. https://dirmeier.github.io/netReg
DBI A database interface (DBI) definition for communication between R and RDBMSs http://dbi.r-dbi.org

 

If you’re a package developer, we hope you’ll join the package developers above and start your own CII Best Practices Badge. The survey will remain open to collect your feedback on the experience.