CII Best Practices – R Package Leaderboard

By | Blog

Since my last post on the Core Infrastructure Initiative (CII) Best Practices Badge for R Packages – responding to concerns, many R language projects have been started, and completed, on the CII Best Practices site. In this post, we recognize the R projects that have achieved the CII Best Practices passing level, and note that several are well on their way to achieving the silver level. In all, there are more than 50 CII projects related to R packages, with the popular ggplot2 package on the cusp of joining the group below at 97% completion as of this post.

Please congratulate these package owners for their achievement. If you’re a package developer, consider adding your package to the CII Best Practices ranks, and work your way through the levels of passing, silver, and gold.

Id | Name | Description | Owner
--- | --- | --- | ---
265 | madrid.air | Parse air quality data published by http://datos.madrid.es/ | Ramón Novoa
1882 | DBI | A database interface (DBI) definition for communication between R and RDBMSs | Kirill Müller
2011 | Delaporte | Provides the probability mass, distribution, quantile, random variate generation, and method of moments parameter estimation | Avraham Adler
2022 | lamw | Calculates the real-valued branches of the Lambert-W function | Avraham Adler
2033 | pade | Returns the numerator and denominator when given a vector of Taylor series coefficients of sufficient length as input | Avraham Adler
2041 | fixedWidth | Save fixed width files | Jeston
2053 | DataExplorer | Simplified Exploratory Data Analysis | Boxuan Cui
2054 | PKNCA | Perform all noncompartmental analysis (NCA) calculations for pharmacokinetic (PK) data | Bill Denney
2055 | BAS | Bayesian Variable Selection and Model Averaging using Bayesian Adaptive Sampling | Merlise Clyde
2083 | MortalityLaws | Fit and compare the most popular human mortality laws | Marius D. Pascariu
2135 | drake | A general-purpose workflow manager for data-driven tasks in R that rebuilds intermediate data objects when their dependencies | Will Landau
2136 | httptest | A Test Environment for HTTP Requests in R | Neal Richardson
468 | busdater | Business dates for R | Mick Mioduszewski
2527 | jtools | Summarize and visualize regressions with other helpful tools | Jacob Long

Package licensing and enterprise use

By | Blog

For enterprise users of R, licensing terms of open source software can occupy a significant share of Legal and Corporate Architecture departments’ time. In Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results, one survey topic touched on the licensing of R packages. In talking with various enterprise users of R, there were a few suggestions about how the R community could make leveraging R packages easier within enterprises, while allowing Legal and Corporate Architecture departments to get more sleep.

Getting approvals to use packages

Some of you may be familiar with the process that enterprise users of R packages go through for approvals to use R in their products. Third-party software often needs to go through legal reviews, corporate architectural reviews, security reviews, and line-of-business approvals before it can find its way into use within an enterprise or in the products that the enterprise produces.

One area of concern is the use of GPL licenses, and the potential impact they may have on proprietary software. See Why GPL still gives enterprises the jitters for more discussion. While there are varying debates about the true impact of a certain license designation, for example, GPL-2 versus GPL-3, in many large organizations, a more conservative interpretation is often applied. (Comparing license options.)

Perhaps less well known is that it’s not just the license of the package in question that matters, but also the licenses of all of its dependent packages, recursively. For example, is a GPL-3 licensed package that uses a GPL-2 licensed package validly designated?

What can we do?

When we ask representatives of enterprises who are responsible for approving the use of third-party software what would make their job easier, a few suggestions for package authors and maintainers arise concerning licensing:

  • Packages should not depend on other packages that have incompatibly licensed materials
  • Use the most permissive license possible for your package, for example, LGPL, GPL-3 or GPL>=2, as opposed to just GPL-2
  • Minimize the number of dependent packages whenever possible, since each one requires its own approval process which affects adoption
  • Avoid using packages with more restrictive licensing terms than you intend for your package

We encourage package authors and maintainers to review their dependent packages and look for opportunities to address the suggestions above. Where possible, encourage dependent package authors and maintainers to adopt more permissive licenses as well. Where not possible, ask whether the functionality provided by the dependent package is essential.
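As a rough illustration of what such a review involves, the sketch below walks a dependency graph recursively and lists every transitive dependency whose declared license is not on a pre-approved list. Everything in it is an assumption made for illustration: the package names, the `DEPENDS` and `LICENSE` tables, and the `approved` set are hypothetical, and the code only reports declared license strings. It does not decide whether two licenses are legally compatible, which remains a question for a Legal department.

```python
from collections import deque

# Hypothetical data: package -> direct dependencies, and package -> declared license.
DEPENDS = {
    "mypkg": ["pkgA", "pkgB"],
    "pkgA": ["pkgC"],
    "pkgB": [],
    "pkgC": [],
}
LICENSE = {
    "mypkg": "GPL-3",
    "pkgA": "MIT",
    "pkgB": "GPL-2",
    "pkgC": "LGPL-3",
}


def transitive_dependencies(package, depends):
    """Return, sorted, every package reachable from `package` via dependency edges."""
    seen = set()
    queue = deque(depends.get(package, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also guards against dependency cycles
        seen.add(dep)
        queue.extend(depends.get(dep, []))
    return sorted(seen)


def licenses_to_review(package, depends, licenses, approved):
    """Map each transitive dependency with a non-approved license to that license."""
    return {
        dep: licenses.get(dep, "UNKNOWN")
        for dep in transitive_dependencies(package, depends)
        if licenses.get(dep, "UNKNOWN") not in approved
    }


# The approved set is a made-up example of what a Legal department might allow.
flagged = licenses_to_review(
    "mypkg", DEPENDS, LICENSE, approved={"MIT", "LGPL-3", "GPL-3"}
)
print(flagged)
```

In this toy graph only `pkgB` is flagged, which shows why a single dependency deep in the tree can hold up approval of the whole package.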

For enterprise users of open source software, ask your Legal departments to share their concerns with developers so more informed choices can be made in the future.

2019 Update One: R Consortium and ISC Announce the Newest Funded Projects for the R Community

By | Announcement, Blog

We are excited to announce a wide and diverse group of new R Consortium funded projects. If you are interested in finding out more about these projects, connect with the project owners via links provided below each project. 

New Projects include:

Strengthening of R in support of spatial data infrastructures management

Project Owner: Emmanuel Blondel

The project aims to strengthen the role of R in support of Spatial Data Infrastructures (SDI) management, through major enhancements of the geometa R package which offers tools for reading and writing ISO/OGC geographic metadata, including ISO 19115, 19110, and 19119 through the ISO 19139 XML format. This also extends to the Geography Markup Language (GML – ISO 19136) used for describing geographic data. The use of geometa in combination with publication tools such as ows4R and geosapi fosters the use of R software to ease the management and publication of metadata documents and related datasets in web catalogues, and makes it possible to move forward with a real R implementation of spatial data management plans based on FAIR (Findable, Accessible, Interoperable and Reusable) principles.

The work plan includes several activities such as working on the completeness of the ISO 19115 (ISO 19115-1 and 19115-2) data model in geometa, functions to read/write multilingual metadata documents, and an increased metadata validation capability with a validator targeting the EU INSPIRE directive. Finally, functions will be made available to convert between geometa ISO/OGC metadata objects and other known metadata objects such as NetCDF-CF and EML (Ecological Metadata Language) to foster metadata interoperability. By providing these R tools, we seek to facilitate the work of spatial data (GIS) managers, but also data scientists, whatever the thematic domain, whose daily tasks consist of handling data, describing them with metadata and publishing datasets.

Learn more about the project here

Catalyzing R-hub Adoption Through R Package Developer Advocacy

Project Owner: Maëlle Salmon

After the continuing technical progress of R-hub over the last two years, this project aims at catalyzing its adoption by R package developers of all levels through developer advocacy. Indeed, R-hub is currently a successful and very valuable project, but it is not documented thoroughly, which hinders its wider adoption by package developers. This project shall answer this concern by three main actions: improving R-hub documentation, making R-hub better known in the community and making the R-hub web site more attractive to, and easier to use by, R developers and users via the ingestion of METACRAN services and the creation of an R-hub blog.

Learn more about the project here

 

Licensing R – Guidelines and Tools

Project Owner: Colin Fay

Licensing is a vital part of Open Source. It provides guidelines for interacting with a program, and for making code accessible and reusable (or not). It provides a way to make code open source, in a way one wants to share it, protecting how it will be used and reused. Licensing is also challenging and complex: there are a lot of available licenses, and the choice is influenced by how you import and interact with elements from other packages and/or programs.

With this project, we propose to explore and document the current state of open source licenses in R, and to decipher compatibility and incompatibility elements inside these licenses, to help developers choose the best suited license for their project.

Learn more about the project here and here.

 

Data-Driven Discovery and Tracking of  R Consortium Activities

Project Owner: Benaiah Chibuokem Ubah

This project proposes an infrastructure that provides a data-driven approach to render the yearly activities of the R Consortium, by deploying web pages for discovering and tracking ISC Funded Projects, RUGS and Marketing activities. These pages are planned to appear like dashboards summarizing activities in interactive tables and charts, presenting several views, trends and insights into what R Consortium has achieved over time. The project hopes that presenting these achievements in a data-driven manner to the R community, the data science community and prospective R Consortium members will promote greater transparency, productivity and community inclusiveness around R Consortium activities.

Learn more about the project here.

 

serveRless

Project Owners: Christoph Bodner, Florian Schwendinger, Thomas Laber

R is a great language for rapid prototyping and experimentation, but putting an R model in production is still more complex and time-consuming than it needs to be. With the growing popularity of serverless computing frameworks such as AWS Lambda and Azure Functions we see a huge chance to allow R developers to more easily deploy their code into production. We want to create an R package that provides a common API for different Function-as-a-Service providers such as Azure Functions and AWS Lambda.  We will also look into integrating Docker-as-a-Service (e.g. Azure Container Services) if appropriate. Our main goal is to build a user-friendly cloud agnostic wrapper that can be extended to include additional cloud providers later on. We want to build on the work already done for deploying R functions to AWS Lambda by Philipp Schirmer and on the work already done by Neal Fultz and Gergely Daróczi on a gRPC client/server for R, which is necessary for Azure Functions.

If you like our idea and want to help us, feel free to reach out to us on GitHub here.

 

Next-Generation Text Layout in Grid and ggplot2

Project Owner: Claus Wilke

Text is a key component of any data visualization. We need to label axes and legends, we need to annotate or highlight specific data points, and we need to provide plot titles and captions. The R graphics package ggplot2 provides numerous features to customize the labeling and annotation of plots, but ultimately it is limited by the current capabilities of the underlying graphics library it uses, grid. Grid can draw simple text strings or mathematical expressions (via plotmath) in different colors, sizes, and fonts. However, it lacks functionality for changing formatting within a string (e.g., draw a single word in italics or in a different color), and it also cannot draw text boxes, where the text is enclosed in a box with defined margins, padding, or background color. This project will support the development of a new package, gridtext, that will alleviate these text formatting limitations. The project will also support efforts to make these new capabilities available from within ggplot2.

Learn more about the project here

 

Symbolic Formulae for Linear Mixed Models

Project Owner: Emi Tanaka

Symbolic model formulae define the structural component of a statistical model in easier and often more accessible terms for practitioners. An early instance of symbolic model formulae for linear models was applied in Genstat with further generalization by Wilkinson and Rogers (1973). Chambers and Hastie (1993) describe the symbolic model formulae implementation for linear models in the S language which remains much the same in the R language (Venables et al. 2018).

Linear mixed models (LMMs) are widely used across many disciplines (e.g. ecology, psychology, agriculture, finance etc.) due to their flexibility to model complex, correlated structures in the data. While the symbolic formulae of linear models generally have a consistent representation and evaluation rule as implemented in stats::formula, this is not the case for LMMs. The inconsistency of symbolic formulae arises mainly in the representation of random effects, with the additional need to specify the variance-covariance structure of the random effects as well as the structure of the associated model matrix that governs how the random effects are mapped to (groups of) the observational units. The differences give rise to confusion about equivalent model specifications across different R packages.

The lack of consistency in symbolic formula and model representation across mixed model software motivates the need to formulate unified symbolic model formulae for LMMs with: (1) extension of the evaluation rules described in Wilkinson and Rogers (1973); and (2) ease of comprehension of the specified model for the user. Such symbolic model formulae can be a basis for creating a common API to mixed models with wrappers to popular mixed model R packages, thereby achieving a similar feat to the parsnip R package (Kuhn 2018), which implements a tidy unified interface to many predictive modeling functions (e.g. random forest, logistic regression, survival models etc.).

We would like to hear about your experiences with fitting linear mixed models in R! Please fill out this survey to help us understand your problems.

Learn more about the project here

 

Editorial Assistance for the R Journal

Project Owner: Di Cook  

This project supports the operation of the R Journal. There are two aspects: the first is to fund an editorial assistant to send reminders about reviews and assist with typesetting and copyediting issues; the second is to explore updating the technical operations of the journal production.

Learn more about the project here

ISC Call for Proposals

By | Announcement, Blog

The March 2019 ISC Call for Proposals is now open. Once again, we are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community.  

Our goal in calling for proposals is to stimulate creativity and help turn good ideas into tangible benefits for the R Community. What can you do to improve the R ecosystem and how can the R Consortium help you do it?

We encourage you to “Think Big” but structure your proposal with intermediate milestones. The ISC is most likely to fund proposals that ask for modest initial grants. We tend to be conservative with initial grants, preferring projects structured in such a way that significant early milestones can be achieved with a modest amount of financial support.

As with any proposed project, the more detailed and credible the project plan and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes the following:

  • Measurable objectives
  • Intermediate milestones
  • A list of all team members who will be contributing work
  • Detailed accounting of how the grant money will be spent

You may find that reviewing some previously funded projects will help stimulate your thinking. Notice that not all projects require software development. The Guide to using Census Data and the Missing Data Task View are work products from recent ISC funded projects that focused on documentation.  

If you are really thinking big, consider proposing an ambitious project such as the R Validation Hub, or the R / Pharma and R / Medicine conferences that are funded and organized as ISC working groups.

Please note that proposals to sponsor conferences, workshops or meetups should be sent directly to the R Consortium’s R User Group and Small Conference Support Program, or the R Consortium Marketing Committee.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained pdf using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, April 1, 2019.

RC RUGS 2019 Is Up and Running

By | Announcement, Blog, Uncategorized

The R Consortium’s 2019 R User Group and Small Conference Support program, which provides cash grants to R-focused user groups and small R-themed conferences, is now accepting applications for financial support.

R User Groups

Grants to R user groups are awarded in three categories that depend on the number of users who typically attend meetings, and the frequency with which the group meets.

Array Level: Large, established R user groups that held at least three meetings in the six month period prior to applying that attracted more than 100 attendees may be eligible for $1,000 grants.

Matrix Level: R user groups that held at least three meetings in the six month period prior to applying that attracted at least 50 attendees may be eligible for $500 grants.

Vector Level: Other groups, even very small ones just getting started, may be eligible for $150  grants.
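The three levels above can be read as a simple decision rule, sketched below. The function name `grant_level` and its parameters are hypothetical, and the sketch assumes that the attendance thresholds refer to the number of people who typically attend a meeting. It only mirrors the published thresholds; groups "may be eligible", so the result is not a guarantee of funding.

```python
def grant_level(meetings_last_six_months, typical_attendance):
    """Return (level, grant in USD) suggested by the published thresholds.

    Assumption: `typical_attendance` is the usual head count per meeting.
    """
    held_enough_meetings = meetings_last_six_months >= 3
    if held_enough_meetings and typical_attendance > 100:
        return ("Array", 1000)
    if held_enough_meetings and typical_attendance >= 50:
        return ("Matrix", 500)
    # Every other group, including very small ones just getting started.
    return ("Vector", 150)


for meetings, attendance in [(4, 120), (3, 60), (1, 15)]:
    print(meetings, attendance, grant_level(meetings, attendance))
```

Note that a group with exactly 100 typical attendees falls into the Matrix level under this reading, because the Array level asks for more than 100.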

In addition to the cash grants, R user groups accepted into the program are eligible to participate in the R Consortium’s meetup.com Pro program. Under this program, the R Consortium will pay a user group’s meetup.com dues for twelve months.

Small Conferences

Small conferences, typically those that expect to attract fewer than 200 people, may apply for cash grants up to $1,000. To qualify, a conference must be either entirely devoted to the R language or applications using R, or have a significant amount of R content. To apply, conferences should have a public-facing web page with a code of conduct, information about the technical program and sponsorship information. Conferences will be evaluated, and grants awarded, on a case-by-case basis.

Details for RUGS, meetup.com Pro and Small Conference programs may be found here on the R Consortium website. To apply for support, please use the online form.

 

R-users & Community: give us your feedback on an R Certification to teach & verify skilled R Professionals

By | Announcement, Blog

In the past few years, we have seen an increase in the demand for R, both from employers looking for skilled R users and from professionals looking to further improve their skills. Due to this supply and demand gap, various teaching channels have been created in an attempt to extend knowledge of the language. Even with the abundance of R teaching material, we still face a dearth of qualified, skilled R users. The inability to differentiate self-taught data scientists from qualified personnel creates confusion for employers and makes it difficult for qualified professionals to distinguish themselves from the rest.

R Consortium started a working group that has identified an absence of a system to certify qualified R professionals as a cause for this problem. As a response to this, the group is working to create a certification for R that will allow professionals and students to acquire fundamental skills and knowledge of the language. Creation of this certification also aims to help recruiters identify and assess the skills of potential recruits. This group will be driven by the needs of the current R professionals and data science recruiters. More information about this initiative can be found here.

In order for this working group to create a valuable certification, we encourage community feedback in this initiative. Your feedback will help the working group to evolve this certification to best serve the needs of the R community. Please respond to this survey to help in the creation of this certification.

Progress on driving diversity standards in the R community – an update on RCDI-WG

By | Blog, R Consortium Project

R Consortium announced four months ago the formation of the R Community Diversity and Inclusion Working Group (RCDI-WG), a focal point to address diversity and inclusion issues in the R community. It’s been great to see the interest and participation from the community, with over 50 individuals representing working groups, events, meetups, and projects coming together to drive forward sustainable diversity standards and practices.

The group has begun working on the initial deliverables outlined.

Code of Conduct

One of the working group’s key deliverables was a sample code of conduct that organizers can use for their conferences, meetups, and other events. We consulted the Ada Initiative’s and Geek Feminism’s guides for codes of conduct, and based our work on the existing R-Ladies code of conduct, which itself is based on Geek Feminism’s code of conduct.

We encourage all of you to review the proposed code of conduct and provide feedback/comments via pull request or an email to the mailing list.

Speaker Diversity working group

Conferences are at their best when everyone participates. The Speaker Diversity group focuses on collecting and disseminating tips that can help conference and event organizers increase the diversity of their speaker lineup through conscious recruitment and retention. Through curated articles, conversations with event organizers and participants, and our collective experiences, we aim to tease out strategies that have worked in the past and highlight behaviors that can lead organizers away from achieving their speaker diversity goals.

Recently the group drafted an initial version of an article outlining five major areas that organizers can focus on to help increase the diversity of their event’s speaker lineup. In short, M.A.P.S.S (Mission, Advertise, Pipeline, Selection, Sharing) outlines strategies that can be taken, at each phase of planning an event, to help achieve a diverse and inclusive speaker lineup. If you are interested in exploring how to increase speaker diversity at conferences and events, join in the conversation on the email list or follow our progress in the GitHub repository.

Conference best practices checklist

One challenge conference organizers highlighted is that, even when a conference is organized with a good-faith effort to drive diversity and inclusivity, some items are often forgotten or overlooked.

This inspired the workstream for driving the Conference best practices checklist. The aim of this checklist is for it to be a non-exhaustive but illustrative reference system for organisers of conferences (and even events). Conference organisers can use this to help them build out a tasklist, stimulate conversation within their organising team, or publish their alignment with it to help speakers, attendees, and sponsors make an informed decision about whether to attend.

Please review the list and send us suggestions for improving it via pull request or email.

 

It’s great to see progress on bringing the greater R community together to create a welcoming space for everyone, no matter their race, ethnicity, gender, gender identity and expression, socio-economic status, nationality, citizenship, religion, sexual orientation, ability, or age. We welcome everyone to review the meetings and deliverables, and join the mailing list to further drive the conversation.

What’s new with R Consortium funded projects in Q3 2018

By | Blog, R Consortium Project

In an effort to provide greater transparency with respect to R Consortium activities, the ISC provides quarterly updates for all R Consortium funded projects. The following is our update for Q3 2018.

R-hub

Most of the work on R-hub was maintenance, i.e. updating software, fixing failures, updating SSL certificates.

R-hub in January-June 2018, some numbers:

  • 18 platforms, including Windows, Linux, macOS and Solaris builders
  • 9098 builds
  • 843 different packages
  • 450 users

Detailed work and issue tracking can be found here and here.

PSI application for collaboration to create online R package validation repository

The AIMS SIG is planning to create an online repository for R package validation which could be used when R is used for pharmaceutical industry regulatory analysis. Since the main hurdle for widespread use of R in late phase trials is ensuring adequate validation documentation, AIMS is designing a framework which will specify a set of requirements, including metadata and examples of tests, which together would form evidence of the quality of an R package. Initially we will use dplyr as an example, and will make this “evidence of validation” available to the wider community on GitHub.

Whether the evidence provided is sufficient will be the decision of the end user, but it can be a starting point for further testing or may be sufficient in itself depending on the user’s attitude to risk. After review by our peers, we will be calling on all R users to submit similar evidence of validation for other packages. By sharing this evidence, we hope to reduce the amount of re-work being done by multiple companies eager to use R, but fearful of doing so in a regulatory environment without documentation of validation.

The AIMS SIG presented at the 2018 PSI Conference. The presentation, titled “The Future is heRe”, was aimed at demonstrating ways in which R can be useful in the pharmaceutical industry and promoting our plans for the online repository for the documentation of R validation.

AIMS also presented and held an R-validation brainstorming workshop at the R/Pharma Conference in August 2018. Volunteers from this meeting and the wider community will form a sub-group who will work towards the creation of the R-validation online repository. For the latest details on the project please look here. If you are interested in contributing to this project please contact: Lyn Taylor at taylorlyn@prahs.com.

Ongoing infrastructural development for R on Windows and MacOS

Follow the project here.

Quantities for R

The r-quantities project has been completed. The four milestones of the project were published in the following blog posts:

1. A first working prototype
2. Support for units and errors parsing
3. An analysis of data wrangling operations with quantities
4. Prospects on fitting linear models with quantities

Along the way, there have been multiple exciting improvements in both the ‘units’ and ‘errors’ packages to support all these features and make ‘quantities’ possible. The ‘quantities’ package is ready for an imminent CRAN release. This project ends, but the r-quantities GitHub organization will continue to thrive and to provide the best tools for quantity calculus to the R community.

histoRicalg — Preserving and Transferring Algorithmic Knowledge

histoRicalg has generated some interest and activity in its first few months, with a working group established and some activities begun concerning older algorithms that R may or may not be using. Our GitLab repository, which now has a variety of materials (some of them codified into vignettes), is here. Our mailing list has grown, and new members are welcome to join the mailing list.

R User Group Support program

The 2018 R Consortium R User Group Support Program wrapped up for the year on September 30th. This year we made 3 Array level grants, 17 Matrix level grants and 96 Vector level grants, for a total of 116 RUGs funded. 78 groups elected to participate in the meetup.com program where the R Consortium pays their meetup.com dues. Additionally, we funded 14 small conferences. In total, the RUGs program awarded more than $50,000 in grants.

stars: Scalable, spatiotemporal tidy arrays for R

Follow the project here.

Forwards Workshops for Women and Girls

A package development workshop for women was held at eRum 2018. Thanks to the R Consortium funding, travel scholarships were provided for 8 women who attended. The course used material developed for the first workshop run in Auckland last year and these materials are now available on GitHub under a CC-BY-NC 3.0 license. A short (1.5-2 hours) version was also developed and run at Cardiff satRday, with an accompanying RStudio cloud instance (details in the materials on the GitHub site). This opens the possibility of running the workshops with minimal set-up at R-Ladies or other user groups.

The next workshop planned is for high school girls in NYC, to be held on Saturday 27 October. Further details are on the Education page of the Forwards website.

Refactoring and updating the SWIG R module

Two significant PRs have been sent to the SWIG maintainers, details of which are discussed in the report pages below. These PRs address:

  • Rewriting the enumeration wrapping framework in the R module, allowing arbitrarily complex definition of underlying integer values
  • Rewriting/refactoring of the accessor wrapping framework for the R module, which eliminates the use of logic driven by function names from this part of the module

The latter eliminates bugs in which some accessors were incorrectly defined due to errors in the string logic, and others were incorrectly ignored due to names matching the patterns expected of automatically generated bindings. Look here for details.

Developing Tools and Templates for Teaching Materials

We have set up a GitHub organization that includes a repository to gather feedback and ideas  and a website to provide updates about the project. We started a small R package to extract, manipulate and modify the content of yml headers.

Joint profiling of native and R code

No new updates since Q2. First thing to do will be to improve documentation and perhaps record a screencast, to lower the entry barrier.

Maintaining DBI

Kirill Müller presented DBI in Berlin, Amsterdam, and Zurich. Here are the slides. The DBI package now has a CII badge. Stay tuned for the upcoming blog posts, the first in early October. All posts will be available here.

R Documentation Task Force

The Beta versions will be released soon.

Sat R Days

satRdays (satrdays.org) continue to grow and flourish. In the next three months, we’re having our first conferences in the US (dc2018.satrdays.org) and South America (santiago2018.satrdays.org) as well as another conference in the European region (belgrade2018.satrdays.org).

We’re discussing more events in Africa and growing the number of events per region next year. We’re hoping for at least 20 satRdays to run around the world.

Folks can now register interest on our events calendar and read about running an event on our knowledgebase (knowledgebase.satrdays.org).

We’re looking forward to supporting people across the world in sharing their R knowledge via accessible and diverse events.

Conference Management System for R Consortium Supported Conferences

We have completed and supported the deployment of a satRdays conference template that is open source and forkable for other events. It allows the integration of common ticketing platforms and call for paper solutions. The template is here.

Our next steps are about building a similar solution that is UseR! branded and incorporates sciencesconf to make it easier for academic events to spin up quickly too.

R-Ladies

Growth: R-Ladies growth is exceeding all the expectations. The goal for 2018 was to reach 65 chapters, and we are pleased to announce that, as per Oct 2018, we have already doubled that number: there are currently 131 chapters on meetup with ~31,500 R-Ladies members (signed up on meetup) and distributed in 41 countries and 131 cities in 6 continents. Additionally the R-Ladies remote, launched last quarter, is very active organising virtual events for all the R-Ladies far from any chapters.

Improving infrastructure: The majority of R-Ladies chapters have been migrated to our meetup pro account, and a few more are scheduled for next quarter. This will help simplify the reimbursement of chapter expenses.

R-Ladies is now a non-profit organization incorporated in California (USA); the application for tax-exempt status is in progress.

An Earth data processing backend for testing and evaluating stars

Follow the project here.

Fall 2018: ISC Call for Proposals

By | Announcement, Blog

by Joseph Rickert

The second and final ISC Call for Proposals for 2018 is now open. We are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community. We are deliberately being a little vague here, but having awarded more than $650,000 in grants so far, we can show a substantial number of funded projects that provide examples.

If you are going to submit a proposal, “Think Big” but structure your proposal with intermediate milestones. The ISC is not likely to fund proposals that ask for large initial cash grants. We tend to be conservative with initial grants, preferring projects structured in such a way that significant initial milestones can be achieved with modest amounts of cash.

As with any proposed project, the more detailed and credible the project plan, and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes measurable objectives, intermediate milestones, a list of all team members who will be contributing work, and a detailed accounting of how the grant money will be spent.

Also, if you think you are onto something but could use some help in finalizing the scope of a project, or you think implementing your idea would require achieving some level of consensus within the R Community, you might consider asking the ISC to help you establish a working group.

If you don’t think you have an idea that is fundable but want to get involved, you might want to explore getting involved with existing projects or put some thought into one of the perennial issues associated with finding one’s way through the R ecosystem. For example, could you build a package discovery system or recommender engine that spans CRAN, Bioconductor and GitHub, or implement and curate a calendar that automatically tracks R related events worldwide?

Our goal in calling for proposals is to stimulate creativity and help turn good ideas into tangible benefits for the R Community. What can you do to improve the R ecosystem and how can the R Consortium help you do it?

Note that proposals to sponsor conferences, workshops or meetups should be sent directly to the R Consortium’s R User Group and Small Conference Support Program. These are not funded as ISC proposals. Note that the deadline for applying for support under the 2018 program is coming up quickly. Requests for support under the 2018 program must be received by midnight, September 30, 2018. The 2019 program will launch sometime in January.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained PDF using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, Sunday October 31, 2018.

CII Best Practices Badge for R Packages – responding to concerns

By | Blog, R Consortium Project

Our last post Should R Consortium Recommend CII Best Practices Badge for R Packages: Latest Survey Results summarized results from the CII Best Practices survey conducted this summer. A goal of the CII Best Practices program is to help improve open source software quality. Respondents shared several concerns to which David Wheeler, project lead for the Core Infrastructure Initiative (CII) with the Linux Foundation, and I wanted to respond.

Let’s dive in…

Concern #1: Does the CII Badge have the “correct” or “best” set of criteria?

The CII Badge criteria are the best general-purpose OSS project criteria that we, the OSS community, have developed to date.  The CII Badge criteria were developed based on the experience and recommendations of many experts, previous criteria developed by various organizations, and the examination of real-world successful OSS projects.  No doubt the criteria could be improved further, but the badge criteria are themselves open source and can be improved using the same process as any OSS code: simply propose changes for review!

Concern #2: Achieving a badge does not necessarily mean a given package is well-designed or well-implemented.

Many of the CII criteria can help push projects towards creating better-designed and better-implemented packages.  In the “passing” level, the CII criteria include these requirements:

  • [warnings] requires enabling compiler warning flags or similar
  • [static_analysis] requires the use of at least one static code analysis tool (assuming one exists)
  • [test] requires a test suite, which often nudges people towards better design and implementation
  • [test_policy] requires that you keep adding tests, especially as new functionality comes online
  • [know_secure_design] requires that at least one primary developer knows how to design secure software. The best practices site explains what this means under the “details” of this criterion.  In summary, this criterion requires that at least one primary developer understands the 8 principles of Saltzer and Schroeder (as explained by the CII Best Practices site) and also knows to (1) limit the attack surface and (2) perform input validation with whitelists.  Software can be badly designed by knowledgeable people, but software is much more likely to be designed and implemented well if developers know the basics.
  • [know_common_errors] requires that at least one of the project’s primary developers must know of common kinds of errors that lead to vulnerabilities and at least one method to counter or mitigate each of them
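
As a rough illustration of the “input validation with whitelists” idea mentioned under [know_secure_design], here is a minimal sketch. It is written in Python purely for illustration; the function name, the allowed formats, and the name pattern are hypothetical and are not taken from the CII criteria. The point is only that input is accepted when it matches a known-good list or pattern, rather than rejected when it matches a known-bad one.

```python
import re

# Hypothetical whitelist: the only output formats this example accepts.
ALLOWED_FORMATS = {"csv", "json", "tsv"}

# Hypothetical pattern for a package-style name:
# a letter followed by letters, digits, or dots.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9.]*")


def validate_request(name, fmt):
    """Return (name, fmt) if both pass the whitelist checks, else raise ValueError."""
    # fullmatch requires the whole string to match, so trailing junk is rejected.
    if NAME_PATTERN.fullmatch(name) is None:
        raise ValueError("name contains characters outside the allowed set")
    # Membership in a fixed set: anything not explicitly listed is refused.
    if fmt not in ALLOWED_FORMATS:
        raise ValueError("format is not in the list of allowed formats")
    return name, fmt
```

With these assumptions, a call such as validate_request("ggplot2", "csv") passes, while a name containing a path separator or a format that is not listed raises an error.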

Higher badge levels (“silver” and “gold”) offer even more.

It’s true that a badge doesn’t guarantee that a package is well-designed or implemented by some measure, but part of the problem is that it’s difficult to unambiguously determine if something is well-designed or well-implemented.  Much depends on the purpose of the package!  So instead, many criteria focus on enabling mass peer review and managing improvements, so that problems are more likely to be detected and corrected.

In short, software normally undergoes change over time.  Instead of requiring that a project be perfect at one point in time, we focus on criteria that will help projects continuously improve over the long run.

Concern #3: How does the CII help to ensure the validity of self-certification, e.g., through automated tools?

We use automated tools and reject some answers that are clearly false.  We require that replies be public and that there be URLs for some answers; that makes it easy for anyone to check answers.  In the worst case, we can override false answers, though in practice we’ve almost never found that necessary.

Concern #4: Even if every R package had a badge, the issue of finding a needed package among over 12K packages remains.

Finding a desired package or the “best” one for a given task is largely orthogonal to improving package quality, though the two can be related. The badging process can help, because one of the criteria is “The project website MUST succinctly describe what the software does (what problem does it solve?)”.  Search engines are much more effective at finding relevant packages once that kind of information is available. In addition, if using packages that state adherence to the CII criteria is important to you or your organization, the search space may be significantly reduced, at least as a starting point.

Concern #5: Can the CII criteria be streamlined to reflect only the needs of R packages, including those that are more data and documentation than code?

Our current primary approach for streamlining is to automate criteria.  That said, if you have a specific idea for streamlining things further, please file an issue on GitHub here.

Concern #6: Will automated tools be available for performing at least parts of the assessment, e.g., as found in R’s devtools?

We already use automated tools to assist in completing the form.  We’d rather not require people to install tools to fill in information, because that would be a barrier for some.  If there are tools we aren’t using and should use, let us know!

Concern #7: A badge program could penalize developers who do not have time, money, or skills to meet the criteria, making their packages less desirable if they do not achieve a badge.

We’ve worked hard to make the badge “passing” criteria doable for single-person projects.  Daniel Stenberg is the author and maintainer of cURL and libcurl, and he’s been especially influential in ensuring that the “passing” badge is doable for single-person projects.  If you have no tests, cannot automatically build your software (even though it requires building), or have never run a static analysis tool of any kind, then there is some work… but it’s better for users if these are addressed.

The top “gold” level requires multiple people in a project, e.g., because the project MUST have a “bus factor” of 2 or more.  That can be a challenge for developers, but it’s a big advantage for users – users would much rather depend on software where a single death doesn’t suddenly mean that there’s no one to update the software.  No one is required to get the gold level, however, and there are many ways to resolve this.

Concern #8: Introducing more process comes with additional burdens for package developers, perhaps reducing overall ecosystem participation.

We’ve done our best to minimize the risk from additional burdens.  We automate some answers, and that helps.  We reduce the risk of duplicated evaluation processes by having a single set of criteria for all OSS.  Perhaps most importantly: the criteria were developed by examining real-world successful projects, so they require actions that other projects are already doing and finding helpful.

Finally, keep in mind that getting a CII best practices badge is optional: a package author can decide if the benefits of adhering to the CII criteria outweigh the costs.

Concern #9: Is there a way to distinguish tests for validating statistical software numerical computations and statistical properties?

Sure.  Naming conventions for tests are a common way to distinguish types of tests; you can also put different kinds of tests in different directories.  From the badge perspective, we don’t focus on that distinction. For “passing” the key is that your project must have a general policy that as major new functionality is added to the software produced by the project, tests of that functionality should be added to an automated test suite.  Passing doesn’t require a perfect test suite; instead, we require that you have a test suite and that you’re committed to improving it. Since OSS is visible to the user community, a potential user may want to examine the type and quality of tests performed.  The higher-level badges do require better test suites, as you might expect.
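
To make the naming-convention idea concrete, here is a minimal sketch. It uses Python’s standard unittest module only as a neutral illustration (an R package would more likely use its own test framework); the function sample_mean, the test names, and the prefixes test_numerical and test_statistical are all hypothetical. One test checks a numerical computation against a known value, the other checks a statistical property on seeded random data, and a small helper runs only the tests whose names start with a given prefix.

```python
import math
import random
import unittest


def sample_mean(values):
    """Hypothetical function under test: the arithmetic mean."""
    return sum(values) / len(values)


class SampleMeanTests(unittest.TestCase):
    def test_numerical_known_value(self):
        # Numerical check: compare against a known result within a tolerance.
        self.assertTrue(math.isclose(sample_mean([0.1, 0.2, 0.3]), 0.2, rel_tol=1e-12))

    def test_statistical_close_to_population_mean(self):
        # Statistical check: with a fixed seed, the mean of many normal draws
        # should be near the population mean (tolerance is 10 standard errors).
        rng = random.Random(42)
        draws = [rng.gauss(5.0, 1.0) for _ in range(10_000)]
        self.assertLess(abs(sample_mean(draws) - 5.0), 0.1)


def run_tests_named(prefix):
    """Run only the test methods whose names start with `prefix`."""
    loader = unittest.TestLoader()
    loader.testMethodPrefix = prefix  # standard TestLoader attribute
    suite = loader.loadTestsFromTestCase(SampleMeanTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_tests_named("test_numerical") would then run only the numerical test, and run_tests_named("test_statistical") only the statistical one. Putting the two kinds of tests in separate directories achieves the same separation.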

 

We continue to receive valuable comments through the survey and are pleased to report that more R package authors are choosing to participate in the CII as evidenced by the surge in new R CII project entries.