
Package Licensing: Would the R Community like some help? Feedback from the trenches


Editor’s Note: This post comes from Mark Hornick, who leads the Code Coverage Working Group and serves on the R Consortium Board of Directors.

In the Fall of 2017, the R Consortium surveyed the R Community to understand opportunities, concerns, and issues facing the community. Taking into account that feedback, the R Consortium recently surveyed package authors and maintainers on a number of topics surrounding R package licensing. Questions revolved around motivations for choice of license, comfort level in understanding license meaning and implications, importance of corporate adoption of R, and whether guidance on licensing from the R Consortium would be valuable.

While a significant number of people in the R community report that they understand and intentionally choose the license(s) they apply to their package software, a much larger group is unclear about which license to choose and what the implications of that choice are. These implications affect not only the individual package, but also the R Community and the corporate, government, and academic users of those packages. Of roughly 7400 invitations to complete the survey, the R Consortium received more than 1100 responses – a response rate of over 14%.

In this blog post, we summarize that feedback and offer next steps that the R Consortium and R Community may take based on this feedback.

Who responded to the survey

Of respondents, 42% are relatively new to R package development with 3 or fewer years of experience, 31% have 4-6 years, and 27% have 7 or more years of experience. Overall, the majority of package authors have been working with R for less than 6 years and have written up to 5 packages.

The largest subgroup of respondents (44%) has produced one package over their career. However, 39% of respondents have not pushed a package to CRAN in the past year.

The most popular license among respondents is ‘GPL-3’ at 35%, with ‘GPL-2 | GPL-3’ a close second at 34%, ‘GPL-2’ next at 24%, and ‘MIT’ at 21%. However, a mix of other licenses was also cited, including LGPL, BSD, Apache 2, and Creative Commons, among others.

What do I want others to be able to do with my package?

When it comes to open source software, there are many ways to think about how software could be used. For example, you may want everyone to be able to freely use your software via its API, but have concerns about what happens if the underlying code is modified – derivative works. On the other hand, you may want to impose licensing requirements on the software that uses your software as well, e.g., software that uses my package must be licensed in the same way as my package. The license choice can significantly affect how and whether a given package can be used in corporate, academic, or government environments.

From the survey, 60% of respondents want other software developers to be able to use their package(s) without imposing license requirements on the software that uses their package (via API), with only 15% disagreeing.

The majority of respondents were neutral as to whether they wanted to ensure that software using their package(s) must apply the same license that they chose, with 29% agreeing and 19% disagreeing.

As expected, respondents want to ensure that derivative works of their package(s) remain open source, with 74% agreeing. However, only 25% agree that derivative works should require the same license as the package used.

How do you choose a license for your package?

In the survey, we asked which factors contribute to the choice of package license.  Sixteen percent of respondents indicated license choice defaulted to the license of dependent packages, whether used exclusively through their API or if they borrowed code or header definitions. A sizeable 65% indicate that it is a conscious choice based on their understanding of open source and other license terms. But this is tempered by responses described in the next section regarding comfort level with understanding open source licenses.

The open comments section for this question revealed more details, e.g., some respondents consult websites, blogs, and books for license recommendations, or get advice from package reviewers. Some respondents admit they haven’t thought deeply about the choice of license and don’t understand the differences between licenses since the choices and legalese can be overwhelming. Some use what other respected package authors have used (without necessarily understanding why a given license was chosen for such a package) or as determined by corporate or government dictates or requirements. Yet other respondents indicated making an arbitrary or random choice since R package submission requires that some license must be chosen.

The open comments also highlighted some potential misconceptions, such as the belief that a package author who chooses GPL-2 for their package cannot later change to a more permissive license. In fact, the ability to change a license depends on multiple factors, e.g., the licenses of dependent packages or lifted code, whether all authors give their consent, etc. Some respondents state they want licensing that enables more users of their code rather than fewer. Others see GPL as a way to ensure commercial usage of their packages occurs fairly. Some respondents choose BSD as it provides the most freedom to package users.


Open Source License knowledge

For the R Consortium to understand whether resources should be applied to the problem of licensing, we asked package developers about their level of understanding of open source licenses. While 12% stated outright that they do not feel comfortable interpreting or applying open source licenses, 62% find license details and options confusing – even if they understand the basic premise of open source licenses.

Only 23% feel confident in choosing the right open source license(s) for their packages, while about 1% claim to have access to legal counsel to guide their choice of open source licenses. Another 1% claim to have sufficient legal background to choose the appropriate license(s) for their packages.

While licensing is important when trying to use software in corporate settings, only 24% of respondents consider the license of an R package important in determining whether or not they use it – 35% are neutral and 40% don’t think it’s important.

A majority (56%) of respondents believe that corporate adoption of R technology (engine and packages) is important for the R Community – 36% are neutral while 8% feel corporate adoption is not important. Consistent with this, 56% of respondents feel the R ecosystem should make corporate use of R easy – 37% are neutral and 6% disagree.

Tools and Guidance

As open source communities and technology continue to evolve, more tools are becoming available to assist with license choice. For example, code scanning tools exist in other open source communities to identify potential licensing issues. While following the advice of such tools is optional, most if not all developers want to “do the right thing” with respect to licensing. As testament to this, over 71% of respondents indicated they would welcome the availability of a license scanning tool to flag package license issues – only 3% disagreed.

With the objective of enabling package developers to make an informed choice of license, respondents were asked whether they would like the R Consortium to provide guidance on open source license choices and implications. Over 89% indicated they would. One respondent put it best: “I want whatever is best for making sure the CRAN community thrives in the long-term.” This is the intent of the R Consortium as well.

The R Consortium thanks the respondents to this survey for taking the time to share their experience, concerns, and needs. As a next step, the R Consortium will work with the R Community to provide best practices for good “license hygiene.” If you would like to be part of this activity, please reach out to the R Consortium by responding to this post.

R Consortium welcomes R-Ladies as a top-level project


In 2016, R-Ladies started their effort toward global expansion, with help from the R Consortium. Back then there were only 4 active chapters (San Francisco, Taipei, Twin Cities, and London), and the goal was to expand to 5-10 cities within the next year. The enthusiasm within the R community for local R-Ladies chapters far exceeded any possible expectations! As of March 2018, the organization has over 90 chapters and almost 19,000 members.

There are R-Ladies chapters in 45 countries around the globe, with many chapters hosting monthly events.  

With this fast growth, it became apparent to the R Consortium ISC that the project needs long-term investment to succeed. The diversified voice of R-Ladies, speaking not only as a group representing gender minorities in tech but also as a group attracting new R users, aligns with the R Consortium’s Code of Conduct and its desire to build a more diverse and inclusive R community.

R Consortium ISC is pleased to announce and welcome R-Ladies as a top-level project. R-Ladies has shown a strong commitment to the R community, and becoming a top-level project will provide a longer-term budget cycle (3 years instead of 1) to support their community. R-Ladies will also have a voting seat on the ISC (represented by Gabriela de Queiroz).

We invite all in the R community to congratulate R-Ladies on this milestone, and look forward to ensuring they have the infrastructure and funding to bring more diversity to the R community.

Announcing the second round of ISC Funded Projects for 2017


The R Consortium ISC is pleased to announce that the projects listed below were funded under the 2017 edition of the ISC Funded Projects program. This program, which provides financial support for projects that enhance the infrastructure of the R ecosystem or which benefit large segments of the R Community, has awarded $500,000 USD in grants to date. The Spring 2018 call for proposals is now open and will continue to accept proposals until midnight PST on April 1, 2018.  Learn more about the program and how to apply for funding for your project.

Quantities for R

Proposed by Iñaki Ucar

The ‘units’ package has become the reference for quantity calculus in R, with a wide and welcoming response from the R community. Along the same lines, the ‘errors’ package integrates and automatises error propagation and printing for R vectors. A significant fraction of R users, both practitioners and researchers, use R to analyse measurements, and would benefit from a joint processing of quantity values with errors.

This project not only aims at orchestrating units and errors in a new data type, but will also extend the existing frameworks (compatibility with base R as well as other frameworks such as the tidyverse) and standardise how to import/export data with units and errors.
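
To make this concrete, here is a minimal sketch of the two building blocks the project combines, assuming current CRAN versions of units and errors (the joint data type is what this project proposes):

    library(units)
    library(errors)

    # Quantity values with units ('units' package)
    speed <- set_units(c(10, 20, 30), m/s)
    speed * set_units(2, s)            # multiplication yields values in metres

    # Measurement values with uncertainty ('errors' package)
    x <- set_errors(c(1.0, 2.0), c(0.1, 0.2))
    x + x                              # uncertainty is propagated automatically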

Refactoring and updating the SWIG R module

Proposed by Richard Beare

The Simplified Wrapper and Interface Generator (SWIG) is a tool for automatically generating interface code between interpreters, including R, and a C or C++ library. The R module needs to be updated to support modern developments in R and in the rest of SWIG. This project aims to make the R module conform to the recommended SWIG standards and thus ensure continued support for R. We hope that this project will be the first step toward allowing SWIG-generated R code to use reference classes.

Future Minimal API: Specification with Backend Conformance Test Suite

Proposed by Henrik Bengtsson

The objective of the Future Framework, implemented in the future package, is to simplify how parallel and distributed processing is conducted in R. This project aims to provide a formal Future API specification and a test framework for validating the conformance of existing (e.g., future.batchtools and future.callr) and forthcoming third-party parallel backends to the Future framework.
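
For orientation, here is a minimal sketch of the user-facing constructs the Future API covers, assuming the CRAN version of future (swapping backends only changes the plan() call):

    library(future)
    plan(multisession)    # choose a parallel backend; user code stays the same

    f <- future({
      Sys.sleep(1)        # stand-in for an expensive computation
      42
    })
    value(f)              # blocks until the result is available, then returns 42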

An Earth data processing backend for testing and evaluating stars

Proposed by Edzer Pebesma

The stars project enables processing of Earth imagery data held on servers, without the need to download it to a local hard drive. This project will (i) create software to run a backend, (ii) develop scripts and tutorials that explain how such a data server and processing backend can be set up, and (iii) create an instance of such a backend in the AWS cloud that can be used for testing and evaluation.
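
As a point of reference, the local workflow such a backend would mirror looks roughly like this (a sketch assuming the in-development stars package and its bundled sample GeoTIFF):

    library(stars)
    tif <- system.file("tif/L7_ETMs.tif", package = "stars")
    x <- read_stars(tif)   # read Earth imagery as a stars object
    plot(x)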

R Consortium Call For Proposals: February 2018


by Joseph Rickert

The first ISC Call for Proposals for 2018 is now open. We are looking for ambitious projects that will contribute to the infrastructure of the R ecosystem and benefit large sections of the R community. However, we are not likely to fund proposals that ask for large initial cash grants. The ISC tends to be conservative with initial grants, preferring projects structured in such a way that significant initial milestones can be achieved with modest amounts of cash.

As with any proposed project, the more detailed and credible the project plan, and the better the track record of the project team, the higher the likelihood of receiving funding. Please be sure that your proposal includes measurable objectives, intermediate milestones, a list of all team members who will be contributing work, and a detailed accounting of how the grant money will be spent.

But, most importantly – don’t let this talk of large projects dampen your enthusiasm! We are looking for projects with impact, regardless of their size. With this call for proposals, we are hoping to stimulate creativity and help turn good ideas into tangible benefits. Look around your corner of the R Community: what needs doing, and how can the R Consortium help?

Please do not submit proposals to sponsor conferences, workshops or meetups. These requests should be sent directly to the R Consortium’s R User Group and Small Conference Support Program.

To submit a proposal for ISC funding, read the Call for Proposals page and submit a self-contained pdf using the online form. You should receive confirmation within 24 hours.

The deadline for submitting a proposal is midnight PST, Sunday April 1, 2018.

The 2018 R Consortium R User Group Support Program is Underway.


In just one year, the R Consortium, through the R User Group Support program, sponsored 76 R user groups and 3 small conferences with cash grants totaling just under $30,000. This program aligns with the R Consortium mission of fostering the continued growth of the R community and the data science ecosystem, and has already helped bring more people to using R and contributing to the community.

Coming off a successful 2017, we are pleased to announce the opening of the 2018 program today. While the structure of the 2018 program is similar to last year’s, with multiple levels of support, we have enhanced the program based on feedback from last year’s funded user groups.

Complimentary Meetup.com Pro Account

After a year of supporting user groups, we’ve found that the primary cost for each group is having a page on meetup.com or their own website (though the majority prefer the meetup.com platform). This leaves fewer funds available for things like meetup space, food, or even swag, and thus puts more of a burden on group leaders to attract people to the group.

This year we’ve leveraged our relationship with the Linux Foundation and will provide each user group a complimentary meetup.com Pro account. This not only removes a cost concern for group leaders, but also better enables us to promote user groups through the many features the platform provides. For all the details of the program, including eligibility requirements for the three levels of user group grants, the schedule of grants, and how to sign up for the meetup.com Pro account, please see the R Consortium’s R User Group Support Program webpage.

Small Conference Support

We’ve also seen an increase in the number of smaller, regionally focused R conferences happening around the world. Grassroots events like these are critical for sustainability in the R community, but need financial support and community awareness to be successful. Several reached out last year, and we provided funding from excess program funds, with great results.

Because these events align perfectly with the mission of the R User Group Support program, we’re formally expanding it this year to provide cash grants in the $500 to $1,000 range to continue to encourage small, R-focused conferences and meetings organized by non-profit or volunteer groups around the world. You can find out more about this new piece of the program on the R Consortium’s R User Group Support Program webpage.

Length of Program

R Consortium will begin taking applications for both R User Group Support and Small Conference Support today. Applications will be accepted through September 30, 2018.

Apply to the 2018 RC RUGS program by filling out this form. You can email us at rugs@r-consortium.org with any questions around the program.

2018 R Consortium Silver member representatives for Board and ISC


Per the R Consortium ByLaws and ISC Charter, the Silver Member class is entitled to elect representatives of the class for a term running January 1, 2018 through December 31, 2018, as follows:

  • 1 representative to the ISC
  • 1 Silver Member Board Director per every 7 Silver Members, subject to provisions 4.2 and 4.3(d) of the R Consortium ByLaws. This means the Silver Member class can elect up to 2 Board Directors representing the class.

These elections ran during the month of November 2017, with 3 nominees for Silver Member Board Director and 3 nominees for the Silver Member ISC representative.

I am pleased to announce those elected by the Silver member class to serve on the Board of Directors and ISC effective 1/1/2018 through 12/31/2018.

Silver Member ISC representative

Silver Member Board Directors
Please join me in congratulating each of the elected representatives.
We would also like to share a big thank you to the outgoing Silver Member Board Director Richard Pugh of Mango Solutions. His guidance and leadership within the R Consortium have made a huge impact on its current success.

Take the R Consortium’s Survey on R!


by Joseph Rickert and Hadley Wickham

Help us keep the conversation going: Take the R Consortium’s Survey. Let us know: What are you thinking? What do you make of the way R is developing? How do you use R? What is important to you? How could life be better? What issues should we be addressing? What does the big picture look like? We are looking for a few clues and we would like to hear from the entire R Community.


The R Consortium exists to promote R as a language, environment, and community. In order to answer some of the questions above, and to help us understand our mission better, we have put together the first of what we hope will be an annual survey of R users. This first attempt is a prototype. We don’t have any particular hypothesis or point of view. We would like to reach everyone who is interested in participating. So please take a few minutes to take the survey yourself and help us get the word out. The survey will adapt depending on your answers, but will take about 10 minutes to complete.

The anonymized results of the survey will be made available to the community for analysis. Thank you for participating.

Take the survey now!


Code Coverage Tool for R Working Group Achieves First Release


by Mark Hornick, Code Coverage Working Group Leader

The “Code Coverage Tool for R” project, proposed by Oracle and approved by the R Consortium Infrastructure Steering Committee, started just over a year ago. Project goals included providing an enhanced tool that determines code coverage upon execution of a test suite, and leveraging such a tool more broadly as part of the R ecosystem.

What is code coverage?

As defined in Wikipedia, “code coverage is a measure used to describe the degree to which the source code of a program is executed when a particular test suite runs. A program with high code coverage, measured as a percentage, has had more of its source code executed during testing which suggests it has a lower chance of containing undetected software bugs compared to a program with low code coverage.”

Why code coverage?

Code coverage is an essential metric for understanding software quality. For R, developers and users alike should be able to easily see what percent of an R package’s code has been tested and the status of those tests. By knowing code is well-tested, users have greater confidence in selecting CRAN packages. Further, automating test suite execution with code coverage analysis helps ensure new package versions don’t unknowingly break existing tests and user code.

Approach and main features in release

After surveying the available code coverage tools in the R ecosystem, the working group decided to use the covr package, started by Jim Hester in December 2014, as a foundation and continue to build on its success. The working group has enhanced covr to support even more R language aspects and needed functionality, including:

  • Support for R6 methods
  • Support for parallel code coverage
  • Support for compiling R with the Intel compiler ICC
  • Enhanced documentation and vignettes
  • A tool for benchmarking and for defining a canonical test suite for covr
  • Cleaned-up dependent-package license conflicts, and a change of the covr license to GPL-3
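
For illustration, a minimal covr session might look as follows (a sketch assuming a package source directory "mypkg" that ships a test suite):

    library(covr)
    cov <- package_coverage("mypkg")   # runs the package's test suite
    percent_coverage(cov)              # overall coverage percentage
    zero_coverage(cov)                 # lines never executed by the tests
    report(cov)                        # interactive per-line coverage report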

CRAN Process

Today, code coverage is an optional part of R package development. Some package authors/maintainers provide test suites and leverage code coverage to assess code quality. As noted above, code coverage has significant benefits for the R community to help ensure correct and robust software. One of the goals of the Code Coverage project is to incorporate code coverage testing and reporting into the CRAN process. This will involve working with the R Foundation and the R community on the following points:

  • Encourage package authors and maintainers to develop, maintain, and expand test suites with their packages, and use the enhanced covr package to assess coverage
  • Enable automatic execution of provided test suites as part of the CRAN process: just as binaries of software packages are made available, test suites would be executed and code coverage computed per package
  • Display each package’s code coverage results on its CRAN web page, e.g., the overall coverage percentage and a detailed report showing coverage per line of source code

Next Steps

The working group will assess additional enhancements for covr that will benefit the R community. In addition, we plan to explore with the R Foundation the inclusion of code coverage results in the CRAN process.

Acknowledgements

The following individuals are members of the Code Coverage Working Group:

  • Shivank Agrawal
  • Chris Campbell
  • Santosh Chaudhari
  • Karl Forner
  • Jim Hester
  • Mark Hornick
  • Chen Liang
  • Willem Ligtenberg
  • Andy Nicholls
  • Vlad Sharanhovich
  • Tobias Verbeke
  • Qin Wang
  • Hadley Wickham – ISC Sponsor

Improving DBI: A Retrospect


by Kirill Müller

The “Improving DBI” project, funded by the R Consortium and started about a year ago, includes the definition and implementation of a testable specification for DBI and making RSQLite fully compliant with the new specification. Besides the established DBI and RSQLite packages, I have spent a lot of time on the new DBItest package. Final updates to these packages will be pushed to CRAN at the end of May. This should give downstream maintainers some time to make accommodations. The follow-up project “Establishing DBI” will focus on fully DBI-compliant backends for MySQL/MariaDB and PostgreSQL, and on minor updates to the specs where appropriate.

DBItest: Specification

The new DBItest package provides a comprehensive backend-agnostic test suite for DBI backends. When the project started, it was merely a collection of test cases. I have considerably expanded the test cases and provided a human-readable description for each, using literate programming techniques powered by roxygen2. The DBI package weaves these chunks of text into a single document that describes all test cases covered by the test suite: the textual DBI specification. This approach ensures that further updates to the specification are reflected in both the automatic tests and the text.

This package is aimed at backend implementers, who can now programmatically check with very little effort whether their DBI backend conforms to the DBI specification. The verification can be integrated into the automated tests run as part of R’s package check mechanism in R CMD check. The odbc package, a new DBI-compliant interface to ODBC, has used DBItest from day one to enable test-driven development. The bigrquery package is another user of DBItest.
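
A typical integration is just a few lines in a backend’s test setup; the following is a minimal sketch assuming an RSQLite-like backend:

    # e.g., in tests/testthat/test-dbi.R
    library(DBItest)

    make_context(
      RSQLite::SQLite(),
      list(dbname = tempfile(fileext = ".sqlite"))
    )
    test_all()   # runs the complete DBI specification test suite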

Because not all DBMS support all aspects of DBI, the DBItest package allows developers to restrict which parts of the specification are tested, and “tweak” certain aspects of the tests, e.g., the format of placeholders in parameterized queries. Adapting to other DBMS may require more work due to subtle differences in the implementation of SQL between various DBMS.

DBI: Definition

This package has been around since 2001; it defines the actual DataBase Interface in R.

I have taken over maintenance, and released versions 0.4-1, 0.5-1, and 0.6-1, with release of version 0.7 pending. The most prominent change in this package is, of course, the textual DBI specification, which is included as an HTML vignette in the package. The documentation for the various methods defined by DBI is obtained directly from the specification. These help topics are combined in a sensible order to a single, self-contained document. This format is useful for both DBI users and implementers: users can look up the behavior of a method directly from its help page, and implementers can browse a comprehensive document that describes all aspects of the interface. I have also revised the description and the examples for all help topics. Other changes include:

  • the definition of new generics dbSendStatement() and dbExecute(), for backends that distinguish between queries that return a table and statements that manipulate data (see the sketch after this list),
  • the new dbWithTransaction() generic and the dbBreak() helper function, thanks Barbara Borges Ribero,
  • improved or new default implementations for methods like dbGetQuery(), dbReadTable(), dbQuoteString(), dbQuoteIdentifier(),
  • internal changes that allow methods that don’t have a meaningful return value to return silently,
  • translation of a helper function from C++ to R, to remove the dependency on Rcpp (thanks Hannes Mühleisen).
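
The following sketch illustrates the query/statement distinction and the new transaction helper, using RSQLite as the backend:

    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), ":memory:")

    dbExecute(con, "CREATE TABLE t (x INTEGER)")     # statement: rows affected
    dbWithTransaction(con, {
      dbExecute(con, "INSERT INTO t VALUES (1), (2)")
    })
    dbGetQuery(con, "SELECT x FROM t")               # query: returns a data frame

    dbDisconnect(con)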

Fortunately, none of the changes seemed to have introduced any major regression issues with downstream packages. The NEWS file contains a comprehensive list of changes.

RSQLite: Implementation

RSQLite 1.1-2 is a complete rewrite of the original C implementation. Before focusing on compliance with the new DBI specification, it was important to ensure compatibility with the more than 100 packages on CRAN and Bioconductor that use RSQLite. These packages revealed many usage patterns that were difficult to foresee. Most of these usage patterns are supported in version 1.1-2; the more esoteric ones (such as supplying an integer where a logical is required) trigger a warning.

Several rounds of “revdep checking” were necessary before most packages showed no difference in their check output compared to the original implementation. The downstream maintainers and the Bioconductor team were very supportive, and helped spot functional and performance regressions during the release process. Two point releases were necessary to finally achieve a stable state.

Supporting 64-bit integers was also trickier than anticipated. There is no built-in way to represent 64-bit integers in R. The bit64 package works around this limitation by using a numeric vector as storage, which also happens to use 8 bytes per element, and by providing coercion functions. But when an integer column is fetched, it cannot be foreseen whether a 64-bit value will occur in the result, and smaller integers must use R’s built-in integer type. For this purpose, an efficient data structure for collecting vectors, capable of changing the data type on the fly, has been implemented in C++. This data structure will be useful for many other DBI backends that need support for a 64-bit integer data type, and will be ported to the RKazam package in the follow-up project.
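
A small illustration of the bit64 side of this round trip (R’s built-in integer type is 32-bit):

    library(bit64)
    x <- as.integer64("9007199254740993")   # exact; too large even for a double
    x + 1L                                  # arithmetic stays in 64-bit integers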

Once the DBI specification was completed, the process of making RSQLite compliant was easy: enable one of the disabled tests, fix the code, make sure all tests pass, rinse, and repeat. If you haven’t tried it, I seriously recommend test-driven development, especially when the tests are already implemented.

The upcoming release of RSQLite 2.0 will require stronger adherence to the DBI specification also from callers. Where possible, I tried to maintain backward compatibility, but in some cases breaks were inevitable because otherwise I’d have had to introduce far too many exceptions and corner cases in the DBI spec. For instance, row names are no longer included by default when writing or reading tables. The original behavior can be re-enabled by calling pkgconfig::set_config(), so that packages or scripts that rely on row names continue to work as before. (The setting is active for the duration of the session, but only for the caller that has called pkgconfig::set_config().) I’m happy to include compatibility switches for other breaking changes if necessary and desired, to achieve both adherence to the specs and compatibility with existing behavior.
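
As a sketch, such a compatibility switch is set per caller like this (the key name below is an assumption for illustration; consult the RSQLite 2.0 release notes for the exact setting):

    # Hypothetical key name -- re-enable row names for this caller only
    pkgconfig::set_config("RSQLite::row.names" = TRUE)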

A comprehensive list of changes can be found in the NEWS file.

Other bits and pieces

The RKazam package is a ready-to-use boilerplate for a DBI backend, named after the hypothetical DBMS used as example in a DBI vignette. It already “passes” all tests of the DBItest package, mostly by calling a function that skips the current test. Starting a DBI backend from scratch requires only copying and renaming the package’s code.

R has limited support for time-of-day data. The hms package aims at filling this gap. It will be useful especially in the follow-up project, because SQLite doesn’t have an intrinsic type for time-of-day data, unlike many other DBMS.
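
A minimal sketch of what hms provides (assuming the CRAN version of the package):

    library(hms)
    x <- hms(56, 34, 12)   # seconds, minutes, hours -> 12:34:56
    as.numeric(x)          # stored as seconds since midnight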

Next steps

The ensemble CRAN release of the three packages DBI, DBItest and RSQLite will occur in parallel to the startup phase for the “Establishing DBI” follow-up project. This project consists of:

  • Fully DBI compatible backends for MySQL/MariaDB and Postgres
  • A backend-agnostic C++ data structure to collect column data in the RKazam package
  • Support for spatial data

In addition, it will contain an update to the DBI specification, mostly concerning support for schemas and for querying the structure of the table returned for a query. Targeting three DBMS instead of one will help properly specify these two particularly tricky parts of DBI. I’m happy to take further feedback from users and backend implementers towards further improvement of the DBI specification.

Acknowledgments

Many thanks to the R Consortium, which has sponsored this project, and to the many contributors who have spotted problems, suggested improvements, submitted pull requests, or otherwise helped make this project a great success. In particular, I’d like to thank Hadley Wickham, who suggested the idea, supported initial development of the DBItest package, and provided helpful feedback; and Christoph Hösler, Hannes Mühleisen, Imanuel Costigan, Jim Hester, Marcel Boldt, and @thrasibule for using it and contributing to it. I enjoyed working on this project, looking forward to “Establishing DBI”!

Simple Features Now on CRAN


by Edzer Pebesma

Support for handling and analyzing spatial data in R goes back a long way. In 2003, a group of package developers sat together and decided to adopt a shared understanding of how spatial data should be organized in R. This led to the development of the package sp and its helper packages rgdal and rgeos. sp offers simple classes for points, lines, polygons and grids, which may be associated with further properties (attributes), and takes care of coordinate reference systems. The sp package has helped many users and has made it attractive for others to develop new packages that share sp’s conventions for organizing spatial data by reusing its classes. Today, approximately 350 packages directly depend on sp and many more are indirectly dependent.

After 2003, the rest of the world broadly settled on a standard for so-called “features”, which can be thought of as “things” in the real world that have a geometry along with other properties. A feature geometry is called simple when it consists of points connected by straight line pieces and does not intersect itself. Simple feature access is a standard for accessing and exchanging spatial data (points, lines, polygons), as well as for operations defined on them, that has been adopted widely over the past ten years, not only by spatial databases such as PostGIS but also by more recent standards such as GeoJSON. The sp package and supporting packages such as rgdal and rgeos predate this standard, which complicates the exchange and handling of simple feature data.

The “Simple Features for R” project, one of the projects supported by the R Consortium in its first funding round, addresses these problems by implementing simple features as native R data. The resulting package, sf, provides functionality similar to that of sp, rgdal (for vector data), and rgeos combined, but for simple features. Instead of the S4 classes used by the sp family, it extends R’s data.frame directly, adding a list-column for geometries. This makes it easier to manipulate sf objects with other tools that assume all data objects are data.frames, such as dplyr and the tidyverse. Package sf links to the GDAL, PROJ.4, and GEOS libraries, three major geospatial “swiss army knives” for, respectively, input/output, cartographic (re)projections, and geometric operations (e.g., unions, buffers, intersections, and topological relations). sf can be seen as a successor to sp, rgdal (for vector data), and rgeos.
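
A minimal sketch of the data model, using the dataset bundled with sf:

    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"))
    class(nc)          # "sf" "data.frame": a data.frame with a geometry list-column
    nc$geometry[[1]]   # one simple feature geometry (a MULTIPOLYGON)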

The simple feature standard describes two encodings: well-known text, a human readable form that looks like “POINT(10 12)” or “LINESTRING(4 0,3 2,5 1)”, and well-known binary, a simple binary serialization. The sf package can read and write both. Exchange routines for binary encodings were written in Rcpp, to allow for very fast exchange of data with the linked GDAL and GEOS libraries, but also with other data formats or spatial databases.
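
For example, both encodings can be round-tripped directly:

    library(sf)
    g <- st_as_sfc("LINESTRING(4 0, 3 2, 5 1)")   # parse well-known text
    st_as_text(g)                                  # back to WKT
    st_as_binary(g[[1]])                           # well-known binary (a raw vector)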

The sf project on GitHub has received considerable attention. Over 100 issues have been raised, many of which received dozens of valuable contributions, and several projects currently under development (mapview, tmap, stplanr) are experimenting with the new data classes. Several authors have provided useful pull requests, and efforts have begun to implement spatial analysis in pipe-based workflows, support dplyr-style verbs, and integrate with ggplot.

Besides using data.frames and offering considerably simpler data structures for spatial geometries, advantages of sf over the sp family include: simpler handling of coordinate reference systems (using either EPSG code or PROJ.4 string), the ability to return distance or area values with proper units (meter, feet or US feet), and support for geosphere functions to compute distances or areas for longitude/latitude data, using datum-dependent values for the Earth’s radius and flattening.
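
A short sketch of these features in action (again using the bundled dataset; the exact EPSG code and units depend on the data):

    library(sf)
    nc <- st_read(system.file("shape/nc.shp", package = "sf"))
    st_crs(nc)$epsg    # EPSG code of the coordinate reference system
    st_area(nc[1, ])   # area of the first feature, returned with units (e.g. m^2)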

The sf package is now available from CRAN, both in source form and in binary form for Windows and Mac OS X platforms. The authors are grateful to the CRAN team for their strong support in getting the sf package compiled on all platforms. Support from the R Consortium has helped greatly to give this project priority, draw attention in a wider community, and facilitate travel and communication events.

For additional technical information about sf, look here on my website.