All Posts By


Simple Features Now on CRAN

By | Blog, R Consortium Project, R Language | No Comments

by Edzer Pebesma

Support for handling and analyzing spatial data in R goes back a long way. In 2003, a group of package developers sat together and decided to adopt a shared understanding of how spatial data should be organized in R. This led to the development of the package sp and its helper packages rgdal and rgeos. sp offers simple classes for points, lines, polygons and grids, which may be associated with further properties (attributes), and takes care of coordinate reference systems. The sp package has helped many users and has made it attractive for others to develop new packages that share sp’s conventions for organizing spatial data by reusing its classes. Today, approximately 350 packages directly depend on sp and many more are indirectly dependent.

After 2003, the rest of the world has broadly settled on adopting a standard for so-called “features”, which can be thought of as “things” in the real world that have a geometry along with other properties. A feature geometry is called simple when it consists of points connected by straight line pieces, and does not intersect itself. Simple feature access is a standard for accessing and exchanging spatial data (points, lines, polygons) as well as for operations defined on them that has been adopted widely over the past ten years, not only by spatial databases such as PostGIS, but also more recent standards such as GeoJSON. The sp package and supporting packages such as rgdal and rgeos predate this standard, which complicates exchange and handling of simple feature data.

The “Simple Features for R” project, one of the projects supported by the R Consortium in its first funding round, addresses these problems by implementing simple features as native R data. The resulting package, sf provides functionality similar to the sp, rgdal for vector data, and rgeos packages together, but for simple features. Instead of S4 classes used by the sp family, it extends R’s data.frame directly, adding a list-column for geometries. This makes it easier to manipulate them with other tools that assume all data objects are data.frames, such as dplyr and tidyverse. Package sf links to the GDAL, PROJ.4 and GEOS libraries, three major geospatial “swiss army knives” for respectively input/output, cartographic (re)projections, and geometric operations (e.g. unions, buffers, intersections and topological relations). sf can be seen as a successor to sp, rgdal (for vector data), and rgeos.

The simple feature standard describes two encodings: well-known text, a human readable form that looks like “POINT(10 12)” or “LINESTRING(4 0,3 2,5 1)”, and well-known binary, a simple binary serialization. The sf package can read and write both. Exchange routines for binary encodings were written in Rcpp, to allow for very fast exchange of data with the linked GDAL and GEOS libraries, but also with other data formats or spatial databases.

The sf project on GitHub has received a considerable attention. Over 100 issues have been raised, many of which received dozens of valuable contributions, and several projects currently under development (mapview, tmap, stplanr) are experimenting with the new data classes. Several authors have provided useful pull requests, and efforts have begun to implement spatial analysis in pipe-based workflows, support dplyr-style verbs and integrate with ggplot.

Besides using data.frames and offering considerably simpler data structures for spatial geometries, advantages of sf over the sp family include: simpler handling of coordinate reference systems (using either EPSG code or PROJ.4 string), the ability to return distance or area values with proper units (meter, feet or US feet), and support for geosphere functions to compute distances or areas for longitude/latitude data, using datum-dependent values for the Earth’s radius and flattening.

The sf package is now available from CRAN, both in source form as in binary form for Windows and MacOSX platforms. The authors are grateful to the CRAN team for their strong support in getting the sf package compiled on all platforms. Support from the R Consortium has helped greatly to give this project priority, draw attention in a wider community, and facilitate travel and communication events.

For additional technical information about sf, look here on my website.


Call for Proposals

By | Blog, R Consortium Project | No Comments

by Hadley Wickham

The infrastructure Steering Committee (ISC) is pleased to announce that the committee is now ready to accept proposals for the first round of funding in 2017. The ISC is broadly interested in projects that will make a difference to the R community. Don’t be afraid to think big! We have the budget to fund ambitious projects and we want to fund infrastructure that can help large segments of the R community.

Infrastructure includes:

  • Ambitious technical projects (like R-hub), which require dedicated
    time to supply infrastructure that is currently missing in the R
  • Community projects (like R-ladies and SatRdays), which help catalyse
    and support the growth of the R community around the world.
  • Smaller projects to develop packages (like DBI and sf), which
    provide key infrastructure used by thousands of R programmers.

The deadline for submitting a proposal is midnight PST, Friday February 10, 2017. For the mechanics of submitting a proposal and some guidance on how to write a good proposal see the Call for Proposals Webpage. Also, if you have ideas for projects, but you’re not sure you have the skills to do them yourself, file an issue with your idea on the wish list that the R Consortium maintains on GitHub.


Halfway through “Improving DBI”

By | Blog, R Consortium Project, R Language | No Comments

by Kirill Müller

In early 2016 the R Consortium partially accepted my “Improving DBI” proposal. An important part is the design and implementation of a testable DBI specification. Initially I also proposed to make three DBI backends to open-source databases engines (RSQLite, RMySQL, and RPostgres) compatible to the new DBI specification, but funding allows to work on only one DBI backend. I chose RSQLite for a number of reasons:

  • It is a very important package, judging by the number of reverse CRAN and Bioconductor dependencies
  • It’s easy to work with, because everything (including the database engine) is bundled with the package
  • It seemed to be the most advanced package, closest to the (yet to be completed) DBI specification
  • An informal Twitter poll supports this decision by a tiny margin

The project has reached an important milestone, with the release of RSQLite 1.1. This post reports the progress achieved so far, and outlines the next steps.


While the RSQLite API has changed very little (hence the minor version update), it includes a complete rewrite of the original 1.0.0 sources in C++. This has considerably simplified the code, which makes future maintenance easier, and allows us to take advantage of the more sophisticated memory management tools available in Rcpp, which help protect against memory leaks and crashes.

RSQLite 1.1 brings a number of improvements:

  • New strategy for prepared queries: Create a prepared query with dbSendQuery() or dbSendStatement() and bind values with dbBind(). This allows you to efficiently re-execute the same query/statement with different parameter values iteratively (by calling dbBind() several times) or in a batch (by calling dbBind() once with a data-frame-like object).
  • Support for inline parametrised queries via the param argument to dbSendQuery(), dbGetQuery(), dbSendStatement() and dbExecute(), to protect from SQL injection.
  • The existing methods dbSendPreparedQuery() and dbGetPreparedQuery() have been soft-deprecated, because the new API is more versatile, more consistent and stricter about parameter validation.
  • Using UTF8 for queries and parameters: this mean that non-English data should just work without any additional intervention.
  • Improved mapping between SQLite’s cell-types and R’s column-types.

See the release notes for further changes.

The rewrite was implemented by Hadley Wickham before the “Improving DBI” project started, and has been available for a long time on GitHub. Nevertheless, the CRAN release has proven much more challenging than anticipated, because so many CRAN and Bioconductor packages import it. (Maintainers of reverse dependencies might remember multiple e-mails where I was threatening to release RSQLite “for real”.) My aim was to break as little existing code as possible. After numerous rounds of revdep-checking and improving RSQLite, I’m proud to report that the vast majority of reverse dependencies pass their checks just as well (and as quickly!) as they did with v1.0.0. Most tests from v1.0.0 are still present in the current codebase. This means that non-packaged code also has a good chance to work unchanged. I’m happy to work with package maintainers or users whose code breaks after the update.


I have also released several DBI updates to CRAN, mostly to introduce new generics such as dbBind() (for parametrized/prepared queries) or dbSendStatement() and dbExecute() (for statements which don’t return data). The definition of a formal DBI specification is part of the project, a formatted version is updated continuously.


In addition to the textual specification in the DBI package, the DBItest package provides backend independent tests for DBI packages. It can be easily used by package authors to ensure that they follow the DBI specification. This is important because it allows you to take code that works with one DBI backend and easily switch to a different backend (providing that they both support the same SQL dialect). Literate programming techniques using advanced features of roxygen2 help keeping both code and textual specifications in close proximity, so that amendments to the text can be easily tracked back to changes of the test code, and vice versa.

Next steps

The rest of the project will focus on finalizing the specification in both code and text (mostly discussed on GitHub in the issue trackers for the DBI and DBItest projects). At least one new helper package (to handle 64-bit integer types) will be created, and DBI, DBItest, and RSQLite will see yet another release: The first two will finalize the DBI specification, and RSQLite will fully conform to it.

The development happens entirely on GitHub in repositories of the rstats-db organization. Feel free to try out development versions of the packages found there, and to report any problems or ideas at the issue trackers.


RL10N hits its first milestone

By | Blog | No Comments

by Richard Cotton and Thomas Leeper


R is gradually taking over the world (of data analysis).  However, proficiency in English remains a prerequisite for effectively working with R.  While R has a system for translating messages, warnings, and error messages into other languages, very few packages take advantage of this functionality.

Part of the problem is that it currently takes a lot of effort to create translations.  There are a few issues that the RL10N project aims to address. Firstly, the functionality contained in the tools package isn’t particularly easy to work with. Secondly, finding translators can be difficult. RL10N aims to solve both of these problems.

The project has reached its first milestone, having released the poio package to CRAN. Translations of messages are stored in .pot master translation and .po language-specific translation files that are understood by the GNU gettext utility. poio provides functionality to read and write this file format.

Setting up Translations

The workflow to create translation infrastructure for a package is now reasonably straightforward.

First, a .pot master translation file is created using the xgettext2pot from the tools package. The .pot file contains a few lines of metadata, consisting of name-value pairs.

"Project-Id-Version: R 3.3.1\n"
"POT-Creation-Date: 2016-11-06 17:19\n"

After this, it contains message ID lines, along with blank message translation lines.

msgid "This is a message!"
msgstr ""

The second step is to read this file into R, using poio’s read_po function.  (The same function reads both .po and .pot files, automatically detecting which is which.)

pot <- read_po(pot_file)

The file created by x has some incorrect metadata values.  These can be fixed by calling fix_metadata.

pot_fixed <- fix_metadata(pot)

Next, you need to choose some languages to translate your messages into.  You need to specify the languages as a two- or three-letter ISO 639 code.  These include “fr” for French, “zn” for Chinese, and country-specific variations like “pt_BR” for Brazilian Portuguese.  The language_codes dataset shows all the available language and country codes.

For each language, you must generate a language-specific po object from the master translation, using generate_po_from_pot, then write it to a .po file using write_po.

for(lang in c("de", "fr_BE"))
po <- generate_po_from_pot(pot, lang)

That’s it! You are now ready to translate.

Next Steps

The msgtools package is currently under development, and has higher level tools for managing and updating translations, and integrating translations into packages.  The immediate next step is to integrate poio with msgtools and release the latter package to CRAN.

Beyond this, the RL10N project has a plan to tackle the second problem: finding translators.  This will involve integrating automated translation functionality from Google Translate and Microsoft Translator into msgtools, as well as providing assistance with getting human translators.