Take the R Consortium’s Survey on R!

By Announcement, Blog, News, R Consortium Project, R Language

by Joseph Rickert and Hadley Wickham

Help us keep the conversation going: Take the R Consortium’s Survey. Let us know: What are you thinking? What do you make of the way R is developing? How do you use R? What is important to you? How could life be better? What issues should we be addressing? What does the big picture look like? We are looking for a few clues and we would like to hear from the entire R Community.

    

The R Consortium exists to promote R as a language, environment and community. In order to answer some of the questions above and to help us understand our mission better we have put together the first of what we hope will be an annual survey of R users. This first attempt is a prototype. We don’t have any particular hypothesis or point of view. We would like to reach everyone who is interested in participating. So please, take a few minutes to take the survey yourself and help us get the word out. The survey will adapt depending on your answers, but will take about 10 minutes to complete.

The anonymized results of the survey will be made available to the community for analysis. Thank you for participating.

Take the survey now!

Also available in Chinese, Japanese, French, and Spanish: 现在进行调查! | 今すぐ調査をしてください! | Participez à l’enquête en ligne! | ¡Tome la encuesta ahora!

 

 

Code Coverage Tool for R Working Group Achieves First Release

By Blog, News, R Consortium Project, R Language

by Mark Hornick, Code Coverage Working Group Leader

The “Code Coverage Tool for R” project, proposed by Oracle and approved by the R Consortium Infrastructure Steering Committee, started just over a year ago. Project goals included providing an enhanced tool that determines code coverage upon execution of a test suite, and leveraging such a tool more broadly as part of the R ecosystem.

What is code coverage?

As defined in Wikipedia, “code coverage is a measure used to describe the degree to which the source code of a program is executed when a particular test suite runs. A program with high code coverage, measured as a percentage, has had more of its source code executed during testing which suggests it has a lower chance of containing undetected software bugs compared to a program with low code coverage.”
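
As a rough illustration of this definition, line coverage is the number of lines executed at least once divided by the number of executable lines. The sketch below is Python rather than R and does not use covr; the function `classify` and the helper `line_coverage` are hypothetical names chosen for the example. It uses the standard `sys.settrace` hook to show how a test suite that skips a branch ends up below 100% coverage.

```python
import dis
import sys


def classify(x):
    # Hypothetical function under test, with two branches.
    if x < 0:
        return "negative"
    return "non-negative"


def line_coverage(func, test_inputs):
    """Return the fraction of func's executable lines hit by test_inputs."""
    code = func.__code__
    # Lines that carry bytecode, i.e. lines that can be executed at all.
    executable = {line for _, line in dis.findlinestarts(code) if line is not None}
    # The "def" line itself is not part of the body being measured.
    executable.discard(code.co_firstlineno)
    executed = set()

    def tracer(frame, event, arg):
        # Only follow frames that run the function under test.
        if frame.f_code is code:
            if event == "line":
                executed.add(frame.f_lineno)
            return tracer
        return None

    sys.settrace(tracer)
    try:
        for value in test_inputs:
            func(value)
    finally:
        sys.settrace(None)
    return len(executed & executable) / len(executable)


partial = line_coverage(classify, [1])    # the negative branch never runs
full = line_coverage(classify, [1, -1])   # both branches run
```

With only a non-negative input, two of the three executable lines run, so `partial` is about 0.67; adding a negative input brings `full` to 1.0. Real tools such as covr do considerably more work, so this shows only the arithmetic behind the percentage.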

Why code coverage?

Code coverage is an essential metric for understanding software quality. For R, developers and users alike should be able to easily see what percent of an R package’s code has been tested and the status of those tests. By knowing code is well-tested, users have greater confidence in selecting CRAN packages. Further, automating test suite execution with code coverage analysis helps ensure new package versions don’t unknowingly break existing tests and user code.

Approach and main features in release

After surveying the available code coverage tools in the R ecosystem, the working group decided to use the covr package, started by Jim Hester in December 2014, as a foundation and continue to build on its success. The working group has enhanced covr to support even more R language aspects and needed functionality, including:

  • Support for R6 methods
  • Coverage of parallel code
  • Compiling R with the Intel compiler (ICC)
  • Enhanced documentation and vignettes
  • A tool for benchmarking and defining a canonical test suite for covr
  • Cleanup of dependent-package license conflicts and a change of the covr license to GPL-3

CRAN Process

Today, code coverage is an optional part of R package development. Some package authors/maintainers provide test suites and leverage code coverage to assess code quality. As noted above, code coverage has significant benefits for the R community to help ensure correct and robust software. One of the goals of the Code Coverage project is to incorporate code coverage testing and reporting into the CRAN process. This will involve working with the R Foundation and the R community on the following points:

  • Encourage package authors and maintainers to develop, maintain, and expand test suites with their packages, and use the enhanced covr package to assess coverage
  • Enable automatic execution of provided test suites as part of the CRAN process: just as binaries of software packages are made available, test suites would be executed and code coverage computed for each package
  • Display on each package’s CRAN web page its code coverage results, e.g., the overall coverage percentage and a detailed report showing coverage per line of source code

Next Steps

The working group will assess additional enhancements for covr that will benefit the R community. In addition, we plan to explore with the R Foundation the inclusion of code coverage results in the CRAN process.

Acknowledgements

The following individuals are members of the Code Coverage Working Group:

  • Shivank Agrawal
  • Chris Campbell
  • Santosh Chaudhari
  • Karl Forner
  • Jim Hester
  • Mark Hornick
  • Chen Liang
  • Willem Ligtenberg
  • Andy Nicholls
  • Vlad Sharanhovich
  • Tobias Verbeke
  • Qin Wang
  • Hadley Wickham – ISC Sponsor

Improving DBI: A Retrospect

By Blog, News, R Consortium Project, R Language

by Kirill Müller

The “Improving DBI” project, funded by the R Consortium and started about a year ago, includes the definition and implementation of a testable specification for DBI and making RSQLite fully compliant with the new specification. Besides the established DBI and RSQLite packages, I have spent a lot of time on the new DBItest package. Final updates to these packages will be pushed to CRAN at the end of May. This should give downstream maintainers some time to make accommodations. The follow-up project “Establishing DBI” will focus on fully DBI-compliant backends for MySQL/MariaDB and PostgreSQL, and on minor updates to the specs where appropriate.

DBItest: Specification

The new DBItest package provides a comprehensive backend-agnostic test suite for DBI backends. When the project started, it was merely a collection of test cases. I have considerably expanded the test cases and provided a human-readable description for each, using literate programming techniques powered by roxygen2. The DBI package weaves these chunks of text into a single document that describes all test cases covered by the test suite: the textual DBI specification. This approach ensures that further updates to the specification are reflected in both the automatic tests and the text.

This package is aimed at backend implementers, who can now check programmatically, with very little effort, whether their DBI backend conforms to the DBI specification. The verification can be integrated into the automated tests that are run as part of R’s package check mechanism in R CMD check. The odbc package, a new DBI-compliant interface to ODBC, has been using DBItest from day one to enable test-driven development. The bigrquery package is another user of DBItest.

Because not all DBMS support all aspects of DBI, the DBItest package allows developers to restrict which parts of the specification are tested, and “tweak” certain aspects of the tests, e.g., the format of placeholders in parameterized queries. Adapting to other DBMS may require more work due to subtle differences in the implementation of SQL between various DBMS.

DBI: Definition

This package, which has been around since 2001, defines the actual DataBase Interface in R.

I have taken over maintenance, and released versions 0.4-1, 0.5-1, and 0.6-1, with release of version 0.7 pending. The most prominent change in this package is, of course, the textual DBI specification, which is included as an HTML vignette in the package. The documentation for the various methods defined by DBI is obtained directly from the specification. These help topics are combined in a sensible order into a single, self-contained document. This format is useful for both DBI users and implementers: users can look up the behavior of a method directly from its help page, and implementers can browse a comprehensive document that describes all aspects of the interface. I have also revised the description and the examples for all help topics. Other changes include:

  • the definition of new generics dbSendStatement() and dbExecute(), for backends that distinguish between queries that return a table and statements that manipulate data,
  • the new dbWithTransaction() generic and the dbBreak() helper function (thanks to Barbara Borges Ribeiro),
  • improved or new default implementations for methods like dbGetQuery(), dbReadTable(), dbQuoteString(), dbQuoteIdentifier(),
  • internal changes that allow methods that don’t have a meaningful return value to return silently,
  • translation of a helper function from C++ to R, to remove the dependency on Rcpp (thanks to Hannes Mühleisen).
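
The difference between statements and queries, and the all-or-nothing behavior that a transaction wrapper provides, can be sketched outside R. The example below is an analogue written with Python's standard sqlite3 module, not DBI code; the `accounts` table and the `transfer` helper are hypothetical.

```python
import sqlite3


def transfer(con, src, dst, amount):
    """Move amount between two accounts inside one transaction."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if the block raises.
    with con:
        con.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        (balance,) = con.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")
        con.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")

# A statement manipulates data and reports how many rows it affected ...
cur = con.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
rows_affected = cur.rowcount
con.commit()

transfer(con, "a", "b", 30)
try:
    transfer(con, "a", "b", 500)  # fails, so the first update is undone
except ValueError:
    pass

# ... whereas a query returns a table of rows.
balances = dict(con.execute("SELECT name, balance FROM accounts"))
```

After the failed second transfer, the balances are those left by the first one, which is the guarantee a transaction wrapper is meant to give.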

Fortunately, none of the changes seem to have introduced any major regression issues with downstream packages. The news contains a comprehensive list of changes.

RSQLite: Implementation

RSQLite 1.1-2 is a complete rewrite of the original C implementation. Before focusing on compliance with the new DBI specification, it was important to ensure compatibility with the more than 100 packages on CRAN and Bioconductor that use RSQLite. These packages revealed many usage patterns that were difficult to foresee. Most of these usage patterns are supported in version 1.1-2; the more esoteric ones (such as supplying an integer where a logical is required) trigger a warning.

Several rounds of “revdep checking” were necessary before most packages showed no difference in their check output compared to the original implementation. The downstream maintainers and the Bioconductor team were very supportive, and helped spot functional and performance regressions during the release process. Two point releases were necessary to finally achieve a stable state.

Supporting 64-bit integers also was trickier than anticipated. There is no built-in way to represent 64-bit integers in R. The bit64 package works around this limitation by using a numeric vector as storage, which also happens to use 8 bytes per element, and providing coercion functions. But when an integer column is fetched, it cannot be foreseen if a 64-bit value will occur in the result, and smaller integers must use R’s built-in integer type. For this purpose, an efficient data structure for collecting vectors, which is capable of changing the data type on the fly, has been implemented in C++. This data structure will be useful for many other DBI backends that need support for a 64-bit integer data type, and will be ported to the RKazam package in the follow-up project.
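
The idea of a collector that switches its storage type on the fly can be sketched as follows. This is an illustration in Python, not the C++ data structure from RSQLite; the class name `IntColumn` and its members are hypothetical. It starts with 32-bit storage and re-allocates as 64-bit only when a value arrives that does not fit.

```python
from array import array

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


class IntColumn:
    """Collect integers, widening from 32-bit to 64-bit storage on demand."""

    def __init__(self):
        # "i" is a 4-byte signed int on common platforms (an assumption here).
        self.values = array("i")

    @property
    def is_wide(self):
        return self.values.typecode == "q"

    def append(self, value):
        if not self.is_wide and not INT32_MIN <= value <= INT32_MAX:
            # First value that does not fit: copy everything collected so
            # far into 8-byte storage ("q"), then carry on as before.
            self.values = array("q", self.values.tolist())
        self.values.append(value)


col = IntColumn()
for v in (1, 2, 3):
    col.append(v)
narrow_typecode = col.values.typecode   # still the small type

col.append(2**40)                       # too large for 32 bits
wide_typecode = col.values.typecode     # widened storage
collected = col.values.tolist()
```

The cost of widening is paid once per column and only when needed, so result sets that contain only small integers keep the compact type.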

Once the DBI specification was completed, the process of making RSQLite compliant was easy: enable one of the disabled tests, fix the code, make sure all tests pass, rinse, and repeat. If you haven’t tried it, I seriously recommend test-driven development, especially when the tests are already implemented.

The upcoming release of RSQLite 2.0 will require stronger adherence to the DBI specification also from callers. Where possible, I tried to maintain backward compatibility, but in some cases breaks were inevitable because otherwise I’d have had to introduce far too many exceptions and corner cases in the DBI spec. For instance, row names are no longer included by default when writing or reading tables. The original behavior can be re-enabled by calling pkgconfig::set_config(), so that packages or scripts that rely on row names continue to work as before. (The setting is active for the duration of the session, but only for the caller that has called pkgconfig::set_config().) I’m happy to include compatibility switches for other breaking changes if necessary and desired, to achieve both adherence to the specs and compatibility with existing behavior.

A comprehensive list of changes can be found in the news.

Other bits and pieces

The RKazam package is a ready-to-use boilerplate for a DBI backend, named after the hypothetical DBMS used as example in a DBI vignette. It already “passes” all tests of the DBItest package, mostly by calling a function that skips the current test. Starting a DBI backend from scratch requires only copying and renaming the package’s code.

R has limited support for time-of-day data. The hms package aims at filling this gap. It will be useful especially in the follow-up project, because SQLite doesn’t have an intrinsic type for time-of-day data, unlike many other DBMS.

Next steps

The ensemble CRAN release of the three packages DBI, DBItest and RSQLite will occur in parallel to the startup phase for the “Establishing DBI” follow-up project. This project consists of:

  • Fully DBI-compatible backends for MySQL/MariaDB and PostgreSQL
  • A backend-agnostic C++ data structure to collect column data in the RKazam package
  • Support for spatial data

In addition, it will contain an update to the DBI specification, mostly concerning support for schemas and for querying the structure of the table returned for a query. Targeting three DBMS instead of one will help properly specify these two particularly tricky parts of DBI. I’m happy to take further feedback from users and backend implementers towards further improvement of the DBI specification.

Acknowledgments

Many thanks to the R Consortium, which has sponsored this project, and to the many contributors who have spotted problems, suggested improvements, submitted pull requests, or otherwise helped make this project a great success. In particular, I’d like to thank Hadley Wickham, who suggested the idea, supported initial development of the DBItest package, and provided helpful feedback; and Christoph Hösler, Hannes Mühleisen, Imanuel Costigan, Jim Hester, Marcel Boldt, and @thrasibule for using it and contributing to it. I enjoyed working on this project and am looking forward to “Establishing DBI”!

Q1 2017 ISC Grants

By Blog, Events

by Hadley Wickham and Joseph Rickert

The Infrastructure Steering Committee (ISC) was very pleased with both the quantity and quality of proposals received during the recent round of funding which closed on February 10th. Funding decisions were difficult. In the end, the ISC awarded grants to ten of the twenty-seven proposals it received for a total award of $234,000. Here is a brief summary of the projects that received awards.

Adding Linux Binary Builders to R-Hub – Award: $15,000. Primary Contact: Dirk Eddelbuettel (edd at debian.org)

This project proposes to take the creation of binary Linux packages to the next level by providing R-Hub with the ability to deliver directly installable binary packages with properly-resolved dependencies. This will allow large-scale automated use of CRAN packages anywhere: laptops, desktops, servers, cluster farms and cloud-based deployments.

The project would like to hear from anyone who could possibly host a dedicated server in a rack for long term use.

An Infrastructure for Building R Packages on MacOS with Homebrew – Award: $12,000. Primary Contact: Jeroen Ooms (jeroenooms at gmail.com)

When installing CRAN packages, Windows and MacOS users often rely on binary packages that contain precompiled source code and any required external C/C++ libraries. By eliminating the need to set up a full compiler environment or manage external libraries, this tremendously improves the usability of R on these platforms. Our project will improve the system by adapting the popular Homebrew system to facilitate static linking of external libraries.

Conference Management System for R Consortium Sponsored Conferences – Award: $19,000. Primary Contact: Heather Turner (ht at heatherturner.net)

This project will evaluate a number of open source conference management systems to assess their suitability for use with useR! and satRdays. Test versions of these systems will be set up to test their functionality and ease of use for all roles (systems administrator, local organizer, program chair, reviewer, conference participant). A system will be selected and a production system set up, with a view to be ready for useR! 2018 and future satRdays events.

Continued Development of the R API for Distributed Computing – Award: $15,000. Primary Contact: Michael Lawrence (michafla at gene.com)

The ISC’s Distributed Computing Working Group explores ways of enabling distributed computing in R. One of its outputs, the CRAN package ddR, defines an idiomatic API that abstracts different distributed computing engines, such as DistributedR and potentially Spark and TensorFlow. The goal of the project is to enable R users to interact with familiar data structures and write code that is portable across distributed systems.

The working group will use this R Consortium grant to fund an internship to help improve ddR and implement support for one or more additional backends. Please contact Michael Lawrence to apply or request additional information.

Establishing DBI – Award: $26,500. Primary Contact: Kirill Müller (krlmlr at mailbox.org)

Getting data in and out of R is an important part of a statistician’s or data scientist’s work. If the data reside in a database, this is best done with a backend to DBI, R’s native DataBase Interface. The ongoing “Improving DBI” project supports the DBI specification, both in prose and as an automated test. It also supports the adaptation of the RSQLite package to these specs. This follow-up project aims to implement modern, fully spec-compliant DBI backends for two major open-source RDBMS, MySQL/MariaDB and PostgreSQL.

Forwards Workshops for Women and Girls – Award: $25,000. Primary Contact: Dianne Cook (rowforwards at gmail.com)

The proportion of female package authors and maintainers has remained persistently low, at best at 15%, despite 20 years of the R project’s existence. This project will conduct a grassroots effort to increase the participation of women in the R community. One-day package development workshops for women engaged in research will be held in Melbourne, Australia and Auckland, New Zealand in 2017, and at locations yet to be determined in the USA and Europe in 2018. Additionally, one-day workshops for teenage girls focused on building Shiny apps will be developed to encourage an interest in programming. These will be rolled out in the same locations as the women’s workshops. All materials developed will be made available under a Creative Commons share-alike license on the Forwards website (http://forwards.github.io).

Joint Profiling of Native and R Code – Award: $11,000. Primary Contact: Kirill Müller (krlmlr at mailbox.org)

R has excellent facilities for profiling R code: the main entry point is the Rprof() function that starts an execution mode where the R call stack is sampled periodically, optionally at source line level, and written to a file. Profiling results can be analyzed with summaryRprof(), or visualized using the profvis, aprof, or GUIProfiler packages. However, the execution time of native code is only available in bulk, without detailed source information.

This project aims at bridging this gap with a drop-in replacement to Rprof() that records call stacks and memory usage information at both R and native levels, and later commingles them to present a unified view to the user.

R-hub #2 – Award: $89,500. Primary Contact: Gábor Csárdi (csardi.gabor at gmail.com)

R-hub is the first top level project of the R Consortium. The first stage of the project created a multi-platform, R package build server. This proposal includes the maintenance of the current R-hub infrastructure and a number of improvements and extensions including:

  1. R-hub as the first step of package submissions to CRAN
  2. R package reverse dependency checks, on R-hub and locally
  3. General R code execution, on all R-hub platforms
  4. Check and code quality badges
  5. Database of CRAN code
  6. The CRAN code browser

School of Data Material Development – Award: $11,200. Primary Contact: Heidi Seibold (heidi at schoolofdata.ch)

School of Data is a network of data literacy practitioners, both organizations and individuals, implementing training and other data literacy activities in their respective countries and regions. Members of School of Data work to empower civil society organizations (CSOs), journalists, civil servants and citizens with the skills they need to use data effectively in their efforts to create better, more equitable and more sustainable societies.

Our R Consortium project will develop learning materials about R for journalists, with a focus on making them accessible and relevant to journalists from various countries. As a consequence, our content will use country-relevant examples and will be translated into several languages (English, French, Spanish, German).

Stars: Scalable, Spatiotemporal Tidy Arrays for R – Award: $10,000. Primary Contact: Edzer Pebesma (edzer.pebesma at uni-muenster.de)

Spatiotemporal and raster data often come as dense, two-dimensional arrays while remote sensing and climate model data are often presented as higher dimensional arrays. Data sets of this kind often do not fit in main memory. This project will make it easier to handle such data with R by using dplyr-style, pipe-based workflows, and also consider the case where the data reside remotely, in a cloud environment. Questions and offers of support are welcome through issues at: https://github.com/edzer/stars.

 

ISC Project Status Webinar

By Blog, Events

Join us for a webinar on Jan 31, 2017 at 9:30 AM PST.

Hear about R Consortium activities by watching the first ISC Project Status Webinar held on Tuesday, January 31st at 9:30AM PST (5:30PM GMT), 2017. Join us for 5 minute lightning talks on each active R Consortium project including:

  • R Hub – Gábor Csárdi
  • SatRdays – Gergely Daroczi
  • A Unified Framework for Distributed Computing in R – Michael Lawrence
  • Simple Features for R – Edzer Pebesma
  • Interactive data manipulation in mapview – Tim Appelhans
  • R Documentation Task Force – Andrew Redd
  • R-Ladies – Gabriela de Queiroz
  • Software Carpentry R Instructor Training – Laurent Gatto
  • Improving DBI – Kirill Müller
  • RL10N: R Localization Proposal – Richard Cotton
  • RC RUGS (R Consortium) – Joseph Rickert
  • Future-proof native APIs for R – Lukas Stadler
  • Code Coverage Tooling for R – Jim Hester
  • RIOT Workshops – Lukas Stadler

The webinar will run approximately 90 minutes.

Simple Features Now on CRAN

By Blog, R Consortium Project, R Language

by Edzer Pebesma

Support for handling and analyzing spatial data in R goes back a long way. In 2003, a group of package developers sat together and decided to adopt a shared understanding of how spatial data should be organized in R. This led to the development of the package sp and its helper packages rgdal and rgeos. sp offers simple classes for points, lines, polygons and grids, which may be associated with further properties (attributes), and takes care of coordinate reference systems. The sp package has helped many users and has made it attractive for others to develop new packages that share sp’s conventions for organizing spatial data by reusing its classes. Today, approximately 350 packages directly depend on sp and many more are indirectly dependent.

Since 2003, the rest of the world has broadly settled on a standard for so-called “features”, which can be thought of as “things” in the real world that have a geometry along with other properties. A feature geometry is called simple when it consists of points connected by straight line pieces, and does not intersect itself. Simple feature access is a standard for accessing and exchanging spatial data (points, lines, polygons), as well as for operations defined on them, that has been adopted widely over the past ten years, not only by spatial databases such as PostGIS, but also by more recent standards such as GeoJSON. The sp package and supporting packages such as rgdal and rgeos predate this standard, which complicates exchange and handling of simple feature data.

The “Simple Features for R” project, one of the projects supported by the R Consortium in its first funding round, addresses these problems by implementing simple features as native R data. The resulting package, sf, provides functionality similar to that of sp, rgdal (for vector data), and rgeos combined, but for simple features. Instead of the S4 classes used by the sp family, it extends R’s data.frame directly, adding a list-column for geometries. This makes it easier to manipulate them with other tools that assume all data objects are data.frames, such as dplyr and the tidyverse. Package sf links to the GDAL, PROJ.4 and GEOS libraries, three major geospatial “swiss army knives” for, respectively, input/output, cartographic (re)projections, and geometric operations (e.g. unions, buffers, intersections and topological relations). sf can be seen as a successor to sp, rgdal (for vector data), and rgeos.

The simple feature standard describes two encodings: well-known text, a human readable form that looks like “POINT(10 12)” or “LINESTRING(4 0,3 2,5 1)”, and well-known binary, a simple binary serialization. The sf package can read and write both. Exchange routines for binary encodings were written in Rcpp, to allow for very fast exchange of data with the linked GDAL and GEOS libraries, but also with other data formats or spatial databases.
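
To make the two encodings concrete, the sketch below builds both for a single point. It is Python using only the standard `struct` module, not the sf implementation; the function names are hypothetical, and only the 2D point case with little-endian byte order is handled.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 means little-endian
WKB_POINT = 1          # geometry type code for Point


def point_to_wkt(x, y):
    """Human-readable form, e.g. POINT(10 12)."""
    return f"POINT({x:g} {y:g})"


def point_to_wkb(x, y):
    """Binary form: 1-byte order flag, 4-byte type code, two 8-byte doubles."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data):
    """Decode the binary form produced by point_to_wkb()."""
    byte_order, geom_type, x, y = struct.unpack("<BIdd", data)
    if byte_order != WKB_LITTLE_ENDIAN or geom_type != WKB_POINT:
        raise ValueError("only little-endian 2D points are handled here")
    return x, y


wkt = point_to_wkt(10, 12)
wkb = point_to_wkb(10, 12)
roundtrip = wkb_to_point(wkb)
```

The text form of this point is `POINT(10 12)`, while the binary form is a fixed 21 bytes. Binary exchange therefore needs no number parsing or formatting, which is one reason it is the faster of the two.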

The sf project on GitHub has received considerable attention. Over 100 issues have been raised, many of which received dozens of valuable contributions, and several projects currently under development (mapview, tmap, stplanr) are experimenting with the new data classes. Several authors have provided useful pull requests, and efforts have begun to implement spatial analysis in pipe-based workflows, support dplyr-style verbs and integrate with ggplot.

Besides using data.frames and offering considerably simpler data structures for spatial geometries, advantages of sf over the sp family include: simpler handling of coordinate reference systems (using either EPSG code or PROJ.4 string), the ability to return distance or area values with proper units (meter, feet or US feet), and support for geosphere functions to compute distances or areas for longitude/latitude data, using datum-dependent values for the Earth’s radius and flattening.

The sf package is now available from CRAN, both in source form and in binary form for Windows and MacOSX platforms. The authors are grateful to the CRAN team for their strong support in getting the sf package compiled on all platforms. Support from the R Consortium has helped greatly to give this project priority, draw attention in a wider community, and facilitate travel and communication events.

For additional technical information about sf, see my website.

 

Call for Proposals

By Blog, R Consortium Project

by Hadley Wickham

The Infrastructure Steering Committee (ISC) is pleased to announce that the committee is now ready to accept proposals for the first round of funding in 2017. The ISC is broadly interested in projects that will make a difference to the R community. Don’t be afraid to think big! We have the budget to fund ambitious projects and we want to fund infrastructure that can help large segments of the R community.

Infrastructure includes:

  • Ambitious technical projects (like R-hub), which require dedicated
    time to supply infrastructure that is currently missing in the R
    ecosystem.
  • Community projects (like R-ladies and SatRdays), which help catalyse
    and support the growth of the R community around the world.
  • Smaller projects to develop packages (like DBI and sf), which
    provide key infrastructure used by thousands of R programmers.

The deadline for submitting a proposal is midnight PST, Friday February 10, 2017. For the mechanics of submitting a proposal and some guidance on how to write a good proposal see the Call for Proposals Webpage. Also, if you have ideas for projects, but you’re not sure you have the skills to do them yourself, file an issue with your idea on the wish list that the R Consortium maintains on GitHub.

 

Halfway through “Improving DBI”

By Blog, R Consortium Project, R Language

by Kirill Müller

In early 2016 the R Consortium partially accepted my “Improving DBI” proposal. An important part is the design and implementation of a testable DBI specification. Initially I also proposed to make three DBI backends to open-source database engines (RSQLite, RMySQL, and RPostgres) compatible with the new DBI specification, but the funding allows work on only one DBI backend. I chose RSQLite for a number of reasons:

  • It is a very important package, judging by the number of reverse CRAN and Bioconductor dependencies
  • It’s easy to work with, because everything (including the database engine) is bundled with the package
  • It seemed to be the most advanced package, closest to the (yet to be completed) DBI specification
  • An informal Twitter poll supports this decision by a tiny margin

The project has reached an important milestone, with the release of RSQLite 1.1. This post reports the progress achieved so far, and outlines the next steps.

RSQLite

While the RSQLite API has changed very little (hence the minor version update), it includes a complete rewrite of the original 1.0.0 sources in C++. This has considerably simplified the code, which makes future maintenance easier, and allows us to take advantage of the more sophisticated memory management tools available in Rcpp, which help protect against memory leaks and crashes.

RSQLite 1.1 brings a number of improvements:

  • New strategy for prepared queries: Create a prepared query with dbSendQuery() or dbSendStatement() and bind values with dbBind(). This allows you to efficiently re-execute the same query/statement with different parameter values iteratively (by calling dbBind() several times) or in a batch (by calling dbBind() once with a data-frame-like object).
  • Support for inline parametrised queries via the param argument to dbSendQuery(), dbGetQuery(), dbSendStatement() and dbExecute(), to protect from SQL injection.
  • The existing methods dbSendPreparedQuery() and dbGetPreparedQuery() have been soft-deprecated, because the new API is more versatile, more consistent and stricter about parameter validation.
  • Using UTF8 for queries and parameters: this means that non-English data should just work without any additional intervention.
  • Improved mapping between SQLite’s cell-types and R’s column-types.
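
The binding pattern described in the list above is not specific to R. As an analogue only, the sketch below uses Python's standard sqlite3 module rather than dbSendQuery() and dbBind(); the `cars` table, its contents, and the helper `names_with_cyl` are made up for the example. It shows one statement executed for a whole batch of parameter sets, and a query whose parameter is bound rather than pasted into the SQL text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cars (name TEXT, cyl INTEGER)")

# Batch: one statement, executed once for each set of parameter values.
rows = [("a", 4), ("b", 6), ("c", 4)]
con.executemany("INSERT INTO cars (name, cyl) VALUES (?, ?)", rows)
con.commit()


def names_with_cyl(con, cyl):
    # The value travels separately from the SQL text, so it is never
    # parsed as SQL, whatever it contains.
    query = "SELECT name FROM cars WHERE cyl = ? ORDER BY name"
    return [name for (name,) in con.execute(query, (cyl,))]


four = names_with_cyl(con, 4)
# Pasted into the SQL text, this value would match every row; bound as a
# parameter it is just a string that equals no cylinder count.
hostile = names_with_cyl(con, "4 OR 1=1")
```

Here `four` contains the two four-cylinder rows, while `hostile` is empty because the injected condition is compared as data instead of being executed.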

See the release notes for further changes.

The rewrite was implemented by Hadley Wickham before the “Improving DBI” project started, and has been available for a long time on GitHub. Nevertheless, the CRAN release has proven much more challenging than anticipated, because so many CRAN and Bioconductor packages import it. (Maintainers of reverse dependencies might remember multiple e-mails where I was threatening to release RSQLite “for real”.) My aim was to break as little existing code as possible. After numerous rounds of revdep-checking and improving RSQLite, I’m proud to report that the vast majority of reverse dependencies pass their checks just as well (and as quickly!) as they did with v1.0.0. Most tests from v1.0.0 are still present in the current codebase. This means that non-packaged code also has a good chance to work unchanged. I’m happy to work with package maintainers or users whose code breaks after the update.

DBI

I have also released several DBI updates to CRAN, mostly to introduce new generics such as dbBind() (for parametrized/prepared queries) or dbSendStatement() and dbExecute() (for statements which don’t return data). The definition of a formal DBI specification is part of the project; a formatted version is updated continuously.

DBItest

In addition to the textual specification in the DBI package, the DBItest package provides backend-independent tests for DBI packages. Package authors can easily use it to ensure that they follow the DBI specification. This is important because it allows you to take code that works with one DBI backend and easily switch to a different backend (provided that they both support the same SQL dialect). Literate programming techniques using advanced features of roxygen2 help keep both code and textual specifications in close proximity, so that amendments to the text can be easily traced back to changes in the test code, and vice versa.
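For a backend author, hooking into DBItest might look roughly like the sketch below. It assumes DBItest, testthat and RSQLite are installed; run_dbi_spec_tests is a hypothetical wrapper name, used here only so that the (fairly long-running) suite is not started by accident:

```r
# Hypothetical wrapper, not part of DBItest: defining it runs nothing.
run_dbi_spec_tests <- function(skip = NULL) {
  # Tell DBItest which driver to test and how to connect to it.
  DBItest::make_context(RSQLite::SQLite(), list(dbname = tempfile()))
  # Run the tests from the specification, skipping any names given in `skip`.
  DBItest::test_all(skip = skip)
}
```

In a package, the two DBItest calls would typically live in a file under tests/testthat, with the driver and connection arguments swapped for the backend under test.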

Next steps

The rest of the project will focus on finalizing the specification in both code and text (mostly discussed on GitHub in the issue trackers for the DBI and DBItest projects). At least one new helper package (to handle 64-bit integer types) will be created, and DBI, DBItest, and RSQLite will see yet another release: The first two will finalize the DBI specification, and RSQLite will fully conform to it.

The development happens entirely on GitHub in repositories of the rstats-db organization. Feel free to try out development versions of the packages found there, and to report any problems or ideas at the issue trackers.

 

RL10N hits its first milestone

By Blog

by Richard Cotton and Thomas Leeper


R is gradually taking over the world (of data analysis).  However, proficiency in English remains a prerequisite for effectively working with R.  While R has a system for translating messages, warnings, and error messages into other languages, very few packages take advantage of this functionality.

Part of the problem is that it currently takes a lot of effort to create translations.  There are a few issues that the RL10N project aims to address. Firstly, the functionality contained in the tools package isn’t particularly easy to work with. Secondly, finding translators can be difficult. RL10N aims to solve both of these problems.

The project has reached its first milestone, having released the poio package to CRAN. Translations of messages are stored in .pot master translation and .po language-specific translation files that are understood by the GNU gettext utility. poio provides functionality to read and write this file format.

Setting up Translations

The workflow to create translation infrastructure for a package is now reasonably straightforward.

First, a .pot master translation file is created using the xgettext2pot function from the tools package. The .pot file contains a few lines of metadata, consisting of name-value pairs.

"Project-Id-Version: R 3.3.1\n"
"Report-Msgid-Bugs-To: bugs.r-project.org\n"
"POT-Creation-Date: 2016-11-06 17:19\n"
...

After this, it contains message ID lines, along with blank message translation lines.

msgid "This is a message!"
msgstr ""
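The first step can be tried out without a real package. The sketch below assumes a throwaway package skeleton called demo in a temporary directory (the directory, file and function names are made up), generates the .pot file with xgettext2pot, and then pulls out the message IDs using base R only:

```r
# Assumption: a throwaway package skeleton, for illustration only.
pkg_dir <- file.path(tempdir(), "demo")
dir.create(file.path(pkg_dir, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines(
  'greet <- function() message("This is a message!")',
  file.path(pkg_dir, "R", "greet.R")
)

# Extract the translatable messages into a .pot master translation file.
pot_file <- file.path(pkg_dir, "R-demo.pot")
tools::xgettext2pot(pkg_dir, pot_file)

# Keep the non-empty message IDs; the metadata header uses an empty msgid.
pot_lines <- readLines(pot_file)
msgid_lines <- grep('^msgid ".+"$', pot_lines, value = TRUE)
msgids <- sub('^msgid "(.*)"$', "\\1", msgid_lines)
```

This only peeks at the file with regular expressions; poio’s read_po is the proper way to parse it.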

The second step is to read this file into R, using poio’s read_po function.  (The same function reads both .po and .pot files, automatically detecting which is which.)

pot <- read_po(pot_file)

The file created by xgettext2pot has some incorrect metadata values.  These can be fixed by calling fix_metadata.

pot_fixed <- fix_metadata(pot)

Next, you need to choose some languages to translate your messages into.  You need to specify each language as a two- or three-letter ISO 639 code.  These include “fr” for French, “zh” for Chinese, and country-specific variations like “pt_BR” for Brazilian Portuguese.  The language_codes dataset shows all the available language and country codes.
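As a quick sanity check before generating files, the shape of a code can be tested with a regular expression. The helper below is hypothetical and not part of poio; it checks only the format (two or three lowercase letters, optionally followed by an underscore and a two-letter uppercase country code), not whether the code actually exists:

```r
# Hypothetical helper (not part of poio): validates the shape of a code only.
looks_like_lang_code <- function(x) {
  grepl("^[a-z]{2,3}(_[A-Z]{2})?$", x)
}

shape_ok <- looks_like_lang_code(c("fr", "zh", "pt_BR", "french", "PT_br"))
```

Here the first three entries pass and the last two do not. To confirm that a code is real, look it up in the language_codes dataset.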

For each language, you must generate a language-specific po object from the master translation, using generate_po_from_pot, then write it to a .po file using write_po.

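for(lang in c("de", "fr_BE"))
{
  # Start from the master translation with the corrected metadata
  po <- generate_po_from_pot(pot_fixed, lang)
  # Write the language-specific translation to a .po file
  write_po(po)
}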

That’s it! You are now ready to translate.

Next Steps

The msgtools package is currently under development, and has higher-level tools for managing and updating translations, and for integrating translations into packages.  The immediate next step is to integrate poio with msgtools and release the latter package to CRAN.

Beyond this, the RL10N project has a plan to tackle the second problem: finding translators.  This will involve integrating automated translation functionality from Google Translate and Microsoft Translator into msgtools, as well as providing assistance with getting human translators.

The start of satRdays

By Blog

by Gergely Daroczi, organizer

Almost 200 people from 19 countries registered for the first satRday conference, which was held last Saturday, September 3rd, in Budapest. The final count showed that nearly 170 R users spent 12 hours at the conference venue attending workshops, regular and lightning talks, social events and a data visualization challenge. If you missed the event, you can rewatch the live stream of the conference talks at any time. An abridged version of the video recordings will also be uploaded to the conference homepage along with the slides and related materials in the next couple of days.

There was intense interest in the conference from the beginning: registration opened at the end of June, just before the useR! conference, and 90% of the originally planned 150 tickets were gone within a month, when the early-bird period ended. To my great but pleasant surprise, it didn’t become a local Hungarian conference at all: on average, every third registration came from another country. The 50/50 ratio of academic to industry tickets was similarly stable from the beginning.

We sold around 130 tickets without sharing any details on the line-up of invited and contributed talks, although previously announcing our two keynote speakers (Gabor Csardi and Jeroen Ooms) kind of guaranteed a high quality for the conference. Fortunately, we received a good number of talk proposals and decided to have 25 speakers after all.

It took a while to finalize the conference program and to figure out how we would fund an inexpensive event for so many attendees (as the number of registered attendees continued to increase by one or two every day), but things sorted out by the end of August and we received a good amount of financial help (covering 75% of the overall conference expenses) from our sponsors. Thank you!


And the very early morning of September 3 arrived! I left home at 6am to arrive at the conference venue in time, and it was extremely exciting to see the first attendees arrive.

The registration took a bit longer than I hoped, but after around 10 minutes of delay, all 6 workshops were ready to start. I’m extremely proud of the great line-up of workshop speakers, who provided free training to all attendees on the validate package, H2O, data.table, ggplot2 and shiny.

The conference started with the above noted short delay, but we managed to get back on track in the later sessions — by forcing myself to act as an extremely strict conference chair pushing most of the questions to the coffee breaks. Thanks to all for your highly appreciated cooperation with this!

Gabor Csardi soon proved that it was a very good idea to have him as our morning keynote speaker — he kicked off the conference with an exciting talk on fun stories from the past years of R and also introduced some of his wonderful and extremely useful projects to us. Please keep up the good work!

The R Infrastructure session started right after Gabor’s keynote talk with four presentations on networking, using R and Python, R in MSSQL and other tools along with R for applications such as fighting fraud. Photos of these and other talks will soon be uploaded to the conference homepage. Until then, you might want to check the #satRdays Twitter hashtag, where I posted a number of pics.

The first session ended after noon, so we headed for a quick lunch.

And we soon started the next technical session on different R packages: Arun on data.table, Mark on the validate package and Romain on dplyr. All of them did a fantastic job, not only in working on the packages but with their talks as well. And yet, one of the most exciting moments of the conference happened between the talks, when one of our speakers decided to ask one of the attendees a very important and personal question. Congratulations to Cecile and Romain!

And we had our first lightning talk (exactly 15 slides, each shown for 20 seconds), where Bo did a wonderful job and presented a lot of valuable information in such a short period of time. The session ended with our second keynote talk, where Jeroen shared some of his past R projects, showed some really impressive curl examples, and gave an inspiring intro to his cool new magick package for easy and advanced image manipulation on top of ImageMagick.

The afternoon sessions, both regular and lightning talks, covered a wide range of machine learning tools and use-cases. In addition to the H2O machine learning tools, we learned about how R and ML are used at CERN, multivariate data analysis of time-series, political parties and Thomas Levine’s crazy tools for rendering data as music and virtual kebabs. (Variables were mapped to different spices.)  It was a good mix!

Oh, and I don’t want to forget the talks on choosing the right tools for different use cases, such as catching all the Pokemons and visualizing geochemical models; on how to get your boss and colleagues to love R; an inspiring proposal for the RUG Toolbox to enable networking among local R users; and the chance to learn how to build JS-heavy, complex Shiny dashboards, at Friss for example.

The conference ended with the Data Visualization Challenge, where 8 projects were shown in 3 minutes each and the audience voted for the best visualization while having a slice of pizza and some beers. It was great to see the very well prepared and creative dashboards and plots.

The formal event ended around 8:30 pm, more than 12 hours after the start of the morning workshops, with nearly 80 attendees walking 15 mins to a nearby pub for some additional informal conversations. For myself, this was the most rewarding moment of the event — to see that all the pretty hard work that Denes and I did during the past months (more on this in a follow-up post) paid off after all: people spent the whole satRday together in a fruitful environment, where new friendships and R package ideas were born.

Hope to see many similar events in the future!