
Building Data Highways: Kirill Müller’s Journey in Enhancing R’s Database

By Blog

Kirill Müller is the author of the {DBI} package, which defines a common interface between R and database management systems (DBMS). The connection to a DBMS is made through database-specific backend packages that implement this interface, such as RPostgres, RMariaDB, and RSQLite. More information is available here. Most users who want to access a database do not need to install DBI directly: it is installed automatically when one of the database backends is installed. If you are new to DBI, the introductory tutorial is an excellent starting point for familiarizing yourself with the essential concepts.

{DBI} supports about 30 DBMSs, including:

  • MariaDB and MySQL, via the R packages RMariaDB and RMySQL
  • PostgreSQL, via the R package RPostgres
  • SQLite, via the R package RSQLite
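To make this concrete, a minimal DBI session might look like the following. This is a sketch using the in-memory SQLite backend, so no database server is required:

```r
library(DBI)

# Connect via the RSQLite backend; ":memory:" keeps the database in RAM
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Copy a data frame into the database, then query it with SQL
dbWriteTable(con, "mtcars", mtcars)
dbGetQuery(con, "SELECT cyl, AVG(mpg) AS mean_mpg FROM mtcars GROUP BY cyl")

# Release the connection when done
dbDisconnect(con)
```

Swapping the first argument of dbConnect() for, say, RPostgres::Postgres() is all it takes to target a different DBMS; the rest of the code stays the same.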

Kirill Müller is passionate about building, applying, and teaching tools for working with data and has worked on the boundary between data and computer science for more than 25 years. He has been awarded five R consortium projects over the past 8 years to improve database connectivity and performance in R and another one to investigate profiling of R and native code. Kirill is a core contributor to several tidyverse packages, including dplyr and tibble, and the maintainer of the duckdb R package. He holds a Ph.D. in Civil Engineering from ETH Zurich and is a founder and partner at cynkra, a Zurich-based data science consultancy with a heavy focus on R. Kirill enjoys playing badminton, cycling, and hiking.

Your latest work with the R Consortium was focused on the maintenance and support of {DBI}, the {DBItest} test suite, and the three backends to open-source databases ({RSQLite}, {RMariaDB}, and {RPostgres}). You stated that “Keeping compatibility with the evolving ecosystem (OS, databases, R itself, other packages) is vital for the long-term success of the project.” What’s the current status?

DBI and the other projects are available for use. Please try them!

I always strive for a healthy, “green” build, prioritizing clean and efficient outcomes. However, given the complexity of the projects, with their many moving parts and the continuous influx of new developments, achieving perfection at all times can be challenging. My goal is to ensure that everything we build meets a standard of functionality, even if there are moments when the builds don’t meet every expectation.

Fortunately, the generous funding provided by the R Consortium empowers us to address and rectify any issues as they emerge. This financial support is crucial, as it allows for the immediate tackling of problems, ensuring that our projects remain on the cutting edge and continue to serve the community effectively. Despite the occasional imperfections, my commitment is to promptly and efficiently solve these problems, maintaining the high quality and reliability of our builds.

More information is available here.

Is performance an issue with big data sets? Does R have performance advantages or disadvantages compared to other languages?

R has unique strengths as a powerful interface language. R treats data frames as first-class data structures. Functions and expressions are first-class objects, enabling easy parsing, computing, and emitting of code, fostering robust programming practices. Moreover, R’s “pass by value” semantics (more accurately, “pass by reference with copy-on-write”) ensure that functions do not inadvertently alter your data. This eliminates concerns over state management and makes data manipulation both predictable and secure.

Despite performance considerations, R is adept at efficiently handling bulk data operations. For example, working with columnar data frames that contain anywhere from 100,000 to 3 million rows is smooth due to R’s vectorized approach, allowing for efficient column-wise addition or multiplication. However, the performance can decline if large data frames are processed cell by cell.
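As a rough sketch of the difference, both snippets below compute the same derived column, but the first operates on whole columns at once while the second touches each cell individually:

```r
n  <- 1e5
df <- data.frame(x = runif(n), y = runif(n))

# Vectorized: a single column-wise operation, fast
df$z <- df$x + df$y

# Cell by cell: the same result, but dramatically slower,
# since each assignment works on one element of the data frame
for (i in seq_len(n)) {
  df$z[i] <- df$x[i] + df$y[i]
}
```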

And here’s the true power of R: As an interface language, R enables the use of external, high-speed engines—be it DuckDB, Arrow, traditional databases, or data processing tools like data.table and collapse—for computation, while R itself is used to compose the commands for these engines. This integration showcases R’s versatility and efficiency by leveraging the strengths of these specialized engines for heavy lifting, thereby bypassing its performance limitations in certain areas.
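For instance (a sketch assuming the duckdb, dplyr, and dbplyr packages are installed), dplyr verbs can be translated to SQL and executed entirely inside DuckDB:

```r
library(dplyr)

con <- DBI::dbConnect(duckdb::duckdb())

# Expose an R data frame to DuckDB as a virtual table
duckdb::duckdb_register(con, "mtcars", mtcars)

q <- tbl(con, "mtcars") |>
  group_by(cyl) |>
  summarise(mean_mpg = mean(mpg, na.rm = TRUE))

show_query(q)  # inspect the SQL that R composes for the engine
collect(q)     # DuckDB does the heavy lifting; R receives a data frame

DBI::dbDisconnect(con, shutdown = TRUE)
```

R here only composes the commands; the aggregation itself never runs in the R interpreter.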

Therefore, the focus should not be just on performance in isolation but rather on what can be achieved through R’s integration with other systems. This flexibility allows R to extend its utility well beyond mere data processing, making it a potent tool not only for technical users but also accessible to those with less technical expertise. The intuitive syntax of R, especially with domain-specific languages like dplyr, makes it exceptionally user-friendly, resembling plain English and facilitating a smoother workflow for a wide range of tasks.

Who uses databases and R most? Are they already using R and need to connect to different types of DBMS? 

As an interface language, R is remarkably versatile. It is designed to facilitate connections with a broad spectrum of tools and systems. This versatility positions R as a central hub for orchestrating a wide range of tasks, enabling users to maintain their workflow within the platform without wrestling with complex interfaces. Command-line interfaces are acceptable, offering a decent level of control and flexibility. File-based interfaces, on the other hand, can be cumbersome and inefficient, making them far from ideal for dynamic data management tasks. 

The spectrum of interfaces available for database interaction varies. The most effective solution is an R package that includes bindings to a library. This setup provides a direct conduit to the necessary functionality, streamlining the interaction process. Examples are DBI backends for PostgreSQL, SQLite, MySQL, and ODBC, or the new ADBC (Arrow Database Connectivity) standard (more on that later). These backends facilitate direct, low-friction access to databases from within R.

Focusing on native solutions, I want to emphasize the potential of the dm package, which I see as offering substantial benefits beyond what the DBI backends might provide. The dm package closely integrates database concepts with R. It enables sophisticated operations, such as the management of data models with primary and foreign keys, execution of complex joins, and the transformation of data frames into a fully operational data warehouse within R. These capabilities extend and enhance the functionalities provided by dplyr, dbplyr, and DBI, offering a comprehensive toolkit for database management directly through R.
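A small sketch of the idea, using hypothetical tables and dm’s documented key functions:

```r
library(dm)

authors <- data.frame(author_id = 1:2, name = c("Ada", "Grace"))
books   <- data.frame(book_id = 1:3, author_id = c(1, 1, 2))

# Declare primary and foreign keys on plain data frames
library_dm <- dm(authors, books) |>
  dm_add_pk(authors, author_id) |>
  dm_add_fk(books, author_id, authors)

# Check that every foreign key value has a matching primary key
dm_examine_constraints(library_dm)
```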

RMySQL is being phased out in favor of the new RMariaDB package. Why?

When I first got involved with DBI, it was after being awarded my first contract, which focused on connecting R to SQLite, PostgreSQL, and MariaDB. It’s important to note that MariaDB and MySQL are closely related; MariaDB is a fork of MySQL. Despite their independent evolution, they remain largely interchangeable, allowing connections to either MariaDB or MySQL databases without much trouble. This similarity can sometimes cause confusion.

In terms of technical specifics, RMySQL uses C to create bindings to the underlying client library, while RMariaDB uses C++, which I find more user-friendly for these tasks. When I took charge of the project, these packages were already separate, and I didn’t challenge that decision. Starting anew offers the benefit of not needing to maintain backward compatibility with existing RMySQL users; maintaining such compatibility has posed significant challenges, especially with the RSQLite package. That package’s widespread use across several hundred other packages meant we had to conduct reverse dependency checks, running the tests of those packages against modifications in ours to ensure compatibility. This process, essentially an enhanced test suite, required considerable effort.

Reflecting on it now, I would have preferred to start a project like RSQLite from a clean slate as well. Starting fresh allows for quicker progress without the constraints of backward compatibility or the expectation of maintaining behaviors that may no longer be relevant or supported. However, you also want to avoid alienating your user base. So, transitioning to C++ and starting from scratch was a strategic choice, and one that the maintainer of RMySQL and I agreed upon.

I should mention the odbc package, which isn’t included in the scope of R Consortium projects but is essential to our work. We use the odbc package extensively to connect with a variety of commercial databases and specialized database systems, some of which might not offer a straightforward method for direct interaction. In our setup, the odbc package acts as a crucial database interface, bridging the gap between the database itself and DBI.

There’s been a significant new development in this space, known as ADBC, accompanied by an R package named adbi. This initiative, spearheaded by Voltron Data, represents a modern reimagining of ODBC, specifically designed to enhance analytical workflows. While traditional databases have been geared towards both reading and writing data, ADBC focuses on optimizing data reading and analysis, recognizing that data science and data analysis workflows predominantly require efficient data reading capabilities. This focus addresses a gap left by ODBC, which wasn’t originally designed with high-speed data analysis in mind.

These developments are exciting, and I’m keen to see what the future holds for us in this evolving landscape.

What’s the difference between DBI and dbplyr?

I could describe it as a relationship between DBI and dbplyr, where dbplyr acts as a user of DBI. DBI supplies the essential functionality that enables dbplyr to operate effectively. This setup allows dbplyr to concentrate on constructing SQL queries, while other packages handle the responsibility of connecting to the individual databases.
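The division of labor can be sketched as follows: at the DBI level you write the SQL yourself, while at the dbplyr level the SQL is generated from dplyr verbs and shipped through the very same DBI connection:

```r
library(DBI)
library(dplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# DBI level: hand-written SQL
dbGetQuery(con, "SELECT cyl, COUNT(*) AS n FROM mtcars GROUP BY cyl")

# dbplyr level: the same query expressed as dplyr verbs;
# dbplyr generates the SQL, DBI transports it to the database
tbl(con, "mtcars") |> count(cyl) |> collect()

dbDisconnect(con)
```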

What are the biggest issues with using R and databases moving forward?

The current DBI project faces challenges that are tough to solve within its existing scope. These challenges could significantly impact many dependent components, which is why the corresponding repository contains little code and serves mainly as a placeholder for ideas we think DBI is missing. However, these ideas have the potential to become significant enhancements.

One major technical challenge I’ve faced is with query cancellation. If a query runs too long, the only option is to terminate the process, which stops our entire session. This issue is closely related to the concept of asynchronous processing, where a query is sent off, and other tasks are done in parallel until the query results are ready. This would be especially useful in applications like Shiny, allowing it to handle multiple user sessions simultaneously within the same R process. Finding a solution to this problem is crucial due to the current lack of effective alternatives in our infrastructure.

While not every issue signifies a major problem, there are certainly areas that DBI does not address, some of which may be beyond its intended scope. Still, there are notable gaps that require attention.

As for our original plan, we’re taking a different direction thanks to the introduction of ADBC via the adbi package. ADBC offers a stronger foundation for achieving our goals. With ADBC, all data is funneled through the Arrow data format, which means we no longer need individual backends to translate data into R data frames separately, and other ecosystems can be integrated more easily at the same time. In addition, a substantial part of the known challenges for DBI, including query cancellation and asynchronous processing, is already solved by ADBC. Using ADBC as a bridge between databases and ecosystems reduces the complexity from a many-to-many (n × m) problem to a much more manageable (n + m) one: each database needs one driver and each ecosystem one binding. This reduces duplication of effort and makes it easy to support new databases or new ecosystems. More information here.

How has it been working with the R Consortium? Would you recommend applying for an ISC grant to other R developers?

This is an excellent opportunity for young professionals to secure funding for their ideas or explore areas that haven’t been fully addressed yet. R is a fantastic tool, but it constantly evolves with new technologies being introduced. I’m particularly impressed by how the consortium supports various projects, including R-Ladies and SatRdays, which promote inclusivity within the community. I was approached with the idea of applying for a project, something I might not have considered alone. This makes me curious whether there’s a list of challenges similar to what the Google Summer of Code offers, where potential mentors submit project ideas for students to work on under their guidance. I haven’t looked into this possibility for the consortium in detail yet, but the thought of it excites me. I thoroughly enjoy being part of this process and am eager to see what long-term collaborations might emerge from it.

About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure.

Decade of Data: Celebrating 10 Years of Innovation at the New York R Conference

By Announcement, Blog, Events

Come celebrate 10 YEARS of the New York R Conference! This year’s conference takes place May 16th & 17th with workshops May 15th. We are taking a trip down memory lane and looking back over the past nine years. Come listen to some of the all-time greats who will be gracing our stage once again, and we’re also adding some fresh and exciting new voices to the mix!

This year’s conference features an a-list lineup of speakers who will be sharing their expertise on a wide range of topics, including data visualization, machine learning, programming, AI and more.

Speakers include:

And more will be announced soon!

There are also interactive workshops available to sharpen your skills and learn new techniques.

Workshops include:

Use promo code RSTATS20 for 20% off conference & workshop tickets (in-person & virtual). To secure your spot and learn more about the speaker lineup and workshops, visit rstats.ai/nyr

Also, follow @rstatsai on Twitter to stay up to date with all conference details. 

Be a part of this extraordinary opportunity to acquire new skills, foster growth, and connect with the data science community.

The Impact of R on Academic Excellence in Manchester, UK

By Blog

The R Consortium recently spoke with the organizing team of the R User Group at the University of Manchester (R.U.M.). R.U.M. aims to bring together R users of all levels to share R best practices, expertise and knowledge. The group is open to all staff and postgraduate researchers at the University of Manchester, UK.

During the discussion, the team shared details about their recent events and their plans for this year. They also discussed the latest trends in the R programming language and how they are utilizing it in their work.

  • Martín Herrerías Azcué, Research Software Engineer, University of Manchester
  • Anthony Evans, Research Software Engineer, University of Manchester
  • Lana Bojanić, Researcher and PhD Candidate, University of Manchester
  • Rowan Green, PhD Student in Evolutionary Microbiology, University of Manchester

Please share about your background and involvement with the RUGS group.

Martin: My name is Martin, and I joined the University of Manchester a year ago. They assigned me to manage the R user group, which was previously under Camila’s leadership. Although I am officially in charge, this is a collaborative effort between all of us who are present in this meeting, along with some others who couldn’t join. I work in Research IT and mainly use R for projects assigned to me by other people.

Anthony: My name is Anthony, and I work in Research IT with Martin at the University of Manchester. I first came into contact with R as a student. Later, I became a helper at many of the university’s R training courses, which are based on the Carpentries material. Camila, Martin’s predecessor, was also an R trainer, and she formed the R Users Manchester group. I volunteered to help her with the group a year ago, and it has just turned one year old. Since then, I have continued to be a part of the group.

Lana: Hi there, my name is Lana. I am a PhD student and research assistant in the Division of Psychology and Mental Health at the University of Manchester. I have been using R for the past six years, ever since my Master’s degree. I have been part of the group since its inception and have been running introductory R sessions for beginners within my division for a couple of years now. When I learned a year ago that the group was being formed, I contacted Camila, which makes us founding members of the group.

Rowan: Hello, my name is Rowan Green. I am currently a PhD student in the Department of Earth and Environmental Sciences. For my research, I use R extensively for simulation modeling of bacteria, analyzing lab data, and creating visualizations. The best thing about using R is that it produces much prettier visualizations than the other options available to us as biologists. We have a lot of master’s and undergraduate students coming through the lab, and I often give them pre-written scripts they can tweak to create their own plots. It’s exciting to see them working hard to produce their plots.

Camila mentioned starting a group to share knowledge about R on a university-wide level. I found this a great opportunity to participate and learn from others’ presentations during the meetings. It has been an enriching experience so far.

Can you share what the R community is like in Manchester? 

Anthony: In industries such as banking and finance, R is frequently used to create graphs that showcase econometric data in an easy-to-understand manner. The graphical capabilities of the language make it a popular choice in these fields. Our university has access to the Financial Times, which is known for producing visually stunning graphs. Interestingly, the FT also uses an R package called ftplottools, a specialized package built for their own use. So, it’s safe to say that R has a significant presence in the banking and finance sectors.

Are your meetups virtual or in-person? What topics have you covered recently? What are your plans for the group in the future?

Martin: Our events are a mix of in-person and online meetings. There have been talks about developing packages, data visualization, automating reports, and working with tables. We usually cover topics we are confident about or know people from the university are working on. However, we are also trying to get external speakers to come and talk. It’s challenging, but we are doing our best to make it happen. We are currently accepting proposals from potential speakers.

Our book club has mostly or completely taken place online.

Lana: The book club was mostly online. During the summer book club, we read R for Data Science, covering one or two chapters each time. We used the book’s second edition, and people from all over the university joined the club.

We were discussing the possibility of changing the format of Tidy Tuesdays. We received feedback that people don’t have enough time to come up with something extra creative every month. Additionally, there has been a need for more practice. Therefore, we plan to redesign Tidy Tuesdays to be more practice-oriented than creativity-oriented. We will be implementing these changes this year.

Anthony: We’ve recently had several discussions on useful packages, particularly in R. Some packages that were developed and published were custom-made. We also had presentations on the cosinor and cosinor2 packages, which are used for fitting curves, and an R update package for validating clinical prediction models.

There are two other R groups in Manchester. Our aim for this year is to establish communication with them and collaborate in a coordinated manner. (Editor’s Note: We recently talked with the Manchester R User Group.) Currently, our group solely focuses on the internal R community at the University of Manchester.

Any techniques you recommend using for planning for or during the event?

Rowan: I’m not sure if everyone would agree with me, but I think we did well with the format of our meetings. We started with brief talks, all within an hour, followed by questions and discussions, which worked well.

However, the harder part has been promoting the meetings and informing people about them. Sometimes word of mouth has been more effective than emails and posters. When I encouraged my lab group, who all use R, I noticed they were interested in attending. But without scheduled reminders and someone to encourage them, it can be difficult to get people to come.

Lana: It’s important to identify everyone’s strengths or specialties within the organizing group, as they will probably be useful in the first few events. After that, you can expand your network within the community, which is easy to do since people are easily reachable. This will allow you to find interesting topic ideas and strengths to draw from.

What trends do you currently see in R language?

Martin: I’ve noticed a growing interest in Shiny lately, as I manage a pilot server for the university and have seen an increase in users over time. There have also been several inquiries about using R within our high-performance computer cluster, which may be something we can offer to the university. This interest is not surprising, given the current hype around machine learning.

A trend that applies to multiple platforms, not just R, is the move toward reproducible research and interoperability between programming languages. This means that R can be integrated with Python and other languages to create a documented, integrated pipeline. I’ve been experimenting with Snakemake, which works well with R, but it would be great to see more integration from the R side, perhaps through the Common Workflow Language or another similar tool.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

Rowan: Recently, I wrote a preprint of a paper in which we simulated the growth and mutation of bacteria using differential equations and R. By adjusting the rates of the reactions occurring within the cells, we explored the various ways the bacteria could grow; running that many simulations was only feasible on high-performance computing infrastructure.

After running simulations, we came up with some ideas to test in the lab. Our focus was on measuring mutation rates, and we used statistical analysis to estimate them through R. We have been striving to ensure reproducibility, and as a result, we have annotated all the data tables and R scripts with the paper.

It has been an interesting journey for me. I had to tidy up my messy scripts and think about how someone else would perceive them. I had to ensure they made sense. However, the project was fascinating as I generated hypotheses using R, tested them, and analyzed and visualized them with the same tool. R is a complete tool that can handle all aspects of the process, making it a brilliant choice.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

EARL Early Bird Tickets Are Now Available!

By Announcement, Blog, Events

Contributed by Abbie Brookes, Senior Data Analyst at Datacove

Datacove is pleased to announce the availability of tickets for the upcoming EARL (Enterprise Applications of the R Language) conference. 

The EARL conference is a cross-sector event that will be held at the Grand Hotel in Brighton. This venue promises to provide attendees with a blend of Victorian elegance and modern conference facilities over three days, from the 3rd to the 5th of September 2024. The conference schedule includes high-quality workshops on the first day (3rd September) and two days of presentations and talks (4th – 5th September). An evening networking event is planned for the 4th of September at the British Airways i360 venue, offering attendees the opportunity to connect with peers and speakers in a relaxed setting.

We are offering tickets at a reduced early bird rate. Additionally, we provide discounts for government employees, NHS staff, charity workers, academics, and those making bulk purchases. For more detailed information on ticket pricing and discounts, contact Abbie Brookes at abbie.brookes@datacove.co.uk.

The EARL conference draws attendees from across the globe and from a variety of sectors. Previous participants have included notable organizations such as The Dogs Trust, BBC, Microsoft, Swiss RE, Posit, Sainsburys, and Bupa.  

This year’s keynote speakers include:

Professor Andy Field, known for his contributions to statistics education
Christel Swift, a senior data scientist at the BBC
Hadley Wickham, a key figure in the R community and author of the Tidyverse

In addition to the main conference, a selection of pre-conference workshops will be available, offering in-depth training opportunities. For more information on the conference venue, schedule, and registration, please visit our website. We invite you to join us for what promises to be an informative and engaging event for the R and Python communities.

R/Medicine Coming June 10-14, 2024 – Call for Abstracts Open – Keynotes Announced

By Announcement, Blog, Events

The R/Medicine conference provides a forum for sharing R-based tools and approaches used to analyze and gain insights from health data. Conference workshops offer a way to learn and develop your R skills, midweek demos let you try out new R packages and tools, and our hackathon provides an opportunity to learn how to develop new R tools. The conference talks present new packages and successes in analyzing health, laboratory, and clinical data with R and Shiny; speakers of pre-recorded talks join a vigorous ongoing discussion in the chat.

Register now!

Keynote Addresses

Stephanie Hicks

Statistical Challenges in Single-Cell and Spatial Transcriptomics

Thursday, June 13

Biography

Stephanie Hicks, PhD, MA, Associate Professor of Biomedical Engineering and Biostatistics at Johns Hopkins University, is an applied statistician working at the intersection of genomics and biomedical data science.

Gundula Bosch

Reproducibility in Medical Research

Friday, June 14

Biography

Gundula Bosch, PhD, MEd ’16, MS, is a scientist and educator leading global education reform through training programs in critical, broad, and interdisciplinary scientific thinking. She is the director of the R3 Center for Innovation and Science Education at the Johns Hopkins School of Public Health.

Call for Abstracts

R/Medicine is seeking abstracts for:

  • Lightning talks (10 min, Thursday, June 13, or Friday, June 14). Talks can be pre-recorded so that you can be live in the chat to answer questions.
  • Regular talks (20 min, Thursday, June 13, or Friday, June 14). Talks can be pre-recorded so that you can be live in the chat to answer questions.
  • Demos (1-hour demo of an approach or a package, Wednesday, June 12). Done live, preferably interactive.
  • Workshops (2-3 hours per topic, Monday, June 10, or Tuesday, June 11, usually with a website and a repo; participants can choose to code along; usual 5-10 min breaks each hour).
  • Posters for the poster session on Wednesday, June 12. Posters can include live demos of an app or a package.

Confirmed Workshops (Monday, June 10, and Tues, June 11)

Note: Final dates and times TBD. More workshops being added. Check the R/Medicine website for updates.

  • Causal Inference with R – Lucy D’Agostino and Malcolm Barrett
  • Tidying your REDCap data with REDCap Tidier – Stephan Kadauke and Will Beasley
  • Next Generation Shiny apps with bslib – Garrick Aden-Buie

Register here!

The R/Medicine website is being updated as we receive the latest information. Please check in again soon!

Unlocking Financial Insights: Join Us at the R Finance Conference

By Announcement, Blog, Events

Are you ready to delve into the world of finance through the lens of R? Look no further than the R Finance Conference (May 18, 2024, University of Illinois Chicago), your gateway to cutting-edge insights, advanced methodologies, and unparalleled networking opportunities. Whether you are an enthusiast of data-driven finance or an R programming aficionado, this single-track, one-day event promises to be an enlightening experience. R Finance is the must-attend event in the realm of financial technology.

Registrations are now open! Register here. 

A Glimpse into History

Founded in 2009, the R Finance Conference quickly evolved into the premier event in the financial technology landscape. It originated with a loosely connected group of enthusiastic R users in Chicago’s financial sector who were seeking to improve financial analysis. From its humble beginnings to its current stature, it has remained committed to fostering knowledge exchange and driving advancements in R-based finance.

Why Choose a Single-Track Event?

One distinctive feature of the R Finance Conference is its single-track format. Unlike multi-track conferences, where attendees must choose between concurrent sessions, a single-track event offers a shared group experience. Single track offers:

Focused Learning:

Attendees can fully immerse themselves in each session without the distraction of conflicting schedules. This focused approach enhances learning and ensures that participants extract maximum value from every presentation.

Enhanced Networking:

The single-track format encourages interaction among attendees as everyone gathers in the same sessions. This facilitates meaningful discussions, idea exchange, and networking opportunities with like-minded professionals, fostering a sense of community and collaboration.

Comprehensive Coverage:

By following a single track, attendees gain exposure to a diverse range of topics and perspectives within the realm of R-based finance. From quantitative modeling and algorithmic trading to risk management and data visualization, each session contributes to a holistic understanding of the subject matter.

Key Highlights of R Finance Conference

  • Expert Speakers: Renowned experts and thought leaders in finance and data science share their insights, best practices, and real-world experiences. In 2022, speakers included Matthew Dixon, Associate Professor, Department of Applied Math and Affiliate Professor, Stuart School of Business, Illinois Tech; Veronika Rockova, Professor of Econometrics and Statistics, University of Chicago, Booth School of Business and James S. Kemper Foundation Faculty Scholar; and Thomas P. Harte, Head of Fixed Income & Liquidity Strats at Morgan Stanley.
  • Interactive Workshops: Hands-on workshops provide attendees with practical skills and techniques to implement R-based solutions in their professional endeavors.
  • Networking Opportunities: Engage with industry peers, establish valuable connections, and exchange ideas during networking breaks, social events, and interactive sessions.
  • Exhibition Showcase: Explore cutting-edge technologies, tools, and services offered by exhibitors and sponsors, offering valuable insights into the latest innovations in financial technology.

Join Us at R Finance 2024

Don’t miss out on the opportunity to elevate your finance skills and network with industry leaders at the R Finance Conference 2024. Reserve your spot today and embark on a transformative journey in R-based finance.

For more information and registration, visit R Finance Conference.

Register now!

Empowering R Enthusiasts: SatRDays London 2024 Unveiled

By Blog

SatRDays London 2024 is set to ignite the data science community with a vibrant lineup of speakers and a rich array of topics ranging from survival analysis to geospatial data. This inclusive event, designed for R enthusiasts at all levels, emphasizes networking and collaboration amidst the backdrop of King’s College London’s iconic Bush House. Keynote speakers like Andrie de Vries, Nicola Rennie, and Matt Thomas bring unparalleled expertise, offering attendees a unique opportunity to deepen their knowledge and connect with peers. As a hub of innovation and learning, SatRDays London promises to be a cornerstone event for anyone passionate about R and its applications in the real world.

Register Now!

How does this year’s satRDays in London compare to last year’s event? What’s new and different?

After a successful SatRdays London in 2023, we are keeping the format the same, but with a whole new lineup of speakers! This year we’re excited to welcome: 

  • Andrie de Vries – Posit
  • Hannah Frick – Posit
  • Charlie Gao – Hibiki AI Limited
  • Michael Hogers – NPL Markets Ltd
  • Matthew Lam & Matthew Law – Mott MacDonald
  • Myles Mitchell – Jumping Rivers
  • Nicola Rennie – Lancaster University
  • Matt Thomas – British Red Cross

Talk topics for the day include survival analysis, geospatial data, styling PDFs with Quarto, and using R to teach R, as well as a range of other exciting themes! The talks are aimed at a varied audience, from aspiring data scientists right through to experienced practitioners.

Take a look at the full list on the conference website for more information.

Who should attend? And what types of networking and collaboration opportunities should attendees expect?

Anyone and everyone with an interest in R! The SatRdays conferences are designed to be low cost, to allow as many people as possible to attend, and they're on a SatRday, so you don't have to worry about getting time off work if your job isn't necessarily R focussed.

Networking is the main focus of the event. We have multiple coffee breaks to give attendees the opportunity to interact with fellow R enthusiasts. If you're brand new to this kind of event and not sure where to start, don't worry! Find one of the attendees from Jumping Rivers, and we'll be happy to help you make introductions!

Can you share some insights into the keynote speakers, their areas of expertise, and how they will contribute to the overall experience at SatRDays?

At this year’s event, we have talks from three invited speakers – Andrie de Vries of Posit, Nicola Rennie from Lancaster University and Matt Thomas of the British Red Cross.

Andrie is Director of Product Strategy at Posit (formerly RStudio) where he works on the Posit commercial products. He started using R in 2009 for market research statistics, and later joined Revolution Analytics and then Microsoft, where he helped customers implement advanced analytics and machine learning workflows.                 

Nicola is a lecturer in health data science based at the Centre for Health Informatics, Computing, and Statistics at Lancaster University. She is particularly interested in creating interactive, reproducible teaching materials and communicating data through effective visualisation. Nicola also collaborates with the NHS on analytical and software engineering projects, maintains several R packages, and organises R-Ladies Lancaster.

Matt is Head of Strategic Insight & Foresight at the British Red Cross. His team conducts research and analysis to understand where, how and who might be vulnerable to various emergencies and crises within the UK.                  

Could you elaborate on the types of sessions and workshops available and how they cater to different interests and skill levels within the R community?

The day will consist of eight 25-ish minute talks, plus Q&A, from a variety of speakers across various sectors. 

The talks are on a wide range of topics. For example, last year we had speakers talking about everything from using R for mapping air quality, to EDI and sustainability in the R project, and why R is good for data journalism. If you want to take a look at what you can expect, we have a playlist of last year’s talk recordings available on our YouTube channel.

With the event being hosted at King’s College London, how does the venue enhance the experience for attendees, both in terms of facilities and location?

We’re very excited to be partnering with CUSP London again this year, who provide the amazing Bush House venue at King’s College London. The venue is a beautiful listed building, right in the heart of London, only a few minutes walk from Covent Garden. 

Being in the center of London means easy access to multiple public transport links, both for national and international attendees!

The venue facilities and supporting technology provide a great space for sharing insights and networking.

Don’t miss out, register today!

Aligning Beliefs and Profession: Using R in Protecting the Penobscot Nation’s Traditional Lifeways

By Blog
Angie Reed sampling Chlorophyll on the Penobscot River where a dam was removed 

In a recent interview by the R Consortium, Angie Reed, Water Resources Planner for the Penobscot Indian Nation, shared her experience learning and using R in river conservation and helping preserve a whole way of life. Educated in New Hampshire and Colorado, Angie began her career with the Houlton Band of Maliseet Indians, later joining the Penobscot Indian Nation. Her discovery of R transformed her approach to environmental statistics, leading to the development of an interactive R Shiny application for community engagement. 

pαnawάhpskewi (Penobscot people) derive their name from the pαnawάhpskewtəkʷ (Penobscot River), and their view of the Penobscot River as a relative guides all of the Water Resources Program’s efforts. This perspective is also reflected in the Penobscot Water Song, which thanks the water and expresses love and respect.  Angie has been honored to:

  • work for the Water Resources Program, 
  • contribute to the Tribal Exchange Network Group,
  • engage young students in environmental stewardship and R coding, blending traditional views with modern technology for effective environmental protection and community involvement, and
  • work with Posit to develop an animated video about the Penobscot Nation, shown at the opening of posit::conf 2023

Please tell us about your background and how you came to use R as part of your work on the Penobscot Indian Nation.

I grew up in New Hampshire and completed my Bachelor of Science at the University of New Hampshire, followed by a Master of Science at Colorado State University. After spending some time out west, I returned to the Northeast for work. I began by joining the Houlton Band of Maliseet Indians in Houlton, Maine, right after finishing my graduate studies in 1998. Then, in 2004, I started working with the Penobscot Indian Nation. Currently, I work for both tribes, full-time with Penobscot and part-time with Maliseet.

My first encounter with R was during an environmental statistics class taught by Dennis Helsel, a former USGS employee, through his business Practical Stats. He introduced us to an R package called R Commander. Initially, I only used it for statistics, but soon I realized there was much more to R. I began teaching myself how to use ggplot for graphing. I spent months searching and learning, often frustrated, but it paid off as I started creating more sophisticated graphs for our reports.

We often collaborate with staff from the Environmental Protection Agency (EPA) in Region One (New England, including Connecticut, Maine, Massachusetts, New Hampshire, Rhode Island, Vermont and 10 Tribal Nations). One of their staff, Valerie Bataille, introduced us to R Carpentries classes. She organized a free class for tribal staff in our region. Taking that class was enlightening; I realized there was so much more I could have learned earlier, making my journey easier. This experience was foundational for me, marking the transition from seeing R as an environmental statistics tool to recognizing its broader applications. It’s a bit cliché, but this journey typifies how many people discover and learn new skills in this field.

The Penobscot Nation views the Penobscot River as a relative or family. How does that make water management for the Penobscot River different from other water resource management?

If you watch The River is Our Relative, the video delves deeper into seeing the river as a relative, a perspective that is both beautiful and challenging. This view fundamentally shifts how I perceive my work, imbuing it with a deeper meaning that transcends typical Western scientific approaches to river conservation. It’s a constant reminder that my job aligns with everything I believe in, reinforcing that there’s a profound reason behind my feelings.

Working with the Penobscot Nation and other tribal nations to protect their waters and ways of life is an honor and has revealed the challenges of conveying the differences in perspective to others. Often, attempts to bridge the gap get lost in translation. Many see their work as just a job, but for the Penobscot people, it’s an integral part of their identity. It’s not merely about accomplishing tasks; it’s about their entire way of life. The river provides sustenance, acts as a transportation route, and is a living relative to whom they have a responsibility. 

How does using open source software allow better sharing of results with Penobscot Nation citizens?

My co-worker, Jan Paul, and I had the pleasure of attending and presenting at posit::conf 2023, and of working with Posit staff to create an animated video that describes what we do and how open source and Posit tools help us do it. It was heart-warming to watch the video shown to all attendees at the start of conf, and it was a great introduction to my shameless ask for help, made during my presentation and through a table where I offered a volunteer sign-up sheet/form. I was humbled by the number of generous offers and am already receiving assistance on a project I’ve been eager to accomplish: Jasmine Kindness of One World Analytics is helping me recreate a Tableau viz I made years ago as an interactive, map-based R Shiny tool.

I find that people connect more with maps, especially when it comes to visualizing data that is geographically referenced. For instance, if there’s an issue in the water, people can see exactly where it is on the map. This is particularly relevant as people in this area are very familiar with the Penobscot River watershed.  My aim is to create tools that are not only interactive but also intuitive, allowing users to zoom into familiar areas and understand what’s happening there. 

This experience has really highlighted the value of the open source community. It’s not just about the tools; it’s also about the people and the generosity within this community. The Posit conference was a great reminder of this, and the experience of working with someone so helpful and skilled has truly reinforced how amazing and generous the open source community is.

How has your use of R helped to achieve more stringent protections for the Penobscot River?

Before we started using open source tools, my team and I had been diligently working to centralize our data management system, which significantly improved our efficiency. A major shift occurred when we began using R and RStudio (the company is now Posit) to extract data from this system and create summaries. This has been particularly useful in a biennial process in which the State of Maine requests data and proposals for upgrading water quality classifications.

In Maine, water bodies are classified into four major categories: AA, A, B, and C. If new data suggests that a water body currently classified at a lower grade could qualify for a higher classification, we can submit a proposal for an upgrade. In the past we have facilitated upgrades for hundreds of miles of streams; however, it took much longer to compile the data. In 2018, for the first time, we used R and RStudio to prepare a proposal to the Maine Department of Environmental Protection (DEP) to upgrade the last segment of the Penobscot River from C to B. Using open source tools, we were able to quickly summarize and compile the data into a format that could be used for the proposal, a task that previously took significantly longer. DEP accepted our proposal because our data clearly supported the upgrade. In 2019, the proposal passed, and we anticipate this process continuing to become easier in the future.

You are part of a larger network of tribal environmental professionals, working together to learn R and share data and insights. Can you share details about how that works?

Jan Paul, Water Quality Lab Coordinator at Penobscot Nation, sampling in field

I’m involved in the Tribal Exchange Network Group (TXG), which is a national group of tribal environmental professionals like myself and is funded by a cooperative agreement with the Office of Information Management (OIM) at the Environmental Protection Agency (EPA). We work in various fields, such as air, water, and fisheries, focusing on environmental protection. Our goal is to ensure that tribes are well-represented in EPA’s Exchange Network, and we also assist tribes individually with managing their data.

Since attending a Carpentries class, I’ve been helping TXG organize and host many of them. We’ve held one every year since 2019, and we’re now moving towards more advanced topics. In addition to trainings, TXG provides a variety of activities and support, including small group discussions, 1-on-1 assistance, and conferences. Although COVID-19 disrupted our schedule, we are planning our next conference for this year.

Our smaller, more conversational monthly data drop-in sessions always include the opportunity for a breakout room to work on R. People can come with their R-related questions, or the host might prepare a demo.

Our 1-on-1 tribal assistance hours allow Tribes to sign up for help with issues related to their specific data. I work with individuals on R code for various tasks, such as managing temperature sensor data or generating annual assessment reports in R Markdown format. These one-on-one sessions have significantly improved skill building and confidence among participants, and they are particularly effective because they use real data and often result in a tangible product, like a table or graph, which is exciting for participants. We’ve also seen great benefits in terms of staff turnover: when staff members leave, the program still has well-documented code, making it easier for their successors to pick up where they left off.

Additionally, I’ve been involved in forming a Pacific Northwest Tribal coding group, which still doesn’t have an official name as it is only a few months old. It began from discussions with staff from the Northwest Indian Fisheries Commission (NWIFC) and staff from member Tribes. And I am thrilled to say we’ve already attracted many new members from staff of the Columbia River Inter-Tribal Fish Commission (CRITFC). This group is a direct offshoot of the TXG efforts with Marissa Pauling of NWIFC facilitating, and we’re excited about the learning opportunities it presents.

Our work, including the tribal assistance hours, is funded through a grant that reimburses the Penobscot Nation for the time I spend on these activities. As we move forward with the coding group, planning to invite speakers and organize events, it’s clear there’s much to share with this audience, possibly in future blogs like this one. This work is all part of our broader effort to support tribes in their environmental data management endeavors.  If anyone would like to offer their time toward these kinds of assistance, they can use the TXG google form to sign up.

How do you engage with young people?

I am deeply committed to engaging the younger generation, especially the students at Penobscot Nation’s Indian Island school (pre-K through 8th grade). In our Water Resources Program at Penobscot Nation, we actively involve these students in our river conservation efforts. We see our role as not just their employees but as protectors of the river for their future.

Sampling for Bacteria 

Our approach includes hands-on activities like taking students to the river for bacteria monitoring. They participate in collecting samples and processing them in our lab, gaining practical experience in environmental monitoring. This hands-on learning is now being enhanced with the development of the R Shiny app I’m working on with Jasmine, to make data interpretation more interactive and engaging for the students.

Recognizing their budding interest in technology, I’m also exploring the possibility of starting a mini R coding group at the school. With students already exposed to basic coding through MIT’s Scratch, advancing to R seems a promising and exciting step.

Beyond the Penobscot Nation school, we’re extending our reach to other local schools, such as Orono Middle School. We recently involved eighth graders, including two Penobscot Nation citizens, in our bacteria monitoring project. This collaboration has motivated me to consider establishing an R coding group in local high schools, allowing our students continuous access to these learning opportunities.

Processing bacteria sample

My vision is to create a learning environment in local high schools where students can delve deeper into data analysis and coding. This initiative aims to extend our impact, ensuring students have continuous access to educational opportunities that merge environmental knowledge with tech skills and an appreciation of Penobscot people, culture and the work being done in our program.

Over the years, witnessing the growth of students who participated in our programs has been immensely gratifying. A particularly inspiring example is a young Penobscot woman, Shantel Neptune, who did an internship with us through the Wabanaki Youth in Science (WaYS) Program a few years back, then a data internship through TXG, and is now a full-time employee in the Water Resources Program. Shantel is now helping to teach another young Penobscot woman, Maddie Huerth, about data collection, management, analysis, and visualization while she is our temporary employee. We’re planning sessions this winter to further enhance their R coding skills, a critical aspect of their professional development.

It’s essential to me that these women, along with others, receive comprehensive training. Our program’s success hinges on it being led by individuals who are not only skilled but who also embody Penobscot Nation’s values and traditions. Empowering young Penobscot citizens to lead these initiatives is not just a goal but a necessity. Their growth and development are vital to the continuity and integrity of our work, and I am committed to nurturing their skills and confidence. This endeavor is more than just education; it’s about preserving identity  and ensuring our environmental efforts resonate with the Penobscot spirit and needs.

Elevate Your R Community with the 2024 RUGS Grant Program

By Announcement, Blog

The R Consortium is rolling out its 2024 R User Groups (RUGS) Grant Program, and it’s an opportunity you don’t want to miss. The program, which aims to foster vibrant R communities worldwide, is in full swing, and we are eagerly awaiting your application!

Apply here!

Why Apply and… For What?

User Group Grants: Boost engagement and initiate user-focused activities.

Conference Grants: Support for R-related events, either hosting or attending.

Special Projects Grants: Kickstart innovative projects with the potential to impact the R community.

With 74 active groups and a thriving community of over 67,000 members, the RUGS network is a hub of innovation and knowledge sharing. Your participation could be the next milestone in this growth journey.


Key Information

Application Deadline: September 30th, 2024. Don’t delay!

Eligibility: Open to initiatives aimed at community building, not software development (for that, see ISC Grant Program).

Be part of shaping the future of R. Visit here for more details and to apply. Your contribution matters to the global R narrative. Apply now, and let’s grow together!



Offa R Users Group: Empowering Data-Driven Education in Nigeria

By Blog

The R Consortium had a conversation with Anietie Edem Udokang, who is the founder and organizer of the Offa R Users Group (ORUG). He discussed the emerging local R community and the use of R for his research in time series analysis. 

The Offa R Users Group has a Meetup coming up on March 26th, 2024, titled “Test for the Assumptions of Linear Regression Using R.” The group is also seeking individuals to serve as guest speakers for their online events.

Please share about your background and involvement with the RUGS group.

My name is Anietie Edem Udokang, and I am a chief lecturer at the Federal Polytechnic Offa. I hold a Master of Science degree in Statistics. It was during my postgraduate studies that my supervisor introduced me to R, which was around 2012. Since then, I’ve been using R and have discovered that it’s far superior to some of the other software programs I had previously used.

I have found that interacting with others and using specific features, such as the ability to download packages, has been incredibly beneficial to my analysis work. These specialized packages have helped me greatly, and I believe it is important to attach the relevant packages when organizing data. This experience has made me passionate about using R for data analysis.

Ever since I began using R, I have had the privilege of engaging with a diverse group of individuals, including data scientists and software users. These interactions have led me to the realization that to continue growing and learning, it would be beneficial to establish a user group within our community. Initially, we called it the “Fedpofa R Users Group,” but later changed the name to “Offa R Users Group.” We have been organizing meetings, providing training, and engaging in other activities to keep the community vibrant.

Can you share what the R community is like in Offa?

R is not limited to academic use; it is also used in industry. Polytechnics act as a bridge between industry and academic institutions, so when students have a good grasp of how to use R, industry benefits directly or indirectly. Consultants often visit our ORUG and ask for analyses, which we provide using R. Additionally, students use R for their projects.

I use R for many of my publications. R has gained a lot of popularity, not only within our institution but also among sister institutions in the area. Some departments have even made R the only software that students are required to use for analysis. 

What industry are you currently in? How do you use R in your work?

I am in the education sector, and I use R for my work in time series analysis, which is my area of specialization. I rely on TSA, tseries, and other related time series packages to carry out my work. For example, I used R for Modeling the Residuals of Financial Time Series with Missing Values for Risk Measures, which was my MSc project. I have also used R in the Application of the Seasonal Autoregressive Moving Average Model to Analyze and Forecast the Food Price Index (free registration required). Additionally, I used R in a paper titled “Volatility of Exchange Rates in Nigeria: An Investigation of Risk on Investment.” Another innovative project was Modelling Circular Time Series with Applications. These are just a few examples of the papers and research where I’ve personally used R.
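As a minimal sketch of what this kind of seasonal time series work can look like in base R (the specific models and data from the papers above are not reproduced here; the built-in AirPassengers series simply stands in for a real price index):

```r
# Hypothetical example: fit a seasonal ARIMA (an "airline" model) to a
# log-transformed monthly series and produce a 12-month-ahead forecast.
# Uses only the stats package shipped with base R.
fit <- arima(log(AirPassengers),
             order    = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

pred <- predict(fit, n.ahead = 12)  # forecasts on the log scale
exp(pred$pred)                      # back-transform to the original scale
```

The same pattern applies with packages such as tseries (e.g., for stationarity tests before choosing the differencing orders) or forecast, which can select the orders automatically.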

You have a Meetup titled Test for the Assumptions of Linear Regression Using R. Can you share more about the topics covered? Why this topic? 

Some authors use regression models without checking whether the assumptions hold. Instead of carrying out tests to confirm them, they simply assume the assumptions are fulfilled and that the model is therefore valid. This topic aims to highlight the importance of carrying out such tests to ensure reliable and comprehensive results, since failure to check the assumptions may lead to inaccurate conclusions. The focus will be on commonly used tests for normality, linearity, autocorrelation, heteroscedasticity/homoscedasticity, and multicollinearity, with illustrative examples using R.
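A minimal sketch of what such assumption checks can look like in R, using the built-in mtcars data (not material from the Meetup itself; the lmtest and car packages are assumed to be installed):

```r
# Hypothetical illustration: fit a simple linear model, then run
# common diagnostic tests on it.
fit <- lm(mpg ~ wt + hp, data = mtcars)

shapiro.test(residuals(fit))  # normality of residuals (Shapiro-Wilk)
lmtest::dwtest(fit)           # autocorrelation (Durbin-Watson)
lmtest::bptest(fit)           # heteroscedasticity (Breusch-Pagan)
car::vif(fit)                 # multicollinearity (variance inflation factors)
plot(fit, which = 1)          # residuals vs fitted, to eyeball linearity
```

Each test returns a p-value (or, for VIF, a score per predictor) that indicates whether the corresponding assumption is plausibly violated.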

I appreciate the R Consortium for their valuable RUGS grant assistance in 2022. With this grant, I was able to start two other user groups: the Ilorin R Users Group and the Kwara Environmental Statistics R Group. I also want to express my gratitude to the R Consortium for sponsoring my Meetup subscription and covering other minor expenses in 2022. The subscription is still ongoing, and I hope that we can continue our partnership to promote the use of R in our community. 

I would also like to put out a request for speakers to present at our R User Group. We are currently seeking speakers for our upcoming events and would be delighted to welcome speakers from all over the world to share their R-related knowledge with us.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.