All Posts By

R Consortium

Salt Lake City R User Group’s Success Story: Blending In-Person and Online Events

By Blog

Last year, Julia Silge, co-organizer of the Salt Lake City R User Group, discussed with the R Consortium the group’s plans to meld in-person and online activities. This year, Andrew Redd, founder of the group, provided an update on the group’s recent and upcoming events. The group has successfully implemented its plan, with online presentations coupled with in-person networking events. Andrew also discussed his work with the Department of Veterans Affairs and the topics trending at the group’s events.

Andrew is a Biostatistician and works as an Assistant Professor at the University of Utah School of Medicine. He also works as a Research WOC at the Department of Veterans Affairs. Andrew is also an R expert for VINCI.

Please share about your background and involvement with the RUGS group.

I was initially introduced to the R programming language during my time in graduate school at Texas A&M University. I was a member of the statistics department there and was working on my PhD. Even though Texas A&M has a close affiliation with Stata, R was the language of choice for that year. I quickly took to the language, as I already had a background in programming in C. I had also done extensive work in Mathematica during my undergraduate studies and had used other languages, such as Visual Basic. I found R to be a pleasant language to use and am a firm supporter of open source software. As a result, I quickly became proficient in the language. I have since made a career out of working with R and have published a few early packages. One thing that brought me early recognition was my NppToR package, which is still available online. I abandoned this package when RStudio became sufficiently developed to cover the capabilities that I relied on with Notepad++ as my primary editor.

I arrived at the University of Utah in 2010 and founded the Utah R Users Group shortly thereafter. The group was originally called the University of Utah and Salt Lake City R Users Group as it was centered around the university and its users. We later expanded beyond the university and now have members from all over the world. Our meetings where we present material are now fully online. We supplement these meetings with social gatherings that are not centered on presentations. Instead, we meet at local venues such as bars or ice cream shops and simply talk about R. These gatherings allow us to meet new people, see what others are doing with R, and network in a more casual setting.

Would you like to tell us about some recent and upcoming events from your group? 

The next meetup we have is in January. January is always a great meetup, as we have a tradition of doing lightning talks as our first meeting of the year. Our lightning talk series aims to highlight our local members. We prioritize our local members and give them five minutes to present an interesting project they have completed, such as a cool analysis or a new package. These presentations are low-stress and brief.

Our February event will focus on package development and the latest developments from RStudio and the R Consortium regarding package development and maintenance. As for recent events, we had an event titled “Slide Crafting with Quarto” in November and another meetup titled “Fairness and Machine Learning” in December. We strive to provide a wide range of topics for all levels of programming experience.

Any techniques you recommend using for planning for or during the event?   

Meetup has been extremely beneficial. It did not exist when we founded the group, or at least I was not aware of it when we first organized the R users’ group. Thanks to the R Consortium grant, which pays for the Meetup page, it has proven to be a very useful tool. We are currently in a hybrid format, with all our presentations held online. We usually host them on Zoom, which simulcasts the live stream to YouTube, where the recording is saved. This allows anyone who wishes to do so to view our previous meetings.

We have achieved significant success, and many of our presentations have garnered a considerable number of views. While these views are not viral by YouTube standards, they are a significant number for our community. 

However, I must admit that we have always operated our organization in a manner that differs somewhat from other user groups. We have a unique culture here in Utah that makes it easier for us to meet during the day, which I know is not typical of other user groups, which typically meet in the evening. However, we have found a schedule that works for us and have stuck with it. I believe that the most important thing for anyone trying to organize a user group is to find a schedule that works and stick with it.

Please share about a project you are currently working on or have worked on in the past using the R language?

I would like to discuss my work with the Department of Veterans Affairs. The Veterans Health Administration has the VA Informatics Computing Infrastructure (VINCI), a secure remote desktop environment for conducting research. With this infrastructure, we have access to all the VA records. Once we have approval and access, we can use tools such as R, SAS, or Stata, along with various other tools, to perform all the data analysis that we need. This is a very useful resource that I have been working with for about 10 years. I am the R expert for VINCI, so I receive a lot of questions regarding R.

Most of the questions asked of us are related to connecting R with databases. In particular, I rely heavily on dbplyr, DBI, and odbc, since the VA is SQL Server based. My compliments to the team behind the DBI and odbc packages, as they have saved me from many difficult situations.
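As an illustration of the database workflow Andrew describes, here is a minimal sketch of querying a table through DBI and dbplyr. It is not VINCI code: the `visits` table and its columns are invented for the example, and an in-memory SQLite database (via the RSQLite package) stands in for SQL Server so the snippet can run anywhere. The commented-out odbc connection uses placeholder values only.

```r
library(DBI)
library(dplyr)   # dbplyr must also be installed; dplyr uses it for database tables

# Stand-in connection: an in-memory SQLite database (requires RSQLite).
# Against SQL Server, the connection might instead look like this,
# where every value is a placeholder:
#   con <- dbConnect(odbc::odbc(),
#                    Driver = "ODBC Driver 17 for SQL Server",
#                    Server = "my-server", Database = "my_db",
#                    Trusted_Connection = "yes")
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table, written to the database for the demo
visits <- data.frame(
  patient_id = c(1, 1, 2, 3, 3, 3),
  clinic     = c("A", "B", "A", "A", "B", "B"),
  cost       = c(100, 250, 80, 120, 300, 60)
)
dbWriteTable(con, "visits", visits)

# tbl() creates a lazy reference; no rows are pulled into R yet
visits_db <- tbl(con, "visits")

by_clinic <- visits_db %>%
  group_by(clinic) %>%
  summarise(n_visits = n(), total_cost = sum(cost, na.rm = TRUE)) %>%
  arrange(clinic)

show_query(by_clinic)          # print the SQL that dbplyr generated
result <- collect(by_clinic)   # run the query in the database, fetch the result
print(result)

dbDisconnect(con)
```

The aggregation runs inside the database, and only the small summary table travels back to R, which matters when the underlying tables are large.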

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

Everything is trending towards tidyverse and tidy principles. This has been a trend for several years now, with everything trying to be more uniform in the way it is done. This is done by treating data as data, which I really appreciate. It makes it easier to program and extend.

Our group has also had a lot of topics that are not just about R, but also about statistical analysis. For example, I will point out the meeting on fairness in machine learning, which is a very important topic. If you have biases in your data going into a machine learning model, those biases can easily be propagated through the model. Sensitivity to this is something that we should all be aware of. So, we are not only talking about the hows of programming and how to work with R, but also a lot of best practices for programming. Just because something works does not mean it is necessarily the best way to do it. At least in our group, we have been very mindful of these things.
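To make the point about bias propagation concrete, here is a small sketch in base R. It is not material from the meetup: the data are synthetic, and the variable names (`skill`, `group`, `approved`) are invented. Both groups have the same skill distribution, but the historical approvals held group "b" to a higher bar, and a model trained on those labels reproduces the gap.

```r
set.seed(42)
n <- 500

# Identical skill distribution in both groups
skill <- rep(seq(0, 1, length.out = n), times = 2)
group <- factor(rep(c("a", "b"), each = n))

# Biased historical labels: group "b" needed more skill to be approved
bar <- ifelse(group == "a", 0.4, 0.6)
approved <- rbinom(2 * n, size = 1, prob = plogis(10 * (skill - bar)))
train <- data.frame(skill, group, approved)

# A model fitted to the biased labels learns the group penalty
model <- glm(approved ~ skill + group, family = binomial(), data = train)
predicted <- as.integer(predict(model, type = "response") > 0.5)

# Share of each group the model would approve, and the gap between them
rates <- tapply(predicted, train$group, mean)
parity_gap <- unname(rates["a"] - rates["b"])
print(rates)
print(parity_gap)
```

The difference in approval rates used here is only one of several possible fairness measures; which one is appropriate depends on the application.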

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

In-Person Shiny in Production Conference Hosted by North East Data Science Group in Newcastle, UK

By Blog

The North East Data Science group hosts the annual Shiny in Production Conference. We talked with Colin Gillespie a year and a half ago, and we wanted to find out about the group’s recent activities and ask about the conference. Colin shared the details of the event, how it has grown in its second year, and how planning is proceeding for next year. He also talked about R’s prevailing use in academia and Newcastle’s industry.

Colin holds a PhD in statistics from the University of Strathclyde. He is a senior lecturer at Newcastle University and also CTO of Jumping Rivers. He is the author of the book “Efficient R Programming” published by O’Reilly Media.

Please share about your background and involvement with the RUGS group.

I’m Colin Gillespie, and I’ve been using R for a long time. I started using it in 1999-2000 during my PhD, so I’ve used it for about 23-24 years. After my PhD, I did the usual academic career stuff. Then, in 2016, I co-founded Jumping Rivers, which does R consultancy.

In terms of events, we established the R user group in 2015 or 2016. However, the R user group lasted a few years before we rebranded to North East Data Science, which now covers a mixture of R and other data science topics. Sometimes, the topics are more specific to R, while at other times, they are more general. The benefit of changing the name was that five times more people attended the event. We went from 10 people to perhaps 30 or 40 – a significant increase. These events have been running continuously since 2016. We had a meetup two months ago, and another one is scheduled for January.

Can you share what the R community is like in Newcastle? 

Newcastle is a city in the northeast of England. The city has three universities with strong academic involvement in R. All three universities use R for undergraduate teaching. They also use it extensively for postgraduate teaching. In terms of businesses, several government agencies and banks are located in the city, and these organizations also use R. In addition, a few other companies in the city have adopted R.

You hosted the Shiny in Production Conference in October 2023. Why focus on Shiny? 

Shiny in Production, 2023 Speakers

The motivation behind starting this conference was that no in-person Shiny conference was being run. There are more generic R and Posit Conferences that are hosted every year. We do a lot of Shiny and did not see an in-person conference with tutorials and talks. 

It’s the second time we have hosted this conference. We held it last year in October and held it again this year on the 12th and 13th of October. We have also set dates for Shiny in Production 2024, which will also be in October.

This year, we had a variety of speakers. We had three tutorials: one on Python and Shiny, another on React for Shiny apps, and another on testing Shiny apps. The following day, we had the talks, a combination of invited and submitted abstracts. We had over 100 people attend, and they represented a variety of industries, including pharma, banking, insurance, tech startups, academia, and marketing companies.

The talks are recorded, and the recordings are available on YouTube but take a little while before being released. We spend substantial time editing videos and adding subtitles to ensure they are fully accessible. The process takes several weeks to complete.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

For many attendees at the conference, the focus was on improving an existing application. These data scientists had learned a little bit about Shiny and were able to create something quite nice quickly. However, a point is reached where they need to take it to the next level – for example, ensuring that the app deploys consistently and has tests. 
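As a rough idea of what "having tests" can mean for a Shiny app, the sketch below checks server logic with `shiny::testServer()`, which runs a server function against a mock session without launching a browser. The server function, its input `n`, and its output `result` are made up for the example.

```r
library(shiny)

# Hypothetical server logic: doubles a numeric input and reports it
server <- function(input, output, session) {
  doubled <- reactive(input$n * 2)
  output$result <- renderText(paste("Result:", doubled()))
}

# Exercise the reactive logic against a mock session
testServer(server, {
  session$setInputs(n = 3)
  stopifnot(doubled() == 6)
  stopifnot(output$result == "Result: 6")

  session$setInputs(n = 10)
  stopifnot(doubled() == 20)
})
```

Tests like this cover the reactive logic only; checking the rendered user interface would need a browser-driven tool in addition.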

One area that we have been working with several companies on is making Shiny apps accessible. This involves adhering to the WCAG 2.1 guidelines, which help to ensure accessibility. For example, the guidelines address whether a screen reader can navigate the app and whether keyboard shortcuts can be used. They also cover whether the colors are sensible and have sufficient contrast.
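The contrast question can be checked numerically. The sketch below computes the contrast ratio between two colors using the relative luminance formula defined in WCAG 2.1 and compares it with the 4.5:1 minimum that level AA sets for normal-size text. The function names are our own; only base R is used.

```r
# Relative luminance of an sRGB color, following the WCAG 2.1 definition
relative_luminance <- function(colour) {
  channels <- grDevices::col2rgb(colour)[, 1] / 255
  linear <- ifelse(channels <= 0.03928,
                   channels / 12.92,
                   ((channels + 0.055) / 1.055)^2.4)
  sum(c(0.2126, 0.7152, 0.0722) * linear)
}

# Contrast ratio: (lighter + 0.05) / (darker + 0.05)
contrast_ratio <- function(foreground, background) {
  lum <- sort(c(relative_luminance(foreground),
                relative_luminance(background)),
              decreasing = TRUE)
  (lum[1] + 0.05) / (lum[2] + 0.05)
}

# WCAG 2.1 level AA asks for at least 4.5:1 for normal-size text
meets_aa <- function(foreground, background) {
  contrast_ratio(foreground, background) >= 4.5
}

contrast_ratio("#000000", "#FFFFFF")   # black on white, the maximum ratio of 21
meets_aa("#767676", "#FFFFFF")         # mid grey on white
meets_aa("#777777", "#FFFFFF")         # a slightly lighter grey on white
```

A check like this could be run over an app's color palette as part of its test suite, though it only addresses one of the guidelines mentioned above.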


Scaling R to Use with Enterprise Strength Databases

By Blog

This blog is contributed by Mark Hornick, Senior Director, Oracle Machine Learning. Oracle is an R Consortium member.

Hey R users! You rely on R as a powerful language and environment for statistical analysis, data science, and machine learning. You can use R with a database to analyze and manipulate data. R provides various packages and tools for working with databases, allowing you to connect to a database, retrieve data, perform analyses, and even write data back to the database. 

Many of you also work with data in, or extracted from, Oracle databases. As data volumes increase, moving data, scaling to those volumes, and deploying R-based solutions in enterprise environments complicate your life, but Oracle Machine Learning for R can simplify it!

Oracle Machine Learning for R 2.0

With OML4R 2.0, we’ve expanded the set of built-in machine learning algorithms you can use from R with Oracle Database 19c and 21c. Now you also have in-database XGBoost on 21c and Neural Networks, Random Forests, and Exponential Smoothing on 19c and above. In-database algorithms allow you to build models and score data at scale without data movement. 

In-database algorithms can now give you explanatory prediction details through the R interface – just like in the SQL and Python APIs. Prediction details allow you to see which predictors contribute most to individual predictions. Simply specify the number of top predictors you want and get their names, values, and weights. 

> IRIS <- ore.create(iris, table='IRIS')     # create table from data.frame, return proxy object
> MOD  <- ore.odmRF(Species~., IRIS)         # build random forest model, return model proxy object
> summary(MOD)                               # display model summary details

Call:
ore.odmRF(formula = Species ~ ., data = IRIS)

Settings:
                                               value
clas.max.sup.bins                                 32
clas.weights.balanced                            OFF
odms.details                             odms.enable
odms.missing.value.treatment odms.missing.value.auto
odms.random.seed                                   0
odms.sampling                  odms.sampling.disable
prep.auto                                         ON
rfor.num.trees                                    20
rfor.sampling.ratio                               .5
impurity.metric                        impurity.gini
term.max.depth                                    16
term.minpct.node                                 .05
term.minpct.split                                 .1
term.minrec.node                                  10
term.minrec.split                                 20

Importance:
  ATTRIBUTE_NAME ATTRIBUTE_SUBNAME ATTRIBUTE_IMPORTANCE
1   Petal.Length              <NA>           0.65925265
2    Petal.Width              <NA>           0.68436552
3   Sepal.Length              <NA>           0.19704161
4    Sepal.Width              <NA>           0.09617351

> RESULT <- predict(MOD, IRIS, topN.attrs=3) # generate predictions w/details, return proxy object
> head(RESULT,3)                             # view result

  PREDICTION       NAME_1 VALUE_1 WEIGHT_1      NAME_2 VALUE_2 WEIGHT_2       NAME_3 VALUE_3 WEIGHT_3
1     setosa Petal.Length     1.4    6.717 Petal.Width    .200    5.932 Sepal.Length     5.1     .446
2     setosa Petal.Length     1.4    6.717 Petal.Width    .200    5.932 Sepal.Length     4.9     .446
3     setosa Petal.Length     1.3    6.717 Petal.Width    .200    5.932 Sepal.Length     4.7     .446

Figure 1: Code to build an in-database Random Forest model using a table proxy object and use it for predictions

With datastores, you can store, retrieve, and manage R and OML4R objects in the database. To simplify object management, you can also easily rename existing datastores and all their contained objects. And you can conveniently drop batches of datastore entries based on name patterns. The same holds with the R script repository for managing R functions – load and drop scripts in bulk by name pattern.

> x <- stats::runif(20)                   # create example R objects
> y <- list(a = 1, b = TRUE, c = 'value')
> z <- ore.push(x)                        # temporary object in the database and return proxy object
> ore.save(x, y, z, name='myDatastore',   # save objects to datastore 'myDatastore' in user's schema
           description = 'my first datastore')
> ds <- ore.datastore()                   # list information about datastores in user's schema
> ore.move(name='myDatastore', newname='myNewDatastore') # rename a datastore
> ore.move(name='myNewDatastore',                        # rename objects within a datastore
           object.names=c('x', 'y'),
           object.newnames=c('x.new', 'y.new'))
> ore.datastoreSummary(name='myNewDatastore') # display datastore content

Figure 2: Code illustrating datastore functionality to store and manage R objects in the database

Some background

OML4R lets you tap into the power of Oracle Database for faster, scalable data exploration, transformation, and analysis using familiar R code. You can run parallelized in-database ML algorithms for modeling and inference without moving data around. And you can store R objects and user-defined functions right in the database for easy sharing and hand-off to application developers. You can even run R code from SQL queries and, on Autonomous Database, REST endpoints. OML4R now uses Oracle R Distribution 4.0.5 based on R 4.0.5.

Last December, as part of the R Consortium’s R/Database webinars, we presented Using R at Scale on Database Data. We highlighted the release of OML4R for Oracle Autonomous Database through the built-in notebook environment. But now you can use the same client package to connect to Autonomous Database, Oracle Database, and Oracle Base Database Service, too. 

Getting started

Getting started is easy. Download and install the latest OML4R client package in your R environment and use your favorite IDE.  If you’re working with Oracle Autonomous Database, you’re good to go. If you’re using Oracle Database or Oracle Base Database Service, install the OML4R server components as well. Links are below!

Get OML4R 2.0 here for use with Oracle Database and Autonomous Database and try OML4R on your data today. We think you’ll like it! 

For more information

Empowering R in Paris: Mouna Belaid’s Journey with R-Ladies and the French R Community

By Blog

Mouna earned the Opportunity Scholarship for the Posit 2023 Conference and thoroughly enjoyed the amazing experience.

The R Consortium recently interviewed Mouna Belaid, a co-organizer of R-Ladies Paris, who provided insights into the growth of the R community in Paris, especially in the French language. Dedicated channels are available for French speakers to seek assistance and other resources.

Please share about your background and your involvement in the R Community. What is your level of experience with the R language?

Mouna met and received the ‘R for Data Science – 2nd edition’ book offered by Posit, signed by both Hadley Wickham and Mine Çetinkaya-Rundel, two of the book’s co-authors, during the book reception at the Posit 2023 Conference in Chicago.

My name is Mouna, and I’m originally from Tunisia, a country in North Africa. Currently, I’m based in Paris, France. Professionally, I’m an engineer with a degree in statistics and data analysis from the Tunisian Higher School of Statistics and Data Analysis. Additionally, I hold double research master’s degrees from the School of Engineers of Tunis and Université Paris Cité, where I completed my academic journey in Tunisia. This is also where I began my career as a data scientist in the banking sector before moving to Paris. Now, I work as a data consultant at ArData, a French consulting company specializing in data science.

I’m particularly skilled in analytic tools like R, Python, SQL, Alteryx, and Power BI. Recently, I’ve been focusing on R in my current job, developing Shiny applications and working on data visualization projects. My journey with R began during my engineering studies in Tunisia, where it was a core part of our curriculum.

Beyond my professional work, I’m deeply involved in the R community. I’m a co-organizer of the R-Ladies Paris Community and a co-founder of the R-Ladies initiative in Tunisia. Upon moving to France, I reactivated the R-Ladies Paris chapter. I’m also a part of the R-Ladies global team, responsible for onboarding new chapters, and a certified Carpentries instructor. I had the pleasure of being a member of the Program Committee at the Shiny 2023 conference organized by Appsilon.

I’m always eager to learn more and contribute to the R and data science fields, and I’m excited about my ongoing journey and experiences in these areas.

We would like to get to know you more on the personal side. Can you please tell me about yourself? For example, hobbies/interests or anything you want to share about yourself. 

I’m really active and enjoy being on social media, particularly LinkedIn. I dedicate significant time to sharing updates on new achievements and staying connected with our community. I’m also very interested in exploring analytical tools on LinkedIn, like tracking key indicators and the performance of my posts. If I weren’t a data consultant, I would be thrilled to work as a community manager, managing social media platforms.

Couscous, a traditional Tunisian dish

In addition to my professional interests, I have a passion for cooking, especially traditional Tunisian dishes like couscous. I really love engaging in this culinary art. That pretty much sums it up!

What industry are you currently in? How do you use R in your work?

I work in the public sector, specifically at the Ministry of Health in France. I am in the Directorate of Research, Studies, Evaluation, and Statistics. My main responsibility involves migrating code scripts from SAS to open-source tools like R and Python. This is a significant initiative we are undertaking right now. As part of my role, I oversee the technical aspects, including understanding the scripts and collaborating with the business teams and the script developers. My goal is to effectively migrate these scripts into R, primarily using the tidyverse package. In addition, I provide professional training sessions about R. These tasks form the core of my current responsibilities.
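For readers unfamiliar with this kind of migration, the sketch below shows how a typical SAS summary step can be expressed with dplyr from the tidyverse. It is an illustration only, not code from the Ministry of Health: the `stays` data and its columns are invented, and the SAS step in the comment is a rough counterpart shown for orientation.

```r
library(dplyr)

# Hypothetical input data
stays <- data.frame(
  region = c("North", "North", "South", "South", "South"),
  days   = c(3, 5, 2, NA, 7)
)

# Rough SAS counterpart (for orientation only):
#   proc means data=stays n mean max;
#     class region;
#     var days;
#   run;
summary_by_region <- stays %>%
  group_by(region) %>%
  summarise(
    n         = sum(!is.na(days)),        # count of non-missing values
    mean_days = mean(days, na.rm = TRUE), # R needs missing values dropped explicitly
    max_days  = max(days, na.rm = TRUE),
    .groups   = "drop"
  )

print(summary_by_region)
```

One detail that often needs attention in such migrations is missing-value handling: R propagates `NA` through `mean()` and `max()` unless `na.rm = TRUE` is set.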

Why do industry professionals come to your user group? What is the benefit of attending?


Mouna presented the R-Ladies Paris community at the “Rencontres R” local conference dedicated to R users, held in Avignon, France, in June 2023.

In my experience with R-Ladies Paris, I believe we’re doing our best to create a safe and inclusive space for everyone. We organize not just in-person meetups but also online ones to enhance accessibility. To ensure everyone can benefit, we record these presentations and workshops and then share them on our YouTube channel. Our topics cover a wide range, suitable for various skill levels, from beginners to experts. These include data visualization, presentations, interdisciplinary work, and even how to develop an R package.

We also collaborate with other sectors, including those focused on Python programming, making our meetups relevant for a diverse audience. Beyond the technical aspects, we host social events; for instance, last October, we had a wonderful gathering in a bar in Paris. Additionally, we maintain a dedicated page on GitHub where summaries and recordings of all our meetups are available. That sums up my involvement and our initiatives at R-Ladies Paris.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

I believe that the data science landscape has been significantly elevated, particularly through the capabilities of working with the tidyverse framework. The announcement of Quarto and the wonderful projects developed using it has been revolutionary, enhancing our projects’ recommendation and reproducibility aspects. I’d like to talk about the use of R in France specifically. The community here has been evolving continuously.

INSEE, the National Institute of Statistics and Economic Studies in France, has been crucial in promoting open-source tools. This is evident in their GitHub repositories, where they advocate strongly for the use of R and Python. Several public and academic institutions in France have also adopted R. There’s a dedicated Slack space for R users in France. This space is primarily for discussions in French, where individuals can discuss R-related news, seek assistance, and ask questions. It’s a vibrant community where people actively participate and provide answers. That summarizes my perspective on the evolution and impact of data science tools, especially R, in France.


Introduction of R activities in Japan 

By Blog

Guest blog contributed by Rikimaru Nishimura, Statistical Programming, Janssen Pharmaceutical K.K.

I would like to introduce the Open-Source Software (OSS) task force of the Data Science Expert Committee, Drug Evaluation Committee, Japan Pharmaceutical Manufacturers Association (JPMA), which is one of the R activities in Japan.

This task force started its activities in 2022 and currently consists of 10 members from pharmaceutical companies. Its purpose is to investigate the use of OSS, which is being used more and more actively in the pharmaceutical industry, especially for the analysis of clinical trial data and work related to regulatory submission, and to compile and publish a report on the expected benefits and issues when OSS is used.

The document titled “Utilization and Considerations for Open-Source Software” was released by JPMA last year. It introduces current activities related to R in the pharmaceutical industry, the challenges of using R for regulatory submission, package version and operation management, and examples of R training.

In addition, a survey report on the actual use of R in pharmaceutical companies has also been released. Survey results show that more than half of pharmaceutical companies in Japan are already using R and half of the companies have already submitted documents generated by R to regulatory authorities. On the other hand, the task force found that many companies have concerns about using R, such as reliability of open-sourced packages.

There are also challenges in using R for submission. PMDA does not require the use of specific software for the data management and analysis of clinical trial data for the purpose of the submission. Therefore, the choice of software to be used is left to the applicant, and there is no problem in using R for the purpose of submission. However, it is necessary for the applicant to conduct verification work to ensure the quality of the software (reliability of analysis results) and document the procedures and results.

The main theme of this year’s task force is to collect and publish examples of R and Python utilization in pharmaceutical companies in Japan. The task force is planning to investigate examples of OSS utilization in clinical development and share the introduction procedures, usage environment, software including packages, and utilization effects. As it did last year, the task force will also continue to consider the use of R for submission.

Finally, I believe that the use of R will become even more active in the pharmaceutical industry in Japan. I will continue to work closely and actively with the R Consortium and other external organizations to contribute to the increased use of R throughout the pharmaceutical industry.

About the Author

Rikimaru Nishimura has worked as a Statistical Programmer for Janssen Pharmaceutical K.K. since February 2015. He is responsible for statistical analysis in clinical trials and e-Data submission to PMDA. Before working in the pharmaceutical industry, he developed bank accounting and customer management systems at a Japanese technology company. He is also a founding member of the open-source software task force in the Japan Pharmaceutical Manufacturers Association.

New R/Insurance Webinar Series: The Journey from Excel to High-Performance Programming in R

By Blog, Events

Introduction:

The actuarial profession is on the cusp of a transformation spearheaded by integrating programming into the traditional spreadsheet-based workflow. Georgios Bakoloukas, Head Model and Analytics, Swiss Re and Benedikt Schamberger, Head Atelier Technology & AI Consulting, Swiss Re, are at the forefront of this paradigm shift. 

This blog introduces four upcoming webinars showcasing the journey from Excel to high-performance programming in R. The webinars are scheduled to be broadcast in January 2024, and you can participate directly. Beginners are welcome.

Webinars by Georgios Bakoloukas:

  1. From Excel to Programming in R

            Date: January 10th, 2024

            Duration: 30 minutes

Georgios will introduce a compelling case for why actuaries should embrace R programming while continuing to use Excel. The session promises to demystify the transition, using insurance industry examples, to demonstrate the benefits of adopting a programming mindset alongside familiar spreadsheet computing.

  2. Putting R into Production

            Date: January 17th, 2024

            Duration: 30 minutes

The follow-up webinar takes the discussion from solving problems in R to sharing these solutions. Georgios will cover documentation, testing, and dissemination using R’s packaging, Web API creation, and GUI generation through Shiny. This practical approach ensures that coding efforts are not siloed but add value across the organization.
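As a small taste of what documentation and testing can look like, here is a sketch of a documented R function with a few checks. It is not taken from the webinar: the function `annuity_pv()` is a made-up example of a calculation an actuary might otherwise do in a spreadsheet. The `#'` comments follow the roxygen2 convention that R packages commonly use to generate help pages.

```r
#' Present value of a level annuity paid at the end of each period
#'
#' @param payment   Amount paid at the end of every period.
#' @param rate      Interest rate per period (single number, e.g. 0.05).
#' @param n_periods Number of payments.
#' @return The present value as a single number.
annuity_pv <- function(payment, rate, n_periods) {
  stopifnot(length(rate) == 1, rate > -1, n_periods >= 0)
  if (rate == 0) {
    return(payment * n_periods)   # no discounting when the rate is zero
  }
  payment * (1 - (1 + rate)^(-n_periods)) / rate
}

# Checks that document the expected behaviour
stopifnot(annuity_pv(100, 0, 10) == 1000)
stopifnot(isTRUE(all.equal(
  annuity_pv(100, 0.05, 10),
  sum(100 / 1.05^(1:10))          # same value from discounting each payment
)))
```

Inside a package, checks like these would normally live in a test file run by a testing framework rather than next to the function.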

Webinars by Benedikt Schamberger:

  3. R Performance Culture

            Date: January 24th, 2024

            Duration: 30 minutes

Benedikt will address the nuances of optimizing code performance in R. Acknowledging that premature optimization might be detrimental, he will provide insights into when and how to refine code. The session includes a look at R’s design philosophy regarding performance and the tools available for tuning.
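The idea of measuring before optimizing can be sketched with base R alone. The example below is ours, not the webinar's: it times two made-up functions that apply a factor to a vector of claim amounts, one growing its result inside a loop and one using a vectorized expression, and confirms that both return the same values.

```r
set.seed(1)
claims <- runif(2e4, min = 100, max = 5000)   # synthetic claim amounts

# Version 1: grows the result vector on every iteration
scale_loop <- function(x, factor) {
  out <- numeric(0)
  for (value in x) {
    out <- c(out, value * factor)   # each c() call copies the whole vector
  }
  out
}

# Version 2: a single vectorized multiplication
scale_vectorised <- function(x, factor) {
  x * factor
}

# Measure first, then decide whether the rewrite is worth it
loop_time <- system.time(loop_result <- scale_loop(claims, 1.1))
vec_time  <- system.time(vec_result  <- scale_vectorised(claims, 1.1))

print(loop_time["elapsed"])
print(vec_time["elapsed"])

# Any optimization has to preserve the result
stopifnot(isTRUE(all.equal(loop_result, vec_result)))
```

Timings will differ from machine to machine, which is one reason to measure on the hardware and data sizes that matter in practice.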

  4. High-Performance Programming in R

            Date: January 31st, 2024

            Duration: 30 minutes

Continuing the performance theme, Benedikt will delve into the limitations of CSV files and introduce binary formats that bolster efficiency. He will explore the arrow R package and the Parquet file format, demonstrating their potential to reduce time and disk space requirements significantly.
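To give a feel for the comparison, the sketch below writes the same synthetic data frame to CSV and to Parquet with the arrow package, then reports file sizes and read times. The `policies` data are invented, and the numbers you see will depend on your data and machine.

```r
library(arrow)

set.seed(1)
n <- 1e5
policies <- data.frame(
  policy_id = seq_len(n),
  region    = sample(c("north", "south", "east", "west"), n, replace = TRUE),
  premium   = round(runif(n, min = 100, max = 2000), 2),
  stringsAsFactors = FALSE
)

csv_path     <- tempfile(fileext = ".csv")
parquet_path <- tempfile(fileext = ".parquet")

write.csv(policies, csv_path, row.names = FALSE)
write_parquet(policies, parquet_path)

# Compare size on disk (bytes)
sizes <- file.size(c(csv_path, parquet_path))
names(sizes) <- c("csv", "parquet")
print(sizes)

# Compare read times
csv_time     <- system.time(from_csv     <- read.csv(csv_path))
parquet_time <- system.time(from_parquet <- read_parquet(parquet_path))
print(csv_time["elapsed"])
print(parquet_time["elapsed"])
```

Unlike CSV, Parquet stores column types in the file, so columns come back with the types they were written with and do not have to be guessed from text.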

Conclusion:

These webinars offer a roadmap for actuarial professionals keen on enhancing their toolkit with programming capabilities. As the industry evolves, the skills taught by Georgios and Benedikt will become increasingly indispensable, marking a step towards a more innovative and efficient actuarial future.

Please join us to learn more about actuarial science here.

Webinar: Use of R in Japan’s Pharma Industry

By Blog, Events

The use of the R programming language has seen exponential growth across industries, with its role being especially pronounced in the pharmaceutical sector. Japan stands out in this growth narrative, thanks in part to the initiatives of the Japan Pharmaceutical Manufacturers Association (JPMA). Before diving into the webinar insights, understanding JPMA’s profound influence and contributions is essential.

About JPMA

Established in 1968, JPMA is a voluntary consortium of 71 research-centric pharmaceutical firms. With its core mission of “realizing patient-oriented healthcare,” JPMA has tirelessly worked towards enhancing global healthcare through the formulation of pioneering ethical drugs.

Summary of the Webinar’s Concept

The Adoption of R in Japan’s Pharma Industry talk and panel discussion will be led by key industry experts from the JPMA R Task Force Team. The webinar explores the usage and adoption of R in the pharmaceutical industry, specifically focusing on the findings of the JPMA Drug Evaluation Committee, R Task Force Team, and the JPMA Report. The report references various open source works and publications, including those from the R Consortium pilot submissions, Package Validation Hub, and webinar training.

Agenda of the Webinar:

  • General Background (10-15 minutes)
    • PMDA: Highlighting the intricacies of submissions with R.
    • JPMA R Task Force: Delving into past activities and shedding light on future initiatives.
  • JPMA Survey Report (15-20 minutes)
    • Background: Understanding the rationale and motivation behind the JPMA survey.
    • Results: Discussing key findings and their implications for the industry.
  • Q&A Session (15 minutes)

Date and Time: Tues, Jan 9, 2024 at 9:00 am JST / Jan 8, 2024 at 4:00 pm PT / 7:00 pm ET

Conclusion:

Under the auspices of JPMA’s guiding principles, the webinar will present a comprehensive glimpse into R’s evolving role in Japan’s pharmaceutical landscape. The dedication of JPMA, combined with the increasing relevance of R, signifies an exciting chapter in the pharmaceutical domain’s data-driven journey.

Join the JPMA webinar to learn more about R’s pharmaceutical achievements! 

Register here!

femR: Bridging Physics and Statistics in R with Support from the R Consortium

By Blog

Laura M. Sangalli is a professor of Statistics at Politecnico di Milano, Italy. Her research interests include functional data analysis, high-dimensional and complex data, spatial data analysis, and biostatistics. With a team composed of Aldo Clemente, Alessandro Palummo and Luca Formaggia at Politecnico di Milano, and Eleonora Arnone at the University of Turin, Laura created femR, a package for applying Finite Element Methods to solve second-order linear elliptic Partial Differential Equations over two-dimensional spaces. A grant from the R Consortium supported this project.

“femR: Finite Element Method for Solving PDEs in R” was funded by the R Consortium in 2022. What is the current status?

The femR project, generously supported last year by the R Consortium, aimed to develop a package implementing Finite Element Methods (FEM) for solving simple yet flexible forms of Partial Differential Equations (PDEs), specifically second-order linear elliptic PDEs across general two-dimensional spatial domains. This goal has been successfully achieved, and the project is now complete. However, development on the femR package continues, extending beyond the original proposal funded by the R Consortium. 

The femR package is accessible in its GitHub repository at https://github.com/fdaPDE/femR. It includes documentation and initial vignettes. Key functionalities of femR, as outlined in the proposal, are:

  • Providing an interface with the RTriangle package, enabling users to build two-dimensional triangular meshes from the boundaries of spatial domains.
  • Constructing finite element bases over these triangulations.
  • Solving elliptic PDEs using finite element discretization.

The project’s broader objective was to introduce the R community to a foundational tool for applying Finite Element Methods to certain PDE problems. This is especially relevant considering the R community’s prevalent use of the deSolve package, which employs finite differences for discretization in differential problems.

During the development of femR, the team engaged with Karline Soetaert and Thomas Petzoldt, the authors of deSolve. These interactions led to the realization that the deSolve community had a keen interest in space-time PDE problems—a domain femR had not initially planned to address in its original proposal. Responding to this need, the team began developing capabilities within femR for solving time-dependent PDE problems, expanding the package’s scope and utility beyond its original mandate.

In summary, the femR project has not only achieved its primary goal of providing a tool for solving specific types of PDEs using FEM in the R environment but has also adapted and grown in response to community feedback and evolving scientific needs, particularly in the realm of space-time PDE problems.

What is your background, and why did you decide to propose this project?

My main research interest lies in developing Physics-Informed statistical models for spatial and functional data observed over complex multidimensional domains, such as the brain’s surface, the gray matter’s volume, a spatial region with a complex conformation, or a road network. These Physics-Informed statistical models use PDEs in their regularizing terms to embed knowledge of the Physics of the underlying phenomenon and the geometry of the domain into the statistical models.

I love these methods since they bind together two of the richest and most powerful modeling frameworks, from statistics and mathematics. On the one hand, there is the regression/maximum likelihood framework for the empirical model; on the other hand, there are PDEs, the most powerful mathematical tool to model complex phenomena and behaviors.

The Physics-Informed statistical models we develop leverage Finite Element discretizations of the estimation problems, and they are implemented in the R package fdaPDE, which has been available from CRAN for almost ten years now. This package fills a significant gap in R’s capabilities, as there was a notable absence of direct implementations of FE methods for solving PDEs, as also highlighted in the CRAN Task View on Differential Equations.

As we already discussed, deSolve enables solving some simple forms of PDEs through finite differences. However, finite differences work over regular tensorized domains (rectangular domains) and are not the natural choice to solve PDEs on more complex domains. Moreover, they cannot achieve the same level of accuracy as FEM.

Hence the need for an R package capable of solving PDEs using FEM to complement deSolve. The development of femR addresses this need. It provides solutions for PDEs on complex domains and enhances accuracy, aligning perfectly with my research interests in advancing sophisticated, Physics-Informed statistical models.

Finite Element Methods (FEM) are used to solve partial differential equations (PDEs). In what situations is it used most often? Can you give some basic examples of when you would use R and FEM to solve a PDE?

Partial Differential Equations (PDEs) are extensively used in all sciences and engineering fields to model complex phenomena and behaviors. However, PDEs do not usually possess closed form solutions and are typically solved numerically through discretization methods.

The Finite Element Method stands out as one of the most popular, computationally efficient, and versatile approaches for discretizing and solving PDEs. (Editor’s note: In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts.) This method enables the solution of PDEs on multidimensional domains with complex shapes and various boundary conditions with high accuracy.
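To make the idea of finite element discretization concrete, here is a deliberately small one-dimensional sketch in base R. It does not use femR or its interface, which targets two-dimensional domains. The example assumes the model problem -u''(x) = 1 on (0, 1) with u(0) = u(1) = 0, whose exact solution is u(x) = x(1 - x)/2, and approximates it with piecewise-linear elements on a uniform mesh. All variable names are illustrative.

```r
# Mesh: n_elements intervals of equal length on (0, 1).
n_elements <- 10
nodes <- seq(0, 1, length.out = n_elements + 1)
h <- diff(nodes)
n_nodes <- length(nodes)

stiffness <- matrix(0, n_nodes, n_nodes)
rhs <- numeric(n_nodes)

# Assemble element by element. For piecewise-linear basis functions,
# each element contributes a 2 x 2 local stiffness matrix and, for the
# constant source term f = 1, a local load of h / 2 at each end node.
for (e in seq_len(n_elements)) {
  idx <- c(e, e + 1)
  local_stiffness <- matrix(c(1, -1, -1, 1), 2, 2) / h[e]
  local_load <- rep(h[e] / 2, 2)
  stiffness[idx, idx] <- stiffness[idx, idx] + local_stiffness
  rhs[idx] <- rhs[idx] + local_load
}

# Impose u(0) = u(1) = 0 by solving only for the interior nodes.
interior <- 2:(n_nodes - 1)
u <- numeric(n_nodes)
u[interior] <- solve(stiffness[interior, interior], rhs[interior])

# Compare with the exact solution at the mesh nodes.
exact <- nodes * (1 - nodes) / 2
max_error <- max(abs(u - exact))
print(max_error)
```

For this particular problem the nodal values agree with the exact solution up to rounding error. On two-dimensional domains with irregular boundaries, the same assemble-and-solve pattern is applied over a triangular mesh instead of a set of intervals.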

The femR package I’ve been discussing exemplifies the application of FEM in addressing a range of partial differential problems. It can handle both spatial and spatiotemporal diffusion-transport-reaction problems. A common and straightforward example is heat conduction, modeled by the well-known heat equation. This capability finds numerous applications in various fields.

In environmental sciences, for instance, femR can be utilized to study the dispersion of pollutants released in water or air, which are transported by currents or winds. It is also adept at modeling temperature variations and pressure. In biomedicine, femR could be used to study electrophysiology and the mechanics of organs, including tumor growth dynamics.

These examples demonstrate the wide-ranging applicability of femR in solving complex PDEs across diverse scientific disciplines. The package’s ability to handle simple forms of elliptic and parabolic PDEs and more complex PDE scenarios makes it a valuable tool in the scientific community. Moving forward, there are plans to continually enhance femR, expanding its scope to address increasingly complex differential problems and applying it to additional fields of study.

Since FEM implementations were not available in R in the past, statisticians had to use external software. Now you can do this using R. Can you explain what the before and after looks like for a statistician who is using R? How big of a change is it?

Including PDEs in data analysis methods allows us to incorporate problem-specific information that may come from the physics, chemistry, mechanics, or morphology of the problem into the statistical model. These physics-informed methods are gaining increasing attention from the scientific community. For example, Physics-informed Neural Networks represent a cutting-edge research direction in Artificial Intelligence.

PDEs are not only pivotal in advanced fields like AI but are also fundamental components of epidemiological models. Additionally, they are being increasingly used in spatial data analysis and functional data analysis. These applications are particularly useful for modeling data observed over complex domains and addressing issues like anisotropy and non-stationarity.

As the complexity of data analysis challenges our community faces grows, there is an escalating need for new data analytics methods that effectively combine empirical and physical models. It’s becoming increasingly important for statisticians to be familiar with PDEs as a fundamental tool for modeling complex phenomena.

In this evolving landscape, the femR package plays a crucial role. It offers a built-in toolbox within the R environment for solving PDEs using Finite Element Methods (FEM), thus eliminating the need for users to depend on external environments. We have invested significant effort in creating an intuitive R interface. This interface simplifies formulating differential problems, making it as easy as writing them down on paper. We are confident that the availability of femR, a native and user-friendly package, will greatly assist the statistical community in developing new physics-informed models.

femR complements deSolve (https://desolve.r-forge.r-project.org/). Are there examples and documentation to help users with femR?

We have already prepared documentation and created initial vignettes for our package, available on the GitHub repository at https://fdapde.github.io/femR/. We plan to continue enriching these vignettes with more content.

We aim to offer a tool that may complement deSolve. As mentioned earlier, we have been discussing with the authors of deSolve to understand the needs of the deSolve community. We decided to delay the release of the package on CRAN to enable a testing period in which interested users may start using the package and give us feedback. This feedback is crucial, as it may help us tailor the package in the most convenient way to the community. This testing period will help us avoid potential backward compatibility issues that we might otherwise encounter and ensure better integration with deSolve.

As I mentioned, we are also actively working on incorporating other functionalities that interest the deSolve community. In particular, as part of the future development of the package, perhaps within the next year, we aim to introduce non-linear PDE terms, such as logistic terms. Indeed, adding non-linear forcing terms will enable the modeling of various biologically relevant problems that cannot currently be tackled within femR.

How was it working with the R Consortium? Would you recommend it to other people who work with the R programming language? Will you continue to develop femR?

We’ve had a great experience with the R Consortium, and we would recommend applying for funding to anyone who has the potential to develop packages that can benefit the R community. 

Furthermore, the grant enabled us to fund young collaborators, offering them the opportunity to gain research experience and encouraging them to pursue a Ph.D. In particular, through the grant, we supported two recent graduates, Aldo Clemente and Alessandro Palummo, co-authors of femR. Aldo and Alessandro are already enrolled in a Ph.D. program here at Politecnico di Milano, so we will keep working on femR and enlarge the scope of this new toolbox in several different directions toward more complex PDEs to model phenomena over different domains. We believe this may support the R community in developing new cutting-edge data analysis methods, interfacing statistical methodology and physical models. This will facilitate exploring new avenues of research.

There is a new frontier here in science! Seeds of something big! 

About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure.

R Implementation in China’s Pharmaceutical Industry in 2023

By Blog

Guest blog contributed by Wenlan Pan, Statistical Programmer Analyst, Johnson & Johnson

Transforming the Landscape – The Rise of R in the China Pharmaceutical Industry

In recent years, the China Pharmaceutical Industry has undergone a significant transformation in data analysis practices, driven by the growing interest in utilizing R, a powerful statistical programming language, as an alternative to SAS. This article aims to delve into the significant strides made in R implementation within the industry in 2023, focusing on key meetings, notable examples of R implementation, and panel discussions on sharing R-generated reports and the balanced development of open-source packages.

Key Meetings – Driving R Implementation in 2023

Several influential meetings and discussions in 2023 played a pivotal role in promoting the widespread adoption of R within the China Pharmaceutical Industry. Two meetings were specifically focused on R, while the other two had a broader scope with a growing emphasis on R-related topics. These meetings provided platforms for professionals to exchange knowledge, foster collaboration, and present innovative ideas:

1. The first China Pharma R User Group (RUG) Meeting

It was the first conference of its kind. This groundbreaking event brought together over 300 participants from Shanghai, Beijing, and online on March 31st. With 13 presentations from nine leading companies, the meeting served as a platform for professionals to share knowledge and explore innovative solutions with R for the pharmaceutical industry. It highlighted the growing importance of R in this field and allowed participants to delve into the latest developments, powerful R packages, and breakthrough methodologies that benefit the industry.

2. Open Source Clinical Reporting SummeR 2023 workshop

Hosted by Roche on August 29, this workshop emphasized the importance of open-source solutions and collaboration for clinical data reporting. Through interactive sessions, industry experts shared their experiences using open-sourced R packages like Admiral, NEST, and tidytlg, for tasks such as SDTM mapping, ADaM data creation, and TLG creation. The workshop featured eight presentations, demonstrated the effectiveness of R in generating clinical reports, and provided valuable insights into the successful utilization of open-source R packages for efficient clinical reporting.

3. Pharma Software Users Group (Pharma SUG) 2023 and PHUSE Single Day Events

These two meetings had a broader focus, each centered around a specific theme. The first meeting focused on “New Policy, New Technology & New Opportunity”, while the second had a theme of “Standardisation-Driven End-to-End Automation”. Coincidentally, these events witnessed a surge in the number of papers and presentations dedicated to R implementation. During these meetings, professionals presented their research and experiences, highlighting how R, when coupled with domain-specific knowledge and standards, contributed to advanced analytics and navigation of complex challenges. It is worth noting that PharmaSUG 2023 also offered pre-conference training on R titled “Deep Dive into Tidyverse, ggplot2, and Shiny with Real Case Applications in Drug Development.” This training provided participants with practical skills for leveraging R’s powerful data visualization and analysis capabilities.

Exemplifying R’s Potential – Notable Examples of R Implementation

R has found wide-ranging applications in the pharmaceutical industry. Here are notable examples:

1. Regulatory Submissions and Reporting

R’s open-code nature enables the development and utilization of open-source projects focused on implementing or developing CDISC standards. In addition to writing open R code, teams leverage open-source packages. For instance, OAK automates the mapping of CDASH to SDTM and generates raw synthetic data. Admiral, a modularized toolbox, facilitates the development of ADaM datasets in R. R packages like tidytlg make it easier to create tables, listings, and graphs (TLG) for clinical study reports. Notably, several R-based tools have already been officially recognized by the CDISC Open-Source Alliance (COSA) as open-source projects focused on implementing or developing CDISC standards. Another example is Dataset-JSON – R Package Implementation, which allows users to read and write JSON files while also providing functions to update metadata on the dataset. This could help meet the requirements of regulatory submission and other data exchange scenarios. Valuable experience in developing and implementing such packages in practice was shared during the meetings.

2. Statistical Analysis and Modeling

R can be used extensively for statistical analysis and modeling in clinical trials, since it provides a wide range of statistical functions and packages for efficacy and safety analysis. It offers alternatives to SAS for methods such as mixed-effect models for repeated measures (MMRM) and negative binomial regression. These may require combining multiple packages, but that also reflects the flexibility of R, since users can freely choose the packages they prefer. Furthermore, R allows for the development of custom packages tailored to specific analysis needs, providing specialized functionalities and enhancing overall data analysis processes.
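As one hedged illustration of the negative binomial regression mentioned above, the sketch below fits such a model with glm.nb() from the MASS package, which ships with standard R distributions. The data are simulated, and the variable names (trial, treatment, events) are hypothetical. This is only one of several ways to fit the model in R and is not presented as the approach used by any of the speakers.

```r
library(MASS)  # provides glm.nb(); assumed to be installed

set.seed(42)

# Simulated event counts for two hypothetical treatment arms.
n <- 200
treatment <- rep(c("placebo", "active"), each = n / 2)
mu <- ifelse(treatment == "active", 2, 4)
trial <- data.frame(
  treatment = factor(treatment, levels = c("placebo", "active")),
  events = rnbinom(n, size = 1.5, mu = mu)
)

# Negative binomial regression of event counts on treatment arm.
fit <- glm.nb(events ~ treatment, data = trial)

# The exponentiated coefficient estimates the rate ratio of
# active versus placebo.
rate_ratio <- exp(coef(fit)[["treatmentactive"]])
print(summary(fit))
print(rate_ratio)
```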

3. Quality Control and Validation

R offers comprehensive tools and functions for ensuring stringent quality control and validation processes in data analysis and reporting. This is particularly useful during the transition period when validating the outputs produced by SAS using R. R’s built-in validation functions, combined with customized scripts, provide confidence in the accuracy and consistency of results. For example, R allows for the comparison of data frames and reports, offering a fast and efficient way to execute validation checks and generate a summary report, which is user-friendly and flexible.
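A minimal sketch of such a comparison in base R is shown below. The two data frames are hypothetical stand-ins for an output produced in SAS and its R counterpart, and the tolerance value is an assumption that a real validation plan would need to justify.

```r
# Hypothetical outputs: one exported from SAS, one produced in R.
sas_output <- data.frame(
  usubjid = c("001", "002", "003"),
  mean_change = c(1.25, -0.40, 2.10),
  stringsAsFactors = FALSE
)
r_output <- data.frame(
  usubjid = c("001", "002", "003"),
  mean_change = c(1.25, -0.40, 2.1000001),
  stringsAsFactors = FALSE
)

# identical() requires an exact match, including tiny numeric differences.
strict_check <- identical(sas_output, r_output)

# all.equal() compares numeric columns up to a tolerance. It returns TRUE
# or a character vector describing the differences it found.
loose_check <- all.equal(sas_output, r_output, tolerance = 1e-6)

print(strict_check)
print(loose_check)
```

Wrapping the result in isTRUE() is the usual way to use all.equal() inside an if statement, because a failed comparison returns a description rather than FALSE.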

4. Data Visualization and Interactivity 

R’s Shiny package has revolutionized data visualization in the pharmaceutical industry. Going beyond traditional methods, Shiny enables the development of interactive dashboards. This empowers stakeholders to dynamically explore and interpret data, facilitating timely, data-driven decision-making. Several examples of R Shiny apps were shared during the meetings, such as an app for Prostate-Specific Antigen (PSA) navigation, a baseline Shiny framework for standard safety tables and figures as well as efficacy modules, and an app that supports popPK analysis even for users without any programming knowledge.

Panel Discussion – Report Sharing and Open-Source Package Development

During the meetings, concerns were also raised, and panel discussions were held with experts from various companies.

Concerns were expressed about the direct use of R for internal and external sharing of reports. Multinational pharmaceutical companies and regulators are exploring the use of programs written in R to make submissions, marking a shift from past submissions, which were mainly based on SAS. The industry is actively engaged in this process and awaits the results of pilot studies to evaluate the feasibility and effectiveness of this transition.

Balancing the development of open-source packages and utilizing packages from other companies is another concern within the industry. Organizations are better off adopting an ecosystem-driven approach, evaluating the strengths and weaknesses of different solutions. Active participation in the open-source community empowers organizations to contribute to the development of packages, thereby advancing the industry’s collective knowledge and capabilities.

Embracing a Data-Driven Future in the China Pharmaceutical Industry

The pharmaceutical industry in China is rapidly adopting R for data analysis. Key meetings and discussions have fostered collaboration, knowledge-sharing, and innovation. Examples have showcased the vast array of applications, illustrating how R has been implemented in regulatory submissions and reporting, statistical analysis and modeling, quality control and validation, as well as data visualization and interactivity. Concerns have also been raised and discussed regarding report sharing and the balance between the development and utilization of open-source packages. Embracing R as an essential tool ensures competitiveness and positions organizations at the forefront of scientific progress in the evolving pharmaceutical landscape.

About the Author 

Wenlan Pan is a Statistical Programmer Analyst at Johnson & Johnson with over two years of programming experience in the pharmaceutical industry, currently supporting neuroscience studies. She has been using R for seven years and received a Master of Science degree in Biostatistics from the University of California, Los Angeles.

R Validation Hub Community Meeting – December Recap

By Blog

After a brief hiatus, the R Validation Hub recently reconvened for its community meeting, celebrating a year of remarkable achievements and setting the stage for future endeavors.

In 2023, the R Validation Hub was present at these top conferences: useR!, posit::conf, and R/Pharma. In addition, riskassessment was awarded the title of “Best App” at Shiny Conf 2023.

If you want to connect with us in 2024, please do so! The Regulatory R Repository workstream supports a transparent, cross-industry approach of establishing and maintaining a repository of validated R packages. Join us! 

Meeting Notes

In 2023, a new R Validation Hub structure, including Head, Executive Committee, and workstreams, was established. Doug Kelkhoff is taking over for Andy Nicholls as Head. The executive committee governs the R Validation Hub. Here are the committee members as of August 2023:

  • Doug Kelkhoff (Chair)
  • Joe Rickert (R Consortium)
  • Preetham Palukuru (R Consortium)
  • Juliane Manitz (Communications Workstream Lead)
  • Coline Zeballos (Repositories Workstream Lead)
  • Eric Milliman (riskmetric Workstream Lead)
  • Aaron Clark (riskassessment App Workstream Lead)

Full Meeting Slides available here!

Additional Resources

R Validation website: https://www.pharmar.org/  

Thank you!

Thank you from the R Validation Hub and R Consortium Community for your support and interest in 2023. Welcome 2024!