R Consortium, Author at R Consortium

May 30

R4HR in Buenos Aires: Leveraging R for Dynamic HR Solutions

By R Consortium Blog

Marcela Victoria Soto, co-organizer of R4HR - Club de R para RRHH in Buenos Aires, Argentina, recently updated the R Consortium on the group’s activities. Last year, Sergio García Mora, the group’s founder, discussed the adoption and expansion of R in human resources in Argentina. Marcela emphasized the importance of data analysis for informed and agile decision-making for companies in Argentina. She also shared details of some of her budgeting, accounting, and annual income tax projects.

R4HR is holding an online event called “Data Visualization in HR” on June 1, 2024, for Spanish-speaking R users. The meetup will be conducted via Google Meet.

Please share your background and involvement with the RUGS group.

I earned a bachelor’s degree in labor relations and received training as a labor relations teacher at the University of Buenos Aires (UBA). Additionally, I completed a postgraduate course in Human Resources Management from the Pontifical Catholic University of Argentina (UCA) and a Diploma in Computational Social Sciences from the National University of General San Martín (UNSAM). I also attended the Argentina Program at the University of Salta, completing all three program modules.

What industry are you currently in? How do you use R in your work?

I currently work in the textile industry as the Head of Human Resources. At Yagmour, I use R to present reports on employee turnover, salary reports, accounting entries, etc. Additionally, I use R to consolidate the annual Human Resources budget according to the company’s accounts.

Can you share what the R community is like in Buenos Aires?

The R4HR community is a collaborative space comprising individuals interested in data and human resources. We hold various meetups within the community where projects, R packages, etc., are shared. It is a Spanish-speaking community. The R Club is a meeting space for professionals in the field, where we can share tools, new ways of addressing issues, and novel approaches to similar problems. Even attendees who are familiar with R are sometimes unaware of everything the language offers for both simple and complex problems. The benefit of attending lies in sharing and creating spaces for knowledge exchange.

You have a Meetup on “Data Visualization in HR” on June 1st, 2024. Can you share more about the topic covered? Why this topic?

In June, we will hold a meetup on Data Visualization in HR using the ggplot2 package, adding interactivity and context with plotly. This topic is not just interesting but also highly practical. Visualization is a great way to interpret data and graphically identify behavior patterns, which can also prompt questions about the presented information. The plotly package can surface insights that are not apparent in static graphs. Additionally, plotly allows for creating interactive visualizations, enabling users to explore and manipulate the charts directly within the visualization. This includes zooming, data selection, and more, providing a richer and more dynamic user experience.

This meetup’s target audience is individuals interested in understanding the benefits of working with R and people in the human resources field who are interested in the topic.

For this event, we sent invitations through Meetup and provided a Google Meet link. After the event, we will upload the recording to YouTube and announce it to the community through social media.

Would you like to tell us about an interesting recent Meetup from the group?

I recently presented at a group event titled Annual Income Tax with R to showcase the various problems one can address using R beyond data visualization or analysis. In Argentina, to carry out this development, one must consider the guidelines provided by the Federal Administration of Public Revenues (AFIP), which, at the national level, determines the parameters to be used for presentations; payroll software interprets these parameters. Those who do not have a payroll system can use the development done in R to carry out this presentation.

In Argentina, frequent changes to the rules and calculation methods have made everything related to this tax quite complex. The tax is imposed on salaries that are considered high-value and is withheld by the company. Due to inflation and various modifications, analyzing and handling this tax ends up being one of the most complex issues for employees in the country. In this meetup, I showed how R makes this process easier.

What trends do you currently see in R language and your industry? Are there any trends you see developing in the near future?

Trends in R center on its growing popularity and its transformative impact. As more people join, R is applied to a wider range of problems. There is also ongoing work on clustering applied to Human Resources to understand how each group functions, their relationships, common characteristics, etc. In Argentina, due to the current economic situation, data analysis not only at the salary level but also at the soft skills level is an urgent necessity for companies aiming to use data for agile decision-making. Business data is vital for informing the other decisions that Human Resources and the entire company need to make. To use data for agile decision-making, companies must consider salary levels, understand which soft skills are needed and what the context requires, and make decisions accordingly.

A trend that will continue to develop in the future relates to Artificial Intelligence and how it complements everyday tasks or serves as a support tool.

Please share about a project you are working on or have worked on using the R language. What is the goal/reason, result, or anything interesting related to your industry?

I have worked on several projects in R, starting with the basics related to data visualization of absenteeism, turnover, and salary analysis.

Something different that I worked on with R was creating the annual Income Tax presentation. The objective was to consolidate the yearly information of each employee covered by the regulations according to the parameters provided by the Federal Administration of Public Revenues. It required interpreting each requirement at the programming level. This file had to be submitted in TXT format, which meant working with a file type that is rarely used in Human Resources areas.

Another different project in R was creating accounting entries. It allows for systematizing a large amount of information and grouping it according to the accounts.

I have also used R to prepare information presented to the Ministry of Labor, which required extensive cross-referencing. For example, it involved cross-referencing gender with absences, working hours, days, leaves, and paid leaves, among other variables. The complexity of this was the relationship between the data, where any incorrect data would ultimately lead to inconsistencies in the information.

Lastly, before applying R, the budgeting process in our company involved transferring information across different Excel sheets, using pivot tables, and copying and pasting the results into a summarized form. It took a significant amount of time, and whenever a variable needed to be changed, the entire process had to be redone, which invited errors because so much information was transferred by hand. Today, people work on this process dynamically in Excel and then run a script that consolidates all the information in minutes, sometimes less. This allows multiple scenarios to be created dynamically in a period of significant volatility and limited time. Using R for this process has substantially reduced the time required and ensures data consistency.
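
The consolidation step described above can be sketched in a few lines. This is only an illustration of the idea, written in Python rather than R, and not the script used at the company: the rows, the account names, the `consolidate` function, and the `salary_increase` parameter are all hypothetical. It sums amounts per account and recomputes the totals for each scenario, so changing one variable no longer means redoing the whole process by hand.

```python
from collections import defaultdict

# Hypothetical rows, as they might be read from separate Excel sheets:
# (cost_center, account, monthly_amount). All values are made up.
payroll_rows = [
    ("Production", "Salaries", 1000.0),
    ("Production", "Social charges", 250.0),
    ("Sales", "Salaries", 800.0),
    ("Sales", "Social charges", 200.0),
]


def consolidate(rows, salary_increase=0.0):
    """Sum amounts per account, scaling every row by (1 + salary_increase)."""
    totals = defaultdict(float)
    for _cost_center, account, amount in rows:
        totals[account] += amount * (1.0 + salary_increase)
    return dict(totals)


# Each scenario changes a single variable; the consolidation is simply re-run.
scenarios = {"base": 0.0, "raise_10pct": 0.10}
budget = {name: consolidate(payroll_rows, pct) for name, pct in scenarios.items()}

for name, totals in budget.items():
    print(name, {account: round(value, 2) for account, value in totals.items()})
```

Adding a new scenario is then a one-line change to `scenarios`, which is the property the interview highlights: the inputs stay in spreadsheets, and the consolidation is repeatable.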

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups organize, share information, and support each other worldwide. We have given grants over the past four years, encompassing over 68,000 members in 33 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.

Learn more

May 29

One More Step Forward: The R Consortium Submission Working Group’s Presentation to Swissmedic on Regulatory Submission using R and Shiny

By R Consortium Announcement, Blog

This post was authored by Gregory Chen, Biostatistics and Research Decision Sciences (BARDS), MSD, Switzerland, and Ning Leng, Product Development Data Sciences (PDD), F. Hoffmann-La Roche, USA

On January 30, 2024, the R Consortium Submission Working Group gave a presentation to Swissmedic in Bern, Switzerland, with 10 attendees in person and 50 online. It opened with the motivation for considering open source, and specifically R, for regulatory submissions. The group then presented the Pilot 1 and Pilot 2 submissions to the FDA as case studies.

The session concluded with an insightful discussion of about 20 minutes with the participants on the lessons learned, the key factors to address for broader adoption of R and Shiny in regulatory submissions, and what would add the most value in a regulatory Shiny app, namely:

How to deploy the submitted R package and Shiny app to guarantee that the clinical outcomes can be smoothly reproduced on the regulatory side
What would be the ultimate purpose of a regulatory Shiny app, and what are value-added features? Should the app only focus on offering interactivity to facilitate the review of tables, figures, and listings in the CSR, or should it also include features designed to enable exploratory, descriptive analysis (e.g., for subgroups) to a certain degree, which may greatly shorten back-and-forth inquiries between regulator and drug developer?
Validation and version traceability of the dependent open source R packages used in the submission package
How to leverage existing and emerging cross-industry initiatives (e.g., the R Consortium) in the open source space to support and ease potential technical issues during the adoption of R for submissions

Accompanying this post, the full presentation slide deck is made publicly available here, inviting further exploration and discussion.

The R Consortium’s presentation at Swissmedic represents a hopeful step toward more interactive, efficient, and transparent regulatory submissions. As the conversation between the R Consortium and regulatory bodies continues, our future collection of pilot projects hopefully will offer richer examples and templates to our growing R community within the pharmaceutical sector, spanning both regulatory and drug developer sides.

To find out more about the R Consortium Submission Working Group, please see: https://rconsortium.github.io/submissions-wg/

May 24

Collaborative Growth: The Botswana R User Group and Regional Partnerships

By R Consortium Blog

Last year, Edson Kambeu, founder and organizer of the Botswana R User Group, shared his plans to bring data into local businesses with the R Consortium in “New R Community in Botswana Wants to Implement Data Into Local Businesses.” He recently updated the R Consortium about the group’s growth and recent activities. The group has attracted a global audience through its online events and actively collaborates with R User Groups in the region.

The Botswana R User Group is seeking speakers for their upcoming online events. If you are an R expert interested in sharing your experience with R users in Botswana, please contact Edson at botswanarusers@gmail.com.

Please share about your background and involvement with the RUGS group.

My educational background is in finance. I pursued finance and investments for my master’s degree but also studied economics during my undergraduate years. Mathematics has been my strongest subject since primary school, and I’m passionate about it. This passion led me to develop an interest in Statistics and statistical software.

In the past, I mainly used SPSS, Stata, and EViews for my statistical analysis projects. Then, someone introduced me to data science. During my research on data science, I discovered that two popular programming languages are used for it. I installed Python for the first time, but I could not use it as I didn’t have a computer science background. So, I switched to R and started watching a few YouTube videos. From there, I continued to learn and improve my skills in R.

R was my first language of choice for Data Science. Currently, I use both R and Python for my work.

I am pursuing a Master’s in Computer Science with Data Science from the University of Sunderland. Our different modules use R and Python, and knowing both languages is helping me in my studies.

As I was learning R around 2019 and beginning to follow several R Users on Twitter, I discovered that small R communities gathered together to learn and share knowledge about it. R Ladies Johannesburg in South Africa inspired me the most, as they held events more frequently during that time. I then became interested in starting an R User community in Botswana.

In February 2020, I reached out to Heather Turner, who was scheduled to visit Botswana and other Southern African countries to conduct Introduction to R workshops. During our conversation, Heather provided me with all the information needed to start a community. As a result, in March 2020, Botswana R Users was established during Heather Turner’s Introduction to R workshop.

How has your group been doing since we last talked?

Our meetup group had about 100 members when we last talked to you. We now have almost 400 members. I have also observed that people from different countries are joining us. We are now a global meetup group rather than only a Botswana user group. This is because we mostly hold online meetups, which allow people from other countries to join.

Participants attending an online meetup hosted by Botswana R Users in collaboration with Eswatini R Users and Bulawayo R

We are, however, still committed to growing the local community. We want to see more local participation in our meetup group. Last year, we collaborated with R Ladies Gaborone to organize an introduction to R workshop to increase our local membership. We are pleased to announce that this year, we plan to hold another workshop as a pre-conference event in the upcoming Botswana Deep Learning Indaba conference in July 2024. This workshop will help us to increase our local membership further and create more awareness about our group.

Participant at the Introduction to R Workshop held in collaboration between Botswana R users and R Ladies Gaborone

We value collaborations with our partner R User meetup groups in Southern Africa. In recent years, we have had regular meetups involving collaborative efforts with the Bulawayo R User Group from Zimbabwe, the Eswatini R User Group from Eswatini (formerly Swaziland), and the Namibia R User Group from Namibia. We have established a routine of holding joint meetups almost every two months, depending on the availability of speakers. The idea is to grow our communities by increasing the frequency of activities.

Vebash Naidoo of RLadies Jozi presenting in an online meetup for Botswana R Users

You have a Meetup titled “GIS and Creating Dashboards in R. A case study of conflict events in Kenya.” Can you share more on the topic covered? Why this topic?

I had an opportunity to attend a series of workshops and webinars organized by the United Nations for their Datathon. I realized the importance of GIS in advancing sustainable development. In January 2024, I invited Godwin Murithi, a GIS specialist, to present a topic on GIS. The topic was “GIS and Creating Dashboards in R. A case study of conflict events in Kenya.” We wanted to expose our members to the rising field of GIS and show them how the R language and various packages can help solve GIS problems. It was a fascinating topic for our participants, and they loved it.

How has the use of R evolved in the industry since we last talked?

We are observing an increasing acceptance of the R programming language, particularly in universities. Some universities have adopted R as their primary language for statistics and quantitative programs. This trend indicates academic institutions’ growing preference for open source programming languages.

Any techniques you recommend using for planning for or during the event? (Github, zoom, other) Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?

Organizing is one of the most challenging tasks. To find speakers, I have primarily used Twitter (now called X) and LinkedIn to contact people who might want to speak at our meetups. Lately, there has been a problem with sending direct messages on Twitter: the messaging system has changed, and you now need to be verified to send a direct message. This disrupted my usual way of reaching people, so I have resorted to using LinkedIn to search for people interested in R and reach out to them. Sometimes speakers are too busy or cancel, which can be challenging. However, I have been successful in finding potential speakers through these platforms.

For the meetups themselves, we use video conferencing tools such as Zoom and Google Meet, and we usually rely on these two platforms. Sometimes a speaker may prefer using Google Meet over Zoom, so we try to be flexible and accommodate their preferences.

We also use GitHub. We have our own account, and if the speaker has their material on GitHub, they can share the link with us. Alternatively, they can provide us with the material directly, and we will upload it to our GitHub account for the community to access. Ultimately, it all depends on the speaker’s preference.

Please share about a project you are working on or have worked on using the R language. What is the goal/reason, result, or anything interesting, especially related to the industry you work in?

One of my recent school projects was to create a dashboard about UK imports and exports, which I completed towards the end of last year. I developed this project using Shiny and R packages such as shinydashboard, ggplot2, and dplyr.

I’m currently working on another project that is still in its early stages. The goal of this project is to identify areas in Botswana that require greater financial inclusion. I am currently gathering data and plan to utilize R and Python to apply geospatial techniques.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

I have observed that people find Quarto and GIS techniques interesting. Interest in these areas is growing in the community, and I foresee increasing use of R in GIS applications.

May 16

Gergely Daróczi’s Journey: Empowering R Users in Hungary

By R Consortium Blog

Gergely Daróczi, the founder and organizer of the Budapest Users of R Network, updated the R Consortium about the group’s recent activities. Last year, Gergely discussed the group’s inception and the challenges it faced during the pandemic. The group has now resumed in-person meetings, followed by networking sessions. The recent events organized by the group have focused on bioinformatics, large language models, and mathematical modeling.

Gergely Daróczi is an enthusiastic R user and package developer with a Ph.D. in Sociology; former assistant professor and founder of an R-based web reporting application at rapporter.net; ex Lead R Developer, then Director of Analytics at CARD.com; later Senior Director of Data Operations at System1; currently balancing the CTO role of Rx Studio and a part-time lecturer position at CEU, along with a few open source side projects. He has contributed to a number of scientific journal articles (mainly in social sciences but in medical sciences as well), maintains a dozen CRAN packages, and wrote a book titled “Mastering Data Analysis with R”.

Please share about your background and involvement with the RUGS group.

I have a background in social sciences, and it was during one of my university classes 20 years ago that I was introduced to the R language. We had to use R to run simulations related to the chaotic behavior of the Hungarian potato market. I found R more enjoyable and versatile than other GUI tools like IBM’s SPSS and started using it for other projects as well. Later, I even developed some additional packages for R.

I have been working with R for almost 20 years now. Despite my academic background in social sciences, I have worked in various industries, such as ad tech, fintech, and health tech, for the past 10 years.

In 2013, I attended my first useR! conference in Albacete, Spain, and it was a great experience to meet fellow R users from around the world. At the conference, I met Szilard Pafka, a Hungarian living in LA and organizer of the Los Angeles R User group. He suggested that I start an R User group in Hungary. After returning home, I decided to give it a shot, and we held our first meeting at the end of the summer of 2013. In a university room, it felt like there were only a dozen R users from academia. However, a lot has changed since then, as we now have almost 2,000 members in the local R User group, which exceeded my original expectations for such a small country like Hungary. It has been an interesting and great experience.

In Hungary, the community’s growth began slowly, with only 20 to 30 members in the first few years. However, it gradually increased over time. The community also hosted some well-known speakers such as Romain Francois, Matt Dowle, and Hadley Wickham, which further accelerated its growth. Additionally, the community organized the first satRday and the second eRum conference, which provided a platform for networking and knowledge sharing, further strengthening the community.

How has the group been doing since our last conversation?

After COVID, restarting the meetups was very challenging. We didn’t organize any virtual events because the main benefit of meetups was meeting in person, having face-to-face conversations, and getting to know each other. Therefore, we waited until the quarantine was over and it was safe to meet in person. We started slowly, organizing only two events per year with around 30 to 70 attendees, which was much lower than before COVID-19. However, it has been great to reconnect with old friends and make new ones.

Recently, we have been focusing on bioinformatics, and I was introduced to a local company that offered help with reaching out to speakers. Speakers drive these community meetings by bringing in a topic for their talk, which we continue to discuss afterward. Our past few events have focused on life sciences and have followed a lightning talk format, with around five 15-minute talks at each event. The topics were diverse, covering life sciences, some with LLMs involved, others focused on highly advanced math for modeling. We also had Shiny applications that showed the biodiversity of forests in Hungary and some open-source tools besides R.

Any techniques you recommend using for planning for or during the event?

I can only offer subjective experiences on the matter, but I have witnessed the success of both virtual and in-person communities. However, our focus is on providing an exceptional in-person experience. To achieve this, we search for a central venue that is easily accessible for most of our members. This can be challenging, even in Hungary, a small country, as it can be difficult for members from other cities to travel to the capital for meetups. Nevertheless, we do our best to find a central venue, such as a university or an industry partner who can offer a space for talks and a networking opportunity afterward.

It is important to have a room with plenty of chairs and a larger area for people to gather after the talks. We can provide soft drinks, beer, or wine along with some pizzas and have a chat for an hour or two after the talk. The venue is a crucial factor. It’s also important to have speakers who are interested in the community so that they will come to learn as well. It’s great to have speakers with interesting topics, but the most important thing for me is networking: coming together after the talks, getting to know others, learning about their struggles, maybe sharing some tips in person, becoming friends, or learning about opportunities in other industries. Networking and facilitating connections are crucial tasks for R user group organizers.

What trends do you currently see in R language?

Five years ago, machine learning models were a hot topic, and everyone discussed different implementations of GBM. However, things have changed, and nowadays, large language models (LLMs) dominate every discussion. LLMs are often implemented in languages other than R, making it difficult to train them from R. Despite this, there are still many use cases for LLMs, even in life sciences and health tech. However, caution must be taken when using AI and LLMs in these fields. Recently, at two bioinformatics events, some nice use cases of LLMs were shared with the audience. This has attracted new members interested in learning how to use AI or LLMs, which can be as simple as doing some API integrations in R, such as calling the ChatGPT API to generate text or images.

I’m excited that COVID restrictions are easing up and meetups are returning to normal. I can’t wait for the first in-person useR! conference in Salzburg in a few months. I highly recommend that anyone who can travel to Salzburg in July join us. The city has excellent train connections to European cities, so I hope many people from Europe can make it. I’m looking forward to attending an in-person useR! conference again.

Please share about a project you are currently working on or have worked on in the past using the R language. Goal/reason, result, anything interesting, especially related to the industry you work in?

Currently, I’m focusing on the ETL pipeline of the Spare Cores project, collecting information on cloud compute resources, which will soon have R bindings as well. In the past, I worked on R packages related to reporting (e.g. “pander”) and using R in production (e.g. “logger,” “dbr” or “boto3”). Recently, I have enjoyed integrating APIs and frameworks from other programming languages, such as Python (kudos to the reticulate team!), in R.

May 15

Enhancing R: The Vision and Impact of Jan Vitek’s MaintainR Initiative

By R Consortium Blog

The R Consortium recently interviewed Jan Vitek, a professor at Northeastern University’s Khoury College of Computer Sciences. He specializes in programming languages, compilers, and systems. Notably, he developed one of the first real-time Java virtual machines in collaboration with Boeing, which involved writing the navigation software of a ScanEagle UAV in Java and demonstrating that it out-performed the legacy version of the system written in C++. Vitek is actively involved in the programming language community and has held multiple leadership roles, including chairing SIGPLAN. In his spare time Vitek is a cinephile with a presence on Letterboxd and is the human of a dog named Olaf.

Vitek has been working on R for a decade. He is currently working on the MaintainR 2021 project, which aims to support and update the key components of the R ecosystem. The R Consortium is funding this project.

Can you provide an overview of the MaintainR 2021 project and its main objectives?

“When does a programming language die?” is the wrong question. Languages do not die, they slowly fade into irrelevance. A language fades away when it is no longer deemed useful enough for people to learn it, to convince their colleagues to adopt it in their work, and to maintain software projects written in it. Why does this happen? It comes about when newer languages that are better, or appear cooler, start to emerge. The rise of Python has shifted many machine learning users from R to Python. The success of Julia has pushed performance-sensitive users to develop new mathematical libraries in this new language. Is R fading?

The programming landscape is evolving, and R, which has been around since 1995, isn’t the newest option available. To remain relevant any complex language depends on a large ecosystem of software elements that must be maintained and fixed regularly. R is certainly complex and it has many dependencies. It relies on a core group of developers who are allowed to make changes in the key parts of the language. These developers, while, on average, being significantly younger than Joe Biden, are not getting younger.

My work focuses on trying to modernize R. I’ve been examining R from a computer science perspective for about a decade, focusing on software components such as just-in-time compilers. My group is currently in the midst of its third attempt at writing a compiler for R. This effort led me to bring my collaborator, Tomas Kalibera, into the R community through our projects, which sparked his desire to assist the community. This was all part of a natural extension of our research. The goal of the MaintainR project is to maintain key parts of the R environment, which are challenging for volunteers to sustain. We have not found companies willing to contribute top-notch software engineers for this maintenance effort, for their own reasons: perhaps they don’t have the resources, or they’re occupied with other tasks. Thus, our effort is focused on providing the necessary maintenance to prolong R’s usefulness.

The R ecosystem is dependent on the R interpreter, the core libraries, and CRAN. Which takes the most effort to maintain and why?

Everything in this project is challenging because the components vary greatly in size and heterogeneity. The interpreter is the smallest part, which everyone relies on. Then, there’s the core library, which is about ten times larger than the interpreter and is a mix of R, C, and Fortran. Fortran isn’t as popular as it used to be, and we encounter issues when compiling it with modern compilers like LLVM. Ensuring Fortran compiles across all desired architectures and operating systems has been a persistent challenge.

We’ve also had difficulties integrating patches into LLVM and GCC for this purpose. Changes in these compilers can lead to breakages in our environment. The CRAN packages contain a vast amount of code, potentially 100 times more than the core library. This creates an inverted pyramid scenario where the amount of code increases as you move up the structure.

Maintaining these packages is not our direct responsibility, but we can’t ignore them. Some are crucial for the users’ satisfaction. Some packages inevitably break as the language evolves and new versions are released. Tomas often has to approach maintainers to inform them of these issues. Sometimes, they respond and agree to implement fixes, but not always. Even when a technical fix might take just half a day, it can require a full week of negotiation with a developer to accept the patch.

This social aspect of software maintenance is significant and often the most challenging part. Developers have their own priorities, and a patch that doesn’t align with their goals can be seen as disruptive. Sometimes, the delay is simply because they are slow to respond. This complex interplay of technical and social challenges is a constant part of our efforts to keep the project moving forward.

Tomas Kalibera, a member of the core R team and supported as part of the MaintainR project, has implemented CheckR, a software tool for verification of the C code linked against the R interpreter. Can you explain how CheckR improves the R ecosystem and the overall quality of packages available to R users?

My team developed a tool called CheckR, which addresses issues arising from libraries written with a substantial amount of C code. The aim is to identify potentially misbehaving C that could cause unpredictable crashes, leading end users to mistakenly believe that R itself is faulty when, in fact, the issue may stem from a poorly written library or careless usage.

CheckR processes and transforms the C code, and an analyzer then identifies points where things might go wrong. A common issue it detects involves what we call “Protect bugs.” This happens when the R code sends a value down to C, and C must “protect” this value to prevent it from being reclaimed by the garbage collector. Sometimes, developers handle this in a hasty and imprecise manner. If they make an error, the value given to C can be reclaimed and reused, leading to memory corruption that could result in security flaws or crashes.
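To make the “Protect bug” idea concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: `ToyHeap`, `buggy`, and `careful` are hypothetical names invented for illustration, the collector is triggered by hand, and none of this is R's real C API (where the corresponding macros are `PROTECT` and `UNPROTECT`) or how CheckR itself works.

```python
class ToyHeap:
    """Toy heap with a protect stack (illustrative only, not R's allocator)."""

    def __init__(self):
        self.cells = {}       # address -> value currently stored there
        self.protected = []   # stack of addresses the collector must keep
        self.free = []        # reclaimed addresses available for reuse
        self.next_addr = 0

    def alloc(self, value):
        # Reuse a reclaimed address if one exists, otherwise take a fresh one.
        if self.free:
            addr = self.free.pop()
        else:
            addr = self.next_addr
            self.next_addr += 1
        self.cells[addr] = value
        return addr

    def protect(self, addr):
        self.protected.append(addr)
        return addr

    def unprotect(self, n):
        # Drop the n most recently protected addresses.
        del self.protected[len(self.protected) - n:]

    def collect(self):
        # Reclaim every cell that is not on the protect stack.
        for addr in list(self.cells):
            if addr not in self.protected:
                del self.cells[addr]
                self.free.append(addr)

    def read(self, addr):
        return self.cells.get(addr)


def buggy(heap):
    x = heap.alloc("x-data")   # forgot to protect x
    heap.collect()             # collector runs and reclaims x
    heap.alloc("y-data")       # new value reuses x's old address
    return heap.read(x)        # stale reference now sees y's data


def careful(heap):
    x = heap.protect(heap.alloc("x-data"))
    heap.collect()             # x survives because it is protected
    heap.alloc("y-data")       # gets a fresh address instead
    result = heap.read(x)
    heap.unprotect(1)          # balance the protect stack before returning
    return result


print(buggy(ToyHeap()))    # y-data
print(careful(ToyHeap()))  # x-data
```

In the buggy path the program does not crash; it silently reads the wrong data, which is why this class of error is hard to catch by testing alone and a good candidate for static analysis.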

CheckR is a static analyzer that flags potential issues but is not always definitive. It identifies possible problems, and we return to the developers to discuss whether these should be fixed. Often, developers are skeptical about the identified issues, which can lead to extended discussions. Sometimes, these issues might never occur, but often they do, and since CheckR is used daily across our entire codebase, it automatically generates reports that help us address these vulnerabilities.

The next steps with this tool aren’t always clear-cut because we can’t predict all potential issues. For example, one persistent challenge has been how the Windows operating system encodes Unicode characters, requiring months of troubleshooting. Could we have foreseen this particular issue? Not really. It’s part of the unpredictable nature of software development, where new problems can emerge at any time.

Have you been successful in extending the life of R by having CheckR run daily and helping with the interpreter, libraries, and CRAN?

Thousands of changes have been made to the R environment over time, and while I’d like to say that these changes have definitely improved things, as a scientist, I feel the need to provide concrete evidence, which I can’t always do. However, I can confidently say that each time we identify and fix a bug, the system has one less problem. The challenge, though, is that the potential for bugs can be virtually unlimited because new code is continually being added. It’s an ongoing process, and realistically, there might never be a point where we can declare it completely done.

How has it been working with the R Consortium? Would you recommend applying for an ISC grant to other R developers?

In our case, a lot of the work we do isn’t glamorous, and most volunteers are drawn to projects where they can attach their names to something flashy. Yet, there’s a continuous stream of necessary tasks that aren’t as appealing but are essential. Without a steady source of funding, sustaining efforts like ours would be impossible. The process we follow is streamlined, the community is welcoming, and your contributions can significantly impact a large user base.

The key message here is that funding is incredibly beneficial, especially for supporting those who contribute more gradually; the return on this investment is significant. For instance, in our project, without the funding, we couldn’t have supported the work of Tomas Kalibera, and nothing would have progressed. No company was willing to employ someone full-time for this, despite it being a crucial component of the ecosystem. Being able to provide funding allows us to engage someone who might otherwise have to spend their time on other activities and only contribute to this project in their spare time. Having someone fully dedicated for even a limited period is a tremendous advantage.

About ISC Funded Projects

A major goal of the R Consortium is to strengthen and improve the infrastructure supporting the R Ecosystem. We seek to accomplish this by funding projects that will improve both technical infrastructure and social infrastructure.

Learn more

May 14

Tackling Hurdles: Embracing Open Source Packages in Pharmaceutical Research

By R Consortium Announcement, Blog

The R Validation Hub’s next meeting is May 21st, 12:00 PM EST.

Join the call here!

In the dynamic field of pharmaceutical research, open source R packages offer incredible potential to innovate and enhance efficiency. The R Validation Hub is guiding the community building riskmetric and the riskassessment app. riskmetric is a framework to quantify an R package’s “risk of use” by assessing a number of meaningful metrics designed to evaluate package development best practices, code documentation, community engagement, and development sustainability. Together, the riskassessment app and the riskmetric package aim to provide some context for validation within regulated industries.

The benefits of utilizing open source tools in pharmaceutical projects are compelling, but adopting them in a regulated setting comes with hurdles. To address these hurdles and maximize the tools’ potential, join us at the R Validation Hub community meeting on May 21st at 12:00 PM EST. This gathering will focus on sharing best practices, troubleshooting common problems, and exploring innovative solutions together.

Embrace the opportunity to transform pharmaceutical research with us. Let’s innovate, collaborate, and overcome these hurdles together.

Join the call here!

May 13

The Evolution of Melbourne’s Business Analytics and R Business User Group

By R Consortium Blog

Maria Prokofieva, organizer of the Business Analytics and R Business User Group, spoke to the R Consortium last year about the Adoption of R by the Actuaries Community in Melbourne. Recently, Maria updated the R Consortium on the group’s focus, which has shifted towards business consultancy. The group provides a platform for graduate students to gain valuable industry experience and mentorship through various projects. The group is committed to ethical data governance and inclusive community building and prioritizes these values in all its initiatives.

Please share about your background and your involvement in the R Community.

My name is Maria Prokofieva. I work as a Leading ML Engineer at Mitchell Institute at Victoria University. I lead a stack of projects that use data to inform strategy and policy development. I am also an academic at the university, conducting research and teaching courses on ML/AI and data analysis. Through my work, I have the privilege of collaborating with various organizations, governments, and scholars to assist them in utilizing data to make decisions that impact the lives of many. I love open source, and what we see today is amazing – the world is changing. I have been a member of R and Python communities for many years, and seeing us grow is great.

How has your R User Group been doing since the last time we spoke?

The group has been performing well. As we grow, we focus on projects and become extremely busy with them. We already have a small community of people involved in different projects who also work together and communicate regularly. Once a month, we organize meetups where we present master classes—we moved to an in-person space but occasionally do online events. Our group has two main directions: business consultancy and business knowledge exchange.

We have been quite successful in building connections with bigger and smaller businesses interested in doing more data analysis. Some smaller businesses have staff who can perform their duties, and this is where community members have been fantastic.

The backbone of my community comprises my current and former Master’s students, who completed a course on business analytics. They are passionate about using R in everyday tasks and already possess some knowledge and experience, which they are happy to share. They are also interested in building connections and networking for their future jobs. This platform provides a mutually beneficial relationship for new students who get valuable industry experience through unpaid volunteering. These students receive mentorship from business leaders and senior software developers who share their programming knowledge and their knowledge of business negotiations and working with clients through the entire project life cycle.

We have been successful in working with cloud services such as AWS. We are actively exploring ways to automate data science on AWS and have several upcoming workshops where we will dive deeper into this topic. One workshop will focus on AWS Bedrock, where we will introduce non-technical business community members to employing large language models to perform their tasks. Our workshops focus on addressing specific problem-solving tasks rather than just the environment. We look into the business problems and how they can be solved.

It’s better to identify a problem and brainstorm solutions than to focus on tools. It’s fascinating to see how the community comes up with unique solutions to the problem. This approach is exactly what we need today, where no single preferred tool exists. Even if we use RStudio, we can easily integrate Python and other environments to accomplish the task. The focus should always be on the task guiding the process rather than the tools themselves.

Any recent project you have worked on using R?

Our recent project is based on utilising AWS Bedrock and GPT-4 to implement a Retrieval-Augmented Generation (RAG) system for a business. This system streamlines customer email communication by using internal documents and company FAQs to auto-generate tailored responses. For some of its components, we successfully integrated data analysis in R with a Python implementation. We also have a few projects using open source models and integrating transformer models from Hugging Face. R is a star for any data-wrangling tasks and data visualizations!
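As a rough illustration of the retrieval half of such a system, the sketch below picks the FAQ entry that best matches an incoming email and assembles a prompt around it. Everything here is an assumption made for illustration: the `FAQS` entries, the function names, and the word-overlap scoring are hypothetical stand-ins (production systems usually rank by embedding similarity), and the call to a language model is deliberately left out. It is not the group's actual implementation.

```python
import re

# Hypothetical FAQ snippets standing in for a company's internal documents.
FAQS = [
    "Refunds are processed within 5 business days of receiving the returned item.",
    "Standard shipping takes 3 to 7 days within Australia.",
    "Support is available by email on weekdays.",
]


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(email, documents, k=1):
    """Assemble the text that would be sent to a language model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(email, documents, k))
    return (
        f"Context:\n{context}\n\n"
        f"Customer email:\n{email}\n\n"
        "Draft a reply using only the context above."
    )


email = "When will my refund be processed for the returned item?"
print(build_prompt(email, FAQS))
```

Keeping retrieval separate from generation means the document store and the ranking method can be swapped without touching the prompt template.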

What are your plans for the upcoming months?

One area of interest that we plan to focus on is the use of responsible AI and responsible practices. This is crucial not only for AI but also for any data management that we undertake. Responsible modeling and responsible data science are important concepts that need more attention. We have seen instances where people intentionally or accidentally manipulate statistics, and this needs to change. We must focus on being data governors and ensure our analysis is responsible. This includes managing the data and the application size, as well as ensuring continuity of work. Many packages are available, but maintaining and updating them is challenging. Our future work is to contribute to the community by ensuring the continuity of our packages so developers can rely on them.

What trends do you currently see in R language and your industry? Any trends you see developing in the near future?

Many people talk about large language models, but the focus is often on their applicability and use cases. While many amazing models are available, businesses need to see how they can be practically applied to their needs. It’s not just about text generation – image generation and other areas are also important. When we share the use cases with the businesses, the possibilities they haven’t considered often amaze them. Therefore, there is a growing demand for case studies demonstrating these models’ practical applications rather than just tutorials.

We focus on the practical applications of tools. Our approach is to identify a problem and explore various solutions. I’m not interested in specific software packages but in finding efficient solutions to problems. If there’s a new tool that can help me solve a problem more effectively, I’m open to learning about it and sharing my experience with others.

The most important technique is interaction, networking, and keeping the connection alive among the group members. This is especially crucial when you have a larger proportion of new users in the group. It’s important to ensure that once they learn the skills, they understand that we are all busy and have business obligations to attend to. Therefore, it’s necessary to make sure that we keep the group relevant to all members, without getting carried away by our busy lives.

This is more about sitting together and engaging in problem-solving exercises, such as preparing for AWS certification. The group can help with other tasks, too, creating additional value beyond just learning. This is where the benefits of membership come in. Members are also motivated to give back to the community, as they can use their skills in real life, not just for learning purposes.

For instance, we have an AWS data practitioner interested in learning R. However, this is an opportunity for that person to share their expertise and contribute to the group. Similarly, we have a cybersecurity professional who is also interested in learning R. But this is an opportunity for them to present a use case on how machine learning can automate some of their tasks. They are also willing to share their knowledge, which may not have been considered. Therefore, it’s important to create a diverse experience for all members and engage with them in all possible ways. While it can be difficult to involve every group member, it’s crucial to understand their general interests and what’s important for them and focus on their professional development.

Take a moment to analyze where your members come from and their future plans and steps. Discuss their next career moves. It will be beneficial to provide networking opportunities where members can get referrals for job searches and advice for their next career move. These opportunities are quite important. Therefore, promotions should always be the end goal. You cannot become complacent or content with where you are because life is about growth and evolution.

How do I Join?

Learn more

Apr 30

R/Medicine is coming June 10-14, 2024 – See Top Five R Medicine Talks from Previous Years

By R Consortium Blog

Want to get a feel for the kind of content that will be available at R/Medicine 2024? We’re spotlighting the most engaging and educational sessions from past R Medicine Virtual Conferences. Whether you’re a healthcare professional, a data scientist, or simply curious about the intersection of healthcare and technology, these selected talks offer a wealth of knowledge and innovation using the R programming language. Dive into these sessions to enhance your understanding and skills in medical data science.

🔗 Register for the R Medicine 2024 Virtual Conference here!

1. GitHub Copilot in Rstudio, It’s Finally Here! – R Medicine Virtual Conference 2023

This session introduces GitHub Copilot for RStudio, a highly anticipated tool that enhances coding efficiency and innovation in medical research. Watch as experts demonstrate its capabilities and potential impact on healthcare data analysis.

2. Analyzing Geospatial Data in R (Sherrie Xie) – R/Medicine 2022 Virtual Conference

Featuring Sherrie Xie, this presentation explores the applications of geospatial data analysis within the healthcare sector using R. Gain insights into the importance of spatial data in understanding health trends and outcomes.

3. R/Medicine 101: Intro to R for Clinical Data (Stephan Kadauke, Joe Rudolf, Patrick Mathias) – R/Medicine 2022

This introductory session is perfect for those new to using R in a clinical setting. The speakers guide you through the basics and demonstrate how R can revolutionize medical research and patient care.

4. Introduction to R for Medical Data: Tidy Spreadsheets in Medical Research – R/Medicine 2021

UMich Prof and {medicaldata} author Peter Higgins will cover best practices for using medical data in spreadsheets like Excel and Google Sheets.

5. Multistate Data Using the {survival} Package – R/Medicine 2021

Explore the use of the {survival} package in R for analyzing multistate data. Discover the methods and models that are shaping the future of survival analysis in medical research.

Engage and Learn More!

Each of these sessions provides unique insights and practical tools for harnessing the power of R in medical research and healthcare analytics. Whether you are watching these for the first time or revisiting them, each video promises a deep dive into the capabilities of R that are driving advancements in the field.

📢 Mark Your Calendars! The R Medicine Conference for this year is scheduled for June 10-14. Register now to secure your spot and connect with a community of like-minded professionals!

🔗 Register for the R Medicine 2024 Virtual Conference here!

Remember to subscribe to the R Medicine channel for more updates and upcoming conference information. Enhance your skills in medical data science today!

Apr 29

Enhancing Clinical Trial Data Sharing with R Consortium’s R Submissions Working Group

By R Consortium Blog

The R Consortium’s R Submissions Working Group is spearheading an innovative approach to clinical trial data sharing, according to a feature in Nature. This initiative, led by Eric Nantz, a statistician at Eli Lilly in Indianapolis, Indiana, involves a pilot project with the US Food and Drug Administration (FDA). Sharing clinical trial data traditionally requires each scientist to install custom computational dashboards, a cumbersome and error-prone process.

Nantz elaborates on the benefits of using webR and WebAssembly in this context: “Using WebAssembly, [it] will minimize, from the reviewer’s perspective, many of the steps that they had to take to get the application running on their machines.” This technology not only simplifies the data sharing process but also has the potential to accelerate drug approval timelines and enhance collaborative research across various fields.

For more details, read the full article on Nature’s website (paid subscription required).

To further explore Eric Nantz’s insights on using R and Shiny in regulatory submissions, you can also check out the R/Adoption Series: R and Shiny in Regulatory Submissions with Eric Nantz.

Apr 26

Bridging Gaps: Tunis R User Group’s Journey in Democratizing R in Bioinformatics

By R Consortium Blog

Last year, the R Consortium had a conversation with Amal Tlili, the co-organizer of the Tunis R User Group, regarding the Use of R for Marketing and CRM in Tunisia. This year, Amal Boukteb and Hedia Tnani spoke to the R Consortium about the use of R for bioinformatics research in Tunisia and discussed the group’s efforts to bridge the gap between academia and industry. The Tunis R User Group hosts engaging virtual events to connect R enthusiasts across the MENA region and worldwide. Their events promote the use of R and foster knowledge and skill development in data science and bioinformatics.

Amal Boukteb is a PhD student at the National Institute for Agricultural Research of Tunisia (INRAT). She holds master’s degrees in Molecular Genetics and Biostatistics. Her PhD project focuses on Orobanche foetida, a parasitic plant threatening faba bean crops. She analyzed O. foetida genetic diversity in Tunisia with RADseq and studied faba bean gene expression during this parasitic plant attack using RNA-seq. With a passion for integrating bioinformatics and Plant biology, Amal is determined to make significant contributions to the implementation of sustainable agricultural practices.

Dr. Hedia Tnani is a Staff Scientist at Lieber Institute for Brain Development (LIBD). She did a PhD in molecular biology and genetics. Her current work focuses on addressing the complex challenge of RNA degradation in postmortem brain tissue samples. She’s also the co-founder of R-Ladies Tunis and Tunis R User Group. Through the Tunis R User Group, she wants to democratize bioinformatics and data science.

Hédia and Amal met during the Bioinformatics and Genome Analyses course at Pasteur Institute of Tunis in 2017. Amal joined Tunis R User Group as a Bioinformatics Event Organizer in 2023. With a deep commitment to inclusivity and empowerment, they’ve dedicated themselves to breaking down barriers faced by women and individuals from low-income countries when accessing education in these cutting-edge areas. By organizing workshops tailored to these communities, they aim to provide valuable skills and knowledge and foster a more diverse and equitable future in the bioinformatics field.

Please share about your background and involvement with the RUGS group.

Amal: We are biologists, and our academic curriculum did not include any programming courses. However, with the advancement of sequencing technologies, biologists are now facing the challenge of analyzing vast amounts of genomic data. This is a significant challenge for us. For my PhD project, I was involved in RNA-Seq and RAD-Seq projects. To overcome this challenge, I attended a course on analyzing genomic data using Unix, where I met Hedia for the first time. Additionally, in the framework of my thesis project, I had the opportunity to visit the Plant Immunity Group at RIKEN Yokohama in Japan for an internship. While there, I learned a lot from the talented scientists and their exciting research in bioinformatics.

When it comes to learning R programming for biologists, no specific courses are available. The only courses that exist are general ones. So, to overcome this gap, I started learning by myself. I attempted to understand the concepts by reading through error messages, package tutorials, and watching YouTube tutorials.

We realized we faced the same challenge after discussing this issue with our colleagues. We have genomic data that we need to analyze, but the available courses are located outside of Tunisia, primarily in Europe. Unfortunately, we lack the financial support to attend these courses. Additionally, obtaining a student visa for a temporary stay to attend such courses is a complex process. This challenge is not unique to Tunisians; it is also a struggle for other Africans and many biologists from middle- and low-income countries. Our Tunis R User Group aims to help others overcome this challenge and bridge this gap.

Hedia: I studied agronomy first and then pursued a master’s degree in plant breeding in Spain. Later, I completed my PhD in genetics. I did not know programming or R during my studies in Tunisia and Spain. However, when I started my postdoc at the International Rice Research Institute (IRRI) in the Philippines and first faced analyzing genomics data, I felt out of my depth. With no programming experience, learning R seemed like a mountain too steep to climb. This is a familiar story for many biologists transitioning from wet to dry labs, where code replaces beakers. Despite the daunting challenge, I persevered and taught myself R; eventually, it became an invaluable tool for my research. I’m also thankful to the great mentors I had at IRRI who helped me accelerate my learning curve. My journey wasn’t easy, but it was incredibly rewarding.

Learning bioinformatics can be challenging, especially in regions like Tunisia where resources are scarce and training abroad is so costly. Moreover, the need for bioinformatics training to solve biological problems has left many highly skilled biologists struggling to find a job in their field. Recognizing these obstacles, we formed a supportive community to facilitate collective learning and growth in bioinformatics and related fields such as data science and artificial intelligence.

Our community is a friendly, inclusive, and welcoming space for anyone passionate about bioinformatics, data science, artificial intelligence, and beyond. We’re all about growing together and learning from each other in a supportive environment. Whether you’re just starting out or have lots of experience, we encourage you to dive in, ask questions, and share your insights. We all rise by lifting others. Don’t worry about asking the “wrong” question. Every question is a chance to understand and learn something new. Come join us and be part of our journey of discovery and growth. We can’t wait to learn with you!

Can you share what the R community is like in Tunisia?

Hedia: In Tunisia, programming is mainly used in the industry, but it is not widely taught in the curriculum for biologists. This creates a gap between what is taught in the academic courses and what is required in the industry. As a result, individuals are expected to possess programming skills when they work in the industry. Still, they may not have been able to learn programming during their academic courses. This gap must be addressed to better prepare individuals for the job market.

Can you please update us about the group’s recent activities?

Amal: First, it is important to mention that Arabic is our native language in Tunisia. However, French is the predominant teaching language in many subjects, including biology and informatics. Despite this, we have decided to conduct our workshop in English for the Tunis user group for two main reasons. Firstly, we aim to bridge the gap between the academic skills acquired in French and English resources. Secondly, by using English as our teaching language, we can reach a broader audience of scientists who share our needs.

Conducting workshops in English also gives us the flexibility to choose speakers without language barriers. Our main goal is to reach a broad audience worldwide. During our workshops, we noticed participants from around the world, not just Tunisians. This is very important to us. We conducted workshops for biologists, such as the Genome-Wide Association Studies (GWAS) workshop, and we already have 5k views on our YouTube channel. It is interesting to see that people are very interested in our workshops. We also had the opportunity to collaborate with highly qualified researchers in their respective fields. Within our community, we were privileged to learn from Prof. Emerson Del Ponte, who generously shared his expertise in using R for Plant Disease Epidemiology.

We aim not only to cover biological subjects but also those related to artificial intelligence. Recently, we conducted two successful workshops on Building a Chatbot with OpenAI, Shiny and R, and Bioinformatics Analysis using Chatlize and ChatGPT. We strive to have a balance between biological and AI-related subjects to make the experience easier for our participants with the help of artificial intelligence.

What trends do you currently see in R language?

Hedia: In bioinformatics, there is a growing trend towards single-cell and spatial transcriptomics. Our latest event was an introduction to single-cell RNA-seq analysis. Additionally, packages based on OpenAI API are increasingly being used. For instance, many of those packages can be used by people who lack coding skills. This is particularly helpful because not all biologists possess coding skills, and it makes their work easier. Another trend we have noticed is using Quarto instead of R Markdown. Shiny is also gaining popularity in this field.

We have been receiving a lot of queries about bioinformatics workshops lately, particularly because they offer a diverse range of events, such as user groups. However, it can be challenging to find a specific topic. For instance, some R user groups may only hold one or two events yearly, whereas we host monthly bioinformatics events.

We value feedback from our attendees and gather suggestions from our latest events to improve our upcoming ones. Our events are designed to stay current with trends in the industry, and we often invite guest speakers to talk about relevant topics. For instance, during one of our workshops about Building a Chatbot with OpenAI, we had 200 participants whom we taught how to use R and create their chatbots. We learn from our experiences, and when we notice an interest in a particular area, we look to bring in speakers to teach on that subject.

Hedia: Our organization had a sponsorship for our Zoom account, an important tool for hosting events. One of the features that we utilize is the captions option, which allows participants from all over the world to have captions in their language and helps them follow the workshop. This is particularly helpful for those who may have difficulty understanding English. We are very grateful to Appsilon for their sponsorship of our Zoom account.

Amal: Thanks to Appsilon’s sponsorship, we have accepted more participants for our events. Previously, the number of participants was limited due to the capacity of our Zoom account. However, with this sponsorship, we can now handle up to 100 participants per event. This has made it easier for us to accept more subscribers and host successful workshops. We recently had an event with over 200 participants, which was a great success.

Hedia: We provide teaching materials for our speaker sessions on GitHub. You can find all the materials on YouTube and use them to reproduce what the speaker did during their session. We are always open to questions, especially if you encounter bugs while trying to reproduce the speaker’s work. Recently, we received an email from a participant experiencing a bug, and we had a great time figuring it out together. If you have any questions or problems, feel free to ask us for help, and we’ll do our best to assist you.

Are your events online, in-person, or hybrid?

Hedia: We are considering organizing hybrid events in the future, and we are searching for funding. We only have sponsorship for our Zoom, so we need additional funds to make this happen. We plan to organize events at multiple universities across the MENA region so important speakers can be followed in person and online. Amal, who is based in Tunis, has been in contact with many universities and academic professionals in the area. We’re currently exploring the best ways to make these hybrid events a reality, ensuring a seamless and enriching experience for everyone involved. Our goal is to make these events as engaging and accessible as possible, fostering a true sense of community.

We want to organize in-person events alongside our online events to provide something valuable to our community. When we meet in person, we can better understand their needs and challenges, which helps us to build and organize workshops that cater to their specific needs. Recently, Amal mentioned that some courses are not free in Tunisia, which can be a barrier for some people. Therefore, we aim to organize a free hybrid event for everyone who wants to join and learn with us. We hope to get funding for this initiative to provide this opportunity to all.

Amal: For my PhD project, I conducted research on population genomics and RNA-seq to investigate the interaction between plants and parasitic plants. Our work shed light on the genetic diversity of Orobanche foetida, a parasitic plant posing a significant threat to faba beans in Tunisia. Additionally, through RNA-seq analysis, we identified a potential target gene for developing resistant varieties of faba beans against this parasitic plant. Furthermore, I recently completed a bachelor’s degree in biostatistics, specifically focusing on Aphid diversity in Tunisia.

During my academic journey, R has been my primary tool for conducting comprehensive data analysis across all my research projects. After finishing my PhD, I aim to develop my expertise in bioinformatics further, specifically focusing on wheat genomics.

Hedia: I primarily use R as the main software for all my research projects. I am currently working on maintaining and improving a package called qsvaR. qsvaR is a tool that generates quality surrogate variable analysis for degradation correction in RNA-seq data. It contains functions that help remove the degradation effect in post-mortem brain tissue, making it a useful tool for generating basic data. We are currently working on a publication based on this work.

How do I Join?

Learn more