New Data Science Degree in Zimbabwe Universities Fueling Interest and Growth in R Programming

By Blog

R Consortium had a discussion with Asimbongeni Dhlodhlo, one of the key leaders spearheading the ZimR UseRs group in Zimbabwe. He talks about how the COVID-19 pandemic has affected the community and how its members are dealing with the current global crisis. The community is dominated by students, and R is gaining popularity in the country, especially after the introduction of a degree in data science by some universities. Asimbongeni also has an interesting take on seeing R running in serverless environments.

What is the R community like in your country?

The R community in Zimbabwe is young, driven by demand for workers with programming skills. Our members are people who have a background in computer programming. Recently, data science degrees have been introduced at three universities in the country. Five years ago, we did not have data science as a degree in Zimbabwe. R is being used in these universities as part of their core curriculum, which has brought an increase in R members in the group.

Also, Zimbabwe is facing a serious unemployment problem, but NGOs are growing. These NGOs normally come with their own standards, such as the kind of software to be used, which has caused an upward trend in the use of R. R is being used by NGOs for data cleaning, analysis and visualization. The largest group making up the R community in Zimbabwe is university students, not people in business.

How has COVID affected your ability to connect with members?

It has been a horrible time for us. Here in Zimbabwe we had an early lockdown, even when the numbers for COVID cases were quite low. This really prevented us from meeting physically. The numbers are low even now, but our borders still remain closed.

We haven’t had a lot of events on just R. What we have done is to partner with the data science community, which is bigger than ours, including a Python one that is popular. Whenever there is a Python Zimbabwe or data science event, we are always part of it, but that was like 2-3 years ago. We have tried using virtual meetings; however, internet connectivity has been an issue of major concern. Because most of our users are students, we cannot hold long meetings, since the majority may not be able to afford the internet costs. We have been using WhatsApp as a discussion board where people ask questions and help each other with challenges. But it’s been a horrible time.

In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

We have been using virtual meetings, but did not get good numbers and the numbers drop quickly during virtual meetings. The maximum we can do is 2 hours, beyond that the numbers start falling. 

The main platform we are using right now is WhatsApp, but then it has limits on the number of people. We have never used GitHub. WhatsApp is the only tool we are using to connect, but I doubt that we can continue using it moving forward. People get excited when there is a physical conference; they want to meet, interact, and travel to different locations. They really enjoy that. Unfortunately, we cannot replicate that virtually. 

For us, the ideal thing is to meet physically. I’m just hoping this whole COVID crisis goes away.  

Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?

We had Kundai Gwatidzo who did an analysis on tracking the number of burials during COVID. He used Sentinel satellite imagery to track graves. On the satellite image there is a shadow that shows when there is a grave that has been dug. He went ahead to analyze that and then used a classification algorithm that tracks the shadows and produces a count for a specific cemetery in the past week. Using this, we were able to have a weekly count of graves, if there are no clouds. This particular presentation was a “wow” moment for us as a community.

What trends do you see in R language affecting your organization over the next year?

People have always enjoyed R since it is easier to learn than other languages. The learning curve allows people to jump on fast. What’s more interesting for me in the future is to see R being used in serverless environments like Docker plus other serverless platforms. A lot of work has been done to make R easier to use through packages, such as machine learning packages and R Markdown.

R has always been strong in visualization; ggplot2 is one of the best packages out there. It makes creating graphs a beautiful thing to do.

When is your next event? Please give details!

The plan was to have an event in February. We are hoping that it will be a physical event because here the policy is that if you are vaccinated you are allowed to meet physically. However, with the Omicron variant, it’s unclear. We are keeping our fingers crossed.

Of the Funded Projects by the R Consortium, do you have a favorite project?  Why is it your favorite?

For me the one that stands out is the Google Earth Engine (GEE) with R project since it is trying to link R to the Google Earth Engine. Geographic Information Systems (GIS) is data intensive and it requires a lot of computation power. If that comes together by leveraging Google’s computation power, it would be an exciting project to watch.

Calgary R User Group on the Importance of Math Education

By Blog

Jonathan Lin of the Calgary R User group (meetup | website) talks to R Consortium about the group's adaptation to the COVID world and its struggles with the higher production value of videos. He also talks about the importance of making statistics and math more approachable to kids to get them more interested in data science in general.

RC: What is the R community like in Calgary?

JL: Speaking of the R community, it’s pretty diverse in Calgary. The main organizers are academic, and I joined primarily with a business background. A large portion of our audience comes from academia and statistical backgrounds, so our discussions are oriented towards studies and dissertations. We also get businesses and petrochemical engineering talks, too, being an Oil & Gas centric hub. The combination makes for a very diverse and interesting set of discussions.

RC: How has COVID affected your ability to connect with members?

JL: Up until this year, we hosted exclusively in-person meetings. While we would post slides afterwards for people to use, we were limited by the time needed to properly record our meetings. COVID has forced us to be more accessible. Having online meetings has increased our reach. We were able to advertise to a much wider audience, and also get a bigger range of speakers from outside Calgary and even outside Canada. For instance, Edmonton is 3 hours away, and we were able to get a speaker that we wouldn’t have been able to consider before.

There are some downsides, however. On Zoom, we have less banter and conversation. Our Zoom meetings follow a fairly basic format – the presenter presents, our members ask a few questions, and then participants leave. We also find it’s harder to organically identify speaking opportunities – Chel Hee Lee (co-organizer) has done an amazing job finding speakers during this time, and we are always looking for contributors.

We wanted to experiment with other platforms to try and break out from this pattern. We tried Wonder, the online conferencing app. It showed a lot of promise, and the interaction was good. However, Zoom has significant momentum, so asking our users to try something new is a culture shift. The general shyness of online users (especially face to face) is a challenge – online, there is little incentive and lots of risk for individuals to engage with strangers.

There is no substitute for in-person communication… but after doing several meetings online, there is a lot we can learn and integrate back into our in-person meetings.

RC: In the past year, did you have to change your techniques to connect and collaborate with members?  For example, did you use GitHub, video conferencing, online discussion groups more?  Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

JL: We have used GitHub since day 1. This is a great way to show a presence both online and locally. It allows people to see prior meetings’ presentations, code, and other materials. This gave us a relatively easy transition for distributing materials when we started hosting online.

Recording videos is new to us, however. The need for reuse means we need to focus on higher production values, including additional post-event updates. 

The technical issues that come with it are worth the effort though. Having greater access to individuals with disabilities, or even those balancing family life, is the obvious benefit that comes from these changes. It enables access on our YouTube channel, allowing more people to access our content, regardless of their status. 

Ideally we’d continue recording videos, and would love ideas on how to easily put event videos onto YouTube.

RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting? 

JL: When we have presenters who have done academic work, they come in with a very novel approach to problem-solving. Abed Ayyad gave a talk about how Alberta has the best solar exposure in Canada. And he didn’t just tell us, but he showed us along with the calculations. Another was by Danielle Clarke on how bee habitats and biodiversity can be affected by the shape of the habitat, modelled with GIS and Landscape Analysis. She turned everything into a grid and showed how diverse each grid cell was and how a change in one area affects other areas. This application of R techniques to academic topics is something that tangibly demonstrates how to apply R to specific problems. I love seeing the cross-pollination of R (pun intended).

RC: What trends do you see in R language affecting your organization over the next year?

JL: We’re getting a steady stream of new R users coming into our organization, who want to learn how to use R. We are thinking of doing workshops and helping the community. It is one thing to host these user groups, but actively teaching and learning would be a highly effective way to improve our community.

RC: Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

JL: I was watching Twitter and we had a recent municipal election where one of the people running was investigated after initial voting had taken place. One analysis used R to review how his votes changed from pre-voting to the day of elections. None of them were in our CalgaryR User Group, but it’s awesome to see other Calgarians using R for their work.

RC: When is your next event? Please give details!

JL: Our next event (at time of interview) is with Cherri Zhang from the University of Calgary on the Validation of the underlying constructs of survey instruments, relating to the diagnosis and management of concussions. 

And anyone can join our CalgaryR Meetup group to hear about our 2022 and future events!

RC: Of the Funded Projects by the R Consortium, do you have a favorite project? Why is it your favorite?

JL: They’re all great, and the following are especially relevant to us:

  • Setting up an R-Girls-School Network is where we really want to get kids involved with coding earlier. 
  • The Java interactive visualization looks neat. I’m excited about that one. R is blessed with lots of visualization packages, and there are always advantages with one package over another, so I look forward to seeing what this one brings.

RC: Of the Active Working Groups, which is your favorite? Why is it your favorite?

JL: Distributed Computing and R Certification are both interesting to me.

Regarding the R Certification – How would you go about getting a common certification in R? Being previously “certified” in a different programming language, I found that certifications do not always indicate a strong competency in the language.

I feel like Distributed Computing could use some development in the R system due to some of the issues going on currently with data size.

RC: Four projects are R Consortium Top-Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?

For reference, the four current projects are: R Community Diversity and Inclusion, R-Hub, R-Ladies, and R User Group Support Program (RUGS).

JL: Similar to my earlier answer on supporting “R Girls,” one of my co-organizers (Cliff) is strongly passionate about “R kids,” supporting statistics and math fluency for kids. It would help to make coding and statistics more approachable and embedded in their minds before they reach high school. Visualizations, graphs, interactivity and discovery would all help make R more accessible to kids.

Osun R User Group in Nigeria talks about Spreading the Gospel of R across Africa

By Blog

The R Consortium spoke with Timothy Ogunleye on adapting to fit the system during the lockdown. We found out how the group was able to adjust and move from having so many in-person meetups to virtual sessions, largely through WhatsApp.

Timothy holds a Master of Science (M.Sc.) and Bachelor of Science (B.Sc. – Hons.) in Statistics with 2nd Class Upper Division from the University of Ilorin in Nigeria. Currently, he is close to finishing his doctorate degree under the supervision of Prof. A.O. Adejumo. 

Timothy has also trained at the Department of Computing of Macquarie University, Sydney, Australia, and has been certified as an R and Python Programmer (expert) in the field of Data Science. And he has obtained a national diploma (ND) and higher national diploma (HND) certification in Statistics from the Federal Polytechnic in Osun State in Nigeria.

Tim, as he is proudly called by his friends and colleagues, has at least 15 years of experience in both industry and academia. He has been a monitoring and evaluation (M&E) consultant, data analyst, and epidemiology and health consultant with more than 40 reports produced. He has worked with many local and international NGOs, including the United Nations. In addition, he has published at least 13 journal articles both locally and internationally. Tim is a Member of the Science Association of Nigeria (SAN), the Nigerian Statistical Association (NSA), the Nigerian Association of Mathematical Physics (NAMP); and the International Association for Statistical Computing (IASC).

What is the R community like in Nigeria?

In Nigeria currently, there are a lot of R users and user groups. We have preached the gospel of R to over 2000 people all over Nigeria. Our R symposiums were always packed with 150 – 200 attendees. Our R meetups were held in Osun State, here in Nigeria. The R user group in Osun has received grants from the R Consortium and also the Society of Research Software Engineers (SRSE), IESC. We have conducted a few symposiums using the universities in Osun.

In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

During COVID-19 here in Nigeria, we used Telegram and WhatsApp groups to connect with members. Every Sunday at 9 pm WAT, I would host an online session to teach R for 1 hour. We used the Zoom platform for video recordings and teachings. The classes were recorded with Zoom and shared across Telegram and WhatsApp for more engagement. WhatsApp is used to communicate updates. This is still ongoing, and we don’t take money.

Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting? 

As the team leader of the Osun User group, I am assisted by four other organizers who can also speak at our meetups/sessions. This means there is a lot of variety in the people who speak. One of my favorite presentations was “Data graphics using the Plotly command.” I am especially interested in the Plotly application for graphics. It can do very good pie charts.

What trends do you see in R language affecting your organization over the next year?

The team members are committed and passionate, and there is a need to spread the R programming language. The goal is to teach the R language starting in secondary schools. It is important to do this as technology is spreading and the need for data scientists increases.

R could be introduced early to assist students and help push the R programming language. We want to introduce programming to the Government, so it can be included in the syllabus. The challenge is some high school students do not have laptops for learning.

Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

Some of my group members have worked with newspaper publications. For a symposium, we needed to have some paid publicity. We got it for free and in return, we wanted to do something for the publication; this was where I started to look at how R can be used in journalism. There was data for different sectors, income, and expenditure, for Nigeria alone. I used R to get graphics and percentages to build data analysis. Since then, I don’t think I’ve worked on a data journalism project.

When is your next event? Please give details!

My user group and I have contacted R experts from other African countries: Kenya, Sudan, and Zimbabwe. We’re planning to conduct data science and machine learning workshops using the R language for Africans. More information cannot be given currently since the idea is still in planning. We are looking to send a grant request to the ISC at the R Consortium to help fund this idea. This event is slated for January 2022.

Of the Funded Projects by the R Consortium, do you have a favorite project? Why is it your favorite?

My favorite project is The RECON COVID-19 challenge: leveraging the R community to improve COVID-19 analytics resources. It is a vibrant project that talks about COVID-19 and how we can leverage R to fix most of the challenges with COVID-19.

Of the Active Working Groups, which is your favorite?  Why is it your favorite?

Well, I’d say my favorites are R/Business and R/Medicine. They are directly connected to real-world problems.

We are also thinking of starting a working group ourselves. This is still in the pipeline. We need the opinions of other team members to make a final decision, but currently, it’s not something I can say so much about. 

Oslo UseR! Group’s Diverse and Inclusive Environment Has Fostered a Resilient R Community

By Blog

R Consortium talked to Raoul Wolf of the Oslo UseR! Group about the wide adoption of R in Norway, both in academia and industry. He explained how the pandemic initially hindered activities of the group, but they bounced back. The group took the opportunity to collaborate with other R communities and invite high profile international speakers for virtual events.

Raoul currently works as a Senior Advisor and Digital Developer within Sustainable Geosolutions at Norwegian Geotechnical Institute. Hailing from Germany, Raoul was first introduced to R when he began his PhD studies in Norway eight years ago. 

When he’s not coding in R, Raoul enjoys the cultural life in Oslo and is really fond of the fantastic museums and the vibrant culinary scene. 

What is the R community like in Norway?

I cannot talk on behalf of the entire R community in Norway, as we have several hubs of the R community. Oslo, as the capital of Norway, has the largest R community. But the R communities in other cities, e.g. Bergen, Trondheim or Tromsø, are also doing great things. 

What I like about the R community is how very welcoming it is. This is true not only for Oslo, but all over Norway. It is a diverse community in terms of cultural backgrounds, professional backgrounds and identities. My professional background is from academia, and we have a vast base in academia in Oslo, but also people from consultancies and the industrial sectors. 

We are lucky to be in the capital city here and can collaborate with the different communities that are already here and using R.

I need to mention that while I am the main organizer of our meetup group in Oslo, I have two wonderful co-organizers, Bethan Cropp and Henrik Galligani Ræder. At a certain point, both were regulars at our events and wanted to take a more proactive role in the community building here, and that’s a blessing. Between the three of us, we recruit speakers and organize events.

In Oslo, we not only have the UseR! Oslo meetup group, but also a great R Ladies meetup group. Across Norway,  other R Ladies groups are really active as well, and we have had great contact with R Ladies Oslo and co-organized events. So we are really happy with this constellation in Oslo. 

How has COVID affected your ability to connect with members?

Massively. It was an enormous challenge initially, but also presented new opportunities. Before the pandemic, we only had physical meetups. When the pandemic hit Europe in March 2020, we had events planned through the summer with confirmed speakers. We postponed our March event, hoping things would get back to normal in April or May. Luckily, Oslo useR!’s previous organizer Dmytro Perepolkin was able to present virtually on short notice, and subsequently we were able to transfer some of our planned physical events to virtual events.

At the end of 2020, we noticed that more and more meetups across Europe had turned virtual. We connected with a few of those meetup groups, but also reached out to international speakers who we thought would be interesting for our Oslo UseR! group.

In January 2021 we had our first “international” speaker, and it was a successful event. We followed this pattern to mix up our local talent with some high-profile international speakers, and this has worked well for us. We have seen an increase in the number of people joining the meetup group and in the number of interactions. Along the way, we also figured out practicalities like moderating a chat box.

There was a steep learning curve, but the numbers went up considerably, and now we have over 1,900 members. It is very encouraging having 200+ people participating in our meetups. We started recording our meetups and put them on YouTube. Everyone can go back and watch the videos if they missed out on an event.

In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

We used the six months to figure out how we can continue with our meetups. We are lucky enough to have a colleague that is affiliated with the University of Oslo and they, very early on, had a professional Zoom license. At the beginning of the pandemic, it was a bottleneck to get people on a video conference like that, and we were lucky enough to have early access to such a platform. We quickly started reaching out to our local talent pool we were in contact with from before.

We are currently in contact with a tech community building where we plan to host our events once we go back to physical events. Additionally, we are exploring the possibility of hybrid events. Ideally we will have physical meetups with livestreams and recordings. This emphasizes our wish to stay in contact with as many people as possible, and to be more inclusive. 

Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?  

There were a lot of interesting presentations this year. The one that impressed me the most was the one we had in January, which was our first international meetup.

The speaker was Paul Bürkner from Germany, and he talked about Bayesian Multilevel Modeling and presented his brms package. I have used the package before, and it is always a treat to have developers present their own packages. We got a lot of feedback and interaction in the chat box, and had an excellent discussion throughout.

What trends do you see in R language affecting your organization over the next year?

That’s an interesting question! There are several parallel trends in R depending on your point of view. There is the ongoing trend of the tidyverse, which continues to develop into more ecosystems, like tidymodels. We get a lot of requests at the meetup for integrating R in a larger environment with databases, APIs and visualization tools. Besides the stereotypical use of R in academia, there is an increasing demand for using R in production and business use cases. Data journalism is also becoming more important, and there are some newspaper houses in Norway that are using R.

I don’t know if it qualifies as “trend”, but the positive spirit of dissemination and inclusion in R and the R community is really making a difference. We all witnessed how important it was during the pandemic, and it will be of equal importance moving forward.  

Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

The Norwegian Institute of Public Health has presented some of its COVID research and predictions using R. The modeling and visualizations were done in R. It was one of the biggest impacts locally to see R prominently displayed in the public. 

When is your next event? Please give details!

Our next meetup, “Wrapping Packages in R with {devtools} and Friends”, is scheduled for 16th of December. It will be our annual holiday season meetup, which traditionally is a bit more light-hearted. The first half will be about “wrapping packages,” demonstrating how to make packages in R. Afterwards, we want to have an open discussion with our community about how they have experienced the last year.

For next year, we are really excited about the planned events until spring. There are several interesting talks in the pipeline, and we will announce them as soon as we have finalized all details.  

Of the Funded Projects by the R Consortium, do you have a favorite project? Why is it your favorite?

Two projects, actually. One is the project financing the webchem package. I have used the package in my academic work, and have directly benefited from having it available.

The other is the consolidation of R Ladies groups. It is both unique and beautiful, and demonstrates the idea of R being a welcoming and inclusive environment.

Of the Active Working Groups, which is your favorite? Why is it your favorite?

The R Community Diversity and Inclusion working group is undoubtedly one of them. Even though I don’t have a direct connection, I think the R Validation Hub for the pharmaceutical industry is important. To get into regulatory territory is a huge step for any programming language, and I am happy to see that R is moving in this direction.

Successful R-based Test Package Submitted to FDA

By Blog

The R Consortium is happy to announce that on Nov 22nd, 2021, the R Submissions Working Group successfully submitted an R-based test submission package through the FDA eCTD gateway! The submission package has been received by the FDA staff who were able to reproduce the numerical results.

All submission materials can be found at: https://github.com/RConsortium/submissions-pilot1-to-fda

The pilot 1 test submission was an example submission package following eCTD specifications, which includes a proprietary R package, R scripts for analysis, an R-based analysis data reviewer guide, and other required eCTD components. To our knowledge, this is the first publicly available R-based FDA submission package. We hope this submission package and our learnings can serve as a good reference for future R-based regulatory submissions from different sponsors. Additional agency feedback will be shared in future communications.

About the R Consortium R Submissions Working Group

The R Consortium R Submissions Working Group is focused on improving practices for R-based clinical trial regulatory submissions.

To bring an experimental clinical product to market, electronic submission of data, computer programs, and relevant documentation is required by health authority agencies from different countries. In the past, submissions have been mainly based on the SAS language. 

In recent years, the use of open source languages, especially the R language, has become very popular in the pharmaceutical industry and research institutions. Although the health authorities accept submissions based on open source programming languages, sponsors may be hesitant to conduct submissions using open source languages due to a lack of working examples.

Therefore, the R Consortium R Submissions Working Group aims at providing R-based submission examples and identifying potential gaps during submission of these example packages. All materials, including submission examples and communications, are publicly available on the R Consortium GitHub page: https://github.com/RConsortium

The R Consortium R Submissions Working Group includes members from more than 10 pharmaceutical companies, as well as regulatory agencies. More details of the working group can be found at: https://rconsortium.github.io/submissions-wg/

The R Consortium R Submissions Working Group is open to anyone who is interested in joining. If interested, please contact Joseph Rickert at joseph.rickert@rstudio.com

Need to Code a Difficult Pharma Stats Table? The R Tables for Regulatory Submissions (RTRS) Working Group Wants to Know

By Blog

The R Consortium’s R Tables for Regulatory Submissions (RTRS) Working Group has made considerable progress in identifying and working through the issues involved with developing a modern R-based framework for creating tables. The goal is to make it easy for statistical programmers working in pharmaceutical companies to find the right R resources for creating any type of table that may be required to support a submission to the FDA or any other regulatory agency.

Currently there are at least six R packages that have the functionality to support some portion of the panoply of tables that might comprise an essential part of a statistical report. These packages include:

  • flextable
  • gt
  • huxtable
  • mmtable2
  • rtables
  • Tplyr

The trick is to establish a framework that makes it easy to use the combined features of relevant R packages to produce any table that is likely to show up in a production environment.  

RTRS wants examples of tables, those that are already part of your standard statistical submissions, and those that you would use if you could make them. 

              A: Drug X     B: Placebo    C: Combination
                (N=134)        (N=134)        (N=132)    
---------------------------------------------------------
AGE                                                      
  n               134            134            132      
  Mean (sd)   33.77 (6.55)   35.43 (7.9)    35.43 (7.72) 
  IQR              11            10              10      
  min - max     21 - 50        21 - 62        20 - 69    
BMRKR2                                                   
  LOW              50            45              40      
  MEDIUM           37            56              42      
  HIGH             47            33              50      
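
As a rough illustration of what goes into a table like the one above, here is a minimal sketch in base R only. The data are simulated and the helper name summarise_numeric() is hypothetical; this is not the code that produced the example, and the six packages listed earlier provide far richer layout and formatting than this.

```r
# Simulated data: the arm labels and group sizes mirror the example table,
# but every value below is randomly generated for illustration only.
set.seed(42)
arms <- c("A: Drug X", "B: Placebo", "C: Combination")
adsl <- data.frame(
  ARM = factor(rep(arms, times = c(134, 134, 132)), levels = arms),
  AGE = round(runif(400, min = 20, max = 69)),
  BMRKR2 = factor(
    sample(c("LOW", "MEDIUM", "HIGH"), 400, replace = TRUE),
    levels = c("LOW", "MEDIUM", "HIGH")
  )
)

# Hypothetical helper: summarise a numeric variable as formatted strings,
# one named element per row of the table.
summarise_numeric <- function(x) {
  c(
    "n"         = as.character(length(x)),
    "Mean (sd)" = sprintf("%.2f (%.2f)", mean(x), sd(x)),
    "IQR"       = format(IQR(x)),
    "min - max" = sprintf("%s - %s", min(x), max(x))
  )
}

# One column per arm: sapply() over the per-arm subsets gives a character matrix.
age_block <- sapply(split(adsl$AGE, adsl$ARM), summarise_numeric)

# Counts of each biomarker level by arm (rows = levels, columns = arms).
bmrkr_block <- unclass(table(adsl$BMRKR2, adsl$ARM))

# Column headers carrying the per-arm N.
col_n <- table(adsl$ARM)
header <- sprintf("%s (N=%d)", names(col_n), as.integer(col_n))

# Stack the two blocks into a single character matrix and print it.
summary_table <- rbind(age_block, bmrkr_block)
colnames(summary_table) <- header
print(noquote(summary_table))
```

Even this small case needs separate handling for numeric and categorical variables, string formatting for each statistic, and manual header construction, which is the kind of work a shared table framework is meant to take over.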

We want to make sure that we understand and explore the entire conceivable space of production tables, and we want to see to what extent we are already able to build them in R. So please send us your tables.

We will take them any way that is easy for you to provide. Open up an issue on our GitHub repository (https://github.com/RConsortium/rtrs-wg/issues) and drop them in. Something computable would be best, but we will take text or even screenshots. If there is some part of your example table that is particularly vexing to create, please point that out. Be careful not to include any proprietary information.

If we can already build your difficult table, we will show you how to do it in R. This is meant to be a benchmarking process to showcase the abilities of the packages so far and to reveal current limitations. If it turns out that we cannot build your table with R’s current table-making capabilities, then there is a good chance that we will add it to our “To Do” list.

If you think that you would like to become involved with our work, please let us know that, too. Include your email address and we will invite you to the next meeting.

R Skopje in North Macedonia Talks About the Challenges of Virtual Events for Smaller Communities

By Blog

The R Consortium talked to Novica Nakov of R Skopje about the challenges of managing an R User Group during the pandemic. Novica told us about the budding R community in North Macedonia and how shifting events online has not been useful for this tightly knit community. However, Novica hopes things will return to normal once they resume events in a physical space.

Novica is a long-term free software enthusiast and has been working with the free software community in Macedonia since 2001. Originally from a social sciences background, Novica initially contributed to the community mainly on policy issues, legal issues and localization. He later learned R programming during his postgraduate studies in the UK.

Before forming the R Skopje User Group, Novica was also involved in establishing a hacker space, Kika, in Skopje 10 years ago. This hacker space hosted Linux help forum days, general knowledge-sharing events and one-day conferences.

What is the R community like in North Macedonia?

North Macedonia is a small country, so it is lagging in certain areas as compared to the US. Before 2018-2019, there were very few Data Scientist or Data Analyst jobs available in the country. However, there has been a general shift in public interest ever since. The R community in Macedonia is still in its early stages.

How has COVID affected your ability to connect with members?

I started the R User Group in 2018, hoping to provide a knowledge-sharing space for people like myself who do not have a technical background but need to use a programming language for data analysis. We formed the User Group within the hacker space, as there was an established infrastructure available.  

Before the pandemic, we were having regular meetings on Tuesdays. These meetings were like hands-on workshops where people helped each other with problems they faced while working with R. We also had presentations where members shared their projects with the group. The government ordered a lockdown in early 2020, and we had to shut down the hacker space. We also had to cancel the lease on the space.

As it was a tight-knit community which thrived on one-to-one interactions, lockdown significantly reduced the quality of communication between the members.

In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

After the lockdown, we had 2-3 online events that were R-specific in which different speakers presented. Besides that, we also moved all our regular events to Google Meet, but it didn’t work really well. Since it was a small community, people spent most of the time catching up with each other and had very little interest in the actual slides, etc.

We are talking about a really small country and a really small pool of people interested in data science. Just to give you an idea about the scale, there is a group called Data Science Macedonia, where the main organizer is a Macedonian company ScaleFocus. They used to host events at a local university which attracted around 150 people. Companies such as Slice used to present in these events and discussed possibilities of employment, which was interesting for people looking to start working in the data field. 

Our group, on the contrary, consists of people who are simply curious to learn new programming languages, etc. for personal use. I think online meetings are not really good for such a group, because they don’t allow for person-to-person interactions for solving a particular problem. 

We attempted having online workshops, but it didn’t work out as it required a lot of troubleshooting, which is really difficult unless you have physical access to the participant’s system. I think online meetings are not great for workshops. 

We started talking about reopening the space, as many people are missing the ability to get together. I don’t know if this will happen, but if it does, we will resume our events in the physical space. But I am not very optimistic about the online events. 

Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting? 

I think the last presentation we had was by a Financial Data Analyst, Ljupcho Naumov, and the topic was optimizing your portfolio using R. This event turned out to be a huge success and around 300 people joined the online event. 

I am not sure if people joined the event to learn R or to get free financial advice about getting rich. Another reason for the success of the event was that it coincided with the central bank lifting the ban on personal accounts for trading in foreign brokerage companies.

What trends do you see in R language affecting your organization over the next year?

Given the situation, I don’t think R or development in any language would be as important as dealing with the health crisis.  

Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

One media organization in Macedonia sometimes runs data-driven stories, and I think they are connected to the European network called Organized Crime and Corruption Reporting Project (OCCRP).

Macedonia has joined this open data initiative recently and one non-government organization is running a project on creating datasets from local governments’ governance organizations.  

There’s also this project I was working on where we were trying to create local datasets from data from local governments. I guess this is something that can potentially be used in journalism. 

When is your next event? Please give details!

We don’t have a specific event in mind at the moment. However, as a lot of members in our events are women, I am hoping to connect them to R Ladies. R Ladies Belgrade is very active, and maybe we can have some sort of collaboration with them. I am also really interested in opening an R Ladies branch as we have this pleasant history of women getting involved in our space.

R Consortium Funding Delivers New Format for the R Journal

By Blog

By Di Cook, Mitch O’Hara-Wild, H. Sherry Zhang, Stephanie Kobakian, Michael Kane, Catherine Hurley, Simon Urbanek

The R Journal is a primary outlet for publishing research of interest to the R community. It publishes articles related to R packages (detailing the broader context, implementation details and examples of use), applications, comparisons and benchmarking, and reviews. The journal was born in 2009, superseding a long-running newsletter, to provide an outlet documenting advances in statistical computing, to discuss issues and research opportunities at the intersection of statistical analysis and software, and to encourage awareness of these advances and opportunities within the community.

Articles have traditionally been written using LaTeX and output to PDF, with code and data to reproduce results provided separately. Sweave emerged later as a commonly used dynamic document system and allowed R code to be embedded directly into a document whose output was a tex document that included output (including tables and graphs) generated from the embedded R code. This was essentially an accepted pre-processing step resulting in a tex file conforming to the structure required by the R Journal.

The operations of the R Journal are entirely run using R itself and, to our knowledge, it is the only journal of this kind. Common operations performed by editors, including acknowledging submissions, sending review requests, organising reviews, emailing authors about their paper’s status, compiling articles into an issue, and populating the web site, are all contained within the rj package, which is available on GitHub. It’s lovely to contemplate; R is the editorial workflow system for the R Journal.

The journal is produced entirely by volunteers, like much of the R community. It is an ever-changing group of people who get involved because they care about the community and the availability and survival of a publishing outlet for research related to R. Some members are lucky to have the support of their workplace, usually academic, and juggle the responsibilities with teaching, their own research or consulting. The editorial responsibilities include updating editorial management software (with the rj package) as well as managing article submissions.

Inspired by a talk by Yihui Xie in 2018, “Towards An Open-access, Fast, and Reproducible Journal”, and knowing that she would be committing to be an editor of the R Journal from 2019, Di submitted a grant to the R Consortium requesting funds to experiment with changing the format and operations. It was successfully funded at $50,000 USD to support the “hiring of an editorial assistant, and to experiment with alternative operations to streamline checking compliance of submitted papers, soliciting, constructing and returning reviews, copy-editing, construction and web delivery of new issues”.

Central to the new format was utilizing Rmarkdown for document writing. Just reviewing Yihui’s slides, I now see the wonderful gif describing a latex foundation (slide 1630) document, and I can confirm that in my first year as Editor-in-Chief (this year), working with the rj functions on the latex files to build an issue was very much like this: one little bump brings the whole system down. Rmarkdown is a much friendlier version of dynamic document building than Sweave, and writes to various outputs, including html, thanks to pandoc. It is much more elegant for graphics, which is dear to my heart. With html format there is also the prospect of including interactive graphics.

We are now happy to announce that there is a new way to write R Journal articles that delivers them in html. The new web site can be viewed in its development version, with the goal of rolling it out as the main site with the December issue this year.

The new web site has these features:

  • revised overview of the journal
  • updated instructions for authors
  • html formatted articles submitted as Rmd, in addition to the pdf format.

There are numerous benefits of the new html formatted articles, including the ability to include interactive graphics and a format that is more accessible for the blind members of the R community. (Jonathan Godfrey is suitably excited!) To get a glimpse of what is possible with interactive graphics, take a look at the article Conversations in Time where you can find, embedded, a plot with linked brushing between multiple plots and a movie.

The way to get started yourself is to use the rjtools package. The function create_article() creates a template in your project, containing a variety of key points, and an interactive plot made with ggplot2 and plotly. This package also allows you to check your article prior to submission, a little like the way the devtools package helps to check your R package prior to submitting to CRAN.

In the interim, the funding has supported numerous developments to the main workhorse package for operations, rj. This includes:

  • A new submission system: amazingly, the previous system was running on submissions to a wufoo form hosted by the 2013 editor, Hadley Wickham. The new system uses a Google form that populates a Google spreadsheet owned by the current editors, with ownership migrating to new editors as needed.
  • A pkgdown web site that includes better documentation on operations.
  • New functions like match_keywords(), which finds reviewers whose expertise matches the keywords for the article.
  • New functions for paper management such as tracking articles, and finding articles that are slow-moving in the repository.
  • A new issue building system. This is such a wonderful change, and a relief to the huge time commitment previously needed. With the new system it should be possible to bring out more than two issues per year to handle the larger number of articles.
  • A host of new functions to check articles, that have been migrated now to the rjtools package.
  • New functions to manage Associate Editor and reviewer load.
  • A system to manage the new process including Associate Editors in the review workflow.

The funding has also provided intermittent editorial assistance. Initially, the editorial assistant converted the long, long author guidelines into multiple check-lists. An initial check-list was used for identifying articles that were not conforming to the guidelines when they were submitted. The second check-list is targeted to the article proofing prior to publication. Authors are provided with this list when their paper is accepted and asked to check their paper carefully, which has helped reduce the load on the editors. The first issue of 2021 was also copy-edited by the editorial assistant, with painstakingly detailed suggested edits sent to the authors as red-inked pdfs to correct. We hope that you can see the language improvement in these articles.

In the near future we hope to develop the journal operations in two directions: a system for authors to track the progress of their paper through the review system, and a web-based system to help editors manage reviews.

To summarise, the funds from the R Consortium have provided the freedom for the R Journal editors to experiment, and dramatically improve the operations.

Eskisehir R Users Group in Turkey Talks About Pushing Boundaries During the Lockdown

By Blog

The R Consortium talked to Mustafa CAVUS of the Eskisehir R Users Group (also on Twitter) about handling group activities during the pandemic. Mustafa shared that online events during the pandemic have allowed their group to evolve as a global forum for Turkish R users and that hybrid events in the future will help them further increase their reach. 

Mustafa is working as an Assistant Professor at Eskisehir Technical University, Department of Statistics and is also a visiting Postdoc Researcher at Warsaw University of Technology, Faculty of Mathematics and Information Science. His research interests are in the areas of AutoML and Explainable AI.

What is the R community like in Turkey?

There are many R users in academia and industry in Turkey. Many people have become interested in package development and Shiny development beyond being a user. Especially in recent years the interest I have seen in the use of R in several fields in Turkey amazes me. From a statistician’s point of view I can see some interesting examples in urban planning and landscape architecture using geo-spatial data. I can say that there is a really large number of R users and enthusiasts in Turkey.

How has COVID affected your ability to connect with members?

Actually our user group was founded just before the pandemic. When we were planning our first meeting, we were unable to plan a face-to-face event due to the pandemic and we conducted all our activities online. Although this reduced the overall quality of communication, it also enabled us to meet many people outside of Eskisehir. 

In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

We certainly use these methods; for example, we often use video conferencing through Zoom. I think many events will be planned as hybrids in future as we get better at using these techniques. We are also using virtual tools to provide easy access to many people all over the world. During the pandemic period we produced some content in Turkish for our YouTube channel with Uğur Dar, co-organizer of our community.

Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting? 

As the Eskisehir R user group, we had the opportunity to get together with R users from different cities in Turkey and even the world at the events we started. Over time our group has evolved into a forum that is not just local. 

We turned this harmony which we achieved together with Turkish R users living in different countries into a great environment. 

The online conference we held in April 2021 with the support of the Why R? Foundation and, of course, the R Consortium, was the largest R conference held in Turkey to date, with over 2500 registered and 800 on-the-spot participants. We as an organization committee wrote about the details and our experiences in the “Conference Report of Why R? Turkey 2021 (PDF)”, which was also published in the latest issue of the R Journal. 

I think that each presentation at this event was interesting because it appealed to different audiences.

What trends do you see in R language affecting your organization over the next year?

Actually I know that with the conference held in April 2021 many researchers and professionals from industry started to work together by using R. My expectation for the next year is that this interaction will continue to increase, and we will see interesting usage examples of R in several disciplines. 

Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

As far as I know, not in our community, but there is a quite interesting project on data journalism in Turkey called VeriPie. A group of undergraduate students from the department of statistics of Middle East Technical University are engaged in up-to-date data journalism practices mostly using R.

When is your next event? Please give details!

Actually, we are working on an online conference that we plan to organize for Turkish R users in the second quarter of 2022. The overwhelming interest in the event we organized last year motivated us to organize a more extensive event. It will consist of lightning talks, regular talks and also invited speakers. Last year our event just consisted of invited speakers. 

Of the Funded Projects by the R Consortium, do you have a favorite project?  Why is it your favorite?

This is a really tough question I think. My favorite is the RECON COVID-19 Challenge. It is a really promising project and actually I had interest in it but I couldn’t find enough time to contribute. 

Of the Active Working Groups, which is your favorite? Why is it your favorite?

I would say R Medicine because I have been working on Medical datasets lately in my postdoc period. 

Birmingham R talks about the difficulties of socializing in an online space

By Blog

One of the difficult parts of running a group in an online space is maintaining social interactions that you would normally foster with in-person meetups. R Consortium talks to Adnan Fiaz about how he is attempting to create those interactions in the online meetups. 

Adnan is a Senior Data Scientist at National Grid in Birmingham, England. He is an analytics professional with a passion for mathematics and complex challenges. Outside of work and R, he has a keen interest in playing football, cinema and general aviation.

RC: What is the R community like in Birmingham?

AF: I took over about 3 years ago. Before that, there were several meetings, but they slowed down quite a bit. I came in with quite a few new ideas. We started with 1 or 2 meetings every quarter. We had a good rhythm until 2020. It was getting harder to get speakers, but I was able to find them. We were able to have about 20 people attend the meetings and that was quite good for Birmingham. It was a mix of academics from the local university, NHS staff from the area, and scattered R Users that used it in their businesses. We also had new users and people who had been using it for years. It was very diverse.

RC: How has COVID affected your ability to connect with members?

AF: We were struggling in the beginning because we didn’t know what to do. It depends on how you deal with it as an organizer and a community builder. I was leaning on face-to-face contact and others to help me out. Online, it was harder to engage with people and to ask for speakers from the local community. We didn’t have a meetup for 4 or 5 months. In the autumn we had a meetup. Then we didn’t have anything until the winter. At the beginning of this year, we decided to do meetings online and jumped on the bandwagon of the Global R community. We advertised their meetings on the Birmingham R page to give the community something to watch. Then we organized a meetup of our own in between their events. There is a lot less interaction with the members this way. People tend to be less interactive in online meetings and spaces. Since you must put a lot of effort into forcing socialization in these online spaces, I am looking forward to being able to go back in person.

RC: In the past year, did you have to change your techniques to connect and collaborate with members?  For example, did you use GitHub, video conferencing, online discussion groups more?  Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

AF:  Slack, Twitter, and Zoom are the technologies that we use mainly. We also have a GitHub page that we use. These allow a lot of people to attend our online meetings.

RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting? 

AF: We had several from the local meetup. We had a presentation/workshop from Birmingham University about mixed effect models by Bodo Winter. He explained mixed effect models from the basics of how they work as well as the more complex models that can be done. Once people had an understanding, they were able to ask more pointed questions. I was surprised because there was more engagement in the second part, mostly because people understood the concepts. There were a lot of questions, and people seemed to take it in a good way.

RC: What trends do you see in R language affecting your organization over the next year?

AF: We will probably see more people branching into how to build different statistical models. In the last year, we saw packages brought into the tidymodels framework, which builds upon caret by splitting it up and building specific parts. In short, having better support for model building.

RC: Do you know of any data journalism efforts by your members?  If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

AF: I think one that got a lot of attention was the COVID visualizations for the Financial Times by John Burn-Murdoch. They were very informative. He also spent a lot of time discussing how he created them and engaging with everyone on Twitter.

RC: When is your next event? Please give details!

AF: November 18th. We will be meeting in person.

RC: Of the Funded Projects by the R Consortium,  do you have a favorite project?  Why is it your favorite?

AF: The most useful one to me is the R Ladies Project. I have used the materials from that project to start the meetup again as well as tips to increase engagement.

RC: Of the Active Working Groups, which is your favorite?  Why is it your favorite?

AF: I think the most interesting one is the R Certification. I remember when it was first proposed that it would be useful for meetups to give them a framework. We started with small segments before the meetup to start learning R in a beginner’s course. Just 10 minutes before the meetup to warm up. The R Certification would help guide that.

RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add? 

The current four projects are:

AF: I saw some work from Heather Turner on the future of R developers. That would be interesting to get more focus on because it would be good if we had more diversity in the core team of R Development.

How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute.