R Consortium talks to LA R Users founder Szilard Pafka about how the community started, how they adapted to the pandemic and how things have evolved in the past fifteen years for the group.
RC: What is the R community like in Los Angeles?
SP: The Los Angeles R Users Group/LA R meetup was founded in March 2009 by me, Szilard Pafka, and Professor Jan de Leeuw, the then Chair of the UCLA Statistics Department. It was not only the first R meetup in the area, but the very first data/data science meetup in LA (actually the term data science became more widely used only later). Right from this early beginning, we covered not only R, but also more broadly statistics, data visualization, machine learning, and more, all through the R language. The meetup quickly attracted a lot of people and it was one of the 3 earliest, largest, and most active R communities in the US along with San Francisco Bay Area and New York.
While for the first few years the events were mostly hosted at UCLA, we gradually moved to Santa Monica startup locations (including Google), and the focus shifted even more toward tools and techniques that can be used in day-to-day data science practice. We often had speakers from out of town, including some of the famous names in the R community. In 2014 we became part of DataScience.LA, a community of meetups with a website that made sharing information and knowledge even easier.
In 2018 a group of young organizers (Malcolm Barrett, Emil Hvitfeldt, George G Vega Yon, Keren Xu) started a separate sub-group (LA R East) with events at USC, and then in 2019, Amy Tzu-Yu Chen started LA R West. During the pandemic the events moved online, and while networking became more difficult, the positive side effect was that it became easier to “bring in” more renowned people as speakers.
The other side effect was that anyone from anywhere in the world could now join the meetup and enjoy the show. In July 2021 I (Szilard) moved to Texas, and the original meetup group we used for events has moved too. It will mostly hold online, USA-wide events rather than Los Angeles-focused ones, while the other organizers will continue the LA events (online for now and in person later, under the new brand of Southern California R, by joining efforts with other SoCal R groups).
RC: How has COVID affected your ability to connect with members?
SP: While moving online has had some positive side effects on the talks, the networking part of the meetup (which is the other, equally important component) has suffered. Organizers could still connect with the members, and the Q&As at the end of the talks have worked pretty well (that is, members asking questions and the speakers answering), but the lively, casual discussions between members after the meetup were completely missing. Over the many years before COVID I heard countless stories from members about how they got a job by striking up a conversation with employers at our events, and also the other way around (I have many friends who managed to hire great people via the meetups). Unfortunately, all this has been missing, along with the general hanging out and face-to-face meetings that slowly but surely build a community.
RC: Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting?
SP: I liked James Lamb’s presentation on Writing command-line interfaces to R. The topic was interesting and very technical, but there was something to learn for people at all levels and James is a fantastic speaker.
He also gave another awesome talk at the “sister” meetup (DataScience.LA) about LightGBM (one of the most popular gradient boosting machine implementations), which was particularly interesting since James has been leading the R side of that important machine learning project (and he can be credited with getting the library finally accepted to CRAN).
RC: What trends do you see in the R language affecting your organization over the next year?
SP: I’ve been using R at the company I’m working at since 2006. While 15 years ago some of the R tools were more “rough,” even then we managed to do most of what we needed for analytics in R. We had for example machine learning models trained and running in production in R and even sophisticated graphical monitoring dashboards built with cronjobs, R, the lattice R library and HTML templates. R was already a viable and great tool for all data work 15 years ago.
Since then things in R have become even easier, more robust, and more feature-rich. For example, shiny has made creating interactive graphical tools and dashboards a breeze. On the machine learning front, R now integrates all the top high-performance machine learning libraries used in practice and in business applications (e.g. gradient boosting machine libraries such as xgboost, lightgbm, h2o, or catboost, both on CPU and GPU, and also neural network libraries such as tensorflow or pytorch). And tremendous work has been done to improve R's performance, reliability, and integration with other tools, making R even easier to use in production.
RC: When is your next event? Please give details!
SP: The part of the meetup that has moved to Texas/online/pan-USA will have its next event in September, and it will involve using R in production (stay tuned!). I’m sure the organizers of the remaining SoCal R groups are also busy planning their next event.
RC: Of the Funded Projects by the R Consortium, do you have a favorite project? Why is it your favorite?
SP: First of all, I have to say that it is great that we have the R Consortium and the funding, and that we can sponsor so many projects. As for my preference, I’m really happy to see projects that improve R’s performance, speed, memory usage, reliability, and integration with other tools: in short, R’s ability to compete to be the best tool for data science and for use in production.
So my favorites from this year’s batch are “Development and maintenance of the Windows build infrastructure” and “MATTER 2.0: larger-than-memory data for R.” We need to compete with Python and be able to dispel views that R is not suitable for serious projects or in production.
RC: Of the Active Working Groups, which is your favorite? Why is it your favorite?
SP: For the same reason as above, my favorites are “Code Coverage,” “Distributed Computing” and “R / Business.”
RC: There are four projects that are R Consortium Top Level Projects. If you could add another project to this list for guaranteed funding for 3 years and a voting seat on the ISC, which project would you add?
SP: Tools for R in production. We need to make sure that we are seen as a viable competitor with Python for production. So, we need to have more tools to do so.
How do I Join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past 4 years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!