Woo June Jung, founder of the R Korea Group (also on Facebook), recently talked to the R Consortium about his efforts to promote the use of R in Korea. He stressed the importance of communities and shared the group’s experience of hosting an annual R User Conference in Korea for six consecutive years. The R Korea Group stopped hosting the conference during the pandemic but hopes to start again this year. He also discussed his work on two accounting projects developed in R.
Woo June has played a vital role in building the R community in Korea and now hopes to start an R-Ladies chapter in Korea.
Please share about your background and your involvement in the R Community. What is your level of experience with the R language?
I don’t remember exactly, but I think it was around 2005 when I first encountered R, through my econometrics course in graduate school. With some experience in languages like C or Basic, I was fascinated by the amazing capabilities of R and began using it for research from that time onwards. Of course, I knew how to use SAS and SPSS, but I thought R was much better in many aspects.
Around 2010, the big data boom in Korea began and R started receiving attention as well. I think this was around the time when I started using R for analytical work. At first, I analyzed survey data, and later worked on web log analysis, text mining, sentiment analysis, anomaly detection, recommendation, etc., in various fields such as finance, commerce, communication, and manufacturing.
In addition to my two roles, as an analyst and researcher using R, I also founded and manage the Korean R user community (R Korea). The reason for this is that I hoped other people would learn about this amazing programming language. I had been using a Facebook group for building the community, but I am currently working on moving it to a web-based platform. In the early days of the community, I held free and open R seminars every other week for 2-3 years to expand our user base. With these seminars, the community grew rapidly and now has over 10,000 users. From 2014 to 2019, R Korea also held an annual conference called R User Conference in Korea (RUCK), which attracted at least 400 attendees each year, with some years exceeding 700 attendees. In 2020, we invited Hadley Wickham, but due to the COVID-19 situation, the conference was canceled, and we have not been able to hold it again since.
Unfortunately, during this period, people’s interest in R declined as artificial intelligence rapidly rose in popularity. However, some people who had positive memories of the RUCK conference contacted me and expressed their desire to keep the tradition going (special thanks to Kim Jin-seop and Na Young-jun). I hope we can hold RUCK again in 2023. Personally, I have made good friends through community activities. One of my closest friends is Jeon Hee-won, who developed the Korean morphological analysis package KoNLP. (Currently, the KoNLP package is not supported on CRAN, and I would like him to maintain the package again if his circumstances allow.)
The community provides a communication channel with people in related fields. Community participants can solve their own problems within the community and also help others. Sometimes, they can relieve daily fatigue with witty jokes. These activities can help develop capabilities as analysts. I think these are the reasons why people seek out communities.
I consistently receive a lot of help on the new trends in R through R-bloggers, and I am very interested in the activities of R-Ladies. I would like to launch R-Ladies Korea through a newly opened community website.
What industry are you currently in? How do you use R in your work?
Currently, I have two roles. One is a DX (digital transformation) researcher in the finance industry, and the other is a researcher in the accounting field. In my job, I mainly deal with the valuation modeling of unlisted companies and startups, which fortunately is also one of the research topics in accounting. As an accounting researcher, I am interested in the DX side of accounting, such as structured accounting information like XBRL and digital reporting. Since accounting information is provided to the market mostly as a mix of data types, such as numbers and text, text mining can also be used in accounting research. I use R for both work and research.
What trends do you currently see in R language and your industry? Any trends you see developing in the near future?
I currently work and do research in several fields, and artificial intelligence is the trend across them. For this reason, the demand for R in Korea has decreased significantly, and R is in a precarious position here. I think this is due to Korea’s sensitivity to trends and its small market size.
The scope of (statistical) data analysis is quite broad. However, data analysis is often simply split into data analysis (descriptive statistics, from a statistical perspective) and machine learning (even though machine learning is also a field within statistics). Special thanks to the authors who wrote ISLR. Many people claim that traditional but widely used models such as regression are outdated and that machine learning is a new approach. This is obviously wrong.
Nevertheless, as the IT industry has grown and the achievements of artificial intelligence and machine learning have become a major trend, interest in statistics has decreased significantly. People seem to be more interested in creating IT services or products and believe that using artificial intelligence can create better services. In this environment, Python, a general-purpose language, has become the dominant language. This trend is likely to continue for some time to come. R, whose strength is in what is commonly called statistics, is likely to face more difficult times in Korea. However, to get analysis results at the same level as those R outputs, Python requires much more code. The only thing R lacks is deep learning packages.
Please share about a project you are currently working on or have worked on in the past using the R language.
Currently, I am not working on any projects like developing packages. Instead, I am focusing on research papers related to accounting. Previously, I worked on projects called WARD (Wrangling Accounting-Related Data) and AIA (Accounting Information Analysis) related to accounting, but they are currently private.
WARD is a project that structures and digitizes accounting-related information. Accounting information is diverse in range and type. It does not simply mean financial statements, and it has been said that any publicly available data, on topics such as privacy or ESG, can be a subject of accounting research.
For example, one of the important research topics in accounting is firm value relevance, which requires stock price data. To evaluate the value of an unlisted company, it may be necessary to crawl funding news. Forensic techniques are also used to investigate accounting fraud, and recently, AI has also been used for continuous auditing. To conduct accounting research for these purposes, various types of data must be handled, such as tabular data, text data, and numerical and string data stored in databases or on the web, and various analysis techniques can be applied, such as visualization, numerical analysis, and text mining.
WARD aimed to develop a data package for accounting that provides these various types of data. However, as the data size grew larger, it became difficult to manage on GitHub, and it continues to be used only for private purposes. AIA provided functions to analyze the data provided by the accounting package, but it is also being operated privately.
What resources or techniques did you use?
I am performing all the work for my research papers in the Posit (RStudio) environment. Recently, I have been conducting research on information security, internal controls, ESG, and the valuation of unlisted companies. Of course, tidyverse is always one of the first packages I load, and I even use Python in the Posit environment. I use several other packages as well, but there are too many to list in detail.
How do I join?
R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!