dying language. Except that more than half of all data scientists use both in their daily routine. But why?
The R ecosystem is really powerful
I've never been a big fan of statistics. I know enough to analyze data perfectly well, but when I take a look at R and its packages, I feel like a grad student seeing Statistics 101 for the first time. R has a long tradition among academics and statistics experts. The sheer number of projects available is impressive. Right now, CRAN, the largest R repository, has over 12,000 actively maintained packages.
Need a lavaan test? You've got it! Factor analysis? It's right there in the psych package. Structural equations? Please, at least try to make it a challenge. Keep in mind that Python also has many of these things implemented, but for the really obscure stuff, R still reigns as king.
Since most R developers are academics, most of these packages are specifically designed to solve academic problems. For example, psych, the package I mentioned earlier, is aimed at psychologists who work in psychometrics.
R is for you if what you're looking for includes:
- Functions designed to prepare data for specific analysis
- Ready-made functions to perform analysis and interpret results
- Functions that create custom graphs for these analyses
- All of this supported by documentation based on academic books
Python's versatility is unbeatable
If R is the old but reliable Mustang your father bought when he was a teenager, then Python is a fancy Tesla. Python was originally built with readability in mind so that it could act as a gateway for new programmers, and it shows. The syntax is user-friendly, and even a new programmer can read the code of a Pythonista (an advanced developer) and get a general idea of what it does.
Since Python is a multipurpose language, it is much more versatile than R, which makes it the ideal language for integration with other platforms. Just as an example, a colleague of mine is currently developing a game in Python that records decision-making data, uploads it to a server where it is analyzed, and will eventually be automatically uploaded to a web page so other scientists can take a look at the data.
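A pipeline like the one my colleague is building can be sketched with nothing but Python's standard library. This is only an illustration, not their actual code: the record fields, the sample values, and the function names `serialize`, `analyze`, and `render_html` are all assumptions, and the network upload is replaced by passing a JSON string between functions.

```python
import json
import statistics
from html import escape

# Hypothetical decision records, as a game might log them (made-up values).
decisions = [
    {"player": "p1", "choice": "risky", "reaction_ms": 640},
    {"player": "p1", "choice": "safe", "reaction_ms": 520},
    {"player": "p2", "choice": "risky", "reaction_ms": 710},
    {"player": "p2", "choice": "safe", "reaction_ms": 530},
]


def serialize(records):
    """Game side: encode the records as JSON, the payload a client could upload."""
    return json.dumps(records)


def analyze(payload):
    """Server side: decode the payload and compute a few simple summaries."""
    records = json.loads(payload)
    times = [r["reaction_ms"] for r in records]
    risky = sum(1 for r in records if r["choice"] == "risky")
    return {
        "n": len(records),
        "mean_reaction_ms": statistics.mean(times),
        "risky_share": risky / len(records),
    }


def render_html(summary):
    """Web side: turn the summary into an HTML table for a results page."""
    rows = "".join(
        f"<tr><td>{escape(str(key))}</td><td>{escape(str(value))}</td></tr>"
        for key, value in summary.items()
    )
    return f"<table>{rows}</table>"


summary = analyze(serialize(decisions))
report = render_html(summary)
print(report)
```

All three stages live in one language here, which is the point: in a real project each function would sit in a different program (the game, the server, the website), but the data format and the code style stay the same throughout.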
A few years ago, most data scientists would have preferred R, as it had a more robust set of tools for machine learning. But that is no longer the case. Today Python equals (and sometimes surpasses) R as a language for artificial intelligence.
Python is growing at a tremendous pace, and the main reason is that the developer space (at least for data science) is shared between programmers and scientists, so you have people with theoretical knowledge working side by side with people with technical know-how.
Some critics believe that Python is harder to get into because the ecosystem is so large. In my experience, as a data scientist you only need to dabble with a small part of Python's tools and go further as you explore new possibilities. A new data scientist only needs five libraries: NumPy, scikit-learn, pandas, SciPy, and seaborn.
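To give a feel for how these libraries fit together, here is a minimal sketch that assumes they are installed. The data is synthetic and invented purely for illustration (a noisy straight line), and seaborn appears only as a comment because plotting needs a display or a file backend.

```python
import numpy as np
import pandas as pd
from scipy import stats
from sklearn.linear_model import LinearRegression

# NumPy: generate synthetic data (an assumption for illustration),
# where y is roughly 2*x + 1 plus some random noise.
rng = np.random.default_rng(seed=0)
x = rng.uniform(0, 10, size=200)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=200)

# pandas: hold the data in a labelled table and summarise it.
df = pd.DataFrame({"x": x, "y": y})
summary = df.describe()

# SciPy: a classical statistic, here the Pearson correlation.
r, p_value = stats.pearsonr(df["x"], df["y"])

# scikit-learn: fit a simple linear model to the same table.
model = LinearRegression().fit(df[["x"]], df["y"])
slope = float(model.coef_[0])
intercept = float(model.intercept_)

print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")

# seaborn: plot the data with a fitted regression line (optional).
# import seaborn as sns
# sns.regplot(data=df, x="x", y="y")
```

Each library handles one step (arrays, tables, statistics, models, plots), and they all pass the same data structures to one another, so learning a small part of each is enough to get started.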
The best tool for you
A programmer who wants to delve into data science should start with Python and then tinker with R when necessary, while an academic may feel more comfortable starting with R and then moving to Python as they run larger projects.
Most of us simply use one or the other depending on what we feel comfortable with and what each language can offer to solve a problem. The two also work together: Python users can call R functions through the rpy2 package, and R users can call Python through reticulate.
It's not a question of which one will end up beating the other, but rather which language you should primarily focus your time and effort on. In my humble opinion, with the huge expansion we are seeing with Python, I find very little reason to tell anyone to start with R. Even so, any data scientist worth their salt should have a good understanding of both languages.