Python and machine learning

June 3, 2024 Roberto Magalhães

Harness the power of Python in machine learning. Create intelligent, scalable ML models. Transform your data-driven decision making with Python.

For many, Python is not a language of the past but a language of the present and future. Although work on Python began in the late 1980s, it remains prominent in web development and in industries that rely on simple, flexible programming languages.

Python fits that description better than many other languages. In fact, Python is so simple to learn and use that it is often one of the first languages that newcomers to programming work with. What makes Python so easy to use? Firstly, it's an interpreted language, which means you don't need a separate compile step to run your code: just write it and run it.

Python also follows a philosophy that makes it possible to complete a task in just a few lines of code. And with so many modules, classes, exceptions, data types and libraries available, there is very little this language cannot do.

Although Python is widely used for the web and web applications, it has several other tricks up its sleeve. One such trick is machine learning. For those who don't know, Machine Learning is a subset of Artificial Intelligence in which computer algorithms improve automatically with experience. In other words, it means getting a computer to perform a task without explicitly programming it for that task.

Machine Learning has become widespread, with companies like Google, Amazon, LinkedIn, and Facebook depending on it, and it is driving rapid advances in modern technology.

Let's take a look at why you should consider Python for your machine learning needs.

Is Python good for machine learning?

In a word, yes. In fact, Python is widely regarded as the language of choice for machine learning. Although Python may be considered "slower" than some languages, its data manipulation capabilities are among the best.

But what makes Python so exceptional with machine learning? There are several reasons, including:

1- It is easy to learn (compared to other languages capable of integrating with ML).

2- It is open source, which means it can be better integrated with new technologies as they emerge.

3- It can interact with practically any programming language and platform.

4- It has an incredible library ecosystem.

5- It is very flexible, with the ability to follow an object-oriented or a scripting style.

6- It offers many visualization options.

7- It has great community support.

How do I start learning machine learning with Python?

To get started with Machine Learning and Python, you must first learn the Python language. If you plan to serve your models through a web application, it also helps to familiarize yourself with a framework like Django. With those under control, you should learn one of the ML-specific libraries. You will also need to understand the basics of machine learning. For example, every machine learning algorithm combines three main components:

1- Representation, which is how you represent knowledge.

2- Evaluation, which is the way of scoring candidate programs, using measures such as accuracy, precision, recall, squared error, likelihood, posterior probability, cost, margin, entropy and KL divergence.

3- Optimization, which is how candidate programs are generated and searched to find the best-scoring one.
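
The three components above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the toy data, the straight-line model, the function names and the learning rate are all chosen for the example and do not come from any particular library.

```python
import numpy as np

# Toy data (assumed for illustration): y is roughly 3*x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 2.0 + rng.normal(0.0, 0.1, size=200)


def predict(params, x):
    """Representation: the model is a straight line (slope and intercept)."""
    slope, intercept = params
    return slope * x + intercept


def mean_squared_error(params, x, y):
    """Evaluation: score a candidate model by its mean squared error."""
    return np.mean((predict(params, x) - y) ** 2)


def fit(x, y, learning_rate=0.1, steps=500):
    """Optimization: search for good parameters with gradient descent."""
    params = np.zeros(2)
    for _ in range(steps):
        error = predict(params, x) - y
        # Gradient of the mean squared error with respect to slope and intercept.
        gradient = np.array([2.0 * np.mean(error * x), 2.0 * np.mean(error)])
        params -= learning_rate * gradient
    return params


params = fit(x, y)
print("slope, intercept:", params)
print("mean squared error:", mean_squared_error(params, x, y))
```

The fitted slope and intercept should land close to the values used to generate the data. Swapping in a different representation, score or search method changes the algorithm without changing this overall structure.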

You will also need to understand the four types of machine learning:

1- Supervised learning is where the training data includes the desired result.

2- Unsupervised learning is where the training data does not include the desired result.

3- Semi-supervised learning is where the training data includes the desired result for only some of the examples.

4- Reinforcement learning is where an agent learns to achieve a goal in an uncertain and complex environment.
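
As a rough illustration of the first two types, the sketch below trains one supervised and one unsupervised model on the same measurements. It assumes scikit-learn is installed and uses the iris data bundled with it; the choice of k-nearest neighbors and k-means is only an example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

# X holds the measurements, y holds the known class of each flower.
X, y = load_iris(return_X_y=True)

# Supervised: the desired result (y) is part of the training data.
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X, y)
predicted_labels = classifier.predict(X[:5])
print("supervised predictions:", predicted_labels)

# Unsupervised: only X is given; the algorithm groups similar rows by itself.
clusterer = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = clusterer.fit_predict(X)
print("unsupervised cluster ids:", cluster_ids[:5])
```

Note that the cluster ids are arbitrary group numbers. They are not class labels, because the clustering step never sees `y`.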

Once you have a solid understanding of how machine learning really works, you'll be ready to begin your journey using Python to make it happen.

Is Python fast enough for machine learning?

Although Python is not the fastest programming language available, it has proven more than capable of handling the demands of Machine Learning. To overcome what some might consider a shortcoming, Python has a number of tools available to compensate. For example, Pandas is a tool used in data science to clean, transform, manipulate and analyze data. With these tools, you can ensure that the data used for machine learning is well prepared.
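
A minimal sketch of that kind of preparation with Pandas follows. The column names and values are made up for illustration. The operations run on whole columns at once, so the heavy lifting happens inside the library rather than in a Python loop.

```python
import numpy as np
import pandas as pd

# Hypothetical raw measurements with a missing value, a duplicate row,
# and inconsistently written labels.
raw = pd.DataFrame({
    "length_cm": [5.1, 4.9, np.nan, 6.3, 6.3],
    "species": ["setosa", "Setosa ", "setosa", "virginica", "virginica"],
})

clean = (
    raw.assign(species=raw["species"].str.strip().str.lower())  # normalize labels
    .drop_duplicates()                # remove repeated rows
    .dropna(subset=["length_cm"])     # drop rows without a measurement
    .reset_index(drop=True)
)

# Transform a whole column at once instead of looping over rows.
clean["length_mm"] = clean["length_cm"] * 10
print(clean)
```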

Which version of Python is best for machine learning?

At one point, the best version of Python to use for machine learning was 2.7. However, Python 2 reached end of life in January 2020, which means you will have to use Python 3. As of this writing, the latest stable release series is Python 3.12. If you want the latest features and security updates, your best bet is to use that or a newer version (if available).
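
If a script might still be started with an old interpreter, a small guard at the top can fail early with a clear message. This is only a sketch; the helper name `is_python3` is made up for this example.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given interpreter version is Python 3 or newer."""
    return version_info[0] >= 3


if not is_python3():
    raise SystemExit("Python 3 is required; Python 2 is no longer supported.")

print("Running Python {}.{}".format(sys.version_info[0], sys.version_info[1]))
```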

Python Machine Learning Library

Before we cover an example, you'll need to know which libraries are available for machine learning in Python. The most widely used libraries are:

NumPy is a library for large multidimensional arrays and array processing.
SciPy is a library that contains modules for optimization, linear algebra, integration and statistics.
Scikit-learn is the library used for classical machine learning algorithms (in particular, those for supervised and unsupervised learning).
Theano is used to define, evaluate, and optimize mathematical expressions involving multidimensional arrays.
TensorFlow is a framework that involves defining and executing calculations involving tensors.
Keras is a high-level neural network API.
PyTorch allows developers to perform computations on tensors with GPU acceleration.
Pandas is one of the most popular libraries for data analysis.
Matplotlib is the most used library for data visualization.

Python Machine Learning Example

Let's demonstrate a basic experiment using machine learning with Python on Ubuntu Linux. First you will need to install Python with the command:

 sudo apt-get install python3 -y

Once the installation is complete, install the necessary libraries with the command:

 sudo apt-get install python3-numpy python3-scipy python3-matplotlib python3-pandas python3-sympy python3-nose python3-sklearn -y

Then gain access to the Python console with the command:

 python3

In this console, check all required libraries by pasting the following and pressing Enter on your keyboard:

 # Python version
 import sys
 print('Python: {}'.format(sys.version))
 # SciPy
 import scipy
 print('scipy: {}'.format(scipy.__version__))
 # NumPy
 import numpy
 print('numpy: {}'.format(numpy.__version__))
 # matplotlib
 import matplotlib
 print('matplotlib: {}'.format(matplotlib.__version__))
 # pandas
 import pandas
 print('pandas: {}'.format(pandas.__version__))
 # scikit-learn
 import sklearn
 print('sklearn: {}'.format(sklearn.__version__))

The output should print the versions of each library.

Next, let's load the libraries. Paste the following into the console:

 # Load libraries
 from pandas import read_csv
 from pandas.plotting import scatter_matrix
 from matplotlib import pyplot
 from sklearn.model_selection import train_test_split
 from sklearn.model_selection import cross_val_score
 from sklearn.model_selection import StratifiedKFold
 from sklearn.metrics import classification_report
 from sklearn.metrics import confusion_matrix
 from sklearn.metrics import accuracy_score
 from sklearn.linear_model import LogisticRegression
 from sklearn.tree import DecisionTreeClassifier
 from sklearn.neighbors import KNeighborsClassifier
 from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
 from sklearn.naive_bayes import GaussianNB
 from sklearn.svm import SVC

Press Enter to return to the Python console.

Load a dataset for the experiment with the following:

 # Load dataset
 url = "..."  # path or URL of the dataset CSV file
 names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
 dataset = read_csv(url, names=names)

View the first 20 lines of data with the command:

 # head
 print(dataset.head(20))

You should see the first 20 lines of data printed.

Let's summarize all the data by pasting the following into the Python console:

 # summarize the data
 from pandas import read_csv
 # Load dataset
 url = "..."  # path or URL of the dataset CSV file
 names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'class']
 dataset = read_csv(url, names=names)
 # shape: (number of rows, number of columns)
 print(dataset.shape)
 # head
 print(dataset.head(20))
 # descriptions: count, mean, min, max and percentiles per column
 print(dataset.describe())
 # class distribution: number of rows per class
 print(dataset.groupby('class').size())
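
The column names suggest the classic iris flower data. Assuming so, a self-contained variant of the same summary can use the copy bundled with scikit-learn instead of a CSV file. This sketch assumes scikit-learn 0.23 or newer (for `as_frame=True`), and the renaming step exists only to mirror the column names used above.

```python
from sklearn.datasets import load_iris

# Assumption: the bundled iris data stands in for the CSV file.
iris = load_iris(as_frame=True)
dataset = iris.frame.rename(columns={
    "sepal length (cm)": "sepal-length",
    "sepal width (cm)": "sepal-width",
    "petal length (cm)": "petal-length",
    "petal width (cm)": "petal-width",
    "target": "class",
})
# Replace the integer class codes with the species names.
dataset["class"] = dataset["class"].map(dict(enumerate(iris.target_names)))

print(dataset.shape)                    # (rows, columns)
print(dataset.describe())               # per-column summary statistics
print(dataset.groupby("class").size())  # number of rows per class
```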

Finally, you can build and evaluate models from the loaded data by pasting the following into the console:

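 # Split features and labels, holding out 20% of the rows for validation
 # (assumed split; X_train and Y_train are not defined earlier)
 from sklearn.svm import SVC
 array = dataset.values
 X = array[:, 0:4]
 y = array[:, 4]
 X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1)
 # Spot Check Algorithms
 models = []
 models.append(('LR', LogisticRegression(solver="liblinear")))
 models.append(('LDA', LinearDiscriminantAnalysis()))
 models.append(('KNN', KNeighborsClassifier()))
 models.append(('CART', DecisionTreeClassifier()))
 models.append(('NB', GaussianNB()))
 models.append(('SVM', SVC(gamma="auto")))
 # evaluate each model in turn with 10-fold stratified cross-validation
 results = []
 names = []
 for name, model in models:
     kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
     cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
     results.append(cv_results)
     names.append(name)
     # mean accuracy across folds, with the standard deviation in parentheses
     print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))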

After pasting the above text into the console, press Enter, and it will print the different models and accuracy estimates, which can be compared to choose the most accurate one.
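
Once a model has been chosen, a natural next step is to train it on the training rows and check it against rows it has never seen. The sketch below is one way to do that, not the only one. It assumes the iris data bundled with scikit-learn and an 80/20 split, and it uses SVC purely as a stand-in for whichever model scored best.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: bundled iris data and an 80/20 split stand in for the data above.
X, y = load_iris(return_X_y=True)
X_train, X_validation, Y_train, Y_validation = train_test_split(
    X, y, test_size=0.20, random_state=1, stratify=y
)

# Assumption: SVC is only an example of "the chosen model".
model = SVC(gamma="auto")
model.fit(X_train, Y_train)
predictions = model.predict(X_validation)

# Compare the predictions with the held-out labels.
accuracy = accuracy_score(Y_validation, predictions)
matrix = confusion_matrix(Y_validation, predictions)
print("accuracy:", accuracy)
print(matrix)
print(classification_report(Y_validation, predictions))
```

The confusion matrix shows which classes get mixed up, and the classification report breaks the result down into precision and recall per class.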

And that's a pretty simple example of machine learning with Python.

Conclusion

And this is how and why Python is considered the best language for Machine Learning. With this information, you are ready to begin your journey in this fascinating field. There's a lot to learn and we've only just scratched the surface, so expect to spend a good amount of time learning the basics of this complicated technology.
