Essential Python Machine Learning Libraries

These essential Python libraries will save you a lot of time when doing data analysis and machine learning. I’ve listed the most widely used libraries and their main uses.



NumPy

  • Short for Numerical Python; the core package for numerical computing.
  • Fast multidimensional array object ndarray
  • Operations between arrays
  • Reading and writing array-based datasets to disk
  • Linear algebra, Fourier transforms, random number generation
  • C API that lets extensions and native C or C++ code access NumPy's data structures and computational facilities
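As a minimal sketch of the NumPy features listed above (array arithmetic, saving and loading, linear algebra, Fourier transforms, random numbers), something like the following could be used. The array values, the seed, and the in-memory buffer standing in for a file on disk are illustrative assumptions.

```python
import io

import numpy as np

# Build a 2-D ndarray and do element-wise arithmetic without Python loops.
a = np.arange(6, dtype=float).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]
b = np.ones((2, 3))
total = a + b                                 # operation between two arrays

# Write the array and read it back. An in-memory buffer stands in for a
# file on disk here; np.save also accepts a file name.
buffer = io.BytesIO()
np.save(buffer, a)
buffer.seek(0)
restored = np.load(buffer)

# Linear algebra: solve the 2x2 system m @ x = v.
m = np.array([[2.0, 1.0], [1.0, 3.0]])
v = np.array([3.0, 5.0])
x = np.linalg.solve(m, v)

# Discrete Fourier transform of a short constant signal.
spectrum = np.fft.fft(np.ones(4))

# Random numbers from a seeded generator (the seed value is arbitrary).
rng = np.random.default_rng(0)
samples = rng.normal(size=5)
```

Every operation above works on whole arrays at once, which is where most of the speed advantage over plain Python lists comes from.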




pandas

  • High-level data structures and functions that make working with structured or tabular data fast and easy.
  • DataFrame: a tabular, column-oriented data structure with both row and column labels; and Series: a one-dimensional labeled array object
  • Combines NumPy's array computing with the flexible data manipulation of relational databases
  • Reshape, slice and dice, aggregations, subsets of data
  • Data structures with labeled axes supporting automatic or explicit data alignment
  • Integrated time series functionality
  • The same data structures handle both time series and non-time series data
  • Arithmetic operations and reductions that preserve metadata
  • SQL-like relational operations such as merges (joins) and group-by aggregations
  • Flexible handling of missing data
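The pandas features listed above can be sketched roughly as follows. The table contents, column names, and dates are made up for illustration.

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value (all data here is hypothetical).
sales = pd.DataFrame(
    {"store": ["A", "A", "B", "B"], "units": [10.0, np.nan, 7.0, 5.0]}
)

# Flexible handling of missing data: replace NaN with 0 in one column.
filled = sales.fillna({"units": 0.0})

# Aggregation, similar to SQL's GROUP BY.
per_store = filled.groupby("store")["units"].sum()

# Relational join, similar to SQL's LEFT JOIN.
managers = pd.DataFrame({"store": ["A", "B"], "manager": ["Ann", "Bob"]})
merged = filled.merge(managers, on="store", how="left")

# Automatic alignment on labels: only "y" exists in both Series,
# so the other labels become NaN in the sum.
s1 = pd.Series([1.0, 2.0], index=["x", "y"])
s2 = pd.Series([10.0, 20.0], index=["y", "z"])
aligned = s1 + s2

# Time series: the same Series type, indexed by dates, resampled to 2-day bins.
ts = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)
two_day = ts.resample("2D").sum()
```

Note that the time series at the end is an ordinary Series with a date index, which is what "the same data structures for time series and non-time series data" means in practice.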


matplotlib

  • Plots and other two-dimensional data visualizations.
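A basic two-dimensional plot with matplotlib might look like the sketch below. The sine curve, the labels, and the choice of the non-interactive "Agg" backend (so that it runs without a display) are assumptions made for the example.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook or GUI session

import matplotlib.pyplot as plt
import numpy as np

# Sample one period of a sine wave.
x = np.linspace(0.0, 2.0 * np.pi, 100)

# Create a figure with a single set of axes and draw one labeled line.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("sin(x)")
ax.set_title("A simple line plot")
ax.legend()

# Render to PNG. An in-memory buffer is used here; passing a file name
# such as "sine.png" to savefig would write to disk instead.
png = io.BytesIO()
fig.savefig(png, format="png")
plt.close(fig)
```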


SciPy

  • Collection of packages addressing a number of different standard problem domains in scientific computing
  • scipy.integrate: numerical integration routines and differential equation solvers
  • scipy.linalg: Linear algebra routines and matrix decompositions
  • scipy.optimize: function optimizers (minimizers) and root-finding algorithms
  • scipy.signal: signal processing tools
  • scipy.sparse: sparse matrices and sparse linear system solvers
  • scipy.special: wrapper around SPECFUN, a Fortran library implementing many common mathematical functions, such as the gamma function
  • scipy.stats: continuous and discrete probability distributions (density functions, samplers, cumulative distribution functions), various statistical tests, and additional descriptive statistics
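To give a feel for the SciPy subpackages listed above, here is a small sketch that calls one routine from several of them. The specific functions, matrix, and intervals are chosen only for illustration.

```python
import numpy as np
from scipy import integrate, linalg, optimize, special, stats

# scipy.integrate: numerically integrate sin(x) over [0, pi].
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)

# scipy.linalg: LU decomposition, so that m == perm @ lower @ upper.
m = np.array([[4.0, 3.0], [6.0, 3.0]])
perm, lower, upper = linalg.lu(m)

# scipy.optimize: minimize a one-dimensional function ...
minimum = optimize.minimize_scalar(lambda t: (t - 2.0) ** 2)

# ... and find the root of t**2 - 2 inside the bracket [0, 2].
root = optimize.brentq(lambda t: t**2 - 2.0, 0.0, 2.0)

# scipy.special: the gamma function, which satisfies gamma(n) = (n - 1)!.
gamma_five = special.gamma(5.0)

# scipy.stats: cumulative distribution function of the standard normal.
cdf_at_zero = stats.norm.cdf(0.0)
```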


scikit-learn

  • Classification: nearest neighbors, random forest, logistic regression, SVM…
  • Regression: Lasso, ridge regression…
  • Clustering: k-means, spectral clustering…
  • Dimensionality reduction: PCA, feature selection, matrix factorization…
  • Model selection: grid search, cross-validation…
  • Preprocessing: feature extraction and normalization
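Several of the scikit-learn pieces listed above (preprocessing, classification, model selection) are typically combined in one workflow. The sketch below is one possible arrangement; the iris dataset, the logistic regression classifier, and the candidate values for `C` are arbitrary choices made for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Load a small built-in dataset and hold out a quarter of it for testing.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing + classification: normalize features, then fit a classifier.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Model selection: 5-fold cross-validated grid search over the
# regularization strength C of the logistic regression step.
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)

# Accuracy of the best model on data it has not seen during the search.
test_accuracy = search.score(X_test, y_test)
best_c = search.best_params_["logisticregression__C"]
```

Putting the scaler inside the pipeline means it is re-fit on each training fold, so no information from the validation folds leaks into preprocessing.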


statsmodels

  • Statistical analysis and econometrics
  • Regression models: linear regression, generalized linear models, robust linear models, linear mixed effects models…
  • Analysis of variance
  • Time series analysis
  • Nonparametric methods: Kernel density estimation and regression
  • Visualization
  • Statistical inference, uncertainty and p-values