
OBOE - Collaborative Filtering AutoML

October 13, 2022

Meet the Author : Mr. Bharani Kumar

Bharani Kumar Depuru is a well-known IT personality from Hyderabad. He is the Founder and Director of Innodatatics Pvt Ltd and 360DigiTMG. An IIT and ISB alumnus with more than 17 years of experience, he has held prominent positions at leading IT firms such as HSBC, ITC Infotech, Infosys, and Deloitte. He is an IT consultant specializing in Industrial Revolution 4.0 implementation, Data Analytics practice setup, Artificial Intelligence, Big Data Analytics, Industrial IoT, Business Intelligence, and Business Management. Bharani Kumar is also the chief trainer at 360DigiTMG, where for more than ten years he has been making the IT transition journey easier for his students. 360DigiTMG is at the forefront of delivering quality education, thereby bridging the gap between academia and industry.


AutoML and the frameworks built around it are relatively new technologies that support the Machine Learning model-building process. Today, AutoML frameworks that work with a variety of data are available in both open-source and paid versions. Open-source packages include AutoGluon, Auto-PyTorch, MLJAR, H2O AutoML, MLBox, TPOT, AutoKeras, Auto-sklearn, and Autoworker, while commercially available packages include Darwin, DataRobot, and Google AutoML.

Image courtesy: https://towardsdatascience.com/autogluon-deep-learning-automl-5cdb4e2388ec

Amazon Web Services recently launched the open-source library AutoGluon, which allows developers to build Deep Learning models on image, text, or tabular data using just a few lines of code. The toolbox is intended to be an easy-to-use and easy-to-extend AutoML toolkit for both Machine Learning beginners and experts. Its main advantage is that a short piece of code can handle several tasks: automatic hyperparameter tuning, model selection, data pre-processing such as data cleansing and feature engineering, and automatic application of state-of-the-art (SOTA) Deep Learning models.

In this article, let's explore the AutoGluon features that automate Machine Learning tasks. AutoGluon supports ensembling, Deep Learning, and real-world applications spanning image, text, and tabular data, which makes it an easy-to-use and easy-to-extend AutoML package. With only a few lines of code, we can prototype Deep Learning and classical ML solutions quickly. Users can apply state-of-the-art techniques in the appropriate context without in-depth knowledge of them. The main benefits of the package are automatic hyperparameter tuning, model selection and ensembling, architecture search, and data processing. Models and data pipelines created with AutoGluon are easy to improve or tune, and they can be customized to your use case and domain of expertise.

The AutoGluon library, developed by AWS in 2020, is one of the newer libraries in this space, and it helps achieve strong predictive performance across a range of Machine Learning and Deep Learning models. It is an open-source AutoML toolkit whose installation is supported on Linux and Mac operating systems but is not officially supported on Windows. AutoGluon is simple in a specific sense: classification and regression models can be trained and deployed with only a few lines of code. Users can pass in raw data without doing any feature engineering or data manipulation first. The package also lets us obtain the best model it can find under a specified time constraint. In addition, it is a fault-tolerant AutoML framework: training resumes after an interruption, and users can inspect all the intermediate steps.

How do you install AutoGluon?

To install AutoGluon, we need Python version 3.7, 3.8, or 3.9. The package can then be installed with the following commands.

pip3 install -U pip

pip3 install -U setuptools wheel

pip3 install "torch>=1.0,<1.11+cpu" -f

pip3 install autogluon
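
Before running the commands above, it can help to confirm that the interpreter matches one of the Python versions listed. The following is a minimal sketch using only the standard library; the helper name is_supported is hypothetical and is not part of AutoGluon.

```python
import sys

# Interpreter versions the article lists as supported at the time of writing.
SUPPORTED_VERSIONS = {(3, 7), (3, 8), (3, 9)}


def is_supported(version_info=None):
    """Return True if the (major, minor) pair is one of the listed versions.

    Hypothetical helper for illustration; it only compares version numbers
    and does not inspect the operating system or installed packages.
    """
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) in SUPPORTED_VERSIONS


# Report on the interpreter that is running this script.
major, minor = sys.version_info[0], sys.version_info[1]
print(f"Python {major}.{minor} listed as supported: {is_supported()}")
```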

AutoGluon is divided into sub-modules dedicated to tabular, text, or image data. Installing only a specific sub-module reduces the number of dependencies required: run python3 -m pip install <submodule>, where <submodule> is the sub-module for the kind of data you need. The available sub-modules include autogluon.tabular, autogluon.vision, autogluon.text, autogluon.core, and autogluon.features. AutoGluon can therefore be used for several categories of task: tabular prediction, image prediction, object detection, text prediction, and multimodal prediction.

To fit and fine-tune a tabular classification model, we call TabularPredictor(label='stroke').fit(train_data=df_train, verbosity=2, presets='best_quality'). Because the outcome column contains only the two unique labels '0' and '1', AutoGluon recognizes this as a classification problem. AutoGluon trains several different models and selects the best one automatically. For example, in one tabular regression task, AutoGluon trained 11 models and ranked KNN (KNeighborsDist_BAG_L1) as the best model, followed by XGBoost (XGBoost_BAG_L1).
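
The tabular call quoted above can be wrapped in a small function, as in the sketch below. The function train_stroke_classifier and the helper guess_problem_type are hypothetical names introduced here. The helper is a deliberately simplified stand-in that shows the idea of inferring the task from the label column; it is not AutoGluon's actual inference logic, and the max_classes threshold is an arbitrary assumption.

```python
def guess_problem_type(labels, max_classes=20):
    """Simplified illustration of inferring the task from label values.

    NOT AutoGluon's real logic: two unique values -> binary; non-numeric or
    a small set of whole numbers -> multiclass; otherwise regression.
    """
    unique = set(labels)
    if len(unique) == 2:
        return "binary"
    numeric = all(isinstance(v, (int, float)) for v in unique)
    if not numeric:
        return "multiclass"
    has_fraction = any(isinstance(v, float) and not v.is_integer() for v in unique)
    if has_fraction or len(unique) > max_classes:
        return "regression"
    return "multiclass"


def train_stroke_classifier(df_train, label="stroke"):
    """Fit a TabularPredictor with the arguments quoted in the article.

    df_train is assumed to be a pandas DataFrame containing the label column.
    The import is done lazily so this file loads without AutoGluon installed.
    """
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(
        train_data=df_train,
        verbosity=2,
        presets="best_quality",
    )
    return predictor


# The two labels 0 and 1 from the article's example are seen as binary.
print(guess_problem_type([0, 1, 1, 0, 1]))
```

After training, predictor.problem_type reports the task type that AutoGluon itself inferred, which is the value to rely on in practice.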

For image prediction, AutoGluon uses a simple fit() call that automatically produces high-quality image classification models, which classify images based on their content. Object detection, the task of identifying the presence and location of objects in an image, plays an important role in computer vision. Here too, AutoGluon offers a simple fit() call that automatically generates a high-quality object detection model.

Let us look at an example in which Python code first imports AutoGluon and selects tabular prediction as the task. The dataset is loaded from a CSV file hosted on S3. When we call fit(), AutoGluon processes the data and trains an ensemble of different ML models, called a “predictor”, which can predict the “class” variable in this data. AutoGluon uses the other columns, such as each individual's occupation, age, and education, as predictive features. The ensemble includes algorithms that are well known in the ML community for their quality, robustness, and speed, such as LightGBM, CatBoost, and Deep Neural Networks, which consistently outperform more traditional ML models such as logistic regression.

AutoGluon’s model leaderboard, which lists the different trained models and their accuracies.
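
The workflow described above, from loading the CSV to inspecting the leaderboard, might look like the following sketch. The two URLs are an assumption: they point to the income dataset used in AutoGluon's public quick-start material and are not given in this article. The function name train_and_rank and the time_limit value are likewise hypothetical.

```python
# Assumed dataset locations (AutoGluon quick-start income data); replace
# them with your own CSV paths or URLs as needed.
TRAIN_CSV = "https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv"
TEST_CSV = "https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv"
LABEL = "class"  # the column the article says the predictor should predict


def train_and_rank(train_csv=TRAIN_CSV, test_csv=TEST_CSV, label=LABEL, time_limit=120):
    """Train an ensemble on tabular data and return its leaderboard.

    The import is done lazily so this file loads without AutoGluon installed.
    time_limit (seconds) is an illustrative value, not a recommendation.
    """
    from autogluon.tabular import TabularDataset, TabularPredictor

    train_data = TabularDataset(train_csv)  # loads the CSV into a DataFrame
    test_data = TabularDataset(test_csv)

    # All columns other than the label are used as predictive features.
    predictor = TabularPredictor(label=label).fit(train_data, time_limit=time_limit)

    # One row per trained model, scored on the held-out test data.
    leaderboard = predictor.leaderboard(test_data)

    # Predict the label for rows where it has been removed.
    predictions = predictor.predict(test_data.drop(columns=[label]))
    return predictor, leaderboard, predictions
```

Calling train_and_rank() downloads the data and trains for up to the given time limit, so it is best run interactively rather than at import time.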

For supervised learning on text data, a simple fit() call can automatically generate high-quality text prediction models. Each training example can be a sentence or a short paragraph, optionally accompanied by additional numeric or categorical features. A single predictor.fit() call can train highly accurate neural networks on the given text dataset, where the target values (labels) may be continuous values or discrete categories. Although TextPredictor is designed only for classification and regression tasks, it can be used directly for other NLP tasks if the data is properly formatted into a data table. TextPredictor uses only Transformer neural network models, which are fit to the provided data via transfer learning from pre-trained NLP models such as BERT, ALBERT, and ELECTRA. It also supports training on multimodal data tables that contain text, numeric, and categorical columns, and the neural network hyperparameters can be tuned automatically with Hyperparameter Optimization (HPO). Raw text is treated as a first-class citizen of data tables in AutoGluon, which can train and combine a wide variety of models on such tables, including classical tabular models like LightGBM, RF, and CatBoost as well as the multimodal network based on pre-trained NLP models.
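
As a rough illustration of the text workflow, the sketch below formats sentences and labels into a table and passes it to TextPredictor. It assumes an AutoGluon release that still ships the autogluon.text sub-module named in this article. The helper names make_text_rows and train_text_classifier, the column names, and the example sentences are all hypothetical.

```python
def make_text_rows(sentences, labels, text_col="sentence", label_col="label"):
    """Pair each sentence with its label as one table row (a dict per row)."""
    if len(sentences) != len(labels):
        raise ValueError("sentences and labels must have the same length")
    return [{text_col: s, label_col: y} for s, y in zip(sentences, labels)]


def train_text_classifier(rows, label_col="label", time_limit=60):
    """Fit a TextPredictor on rows produced by make_text_rows.

    Imports are lazy so this file loads without pandas or AutoGluon installed.
    time_limit (seconds) is an illustrative value; a real run needs far more
    than a handful of examples to be useful.
    """
    import pandas as pd
    from autogluon.text import TextPredictor

    train_df = pd.DataFrame(rows)  # the data table the predictor expects
    predictor = TextPredictor(label=label_col, eval_metric="acc")
    predictor.fit(train_df, time_limit=time_limit)
    return predictor


# Made-up sentences, used only to show the table layout.
rows = make_text_rows(["a delightful film", "a dull, slow plot"], [1, 0])
print(rows[0])
```

Keeping the table-building step separate from training makes it easy to add numeric or categorical columns to each row later, which is how a multimodal table would be assembled.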

We have seen what the AutoGluon AutoML framework offers and how effective it can be. It is used to reduce the time it takes to create production-ready ML models. This accelerates the overall ML process and frees up time for Data Scientists to focus on finding solutions to real-life problems. The biggest benefit of the AutoGluon AutoML framework is its ability to train and test multiple existing Machine Learning algorithms on a variety of data sets on its own. Note, however, that using AutoGluon does not remove the need for training and a basic understanding of the data, data annotation, and the desired outcome. The framework's success will likely depend on how quickly it is accepted and adopted and on the tangible benefits it brings to a given industry. Still, the AutoGluon AutoML framework appears to be here to stay.

As introduced earlier, AutoGluon is an open-source AutoML framework that can train highly accurate Machine Learning models on an unprocessed dataset with as little as a single line of Python. In comparisons with AutoML platforms such as TPOT, H2O, AutoWEKA, auto-sklearn, and Google AutoML, AutoGluon has been reported to be faster, more robust, and more accurate. The AutoGluon AutoML framework is therefore likely to play an important role in the Machine Learning process in the future.