Workflow Element Store

Data Collection
  1. Web Scraping
  2. APIs and Data Feeds
  3. Data Collaboration and Partnerships
  4. Mobile Applications or IoT Applications
  5. Databases - NoSQL
  6. Flat files
  7. Surveys and Questionnaires
  8. Experiments (DoE)
  9. Public Datasets
  10. Databases - SQL
  11. Feedback Data
Data Ingestion and Storage
  1. AWS Redshift
  2. AWS S3
  3. Azure Streaming Analytics
  4. AWS Glue
  5. RDBMS
  6. GCP Dataflow
  7. Azure ADF
  8. MySQL
  9. PostgreSQL
  10. Oracle DB
  11. ETL/ELT pipeline
  12. Azure blob storage
  13. MongoDB
  14. MS SQL server
  15. AWS RDS
  16. GCP BigQuery
  17. Azure Synapse
  18. AWS Kinesis
  19. GCS
  20. GCP Data Fusion
  21. Apache Kafka
Data Cleaning / Preprocessing
  1. Auto-Preprocessing libraries
  2. Time-Based Features
  3. Annotation
  4. Handling Noisy Data
  5. Feature Selection
  6. Data Partitioning - Train, Validation, & Test
  7. Augmentation
  8. Data Scaling and Normalization
  9. Handling Imbalanced Classes
  10. Handling Time-Series Data
  11. AutoEDA libraries
  12. Handling Missing Data
  13. Handling Categorical Data
  14. Interaction Features
  15. Domain-Specific Feature Engineering
  16. Feature Extraction from Images
  17. Textual Feature Extraction
  18. Data Transformations
  19. Dimensionality Reduction
  20. Binning / Discretization
  21. Polynomial Features
  22. Dealing with Outliers
Data Training & Modelling
  1. Early Stopping
  2. Natural Language Processing
  3. Cross-Validation
  4. Blackbox - Neural Network Models
  5. Regularization
  6. Weight Initialization
  7. Cross-Validation
  8. Forecasting Techniques
  9. Clustering
  10. Transfer Learning
  11. Multiclass Classification Techniques
  12. Model Interpretability
  13. Data Augmentation
  14. Association Rules
  15. Regression Analysis
  16. Reinforcement Learning
  17. External Validation
  18. Performance Visualization
  19. Batch Normalization
  20. Binary Classification Techniques
  21. Regular Monitoring and Logging
  22. AutoML
  23. Batch Size Selection
  24. Evaluation Metrics
  25. Network Analytics / Geospatial Analytics
  26. Recommendation Engine
  27. Model Comparison
  28. Ensemble Techniques
  29. Word Embeddings
  30. Regularization Techniques
  31. Transfer Learning
  32. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  33. Hyperparameter Tuning
  34. Learning Rate Scheduling
Artifact Storage
  1. Code repository
  2. Databases
  3. Data warehouse
  4. Data Preprocessing pipeline models
  5. Model registry
Deployment and Monitoring
  1. Model Serialization
  2. FastAPI
  3. Bias and Fairness Assessment
  4. Model Drift
  5. Cloud Deployment
  6. Feedback Collection
  7. Flask
  8. Data Drift Monitoring
  9. Concept Drift Detection
  10. Performance Metrics
  11. Containerization
  12. Edge Deployment
  13. Model Versioning
  14. Model Health Monitoring
  15. Serverless Computing
  16. Alerting and Notification
  17. Streamlit
  18. Prediction Logging
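The monitoring elements in the list above (data drift monitoring, alerting and notification) can be illustrated with a small sketch. This one uses the Population Stability Index (PSI), which is one common choice and not necessarily the method intended here. The function names `psi` and `drift_alert`, the bin count, and the 0.2 alert threshold are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant reference sample

    def bucket_shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bucket.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps empty buckets from producing log(0).
        eps = 1e-6
        return [max(c / len(values), eps) for c in counts]

    ref_shares = bucket_shares(reference)
    cur_shares = bucket_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_shares, cur_shares))


def drift_alert(score: float, threshold: float = 0.2) -> bool:
    """Return True when the drift score exceeds the (assumed) alert threshold."""
    return score > threshold


if __name__ == "__main__":
    training_feature = [i / 100 for i in range(100)]
    live_feature = [0.5 + i / 100 for i in range(100)]  # shifted distribution
    score = psi(training_feature, live_feature)
    print(f"PSI={score:.3f}, alert={drift_alert(score)}")
```

In practice the reference sample would come from the training data and the current sample from logged prediction inputs; the threshold should be tuned per feature.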
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
Data Collection

Inference API

API Stream

Web crawler

Selenium

Data Ingestion

Data Landing Zone

Store Data from all the Sources

Data Cleaning / Preprocessing

Derived & Base features

Data Training & Modelling

Inference Pipeline
Input Data for Forecasting

Input Data

Cleaned & Processed Data

Inference

Inference pickle
Inference Joblib
Streamlit
Inference API
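
The two pipelines in this architecture can be sketched end to end in a few lines. The example below is a minimal illustration under stated assumptions, not the actual implementation: the "model" is a toy forecaster that learns an average step size, the fitted parameters are serialized with `pickle` as in the "Inference pickle" element, and the function names (`clean`, `training_pipeline`, `inference_pipeline`) are hypothetical.

```python
import pickle
from statistics import mean
from typing import Dict, List, Optional


def clean(raw: List[Optional[float]]) -> List[float]:
    """Data Cleaning / Preprocessing: drop missing readings."""
    return [x for x in raw if x is not None]


def derive_steps(series: List[float]) -> List[float]:
    """Derived feature: difference between consecutive base values."""
    return [b - a for a, b in zip(series, series[1:])]


def training_pipeline(raw: List[Optional[float]]) -> bytes:
    """Clean the collected data, fit the toy model, and serialize it."""
    series = clean(raw)
    params: Dict[str, float] = {"step": mean(derive_steps(series))}
    # Only plain fitted parameters are pickled, so the artifact does not
    # depend on any custom class being importable at inference time.
    return pickle.dumps(params)


def inference_pipeline(artifact: bytes, raw_input: List[Optional[float]],
                       horizon: int) -> List[float]:
    """Load the model, apply the same cleaning, and forecast `horizon` steps."""
    params = pickle.loads(artifact)
    last = clean(raw_input)[-1]
    return [last + params["step"] * k for k in range(1, horizon + 1)]


if __name__ == "__main__":
    artifact = training_pipeline([1.0, None, 2.0, 3.0, None, 4.0])
    print(inference_pipeline(artifact, [10.0, None, 11.0], horizon=3))
```

The key design point is that the inference pipeline reuses the same cleaning step as the training pipeline, so the model sees inputs prepared the same way in both. An inference API or a Streamlit front end would wrap `inference_pipeline` rather than reimplement it. Note that `pickle.loads` should only be used on artifacts from a trusted source.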