Workflow Element Store

  1. Experiments (DoE)
  2. APIs and Data Feeds
  3. Mobile Applications or IoT Applications
  4. Feedback Data
  5. Web Scraping
  6. Databases - NoSQL
  7. Public Datasets
  8. Data Collaboration and Partnerships
  9. Databases - SQL
  10. Surveys and Questionnaires
  11. Flat Files
  1. MS SQL Server
  2. AWS Redshift
  3. Azure Stream Analytics
  4. GCP Data Fusion
  5. Azure Blob Storage
  6. ETL/ELT pipeline
  7. MongoDB
  8. AWS Kinesis
  9. Apache Kafka
  10. Oracle DB
  11. Azure Data Factory (ADF)
  12. GCP Dataflow
  13. RDBMS
  14. AWS RDS
  15. Azure Synapse
  16. GCP BigQuery
  17. AWS S3
  18. MySQL
  19. AWS Glue
  20. PostgreSQL
  21. Google Cloud Storage (GCS)
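Several of the items above (ETL/ELT pipeline, RDBMS, PostgreSQL, MySQL) combine into an extract-transform-load step. The sketch below is a minimal illustration, assuming a small hypothetical CSV flat file and using Python's built-in `sqlite3` as a stand-in for whichever relational store is actually chosen; the column names and the unit conversion are invented for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract from a flat file; a missing reading is an empty field.
RAW_CSV = """sensor_id,reading,unit
s1,21.5,C
s2,,C
s3,70.7,F
"""


def extract(text):
    """Extract: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop incomplete rows and normalise every reading to Celsius."""
    cleaned = []
    for row in rows:
        if not row["reading"]:
            continue  # skip records with a missing reading
        value = float(row["reading"])
        if row["unit"] == "F":
            value = (value - 32) * 5 / 9
        cleaned.append((row["sensor_id"], round(value, 2)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a relational table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, celsius REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


# In-memory SQLite stands in for the target database (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT sensor_id, celsius FROM readings ORDER BY sensor_id").fetchall()
print(rows)
```

Loading the raw rows first and running the transformation inside the target store would give the ELT variant of the same flow.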
  1. Handling Noisy Data
  2. Data Transformations
  3. Interaction Features
  4. Augmentation
  5. Data Partitioning - Train, Validation, & Test
  6. AutoEDA Libraries
  7. Auto-Preprocessing Libraries
  8. Domain-Specific Feature Engineering
  9. Dealing with Outliers
  10. Textual Feature Extraction
  11. Annotation
  12. Handling Missing Data
  13. Handling Categorical Data
  14. Polynomial Features
  15. Feature Selection
  16. Data Scaling and Normalization
  17. Handling Imbalanced Classes
  18. Dimensionality Reduction
  19. Feature Extraction from Images
  20. Time-Based Features
  21. Handling Time-Series Data
  22. Binning / Discretization
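A few of the preprocessing elements above (data partitioning, handling missing data, handling categorical data, scaling and normalization) can be chained as in the following plain-Python sketch. The toy records and feature layout are hypothetical; what it illustrates is that imputation, scaling, and encoding parameters are learned from the training split only and then reused on the validation and test splits, which avoids leaking information across the partition.

```python
import random
from statistics import mean

# Hypothetical toy records: (age, category, label); None marks a missing age.
records = [
    (25, "A", 0), (None, "B", 1), (40, "A", 1), (35, "C", 0), (50, "B", 1),
    (30, "C", 0), (45, "A", 1), (28, "B", 0), (33, "C", 1), (38, "A", 0),
]


def split(rows, train=0.6, val=0.2, seed=42):
    """Shuffle with a fixed seed, then partition into train / validation / test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit(train_rows):
    """Learn preprocessing parameters from the training split only."""
    ages = [r[0] for r in train_rows if r[0] is not None]
    return {
        "age_mean": mean(ages),
        "age_min": min(ages),
        "age_max": max(ages),
        "categories": sorted({r[1] for r in train_rows}),
    }


def transform(row, params):
    """Impute a missing age with the mean, min-max scale it, one-hot encode the category."""
    age = params["age_mean"] if row[0] is None else row[0]
    span = (params["age_max"] - params["age_min"]) or 1  # guard against a zero range
    scaled = (age - params["age_min"]) / span
    one_hot = [1 if row[1] == c else 0 for c in params["categories"]]  # unseen -> all zeros
    return [scaled, *one_hot], row[2]


train, val, test = split(records)
params = fit(train)
X_train = [transform(r, params) for r in train]
X_val = [transform(r, params) for r in val]
X_test = [transform(r, params) for r in test]
```

Categories that appear only in the validation or test split encode to all zeros here, which is similar in spirit to scikit-learn's `OneHotEncoder(handle_unknown="ignore")`; treat the sketch as a simplified illustration rather than a replacement for such libraries.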
  1. Association Rules
  2. Learning Rate Scheduling
  3. Batch Size Selection
  4. Regularization Techniques
  5. Hyperparameter Tuning
  6. Batch Normalization
  7. Early Stopping
  8. Performance Visualization
  9. Regression Analysis
  10. Ensemble Techniques
  11. AutoML
  12. Forecasting Techniques
  13. Model Interpretability
  14. Cross-Validation
  15. Transfer Learning
  16. Regular Monitoring and Logging
  17. Reinforcement Learning
  18. Model Comparison
  19. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  20. Multiclass Classification Techniques
  21. Data Augmentation
  22. Weight Initialization
  23. Clustering
  24. Natural Language Processing
  25. Regularization
  26. Network Analytics / Geospatial Analytics
  27. Black-Box Neural Network Models
  28. Evaluation Metrics
  29. Binary Classification Techniques
  30. External Validation
  31. Word Embeddings
  32. Recommendation Engine
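Cross-validation and grid search (the idea behind scikit-learn's `GridSearchCV`) can be illustrated without any library. The sketch below assumes a hypothetical one-dimensional dataset and a simple k-nearest-neighbours classifier whose `k` is the hyperparameter being tuned; each candidate is scored by its mean accuracy over shuffled k-fold splits and the best-scoring value is kept.

```python
import random
from statistics import mean


def k_fold_indices(n, folds, seed=0):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross-validation."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    sizes = [n // folds + (1 if i < n % folds else 0) for i in range(folds)]
    start = 0
    for size in sizes:
        val_idx = order[start:start + size]
        train_idx = order[:start] + order[start + size:]
        yield train_idx, val_idx
        start += size


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cross_val_score(X, y, k, folds=5):
    """Mean validation accuracy of k-NN over all folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), folds):
        tx = [X[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        hits = sum(knn_predict(tx, ty, X[i], k) == y[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return mean(scores)


def grid_search(X, y, grid, folds=5):
    """Score every candidate k and return the best one together with all scores."""
    results = {k: cross_val_score(X, y, k, folds) for k in grid}
    best = max(results, key=results.get)
    return best, results


# Hypothetical, well-separated 1-D data with two classes.
X = [1.0, 1.5, 2.0, 2.5, 3.0, 7.0, 7.5, 8.0, 8.5, 9.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
best_k, scores = grid_search(X, y, grid=[1, 3, 5])  # odd k avoids two-class vote ties
print(best_k, scores)
```

Randomized and Bayesian search differ mainly in how candidate values are proposed; the cross-validated scoring loop stays the same.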
  1. Model Registry
  2. Data Warehouse
  3. Data Preprocessing Pipeline Models
  4. Databases
  5. Code Repository
  1. Cloud Deployment
  2. Model Versioning
  3. Alerting and Notification
  4. Data Drift Monitoring
  5. Containerization
  6. Edge Deployment
  7. Bias and Fairness Assessment
  8. Streamlit
  9. Performance Metrics
  10. Serverless Computing
  11. Concept Drift Detection
  12. Feedback Collection
  13. FastAPI
  14. Model Serialization
  15. Flask
  16. Model Health Monitoring
  17. Prediction Logging
  18. Model Drift
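Data drift monitoring and alerting can be approached in several ways; one commonly used statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with its distribution in production as the sum over bins of (a − e) · ln(a / e). The following is a minimal sketch, assuming equal-width bins derived from the reference sample and an alert threshold of 0.2, a widely quoted rule of thumb rather than a fixed standard; the sample data are hypothetical.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # equal-width bins from the reference sample

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clip out-of-range values into edge bins
        # eps keeps empty bins from producing log(0) or a division by zero
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


def drift_alert(expected, actual, threshold=0.2):
    """Return True when PSI exceeds the (assumed) alerting threshold."""
    return psi(expected, actual) > threshold


reference = [i / 100 for i in range(100)]           # hypothetical training-time feature values
live_same = list(reference)                         # production data with no shift
live_shifted = [0.5 + i / 200 for i in range(100)]  # production data shifted upward
print(psi(reference, live_same), drift_alert(reference, live_shifted))
```

In a deployment, `drift_alert` would typically run on a schedule per feature and feed the alerting and notification element.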
ML Workflow Intermediate - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline
  • Data Collection: Inference API, API Stream, Web crawler (tools: Python, Selenium)
  • Data Ingestion
  • Data Landing Zone: store data from all the sources
  • Data Cleaning / Preprocessing: derived & base features
  • Data Training & Modelling
Inference Pipeline
  • Input Data for Forecasting: input data, cleaned & processed data
  • Inference: pickle, Joblib, Streamlit, Inference API
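The two pipelines in the architecture meet at a serialized model artifact: the training pipeline writes it and the inference pipeline loads it. The sketch below assumes a hypothetical forecasting task and uses the naive drift forecasting method (the average change across the training series) as a placeholder model; the learned parameters are saved with the standard-library `pickle` module, and `joblib.dump` / `joblib.load` could be used the same way. Serving the forecast through Streamlit or an inference API is left out.

```python
import pickle
import tempfile
from pathlib import Path


def clean(series):
    """Shared preprocessing: drop missing values and cast to float."""
    return [float(v) for v in series if v is not None]


def training_pipeline(raw_series, artifact_path):
    """Clean the history, fit a naive drift forecaster, and serialise its parameters."""
    history = clean(raw_series)
    # Drift method: average change per step across the training series.
    params = {"step": (history[-1] - history[0]) / (len(history) - 1)}
    with open(artifact_path, "wb") as f:
        pickle.dump(params, f)


def inference_pipeline(raw_input, artifact_path, horizon=2):
    """Load the artifact, clean new input, and forecast `horizon` steps ahead."""
    with open(artifact_path, "rb") as f:
        params = pickle.load(f)  # only unpickle files from a trusted source
    recent = clean(raw_input)
    return [recent[-1] + params["step"] * h for h in range(1, horizon + 1)]


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "model.pkl"
    training_pipeline([10, 12, None, 14, 16], artifact)
    forecast = inference_pipeline([20, None, 22], artifact, horizon=2)
print(forecast)
```

Reusing the same `clean` function in both pipelines keeps training-time and inference-time preprocessing consistent, which is the main reason the architecture routes inference input through the cleaning step.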