Workflow Element Store

  1. APIs and Data Feeds
  2. Surveys and Questionnaires
  3. Mobile Applications or IoT Applications
  4. Data Collaboration and Partnerships
  5. Flat files
  6. Public Datasets
  7. Databases - SQL
  8. Feedback Data
  9. Web Scraping
  10. Experiments (Design of Experiments, DoE)
  11. Databases - NoSQL
  1. MongoDB
  2. Apache Kafka
  3. Azure Data Factory (ADF)
  4. GCP BigQuery
  5. Oracle DB
  6. ETL/ELT pipeline
  7. Azure Stream Analytics
  8. AWS Glue
  9. GCS (Google Cloud Storage)
  10. AWS S3
  11. AWS RDS
  12. AWS Redshift
  13. Azure Synapse
  14. Azure Blob Storage
  15. GCP Dataflow
  16. GCP Data Fusion
  17. MySQL
  18. RDBMS
  19. PostgreSQL
  20. AWS Kinesis
  21. MS SQL Server
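
The list above names flat files, relational databases, and ETL/ELT pipelines as ingestion elements. A minimal sketch of how those pieces fit together, using only the Python standard library: an inline CSV string stands in for a flat file, and an in-memory SQLite database stands in for any of the listed RDBMS targets. The table name, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content; a real pipeline would read this from disk or object storage.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, cast types, normalize currency codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


# SQLite in memory is an assumption made so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
```

Swapping the load step for a PostgreSQL, MySQL, or cloud warehouse connection would change only the connection object and SQL dialect, which is why the three stages are kept as separate functions.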
  1. Handling Categorical Data
  2. Polynomial Features
  3. Data Transformations
  4. Handling Imbalanced Classes
  5. Data Scaling and Normalization
  6. Handling Noisy Data
  7. Domain-Specific Feature Engineering
  8. Handling Time-Series Data
  9. Feature Selection
  10. AutoEDA libraries
  11. Augmentation
  12. Binning / Discretization
  13. Time-Based Features
  14. Dealing with Outliers
  15. Data Partitioning - Train, Validation, & Test
  16. Feature Extraction from Images
  17. Interaction Features
  18. Auto-Preprocessing libraries
  19. Dimensionality Reduction
  20. Annotation
  21. Textual Feature Extraction
  22. Handling Missing Data
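
Several of the preprocessing elements listed above (handling missing data, scaling, handling categorical data, and train/validation/test partitioning) can be sketched in a few lines each. The version below is a dependency-free illustration, not a replacement for a preprocessing library; the sample values, the mean-imputation choice, and the 60/20/20 split are assumptions.

```python
import random
from statistics import mean


def impute_mean(values):
    """Handling missing data: replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Data scaling: rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot_encode(labels):
    """Handling categorical data: map each label to a 0/1 indicator vector."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]


def train_val_test_split(rows, val_frac=0.2, test_frac=0.2, seed=0):
    """Data partitioning: shuffle, then cut into train, validation, and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical toy columns.
ages = [20, None, 40, 30]
cities = ["Pune", "Delhi", "Pune", "Mumbai"]

ages_filled = impute_mean(ages)
ages_scaled = min_max_scale(ages_filled)
city_vectors = one_hot_encode(cities)
train, val, test = train_val_test_split(range(10))
```

In a real pipeline the imputation mean and the scaling bounds would be computed on the training partition only and then reused on validation and test data, so that no information leaks across the split.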
  1. Model Comparison
  2. Early Stopping
  3. Batch Size Selection
  4. Transfer Learning
  5. Natural Language Processing
  6. Cross-Validation
  7. Word Embeddings
  8. Regularization Techniques
  9. Regular Monitoring and Logging
  10. Regression Analysis
  11. Learning Rate Scheduling
  12. Hyperparameter Tuning
  13. Batch Normalization
  14. Recommendation Engine
  15. Clustering
  16. Black-Box - Neural Network Models
  17. Network Analytics / Geospatial Analytics
  18. Association Rules
  19. Evaluation Metrics
  20. Weight Initialization
  21. Model Interpretability
  22. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  23. External Validation
  24. Data Augmentation
  25. Ensemble Techniques
  26. Performance Visualization
  27. Forecasting Techniques
  28. Reinforcement Learning
  29. AutoML
  30. Multiclass Classification Techniques
  31. Binary Classification Techniques
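
Cross-validation, regularization, and grid-search style hyperparameter tuning from the list above combine naturally. The sketch below shows the idea without any library: a one-parameter ridge regression (y ≈ w·x) is scored by k-fold cross-validation for each candidate penalty, and the penalty with the lowest mean validation error wins. The model, the data, and the candidate grid are assumptions for illustration; this is not how GridSearchCV is implemented internally.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over n samples."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def fit_ridge_1d(xs, ys, lam):
    """Closed-form slope for y ~ w * x with an L2 penalty lam on w."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(xs, ys, w):
    """Mean squared error of the predictions w * x."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cv_mean_mse(xs, ys, lam, k=3):
    """Average validation MSE over k folds for one penalty value."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in train_idx], [ys[i] for i in train_idx], lam)
        scores.append(mse([xs[i] for i in val_idx], [ys[i] for i in val_idx], w))
    return sum(scores) / len(scores)


def grid_search(xs, ys, lambdas, k=3):
    """Evaluate every candidate penalty and return the best one with all scores."""
    results = {lam: cv_mean_mse(xs, ys, lam, k) for lam in lambdas}
    best = min(results, key=results.get)
    return best, results


# Hypothetical noise-free data following y = 2x, so no penalty should be optimal.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
best_lam, results = grid_search(xs, ys, [0.0, 1.0, 10.0])
```

Contiguous folds are used here for readability; shuffled or stratified folds are the usual choice when the data has an ordering or class imbalance.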
  1. Model Registry
  2. Data Preprocessing Pipeline Models
  3. Data Warehouse
  4. Databases
  5. Code Repository
  1. Containerization
  2. Serverless Computing
  3. Concept Drift Detection
  4. Performance Metrics
  5. Model Versioning
  6. FastAPI
  7. Model Health Monitoring
  8. Flask
  9. Alerting and Notification
  10. Bias and Fairness Assessment
  11. Streamlit
  12. Data Drift Monitoring
  13. Edge Deployment
  14. Prediction Logging
  15. Cloud Deployment
  16. Feedback Collection
  17. Model Serialization
  18. Model Drift
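
Data drift monitoring and alerting, both listed above, are often built around a distribution-distance score. One common choice is the Population Stability Index (PSI), sketched below with equal-width bins taken from the reference sample. The bin count, the epsilon floor for empty bins, the sample values, and the 0.2 alert threshold (a frequently quoted rule of thumb) are all assumptions here, not values taken from this document.

```python
import math


def psi(reference, current, n_bins=4, eps=1e-6):
    """Population Stability Index between a reference and a current numeric sample."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def proportions(sample):
        counts = [0] * n_bins
        for v in sample:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values to edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(sample), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level

# Hypothetical feature values at training time and in production.
training_sample = [1, 2, 3, 4, 5, 6, 7, 8]
production_sample = [5, 6, 7, 8, 7, 8, 8, 8]

score = psi(training_sample, production_sample)
drift_alert = score > DRIFT_THRESHOLD
```

The same score computed on model outputs instead of input features gives a simple signal for prediction drift, which is one way to connect prediction logging to alerting.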
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  • Data Collection: Inference API, API Stream, Web crawler, Python, Selenium
  • Data Ingestion: Data Landing Zone (store data from all the sources)
  • Data Cleaning / Preprocessing: Derived & Base features
  • Data Training & Modelling
Inference Pipeline
  • Input Data for Forecasting: Input Data
  • Cleaned & Processed Data
  • Inference: pickle, Joblib, Streamlit, Inference API
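
The inference stage names pickle and Joblib as ways to hand a trained model from the training pipeline to the inference pipeline. A minimal sketch of that handoff with the standard-library pickle module follows; the "model" is a hypothetical dictionary of fitted parameters and the file name is made up. Joblib offers analogous joblib.dump and joblib.load calls.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(params, path):
    """Training side: serialize the fitted parameters to disk."""
    with open(path, "wb") as f:
        pickle.dump(params, f)


def load_model(path):
    """Inference side: restore the parameters.

    Only unpickle files from a trusted source, since loading a pickle can run code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def predict(params, x):
    """Apply the restored linear model to one input value."""
    return params["slope"] * x + params["intercept"]


# Hypothetical fitted parameters standing in for a real trained model.
trained = {"slope": 2.0, "intercept": 1.0}

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    save_model(trained, model_path)
    restored = load_model(model_path)

prediction = predict(restored, 3.0)
```

An inference API or Streamlit front end would typically call the load step once at startup and the predict step per request.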