Workflow Element Store

  1. Experiments (DoE)
  2. Databases - NoSQL
  3. Databases - SQL
  4. Mobile Applications or IoT Applications
  5. Flat files
  6. APIs and Data Feeds
  7. Public Datasets
  8. Surveys and Questionnaires
  9. Web Scraping
  10. Data Collaboration and Partnerships
  11. Feedback Data

  1. Apache Kafka
  2. MongoDB
  3. GCP Dataflow
  4. GCP Data Fusion
  5. Azure Data Factory (ADF)
  6. GCP BigQuery
  7. ETL/ELT pipeline
  8. MS SQL Server
  9. Oracle DB
  10. Azure Stream Analytics
  11. Azure Synapse
  12. MySQL
  13. AWS RDS
  14. Azure Blob Storage
  15. AWS Redshift
  16. RDBMS
  17. GCS
  18. AWS Kinesis
  19. AWS S3
  20. PostgreSQL
  21. AWS Glue

  1. Time-Based Features
  2. Handling Noisy Data
  3. Annotation
  4. Interaction Features
  5. Data Transformations
  6. Polynomial Features
  7. Handling Missing Data
  8. Augmentation
  9. Auto-Preprocessing libraries
  10. Feature Extraction from Images
  11. Textual Feature Extraction
  12. Handling Categorical Data
  13. Dealing with Outliers
  14. AutoEDA libraries
  15. Handling Imbalanced Classes
  16. Handling Time-Series Data
  17. Dimensionality Reduction
  18. Data Scaling and Normalization
  19. Feature Selection
  20. Binning / Discretization
  21. Data Partitioning - Train, Validation, & Test
  22. Domain-Specific Feature Engineering
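Item 21 in the list above (Data Partitioning - Train, Validation, & Test) can be sketched with the standard library alone. This is a minimal illustration, not a prescribed method: the function name `train_val_test_split`, the 70/15/15 default fractions, and the fixed seed are all assumptions made for the example.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle rows and split them into train, validation, and test lists."""
    if not 0 <= val_frac + test_frac < 1:
        raise ValueError("val_frac + test_frac must be in [0, 1)")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]       # everything left over is training data
    return train, val, test

# Hypothetical usage on 100 dummy row ids
train, val, test = train_val_test_split(range(100))
print(len(train), len(val), len(test))
```

A random shuffle like this suits independent rows. For time-series data (item 16) a chronological split is usually the safer choice, since shuffling would leak future observations into training.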
  1. Performance Visualization
  2. Data Augmentation
  3. Cross-Validation
  4. External Validation
  5. Weight Initialization
  6. Batch Size Selection
  7. Regression Analysis
  8. Hyperparameter Tuning
  9. Regular Monitoring and Logging
  10. Ensemble Techniques
  11. Evaluation Metrics
  12. Forecasting Techniques
  13. Regularization Techniques
  14. Natural Language Processing
  15. Batch Normalization
  16. Clustering
  17. Recommendation Engine
  18. Multiclass Classification Techniques
  19. Association Rules
  20. Black-box - Neural Network Models
  21. Transfer Learning
  22. Regularization
  23. Model Interpretability
  24. Reinforcement Learning
  25. Model Comparison
  26. AutoML
  27. Network Analytics / Geospatial Analytics
  28. Learning Rate Scheduling
  29. Binary Classification Techniques
  30. Word Embeddings
  31. Early Stopping
  32. GridSearchCV, RandomizedSearchCV, BayesSearchCV
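Cross-Validation and grid-based Hyperparameter Tuning from the list above combine naturally: each candidate parameter set is scored by its mean validation score over k folds. The sketch below shows the idea in plain Python. It is not the scikit-learn `GridSearchCV` API; `k_fold_indices`, `grid_search_cv`, and the toy threshold classifier are hypothetical names introduced only for illustration.

```python
from itertools import product
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def grid_search_cv(fit, score, X, y, param_grid, k=5):
    """Return (best_params, best_score) using the mean validation score over k folds."""
    best_params, best_score = None, float("-inf")
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for train_idx, val_idx in k_fold_indices(len(X), k):
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], **params)
            fold_scores.append(
                score(model, [X[i] for i in val_idx], [y[i] for i in val_idx])
            )
        avg = mean(fold_scores)
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score

# Toy model (assumption): "fitting" just stores the candidate threshold.
def fit_threshold(X_train, y_train, threshold):
    return threshold

# Accuracy of the rule "predict True when x > threshold".
def accuracy(threshold, X_val, y_val):
    return mean(1.0 if (x > threshold) == label else 0.0 for x, label in zip(X_val, y_val))

X = list(range(20))
y = [x >= 10 for x in X]
best_params, best_score = grid_search_cv(
    fit_threshold, accuracy, X, y, {"threshold": [4.5, 9.5, 14.5]}, k=5
)
print(best_params, best_score)
```

Randomized and Bayesian search follow the same loop and differ only in how the candidate parameter sets are proposed.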
  1. Model Registry
  2. Code Repository
  3. Data Preprocessing Pipeline Models
  4. Databases
  5. Data Warehouse

  1. Performance Metrics
  2. Concept Drift Detection
  3. Serverless Computing
  4. Model Drift
  5. Edge Deployment
  6. Cloud Deployment
  7. Bias and Fairness Assessment
  8. Prediction Logging
  9. Model Versioning
  10. Containerization
  11. Alerting and Notification
  12. Flask
  13. Model Health Monitoring
  14. Feedback Collection
  15. Data Drift Monitoring
  16. Streamlit
  17. Model Serialization
  18. FastAPI
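Data Drift Monitoring and Alerting and Notification from the list above can be illustrated with a very simple check: compare the mean of a live feature window against a reference (training) sample and raise a flag when the shift is large. This is only one possible heuristic. The function names and the z-score threshold of 3.0 are assumptions for the sketch, and production systems often use tests such as Kolmogorov-Smirnov or the population stability index instead.

```python
from math import sqrt
from statistics import mean, stdev

def mean_shift_z(reference, live):
    """z-score of the live window's mean relative to the reference sample."""
    spread = stdev(reference)
    if spread == 0:
        raise ValueError("reference sample has zero variance")
    standard_error = spread / sqrt(len(live))
    return (mean(live) - mean(reference)) / standard_error

def has_drifted(reference, live, z_threshold=3.0):
    """Flag drift when the absolute z-score exceeds the (assumed) threshold."""
    return abs(mean_shift_z(reference, live)) > z_threshold

# Hypothetical feature values: training-time reference vs. two live windows
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
live_stable = [10, 9, 11, 10, 10, 12, 8, 11, 9, 10]
live_shifted = [value + 5 for value in reference]

for name, window in [("stable", live_stable), ("shifted", live_shifted)]:
    if has_drifted(reference, window):
        print(f"ALERT: drift detected in {name} window")
```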
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  • Data Collection: Inference API, API Stream, Web crawler (Python, Selenium)
  • Data Ingestion: Data Landing Zone (Store Data from all the Sources)
  • Data Cleaning / Preprocessing: Derived & Base features
  • Data Training & Modelling

Inference Pipeline
  • Input Data for Forecasting: Input Data
  • Cleaned & Processed Data
  • Inference: pickle, Joblib, streamlit, Inference API
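The hand-off between the two pipelines in the architecture is a serialized model: the training pipeline cleans the data, fits a model, and writes it out, and the inference pipeline loads it and predicts on new input. The sketch below shows that flow with `pickle` from the standard library. The least-squares line fit, the helper names, and the temporary file path are assumptions chosen to keep the example self-contained, not the model or storage the architecture prescribes.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import mean

def clean(rows):
    """Drop rows with missing values (stand-in for the preprocessing stage)."""
    return [(x, y) for x, y in rows if x is not None and y is not None]

def train(rows):
    """Fit y = slope * x + intercept by ordinary least squares."""
    xs, ys = zip(*rows)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in rows) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return {"slope": slope, "intercept": y_bar - slope * x_bar}

def save_model(model, path):
    """Serialize the fitted parameters to disk."""
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Deserialize a model; only load pickle files from trusted sources."""
    with open(path, "rb") as f:
        return pickle.load(f)

def predict(model, x):
    return model["slope"] * x + model["intercept"]

# Training pipeline: collect -> clean -> train -> serialize
raw = [(1, 3), (2, 5), (None, 4), (3, 7), (4, 9)]
model_path = Path(tempfile.mkdtemp()) / "model.pkl"   # hypothetical location
save_model(train(clean(raw)), model_path)

# Inference pipeline: load -> predict on new input
loaded = load_model(model_path)
forecast = predict(loaded, 5)
print(forecast)
```

Joblib offers a similar dump/load interface and is often preferred for models holding large NumPy arrays. A Streamlit app or an API endpoint would wrap the `load_model` and `predict` steps.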