Workflow Element Store

Data Collection
  1. Surveys and Questionnaires
  2. Mobile Applications or IoT Applications
  3. Data Collaboration and Partnerships
  4. Flat Files
  5. Experiments (Design of Experiments, DoE)
  6. Databases - SQL
  7. Public Datasets
  8. Databases - NoSQL
  9. Feedback Data
  10. Web Scraping
  11. APIs and Data Feeds
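As a minimal sketch of how records from two of the sources above might be brought into one schema, the Python below parses a flat file (CSV) and a JSON API / data-feed payload. The sample data and field names (`customer_id`, `rating`, `comment`, `score`, `text`) are hypothetical; real sources would be read from files or HTTP responses rather than inline strings.

```python
import csv
import io
import json

# Hypothetical raw inputs standing in for two of the sources above:
# a flat file (CSV export) and an API / data-feed response (JSON).
CSV_TEXT = """customer_id,rating,comment
101,4,Good service
102,,No rating given
"""

API_PAYLOAD = '{"items": [{"id": 103, "score": 5, "text": "Great app"}]}'


def from_flat_file(text):
    """Parse CSV text into records with a common schema."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        records.append({
            "source": "flat_file",
            "customer_id": int(row["customer_id"]),
            # Empty cells become None so missing data is explicit downstream.
            "rating": int(row["rating"]) if row["rating"] else None,
            "comment": row["comment"],
        })
    return records


def from_api_feed(payload):
    """Map a JSON feed with different field names onto the same schema."""
    data = json.loads(payload)
    return [
        {
            "source": "api_feed",
            "customer_id": item["id"],
            "rating": item.get("score"),
            "comment": item.get("text", ""),
        }
        for item in data["items"]
    ]


records = from_flat_file(CSV_TEXT) + from_api_feed(API_PAYLOAD)
for r in records:
    print(r)
```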
Data Ingestion & Storage
  1. GCP Dataflow
  2. AWS RDS
  3. PostgreSQL
  4. MongoDB
  5. MS SQL Server
  6. MySQL
  7. ETL/ELT pipeline
  8. Azure Stream Analytics
  9. Azure Synapse
  10. AWS Redshift
  11. Oracle DB
  12. RDBMS
  13. Azure Blob Storage
  14. Azure Data Factory (ADF)
  15. Google Cloud Storage (GCS)
  16. GCP Data Fusion
  17. AWS Kinesis
  18. Apache Kafka
  19. AWS S3
  20. AWS Glue
  21. GCP BigQuery
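Several of the tools above (ETL/ELT pipelines, AWS Glue, Azure ADF, GCP Data Fusion) implement the extract-transform-load pattern. The sketch below illustrates that pattern only, using Python's built-in `sqlite3` in memory as a stand-in for a relational target such as PostgreSQL or MySQL; the table name `readings` and the sample events are assumptions made for the example.

```python
import sqlite3

# Hypothetical raw events, e.g. as they might arrive from a stream or a landing-zone file.
RAW_EVENTS = [
    ("2024-01-01", "sensor-a", "21.5"),
    ("2024-01-01", "sensor-b", "bad"),  # unparseable reading
    ("2024-01-02", "sensor-a", "22.0"),
]


def extract():
    """Extract: read raw rows from the source (here, an in-memory list)."""
    return list(RAW_EVENTS)


def transform(rows):
    """Transform: cast readings to float and drop rows that cannot be parsed."""
    clean = []
    for day, sensor, value in rows:
        try:
            clean.append((day, sensor, float(value)))
        except ValueError:
            continue
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (day TEXT, sensor TEXT, value REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory SQLite stands in for the target RDBMS
load(conn, transform(extract()))
avg = conn.execute(
    "SELECT sensor, AVG(value) FROM readings GROUP BY sensor"
).fetchall()
print(avg)  # [('sensor-a', 21.75)]
```

In an ELT variant, the raw strings would be loaded first and the casting and filtering done afterwards in SQL inside the warehouse.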
Data Preprocessing & Feature Engineering
  1. Handling Noisy Data
  2. Data Partitioning - Train, Validation, & Test
  3. Handling Imbalanced Classes
  4. Binning / Discretization
  5. Polynomial Features
  6. Auto-Preprocessing Libraries
  7. Annotation
  8. Augmentation
  9. Handling Missing Data
  10. Domain-Specific Feature Engineering
  11. Textual Feature Extraction
  12. AutoEDA Libraries
  13. Feature Extraction from Images
  14. Handling Categorical Data
  15. Dealing with Outliers
  16. Data Scaling and Normalization
  17. Dimensionality Reduction
  18. Handling Time-Series Data
  19. Interaction Features
  20. Time-Based Features
  21. Feature Selection
  22. Data Transformations
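Three of the preprocessing steps above (data partitioning, handling missing data, and scaling) interact: imputation and scaling statistics should be learned from the training split only and then reused on the validation and test splits, otherwise information leaks from held-out data. A minimal pure-Python sketch, assuming a single numeric feature and mean imputation followed by z-score standardization (one choice among many):

```python
import random
import statistics


def split(rows, train=0.6, val=0.2, seed=0):
    """Shuffle and partition rows into train / validation / test."""
    rows = rows[:]  # avoid mutating the caller's list
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train)
    n_val = int(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_preprocessor(train_values):
    """Learn imputation and scaling statistics from the training split only."""
    observed = [v for v in train_values if v is not None]
    mean = statistics.fmean(observed)
    std = statistics.pstdev(observed) or 1.0  # guard against zero variance
    return {"mean": mean, "std": std}


def apply_preprocessor(values, params):
    """Impute missing values with the train mean, then standardize (z-score)."""
    return [
        ((params["mean"] if v is None else v) - params["mean"]) / params["std"]
        for v in values
    ]


data = [1.0, 2.0, None, 4.0, 5.0, 6.0, None, 8.0, 9.0, 10.0]
train, val, test = split(data)
params = fit_preprocessor(train)
train_scaled = apply_preprocessor(train, params)
test_scaled = apply_preprocessor(test, params)  # reuse train statistics: no leakage
```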
Model Training & Evaluation
  1. Model Comparison
  2. Performance Visualization
  3. Cross-Validation
  4. AutoML
  5. Transfer Learning
  6. Black-Box Neural Network Models
  7. Regression Analysis
  8. Regular Monitoring and Logging
  9. Association Rules
  10. Regularization
  11. Recommendation Engine
  12. Model Interpretability
  13. Natural Language Processing
  14. External Validation
  15. Learning Rate Scheduling
  16. Weight Initialization
  17. Regularization Techniques
  18. Word Embeddings
  19. Ensemble Techniques
  20. Reinforcement Learning
  21. Batch Normalization
  22. Binary Classification Techniques
  23. Forecasting Techniques
  24. Clustering
  25. Hyperparameter Tuning
  26. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  27. Network Analytics / Geospatial Analytics
  28. Data Augmentation
  29. Batch Size Selection
  30. Early Stopping
  31. Evaluation Metrics
  32. Multiclass Classification Techniques
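Cross-validation, regularization, and hyperparameter tuning from the list above fit together as grid search: each candidate hyperparameter is scored by its mean validation error across k folds, and the best-scoring candidate is kept. The sketch below is a hand-rolled illustration (not scikit-learn's `GridSearchCV`) that tunes the penalty `lam` of a one-feature ridge regression through the origin; the data and grid values are made up.

```python
def kfold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge regression through the origin: w = sum(xy) / (sum(x^2) + lam)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean validation MSE across k folds for one regularization strength."""
    errors = []
    for tr, va in kfold_indices(len(xs), k):
        w = fit_ridge_1d([xs[i] for i in tr], [ys[i] for i in tr], lam)
        errors.append(sum((ys[i] - w * xs[i]) ** 2 for i in va) / len(va))
    return sum(errors) / len(errors)


# Grid search: score every candidate and keep the one with the lowest CV error.
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
grid = [0.0, 1.0, 10.0, 100.0, 1000.0]
scores = {lam: cv_mse(xs, ys, lam) for lam in grid}
best_lam = min(scores, key=scores.get)
print(best_lam, scores[best_lam])
```

The folds here are contiguous and unshuffled (scikit-learn's `KFold` also defaults to `shuffle=False`); when rows are sorted by a feature, shuffling before splitting usually gives a fairer estimate.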
Storage
  1. Databases
  2. Data Preprocessing Pipeline Models
  3. Code Repository
  4. Data Warehouse
  5. Model Registry
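A model registry keeps each trained model together with a version number and its evaluation metrics. As an illustration only, here is a hypothetical file-based registry (`FileModelRegistry` is not a real library class) that stores each version as a pickle file plus a JSON metadata file; dedicated registry services add features this sketch lacks, such as access control and lineage tracking.

```python
import json
import pickle
import tempfile
from pathlib import Path


class FileModelRegistry:
    """Hypothetical versioned registry: <root>/<name>/v<N>.pkl plus v<N>.json metadata."""

    def __init__(self, root):
        self.root = Path(root)

    def _versions(self, name):
        """Return the registered version numbers for a model, sorted ascending."""
        folder = self.root / name
        if not folder.exists():
            return []
        return sorted(int(p.stem[1:]) for p in folder.glob("v*.pkl"))

    def register(self, name, model, metrics):
        """Serialize the model and its metrics under the next version number."""
        version = (self._versions(name) or [0])[-1] + 1
        folder = self.root / name
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"v{version}.pkl").write_bytes(pickle.dumps(model))
        (folder / f"v{version}.json").write_text(json.dumps(metrics))
        return version

    def load(self, name, version=None):
        """Load a specific version, or the latest one if version is None."""
        versions = self._versions(name)
        if not versions:
            raise KeyError(f"no versions registered for {name!r}")
        version = versions[-1] if version is None else version
        folder = self.root / name
        model = pickle.loads((folder / f"v{version}.pkl").read_bytes())
        metrics = json.loads((folder / f"v{version}.json").read_text())
        return model, metrics, version


with tempfile.TemporaryDirectory() as tmp:
    registry = FileModelRegistry(tmp)
    registry.register("demand_forecast", {"slope": 1.9, "intercept": 0.3}, {"rmse": 2.1})
    registry.register("demand_forecast", {"slope": 2.0, "intercept": 0.1}, {"rmse": 1.7})
    model, metrics, version = registry.load("demand_forecast")  # latest version
```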
Deployment & Monitoring
  1. Bias and Fairness Assessment
  2. Streamlit
  3. FastAPI
  4. Flask
  5. Performance Metrics
  6. Model Versioning
  7. Model Serialization
  8. Feedback Collection
  9. Data Drift Monitoring
  10. Concept Drift Detection
  11. Edge Deployment
  12. Alerting and Notification
  13. Serverless Computing
  14. Model Health Monitoring
  15. Cloud Deployment
  16. Containerization
  17. Model Drift
  18. Prediction Logging
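For data drift monitoring, one common metric is the Population Stability Index (PSI), which compares how a feature is distributed across fixed bins in a reference sample (e.g., training data) and in current production data: PSI = Σ (aᵢ - eᵢ) ln(aᵢ / eᵢ), where eᵢ and aᵢ are the reference and current shares of bin i. A minimal sketch; the bin edges, sample values, and the 0.25 alert threshold are assumptions, and thresholds such as 0.1 and 0.25 are widely used rules of thumb rather than fixed standards.

```python
import bisect
import math


def bin_proportions(values, edges, eps=1e-6):
    """Share of values per bin; values outside [edges[0], edges[-1]] are clipped into the end bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = bisect.bisect_right(edges, v) - 1
        i = min(max(i, 0), len(counts) - 1)  # clip to first/last bin
        counts[i] += 1
    total = sum(counts)
    # eps keeps empty bins from producing log(0) or division by zero
    return [max(c / total, eps) for c in counts]


def psi(reference, current, edges):
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    e = bin_proportions(reference, edges)
    a = bin_proportions(current, edges)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


edges = [0, 10, 20, 30, 40]
reference = [5, 12, 15, 22, 25, 28, 33, 8, 18, 35]
same = list(reference)
shifted = [v + 15 for v in reference]  # simulated drift: every value moves up by 15

ALERT_THRESHOLD = 0.25  # common rule of thumb, not a universal standard
score_same = psi(reference, same, edges)
score_shifted = psi(reference, shifted, edges)
drift_detected = score_shifted > ALERT_THRESHOLD
print(score_same, round(score_shifted, 2), drift_detected)
```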
ML Workflow (Intermediate) - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline
  1. Data Collection: Inference API, API Stream, Web crawler (Python, Selenium)
  2. Data Ingestion into the Data Landing Zone (store data from all the sources)
  3. Data Cleaning / Preprocessing, producing derived & base features
  4. Data Training & Modelling
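The training pipeline stages above can be sketched as a chain of plain Python functions. Everything here is hypothetical: two in-memory lists stand in for the API stream and the web crawler, the landing zone is a list, the model is a least-squares line fitted on a single base feature (feature derivation is omitted for brevity), and the trained parameters are kept as a plain dict so that `pickle` can serialize them without depending on a custom class.

```python
import pickle
import statistics

# Hypothetical raw records from two sources (e.g., an API stream and a web crawler).
API_STREAM = [{"day": 1, "sales": "10"}, {"day": 2, "sales": "12"}]
WEB_CRAWLER = [{"day": 3, "sales": "n/a"}, {"day": 4, "sales": "16"}, {"day": 5, "sales": "18"}]


def ingest(*sources):
    """Data Landing Zone: store raw records from all sources unchanged."""
    landing_zone = []
    for source in sources:
        landing_zone.extend(source)
    return landing_zone


def clean(records):
    """Cast fields to numbers and drop records whose target cannot be parsed."""
    out = []
    for r in records:
        try:
            out.append({"day": int(r["day"]), "sales": float(r["sales"])})
        except ValueError:
            continue
    return out


def train(records):
    """Ordinary least-squares line: sales = slope * day + intercept."""
    xs = [r["day"] for r in records]
    ys = [r["sales"] for r in records]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return {"slope": slope, "intercept": my - slope * mx}


landing_zone = ingest(API_STREAM, WEB_CRAWLER)
model = train(clean(landing_zone))
artifact = pickle.dumps(model)  # bytes that would be written to a .pkl file or a model registry
print(model)
```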

Inference Pipeline
  1. Input Data for Forecasting
  2. Cleaned & Processed Data
  3. Inference with the trained model loaded from a pickle or Joblib file
  4. Serving through Streamlit or an Inference API
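On the inference side, the same cleaning rules used in training must be applied to new input before the serialized model is used. A minimal sketch, assuming the model artifact is a pickled dict of line parameters like the one a training step might produce; in a deployment, `predict` would be called from a Streamlit app or an inference API endpoint, which are not shown. With Joblib, the artifact would instead be written and read with `joblib.dump(model, path)` and `joblib.load(path)`.

```python
import pickle

# Stand-in for an artifact produced by the training pipeline (a plain dict of
# parameters); in practice it would be read from a .pkl / .joblib file or a registry.
ARTIFACT = pickle.dumps({"slope": 2.0, "intercept": 8.0})


def preprocess(raw_rows):
    """Apply the same cleaning rules used in training to incoming input data."""
    cleaned = []
    for r in raw_rows:
        try:
            cleaned.append({"day": int(r["day"])})
        except (KeyError, ValueError):
            continue  # skip rows that cannot be parsed
    return cleaned


def predict(artifact, raw_rows):
    """Deserialize the model and return one forecast per valid input row."""
    model = pickle.loads(artifact)  # only unpickle artifacts from trusted sources
    return [
        {"day": r["day"], "forecast": model["slope"] * r["day"] + model["intercept"]}
        for r in preprocess(raw_rows)
    ]


forecasts = predict(ARTIFACT, [{"day": "6"}, {"day": "7"}, {"day": "oops"}])
print(forecasts)
```

Only load pickle files from trusted sources, since unpickling can execute arbitrary code.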