Workflow Element Store

  1. Databases - SQL
  2. Data Collaboration and Partnerships
  3. Web Scraping
  4. Public Datasets
  5. Experiments (DoE)
  6. Databases - NoSQL
  7. Surveys and Questionnaires
  8. Flat files
  9. Mobile Applications or IoT Applications
  10. Feedback Data
  11. APIs and Data Feeds
  1. AWS S3
  2. GCP Dataflow
  3. GCS
  4. Azure Stream Analytics
  5. AWS Kinesis
  6. Azure Synapse
  7. AWS Glue
  8. MS SQL Server
  9. RDBMS
  10. ETL/ELT pipeline
  11. Oracle DB
  12. PostgreSQL
  13. MongoDB
  14. Azure Blob Storage
  15. GCP Data Fusion
  16. AWS Redshift
  17. AWS RDS
  18. MySQL
  19. GCP BigQuery
  20. Azure Data Factory (ADF)
  21. Apache Kafka
  1. Polynomial Features
  2. Dimensionality Reduction
  3. Handling Categorical Data
  4. Dealing with Outliers
  5. Handling Imbalanced Classes
  6. Textual Feature Extraction
  7. Data Transformations
  8. Data Partitioning - Train, Validation, & Test
  9. Auto-Preprocessing libraries
  10. Feature Selection
  11. Handling Noisy Data
  12. Feature Extraction from Images
  13. Handling Missing Data
  14. Handling Time-Series Data
  15. Data Scaling and Normalization
  16. Time-Based Features
  17. AutoEDA libraries
  18. Domain-Specific Feature Engineering
  19. Annotation
  20. Interaction Features
  21. Augmentation
  22. Binning / Discretization
  1. Forecasting Techniques
  2. Reinforcement Learning
  3. AutoML
  4. Natural Language Processing
  5. Performance Visualization
  6. Multiclass Classification Techniques
  7. Association Rules
  8. Cross-Validation
  9. Batch Size Selection
  10. Word Embeddings
  11. Model Comparison
  12. External Validation
  13. Clustering
  14. Ensemble Techniques
  15. Data Augmentation
  16. Binary Classification Techniques
  17. Regression Analysis
  18. Batch Normalization
  19. Early Stopping
  20. Regular Monitoring and Logging
  21. Evaluation Metrics
  22. Weight Initialization
  23. Hyperparameter Tuning
  24. Regularization
  25. Transfer Learning
  26. Network Analytics / Geospatial Analytics
  27. Learning Rate Scheduling
  28. Black-box - Neural Network Models
  29. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  30. Regularization Techniques
  31. Recommendation Engine
  32. Model Interpretability
  1. Data Warehouse
  2. Databases
  3. Code Repository
  4. Model Registry
  5. Data Preprocessing Pipeline Models
  1. Alerting and Notification
  2. Prediction Logging
  3. Feedback Collection
  4. Model Health Monitoring
  5. Model Serialization
  6. Model Versioning
  7. Serverless Computing
  8. FastAPI
  9. Edge Deployment
  10. Data Drift Monitoring
  11. Streamlit
  12. Flask
  13. Containerization
  14. Concept Drift Detection
  15. Model Drift
  16. Performance Metrics
  17. Bias and Fairness Assessment
  18. Cloud Deployment
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  • Data Collection: API Stream, Web crawler (Python, Selenium)
  • Data Ingestion: Data Landing Zone (store data from all the sources)
  • Data Cleaning / Preprocessing: Derived & Base features
  • Data Training & Modelling
Inference Pipeline
  • Input Data for Forecasting: Input Data
  • Cleaned & Processed Data
  • Inference: pickle, Joblib, Streamlit, Inference API
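
The two pipelines above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture's actual implementation: the sample values, the helper names (preprocess, train, predict) and the toy "average step" forecaster are all hypothetical. pickle from the standard library stands in for the serialization step; Joblib could be used in the same role. The point it shows is that the training pipeline and the inference pipeline share the same cleaning step, and that only the serialized model artifact passes between them.

```python
import pickle
from statistics import mean


def preprocess(values):
    """Cleaning / preprocessing step shared by both pipelines.

    Drops missing readings, then pairs each value with its predecessor,
    giving a base feature (current value) and a derived feature (previous value).
    """
    cleaned = [v for v in values if v is not None]
    return list(zip(cleaned[:-1], cleaned[1:]))


def train(pairs):
    """Hypothetical toy model: learn the average step between readings."""
    return {"avg_step": mean(cur - prev for prev, cur in pairs)}


def predict(params, last_value):
    """Forecast the next reading as the last value plus the learned step."""
    return last_value + params["avg_step"]


# --- Training pipeline: collect -> ingest -> clean -> train -> serialize ---
raw_training_data = [10.0, None, 12.0, 14.0, 16.0]  # made-up sample data
model_params = train(preprocess(raw_training_data))
artifact = pickle.dumps(model_params)  # in practice, written to a file or registry

# --- Inference pipeline: input -> clean -> load model -> forecast ---
input_data = [20.0, None, 22.0]  # made-up input for forecasting
processed = preprocess(input_data)
loaded_params = pickle.loads(artifact)
forecast = predict(loaded_params, processed[-1][1])

print(f"learned step: {loaded_params['avg_step']}, forecast: {forecast}")
```

Serializing only the learned parameters, rather than a custom class instance, keeps the artifact loadable from any process that knows the predict function. A Streamlit page or an inference API would wrap the last three steps behind a user interface or an HTTP endpoint.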