Workflow Element Store

  1. Flat files
  2. Mobile Applications or IoT Applications
  3. Experiments (DoE)
  4. Surveys and Questionnaires
  5. Databases - NoSQL
  6. Web Scraping
  7. APIs and Data Feeds
  8. Data Collaboration and Partnerships
  9. Public Datasets
  10. Feedback Data
  11. Databases - SQL
  1. AWS Redshift
  2. GCP BigQuery
  3. Oracle DB
  4. MySQL
  5. AWS RDS
  6. MongoDB
  7. ETL/ELT pipeline
  8. MS SQL Server
  9. GCS
  10. AWS Kinesis
  11. PostgreSQL
  12. AWS Glue
  13. Azure Synapse
  14. Azure Data Factory (ADF)
  15. Azure Stream Analytics
  16. Apache Kafka
  17. Azure Blob Storage
  18. AWS S3
  19. GCP Dataflow
  20. RDBMS
  21. GCP Data Fusion
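
As a minimal sketch of how a flat-file source might be landed in a SQL store, the snippet below loads CSV text into an in-memory SQLite table. The table name `landing`, the column names, and the sample rows are hypothetical and chosen only for illustration; a real pipeline would target one of the stores listed above.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file content; in practice this would be read from disk.
CSV_TEXT = "sensor_id,reading\na,1.5\nb,2.5\n"

# In-memory SQLite stands in for the SQL landing store (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE landing (sensor_id TEXT, reading REAL)")

# Parse the CSV rows and cast the numeric column before loading.
rows = [
    (record["sensor_id"], float(record["reading"]))
    for record in csv.DictReader(io.StringIO(CSV_TEXT))
]
conn.executemany("INSERT INTO landing VALUES (?, ?)", rows)
conn.commit()

# Simple checks that the load landed what was expected.
row_count = conn.execute("SELECT COUNT(*) FROM landing").fetchone()[0]
total = conn.execute("SELECT SUM(reading) FROM landing").fetchone()[0]
```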
  1. Binning / Discretization
  2. Feature Extraction from Images
  3. Handling Categorical Data
  4. Time-Based Features
  5. Dimensionality Reduction
  6. Handling Noisy Data
  7. Auto-Preprocessing libraries
  8. AutoEDA libraries
  9. Interaction Features
  10. Dealing with Outliers
  11. Domain-Specific Feature Engineering
  12. Handling Missing Data
  13. Data Partitioning - Train, Validation, & Test
  14. Augmentation
  15. Polynomial Features
  16. Data Scaling and Normalization
  17. Handling Imbalanced Classes
  18. Feature Selection
  19. Data Transformations
  20. Textual Feature Extraction
  21. Handling Time-Series Data
  22. Annotation
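
Two of the elements above, data partitioning and data scaling, can be sketched with the standard library alone. The function names, the 70/15/15 split, and the toy data are assumptions made for illustration. The scaler is fitted on the training split only, so that validation and test values do not leak into the preprocessing step.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of rows and cut it into train, validation, and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def fit_min_max(values):
    """Return a function that maps values onto [0, 1] using the fitted range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for constant columns
    return lambda x: (x - lo) / span

# Toy data (hypothetical): one numeric feature with 100 observations.
data = [float(v) for v in range(100)]
train, val, test = train_val_test_split(data)

# Fit the scaler on the training split, then reuse it for the other splits.
scale = fit_min_max(train)
train_scaled = [scale(x) for x in train]
val_scaled = [scale(x) for x in val]
```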
  1. Word Embeddings
  2. Transfer Learning
  3. Ensemble Techniques
  4. Cross-Validation
  5. External Validation
  6. Early Stopping
  7. Blackbox - Neural Network Models
  8. Batch Size Selection
  9. Model Comparison
  10. Learning Rate Scheduling
  11. Hyperparameter Tuning
  12. Regularization Techniques
  13. AutoML
  14. Network Analytics / Geospatial Analytics
  15. Data Augmentation
  16. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  17. Batch Normalization
  18. Model Interpretability
  19. Performance Visualization
  20. Multiclass Classification Techniques
  21. Recommendation Engine
  22. Binary Classification Techniques
  23. Evaluation Metrics
  24. Regular Monitoring and Logging
  25. Weight Initialization
  26. Reinforcement Learning
  27. Natural Language Processing
  28. Association Rules
  29. Regression Analysis
  30. Forecasting Techniques
  31. Regularization
  32. Clustering
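
Cross-validation, listed above, can be illustrated without any ML library. The sketch below splits sample indices into k contiguous folds and averages a score over them. The helper names, the mean-predictor "model", and the toy data are hypothetical; libraries such as scikit-learn provide production-grade equivalents.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, non-overlapping folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Fit on each training fold, score on the held-out fold, return the mean score."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
    return sum(scores) / len(scores)

# Hypothetical baseline "model": always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(10))
ys = [2.0] * 10
folds = list(k_fold_indices(len(xs), 3))
cv_error = cross_validate(xs, ys, 3, fit_mean, mean_abs_error)
```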
  1. Model Registry
  2. Code Repository
  3. Databases
  4. Data Preprocessing Pipeline Models
  5. Data Warehouse
  1. Prediction Logging
  2. Streamlit
  3. Cloud Deployment
  4. Feedback Collection
  5. Data Drift Monitoring
  6. Containerization
  7. Flask
  8. Edge Deployment
  9. Bias and Fairness Assessment
  10. Alerting and Notification
  11. Model Serialization
  12. Serverless Computing
  13. Model Versioning
  14. Model Health Monitoring
  15. Performance Metrics
  16. Concept Drift Detection
  17. FastAPI
  18. Model Drift
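
Data drift monitoring, listed above, is often approached by comparing the distribution of a feature at training time with its distribution in production. One common statistic for this is the Population Stability Index (PSI). The sketch below is a simplified version under stated assumptions: equal-width bins taken from the reference data, a small `eps` floor for empty bins, and the function name `psi` are all illustrative choices rather than a fixed standard.

```python
import math

def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 for a constant feature

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)  # clamp out-of-range values into edge bins
            counts[idx] += 1
        # Floor empty bins at eps so the logarithm below stays defined (an assumption).
        return [max(c / len(values), eps) for c in counts]

    ref_p = proportions(reference)
    cur_p = proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))

# Toy data (hypothetical): the "production" sample is shifted by 30 units.
reference = [float(v) for v in range(100)]
shifted = [v + 30.0 for v in reference]

psi_same = psi(reference, reference)
psi_shifted = psi(reference, shifted)
```

A value near zero suggests the two samples are distributed alike. A frequently quoted rule of thumb treats values above roughly 0.25 as a significant shift, though the right threshold depends on the application.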
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  • Data Collection (Inference API, API Stream, Web crawler; Python, Selenium)
  • Data Ingestion
  • Data Landing Zone (Store Data from all the Sources)
  • Data Cleaning / Preprocessing (Derived & Base features)
  • Data Training & Modelling
Inference Pipeline
  • Input Data for Forecasting (Input Data)
  • Cleaned & Processed Data
  • Inference (pickle, Joblib, Streamlit, Inference API)
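
The inference path in the diagram (input data, cleaning, then a prediction from a serialized model) might look roughly like the following. This is a sketch under assumptions: the "model" is a plain dictionary of hypothetical linear weights, `clean` and `predict` are illustrative helpers, and pickle is used because the diagram names it. Joblib would follow the same save/load pattern.

```python
import pickle
import tempfile
from pathlib import Path

def save_model(params, path):
    """Serialize model parameters to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(params, f)

def load_model(path):
    """Load parameters back; only unpickle files from a trusted source."""
    with open(path, "rb") as f:
        return pickle.load(f)

def clean(row):
    """Replace missing values (None) with 0.0 and cast the rest to float."""
    return [0.0 if v is None else float(v) for v in row]

def predict(params, row):
    """Hypothetical linear model: weighted sum of the features plus a bias."""
    return sum(w * x for w, x in zip(params["weights"], row)) + params["bias"]

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"
    # Training side: persist the fitted parameters (values are made up).
    save_model({"weights": [2.0, -1.0], "bias": 0.5}, model_path)
    # Inference side: load the artifact, clean the input, predict.
    model = load_model(model_path)
    prediction = predict(model, clean([3, None]))
```

Applying the same `clean` step at inference time as during training keeps the two pipelines consistent, which is why the diagram routes inference input through "Cleaned & Processed Data" before the model.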