Workflow Element Store

  1. Public Datasets
  2. APIs and Data Feeds
  3. Databases - NoSQL
  4. Surveys and Questionnaires
  5. Mobile Applications or IoT Applications
  6. Data Collaboration and Partnerships
  7. Feedback Data
  8. Databases - SQL
  9. Web Scraping
  10. Flat files
  11. Experiments (DoE)
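Flat files are usually the simplest of these sources to ingest. The sketch below uses only Python's standard `csv` module. The two-column schema (`id`, `temperature`) and the choice to skip unparseable rows rather than reject the whole file are illustrative assumptions.

```python
import csv
import io


def load_flat_file(lines):
    """Parse CSV text into typed records and count rows that could not be parsed."""
    records, skipped = [], 0
    for row in csv.DictReader(lines):
        try:
            # Hypothetical schema: an integer id and a float temperature reading.
            records.append({"id": int(row["id"]), "temperature": float(row["temperature"])})
        except (KeyError, TypeError, ValueError):
            skipped += 1  # malformed or incomplete row
    return records, skipped


sample = io.StringIO("id,temperature\n1,21.5\n2,not-a-number\n3,19.0\n")
records, skipped = load_flat_file(sample)
print(records, skipped)  # [{'id': 1, 'temperature': 21.5}, {'id': 3, 'temperature': 19.0}] 1
```

For a file on disk, pass a file object opened with `open(path, newline="")`, as the `csv` documentation recommends.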
  1. GCP Dataflow
  2. Azure Blob Storage
  3. GCP Data Fusion
  4. Apache Kafka
  5. Azure Synapse
  6. GCP BigQuery
  7. Microsoft SQL Server
  8. ETL/ELT Pipeline
  9. AWS RDS
  10. RDBMS
  11. Azure Data Factory (ADF)
  12. Azure Stream Analytics
  13. MongoDB
  14. AWS Redshift
  15. MySQL
  16. AWS Glue
  17. Google Cloud Storage (GCS)
  18. AWS Kinesis
  19. PostgreSQL
  20. Oracle DB
  21. AWS S3
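Most of the ingestion and storage tools listed above implement some variant of the ETL/ELT pattern. The sketch below is a minimal extract-transform-load example. It assumes an in-memory SQLite database (Python's `sqlite3`) as a stand-in for an RDBMS or a warehouse such as BigQuery, Redshift, or Synapse. The `orders` table and its cleaning rules are hypothetical.

```python
import sqlite3


def extract():
    # Stand-in for pulling rows from an API, a file drop, or a source database.
    return [
        {"order_id": 1, "city": " london ", "amount": "12.50"},
        {"order_id": 2, "city": "Paris", "amount": None},
        {"order_id": 3, "city": "PARIS", "amount": "7.25"},
    ]


def transform(rows):
    # Drop incomplete rows, normalize text, and cast types before loading.
    return [
        (r["order_id"], r["city"].strip().title(), float(r["amount"]))
        for r in rows
        if r["amount"] is not None
    ]


def load(conn, rows):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = conn.execute(
    "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(totals)  # [('London', 12.5), ('Paris', 7.25)]
```

In an ELT variant, the raw rows would be loaded first, and the transformation would then run as SQL inside the warehouse.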
  1. Handling Imbalanced Classes
  2. Data Scaling and Normalization
  3. Annotation
  4. Interaction Features
  5. Data Partitioning - Train, Validation, & Test
  6. Feature Extraction from Images
  7. Time-Based Features
  8. Data Transformations
  9. Polynomial Features
  10. Dimensionality Reduction
  11. Domain-Specific Feature Engineering
  12. Handling Missing Data
  13. Dealing with Outliers
  14. Handling Time-Series Data
  15. Auto-Preprocessing Libraries
  16. Handling Noisy Data
  17. Handling Categorical Data
  18. Binning / Discretization
  19. Augmentation
  20. Feature Selection
  21. Textual Feature Extraction
  22. AutoEDA Libraries
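Several items above usually run together: Handling Missing Data, Data Scaling and Normalization, Handling Categorical Data, and Data Partitioning. The sketch below uses only the standard library and assumes a toy dataset with one numeric column (`age`) and one categorical column (`color`). It applies median imputation, min-max scaling, and one-hot encoding. All parameters are learned from the training split only, so no information leaks from the validation or test data. In practice, scikit-learn's `SimpleImputer`, `MinMaxScaler`, `OneHotEncoder`, and `train_test_split` cover the same steps.

```python
import random
import statistics


def split(rows, train=0.7, val=0.15, seed=42):
    """Shuffle, then partition rows into train / validation / test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * val)
    return rows[:n_train], rows[n_train:n_train + n_val], rows[n_train + n_val:]


def fit_preprocessor(train_rows):
    """Learn imputation, scaling, and encoding parameters from the training set only."""
    ages = [r["age"] for r in train_rows if r["age"] is not None]
    return {
        "median_age": statistics.median(ages),
        "min_age": min(ages),
        "max_age": max(ages),
        "colors": sorted({r["color"] for r in train_rows}),
    }


def transform(row, params):
    """Impute a missing age, min-max scale it, and one-hot encode the color."""
    age = params["median_age"] if row["age"] is None else row["age"]
    span = (params["max_age"] - params["min_age"]) or 1
    scaled = (age - params["min_age"]) / span
    # Categories unseen during training map to an all-zero vector.
    one_hot = [1.0 if row["color"] == c else 0.0 for c in params["colors"]]
    return [scaled] + one_hot


# Hypothetical dataset: every 4th age is missing, colors alternate.
rows = [{"id": i, "age": None if i % 4 == 0 else 20 + i, "color": ("red", "blue")[i % 2]}
        for i in range(20)]
train, val, test = split(rows)
params = fit_preprocessor(train)  # fit on train only to avoid leakage
test_features = [transform(r, params) for r in test]

small = fit_preprocessor([{"age": 20, "color": "red"}, {"age": None, "color": "blue"},
                          {"age": 40, "color": "red"}])
print(transform({"age": None, "color": "red"}, small))   # [0.5, 0.0, 1.0]
print(transform({"age": 40, "color": "green"}, small))   # [1.0, 0.0, 0.0]
```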
  1. Regression Analysis
  2. External Validation
  3. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  4. Model Interpretability
  5. Transfer Learning
  6. Cross-Validation
  7. Word Embeddings
  8. Weight Initialization
  9. Black-Box Neural Network Models
  10. Association Rules
  11. Ensemble Techniques
  12. Hyperparameter Tuning
  13. Recommendation Engine
  14. AutoML
  15. Reinforcement Learning
  16. Forecasting Techniques
  17. Regularization
  18. Model Comparison
  19. Binary Classification Techniques
  20. Batch Size Selection
  21. Data Augmentation
  22. Evaluation Metrics
  23. Clustering
  24. Multiclass Classification Techniques
  25. Performance Visualization
  26. Network Analytics / Geospatial Analytics
  27. Natural Language Processing
  28. Learning Rate Scheduling
  29. Batch Normalization
  30. Regularization Techniques
  31. Early Stopping
  32. Regular Monitoring and Logging
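Cross-Validation and Hyperparameter Tuning combine naturally. Each candidate value is scored by k-fold cross-validation and the best one is kept, which is what scikit-learn's `GridSearchCV` automates. The sketch below uses only the standard library. For brevity, it assumes a k-nearest-neighbours classifier on a single numeric feature as the model being tuned, with the neighbour count `k` as the hyperparameter.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (a single numeric feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold(data, n_folds):
    """Yield (train, validation) pairs; fold i validates on every n_folds-th point from i."""
    for i in range(n_folds):
        val = data[i::n_folds]
        train = [p for j, p in enumerate(data) if j % n_folds != i]
        yield train, val


def cv_accuracy(data, k, n_folds=5):
    """Accuracy pooled over all validation folds."""
    correct = total = 0
    for train, val in k_fold(data, n_folds):
        for x, y in val:
            correct += knn_predict(train, x, k) == y
            total += 1
    return correct / total


def grid_search(data, grid):
    """Score every candidate value with cross-validation and return the best one."""
    scores = {k: cv_accuracy(data, k) for k in grid}
    return max(scores, key=scores.get), scores


# Hypothetical data: label is 1 when x >= 10, with one deliberately mislabeled point.
data = [(x, int(x >= 10)) for x in range(20)]
data[3] = (3, 1)
best_k, scores = grid_search(data, [1, 3, 5, 7])
print(best_k, scores)
```

The folds here are interleaved. For time-series data, a forward-chaining split (train on the past, validate on the future) avoids leaking future information into training.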
  1. Data Preprocessing Pipeline Models
  2. Databases
  3. Model Registry
  4. Code Repository
  5. Data Warehouse
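A model registry records which trained artifact corresponds to which version, together with its evaluation metrics. The sketch below is a hypothetical, file-based registry, not the API of a real tool such as the MLflow Model Registry. It assumes one directory per model name and one sub-directory per version, with plain dictionaries standing in for fitted models.

```python
import json
import pickle
import tempfile
from pathlib import Path


class ModelRegistry:
    """Hypothetical registry layout: <root>/<name>/v<N>/{model.pkl, metadata.json}."""

    def __init__(self, root):
        self.root = Path(root)

    def register(self, name, model, metrics):
        model_dir = self.root / name
        model_dir.mkdir(parents=True, exist_ok=True)
        # Next version number = number of existing version directories + 1.
        version = len([p for p in model_dir.iterdir() if p.is_dir()]) + 1
        version_dir = model_dir / f"v{version}"
        version_dir.mkdir()
        (version_dir / "model.pkl").write_bytes(pickle.dumps(model))
        (version_dir / "metadata.json").write_text(
            json.dumps({"version": version, "metrics": metrics})
        )
        return version

    def load(self, name, version=None):
        """Load a specific version, or the latest one when version is None."""
        model_dir = self.root / name
        if version is None:
            version = max(int(p.name[1:]) for p in model_dir.iterdir() if p.is_dir())
        version_dir = model_dir / f"v{version}"
        model = pickle.loads((version_dir / "model.pkl").read_bytes())
        metadata = json.loads((version_dir / "metadata.json").read_text())
        return model, metadata


with tempfile.TemporaryDirectory() as tmp:
    registry = ModelRegistry(tmp)
    registry.register("churn", {"coef": [0.1, 0.2]}, {"auc": 0.71})
    v2 = registry.register("churn", {"coef": [0.3, 0.1]}, {"auc": 0.78})
    latest_model, latest_meta = registry.load("churn")
    first_model, _ = registry.load("churn", version=1)
```

Because unpickling can execute arbitrary code, a registry like this should only ever load artifacts it wrote itself.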
  1. Serverless Computing
  2. Cloud Deployment
  3. Flask
  4. Feedback Collection
  5. Model Versioning
  6. Containerization
  7. Bias and Fairness Assessment
  8. Model Health Monitoring
  9. Concept Drift Detection
  10. FastAPI
  11. Prediction Logging
  12. Alerting and Notification
  13. Model Serialization
  14. Edge Deployment
  15. Data Drift Monitoring
  16. Performance Metrics
  17. Streamlit
  18. Model Drift
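Data Drift Monitoring compares the distribution of live inputs with the training distribution. Large shifts are then flagged through Alerting and Notification. One common metric is the Population Stability Index: PSI = sum over bins i of (a_i - e_i) * ln(a_i / e_i), where e_i and a_i are the baseline and live proportions in bin i. The sketch below uses only the standard library. It assumes ten equal-width bins taken from the baseline range and the frequently quoted, but not universal, alert threshold of 0.25. Other tests, such as Kolmogorov-Smirnov, are equally valid choices.

```python
import math


def psi(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index of `actual` against a baseline sample `expected`."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant baseline

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            # Equal-width bins from the baseline range; out-of-range values go to the end bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps log() defined when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


baseline = [i / 100 for i in range(100)]    # feature values seen at training time
live = [0.5 + i / 200 for i in range(100)]  # hypothetical production values, shifted upward
score = psi(baseline, live)
if score > 0.25:  # commonly cited rule of thumb, not a universal standard
    print(f"ALERT: significant data drift on feature (PSI={score:.2f})")
```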
ML Workflow Intermediate - Architecture
  • Element belongs to the model
  • Element does not belong to the model
Training Pipeline
Data Collection
  • Inference API
  • API Stream
  • Web crawler (Python, Selenium)
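The web crawler in this stage uses Python with Selenium, which drives a real browser and is mainly needed for JavaScript-rendered pages. For static HTML, the standard library is enough. The sketch below swaps in `html.parser` instead of Selenium and parses an inline page so that it runs offline. The page contents and URLs are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTextParser(HTMLParser):
    """Collect absolute link targets and visible text from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.text = []
        self._skip = 0  # depth inside <script>/<style>, whose content is not visible text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.text.append(data.strip())


# In a real crawler the HTML would come from e.g. urllib.request.urlopen(url).read().decode();
# an inline page keeps this example offline.
page = """<html><body><h1>Products</h1><script>var x = 1;</script>
<a href="/item/1">Item 1</a> <a href="https://example.com/item/2">Item 2</a></body></html>"""
parser = LinkAndTextParser("https://example.com/catalog")
parser.feed(page)
print(parser.links)
print(parser.text)
```

Before crawling a real site, check its `robots.txt` and terms of use. The standard `urllib.robotparser` module can parse `robots.txt`.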

Data Ingestion
  • Data Landing Zone: stores data from all the sources
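The landing zone keeps raw data from every source exactly as received, so later cleaning can be re-run or audited. The sketch below assumes a local directory as a stand-in for object storage such as S3, GCS, or Azure Blob Storage. It also assumes a hypothetical `<source>/<date>/part.jsonl` partition layout.

```python
import json
import tempfile
from datetime import date
from pathlib import Path


def land(root, source, records, ingest_date):
    """Append raw records, unmodified, to <root>/<source>/<YYYY-MM-DD>/part.jsonl."""
    partition = Path(root) / source / ingest_date.isoformat()
    partition.mkdir(parents=True, exist_ok=True)
    path = partition / "part.jsonl"
    with path.open("a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")  # one JSON object per line
    return path


def read_partition(path):
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f]


with tempfile.TemporaryDirectory() as tmp:
    p = land(tmp, "api_stream", [{"sensor": "a", "value": 1.2}], date(2024, 1, 15))
    land(tmp, "api_stream", [{"sensor": "b", "value": None}], date(2024, 1, 15))
    landed = read_partition(p)
    relative = p.relative_to(tmp).as_posix()
```

No cleaning happens here: incomplete records such as `{"sensor": "b", "value": None}` are stored as-is and handled by the next stage.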

Data Cleaning / Preprocessing
  • Derived & base features
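Derived features are computed from the base columns before training. The short sketch below assumes a hypothetical retail-style record with a timestamp, a price, and a quantity. It adds time-based features and one interaction feature while keeping the base features.

```python
from datetime import datetime


def derive_features(row):
    """Return the base columns plus derived ones (hypothetical schema)."""
    ts = datetime.fromisoformat(row["timestamp"])
    return {
        **row,  # keep the base features
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # Monday = 0
        "is_weekend": ts.weekday() >= 5,
        "revenue": row["price"] * row["quantity"],  # interaction feature
    }


features = derive_features({"timestamp": "2024-03-16T14:30:00", "price": 2.5, "quantity": 4})
print(features)
```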

Data Training & Modelling
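As one illustration of this stage, the sketch below fits a simple linear model by batch gradient descent and applies Early Stopping (from the modelling list above) to the validation loss. The model, data, learning rate, and `patience` value are all assumptions chosen for a small runnable example, not the pipeline's actual configuration.

```python
def mse(w, b, data):
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train_linear(train, val, lr=0.05, max_epochs=5000, patience=20, min_delta=1e-9):
    """Fit y = w*x + b by batch gradient descent with early stopping on validation loss."""
    w = b = 0.0
    best_loss, best_w, best_b = float("inf"), w, b
    stale = 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        n = len(train)
        # Gradients of the mean squared error on the training set.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w, b = w - lr * grad_w, b - lr * grad_b
        val_loss = mse(w, b, val)
        if val_loss < best_loss - min_delta:
            best_loss, best_w, best_b, stale = val_loss, w, b, 0
        else:
            stale += 1
            if stale >= patience:
                break  # no meaningful improvement for `patience` epochs in a row
    # Return the parameters from the best validation epoch, not the last one.
    return best_w, best_b, epoch


# Hypothetical noise-free data on the line y = 2x + 1.
train = [(i / 10, 2 * i / 10 + 1) for i in range(11)]
val = [(0.05 + i / 10, 2 * (0.05 + i / 10) + 1) for i in range(10)]
w, b, epochs_run = train_linear(train, val)
print(f"w={w:.3f}, b={b:.3f} after {epochs_run} epochs")
```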

Inference Pipeline
Input Data for Forecasting
  • Input data
  • Cleaned & processed data

Inference
  • Inference pickle
  • Inference Joblib
  • Streamlit
  • Inference API
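The inference stage loads a serialized artifact (pickle or Joblib) and serves predictions, for example from a Streamlit app or an inference API. The minimal sketch below assumes hypothetical fitted values. It bundles the preprocessing parameters with the model weights, so inference applies exactly the same imputation and scaling as training. `joblib.dump(obj, filename)` and `joblib.load(filename)` are drop-in alternatives to the `pickle` calls and are often preferred for objects holding large NumPy arrays.

```python
import pickle
import tempfile
from pathlib import Path


def predict(artifact, inputs):
    """Apply the saved preprocessing, then the saved linear model, to new inputs."""
    prep, weights = artifact["preprocessing"], artifact["model"]
    outputs = []
    for x in inputs:
        x = prep["fill_value"] if x is None else x  # same imputation as training
        z = (x - prep["mean"]) / prep["std"]        # same scaling as training
        outputs.append(weights["w"] * z + weights["b"])
    return outputs


# Hypothetical fitted values; in practice they come from the training pipeline.
fitted = {
    "preprocessing": {"fill_value": 10.0, "mean": 10.0, "std": 2.0},
    "model": {"w": 3.0, "b": 1.0},
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with path.open("wb") as f:  # training side: serialize the artifact
        pickle.dump(fitted, f)
    with path.open("rb") as f:  # inference side: load it back
        loaded = pickle.load(f)

predictions = predict(loaded, [12.0, None, 8.0])
print(predictions)  # [4.0, 1.0, -2.0]
```

A Streamlit app or API server would typically load the artifact once at startup and call `predict` per request. As with any pickle file, only load artifacts from a trusted source.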