Workflow Element Store

  1. Flat files
  2. Public Datasets
  3. APIs and Data Feeds
  4. Databases - SQL
  5. Surveys and Questionnaires
  6. Databases - NoSQL
  7. Data Collaboration and Partnerships
  8. Feedback Data
  9. Web Scraping
  10. Mobile Applications or IoT Applications
  11. Experiments (DoE)
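As a minimal sketch of the first source in the list above (flat files), the snippet below parses CSV text with Python's standard `csv` module. The column names and sample rows are made up for illustration, and an in-memory string stands in for a file on disk.

```python
import csv
import io

# Hypothetical sample data standing in for a flat file on disk.
SAMPLE_CSV = """date,store,sales
2024-01-01,A,120
2024-01-02,A,135
2024-01-01,B,80
"""

def read_flat_file(handle):
    """Parse CSV rows into dictionaries and cast the numeric column to int."""
    rows = []
    for row in csv.DictReader(handle):
        row["sales"] = int(row["sales"])
        rows.append(row)
    return rows

records = read_flat_file(io.StringIO(SAMPLE_CSV))
print(len(records), "rows loaded")
```

With a real file, the same function could be called as `read_flat_file(open(path, newline=""))`, where `path` is whatever location the project uses.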
  1. Azure Stream Analytics
  2. Google Cloud Storage (GCS)
  3. Azure Blob Storage
  4. AWS Glue
  5. Azure Synapse
  6. MySQL
  7. AWS Kinesis
  8. GCP Dataflow
  9. AWS RDS
  10. GCP BigQuery
  11. MongoDB
  12. GCP Data Fusion
  13. Apache Kafka
  14. AWS S3
  15. RDBMS
  16. ETL/ELT pipeline
  17. AWS Redshift
  18. Oracle DB
  19. PostgreSQL
  20. MS SQL Server
  21. Azure Data Factory (ADF)
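To illustrate the RDBMS and ETL/ELT entries in the list above, here is a small sketch that lands raw records in a SQL table and then aggregates them inside the database. It uses Python's built-in `sqlite3` as a stand-in for MySQL, PostgreSQL, or a cloud warehouse; the table name `sales_landing` and the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical records, as they might arrive from a collection step.
raw_records = [
    ("2024-01-01", "A", 120),
    ("2024-01-02", "A", 135),
    ("2024-01-01", "B", 80),
]

def load_into_landing_table(conn, records):
    """Create the landing table if needed and bulk-insert the records."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_landing "
        "(date TEXT, store TEXT, sales INTEGER)"
    )
    conn.executemany("INSERT INTO sales_landing VALUES (?, ?, ?)", records)
    conn.commit()

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
load_into_landing_table(conn, raw_records)

# Transform step of a tiny ELT flow: aggregate after loading.
totals = dict(
    conn.execute(
        "SELECT store, SUM(sales) FROM sales_landing GROUP BY store"
    ).fetchall()
)
print(totals)
```

The load-then-transform order mirrors ELT; an ETL variant would aggregate in Python before the insert.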
  1. Textual Feature Extraction
  2. Handling Time-Series Data
  3. Data Partitioning - Train, Validation, & Test
  4. Auto-Preprocessing libraries
  5. Handling Noisy Data
  6. Augmentation
  7. Feature Extraction from Images
  8. Time-Based Features
  9. Polynomial Features
  10. Dealing with Outliers
  11. Domain-Specific Feature Engineering
  12. Binning / Discretization
  13. Handling Categorical Data
  14. Interaction Features
  15. Dimensionality Reduction
  16. Data Transformations
  17. Data Scaling and Normalization
  18. AutoEDA libraries
  19. Handling Imbalanced Classes
  20. Annotation
  21. Handling Missing Data
  22. Feature Selection
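Three of the steps listed above (handling missing data, train/validation/test partitioning, and scaling) can be sketched with the standard library alone. Mean imputation, the 60/20/20 split, and min-max scaling are illustrative choices, not the only options, and the sample values are invented.

```python
import random

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def split_train_val_test(items, val_frac=0.2, test_frac=0.2, seed=42):
    """Shuffle a copy of the items and cut it into three partitions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

def fit_min_max(train):
    """Return a scaling function whose range is learned from training data only."""
    lo, hi = min(train), max(train)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant feature
    return lambda x: (x - lo) / span

raw = [4.0, None, 10.0, 6.0, None, 8.0, 2.0, 12.0, 14.0, 20.0]
clean = impute_mean(raw)
train, val, test = split_train_val_test(clean)
scale = fit_min_max(train)
train_scaled = [scale(x) for x in train]
```

Fitting the scaler on the training partition only, then reusing it for validation and test data, keeps information from the held-out sets from leaking into preprocessing.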
  1. Regression Analysis
  2. GridSearchCV, RandomizedSearchCV, BayesSearchCV
  3. Natural Language Processing
  4. Weight Initialization
  5. Recommendation Engine
  6. Ensemble Techniques
  7. Transfer Learning
  8. Binary Classification Techniques
  9. Forecasting Techniques
  10. Model Interpretability
  11. External Validation
  12. AutoML
  13. Regular Monitoring and Logging
  14. Regularization
  15. Evaluation Metrics
  16. Batch Size Selection
  17. Cross-Validation
  18. Learning Rate Scheduling
  19. Network Analytics / Geospatial Analytics
  20. Regularization Techniques
  21. Model Comparison
  22. Batch Normalization
  23. Early Stopping
  24. Data Augmentation
  25. Hyperparameter Tuning
  26. Clustering
  27. Word Embeddings
  28. Performance Visualization
  29. Blackbox - Neural Network Models
  30. Association Rules
  31. Multiclass Classification Techniques
  32. Reinforcement Learning
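As a rough illustration of how regression analysis, cross-validation, and evaluation metrics from the list above fit together, the sketch below runs k-fold cross-validation on a one-feature least-squares fit and reports RMSE per fold. The fold assignment (every k-th index, no shuffling) is a simplification chosen for brevity, and the data is synthetic.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def k_fold_indices(n, k):
    """Yield (train_idx, test_idx); fold i holds every k-th index starting at i."""
    for i in range(k):
        test_idx = list(range(i, n, k))
        train_idx = [j for j in range(n) if j % k != i]
        yield train_idx, test_idx

def cross_validate(xs, ys, k=5):
    """Fit on each training fold and score on the matching held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        slope, intercept = fit_line([xs[j] for j in train_idx],
                                    [ys[j] for j in train_idx])
        preds = [slope * xs[j] + intercept for j in test_idx]
        scores.append(rmse([ys[j] for j in test_idx], preds))
    return scores

# Synthetic data lying exactly on y = 2x + 1, so every fold should score near zero.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
scores = cross_validate(xs, ys, k=5)
print(scores)
```

Comparing the per-fold scores of two candidate models is one simple form of the model comparison step in the list.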
  1. Databases
  2. Code repository
  3. Model registry
  4. Data warehouse
  5. Data Preprocessing pipeline models
  1. Model Drift
  2. Prediction Logging
  3. Cloud Deployment
  4. Serverless Computing
  5. Model Serialization
  6. Feedback Collection
  7. Streamlit
  8. Performance Metrics
  9. Concept Drift Detection
  10. Model Versioning
  11. Bias and Fairness Assessment
  12. Alerting and Notification
  13. Edge Deployment
  14. Model Health Monitoring
  15. Flask
  16. FastAPI
  17. Data Drift Monitoring
  18. Containerization
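Data drift monitoring and alerting from the list above can be approximated with a very simple check: compare the mean of a production batch against the training-time mean, measured in training standard deviations. This is a heuristic sketch rather than a standard drift test; the threshold of 2.0 and the sample values are assumptions, and the reference sample is assumed to have non-zero variance.

```python
import statistics

def mean_shift_score(reference, current):
    """Absolute difference of means, in units of the reference standard deviation."""
    ref_mean = statistics.mean(reference)
    ref_std = statistics.stdev(reference)  # assumes the reference is not constant
    return abs(statistics.mean(current) - ref_mean) / ref_std

def check_drift(reference, current, threshold=2.0):
    """Return (score, alert); alert is True when the score exceeds the threshold."""
    score = mean_shift_score(reference, current)
    return score, score > threshold

# Feature values seen at training time.
reference = [10.0, 11.0, 9.0, 10.5, 9.5, 10.0, 10.2, 9.8]
# Two hypothetical production batches.
stable = [10.1, 9.9, 10.3, 9.7]
shifted = [14.0, 15.0, 13.5, 14.5]

for name, batch in [("stable", stable), ("shifted", shifted)]:
    score, alert = check_drift(reference, batch)
    print(name, round(score, 2), "ALERT" if alert else "ok")
```

In a deployed system the alert flag would feed whatever notification channel the team uses; that wiring is left out here.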
ML Workflow Intermediate - Architecture
  • Element belongs to model
  • Element does not belong to model
Training Pipeline
  • Data Collection: Inference API, API Stream, Web crawler (Python, Selenium)
  • Data Ingestion: Data Landing Zone (Store Data from all the Sources)
  • Data Cleaning / Preprocessing: Derived & Base features
  • Data Training & Modelling
Inference Pipeline
  • Input Data for Forecasting: Input Data
  • Cleaned & Processed Data
  • Inference: Inference pickle, Inference Joblib
  • Streamlit, Inference API
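The pickle step of the inference pipeline above can be sketched as a round trip: serialize a trained model, restore it, and score cleaned input. The model here is a hypothetical dictionary of per-weekday averages with a fallback value, not a real trained estimator.

```python
import pickle

def predict(model, weekdays):
    """Look up the average for each weekday, falling back to the default."""
    return [model["averages"].get(day, model["default"]) for day in weekdays]

# Training side: serialize the fitted model to bytes.
# pickle.dump would write the same payload to an open binary file instead.
trained = {"averages": {0: 125.0, 1: 135.0}, "default": 130.0}
payload = pickle.dumps(trained)

# Inference side: restore the model and score cleaned, processed input data.
restored = pickle.loads(payload)
forecast = predict(restored, [0, 1, 5])
print(forecast)
```

Joblib offers a similar pair, `joblib.dump(obj, filename)` and `joblib.load(filename)`. In either case, only load files from a trusted source, since unpickling can execute arbitrary code.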