Become the Quality Guardian of the AI Revolution



Artificial Intelligence (AI) and Large Language Models (LLMs) are transforming how software is tested and validated. As organizations adopt AI-driven applications, the role of a specialized AI QA Engineer is emerging as one of the most in-demand careers in the QA domain.

This 40-hour course provides a comprehensive and hands-on introduction to testing AI, ML, and LLM-based systems. Designed for freshers and QA professionals, it bridges traditional software testing with the new world of probabilistic AI systems.

Learners will explore how to test data pipelines, validate machine learning models, evaluate LLM outputs, and automate AI QA processes using tools such as DeepEval, RAGAS, Great Expectations, and Ollama.
By the end of the course, you will be able to confidently design, execute, and automate testing strategies for real-world AI systems.



Who Should Attend?

This course is ideal for:

  • Freshers or QA professionals looking to upskill into AI QA.

  • Manual and automation testers who want to learn AI and LLM testing.

  • Professionals involved in testing AI, ML, or data-driven products.


Prerequisites

  • No prior experience in AI or ML is required.

  • Understanding of basic software testing concepts is an advantage.

  • All necessary AI/ML testing foundations will be taught from scratch during the course.





What will you learn?

Module 1: Foundations of AI/ML for QA

AI, ML, and Deep Learning distinctions • Supervised, Unsupervised, Reinforcement learning • Common algorithms & testing implications • Data lifecycle, model training/validation, overfitting/underfitting • MLOps basics from QA lens • Comparing traditional QA vs. AI QA mindset • Hands-on: build a simple classification model in Scikit-learn + data pre-processing in Pandas
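
To give a flavour of the Module 1 hands-on, here is a minimal sketch of a classification model built with scikit-learn and handled through Pandas. The built-in Iris dataset, the logistic-regression model, and the 80/20 split are illustrative choices, not course material.

    import pandas as pd
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Load a small built-in dataset as a DataFrame so it can be inspected with Pandas
    data = load_iris(as_frame=True)
    df = data.frame

    # A basic pre-processing check a QA engineer would script: no missing values
    assert df.isna().sum().sum() == 0, "unexpected missing values"

    X_train, X_test, y_train, y_test = train_test_split(
        df[data.feature_names], df["target"], test_size=0.2, random_state=42
    )

    model = LogisticRegression(max_iter=200)
    model.fit(X_train, y_train)

    print("Test accuracy:", accuracy_score(y_test, model.predict(X_test)))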

Module 2: Testing Data Pipelines and Data Quality

Testing ingestion, transformation, and output stages • Schema validation; nulls, outliers, duplicates • Record-level vs. aggregate validation • Testing joins, aggregations, transformations • Tools: Great Expectations, Pandera, SQL-based checks • Synthetic test data & augmentation • Hands-on: validate sample datasets using Great Expectations; write SQL-based quality checks
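
As a taste of the Module 2 hands-on, the sketch below validates a small DataFrame with Great Expectations. It uses the classic pandas-dataset style API (ge.from_pandas); newer Great Expectations releases replace this with a context/validator workflow, and the column names and limits here are made up for illustration.

    import great_expectations as ge
    import pandas as pd

    # Toy dataset with deliberate quality problems: a duplicate ID, a null, an outlier age
    df = pd.DataFrame({
        "user_id": [1, 2, 3, 3, None],
        "age": [25, 31, 210, 42, 38],
    })

    # Classic API: wrap the DataFrame so expectation methods become available
    ge_df = ge.from_pandas(df)

    print(ge_df.expect_column_values_to_not_be_null("user_id"))
    print(ge_df.expect_column_values_to_be_unique("user_id"))
    print(ge_df.expect_column_values_to_be_between("age", min_value=0, max_value=120))

The equivalent SQL-based check simply counts rows that violate the rule, e.g. SELECT COUNT(*) FROM users WHERE age < 0 OR age > 120 (table name assumed).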

Module 3: Evaluating and Testing ML Models

Key evaluation metrics: accuracy, precision, recall, F1, confusion matrix • Drift detection: data drift, concept drift • Bias, fairness, and ethical evaluation • Black-box vs. white-box testing for models • Hands-on: compute metrics manually in Python; analyze confusion matrix; perform a bias/fairness test on a sample model
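
The metric computations in the Module 3 hands-on reduce to counting the four confusion-matrix cells. The sketch below does this by hand on made-up labels (the lists y_true and y_pred are placeholders, not course data).

    # Made-up ground-truth labels and model predictions for ten samples
    y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # true positives
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # false positives
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # false negatives

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)

    print(f"Confusion matrix: [[{tn} {fp}] [{fn} {tp}]]")
    print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")

scikit-learn's classification_report produces the same numbers and is a useful cross-check on the manual calculation.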

Module 4: Testing LLM and RAG Applications

Challenges in LLM/RAG evaluation: hallucinations, grounding, relevance, factuality • What to test: prompts, outputs, grounding, consistency • Metrics: BLEU, ROUGE, BERTScore, faithfulness, toxicity, helpfulness • Local LLM setup with Ollama: installation, performance, prompt tuning • DeepEval framework: writing evaluators (StringMatchEvaluator, ContextualEval, ToxicityEval, etc.) • RAGAS evaluation: context precision/recall, faithfulness, answer correctness; integrating with LangChain/LlamaIndex • Hands-on: run a local LLM via Ollama, write DeepEval tests, and conceptually explore RAGAS on a sample RAG pipeline
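
A rough sketch of the Module 4 hands-on flow is shown below: query a locally served Ollama model over its REST API and wrap the answer in a DeepEval test case. The model name "llama3", the prompt, the 0.7 threshold, and the default port 11434 are assumptions; AnswerRelevancyMetric is an LLM-as-judge metric, so DeepEval must also be configured with a judge model (which may itself be a local model, depending on your DeepEval version).

    import requests
    from deepeval import assert_test
    from deepeval.metrics import AnswerRelevancyMetric
    from deepeval.test_case import LLMTestCase

    def ask_ollama(prompt: str, model: str = "llama3") -> str:
        # Ollama serves a local REST API; stream=False returns a single JSON payload
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]

    def test_answer_is_relevant():
        question = "What is retrieval-augmented generation?"
        test_case = LLMTestCase(input=question, actual_output=ask_ollama(question))
        # Judge-based metric: DeepEval scores relevance with the configured judge LLM
        assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])

A test file like this can be run with plain pytest or through DeepEval's own test runner.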

Module 5: Test Automation, MLOps, and Responsible AI

Automating data-driven tests (Python + PyTest/Robot + Pandas) • Testing AI/ML APIs (REST, GraphQL) with Postman and similar tools • CI/CD integration: GitHub Actions, Jenkins, model versioning (MLflow) • Cloud / MLOps: AWS SageMaker, GCP Vertex AI, Azure ML workflows • Monitoring & logging: CloudWatch, ELK, Prometheus, Grafana; drift detection in production • Responsible AI: explainability (SHAP, LIME), adversarial robustness, bias audits • Hands-on: build a CI pipeline to run model tests, explore SHAP/LIME explainability, and discuss a real-world bias detection case study
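
The CI portion of the Module 5 hands-on boils down to a test file that a GitHub Actions or Jenkins job runs with pytest on every commit. Below is a minimal sketch of such a quality gate; the breast-cancer toy dataset, the logistic-regression model, and the 0.90 thresholds are illustrative placeholders.

    # test_model_quality.py - quality-gate tests a CI job would execute with pytest
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, recall_score
    from sklearn.model_selection import train_test_split

    def train_model():
        X, y = load_breast_cancer(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=0
        )
        model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
        return model, X_test, y_test

    def test_accuracy_meets_release_threshold():
        model, X_test, y_test = train_model()
        assert accuracy_score(y_test, model.predict(X_test)) >= 0.90

    def test_recall_meets_release_threshold():
        # A recall gate matters when missed positives are the costly failure mode
        model, X_test, y_test = train_model()
        assert recall_score(y_test, model.predict(X_test)) >= 0.90

In the pipeline itself, the workflow step is then simply a pytest invocation, so the same gate runs locally and in CI.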