Open to opportunities · Stamp 4 · No sponsorship needed

Swaraj Shaw

Machine Learning Engineer · NLP · Speech · LLM Systems

Dublin, Ireland · shaw.swaraj16@gmail.com · +353 89 984 9430

Machine Learning Engineer with 5+ years of experience delivering NLP, speech (TTS/ASR), and LLM-based systems across Big Tech and startups. Expert in text-to-speech modelling, embeddings, ONNX inference, Python/Rust ML infrastructure, evaluation design, and large-scale data quality automation. Strong track record of building AI platforms, offline inference engines, ML pipelines, and production-grade products. Currently building AutoHub Ireland, a full-stack Irish mobility platform, and taking an LLM Agentic AI course at DkIT while working at Meta.

ML / AI
NLP · LLMs · TTS / ASR · Embeddings · G2P · Text Normalisation · ONNX Runtime · Evaluation Design
Frameworks
PyTorch · TensorFlow · HuggingFace · Scikit-learn · spaCy · LangChain
Backend / Systems
Rust · Python · C++ · FastAPI · Next.js · Node.js · Celery · Redis
Cloud / Infra
AWS SageMaker · Lambda · Textract · GCP Vertex · BigQuery · Docker · Terraform
Data Engineering
PostgreSQL · SQL Server · MySQL · ETL · Data Lakes
Tools
Tauri · SearXNG · Prefect · Tableau · Power BI · Git · Linux
🏢
Text-to-Speech Linguist, Hindi Locale · Oct 2023 – Present
Meta (via Covalen) · Dublin, Ireland
  • SME for Hindi TTS: phoneset design, G2P mappings, TN rules, and full linguistic pipeline.
  • Built a 14k+ sentence evaluation suite across 11 linguistic categories; automated coverage analysis.
  • Corrected 14k+ phonetic errors in a 35k-word lexicon database using Python tooling.
  • Achieved 95.6% benchmark accuracy through iterative training and error analysis.
  • Major contributor to Ray-Ban Meta smart glasses Hindi TTS: evaluated 30k+ scripts, shortlisted 200+ voices.
  • Collaborated with engineering on training platforms (Bento, MLHub) and quality tooling.
📊
Business Intelligence & ML Engineer · Jan 2020 – Jan 2022
AspireNXT Pvt. Ltd. · Singapore (Remote/Hybrid)
  • Built conversational chatbots (Amazon Lex + Connect) across Slack, Messenger, and internal tools.
  • Delivered a recommendation engine via Amazon Personalize + SageMaker, increasing conversions by 15%.
  • Built anomaly detection pipelines for energy data; reduced operational wastage by 15%.
  • Automated BI dashboards (QuickSight, Tableau); improved data visibility and onboarding KPIs by 50%.
🧪
Research Data Scientist · May 2019 – Jan 2020
Zeo Minds IT Solutions · Hyderabad, India
  • Built credit-risk and churn models using SMOTE + XGBoost; improved accuracy by 15%.
  • Prototyped ASR-based booking and customer support workflows integrated with ML APIs.
📈
Data Analyst · Jun 2018 – Apr 2019
Digital Nest Pvt. Ltd. · Hyderabad, India
  • Built collaborative filtering recommendation models that improved sales by 23%.
  • Created CNN-based image classifiers achieving 95% accuracy for internal automation.
📱
Android Developer · Sep 2017 – Jun 2018
E Info Solutions Pvt. Ltd. · Kolkata, India
  • Led development of the Android app for Quest Mall; reached 5,000+ installs and increased footfall via location-aware features.
🚗
AutoHub Ireland
Full-Stack · Mobile · Web · AI
Active

Irish-localised vehicle intelligence platform: real-time fuel & EV price map, best-value routing, RSA driving test scraper, and an AI Master Mechanic assistant with RAG over vehicle manuals.

  • React Native mobile app (Expo) + Next.js web portal with Supabase auth
  • NestJS API + PostgreSQL monorepo with Docker + Render deployment
  • Playwright-based automated RSA wait-time scraper running on cron
  • Community features: verified owner badges, contributions, achievements
Next.js · React Native · NestJS · PostgreSQL · Supabase · Playwright · Docker
🧠
DataDrive.ai · Nexus Platform
AI Orchestration · MLOps · LLM Fine-tuning
Private

Enterprise AI orchestration monorepo: FastAPI gateway with prompt orchestration, Next.js 19 messenger UI, async Celery workers, QLoRA LLM fine-tuning, and Whisper ASR adaptation.

  • Knowledge Distillation: Automated GPT-4o teacher to local GGUF/student distillation pipeline.
  • Compliance ETL: PII-redaction-aware dataset exports (spaCy-powered) with audit trails.
  • Unified MLOps: Prefect-orchestrated nightly training, S3 model registry, and Slack-integrated failure reporting.
  • AI Lab: Full-stack evaluation suite for real-time model comparison using BLEU/WER metrics.
FastAPI · Next.js 19 · Prefect · Celery · Redis · spaCy · GGUF · Docker · Terraform
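
The WER metric named in the AI Lab bullet can be illustrated with a minimal sketch. It uses the standard definition (word-level edit distance divided by reference length) and is not taken from the Nexus codebase; the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # (all but the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)
```

Comparing two ASR models then reduces to averaging this score over the same reference transcripts for each model.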
📁
Sorted
Desktop App · Rust · ONNX · Tauri
Private

Local-first AI file intelligence desktop app. Rust daemon with GPU-accelerated ONNX inference (Metal/DirectML) semantically renames, deduplicates, and organises files — all fully offline.

  • MiniLM/BGE/Gemma embeddings for semantic rename & folder recommendations
  • SQLite (WAL) metadata + audit trail — every action undo-safe
  • Tauri UI with previews, batch actions, confidence threshold config
  • C++ ONNX Runtime inference engine with Metal + DirectML GPU backends
Rust · C++ · ONNX Runtime · Tauri · SQLite · Python
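
The undo-safe audit trail can be sketched as follows. This is an illustration in Python rather than the app's Rust daemon, and the `actions` table, its columns, and the function names are assumptions made for the example, not the real schema.

```python
import os
import sqlite3


def open_audit_db(path: str) -> sqlite3.Connection:
    """Open the metadata database in WAL mode and create the audit table."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    # Hypothetical schema: one row per file action, flagged once undone.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS actions ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " old_path TEXT NOT NULL,"
        " new_path TEXT NOT NULL,"
        " undone INTEGER NOT NULL DEFAULT 0)"
    )
    conn.commit()
    return conn


def rename_with_audit(conn: sqlite3.Connection, old_path: str, new_path: str) -> int:
    """Rename a file and record the action; returns the audit row id."""
    os.rename(old_path, new_path)
    cur = conn.execute(
        "INSERT INTO actions (old_path, new_path) VALUES (?, ?)",
        (old_path, new_path),
    )
    conn.commit()
    return cur.lastrowid


def undo(conn: sqlite3.Connection, action_id: int) -> bool:
    """Reverse a recorded rename; returns False if unknown or already undone."""
    row = conn.execute(
        "SELECT old_path, new_path FROM actions WHERE id = ? AND undone = 0",
        (action_id,),
    ).fetchone()
    if row is None:
        return False
    old_path, new_path = row
    os.rename(new_path, old_path)
    conn.execute("UPDATE actions SET undone = 1 WHERE id = ?", (action_id,))
    conn.commit()
    return True
```

Keeping the original path in every row is what makes each action reversible without rescanning the file system.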
🌐
P2P · Decentralised AI · Desktop
Open Source

Run Llama 3, Mistral, and Phi-3 on any laptop — no GPU, no cloud, no login. Uses libp2p DHT to split model layers across peers, like BitTorrent for LLM inference.

  • libp2p DHT peer discovery: layers distributed across the network
  • Encrypted prompts — no server, no IP logging, fully auditable
  • Supports any Hugging Face GGUF model ID out of the box
Electron · Next.js · Rust/WASM · libp2p · HuggingFace
💼
EasyApply
Hiring Intelligence · ML Pipelines · Automation
Private

Proprietary hiring intelligence platform and automation cockpit. Features custom scrapers, heuristic/ML normalisation of job details, and orchestration of recruitment signals.

  • MiniLM-powered JD field extraction with automated retraining loop (Corrections → JSONL → Embeddings → Model)
  • High-performance Rust-based HTML extractors and job feed parsers for native-speed ingestion
  • Modular Node.js + PostgreSQL backend with invite-only access, admin impersonation, and job-board localisation
  • Analytics dashboard for monitoring live recruitment automation
Node.js · PostgreSQL · FastAPI · MiniLM · Python · Rust
🐄
WireFree
Mobile · IoT · AI Behaviour Analysis
Private

Virtual fencing platform for cattle using AirTags and BLE devices — a cost-effective alternative to Nofence. Farmers draw boundaries on a mobile map; smart alerts fire when animals approach fence lines.

  • Cross-platform mobile app with real-time location tracking and alerts
  • AI behaviour analysis to distinguish grazing from fence-breaking events
  • Universal device support: AirTags, Tile, and custom collar integration
React Native · Node.js · BLE · Maps SDK · Python
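
The fence-line alert described above can be sketched with a point-in-polygon test plus a distance-to-edge check. This is a minimal illustration, assuming positions are already projected to planar metres; the function names and the `warn_distance` parameter are hypothetical and not taken from the WireFree code.

```python
import math

Point = tuple[float, float]


def inside_polygon(p: Point, polygon: list[Point]) -> bool:
    """Ray-casting test: count boundary crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from p to the line segment a-b."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of p onto the segment to the range [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def fence_status(p: Point, polygon: list[Point], warn_distance: float) -> str:
    """Return 'breach' outside the fence, 'warning' near an edge, else 'ok'."""
    if not inside_polygon(p, polygon):
        return "breach"
    n = len(polygon)
    nearest = min(
        distance_to_segment(p, polygon[i], polygon[(i + 1) % n]) for i in range(n)
    )
    return "warning" if nearest <= warn_distance else "ok"
```

For a 100 m square paddock and a 10 m warning band, an animal 5 m from an edge yields "warning" before it ever crosses the boundary.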
🎵
Record Genie
LLM Agents · ReAct · Tool Use
Academic

Agentic AI music assistant built for the DkIT LLM course. Implements both a ReAct loop and Workflow patterns with Pydantic tool-calling (album lookup, artist search, genre queries) over a SQLite music database.

  • ReAct agent: single prompt → tool call loop → LLM summary
  • Workflow mode: CSV batch processing, one enquiry at a time
  • LangSmith tracing + Groq inference backend
Python · OpenAI SDK · Pydantic · LangSmith · Groq · SQLite
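
The "single prompt → tool call loop → LLM summary" flow can be sketched without any model dependency. Everything here is an assumption for illustration: the model is replaced by a scripted stand-in (`scripted_llm`), the SQLite database by a small dictionary, and Pydantic validation is omitted.

```python
import json
from typing import Callable

# Hypothetical in-memory stand-in for the SQLite music database.
ALBUMS = {"Kind of Blue": {"artist": "Miles Davis", "year": 1959}}


def album_lookup(title: str) -> str:
    """Tool: return album details as JSON, or 'not found'."""
    album = ALBUMS.get(title)
    return json.dumps(album) if album else "not found"


TOOLS: dict[str, Callable[[str], str]] = {"album_lookup": album_lookup}


def react_loop(question: str, llm: Callable[[str], dict], max_steps: int = 5) -> str:
    """Alternate model steps and tool calls until the model gives a final answer."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        step = llm("\n".join(transcript))
        if "final" in step:
            return step["final"]
        tool = TOOLS.get(step["tool"])
        observation = tool(step["input"]) if tool else f"unknown tool {step['tool']}"
        # Feed the action and its result back so the next step can use them.
        transcript.append(f"Action: {step['tool']}[{step['input']}]")
        transcript.append(f"Observation: {observation}")
    return "No answer within the step limit."


def scripted_llm(prompt: str) -> dict:
    """Stand-in for a real model: call one tool, then summarise."""
    if "Observation:" not in prompt:
        return {"tool": "album_lookup", "input": "Kind of Blue"}
    return {"final": "Kind of Blue is a 1959 album by Miles Davis."}
```

Calling `react_loop("Who recorded Kind of Blue?", scripted_llm)` runs one tool call and then returns the scripted summary; a real backend would replace `scripted_llm` with a model call that returns the same shape.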
🎓
DkIT LLM Agentic AI Course
Production AI Engineering · 2026
Education

Year-long production AI engineering curriculum at Dundalk Institute of Technology. Building enterprise-grade systems with RAG, agents, evals in CI, security controls, observability, and cost controls.

  • Weekly labs: LangChain, vector DBs, agent workflows, prompt engineering
  • Infra: Docker, Terraform, Kubernetes deployment pipelines
  • Capstone: shippable, evaluated AI system with full observability
Python · LangChain · Docker · Terraform · Kubernetes
🏠
LocalSpace
Web · Mobile · Accommodation Portal
Private

Full-stack accommodation management portal with web and mobile clients. Designed to simplify rental management for landlords and tenants, with real-time alerts and document management.

React · Node.js · React Native
🩺
AcneDetection
Computer Vision · CNN · Web Deployment
Academic

CNN-based skin condition classifier comparing ResNet, AlexNet, and VGG16 architectures under identical operational conditions. Best model deployed as a web application for real-time inference.

  • ResNet, AlexNet, VGG16 evaluated with consistent train/val splits
  • Flask web app deployment for live image classification
Python · PyTorch · ResNet · VGG16 · Flask
📄
TextExtraction
OCR · Document Intelligence · AWS
Private

Automated document intelligence pipeline using AWS Textract + PyTorch for invoice and document parsing. Reduced manual document processing workload by 25% in production.

  • AWS Textract for OCR with PyTorch post-processing for field extraction
  • Structured output pipeline for invoices, forms, and receipts
Python · AWS Textract · PyTorch
🎬
Hybrid Movie Recommender
Hybrid ML · NLP · Embeddings
Private

Hybrid collaborative and content-based recommender system. Combines MovieLens interaction data with NLP-based content analysis so that each movie suggestion draws on both signals.

  • Hybrid engine: blends collaborative filtering with TF-IDF content similarity
  • Advanced NLP: uses textual metadata and tags for semantic item matching
  • Extensive evaluation: tested on ml-latest dataset with cold-start mitigation
Python · Jupyter · Scikit-learn · Pandas · NLP
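
The blend of collaborative filtering with TF-IDF content similarity can be sketched in plain Python. This is an illustration under assumptions, not the project's notebook code: the weighting parameter `alpha`, the smoothed IDF formula, and all function names are chosen for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build sparse TF-IDF vectors from whitespace-tokenised item descriptions."""
    tokenised = {key: text.lower().split() for key, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(t for tokens in tokenised.values() for t in set(tokens))
    vectors = {}
    for key, tokens in tokenised.items():
        term_freq = Counter(tokens)
        vectors[key] = {
            # Term frequency times a smoothed inverse document frequency.
            t: (count / len(tokens)) * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, count in term_freq.items()
        }
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_score(cf_score: float, content_score: float, alpha: float = 0.5) -> float:
    """Weighted blend: alpha on collaborative filtering, the rest on content."""
    return alpha * cf_score + (1 - alpha) * content_score
```

One common way to handle cold-start items with this shape is to lower `alpha` towards 0 when an item has few or no interactions, so the content signal carries the score.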
🎓
MSc Artificial Intelligence — 1.1 Honours
Dublin Business School · Dublin, Ireland
2023
🏛️
B.Tech Computer Science & Engineering
Amity University · India
2019

Let's build something together

Open to ML engineering roles, consulting, and research collaborations.

✅ Stamp 4 · Ireland · No sponsorship required