Hello! I'm Anuj, a passionate Data Scientist currently pursuing my Master's in Computer Science at the University of Texas at Arlington, graduating in May 2027. I enjoy creating things that live on the internet, whether that's building machine learning models, developing data-driven solutions, or solving complex analytical problems.
With 10+ completed projects under my belt, I've worked with a range of technologies and frameworks. My work in data science is driven by a commitment to building efficient, scalable, and user-friendly applications.
Developing a Physics-Informed Neural Network (PINN) pipeline to model soil water retention curves using particle-size distribution (PSD) and soil property data.
Designed machine-learning-powered automation systems that integrate OCR and large language models (LLMs) for structured data extraction.
Rebuilt a legacy application, reducing system errors by 20% and increasing data reliability.
Coordinated with multiple companies to conduct campus recruitment drives and managed the entire placement process.
Built a production-grade user retention and churn prediction model on 10K+ transaction-level records. The model is optimized for business impact (preserving customer lifetime value, or CLV) rather than raw accuracy, and improved retention targeting efficiency by 30%. The pipeline combines SQL-driven exploratory data analysis (EDA), 15+ RFM (recency, frequency, monetary) and behavioral features, cost-sensitive evaluation, and risk-based segmentation to minimize expected revenue loss.
Built a role-based AI expense management system using Flask and MongoDB, enabling secure bill uploads, automated data extraction, and HR approval workflows. Implemented JWT authentication and a scalable backend architecture, ready for production deployment on Railway. Used OpenAI Vision and LLMs to extract structured expense data from bill images, reducing manual processing by ~70%.
Built a graph-based Retrieval-Augmented Generation (Graph-RAG) system over arXiv metadata that uses Neo4j graph traversal instead of vector search to deliver more accurate, context-grounded answers. Designed a knowledge graph schema (papers, authors, topics) and FastAPI endpoints for graph retrieval, multi-hop reasoning, and LLM-driven response synthesis.
Built a sentiment classification model (~80% accuracy) using machine learning algorithms on text prepared through cleaning, tokenization, stop-word removal, and vectorization. Developed a web interface that serves real-time analysis through Flask API endpoints.
Built a multi-agent orchestration framework that enables autonomous communication between agents using the standardized Agent-to-Agent (A2A) protocol. Designed a centralized orchestrator for task routing, context sharing, and structured message passing between specialized agents.
Built a production-grade news intelligence system that continuously ingests articles, groups them into clusters, and detects trends using NLP and sentence embeddings. Incremental clustering, time-decayed topic modeling, and LLM-generated summaries provide real-time insights without retraining models. The architecture is deterministic and state-persistent, exposing REST APIs and an analytics dashboard.
Published in Metszet Journal. An in-depth analysis of AI technologies impacting the healthcare industry.
An exploration of DeepSeek OCR's novel approach to optical character recognition and image compression.
Exploring the concept of AI-to-AI communication and how autonomous agents are reshaping the future of technology.
A comprehensive guide to building and deploying AI agents with Google's Agent Starter Pack, which simplifies the process of creating intelligent autonomous systems.
I am currently looking for new opportunities to contribute to data science projects. Whether you have a question, want to collaborate, or just want to say hi, feel free to reach out or schedule a meeting!