Data Engineer · B.Tech AI
S. Abhijit Rao
Scalable data pipelines, ETL/ELT workflows, and distributed data systems using Python, PySpark, and SQL. Building production-grade data infrastructure on cloud platforms.
About Me
Data Engineer with hands-on experience designing and deploying scalable data pipelines, ETL/ELT workflows, and distributed data systems using Python, PySpark, and SQL. Proven ability to build production-grade data infrastructure integrating cloud platforms (AWS), databases, and automated ingestion frameworks. Delivered end-to-end data solutions on time across multiple organizations, including enterprise analytics platforms with PostgreSQL, API integrations, and workflow orchestration using Airflow. Strong foundation in data quality, performance optimization, and cross-functional stakeholder collaboration. B.Tech in AI with coursework in distributed computing, database systems, and cloud computing.
Experience
Professional Experience
A timeline of my roles and key contributions in AI and software engineering.
Jan 2026 – Present
Astraveda
AI Engineer (Freelance)
Architected and deployed production data infrastructure for an enterprise petroleum analytics platform, owning end-to-end database design, ETL pipelines, cloud operations, and data monitoring for live business users.
Key responsibilities
- Engineered scalable data pipelines using Python, FastAPI, and PostgreSQL with AWS services (Lambda, S3, Amplify) for automated data ingestion, transformation, and delivery across multiple data sources
- Built automated document processing pipeline ("Click Astra" OCR) extracting structured data from unstructured documents, implementing data validation, quality controls, and compliance checks with error handling workflows
- Developed multi-source data retrieval and aggregation system ("Ask Astra") enabling real-time analytics queries across enterprise datasets using LangChain and LangGraph orchestration
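The ingestion pattern behind pipelines like these can be sketched in a few lines. This is a minimal extract/transform/load illustration with local stand-ins for the real sources and sinks; the station names and field names are hypothetical, and the actual pipeline used FastAPI, S3, and PostgreSQL, none of which appear here.

```python
# Minimal ETL sketch: extract raw records, validate/type-cast them,
# and load the clean rows into a sink. All data here is made up.

def extract() -> list[dict]:
    # Stand-in for pulling raw records from an API or an S3 object.
    return [{"station": "A", "litres": "120.5"}, {"station": "B", "litres": "bad"}]

def transform(records: list[dict]) -> list[dict]:
    """Validate and type-cast records, dropping rows that fail checks."""
    clean = []
    for rec in records:
        try:
            clean.append({"station": rec["station"], "litres": float(rec["litres"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine and log the bad row
    return clean

def load(records: list[dict], sink: list) -> None:
    # Stand-in for an INSERT into PostgreSQL.
    sink.extend(records)

warehouse: list[dict] = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'station': 'A', 'litres': 120.5}]
```

The key design point is that validation happens in `transform`, so malformed rows never reach the sink.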
Nov 2025 – Dec 2025
Nuevosol Energy Pvt Ltd
AI Intern
Designed and built data ingestion and retrieval pipelines enabling enterprise-wide access to internal documentation through automated data extraction, transformation, and indexing workflows, delivering the solution on time to stakeholders.
Key responsibilities
- Implemented scalable data processing workflows using Python and SQL, integrating multiple data sources into a unified searchable knowledge base with vector database (ChromaDB) for semantic indexing
- Deployed end-to-end data solution with FastAPI backend and Next.js frontend on cloud infrastructure (Vercel, Render), ensuring reliable data delivery and real-time query capabilities
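The semantic-lookup idea behind a vector-indexed knowledge base can be shown with a toy example. The "embedding" below is just a bag-of-words count vector, not ChromaDB's embedding model, and the document names and contents are invented for illustration.

```python
# Toy semantic search: embed documents, then rank them by cosine similarity
# against the query embedding. Real systems use learned dense embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "hr_policy": "leave policy and holiday calendar for employees",
    "safety": "plant safety checklist for solar mounting structures",
}
index = {name: embed(text) for name, text in docs.items()}

def search(query: str) -> str:
    """Return the name of the best-matching document."""
    q = embed(query)
    return max(index, key=lambda name: cosine(q, index[name]))

print(search("employee holiday leave"))  # hr_policy
```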
Feb 2025 – June 2025
Prodigal AI
Agentic AI Intern
Developed automated ETL pipelines for Dhanur AI video processing platform, transforming raw media data into structured, production-ready outputs using Python and distributed processing techniques.
Key responsibilities
- Built data pipelines integrated with vector databases for intelligent data retrieval, content segmentation, and metadata extraction using ChromaDB, processing large-scale unstructured data
- Reduced manual data processing time through pipeline automation and cost-aware performance tuning, improving throughput and operational efficiency across the content production workflow
Achievements
- ★ Awarded Intern of the Month in April 2025 for exceptional performance and innovation in data pipeline development
Recognition
Achievements & Education
Work
Projects
A showcase of my work in AI/ML, data science, and software development
Multi-Agent Financial Chatbot System
Modular multi-agent system using open-source LLMs with Phidata framework for real-time financial data retrieval and analytics.
- Multi-agent architecture with specialized roles for finance and web research
- Low-latency responses using Groq inference APIs
- Real-time stock data retrieval and analytics capabilities
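The routing idea at the core of a multi-agent system can be sketched without any framework. The agent names and keyword rules below are hypothetical stand-ins, not the Phidata API: a coordinator inspects the query and delegates to a specialist.

```python
# Illustrative multi-agent dispatch: route each query to the first agent
# whose keywords match, falling back to the web-research agent.

def finance_agent(query: str) -> str:
    # In the real system this would call an LLM equipped with stock-data tools.
    return f"[finance] handling: {query}"

def web_research_agent(query: str) -> str:
    return f"[web] handling: {query}"

AGENTS = {
    "finance": (("stock", "price", "ticker", "earnings"), finance_agent),
    "web": ((), web_research_agent),  # no keywords: reached via fallback
}

def route(query: str) -> str:
    """Dispatch a query to the first agent whose keywords match."""
    lowered = query.lower()
    for keywords, agent in AGENTS.values():
        if any(k in lowered for k in keywords):
            return agent(query)
    return web_research_agent(query)

print(route("What is the stock price of NVDA?"))
```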
Natural Language → SQL Agent
Conversational agent using LangGraph to generate and execute SQL queries from plain English questions against a local database.
- Three-node graph architecture for query generation and execution
- Interactive Streamlit UI with SQL preview and results
- Syntactically correct SQL generation from natural language
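The generate → validate → execute flow of an NL-to-SQL agent can be illustrated with Python's built-in `sqlite3` and a canned query in place of the LLM call. The table and column names here are made up, and the graph nodes are plain functions rather than LangGraph constructs.

```python
# Minimal stand-in for an NL-to-SQL agent's three steps:
# generation (stubbed), validation (read-only guardrail), execution (SQLite).
import sqlite3

def generate_sql(question: str) -> str:
    # Placeholder for the LLM-backed generation node.
    return "SELECT name, salary FROM employees WHERE salary > 50000"

def validate_sql(sql: str) -> str:
    """Guardrail node: allow read-only queries only."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return sql

def execute_sql(sql: str, conn: sqlite3.Connection) -> list:
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Asha", 72000), ("Ravi", 43000)])

sql = validate_sql(generate_sql("Who earns more than 50k?"))
rows = execute_sql(sql, conn)
print(rows)  # [('Asha', 72000)]
```

Keeping validation as its own node means the executor never sees a statement that could mutate the database.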
NASA Turbofan Jet Engine RUL Prediction
Data pipeline and ML model to predict Remaining Useful Life (RUL) of turbofan engines using NASA's dataset.
- End-to-end data pipeline for time-series sensor data
- Feature engineering and data preprocessing workflows
- Random Forest model optimization for accurate RUL predictions
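The labeling step for this kind of run-to-failure dataset is simple enough to show directly: each engine unit's remaining useful life at a given cycle is its final recorded cycle minus the current cycle. This is a sketch of that step only; the field names are illustrative.

```python
# RUL labeling for run-to-failure sensor data: for each unit, RUL at a
# cycle is (unit's last cycle) - (current cycle).

def add_rul_labels(rows):
    """rows: list of (unit_id, cycle) -> list of (unit_id, cycle, rul)."""
    max_cycle = {}
    for unit, cycle in rows:
        max_cycle[unit] = max(max_cycle.get(unit, 0), cycle)
    return [(unit, cycle, max_cycle[unit] - cycle) for unit, cycle in rows]

sample = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(add_rul_labels(sample))
# [(1, 1, 2), (1, 2, 1), (1, 3, 0), (2, 1, 1), (2, 2, 0)]
```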
Skills
Technical Skills
Languages
Big Data & Processing
Databases & Storage
Cloud Platforms
Orchestration & Streaming
DevOps & Tools
Concepts & Expertise
BI & Visualization
Soft Skills
GitHub
Contribution activity from my GitHub profile
Contributions in the last year
Contact
Get In Touch
Interested in collaborating on AI projects or discussing opportunities? I’d love to hear from you.

