Data Engineer · B.Tech AI
S. Abhijit Rao
Scalable data pipelines, ETL/ELT workflows, and distributed data systems using Python, PySpark, and SQL. Building production-grade data infrastructure on cloud platforms.
About Me
Data Engineer with hands-on experience designing and deploying scalable data pipelines, ETL/ELT workflows, and distributed data systems using Python, PySpark, and SQL. Proven ability to build production-grade data infrastructure integrating cloud platforms (AWS), databases, and automated ingestion frameworks. Delivered end-to-end data solutions on time across multiple organizations, including enterprise analytics platforms with PostgreSQL, API integrations, and workflow orchestration using Airflow. Strong foundation in data quality, performance optimization, and cross-functional stakeholder collaboration. B.Tech in AI with coursework in distributed computing, database systems, and cloud computing.
Experience
Professional Experience
A timeline of my roles and key contributions in AI and software engineering.
Jan 2026 – Present
Astraveda
AI Engineer (Freelance)
Architected and deployed production data infrastructure for an enterprise petroleum analytics platform, owning end-to-end database design, ETL pipelines, cloud operations, and data monitoring for live business users.
Key responsibilities
- Engineered scalable data pipelines using Python, FastAPI, and PostgreSQL with AWS services (Lambda, S3, Amplify) for automated data ingestion, transformation, and delivery across multiple data sources
- Built automated document processing pipeline ("Click Astra" OCR) extracting structured data from unstructured documents, implementing data validation, quality controls, and compliance checks with error handling workflows
- Developed multi-source data retrieval and aggregation system ("Ask Astra") enabling real-time analytics queries across enterprise datasets using LangChain and LangGraph orchestration
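The ingestion pattern behind pipelines like these can be sketched in a few lines. This is a minimal extract/transform/load illustration with local stand-ins for the real sources and sinks; the station names and field names are hypothetical, and the actual pipeline used FastAPI, S3, and PostgreSQL, none of which appear here.

```python
# Minimal ETL sketch: extract raw records, validate/type-cast them,
# and load the clean rows into a sink. All data here is made up.

def extract() -> list[dict]:
    # Stand-in for pulling raw records from an API or an S3 object.
    return [{"station": "A", "litres": "120.5"}, {"station": "B", "litres": "bad"}]

def transform(records: list[dict]) -> list[dict]:
    """Validate and type-cast records, dropping rows that fail checks."""
    clean = []
    for rec in records:
        try:
            clean.append({"station": rec["station"], "litres": float(rec["litres"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine and log the bad row
    return clean

def load(records: list[dict], sink: list) -> None:
    # Stand-in for an INSERT into PostgreSQL.
    sink.extend(records)

warehouse: list[dict] = []
load(transform(extract()), warehouse)
print(warehouse)  # [{'station': 'A', 'litres': 120.5}]
```

The key design point is that validation happens in `transform`, so malformed rows never reach the sink.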
Nov 2025 – Dec 2025
Nuevosol Energy Pvt Ltd
AI Intern
Designed and built data ingestion and retrieval pipelines enabling enterprise-wide access to internal documentation through automated data extraction, transformation, and indexing workflows, delivering the solution on time to stakeholders.
Key responsibilities
- Implemented scalable data processing workflows using Python and SQL, integrating multiple data sources into a unified searchable knowledge base with vector database (ChromaDB) for semantic indexing
- Deployed end-to-end data solution with FastAPI backend and Next.js frontend on cloud infrastructure (Vercel, Render), ensuring reliable data delivery and real-time query capabilities
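The semantic-lookup idea behind a vector-indexed knowledge base can be shown with a toy example. The "embedding" below is just a bag-of-words count vector, not ChromaDB's embedding model, and the document names and contents are invented for illustration.

```python
# Toy semantic search: embed documents, then rank them by cosine similarity
# against the query embedding. Real systems use learned dense embeddings.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

docs = {
    "hr_policy": "leave policy and holiday calendar for employees",
    "safety": "plant safety checklist for solar mounting structures",
}
index = {name: embed(text) for name, text in docs.items()}

def search(query: str) -> str:
    """Return the name of the best-matching document."""
    q = embed(query)
    return max(index, key=lambda name: cosine(q, index[name]))

print(search("employee holiday leave"))  # hr_policy
```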
Feb 2025 – June 2025
Prodigal AI
Agentic AI Intern
Developed automated ETL pipelines for Dhanur AI video processing platform, transforming raw media data into structured, production-ready outputs using Python and distributed processing techniques.
Key responsibilities
- Built data pipelines integrated with vector databases for intelligent data retrieval, content segmentation, and metadata extraction using ChromaDB, processing large-scale unstructured data
- Reduced manual data processing time through pipeline automation and cost-aware performance tuning, improving throughput and operational efficiency across the content production workflow
Achievements
- ★ Awarded Intern of the Month in April 2025 for exceptional performance and innovation in data pipeline development
Recognition
Achievements & Education
Work
Projects
A showcase of my work in AI/ML, data science, and software development
Multi-Agent Financial Chatbot System
Modular multi-agent system using open-source LLMs with Phidata framework for real-time financial data retrieval and analytics.
- Multi-agent architecture with specialized roles for finance and web research
- Low-latency responses using Groq inference APIs
- Real-time stock data retrieval and analytics capabilities
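The routing idea at the core of a multi-agent system can be sketched without any framework. The agent names and keyword rules below are hypothetical stand-ins, not the Phidata API: a coordinator inspects the query and delegates to a specialist.

```python
# Illustrative multi-agent dispatch: route each query to the first agent
# whose keywords match, falling back to the web-research agent.

def finance_agent(query: str) -> str:
    # In the real system this would call an LLM equipped with stock-data tools.
    return f"[finance] handling: {query}"

def web_research_agent(query: str) -> str:
    return f"[web] handling: {query}"

AGENTS = {
    "finance": (("stock", "price", "ticker", "earnings"), finance_agent),
    "web": ((), web_research_agent),  # no keywords: reached via fallback
}

def route(query: str) -> str:
    """Dispatch a query to the first agent whose keywords match."""
    lowered = query.lower()
    for keywords, agent in AGENTS.values():
        if any(k in lowered for k in keywords):
            return agent(query)
    return web_research_agent(query)

print(route("What is the stock price of NVDA?"))
```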
Natural Language → SQL Agent
Conversational agent using LangGraph to generate and execute SQL queries from plain English questions against a local database.
- Three-node graph architecture for query generation and execution
- Interactive Streamlit UI with SQL preview and results
- Syntactically correct SQL generation from natural language
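The generate → validate → execute flow of an NL-to-SQL agent can be illustrated with Python's built-in `sqlite3` and a canned query in place of the LLM call. The table and column names here are made up, and the graph nodes are plain functions rather than LangGraph constructs.

```python
# Minimal stand-in for an NL-to-SQL agent's three steps:
# generation (stubbed), validation (read-only guardrail), execution (SQLite).
import sqlite3

def generate_sql(question: str) -> str:
    # Placeholder for the LLM-backed generation node.
    return "SELECT name, salary FROM employees WHERE salary > 50000"

def validate_sql(sql: str) -> str:
    """Guardrail node: allow read-only queries only."""
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are allowed")
    return sql

def execute_sql(sql: str, conn: sqlite3.Connection) -> list:
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)",
                 [("Asha", 72000), ("Ravi", 43000)])

sql = validate_sql(generate_sql("Who earns more than 50k?"))
rows = execute_sql(sql, conn)
print(rows)  # [('Asha', 72000)]
```

Keeping validation as its own node means the executor never sees a statement that could mutate the database.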
NASA Turbofan Jet Engine RUL Prediction
Data pipeline and ML model to predict Remaining Useful Life (RUL) of turbofan engines using NASA's dataset.
- End-to-end data pipeline for time-series sensor data
- Feature engineering and data preprocessing workflows
- Random Forest model optimization for accurate RUL predictions
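The labeling step for this kind of run-to-failure dataset is simple enough to show directly: each engine unit's remaining useful life at a given cycle is its final recorded cycle minus the current cycle. This is a sketch of that step only; the field names are illustrative.

```python
# RUL labeling for run-to-failure sensor data: for each unit, RUL at a
# cycle is (unit's last cycle) - (current cycle).

def add_rul_labels(rows):
    """rows: list of (unit_id, cycle) -> list of (unit_id, cycle, rul)."""
    max_cycle = {}
    for unit, cycle in rows:
        max_cycle[unit] = max(max_cycle.get(unit, 0), cycle)
    return [(unit, cycle, max_cycle[unit] - cycle) for unit, cycle in rows]

sample = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
print(add_rul_labels(sample))
# [(1, 1, 2), (1, 2, 1), (1, 3, 0), (2, 1, 1), (2, 2, 0)]
```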
Skills
Technical Skills
Languages
Big Data & Processing
Databases & Storage
Cloud Platforms
Orchestration & Streaming
DevOps & Tools
Concepts & Expertise
BI & Visualization
Soft Skills
GitHub
Contribution activity from my GitHub profile
Contributions in the last year
Contact
Get In Touch
Interested in collaborating on AI projects or discussing opportunities? I’d love to hear from you.

