Data Engineer · B.Tech AI

S. Abhijit Rao

Scalable data pipelines, ETL/ELT workflows, and distributed data systems using Python, PySpark, and SQL. Building production-grade data infrastructure on cloud platforms.

About Me

Data Engineer with hands-on experience designing and deploying scalable data pipelines, ETL/ELT workflows, and distributed data systems using Python, PySpark, and SQL. Proven ability to build production-grade data infrastructure integrating cloud platforms (AWS), databases, and automated ingestion frameworks. Delivered end-to-end data solutions on time across multiple organizations, including enterprise analytics platforms with PostgreSQL, API integrations, and workflow orchestration using Airflow. Strong foundation in data quality, performance optimization, and cross-functional stakeholder collaboration. B.Tech in AI with coursework in distributed computing, database systems, and cloud computing.

India · B.Tech AI, Mahindra University

Experience

Professional Experience

A timeline of my roles and key contributions in AI and software engineering.

Jan 2026 – Present

Astraveda

AI Engineer (Freelance)

Remote

Architected and deployed production data infrastructure for an enterprise petroleum analytics platform, owning end-to-end database design, ETL pipelines, cloud operations, and data monitoring for a system serving live business users.

Key responsibilities
  • Engineered scalable data pipelines using Python, FastAPI, and PostgreSQL with AWS services (Lambda, S3, Amplify) for automated data ingestion, transformation, and delivery across multiple data sources
  • Built an automated document processing pipeline ("Click Astra" OCR) that extracts structured data from unstructured documents, with data validation, quality controls, compliance checks, and error-handling workflows
  • Developed a multi-source data retrieval and aggregation system ("Ask Astra") enabling real-time analytics queries across enterprise datasets using LangChain and LangGraph orchestration
Python
FastAPI
PostgreSQL
AWS Lambda
AWS S3
AWS Amplify
LangChain
LangGraph
Supabase
Next.js

Nov 2025 – Dec 2025

Nuevosol Energy Pvt Ltd

AI Intern

Head office, Madhapur, Hyderabad, India

Designed and built data ingestion and retrieval pipelines enabling enterprise-wide access to internal documentation through automated data extraction, transformation, and indexing workflows, delivering the solution on time to stakeholders.

Key responsibilities
  • Implemented scalable data processing workflows using Python and SQL, integrating multiple data sources into a unified searchable knowledge base backed by a vector database (ChromaDB) for semantic indexing
  • Deployed the end-to-end data solution with a FastAPI backend and Next.js frontend on cloud infrastructure (Vercel, Render), ensuring reliable data delivery and real-time query capabilities
Python
SQL
ChromaDB
FastAPI
Next.js
Vercel
Render

Feb 2025 – June 2025

Prodigal AI

Agentic AI Intern

Remote, India

Developed automated ETL pipelines for the Dhanur AI video processing platform, transforming raw media data into structured, production-ready outputs using Python and distributed processing techniques.

Key responsibilities
  • Built data pipelines integrated with vector databases for intelligent data retrieval, content segmentation, and metadata extraction using ChromaDB, processing large-scale unstructured data
  • Reduced manual data processing time through pipeline automation and cost-focused performance tuning, improving throughput and operational efficiency across the content production workflow
Achievements
  • Awarded Intern of the Month in April 2025 for exceptional performance and innovation in data pipeline development
Python
ChromaDB
ETL Pipelines
Vector Databases
FastAPI
Data Processing

Work

Projects

A showcase of my work in AI/ML, data science, and software development

Multi-Agent Financial Chatbot System

Modular multi-agent system using open-source LLMs with Phidata framework for real-time financial data retrieval and analytics.

Python
LangChain
LangGraph
Phidata
LLaMA 3.1
  • Multi-agent architecture with specialized roles for finance and web research
  • High-performance response times using Groq inference APIs
  • Real-time stock data retrieval and analytics capabilities

Natural Language → SQL Agent

Conversational agent using LangGraph to generate and execute SQL queries from plain English questions against a local database.

Python
LangGraph
SQLite
Streamlit
SQL
  • 3-node graph architecture for query generation and execution
  • Interactive Streamlit UI with SQL preview and results
  • Syntactically correct SQL generation from natural language

NASA Turbofan Jet Engine RUL Prediction

Data pipeline and ML model to predict Remaining Useful Life (RUL) of turbofan engines using NASA's dataset.

Python
Scikit-Learn
Pandas
NumPy
Random Forest
  • End-to-end data pipeline for time-series sensor data
  • Feature engineering and data preprocessing workflows
  • Random Forest model optimization for accurate RUL predictions

Skills

Technical Skills

Languages

Python
PySpark
SQL
Java
C++
Node.js

Big Data & Processing

Apache Spark (PySpark)
Distributed Computing
ETL/ELT Pipelines
Pandas
NumPy
Data Preprocessing

Databases & Storage

PostgreSQL
MySQL
SQLite
ChromaDB
AWS S3
AWS Redshift
Data Warehousing/Data Mart (Star/Snowflake Schema)

Cloud Platforms

AWS (S3, EMR, Glue, Lambda, Redshift, Amplify)
Supabase
GCP (BigQuery)

Orchestration & Streaming

Apache Airflow
Apache Kafka
Prefect

DevOps & Tools

CI/CD Pipelines
Git
Docker
FastAPI
Streamlit
REST APIs

Concepts & Expertise

Data Pipeline Architecture
Data Quality & Governance
Data Monitoring & Compliance
Performance Optimization
Cost-Effectiveness Analysis
API/System Integration
Stakeholder Management
Agile/Scrum
SDLC

BI & Visualization

Power BI
Tableau
Matplotlib
Seaborn

Soft Skills

Collaborative Teamwork
Stakeholder Communication
Adaptability
Problem Solving

GitHub

Contribution activity from my GitHub profile

Contributions in the last year

GitHub contribution heatmap for Abhijit7979 (last year)
Data from public contributions · updates automatically

Contact

Get In Touch

Interested in collaborating on AI projects or discussing opportunities? I’d love to hear from you.