Hi, I'm Abhijeet Singh.
A self-driven, quick-starting, passionate Data Engineer & ML Developer with a curious mind who enjoys solving complex and challenging real-world problems.
About
Experienced Data Engineer and ML Developer with 7+ years of research and industry experience in designing, building, and scaling end-to-end data and machine learning solutions. Proficient in developing and deploying robust ELT/ETL pipelines, scalable data models, and cloud-based data warehouses using platforms like GCP and Azure. Skilled in architecting data workflows for batch and streaming use cases, leveraging tools such as BigQuery, Dataflow, Cloud Composer, Data Factory, and PySpark to process millions of records daily. Specialized in building predictive models and classification systems using NLP, LLMs, and statistical methods. Experienced with end-to-end ML lifecycle management, including feature engineering, model training, evaluation, deployment, monitoring, and retraining, using tools like Vertex AI, Azure ML, MLflow, and TensorFlow. Strong advocate of MLOps best practices for production-grade AI systems, with hands-on experience in version control, CI/CD automation, model governance, and infrastructure provisioning using Terraform and Git. Passionate about transforming raw data into actionable insights and intelligent products. Adept at cross-functional collaboration, bridging the gap between engineering, data science, and business to deliver high-impact, scalable solutions.
- Languages: Python, SQL, C++, JavaScript, MATLAB, PHP, HTML/CSS
- Tools: GCP, Azure, Linux, Bash, Git, Terraform, Docker, Shell, ROS, Cognos, Apache Airflow
- NLP: NLTK, spaCy, Gensim, Hugging Face, Stanza, LangChain, DeepSpeed, PaddlePaddle
- Databases: MySQL, Microsoft SQL Server, SQLite, PostgreSQL
- Libraries: TensorFlow, PyTorch, Keras, scikit-learn, OpenCV, YOLO, Pandas, NumPy
- Soft Skills: Teamwork, Leadership, Communication, Work Ethic, Time Management, Creativity
I am seeking a challenging position that will enable me to combine my skills in Data Engineering and ML Development, while fostering professional growth, engaging experiences, and personal development.
Experience
- Led design and automation of 10+ data pipelines on Azure ingesting millions of records daily from SAP and legacy systems into Snowflake for market segmentation. Implemented CI/CD with Azure DevOps for streamlined deployments, and used ADF, Azure Functions, and ELT best practices to deliver scalable, cost-efficient solutions (40% faster queries & 99.9% uptime)
- Designed scalable, optimized data models and built robust backend APIs to manage millions of sales-representative records. Orchestrated end-to-end data pipelines on Azure using Data Factory, Azure SQL, and Azure Functions, delivering high-performance, maintainable solutions aligned with enterprise data architecture standards
- Built and orchestrated telecom order processing pipelines on Azure using Data Factory, Event Hubs, Azure Databricks, and Synapse. Engineered scalable ETL workflows to support machine learning-driven customer segmentation, enabling efficient data ingestion, transformation, and model scoring with 88%+ accuracy. Ensured seamless integration across services with CI/CD and monitoring
- Designed and automated end-to-end data pipelines on GCP using Cloud Composer (Airflow), PySpark on Dataproc, and BigQuery to process 200K+ timesheet records daily. Optimized data reconciliation workflows, reducing errors by 30% and improving performance by 40%, while ensuring scalability, reliability, and cost-effective processing
- Designed and deployed a streaming fraud detection pipeline on GCP using Pub/Sub, Dataflow, BigQuery, and Vertex AI. Enabled low-latency model inference with neural networks, achieving a 92% F1 score and $1M+ annual fraud savings. Built with CI/CD, monitoring, and data quality checks for scalability and reliability
- Built and orchestrated 9 batch prediction pipelines on GCP for customer churn using Cloud Composer, Pub/Sub, BigQuery, and Vertex AI. Automated data ingestion, transformation, and model deployment for scalable, repeatable batch scoring
- Built an automated machine learning pipeline for finance revenue forecasting on Alteryx, reducing forecast error (MAPE) by 20% and manual effort by 40%
- Gained a deep understanding of business value and the agile delivery model through close collaboration with clients and the business team
- Developed an NLP-driven binary classification model using TF-IDF and Word2Vec to identify duplicate Quora questions, achieving 84% precision and 90% recall
- Designed and implemented a sentiment analysis pipeline for online reviews using spaCy-based feature engineering and cross-validation, achieving a 90.08% F1 score with an SVM model
- Developed a robust malware detection model using advanced feature engineering on ASM and byte files, achieving a multiclass log loss of 0.01 with XGBoost
- Developed a pickup density forecasting model using clustering for regional segmentation and Fourier features for temporal patterns, achieving 10.16% MAPE with XGBoost
- Handled data cleaning, missing value treatment, and preparation for analysis with full process documentation to support logistics insights
- Created visualizations using Python and Tableau to present user stories and drive data-informed decisions in logistics operations