
Ansar H

@ansarhayat515

Data Engineering, Azure,AWS, Databricks, Lakehouse , Spark, Fabric

Pakistan
English
About me
I am a Databricks Certified Professional and Microsoft Certified Data Engineer with 9 years of real-world experience building scalable, high-performance data platforms. I help businesses design and implement end-to-end data solutions, including ETL/ELT pipelines, data warehousing, lakehouse architectures, and cloud-based analytics. My focus is on delivering clean, optimized, production-ready data systems that enable better decision-making.

Core skills: Azure, Databricks, AWS, EMR, Airflow, Fabric, Data Lake Storage, Delta Lake, Azure Data Factory, PostgreSQL, MySQL, SQL, MongoDB, Power BI.

Skills

Average response time: 7 hours

My services

Data ETL
I will build and optimize scalable Databricks Delta Lake pipelines
Data ETL
I will build and optimize scalable Microsoft Fabric pipelines and OneLake architecture

Portfolio

Professional experience

Senior Data Engineer

DIS • Full-time

Sep 2023 - Jan 2026 · 2 yrs 4 mos

Responsible for:
- Led Databricks platform onboarding, including a cost analysis comparing AWS Glue and Databricks compute.
- Designed and managed end-to-end data pipelines using Databricks and AWS Glue.
- Used Apache Spark for batch and stream processing, ensuring data quality and consistency.
- Integrated data from PostgreSQL and Amazon S3.
- Performed ETL/ELT operations and maintained Delta Lake-based data lakes.
- Developed and deployed a DIS Analytics alerting process leveraging Amazon S3 event triggers, AWS Lambda, and Amazon SNS to automatically detect and notify on duplicate records and missing source files during data ingestion in Databricks.
- Built a conversational layer using Databricks Genie AI.
- Exposed data via the Databricks SQL API and Databricks Delta Sharing with secure access controls (RBAC, authentication).
- Used Spark OCR to extract data from financial invoice PDFs stored in S3 and loaded the results into Delta tables.
- Implemented data governance practices with Unity Catalog.
- Optimized Spark job performance, reducing costs by 50%.
- Monitored and tuned clusters for performance and reliability.
- Integrated Databricks with Power BI for reporting and dashboards.
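The S3-trigger alerting flow described above can be sketched in plain Python. This is an illustrative sketch, not the production code: the expected-file manifest, field names, and the in-memory duplicate registry are all assumptions, and the actual notification would go out via boto3's `sns.publish`.

```python
# Hypothetical sketch of an S3-event-driven ingestion alerter.
# EXPECTED_FILES and the in-memory _seen_keys set are illustrative
# assumptions; in production the alerts would be published to SNS
# with boto3 and the seen-key registry would live in durable storage.

EXPECTED_FILES = {"daily_orders.csv", "daily_customers.csv"}  # assumed manifest
_seen_keys = set()  # keys already ingested in this run

def build_alerts(s3_event: dict) -> list:
    """Inspect S3 ObjectCreated records; return alert payloads for SNS."""
    alerts = []
    arrived = set()
    for record in s3_event.get("Records", []):
        key = record["s3"]["object"]["key"]
        arrived.add(key.rsplit("/", 1)[-1])
        if key in _seen_keys:
            # same object delivered twice -> duplicate ingestion
            alerts.append({"type": "duplicate_record", "key": key})
        _seen_keys.add(key)
    for missing in sorted(EXPECTED_FILES - arrived):
        # an expected source file never landed in this batch
        alerts.append({"type": "missing_source_file", "file": missing})
    return alerts
```

Keeping the detection logic separate from the SNS publish call makes it unit-testable without AWS credentials.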

Senior Data Consultant

Systems Limited (Regeneron) • Full-time

Jul 2019 - Aug 2023 · 4 yrs 1 mo

Responsible for:
- Delivered big data solutions to various clients using AWS data platform services.
- Ingested data from SharePoint, SFTP, and cloud storage into the data lake using Apache NiFi and PySpark.
- Performed data transformation using PySpark; deployed scripts via Jenkins and scheduled DAGs in Airflow, running on EMR clusters.
- Collaborated with the Databricks team to implement solutions using AWS Databricks E2.
- Provisioned S3 buckets for audit logs and Terraform state files using Terraform scripts.
- Built CI/CD pipelines integrated with Bitbucket.
- Created and managed Databricks workspaces and policies.
- Configured Unravel with Databricks for Hive and S3 access control.
- Worked with Informatica Cloud to move data into Amazon Redshift.
- Developed Kafka producer/consumer applications on clusters managed with ZooKeeper.
- Leveraged Kafka APIs and connectors (MySQL, PostgreSQL, MongoDB) for smooth message processing.
- Used Alteryx to build ETL pipelines; created functional diagrams and documented data flows in Confluence.
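The kind of transformation the PySpark jobs above apply can be sketched in plain Python. This mirrors what `df.dropDuplicates(["id"])` plus `withColumn` casts would do in Spark; the field names (`id`, `ingested_at`, `amount`) are illustrative assumptions, not the actual schema.

```python
from datetime import datetime

# Plain-Python sketch of a typical PySpark cleaning step: deduplicate
# on a key and normalise types. In Spark the equivalent would be
# df.dropDuplicates(["id"]) plus withColumn date/float casts.
# Field names are assumptions for illustration.

def transform(rows: list) -> list:
    """Keep the first row per id; parse the timestamp and cast amount."""
    seen, out = set(), []
    for row in rows:
        if row["id"] in seen:
            continue  # keep first occurrence, as dropDuplicates would
        seen.add(row["id"])
        out.append({
            "id": row["id"],
            # parse the raw string into a date for downstream partitioning
            "ingested_at": datetime.strptime(row["ingested_at"], "%Y-%m-%d").date(),
            "amount": float(row["amount"]),
        })
    return out
```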


Data Engineer

The ENTERTAINER • Full-time

Jul 2019 - Oct 2021 · 2 yrs 3 mos

Responsible for:
- Worked with structured, semi-structured, and unstructured data.
- Designed and implemented a scalable big data architecture using Databricks, Data Lake, and Delta Lake, incorporating automated data quality and governance controls.
- Delivered end-to-end data engineering solutions leveraging Databricks, Delta Lake, and Azure Synapse Analytics to support enterprise analytics needs.
- Developed high-performance batch data pipelines using PySpark and Spark SQL to ingest and transform data from MongoDB into the enterprise data lake.
- Built, scheduled, and orchestrated ETL/ELT pipelines in Azure Data Factory for Azure Synapse Analytics environments.
- Designed and implemented real-time streaming data pipelines using Azure Event Hubs, enabling near real-time ingestion into Delta Lake.
- Created executive and operational dashboards using Tableau Desktop and Tableau Server, supporting automated and scheduled reporting.
- Utilized advanced MongoDB aggregation and indexing techniques to support ad hoc and analytical reporting use cases.
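The MongoDB aggregation work mentioned above is typically expressed as a list of pipeline stages passed to PyMongo's `collection.aggregate()`. The sketch below is illustrative only: the collection and field names (`purchases`, `merchant_id`, `amount`, `status`) are assumptions, not the actual schema.

```python
# Illustrative MongoDB aggregation pipeline in the form PyMongo's
# collection.aggregate() accepts. Collection and field names are
# assumptions for the sake of example.

daily_revenue_pipeline = [
    # filter first, so a matching index on status can be used
    {"$match": {"status": "completed"}},
    # one bucket per merchant per day
    {"$group": {
        "_id": {"merchant": "$merchant_id", "day": "$date"},
        "revenue": {"$sum": "$amount"},
        "orders": {"$sum": 1},
    }},
    # largest revenue buckets first
    {"$sort": {"revenue": -1}},
]

# In a real deployment this would run as
#   db.purchases.aggregate(daily_revenue_pipeline)
# ideally backed by a compound index covering the $match/$group fields.
```

Putting `$match` ahead of `$group` is the usual ordering, since it lets MongoDB prune documents with an index before the expensive grouping stage.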