Anh Hoang Chu

Software Engineer & Data Engineer

Summary

I'm a Data Architect who is passionate about working with data and bringing data insights closer to business users through technology. I have experience in data engineering, big data, data science, data warehousing, and back-end databases for web applications on GCP, Azure, and AWS. My tech stack is Python, SQL, Linux, PySpark, Kafka, Airflow, Tableau, Kubernetes, BigQuery, Redshift, and Azure Synapse Analytics.

Experience

Databricks

08/2024 - Present

Specialist Solutions Architect

Deliver technical leadership to enterprise clients on architecting and implementing data modernization solutions, specializing in Delta Lake, big data platforms, Apache Spark, SQL optimization, and advanced data engineering practices.

  • Lead technical engagements with strategic enterprise customers, providing in-depth guidance on architecting and implementing large-scale data modernization initiatives.
  • Advise on best practices for designing robust data lakehouse solutions using Delta Lake, Apache Spark, and Databricks.
  • Drive optimization of big data workloads, including Spark and SQL performance tuning, data pipeline development, and scalable data engineering architectures.
  • Collaborate with cross-functional teams to deliver end-to-end solutions that accelerate data-driven business outcomes and ensure successful adoption of modern data platforms.

Databricks

03/2023 - Present

Sr Specialist Solutions Engineer

Provide technical guidance to strategic customers on designing and implementing enterprise data modernization projects using Delta Lake, big data, Spark and SQL optimization, and data engineering.

Microsoft

02/2022 - 03/2023

Software Engineer

Software Engineer building, configuring, and managing back-end infrastructure for a video-powered social-learning platform owned by Microsoft

  • Led the data warehouse migration from AWS Redshift to Synapse Data Lakehouse (DLH), from architecture design to production operation
  • Built and maintained batch and streaming pipelines from transactional databases and telemetry data to Data Lakehouse
  • Provided a fast, stable, and consistent data platform on Azure Cloud for analytics downstream
  • Performed data transformation and analytics with Python and Azure Synapse Spark, Change Data Capture with Debezium, and streaming with Kafka and Azure Event Hubs
  • Actively resolved performance issues and applied data loading and table design optimizations, resulting in 4-5x faster queries
  • Ensured data quality and data security through data validation, data management, and monitoring best practices
  • Built and maintained a more reliable and consistent downstream sync from DLH to CRM system using REST API
  • Ensured a highly available and performant application by maintaining a range of Azure cloud services, including storage, CI/CD, database, data warehouse, and Kubernetes

Walmart Global Tech

01/2020 - 02/2022

Software Engineer II

Software Engineer building an end-to-end analytical Supply Chain web application to track inventory and transportation from Suppliers to Stores for international markets

  • Led a team of 4 developers in migrating an on-prem Data Warehouse (Teradata) to Google Cloud Platform for 10 markets using BigQuery, Dataproc, Python, PySpark, and Airflow
  • Continuously delivered new data features by analyzing and calculating supply chain metrics with SQL and Spark
  • Built and maintained ETL data pipelines that load analytical datasets into MSSQL Server from multiple data sources in Teradata, BigQuery, Oracle Database, and Informix Database
  • Performed data validation and unit testing to ensure data quality
  • Improved application performance by 70% by implementing caching, indexing, and data aggregation in the database instead of in the back-end web service, which reduced the volume of data flowing through the network
  • Reduced development time and codebase complexity by 80% with code refactoring, SQL reformatting, Git, and a CI/CD pipeline

Education

Harrisburg University of Science & Technology

Master's, Computer Science

University of Texas at Dallas

Master's, Supply Chain Management