Data Analyst



Product Designer



Storyteller

About Me

I am an experienced data analyst specializing in product analytics, machine learning, and visual communication.


Over the past three years, I have collaborated with multiple teams from diverse backgrounds on a wide range of projects in a fast-paced, enthusiastic environment. At Uber, I worked closely with product management, business development, policy, and data science teams, as well as municipal staff across North America and Europe. My projects have ranged from market size prediction to dashboard implementation, and from assisting with data migration to creating internal team websites.


I bring curiosity, passion, and a collaborative spirit to innovative fields, and I apply my skills to optimizing business decisions and creating successful products.


Professional Experience


Uber, New York, NY
Data Analyst, Tech
Apr 2019 - Jun 2020


Data Analytics
- Established and analyzed key metrics using statistical methods in Python, R, and QGIS, providing guidance and support to Uber’s central policy, product, and local operations teams in expanding JUMP (Uber’s e-bike and e-scooter share brand) globally
- Optimized the BOOST (JUMP’s low-income plan) pricing system with census data by applying a quantitative approach to estimate the market size, which is expected to increase revenue by 20%. The plan was adopted by the Uber Central Policy team after being presented to a group of senior data scientists and operations directors
- Improved operational efficiency by 60% by developing complex SQL queries and dashboards leveraging 1 million+ records from Uber databases in Hive, Presto, and Vertica. The dashboards are used by 5 operations teams to adjust business strategies in a timely manner

Predictive Modeling
- Conducted spatial distribution and time series statistical research in Python on JUMP vehicle loss in North America, developed a Random Forest model to predict vehicle loss risk, and performed k-fold cross-validation to increase sensitivity. The final model achieved a recall of 98% and was adopted by the local operations team as an operational guideline
- Designed and implemented a prediction and visualization framework in Python based on the k-means clustering algorithm, which uses JUMP datasets to auto-generate maps and graphs for market entry permits and tender submissions

Product Analytics
- Proposed the prototype of the suggested parking zone feature, maintained data integrity, supported the feature launch in the Uber App with product engineers, and performed A/B testing. The feature, derived from the Douglas-Peucker algorithm, lifted the parking compliance rate by 18% and helped Uber avoid 36k+ parking fines per city per year
- Created an internal work-file sharing website using JavaScript, CSS, and Google Sites to promote collaboration among data science, business development, policy, and market-entry teams, as well as municipal staff in 33 cities across North America and Europe


Herrera Environmental Consultants, Seattle, WA
Data Analyst Intern
Jan 2019 - Apr 2019


- Created interactive story maps using drone photos and online open data, helping the company win clients and projects worth $300,000, including King County, the City of Redmond, and the City of Federal Way
- Improved fieldwork data collection efficiency by 30% by designing customized web applications using ArcGIS Online
- Corrected abnormal values, checked data integrity, and maintained and updated 60 GB+ of spatial data using ArcGIS, NumPy, the ArcPy module, and R
- Improved the efficiency of data research and report writing by 50% by developing a pipeline to automate outlier detection


The Wharton School, University of Pennsylvania, Philadelphia, PA
Data Analyst in Visualization (part-time)
Oct 2017 - Jan 2019


- Improved the efficiency of cartography and report writing by 80% by developing R scripts to automate the cartography process
- Produced choropleth maps covering 3,040 counties and infographics using R (sf, ggplot2, etc.), QGIS, GeoDa, and Adobe Suite
- Maintained and updated a PostgreSQL database with data from open data sources including ACS, USGS, and TIGER
- Co-researched the emerging private residential flood insurance market and the structure of the residential flood insurance market in the US with Dr. Carolyn Kousky, and produced visual works for over 5 publications in economic and policy research


University of Pennsylvania, Philadelphia, PA
Teaching Assistant (part-time)
Aug 2017 - Dec 2018


- Taught 187 students over two semesters, in collaboration with Dr. Dana Tomlin, the founder of Map Algebra, covering raster processing tools, Python in ArcGIS (ArcPy module), ModelBuilder, the ArcGIS API for Python, and the Google Earth Engine JavaScript API
- Co-taught Quantitative Reasoning with Dr. Ioana E. Marinescu to 56 students with non-technical backgrounds at the School of Social Policy & Practice, covering R and descriptive and inferential statistics for social studies projects
- Solved 240+ problems raised by 119 students without technical backgrounds in 165+ independent office hours


Azavea, Philadelphia, PA
Data Analyst
Jun 2018 - Sep 2018


Project 1: Identify Marine Debris Patterns in Mid-Atlantic Region
- Cut marine debris cleanup costs by 80% by targeting high-risk marine pollution locations through a Kriging-based geostatistical analysis in ArcGIS Desktop for the International Coastal Cleanup at Ocean Conservancy
- Geolocated 10,000+ marine debris pickup records from the past 20 years across 111,056 square miles in five Mid-Atlantic states
- Formulated a feasible plan for targeting future cleanup efforts and fundraising based on the predicted results


Project 2: Ecological Analysis for Arctic National Wildlife Refuge
- Increased geoprocessing efficiency by 75% by developing Python scripts to automate ELT processes for ArcGIS
- Digitized the past 20 years of historical datasets in the Refuge by applying georeferencing and remote sensing techniques
- Designed a pipeline in R for species distribution modeling, and prepared technical tutorials for non-technical readers
- Visualized data and created interactive web applications using JavaScript, D3, ggplot2, vector tiles, and Adobe Creative Cloud


Skills


Data Analytics and Visualization
Python: Pandas, NumPy, SciPy, Scikit-learn, Matplotlib, Plotly, Seaborn, Altair, Bokeh
R: Tidyverse, dplyr, ggplot2, Shiny
Big Data: Hive, Spark
Machine learning: NLTK, Keras
AWS: S3, EC2
Dashboard: Tableau, D3, Flask


Database
RDBMS: MySQL, Presto, Vertica, PostgreSQL & PostGIS, SQLite
Big Data: Spark, Hive, Hadoop, EMR, Airflow
Cloud Data Warehouse: Redshift, RDS, S3
NoSQL: Cassandra


Spatial Analysis and Cartography
Spatial Analysis: GeoPandas, QGIS, ArcGIS (ModelBuilder, ArcPy, ArcGIS Online), Google Earth Engine
Cartography: Leaflet, CARTO, Mapbox


Graphics and Design
Graphic Design: Adobe Suite, Adobe Creative Cloud, Inkscape
Modeling: AutoCAD, Rhino, Google SketchUp

Web Development
HTML, CSS, JavaScript (jQuery, Bootstrap), Flask, Node.js, RESTful APIs, Docker


Education


University of Pennsylvania, Philadelphia, PA
Graduated Dec 2018

Master of Urban Spatial Analytics (Top 1%)
Master of Landscape Architecture and Regional Planning


Tongji University, Shanghai, China
Graduated Jun 2015

Bachelor of Engineering in Landscape Architecture (Top Graduate with Distinction)