Data Export Usage Telemetry Collection Project

Noel Mendoza
Engineers @ Optimizely
5 min read · Jun 8, 2020


Introduction

Six months ago, I was selected as a 2019 Optimizely I/Own It recipient. I had very little coding knowledge at the time, and Optimizely’s I/Own It program, in partnership with Hack Reactor @ Galvanize, gave me the opportunity to jumpstart my career as a software engineer. After completing my Software Engineering Immersive program through Hack Reactor @ Galvanize, I began my internship at Optimizely. I’ve spent the past three months learning how Optimizely empowers its customers through experimentation data, and I completed a project that gives our internal teams insight into customer usage of Optimizely’s data export products. In this blog post, I’ll share how I built a near real-time data pipeline to track usage telemetry for our data export products.

Overview

As the leading experimentation and personalization platform, Optimizely offers a powerful data export service to our customers. Data export allows customers to export their experimentation event data to their own data warehouse and perform free-form analysis with their own analytics platform. Last month, Optimizely launched Enriched Events Export, our most powerful and flexible export product yet, and simultaneously deprecated the existing Raw Events and Results exports. This launch prompted an urgent need to collect migration metrics and gain observability into export usage over time. Without this visibility, the migration of customers to the Enriched Events Export would be difficult to track. The same visibility also helps us identify major users of Optimizely data and work with them to develop improvements to the Data platform suite.

For more information on Optimizely’s data exports, please visit the docs.

Data Export Usage Telemetry Collection

As part of the Data Infrastructure team, the objectives of my internship project were:

  • Tracking customer usage of deprecated exports and new Enriched Events Export
  • Implementing a data pipeline that persists export usage data

With long-term goals of:

  • Understanding customer behavior based on usage
  • Enabling data-driven decisions as we launch new features

Overall Workflow for Data Export and Usage Monitoring

Visualization of the data pipeline that captures export data usage.

Step 1: Customer Export Request

The data pipeline begins when a customer requests an export by logging a ticket with our support team. Each export bucket has server access logging enabled, which means that every object read generates a corresponding log record in our S3 export access logs bucket. This log captures all the relevant data associated with the export request. You can find more information about the log format here: Amazon S3 Server Access Log Format.
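For illustration, each access log entry is a single space-delimited line along these lines. Every value below is made up, but the field order follows the documented format (bucket owner, bucket, time, remote IP, requester, request ID, operation, key, request URI, HTTP status, and byte counts, among others):

```
79a59df900b949e5 example-export-bucket [15/May/2020:21:32:10 +0000] 192.0.2.44 arn:aws:iam::123456789012:user/export-reader 3E57427F3EXAMPLE REST.GET.OBJECT results/part-00000.csv "GET /results/part-00000.csv HTTP/1.1" 200 - 2662992 2662992 70 59 "-" "aws-cli/1.18" -
```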

Step 2: AWS Lambda Function

Next comes the AWS Lambda function. AWS Lambda allowed me to run code without provisioning or managing any servers. It executes the code only when triggered and can scale automatically from a few requests per day to thousands of requests per second, depending on customers’ export usage patterns. I configured the function to be triggered every time an access log file is written to the export access logs bucket. The code I implemented in the function parses the log files, extracts the relevant information (bucket, accountId, requester, etc.), and writes it as JSON to a new location, in this case our FiveTran bucket.
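To make this concrete, here is a minimal Node.js sketch of what such a handler could look like. It is an illustration under assumptions rather than the production code: the field positions follow the documented S3 server access log format, while the destination bucket name (example-fivetran-bucket), the output key layout, and the exact set of extracted fields are hypothetical.

```javascript
// Minimal sketch of an S3-triggered Lambda handler (Node.js, AWS SDK v2).
// Bucket names and the extracted field set are illustrative assumptions.
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

// S3 server access logs are space-delimited, but quoted strings and the
// bracketed timestamp contain spaces, so tokenize with a small regex.
function parseLogLine(line) {
  const fields = line.match(/"[^"]*"|\[[^\]]*\]|\S+/g) || [];
  return {
    bucket: fields[1],
    time: fields[2],
    requester: fields[4],
    operation: fields[6],
    key: fields[7],
    httpStatus: fields[9],
    bytesSent: fields[11],
  };
}

exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Read the newly written access log object that triggered this invocation.
    const obj = await s3.getObject({ Bucket: bucket, Key: key }).promise();
    const entries = obj.Body.toString('utf-8')
      .split('\n')
      .filter(Boolean)
      .map(parseLogLine);

    // Write the parsed entries as JSON for FiveTran to pick up.
    await s3.putObject({
      Bucket: 'example-fivetran-bucket', // hypothetical destination bucket
      Key: `access-logs/${key}.json`,
      Body: JSON.stringify(entries),
      ContentType: 'application/json',
    }).promise();
  }
};

module.exports.parseLogLine = parseLogLine; // exported for unit testing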

Step 3: FiveTran & SQL Database

The FiveTran connector I configured allowed me to periodically sync JSON files representing the access log data from S3 to Optimizely’s internal PostgreSQL data warehouse.
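Each synced file holds the fields the Lambda function extracted from the access logs. A record along these lines gives a sense of the shape; the values are placeholders, and any field beyond the bucket, accountId, and requester mentioned earlier is an assumption:

```json
{
  "bucket": "example-export-bucket",
  "accountId": "123456789012",
  "requester": "arn:aws:iam::123456789012:user/export-reader",
  "operation": "REST.GET.OBJECT",
  "key": "results/part-00000.csv",
  "httpStatus": "200",
  "bytesSent": 2662992,
  "time": "15/May/2020:21:32:10 +0000"
}
```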

In addition, here is a snapshot of what my FiveTran connector’s sync activity looks like for a single day:

For more information on how to configure FiveTran, please visit their docs.

Step 4: Dashboard for Usage Data Reporting and Monitoring

Chartio is the data visualization and exploration tool we use to power dashboards and make data-driven decisions. After connecting Chartio to our database, my team was able to begin answering primary questions about our customers’ export usage, for example (the first question is sketched as a query after this list):

  • What are the daily counts of reads happening for each customer?
  • What are the daily counts of read errors for each customer?
  • What is the total data size read by each customer on a daily basis?
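Once the data lands in PostgreSQL, questions like the first one reduce to simple aggregations. Here is a sketch of such a query; the table and column names (export_access_logs, account_id, operation, logged_at) are hypothetical and not our actual warehouse schema:

```sql
-- Hypothetical schema: table and column names are assumptions.
SELECT
  account_id,
  DATE(logged_at) AS read_date,
  COUNT(*)        AS daily_reads
FROM export_access_logs
WHERE operation = 'REST.GET.OBJECT'
GROUP BY account_id, DATE(logged_at)
ORDER BY read_date, account_id;
```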

Here is the chart that provides insight into the total number of accounts utilizing our export feature during the last two weeks of May:

For more information on how to configure Chartio, please visit their docs.

Results

This near real-time telemetry pipeline for tracking data export usage enables both the Data Infrastructure and product development teams to get crucial insights into how Optimizely’s customers access and use data, which in turn helps us build better products to serve them. In addition, the historical usage dataset has been backfilled to drive deeper insights into export usage over time, which helps us migrate customers who use the legacy export products to the new export feature. Finally, the Data Infrastructure team can continuously answer additional questions with Chartio queries as we migrate from the previous export products to the new Enriched Events Export.

Learnings

My experience working on this project, and the exposure to a software engineering team, was priceless. All the technologies mentioned in my pipeline were unfamiliar to me, but with the support of my team I was able to grasp them quickly. In addition, I was exposed to Optimizely’s infrastructure on AWS and set up permissions, IAM roles and policies, and other AWS configuration via Terraform. I implemented Jest unit tests for the AWS Lambda function and worked closely with the AWS S3 API operations.
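For a flavor of that testing, here is a minimal Jest sketch against the hypothetical parseLogLine helper from the Lambda sketch earlier in this post; the sample log line and expected values are illustrative:

```javascript
// handler.test.js — minimal Jest sketch; sample values are illustrative.
const { parseLogLine } = require('./handler');

test('extracts the bucket and requester from an access log line', () => {
  const line =
    '79a59df900b949e5 example-export-bucket [15/May/2020:21:32:10 +0000] ' +
    '192.0.2.44 arn:aws:iam::123456789012:user/export-reader 3E57427F3EXAMPLE ' +
    'REST.GET.OBJECT results/part-00000.csv "GET /results/part-00000.csv HTTP/1.1" ' +
    '200 - 2662992 2662992 70 59 "-" "aws-cli/1.18" -';

  const parsed = parseLogLine(line);
  expect(parsed.bucket).toBe('example-export-bucket');
  expect(parsed.requester).toBe('arn:aws:iam::123456789012:user/export-reader');
  expect(parsed.httpStatus).toBe('200');
});
```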

Many thanks to my mentors Atul Jha and Rahul Aravind Mehalingam, my engineering manager Mike Borsuk, and of course the entire Data Infrastructure Team in supporting me throughout my project, reviewing drafts of this post, and making many helpful suggestions and edits. Lastly, I’m grateful to Optimizely, Peter Oh, and the I/Own It program for allowing me to break into tech as a historically underrepresented minority.

Noel Mendoza is on LinkedIn.
