
Wave Workshop - Big Data Visualizer

Michelle Tanco

H2O Wave makes it easy to build front ends for your projects. I was recently inspired by this tutorial notebook, which explains how to use open source H2O-3 to find anomalies in a dataset. Part of that process uses the H2O-3 aggregator function to visualize relationships in large datasets. A data scientist is at home in a Jupyter Notebook, but building a front end with H2O Wave makes it easier for us, and for analysts or other business users, to run this code and benefit from the H2O-3 aggregator function.

Below you can see our data aggregation and visualization app. Currently, the app itself generates a 1M-row dataset as a demo. The H2O-3 aggregator function reduces this to 68 exemplar rows and tells us how many of the original rows fall into each exemplar.

Aggregated data as a table.

Aggregated data as a plot.
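
To make the aggregation step concrete, here is a minimal sketch of how demo data could be generated and passed to the H2O-3 Aggregator from Python. It is not the app's actual code: the helper names `make_demo_columns` and `aggregate`, the two-cluster demo data, and the default of 100 target exemplars are all assumptions made for illustration. The H2O-3 imports live inside `aggregate` so the data helper can be tried without a running cluster.

```python
import random


def make_demo_columns(n_rows, seed=42):
    """Hypothetical demo data: points drawn from two Gaussian clusters."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (5.0, 5.0)]
    xs, ys = [], []
    for _ in range(n_rows):
        cx, cy = rng.choice(centers)
        xs.append(rng.gauss(cx, 1.0))
        ys.append(rng.gauss(cy, 1.0))
    return {"x": xs, "y": ys}


def aggregate(columns, target_num_exemplars=100):
    """Reduce the data to exemplar rows with the H2O-3 Aggregator.

    Requires the `h2o` package and Java; imported here so that the
    helper above still works when H2O-3 is not installed.
    """
    import h2o
    from h2o.estimators import H2OAggregatorEstimator

    h2o.init()  # start or connect to a local H2O-3 cluster
    frame = h2o.H2OFrame(columns)  # dict keys become column names
    agg = H2OAggregatorEstimator(target_num_exemplars=target_num_exemplars)
    agg.train(training_frame=frame)
    # One row per exemplar; the "counts" column holds how many
    # original rows each exemplar represents.
    return agg.aggregated_frame


# Example usage (needs a working H2O-3 installation):
#   exemplars = aggregate(make_demo_columns(10_000))
#   print(exemplars.head())
```

The number of exemplars the aggregator returns is only approximately the target, so the exact row count will vary with the data and settings.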

Resources

You can find the full source code for this app on GitHub.

Interested in seeing what it takes to make this type of application? In a one-hour live-coding session we were able to:

  • Create the layout of our application
  • Create two interactive tabs for navigating in the app
  • Create a table view for a dataset
  • Create a plot view for a dataset

Here's the replay:

Ideas for Improvement

For this app to be fully useful to our business users, we would probably want to add the following features:

  • Easily add data: allow users to aggregate and visualize their own datasets
    • File upload from local machine
    • Connect to common SQL warehouses
    • Connect to common cloud data stores like S3
  • Improved backend performance
    • Connect to a production H2O-3 cluster rather than creating a cluster on the local machine of the H2O Wave server
  • Added user control
    • Let the end user decide parameters of the aggregator function like how many exemplar rows to attempt to make
  • Improved and new visualizations
    • Add new visualizations based on different data types
  • Robustness
    • Add unit tests!

If you do decide to work on this project, or use this as a template for your own projects, be sure to tag us on Twitter @h2o_wave or post as a Show and Tell on our GitHub discussions!