December 31, 2019

Using "Loose Typings" in GraphQL to Allow Dynamic Grouping and Aggregation Queries

Sebastian Scholl
@SebScholl

Several months ago a developer building on 8base reached out via Intercom with, what ended up being, a roadmap changing message. It read:

...I'm building a marketing campaign management system that's integrated with my eCommerce business. I've gotten the whole thing working, but feel stuck when it comes to generating good reports. Is there anyway I can do aggregations through the GraphQL API, like I can when using SQL?

The 8base team had a few cursory ideas on how to approach this problem. However, having to work within the confines of the GraphQL specification invalidated a number of the initially suggested approaches. That said, we felt that this wasn't the last time a developer would wish they had such a feature. So as a result, "Groupings and Aggregations" was added to the top of our development team's to-do list.

Before taking some time to talk about how we were able to achieve groupings and aggregations within the confines of GraphQL, let's first check out these new API features themselves. We'll use a simplified data model that's similar to the one the developer who reached out was using.

The Data Model

Every order being made on the eCommerce site triggered a webhook that created an Invoice record in 8base. Additionally, Invoices are associated with a Campaign record when applicable (within the campaign's start and end dates). Thus we have our two tables with the following fields defined.

Campaigns

  • Name: Text
  • StartDate: Date
  • EndDate: Date
  • Discount: Number
  • Invoices: Table (has-many relation)

Invoices

  • OrderName: Text
  • OrderDate: Date
  • OrderPrice: Number
  • Campaign: Table (belongs-to relation)

Simple enough! Right? Now with this data model defined, we can start writing some pretty useful queries - whether they're for reports, chart-data, or whatever else.

Comparing Campaign Performance

We filled the database with about 600 fake invoices and several campaigns. Using grouping and aggregations, we're going to run a query that gives us the following information back.

  1. The ID, Name, and Creator's email of each campaign.
  2. The number of Invoices attributed to each campaign.
  3. The average sale price and total revenue generated by each campaign.
  4. The order number of every Invoice in each group.

In order to accomplish this, we're going to use the following query. Pay attention to the inline comments, as well as notice the relationships between "as" declarations and the GraphQL response object keys that are being set to type values.

{% code-block language="javascript" %}
query {
 invoicesList(groupBy: {
   query: {
     /* We'll be grouping by Campign ID
          while aliasing the Name field as well. */
      campaign: {
        id: { as: "CampaignId" },
        name: { as: "CampaignName"},
        /* Jump into the owner relationship and
           alias the email field. */
       createdBy: {
          email: { as: "CampaignOwner" }
       }
     },
      /* On a single field we can run multiple
          aggregations. For example, both summing
         and averaging all prices in each group. */
      orderPrice: [{
        as: "AverageSalePrice",
        fn: { aggregate: AVG }
      }, {
        as: "TotalRevenue",
        fn: { aggregate: SUM }
      }],
      /* Using the _groups key, we can access the
          list of records used in the grouping function.
         This can would on any level/relationship. */
      _group: { as: "Invoices" }
   }
  }) {
    groups {
      /* The idea of "Loose Typings" come into play in
         that the API cannot know ahead of time what
         value types an aggregated result will be. Therefore
         the developer can declare them explicitly in the query. */
     CampaignId: ID
     CampaignName: String
      CampaignOwner: String
      AverageSalePrice: Float
      TotalRevenue: Float
      /* On the _group we can re-access the records used
          in the group on the normal record list types. */
      Invoices: InvoiceGroup {
        count
       items { orderNumber }
      }
    }
  }
}
{% code-block-end %}

While the syntax may look a little funky, a deceivingly powerful SQL statement gets generated in the background that fulfills the query. Notice how, for every value that we aliased using as in the groupBy query itself, we were able to use the name as a key in the response object while assigning a type on the fly - without it mattering whether the field, or aggregate value, was on the primary or a related table!

Running this query, we get the following JSON response from the 8base GraphQL API.

{% code-block language="json" %}
{
 "data": {
   "invoicesList": {
     "groups": [
       {
         "CampaignId": null,
         "CampaignName": null,
         "CampaignOwner": null,
         "AverageSalePrice": 49.804094,
         "TotalRevenue": 17033,
         "Invoices": {
           "count": 342,
           "items": [
             { "orderNumber": "000099" },
             // 341 other orderNumbers
           ]
         }
       },
       {
         "CampaignId": "ck4mza2pt00hs07mh0jli1hj9",
         "CampaignName": "Summer Sale",
         "CampaignOwner": "melissa@candleshop.com",
         "AverageSalePrice": 50.686567,
         "TotalRevenue": 3396,
         "Invoices": {
           "count": 67,
           "items": [
             { "orderNumber": "000256"},
             // 66 other orderNumbers
           ]
         }
       },
       {
         "CampaignId": "ck4mz9g2r00ax07ju6v8a6oaj",
         "CampaignName": "November Madness",
         "CampaignOwner": "jenny@candleshop.com",
         "AverageSalePrice": 46.953125,
         "TotalRevenue": 3005,
         "Invoices": {
           "count": 64,
           "items": [
             { "orderNumber": "000461" },
             // 63 other orderNumbers
           ]
         }
       },
       {
         "CampaignId": "ck4mz8sgw009c07juevcgfcqm",
         "CampaignName": "Christmas Blowout",
         "CampaignOwner": "james@candleshop.com",
         "AverageSalePrice": 50.128713,
         "TotalRevenue": 5063,
         "Invoices": {
           "count": 101,
           "items": [
             { "orderNumber": "000509" },
             // 100 other orderNumbers
           ]
         }
       }
     ]
   }
 }
}
{% code-block-end %}

Now, this is already pretty awesome. However, we can still take it a step further. Let's add some new, and totally believable, requirements to our specification.

  1. Only return groups belonging to a Campaign.
  2. Only return campaigns with an average sale price greater than or equal to $50.00

In order to achieve this, we're able to use a having clause. This allows us to use predicates that are specified as objects on declared types - using aggregated fields. For brevity's sake, I'll only post the new part of the query. However, know that it must come after the groupBy query argument.

{% code-block language="graphql" %}
query: { ...queryOptions },
/* Having let's us filter by an aggregated value
    using type specific predicates. */
having: {
 AND: [{
   alias: "CampaignName"
   string: {
     is_not_empty: true
   }
 },{
   alias: "TotalRevenue"
   float: {
     gte: 50.00
   }
 }]
}
/* GraphQL response object down here... */
{% code-block-end %}

By specifying both the alias - which must match an alias in the query - and the field's expected type, we're able to declare a predicate value that filters the results accordingly on an aggregated/grouped field. As a result, the following filtered response gets returned.

{% code-block language="json" %}
{
 "data": {
   "invoicesList": {
     "groups": [
       {
         "CampaignId": "ck4mza2pt00hs07mh0jli1hj9",
         "CampaignName": "Summer Sale",
         "CampaignOwner": "melissa@candleshop.com",
         "AverageSalePrice": 50.686567,
         "TotalRevenue": 3396,
         "Invoices": {
           "count": 67,
           "items": [
             { "orderNumber": "000256"},
              // 66 other orderNumbers
           ]
         }
       },
       {
          "CampaignId": "ck4mz8sgw009c07juevcgfcqm",
         "CampaignName": "Christmas Blowout",
         "CampaignOwner": "james@candleshop.com",
         "AverageSalePrice": 50.128713,
         "TotalRevenue": 5063,
         "Invoices": {
           "count": 101,
           "items": [
             { "orderNumber": "000509" },
             // 100 other orderNumbers
           ]
         }
       }
     ]
   }
 }
}
{% code-block-end %}

How it works?

It might be hard to appreciate these queries without being somewhat familiar with SQL. That said, each of these groupBy GraphQL queries generates a raw SQL statement that could make a lumber-jack cry. For example, our query above generates the following SQL statement.

{% code-block language="sql" %}
SELECT GROUP_CONCAT(DISTINCT `MT`.`id`) AS `_grouping`,
AVG(`MT`.`orderPrice`) AS `AverageSalePrice`,
SUM(`MT`.`orderPrice`) AS `TotalRevenue`,
`campaign`.`id` AS `CampaignId`,
`campaign`.`name` AS `CampaignName`,
`campaign->createdBy`.`email` AS `CampaignOwner`,
GROUP_CONCAT(DISTINCT `MT`.`id`) AS `Invoices`
 FROM `8base_5d84e92c`.`Invoices` AS `MT`
   LEFT JOIN (`8base_5d84e92`.`CampaignInvoices` AS `campaign->CampaignInvoices`
INNER JOIN `8base_5d84e92`.`Campaigns` AS `campaign`
         ON `campaign->CampaignInvoices`.`campaignBId` = `campaign`.`id`  
AND `campaign`.`deletedAt` = 0)
         ON `MT`.`id` = `campaign->CampaignInvoices`.`invoiceAId`
     LEFT JOIN `8base_5d84e92`.`Users` AS `campaign->createdBy`
         ON `campaign`.`createdById` = `campaign->createdBy`.`id`
AND `campaign->createdBy`.`deletedAt` = 0
       WHERE `MT`.`deletedAt` = 0
GROUP BY `campaign`.`id`, `campaign`.`name`, `campaign->createdBy`.`email`
HAVING ((`CampaignName` IS NOT NULL AND `CampaignName` <> '')
AND `TotalRevenue` >= 50)
ORDER BY `_grouping` ASC LIMIT 0, 5000
{% code-block-end %}

With that in mind, essentially what's happening on the GraphQL side is a sophisticated utilization of aliases that allows for dynamic types and attribute-names to be used. On the 8base side, we attempt to auto-cast types in the response. However, this is completely unreliable since there is no way on knowing how a grouped or aggregated field will be returned.

This is why it is necessary to explicitly, and manually, specify types in the response object. That way, we're able to enrich the GraphQL schema on the fly to accommodate the dynamic key-value pairs and objects. In tandem with auto-generating a number of custom types ahead of time, such as the GroupByField and a <TableName>Group for every table in the 8base workspace, the grouping and aggregation functions having everything they need to run... and only tease the limits of the GraphQL spec!

{% code-block language="graphql" %}
{
 /* Using the aliased name and explicitly declaring a type, we're
     able to generate dynamically name object keys. */
 groups {
    CampaignId: ID
    CampaignName: String
    CampaignOwner: String
    AverageSalePrice: Float
    TotalRevenue: Float
    /* The <TableName>Group type is auto-generated to allow for
        the aliased group to be accessed as a normal list of records. */
    Invoices: InvoiceGroup {
      count
     items { orderNumber }
    }
  }
}
{% code-block-end %}

Ready to try 8base?

We're excited about helping you achieve amazing results.