BigQuery Integration Guide

Last updated Aug 28th, 2024

Overview

Common Room supports importing data from BigQuery via Google Cloud Storage (GCS). Importing via GCS is a good fit if you want to set up a recurring import of data from your warehouse into Common Room.

Setup

When setting up your BigQuery integration via GCS, we recommend that you create a new GCS bucket with appropriate permissions so you can have full control over the life cycle of the data. Setting up the connection involves the following steps:

  1. [Customer] Create your GCS bucket
  2. [Customer] Share the bucket with the Common Room GCP service account (provided upon request), granting roles/storage.objectViewer and roles/storage.objectCreator on the data prefix as shown in the sample snippet below
  3. [Common Room + Customer] Validate that data can be read from the bucket using test data (a sample test upload follows the Pulumi snippet below)
  4. [Common Room + Customer] Set up the recurring imports

To help make the bucket setup as easy as possible, here is a sample Pulumi (TypeScript) snippet that can be used to set up the GCS bucket and grant the permissions.

import * as pulumi from "@pulumi/pulumi";
import * as gcp from "@pulumi/gcp";

const COMMON_ROOM_GCS_SERVICE_ACCOUNT_EMAIL =
"common-room-gcs-access@common-room-production.iam.gserviceaccount.com";

const READ_DATA_PREFIX: string | undefined = "READ_PREFIX"; // this can be undefined if no data imports are configured

// Create a GCS bucket, choose your own bucket name
const bucket = new gcp.storage.Bucket("common-room-shared-bucket", {
  location: "US",
});

if (READ_DATA_PREFIX != null) {
  // Allows Common Room to read objects under the prefix
  new gcp.storage.BucketIAMMember(`common-room-access-${READ_DATA_PREFIX}`, {
    bucket: bucket.name,
    role: "roles/storage.objectViewer",
    member: `serviceAccount:${COMMON_ROOM_GCS_SERVICE_ACCOUNT_EMAIL}`,
    condition: {
      title: "ReadPrefixReadAccess",
      expression: pulumi.interpolate`resource.name.startsWith("projects/_/buckets/${bucket.name}/objects/${READ_DATA_PREFIX}/")`,
    },
  });

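  // Allows Common Room to write objects under the prefix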
  new gcp.storage.BucketIAMMember(
    `common-room-access-write-${READ_DATA_PREFIX}`,
    {
      bucket: bucket.name,
      role: "roles/storage.objectCreator",
      member: `serviceAccount:${COMMON_ROOM_GCS_SERVICE_ACCOUNT_EMAIL}`,
      condition: {
        title: "ReadPrefixWriteAccess",
        expression: pulumi.interpolate`resource.name.startsWith("projects/_/buckets/${bucket.name}/objects/${READ_DATA_PREFIX}/")`,
      },
    }
  );
}

export const bucketName = bucket.name;
export const serviceAccountEmail = COMMON_ROOM_GCS_SERVICE_ACCOUNT_EMAIL;
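Once the bucket and permissions are in place, you can validate the connection (step 3) by dropping a small test file under the shared prefix so Common Room can confirm it is readable. Below is a minimal sketch using the Node.js @google-cloud/storage client; the bucket name, prefix, object path, and record contents are placeholder values for illustration, and the exact layout is agreed on with Common Room during setup.

import { Storage } from "@google-cloud/storage";

// Placeholder values -- substitute your own bucket name and shared prefix
const BUCKET_NAME = "common-room-shared-bucket";
const READ_DATA_PREFIX = "READ_PREFIX";

async function uploadTestData(): Promise<void> {
  const storage = new Storage();

  // Write a tiny JSONL test record into a dated partition under the prefix
  const objectPath = `${READ_DATA_PREFIX}/data/customers/date=20211025/test.jsonl`;
  const record = JSON.stringify({ email: "test@example.com", name: "Test User" });

  await storage.bucket(BUCKET_NAME).file(objectPath).save(record);
  console.log(`Uploaded test object to gs://${BUCKET_NAME}/${objectPath}`);
}

uploadTestData().catch((err) => {
  console.error("Upload failed:", err);
  process.exit(1);
});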

Please contact us if you're interested in exploring this option or have any questions!

Requirements

Our GCS integration is available on Team and Enterprise plans as an add-on. Please work with your Common Room contact for more information.

Details

When importing data to Common Room, there are a few important things to keep in mind (an example export sketch follows this list).

  • User data is keyed by a customer email with one record per customer
  • Company data is keyed by a unique identifier (e.g. SFDC account id) with one record per company
  • Each company record has two non-nullable fields: primary domain and name
    • Primary Domain is what we use to match up with the organizations in the community. It could be the actual domain of the customer’s company, the email domain of the billing admin or account owner, etc.
    • Name is used to differentiate multiple records for the same primary domain (e.g. google.com could have Android and Google Maps as two different client records)
  • Different datasets are written into different top level locations
    • E.g. /data/customers/…, /data/companies/...
  • Data snapshots are written into date-based partitions
    • E.g. /data/customers/date=20211025 will contain the snapshot generated on 2021-10-25
    • Common Room will detect new partitions and always use the data only from the latest partition
    • Each partition contains the entire snapshot of the dataset
  • Once written, a partition should not change
  • Files are one of:
    • CSV/TSV (optionally gzipped)
    • JSONL (optionally gzipped)
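As an example of producing a snapshot in this layout, here is a sketch of a daily export job using the Node.js @google-cloud/bigquery client. The dataset, table, bucket, and prefix names are placeholders; adjust them to match your warehouse and the prefix shared with Common Room.

import { BigQuery } from "@google-cloud/bigquery";
import { Storage } from "@google-cloud/storage";

// Placeholder names for illustration
const BUCKET_NAME = "common-room-shared-bucket";
const READ_DATA_PREFIX = "READ_PREFIX";
const DATASET = "analytics";
const TABLE = "customers";

async function exportDailySnapshot(): Promise<void> {
  const bigquery = new BigQuery();
  const storage = new Storage();

  // Build a dated partition path, e.g. READ_PREFIX/data/customers/date=20211025/
  const today = new Date().toISOString().slice(0, 10).replace(/-/g, "");

  // The * wildcard lets BigQuery shard the export into multiple files if needed
  const destination = storage
    .bucket(BUCKET_NAME)
    .file(`${READ_DATA_PREFIX}/data/customers/date=${today}/part-*.jsonl.gz`);

  // Export the full table snapshot as gzipped JSONL
  await bigquery.dataset(DATASET).table(TABLE).extract(destination, {
    format: "JSON",
    gzip: true,
  });
}

exportDailySnapshot().catch((err) => {
  console.error("Export failed:", err);
  process.exit(1);
});

Because each partition contains the entire snapshot and is never modified once written, scheduling a job like this to run once per day is enough to keep the import current.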

FAQ

Should the daily data dump be a delta or full data dump?

We support both full and delta data dumps, but full data dumps are preferred.

How should we handle deleted records?

Add a deleted: true flag to a record to mark it as deleted.
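For example, a deleted customer could appear in a JSONL snapshot as a record like the following (the field names are illustrative, not required):

{"email": "former.customer@example.com", "deleted": true}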

BigQuery -> GCS export breaks files up if they are larger than 1GB. Can Common Room handle multiple files for a given object?

Yes, our import will handle multiple files as long as they are grouped under the same dated path.
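For example, a sharded export might be laid out like this (file names are illustrative); all files under the dated path are treated as part of the same snapshot:

/data/customers/date=20211025/part-000.jsonl.gz
/data/customers/date=20211025/part-001.jsonl.gz
/data/customers/date=20211025/part-002.jsonl.gz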
