Use MySQL Trigger To Do Incremental ETL

There’s a huge MySQL table that I need to ETL to Google BigQuery daily, about 1 billion rows. The rows are updated in a random fashion all the time so I can’t do incremental ETL by the recording the max primary key.

Then my colleague brought up the trigger idea, which I believe is the perfect tool for this job: record what has changed then only load the changed rows to BQ.

Below is the trigger I will use, rxt is the source table, rxts is the ‘diff’ table:

create TRIGGER rxt_update after UPDATE on rxt
  FOR EACH ROW
  BEGIN
    if new.value is not null then
      insert INTO rxts SET rxts.id = new.id
      on DUPLICATE KEY UPDATE rxts.ts = current_timestamp();
    end if;
  END;

The difference will be generated by the SQL

SELECT rxt.* FROM rxt join rxts on rxt.id = rxts.id

BQ doesn’t support update at the moment, so the table can be rebuilt with the original ‘rxt’ and the ‘rxt_diff’, eg. build rxt_0 with rows in rxt but not in rxt_diff, then append rxt_diff into rxt_0 to form the new rxt table.

🙂

Saving Images in Chrome with Selenium

Here’s how to save images in Chrome browser using Selenium. The API has element.screenshot_as_png() method but apparently it’s not implemented at the moment.

With some minor changes to this answer I can save a image via the browser: http://stackoverflow.com/questions/13832322/how-to-capture-the-screenshot-of-a-specific-element-rather-than-entire-page-usin

from selenium import webdriver
from PIL import Image

#chrome_options = ...
#chrome = webdriver.Chrome(chrome_options=chrome_options)
#element = chrome.find_element_by_id('some_id')

def save_image(chrome, element, save_path):
  # in case the image isn't isn't in the view yet
  location = element.location_once_scrolled_into_view
  size = element.size

  # saves screenshot of entire page
  chrome.save_screenshot(save_path)
  # uses PIL library to open image in memory

  image = Image.open(save_path)
  left = location['x']
  top = location['y']
  right = location['x'] + size['width']
  bottom = location['y'] + size['height']
  image = image.crop((left, top, right, bottom))  # defines crop points
  image.save(save_path, 'png')  # saves new cropped image

🙂