Use MySQL Trigger To Do Incremental ETL


There’s a huge MySQL table that I need to ETL to Google BigQuery daily, about 1 billion rows. The rows are updated in a random fashion all the time so I can’t do incremental ETL by the recording the max primary key.

Then my colleague brought up the trigger idea, which I believe is the perfect tool for this job: record what has changed then only load the changed rows to BQ.

Below is the trigger I will use, rxt is the source table, rxts is the ‘diff’ table:

create TRIGGER rxt_update after UPDATE on rxt
  FOR EACH ROW
  BEGIN
    if new.value is not null then
      insert INTO rxts SET rxts.id = new.id
      on DUPLICATE KEY UPDATE rxts.ts = current_timestamp();
    end if;
  END;

The difference will be generated by the SQL

SELECT rxt.* FROM rxt join rxts on rxt.id = rxts.id

BQ doesn’t support update at the moment, so the table can be rebuilt with the original ‘rxt’ and the ‘rxt_diff’, eg. build rxt_0 with rows in rxt but not in rxt_diff, then append rxt_diff into rxt_0 to form the new rxt table.

🙂


One response to “Use MySQL Trigger To Do Incremental ETL”

  1. 给哥哥拜个晚年,祝哥哥一家,鸡年,吉祥如意,阖家欢乐,笑笑、洋洋小朋友,天天开心,身体棒棒。