The Burnout Effect

Back in October 2015 I got an offer from a big data startup, and after 1 year and 4 months I decided to move on.

There was a 3D printer and a drone in the office, and the team was talking about Fallout 4 that morning because it had just been released. I thought the company and the team were very cool and I still think so now.

My first challenge was to migrate a self-hosted MySQL database to AWS Aurora, because the MySQL server was overstretched and felt like it could collapse at any moment. I was quite experienced with MySQL so I didn’t think that would be hard. However, there were some complications: the MySQL server was a huge VM managed by a Ganeti cluster and backed by DRBD volumes, and best of all, the disks were old-school magnetic SAS disks.

The DB migration tool recommended by AWS just failed randomly on large tables (~400GB). It wasn’t acceptable to do a full mysqldump (~1TB) on the master and set up replication to Aurora, because that would cause huge downtime. And since there’s no access to the system underneath Aurora, using an LVM snapshot was out too. I thought there must be a way, so I created a MySQL replica in the same cluster from an LVM snapshot, then set up replication between the replica and the master.

Once replication between the master and the replica was up and verified, I could pause it, run mysqldump on the replica, load the dump into Aurora, and then set up a second replication between the replica and Aurora. After the replica caught up with the master, we did the DNS switch-over; the apps hardly felt a thing and started writing to Aurora. That was my first success.
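The "caught up" check before a switch-over like this can be sketched in a few lines of Python. This is only a minimal illustration, not the tooling actually used: it assumes the output of MySQL's `SHOW SLAVE STATUS` has already been fetched into a dictionary, and the function name `ready_for_cutover` and the `max_lag_seconds` parameter are made up for the example.

```python
from typing import Any, Mapping


def ready_for_cutover(status: Mapping[str, Any], max_lag_seconds: int = 0) -> bool:
    """Return True when a replica looks caught up, judging by SHOW SLAVE STATUS fields.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict
    (hypothetical input; fetching it from the server is left out).
    """
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    # Seconds_Behind_Master is NULL (None here) when the lag is unknown.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False
    return int(lag) <= max_lag_seconds


if __name__ == "__main__":
    sample = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": "0",
    }
    print(ready_for_cutover(sample))
```

A zero lag on its own doesn’t prove the data is identical, so a check like this would only be one gate before the DNS change, alongside verifying the data itself.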

In 2016 I was involved in several big projects: migrating DNS to Route 53, migrating the core servers (about 40) to AWS, and migrating the data warehouse from AWS Redshift to Google BigQuery.

I thought the job should have become more comfortable since a large portion of the infrastructure had been rebuilt. However, a few months ago I started sleeping poorly and struggling to concentrate during the day. I searched for an answer and, to my surprise, found that a lot of people share the same problem, which is called burnout. So rather than being asked to leave for poor performance, I chose to take a break and look for a new job.

On the last day of the job, we played Rocket League together and those were my best hours. I felt super relieved, yet very sad to leave the team. Thanks to the team, especially Trist and Adam, from whom I learned a lot.

🙂

2016 in the Fast Lane

Our 2016 seemed to go by in fast-forward mode. Here’s a brief look back:

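We finally made it to the legendary Great Ocean Road and the Twelve Apostles! The scenery was incredibly beautiful, but there really were a lot of flies, probably thanks to the many farms nearby.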

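On the work front, this was a bumper year for me. First I completed the online migration of a terabyte-scale database, moving it from servers in a data center in California to AWS Aurora, which largely freed my colleagues from the trouble caused by database overload. I don’t say “completely” because Aurora gets knocked over sometimes too. The next big project was the obvious one: moving all the servers on the private cloud to AWS EC2. These two projects alone took several months, and they were things that neither my predecessor nor my predecessor’s predecessor had managed to get done, so I’m a little bit proud. Funnily enough, we had earlier asked the engineers at that data center to set up a dedicated VPN link from the data center to AWS, but it was never finished, which was also the main reason the boss fell out with them. Later I spent half a day, reading the docs and doing the work included, and got a VPN from AWS to the company LAN running, so I was a little proud of my learning ability once again.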

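Another big project was the big data warehouse. The company had been using AWS Redshift, but its value for money wasn’t good enough. So I took on the challenge of trialing Google BigQuery and rewriting the existing programs and SQL that had been written for Redshift. The rewriting wasn’t that hard; the hard part was getting the new programs to upload their processing results to BQ while keeping the data on both sides consistent. Comparing data isn’t hard either, you just do a diff, but with 30+TB it’s no longer so casual: whenever a few days’ data in a month didn’t match, I had to check every stage one by one. After several weeks of debugging, the data finally matched completely, 100%. BQ’s advantages are quite obvious, especially for irregular big data workloads: you pay for what you use, and there’s no need to estimate CPU and the like.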
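To give a feel for the day-by-day comparison, here is a minimal Python sketch. It is not the actual tooling from the project: it assumes each warehouse has already been summarized into a per-day pair of (row count, checksum), and the names `DailySummary` and `mismatched_days` are made up for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical summary shape: day (e.g. "2016-05-01") -> (row_count, checksum).
DailySummary = Dict[str, Tuple[int, int]]


def mismatched_days(old: DailySummary, new: DailySummary) -> List[str]:
    """Return the sorted list of days whose summaries differ between two warehouses.

    A day that is missing on either side also counts as a mismatch.
    """
    days = sorted(set(old) | set(new))
    return [day for day in days if old.get(day) != new.get(day)]


if __name__ == "__main__":
    redshift = {"2016-05-01": (100, 7), "2016-05-02": (200, 9)}
    bigquery = {"2016-05-01": (100, 7), "2016-05-02": (199, 9), "2016-05-03": (5, 1)}
    for day in mismatched_days(redshift, bigquery):
        print("needs investigation:", day)
```

Comparing small per-day summaries instead of raw rows narrows the search to the few days that disagree, which is where the stage-by-stage checking then starts.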

I won’t go into the other odds and ends.

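Both kids made progress. Ever since coming home from the hospital, Yangyang has been working hard at eating and sleeping, with weight slowly approaching the average. Xiaoxiao’s NAPLAN results were unexpectedly good, given how regrettably little time the two of us had to spend on her studies. My wife found Xiaoxiao an art teacher and planned everything really well, so a thumbs-up to her from me 🙂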