Aggregating Files in Your Data Lake: Part 3 (published in Thoughts on Data Engineering, Mar 27): Several years ago, I wrote the post Friends Don’t Let Friends Use JSON (in their data lakes). The crux of that post was that JSON doesn’t…
Aggregating Files in your Data Lake: Part 2 (published in Thoughts on Data Engineering, Feb 29): In my last post, I developed a data pipeline to aggregate CloudTrail log files. When I ran this pipeline against Chariot’s CloudTrail…
Aggregating Files in your Data Lake: Part 1 (published in Thoughts on Data Engineering, Feb 15): As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes, you can’t avoid small files…
Data Engineering: More SRE Than SQL (published in Thoughts on Data Engineering, Jan 9): Following my post about the Chariot Data Engineering interview, I received some comments along the lines of “wait, you don’t test their SQL…
Developing A Coding Test for Data Engineering (published in Thoughts on Data Engineering, Nov 21, 2023): Hiring good candidates is difficult. After nearly 40 years in this business, and interviewing hundreds of candidates, I’m not going to…
Small Data: A Pipeline for Low-Latency Decision Support (published in Thoughts on Data Engineering, Aug 14, 2023): In my last post, I said that I didn’t think Postgres was a good choice for a decision support database, versus a task-specific DBMS such as…
Why Not Just Use Postgres? (published in Thoughts on Data Engineering, Jul 31, 2023): My last few posts have focused on Redshift and Athena, two specialized tools for managing and querying Big Data. But there’s a meme that’s…
A Deep Dive on Redshift Execution Plans (published in Thoughts on Data Engineering, Jul 17, 2023): Execution plans are one of the primary tools to optimize your database queries, but they can be daunting to read and understand. In this…
Unbalanced Data in Redshift (published in Thoughts on Data Engineering, Jul 3, 2023): I first experienced unbalanced data in a data warehouse thirty years ago. I was working for a mutual fund company, and something like a…
Rightsizing Data for Athena (published in Thoughts on Data Engineering, Jun 19, 2023): Amazon Athena is a service that lets you run SQL queries against structured data files stored in S3. It takes a “divide and conquer”…