Chariot Solutions in Thoughts on Data Engineering
Aggregating Files in Your Data Lake: Part 3
Several years ago, I wrote the post Friends Don’t Let Friends Use JSON (in their data lakes). The crux of that post was that JSON doesn’t…
9 min read · Mar 27, 2024
Chariot Solutions in Thoughts on Data Engineering
Aggregating Files in your Data Lake: Part 2
In my last post, I developed a data pipeline to aggregate CloudTrail log files. When I ran this pipeline against Chariot’s CloudTrail…
7 min read · Feb 29, 2024
Chariot Solutions in Thoughts on Data Engineering
Aggregating Files in your Data Lake: Part 1
As I’ve written in the past, large numbers of small files make for an inefficient data lake. But sometimes, you can’t avoid small files…
10 min read · Feb 15, 2024
Chariot Solutions in Thoughts on Data Engineering
Data Engineering: More SRE Than SQL
Following my post about the Chariot Data Engineering interview, I received some comments along the lines of “wait, you don’t test their SQL…
5 min read · Jan 9, 2024
Chariot Solutions in Thoughts on Data Engineering
Developing A Coding Test for Data Engineering
Hiring good candidates is difficult. After nearly 40 years in this business, and interviewing hundreds of candidates, I’m not going to…
6 min read · Nov 21, 2023
Chariot Solutions in Thoughts on Data Engineering
Small Data: A Pipeline for Low-Latency Decision Support
In my last post, I said that I didn’t think Postgres was a good choice for a decision support database, versus a task-specific DBMS such as…
5 min read · Aug 14, 2023
Chariot Solutions in Thoughts on Data Engineering
Why Not Just Use Postgres?
My last few posts have focused on Redshift and Athena, two specialized tools for managing and querying Big Data. But there’s a meme that’s…
12 min read · Jul 31, 2023
Chariot Solutions in Thoughts on Data Engineering
A Deep Dive on Redshift Execution Plans
Execution plans are one of the primary tools to optimize your database queries, but they can be daunting to read and understand. In this…
12 min read · Jul 17, 2023
Chariot Solutions in Thoughts on Data Engineering
Unbalanced Data in Redshift
I first experienced unbalanced data in a data warehouse thirty years ago. I was working for a mutual fund company, and something like a…
8 min read · Jul 3, 2023
Chariot Solutions in Thoughts on Data Engineering
Rightsizing Data for Athena
Amazon Athena is a service that lets you run SQL queries against structured data files stored in S3. It takes a “divide and conquer”…
7 min read · Jun 19, 2023