All Things Data

It's quite common to see people sharing papers about small advancements in NLP, Computer Vision, Machine Learning, Deep Learning, and similar fields.

That is much harder to find in fields such as Data Engineering, so this is a collection of papers, articles, and books I've enjoyed reading lately. Some of them include notes and thoughts I wanted to share.


Druid: A Real-time Analytical Data Store

The first real-time database I was exposed to. It's really powerful, even though managing it was a tremendous pain in the arse.

The Log: What every software engineer should know about real-time data's unifying abstraction

An article from one of the Kafka creators. He explains the log, the basic data structure at the heart of many databases and modern distributed systems.
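
To give a feel for the idea, here is a minimal in-memory sketch of an append-only log in Python. It is my own toy illustration, not code from the article or from Kafka: the names `Log`, `append`, `read`, and `replay` are made up for the example, and a real log would be persisted to disk and partitioned.

```python
class Log:
    """Toy append-only log: records are only ever added at the end."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Store a record and return its offset (its position in the log)."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


def replay(log):
    """Rebuild the latest balance per user by reading the log from offset 0."""
    state = {}
    offset = 0
    while True:
        batch = log.read(offset)
        if not batch:
            break
        for record in batch:
            state[record["user"]] = record["balance"]
        offset += len(batch)
    return state


log = Log()
log.append({"user": "a", "balance": 10})
log.append({"user": "b", "balance": 5})
log.append({"user": "a", "balance": 7})

print(replay(log))
```

Because records are never modified and every reader sees them in the same order, any consumer that replays the log from the start ends up with the same state. Each reader only has to remember one number: the offset it has reached.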

Space/Time Trade-offs in Hash Coding with Allowable Errors

Not an especially interesting read, but I had to mention Bloom filters, and this is the original paper. They are perhaps the most surprising data structure I discovered while working in data.
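
For anyone who has not met one, here is a rough sketch of a Bloom filter in Python. It is a simplified illustration under my own assumptions, not the construction from the paper: the class and method names are hypothetical, the salted SHA-256 hashing is just a convenient choice, and the defaults of 1024 bits and 3 hashes are arbitrary.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: set membership with possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the example readable; a real one packs bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        """Derive num_hashes bit positions by salting SHA-256 with a seed."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set every bit position derived from the item."""
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        """False means definitely absent; True means probably present."""
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
seen.add("user-42")
print(seen.might_contain("user-42"))  # True: an added item is never missed
print(seen.might_contain("user-43"))  # usually False, but may be a false positive
```

The surprising part is the trade: the filter never stores the items themselves, only a fixed number of bits, and in exchange it may occasionally answer "probably present" for something that was never added. It never answers "absent" for something that was.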