I’ve never invested much time in understanding how operating systems work. It has never been on my interests list, nor have I considered it useful knowledge for my career. What can I say? We all make bad decisions in our lives.

Over the last two years, I have invested a lot of time in understanding how some things work in Linux. Those are the consequences (or benefits?) of working with a complex and low-level system like a database.

This week, I was working on tracking data that was read from the page cache. While debugging some ClickHouse metrics directly collected from the Kernel, I discovered the /proc/thread-self directory1.


When a thread accesses this magic symbolic link, it resolves to the process’s own /proc/self/task/[tid] directory.

So /proc/thread-self points to the proc folder for the current thread. In those folders, you can find a lot of helpful information. In my case, I was interested in /proc/thread-self/io 2, where you have IO statistics.

I was focused on investigating whether the Kernel reported bytes read from S3 inside rchar. I shared more information in this PR in ClickHouse repository. Despite the PR being closed, the examples I shared there still hold valuable insights and a reproducer that can contribute to the collective understanding of the system.


  1. There is also a /proc/self for the current process. This is something I didn’t know either.

  2. https://docs.kernel.org/filesystems/proc.html#proc-pid-io-display-the-io-accounting-fields

Subscribe to the newsletter to keep you posted about future articles