Jordi Villar
# Zed screenshot

I’ve never been enthusiastic about the AI revolution.

However, this week I started using Zed.dev together with its AI assistant integration. I'm surprised by the experience and the efficiency gains. I still have reservations, but these tools can be valuable allies in tackling programming challenges and routine tasks.

For example, a few days ago I was working on a tool that needed to parse ClickHouse explain output¹, an easy task that required recursively traversing all the properties looking for a few specific ones.

With a simple prompt, the model wrote a working version.

> Write code to extract all primary key granules from explain_result. An example of explain_result here:
>
> {…}

```python
def extract_primary_key_granules(explain_result):
    def traverse(node):
        granules = []
        if isinstance(node, dict):
            if 'Indexes' in node:
                for index in node['Indexes']:
                    if index.get('Type') == 'PrimaryKey':
                        granules.append({
                            'Keys': index.get('Keys', []),
                            'Initial Granules': index.get('Initial Granules', 0),
                            'Selected Granules': index.get('Selected Granules', 0)
                        })
            for value in node.values():
                granules.extend(traverse(value))
        elif isinstance(node, list):
            for item in node:
                granules.extend(traverse(item))
        return granules

    return traverse(explain_result)

# Example usage:
result = extract_primary_key_granules(explain_result)
for granule in result:
    print(f"Keys: {granule['Keys']}")
    print(f"Initial Granules: {granule['Initial Granules']}")
    print(f"Selected Granules: {granule['Selected Granules']}")
    print()
```

The task was relatively easy, and I still had to make changes and improve the initial code. Even so, this saved me at least an hour of back-and-forth writing the function and running tests to check that it worked as expected!
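A quick sanity check of the kind I mean, using a made-up explain fragment (the nesting below is only an assumption shaped to match the keys the generated function looks for, not real ClickHouse output):

```python
# Repeating the generated function so this check runs standalone.
def extract_primary_key_granules(explain_result):
    def traverse(node):
        granules = []
        if isinstance(node, dict):
            if 'Indexes' in node:
                for index in node['Indexes']:
                    if index.get('Type') == 'PrimaryKey':
                        granules.append({
                            'Keys': index.get('Keys', []),
                            'Initial Granules': index.get('Initial Granules', 0),
                            'Selected Granules': index.get('Selected Granules', 0),
                        })
            for value in node.values():
                granules.extend(traverse(value))
        elif isinstance(node, list):
            for item in node:
                granules.extend(traverse(item))
        return granules

    return traverse(explain_result)

# Made-up fragment: one PrimaryKey index nested under a plan node,
# plus a non-primary index that should be skipped.
sample_explain = {
    "Plan": {
        "Indexes": [
            {"Type": "PrimaryKey", "Keys": ["id"],
             "Initial Granules": 100, "Selected Granules": 4},
            {"Type": "MinMax", "Keys": ["ts"]},
        ]
    }
}

assert extract_primary_key_granules(sample_explain) == [
    {"Keys": ["id"], "Initial Granules": 100, "Selected Granules": 4}
]
```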

I started using the Zed AI assistant with the ChatGPT integration. A day later, they released their own integration, which I'm now testing together with Anthropic's Claude model. I'm not even using Google for simple searches anymore.

Footnotes

  1. This is what a ClickHouse explain output looks like

#

I’ve never invested much time in understanding how operating systems work. It has never been on my interests list, nor have I considered it useful knowledge for my career. What can I say? We all make bad decisions in our lives.

Over the last two years, I have invested a lot of time in understanding how some things work in Linux. Those are the consequences (or benefits?) of working with a complex and low-level system like a database.

This week, I was working on tracking data read from the page cache. While debugging some ClickHouse metrics collected directly from the kernel, I discovered the /proc/thread-self directory¹.

/proc/thread-self

When a thread accesses this magic symbolic link, it resolves to the process’s own /proc/self/task/[tid] directory.

So /proc/thread-self points to the proc folder for the current thread. In those folders, you can find a lot of helpful information. In my case, I was interested in /proc/thread-self/io², which exposes I/O statistics.
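The io file is plain "field: value" text, so it's easy to snapshot. A minimal sketch (the parser and sample are mine, not from the original post; the sample values follow the field list in the kernel documentation, and reading /proc/thread-self/io itself only works on Linux):

```python
def parse_proc_io(text):
    """Parse the 'field: value' lines of /proc/[pid]/io into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        field, sep, value = line.partition(":")
        if sep:
            stats[field.strip()] = int(value)
    return stats

# On Linux, the per-thread counters would be read like this:
# with open("/proc/thread-self/io") as f:
#     io_stats = parse_proc_io(f.read())

# Sample contents in the documented format:
sample = """\
rchar: 323934931
wchar: 323929600
syscr: 632687
syscw: 632675
read_bytes: 0
write_bytes: 323932160
cancelled_write_bytes: 0
"""
io_stats = parse_proc_io(sample)
```

With a snapshot taken before and after an operation, subtracting the two dicts gives you the bytes and syscalls that operation accounted for.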

I was focused on investigating whether the kernel reported bytes read from S3 inside rchar. I shared more information in this PR in the ClickHouse repository. Despite the PR being closed, the examples I shared there still hold valuable insights and include a reproducer that can contribute to the collective understanding of the system.

Footnotes

  1. There is also a /proc/self for the current process. This is something I didn’t know either.

  2. https://docs.kernel.org/filesystems/proc.html#proc-pid-io-display-the-io-accounting-fields

#

It’s quite common to find folks sharing papers on minor breakthroughs in areas like NLP, Computer Vision, Machine Learning, Deep Learning, and related fields here.

We’ve carefully gathered a collection of papers focused on databases, distributed systems, and data in general. We’ve explored the latest developments in these areas and gained valuable insights from the many articles in our list.

Some of these publications were suggested to us after we made our list public, adding to the richness of our curated collection.

#

When a query or transaction acquires a lock on a table, the lock remains for the duration of the query or transaction. Other queries or transactions that are waiting to acquire the same lock are blocked.

When you take a look at the Redshift documentation, it recommends using STV_LOCKS, which gives you something like this:

STV_LOCKS results

It seems really useful until you face a real database lock. Last month, I was trying to resolve a lock that was blocking lots of processes. In the end, I found a better way to locate the queries that were causing the locks:

Queries causing locks

Here you have the query itself:

```sql
SELECT current_time,
    c.relname,
    l.database,
    l.transaction,
    l.pid,
    a.usename,
    l.mode,
    l.granted
FROM pg_locks l
JOIN pg_catalog.pg_class c ON c.oid = l.relation
JOIN pg_catalog.pg_stat_activity a ON a.procpid = l.pid
WHERE l.pid <> pg_backend_pid();
```

source
