Tomas Vondra

blog about Postgres code and community

Performance archaeology: OLAP

A couple of days ago I wrote about performance improvements on OLTP workloads since Postgres 8.0, released 20 years ago. And I promised to share a similar analysis of analytical workloads in a follow-up post. So here we go ;-) Let me show you some numbers from a TPC-H benchmark, with some basic commentary and thoughts about the future.

Performance archaeology: OLTP

The Postgres open source project is nearly 30 years old, and I personally started using it about 20 years ago. I've been contributing code for at least 10 years. But even with all that experience, I find it really difficult to judge how the performance changed over the years. Did it improve? And by how much? I decided to do some benchmarks to answer these questions.

Tuning the glibc memory allocator (for Postgres)

If you've done any Postgres development in C, you're probably aware of the concept of memory contexts. The primary purpose of memory contexts is to absolve developers of having to track every single piece of memory they allocate. But it's about performance too, because memory contexts cache memory to save on malloc/free calls. The memory contexts themselves still get their memory from malloc, though, which is implemented by the libc allocator, and each libc has its own implementation. The glibc allocator has some concurrency bottlenecks (which I learned the hard way), but it's possible to tune it.

[PATCH IDEA] parallel pgbench -i

There are multiple tools to run benchmarks on Postgres, but pgbench is probably the most widely used one. The workload is very simple and perhaps a bit synthetic, but almost everyone is familiar with it, and it's a very convenient way to do quick tests and assessments. It has been improved in various ways (e.g. to support partitioning), but the initial data load is still serial - only a single process does the COPY. Which annoys me - it may take a lot of time before I can start with the benchmarks it...

Playing with BOLT and Postgres

A couple of days ago I had a bit of free time in the evening, and I was bored, so I decided to play with BOLT a little bit. No, not the dog from a Disney movie, the BOLT tool from the LLVM project, aimed at optimizing binaries. It took me a while to get it working, but the results are unexpectedly good, in some cases up to 40%. So let me share my notes and benchmark results, and maybe there's something we can learn from it. We'll go down a couple of rabbit holes first, though.