<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Tomas Vondra</title><link>https://vondra.me/</link><description>Recent content on Tomas Vondra</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>© Tomas Vondra</copyright><lastBuildDate>Thu, 03 Dec 2099 15:00:00 +0200</lastBuildDate><atom:link href="https://vondra.me/index.xml" rel="self" type="application/rss+xml"/><item xml:base="https://vondra.me/posts/the-real-cost-of-random-io/"><title>The real cost of random I/O</title><link>https://vondra.me/posts/the-real-cost-of-random-io/</link><pubDate>Thu, 26 Feb 2026 15:00:00 +0200</pubDate><guid>https://vondra.me/posts/the-real-cost-of-random-io/</guid><description>&lt;p>The &lt;code>random_page_cost&lt;/code> was &lt;a href="https://www.postgresql.org/message-id/flat/14601.949786166@sss.pgh.pa.us">introduced&lt;/a>
~25 years ago, and it has been set to 4.0 by default since the very beginning.
Storage has changed a lot since then, and so has the Postgres code.
It&amp;rsquo;s likely the default no longer matches reality. But what
value should you use instead? Flash storage is much better at handling
random I/O, so maybe you should reduce the default? Some places go as
far as recommending setting it to 1.0, the same as &lt;code>seq_page_cost&lt;/code>. Is this
intuition right?&lt;/p></description></item><item xml:base="https://vondra.me/posts/the-ai-inversion/"><title>The AI inversion</title><link>https://vondra.me/posts/the-ai-inversion/</link><pubDate>Fri, 20 Feb 2026 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/the-ai-inversion/</guid><description>&lt;p>If you attended &lt;a href="https://fosdem.org/2026/">FOSDEM 2026&lt;/a>, you probably
noticed discussions on how AI impacts FOSS, mostly in detrimental ways.
Two of the three keynotes in Janson mentioned this, and I assume other
speakers touched on the topic too. Moreover, it was a very popular topic
in the &amp;ldquo;hallway track.&amp;rdquo; I myself chatted about it with multiple people,
both from the Postgres community and outside of it. And the experience
does not seem great &amp;hellip;&lt;/p></description></item><item xml:base="https://vondra.me/posts/stabilizing-benchmarks/"><title>Stabilizing Benchmarks</title><link>https://vondra.me/posts/stabilizing-benchmarks/</link><pubDate>Tue, 06 Jan 2026 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/stabilizing-benchmarks/</guid><description>&lt;p>I do a fair amount of benchmarks as part of development, both on my own
patches and while reviewing patches by others. That often requires
dealing with noise, particularly for small optimizations. Here&amp;rsquo;s an
overview of the methods I use to filter out random variations / noise.&lt;/p>
&lt;p>Most of the time it&amp;rsquo;s easy - the benefits are large and obvious. Great!
But sometimes we need to care about cases where the changes are small
(think less than 5%).&lt;/p></description></item><item xml:base="https://vondra.me/posts/dont-give-postgres-too-much-memory-even-on-busy-systems/"><title>Don't give Postgres too much memory (even on busy systems)</title><link>https://vondra.me/posts/dont-give-postgres-too-much-memory-even-on-busy-systems/</link><pubDate>Tue, 23 Dec 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/dont-give-postgres-too-much-memory-even-on-busy-systems/</guid><description>&lt;p>A couple weeks ago I posted about how &lt;a href="https://vondra.me/posts/dont-give-postgres-too-much-memory/">setting &lt;code>maintenance_work_mem&lt;/code>
too high may make things slower&lt;/a>.
That can be surprising, as the intuition is that more memory makes things
faster. I got an e-mail about that post, asking if the conclusion would
change on a busy system. That&amp;rsquo;s a really good question, so let&amp;rsquo;s look
at it.&lt;/p>
&lt;p>To paraphrase the message I got, it went something like this:&lt;/p>
&lt;blockquote>
&lt;p>Lower &lt;code>maintenance_work_mem&lt;/code> values may split the task into chunks
that fit into the CPU cache. Which may end up being faster than with
larger chunks.&lt;/p></description></item><item xml:base="https://vondra.me/posts/qubes-os-is-pretty-great/"><title>Qubes OS is pretty great</title><link>https://vondra.me/posts/qubes-os-is-pretty-great/</link><pubDate>Tue, 25 Nov 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/qubes-os-is-pretty-great/</guid><description>&lt;p>I&amp;rsquo;ve been using &lt;a href="https://www.qubes-os.org/">Qubes OS&lt;/a> as my primary OS
since version 2, released in 2014. That means I&amp;rsquo;ve been using Qubes OS
for about a decade. Despite having to deal with a couple issues over the
years, I think it&amp;rsquo;s a great Linux system. I can&amp;rsquo;t quite imagine
switching to a more traditional one. Yet I know very few other Qubes OS
users in my developer community, which surprises me. Let me explain why
I like Qubes OS, and maybe you&amp;rsquo;ll give it a try.&lt;/p></description></item><item xml:base="https://vondra.me/posts/wireguard-setup-to-access-a-home-network/"><title>Wireguard to access a home network</title><link>https://vondra.me/posts/wireguard-setup-to-access-a-home-network/</link><pubDate>Mon, 17 Nov 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/wireguard-setup-to-access-a-home-network/</guid><description>&lt;p>I work remotely from home, and over the years I&amp;rsquo;ve amassed a bunch of
machines related to that (development, testing, benchmarking, &amp;hellip;),
and other devices you may usually find at home (printer, NAS, &amp;hellip;).
Occasionally I need remote access, and for a while SSH tunnels were
good enough. I decided to simplify and clean this up by using a proper
VPN: WireGuard. This blog post explains the setup I used.&lt;/p></description></item><item xml:base="https://vondra.me/posts/dont-give-postgres-too-much-memory/"><title>Don't give Postgres too much memory</title><link>https://vondra.me/posts/dont-give-postgres-too-much-memory/</link><pubDate>Fri, 31 Oct 2025 15:00:00 +0200</pubDate><guid>https://vondra.me/posts/dont-give-postgres-too-much-memory/</guid><description>&lt;p>From time to time I get to investigate issues with some sort of a batch
process. It&amp;rsquo;s getting more and more common that such processes use very
high memory limits (&lt;code>maintenance_work_mem&lt;/code> and &lt;code>work_mem&lt;/code>). I suppose
some DBAs follow the logic that &amp;ldquo;more is better&amp;rdquo;, not realizing it can
hurt performance quite a bit.&lt;/p>
&lt;p>Let me demonstrate this using an example I ran across while testing a
fix for parallel builds of GIN indexes. The bug is not particularly
interesting or complex, but it required a fairly high value for
&lt;code>maintenance_work_mem&lt;/code> (the initial report used &lt;code>20GB&lt;/code>).&lt;/p></description></item><item xml:base="https://vondra.me/posts/tuning-aio-in-postgresql-18/"><title>Tuning AIO in PostgreSQL 18</title><link>https://vondra.me/posts/tuning-aio-in-postgresql-18/</link><pubDate>Wed, 24 Sep 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/tuning-aio-in-postgresql-18/</guid><description>&lt;p>PostgreSQL 18 was &lt;a href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=3d6a828938a5fa0444275d3d2f67b64ec3199eb7">stamped&lt;/a>
earlier this week, and as usual there&amp;rsquo;s a &lt;a href="https://www.postgresql.org/docs/release/18.0/">lot of improvements&lt;/a>.
One of the big architectural changes is asynchronous I/O (AIO), which
allows scheduling I/O asynchronously, giving the database more control
and better utilization of the storage.&lt;/p>
&lt;p>I&amp;rsquo;m not going to explain how AIO works, or present detailed benchmark
results. There have been multiple &lt;a href="https://pganalyze.com/blog/postgres-18-async-io">really&lt;/a>
&lt;a href="https://www.dbi-services.com/blog/postgresql-18-support-for-asynchronous-i-o/">good&lt;/a>
&lt;a href="https://www.pgedge.com/blog/highlights-of-postgresql-18">blog&lt;/a>
&lt;a href="https://neon.com/postgresql/postgresql-18/asynchronous-io">posts&lt;/a> about
&lt;a href="https://www.cybertec-postgresql.com/en/postgresql-18-better-i-o-performance-with-aio/">that&lt;/a>.
There&amp;rsquo;s also a great &lt;a href="https://www.youtube.com/watch?v=GR5v9DHiS8w">talk from pgconf.dev 2025&lt;/a>
about AIO, and a recent &lt;a href="https://talkingpostgres.com/episodes/what-went-wrong-what-went-right-with-aio-with-andres-freund">&amp;ldquo;Talking Postgres&amp;rdquo; podcast episode&lt;/a>
with Andres, discussing various aspects of the whole project. I highly
recommend reading / watching those.&lt;/p>
&lt;p>I want to share a couple suggestions on how to tune AIO in Postgres
18, and explain some inherent (but not immediately obvious) trade-offs
and limitations.&lt;/p></description></item><item xml:base="https://vondra.me/posts/using-jwt-to-establish-trusted-context-for-rls/"><title>Using JWT to establish a trusted context for RLS</title><link>https://vondra.me/posts/using-jwt-to-establish-trusted-context-for-rls/</link><pubDate>Wed, 27 Aug 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/using-jwt-to-establish-trusted-context-for-rls/</guid><description>&lt;p>&lt;a href="https://www.postgresql.org/docs/current/ddl-rowsecurity.html">Row-level security (RLS)&lt;/a>
is a great feature. It allows restricting access to rows by applying
filters defined by a policy. It&amp;rsquo;s useful for cases where the data
set can&amp;rsquo;t be split into separate databases.&lt;/p>
&lt;p>Sadly, using RLS may be quite cumbersome. RLS requires some sort of
&amp;ldquo;trusted context&amp;rdquo; for the RLS policies. The policies need to filter
using data the user can&amp;rsquo;t change. If the filter uses some sort of
&amp;ldquo;tenant ID&amp;rdquo;, and the user can change it to an arbitrary value, that
would break the RLS concept.&lt;/p>
&lt;p>This is why solutions like using GUCs are flawed - the access
control for GUCs is very limited. The traditional solution is to use
roles, which derive the trust from authentication.&lt;/p>
&lt;p>It occurred to me it should be possible to build a trusted context based
on cryptography, independently of authentication. I&amp;rsquo;ll explain the basic
idea, and discuss a couple interesting variations. I&amp;rsquo;ve also published
an experimental extension &lt;a href="https://github.com/tvondra/jwt_context">jwt_context&lt;/a>,
implementing this using &lt;a href="https://www.jwt.io/">JWT&lt;/a>.&lt;/p>
&lt;p>I&amp;rsquo;m interested in all kinds of feedback. Is it a good idea to use JWT
this way, as a basis for RLS context? Did I miss some fundamental issue?
Are there interesting improvements?&lt;/p></description></item><item xml:base="https://vondra.me/posts/fun-and-weirdness-with-ssds/"><title>Fun and weirdness with SSDs</title><link>https://vondra.me/posts/fun-and-weirdness-with-ssds/</link><pubDate>Wed, 20 Aug 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/fun-and-weirdness-with-ssds/</guid><description>&lt;p>When I started working with Postgres (or databases in general) 25 years
ago, storage systems looked very different. All storage was &amp;ldquo;spinning
rust&amp;rdquo; - rotational disks with various interfaces (SATA/SAS/&amp;hellip;) and
speeds (7.2K/10K/15K/&amp;hellip;). The spindle speed was the main
performance-determining feature, and everyone knew what IOPS and bandwidth to expect
from a disk. The general behavior was pretty much the same.&lt;/p>
&lt;p>With SSDs it&amp;rsquo;s more complicated. The interface may be the same, but the
hardware &amp;ldquo;inside&amp;rdquo; the device can be very different. There&amp;rsquo;s nothing like
the &amp;ldquo;spindle speed&amp;rdquo;, a single feature determining fundamental behavior.
The flash memory is subject to various limits, but manufacturers may
(and do) make different tradeoffs (much more cache, more spare space,
etc.). And the hardware changes a lot over time too.&lt;/p>
&lt;p>While working on the &lt;a href="https://www.postgresql.org/message-id/3dafd3be-8c20-4130-b956-eff178d9fe0a%40vondra.me">index prefetching patch&lt;/a>,
I ran into a couple weird differences between &amp;ldquo;very similar&amp;rdquo; queries.
And we speculated it might be due to how SSDs handle the different I/O
patterns. I did testing on my SSD devices, and there definitely are
some very surprising differences in behavior, contradicting (reasonable)
expectations. Let’s look at the results, and how they can result in
strange query timings.&lt;/p></description></item><item xml:base="https://vondra.me/posts/so-why-dont-we-pick-the-optimal-query-plan/"><title>So why don't we pick the optimal query plan?</title><link>https://vondra.me/posts/so-why-dont-we-pick-the-optimal-query-plan/</link><pubDate>Tue, 08 Jul 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/so-why-dont-we-pick-the-optimal-query-plan/</guid><description>&lt;p>Last week I posted about &lt;a href="https://vondra.me/posts/how-often-is-the-query-plan-optimal/">how we often don&amp;rsquo;t pick the optimal plan&lt;/a>.
I got asked about difficulties when trying to reproduce my results, so
I&amp;rsquo;ll address that first (I forgot to mention a couple details). I also
got questions about how to best spot this issue, and ways to mitigate
this. I&amp;rsquo;ll discuss that too, although I don&amp;rsquo;t have any great solutions.
I&amp;rsquo;ll briefly discuss a couple possible planner/executor improvements
that might allow handling this better.&lt;/p></description></item><item xml:base="https://vondra.me/posts/how-often-is-the-query-plan-optimal/"><title>How often is the query plan optimal?</title><link>https://vondra.me/posts/how-often-is-the-query-plan-optimal/</link><pubDate>Mon, 30 Jun 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/how-often-is-the-query-plan-optimal/</guid><description>&lt;p>The basic promise of a query optimizer is that it picks the &amp;ldquo;optimal&amp;rdquo;
query plan. But there&amp;rsquo;s a catch - the plan selection relies on cost
estimates, calculated from selectivity estimates and cost of basic
resources (I/O, CPU, &amp;hellip;). So the question is, how often do we
actually pick the &amp;ldquo;fastest&amp;rdquo; plan? And the truth is we
make mistakes quite often.&lt;/p>
&lt;p>Consider the following chart, with durations of a simple &lt;code>SELECT&lt;/code> query
with a range condition. The condition is varied to match different
fractions of the table, shown on the x-axis (fraction of pages with
matching rows). The plan is forced to use different scan methods via the
&lt;code>enable_&lt;/code> options, and the dark points mark runs where the scan method
&amp;ldquo;won&amp;rdquo; even without using the &lt;code>enable_&lt;/code> parameters.&lt;/p>
&lt;p>
&lt;a href="nvme-uniform.png">&lt;img src="https://vondra.me/posts/how-often-is-the-query-plan-optimal/nvme-uniform.png" title="scan method durations for uniform data set" alt="scan method durations for uniform data set"/>&lt;/a>
&lt;/p>
&lt;p>It shows that for selectivities ~1-5% (the x-axis is logarithmic), the
planner picks an index scan, but this happens to be a poor choice. It
takes up to ~10 seconds, and a simple &amp;ldquo;dumb&amp;rdquo; sequential scan would
complete the query in ~2 seconds.&lt;/p></description></item><item xml:base="https://vondra.me/posts/benchmarking-is-hard-sometimes/"><title>Benchmarking is hard, sometimes ...</title><link>https://vondra.me/posts/benchmarking-is-hard-sometimes/</link><pubDate>Thu, 05 Jun 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/benchmarking-is-hard-sometimes/</guid><description>&lt;p>I do a fair number of benchmarks, not only to validate patches, but also
to find interesting (suspicious) stuff to improve. It&amp;rsquo;s an important
part of my development workflow. And it&amp;rsquo;s fun ;-) But we&amp;rsquo;re dealing with
complex systems (hardware, OS, DB, application), and that brings
challenges. Every now and then I run into something that I don&amp;rsquo;t
quite understand.&lt;/p>
&lt;p>Consider a &lt;a href="https://www.postgresql.org/docs/current/pgbench.html#PGBENCH-OPTION-SELECT-ONLY">read-only pgbench&lt;/a>,
the simplest workload there is, with a single &lt;code>SELECT&lt;/code> doing lookup by
PK. If you do this with a small data set on any machine, the expectation
is near-linear scaling up to the number of cores. It&amp;rsquo;s not perfect, as CPUs
have frequency scaling and power management, but it should be close.&lt;/p>
&lt;p>Some time ago I tried running this on a big machine with 176 cores (352
threads), using scale 50 (about 750MB, so tiny - it actually fits into
L3 on the &lt;a href="https://www.amd.com/en/products/processors/server/epyc/4th-generation-9004-and-8004-series/amd-epyc-9684x.html">EPYC 9V33X CPU&lt;/a>).
And I got the following chart for throughput with different client
counts:&lt;/p>
&lt;p>
&lt;a href="pgbench-tps.png">&lt;img src="https://vondra.me/posts/benchmarking-is-hard-sometimes/pgbench-tps.png" title="results for read-only pgbench on a system with 176 cores" alt="results for read-only pgbench on a system with 176 cores"/>&lt;/a>
&lt;/p>
&lt;p>This is pretty awful. I still don&amp;rsquo;t think I entirely understand why this
happens, or how to improve the behavior. But let me explain what I know
so far, what I think may be happening, and perhaps someone will correct
me or have an idea how to fix it.&lt;/p></description></item><item xml:base="https://vondra.me/posts/advanced-patch-feedback-session-apfs-at-pgconf-dev-2025/"><title>Advanced Patch Feedback Session (APFS) at pgconf.dev 2025</title><link>https://vondra.me/posts/advanced-patch-feedback-session-apfs-at-pgconf-dev-2025/</link><pubDate>Thu, 29 May 2025 14:00:00 +0200</pubDate><guid>https://vondra.me/posts/advanced-patch-feedback-session-apfs-at-pgconf-dev-2025/</guid><description>&lt;p>The &lt;a href="https://pgconf.dev">pgconf.dev&lt;/a> conference, a revamp of the original
&lt;a href="https://www.pgcon.org">PGCon&lt;/a>, happened about two weeks ago. It&amp;rsquo;s the
main event for Postgres developers, and one of the things we&amp;rsquo;re trying
is an Advanced Patch Feedback Session (APFS).&lt;/p>
&lt;p>We first tried that last year in Vancouver, and then again in Montreal.
But I realized many people attending the conference either are not aware
of the event at all, or are not sure what it&amp;rsquo;s about. So let me explain,
and share some reflections from this year.&lt;/p></description></item><item xml:base="https://vondra.me/posts/good-time-to-test-io-method-for-pg-18/"><title>Good time to test io_method (for Postgres 18)</title><link>https://vondra.me/posts/good-time-to-test-io-method-for-pg-18/</link><pubDate>Mon, 12 May 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/good-time-to-test-io-method-for-pg-18/</guid><description>&lt;p>We&amp;rsquo;re now in the &amp;ldquo;feature freeze&amp;rdquo; phase of Postgres 18 development.
That means no new features will get in - only bugfixes and cleanups of
already committed changes. The goal is to test and stabilize the code
before a release. &lt;a href="https://www.postgresql.org/about/news/postgresql-18-beta-1-released-3070/">PG 18 beta1&lt;/a>
was released a couple days ago, so it&amp;rsquo;s a perfect time to do some
testing and benchmarking.&lt;/p>
&lt;p>One of the fundamental changes in PG 18 is going to be support for
&lt;a href="https://github.com/postgres/postgres/blob/master/src/backend/storage/aio/README.md">asynchronous I/O&lt;/a>.
And with beta1 out, it&amp;rsquo;s the right time to run your tests and benchmarks
to test this new feature, both for correctness and for performance regressions.&lt;/p></description></item><item xml:base="https://vondra.me/posts/patch-adaptive-execution-for-in-queries/"><title>[PATCH IDEA] adaptive execution for `IN` queries</title><link>https://vondra.me/posts/patch-adaptive-execution-for-in-queries/</link><pubDate>Mon, 28 Apr 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/patch-adaptive-execution-for-in-queries/</guid><description>&lt;p>Last week I visited the &lt;a href="https://www.meetup.com/malmo-postgresql-user-group-m-pug/">Malmö PUG&lt;/a>
to talk about &lt;a href="https://vondra.me/pdf/performance-cliffs-pgconfbe-2024.pdf">performance cliffs&lt;/a>.
It&amp;rsquo;s a really nice meetup - cozy environment, curious audience asking
insightful questions. I highly recommend attending or even giving
a talk there.&lt;/p>
&lt;p>After the meetup I realized it&amp;rsquo;s been a while since I posted about some
&lt;a href="https://vondra.me/tags/patch-idea">patch idea&lt;/a>, and the performance cliff talk has a
couple good candidates. Some might be a bit too hard for the first
patch, for example improving the JIT costing. But improving the very
first example about queries with &lt;code>IN&lt;/code> clauses seems feasible. It&amp;rsquo;s quite
well defined and isolated.&lt;/p></description></item><item xml:base="https://vondra.me/posts/15-years-of-prague-pg-dev-day/"><title>15 years of Prague PostgreSQL Developer Day</title><link>https://vondra.me/posts/15-years-of-prague-pg-dev-day/</link><pubDate>Wed, 26 Mar 2025 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/15-years-of-prague-pg-dev-day/</guid><description>&lt;p>It&amp;rsquo;s been a couple weeks since P2D2 (Prague PostgreSQL Developer Day)
2025. We&amp;rsquo;ve been busy with the various tiny tasks that need to happen
after the conference - processing feedback, paying invoices, and so on.
But it&amp;rsquo;s also a good opportunity to look back - I realized this was the
15th year of the event I&amp;rsquo;ve helped to organize, so let me share some
of that experience.&lt;/p>
&lt;p>
&lt;a href="talks-2024.jpg">&lt;img src="https://vondra.me/posts/15-years-of-prague-pg-dev-day/talks-2024.jpg" title="Prague PostgreSQL Developer Day 2024" alt="Prague PostgreSQL Developer Day 2024"/>&lt;/a>
&lt;/p></description></item><item xml:base="https://vondra.me/posts/postgres-performance-archaeology-olap/"><title>Performance archaeology: OLAP</title><link>https://vondra.me/posts/postgres-performance-archaeology-olap/</link><pubDate>Tue, 03 Dec 2024 14:00:00 +0200</pubDate><guid>https://vondra.me/posts/postgres-performance-archaeology-olap/</guid><description>&lt;p>A couple days ago I wrote about &lt;a href="https://vondra.me/posts/postgres-performance-archaeology-oltp/">performance improvements on OLTP&lt;/a>
workloads since Postgres 8.0, released 20 years ago. And I promised to
share a similar analysis about analytical workloads in a follow-up post.
So here we go ;-) Let me show you some numbers from a TPC-H benchmark,
with some basic commentary and thoughts about the future.&lt;/p></description></item><item xml:base="https://vondra.me/posts/postgres-performance-archaeology-oltp/"><title>Performance archaeology: OLTP</title><link>https://vondra.me/posts/postgres-performance-archaeology-oltp/</link><pubDate>Tue, 26 Nov 2024 16:00:00 +0200</pubDate><guid>https://vondra.me/posts/postgres-performance-archaeology-oltp/</guid><description>&lt;p>The Postgres open source project is nearly 30 years old; I personally
started using it about 20 years ago. And I’ve been contributing code
for at least 10 years. But even with all that experience I find it
really difficult to make judgments about how the performance changed
over the years. Did it improve? And by how much? I decided to do some
benchmarks to answer this question.&lt;/p></description></item><item xml:base="https://vondra.me/posts/tuning-the-glibc-allocator-for-postgres/"><title>Tuning the glibc memory allocator (for Postgres)</title><link>https://vondra.me/posts/tuning-the-glibc-allocator-for-postgres/</link><pubDate>Mon, 14 Oct 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/tuning-the-glibc-allocator-for-postgres/</guid><description>&lt;p>If you&amp;rsquo;ve done any Postgres development in C, you&amp;rsquo;re probably aware of
the concept of memory contexts. The primary purpose of memory contexts
is to absolve the developers of having to track every single piece of
memory they allocated. But it&amp;rsquo;s about performance too, because memory
contexts cache the memory to save on &lt;code>malloc&lt;/code>/&lt;code>free&lt;/code> calls. But &lt;code>malloc&lt;/code>
gets the memory from another allocator in &lt;code>libc&lt;/code>, and each &lt;code>libc&lt;/code> has
its own thing. The &lt;code>glibc&lt;/code> allocator has some concurrency bottlenecks
(which I learned the hard way), but it&amp;rsquo;s possible to tune that.&lt;/p></description></item><item xml:base="https://vondra.me/posts/patch-idea-parallel-pgbench-i/"><title>[PATCH IDEA] parallel pgbench -i</title><link>https://vondra.me/posts/patch-idea-parallel-pgbench-i/</link><pubDate>Tue, 01 Oct 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/patch-idea-parallel-pgbench-i/</guid><description>&lt;p>There are multiple tools to run benchmarks on Postgres, but
&lt;a href="https://www.postgresql.org/docs/current/pgbench.html">pgbench&lt;/a> is
probably the most widely used one. The workload is very simple and
perhaps a bit synthetic, but almost everyone is familiar with it and
it&amp;rsquo;s a very convenient way to do quick tests and assessments. It was
improved in various ways (e.g. to support partitioning), but the initial
data load is still serial - only a single process does the &lt;code>COPY&lt;/code>.
Which annoys me - it may take a lot of time before I can start the
benchmark itself.&lt;/p>
&lt;p>This week&amp;rsquo;s &amp;ldquo;first patch&amp;rdquo; idea is to extend &lt;code>pgbench -i&lt;/code> to allow the
data load to happen in parallel, with multiple clients generating and
sending the data.&lt;/p></description></item><item xml:base="https://vondra.me/posts/playing-with-bolt-and-postgres/"><title>Playing with BOLT and Postgres</title><link>https://vondra.me/posts/playing-with-bolt-and-postgres/</link><pubDate>Wed, 25 Sep 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/playing-with-bolt-and-postgres/</guid><description>&lt;p>A couple days ago I had a bit of free time in the evening, and I was
bored, so I decided to play with BOLT a little bit. No, not the &lt;a href="https://en.wikipedia.org/wiki/Bolt_%28Disney_character%29">dog
from a Disney movie&lt;/a>,
the &lt;a href="https://github.com/llvm/llvm-project/blob/main/bolt/README.md">BOLT&lt;/a>
tool from LLVM project, aimed at optimizing binaries. It took me a
while to get it working, but the results are unexpectedly good - in
some cases gains of up to 40%. So let me share my notes and benchmark results,
and maybe there&amp;rsquo;s something we can learn from it. We&amp;rsquo;ll start by going
through a couple rabbit holes first, though.&lt;/p></description></item><item xml:base="https://vondra.me/posts/patch-idea-amcheck-support-for-brin-indexes/"><title>[PATCH IDEA] amcheck support for BRIN indexes</title><link>https://vondra.me/posts/patch-idea-amcheck-support-for-brin-indexes/</link><pubDate>Tue, 17 Sep 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/patch-idea-amcheck-support-for-brin-indexes/</guid><description>&lt;p>Time for yet another &amp;ldquo;first patch&amp;rdquo; idea post ;-) This time it&amp;rsquo;s about
BRIN indexes. Postgres has a contrib module called
&lt;a href="https://www.postgresql.org/docs/current/amcheck.html">amcheck&lt;/a>,
meant to check logical consistency of objects (tables and indexes). At
the moment the module supports heap relations (i.e. tables) and B-Tree
indexes (by far the most commonly used index type). There is a &lt;a href="https://commitfest.postgresql.org/49/3733/">patch
adding support for GiST and GIN indexes&lt;/a>,
and the idea is to also allow checking &lt;a href="https://www.postgresql.org/docs/current/brin.html">BRIN&lt;/a>
indexes.&lt;/p></description></item><item xml:base="https://vondra.me/posts/writing-a-good-talk-proposal/"><title>Writing a good talk proposal</title><link>https://vondra.me/posts/writing-a-good-talk-proposal/</link><pubDate>Tue, 10 Sep 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/writing-a-good-talk-proposal/</guid><description>&lt;p>I&amp;rsquo;ve submitted a lot of talk proposals to a lot of Postgres conferences
over the years. Some were accepted, many more were not. And I&amp;rsquo;ve been on
the other side of this process too, as a member of the CfP committee
responsible for selecting talks. So let me give you a couple suggestions
on how to write a good talk proposal.&lt;/p></description></item><item xml:base="https://vondra.me/posts/patch-idea-statistics-for-file-descriptor-cache/"><title>[PATCH IDEA] Statistics for the file descriptor cache</title><link>https://vondra.me/posts/patch-idea-statistics-for-file-descriptor-cache/</link><pubDate>Tue, 03 Sep 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/patch-idea-statistics-for-file-descriptor-cache/</guid><description>&lt;p>Let me present another &amp;ldquo;first patch&amp;rdquo; idea, related to a runtime stats
on access to files storing data. Having this kind of information
would be very valuable on instances with many files (which can happen
for many reasons).&lt;/p>
&lt;p>This is a very different area than the previous &lt;a href="https://vondra.me/posts/patch-idea-use-copy-for-postgres-fdw-insert-batching">patch idea&lt;/a>,
which was about an extension. The runtime stats are at the core of the
system, and so is the interaction with the file systems. But it&amp;rsquo;s still
fairly isolated, and thus suitable for new contributors.&lt;/p></description></item><item xml:base="https://vondra.me/posts/office-hours-experiment/"><title>Office hours experiment</title><link>https://vondra.me/posts/office-hours-experiment/</link><pubDate>Sat, 31 Aug 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/office-hours-experiment/</guid><description>&lt;p>I&amp;rsquo;ve decided to experiment a little bit and do regular &amp;ldquo;office hours.&amp;rdquo;
I&amp;rsquo;ll be available to chat about almost anything related to Postgres.
It might be a technical discussion about a patch you&amp;rsquo;re working on, or
a topic about the community etc.&lt;/p>
&lt;p>This is not an entirely new thing. I&amp;rsquo;ve been telling people to just
ping me if they want to discuss something off-list, or have a call and
chat about it. I did have a couple such calls, and it was nice - faster
than discussing that by email, maybe a bit closer to the ad hoc
watercooler talk or a hallway track. So I&amp;rsquo;m mostly just announcing this
more widely, with a couple simple &lt;a href="https://vondra.me/about/#office-hours">rules&lt;/a>.&lt;/p></description></item><item xml:base="https://vondra.me/posts/patch-idea-use-copy-for-postgres-fdw-insert-batching/"><title>[PATCH IDEA] Using COPY for postgres_fdw INSERT batching</title><link>https://vondra.me/posts/patch-idea-use-copy-for-postgres-fdw-insert-batching/</link><pubDate>Tue, 27 Aug 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/patch-idea-use-copy-for-postgres-fdw-insert-batching/</guid><description>&lt;p>In an earlier post I &lt;a href="https://vondra.me/posts/how-to-pick-the-first-patch">mentioned&lt;/a> I
plan to share a couple patch ideas, suitable for new contributors. This
is the first one, about using &lt;a href="https://www.postgresql.org/docs/current/libpq-copy.html">&lt;code>COPY&lt;/code> protocol&lt;/a>
for &lt;a href="https://www.postgresql.org/docs/current/postgres-fdw.html">postgres_fdw&lt;/a>
batching. This would replace the current implementation, based on
prepared statements. Let me share a couple thoughts on the motivation
and how it might be implemented.&lt;/p></description></item><item xml:base="https://vondra.me/posts/import-mailing-list-archives/"><title>Importing Postgres mailing list archives</title><link>https://vondra.me/posts/import-mailing-list-archives/</link><pubDate>Fri, 23 Aug 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/import-mailing-list-archives/</guid><description>&lt;p>A couple weeks ago I needed to move my mailing list communication to
a different mailbox. That sounds straightforward - go to the &lt;a href="https://lists.postgresql.org/manage/">community
account&lt;/a> and resubscribe to all
the lists with the new address, and then import a bit of history from
the archives so that the client can show threads, search etc.&lt;/p>
&lt;p>The first part worked like a charm, but importing the archives turned
out to be a bit tricky, and I ran into a bunch of non-obvious issues.
So here&amp;rsquo;s how I made that work in the end.&lt;/p></description></item><item xml:base="https://vondra.me/posts/how-to-pick-the-first-patch/"><title>How to pick the first patch?</title><link>https://vondra.me/posts/how-to-pick-the-first-patch/</link><pubDate>Tue, 20 Aug 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/how-to-pick-the-first-patch/</guid><description>&lt;p>Picking the topic for your first patch in any project is hard, and
Postgres is no exception. Limited developer experience with important
parts of the code make it difficult to judge feasibility/complexity of
a feature idea. And if you&amp;rsquo;re not an experienced user, it may not be
very obvious if a feature is beneficial. Let me share a couple simple
suggestions on how to find a good topic for the first patch.&lt;/p></description></item><item xml:base="https://vondra.me/posts/will-postgres-rely-on-mailing-lists-forever/"><title>Will Postgres development rely on mailing lists forever?</title><link>https://vondra.me/posts/will-postgres-rely-on-mailing-lists-forever/</link><pubDate>Tue, 13 Aug 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/will-postgres-rely-on-mailing-lists-forever/</guid><description>&lt;p>Postgres is pretty old. The &lt;a href="https://en.wikipedia.org/wiki/PostgreSQL">open source project&lt;/a>
started in 1996, so close to 30 years ago. And since then, Postgres has
become one of the most successful and popular databases. But it also
means a lot of the development process reflects how things were done
back then. The reliance on &lt;a href="https://www.postgresql.org/list/">mailing lists&lt;/a>
is a good example of this heritage. Let&amp;rsquo;s talk about whether / how this might
change.&lt;/p></description></item><item xml:base="https://vondra.me/posts/the-state-of-the-postgres-community/"><title>The state of the Postgres community</title><link>https://vondra.me/posts/the-state-of-the-postgres-community/</link><pubDate>Sat, 03 Aug 2024 12:00:00 +0200</pubDate><guid>https://vondra.me/posts/the-state-of-the-postgres-community/</guid><description>&lt;p>About a month ago I presented a keynote at &lt;a href="https://www.pgday.ch/2024/">Swiss PGDay 2024&lt;/a>
about the state of the Postgres community. My talk included a couple
charts illustrating the evolution and current state of various parts of
the community - what works fine and what challenges will require more
attention.&lt;/p>
&lt;p>Judging by the feedback, those charts are interesting and reveal things
that are surprising or at least not entirely expected. So let me share
them, with a bit of additional commentary.&lt;/p></description></item></channel></rss>