Tomas Vondra

blog about Postgres code and community

How to pick the first patch?

Picking the topic for your first patch in any project is hard, and Postgres is no exception. Limited experience with important parts of the code makes it difficult to judge the feasibility and complexity of a feature idea. And if you’re not an experienced user, it may not be obvious whether a feature is beneficial at all. Let me share a couple of simple suggestions on how to find a good topic for your first patch.

I’m going to write about Postgres, because that’s the project I have the most experience with. But I think most of what I write applies to other projects too.

Firstly, pick something simple. But even then you should assume getting the first patch in will probably take longer than you expect. You will have to learn a lot of new stuff.

This includes technical tasks like building and testing changes, maybe a bit of benchmarking, etc. But it also involves a lot of non-technical stuff: learning how to communicate with the other developers, convincing them your approach is sensible, etc.

Secondly, pick something that matters to you. Perhaps it’s related to a research topic you’re personally interested in? Or maybe the change would greatly help your current employer? Or something else.

I always hated homework. Not because of my general laziness (at least not exclusively), but because most of the time the whole point of the homework was just to do the homework. Don’t do a patch just for the sake of doing a patch.

It’s a bit sad to see someone spend a lot of time on a patch and then just abandon it after a while. I’m not saying it’s a 100% waste (I’ve learned a lot from working on patches that didn’t make it), but abandoning a patch still does not feel great.

Also, when a patch matters to you, you’re in a good position to make judgments about the approach and trade-offs. There are usually multiple ways to implement something, each with different pros and cons, and you will be expected to explain why your choice is the right one.

Where to find patch ideas?

A couple of months ago I presented a lightning talk “Patch ideas and where to find them?” with a few suggestions on where to look for inspiration. I’ll go over some of those ideas in this post, but the main point I was trying to make is this:

Patch ideas are everywhere, you just need to look the right way.

It’s a bit like the “attacker mindset”, where limitations are seen as an opportunity to do something unexpected. Almost every “not supported” or “not implemented” behavior can be considered an opportunity: maybe it could be supported and implemented? Not always, of course; sometimes the gap is entirely intentional.

So, where can you look?

Commitfest Application

The first place I’d check is the commitfest application, especially the list of patches in the current cycle. If you go through the patches one by one, chances are you’ll find a couple where you think either “This would be very useful for my application!”, and then you have a patch to review (reviews are an excellent way to start contributing).

Or perhaps you’ll think “Nice, but I wish the patch also did X,” and that’s a new patch.

The documentation

Perhaps a bit unexpectedly, documentation is a great source of patch ideas. The reason is very simple: we’re very careful to document the limitations of the current implementation. And as mentioned earlier, every such limitation is a possible future patch.

Consider for example this note about batching in the postgres_fdw documentation:

(Screenshot: postgres_fdw batching documentation)

The first sentence says we only do batching for INSERT queries. But is there any fundamental reason why we couldn’t do the same thing for other DML commands? I don’t think so, and UPDATE/DELETE batching seems like a great idea for a patch (or maybe two?).

The second sentence is about an internal limitation. The batching is implemented as a prepared statement, and the number of parameters of a single statement is a uint16 value, so a batch can bind at most 65535 parameters in total. But there are other ways to implement batching (hint: COPY), so switching to such an implementation might be another patch?
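To make the arithmetic behind that limit concrete, here is a minimal C sketch. It assumes every batched row binds one parameter per column, so the number of rows per batch has to be capped at 65535 / ncolumns. The helper `effective_batch_size` is hypothetical and only illustrates the arithmetic; it is not the actual postgres_fdw code.

```c
#include <stdint.h>

/*
 * The parameter count of a single statement is a 16-bit value,
 * so one statement can bind at most UINT16_MAX (65535) parameters.
 */
#define MAX_QUERY_PARAMS UINT16_MAX

/*
 * Hypothetical helper (not postgres_fdw code): clamp the requested
 * batch size so that ncolumns * rows stays within the parameter limit,
 * assuming each row in the batch binds one parameter per column.
 */
static int
effective_batch_size(int requested, int ncolumns)
{
    int max_rows;

    if (requested < 1)
        requested = 1;          /* always send at least one row */
    if (ncolumns < 1)
        return requested;       /* no parameters per row, nothing to clamp */

    max_rows = MAX_QUERY_PARAMS / ncolumns;     /* integer division rounds down */
    if (max_rows < 1)
        max_rows = 1;           /* defensive: never go below one row */

    return (requested < max_rows) ? requested : max_rows;
}
```

Under that assumption, a requested batch size of 1000 on a 100-column table would be capped at 655 rows (65535 / 100, rounded down), while a 10-column table could use the full 1000. A COPY-based implementation streams the rows as data rather than as bound parameters, so this particular cap would not apply to it.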

I’m sure you’ll find many similar cases in the docs, and also in various comments in the source code. It’s pretty common to not implement some useful bit to keep the scope limited, but mention the limitation as a possible future improvement.

Mailing lists

Do we discuss missing features elsewhere? Yes, mailing lists! Consider for example this response on the performance list:

(Screenshot: a response on the performance mailing list)

The “no support currently” can be translated as “possible patch.” Sure, that does not mean it’s a trivial patch. It may seem simple, but there can be many non-obvious complications. For example, how stable does the hash assigned to a plan need to be?

Test coverage

Where else can you look? If you find no ideas for a new feature in the commitfest app (CFA) or on the mailing lists, consider improving the code we already have. A great way to do that (and learn something in the process) is to add tests for undertested code.

How do you find code that would benefit from an extra test (or two)? We have coverage reports, which show what fraction of each “module” (directory/file) gets exercised by the regression tests. That’s an excellent starting point.

For example this:

(Screenshot: test coverage report)

says that we don’t have very good coverage for the fuzzystrmatch and pgstattuple contrib modules. For example, adding a couple of test queries for the dmetaphone function seems like it would improve this a lot.
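The percentages in such a report boil down to simple arithmetic: lines executed by the tests divided by lines instrumented, per module. Here is a minimal C sketch that ranks modules by that fraction, so the least-covered ones (the most promising candidates for new tests) come first. The `ModuleCoverage` struct and the functions are hypothetical; real reports, such as those produced with gcov/lcov, also track function and branch coverage.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-module line counts, as a coverage report might list them. */
typedef struct ModuleCoverage
{
    const char *name;           /* directory or file */
    int         lines_hit;      /* lines executed by the regression tests */
    int         lines_total;    /* lines instrumented for coverage */
} ModuleCoverage;

/* Percentage of lines exercised; a module with no lines counts as 100%. */
static double
coverage_pct(const ModuleCoverage *m)
{
    if (m->lines_total == 0)
        return 100.0;
    return 100.0 * m->lines_hit / m->lines_total;
}

/* qsort comparator: ascending by coverage percentage. */
static int
cmp_by_coverage(const void *a, const void *b)
{
    double ca = coverage_pct((const ModuleCoverage *) a);
    double cb = coverage_pct((const ModuleCoverage *) b);

    return (ca > cb) - (ca < cb);
}

/* Sort modules so the least-covered ones come first. */
static void
sort_by_coverage(ModuleCoverage *mods, size_t n)
{
    qsort(mods, n, sizeof(ModuleCoverage), cmp_by_coverage);
}
```

A low percentage only tells you where to look; whether a module actually needs more tests still takes a judgment call.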

This is unlikely to lead to “new features”, of course. You’re more likely to run into a bug in the “undertested” part of the code. But that’s as good a contribution as a new feature.

Note: There’s a difference between adding meaningful tests that improve test coverage, and blindly improving the metric.
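To illustrate the difference, here is a minimal C sketch with a hypothetical `count_char` function and two tests. Both execute every line of the function, so a line-coverage report would show it as fully covered either way, but only the second test checks the results (including edge cases) and would actually catch a bug.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical function under test: count occurrences of c in s. */
static size_t
count_char(const char *s, char c)
{
    size_t n = 0;

    for (; *s != '\0'; s++)
        if (*s == c)
            n++;
    return n;
}

/*
 * Executes every line of count_char, so line coverage is 100%,
 * but it would still pass if count_char returned a wrong number.
 */
static void
test_coverage_only(void)
{
    (void) count_char("banana", 'a');
}

/*
 * Same coverage, but it checks the results, including edge cases
 * (character not present, empty string), so a bug makes it fail.
 */
static void
test_meaningful(void)
{
    assert(count_char("banana", 'a') == 3);
    assert(count_char("banana", 'z') == 0);
    assert(count_char("", 'a') == 0);
}
```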

TODO list on wiki

Finally, we also have a TODO list with perhaps 200 patch ideas. For a while I discouraged new hackers from using this list, because my impression was that it wasn’t really maintained, and that hackers added items they didn’t know how to implement themselves (which makes them terrible for new contributors).

But it seems I was wrong, at least partially. The list seems to be in quite good shape, thanks to Bruce Momjian (and others) maintaining it. It does not say whether the ideas are suitable for new contributors, but the entries often point to a mailing list discussion.

My patch ideas

Granted, all of these suggestions have a major flaw. My first suggestion was to “pick something simple”, but if you’re a new contributor, how would you know what’s simple to implement? A patch may seem super simple, but then you start hacking on it and after a while you realize it’s very complicated. Or not even feasible at all.

For a while I’ve been collecting patch ideas that I consider suitable for new contributors, roughly using these criteria:

  • Does something I believe to be actually useful / beneficial.
  • I’m pretty sure it’s actually doable / feasible.
  • Requires changes only to an isolated part of the code.
  • The required changes are not exceptionally complex.
  • Does not touch the “mission critical” parts (like WAL).

At the moment I have maybe 20 such patch ideas, and I’ve been pitching some of them to people at conferences etc. I intend to do something similar here too. I’ll post patch ideas with a basic description of what I think the benefits would be, how it might be implemented, how complex I expect it to be, etc.

So maybe keep watching planet.postgresql.org, and if you see a patch idea you like, give it a try. No promises it’ll get committed (maybe it won’t work out in the end), but I’m willing to help you get started. Everything I wrote earlier still applies (e.g. you should find the idea interesting / useful).

Conclusions

  • Pick something simple, and keep it simple. If the patch grows, try to break it into multiple smaller/simpler patches.

  • Pick something you’re personally interested in. That helps you stay focused, and it also helps when choosing the approach.

  • Every time you run into a limitation, missing feature or performance issue, don’t just work around it. Take a note somewhere, and think if it could be improved.

  • When you run into a limitation/restriction, or a feature that is “not supported currently”, that might be an opportunity for a patch.

  • I plan to share patch ideas that I believe are suitable for new contributors (feasible, not too complex, …). If you see an idea that you find interesting, let me know.

Do you have feedback on this post? Please reach out by e-mail to tomas@vondra.me.