OSM replication osc poly filter
Regional osm2pgsql updated minutely as an alternative to Overpass.
Short version:
I recently wrote a script that helps with replicating OSM data into an osm2pgsql database by filtering .osc files against a specific .poly file. It isn't limited to a particular continent or country: it works with any custom PBF (e.g. a single city). That means it doesn't need the .fr replication servers; it pulls diffs directly from planet.osm.org (the source can also be changed).
It’s inspired by the trim_osc.py script by Zverik, but rewritten from scratch with tests, because the script unfortunately didn’t work for me (maybe I did something wrong).
It’s not a very typical thing, so I’m not sure if it will be useful for anyone, but if someone would like to try self-hosting a regional OSM DB with replication, I recommend at least checking it out.
More details in the repo: osm-replication-osc-poly-filter
Longer version:
A few months ago, I was looking for an alternative to the public Overpass instances (because the servers are overloaded) for my projects.
I read SomeoneElse’s diary about self-hosting an Overpass instance, but it seemed over-complicated to me. I have also read many times that it has random reliability issues and is time-consuming to maintain (I'm not sure how true that is).
I decided to switch to something else for my projects: something more low-level, with greater control over the data, and preferably self-hosted to avoid such problems. Instead of OverpassQL, I switched to SQL with a PostGIS DB. There aren't a lot of choices here, so I chose osm2pgsql.
Osm2pgsql is quite an advanced tool, and I really recommend at least reading about it. It may take some time to learn, but it’s worth seeing features like --output=flex, which lets you define custom table schemas in Lua scripts, with tag and geometry columns, that work both when importing and when appending (replicating) data. It can be adjusted and optimized per project.
Unfortunately, there are some difficulties, like installing a specific osm2pgsql version (there are no application binaries in the releases and no custom repos, older OSes don’t get the latest version, and Debian backports are not a good solution for me either; I would expect to be able to easily pin the version I want to use). Replication also has some limitations. For example, if we load data from a Geofabrik extract, it will replicate only every ~24 hours, because Geofabrik doesn’t support minutely replication. We could replace it with the .fr planet servers, but they don’t support atypical PBFs, and the poly files are a little bit different, so I decided not to use them either.
I found this thread, which contains a solution to the above problem: import a custom PBF and apply minutely replication to it. It relies on the trim_osc.py script by Zverik, but unfortunately it didn’t work for me. Maybe I did something wrong, but very quickly I saw data from another continent in my DB.
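For context, here is a minimal sketch of how a minutely diff is addressed on the planet server: the replication sequence number is zero-padded to nine digits and split into three path segments. The function name `replication_diff_url` and the default base URL are my own illustration, not something taken from the repo or from trim_osc.py.

```python
def replication_diff_url(
    sequence: int,
    base: str = "https://planet.openstreetmap.org/replication/minute",
) -> str:
    """Build the URL of one minutely .osc.gz diff from its sequence number.

    Hypothetical helper for illustration only.
    """
    if not 0 <= sequence <= 999_999_999:
        raise ValueError("sequence number must fit in nine digits")
    padded = f"{sequence:09d}"  # e.g. 6123456 -> "006123456"
    # Three groups of three digits form the directory path and file name.
    return f"{base}/{padded[0:3]}/{padded[3:6]}/{padded[6:9]}.osc.gz"
```

Each of these diffs covers the whole planet, which is why they have to be filtered before being appended to a regional DB.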
Finally, I decided to write my own script from scratch that filters .osc files by a specific poly. There are really many edge cases to handle, and even with the rewrite some limitations remain, due to complexity or to data missing from .osc files (e.g. node versions with nd or member tags). I also dockerized everything and pushed it to Docker Hub, so it can be easily self-hosted.
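To give a rough idea of the core step, here is a simplified sketch of filtering the nodes of an .osc file against a .poly file. It is not the code from the repo: all function names are hypothetical, it uses a plain ray-casting point-in-polygon test, and it deliberately leaves ways, relations and nodes without coordinates untouched, which is exactly where the hard edge cases are.

```python
import xml.etree.ElementTree as ET


def parse_poly(text):
    """Parse .poly text into a list of (is_hole, [(lon, lat), ...]) rings."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    rings = []
    i = 1  # line 0 is the polygon name
    while i < len(lines) and lines[i] != "END":
        is_hole = lines[i].startswith("!")  # "!name" marks a hole
        i += 1
        ring = []
        while lines[i] != "END":
            lon, lat = map(float, lines[i].split()[:2])
            ring.append((lon, lat))
            i += 1
        i += 1  # skip the END that closes this ring
        rings.append((is_hole, ring))
    return rings


def point_in_ring(lon, lat, ring):
    """Ray-casting test: count edge crossings of a ray going east."""
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        if (yi > lat) != (yj > lat):
            x_cross = xi + (lat - yi) * (xj - xi) / (yj - yi)
            if lon < x_cross:
                inside = not inside
        j = i
    return inside


def point_in_poly(lon, lat, rings):
    """Inside at least one outer ring and inside no hole."""
    in_outer = any(point_in_ring(lon, lat, r) for hole, r in rings if not hole)
    in_hole = any(point_in_ring(lon, lat, r) for hole, r in rings if hole)
    return in_outer and not in_hole


def filter_osc_nodes(osc_xml, rings):
    """Drop nodes outside the polygon; everything else is kept as-is."""
    root = ET.fromstring(osc_xml)
    for action in root:  # <create>, <modify>, <delete>
        for node in list(action.findall("node")):
            lat, lon = node.get("lat"), node.get("lon")
            if lat is None or lon is None:
                continue  # no coordinates to test, so keep the node
            if not point_in_poly(float(lon), float(lat), rings):
                action.remove(node)
    return ET.tostring(root, encoding="unicode")
```

A real filter also has to decide what to do with ways and relations whose nodes are not in the same diff, and with nodes that move out of the polygon, so treat this only as an illustration of the idea.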
Not sure if anyone will find it useful, but I'm publishing it anyway.
More details in the repo: osm-replication-osc-poly-filter
PS. It’s my first diary entry; maybe I should write more often about my other projects as well :)