Render_list no longer renders tiles

When prerendering tiles, renderd was being killed by the system every few hours for OOM (it would slowly consume all of the system's memory). I changed systemd to restart the service on failure, which extended the system's life to about a day before it needed a restart.

I’ve noticed now that render_list no longer generates tiles. mod_tile will still have renderd generate tiles on demand, but it seems to stall out after a minute or so.

Is there a lockfile or db flag I need to clean up? I feel like the system is waiting for a shared lock that was never released.

In order for people to comment helpfully, they’re probably going to need examples of what’s happening in the system log, and also information about what versions of things you’re running.

Of course. It’s a month-old install following the switch2osm guide for Ubuntu 22.04, using the default openstreetmap-carto stylesheet on an 8-core / 64 GB / 2 TB SSD machine.

render_list call:

james@osm-import-local:/osm$ render_list -f -m s2o -all -z 11 -Z 13 --num-threads=8 -t /osm/tile-cache/tiles/ &
[1] 10511
james@osm-import-local:/osm$ Rendering client
Starting 8 rendering threads
Rendering all tiles from zoom 11 to zoom 13
Rendering all tiles for zoom 11 from (0, 0) to (2047, 2047)

syslog:

james@osm-import-local:~$ tail -f /var/log/syslog | grep renderd
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 7, number 0, total conns 1, total slots 1
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 8, number 1, total conns 2, total slots 2
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 9, number 2, total conns 3, total slots 3
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 10, number 3, total conns 4, total slots 4
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 11, number 4, total conns 5, total slots 5
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 12, number 5, total conns 6, total slots 6
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 13, number 6, total conns 7, total slots 7
Jun  8 08:48:05 osm-import-local renderd[846]: Data is available now on 1 fds
Jun  8 08:48:05 osm-import-local renderd[846]: Got incoming connection, fd 14, number 7, total conns 8, total slots 8

Process list:

james@osm-import-local:~$ ps -alh
4     0     864       1  20   0   6172  1092 -      Ss+  tty1       0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
0  1000    2931    2930  20   0   8864  5648 do_sel Ss+  pts/0      0:00 -bash
0  1000   10466   10465  20   0   9184  5428 do_wai Ss   pts/1      0:00 -bash
0  1000   10511    2931  20   0  86536  6476 futex_ Sl   pts/0      0:00 render_list -f -m s2o -all -z 11 -Z 13 --num-threads=8 -t /osm/tile-cache/tiles/
0  1000   10559   10466  20   0  10460  1592 -      R+   pts/1      0:00 ps -alh

From the DB side:

gis=# select now() - backend_start as duration, pid, usename, wait_event_type, wait_event, state, LEFT(query,80) from pg_stat_activity;
       duration        |  pid  | usename  | wait_event_type |     wait_event      | state  |                                       left
-----------------------+-------+----------+-----------------+---------------------+--------+----------------------------------------------------------------------------------
 1 day 06:58:07.804978 |   973 |          | Activity        | AutoVacuumMain      |        |
 1 day 06:58:07.804559 |   975 | postgres | Activity        | LogicalLauncherMain |        |
 00:01:01.486766       | 10539 | postgres |                 |                     | active | select now() - backend_start as duration, pid, usename, wait_event_type, wait_ev
 1 day 06:58:07.805317 |   971 |          | Activity        | BgWriterHibernate   |        |
 1 day 06:58:07.805534 |   970 |          | Activity        | CheckpointerMain    |        |
 1 day 06:58:07.805145 |   972 |          | Activity        | WalWriterMain       |        |
(6 rows)

htop shows negligible processor activity, iotop shows no disk activity.

Unfortunately, it looks like something went wrong before that log snippet. As you’re no doubt aware, renderd should then do something with the request, like this:

Jun 10 08:10:03 map renderd[698142]: Data is available now on 1 fds
Jun 10 08:10:03 map renderd[698142]: Got incoming connection, fd 7, number 0, total conns 1, total slots 4
Jun 10 08:10:03 map renderd[698142]: Data is available now on 1 fds
Jun 10 08:10:03 map renderd[698142]: Got incoming request with protocol version 2
Jun 10 08:10:03 map renderd[698142]: Got command RenderPrio fd(7) xml(ajt), z(18), x(127617), y(80183), mime(image/png), options()
Jun 10 08:10:03 map renderd[698142]: START TILE ajt 18 127616-127623 80176-80183, new metatile
Jun 10 08:10:03 map renderd[698142]: Rendering projected coordinates 18 127616 80176 -> -528332.739507|7779454.990756 -527109.747055|7780677.983209 to a 8 x 8 tile

Maybe try running with fewer threads and keep an eye on memory use as it transitions from “working” to “not working”?
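One way to keep an eye on memory is to sample renderd's resident memory at intervals and log it, so the point where it goes from "working" to "not working" shows up with a timestamp. The following is only a rough sketch, assuming a Linux system with /proc mounted and that the daemon's process name is "renderd" (as it appears in the syslog lines above). The function names rss_kb, log_rss and watch_rss are made up for illustration and are not part of renderd or mod_tile.

```shell
#!/bin/sh
# Hypothetical helpers for watching a process's memory; Linux only (reads /proc).

# Print the resident set size (VmRSS, in kB) of the PID given as $1.
# Returns non-zero if the PID is gone or has no VmRSS line (e.g. kernel threads).
rss_kb() {
    while read -r key value _; do
        if [ "$key" = "VmRSS:" ]; then
            echo "$value"
            return 0
        fi
    done 2>/dev/null < "/proc/$1/status"
    return 1
}

# Print one "timestamp pid=... rss_kb=..." line for every running process
# whose command name (from /proc/PID/comm) is exactly $1.
log_rss() {
    for comm in /proc/[0-9]*/comm; do
        [ -r "$comm" ] || continue
        read -r name 2>/dev/null < "$comm" || continue
        [ "$name" = "$1" ] || continue
        pid=${comm#/proc/}
        pid=${pid%/comm}
        if rss=$(rss_kb "$pid"); then
            printf '%s pid=%s rss_kb=%s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$pid" "$rss"
        fi
    done
}

# Repeat log_rss for process name $1 every $2 seconds (default 30) until interrupted.
watch_rss() {
    while :; do
        log_rss "$1"
        sleep "${2:-30}"
    done
}
```

After sourcing the file, something like `watch_rss renderd 30 >> renderd-rss.log &` before starting render_list would leave a log that can be lined up against the renderd entries in syslog. The log file name and the 30-second interval are arbitrary choices.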

For completeness, when I run render_list I mail the output to myself for information. The command that kicked ^^ off was “render_list -a -z 3 -Z 3 -x 3 -X 7 -y 2 -Y 7 -n 1 -m ajt”.