
Slow startup on Fedora 28 VPS

2018-07-05

When rebooting this instance I noticed a very long startup time. Running

# systemd-analyze blame
    11min 4.212s cloud-init-local.service
    1min 30.221s chronyd.service
...

shows that cloud-init was taking a lot of time to run (it is intentionally a blocking process, since it makes changes to the system). Browsing cloud-init.log in /var/log/ gave another clue:

Attempting to remove /var/lib/cloud/instance
Using distro class <class 'cloudinit.distros.fedora.Distro'>
Looking for for data source in: ['NoCloud', 'ConfigDrive', 'OpenNebula', 'DigitalOcean', 'Azure', 'AltCloud', 'OVF', 'MAAS', 'GCE', 'OpenStack', 'AliYun', 'Ec2', 'CloudSigma', 'CloudStack', 'SmartOS', 'Bigstep', 'Scaleway', 'None'], via packages ['', 'cloudinit.sources'] that matches dependencies ['FILESYSTEM']
...
10 minute delay
...
Searching for local data source in: ['DataSourceNoCloud', 'DataSourceConfigDrive', 'DataSourceOpenNebula', 'DataSourceDigitalOcean', 'DataSourceAzure', 'DataSourceOVF', 'DataSourceEc2Local', 'DataSourceCloudSigma', 'DataSourceSmartOS']
start: init-local/search-NoCloud: searching for local data from DataSourceNoCloud
Seeing if we can get any data from <class 'cloudinit.sources.DataSourceNoCloud.DataSourceNoCloud'>

From the log file it seems the cloud-init-local process is trying to load different datasource profiles and not succeeding.
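The excerpt above has its timestamps stripped, but the real log file carries them. A minimal sketch for spotting such pauses, assuming every line starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp (that format is an assumption here, and log_gaps is a made-up helper name, not part of cloud-init):

```shell
# Print every log line that follows a pause of at least MIN_GAP seconds.
# Assumption: field 2 of each line is a time of the form HH:MM:SS,mmm.
log_gaps() {
    min_gap="$1"
    file="$2"
    awk -v min_gap="$min_gap" '
        # Skip lines (e.g. tracebacks) whose second field is not a time.
        $2 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
            split($2, t, /[:,]/)
            now = t[1] * 3600 + t[2] * 60 + t[3]   # seconds since midnight
            if (seen) {
                gap = now - prev
                if (gap < 0) gap += 86400          # log crossed midnight once
                if (gap >= min_gap)
                    printf "%d s gap before: %s\n", gap, $0
            }
            prev = now
            seen = 1
        }
    ' "$file"
}
```

Called as "log_gaps 60 /var/log/cloud-init.log", it would list the lines that came after a pause of a minute or more. Only the time of day is compared, so a pause longer than 24 hours would be misreported.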

I added the following line to /etc/cloud/cloud.cfg to enable only specific datasources:

datasource_list: [ NoCloud, ConfigDrive, DigitalOcean, None ]

Boot time is now almost 10 minutes shorter:

# systemd-analyze blame
    1min 49.823s cloud-init.service
    1min 30.221s chronyd.service
...

Checking chronyd.service revealed that it was being killed after 90 seconds (the default start timeout) for not starting, probably because cloud-init was still running and the network was not available.

Fixed it by making chronyd.service run after cloud-init.service:

Running

systemctl edit chronyd.service

and adding

[Unit]
After=cloud-init.service

[Service]
TimeoutStartSec=10

makes chronyd.service start after cloud-init.service and lowers its start timeout from 90 to 10 seconds.
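For a scripted setup, the same drop-in can be written without the interactive editor. This is only a sketch: write_chronyd_override is a hypothetical helper name, and the default directory is the usual location where systemctl edit stores override.conf for this unit.

```shell
# Write the drop-in shown above into unit_dir.
# Default: the directory systemctl edit uses for chronyd.service.
write_chronyd_override() {
    unit_dir="${1:-/etc/systemd/system/chronyd.service.d}"
    mkdir -p "$unit_dir" || return 1
    cat > "$unit_dir/override.conf" <<'EOF'
[Unit]
After=cloud-init.service

[Service]
TimeoutStartSec=10
EOF
}
```

As root, "write_chronyd_override && systemctl daemon-reload" applies it, and "systemctl cat chronyd.service" should then show the override appended to the unit file.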

Going from ~13 minutes boot time to ~2 minutes is a huge improvement. Why cloud-init.service takes almost 110 seconds to run, I have no idea.

EDIT:

After seeing this article it clicked that I was experiencing the same issue. Checking dmesg gave the clue:

[  121.931684] random: crng init done
[  121.932880] random: 7 urandom warning(s) missed due to ratelimiting

This led me to Bug 1572944 and the attempted fix for CVE-2018-1108. The issue is documented in the Fedora wiki.

Before the reboot I had updated the kernel from 4.17.2-200 to 4.17.3-200.

In a nutshell, cloud-init tries to use random data early in the boot process but is starved of entropy, so it takes a long time to finish. The above “fix” limits the need for random data early in the boot process. The VPS provider doesn’t provide access to virtio_rng via /dev/hwrng, so I guess I have to suffer long boot times.
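A rough way to check what a given VPS offers, as a sketch only: rng_report is a made-up helper name, and the two default paths are the standard Linux locations for the entropy estimate and the list of registered hardware RNG drivers.

```shell
# Report the kernel's entropy estimate and any hardware RNG backends.
# Both paths can be overridden, which also makes the function easy to test.
rng_report() {
    entropy_file="${1:-/proc/sys/kernel/random/entropy_avail}"
    backends_file="${2:-/sys/class/misc/hw_random/rng_available}"

    if [ -r "$entropy_file" ]; then
        echo "entropy_avail: $(cat "$entropy_file")"
    else
        echo "entropy_avail: unknown"
    fi

    # An empty or missing rng_available means no driver feeds /dev/hwrng.
    backends=""
    if [ -r "$backends_file" ]; then
        backends="$(cat "$backends_file")"
    fi
    if [ -n "$backends" ]; then
        echo "hwrng backends: $backends"
    else
        echo "hwrng backends: none"
    fi
}
```

Running "rng_report" with no arguments on the instance prints both values. On a guest without a virtio RNG device I would expect the second line to read "hwrng backends: none".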