Skip to main content

Troubleshooting

Common failure modes and their fixes. If your problem is not here, check the Infrahub documentation or open an issue.


Bootstrap timeout

Symptom:

[ERROR] ConnectionRefusedError: [Errno 111] Connection refused

or the bootstrap script exits immediately with "Infrahub not reachable".

Cause: The bootstrap script runs too soon after docker compose up. Infrahub takes 20–40 seconds to initialise its database and API server.

Fix:

# Wait for Infrahub to be ready, then bootstrap manually:
uv run invoke start
sleep 40
uv run invoke bootstrap

Alternatively, run invoke init which includes a built-in wait, but if your machine is slow, increase the sleep in tasks.py:

# tasks.py — find the sleep call in init_demo and increase it
time.sleep(40) # change to 60 or more on slow machines

Generator fails: "No free physical interface"

Symptom:

RuntimeError: No free physical interface on pe-lon-arista

Cause: The L3VpnGenerator allocates PE-CE interfaces by picking the first InterfacePhysical with status = free on the target PE. All interfaces on that PE are already in use (either active or cust).

Fix: Create a free interface on the PE via the Infrahub UI or GraphQL:

curl -s -X POST http://localhost:8000/graphql \
-H "X-INFRAHUB-KEY: 06438eb2-8019-4776-878c-0941b1f1d1ec" \
-H "Content-Type: application/json" \
-d '{
"query": "mutation {
InterfacePhysicalCreate(data: {
name: {value: \"Ethernet10\"},
status: {value: \"free\"},
mtu: {value: 9000},
device: {hfid: [\"pe-lon-arista\"]}
}) { ok object { id } }
}"
}'

Repeat for whichever PE is out of free interfaces.


Port conflict: Infrahub already running

Symptom:

Error response from daemon: driver failed programming external connectivity ...
Bind for 0.0.0.0:8000 failed: port is already allocated

Cause: Another process (or a leftover container) is using port 8000, 4200 (Prefect), or 8501 (Streamlit).

Fix:

# Find what is using the port:
lsof -i :8000

# Stop leftover containers from this project:
docker compose -p sp-demo down

# If another unrelated project is running on the same port,
# override the port in .env:
echo "INFRAHUB_PORT=8001" >> .env
# Then update INFRAHUB_ADDRESS accordingly:
echo 'INFRAHUB_ADDRESS="http://localhost:8001"' >> .env
uv run invoke start

Containerlab: image pull failure

SR Linux (ghcr.io/nokia/srlinux)

Error response from daemon: Head "https://ghcr.io/...": unauthorized

SR Linux is on GitHub Container Registry and requires a GitHub token:

docker login ghcr.io -u <github-username> -p <personal-access-token>

Create a PAT at https://github.com/settings/tokens with read:packages scope.

Arista cEOS (ceos:latest)

cEOS is not on any public registry. Download the .tar.xz from the Arista Software Downloads portal (requires an Arista account) and import it:

docker import cEOS-lab-4.30.0F.tar.xz ceos:latest
docker image ls ceos

network-multitool (ghcr.io/hellt/network-multitool)

This is a public image. If it fails, you may be hitting Docker Hub rate limits. Authenticate with Docker Hub:

docker login docker.io

Streamlit catalog: "Infrahub not reachable"

Symptom: The Streamlit app loads but shows "Cannot connect to Infrahub".

Cause: INFRAHUB_ADDRESS is not set or points to the wrong host/port.

Fix:

# Verify the env var is set:
grep INFRAHUB_ADDRESS .env

# If running Streamlit outside Docker Compose, ensure the address
# points to the correct host:
echo 'INFRAHUB_ADDRESS="http://localhost:8000"' >> .env
uv run streamlit run service_catalog/app.py

If you are running the Streamlit app inside Docker Compose, use the internal service name:

INFRAHUB_ADDRESS=http://infrahub-server:8000

Schema load order error

Symptom:

SchemaNotFound: Node 'RoutingProtocol' not found

Cause: The SP schemas (schemas/sp/) reference base and extension nodes that were not loaded first.

Fix: Always load schemas in order:

infrahubctl schema load schemas/base/
infrahubctl schema load schemas/extensions/
infrahubctl schema load schemas/sp/

invoke bootstrap does this automatically. If you loaded schemas manually in the wrong order, run invoke destroy && invoke init to start clean.


Proposed Change: check fails after generator runs

Symptom: The pe_interface_alloc check fails even though the generator ran and set the interface.

Cause: The check reads the current branch state. If the generator ran on a previous branch and the change was already merged, the interface status may already be cust on main, making it appear unavailable for a new site.

Fix: Use a different (free) interface for the new site, or set an existing cust-status interface back to free if it was decommissioned.


yamllint: line too long

Symptom:

[error] line too long (105 > 100 characters)

Cause: Bootstrap YAML files sometimes have long inline object specs.

Fix: Break the long line or raise the limit in .yamllint.yml:

rules:
line-length:
max: 120
level: warning # demote from error to warning if needed

uv sync fails

Symptom: uv sync fails with a resolver error or requires-python mismatch.

Fix:

# Verify your Python version:
python --version # must be 3.10, 3.11, or 3.12

# If using pyenv:
pyenv install 3.12
pyenv local 3.12
uv sync