Debug or add resilience to incoming RabbitMQ
@rayg and I noticed while debugging geosphere-test that some Geo2Grid containers displayed a Python exception about losing their connection to RabbitMQ. Others showed no error but also weren't receiving updates.
```
ERROR:pika.adapters.utils.connection_workflow:getaddrinfo failed: gaierror(-2, 'Name or service not known').
ERROR:pika.adapters.utils.connection_workflow:AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - gaierror(-2, 'Name or service not known'); first exception - None.
ERROR:pika.adapters.utils.connection_workflow:AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - gaierror(-2, 'Name or service not known'); first exception - None
ERROR:pika.adapters.blocking_connection:Connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - gaierror(-2, 'Name or service not known'); first exception - None
ERROR:pika.adapters.blocking_connection:Error in _create_connection().
Traceback (most recent call last):
  File "/root/.local/lib/python3.9/site-packages/pika/adapters/blocking_connection.py", line 451, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
  File "/root/.local/lib/python3.9/site-packages/pika/adapters/utils/selector_ioloop_adapter.py", line 565, in _resolve
    result = socket.getaddrinfo(self._host, self._port, self._family,
  File "/usr/lib64/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
Traceback (most recent call last):
  File "/work/product_amqp.py", line 123, in <module>
    sys.exit(main())
  File "/work/product_amqp.py", line 70, in main
    publish_amqp_messages(amqp_msg_gen)
  File "/work/product_amqp.py", line 106, in publish_amqp_messages
    with pika.BlockingConnection(conn_params) as connection:
  File "/root/.local/lib/python3.9/site-packages/pika/adapters/blocking_connection.py", line 360, in __init__
    self._impl = self._create_connection(parameters, _impl_class)
  File "/root/.local/lib/python3.9/site-packages/pika/adapters/blocking_connection.py", line 451, in _create_connection
    raise self._reap_last_connection_workflow_error(error)
  File "/root/.local/lib/python3.9/site-packages/pika/adapters/utils/selector_ioloop_adapter.py", line 565, in _resolve
    result = socket.getaddrinfo(self._host, self._port, self._family,
  File "/usr/lib64/python3.9/socket.py", line 954, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
DEBUG: Done sending AMQP messages
```

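Ray points out that the exception above comes from the sending (publishing) AMQP logic in `product_amqp.py`, not the receiving logic; the `gaierror` means the RabbitMQ hostname could not be resolved when the connection was opened. It's possible that the pods/containers with no exceptions crashed while listening for/receiving messages.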
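The traceback shows `socket.gaierror` (a subclass of `OSError`) propagating straight out of `pika.BlockingConnection(conn_params)`, so the publishing script exits on the first failed name lookup. One way to add resilience is to retry connection setup with exponential backoff. Below is a minimal, library-agnostic sketch, assuming these DNS/network failures are transient and worth retrying; `call_with_retries` and its parameters are hypothetical names, not part of pika or `product_amqp.py`.

```python
import logging
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
log = logging.getLogger(__name__)


def call_with_retries(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...] = (OSError,),
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call ``func`` until it succeeds, retrying only on ``retry_on`` errors.

    Hypothetical helper (not part of pika). Delays grow exponentially
    (base_delay, 2*base_delay, ...) up to max_delay, with random jitter so
    many pods don't reconnect in lockstep. Once max_attempts is used up,
    the last exception is re-raised unchanged.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # give up: surface the original error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = random.uniform(delay / 2, delay)  # jitter
            log.warning("attempt %d failed (%r); retrying in %.1fs",
                        attempt, exc, delay)
            sleep(delay)
    raise AssertionError("unreachable")
```

On the publishing side this could wrap the existing connection call, e.g. `with call_with_retries(lambda: pika.BlockingConnection(conn_params), retry_on=(OSError, pika.exceptions.AMQPConnectionError)) as connection:`. Catching `OSError` matters here because the error in the log is a raw `socket.gaierror`, not a pika exception. On the receiving side, the same helper could wrap the connect-and-consume call inside an outer loop, so a dropped connection leads to a reconnect (and a log line) rather than a silently idle container. pika's `ConnectionParameters` also accepts `connection_attempts` and `retry_delay`; it may be worth checking whether those already cover name-resolution failures before adding a custom wrapper.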