device accessible on some clients but not on others

Hi,
we noticed a strange issue with a device.

This device server (DS) runs on a Tango client running Debian 12, in a conda env with pytango 9.5.0 installed through pip.

It can be accessed from Debian 11 clients with a Debian 11 server (Tango 9.3.4) but not from our Debian 12 clients (recently installed, also Tango 9.3.4).

When we launch atkpanel on this specific device, we get this output message (the same message is displayed in the GUI):
twac-pup1:~$ atkpanel bluecougar/test/camera.1
5.9
AtkPanel: Cannot build attribute list.
AtkPanel: Application aborted….
AtkPanel: Connection Exception : Severity: ERROR
Origin: bluecougar/test/camera.1.class fr.esrf.TangoApi.DeviceProxyDAODefaultImpl.get_attribute_config_ex)
Description: Device (bluecougar/test/camera.1) timed out (>3000 ms)!
Reason: org.omg.CORBA.TIMEOUT: client timeout reached

When we access the device through pytango, we get a similar error:
cc@twac-pup3:~$ ipython --profile=tango
Python 3.12.3 | packaged by conda-forge | (main, Apr 15 2024, 18:38:13) [GCC 12.3.0]
Type 'copyright', 'credits' or 'license' for more information
IPython 8.24.0 -- An enhanced Interactive Python. Type '?' for help.

IPython profile: tango

In [1]: import tango

In [2]: cam=tango.DeviceProxy("bluecougar/test/camera.1")

In [3]: cam.get_attribute_list()
---------------------------------------------------------------------------
CommunicationFailed Traceback (most recent call last)
Cell In[3], line 1
----> 1 cam.get_attribute_list()

CommunicationFailed: DevFailed[
DevError[
desc = TRANSIENT CORBA system exception: TRANSIENT_CallTimedout
origin = DeviceProxy:get_attribute_config
reason = API_CorbaException
severity = ERR]

DevError[
desc = Timeout (3000 mS) exceeded on device bluecougar/test/camera.1
origin = DeviceProxy:get_attribute_config
reason = API_DeviceTimedOut
severity = ERR]

So it does not seem to be due to atkpanel.
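
One way to tell a slow device from a blocked one is to repeat the failing call with a larger client-side timeout. The following is only a minimal sketch, assuming pytango is installed; `probe_attribute_list` is a hypothetical helper name and the timeout values are illustrative. It uses the `set_timeout_millis()` and `get_attribute_list()` methods of `DeviceProxy`.

```python
def probe_attribute_list(proxy, timeouts_ms=(3000, 10000)):
    """Call get_attribute_list() once per timeout and record the outcome.

    `proxy` is expected to behave like a tango.DeviceProxy.
    Returns {timeout_ms: attribute count, or the exception class name}.
    """
    outcome = {}
    for timeout in timeouts_ms:
        proxy.set_timeout_millis(timeout)  # client-side timeout for this proxy
        try:
            outcome[timeout] = len(proxy.get_attribute_list())
        except Exception as exc:  # pytango raises tango.DevFailed subclasses here
            outcome[timeout] = type(exc).__name__
    return outcome


# Usage (needs a reachable Tango control system):
#   import tango
#   cam = tango.DeviceProxy("bluecougar/test/camera.1")
#   print(probe_attribute_list(cam))
```

If the call still fails with the larger timeout, the device is probably not just slow, and the server side or the network path deserves a closer look.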

Other devices are accessible:
In [4]: tangotest=tango.DeviceProxy("sys/tg_test/1")

In [5]: tangotest.get_attribute_list()
Out[5]: ['ampli', 'boolean_scalar', 'double_scalar', 'double_scalar_rww', 'double_scalar_w', 'float_scalar', 'long64_scalar', 'long_scalar', 'long_scalar_rww', 'long_scalar_w', 'no_value', 'short_scalar', 'short_scalar_ro', 'short_scalar_rww', 'short_scalar_w', 'string_scalar', 'throw_exception', 'uchar_scalar', 'ulong64_scalar', 'ushort_scalar', 'ulong_scalar', 'boolean_spectrum', 'boolean_spectrum_ro', 'double_spectrum', 'double_spectrum_ro', 'float_spectrum', 'float_spectrum_ro', 'long64_spectrum_ro', 'long_spectrum', 'long_spectrum_ro', 'short_spectrum', 'short_spectrum_ro', 'string_spectrum', 'string_spectrum_ro', 'uchar_spectrum', 'uchar_spectrum_ro', 'ulong64_spectrum_ro', 'ulong_spectrum_ro', 'ushort_spectrum', 'ushort_spectrum_ro', 'wave', 'boolean_image', 'boolean_image_ro', 'double_image', 'double_image_ro', 'float_image', 'float_image_ro', 'long64_image_ro', 'long_image', 'long_image_ro', 'short_image', 'short_image_ro', 'string_image', 'string_image_ro', 'uchar_image', 'uchar_image_ro', 'ulong64_image_ro', 'ulong_image_ro', 'ushort_image', 'ushort_image_ro', 'State', 'Status']

In [6]: caen=tango.DeviceProxy("twac/ea/ps.01")

In [7]: caen.get_attribute_list()
Out[7]: ['current', 'ramp_current', 'voltage', 'ramp_voltage', 'power', 'loop_mode', 'enabled', 'ramping', 'fault', 'State', 'Status']

In [8]: bob=tango.DeviceProxy("test/modbus/bobine1_1")

In [9]: bob.get_attribute_list()
Out[9]: ['DefautEau', 'DefautTemp', 'DefautAlim', 'MarcheAlim', 'State', 'Status']

So I understand it is not some network issue or a generic tango config issue.

The only difference I found is the debian-specific version number:

On the Debian 11 client, the versions are as follows:

$ dpkg -l | grep tango
ii liblog4tango-dev:amd64 9.3.4+dfsg1-1 amd64 logging for TANGO - development library
ii liblog4tango5v5:amd64 9.3.4+dfsg1-1 amd64 logging for TANGO - shared library
ii libtango-dev:amd64 9.3.4+dfsg1-1 amd64 TANGO distributed control system - development library
ii libtango-tools 9.3.4+dfsg1-1 amd64 TANGO distributed control system - common executable files
ii libtango9:amd64 9.3.4+dfsg1-1 amd64 TANGO distributed control system - shared library
ii python3-tango 9.3.2-1+b1 amd64 API for the TANGO control system (Python 3)
ii tango-common 9.3.4+dfsg1-1 all TANGO distributed control system - common files
ii tango-db 9.3.4+dfsg1-1 amd64 TANGO distributed control system - database server
ii tango-starter 9.3.4+dfsg1-1 amd64 TANGO distributed control system - starter server
ii tango-test 9.3.4+dfsg1-1 amd64 TANGO distributed control system - test device


while on debian 12 clients I got:

$ dpkg -l | grep tango
ii liblog4tango-dev:amd64 9.3.4+dfsg1-2 amd64 logging for TANGO - development library
ii liblog4tango5v5:amd64 9.3.4+dfsg1-2 amd64 logging for TANGO - shared library
ii libtango-dev:amd64 9.3.4+dfsg1-2 amd64 TANGO distributed control system - development library
ii libtango-tools 9.3.4+dfsg1-2 amd64 TANGO distributed control system - common executable files
ii libtango9:amd64 9.3.4+dfsg1-2 amd64 TANGO distributed control system - shared library
ii tango-common 9.3.4+dfsg1-2 all TANGO distributed control system - common files
ii tango-starter 9.3.4+dfsg1-2 amd64 TANGO distributed control system - starter server


Our users recall that this device was working 2 weeks ago, so something seems to have changed, but I cannot find any recent package change in /var/log/apt/ or in the conda env.
However, I do not think we can exclude some code change.

Would you have any idea of what could cause this issue?

Regards.
- Philippe
Hello,
I should add that jive can be launched on the affected clients, but astor fails to launch:

(base) cc@twac-pup3:~$ astor
Display is localhost:10.0
Device (sys/database/2) timed out (>3000 ms)!

while a window displays the following message:
Stack
org.omg.CORBA.TIMEOUT: client timeout reached
Severity - ERROR
Origin - sys/database/2.class fr.esrf.TangoApi.ConnectionDAODefaultImpl.command_inout)
Description - Device (sys/database/2) timed out (>3000 ms)!
Reason - org.omg.CORBA.TIMEOUT: client timeout reached
- Philippe
When clients cannot reach devices any more and just time out, the first thing to check is whether the devices are stuck somehow. I am sure that the logs of your devices will provide proper information about what they are doing.

If the TangoDB cannot be reached then try to telnet it from the same computer:
telnet ${TANGO_HOST/:/ }
If telnet can reach TangoDB then your network might be laggy. Export an environment variable:
export TANGOconnectTimeout=10000
and try to connect to the TangoDB from iTango. If that works then you have confirmed that it is a network issue. If that still does not work, then try again telnetting on the same computer that TangoDB runs on. That should always work.

BTW, the telnet test also works for device servers if you know on which CORBA port they are reachable.
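
Where telnet is not installed, the same reachability test can be sketched in Python with the standard library only. This is a rough equivalent, not a Tango tool: `parse_tango_host` and `can_connect` are hypothetical helper names, and the sketch assumes TANGO_HOST holds a single `host:port` pair.

```python
import socket


def parse_tango_host(value):
    """Split a 'host:port' string (the TANGO_HOST format) into (host, port)."""
    host, _, port = value.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {value!r}")
    return host, int(port)


def can_connect(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port opens within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, name lookup failure, or timed out
        return False


# Usage:
#   import os
#   host, port = parse_tango_host(os.environ["TANGO_HOST"])
#   print(can_connect(host, port))
```

A successful connection only shows that the port is reachable. The TCP handshake uses small packets, so this test says nothing about whether large replies get through.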

Cheers,
Thomas
Thank you Thomas.
FYI, I heard the issue was due to an MTU set to jumbo frames, which was not compatible with the computer, and that the MTU configuration in /etc/network/interfaces has been reset.
- Philippe
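
For anyone hitting similar symptoms, the MTU currently in effect on each interface of a Linux client can be read from sysfs. A minimal sketch, assuming Linux and the usual `/sys/class/net/<interface>/mtu` layout; the function names are hypothetical, and 1500 bytes is taken as the standard Ethernet MTU (jumbo frames are commonly configured as 9000).

```python
from pathlib import Path


def interface_mtus(sysfs_root="/sys/class/net"):
    """Return {interface name: MTU in bytes} as reported by Linux sysfs."""
    mtus = {}
    for entry in sorted(Path(sysfs_root).iterdir()):
        mtu_file = entry / "mtu"
        if mtu_file.is_file():  # skips non-interface entries such as bonding_masters
            mtus[entry.name] = int(mtu_file.read_text().strip())
    return mtus


def jumbo_interfaces(mtus, standard_mtu=1500):
    """List interfaces (loopback excluded) whose MTU exceeds standard_mtu."""
    return [name for name, mtu in mtus.items()
            if name != "lo" and mtu > standard_mtu]


# Usage:
#   print(jumbo_interfaces(interface_mtus()))
```

An interface listed here is not necessarily misconfigured. Jumbo frames only cause trouble when some device on the path cannot handle them, which typically shows up as small requests working while larger replies time out.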
 