|
Hi everyone!
I would like to share with you our problem. For the moment I will explain it on a very specific scenario that we have here at Alba. We will work on preparing an isolated case, but maybe meanwhile some expert could already see an error on our side or a bug.
So let's start…
The problem was discovered when integrating Keithley in the continuous scan.
The simplified Tango architecture is as follows (all written in Python):
Keithley device emits data ready events when it is involved in an acquisition.
On the client side (Sardana controller) we are listening to those events and in the event callback we execute a command on that device in order to obtain data. Apart of that, the acquisition action (a dedicated thread) executes periodically State command on that device in order to check if the acquisition have finished. All of that is done using a single DeviceProxy.
This causes random and sporadic timeouts when acquiring the monitor on the push_data_ready call.
DevFailed[
DevError[
desc = Not able to acquire serialization (dev, class or process) monitor
origin = TangoMonitor::get_monitor
reason = API_CommandTimedOut
severity = ERR] We have run the Keithley DS with v5 Tango logging. We were able to reproduce the problem three times (two of them produce the similar traces) and all of them points to the State command which locks the monitor for more than 3 seconds. NORMAL EXECUTION OF STATE COMMAND:
1464270693 [140091939538688] In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 0
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering …
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464270693 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:33.541613: In rpastor/keithely2010/01::always_executed_hook()
1464270693 [140091939538688] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464270693 [140091939538688] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464270693 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:33.541784: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270693 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:33.541945: Out rpastor/keithely2010/01::always_executed_hook()
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Signalling !
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)
CASE 1 (coincides with emitting Heartbeat event) OCCURENCE 1:
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 0
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering …
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464270693 [140091931145984] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 15, ctr = 1
1464270693 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464270693 [140091931145984] DEBUG dserver/Keithley/rpastor Thread 15: waiting !!
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ———-> Time = 462270694,977220 Sending event heartbeat
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta time since last heartbeat 9
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): detected heartbeat event for tango://pc255.cells.es:10000/dserver/keithley/rpastor.heartbeat
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta _time 9
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): nb_event = 1
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor Event heartbeat, next wake_up at 462270703,977598
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor Sub device property storage, next wake_up at 462272410,474407
1464270694 [140091947931392] DEBUG dserver/Keithley/rpastor Sleep for : 9000
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor DeviceImpl::ping arrived
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::ping
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 1
1464270696 [140091922753280] DEBUG dserver/Keithley/rpastor Thread 8: waiting !!
1464270696 [140091931145984] DEBUG dserver/Keithley/rpastor TIME OUT for thread 15
DevFailed[
DevError[
desc = Not able to acquire serialization (dev, class or process) monitor
origin = TangoMonitor::get_monitor
reason = API_CommandTimedOut
severity = ERR]
]
1464270696 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:36.779552: In rpastor/keithely2010/01::always_executed_hook()
1464270696 [140091939538688] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464270696 [140091939538688] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464270696 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:36.779722: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor owner_thread !!
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464270696 [140091939538688] INFO rpastor/keithely2010/01 2016-05-26 15:51:36.780422: Out rpastor/keithely2010/01::always_executed_hook()
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Signalling !
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464270696 [140091939538688] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)
CASE 1 (coincides with emitting Heartbeat event) OCCURRENCE 2:
1464274066 [139868710455040] In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 0
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering …
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464274066 [139868710455040] INFO rpastor/keithely2010/01 2016-05-26 16:47:46.932663: In rpastor/keithely2010/01::always_executed_hook()
1464274066 [139868710455040] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464274066 [139868710455040] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464274066 [139868710455040] INFO rpastor/keithely2010/01 2016-05-26 16:47:46.932833: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464274066 [139868710455040] DEBUG dserver/Keithley/rpastor owner_thread !!
1464274066 [139868361455360] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 2
1464274066 [139868361455360] DEBUG dserver/Keithley/rpastor Thread 8: waiting !!
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ———-> Time = 462274069,533521 Sending event heartbeat
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta time since last heartbeat 25
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): detected heartbeat event for tango://pc255.cells.es:10000/dserver/keithley/rpastor.heartbeat
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): delta _time 25
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor ZmqEventSupplier::push_heartbeat_event(): nb_event = 2
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor Event heartbeat, next wake_up at 462274078,534326
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor Sub device property storage, next wake_up at 462275844,004265
1464274069 [139868718847744] DEBUG dserver/Keithley/rpastor Sleep for : 9000
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor DeviceImpl::ping arrived
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::ping
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 10, ctr = 2
1464274069 [139868344669952] DEBUG dserver/Keithley/rpastor Thread 10: waiting !!
1464274070 [139868361455360] DEBUG dserver/Keithley/rpastor TIME OUT for thread 8
DevFailed[
DevError[
desc = Not able to acquire serialization (dev, class or process) monitor
origin = TangoMonitor::get_monitor
reason = API_CommandTimedOut
severity = ERR]
]
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor owner_thread !!
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464274070 [139868710455040] INFO rpastor/keithely2010/01 2016-05-26 16:47:50.147497: Out rpastor/keithely2010/01::always_executed_hook()
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Signalling !
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464274070 [139868710455040] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)
CASE 2 (no Hearbeat emit):
1464272713 [140721105397504] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 16, ctr = 2
1464272713 [140721105397504] DEBUG dserver/Keithley/rpastor Thread 16: waiting !!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor DeviceImpl::ping arrived
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::ping
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 2
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Thread 8: waiting !!
1464272716 [140721105397504] DEBUG dserver/Keithley/rpastor TIME OUT for thread 16
DevFailed[
DevError[
desc = Not able to acquire serialization (dev, class or process) monitor
origin = TangoMonitor::get_monitor
reason = API_CommandTimedOut
severity = ERR]
]
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 6, ctr = 1
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor owner_thread !!
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 6
1464272716 [140721113790208] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446286: Out rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 6
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Signalling !
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464272716 [140721113790208] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor DeviceImpl::state (attribute) arrived
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() entering …
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::get_associated_device() found : No associated device name!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464272716 [140721097004800] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446446: In rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721097004800] DEBUG rpastor/keithely2010/01 Entering in state_machine ((Keithley2010(rpastor/keithely2010/01),))
1464272716 [140721097004800] DEBUG rpastor/keithely2010/01 Exiting state_machine
1464272716 [140721097004800] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446530: After state_machine rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 1
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor owner_thread !!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 8
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In get_monitor() rpastor/keithely2010/01, thread = 8, ctr = 1
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor owner_thread !!
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Attribute::fire_change_event() entring …
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 2, thread = 8
1464272716 [140721097004800] INFO rpastor/keithely2010/01 2016-05-26 16:25:16.446616: Out rpastor/keithely2010/01::always_executed_hook()
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor In rel_monitor() rpastor/keithely2010/01, ctr = 1, thread = 8
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Signalling !
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor SubDevDiag::set_associated_device() entering …
1464272716 [140721097004800] DEBUG dserver/Keithley/rpastor Leaving DeviceImpl::state (attribute)
We are using: Tango 8.1.2 (with patches applied) PyTango 8.1.1
If no one comments we will proceed in preparing an isolated scenario.
Many thanks in advance for any suggestions!
|