sys-tango-benchmark results

# 5 years ago

Anton

Hi All,

SKA is interested in benchmarking performance of a large TANGO system in our environment. The https://github.com/tango-controls/sys-tango-benchmark tool looks very useful in this regard. Does anyone have some results from previous runs that they'd be willing to share? We're looking at 10k to 100k devices, but even smaller systems will be interesting for comparison. Maybe this is already available somewhere online?

Regards,
Anton

# 5 years ago

pgoryl

Hi Anton,

You are asking just in time. We are now preparing a request for institutes to run a unified set of benchmarks and share the results. Up to now, we have only run it on local virtual machines for test purposes. Early next week we will provide a proposed configuration .yml file.

All the best,
Piotr

# 5 years ago

Andy

Hi Anton,

This sounds like a very interesting use case and, as Piotr pointed out, it arrives just as Piotr is requesting benchmarks from all sites. We need these results before ICALEPCS.

In your case I wonder what kind of metrics you are planning to measure. I can imagine measuring performance as a function of the number of clients per server, but in the case of many devices, what values do you want to measure: startup times, grouped calls, an individual client accessing one or more device servers, event performance? As you know, the Tango model implements point-to-point connections between clients and servers. Multiplying the number of device servers does not necessarily impact the performance of individual client-server connections. Are you planning on putting 10k devices in one device server? Or are you looking to optimise the number of devices per device server?

Andy

# 5 years ago

Anton

Hi Andy, Piotr

Thanks for the replies. Glad to hear some tests and reports are planned. Andy, you make a good point about the point-to-point communications, which should remain very efficient.

We're thinking of looking at metrics like these:
- Start / initialisation time.
- Peak memory usage.
- Peak CPU usage.
- Some measure of query response time for attributes & commands & events, to N devices concurrently.
- How many devices can we run on a VM with, say, 1 CPU and 4 GB RAM? Does doubling the resources allow twice as many devices?
- Possibly the TANGO DB registration time (first time population of the DB with all devices, attributes, properties), although this isn't a recurring cost, so not that important.
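
As a rough illustration of the "query response time to N devices concurrently" metric, here is a minimal Python sketch. It is not part of sys-tango-benchmark: the function names, the device names and the `fake_read` stand-in (which sleeps instead of contacting a device) are all assumptions made for illustration.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(operation, target):
    """Run operation(target) once and return the elapsed time in seconds."""
    start = time.perf_counter()
    operation(target)
    return time.perf_counter() - start


def concurrent_latencies(operation, targets):
    """Call operation on every target concurrently, one thread per target."""
    with ThreadPoolExecutor(max_workers=len(targets)) as pool:
        return list(pool.map(lambda t: timed_call(operation, t), targets))


def fake_read(device_name):
    """Stand-in for a real device read; sleeps instead of doing network I/O."""
    time.sleep(0.01)


# Hypothetical device names, for illustration only.
device_names = [f"test/benchmark/{i}" for i in range(20)]
latencies = concurrent_latencies(fake_read, device_names)
print(f"n={len(latencies)} median={statistics.median(latencies):.4f}s "
      f"max={max(latencies):.4f}s")
```

In a real run, `fake_read` would presumably be replaced by a PyTango call such as reading an attribute through a `tango.DeviceProxy` created beforehand, so that proxy creation time is not mixed into the response time.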

In our environment everything is Dockerised. The plan is to use Kubernetes to orchestrate the TANGO control system. Early tests have shown that 1 device per device server per container doesn't scale very well, e.g. we had problems starting 2000 of them on a single machine. Multiple devices per server works better, with maybe 100 containers on a machine. We are looking at how to spread the load out and give guidelines to developers. Questions like:
- How many devices per device server?
- How many device servers per container?
- How many containers per VM?
- How much CPU and RAM per VM?
Obviously, it depends what each device is doing, but we'd start with something simple.
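
To make the sizing questions above concrete, the arithmetic can be sketched as follows. The packing values (20 devices per server, 1 server per container) are hypothetical placeholders, not recommendations; only the 10k and 100k device counts and the "maybe 100 containers on a machine" figure come from the discussion.

```python
import math


def layout(total_devices, devices_per_server, servers_per_container, containers_per_vm):
    """Return how many device servers, containers and VMs a given packing needs."""
    servers = math.ceil(total_devices / devices_per_server)
    containers = math.ceil(servers / servers_per_container)
    vms = math.ceil(containers / containers_per_vm)
    return {"device_servers": servers, "containers": containers, "vms": vms}


# Hypothetical packing values, chosen only to show the calculation.
for total in (10_000, 100_000):
    print(total, layout(total, devices_per_server=20,
                        servers_per_container=1, containers_per_vm=100))
```

Varying one parameter at a time and re-running the benchmark for each layout is one way to turn the questions into comparable measurements.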

Anton

# 5 years ago

pgoryl

Hi Anton,

You can start with a set of standard tests (prepared by Michal), so that results from different institutes can be compared.
See: https://github.com/tango-controls/sys-tango-benchmark-standard-tests

Regarding already available benchmarks, there are measurements of:

- response time for attributes, pipes and commands, event subscriptions
- the impact of dynamic attributes on memory consumption
- how the start time is influenced by the number of devices within device servers

Feel free to propose or (even better) to write additional tests.

Piotr

# 5 years ago

Anton

Thanks, Piotr - we'll take a look.

# 5 years ago

Ingvord

Hi,

Back to Anton's initial question: are there any results available already? Are they published somewhere?

Thanks!

Cheers,

# 5 years ago

pgoryl

@Ingvord, there are at least the results of tests run on AWS for the ICALEPCS paper:
https://github.com/tango-controls/sys-tango-benchmark-standard-tests/tree/master/aws-ec2-tests

All the best,
Piotr

# 5 years ago

Ingvord

Hi Piotr,

Thanks a lot! That is already interesting to see!

Cheers,

# 5 years ago

Ingvord

I have looked through the test results and it seems to me that Java is unfairly slow.

First of all, some questions about the benchmark itself:

- Was the Java server warmed up before measurement? Was it at least started with the -server flag?
- Do I understand correctly that there were a number of AWS instances per client?
- Was any tuning done before running the tests, like setting the JacORB thread pool etc.?
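
To illustrate what warming up before measurement means on the client side, here is a minimal Python sketch, assuming the benchmark client can issue a fixed number of untimed calls first. The function name and the `fake_write` stand-in are hypothetical and are not taken from sys-tango-benchmark.

```python
import time


def throughput_after_warmup(operation, warmup_calls, measured_calls):
    """Return calls per second over measured_calls, after discarding warmup_calls."""
    for _ in range(warmup_calls):
        operation()  # warm-up phase: results and timings deliberately ignored
    start = time.perf_counter()
    for _ in range(measured_calls):
        operation()
    elapsed = time.perf_counter() - start
    return measured_calls / elapsed


calls = []


def fake_write():
    """Stand-in for a client write to the server under test."""
    calls.append(1)


rate = throughput_after_warmup(fake_write, warmup_calls=1_000, measured_calls=10_000)
print(f"{rate:.0f} calls/s after warm-up ({len(calls)} calls issued in total)")
```

With a JIT-compiled server such as a Java one, the untimed calls give the server a chance to compile its hot paths before the timed window starts.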

Sorry if these questions have been answered somewhere - I could not find the answers.

I have extracted the test server from the benchmark and written a simple test here

Running the test for 15 s with 64 clients (all on a single machine though) gave me 124371 from WriteAttributeCounterCount, while the Amazon results are typically 60K (about 2 times slower).
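
The comparison above can be checked with a few lines of Python. This assumes that the AWS figure is the same counter taken over a comparable run, which has not been confirmed; the variable names are mine.

```python
local_count = 124_371   # WriteAttributeCounterCount from the local 15 s run
duration_s = 15         # length of the local run in seconds
aws_count = 60_000      # "typically 60K" from the AWS results (approximate)

local_rate = local_count / duration_s   # counts per second in the local run
ratio = local_count / aws_count         # how many times larger the local count is
print(f"local rate: {local_rate:.0f} counts/s, local/AWS ratio: {ratio:.2f}")
```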

Anyway, I have started to investigate this; you can track the progress here
