Alceu R. de Freitas Jr.
2011-06-03 13:35:54 UTC
Hello everybody,
I started doing some tests by running a Perl program on a two-node cluster.
One node has 4 CPUs and the other has 2.
The Perl program forks 6 child processes in total, and my expectation is that I can keep at least 50% of all the CPUs busy while it runs.
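For reference, the parent's fork/reap loop looks roughly like this (the job list and the per-job work are stand-ins here, not the real code):

    use strict;
    use warnings;

    my $MAX_CHILDREN = 6;
    my @jobs = 1 .. 20;                  # stand-in for the real job list
    my %kids;

    while ( my $job = shift @jobs ) {
        # throttle: block until a slot frees up before forking a replacement
        if ( keys %kids >= $MAX_CHILDREN ) {
            my $done = waitpid( -1, 0 );
            delete $kids{$done};
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            # child: the real CPU-bound work happens here
            exit 0;
        }
        $kids{$pid} = 1;                 # parent: remember this child
    }
    1 while waitpid( -1, 0 ) > 0;        # reap the last children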
I started by executing the program with no DISTANT_FORK or CAN_MIGRATE capabilities enabled. Since I was starting the program from the node with 4 CPUs, all the CPUs in that node were being used at an average of 80%. I think this is the expected behavior.
After adding DISTANT_FORK, CAN_MIGRATE, or both capabilities, I could see that the second node's CPUs were being used, but at an average of only 10%. The same average showed up on the first node too. Run this way, overall performance is so low that the same algorithm as a single process (or on a single node with all its CPUs in use) actually finishes the data processing faster.
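For the record, I enabled the capabilities from the shell before launching the program; if I have the syntax from the Kerrighed documentation right, it was:

    krgcapset -d +DISTANT_FORK
    krgcapset -d +CAN_MIGRATE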
The program itself does very little I/O and no explicit network operations (well, almost none, since each child process writes to its own exclusive file on the NFS root file system). I noticed that the network was being used heavily (I checked with ifconfig).
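Each child opens its output file something like this (the path is hypothetical), so there is no file shared between children:

    # every child writes only to its own file, keyed on its PID
    open my $out, '>', "/path/to/results/worker.$$" or die "open: $!";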
My questions:
1 - Am I using the scheduler correctly? Since the program forks 6 children, and as soon as one child finishes its job it terminates and another child is forked to replace it (see the sketch above), I believe executing a distant fork is better than migrating processes. But CPU usage is still too low.
2 - Is there anything specific I should do to get better CPU utilization?
3 - Is there any way to check network usage (and try to reduce it)?
4 - Can the regular tools used to find bottlenecks on Linux (vmstat, iperf, ntop, and sar, for example) be used within a Kerrighed cluster?
Thanks,
Alceu