Message boards : Number crunching : 1 task takes Very Very long time.
Author | Message |
---|---|
Joined: 28 Jan 18 Posts: 3 Credit: 60,935 RAC: 0 |
https://boinc.multi-pool.info/latinsquares/result.php?resultid=437216883 name: odlkmax_26000_1672832179.366460_2 Env: Linux (Fedora 37) It has already been running for over 4 hours. Has it found something special, or is it just a bug? The other tasks work fine. |
Joined: 6 Sep 21 Posts: 2 Credit: 13,240,110 RAC: 37,901 |
I get those long buggy tasks all the time. It is definitely a bug 🛠, and the task will finish as an error too, so it is better to kill ✋ it. My screen is usually off, so I never see them in time to kill them. Server state: Over; Outcome: Computation error; Client state: Compute error; Exit status: 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED; Computer ID: 110328; Run time: 7 hours 56 min 13 sec; CPU time: 7 hours 51 min 33 sec; Validate state: Invalid; Credit: 0.00; Device peak FLOPS: 4.27 GFLOPS; Application version: odlk3@home v1.00 windows_x86_64 |
Joined: 28 Jan 18 Posts: 3 Credit: 60,935 RAC: 0 |
Thank you. |
Joined: 7 Nov 17 Posts: 29 Credit: 13,770,308 RAC: 187 |
This is a real bug in the task. Unfortunately, there are more than 50k such tasks. The most annoying thing is that if we just kill the task on the client, it is simply transferred to another client. I have studied such errors. Mostly the reason is that the squares at the beginning and end of the problem are swapped. As a result, it would take several weeks to get the result, and that is not possible in the BOINC client (this is how it is written in the code, and this is normal). Such tasks will finally be rejected by the server after 99999 iterations. If, for example, one iteration takes one day, then 99999 x 1 day = 99999 days, and 99999 / 365 ≈ 274. That is, it will take about 274 years for one such task to be cancelled. |
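The estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in the post: the 99999-iteration limit and the one-day-per-iteration figure come from the post, while the function name and the 365-day year are assumptions made for illustration.

```python
DAYS_PER_YEAR = 365  # same rough figure as in the post; leap years are ignored


def years_until_rejected(max_iterations: int, days_per_iteration: float) -> float:
    """Wall-clock years that pass before the iteration limit is reached."""
    return max_iterations * days_per_iteration / DAYS_PER_YEAR


if __name__ == "__main__":
    # 99999 iterations at one day each, as in the example from the post.
    print(f"{years_until_rejected(99_999, 1.0):.2f} years")
```

With one day per iteration the result is just under 274 years; a faster or slower iteration scales the figure linearly.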
Joined: 22 Oct 17 Posts: 3083 Credit: 0 RAC: 0 |
"I have studied such errors." Demis, please provide an example of such a task, one with two squares (the beginning and the end) that are swapped. |
Joined: 7 Nov 17 Posts: 29 Credit: 13,770,308 RAC: 187 |
input_odlkmax_27_0076651_00105.txt
input_odlkmax_27_0076651_00105-to-check.txt
0 1000000
0 9 8 4 7 6 3 2 5 1
2 1 4 6 5 3 8 9 0 7
6 0 2 9 8 4 1 3 7 5
1 7 6 3 9 8 4 5 2 0
9 2 3 0 4 7 5 8 1 6
8 3 0 1 2 5 7 6 9 4
7 4 5 8 0 9 6 1 3 2
3 5 9 2 6 1 0 7 4 8
4 6 7 5 1 2 9 0 8 3
5 8 1 7 3 0 2 4 6 9

0 9 8 4 7 6 3 2 5 1
8 1 7 5 3 2 9 4 0 6
4 5 2 9 6 0 1 3 7 8
9 7 6 3 1 8 4 0 2 5
6 2 0 1 4 7 5 8 9 3
3 8 1 6 2 5 0 9 4 7
7 4 5 8 0 9 6 1 3 2
2 3 9 0 5 1 8 7 6 4
1 6 3 7 9 4 2 5 8 0
5 0 4 2 8 3 7 6 1 9

./main_mosch_kf_2 input.txt output.txt 99999
PowerMeter2 intervals LS10
Start time:2023-01-18.15:39:06
File 'output.txt' created.
Line N27 start:
0 9 8 4 7 6 3 2 5 1
2 1 4 6 5 3 8 9 0 7
6 4 2 5 9 8 1 3 7 0
1 7 5 3 6 0 4 8 9 2
8 0 1 9 4 7 5 6 2 3
7 3 0 1 2 5 9 4 6 8
4 2 7 8 1 9 6 0 3 5
3 5 9 0 8 4 2 7 1 6
9 6 3 7 . . . . 8 .
5 . . . . . . . . 9
Range end. Desired square not reached.23-01-23.02:48:44
File 'out-not-finded-last-dirty-ls-99999.txt' created.
The dirty square of the end point of the search is recorded.
0 9 8 4 7 6 3 2 5 1
2 1 4 6 8 3 9 5 0 7
1 4 2 5 0 8 7 3 9 6
8 7 6 3 1 9 4 0 2 5
9 2 5 0 4 7 8 6 1 3
3 8 7 9 2 5 0 1 6 4
4 5 1 8 3 0 6 9 7 2
6 0 9 2 5 4 1 7 3 8
7 6 3 1 9 2 5 4 8 0
5 3 0 7 6 1 2 8 4 9
See file: out-not-finded-last-dirty-ls-99999.txt
Time work: 181095 sec
Finish time:2023-01-23.02:48:44
Program is end . . .

last dls (dirty latin squares, file out-not-finded-last-dirty-ls-99999.txt)
0 9 8 4 7 6 3 2 5 1
2 1 4 6 8 3 9 5 0 7
1 4 2 5 0 8 7 3 9 6
8 7 6 3 1 9 4 0 2 5
9 2 5 0 4 7 8 6 1 3
3 8 7 9 2 5 0 1 6 4
4 5 1 8 3 0 6 9 7 2
6 0 9 2 5 4 1 7 3 8
7 6 3 1 9 2 5 4 8 0
5 3 0 7 6 1 2 8 4 9 |
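Whether the start and end squares of an input are swapped can be checked mechanically. The sketch below takes the two squares from the input file quoted above, confirms that each is a Latin square, and compares them. It assumes that "order" means row-major lexicographic order of the cells; that is a guess about how the project enumerates squares, not something the thread states, and the function names are made up for this example.

```python
from typing import List

Square = List[List[int]]

# Start and end squares copied from the input file quoted in the post above.
START = """
0 9 8 4 7 6 3 2 5 1  2 1 4 6 5 3 8 9 0 7  6 0 2 9 8 4 1 3 7 5
1 7 6 3 9 8 4 5 2 0  9 2 3 0 4 7 5 8 1 6  8 3 0 1 2 5 7 6 9 4
7 4 5 8 0 9 6 1 3 2  3 5 9 2 6 1 0 7 4 8  4 6 7 5 1 2 9 0 8 3
5 8 1 7 3 0 2 4 6 9
"""
END = """
0 9 8 4 7 6 3 2 5 1  8 1 7 5 3 2 9 4 0 6  4 5 2 9 6 0 1 3 7 8
9 7 6 3 1 8 4 0 2 5  6 2 0 1 4 7 5 8 9 3  3 8 1 6 2 5 0 9 4 7
7 4 5 8 0 9 6 1 3 2  2 3 9 0 5 1 8 7 6 4  1 6 3 7 9 4 2 5 8 0
5 0 4 2 8 3 7 6 1 9
"""


def parse_square(text: str, n: int = 10) -> Square:
    """Split whitespace-separated cells into n rows of n integers."""
    values = [int(tok) for tok in text.split()]
    if len(values) != n * n:
        raise ValueError(f"expected {n * n} values, got {len(values)}")
    return [values[i * n:(i + 1) * n] for i in range(n)]


def is_latin(square: Square) -> bool:
    """True if every row and every column contains each of 0..n-1 once."""
    n = len(square)
    symbols = set(range(n))
    rows_ok = all(set(row) == symbols for row in square)
    cols_ok = all({row[c] for row in square} == symbols for c in range(n))
    return rows_ok and cols_ok


def in_expected_order(start: Square, end: Square) -> bool:
    """Assumed ordering: row-major lexicographic comparison of the cells."""
    flat_start = [v for row in start for v in row]
    flat_end = [v for row in end for v in row]
    return flat_start < flat_end


if __name__ == "__main__":
    start, end = parse_square(START), parse_square(END)
    print("both Latin:", is_latin(start) and is_latin(end))
    print("start before end:", in_expected_order(start, end))
```

Under this assumed ordering the two squares differ first in the second row (2 versus 8), so the start square precedes the end square.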
Joined: 22 Oct 17 Posts: 3083 Credit: 0 RAC: 0 |
input_odlkmax_27_0076651_00105.txt What makes you think that these squares are swapped? They are not swapped; the second square is "larger" than the first, as it should be. The problem is that there are almost no CF SN DLS between these squares. Start the CF generation from the first (starting) square and see how the generation goes. It is unlikely that a million CF SN DLS will be accumulated in this interval. That is why the task hangs. But it is difficult or even impossible to track this in advance. Generation goes like this:
. . . . . . . . .
SN DLS: 56000000 CF: 14508 time: 320 sec
SN DLS: 56500000 CF: 14508 time: 323 sec
SN DLS: 57000000 CF: 14508 time: 326 sec
SN DLS: 57500000 CF: 14508 time: 329 sec
SN DLS: 58000000 CF: 14508 time: 332 sec
SN DLS: 58500000 CF: 14508 time: 334 sec
SN DLS: 59000000 CF: 14508 time: 337 sec
SN DLS: 59500000 CF: 14508 time: 341 sec
SN DLS: 60000000 CF: 14508 time: 345 sec
SN DLS: 60500000 CF: 14508 time: 347 sec
SN DLS: 61000000 CF: 14508 time: 350 sec
SN DLS: 61500000 CF: 14508 time: 353 sec
SN DLS: 62000000 CF: 14508 time: 356 sec
SN DLS: 62500000 CF: 14508 time: 358 sec
SN DLS: 63000000 CF: 14508 time: 361 sec
SN DLS: 63500000 CF: 14508 time: 363 sec
SN DLS: 64000000 CF: 14508 time: 366 sec
SN DLS: 64500000 CF: 14508 time: 369 sec
SN DLS: 65000000 CF: 14508 time: 373 sec
SN DLS: 65500000 CF: 14508 time: 378 sec
SN DLS: 66000000 CF: 14508 time: 381 sec
SN DLS: 66500000 CF: 14508 time: 384 sec
SN DLS: 67000000 CF: 14508 time: 387 sec
SN DLS: 67500000 CF: 14508 time: 391 sec
SN DLS: 68000000 CF: 14508 time: 396 sec
SN DLS: 68500000 CF: 14508 time: 399 sec
SN DLS: 69000000 CF: 14508 time: 403 sec
SN DLS: 69500000 CF: 14508 time: 407 sec
SN DLS: 70000000 CF: 14508 time: 410 sec
SN DLS: 70500000 CF: 14508 time: 414 sec
SN DLS: 71000000 CF: 14508 time: 418 sec
. . . . . . . . . . . . . .
PS. I don't understand anything you wrote next. Perhaps ice00 understands this. |
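The stall visible in the generation log above (the square counter keeps growing while the CF counter stays at 14508) can be detected automatically. The following is a rough sketch, not part of the project's tooling: it assumes each progress line carries exactly three integers in the order squares generated, CF found, elapsed seconds, and it ignores the text labels entirely. All names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple


class Progress(NamedTuple):
    squares: int   # squares generated so far
    cf: int        # CF found so far
    seconds: int   # elapsed time in seconds


def parse_progress(lines: Iterable[str]) -> List[Progress]:
    """Keep only lines that contain exactly three integers."""
    records = []
    for line in lines:
        numbers = re.findall(r"\d+", line)
        if len(numbers) == 3:
            records.append(Progress(*(int(x) for x in numbers)))
    return records


def cf_stalled(records: List[Progress]) -> bool:
    """True if the CF counter did not move between first and last record."""
    return len(records) >= 2 and records[0].cf == records[-1].cf


def squares_per_second(records: List[Progress]) -> float:
    """Average generation rate between the first and last record."""
    first, last = records[0], records[-1]
    return (last.squares - first.squares) / (last.seconds - first.seconds)


if __name__ == "__main__":
    # First and last lines of the log excerpt quoted above.
    sample = [
        "SN DLS: 56000000 CF: 14508 time: 320 sec",
        "SN DLS: 71000000 CF: 14508 time: 418 sec",
    ]
    records = parse_progress(sample)
    print("CF stalled:", cf_stalled(records))
    print("rate:", round(squares_per_second(records)), "squares/sec")
```

On the excerpt, 15 million squares are generated in 98 seconds without a single new CF, which is the situation described in the post: the interval is being scanned at full speed but contains almost nothing to collect.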
Joined: 22 Oct 17 Posts: 3083 Credit: 0 RAC: 0 |
By the way, if this task is always "killed", then the CF SN DLS that lie in this interval will never be checked. |
Joined: 7 Nov 17 Posts: 29 Credit: 13,770,308 RAC: 187 |
I took an example from this year, one that was on my client on 01/05/2023. In general, the problematic tasks (those that run for more than 4 hours) that reach a client's computer are of several types:
1. The first and second squares are from different rulers.
2. The first and second squares are from the same ruler, but the distance between them is excessively large (it does not fit into the standard working time of the client).
3. The first and second squares are from the same ruler, but the distance between them is large (it does not fit into the 99,999-iteration limit after which the server ends the task).
4. The first and second squares are from the same ruler but are swapped (very often the real distance between them == (the size of the entire ruler) - 1,000,000).
5. There were a few more variants; they are mostly mixed variations of the first four.
The specific example shown is option #3. In this example, it can be seen that after 99,999 iterations of 1,000,000 LS each (99,999 * 1,000,000 = 99,999,000,000), the task will not be completed. I showed above how to calculate the approximate time of one iteration. It took me about 5 days to compute it. Ask ice00 in a week or two what stage the input_odlkmax_27_0076651_00105.txt task is at. |
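The distance arithmetic behind options 3 and 4 can be sketched as follows. The figures of 1,000,000 LS per iteration and 99,999 iterations come from the post; everything else is an assumption made for illustration: squares are treated as integer positions on a ruler that wraps around, and the ruler size used in the example is an arbitrary made-up number.

```python
SQUARES_PER_ITERATION = 1_000_000   # from the post
MAX_ITERATIONS = 99_999             # from the post


def forward_distance(start_pos: int, end_pos: int, ruler_size: int) -> int:
    """Squares to walk from start to end, assuming the ruler wraps around."""
    return (end_pos - start_pos) % ruler_size


def iterations_needed(distance: int) -> int:
    """Whole iterations required to cover the distance (rounded up)."""
    return (distance + SQUARES_PER_ITERATION - 1) // SQUARES_PER_ITERATION


def fits_server_limit(distance: int) -> bool:
    """True if the distance can be covered before the iteration limit."""
    return iterations_needed(distance) <= MAX_ITERATIONS


if __name__ == "__main__":
    ruler_size = 500_000_000_000    # hypothetical size, not a project value
    start, end = 0, 1_000_000       # a normal task: one iteration apart

    normal = forward_distance(start, end, ruler_size)
    swapped = forward_distance(end, start, ruler_size)

    print("normal :", normal, fits_server_limit(normal))
    print("swapped:", swapped, fits_server_limit(swapped))
    print("limit  :", MAX_ITERATIONS * SQUARES_PER_ITERATION)
```

In this model a swapped pair yields a distance of the ruler size minus 1,000,000, matching the observation in option 4, and any distance above 99,999,000,000 exceeds what the iteration limit allows.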
Joined: 7 Nov 17 Posts: 29 Credit: 13,770,308 RAC: 187 |
odlk3_29028_1676163624.111103_1 https://boinc.multi-pool.info/latinsquares/result.php?resultid=442895478 odlkmax_19753_1675303115.839181_7 https://boinc.multi-pool.info/latinsquares/result.php?resultid=442268818 |
Joined: 7 Nov 17 Posts: 29 Credit: 13,770,308 RAC: 187 |
I have studied such errors. https://boinc.multi-pool.info/latinsquares/result.php?resultid=448332584 |
Joined: 22 Oct 17 Posts: 3083 Credit: 0 RAC: 0 |
"I have studied such errors." I don't see any squares at the given link. I don't have access to the server. Task 386752768 |
©2024 Progger & Stefano Tognon (ice00) & Reese