[VU+ Solo SE] The box hangs with build 5.0.025, how to trace?



Boulder
30-08-17, 19:29
My SE V2 box hangs unexpectedly every once in a while during standby mode, and I need to turn the power off and back on again to make it work. I can watch things over my NFS share (used as the HDD replacement for the box) and then put the box into standby. Some time during the next day, the box won't wake up with the remote.

This issue first occurred with 5.0.021 and I was able to make things work by downgrading to 5.0.017 and restoring settings. After testing 5.0.025 for some days, I noticed that it also hangs. This time downgrading to 5.0.017 didn't help, it just hung today the exact same way. I have now once again tried reinstalling 5.0.017 (and imported settings I know to have worked for months). We'll see tomorrow if the hangs continue also with the old version.

It is rather irritating, and I would very much like to find out why this happens. Is there a way to debug it properly so I can see what happens during standby mode? I have a feeling that this has something to do with the network share because I sometimes cannot access my NFS or CIFS shares from VU+. I can access them from other devices on my network so the actual shares or firewall settings etc. are not the problem.
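One way to test the network-share suspicion is to log, at regular intervals, whether the share still answers. The following is only a rough sketch: it assumes a Python 3 interpreter is available wherever it runs (the thread does not say which Python the image ships), and `probe_share`, `log_probe` and any paths passed to them are made-up names, not part of the image.

```python
import subprocess
import sys
import time

# Hypothetical helper code; none of these names come from the image itself.
LIST_DIR = "import os, sys; os.listdir(sys.argv[1])"


def probe_share(path, timeout_s=10.0):
    """Return 'ok', 'error' or 'timeout' for one directory listing of path.

    The listing runs in a child process, so a stalled share normally blocks
    the child instead of this script. Caveat: after a timeout the child is
    killed and waited for, so if the kernel refuses to kill it this call can
    still block.
    """
    cmd = [sys.executable, "-c", LIST_DIR, path]
    try:
        done = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if done.returncode == 0 else "error"


def log_probe(path, logfile, timeout_s=10.0):
    """Append one timestamped probe result to logfile and return the status."""
    status = probe_share(path, timeout_s)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(logfile, "a") as fh:
        fh.write("%s %s %s\n" % (stamp, path, status))
    return status
```

Calling `log_probe` once a minute (from cron or a simple loop) with the log file kept on local storage rather than on the share itself would leave a record of the share's state right up to the moment of a hang.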

gear259
30-08-17, 22:06
Check your deep standby settings

birdman
31-08-17, 00:43
"This issue first occurred with 5.0.021 and I was able to make things work by downgrading to 5.0.017 and restoring settings. After testing 5.0.025 for some days, I noticed that it also hangs. This time downgrading to 5.0.017 didn't help, it just hung today the exact same way."
Which sounds as though it's nothing to do with 5.0.017 vs 5.0.02x but rather a random issue (such as timing of events...).

How is your NFS server set-up? Does it have a static/fixed IP address and is the server permanently up?

Boulder
31-08-17, 04:06
Thanks for the replies.

All the devices on my local network have a static IP based on their MAC address and the server is up permanently. As it is located on my desktop PC, it can sometimes be down for a moment if I have to reboot. I do recall that the first time the problem occurred (with 5.0.021), I had changed the share to use AUTOFS as I read that it was recommended. Now it is set back to FSTAB and I have also enabled automount to occur every 4 hours. Those settings worked without issues throughout the summer. At least the box hasn't hung now after it's been in standby mode for several hours.

birdman
31-08-17, 11:25
"I had changed the share to use AUTOFS as I read that it was recommended."
It's certainly what I would use...

"Now it is set back to FSTAB and I have also enabled automount to occur every 4 hours."
Why would you have both? They'd need to be mounting to different places.
And automount is a running daemon process, not something that runs "every 4 hours".

Boulder
31-08-17, 11:40
"It's certainly what I would use...

Why would you have both? They'd need to be mounting to different places.
And automount is a running daemon process, not something that runs 'every 4 hours'."

I would expect that if I have a share first set to FSTAB and then change it to AUTOFS, the mount is removed from the fstab file, so they are probably not both active at the same time.

The GUI says that auto remount occurs every xx hours if it's enabled so that's what a normal user expects :)

I guess I could try upgrading to 5.0.025 once again. This time I'll also remove the mounts and make sure that fstab has no entries for them. The problem last time was that the network browser didn't find my shares on the desktop PC, whereas other devices found them just fine.

Boulder
31-08-17, 16:53
I just upgraded to 5.0.025 and made sure that AUTOFS is on and fstab has no entries for the share.

What bothers me is that the network browser doesn't find any shares on my PC, although there are many: two public NFS shares and several CIFS ones. This means that if I now delete the existing mount, I won't be able to get it back by browsing to it. I only use one share and have had no need to change anything, but I first ran into the problem when I upgraded the first time and set everything up from scratch.

birdman
01-09-17, 02:07
"The GUI says that auto remount occurs every xx hours if it's enabled so that's what a normal user expects :)"
But that's not how automount (== autofs) works. It mounts on demand, and tries to unmount every 10 minutes (by default - can be configured) and ignores any failure to do so (such as it still being in use).
[When you have ~10,000 potential mounts it's the only way to handle things. It does work...]
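To make the difference from "remount every xx hours" concrete, here is a toy model of the behaviour described above. It is not how autofs is implemented, just a sketch of the semantics as stated in this post; `OnDemandMount`, `mounted_after` and `UNMOUNT_PERIOD_S` are made-up names.

```python
UNMOUNT_PERIOD_S = 600  # the 10-minute default mentioned above


class OnDemandMount:
    """Toy model: mount on first access, unmount only when not in use."""

    def __init__(self):
        self.mounted = False
        self.in_use = False

    def access(self):
        """Touching the path mounts the share if it is not mounted yet."""
        if not self.mounted:
            self.mounted = True
        return self.mounted

    def try_unmount(self):
        """One periodic attempt; a busy share is simply left mounted."""
        if self.mounted and not self.in_use:
            self.mounted = False
        return not self.mounted


def mounted_after(share, idle_s, period_s=UNMOUNT_PERIOD_S):
    """Apply the unmount attempts that fall within idle_s seconds."""
    for _ in range(int(idle_s // period_s)):
        share.try_unmount()
    return share.mounted
```

In this model nothing is ever remounted on a schedule: a share that was unmounted while idle only comes back the next time something calls `access()`.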

Boulder
01-09-17, 16:18
And the box is now frozen. It was put to standby mode last evening, less than 24 hours ago.

I'm going to roll back to 5.0.017 and restore the settings from a backup which I know works, then see if it hangs during the weekend, unless you have some idea of what to try next.

EDIT: Based on how hot the top of the box is, I'd say it's running the CPU at 100%. Still, it doesn't respond to the remote nor can I enter the web interface.

Boulder
04-09-17, 20:07
It's still working fine with 5.0.017. If I upgrade to 5.0.025, it's certain that within the next 24 hours, it's a jammed box. v5.0.017 still has the issue that the network browser doesn't find the shares on my PC but it works as long as the old mount is kept there.

Boulder
06-09-17, 18:49
OK, just had the hang in 5.0.017.

What I noticed this time is that the hang probably happened during a timer recording. I had auto timers today and one of them was supposed to record something between 17:59 and 19:10. When I noticed that the box was not responding, I turned it off and on again, then checked the folder where the recorded files are. The recorded file was over 2.5 hours long, which means that it never stopped recording and probably kept on until I switched the power off.
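For reference, the numbers work out as follows. This is just the arithmetic on the times given above, taking two and a half hours as a lower bound for the file length; `overrun` is a made-up helper name.

```python
from datetime import datetime, timedelta


def overrun(start, end, recorded, fmt="%H:%M"):
    """Return (scheduled length, extra time recorded) for a same-day timer."""
    # Does not handle timers that cross midnight.
    scheduled = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return scheduled, recorded - scheduled


scheduled, extra = overrun("17:59", "19:10", timedelta(hours=2, minutes=30))
print(scheduled)  # 1:11:00
print(extra)      # 1:19:00
```

So the timer was meant to run for 71 minutes, and the file is at least 79 minutes longer than that.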

I have now enabled debug logging and will see what goes on there when something like this happens again.

ccs
06-09-17, 19:00
If you highlight a completed timer in the timer-list and press info, you'll get a lot more detail of what it did.

Boulder
06-09-17, 19:25
Weird, the botched recording shows nothing about the actual recording there. Just two entries, both from the time when the timer was created by the autotimer.

birdman
07-09-17, 01:14
"The recorded file was over 2.5 hours long which means that it never stopped recording"
...which is not too surprising given that the controlling process had hung and hence couldn't stop it.

Do you have a debug log from when this happened (i.e. the one that was running when the box hung and this recording overran)?

Boulder
07-09-17, 03:52
"Do you have a debug log from when this happened (i.e. the one that was running when the box hung and this recording overran)?"

No, I only enabled debug logging after the most recent hang. I can try reproducing the issue quicker by upgrading to 5.0.025, as it doesn't seem to take longer than 1-2 days to occur then.