Hello, we have a new Openfiler server that we are having a consistent problem with. The intent is to provide 5TB of storage to a service running on a Windows server. We're running 2.3 x86_64, with all conary updates, and our AD is hosted on a pair of 2003R2 servers.
The Openfiler server is joined to the domain and normally everything works fine, with the service reading and serving out on the order of 100GB/day. However, every Thursday morning, exactly 7 days (168 hours, 10080 minutes) apart, the service stops being able to connect and Openfiler starts logging the following error in one of the Samba logs:
[2010/03/04 11:13:26, 1] smbd/sesssetup.c:342(reply_spnego_kerberos)
Failed to verify incoming ticket with error NT_STATUS_LOGON_FAILURE!
Here's what we know when this happens:
"wbinfo -t", "wbinfo -u", "net ads status", "smbstatus" all appear fine (though smbstatus does not show an open session for this particular service user, of course).
Time skew is effectively 0.
/etc/hosts and /etc/resolv.conf point to the AD DNS servers and AD name, and all AD servers are explicitly listed in /etc/hosts as well.
Other users on the same Windows server can connect to the same share just fine.
The user that the service runs under, when logged in interactively, can see the same share fine.
Restarting the service on Windows does not help (in fact, the Windows server is rebooted every Saturday night as well).
Restarting winbindd and samba on Openfiler does not help.
Removing Openfiler from AD and rejoining (net ads leave; net ads join) does not help.
Active Directory does not list any errors in the Security event log.
Rebooting Openfiler... well, that's unknown. The first time this happened, we rebooted Openfiler early on, but the problem persisted for at least 5 minutes after Samba was back up, then suddenly started working. The second time, we rebooted some minutes later, and the problem went away several minutes after reboot. Today, we tried everything SHORT of rebooting for at least 20 minutes, then rebooted as a last resort, and things worked immediately upon reboot.
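To make the next occurrence easier to compare against a healthy state, the status checks listed above could be captured with timestamps by a small script. This is only a rough sketch: the log path and the `run_check` helper are made-up names for illustration, and it only runs the same commands already mentioned in this post, nothing specific to Openfiler.

```shell
#!/bin/sh
# Hypothetical log location (an assumption); override with LOG=/some/path.
LOG="${LOG:-/tmp/smb-kerberos-diag.log}"

# run_check is a hypothetical helper: it runs one command and appends
# a timestamped header, the command's output, and its exit status to $LOG.
run_check() {
    {
        echo "=== $(date '+%Y/%m/%d %H:%M:%S') $*"
        "$@" 2>&1
        echo "--- exit status: $?"
    } >> "$LOG"
}

# The same checks described in the list above.
run_check wbinfo -t
run_check wbinfo -u
run_check net ads status
run_check smbstatus
```

Running it once while everything works and again during a failure would give two logs that can be diffed. Note that `wbinfo -u` can produce a lot of output on a large domain.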
We also have a test Openfiler, running the same version (but x86, not x86_64), connected to a different AD. This test version does not exhibit the same behavior, and there are no other glaring differences.
Wondering if there are any known issues with long-running connections, or any hints for what else to look at the next time this happens. Our plan right now is to try to shift this issue from Thursday morning (heavy traffic) to our maintenance window on Saturday night, then figure out the cause.
Any help or pointers would be appreciated!