As a consultant I get to see a lot of errors (who calls in
the consultant when things are working fine?). One of the most common errors to
see on a domain controller is Netlogon event ID 5807. I see this
error on so many clients’ domain controllers that I have a PowerShell script
just to find the culprits.
Let’s start at the beginning. AD Sites and Services is used
to define subnets so that a client can figure out which site it is in
and which domain controllers it should use.
Sounds simple enough, right? But what happens when there is
no subnet defined?
Human logic would say, “Go to the closest DC then”.
But just how does the client PC figure out which DC is closest?
Simple answer: it can’t.
A client receives a list of the DCs offering services
according to the sites defined in AD Sites and Services. These are what guide
our PCs to the correct domain controllers for AD services. When you create
subnets and bind them to sites you are setting up the connections for the
clients. So if you bind the subnet 10.0.0.0/24 to Site1, any client in that
subnet will use the DCs in Site1.
Now if we have a subnet that is not defined in AD Sites and
Services, say 10.1.1.0/24, the client will not get a list of the DCs in its
site, but will randomly select one of the DCs in your domain. This sounds good
at first until you think about that satellite sales office in Singapore that
has a congested 512K WAN link, and the fact that now PCs in the London office are
using the DC there for logon services!
Now let’s take a look at the System event log on a DC in a
domain where there are subnets that haven’t been defined. We’re all busy and we
totally meant to define the 10.1.1.0/24 subnet that was used for the new
expansion of the London office but somehow it just slipped through the cracks.
Now we see in the system log the NETLOGON event ID: 5807.
Here’s the error text:
During the past 4.15 hours there have been 132 connections to this Domain
Controller from client machines whose IP addresses don't map to any of the
existing sites in the enterprise. Those clients, therefore, have undefined
sites and may connect to any Domain Controller including those that are in far
distant locations from the clients. A client's site is determined by the
mapping of its subnet to one of the existing sites. To move the above clients
to one of the sites, please consider creating subnet object(s) covering the
above IP addresses with mapping to one of the existing sites. The names and IP
addresses of the clients in question have been logged on this computer in the
following log file '%SystemRoot%\debug\netlogon.log' and, potentially, in the
log file '%SystemRoot%\debug\netlogon.bak' created if the former log becomes
full. The log(s) may contain additional unrelated debugging information. To
filter out the needed information, please search for lines which contain text
'NO_CLIENT_SITE:'. The first word after this string is the client name and the
second word is the client IP address. The maximum size of the log(s) is
controlled by the following registry DWORD value
'HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters\LogFileMaxSize';
the default is 20000000 bytes. The current maximum size is 20000000 bytes. To
set a different maximum size, create the above registry value and set the
desired maximum size in bytes.
This DC has seen 132 connections, and the error is telling
us to go look in the log file to see what IPs are the culprits. It’s such a
nice error that it even gives us the path to the file so that we don’t have to
go hunt for it!
So let’s copy the path from the error and paste it into the
Run box. Make sure you don’t copy the single quotes!
The file will look something like this:
So we can see that the offending IPs are in the 10.1.1.x
range, which is the subnet we forgot to create for the London office. The
good news is that the clients are able to log in. The bad news is that this is
the DC in the Singapore office that has a congested WAN link!
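
If the log is long, you don't have to scroll through it by eye. Here is a minimal sketch that pulls out only the lines carrying the 'NO_CLIENT_SITE:' marker the event text mentions. The function name Get-NoClientSiteEntry is made up for this example, and the default path assumes you run it on the DC itself.

```powershell
# Hypothetical helper: return only the NO_CLIENT_SITE lines from a Netlogon log.
function Get-NoClientSiteEntry {
    param(
        # Default is the path given in the event text; override it to read a copy.
        [string]$Path = "$env:SystemRoot\debug\netlogon.log"
    )
    # -SimpleMatch treats the pattern as literal text, not a regex.
    Select-String -Path $Path -SimpleMatch 'NO_CLIENT_SITE:' |
        ForEach-Object { $_.Line }
}

# Usage on the DC itself:
# Get-NoClientSiteEntry
```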
The solution to the problem is simple: create a subnet for
10.1.1.0/24 and bind it to the London site. Once replication happens, all the
clients that log in from PCs within that subnet will be directed to the London
DCs.
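
You can do this in the Sites and Services console, or from PowerShell. The sketch below assumes the ActiveDirectory module that ships with Windows Server 2012 or later (or RSAT), and it assumes the site object is literally named "London". Check the real site name in your forest first.

```powershell
# Requires the ActiveDirectory module and rights to edit the Sites container.
Import-Module ActiveDirectory

# 'London' is an assumed site name; replace it with the site's actual name.
New-ADReplicationSubnet -Name '10.1.1.0/24' -Site 'London'

# Confirm which site the new subnet object is bound to.
Get-ADReplicationSubnet -Identity '10.1.1.0/24' | Select-Object Name, Site
```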
The resolution for this error is so easy that you might
be getting complacent right now. The problem is that in most mid/large companies
the Network department is the one setting up the physical subnets and the
Server team is responsible for creating the subnets in Sites and Services. This
is where a disconnect usually occurs. A new subnet is set up for clients in a
satellite office by the Network team.
The Server team now needs to create the subnet in AD, but the Network
team assigns the ticket to the wrong team, or the information gets lost by the
Server team. There are a thousand ways for it to slip through the cracks.
We rarely have enough time to go through our event
logs, and not everyone has log monitoring in place (try the demo version
of System Center!). So what I use is a simple script that I run on the first of
each month. It grabs the contents of the Netlogon.log files on all my
DCs and returns a CSV of the unique IPs seen in the last month. This way I
can see any subnets that haven’t been defined and define them.
It also helps to catch things like that wireless signal that
was set up just for the test lab and couldn’t possibly connect to the
production network, because you made sure that it was isolated... but hey, look at
that, there’s your laptop trying to log in to our production AD from that
wireless segment. Guess that “isolated test network” wasn’t so isolated after
all, huh?
So here’s the script. It’s not very complicated, but there
are a few twists to it that make it work, so let’s break it down.
First we set up a function to grab all the domain
controllers in the domain and stuff them into an array.
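
A minimal sketch of that first function, assuming the ActiveDirectory module is available. The name Get-DomainControllerName is made up for this example, and the original script may well use a different lookup.

```powershell
# Requires the ActiveDirectory module (RSAT or a domain controller).
Import-Module ActiveDirectory

# Hypothetical helper: emit the DNS host name of every DC in the current domain.
function Get-DomainControllerName {
    Get-ADDomainController -Filter * |
        Select-Object -ExpandProperty HostName
}

# @() keeps the result an array even if the domain has only one DC.
$dcs = @(Get-DomainControllerName)
```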
Next we set up a function that will connect to a server’s c$
share to grab the netlogon.log file and push the contents into an array. The
annoying thing about the log file is that it simply appends to the end of the
file but has no year in the date stamp. I’m only really interested in last
month’s entries, but if I match just June I’ll get the entries from this June and
every other June for the past few years!
The trick is to use
the [array]::Reverse function to flip the array around so that
you are effectively reading the log file from the end. Now we just need to loop
through the array until we get an entry that is not from this month (since I run it
on the 1st of the month) or last month. As soon as we see something
from two months ago we break out of the loop. I added a little bit of logic in
that I also check to see if the IP address column is actually an IP address
using a regex match. The reason for this is that if you have the logging level
turned up (a registry entry) the value won’t be an IP address, so we can skip
it.
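
Here is a rough sketch of that loop. It is not the original function. It assumes every log line starts with an MM/DD date stamp and that, as the event text says, the client name and IP are the first two words after 'NO_CLIENT_SITE:'. The function name, parameters and the simple IPv4 regex are mine. It takes the log lines as input, so you would feed it the output of Get-Content against the file on the DC's c$ share.

```powershell
# Hypothetical helper: walk the log backwards and emit the client IPs logged
# this month or last month, stopping at the first older entry.
function Select-RecentNoClientSite {
    param(
        [string[]]$Lines,
        [datetime]$Now = (Get-Date)
    )
    $thisMonth = $Now.Month
    $lastMonth = $Now.AddMonths(-1).Month
    $ipPattern = '^(\d{1,3}\.){3}\d{1,3}$'   # loose IPv4 shape check only

    # Reverse a copy so the caller's array is left untouched.
    $reversed = $Lines.Clone()
    [array]::Reverse($reversed)

    foreach ($line in $reversed) {
        # Lines without a leading MM/DD stamp are skipped.
        if ($line -match '^(\d{2})/\d{2}\s') {
            $month = [int]$Matches[1]
            # Anything older than last month ends the scan.
            if ($month -ne $thisMonth -and $month -ne $lastMonth) { break }
        } else {
            continue
        }
        if ($line -match 'NO_CLIENT_SITE:\s+(\S+)\s+(\S+)') {
            $ip = $Matches[2]
            # With verbose logging the second word may not be an IP; skip those.
            if ($ip -match $ipPattern) { $ip }
        }
    }
}
```

Because the scan stops at the first entry older than last month, entries from the same month in earlier years are never reached.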
Now that we have just the entries we want, we push
them into a variable and return that to the main script, where we use
Sort-Object -Unique to get only the unique IP addresses, because we
really don’t care that 10.1.1.2 has 55 entries in the log. We just need to know
it's there, not how many times the IP was logged.
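
As a small illustration of the de-duplication step (the addresses and the output file name here are invented for the example):

```powershell
# Made-up input: the same client shows up several times in the log.
$ips = '10.1.1.2', '10.1.1.7', '10.1.1.2', '10.1.1.2'

# Sort-Object -Unique collapses the repeats; @() keeps a single result an array.
$unique = @($ips | Sort-Object -Unique)

# One object per address so Export-Csv writes an IPAddress column.
$report = $unique | ForEach-Object { [pscustomobject]@{ IPAddress = $_ } }

# Hypothetical output path:
# $report | Export-Csv -Path .\missing-subnets.csv -NoTypeInformation
```

Note that the sort is a plain string sort, so the order is alphabetical rather than numeric. For this report only the uniqueness matters.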
The final part of the script is to set up your email
variables and send the report on its merry way. Just watch out for the $to
variable. You must use a comma-separated list of individually quoted addresses for the send-mail function
to work right when using multiple addresses.
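
To show what that looks like, here is a sketch that assumes the built-in Send-MailMessage cmdlet, which may not be what the original script calls. Every address, the SMTP server and the function name Send-SubnetReport are placeholders.

```powershell
# Each address is its own quoted string, which makes $to a two-element array.
# A single string like 'a@example.com, b@example.com' would be one element.
$to = 'serverteam@example.com', 'networkteam@example.com'

# Hypothetical wrapper; all values below are placeholders.
function Send-SubnetReport {
    param([string]$CsvPath)
    $mail = @{
        To          = $to
        From        = 'reports@example.com'
        Subject     = 'Clients with no AD site defined'
        Body        = 'Unique client IPs from netlogon.log are attached.'
        Attachments = $CsvPath
        SmtpServer  = 'smtp.example.com'
    }
    # Splatting passes the hashtable entries as named parameters.
    Send-MailMessage @mail
}
```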
Schedule this script to run on the 1st of each
month and you won’t have to worry about missing subnets anymore!
Here’s the full script. As always this is provided as is, use at your own risk.