Holiday Fun: Debugging a Home Network with Loggly

By Hoover J. Beaver 06 Jan 2015

After a recent rainstorm with accompanying 40+mph wind gusts, I was having intermittent Internet connectivity problems. There were days went connectivity would drop for 20-30 seconds dozens of times. Then there would be stretches of perfectly solid connectivity. After turning off some extra devices and access points to simplify the LAN, there was some improvement, but it wasn’t perfect. It was time to start getting serious about debugging the issue, and the Christmas holiday provided the time necessary to put things in place. I had hoped for more time away from my day job running technical operations at Loggly, but luckily I had the skills and the log management service to get the debugging done quickly.

Looking at the logs on the home router, I quickly discovered that it had been rebooting itself. How often was impossible to tell, since the old logs were wiped at each reboot. I needed to get the logs from this router and a few other devices in one place so I could nail down what was going wrong. This necessitated configuring the only spare machine I had lying around; an old Aluminum MacBook Pro with OS X 10.4 installed.

The setup was fairly trivial, following the basic ideas from a MacWorld OS X Hints article. Once configured, the router setup was updated to pass syslog events to the MacBook. But I didn’t want to just sit there with the logs streaming waiting for something to break. Being able create a simple visual to look at the state of my network would also make quick work of narrowing down which logs to examine. Thankfully, I could pass my events to Loggly and they would provide the tools to visualize and correlate issues.

The version of syslog available on the old MacBook did not allow for altering the log headers as necessary to pass on them onto Loggly. I was going to need to have a script to watch the syslog file on the MacBook and pass the events to Loggly. With that in mind, I added some functionality to reformat the events into JSON to simplify parsing. The resulting script below tails a syslog file, reformats the log lines at JSON, then sends the events to Loggly over HTTP/S.

	#!/bin/bash
	## Read from a logfile/tail and send out json to Loggly.
	## Does simple parsing on firewall events, but nothing else.

	tail -Fn0 /var/log/all.log \| while read line; do
	## Did this come from my router
	if [[ "${line}" == "192.168.1.254" ]]; then
	## I ignore the original timestamp from the log file and
	## replace it with current time.
	## That will be within ms of reality.
	## datetime=`echo $line \| awk '{ print $1,$2,$3 }'`
	datetime=`date -u +"%Y-%m-%dT%H:%M:%SZ"`
	## What subsystem of the router is logging
	source=`echo $line \| awk '{ print $6}' \| cut -f1 -d:`
	## The log message.
	value=`echo "$line" \| cut -d: -f4-`
	## echo "$source - $value"

	## If this is something from the firewall, break it
	## down further.
	if [[ "${source}" == "fw" ]]; then
	src=`echo $value \| awk '{ print $1 }' \| cut -f2 -d=`
	dst=`echo $value \| awk '{ print $2 }' \| cut -f2 -d=`
	ipprot=`echo $value \| awk '{ print $3 }' \| cut -f2 -d=`
	sport=`echo $value \| awk '{ print $4 }' \| cut -f2 -d=`
	dport=`echo $value \| awk '{ print $5 }' \| cut -f2 -d=`
	message=`echo $value \| awk '{ print $5 }' \| \
	cut -f2 -d= \| cut -f2-`
	curl -s -k -X POST https://logs-01.loggly.com/inputs/[AUTHTOKEN]/tag/uverse -d "{ \"timestamp\": \"${datetime}\", \"uverse\": { \"source\": \"${source}\", \"value\": { \"src\": \"$src\", \"dst\": \"$dst\", \"ipprot\": \"$ipprot\", \"sport\": \"$sport\", \"dport\": \"$dport\", \"msg\": \"$message\" }, \"raw\": \"$line\" } }" > /dev/null

	## Everything else... so far
	else
	curl -s -k -X POST https://logs-01.loggly.com/inputs/[AUTHTOKEN]/tag/uverse -d "{ \"timestamp\": \"${datetime}\", \"uverse\": { \"source\": \"${source}\", \"value\": \"${value}\", \"raw\": \"${line}\" } }" > /dev/null
	fi
	fi
	done

view raw gistfile1.txt hosted with ❤ by GitHub

The above would now catch when my router rebooted and any other noteworthy events, but wouldn’t necessarily identify my problem. Unfortunately, recent updates to the Apple Airport Utility software have made it impossible to get Airport access points to send syslog events to a server. I needed another method to probe those devices. I wrote another small script to ping each device on the network I cared about. The results of the ping would then get massaged into a JSON message and sent to Loggly. The script would check each internal access point, the internal and external interface of the router, the first hop from my house onto the Internet, and finally an additional Internet destination.

	#!/bin/bash

	## check-link2.sh
	## Andy Mayhew <andy@loggly.com>
	## Check the status of internal network devices and the uplink.
	## Simple ping of each device. Then a simple ping of each side of the
	## router, first hop onto the Internet,
	## and finally a "distant" device.
	## Log the ping response time or failure, build up a json event and
	## send to loggly.
	##
	## usage: nohup ./check-link2.sh &

	## List of devices
	devName=("inside" "outside" "firstHop" "web.mit.edu" "AP Closet")
	devIP=("192.168.1.254" "255.1.1.1" "255.1.2.3" "23.207.46.93" "192.168.1.250")

	## Declare some global variables and set defaults.
	tag="network2" ## tag our events in Loggly
	token="[AUTHTOKEN" ## Loggly Auth Token
	isUp="true"
	failedEvents=()

	## Sends an event to Loggly. Requires the log event as an parameter.
	sendEvent() {
	event=$1
	curl -s -k -X POST https://logs-01.loggly.com/inputs/${token}/tag/${tag} -d "${event}"
	}

	## Attempts to ping a device and record the result.
	processPing() {
	dest=$1; destName=$2
	results=`ping -c1 -t1 $dest \| tail -2`
	rtt=0
	packets=0
	up="false"
	if [[ "${results}" == "round-trip" ]]; then
	rtt=`echo ${results} \| awk '{ print $(NF-1)}' \| \
	cut -f1 -d/`
	packets=1
	up="true"
	fi

	echo "\"${destName}\": { \"name\": \"$destName\", \"dest\": \"$dest\", \"rtt\": $rtt, \"packets\": $packets, \"upStatus\": $up }"
	}

	## We just loop forever.
	## Could've fork, forked to daemonize, but chose to just
	## use nohup when I call the script.
	while true; do
	rtt=""
	## Cycle through the list of devices.
	for (( i = 0; i < ${#devIP[@]} ; i++ )) do
	## echo "$i - ${devIP[$i]} - ${devName[$i]}"
	if [ "${#rtt}" -eq 0 ]; then
	rtt=$(processPing "${devIP[$i]}" "${devName[$i]}")
	else
	rtt="${rtt}, $(processPing "${devIP[$i]}" "${devName[$i]}")"
	fi
	done

	## Time stamp for this event.
	timedate=`date -u +"%Y-%m-%dT%H:%M:%SZ"`

	## Check the results from the ping tests. If any fail, set the global isUp flag to down (false)
	if [[ "${rtt}" == "false" ]]; then
	isUp="false"
	else
	isUp="true"
	fi

	## Build the event message, with ISO timestamp, global isUp flag,
	## and individual responses.
	message="{ \"timestamp\": \"${timedate}\", \"isUp\": ${isUp}, $rtt }"

	echo "Sending: $message"
	rez=$(sendEvent "${message}")

	## check whether curl command worked. If not, push onto stack.
	if [[ "${rez}" != '{"response" : "ok"}' ]]; then
	if [ ${#failedEvents[@]} -eq 0 ]; then
	failedEvents[0]="${message}"
	else
	failedEvents=("${failedEvents[@]}" "${message}")
	fi
	fi

	sleep 5
	## Attempt to resend failed events
	if [ ${#failedEvents[@]} -gt 0 ]; then
	for i in "${!failedEvents[@]}"; do
	echo "Resending: ${failedEvents[$i]}"
	rez=$(sendEvent "${failedEvents[$i]}")
	if [[ "${rez}" == "ok" ]]; then
	unset failedEvents[$i]
	fi
	done
	fi
	done

view raw gistfile1.txt hosted with ❤ by GitHub

With the data now flowing into my Loggly account, everything I needed to debug my issues were all in one place. I created dashboard widgets to be able to see when any element of my network was down, along with the ping response times from the network checks. The visual elements allowed me to quickly see when something broken or if I was just experiencing a network slowdown.

From the visual view, I could quickly click on an event and see the precise logs from that time period. Then it was a trivial matter of looking at a few log lines from my various sources to find the culprit. In the case above, I could see in the logs that one of my internal WiFi access points was flapping. It kept losing radio sync with the main AP and would drop off the wireless network.

In the end, I found multiple points of failure in my network. The router from AT&T was flaking out, the U-Verse link was losing sync, and an access point on my internal network was flapping. A couple of days with some AT&T techs on-site to replace the router and the fibre from the pole to the house remedied the first two problems. And the problem access point has been turned off until I can make an Ethernet run from it to the router instead of letting the device expand the WiFi network. If I hadn’t been able to put all my logs in one place and easily created visualizations of the data, I would most likely still be trying to figure out what was going wrong.

Postscript: Telling AT&T customer support that, “the logs from the router indicate it is rebooting itself,” instantly gets your problem escalated within the support chain and a Tech at your house a few hours later on a Sunday. 🙂

The Loggly and SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

Hoover J. Beaver

Holiday Fun: Debugging a Home Network with Loggly

The Loggly and SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

Latest Posts

Debugging a .NET Application with Loggly

Getting the Most Out of Java With SolarWinds Loggly

Using SolarWinds Loggly to Get the Most Out of MongoDB Structured Logging

Getting the Most Out of Python with SolarWinds Loggly

Best Practices for Client-Side Logging and Error Handling in React

Holiday Fun: Debugging a Home Network with Loggly

The Loggly and SolarWinds trademarks, service marks, and logos are the exclusive property of SolarWinds Worldwide, LLC or its affiliates. All other trademarks are the property of their respective owners.

Latest Posts

Debugging a .NET Application with Loggly

Getting the Most Out of Java With SolarWinds Loggly

Using SolarWinds Loggly to Get the Most Out of MongoDB Structured Logging

Getting the Most Out of Python with SolarWinds Loggly

Best Practices for Client-Side Logging and Error Handling in React

Categories

Related blog posts