Development of Secure Software Systems

CSci 4271 Lab 8

This lab will introduce how you can see some of the technical details of network communication, and what it looks like to snoop on network traffic (though just your own, for technical and ethical reasons). First we'll mention some common tools for checking the configuration of a network and checking how traffic flows to a destination. Then we will have you set up an isolated network environment where you can capture all the packets flowing and inspect their details. In this environment we'll illustrate the difference between encrypted and unencrypted communications, and point out some ways that some information is disclosed even in the presence of encryption. As a reminder, snooping on network traffic can feel fun if you've never done it before, but you need to resist the temptation to do it to non-consenting victims if you try it yourself outside of class.

We've done some setup for this lab that is based on CSE Labs Ubuntu 22.04, and the easiest-to-use tool we recommend for looking at captured packets has a GUI. So our suggestion is that you should do this lab either directly on a lab machine (first choice), or on the "new Vole" Vole-FX3 (second best). The first section could be done on any Linux machine, and you could use a terminal-based packet capturing tool over SSH. But the GUI tool won't work conveniently with X11 forwarding.

IP addresses and testing tools

ip address

One of the most basic questions you might ask when understanding the networking situation on a computer is "what is this computer's IP address?". On recent Linux systems, the command that answers this question is easy to remember: it's ip address, which can be abbreviated ip a if you don't feel like typing (the older equivalent was ifconfig). Try running this command on your test machine. Recall that to be precise, IP addresses are associated with interfaces, not hosts, which is why the output of this command is structured according to the machine's interfaces. Within the output about each interface, the most relevant line for us is the one labeled inet, which has IPv4 information.

The interface that is printed first is a special case called the "loopback interface", lo for short, which represents the ability to connect back to the same computer, even if no other computers are connected. This is always associated with the reserved IP address 127.0.0.1, and given the special name localhost.

The second interface on the lab machines is a wired Ethernet connection whose interface name will start with en with some other disambiguating suffixes: for instance enp0s31f6 gives the device's location on the PCI bus (compare lspci | grep Ethernet). The workstations in 1-262 currently have IP addresses of the form 134.84.182.(100 + x), where x ranges between 1 and 27 and matches the number in the hostname on the stickers on the machine and monitor. For routing purposes internal to the University, these machines are part of a network with 256 addresses, which is described in CIDR notation as 134.84.182.0/24. The CIDR notation is a suffix on an IP address, after a slash, that gives the number of bits of the IP address that designate the network (starting from the most significant); the remaining bits out of the 32 designate one host within the network. So the range of IP addresses covered by this network is 134.84.182.0 through 134.84.182.255 inclusive. However, the lowest and highest addresses in the network are reserved: the lowest refers to the network itself, and the highest represents broadcasting to the entire network (brd in the ip address output). Other lines in this output give the interface's Ethernet address (link/ether), and information about IPv6 which we'll mostly be skipping (inet6).
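If you'd like to check the CIDR arithmetic above without doing it by hand, here is a small sketch using Python's standard ipaddress module. The workstation address 134.84.182.105 is just an illustrative choice (x = 5), not a machine we are singling out.

```python
import ipaddress

# The lab network from the text, in CIDR notation.
net = ipaddress.ip_network("134.84.182.0/24")

# A /24 leaves 32 - 24 = 8 host bits, so 2**8 = 256 addresses.
host_bits = 32 - net.prefixlen
print(host_bits, net.num_addresses)

# Lowest and highest addresses: the network itself and the broadcast address.
print(net.network_address)
print(net.broadcast_address)

# Illustrative workstation with x = 5 (an assumption for this example).
workstation = ipaddress.ip_address("134.84.182.105")
print(workstation in net)
```

You can change the prefix length (for instance to /16) and rerun it to see how the number of addresses and the broadcast address change.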

host

The simplest command-line interface to DNS lookups is a command named host. If you give it a DNS name, it will translate it into an IP address, or if you give an IP address, it will reverse-translate it into a DNS name. These mappings aren't always unique or inverses of each other. Try mapping the IP address of your workstation forwards and backwards, and also try looking up some other web sites you visit regularly.

ss

Basic information about all the connections currently open on a machine can be accessed with the command ss (successor to an older tool netstat). You might want to try both ss -t4a and ss -t4ar. Both commands are limited to TCP connections (t) and IPv4 (4), and the a option prints both sockets in the listening state (typically servers waiting for connections) and established connections (more often connections where this machine is a client and the other party is the server). The r option controls whether IP addresses are mapped back to hostnames, which is sometimes informative but sometimes makes the results harder to read because the lines are long. Another feature notable in the ss output is port numbers, typically listed after a colon. Servers for commonly-used services listen on standardized low-numbered ports; for instance :ssh stands for port 22, and indicates an SSH server. By contrast the client sides of connections use port numbers randomly selected from a range of large numbers such as 32768-60999. If you are a systems administrator, you can add the -p option to print information about what process is responsible for each socket, but as an unprivileged user this isn't as useful because it only gives information about your own processes.
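To see the client-side port selection for yourself, the following sketch opens a TCP connection over the loopback interface and prints the port numbers on each end. It is only an illustration: the server here asks the kernel for any free port (by binding to port 0) rather than using a standardized one, and the exact range the client port is drawn from depends on the system's configuration.

```python
import socket

# Server side: listen on the loopback interface. Port 0 asks the kernel to
# pick any free port (a real service would use its standardized port).
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
server_port = server.getsockname()[1]

# Client side: we never choose a local port; the kernel assigns one on connect.
client = socket.create_connection(("127.0.0.1", server_port))
client_port = client.getsockname()[1]

# The server sees the client's address and kernel-chosen port as its peer.
conn, peer = server.accept()
print("server port:", server_port)
print("client port:", client_port)
print("server sees peer:", peer)

conn.close()
client.close()
server.close()
```

If you pause the program before the sockets are closed, the connection should also show up in the output of ss -t4a.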

whois

Several kinds of "phonebook-like" directory information about Internet entities can be looked up with the command whois. It is perhaps best known for looking up domains; for instance the identity of the registrant shown by whois umn.edu will probably not surprise you. However this functionality is of decreasing value because many domains list third-party services instead of providing their real information.

More relevant to understanding the structure of IP addresses, whois with an IP address will show you the block from which that IP was drawn, which is a large group of addresses managed by a single entity, often corresponding to the level of network structure used in Internet-wide routing. For instance if you use whois on the address of one of the 1-262 lab machines, you'll see that it is part of a large block 134.84.0.0 - 134.84.255.255, a "/16" in CIDR notation, all of which is used by UMN. In fact, since UMN got most of its IP addresses in the early days of the Internet, its allocations followed an older pattern where the size of a network was predictable from its first bits: first bytes 0-127 were /8 networks, 128-191 were /16 networks, and 192-223 were /24s. UMN's first allocation of 128.101.0.0/16 dates from 1986, and a number of other universities and research institutes got similar allocations around that time. 131.212.0.0/16 was originally for the Duluth campus, and the U also has the aforementioned 134.84.0.0/16, plus 146.57.0.0/16, 160.94.0.0/16, 192.35.86.0/24, and 192.42.152.0/24.
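The older "classful" rule described above is simple enough to write down as code. This sketch applies it to a few of the addresses mentioned in the text; the function name classful_prefix is our own, and the example ignores the special-purpose ranges above 223.

```python
import ipaddress

def classful_prefix(address: str) -> int:
    """Return the prefix length implied by the old first-byte rule."""
    first_byte = int(address.split(".")[0])
    if first_byte <= 127:
        return 8
    if first_byte <= 191:
        return 16
    if first_byte <= 223:
        return 24
    raise ValueError("first bytes above 223 are outside the rule shown here")

for addr in ["128.101.0.0", "134.84.182.105", "192.35.86.0"]:
    prefix = classful_prefix(addr)
    # strict=False lets ip_network mask off the host bits for us.
    net = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    print(addr, "->", net)
```

Modern allocations no longer follow this rule, which is why CIDR notation states the prefix length explicitly.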

ping

The most basic command for testing your Internet connection to a particular destination is ping, which will send ICMP Echo Request packets once a second and wait for replies. It also reports the round-trip-time of the packets, which defines a kind of distance in the network. The packets that ping uses are sometimes blocked by firewalls, so you might fail to ping a remote destination, but you should have a pretty good success rate from the campus network. You might try pinging destinations that seem like they should be located outside North America, and check what the round trip times are. But if the time is under 50ms, your packets are probably actually staying in North America.
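The reasoning behind the 50ms rule of thumb can be sketched with a little arithmetic. The one assumption here is the signal speed: we use roughly 200,000 km/s, about two thirds of the speed of light in a vacuum, as an approximate figure for optical fiber. The function name is our own.

```python
# Assumption: signals in optical fiber travel at roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000

def max_one_way_km(rtt_ms: float) -> float:
    """Upper bound on the one-way distance for a given round-trip time."""
    one_way_seconds = (rtt_ms / 1000) / 2
    return FIBER_SPEED_KM_PER_S * one_way_seconds

for rtt in [10, 50, 150]:
    print(f"{rtt} ms round trip -> at most {max_one_way_km(rtt):.0f} km away")
```

This is only an upper bound: real paths are not straight lines, and routers add delay along the way, so the true distance is usually well below it. Under these assumptions a 50ms round trip limits the destination to about 5,000 km away at the very most.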

traceroute

Based on a similar principle to ping but collecting more information, traceroute tries to determine an entire sequence of IP-level hops from the beginning to the end of a communications path. It is even more affected by firewalls than ping, so it is becoming increasingly rare to see complete output from it. However you can often see interesting information about which networks are used for connections, and sometimes the DNS names of routers incorporate geographic information. You might try repeating your overseas example from ping with traceroute. If your destination is a web server or otherwise has an open TCP server port, there is also a variant named tcptraceroute that uses TCP connection packets that pass through more firewalls.

Basic packet capture

You can understand more of what's going on on a network, and potentially learn information about other users' communications, by examining the contents of all the packets that pass through a network interface. It would be both against the rules and blocked by security mechanisms to look at all the real traffic on the lab network. But in this lab we'll give you a feel for how it works by setting up an isolated situation where you can see all the packets from your own network operations. We do this with a Linux feature called a network namespace, part of the same mechanism that is used to provide isolated containers for systems like Docker. This namespace is a lot simpler than a full virtual machine because only the network has to be virtualized; for instance the filesystem can be shared with the main host.

You can start a network namespace for this lab's exercises by running the following script (on a CSE Labs Ubuntu 22.04 machine with a GUI):

/web/classes/Spring-2024/csci4271/labs/08/start-namespace.sh

This command will start a new shell with a different prompt; any programs you start from this shell will execute inside the network namespace. In fact, inside this namespace, it will look like you are the root superuser, but you don't actually have any root privileges outside the namespace. The one relevant privilege you have is to listen to network traffic inside the namespace, but since no one else is running anything inside the namespace, the only traffic you can listen to is your own.

If you run ip address inside the namespace, you'll no longer see the Ethernet device that was the main network connection of the host. In its place there is a virtual device named tap0, which is the network of the namespace. There is a software connection from this network to the host system's network, which in turn connects the virtual network to the Internet. The virtual network has its own set of IP addresses that are all in the 10.0.2.0/24 network. This is from one of three IP address blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) that are seen commonly on different kinds of private networks that connect to the Internet, but are not normally reachable from the Internet. For instance you couldn't usefully run a web server on this virtual network, because no one but you could connect to it.
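As a quick check of which addresses fall into the three private blocks, here is a sketch using Python's ipaddress module. The address 10.0.2.15 is a hypothetical example of an address inside the namespace's network; check the ip address output for the one you actually have.

```python
import ipaddress

# The three private-use blocks listed in the text.
PRIVATE_BLOCKS = [ipaddress.ip_network(block)
                  for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def private_block_of(address: str):
    """Return the private block containing the address, or None."""
    ip = ipaddress.ip_address(address)
    for block in PRIVATE_BLOCKS:
        if ip in block:
            return block
    return None

print(private_block_of("10.0.2.15"))        # hypothetical namespace address
print(private_block_of("134.84.182.105"))   # a public lab-style address

# The namespace network is a small slice of the 10.0.0.0/8 block.
print(ipaddress.ip_network("10.0.2.0/24").subnet_of(PRIVATE_BLOCKS[0]))
```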

The easiest packet capture tool to get started with is one called Wireshark, since it has a pretty rich GUI. We've also compiled some tools that can be used just from a terminal (tshark, a sibling program of Wireshark, and tcpdump, the classic command-line tool), but they are a bit less convenient so we won't use them in these directions. You will want to run Wireshark to collect packets while running other commands in the terminal that create network traffic, so you should start Wireshark as a background process with the command:

wireshark &

Wireshark's splash screen will ask you to choose a device to listen to: you should use the recommended default of tap0 by double clicking on it. It will then open a main capture window with three panes (their layout can be changed under Edit/Preferences/Appearance/Layout). The top pane displays one line summarizing each captured packet; the lines are color coded according to rules you can read and modify under View/Coloring Rules. The second pane is a hierarchical presentation (click on the triangles to unfold) of the interpretation of the packet, with higher abstraction layers later in the list. The final pane is a hex dump of the raw packet contents, similar to the hex dumps of files you've seen in previous labs. You probably won't have to look at the raw hex data very often, and you can hide the pane if you are short on space, but you can see how data is encoded because Wireshark knows the relationship between the last two panes and highlights which bytes have which interpretation. If you want to restart the capture from a clean slate to try something new, clicking the green shark fin icon in the tool bar will start a new capture.
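To get a feel for the relationship between the hex dump and the decoded fields, here is a rough sketch that decodes the fixed 20-byte part of an IPv4 header by hand. The header bytes are made up for the example (including the addresses, and a checksum left at zero), and this is only an illustration of the idea, not how Wireshark itself is implemented.

```python
import socket
import struct

# Layout of the fixed part of an IPv4 header, in network byte order.
IPV4_HEADER = "!BBHHHBBH4s4s"

# A made-up header, built here so the example is self-contained.
# The checksum is left as 0 because this sketch never validates it.
raw = struct.pack(IPV4_HEADER,
                  0x45, 0, 84, 0x1234, 0x4000, 64, 1, 0,
                  socket.inet_aton("10.0.2.15"),
                  socket.inet_aton("134.84.182.105"))

# The "bottom pane": the raw bytes as hex.
print(raw.hex(" "))

# The "middle pane": the same bytes interpreted as named fields.
(version_ihl, tos, total_length, ident, flags_frag,
 ttl, protocol, checksum, src, dst) = struct.unpack(IPV4_HEADER, raw)

fields = {
    "version": version_ihl >> 4,                  # high 4 bits of byte 0
    "header_length": (version_ihl & 0x0F) * 4,    # low 4 bits, in 32-bit words
    "total_length": total_length,
    "ttl": ttl,
    "protocol": protocol,                         # 1 means ICMP
    "source": socket.inet_ntoa(src),
    "destination": socket.inet_ntoa(dst),
}
for name, value in fields.items():
    print(f"{name}: {value}")
```

In Wireshark, clicking a field in the middle pane highlights the corresponding bytes in the hex dump, which is the same mapping done for you.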

One place to start with looking at packet dumps is to repeat some of the query commands from the previous section of the lab (like host and ping), and see what packets implement them. You can also try SSH and browsing the web. The command-line SSH program will work much as it does outside the namespace, though it won't see your usual configuration files because of the namespace user-id mapping. For browsing the web, it's easier not to run a full-strength web browser like Firefox or Chrome, since they have some complex extra features, and have some bad interactions with namespace mechanisms. Instead for a GUI web browser, we've set up a simple one that you can start with the command netsurf-gtk.

If you want to do simple web page accesses, you can also use the command-line programs wget, curl, or GET, each of which takes a URL as a command-line argument. By default wget saves the page in a file, while curl and GET use the standard output.

Sniffing web traffic

These days, it is relatively difficult to find major web sites that serve content with unencrypted HTTP: most sites accept HTTP connections, but immediately redirect them to the corresponding page on an HTTPS site. This means that you won't see much web site content in your packet captures from most sites. However there are still a few sites that can give you a feel for what the "old days" of unencrypted web content were like. For instance, Scholarpedia (http://www.scholarpedia.org) is a site that is inspired by and visually modeled after Wikipedia, but requires articles to be peer-reviewed. Though it appears to still be maintained, its content is more limited and not updated as quickly as Wikipedia's. If you browse articles here, you'll find that the URLs requested and all of the content can be found in the packet capture.

However, there are still some more subtle information disclosure risks that can occur even with encrypted web connections. For instance, suppose a user is browsing the web for information about a medical condition they are worried they might have: it is easy to imagine that the user feels a privacy interest around not having others know about their medical condition. To take a simple extreme case, suppose an attacker has narrowed down from other information that the victim is viewing the Wikipedia page for either Influenza or Hair loss: can the attacker tell which page was just viewed?

One simple piece of information still available to an attacker is the size of the web page: the size of the page doesn't change much over time, and is only slightly increased by the encryption process. If the attacker has measured the sizes of the two possible pages in advance, a new download will be easy to distinguish. Try this attack out. A simple way in Wireshark to see the amount of information that has been transmitted over a TCP connection after it closes is to look at the Seq and Ack fields in the final packets: they are swapped based on the packet direction, but they represent the total amount of data sent in the two different directions. The experiment is simplest if you use a command-line program that just downloads a single web page without other resources like images. You can also compare the size of the TCP stream to the size of a file storing the downloaded contents.
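The attacker's decision step in this experiment can be sketched in a few lines. The page sizes below are hypothetical placeholders, not real measurements; substitute the sizes you measure yourself. The function name guess_page is also our own.

```python
# Hypothetical sizes in bytes, standing in for the attacker's advance
# measurements of the two candidate pages. Replace with your own numbers.
KNOWN_SIZES = {"Influenza": 412_000, "Hair loss": 268_000}

def guess_page(observed_bytes: int, known_sizes=KNOWN_SIZES) -> str:
    """Pick the candidate page whose measured size is closest to the observation."""
    return min(known_sizes,
               key=lambda page: abs(known_sizes[page] - observed_bytes))

# An observed encrypted download is a little larger than the plain page,
# but still much closer to one candidate than the other.
print(guess_page(270_500))
print(guess_page(409_000))
```

The attack works whenever the gap between the candidate sizes is much larger than the overhead and variation added by encryption, which is why the small differences between the Seq/Ack totals and the saved file size don't matter much.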

Timing of keystrokes over SSH

Timing is another kind of information that remains visible even for encrypted data, as can be seen with SSH. If you open an SSH connection from the network namespace and watch a packet capture, you will see that each time you type a single key there will be two bursts of encrypted traffic, first from client to server and then from server to client. There is traffic in both directions because in most typing scenarios, the keys you type appear immediately in the terminal; this is called "echo". The contents of the packets don't obviously reveal anything about what key was pressed: for instance even if you type the same key repeatedly, you'll see that the packet contents all look different and random. However you can also see that the timing of the packets is recorded quite precisely; and unless you do something like cut and paste, it is very difficult to type fast enough for multiple keystrokes to be combined in one packet. This means that an attacker on the network still has quite detailed information about the timing of your keystrokes. This risk was analyzed in a classic 2001 paper on Timing Analysis of Keystrokes and Timing Attacks on SSH.
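As a sketch of what an eavesdropper could compute from such a capture, the following turns a list of packet timestamps into the gaps between keystrokes. The timestamps are invented for illustration; in practice you would read them from the Time column in Wireshark for the client-to-server packets.

```python
# Hypothetical capture times (in seconds) of client-to-server SSH packets,
# one per keystroke. These values are made up for the example.
timestamps = [12.000, 12.140, 12.230, 12.610, 12.700]

def inter_keystroke_ms(times):
    """Return the gap between consecutive packets, in whole milliseconds."""
    return [round((later - earlier) * 1000)
            for earlier, later in zip(times, times[1:])]

gaps = inter_keystroke_ms(timestamps)
print(gaps)

# A long gap might suggest a pause, for instance between words.
longest = max(gaps)
print("longest pause:", longest, "ms, before keystroke", gaps.index(longest) + 2)
```

None of this requires decrypting anything: only the packet arrival times are used.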