This article is the second in a series designed to help readers assess the risks their Internet-connected systems are exposed to.
In the first installment, we established the reasons for doing a technical risk assessment. In this installment, we'll start discussing the methodology that we follow when performing this kind of assessment.
Why all the fuss about a Methodology?
If you ever read anything SensePost
publishes on assessments, or if you attend our training, you'll notice
that we tend to go on a bit about methodology. The methodology is the
set of steps we follow when performing a certain task. We try to work
according to a methodology in just about everything we do. We're not
fanatical about it, and our methodologies change and adapt to new and
different environments, but they always play a big role in the way we
approach our work. Why such a big fuss, then? There are a few good
reasons for performing assessments according to a strict methodology:
Firstly, it gives us a game plan. Rather
than staring blankly at a computer screen or a network diagram, an
analyst has a fixed place to start and a clear task to perform.
This takes the whole "guru" element out of what we do. The exact steps
that we will follow are clear, both to us and to the customer, from
the outset.
Secondly, a methodology ensures that our work is consistent and complete.
I've worked on projects where the target organization had in excess of
150 registered DNS domains. Can you imagine how many IP addresses that
eventually translates to? I don't have to imagine; I know it was
almost 2000. Consider how hard it must be to keep track of every DNS
domain, every network and every IP to ensure that you don't miss
something. Consider also what happens when the actual "hacking" starts
(we'll get to this later) and the analyst's heart is racing. A strict
methodology ensures that we always cover all the bases and that
our work is always of the same quality. This holds true no matter how
big or small the environment you're assessing is.
Finally, our methodology gives our customers something to measure us against.
Remember, to date there are really no norms or standards for technical
assessment work. How does the customer know that she's getting what she
paid for? This is an especially pertinent question when the assessment
findings are (how can I put this?) dull. By working strictly according
to a sensible methodology with clear deliverables at each stage we can
verify the quality of the assessment even when there's very little to
report.
A Methodology that Works
I'm completely sure that, when it comes to
security assessment, there's more than one way to skin the cat. What
follows is a description of a methodology that we like to use when
performing security assessments over the Internet. It's certainly not
the only way to approach this task, but it's one way that works, I
believe.
1. Intelligence Gathering
The first thing we do when we begin an assessment
is to try to figure out who the target actually is. Primarily we use
the Web for this. Starting with the customer's own Web site(s), we mine
for information about the customer that might be helpful to an
attacker. Miscellaneous tidbits of useful data aside, our primary
objective is to derive the DNS domain names that the target uses. If
you're assessing your own infrastructure, you may already have this
information, but if the organization is big, it can be a fascinating
exercise. Later, these domain names will be mapped to the IP addresses
we will actually analyze. Some companies have a small Internet
presence, and discovering the DNS names they use may be simple. Other
companies we've worked with have hundreds of domains, and discovering
all of them is no mean feat.
How do we get the DNS domain names? Well, usually
we have an e-mail address, the company's name or some other logical
place to begin. From there we have a number of techniques:
- We use search engines to search for all
instances of the company's name. This not only provides links to the
company's own site (from which DNS domain information can be easily
derived), but also yields information about mergers and acquisitions,
partnerships and company structure that may be useful.
- We use a tool like httrack to dump all the relevant Web sites to disk. We then scan those files to extract all mail and HTTP links, which are then parsed again to extract more DNS domains.
- Then, we use the various domain registries. Tools like geektools.com, register.com and the like are simple and can often be used in two ways:
  - To help verify whether the domains we have identified actually belong to the organization we are assessing.
  - To extract any additional information that may be recorded in a specific domain's record. For example, you'll often find that the technical contact for a given domain has provided an e-mail address at a different domain. The second domain then automatically falls under the spotlight as a potential part of the assessment.
- Many of the registries support wildcard searches. This allows us to search for all domains containing a certain string, like "*abc*". I would use such a search to identify all the domains that may be associated with the company ABC Apples Inc, for example.
- Then, we need to apply some human intelligence: using the information we read on Web sites, company reports and news items, we attempt to make logical jumps to other domains that may be relevant to our analysis.
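The link-scraping step lends itself to a short script. The Python sketch below assumes the sites have already been mirrored to a local directory (httrack's output, for instance) and uses deliberately simple regular expressions for HTTP links and e-mail addresses, so it will miss some edge cases. The names `domains_in_text`, `domains_in_mirror` and `matching` are hypothetical, and `matching` only imitates the registries' wildcard searches on a list we already hold.

```python
import fnmatch
import re
from pathlib import Path
from typing import Iterable, Set

# Deliberately simple patterns; they are not full URL or e-mail grammars.
URL_HOST_RE = re.compile(r"https?://([A-Za-z0-9.-]+)", re.IGNORECASE)
EMAIL_DOMAIN_RE = re.compile(r"[A-Za-z0-9._%+-]+@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")


def domains_in_text(text: str) -> Set[str]:
    """Collect host names from HTTP(S) links and domains from e-mail addresses."""
    found = set()
    for host in URL_HOST_RE.findall(text):
        found.add(host.strip(".").lower())  # drop a trailing sentence dot
    for domain in EMAIL_DOMAIN_RE.findall(text):
        found.add(domain.lower())
    return found


def domains_in_mirror(root: str) -> Set[str]:
    """Walk a mirrored site on disk and scan every HTML/text/script file in it."""
    found = set()
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in {".html", ".htm", ".txt", ".js"}:
            found |= domains_in_text(path.read_text(errors="ignore"))
    return found


def matching(domains: Iterable[str], pattern: str) -> Set[str]:
    """Wildcard filter in the spirit of registry searches, e.g. '*apples*'."""
    return {d for d in domains if fnmatch.fnmatch(d, pattern)}
```

The result contains host names such as www.apples.com; trimming those back to the registered domain, and deciding which ones really belong to the target, is still a human judgment call.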
The output of this phase is a comprehensive list of DNS
domains that are relevant to the target company. You'll notice that the
next phase of the assessment may provide additional domain names that
weren't found during this phase. In that case, those domains are used
as inputs during this phase and the entire process is repeated. Phases
1 and 2 may recur a number of times before we've located all the
relevant domains. Typically, we'll check this list with the customer
once we're done to ensure that we haven't missed anything or included
something inappropriate.
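The repeat-until-nothing-new loop between phases 1 and 2 is a classic worklist (fixed-point) pattern. In this Python sketch, `discover` is a hypothetical callback standing in for whatever techniques turn one domain into related domains (registry contacts, names found while footprinting, and so on); the loop stops once no unseen domain turns up.

```python
from collections import deque
from typing import Callable, Iterable, Set


def expand_until_stable(seeds: Iterable[str],
                        discover: Callable[[str], Iterable[str]]) -> Set[str]:
    """Keep feeding newly found domains back in until nothing new appears."""
    known = {s.lower() for s in seeds}
    queue = deque(known)
    while queue:
        domain = queue.popleft()
        for new in discover(domain):
            new = new.lower()
            if new not in known:  # only unseen domains go back into the process
                known.add(new)
                queue.append(new)
    return known
```

Because a domain is queued only the first time it is seen, the loop terminates even when domains point back at each other.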
2. Foot Printing
At the start of phase two we have a list of DNS domains - things like apples.com, apples-inc.com, applesonline.com, apples.co.uk,
etc. The reason these domains exist is to provide Internet users with
a simple way of reaching and using the resources they require. For
example, instead of typing http://196.30.67.5, a user simply
needs to remember www.sensepost.com. Within a domain, therefore, there
are a number of records - specific mappings between machine names and
their actual Internet Protocol (IP) addresses. The objective of this
phase is to identify as many of those IP/name mappings as we possibly
can in order to understand which address spaces on the Internet are
actually being used by the target organization. There are a few
different techniques for identifying these mappings. Without going into
too much detail, these techniques are all derived from the same
assumptions, namely:
- Some IP/name mappings must exist for a domain to be functional. These include the name server records (NS) and the mail exchanger records (MX). If a company is actually using a domain, then you will be able to request these two special entries. Immediately you have one or more actual IP addresses to work with.
- Some IP/name mappings are very likely to exist on an active domain. For example, "www" is a machine that exists in just about every domain. Names like "mail", "firewall" and "gateway" are also likely candidates. We have a long list of common names that we test. This is by no means a watertight approach, but one is lucky more often than not.
- An organization's machines usually live close together. This
means that if we've found one IP address, we have a good idea of where
to look for the rest of the addresses.
- The Name -> IP mapping (the forward lookup) and the IP
-> Name mapping (the reverse lookup) need not match each other.
- The technology is fundamentally verbose. DNS, as a
technology, was designed for disseminating what is essentially
considered "public" information. With one or two simple tricks we can
usually extract all the information there is to be had. The DNS zone
transfer - a feature of DNS literally designed for the bulk transfer
of DNS records - is a fine example of this. Other, craftier techniques
fall beyond the scope of this paper.
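Two of the assumptions above, guessing common names and sweeping the addresses near one we already know, can be sketched with Python's standard `socket` and `ipaddress` modules. This is an illustration under stated assumptions, not our actual tooling: `COMMON_NAMES` is a short hypothetical sample, the /24 sweep size is arbitrary, and the resolvers are injectable so the logic can be exercised without a live DNS server. NS and MX lookups and zone transfers need a full DNS library and are left out.

```python
import ipaddress
import socket
from typing import Callable, Dict, List, Optional

# Hypothetical sample; a real list of common names is much longer.
COMMON_NAMES = ["www", "mail", "firewall", "gateway", "ftp", "ns1", "smtp", "vpn"]


def forward_guess(domain: str,
                  names: List[str] = COMMON_NAMES,
                  resolve: Optional[Callable[[str], List[str]]] = None) -> Dict[str, List[str]]:
    """Try common host names in a domain; return name -> IPs for those that resolve."""
    if resolve is None:
        # gethostbyname_ex returns (canonical name, aliases, IPv4 addresses)
        resolve = lambda host: socket.gethostbyname_ex(host)[2]
    found = {}
    for name in names:
        host = f"{name}.{domain}"
        try:
            addrs = resolve(host)
        except OSError:  # socket.gaierror ("no such name") subclasses OSError
            continue
        if addrs:
            found[host] = addrs
    return found


def reverse_sweep(ip: str, prefix: int = 24,
                  reverse: Optional[Callable[[str], str]] = None) -> Dict[str, str]:
    """Reverse-resolve every host address in the block around a known IP.

    The PTR names found here need not match the forward records, which is
    why both directions are worth checking.
    """
    if reverse is None:
        # gethostbyaddr returns (primary name, aliases, addresses)
        reverse = lambda addr: socket.gethostbyaddr(addr)[0]
    block = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    names = {}
    for addr in block.hosts():
        try:
            names[str(addr)] = reverse(str(addr))
        except OSError:  # socket.herror when no PTR record exists
            continue
    return names
```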
Once we have all the relevant DNS names we can find, we
attempt to identify the distinct network "blocks" in which the target
organization operates. As stated previously, IPs tend to be grouped
together. The nature of IP networking is to group addresses together in
what are known as subnets. The expected output of this phase is a list
of all the IP subnets in which the target organization has machines
active. At this stage, our broad reasoning is that if we find even a
single IP in a given subnet we include that entire subnet in the list.
The technically astute among you will already be crying "False
assumption! False assumption!" and you'd be right. But bear with me. At
this stage we tend to over-estimate rather than under-estimate.
Later, we will do our best to prune the list to a more accurate
depiction of what's actually there.
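Widening each discovered address to its enclosing subnet is mechanical. The Python sketch below assumes a /24 block for every address, purely for illustration, since at this stage we don't know the real block sizes; it uses the standard `ipaddress` module to build the blocks and merge adjacent ones. It handles IPv4 only.

```python
import ipaddress
from typing import Iterable, List


def candidate_subnets(ips: Iterable[str], prefix: int = 24) -> List[ipaddress.IPv4Network]:
    """Widen each discovered IPv4 address to its enclosing /prefix block.

    The /24 default is an assumption: the true block size is unknown here,
    which is exactly why the list is pruned in the next phase.
    """
    blocks = {ipaddress.ip_network(f"{ip}/{prefix}", strict=False) for ip in ips}
    # Merge adjacent or overlapping blocks into the fewest covering networks.
    return sorted(ipaddress.collapse_addresses(blocks))
```

For example, addresses in 196.30.66.x and 196.30.67.x collapse into the single block 196.30.66.0/23.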
3. Vitality
We ended the last phase with a list of IP subnets
in which we believe the target organization to have a presence and a
horde of technocrats objecting loudly to our assumptions about the
subnet size. Let's quickly list some of the facts we need
to know before we can move on with the process:
- An organization does not need to own
the entire subnet in which it operates. IP addresses can be lent,
leased or shared. Nor do all of an organization's IPs have to be grouped
together; they can be spread as widely across the Internet as the
organization wishes.
- Just because a Name / IP mapping exists for a machine,
doesn't mean that machine actually exists. Conversely, just because a
Name / IP mapping doesn't exist for a machine, doesn't mean the machine
doesn't exist. There are thousands of nameless addresses on the
Internet. Yes, it's sad, but true nevertheless.
- Without a route to describe how an IP address can be reached, that address can never be used on the Internet.
So we see that, although DNS gives us a logical
starting point for our search, it by no means provides a comprehensive
list of potential targets. This is why we work with the rather loose
subnet definitions we derived in the previous phase. The objective of
the "Vitality" phase of the assessment is to determine, within the
subnet blocks that we have, which IP addresses are actually active and
being used on the Internet. We now leave the wonderful world of DNS
behind us, and begin to concentrate solely on the IP address space.
So how does one determine if an address is active
on the Internet or not? Well, let's recall the third "fact" from our
list above. If there's no route to a given IP subnet, that subnet is as
good as dead. Various core routers on the Internet graciously allow
technicians and administrators to query them regarding routes to any
given address. At the time of writing, one such router is route-views.oregon-ix.net.
Such a router can't tell us that an IP address is alive. If there's no
route for a subnet on the core routers, however, then we can conclude
that all the IPs in that subnet are dead.
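The route check can be approximated offline, assuming we've already copied the announced prefixes out of a route server such as route-views.oregon-ix.net (fetching and parsing that output is left out). In this Python sketch, `routed_subnets` is a hypothetical helper: it keeps any candidate subnet that overlaps at least one announced prefix and sets aside those that overlap none.

```python
import ipaddress
from typing import Iterable, List, Tuple


def routed_subnets(candidates: Iterable[str],
                   announced: Iterable[str]) -> Tuple[List, List]:
    """Split candidate subnets by whether any announced route overlaps them."""
    table = [ipaddress.ip_network(p) for p in announced]
    routed, unrouted = [], []
    for cand in (ipaddress.ip_network(c) for c in candidates):
        # A route that merely overlaps keeps the candidate: part of it is reachable.
        if any(cand.version == r.version and cand.overlaps(r) for r in table):
            routed.append(cand)
        else:
            # No route at all: every address in this block is effectively dead.
            unrouted.append(cand)
    return routed, unrouted
```

As the text says, a surviving subnet is only *possibly* alive; the route check can eliminate blocks but never confirm a host.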
The next, and probably the most obvious, technique is the famous IP "ping". Pinging works just like sonar. You send a ping
to a specific address and the machine responds with a "pong" indicating
that it is alive and received your request. Ping uses ICMP echo request and
echo reply messages, a standard component of the Internet Protocol suite, and
machines that talk IP are compelled to respond when they receive a ping
request. With simple and freely available tools we are able to ping an
entire subnet. This is known as a "ping scan". Without going into too much
detail, the results of such a ping scan can be interpreted as follows:
- A reply from an IP address indicates that the address is probably in use and accessible from the Internet.
- Multiple replies from a single IP address indicate that the
address is probably actually a subnet address or a broadcast address
and suggest a subnet border.
- No reply can only be interpreted to mean that the machine is not replying to IP ping requests.
I realize that the latter point is a bit vague, but
that really is the only conclusion that can be drawn from the
information available. I said that all machines that speak IP are
obliged to respond to ping requests. Why not simply conclude that if
the IP doesn't respond, it isn't being used? The confusion is
introduced by modern network security products like firewalls and
screening routers. In the real world, one often sees networks
configured in such a way that the IP ping packet is blocked by the
firewall before the packet reaches the machine. Thus the machine would
respond if it could, but it's prevented from doing so.
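The way we read ping scan results can be expressed as a small classification routine. In this Python sketch, `probe` is a hypothetical callback that sends one echo request and reports how many replies came back; actually sending ICMP needs raw sockets or an external tool, which we leave out. "No reply" is deliberately kept separate from "dead", for the firewall reasons just described.

```python
import ipaddress
from typing import Callable, Dict, List


def ping_scan(subnet: str, probe: Callable[[str], int]) -> Dict[str, List[str]]:
    """Ping every address in a subnet and sort the results using the rules above."""
    results: Dict[str, List[str]] = {
        "in use": [],
        "possible subnet/broadcast address": [],
        "no reply": [],
    }
    # Iterate over every address, including the network and broadcast
    # addresses, because replies from those can reveal a subnet border.
    for addr in ipaddress.ip_network(subnet, strict=False):
        replies = probe(str(addr))
        if replies == 0:
            results["no reply"].append(str(addr))  # silent, not necessarily dead
        elif replies == 1:
            results["in use"].append(str(addr))
        else:
            results["possible subnet/broadcast address"].append(str(addr))
    return results
```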
So we haul out the heavy artillery. Just about
every machine on the Internet works with a series of little Internet
"post boxes" called ports. Ports are used to receive incoming traffic
for a specific service or application. Each port on a machine has a
number and there are 65536 possible port numbers. A modern machine that
is connected to the Internet and actually functioning is almost certain
to be using at least one port. Thus, if an IP address does not respond
to our ping request, we can further probe it using a tool called a
"port scanner". Port scanners are freely available software utilities
that attempt to establish a connection to every possible port within a
specified range. If we can find just one port that responds when
probed, we know that the IP is alive. Unfortunately, the amount of work
required to probe all 65,000-plus ports is often prohibitive. Such an
exercise can take hours per IP address and days or even weeks
for an entire range. So we're forced to make yet another assumption: If
an IP address is active on the Internet, then it's probably there for a
reason. And there are only so many reasons to connect a machine to the
Net:
- The machine is a Web server (and thus uses port 80 or 443)
- The machine is a mail server (and thus uses port 25)
- The machine is a DNS server (and thus uses port 53)
- The machine is another common server (FTP, database, time, news, etc.)
- The machine is a client. In this case it is probably a Microsoft machine and uses port 139.
Thus, we can now modify our scan to search for only a
small number of commonly used ports. This approach is called a "host
scan". It is by no means perfect, but it generally delivers accurate
results and is efficient enough to be used with large organizations.
The common ports we scan for can be adjusted to better suit the nature
of the organization being analyzed, if required. The nmap network utility (available from www.insecure.org) is a powerful tool that serves equally well as a ping scanner and a port scanner.
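A host scan of this kind can be sketched as a plain TCP connect scan over a short port list. This Python sketch is not how nmap works internally, and its assumptions are worth stating: `COMMON_PORTS` is our own pick based on the list above, UDP services are not covered at all, and a DNS server that only accepts UDP queries on port 53 will not show up.

```python
import socket
from typing import List, Sequence

# Assumed port list, taken from the reasons above: FTP, SMTP, DNS, HTTP,
# NetBIOS session (Windows clients) and HTTPS.
COMMON_PORTS = [21, 25, 53, 80, 139, 443]


def host_scan(ip: str, ports: Sequence[int] = COMMON_PORTS,
              timeout: float = 1.0) -> List[int]:
    """TCP connect scan of a handful of ports; returns those that accept a connection."""
    open_ports = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                s.connect((ip, port))  # full three-way handshake
            except OSError:            # refused, unreachable or timed out
                continue
            open_ports.append(port)
    return open_ports


def is_alive(ip: str, ports: Sequence[int] = COMMON_PORTS,
             timeout: float = 1.0) -> bool:
    """A single open port is enough to show that the address is in use."""
    return bool(host_scan(ip, ports, timeout))
```

Limiting the port list is what makes this workable. In the worst case, where a firewall silently drops every probe, each port costs the full one-second timeout, so a sequential sweep of all 65,536 ports would take about 65,536 seconds (roughly 18 hours) per address, against at most 6 seconds for the six ports above.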
Thus, by the end of this phase we have fine-tuned our list of IP
subnets and generated a list of actual IP addresses that we know to be
"alive" and that therefore qualify as targets for the remainder of the
process. At this point, our findings are usually presented to the
customer to ensure that we're still on the right track.
Conclusion
That concludes the first part of our discussion of
Internet assessment methodology. In the next installment in this
five-part series on Internet Risk Assessments, we will continue to
discuss methodology, including: visibility, vulnerability scanning, and
analyses of Web applications.