Advanced Reconnaissance: Compiling gathered information

In my previous posts we went over several different reconnaissance tactics and tools of the trade. That was just a start, as many of those techniques are manual and take a fair amount of time to execute. The gathered information is now scattered across different files, in different formats. It is time to take this to the next level and compile the data, making the next steps faster and more efficient. There are many compilation options, and it all depends on the intended use of the information. However, I'll show you how to organize your information in a robust and flexible way, using the data collected on SANS, namely the subdomains, as an example.

Gathering all the files

If you have been paying attention to my previous posts, you know that I’ve been collecting data on SANS and putting the individual files inside a single folder.

Files gathered under a single folder

Looking at the files, the most important ones are:

  • final-sans.org.txt – output from Osmedeus subdomain module
  • SANS-Maltego.csv – exported results from Maltego
  • stash.sqlite – output from theHarvester
  • sublister.txt – output from Sublist3r

Apart from these files, we also have the results from recon-ng (data.db), still in their original folder.

Recon-ng output folder

Having an individual folder for each target is a matter of choice, or style if you prefer. I find it convenient because it saves me time.

Creating a database

Inside the folder dedicated to this target, I am now going to create a new SQLite database using DB Browser for SQLite, a tool shipped with Kali Linux.

Finding DB Browser for SQLite on Kali's menu

  • Open the tool and create a new database inside the target’s folder. I called it “SANS.db”.

Creating a new database

  • Create a table. I called mine “AllDomains”. Add two text fields to the table:
    • Host
    • Origin

Creating a new table

  • The database should look like this:

New database

  • Import the results from the text files into new tables

Importing text files

  • For simplicity, name the new tables after the applications that produced the data

Naming the tables

  • Now the database should look like this:

Database with the imported tables

  • Click Write Changes to save your work
  • Now attach the SQLite database from theHarvester

Attaching theHarvester's database

  • I always name things properly

Naming the attached database

  • Now you have an additional database to get results from

theHarvester database attached

  • Let’s attach recon-ng’s database too, shall we?

Attaching recon-ng's database

  • And now we have five sources of data on subdomains of the sans.org domain (see the SQL recap after this walkthrough):

New database with all external sources
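Everything the GUI does in this walkthrough can also be expressed in plain SQL, which is handy if you want to script it next time. Here is a minimal sketch, assuming the file names used above; the path to recon-ng's data.db depends on your setup, so adjust it (the text-file imports themselves are easiest left to DB Browser's import wizard):

-- create the compilation table inside SANS.db
CREATE TABLE AllDomains (Host TEXT, Origin TEXT);

-- attach the external databases under meaningful names
ATTACH DATABASE 'stash.sqlite' AS theHarvester;
ATTACH DATABASE '/path/to/recon-ng/data.db' AS reconng;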

Compiling the data into the new table

The goal now is to gather the relevant data from all available sources and place it all in a single location: the Host field of the AllDomains table.

Take the time to study your data sources and you will realize that theHarvester collects a lot of URLs and mixes them in with the hosts. Therefore, we must filter the data by selecting the records ending in “sans.org”. Besides, we only want the hosts, not the emails and other data.

  • This can be achieved by running a single SQL command (sketched below):

Using SQL to combine all useful records
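For reference, that command can look roughly like the following. The table and column names are illustrative: DB Browser names the imported tables and columns based on your files, and theHarvester's internal schema varies between versions, so inspect the attached databases and adjust accordingly:

INSERT INTO AllDomains (Host, Origin)
SELECT host, 'Compiled' FROM (
    SELECT host FROM Sublist3r
    UNION
    SELECT host FROM Osmedeus
    UNION
    SELECT host FROM Maltego
    UNION
    SELECT host FROM reconng.hosts
    UNION
    -- keep only hosts (not emails or URLs) ending in sans.org
    SELECT host FROM theHarvester.hosts WHERE host LIKE '%sans.org'
);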

  • We still have some strange domains that need to be expunged.

Deleting bad records
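The clean-up is just a couple of DELETE statements; the patterns below are placeholders for whatever oddities show up in your own results:

-- drop records that do not belong to the target domain
DELETE FROM AllDomains WHERE Host NOT LIKE '%sans.org';
-- drop obvious noise, such as wildcard entries
DELETE FROM AllDomains WHERE Host LIKE '*.%';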

Now you have a table with all the subdomains, and that can be the embryo of some serious information gathering on your target.

  • If you want, just for tracking purposes, state the origin of your data:

UPDATE AllDomains SET Origin = 'Compiled';

Final table with all data compiled

This is obviously just a simple example to illustrate the basics of my compilation method. In a real-life scenario, I would add the IP addresses, open ports, etc.

Feel free to expand this method for emails, contacts, etc.

Compiling the data into recon-ng

Another possibility, and something I usually do, is to send all this data back to recon-ng in order to dig a bit deeper using the nice scripts available in the tool. There are at least two distinct possibilities:

Adding the data to recon-ng

Let’s start by checking how many duplicates we have in the hosts table.

Checking for duplicates
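With recon-ng's data.db open, a quick way to count them (host is a standard column of recon-ng's hosts table):

SELECT host, COUNT(*) AS n
FROM hosts
GROUP BY host
HAVING n > 1
ORDER BY n DESC;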

  • Insert data from external sources directly into the recon-ng hosts table

Adding new records to recon-ng
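A sketch of that insert, with the compiled database attached from within data.db. Only the host column is filled, since that is all we compiled; any extra columns in the hosts table will simply stay empty:

ATTACH DATABASE '/root/Documents/SANS/SANS.db' AS compiled;

INSERT INTO hosts (host)
SELECT Host FROM compiled.AllDomains;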

  • How many duplicates do we have now?

Checking for new number of duplicates

This might look like a bad outcome, but you can easily remove the duplicates if you want to.
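Removing them is a one-liner in SQLite thanks to the implicit rowid: keep the first row for each host and delete the rest. Be aware that the surviving row is picked arbitrarily, so only do this if you don't mind losing the extra details of the duplicates:

DELETE FROM hosts
WHERE rowid NOT IN (
    SELECT MIN(rowid) FROM hosts GROUP BY host
);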

Replacing the data in recon-ng

But why not start with a fresh set of hosts, without duplicates and with no extra information?

Let’s imagine you don’t have a compiled results table yet. You can create a new one, compile all the available data there, delete everything from the hosts table, and copy everything back into the empty table.

  • This can be done sequentially in a single SQL run

Replacing all recon-ng hosts
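A sketch of that sequence, again assuming SANS.db is attached as compiled (all names are placeholders):

-- 1. build a fresh, deduplicated set from every source
CREATE TABLE new_hosts AS
    SELECT host FROM hosts
    UNION
    SELECT Host FROM compiled.AllDomains;

-- 2. empty the original table and copy the clean set back
DELETE FROM hosts;
INSERT INTO hosts (host) SELECT host FROM new_hosts;
DROP TABLE new_hosts;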

  • This is the result

New recon-ng hosts table

We had 494 hosts; now we have 770.

All we have to do now is go back to recon-ng and run some of the modules taking advantage of the new set of hosts found by the other footprinting tools.

This is the advanced way of doing reconnaissance: iteration after iteration, compiling, filtering and analyzing.


Next post: Introduction to Scanning

Footprinting with Sublist3r

Sublist3r is a Python subdomain discovery tool that has been designed to enumerate subdomains of websites using data from publicly available sources and brute force techniques. The public sources consist of a wide range of popular search engines such as Google, Yahoo, Bing, Baidu, Ask and also Netcraft, Virustotal, ThreatCrowd, DNSdumpster and ReverseDNS to discover subdomains.

You can also brute force subdomains using an integrated tool named Subbrute. Subbrute is a DNS meta-query spider that enumerates DNS records and subdomains using an extensive wordlist. It relies on open resolvers to avoid the rate-limiting issues that would otherwise prevent it from completing the list or attempting all entries.

Sublist3r installation

Sublist3r is not pre-installed in Kali Linux, so you will have to install it from the official repository.

If you are installing Sublist3r on Ubuntu 19.10, you will need to prepare your system:

  • Update your system and install required software:

$ sudo apt-get update && sudo apt-get upgrade

$ sudo apt-get install git

$ sudo apt-get install python-pip python-dev build-essential

$ sudo pip install --upgrade setuptools pip wheel

$ sudo pip install --upgrade virtualenv

  • Next, clone the Sublist3r GitHub repository. In this tutorial we clone it to the /opt directory, but feel free to use whatever directory structure works for you.

git clone https://github.com/aboul3la/Sublist3r.git


Cloning Sublist3r repository

  • Next, change into the newly created Sublist3r directory and use the requirements file to finish installing Sublist3r’s dependencies.

cd Sublist3r

pip install -r requirements.txt

Installing Sublist3r

  • At this point the installation is complete and the application can be started from the current Sublist3r directory.

./sublist3r.py

Running Sublist3r

NOTE: If you are using Kali Linux, you can start the application manually by typing sublist3r in your terminal, or you can add it to the Applications menu.

Adding Sublist3r to Kali's menu

Using Sublist3r

Like Osmedeus, Sublist3r does not require you to configure any API keys. Therefore, this application is fairly simple to use.

Sublist3r options
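A plain passive run needs nothing more than the target domain; -v just makes the output verbose:

./sublist3r.py -d sans.org -v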

So far, we’ve only searched publicly available sources for subdomains of the given domain name. In the next step we will also activate Subbrute, which uses a wordlist to brute force subdomains. The results will be saved to a text file in the directory of our choosing.

  • The following command activates Subbrute with 100 threads:

./sublist3r.py -d sans.org -v -b -t 100 -o /root/Documents/SANS/sublister.txt

Running Sublist3r with Subbrute enabled

Conclusions:

Even though it has not been updated recently, Sublist3r is still a reliable tool for finding subdomains.


Next post: Advanced Reconnaissance: Compiling gathered information

Reconnaissance with Osmedeus

Osmedeus is a fully automated offensive security framework for reconnaissance and vulnerability scanning. Like all the other tools in the category, Osmedeus allows you to automate some of the boring stuff like footprinting and scanning a target using a collection of awesome tools.

Osmedeus installation

Osmedeus doesn’t ship with Kali Linux, so you will have to install it from the official repository.

The generic steps to install Osmedeus on Kali Linux are as follows:

  • Choose a directory of your liking (I install these external tools to /opt)
  • Clone the GitHub repository:

git clone https://github.com/j3ssie/Osmedeus

Cloning Osmedeus repository

  • Install the application

cd Osmedeus

./install.sh

Osmedeus installation

You can now start the application either from the CLI or from the menu, if you take two minutes to add it there. I’ve previously mentioned alacarte, and you can use it again to add another icon to your list of Info Gathering tools.

Adding Osmedeus to Kali Linux menu

Using Osmedeus

Unlike most of the other OSINT tools, Osmedeus does not require you to configure any API keys. Therefore, this application is fairly simple to use.

The results will be saved in separate folders, inside a general folder for each target, also referred to as a workspace, as in Recon-ng.

Running Osmedeus to scan for subdomains

The tool currently has eight modules with different goals:

  • subdomain - Scanning subdomain and subdomain takeover
  • portscan - Screenshot and Scanning service for list of domains
  • screenshot - Screenshot list of hosts
  • vuln - Scanning version of services and checking vulnerable service
  • git - Scanning for git repo
  • burp - Scanning for burp state
  • dirb - Do directory search on the target
  • ip - IP discovery on the target

I will run only the subdomain module because I want to gather additional information on a previous target. For this kind of information request, Osmedeus runs the following apps:

  • Amass
  • subfinder
  • massdns
  • assetfinder
  • gobuster
  • findomain
  • goaltdns

The command is very simple.

  • Just type:

./osmedeus.py -m subdomain -t sans.org

Running Osmedeus

And the final results will be written to a single text file in the aforementioned folder.

Osmedeus scan results

Running Osmedeus in report mode

Osmedeus has a text report mode and a Web UI.

  • Just type:

./osmedeus.py --report help

(this will start the Web UI)

Running Osmedeus in report mode

Open the link in your favorite browser:

Osmedeus Web UI

Conclusions:

Osmedeus is a good application to add to your OSINT toolbox, especially because it automates the use of a different set of tools and therefore might get you some extra results.


Next post: Footprinting with Sublist3r

Advanced Reconnaissance with theHarvester

Another useful tool for open source intelligence gathering is theHarvester. It is a very simple tool, not as complex as Recon-ng. However, in spite of its simplicity, it is very effective in the early stages of a penetration test, and it can be used in combination with similar tools (I’ll show you how in a future post).

The tool gathers emails, names, subdomains, IPs, and URLs using multiple public data sources, harvesting a huge quantity of data in an automated way. As we have seen before, this is crucial to determine a company's exposure to the external threat landscape.

theHarvester installation

If you are using Kali Linux, theHarvester comes pre-installed with the official distribution. But at the time of writing this post, the version available in the repository (3.1.0) is half broken (it can’t find the APIs). So, it is always a good idea to know how to install it manually.

The generic steps to install theHarvester on Ubuntu 19.04 are as follows:

  • Clone the GitHub repository:

git clone https://github.com/laramies/theHarvester.git

Cloning theHarvester's Github repository

  • Change into the cloned directory and install the application

cd theHarvester

pip install -r requirements.txt

Installing theHarvester

You can now start the application either from the CLI or from the menu.

Running theHarvester from the CLI

In Kali Linux, theHarvester can be started from the applications menu by clicking Applications > Information Gathering > OSINT Analysis > The Harvester.

Running theHarvester from Kali's menu

However, you might need to correct the link from the menu.

Install and use alacarte to change the command from “theharvester” to “theHarvester”.

Fixing Kali's menu item

And install the logo, just to look extra cool ;)

Be careful with the installation of several versions. Kali presently ships with two different versions of theHarvester, the default one (3.1.0) accessible via CLI and menu, located at:

theHarvester's script default location

But there is also an old version inside /usr/share/golismero/tools/theHarvester

Obsolete theHarvester version

And I installed a third one, included in a recently released framework I’m currently testing, so now my updated script (3.1.1dev) is at:

Latest theHarvester version

Anyway, this is just a reminder so that you guys don’t get lost with all these versions and know where to go in the next step.

theHarvester configuration

Like all the other OSINT tools, theHarvester relies heavily on API keys, and these are supposed to live in a file called api-keys.yaml.

Right now, I have two distinct api-keys files and a symbolic link pointing to the first one. Your setup will probably be different from mine, so it’s your job to find the correct file and insert your API keys into it:

Multiple api-keys files

API keys inserted into the proper file

And that’s all there is to it.

Using theHarvester

Running theHarvester is pretty straightforward, but a few details might make a difference for an advanced user. I’m specifically talking about always running the script from the same location/directory. Why?

Because the directory you start the command from is where the application will create and save the SQLite database it uses to store the results. Running the app from the menu (unless you change it) will open a CLI in the current user’s root folder.

Obviously, you might want to have separate databases for different entities. If that is the case, then start the script from different directories and you’ll have separate database files.

Creating a separate database for each target

I have a folder under /root/Documents/ for each of my targets. If I run theHarvester from inside the respective folder, a file named stash.sqlite will be created in each of the individual folders. That is theHarvester’s database.

Running theHarvester in a dedicated folder
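In practice, the routine is as simple as this (the source passed to -b is just an example):

mkdir -p /root/Documents/SANS
cd /root/Documents/SANS
theHarvester -d sans.org -b bing
# stash.sqlite is now created in the current directory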

In the end, theHarvester will write all the results to that database, and you can then use them without messing around with other targets’ results.

Passive footprinting with theHarvester

So far in these articles, I’ve never directly touched a target. And this tool is perfect for keeping it that way, because it only uses OSINT sources, like search engines, unless you go for the DNS brute force option.

As we did in Recon-ng, we can focus primarily either on social engineering or on the network infrastructure. In order to do that, all we have to do is select the proper search engines in the command that starts the application.

Obviously, the easiest way to use it is just to run all search engines. But it will take longer and sometimes it’s really not useful.

The instructions are pretty clear: there is a set of parameters to be entered as arguments, through which we can customize the search. The most important ones are “-d” and “-b”, which are mandatory and determine, respectively, the target domain about which we want to gather information and the data sources we want to use (the list of available sources is reported in the help text).

Take a look at some examples:

Running theHarvester against one domain using all sources

Running theHarvester using just a few sources
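The commands behind those two screenshots look roughly like this, one source per run (check your version’s help for the exact -b syntax):

theHarvester -d sans.org -b all
theHarvester -d sans.org -b linkedin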

Obviously, if you are aiming to find emails and user accounts, you should focus on the sources with a higher probability of returning that kind of information:

  • Baidu
  • Yahoo
  • IntelX
  • Github
  • Linkedin
  • Twitter

On the other hand, if you are looking only for network information, you should focus on:

  • Trello
  • OTX

All the other sources will give you a mix of hosts, domains and URLs. Useful, but only after filtering. Anyway, do your own testing and you will see. There are no universal rules; each target will have a completely different attack surface and will thus require a distinct approach.

Outputting the results

Finally, the tool can immediately provide a report in either HTML or XML format.

theHarvester -d sans.org -b all -f /root/Documents/SANS/Harvest.html

All you have to do is use the -f parameter and the file will be written to the specified folder.

theHarvester html scan report


Conclusions

theHarvester is a valuable tool for OSINT because it can quickly discover a good amount of data, especially email addresses. Remember that you need to verify every piece of information you obtain. Automatic tools are great, but their outputs need to be correctly managed and interpreted.

Also remember the limits enforced on the free APIs: if you run too many queries you will eventually exhaust your credit and have to wait a few hours, or maybe even a day.


Next post: Reconnaissance with Osmedeus