Look Back on 2012's Famous Password Hash Leaks - Wordlist, Analysis and New Cracking Techniques

This article is a collaborative work between 3 authors. This is our look back on the most famous public password leaks of 2012.

Authors: m3g9tr0n, Thireus, CrackTheHash | Copy Editor: Thireus.

Nowadays, black hat hacking communities around the world publish their leaks on various online paste services such as Pastebin, Paste2.org, and others. Most credential dumps are obtained through SQL injection. These leaks often contain elements such as usernames, passwords, addresses, zip codes, telephone numbers and even PayPal accounts or credit card numbers. In a small number of them, passwords are stored in plain text, which makes the hackers' job very easy.

In this article, we gathered a large number of publicly released leaks, with the main purpose of checking the strength of users' passwords and the password policy applied by each service. Some well-known leaks included in our article are LinkedIN, Stratfor, Gamigo, NVidia, Adobe and eHarmony. We are going to present the cracking techniques and tools we used to crack passwords from these leaks. And as a gift to our readers, you will find attached at the end of this article a wordlist containing all cracked passwords from these leaks.

CRACKING METHODOLOGIES AND TOOLS… (m3g9tr0n)

The tools we used to accomplish our cracking process are John the Ripper and the Hashcat-suite. In other words, we took advantage of both CPU and GPU power.

When dealing with password cracking, the most important thing is to know as many elements as possible about your target. In the case of Stratfor we had all the elements needed for effective password cracking: usernames, first names, last names and e-mails. Many users use their e-mail or username (or part of it) as a password or keyword. Knowing this information really speeds up the cracking process, as a wordlist built from it makes an effective first cracking step. On the other hand, LinkedIN and other well-known leaks contained only hashes, which makes the cracking process more difficult and time consuming. But with good rules and techniques some interesting results can be achieved. For better documentation, we are going to analyze each case separately, showing the techniques and custom rules used.

Stratfor Case

Regarding Stratfor, we had all the appropriate elements needed for effective password cracking. The first action was to separate names, usernames, e-mails and password hashes into different files. In a first attempt we used John the Ripper's --single attack, which is a cracking attack purely based on the usernames associated with the hashes (the Hashcat-suite does not provide such an attack). The hash file must have this kind of format for the attack to be effective:

John@yahoo.com:90560000032a57c389f686bd4eeccd4a
Kate@hotmail.com:d4c202003a0a66496df5c043ec1eaaac

John the Ripper command for --single attack against MD5:

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --format=raw-md5 --single --pot=stratfor.pot Stratfor-hashes.txt

This kind of attack was able to crack many passwords. When I (m3g9tr0n) try to crack passwords, my first reaction is to apply effective rules against effective wordlists. As far as John the Ripper is concerned, I always try the Single, Extra and Jumbo rule sets and the rules presented in my first article, in addition to some rules generated by Bartavelle. Regarding the Hashcat-suite, our favourite rules are best64.rule, best80.rule, passwordpro.rule, T0XlC.rule and d3ad0ne.rule.

A typical example of a wordlist attack with John the Ripper is:

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --format=raw-md5 --wordlist=list.txt --pot=stratfor.pot --rules:Single Stratfor-hashes.txt

A typical example of a wordlist attack with oclHashcat-plus (GPU based) is:

  m3g9tr0n@linux:~/oclHashcat-plus0.09/$ ./oclHashcat-plus64.bin -m 0 hashfile.txt list.txt -r rules/best80.rule -o hashfile-crack.txt --remove

During our cracking sessions against Stratfor, we observed that many passwords contained the word "stratfor". Based on this observation, we decided to write our own rule that prepends or appends this keyword to each word of a given wordlist. The following is an example of such a rule for John the Ripper, to be placed in the john.conf file:

[List.Rules:stratfor]
A0"[Ss][tT+][rR][aA@][tT+][fF][oO0][rR]"
Az"[Ss][tT+][rR][aA@][tT+][fF][oO0][rR]"
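
To get a feel for what the bracket preprocessor does with such a rule, its expansion can be sketched in Python. This is only an illustration and not John the Ripper's actual implementation; the helper names `expand_classes` and `apply_keyword` are hypothetical. Each of the two rule lines above expands to 2*3*2*3*3*2*3*2 = 1296 spellings of the keyword.

```python
import itertools
import re

# Character classes copied from the [List.Rules:stratfor] rule above.
PATTERN = "[Ss][tT+][rR][aA@][tT+][fF][oO0][rR]"

def expand_classes(pattern):
    """Return every string obtained by picking one character per [...] class."""
    classes = re.findall(r"\[([^\]]+)\]", pattern)
    return ["".join(chars) for chars in itertools.product(*classes)]

def apply_keyword(word, keywords):
    """Yield each keyword prepended (like A0) and appended (like Az) to word."""
    for keyword in keywords:
        yield keyword + word
        yield word + keyword

variants = expand_classes(PATTERN)
print(len(variants))                             # number of keyword spellings
print(list(apply_keyword("2012", variants[:2])))  # a few resulting candidates
```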

After cracking a large number of passwords, we generated a custom charset with John the Ripper.

A typical example to generate your own charset file with John the Ripper:

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --make-charset=stratfor.chr --pot=stratfor.pot

And the associated incremental mode section in the john.conf file:

  [Incremental:stratfor]
  File = $JOHN/stratfor.chr
  MinLen = 10
  MaxLen = 31
  CharCount = 95

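The charset file can then be used for a brute-force attack with John the Ripper's incremental mode, which orders its guesses according to the character frequencies learned from the cracked passwords, in the spirit of a Markov model.

A typical example of such a brute-force attack in John the Ripper is: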

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --format=raw-md5 --incremental=stratfor --pot=stratfor.pot hashfile.txt

We left John the Ripper running for a long time. Many passwords were cracked, but most importantly, a large share of the passwords recovered with this method were 8 characters long, mixing uppercase letters, lowercase letters and digits. Thus, we understood that Stratfor generated either default or recovered passwords for its users according to this policy. Our first thought was to use the pwgen utility in order to produce random passwords based on this policy.

A typical example of pwgen to generate 8 characters mixed upper, lower and numbers:

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ pwgen -c -n -s -1 8 5
  Ch1NiIzz
  YrN5SSXL
  8CdcCJGG
  5YBIxBTt
  rmIW8ipN

Of course in our case we should generate more passwords and pipe pwgen’s output to John the Ripper or Hashcat-Suite. But this kind of attack is too slow. For that reason we should take advantage of the GPU. We applied Brute Force attacks via oclHashcat-plus.

A typical example of Brute Force attack with oclHashcat-plus:

  m3g9tr0n@linux:~/oclHashcat-plus0.09/$ ./oclHashcat-plus64.bin -a 3 -1 ?l?d?u hashfile.txt ?1?1?1?1?1?1?1?1 -o hashfile-crack.txt --remove

This kind of attack took 2 days and 17 hours to complete with an ATI 5770 but it was only able to crack 48% of passwords.
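
As a rough sanity check of these figures, the size of the keyspace and the average speed it implies can be worked out in a few lines of Python. This is a back-of-the-envelope sketch that assumes the run covered the whole keyspace of the mask, as "to complete" suggests.

```python
# Keyspace of the mask ?1?1?1?1?1?1?1?1 with -1 ?l?d?u
charset_size = 26 + 10 + 26          # lowercase + digits + uppercase
length = 8
keyspace = charset_size ** length

# Run time reported above: 2 days and 17 hours.
seconds = (2 * 24 + 17) * 3600

print(keyspace)                      # total number of candidates
print(keyspace / seconds / 1e6)      # implied average speed, in millions per second
```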

Some examples of cracked passwords generated under Stratfor's policy are:

  dd39ebf25b0892803c0edfdedfcf137a:4QnvJQKQ
  0adff76e3b3c2130fcb8d9cf476f947a:4Kjduu8J
  61b4f425867841330cec762d96df157b:4sFqqEnY
  ffee030ed8d97ad550e50b011d95b47b:2xdjVx7G
  728d78a787d7279cb0a007f5f68d817c:2DJsL9jE
  00ca874d657b3fcdddbb96121667ca7c:33g3UWcA
  73b87959e3d1ba6c97037f6ddb5be87c:3TSfVw9M
  9a4f0f28125c03323951283409c8187d:37nfZS6p
  01dfda585ff13b24ab1d276bfd58227d:2K2HHfKC
  7a4f94112cd50422740035dd80f52a7d:2s6KkegZ
  99ee4023fc71693006af30dbb25f477d:4f9ySQxR
  e46c4ccb9323566dbeb1a33967c94a7d:2SfXBWb7
  99aba8d7e69649332ac64e813a664b7d:4pZ7ZmjJ
  e5f706829a937c3fa5e430c81e926f7d:3YnxoEfy
  ffff9c930660fae4c9e9ace85a96a27d:2JTSA88Y
  0d7103e46a1c0f44df5c096b6e2ae17d:2ATb8ApH

eHarmony Case

Regarding eHarmony, it seems that the website had a policy to convert all users' passwords to uppercase. For example, if you had entered, as a registered user, the password "p@$$w0rd", eHarmony's system would have converted it to "P@$$W0RD".

The first thought that came to my mind was to write a simple rule for John the Ripper to convert all my wordlists to uppercase characters:

[List.Rules:eharmony]
u

Then, I applied this Rule to John the Ripper and a large amount of passwords were cracked very fast:

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ cat ../Wordlists/* | ./john --format=raw-md5 --pipe --pot=eharmony.pot --rules:eharmony hashfile.txt

Since my wordlists do not consist solely of uppercase letters, numbers and symbols, applying other rules directly against the eHarmony hashes would have been a waste of time. So I decided to convert the most effective wordlists to uppercase, using the above-mentioned rule, and then apply some specific rules:

Convert a wordlist to uppercase with John the Ripper:

  m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ cat ../Wordlists/* | ./john --pipe --rules:eharmony --stdout > ../Wordlists/UpperList.txt

Then, I used the --wordlist attack with John the Ripper using the following rules (it is just a sample to which you can add more rules):

$[1]$[2]$[3]
^[S]
$[T]$[E]$[R]
^[P]
$[M]$[A]$[N]
^[M]
^[B]
^[C]
^[A]
^[A]^[P]
^[T]
$[I]$[N]$[G]
^[A]^[M]
^[S]^[A]^[P]
$[P]$[B]$[B]
$[R]$[T]$[Y]
^[D]
$[E]$[R]$[S]
^[H]
$[P]$[E]$[R]
^[F]
$[G]$[E]$[R]
^[G]
$[K]$[E]$[R]
^[K]
$[S]$[O]$[N]
^[R]
^[L]
$[I]$[N]$[E]
^[P]^[H]^[P]
$[I]$[O]$[N]
^[J]
$[V]$[E]$[R]
^[W]
$[E]$[S]$[T]
^[H]^[P]
$[D]$[E]$[R]
^[N]
$[K]$[E]$[Y]
^[H]^[C]
$[O]$[N]$[E]
^[E]
$[A]$[S]$[S]
^[E]^[W]^[Q]
^[A]^[S]
$[T]$[O]$[N]
^[E]^[D]
$[D]$[O]$[G]
^[W]^[Q]

Of course, you can always generate your own rules or modify existing custom rules contained in the john.conf file. In addition to this, Hashcat Suite's rules can be used. One simple rule is to use the keyword "EHARMONY" at the beginning or at the end of each word:

[List.Rules:eharmony]
A0"[E][H][A][R][M][O][N][Y]"
Az"[E][H][A][R][M][O][N][Y]"

For people who do not own strong hardware or adequate disk space, the Hashcat-suite contains a powerful attack mode based on combination. In other words, every word of a first wordlist is concatenated with every word of a second one, without the combined list ever being written to disk.

Thus, I generated some wordlists via crunch, such as the following one of 4 ualpha-numeric characters:
```
  m3g9tr0n@linux:~/crunch3.1/$ ./crunch 4 4 -f charset.lst ualpha-numeric -o 4-list.txt
```

And used combination attacks with oclHashcat-plus:

  m3g9tr0n@linux:~/oclHashcat-plus0.09/$ ./oclHashcat-plus64.bin -a 1 hashlist.txt ../crunch3.1/4-list.txt ../crunch3.1/4-list.txt -o hashfile-crack.txt --remove
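
The idea behind the combination attack can be sketched in Python as follows. This is only an illustration of the principle, not of how oclHashcat-plus implements it; the function name `combinator` and the tiny sample lists are made up for the example, and the sizes assume that crunch's ualpha-numeric charset holds the 26 uppercase letters and 10 digits.

```python
import itertools

def combinator(left_words, right_words):
    """Yield every left word concatenated with every right word (like -a 1)."""
    for left, right in itertools.product(left_words, right_words):
        yield left + right

# Tiny stand-in lists; the real 4-list.txt holds every 4-character string
# over uppercase letters and digits.
left = ["PASS", "LOVE"]
right = ["1234", "2012"]
print(list(combinator(left, right)))

# Size of the real attack: each list has 36**4 words, so the combination
# covers 36**8 candidates of 8 characters without storing them on disk.
print(36 ** 4, 36 ** 8)
```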

Methodology for Other Leaks

Regarding other leaks such as Nvidia, Gamigo, Adobe, Project Whitefox, LinkedIN and various unknown leaks collected from Pastebin, the tools and methodology are the same. The only difference is that in each situation we have to create custom rules that refer to the name of the platform/website or by guessing some keywords.

John the Ripper Rules for Nvidia:

  [List.Rules:nvidia]
  A0"[Nn][Vv][iI1][Dd][iI1][aA@]"
  Az"[Nn][Vv][iI1][Dd][iI1][aA@]"

John the Ripper Rules for Adobe:

  [List.Rules:adobe]
  A0"[Aa@][Dd][oO0][bB][eE]"
  Az"[Aa@][Dd][oO0][bB][eE]"

You can also create similar rules for the Hashcat-suite.

Another effective technique is the fingerprint attack. This is an attack that is focused on using cracked passwords against the remaining hashes.

To isolate cracked passwords from .pot files (John the Ripper or Hashcat-suite) use:
```
  cut -d: -f2- john.pot | sort | uniq > list.txt
```
In Hashcat-suite to isolate MD5 cracked passwords (from output with the -o option), use:
```
  cut -b34- crack-file.txt | sort | uniq > list.txt
```

Then you can try all the rules mentioned above. From my own experience, this technique always gives great results.
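
For readers who prefer a script to the cut/sort/uniq pipeline, the same extraction can be sketched in Python. It assumes unsalted "hash:plaintext" lines such as the Stratfor samples shown earlier; the function name `plaintexts_from_pot` is hypothetical.

```python
def plaintexts_from_pot(lines):
    """Return the sorted, de-duplicated plaintexts of hash:plaintext lines."""
    found = set()
    for line in lines:
        line = line.rstrip("\n")
        if ":" not in line:
            continue  # not a hash:plaintext record
        # Split on the first colon only, as a password may contain colons.
        _, plaintext = line.split(":", 1)
        found.add(plaintext)
    return sorted(found)

sample = [
    "dd39ebf25b0892803c0edfdedfcf137a:4QnvJQKQ\n",
    "0adff76e3b3c2130fcb8d9cf476f947a:4Kjduu8J\n",
    "dd39ebf25b0892803c0edfdedfcf137a:4QnvJQKQ\n",
]
print(plaintexts_from_pot(sample))
```

With a real file, the list can be fed from `open("john.pot")` and written back out one plaintext per line.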

ADVANCED PASSWORD CRACKING FOR HUNGRY PASSWORD CRACKERS… (Thireus)

During your cracking sessions you have certainly noticed that most passwords are made of "keywords". This is easy to see when dealing with big leaks such as LinkedIn, Gamigo or Stratfor. These keywords are interesting for us, as users put them in their passwords consciously or unconsciously. Fortunately for us, a lot of users use the same keywords, and if you want to go further in your cracking process the main idea is to use these keywords as roots for generating new passwords. In this section I (Thireus) will introduce you to a new cracking technique based on this idea. But first of all let me explain what those keywords are exactly and why they can be so useful…

About "Keywords"…

Basically, keywords can be described as passwords or parts of passwords that are intelligible or used by multiple users. Let's focus on the following example:

Il0v3soph
il0v3sam
k4r3nl0v3sk4t3
l0v3s3at
l0v3s3x
Myl0v3s

These passwords have the keyword "l0v3s" in common, which can be found at the beginning, at the end or in the middle of the password. A common mistake would be to think that re-using these passwords with various rules will make more "l0v3s" based passwords appear, which is false because most of the rules you use will never extract the "l0v3s" pattern only, but combine or transform each of these passwords… And yet, you keep thinking that there should be more words containing this keyword… and you are right!

As explained in this section's introduction, keywords are not just words: they are parts of passwords that are intelligible or repeated among multiple users' passwords. Here are some examples of keywords:

inked
_123
assword
!)!

Keywords can be anything, intelligible or not. The most important thing about keywords is that they are not random: ideally they are generated by humans AND have a high probability of appearing in other passwords. Of course keywords can be part of other keywords, for example:

inked –> Linked, linked, winked, inkedIN, etc.

Another nice property of keywords is that they are independent of the password size. A weak password (that is, one easily crackable with BruteForce/Rules/Wordlists) can contain a specific keyword that you can reuse to crack other, stronger passwords. Let's see for example how the following passwords have been cracked:

a6fee417cdc11a71ac5da0ebb9cd20acb93d2959:M00linkedin13
ebf1570c045011b27706a28eb4c857a5b994cf47:0linkedin1-us2

M00linkedin13 –> Was cracked because it contains the keyword "linkedin13" which is part of more than 40 other linkedin passwords and is also a weak linkedin password. M00linkedin13 = 3chars + keyword

0linkedin1-us2 –> Was cracked because it contains the keyword "0linkedin1" which is part of “M00linkedin13” and 1 other linkedin password. 0linkedin1-us2 = keyword + 4chars

The padding technique – CTH_WordExtractor

So the main idea that can cross your mind would be to manually analyse your cracked passwords and look for good keywords, to finally write rules based on these few keywords… But what if there are so many keywords that you can't even complete all this work manually? The answer is to have a keyword extractor based on your results, and CTH_WordExtractor.sh (from my “Crack That Hash” project) is the script I have created for this purpose!

You can get the script here: CTH_WordExtractor.sh

This script helps you to extract all potential keywords directly from your current pot file. Basically what this script does is:

Read all passwords and apply a padded window whose padding and size vary from X to Y as defined by the user.
Sort extracted words by size and for each word count its redundancy in all passwords.
Ask the user to select a range of redundancy to select only good words. In other words to select real "keywords".
Generate keyword wordlists from X chars to Y chars to be used by the user.
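
The four steps above can be approximated with a short Python sketch. It is one reading of the description and not the code of CTH_WordExtractor.sh; the function name `extract_keywords` and its parameters are hypothetical, and the interactive selection of the redundancy range is replaced by two arguments.

```python
from collections import Counter

def extract_keywords(passwords, min_len, max_len, min_count, max_count):
    """Count in how many passwords each substring occurs and keep a range."""
    counts = Counter()
    for password in passwords:
        seen = set()
        # Slide a window of every allowed size over the password.
        for size in range(min_len, max_len + 1):
            for start in range(len(password) - size + 1):
                seen.add(password[start:start + size])
        counts.update(seen)  # count each substring once per password
    keywords = [w for w, c in counts.items() if min_count <= c <= max_count]
    # Shortest first, then alphabetical, for stable output.
    return sorted(keywords, key=lambda w: (len(w), w))

cracked = ["Il0v3soph", "il0v3sam", "k4r3nl0v3sk4t3", "l0v3s3at", "l0v3s3x", "Myl0v3s"]
print(extract_keywords(cracked, 5, 5, 6, 6))
```

Run on the six sample passwords from the "l0v3s" example, with a window of 5 chars and a redundancy of exactly 6, the sketch keeps only the substring shared by all of them.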

In the case of LinkedIN passwords, a 4-6chars keyword wordlist would contain the following keywords (this is just a sample):

inke
inked
link
Link
linke
Linke
linked
Linked

This wordlist will be used to append and prepend characters using BruteForce and Mask attack (which is the most effective). As you can see, most of these keywords are part of other keywords… and you may think this is actually very bad in terms of performance… but it is not… let's see why.

Let’s take the example of the "inke" keyword…

BruteForce + Mask attack with ?l will generate 26 possibilities per keyword:

inke –> ?l + inke = 26 combinations

But ONLY 1 will cause a repeated password which is "linke".

The next step of the process will be to use BruteForce + Mask attack with ?l?l which will generate 26^2=676 combinations per keyword:

inke –> ?l?l + inke = 676 combinations

But ONLY 26 will cause repeated passwords which are those that have been generated by ?l + "linke".

etc.
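
These overlap counts are easy to verify by enumerating the candidates. The following Python sketch does so for the two masks discussed above; the helper name `prepend_mask` is made up for the illustration.

```python
import itertools
import string

keywords = ["inke", "linke"]

def prepend_mask(word, n):
    """Return the set of candidates built from n lowercase letters + word."""
    return {"".join(prefix) + word
            for prefix in itertools.product(string.ascii_lowercase, repeat=n)}

# ?l + inke: 26 candidates, one of which is the keyword "linke" itself.
one = prepend_mask("inke", 1)
print(len(one), len(one & set(keywords)))

# ?l?l + inke: 676 candidates, 26 of which ?l + linke generates again.
two = prepend_mask("inke", 2)
print(len(two), len(two & prepend_mask("linke", 1)))
```

In both cases fewer than 4% of the candidates are redundant, which is why overlapping keywords cost little.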

And for sure, we have been able to recover all passwords containing the keyword inke, including unexpected passwords such as:

$dynamic_26$00000cd9fb6fe9d200144077861d4dc70c7d4798:reinke
$dynamic_26$00000efc970e5f2edc1bf34fea284e930b677c19:twinke
etc.

The Proper Way to Use Generated Keyword Wordlists

First of all, this technique becomes more effective and useful when you reach your limits with other classic cracking techniques. Meaning that if you want to have a very good keyword wordlist you need a very big pot file.

Then, this technique must be used with GPU BruteForcing + Mask attack or using combination attacks. Applying classic John the Ripper or Hashcat rules on the keyword wordlist will not be effective at all and will be very slow. In this article I will only take as example the GPU BruteForcing + Mask attack.

First of all, we need to generate our keyword wordlists from 4 to 14 chars. Let's do this for the john.pot of our LinkedIN cracked passwords:
```
  $ ./CTH_WordExtractor.sh 4 14
```

Other settings can be found in the CTH_WordExtractor.sh script such as padding limits.

This is the list of wordlists generated:

  $ ls CTH/
  CTH_WORDLIST_FINAL_10-10.dic CTH_WORDLIST_FINAL_4-6.dic CTH_WORDLIST_FINAL_6-9.dic
  CTH_WORDLIST_FINAL_10-11.dic CTH_WORDLIST_FINAL_4-7.dic CTH_WORDLIST_FINAL_7-10.dic
  CTH_WORDLIST_FINAL_10-12.dic CTH_WORDLIST_FINAL_4-8.dic CTH_WORDLIST_FINAL_7-11.dic
  CTH_WORDLIST_FINAL_10-13.dic CTH_WORDLIST_FINAL_4-9.dic CTH_WORDLIST_FINAL_7-12.dic
  CTH_WORDLIST_FINAL_10-14.dic CTH_WORDLIST_FINAL_5-10.dic CTH_WORDLIST_FINAL_7-13.dic
  CTH_WORDLIST_FINAL_11-11.dic CTH_WORDLIST_FINAL_5-11.dic CTH_WORDLIST_FINAL_7-14.dic
  CTH_WORDLIST_FINAL_11-12.dic CTH_WORDLIST_FINAL_5-12.dic CTH_WORDLIST_FINAL_7-7.dic
  CTH_WORDLIST_FINAL_11-13.dic CTH_WORDLIST_FINAL_5-13.dic CTH_WORDLIST_FINAL_7-8.dic
  CTH_WORDLIST_FINAL_11-14.dic CTH_WORDLIST_FINAL_5-14.dic CTH_WORDLIST_FINAL_7-9.dic
  CTH_WORDLIST_FINAL_12-12.dic CTH_WORDLIST_FINAL_5-5.dic CTH_WORDLIST_FINAL_8-10.dic
  CTH_WORDLIST_FINAL_12-13.dic CTH_WORDLIST_FINAL_5-6.dic CTH_WORDLIST_FINAL_8-11.dic
  CTH_WORDLIST_FINAL_12-14.dic CTH_WORDLIST_FINAL_5-7.dic CTH_WORDLIST_FINAL_8-12.dic
  CTH_WORDLIST_FINAL_13-13.dic CTH_WORDLIST_FINAL_5-8.dic CTH_WORDLIST_FINAL_8-13.dic
  CTH_WORDLIST_FINAL_13-14.dic CTH_WORDLIST_FINAL_5-9.dic CTH_WORDLIST_FINAL_8-14.dic
  CTH_WORDLIST_FINAL_14-14.dic CTH_WORDLIST_FINAL_6-10.dic CTH_WORDLIST_FINAL_8-8.dic
  CTH_WORDLIST_FINAL_4-10.dic CTH_WORDLIST_FINAL_6-11.dic CTH_WORDLIST_FINAL_8-9.dic
  CTH_WORDLIST_FINAL_4-11.dic CTH_WORDLIST_FINAL_6-12.dic CTH_WORDLIST_FINAL_9-10.dic
  CTH_WORDLIST_FINAL_4-12.dic CTH_WORDLIST_FINAL_6-13.dic CTH_WORDLIST_FINAL_9-11.dic
  CTH_WORDLIST_FINAL_4-13.dic CTH_WORDLIST_FINAL_6-14.dic CTH_WORDLIST_FINAL_9-12.dic
  CTH_WORDLIST_FINAL_4-14.dic CTH_WORDLIST_FINAL_6-6.dic CTH_WORDLIST_FINAL_9-13.dic
  CTH_WORDLIST_FINAL_4-4.dic CTH_WORDLIST_FINAL_6-7.dic CTH_WORDLIST_FINAL_9-14.dic
  CTH_WORDLIST_FINAL_4-5.dic CTH_WORDLIST_FINAL_6-8.dic CTH_WORDLIST_FINAL_9-9.dic

CTH_WORDLIST_FINAL_4-14.dic for example means WORDLIST from 4 to 14 chars.

Then we can select a specific wordlist to be used by cudaHashcat-plus or oclHashcat-plus:

  $ ./cudaHashcat-plus64.bin -m 100 -a 6 -1 ?a ../LEFT_LINKEDIN_CLEANED.txt ../CTH/CTH_WORDLIST_FINAL_4-11.dic ?1?1?1?1 --remove --gpu-temp-abort=110

In this example, CTH_WORDLIST_FINAL_4-11.dic has been chosen because oclHashcat-plus/cudaHashcat-plus has a limit of 15 chars for hash computation, which means you will never be able to crack passwords that are more than 15 chars long. That is why, if you bruteforce a mask of 4 chars, you must use a wordlist whose words are limited to 11 chars (11 + 4 = 15).

This is an output sample:

  499896a0a104c0be6d7e578f9257e56e2dd97b31:rottweiler3:!^
  556cdfaabedd4a90c23627782ab7eb7a4d709565:LinkedInMakes$
  e5386e1f0de44840a987c4d0840accbe2573511f:NetworkingLuv!
  08e7c2d275a68e1519c8b0842c68601b7ba6274a:19linkedin_68!
  359e2430b1e4352f1577575b7ca1ae6866131820:linkedinmym99!
  8e6139a4503dd34297e32df7ea4cedc4275d3a85:linkedin15c00!
  df0fdf12590705e9c3ef6edb6f59323e3de6a70b:linkedinl1ng0!
  79984358590405280bca6e43d331465bdb586746:linkedin81*&1$
  49cd314ab02e393171bcf1bf13099f55495b2c2e:Linkedin12kay#
  7813dc98e26938e83f4475c32bbd07a3fb81b473:linkedin69TJK]
  cc307a7d9e40b00c0100bc049c397b817aa0a274:linkedin12914??
  33f13bb3b861c0e5fc82b10fba7857107e079884:steelwindows@77
  3dd28c9d9cc4f646c254d6b4570e8bc6268b020b:artdirector@nsa
  44bdcefe2a698925c57d80712763245d07326704:yaslinkedin@yas
  8aa482c9989df0def8756e545457ebf206da9895:Linkedin151$cdu
  56267a448f53e5d6095844152310d12e52b710aa:thundercats@83a
  a5949feca9f34d7042aaffe537db0e2d298c572f:linkedin13713@@
  fab9ae4accf0b5766489c7760f4ee52582940d3c:missinglink=wwd
  1d92639e0279840b8d00a2d7793c291838664c6c:my-linkedin-pwd
  a1bac77b4fe610ec13300d246ad882a68f0fedda:Interactive@ln1
  90ba89bfa42002d8e6fb4fe3728bcbcd6605b49c:Inspiration.SSN
  [s]tatus [p]ause [r]esume [b]ypass [q]uit => s
  Status.......: Running
  Input.Base...: File (../CTH/CTH_WORDLIST_FINAL_4-11.dic)
  Input.Mod....: Mask (?1?1?1?1)
  Hash.Target..: File (../LEFT_LINKEDIN_CLEANED.txt)
  Hash.Type....: SHA1
  Time.Running.: 1 day, 7 hours
  Time.Left....: 3 hours, 59 mins
  Time.Util....: 112529717.4ms/0.0ms Real/CPU, 0.0% idle
  Speed........: 35724.6k c/s Real, 36175.5k c/s GPU
  Recovered....: 292/1086109 Digests, 0/1 Salts
  Progress.....: 4020080601574/4533053083750 (88.68%)
  Rejected.....: 0/4020080601574 (0.00%)
  HWMon.GPU.#1.: -1% Util, 82c Temp, -1% Fan

And as we can see some interesting keywords have been selected, such as "rottweiler", "Networking", "Interactive", "artdirector", "Inspiration", and of course keywords containing the word "linkedin".
You can also notice that I'm not using a very powerful GPU, but a laptop with a "NVIDIA NVS 3100m" chip. So you can imagine how powerful this method can be with a better GPU!
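
The numbers on the status screen are consistent with each other, which can be checked with a little arithmetic. The sketch below assumes that ?a stands for the 95 printable ASCII characters and that the total shown in Progress is the wordlist size multiplied by the mask keyspace.

```python
# Figures copied from the status screen above.
total_candidates = 4533053083750      # right-hand side of Progress
speed_per_second = 35724.6 * 1000     # "35724.6k c/s Real"

mask_keyspace = 95 ** 4               # four positions of ?a
wordlist_size = total_candidates // mask_keyspace
print(wordlist_size, total_candidates % mask_keyspace)

hours = total_candidates / speed_per_second / 3600
print(round(hours, 1))                # compare with Time.Running + Time.Left
```

The first line gives the number of keywords implied for CTH_WORDLIST_FINAL_4-11.dic, and the second the total duration of the attack at the reported speed.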

To conclude on my new technique, I would say that it was very successful. I've been able to recover more than 1 million passwords after having exhausted all the classic techniques I normally use, and that in just 13 days with an NVidia GTX 480 and an AMD HD6870. This 1 million result was mainly against Gamigo, eHarmony and Stratfor, after an initial achievement of about 80% recovered passwords. One thing to consider is that, to go further in the cracking process and have an optimized cracking methodology, I preferred merging multiple MD5 leaks into one big MD5 leak and using this technique against the merged pot file to generate my keywords. As explained before, you will find this technique more useful in the case of very big leaks and very big pot files.

Please consider my CTH_WordExtractor.sh script as a Xmas gift. I would love to receive feedback about your results with it. Of course, if you have ideas to improve this script or this technique, do not hesitate to contact me.

METHODOLOGY TO GENERATE EFFECTIVE WORDLISTS… (CrackTheHash)

The main purpose of most classic cracking techniques is to guess the most common patterns in users' passwords. Those techniques rely on either rules or wordlists, but in any case, to be as effective as possible they need good candidate passwords as their starting point. But how can you find those good candidate passwords? The purpose of this part is to explain a technique to find fresh new candidates from various sources such as Pastebin or Twitter.

First of all, to understand what brought me (CrackTheHash) to this methodology, you need to know something about my hardware resources. They are very limited! I just own a dual-opteron with 2GB RAM. For this reason, I do not want to exhaust my CPU cracking hashes that everyone can easily recover. So I decided to focus my research on finding sources of good candidate passwords to be used for cracking techniques.

In order to know what we are looking for, let's write down some principles that will guide our research. These principles describe the characteristics that password candidates must have to be good candidates. They are the following:

Password candidates must be up to date.
Password candidates must be representative of what people may use.
Password candidates must be multilingual (passwords in Russian, Chinese, Greek, Farsi, etc.).
Password candidates must be available in large quantity.

There are multiple sources on the Internet where you can find a large amount of data containing password candidates, but only a few meet those requirements. For the needs of this article we will focus on only two platforms and sources of good password candidates, Pastebin and Twitter.

Pastebin

Pastebin is probably the first Web location where you can find a lot of fresh leaks and various user data. What is very interesting in most of the leaks we can find on Pastebin is that they often include real passwords in plaintext. So, monitoring Pastebin is quite interesting and useful to get fresh new candidate passwords. On top of that, there are several resources on the Internet that will help you to monitor and download the latest Pastebin leaks. Portals like Leakedin, @Pastebindorks or @PastebinLeaks or projects like pastemon and pasteminer are good examples of sources and tools you can use.

Unfortunately, in order to generate effective wordlists you have to do some further scripting, because the data does not come well parsed. The first and most ordinary way to parse Pastebin data is to generate a wordlist by using the space or tab character as a separator and replacing it with a line break. This approach may miss some interesting candidates, because of the way some leaks or cracking results are laid out: most of the time you will find lines containing "username:password", "username | password" or, even worse, direct sqlmap output, etc. So you have to be clever and find the best way to parse those leaks to create useful wordlists.
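
As an illustration, a slightly smarter splitter than the plain space/tab approach could look like the following Python sketch. It is only one possible heuristic, not the parser actually used for this article: the separator set and the function name `candidates_from_line` are assumptions, and passwords that themselves contain ":" or "|" will be cut into pieces.

```python
import re

# Separators seen in pastes: "user:pass", "user | pass", tabs, spaces.
SEPARATOR = re.compile(r"\s*[:|]\s*|\s+")

def candidates_from_line(line):
    """Split one pasted line into candidate passwords."""
    line = line.strip()
    if not line:
        return []
    fields = [field for field in SEPARATOR.split(line) if field]
    # Keep the whole line too: a passphrase may itself contain spaces.
    return fields if fields == [line] else fields + [line]

for sample in ["john:hunter2", "kate | p@ssw0rd", "correct horse battery"]:
    print(candidates_from_line(sample))
```

Usernames are deliberately kept as candidates, since many users reuse their username (or part of it) as a password.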

In any case, Pastebin can help us to build useful wordlists, because new leaks are uploaded every day. The produced wordlists are not that amazing in terms of quantity, but their content is usually valuable.

Twitter

Nowadays people tend to use sentences or combinations of words for their passwords. They have been advised to do this as it is considered to be a strong and easy to remember way to create passwords. So I decided to use one of the best sentence generators ever… Twitter! Indeed, every day people generate tweets with fresh content, and in this case our password candidates are just what people are saying.

The most important thing about Twitter is that this social platform generates a lot of public and fresh data, is international and tweets are short enough to be parsed individually! On top of that, wordlists generated via Twitter can continuously feed John the Ripper.

So the first step is to grab live Twitter content. In order to achieve this, Twitter provides a live-feed query that gives you the full JSON of tweets with all the data you need. The only elements required to perform this query are a valid Twitter username and password:

curl --user <username>:<password> https://stream.twitter.com/1.1/statuses/sample.json

To get only the tweet content you have to parse it a bit. First we may need the -m argument of curl to timeout just in case of network trouble and then grep the data received with the keyword "text".

curl -m 10 --user <username>:<password> https://stream.twitter.com/1.1/statuses/sample.json | grep \"text\"

Once received, the result must be parsed because it comes with Unicode escaped characters. Something like the following script will do the trick:

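import json, sys
# Read one JSON-encoded tweet per line on stdin and print only its text.
for data in sys.stdin:
    try:
        jj = json.loads(data)
    except ValueError:
        continue  # skip lines that are not complete JSON objects
    if "text" in jj:
        print(jj["text"])
# Report completion on stderr so that "done" does not end up in the wordlist.
print("done", file=sys.stderr)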

The above few lines of Python code can be directly used to generate candidate passwords, which means keeping the whole sentence of the tweet. Another approach is to use each word of the tweet as a candidate password. Furthermore, an interesting idea is to combine tweet words with others.

What we can do is generate combinations of up to 4 words. The best results come from combining them both with and without space separators.

Here is a small Python script I wrote to perform this task, the input file is "tweets.txt":

    import itertools

    # For every tweet, print all ordered selections (with repetition) of
    # 4, 3 and 2 of its words, joined both without and with spaces,
    # followed by each single word.
    with open("tweets.txt", "r") as filein:
        for line in filein:
            words = line.rstrip("\n").split(" ")
            for length in (4, 3, 2):
                for combo in itertools.product(words, repeat=length):
                    print("".join(combo))
                    print(" ".join(combo))
            for word in words:
                print(word)

As far as size is concerned, 10 seconds of live Twitter feed will give you about 1.5 MB and about 600 tweets. This size can be reduced down to 50 KB when keeping only the parsed tweet contents. This combination script will give you around 50 Million candidate passwords to test.
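
The order of magnitude of that last figure can be checked with a quick count of what the combination script prints per tweet. In the sketch below the average of 14 words per tweet is an assumption made for the illustration, not a measured value, and duplicates are counted as many times as they are printed.

```python
def candidates_per_tweet(word_count):
    """Number of lines the combination script prints for one tweet."""
    n = word_count
    # Ordered selections with repetition of 4, 3 and 2 words, each printed
    # twice (joined without and with spaces), plus every single word.
    return 2 * (n ** 4 + n ** 3 + n ** 2) + n

tweets = 600                 # figure quoted above for 10 seconds of feed
words_per_tweet = 14         # assumed average, not taken from the article
print(candidates_per_tweet(words_per_tweet) * tweets)
```

Because the count grows with the fourth power of the number of words, a few long tweets account for most of the candidates.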

Those two approaches are not the most effective for cracking millions of passwords. But for sure, they will give you interesting results, such as passwords considered very strong that have resisted even lots of GPUs on fire.

CONCLUSION

As you might expect, we are not professional password crackers. Password cracking is a hobby for us. Actually, our hardware resources are limited. And bruteforcing passwords is not the most time friendly way, unless you own many GPUs and strong hardware. For this reason, we are trying to discover new and effective techniques to crack complex passwords.

But always keep in mind that no platform, website or online service is ever entirely protected against hacking and data leaks. So we would like to give some advice on how to protect your passwords in case critical scenarios such as the LinkedIN leak happen:

Never share passwords
Never use the same password for several services
Always use strong passwords
Do not use common words
Change your passwords on a regular basis

We hope you enjoyed reading this article. Find attached at the end of this article our new wordlist as a late Xmas gift. And of course…

HAPPY NEW YEAR 2013!!!

ABOUT THE WORDLIST

75.8 MB - M3G_THI_CTH_WORDLIST_CLEANED.zip

Leaks

LinkedIN
Gamigo
Adobe
Blizzard
eHarmony
Geissens
NVidia
Stratfor
Project Whitefox
Various leaks collected from Pastebin

Some Results

LinkedIN*:
        Loaded 6458020 password hashes SHA-1 LinkedIn
        Remaining 1078419 password hashes
LinkedIN**: (CLEANED NO DUPS)
        Loaded 5787239 password hashes SHA-1 LinkedIn
        Remaining 880786 password hashes
Gamigo:
        Loaded 7004341 password hashes MD5
        Remaining 1019934 password hashes
Adobe:
        Loaded 630 password hashes MD5
        Remaining 95 password hashes
Blizzard:
        Loaded 15932 password hashes MD5
        Remaining 4967 password hashes
eHarmony:
        Loaded 1513805 password hashes MD5
        Remaining 134345 password hashes
Geissens:
        Loaded 32502 password hashes MD5
        Remaining 4180 password hashes
NVidia:
        Loaded 791 password hashes MD5
        Remaining 354 password hashes
Stratfor:
        Loaded 822666 password hashes MD5
        Remaining 58694 password hashes

*, ** The initial LinkedIN hashlist contains 00000ed and non-00000ed SHA1 hashes. A lot of 00000ed hashes still have their duplicate non-00000ed hash in the list. For instance, if you crack the initial LinkedIN hashes with our wordlist you will find 473148 duplicates between 00000ed and non-00000ed, and if you are using John the Ripper with --format:raw-sha1-linkedin you will need to run the process twice to write duplicates (either the 00000ed or non-00000ed version) in your POT file. If you have already considered duplicates as non-useful, then the right results to consider are the ones from the CLEANED version.
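
For convenience, the recovery rate of each leak can be derived from the loaded and remaining counts listed above. The following Python sketch only restates those published figures; the function name `recovery_rate` is made up for the example.

```python
# (loaded, remaining) hash counts copied from the results above.
results = {
    "LinkedIN": (6458020, 1078419),
    "LinkedIN (cleaned)": (5787239, 880786),
    "Gamigo": (7004341, 1019934),
    "Adobe": (630, 95),
    "Blizzard": (15932, 4967),
    "eHarmony": (1513805, 134345),
    "Geissens": (32502, 4180),
    "NVidia": (791, 354),
    "Stratfor": (822666, 58694),
}

def recovery_rate(loaded, remaining):
    """Return the percentage of loaded hashes that were cracked."""
    return 100.0 * (loaded - remaining) / loaded

for leak, (loaded, remaining) in results.items():
    print(f"{leak}: {recovery_rate(loaded, remaining):.1f}% recovered")
```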

Some Pipal Analysis

LinkedIN:

9 KB - m3g_thi_cth_wordlist_linkedin_pipal.txt

Gamigo:

9 KB - m3g_thi_cth_wordlist_gamigo_pipal.txt

eHarmony:

9 KB - m3g_thi_cth_wordlist_eharmony_pipal.txt

Stratfor:

9 KB - m3g_thi_cth_wordlist_stratfor_pipal.txt

FINAL NOTICE

The wordlist provided in this article has been created using all the presented cracking techniques against public leaks only. Do not expect to find new passwords using the same leaks and techniques presented here.

As always, it is up to the reader to use this wordlist for password recovery. We do not take any responsibility if some of your passwords can be found in this wordlist or be recovered using our techniques. Be aware that the best way to protect yourself is always to change your passwords as often as possible.