This article is a collaborative work by three authors. It is our look back at 2012’s most famous public password leaks.
Nowadays, hacking communities around the world publish their leaks on online paste services such as Pastebin, Paste2.org and others. The vulnerability most commonly exploited on the targets is SQL injection. These leaks contain elements such as usernames, passwords, addresses, zip codes, telephone numbers and even PayPal accounts or credit card numbers. In a small number of them, passwords are stored in plain text, which makes the hackers’ job very easy.
In this article, we gathered a large number of publicly published leaks, with the main purpose of checking the strength of users’ passwords and the password policy applied by each service. Some well-known leaks included in our article are LinkedIN, Stratfor, Gamigo, NVidia, Adobe and eHarmony. We present the cracking techniques and tools we used to crack passwords from these leaks. As a gift to our readers, you will find attached at the end of this article a wordlist containing all the cracked passwords from these leaks.
CRACKING METHODOLOGIES AND TOOLS… (m3g9tr0n)
When dealing with password cracking, the most important thing is to know as much as possible about your target. For Stratfor we had everything needed for effective password cracking: usernames, first names, last names and e-mail addresses. Many users use their e-mail address or username (or part of it) as their password or as a keyword in it. Knowing this information really speeds up the cracking process, because the first cracking step can use a wordlist built from it. On the other hand, LinkedIN and other well-known leaks contained only hashes, which makes the cracking process more difficult and time-consuming. Even so, good rules and techniques can achieve interesting results. For better documentation, we analyze each case separately, showing the techniques and custom rules.
For Stratfor, the first action was to separate names, usernames, e-mails and password hashes into different files. In a first attempt we used John the Ripper’s --single attack, a cracking attack purely based on the usernames associated with the hashes (Hashcat-suite does not provide such an attack). For the attack to be effective, the hash file must keep each username next to its hash.
- John the Ripper command for the --single attack against MD5:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --format=raw-md5 --single --pot=stratfor.pot Stratfor-hashes.txt
This kind of attack was able to crack many passwords. When I (m3g9tr0n) am trying to crack passwords, my first reaction is to apply effective rules against effective wordlists. As far as John the Ripper is concerned, I always try Single, Extra, Jumbo and rules presented in my first article plus some rules generated by Bartavelle. Regarding Hashcat-suite our favourite rules are best64.rule, best80.rule, passwordpro.rule, T0XlC.rule and d3ad0ne.rule.
- A typical example of a wordlist attack with John the Ripper is:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --format=raw-md5 --wordlist=list.txt --pot=stratfor.pot --rules:Single Stratfor-hashes.txt
- A typical example of a wordlist attack with oclHashcat-plus (GPU based) is:
m3g9tr0n@linux:~/oclHashcat-plus0.09/$ ./oclHashcat-plus64.bin -m 0 hashfile.txt list.txt -r rules/best80.rule -o hashfile-crack.txt --remove
During our cracking sessions against Stratfor, we observed that many passwords contained the word “stratfor”. Based on this observation, we decided to write our own rule that prepends this keyword to the beginning, or appends it to the end, of each word of a given wordlist. The following is an example of such a rule, created for John the Ripper in the john.conf file.
[List.Rules:stratfor]
A0"[Ss][tT+][rR][aA@][tT+][fF][oO0][rR]"
Az"[Ss][tT+][rR][aA@][tT+][fF][oO0][rR]"
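To make the effect of the bracketed character classes concrete, here is a minimal Python sketch that expands them the way John the Ripper's rule preprocessor does: one character is picked from each class, in every possible combination. The helper names (`expand_classes`, `apply_rules`) and the sample word "2012" are ours and purely illustrative. This is not John the Ripper code, only a model of what the two rule lines produce.

```python
from itertools import product

# Character classes copied from the stratfor rule: one string per [..] group.
CLASSES = ["Ss", "tT+", "rR", "aA@", "tT+", "fF", "oO0", "rR"]

def expand_classes(classes):
    """Return every string obtained by picking one character per class."""
    return ["".join(chars) for chars in product(*classes)]

def apply_rules(word, keywords):
    """Model A0"..." (prepend keyword) and Az"..." (append keyword) for one word."""
    return [k + word for k in keywords] + [word + k for k in keywords]

variants = expand_classes(CLASSES)
candidates = apply_rules("2012", variants)

print(len(variants))     # spellings of the keyword generated by one rule line
print(len(candidates))   # candidates produced for a single wordlist entry
```

Each rule line therefore stands for 2 x 3 x 2 x 3 x 3 x 2 x 3 x 2 = 1296 concrete rules, so every wordlist entry yields 2592 candidates. This is worth keeping in mind when choosing how large a wordlist to feed to such a rule.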
After cracking a big amount of passwords, we generated a custom charset with John the Ripper.
- A typical example to generate your own charset file with John the Ripper:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --make-charset=stratfor.chr --pot=stratfor.pot
- And the associated incremental mode section in the john.conf file:
[Incremental:stratfor]
File = $JOHN/stratfor.chr
MinLen = 10
MaxLen = 31
CharCount = 95
The charset file can then be used to conduct a brute-force attack with John the Ripper’s incremental mode, which orders candidates using character statistics taken from the passwords already cracked (a Markov-like model).
- A typical example of such a brute-force attack in John the Ripper is:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ ./john --format=raw-md5 --incremental=stratfor --pot=stratfor.pot hashfile.txt
We left John the Ripper running for a long time. Many passwords were cracked, but the most important finding was that a large number of the passwords recovered this way were 8 characters long, mixing uppercase letters, lowercase letters and digits. From this we inferred that Stratfor generated default or reset passwords for its users following this pattern. Our first thought was to use the pwgen utility to produce random passwords matching this policy.
- A typical example of pwgen to generate 8 characters mixed upper, lower and numbers:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ pwgen -c -n -s -1 8 5
Ch1NiIzz
YrN5SSXL
8CdcCJGG
5YBIxBTt
rmIW8ipN
Of course, in our case we would need to generate far more passwords and pipe pwgen’s output to John the Ripper or Hashcat-suite. But this kind of attack is too slow. For that reason we took advantage of the GPU and applied a brute-force attack via oclHashcat-plus.
- A typical example of Brute Force attack with oclHashcat-plus:
m3g9tr0n@linux:~/oclHashcat-plus0.09/$ ./oclHashcat-plus64.bin -a 3 -1 ?l?d?u hashfile.txt ?1?1?1?1?1?1?1?1 -o hashfile-crack.txt --remove
This kind of attack took 2 days and 17 hours to complete with an ATI 5770 but it was only able to crack 48% of passwords.
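As a rough sanity check of these figures, the short Python sketch below computes the size of the keyspace covered by the mask (8 positions, each drawn from lowercase letters, digits and uppercase letters) and the average speed implied by the reported duration. The speed is derived from the two numbers in the text, not measured, and should be read as an estimate only.

```python
# Size of the ?l?d?u charset used by the mask: 26 lowercase + 10 digits + 26 uppercase.
CHARSET_SIZE = 26 + 10 + 26
MASK_LENGTH = 8

# Total number of candidates the mask ?1?1?1?1?1?1?1?1 enumerates.
keyspace = CHARSET_SIZE ** MASK_LENGTH

# Reported wall-clock time for the full run: 2 days and 17 hours.
elapsed_seconds = (2 * 24 + 17) * 3600

# Average rate implied by those two figures (an estimate, not a measurement).
avg_rate = keyspace / elapsed_seconds

print(f"keyspace: {keyspace:,} candidates")
print(f"implied average speed: {avg_rate / 1e6:,.0f} M candidates/s")
```

Because the mask was run to completion, any password that follows the 8-character mixed-case alphanumeric pattern would have been found, which suggests that the passwords left uncracked do not follow that pattern.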
- Some examples of cracked passwords generated from Stratfor’s policy are:
dd39ebf25b0892803c0edfdedfcf137a:4QnvJQKQ
0adff76e3b3c2130fcb8d9cf476f947a:4Kjduu8J
61b4f425867841330cec762d96df157b:4sFqqEnY
ffee030ed8d97ad550e50b011d95b47b:2xdjVx7G
728d78a787d7279cb0a007f5f68d817c:2DJsL9jE
00ca874d657b3fcdddbb96121667ca7c:33g3UWcA
73b87959e3d1ba6c97037f6ddb5be87c:3TSfVw9M
9a4f0f28125c03323951283409c8187d:37nfZS6p
01dfda585ff13b24ab1d276bfd58227d:2K2HHfKC
7a4f94112cd50422740035dd80f52a7d:2s6KkegZ
99ee4023fc71693006af30dbb25f477d:4f9ySQxR
e46c4ccb9323566dbeb1a33967c94a7d:2SfXBWb7
99aba8d7e69649332ac64e813a664b7d:4pZ7ZmjJ
e5f706829a937c3fa5e430c81e926f7d:3YnxoEfy
ffff9c930660fae4c9e9ace85a96a27d:2JTSA88Y
0d7103e46a1c0f44df5c096b6e2ae17d:2ATb8ApH
Regarding eHarmony, it seems that the website had a policy of converting all users’ passwords to uppercase. For example, if you had entered, as a registered user, the password “p@$$w0rd”, eHarmony’s system would have converted it to “P@$$W0RD”.
The first thought that came to my mind was to write a simple rule for John the Ripper to convert all my wordlists to uppercase characters:
Then, I applied this rule with John the Ripper, and a large number of passwords were cracked very quickly:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ cat ../Wordlists/* | ./john --format=raw-md5 --pipe --pot=eharmony.pot --rules:eharmony hashfile.txt
Because my wordlists are not made only of uppercase letters, numbers and symbols, applying my other rules directly against the eHarmony hashes was a waste of time. So I decided to convert the most effective wordlists to uppercase, using the above-mentioned rule, and then apply some specific rules:
- Convert a wordlist to uppercase with John the Ripper:
m3g9tr0n@linux:~/JohnTheRipper-OMP/run/$ cat ../Wordlists/* | ./john --pipe --rules:eharmony --stdout > ../Wordlists/UpperList.txt
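For readers who want to see what this conversion amounts to, here is a minimal Python sketch of the same idea. It is not the John the Ripper rule itself: the function name `uppercase_wordlist` and the sample words are illustrative assumptions. It uppercases each word and drops the duplicates that appear once case differences are removed.

```python
def uppercase_wordlist(words):
    """Yield each word uppercased, skipping duplicates created by the conversion."""
    seen = set()
    for word in words:
        upper = word.upper()
        if upper not in seen:
            seen.add(upper)
            yield upper

# Illustrative input: three case variants of one word plus a "leet" variant.
sample = ["p@$$w0rd", "Password", "PASSWORD", "password"]
result = list(uppercase_wordlist(sample))
print(result)
```

Since case variants collapse into a single entry, the converted wordlist is usually noticeably smaller than the original, so every rule applied afterwards has fewer words to process.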
Then, I used the --wordlist attack with John the Ripper using the following rules (this is only a sample; you can add more):
$$$ ^[S] $[T]$[E]$[R] ^[P] $[M]$[A]$[N] ^[M] ^[B] ^[C] ^[A] ^[A]^[P] ^[T] $[I]$[N]$[G] ^[A]^[M] ^[S]^[A]^[P] $[P]$[B]$[B] $[R]$[T]$[Y] ^[D] $[E]$[R]$[S] ^[H] $[P]$[E]$[R] ^[F] $[G]$[E]$[R] ^[G] $[K]$[E]$[R] ^[K] $[S]$[O]$[N] ^[R] ^[L] $[I]$[N]$[E] ^[P]^[H]^[P] $[I]$[O]$[N] ^[J] $[V]$[E]$[R] ^[W] $[E]$[S]$[T] ^[H]^[P] $[D]$[E]$[R] ^[N] $[K]$[E]$[Y] ^[H]^[C] $[O]$[N]$[E] ^[E] $[A]$[S]$[S] ^[E]^[W]^[Q] ^[A]^[S] $[T]$[O]$[N] ^[E]^[D] $[D]$[O]$[G] ^[W]^[Q]
Of course, you can always generate your own rules or modify the existing custom rules contained in the john.conf file. In addition, the Hashcat-suite rules can be used. One simple rule is to add the keyword “EHARMONY” at the beginning or at the end of each word:
[List.Rules:eharmony]
A0"[E][H][A][R][M][O][N][Y]"
Az"[E][H][A][R][M][O][N][Y]"
For people who do not own strong hardware and adequate disk space, Hashcat-suite offers a powerful attack mode based on combination. In other words, you can combine each word of a first wordlist with each word of a second one.
- Thus, I generated some wordlists via crunch, such as the following one of 4 ualpha-numeric characters:
m3g9tr0n@linux:~/crunch3.1/$ ./crunch 4 4 -f charset.lst ualpha-numeric -o 4-list.txt
- And used combination attacks with oclHashcat-plus:
m3g9tr0n@linux:~/oclHashcat-plus0.09/$ ./oclHashcat-plus64.bin -a 1 hashlist.txt ../crunch3.1/4-list.txt ../crunch3.1/4-list.txt -o hashfile-crack.txt --remove
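The disk-space argument can be checked with a little arithmetic. The sketch below assumes that the crunch charset named ualpha-numeric contains the 26 uppercase letters and the 10 digits, and that each line of a wordlist file costs one extra byte for the newline. It compares the size of the two 4-character lists with the size of the 8-character list that the combination attack generates on the fly.

```python
UALPHA_NUMERIC = 26 + 10   # assumed content of ualpha-numeric: A-Z plus 0-9
WORD_LEN = 4

# Number of words in one 4-character list, and in the combined 8-character space.
words_per_list = UALPHA_NUMERIC ** WORD_LEN
combined = words_per_list ** 2   # every left word joined with every right word

# Bytes on disk, counting one newline per line.
list_bytes = words_per_list * (WORD_LEN + 1)
combined_bytes = combined * (2 * WORD_LEN + 1)

print(f"one list     : {words_per_list:,} words, {list_bytes / 1e6:.1f} MB")
print(f"combined list: {combined:,} words, {combined_bytes / 1e12:.1f} TB")
```

Under these assumptions a single list of about 8 MB is enough to cover a candidate space that would need roughly 25 TB if it were written to disk first.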
Methodology for Other Leaks
Regarding other leaks such as Nvidia, Gamigo, Adobe, Project Whitefox, LinkedIN and various unknown leaks collected from Pastebin, the tools and methodology are the same. The only difference is that in each case we have to create custom rules, either based on the name of the platform/website or on guessed keywords.
- John the Ripper Rules for Nvidia:
[List.Rules:nvidia]
A0"[Nn][Vv][iI1][Dd][iI1][aA@]"
Az"[Nn][Vv][iI1][Dd][iI1][aA@]"
- John the Ripper Rules for Adobe:
[List.Rules:adobe]
A0"[Aa@][Dd][oO0][bB][eE]"
Az"[Aa@][Dd][oO0][bB][eE]"
You can also create similar rules for Hashcat-suite.
Another effective technique is the fingerprint attack. This attack focuses on using already cracked passwords against the remaining hashes.
- To isolate cracked passwords from .pot files (John the Ripper or Hashcat-suite) use:
cut -d: -f2- john.pot | sort | uniq > list.txt
- In Hashcat-suite to isolate MD5 cracked passwords (from output with the -o option), use:
cut -b34- crack-file.txt | sort | uniq > list.txt
Then you can try all the rules mentioned above on this list. In my own experience, this technique has always given great results.
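The two cut commands above can also be written as a small Python function, which makes the one subtle point explicit: the line must be split on the first colon only, because a cracked password may itself contain a colon. This sketch assumes unsalted "hash:plaintext" lines. Salted formats, whose hash field contains colons, would need a different split. The function name is illustrative.

```python
def plaintexts_from_pot(lines):
    """Return the sorted, de-duplicated plaintexts from 'hash:plaintext' lines."""
    found = set()
    for line in lines:
        line = line.rstrip("\n")
        # Split on the first colon only: the plaintext may itself contain ':'.
        _hash, sep, plain = line.partition(":")
        if sep and plain:
            found.add(plain)
    return sorted(found)

# Sample lines taken from the outputs shown in this article.
sample = [
    "dd39ebf25b0892803c0edfdedfcf137a:4QnvJQKQ\n",
    "dd39ebf25b0892803c0edfdedfcf137a:4QnvJQKQ\n",
    "499896a0a104c0be6d7e578f9257e56e2dd97b31:rottweiler3:!^\n",
    "not a pot line\n",
]
wordlist = plaintexts_from_pot(sample)
print(wordlist)
```

The resulting list plays the same role as list.txt in the commands above.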
ADVANCED PASSWORD CRACKING FOR HUNGRY PASSWORD CRACKERS… (Thireus)
During your cracking sessions you have certainly noticed that most passwords chosen by users are built from “keywords”. This is easy to notice when dealing with big leaks such as LinkedIn, Gamigo or Stratfor. These keywords are interesting for us, as users put them in their passwords consciously or unconsciously. Fortunately for us, a lot of users use the same keywords, and if you want to go further in your cracking process the main idea is to use these keywords as roots for generating new passwords. In this section I (Thireus) will introduce a new cracking technique based on this idea. But first of all, let me explain what those keywords are exactly and why they can be so useful…
Basically keywords can be described as passwords or part of passwords that appear as intelligible or used by multiple users. Let’s focus on the following example:
Il0v3soph il0v3sam k4r3nl0v3sk4t3 l0v3s3at l0v3s3x Myl0v3s
These passwords have the keyword “l0v3s” in common, which can be found at the beginning, at the end or in the middle of the password. A common mistake would be to think that re-using these passwords with various rules will make more “l0v3s”-based passwords appear. This is false, because most of the rules you use will never extract the “l0v3s” pattern on its own; they only combine or transform each of these passwords as a whole… And yet, you keep thinking that there should be more passwords containing this keyword… and you are right!
As explained in this section’s introduction, keywords are not just words; they are parts of passwords that are intelligible or repeated among multiple users’ passwords. Here are some examples of keywords:
inked _123 assword !)!
Keywords can be anything, intelligible or not. The most important thing about keywords is that they are not random: ideally they are generated by humans AND have a high probability of appearing in other passwords. And of course keywords can be part of other keywords, for example:
inked –> Linked, linked, winked, inkedIN, etc.
Another nice property of keywords is that they are independent of the password size. A weak password (that is, one easily crackable with brute force, rules or wordlists) can contain a specific keyword that you can then use to crack other, stronger passwords. Let’s see for example how the following passwords have been cracked:
M00linkedin13 –> Was cracked because it contains the keyword “linkedin13” which is part of more than 40 other linkedin passwords and is also a weak linkedin password. M00linkedin13 = 3chars + keyword
0linkedin1-us2 –> Was cracked because it contains the keyword “0linkedin1” which is part of “M00linkedin13” and 1 other linkedin password. 0linkedin1-us2 = keyword + 4chars
The padding technique – CTH_WordExtractor
So the main idea that can cross your mind would be to manually analyse your cracked passwords and look for good keywords, to finally write rules based on those few keywords… But what if there are so many keywords that you can’t even complete all this work manually? The answer is to have a keyword extractor based on your results, and CTH_WordExtractor.sh (from my “Crack That Hash” project) is the script I have created for this purpose!
You can get the script here: CTH_WordExtractor.sh
This script helps you to extract all potential keywords directly from your current pot file. Basically what this script does is:
- Read all passwords and slide a window over each of them, with the window offset (padding) and size varying from X to Y as defined by the user.
- Sort extracted words by size and for each word count its redundancy in all passwords.
- Ask the user to select a range of redundancy to select only good words. In other words to select real “keywords”.
- Generate keyword wordlists from X chars to Y chars to be used by the user.
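The steps above can be sketched in a few lines of Python. This is our own simplified illustration of the idea, not the CTH_WordExtractor.sh script: the function name, the `min_count` threshold and the choice to count a substring only once per password are assumptions made for the example.

```python
from collections import Counter

def extract_keywords(passwords, min_len, max_len, min_count):
    """Count substrings of min_len..max_len chars, once per password,
    and keep those that appear in at least min_count passwords."""
    counts = Counter()
    for pw in passwords:
        windows = set()
        for size in range(min_len, max_len + 1):
            # Slide a window of the current size over the password.
            for start in range(len(pw) - size + 1):
                windows.add(pw[start:start + size])
        counts.update(windows)
    return {w: n for w, n in counts.items() if n >= min_count}

# The example passwords used earlier in this section.
cracked = ["Il0v3soph", "il0v3sam", "k4r3nl0v3sk4t3", "l0v3s3at", "l0v3s3x", "Myl0v3s"]
keywords = extract_keywords(cracked, 4, 6, min_count=3)

# Print the retained keywords, shortest first, with their redundancy.
for word, n in sorted(keywords.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(word, n)
```

On this tiny sample only “l0v3”, “0v3s” and “l0v3s” pass the threshold, each being present in all six passwords. On a real pot file the redundancy range is what separates real keywords from noise, which is why the script asks the user to choose it.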
In the case of LinkedIN passwords, a 4-6chars keyword wordlist would contain the following keywords (this is just a little sample):
inke inked link Link linke Linke linked Linked
This wordlist will be used to append and prepend characters using a BruteForce + Mask attack (which is the most effective). As you can see, most of these keywords are part of other keywords… and you might think this is very bad in terms of performance… but it is not… let’s see why.
Let’s take the example of the “inke” keyword…
BruteForce + Mask attack with ?l will generate 26 possibilities per keyword:
inke –> ?l + inke = 26 possibilities
But ONLY 1 will cause a repeated password which is “linke“.
The next step of the process will be to use BruteForce + Mask attack with ?l?l which will generate 26^2=676 possibilities per keyword:
inke –> ?l?l + inke = 676 possibilities
But ONLY 26 will cause repeated passwords which are those that have been generated by ?l + “linke“.
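These counts are easy to verify by enumeration. The Python sketch below builds the candidates produced by prepending one or two lowercase letters to “inke”, then checks how many of them are already covered through the keyword “linke”. The helper name `prepend_mask` is ours and only models what a ?l mask does in front of a keyword.

```python
from itertools import product
from string import ascii_lowercase

def prepend_mask(keyword, n):
    """All candidates made of n lowercase letters followed by keyword (?l repeated n times)."""
    return {"".join(p) + keyword for p in product(ascii_lowercase, repeat=n)}

one_inke = prepend_mask("inke", 1)    # ?l + inke
two_inke = prepend_mask("inke", 2)    # ?l?l + inke
one_linke = prepend_mask("linke", 1)  # ?l + linke

# Candidates that the keyword "linke" already covers.
dup_step1 = one_inke & {"linke"}
dup_step2 = two_inke & one_linke

print(len(one_inke), len(dup_step1))
print(len(two_inke), len(dup_step2))
```

The overlap stays at 1 out of 26, about 3.8% of the candidates, at both steps, so the overhead of keeping keywords that are contained in other keywords is small.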
And for sure, we have been able to recover all passwords containing the keyword inke, including unexpected passwords such as:
$dynamic_26$00000cd9fb6fe9d200144077861d4dc70c7d4798:reinke
$dynamic_26$00000efc970e5f2edc1bf34fea284e930b677c19:twinke
etc.
The Proper Way to Use Generated Keyword Wordlists
First of all, this technique becomes more effective and useful when you reach your limits with other classic cracking techniques. Meaning that if you want to have a very good keyword wordlist you need a very big pot file.
Then, this technique must be used with GPU BruteForcing + Mask attack or using combination attacks. Applying classic John the Ripper or Hashcat rules on the keyword wordlist will not be effective at all and will be very slow. In this article I will only take as example the GPU BruteForcing + Mask attack.
- First of all, we need to generate our keyword wordlists from 4 to 14 chars. Let’s do this for the john.pot of our LinkedIN cracked passwords:
$ ./CTH_WordExtractor.sh 4 14
Other settings can be found in the CTH_WordExtractor.sh script such as padding limits.
- This is the list of wordlists generated:
$ ls CTH/ CTH_WORDLIST_FINAL_10-10.dic CTH_WORDLIST_FINAL_4-6.dic CTH_WORDLIST_FINAL_6-9.dic CTH_WORDLIST_FINAL_10-11.dic CTH_WORDLIST_FINAL_4-7.dic CTH_WORDLIST_FINAL_7-10.dic CTH_WORDLIST_FINAL_10-12.dic CTH_WORDLIST_FINAL_4-8.dic CTH_WORDLIST_FINAL_7-11.dic CTH_WORDLIST_FINAL_10-13.dic CTH_WORDLIST_FINAL_4-9.dic CTH_WORDLIST_FINAL_7-12.dic CTH_WORDLIST_FINAL_10-14.dic CTH_WORDLIST_FINAL_5-10.dic CTH_WORDLIST_FINAL_7-13.dic CTH_WORDLIST_FINAL_11-11.dic CTH_WORDLIST_FINAL_5-11.dic CTH_WORDLIST_FINAL_7-14.dic CTH_WORDLIST_FINAL_11-12.dic CTH_WORDLIST_FINAL_5-12.dic CTH_WORDLIST_FINAL_7-7.dic CTH_WORDLIST_FINAL_11-13.dic CTH_WORDLIST_FINAL_5-13.dic CTH_WORDLIST_FINAL_7-8.dic CTH_WORDLIST_FINAL_11-14.dic CTH_WORDLIST_FINAL_5-14.dic CTH_WORDLIST_FINAL_7-9.dic CTH_WORDLIST_FINAL_12-12.dic CTH_WORDLIST_FINAL_5-5.dic CTH_WORDLIST_FINAL_8-10.dic CTH_WORDLIST_FINAL_12-13.dic CTH_WORDLIST_FINAL_5-6.dic CTH_WORDLIST_FINAL_8-11.dic CTH_WORDLIST_FINAL_12-14.dic CTH_WORDLIST_FINAL_5-7.dic CTH_WORDLIST_FINAL_8-12.dic CTH_WORDLIST_FINAL_13-13.dic CTH_WORDLIST_FINAL_5-8.dic CTH_WORDLIST_FINAL_8-13.dic CTH_WORDLIST_FINAL_13-14.dic CTH_WORDLIST_FINAL_5-9.dic CTH_WORDLIST_FINAL_8-14.dic CTH_WORDLIST_FINAL_14-14.dic CTH_WORDLIST_FINAL_6-10.dic CTH_WORDLIST_FINAL_8-8.dic CTH_WORDLIST_FINAL_4-10.dic CTH_WORDLIST_FINAL_6-11.dic CTH_WORDLIST_FINAL_8-9.dic CTH_WORDLIST_FINAL_4-11.dic CTH_WORDLIST_FINAL_6-12.dic CTH_WORDLIST_FINAL_9-10.dic CTH_WORDLIST_FINAL_4-12.dic CTH_WORDLIST_FINAL_6-13.dic CTH_WORDLIST_FINAL_9-11.dic CTH_WORDLIST_FINAL_4-13.dic CTH_WORDLIST_FINAL_6-14.dic CTH_WORDLIST_FINAL_9-12.dic CTH_WORDLIST_FINAL_4-14.dic CTH_WORDLIST_FINAL_6-6.dic CTH_WORDLIST_FINAL_9-13.dic CTH_WORDLIST_FINAL_4-4.dic CTH_WORDLIST_FINAL_6-7.dic CTH_WORDLIST_FINAL_9-14.dic CTH_WORDLIST_FINAL_4-5.dic CTH_WORDLIST_FINAL_6-8.dic CTH_WORDLIST_FINAL_9-9.dic
CTH_WORDLIST_FINAL_4-14.dic for example means WORDLIST from 4 to 14 chars.
- Then we can select a specific wordlist to be used by cudaHashcat-plus or oclHashcat-plus:
$ ./cudaHashcat-plus64.bin -m 100 -a 6 -1 ?a ../LEFT_LINKEDIN_CLEANED.txt ../CTH/CTH_WORDLIST_FINAL_4-11.dic ?1?1?1?1 --remove --gpu-temp-abort=110
In this example, CTH_WORDLIST_FINAL_4-11.dic has been chosen because oclHashcat-plus/cudaHashcat-plus has a limit of 15 chars for hash computation, which means you will never be able to crack passwords that are more than 15 chars long… That is why, if you brute-force 4 chars with the mask, you must use a wordlist whose words are limited to 11 chars.
- This is an output sample:
499896a0a104c0be6d7e578f9257e56e2dd97b31:rottweiler3:!^
556cdfaabedd4a90c23627782ab7eb7a4d709565:LinkedInMakes$
e5386e1f0de44840a987c4d0840accbe2573511f:NetworkingLuv!
08e7c2d275a68e1519c8b0842c68601b7ba6274a:19linkedin_68!
359e2430b1e4352f1577575b7ca1ae6866131820:linkedinmym99!
8e6139a4503dd34297e32df7ea4cedc4275d3a85:linkedin15c00!
df0fdf12590705e9c3ef6edb6f59323e3de6a70b:linkedinl1ng0!
79984358590405280bca6e43d331465bdb586746:linkedin81*&1$
49cd314ab02e393171bcf1bf13099f55495b2c2e:Linkedin12kay#
7813dc98e26938e83f4475c32bbd07a3fb81b473:linkedin69TJK]
cc307a7d9e40b00c0100bc049c397b817aa0a274:linkedin12914??
33f13bb3b861c0e5fc82b10fba7857107e079884:steelwindows@77
3dd28c9d9cc4f646c254d6b4570e8bc6268b020b:artdirector@nsa
44bdcefe2a698925c57d80712763245d07326704:yaslinkedin@yas
8aa482c9989df0def8756e545457ebf206da9895:Linkedin151$cdu
56267a448f53e5d6095844152310d12e52b710aa:thundercats@83a
a5949feca9f34d7042aaffe537db0e2d298c572f:linkedin13713@@
fab9ae4accf0b5766489c7760f4ee52582940d3c:missinglink=wwd
1d92639e0279840b8d00a2d7793c291838664c6c:my-linkedin-pwd
a1bac77b4fe610ec13300d246ad882a68f0fedda:Interactive@ln1
90ba89bfa42002d8e6fb4fe3728bcbcd6605b49c:Inspiration.SSN
[s]tatus [p]ause [r]esume [b]ypass [q]uit => s
Status.......: Running
Input.Base...: File (../CTH/CTH_WORDLIST_FINAL_4-11.dic)
Input.Mod....: Mask (?1?1?1?1)
Hash.Target..: File (../LEFT_LINKEDIN_CLEANED.txt)
Hash.Type....: SHA1
Time.Running.: 1 day, 7 hours
Time.Left....: 3 hours, 59 mins
Time.Util....: 112529717.4ms/0.0ms Real/CPU, 0.0% idle
Speed........: 35724.6k c/s Real, 36175.5k c/s GPU
Recovered....: 292/1086109 Digests, 0/1 Salts
Progress.....: 4020080601574/4533053083750 (88.68%)
Rejected.....: 0/4020080601574 (0.00%)
HWMon.GPU.#1.: -1% Util, 82c Temp, -1% Fan
And as we can see some interesting keywords have been selected, such as “rottweiler“, “Networking“, “Interactive“, “artdirector“, “Inspiration“, and of course keywords containing the word “linkedin“.
You can also notice that I am not using a very powerful GPU here, but a laptop with an “NVIDIA NVS 3100m” chip. So you can imagine how powerful this method can be with a better GPU!
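The numbers in the Progress line of the status output can be cross-checked with a short calculation. The sketch below relies on the fact that hashcat's ?a charset covers the 95 printable ASCII characters. The number of keywords in the wordlist is not stated in the text, it is derived here from the reported total.

```python
PRINTABLE_ASCII = 95   # size of hashcat's ?a charset
MASK_LEN = 4           # ?1?1?1?1 with -1 ?a
MAX_PLAIN_LEN = 15     # length limit quoted in the text

# Longest keyword that still fits once the 4 masked characters are appended.
max_keyword_len = MAX_PLAIN_LEN - MASK_LEN

# Number of 4-character suffixes tried for every keyword.
mask_keyspace = PRINTABLE_ASCII ** MASK_LEN

# Figures copied from the Progress line of the status output.
done, total = 4020080601574, 4533053083750

# total = keywords x suffixes, so the wordlist size can be recovered.
keywords_in_wordlist, remainder = divmod(total, mask_keyspace)
percent_done = 100 * done / total

print(max_keyword_len, mask_keyspace)
print(keywords_in_wordlist, remainder)
print(f"{percent_done:.2f}%")
```

The total divides exactly by 95^4, which indicates that the 4-11 wordlist used in this run held 55,654 keywords. A wordlist of that size is tiny compared with classic dictionaries, yet each keyword is extended with more than 81 million suffixes.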
To conclude on my new technique, I would say that it was very successful. I have been able to recover more than 1 million passwords after having exhausted all the classic techniques I usually use, and that in just 13 days with an NVidia GTX 480 and an AMD HD6870. This 1 million result was mainly against Gamigo, eHarmony and Stratfor, and came after an initial achievement of about 80% recovered passwords. One thing to consider is that, to go further in the cracking process and have an optimized cracking methodology, I preferred merging multiple MD5 leaks into one big MD5 leak and using this technique against the merged pot file to generate my keywords. As explained before, you will find this technique more useful in the case of very big leaks and very big pot files.
Please consider my CTH_WordExtractor.sh script as an Xmas gift. I would love to receive feedback about your results with it. Of course, if you have ideas to improve this script or this technique, do not hesitate to contact me.
METHODOLOGY TO GENERATE EFFECTIVE WORDLISTS… (CrackTheHash)
The main purpose of most classic cracking techniques is to guess the most common patterns in users’ passwords. Those techniques rely either on rules or on wordlists, but in any case, to be as effective as possible, they need good candidate passwords as their starting point. But how can you find those good candidate passwords? The purpose of this part is to explain a technique to find fresh new candidates from various sources such as Pastebin or Twitter.
First of all, to understand what brought me (CrackTheHash) to this methodology, you need to know something about my hardware resources. They are very limited! I just own a dual-Opteron machine with 2GB of RAM. For this reason, I do not want to exhaust my CPU cracking hashes that everyone can easily recover. So I decided to focus my research on finding sources of good candidate passwords to be used by cracking techniques.
In order to know what we are looking for, let’s write some principles that will rule our research. Those principles are based on the password characteristics for them to match at best the requirements of good candidates. And they are the following:
- Password candidates must be up to date.
- Password candidates must be representative of what people may use.
- Password candidates must be multilingual (passwords in Russian, Chinese, Greek, Farsi, etc.).
- Password candidates must be available in large quantity.
There are multiple sources on the Internet where you can find a large amount of data containing password candidates, but only a few will fill those requirements. For the needs of this article we will focus only on two platforms and sources of good password candidates, Pastebin and Twitter.
Pastebin is probably the first Web location where you can find a lot of fresh leaks and various user data. What is very interesting in most of the leaks found on Pastebin is that they often include real passwords in plaintext. So, monitoring Pastebin is quite interesting and useful to get fresh new candidate passwords. On top of that, there are several resources on the Internet that will help you to monitor and download the latest Pastebin leaks. Portals like Leakedin, @Pastebindorks or @PastebinLeaks, or projects like pastemon and pasteminer, are good examples of sources and tools you can use.
Unfortunately, to generate effective wordlists you have to do some further scripting, because the data does not come well parsed. The first and most ordinary way to parse the Pastebin data is to generate a wordlist by using the space or tab character as separator and replacing it with a line break. This approach may miss some interesting candidates, because in leaks or cracking results you will most of the time find lines containing “username:password”, “username | password” or, even worse, direct sqlmap output, etc. So you have to be clever and find the best way to parse those leaks to create useful wordlists.
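As an illustration of parsing that goes one step beyond splitting on spaces and tabs, here is a minimal Python sketch. The set of separators (whitespace, '|', ':', ',' and ';') is our assumption about what commonly appears in pasted dumps, and the function name is illustrative. It is a starting point, not a complete parser: a password that itself contains one of these separators will be cut into pieces.

```python
import re

# Assumed field separators in pasted dumps: whitespace, '|', ':', ',' and ';'.
FIELD_SEPARATORS = re.compile(r"[\s|:,;]+")

def candidates_from_paste(text):
    """Return the unique non-empty tokens of a paste, in first-seen order."""
    seen = {}
    for line in text.splitlines():
        for token in FIELD_SEPARATORS.split(line.strip()):
            if token:
                seen.setdefault(token, None)
    return list(seen)

# Made-up sample lines mixing several layouts.
paste = "alice | s3cret\nbob:hunter2\ncarol\tqwerty\nbob;hunter2"
tokens = candidates_from_paste(paste)
print(tokens)
```

Usernames are kept on purpose, since they are often reused as passwords. For structured output such as sqlmap tables, a dedicated parser per layout will give cleaner results than a generic split.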
In any case, Pastebin can help us to build useful wordlists, because everyday new leaks are uploaded. The produced wordlists are not that amazing in term of quantity, but usually their content is valuable.
Nowadays people tend to use sentences or combinations of words for their passwords. They have been advised to do this, as it is considered a strong and easy-to-remember way to create passwords. So I decided to use one of the best sentence generators ever… Twitter! Indeed, every day people generate tweets with fresh content, and in this case our password candidates are simply what people are saying.
The most important things about Twitter are that this social platform generates a lot of public and fresh data, is international and tweets are short enough to be parsed individually! On top of that, wordlists generated via Twitter can continuously feed John the Ripper.
So the first step is to grab live Twitter content. To achieve this, Twitter provides a live-feed query that returns the full JSON of tweets with all the data you need. The only elements required to perform this query are a valid Twitter username and password:
curl --user <username>:<password> https://stream.twitter.com/1.1/statuses/sample.json
To get only the tweet content you have to parse it a bit. First we may need the ‘-m’ argument of curl to time out in case of network trouble, and then grep the received data for the keyword "text" (double quotes included).
curl -m 10 --user <username>:<password> https://stream.twitter.com/1.1/statuses/sample.json | grep \"text\"
Once received, the result must be parsed because it comes with Unicode escaped characters. Something like the following script will do the trick:
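import json
import sys

# Read one JSON document per line from stdin and print the tweet text.
for line in sys.stdin:
    try:
        tweet = json.loads(line)
    except ValueError:
        continue  # skip blank or truncated lines
    if isinstance(tweet, dict) and "text" in tweet:
        print(tweet["text"])
# Report completion on stderr so it does not end up in the wordlist.
print("done", file=sys.stderr)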
The above few lines of Python code can be directly used to generate candidate passwords, which means keeping the whole sentence of the tweet. Another approach is to use each word of the tweet as a candidate password. Furthermore, an interesting idea is to combine tweet words with others.
What we can do is generate combinations of up to 4 words. The best results are obtained by combining the words both with and without space separators.
Here is a small Python script I wrote to perform this task; the input file is “tweets.txt”:
from itertools import product

def combinations(words, length):
    """Return every ordered selection of `length` words, repetition allowed."""
    if length == 0:
        return []
    return [list(combo) for combo in product(words, repeat=length)]

with open("tweets.txt", "r", encoding="utf-8") as filein:
    for line in filein:
        thisline = line.rstrip("\n").split(" ")
        # Emit 4-, 3- and 2-word combinations, joined without and with spaces.
        for length in (4, 3, 2):
            for combo in combinations(thisline, length):
                print("".join(combo))
                print(" ".join(combo))
        # Finally emit each single word of the tweet.
        for word in thisline:
            print(word)
As far as size is concerned, 10 seconds of live Twitter feed will give you about 1.5 MB and about 600 tweets. This size can be reduced down to 50 KB when keeping only the parsed tweet contents. This combination script will give you around 50 Million candidate passwords to test.
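The order of magnitude of this figure can be checked by counting what the combination script prints for one tweet. The sketch below is a back-of-the-envelope estimate: the function name is ours, and the average of 14 words per tweet is an assumption chosen for illustration, not a measured value.

```python
def candidates_per_tweet(n_words):
    """Lines printed by the combination script for a tweet of n_words words."""
    joined_forms = 2  # each combination is printed without and with spaces
    combos = sum(n_words ** k for k in (4, 3, 2))
    return joined_forms * combos + n_words  # plus the single words

TWEETS = 600          # tweets collected in about 10 seconds, as stated above
ASSUMED_WORDS = 14    # assumed average number of words per tweet

per_tweet = candidates_per_tweet(ASSUMED_WORDS)
print(per_tweet, TWEETS * per_tweet)
```

With these assumptions the total comes out just under 50 million lines, duplicates included, which is consistent with the figure given above. The count grows with the fourth power of the tweet length, so a few long tweets dominate the output.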
Those two approaches are not the most effective for cracking millions of passwords. But they will certainly give you interesting results, such as passwords considered very strong that have resisted even heavy GPU-based attacks.
As you might expect, we are not professional password crackers. Password cracking is a hobby for us, and our hardware resources are limited. Brute-forcing passwords is not the most time-friendly approach unless you own many GPUs and strong hardware. For this reason, we are trying to discover new and effective techniques to crack complex passwords.
But always keep in mind that platforms, websites and online services are never entirely protected against hacking and data leaks. So we would like to give some advice to protect your passwords in case critical scenarios such as the LinkedIN leak happen:
- Never share passwords
- Never use the same password for several services
- Always use strong passwords
- Do not use common words
- Change your passwords on a regular basis
We hope you enjoyed reading this article. Find attached at the end of this article our new wordlist as a late Xmas gift. And of course…
HAPPY NEW YEAR 2013!!!
ABOUT THE WORDLIST
- LinkedIN
- Gamigo
- Adobe
- Blizzard
- eHarmony
- Geissens
- NVidia
- Stratfor
- Project Whitefox
- Various leaks collected from Pastebin
LinkedIN*: Loaded 6458020 password hashes (SHA-1 LinkedIn), remaining 1078419 password hashes
LinkedIN** (CLEANED NO DUPS): Loaded 5787239 password hashes (SHA-1 LinkedIn), remaining 880786 password hashes
Gamigo: Loaded 7004341 password hashes (MD5), remaining 1019934 password hashes
Adobe: Loaded 630 password hashes (MD5), remaining 95 password hashes
Blizzard: Loaded 15932 password hashes (MD5), remaining 4967 password hashes
eHarmony: Loaded 1513805 password hashes (MD5), remaining 134345 password hashes
Geissens: Loaded 32502 password hashes (MD5), remaining 4180 password hashes
NVidia: Loaded 791 password hashes (MD5), remaining 354 password hashes
Stratfor: Loaded 822666 password hashes (MD5), remaining 58694 password hashes
*, ** The initial LinkedIN hashlist contains both 00000ed and non-00000ed SHA1 hashes. A lot of 00000ed hashes still have their duplicate non-00000ed hash in the list. For instance, if you crack the initial LinkedIN hashes with our wordlist you will find 473148 duplicates between 00000ed and non-00000ed hashes, and if you are using John the Ripper with --format:raw-sha1-linkedin you will need to run the process twice to write the duplicates (either the 00000ed or the non-00000ed version) to your POT file. If you already consider duplicates as not useful, then the right results to look at are the ones from the CLEANED version.
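To read these statistics as recovery rates, the small Python sketch below computes, for each leak, the share of loaded hashes that were cracked. The counts are copied from the statistics above (using the cleaned LinkedIN list). The percentages are simply derived from them and were not part of the original results.

```python
# (loaded, remaining) hash counts copied from the statistics above.
STATS = {
    "LinkedIN (cleaned)": (5787239, 880786),
    "Gamigo": (7004341, 1019934),
    "Adobe": (630, 95),
    "Blizzard": (15932, 4967),
    "eHarmony": (1513805, 134345),
    "Geissens": (32502, 4180),
    "NVidia": (791, 354),
    "Stratfor": (822666, 58694),
}

def recovered_percent(loaded, remaining):
    """Percentage of the loaded hashes that have been cracked."""
    return 100 * (loaded - remaining) / loaded

rates = {name: recovered_percent(*counts) for name, counts in STATS.items()}
for name, rate in rates.items():
    print(f"{name:20s} {rate:6.2f}%")
```

The small lists (Adobe, NVidia) weigh very little in the overall result, so the per-leak rate is more informative than a global average.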
Some Pipal Analysis
- LinkedIN: M3G_THI_CTH_WORDLIST_LinkedIN_PIPAL.txt
- Gamigo: M3G_THI_CTH_WORDLIST_Gamigo_PIPAL.txt
- eHarmony: M3G_THI_CTH_WORDLIST_eHarmony_PIPAL.txt
- Stratfor: M3G_THI_CTH_WORDLIST_stratfor_PIPAL.txt
The wordlist provided in this article has been created using all the presented cracking techniques against public leaks only. Do not expect to find new passwords using the same leaks and techniques presented here.
As always, it is up to the reader to decide how to use this wordlist for password recovery. We do not take any responsibility if some of your passwords can be found in this wordlist or be recovered using our techniques. Be aware that the best way to protect yourself is to change your passwords as often as possible.