Size Matters

Print This Post

David Byron

David Byron is a software developer working for the military-industrial complex. At Popehat, he writes about art, language, theater (mostly magic), technology, lyrics, and aleatory ephemera. Serious or satirical poetry spontaneously overflows from him while he's recollecting in tranquility. @dcbyron

You may also like...

179 Responses

  1. Steven H. says:

    nice intro to the basics of crypto, and the value of passphrases….

  2. David says:

    Thanks. I sometimes feel, in keeping with Phil back in the 90s, that if we could just stop calling them 'passwords' and start calling them 'passphrases', that might somehow subtly, magically, wishfulthinkingly start to change the cultural landscape that most users inhabit.

  3. Ken White says:

    This may be a question that is so big that it's asking for another post.

    But are there any numbers on how the analysis changes if the string isn't jumbled?

    So, like, if I use "bare ruined choirs where late the sweet birds sang" I'm sure it's not as reliable as "rutabaga XYZ Gargamel 3498 stupendous rottweiler odor pyrite amaretto." But — given the ability of computers to run every phrase in literature — how much worse is it?

  4. Dan Weber says:

    I read Diceware almost 20 years ago and it's still a great way to pick passphrases (they talk about 5 words, not 7).

    http://world.std.com/~reinhold/diceware.html

    "world dot std dot com" practically screams stone-age.

    A small group can probably put together a rainbow table for all MD5s of the Diceware group in not too long a time, but you can very easily modify Diceware to be trivially more secure without making it much harder to memorize. Just be aware that some ways of modifying it make it less secure.

  5. Cat G says:

    And yet, with all the billions of Empires running numbers, our esteemed masters wish us to be happy with 16 wheels running at best an alphabet of 96. Which they then transform to 40 wheels with an alphabet of 36.

  6. AlphaCentauri says:

    First thing I'd do is not have all the words from the same language. That increases your "girth" significantly.

  7. Trevor Pott says:

    @Ken White you are definitely on the right track to understand how modern crytocracking works. Ars has a great primer for you here. They have a number of other great articles on the topic (Just search their site for "hashcat") that do a far better job of explaining than I ever could.

  8. babaganusz says:

    Qióng: Shīfu, why do you use such a stupid, pointless lock on the shed? It takes on average only a little more than 8 minutes for a stranger to open it!

    Shīfu Shíjú: The shed is empty, Qióng.

    Qióng: I am now enlightened, Shīfu Shíjú. Please continue.

    in before first joke/rant about squatters… and echoing Steven H.'s appreciation!

  9. Hayden says:

    @Ken

    You are right that English phrases are less secure than random strings of The same length, because there are fewer valid English phrases.

    Quotes as passphrases should be considered insecure.

    arstechnica.com/security/2013/08/thereisnofatebutwhatwemake-turbo-charged-cracking-comes-to-long-passwords/

    A nonsense phrase made up on your own should be fairly secure, but your best bet is diceware, or a password manager with random strings as passwords.

  10. Ben Robinson says:

    If the first thing the attacker tries is searching all possible combinations of words, then a meaningful one like "bare ruined choirs where late the sweet birds sang" is no less complex than a random one.

    Against an attacker who searches a collection of well-known phrases before searching random phrases, all I can say about the security of a meaningful phrase is "too easy." To be able to quantify "too easy," I'd have to know how big his list is and have some way of estimating the probability that your phrase is on it.

    http://www.phrases.org.uk/meanings/phrases-and-sayings-list.html has a list of about 2000 commonly-quoted phrases. Trying everything in that list would take seconds. It becomes a matter of minutes or hours if the attacker is using a larger collection of well-known phrases.

    Fun fact: At least one security researcher's collection of well-known phrases includes "Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn". Just in case anyone ever thought that would make a good password. (src: http://arstechnica.com/security/2013/08/thereisnofatebutwhatwemake-turbo-charged-cracking-comes-to-long-passwords/ )

  11. Yinepuhotep says:

    If only we could rely on the thieves to be satisfied with merely trying to break our combination. Unfortunately, xkcd knows better what the true problem is:

    http://xkcd.com/538/

    What we need is both the best passphrases and a defense against wrenches.

  12. Ken White says:

    So. Is there a password strength tester that is recommended given this encryption theory?

  13. En Passant says:

    Just to state the obvious, as long noted by XKCD #538:

    Shīfu Shíjú: After you have raked the yard, I will teach you about a padlock that has a girth of 2 and 4096 wheels. How long do you think it would take many empires of many thieves to find the magic number for such a lock?

    Qióng: Do you know the magic numbers?

    Shīfu Shíjú: Yes. Otherwise I could not use the lock.

    Qióng: Then a couple days tops if they know where to find you. Now I will go rake the yard.

  14. Trebuchet says:

    How long will it take the thief to open the lock? About two minutes, if the thief is Richard Feynman and the lock is on a filing cabinet containing the most secret stuff in the whole world.

  15. John Kindley says:

    A couple dumb questions: Are there only 7776 words in the English dictionary or something? And I think i see the lesson on pass phrases but is there also a lesson on encryption as the first commenter suggested, or is there an obvious connection between the two I'm missing?

  16. Erich says:

    It would seem then that the point of the ARS technica article is simply, "Don't use a passphrase that might exist in whole in any written work."

  17. Trebuchet says:

    Wish I could edit! Anyhow, here's the appropriate XKCD:

    https://xkcd.com/936/

    Also, the cabinets Feynman was getting into had a length of 3 and a girth of 30. He could get into them because the users would leave them at the last number after opening them, he could tell the previous number by feel, and there was a tolerance of +/- 3 on the numbers, meaning he only had to try about six numbers (multiples of 5) to get the first digit.

  18. Xenocles says:

    "Is there a password strength tester that is recommended given this encryption theory?"

    I would think that such a tester would need to have a list of possible passwords to compare against if it's going to provide more insight than the simple combination calculation. And wouldn't the existence of that list reduce the security of any password on it?

  19. JDM says:

    Qiong:When I am done raking the yard, I shall discuss it with the SHA of Pera! :)

    In all seriousness the beauty of defeating a dictionary attack is how easy it is. So yes, Ken, "to be or not to be" is a pretty weak passphrase but "to be sucks or not to be" is a MUCH stronger passphrase. "To be klhdHL2$ or not to be" is even stronger. Unless the hacker guesses the exact phrase, there is no points for getting close.

    The other nice thing is that, unless your secrets are uniquely valuable, then you don't have to defeat a million empires for a million years — you just have to be harder to crack than the next guy! As things exist on the internet right now, a passphrase, even a fairly well known passphrase, is going to keep you safe for a very long time. (Or alternatively, since NSA has strongarmed software vendors into putting back doors into their product, your password doesn't really matter that much anyway.)

  20. David says:

    So, like, if I use "bare ruined choirs where late the sweet birds sang" I'm sure it's not as reliable as "rutabaga XYZ Gargamel 3498 stupendous rottweiler odor pyrite amaretto." But — given the ability of computers to run every phrase in literature — how much worse is it?

    This isn't addressed in the article, but length doesn't achieve security; length allows some space in which entropy can achieve security. So it's not just a matter of having one thing after another; it's a matter of the next thing not being in any sense predictable based on what has gone before.

    For this reason, an 8-word quote from Shakespeare in principle would be much less secure than a chain of 8 randomly selected words.

  21. Xenocles says:

    To break the metaphor (or perhaps not), when faced with a lock with a million wheels I would reach for the bolt cutters.

  22. Trevor Pott says:

    @Erich: the Ars article was merely an intro-level one. For more in depth ones search the site for "hashcat." You'll find gems like this one and this one, both by Dan Goodin.

    Ken seemed to be asking some prety entry level questions, I figured I'd point him at an entry-level article (that addressed phrases specifically) and let him search the site for more if his interest was piqued. There are lots of good articles linked to on the topic by HackerNews as well.

  23. Dan Weber says:

    Are there only 7776 words in the English dictionary or something?

    See the Diceware link. David didn't link to it — and maybe I stole his thunder by doing so — but each time you roll 5d6 and pick a word from the list, you are picking from one of 7776 words, almost 13 bits of entropy.

    Note that it is perfectly okay if you tell an attacker exactly the method you used to pick your random words — even to give him the exact word list you used.

    And those 7776 words were chosen to be "short" — only one of "correct horse battery staple" is in there. I haven't checked word lists but you can probably easily grow to 46656 (6^6) by allowing longer normal English words. And it's not really any harder to remember longer words than shorter words.

  24. Tom says:

    If only my bank allowed passwords with 9 or more characters! (seriously)

  25. Ken White says:

    For this reason, an 8-word quote from Shakespeare in principle would be much less secure than a chain of 8 randomly selected words.

    I figured. Thanks. (Though "much," when dealing with these numbers, is a pretty broad term, right? )

  26. David says:

    @Ken, you asked for numbers. In general, an English sentence has about 1 bit of entropy per character, so the quote would have perhaps 50 bits. That's about the same as 4 and a half diceware words.

    Of course, if your attacker happens to use a dictionary that includes each line of Shakespeare as a possible passphrase, you're toast. And Project Gutenberg certainly makes that feasible for a large adversary, or for a script kiddie in Bulgaria who happens to download the right crackware.

    If you must go non-random, then counteract the loss of entropy by perverting the phrase– mashing up segments of different quotes to thwart dictionary attacks or interlarding a few girth-enhancing oddities you're unlikely to forget.

  27. David says:

    By the way: a padlock with girth of 2 and length of 4096, attacked by a billion empires of 2.2 billion attackers making one attempt each per second would, according to WolframAlpha, take approximately 5.5×10^1196 times 14 billion years (the estimated age of the universe). And that's a common key length for what people have been calling "strong crypto". There's not much inconvenience in using an even longer key– the proper response to massive, magical increases in computation speed.

    This is why one naive commenter's remarks about how rainbow tables will defeat computational complexity earned ridicule. It would not only take multiple lifetimes of the universe to perform the brute force attack, but would also take the physical space of many universes to house the tables. :)

  28. John Kindley says:

    If length creates more entropy than girth how about using a lengthy memorable phrase but spacing the letters according to an easily memorable method that breaks up or mashes together the constituent words so they're unrecognizable by a hacking program?

  29. Erwin says:

    First, assume that you start with 4 random words. There are about 10^5 words in English. So, about 10^20, or a bit over 60 bits.
    Then, assume an 8 character random password. About 5 bits per character – or 40 bits.
    Next, assume a memorable password – there's about 10^6 such passwords (word+some letter substitutions). So, about 18 bits. Not so good.

    Now, assume that you choose a guessable quote as a base phrase and that that quote has about 32 letters and 4 words in it. There seem to be ~10^4 guessable quotes in English. So, that's about 12 bits. Really sucky.

    Then, assume that you include n sensible word replacements and m random word replacements. I'll assume that there are about 10 sensible words and about 10^5 random words. So, each sensible word replacement adds about 3 bits and each insensible replacement adds about 15 bits.

    Spaces and other letter/numeric replacements add probably 1 bit per replacement.

    So, probably, you do a bit worse replacing every word with 'sensible' replacements as by replacing 1 word with something random.

    Basically, you seriously constrain the space of available passwords and reduce security by choosing something easy to remember. 4 randomly chosen words is probably a decent compromise between security and memorability.

    –Erwin

  30. David says:

    @Ken

    So. Is there a password strength tester that is recommended given this encryption theory?

    One quick and easy way to figure it out is to consult this chart:
    Scroll down to the chart that starts with the caption "Lengths L of truly randomly generated passwords…."

    Across the top, you'll see some common "girth" sets or alphabets.
    Down the left side, you'll see some resulting amounts of entropy in bits.

    Figure out which column best describes the passphrase you're contemplating. Then move down the rows until you see its characteristics. Finally, check the left for an estimate of your bits of entropy.

    For example, suppose you choose "I understand why 9h8 see how they c?y". That looks like it's made up of upper and lower case letters plus numbers and common special characters. "Case sensitive alphanumeric" is still too narrow, but the next column looks right: "printable ascii". The phrase is 37 "wheels" long, so moving down that column, we find 35 and 39; it's right between 'em.

    Moving over to the lefthand side, we see that 35 in printable ascii offers 224 and 39 offers 256. So the passphrase, if random, would offer around 240 bits of entropy. The same as 19 diceware words. That would be very secure, if truly random.

    It isn't random, of course; it's a multimunge of lines and lyrics. But even at 1 bit per char, it boasts 37 bits of entropy– about as much as 3 or 4 diceware words. And since it definitely doesn't appear in any attacker's dictionary, it's good enough as long as you're able to memorize it without pain.

  31. Trevor Pott says:

    @John Kindley that approach only works until it becomes known. Then someone will write something for hashcat to test that method and we're back to square one.

    Long version: anything you can come up with to make your passwords easy to remember, but high in entropy can be defeated in one of two ways:

    1) You blab about the method you are using to achieve "easy to remember".
    2) Someone else comes up with the same method and blabs about it.

    Thus it is that anything with any sort of pattern to it will likely be documented, a test coded to work with it and added to the list of "password patterns hascat can smash in seconds/minutes/hours."

    With key legnths of 4096 or higher, everything comes down to "find an exploit in the crypto algorithm" or "attack it with pattern matching." As David points out: even if you converted the entire mass of the planet into storage and compute for a bruting (or rainbow table) attack the mean time to crack even one such password would be unfeasibly long.

    Short version: there are many, many smart people out there right now building things that can attack even 4096-bit stuff based on patterns. If you use a pattern to your password, you're done. Totally random with crazy key lengths is the only way to secure data storage or communications.

    Well, secure until the dark room with the bright light becomes involved…

    (Also, then you get into "how secure is your password manager/keyring/etc.")

  32. John Kindley says:

    The last time I put any real thought into creating a password the advice I came across was to instead create a passphrase. But it seems by doing that you're dramatically increasing girth at the cost of dramatically decreasing length. If you've got 3 words in your passphrase your girth might be in the neighborhood of 10k but your length is only 3. What's the point of including the spaces in your passphrase, or if you include them at all why include them where they form actual words? If you don't include them, haven't you thereby dramatically increased the "length" of your "passphrase," to the number of its characters rather than the number of its words? In fact aren't you now back to using a password, but a password that's better than a passphrase?

  33. david taylor says:

    Morons. Ask any woman and they'll tell you that girth is definitely more important than length.

  34. David says:

    @John Kindley

    If length creates more entropy than girth how about using a lengthy memorable phrase but spacing the letters according to an easily memorable method that breaks up or mashes together the constituent words so they're unrecognizable by a hacking program?

    Length doesn't create more entropy; it just allows space for entropic content. That's why 11111111 and 12345678 are worse than `-\4:{Xs, even though they're all the same length.

  35. Grandy says:

    If only my bank allowed passwords with 9 or more characters! (seriously)

    This is a major problem with software. Frequently, online entities:

    1. Restrict password length (by far the most harmful action).
    2. Remove some/quite a few characters from the potential alphabet.

    The latter is an artifact of a crazier time in the web, I think. Sanitizing your inputs remains critically important but there are lots of ways to stop injection attacks while allowing people to construct sane passwords of their choosing.

    Your bank's website is designed horribly. But a lot of them are.

  36. xtmar says:

    Perhaps this isn't the right forum, but has anyone made progress on reasonably easy to generate one time pads? Obviously, you have problems related to passing the key, but if you can do that ahead of time, it seems like that would be the easiest answer, though obviously very few people have their own random number generators.

  37. hanmeng says:

    TL;DR. But if you're using "Shīfu" as the Chinese word for "master", it should follow the surname. Extra points for the correct tone mark, though.

  38. John Kindley says:

    David, I do get that length just allows space for more entropic content. I should be able to follow more better the math presented, but I gathered perhaps erroneously from the point that a girth of only 2 combined with a length of 4000+ created a cosmological level of entropy, and the student's impression earlier in the dialogue that length is proportionately more significant than girth, that the student's impression was correct, even if both length and girth are significant.

  39. David says:

    @hanmeng

    if you're using "Shīfu" as the Chinese word for "master", it should follow the surname. Extra points for the correct tone mark, though.

    Thank you for your remark. I pondered that option carefully, since I'm interested in that sort of thing.

    I decided that although the inversion is correct in Chinese, I'm not speaking or writing Chinese. I'm writing in English to a mostly English-capable readership. So I decided to follow the convention most familiar to such readers.

  40. David says:

    @xtmar

    Perhaps this isn't the right forum, but has anyone made progress on reasonably easy to generate one time pads?

    There's not really any progress to make; it's already easy to generate one-time pads. Using Java's SecureRandom to govern selections from a character set of your choice would do the trick.

  41. Dan Weber says:

    Chain is only as strong as the weakest link.

    Your goal is make sure that your password is not near the weakest link.

    Once you pass the point where your password is strong enough to never be in anyone's lookup table, start worrying about other lapses in your system.

  42. Xenocles says:

    Where does the three-strike lockout approach factor into all this?

  43. David says:

    @Xenocles
    Use fail2ban (or equiv velocity check) on your servers, and relegate offenders to a global ban list. That has nothing to do with cryptography proper, but it's a lot of fun to burn the wannabes. ;)

  44. Trevor Pott says:

    @David +1000 points for bringing up fail2ban. Everyone should know about it.

  45. Daran says:

    By the way: a padlock with girth of 2 and length of 4096, attacked by a billion empires of 2.2 billion attackers making one attempt each per second would, according to WolframAlpha, take approximately ~~ 5.5×10^1196 times the estimated age of the universe (14 billion years). And that's a common key length for what people have been calling "strong crypto". There's not much inconvenience in using an even longer key– the proper response to massive, magical increases in computation speed.

    Both you and David appear to be confusing symmetric and asymmetric crypto.

    Symmetric crypto keys – basically what David described in his post – max out at 256 bits for reputable ciphers. There's no point in going any higher as nobody, not even the NSA, will be able to brute-force even 128 bits in the foreseeable future. Indeed absurdly large keys is considered an indicator of snake-oil. See Warning sign #5.

    Asymmetric crypto is based upon different mathematics entirely and requires keys many times the length of symmetric for the same level of security. Here 4096 bit keys are common.

  46. En Passant says:

    David wrote Sep 8, 2013 @5:10 pm:

    If you must go non-random, then counteract the loss of entropy by perverting the phrase– mashing up segments of different quotes to thwart dictionary attacks or interlarding a few girth-enhancing oddities you're unlikely to forget.

    FWIW Something I've always done is to use the easily recalled pass phrase as the basis of a transmogrification scheme which is also easy for me to remember.

    It's sort of an idiosyncratic mental hash.

    You can make your transmogrification create as long a phrase or word as you want, just by adding more steps that you can easily remember. You can probably even write down the questions for each step without much chance of someone figuring out your new password or phrase, if they don't know your original phrase.

    For example, to use Ken's phrase, and transmogrify to only a few alpha-numerics:

    Original passphrase: "bare ruined choirs where late the sweet birds sang".

    Some easy to remember free association transmogrification steps based on questions. You make these up to suit yourself, whatever is easy for you to remember.

    1. What is it? William Shakespeare's Sonnet 73, line 4. Call it WSS73L4. This gives you 7 characters.

    2. How many lines? 14. Two more characters.

    3. Rhyme scheme? ABab CDcd EFef Gg. Call it AB squared times 3.5. Restate that as AB235. Five more characters.

    4. Meter? Iambic Pentameter. Call it I5. Two more characters.

    5. What meter did that supplant? Alexandrine. What was he? The Gr8. Three more characters.

    Continue this until you have sufficient characters. All you have to recall is the original phrase and your personal whimsical questions and answers.

    String them all together (separated by blanks here for clarity). What have you got?

    WSS73L4 14 AB235 I5 Gr8

    That can become your new passpharase or password.

    Or, you can "merge" the characters of all the "words" (blanks shown here for clarity), to form a different "password": W1AIG S4B5r S28 73 35 L4

    The "merge" is just the first character of each group, second character of each group, etc., skipping to the next group as shorter groups are exhausted.

    You can probably safely write down all the transmogrification questions without the "pass phrase" and leave them lying around. Maybe a clever "spy" could figure out your password from it. But likely not without knowing the original phrase.

    You can make your transmogrification as simple or as complicated as you can easily remember.

    Resulting passwords or phrases should be fairly impervious to dictionary attack.

    But it still won't likely survive the $5 wrench attack.

  47. David says:

    @Daran,
    There's no confusion. The sole purpose of this episode in the tutelage of Qióng is to help readers new to the issue to understand the effect of exponentiation on processing time.

    You have, however, guessed correctly at one thing I mentioned to Ken privately: in leading up to 4096 and broaching polycosmic time, I've laid a foundation for the next episode, in which I will discuss asymmetry, PKI, and related issues.

    Your link to the 1999 article from Schneier is welcome; he is a friend of the 'hat. Here's the most beautiful quote from that source:

    In fact, we cannot even imagine a world where 256-bit brute force searches are possible. It requires some fundamental breakthroughs in physics and our understanding of the universe. For public-key cryptography, 2048-bit keys have same sort of property; longer is meaningless.

    It is intriguing, though, that asymmetric with a 2048-bit prime modulus is only rated for classified content up to Secret, according to CNSSP-15. I would guess this has to do more with the impropriety of handling TS that way than with any genuine concern about outlasting the cosmos.

  48. Mark - Lord of the Albino Squirrels says:

    Very little better than an informative post and thread on a topic I know next to zero about. Especially when it is a topic I want/need to know about.
    Thanks to all.

  49. Carl 'SAI' Mitchell says:

    The big problem with "transmogrification"/"transformation" schemes is that they often give outputs that seem far more random than they really are. They tend not to actually add all that much entropy, they just make things look weird. Humans are TERRIBLE at judging what's actually random, and even subtle biases can provide a means of attack. Diceware is good because it's easy to use AND it has a good entropy source (dice!)

  50. AlphaCentauri says:

    Password security depends on the situation. If you have used a password on a site that gets hacked, so that all the password hashes are downloaded and the hacker can take as long as he wants decrypting, your password is probably going to be decrypted. But who cares? He's gotten control of the whole site anyway.

    If you've used the same username/password combination elsewhere, it's a much bigger problem.

    If he's trying to log into an account with your username and guessing passwords, it doesn't matter how many phrases he can try from Project Guttenberg if he gets locked out after 5 tries, or even if he has to wait an hour to try again. But if you've used a common password and he's trying multiple usernames with that password, he's going to get in, because he won't get locked out of any one account.

    If you are talking about logging into a computer at work or another place where people work shoulder to shoulder, a complex password is a bad idea because people can watch you enter it. Better to have something long using lower case characters in the center of the keyboard so you can type it so fast people can't catch it.

  51. eigenperson says:

    If you are talking about logging into a computer at work or another place where people work shoulder to shoulder, a complex password is a bad idea because people can watch you enter it.

    Yeah, but no one can remember it. Memorizing 12 random characters on one pass, even if they are presented slowly, is impossible for the average person (or even the above-average person), and at 4 characters per second it's even harder. If it were easy, then no one would complain about the difficulty of memorizing random passwords in the first place.

  52. Xenocles says:

    @David-

    Thanks. Good to know that actually works.

  53. En Passant says:

    Carl 'SAI' Mitchell wrote Sep 8, 2013 @7:28 pm:

    The big problem with "transmogrification"/"transformation" schemes is that they often give outputs that seem far more random than they really are. They tend not to actually add all that much entropy, they just make things look weird. …

    True. My point was thwarting dictionary attacks (or whole phrase attacks based on literary libraries), by creating easily derivable phrase or words, not generally increasing entropy.

    Although I note that testing the original passphrase and the derived one on a few "password strength" sites indicated that the derived one was "stronger".

    YMMV. The scare quotes mean that I have no idea what methods the various "password strength" sites used. The particular result could easily be just a fortuitous artifact.

  54. Trevor Pott says:

    @En Passant question for you sir: did you put a password you want to use on one website into a "password checker" on another website? Not to offend, but that seems like bad security practice.

    Who runs that password checker? Is it capturing your password? Who is it giving that password to? Do they have an American or Chinese legal attack surface such that the government can hit them with secret NSLs demanding that they collect passwords even if they claim they don't? My tinfoil hat is sparking a little here, sorry….

  55. En Passant says:

    Trevor Pott wrote Sep 8, 2013 @8:36 pm:

    @En Passant question for you sir: did you put a password you want to use on one website into a "password checker" on another website? Not to offend, but that seems like bad security practice.

    No offense taken. I tested the passwords discussed in comments above. I don't use them for anything. Neither does anybody else that I know of. They are hypotheticals.

  56. SharonA says:

    Making a password security checking website would be a cool way to get lots of potential passwords and have the added bonus of associating them with an IP address, browser, all the cookie data, and so on.

  57. James Pollock says:

    "Where does the three-strike lockout approach factor into all this?"

    Password authentication is based on an assumption; that assumption is that you know your password but nobody else does. Therefore, anyone who knows your password must be you.

    Now, go back to the basic lock in the article. You turn the wheels, trying each combination, until you hit the right one. How do you KNOW you hit the right one? The lock opens.

    Lockout works by changing the response. If you don't have lockout, you get a different response when you guess wrong and when you guess right. You keep trying until you get the feedback that indicates that you've guessed correctly.

    Once lockout is in play, the EVERY attempt to authenticate generates the SAME response, whether it's right or wrong. So, you can keep guessing, but you won't get the different feedback when you guess correctly, so you won't know when you've guessed correctly.

    The downside of lockout, of course, is that the person who knows what the password is is denied access along with the guesser. However, knowing that a lockout has occurred can trigger deeper investigation "did you forget your password?", "was that you trying to authenticate at this time and date?", and so on… and that allows your security administrator to learn if something needs their attention and take appropriate action.

    Another response is possible in two-factor authentication, which requires both "what you know" and "what you have", which is capture of the security token. For people familiar with the real world, that's what happens when you don't know the PIN that goes with an ATM card… the machine keeps the card. That's part of the reason why the bank can get away with a 4-digit numeric PIN, which is ridiculously weak.

  58. James Pollock says:

    "Yeah, but no one can remember it. Memorizing 12 random characters on one pass, even if they are presented slowly, is impossible for the average person (or even the above-average person), and at 4 characters per second it's even harder."

    There's a technological fix for this, and it's called a "digital video recorder". And a substantial number of adult Americans routinely carry one with them wherever they go. Shoulder-surfing still works, largely because nobody really watches for it.
    (People used to do it back in the olden days, when people typed in long strings of numbers on payphones to make long-distance calls without feeding coins into the phones. When there were still payphones.)

  59. mud man says:

    It would seem then that the point of the ARS technica article is simply, "Don't use a passphrase that might exist in whole in any written work."

    and be sure don't Google your phrase to find out, you fish

  60. melK says:

    "The shed is empty"

    Bad: you have one shed, and lock it once. (The thieves take the shed.)
    Bad: you have N sheds, and lock only the ones that have valuable gardening equipment in them when you lock them.
    Deplorable: you have N sheds, put the valuable rakes in one at random, and lock all the sheds at random. (The thieves simply open the unlocked sheds and eventually get lucky.)
    Good: you have N sheds and lock them at random. The gardener takes the equipment home with him at night once word gets around about the sheds.

    Occasionally you do have to smuggle mules…

  61. Yendor says:

    Shīfu Shíjú: Ah, I see you have finished with the leaves. Are you ready for more enlightenmet?
    Qióng: I am always ready for enlightenment from my Shifu.
    Shīfu Shíjú: Excellent! So here is my question. Suppose someone uses a lock such as the one we described earlier. How would you get into his shed?
    Qióng: Uh, I suppose I would try different combinations until I died of old age?
    Shīfu Shíjú: Ah Qióng! How gently you reproof me when I ask the wrong question! Very well, how would *I* get into his shed?
    Qióng: But Shīfu, you are not as quick as I — it would take you even longer!
    Shīfu Shíjú: You unworthy student! Do you really think I would spend my life standing like a fool trying different combinations and hope one of them worked?
    Qióng: Of course not, Shīfu! You would ask me to do so!
    Shīfu Shíjú: And would that get me into the shed?
    Qióng: No, of course not. It would take more lifetimes than my entire family line has to guess the combination.
    Shīfu Shíjú: I see I will have to ask yet a simpler question. Suppose I wished to get into *your* shed?
    Qióng: Shīfu! Everything I own is yours! If I had such a lock I would gladly tell you my combination!
    Shīfu Shíjú: And now we see a small glimmer of enlightenment. Suppose now your great aunt bèndàn owned such a lock?
    Qióng: Um, Shīfu?
    Shīfu Shíjú: Yes?
    Qióng: She is …. Uh, I mean I love bèndàn dearly, but …. that is, do you…
    Shīfu Shíjú: To answer the question you were eventually going to ask, "No, I do not think she would be able to correctly use such a lock". And that is why I would try the lock's initial combination of 012345012345 with every expectation of success.
    Qióng: Do you think that would work? She is not a clever woman, but perhaps she could manage to change the combination?
    Shīfu Shíjú: Quite possibly. For example, the locksmith provides detailed instructions on how to change the code to "Correct Horse Battery Staple".
    Qióng: Correct Horse Battery Staple? That's a marvellous code! If I ever get a lock, I will surely use that one!
    Shīfu Shíjú: Of course you will. And that's why I will try it second if 012345012345 does not work.
    Qióng: Uh….. Does that mean Correct Horse Battery Staple is actually a poor password?
    Shīfu Shíjú: Moving on, suppose I wish to get into the shed of the captain of the guard?
    Qióng: But he is smart, and will use a great code! And he's not going to tell you what it is!
    Shīfu Shíjú: No he won't. I happen to know that he also changes his code every week – I see his apprentice fiddling with the lock every Sunday.
    Qióng: But Shīfu, why would he need to change his combination if his old one would take uncountable years to discover?
    Shīfu Shíjú: You ask a question that is beyond even my ability to answer.
    Qióng: I am sorry, Shīfu. But about the lock, do you think our telescope would be able to watch the sergeant change the code? To see at least part of it?
    Shīfu Shíjú: You are now thinking clearly! And if not the telescope, I notice that the captain tends to leave his window open on these warm nights.
    Qióng: Ah! So I could sneak into the garden, and overhear the captain telling the sergeant what the new code should be!
    Shīfu Shíjú: Excellent, Qióng! You are truly mastering todays lessons. For a change of pace, suppose I wish to ensure that the locksmith can not get into my shed. How should I do it?
    Qióng: But Shīfu, you are too clever for him! I know you will whisper the combination into my ear so he can not overhear, and you pick a clever one no one can guess, and assuredly not let him find out. So this is an easy problem, is it not?
    Shīfu Shíjú: You flatter me, Qióng. But remember: The lcoksmith built the lock.
    Qióng: But he does not know which combination you will pick? Does he?
    Shīfu Shíjú: Probably not. But perhaps the lock has two combinations. Perhaps the locks he builds will open up for the combination "I Locksmith Open Up" in addition to the code I set?
    Qióng: Shīfu! Can the locksmith really do that?
    Shīfu Shíjú: I am not a locksmith, so I can not say. But if we wished to keep him out of our shed, perhaps we should find out. Two more examples, to prove your enlightenment. First, how about the glassmaker's shed?
    Qióng: "I Locksmith Open Up"!
    Shīfu Shíjú: No, no! Or I mean, yes very clever. But suppose our locksmith is honest and there is no master code?
    Qióng: Oh!. Do you think the glassmaker will use "Correct Horse Battery Staple"?
    Shīfu Shíjú: No.
    Qióng: Um, then I am a loss, Shīfu.
    Shīfu Shíjú: You frequently are. But you were doing well so far, so I will ask this question: What is the glassmaker's shed made out of?
    Qióng: But Shīfu! You would not break open his shed? It's a beautiful piece of art! Breaking it would be gauche!
    Shīfu Shíjú: Quite right. It would be too gauche for me to break his shed. Nor would I ask you to do such an ignoble act. But what about our stableboy?
    Qióng: He is very crass.
    Shīfu Shíjú: Quite so. He would possibly break the glassmaker's glass even if I did not hint the idea to him.
    Qióng: Shīfu, you have depressed me. It seems the wondrous lock we spent the morning discussion is actually quite useless.
    Shīfu Shíjú: Not at all. The lock is fantastic! It is marvellous. In fact, the lock is so effective that none of our attempts — except perhaps bèndàn — had anything to do with trying random lock combinations.
    Qióng: I see. So the lock protects the shed, but it is not the only way inside.
    Shīfu Shíjú: Quite right. Now, one more example, and please make me proud! Suppose the woman by the well had such a lock?
    Qióng: Her? She's not a citizen! She's not even an apprentice! She's just a peasant. Why, if you needed to get into her shed, you would simply ask the guards. If she didn't open it herself, the guards would throw her in prison until she changed her mind!
    Shīfu Shíjú: Ah yes, you see, in some cases just the fact that the shed is locked is enough to -
    Qióng: How dare she try to keep you out of her shed! What did you ever do to her?
    Shīfu Shíjú: Qióng, remember this is just an example, she has not actually wronged us -
    Qióng: And we give our peasants such freedom!
    Shīfu Shíjú: I have heard that said.
    Qióng: Why we even allow them to explain their side of the story to our judges unless they are accused of such a great crime we exile them to our island prison!
    Shīfu Shíjú: Our residents do have a certain level of freedom, yes.
    Qióng: And our Supreme Leader has promised that he will only order the Warlord to assassinate people who deserve to die!
    Shīfu Shíjú: I remember that, yes.
    Qióng: I can't believe this. I should go to the Council right now and get them to issue a proclamation reminding peasants they must not hide things from their betters!
    Shīfu Shíjú: Qióng! Stop your agitation at once!
    Qióng: Yes Shīfu. Sorry Shīfu. I will try to do better Shīfu.
    Shīfu Shíjú: Much better. I think you have had quite enough enlightenment for one day. Now go clean the library, make sure the dining room is in order, and help the cook with supper.
    Qióng: Yes, Shīfu. Thank you for all you have done for poor Qióng.

  62. Sheriff Fatman says:

    @David: I think Daran's point is that with asymmetric keys, a brute-force attack doesn't have to try every combination from 0 to 2^4096, as only a (relatively) small proportion of them are valid (something to do with products of pairs of large prime numbers, if I understand correctly).

    I like your Schneier quote: I like this one too (found on a Stack Overflow post:

    [Lots of scary physics stuff ...] These numbers have nothing to do with the technology of the devices; they are the maximums that thermodynamics will allow. And they strongly imply that brute-force attacks against 256-bit keys will be infeasible until computers are built from something other than matter and occupy something other than space.

    If you work through the scary physics, it shows that if you built a perfectly efficient computer, and channelled the entire energy output of the Sun to power it, then by the time the Sun had burnt itself out and gone cold, your computer wouldn't have been able to even count anywhere close to 2^256, let alone do anything useful like try to use those combinations to decrypt something.

  63. Bob Brown says:

    It is not necessarily true that a "hacker" who has compromised the password file (really, password hashes, one hopes) has control of the entire system. Often such attacks succeed because a system is not adequately protected against SQL injection. (It is true that, no matter how one gets those password hashes, one can work on them for as long as one likes.)

  64. Bob Brown says:

    I suggest that the most important password/passphrase most of us have is that for our email account. The miscreant who has access to one's email account can probably subvert the password reset mechanisms for one's other accounts.

  65. David says:

    @Sheriff Fatman

    @David: I think Daran's point is that with asymmetric keys, a brute-force attack doesn't have to try every combination from 0 to 2^4096, as only a (relatively) small proportion of them are valid (something to do with products of pairs of large prime numbers, if I understand correctly).

    Thanks. I'm aware. Please see allusions to factorization and Zimmerman interlacing the preceding text. All such details have been bracketed out until the next episode.

    @Yendor, Nice!

  66. David says:

    @Bob Brown

    I suggest that the most important password/passphrase most of us have is that for our email account.

    A good practical point, and anyone using any service that offers multi-factor authentication who declines to take advantage of that feature deserves whatever compromise eventually comes his way. It doesn't solve the ball peen problem, but it so complicates automated attacks that they become effectively irrelevant.

  67. Sheriff Fatman says:

    @David

    Thanks. I'm aware. Please see allusions to factorization and Zimmerman interlacing the preceding text.

    In that case, isn't this part:

    After you have raked the yard, I will teach you about a padlock that has a girth of 2 and 4096 wheels. How long do you think it would take many empires of many thieves to find the magic number for such a lock?

    a bit misleading, given that all previous examples were premised on the thieves' having to try, on average, half of all possible combinations?

  68. Paul McG. says:

    Wired: Kill the Password – Why a String of Characters Can't Protect Us Anymore

    http://www.wired.com/gadgetlab/2012/11/ff-mat-honan-password-hacker/

  69. David says:

    @Sheriff
    Misleading? I suppose that's one way to look at it. Another way to look at it is that it fits perfectly into the immediate context within the discussion of symmetric encryption:

    The master proposes continuing the lesson after the raking is done.
    The student explains that his abacus is not large enough to continue.
    The master explains that the universe isn't large enough to continue.

    Seems easy enough to interpret in context. Why do you feel the need to wrench it out of its context?

  70. David says:

    To skirt any further confusion: if AES-256 is deemed sufficient for the government's most sensitive data, that's certainly more than adequate for routine non-official purposes. My saying 4096 was "a common key length" for strong crypto muddied the waters, since that's true (and overkill) for asymmetric pk calculations but 16x greater than the already huge 256 routinely used for encryption in situ.

    So @Daran is right after all that I confused the two unhelpfully in the comment he quoted. I should not have, since I know the diffie twixt 'em and use both at work. But that's what comes from being old and thinking/typing in haste. ;)

    Now where did I put my glasses….

  71. David says:

    As a reference point for passphrase adequacy, consider the Army standard:

    * Minimum of 14 characters, maximum of 25 characters
    * Contain 2 upper case characters,
    * Contain 2 lower case characters,
    * Contain 2 numbers,
    * Contain 2 special characters(!@#$%^*_-=+,.?),
    * Be case sensitive.

    Although the DoD, like industry and sci-fi, prefers multifactor biometric security over passphrases alone, this is the standard for those when they're used.

    Mixed case alphanumeric with half the set of visible specials (a proper subset of ascii printable) and a minimum length of 14 is equivalent to about 7 diceware words.

  72. Xenocles says:

    Most of the time in DoD systems I've had to change my password every 90 days or so, too.

    How do I remember a new 14-character password every three months? I don't, of course. I change the two special characters and move on.

    Rule 1 for security folks should be to always remember that people are lazy, often creatively so.

  73. eddie says:

    Just remember that the NSA (and other such similar agencies in other nations) don't care about the strength of your password or the strength of your encryption.

    They cheat.

    They subvert the systems you use and directly obtain your passphrases and/or plaintext. Assuming they care about you at all, that is.

    Even Bruce thinks so.

  74. David W says:

    Got a question about biometric security. From the perspective of the computer, isn't it still just a string of bits? How is 'scan your fingerprint' any different than 'use this password'? Except you can't change your fingerprints if someone gets a copy.

  75. David says:

    Biometric is normally used as just one factor in a multifactor authentication. There's value in being able to change a passphrase, but there's also value in being unable to change a passphrase!

  76. Steven H. says:

    @David W:

    Pretty much – bits is bits, after all.

    Real problem with fingerprint scanners is that they assume that the finger is still attached to the original owner when presented.

  77. Peter B says:

    4096 bits might be RSA secret or public key length. A symmetric cipher key of that length screams snake oil. ECC (Elliptic Curve Cryptography) keys are much shorter for the same security.

    I wrote code implementing NIST p-256 that performs the computationally expensive point multiply in about 25 milliseconds on a 168 MHz ARM Cortex-M4. Cell phone CPUs are faster but tend to run Java. I’m sure that less than 100 milliseconds computational overhead is enough for a cell phone to create an ephemeral key pair and perform a Diffie–Hellman key exchange. Then both parties create the same 256-bit AES key.

    The nice thing about ephemeral keys is that they are created as needed then discarded. No past messages can be decrypted no matter how much compute power is involved.

  78. Anonymous says:

    First thing I'd do is not have all the words from the same language. That increases your "girth" significantly.

    !

    I speak three languages and yet had never considered this!

  79. David says:

    @Peter B

    4096 bits might be RSA secret or public key length. A symmetric cipher key of that length screams snake oil.

    In context, 2^12 was meant to illustrate a keyspace that would require multiple lifetimes of multiple universes to explore. So yes–its selection was a deliberate illustration of overkill.

    "Screaming snake oil" might be a useful metaphor if we were discussing some particular product or claim. We're not.

    @Steven H.
    "I'm gonna need a hedge clipper…."

  80. James Pollock says:

    "Rule 1 for security folks should be to always remember that people are lazy, often creatively so."

    This is built into many security systems, but new creative "solutions" keep popping up.

    This is integrated into my presenation on security policy by noting this progression:

    We require people to change their passwords because if the password has become compromised, changing it limits the damage. Since we can't always tell that a password HAS been compromised, we require periodic changes of all users. (Note: Two groups routinely seek exemption from this requirement: high-level executives, and IT systems managers. In other words, exactly the people whose passwords would be most valuable if compromised. Watch for this.)

    OK, so you have a recalcitrant user who thinks "it's too hard to think up, and remember, a new password; I can remember the old one and want to keep using it.

    So, their old password is "password", we prompt them to change it, and they go through the "choose new password" procedure and choose the new password "password". Note the difference between the two questions… A) Did the user change their password? and B) Did the user's password change? Also note which one you're actually interested in.

    So, in addition to the policy of requiring periodic password changes, we also implement the policy of "password history", wherein we remember previously-used passwords and do not permit them to be chosen.

    But, NEVER underestimate the recalcitrance of users.
    "Fine" says the recalcitrant user. I can't pick "password" again. I can't pick any of the last 5 passwords I used. So, instead of picking "password", my new password will be "password1". (Admit it, most of you saw this coming.) But our recalcitrant user doesn't stop there. He immediately changes the password again, to "password2", then "password3", "password4", "password5", and then back to "password", which is no longer on the list of the 5 most-recently-used passwords.

    Setting a password history has not achieved our security goal of actually changing the user's password (except for the 20 minutes while we cycled through all the variations) AND wasted 20 minutes of that employees time that could have been spent on productive work.

    So we introduce a policy of "minimum password age", the amount of time you have to stick with a changed password before you can change it again.

    What about the people who just cycle through passwords by incrementing a digit? There isn't a technological solution to that one, we just have to educate them to pick a progression that isn't obvious.
    Instead of password1, password2, password3, etc. Do
    Password$, Password&, password@, Passwrd+, etc.

  81. James Pollock says:

    "Got a question about biometric security. From the perspective of the computer, isn't it still just a string of bits?"
    Yes. A VERY LONG string of bits.

    "How is 'scan your fingerprint' any different than 'use this password'?"
    It doesn't rely on (human) memory.

    "Except you can't change your fingerprints if someone gets a copy."
    You have ten to choose from. (YMMV).

    "Real problem with fingerprint scanners is that they assume that the finger is still attached to the original owner when presented."
    True, but not a problem for retina scans. Remove the eyeball from a functional blood source, and the blood vessels collapse and the scan fails.

  82. James Pollock says:

    "Screaming snake oil" might be a useful metaphor if we were discussing some particular product or claim. We're not."

    Meh. The product is "encryption" and indirectly, either "security" or "privacy", while the claim, implicitly, relates to "how much is enough" or "how strong does it have to be in order to be "secure") Yes, "secure" is a relative term, and there are various (implied) comparators.

    There's also the point that for short enough messages, the key length stops making any difference, and the number of cycles through the ciphering engine becomes the relevant factor for security (for block ciphers, anyway.)

  83. Cat G says:

    From my knowledge of my coworkers, if a password is to be cracked, ignore the crypto. Look for the small notebook each and every one of them has – it will contain the keys to the kingdom. Because having more than 10 different passwords that all must change on a rotating 45, 90, or 180 day cycle means most people just write it down.

    Alternately, check for post-it notes.

  84. anne mouse says:

    You don't need to chop off somebody's finger to get their print. Just dust their car's door handle, or a glass they've sipped from, or heck, the back of the laptop or phone that you're trying to log onto.

    Retinal patterns are quite a bit harder to get without being noticed, but there's no need to remove eyeballs either. The old $5 wrench method (see links to XKCD above) works just fine.

  85. malcom digest says:

    What about the people who just cycle through passwords by incrementing a digit? There isn't a technological solution to that one, we just have to educate them to pick a progression that isn't obvious.

    Seems fairly easy to prevent with a technological measure. Just require that each new password — in addition to meeting complexity requirements, minimum length, not include the user's name, username, etc. — be a minimum distance away from the prior password(s).

  86. George William Herbert says:

    eddie wrote:
    "Just remember that the NSA (and other such similar agencies in other nations) don't care about the strength of your password or the strength of your encryption.

    They cheat.

    They subvert the systems you use and directly obtain your passphrases and/or plaintext. Assuming they care about you at all, that is.

    Even Bruce thinks so."

    For the record, though he hasn't explicitly said why, he did both just admit he's procured a new, never internet connected system since the Snowden scandal broke, uses it for some particularly sensitive stuff with sneakernet (USB thumb drive data transport) connectivity only, and he recently changed his PGP public keys.

    One can hypothesize he thinks his previous ones were snooped, off a non-air-gapped system.

    If it ACTUALLY needs to be secure, don't connect it to the internet, and ideally don't even have any bluetooth or wifi or similar chips in the box.

  87. Trevor Pott says:

    @James Pollock thank you, General Alexander, for your considered and nuanced opinion on the value of cryptography and the proper position various citizens should assume with regards to the state.

    Snark regarding our fundamental disagreement with regards to the role of the state, I think your complaint regarding password reuse and iteration is slowly being addressed by the industry.

    "Similarity algorithms" are being introduced into many end-user facing applications. They will detect things like "Password1, Password2" and outright refuse the iteration. Some of them have knobs to turn regarding how similar the passwords are able to be.

    Password reuse across sites/authentication systems is a bigger issue. Some password managers can help here – squawking at you when it detects reuse and/or allowing password generation – but this is the area where I think we (collectively) haven't thought up a good ease-of-use solution.

    What worries me more than the above (especially in terms of citizens attempting to shield themselves from an intrusive state) is the propagation of American-owned online identity services as the creeping "de facto" centralised authentication services for the next generation of computing.

    Not only is your Facebook/Google/Twitter identity used to log you on to dozens (or hundreds) of other websites, but it is creeping into combined authentication on local systems as well. (See: Windows 8 and it's Microsoft ID integration.)

    That quite literally is giving the government a key to your house. (Or more accurately, giving the home security company a key to your house that the government "requisitions", orders the security company to keep silent about and never tells you they copied.)

    I have all sorts of issues with that practice simply because of many of the arguments you yourself put forth: such practices ultimately end up used as a justification for invasion of the citizen's "house" based on the (utterly bullshit) premise that citizens have "no reasonable expectation of privacy."

    The number of places where citizens have no "reasonable" expectation of privacy according to the American government is decreasing seemingly with the hour. A lot of the rationale used by them and American cloud providers (Google in particular, in a recent lawsuit is what I'm thinking of here) is "you have no reasonable expectation of privacy if you store or communicate your data to a third party (read: cloud e-mail provider/cloud storage/etc.)

    I cannot disagree with the precepts behind that tortured logic strongly or strenuously enough. It is morally wrong. It is ethically wrong. It is legally wrong in my country and in many others (especially in Europe.)

    For all that password hygiene might help keep the script kiddies out of your e-mail, I think that far more important is combating the slow creep of authentication system unification and the legal implications it will ultimately bring with it.

    If you must have unified authentication systems, I think the way to go isn't to hand the keys to your house over to the security company. It is to use a virtual directory service and perform the unification on our own terms.

    This comes back to ease of use, as you pointed out. Current VDS solutions are cumbersome and difficult. At the moment, there don't seem to be solutions to this problem. I hope that we can find them before the legal "reasonable" expectation of privacy is removed even from our own digital "homes".

    Even better would be a rationalization, simplification and alignment of the legal "reasonable expectation of privacy" with what actual citizens feel instead of what suits the state.

    Ah well, we plebians can dream. That isn't illegal…yet.

  88. xtmar says:

    @Trevor

    The other problem, which crypto doesn't address at all, is that most of the things people really care about, like bank details, are already very easy for the government requisition. If they can get your bank details, phone call history, and track your cell phone, what do most people have left that's worth keeping secret? Obviously, you want reasonably strong defenses against non-state actors, but it seems like state actors, at least in the US/EU, can more easily access most meaningful data via pressuring the other party.

    This is clearly wrong, but it's not a problem that infinitely strong crypto can solve.

  89. James Pollock says:

    "Seems fairly easy to prevent with a technological measure. Just require that each new password — in addition to meeting complexity requirements, minimum length, not include the user's name, username, etc. — be a minimum distance away from the prior password(s)."

    The problem is that not even the system knows what your password is, and if it does, it has very weak security. The problem (demonstrated thoroughly in early Unix development) with storing passwords on the computer is that if it is compromised, so are the passwords. So modern operating systems don't store passwords. They encrypt (or usually, hash) the passwords, and store THAT. Then, when someone offers a password, they encrypt (or hash) the offered password to see if the results are the same… if yes, the user has offered a correct password; if no, then no.
    But the whole point is that you can't go backwards… what the OS has stored cannot be used to recreate the original password… so the OS has no way to tell if the password you're changing to is substantially similar to the one you had before.

  90. Steven H. says:

    @Ken White:

    "So, like, if I use "bare ruined choirs where late the sweet birds sang" I'm sure it's not as reliable as "rutabaga XYZ Gargamel 3498 stupendous rottweiler odor pyrite amaretto." But — given the ability of computers to run every phrase in literature — how much worse is it?"

    Hmm, assuming, say, 1,000,000,000 phrases like your example (bare ruined choirs where late the sweet birds sang), we're talking the rough equivalent of a five character password for that one, given a dictionary of english phrases.
    Your second example (rutabaga XYZ Gargamel 3498 stupendous rottweiler odor pyrite amaretto) is 69 characters long, so it's moderately safe to say that the second passphrase is about 10^116 times better.

    Note that if you assume that that dictionar of common english phrases actually has 1 trillion entries, instead of a billion, the second phrase would only be about 10^112 times as good….

  91. Jason says:

    @Yendor,

    Bravo, well done!

  92. James Pollock says:

    "Password reuse across sites/authentication systems is a bigger issue. Some password managers can help here – squawking at you when it detects reuse and/or allowing password generation – but this is the area where I think we (collectively) haven't thought up a good ease-of-use solution."

    Password Managers violate the first rule of passwords, which is "never write your password down."

    "Not only is your Facebook/Google/Twitter identity used to log you on to dozens (or hundreds) of other websites"
    Mine isn't, because I don't have any of those.

    "The number of places where citizens have no "reasonable" expectation of privacy according to the American government is decreasing seemingly with the hour."
    Agreed. For us (U.S. citizens) firming up the barricades here is of primary importance. Note that this is a legal, not a technological, problem. The technology solution either A) does not exist, or B) may exist, but it takes a lot of work, eternal vigilance, and a significant amount of money to achieve.
    That's with regard to repulsing the efforts of nation-states and their combined intelligence and law-enforcement agencies.

    "I cannot disagree with the precepts behind that tortured logic strongly or strenuously enough."
    As a general precept, I tend to agree. If you share something with someone who has contractually agreed to keep it confidential, there IS a reasonable expectation of privacy, except that if you store your data on someone else's hardware, they retain rights to inspect and examine it to the extent required to manage and maintain their systems, and there are a handful of justifications to break confidentiality, none of which the NSA claims (although the FBI sometimes does).
    As applied to the Internet, however, not so much. The Internet is a public network. It has always been a public network. There is no privacy in public. There is no reasonable expectation of privacy in public.
    In analogous terms, your digital "house" includes your computer(s) and the networks that join them where you own the hardware. Once you transmit on the Internet, you've left your "house" and gone out onto the public "street".

  93. Nobody says:

    > if we could just stop calling them 'passwords' and start calling them 'passphrases', that might somehow subtly, magically, wishfulthinkingly start to change the cultural landscape that most users inhabit.

    I complained the other day about a password system that forbid one from using spaces. I am still at a loss as to why. Were they unable to master URL decoding or something? Password size restrictions always get me, too. I should not be willing to type anything that might hit your limit.

  94. Trevor Pott says:

    @James Pollock I've very glad my nation – and others out there – disagree with your opinion regarding expectation of privacy on the internet.

    As regards the issue of "eternal vigilance" being required to protect against encroachment by the state into our lives…

    …that's the price of freedom.

  95. GoSign says:

    A list of random words might be easiER to remember than a string of random letters and symbols like fj49(83hg%$#Q$gui4;8gh, but I still wouldn't trust myself to remember it if I'm only using it twice a year. So I'm back to keeping my passwords written on a slip of paper in my wallet.

    This is the main problem I'm trying to wrap my head around, as a complete novice to security stuff: It sounds like any method that would allow me to memorize the password reliably (like, using an actual meaningful phrase) also makes it some degree of "useless". But it feels really dumb to keep a physical list of all 80 of my passwords (of varying importance).

    I've even resorted to reusing the same password for some harmless things like forums (i.e. things that don't get ANY of my personal information but occasionally my name), and then having unique passwords for anything important like email or banking. But I get the impression that security experts would balk at even this?

  96. AlphaCentauri says:

    @Nobody — there are limitations regarding how much freedom they can allow users in choosing usernames and passwords. When you enter characters into a webform, the characters you enter are placed in memory, and that memory has a certain number of characters allotted for it. By entering character strings that violate that limit in some way, it is possible to hack into computers using a "SQL injection." See http://sqlzoo.net/hack/ also http://xkcd.com/327/ . Ideally the person doing the programming has anticipated all the tricks hackers might use and taken steps to prevent them from being successful. But major sites get hacked through SQL injection even so.

  97. James Pollock says:

    "As regards the issue of "eternal vigilance" being required to protect against encroachment by the state into our lives…
    …that's the price of freedom."

    And you've been successfully distracted from that vigilance into worrying about cryptography instead.

    "I've very glad my nation – and others out there – disagree with your opinion regarding expectation of privacy on the internet."

    Oh. Does your nation have a different Internet, one that was built by private entities for the purpose of NOT sharing information?

  98. Trevor Pott says:

    In a desperate attempt to return to some semblance of "on topic" here, I have a question for the floor:

    Bruce Schneier wrote about all of this recently.

    Since I started working with Snowden's documents, I have been using GPG, Silent Circle, Tails, OTR, TrueCrypt, BleachBit, and a few other things I'm not going to write about. There's an undocumented encryption feature in my Password Safe program from the command line); I've been using that as well.

    Here he lists several applications used to keep his system "clean" (BleachBit falls into that category) as well as some common applications for using encryption in the real world.

    I use some of these, but I also use some not on the list. Some of the people in this thread are pretty well versed in encryption; what are the recommendations regarding applications for doing hard crypto properly?

  99. Trevor Pott says:

    Sorry, I'm derpy and forgot to add the link to the article. It can be found here.

  100. James Pollock says:

    "what are the recommendations regarding applications for doing hard crypto properly?"

    The basic rule ALWAYS applies… only use encryption tools that you understand fully.

  101. Trevor Pott says:

    @AlphaCentauri sir, I am not a full time dev – so pardon any crashing ignorance as to the details of SQL injection defence difficulties – but do not most modern languages have something like this, this and this?

    Now, I admit to not having written many authentication systems myself, but I've typically relied on functions like the above to handle sanitisation. When combined with parameterized SQL shouldn't this be a fairly simple means of combating injection?

    This doesn't directly protect against XSS – you'd still need some additional output filtering, bound variables and the like – but it should allow you to use "all of UTF-8" your character set. Combined with things like password managers* you should be able to get massively long and complicated randomly generated passwords with huge character sets.

    *My password manager allows for three-factor auth (master password, fingerprint and generated code from phone) so I'm less worried about someone getting hold of it's treasure than I am $5 wrench and/or the biological fallibility of my own aging memory. (The failing memory limits the number of different passwords I can remember and their complexity.)

  102. En Passant says:

    I have no fears of government decrypting my messages, because I encrypt everything twice. With ROT13.

    That said, it's worth noting that Bruce Schneier just issued his new PGP public key and OTR key fingerprint

  103. Trevor Pott says:

    @En Passant I take it the ROT-13 snark is due to the OTR fingerprint inclusion?

  104. Castaigne says:

    *slides up to empty shed*
    *looks at very impressive lock*
    *waits till nightfall, goes to back of shed, and quietly cuts a hole in the wall to get in*
    *leaves a message saying "Sage says: Lock is only as good as the walls surrounding it."*

  105. The Man in the Mask says:

    fail2ban and friends are nice…but there are better ways.

    Instead of waiting to observe failed login attempts and banning the originating IP address(es), ban them ALL outright and only allow the subset that are necessary.

    Obviously you can't do this if you're Gmail. But if you're Joe's Fill Dirt in Omaha, then you're not going to see valid login attempts from Ghana, Peru, Romania or Vietnam. Given that, why would you even consider letting packets from the those networks get anywhere near port 22 on your system?

    So: you can either do this by (a) blocking everything and then using the network allocations from ipdeny.com to permit connections from your own country or thereabouts or (b) blocking everything and then using the history of observed successful logins to construct the permitted-networks list or (c) blocking countries one at a time by adding the allocations from ipdeny.com. (c) is the most work and least effective, but if people foolishly insist on a default-permit policy then it's better than nothing.

    In addition, the Spamhaus "DROP" list is quite useful: apply to perimeter routers and use it to bidirectionally drop packets from/to those networks. This achieves lossless compression of network traffic.

    And finally, use fail2ban or equivalent on the leftovers. It'll have a lot less work to do:

    My logs are silent
    Attackers cannot reach me
    Serene calm pleases

  106. James Pollock says:

    ""Sage says: Lock is only as good as the walls surrounding it."*

    First rule of security is that there is no security without physical security. This is why the CISSP exam has a whole section on it covering fences and mantraps. (or at least, it did, ten years ago.)

  107. Trevor Pott says:

    @The Man in the Mask you might consider adding malwaredomains.com to your data sources, if you haven't already.

  108. Daran says:

    @David

    I know the diffie twixt 'em

    What the hell, man?

  109. Daran says:

    @Sheriff Fatman

    @David: I think Daran's point is that with asymmetric keys, a brute-force attack doesn't have to try every combination from 0 to 2^4096, as only a (relatively) small proportion of them are valid (something to do with products of pairs of large prime numbers, if I understand correctly).

    Not really. My point was precisely what David eventually conceded: that when discussing appropriate key-lenghs it is important to distinguish between symmetric and asymmetric ciphers.

    You are correct that the large majority of bit strings will not correspond to valid moduli for RSA crypto, but even taking this into account, bruteforcing a 512-bit, or even a 256-bit modulus would be well beyond the ability of any human agency. The reason larger moduli are required is because there are factoring algorithms far more efficient than brute force.

    I would expect to be able to factor a 512-bit modulus in no more than a few weeks on a single modern high-end desktop, with no great experience or knowledge of how the algorithms work, using freely available software.

  110. En Passant says:

    Trevor Pott wrote Sep 9, 2013 @6:23 pm:

    @En Passant I take it the ROT-13 snark is due to the OTR fingerprint inclusion?

    Really just a little humor on the general Stasi slimeball snooping, storing and decryption flap. Fsck 'em if they can't take a joke.

  111. David says:

    @Sheriff Fatman
    After reviewing the thread, before it was taken over by slapfights and ego-flexing, I came to the conclusion that Daran was faulting me for having said of 4096 that "that's a common key length for what people have been calling 'strong crypto'." His reason for objecting was that we had been discussing conventional symmetric encryption, a domain in which that is by no means a common key length.

    In other words, he faulted me for being sloppy and importing an observation about asymmetric keys in the wild into a discussion where mention of them was unhelpful. He was right to do so.

    My goal was merely to emphasize to Ken (and to one commenter from a different thread), my chief audience for this post and that comment, that creating a computationally infeasible scenario is not only well within reach but commonplace.

  112. James Pollock says:

    "My goal was merely to emphasize to Ken (and to one commenter from a different thread), my chief audience for this post and that comment, that creating a computationally infeasible scenario is not only well within reach but commonplace."

    That's misleading, too, since we've only been talking about one type of attack (one and half, a dictionary attack is really just a shortcut on a brute-force attack).

    However, if there's an undisclosed, computationally-cheap method for factoring large primes, all bets are off. You don't think this has occurred, and I don't think this has occurred, but neither of us can eliminate the possibility that it has occurred. If the encryption algorithm can be defeated computationally easy, then the DH key exchange is compromomised, and so is most symmetric encryption.

  113. James Pollock says:

    "After reviewing the thread, before it was taken over by slapfights and ego-flexing"

    You titled it "size matters", and you're surprised a slapfight broke out?

  114. Sheriff Fatman says:

    @David

    I apologise unreservedly if anything I said appeared to be ego-flexing or contributing to a slapfight. Such was not my intention.

    Thank you for an entertaining and informative post, and sorry for being a picky ass about one very small part of it.

  115. Sheriff Fatman says:

    P.S. As an act of dogeza, I am forthwith changing my moniker on this blog to "Sheriff Fathead".

  116. Trevor Pott says:

    @Sherrif Fatman I don't think you're at fault, sirrah. Your questions were valid. I believe David is (most likely entirely correctly) taking a swipe at myself (and possibly one or two others) for generic boorishness.

    I accept entirely that I am a giant douchecanoe.

  117. Daran says:

    However, if there's an undisclosed, computationally-cheap method for factoring large primes, all bets are off.

    RSA moduli are semi-primes, not primes. (David has made the same slip-up.) Obviously factoring a prime is impossible.

    Incidentally DH isn't vulnerable to a factorisation attack. DH is, however, trivially vulnerable to a man-in-the-middle attack. Since RSA is commonly used as a wrapper around the DH to prevent this, breaking the RSA would render it vulnerable.

  118. Dan Weber says:

    (Long response, I'm going to repeat a few things others have said in here, sorry.)

    Memorizing 12 random characters on one pass, even if they are presented slowly, is impossible for the average person

    Yes, but people can learn very complex passwords made out of entirely random characters after a few days practice typing it in. Muscle memory is an amazing thing, and you will come up with your own mnemonics.

    Unfortunately most security systems don't warn the user ahead of time about setting up passwords and pressure the user into thinking of one on the spot, not telling him until afterwards what silly restrictions are on it.

    There is no clear winner whether that random blob is better than "correct horse battery staple": part of it depends on what you are trying to protect against.

    password checker sites

    This is my favorite: http://www.ismytwitterpasswordsecure.com/

    writing down passwords

    Just yesterday I wrote a password down on a sticky-note on my desk at work. I'm not trying to protect it from my coworkers — in fact, they deserve to know it. It's for an account that represents the company, not me.

    lockout systems

    There are lots of ways for your password to be compromised. Someone could steal a copy of the encrypted password database. Part of your security depends on how well the company running the site has encrypted those passwords.

    If they are stored plaintext, it doesn't matter, but you weren't the weakest link anyway.

    If they are stored with MD5(), then your complexity matters. Any password made of 7 or less printable characters is trivially crackable. You need to not be low-hanging fruit.

    If they used proper practices like scrypt, then you just need to make sure your password isn't completely trivial. A password of "qwerty" will never ever be secure no matter how well the other side of the system is engineered.

    Your password might be the weak link — in which case, don't let it be.

    I suggest that the most important password/passphrase most of us have is that for our email account.

    Yes yes yes. Turn on two-factor auth for your email account, right now. SMS is not perfectly secure, but the big email hacks have always been against users who don't have it.

    Google Authenticator is also pretty good if you have a smart phone you trust.

  119. Clark says:

    You titled it "size matters", and you're surprised a slapfight broke out?

    Snort.

  120. David says:

    @Daran

    However, if there's an undisclosed, computationally-cheap method for factoring large primes, all bets are off.

    RSA moduli are semi-primes, not primes. (David has made the same slip-up.) Obviously factoring a prime is impossible.

    At no time have I ever said (or thought) that "factoring a prime" is a meaningful concept. Meanwhile, referencing the coprimes/semi-primes (not necessarily prime, of course) as "prime moduli" is a perfectly common idiom in casual discussions.

    Correct if you have something to contribute, but try to apply to yourself the same expectation of precision that you apply to others, and perhaps more importantly try to calibrate those expectations (of yourself in particular) to the forum of discourse, which is obviously casual.

  121. JTM says:

    @ David

    It's always nice seeing one of your posts here at Popehat. Your writing is both entertaining and informative, and I find myself learning about subjects that I never would have sought out on my own. A gold star to you for making the internet a better place. Thanks. :)

  122. barry says:

    @xtmar

    has anyone made progress on reasonably easy to generate one time pads?

    Having established 2^256 is a pretty big number (~10^77), there are about that many ways too choose 17 books from the Project Gutenberg collection of 42,000 books (my approximations are very approximate).

    By overlaying (cyclic/modulo addition) your 17 books, first letter for first letter etc, you get a pretty unique one time pad. And as long as your message is shorter than the shortest book in the selection, nothing has to be repeated.

    An advantage is you don't have to transmit the whole pad, the key is just the book list index of the books chosen to make the pad (16 bits per book). A downside is that people can see which books you download, so you should get and store the whole 50 GB or whatever it is.

    very few people have their own random number generators.

    Everyone with a webbrowser with javascript has Math.random(), but that's only pseudorandom. For true random you can't beat the Hitchhikers Guide to the Galaxy's 'wire in a good hot cup of tea' method (or point your TV antenna at the sun).

    One time pads are just not 'sexy' cryptography. As David says "not really any progress to make". They're simple and they work, and if NSA ever claimed to know how to decrypt them without the key, nobody would believe them.

  123. David says:

    @Sheriff Fatman, I've noticed nothing untoward in your behavior. Some adults, however, try so hard to demonstrate intelligence|maturity|capability|tenacity|mastery that they end up seeming very much like children. Insecurity? Fix-the-worldism? A shattered capacity to grasp how they appear to others? I'm sure there's some pop-psychological diagnosis that nails it for those who care just enough to wonder but not enough to probe.

    That sort of social behavior degrades the quality of the forum for everyone, and it suggests that the value system of such offenders is overskewed toward their interest in attention or self-promotion at the expense of community. It seems needy.

    Would the slapfighters act that way in person if they were invited to X? The likelihood of invitation varies inversely with that risk, I'd guess. But "unseemly" is an old-fashioned notion, and not too many care for it anymore. The world of comment sections ist montelspringert geworden.

    It's an easy misbehavior to fix from within. Here's the heuristic query: "Am I contributing in a way that makes knowledge acceptable, or am I making so much (or such cacophonous) noise that it drowns out my positive contribution?" Most find their way through that particular developmental station by the end of secondary school, I'd guess. But maybe I'm too optimistic.

  124. David says:

    Interesting sources of randomness:
    Random.org
    Hotbits at fourmilab

  125. David says:

    @JTM, Thanks for your kind remarks.

  126. David says:

    BTW, @Trevor Pott allows himself to be dragged into slapfights, but at least he maintains good humor and an admirable sense of his own foibles. In this respect, he's a lot like Clark– though he's probably not a Catholic anarchist survivalist. ;)

  127. barry says:

    Years ago when I first learned people were buying and selling books of lists of random numbers, it did cause a slight rethink on the nature of the world.

  128. David says:

    However, if there's an undisclosed, computationally-cheap method for factoring large primes, all bets are off.

    On reflection, that's actually a deep metaphysical insight!

  129. David says:

    @barry
    That has such people in't!

  130. Dan Weber says:

    Most computers these days have very good sources of random numbers based on high-entropy pools. Not enough to generate enough one-time-pads for video calls, but definitely enough for your tweets and probably enough for your emails.

    One-time pads should be generated straight from the entropy source and used exactly once; that is what makes them unbreakable.

  131. The Man in the Mask says:

    @Trevor

    I'm aware of malwaredomains, and use it for some other purposes (particularly SMTP blocking) but I don't often use it to block SSH because it adds (some) complexity for very little reward: nearly everything it lists is already listed by something else that I'm using.

    I should probably also mention that sometimes i shift ssh services to a different port; sometimes I employ port-knocking; sometimes i impose time-of-day restrictions; and so on. None of these are panaceas: sufficient patient and clever and knowledgeable adversaries will observe and counter each. But it'll take them time and effort: maybe they'll go away. Or maybe they'll make enough noise that I'll notice them before they make substantial progress. Or maybe fail2ban/equivalent will catch them.

    No guarantees of course: just an attempt to stack the deck in my favor.

  132. David says:

    Security by obscurity is a dubious method.

  133. James Pollock says:

    "RSA moduli are semi-primes, not primes. (David has made the same slip-up.) Obviously factoring a prime is impossible."

    Yes, I meant to say "products of primes". Please proceed as if I had.

  134. Rob says:

    "How is 'scan your fingerprint' any different than 'use this password'?"
    It doesn't rely on (human) memory.

    Neither do my passwords, I don't know them. They're in a password vault.

  135. AlphaCentauri says:

    @Trevor Pott — I wasn't trying to give the last word on SQL injection. I was just trying to explain why the people that created password log in fields in the first place might have been motivated to limit size/complexity of passwords. Despite the trick being well known, it still seems to succeed against sites that should be better protected against that:
    http://krebsonsecurity.com/2013/07/hacker-ring-stole-160-million-credit-cards/

  136. Richard Gadsden says:

    If you want a quick test for your proposed passphrase, go into porn-mode on your web browser, go to google, stick in the phrase in quotes and if you get any results, then don't use it.

  137. En Passant says:

    David wrote Sep 10, 2013 @6:58 am

    @barry
    That has such people in't!

    I blame the Chemical Rubber Company.

  138. Steven H. says:

    @Richard Gadsden:

    "If you want a quick test for your proposed passphrase, go into porn-mode on your web browser, go to google, stick in the phrase in quotes and if you get any results, then don't use it."

    If you do that, Google has your proposed passphrase in its database.

    If you must use something that sounds quotable, just misspell a word or two, or replace them with homonyms.

  139. CJK Fossman says:

    @David

    This article has spawned a fantastic conversation.

    Thank you for that.

  140. David says:

    Don't thank me. Thank Shíjú!

  141. Trevor Pott says:

    @The Man In The Mask I'm usually pretty good with my servers – I'm a fan of non-traditional ports for servers that's aren't doing SFTP/SCP as a service, use .htaccess religiously, block things at the edge router with massive blocklists, etc.

    That said, my experience with Linux can never be complete and my lockdown of systems could always use another eye. If you've the time, I'd appreciate your taking a boo at this and adding any tools you think are good practice to add to my standard deployments.

    Next month I'm putting together (finally!) a followup that lists all the favourite tools of commenters from several communities that I've worked with. I'd love to add any items you come up with.

  142. Trevor Pott says:

    @AlphaCEntauri sorry for my miscommunication sir; I was not trying to insinuate you were "giving the final word" on the subject…I was genuinely curious if I had missed something. Your knowledge in the area seems greater than my own and the entire concept seems topic-relevant.

  143. Trevor Pott says:

    @David I do let myself get dragged into slap fights. It's a personal failing I wish I could exorcise. One can be entirely self -aware of his or her own foibles but still lack the self control/restrains/individual discipline to control them all the time. Sadly for folks like you "getting all rowdy about things I am passionate about on the internet" is my personal mental pressure valve. Keeps my sane, even if it does give Ken all of the sads. (Sorry Ken.)

    I'm bizarrely okay with being viewed as a giant douchecanoe; if the excellent writing of people like the Popehat bloggers has taught us all anything, it is that the world is filled with people who will disagree vehemently with just about anything any of us say.

    Most of the time, I'm okay with letting these people just be wrong on the internet. Other times, however…I feel they have to be challenged. I feel I need to poke the beehive a little so that they say things which give other readers of the thread a glimpse at how deep the rabbit hole of their reasoning goes.

    Some times (most times?) this ends up making me look the fool as much as "the other guy." I'm okay with that. If even one person reading my fevered ratings stops for a second to consider the wider implications of the world Americans are building right now, through their own apathy, then any shame I heap upon myself via textual douchecanoeing is worth it.

    To a certain extent, I'm quite satisfied to have an apologist to play against. FUD and glory and America Uber Alles might just make a few people think about exactly who the people are that they are handing this kind of power over to.

    How to use encryption, why it works…that's all good for protecting yourself from the bad guys. But you are irrelevant and so I am; each one of us – even whole populations of us – matter nothing at all. What matters – at least to me – is fairly simple: that we leave this world a better place than we found it.

    If the world we hand to our successors is the Orwellian panopticon og General Alexander's wet dreams, we will have failed at that one, critical task. I am not okay with being the frog. Fortunately, both threads I've participated in have given readers a multitude of views to consider. With luck, most who read them will choose not to be the frog either.

    It's the right to make that choice – and to discuss the decision openly – that I value. If my only means of expression makes me irritating in the process, I apologise to all. Like a lot of people, it looks like I could use some civilian oversight of my own. ;)

  144. barry says:

    I blame the Chemical Rubber Company.

    I originally thought it was sociologists. But I just checked "A Small Book of Random Numbers" on Amazon. They say people who viewed that item also viewed "How to Avoid Huge Ships" by John W Trimmer.
    So it's pirates.

  145. CJK Fossman says:

    @Trevor,

    In my opinion, the problem with SQL injection is not the absence of prevention methods. The problem is failure to use the methods, for whatever reason.

    For that reason SQL injection will be with us as long as websites use SQL.

  146. CJK Fossman says:

    @Trevor Pott

    both threads I've participated in have given readers a multitude of views to consider.

    What's the other thread?

  147. Trevor Pott says:

    @CJK Fossman This one.

  148. David says:

    Who drew your gravatar?

  149. Trevor Pott says:

    The art guy for The Register; all the writers got one made. It was part of the creation of the author pages a little while ago.

  150. En Passant says:

    barry wrote Sep 10, 2013 @1:49 pm:

    I originally thought it was sociologists. But I just checked "A Small Book of Random Numbers" on Amazon. They say people who viewed that item also viewed "How to Avoid Huge Ships" by John W Trimmer.
    So it's pirates.

    I'm more inclined to think it's a Fifth Column of deviated preverts cleverly disguising themselves as normal.

    [Reaches into toe of Persian slipper; withdraws pinch of tobacco and packs calabash pipe; lights same and blows some smoke rings.]

    One must never disregard the obvious. The other title accompanying the tome you found was A Million Random Digits with 100,000 Normal Deviates.

  151. Clark says:

    @barry

    has anyone made progress on reasonably easy to generate one time pads?

    Having established 2^256 is a pretty big number (~10^77), there are about that many ways too choose 17 books from the Project Gutenberg collection of 42,000 books (my approximations are very approximate).

    Barry,

    You're thinking logically, but you're making a mistake.

    2^256 is a big number, but only in certain contexts.

    If you look into DES and other CBC (cypher block chaining) algorithms, you'll see that they use several very specific techniques to take a small key (and 256 bits is a very small key in this day and age) and expand it into a bunch of different only-very-weakly-correlated keys and do other tricks to make sure that there are not correlations between one part of the cyphertext and another part.

    E.g.:

    * DES in CBC mode divides the incoming plaintext into blocks
    * DES generates a per-block key and encrypts each block differently
    * DES uses input from one block to help generate the key for the next block
    * DES uses S-boxes to permute the bits

    By overlaying (cyclic/modulo addition) your 17 books, first letter for first letter etc, you get a pretty unique one time pad. And as long as your message is shorter than the shortest book in the selection, nothing has to be repeated.

    …with a grand total of 256 bits of key entropy.

    I'm not 100% sure that your scheme is susceptible to differential cryptanalysis (does your one time pad text repeat?), but it's certainly susceptible to brute force attacks.

    E.g.

    * Note that because your 17 books are all written in English they've got unequal (and well known) letter distribution. 17 English texts XOR-ed together are not as random as 17 random bitstreams XOR-ed together.
    * While it's trivial in theory to build and store a rainbow table (a precomputed table of key expansions) of the first 1,024 characters of the generated keytext, current technology makes this infeasible (it would involve covering the Earth in harddrives about one mile deep)…but if we can reduce the keyspace by a decent fraction, it becomes possible.

    And so on.

    As with the hobby of critiquing perpetual motion machines, the core isn't in tearing apart any particular implementation, but in applying the core concept of entropy (or, in this case, limited informational entropy).

    TANSTAAFL.

    If you want secure communication using one-time pads, your best bet is the other idea you mentioned in passing: use thermal noise to generate true entropy, burn two CD-ROMS with the data, and then use.

  152. David says:

    Barry,

    You're thinking logically, but you're making a mistake.

    2^256 is a big number, but only in certain contexts.

    If you look into DES and other CBC (cypher block chaining) algorithms, you'll see that they use several very specific techniques to take a small key (and 256 bits is a very small key in this day and age) and expand it into a bunch of different only-very-weakly-correlated keys and do other tricks to make sure that there are not correlations between one part of the cyphertext and another part.

    (emphasis added)

    Now Clark is infected. 2^256 is approximately 0.0012 * the number of atoms in the visible universe.

    DES has been superseded and is now a historical/academic artifact. Its de facto key length, now considered insecure, was 2^56. That's only around 72 quadrillion.

    The difference between a number with 17 decimal digits and a number with 78 decimal digits is not negligible.

  153. barry says:

    2^256 is a big number, but only in certain contexts.

    Its a lot of different pads of half a million or so bytes in length each.

    I'm not 100% sure that your scheme is susceptible to differential cryptanalysis

    If you change the 5th character of the text going in, only the 5th character of the text coming out is changed.. it's no help.

    but it's certainly susceptible to brute force attacks.

    That's the bit I said I wouldn't believe.

    17 English texts XOR-ed together are not as random as 17 random bitstreams XOR-ed together.

    17 random bitstreams XOR-ed together should be exactly as random as 1 random bitstream. My proposition is that 17 English texts XOR-ed together is so close to random that there's no way of telling if it is or not. All the patterns and predictability of english get washed out. If you play enough different CD's at the same time at the same volume, you get noise. Even the beat will be washed out.

    The 'no free lunch' part of one time pads is that the sender has to send a secret key at least as long as the message itself, that's where the inconvenience and insecurity are. My suggestion was a compromise where a short 256 bit entropy key selects the larger 2 million bit entropy key. (You can do that, it's not perpetual motion, information entropy isn't the same stuff as physical entropy)

    both sender and receiver already have a large (50GB) known library from which a short key can generate/select billions of times more 'random' pads than there are nucleons in the sun. (if you want more, use a 512 bit key to choose 34 books).

    The 'randomness' of what RSA type algorithms do is more suspect. There's a lot of theoretically reversible processes going on, with the only variation coming from the original short seed/key itself. So whenever someone claims to have cracked one, nobody is really surprised.

  154. barry says:

    En Passant

    I'm more inclined to think it's a Fifth Column of deviated preverts

    It could be the Stonecutters. I've never seen them drink water either.

  155. En Passant says:

    barry wrote Sep 11, 2013 @11:28 am:

    The 'no free lunch' part of one time pads is that the sender has to send a secret key at least as long as the message itself, that's where the inconvenience and insecurity are. My suggestion was a compromise where a short 256 bit entropy key selects the larger 2 million bit entropy key. (You can do that, it's not perpetual motion, information entropy isn't the same stuff as physical entropy)

    I'm relatively naive about current cryptography, although I have just enough math chops to labor through most descriptions and arguments.

    If I understand you correctly, you're saying that Sally and Bill share a small secret from which they can construct a big secret. The small secret is the particular collection of otherwise publicly available data from which they will generate a big one time pad (OTP) for the message Sally will send to Bill.

    So, Sally sends Bill a message, encrypted less securely than an OTP, which message instructs Bill how to construct the OTP for the message. Call it the instructional message.

    Then, Sally will send Bill the actual message, to be decrypted using the properly constructed OTP.

    If Spy intercepts and decrypts the initial instructional message, thereby learning how to construct the OTP, he still does not know the collection of data from which to construct the OTP.

    This leaves Spy, best case, to find statistical biases in the actual message, which might lead to deriving the OTP, or pieces of it, if the constructed OTP is not actually a set of uniformly randomly distributed characters. Worst case, Spy must try every possible OTP in the universe to decrypt the message.

    Clark appears to argue that any OTP constructed from low entropy (ie: less than randomly distributed characters) sources, will itself be lower entropy and therefore reveal some statistical information about the OTP in the encrypted message.

    You appear to argue that some methods of constructing an OTP from low entropy sources (say, deriving each OTP character by summing modulo character size the positionally corresponding character from large numbers of very low entropy plain language public sources like literary works or statute books) can yield a high entropy OTP.

    Therefore, you say that Sally and Bill can share the long OTP without necessity of communicating the OTP itself through an actually secure channel. Sally and Bill need only share the secret of knowing exactly the specific open literature from which the OTP is to be constructed.

    If I understand the arguments correctly, my gut inclination is to agree with you. But my gut also tells me that I could be very wrong. I would like to see a clearly stated argument against the method.

    Such an argument might entail more mathematics than can be easily expressed in this forum though. Do any readily available scholarly or technical articles that address the subject?

    Damn, that was long. I hope it wasn't equally stupid.

  156. Daran says:

    @David:

    At no time have I ever said (or thought) that "factoring a prime" is a meaningful concept.

    I never thought otherwise. My use of the term "slip-up" was intended to convey – clearly it failed – that you had simply chosen the wrong word, not that you didn't understand the concepts.

    My correction wasn't for your benefit. It was for the benefit any of the target audience of your post who might still be reading. People who need exponential growth explained in great detail are likely to be people who aren't too clear on such concepts as "prime" and "factorisation".

    Meanwhile, referencing the coprimes/semi-primes (not necessarily prime, of course) as "prime moduli" is a perfectly common idiom in casual discussions.

    Now you've opened a can of worms. I will give two replies, one aimed at the less-technical reader, and one for you.

    For the less-technical reader: yes they do need to be prime. If they're not, the security of the crypto is severely compromised.

    For you: technically you are correct. You could create, for example, a 2048-bit modulus by multiplying together three primes: a, b, and c each about one third the length – about 682. You could then generate the private key from the two factors a*b and c, one of which is not prime. Morover, factorising a modulus so created would still be beyond human capacity using known algorithms.

    But consider how crypto software actually works. To create a 2048 bit modulus, it generates two random 'prime' numbers each 1024 bits long, and multiples them together. Suppose one of these 'primes' should turn out not to be so. Then it is the product of at least two smaller prime numbers, the smallest of which is at most 512 bits, and possibly much, much smaller.

    If the modulus has a small factor, then becomes vulnerable to the Elliptic Curve Method of factoring. The current record for the largest factor found by this method is 79 digits (about 253 bits). Bear in mind that these records are set by academics and in some cases hobbyists with a few tens or hundreds of machines at their disposal. The NSA with hundreds of thousands or millions of machines doesn't publish its factorisation records.

    So yes, in practice, the modulus needs to be the product of two primes. How the software verifies that the factors really are prime would involve a discussion about strong probable primes, stuff which I suspect you already know, and which your less-technical readers don"t need to worry about.

    Correct if you have something to contribute, but try to apply to yourself the same expectation of precision that you apply to others, and perhaps more importantly try to calibrate those expectations (of yourself in particular) to the forum of discourse, which is obviously casual.

    If you feel my own contributions have lacked precision, particularly in ways which might mislead or confuse your target audience, feel free to point out them out.

  157. Daran says:

    Could a moderator please fix my screwed-up markup in the above comment. Thanks.

    I said:

    technically you are correct. You could create, for example, a 2048-bit modulus by multiplying together three primes: a, b, and c each about one third the length – about 682. You could then generate the private key from the two factors a*b and c, one of which is not prime. Morover, factorising a modulus so created would still be beyond human capacity using known algorithms.

    The unstated assumption here is that in order to break RSA crypto, you need to factorise the modulus. It's true that this is the best known attack on RSA, but this statement is always made in a context where a semi-prime modulus is assumed. I don't know whether or not it hold good when the modulus has three or more factors.

    It's all academic anyway. Any crypto software worth its salt will ensure that it is practically certain that the two factors are prime.

  158. James Pollock says:

    "So, Sally sends Bill a message, encrypted less securely than an OTP, which message instructs Bill how to construct the OTP for the message. Call it the instructional message."

    The typical assumption is that the otp is exchanged out-of-band. If a otp is captured, then all communication using that otp is compromised.

    The key aspect of a otp is that part of the code is not re-used; the other is that the method of changing the key not be predictable. That's where most codes break down… they use something that IS predictable. Alternatively, the source of the OTP repeats in a pattern, meaning that the code re-uses the same data, making the encryption based on it subject to analysis.
    That's what causes the breakdown of WEP encryption; the system uses a predictable algorithm and re-uses keys; if it used an unpredictable algorithm and didn't re-use keys, it would not be trivial to break.

  159. Daran says:

    My proposition is that 17 English texts XOR-ed together is so close to random that there's no way of telling if it is or not. All the patterns and predictability of english get washed out. If you play enough different CD's at the same time at the same volume, you get noise. Even the beat will be washed out.

    I'm assuming that you're not proposing to do any compression, other than stripping out the boilerplate added by Project Gutenberg to all its texts. – each text then to be treated as a binary file. You then bitwise XOR your selected files together to create your keystream.

    According to this paper, English has no more than about 1.1 bit of entropy per character. If the texts are ASCII encoded, that's 1.1 bits per byte. Seventeen texts will give you about 19 bits or less per byte It's not obviously insecure

    I'm not convinced that it's secure, though. Schneier's law springs to mind.

    Here's how I'd go about attacking your scheme, assuming I had sufficient cyphertext, the full corpus of texts and full knowledge of the algorithm. In other words, if I had the key, I could decrypt the message as easily as its intended recipient. (These are standard assumptions in cryptography. If your system cannot stand up to an attack under these circumstances, then it's no good.)

    First, I'd make a list of the most common three character strings in English. "These will include words like "the", but also words with spaces such as " a ", "to " and " to". Suppose I chose N of them. Then there are N^17 ways to choose a combination of 17 of them, but given any such combination, any permutation will produce the same fragment of keytext, so there are really only N^17/17! distinct possibilities. Finally I can discard the least likely combinations (where all or most of the elements come from low down on my original list).

    Starting with the most common combination, probably " a " repeated 17 times, I'd apply this key fragment, byte-by-byte to the cyphertext, Whenever the result is itself a common sequence of characters in English, I'd scan the corpus of texts, looking for any that have in the right place, any of the character sequences that went into making the key fragment. If I can find a set of seventeen texts covering all the sequences that made the key-fragment, then I'll test to see if they decrypt the entire message.

    If not, move onto the next most common key-fragment. Rinse and repeat.

    Are you certain that I wouldn't be able to decrypt your message? Would you literally bet your life on it?

    The 'no free lunch' part of one time pads is that the sender has to send a secret key at least as long as the message itself, that's where the inconvenience and insecurity are. My suggestion was a compromise where a short 256 bit entropy key selects the larger 2 million bit entropy key. (You can do that, it's not perpetual motion, information entropy isn't the same stuff as physical entropy)

    You still only have 256 bits of entropy. An attacker with sufficient cypher-text and lacking only the 256-bit key could brute-force your system using no more than 2^256 operations – much more than is feasible in the physical universe, but still theoretically possible. A true one-time pad is theoretically uncrackable, meaning that an attacker with infinite resources cannot break it. Your scheme is not a one-time pad.

    The 'randomness' of what RSA type algorithms do is more suspect. There's a lot of theoretically reversible processes going on, with the only variation coming from the original short seed/key itself. So whenever someone claims to have cracked one, nobody is really surprised.

    The reason nobody is really surprised when an RSA modulus is cracked, is that nobody has cracked one in a surprising way. Reputable crypo algorithms are rarely broken in practice. Real-world cracks fall into one of three categories: brute-force attacks against a too-small keyspace (DES, for example), attacks against faulty implementation, or the crypto was snake-oil in the first place.

  160. barry says:

    @En Passant

    I'm relatively naive about current cryptography

    There's not much related to new or current cryptography in this. The only 'new' thing is that a lot of people now have access to and the ability to store the same 50GB block of data without having to actually exchange it that makes it possible.

    And yep, you got it mostly right. Exactly right for how I first considered it, but then I changed my mind about the key being secret. There's two ways the key sharing can go.

    *1. Both Bill and Sally have the same set of books, shuffled to be in the same order as each other. Sally sends Bill a list of 17 book indexes eg [book34, book724, book9786...]. Bill cyclically adds them make the same big key Sally used to encrypt the message.

    *2. Both Bill and Sally have the same set of books in the same order as everyone else has them. The book order is not a secret anymore, everyone has them in the same order.

    In case #1 the short key (the 17 book index list) does not have to be a secret, but the actual book order does. Case #2 is the opposite. Sally and Bill don't have to share a secret book order, thats public and standard. But the short key for which books make up each pad must now be secret each time.

    But heck, now that you mention it, why not have both? The disadvantage to #1 is that if Spy somehow finds out their common library book order, Bill and Sally could go on exchanging messages not realizing their short keys now need to be secret. So now we have a 'medium key' as well (210 Kbits), for people who exchange messages regularly, unique to them. This medium key can specify how all 42,000 books are shuffled. If they can exchange that every so often, the security of the small key is not as crucial.

    I think Clarks objection is from his mistake about entropy. But something David said about entropy worried me more, that in english text, each character adds about 1 bit, and I was sure it was about 4.5 bits. Entropy is a slippery concept, and it turns out both are right depending how you're counting 'predictibility'. Stochastically, taking into accout the previous letters, you only have to choose between 2 letters (1 bit) on average to get the next letter. Sometimes a few more, and often none ('u' comes after 'q', and you know when to put the 'g' on the end of 'in' etc). But for predicting just by letter frequency (more likely an 'e' than a 'j' etc), using the -p*log(p) formula 4.5 is right. Stochastic entropy washes out first, after the second book.

    So I got experimental evidence rather than a mathematical proof to back up my gut-instinct mathematics. I downloaded 4 books from Gutenberg and measured the entropy of the first 1,000 characters of each, adding them together (mod256) and got this;

    Av entropy for 1 text = 4.48 bits/char
    Av entropy for 2 texts= 6.45 bits/char
    Av entropy for 3 texts= 7.16 bits/char
    Av entropy for 4 texts= 7.49 bits/char

    Since mod256 gives 8-bit characters, when/if the entropy gets to 8 bits per character, its as random as pure noise.

    I'll chop up the rest of those 4 books into 13 more 1000 char texts, and hopefully tell you how close it gets to the perfect 8.00 tomorrow.

  161. barry says:

    @Daran

    Seventeen texts will give you about 19 bits or less per byte

    Is that theoretically possible? I'm using ascii 8 bit bytes
    But thanks for the comment, there's a few things I need to think about there.
    Re Schneier's law. My approach would be to test subtracting each of the 42,000 books, then actually subtracing the one that reduced the entropy of the cyphertext the most_ repeat 17 times and see what can be done with what's left.

  162. David says:

    @barry, I mentioned here and Daran mentioned here that a character in English offers about 1 bit of entropy. That seems lowballed to you, but check out the paper to which Daran links for a rationale.

  163. Daran says:

    @barry

    Seventeen texts will give you about 19 bits or less per byte

    Is that theoretically possible? I'm using ascii 8 bit bytes

    Of course an 8-bit byte can have no more than 8 bits of entropy. What I meant was that the input to each byte of the purported OTP key had 19 bits or less of entropy.

  164. Peter B says:

    It is my understanding that ECC (Elliptic Curve Cryptography) has an advantage over RSA. With adequate key length both public key methods are adequately secure, based on whatever definition of “adequate” is right for the specific task. But ECC accomplishes the same level of security as RSA with shorter keys.

    The following is a short high level look at ECC.

    A known prime is involved. The formula is y^2 = x^3 + ax +b (mod p). NIST publishes several along with a standard starting point, G. They give a, b, p (the prime) the (x, y) for G and the number of points in the curve. Because some of the internal math involves numbers with twice the number of bits as p, the prime is chosen to make the mod p operation efficient in 32-bit hardware.

    Alice and Bob can share a secret. Each selects a random private key which must in the range of 1 to p-1. I would restrict the range to sqrt(p) to p-sqrt(p). Let’s call these private keys a and b.

    Alice’s public key: A = aG
    Bob’s public key: B = bG
    Shared secret: bA = aB (use the x coordinate)
    (The above is the Diffie–Hellman key exchange.)

    This works because bA = b(aG) and aB = a(bG) and the fact that multiplication is associative i.e. b(aG) = a(bG).

    ECC works because while A = aG is straightforward, a = A/G is computationally infeasible.

    Thus using NIST p-256, Alice and Bob can share a 256-bit secret via email without any snooping party learning more than the shared secret is 256 bits long.

  165. barry says:

    @David, Yes I accept the 1.1 bits of entropy per character for the written english language, for a person who can read english. And that last qualification is important, as is the fact that Shannon came to that figure by experimenting on humans rather than being able to calculate it from the letters of the texts alone. The entropy-rate by that measure is a function of how good the predictor is as well as on how predictable the particular sequence is. Though I'm sure there are computer programs as good or better at "pick the next letter of english text" than humans now.

    The "Limitations of entropy as information content" section of the Wikipedia page for Entropy_information_theory helped explain the discrepancy, and made me feel OK about using the non-stochastic frequency probability entropy figure of around 4.5 bits:

    It is important not to confuse the above concepts. Often it is only clear from context which one is meant For example, when someone says that the "entropy" of the English language is about 1.5 bits per character, they are actually modeling the English language as a stochastic process and talking about its entropy rate.

    eg. the next letter in: 'popeha..' is probably 't' (could be 'm', but I don't know what popeham is), so that next letter has an entropy rate close to zero bits. But once you overlay even two english texts together, you get something like 'lWkctE..' The increased difficulty in predicting the next letter takes the entropy rate up to near its maximum for whatever alphabet is being used. That is what I meant by "Stochastic entropy washes out first, after the second book".

    And as far as I can tell, letter frequency entropy washes out at about 10 books (which is a lot more than I would have guessed)

  166. barry says:

    I'm not asking for an editor previewer, but a big red 'CHECK YOUR TAG SLASHES' sign next to the 'Submit Comment' button would help.

  167. En Passant says:

    @barry –

    Daran on Sep 11, 2013 @10:52 pm above mentioned Schneier's law, which is the reason (though I had forgotten the semi-formal attribution) that I said:

    If I understand the arguments correctly, my gut inclination is to agree with you. But my gut also tells me that I could be very wrong.

    Seems to me (by guess and by golly) that the time to decrypt such a cipher would be at best polynomial w.r.t. the size of the public corpus source for the OTP. So, a sufficiently huge public corpus could make decryption time long enough to provide security within the users' lifetimes. But I'd be inclined to choose a public corpus sufficient to make decryption time, if it is polynomial w.r.t. size of corpus, many many orders of magnitude longer.

    But Schneier's law is the right point, so I'd like to see a much more mathematically rigorous demonstration of the method's security against as many conceivable attacks as possible.

  168. barry says:

    @Daran, after reading your reply, thinking about it, having a few coffees, going for a drive, thinking about it, sleeping on it, reading it a few more times.. I think I can see your point, that while you can't unscramble an omelet, With known ingredients and enough time, it's possible to scramble enough other omelets to find one scrambled in exactly the same way, and the weak points in the 'pad' are where combinations of common words line up in the same position in all of the books used. Is that close? (it's not something I'd bet my life on anyway)

    I had been concentrating more on "how random is the final pad" side, which is interesting it itself, though I accept is not the total source of security. And an odd result of my experiment to actually modulo-256 add 17 texts together and measure the '-p*log(p)' quantity (to which I seem to be developing an odd aversion against calling 'entropy'). As I mentioned before, the graph flattens out after about 10 books (ie adding more books to the first 10 doesn't increase the -p*log(p) quantity, but it seems to flatten out at 7.8 bits per character rather than any closer to the expected 8.0

    it could be something as dumb as a program error always counting one particular letter twice, or something more mysterious (but I'm not looking today).

    ps. David|Clark, thanks for fixing the tags.

  169. Daran says:

    @Daran, after reading your reply, thinking about it, having a few coffees, going for a drive, thinking about it, sleeping on it, reading it a few more times.. I think I can see your point, that while you can't unscramble an omelet, With known ingredients and enough time, it's possible to scramble enough other omelets to find one scrambled in exactly the same way, and the weak points in the 'pad' are where combinations of common words line up in the same position in all of the books used. Is that close? (it's not something I'd bet my life on anyway)

    Basically, yes.

    Here's another attack: Alice wants to send a message to Bob, so both of them attempt to download Project Guttenburg's corpus. Mallory intercepts Alice's connection and substitutes her own texts, modified to increase the frequency of the coincidences attacked in my previous comment. Unless Mallory can also do the same to Bob, he won't be able to decrypt Alice's message, so may eventually work out that the protocol has been compromised, but that's a bit late. Mallory has decrypted the message.

    Alice and Bob could block this by using secure connections and the existing PKI. But if you're going to rely upon that for your security, why not just use it directly?

  170. Daran says:

    Here's a third attack. In fact, I'm not going to attack your system. I'm going to attack a much stronger one.

    Instead of using Project Gutenberg, create 42,000 completely random files, each 10MB long. Instead of choosing seventeen of them, choose any number. Your "short" key is now 42,000 bits long. You now have 2^42,000 "one time pads" to choose from, including all the ones your "choose seventeen" approach gives you, and countless others. If I can break this system, yours has no chance.

    This system is trivially vulnerable to a known plaintext attack. For each byte of plaintext I know, I can recover one byte of keytext. Each byte of keytext gives me a linear equation in 44,000 unknowns. 44,000 such equations, or perhaps a little more, is easily soluble. Even if I'm a few bytes short, I might be able to reduce the keyspace sufficiently to brute force the rest of the way.

  171. En Passant says:

    This system is trivially vulnerable to a known plaintext attack. For each byte of plaintext I know, I can recover one byte of keytext. Each byte of keytext gives me a linear equation in 44,000 unknowns. 44,000 such equations, or perhaps a little more, is easily soluble.

    Minor nit. Leave it at 44K unknowns and 44K equations.

    More linear equations than unknowns will overdetermine the solution set. The equation set is either inconsistent, yielding ambiguous solutions for 1 or more unknowns; or some equations are linearly dependent, which can yield a solution, but at least one equation can be ignored.

  172. Daran says:

    Minor nit. Leave it at 44K unknowns and 44K equations.

    Yeah, I meant all the number to be the same. My bad.

    More linear equations than unknowns will overdetermine the solution set. The equation set is either inconsistent, yielding ambiguous solutions for 1 or more unknowns; or some equations are linearly dependent, which can yield a solution, but at least one equation can be ignored.

    If the message was encoded correctly, there will always be at least one solution – the correct one. It's likely that there will be some dependencies in the equations even if there are as many as unknowns, or fewer, which is why I said "a little more".

    The point is, with sufficient known plaintext, attacking this system is a solved problem in mathematics, not just theoretically but practically. Matricies of this size are solved routinely these days.

  173. barry says:

    [I got a goose !]
    @Daran, Once I understood your attack #3, and drew a little 4 row example to check it works, my reactions were:
    1. Damn heck, it's broken already!
    2. But its still OK for encrypting tweets isn't it?

    If I'm reading it right, your attack #3 depends on the message having at least as many characters as the number of possible books used. The number of equations is the length of the message, and the number of variables is the length of the short-key (the 42,000 books or random files), and solving the simultaneous equations with the inverse matrix etc will produce the short key (or a set of possible short keys) .

    The first idea for a fix (counter-attack) was from En Passants assumption that the the order of the books was a secret. For a #3 attack, this increases the number of variables to 42,000!(factorial) which is a pretty big number which the message length won't come close to. So now the 'medium key' that defines how the books are shuffled for each sender/receiver pair becomes necessary rather than a luxury. And with the use of a medium key, the short key (which is what method #3 finds) doesn't have to be secret anyway. I know that looks a bit like cheating. By making the key larger and larger with each 'fix', when it gets as large as the text to be encrypted, you might as well use a true one-time-pad.

    But back to your original #3 attack:

    For each byte of plaintext I know, I can recover one byte of keytext. Each byte of keytext gives me a linear equation in 44,000 unknowns. 44,000 such equations, or perhaps a little more, is easily soluble

    Now, after sleeping on it, I'm not so sure, particularly about what solving this mass of polynomials is finding. Your explanation is short so I might be assuming some things that you aren't. So here is what I've got so far.:

    *each equation is 44,000 elements long (actually 44,000 + 1 because you're including one element from either the plaintext or the cyphertext (it should work both ways?) for the augmented matrix)

    *You've got a sample text at least 44,001 characters long, that you can cut down to exactly a square solvable matrix. Each of these characters is one 44,001 element polynomial equation.

    *The 44,001 x 44,001 matrix is solvable, and produces the 44,000 bit short-key that generated that cryptext.

    Unless I've totally misunderstood the attack method, this allows you to find the short-key (and therefore generate the large pad) which encrypted any particular plaintext into its cryptext. So if you've got both the plaintext and cryptext of one message that Alice sent Bob, AND Alice uses the same key to send the next message message, you've got them ! This emphasizes the importance of the 'one time' aspect of a one time pad, but does it do more?

  174. barry says:

    Sorry (previous long post for short point).
    As it says on the box; "Keyed-One-Time-Pad. to Use a pad only once, use a key only once."
    With any one time pad, if you have the plaintext and cryptext, you can find the pad used just by subtracting them.

    While solving a 44,000 x 44,000 matrix to find the short key that produced that pad is interesting and ingenious, I'm not counting that as 'making the system vulnerable' (except in the sense that not keeping the secret key secret makes any system vulnerable).

    Should being vulnerable only to a brute force attack be counted as a win or a lose?

  175. Daran says:

    @barry, I'll take your points in reverse order

    Should being vulnerable only to a brute force attack be counted as a win or a lose?

    Ordinarily it's considered a win, provided the keyspace is sufficiently large that a brute force search will always be unfeasible with an extreme margin of safety.

    That said, I don't agree that this applies In the case of a scheme which purports to be (equivalent to) a one-time pad. The salient characteristic of a true OTP is that it is theoretically unbreakable. It cannot be broken by brute force. It cannot be broken by an algorithm slower than brute force. It cannot be broken (by cryptographic means) at all.

    What is the point of going to all this trouble and collecting and storing all this data, if the end result is a system which at best has cryptographic properties no better than existing, well-analysed ciphers which are more convenient to use?

    As it says on the box; "Keyed-One-Time-Pad. to Use a pad only once, use a key only once."

    With a OTP you don't have to use the pad only once; you have to use the numbers only once. If the first message you send has 100 bytes, you can safely encrypt your next message starting at byte 101 from the pad. Conceptually, using the known plaintext from one message to decrypt a second message encrypted using the same key is indistinguishable from using a known fragment of plaintext from one message to decrypt the rest of the message.

    As long as you don't encrypt more than 44,000 bytes using a single short key – whether that's 44,000 bytes for a single message or several, then my attack won't work, though it might serve to reduce the brute-force search space.

    If I'm reading it right, your attack #3 depends on the message having at least as many characters as the number of possible books used.

    The method won't yield a complete solution if there are fewer than #books bytes of known plaintext. It's not guaranteed to produce a unique solution even if there are more

    The first idea for a fix (counter-attack) was from En Passants assumption that the the order of the books was a secret. For a #3 attack, this increases the number of variables to 42,000!(factorial) which is a pretty big number which the message length won't come close to. So now the 'medium key' that defines how the books are shuffled for each sender/receiver pair becomes necessary rather than a luxury. And with the use of a medium key, the short key (which is what method #3 finds) doesn't have to be secret anyway.

    The order of books is irrelevant. The method doesn't find the short key. It finds the set of books which are used to make the long key. If the attacker doesn't know the book order, he can't derive the short key, but he can still decrypt the message.

  176. Daran says:

    *each equation is 44,000 elements long (actually 44,000 + 1 because you're including one element from either the plaintext or the cyphertext (it should work both ways?) for the augmented matrix)

    Each equation has 44,000 unknowns augmented with the difference between the plain-and corresponding cyphertext. It is the number of unknowns which is salient here, not the "plus one".

    Each unknown is equal to 1 if the corresponding book contributes to the long key, 0 otherwise. The coefficients are just the byte values taken from the books.

    *You've got a sample text at least 44,001 characters long, that you can cut down to exactly a square solvable matrix. Each of these characters is one 44,001 element polynomial equation.

    Ideally at least 44,000 characters long, same as the number of unknowns. With 44,000 a unique solution is possible, though not guaranteed. If you have more than 44,000 characters, you don't need to cut the matrix down to a square. The methods of linear algebra will work for overspecified systems, and the chance of finding a unique solution is increased.

    The equations are polynomial, yes, but the more significant fact is that they are linear – the simplest type of polynomial and also the easiest to solve.

    *The 44,001 x 44,001 matrix is solvable, and produces the 44,000 bit short-key that generated that cryptext.

    For a given book ordering, yes. If the attacker doesn't know that actually book ordering he chooses at his convenience. The resulting key will be a permutation of the original key, but it will still work using the attackers permutated book ordering.

    Another thing to note. This attack does not depend upon each book being used once only. Each book could be added multiple times (up to 255 after which the sequence cycles) and the method would still work. Such a system would require, not 44,000 bits of short key, but 44,000 bytes. Of course if you have 44,000 bytes of key, you might as well use it as your one-time pad directly.

    The point here is that the attacker has information that isn't used in this attack. He knows that each book is included exactly once, or not at all. And he knows that only 17 books are included. These constraints vastly reduce the size of the key, from 44,000 bytes to about 256 bits. Theoretically you could use this information to effect a similar reduction in the number of equations required to specify the system. I can't think of a practical way to do this, but you're not designing a system that only has to defeat me. You have to defeat the best cryptanalysts in the world. Would you bet your life that they couldn't?

    As En Passent said: Schneier's law is the right point. The fact that I – who have a good mathematical background, but know particular knowledge or experience of cryptanalysis – was able to defeat your system so easily ought to satisfy everyone including you, that you are not competent to design a cryptosystem.

    Here's another point. There is a reason One-time pads are considered to be an indicator of snake-oil. As Schneier put it: "Most companies that claim they have a one-time pad actually do not. They have something they think is a one-time pad." you have something you thought was an OTP, but it isn't an OTP. And if it's not an OTP, what is there to commend it?

  177. barry says:

    Hi Daran,

    The order of books is irrelevant. The method doesn't find the short key. It finds the set of books which are used to make the long key.

    Yep, i got that one exactly wrong (not booklistlength.factorial at all). And do accept it is not a true one time pad, though I have been using the word 'pad' to mean the randomesque string used to encrypt the message\long key, I think I've been careful to say it works like a one time pad, and have not actually called it one, though the reason for using a key only once is the same.

    It's not a system I'm particularly invested in. xtmar asked the question on the 8th, and David replied that there wasn't really any progress to make, and somewhere between then and the 10th the idea occured to me.

    Much of the discussion was about how big a number 2^256 really was, which is why I chose 17 books_ to make the short key 256 bits long. It's not a system I would ever actually use. I do have a handful of Gutenberg books on my HD, but 50GB is still a lot of memory for me, and for the practical problem of transmitting the keys, the public/private key pairs of systems like pgp looks hard to beat.

    Each equation has 44,000 unknowns augmented with the difference between the plain-and corresponding cyphertext.

    The fact that I – who have a good mathematical background, but know particular knowledge or experience of cryptanalysis – was able to defeat your system

    Yep, I got that the difference between the plaintext and corresponding cyphertext I have been calling the 'pad' or long-key. And that your system finds the set of books that make up that particular pad (which is different with each message)
    But I do not call that a 'defeat' of the system because you need the original plaintext to solve the equations.

    Or, it's a 'one time defeat', When the sender uses the next short key, the solution to the previous 44,000 x 44,000 equations is no help. And you can only solve the next message when you have the plaintext too. And the person reading it to you is likely to ask "why do you need to know which books encrypted this?"

    True 'one time pads' are vulnerable to plaintext attacks in that sense too.

    I'm not saying its unsolvable, just that I haven't seen it yet, and your method isn't it.

    but it isn't an OTP. And if it's not an OTP, what is there to commend it?

    Much shorter keys !

  178. Daran says:

    I have been using the word 'pad' to mean the randomesque string used to encrypt the message\long key,

    There is a whole class of ciphers – stream ciphers – which work this way. The (short) key is the seed to a pseudo-random number generator.

    I think I've been careful to say it works like a one time pad, and have not actually called it one

    You're still missing the point. It only "works like a one time pad" in the most superficial way. It doesn't have the "theoretically unbreakable" property that makes a true OTP (theoretically) desirable.

    But I do not call that a 'defeat' of the system because you need the original plaintext to solve the equations.

    Seriously?

    So the NSA intercepts your message and suspect that you're transmitting a copy of one of Snowdon's files, (which they have copies of), so they try each one as the "known plaintext" until they find the match. Now they have proof you sent one of Snowdon's files.

    That's not a defeat?

    Or, it's a 'one time defeat',

    Tell that to the judge.

    When the sender uses the next short key, the solution to the previous 44,000 x 44,000 equations is no help. And you can only solve the next message when you have the plaintext too. And the person reading it to you is likely to ask "why do you need to know which books encrypted this?"

    You're not going to use the key to encrypt just one byte, are you? Say your message length is N. A known plaintext attack against some smaller number M of bytes would allow an attacker to deduce the rest of the message.

    True 'one time pads' are vulnerable to plaintext attacks in that sense too.

    No they're not. Without the corresponding keytext fragment, OTP cyptertext is useless to the cryptanalyst. No amount of known plaintext from other parts of the message will help. Even if you know every byte but one, you cannot decode the last one, except in the trivial sense that you might be able to deduce the rest of the message from the plaintext alone.

    So if the known part of the message reads "???? Barack Oba???", then you can probably guest the last three characters, but there is no way to tell whether the first says "kill" or "save" or something else.

    Your system does not have this property.

    I'm not saying its unsolvable, just that I haven't seen it yet, and your method isn't it.

    Where is the burden of proof here? Is it for me to show that the NSA could break your system? Or for you to show that it could not?

    but it isn't an OTP. And if it's not an OTP, what is there to commend it?

    Much shorter keys !

    What is there to commend it over AES, Twofish, Serpent, Camellia, etc, each of which:

    1. Does not require huge amounts of data to be stored,
    2. Has survived years of cryptanalysis by some of the best brains in the world,
    3. Has proven to be resistant against all the attacks I've described, and many other2,
    4. Can encrypt messages of arbitrary length without becoming vulnerable,
    5. Has keys which can be used indefinitely,
    6. Has 256-bit keys or shorter (but still more than enough to resist brute-force)?

  179. Daran says:

    I said:

    I would expect to be able to factor a 512-bit modulus in no more than a few weeks on a single modern high-end desktop, with no great experience or knowledge of how the algorithms work, using freely available software.

    The 696-bit challenge number RSA-210 has just been factored, apparently by an individual. There are no details yet on the time and resources used.

    http://www.mersenneforum.org/showthread.php?t=18626