Feature #10098

[PATCH] Timing-safe string comparison for OpenSSL::HMAC

Added by Matt U 10 months ago. Updated 7 months ago.

[ruby-core:64101]
Status:Assigned
Priority:Normal
Assignee:Yukihiro Matsumoto

Description

I could be totally wrong, but it seems the standard library doesn't provide a reliable way of comparing hashes in constant-time.

With this patch I propose to add an additional method, OpenSSL::HMAC#verify, which takes a binary string with a digest and compares it against the computed hash.

hmac-timing.patch Magnifier (2.5 KB) Matt U, 07/28/2014 02:58 PM

hmac-timing.patch Magnifier (2.48 KB) Matt U, 07/28/2014 03:13 PM

tsafe_eql.patch Magnifier (2.48 KB) Matt U, 07/29/2014 03:56 AM

tsafe_inline.patch Magnifier (3.51 KB) Matt U, 07/29/2014 10:29 AM

0001-add-timing-safe-string-compare-method.patch Magnifier (4.31 KB) Matt U, 08/23/2014 09:12 AM

History

#1 Updated by Matt U 10 months ago

Modified misleading error message - new patch attached.

#2 Updated by Nobuyoshi Nakada 10 months ago

  • Indent style mismatch
  • Should try to convert the argument with StringValue()
  • Why HMAC only? Other digests don't need it?
  • Probably we should provide timing-safe binary compare function?

#3 Updated by Matt U 10 months ago

Thanks for the feedback!

Nobuyoshi Nakada wrote:

  • Indent style mismatch
  • Should try to convert the argument with StringValue()

Will fix - sorry, this is my first contribution!

  • Why HMAC only? Other digests don't need it?

Good point, I thought since HMAC is for both redundancy and message authentication it made most sense on HMAC rather than all digests - but - I can see there would be use-cases for comparing other digests or other strings in a timing-safe manner.

  • Probably we should provide timing-safe binary compare function?

I think this makes the most sense :)
How about moving this to a method on String - e.g. String#slow_eql? or String::timing_safe_eql?? I'm terrible at naming things :)

#4 Updated by Nobuyoshi Nakada 10 months ago

Matt U wrote:

How about moving this to a method on String - e.g. String#slow_eql? or String::timing_safe_eql?? I'm terrible at naming things :)

slow is not the main concern here, IMHO.
The latter is more descriptive, but seems less pretty a little.

#5 Updated by Matt U 10 months ago

Nobuyoshi Nakada wrote:

slow is not the main concern here, IMHO.
The latter is more descriptive, but seems less pretty a little.

Cleaned up (hopefully correctly) and moved to String#tsafe_eql?. Any ideas for a better name?

#6 Updated by Nobuyoshi Nakada 10 months ago

Seems correct.

I made a benchmark for it:

# bm-tsafe_eql.rb: [Feature #10098]

require 'benchmark'
a = "x"*1024_000
b = a+"y"
c = "y"+a
a << "x"
n = (ARGV.shift || 100000).to_i
Benchmark.bm(15) {|bm|
  bm.report("a==b") {n.times{a==b}}
  bm.report("a==c") {n.times{a==c}}
  bm.report("a.tsafe_eql?(b)") {n.times{a.tsafe_eql?(b)}}
  bm.report("a.tsafe_eql?(c)") {n.times{a.tsafe_eql?(c)}}
}

It seems quite slow.

user system total real
a==b 4.660000 0.000000 4.660000 ( 4.672629)
a==c 0.010000 0.000000 0.010000 ( 0.007154)
a.tsafe_eql?(b) 34.330000 0.020000 34.350000 ( 34.366759)
a.tsafe_eql?(c) 34.450000 0.020000 34.470000 ( 34.480267)

With the following patch,

user system total real
a==b 4.660000 0.000000 4.660000 ( 4.666683)
a==c 0.000000 0.000000 0.000000 ( 0.006657)
a.tsafe_eql?(b) 7.660000 0.010000 7.670000 ( 7.662584)
a.tsafe_eql?(c) 7.660000 0.000000 7.660000 ( 7.671189)
diff --git a/string.c b/string.c
index 5c2852f..90865c0 100644
--- a/string.c
+++ b/string.c
@@ -2504,7 +2504,7 @@ static VALUE
 rb_str_tsafe_eql(VALUE str1, VALUE str2)
 {
     long len, idx;
-    char result;
+    VALUE result;
     const char *buf1, *buf2;

     str2 = StringValue(str2);
@@ -2516,7 +2516,13 @@ rb_str_tsafe_eql(VALUE str1, VALUE str2)
     buf2 = RSTRING_PTR(str2);

     result = 0;
-    for (idx = 0; idx < len; idx++) {
+    idx = 0;
+    if (UNALIGNED_WORD_ACCESS || !((VALUE)buf1 % sizeof(VALUE)) && !((VALUE)buf2 % sizeof(VALUE))) {
+   for (; idx < len; idx += sizeof(VALUE)) {
+       result |= *(const VALUE *)(buf1+idx) ^ *(const VALUE *)(buf2+idx);
+   }
+    }
+    for (; idx < len; idx++) {
         result |= buf1[idx] ^ buf2[idx];
     }

#7 Updated by Nobuyoshi Nakada 10 months ago

According to notes on timingsafe_memcmp,
OpenBSD has timingsafe_memcmp(), and NetBSD has consttime_memequal().

#8 Updated by Matt U 10 months ago

Nobuyoshi Nakada wrote:

According to notes on timingsafe_memcmp,
OpenBSD has timingsafe_memcmp(), and NetBSD has consttime_memequal().

Wow, thank you for such detailed and valuable feedback (and an awesome patch!)

What do you think about extracting this to an (inline) method like rb_timingsafe_memcmp(..) which can then use the system-provided ones if they exist? Since this is moving into distro/platform-specific territory I'm not sure how this fits with Ruby's coding guidelines.

#9 Updated by Nobuyoshi Nakada 10 months ago

Agree to extract a function, I meant it by "provide timing-safe binary compare function".
But the name *_memcmp doesn't look nice, as it looks like returning the same result as memcmp.
The order on this comparison won't make sense, so *_equal or *_eql would be better.

#10 Updated by Matt U 10 months ago

What's your thoughts on this new patch?

At the moment I'm using OSX and Linux, unable to test timingsafe_memcmp() and consttime_memequal().

#11 Updated by Nobuyoshi Nakada 10 months ago

rb_tsafe_eql() doesn't need to be VALUE, int is OK.
Tests for timing-safeness are desirable, but it would be fragile by noise.

#12 Updated by cremno phobia 10 months ago

I don't like the proposed method name.

tsafe? It should be timingsafe. Saving some keystrokes isn't worth it to have a less obvious name. Even C programmers chose a longer name!

I'm also not happy about eql? either. Then people expect it to work like String#eql?. Same argument requirements/conversion, same result, and it's even mentioned in the proposed docs (“similarly” is vague)! But it doesn't. For example:

a = "\xFF".force_encoding(Encoding::CP437)
b = "\xFF".force_encoding(Encoding::CP850)
p a.eql?(b)  # => false
p a.tsafe_eql?(b)  # => true

If something is going to be added to String, then encodings need to be considered. Or:

require 'fiddle'
a = "\xFF"
(b = Fiddle::Pointer.malloc(1))[0] = 0xff
p a.eql?(b)  # => false
p a.tsafe_eql?(b)  # => true

#13 Updated by Matt U 10 months ago

Nobuyoshi Nakada wrote:

rb_tsafe_eql() doesn't need to be VALUE, int is OK.
Tests for timing-safeness are desirable, but it would be fragile by noise.

I'll get these done. Your benchmark code demonstrated this pretty well so (if it's ok with you) I'll use that as a starting point.

cremno phobia wrote:

I don't like the proposed method name.

tsafe? It should be timingsafe. Saving some keystrokes isn't worth it to have a less obvious name. Even C programmers chose a longer name!
I'm also not happy about eql? either. Then people expect it to work like String#eql?. Same argument requirements/conversion, same result, and it's even mentioned in the proposed docs (“similarly” is vague)! But it doesn't. For example:
...

Thanks for taking the time to test out this patch!
You have some really good points to consider. My thoughts in response:

  • Since Ruby has only the one String class for text and data, I think it does make sense to keep this method on the String class.
  • The behaviour you've demonstrated is intended, we care about the bytes in the buffer; not the encoding.
  • Name and documentation is terrible, I agree :)

I think the easiest way to resolve this would be to come up with a better name, and to explain more clearly in the documentation.

For a starting point, how about timingsafe_bytes_eq? ? I'll improve the documentation while making the fixes as suggested by Nobu :)

#14 Updated by cremno phobia 10 months ago

Matt U wrote:

  • Since Ruby has only the one String class for text and data, I think it does make sense to keep this method on the String class.

I agree with you!

  • The behaviour you've demonstrated is intended, we care about the bytes in the buffer; not the encoding.
  • Name and documentation is terrible, I agree :)

Then that encoding will be ignored has to be mentioned in the documentation. What happens if the length of both strings differ, too.

I don't know yet how I feel about (silently) ignoring encoding. Maybe only ASCII-8BIT strings should be allowed? But having bytes in the name is good anyway! Naming things is hard. I don't have a better idea either (timingsafe_bytecmp?, but there's also casecmp…).

Are there any gems for this or is such a method part of one?

#15 Updated by Matt U 9 months ago

Changelog:

  • Renamed rb_tsafe_eql => rb_consttime_memequal.
  • Renamed rb_str_tsafe_eql => rb_str_consttime_bytes_eq.
  • Renamed tsafe_eql? => consttime_bytes_eq?.
  • rb_consttime_memequal now has return type int.
  • Updated documentation to reflect that encodings are ignored, and removed reference to eql?.
  • Added tests to ensure timing safety (delta of 0.25 sec allowed to account for GC/system noise).
  • Build on Travis passing: https://travis-ci.org/ruby/ruby/builds/33351019

#16 Updated by Matt U 8 months ago

Keen to hear feedback if any. Completely understand there are many more important tickets than this one, but it would be great to see this feature in MRI soon!

Devise, one of the most popular frameworks currently implements a timing-safe string compare in Ruby manually: https://github.com/plataformatec/devise/blob/66db52ce31b5d8629f5813a1d7f03a8bc17e5d52/lib/devise.rb#L480-L488

#17 Updated by Tomoyuki Chikanaga 7 months ago

  • Category changed from ext/openssl to core
  • Status changed from Open to Assigned
  • Assignee set to Yukihiro Matsumoto

The latest patch seems satisfy nobu, doesn't it?
At last we need to get approved from Matz.

Also available in: Atom PDF