
Bug #19621

Updated by felix.wolfsteller@betterplace.org (Felix Wolfsteller) over 1 year ago

By default, on unixoid systems `Resolv` reads `/etc/hosts` once. Privacy- and security-aware people might use the file to block unwanted traffic; developers use it to quickly manipulate address resolution.
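A minimal sketch of that lookup path, assuming a throwaway file in place of the real `/etc/hosts`. `Resolv::Hosts.new` accepts a file name (defaulting to `/etc/hosts` on unixoid systems), and `getaddress`/`getnames` are existing `Resolv::Hosts` methods; the host names are made up for illustration.

```ruby
require 'resolv'
require 'tempfile'

# A tiny stand-in for /etc/hosts; the host names are made up for illustration.
hosts_file = Tempfile.new('hosts')
hosts_file.write(<<~HOSTS)
  127.0.0.1  localhost
  0.0.0.0    ads.example.test tracker.example.test
HOSTS
hosts_file.flush

# Resolv::Hosts takes the file path (default: Resolv::Hosts::DefaultFileName).
# The file is parsed on the first lookup, not in the constructor.
hosts = Resolv::Hosts.new(hosts_file.path)

blocked = hosts.getaddress('ads.example.test') # => "0.0.0.0"
names = hosts.getnames('0.0.0.0')              # every name mapped to 0.0.0.0
p blocked, names
```

A blocked name resolves to `0.0.0.0`, which is how such files stop traffic from leaving the machine.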

`Resolv::Hosts` uses [`IO.read`](https://github.com/betterplace/ruby/blob/9b07d30df8c6bf65c2558c023fd6452405915610/lib/resolv.rb#LL195C4-L195C4), which seems to be inefficient for large files whose contents are consumed line by line.
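To make the distinction concrete, here is a minimal sketch contrasting the two reading styles on a simplified hosts-file parser. `parse_hosts_line`, `parse_with_read` and `parse_with_foreach` are hypothetical helpers, not part of `Resolv`; the parsing rules (strip `#` comments, split on whitespace, first field is the address) are an assumption about the file format rather than a copy of `resolv.rb`.

```ruby
require 'tempfile'

# Hypothetical helper: turn one hosts line into [address, [names...]],
# or nil for blank/comment-only lines.
def parse_hosts_line(line)
  fields = line.sub(/#.*/, '').split
  return nil if fields.empty?

  [fields.first, fields.drop(1)]
end

# Whole-file style: File.read returns one String holding the entire file.
def parse_with_read(path)
  File.read(path).each_line.filter_map { |line| parse_hosts_line(line) }
end

# Streaming style: File.foreach reads and yields one line at a time.
def parse_with_foreach(path)
  File.foreach(path).filter_map { |line| parse_hosts_line(line) }
end

file = Tempfile.new('hosts')
file.write("127.0.0.1 localhost\n# comment only\n0.0.0.0 ads.example.test tracker.example.test\n")
file.flush

from_read = parse_with_read(file.path)
from_foreach = parse_with_foreach(file.path)
p from_read == from_foreach # => true
```

Both variants return the same entries; they differ only in whether the raw file contents are first held in memory as a single String.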

For example, if you install the `/etc/hosts` additions from [hblock](https://hblock.molinero.dev/hosts) (https://github.com/hectorm/hblock), the first call to resolve an address will likely take **minutes**.
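One way to see how the first lookup scales with file size, without installing hblock, is to generate synthetic hosts files. This sketch assumes, as in hblock-style block lists, that every blocked name points at `0.0.0.0`; `write_synthetic_hosts` and the `*.example.test` names are hypothetical, and the sizes are kept small so it finishes quickly. It times the same `lazy_initialize` call that the report's benchmark uses.

```ruby
require 'resolv'
require 'benchmark'
require 'tempfile'

# Hypothetical helper: a hosts file with `count` blocked names, all mapped to
# 0.0.0.0 (assumed to resemble an hblock-generated list).
def write_synthetic_hosts(count)
  file = Tempfile.new('hosts')
  file.puts '127.0.0.1 localhost'
  count.times { |i| file.puts "0.0.0.0 blocked#{i}.example.test" }
  file.flush
  file
end

# Time the first (lazy) parse for growing sizes to see how cost follows line count.
timings = {}
[1_000, 2_000, 4_000].each do |count|
  file = write_synthetic_hosts(count)
  timings[count] = Benchmark.realtime do
    Resolv::Hosts.new(file.path).lazy_initialize
  end
  file.close! # close and delete the temporary file
end

timings.each { |count, seconds| puts format('%5d lines: %.4f s', count, seconds) }
```

A line-by-line parse would be expected to roughly double in time as the line count doubles; markedly steeper growth would hint that the cost lies somewhere other than in reading the file.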

Unfortunately, replacing `.open ... .each` with streaming `IO.foreach` does not help (see the attached patch and PR).

Benchmarking with the partial example `/etc/hosts` from above (172,751 lines), done like this:

```ruby
require 'resolv'
require 'benchmark'

# Time the first (lazy) parse of the default hosts file.
timing = Benchmark.measure do
  Resolv::Hosts.new.lazy_initialize
end
puts timing
```

This yields (user, system, total and real time in seconds):

```
25.622515   8.821095  34.443610 ( 34.495448)
```

Reading all lines into memory first and then consuming them (`File.readlines`) might improve the situation, but is probably undesirable due to memory concerns.
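To put the memory concern in rough numbers, the sketch below builds a throwaway file of hosts-style lines and compares what `File.readlines` keeps alive at once with the size of a single line yielded by `File.foreach`. It relies on MRI's `ObjectSpace.memsize_of` from the `objspace` library, whose figures are approximate and implementation-specific; the line format and count are made up.

```ruby
require 'objspace'
require 'tempfile'

# Throwaway file of hosts-style lines (made-up names).
file = Tempfile.new('hosts')
10_000.times { |i| file.puts "0.0.0.0 blocked#{i}.example.test" }
file.flush

# File.readlines keeps every line as a separate String inside one Array.
lines = File.readlines(file.path, chomp: true)
readlines_bytes = ObjectSpace.memsize_of(lines) +
                  lines.sum { |line| ObjectSpace.memsize_of(line) }

# File.foreach yields one line at a time; only the current line must stay
# reachable, earlier lines become garbage. Track the largest single line seen.
largest_line_bytes = 0
File.foreach(file.path, chomp: true) do |line|
  largest_line_bytes = [largest_line_bytes, ObjectSpace.memsize_of(line)].max
end

puts "readlines: ~#{readlines_bytes} bytes live at once for #{lines.size} lines"
puts "foreach:   ~#{largest_line_bytes} bytes for the largest single line"
```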
