Bug #5978

YAML.load_stream should process documents as they are read

Added by trans (Thomas Sawyer) over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Target version:
ruby -v:
ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]
[ruby-core:42404]

Description

Psych says YAML.load_documents is deprecated and says to use YAML.load_stream instead.

Looking at the implementation of load_stream(), it looks to me as if it waits for all documents in the stream to load before anything can be done with them.

# File 'lib/psych.rb', line 221

def self.load_stream yaml
  parse_stream(yaml).children.map { |child| child.to_ruby }
end

I don't think this should be the case. Ideally load_stream() would take a block, and if an IO object is given, read a document, yield it and then read the next document, and so on.

I imagine an Enumerator might be applicable to this as well.

Associated revisions

Revision a2e3de1b
Added by tenderlove over 6 years ago

  • ext/psych/lib/psych.rb (parse_stream, load_stream): if a block is
    given, documents will be yielded to the block as they are parsed.
    [Bug #5978]

  • ext/psych/lib/psych/handlers/document_stream.rb: add a handler that
    yields documents as they are parsed

  • test/psych/test_stream.rb: corresponding tests.

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@34953 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
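A short usage sketch of the behaviour this changeset describes, assuming a Psych version that includes it. The sample YAML and the variable names are made up for illustration and do not come from the issue.

```ruby
require "psych"
require "stringio"

# Three small documents in one stream; StringIO stands in for a file or socket.
io = StringIO.new("--- 1\n--- [a, b]\n--- {k: v}\n")

# With a block, load_stream yields each deserialized document as it is parsed
# instead of collecting the whole stream into an array first.
docs = []
Psych.load_stream(io) { |doc| docs << doc }

# parse_stream also accepts a block, but it yields the parsed document nodes
# (Psych::Nodes::Document) rather than Ruby objects.
nodes = []
Psych.parse_stream("--- x\n--- y\n") { |node| nodes << node }
values = nodes.map(&:to_ruby)
```

Without a block, both methods should keep their earlier behaviour of parsing the whole stream before returning.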

Revision 34953
Added by tenderlove over 6 years ago

  • ext/psych/lib/psych.rb (parse_stream, load_stream): if a block is
    given, documents will be yielded to the block as they are parsed.
    [Bug #5978]

  • ext/psych/lib/psych/handlers/document_stream.rb: add a handler that
    yields documents as they are parsed

  • test/psych/test_stream.rb: corresponding tests.

Revision 7d984d76
Added by tenderlove about 6 years ago

merge revision(s) 32578,33401,33403,33404,33531,33655,33679,33809,33900,33965,34067,34069,34087,34328,34330,34527,34772,34783,34839,34914,34953,34954,35153: [Backport #6212]

* ext/psych/lib/psych.rb: updating version to match gem
* ext/psych/psych.gemspec: ditto
* ext/psych/lib/psych/visitors/to_ruby.rb: fixing deprecation warning

* ext/psych/lib/psych.rb: define a new BadAlias error class.

* ext/psych/lib/psych/visitors/to_ruby.rb: raise an exception when
  deserializing an alias that does not exist.

* test/psych/test_merge_keys.rb: corresponding test.

* ext/psych/lib/psych.rb (load, parse): stop parsing or loading after
  the first document has been parsed.

* test/psych/test_stream.rb: pertinent tests.

* ext/psych/lib/psych.rb (parse_stream, load_stream): if a block is
  given, documents will be yielded to the block as they are parsed.
   [Bug #5978]

* ext/psych/lib/psych/handlers/document_stream.rb: add a handler that
  yields documents as they are parsed

* test/psych/test_stream.rb: corresponding tests.

* ext/psych/lib/psych/core_ext.rb: only extend Kernel if IRB is loaded
  in order to stop method pollution.

* ext/psych/lib/psych.rb: default open YAML files with utf8 external
  encoding. 
* test/psych/test_tainted.rb: ditto

* ext/psych/parser.c: prevent a memory leak by protecting calls to
  handler callbacks.
* test/psych/test_parser.rb: test to demonstrate leak.

* ext/psych/parser.c: set parser encoding based on the YAML input
  rather than user configuration.
* test/psych/test_encoding.rb: corresponding tests.
* test/psych/test_parser.rb: ditto
* test/psych/test_tainted.rb: ditto

* ext/psych/parser.c: removed external encoding setter, allow parser
  to be reused.
* ext/psych/lib/psych/parser.rb: added external encoding setter.
* test/psych/test_parser.rb: test parser reuse

* ext/psych/lib/psych/visitors/to_ruby.rb: Added support for loading
  subclasses of String with ivars
* ext/psych/lib/psych/visitors/yaml_tree.rb: Added support for dumping
  subclasses of String with ivars
* test/psych/test_string.rb: corresponding tests

* ext/psych/lib/psych/visitors/to_ruby.rb: Added ability to load array
  subclasses with ivars.
* ext/psych/lib/psych/visitors/yaml_tree.rb: Added ability to dump
  array subclasses with ivars.
* test/psych/test_array.rb: corresponding tests

* ext/psych/emitter.c: fixing clang warnings. Thanks Joey!

* ext/psych/lib/psych/visitors/to_ruby.rb: BigDecimals can be restored
  from YAML.
* ext/psych/lib/psych/visitors/yaml_tree.rb: BigDecimals can be dumped
  to YAML.
* test/psych/test_numeric.rb: tests for BigDecimal serialization

* ext/psych/lib/psych/scalar_scanner.rb: Strings that look like dates
  should be treated as strings and not dates.

* test/psych/test_scalar_scanner.rb: corresponding tests.

* ext/psych/lib/psych.rb (module Psych): parse and load methods take
  an optional file name that is used when raising Psych::SyntaxError
  exceptions
* ext/psych/lib/psych/syntax_error.rb (module Psych): allow nil file
  names and handle nil file names in the exception message
* test/psych/test_exception.rb (module Psych): Tests for changes.

* ext/psych/parser.c (parse): parse method can take an optional file
  name for use in exception messages.
* test/psych/test_parser.rb: corresponding tests.

* ext/psych/lib/psych.rb: remove autoload from psych
* ext/psych/lib/psych/json.rb: ditto

* ext/psych/lib/psych/tree_builder.rb: dump complex numbers,
  rationals, etc with reference ids.
* ext/psych/lib/psych/visitors/yaml_tree.rb: ditto
* ext/psych/lib/psych/visitors/to_ruby.rb: loading complex numbers,
  rationals, etc with reference ids.
* test/psych/test_object_references.rb: corresponding tests

* ext/psych/lib/psych/scalar_scanner.rb: make sure strings that look
  like base 60 numbers are serialized as quoted strings.
* test/psych/test_string.rb: test for change.

* ext/psych/parser.c: remove unused variable.

* ext/psych/lib/psych/syntax_error.rb: Add file, line, offset, and
  message attributes during parse failure.
* ext/psych/parser.c: Update parser to raise exception with correct
  values.
* test/psych/test_exception.rb: corresponding tests.

* ext/psych/parser.c (parse): Use context_mark for indicating error
  line and column.

* ext/psych/lib/psych/scalar_scanner.rb: use normal begin / rescue
  since postfix rescue cannot receive the exception class. Thanks
  nagachika!

git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/branches/ruby_1_9_3@35165 b2dd03c8-39d4-4d8f-98ff-823fe69b080e

History

#1 [ruby-core:42441] Updated by Anonymous over 6 years ago

On Wed, Feb 08, 2012 at 01:47:31AM +0900, Thomas Sawyer wrote:

Issue #5978 has been reported by Thomas Sawyer.


Bug #5978: YAML.load_stream should process documents as they are read
https://bugs.ruby-lang.org/issues/5978

Author: Thomas Sawyer
Status: Open
Priority: Normal
Assignee:
Category:
Target version: 2.0.0
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]

Psych says YAML.load_documents is deprecated and says to use YAML.load_stream instead.

Looking at the implementation of load_stream(), it looks to me as if it waits for all documents in the stream to load before anything can be done with them.

# File 'lib/psych.rb', line 221

def self.load_stream yaml
  parse_stream(yaml).children.map { |child| child.to_ruby }
end

I don't think this should be the case. Ideally load_stream() would take a block, and if an IO object is given, read a document, yield it and then read the next document, and so on.

I imagine an Enumerator might be applicable to this as well.

I'd rather not change load_stream, but I want this functionality as
well. What about something like this:

YAML::Reader.new(io).each do |doc|
...
end

Deserialized documents will be yielded as read. Does that seem
acceptable? I'm hesitant to make it enumerable though because if we're
truly doing stream processing, you couldn't iterate on the same object
twice (imagine reading YAML from a socket or something).

--
Aaron Patterson
http://tenderlovemaking.com/
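
To make the single-pass concern concrete, here is a minimal sketch of an Enumerator wrapper. It assumes a Psych whose load_stream accepts a block; `yaml_documents` is a hypothetical helper name, not part of Psych or of the proposal above.

```ruby
require "psych"
require "stringio"

# Hypothetical helper: wraps block-style streaming in an Enumerator.
# Documents are produced one at a time as the parser reaches them.
def yaml_documents(io)
  Enumerator.new do |yielder|
    Psych.load_stream(io) { |doc| yielder << doc }
  end
end

stream = StringIO.new("--- first\n--- second\n")
yaml_documents(stream).each { |doc| puts doc.inspect }
```

The enumerator only reads forward from the IO it was given, so a second pass over a socket or an already-consumed stream would not see the same documents again. That is the caveat raised in the reply above.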

#2 [ruby-core:42442] Updated by trans (Thomas Sawyer) over 6 years ago

Yea, that would suffice. It would still be nice to have a more intuitive/convenient class method though.

What about a new method, process_stream or each_document, or something like that, to wrap that code? Oh wait... why not just keep load_documents method for this and that way it will remain backward compatible with Syck API?

#3 [ruby-core:42454] Updated by Anonymous over 6 years ago

On Thu, Feb 09, 2012 at 03:51:53AM +0900, Thomas Sawyer wrote:

Issue #5978 has been updated by Thomas Sawyer.

Yea, that would suffice. It would still be nice to have a more intuitive/convenient class method though.

What about a new method, process_stream or each_document, or something like that, to wrap that code? Oh wait... why not just keep load_documents method for this and that way it will remain backward compatible with Syck API?

Honestly, I think you're right about the load_stream method. I'll
just make it take a block and act the same as load_documents.

--
Aaron Patterson
http://tenderlovemaking.com/

#5 [ruby-core:43100] Updated by tenderlovemaking (Aaron Patterson) over 6 years ago

  • Assignee set to tenderlovemaking (Aaron Patterson)

#6 Updated by tenderlovemaking (Aaron Patterson) over 6 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r34953.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

