Bug #5978

YAML.load_stream should process documents as they are read

Added by Thomas Sawyer about 2 years ago. Updated about 2 years ago.

[ruby-core:42404]
Status:Closed
Priority:Normal
Assignee:Aaron Patterson
Category:-
Target version:2.0.0
ruby -v:ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]
Backport:

Description

Psych says YAML.load_documents is deprecated and says to use YAML.load_stream instead.

Looking at the implementation of load_stream(), it looks to me as if it waits for all documents in the stream to load before anything can be done with them.

# File 'lib/psych.rb', line 221

def self.load_stream yaml
  parse_stream(yaml).children.map { |child| child.to_ruby }
end

I don't think this should be the case. Ideally load_stream() would take a block, and if an IO object is given, read a document, yield it and then read the next document, and so on.

I imagine an Enumerator might be applicable to this as well.
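
To make the request concrete, here is a minimal sketch of the "read a document, yield it, then read the next" behaviour. The helper name each_yaml_document is hypothetical and not part of Psych, and the splitting is deliberately naive: it only looks for "---" separator lines and ignores "..." end markers and directives.

```ruby
require "yaml"
require "stringio"

# Hypothetical helper (not part of Psych): reads `io` line by line and
# yields each document as soon as the next "---" separator, or the end
# of the input, is reached.
def each_yaml_document(io)
  buffer = String.new
  io.each_line do |line|
    # A "---" line starts a new document, so flush the previous one first.
    if line.match?(/\A---(\s|\z)/) && !buffer.strip.empty?
      yield YAML.load(buffer)
      buffer = String.new
    end
    buffer << line
  end
  # Flush whatever is left once the input is exhausted.
  yield YAML.load(buffer) unless buffer.strip.empty?
end

io = StringIO.new("a: 1\n---\nb: 2\n")
each_yaml_document(io) do |doc|
  # The first document arrives before the rest of the IO has been read.
  puts "#{doc.inspect} (io at EOF: #{io.eof?})"
end
```

The point of the sketch is only the control flow: each document reaches the caller before the following one is read, which the current map-over-children implementation cannot do.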

noname (500 Bytes) Anonymous, 02/09/2012 02:23 AM

noname (500 Bytes) Anonymous, 02/09/2012 11:23 AM

Associated revisions

Revision 34953
Added by tenderlove about 2 years ago

  • ext/psych/lib/psych.rb (parse_stream, load_stream): if a block is
    given, documents will be yielded to the block as they are parsed.
    [Bug #5978]

  • ext/psych/lib/psych/handlers/document_stream.rb: add a handler that
    yields documents as they are parsed

  • test/psych/test_stream.rb: corresponding tests.
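
As a rough illustration of the change described in this revision, the block form of load_stream can be used as follows. The sample YAML and the variable names are made up for the example; it assumes a Ruby whose Psych includes this changeset.

```ruby
require "psych"
require "stringio"

# Three documents in a single stream, read from an IO object.
io = StringIO.new("--- one\n--- two\n--- three\n")

seen = []
# With a block, each document is yielded as it is parsed.
Psych.load_stream(io) { |doc| seen << doc }
puts seen.inspect

# Without a block, the documents come back as an array.
all_docs = Psych.load_stream("--- 1\n--- 2\n")
puts all_docs.inspect
```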

History

#1 Updated by Anonymous about 2 years ago

In reply to Thomas Sawyer's report of Wed, Feb 08, 2012 at 01:47:31AM +0900 (the Description above):

I'd rather not change load_stream, but I want this functionality as
well. What about something like this:

YAML::Reader.new(io).each do |doc|
...
end

Deserialized documents will be yielded as read. Does that seem
acceptable? I'm hesitant to make it enumerable though because if we're
truly doing stream processing, you couldn't iterate on the same object
twice (imagine reading YAML from a socket or something).

--
Aaron Patterson
http://tenderlovemaking.com/
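
The concern about enumerability can be sketched with an Enumerator wrapped around a one-shot IO. This is an illustration under assumptions, not the proposed YAML::Reader: it uses the block form of Psych.load_stream that this issue eventually produced, and a StringIO stands in for a socket.

```ruby
require "psych"
require "stringio"

# A StringIO stands in for a socket: once read, the data is gone.
io = StringIO.new("--- 1\n--- 2\n")

# Wrap streaming parsing in an Enumerator so it looks enumerable.
docs = Enumerator.new do |yielder|
  Psych.load_stream(io) { |doc| yielder << doc }
end

first_pass  = docs.to_a
second_pass = docs.to_a # the IO is already at EOF on this pass

puts "first pass:  #{first_pass.inspect}"
puts "second pass: #{second_pass.inspect}"
```

The second pass should find nothing left to parse, which is the surprise an Enumerable interface over a true stream would invite.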

#2 Updated by Thomas Sawyer about 2 years ago

Yeah, that would suffice. It would still be nice to have a more intuitive/convenient class method though.

What about a new method, process_stream or each_document, or something like that, to wrap that code? Oh wait... why not just keep the load_documents method for this? That way it will remain backward compatible with the Syck API.

#3 Updated by Anonymous about 2 years ago

In reply to Thomas Sawyer's update of Thu, Feb 09, 2012 at 03:51:53AM +0900 (comment #2 above):

Honestly, I think you're right about the load_stream method. I'll
just make it take a block and act the same as load_documents.

--
Aaron Patterson
http://tenderlovemaking.com/

#4 Updated by Thomas Sawyer about 2 years ago

Cool.

#5 Updated by Aaron Patterson about 2 years ago

  • Assignee set to Aaron Patterson

#6 Updated by Aaron Patterson about 2 years ago

  • Status changed from Open to Closed
  • % Done changed from 0 to 100

This issue was solved with changeset r34953.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.

