Bug #5978
closed
YAML.load_stream should process documents as they are read
Description
Psych says YAML.load_documents is deprecated and says to use YAML.load_stream instead.
Looking at the implementation of load_stream(), it looks to me as if it waits for all documents in the stream to load before anything can be done with them.
# File 'lib/psych.rb', line 221
def self.load_stream yaml
  parse_stream(yaml).children.map { |child| child.to_ruby }
end
I don't think this should be the case. Ideally load_stream() would take a block, and if an IO object is given, read a document, yield it, then read the next document, and so on.
I imagine an Enumerator might be applicable to this as well.
Updated by Anonymous almost 13 years ago
On Wed, Feb 08, 2012 at 01:47:31AM +0900, Thomas Sawyer wrote:
Issue #5978 has been reported by Thomas Sawyer.
Bug #5978: YAML.load_stream should process documents as they are read
https://bugs.ruby-lang.org/issues/5978
Author: Thomas Sawyer
Status: Open
Priority: Normal
Assignee:
Category:
Target version: 2.0.0
ruby -v: ruby 1.9.3p0 (2011-10-30 revision 33570) [x86_64-linux]
I'd rather not change load_stream, but I want this functionality as
well. What about something like this:
YAML::Reader.new(io).each do |doc|
  ...
end
Deserialized documents will be yielded as read. Does that seem
acceptable? I'm hesitant to make it enumerable though because if we're
truly doing stream processing, you couldn't iterate on the same object
twice (imagine reading YAML from a socket or something).
--
Aaron Patterson
http://tenderlovemaking.com/
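The reader idea proposed above can be sketched roughly as follows. This is only an illustration: DocumentReader is a hypothetical class (Psych does not ship a YAML::Reader), and the sketch assumes the block form of Psych.load_stream that the changeset at the end of this thread describes. It deliberately offers only each and does not include Enumerable, reflecting the concern that a stream can be consumed only once.

```ruby
require "psych"
require "stringio"

# Hypothetical class modelled on the YAML::Reader proposal in this thread.
# It is NOT part of Psych; the name and interface are assumptions.
class DocumentReader
  def initialize(io)
    @io = io
  end

  # Yields each deserialized document to the block as it is parsed.
  # Relies on Psych.load_stream accepting a block.
  def each(&block)
    raise ArgumentError, "block required" unless block
    Psych.load_stream(@io, &block)
    self
  end
end

io = StringIO.new("---\nname: first\n---\nname: second\n")
names = []
DocumentReader.new(io).each { |doc| names << doc["name"] }
```

Because the reader pulls from the IO as it goes, a second call to each on the same object would typically find the stream already consumed, which is the reason given above for not making it enumerable.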
Updated by trans (Thomas Sawyer) almost 13 years ago
Yea, that would suffice. It would still be nice to have a more intuitive/convenient class method though.
What about a new method, process_stream or each_document, or something like that, to wrap that code? Oh wait... why not just keep the load_documents method for this, so that it remains backward compatible with the Syck API?
Updated by Anonymous almost 13 years ago
On Thu, Feb 09, 2012 at 03:51:53AM +0900, Thomas Sawyer wrote:
Issue #5978 has been updated by Thomas Sawyer.
Honestly, I think you're right about the load_stream method. I'll
just make it take a block and act the same as load_documents.
--
Aaron Patterson
http://tenderlovemaking.com/
Updated by tenderlovemaking (Aaron Patterson) over 12 years ago
- Assignee set to tenderlovemaking (Aaron Patterson)
Updated by tenderlovemaking (Aaron Patterson) over 12 years ago
- Status changed from Open to Closed
- % Done changed from 0 to 100
This issue was solved with changeset r34953.
Thomas, thank you for reporting this issue.
Your contribution to Ruby is greatly appreciated.
May Ruby be with you.
- ext/psych/lib/psych.rb (parse_stream, load_stream): if a block is
  given, documents will be yielded to the block as they are parsed.
  [ruby-core:42404] [Bug #5978]
- ext/psych/lib/psych/handlers/document_stream.rb: add a handler that
  yields documents as they are parsed
- test/psych/test_stream.rb: corresponding tests.
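As a rough usage sketch of the behaviour this changeset describes, the two calling styles of load_stream can be compared side by side. The sample YAML and the variable names are made up for illustration; only Psych.load_stream itself comes from the changeset notes.

```ruby
require "psych"
require "stringio"

# Made-up sample input: three small YAML documents in one stream.
yaml = "---\nid: 1\n---\nid: 2\n---\nid: 3\n"

# Without a block: all documents are loaded first, then returned as an Array.
all_docs = Psych.load_stream(yaml)

# With a block: each document is yielded to the block as it is parsed,
# so an IO source does not have to be fully loaded before processing starts.
seen = []
Psych.load_stream(StringIO.new(yaml)) { |doc| seen << doc["id"] }
```

The block form is the one to reach for when the source is a large file or a socket, since each document can be handled and discarded before the next one is read.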