<eregon>
in Ruby, there might be more state from the environment captured, that needs to be carefully re-initialized
<enebo>
eregon: I did serialization at the IR level and object creation seemed to be the main cost
<eregon>
do you have that many objects?
<enebo>
eregon: AST level serialization may be a bit lighter but then we would still need to build the IR
<enebo>
more instrs than AST nodes for sure
<enebo>
in some cases a lot more
<chrisseaton>
at some point de-serialising a Ruby AST or IR isn't a whole lot different from parsing is it?
<chrisseaton>
I'm not sure it would be a huge win
<enebo>
chrisseaton: well, for things like repeated interning vs having a constant pool, which allows you to intern once per ident
<eregon>
For the AST nodes the cleaner approach is essentially to save the arguments needed to call the constructor. Then we can just call the constructor to build, instead of setting up all the funny state (profiles, specializations, etc.).
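(A minimal sketch of that constructor-argument idea, using a hypothetical NodeBlueprint holder; these are not actual JRuby+Truffle classes.)

    import java.util.List;
    import java.util.function.Function;

    // Hypothetical sketch: persist only the constructor arguments of a node and
    // rebuild by invoking the constructor, so profiles/specializations start fresh.
    final class NodeBlueprint<T> {
        private final Function<List<Object>, T> constructor; // e.g. args -> new IfNode(...)
        private final List<Object> args;                      // the saved constructor arguments

        NodeBlueprint(Function<List<Object>, T> constructor, List<Object> args) {
            this.constructor = constructor;
            this.args = args;
        }

        // Rebuilding calls the real constructor, which re-creates any internal
        // profiling state instead of that state being serialized and restored.
        T instantiate() {
            return constructor.apply(args);
        }
    }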
<enebo>
chrisseaton: not having to make two formats
<enebo>
chrisseaton: not having to check comments for pragmas
<GitHub164>
jruby/master b0b29cf Brandon Fish: [Truffle] Errno::EWOULDBLOCK as Errno::EAGAIN constant when int values equal
<enebo>
chrisseaton: I guess none of them are huge but they all have some cost
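(A rough sketch of the constant-pool point above; the class and layout are assumptions for illustration, not the actual IR persistence format.)

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Intern each distinct identifier once; instructions reference it by index.
    final class ConstantPool {
        private final List<String> pool = new ArrayList<>();
        private final Map<String, Integer> indices = new HashMap<>();

        // While writing: each distinct identifier is stored exactly once.
        int add(String ident) {
            return indices.computeIfAbsent(ident, k -> {
                pool.add(k.intern());   // intern once per distinct ident
                return pool.size() - 1;
            });
        }

        // While reading: no repeated String.intern() per use site.
        String get(int index) {
            return pool.get(index);
        }
    }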
<eregon>
we need to make the parser slower to be worth it :D
<chrisseaton>
Imagine if you had enough control to allocate your AST object graph in a region of memory that you knew you could re-load it back into
<enebo>
fwiw the 9k parser is beating the 1.7 parser; it warms up about the same but ends up faster
<eregon>
The Oz parser is written in Scala, so the startup/warmup is not so nice ...
<eregon>
chrisseaton: sounds like substrate ;)
<enebo>
one thing to improve would be to make some parser instances immortal per thread so tiny evals get a little bit faster
<enebo>
that is icky :)
<eregon>
mmh, but easy to keep bad state around I guess
<enebo>
eregon: the yyparse method constructs a big primitive array with an end index and grows data in that
<eregon>
chrisseaton: in Truffle it also skips translation entirely
<enebo>
eregon: passing that in would eliminate a fairly big allocation per parse
<enebo>
eregon: but it is in super micro opt territory
<enebo>
same for constructing a lexer and parser and parsersupport instance per parse
<eregon>
but allocations are fast, aren't they? At least for objects that are not just wasted.
<enebo>
eregon: remove some of these and stuff gets faster
<enebo>
for parsing a 5% win at this point is really huge
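(A hedged sketch of that reuse idea; the buffer type below is a made-up stand-in for the lexer/parser/ParserSupport trio and the grown yyparse arrays, not JRuby's actual parser classes.)

    // One reusable buffer per thread: reset() instead of reallocating per parse.
    final class ReusableParseBuffer {
        private int[] states = new int[1024]; // grows once, then sticks around
        private int top = -1;

        void reset() { top = -1; }

        void push(int state) {
            if (top + 1 == states.length) {
                states = java.util.Arrays.copyOf(states, states.length * 2);
            }
            states[++top] = state;
        }

        int pop() { return states[top--]; }

        // Tiny evals reuse the thread's buffer rather than paying the allocation.
        static final ThreadLocal<ReusableParseBuffer> PER_THREAD =
            ThreadLocal.withInitial(ReusableParseBuffer::new);
    }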
<enebo>
the chrisseaton mention of arena alloc for the AST would probably be one of those wins
<enebo>
but who wants to change the AST that much at this point :)
<enebo>
eregon: I should also say that allocation/free of objects are cheap but initialization of fields seems to make that less cheap
<eregon>
it should be just a few writes though
<enebo>
yep
<chrisseaton>
you're limited by having to touch all the memory - it's probably bus-bound
<chrisseaton>
the beauty of just mapping something is you don't need to touch the memory, or even load the pages
<enebo>
chrisseaton: yeah
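(A small sketch of the mapping idea; the path and file layout are assumptions. The point is that mapping copies nothing up front, so pages are only faulted in when something actually reads them.)

    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    final class MappedImage {
        // Memory-map a serialized blob; untouched regions never load their pages.
        static MappedByteBuffer map(Path file) throws IOException {
            try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
                // Nothing is copied here; the OS pages data in on first access.
                return ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            }
        }
    }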
<eregon>
I like the idea of having a small core Oz image sitting on my disk, like Smalltalk but without the drawbacks: you can just delete it if you like :)
<enebo>
eregon: you are old enough to have used 3.5” floppies?
<chrisseaton>
did you ever try deserialising lazily?
<enebo>
chrisseaton: all methods in IR serialization are lazily decoded
<enebo>
chrisseaton: most interesting is the stats from that
<enebo>
chrisseaton: 70-80% of all methods are never actually accessed in booting rails or running gem install
<enebo>
we did see a pretty big jump in load time with default settings
<enebo>
if we enabled --dev it was only like 5% faster even with that laziness
<enebo>
so either my deserializer sucked (which it might) or we just load a lot more data perhaps
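(A hedged sketch of lazy per-method decoding, with stand-in types rather than the actual JRuby IR reader: the 70-80% of methods that are never invoked never pay the decode cost.)

    import java.util.function.Function;

    final class LazyMethod<B> {
        private final byte[] encoded;                 // slice of the serialized file
        private final Function<byte[], B> decoder;    // e.g. bytes -> IR instructions
        private volatile B body;                      // decoded at most once

        LazyMethod(byte[] encoded, Function<byte[], B> decoder) {
            this.encoded = encoded;
            this.decoder = decoder;
        }

        // Decode on first call only; subsequent calls return the cached body.
        B body() {
            B b = body;
            if (b == null) {
                synchronized (this) {
                    if (body == null) {
                        body = decoder.apply(encoded);
                    }
                    b = body;
                }
            }
            return b;
        }
    }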
<chrisseaton>
I did want to write an Antlr grammar for Ruby and produce a new lazy parser from that, but not so keen now after using it for some other stuff
<enebo>
heh
<chrisseaton>
I feel like I could write a new, from scratch hand-written lexer and parser and it would be beautiful, but that is probably hubris
<eregon>
enebo: yeah :)
<enebo>
chrisseaton: yeah I think the bling antlr provides also comes at a price
<enebo>
chrisseaton: subbu|away and I were into the idea of Beaver which would barely be an incremental improvement over Jay
<GitHub123>
[jruby] chrisseaton pushed 1 new commit to truffle-head: https://git.io/vKVFd
<GitHub123>
jruby/truffle-head 5bad395 Chris Seaton: [Truffle] Exclude cext gc specs for now.
<enebo>
chrisseaton: but the guy stopped working on it and his alpha/beta broke some aspects of it
<chrisseaton>
Yeah, I think hand-written is the way to go
<enebo>
there were some other smaller problems with it like how you had to impl nodes
<chrisseaton>
so much stuff in the current parser and lexer happens outside the tool's normal behaviour
<enebo>
but those were pretty hackable if I wanted to support a fork
<enebo>
chrisseaton: I think de_ who maintains parser is really gung ho on fixing the weird state toggling in the lexer/parser
<enebo>
the 'parser' rubygem, I mean
<eregon>
one dark truth is some stuff is actually much faster to validate later than at parse time
<enebo>
I wondered if a threaded radix tree could perform faster in looking up an intern’d string than interning all idents
<enebo>
ln(n) bit compares to get down to a leaf
<enebo>
otherwise pay the cost and have to make a node or more
<enebo>
most programs do not have a lot of identifiers
<eregon>
String intern must have some kind of hashtable, so it's about literal nodes?
<enebo>
and all keywords could be in the starting tree
<eregon>
ah, for lexing?
<enebo>
eregon: yeah only for lexing
<enebo>
the data structure I am thinking about can actually be less than ln(n) if the end of the string deviates enough
<eregon>
I know some parser combinator libs used some prefix tree for keywords indeed
<enebo>
although I guess I cannot do that in this case because I need to know it is really the same ident
<eregon>
(expected failure, just updated specs)
<enebo>
that only works when you know it will be an ident in the tree already
<enebo>
which is a rare case (I learned about this >20 years ago in college in my data structures class)
<enebo>
eregon: we use a hash and then on a miss we intern
<enebo>
eregon: so in those cases where it might be a keyword we hash twice
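(A rough sketch of the prefix-tree idea for keywords; the structure is illustrative, not the JRuby lexer. Seeding it with the keywords up front would let the lexer walk character by character and only fall back to the hash-then-intern path on a miss.)

    import java.util.HashMap;
    import java.util.Map;

    final class KeywordTrie {
        private final Map<Character, KeywordTrie> children = new HashMap<>();
        private String keyword; // non-null only at a terminal node

        void add(String word) {
            KeywordTrie node = this;
            for (char c : word.toCharArray()) {
                node = node.children.computeIfAbsent(c, k -> new KeywordTrie());
            }
            node.keyword = word;
        }

        // Returns the keyword, or null if this ident is not a keyword and
        // must go through the normal hash/intern lookup instead.
        String lookup(CharSequence ident) {
            KeywordTrie node = this;
            for (int i = 0; i < ident.length(); i++) {
                node = node.children.get(ident.charAt(i));
                if (node == null) return null;
            }
            return node.keyword;
        }
    }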
<chrisseaton>
I feel like with all parser tools the problem I have to solve is always 'how do I express this perfectly natural piece of syntax in a twisted enough way that this tool will accept it'
<chrisseaton>
I never feel like it's helping me; always I'm having to submit to its twisted idea about how languages should be done, which doesn't help me if the language is set in stone
<chrisseaton>
error handling is the worst case for this
<enebo>
chrisseaton: I always feel lost after a period of time where I forget all the rules of making a grammar in something like LALR
<chrisseaton>
I need to report x, y and z for Ruby errors, and Antlr gives me a, b and c and fuck me if that's not enough
<enebo>
chrisseaton: yeah and LALR has the worst reporting
<enebo>
chrisseaton: yeah that is why antlr would be challenging
<enebo>
chrisseaton: MRI more or less just barfs out whatever depth-first node it reached as the error
<enebo>
or wherever it gets to
<enebo>
on top of that I find that grammar pretty tough to follow
<chrisseaton>
oh and Antlr works on CharSequences, so that's a problem immediately
<chrisseaton>
I was stuffing bytes into chars last time I tried
<enebo>
chrisseaton: yeah same problem with Beaver from what I recall
<enebo>
chrisseaton: although that library is really small
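(For the bytes-into-chars workaround mentioned above, a minimal sketch of exposing a byte[] as a CharSequence, one char per byte, effectively ISO-8859-1; this is an assumption about the approach, not something either tool ships.)

    final class ByteCharSequence implements CharSequence {
        private final byte[] bytes;
        private final int start, end;

        ByteCharSequence(byte[] bytes) { this(bytes, 0, bytes.length); }

        private ByteCharSequence(byte[] bytes, int start, int end) {
            this.bytes = bytes;
            this.start = start;
            this.end = end;
        }

        public int length() { return end - start; }

        // Map each byte 1:1 to a char; lossy for multi-byte encodings.
        public char charAt(int index) { return (char) (bytes[start + index] & 0xFF); }

        public CharSequence subSequence(int from, int to) {
            return new ByteCharSequence(bytes, start + from, start + to);
        }

        public String toString() {
            return new String(bytes, start, end - start,
                java.nio.charset.StandardCharsets.ISO_8859_1);
        }
    }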