TheWhip has quit [Remote host closed the connection]
thedarkone2 has joined #jruby
<chrisseaton>
is enebo here?
<enebo>
chrisseaton: yes
<chrisseaton>
what is your thinking about redistributing jars? For example JRuby ships the psych jar, which is Apache licensed and requires the licence to be included, but I don't think we do that
<enebo>
chrisseaton: as gems I do not think a lot about it but the gems should include licensing
<enebo>
chrisseaton: I am not sure if we should at project root include that info or not
<enebo>
chrisseaton: although maybe that is a requirement?
<chrisseaton>
For gems like psych you sometimes unpack them, and you end up with just the jar sitting in your lib directory
<enebo>
chrisseaton: yeah as a default gem I guess that is true
<enebo>
chrisseaton: as an outsider it is not clear where it came from
<enebo>
chrisseaton: oh ick…so jline shades jansi and I guess jansi shades hawtjni
<enebo>
chrisseaton: I have already fielded several version/license questions for whatever this is about and found when we dual source (as in pom.xml + LICENSE.md + src) we have lost the plot in a few places
<enebo>
chrisseaton: so it is probably fine to include something for Apache bundled libs but we have to remember to change it if we ever stop using it
TheWhip has joined #jruby
TheWhip has quit [Remote host closed the connection]
<headius>
I know there's a lot of code in that PR but any review you can provide would be great
<headius>
same goes for anyone else
<enebo>
PACKED
<headius>
the gains are not enormous but tangible...100KB off a blank rails app footprint, potentially more like a few MB of allocation reduced overall for boot+request+error
<headius>
I suspect the biggest wins for this will be large data structures that use many small arrays (think two-tuples for a binary tree) and for all our transient RubyArray-packing of single block arguments
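The packing idea headius describes can be sketched in plain Ruby (a toy illustration only; JRuby's actual RubyArray specialization is Java-internal, and the class and field names here are invented): keep one or two elements in plain fields and only allocate a backing store when the array grows past that.

```ruby
# Toy sketch of a "packed" two-element array: elements live in plain
# fields until a third push forces a spill to a real backing array.
class PackedPair
  def initialize
    @a = nil
    @b = nil
    @size = 0
    @spilled = nil
  end

  def push(x)
    case @size
    when 0 then @a = x
    when 1 then @b = x
    else (@spilled ||= [@a, @b]) << x # unpack into a real store
    end
    @size += 1
    self
  end

  def [](i)
    return nil if i >= @size
    return @spilled[i] if @spilled
    i.zero? ? @a : @b
  end

  def size
    @size
  end
end
```

The win is that the common case (a two-tuple, or a single block argument wrapped in an array) never allocates a separate element store at all.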
<headius>
chrisseaton: have you ever gathered data on the various libraries and benchmarks you run to see how well Truffle's object representation is doing?
<chrisseaton>
well we don't have anything to compare it against
<headius>
specifically, I'd like to find some code that makes heavier use of small arrays and see whether expanding this packing to 3-4 elts is worth it
<chrisseaton>
you mean like compare it to a hash map or something?
<headius>
I mean seeing how much packing only small arrays ends up being worth in an array-heavy library
<headius>
analysis of a Rails app showed a rapid drop-off in number of arrays as the size increases beyond 3 or 4
<headius>
but there's not an enormous number of arrays in there to begin with
<headius>
(blank app)
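The kind of array-size histogram headius mentions can be reproduced as a quick one-off with CRuby's ObjectSpace (an illustration, not his actual analysis tooling):

```ruby
# Histogram of live Array sizes in the current process: walk the heap
# and count arrays by length, then print the smallest sizes.
histogram = Hash.new(0)
ObjectSpace.each_object(Array) { |a| histogram[a.size] += 1 }
histogram.sort.first(8).each { |size, count| puts "#{size}: #{count}" }
```

Running something like this inside a booted app is enough to see whether small arrays dominate before deciding which sizes are worth packing.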
<headius>
the next project I want to look at will either be an all-primitive Array or small packed forms of Hash for just a few elements
<headius>
packed objects seem to be working well...they could potentially have primitive forms as well
<headius>
base blank rails app boots up in around 50MB of heap
<chrisseaton>
Maybe run my acid test with the three-element arrays in it?
<chrisseaton>
That's designed to exercise a compact representation of small arrays
<headius>
I don't pack three-elt arrays because they appeared to be much rarer than one and two...but it is on my mind
<headius>
it wouldn't be a tremendous amount of work to add additional packing sizes
<enebo>
chrisseaton: do you have any tools for measuring memory?
<enebo>
chrisseaton: Last time I tried I just looked at the process's RSS
<chrisseaton>
Yes we measure total allocation and minimum heap size for running Truffle
<enebo>
chrisseaton: so there is an API for asking for heap
<headius>
minimum heap size = maximum heap allocation yeah?
<enebo>
chrisseaton: and all code is in heap too or is native code generated not part of that?
<chrisseaton>
We bisect heap sizes until the program can't run any more
<headius>
or maximum heap occupation
<chrisseaton>
No, the smallest heap we can function in
<headius>
right
<headius>
same thing then
<chrisseaton>
No!
<headius>
Yes!
<enebo>
heh
<chrisseaton>
Because if I give you 10 GB of heap, and you use it for caches, then that's not a minimum is it
<headius>
the minimum heap you can run in is bounded by the maximum number of live objects you'll need in the heap
<chrisseaton>
It's what it says on the tin - the smallest heap you can run it in
<headius>
the maximum number to run properly
<enebo>
chrisseaton: so you tally up live objects and that is minimum
<enebo>
?
<chrisseaton>
No, it's the smallest -Xmx value we can run in without crashing
<enebo>
chrisseaton: ah I see. I have not tried that
<chrisseaton>
You can't just measure RSS, because if you give Java extra memory it may find something useful to do with that, but that doesn't mean it needed that space
<enebo>
chrisseaton: no I figure RSS is worst-possible way which was why I was asking
<chrisseaton>
No that's the mistake - it isn't worst possible
<chrisseaton>
Because Java will fill up caches if it has spare space
<headius>
I'm just drawing a distinction with heap size after a memory-heavy boot cycle...that blank Rails app ends up at 50MB of occupied heap, but the heap grows to several hundred MB while booting
<enebo>
chrisseaton: yeah but I am interested in minimum not what you happened to allocate
<headius>
so it's the maximum amount of required heap occupation that gives you a minimum heap size for running
<enebo>
chrisseaton: so RSS tells me nothing
<chrisseaton>
This problem is so complex - and really we need to measure performance at the same time, because it'll be slower with less memory, faster with more, so you can't separate the two
<chrisseaton>
So I tried to think about what practically matters - and people want to know if they can run in their DO droplet - and this answers that question
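The bisection chrisseaton describes (shrink -Xmx until the program can no longer run) can be sketched as a small binary search. This is a hedged illustration, not jt.rb's implementation; the bounds and the commented-out command are invented:

```ruby
# Find the minimum heap (in MB) a program can run in, via binary search.
# `runs` is any callable returning true if the program completes in the
# given heap size and false if it crashes (e.g. with an OOM error).
def min_heap(runs, lo_mb = 1, hi_mb = 1024)
  while lo_mb < hi_mb
    mid = (lo_mb + hi_mb) / 2
    if runs.call(mid)
      hi_mb = mid      # ran fine, try a smaller heap
    else
      lo_mb = mid + 1  # crashed, need more
    end
  end
  lo_mb
end

# Against a real JRuby install you might plug in something like:
#   runs = ->(mb) { system("jruby -J-Xmx#{mb}m -e 14", out: File::NULL) }
```

The assumption baked into bisection is that success is monotonic in heap size, which holds well enough in practice for a GC'd runtime.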
<headius>
what tool are you using for that measurement now?
<chrisseaton>
tools/jt.rb metrics minheap and alloc - you can run it for classic with -Xclassic
subbu is now known as subbu|away
<chrisseaton>
Let me put together an example
<headius>
RSS includes a lot more than heap though
<enebo>
chrisseaton: yeah my way is pretty horrible but I just eyeball it in VisualVM by jamming on the GC button once whatever I am measuring runs long enough to hit a stable state :)
<headius>
you can size the heap all you want but still need a lot more than heap to run
<headius>
GC'ed heap I mean
<headius>
are you talking GC heap or total process heap?
<enebo>
who?
<headius>
when you say "minimum heap"
<enebo>
oh chris
<enebo>
heh
<chrisseaton>
It's literally -Xmx
<headius>
I've seen plenty of JVM apps that consumed several hundred MB more than Xmx
<chrisseaton>
Oh right, metaspace etc
<headius>
there's a lot of invisible native memory used by JVM and JNI
<chrisseaton>
Yeah we aren't measuring that
<enebo>
I was able to run empty rails app on raspi so I think our heap can go quite low
<headius>
so you need to know RSS minimum more than Xmx minimum
<headius>
well not more than...in addition to
<chrisseaton>
I'm mainly looking for changes at the moment, so not looking at the absolute yet
<enebo>
chrisseaton: headius: are there tools for measuring native generated code?
<headius>
nice...your tool should be useful for classic as well
<headius>
code?
<chrisseaton>
I think the best thing to do would be to literally have a server with 512 MB, 1 GB, 10 GB of RAM, and run benchmarks on those
<headius>
you can query how much of the code cache is being used
<chrisseaton>
Takes a few minutes
<enebo>
headius: that is not in Java heap
<headius>
I don't know what the default max is right now
<chrisseaton>
This tells me JRuby classic runs -e 14 in a 7 MB heap
<headius>
code cache holds the native code plus some metadata
<chrisseaton>
Which is pleasantly surprising
<headius>
chrisseaton: very....people wouldn't give us such a hard time if the JVM didn't so aggressively increase the heap
<headius>
I suppose that's tunable but tuning anything on the JVM is a black hole
<chrisseaton>
Truffle is heavily disadvantaged for min heap actually - because the compiler is also running in the user heap in our case
<headius>
ahh, good point
<enebo>
ah
<headius>
7MB boot for jruby 9k is pretty good
<headius>
considering the increase in size from IR
<chrisseaton>
You can run some scripts by just changing -e 14 for whatever command line you want
<enebo>
chrisseaton: can you measure 1.7 with jt?
<chrisseaton>
And add options there etc
<headius>
yeah cool, thanks
<headius>
is this stuff on wiki somewhere?
<chrisseaton>
Maybe with the RUBY_BIN option - not sure
<chrisseaton>
No, but jt.rb --help says a lot
<headius>
might be good to throw up a few sentences on what we are all using to improve things
<enebo>
chrisseaton: maybe just swap jruby.jar with 1.7 one
<headius>
enebo: jruby.home stuff will not work well
<chrisseaton>
There is also metrics time, and metrics alloc
<headius>
it won't find RG to boot etc
<enebo>
-e 14
<chrisseaton>
I find alloc often goes up without affecting min heap, so I don't worry about it too much
<headius>
-e 14 still loads RG
<enebo>
headius: does 1.9 load rg by default?
<headius>
yes
<enebo>
headius: heh
<enebo>
time stops for no man
<headius>
hah
<headius>
chrisseaton: other things that would be interesting to figure out are what the highest memory consumers are for live heap in a big app and what the heaviest allocated objects are for a running app
<headius>
I do that as one-off analysis on various things but we don't have anything standard
<enebo>
I think minimum heap + code cache would be an interesting comparison too
<headius>
and instrumenting allocation is so expensive it makes all but the most trivial scripts unusably slow
<enebo>
assuming that can be measured with graal
<headius>
hmmm
<enebo>
PE seems like it might generate a lot of code
<chrisseaton>
Code cache is just that though - cache - so it fills up available space, and could run with less
<headius>
we might be able to do cheaper allocation profiling if we added the Ruby alloc hooks and ran interpreted...we can generate an interpreted stack trace *way* faster than JVM can generate a compiled one
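For comparison, CRuby already exposes Ruby-level allocation hooks in its objspace extension; a minimal illustration of the kind of per-site tracking headius is describing (this is CRuby's API, not the hypothetical JRuby hooks):

```ruby
# Trace where objects are allocated using CRuby's objspace extension.
require 'objspace'

arrays = nil
ObjectSpace.trace_object_allocations do
  arrays = Array.new(10) { [1, 2] } # allocations inside the block are traced
end

sample = arrays.first
puts "allocated at #{ObjectSpace.allocation_sourcefile(sample)}:" \
     "#{ObjectSpace.allocation_sourceline(sample)}"
```

The same file/line bookkeeping done inside an interpreter is far cheaper than asking the JVM to materialize a full compiled-frame stack trace per allocation.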
<chrisseaton>
Well the idea of PE is that it generates very little code :) but it generates big data structures along the way yeah
<enebo>
chrisseaton: ah yeah I guess it would wouldn’t it :) I did not think that through
prasunanand has quit [Ping timeout: 260 seconds]
<enebo>
chrisseaton: so then that will generally all be in heap as well
<headius>
well, I was under the impression that there would be many fully-PE'ed paths live at once
<headius>
for different shapes of code and data
<headius>
good for perf but doubling a lot of code
<chrisseaton>
Yes I suppose it generates multiple copies, but only when it thinks it's worth it
<headius>
sure
<chrisseaton>
We're adding processor cycle estimates into the model now, even
<enebo>
chrisseaton: thanks this is helping to fill in some gaps for me
<headius>
yeah, our inlining is based on safepoint counts so that's kinda sorta getting to cycle estimation
<headius>
hard to know what's happening between all those safepoints but they happen on a fairly regular cadence
<headius>
I suppose I should say JRuby safepoints rather than JVM safepoints
<chrisseaton>
did I mention Brandon Fish is now working part time for us?
<headius>
yeah I heard, very cool
<headius>
how many FTE now?
<chrisseaton>
4.5
<enebo>
chrisseaton: yeah I saw that
<headius>
you're still .5 or is that eregon?
<chrisseaton>
Brandon is half time
<headius>
ahh ok
<headius>
well cool
yfeldblum has quit [Ping timeout: 250 seconds]
<headius>
I think we've gotten enough kinks worked out of 9k that we can start being more aggressive on opto
<chrisseaton>
This is open source at its best - literally saying to an open source collaborator keep doing what you're doing and we'll just start paying you for your time
<headius>
packing data is part of that
<headius>
I want to do a pass over things to try to remove extraneous call frames from our interpreter and call path also
<headius>
chrisseaton: any idea of stack use in JT?
<headius>
AST interpreter can really bloat that up, but ideally it reduces soon after
<chrisseaton>
No, but I'm guessing it might be a tad high in Truffle
<headius>
so there's going to be a high early maximum for stack
<headius>
that fades
<enebo>
If we can make InterpreterEngine be Interpreter we can kill that whole frame but Interpreter exists for the backtrace detection mostly
<enebo>
and there are different execution paths into the engines
<enebo>
like eval vs no
<enebo>
headius: I am not sure if IR uses less or more stack now
<enebo>
headius: with indy I know we also have some hidden frames
<enebo>
It seems it should be shorter
<enebo>
block is pretty bad atm though
<headius>
it uses more for very simple methods and significantly less for complex ones, I would guess
<headius>
we have a lot of useless frames leading into interpreter that would have been maybe one or two for a simple AST method
skade has joined #jruby
<headius>
like each Ruby frame in interpreter equates to a half dozen JVM frames or more
<enebo>
we did still have INTERPRET_METHOD in 1.7 right?
<headius>
we did, that's one hop
<enebo>
yeah so same number for that still
<enebo>
one for engine
<headius>
that could be improved too...I went with a very isolated and unique name to reduce problems calculating frames
<enebo>
one to call sub method for callOperation
<headius>
right
<headius>
switch splitting
<headius>
Instr.execute
<enebo>
I guess we could reduce interp perf for less stack
<headius>
especially in simple
<enebo>
startup yeah
<headius>
as long as we don't cross JIT size boundary
<enebo>
heh famous last words
<headius>
:-D
<headius>
other problem is that most of these splits won't inline, or at least not until much later
<enebo>
if we did not specialize interp for operands in startup it could be quite a bit smaller for call
<enebo>
which we could do
<headius>
so passing a half-dozen params through a half-dozen frames equals a lot of wasted frame space
<enebo>
startup would likely not slow down
<enebo>
well those were empirically averaged out to be manual inline decisions
<headius>
yeah
<headius>
I understand
<headius>
we have a lot of overloads leading up to interp too
<enebo>
kind of a bummer
<headius>
we could collapse some of those
<enebo>
I really really thought mega call site would be it
<headius>
it's really sad that idiomatic Java is so frequently the worst pattern for JIT
<enebo>
of course the other issue if we needed to decouple operand decoding from the instr
<headius>
overloads, delegation, interfaces
<headius>
varargs
<enebo>
then we could switch on arity but mega on the actual meat
<headius>
hmm yeah
<enebo>
not decoding all operands in all instrs would reduce code significantly
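A toy model of what enebo is suggesting (invented instruction names, nothing like IR's actual classes): keep the dispatch switch itself small and let each instruction decode only the operands it actually needs, rather than decoding every operand in a shared prelude.

```ruby
# Minimal interpreter loop: the case dispatch stays tiny, and operand
# decoding happens only inside the arms that need operands at all.
def run(instrs)
  stack = []
  instrs.each do |op, *operands|
    case op
    when :load then stack.push(operands.first) # only :load touches operands
    when :add  then stack.push(stack.pop + stack.pop)
    when :mul  then stack.push(stack.pop * stack.pop)
    end
  end
  stack.pop
end

puts run([[:load, 2], [:load, 3], [:add], [:load, 4], [:mul]]) # prints 20
```

Folding the decode into the instruction body is the part that shrinks the generic dispatch code, which is what matters for staying under JIT inlining size limits.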
<headius>
stupid hotspot and its lack of type specialization
<enebo>
multimethods
<enebo>
where is the JEP!
<headius>
heh, author one
<headius>
I already checked that off my bucket list
<headius>
chrisseaton: hey is there still an svm javascript binary out there somewhere? I can't find it
<kares>
enebo: hey! not really - did not try 9 yet ... anything interesting?
<enebo>
kares: I got an setAccessible error right away
<kares>
najs!
<enebo>
kares: I think jigsaw changes might end up not allowing some of that
<enebo>
perhaps it is just some policy file we need to set up
<enebo>
but since we do it to any code we load it might end up being more of an issue
<kares>
interesting
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<headius>
kares: hey if you get a chance you might look over the packed_arrays PR too
<headius>
might be easier to look at it as one big diff...there's a lot of commits
<chrisseaton>
headius: no
<chrisseaton>
Hopefully SVM builds of JRuby in the next few months
yfeldblum has joined #jruby
pawnbox_ has quit [Remote host closed the connection]
<headius>
chrisseaton: boo
<headius>
I never got a chance to play with it
<headius>
svm builds of JT would be neat though