<headius[m]>
Good morning
<headius[m]>
lopex: hey last chance to take over docker... we need to get that done
<headius[m]>
klobuczek: any updates on your end?
<headius[m]>
I just ran a quick benchmark measuring java_method and did not see a substantial slowdown yet
<headius[m]>
oddly enough it's a bit faster
<lopex>
headius[m]: yep
<headius[m]>
I have a patch that splits up java_method searching by name, so for non-overloaded cases it will just be the cost of a stream-based filter and for overloaded cases it will be that plus a linear search of all overloads
<headius[m]>
remaining overhead at this point appears to be mostly the cost of creating the Ruby Method and the internal invoker for the method
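(A rough sketch of the lookup split described above, for illustration only; the `JavaMethodDescriptor` type and field names are placeholders, not the actual JRuby internals.)

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class MethodLookup {
    // Illustrative stand-in for JRuby's internal per-method metadata.
    record JavaMethodDescriptor(String name, Class<?>[] argTypes) {}

    static JavaMethodDescriptor find(List<JavaMethodDescriptor> all,
                                     String name, Class<?>[] argTypes) {
        // Narrow by name first with a stream-based filter...
        List<JavaMethodDescriptor> byName = all.stream()
                .filter(m -> m.name().equals(name))
                .collect(Collectors.toList());

        // ...non-overloaded methods pay only the filter; overloaded
        // names fall through to a linear search of their overloads.
        if (byName.size() == 1) return byName.get(0);
        for (JavaMethodDescriptor m : byName) {
            if (Arrays.equals(m.argTypes(), argTypes)) return m;
        }
        return null;
    }
}
```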
<headius[m]>
enebo: you around today?
<headius[m]>
klobuczek: so good news and bad news
<headius[m]>
finally back at work and got around to benchmarking java_method
<headius[m]>
the good news is that it seems faster in 9.2.10... which is sort of bad news for us finding your perf regression
<headius[m]>
I've made additional changes to make it even faster but this is not really helping our current conundrum
<headius[m]>
I could go further by more aggressively caching the resulting method object, making subsequent lookups cost only as much as building a string key based on name + arg types
<headius[m]>
but again that is just icing on the cake...java_method does not appear to be slower in 9.2.10
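(The further caching mentioned here might look roughly like the sketch below; the cache shape and key format are assumptions for illustration, not the actual patch.)

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class JavaMethodCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Subsequent lookups only pay for building the string key from
    // the method name plus the argument class names; misses fall back
    // to the (slower) full search.
    Object lookup(String name, Class<?>[] argTypes,
                  Function<String, Object> slowSearch) {
        StringBuilder key = new StringBuilder(name);
        for (Class<?> t : argTypes) key.append(';').append(t.getName());
        return cache.computeIfAbsent(key.toString(), slowSearch);
    }
}
```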
<klobuczek[m]>
@headius I didn't get a chance to successfully get through the git bisect
<headius[m]>
what can I do to help you
<klobuczek[m]>
what's the simplest thing we can do to narrow down the source of the problem?
<klobuczek[m]>
in limited time
<headius[m]>
well kares' suggestion to get timings of your specs is a good one, because that would reduce how much code you have to run to reproduce it
<headius[m]>
obviously it's not going to scale well trying to bisect with an 80-minute test suite
<klobuczek[m]>
I'm doing that as we speak
<headius[m]>
perfect I will stand by then
<klobuczek[m]>
so I got down to 1 spec (albeit with a lot of setup) with 3.2 sec vs 5.88 sec
<klobuczek[m]>
what do you suggest next?
<klobuczek[m]>
I could retry the bisect
<headius[m]>
well that sounds like a good candidate
<headius[m]>
bisect is probably best since that's pretty short
<headius[m]>
so that's the main source of slowdown? You mentioned some extreme increase in test time
<headius[m]>
like 15 min to 80 min or something
<klobuczek[m]>
yes, for the entire suite of ~2K specs
<klobuczek[m]>
5.88 sec is the slowest one out of the 2K
<klobuczek[m]>
and it only shows a factor of ~2 slowdown, but that might be good enough
<headius[m]>
previous commit returns that spec to original speed?
<klobuczek[m]>
yes
<headius[m]>
this is only used when constructing backtraces for `caller`
<klobuczek[m]>
what is the previous commit you want me to confirm?
<enebo[m]>
my browser crashed and I lost tabs including this one
<headius[m]>
klobuczek: whatever commit is before that one you bisected to
<headius[m]>
I am looking at whether `count` passed in there might be unreasonably large in some places
<enebo[m]>
a raw arraylist will be set to a 0-len backing list vs actually alloc'ing one
<enebo[m]>
but we will never not populate 0 into caller
<enebo[m]>
and this is to make a backtrace which is already expensive
<headius[m]>
yeah
<headius[m]>
so any overhead added here should be dwarfed by the backtrace cost in the first place
<enebo[m]>
I believe 15ef947ac423baf5c68c5b2a4f5a58e668eb7db7 is commit before it
<headius[m]>
klobuczek is on Java 11 though so it will be using StackWalker, which is an interesting wrinkle
<klobuczek[m]>
ok, I just wanted to ask, because it's not easy to find out from bisect what's after or before
<enebo[m]>
I did 'git co ed72b695584909e3db384732c85643dcafbe337d^'
<enebo[m]>
If I know my git ^ magic, which it's possible I don't :)
<headius[m]>
don't see why it would have a big impact unless someone's passing a huge length to `caller`
<klobuczek[m]>
15ef947ac423baf5c68c5b2a4f5a58e668eb7db7 is good
<headius[m]>
and yeah `git log <hash>^` should show you parent
<enebo[m]>
15ef947ac423baf5c68c5b2a4f5a58e668eb7db7 seems like it is more interesting for a change...it passes true/false on backtrace and in paths which supposedly have one it will then call a helper to populate it vs potentially doing nothing if there is none
<headius[m]>
klobuczek: wow
<enebo[m]>
yeah I am surprised :)
<headius[m]>
klobuczek: ok on either commit run that spec passing this to JRuby: `-Xlog.callers`
<headius[m]>
I hope it's not slower because of --profile
<enebo[m]>
will that spec run without --profile?
<headius[m]>
I assume --profile is for timing specs
<headius[m]>
that's not jruby --profile
<klobuczek[m]>
yes, I think not much difference in time
<headius[m]>
ok
<enebo[m]>
klobuczek: if you are sure that is good to know
<klobuczek[m]>
running without now
<headius[m]>
I'm just wondering if --profile is making `caller` calls for anything... but if it's one spec I wouldn't expect to see much overhead from that
<klobuczek[m]>
with the -Xlog flag I now got into the bad time range even on the good one
<headius[m]>
the log flag is just for our information, not timing
<headius[m]>
I just want to see where those backtraces are coming from
<enebo[m]>
I don't see a mistake in that commit but I really wonder if somehow this is constructing a MAX_VALUE sized backtrace
<enebo[m]>
That would totally explain it
<enebo[m]>
:)
<headius[m]>
enebo: yeah it filters MAX_VALUE but does not filter unreasonably large sizes smaller than that
<enebo[m]>
it's just that it passed it around
<headius[m]>
of course if it were constructing a MAX_VALUE array it would consume something like 2B * 8 bytes (~16GB)
<headius[m]>
seems unlikely it's that large
<enebo[m]>
we will laugh when we find that JVM 11 bug which improperly passes Integer.MAX_VALUE as a param
<headius[m]>
not sure what you mean though... this won't construct a trace that large given this code
<headius[m]>
that line was there before... it's just a way to say "all frames"
<enebo[m]>
I did not see a trace so I don't know what code is being called
<headius[m]>
-1 might be safer but it also was used as an upper bound before, not used for size calc
<enebo[m]>
yeah I am just saying if the count == Integer.MAX_VALUE check fails for some reason it would alloc MAX_VALUE
<enebo[m]>
It shouldn't
<headius[m]>
I don't think it would ever fail to detect if count is MAX_VALUE unless the JVM is broken
<headius[m]>
but MAX_VALUE - 1 would get through
<enebo[m]>
Just thinking about how changing alloc size on something making a backtrace could possibly overshadow even the backtrace much less anything else
<headius[m]>
it would have to be pretty big to overshadow, yes
<enebo[m]>
yeah which is how I got there
<headius[m]>
klobuczek: other thing you could do would be to get a timing of the FULL suite with that previous commit
<headius[m]>
if this is really the problem then you should see times go back to normal
<headius[m]>
or git revert <that hash> on jruby-9.2 branch and run from that
<enebo[m]>
while (stackIter.hasNext() && trace.size() < count) {
<enebo[m]>
No problem there unless somehow stackIter.hasNext is not working as expected
<headius[m]>
caller and caller_locations without size specified don't use MAX_VALUE, they use this
<headius[m]>
well I guess we found the problem
<enebo[m]>
yeah in my mind an excessively large arraylist was the only thing which would make sense
<enebo[m]>
it is processing a backtrace after all...the thing we do the slowest in our impl
<headius[m]>
yeah
<headius[m]>
probably could have had klobuczek run an alloc profile
<headius[m]>
rather than revert that commit I think we should choose something suitable as a max initial size
<headius[m]>
that will avoid having many small arraycopies for an unbounded backtrace but not allocate something huge
<enebo[m]>
This opt really should have almost no value based on how arraylist is impl'd
<enebo[m]>
the 16 part of it at least
<enebo[m]>
err 10
<headius[m]>
maybe simply Math.min(count, 100) as the initial size
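(Roughly what that bound would look like around the loop quoted above; the cap of 100 and the names here are only illustrative.)

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class TraceBuilder {
    static <T> List<T> collectFrames(Iterator<T> stackIter, int count) {
        // Cap the initial capacity so an "unbounded" request
        // (count == Integer.MAX_VALUE) doesn't pre-allocate a huge array,
        // while typical traces still avoid repeated grow-and-copy.
        List<T> trace = new ArrayList<>(Math.min(count, 100));
        while (stackIter.hasNext() && trace.size() < count) {
            trace.add(stackIter.next());
        }
        return trace;
    }
}
```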
<enebo[m]>
whatever
<headius[m]>
100 is negotiable
<enebo[m]>
for a large pre-known size it can be, but you may not end up with that many so I guess this is the tradeoff
<headius[m]>
right
<enebo[m]>
a bad caller kills stuff dead
<headius[m]>
100 may be too much since this is Ruby calls
<headius[m]>
I dunno maybe I'm coming back around to just reverting it
<enebo[m]>
I wonder what the shortest stack even is
<enebo[m]>
it may be >100 :)
<headius[m]>
ruby calls
<headius[m]>
so shortest stack will be 1
<enebo[m]>
2 for main perhaps yeah
<headius[m]>
that's one reason this is a weird unbounded thing
<headius[m]>
I don't know how many frames out of the Java trace will produce a Ruby frame
<headius[m]>
so it just consumes Java frames until meeting the count of Ruby frames requested
<enebo[m]>
I have no opinion on this fix
<headius[m]>
100 elements will mean unbounded calls always allocate 800 bytes for the array
<enebo[m]>
Large stacks are improved by passing in a big value
<enebo[m]>
At some smaller levels we probably cannot see any difference doubling our way to 10 ruby calls or what not
<headius[m]>
800ish, maybe 400 with compressed oops but I don't know how that works
<headius[m]>
800 bytes versus pretty much any full caller trace seems like less than noise
<enebo[m]>
something big we can but I just don't know...
<enebo[m]>
yeah this is heuristic land and I don't think we can just make a guess. I think we should revert for .11 and then maybe kares might want to re-heuristic this
<enebo[m]>
Especially since this is time in generating a backtrace which is already pretty slow
<headius[m]>
I was not quite sure what version of neo4j to look at, since I saw it has been renamed to active_graph, but the caller traces in your output show that `querable.rb` is initiating some ActiveSupport instrumentation that must eventually be using `caller_locations`
<headius[m]>
that instrumentation is guarded by the `skip_instrumentation` option
<headius[m]>
lopex: I think it's still experimental in this release
<lopex>
I gather it will be quite a hassle to implement efficiently
<klobuczek[m]>
yes, the neo4j gem is partially renamed (gem name so far). I'll reconcile this soon.
<headius[m]>
lopex: do you know how pattern matching is implemented in other runtimes? Is it just a cascade of if's?
<lopex>
headius[m]: no, but I remember it was a big pita for scala
<headius[m]>
I assumed that statically-typed languages were able to statically dispatch to the right case in O(1) for many cases
<headius[m]>
at least for things like case classes
<lopex>
no, in scala it was still a pita since it leads to exponential checks pretty quickly
<headius[m]>
I assume any languages with arbitrary expressions for the cases would have to be O(n)
<lopex>
headius[m]: like the distinction between values and types
<headius[m]>
mm yeah
<lopex>
like case (_:String) | (_:Integer)
<headius[m]>
maybe a very limited set of forms could be O(1)
<headius[m]>
in any case Ruby will be no better than case/when
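(A rough illustration of the dispatch difference being discussed, not how any particular runtime actually implements it: arbitrary guards degenerate to a linear cascade of checks, while simple type tags can compile to a constant-time switch.)

```java
import java.util.Map;

class MatchDispatch {
    // Arbitrary patterns: effectively an if/else cascade, O(n) in the cases.
    static String linear(Object o) {
        if (o instanceof String) return "string";
        if (o instanceof Integer) return "integer";
        if (o instanceof Map) return "map";
        return "other";
    }

    // Simple tag-only patterns: a switch the runtime can dispatch in O(1).
    enum Tag { STRING, INTEGER, MAP }
    static String constant(Tag tag) {
        switch (tag) {
            case STRING:  return "string";
            case INTEGER: return "integer";
            case MAP:     return "map";
            default:      return "other";
        }
    }
}
```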
<lopex>
vs ADTs
<headius[m]>
I think the only thing you get is more explicit destructuring
<lopex>
but I don't know how deeply nested it can be wrt types/values
<lopex>
like casing a hash, like, {foo: an integer, bar: a value}
<headius[m]>
yeah I have not followed the spec
<lopex>
the doc doesn't mention that combination though
<klobuczek[m]>
@headius the instrumentation in Querable is simply for the purpose of logging queries and it is used in production. We inherited that code. Are you saying that using active_support/notifications is not appropriate for that purpose?
<headius[m]>
I'm saying it's expensive
<headius[m]>
a single caller backtrace can take a few milliseconds on JRuby, so if you have many instrumentation calls happening per request it will reduce throughput
<headius[m]>
enebo: idea for reducing overhead of unbounded caller....
<headius[m]>
Ruby should add a form that takes a block to process each element
<headius[m]>
that will allow users to have unbounded caller but bail out early, which in JRuby would mean we can use StackWalker to reduce how many frames get generated
<headius[m]>
so caller_locations with no size would just grab full trace right away... with size or with block would do it iteratively
<headius[m]>
the basic problem with cases like this instrumentation is when only one specific frame is desired, one that should be close to the caller location
<headius[m]>
users may not know how many levels to ask for but know they want to bail out early
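(On the JRuby side, that early-bail idea could lean on StackWalker's lazy stream, roughly as sketched below; `isRubyFrame` and the frame mapping are hypothetical placeholders.)

```java
import java.util.List;
import java.util.stream.Collectors;

class LazyCaller {
    // Walk Java frames lazily and stop as soon as enough Ruby-level
    // frames have been produced, instead of materializing the full trace.
    static List<String> rubyFrames(int wanted) {
        StackWalker walker = StackWalker.getInstance();
        return walker.walk(frames -> frames
                .filter(LazyCaller::isRubyFrame)   // hypothetical predicate
                .limit(wanted)
                .map(f -> f.getClassName() + "#" + f.getMethodName())
                .collect(Collectors.toList()));
    }

    private static boolean isRubyFrame(StackWalker.StackFrame f) {
        // Placeholder heuristic; real Ruby-frame detection is more involved.
        return f.getClassName().startsWith("org.jruby");
    }
}
```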