Antiarc has quit [Quit: ZNC 1.7.4+deb7 - https://znc.in]
Antiarc has joined #jruby
nirvdrum has quit [Ping timeout: 265 seconds]
victori has quit [Ping timeout: 240 seconds]
victori has joined #jruby
_whitelogger has joined #jruby
rusk has joined #jruby
rusk has quit [Write error: Connection reset by peer]
rusk has joined #jruby
drbobbeaty has joined #jruby
KeyJoo has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
KeyJoo has quit [Quit: KeyJoo]
drbobbeaty has joined #jruby
nirvdrum has joined #jruby
lucasb has joined #jruby
victori_ has joined #jruby
victori has quit [Ping timeout: 268 seconds]
<headius[m]>
enebo: so about this async-profiler thing
xardion has quit [Remote host closed the connection]
<headius[m]>
I did find another sibling project called honest-profiler that only profiles Java code, but it has a lot of dependencies like CMake, JavaFX and stuff...I've submitted some PRs to get it more usable but that's not an option right now
xardion has joined #jruby
<headius[m]>
I am not sure how maintained it is, either, and the async-profiler folks just did a talk at OC1, so that seems the best path
<headius[m]>
but
<headius[m]>
I'm not sure how best to link it into JRuby
<headius[m]>
well I think it can but you still have to pass some flags to allow it
<headius[m]>
that may be less onerous than passing the actual agent library, but I have not tried to make it work
<headius[m]>
so the gem builds fine...where it starts to feel a little icky to me is copying the .so
<headius[m]>
it would be nice if we could just query for installed jruby-async-profiler and add that in at startup...that would be ideal
<enebo[m]>
I am thinking we probably do not care what happens up to a certain point for this so it might be a nice option and also potentially be simpler than updating both bash and launcher
<headius[m]>
I could probably push the gem any time if y'all want to play with it
<enebo[m]>
--server as noop possibly to use it later if we realize there are reasonable setting beyond the default?
lucasb has quit [Quit: Connection closed for inactivity]
<headius[m]>
I don't remember why we added it originally...I'm wondering if it was just for parity with --client
subbu is now known as subbu|lunch
<headius[m]>
holy crap, 400µs per getCurrentContext?
<headius[m]>
I just wrote a stupid benchmark and that seems to be the case
<headius[m]>
wait no
<headius[m]>
whew ok my math was off...4ns per get but unsure how much is optimizing away
<headius[m]>
kares: this is a little noisy but new getCurrentContext is maybe 0.1ns slower than 9.2.8
<headius[m]>
could be the branching logic versus your recursion
<kares[m]>
ufff
<headius[m]>
I'm going to fiddle with it a bit
<headius[m]>
4ns for runtime.threadService.threadLocal.softReference.get seems good but it's wasted cycles when we have to do it
<headius[m]>
this is also one thread, I need to test with many threads that are evacuating CPU caches
<headius[m]>
getCurrentContext is also probably too big to inline at the smallest thresholds
<headius[m]>
I was just workign on this experimental to_a(ThreadContext) PR and wondered if we can reduce getCurrentContext overhead
<headius[m]>
hard to believe I can measure things like making these methods static
victori has quit [Remote host closed the connection]
victori has joined #jruby
<headius[m]>
oh wow, how gross is this idea: make ThreadService extend ThreadLocal
<headius[m]>
one less field to traverse
<headius[m]>
could even go as far as making Ruby extend ThreadLocal but I think that's a bit too far 😀
<headius[m]>
working on the to_a PR I just realized we still have quite a few paths that traverse getCurrentContext
<headius[m]>
yeah this cuts off a good 0.5ns
<headius[m]>
I can do better
<lopex>
dark matter!
<headius[m]>
yeah some kind of dark 😀
<headius[m]>
0.6ns
<headius[m]>
holy crap, I crossed some threshold...0.75ns
<headius[m]>
gotta be getting pretty damn close to pthread_local overhead or whatever JVM uses
<headius[m]>
hah, Java 11 is "way slower"... 4.2ns versus 3.2s
<headius[m]>
GraalVM CE is faster, 3.0ns
<rdubya[m]>
headius: I know you've been talking a lot about the performance of OpenJ9, is there a reason not to use that over OpenJDK?
<headius[m]>
it's not as fast yet
<rdubya[m]>
ah ok
<headius[m]>
they did some recent work to improve invokedynamic performance that helps a TON, but they're still like 20-25% slower than Hotspot C2
<headius[m]>
and they're MUCH slower on this, interestingly
<headius[m]>
7.5ns or so
<rdubya[m]>
oh wow
<headius[m]>
oh that was J9 Java 11...may be some OpenJDK ThreadLocal impl change that's responsible for part of that, since Hotspot Java 11 also degraded
<headius[m]>
J9 Java 8 is around 6ns, so nearly 2x slower than C2 Java 8
<enebo[m]>
I am ok with massive deprecations on this. I have always wanted TC to be the main path through all methods down. Ruby down to some methods is ok but it is not future proof.
<enebo[m]>
I have occasionally moved private methods to using TC but public is not as appealing do to the deprecations
<enebo[m]>
Improving getCurrentContext is never a bad thing but we should slowly move all code to just accepting it as a param
<headius[m]>
enebo: so you're saying I ought to restore the other signatures throughout my PR
<enebo[m]>
unless we are saying pushing it on the stack everywhere somehow will be more expensive over the long haul
<enebo[m]>
headius: well if they can be used by native extensions I fear we may get reports
<headius[m]>
it didn't occur to me until I pushed it that code compiled to call RubyArray.to_a() will hard fail at compile or runtime even though there's RubyBasicObject.to_a() still
<enebo[m]>
This is one reason I have wanted to make some embedding API for all the typical operations via probably static API methods
<enebo[m]>
I guess that will largely be entry points into the type though
<headius[m]>
well if you look at what some other languages do, their "runtime" is our "context"
<enebo[m]>
once you have an array you will just want to call to_a(xxx) on it
<headius[m]>
so the equivalent would be context.newArray() instead of context.runtime.newArray
<headius[m]>
the fact that we have two separate things makes the usability a bit worse
<enebo[m]>
yeah arguably we started with Ruby when we should have started with TC but neither of us were on the project at that point
<headius[m]>
yeah blame it on Chad
<enebo[m]>
well I want static org.jruby.embed.Creators.array(context, ...)
<headius[m]>
yeah
<enebo[m]>
then how we actually do this is insulated. For all the things it maybe is too far in the sense we will have to inline a static method with acall in it but for native exts it feels like the right path
subbu|lunch is now known as subbu
<enebo[m]>
We have talked about making the cext lexicon app too which fits into this where rb_new_ary2 will display out array methods
<enebo[m]>
In this case I am not too worried about direct ThreadService consumers here but we could preserve the original public signature
<enebo[m]>
I did not contextualize how much this improved that call
<headius[m]>
yeah this is binary compatible and nobody will break from this patch
<headius[m]>
the big gain was from making ThreadService actually be the ThreadLocal it aggregated
<enebo[m]>
ah yeah I missed the no-arg version
<headius[m]>
still there and only called in like two places, for exit and setup
<headius[m]>
everyone goes through runtime.getCurrentContext, which does the static version in this patch
<enebo[m]>
so we will not be boiling the ocean today so a faster getCurrentContext is great. Seems reaonable to me too unless it somehow pins a runtime in some weird unexpected way
<headius[m]>
there's one detail
<headius[m]>
service.tearDown can't dereference the threadlocal anymore, because it is the threadlocal
<headius[m]>
but I can make runtime.tearDown clear the reference to the ThreadService, which should be mostly the same thing
<headius[m]>
probably better/cheaper than tearing down the service anyway
<headius[m]>
I'll push this as a PR anyway since we have a bunch of GC/leak tests that should run
<headius[m]>
these changes don't feel too hacky do they?
<headius[m]>
I mean all runtimes should benefit from removing the extra hope, and the virtual => static change should at least not do any harm and probably help other runtimes too
<headius[m]>
not like I'm trying to reorder a loop for C2 or something
<enebo[m]>
it is not single responsibliity :) but hey it looks like it will work
<enebo[m]>
and visibly other than the static call aspect it appears to be invisible to consumers
<enebo[m]>
except for shutdown but that is our code
<enebo[m]>
even the static part is technically invisible unless you are in an IDE
<headius[m]>
basically dug all the way into the threadlocal map entry, and if it's not null we unroll
<enebo[m]>
it is amusing to see allour convertTo methods does this
<headius[m]>
yeah :-(
<enebo[m]>
DOES THIS
<headius[m]>
hey, log calls and see how many we actually make
<headius[m]>
there's gotta be a few hot path getCurrentContext that are massively called
<enebo[m]>
Well another 1000 lines of confusing methods never hurt anyone
<enebo[m]>
yeah I mean in obvious places like this we should just deprecate and change to context based one...it ends up being a reverse onion though since you know some amount of convertTo callers will not have context either
<enebo[m]>
nonetheless we probably need to be a bit more serious about this if we actually want it to go away
<enebo[m]>
All numeric methods end up getting a long or an int will end up callint through it since we only know we have an IRubyObject
<enebo[m]>
well I guess I partially take it back with fastmathops
<headius[m]>
yeah it propagates out
<headius[m]>
but ultimately it should trend toward fewer callers
<enebo[m]>
and we definitely have some short-circuits to avoid it
<headius[m]>
in my to_a PR I did track down callers and make sure they were passing context
<enebo[m]>
which is almost disappointing because it feels a bit inconsistent
<headius[m]>
that was a much smaller diff though
<enebo[m]>
yeah great
<enebo[m]>
just keep original versions with deprecations
<enebo[m]>
which I hate to say :)
<headius[m]>
I will add back the signatures I removed
<enebo[m]>
the odds of a to_a() in particular is probably very high...I did not look at that PR
<headius[m]>
well when you think about it most consumers call RubyBasicObject.to_a
<headius[m]>
but this method isn't on IRubyObject so there's far fewer cases
<enebo[m]>
did you see a measurable speed up from doing it? (not that I care in this case)
<headius[m]>
I didn't even check
<enebo[m]>
I am just happy to consistently pass TC through :)
<headius[m]>
it would probably be hard to see
<headius[m]>
I'm not going to dig up sub-ns timings for RubyArray.to_a
<enebo[m]>
add those nanos up :)
<enebo[m]>
This morning I was going to get to the bottom of where and with what unpack('m') is used in Rails...the day is withering before me
<headius[m]>
I'm sure between optimized getCurrentContext and unpack('m') we're poised to take over the Ruby world
<headius[m]>
meanwhile we're chewing up 1GB/s allocating fixnums