<jswenson[m]>
Looks like most of the nashorn stuff has a more complex class name (`jdk.nashorn.internal.scripts.Script$181$^eval_`) while the jruby stuff is typically simpler.
<jswenson[m]>
Top few without stripping the end off the class names
<headius[m]>
yeah they are probably encoding it more
<jswenson[m]>
for reference the OQL for the second one was : `SELECT s.member.clazz.getName() FROM java.lang.invoke.DirectMethodHandle s`
<jswenson[m]>
I'm using eclipse mat, not sure if that matters for these.
<headius[m]>
it does, the visualvm oql sucks
<headius[m]>
I am realizing that now
<jswenson[m]>
had to use bash to group and sort
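A hedged sketch of that kind of pipeline, assuming the class names from the OQL result were exported to a plain-text file first; `classnames.txt` is just a placeholder name:

```bash
# group identical class names and sort by frequency, most common first
sort classnames.txt | uniq -c | sort -rn | head -20
```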
<headius[m]>
that query is a syntax error in visualvm, lame
<headius[m]>
ok so basically we could filter by `member.clazz.getName =~ /jruby/`
<headius[m]>
however that is done in MAT OQL
<headius[m]>
there does seem to be a lot of duplication of DMH just in your results though
<jswenson[m]>
can do this: `SELECT s.member.clazz.getName() FROM java.lang.invoke.DirectMethodHandle s WHERE s.member.clazz.getName().startsWith("org.jruby")` which yields 39k results (of the 66k total)
<headius[m]>
ok that's pretty good
<headius[m]>
39k direct handles to JRuby classes seems like a lot to start
<jswenson[m]>
`SELECT s.member.clazz.getName(), s.member.name.toString() FROM java.lang.invoke.DirectMethodHandle s WHERE s.member.clazz.getName().startsWith("org.jruby")`
<headius[m]>
these are the kinds of sites I would expect to see even with indy off because these are literals and constants
<jswenson[m]>
Yeah I think we must not have any indy on
<jswenson[m]>
as we're not explicitly adding it anywhere and only disabling indy.yield
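For reference, a sketch of how such switches are typically passed to JRuby; the property names here are assumptions based on the conversation, so check `jruby --properties` on your JRuby version before relying on them:

```bash
# main indy switch (off by default on JRuby 9.2):
#   -Xcompile.invokedynamic=true
# the yield-site option mentioned above (assumed name), disabled:
export JRUBY_OPTS='-Xinvokedynamic.yield=false'
```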
<headius[m]>
some of these are being created by JVM and would be reduced in newer versions
<headius[m]>
like those references to fstring in Bootstrap should be almost entirely in static data for jitted Ruby code and could use the same entry, but probably don't in Java 8
<headius[m]>
ok so this is a good report on direct handles
<headius[m]>
it doesn't look like a symptom of a problem at the moment
<headius[m]>
maybe we can look at LF trees that are rooted by nashorn somehow
<jswenson[m]>
I can query the 113k lambda forms, but I'm not sure where I can go from there.
<headius[m]>
jswenson: well we are getting closer
<headius[m]>
basically we need a report of where all the lambda forms are and what is holding onto them to see which ones are JRuby and could be reduced
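One possible starting point in Eclipse MAT OQL (just a sketch; the "who is holding them" part then comes from MAT's views on the result, e.g. "Merge Shortest Paths to GC Roots" or "Immediate Dominators", rather than from the query itself):

```
SELECT * FROM INSTANCEOF java.lang.invoke.LambdaForm lf
```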
<jswenson[m]>
let me see if I can wrangle mat / OQL into doing what I want
<jswenson[m]>
not sure where on a lambda form (or around a lambda form) it is hiding the information about why it was created / what created it
<headius[m]>
yeah I am trying to figure that out... seems like it is not in the object but would have to be based on who is referencing it
<headius[m]>
that is exactly what we want though, who created it and why
<jswenson[m]>
Darn.
<headius[m]>
I may be able to pull in a method handle expert
<headius[m]>
I pinged Vladimir Ivanov from Oracle, who has done a large part of the work to reduce and reuse lambda forms
<headius[m]>
he will be able to give us some pointers on investigating and perhaps tuning this stuff
<headius[m]>
I have to drop off for the night soon so I can't help continue but you can keep poking around with MAT and we will see what Vladimir comes back with
<jswenson[m]>
👍️
<jswenson[m]>
thanks for the help!
<headius[m]>
I am looking to see if it is possible to turn more indy off, but that might only be on master
<headius[m]>
on master we have worked toward a completely indy-free mode to use for native compiles, but that is not available in the 9.2 line
<headius[m]>
from what I see, though, those are not unexpected DMH to see for normal mode, and those numbers don't seem to indicate a problem
<headius[m]>
jswenson: yeah thanks for your patience... I want to help you but also figure out if we can do more to reduce our metaspace load
<headius[m]>
seems like something is still corrupting the shared maven cache on travis, but removing it would add a lot of time and noise to the builds
<Freaky>
it's certainly harder to reproduce with this cut down environment
<Freaky>
it's like it's sensitive to just the amount of code that happens to be floating about
<Freaky>
meh
<Freaky>
let's try jdb again, now I can reproduce it fairly reliably
<headius[m]>
yeah need to look at the incoming string at the point it raises the error, checking whether it has a weird encoding or bad offsets or something
<jswenson[m]>
@kalenp I wonder if we're doing any massive json parsing here.
<kalenp[m]>
Some of our known instances are actually on fairly small json objects (2 keys and short values), but perhaps the large json just exercises it intensely enough to show up in a reasonable amount of time
<jswenson[m]>
From the looks of the repro, simply loading the large JSON makes it so a smaller load will fail later.
<jswenson[m]>
Loads the big bork.json (around 2.5MB) then loads a simple json payload in a loop.
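Roughly, the repro as described would look like this in Ruby (the file name, small payload, and loop are placeholders, not the actual script):

```ruby
require 'json'

# parse the large (~2.5MB) document once
JSON.parse(File.read('bork.json'))

# then parse a small payload in a loop; per the report it is these
# later small parses that eventually start failing
small = '{"id": 1, "name": "example"}'
loop do
  JSON.parse(small)
end
```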
<jswenson[m]>
Thus far I cannot repro, but I haven't tried on jdk15 yet
<kalenp[m]>
well, we certainly do serialize large json objects. will have to think about de-serializing large json
<kalenp[m]>
interesting, I'm able to get the repro on java 16 (no 15 available in my current environment), but on java 11 it's running just fine for several minutes. So either we have two separate issues, or the particular details of how to trigger the bug have changed
<headius[m]>
could be different heuristics for when and how it jits that aren't being hit with this example on 11
<Freaky>
yeah, I've not seen it <15
<kalenp[m]>
on further reflection, we definitely do parse some large json objects
<Freaky>
the smaller the initial JSON the less reliably I can reproduce it
<Freaky>
maybe it has different sensitivities on different versions
<headius[m]>
I'm wrapping up another investigation and then I will try to repro locally too
<Freaky>
kalenp[m]: any JVM tuning?
<headius[m]>
ah that could make a difference
<Freaky>
JRUBY_OPTS='-J-XX:+UnlockExperimentalVMOptions -J-XX:+UseShenandoahGC' seems to stop it as well
<Freaky>
also SerialGC and ParallelGC from the look of it
<headius[m]>
grr G1
<headius[m]>
I would not be surprised if this ends up being a G1 bug
<jswenson[m]>
We're definitely using G1
<headius[m]>
worth throwing another GC at it if you can afford to try it
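For reference, the equivalent flags for the other collectors mentioned above, in the same style as the Shenandoah line earlier:

```bash
JRUBY_OPTS='-J-XX:+UseSerialGC'    # serial collector
JRUBY_OPTS='-J-XX:+UseParallelGC'  # parallel collector
```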
<headius[m]>
jswenson: you probably didn't see this on Twitter but I got some assistance from Nashorn folks and there's a way to turn off its aggressive optimizations