#jruby on 2020-06-15 — irc logs at freenode.irclog.whitequark.org

2019-08-12 18:53 ChanServ changed the topic of #jruby to: Get 9.2.8.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:39 ur5us has quit [Ping timeout: 256 seconds]

01:00 ur5us has joined #jruby

01:24 <subbu> chrisseaton[m]1, documenting? or something else?

01:24 <chrisseaton[m]1> Just writing some notes at the moment

01:24 <chrisseaton[m]1> Will share with you before I publish

04:19 _whitelogger has joined #jruby

04:43 <subbu> sounds good! thanks.

05:16 ur5us has quit [Ping timeout: 256 seconds]

06:43 _whitelogger has joined #jruby

08:24 ur5us has joined #jruby

09:51 drbobbeaty has joined #jruby

09:56 drbobbeaty has quit [Ping timeout: 256 seconds]

10:18 ur5us has quit [Ping timeout: 256 seconds]

10:57 nirvdrum has joined #jruby

13:35 jmalves has joined #jruby

13:46 <jmalves> Hey all,

13:46 <jmalves> We are seeing a very sporadic NPE in our testing pipeline that seems to happen when calling a ruby block in Java. This block might be called from multiple threads concurrently. I have not been able to reproduce this NPE locally.

13:46 <jmalves> Our JRuby version is 9.2.7.0. The NPE stack trace is:

13:46 <jmalves> java.lang.NullPointerException,

13:46 <jmalves> org.jruby.runtime.InterpretedIRBlockBody.commonYieldPath(InterpretedIRBlockBody.java:127)

13:46 <jmalves> org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:79)

13:46 <jmalves> org.jruby.runtime.Block.call(Block.java:124)

13:46 <jmalves> org.jruby.RubyProc.call(RubyProc.java:295)

13:46 <jmalves> org.jruby.RubyProc.call(RubyProc.java:274)

13:46 <jmalves> org.jruby.RubyProc.call(RubyProc.java:270)

13:47 <jmalves> org.jruby.javasupport.Java$ProcToInterface.callProc(Java.java:1131)

13:47 <jmalves> org.jruby.javasupport.Java$ProcToInterface.access$300(Java.java:1108)

13:47 <jmalves> org.jruby.javasupport.Java$ProcToInterface$ConcreteMethod.call(Java.java:1169)

13:47 <jmalves> org.jruby.gen.InterfaceImpl235418331.consume(org/jruby/gen/InterfaceImpl235418331.gen:13)

13:47 <jmalves> I was looking at the code in InterpretedIRBlockBody, and I was wondering if the method `commonYieldPath` can be called concurrently? If so that might explain the cause of this NPE.

13:47 <jmalves> (I can create an issue about this if that would be more appropriate, still trying to get a reproduction :/)

13:47 <headius[m]> jmalves: yo

13:47 <jmalves> Hey Headius!

13:47 <headius[m]> hmm ok let's see

13:48 <headius[m]> off the top of my head, that path should be safe

13:48 <headius[m]> there are places in the IR logic where we do some set-up lazily, though and it has bitten us on concurrency once in a while

13:49 <headius[m]> I'll roll my source back to 9.2.7 and have a look at that line, but you should open an issue now

13:49 <jmalves> The way the method ensureInstrsReady() is used + commonYieldPath can make interpreterContext be set to nil if fullInterpreterContext can ever be nil

13:49 <headius[m]> it may be something we fixed but if you hit it, others might hit it... best to have a record

13:49 <headius[m]> oh nice, you have looked into it a bit already

13:50 <headius[m]> I think I remember looking into this at one point myself

13:50 <headius[m]> it's the promotion from "startup" interpreter context to the full one, and wasn't being done atomically

13:50 <headius[m]> enebo: do you remember fixing anything here?

13:50 <jmalves> Like there is the nil check here `if (interpreterContext == null) {` then `interpreterContext = closure.getInterpreterContext();` gets called. The next thread does not go into the if because `interpreterContext` is not null any longer, so when it runs `interpreterContext = fullInterpreterContext;` this sets it back to nil

13:51 <jmalves> (in the example above the first thread got paused at `interpreterContext = closure.getInterpreterContext();` i.e. it never ran `fullInterpreterContext = interpreterContext;`)

13:51 <jmalves> Not sure if I was clear or what I said made sense
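
The interleaving jmalves describes at 13:50 and 13:51 can be replayed deterministically on a single thread. The sketch below is a hypothetical, simplified model and not the JRuby source: `Ctx`, `RacyBody`, `GuardedBody` and every method name are assumptions made for illustration, and only the two field names come from the log. The guarded variant shows one possible way to close the window, not necessarily the fix JRuby adopted.

```java
// Hypothetical model of the race discussed in the log; NOT the JRuby source.
final class LazyContextRace {
    // Stand-in for the real interpreter context type (assumption).
    static final class Ctx {
        final String name;
        Ctx(String name) { this.name = name; }
    }

    // Racy holder: the two fields are written in separate, unsynchronized steps.
    static final class RacyBody {
        Ctx interpreterContext;      // null until first use
        Ctx fullInterpreterContext;  // null until the lazy init finishes

        // First half of the lazy init; thread A is paused right after this.
        void initStepOne() { interpreterContext = new Ctx("startup"); }

        // Second half of the lazy init.
        void initStepTwo() { fullInterpreterContext = interpreterContext; }

        // What thread B does once it sees a non-null interpreterContext.
        Ctx promoteAndGet() {
            interpreterContext = fullInterpreterContext;
            return interpreterContext;
        }
    }

    // Replays the interleaving from the log in a fixed order.
    static Ctx replayRacyInterleaving() {
        RacyBody body = new RacyBody();
        body.initStepOne();                  // thread A: passed the null check
        Ctx seenByB = body.promoteAndGet();  // thread B: copies a still-null field
        body.initStepTwo();                  // thread A resumes, too late for B
        return seenByB;
    }

    // One possible guard: make the transition atomic and never publish null.
    static final class GuardedBody {
        private Ctx interpreterContext;
        private Ctx fullInterpreterContext;

        synchronized Ctx ensureReady() {
            if (interpreterContext == null) {
                interpreterContext = new Ctx("startup");
            }
            if (fullInterpreterContext != null) {
                interpreterContext = fullInterpreterContext;
            }
            return interpreterContext;
        }

        synchronized void completeBuild(Ctx full) { fullInterpreterContext = full; }
    }

    public static void main(String[] args) {
        System.out.println("racy replay returned null: "
                + (replayRacyInterleaving() == null));
        System.out.println("guarded holder returned null: "
                + (new GuardedBody().ensureReady() == null));
    }
}
```

In the racy replay, thread B receives null, which is the kind of value that would later surface as an NPE when dereferenced. The guarded holder returns the startup context until `completeBuild` has supplied a full one.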

13:52 <headius[m]> yeah I think you're on the right track

13:59 <headius[m]> there are changes in here since 9.2.7 but I'm not sure if they fix the problem or not

14:00 <headius[m]> I would say probably not

14:01 drbobbeaty has joined #jruby

14:02 <headius[m]> ick, so many state changes

14:04 <headius[m]> well this should be easy enough to reproduce... generate new blocks in a loop and try to call them concurrently

14:08 <jmalves> Ok I can try that

14:08 <jmalves> Issue: https://github.com/jruby/jruby/issues/6282

14:10 <headius[m]> hmm I have a script but it doesn't repro yet

14:10 <headius[m]> rvm jruby-9.2.7.0 do jruby -X-C -Xjit.threshold=1 -e 'loop { block = eval "proc { i = 0; 100000.times { i += 1 } }"; 100.times.map {Thread.new { 100.times { block.call } } }.each(&:join) }'

14:10 <headius[m]> pretty aggressively threading there but no NPE

14:11 <headius[m]> the eval might be messing with it... we do things slightly differently for evals in some cases

14:14 <headius[m]> jmalves: thank you

14:17 <jmalves> Why do you do `block = eval " ... "` ?

14:17 <headius[m]> to make sure it's a new block every time... otherwise if it safely transitions to "full" it will never fail after that

14:18 <headius[m]> you don't see it often in your app because you have to catch it just right

14:26 <jmalves> @headius I was debugging your script and it does not seem to call `commonYieldPath`

14:26 <headius[m]> oh good catch

14:26 <headius[m]> it probably calls the "direct" logic

14:27 <jmalves> Actually it is a little bit more under there in the stack

14:27 <jmalves> at org.jruby.runtime.InterpretedIRBlockBody.ensureInstrsReady(InterpretedIRBlockBody.java:62)

14:27 <jmalves> at org.jruby.runtime.InterpretedIRBlockBody.yieldDirect(InterpretedIRBlockBody.java:104)

14:27 <jmalves> at org.jruby.runtime.IRBlockBody.yieldSpecific(IRBlockBody.java:85)

14:27 <jmalves> at org.jruby.runtime.Block.yieldSpecific(Block.java:134)

14:27 <jmalves> at org.jruby.RubyFixnum.times(RubyFixnum.java:278)

14:27 <jmalves> at org.jruby.RubyInteger$INVOKER$i$0$0$times.call(RubyInteger$INVOKER$i$0$0$times.gen:-1)

14:27 <jmalves> at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodZeroBlock.call(JavaMethod.java:555)

14:27 <jmalves> at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:80)

14:27 <jmalves> at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:89)

14:27 <jmalves> at org.jruby.ir.instructions.CallBase.interpret(CallBase.java:537)

14:27 <headius[m]> I see an unrelated concurrency issue... if the block is promoted to a full build at the same time a thread updates the ic, it may overwrite the full build

14:27 <jmalves> at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:362)

14:27 <jmalves> at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)

14:27 <jmalves> at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:128)

14:27 <jmalves> at org.jruby.runtime.InterpretedIRBlockBody.commonYieldPath(InterpretedIRBlockBody.java:137)
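
The separate issue headius mentions at 14:27, where a thread updating the context can overwrite a freshly promoted full build, is a lost-update problem. Below is a minimal sketch of a one-way promotion using `AtomicReference`. The class `OneWayPromotion`, the `Level` enum and the method names are all hypothetical and do not reflect how JRuby stores its contexts.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a promotion that can never be downgraded; NOT JRuby code.
final class OneWayPromotion {
    enum Level { STARTUP, FULL }

    // Stand-in for an interpreter context (assumption).
    static final class Ctx {
        final Level level;
        Ctx(Level level) { this.level = level; }
    }

    private final AtomicReference<Ctx> current = new AtomicReference<>();

    // Installs a startup context only if nothing has been installed yet.
    Ctx ensureStartup() {
        current.compareAndSet(null, new Ctx(Level.STARTUP));
        return current.get();
    }

    // Installs the candidate unless that would replace a FULL context with a
    // lesser one; returns whichever context is current afterwards.
    Ctx install(Ctx candidate) {
        return current.updateAndGet(existing ->
                existing != null
                        && existing.level == Level.FULL
                        && candidate.level != Level.FULL
                        ? existing
                        : candidate);
    }

    public static void main(String[] args) {
        OneWayPromotion body = new OneWayPromotion();
        body.ensureStartup();
        body.install(new Ctx(Level.FULL));                   // promotion wins
        Ctx after = body.install(new Ctx(Level.STARTUP));    // stale writer arrives late
        System.out.println("level after stale write: " + after.level);
    }
}
```

Because `updateAndGet` retries its function when another thread changes the reference, a stale writer holding a startup context cannot replace a full build that landed in between.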

14:29 <headius[m]> I will just comment out the direct logic

14:34 <jmalves> headius was able to reproduce locally using our own code and eval!

14:34 <headius[m]> oh nice!

14:34 <jmalves> Now I am not sure how I can generalize this :/

14:37 subbu is now known as subbu|away

14:37 <headius[m]> can you show me the code where you create the block?

14:38 <headius[m]> you have that ConcurrentHashMap snippet but is that it?

14:41 <headius[m]> still no luck on my end... I'll let you play with it for a bit in your codebase and see if you can narrow it down

14:41 <headius[m]> I'm auditing places where we set IRScope.interpreterContext to see why it would be null at this point

14:47 <jmalves> It is basically something like this:

14:47 <jmalves> 500_000.times do

14:47 <jmalves> results = java.util.concurrent.ConcurrentHashMap.new

14:47 <jmalves> database.view('table.column').for_each(eval " ->(row_id, result) { results[row_id] = result } ").wait_and_get

14:47 <jmalves> end

14:48 <jmalves> So that block is evaluated once per row of that table and called concurrently

14:49 <jmalves> We use futures btw not sure if this influences anything

14:52 <headius[m]> direct threads versus futures in an executor shouldn't make a difference

15:12 <jmalves> @headius got a repro! I pasted in the github issue

15:14 <headius[m]> oh excellent, I can stop bashing my head against this script

15:16 <jmalves> let me know if it works for you

15:23 <headius[m]> I've got it running but no fail so far... what JDK are you on?

15:24 <headius[m]> I assume my 4/8 cores should be enough

15:24 <jmalves> java 11.0.2 2019-01-15 LTS

15:24 <jmalves> Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.2+9-LTS, mixed mode)

15:24 <jmalves> Java(TM) SE Runtime Environment 18.9 (build 11.0.2+9-LTS)

15:25 <headius[m]> ok

15:25 <jmalves> I also run this within `binding.pry` console not sure if it matters

15:25 <headius[m]> toss that in the bug so we have it

15:25 <headius[m]> it might...pry like irb always ends up running in our interpreter at first

15:26 <headius[m]> can you gist me a console session where you run it and it fails?

15:27 <jmalves> What do you mean by console session?

15:28 <headius[m]> like just show me your pry session from command line to failure I guess

15:28 <headius[m]> I'm running your thing like this but no fail yet:

15:28 <jmalves> ok

15:28 * headius[m] sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/hlOkhZNyCFFIkikxPkWZpjGq >

15:30 <jmalves> Ok, one minute

15:31 <jmalves> Hmm weird, I restarted the console and now it is not failing at all! argh :/

15:31 <headius[m]> threading is so much fun

15:32 <jmalves> It was failing consistently before

15:32 <headius[m]> I'm setting it up to run in a loop and I'll be back in a bit

15:33 <headius[m]> this will be fixed for 9.3 in any case so don't worry about that... but if we can repro we might be able to find a workaround

15:33 <jmalves> Got it now again after re-compiling and re-running

15:33 <jmalves> Let me isolate the steps

15:34 <jmalves> (re-compiling and reloading with the jvm running, also attached debugger)

15:37 <jmalves> @headius https://gist.github.com/jmalves/e805d9db62a553562cfdd6e03d39b708

15:39 <jmalves> Thanks for taking a look at this @headius. We have only seen this problem in our specs but I suspect it could happen also out of them. However outside of specs it is not too common to call these methods from ruby.

15:41 <headius[m]> well I'm glad you haven't seen it in production yet

15:41 <headius[m]> hopefully we'll get you on a fixed release soon

15:42 <headius[m]> hey I have to run for a bit now but you could try it with JRuby 9.2.11.0 since the repro is small now

15:42 nirvdrum has quit [Ping timeout: 240 seconds]

15:42 <headius[m]> not failing won't be conclusive but perhaps encouraging

15:42 <jmalves> I also have to go now, I will try that tomorrow!

15:42 <headius[m]> ok, I'll let my loop run

15:45 jmalves has quit [Remote host closed the connection]

15:53 <headius[m]> jmalves: in the future, don't assume this is something you're doing wrong... NPE specifically always means a bug to me (treat it like a hard crash), and pretty much any other low-level Java exception probably means bug too

15:53 subbu|away is now known as subbu

15:55 jmalves has joined #jruby

15:56 jmalves has quit [Remote host closed the connection]

15:56 jmalves has joined #jruby

16:01 jmalves has quit [Ping timeout: 264 seconds]

16:10 jmalves has joined #jruby

16:10 jmalves has quit [Remote host closed the connection]

16:11 jmalves has joined #jruby

16:15 jmalves has quit [Ping timeout: 256 seconds]

16:29 thomas[m] has joined #jruby

17:04 jmalves has joined #jruby

17:09 jmalves has quit [Ping timeout: 256 seconds]

17:20 nirvdrum has joined #jruby

18:16 subbu is now known as subbu|lunch

18:53 jmalves has joined #jruby

18:53 <travis-ci> jruby/jruby (master:0953009 by Thomas E Enebo): The build was broken. https://travis-ci.org/jruby/jruby/builds/698651266 [142 min 59 sec]

18:53 travis-ci has left #jruby [#jruby]

18:53 travis-ci has joined #jruby

18:54 subbu|lunch is now known as subbu

18:58 jmalves has quit [Ping timeout: 258 seconds]

19:01 <headius[m]> enebo hmm the PR was green wasn't it?

19:01 <enebo[m]> I thought so

19:01 <headius[m]> Only sequel failed

19:02 travis-ci has joined #jruby

19:02 travis-ci has left #jruby [#jruby]

19:02 <travis-ci> jruby/jruby (jruby-9.2:bd61d73 by Charles Oliver Nutter): The build was broken. https://travis-ci.org/jruby/jruby/builds/698654294 [107 min 34 sec]

19:02 <headius[m]> DateTime off by one failure...maybe it's a timing bug in the test

19:03 <headius[m]> 9.2 branch fails are all over with same it though

19:03 <headius[m]> Same commit

19:04 <headius[m]> jeremyevans that master fail above...possibility of a timing issue in the test? I haven't looked as I'm on mobile

19:05 <enebo[m]> very weird to see IRPersist loading duping causing that.

19:06 <headius[m]> Yeah hard to believe that would affect anything

19:06 <headius[m]> I have not seen this fail before though

19:06 <headius[m]> We do pull sequel master so there's that

19:07 <enebo[m]> headius: just verified all was green on the PR

19:10 <headius[m]> Probably a glitch or bad test then, I guess we'll see

19:21 travis-ci has joined #jruby

19:21 travis-ci has left #jruby [#jruby]

19:21 <travis-ci> jruby/jruby (master:03a8357 by Thomas E Enebo): The build was fixed. https://travis-ci.org/jruby/jruby/builds/698658386 [148 min 26 sec]

19:22 <enebo[m]> HUZZAH...clearly IR serialization got fixed by merging ir.print option flag PR :)

19:22 <enebo[m]> so something odd happened in that sequel run. Is it a special day like Mayan end of the world?

19:31 <headius[m]> That's weird

19:32 <headius[m]> I didn't look at the failures on the branch but maybe it was some network glitch that affected everything

19:32 <headius[m]> I feel like we should continue moving jobs off of Travis

19:54 <chrisseaton[m]1> Feels like travis is getting worse more quickly now

19:58 <headius[m]> Well I think they have one person maintaining it now

19:59 <headius[m]> I really don't understand how they couldn't make a business out of it

19:59 <headius[m]> Somebody majorly screwed up their market position

20:15 <kalenp[m]> headius Following up from that screenshot I shared on Friday, we aren't able to upgrade to 9.2.11 because of the IR serialization bug. But looks like that just got fixed! Looking forward to the next release so we can pick that up.

20:17 <headius[m]> Well lucky you it looks like we will do a 9.2.12

20:17 <kalenp[m]> I don't twitter, but I might have some coworkers (Looker) who could share it.

20:17 <headius[m]> Surprising number of people have hit that issue

20:18 <kalenp[m]> Some of them may have been coworkers ;)

20:23 <headius[m]> Sounds great!

20:25 drbobbeaty has quit [Ping timeout: 265 seconds]

20:41 jmalves has joined #jruby

20:46 jmalves has quit [Ping timeout: 264 seconds]

20:58 snickers has joined #jruby

21:01 ur5us has joined #jruby

21:22 <headius[m]> jmalves: one of our JVM friends appears to have figured out JVM flags to reproduce it 100%

21:22 <headius[m]> I've got my best people working on it 🧐

21:23 <headius[m]> enebo: can't remember if you met Charlie Gracie from J9 team but he moved over to working on Hotspot at I think Microsoft

21:41 <jeremyevans> headius[m]: failure could be rounding issue specific to jdbc-sqlite3, I guess

21:41 <headius[m]> it passed on the subsequent build so something's flaky somewhere

21:42 <enebo[m]> headius: ah I saw him comment and I was confused...ok he is on hotspot codebase now :)

21:43 <enebo[m]> shadowing variable was removed from Ruby

21:43 <jeremyevans> I'll see if I can reproduce in a loop

21:43 <enebo[m]> looks like end of 2018

21:43 <headius[m]> yeah it's been over a year

21:44 <enebo[m]> Another rando jruby option in grey

21:44 <headius[m]> I think he's still in Ottawa though

21:44 <headius[m]> jmalves: oops sorry, I meant chrisseaton ... got my bugs crossed

21:45 <headius[m]> jmalves: I let your stuff run for hours and did not reproduce

21:45 <chrisseaton[m]1> My bug with inlining? Shouldn't need to let it run for hours - it only compiles once (unless there are other bugs) and it's either inlined or it isn't. Should be able to confirm or not in seconds.

21:46 <headius[m]> chrisseaton: no, that was actually for jmalves

21:46 <headius[m]> you get the Charlie Gracie flags which I think you might have seen already

21:46 <headius[m]> and it repros without tiered compilation so there's something else wrong

21:46 <chrisseaton[m]1> Oh sorry I understand now

21:49 <headius[m]> two JDK bugs in two weeks, we're on a roll

21:52 <enebo[m]> Any idea on how often this non-inlining happens?

21:53 <headius[m]> we were able to reproduce it almost every time yesterday without any special flags

21:54 <enebo[m]> I more mean how many sites end up being affected

21:54 <headius[m]> in theory it could be affecting every indy ruby to ruby call site

21:54 <headius[m]> I doubt it affects ruby to java call sites

21:54 <enebo[m]> That would be pretty amazing based on the fact indy seems to help

21:55 <headius[m]> yeah see my last point... a very large percentage of calls are into core classes, and at the moment there's no evidence to indicate they're affected

21:56 <enebo[m]> Ah yeah I guess so but if you are looking at Rails I wonder a bit...activesupport pretty much wraps a lot of stuff

21:56 <headius[m]> right, something like rails probably tips that scale a lot

21:56 <enebo[m]> and goes through about 200 levels of ruby stack :)

21:56 <headius[m]> so it could still be a very big opportunity

21:57 <headius[m]> even when they call core classes they usually do it through some monkey patch

21:57 <enebo[m]> but that is a funny observation...most microbenchmarks have a large amount of core being called

21:58 <headius[m]> there's some classloader involvement here because it started showing up when Chris forced the methods to JIT instead of AOT

21:58 <enebo[m]> well the monkey patch was my original AS comment but I don't know how many actually get consumed...at this point just thinking about AR there are many many levels of ruby to ruby calls

21:58 <headius[m]> we do most of our microbenchmarking at CLI, which AOTs

21:58 <enebo[m]> but we are tangenting a bit too much that it might be all ruby to ruby calls

21:58 <enebo[m]> that is also true

21:59 <headius[m]> in any case there's gobs of small Ruby methods getting hit very hard in Rails, and those not inlining is a big loss

21:59 <enebo[m]> yeah

22:01 <enebo[m]> ok this used warning stuff is almost working but dinner prep has come upon me

22:01 <headius[m]> disabling tiered would have been a decent workaround but Charlie's flags threw water on that

22:01 <headius[m]> so we have no workaround again

22:01 <headius[m]> possibly disabling one shot classloaders but that's not really workable

22:02 <headius[m]> ok, ttfn

22:10 travis-ci has joined #jruby

22:10 travis-ci has left #jruby [#jruby]

22:10 <travis-ci> jruby/jruby (master:69142ee by Charles Oliver Nutter): The build was broken. https://travis-ci.org/jruby/jruby/builds/698712720 [89 min 20 sec]

22:11 <headius[m]> oh goodie

22:16 drbobbeaty has joined #jruby

22:25 <headius[m]> all failures during the build, which doesn't use 9.3 🤔

22:30 jmalves has joined #jruby

22:32 snickers has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

22:34 jmalves has quit [Ping timeout: 256 seconds]

22:51 <headius[m]> I'm going to restart this and see what happens

22:55 <chrisseaton[m]1> enebo: a problem with my bug is that you don't see it if you do a single file test case, so anyone exploring JRuby assembly output would likely miss it

23:07 nirvdrum has quit [Ping timeout: 256 seconds]

23:07 travis-ci has joined #jruby

23:08 travis-ci has left #jruby [#jruby]

23:08 <travis-ci> jruby/jruby (master:69142ee by Charles Oliver Nutter): The build was broken. https://travis-ci.org/jruby/jruby/builds/698712720 [142 min 9 sec]

23:08 <headius[m]> we could probably soften the CLI AOT since we can jit blocks properly now

23:08 <headius[m]> that was the main reason we did AOT, since so many benchmarks are just a bag of toplevel blocks

23:10 <headius[m]> sigh, of course the fails before passed now... I don't know whether to blame maven, JRuby 9.1, or travis

23:10 <chrisseaton[m]1> It's benchmarks with while loops that are the real problem, I guess

23:20 <headius[m]> yeah, OSR would be a pretty big challenge

23:20 <headius[m]> but at that point we're adding optimization mostly for benchmarking

23:20 <chrisseaton[m]1> Don't want to minimise work needed, but why? You already have a side-stack for the backtrace and local variables if needed don't you?

23:21 <headius[m]> the interpreter that runs at boot doesn't execute things the same way as bytecode, so the variables would be rather different

23:22 <chrisseaton[m]1> We're looking at expanding kinds of OSR in TruffleRuby, actually not for benchmarking reasons but so we can get smaller compilation units that compile independently - like massive switches. I thought JRuby did that to some extent?

23:22 <headius[m]> it's doable using the same sort of mechanism as deopt would use, but we don't have that yet either

23:22 <headius[m]> we used to "chunk" large methods but it was only based on top-level lines... so nothing inside a loop or switch

23:23 <chrisseaton[m]1> That's what I was thinking of

23:23 <headius[m]> we don't have that currently for the IR-based pipeline

23:24 <headius[m]> it may be easier to chunk things in IR... it's somewhere on the long list of opportunities

23:25 <headius[m]> chunking also introduces challenges for stack traces, since we have to mine the Java stack trace and translate it into Ruby... chunking can't cause there to be double frames