#jruby on 2019-10-28 — irc logs at freenode.irclog.whitequark.org

2019-08-12 18:53 ChanServ changed the topic of #jruby to: Get 9.2.8.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:34 _whitelogger has joined #jruby

01:13 _whitelogger has joined #jruby

06:08 xardion has quit [Ping timeout: 268 seconds]

06:41 xardion has joined #jruby

09:05 shellac has joined #jruby

09:07 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

10:53 drbobbeaty has joined #jruby

11:53 nirvdrum has joined #jruby

11:54 nirvdrum has quit [Remote host closed the connection]

12:22 nirvdrum has joined #jruby

12:46 shellac has quit [Quit: Computer has gone to sleep.]

13:34 lucasb has joined #jruby

13:37 shellac has joined #jruby

14:24 <headius[m]> Hood morning!

14:24 <headius[m]> oops, good morning

16:03 xardion has quit [Remote host closed the connection]

16:03 xardion has joined #jruby

16:13 <headius[m]> woo getting close: https://travis-ci.org/jruby/jruby/builds/603983311?utm_source=github_status&utm_medium=notification

16:47 shellac has quit [Ping timeout: 245 seconds]

19:33 sagax has joined #jruby

20:32 <fidothe> hey all. Making what I think is good progress on https://github.com/jruby/jruby/issues/5095. Unsurprisingly, I am learning a lot about internals... Naive implementation of `String#sub` (I started there, it seemed simpler) shows something like a 2x speedup in the benchmark I cribbed from @headius[m]'s `String#gsub` one (only for the pattern-is-a-string case). I have a couple of assumption-questions.

20:33 <fidothe> I was assuming that in the pattern-is-string case for `#sub` and `#gsub` there'd be no need to create a Regexp object

20:44 <fidothe> However, checking on magic vars in MRI while puzzling over `#gsub` I realised that `$~` returns a MatchData anyway, and it provides a Regexp version of the string pattern from `#regexp`. And so does `#sub`. In JRuby's implementation `return context.setBackRef(context.nil);` is called by both `subBangIter` and `subBangNoIter`, which I assumed meant it doesn't populate `$~`

20:47 <fidothe> But I can see that it does in IRB. So, given that I clearly don't understand how the `setBackRef`/`$~` stuff works, is there a good resource that gives an overview of the way the magic var stuff works in JRuby? (also, I assume that my initial naive implementation of `#sub` is too naive by far)

20:53 <headius[m]> hey there!

20:54 <headius[m]> So yeah I believe MRI has a way to create a MatchData that's only partially populated so it doesn't have to have a compiled regex

20:54 <headius[m]> lopex: maybe you have some thoughts here

20:54 <headius[m]> I think the way MRI does it is that it just sticks the source string into the MatchData and then lazily creates the regex if needed
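
The lazy approach described above (keep the source string in the match data, build the regex only on demand) might look roughly like the sketch below. It is an illustration only: `LazyStringMatch` and its methods are hypothetical names, it uses `java.lang.String` and `java.util.regex` rather than JRuby's own string and regexp types, and it is not taken from MRI's or JRuby's real MatchData.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not MRI's or JRuby's actual MatchData.
final class LazyStringMatch {
    private final String source;   // the string that was searched
    private final String pattern;  // the literal pattern, kept as plain text
    private final int begin;       // index where the literal was found
    private Pattern regex;         // compiled only if a caller asks for it

    LazyStringMatch(String source, String pattern, int begin) {
        this.source = source;
        this.pattern = pattern;
        this.begin = begin;
    }

    // These accessors need only the offsets, never a regex.
    String matched()   { return source.substring(begin, begin + pattern.length()); }
    String preMatch()  { return source.substring(0, begin); }
    String postMatch() { return source.substring(begin + pattern.length()); }

    boolean regexCompiled() { return regex != null; }

    // Pattern.quote escapes metacharacters so the literal matches only itself.
    Pattern regexp() {
        if (regex == null) {
            regex = Pattern.compile(Pattern.quote(pattern));
        }
        return regex;
    }
}
```

With this shape, a plain substring match pays no regex compilation cost unless something later asks the match for its regexp.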

20:54 quadz has quit [Ping timeout: 240 seconds]

20:55 <headius[m]> The setBackRef(nil) is likely to clear it before doing the match, and then there should be a set of the actual match data deep in the guts of RubyRegexp.search

20:56 <headius[m]> We definitely want to avoid creating the regexp if possible since that's part of the speedup here, along with not using a heavy-weight regexp match to do a simple substring seach

20:56 <headius[m]> search

20:57 <headius[m]> FWIW setBackRef and such are thread-local and the annotations in JRubyMethod indicate if they will be read or written by a given method. We gather a list of names of methods that do that and assume all such names are tainted that way and will need a place to store backref. This is slated to be improved, either by lazily allocating such space only when needed or by using some low-overhead mechanism like a simple threadlocal

20:57 quadz has joined #jruby

20:58 <lopex> headius[m]: there's this need_backref now https://github.com/ruby/ruby/blob/master/string.c#L5214

20:59 <lopex> lol it's from 2f14bde88fc (nobu 2014-03-28

20:59 <lopex> that's how far behind we are

21:03 <fidothe> Okay that’s useful info. Will do some more digging and keep at it. Will also dig more into MRI’s approach... Thanks!

21:04 <headius[m]> Cool, thanks! I know this stuff is involved so feel free to ping any time. I monitor Matrix and of course I'm on other services. Gitter notifications are busted so I only see those when I happen to check.

21:05 <lopex> fidothe: also, there's some inconsistency with regexp cache in our code

21:06 <lopex> last time I looked mri had one entry regexp cache for loops

21:06 <lopex> we cache all regexps that are being created from strings implicitly

21:07 <lopex> headius[m]: if only I had a chance to play https://deadlockempire.github.io/

21:09 quadz has quit [Ping timeout: 265 seconds]

21:10 quadz has joined #jruby

21:20 snickers has joined #jruby

21:22 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

21:30 <headius[m]> woohoo, zero remaining items targeted for 9.2.9

21:30 <headius[m]> enebo: SHIP IT

21:31 <headius[m]> I will spend some time before RubyConf trying to finally land load service redux and the direct-addressing hash, so we can say those are at least coming up next

21:35 <lopex> what's the issue with that hash ?

21:36 <headius[m]> in ioquatix PR?

21:36 <headius[m]> s/PR/issue

21:37 <lopex> mri has unicode 12.1.0

21:37 <lopex> we are at 12.0.0

21:37 <headius[m]> what hash are you speaking of

21:37 <lopex> you mentioned that addressing hash issue

21:37 <headius[m]> ohhh

21:38 <headius[m]> it works fine, but the way it's written makes it more susceptible to concurrency issues than the one we have now

21:38 <headius[m]> the current one is less efficient but accidentally correct under more thread-unsafe cases

21:39 <lopex> lolz, imagine specializing all that code on assumption you saw only one thread

21:39 <headius[m]> basically where the current Hash impl is a chained bucket that's mostly appending or making a new bucket array, the direct-addressing version has multiple shared state changes for any mutation

21:39 <headius[m]> my aborted experiment to try to fix it made more of those operations atomic

21:40 <lopex> and deopt on another thread spawning :P

21:40 <headius[m]> like packing all of the int state changes into a single packed long

21:40 <lopex> madman's dream

21:40 <headius[m]> yeah that's no good :-)

21:40 quadz has quit [Ping timeout: 245 seconds]

21:40 <headius[m]> at this point I'm pretty much committed to making String, Array, and Hash be lock-free threadsafe regardless of what you do

21:41 <headius[m]> I think we can keep overhead to a minimum with CAS and friends

21:42 <lopex> the cas impls, are they simple intrinsics or do they optimize things too?

21:45 nirvdrum has quit [Ping timeout: 252 seconds]

21:54 <headius[m]> I guess I don't know the answer to that

21:54 <headius[m]> the intrinsics use whatever the CPU provides for CAS, and I doubt it can be optimized away because that would seem to defeat the purpose

21:55 <headius[m]> the difference from back in the day when we decided not to explicitly make them thread-safe is that we mostly were still using synchronization, which requires a hard state change and blocking rather than nearly-free CAS for uncontended updates plus spin-until-you-win

21:58 <headius[m]> the many state changes in the D-A hash impl will be a little trickier to do atomically, but I'm sure it's possible

21:58 <headius[m]> Array and String will be easier, especially since they already have COW semantics that could be repurposed to handle atomic updates across threads
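
The packed-long idea mentioned earlier, combined with CAS and spin-until-you-win, could be sketched as follows. This is a minimal illustration under assumptions: `PackedState` and the two counters it packs (`size` and `end`) are hypothetical and do not reflect the actual field layout of JRuby's Hash or of the direct-addressing implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical illustration; the packed fields are assumptions, not JRuby's Hash layout.
final class PackedState {
    // High 32 bits hold size, low 32 bits hold end.
    private final AtomicLong state = new AtomicLong(0L);

    static long pack(int size, int end) {
        return ((long) size << 32) | (end & 0xFFFFFFFFL);
    }

    static int sizeOf(long packed) { return (int) (packed >>> 32); }
    static int endOf(long packed)  { return (int) packed; }

    // Bumps both counters in one atomic step; loops again if another thread won the race.
    long appendOne() {
        while (true) {
            long current = state.get();
            long next = pack(sizeOf(current) + 1, endOf(current) + 1);
            if (state.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int size() { return sizeOf(state.get()); }
    int end()  { return endOf(state.get()); }
}
```

Because both counters live in one `long`, a reader never observes one updated without the other, and an uncontended update costs a single `compareAndSet` instead of taking a lock.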

22:02 quadz has joined #jruby

22:43 snickers has quit [Read error: Connection reset by peer]

22:45 quadz has quit [Quit: ZNC 1.6.5+deb1+deb9u1 - http://znc.in]

22:46 quadz has joined #jruby

22:53 <lopex> maybe it's time to remove cow for arrays