#jruby on 2017-09-28 — irc logs at freenode.irclog.whitequark.org

2017-09-06 20:15 ChanServ changed the topic of #jruby to: Get 9.1.13.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:03 shellac has quit [Quit: Computer has gone to sleep.]

00:17 shellac has joined #jruby

00:19 Joufflu has joined #jruby

00:24 rdubya has joined #jruby

00:24 rdubya has quit [Client Quit]

00:27 rdubya has joined #jruby

00:27 rdubya has quit [Client Quit]

00:27 rdubya has joined #jruby

00:28 rdubya has quit [Client Quit]

00:38 shellac has quit [Quit: Computer has gone to sleep.]

01:20 Joufflu has quit [Ping timeout: 255 seconds]

03:04 deobalds has joined #jruby

03:30 jhass has quit [Ping timeout: 246 seconds]

03:32 yopp has quit [Ping timeout: 258 seconds]

03:40 yopp has joined #jruby

03:41 jhass has joined #jruby

05:04 rdubya has joined #jruby

05:04 rdubya has quit [Client Quit]

06:23 Joufflu has joined #jruby

06:25 Joufflu has quit [Client Quit]

06:25 Puffball has quit [Quit: No Ping reply in 180 seconds.]

06:28 claudiuinberlin has joined #jruby

06:29 Puffball has joined #jruby

06:36 shellac has joined #jruby

06:57 temporal_ has quit [Ping timeout: 252 seconds]

07:03 shellac has quit [Quit: Computer has gone to sleep.]

07:28 deobalds has quit [Quit: Computer has gone to sleep.]

07:49 shellac has joined #jruby

08:23 shellac has quit [Quit: Computer has gone to sleep.]

08:57 temporalfox has joined #jruby

08:57 drbobbeaty has joined #jruby

09:06 temporalfox has quit [Ping timeout: 248 seconds]

09:12 temporalfox has joined #jruby

09:21 shellac has joined #jruby

09:22 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

09:32 temporalfox has quit [Ping timeout: 248 seconds]

09:59 deobalds has joined #jruby

10:36 shellac has quit [Quit: Computer has gone to sleep.]

10:48 shellac has joined #jruby

11:20 drbobbeaty has joined #jruby

11:22 rdubya has joined #jruby

11:22 rdubya has quit [Client Quit]

11:22 rdubya has joined #jruby

11:22 rdubya has quit [Client Quit]

11:30 rdubya has joined #jruby

11:41 deobalds has quit [Quit: Computer has gone to sleep.]

12:15 lance|afk is now known as lanceball

13:03 Scorchin_ is now known as Scorchin

13:19 <GitHub176> [jruby] eregon pushed 4 new commits to master: https://git.io/vdOmI

13:19 <GitHub176> jruby/master e43c4cc Benoit Daloze: Squashed 'spec/mspec/' changes from 5bd9409..8aa352c...

13:19 <GitHub176> jruby/master a9fdbdb Benoit Daloze: Merge ruby/mspec commit 'e43c4ccadb90684226a1fc365680e630df71c0ab'

13:19 <GitHub176> jruby/master c3c0595 Benoit Daloze: Squashed 'spec/ruby/' changes from a4bc1d8..691755d...

13:45 shellac has quit [Ping timeout: 240 seconds]

13:48 shellac has joined #jruby

13:57 rdubya has quit [Quit: Leaving.]

13:58 rdubya has joined #jruby

14:38 drbobbeaty has quit [Ping timeout: 258 seconds]

14:42 drbobbeaty has joined #jruby

15:22 shellac has quit [Read error: Connection reset by peer]

16:30 drbobbeaty has quit [Read error: Connection reset by peer]

16:34 enebo has joined #jruby

19:10 olle has joined #jruby

20:20 olle has quit [Quit: olle]

20:33 <GitHub7> [jruby] ivoanjo closed issue #4651: Unexpected memory usage when running with native.enabled=true https://git.io/vH6Vn

20:34 temporalfox has joined #jruby

20:49 <nirvdrum> enebo: FYI, JRuby is affected by this: https://bugs.ruby-lang.org/issues/13950

20:50 <enebo> nirvdrum: thanks

20:57 temporalfox has quit [Read error: Connection reset by peer]

20:59 <nirvdrum> We're affected, too, since we copied from you.

20:59 <nirvdrum> But I think I'm going to make it "correct", otherwise strings go down the wrong path.

20:59 temporalfox has joined #jruby

21:01 <enebo> nirvdrum: because you are roped will your patch help us too?

21:01 <enebo> nirvdrum: I did not look for their fix to this

21:01 <nirvdrum> I just opened the issue. It's been a bug since 1.9.2.

21:02 <nirvdrum> But it's not a matter of ropes or not.

21:02 <nirvdrum> The only real difference is we eagerly work out the code range. I added assertions today verifying we're actually creating ropes with the correct code range. And that's when I started tripping over these old bugs.

21:08 <enebo> nirvdrum: well no doubt their patch will be very similar to ours

21:09 <nirvdrum> I added a follow-up comment. I think the logic here is more complex.

21:09 <enebo> nirvdrum: these are bugs I am ok with fixing as soon as discovered since nothing is gained from leaving behavior like this

21:09 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

21:09 <enebo> nirvdrum: oh I just read the actual bug

21:09 <enebo> nirvdrum: hilarious

21:10 <enebo> nirvdrum: basically TR with any non 7bit friendly encoding is just broken

21:10 <enebo> nirvdrum: so long as it can tr ascii-like chars

21:10 <nirvdrum> Yes.

21:11 <nirvdrum> Unless you end up only using it in contexts where code range doesn't matter.

21:11 <enebo> nirvdrum: which is not many encodings other than utf16*

21:11 <nirvdrum> I also opened https://bugs.ruby-lang.org/issues/13949

21:11 temporalfox has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

21:11 <nirvdrum> But JRuby isn't affected by that one.

21:11 <nirvdrum> ISO-2022-JP is another one I tried.

21:11 <nirvdrum> It's a problem for any non-ASCII compatible encoding.

21:11 <enebo> nirvdrum: but that is not 7 bit ascii when ascii?

21:12 <nirvdrum> Eh?

21:12 <enebo> nirvdrum: err how should I say that sentence :)

21:12 <enebo> nirvdrum: if [a-z] is not actually ascii a-z

21:12 rdubya has quit [Quit: Leaving.]

21:13 <nirvdrum> I could have this very wrong, but CR_7BIT only applies to codepoints < 128 in an ASCII-compatible encoding.

21:13 <nirvdrum> It's used all over the place for single-byte operations.

21:14 <enebo> nirvdrum: no you have it right

21:14 <enebo> nirvdrum: but UTF-16LE does not store those there

21:14 <enebo> nirvdrum: well I mean they are all 2 bytes

21:14 <nirvdrum> UTF-16LE uses 2 bytes for each codepoint.

21:14 <enebo> nirvdrum: I assume ISO-2022-JP must as well

21:14 <nirvdrum> And I guess surrogate pairs for grapheme clusters or whatever.

21:15 <nirvdrum> All this shit is confusing.

21:15 <enebo> nirvdrum: but 'a' in UTF-16 is a two bytes

21:15 <enebo> so it cannot be 7bit

21:15 <nirvdrum> Every CR_7BIT string is by definition CR_VALID, but if you incorrectly mark a string that *could be* CR_7BIT as CR_VALID, you might go down a bad path and get incorrect results.

21:15 <enebo> although it happens to have a 7bit 'a' in it :)

21:15 <nirvdrum> Because the helper functions for CR_VALID assume the source string is MBC.

21:16 <nirvdrum> enebo: Right.
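
[Editor's sketch] The point above, that a UTF-16LE 'a' still holds a 7-bit byte yet the string cannot be treated as 7-bit, can be observed from plain Ruby. This is only an illustration using public String methods; CR_7BIT and CR_VALID are interpreter-internal flags and are not exposed directly, so `ascii_only?` is used here as the closest public proxy.

```ruby
# The same three letters in an ASCII-compatible and a non-ASCII-compatible encoding.
ascii = "abc"
wide  = "abc".encode("UTF-16LE")

# Each UTF-16LE letter occupies two bytes; the low byte matches the ASCII value.
wide_bytes = wide.bytes              # [97, 0, 98, 0, 99, 0]

puts ascii.ascii_only?               # true
puts wide.ascii_only?                # false: UTF-16LE is not ASCII-compatible
puts wide.valid_encoding?            # true: well-formed, just not 7-bit
puts wide.encoding.ascii_compatible? # false
puts wide_bytes.inspect
```

Single-byte fast paths are only safe for the first string; applying them to the second would split each character in half.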

21:16 <enebo> yeah I hate CR because it is less semantically driven and more specific to what they could opt at the time of 1.9 as an impl

21:16 <nirvdrum> I hate ASCII-8BIT with a passion.

21:16 <enebo> but at this point I doubt we could just remove it

21:17 <enebo> that internal state now dictates external encoding in cases so it has become semantic

21:17 <nirvdrum> singleByteOptimizable usually means it's CR_7BIT or it's ASCII-8BIT (or I suppose some other binary-based encoding).

21:17 <enebo> yeah

21:17 <nirvdrum> You just always have this corner case of dealing with arbitrary binary strings.

21:18 <enebo> nirvdrum: yeah

21:18 <nirvdrum> And it has a wildly different usage pattern.

21:18 <nirvdrum> I posit most strings are read-only, but most byte buffers have many writes.

21:19 <enebo> nirvdrum: yeah I guess in many languages bytes and chars are two things or chars come from byte storage

21:19 <nirvdrum> Ropes take a beating when reading from IO, for instance.

21:19 <enebo> nirvdrum: but notion that chars are made of bytes is more explicit at a language level

21:20 <enebo> nirvdrum: fact that in 1.8 all strings were bytes I guess drove this lack of distinction

21:20 <nirvdrum> I'm sure it was based on C strings.

21:20 <enebo> yeah char*

21:20 <enebo> C also is similarly weird

21:20 <nirvdrum> I'd just like to see a byte[] or byte buffer type introduced and let that get adopted.

21:21 <nirvdrum> ASCII-8BIT is here forever for compatibility reasons. But we could encourage people off it.

21:21 <enebo> nirvdrum: I think Matz dislikes lots of types but it is a duck-typed language so Binary/String would have been reasonable to me

21:22 <enebo> bin[1] would be second offset of bytes and string[1] would be second offset of a character
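
[Editor's sketch] The byte-versus-character indexing distinction described here can be approximated with today's String API. There is no separate Binary type in Ruby; `String#b` (a copy tagged ASCII-8BIT) stands in for the hypothetical `bin` in this illustration.

```ruby
# "h\u00E9llo" is "héllo"; the \u escape guarantees a UTF-8 string.
text = "h\u00E9llo"
bin  = text.b                  # same bytes, tagged ASCII-8BIT

second_char = text[1]          # indexes by character
second_byte = bin[1]           # indexes by byte

puts second_char               # é
puts second_byte.inspect       # "\xC3", the first byte of the two-byte é
puts text.length               # 5 characters
puts bin.length                # 6 bytes
puts text.getbyte(1)           # 195 (0xC3), byte access without re-tagging
```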

21:22 <nirvdrum> https://bugs.ruby-lang.org/issues/13166

21:22 <nirvdrum> That's where I was advocating for it.

21:22 <nirvdrum> Since you can actually screw up your bytes arrays pretty easily if you're not careful.

21:23 <nirvdrum> You basically really need to know how MRI implemented strings.

21:24 <nirvdrum> enebo: This is terribly inefficient, but I don't have the energy to work out this tr helper at the moment: https://github.com/graalvm/truffleruby/pull/574/commits/89385b36297a232510af74cb32915b12b248d798

21:24 <nirvdrum> That seems to fix the String#tr issue.

21:25 <enebo> nirvdrum: why do you pass in 'c'?

21:25 <enebo> oh nvm

21:28 <nirvdrum> But it's silly to check this for every byte after you've already determined it's CR_VALID.

21:28 <enebo> nirvdrum: yeah the asciicompat check is invariant

21:28 <enebo> nirvdrum: and once valid it won't go back right?

21:28 <nirvdrum> The resulting string couldn't, as far as I can tell.

21:29 <nirvdrum> This should be a function of the replacement string's encoding and whether the string to replace is ever matched.

21:29 <enebo> nirvdrum: this is all gross in any case (not specifically your code but the notion of how side-effecty cr is)

21:29 <nirvdrum> I sorta took their macro that you guys seem to have inlined and blew it up to include the current encoding.

21:30 <enebo> you could just look for non-ascii char as a boolean and set cr once at end or something too?

21:30 <enebo> no doubt your ropes even know right

21:31 <enebo> like you know if you have mbc in a rope already just not if result of tr will end up with mbc

21:31 <nirvdrum> I think so, yeah. But this code has a half dozen different exit points.
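
[Editor's sketch] A toy model of the idea floated above: check ASCII compatibility once, track a single "saw a non-ASCII codepoint" boolean during translation, and decide the code range once at the end. This is not the JRuby or TruffleRuby implementation. The method name `toy_tr_code_range`, the codepoint-to-codepoint `mapping` hash, and the `:cr_7bit` / `:cr_valid` symbols are all hypothetical stand-ins for the internal flags.

```ruby
# Toy model only: symbols stand in for the interpreter's internal code range flags.
def toy_tr_code_range(source, mapping)
  # Invariant for the whole call, so it is checked once rather than per byte.
  ascii_compatible = source.encoding.ascii_compatible?
  saw_non_ascii = false

  translated = source.each_codepoint.map do |codepoint|
    out = mapping.fetch(codepoint, codepoint)  # untranslated codepoints pass through
    saw_non_ascii = true if out >= 128
    out
  end

  # 7-bit only when the encoding is ASCII-compatible AND every codepoint is < 128.
  range = (ascii_compatible && !saw_non_ascii) ? :cr_7bit : :cr_valid
  [translated, range]
end

mapping = { "a".ord => "b".ord }
utf8_result  = toy_tr_code_range("abc", mapping)
utf16_result = toy_tr_code_range("abc".encode("UTF-16LE"), mapping)

p utf8_result   # [[98, 98, 99], :cr_7bit]
p utf16_result  # [[98, 98, 99], :cr_valid]
```

The real helper has several exit points, as noted in the conversation, so a single end-of-method decision like this would need each of those paths to funnel through it.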

21:31 <enebo> nirvdrum: yeah no doubt this is squirrelly method too

21:32 <nirvdrum> I have a boundary on it and just expect it to be slow. At some point I'm going to need to tackle it.

21:35 <enebo> heh ours is massive

21:35 bbrowning is now known as bbrowning_away

21:37 <enebo> nirvdrum: so you have all this c0, c, save state too right?

21:40 <nirvdrum> Sorry, not sure what you're asking.

21:40 <enebo> nirvdrum: massive trTransHelper

21:40 <enebo> nirvdrum: just looking at the loop where those checks happen

21:41 <nirvdrum> Oh. We literally copy & pasted from you.

21:42 <enebo> nirvdrum: I am guessing someone decided to overload c to contain whether translation worked or not and the new value

21:42 <nirvdrum> The only thing that's changed is ByteList was replaced with something that looks like ByteList called RopeBuilder.

21:42 <enebo> who would be nobu probably :)

21:42 <enebo> so to save a boolean he added a second int

21:43 <enebo> well I guess I should say original c, translated c, and whether translation worked

21:43 <enebo> since translated c might not work then it is set to -1

21:44 <enebo> this is probably fine but the names and -1 checks make this challenging to read

21:46 <enebo> nirvdrum: if (c < TRANS_SIZE) {

21:47 <enebo> nirvdrum: this confuses me does utf16-le translate 'a' to be a value < 256?

21:47 <enebo> nirvdrum: c is the codepoint of 'a'

21:47 <enebo> as an example

21:48 <enebo> nirvdrum: I naively assumed everything in utf-16le would be above 256

21:48 <enebo> nirvdrum: this is where I feel weak in my knowledge

21:50 <enebo> heh actually ignore that question it probably i > trans_size

21:51 <enebo> everything has more than one path here

21:51 <nirvdrum> I'm getting dinner ready. In and out at the moment.

21:51 <nirvdrum> I've spent approximately 0 amount of time trying to understand this code, though.

21:51 <enebo> nirvdrum: np. I will probably stop working fairly soon...fighting jet lag and feeling a bit weird

21:51 <nirvdrum> I'm not sure I even understand many of the Ruby usages I see of it.

21:52 <enebo> nirvdrum: my main issue with code like this is we kept parity with C version to make future fixes simpler but it is in this we must have all logic paths in one method C-like build around it

21:52 <nirvdrum> Yeah.

21:52 <nirvdrum> The approach is sound, if not ugly.

21:52 <enebo> nirvdrum: so grokking all paths sometimes means you only follow one branch everywhere through one call but maybe not :)

21:53 <enebo> nirvdrum: yeah I am sure it works it is just trying to handle all cases in one place

21:53 <enebo> nirvdrum: which I think speaks volumes to lack of polymorphism

21:55 enebo has quit [Quit: Leaving.]

21:56 enebo has joined #jruby

22:00 <nirvdrum> enebo: If you're not married to maintaining parity, I've started optimizing CR_7BIT cases a bit more: https://github.com/graalvm/truffleruby/pull/563/files

22:00 <nirvdrum> And https://github.com/graalvm/truffleruby/pull/567

22:00 <enebo> nirvdrum: well actual reasonable variable names would be enough for me

22:01 <enebo> c0 is not great for readability

22:02 <enebo> nirvdrum: but possibly

22:08 <nirvdrum> On the case methods, using ^= 0x20 seemed clearer and faster to me than this CType table with a bitmask applied.
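
[Editor's sketch] The `^= 0x20` trick works because ASCII upper- and lowercase letters differ only in bit 0x20. Below is a minimal illustration for 7-bit strings only, not the code from the linked pull requests. The names `ascii_letter?` and `swapcase_7bit` are hypothetical.

```ruby
# True for the byte values of A-Z and a-z.
def ascii_letter?(byte)
  (65..90).cover?(byte) || (97..122).cover?(byte)
end

# Swap case by flipping bit 0x20 on each letter byte.
# Only meaningful when every character is a single 7-bit byte.
def swapcase_7bit(str)
  raise ArgumentError, "7-bit strings only" unless str.ascii_only?

  flipped = str.each_byte.map { |b| ascii_letter?(b) ? (b ^ 0x20) : b }
  flipped.pack("C*").force_encoding(str.encoding)
end

puts swapcase_7bit("Hello, World 42")   # hELLO, wORLD 42
```

The guard matters: on a multi-byte or non-ASCII-compatible string, flipping bits per byte would corrupt characters, which is why this shortcut is tied to the 7-bit case.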

22:11 <nirvdrum> enebo: I don't suppose you've looked at the Unicode casing methods for 2.4 support yet, have you?

22:11 <enebo> nirvdrum: nope

22:12 <enebo> nirvdrum: I think it needs data import but Charlie may have did that

22:12 <enebo> done that

22:12 <enebo> whatever

22:12 <enebo> in joni

22:12 <enebo> err jcodings

22:13 rdubya has joined #jruby

22:13 rdubya has quit [Client Quit]

22:15 rdubya has joined #jruby

22:15 rdubya has quit [Client Quit]

22:15 rdubya has joined #jruby

22:15 rdubya has quit [Client Quit]

22:20 rdubya has joined #jruby

22:20 rdubya has quit [Client Quit]

23:10 <lopex> nirvdrum: for encodings casing is mostly this https://github.com/ruby/ruby/blob/trunk/include/ruby/onigmo.h#L177

23:12 <lopex> and then rb_str_casemap in string.c on ruby side

23:15 rdubya has joined #jruby

23:15 rdubya has quit [Client Quit]

23:57 <nirvdrum> lopex: Thanks. I have no idea what any of those data structures are though.

23:57 drbobbeaty has joined #jruby

23:57 <lopex> nirvdrum: OnigEncodingTypeST is our Encoding class

23:58 <lopex> nirvdrum: all those function pointers translate to our virtual methods one to one