#rubinius on 2014-09-23 — irc logs at freenode.irclog.whitequark.org

2014-09-11 06:49 brixen changed the topic of #rubinius to: 2.2.10 - http://releases.rubini.us/rubinius-2.2.10.tar.bz2 : logs - http://irclog.whitequark.org/rubinius

00:04 |jemc| has quit [Ping timeout: 240 seconds]

00:27 jnh has quit [Quit: Leaving...]

00:57 amsi has quit [Quit: Leaving]

00:58 |jemc| has joined #rubinius

00:59 carlosga_ has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

01:10 houhoulis has joined #rubinius

01:24 meh` has quit [Ping timeout: 272 seconds]

01:47 <|jemc|> sheesh

01:48 <|jemc|> I did what I thought was an optimization, but adding one goto_if_true instruction (that skips _ahead_) tripled my total time spent by the parser

02:05 amclain has joined #rubinius

02:26 <|jemc|> ah, I see - the original behavior was actually losing capture data - that's why it was faster :P

04:26 carlosgaldino has joined #rubinius

05:07 houhoulis has quit [Remote host closed the connection]

05:07 flavio has joined #rubinius

05:11 flavio has quit [Client Quit]

05:44 johnmuhl has quit [Quit: Connection closed for inactivity]

06:00 Benny1992 has quit [Quit: leaving]

06:00 Benny1992 has joined #rubinius

06:46 <|jemc|> heh, *bad* rbx-master

06:46 <|jemc|> Coercion error: "4".to_str => String failed

06:48 <|jemc|> ooh, and again:

06:48 <|jemc|> Coercion error: "\n".to_str => String failed

06:49 <|jemc|> and again

06:50 <|jemc|> I wonder if there is some kind of JIT inconsistency

07:09 flavio has joined #rubinius

07:14 Thijsc has joined #rubinius

07:15 Thijsc has quit [Client Quit]

07:17 <|jemc|> yeah, with -Xjit.show I can see it's happening while String#to_s is being background-compiled by the JIT

07:17 <|jemc|> https://gist.github.com/jemc/e4fa52209c0cb01cdb43

07:17 <|jemc|> (if anyone is curious)

07:18 heftig has quit [Read error: Connection reset by peer]

07:19 <|jemc|> happens consistently for me in a large (or repeated) run - though at an inconsistent location

07:19 <|jemc|> I'll poke around a bit and maybe do a bisect

07:22 heftig has joined #rubinius

07:33 amclain has quit [Quit: Leaving]

07:52 blowmage has quit [Ping timeout: 245 seconds]

07:54 blowmage has joined #rubinius

07:55 flavio has quit [Ping timeout: 246 seconds]

07:55 flavio has joined #rubinius

08:02 noop has joined #rubinius

08:19 elia has joined #rubinius

08:25 diegoviola has joined #rubinius

08:25 noop has quit [Ping timeout: 244 seconds]

08:26 noop has joined #rubinius

08:42 kagaro1 has quit [Ping timeout: 244 seconds]

08:43 kagaro has joined #rubinius

09:23 benlovell has joined #rubinius

09:30 Thijsc has joined #rubinius

09:32 josh-k has joined #rubinius

09:35 benlovell has quit [Ping timeout: 240 seconds]

09:40 benlovell has joined #rubinius

09:43 diegoviola has quit [Quit: WeeChat 1.0]

09:44 benlovell has quit [Ping timeout: 245 seconds]

10:02 meh` has joined #rubinius

10:02 benlovell has joined #rubinius

10:06 <yorickpeterse> morning

10:26 Ngz00 has joined #rubinius

10:28 Thijsc has quit [Ping timeout: 260 seconds]

10:40 postmodern has quit [Quit: Leaving]

10:50 flavio has quit [Ping timeout: 260 seconds]

10:58 josh-k has quit [Ping timeout: 245 seconds]

11:00 josh-k has joined #rubinius

11:03 josh-k has quit [Read error: No route to host]

11:04 josh-k has joined #rubinius

11:05 josh-k has quit [Remote host closed the connection]

11:25 <cremes> morning

11:30 benlovell has quit [Ping timeout: 258 seconds]

11:43 johnmuhl has joined #rubinius

12:19 benlovell has joined #rubinius

12:32 kagaro has quit [Ping timeout: 260 seconds]

12:44 <cpuguy83> brixen: Sorry about all the marketing junk.

13:04 flavio has joined #rubinius

13:52 kagaro has joined #rubinius

13:54 Thijsc has joined #rubinius

14:05 locks has quit [Ping timeout: 272 seconds]

14:06 locks has joined #rubinius

14:06 jeregrine has quit [Ping timeout: 272 seconds]

14:08 jeregrine has joined #rubinius

14:46 elia has quit [Quit: Computer has gone to sleep.]

14:47 elia has joined #rubinius

14:48 |jemc| has quit [Quit: WeeChat 0.4.3]

14:52 <Benny1992> ping: yorickpeterse https://github.com/rubysl/rubysl-pathname/pull/3

14:52 <Benny1992> I defined rubysl-pathname in Gemfile

14:53 <Benny1992> gem 'rubysl-pathname', :github => "rubysl/rubysl-pathname", :branch => "2.0", platforms: :rbx

14:53 <Benny1992> but it's still complaining for accessing private Method Pathname on module Kernel

14:53 <Benny1992> https://gist.github.com/Benny1992/3bf331ce83774140a12a

14:53 <yorickpeterse> Benny1992: Rbx currently prioritizes the bundled rubysl Gems over those installed/updated manually

14:54 <Benny1992> ah ok thx

14:54 <yorickpeterse> so even if you update the Gem, it will still load the version it initially shipped with

14:54 <Benny1992> kk, can we push a new version of rubysl-pathname?

14:54 <yorickpeterse> not sure if we ever fixed that in master

14:54 <yorickpeterse> Benny1992: that still requires a new release of Rbx I believe, not sure

14:54 <yorickpeterse> I can look into this tonight, bit too busy atm

14:54 <Benny1992> hmm ok

14:54 <Benny1992> no problem :)

15:03 benlovell has quit [Ping timeout: 250 seconds]

15:03 <brixen> cpuguy83: no worries

15:03 <brixen> cpuguy83: just slammed the past couple weeks so didn't see the email for 3 days :p

15:09 josh-k has joined #rubinius

15:15 |jemc| has joined #rubinius

15:32 <|jemc|> does the check_interrupts bytecode instruction play a role in allowing the JIT to do its work?

15:34 <|jemc|> that is, if I make a dynamic_method that runs for a while (say, one second), do I need to explicitly generate a check_interrupts instruction to avoid JIT-related errors like the ones I'm seeing?

15:34 <|jemc|> or is check_interrupts unrelated to the JIT?

15:49 <|jemc|> inserting a check_interrupts instruction before the to_s call seems to _avert_ the problem but I'm not sure if that's causal or just circumstantial

16:03 benlovell has joined #rubinius

16:10 benlovell has quit [Ping timeout: 244 seconds]

16:17 carlosgaldino has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

16:25 noop has quit [Ping timeout: 245 seconds]

16:26 noop has joined #rubinius

16:32 carlosgaldino has joined #rubinius

16:38 flavio has quit [Quit: WeeChat 0.4.1]

17:09 josh-k has quit [Remote host closed the connection]

17:10 josh-k has joined #rubinius

17:14 josh-k has quit [Ping timeout: 244 seconds]

17:47 benlovell has joined #rubinius

17:57 <brixen> |jemc|: what JIT-related errors are you seeing?

18:08 <|jemc|> methods that become incorrect when the JIT hits them - I was just now preparing a second part to my earlier gist

18:08 <|jemc|> this part I'm seeing problems in parts where I'm not using dynamic_method

18:09 <|jemc|> (or using any loops - as far as I can tell check_interrupts is intended to be generated in loops)

18:11 crowell has joined #rubinius

18:11 benlovell has quit [Ping timeout: 272 seconds]

18:11 <crowell> quick question. is there currently a working disassembler for .rbc files?

18:12 <|jemc|> brixen: for example, my Myco::Component#new method becomes incorrect right after I see the line:

18:12 <|jemc|> [[[ JIT finished background compiling ANONYMOUS#new (block) ]]]

18:13 <|jemc|> but again, working on providing some context for you in a gist

18:15 Thijsc has quit [Ping timeout: 246 seconds]

18:17 <|jemc|> crowell: afaik - no, but that doesn't mean it's not easy for someone to piece together what your program is doing

18:18 <crowell> |jemc|: ok, thanks, maybe I'll have to write one up myself :\

18:19 <brixen> crowell: yes, it's called the loader :)

18:20 <crowell> brixen: but does it dump opcodes?

18:20 <brixen> what do you mean by dump?

18:20 <crowell> like can i print the disassembly to stdout or to a file?

18:21 <brixen> ah, are you asking if there's a disassembler from bytecode to Ruby code?

18:21 <crowell> brixen: no. I mean from bytecode to this stuff http://rubini.us/doc/en/virtual-machine/instructions/

18:22 <crowell> all the way back to ruby code would be nice, but that is a stretch goal

18:22 <crowell> similar to how python has 'dis'

18:22 <crowell> http://pymotw.com/2/dis/

18:23 <brixen> I have a PoC for a disassembler

18:23 <brixen> the link you gave is to the docs for the bytecode

18:23 <brixen> a .rbc file is just a serialized tree of compiled code objects

18:23 <brixen> compiled code objects have an instruction sequence object

18:23 <brixen> which is just a stream of bytecode

18:24 <brixen> so, if you load a .rbc file, you get a tree of compiled code objects

18:24 <brixen> if you have a compiled code object, you can print the bytecode with 'puts cc.decode'

18:24 <brixen> is this what you're asking?

18:26 <brixen> eg, the Rubinius kernel (Ruby core classes) are loaded at boot from .rbc files

18:26 <brixen> hence https://gist.github.com/brixen/6a5da1146dcfa51edcf7

18:27 <crowell> brixen: perfect, that's exactly what I was looking for

18:30 <brixen> crowell: ok, cool

18:31 <crowell> the docs for the instructions leave a bit to be desired, but that's what I was looking for

18:33 <brixen> crowell: what's missing?

18:34 <|jemc|> crowell: you can see the implementation of each instruction alongside those same docs if you look in vm/instructions.def

18:34 <Benny1992> ping: yorickpeterse

18:35 <Benny1992> currently adding tests to rubysl-pathname, but the tests fail because the changes I made are not seen, is this because rbx prefers the bundled gem?

18:35 <crowell> ok, cool. just that web page was a bit sparse

18:35 <Benny1992> so the same problem as previous

18:36 <brixen> crowell: if you can be more specific, I may be able to answer a question

18:36 <brixen> "a bit sparse" is not actionable

18:36 <brixen> also, we generate the docs from vm/instructions.def, so you can send a PR as well

18:36 <crowell> brixen: I actually don't have the file I'm trying to disassemble in front of me, so I can't ask any specifics now. but I have enough to get started

18:36 <brixen> ok

18:37 <brixen> Benny1992: you need to set RUBYOPT=lib to have rbx pick up the gem's files

18:37 <brixen> Benny1992: le'me push an updated .travis.yml for that repo

18:38 <Benny1992> brixen: okay thx :)

18:39 <brixen> Benny1992: er, sorry, RUBYLIB not RUBYOPT
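
The effect of RUBYLIB described above can be checked with a small sketch. It assumes a standard Ruby install where `RbConfig.ruby` points at the running interpreter; the helper name `load_path_with_rubylib` is made up for illustration, and the log does not show how the rubysl-pathname specs are actually invoked.

```ruby
require 'rbconfig'
require 'open3'

# Start a child interpreter with RUBYLIB set and return its $LOAD_PATH entries.
# (Hypothetical helper, not part of Rubinius or rubysl.)
def load_path_with_rubylib(dir)
  output, status = Open3.capture2({ 'RUBYLIB' => dir }, RbConfig.ruby, '-e', 'puts $LOAD_PATH')
  raise 'child interpreter failed' unless status.success?

  output.lines.map(&:chomp)
end

paths = load_path_with_rubylib('lib')
# Directories from RUBYLIB are searched before the default library paths,
# which is why a gem's working copy in ./lib can shadow a bundled version.
puts paths.first
```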

18:39 <Benny1992> awesome works like charm :)

18:40 <brixen> Benny1992: https://gist.github.com/brixen/ffff14eb00cbd14c21be

18:40 <brixen> Benny1992: oh cool

18:43 <brixen> Benny1992: are you working on the specs for #4 ?

18:43 erdic has quit [Ping timeout: 260 seconds]

18:45 erdic has joined #rubinius

18:45 <Benny1992> yep :)

18:45 <Benny1992> https://github.com/rubysl/rubysl-pathname/pull/4

18:45 <Benny1992> done

18:47 amsi has joined #rubinius

18:51 <brixen> Benny1992: you can use simpler fixtures in those specs

18:52 <yorickpeterse> Benny1992: I believe you have to run it with RUBYLIB=.:lib or something like that

18:52 <yorickpeterse> oh, brixen beat me to it

18:52 <brixen> we don't pollute the global namespace with stuff like class RootPath

18:52 <Benny1992> yorickpeterse: thx brixen said it already :D

18:52 <Benny1992> brixen: okay

18:52 <brixen> Benny1992: see http://rubyspec.org/style_guide/ under 1.1 Utility Classes

18:53 <Benny1992> okay thx will give it a read

18:53 <brixen> Benny1992: in those specs, you don't need a class

18:53 <brixen> just use a mock

18:53 <brixen> obj = mock(); obj.should_receive(:to_str).and_return("whatever")
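
The one-liner above relies on MSpec's `mock`. Outside a spec run, the same idea (a throwaway object that answers `to_str`, with no top-level class defined) can be sketched in plain Ruby. This is only an illustration of implicit string coercion; it uses `String#+` as the consumer and is not taken from the pull request under discussion.

```ruby
# A one-off object that responds to to_str; nothing is added to the
# global namespace, unlike a fixture such as `class RootPath`.
obj = Object.new
def obj.to_str
  'whatever'
end

# String#+ accepts any argument that implicitly converts via to_str.
joined = 'prefix/' + obj
puts joined
```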

18:54 <Benny1992> okay, will change it :)

18:54 <|jemc|> brixen: this one is turning out to be a bit of a heisenbug (my code is not threaded, but I think it's JIT-related non-determinism) but I think I should be able to help you reproduce the to_s one if you feel like rake installing myco and running a basic script from my other repo.

18:54 <|jemc|> https://gist.github.com/jemc/e4fa52209c0cb01cdb43

18:55 <brixen> |jemc|: are you running on master?

18:55 <|jemc|> brixen: yes, just pulled this morning the latest and ran again

18:55 <brixen> there's an open PR for a JIT bug

18:55 <|jemc|> oh really

18:55 * |jemc| looks

18:58 <|jemc|> brixen: are you talking about this one? https://github.com/rubinius/rubinius/issues/3114

19:02 <|jemc|> because that looks like it could be causing the second issue I ran into, but not the first one (with String#to_s)

19:02 <Benny1992> brixen: https://github.com/Benny1992/rubysl-pathname/commit/845d5f9d5eef81935f0d30aeabb321c0769d132b better?

19:27 amsi has quit [Ping timeout: 245 seconds]

19:28 noop has quit [Ping timeout: 260 seconds]

20:09 Thijsc has joined #rubinius

20:10 diegoviola has joined #rubinius

20:22 elia has quit [Quit: Computer has gone to sleep.]

20:25 amsi has joined #rubinius

20:27 elia has joined #rubinius

20:28 elia has quit [Client Quit]

20:36 diegoviola has quit [Remote host closed the connection]

20:45 elia has joined #rubinius

20:46 elia has quit [Client Quit]

20:53 diegoviola has joined #rubinius

20:59 <yorickpeterse> ok Ruby trivia:

20:59 <yorickpeterse> Given I have a Hash with find values as the keys, and replacements as the values

21:00 <yorickpeterse> What's the most efficient way of running a find-replace using that table, using the least amount of String#gsub calls

21:00 <yorickpeterse> the most basic form is find_replace.each { |find, replace| input = input.gsub(find, replace) }

21:00 <yorickpeterse> That however is slow as sin

21:00 <yorickpeterse> (we're talking about running this thousands/millions of times potentially)

21:00 elia has joined #rubinius

21:01 <yorickpeterse> Hm, I think I _might_ be able to compile a clever regexp for this

21:01 <yorickpeterse> hm no, that's impossible

21:02 elia has quit [Client Quit]

21:06 <headius> is it?

21:07 <yorickpeterse> Adding that to this parsing setup slows it down by ~5.5 times

21:07 <yorickpeterse> It would have to run for every text node in the document

21:07 <yorickpeterse> which can potentially be a lot of nodes

21:08 <yorickpeterse> Time wise this 10MB XML file that I auto generated goes from 0.02 seconds to 480 ms

21:08 <headius> you might speed that each loop up by iterating over keys

21:08 <yorickpeterse> (parsing time)

21:09 <chrisseaton> Why not move through the string once with a state machine?

21:09 <chrisseaton> That would be optimal

21:09 <headius> I was thinking a regexp that is all of the keys to replace and a block passed to gsub...then you look up the keys as they're found and return replacement

21:10 <yorickpeterse> so context: I need to replace certain XML entities (e.g. &lt;) with their equivalents (< in this case)

21:11 postmodern has joined #rubinius

21:11 <yorickpeterse> The mapping is basically: { '&lt;' => '<', '&gt;' => '>', '&amp;' => '&' }

21:11 <yorickpeterse> I _can_ do this lazily at the very end of the parsing chain (basically find/replace upon access), but I'm curious if I can do it earlier on
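
headius's suggestion (one regexp built from all the keys, plus a block that looks up each match) might look like the sketch below. It assumes the table maps the entities `&lt;`, `&gt;` and `&amp;` to their characters; the names `ENTITY_TABLE`, `ENTITY_PATTERN` and `decode_entities` are invented here, and no claim is made about how it performs in the parser being discussed.

```ruby
# Assumed find/replace table: XML entity => literal character.
ENTITY_TABLE = { '&lt;' => '<', '&gt;' => '>', '&amp;' => '&' }.freeze

# One alternation that matches any key of the table.
ENTITY_PATTERN = Regexp.union(ENTITY_TABLE.keys)

# A single gsub call; the block is evaluated once per match.
# String#gsub also accepts the hash directly: input.gsub(ENTITY_PATTERN, ENTITY_TABLE)
def decode_entities(input)
  input.gsub(ENTITY_PATTERN) { |match| ENTITY_TABLE[match] }
end

puts decode_entities('&lt;foo&amp;')
```

Because the input is scanned once, replaced text is never rescanned: `&amp;lt;` becomes `&lt;` rather than `<`, whereas a loop of successive gsub calls can decode it twice depending on the order of the keys.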

21:14 diegoviola has quit [Quit: WeeChat 1.0]

21:16 <chrisseaton> you already have a state machine in the lexer right? why not make it recognise and expand the entities?

21:18 <yorickpeterse> That requires emitting separate tokens, which I then have to stitch back together (elements can only contain a single text node)

21:18 <yorickpeterse> I tried that actually, it's way slower than the above loop

21:18 <yorickpeterse> it results in more string allocations, only for those to be stitched back together

21:19 <yorickpeterse> so for example, for the string "&lt;foo&amp;" you'd normally have 1 allocation

21:19 <yorickpeterse> However, if you emit stuff separately you'd now have 3 allocations

21:19 <yorickpeterse> ("<", "foo" and "&")

21:19 <yorickpeterse> then you smack them together for a 4th allocation

21:20 <yorickpeterse> so basically O(N*4) vs O(N)

21:20 <chrisseaton> I don't mean new tokens - I mean you're already going through a string and copying character by character to create the token string in the lexer aren't you? so why not copy and expand at the same time?

21:20 * yorickpeterse finally gets to use the big O

21:20 <chrisseaton> O(N*4) is exactly the same thing as O(N)

21:20 <yorickpeterse> Oh no, the lexer doesn't do that

21:20 <yorickpeterse> it operates on byte ranges

21:20 <yorickpeterse> plus it's shared between C/Java, so doing find/replacements there is a total pain

21:20 <chrisseaton> ah ok
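
The "copy and expand at the same time" idea chrisseaton describes can be sketched as a standalone single pass over a string. This is not the lexer discussed above (which works on byte ranges and is shared between C and Java); it is a pure-Ruby illustration using the standard `strscan` library, with the entity table and the names `ENTITY_MAP` and `expand_entities` assumed for the example.

```ruby
require 'strscan'

# Assumed entity table for the illustration.
ENTITY_MAP = { '&lt;' => '<', '&gt;' => '>', '&amp;' => '&' }.freeze

# Walk the input once, copying plain text and expanding known entities
# into a single output buffer.
def expand_entities(input)
  scanner = StringScanner.new(input)
  output = String.new(encoding: input.encoding)

  until scanner.eos?
    if (text = scanner.scan(/[^&]+/))
      output << text                      # run of ordinary characters
    elsif (entity = scanner.scan(/&(?:lt|gt|amp);/))
      output << ENTITY_MAP.fetch(entity)  # known entity, expanded in place
    else
      output << scanner.getch             # a lone "&" that starts no known entity
    end
  end

  output
end

puts expand_entities('&lt;foo&amp;')
```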

21:21 <yorickpeterse> errr derp you're right

21:21 <yorickpeterse> (regarding the big O stuff)

21:21 <yorickpeterse> see, I never use it :P

21:42 elia has joined #rubinius

21:46 <yorickpeterse> headius: does ByteList in JRuby allow me to find/replace bytes?

21:46 <yorickpeterse> crap wait I can't use that, that means I'd have to check things for every token

21:46 <yorickpeterse> darn

21:47 <headius> yorickpeterse: sure

21:47 <yorickpeterse> I guess having something like String#tr supporting multiple character replacements (opposed to 1 char being replaced with another single char) would be nice here

21:48 <headius> can't you turn the keys into a regexp and try what I suggested? gsub + block will hurt, but unless you expand as you lex I'm not sure how to avoid that

21:49 <yorickpeterse> That would still result in multiple string allocations

21:49 <yorickpeterse> In fact, I think the block form would result in one allocation for every match

21:49 <yorickpeterse> unless it only evaluates the block once for all matches

21:54 _elia has joined #rubinius

21:55 Thijsc has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

21:56 elia has quit [Ping timeout: 245 seconds]

21:56 <headius> yorickpeterse: it would

21:56 <headius> but at least the underlying byte[] would be shared across those instances

21:59 _elia has quit [Ping timeout: 245 seconds]

22:31 <yorickpeterse> brixen: in Rbx, do we allocate a new string when calling String#gsub! ?

22:32 <yorickpeterse> That is, is it basically `new_str = dup.gsub(....); replace(new_str)`, or does it truly modify the current string in-place?

22:32 <yorickpeterse> hm, I think I can actually test that!

22:37 <yorickpeterse> oh, string literals seem to bypass String#initialize in rbx

22:37 <yorickpeterse> wtf

22:46 <yorickpeterse> bah, I just want to measure how many Strings are created, this is stupid difficult in both MRI and Rbx

22:46 <yorickpeterse> MRI has TracePoint but lol of course that doesn't work when creating strings

22:47 <yorickpeterse> and Rbx just bypasses String#new for literals :/
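
For the MRI half of the question, one rough way to see how many strings a piece of code creates is to compare `ObjectSpace.count_objects[:T_STRING]` before and after it with the GC switched off. This is a sketch for MRI only (the method is documented as specific to C Ruby and does nothing for Rbx or JRuby); the helper name `string_slots_allocated` is invented, and the number includes every temporary string the block creates, not just the result.

```ruby
# MRI only: counts how many T_STRING heap slots appear while the block runs.
# (Hypothetical helper; the figure is approximate.)
def string_slots_allocated
  GC.start      # settle the heap first
  GC.disable    # nothing is collected while we count
  before = ObjectSpace.count_objects[:T_STRING]
  yield
  ObjectSpace.count_objects[:T_STRING] - before
ensure
  GC.enable
end

input = '&lt;foo&amp;' * 100
copying  = string_slots_allocated { input.gsub('&amp;', '&') }
in_place = string_slots_allocated { input.gsub!('&amp;', '&') }
puts "gsub: #{copying} strings, gsub!: #{in_place} strings"
```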

22:47 <yorickpeterse> (╯°□°)╯︵ ┻━┻

22:49 <headius> you should be able to get a count on JRuby by passing flag -J-Xrunhprof:depth=0

22:50 <headius> that enables JVM-level object profiling... depth=0 makes it not accumulate allocation backtraces to speed up the data gathering

22:50 <headius> it should be pretty similar across impls if we have mostly the same logic

22:51 <yorickpeterse> headius: I also need this on MRI and Rbx though, I need to see if String#gsub! allocates more than I want there

22:51 <yorickpeterse> hmpf, object allocation tracking doesn't appear to work either

22:52 <headius> yeah, I don't know how different it will be between jruby and MRI since we largely have the same logic

22:54 <yorickpeterse> meh, I need to dig in to this when I'm actually awake. Toodles

22:55 <yorickpeterse> oh derp that's right, allocation tracking also requires a compile time flag

22:55 <yorickpeterse> ugh

22:55 <yorickpeterse> see, I need sleep, laters

22:56 elia has joined #rubinius

22:56 |jemc| has quit [Read error: Connection reset by peer]

23:01 elia has quit [Quit: Computer has gone to sleep.]

23:03 josh-k has joined #rubinius

23:08 josh-k has quit [Ping timeout: 250 seconds]

23:17 diegoviola has joined #rubinius

23:35 elia has joined #rubinius

23:42 josh-k has joined #rubinius

23:51 elia has quit [Quit: Computer has gone to sleep.]