<darix>
as you can see the configure should be running in UTF-8 locale too
<darix>
it works when building with MRI
<Ngz00>
Forgive me, I am new to dev ops.. just a ruby developer who needs to get real concurrency. I'm trying to create a new image with Rubinius, however, I'm lost as to which directory is preferred for Rubinius installations and how to set up preferences in regards to Ruby vs RBX.
<darix>
it is basically the same as the debian package uses.
<darix>
heftig: hahaha i have an idea
<darix>
heftig: our rubinius package doesnt set up /usr/bin/gem yet. so i had to tell it which is the real path to gem.rbx2.2 :)
<darix>
RTFS helped
<darix>
Ngz00: in theory my rpm should build on centos too. but it isnt in my focus right now.
<Ngz00>
darix: What is an 'rpm'? Hah, I'm quite fresh on dev ops.
<darix>
rpm is what your distro uses for pkg mgmt
<heftig>
darix: ah, right. We have the /usr/bin/gem -> rbx symlink in a package here that I install for building rubinius
<darix>
heftig: my normal ruby packages have that too. but I didn't do it for rubinius yet.
<darix>
and right now i just want to break the build cycle :)
<Ngz00>
Hm, I'm getting this message '/usr/bin/ld: cannot find -lruby' on the 'bundle install' section of the Rubinius installation. Is this an issue with the make file or am I missing a dependency
<Ngz00>
I'm assuming that it's because the Linux AMI came with Ruby installed and that GCC or whatnot cannot find the library?
<darix>
heftig: a little bit further but calling mspec calls it with plain "env ruby"
<darix>
you need ruby-devel
enriclluelles has quit [Remote host closed the connection]
<Ngz00>
I did sudo yum -y install ruby-devel
<Ngz00>
But I'm not sure which paths the makefile is referencing
<darix>
Ngz00: just curious ... what is the program you are trying to run with rubinius later?
<Ngz00>
Trying to do some cpu intensive processing with ruby scripts
Ori_P has joined #rubinius
enriclluelles has joined #rubinius
<cremes>
Ngz00: make sure you do some representative benchmarks and compare across mri, rbx and jruby. i recommend the benchmark_suite gem for this purpose (it can run benches on all 3 during the same run and directly compare).
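A minimal sketch of the kind of cross-implementation measurement suggested here, using only the standard library `benchmark` module rather than the benchmark_suite gem (whose API is not shown in this log). The `cpu_work` method and its workload size are hypothetical stand-ins for the real processing; the same script would be run once under each of MRI, rbx and JRuby and the printed timings compared.

```ruby
require "benchmark"

# Hypothetical CPU-bound workload standing in for the real processing.
def cpu_work(n)
  (1..n).reduce(0) { |sum, i| sum + i * i }
end

# RUBY_ENGINE identifies the implementation running the script.
engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"

# Warm up first so implementations with a JIT are not measured cold.
3.times { cpu_work(200_000) }

elapsed = Benchmark.realtime { 5.times { cpu_work(200_000) } }
puts format("%s: %.3fs for 5 runs", engine, elapsed)
```

A single timing like this is only a rough first look; a representative benchmark should use the real data and several repetitions.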
<Ngz00>
cremes: Oh wow, I'll check that out
<Ngz00>
Thanks :D
<cremes>
Ngz00: yeah, it was written by the rubinius guys to make sure perf was measured right. benchmarks are hard!
Ngz00 has quit [Ping timeout: 250 seconds]
heftig has quit [Ping timeout: 260 seconds]
heftig has joined #rubinius
postmodern has quit [Quit: Leaving]
Ori_P has quit [Ping timeout: 245 seconds]
meh` has joined #rubinius
Ori_P has joined #rubinius
DireFog has quit [Ping timeout: 260 seconds]
DireFog has joined #rubinius
benlovell has quit [Ping timeout: 260 seconds]
yxhuvud has quit [Remote host closed the connection]
Ngz00 has joined #rubinius
benlovell has joined #rubinius
<Ngz00>
Hallo. Is using processes the only way to get parallel scheduling?
houhoulis has joined #rubinius
yxhuvud has joined #rubinius
houhoulis has quit [Remote host closed the connection]
dzhulk has quit [Quit: Leaving.]
Ori_P has quit [Quit: Computer has gone to sleep.]
<Bish>
when godes rubinius create those precompiled rbc files?
<darix>
Bish: i dont know
<Bish>
last_msg.gsub(/godes/,"does")
<Bish>
darix, why reply then :D
<darix>
Bish: i thought you referred to my problem
<Bish>
oh my bad then, sorry :)
dberlinger has joined #rubinius
Ori_P has quit [Ping timeout: 245 seconds]
<Ngz00>
Anyone have any insight on how to fully utilize the CPU with Ruby?
Ori_P has joined #rubinius
<darix>
Ngz00: it depends
<darix>
Ngz00: are you processing large amounts of data? can you run with multiple processes as well, or does it have to be within the same process because of sharing data?
<Ngz00>
darix: It doesn't appear that Rubinius and the Thread class are giving me real parallelization. Should I be using processes?
<darix>
those are pretty much language independent questions.
<darix>
Ngz00: it depends.
<Ngz00>
Darix: I can definitely chunk it up. I'm aggregating data for projects over multiple years, but each project, or set of projects could be a new process
<Ngz00>
Darix: They do not share any data
<darix>
Ngz00: if you don't have to parallelize the processing of a single project, that is an option yes.
<Ngz00>
darix: Can you elaborate? I would like them all to run in parallel and finish as soon as possible, with minimal execution blocking.
<headius>
if you're running parallel threads and it's not using the CPU you expect, it may be memory pipeline-bound
<headius>
how are you breaking up the work?
<Ngz00>
I'm spinning up a new thread for each project and then joining them all together.
<Ngz00>
They seem to block as it takes far longer than I would expect for them to finish.
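The code Ngz00 shared is not part of this log. As a rough sketch of the pattern described (one thread per project, then joining them all), assuming a hypothetical `aggregate` method and made-up project data:

```ruby
# Hypothetical stand-in for aggregating one project's data.
def aggregate(project)
  project[:hours].reduce(0) { |sum, h| sum + h }
end

# Made-up sample data; the real projects are not shown in the log.
projects = [
  { name: "a", hours: [1, 2, 3] },
  { name: "b", hours: [4, 5] },
  { name: "c", hours: [6] }
]

# One thread per project; the threads share no mutable state.
threads = projects.map do |project|
  Thread.new(project) { |p| [p[:name], aggregate(p)] }
end

# Thread#value joins the thread and returns the block's result.
totals = threads.map(&:value).to_h
puts totals.inspect
```

Passing the project into `Thread.new` as an argument gives each thread its own block-local reference instead of relying on a captured outer variable.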
<headius>
can you show us the code that spins up the threads to do work?
Ori_P has quit [Quit: Computer has gone to sleep.]
<headius>
Ngz00: is this the first thing you've tried in rbx?
<Ngz00>
headius: Yes
<headius>
you might want to confirm that perf on rbx for non-threaded runs meets with your expectations
<headius>
if that's good, then confirm that all your threads are actually running at the same time
<Ngz00>
headius: What do you mean by non-threaded runs?
<Ngz00>
headius: Is that a proper implementation of threads?
<headius>
well, if you run this on MRI it basically runs the same threaded or unthreaded and finishes in X seconds...so you should try it in rbx unthreaded to get a baseline for your threaded run
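The baseline comparison described here could look roughly like the following sketch. The `crunch` method and the job sizes are hypothetical; the idea is only to time the same work once sequentially and once with one thread per job, on the same implementation.

```ruby
require "benchmark"

# Hypothetical CPU-bound task; replace with the real per-project work.
def crunch(n)
  (1..n).reduce(0) { |sum, i| sum + i % 7 }
end

jobs = Array.new(4) { 300_000 }

# Baseline: run every job one after another on the main thread.
sequential = Benchmark.realtime { jobs.each { |n| crunch(n) } }

# Threaded: one thread per job, then wait for all of them.
threaded = Benchmark.realtime do
  jobs.map { |n| Thread.new { crunch(n) } }.each(&:join)
end

puts format("sequential: %.3fs, threaded: %.3fs", sequential, threaded)
```

If the two numbers are close on an implementation that can run threads in parallel, the work is probably being serialized somewhere (a lock, I/O, or memory bandwidth) rather than limited by the CPU.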
<headius>
the code you showed seems fine, they should run in parallel given that
Ori_P has joined #rubinius
<Ngz00>
headius: I see, I've tried it on MRI but I haven't actually benchmarked. They seem to all execute initially at the same time, but then there's a large hang time between the finishing of each thread. I believe that they're getting blocked during the labor aggregation, however, I'm only using static methods and not sharing any references so I don't understand. I'll run some benchmarks with a smaller data set and see what results I get. Thank you for your help and patience.
<Ngz00>
headius: Last question, do you believe it's worth looking into separate independent processes?
<headius>
if these processes don't have to communicate much, it may be another good path to parallelism
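For independent projects, the process-based route might be sketched as below, assuming a Unix-like platform where `fork` is available (it is not on Windows or JRuby). The `aggregate` method and the sample data are hypothetical; each child sends its single result back to the parent over a pipe.

```ruby
# Hypothetical stand-in for aggregating one project's data.
def aggregate(hours)
  hours.reduce(0) { |sum, h| sum + h }
end

# Made-up sample data; the real projects are not shown in the log.
projects = { "a" => [1, 2, 3], "b" => [4, 5] }

# Fork one child per project; each child writes its total to a pipe.
children = projects.map do |name, hours|
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    writer.puts(aggregate(hours))
    writer.close
    exit!(0) # skip at_exit handlers inherited from the parent
  end
  writer.close # the parent only reads
  [name, pid, reader]
end

# Collect each child's result, then reap the child process.
totals = children.each_with_object({}) do |(name, pid, reader), result|
  result[name] = reader.read.to_i
  reader.close
  Process.wait(pid)
end
puts totals.inspect
```

Because the children share nothing with each other, this sidesteps any interpreter-level locking, at the cost of having to serialize results across the process boundary.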
<headius>
I'm not married to threads or processes for any task...threads just make it easier to coordinate
<Ngz00>
headius: Awesome, thank you again.
<headius>
no problem
<headius>
yorickpeterse: I'm curious...have you confirmed in some way that nokogiri problems are due to thread-unsafety in libxml?
<headius>
I thought putting a lock around nokogiri downcalls didn't fix it, so I'm confused by that
Ori_P has quit [Quit: Computer has gone to sleep.]
Ori_P has joined #rubinius
mjc__ is now known as mjc
mjc is now known as mjc_
noopq has quit [Ping timeout: 255 seconds]
Ori_P has quit [Quit: Computer has gone to sleep.]
Ori_P has joined #rubinius
<yorickpeterse>
headius: 100% sure
<yorickpeterse>
and the problem doesn't strictly occur when querying documents or parsing them, so a lock around that would not help
<headius>
hmm
<headius>
but wouldn't that prevent libxml from being run in parallel?
<headius>
I'm just confused because we never had concurrency bugs reported for ffi-nokogiri, which was also based on libxml
<headius>
that was a long time ago though
<yorickpeterse>
headius: a bug not being reported doesn't mean it doesn't exist
<yorickpeterse>
also the problem occurs in the custom libxml free() function of nokogiri, no idea if ffi-nokogiri had that
<headius>
it just wrapped libxml...this free function is from nokogiri?
<headius>
I guess I would like to understand exactly how libxml is thread-unsafe
Ori_P has quit [Quit: Computer has gone to sleep.]
Ori_P has joined #rubinius
Ori_P has quit [Client Quit]
benlovell has quit [Ping timeout: 260 seconds]
<yorickpeterse>
darix: euuhhh...
<yorickpeterse>
I have no idea :/
<yorickpeterse>
that looks like corrupted gemspecs, but I'm not sure
<darix>
yorickpeterse: those are the gemspecs from the tarball.
<darix>
it can be that this part is run in C locale again
<headius>
yorickpeterse: is it #1047 that has the appropriate nokogiri/rubinius discussion
<headius>
?
enriclluelles has quit [Remote host closed the connection]
elia has quit [Quit: Computer has gone to sleep.]
dzhulk has joined #rubinius
<headius>
yorickpeterse: so after reading through again...I guess the theory is that the data pointer has been freed by libxml independent of the data freeing logic, which causes the latter to segv
<headius>
what I'm confused about is how that would happen if nokogiri downcalls were happening under lock
<caio_oliveira>
will there be any future support from rubinius to LLVM 3.5?
<headius>
you mean will 3.5 be supported? I think there's a PR for that right now
<caio_oliveira>
really
<caio_oliveira>
nice
<yorickpeterse>
headius: Yes, 1047 is the issue
<yorickpeterse>
LLVM 3.5 is supported in master
djb has joined #rubinius
dberlinger has quit [Ping timeout: 255 seconds]
djb is now known as dberlinger
Ngz00 has quit [Ping timeout: 246 seconds]
<headius>
yorickpeterse: nokogiri appears to provide its own free functions for all data nodes it wraps...does the error occur inside the nokogiri free funcs?
nirix has quit [Ping timeout: 245 seconds]
<headius>
the free func for the node wrapper doesn't do anything, for example
cezarsa_ has quit [Read error: Connection reset by peer]
guilleiguaran___ has quit [Ping timeout: 272 seconds]
nwjsmith____ has quit [Read error: Connection reset by peer]
cezarsa__ has joined #rubinius
nwjsmith_____ has joined #rubinius
guilleiguaran___ has joined #rubinius
nirix has joined #rubinius
benlovell has joined #rubinius
<yorickpeterse>
headius: the GH issue explains that
havenwood has quit []
<yorickpeterse>
(or should otherwise clarify that)
<headius>
actually it doesn't
<headius>
the last comment from brixen talks about turning off concurrent GC and adding a lock around nokogiri
<headius>
well, the last significant comment anyway
<headius>
an earlier comment talks about a double free but it doesn't provide any evidence that it's libxml doing an unexpected free
havenwood has joined #rubinius
benlovell has quit [Ping timeout: 240 seconds]
<headius>
I may be missing the key evidence here...trying to consume all info in the issue
caio_oliveira has quit [Quit: ChatZilla 0.9.90.1 [Firefox 32.0.1/20140911151253]]
<headius>
yorickpeterse: I guess I just don't understand how libxml or nokogiri thread-safety could be coming into play if the native side is never running in parallel
<headius>
that seems impossible to me
<headius>
Or put differently: if putting a lock around nokogiri native calls and turning off concurrent GC did not fix the problem, how is it a thread-safety issue in nokogiri or libxml?
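To illustrate what "a lock around native calls" means at the Ruby level, here is a minimal sketch. It is not the Rubinius C-API lock discussed in the issue, and `native_parse` is a hypothetical pure-Ruby stand-in for a call that would cross into C; the point is only that every caller goes through one mutex, so no two such calls overlap.

```ruby
# Hypothetical stand-in for a call that crosses into native code.
def native_parse(input)
  input.upcase
end

# Every call into the "native" side is funnelled through one mutex,
# so at most one thread is inside native_parse at any time.
def locked_parse(lock, input)
  lock.synchronize { native_parse(input) }
end

lock = Mutex.new

results = Array.new(4) do |i|
  Thread.new { locked_parse(lock, "doc#{i}") }
end.map(&:value)

puts results.inspect
```

With this arrangement the Ruby threads can still run in parallel elsewhere, which is why a crash that persists under such a lock points away from simple concurrent entry into the library.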
<yorickpeterse>
it's also literally in the gdb output
<yorickpeterse>
also disabling concurrent GC doesn't solve the problem
<headius>
does the lock around nokogiri fix it?
<yorickpeterse>
No, the issue still persists
<yorickpeterse>
The issue occurs far less often once you drop the number of threads to just the main thread (at least in our production apps)
<headius>
so how is it a thread-safety problem then
<headius>
you have no nokogiri or libxml native code running in parallel
<headius>
or rather, how is it a thread-safety problem in those libraries, when you're guaranteeing no code runs in parallel?
<yorickpeterse>
hm lets see: 1 thread: problem doesn't occur very often
<yorickpeterse>
10 threads: problem occurs 30 seconds in
<headius>
if the native library is not running in parallel and rubinius is running in parallel, I think that would indicate the problem is in rubinius
<headius>
regardless of how many threads you throw at the "safe" configuration, the native code is serialized at the Ruby API boundary and nothing runs during those downcalls
<yorickpeterse>
That sentence makes zero sense
<headius>
no, it makes perfect sense
<yorickpeterse>
Because rbx runs things in parallel that means Rbx is bugged? What?
<headius>
the code you claim is not thread-safe is not running in parallel
<headius>
while Rubinius itself does run parallel
<headius>
ergo, if more threads causes the problem, I would suspect Rubinius first
<yorickpeterse>
That sentence still makes zero fucking sense as to why it would be rbx
<headius>
why would it be nokogiri/libxml?
<headius>
You're saying the problem happens more often with more threads, right? And with the lock and non-concurrent GC, nokogiri and libxml native code shouldn't be running in parallel, right?
<yorickpeterse>
Because 1) the problems occur in Nokogiri specific code, in the mark/free stuff that it adds 2) I've not seen this problem with any other Gem we're using in the past 10 months
<headius>
so...how is it still a thread-safety issue in nokogiri or libxml?
<yorickpeterse>
Also, the Rbx immix lock does not control whatever libxml does under the hood
<yorickpeterse>
it simply ensures the immix GC doesn't do mark/sweep concurrently
<headius>
unless libxml is spinning up threads that free objects, I'd say it's pretty unlikely it's freeing things in another thread
<yorickpeterse>
The problem occurs before it can do its shit as Nokogiri's mark() function by that time is already dealing with crap
<yorickpeterse>
I don't know how brixen came up with the CAPI lock fixing things, because it doesn't
<headius>
ok, then I'm back to my original question: if nothing in libxml is running in parallel, where's the evidence that this is a thread-safety issue in libxml?
<yorickpeterse>
I already told you that
<headius>
you're not getting my point
<yorickpeterse>
Read above, I'm not going to keep repeating
<yorickpeterse>
* myself
<headius>
more threads doesn't mean anything if libxml isn't running in parallel
<headius>
MRI can throw more threads at nokogiri in the same model and it doesn't lbow up
<headius>
blow up
<yorickpeterse>
That's because MRI has the GIL
<yorickpeterse>
So it doesn't actually run threads in parallel when dealing with Nokogiri
<headius>
and rbx has a GIL around nokogiri
<headius>
so it doesn't actually run threads in parallel when dealing with nokogiri's native data
<headius>
right?
<yorickpeterse>
The CAPI lock does nothing about the GCs running concurrently, it merely prevents concurrent C calls
<headius>
but you said turning off concurrent GC doesn't fix it either
<headius>
and I commented in the bug that it seems like a big risk to have GC do any manipulation of objects actively being used from GC
<headius>
I mean from C, not GC
<headius>
forgive me...there's a lot of circumstantial evidence I'm trying to sort out
<yorickpeterse>
The GC doesn't mark/sweep things that are not mark/sweepable, I'm not sure where this idea comes from that it would randomly decide to free something
<headius>
just saying "more threads causes it to break, therefore it's libxml" doesn't convince me
<yorickpeterse>
To clarify, this is basically what happens:
<headius>
especially when libxml itself is not running in parallel with anything that manipulates those native structures
<yorickpeterse>
At some random point in time, memory gets marked/freed, resulting in Nokogiri's custom mark() function being called. At this point this mark() function has a bunch of shit it passes to the GC, crashing Rbx
<headius>
there are no random points in time.... when does this happen?
<yorickpeterse>
this "shit" is a bunch of corrupt data/void pointers (I'd have to dig through gdb to see what _exactly_ it was again), something Rbx can't free
<yorickpeterse>
Oh god, read above
<yorickpeterse>
I said "30 seconds in", but that's not a guarantee
<yorickpeterse>
it might be 32 seconds, 40, etc
<yorickpeterse>
I just measured it usually only taking around 30 seconds
<headius>
so after some variable amount of time...but WHO is freeing it?
<headius>
30 seconds means nothing
<headius>
someone's freeing those objects, and nobody seems to have proven who
<yorickpeterse>
In one of my apps that happens to usually take place around 30 seconds
<headius>
well, not the moment it goes out of scope, but I think I understand
<headius>
another possibility...could the GC of nokogiri objects from one thread be running in parallel with the allocation of nokogiri objects in another thread?
<yorickpeterse>
Possibly, though I don't see how that would mess things up
<yorickpeterse>
In the repro script no data is shared between threads, assuming libxml is configured with their thread-safety option this should work
<headius>
has anyone talked to libxml folks to see if they can confirm this is a concurrency problem?
<yorickpeterse>
if it wouldn't crash the process with memory errors I'd try nokogiri with a no-op mark() function
<yorickpeterse>
to see what would happen then, but alas that would probably still crash things
<yorickpeterse>
and no, I don't think anybody contacted libxml as it's not entirely clear what the heck is going on with it
<headius>
does rbx lock around all native memory management via a native DATA alloc/mark/free?
<yorickpeterse>
or if it's even a libxml problem, and not a problem of the combo libxml + nokogiri
<headius>
right, it certainly could still be a libxml or nokogiri problem...I'm just trying to make it crystal clear *why* it's a libxml or nokogiri problem
<headius>
if it is true that no libxml code runs in parallel but there's still issues, I would escalate it to libxml folks
<headius>
ideally with a profile and native backtraces for the free/mark/etc functions in question
<yorickpeterse>
bbl
dmilith_ is now known as dmilith
dzhulk1 has joined #rubinius
dzhulk has quit [Ping timeout: 272 seconds]
elia has joined #rubinius
elia has quit [Quit: Computer has gone to sleep.]
benlovell has joined #rubinius
benlovell has quit [Ping timeout: 260 seconds]
diegoviola has joined #rubinius
rue has joined #rubinius
kfpratt has quit [Remote host closed the connection]
rue has quit [Client Quit]
_kfpratt has joined #rubinius
_kfpratt has quit [Remote host closed the connection]
postmodern has joined #rubinius
diegoviola has quit [Quit: WeeChat 1.0]
Ngz00 has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
fbernier has quit [Ping timeout: 245 seconds]
noopq has quit [Ping timeout: 255 seconds]
fbernier has joined #rubinius
diegoviola has joined #rubinius
jaffachi_ has joined #rubinius
elia has joined #rubinius
dzhulk1 has quit [Quit: Leaving.]
kfpratt has joined #rubinius
benlovell has joined #rubinius
benlovell has quit [Ping timeout: 272 seconds]
dzhulk has joined #rubinius
dberlinger has quit [Quit: Leaving...]
dzhulk has quit [Quit: Leaving.]
Ori_P has joined #rubinius
Ori_P has quit [Ping timeout: 245 seconds]
Ori_P has joined #rubinius
Ori_P has quit [Ping timeout: 272 seconds]
Ori_P has joined #rubinius
Ngz00 has joined #rubinius
benlovell has joined #rubinius
benlovell has quit [Ping timeout: 240 seconds]
Ori_P has quit [Quit: Computer has gone to sleep.]