<chrisseaton[m]>
I've been working with someone who's encountering the 2 GB string limit in JRuby in practice (not judging - TruffleRuby has the same limitation!) Do you have any tips or tricks to manage that limitation?
<chrisseaton[m]>
I'm recommending FFI to them.
<enebo[m]>
chrisseaton: ffi is probably the simplest workaround
<chrisseaton[m]>
Got an overflow bug we found coming as a PR as well.
<enebo[m]>
chrisseaton: cool. I recently saw we have some tagged specs on large index values in ruby/spec. In that case I think we are supposed to realize it and raise
<chrisseaton[m]>
Need to be more rigorous about using addExact etc.
<enebo[m]>
overflow itself is different but possibly somewhat related
<enebo[m]>
oh heh ok then different
<chrisseaton[m]>
newLength = oldLength * 1.5, ends up being negative
<enebo[m]>
hmm if we do that for String (ByteList?) we probably also do it for Array
<chrisseaton[m]>
Anyway PR will make it clear - a comment was also misleading
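A minimal sketch of the overflow described above, assuming the growth is computed in `int` arithmetic (the actual ByteList expression is not shown in this conversation). `GrowthSketch`, `naiveGrow`, `checkedGrow`, and the `cap` parameter are hypothetical names for illustration; `Math.addExact` is the standard JDK method.

```java
public class GrowthSketch {
    // Naive 1.5x growth in int arithmetic. For a large enough oldLength
    // the sum exceeds Integer.MAX_VALUE and silently wraps to a negative value.
    static int naiveGrow(int oldLength) {
        return oldLength + (oldLength >> 1);
    }

    // Checked growth. Math.addExact throws ArithmeticException on int
    // overflow, so the wrap is detected instead of producing a negative size.
    // 'cap' is a hypothetical fallback size chosen by the caller.
    static int checkedGrow(int oldLength, int cap) {
        try {
            return Math.min(Math.addExact(oldLength, oldLength >> 1), cap);
        } catch (ArithmeticException e) {
            return cap;
        }
    }
}
```

With `oldLength = 1_500_000_000`, `naiveGrow` returns a negative number, while `checkedGrow` returns whatever cap the caller passed in.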
<headius[m]>
enebo: the overflow patch seems fine but I was unable to allocate a string larger than MAX_VALUE - 2
<headius[m]>
I'm not sure why MAX_VALUE and MAX_VALUE - 1 raise errors
<enebo[m]>
heh interesting...that is unexpected
<headius[m]>
see my comment there... if I use "*" * (MAX_VALUE - 2) it allocates and then fails on the first append, but anything larger than that can't complete the *
<enebo[m]>
I do not know how this all opts out since most of the time this will never fire but I wondered if a negative value check is ok since we know how much we grow
<enebo[m]>
hmm although I am wrong
<enebo[m]>
if it just happens to hit max_int then 2*max_int would be what?
<headius[m]>
yeah we have used manual checks like that elsewhere but we have moved almost all those to addExact by now
<headius[m]>
it is a Hotspot intrinsic so it should be pretty efficient when not raising
<enebo[m]>
ok I admit it felt hacky to even consider it
<headius[m]>
using jump on overflow in asm
<enebo[m]>
so one extra jump and this will not generate a trace I am guessing :)
<enebo[m]>
I don't know enough on x86/64 assembly to know but there is probably some support for some IEEEish overflow in the instruction set
<headius[m]>
this is not following the ensure size path either so we need to audit other calls to these size-based ByteList constructors
<headius[m]>
and probably set the fallback size to MAX_VALUE - 2, but I am trying to confirm this in JVM spec
<headius[m]>
chrisseaton: FWIW the only solution I have considered would be using a long[] instead of byte[] so the effective size would be 8 * MAX_VALUE but clearly that takes a lot of code changes and can't interact with byte[] APIs like IO
<headius[m]>
from what I have heard from others this is about the best you can do to get around it without chaining together multiple arrays
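A rough sketch of the `long[]` idea, assuming bytes are packed eight per word in little-endian order within each `long` (the packing order is an assumption; nothing in the conversation specifies it). `PackedBytesSketch`, `get`, and `set` are hypothetical names, not JRuby APIs.

```java
public class PackedBytesSketch {
    // Read the byte at a 64-bit logical index from a long[] backing store.
    static byte get(long[] words, long index) {
        int word = (int) (index >>> 3);          // which long holds the byte
        int shift = (int) ((index & 7) << 3);    // bit offset inside that long
        return (byte) (words[word] >>> shift);
    }

    // Write the byte at a 64-bit logical index, leaving neighbours untouched.
    static void set(long[] words, long index, byte value) {
        int word = (int) (index >>> 3);
        int shift = (int) ((index & 7) << 3);
        long cleared = words[word] & ~(0xFFL << shift);
        words[word] = cleared | ((value & 0xFFL) << shift);
    }
}
```

The index has to be a `long` to address more than `Integer.MAX_VALUE` bytes, which is part of why this touches so much code: every `int` offset in the callers would need to change, and the data still has to be copied out to talk to `byte[]` APIs.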
<enebo[m]>
I feel like we considered linked list (segments) idea but it would end up changing tons of code
<enebo[m]>
The fact RubyString is backed by ByteList does give us some freedom to change how bytes are backed but ByteList is so unencapsulated I feel this would mean rewriting everything :)
<enebo[m]>
HAHA we will just stop using JVM for byte[] and use malloc
<enebo[m]>
If all strings had an explicit destructor and not relied on finalization it would almost work
<headius[m]>
yeah I was thinking about that too
<headius[m]>
the trend on JVM has been toward making finalization less reliable and now it is actually deprecated
<headius[m]>
so we would need to set up a reaper thread to do this practically going forward
<headius[m]>
I believe *Reference logic is still blessed so we would basically have weak references that refer to the "NativeByteList" and also have a reference to the memory pointer to clean, and just scrub it when we detect the weak reference has been evacuated
<headius[m]>
but still needs a reaper
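A minimal sketch of the weak-reference-plus-reaper arrangement, under stated assumptions: there is no real native allocation here, `address` is just a number, and "freeing" only bumps a counter. `NativeReaperSketch`, `NativeRef`, `LIVE`, `FREED`, and `startReaper` are hypothetical names. The JDK's `java.lang.ref.Cleaner` (Java 9+) packages a similar pattern with its own thread.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class NativeReaperSketch {
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();
    // Keeps the reference objects themselves strongly reachable until freed.
    static final Set<NativeRef> LIVE = ConcurrentHashMap.newKeySet();
    // Stand-in for real frees: counts how many addresses were released.
    static final AtomicLong FREED = new AtomicLong();

    static final class NativeRef extends WeakReference<Object> {
        final long address; // pointer to the off-heap memory (simulated)

        NativeRef(Object owner, long address) {
            super(owner, QUEUE);
            this.address = address;
            LIVE.add(this);
        }

        // Idempotent: usable as an explicit destructor or from the reaper.
        void free() {
            if (LIVE.remove(this)) {
                FREED.incrementAndGet(); // a real version would free(address)
            }
        }
    }

    // Daemon thread that frees memory for owners the GC has cleared.
    static Thread startReaper() {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    ((NativeRef) QUEUE.remove()).free();
                }
            } catch (InterruptedException e) {
                // interrupted: stop reaping
            }
        }, "native-bytelist-reaper");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

Because `free()` is idempotent, an explicit destructor call and the reaper can both run without double-freeing; the reaper only covers owners that were never released explicitly.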
<enebo[m]>
The other thing I dislike though is the notion that standard Java tools will not see that memory
<headius[m]>
enebo: so about fixing this and merging PR... 9.3 or 9.2?
<headius[m]>
merges might get messy fixing it in 9.2 but it is no less an issue there
<enebo[m]>
headius: yeah I am ok with this PR on 9.2. Most people will never hit it, and in the case where you do hit it you will probably still break, but you might not
<headius[m]>
do you mean 9.3?
<enebo[m]>
for the bytelist change?
<headius[m]>
yeah the PR is against master right now
<headius[m]>
you said you are ok with it for 9.2 but then also said that most people will never hit it so I am confused
<enebo[m]>
9.2 also does not use bytelist artifact and self bundles right?
<enebo[m]>
oh ok I mean I think it is not risky for 9.2 because most people will not see it
<headius[m]>
ahh ok
<enebo[m]>
The only possible problem would be some unexpected perf regression but that seems unlikely
<headius[m]>
yeah we can retarget it to 9.2
<enebo[m]>
you could just merge and cp since it will apply so cleanly
<headius[m]>
yeah and then fix additional cases on 9.2
<enebo[m]>
right I am guessing Array has same issue and probably hash
<headius[m]>
and other paths to allocation in ByteList
<enebo[m]>
oh right
<headius[m]>
this only helps the case of growing existing
<enebo[m]>
well an audit definitely seems like a good idea
<headius[m]>
hey I am still having issues with the pom.xml schema URLs too
<headius[m]>
intermittent, I filed a bug the other day
<chrisseaton[m]>
TruffleRuby uses ropes, but they're designed to be able to collapse to byte[] for simplicity, so that doesn't actually work around the length issue for us
<headius[m]>
yeah at some point you still have to work with byte[] APIs
<chrisseaton[m]>
At least if you centralise it you can change it when someone complains!
<headius[m]>
I was wondering about that
<headius[m]>
I assumed it was due to header size
<headius[m]>
I wonder if there is a way to query this without reflection
<headius[m]>
when searching for this I did not see a single post or answer that used anything except MAX_VALUE
<chrisseaton[m]>
Should have been encoded in the spec, really. Not really tractable to play whack-a-mole like this.
<headius[m]>
for sure
<headius[m]>
-8 might be safest since ArrayList uses that but this should be queryable
<headius[m]>
I suppose if there were a way to query, ArrayList would be using it
<headius[m]>
so yeah most of these alloc paths do actually compare with MAX_VALUE but clearly that is not the right value to use
<chrisseaton[m]>
It's only an issue when we've decided to allocate more than the user actually asked for. If the user asked for that much it can be a natural allocation fail.
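One way to read that last point, as a hedged sketch: cap only the speculative padding at a soft limit, and pass an explicit request through unchanged so the JVM can fail it naturally. `CapacityPolicySketch`, `SOFT_MAX`, and `newCapacity` are hypothetical names; the `- 8` margin is the value mentioned above as the one ArrayList uses, not a documented JVM limit.

```java
public class CapacityPolicySketch {
    // Assumed soft limit for speculative growth (see the -8 remark above).
    static final int SOFT_MAX = Integer.MAX_VALUE - 8;

    // oldLength: current capacity; requested: minimum capacity the caller needs.
    static int newCapacity(int oldLength, int requested) {
        if (requested < 0) {
            // The caller's own size computation already overflowed int.
            throw new OutOfMemoryError("requested size overflowed int");
        }
        // Compute the 1.5x padding in long arithmetic so it cannot wrap.
        long grown = (long) oldLength + (oldLength >> 1);
        // Padding is optional, so clamp it to the soft limit.
        int padded = (int) Math.min(grown, SOFT_MAX);
        // Never return less than what was actually asked for; if that is
        // above the soft limit, the array allocation itself decides.
        return Math.max(padded, requested);
    }
}
```

For example, `newCapacity(1_500_000_000, 1_500_000_001)` yields `SOFT_MAX` rather than a wrapped negative, while a request for `Integer.MAX_VALUE - 2` is returned as-is.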