<enebo[m]>
If you try this on master you will see a Java proxy class get setup via compilation where it passes a null superClass to newClass.
<enebo[m]>
Not sure if you would care to look at it or not but I figured you know that code
<enebo[m]>
kares: if that gets fixed interestingly we will see a second error raising NPE when compiling optparse
<enebo[m]>
kares: which goes back to 9.2...that is something else though :P
<kares[m]>
heh, reproduced. is this in an issue on the tracker?
<kares[m]>
would be interested to look into but my work-queue is too full ... maybe tomorrow or over the weekend
<enebo[m]>
kares: ah yeah no rush at all...it is only on master atm
<enebo[m]>
but no issue
<enebo[m]>
I was trying to debug IRFlags changes and there are lots of errors in our logging code now and then I realized multiple errors have nothing to do with my changes
<headius[m]>
I fixed a bug in jit logging on ir_concurrency
<headius[m]>
NPE in InterpretedIRMethod
<headius[m]>
er maybe it was InterpretedIRBlockBody... it was returning null for implementationClass
<enebo[m]>
well I have another now since fic never gets set on failed compile we cannot see what it tried to do
<enebo[m]>
My changes work for startup interp but I hit some bugs when I switch to Full/JIT
<enebo[m]>
now that I am debugging I am hitting things
<enebo[m]>
I did see that problem so II probably had not pulled since then
<kares[m]>
wonder if we should finish that with pends - most seems to be io/posix issues
<enebo[m]>
kares: pend with windows platform and that is just fine
<enebo[m]>
kares: I think seeing what new things we break is better than not running it at all
<kares[m]>
okay I will look into that and add a test there for the regression
<kares[m]>
will re-target fir 9.2 it is still merged to master, right?
<headius[m]>
kares: not surprising
<headius[m]>
yes we are merging periodicailly
<kares[m]>
yy not at all I expected worse ;)
<kares[m]>
simce tgat suite is 'crazy'
<kares[m]>
* since that suite is 'crazy'
<headius[m]>
I modified spec tags on master to ad RbConfig::CONFIG['host_os'] to excluded tags so if you copy that over to 9.2 branch it should be possible to tag as "mswin" or something
<headius[m]>
I can't remember what we use for host_os on Windows but it's always the same
<kares[m]>
this is only test:jruby
<headius[m]>
might be mswin32 😟
<headius[m]>
oh I see
<headius[m]>
still no posix IO for most stuff on Windows
<kares[m]>
tagged some in the code already
<headius[m]>
ok
<kares[m]>
will do the rest with TODOs
<headius[m]>
we used to run specs on windows but that was years ago... would be a project for a day or two
<headius[m]>
kares: yeah maybe we should file a bug with that tagging commit so we don't just forget about them
<headius[m]>
wish we had a better mechanism for tagging test/unit
<headius[m]>
both we and MRI do consider encoding in String hash calculation, but we don't use the same logic for symbol bytelist hashing
<headius[m]>
so that's probably the only remaining fix to get identical symbols with different encodings working
<enebo[m]>
I am not sure
<enebo[m]>
we already will make any 7bit sequence into US-ASCII before it enters the table
<headius[m]>
yeah I think that's wrong
<headius[m]>
what we should do instead is make it US-ASCII only if the original encoding is 7bit-compatible and the content is all 7-bit
<headius[m]>
I think
<headius[m]>
otherwise we leave it as is and use the encoding index as part of the hash
<enebo[m]>
I have never seen a 7bit clean symbol not be US-ASCII but I guess it is not us-ascii compatible string but happens to be us-ascii bytes maybe?
<headius[m]>
yes
<headius[m]>
in the example case it's an ascii string forced into utf-16
<enebo[m]>
The parser does do this check though. Although it could be wrong
<headius[m]>
it may be unlikely that we see bugs with this anymore in practice, but that particular report is still broken
<enebo[m]>
I actually don't know of a scenario where that could happen either but I can believe it could
<headius[m]>
also we scan for code range in #intern which rejects that forced UTF-16 string
<headius[m]>
MRI does not reject it
<headius[m]>
they appear to happily make symbols that are CR_BROKEN
<headius[m]>
which is... weird
<enebo[m]>
which intern?
<headius[m]>
String#intern
<headius[m]>
I added another comment after the one linked above
sagax has joined #jruby
<enebo[m]>
I admit I have not looked in a while but this surprises me
<headius[m]>
I think if we review their logic again we'll be able to sort out which piece go where
<enebo[m]>
symbol in RubySymbol should be an id into the symbol table as a string which is always ISO-8859_1 bytes which will never fail to intern
<headius[m]>
but they do not appear to check for CR_BROKEN on the way into symbol
<headius[m]>
we do
<enebo[m]>
I think we do have String entry points which maybe is not honoring that but I believe that is all coming from Java and should then all be valid UTF-16LE
<enebo[m]>
Admittedly there are likely loose ends but in theory that string should not fail to intern
<headius[m]>
this is just RubyString#intern for a bad UTF-16 string
<headius[m]>
as in the linked issue's original repro
<enebo[m]>
which I did not read :P
<headius[m]>
if I remove that check, though, we still have the old behavior and are not distinguishing between the "ab" string forced to UTF-16 and the "ab" string that's just US-ASCII
<enebo[m]>
at I see so RubySymbol itself could be fine here with Broken
<headius[m]>
yeah seems to be
<enebo[m]>
it would just be a really messed up symbol that only this RubyString would be a path to
<headius[m]>
and even if string contents are 7-bit, it's still keeping the UTF-16 and US-ASCII versions of symbol separate
<headius[m]>
right
<enebo[m]>
if you wanted to get it by that again (or itself obviously)
<headius[m]>
obviously any other UTF-16 string should not collide with a US-ASCII string ever
<headius[m]>
unless there's embedded nulls, which would get rejected other ways usually
<enebo[m]>
so now having properly read what you said with it being RubyString.intern() it looks like we could almost just remove the coderangescan
<headius[m]>
it seems so
<headius[m]>
RubySymbol uses RubyString hashing?
<headius[m]>
I mean the symbol table
<enebo[m]>
oh yeah ok that is the other part of that issue right?
<headius[m]>
yeah
<headius[m]>
from what I can see all String#intern symbols just go through getSymbol(byteList) which does not consider encoding for hash
<headius[m]>
CRuby explicitly uses rb_str_hash for it
<headius[m]>
(which incidentally also does the hash DOS stuff, we are not doing in symbol table)
<headius[m]>
our symbol hash appears to just be the iso-8859-1 bytes, in javaStringHashCode
<enebo[m]>
tbh I am not sure I have much more invested in this conversation today. It seems like you are close to having figured this out already.
<enebo[m]>
it is as far as I remember
<enebo[m]>
which is why a RubyString will probably just find the broken CR symbol from a string
<headius[m]>
yeah I can look into it but wanted to get your thoughts
<headius[m]>
it shouldn't be hard to make logic match MRI and this issue has been dangling so I thought I'd knock it down
<headius[m]>
on master
<enebo[m]>
I think removing just that check will address the case of broken but I still believe two weird ISO compatible strings containing the 8th bit will not work
<enebo[m]>
which is where that last comment of mine came from
<headius[m]>
I'll try to align logic for 9.3 and see if I can kill this issue today
<enebo[m]>
but 8bit 8859-11 and 8859-13 (made these up) will still not work
<enebo[m]>
I don't think this is a very significant incompatibility but since we do not tuple encoding with those bytes I think that will still be broken
<enebo[m]>
broken strings probably are actually fairly common
<headius[m]>
yeah unsure about that case
<headius[m]>
I think e.g. 11 and 13 may be compatible under 7bit so they wouldn't show up as 7-bit only if there were differing characters?
<enebo[m]>
the issue as I see it is they will be identical 8859_1 hashes
<enebo[m]>
but the 8bit values will mean different things
<headius[m]>
but if both had same chars and were all under 7bit they would collide in MRI too I thinkn
<enebo[m]>
no but they are not only 7bit
<headius[m]>
if they're not 7bit MRI will include encoding index in hash
<enebo[m]>
they contain something 8bit which is the same byte but a different character
<enebo[m]>
yeah MRI will but we won't
<headius[m]>
right, which is what I will try to fix
<enebo[m]>
ok
<headius[m]>
basically just has to propagate some string hash logic into symbol table
<enebo[m]>
that will be great to fix it
<headius[m]>
might need to store CR
<headius[m]>
I'll see
<enebo[m]>
CR can be calculated but it forces a scan so it is probably nice to prop it
<enebo[m]>
I still sort of wish it was in bytelist
<enebo[m]>
going up to start some lunch
<headius[m]>
yeah
<headius[m]>
patch got bigger than expected but I have it propagating CR into Symbol now and using String hashing for the table
<headius[m]>
it seems like most places that could pass a stored CR through already do... and parser already saves RubySymbol in SymbolNode
<headius[m]>
there are a few places, mostly coming from Java that only have a String or ByteList in hand
<headius[m]>
gotta get use to this new icon for riot/element
<enebo[m]>
well I am not even concerned about that although it is good to know it was from that
<enebo[m]>
I had some other tab open with about 40% of the tests green and the rest yellow
<enebo[m]>
If I click on the commit it appears to be all my last changes
<enebo[m]>
What am I watching?
<enebo[m]>
OH I SEE
<headius[m]>
PRs done against origin test twice
<enebo[m]>
It does one build for the commit and one for the PR
<headius[m]>
not sure how to prevent that but still allow branches to CI
<enebo[m]>
ok makes sense too but I thought I was a bad luck time traveller for a moment
<headius[m]>
yeah it's annoying
<headius[m]>
the push (branch) CI always runs first, fwiw
<enebo[m]>
so I may be as done as I can be without rewriting it to be the right design
<headius[m]>
hot crackers
<enebo[m]>
but I do have one more thing to do (well two)
<enebo[m]>
1. Make sure nothing has stopped compiling and is falling back to interp
<enebo[m]>
2. Sort of related but make sure the same opts are happening before/after
<headius[m]>
ugh this symbol unmarshal logic blows
<enebo[m]>
#2 proves #1 already but it is not super simple...I
<headius[m]>
stopped compiling as in couldn't JIT to bytecode?
<enebo[m]>
headius: stopped compiling as it hit and error and bailed out
<enebo[m]>
I think I can trivially just change test:mri:core:jit probably to do verbose logging on jit and see what shows up
<enebo[m]>
but I am not super concerned about this as much as determining whether the passes actually did stuff like eliminate dyn scopes and stuff like that
<enebo[m]>
even examining the flag does not tell me anything more than it might have happened
<headius[m]>
oh like you aren't sure jit is still doing what it should after your changes
<headius[m]>
compiler specs are pretty unforgiving so that's one assurance
<headius[m]>
if something fails in there now it's broken in full or jit logic for other stuff too
<headius[m]>
it doesn't have any fallback
<enebo[m]>
headius: do spec:compiler actually depend on all passes running a particular way?
<headius[m]>
they don't check anything related to passes but they run jit like jit would run
<headius[m]>
they are just testing that jit works and new code behaves as expected
<enebo[m]>
oh sure so it probably is a better indicator of #1 than #2
<headius[m]>
yeah
<enebo[m]>
ok
<enebo[m]>
I was freaked this morning when I uncovered the getImplementationClass NPE
<headius[m]>
deeper testing could be done against full to make sure the code structure is as expected
<enebo[m]>
I was thinking oh crap we have some stuff not compiling that precedes all this
<enebo[m]>
funny it was literally just the log statement that was broken (as far as we know)
<headius[m]>
I guess neither full nor jit would be "easy" to verify because we're basically looking at the "assembly" after and making sure it's what's expected
<enebo[m]>
well if you know JIT is eliminating a scope then that would be cool
<enebo[m]>
I think largely that and scopes replaced with temps are the two main opts I want to make sure have not changed
<enebo[m]>
I don't want any of them to stop but those are bigger ones from a changed perf perspective
<enebo[m]>
I cannot do before/after on all flag state because I made all scopes get flags whereas we used to not do it for several scopes
<headius[m]>
yeah it would be easier to do against full
<headius[m]>
once it's in bytecode all I can really inspect is the bytecode
<enebo[m]>
yeah I think my plan is to run same thing with same printlns before/after and those prints will say I removed dyn scope for X
<headius[m]>
we have some pass listener framework I've never tried to use
<headius[m]>
could possibly hook into that
<enebo[m]>
I think if there is no interleaved output I can just sort and compare
<enebo[m]>
yeah it exists largely to report what was run but not how it was run
<enebo[m]>
So I can verify the same passes are run but I already know they are
<enebo[m]>
I maybe will think a little about this though. It would be nice to report what happened in a pass. compilerpasslistener could be instrumental in making something we could regularly test
<enebo[m]>
maybe some void actionPerformed(pass, fix, data...) or something
<headius[m]>
I think marshal should be registering ByteList for symbols
<headius[m]>
MRI appears to do something similar before they actually create a symbol object
<enebo[m]>
Although it is not perhaps a great idea for next 24 hours in that I need to apply it to older code too ... same could be said of printlns but new code uses fic and old uses IRScope so println probably is simpler
<headius[m]>
that's the only way I can untangle this lifecycle
<headius[m]>
yeah actionPerformed kind of callback would be good
<headius[m]>
maybe throw together an enum of actions they take and just feed those out if there's a listener
<headius[m]>
we can improve it later
<enebo[m]>
yeah I think it could also just be a log as well but as a listener testing could register that and pull it into Ruby
<headius[m]>
right
<enebo[m]>
oh I plan on it being string data largely
<enebo[m]>
based on my 3 minutes of thinking about it
<enebo[m]>
but enum has disadvantage of being a pain to extend and if it will end up as Ruby we do not gain a lot from it
<headius[m]>
just a more structured way to log them really
<headius[m]>
we have no consumers of it so we could reevaluate the enum later too
<enebo[m]>
yeah I think at this point I also need to evaluate what an action really is for each thing
<headius[m]>
the marshaling rework puts us more in line with CRuby as far as the workflow of getting a symbol out of the marshal stream
<headius[m]>
additionally I had to remove all "fast" paths from a java.lang.String to Symbol because they were using iso-8859-1 hash... the only way to get a symbol now is via a ByteList, so the only "fast" way is to keep the ByteList (and ideally a CR) on hand
<headius[m]>
we will want to audit all newSymbol/getSymbol(java.lang.String) paths and try to find ways to propagate a ByteList through
<headius[m]>
some like method_missing and const_missing currently use a Java String and will now be creating intermediate ByteList along the way
<headius[m]>
it might be possible to restore the "fast" paths by doing all the hash calculation from the String bytes as though they have been decoded to UTF-8... I will take a quick look to see if that's possible
<headius[m]>
I don't know if it would be more efficient than just creating the UTF-8 ByteList and using that, but it may be possible to eliminate the alloc
<headius[m]>
I've updated the issue with a description of the two extra pieces and will look into fast pathing some of this now
<headius[m]>
heh
<headius[m]>
fun with optimization... I have a method that can take a Java String and calculate both Ruby hash and code range from it without any allocation, using our existing ByteBuffer cache
nirvdrum has quit [Ping timeout: 246 seconds]
<headius[m]>
yay
<headius[m]>
I have fast String to Symbol logic working with minimum allocation
<headius[m]>
no reason not to go forward with this now