#jruby on 2020-05-22 — irc logs at freenode.irclog.whitequark.org

2019-08-12 18:53 ChanServ changed the topic of #jruby to: Get 9.2.8.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:21 ur5us has quit [Ping timeout: 260 seconds]

00:25 ur5us has joined #jruby

00:49 michael_mbp has joined #jruby

00:50 nirvdrum has quit [Ping timeout: 256 seconds]

03:31 ur5us has quit [Ping timeout: 260 seconds]

03:33 ur5us has joined #jruby

04:43 _whitelogger has joined #jruby

05:38 ur5us has quit [Ping timeout: 260 seconds]

05:46 nirvdrum has joined #jruby

08:46 knu has quit [Ping timeout: 260 seconds]

08:48 knu has joined #jruby

11:46 _whitelogger has joined #jruby

14:04 haze has quit [Quit: Lost terminal]

18:07 subbu is now known as subbu|lunch

18:38 subbu|lunch is now known as subbu

18:42 <headius[m]> enebo: hey about https://github.com/jruby/jruby/pull/6228

18:43 <headius[m]> I was wondering if it would be more appropriate in the parser

18:43 <headius[m]> the parser produces this chain of tokens for the formatting etc but leaves the padding character blank

18:43 <headius[m]> so then the interpreter has to guess at it which requires me to save that extra state

18:44 <headius[m]> well interpreter/parser whatever you'd call it

18:44 <headius[m]> but I did not write that parser so I am not sure if this is too contextual

18:44 <enebo[m]> so are you saying it is something the parser is not sending on but it is there

18:45 <headius[m]> well I'm saying the FORMAT_OUTPUT token should be set to the text padding character (a space) because the subsequent token it applies to will want text padding

18:45 <headius[m]> I implemented that manually by deferring the padding character selection until I see the next token

18:46 <headius[m]> but that seems like maybe parser could determine it better

18:46 <headius[m]> my patch is essentially rewriting the token stream

18:46 <enebo[m]> headius: I am going to look at the code again...I just looked at your changes themselves vs how it is implemented before that point

18:46 <headius[m]> what I have works fine but it feels wrong

18:47 <enebo[m]> if this was printf and a parser for it then it would just probably construct formats with their specifiers but not really know anything about the fields

18:47 <headius[m]> the other thing that seems odd to me is the fact that padding is a token on its own rather than being a modifier to the following token

18:47 <enebo[m]> for this perhaps we have more necessary knowledge and it would know

18:47 <headius[m]> like you could have a flag during parse that says "next token should pad by X" and then when you get the token you'd know how to pad

18:48 <headius[m]> right that printf comment is just what I mean

18:48 <enebo[m]> I need to look at code

18:48 <headius[m]> like if you wrote a parser for printf, you wouldn't have 20 and "f" be different tokens for %20f

18:49 <headius[m]> the term token is maybe confusing here but to me it's clear the whole item is "f" with padding "20"

18:49 <headius[m]> or precision or whatever

18:49 <enebo[m]> well I would have format('f', [specifiers]) where those specifiers would be '20' and other things but I would not do much past that

18:49 <headius[m]> right

18:49 <enebo[m]> but yeah I think f would have a set of modifiers

18:49 <headius[m]> that would be more than enough here and no hacks needed to lazily decide the padding char

18:49 <enebo[m]> it would just be 20 but not say ' ' or 0 or how it is padded

18:50 <enebo[m]> but you should be able to just know it is 20 of something and then go but for Z it is spaces
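The printf comparison above can be sketched as a single pass that attaches the width and flag to the conversion they modify, instead of emitting them as separate tokens. This is only an illustration under stated assumptions: `DirectiveSketch`, `Item`, and the accepted flag set are hypothetical names and choices, not JRuby's lexer or its token types. As enebo suggests, the item records only "20 of something" and leaves the choice of padding character to whoever formats the conversion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: not JRuby's strftime lexer.
final class DirectiveSketch {
    /** A parsed item: either literal text or one conversion with its modifiers attached. */
    static final class Item {
        final String literal;   // non-null only for literal text
        final char conversion;  // e.g. 'f' or 'F'; unused for literals
        final int width;        // 0 when no width was given
        final char flag;        // '\0' when no flag was given

        Item(String literal, char conversion, int width, char flag) {
            this.literal = literal;
            this.conversion = conversion;
            this.width = width;
            this.flag = flag;
        }
    }

    /** Parses "%[flag][width]conversion" sequences in one go. */
    static List<Item> parse(String pattern) {
        List<Item> items = new ArrayList<>();
        StringBuilder text = new StringBuilder();
        int i = 0;
        while (i < pattern.length()) {
            char c = pattern.charAt(i++);
            if (c != '%') {
                text.append(c);
                continue;
            }
            int start = i - 1;
            char flag = '\0';
            // Assumed flag set; a real format may accept others or several at once.
            if (i < pattern.length() && "-0_^#".indexOf(pattern.charAt(i)) >= 0) {
                flag = pattern.charAt(i++);
            }
            int width = 0;
            while (i < pattern.length()
                    && pattern.charAt(i) >= '0' && pattern.charAt(i) <= '9') {
                width = width * 10 + (pattern.charAt(i++) - '0');
            }
            if (i >= pattern.length()) {
                // Dangling '%...' with no conversion: keep it as literal text.
                text.append(pattern, start, pattern.length());
                break;
            }
            if (text.length() > 0) {
                items.add(new Item(text.toString(), '\0', 0, '\0'));
                text.setLength(0);
            }
            items.add(new Item(null, pattern.charAt(i++), width, flag));
        }
        if (text.length() > 0) {
            items.add(new Item(text.toString(), '\0', 0, '\0'));
        }
        return items;
    }
}
```

With this shape, `%20f` yields one item whose conversion is `f` and whose width is 20, so no later stage has to look ahead to find out what the width applies to.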

18:50 <headius[m]> I just felt icky about fixing this broken lazy logic by making it more lazy

18:50 <headius[m]> started to feel like a parrot implementer

18:50 <enebo[m]> haha

18:50 <enebo[m]> ok give me a few to remember what this code is

18:50 <enebo[m]> did I write it?

18:50 <headius[m]> ok

18:50 <headius[m]> I thought so

18:50 <headius[m]> you're my patron saint of parsers

18:52 <enebo[m]> this particular file has you all over it but it was a move commit

18:53 <enebo[m]> I am sure I wrote the lexer

18:54 <enebo[m]> I do not think I wrote this...I dislike the notion of a List<Token> but I do remember optimizing that with a cache I think

18:55 <enebo[m]> I half think someone wrote this for us and it was too slow then I added caching and it was quick cached

18:55 <headius[m]> I don't believe I ever implemented the parser side of this that's consuming the tokens

18:55 <headius[m]> but I have been known to forget

18:55 <enebo[m]> Although the List<Token> does not need to be reparsed each time so that was a benefit

18:56 <enebo[m]> yeah I think someone contrib'd this

18:56 <enebo[m]> All my parsers that are not the main ones are simple static recursive descent parsers

18:57 <headius[m]> ok

18:57 <headius[m]> so maybe my fix is fine then

18:58 <enebo[m]> So each option will have a token for FORMAT_SPECIAL before whatever is after it

19:02 <enebo[m]> ok so compilePattern is not entirely a parser...it does parse the flex tokens but then just makes another list of tokens

19:03 <enebo[m]> I think whatever processes this list should just notice there is a FORMAT_SPECIAL and then when it processes the next token which is what to print (e.g. year) it then knows special said 20

19:04 <enebo[m]> but I am looking at how we process these tokens now

19:04 <headius[m]> that's what I did

19:04 <headius[m]> oh more context here

19:04 <headius[m]> the rewrite turns that special format into series of extra tokens

19:05 <headius[m]> so it sees %F and replaces it with the equivalent of Y-m-d

19:05 <headius[m]> the problem is that the padding selection is deferred until after that so it sees Y and thinks it wants a numeric padding

19:05 <headius[m]> but it should have a text padding because it's an aggregate format

19:05 <headius[m]> sorry I should have described the problem better at first

19:06 <headius[m]> so yeah the problem is that the lazy pad logic comes after the rewriting

19:06 <headius[m]> by that time it doesn't know it was supposed to be text padded
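The ordering problem described above can be reduced to a few lines. This is a sketch under assumptions, not the code in the PR: `PadSelectionSketch`, `padFor`, and `expansionOf` are hypothetical helpers, and the numeric/text classification is simplified to the three fields that `%F` expands to. It does not model the special formats that the discussion says keep numeric padding.

```java
// Hypothetical sketch of the bug: choosing the pad character after expansion.
final class PadSelectionSketch {
    /** Assumed classification: numeric fields pad with '0', everything else with ' '. */
    static char padFor(char directive) {
        switch (directive) {
            case 'Y':
            case 'm':
            case 'd':
                return '0';
            default:
                return ' ';
        }
    }

    /** Assumed expansion table containing only F; returns null for plain directives. */
    static String expansionOf(char directive) {
        return directive == 'F' ? "Y-m-d" : null;
    }

    /** Lazy choice made after the rewrite: it only sees the first expanded field. */
    static char lazyPad(char directive) {
        String expanded = expansionOf(directive);
        char seen = expanded == null ? directive : expanded.charAt(0);
        return padFor(seen);
    }

    /** Choice made from the directive as written, before any rewrite. */
    static char eagerPad(char directive) {
        return expansionOf(directive) != null ? ' ' : padFor(directive);
    }
}
```

In this sketch `lazyPad('F')` picks `'0'` because it only sees `Y`, while `eagerPad('F')` picks `' '` because it still knows the directive was an aggregate.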

19:07 <enebo[m]> All aggregates are more or less text?

19:07 <headius[m]> yeah

19:07 <enebo[m]> but then all of those aggregates combined need to be 20 for %20F

19:07 <headius[m]> see my patch, there were two special formats that appear to use numeric padding because they're just weird formats

19:07 <headius[m]> Q I think was one

19:08 <headius[m]> I checked all of these against MRI

19:11 subbu is now known as subbu|away

19:18 <enebo[m]> ok I see why you are saying what you are saying

19:18 <enebo[m]> if it is special no formatter is provided BY THE LEXER otherwise it makes one

19:21 <headius[m]> right

19:22 <headius[m]> and in my head it seemed like the lexer would be better set up to provide that

19:22 <headius[m]> my code seems like me working around lack of information out of the lexer

19:22 <enebo[m]> so all directives call this method directive which will possibly output special

19:22 <enebo[m]> I would not have had this lex like this

19:23 <headius[m]> since it's a contrib I release you from your obligation to assist, but maybe you will have an idea how to do it better

19:23 <enebo[m]> I would have passed all the flags as tokens and you would read each flag until you came to the actual directive

19:23 <headius[m]> strftime is a pretty simple format so it seems like this could all be parsed in one go

19:23 <headius[m]> yeah that would be better too

19:23 <headius[m]> I think the main ick here is that I have to rewrite the token stream

19:24 <enebo[m]> well I wouldn't have but nonetheless

19:24 <headius[m]> so that the subsequent interpreter will actually have the right information

19:25 <enebo[m]> wow this is surprisingly tough to follow

19:25 <enebo[m]> special is what?

19:26 <headius[m]> it took me a while to figure out where to fix this

19:26 <headius[m]> and I am not entirely happy with how I did it

19:26 <enebo[m]> I can read the code but I don't get why it is not within the default map of format tokens

19:26 <headius[m]> special are these multiformat things I guess

19:26 <headius[m]> like %F gets translated into %Y-%m-%d

19:26 <headius[m]> so it's "special"

19:26 <enebo[m]> I have not run or printed anything out so I am not really grokking as well as I could

19:27 <headius[m]> so this is a pre-parse that converts "special" aggregate formats into their components

19:27 <headius[m]> it's like a de-parser

19:27 <enebo[m]> ok so the F gets sent on

19:27 <enebo[m]> but with no formatter

19:27 <headius[m]> well I'd say the F gets converted into Y-m-d and that is sent on, but then the later determination of padding has the wrong specified

19:27 <headius[m]> specifier

19:27 <enebo[m]> I feel like 'F' could have just been added to this list and had the token returned be the expanded list

19:28 <headius[m]> so it rewrites F but does not rewrite the FORMAT_OPTION that came before it

19:28 <headius[m]> until my patch

19:28 <headius[m]> my patch basically says "ok I saw a FORMAT_OPTION... let's see what it applies to"

19:28 <headius[m]> and then produces a new FORMAT_OPTION token based on special format rather than rewritten format

19:29 <headius[m]> it's a weird impl

19:29 <headius[m]> seems overcomplicated now that we are discussing it more

19:30 <enebo[m]> I can sort of understand why they cleaved it into two passes

19:30 <headius[m]> well I assume it's because of these special formats

19:30 <enebo[m]> they decided to have the compound stuff expand in second phase

19:31 <headius[m]> it parses out the special token and then splats it into its elements

19:31 <headius[m]> yeah

19:31 <enebo[m]> but the formatting applies to the compound value right?

19:31 <headius[m]> right

19:31 <headius[m]> and that's the bug

19:31 <headius[m]> so now it has formatting determination in two places

19:31 <enebo[m]> %20F is all three things and not just that it was numeric

19:31 <enebo[m]> so it is really two problems

19:31 <headius[m]> right

19:35 <enebo[m]> how do you figure out that there are 5 more characters in that expansion?

19:35 <headius[m]> eh?

19:35 <headius[m]> oh for the padding

19:36 <enebo[m]> maybe I misread how add to pattern works

19:36 <headius[m]> 🤔

19:36 <enebo[m]> It sort of looked like it just put them all on as their own thing

19:36 <headius[m]> I guess the padding count is determined after formatting the resulting text

19:36 <headius[m]> because e.g. month long names would be different

19:37 <enebo[m]> so you have the same length string as MRI with your fix?

19:37 <headius[m]> I don't think I checked

19:37 <enebo[m]> I would naively assume this fix only makes the year get spacing out to 20

19:38 <headius[m]> the original report just said the character was wrong so I probably didn't think to check the length

19:38 <headius[m]> well it's supposed to pad F

19:38 <headius[m]> so it pads Y-m-d out to N chars

19:38 <enebo[m]> yeah and with this fix I think it just pads Z

19:38 <enebo[m]> or whatever the 4 digit year

19:38 <enebo[m]> or at least that is what I think

19:39 <headius[m]> it's a good theory

19:39 <enebo[m]> I should probably apply your patch and try

19:39 <headius[m]> if so then this is more broken than I realized and it needs to also determine the pad difference during special processing

19:39 <enebo[m]> this formatting needs to almost be parens around a group of text

19:39 <headius[m]> hmm

19:39 <headius[m]> but it can't

19:39 <headius[m]> because it doesn't know that F will be Y-m-d or how long those are

19:40 <enebo[m]> The problem with this parser is it just translates and adds the formatter at the front, but it just broke it into n sub formats

19:40 <headius[m]> well maybe it can hardcode this because that is a fixed-length format

19:40 <headius[m]> 4-2-2

19:40 <enebo[m]> yeah I was wondering how many are known

19:41 <enebo[m]> %c will not work

19:41 <enebo[m]> unless it does some special padding

19:41 <enebo[m]> 'a b e' is variable

19:42 <headius[m]> right I figured there'd be some variable length special formats

19:42 <headius[m]> maybe you should rewrite it this afternoon

19:42 <enebo[m]> I think this can be fixed generically

19:43 <enebo[m]> instead of List<Token> it could be List<Token | List<Token>>

19:43 <enebo[m]> if list is encountered it recursively calls format call on that sublist

19:44 <headius[m]> hmm

19:44 <enebo[m]> but the problem with that solution is formatting has to happen after

19:44 <enebo[m]> I think whoever wrote this did not consider this at all and as such this is pretty busted
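The `List<Token | List<Token>>` idea above can be sketched with a small node tree in which a group renders its children first and only then pads the combined text, which also covers the point that formatting has to happen after. Everything here is illustrative: `GroupPadSketch`, `Node`, and `isoDate` are hypothetical names, `java.time.LocalDate` stands in for whatever time value the real formatter receives, and space padding for the group follows the behavior described in the discussion rather than anything verified here.

```java
import java.time.LocalDate;
import java.util.Locale;
import java.util.function.ToIntFunction;

// Hypothetical sketch: a group is padded as a unit after its children render.
final class GroupPadSketch {
    interface Node {
        String render(LocalDate date);
    }

    /** Literal text such as the '-' separators. */
    static Node literal(String text) {
        return date -> text;
    }

    /** A zero-padded numeric field of a fixed number of digits. */
    static Node number(ToIntFunction<LocalDate> field, int digits) {
        return date -> String.format(Locale.ROOT, "%0" + digits + "d", field.applyAsInt(date));
    }

    /** Renders all children, then left-pads the combined result with spaces. */
    static Node group(int width, Node... children) {
        return date -> {
            StringBuilder out = new StringBuilder();
            for (Node child : children) {
                out.append(child.render(date));
            }
            return leftPad(out.toString(), width, ' ');
        };
    }

    static String leftPad(String text, int width, char pad) {
        StringBuilder out = new StringBuilder();
        for (int i = text.length(); i < width; i++) {
            out.append(pad);
        }
        return out.append(text).toString();
    }

    /** Stand-in for a width-qualified %F: Y-m-d padded as one value. */
    static Node isoDate(int width) {
        return group(width,
                number(LocalDate::getYear, 4), literal("-"),
                number(LocalDate::getMonthValue, 2), literal("-"),
                number(LocalDate::getDayOfMonth, 2));
    }
}
```

Because the padding is computed from the rendered length, the same `group` would work for variable-length aggregates such as the `%c` case raised above, with no hardcoded 4-2-2 widths.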

19:46 <enebo[m]> wow this is really weird...didn't this use to cache the compiled form

19:46 <enebo[m]> I must be thinking of another date/time parser in our codebase

19:56 <headius[m]> hey unrelated, how is this bug still open and marked for 9.1.18?

19:57 <headius[m]> https://github.com/jruby/jruby/issues/5082

19:57 <headius[m]> it turns out it's working in 9.2.11.1 but I went to change the milestone and was shocked to see it was set to a released version

19:57 <enebo[m]> haha

19:57 <headius[m]> yeah 9.1.18 has three open issues even though it's a closed milestone

19:57 <headius[m]> 1.7.15 has two issues open

19:57 <enebo[m]> Can you mark against a closed milestone?

19:57 <headius[m]> that's it

19:58 <headius[m]> yeah you can

19:58 <headius[m]> this was clearly not fixed in 9.2.11.1 but it works there

19:58 <enebo[m]> yeah that is fine

19:58 <headius[m]> I'm not about to bisect releases back to 9.1.18

19:58 <headius[m]> ok

19:59 <headius[m]> looks like it's only these 5 bugs that are unfixed and marked for releases

19:59 <headius[m]> I will review and clean up

20:00 <enebo[m]> so strftime up until your "fix" did not look ahead (or at least not outside jflex) so I am confused why this is made into a List then walked

20:00 <enebo[m]> I see no cache here at all

21:09 <headius[m]> enebo: so the week is winding down here and this is in my queue... how do you want to proceed?

21:22 subbu|away is now known as subbu

22:00 nirvdrum has quit [Ping timeout: 260 seconds]

22:17 <enebo[m]> landing it as-is is better than nothing

22:18 <enebo[m]> I think ultimately we should rewrite this parser and add a cache
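One possible shape for the cache mentioned above is a map from pattern string to its compiled form, so a repeated pattern is compiled once. This is a sketch only: `PatternCacheSketch` is a hypothetical class, the compiled type is left generic because the real token type is not shown in this log, and the map is unbounded, which a real implementation would likely want to limit.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: memoize compiled patterns by their source string.
final class PatternCacheSketch<T> {
    private final Map<String, T> cache = new ConcurrentHashMap<>();
    private final Function<String, T> compiler;

    /** Counts real compilations, only so the effect of the cache is observable. */
    final AtomicInteger compiles = new AtomicInteger();

    PatternCacheSketch(Function<String, T> compiler) {
        this.compiler = compiler;
    }

    /** Returns the cached compiled form, compiling on first use of a pattern. */
    T compiled(String pattern) {
        return cache.computeIfAbsent(pattern, p -> {
            compiles.incrementAndGet();
            return compiler.apply(p);
        });
    }
}
```

The compiler function passed in must return a non-null value for each pattern, since `computeIfAbsent` does not store null results.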

22:26 <headius[m]> ok

22:26 <headius[m]> I'll land it then for 9.3

23:04 nirvdrum has joined #jruby

23:15 ur5us has joined #jruby

23:41 ur5us has quit [Quit: Leaving]