<headius[m]>
I was wondering if it would be more appropriate in the parser
<headius[m]>
the parser produces this chain of tokens for the formatting etc but leaves the padding character blank
<headius[m]>
so then the interpreter has to guess at it which requires me to save that extra state
<headius[m]>
well interpreter/parser whatever you'd call it
<headius[m]>
but I did not write that parser so I am not sure if this is too contextual
<enebo[m]>
so are you saying it is something the parser is not sending on but it is there
<headius[m]>
well I'm saying the FORMAT_OUTPUT token should be set to the text padding character (a space) because the subsequent token it applies to will want text padding
<headius[m]>
I implemented that manually by deferring the padding character selection until I see the next token
<headius[m]>
but that seems like maybe parser could determine it better
<headius[m]>
my patch is essentially rewriting the token stream
<enebo[m]>
headius: I am going to lok at the code again...I just looked at your changes themselves vs how it is implemented before that point
<headius[m]>
what I have works fine but it feels wrong
<enebo[m]>
if this was printf and a parser for it then it would just probably construct formats with their specifiers but not really know anything about the fields
<headius[m]>
the other thing that seems odd to me is the fact that padding is a token on its own rather than being a modifier to the following token
<enebo[m]>
for this perhaps we have more necesary knowledge and it would know
<headius[m]>
like you could have a flag during parse that says "next token should pad by X" and then when you get the token you'd know how to pad
<headius[m]>
right that printf comment is just what I mean
<enebo[m]>
I need to look at code
<headius[m]>
like if you wrote a parser for printf, you wouldn't have 20 and "f" be different tokens for %20f
<headius[m]>
the term token is maybe confusing here but to me it's clear the whole item is "f" with padding "20"
<headius[m]>
or precision or whatever
<enebo[m]>
well I would have format('f', [specifiers]) where those specifiers would be '20' and other things but I would not do much pass that
<headius[m]>
right
<enebo[m]>
but yeah I think f would have a set of modifiers
<headius[m]>
that would be more than enough here and no hacks needed to lazily decide the padding char
<enebo[m]>
it would just be 20 but not say ' ' or 0 or how it is padded
<enebo[m]>
but you should be able to just know it is 20 of something and then go but for Z it is spaces
<headius[m]>
I just felt icky about fixing this broken lazy logic by making it more lazy
<headius[m]>
started to feel like a parrot implementer
<enebo[m]>
haha
<enebo[m]>
ok give me a few to remember what this code is
<enebo[m]>
did I write it?
<headius[m]>
ok
<headius[m]>
I thought so
<headius[m]>
you're my patron saint of parsers
<enebo[m]>
this particular file has you all over it but it was a move commit
<enebo[m]>
I am sure I wrote the lexer
<enebo[m]>
I do not think I wrote this...I dislike the notion of a List<Token> but I do remember optimizing that with a cache I think
<enebo[m]>
I half think someone wrote this for us and it was too slow then I added caching and it was quick cached
<headius[m]>
I don't believe I ever implemented the parser side of this that's consuming the tokens
<headius[m]>
but I have been known to forget
<enebo[m]>
Although the List<Token> doesn ot need to be reparsed each time so that was a benefit
<enebo[m]>
yeah I think someone contrib'd this
<enebo[m]>
All my parser that are not the main ones are simple static recursive descent parsers
<headius[m]>
ok
<headius[m]>
so maybe my fix is fine then
<enebo[m]>
So each option will have a token for FORMAT_SPECIAL before whatever is after it
<enebo[m]>
ok so compilePattern is not entirely a parser...it does parse the flex tokens but then just makes another list of tokens
<enebo[m]>
I think whatever processes this list should just notice their is a FORMAT SPECIAL and then when it processes the next token which is what to print (e.g. year) it then knows special said 20
<enebo[m]>
but I am looking at how we process these tokens now
<headius[m]>
that's what I did
<headius[m]>
oh more context here
<headius[m]>
the rewrite turns that special format into series of extra tokens
<headius[m]>
so it see s %F and replaces it with the equivalent of Y-m-d
<headius[m]>
the problem is that the padding selection is deferred until after that so it sees Y and thinks it wants a numeric padding
<headius[m]>
but it should have a text padding because it's an aggregate format
<headius[m]>
sorry I should have described the problem better at first
<headius[m]>
so yeah the problem is that the lazy pad logic comes after the rewriting
<headius[m]>
by that time it doesn't know it was supposed to be text padded
<enebo[m]>
All aggregates are more or less text?
<headius[m]>
yeah
<enebo[m]>
but then all of those aggregates combined nee to be 20 for %20F
<headius[m]>
see my patch, there were two special formats that appear to use numeric padding because they're just weird formats
<headius[m]>
Q I think was one
<headius[m]>
I checked all of these against MRI
subbu is now known as subbu|away
<enebo[m]>
ok I see why you are saying what you are saying
<enebo[m]>
if it is special no formatter is provided BY THE LEXER otherwise it makes one
<headius[m]>
right
<headius[m]>
and in my head it seemed like the lexer would be better set up to provide that
<headius[m]>
my code seems like me working around lack of information out of the lexer
<enebo[m]>
so all directives call this method directive which will possibly output special
<enebo[m]>
I would not have had this lex like this
<headius[m]>
since it's a contrib I release you from your obligation to assist, but maybe you will have an idea how to do it better
<enebo[m]>
I would have passed all the flags as tokens and you would read each flag until you came to the actual directive
<headius[m]>
strftime is a pretty simple format so it seems like this could all be parsed in one go
<headius[m]>
yeah that would be better too
<headius[m]>
I think the main ick here is that I have to rewrite the token stream
<enebo[m]>
well I wouldn't have but nonetheless
<headius[m]>
so that the subsequent interpreter will actually have the right information
<enebo[m]>
wow this is surprisingly tough to follow
<enebo[m]>
special is what?
<headius[m]>
it took me a while to figure out where to fix this
<headius[m]>
and I am not entirely happy with how I did it
<enebo[m]>
I can read the code but I don't get why it is not within the default map of format tokens
<headius[m]>
special are these multiformat things I guess
<headius[m]>
like %F gets translated into %Y-%m-%d
<headius[m]>
so it's "special"
<enebo[m]>
I have not run or print anything out so I am not really grokking as well as I could
<headius[m]>
so this is a pre-parse that converts "special" aggregate formats into their components
<headius[m]>
it's like a de-parser
<enebo[m]>
ok so the F gets sent on
<enebo[m]>
but with no formatter
<headius[m]>
well I'd say the F gets converted into Y-m-d and that is sent on, but then the later determination of padding has the wrong specified
<headius[m]>
specifier
<enebo[m]>
I feel like 'F' could have just been added to this list and had the token returned be the expanded list
<headius[m]>
so it rewrites F but does not rewrite the FORMAT_OPTION that came before it
<headius[m]>
until my patch
<headius[m]>
my patch basically says "ok I saw a FORMAT_OPTION... let's see what it applies to"
<headius[m]>
and then produces a new FORMAT_OPTION token based on special format rather than rewritten format
<headius[m]>
it's a weird impl
<headius[m]>
seems overcomplicated now that we are discussing it more
<enebo[m]>
I can sort of understand why they cleaved it into two passes
<headius[m]>
well I assume it's because of these special formats
<enebo[m]>
they decided to have the compound stuff expand in second phase
<headius[m]>
it parses out the special token and then splats it into its elements
<headius[m]>
yeah
<enebo[m]>
but the formatting applies to the compond value right?
<headius[m]>
right
<headius[m]>
and that's the bug
<headius[m]>
so now it has formatting determination in two places
<enebo[m]>
%20F is all three things and not just that it was numeric
<enebo[m]>
so it is really two problems
<headius[m]>
right
<enebo[m]>
how do you figure out that there are 5 more characters in that expansion?
<headius[m]>
eh?
<headius[m]>
oh for the padding
<enebo[m]>
maybe I misread how add to pattern works
<headius[m]>
🤔
<enebo[m]>
It sort of looked like it just put them all on as their own thing
<headius[m]>
I guess the padding count is determined after formatting the resulting text
<headius[m]>
because e.g. month long names would be different
<enebo[m]>
so you have the same length string as MRI with your fix?
<headius[m]>
I don't think I checked
<enebo[m]>
I would naively assume this fix only makes the year get spacing out to 20
<headius[m]>
the original report just said the character was wrong so I probably didn't think to check the length
<headius[m]>
well it's supposed to pad F
<headius[m]>
so it pads Y-m-d out to N chars
<enebo[m]>
yeah and with this fix I think it just pads Z
<enebo[m]>
or whatever the 4 digit year
<enebo[m]>
or at least that is what I think
<headius[m]>
it's a good theory
<enebo[m]>
I should probably apply your patch and try
<headius[m]>
if so then this is more broken than I realized and it needs to also determine the pad difference during special processing
<enebo[m]>
this formatting needs to almost be parens around a group of text
<headius[m]>
hmm
<headius[m]>
but it can't
<headius[m]>
because it doesn't know that F will be Y-m-d or how long those are
<enebo[m]>
The problem with this parser is it just translates and add formatter at front but it just broke it into n sub formats
<headius[m]>
well maybe it can hardcode this because that is a fixed-length format
<headius[m]>
4-2-2
<enebo[m]>
yeah I was wondering how many are known
<enebo[m]>
%c will not work
<enebo[m]>
unless it does some special padding
<enebo[m]>
'a b e' is variable
<headius[m]>
right I figured there'd be some variable length special formats
<headius[m]>
maybe you should rewrite it this afternoon
<enebo[m]>
I think this can be fixed generically
<enebo[m]>
instead of List<Token> it could be List<Token | List<Token>>
<enebo[m]>
if list is encountered it recursively calls format call on that sublist
<headius[m]>
hmm
<enebo[m]>
but the problem with that solution is formatting has to happen after
<enebo[m]>
I think whoever wrote this did not consider this at all and as such this is pretty busted
<enebo[m]>
wow this is really weird...didn't this use to cache the compiled form
<enebo[m]>
I must be thinking of another date/time parser in our codebase
<headius[m]>
hey unrelated, how is this bug still open and marked for 9.1.18?
<headius[m]>
it turns out it's working in 9.2.11.1 but I went to change the milestone and was shocked to see it was set to a released version
<enebo[m]>
haha
<headius[m]>
yeah 9.1.18 has three open issues even though it's a closed milestone
<headius[m]>
1.7.15 has two issues open
<enebo[m]>
Can you mark against a closed milestone?
<headius[m]>
that's it
<headius[m]>
yeah you can
<headius[m]>
this was clearly not fixed in 9.2.11.1 but it works there
<enebo[m]>
yeah that is fine
<headius[m]>
I'm not about to bisect releases back to 9.1.18
<headius[m]>
ok
<headius[m]>
looks like it's only these 5 bugs that are unfixed and marked for releases
<headius[m]>
I will review and clean up
<enebo[m]>
so strftime up until your "fix" did not look ahead (or at least not outside jflex) so I am confused why this is made into a List then walked
<enebo[m]>
I see no cache here at all
<headius[m]>
enebo: so the week is winding down here and this is in my queue... how do you want to proceed?
subbu|away is now known as subbu
nirvdrum has quit [Ping timeout: 260 seconds]
<enebo[m]>
landing it as-is is better than nothing
<enebo[m]>
I think ultimately we should rewrite this parser and add a cache