Add internal URI handling API#19073
Conversation
TimWolla
left a comment
There was a problem hiding this comment.
Some first remarks. Did not yet look at everything.
There was a problem hiding this comment.
Defaulting to parse_url in a new API is probably not a good idea. Instead the “legacy” users should just pass "parse_url" explicitly.
There was a problem hiding this comment.
Defaulting to parse_url here works because that's the default indeed where php_uri_get_handler() is called, the other "backends" can only be used if the config is explicitly passed (not null).
The other reason why I opted for this approach is that it would be inconvenient to create and free a new zend_string when the legacy implementation is needed, and I wanted to avoid adding a known string just for this purpose, or exposing the C string based uri_handler_by_name function instead.
TimWolla
left a comment
There was a problem hiding this comment.
I've looked at this again and I must say that I'm having trouble meaningfully reviewing this. It adds a large amount of code with unclear purpose and confusing (to me) naming.
|
I'll try to take another look tomorrow. |
|
@nielsdos Do you see anything that I should fix before merging it? I'd like to implement some of the cleanups that we discussed |
ndossche
left a comment
There was a problem hiding this comment.
Looks mostly good, FWIW I agree with Tim on UNREACHABLE
There was a problem hiding this comment.
I wanted to keep these assertions even though two handlers of the legacy parser became NULL. So I excluded this URI handler from the checks.
There was a problem hiding this comment.
Assertions with exceptions to the rule are a bit iffy IMHO
There was a problem hiding this comment.
Yes, I can understand your POV. Do you think it's better to get rid of the latter two special cased assertions?
There was a problem hiding this comment.
As I renamed the legacy implementation to "", I wanted to make sure that other implementations cannot use the same handler.
Although, I think this assertion doesn't make any sense. Especially because module initialization will surely fail because of the name collision.
There was a problem hiding this comment.
Huh? Why did you change the name to "" though?
There was a problem hiding this comment.
Mainly because I wanted to avoid this to work:
filter_var("qwe", FILTER_VALIDATE_URL, ["uri_parser_class" => "parse_url"])
even though null is the suggested value to use for the "uri_parser_class" config. As "" is the most similar string value to null that can be stored in a uri_handlers hash table, I figured it's a better choice than parse_url.
Or do you have any better solutions in mind?
There was a problem hiding this comment.
I think "" is weird from a user point of view.
I still don't think I quite understand the problem though.
There was a problem hiding this comment.
I just found it weird that one can pass parse_url to use the legacy parser, while null is already dedicated for the purpose. Defaulting to null is convenient for the internal API: #19073 (comment).
Since "" is very close to null, I figured that it would be a better choice to use than an arbitrary name like parse_url. It's not really likely that users have to ever use this value (we should only document null), but I have to choose some string value for the name to be able to store the handler in the HashTable of the handlers. I could choose not to store it at all, and rather add an extra if to uri_handler_by_name (instead of php_uri_get_handler) for retrieving the legacy handler, but doing so would slightly slow down all usages, not just the ones that come outside of ext/uri.
I hope I managed to describe my problem 🤔 In any case, I don't have any hard feelings against each name, so I can live with any choice.
There was a problem hiding this comment.
So what if we want to change the default later or even drop parse_url?
There was a problem hiding this comment.
Hmm.. so do you suggest that the implementation should have a name so that it can be referenced even if it's not the default one... I guess that makes sense... Thinking about it long-term, a phase-out will likely look like that some other implementation is made the default (likely RFC 3986), and then a few years later parse_url is removed... I guess I got your point now.
ndossche
left a comment
There was a problem hiding this comment.
I think this is getting pretty close to being merge-able
There was a problem hiding this comment.
Huh? Why did you change the name to "" though?
There was a problem hiding this comment.
Assertions with exceptions to the rule are a bit iffy IMHO
|
I'll fix the CI probably tomorrow night |
bd91ee6 to
006730c
Compare
There was a problem hiding this comment.
I noticed that still the normalized URI components were retrieved in multiple places where the raw version would have made more sense, so I changed these.
Generally, I think the raw format should be used internally whenever possible. I am also considering to return an error if an unsupported format is required (e.g. the WHATWG implementation doesn't support normalization - except for the host whose unicode representation can be returned).
There was a problem hiding this comment.
I realized that the php_raw_url_decode() is conformant to RFC 3986.
There was a problem hiding this comment.
Can you clarify what you mean with this remark/change?
There was a problem hiding this comment.
Maybe you missed that initially, I used php_url_decode(). A bit later I realized that this function is not conformant to RFC 3986 URL ecoding (due to this line: https://github.com/php/php-src/blob/006730c0df454b29c7e6a0ac6b9149bfb179b1a9/ext/standard/url.c#L759), so I changed it to php_raw_url_decode which is conformant to it.
There was a problem hiding this comment.
I understand. (It's also a bit hard keeping track of changes in large PRs)
All fine then.
ndossche
left a comment
There was a problem hiding this comment.
Please see my last question, otherwise looks fine to me. Let the cleanup begin.
|
Just out of curiosity and for planning, I seem to recall that the URI extension implementation was going to be split into multiple parts - is this the last part? Or are there more PRs to be expected (non-cleanup PRs that is)? |
DanielEScherzer
left a comment
There was a problem hiding this comment.
ext/reflection changes look good to me
|
I've just rebased to latest master, added internal upgrading notes + news + renamed the |
Part of #14461