Use top_level.txt when analyzing pip modules #291
nkubala merged 1 commit into GoogleContainerTools:master
// Retrieves size for actual package/script corresponding to each dist-info metadata directory
// by taking the file entry alphabetically before it (for a package) or after it (for a script)
// First, try and use the "top_level.txt",
// Many egg packages contain a "top_level.txt" file describing the directories containing the
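The lookup order in the diff comment (top_level.txt first, some fallback otherwise) can be sketched in Go as below. This is a minimal illustration, not the PR's code: `topLevelNames`, `distNameFromMetadataDir` and `packageNames` are hypothetical names, and the fallback here is a simple guess from the metadata directory name, swapped in for the alphabetical-neighbor rule the comment describes. The example directory names are illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// topLevelNames parses top_level.txt contents: each non-empty line names
// one top-level package or module installed by the distribution.
func topLevelNames(contents string) []string {
	var names []string
	scanner := bufio.NewScanner(strings.NewReader(contents))
	for scanner.Scan() {
		name := strings.TrimSpace(scanner.Text())
		if name != "" {
			names = append(names, name)
		}
	}
	return names
}

// distNameFromMetadataDir is a hypothetical fallback: it guesses a name from
// a metadata directory such as "requests-2.18.4.dist-info" by dropping the
// suffix and everything from the first "-" on. This is the fragile string
// match that top_level.txt is meant to replace.
func distNameFromMetadataDir(dir string) string {
	for _, suffix := range []string{".dist-info", ".egg-info"} {
		dir = strings.TrimSuffix(dir, suffix)
	}
	if i := strings.Index(dir, "-"); i >= 0 {
		dir = dir[:i]
	}
	return strings.ToLower(dir)
}

// packageNames prefers top_level.txt (empty string = file missing) and only
// falls back to the name guess when it yields nothing.
func packageNames(metadataDir, topLevel string) []string {
	if names := topLevelNames(topLevel); len(names) > 0 {
		return names
	}
	return []string{distNameFromMetadataDir(metadataDir)}
}

func main() {
	// With top_level.txt, PyYAML maps to its real directory "yaml".
	fmt.Println(packageNames("PyYAML-3.12.dist-info", "yaml\n"))
	// Without it, the guess is "pyyaml", which matches no directory on disk.
	fmt.Println(packageNames("PyYAML-3.12-py2.7.egg-info", ""))
}
```

The second call shows why the fallback should only be a last resort: the guessed name and the installed directory disagree.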
https://setuptools.readthedocs.io/en/latest/formats.html
The minimum project metadata that all eggs must have is a standard Python PKG-INFO file, named PKG-INFO and placed within the metadata directory appropriate to the format.
...
In addition to the PKG-INFO file, an egg’s metadata directory may also include files and directories representing various forms of optional standard metadata ...
And
https://www.python.org/dev/peps/pep-0427/#the-dist-info-directory
- Wheel .dist-info directories include at a minimum METADATA, WHEEL, and RECORD.
- METADATA is the package metadata, the same format as PKG-INFO as found at the root of sdists
So from the sound of it, it's either one or the other.
Nice, these could definitely be useful for getting the package name. However, I don't see the list of dependencies anywhere in the METADATA files. That said, I do see what looks like a complete list of files in RECORD. This could be useful for wheels, but for eggs PKG-INFO still doesn't contain a list of dependencies. I think trying top_level.txt is still the right way to go here.
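For wheels, the RECORD file mentioned here is a CSV with one `path,hash,size` row per installed file (PEP 376 / PEP 427), with entries such as RECORD itself leaving hash and size empty, so a total size could in principle be summed straight from it. A minimal Go sketch, assuming the RECORD contents have already been read into a string; `recordSize` is a hypothetical helper and the sample hashes are placeholders. As the comment notes, this does nothing for eggs, which have no RECORD.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strconv"
	"strings"
)

// recordSize sums the size column of a wheel RECORD file. Rows with an
// empty size (e.g. the RECORD entry itself) are skipped.
func recordSize(record string) (int64, error) {
	r := csv.NewReader(strings.NewReader(record))
	r.FieldsPerRecord = -1 // tolerate rows with differing field counts
	rows, err := r.ReadAll()
	if err != nil {
		return 0, err
	}
	var total int64
	for _, row := range rows {
		if len(row) < 3 || row[2] == "" {
			continue
		}
		n, err := strconv.ParseInt(row[2], 10, 64)
		if err != nil {
			return 0, fmt.Errorf("bad size %q for %s: %w", row[2], row[0], err)
		}
		total += n
	}
	return total, nil
}

func main() {
	// Illustrative RECORD contents; hashes are placeholders, not real digests.
	record := "yaml/__init__.py,sha256=AAAA,1000\n" +
		"yaml/nodes.py,sha256=BBBB,500\n" +
		"PyYAML-3.12.dist-info/RECORD,,\n"
	size, err := recordSize(record)
	if err != nil {
		panic(err)
	}
	fmt.Println(size)
}
```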
Yeah, eggs are pretty oldschool/inconsistent/a bit horrid.
No worries; it may not be bulletproof, but reading top_level.txt is still an improvement, and as you say, if it gives you the extra information you need, it sounds like a good pragmatic choice.
Many egg modules contain a top_level.txt file, which lists the top-level directories (packages or modules) that the installed distribution provides. Often the name of the egg module doesn't match the name of the directory containing its actual contents (e.g. a module named PyYAML, with its contents in a directory called yaml), so using this file is much more reliable than simply attempting to string-match the directory name. Additionally, this file makes computing the size of a package much more accurate, especially when a package implicitly includes other dependencies.

Partially addresses #281