Package Manager Flow

Posted on 12/10/2022 | 🗒️ Edit on GitHub

In this post

Overview
Software Maintenance Technologies
Package Manager
User preferences
Scenarios
Requests
Terms
Solvers

Overview

We should eventually be able to describe the makeup of a package manager - akin to how the Open Containers Initiative has an image-spec, distribution-spec, and runtime-spec for containers, we want similar specs to describe actions and states for package management. I don’t have a holistic view yet of what this should look like, but I’ll include interesting notes that are relevant. Here is an early understanding of a general flow:

user preferences --> package manager --> translate to upgrade problem (CUDF) --> solver -->  --> CUDF-encoded solution --> scenario --> [if solution] --> action [else] rollback

The above says that we start with user preferences that are assembled by the package manager, hand them off to a solver (package manager agnostic, used like plugins) to come up with a solution. There are several ways to describe a solution (scenario) that determines if we move forward or not. The input and output to the solver is a CUDF document that describes the needed changes. There are also terms that can describe the scenarios terms

Software Maintenance Technologies

The “ingredients” of “software maintenance technologies” according to [1] are:

dependencies
conflicts
package managers with dependency solving capabilities

Package Manager

The package manager should generally have the following inputs and outputs [2]

Take as input:
- the current status of packages on the system
- a universe of all available packages (in other papers called U)
- a user request (e.g., “install xyz, remove xyz”)
- user preferences (e.g., “minimize updated software”)
Return as output:
- an upgrade plan: a list of actions to take on the system to reach the status that the user wants (e.g., installing thething)

And we can evaluate these steps based on expressivity - how empowered the user is to express preferences, and completeness - being able to propose a valid update plan (the output) when one exists.

In [2] they (mostly same) authors make a similar statement, saying that package managers should (this is verbatim):

devise upgrade plans that are correct (i.e. no plan that violates component expectations is proposed) and complete (i.e. every time asuitable plan exists, it can be found)
have performances that scale up gracefully at component repositories growth
empower users to express preferences on the desired component configuration when several options exist, which is often the case

User preferences

The user should be able to specify high level criteria to describe what they want (e.g., minimize changed packages). Of course the more criteria we define, the more likely we are to not be able to find a solution or have something pareto-optimal. Approaches to deal with this include:

Lexicographic: order criteria by importance (e.g., a security update gets ranked higher)
Weighted sum: aggregate criteria into single measure using some set of weights
lexmin and lexmax: (haven’t read about yet)

The authors in [3, P. 14] propose to do the following:

define a dictionary of criteria
define a dictionary of aggregation functions (e.g., the above)
write the user preference as an expression of the aggregation function (op) and directionality (e.g., +/-) for each criteria. E.g.,

lex(-removed, -changed)

Says use lexmin to aggregate, and minimize changed and removed packages. They present a nice table of optimization criteria:

And this is a neat idea because you can then describe specific scenarios (and name them)

paranoid=lex(−removed,−changed)
trendy=lex(−removed,−notuptodate,−unsatrec,−new)

Scenarios

The paper [3, P. 12] describes different solution scenarios, rooted in CUDF, and this would be a good terminology to harden to better describe our work.

We start with a CUDF document and define a solution to it as a subset (S) of the entire package universe (U). We then say that dom(S) (domain) are the pairs of packages (e.g., (name, version) in our subset (e.g., what CUDF is asking for) in S, and pro(S) (provided) the same (name, version) pairs of all that are available, which can be infinite (gulp!).

This defines an atomic package constraint. A subset S in U is (these are written verbatim from the paper):

abundant if every disjunction in the dependency of every package in S contains a package constraint that is satisfied by dom(S) ∪ pro(S). I think this is saying that a solution is abundant if every nested dependency has it’s own dependency needs met by either the set of pairs that we need, or the set of pairs available to us. It’s abundant possibly because it says that the universe is rich with everything that we need.
peaceful if no atomic package constraint in the conflicts of any package p∈S is satisfied by dom(S−{p}) ∪ pro(S−{p}). I think this is saying that if we removed p from both sets, there would be no conflict.

Requests

A subest S of the universe U satisfies a request:

install φ if every atomic constraint in φ is satisfied by dom(S) ∪ pro(S). This is saying we move forward to install if the universe is abundant?)
remove φ if no element of φ is satisfied by dom(S) ∪ pro(S).

A set S is only a solution if it is abundant, healthy, and satisfies the request. Solutions are allowed to have several packages with the same name and different versions.

Terms

We can further classify solutions based on:

correctness: have we satisfied the user request?
quality: do I like the solution?

Solvers

It looks like there are many solver technologies, as references by [3] in the MISC competition (Mancoosi International Solver Competition ref) including but not limited to:

boolean satisfiability (SAT)
Mixed Integer Linear Programming (MILP)
Answer Set Programming (ASP)
graph constraints

And I think we intend to work on one based on ABI compatability. Many of these can handle upgrade problems encoded as CUDF documents.

P. Abate, R. Di Cosmo, R. Treinen, and S. Zacchiroli, “Dependency solving: a separate concern in component evolution management,” Journal of Systems and Software, vol. 85, no. 10, pp. 2228–2240, 2012. Full details | PDF
P. Abate, R. Di Cosmo, G. Gousios, and S. Zacchiroli, “Dependency Solving Is Still Hard, but We Are Getting Better at It,” in 2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER), 2020, pp. 547–551. Full details | PDF
P. Abate, R. Di Cosmo, R. Treinen, and S. Zacchiroli, http://www.sciencedirect.com/science/article/pii/S0950584912001851“A modular package manager architecture”</a> Information and Software Technology, vol. 55, no. 2, pp. 459–474, 2013. Full details | PDF