-
Notifications
You must be signed in to change notification settings - Fork 225
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Alternate household allocation algorithm #168
Comments
Not sure how you want this submitted, but I played around with part of this algorithm and found a increase in speed in setup of the households array: I took this setup: households = []
adults = ages >= 18
if sum(adults) < len(household_sizes):
raise Exception('Not enough adults to fill houses')
ages = list(ages)
for _ in household_sizes:
for i,age in enumerate(ages):
if age >= 18:
households.append([age])
del ages[i]
break
# Add remaining people to households
for n, household in zip(household_sizes, households):
for i in range(n-1):
household.append(ages.pop(0)) Moved it into its own function then wrote a new version so I could time each: def setup_households_fast(ages, household_sizes):
households = []
ages = list(ages)
# sort the ages to put adults first then grab an adult for
# each household that we have and shuffle the order
ages.sort(reverse=True)
adults = ages[:len(household_sizes)]
if adults[-1] < 18:
raise Exception('Not enough adults to fill houses')
random.shuffle(adults)
# Take everyone not in the adult array and shuffle them around
remainder = ages[len(household_sizes):]
random.shuffle(remainder)
for size in household_sizes:
# Create a house with at least 1 adult then
# add people from the random ordered remainder to reach
# the house size
household = []
household.append(adults.pop(0))
size = size - 1
household += remainder[:size]
remainder = remainder[size:]
households.append(household)
return households |
|
I am about to go out of office for about ten days. But attached is some work where I was seeing what would happen if you tested different possible household combinations by spinning up a pool of processes. I was getting some decent speed improvements. I also did some other optimizations which are commented in the code. It is pretty tricky to get a base line time difference for large sets as it is really dependent on the starting household and how many swaps are required for get to a good quality score, so my timing results were all over the map. There are 2 versions in this zip file. The first v2.py which switches from using a the for loop to create all possible combinations to a permutation. It is a bit slower to build the list of possible outcome, however for larger datasets where household sizes get bigger it was somewhat faster as it seems to increase the quality score in larger jumps. The custom.py is a reimplementation of the original potential swaps algorithm but in a multi-processed approach. My best outcome from the 2 was v2.py building a set of 100,000 in about an hour with a quality score of 3.291. However, it is not really reproducible due to the random nature of the input. Anyways there are some ideas in there, take them or leave them 😅 |
@gwincr11 Thanks a lot for the optimizations, very much appreciate it! Doing it in parallel would definitely give a significant boost :) Hope you have a great vacation! |
Summary
In the attached files, make
allocate_people_simple()
scale better with population sizeFiles
allocation_functions.zip
Overview
The Covasim simulation needs to model individual households, but typically country data comes aggregated over individuals. This means that households need to be reconstructed based on statistical descriptions of ages, household sizes, and mixing between age groups. Currently the algorithm for doing this relies on having a 'reference person' in each household, but not all countries provide this data. So the challenge is to develop an algorithm that can operate without this information.
For this problem, there are three inputs
p=[5,48,32,12,68]
. The ages can be any positive integer but in practice would never exceed 125H=[1,4,2]
which represents 3 households, with 1 person, 4 people, and 2 people, respectively. The sum of the households is equal to the total number of people i.e.sum(H) == len(p)
so there are the same number of people as there are spaces in households to allocate them to. The household size can be assumed to be an positive integer less than 15.The rows of the matrix sum to 1. This matrix describes the proportion of contacts that a person with an age in a given row has with people of other ages, in each column. For the purpose of household contact allocation, we can assume that a household forms a fully-connected network of people, with an equal number of interactions between each household member. So for example, suppose that a household with 2 adults and 3 kids has ages
[29, 27, 1, 5, 12]
. Then we can produce an edge list containing all of the contact pairs here e.g.29-27
which can then be expressed in the form of an age mixing matrix by aggregating the edges. For example, consider the 29 year old individual. Their contribution to the age mixing matrix will be in the20-30
row since that's the age bin they fall into. And their contribution to that row is calculated by binning the ages of the other household members they interact with:Similarly, for the 1 year old, their interactions could be summarized as
Therefore, the age mixing matrix for the entire household is
This matrix represents contacts for a single household. We can accumulate the interactions across all households, and then normalize so that the sum across each row is
1
, ending up with something likeThus, the age mixing matrix for the population can be computed given an specification of the ages of people in each household e.g.
[[1,20],[30],[50,49,16,14]]
.Question
Given a collection of people with specified ages, and a collection of households with specified sizes (such that the number of places in the households matches the number of people), assign people to households such that the resulting age mixing matrix optimally matches a target age mixing matrix also provided as input. Any reasonable metric to quantify the match between the age mixing matrices can be used, but it is of course most convenient if the metric of choice is 0 if the two matrices are identical (e.g. Frobenius distance).
In the first instance, we can assume that the format of the age mixing matrix is fixed (i.e. a 16x16 matrix, 5 year age bins up to age 80) although ideally this could be changed (and in particular, it would be ideal if the bin width could vary e.g.
0-14
and15-64
as bins).Note that households must not consist entirely of individuals below a cutoff age (use 18 as an initial value). This ensures that every household contains at least one adult
Included files
inputs.xlsx
containing the mixing matrix and population-level data about the number of people in each age group and household size. These data are in a similar format to real country dataallocation_functions.py
which containssample_inputs()
to generate new inputs of variable size. The algorithm will need to perform well up to ~100000 people but it is probably useful to test on smaller populationsallocate_people_simple()
which contains a working algorithm. It needs to return the allocation of ages to householdsvalidate_allocation()
ensures that the allocation satisfies any constraints. Currently the only constraint is that each household must have at least one adultcompute_mixing_matrix()
which has a basic implementation to calculate the mixing matrix for a given allocation of ages to householdscompute_allocation_quality()
which as a basic implementation of a distance metric between the target mixing matrix and the mixing matrix from the proposed allocation. This can be changed if a different distance metric is used.Based on the code in
allocation_functions.py
what is needed is a different implementation ofallocate_people_simple()
that minimizes the result ofcompute_allocation_quality()
more efficiently. In particular, there are two key limitationsThere are some ways to speed up the calculation without changing the algorithm (e.g. tracking the contribution of each household to the mixing matrix, and only updating those for the two households being considered at a time). However, these would probably not be enough to allow the population size to scale up to 100000+, what would be really great is an approach that runs in less than quadratic time.
The text was updated successfully, but these errors were encountered: