设计与分析 - 设置覆盖问题

集合覆盖算法为许多现实世界的资源分配问题提供了解决方案。例如，考虑一家航空公司为每架飞机分配机组人员，以便有足够的人员来满足旅程的要求。他们会考虑航班时刻、持续时间、进站以及机组人员的可用性来将他们分配给航班。这就是集合覆盖算法的用武之地。

给定一个通用集合 U，包含很少的元素，这些元素都分为子集。考虑这些子集的集合为 S = {S ₁ , S ₂ , S ₃ , S ₄ ... S _n }，集合覆盖算法找到子集的最小数量，使得它们覆盖通用集合中存在的所有元素。

如上图所示，点代表通用集合U中存在的元素，它们被分为不同的集合，S = {S ₁ , S ₂ , S ₃ , S ₄ , S ₅ , S ₆ }。需要选择覆盖所有元素的最小集合数将是最优输出= {S ₁ , S ₂ , S ₃ }。

设置覆盖算法

集合覆盖将集合的集合作为输入，并返回包含所有通用元素所需的最小集合数。

集合覆盖算法是一个NP-Hard问题，也是一个2-近似贪心算法。

算法

步骤 1 - 初始化输出 = {}，其中输出表示元素的输出集。

步骤 2 - 虽然输出集不包含通用集中的所有元素，但请执行以下操作 -

使用公式 $\frac{Cost\left ( S_{i} \right )}{S_{i}-Output}$ 查找通用集中存在的每个子集的成本效益
查找执行的每次迭代具有最低成本效益的子集。将子集添加到输出集中。

步骤 3 - 重复步骤 2，直到宇宙中没有任何元素为止。实现的输出是最终的输出集。

伪代码

APPROX-GREEDY-SET_COVER(X, S)
   U = X
   OUTPUT = ф
   while U ≠ ф
      select S_i Є S which has maximum |S_i∩U|
   U = U – S
   OUTPUT = OUTPUT∪ {S_i}
return OUTPUT

分析

假设元素总数等于集合总数 (|X| = |S|)，则代码运行时间为 O(|X|3)

例子

让我们看一个更详细地描述集合覆盖问题的近似算法的示例

S₁ = {1, 2, 3, 4}                cost(S₁) = 5
S₂ = {2, 4, 5, 8, 10}            cost(S₂) = 10
S₃ = {1, 3, 5, 7, 9, 11, 13}     cost(S₃) = 20
S₄ = {4, 8, 12, 16, 20}          cost(S₄) = 12
S₅ = {5, 6, 7, 8, 9}             cost(S₅) = 15

步骤1

输出集，Output = ф

查找输出集中没有元素的每个集合的成本效益，

S₁ = cost(S₁) / (S₁ – Output) = 5 / (4 – 0)
S₂ = cost(S₂) / (S₂ – Output) = 10 / (5 – 0)
S₃ = cost(S₃) / (S₃ – Output) = 20 / (7 – 0)
S₄ = cost(S₄) / (S₄ – Output) = 12 / (5 – 0)
S₅ = cost(S₅) / (S₅ – Output) = 15 / (5 – 0)

本次迭代中的最小成本效益在 S ₁处实现，因此，将子集添加到输出集，Output = {S ₁ }，其中元素为 {1, 2, 3, 4}

第2步

查找输出集中新元素的每个集合的成本效益，

S₂ = cost(S₂) / (S₂ – Output) = 10 / (5 – 4)
S₃ = cost(S₃) / (S₃ – Output) = 20 / (7 – 4)
S₄ = cost(S₄) / (S₄ – Output) = 12 / (5 – 4)
S₅ = cost(S₅) / (S₅ – Output) = 15 / (5 – 4)

本次迭代中的最小成本效益在 S ₃处实现，因此，子集添加到输出集，Output = {S ₁ , S ₃ }，其中元素为 {1, 2, 3, 4, 5, 7, 9, 11 ，13}。

步骤3

查找输出集中新元素的每个集合的成本效益，

S₂ = cost(S₂) / (S₂ – Output) = 10 / |(5 – 9)|
S₄ = cost(S₄) / (S₄ – Output) = 12 / |(5 – 9)|
S₅ = cost(S₅) / (S₅ – Output) = 15 / |(5 – 9)|

本次迭代中的最小成本效益是在 S ₂处实现的，因此，将子集添加到输出集，输出 = {S ₁ , S ₃ , S ₂ }，其中元素为 {1, 2, 3, 4, 5, 7, 8, 9, 10, 11, 13}

步骤4

查找输出集中新元素的每个集合的成本效益，

S₄ = cost(S₄) / (S₄ – Output) = 12 / |(5 – 11)|
S₅ = cost(S₅) / (S₅ – Output) = 15 / |(5 – 11)|

本次迭代中的最小成本效益在 S ₄处实现，因此，子集添加到输出集，输出 = {S ₁ , S ₃ , S ₂ , S ₄ } ，元素为 {1, 2, 3, 4, 5 , 7, 8, 9, 10, 11, 12, 13, 16, 20}

步骤5

查找输出集中新元素的每个集合的成本效益，

S₅ = cost(S₅) / (S₅ – Output) = 15 / |(5 – 14)|

本次迭代中的最小成本效益是在 S ₅处实现的，因此，将子集添加到输出集，输出 = {S ₁ , S ₃ , S ₂ , S ₄ , S ₅ }，其中元素为 {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 16, 20}

涵盖通用有限集中所有元素的最终输出为 Output = {S ₁ , S ₃ , S ₂ , S ₄ , S ₅ }。

例子

C C++ Java Python

#include <stdio.h>
#define MAX_SETS 100
#define MAX_ELEMENTS 1000
int setCover(int X[], int S[][MAX_ELEMENTS], int numSets, int numElements, int output[]) {
    int U[MAX_ELEMENTS];
    for (int i = 0; i < numElements; i++) {
        U[i] = X[i];
    }
    int selectedSets[MAX_SETS];
    for (int i = 0; i < MAX_SETS; i++) {
        selectedSets[i] = 0; // Initialize all to 0 (not selected)
    }
    int outputIdx = 0;
    while (outputIdx < numSets) {  // Ensure we don't exceed the maximum number of sets
        int maxIntersectionSize = 0;
        int selectedSetIdx = -1;
        // Find the set Si with the maximum intersection with U
        for (int i = 0; i < numSets; i++) {
            if (selectedSets[i] == 0) { // Check if the set is not already selected
                int intersectionSize = 0;
                for (int j = 0; j < numElements; j++) {
                    if (U[j] && S[i][j]) {
                        intersectionSize++;
                    }
                }
                if (intersectionSize > maxIntersectionSize) {
                    maxIntersectionSize = intersectionSize;
                    selectedSetIdx = i;
                }
            }
        }
        // If no set found, break from the loop
        if (selectedSetIdx == -1) {
            break;
        }
        // Mark the selected set as "selected" in the array
        selectedSets[selectedSetIdx] = 1;
        // Remove the elements covered by the selected set from U
        for (int j = 0; j < numElements; j++) {
            U[j] = U[j] - S[selectedSetIdx][j];
        }
        // Add the selected set to the output
        output[outputIdx++] = selectedSetIdx;
    }
    return outputIdx;
}
int main() {
    int X[MAX_ELEMENTS] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int S[MAX_SETS][MAX_ELEMENTS] = {
        {1, 1, 0, 0, 0, 0, 0, 0, 0, 0},
        {0, 1, 1, 1, 0, 0, 0, 0, 0, 0},
        {0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 1, 1, 1, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 1, 1, 1}
    };
    int numSets = 5;
    int numElements = 10;
    int output[MAX_SETS];
    int numSelectedSets = setCover(X, S, numSets, numElements, output);
    printf("Selected Sets: ");
    for (int i = 0; i < numSelectedSets; i++) {
        printf("%d ", output[i]);
    }
    printf("\n");
    return 0;
}

输出

Selected Sets: 1 2 3 4 0

#include <iostream>
#include <vector>
using namespace std;
#define MAX_SETS 100
#define MAX_ELEMENTS 1000
// Function to find the set cover using the Approximate Greedy Set Cover algorithm
int setCover(int X[], int S[][MAX_ELEMENTS], int numSets, int numElements, int output[])
{
    int U[MAX_ELEMENTS];
    for (int i = 0; i < numElements; i++) {
        U[i] = X[i];
    }
    int selectedSets[MAX_SETS];
    for (int i = 0; i < MAX_SETS; i++) {
        selectedSets[i] = 0; // Initialize all to 0 (not selected)
    }
    int outputIdx = 0;
    while (outputIdx < numSets) {  // Ensure we don't exceed the maximum number of sets
        int maxIntersectionSize = 0;
        int selectedSetIdx = -1;
        // Find the set Si with maximum intersection with U
        for (int i = 0; i < numSets; i++) {
            if (selectedSets[i] == 0) { // Check if the set is not already selected
                int intersectionSize = 0;
                for (int j = 0; j < numElements; j++) {
                    if (U[j] && S[i][j]) {
                        intersectionSize++;
                    }
                }
                if (intersectionSize > maxIntersectionSize) {
                    maxIntersectionSize = intersectionSize;
                    selectedSetIdx = i;
                }
            }
        }
        // If no set found, break from the loop
        if (selectedSetIdx == -1) {
            break;
        }
        // Mark the selected set as "selected" in the array
        selectedSets[selectedSetIdx] = 1;
        // Remove the elements covered by the selected set from U
        for (int j = 0; j < numElements; j++) {
            U[j] = U[j] - S[selectedSetIdx][j];
        }
        // Add the selected set to the output
        output[outputIdx++] = selectedSetIdx;
    }
    return outputIdx;
}
int main()
{
    int X[MAX_ELEMENTS] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    int S[MAX_SETS][MAX_ELEMENTS] = {
        {1, 1, 0, 0, 0, 0, 0, 0, 0, 0},
        {0, 1, 1, 1, 0, 0, 0, 0, 0, 0},
        {0, 0, 0, 1, 1, 1, 0, 0, 0, 0},
        {0, 0, 0, 0, 0, 1, 1, 1, 0, 0},
        {0, 0, 0, 0, 0, 0, 0, 1, 1, 1}
    };
    int numSets = 5;
    int numElements = 10;
    int output[MAX_SETS];
    int numSelectedSets = setCover(X, S, numSets, numElements, output);
    cout << "Selected Sets: ";
    for (int i = 0; i < numSelectedSets; i++) {
        cout << output[i] << " ";
    }
    cout << endl;
    return 0;
}

输出

Selected Sets: 1 2 3 4 0

import java.util.*;
public class SetCover {
    public static List<Integer> setCover(int[] X, int[][] S) {
        Set<Integer> U = new HashSet<>();
        for (int x : X) {
            U.add(x);
        }
        List<Integer> output = new ArrayList<>();
        while (!U.isEmpty()) {
            int maxIntersectionSize = 0;
            int selectedSetIdx = -1;
            for (int i = 0; i < S.length; i++) {
                int intersectionSize = 0;
                for (int j = 0; j < S[i].length; j++) {
                    if (U.contains(S[i][j])) {
                        intersectionSize++;
                    }
                }
                if (intersectionSize > maxIntersectionSize) {
                    maxIntersectionSize = intersectionSize;
                    selectedSetIdx = i;
                }
            }
            if (selectedSetIdx == -1) {
                break;
            }
            for (int j = 0; j < S[selectedSetIdx].length; j++) {
                U.remove(S[selectedSetIdx][j]);
            }
            output.add(selectedSetIdx);
        }
        return output;
    }
    public static void main(String[] args) {
        int[] X = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        int[][] S = {
            {1, 2},
            {2, 3, 4},
            {4, 5, 6},
            {6, 7, 8},
            {8, 9, 10}
        };
        List<Integer> selectedSets = setCover(X, S);
        System.out.print("Selected Sets: ");
        for (int idx : selectedSets) {
            System.out.print(idx + " ");
        }
        System.out.println();
    }
}

输出

Selected Sets: 1 3 4 0 2

def set_cover(X, S):
    U = set(X)
    output = []
    while U:
        max_intersection_size = 0
        selected_set_idx = -1
        for i, s in enumerate(S):
            intersection_size = len(U.intersection(s))
            if intersection_size > max_intersection_size:
                max_intersection_size = intersection_size
                selected_set_idx = i
        if selected_set_idx == -1:
            break
        U = U - set(S[selected_set_idx])
        output.append(selected_set_idx)
    return output
if __name__ == "__main__":
    X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    S = [
        {1, 2},
        {2, 3, 4},
        {4, 5, 6},
        {6, 7, 8},
        {8, 9, 10}
    ]
    selected_sets = set_cover(X, S)
    print("Selected Sets:", selected_sets)

输出

Selected Sets: 1 3 4 0 2