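Julia - Dictionaries and Sets

Many of the functions we have seen so far work on arrays and tuples. Arrays are just one type of collection, and Julia has other types of collections too. One of them is the Dict (dictionary) object, which associates keys with values. That is why it is called an "associative collection".

To understand it better, compare it with a simple lookup table. We supply a single piece of information, such as a number, a string, or a symbol, called the key, and the table gives us back the corresponding data value.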

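Creating a dictionary

The syntax for creating a simple dictionary is as follows -

Dict("key1" => value1, "key2" => value2, …, "keyn" => valuen)

In the syntax above, key1, key2, …, keyn are the keys and value1, value2, …, valuen are the corresponding values. The => operator builds a Pair, so "key1" => value1 is the same as Pair("key1", value1). We cannot have two keys with the same name, because keys in a dictionary are always unique.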
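As a minimal sketch of these two points (the names p and d are illustrative and do not appear in the original text): => builds a Pair, and when the same key is given twice, the dictionary keeps only the last value -

```julia
# `=>` is shorthand for constructing a Pair
p = "X" => 100
println(p == Pair("X", 100))   # true
println(p.first)               # X
println(p.second)              # 100

# Keys are unique: a repeated key keeps only the last value given
d = Dict("X" => 100, "Y" => 110, "X" => 999)
println(length(d))             # 2
println(d["X"])                # 999
```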

Example

julia> first_dict = Dict("X" => 100, "Y" => 110, "Z" => 220)
Dict{String,Int64} with 3 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100

We can also create a dictionary with comprehension syntax. An example is given below -

Example

julia> first_dict = Dict(string(x) => sind(x) for x = 0:5:360)
Dict{String,Float64} with 73 entries:
 "320" => -0.642788
 "65" => 0.906308
 "155" => 0.422618
 "335" => -0.422618
 "75" => 0.965926
 "50" => 0.766044
 ⋮ => ⋮
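
A comprehension can also carry a filter. The following sketch (the name squares is made up for illustration) keeps only the odd numbers from 1 to 9 and maps each one to its square -

```julia
# Build a dictionary from a generator with an `if` clause
squares = Dict(x => x^2 for x in 1:9 if isodd(x))

println(length(squares))      # 5
println(squares[7])           # 49
println(haskey(squares, 4))   # false, even numbers were filtered out
```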

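Keys

As mentioned earlier, dictionaries have unique keys. This means that if we assign a value to a key that already exists, we do not create a new key but replace the value stored under the existing one. The following are some dictionary operations on keys -

Searching for a key

We can use the haskey() function to check whether a dictionary contains a key -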

julia> first_dict = Dict("X" => 100, "Y" => 110, "Z" => 220)
Dict{String,Int64} with 3 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100
 
julia> haskey(first_dict, "Z")
true

julia> haskey(first_dict, "A")
false
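
A related sketch, assuming we want a fallback value rather than an error: indexing a key that is absent throws a KeyError, so haskey() is often paired with get(), which returns a default without changing the dictionary, or get!(), which also stores the default. The name inventory is made up for illustration -

```julia
inventory = Dict("X" => 100, "Y" => 110, "Z" => 220)

# get() returns the default for a missing key and leaves the dictionary alone
fallback = get(inventory, "A", 0)
println(fallback)                  # 0
println(haskey(inventory, "A"))    # false

# get!() returns the default and also inserts it under the missing key
stored = get!(inventory, "A", 0)
println(stored)                    # 0
println(haskey(inventory, "A"))    # true
```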

Searching for a key/value pair

We can use the in() function to check whether a dictionary contains a key/value pair -

julia> in(("X" => 100), first_dict)
true

julia> in(("X" => 220), first_dict)
false

Adding a new key-value pair

We can add a new key-value pair to an existing dictionary as follows -

julia> first_dict["R"] = 400
400

julia> first_dict
Dict{String,Int64} with 4 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100
 "R" => 400

Deleting a key

We can use the delete!() function to delete a key from an existing dictionary -

julia> delete!(first_dict, "R")
Dict{String,Int64} with 3 entries:
 "Y" => 110
 "Z" => 220
 "X" => 100

Getting all the keys

We can use the keys() function to get all the keys from an existing dictionary -

julia> keys(first_dict)
Base.KeySet for a Dict{String,Int64} with 3 entries. Keys:
 "Y"
 "Z"
 "X"

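Values

Every key in a dictionary has a corresponding value. The following are some dictionary operations on values -

Retrieving all the values

We can use the values() function to get all the values from an existing dictionary -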

julia> values(first_dict)
Base.ValueIterator for a Dict{String,Int64} with 3 entries. Values:
 110
 220
 100
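
The results of keys() and values() are iterators, so they can be passed straight to reductions or collected into arrays. A short sketch, using a fresh dictionary d with the same contents as the example -

```julia
d = Dict("X" => 100, "Y" => 110, "Z" => 220)

println(sum(values(d)))        # 430
println(maximum(values(d)))    # 220

# collect() turns the key iterator into an array, which can then be sorted
ks = sort(collect(keys(d)))
println(ks)                    # ["X", "Y", "Z"]
```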

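Dictionaries as iterable objects

We can process each key/value pair to see that a dictionary is in fact an iterable object -

julia> for kv in first_dict
          println(kv)
       end
"Y" => 110
"Z" => 220
"X" => 100

Here kv is a Pair holding one key/value pair: first(kv) is the key and last(kv) is the value.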
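
Each pair can also be destructured directly in the loop header. The sketch below wraps the loop in a function (describe is a hypothetical helper, not part of Julia) and sorts the result, since a Dict has no guaranteed order -

```julia
# Collect "key => value" strings from any dictionary, sorted for stable output
function describe(d)
    lines = String[]
    for (key, value) in d          # destructures each Pair into key and value
        push!(lines, "$key => $value")
    end
    return sort(lines)
end

println(describe(Dict("X" => 100, "Y" => 110)))   # ["X => 100", "Y => 110"]
```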

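Sorting a dictionary

Dictionaries do not store their keys in any particular order, so the output of a dictionary will not be a sorted array. To get the items in order, we can sort the keys and look up each value -

Example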

julia> first_dict = Dict("R" => 100, "S" => 220, "T" => 350, "U" => 400, "V" => 575, "W" => 670)
Dict{String,Int64} with 6 entries:
 "S" => 220
 "U" => 400
 "T" => 350
 "W" => 670
 "V" => 575
 "R" => 100
julia> for key in sort(collect(keys(first_dict)))
          println("$key => $(first_dict[key])")
       end
R => 100
S => 220
T => 350
U => 400
V => 575
W => 670
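
Another option, sketched here under the assumption that an array of pairs is acceptable as the result, is to collect the dictionary and sort the pairs themselves, by key with first or by value with last -

```julia
d = Dict("R" => 100, "S" => 220, "T" => 350)

# Sort the key/value pairs by key
by_key = sort(collect(d), by = first)
println(by_key)          # ["R" => 100, "S" => 220, "T" => 350]

# Sort the key/value pairs by value, largest first
by_value_desc = sort(collect(d), by = last, rev = true)
println(by_value_desc)   # ["T" => 350, "S" => 220, "R" => 100]
```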

We can also use the SortedDict data type from the DataStructures.jl package to make sure that the dictionary always stays sorted. See the example below -

Example

julia> import DataStructures
julia> first_dict = DataStructures.SortedDict("S" => 220, "T" => 350, "U" => 400, "V" => 575, "W" => 670)
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 5 entries:
 "S" => 220
 "T" => 350
 "U" => 400
 "V" => 575
 "W" => 670
julia> first_dict["R"] = 100
100
julia> first_dict
DataStructures.SortedDict{String,Int64,Base.Order.ForwardOrdering} with 6 entries:
 "R" => 100
 "S" => 220
 "T" => 350
 "U" => 400
 "V" => 575
 "W" => 670

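Word counting example

One simple application of dictionaries is counting how many times each word appears in a text. The idea is that each word is a key, and the value stored under that key is the number of times the word appears in the text.

In the example below, we count the words in a file named NLP.txt (saved on the desktop) -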

julia> f = open("C://Users//Leekha//Desktop//NLP.txt")
IOStream()

julia> wordlist = String[]
String[]

julia> for line in eachline(f)
          words = split(line, r"\W")
          map(w -> push!(wordlist, lowercase(w)), words)
       end

julia> filter!(!isempty, wordlist)
984-element Array{String,1}:
 "natural"
 "language"
 "processing"
 "semantic"
 "analysis"
 "introduction"
 "to"
 "semantic"
 "analysis"
 "the"
 "purpose"
   ……………………
   ……………………
julia> close(f)

From the output above we can see that wordlist is now an array of 984 elements.

We can create a dictionary to store the words and their counts -

julia> wordcounts = Dict{String,Int64}()
Dict{String,Int64}()

julia> for word in wordlist
          wordcounts[word] = get(wordcounts, word, 0) + 1
       end

To find out how many times a word appears, we can look it up in the dictionary as follows -

julia> wordcounts["natural"]
1

julia> wordcounts["processing"]
1

julia> wordcounts["and"]
14
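
The same counting idea can be tried without a file. The following is a minimal sketch that works on an in-memory string; count_words is a hypothetical helper, and splitting on r"\W+" with keepempty = false is one way to drop empty tokens up front instead of filtering them afterwards -

```julia
# Count how often each lowercase word occurs in a piece of text
function count_words(text::AbstractString)
    counts = Dict{String,Int}()
    for w in split(lowercase(text), r"\W+"; keepempty = false)
        word = String(w)                          # convert the SubString to a String key
        counts[word] = get(counts, word, 0) + 1   # start from 0 for unseen words
    end
    return counts
end

counts = count_words("The cat and the hat. The end!")
println(counts["the"])     # 3
println(counts["cat"])     # 1
println(length(counts))    # 5
```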

We can also print the dictionary sorted by key as follows -

julia> for i in sort(collect(keys(wordcounts)))
          println("$i, $(wordcounts[i])")
       end
1, 2
2, 2
3, 2
4, 2
5, 1
a, 28
about, 3
above, 2
act, 1
affixes, 3
all, 2
also, 5
an, 5
analysis, 15
analyze, 1
analyzed, 1
analyzer, 2
and, 14
answer, 5
antonymies, 1
antonymy, 1
application, 3
are, 11
…
…
…
…

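To find the most common words, we can use collect() to convert the dictionary into an array of key/value pairs and then sort that array by count as follows -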

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)
276-element Array{Pair{String,Int64},1}:
            "the" => 76
             "of" => 47
             "is" => 39
              "a" => 28
          "words" => 23
        "meaning" => 23
       "semantic" => 22
        "lexical" => 21
       "analysis" => 15
            "and" => 14
             "in" => 14
             "be" => 13
             "it" => 13
        "example" => 13
             "or" => 12
           "word" => 12
            "for" => 11
            "are" => 11
        "between" => 11
             "as" => 11
                  ⋮
            "each" => 1
           "river" => 1
         "homonym" => 1
  "classification" => 1
         "analyze" => 1
       "nocturnal" => 1
            "axis" => 1
         "concept" => 1
           "deals" => 1
          "larger" => 1
         "destiny" => 1
            "what" => 1
     "reservation" => 1
"characterization" => 1
          "second" => 1
       "certitude" => 1
            "into" => 1
        "compound" => 1
    "introduction" => 1

We can check the top 10 words as follows -

julia> sort(collect(wordcounts), by = tuple -> last(tuple), rev=true)[1:10]
10-element Array{Pair{String,Int64},1}:
      "the" => 76
       "of" => 47
       "is" => 39
        "a" => 28
    "words" => 23
  "meaning" => 23
 "semantic" => 22
  "lexical" => 21
 "analysis" => 15
      "and" => 14

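We can use the filter() function to find all the words that start with a particular letter (say "n"). The example below additionally keeps only the words that occur fewer than 4 times -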

julia> filter(tuple -> startswith(first(tuple), "n") && last(tuple) < 4, collect(wordcounts))
6-element Array{Pair{String,Int64},1}:
      "none" => 2
       "not" => 3
    "namely" => 1
      "name" => 1
   "natural" => 1
 "nocturnal" => 1

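filter() can also be applied to the dictionary itself, in which case the predicate receives each Pair and the result is again a Dict. A small sketch with made-up data (sample_counts is not the dictionary built from NLP.txt) -

```julia
sample_counts = Dict("natural" => 1, "not" => 3, "name" => 1, "the" => 76, "noun" => 9)

# Keep entries whose key starts with "n" and whose count is below 4
rare_n = filter(p -> startswith(p.first, "n") && p.second < 4, sample_counts)

println(sort(collect(keys(rare_n))))   # ["name", "natural", "not"]
println(rare_n isa Dict)               # true
```
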
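Sets

Like an array or a dictionary, a set is a collection; it can be defined as a collection of unique elements. The following are the differences between sets and other types of collections -

In a set, each element can appear only once.
The order of the elements in a set does not matter.

Creating a set

With the help of the Set constructor, we can create a set as follows -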

julia> var_color = Set()
Set{Any}()

We can also specify the element type of the set as follows -

julia> num_primes = Set{Int64}()
Set{Int64}()

We can also create and fill a set in one step as follows -

julia> var_color = Set{String}(["red","green","blue"])
Set{String} with 3 elements:
 "blue"
 "green"
 "red"

Alternatively, we can use the push!() function, just as with arrays, to add elements to a set as follows -

julia> push!(var_color, "black")
Set{String} with 4 elements:
 "blue"
 "green"
 "black"
 "red"

We can use the in() function to check whether an element is in a set -

julia> in("red", var_color)
true

julia> in("yellow", var_color)
false
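
The uniqueness rule is easy to observe in a short sketch (the name s is illustrative): duplicates in the input are dropped, and pushing an element that is already present leaves the set unchanged -

```julia
# Duplicates in the input array collapse to a single element each
s = Set([2, 3, 5, 3, 2, 7])
println(length(s))    # 4
println(5 in s)       # true

# Pushing an existing element does not grow the set
push!(s, 7)
println(length(s))    # 4
```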

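Standard operations

Union, intersection, and difference are some of the standard operations we can perform on sets. The corresponding functions are union(), intersect(), and setdiff().

Union

In general, the union operation returns the combined elements of two sets, with each element appearing only once.

Example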

julia> color_rainbow = Set(["red","orange","yellow","green","blue","indigo","violet"])
Set{String} with 7 elements:
 "indigo"
 "yellow"
 "orange"
 "blue"
 "violet"
 "green"
 "red"
 
julia> union(var_color, color_rainbow)
Set{String} with 8 elements:
 "indigo"
 "yellow"
 "orange"
 "blue"
 "violet"
 "green"
 "black"
 "red"

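Intersection

In general, the intersection operation takes two or more sets as input and returns the elements that they have in common.

Example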

julia> intersect(var_color, color_rainbow)
Set{String} with 3 elements:
 "blue"
 "green"
 "red"

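Difference

In general, the difference operation takes two or more sets as input. It returns the elements of the first set, excluding any that also appear in the other sets.

Example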

julia> setdiff(var_color, color_rainbow)
Set{String} with 1 element:
 "black"

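The following sketch collects a few related calls from Julia's Base library: the operator forms ∪ and ∩, symdiff() for elements found in exactly one set, issubset(), and the in-place union!(). The sets a and b repeat the colors used above, and palette is an illustrative name -

```julia
a = Set(["red", "green", "blue", "black"])
b = Set(["red", "orange", "yellow", "green", "blue", "indigo", "violet"])

# Operator forms of union() and intersect()
println((a ∪ b) == union(a, b))                     # true
println((a ∩ b) == Set(["red", "green", "blue"]))   # true

# Elements that belong to exactly one of the two sets
println(length(symdiff(a, b)))                      # 5

# Subset test
println(issubset(Set(["red", "blue"]), a))          # true

# union! modifies its first argument in place, so work on a copy
palette = copy(a)
union!(palette, b)
println(length(palette))                            # 8
```
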
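Some functions on dictionaries

In the example below, you will see that the functions that work on arrays and sets also work on collections such as dictionaries -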

julia> dict1 = Dict(100=>"X", 220 => "Y")
Dict{Int64,String} with 2 entries:
 100 => "X"
 220 => "Y"
 
julia> dict2 = Dict(220 => "Y", 300 => "Z", 450 => "W")
Dict{Int64,String} with 3 entries:
 450 => "W"
 220 => "Y"
 300 => "Z"

Union

julia> union(dict1, dict2)
4-element Array{Pair{Int64,String},1}:
 100 => "X"
 220 => "Y"
 450 => "W"
 300 => "Z"

Intersection

julia> intersect(dict1, dict2)
1-element Array{Pair{Int64,String},1}:
 220 => "Y"

Difference

julia> setdiff(dict1, dict2)
1-element Array{Pair{Int64,String},1}:
 100 => "X"

Merging two dictionaries

julia> merge(dict1, dict2)
Dict{Int64,String} with 4 entries:
 100 => "X"
 450 => "W"
 220 => "Y"
 300 => "Z"
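
In the example above the shared key 220 has the same value in both dictionaries. When the values differ, merge() keeps the value from the last dictionary, and passing a combining function as the first argument merges the conflicting values instead. A sketch with made-up dictionaries a and b -

```julia
a = Dict("x" => 1, "y" => 2)
b = Dict("y" => 10, "z" => 20)

# On a key conflict the value from the later dictionary wins
m = merge(a, b)
println(m["y"])        # 10

# With a combining function, conflicting values are combined instead
summed = merge(+, a, b)
println(summed["y"])   # 12

# merge() returns a new dictionary; `a` itself is unchanged
println(a["y"])        # 2
```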

Finding the smallest element

julia> dict1
Dict{Int64,String} with 2 entries:
 100 => "X"
 220 => "Y"
 
 
julia> findmin(dict1)
("X", 100)
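
Note that findmin() on a dictionary compares the values and returns a (value, key) tuple, which is why the result above is ("X", 100) rather than a key/value pair. A sketch with numeric values makes this easier to see; prices is a made-up dictionary, and minimum(keys(...)) is shown as one way to get the smallest key instead -

```julia
prices = Dict("apple" => 3, "pear" => 2, "plum" => 5)

# findmin/findmax return (value, key) for the smallest/largest value
lowest, cheapest = findmin(prices)
println((lowest, cheapest))        # (2, "pear")

highest, priciest = findmax(prices)
println((highest, priciest))       # (5, "plum")

# The smallest key is a separate question
println(minimum(keys(prices)))     # apple
```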