CLRS 笔记 | The War Of Mine

今天开始算导和CSAPP和PRML一起看

笔记慢慢写

1. Lec 1: Intro & Linear-time Selection (randomized and deterministic); recursion trees.

Sorting in the comparison model

比较模型至少要比较nlgn次才能进行排序.

对比较模型来说, 每比较一次产生一次结果, 那么相当于一个输出产生一个结果, 这个结果可以是枚举产生, 那么对一个输入一共有M = n!种枚举结果, 只有其中的一个是正确的, 每比较一次就可以确定两个数的位置, 而其他操作是没有cost的, 所以这就可以去掉一半的结果, 结果就是最差情况下lgM 次比较后可以找到结果.

二分搜索
归并排序
快排(期望时间)/随机

因此对于比较来进行搜索排序的算法其最优为O(nlgn)

Sorting in the exchange model

对于交换模型

只需要比较全部大小后进行交换即可, 最坏情况下需要进行n-1次交换

这种模型对应

计数排序
基数排序
桶排序

都是线性排序算法O(n)

###An example: Median Finding(更一般的, 查找第Kth大的数)

One thing that makes algorithm design “Computer Science” is that solving a problem in the most
obvious way from its definitions is often not the best way to get a solution. A simple example of
this is median finding.
Recall the median. For a set of n elements, this is the element in this set that is the n/2
th smallest,
i.e., it has n/2 elements larger than it.1 Given an unsorted array, how quickly can one find the
median element? The definition gives us no clue: we can enumerate over all elements, and for each
check if it is the median. This gives a Θ(n
2
) time algorithm. Or one can sort, and then read off
the median, which takes O(n log n) time using MergeSort or HeapSort (deterministic) or QuickSort
(randomized).
Can one do it more quickly than by sorting? In this lecture we describe two linear-time algorithms
for this problem: one randomized and one deterministic. More generally, we solve the problem of
finding the kth smallest out of an unsorted array of n elements.

可以想到的简单解法是先排序, 后输出第K位数, 这样使用merge sort, heap sort, quick sort 都可以达到O(nlogn)的上界, 注意前面两者是决定性的算法, 而快排是随机化得算法, 其最坏的情况下是O(n^2)但是期望时间是O(nlogn)

一个随机化的线性算法

QuickSelect

pseudo code如下

//QuickSelect: Given array A of size n and integer 1 ≤ k ≤ n,

1. Pick a pivot element p at random from A.
2. Split A into subarrays Less and Greater by comparing each element to p as in Quicksort.
While we are at it, count the number L of elements going in to Less.
3. (a) If L = k − 1, then output p.
(b) If L > k − 1, output QuickSelect(Less, k).
(c) If L < k − 1, output QuickSelect(Greater, k − L − 1)

一个determinist的方法

DeterministicSelect:

这里不同是固定的找到一个PIVOT使他尽可能的靠近中位数, 而QUICKSELECT是随机使用一个PIVOT.这种方法也可以达到线性时间的复杂度.

//Given array A of size n and integer k ≤ n,
1. Group the array into n/5 groups of size 5 and find the median of each group. (For
simplicity, we will ignore integrality issues.)
2. Recursively, find the true median of the medians. Call this p.
3. Use p as a pivot to split the array into subarrays Less and Greater.
4. Recurse on the appropriate piece.

全部证明在这里

这里对递归时间复杂度的一个理解是如果二分递归从递归树的角度上, 每一层是N, 一共有LOGN层,时间复杂度为O(nlogn),既mergesort的情况如果只处理一边的话, 既quickselect的情况, 时间复杂度是n. 其上限可以证明是4N.

Lec 2: Concrete Models, Upper/Lower Bounds

这里证明一些常见模型的上下界, bound or tight bound

最后关于比较排序算法最优的时间复杂度为(nlogn)给出一些想法.

由于比较排序的限制性, 所以可以利用SET本身的性质来进行排序, 下面三种方法就是线性的排序算法

计数排序(特性为0-K的数列)
基数排序(特性为数字位数为D, 个数为N的数列)
桶排序 (均匀分布的数列)

这里对计数排序给出的一个线性算法之后会在开一篇博客写

Ch 5: 概率与随机算法

N人排队应聘问题, 调和函数的期望个数是o(lnn)+O(1)

这里用每一个人期望的数目全加起来做总期望

随机排列生成

By sorting (nlogn复杂度/ 这里要用到排序)
in place(用原址排列)线性时间/ 这里用扫描交换

Exercises

5.1-2

问题：

描述RANDOM(a,b)的过程的一种实现，它只调用RANDOM(0,1)。作为a和b的函数，你的程序的期望运行时间是多少？
注：RANDOM(0,1)以等概率输出0或者1，
要求RANDOM(a,b)以等概率输出[a,b]之间的数(整数)

解决方案：

     1，取 n=b-a+1，取最小的正整数m，使得 2^m >= n
     2，调用RANDOM(0,1)，输出m-bit位整数N   (  N >= 0 and N <= 2^m-1)
     3，  if   N >=0  and N <= b-a
                  then return a+N     
            else 重新执行步骤 2

这里用RANDOM(0,1)实现随机RANDOM(A,B) 可以看成二进制数, 然后输出二进制数

5.1-3

这里主要找到生成的某个数几率一样大就行了.

int temp1 = 1;
int temp2 = 1,;
int temp3 = 1;
int returnValue = 0;
where(temp3 == 1)
{
temp1 = BIASED-RANDOW;
temp2 = ~ BIASED-RANDOW; // ~符号是相反的意思，0变成1,1变成0
temp3 = temp1 + temp2;
if(temp3 == 0)
returnValue = 0;
else if(temp3 == 2)
returnValue = 1;
};

5.2-2

(N 2)的意义可以参考.

5.2-4

帽子核对问题, 与应聘问题类似

但是这里是独立的, 每个的期望是1/N, 乘以N, 所以期望是1

5.2-5

由上面可以看出随机生成的期望可以通过每一个个体的期望来全相加来得到全部期望

以此可以来分析一个算法的平均性能

// 但是很多时候分布是为止的, 这就需要随机算法了

对招聘问题随机算法随机枚举多次, 找出期望值