Programmers often have a preference among program constructs. For example, some may prefer if(0==a), while others may prefer if(!a). Analyzing such patterns can help to narrow down a programmer's identity, which is useful for detecting plagiarism.
Now given some text sampled from someone's program, can you find the person's most commonly used pattern of a specific length?
Input Specification:
Each input file contains one test case. For each case, there is one line consisting of the pattern length N (1≤N≤1048576), followed by one line no less than N and no more than 1048576 characters in length, terminated by a carriage return \n. The entire input is case sensitive.
Output Specification:
For each test case, print in one line the length-N substring that occurs most frequently in the input, followed by a space and the number of times it has occurred in the input. If there are multiple such substrings, print the lexicographically smallest one.
Whitespace characters in the input should be printed as they are. Also note that there may be multiple occurrences of the same substring overlapping each other.
Sample Input 1:
1 2
4 //A can can can a can.
Sample Output 1:
1
can 4
Sample Input 2:
1 2
3 int a=~~~~~~~~~~~~~~~~~~~~~0;
Sample Output 2:
1
~~~ 19
2. 题解
分析
后缀数组实现
这道题可以利用后缀数组解决。求出字符串的后缀数组后,再进一步求出 height 数组。由于 height[i] 表示的是排名为 i 的后缀与排名为 i−1 的后缀的最长公共前缀长度,且后缀是根据字典序进行排名的。对于连续的 cnt 个 height[i]>=n,说明某一长度为 n 的子串在字符串中出现了 cnt+1 次。而反过来想,对于字符串中长度为 n 的子串,以其首字母为开头的后缀,对应的排名一定是连续的,且排名相邻的后缀的最长公共前缀至少为 n 。因此求出最大的 cnt+1 即为最终答案。