Deriving Monad Comprehension Programmatically

List comprehension, 或者说更 general 一点的 monad comprehension, 都是可以在 Lean 里边实现的，借助其强大的元编程功能。

实际上这个东西只是我一个库（LazyList，模拟Haskell惰性求值的表）的副产物，但具体实现的时候还是遇到了一些意想不到的地方，所以写成一篇文章来记录。关于所有的代码，见

lazylist

notch1p • Updated Jul 4, 2025

Introducing Comprehension

Basics

The notation for list comprehension is the operational interpretation of a set-builder notation.

集合在类型论里边通常用 dependent pair/coproduct 来表示。给定 universe , 有 formation rule

把 instantiate 成 , 就得到集合的一个 encoding:

将之记作 , 表示所有以类型的值为元素的集合。当然，如果需要编码一个完整的集合论，同样还应该包括集合论的外延公理 setext ，这个可以从函数外延和命题外延导出，而且涉及到几种不同 flavor （intensional/extensional/observational）的类型论之间处理函数外延的区别，是一个很有意思的东西，但是这里不多做说明。

（HoTT 则一般用 h-set 或者直接 , 但这里不讨论这个）

一句话即集合是类型上的谓语，可以发现这个定义本质上是一个满足的的子集/subtype.

比较一下（ Set Subtype 是 Lean 自带的语法）：


{(x, y) | (x > 2) ∧ y < 2}

Set from mathlib


{xy : ℕ × ℕ // xy.1 > 2 ∧ xy.2 < 2}

Subtype


[(x, y) | x <- [0..], y <- [0..], x > 2 && y < 2]

Haskell 的列表推导式


(loop for x from 0
  collect
    (loop for y from 0
      when (and (> x 2) (< y 2)) 
        nconc (cons x y))
  into ls
  finally (return (remove-if #'null ls)))

不得不品尝我们 cl 抽象的 loop DSL


[(x, y) for x in [0..] for y in [0..] if x > 2 and y < 2]

假设 Python 真的支持惰性求值的话

给某个语言加入推导式，难点有四个：

从上面 type theoretic 的定义可以看出，集合构建式说到底还是一个命题——还不具有 operational semantics. 这要我们去赋予；

此外其通常用于描述无穷可数集合，然而在无穷集合/序列上做计算，对于Haskell之类 lazy 的语言是较为简单的；但此外的绝大多数即 strict langauge 是不好操作的，要求一个惰性的数据结构以及相适应的方法，在这里就是我实现的 LazyList.

有隐含的 pattern matching (比如 pair 的 destructuring)，虽然这里没有发生：只要语言支持 pattern matching 就没有任何问题，否则应该先去做 pattern matching.

区分 sequential / parallel. 即嵌套的 map 和平行 map ；命令式的话就是 for 循环：嵌套其实就是计算笛卡儿积，一般用于 nondeterministic search，平行则可以翻译成若干个 zip 或 zipWith .

Translating by Examples: Sequential

（这之后都用 Lean，我们会实现和 Haskell 类似的推导式）

对于上面这个例子，应该可以立即反应出来该如何翻译（只要知道推导式的语义的话）：


[(x, y) | x <- [0..], y <- [0..], x > 2 && y < 2] :=
  flatten $ filter (· matches some _) $
  [0..].map fun x => 
    [0..].map fun y => 
      if x > 2 && y < 2 then some (x, y) else none

发现有几个细节

虽然在这里 y <- [0..] 中不含 x ，但是 Haskell 的推导式是有 sequential binding 的——也就是说 x <- e1, y <- e2,.. 中 x 在 e2 中有定义， y 在其之后的表达式中有定义。

单纯的嵌套 map 肯定是不行，否则类型就是 List (List (Nat × Nat)) ，需要用 flatten 展平。

x > 2 && y < 2 这个 predicative filtering 可以把 (x, y) 即我们用来 map 的函数的 codomain 给 lift 到 Option 中，构造一个 anamorphism，说人话就是把 Nat -> Nat × Nat 转换成 Nat -> Option (Nat × Nat) .

正是以下 F-coalgebra :

特别地，我们希望直接构造出，最好是能够进行一些系统性的句法（syntactic）上的变换，因为 Lean 是 strict 的，采用复合不合适。
我们希望能把和 filterMap fuse 起来，减少一次遍历
其中 filterMap id = filter (· matches some _)

性能如何？虽然函数式已经没有性能可言，但是能提升的地方当然应该提升，不妨看看遍历性能：

嵌套 map 两次，这个不可能没有
flatten 一次
filter 一次

我们发现，一个简简单单的东西居然要在一个列表上跑 4 遍，这实在是很难说得过去。

（这是因为 Lean 不是 call-by-name/need，没有办法直接复合这几个操作，Haskell 在这里的翻译就直观得多）。

Translation #1

根据上面的分析，我们先设计一个 sequential comprehension 的语法：


comp ::=
  ...
  | [$exp (where $p)? | ($pat <- $exp' in)+]

其中 p 是谓词，和 exp 一样允许 pat 中绑定的任何符号出现。

写成 Lean:


syntax "[" term " | " withoutPosition(sepBy((term (" ← " <|> " <- ") term), " in ")) "]" : term
syntax "[" term " where " term " | " withoutPosition(sepBy((term (" ← " <|> " <- ") term), " in ")) "]" : ter

性能改进

flatMap

我们发现 map 和 flatten 是可以复合起来的，即 flatMap 或 concatMap 或 >>= . 这个东西，和 monad 都是说烂了的玩意儿，没什么好说的。我就特别提一下 LazyList 的实现。首先给一个定义：


inductive LazyList (α : Type u)
  | nil
  | cons : α -> Thunk (LazyList α) -> LazyList α
-- ... (中略)
def appendH : LazyList α -> Thunk (LazyList α) -> LazyList α
  | [||], l => l.get
  | x ::' xs, l => x ::' appendH xs.get l

def append : LazyList α -> LazyList α -> LazyList α
  | nil, l => l
  | x ::' xs, l => x ::' ↑(append xs.get l)

instance : Append (LazyList α) := ⟨append⟩
instance : HAppend (LazyList α) (Thunk $ LazyList α) (LazyList α) := ⟨appendH⟩

-- ... (中略)
def flatMap (f : α -> LazyList β) : LazyList α -> LazyList β
  | nil => nil
  | x ::' xs => (f x) ++ (Thunk.mk $ fun _ => flatMap f xs.get)

可以看到， append appendH 都是 lazy 的。其中 ↑ 表示 coercion, 因为存在一个 α 到 Thunk α 的 Coe instance.

(++) 同时 dispatch 到 homogeneous 的 Append 和 heterogeneous 的 HAppend

(s₁ ++ s₂) 会根据 s₂ 的类型从这俩函数中选取一个。static multiple dispatch 确实是很爽。

那么，很自然的就会去想，依赖这两个方法的方法，是不是也是 lazy 的呢——显然不全是的。

因为绝大多数语言包括 Lean 都是 call-by-value，考虑第二个分支内的 recursive call 的两种情况

f x ++ flatMap f xs.get

f x ++ (Thunk.mk fun _ => flatMap f xs.get)

我们发现虽然不管哪个 (++) 都是 lazy 的，但是 flatMap 自身也是 recursive 且 recursive call 发生在 (++) 的参数位置，根据 CBV 定义，Lean 会先求参数，而不是直接代入，因而在一个无穷序列上计算时仍然会爆，所以情况一肯定会无法终止而爆栈。情况二则相当于是模拟了惰性求值将参数代入 (++) 的一步，只不过这一步被我们放在了 caller 而不是 callee 中： flatMap 的 recursive call 被包在一个函数中，所以肯定不会 eagar eval.

上面是题外话，继续看如何优化。

我们 fuse 了 flatten 和 map ，那么应该放在哪里呢？嵌套的 Function application 都是从内向外展开的，对于一个链

考虑最内层的 map ，其 map 的应该是一个结构 F (..) 其中 (..) 一定不可能包含更多的 F （因为已经在最内层），这说明如果我们需要 F α 且要求，除了最内层的 map 应当保留之外，其他 map 全部应该改写为 flatMap 即

那么这样就减少了一次遍历（实际上是次，考虑个列表）。


[(x, y) where x > 2 && y < 2 | x <- [0..] in y <- [0..]] :=
  filter (· matches some _) $
  [0..].flatMap fun x => 
    [0..].filterMap fun y => 
      if x > 2 && y < 2 then some (x, y) else none

构造 `f'` 并 fuse `filterMap`

上面说过，如果能从AST的角度直接构造 f' 就好了。还是考虑上面那个例子：根据我们的语法，

exp 会被绑定到 (x, y)

出现的两个 binder pat （以后也可能是 pattern）

x 和 y
fun pat => e 已经支持 pattern matching，我们不必更改。

p 会被绑定到 x > 2 && y < 2

而翻译之后的产物是 if x > 2 && y < 2 then some (x, y) else none —— 很显然我们发现了两个 hole if (·) ... then some (·) .

那么完全可以写一个 macro 来方便我们构造：

（这实际上就是一条 denotational semantics 规则）


local macro f:term:50 " depends_on! " p:term:50 : term =>
  `(if $p then Option.some $f else Option.none)

根据上面的优化，我们已经把最内层以外的 map 全部优化成 flatMap 了，接下来想想，能不能优化掉 filter 呢？答案当然是可以的。我们只需要把 map 改成 filterMap 即可，且不管有没有 where 从句，我们统一把所有的 exp 变换成 exp depends_on! p 的形式，也就是说：

e ⇒ `($exp depends_on! Bool.true)

e where p ⇒ `($exp depends_on! $p)

亦即


[(x, y) where x > 2 && y < 2 | x <- [0..] in y <- [0..]] :=
  [0..].flatMap fun x => 
    [0..].filterMap fun y => 
      if x > 2 && y < 2 then some (x, y) else none

Sequential Binding

我们发现，如果使用上面的翻译，就已经是 sequential binding 了， x y 的 binding 已经具备先后顺序，这很好，我们不必多做处理。

Generalizing

因 flatMap = flip (· >>= ·) ，我们完全可以将

所有的 flatMap 替换成 bind

所有的 map 替换成 (· <$> ·)

然后这个语法就供所有 Monad 通用了。

之所以前面不提，是因为我感觉 Lean 的开发者不喜欢把 List 当成 Monad 来推理，诸如 Option 之类的都有 Monad instance，唯独 List 没有（但是 Functor 是有的）。

那么我们也入乡随俗，就不用 Monad ，我们用 Bind （虽然根本没差）。同时因为 Monad 没有 filterMap ，但是我们能够为同时实现了 Monad 和 Alternative

(本质上是因为实现 Alternative 相当于定义了某个 Monad 的 zero element)

去 derive 一个 filterMap . 我们把包含有 map 和 filterMap 的 class 称为 Mappable :



--   Extends `Functor`, with the addition of `filterMap`, for collections especially.
-- - Any `Monad` that has an `Alternative` instance is `Mappable`.
--
class Mappable (F : Type u -> Type v) extends Functor F where

  filterMap (p : α -> Option β) : F α -> F β
  
  -- Usually if there's `filterMap`, 
  -- there must be a `filter` method as well for collections. 
  -- But if it doesn't then we define `filter` in terms of `filterMap`.
  -- However a direct implementation has, very likely, better performance.
  filter (p : α -> Bool) : F α -> F α := filterMap (fun x => if p x then some x else none)
  
  map f := filterMap (some ∘ f)

(这里虽然借助了 filterMap 来实现 Functor.map 但是从来不用，我们还是用原生实现的 Functor .)

Mappable 延伸了（我们特意不用继承这个字眼以便和 java 等语言的 extends 区分开来） Functor .

只关注 filterMap 这一个方法： Array List LazyList 都已经直接定义了该方法，现在要为所有的 m s.t. Monad m Alternative m 导出一个 Mappable ，这实际上也简单：


@[default_instance] 
instance (priority := low) [Alternative m] [Monad m] : Mappable m where
  filterMap f xs := xs >>= fun x => 
    if let some x := f x then pure x else failure

有人可能会说，你这还需要一个 Alternative ，是不是依赖太强了？

这个依赖是必要的，据我所知， Haskell，至少是 ghc 的翻译，用的是

guard p *> e 或者 e <$ guard p
无论哪个，用到 guard 就需要 Alternative .
上面也说过， Alternative 的两个 fields

failure 体现的是 zero element
tryCatch 体现的是这个 zero element 作为吸收元（absorbing element）的一个证明
所以无论有无实现 Alternative ，既然我们需要处理 where 从句，实际上就已经需要 zero element 了，实质上已经体现了 Alternative .

ghc 的翻译是很干净的，为什么我们做不到呢——当然是因为性能问题和 eager eval.

我觉得唯一的槽点就是 API 乱，因为我们下面实际的翻译，又用到 Functor 又用到 Mappable 还有 Bind . 确实不如 Haskell 统一用 Monad 来得舒服。这里大概率还有改进空间，不过我觉得不重要就没有管了。

Denotational semantics #1

既然已经优化好了，也定义好接口了，那就来写写 denotational semantics:

我们用表示作为推导式的 desugaring，那么有

⚠️

特指最左侧的表达式，且中不再嵌套推导式

where E ::= (pat <- e' in E) .

Implement #1

数学总是看起来很美好的，但实现起来，特别是需要优化，就显得丑陋了。

我们发现还有优化可做：

不存在 predicate 的情况（即没有 where 从句）的情况下，没有必要去构造 f' 并 filterMap ，直接 map 即可——省几个 stack frames 嘛——因此我们分开处理两种情况；

我们只有两条完整的语法，所以 parse 之后会得到 i, l 两个数组，分别表示 pat 和 e' ；

当然我们也可以多加几条语法，然后直接在 macro_rules 上递归（即前面的 denotational semantics），不过性能恐怕不如直接 fold 数组来构造AST来得好。


macro_rules
	...  
  | `([ $f | $[$i <- $l]in* ]) => do
    if h : i.size >= 1 ∧ l.size >= 1 then
      let (iz, lz) := (i[i.size - 1], l[l.size - 1])
      let li := (l.zip i)[:l.size - 1]
      show MacroM Term from
      ``(Functor.map (fun $iz => $f) $lz) >>= li.foldrM fun (t, fb) a =>
        ``(Bind.bind $t (fun $fb => $a))
    else unreachable!
  | `([ $f | $[$i ← $l]in* ]) => ``([ $f | $[$i <- $l]in* ])
  | `([ $f where $p | $[$i <- $l]in* ]) => do
    let mf <- `($f depends_on! $p)
    let (iz, lz) := (i[i.size - 1]!, l[l.size - 1]!)
    let li := (l.zip i)[:l.size - 1]
    show MacroM Term from
    ``(Mappable.filterMap (fun $iz => $mf) $lz) >>=
      li.foldrM fun (t, fb) a => ``(Bind.bind $t (fun $fb => $a))
  | `([ $f where $p | $[$i ← $l]in* ]) => ``([ $f where $p | $[$i <- $l]in* ]

现在比如说我们求 n 以内的勾股数，就很简单：


def pyth (n : Nat) : LazyList (Nat × Nat × Nat) :=
  [(x, y, z) where x ^ 2 + y ^ 2 == z ^ 2
    | x <- [|1 ~~ n + 1|]
   in y <- [|x ~~ n + 1|]
   in z <- [|y ~~ n + 1|]]

其中 [||] 是我用来表示 LazyList literal 的语法，还加了点糖：


gen ::= [|num ~~ (num)? (: num)? |]
      | [|num to (num)? (by num)?|]

这些语法都会被翻译到 LazyList.gen start stop step ，产生一个左闭右开步数为 step 的无穷 / 有穷整数列.

小缺陷

一般来说， fun x => fun y => fun z => .. 应该是和 fun x y z => .. definitionally equal 的，不仅如此，前者在 WHNF 化之后，从 metalanguage 的角度来说，应该更是直接相等的—— rfl 能够直接证明两者相等印证了这点：


example : 
  (λ x => λ y => λ z => (x, y, z) : α -> β -> γ -> (α × β × γ)) 
  = 
  λ x y z => (x, y, z) := rfl

但是在 universe level inference 上 Lean 并非这样处理，特别是和 dependent product 相关，即便 result type 是 concrete 的时候：


(fun (x, _) (_, y) => (x, y)) <$> [(1,2), (3,4)] <*> [(5,6),(7,8)]
-- [(1, 6), (1, 8), (3, 6), (3, 8)]


(fun (x, _) => fun (_, y) => (x, y)) <$> [(1,2), (3,4)] <*> [(5,6),(7,8)]

-- stuck at solving universe constraint
--   max (?u.400287+1) (?u.400288+1) =?= max (?u.400212+1) (?u.400213+1)
-- while trying to unify
--   Prod.{?u.400213, ?u.400212} ?m.400220 ?m.400230 : Type (max ?u.400213 ?u.400212)
-- with
--   Prod.{?u.400213, ?u.400212} ?m.400220 ?m.400230 : Type (max ?u.400213 ?u.400212)

这里没有细究，能说明的是涉及到 (· × ·) 类型的 pattern matching 时，后者会加 explicit motive: 可能是这个导致的问题。

此外，Lean 的积类型的两个参数是 universe-polymorphic 的，恐怕这也造成了一些影响，考虑单参数 universe-polymorphic 的情况：


structure NProd (α β : Type u) where
  mkNp ::
  fst : α
  snd : β
  
open NProd in 
#eval 
  (fun {fst,..} => fun {snd,..} => (fst, snd)) 
 <$> [mkNp 1 2, mkNp 3 4] 
 <*> [mkNp 5 6, mkNp 7 8]

就能够正常推断。总的来说是一个很 obscure 的东西。

这个问题在我们推导式这里尤其明显：上面的例子其实是推导式的另一种翻译，用 Seq 而不是 Bind . 同时上面的式子等价于


[(x, y) | (x, _) <- [(1, 2), (3, 4)] in (_, y) <- [(5, 6), (7, 8)]]
= [(1, 2), (3, 4)] >>= fun (x, _) => 
    [(5, 6), (7, 8)]] >>= fun (_, y) =>
      (x, y)

可以看到，我们这个翻译也不是 WHNF，同样也会遇到上述问题。

不过这个问题很好解决，显式添加类型标注 (type ascription) 即可。 show List (Nat × Nat) from ...

实际使用，toplevel 的定义本就最好要写类型；local 有 local bidirectional type inference, 真正会报错的情况还是少数。

如果就是不想写类型标注，那只能考虑用 Seq 的翻译了，就像上面这个例子的第一种情况那样。下面介绍这一种翻译。

Translation #2

首先需要明确一个很基本的东西：

对任意的有

一般习惯性的理解成：


do 
  let x₁' <- x₁
  let x₂' <- x₂
  ...
  let xₙ' <- xₙ
  pure (f x₁' x₂' ... xₙ')

即把 lift 到一个 Applicative 中，再 apply 到各个参数上。但是这个东西其实不是特别准确，在语义上。我认为最准确的叫法是 sequential application. 也就是说按顺序一个个 apply. 这很重要：

告诉我们把中的第个元素全部转换成，由于其需要多个参数，相当于 partial application, 因此相当于产生一个全是函数的列表，记作，

接着看，两边都是列表，一边是一列函数，另一边是一列参数。

固定一个，对，其结果为一个列表，往复这个过程并把所有的列表连接起来，就是结果。

有标准定义 f <*> x = f >>= (· <$> x)

从语义上来讲，特别是在列表上，相当于爆搜。因而我们也可以将推导式翻译成类似的形式，而且异常简单。

仍然先给出一个语法：


comp ::=
  ...
  | [$exp (where $p)? | ($pat <- $exp',)+]

写成 Lean


syntax "[" term " | " withoutPosition((term (" ← " <|> " <- ") term),+) "]" : term
syntax "[" term " where"  term " | " withoutPosition((term (" ← " <|> " <- ") term),+) "]" : term

有 semantics (注意 <&> = flip <$> )

where E ::= (pat <- e', E) .

对于没有 where 从句的仍分开处理，直接用 <$> :


macro_rules
	...
  | `([ $f | $[$i <- $l],* ]) => do
    if h : l.size >= 1 then
      let is <- ``(fun $i* => $f)
      let fn <- ``(Functor.map $is)
      let l0 := l[0]
      show MacroM Term from
      ``($fn $l0) >>= l.foldlM (start := 1) fun a s =>
        ``(Seq.seq $a (fun _ : Unit => $s))
    else unreachable!
  | `([ $f | $[$i ← $l],* ]) => ``([ $f | $[$i <- $l],* ])
  | `([ $f where $p | $[$i <- $l],* ]) => do
    let mf <- ``($f depends_on! $p)
    ``(Mappable.filterMap id [ $mf | $[$i <- $l]in* ])
  | `([ $f where $p | $[$i ← $l],* ]) => do
    ``([ $f where $p | $[$i <- $l],* ])

我们前面说过， <$> 即 map 是可以和 filterMap 合并的，这里也如此吗？显然不是的，因为求值顺序不一样，具体来说， <$> 是左结合的。

我们这里是一个 left fold, <$> 最先求值；

和前面的 right fold, map 最后求值（在最内层）不一样。

所以此处不能合并，即这种翻译在 predicative filtering 的情况下必须要多遍历一遍。

这种方法主要是直观，也是我第一个想到的翻译，反而不是前面那种。但是其参考作用大于实际作用：

性能上并不好

而且显然不支持 sequential binding

但不存在上面说过的小缺陷：

注意 ``(fun $i* => $f)

其中 $i* 是 antiquote 的 splice 语法，可以看作是把 (i : Array (TSyntax `funBinder) 在 fun ... 处直接展平，这样就避免了 fun ... => fun ... => 手动 curry 导致的 universe level inference 问题。但我估计这东西大概率没有健壮性可言，稍微复杂一点可能仍会出现上述缺陷。

Translating by Examples: Parallel

个人感觉上面的 sequential 这种爆搜恐怕并不很常用，很常用的或许是 parallel comprehension，即


zipWith f xs ys = [| f.body | x <- xs | y <- ys |]

那么对于两个以上的列表怎么办呢——有一种 naive thought:

zip = zipWith (· , ·) ，记为 . right associative, binds tighter than . (因为 (· × ·) 就是右结合的)

考虑 zipWithN ，其工作在个列表上，那么

我们用表示 uncurried version of .

定义很美妙，跑起来像屎一样。

因为是平行的，理想情况下的时间复杂度应当是

很明显，对个长度为的列表来说复杂度是，同时还会存在一个超大的 intermediate data structure. 虽然后者可以通过 LazyList 缓解，但是就单纯的时间复杂度而言已经十分不理想。

土办法

上面所说的问题，在任何一个把 parallel comprehension, parallel for 解糖到 zip zipWith 的语言中都会发生。对于没有元编程，且又不想提供原生支持自动生成 zipN 的语言来说，最好也最简单的办法就是枚举。

还记得我原来工作过的一个语言，其 for...in 用 continuation-based iterator 实现，parallel for 则是用 iterator 的 zipWith() 方法。该语言并没有元编程，同样也没有在编译器内做 zipWithN （在我工作的时期，现在似乎也没有），因而采取的办法就是把 zipWith...zipWith15 全部枚举一遍，超出部分用仍用上面的 nested zip .

但我们仍要尽力减少这种 boilerplate 的东西，先考虑 zip zipWith 的关系：

一般来说 zip 都是直接用 zipWith 实现的，即 zip = zipWith (· , ·) . 考虑 zipN ，显然我们有

翻译成 Lean 的 macro prod_of! 就是


macro:max "prod_of!" n:num : term => 
  match n.getNat with
  | 0 => ``(())
  | 1 => ``(id)
  | n + 2 =>
    ``((· , ·)) >>= n.foldM fun _ _ a =>
      ``((· , $a))

💡

prod_of! binds tighter than function application. Prod nests to the right.

这样，我们就能够专心去实现 zipWithN 了。

那先给一个 Zippable 的定义，generalizing zip .


--  Similar to Haskell's `MonadZip`, but for collections especially.
--  But relaxes on the constraint that `m` must be a `Monad` first.
--  - Any `Monad` is `Zippable`.

class Zippable (m : Type u -> Type v) where

  zipWith  (f : α -> β -> γ) : m α -> m β -> m γ
  zipWith3 (f : α -> β -> γ -> δ) : m α -> m β -> m γ -> m δ
  zipWith4 (f : α -> β -> γ -> δ -> ζ) : m α -> m β -> m γ -> m δ -> m ζ  
  zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : m α -> m β -> m γ -> m δ -> m ζ -> m η
  zipWith6 (f : α -> β -> γ -> δ -> ζ -> η -> θ) : m α -> m β -> m γ -> m δ -> m ζ -> m η -> m θ
  zipWith7 (f : α -> β -> γ -> δ -> ζ -> η -> θ -> ι) : m α -> m β -> m γ -> m δ -> m ζ -> m η -> m θ -> m ι
  
  zip  : m α -> m β -> m (α × β) := zipWith prod_of! 2
  zip3 : m α -> m β -> m γ -> m (α × β × γ) := zipWith3 prod_of! 3
  zip4 : m α -> m β -> m γ -> m δ -> m (α × β × γ × δ) := zipWith4 prod_of! 4
  zip5 : m α -> m β -> m γ -> m δ -> m ε -> m (α × β × γ × δ × ε) := zipWith5 prod_of! 5
  zip6 : m α -> m β -> m γ -> m δ -> m ε -> m ζ -> m (α × β × γ × δ × ε × ζ) := zipWith6 prod_of! 6
  zip7 : m α -> m β -> m γ -> m δ -> m ε -> m ζ -> m η -> m (α × β × γ × δ × ε × ζ × η) := zipWith7 prod_of! 7

很多人看来是屎山，但确是很好用的办法

Generalizing 土办法

让我们先跳出 List 这个框架，想一下针对所有的 Monad 该怎么处理：

我们上面说过， <*> 主要是用来爆搜，但是考虑一下非线性/递归结构的 Monad 呢？实际上，就不存在这个问题，没有差别。也就是说对于任何非 Array List LazyList 的 Monad ，在语义上有

<*> = <+> .

所以我们可以实现一个通用的 Zippable instance:


instance (priority := low) [Monad m] : Zippable m where
  zipWith  := (· <$> · <*> ·)
  zipWith3 := (· <$> · <*> · <*> ·)
  zipWith4 := (· <$> · <*> · <*> · <*> ·)
  zipWith5 := (· <$> · <*> · <*> · <*> · <*> ·)
  zipWith6 := (· <$> · <*> · <*> · <*> · <*> · <*> ·)
  zipWith7 := (· <$> · <*> · <*> · <*> · <*> · <*> · <*> ·)

接着，我们再单独为 Array List LazyList 实现 Zippable , 根据 Lean 的 instance search algorithm, 这几个 instance 的优先级会比通用的高。这也可以通过 #synth 来验证。

这样，我们在确保所有 Monad 都能使用的情况下（通用性），也满足序列结构的特殊 zip 实现的性能要求。

优化先于通用

其实通篇看下来会发现我们特意不去统一使用Monad 这个API，而且实现几乎都是从序列结构 List 等视角叙述的，这有几个原因：

第一是 Lean 的求值语义和 Haskell 不同，上面也强调过很多遍了；

第二是这些序列结构确实是有特殊的实现，没有必要为了统一而统一，失去本可以避免失去的性能优势；

第三是大家都爱 generalizing，monad comprehension 确实是很美妙的统一，但是就我所知很少有人真的用 comprehension 的语法去写 monadic code, 绝大多数都是用 do ; 而 comprehension 则仍然主要用于序列结构。

那么很显然，我们的 comprehension 就一定要为序列结构而优化，generalization 都是次要的。

而且实际上我们发现，在做足了必要的优化下也没有损失任何一点 generality，使用的 API 虽然较乱，但对用户来说是不可见的；

不仅如此，涉及到的几个 typeclass 本就有很强的关联性，实现起来并不费什么功夫。

`Zippable List`

很简单，我们特意写 tailrec 版本的，用数组存中间数据，这里给出两个例子：


def zipWith4 (f : α -> β -> γ -> δ -> ζ) : List α -> List β -> List γ -> List δ -> List ζ := go #[] where
  go acc
  | a :: as, b :: bs, c :: cs, d :: ds => go (acc.push $ f a b c d) as bs cs ds
  | _, _, _, _ => acc.toList
def zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : List α -> List β -> List γ -> List δ -> List ζ -> List η := go #[] where
  go acc
  | a :: as, b :: bs, c :: cs, d :: ds, e :: es => go (acc.push $ f a b c d e) as bs cs ds es
  | _, _, _, _, _ => acc.toList

`Zippable LazyList`

同理，但是肯定没法写 tailrec 的，因为要保证 lazy property


def zipWith4 (f : α -> β -> γ -> δ -> ζ) : LazyList α -> LazyList β -> LazyList γ -> LazyList δ -> LazyList ζ
  | a ::' as, b ::' bs, c ::' cs, d ::' ds => f a b c d ::' zipWith4 f as.get bs.get cs.get ds.get
  | _, _, _, _ => [||]
def zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : LazyList α -> LazyList β -> LazyList γ -> LazyList δ -> LazyList ζ -> LazyList η
  | a ::' as, b ::' bs, c ::' cs, d ::' ds, e ::' es => f a b c d e ::' zipWith5 f as.get bs.get cs.get ds.get es.get

很多人会发现，其实上面这些 inductive 的数据结构的 zipWithN 都十分有规律，完全可以写一个宏来程序化的生成 zipWithN ，但其实有一些小问题：

到底该生成在哪呢，是根据用户的推导式代码来在本地生成一个；

如此则每次使用推导式都要生成一个一模一样的 local definition, 极大增加编译体积

还是再写一个 command，用户声明其需要使用多长的 zipWith ，再根据这些信息利用 elaborator 在 toplevel 生成呢？

看起来也并不方便，对用户有要求。

最后结论就是不做这种宏，不方便做，也懒得做。

`Zippable Array`

我这里想特别谈谈 Array 的实现。 Array 在 prelude 中是由 List 实现的，但这不过是一个 reference implementation, 主要写给 typechecker/kernel 看 (因为 primitive 就没法 reason 了，不够 self-contained，一定要构造出来). 那么显然这个性能是完全说不过去的。

在运行时 Lean 会用一个真正的， primitive的，适合编程的 Array 覆盖之。

所以在实现上，我们需要利用 Array 的这一点优势，通过 index 去访问它，就像命令式语言一样。

但是这里就涉及到一个问题，对某个 arr : Array α，Lean 有三种下标访问：

arr[i] 需要证明 i < arr.size，但是最快，没有任何下面提到的 overhead.

arr[i]? 产生一个 Option α , 但如此，对 n 个数组而言我们就要做次运行时越界检查,

同时 constructing/destructuring Option 又要做同样多次，i.e.


match as[i]?, bs[i]? .. with
| some a, some b, .. =>
| _, _, .. =>

这么多 overhead 在性能上不可接受。

arr[i]! 相当于命令式语言的下标访问，若越界运行时会 panic ，但是 Lean 毕竟是一个 theorem prover, 讲求 type soundness, 即便 panic ，我们也需要证明 Inhabited α ；同时这里的 result type 就是 α , 我们也不能通过 Nonempty α 并用选择公理 nonconstructively 直接选一个元素出来，否则整个 zipWithN 就是 noncomputable 的；

Inhabited 的约束太强，虽然实际编程用不到非 Inhabited 的类型，而且 Lean 也有自动的 deriving handler, 但证明工作者应当有些自尊。
而且实际上 arr[i] 对应 FFI 的 lean_array_fget ； arr[i]! 对应 lean_array_get ，后者还是有 runtime checking 的（否则怎么知道要 panic 呢）

那么，留给我们的只有一种选择，即是用 canonical 的 arr[i] 来访问。

这就涉及到我最爱的证明环节了。

证明访问一定不越界

先考虑一个简单的情况 zipWith ，下面是 Lean 官方的实现：


@[semireducible, specialize]
def zipWithAux (as : Array α) (bs : Array β) (f : α → β → γ) (i : Nat) (cs : Array γ) : Array γ :=
  if h : i < as.size then
    let a := as[i]
    if h : i < bs.size then
      let b := bs[i]
      zipWithAux as bs f (i+1) <| cs.push <| f a b
    else
      cs
  else
    cs
decreasing_by simp_wf; decreasing_trivial_pre_omega

要我说，第一是写的不清楚，第二是甚至还用了良基递归 (WF recursion)，有点太 cumbersome.

同时仍然做了越界检查，自然数上的小于关系 (· < ·) 是 decidable 的，这两个 if 不会擦除，也不能擦除。

而且还不具有通用性，考虑个情况，就是个比较，这么深的 nested ite 对 branching prediction 也不友好，属于是控制流地狱。

而实际上，我们完全可以，求出个数组中最小的那个的长度，并证明这个长度小于等于任何其他的长度，i.e.

其中 minOf 的实现可以取巧， foldl 不如 foldr 也就是传统的递归好 reason,

但是非尾递归容易爆栈，我们可以用 implemented_by 来同时保证 reasoning 的简单性和运行健壮性：


def minOfI [Min α] : (l : List α) -> l ≠ [] -> α
  | x :: xs, _ => xs.foldl min x

@[implemented_by minOfI] 
def minOf [Min α] : (l : List α) -> l ≠ [] -> α
  | [x], _ => x
  | x :: xs, _ => 
    min x $ minOf xs (List.cons_ne_nil _ _)

discriminant refinement in action

可以发现，调用 minOf 本身还须证明列表非空。但实际上我们一定不会把 minOf apply 到空列表上，

对于非空列表，有一个通用的定理（很显然）

我们再写一个宏 nonempty! 方便使用：


local macro "nonempty!" : term => 
  ``(List.cons_ne_nil _ _)

严格说来，还要证明 minOfI = minOf 。但我们直接~~易证/显然~~

并在这个长度上（自然数上）做 folding 即可。听起来有点抽象，不妨举个例子：

考虑


def zipWith3 (f : α -> β -> γ -> δ) 
  : Array α -> Array β -> Array γ -> Array δ
  | as, bs, cs => 
	  let m := minOf [as.size, bs.size, cs.size] nonempty!
	  m.fold (init := #[]) 
	    fun i h a => 
	      a.push (f as[i] bs[i] cs[i])
	  --            ^^^^^ ^^^^^ ^^^^^

m.fold 会同时传一个定理给后边的 lambda.

根据

我们能证明 i 对数组的访问是合法的。现在已经有

只需证证明 . 不妨先展开：

得到三个 goal. Recall, (花括号表示 Lean 可以自己推断出, 不必手动填写)

we then discharge these goals with

证明这类有规律的命题，通常的思路是在 LHS 从内由外通过自然数小于等于号的传递性直接构造出目标不等式，不是非常好想，特别是程序性的推理。

但是注意力比较好的人，或者 constructive intuition 较好的人，可以从构造主义的角度来想这个问题。我注意力不好，所以就多写了两个，但最终也能发现同样的 intuition：

考虑

我们发现一个很重要的事实：

💡

min 链和我们的列表，是 isomorphic 的！

回想一下，在 Lisp 中，我们是如何通过 accessor 在一个列表上递归的：

car = head

cdr = tail

min 链终究不是数据结构，但问题是我们现在是需要证明左边这个 min 链小于右边的一个东西，把所有的 min 全部换成 cons 我们就有

仅看这个式子没有任何道理，但是有助于我们思考：

来看 min_le_left min_le_right ：像不像 car 和 cdr ——就像用 car cdr 去 destructure 一个列表一样，

我们也可以通过 min_le_left min_le_right (分别记作 car cdr)，中间由 le_trans (记作 <> , 右结合) 串接 (chain)，在左边做 structural induction 来构造出不等号右边的式子，即

当的时候，即

把这些东西用宏全部自动实现了就有：


open Lean

private abbrev cdr {a b} := Nat.min_le_right a b
private abbrev car {a b} := Nat.min_le_left  a b
local infixr:60 " <> " => Nat.le_trans
local macro "nonempty!" : term => ``(List.cons_ne_nil _ _)
local macro "car" : tactic => `(tactic|exact car)
local macro "cadr" : tactic => `(tactic|exact cdr <> car)
local macro "caddr" : tactic => `(tactic|exact cdr <> cdr <> car)
local macro "cadddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> car)
local macro "caddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> car)
local macro "cadddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr <> car)
local macro "caddddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr <> cdr <> car)
local macro "cadddddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr <> cdr <> cdr <> car)

local macro "cdr" : tactic => `(tactic|exact cdr)
local macro "cddr" : tactic => `(tactic|exact cdr <> cdr)
local macro "cdddr" : tactic => `(tactic|exact cdr <> cdr <> cdr)
local macro "cddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr)
local macro "cdddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr)
local macro "cddddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr <> cdr)
local macro "cdddddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr <> cdr <> cdr)
local macro "cddddddddr" : tactic => `(tactic|exact cdr <> cdr <> cdr <> cdr <> cdr <> cdr <> cdr <> cdr)

private abbrev accessors : Array $ MacroM Syntax.Tactic :=
  #[ `(tactic|car), `(tactic|cadr), `(tactic|caddr), `(tactic|cadddr)
   , `(tactic|caddddr), `(tactic|cadddddr), `(tactic|caddddddr), `(tactic|cadddddddr)]
private abbrev accessors' : Array $ MacroM Syntax.Tactic :=
  #[ `(tactic|cdr), `(tactic|cddr), `(tactic|cdddr), `(tactic|cddddr)
   , `(tactic|cdddddr), `(tactic|cddddddr), `(tactic|cdddddddr), `(tactic|cddddddddr)]
@[simp]private theorem accsize  : accessors.size  = 8 := by decide
@[simp]private theorem accsize' : accessors'.size = 8 := by decide

private def pfcmds (s : Ident) (pf : Syntax.Tactic) (cont : Term) : MacroM Term :=
  `(have : m <= Array.size $s := by simp[m, minOf]; $pf -- mandatory linebreak
    $cont)
local syntax "proof_and_fold!" ident ident+ : term
macro_rules
  | `(proof_and_fold! $f $[$t]*) => do
      let ts := t.size
      if H : ts <= 8 ∧ ts >= 2 then
        let funapp <- t.foldlM (init := f) fun a s => `($a ($s[i]))
        let cont <- `(Nat.fold m (init := #[]) fun i h a => Array.push a $(funapp))
        let blk <- ts.foldRevM (init := cont) fun i h a => do
          have : i < accessors.size := Nat.lt_of_lt_of_le h $ accsize ▸ H.1
          if i = ts - 1 then
            have : i - 1 < accessors'.size := Nat.sub_lt_of_lt $ accsize' ▸ (accsize ▸ this)
            pfcmds t[i] (<-accessors'[i-1]) a
          else
            pfcmds t[i] (<- accessors[i]) a
        `(let m := minOf [$[Array.size $t],*] nonempty!;
          $blk)
      else Macro.throwError "handler not implemented, see source."

def zipWith3 (f : α -> β -> γ -> δ) : Array α -> Array β -> Array γ -> Array δ
  | as, bs, cs => proof_and_fold! f as bs cs
def zipWith4 (f : α -> β -> γ -> δ -> ζ) : Array α -> Array β -> Array γ -> Array δ -> Array ζ
  | as, bs, cs, ds => -- what `proof_and_fold!` should be expanded to
    let m := minOf [as.size, bs.size, cs.size, ds.size] nonempty!
    have : m <= as.size := by simp[m, minOf]; car
    have : m <= bs.size := by simp[m, minOf]; cadr
    have : m <= cs.size := by simp[m, minOf]; caddr
    have : m <= ds.size := by simp[m, minOf]; cdddr
    m.fold (init := #[]) fun i h a => a.push (f as[i] bs[i] cs[i] ds[i])
def zipWith5 (f : α -> β -> γ -> δ -> ζ -> η) : Array α -> Array β -> Array γ -> Array δ -> Array ζ -> Array η
  | as, bs, cs, ds, es => proof_and_fold! f as bs cs ds es
def zipWith6 (f : α -> β -> γ -> δ -> ζ -> η -> θ) : Array α -> Array β -> Array γ -> Array δ -> Array ζ -> Array η -> Array θ
  | as, bs, cs, ds, es, fs => proof_and_fold! f as bs cs ds es fs
def zipWith7 (f : α -> β -> γ -> δ -> ζ -> η -> θ -> ι) : Array α -> Array β -> Array γ -> Array δ -> Array ζ -> Array η -> Array θ -> Array ι
  | as, bs, cs, ds, es, fs, gs => proof_and_fold! f as bs cs ds es fs gs

let have 都是 local binding, 之后必须要跟一个表达式，但这个表达式应该在最后，所以我们在 pfcmd 中传入一个语义上的 continuation.

这里有一点比较屎山，就是 accessor 没有用 macro 生成，而是直接枚举出来的。但反正数量不多，也不影响。

综合起来

先给语法：


comp ::=
  ...
  | [$exp (where $p)? | ($pat <- $exp' |)+]

因为实际实现和 denotational semantics 差得比较多，我懒得写了，直接放代码吧：


macro_rules
	...  
  | `([ $f | $[$i <- $l]|* ]) => do
    match h : l.size with
    | 0 => unreachable!
    | 1 => ``(Functor.map       (fun $i* => $f) $(l[0]))
    | 2 => ``(Zippable.zipWith  (fun $i* => $f) $(l[0]) $(l[1]))
--  ... (中略)
    | 7 => ``(Zippable.zipWith7 (fun $i* => $f) $(l[0]) $(l[1]) $(l[2]) $(l[3]) $(l[4]) $(l[5]) $(l[6]))
    | _ + 8 =>
      let i := i.map fun i => ⟨i.raw.setKind `term⟩
      let ls := l.size
      let hd <- ``(Zippable.zip7 $(l[ls - 7]) $(l[ls - 6]) $(l[ls - 5]) $(l[ls - 4]) $(l[ls - 3]) $(l[ls - 2]) $(l[ls - 1]))
      let lzip <- l[:ls - 7].foldrM (init := hd) fun s a => ``(Zippable.zip $s $a)
      let args <- i.reverse.foldl1M fun a s => ``(($s, $a))
      ``(Functor.map (fun $args => $f) $lzip)
  | `([ $f | $[$i ← $l]|* ]) => ``([ $f | $[$i <- $l]|* ])
  | `([ $f where $p | $[$i <- $l]|* ]) => do
    match h : l.size with
    | 0 => unreachable!
    | 1 => ``(                      (Functor.filterMap (fun $i* => $f depends_on! $p) $(l[0])))
    | 2 => ``(Mappable.filterMap id (Zippable.zipWith  (fun $i* => $f depends_on! $p) $(l[0]) $(l[1])))
--  ... (中略)
    | 7 => ``(Mappable.filterMap id (Zippable.zipWith7 (fun $i* => $f depends_on! $p) $(l[0]) $(l[1]) $(l[2]) $(l[3]) $(l[4]) $(l[5]) $(l[6])))
    | _ + 8 =>
      let i := i.map fun i => ⟨i.raw.setKind `term⟩
      let ls := l.size
      let hd <- ``(Zippable.zip7 $(l[ls - 7]) $(l[ls - 6]) $(l[ls - 5]) $(l[ls - 4]) $(l[ls - 3]) $(l[ls - 2]) $(l[ls - 1]))
      let lzip <- l[:ls - 7].foldrM (init := hd) fun s a => ``(Zippable.zip $s $a)
      let args <- i.reverse.foldl1M fun a s => ``(($s, $a))
      let mf <- `(fun $args => $f depends_on! $p)
      ``(Mappable.filterMap $mf $lzip)
  | `([ $f where $p | $[$i ← $l]|* ]) => ``([ $f where $p | $[$i <- $l]|* ])

这样，我们就完成了所有的翻译。全部代码请参照文章头部的链接。

What’s left

parallel 版本的 comprehension 蕴含了一个很强的约束：homogeneous. 要求 F 应该是同一个类型。但实际上 heterogeneous 的版本就很不好做。虽然前面一直一起拿来说，但现在要区分

parallel for

parallel comprehension

区别就在于它们的 result type.

for 语句中 100% 存在 effects, 而且他从来不返回有用的值（Unit），在一个 monadic context m 中，我们能够轻易确定返回类型 m PUnit ；所以针对 for 是可以做的。（实际上 Lean 也有一个 Stream ，正是用来实现 hetero parallel for 的）。

但是推导式是一定有值的，而且它通常在一个 pure context 中使用。

hetero 的版本第一是对静态类型系统非常之不友好（cf. python 的 zip 就是 lazy 且 hetero 的），

第二是考虑有值的情况，[(x, y, z) | x <- mx, y <- ny, z <- fz] , 那么这里有三个 functor m n f , 最终的3-元组到底应该是在哪个 functor 中呢？

要同时在一个 monadic context 中处理 x y z 是不是又要好多个 MonadLift MonadControl instance 呢？

而且这个 monadic context 又该是什么呢？

不解决这些问题，就没有办法实现。而我才疏学浅也没有想到什么好办法。

有人说你 Haskell 不是可以 [(x, y) | x <- [1,2,3], y <- "abc"] 吗，一个是 [Int] ; 一个是 String.

那是当然，因为 Haskell 里边 String = [Char] .

不妨开 -XOverloadedStrings 再试试：


[(x, y) | x <- [1,2,3], y <- ("123" :: Text)]

要是没报错可能下载到了动态类型版本的 Haskell.

所以你猜性能有多垃圾

结尾

我们完成了三种 Comprehension 的翻译，两种 sequential 的，一种 parallel 的：


comp ::=
	| [$exp (where $p)? | ($pat <- $exp' |)+]
	| [$exp (where $p)? | ($pat <- $exp' in)+]
	| [$exp (where $p)? | ($pat <- $exp',)+]

这些语法糖在写的舒爽的同时保证了性能，应该说还是 production ready 的。

Addendum: Algebraic Intuition

Indeed isomorphic

It is very clear that foldr is a catamorphism, intuitively minOf swaps out the (::) in the algebraic structure [...] for min.

let be the endofunctor .

The initial algebra where

By definition, in the category of F-algebra, there is a unique homomorphism such that

any other must admit:

In our case . That is,

We showed that min chain is indeed, isomorphic.

`car` and `cdr`

Additionally, we would like to observe how min_le_left min_le_right works.

let be the category of ordered lists, let be the posetal category (like poset but category) of the elements of the aforementioned lists, We obtain the adjunction

and

This express the fact that (unfolding the definition of and ):

Recall the counit-unit definition of adjunction:

maps an element to the minimum of the singleton that is constructed from itself;

relates the minimum of a list to each element of itself.

This is exactly how min_le_left min_le_right work in a binary min , extends to n -arity through .

Proof diagram

To further clarify things, we show the commutative diagram-like proof diagram:

There’s one crucial fact to remember, that is, the composition of arrows is not , but rather, Nat.le_trans or .

This concludes our proof.