Reinforcement Learning without Backprop: OpenAI Evolution Strategies

Today I'd like to introduce the most controversial piece of research of the past week or so: OpenAI's "Evolution Strategies as a Scalable Alternative to Reinforcement Learning". Just how controversial is it? To start, Dr. Yann LeCun, one of the leading figures in Deep Learning, posted the following criticism on Twitter:
"....Gradient-Based optimization is immensely more efficient than black box optimization.....black box optimization should be a last resort, when there absolutely no way to use Gradient-Based opt. Also, "no need to backprop gradients" is a deficiency, not a feature......this blog post would be unnecessary if people knew the difference between RL and black box optimization“

Other senior Deep Learning researchers such as Dr. Nando de Freitas joined the discussion as well; the full threads can be found in these two tweets (1, 2).

So what exactly are Evolution Strategies (ES)? Readers can refer to this blog post for a more complete introduction and reflection; here I will only give a brief summary of ES (I still highly recommend reading the link above).

  1. The "Evolution" in ES has nothing to do with genetics (evolution, heredity, etc.); the name was simply used in early work and has stuck ever since.
  2. ES, simply put, applies random perturbations to the parameters. Suppose these perturbations generate 100 new parameter sets: we can evaluate the Reward of each set, and the direction of the parameter update is then the Reward-weighted sum of the perturbations.
  3. No BackProp is needed, it can be massively parallelized, and its performance is comparable to traditional BackProp-based training while being fast to train.
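The update described in step 2 can be sketched in a few lines of NumPy. This is a minimal toy illustration, not the parallelized implementation from the paper; the function names, hyperparameters, and the quadratic toy reward are all assumptions made for the example:

```python
import numpy as np

def evolution_strategies(reward_fn, theta, npop=100, sigma=0.1,
                         alpha=0.01, iterations=300):
    """Minimal ES loop: perturb the parameters, score each perturbed
    copy, then step along the reward-weighted sum of the perturbations."""
    for _ in range(iterations):
        # Sample npop random perturbations of the parameter vector.
        noise = np.random.randn(npop, theta.size)
        # Evaluate the reward of each perturbed parameter set.
        rewards = np.array([reward_fn(theta + sigma * n) for n in noise])
        # Standardize rewards so the step size is scale-invariant.
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Update: reward-weighted sum of the noise vectors, no backprop.
        theta = theta + alpha / (npop * sigma) * noise.T @ adv
    return theta

# Toy reward (an assumption for this sketch): negative squared
# distance to a hidden target vector, maximized at the target.
np.random.seed(0)
target = np.array([0.5, -0.3, 0.8])
reward = lambda w: -np.sum((w - target) ** 2)

solution = evolution_strategies(reward, theta=np.zeros(3))
```

In this toy problem the loop steadily drifts `theta` toward `target` without ever computing a gradient, which is the whole point of the method: each perturbation only needs the scalar reward, so the 100 evaluations can run on 100 workers in parallel.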

Still, optimizing by random perturbation when BackProp is available runs counter to many people's intuition: early neural networks only began to learn well after BackProp appeared, so BackProp is usually regarded as the more natural choice for optimizing neural networks. Let me close by quoting the blog author:

"Previously, very good things have come out of researchers questioning fundamental assumptions about our methods: If you think about it, replacing smooth sigmoid activations with non-differentiable rectified linear units sounds like a pretty bad idea - until you actually realise they work. Dropout may sound like something to be avoided, until you realise why it works....So ES may well be one of these things. Salimans et al questioned a fundamental thing that everybody takes for granted: that neural networks are optimised via SGD using backpropagation. Well, maybe not always."

Written by kuanchen
