深度学习路线
目录
1. A Neural Algorithm of Artistic Style [中英文摘要]
2. A Neural Conversational Model [中英文摘要]
3. A character-level decoder without explicit segmentation for neural machine translation [中英文摘要]
4. A learned representation for artistic style [中英文摘要]
5. Actor-mimic deep multitask and transfer reinforcement learning [中英文摘要]
6. Adam: A method for stochastic optimization [中英文摘要]
7. Addressing the rare word problem in neural machine translation [中英文摘要]
8. Ask me anything: Dynamic memory networks for natural language processing [中英文摘要]
9. Asynchronous methods for deep reinforcement learning [中英文摘要]
10. Auto-encoding variational bayes [中英文摘要]
11. Baby talk: Understanding and generating simple image descriptions [中英文摘要]
12. Bag of tricks for efficient text classification [中英文摘要]
13. Batch normalization: Accelerating deep network training by reducing internal covariate shift [中英文摘要]
14. Beyond correlation filters: Learning continuous convolution operators for visual tracking [中英文摘要]
15. Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 [中英文摘要]
16. Building high-level features using large scale unsupervised learning [中英文摘要]
17. Character-Aware neural language models [中英文摘要]
18. Collective robot reinforcement learning with distributed asynchronous guided policy search [中英文摘要]
19. Colorful image colorization [中英文摘要]
20. Conditional image generation with PixelCNN decoders [中英文摘要]
21. Continuous control with deep reinforcement learning [中英文摘要]
22. Continuous deep q-learning with model-based acceleration [中英文摘要]
23. Controlling perceptual factors in neural style transfer [中英文摘要]
24. DRAW: A Recurrent Neural Network For Image Generation [中英文摘要]
25. Decoupled neural interfaces using synthetic gradients [中英文摘要]
26. Deep Learning of Representations for Unsupervised and Transfer Learning [中英文摘要]
27. Deep Neural Networks for Acoustic Modeling in the Presence of Noise [中英文摘要]
28. Deep captioning with multimodal recurrent neural networks (m-RNN) [中英文摘要]
29. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding [中英文摘要]
30. Deep fragment embeddings for bidirectional image sentence mapping [中英文摘要]
31. Deep learning [中英文摘要]
32. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates [中英文摘要]
33. Deep residual learning for image recognition [中英文摘要]
34. Deep speech 2: End-to-end speech recognition in English and Mandarin [中英文摘要]
35. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs [中英文摘要]
36. Distilling the Knowledge in a Neural Network [中英文摘要]
37. Dueling Network Architectures for Deep Reinforcement Learning [中英文摘要]
38. Effective approaches to attention-based neural machine translation [中英文摘要]
39. End-to-end memory networks [中英文摘要]
40. End-to-end training of deep visuomotor policies [中英文摘要]
41. Every picture tells a story: Generating sentences from images [中英文摘要]
42. Evolving large-scale neural networks for vision-based reinforcement learning [中英文摘要]
43. Fast R-CNN [中英文摘要]
44. Fast and accurate recurrent neural network acoustic models for speech recognition [中英文摘要]
45. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks [中英文摘要]
46. Fcnt [中英文摘要]
47. From captions to visual concepts and back [中英文摘要]
48. Fully Character-Level Neural Machine Translation without Explicit Segmentation [中英文摘要]
49. Fully Convolutional Networks for Semantic Segmentation [中英文摘要]
50. Fully-convolutional siamese networks for object tracking [中英文摘要]
51. Functional magnetic resonance imaging as experienced by stroke survivors [中英文摘要]
52. Generating Sequences With Recurrent Neural Networks [中英文摘要]
53. Generative visual manipulation on the natural image manifold [中英文摘要]
54. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation [中英文摘要]
55. Handbook of approximation algorithms and metaheuristics [中英文摘要]
56. Human-level control through deep reinforcement learning [中英文摘要]
57. Hybrid computing using a neural network with dynamic external memory [中英文摘要]
58. Improving neural networks by preventing co-adaptation of feature detectors [中英文摘要]
59. Inceptionism: Going deeper into neural networks, google research blog [中英文摘要]
60. Influência do glyphosate e 2,4-D sobre o desenvolvimento inicial de espécies florestais [中英文摘要]
61. Instance-Aware Semantic Segmentation via Multi-task Network Cascades [中英文摘要]
62. Instance-sensitive fully convolutional networks [中英文摘要]
63. Joint learning of words and meaning representations for open-text semantic parsing [中英文摘要]
64. Layer Normalization [中英文摘要]
65. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection [中英文摘要]
66. Learning phrase representations using RNN encoder-decoder for statistical machine translation [中英文摘要]
67. Learning to learn by gradient descent by gradient descent [中英文摘要]
68. Learning to navigate in complex environments [中英文摘要]
69. Learning to segment object candidates [中英文摘要]
70. Learning to track at 100 FPS with deep regression networks [中英文摘要]
71. Lifelong machine learning systems: Beyond learning algorithms [中英文摘要]
72. Lifted rule injection for relation embeddings [中英文摘要]
73. Literature after 9/11 [中英文摘要]
74. Long-Term Recurrent Convolutional Networks for Visual Recognition and Description [中英文摘要]
75. Low-Shot Visual Recognition by Shrinking and Hallucinating Features [中英文摘要]
76. Mask R-CNN [中英文摘要]
77. Mastering the game of Go with deep neural networks and tree search [中英文摘要]
78. Matching networks for one shot learning [中英文摘要]
79. Memory networks [中英文摘要]
80. Meta-Learning with Memory-Augmented Neural Networks [中英文摘要]
81. Mind's eye: A recurrent visual representation for image caption generation [中英文摘要]
82. Mining on Manifolds: Metric Learning Without Labels [中英文摘要]
83. Modeling and Propagating CNNs in a Tree Structure for Visual Tracking [中英文摘要]
84. Multitudes : Aux origines d'une revue radicale [中英文摘要]
85. Net2Net: Accelerating learning via knowledge transfer [中英文摘要]
86. Network morphism [中英文摘要]
87. Neural Turing Machines [中英文摘要]
88. Neural machine translation by jointly learning to align and translate [中英文摘要]
89. Neural machine translation of rare words with subword units [中英文摘要]
90. On the importance of initialization and momentum in deep learning [中英文摘要]
91. Perceptual losses for real-time style transfer and super-resolution [中英文摘要]
92. Pixel recurrent neural networks [中英文摘要]
93. Playing Atari with Deep Reinforcement Learning [中英文摘要]
94. Pointer networks [中英文摘要]
95. Policy distillation [中英文摘要]
96. Preparation of novel high copper ions removal membranes by embedding organosilane-functionalized multi-walled carbon nanotube [中英文摘要]
97. Program Induction [中英文摘要]
98. Progressive Neural Networks [中英文摘要]
99. Protective effect of rSm28GST-specific T cells in schistosomiasis: Role of gamma interferon [中英文摘要]
100. R-FCN: Object detection via region-based fully convolutional networks [中英文摘要]
101. Reinforcement Learning Neural Turing Machines - Revised [中英文摘要]
102. Rich feature hierarchies for accurate object detection and semantic segmentation [中英文摘要]
103. SSD: Single shot multibox detector [中英文摘要]
104. Science [中英文摘要]
105. Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks [中英文摘要]
106. Sequence to sequence learning with neural networks [中英文摘要]
107. Sequence to sequence learning with neural networks [中英文摘要]
108. Show and tell: A neural image caption generator [中英文摘要]
109. Show, attend and tell: Neural image caption generation with visual attention [中英文摘要]
110. Siamese Thesis [中英文摘要]
111. Sim-to-Real Robot Learning from Pixels with Progressive Nets [中英文摘要]
112. Sparse generative adversarial network [中英文摘要]
113. Spatial pyramid pooling in deep convolutional networks for visual recognition [中英文摘要]
114. Speech recognition with deep recurrent neural networks [中英文摘要]
115. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size [中英文摘要]
116. Squeezed Very Deep Convolutional Neural Networks for Text Classification [中英文摘要]
117. Studies on Ozone-oxidation of Dye in a Bubble Column Reactor at Different pH and Different Oxidation-reduction Potential. [中英文摘要]
118. Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours [中英文摘要]
119. Target-driven visual navigation in indoor scenes using deep reinforcement learning [中英文摘要]
120. Teaching machines to read and comprehend [中英文摘要]
121. Texture networks: Feed-forward synthesis of textures and stylized images [中英文摘要]
122. Toward Human Parity in Conversational Speech Recognition [中英文摘要]
123. Towards AI-complete question answering: A set of prerequisite toy tasks [中英文摘要]
124. Towards end-to-end speech recognition with recurrent neural networks [中英文摘要]
125. Transferring Rich Feature Hierarchies for Robust Visual Tracking [中英文摘要]
126. Unsupervised representation learning with deep convolutional generative adversarial networks [中英文摘要]
127. Very deep convolutional networks for large-scale image recognition [中英文摘要]
128. You only look once: Unified, real-time object detection [中英文摘要]
摘要
[1] A Neural Algorithm of Artistic Style (2016)
Abstract: In fine art, especially painting, humans have mastered the skill to create unique visual experiences through composing a complex interplay between the content and style of an image. Thus far the algorithmic basis of this process is unknown and there exists no artificial system with similar capabilities. However, in other key areas of visual perception such as object and face recognition near-human performance was recently demonstrated by a class of biologically inspired vision models called Deep Neural Networks. Here we introduce an artificial system based on a Deep Neural Network that creates artistic images of high perceptual quality. The system uses neural representations to separate and recombine content and style of arbitrary images, providing a neural algorithm for the creation of artistic images. Moreover, in light of the striking similarities between performance-optimised artificial neural networks and biological vision, our work offers a path forward to an algorithmic understanding of how humans create and perceive artistic imagery.
摘要: 在美术,特别是绘画中,人类已经掌握了通过组合图像内容与风格之间复杂的相互作用来创造独特视觉体验的技能。到目前为止,该过程的算法基础尚不清楚,也不存在具有类似能力的人工系统。但是,在物体识别和人脸识别等视觉感知的其他关键领域,一类被称为深度神经网络的受生物启发的视觉模型最近已展现出接近人类的表现。在这里,我们介绍一种基于深度神经网络的人工系统,该系统可以创建具有高感知质量的艺术图像。该系统使用神经表示来分离和重组任意图像的内容和风格,为艺术图像的创作提供了一种神经算法。此外,鉴于性能优化的人工神经网络与生物视觉之间的惊人相似性,我们的工作为从算法上理解人类如何创造和感知艺术图像提供了一条途径。
下载地址 | 返回目录 | [10.1167/16.12.326]
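下面给出一个极简的 NumPy 示意代码,说明上文所述“分离内容与风格”的核心思路之一:用特征图的 Gram 矩阵表示风格,用特征图本身表示内容。其中的特征图、层权重系数等均为假设的示例数据,并非论文的完整实现。

```python
import numpy as np

def gram_matrix(features):
    """风格表示:特征图各通道之间的相关性(C x C 的 Gram 矩阵)。
    features: 形状为 (C, H, W) 的某一层卷积特征图。"""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def style_loss(gen_feat, style_feat):
    """生成图与风格图在同一层上 Gram 矩阵的均方差。"""
    return np.mean((gram_matrix(gen_feat) - gram_matrix(style_feat)) ** 2)

def content_loss(gen_feat, content_feat):
    """生成图与内容图在同一层上特征图的均方差。"""
    return np.mean((gen_feat - content_feat) ** 2)

# 假设的示例特征图(实际应取自预训练 CNN 的中间层)
rng = np.random.default_rng(0)
gen = rng.standard_normal((64, 32, 32))
style = rng.standard_normal((64, 32, 32))
content = rng.standard_normal((64, 32, 32))

alpha, beta = 1.0, 1e3  # 假设的内容/风格权重
total = alpha * content_loss(gen, content) + beta * style_loss(gen, style)
print(total)
```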
[2] A Neural Conversational Model (2015)
Abstract: Conversational modeling is an important task in natural language understanding and machine intelligence. Although previous approaches exist, they are often restricted to specific domains (e.g., booking an airline ticket) and require hand-crafted rules. In this paper, we present a simple approach for this task which uses the recently proposed sequence to sequence framework. Our model converses by predicting the next sentence given the previous sentence or sentences in a conversation. The strength of our model is that it can be trained end-to-end and thus requires much fewer hand-crafted rules. We find that this straightforward model can generate simple conversations given a large conversational training dataset. Our preliminary results suggest that, despite optimizing the wrong objective function, the model is able to converse well. It is able to extract knowledge from both a domain specific dataset, and from a large, noisy, and general domain dataset of movie subtitles. On a domain-specific IT helpdesk dataset, the model can find a solution to a technical problem via conversations. On a noisy open-domain movie transcript dataset, the model can perform simple forms of common sense reasoning. As expected, we also find that the lack of consistency is a common failure mode of our model.
摘要: 会话建模是自然语言理解和机器智能中的一项重要任务。尽管已有先前的方法,但它们通常局限于特定领域(例如预订机票),并且需要手工制定的规则。在本文中,我们针对该任务提出了一种简单的方法,它使用了最近提出的序列到序列(sequence to sequence)框架。我们的模型在对话中根据前一个或多个句子预测下一个句子,从而进行交谈。该模型的优势在于可以端到端训练,因此需要的手工规则要少得多。我们发现,在给定大规模会话训练数据集的情况下,这个简单的模型可以生成简单的对话。我们的初步结果表明,尽管优化的目标函数并不理想,该模型仍能较好地进行对话。它既能从特定领域的数据集中提取知识,也能从大型、嘈杂且通用的电影字幕数据集中提取知识。在特定领域的IT服务台数据集上,该模型可以通过对话找到技术问题的解决方案。在嘈杂的开放域电影对白数据集上,该模型可以执行简单形式的常识推理。不出所料,我们还发现缺乏一致性是该模型的常见失败模式。
[3] A character-level decoder without explicit segmentation for neural machine translation (2016)
Abstract: The existing machine translation systems, whether phrase-based or neural, have relied almost exclusively on word-level modelling with explicit segmentation. In this paper, we ask a fundamental question: can neural machine translation generate a character sequence without any explicit segmentation? To answer this question, we evaluate an attention-based encoder-decoder with a subword-level encoder and a character-level decoder on four language pairs - En-Cs, En-De, En-Ru and En-Fi - using the parallel corpora from WMT15. Our experiments show that the models with a character-level decoder outperform the ones with a subword-level decoder on all of the four language pairs. Furthermore, the ensembles of neural models with a character-level decoder outperform the state-of-the-art non-neural machine translation systems on En-Cs, En-De and En-Fi and perform comparably on En-Ru.
摘要: 现有的机器翻译系统,无论是基于短语的还是基于神经网络的,几乎都完全依赖带显式分割的词级建模。在本文中,我们提出一个基本问题:神经机器翻译能否在不进行任何显式分割的情况下生成字符序列?为了回答这个问题,我们在 En-Cs、En-De、En-Ru 和 En-Fi 四个语言对上,使用 WMT15 的平行语料,评估了一个带子词级编码器和字符级解码器的基于注意力的编码器-解码器。实验表明,在全部四个语言对上,带字符级解码器的模型均优于带子词级解码器的模型。此外,带字符级解码器的神经模型集成在 En-Cs、En-De 和 En-Fi 上优于最先进的非神经机器翻译系统,在 En-Ru 上表现相当。
下载地址 | 返回目录 | [10.18653/v1/p16-1160]
[4] A learned representation for artistic style (2019)
Abstract: The diversity of painting styles represents a rich visual vocabulary for the construction of an image. The degree to which one may learn and parsimoniously capture this visual vocabulary measures our understanding of the higher level features of paintings, if not images in general. In this work we investigate the construction of a single, scalable deep network that can parsimoniously capture the artistic style of a diversity of paintings. We demonstrate that such a network generalizes across a diversity of artistic styles by reducing a painting to a point in an embedding space. Importantly, this model permits a user to explore new painting styles by arbitrarily combining the styles learned from individual paintings. We hope that this work provides a useful step towards building rich models of paintings and offers a window on to the structure of the learned representation of artistic style.
摘要: 绘画风格的多样性代表了构建图像的丰富视觉词汇。人们能够在多大程度上学习并简约地掌握这一视觉词汇,衡量了我们对绘画(乃至一般图像)更高层次特征的理解。在这项工作中,我们研究如何构建一个单一的、可扩展的深度网络,以简约的方式捕捉多种绘画的艺术风格。我们证明,这种网络通过将一幅绘画归约为嵌入空间中的一个点,能够在多种艺术风格之间泛化。重要的是,该模型允许用户通过任意组合从单幅绘画中学到的风格来探索新的绘画风格。我们希望这项工作为建立丰富的绘画模型迈出有用的一步,并为理解艺术风格学习表示的结构提供一个窗口。
[5] Actor-mimic deep multitask and transfer reinforcement learning (2016)
Abstract: The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent. Towards this goal, we define a novel method of multitask and transfer learning that enables an autonomous agent to learn how to behave in multiple tasks simultaneously, and then generalize its knowledge to new domains. This method, termed “Actor-Mimic”, exploits the use of deep reinforcement learning and model compression techniques to train a single policy network that learns how to act in a set of distinct tasks by using the guidance of several expert teachers. We then show that the representations learnt by the deep policy network are capable of generalizing to new tasks with no prior expert guidance, speeding up learning in novel environments. Although our method can in general be applied to a wide range of problems, we use Atari games as a testing environment to demonstrate these methods.
摘要: 在多种环境中采取行动并将先前的知识转移到新情况中的能力可以被视为任何智能代理的关键方面。为了实现这一目标,我们定义了一种新颖的多任务和转移学习方法,使自治代理能够学习如何同时在多个任务中表现,然后将其知识推广到新领域。这种称为“演员模仿”的方法利用深度强化学习和模型压缩技术来训练单个策略网络,该策略网络通过在多位专家老师的指导下学习如何在一组不同的任务中采取行动。然后,我们表明,深度策略网络学习到的表示能够在没有专家指导的情况下推广到新任务,从而加快了在新颖环境中的学习速度。尽管我们的方法通常可以应用于广泛的问题,但我们使用Atari游戏作为测试环境来演示这些方法。
[6] Adam: A method for stochastic optimization (2015)
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
摘要: 我们介绍 Adam,这是一种基于低阶矩自适应估计的一阶梯度随机目标函数优化算法。该方法易于实现,计算效率高,内存需求小,对梯度的对角缩放具有不变性,并且非常适合数据和/或参数规模较大的问题。该方法也适用于非平稳目标以及梯度非常嘈杂和/或稀疏的问题。其超参数具有直观的解释,通常只需很少的调整。文中还讨论了与启发 Adam 的相关算法之间的一些联系。我们还分析了该算法的理论收敛性质,并给出了与在线凸优化框架下已知最佳结果相当的收敛速度遗憾界(regret bound)。实证结果表明,Adam 在实践中效果很好,并且优于其他随机优化方法。最后,我们讨论了 AdaMax,它是基于无穷范数的 Adam 变体。
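下面给出摘要中所述 Adam 更新规则的一个最小 NumPy 示意实现。仅为示意:超参数取常用默认值,示例中的目标函数与学习率均为假设,并非论文实验设置。

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """对参数 theta 执行一步 Adam 更新。
    g: 当前梯度; m, v: 一阶/二阶矩的滑动估计; t: 步数(从 1 开始)。"""
    m = b1 * m + (1 - b1) * g            # 一阶矩估计
    v = b2 * v + (1 - b2) * g * g        # 二阶矩估计
    m_hat = m / (1 - b1 ** t)            # 偏差修正
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# 示例:最小化 f(x) = (x - 3)^2,梯度为 2(x - 3)
theta = np.array([0.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    g = 2 * (theta - 3.0)
    theta, m, v = adam_step(theta, g, m, v, t, lr=0.05)
print(theta)  # 应接近 3.0
```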
[7] Addressing the rare word problem in neural machine translation (2015)
Abstract: Neural Machine Translation (NMT) is a new approach to machine translation that has shown promising results that are comparable to traditional approaches. A significant weakness in conventional NMT systems is their inability to correctly translate very rare words: end-to-end NMTs tend to have relatively small vocabularies with a single unk symbol that represents every possible out-of-vocabulary (OOV) word. In this paper, we propose and implement an effective technique to address this problem. We train an NMT system on data that is augmented by the output of a word alignment algorithm, allowing the NMT system to emit, for each OOV word in the target sentence, the position of its corresponding word in the source sentence. This information is later utilized in a post-processing step that translates every OOV word using a dictionary. Our experiments on the WMT14 English to French translation task show that this method provides a substantial improvement of up to 2.8 BLEU points over an equivalent NMT system that does not use this technique. With 37.5 BLEU points, our NMT system is the first to surpass the best result achieved on a WMT14 contest task.
摘要: 神经机器翻译(NMT)是一种新的机器翻译方法,已显示出可与传统方法相媲美的可喜结果。传统 NMT 系统的一个显著弱点是无法正确翻译非常罕见的单词:端到端 NMT 的词汇表往往相对较小,并用单一的 unk 符号代表所有可能的词汇表外(OOV)单词。在本文中,我们提出并实现了一种解决该问题的有效技术。我们在经过词对齐算法输出增强的数据上训练 NMT 系统,使其能够为目标句子中的每个 OOV 单词给出其对应单词在源句子中的位置。随后在后处理步骤中利用这一信息,通过词典翻译每个 OOV 单词。我们在 WMT14 英语到法语翻译任务上的实验表明,与不使用该技术的同等 NMT 系统相比,该方法可带来最多 2.8 个 BLEU 点的显著提升。我们的 NMT 系统以 37.5 的 BLEU 分数首次超越了 WMT14 竞赛任务上的最佳结果。
下载地址 | 返回目录 | [10.3115/v1/p15-1002]
[8] Ask me anything: Dynamic memory networks for natural language processing (2016)
Abstract: Most tasks in natural language processing can be cast into question answering (QA) problems over language input. We introduce the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. Questions trigger an iterative attention process which allows the model to condition its attention on the inputs and the result of previous iterations. These results are then reasoned over in a hierarchical recurrent sequence model to generate answers. The DMN can be trained end-to-end and obtains state-of-the-art results on several types of tasks and datasets: question answering (Facebook's bAbI dataset), text classification for sentiment analysis (Stanford Sentiment Treebank) and sequence modeling for part-of-speech tagging (WSJ-PTB). The training for these different tasks relies exclusively on trained word vector representations and input-question-answer triplets.
摘要: 自然语言处理中的大多数任务都可以转化为针对语言输入的问答(QA)问题。我们介绍动态记忆网络(DMN),这是一种神经网络架构,它处理输入序列和问题、形成情景记忆并生成相关答案。问题会触发一个迭代的注意力过程,使模型能够将注意力置于输入以及先前迭代的结果之上;然后由一个分层的循环序列模型对这些结果进行推理以生成答案。DMN 可以端到端训练,并在多种任务和数据集上取得了最新水平的结果:问答(Facebook 的 bAbI 数据集)、用于情感分析的文本分类(Stanford Sentiment Treebank)以及用于词性标注的序列建模(WSJ-PTB)。针对这些不同任务的训练完全依赖于训练好的词向量表示和“输入-问题-答案”三元组。
[9] Asynchronous methods for deep reinforcement learning (2016)
Abstract: We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task of navigating random 3D mazes using a visual input.
摘要: 我们提出了一个概念上简单且轻量的深度强化学习框架,它使用异步梯度下降来优化深度神经网络控制器。我们给出了四种标准强化学习算法的异步变体,并表明并行的 actor-learner 对训练具有稳定作用,使全部四种方法都能成功训练神经网络控制器。其中表现最好的方法是 actor-critic 的异步变体,它在 Atari 领域超越了当前最先进的水平,而训练只需在单个多核 CPU(而非 GPU)上进行一半的时间。此外,我们还表明,异步 actor-critic 在各种连续运动控制问题以及使用视觉输入导航随机 3D 迷宫的新任务上都取得了成功。
[10] Auto-encoding variational bayes (2014)
Abstract: How can we perform efficient inference and learning in directed probabilistic models, in the presence of continuous latent variables with intractable posterior distributions, and large datasets? We introduce a stochastic variational inference and learning algorithm that scales to large datasets and, under some mild differentiability conditions, even works in the intractable case. Our contribution is two-fold. First, we show that a reparameterization of the variational lower bound yields a lower bound estimator that can be straightforwardly optimized using standard stochastic gradient methods. Second, we show that for i.i.d. datasets with continuous latent variables per datapoint, posterior inference can be made especially efficient by fitting an approximate inference model (also called a recognition model) to the intractable posterior using the proposed lower bound estimator. Theoretical advantages are reflected in experimental results.
摘要: 在存在具有难解后验分布的连续潜变量和大型数据集的情况下,我们如何在有向概率模型中进行高效的推断和学习?我们引入了一种随机变分推断与学习算法,它可以扩展到大型数据集,并且在一些温和的可微性条件下,甚至适用于难解的情形。我们的贡献有两方面。首先,我们证明对变分下界进行重参数化可以得到一个下界估计量,它可以用标准随机梯度方法直接优化。其次,我们证明对于每个数据点都具有连续潜变量的 i.i.d. 数据集,通过使用所提出的下界估计量将一个近似推断模型(也称为识别模型)拟合到难解的后验,可以使后验推断变得特别高效。这些理论上的优势也体现在实验结果中。
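下面用 NumPy 给出摘要中“重参数化”技巧的一个最小示意:潜变量 z 不直接从 q(z|x) 采样,而是写成 z = mu + sigma * eps(eps 来自标准正态分布),从而让梯度可以穿过采样步骤。其中编码器输出的 mu、log_var 为假设的示例数据,并非完整的 VAE 实现。

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """z = mu + sigma * eps, eps ~ N(0, I);梯度可经 mu、log_var 反向传播。"""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_to_standard_normal(mu, log_var):
    """q(z|x)=N(mu, sigma^2) 与先验 N(0, I) 之间 KL 散度的解析式(按潜变量维度求和)。"""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

# 假设编码器对一个 batch 输出了 4 维潜变量的均值与对数方差
mu = rng.standard_normal((8, 4))
log_var = rng.standard_normal((8, 4)) * 0.1
z = reparameterize(mu, log_var)          # 可送入解码器重构输入
print(z.shape, kl_to_standard_normal(mu, log_var).mean())
```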
[11] Baby talk: Understanding and generating simple image descriptions (2013)
Abstract: We present a system to automatically generate natural language descriptions from images. This system consists of two parts. The first part, content planning, smooths the output of computer vision-based detection and recognition algorithms with statistics mined from large pools of visually descriptive text to determine the best content words to use to describe an image. The second step, surface realization, chooses words to construct natural language sentences based on the predicted content and general statistics from natural language. We present multiple approaches for the surface realization step and evaluate each using automatic measures of similarity to human generated reference descriptions. We also collect forced choice human evaluations between descriptions from the proposed generation system and descriptions from competing approaches. The proposed system is very effective at producing relevant sentences for images. It also generates descriptions that are notably more true to the specific image content than previous work. © 2013 IEEE.
摘要: 我们提出了一种系统,可以根据图像自动生成自然语言描述。该系统由两部分组成。第一部分是内容计划,它使用从大量视觉描述性文本池中提取的统计数据来平滑基于计算机视觉的检测和识别算法的输出,以确定用于描述图像的最佳内容词。第二步,表面实现,根据自然语言的预测内容和一般统计信息,选择单词来构建自然语言句子。我们为表面实现步骤提供了多种方法,并使用与人类生成的参考描述相似的自动度量来评估每种方法。我们还从提议的生成系统的描述与竞争方法的描述之间收集了强制选择的人工评估。所提出的系统在产生图像的相关句子方面非常有效。与以前的工作相比,它还生成了对特定图像内容更真实的描述。© 2013 IEEE。
下载地址 | 返回目录 | [10.1109/TPAMI.2012.162]
[12] Bag of tricks for efficient text classification (2017)
Abstract: This paper explores a simple and efficient baseline for text classification. Our experiments show that our fast text classifier fastText is often on par with deep learning classifiers in terms of accuracy, and many orders of magnitude faster for training and evaluation. We can train fastText on more than one billion words in less than ten minutes using a standard multicore CPU, and classify half a million sentences among 312K classes in less than a minute.
摘要: 本文探讨了一种简单有效的文本分类基准。我们的实验表明,我们的快速文本分类器fastText在准确性方面经常与深度学习分类器相提并论,在训练和评估方面要快多个数量级。我们可以使用标准的多核CPU在不到十分钟的时间内训练fastText超过十亿个单词,并在一分钟之内对312K个类别中的50万个句子进行分类。
下载地址 | 返回目录 | [10.18653/v1/e17-2068]
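下面是摘要中“简单高效基线”思路的一个 NumPy 极简示意:把句子表示为词(或 n-gram)嵌入的平均值,再接一个线性分类器。这里的词表大小、嵌入维度与随机数据均为假设示例,并非 fastText 的实际实现。

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim, num_classes = 1000, 16, 4

E = rng.standard_normal((vocab_size, embed_dim)) * 0.1   # 词 / n-gram 嵌入表
W = rng.standard_normal((embed_dim, num_classes)) * 0.1  # 线性分类层
b = np.zeros(num_classes)

def predict(token_ids):
    """句子向量 = 词嵌入的平均值,再做线性分类并 softmax。"""
    sent_vec = E[token_ids].mean(axis=0)
    logits = sent_vec @ W + b
    p = np.exp(logits - logits.max())
    return p / p.sum()

# 假设的一个已转成 id 序列的句子
sentence = np.array([12, 7, 256, 999, 3])
print(predict(sentence))  # 各类别的概率
```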
[13] Batch normalization: Accelerating deep network training by reducing internal covariate shift (2015)
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.
摘要: 训练深度神经网络的复杂之处在于,随着前面各层参数的变化,每一层输入的分布在训练过程中也会发生变化。这就要求使用较低的学习率并进行仔细的参数初始化,从而减慢了训练速度,并使得训练带饱和非线性的模型变得非常困难。我们将这一现象称为内部协变量偏移,并通过对层输入进行归一化来解决该问题。我们方法的优势在于把归一化作为模型结构的一部分,并对每个训练小批量(mini-batch)执行归一化。批量归一化使我们可以使用高得多的学习率,并且不必那么在意初始化;在某些情况下还可以省去 Dropout。将批量归一化应用于最先进的图像分类模型,只需原来 1/14 的训练步数即可达到相同精度,并以显著优势超过原始模型。利用批量归一化网络的集成,我们改进了 ImageNet 分类上已发表的最佳结果:top-5 测试错误率达到 4.82%,超过了人类标注者的准确率。
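下面给出批量归一化前向计算的一个 NumPy 最小示意:对每个特征维度按小批量求均值和方差做标准化,再用可学习的 gamma、beta 做缩放平移。示例数据为假设,训练时的滑动统计量与反向传播未包含在内。

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """x: (N, D) 的一个小批量; gamma, beta: (D,) 的可学习缩放与平移参数。"""
    mu = x.mean(axis=0)                  # 按特征维度的批内均值
    var = x.var(axis=0)                  # 批内方差
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8)) * 5 + 2   # 假设的一层输入(被放大并偏移)
gamma, beta = np.ones(8), np.zeros(8)
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # 约为 0 和 1
```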
[14] Beyond correlation filters: Learning continuous convolution operators for visual tracking (2016)
Abstract: Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments.
摘要: 判别相关滤波器(DCF)在视觉目标跟踪方面表现出色。其成功的关键在于通过包含训练样本的所有平移版本来高效利用可用的负样本数据。但是,基础的 DCF 公式仅限于单分辨率特征图,极大地限制了其潜力。在本文中,我们超越了常规的 DCF 框架,提出了一种用于训练连续卷积滤波器的新颖公式。我们采用隐式插值模型,在连续空间域中提出学习问题。我们提出的公式可实现多分辨率深度特征图的高效融合,从而在三个目标跟踪基准上取得优异结果:OTB-2015(平均 OP 提升 5.1%)、Temple-Color(平均 OP 提升 4.6%)和 VOT2015(失败率相对降低 20%)。此外,我们的方法能够进行亚像素定位,这对精确特征点跟踪任务至关重要。我们还在大量特征点跟踪实验中证明了所提学习公式的有效性。
下载地址 | 返回目录 | [10.1007/978-3-319-46454-1_29]
[15] Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 (2016)
Abstract: We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameters' gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency. To validate the effectiveness of BNNs we conduct two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results over the MNIST, CIFAR-10 and SVHN datasets. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available on-line.
摘要: 我们提出了一种训练二值化神经网络(BNN)的方法,即在运行时权重和激活均为二值的神经网络。在训练时,二值化的权重和激活被用于计算参数梯度。在前向传播过程中,BNN 大幅减少内存占用和访问次数,并用按位运算代替大多数算术运算,有望显著提高能效。为了验证 BNN 的有效性,我们在 Torch7 和 Theano 框架上进行了两组实验。在这两个框架上,BNN 都在 MNIST、CIFAR-10 和 SVHN 数据集上取得了接近最新水平的结果。最后但同样重要的是,我们编写了一个二值矩阵乘法 GPU 内核,借助它可以使 MNIST BNN 的运行速度比未优化的 GPU 内核快 7 倍,而分类精度没有任何损失。用于训练和运行 BNN 的代码已在线提供。
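下面是摘要所述“权重/激活约束为 ±1”思路的一个 NumPy 最小示意:前向用 sign 函数二值化,反向用直通估计器(straight-through estimator)把梯度近似传回实值权重。示例数据为假设,并非论文的完整训练流程。

```python
import numpy as np

def binarize(w):
    """前向:把实值权重/激活二值化为 +1 或 -1。"""
    return np.where(w >= 0, 1.0, -1.0)

def ste_grad(w, grad_wb, clip=1.0):
    """反向:直通估计器,在 |w| <= clip 范围内把二值权重的梯度直接传给实值权重。"""
    return grad_wb * (np.abs(w) <= clip)

rng = np.random.default_rng(0)
w_real = rng.standard_normal(5) * 0.5     # 保留的实值“影子”权重
w_bin = binarize(w_real)                  # 前向计算使用的二值权重
grad_from_loss = rng.standard_normal(5)   # 假设由损失反传回 w_bin 的梯度
w_real -= 0.1 * ste_grad(w_real, grad_from_loss)  # 用 STE 更新实值权重
print(w_bin, w_real.round(3))
```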
[16] Building high-level features using large scale unsupervised learning (2013)
Abstract: We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a deep sparse autoencoder on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting from these learned features, we trained our network to recognize 22,000 object categories from ImageNet and achieve a leap of 70% relative improvement over the previous state-of-the-art. © 2013 IEEE.
摘要: 我们考虑仅从未标记数据构建高层的、特定类别的特征检测器的问题。例如,是否可以仅使用未标记的图像来学习人脸检测器?为了回答这个问题,我们在一个大型图像数据集上训练了一个深度稀疏自编码器(该模型有 10 亿个连接,数据集包含从互联网下载的 1000 万张 200×200 像素的图像)。我们在拥有 1,000 台机器(16,000 个核心)的集群上,使用模型并行和异步 SGD 训练该网络三天。与普遍的直觉相反,我们的实验结果表明,可以在不将图像标注为“包含/不包含人脸”的情况下训练出人脸检测器。对照实验表明,该特征检测器不仅对平移具有鲁棒性,而且对缩放和平面外旋转也具有鲁棒性。我们还发现,同一网络对猫脸和人体等其他高层概念也很敏感。从这些学到的特征出发,我们训练网络识别 ImageNet 中的 22,000 个物体类别,相对于此前的最新水平取得了 70% 的相对提升。© 2013 IEEE。
下载地址 | 返回目录 | [10.1109/ICASSP.2013.6639343]
[17] Character-Aware neural language models (2016)
Abstract: We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
摘要: 我们描述了一个仅依赖字符级输入的简单神经语言模型,其预测仍然在词级别上进行。我们的模型对字符使用卷积神经网络(CNN)和 highway 网络,其输出被送入一个长短期记忆(LSTM)循环神经网络语言模型(RNN-LM)。在英语 Penn Treebank 上,尽管参数少了 60%,该模型仍与现有最新水平相当。在形态丰富的语言(阿拉伯语、捷克语、法语、德语、西班牙语、俄语)上,该模型优于词级/词素级 LSTM 基线,而且参数同样更少。结果表明,在许多语言上,字符输入足以进行语言建模。对从模型的字符组合部分获得的词表示的分析表明,该模型能够仅从字符中同时编码语义和正字法信息。
[18] Collective robot reinforcement learning with distributed asynchronous guided policy search (2017)
Abstract: Policy search methods and, more broadly, reinforcement learning can enable robots to learn highly complex and general skills that may allow them to function amid the complexity and diversity of the real world. However, training a policy that generalizes well across a wide range of real-world conditions requires far greater quantity and diversity of experience than is practical to collect with a single robot. Fortunately, it is possible for multiple robots to share their experience with one another, and thereby, learn a policy collectively. In this work, we explore distributed and asynchronous policy learning as a means to achieve generalization and improved training times on challenging, real-world manipulation tasks. We propose a distributed and asynchronous version of guided policy search and use it to demonstrate collective policy learning on a vision-based door opening task using four robots. We describe how both policy learning and data collection can be conducted in parallel across multiple robots, and present a detailed empirical evaluation of our system. Our results indicate that distributed learning significantly improves training time, and that parallelizing policy learning and data collection substantially improves utilization. We also demonstrate that we can achieve substantial generalization on a challenging real-world door opening task.
摘要: 策略搜索方法,以及更广泛的强化学习,可以使机器人学习高度复杂且通用的技能,从而使它们能够在真实世界的复杂性和多样性中发挥作用。但是,要训练一个能在广泛的真实条件下良好泛化的策略,所需经验的数量和多样性远远超出单个机器人实际能够收集的范围。幸运的是,多个机器人可以彼此共享经验,从而共同学习一个策略。在这项工作中,我们探索分布式异步策略学习,以此在具有挑战性的真实操作任务上实现泛化并缩短训练时间。我们提出了引导策略搜索(guided policy search)的一种分布式异步版本,并用它在基于视觉的开门任务上演示了四台机器人的集体策略学习。我们描述了如何在多台机器人上并行地进行策略学习和数据收集,并给出了对系统的详细实证评估。结果表明,分布式学习显著缩短了训练时间,而策略学习与数据收集的并行化大幅提高了利用率。我们还证明,在具有挑战性的真实开门任务上可以实现可观的泛化。
下载地址 | 返回目录 | [10.1109/IROS.2017.8202141]
[19] Colorful image colorization (2016)
Abstract: Given a grayscale photograph as input, this paper attacks the problem of hallucinating a plausible color version of the photograph. This problem is clearly underconstrained, so previous approaches have either relied on significant user interaction or resulted in desaturated colorizations. We propose a fully automatic approach that produces vibrant and realistic colorizations. We embrace the underlying uncertainty of the problem by posing it as a classification task and use class-rebalancing at training time to increase the diversity of colors in the result. The system is implemented as a feed-forward pass in a CNN at test time and is trained on over a million color images. We evaluate our algorithm using a “colorization Turing test,” asking human participants to choose between a generated and ground truth color image. Our method successfully fools humans on 32{\%} of the trials, significantly higher than previous methods. Moreover, we show that colorization can be a powerful pretext task for self-supervised feature learning, acting as a cross-channel encoder. This approach results in state-of-the-art performance on several feature learning benchmarks.
摘要: 给定一张灰度照片作为输入,本文要解决的问题是为其“幻想”出一个看起来合理的彩色版本。这个问题显然是欠约束的,因此以前的方法要么依赖大量用户交互,要么产生欠饱和的着色结果。我们提出了一种全自动方法,可以产生生动逼真的着色。我们把着色视为一个分类任务,从而接受该问题内在的不确定性,并在训练时使用类别重平衡来增加结果中颜色的多样性。该系统在测试时作为 CNN 中的一次前馈计算实现,并在超过一百万张彩色图像上训练。我们使用“着色图灵测试”评估算法,让人类参与者在生成图像和真实彩色图像之间进行选择。我们的方法在 32% 的试验中成功骗过了人类,显著高于以前的方法。此外,我们表明着色可以作为跨通道编码器,成为自监督特征学习的一个强大前置任务(pretext task)。这种方法在多个特征学习基准上取得了最先进的性能。
下载地址 | 返回目录 | [10.1007/978-3-319-46487-9_40]
[20] Conditional image generation with PixelCNN decoders (2016)
Abstract: This work explores conditional image generation with a new image density model based on the PixelCNN architecture. The model can be conditioned on any vector, including descriptive labels or tags, or latent embeddings created by other networks. When conditioned on class labels from the ImageNet database, the model is able to generate diverse, realistic scenes representing distinct animals, objects, landscapes and structures. When conditioned on an embedding produced by a convolutional network given a single image of an unseen face, it generates a variety of new portraits of the same person with different facial expressions, poses and lighting conditions. We also show that conditional PixelCNN can serve as a powerful decoder in an image autoencoder. Additionally, the gated convolutional layers in the proposed model improve the log-likelihood of PixelCNN to match the state-of-the-art performance of PixelRNN on ImageNet, with greatly reduced computational cost.
摘要: 这项工作基于 PixelCNN 架构的一种新图像密度模型来探索条件图像生成。该模型可以以任意向量为条件,包括描述性标签或标记,或由其他网络产生的潜在嵌入。当以 ImageNet 数据库中的类别标签为条件时,该模型能够生成代表不同动物、物体、风景和建筑物的多样而逼真的场景。当以卷积网络根据一张未见过的人脸图像产生的嵌入为条件时,它会生成同一个人在不同表情、姿势和光照条件下的各种新肖像。我们还表明,条件 PixelCNN 可以作为图像自编码器中强大的解码器。此外,所提出模型中的门控卷积层提高了 PixelCNN 的对数似然,使其在 ImageNet 上与 PixelRNN 的最新性能相当,同时大大降低了计算成本。
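下面给出 PixelCNN 这类自回归像素模型所依赖的“掩码卷积”的一个 NumPy 最小示意:通过把卷积核中当前像素之后(光栅顺序)的位置清零,保证每个像素只依赖它之前的像素。卷积核大小与随机数据均为假设示例。

```python
import numpy as np

def causal_mask(k, include_center=False):
    """返回 k x k 的掩码:光栅顺序在中心像素之前的位置为 1,其余为 0。
    include_center=False 对应第一层(A 型掩码)。"""
    mask = np.zeros((k, k))
    c = k // 2
    mask[:c, :] = 1          # 中心行以上全部可见
    mask[c, :c] = 1          # 中心行中,中心左侧可见
    if include_center:
        mask[c, c] = 1       # 后续层(B 型掩码)允许看到自身位置
    return mask

kernel = np.random.default_rng(0).standard_normal((5, 5))
masked_kernel = kernel * causal_mask(5)   # 卷积前先对权重施加掩码
print(causal_mask(5))
```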
[21] Continuous control with deep reinforcement learning (2016)
Abstract: We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. We present an actor-critic, model-free algorithm based on the deterministic policy gradient that can operate over continuous action spaces. Using the same learning algorithm, network architecture and hyper-parameters, our algorithm robustly solves more than 20 simulated physics tasks, including classic problems such as cartpole swing-up, dexterous manipulation, legged locomotion and car driving. Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies “end-to-end”: directly from raw pixel inputs.
摘要: 我们把深度 Q 学习成功背后的思想推广到连续动作领域。我们提出了一种基于确定性策略梯度的 actor-critic、无模型算法,可以在连续动作空间上运行。使用相同的学习算法、网络结构和超参数,我们的算法稳健地解决了 20 多个模拟物理任务,包括倒立摆摆起(cartpole swing-up)、灵巧操纵、腿式运动和汽车驾驶等经典问题。我们的算法找到的策略,其性能可与能够完全访问环境动力学及其导数的规划算法所找到的策略相媲美。我们进一步证明,对于其中许多任务,该算法可以“端到端”地学习策略:直接从原始像素输入学习。
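摘要中这类深度 actor-critic 方法(如 DDPG)通常依赖目标网络的“软更新”来稳定训练;下面用 NumPy 给出这一思路的最小示意。参数用简单数组代替真实网络,tau 的取值为假设示例。

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.001):
    """目标网络参数缓慢跟随在线网络:theta_target <- tau*theta + (1-tau)*theta_target。"""
    return [(1 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

rng = np.random.default_rng(0)
online = [rng.standard_normal((4, 4)), rng.standard_normal(4)]   # 假设的在线网络参数
target = [p.copy() for p in online]                              # 初始化为相同参数

# 每个训练步之后对目标网络做一次软更新
online = [p + 0.1 * rng.standard_normal(p.shape) for p in online]  # 模拟一次梯度更新
target = soft_update(target, online, tau=0.001)
print(np.abs(target[0] - online[0]).mean())  # 目标网络只移动了很小一步
```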
[22] Continuous deep q-learning with model-based acceleration (2016)
Abstract: Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
摘要: 无模型强化学习已成功应用于一系列具有挑战性的问题,并且最近已扩展到处理大型神经网络策略和价值函数。但是,无模型算法的样本复杂度,尤其是在使用高维函数逼近器时,往往限制了它们在物理系统中的适用性。在本文中,我们探索能降低连续控制任务中深度强化学习样本复杂度的算法和表示形式。我们提出了两种互补的技术来提高此类算法的效率。首先,我们推导了 Q 学习算法的一个连续变体,称为归一化优势函数(NAF),作为更常用的策略梯度和 actor-critic 方法的替代方案。NAF 表示使我们能够将带经验回放的 Q 学习应用于连续任务,并显著提高了一组模拟机器人控制任务的性能。为了进一步提高方法的效率,我们探索了使用学习到的模型来加速无模型强化学习。我们证明,迭代重新拟合的局部线性模型对此特别有效,并在适用此类模型的领域上展示了明显更快的学习速度。
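摘要中的归一化优势函数(NAF)把 Q 函数分解为 Q(s,a) = V(s) + A(s,a),其中优势项是关于动作的二次型 A(s,a) = -1/2 (a - mu(s))^T P(s) (a - mu(s));下面用 NumPy 给出这一分解的最小示意。V、mu、L 通常由网络输出,这里用假设的数值代替。

```python
import numpy as np

def naf_q_value(a, v, mu, L):
    """NAF 的 Q 值:V(s) 加上以 mu(s) 为峰值的二次优势项。
    L: 下三角矩阵,P = L @ L.T 保证正定,因此 a = mu 时 Q 取最大值 V。"""
    P = L @ L.T
    diff = a - mu
    advantage = -0.5 * diff @ P @ diff
    return v + advantage

v = 1.5                          # 假设的状态价值 V(s)
mu = np.array([0.2, -0.1])       # 假设的最优动作 mu(s)
L = np.array([[1.0, 0.0],
              [0.3, 0.8]])       # 假设的下三角矩阵输出
print(naf_q_value(mu, v, mu, L))                    # 等于 V(s)
print(naf_q_value(np.array([1.0, 1.0]), v, mu, L))  # 偏离 mu 时 Q 变小
```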
[23] Controlling perceptual factors in neural style transfer (2017)
Abstract: Neural Style Transfer has shown very exciting results enabling new forms of image manipulation. Here we extend the existing method to introduce control over spatial location, colour information and across spatial scale12. We demonstrate how this enhances the method by allowing high-resolution controlled stylisation and helps to alleviate common failure cases such as applying ground textures to sky regions. Furthermore, by decomposing style into these perceptual factors we enable the combination of style information from multiple sources to generate new, perceptually appealing styles from existing ones. We also describe how these methods can be used to more efficiently produce large size, high-quality stylisation. Finally we show how the introduced control measures can be applied in recent methods for Fast Neural Style Transfer.
摘要: 神经风格迁移已展示出非常令人兴奋的结果,使新形式的图像处理成为可能。在这里,我们扩展了现有方法,引入对空间位置、颜色信息以及不同空间尺度的控制。我们展示了这如何通过支持高分辨率的可控风格化来增强该方法,并有助于缓解常见的失败情形,例如把地面纹理应用到天空区域。此外,通过把风格分解为这些感知因素,我们能够组合来自多个来源的风格信息,从现有风格中生成新的、具有感知吸引力的风格。我们还描述了如何用这些方法更高效地生成大尺寸、高质量的风格化结果。最后,我们展示了所引入的控制手段如何应用于近期的快速神经风格迁移方法。
下载地址 | 返回目录 | [10.1109/CVPR.2017.397]
[24] DRAW: A Recurrent Neural Network For Image Generation (2014)
Abstract: This paper introduces the Deep Recurrent Attentive Writer (DRAW) neural network architecture for image generation. DRAW networks combine a novel spatial attention mechanism that mimics the foveation of the human eye, with a sequential variational auto-encoding framework that allows for the iterative construction of complex images. The system substantially improves on the state of the art for generative models on MNIST, and, when trained on the Street View House Numbers dataset, it generates images that cannot be distinguished from real data with the naked eye.
摘要: 本文介绍了用于图像生成的 Deep Recurrent Attentive Writer(DRAW)神经网络架构。DRAW 网络把一种模仿人眼注视(foveation)过程的新颖空间注意力机制,与一个允许迭代构建复杂图像的序列化变分自编码框架结合在一起。该系统大幅改进了 MNIST 上生成模型的最新水平;在 Street View House Numbers 数据集上训练后,其生成的图像肉眼无法与真实数据区分。
[25] Decoupled neural interfaces using synthetic gradients (2017)
Abstract: Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modelled subgraph will produce using only local information. In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces. We show results for feed-forward models, where every layer is trained asynchronously, recurrent neural networks (RNNs) where predicting one's future gradient extends the time over which the RNN can effectively model, and also a hierarchical RNN system with ticking at different timescales. Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass - amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
摘要: 训练有向神经网络通常需要先通过计算图前向传播数据,再反向传播误差信号,从而产生权重更新。因此,网络的所有层(或更一般地说,所有模块)都是被“锁住”的:它们必须等待网络的其余部分完成前向计算并把误差反向传播回来之后才能更新。在这项工作中,我们通过为网络图的未来计算引入一个模型来解耦各模块,从而打破这一约束。这些模型仅使用局部信息来预测被建模子图的计算结果。我们特别关注对误差梯度的建模:用建模得到的合成梯度(synthetic gradient)代替真实的反向传播误差梯度,我们把子图解耦,并可以独立、异步地更新它们,即实现了解耦神经接口。我们展示了前馈模型(其中每一层都异步训练)的结果,以及循环神经网络(RNN)的结果:预测自己未来的梯度可以延长 RNN 能够有效建模的时间跨度;还有一个在不同时间尺度上运作的分层 RNN 系统。最后,我们证明,除了预测梯度之外,同一框架还可以用于预测输入,从而得到在前向和反向传播中都解耦的模型,相当于多个独立的网络共同学习,从而可以组合成一个协同运作的整体。
[26] Deep Learning of Representations for Unsupervised and Transfer Learning (2011)
Abstract: Deep learning algorithms seek to exploit the unknown structure in the input distribution in order to discover good representations, often at multiple levels, with higher-level learned features defined in terms of lower-level features. The objective is to make these higher- level representations more abstract, with their individual features more invariant to most of the variations that are typically present in the training distribution, while collectively preserving as much as possible of the information in the input. Ideally, we would like these representations to disentangle the unknown factors of variation that underlie the training distribution. Such unsupervised learning of representations can be exploited usefully under the hypothesis that the input distribution P(x) is structurally related to some task of interest, say predicting P(y|x). This paper focusses on why unsupervised pre-training of representations can be useful, and how it can be exploited in the transfer learning scenario, where we care about predictions on examples that are not from the same distribution as the training distribution
摘要: 深度学习算法试图利用输入分布中的未知结构来发现良好的表示,这通常在多个层次上进行,其中较高层次的学习特征是用较低层次的特征来定义的。其目标是使这些较高层次的表示更加抽象,使其中的各个特征对训练分布中常见的大多数变化更加不变,同时整体上尽可能多地保留输入中的信息。理想情况下,我们希望这些表示能够解开训练分布背后的未知变化因素。在“输入分布 P(x) 与某个感兴趣的任务(比如预测 P(y|x))在结构上相关”这一假设下,这种无监督的表示学习可以被有效利用。本文着重讨论无监督的表示预训练为什么有用,以及如何在迁移学习场景中加以利用:在这种场景中,我们关心的是对与训练分布不同分布的样本做出预测。
[27] Deep Neural Networks for Acoustic Modeling in the Presence of Noise (2018)
Abstract: Systems using deep neural network (DNN) have shown promising results in automatic speech recognition (ASR), where one of the biggest challenges is the recognition in noisy speech signals. We have combined two famous architectures of deep learning, the convolutional neural networks (CNN) for acoustic approach and a recurrent architecture with connectionist temporal classification (CTC) for sequential modeling, in order to decode the frames in a sequence forming a word. Experimental results show that the proposed architecture achieves improved performance over classical models, such as hidden model Markov (HMM) for labeling in variable time sequences in BioChaves database.
摘要: 使用深度神经网络(DNN)的系统已在自动语音识别(ASR)中显示出令人鼓舞的结果,其中最大的挑战之一是对含噪语音信号的识别。我们结合了两种著名的深度学习架构,即用于声学建模的卷积神经网络(CNN)和用于序列建模的带连接时序分类(CTC)的循环架构,以便对构成单词的帧序列进行解码。实验结果表明,在 BioChaves 数据库的可变长度时间序列标注任务上,所提出的架构相比隐马尔可夫模型(HMM)等经典模型取得了更好的性能。
下载地址 | 返回目录 | [10.1109/TLA.2018.8358674]
[28] Deep captioning with multimodal recurrent neural networks (m-RNN) (2015)
Abstract: In this paper, we present a multimodal Recurrent Neural Network (m-RNN) model for generating novel image captions. It directly models the probability distribution of generating a word given previous words and an image. Image captions are generated according to this distribution. The model consists of two sub-networks: a deep recurrent neural network for sentences and a deep convolutional network for images. These two sub-networks interact with each other in a multimodal layer to form the whole m-RNN model. The effectiveness of our model is validated on four benchmark datasets (IAPR TC-12, Flickr 8K, Flickr 30K and MS COCO). Our model outperforms the state-of-the-art methods. In addition, we apply the m-RNN model to retrieval tasks for retrieving images or sentences, and achieves significant performance improvement over the state-of-the-art methods which directly optimize the ranking objective function for retrieval. The project page of this work is: www.stat.ucla.edu/~junhua.mao/m-RNN.html.
摘要: 在本文中,我们提出了一种用于生成新颖图像描述的多模态循环神经网络(m-RNN)模型。它直接对在给定前面单词和图像的条件下生成一个单词的概率分布进行建模,图像描述即根据该分布生成。该模型由两个子网络组成:用于句子的深度循环神经网络和用于图像的深度卷积网络。这两个子网络在一个多模态层中相互作用,构成整个 m-RNN 模型。我们在四个基准数据集(IAPR TC-12、Flickr 8K、Flickr 30K 和 MS COCO)上验证了模型的有效性,其表现优于最新方法。此外,我们将 m-RNN 模型应用于检索图像或句子的检索任务,相比直接优化检索排序目标函数的最新方法,取得了显著的性能提升。这项工作的项目页面为:www.stat.ucla.edu/~junhua.mao/m-RNN.html。
[29] Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding (2016)
Abstract: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems with limited hardware resources. To address this limitation, we introduce “deep compression”, a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35× to 49× without affecting their accuracy. Our method first prunes the network by learning only the important connections. Next, we quantize the weights to enforce weight sharing, finally, we apply Huffman coding. After the first two steps we retrain the network to fine tune the remaining connections and the quantized centroids. Pruning, reduces the number of connections by 9× to 13×; Quantization then reduces the number of bits that represent each connection from 32 to 5. On the ImageNet dataset, our method reduced the storage required by AlexNet by 35×, from 240MB to 6.9MB, without loss of accuracy. Our method reduced the size of VGG-16 by 49× from 552MB to 11.3MB, again with no loss of accuracy. This allows fitting the model into on-chip SRAM cache rather than off-chip DRAM memory. Our compression method also facilitates the use of complex neural networks in mobile applications where application size and download bandwidth are constrained. Benchmarked on CPU, GPU and mobile GPU, compressed network has 3× to 4× layerwise speedup and 3× to 7× better energy efficiency.
摘要: 神经网络既是计算密集型又是内存密集型的,因此很难部署在硬件资源有限的嵌入式系统上。为了解决这一限制,我们引入了“深度压缩”:一个由修剪、训练量化和霍夫曼编码组成的三阶段流水线,三者协同工作,可在不影响精度的情况下将神经网络的存储需求减少 35 倍到 49 倍。我们的方法首先通过只学习重要连接来修剪网络;接着量化权重以实施权重共享;最后应用霍夫曼编码。在前两步之后,我们重新训练网络以微调剩余的连接和量化的质心。修剪将连接数减少 9 倍到 13 倍;量化随后把表示每个连接的比特数从 32 降到 5。在 ImageNet 数据集上,我们的方法将 AlexNet 所需的存储空间减少 35 倍(从 240MB 到 6.9MB)而不损失精度,将 VGG-16 的大小减少 49 倍(从 552MB 到 11.3MB),同样没有精度损失。这使得模型可以放入片上 SRAM 缓存,而不是片外 DRAM 存储器。我们的压缩方法还有助于在应用大小和下载带宽受限的移动应用中使用复杂的神经网络。在 CPU、GPU 和移动 GPU 上进行基准测试,压缩后的网络具有 3 到 4 倍的逐层加速和 3 到 7 倍的能效提升。
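下面用 NumPy 给出摘要中“修剪 + 权重共享量化”两个阶段的最小示意:先按绝对值阈值把小权重置零,再把剩余权重就近映射到少量共享中心值。阈值比例、中心个数与示例权重均为假设;这里用等分位点近似聚类中心,并非论文中的 k-means 方案。

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 8))

# 阶段一:按幅值修剪,把绝对值最小的 70% 权重置零(比例为假设)
threshold = np.quantile(np.abs(W), 0.7)
mask = np.abs(W) > threshold
W_pruned = W * mask

# 阶段二:权重共享量化,把非零权重映射到 4 个共享中心值
nonzero = W_pruned[mask]
centroids = np.quantile(nonzero, [0.125, 0.375, 0.625, 0.875])
idx = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
W_quant = W_pruned.copy()
W_quant[mask] = centroids[idx]           # 每个连接只需存 2 bit 的中心索引

print("非零比例:", mask.mean(), "使用的不同权重值个数:", len(np.unique(W_quant)) - 1)
```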
[30] Deep fragment embeddings for bidirectional image sentence mapping (2014)
Abstract: We introduce a model for bidirectional retrieval of images and sentences through a deep, multi-modal embedding of visual and natural language data. Unlike previous models that directly map images or sentences into a common embedding space, our model works on a finer level and embeds fragments of images (objects) and fragments of sentences (typed dependency tree relations) into a common space. We then introduce a structured max-margin objective that allows our model to explicitly associate these fragments across modalities. Extensive experimental evaluation shows that reasoning on both the global level of images and sentences and the finer level of their respective fragments improves performance on image-sentence retrieval tasks. Additionally, our model provides interpretable predictions for the image-sentence retrieval task since the inferred inter-modal alignment of fragments is explicit.
摘要: 我们提出了一种通过对视觉与自然语言数据进行深度多模态嵌入来双向检索图像和句子的模型。不同于以往把整幅图像或整句话直接映射到公共嵌入空间的模型,我们的模型在更细的粒度上工作,将图像片段(物体)和句子片段(类型化依存树关系)嵌入到一个公共空间中。然后,我们引入一个结构化的最大间隔(max-margin)目标,使模型能够跨模态显式地关联这些片段。大量实验评估表明,同时在图像与句子的全局层面及其各自片段的更细层面上进行推理,可以提高图像-句子检索任务的性能。此外,由于推断出的片段间跨模态对齐是显式的,我们的模型还能为图像-句子检索任务提供可解释的预测。
[31] Deep learning (2015)
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.
摘要: 深度学习允许由多个处理层组成的计算模型学习具有多个抽象层次的数据表示。这些方法极大地提升了语音识别、视觉物体识别、物体检测以及药物发现和基因组学等许多其他领域的最新水平。深度学习使用反向传播算法来指示机器应如何改变其内部参数(这些参数用于从上一层的表示计算出每一层的表示),从而发现大型数据集中的复杂结构。深度卷积网络在处理图像、视频、语音和音频方面带来了突破,而循环网络则在文本和语音等序列数据上大放异彩。
下载地址 | 返回目录 | [10.1038/nature14539]
[32] Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates (2017)
Abstract: Reinforcement learning holds the promise of enabling autonomous robots to learn large repertoires of behavioral skills with minimal human intervention. However, robotic applications of reinforcement learning often compromise the autonomy of the learning process in favor of achieving training times that are practical for real physical systems. This typically involves introducing hand-engineered policy representations and human-supplied demonstrations. Deep reinforcement learning alleviates this limitation by training general-purpose neural network policies, but applications of direct deep reinforcement learning algorithms have so far been restricted to simulated settings and relatively simple tasks, due to their apparent high sample complexity. In this paper, we demonstrate that a recent deep reinforcement learning algorithm based on off-policy training of deep Q-functions can scale to complex 3D manipulation tasks and can learn deep neural network policies efficiently enough to train on real physical robots. We demonstrate that the training times can be further reduced by parallelizing the algorithm across multiple robots which pool their policy updates asynchronously. Our experimental evaluation shows that our method can learn a variety of 3D manipulation skills in simulation and a complex door opening skill on real robots without any prior demonstrations or manually designed representations.
摘要: 强化学习有望使自主机器人在最少的人工干预下学习大量的行为技能。但是,强化学习在机器人上的应用通常会牺牲学习过程的自主性,以换取对真实物理系统而言可行的训练时间,这通常意味着引入人工设计的策略表示和人工提供的演示。深度强化学习通过训练通用神经网络策略缓解了这一限制,但由于样本复杂度明显偏高,直接的深度强化学习算法迄今仅限于模拟环境和相对简单的任务。在本文中,我们证明一种基于深度 Q 函数离策略训练的最新深度强化学习算法可以扩展到复杂的 3D 操作任务,并且能够足够高效地学习深度神经网络策略,从而在真实物理机器人上训练。我们还证明,通过在多台机器人之间并行化该算法并让它们异步汇集各自的策略更新,可以进一步缩短训练时间。实验评估表明,我们的方法无需任何先验演示或手工设计的表示,就能在仿真中学习多种 3D 操作技能,并在真实机器人上学习复杂的开门技能。
下载地址 | 返回目录 | [10.1109/ICRA.2017.7989385]
[33] Deep residual learning for image recognition (2016)
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers - 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
摘要: 更深的神经网络更难训练。我们提出了一种残差学习框架,以简化比以往深得多的网络的训练。我们显式地把各层重新表述为参照层输入学习残差函数,而不是学习无参照的函数。我们提供了全面的经验证据,表明这些残差网络更容易优化,并且可以从大幅增加的深度中获得精度提升。在 ImageNet 数据集上,我们评估了深度多达 152 层的残差网络,其深度是 VGG 网络的 8 倍,但复杂度仍然更低。这些残差网络的集成在 ImageNet 测试集上取得了 3.57% 的错误率,该结果在 ILSVRC 2015 分类任务中获得第一名。我们还给出了在 CIFAR-10 上 100 层和 1000 层网络的分析。表示的深度对许多视觉识别任务至关重要。仅凭极深的表示,我们就在 COCO 物体检测数据集上获得了 28% 的相对提升。深度残差网络是我们参加 ILSVRC & COCO 2015 竞赛所提交方案的基础,我们还在 ImageNet 检测、ImageNet 定位、COCO 检测和 COCO 分割任务上获得了第一名。
下载地址 | 返回目录 | [10.1109/CVPR.2016.90]
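残差学习的核心是让堆叠层去拟合相对于输入的残差函数 F(x),输出为 y = F(x) + x。下面附一段基于 NumPy 的极简示意代码(全连接形式,函数名、维度与随机权重均为本文档自拟,并非论文官方实现),仅用于说明恒等捷径连接的计算方式。
```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    """残差块示意: 输出 = relu(x + F(x)), 其中 F 为两层带 ReLU 的变换。"""
    f = W2 @ relu(W1 @ x)   # 残差分支 F(x)
    return relu(x + f)      # 恒等捷径连接后再激活

rng = np.random.default_rng(0)
d = 8
x = rng.normal(size=d)
W1 = rng.normal(scale=0.1, size=(d, d))
W2 = rng.normal(scale=0.1, size=(d, d))
print(residual_block(x, W1, W2))
```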
[34] Deep speech 2: End-to-end speech recognition in English and Mandarin (2016)
Abstract: We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech-two vastly different languages. Because it replaces entire pipelines of hand-engineered components with neural networks, end-to-end learning allows us to handle a diverse variety of speech including noisy environments, accents and different languages. Key to our approach is our application of HPC techniques, enabling experiments that previously took weeks to now run in days. This allows us to iterate more quickly to identify superior architectures and algorithms. As a result, in several cases, our system is competitive with the transcription of human workers when benchmarked on standard datasets. Finally, using a technique called Batch Dispatch with GPUs in the data center, we show that our system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
摘要: 我们证明了端到端深度学习方法可用于识别英语或汉语普通话这两种截然不同的语言。由于它用神经网络取代了由人工设计组件构成的整个流水线,端到端学习使我们能够处理各种语音,包括嘈杂环境、口音和不同语言。我们方法的关键在于对HPC技术的应用,使以前需要数周的实验如今可以在数天内完成。这让我们能够更快地迭代,以找出更优的体系结构和算法。因此,在若干情况下,当以标准数据集为基准进行评测时,我们的系统可与人工转录相媲美。最后,借助数据中心中一种称为Batch Dispatch的GPU批处理调度技术,我们证明了该系统可以低成本地在线部署,在大规模服务用户时提供低延迟。
[35] DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs (2018)
Abstract: “In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or atrous convolution, as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but has a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed DeepLab system sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7 percent mIOU in the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online.”
摘要: 在这项工作中,我们用深度学习解决语义图像分割任务,并做出三项经实验证明具有实质性实用价值的主要贡献。首先,我们强调带上采样滤波器的卷积,即空洞卷积(atrous convolution),是密集预测任务中的强大工具。空洞卷积使我们能够显式控制深度卷积神经网络中特征响应的计算分辨率,还能在不增加参数数量和计算量的情况下有效扩大滤波器的感受野以纳入更大的上下文。其次,我们提出空洞空间金字塔池化(ASPP),以在多个尺度上稳健地分割物体。ASPP用具有多种采样率和有效视场的滤波器探测输入的卷积特征层,从而在多个尺度上同时捕获物体和图像上下文。第三,我们通过结合DCNN与概率图模型的方法改进物体边界的定位。DCNN中常用的最大池化与下采样组合实现了不变性,但损害了定位精度。我们通过将最终DCNN层的响应与全连接条件随机场(CRF)相结合来克服这一问题,定性和定量结果均表明其改善了定位性能。我们提出的DeepLab系统在PASCAL VOC-2012语义图像分割任务上创造了新的最高水平,在测试集上达到79.7%的mIOU,并在PASCAL-Context、PASCAL-Person-Part和Cityscapes这三个数据集上推进了结果。我们的全部代码均已在线公开。
下载地址 | 返回目录 | [10.1109/TPAMI.2017.2699184]
[36] Distilling the Knowledge in a Neural Network (2015)
Abstract: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
摘要: 提高几乎任何机器学习算法性能的一种非常简单的方法,是在同一数据上训练许多不同的模型,然后对它们的预测取平均。不幸的是,使用整个模型集成进行预测很笨重,并且计算代价可能过高,难以部署给大量用户,尤其当单个模型是大型神经网络时。Caruana及其合作者已经表明,可以将集成中的知识压缩到一个更易部署的单一模型中,我们使用一种不同的压缩技术进一步发展了这一方法。我们在MNIST上取得了一些令人惊讶的结果,并且表明,通过将模型集成中的知识蒸馏到单个模型中,可以显著改善一个被大量使用的商业系统的声学模型。我们还提出了一种新型集成,由一个或多个完整模型以及许多专门模型组成,这些专门模型学习区分完整模型容易混淆的细粒度类别。与混合专家模型(mixture of experts)不同,这些专门模型可以快速并行地训练。
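知识蒸馏的常见做法是用温度 T 软化教师模型的 softmax 输出作为"软目标",再让学生模型去拟合它。以下 NumPy 片段(温度取值与 logits 均为自拟示例,并非论文完整训练流程)给出软目标与蒸馏交叉熵的一个最小计算示意。
```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                   # 数值稳定
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([6.0, 2.0, 1.0])
student_logits = np.array([4.0, 3.0, 0.5])
T = 4.0                               # 温度越高, 分布越"软"

soft_targets = softmax(teacher_logits, T)
student_soft = softmax(student_logits, T)

# 蒸馏损失: 软目标与学生软化输出之间的交叉熵
distill_loss = -np.sum(soft_targets * np.log(student_soft))
print(soft_targets, distill_loss)
```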
[37] Dueling Network Architectures for Deep Reinforcement Learning (2016)
Abstract: In recent years there have been many successes of using deep representations in reinforcement learning. Still, many of these applications use conventional architectures, such as convolutional networks, LSTMs, or auto-encoders. In this paper, we present a new neural network architecture for model-free reinforcement learning. Our dueling network represents two separate estimators: one for the state value function and one for the state-dependent action advantage function. The main benefit of this factoring is to generalize learning across actions without imposing any change to the underlying reinforcement learning algorithm. Our results show that this architecture leads to better policy evaluation in the presence of many similar-valued actions. Moreover, the dueling architecture enables our RL agent to outperform the state-of-the-art on the Atari 2600 domain.
摘要: 近年来,在强化学习中使用深度表示已取得许多成功。然而,其中许多应用仍使用常规体系结构,例如卷积网络、LSTM或自编码器。在本文中,我们提出了一种用于无模型强化学习的新神经网络架构。我们的决斗(dueling)网络包含两个独立的估计器:一个用于状态价值函数,另一个用于依赖于状态的动作优势函数。这种分解的主要好处是,无需对底层强化学习算法做任何改动,即可在动作之间泛化学习。我们的结果表明,在存在许多价值相近的动作时,该架构能带来更好的策略评估。此外,决斗架构使我们的RL智能体在Atari 2600领域超越了当前最先进的水平。
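决斗架构把 Q 值分解为状态价值 V(s) 与动作优势 A(s,a),常用的聚合方式是减去优势的均值以消除分解的不唯一性。下面的 NumPy 片段(函数名与数值均为自拟示意)演示这一聚合步骤。
```python
import numpy as np

def dueling_q(value, advantages):
    """Q(s,a) = V(s) + (A(s,a) - mean_a A(s,a)), 减均值用于保证 V/A 分解可辨识。"""
    advantages = np.asarray(advantages, dtype=float)
    return value + (advantages - advantages.mean())

V = 1.5                                # 状态价值流的输出
A = np.array([0.2, -0.1, 0.4, 0.0])    # 每个动作的优势流输出
print(dueling_q(V, A))                 # 四个动作的 Q 值估计
```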
[38] Effective approaches to attention-based neural machine translation (2015)
Abstract: “An attentional mechanism has lately been used to improve neural machine translation (NMT) by selectively focusing on parts of the source sentence during translation. However, there has been little work exploring useful architectures for attention-based NMT. This paper examines two simple and effective classes of attentional mechanism: a global approach which always attends to all source words and a local one that only looks at a subset of source words at a time. We demonstrate the effectiveness of both approaches on the WMT translation tasks between English and German in both directions. With local attention, we achieve a significant gain of 5.0 BLEU points over non-attentional systems that already incorporate known techniques such as dropout. Our ensemble model using different attention architectures yields a new state-of-the-art result in the WMT15 English to German translation task with 25.9 BLEU points, an improvement of 1.0 BLEU points over the existing best system backed by NMT and an n-gram reranker.”
摘要: 近来,注意力机制已被用于在翻译过程中选择性地关注源句子的某些部分,从而改进神经机器翻译(NMT)。然而,探索适用于基于注意力的NMT的有效架构的工作还很少。本文研究了两类简单而有效的注意力机制:一种是始终关注所有源词的全局方法,另一种是每次只关注源词一个子集的局部方法。我们在WMT英德双向翻译任务上证明了这两种方法的有效性。使用局部注意力,相比已经采用dropout等已知技术的非注意力系统,我们取得了5.0个BLEU点的显著提升。我们使用不同注意力架构的集成模型在WMT15英译德任务上以25.9 BLEU创造了新的最高水平,比由NMT和n-gram重排序器支撑的现有最佳系统提高了1.0个BLEU点。
下载地址 | 返回目录 | [10.18653/v1/d15-1166]
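全局注意力在每个解码时刻对所有源端隐状态计算对齐权重,再加权求和得到上下文向量;局部注意力则只在一个位置窗口内做同样的事。下面用 NumPy 给出全局注意力(点积打分)的最小示意,张量形状与数值均为自拟,并非论文官方实现。
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def global_attention(h_t, enc_states):
    """h_t: 当前解码隐状态 (d,); enc_states: 源端隐状态 (T, d)。"""
    scores = enc_states @ h_t          # 点积打分 score(h_t, h_s)
    align = softmax(scores)            # 对所有源位置的对齐权重
    context = align @ enc_states       # 加权求和得到上下文向量
    return context, align

rng = np.random.default_rng(1)
ctx, a = global_attention(rng.normal(size=4), rng.normal(size=(6, 4)))
print(a.sum(), ctx.shape)              # 权重和为1, 上下文维度与隐状态一致
```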
[39] End-to-end memory networks (2015)
Abstract: We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network 23 but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNsearch 2 to the case where multiple computational steps (hops) are performed per output symbol. The flexibility of the model allows us to apply it to tasks as diverse as (synthetic) question answering 22 and to language modeling. For the former our approach is competitive with Memory Networks, but with less supervision. For the latter, on the Penn TreeBank and Text8 datasets our approach demonstrates comparable performance to RNNs and LSTMs. In both cases we show that the key concept of multiple computational hops yields improved results.
摘要: 我们在可能较大的外部存储器上引入具有递归注意模型的神经网络。该体系结构是内存网络的一种形式23,但与该模型不同,它是端对端训练的,因此在训练过程中所需的监督少得多,从而使其更普遍地应用于现实环境。它也可以看作是RNNsearch 2的扩展,扩展到对每个输出符号执行多个计算步骤(跳数)的情况。该模型的灵活性使我们能够将其应用于诸如(综合)问答22之类的各种任务以及语言建模。对于前者,我们的方法在内存网络方面具有竞争力,但监督较少。对于后者,在Penn TreeBank和Text8数据集上,我们的方法展示了与RNN和LSTM相当的性能。在这两种情况下,我们都表明,多次计算跃点的关键概念可以带来更好的结果。
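端到端记忆网络的单次"跳"(hop)可以概括为:用查询向量与各记忆向量做内积并经 softmax 得到注意力,再对输出记忆加权求和,并与查询相加得到新的内部状态。以下 NumPy 片段(维度与数值均为自拟示意,并非论文实现)演示单跳的计算,多跳即重复该过程。
```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_hop(u, M_in, M_out):
    """u: 查询/内部状态 (d,); M_in/M_out: 输入与输出记忆 (N, d)。"""
    p = softmax(M_in @ u)      # 对每条记忆的注意力权重
    o = p @ M_out              # 输出记忆的加权和
    return u + o               # 新的内部状态, 可送入下一跳

rng = np.random.default_rng(2)
u = rng.normal(size=5)
M_in, M_out = rng.normal(size=(10, 5)), rng.normal(size=(10, 5))
print(memory_hop(u, M_in, M_out))
```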
[40] End-to-end training of deep visuomotor policies (2016)
Abstract: “Policy search methods can allow robots to learn control policies for a wide range of tasks, but practical applications of policy search often require hand-engineered components for perception, state estimation, and low-level control. In this paper, we aim to answer the following question: does training the perception and control systems jointly end-to-end provide better performance than training each component separately? To this end, we develop a method that can be used to learn policies that map raw image observations directly to torques at the robots motors. The policies are represented by deep convolutional neural networks (CNNs) with 92,000 parameters, and are trained using a guided policy search method, which transforms policy search into supervised learning, with supervision provided by a simple trajectory-centric reinforcement learning method. We evaluate our method on a range of real-world manipulation tasks that require close coordination between vision and control, such as screwing a cap onto a bottle, and present simulated comparisons to a range of prior policy search methods.”
摘要: 策略搜索方法可以让机器人学习各种任务的控制策略,但策略搜索的实际应用通常需要人工设计的组件来完成感知、状态估计和底层控制。在本文中,我们旨在回答以下问题:端到端地联合训练感知与控制系统,是否比分别训练各个组件获得更好的性能?为此,我们开发了一种方法,可用于学习将原始图像观测直接映射为机器人电机力矩的策略。这些策略由具有92,000个参数的深度卷积神经网络(CNN)表示,并使用引导策略搜索方法进行训练,该方法将策略搜索转化为监督学习,其监督信号由一种简单的以轨迹为中心的强化学习方法提供。我们在一系列需要视觉与控制紧密协调的真实操作任务(例如把瓶盖拧到瓶子上)上评估了我们的方法,并给出了与一系列已有策略搜索方法的仿真对比。
[41] Every picture tells a story: Generating sentences from images (2010)
Abstract: Humans can prepare concise descriptions of pictures, focusing on what they find important. We demonstrate that automatic methods can do so too. We describe a system that can compute a score linking an image to a sentence. This score can be used to attach a descriptive sentence to a given image, or to obtain images that illustrate a given sentence. The score is obtained by comparing an estimate of meaning obtained from the image to one obtained from the sentence. Each estimate of meaning comes from a discriminative procedure that is learned using data. We evaluate on a novel dataset consisting of human-annotated images. While our underlying estimate of meaning is impoverished, it is sufficient to produce very good quantitative results, evaluated with a novel score that can account for synecdoche. {\textcopyright} 2010 Springer-Verlag.
摘要: 人类可以为图片准备简明的描述,着重于他们认为重要的内容。我们证明了自动方法也能做到这一点。我们描述了一个可以计算图像与句子之间关联分数的系统。该分数可用于为给定图像附加描述性句子,或检索能够说明给定句子的图像。分数通过比较从图像获得的意义估计与从句子获得的意义估计而得到。每个意义估计都来自利用数据学习到的判别式过程。我们在一个由人工标注图像组成的新数据集上进行了评估。尽管我们底层的意义估计较为简陋,但足以产生非常好的定量结果,并用一种能够考虑提喻(synecdoche)的新颖评分进行评估。{\textcopyright} 2010 Springer-Verlag。
下载地址 | 返回目录 | [10.1007/978-3-642-15561-1_2]
[42] Evolving large-scale neural networks for vision-based reinforcement learning (2013)
Abstract: The idea of using evolutionary computation to train artificial neural networks, or neuroevolution (NE), for reinforcement learning (RL) tasks has now been around for over 20 years. However, as RL tasks become more challenging, the networks required become larger, so do their genomes. But, scaling NE to large nets (i.e. tens of thousands of weights) is infeasible using direct encodings that map genes one-to-one to network components. In this paper, we scale-up our “compressed” network encoding where network weight matrices are represented indirectly as a set of Fourier-type coefficients, to tasks that require very-large networks due to the high-dimensionality of their input space. The approach is demonstrated successfully on two reinforcement learning tasks in which the control networks receive visual input: (1) a vision-based version of the octopus control task requiring networks with over 3 thousand weights, and (2) a version of the TORCS driving game where networks with over 1 million weights are evolved to drive a car around a track using video images from the driver\s perspective. Copyright {\textcopyright} 2013 ACM.
摘要: 使用进化计算来训练人工神经网络(即神经进化,NE)以完成强化学习(RL)任务的想法已经存在20多年。然而,随着RL任务变得越来越具有挑战性,所需的网络越来越大,其基因组也随之变大。而使用将基因一对一映射到网络组件的直接编码,把NE扩展到大型网络(即数万个权重)是不可行的。在本文中,我们将网络权重矩阵间接表示为一组傅里叶型系数的"压缩"网络编码,扩展到那些因输入空间高维而需要非常大网络的任务上。该方法在两个控制网络接收视觉输入的强化学习任务上得到了成功验证:(1)基于视觉的章鱼控制任务,需要具有三千多个权重的网络;(2)TORCS赛车游戏的一个版本,其中演化出具有超过100万个权重的网络,以便利用驾驶员视角的视频图像驾驶汽车绕行赛道。版权所有 {\textcopyright} 2013 ACM。
下载地址 | 返回目录 | [10.1145/2463372.2463509]
[43] Fast R-CNN (2015)
Abstract: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.
摘要: 本文提出了一种基于区域的快速卷积网络方法(Fast R-CNN)用于目标检测。Fast R-CNN在先前工作的基础上,使用深度卷积网络对候选目标区域进行高效分类。与之前的工作相比,Fast R-CNN采用了多项创新来提高训练和测试速度,同时提升检测精度。Fast R-CNN训练非常深的VGG16网络的速度比R-CNN快9倍,测试速度快213倍,并且在PASCAL VOC 2012上取得了更高的mAP。与SPPnet相比,Fast R-CNN训练VGG16快3倍、测试快10倍,而且更准确。Fast R-CNN用Python和C++(基于Caffe)实现,并以开源MIT许可发布于 https://github.com/rbgirshick/fast-rcnn。
下载地址 | 返回目录 | [10.1109/ICCV.2015.169]
[44] Fast and accurate recurrent neural network acoustic models for speech recognition (2015)
Abstract: We have recently shown that deep Long Short-Term Memory (LSTM) recurrent neural networks (RNNs) outperform feed forward deep neural networks (DNNs) as acoustic models for speech recognition. More recently, we have shown that the performance of sequence trained context dependent (CD) hidden Markov model (HMM) acoustic models using such LSTM RNNs can be equaled by sequence trained phone models initialized with connectionist temporal classification (CTC). In this paper, we present techniques that further improve performance of LSTM RNN acoustic models for large vocabulary speech recognition. We show that frame stacking and reduced frame rate lead to more accurate models and faster decoding. CD phone modeling leads to further improvements. We also present initial results for LSTM RNN models outputting words directly.
摘要: 我们最近表明,作为语音识别的声学模型,深度长短期记忆(LSTM)循环神经网络(RNN)优于前馈深度神经网络(DNN)。更进一步,我们已经证明,使用这类LSTM RNN经序列训练的上下文相关(CD)隐马尔可夫模型(HMM)声学模型,其性能可以被用连接主义时序分类(CTC)初始化、再经序列训练的音素模型所追平。在本文中,我们提出了进一步提升用于大词汇量语音识别的LSTM RNN声学模型性能的技术。我们表明,帧堆叠和降低帧率可以带来更准确的模型和更快的解码。CD音素建模带来了进一步的改进。我们还给出了直接输出单词的LSTM RNN模型的初步结果。
[45] Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2017)
Abstract: “State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet 1 and Fast R-CNN 2 have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features - using the recently popular terminology of neural networks with attention mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model 3, our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.”
摘要: 最先进的目标检测网络依赖区域提议算法来假设目标位置。SPPnet[1]和Fast R-CNN[2]等进展缩短了这些检测网络的运行时间,使区域提议的计算成为瓶颈。在这项工作中,我们引入了区域提议网络(RPN),它与检测网络共享全图卷积特征,从而使区域提议几乎不增加额外开销。RPN是一个全卷积网络,可在每个位置同时预测目标边界框和目标性(objectness)得分。RPN经过端到端训练以生成高质量的区域提议,供Fast R-CNN用于检测。我们进一步通过共享卷积特征将RPN和Fast R-CNN合并为单个网络——借用最近流行的带"注意力"机制的神经网络术语,RPN组件告诉统一网络应该看哪里。对于非常深的VGG-16模型[3],我们的检测系统在GPU上的帧率为5 fps(包含所有步骤),同时在PASCAL VOC 2007、2012和MS COCO数据集上以每幅图像仅300个提议取得了最先进的目标检测精度。在ILSVRC和COCO 2015竞赛中,Faster R-CNN和RPN是多个赛道第一名作品的基础。代码已公开发布。
下载地址 | 返回目录 | [10.1109/TPAMI.2016.2577031]
[46] Fcnt (2015)
Abstract: We propose a new approach for general object tracking with fully convolutional neural network. Instead of treating convolutional neural network (CNN) as a black-box feature extractor, we conduct in-depth study on the properties of CNN features offline pre-trained on massive image data and classification task on ImageNet. The discoveries motivate the design of our tracking system. It is found that convolutional layers in different levels characterize the target from different perspectives. A top layer encodes more semantic features and serves as a category detector, while a lower layer carries more discriminative information and can better separate the target from distracters with similar appearance. Both layers are jointly used with a switch mechanism during tracking. It is also found that for a tracking target, only a subset of neurons are relevant. A feature map selection method is developed to remove noisy and irrelevant feature maps, which can reduce computation redundancy and improve tracking accuracy. Extensive evaluation on the widely used tracking benchmark 36 shows that the proposed tacker outperforms the state-of-the-art significantly.
摘要: 我们提出了一种使用全卷积神经网络进行通用目标跟踪的新方法。我们没有将卷积神经网络(CNN)当作黑盒特征提取器,而是深入研究了在海量图像数据和ImageNet分类任务上离线预训练的CNN特征的性质。这些发现启发了我们跟踪系统的设计。我们发现,不同层次的卷积层从不同角度刻画了目标:顶层编码更多语义特征,可充当类别检测器;而较低层携带更多判别信息,能更好地把目标与外观相似的干扰物区分开。跟踪过程中,这两层通过一种切换机制联合使用。我们还发现,对于一个跟踪目标,只有一部分神经元是相关的,因此提出了一种特征图选择方法来去除噪声和无关的特征图,从而减少计算冗余并提高跟踪精度。在广泛使用的跟踪基准[36]上的大量评估表明,所提出的跟踪器显著优于最新技术。
下载地址 | 返回目录 | [10.1109/ICCV.2015.357]
[47] From captions to visual concepts and back (2015)
Abstract: This paper presents a novel approach for automatically generating image descriptions: visual detectors, language models, and multimodal similarity models learnt directly from a dataset of image captions. We use multiple instance learning to train visual detectors for words that commonly occur in captions, including many different parts of speech such as nouns, verbs, and adjectives. The word detector outputs serve as conditional inputs to a maximum-entropy language model. The language model learns from a set of over 400,000 image descriptions to capture the statistics of word usage. We capture global semantics by re-ranking caption candidates using sentence-level features and a deep multimodal similarity model. Our system is state-of-the-art on the official Microsoft COCO benchmark, producing a BLEU-4 score of 29.1{\%}. When human judges compare the system captions to ones written by other people on our held-out test set, the system captions have equal or better quality 34{\%} of the time.
摘要: 本文提出了一种自动生成图像描述的新方法:视觉检测器、语言模型和多模态相似度模型均直接从图像字幕数据集中学习。我们使用多示例学习为字幕中常见的单词训练视觉检测器,这些单词涵盖名词、动词和形容词等多种词性。单词检测器的输出作为最大熵语言模型的条件输入。语言模型从超过40万条图像描述中学习,以捕获词语使用的统计规律。我们利用句子级特征和一个深度多模态相似度模型对候选字幕重新排序,从而捕获全局语义。我们的系统在Microsoft COCO官方基准上达到了最先进水平,BLEU-4得分为29.1{\%}。当人类评判者在我们保留的测试集上将系统生成的字幕与他人撰写的字幕进行比较时,系统字幕在34{\%}的情况下质量相当或更好。
下载地址 | 返回目录 | [10.1109/CVPR.2015.7298754]
[48] Fully Character-Level Neural Machine Translation without Explicit Segmentation (2017)
Abstract: “Most existing machine translation systems operate at the level of words, relying on explicit segmentation to extract tokens. We introduce a neural machine translation (NMT) model that maps a source character sequence to a target character sequence without any segmentation. We employ a character-level convolutional network with max-pooling at the encoder to reduce the length of source representation, allowing the model to be trained at a speed comparable to subword-level models while capturing local regularities. Our character-to-character model outperforms a recently proposed baseline with a subword-level encoder on WMT15 DE-EN and CS-EN, and gives comparable performance on FI-EN and RU-EN. We then demonstrate that it is possible to share a single character-level encoder across multiple languages by training a model on a many-to-one translation task. In this multilingual setting, the character-level encoder significantly outperforms the subword-level encoder on all the language pairs. We observe that on CS-EN, FI-EN and RU-EN, the quality of the multilingual character-level translation even surpasses the models specifically trained on that language pair alone, both in terms of the BLEU score and human judgment.”
摘要: 大多数现有机器翻译系统在单词级别上运行,依靠显式分词来提取词元。我们提出一种神经机器翻译(NMT)模型,它无需任何分词即可将源字符序列映射到目标字符序列。我们在编码器端采用带最大池化的字符级卷积网络来缩短源表示的长度,使模型能以与子词级模型相当的速度训练,同时捕获局部规律。我们的字符到字符模型在WMT15 DE-EN和CS-EN上优于最近提出的带子词级编码器的基线,并在FI-EN和RU-EN上取得了相当的性能。随后我们证明,通过在多对一翻译任务上训练模型,可以让多种语言共享同一个字符级编码器。在这种多语言设置中,字符级编码器在所有语言对上都显著优于子词级编码器。我们观察到,在CS-EN、FI-EN和RU-EN上,无论是BLEU得分还是人工评判,多语言字符级翻译的质量甚至超过了仅针对相应语言对单独训练的模型。
下载地址 | 返回目录 | [10.1162/tacl_a_00067]
[49] Fully Convolutional Networks for Semantic Segmentation (2017)
Abstract: “Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, improve on the previous best result in semantic segmentation. Our key insight is to build fully convolutional networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet, the VGG net, and GoogLeNet) into fully convolutional networks and transfer their learned representations by fine-tuning to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional networks achieve improved segmentation of PASCAL VOC (30{\%} relative improvement to 67.2{\%} mean IU on 2012), NYUDv2, SIFT Flow, and PASCAL-Context, while inference takes one tenth of a second for a typical image.”
摘要: 卷积网络是能产生特征层级的强大视觉模型。我们证明,仅用卷积网络本身、经端到端、像素到像素的训练,就能超越语义分割领域此前的最佳结果。我们的关键见解是构建"全卷积"网络,它接受任意尺寸的输入,并通过高效的推理和学习产生相应尺寸的输出。我们定义并详述了全卷积网络的设计空间,解释了它们在空间密集预测任务中的应用,并阐明了与先前模型的联系。我们将当代的分类网络(AlexNet、VGG网络和GoogLeNet)改造为全卷积网络,并通过在分割任务上微调来迁移其学到的表示。随后我们定义了一种跳跃(skip)架构,将深层、粗糙层的语义信息与浅层、精细层的外观信息相结合,以产生准确而精细的分割。我们的全卷积网络在PASCAL VOC(相对于2012年的结果,平均IU相对提升30{\%}达到67.2{\%})、NYUDv2、SIFT Flow和PASCAL-Context上均取得了更好的分割结果,而对一幅典型图像的推理只需十分之一秒。
下载地址 | 返回目录 | [10.1109/TPAMI.2016.2572683]
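FCN 的跳跃结构把深层(粗分辨率)的得分图上采样后与浅层(细分辨率)的得分图逐像素相加,从而兼顾语义与细节。以下 NumPy 片段用最近邻上采样做示意(论文使用可学习的反卷积;此处的形状、类别数与数值均为自拟假设)。
```python
import numpy as np

def upsample_nearest(x, factor):
    """最近邻上采样: (C, H, W) -> (C, H*factor, W*factor)。仅为示意, 论文中为可学习的反卷积。"""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

rng = np.random.default_rng(7)
n_classes = 3
coarse = rng.normal(size=(n_classes, 4, 4))    # 深层、语义强但分辨率低的得分图
fine = rng.normal(size=(n_classes, 8, 8))      # 浅层、细节丰富的得分图
fused = upsample_nearest(coarse, 2) + fine     # 跳跃连接: 上采样后逐像素相加
pred = fused.argmax(axis=0)                    # 每个像素的类别预测
print(pred.shape)
```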
[50] Fully-convolutional siamese networks for object tracking (2016)
Abstract: “The problem of arbitrary object tracking has traditionally been tackled by learning a model of the objects appearance exclusively online, using as sole training data the video itself. Despite the success of these methods, their online-only approach inherently limits the richness of the model they can learn. Recently, several attempts have been made to exploit the expressive power of deep convolutional networks. However, when the object to track is not known beforehand, it is necessary to perform Stochastic Gradient Descent online to adapt the weights of the network, severely compromising the speed of the system. In this paper we equip a basic tracking algorithm with a novel fully-convolutional Siamese network trained end-to-end on the ILSVRC15 dataset for object detection in video. Our tracker operates at frame-rates beyond real-time and, despite its extreme simplicity, achieves state-of-the-art performance in multiple benchmarks.”
摘要: 传统上,任意目标跟踪问题是通过完全在线地学习目标外观模型来解决的,仅把视频本身作为训练数据。尽管这些方法取得了成功,但纯在线的方式从根本上限制了它们所能学到的模型的丰富程度。近来已有若干工作尝试利用深度卷积网络的表达能力。然而,当待跟踪目标事先未知时,就需要在线执行随机梯度下降来调整网络权重,这严重影响了系统速度。在本文中,我们为一个基础跟踪算法配备了一个新颖的全卷积孪生(Siamese)网络,该网络在ILSVRC15视频目标检测数据集上进行端到端训练。我们的跟踪器以超实时的帧率运行,并且尽管极其简单,仍在多个基准测试中达到了最先进的性能。
下载地址 | 返回目录 | [10.1007/978-3-319-48881-3_56]
[51] Functional magnetic resonance imaging as experienced by stroke survivors (2014)
Abstract: “Functional magnetic resonance imaging (fMRI), a noninvasive technique that measures brain activation, has been increasingly used in the past decade, particularly among older adults. Use of fMRI in research with stroke survivors in recent years has substantially contributed to researchers understanding of the pathophysiology of stroke sequelae. However, despite the increasing popularity and use of fMRI, little is known about the patient experience of fMRI under research circumstances. The current research brief reports the findings of a pilot study undertaken to understand stroke survivors experiences with fMRI under research circumstances. Nine ischemic stroke patients underwent two MRI sessions, each of which lasted 1.5 hours and included several fMRI tasks. Patients were asked about their experiences and to share any advice. All participants reported that they did not feel claustrophobic; in addition, the importance of educating participants about fMRI was a universal theme that emerged. Knowledge of participant experiences may help with enrollment strategies for fMRI studies and improve research outcomes related to the fMRI experience.”
摘要: 功能磁共振成像(fMRI)是一种测量大脑激活的非侵入性技术,在过去十年中使用日益增多,尤其是在老年人群中。近年来在中风幸存者研究中使用fMRI,极大地促进了研究人员对中风后遗症病理生理学的理解。然而,尽管fMRI日益普及,人们对研究情境下患者的fMRI体验仍知之甚少。本研究简报报告了一项试点研究的结果,旨在了解中风幸存者在研究情境下接受fMRI的体验。九名缺血性中风患者接受了两次MRI检查,每次持续1.5小时,包含若干fMRI任务。研究者询问了患者的体验并请他们提出建议。所有参与者都表示没有幽闭恐惧感;此外,向参与者普及fMRI知识的重要性是一个普遍浮现的主题。了解参与者的体验可能有助于fMRI研究的招募策略,并改善与fMRI体验相关的研究结果。
下载地址 | 返回目录 | [10.3928/19404921-20140820-01]
[52] Generating Sequences With Recurrent Neural Networks (2013)
Abstract: This paper shows how Long Short-term Memory recurrent neural networks can be used to generate complex sequences with long-range structure, simply by predicting one data point at a time. The approach is demonstrated for text (where the data are discrete) and online handwriting (where the data are real-valued). It is then extended to handwriting synthesis by allowing the network to condition its predictions on a text sequence. The resulting system is able to generate highly realistic cursive handwriting in a wide variety of styles.
摘要: 本文展示了如何利用长短期记忆(LSTM)循环神经网络,仅通过每次预测一个数据点来生成具有长程结构的复杂序列。该方法在文本(数据为离散值)和在线手写(数据为实值)上得到了演示,随后通过让网络以文本序列为条件进行预测而扩展到手写合成。最终的系统能够以多种风格生成高度逼真的连笔手写。
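"每次只预测一个数据点"的自回归采样循环大致如下:把上一步采到的符号反馈给网络,取其输出分布再采样。下面的 NumPy 片段用一个随机初始化的简化循环单元代替训练好的 LSTM,仅示意采样流程本身(模型、词表大小与维度均为自拟假设,并非论文实现)。
```python
import numpy as np

rng = np.random.default_rng(3)
V, H = 5, 16                                # 词表大小与隐层维度(自拟)
Wxh = rng.normal(scale=0.1, size=(H, V))
Whh = rng.normal(scale=0.1, size=(H, H))
Why = rng.normal(scale=0.1, size=(V, H))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def sample_sequence(length, start=0):
    h = np.zeros(H)
    token, out = start, [start]
    for _ in range(length):
        x = np.eye(V)[token]                # 上一步的输出作为这一步的输入
        h = np.tanh(Wxh @ x + Whh @ h)      # 简化的循环单元(真实模型为 LSTM)
        p = softmax(Why @ h)                # 下一个符号的分布
        token = rng.choice(V, p=p)          # 采样一个数据点
        out.append(int(token))
    return out

print(sample_sequence(10))
```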
[53] Generative visual manipulation on the natural image manifold (2016)
Abstract: “Realistic image manipulation is challenging because it requires modifying the image appearance in a user-controlled way, while preserving the realism of the result. Unless the user has considerable artistic skill, it is easy to “fall off” the manifold of natural images while editing. In this paper, we propose to learn the natural image manifold directly from data using a generative adversarial neural network. We then define a class of image editing operations, and constrain their output to lie on that learned manifold at all times. The model automatically adjusts the output keeping all edits as realistic as possible. All our manipulations are expressed in terms of constrained optimization and are applied in near-real time. We evaluate our algorithm on the task of realistic photo manipulation of shape and color. The presented method can further be used for changing one image to look like the other, as well as generating novel imagery from scratch based on users scribbles.”
摘要: 逼真的图像编辑颇具挑战性,因为它要求以用户可控的方式修改图像外观,同时保持结果的真实感。除非用户具备相当的艺术功底,否则在编辑时很容易"掉出"自然图像的流形。在本文中,我们提出使用生成对抗神经网络直接从数据中学习自然图像流形。随后我们定义了一类图像编辑操作,并约束其输出始终落在学到的流形上。模型会自动调整输出,使所有编辑尽可能逼真。我们的所有操作都表述为约束优化问题,并可接近实时地执行。我们在对形状和颜色进行逼真照片编辑的任务上评估了该算法。所提方法还可用于把一幅图像改成另一幅的样子,以及根据用户的涂鸦从零生成新图像。
下载地址 | 返回目录 | [10.1007/978-3-319-46454-1_36]
[54] Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation (2016)
Abstract: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems. Unfortunately, NMT systems are known to be computationally expensive both in training and in translation inference. Also, most NMT systems have difficulty with rare words. These issues have hindered NMT's use in practical deployments and services, where both accuracy and speed are essential. In this work, we present GNMT, Google's Neural Machine Translation system, which attempts to address many of these issues. Our model consists of a deep LSTM network with 8 encoder and 8 decoder layers using attention and residual connections. To improve parallelism and therefore decrease training time, our attention mechanism connects the bottom layer of the decoder to the top layer of the encoder. To accelerate the final translation speed, we employ low-precision arithmetic during inference computations. To improve handling of rare words, we divide words into a limited set of common sub-word units (“wordpieces”) for both input and output. This method provides a good balance between the flexibility of “character”-delimited models and the efficiency of “word”-delimited models, naturally handles translation of rare words, and ultimately improves the overall accuracy of the system. Our beam search technique employs a length-normalization procedure and uses a coverage penalty, which encourages generation of an output sentence that is most likely to cover all the words in the source sentence. On the WMT'14 English-to-French and English-to-German benchmarks, GNMT achieves competitive results to state-of-the-art. Using a human side-by-side evaluation on a set of isolated simple sentences, it reduces translation errors by an average of 60{\%} compared to Google's phrase-based production system.
摘要: 神经机器翻译(NMT)是一种端到端的自动翻译学习方法,有望克服传统基于短语的翻译系统的许多弱点。不幸的是,NMT系统在训练和翻译推理上的计算代价都很高,而且大多数NMT系统难以处理罕见词。这些问题阻碍了NMT在对准确率和速度都有要求的实际部署和服务中的应用。在这项工作中,我们介绍Google的神经机器翻译系统GNMT,它试图解决其中的许多问题。我们的模型是一个带有8层编码器和8层解码器的深度LSTM网络,使用了注意力和残差连接。为了提高并行度从而缩短训练时间,我们的注意力机制将解码器的底层连接到编码器的顶层。为了加快最终翻译速度,我们在推理计算中采用低精度运算。为了改进对罕见词的处理,我们把输入和输出的单词都切分为一组有限的常见子词单元("wordpieces")。这种方法在字符级模型的灵活性与词级模型的效率之间取得了良好平衡,自然地处理了罕见词的翻译,并最终提高了系统的整体准确率。我们的束搜索技术采用长度归一化过程并使用覆盖惩罚,以鼓励生成最有可能覆盖源句子中所有单词的输出句子。在WMT'14英译法和英译德基准上,GNMT取得了与最先进水平相当的结果。在一组孤立的简单句子上进行人工并排评估,与Google基于短语的生产系统相比,它将翻译错误平均减少了60{\%}。
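摘要中提到的"长度归一化 + 覆盖惩罚"束搜索打分,大意是把候选译文的对数概率除以一个随长度增长的归一化项,并依据注意力分布对覆盖源词的程度进行奖惩。下面给出按笔者对论文公式的理解写的简化打分函数(函数名、参数取值与注意力矩阵均为自拟示意,并非生产实现)。
```python
import numpy as np

def gnmt_score(log_prob, attn, alpha=0.6, beta=0.2):
    """束搜索候选打分: 长度归一化 + 覆盖惩罚(按论文公式的一种复述, 参数为示意)。
    log_prob: 候选译文的对数概率(标量)
    attn: 注意力矩阵, 形状 (目标长度, 源长度)
    """
    y_len = attn.shape[0]
    lp = ((5.0 + y_len) / 6.0) ** alpha                     # 长度归一化项
    coverage = attn.sum(axis=0)                             # 每个源词被关注的总量
    cp = beta * np.sum(np.log(np.minimum(coverage, 1.0)))   # 覆盖惩罚
    return log_prob / lp + cp

attn = np.array([[0.7, 0.2, 0.1],
                 [0.2, 0.6, 0.2],
                 [0.1, 0.2, 0.7]])
print(gnmt_score(-4.2, attn))
```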
[55] Handbook of approximation algorithms and metaheuristics (2007)
Abstract: Delineating the tremendous growth in this area, the Handbook of Approximation Algorithms and Metaheuristics covers fundamental, theoretical topics as well as advanced, practical applications. It is the first book to comprehensively study both approximation algorithms and metaheuristics. Starting with basic approaches, the handbook presents the methodologies to design and analyze efficient approximation algorithms for a large class of problems, and to establish inapproximability results for another class of problems. It also discusses local search, neural networks, and metaheuristics, as well as multiobjective problems, sensitivity analysis, and stability. After laying this foundation, the book applies the methodologies to classical problems in combinatorial optimization, computational geometry, and graph problems. In addition, it explores large-scale and emerging applications in networks, bioinformatics, VLSI, game theory, and data analysis. Undoubtedly sparking further developments in the field, this handbook provides the essential techniques to apply approximation algorithms and metaheuristics to a wide range of problems in computer science, operations research, computer engineering, and economics. Armed with this information, researchers can design and analyze efficient algorithms to generate near-optimal solutions for a wide range of computational intractable problems.
摘要: 描述了该领域的巨大发展,《逼近算法和元启发式技术手册》涵盖了基本的理论主题以及先进的实际应用。这是第一本全面研究逼近算法和元启发式方法的书。从基本方法开始,该手册介绍了用于设计和分析针对一大类问题的有效逼近算法,以及为另一类问题建立不可逼近结果的方法。它还讨论了局部搜索,神经网络和元启发式方法,以及多目标问题,敏感性分析和稳定性。奠定了基础之后,该书将方法论应用于组合优化,计算几何和图形问题中的经典问题。此外,它还探索了网络,生物信息学,VLSI,博弈论和数据分析中的大规模新兴应用。毫无疑问,这激发了该领域的进一步发展,该手册提供了将逼近算法和元启发式方法应用于计算机科学,运筹学,计算机工程和经济学等众多问题的基本技术。有了这些信息,研究人员可以设计和分析有效的算法,以针对各种计算上的棘手问题生成接近最优的解决方案。
下载地址 | 返回目录 | [10.1201/9781420010749]
[56] Human-level control through deep reinforcement learning (2015)
Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
摘要: 强化学习的理论提供了一种规范性的解释,它深深植根于关于动物行为的心理学和神经科学观点中,有关代理如何优化环境控制的。但是,要在逼近现实世界的情况下成功地使用强化学习,特工面临着一项艰巨的任务:他们必须从高维感官输入中获得对环境的有效表示,并利用它们将过去的经验推广到新的情况中。值得注意的是,人类和其他动物似乎通过强化学习和分层的感觉处理系统的和谐组合解决了这一问题,前者由大量的神经数据证明,显示出多巴胺能神经元发出的相位信号与时差强化学习算法之间存在明显的相似之处。尽管强化学习代理已在各种领域中取得了一些成功,但它们的适用性以前仅限于可以手工制作有用功能的领域,或者限于具有充分观察到的低维状态空间的领域。在这里,我们利用训练深度神经网络的最新进展来开发一种新型的人工代理,称为深度Q网络,它可以使用端到端强化学习直接从高维感官输入中学习成功的策略。我们在经典的Atari 2600游戏具有挑战性的领域上对该代理进行了测试。我们证明,仅接收像素和游戏得分作为输入的深度Q网络代理能够超越之前所有算法的性能,并在49款游戏中达到与专业人类游戏测试人员相当的水平,使用相同的算法,网络架构和超参数。这项工作弥合了高维感官输入和动作之间的鸿沟,从而产生了第一个能够学习在各种挑战性任务中表现出色的人工代理。”
下载地址 | 返回目录 | [10.1038/nature14236]
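深度Q网络训练时的回归目标是 y = r + γ·max_a' Q_target(s', a')(终止状态时只取 r),其中 Q_target 为参数定期同步的目标网络。以下 NumPy 片段用随机数代替网络输出(批大小、折扣因子等均为自拟假设),仅示意一批转移样本的目标值计算。
```python
import numpy as np

def td_targets(rewards, q_next, dones, gamma=0.99):
    """y_i = r_i + gamma * max_a Q_target(s'_i, a); 终止状态(done=1)不再引导。"""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)

rng = np.random.default_rng(4)
batch, n_actions = 5, 3
rewards = rng.normal(size=batch)
q_next = rng.normal(size=(batch, n_actions))   # 目标网络对 s' 的 Q 值(此处用随机数代替)
dones = np.array([0, 0, 1, 0, 0], dtype=float)
print(td_targets(rewards, q_next, dones))
```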
[57] Hybrid computing using a neural network with dynamic external memory (2016)
Abstract: Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
摘要: 人工神经网络非常擅长感知处理、序列学习和强化学习,但由于缺少外部存储器,它们在表示变量和数据结构以及在长时间尺度上存储数据方面能力有限。在这里,我们介绍一种称为可微分神经计算机(DNC)的机器学习模型,它由一个可以对外部记忆矩阵进行读写的神经网络组成,类似于传统计算机中的随机存取存储器。像传统计算机一样,它可以用记忆来表示和操作复杂的数据结构;而像神经网络一样,它又能从数据中学会这样做。在监督学习的训练下,我们证明DNC能够成功回答旨在模拟自然语言推理问题的合成问题。我们还表明,它可以学习诸如寻找指定点之间最短路径、推断随机生成图中缺失链接等任务,然后把这些任务泛化到运输网络、家谱等具体的图上。在强化学习的训练下,DNC可以完成一个由符号序列指定目标变化的积木移动益智任务。综合来看,我们的结果表明,DNC有能力解决那些没有外部读写存储器的神经网络所无法完成的复杂结构化任务。
下载地址 | 返回目录 | [10.1038/nature20101]
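DNC 从外部记忆矩阵读取信息的一种基本机制是基于内容的寻址:用读取键与每个记忆槽做余弦相似度,经 softmax 得到读取权重,再对记忆加权求和。以下 NumPy 片段(函数名、维度、键强度等均为自拟假设)仅示意这一读取步骤,省略了写入、时序链接等其余组件。
```python
import numpy as np

def content_read(memory, key, beta=5.0):
    """memory: (N, W) 记忆矩阵; key: (W,) 读取键; beta: 键强度。"""
    norm = np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8
    cos = memory @ key / norm                  # 每个记忆槽与读取键的余弦相似度
    w = np.exp(beta * cos)
    w /= w.sum()                               # 基于内容的读取权重
    return w @ memory                          # 读取向量

rng = np.random.default_rng(5)
M = rng.normal(size=(8, 4))
print(content_read(M, M[2] + 0.01 * rng.normal(size=4)))   # 应读出接近第2个槽的内容
```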
[58] Improving neural networks by preventing co-adaptation of feature detectors (2012)
Abstract: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This “overfitting” is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random “dropout” gives big improvements on many benchmark tasks and sets new records for speech and object recognition.
摘要: 当在较小的训练集上训练大型前馈神经网络时,它通常在保留的测试数据上表现较差。通过在每个训练样本上随机略去一半的特征检测器,可以大大减少这种"过拟合"。这可以防止复杂的共适应,即某个特征检测器只有在若干其他特定特征检测器存在的情况下才有用。取而代之的是,每个神经元学会检测一种在其必须工作的组合数量巨大的各种内部环境中普遍有助于给出正确答案的特征。随机"丢弃"(dropout)在许多基准任务上带来了很大改进,并为语音和物体识别创造了新纪录。
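"在每个训练样本上随机略去一半特征检测器"可以用一个伯努利掩码实现;推理时不再丢弃(常用的 inverted dropout 在训练时直接除以保留概率)。以下 NumPy 片段为自拟示意实现,丢弃率与随机种子均为假设。
```python
import numpy as np

def dropout(h, p_drop=0.5, train=True, rng=np.random.default_rng(6)):
    """训练时随机置零部分激活并按保留概率缩放(inverted dropout); 推理时原样返回。"""
    if not train:
        return h
    mask = rng.random(h.shape) >= p_drop     # 以 1-p_drop 的概率保留每个单元
    return h * mask / (1.0 - p_drop)

h = np.ones(10)
print(dropout(h))               # 约一半元素被置零, 其余被放大为 2.0
print(dropout(h, train=False))  # 推理时不变
```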
[59] Inceptionism: Going deeper into neural networks, google research blog (2015)
Abstract: “Artificial Neural Networks have spurred remarkable recent progress in image classification and speech recognition. But even though these are very useful tools based on well-known mathematical methods, we actually understand surprisingly little of why certain models work and others don't. So let's take a look at some simple techniques for peeking inside these networks. We train an artificial neural network by showing it millions of training examples and gradually adjusting the network parameters until it gives the classifications we want. The network typically consists of 10-30 stacked layers of artificial neurons. Each image is fed into the input layer, which then talks to the next layer, until eventually the “output” layer is reached. The network's “answer” comes from this final output layer. One of the challenges of neural networks is understanding what exactly goes on at each layer. We know that after training, each layer progressively extracts higher and higher-level features of the image, until the final layer essentially makes a decision on what the image shows.”
摘要: 人工神经网络近来在图像分类和语音识别方面推动了显著进展。但是,尽管它们是基于众所周知的数学方法的非常有用的工具,我们实际上对某些模型为何有效而另一些却无效知之甚少。所以让我们来看看几种窥视这些网络内部的简单技术。我们通过向人工神经网络展示数百万个训练样例并逐步调整网络参数,直到它给出我们想要的分类,来训练它。这种网络通常由10-30层堆叠的人工神经元组成。每幅图像被送入输入层,输入层再与下一层"对话",直到最终到达"输出"层;网络的"答案"来自最后这个输出层。神经网络的挑战之一是理解每一层究竟发生了什么。我们知道,训练之后,每一层会逐步提取图像中越来越高层次的特征,直到最后一层基本上就图像内容做出判断。
[60] Influência do glyphosate e 2,4-D sobre o desenvolvimento inicial de espécies florestais (2009)
Abstract: The chemical management of weeds in reforestation areas has been carried out increasingly due to its speed, economy and control. Glyphosate has been widely used in forest ecosystems, due to the absence of a residual effect in the soil. In order to increase its efficiency, it is often mixed with 2,4-D. However, both herbicides are not selective for most forest species, with particular attention to the possibility of the risk of an accidental drift in the areas in the planted forest areas. The purpose of this study was to evaluate the effects of glyphosate and 2,4-D alone and in combination on plants of Schizolobium amazonicum and Ceiba pentandra. Seven treatments were applied: control, glyphosate 180 and 360 g e.a. ha-1, 2,4-D amina 335 and 670 g e.a. ha-1, glyphosate + 2,4-D amina 90 + 167 and 180 + 335 g e.a. ha-1. Phytotoxicity, plant height and relative number of leaves were evaluated at 7, 14, 21 and 28 days after application. In the last evaluation, dry mass and length of roots were also evaluated. Both species showed tolerance to glyphosate at 180 g e.a. ha-1. C. pentandra was more sensitive to herbicides than Schizolobium amazonicum. During the evaluations, the phytotoxicity was more evident, with a reduction in height and the number of leaves. For both species, the amount of dry weight and root length were reduced by the action of herbicides, both alone and mixed. Despite the tolerance of both species to the lower dosis of glyphosate, both herbicides caused considerable damage; suggesting that the application be done with a jet directed at the target in order to avoid damage to the plant development.
摘要: 由于速度快、成本低且防治效果好,造林地区越来越多地采用化学方法治理杂草。草甘膦由于在土壤中没有残留效应,已被广泛用于森林生态系统;为提高效率,常将其与2,4-D混用。然而,这两种除草剂对大多数森林树种都不具选择性,尤其需要注意在人工林区发生意外药液漂移的风险。本研究旨在评估草甘膦和2,4-D单独及混合施用对Schizolobium amazonicum和吉贝(Ceiba pentandra)植株的影响。共设七个处理:对照,草甘膦180和360 g e.a. ha-1,2,4-D胺盐335和670 g e.a. ha-1,草甘膦+2,4-D胺盐90+167和180+335 g e.a. ha-1。在施药后第7、14、21和28天评估药害、株高和相对叶片数;在最后一次评估中还测定了干物质量和根长。两个树种在180 g e.a. ha-1剂量下均表现出对草甘膦的耐受性。Ceiba pentandra对除草剂比Schizolobium amazonicum更敏感。随着评估推进,药害愈发明显,株高和叶片数下降。对这两个树种,除草剂无论单用还是混用都会降低干重和根长。尽管两个树种都能耐受较低剂量的草甘膦,但两种除草剂均造成相当大的损害;建议施药时将喷流直接对准目标,以避免损害植株的生长发育。
[61] Instance-Aware Semantic Segmentation via Multi-task Network Cascades (2016)
Abstract: Semantic segmentation research has recently witnessed rapid progress, but many leading methods are unable to identify object instances. In this paper, we present Multitask Network Cascades for instance-aware semantic segmentation. Our model consists of three networks, respectively differentiating instances, estimating masks, and categorizing objects. These networks form a cascaded structure, and are designed to share their convolutional features. We develop an algorithm for the nontrivial end-to-end training of this causal, cascaded structure. Our solution is a clean, single-step training framework and can be generalized to cascades that have more stages. We demonstrate state-of-the-art instance-aware semantic segmentation accuracy on PASCAL VOC. Meanwhile, our method takes only 360ms testing an image using VGG-16, which is two orders of magnitude faster than previous systems for this challenging problem. As a by product, our method also achieves compelling object detection results which surpass the competitive Fast/Faster R-CNN systems. The method described in this paper is the foundation of our submissions to the MS COCO 2015 segmentation competition, where we won the 1st place.
摘要: 语义分割研究最近见证了快速的进步,但是许多领先的方法无法识别对象实例。在本文中,我们提出了用于实例感知语义分割的多任务网络级联。我们的模型由三个网络组成,分别区分实例,估计蒙版和对对象进行分类。这些网络形成了一个层叠的结构,旨在共享它们的卷积特征。我们开发了一种用于因果级联结构的非平凡的端到端训练的算法。我们的解决方案是一个干净的单步培训框架,可以推广到具有更多阶段的级联。我们在PASCAL VOC上展示了最新的实例感知语义分割精度。同时,我们的方法仅需360毫秒即可使用VGG-16测试图像,因此在解决这一难题方面比以前的系统快两个数量级。作为副产品,我们的方法还获得了令人信服的目标检测结果,超过了竞争性的快速/快速R-CNN系统。本文介绍的方法是我们向2015年MS COCO细分比赛提交的作品的基础,我们赢得了第一名。
下载地址 | 返回目录 | [10.1109/CVPR.2016.343]
[62] Instance-sensitive fully convolutional networks (2016)
Abstract: Fully convolutional networks (FCNs) have been proven very successful for semantic segmentation, but the FCN outputs are unaware of object instances. In this paper, we develop FCNs that are capable of proposing instance-level segment candidates. In contrast to the previous FCN that generates one score map, our FCN is designed to compute a small set of instance-sensitive score maps, each of which is the outcome of a pixel-wise classifier of a relative position to instances. On top of these instance-sensitive score maps, a simple assembling module is able to output instance candidate at each position. In contrast to the recent DeepMask method for segmenting instances, our method does not have any high-dimensional layer related to the mask resolution, but instead exploits image local coherence for estimating instances. We present competitive results of instance segment proposal on both PASCAL VOC and MS COCO.
摘要: 全卷积网络(FCN)已被证明在语义分割方面非常成功,但是FCN的输出并不了解对象实例。在本文中,我们开发了能够提出实例级细分候选者的FCN。与之前的生成一个得分图的FCN相比,我们的FCN旨在计算少量实例敏感得分图,每个实例图都是实例相对位置的按像素分类器的结果。在这些实例敏感得分图的顶部,一个简单的组装模块能够在每个位置输出实例候选。与最近的用于分割实例的DeepMask方法相反,我们的方法没有与遮罩分辨率相关的任何高维图层,而是利用图像局部相干性来估计实例。我们在PASCAL VOC和MS COCO上展示了实例细分提案的竞争结果。
下载地址 | 返回目录 | [10.1007/978-3-319-46466-4_32]
[63] Joint learning of words and meaning representations for open-text semantic parsing (2012)
Abstract: Open-text semantic parsers are designed to interpret any statement in natural language by inferring a corresponding meaning representation (MR-a formal representation of its sense). Unfortunately, large scale systems cannot be easily machine-learned due to a lack of directly supervised data. We propose a method that learns to assign MRs to a wide range of text (using a dictionary of more than 70, 000 words mapped to more than 40, 000 entities) thanks to a training scheme that combines learning from knowledge bases (e.g. WordNet) with learning from raw text. The model jointly learns representations of words, entities and MRs via a multi-task training process operating on these diverse sources of data. Hence, the system ends up providing methods for knowledge acquisition and wordsense disambiguation within the context of semantic parsing in a single elegant framework. Experiments on these various tasks indicate the promise of the approach.
摘要: 开放文本语义解析器旨在通过推断相应的意义表示(MR,即语义的形式化表示)来解释自然语言中的任意语句。不幸的是,由于缺乏直接监督的数据,大规模系统难以通过机器学习获得。我们提出了一种方法,得益于一种把从知识库(如WordNet)学习与从原始文本学习相结合的训练方案,它能学会为范围很广的文本分配MR(使用一个包含70,000多个单词、映射到40,000多个实体的词典)。该模型通过在这些不同数据源上运行的多任务训练过程,联合学习单词、实体和MR的表示。因此,该系统最终在一个简洁优雅的框架内,同时提供了语义解析情境下的知识获取与词义消歧方法。在这些不同任务上的实验表明了该方法的前景。
[64] Layer Normalization (2016)
Abstract: Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
摘要: 培训最先进的深度神经网络在计算上很昂贵。减少训练时间的一种方法是使神经元的活动正常化。最近引入的一种称为批归一化的技术使用训练案例的小批量上神经元的总输入分布来计算均值和方差,然后使用均值和方差对每个训练案例中对该神经元的总输入进行归一化。这大大减少了前馈神经网络的训练时间。但是,批处理规范化的效果取决于微型批处理的大小,如何将其应用于递归神经网络尚不清楚。在本文中,我们通过在单个训练案例上计算从层的所有总输入到神经元的归一化的均值和方差,将批量归一化转换为层归一化。像批次归一化一样,我们还为每个神经元提供了自己的自适应偏差和增益,这些偏差和增益将在归一化之后但在非线性之前应用。与批处理归一化不同,层归一化在训练和测试时间执行完全相同的计算。通过在每个时间步分别计算归一化统计量,将其应用于递归神经网络也很简单。层归一化在稳定循环网络中的隐藏状态动态方面非常有效。从经验上讲,我们表明与以前发布的技术相比,层归一化可以大大减少训练时间。
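层归一化对单个样本中同一层所有单元的汇总输入计算均值与方差,再做归一化并施加逐单元的增益和偏置;它不依赖 mini-batch,因此训练和测试时的计算完全相同。以下 NumPy 片段给出该公式的直接示意实现(增益/偏置的初始化为常见做法,属自拟假设)。
```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    """a: 某一层的汇总输入 (H,); 在该层所有单元上求均值与标准差后归一化。"""
    mu = a.mean()
    sigma = a.std()
    return gain * (a - mu) / (sigma + eps) + bias

H = 6
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
print(layer_norm(a, gain=np.ones(H), bias=np.zeros(H)))   # 输出均值≈0, 方差≈1
```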
[65] Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection (2018)
Abstract: We describe a learning-based approach to hand-eye coordination for robotic grasping from monocular images. To learn hand-eye coordination for grasping, we trained a large convolutional neural network to predict the probability that task-space motion of the gripper will result in successful grasps, using only monocular camera images independent of camera calibration or the current robot pose. This requires the network to observe the spatial relationship between the gripper and objects in the scene, thus learning hand-eye coordination. We then use this network to servo the gripper in real time to achieve successful grasps. We describe two large-scale experiments that we conducted on two separate robotic platforms. In the first experiment, about 800,000 grasp attempts were collected over the course of two months, using between 6 and 14 robotic manipulators at any given time, with differences in camera placement and gripper wear and tear. In the second experiment, we used a different robotic platform and 8 robots to collect a dataset consisting of over 900,000 grasp attempts. The second robotic platform was used to test transfer between robots, and the degree to which data from a different set of robots can be used to aid learning. Our experimental results demonstrate that our approach achieves effective real-time control, can successfully grasp novel objects, and corrects mistakes by continuous servoing. Our transfer experiment also illustrates that data from different robots can be combined to learn more reliable and effective grasping.
摘要: 我们描述了一种基于学习的手眼协调方法,用于从单眼图像中抓取机器人。为了学习手眼协调的能力,我们训练了一个大型的卷积神经网络来预测抓爪的任务空间运动将成功抓握的可能性,仅使用独立于摄像机校准或当前机器人姿势的单目摄像机图像。这要求网络观察场景中抓取器和物体之间的空间关系,从而学习手眼协调。然后,我们使用此网络对抓具进行实时伺服,以实现成功的抓取。我们描述了在两个单独的机器人平台上进行的两个大规模实验。在第一个实验中,在两个月的过程中,在任何给定时间使用6到14个机器人操纵器,收集了大约80万次抓取尝试,相机放置和抓爪的磨损有所不同。在第二个实验中,我们使用了不同的机器人平台和8个机器人来收集由900,000次抓取尝试组成的数据集。第二个机器人平台用于测试机器人之间的传输,以及来自不同机器人集的数据可用于帮助学习的程度。我们的实验结果表明,我们的方法实现了有效的实时控制,可以成功地抓住新颖的物体,并通过连续伺服来纠正错误。我们的传输实验还表明,可以将来自不同机器人的数据进行组合,以学习更可靠,更有效的掌握方法。
下载地址 | 返回目录 | [10.1177/0278364917710318]
[66] Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014)
Abstract: In this paper, we propose a novel neural network model called RNN Encoder- Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixedlength vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.
摘要: 在本文中,我们提出了一种新的神经网络模型,称为RNN编码器-解码器,它由两个循环神经网络(RNN)组成。一个RNN将符号序列编码为固定长度的矢量表示形式,另一个RNN将表示形式解码为另一符号序列。共同训练提出模型的编码器和解码器,以在给定源序列的情况下最大化目标序列的条件概率。通过经验发现,通过使用RNN编码器-解码器计算的短语对的条件概率作为现有对数线性模型的附加功能,可以提高统计机器翻译系统的性能。定性地,我们表明所提出的模型学习了语言短语的语义和句法上有意义的表示。
下载地址 | 返回目录 | [10.3115/v1/d14-1179]
[67] Learning to learn by gradient descent by gradient descent (2016)
Abstract: The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorithms, implemented by LSTMs, outperform generic, hand-designed competitors on the tasks for which they are trained, and also generalize well to new tasks with similar structure. We demonstrate this on a number of tasks, including simple convex problems, training neural networks, and styling images with neural art.
摘要: 在机器学习中,从手工设计的特征转向学习得到的特征已经取得了巨大成功。尽管如此,优化算法仍然是手工设计的。在本文中,我们展示了如何把优化算法的设计本身表述为一个学习问题,让算法学会自动利用所关注问题中的结构。我们用LSTM实现的学习型优化器,在其训练所针对的任务上优于通用的手工设计方法,并且能很好地泛化到结构相似的新任务。我们在多个任务上进行了演示,包括简单的凸优化问题、神经网络训练以及用神经艺术对图像进行风格化。
[68] Learning to navigate in complex environments (2019)
Abstract: Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour1, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.
摘要: 学习使用动态元素在复杂环境中导航是开发AI代理的重要里程碑。在这项工作中,我们将导航问题表述为强化学习问题,并表明通过依靠利用多模式感官输入的其他辅助任务,可以显着提高数据效率和任务性能。特别是,我们考虑与辅助深度预测和循环闭合分类任务一起学习目标驱动的强化学习问题。这种方法可以学习在复杂的3D迷宫中从原始的感觉输入中导航,甚至在目标位置频繁变化的情况下也可以达到人类水平的性能。我们提供了对代理行为1,其定位能力及其网络活动动态的详细分析,表明该代理隐式地学习了关键的导航能力。
[69] Learning to segment object candidates (2015)
Abstract: Recent object detection systems rely on two critical steps: (1) a set of object proposals is predicted as efficiently as possible, and (2) this set of candidate proposals is then passed to an object classifier. Such approaches have been shown they can be fast, while achieving the state of the art in detection performance.In this paper, we propose a new way to generate object proposals, introducing an approach based on a discriminative convolutional network. Our model is trained jointly with two objectives: given an image patch, the first part of the system outputs a class-agnostic segmentation mask, while the second part of the system outputs the likelihood of the patch being centered on a full object. At test time, the model is efficiently applied on the whole test image and generates a set of segmentation masks, each of them being assigned with a corresponding object likelihood score. We show that our model yields significant improvements over state-of-theart object proposal algorithms.In particular, compared to previous approaches, our model obtains substantially higher object recall using fewer proposals. We also show that our model is able to generalize to unseen categories it has not seen during training. Unlike all previous approaches for generating object masks, we do not rely on edges, superpixels, or any other form of low-level segmentation.
摘要: 最近的目标检测系统依赖两个关键步骤:(1)尽可能高效地预测一组候选目标区域;(2)再将这组候选区域传递给目标分类器。此类方法已被证明既能保持较快速度,又能达到最先进的检测性能。本文提出了一种生成候选目标区域的新方法,引入了一种基于判别式卷积网络的途径。我们的模型以两个目标联合训练:给定一个图像块,系统的第一部分输出与类别无关的分割掩码,第二部分输出该图像块以完整物体为中心的可能性。测试时,模型被高效地应用于整幅测试图像,生成一组分割掩码,每个掩码都附带相应的物体似然得分。我们表明,与最新的候选区域生成算法相比,我们的模型带来了显著改进;特别是,与以往方法相比,我们的模型用更少的候选区域获得了明显更高的目标召回率。我们还表明,模型能够推广到训练中未见过的类别。与以往所有生成物体掩码的方法不同,我们不依赖边缘、超像素或任何其他形式的低层分割。
[70] Learning to track at 100 FPS with deep regression networks (2016)
Abstract: Machine learning techniques are often used in computer vision due to their ability to leverage large amounts of training data to improve performance. Unfortunately, most generic object trackers are still trained from scratch online and do not benefit from the large number of videos that are readily available for offline training. We propose a method for offline training of neural networks that can track novel objects at test-time at 100 fps. Our tracker is significantly faster than previous methods that use neural networks for tracking, which are typically very slow to run and not practical for real-time applications. Our tracker uses a simple feed-forward network with no online training required. The tracker learns a generic relationship between object motion and appearance and can be used to track novel objects that do not appear in the training set. We test our network on a standard tracking benchmark to demonstrate our tracker's state-of-the-art performance. Further, our performance improves as we add more videos to our offline training set. To the best of our knowledge, our tracker (available at http://davheld.github.io/GOTURN/GOTURN.html) is the first neural network tracker that learns to track generic objects at 100 fps.
摘要: 机器学习技术因能够利用大量训练数据提升性能而被广泛用于计算机视觉。然而,大多数通用目标跟踪器仍然在线从头训练,无法受益于大量可用于离线训练的视频。我们提出一种神经网络离线训练方法,能够在测试时以100 fps的速度跟踪新目标。我们的跟踪器显著快于以往基于神经网络的跟踪方法,后者通常运行很慢,难以用于实时应用。我们的跟踪器使用简单的前馈网络,不需要在线训练。它学习到物体运动与外观之间的一般关系,可用于跟踪训练集中未出现过的新目标。我们在标准跟踪基准上测试了该网络,展示了最先进的性能;并且随着离线训练集中视频数量的增加,性能还会进一步提升。据我们所知,我们的跟踪器(可在 http://davheld.github.io/GOTURN/GOTURN.html 获取)是第一个能以100 fps跟踪通用目标的神经网络跟踪器。
下载地址 | 返回目录 | [10.1007/978-3-319-46448-0_45]
[71] Lifelong machine learning systems: Beyond learning algorithms (2013)
Abstract: Lifelong Machine Learning, or LML, considers systems that can learn many tasks from one or more domains over its lifetime. The goal is to sequentially retain learned knowledge and to selectively transfer that knowledge when learning a new task so as to develop more accurate hypotheses or policies. Following a review of prior work on LML, we propose that it is now appropriate for the AI community to move beyond learning algorithms to more seriously consider the nature of systems that are capable of learning over a lifetime. Reasons for our position are presented and potential counter-arguments are discussed. The remainder of the paper contributes by defining LML, presenting a reference framework that considers all forms of machine learning, and listing several key challenges for and benefits from LML research. We conclude with ideas for next steps to advance the field. © 2013, Association for the Advancement of Artificial Intelligence.
摘要: 终身机器学习(LML)是指可以在其生命周期内从一个或多个领域学习许多任务的系统。目标是依次保留所学知识,并在学习新任务时有选择地迁移这些知识,以便形成更准确的假设或策略。在回顾了有关LML的先前工作之后,我们建议AI社区现在应当超越学习算法,更认真地考虑能够终身学习的系统的本质。我们阐述了持此立场的原因,并讨论了可能的反对意见。本文的其余部分通过定义LML、提出一个涵盖所有形式机器学习的参考框架,以及列出LML研究面临的若干关键挑战与收益做出了贡献。最后,我们提出了推进该领域的下一步行动的想法。© 2013,人工智能促进协会。
[72] Lifted rule injection for relation embeddings (2016)
Abstract: Methods based on representation learning currently hold the state-of-the-art in many natural language processing and knowledge base inference tasks. Yet, a major challenge is how to efficiently incorporate commonsense knowledge into such models. A recent approach regularizes relation and entity representations by propositionalization of first-order logic rules. However, propositionalization does not scale beyond domains with only few entities and rules. In this paper we present a highly efficient method for incorporating implication rules into distributed representations for automated knowledge base construction. We map entity-tuple embeddings into an approximately Boolean space and encourage a partial ordering over relation embeddings based on implication rules mined from WordNet. Surprisingly, we find that the strong restriction of the entity-tuple embedding space does not hurt the expressiveness of the model and even acts as a regularizer that improves generalization. By incorporating few commonsense rules, we achieve an increase of 2 percentage points mean average precision over a matrix factorization baseline, while observing a negligible increase in runtime.
摘要: 基于表示学习的方法目前在许多自然语言处理和知识库推理任务中保持着最先进的水平。然而,一个主要的挑战是如何有效地将常识知识纳入此类模型。最近的一种方法通过一阶逻辑规则的命题化来正则化关系和实体表示。但是,命题化无法扩展到实体和规则数量较多的领域。在本文中,我们提出了一种高效的方法,将蕴涵规则融入分布式表示中,用于自动知识库构建。我们将实体元组嵌入映射到一个近似布尔空间中,并基于从WordNet中挖掘的蕴涵规则,鼓励关系嵌入之间形成偏序关系。出乎意料的是,我们发现对实体元组嵌入空间的强约束并不会损害模型的表达能力,反而可以充当提升泛化能力的正则化器。通过引入少量常识规则,我们在矩阵分解基线之上将平均精度均值(mean average precision)提高了2个百分点,而运行时间的增加可以忽略不计。
下载地址 | 返回目录 | [10.18653/v1/d16-1146]
[73] Literature after 9/11 (2017)
Abstract: “In a celebrated essay published forty years before the attacks of September 11, Philip Roth bemoaned the inadequacy of literary language against the density of lived experience. The writer, Roth insisted, could no longer compete with the surrealism of late modernity, which mocks our meager imaginations: “The actuality is continually outdoing our talents, and the culture tosses up figures almost daily that are the envy of any novelist.” Such apprehensions were vindicated and amplified by the 2001 attacks in New York and Washington, DC. The spectacular explosions, the web of data around the terrorist perpetrators, and the military fallout of the attacks didnt yield easily to the synthesis of literary transcription. The reality of America at the turn of the millennium seemed to overwhelm any fiction that sought to capture it. Circuits of new information and the sheer volume of signal traffic defied literary dramatization. To surmount the problem of a reality that both overwhelms and eludes its witnesses, in the years following the attacks writers aimed to convey a widely felt sense of limit or failure. Coming up short in its ability to respond to trauma, yet trying to react nonetheless, was fictions most compelling response to the cataclysm of the collapsing towers. The literature of September 11 gauged infinitesimal shifts in the national psyche from 2001 through 2010, across what may be called the 9/11 decade. As a genre, literature in this vein often questioned its own necessity, as well as its ties to a place and time, to an aftermath. In this sense, the terrorist attacks intensified ongoing negotiations of what literature is permitted or expected to address, and with what degree of fidelity or earnestness. Inflated expectations about the great American 9/11 novel enhanced public awareness of how writers reflected on the events, and exactly at what remove from their occurrence. Ruptures were diagnosed in the literary careers of several established novelists, among them Don DeLillo, Philip Roth, John Updike, Jay McInerney, and David Foster Wallace, to name some of those interested in addressing the attacks explicitly; but also Jonathan Franzen, Paul Auster, and Dave Eggers, who favored a more diffuse, ambient approach to terrors aftermath. This lineup strikingly marshals the white male canon of late modern American writing, which begs the question of diversity and representation in 9/11 authorship.”
摘要: “在9月11日袭击前四十年发表的一篇著名的论文中,菲利普·罗斯(Philip Roth)哀叹文学语言对生活经验密度的不足。罗斯坚称,作家再也无法与后期现代主义的超现实主义竞争,后者嘲讽了我们微不足道的想象力:“现实不断超越我们的才华,这种文化几乎每天都在吸引任何小说家羡慕的人物。” 2001年在纽约和华盛顿特区发生的袭击,证明了这种忧虑,并加剧了这种担忧,壮观的爆炸,恐怖分子施暴者周围的数据网络以及袭击的军事影响对文学转录的合成并不容易。千年之交,美国的现实似乎压倒了任何试图抓住它的小说,新信息的循环和信号量的庞大反驳了文学戏剧化,以解决一个现实问题,即既压倒又躲避其证人的问题在袭击发生后的数年中,作家们试图传达一种广泛的极限感或失败感,其对创伤的反应能力虽然不足,但仍试图做出反应,这是小说对坍塌的塔楼的灾难最有说服力的反应。 9月11日的文献评估了从2001年到2010年(可能称为9/11十年)国民心态的微小变化。这种类型的文学常常质疑其自身的必要性,以及它与地点和时间,后果的联系。从这个意义上讲,恐怖袭击加剧了关于允许或预期处理哪些文献以及保真度或诚恳程度的正在进行的谈判。人们对美国9/11伟大小说的过高期望增强了公众对作家对事件的思考方式以及事件发生的根源的意识。几位知名小说家的文学生涯被诊断出破裂,其中包括唐·德利洛,菲利普·罗斯,约翰·厄普代克,杰伊·麦金尼和大卫·福斯特·华莱士,并列举了一些有兴趣明确应对袭击的人。还有乔纳森·弗兰森(Jonathan Franzen),保罗·奥斯特(Paul Auster)和戴夫·埃格斯(Dave Eggers),他们赞成采用更分散,更环境的方式来处理恐怖后果。该阵容惊人地封送了近代美国现代写作中的白人男性佳能,这引出了9/11作者身份的多样性和代表性问题。”
下载地址 | 返回目录 | [10.1017/9781316569290.011]
[74] Long-Term Recurrent Convolutional Networks for Visual Recognition and Description (2017)
Abstract: “Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent are effective for tasks involving sequences, visual and otherwise. We describe a class of recurrent convolutional architectures which is end-to-end trainable and suitable for large-scale visual understanding tasks, and demonstrate the value of these models for activity recognition, image captioning, and video description. In contrast to previous models which assume a fixed visual representation or perform simple temporal averaging for sequential processing, recurrent convolutional models are doubly deep in that they learn compositional representations in space and time. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Differentiable recurrent models are appealing in that they can directly map variable-length inputs (e.g., videos) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent sequence models are directly connected to modern visual convolutional network models and can be jointly trained to learn temporal dynamics and convolutional perceptual representations. Our results show that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined or optimized.”
摘要: 基于深度卷积网络的模型主导了近期的图像理解任务;我们研究同时具有循环结构的模型对于涉及序列(视觉或其他形式)的任务是否有效。我们描述了一类可端到端训练、适用于大规模视觉理解任务的循环卷积体系结构,并展示了这些模型在活动识别、图像描述生成和视频描述中的价值。以往的模型要么假设固定的视觉表示,要么对序列处理仅做简单的时间平均;与之不同,循环卷积模型是"双重深度"的,因为它们在空间和时间上学习组合式的表示。当非线性被纳入网络状态更新时,就可以学习长期依赖关系。可微分的循环模型的吸引力在于,它们可以直接将可变长度的输入(例如视频)映射到可变长度的输出(例如自然语言文本),并能对复杂的时间动态进行建模,同时仍可通过反向传播进行优化。我们的循环序列模型直接连接到现代视觉卷积网络模型,可以联合训练以学习时间动态和卷积感知表示。我们的结果表明,与分别定义或单独优化识别与生成的最新模型相比,此类模型具有明显的优势。
下载地址 | 返回目录 | [10.1109/TPAMI.2016.2599174]
[75] Low-Shot Visual Recognition by Shrinking and Hallucinating Features (2017)
Abstract: Low-shot visual learning-the ability to recognize novel object categories from very few examples-is a hallmark of human visual intelligence. Existing machine learning approaches fail to generalize in the same way. To make progress on this foundational problem, we present a low-shot learning benchmark on complex images that mimics challenges faced by recognition systems in the wild. We then propose (1) representation regularization techniques, and (2) techniques to hallucinate additional training examples for data-starved classes. Together, our methods improve the effectiveness of convolutional networks in low-shot learning, improving the one-shot accuracy on novel classes by 2.3× on the challenging ImageNet dataset.
摘要: 小样本(low-shot)视觉学习,即仅凭极少数示例识别新物体类别的能力,是人类视觉智能的标志。现有的机器学习方法无法以同样的方式泛化。为了在这一基础性问题上取得进展,我们提出了一个基于复杂图像的小样本学习基准,它模拟了识别系统在真实环境中面临的挑战。然后,我们提出(1)表示正则化技术,以及(2)为数据匮乏的类别"幻化"额外训练样本的技术。两者结合,我们的方法提高了卷积网络在小样本学习中的有效性,在具有挑战性的ImageNet数据集上,将新类别的单样本(one-shot)准确率提高了2.3倍。
下载地址 | 返回目录 | [10.1109/ICCV.2017.328]
[76] Mask R-CNN (2020)
Abstract: We present a conceptually simple, flexible, and general framework for object instance segmentation. Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance. The method, called Mask R-CNN, extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition. Mask R-CNN is simple to train and adds only a small overhead to Faster R-CNN, running at 5 fps. Moreover, Mask R-CNN is easy to generalize to other tasks, e.g., allowing us to estimate human poses in the same framework. We show top results in all three tracks of the COCO suite of challenges, including instance segmentation, bounding-box object detection, and person keypoint detection. Without bells and whistles, Mask R-CNN outperforms all existing, single-model entries on every task, including the COCO 2016 challenge winners. We hope our simple and effective approach will serve as a solid baseline and help ease future research in instance-level recognition. Code has been made available at: https://github.com/facebookresearch/Detectron.
摘要: 我们为对象实例分割提供了一个概念上简单,灵活且通用的框架。我们的方法有效地检测图像中的对象,同时为每个实例生成高质量的分割蒙版。该方法称为“遮罩R-CNN”,它通过添加一个分支与现有的用于边界框识别的分支并行来预测对象蒙版,从而扩展了Faster R-CNN。 Mask R-CNN易于训练,并且为Faster R-CNN仅增加了很小的开销,以5 fps的速度运行。此外,Mask R-CNN易于推广到其他任务,例如,允许我们在同一框架中估算人体姿势。我们在COCO挑战套件的所有三个轨迹中均显示了最佳结果,包括实例细分,边界框对象检测和人员关键点检测。 Mask R-CNN不需花哨,在所有任务上都胜过所有现有的单一模型条目,包括2016年COCO挑战赛获奖者。我们希望我们简单有效的方法可以作为坚实的基准,并有助于简化实例级识别的未来研究。代码已在以下位置提供:https://github.com/facebookresearch/Detectron。
下载地址 | 返回目录 | [10.1109/TPAMI.2018.2844175]
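作为对 [76] 中"检测的同时输出实例掩码"这一接口的粗略示意,下面的草图调用了 torchvision 自带的 Mask R-CNN 实现(模型名、输出字段和输入格式按较新版本 torchvision 的惯例书写,具体以所安装版本的文档为准,并非论文官方代码):

```python
import torch
import torchvision

# Pretrained Mask R-CNN with a ResNet-50 FPN backbone (COCO classes).
# Newer torchvision uses weights="DEFAULT"; older releases use pretrained=True.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# Input: a list of 3xHxW float tensors with values in [0, 1].
image = torch.rand(3, 480, 640)
with torch.no_grad():
    outputs = model([image])

# One dict per image: "boxes", "labels", "scores" and per-instance soft "masks".
pred = outputs[0]
keep = pred["scores"] > 0.5
boxes = pred["boxes"][keep]          # (N, 4) boxes in xyxy pixel coordinates
masks = pred["masks"][keep] > 0.5    # (N, 1, H, W) binarised instance masks
print(boxes.shape, masks.shape)
```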
[77] Mastering the game of Go with deep neural networks and tree search (2016)
Abstract: “The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses value networks to evaluate board positions and policy networks to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8{\%} winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.”
摘要: 围棋因其巨大的搜索空间以及评估棋盘局面和落子的困难,长期以来被视为人工智能经典游戏中最具挑战性的一种。在这里,我们介绍一种新的计算机围棋方法,它使用价值网络评估棋盘局面,并使用策略网络选择落子。这些深度神经网络通过一种新颖的组合方式进行训练:对人类专家棋局的监督学习,以及自我对弈的强化学习。在不进行任何前瞻搜索的情况下,这些神经网络的棋力就达到了最先进的蒙特卡洛树搜索程序的水平,而后者需要模拟数千局随机自我对弈。我们还引入了一种将蒙特卡洛模拟与价值网络和策略网络相结合的新搜索算法。借助该搜索算法,我们的程序AlphaGo对其他围棋程序取得了99.8%的胜率,并以5比0击败了欧洲围棋冠军。这是计算机程序首次在全尺寸围棋对局中战胜人类职业棋手,而这一成就此前被认为至少还需要十年。
下载地址 | 返回目录 | [10.1038/nature16961]
[78] Matching networks for one shot learning (2016)
Abstract: Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6{\%} to 93.2{\%} and from 88.0{\%} to 93.8{\%} on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
摘要: 从少量示例中学习仍然是机器学习中的关键挑战。尽管在视觉和语言等重要领域最近取得了进展,标准的有监督深度学习范式仍无法为从少量数据中快速学习新概念提供令人满意的解决方案。在这项工作中,我们借鉴了基于深度神经特征的度量学习思想,以及利用外部记忆增强神经网络的最新进展。我们的框架学习一个网络,将一个带标签的小型支持集和一个无标签样本映射到其标签,从而无需通过微调来适应新的类别类型。然后,我们在视觉(使用Omniglot、ImageNet)和语言任务上定义单样本学习问题。与其他方法相比,我们的算法将ImageNet上的单样本准确率从87.6%提高到93.2%,将Omniglot上的单样本准确率从88.0%提高到93.8%。我们还通过在Penn Treebank上引入一个单样本任务,证明了同一模型在语言建模中的有用性。
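[78] 的核心是一个基于注意力的非参数分类器:查询样本的嵌入与带标签的支持集逐一比较,再用相似度的 softmax 对支持标签加权。下面用 numpy 给出一个极简示意(这里的嵌入函数退化为恒等映射,仅作占位;论文中嵌入函数与全上下文嵌入是端到端学习的):

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def one_shot_predict(query, support_x, support_y, n_classes):
    """P(y|query) = sum_i a(query, x_i) * onehot(y_i), with a = softmax of cosine sims."""
    sims = np.array([cosine(query, x) for x in support_x])
    attn = softmax(sims)
    probs = np.zeros(n_classes)
    for w, y in zip(attn, support_y):
        probs[y] += w
    return probs

# Tiny example: 3 classes, one (identity-embedded) support example per class.
support_x = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]
support_y = [0, 1, 2]
print(one_shot_predict(np.array([0.9, 0.1]), support_x, support_y, 3))
```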
[79] Memory networks (2015)
Abstract: We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response. We evaluate them on a large-scale QA task, and a smaller, but more complex, toy task generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.
摘要: 我们描述了一种称为记忆网络的新型学习模型。内存网络的推理将推理成分与长期内存成分结合在一起;他们学习如何共同使用这些。可以读取和写入长期内存,目的是将其用于预测。我们在问答(QA)的背景下研究这些模型,其中长期记忆有效地充当(动态)知识库,而输出是文本响应。我们根据大型质量检查任务以及由模拟世界生成的较小但更复杂的玩具任务对它们进行评估。在后者中,我们通过链接多个支持语句来回答需要理解动词内涵的问题,从而展示了这种模型的推理能力。
[80] Meta-Learning with Memory-Augmented Neural Networks (2016)
Abstract: Despite recent breakthroughs in the applications of deep neural networks, one setting that presents a persistent challenge is that of “one-shot learning.” Traditional gradient-based networks require a lot of data to learn, often through extensive iterative training. When new data is encountered, the models must inefficiently releam their parameters to adequately incorporate the new information without catastrophic interference. Architectures with augmented memory capacities, such as Neural Turing Machines (NTMs), offer the ability to quickly encode and retrieve new information, and hence can potentially obviate the downsides of conventional models. Here, we demonstrate the ability of a memory-augmented neural network to rapidly assimilate new data, and leverage this data to make accurate predictions after only a few samples. We also introduce a new method for accessing an external memory that focuses on memory content, unlike previous methods that additionally use memory location-based focusing mechanisms.
摘要: 尽管在深度神经网络的应用方面取得了近期突破,但提出“持续性挑战”的一个条件是“一次性学习”。传统的基于梯度的网络通常需要通过大量的迭代训练来学习大量数据。当遇到新数据时,模型必须无效地释放其参数以充分合并新信息而不会造成灾难性干扰。具有增强存储容量的体系结构,例如神经图灵机(NTM),提供了快速编码和检索新信息的能力,因此可以避免传统模型的弊端。在这里,我们展示了增强记忆的神经网络快速吸收新数据的能力,并利用这些数据仅需几个样本即可做出准确的预测。我们还介绍了一种访问外部存储器的新方法,该方法专注于存储器内容,这与以前的方法另外使用基于存储器位置的聚焦机制不同。
[81] Mind's eye: A recurrent visual representation for image caption generation (2015)
Abstract: In this paper we explore the bi-directional mapping between images and their sentence-based descriptions. Critical to our approach is a recurrent neural network that attempts to dynamically build a visual representation of the scene as a caption is being generated or read. The representation automatically learns to remember long-term visual concepts. Our model is capable of both generating novel captions given an image, and reconstructing visual features given an image description. We evaluate our approach on several tasks. These include sentence generation, sentence retrieval and image retrieval. State-of-the-art results are shown for the task of generating novel image descriptions. When compared to human generated captions, our automatically generated captions are equal to or preferred by humans 21.0{\%} of the time. Results are better than or comparable to state-of-the-art results on the image and sentence retrieval tasks for methods using similar visual features.
摘要: 在本文中,我们探讨了图像及其基于句子的描述之间的双向映射。我们方法的核心是一个循环神经网络,它试图在生成或阅读描述的过程中动态构建场景的视觉表示。该表示会自动学习记住长期的视觉概念。我们的模型既能为给定图像生成新颖的描述,又能根据给定的图像描述重建视觉特征。我们在若干任务上评估了我们的方法,包括句子生成、句子检索和图像检索。在生成新颖图像描述的任务上,我们取得了最先进的结果。与人工撰写的描述相比,我们自动生成的描述在21.0%的情况下被人类评为相当或更好。对于使用类似视觉特征的方法而言,我们在图像和句子检索任务上的结果优于或可媲美最先进的结果。
下载地址 | 返回目录 | [10.1109/CVPR.2015.7298856]
[82] Mining on Manifolds: Metric Learning Without Labels (2018)
Abstract: In this work we present a novel unsupervised framework for hard training example mining. The only input to the method is a collection of images relevant to the target application and a meaningful initial representation, provided e.g. by pre-trained CNN. Positive examples are distant points on a single manifold, while negative examples are nearby points on different manifolds. Both types of examples are revealed by disagreements between Euclidean and manifold similarities. The discovered examples can be used in training with any discriminative loss. The method is applied to unsupervised fine-tuning of pre-trained networks for fine-grained classification and particular object retrieval. Our models are on par or are outperforming prior models that are fully or partially supervised.
摘要: 在这项工作中,我们为刻苦训练示例挖掘提供了一种新颖的无监督框架。该方法的唯一输入是与目标应用程序相关的图像集合以及有意义的初始表示,例如,通过预先训练的CNN。正例是单个流形上的远点,而负例是不同流形上的近点。两种类型的例子都可以通过欧几里得和流形相似性之间的分歧来揭示。发现的示例可用于具有任何区别损失的训练。该方法适用于预训练网络的无监督微调,以进行细分类和特定对象检索。我们的模型与完全或部分监督的模型处于同等水平或优于先前的模型。
下载地址 | 返回目录 | [10.1109/CVPR.2018.00797]
[83] Modeling and Propagating CNNs in a Tree Structure for Visual Tracking (2016)
Abstract: We present an online visual tracking algorithm by managing multiple target appearance models in a tree structure. The proposed algorithm employs Convolutional Neural Networks (CNNs) to represent target appearances, where multiple CNNs collaborate to estimate target states and determine the desirable paths for online model updates in the tree. By maintaining multiple CNNs in diverse branches of tree structure, it is convenient to deal with multi-modality in target appearances and preserve model reliability through smooth updates along tree paths. Since multiple CNNs share all parameters in convolutional layers, it takes advantage of multiple models with little extra cost by saving memory space and avoiding redundant network evaluations. The final target state is estimated by sampling target candidates around the state in the previous frame and identifying the best sample in terms of a weighted average score from a set of active CNNs. Our algorithm illustrates outstanding performance compared to the state-of-the-art techniques in challenging datasets such as online tracking benchmark and visual object tracking challenge.
摘要: 我们通过管理树状结构中的多个目标外观模型,提出了一种在线视觉跟踪算法。所提出的算法采用卷积神经网络(CNN)表示目标外观,其中多个CNN协作以估计目标状态并确定树中在线模型更新的理想路径。通过在树结构的不同分支中维护多个CNN,可以方便地处理目标外观中的多模式,并通过沿树路径的平滑更新来保持模型的可靠性。由于多个CNN共享卷积层中的所有参数,因此它通过节省内存空间并避免了冗余网络评估,以很少的额外成本利用了多个模型。通过在前一帧中的状态周围对目标候选进行采样并根据来自一组活动CNN的加权平均得分来确定最佳样本,来估算最终目标状态。与最新技术相比,我们的算法在具有挑战性的数据集(例如在线跟踪基准和视觉对象跟踪挑战)中表现出了卓越的性能。
[84] Multitudes : Aux origines d'une revue radicale (2017)
Abstract: This article explains the foundation in 2000 of the journal Multitudes. Carried by a group around Yann Moulier-Boutang, Multitudes shows its closeness with the Italian intellectual Antonio Negri who was in jail at this moment. Contrary to other journals as Vacarmes or mouvements which are more directly linked to the rise of social movements from 1995 to 1997, Multitudes claims its belonging to another temporality. This article intends to question this reality by proceeding to the analysis of the theoretical and political origins of Multitudes. Thanks to the method of social history of political ideas, the article reintegrates Multitudes within the context of “years 68” when it was born. Indeed, it is the moment when crossed the trajectories of Antonio Negri, Yann Moulier-Boutang and Felix Guattari. Multitudes arises from this history of unorthodox marxisms as trotskyism and operaism which starts during the 1970\s and hybridize within the 1980\s, when political struggles ebbed and marxisms collapsed.
摘要: 本文解释了Multitudes杂志在2000年的创立。 Multitudes由Yann Moulier-Boutang周围的一个团队所携带,显示了与目前在监狱中的意大利知识分子Antonio Negri的亲密关系。与1995年至1997年与社会运动的兴起更直接相关的Vacarmes或运动等其他期刊相反,Multitudes声称它属于另一种暂时性。本文旨在通过对众多理论和政治渊源的分析来质疑这一现实。由于采用了政治思想的社会史方法,本文在“ 68岁”出生的背景下重新融合了许多人。确实,这是跨越安东尼奥·内格里(Antonio Negri),扬·穆里尔·布唐(Yann Moulier-Boutang)和费利克斯·瓜塔里(Felix Guattari)的那一刻的时刻。大量的非传统马克思主义历史源于托洛斯基主义和操作主义,始于1970年代,并在1980年代混合发展,当时政治斗争消退,马克思主义崩溃。
下载地址 | 返回目录 | [10.3917/rai.067.0031]
[85] Net2Net: Accelerating learning via knowledge transfer (2016)
Abstract: We introduce techniques for rapidly transferring the information stored in one neural net into another neural net. The main purpose is to accelerate the training of a significantly larger neural net. During real-world workflows, one often trains very many different neural networks during the experimentation and design process. This is a wasteful process in which each new model is trained from scratch. Our Net2Net technique accelerates the experimentation process by instantaneously transferring the knowledge from a previous network to each new deeper or wider network. Our techniques are based on the concept of function-preserving transformations between neural network specifications. This differs from previous approaches to pre-training that altered the function represented by a neural net when adding layers to it. Using our knowledge transfer mechanism to add depth to Inception modules, we demonstrate a new state of the art accuracy rating on the ImageNet dataset.
摘要: 我们引入了将存储在一个神经网络中的信息快速转移到另一个神经网络的技术。主要目的是加速训练更大的神经网络。在现实世界的工作流程中,人们经常在实验和设计过程中训练很多不同的神经网络。这是一个浪费的过程,其中从头开始训练每个新模型。我们的Net2Net技术通过将知识从先前的网络瞬时转移到每个新的更深或更广的网络来加速实验过程。我们的技术基于神经网络规范之间的功能保留转换概念。这与以前的预训练方法不同,前一种方法是在向神经网络添加层时更改神经网络表示的功能。使用我们的知识转移机制为Inception模块增加深度,我们在ImageNet数据集上展示了最新的精度等级。
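[85] 中"保持函数不变的扩宽"(Net2WiderNet)思想可以用两层全连接网络直观说明:新增隐藏单元复制已有单元的输入权重,而被复制单元的输出权重按复制次数平分,从而网络输出完全不变。下面是按此假设写的 numpy 小示例(论文还给出了加深算子和卷积层的情形,此处从略):

```python
import numpy as np

def net2wider(W1, b1, W2, new_width):
    """Widen the hidden layer of h = relu(x@W1 + b1); y = h@W2, preserving the function."""
    old_width = W1.shape[1]
    # Mapping from new units to old ones: originals first, extras are random copies.
    mapping = np.concatenate([np.arange(old_width),
                              np.random.randint(0, old_width, new_width - old_width)])
    counts = np.bincount(mapping, minlength=old_width)
    W1w = W1[:, mapping]                              # copy incoming weights
    b1w = b1[mapping]
    W2w = W2[mapping, :] / counts[mapping][:, None]   # split outgoing weights
    return W1w, b1w, W2w

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))
W1, b1, W2 = rng.normal(size=(3, 5)), rng.normal(size=5), rng.normal(size=(5, 2))
y_old = np.maximum(x @ W1 + b1, 0) @ W2

W1w, b1w, W2w = net2wider(W1, b1, W2, new_width=8)
y_new = np.maximum(x @ W1w + b1w, 0) @ W2w
print(np.allclose(y_old, y_new))  # True: same function, wider layer
```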
[86] Network morphism (2016)
Abstract: We present a systematic study on how to morph a well-trained neural network to a new one so that its network function can be completely preserved. We define this as network morphism in this research. After morphing a parent network, the child network is expected to inherit the knowledge from its parent network and also has the potential to continue growing into a more powerful one with much shortened training time. The first requirement for this network morphism is its ability to handle diverse morphing types of networks, including changes of depth, width, kernel size, and even subnet. To meet this requirement, we first introduce the network morphism equations, and then develop novel morphing algorithms for all these morphing types for both classic and convolutional neural networks. The second requirement is its ability to deal with non-linearity in a network. We propose a family of parametric-activation functions to facilitate the morphing of any continuous nonlinear activation neurons. Experimental results on benchmark datasets and typical neural networks demonstrate the effectiveness of the proposed network morphism scheme.
摘要: 我们对如何将训练有素的神经网络变形为新的神经网络进行了系统的研究,以便可以完全保留其网络功能。在这项研究中,我们将其定义为网络形态。在对父级网络进行了变形之后,期望子级网络从其父级网络中继承知识,并且有潜力继续发展为功能更强大的网络,而培训时间将大大缩短。这种网络形态的第一个要求是其处理各种网络形态类型的能力,包括深度,宽度,内核大小甚至子网的变化。为了满足此要求,我们首先介绍网络态射方程,然后针对经典和卷积神经网络针对所有这些变形类型开发新颖的变形算法。第二个要求是其处理网络非线性的能力。我们提出了一组参数激活函数,以促进任何连续的非线性激活神经元的变形。在基准数据集和典型神经网络上的实验结果证明了所提出的网络态射方案的有效性。
[87] Neural Turing Machines (2014)
Abstract: We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.
摘要: 我们通过将神经网络耦合到外部存储资源来扩展神经网络的功能,它们可以通过注意力过程与之交互。组合系统类似于图灵机或冯·诺依曼体系结构,但是端到端是可区分的,因此可以通过梯度下降有效地进行训练。初步结果表明,神经图灵机可以从输入和输出示例中推断出简单的算法,例如复制,排序和关联召回。
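[87] 所说的"通过注意力过程与外部存储交互",其读操作的核心是基于内容的寻址:控制器给出一个键向量,与每个存储槽做余弦相似度,经键强度 beta 锐化后做 softmax,得到可微的软读取。下面是只含该步骤的 numpy 草图(完整的 NTM 还包括插值、移位寻址和写头,这里省略):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def content_addressing(memory, key, beta):
    """w_i = softmax(beta * cos(key, M_i)); returns the weighting and the read vector."""
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = softmax(beta * sims)
    read = w @ memory            # differentiable weighted read
    return w, read

memory = np.random.randn(8, 4)                 # 8 slots, 4-dimensional contents
key = memory[3] + 0.05 * np.random.randn(4)    # noisy query resembling slot 3
w, read = content_addressing(memory, key, beta=10.0)
print(np.argmax(w), np.round(w, 3))
```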
[88] Neural machine translation by jointly learning to align and translate (2015)
Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder–decoders and encode a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder–decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.
摘要: 神经机器翻译是最近提出的机器翻译方法。与传统的统计机器翻译不同,神经机器翻译旨在构建一个可以联合调优以最大化翻译性能的单一神经网络。最近为神经机器翻译提出的模型大多属于编码器-解码器家族,它们将源句子编码为一个固定长度的向量,解码器再从该向量生成译文。在本文中,我们推测固定长度向量的使用是提升这一基本编码器-解码器架构性能的瓶颈,并提出对其进行扩展:允许模型自动地(软)搜索源句子中与预测某个目标词相关的部分,而无需显式地将这些部分切分为硬性片段。借助这种新方法,我们在英法翻译任务上取得了与现有最先进的基于短语的系统相当的翻译性能。此外,定性分析表明,模型找到的(软)对齐与我们的直觉非常吻合。
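[88] 中的(软)搜索即加性注意力:用解码器上一时刻状态与每个编码器注释向量打分,softmax 归一化后加权求和得到上下文向量。下面用 numpy 给出该步骤的示意(权重矩阵用随机数占位;模型中它们与网络其余部分联合训练):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, H, Wa, Ua, va):
    """e_j = va^T tanh(Wa s_{i-1} + Ua h_j); alpha = softmax(e); c_i = sum_j alpha_j h_j."""
    scores = np.array([va @ np.tanh(Wa @ s_prev + Ua @ h) for h in H])
    alpha = softmax(scores)
    context = alpha @ H
    return alpha, context

rng = np.random.default_rng(0)
H = rng.normal(size=(6, 8))        # 6 source annotations of size 8
s_prev = rng.normal(size=4)        # previous decoder state
Wa, Ua, va = rng.normal(size=(16, 4)), rng.normal(size=(16, 8)), rng.normal(size=16)
alpha, context = additive_attention(s_prev, H, Wa, Ua, va)
print(np.round(alpha, 3), context.shape)
```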
[89] Neural machine translation of rare words with subword units (2016)
Abstract: Neural machine translation (NMT) models typically operate with a fixed vocabulary, but translation is an open-vocabulary problem. Previous work addresses the translation of out-of-vocabulary words by backing off to a dictionary. In this paper, we introduce a simpler and more effective approach, making the NMT model capable of open-vocabulary translation by encoding rare and unknown words as sequences of subword units. This is based on the intuition that various word classes are translatable via smaller units than words, for instance names (via character copying or transliteration), compounds (via compositional translation), and cognates and loanwords (via phonological and morphological transformations). We discuss the suitability of different word segmentation techniques, including simple character ngram models and a segmentation based on the byte pair encoding compression algorithm, and empirically show that subword models improve over a back-off dictionary baseline for the WMT 15 translation tasks English→German and English→Russian by up to 1.1 and 1.3 Bleu, respectively.
摘要: 神经机器翻译(NMT)模型通常使用固定的词表,但翻译本质上是一个开放词表问题。先前的工作通过回退到词典来处理词表外单词的翻译。在本文中,我们介绍了一种更简单、更有效的方法:将罕见词和未知词编码为子词单元序列,使NMT模型能够进行开放词表翻译。这基于这样的直觉:许多词类可以通过比单词更小的单元来翻译,例如人名(通过字符复制或音译)、复合词(通过组合翻译)以及同源词和借词(通过语音和形态变换)。我们讨论了不同分词技术的适用性,包括简单的字符n-gram模型和基于字节对编码(BPE)压缩算法的切分方法,并通过实验证明,在WMT 15的英语→德语和英语→俄语翻译任务上,子词模型相对于回退词典基线分别最多提高了1.1和1.3个BLEU。
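[89] 的子词切分通过字节对编码(BPE)学习:反复合并训练词表中出现频率最高的相邻符号对。下面是该学习循环的精简 Python 草图(作用在一个玩具词频词典上;论文发布的工具还包括词尾标记处理以及把学到的合并规则应用到新文本等步骤):

```python
import re
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with the merged symbol."""
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    return {pattern.sub("".join(pair), word): freq for word, freq in vocab.items()}

# Words as space-separated characters with an end-of-word marker.
vocab = {"l o w </w>": 5, "l o w e r </w>": 2, "n e w e s t </w>": 6, "w i d e s t </w>": 3}
for i in range(8):
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)   # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    print(f"merge {i}: {best}")
```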
[90] On the importance of initialization and momentum in deep learning (2013)
Abstract: Deep and recurrent neural networks (DNNs and RNNs respectively) are powerful models that were considered to be almost impossible to train using stochastic gradient descent with momentum. In this paper, we show that when stochastic gradient descent with momentum uses a well-designed random initialization and a particular type of slowly increasing schedule for the momentum parameter, it can train both DNNs and RNNs (on datasets with long-term dependencies) to levels of performance that were previously achievable only with Hessian-Free optimization. We find that both the initialization and the momentum are crucial since poorly initialized networks cannot be trained with momentum and well-initialized networks perform markedly worse when the momentum is absent or poorly tuned. Our success training these models suggests that previous attempts to train deep and recurrent neural networks from random initializations have likely failed due to poor initialization schemes. Furthermore, carefully tuned momentum methods suffice for dealing with the curvature issues in deep and recurrent network training objectives without the need for sophisticated second-order methods. Copyright 2013 by the author(s).
摘要: 深度神经网络和循环神经网络(分别为DNN和RNN)是功能强大的模型,过去被认为几乎不可能用带动量的随机梯度下降来训练。在本文中,我们表明,当带动量的随机梯度下降使用精心设计的随机初始化以及一种特定的、缓慢增加的动量参数调度时,它可以将DNN和RNN(在具有长期依赖的数据集上)训练到以前只有Hessian-Free优化才能达到的性能水平。我们发现初始化和动量都至关重要:初始化不佳的网络无法用动量训练,而初始化良好的网络在动量缺失或调节不当时性能会明显变差。我们成功训练这些模型的经验表明,以往从随机初始化训练深度和循环神经网络的尝试之所以失败,很可能是由于初始化方案不佳。此外,精心调节的动量方法足以应对深度网络和循环网络训练目标中的曲率问题,而无需复杂的二阶方法。版权所有 2013,归作者所有。
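[90] 讨论的动量法维护一个累积历史梯度的速度项,Nesterov 变体则在"前瞻点"处计算梯度。下面在一个二次目标上用 numpy 给出两种更新的小示意(摘要中提到的缓慢递增的动量调度此处为简洁起见省略):

```python
import numpy as np

def grad(theta):
    return 2.0 * (theta - 1.0)   # gradient of ||theta - 1||^2

def sgd_momentum(theta, steps=50, lr=0.05, mu=0.9, nesterov=False):
    v = np.zeros_like(theta)
    for _ in range(steps):
        g = grad(theta + mu * v) if nesterov else grad(theta)   # look-ahead for Nesterov
        v = mu * v - lr * g          # v_{t+1} = mu * v_t - lr * grad
        theta = theta + v            # theta_{t+1} = theta_t + v_{t+1}
    return theta

theta0 = np.zeros(3)
print(sgd_momentum(theta0.copy()))                  # classical momentum
print(sgd_momentum(theta0.copy(), nesterov=True))   # Nesterov momentum
```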
[91] Perceptual losses for real-time style transfer and super-resolution (2016)
Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
摘要: 我们考虑图像变换问题,即把输入图像转换为输出图像。针对此类问题的最新方法通常使用输出图像与真实图像之间的逐像素损失来训练前馈卷积神经网络。与此同时,另一些工作表明,基于从预训练网络中提取的高层特征来定义并优化感知损失函数,可以生成高质量的图像。我们结合两种方法的优点,提出使用感知损失函数来训练用于图像变换任务的前馈网络。我们展示了图像风格迁移的结果:训练一个前馈网络来实时求解Gatys等人提出的优化问题。与基于优化的方法相比,我们的网络给出了相近的定性结果,但速度快了三个数量级。我们还在单图像超分辨率上进行了实验,用感知损失代替逐像素损失可以得到视觉上更令人满意的结果。
下载地址 | 返回目录 | [10.1007/978-3-319-46475-6_43]
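[91] 所训练的特征重建("感知")损失可以按如下方式粗略示意:生成图像与目标图像都送入一个固定的预训练 VGG-16 的低层,两者特征图的均方误差即为损失。以下草图基于若干假设(层索引 16 只是大致对应 relu3_3 的示意选择;预训练权重参数与层划分随 torchvision 版本而异,且省略了 ImageNet 归一化):

```python
import torch
import torch.nn as nn
import torchvision

class PerceptualLoss(nn.Module):
    """Feature reconstruction loss against a fixed, pretrained VGG-16."""
    def __init__(self, layer_index=16):   # assumed cut, roughly around relu3_3
        super().__init__()
        # Newer torchvision uses weights="DEFAULT"; older releases use pretrained=True.
        vgg = torchvision.models.vgg16(weights="DEFAULT").features[:layer_index]
        for p in vgg.parameters():
            p.requires_grad_(False)        # the loss network stays frozen
        self.vgg = vgg.eval()
        self.mse = nn.MSELoss()

    def forward(self, generated, target):
        # NOTE: ImageNet mean/std normalisation is omitted for brevity.
        return self.mse(self.vgg(generated), self.vgg(target))

loss_fn = PerceptualLoss()
generated = torch.rand(1, 3, 256, 256, requires_grad=True)  # e.g. transform net output
target = torch.rand(1, 3, 256, 256)                          # content target
loss = loss_fn(generated, target)
loss.backward()                                              # gradients flow to `generated`
print(float(loss))
```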
[92] Pixel recurrent neural networks (2016)
Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
摘要: 对自然图像的分布进行建模是无监督学习中的一个里程碑式问题。此任务需要一个既有表达力、又易于处理且可扩展的图像模型。我们提出了一个深度神经网络,它沿两个空间维度依次预测图像中的像素。我们的方法对原始像素值的离散概率进行建模,并编码图像中完整的依赖关系。架构上的创新包括快速的二维循环层以及在深度循环网络中对残差连接的有效利用。我们在自然图像上获得的对数似然分数显著优于此前的最好水平。我们的主要结果还为多样化的ImageNet数据集提供了基准。模型生成的样本清晰、多样且全局一致。
[93] Playing Atari with Deep Reinforcement Learning (2013)
Abstract: We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
摘要: 我们提出了第一个深度学习模型,可以使用强化学习直接从高维感官输入中成功学习控制策略。该模型是一个卷积神经网络,它经过Q学习的变体训练,其输入是原始像素,其输出是估计未来奖励的值函数。我们将我们的方法应用于Arcade学习环境中的七个Atari 2600游戏,无需调整体系结构或学习算法。我们发现它在六个游戏中的表现优于以前的所有方法,在三个游戏中都超过了人类专家。
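[93] 方法的核心是从回放的转移样本计算 Q 学习目标:终止步取 y = r,否则 y = r + γ·max_a' Q(s', a'),再以 (Q(s,a) − y)² 作为损失做梯度下降。下面用 numpy 给出目标值计算的示意(为了清晰假设 Q 值已以数组形式给出;论文中 Q 由作用在原始画面上的卷积网络产生,并配合经验回放):

```python
import numpy as np

def dqn_targets(q_next, rewards, dones, gamma=0.99):
    """y_i = r_i                                  if the episode ended at step i
       y_i = r_i + gamma * max_a q_next[i, a]     otherwise"""
    return rewards + gamma * (1.0 - dones) * q_next.max(axis=1)

# A batch of 4 replayed transitions over an action space of size 3.
q_next  = np.array([[0.2, 1.0, 0.5],
                    [0.0, 0.1, 0.3],
                    [2.0, 1.5, 0.9],
                    [0.4, 0.4, 0.4]])
rewards = np.array([1.0, 0.0, -1.0, 0.5])
dones   = np.array([0.0, 1.0, 0.0, 0.0])   # 1 marks a terminal transition
print(dqn_targets(q_next, rewards, dones))
# The training loss is then mean((Q(s_i, a_i) - y_i)^2), minimised by gradient descent.
```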
[94] Pointer networks (2015)
Abstract: We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence 1 and Neural Turing Machines 2, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class. Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems - finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem - using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems.
摘要: 我们引入了一种新的神经体系结构,用于学习输出序列的条件概率,其中输出序列的元素是与输入序列中位置相对应的离散标记。此类问题无法由现有方法(如序列到序列模型和神经图灵机)直接解决,因为输出每一步的目标类别数量取决于输入的长度,而输入长度是可变的。对可变长度序列进行排序等问题,以及各种组合优化问题,都属于这一类。我们的模型利用最近提出的神经注意力机制解决了可变大小输出词表的问题。它与以往注意力方法的不同之处在于:它不是在每个解码步骤用注意力将编码器的隐藏单元混合成上下文向量,而是将注意力用作指针,选择输入序列中的某个成员作为输出。我们将这种架构称为指针网络(Ptr-Net)。我们展示了Ptr-Net可以仅凭训练样例学习三个具有挑战性的几何问题的近似解:求平面凸包、计算Delaunay三角剖分以及平面旅行商问题。Ptr-Net不仅改进了带输入注意力的序列到序列模型,还使我们能够泛化到可变大小的输出词表。我们表明,学到的模型能够泛化到超出其训练时所见的最大长度。我们希望这些结果能鼓励对离散问题的神经学习进行更广泛的探索。
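[94] 的指向机制本身可以这样示意:注意力分数不再用于构造上下文向量,而是直接作为对输入位置的概率分布,因此"输出词表"的大小自动等于(可变的)输入长度。下面的 numpy 草图中,权重均为随机占位,代表训练得到的参数:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_distribution(dec_state, enc_states, W1, W2, v):
    """u_j = v^T tanh(W1 e_j + W2 d);  p(position j) = softmax(u)_j."""
    scores = np.array([v @ np.tanh(W1 @ e + W2 @ dec_state) for e in enc_states])
    return softmax(scores)

rng = np.random.default_rng(0)
enc_states = rng.normal(size=(7, 16))   # 7 input positions -> 7 possible "output tokens"
dec_state = rng.normal(size=16)
W1, W2, v = rng.normal(size=(32, 16)), rng.normal(size=(32, 16)), rng.normal(size=32)
p = pointer_distribution(dec_state, enc_states, W1, W2, v)
print(np.round(p, 3), "points to position", int(np.argmax(p)))
```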
[95] Policy distillation (2016)
Abstract: Policies for complex visual tasks have been successfully learned with deep reinforcement learning, using an approach called deep Q-networks (DQN), but relatively large (task-specific) networks and extensive training are needed to achieve good performance. In this work, we present a novel method called policy distillation that can be used to extract the policy of a reinforcement learning agent and train a new network that performs at the expert level while being dramatically smaller and more efficient. Furthermore, the same method can be used to consolidate multiple task-specific policies into a single policy. We demonstrate these claims using the Atari domain and show that the multi-task distilled agent outperforms the single-task teachers as well as a jointly-trained DQN agent.
摘要: 借助称为深度Q网络(DQN)的方法,复杂视觉任务的策略已经可以通过深度强化学习成功学得,但要获得良好的性能,需要相对较大的(特定于任务的)网络和大量训练。在这项工作中,我们提出了一种称为策略蒸馏的新方法,它可以提取强化学习智能体的策略,并训练一个新的网络,使其在达到专家水平的同时显著更小、更高效。此外,同样的方法还可以将多个特定于任务的策略合并为单个策略。我们在Atari领域验证了这些主张,并表明多任务蒸馏智能体的表现优于单任务教师以及联合训练的DQN智能体。
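[95] 用来压缩教师策略的蒸馏目标,可以按论文描述的思路粗略示意:把教师 Q 值经低温 softmax 得到的动作分布作为回归目标,用 KL 散度约束学生的动作分布(下面的温度取值与目标形式是按该思路写的假设性示例,网络细节省略):

```python
import numpy as np

def softmax(x, tau=1.0):
    z = x / tau
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_kl(teacher_q, student_q, tau=0.01):
    """KL( softmax(teacher_q / tau) || softmax(student_q) ), averaged over a batch."""
    losses = []
    for tq, sq in zip(teacher_q, student_q):
        p = softmax(tq, tau)        # sharpened teacher policy (the regression target)
        q = softmax(sq)             # student policy
        losses.append(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))
    return float(np.mean(losses))

teacher_q = np.array([[1.0, 2.0, 0.5], [0.1, 0.0, 3.0]])
student_q = np.array([[0.8, 1.5, 0.2], [0.0, 0.2, 2.0]])
print(distillation_kl(teacher_q, student_q))
```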
[96] Preparation of novel high copper ions removal membranes by embedding organosilane-functionalized multi-walled carbon nanotube (2016)
Abstract: BACKGROUND: Multi-walled carbon nanotubes (MWCNTs) have attracted considerable interest in the membrane field, but they are prone to aggregate in the polymer matrix and cause membrane defects. Different from blending MWCNTs in the casting solution, different ratios of APTS-functionalized MWCNTs (A-MWCNTs) were embedded on the surface of polyvinylidene fluoride (PVDF) membranes. RESULTS: FTIR and XPS demonstrated the reaction between MWCNTs and APTS. SEM and AFM images showed that new morphology and pore structure appeared in novel A-MWCNTs/PVDF membranes compared with pure PVDF membrane. The embedding was firm because of APTS chains penetrating into the PVDF matrix, and the optimum content of A-MWCNTs was 0.05 wt{\%}. Besides, A-MWCNTs/PVDF membranes exhibited superior contact angle and BSA rejection than PVDF membrane. In the tests of copper ions rejection and adsorption, 0.05 wt{\%} A-MWCNTs/PVDF membrane gave 89.1{\%} removal and 2.067 mg g−1 adsorption capacity compared with PVDF membrane, at just 19.66{\%} and 0.516 mg g−1, respectively. CONCLUSIONS: The existence of functional groups in A-MWCNTs such as –NH2 and –COOH, can not only increase the hydrophilicity of membranes, but also provide more adsorption sites to have a complexation reaction with Cu2+ ions. This work provides information to solve practical problems in the field of wastewater treatment. {\textcopyright} 2015 Society of Chemical Industry.
摘要: 背景:多壁碳纳米管(MWCNT)在膜领域引起了相当大的兴趣,但它们易于在聚合物基质中聚集并引起膜缺陷。与在浇铸溶液中混合MWCNT不同,在聚偏二氟乙烯(PVDF)膜的表面嵌入了不同比例的APTS功能化MWCNT(A-MWCNT)。结果:FTIR和XPS证实了MWCNTs与APTS之间的反应。 SEM和AFM图像表明,与纯PVDF膜相比,新型A-MWCNTs / PVDF膜出现了新的形态和孔结构。由于APTS链渗透到PVDF基质中,所以嵌入是牢固的,并且A-MWCNT的最佳含量为0.05 wt {\%}。此外,A-MWCNTs / PVDF膜比PVDF膜具有更好的接触角和BSA排斥性。在铜离子排斥和吸附测试中,与PVDF膜相比,0.05 wt {\%}的A-MWCNTs / PVDF膜与PVDF膜相比去除了89.1 {\%},吸附能力为2.067 mg g-1。 %}和0.516 mg g-1。结论:A-MWCNTs中存在诸如–NH2和–COOH的官能团,不仅可以增加膜的亲水性,而且可以提供更多的吸附位点以与Cu2 +离子发生络合反应。这项工作为解决废水处理中的实际问题提供了信息。 {\ textcopyright} 2015年化学工业协会。
下载地址 | 返回目录 | [10.1002/jctb.4820]
[97] Program Induction (2015)
Abstract: “People learning new concepts can often generalize successfully from just a single example, yet machine learning algorithms typically require tens or hundreds of examples to perform with similar accuracy. People can also use learned concepts in richer ways than conventional algorithms—for action, imagination, and explanation.We present a computational model that captures these human learning abilities for a large class of simple visual concepts: handwritten characters from the worlds alphabets. The model represents concepts as simple programs that best explain observed examples under a Bayesian criterion. On a challenging one-shot classification task, the model achieves human-level performance while outperforming recent deep learning approaches.We also present several “visual Turing tests” probing the models creative generalization abilities, which in many cases are indistinguishable from human behavior.”
摘要: “人们学习新概念通常可以仅通过一个示例就成功地将其概括,但是机器学习算法通常需要数十或数百个示例才能以相似的精度执行。人们还可以以比传统算法更丰富的方式使用学习过的概念,从而获得行动,想象力我们提出了一个计算模型,该模型捕获了一大类简单的视觉概念(来自世界字母表的手写字符)的人类学习能力,该模型将概念表示为简单的程序,可以最好地解释根据贝叶斯准则进行观察的示例。该模型具有挑战性,一次性完成了分类任务,在达到人类水平的性能的同时,还胜过了最近的深度学习方法。我们还提出了几种“视觉图灵测试”,以探究模型的创造性概括能力,这在许多情况下与人类行为没有区别。”
下载地址 | 返回目录 | [10.1126/science.aab3050]
[98] Progressive Neural Networks (2016)
Abstract: Learning to solve complex sequences of tasks–while both leveraging transfer and avoiding catastrophic forgetting–remains a key obstacle to achieving human-level intelligence. The progressive networks approach represents a step forward in this direction: they are immune to forgetting and can leverage prior knowledge via lateral connections to previously learned features. We evaluate this architecture extensively on a wide variety of reinforcement learning tasks (Atari and 3D maze games), and show that it outperforms common baselines based on pretraining and finetuning. Using a novel sensitivity measure, we demonstrate that transfer occurs at both low-level sensory and high-level control layers of the learned policy.
摘要: 学习解决复杂的任务序列,同时既利用迁移又避免灾难性遗忘,仍然是实现人类水平智能的主要障碍。渐进式网络方法朝这个方向迈出了一步:它们不受遗忘的影响,并且可以通过到先前已学特征的横向连接来利用先验知识。我们在各种强化学习任务(Atari和3D迷宫游戏)上对该体系结构进行了广泛评估,结果表明它优于基于预训练和微调的常见基线。借助一种新颖的敏感性度量,我们证明了迁移同时发生在所学策略的低层感知层和高层控制层。
[99] Protective effect of rSm28GST-specific T cells in schistosomiasis: Role of gamma interferon (1994)
Abstract: Immunization with a single dose of 50 g of recombinant Schistosoma mansoni 28-kDa glutathione-S-transferase (rSm28GST) was able to induce a reduction in the worm burden, the number of eggs, and the degree of hepatic fibrosis as quantified by the measurement of collagen content in the liver of S. mansoni-infected mice. No relationship was found between anti-Sm28GST immunoglobulin G and immunoglobulin A titers and the levels of protection obtained. Adoptive transfers of Sm28GST-specific total, CD4+, or CD8+ T cells reproduced the protective effect obtained with the recombinant molecule. Moreover, experiments studying in vivo T-cell depletion demonstrated that anti-CD4- or anti-CD8-treated mice showed a significant decrease in the protective effect conferred, suggesting a role of the two T- cell subpopulations in the expression of Sm28GST-mediated protection against hepatic damage. Sm28GST-specific cells produced little interleukin-4 and high levels of gamma interferon. Treatment of immunized mice with anti-gamma interferon antibody totally suppressed the Sm28GST-induced protective effect and led to the rapid death of infected animals, suggesting a role for this cytokine in the expression of the protective immunity obtained after immunization with rSm28GST.
摘要: 使用单剂量50μgg的重组曼氏血吸虫28 kDa谷胱甘肽S-转移酶(rSm28GST)进行免疫可以减少蠕虫的负担,减少卵子的数量和降低程度。肝纤维化,通过对曼氏沙门氏菌感染小鼠肝脏中胶原蛋白含量的测量来量化。在抗Sm28GST免疫球蛋白G和免疫球蛋白A滴度与获得的保护水平之间未发现相关性。 Sm28GST特异性总CD4 +或CD8 + T细胞的过继转移可重现重组分子的保护作用。此外,研究体内T细胞耗竭的实验表明,抗CD4或抗CD8处理的小鼠显示出所赋予的保护作用显着降低,表明这两个T细胞亚群在Sm28GST介导的表达中的作用防止肝损伤。 Sm28GST特异性细胞产生很少的白介素4和高水平的γ干扰素。用抗γ干扰素抗体治疗免疫小鼠完全抑制了Sm28GST诱导的保护作用,并导致受感染动物迅速死亡,这表明该细胞因子在rSm28GST免疫后获得的保护性免疫表达中具有作用。
下载地址 | 返回目录 | [10.1128/iai.62.9.3723-3730.1994]
[100] R-FCN: Object detection via region-based fully convolutional networks (2016)
Abstract: We present region-based, fully convolutional networks for accurate and efficient object detection. In contrast to previous region-based detectors such as Fast/Faster R-CNN 7, 19 that apply a costly per-region subnetwork hundreds of times, our region-based detector is fully convolutional with almost all computation shared on the entire image. To achieve this goal, we propose position-sensitive score maps to address a dilemma between translation-invariance in image classification and translation-variance in object detection. Our method can thus naturally adopt fully convolutional image classifier backbones, such as the latest Residual Networks (ResNets) 10, for object detection. We show competitive results on the PASCAL VOC datasets (e.g., 83.6{\%} mAP on the 2007 set) with the 101-layer ResNet. Meanwhile, our result is achieved at a test-time speed of 170ms per image, 2.5-20× faster than the Faster R-CNN counterpart. Code is made publicly available at: https://github.com/daijifeng001/r-fcn.
摘要: 我们提出了基于区域的全卷积网络,以进行准确,高效的目标检测。与之前的基于区域的检测器(例如快速/快速R-CNN 7,19)多次应用昂贵的每个区域子网相比,我们的基于区域的检测器是完全卷积的,几乎所有计算都在整个图像上共享。为了实现此目标,我们提出了位置敏感得分图,以解决图像分类中的平移不变性与对象检测中的平移差异性之间的难题。因此,我们的方法自然可以采用完全卷积的图像分类器主干(例如最新的残差网络(ResNets)10)进行对象检测。我们在具有101层ResNet的PASCAL VOC数据集(例如2007年的83.6 {\%} mAP)上显示出竞争性结果。同时,我们的结果是在每幅图像170ms的测试时间速度下实现的,比Faster R-CNN的速度快2.5-20倍。可以在以下网址公开获取代码:https://github.com/daijifeng001/r-fcn。
[101] Reinforcement Learning Neural Turing Machines - Revised (2015)
Abstract: The Neural Turing Machine (NTM) is more expressive than all previously considered models because of its external memory. It can be viewed as a broader effort to use abstract external Interfaces and to learn a parametric model that interacts with them. The capabilities of a model can be extended by providing it with proper Interfaces that interact with the world. These external Interfaces include memory, a database, a search engine, or a piece of software such as a theorem verifier. Some of these Interfaces are provided by the developers of the model. However, many important existing Interfaces, such as databases and search engines, are discrete. We examine feasibility of learning models to interact with discrete Interfaces. We investigate the following discrete Interfaces: a memory Tape, an input Tape, and an output Tape. We use a Reinforcement Learning algorithm to train a neural network that interacts with such Interfaces to solve simple algorithmic tasks. Our Interfaces are expressive enough to make our model Turing complete.
摘要: 神经图灵机(NTM)由于具有外部存储器,因此比以前考虑的所有型号更具表现力。可以将其视为使用抽象外部接口并学习与其进行交互的参数模型的更广泛的尝试。可以通过为模型提供与世界互动的适当接口来扩展其功能。这些外部接口包括内存,数据库,搜索引擎或诸如定理验证器之类的软件。其中一些接口由模型的开发人员提供。但是,许多重要的现有接口(例如数据库和搜索引擎)是离散的。我们研究了学习模型与离散接口交互的可行性。我们研究以下离散接口:内存磁带,输入磁带和输出磁带。我们使用强化学习算法来训练与此类接口交互以解决简单算法任务的神经网络。我们的界面具有足够的表现力,可以使我们的图灵模型变得完整。
[102] Rich feature hierarchies for accurate object detection and semantic segmentation (2013)
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30{\%} relative to the previous best result on VOC 2012—achieving a mAP of 53.3{\%}. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/{\~{}}rbg/rcnn.
摘要: 根据标准的PASCAL VOC数据集测得的目标检测性能在最近几年一直处于稳定状态。表现最佳的方法是复杂的集成系统,通常将多个低级图像特征与高级上下文结合在一起。在本文中,我们提出了一种简单且可扩展的检测算法,相对于VOC 2012上的先前最佳结果,该算法将平均平均精度(mAP)提高了30 {\%}以上,达到了53.3 {\% }。我们的方法结合了两个关键的见解:(1)一个人可以将高容量卷积神经网络(CNN)应用于自下而上的区域建议,以对对象进行定位和分割;(2)当标记的训练数据稀少,有监督的预训练时对于辅助任务,然后进行特定于域的微调,可以显着提高性能。由于我们将区域提案与CNN相结合,因此我们将我们的方法称为R-CNN:具有CNN功能的区域。我们还将R-CNN与OverFeat(一种基于相似CNN架构的最近提出的滑动窗口检测器)进行了比较。我们发现,在200类ILSVRC2013检测数据集上,R-CNN优于OverFeat。有关完整系统的源代码,请访问http://www.cs.berkeley.edu/{\~{}}rbg/rcnn。
[下载地址](http://arxiv.org/abs/1311.2524) | 返回目录 | [10.1109/CVPR.2014.81]
[103] SSD: Single shot multibox detector (2016)
Abstract: We present a method for detecting objects in images using a single deep neural network. Our approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location. At prediction time, the network generates scores for the presence of each object category in each default box and produces adjustments to the box to better match the object shape. Additionally, the network combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes. SSD is simple relative to methods that require object proposals because it completely eliminates proposal generation and subsequent pixel or feature resampling stages and encapsulates all computation in a single network. This makes SSD easy to train and straightforward to integrate into systems that require a detection component. Experimental results on the PASCAL VOC, COCO, and ILSVRC datasets confirm that SSD has competitive accuracy to methods that utilize an additional object proposal step and is much faster, while providing a unified framework for both training and inference. For 300×300 input, SSD achieves 74.3{\%} mAP on VOC2007 test at 59 FPS on a Nvidia Titan X and for 512 × 512 input, SSD achieves 76.9{\%} mAP, outperforming a comparable state of the art Faster R-CNN model. Compared to other single stage methods, SSD has much better accuracy even with a smaller input image size. Code is available at https://github.com/weiliu89/caffe/ tree/ssd.
摘要: 我们提出了一种使用单个深度神经网络检测图像中对象的方法。我们的名为SSD的方法将边界框的输出空间离散化为一组默认框,这些默认框具有不同的长宽比和每个要素图位置的比例。在预测时,网络会为每个默认框中的每个对象类别的存在生成分数,并对该框进行调整以更好地匹配对象形状。此外,该网络将来自多个具有不同分辨率的特征图的预测结合起来,以自然地处理各种大小的对象。相对于需要对象提议的方法,SSD简单,因为它完全消除了提议生成和后续的像素或特征重采样阶段,并将所有计算封装在一个网络中。这使得SSD易于培训,并且可以直接集成到需要检测组件的系统中。在PASCAL VOC,COCO和ILSVRC数据集上的实验结果证实,SSD与采用附加对象建议步骤的方法相比具有更高的准确性,并且速度更快,同时为训练和推理提供了统一的框架。对于300×300输入,SSD在Nvidia Titan X上以59 FPS的速度在VOC2007测试上达到74.3 {\%} mAP,对于512×512输入,SSD达到76.9 {\%} mAP,优于同类产品更快的R-CNN模型。与其他单阶段方法相比,即使输入图像尺寸较小,SSD的精度也要好得多。可以从https://github.com/weiliu89/caffe/tree/ssd获得代码。
下载地址 | 返回目录 | [10.1007/978-3-319-46448-0_2]
[104] Science (2010)
Abstract: Research during the past decade has greatly enriched our knowledge about the ways that curriculum design can impact learning in science. An important contribution comes from design-based research - an emerging methodology. Researchers have sought to synthesize refinement studies of curricular innovations. Frameworks to organize design knowledge are emerging. We demonstrate the potential use of such frameworks to guide the process of curricular design by describing two approaches for synthesizing design knowledge: the design principles and the design patterns approaches. Both approaches view science learning as a process of knowledge integration. These approaches can contribute to courses in curriculum design and help designers build on widely used materials to improve student learning. {\textcopyright} 2010 Elsevier Ltd. All rights reserved.
摘要: 过去十年的研究极大地丰富了我们对课程设计可以影响科学学习的方式的知识。一个重要的贡献来自基于设计的研究-一种新兴的方法。研究人员已寻求对课程创新的综合研究进行综合。组织设计知识的框架正在兴起。通过描述两种综合设计知识的方法,我们证明了这种框架在指导课程设计过程中的潜在用途:设计原理和设计模式方法。两种方法都将科学学习视为知识整合的过程。这些方法可以为课程设计课程做出贡献,并帮助设计师在广泛使用的材料基础上改进学生的学习。 {\ textcopyright} 2010 Elsevier Ltd.保留所有权利。
下载地址 | 返回目录 | [10.1016/B978-0-08-044894-7.00081-6]
[105] Semantic Style Transfer and Turning Two-Bit Doodles into Fine Artworks (2016)
Abstract: Convolutional neural networks (CNNs) have proven highly effective at image synthesis and style transfer. For most users, however, using them as tools can be a challenging task due to their unpredictable behavior that goes against common intuitions. This paper introduces a novel concept to augment such generative architectures with semantic annotations, either by manually authoring pixel labels or using existing solutions for semantic segmentation. The result is a content-aware generative algorithm that offers meaningful control over the outcome. Thus, we increase the quality of images generated by avoiding common glitches, make the results look significantly more plausible, and extend the functional range of these algorithms—whether for portraits or landscapes, etc. Applications include semantic style transfer and turning doodles with few colors into masterful paintings!
摘要: 卷积神经网络(CNN)已被证明在图像合成和风格迁移方面非常有效。然而,对大多数用户而言,把它们当作工具使用颇具挑战,因为其难以预测的行为常常违背直觉。本文提出了一个新颖的概念:通过人工标注像素标签,或利用现有的语义分割方案,为这类生成式架构加入语义注释。其结果是一种内容感知的生成算法,能够对输出进行有意义的控制。这样,我们通过避免常见的瑕疵提升了生成图像的质量,使结果看起来明显更合理,并扩展了这些算法的适用范围(无论是人像还是风景等)。应用包括语义风格迁移,以及把只有寥寥几种颜色的涂鸦变成精湛的画作!
[106] Sequence to sequence learning with neural networks (2014)
Abstract: “Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTMs BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTMs performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.”
摘要: 深度神经网络(DNN)是功能强大的模型,在困难的学习任务上取得了出色的性能。尽管只要有大型标注训练集DNN就能很好地工作,但它们无法直接用于将序列映射到序列。在本文中,我们提出了一种通用的端到端序列学习方法,它对序列结构做出最少的假设。我们的方法使用多层长短期记忆网络(LSTM)将输入序列映射到一个固定维数的向量,然后用另一个深层LSTM从该向量解码出目标序列。我们的主要结果是:在WMT14数据集的英译法任务上,LSTM生成的译文在整个测试集上取得了34.8的BLEU分数,其中词表外单词会使LSTM的BLEU分数受到惩罚。此外,LSTM在长句上也没有困难。作为对比,一个基于短语的SMT系统在同一数据集上的BLEU分数为33.3。当我们用LSTM对上述SMT系统产生的1000个候选假设重新排序时,其BLEU分数提高到36.5,接近此前在该任务上的最好结果。LSTM还学到了对词序敏感、而对主动语态和被动语态相对不变的合理短语与句子表示。最后,我们发现将所有源句子(而非目标句子)中的单词顺序颠倒可以显著提高LSTM的性能,因为这样做在源句子和目标句子之间引入了许多短期依赖,使优化问题变得更容易。
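[106] 描述的编码器-解码器模式可以用 PyTorch 风格的草图概括:一个 LSTM 把(颠倒后的)源序列压缩为固定大小的状态,另一个 LSTM 从该状态出发展开生成目标序列。以下示意中的维度、词表大小等均为说明用的占位,并非论文中 4 层、1000 单元的配置:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=32, hidden=64):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt_in):
        # Reversing the source, as in the paper, shortens early-term dependencies.
        _, state = self.encoder(self.src_emb(src.flip(dims=[1])))
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), state)   # teacher forcing
        return self.out(dec_out)                                  # logits per target step

model = Seq2Seq(src_vocab=100, tgt_vocab=120)
src = torch.randint(0, 100, (2, 7))      # batch of 2 source sentences, length 7
tgt_in = torch.randint(0, 120, (2, 5))   # shifted target inputs, length 5
logits = model(src, tgt_in)
print(logits.shape)                       # torch.Size([2, 5, 120])
```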
[108] Show and tell: A neural image caption generator (2015)
Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
摘要: 自动描述图像内容是人工智能中连接计算机视觉与自然语言处理的一个基本问题。在本文中,我们提出了一个基于深度循环结构的生成模型,它结合了计算机视觉和机器翻译的最新进展,可用于生成描述图像的自然语句。模型的训练目标是在给定训练图像的条件下最大化目标描述语句的似然。在多个数据集上的实验表明了该模型的准确性,以及它仅从图像描述中学到的语言的流畅性。我们的模型通常相当准确,我们从定性和定量两方面进行了验证。例如,在Pascal数据集上,当前最先进的BLEU-1分数(越高越好)为25,而我们的方法达到59,可与约69的人类水平相比较;我们还展示了BLEU-1分数在Flickr30k上从56提高到66,在SBU上从19提高到28。最后,在新发布的COCO数据集上,我们取得了27.7的BLEU-4,这是当前的最高水平。
下载地址 | 返回目录 | [10.1109/CVPR.2015.7298935]
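下面给出摘要中“CNN编码图像、LSTM解码句子并最大化目标描述似然”这一思路的极简示意。假设使用PyTorch;编码器用一个小型CNN代替论文中的大型图像分类网络,仅供参考。

```python
import torch
import torch.nn as nn

class ShowAndTell(nn.Module):
    def __init__(self, vocab, emb=256, hidden=512):
        super().__init__()
        self.cnn = nn.Sequential(                      # 图像编码器(示例结构)
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, emb))
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab)

    def forward(self, image, caption_in):
        img_feat = self.cnn(image).unsqueeze(1)        # 图像特征作为第0步输入
        seq = torch.cat([img_feat, self.embed(caption_in)], dim=1)
        h, _ = self.lstm(seq)
        return self.out(h)                             # 配合交叉熵损失即最大化似然

model = ShowAndTell(vocab=5000)
logits = model(torch.randn(2, 3, 224, 224), torch.randint(0, 5000, (2, 15)))
```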
[109] Show, attend and tell: Neural image caption generation with visual attention (2015)
Abstract: Inspired by recent work in machine translation and object detection, we introduce an attention based model that automatically learns to describe the content of images. We describe how we can train this model in a deterministic manner using standard backpropagation techniques and stochastically by maximizing a variational lower bound. We also show through visualization how the model is able to automatically learn to fix its gaze on salient objects while generating the corresponding words in the output sequence. We validate the use of attention with state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
摘要: 受机器翻译和目标检测领域最新工作的启发,我们引入了一种基于注意力的模型,它可以自动学习描述图像的内容。我们描述了如何用标准反向传播技术以确定性方式训练该模型,以及如何通过最大化变分下界进行随机训练。我们还通过可视化展示了模型在生成输出序列中相应单词的同时,能自动学会把注视固定在显著的物体上。我们在三个基准数据集(Flickr8k、Flickr30k和MS COCO)上以最先进的性能验证了注意力机制的有效性。
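下面用一段示意代码说明摘要中的软注意力机制:解码每个词时,对CNN特征图的各空间位置计算权重并加权求和得到上下文向量。假设使用PyTorch,写法属于常见的加性注意力,并非论文的原始实现。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, L, feat_dim),把CNN特征图展平成L个空间位置
        # hidden: (batch, hidden_dim),解码器上一步的隐状态
        e = self.score(torch.tanh(self.feat_proj(feats) +
                                  self.hidden_proj(hidden).unsqueeze(1)))
        alpha = F.softmax(e, dim=1)              # 每个位置的“注视”权重
        context = (alpha * feats).sum(dim=1)     # 加权求和得到上下文向量
        return context, alpha.squeeze(-1)

attn = SoftAttention(feat_dim=512, hidden_dim=512)
ctx, alpha = attn(torch.randn(2, 196, 512), torch.randn(2, 512))
```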
[110] Siamese Thesis (2015)
Abstract: The process of learning good features for machine learning applications can be very computationally expensive and may prove difficult in cases where little data is available. A prototypical example of this is the one-shot learning setting, in which we must correctly make predictions given only a single example of each new class. In this paper, we explore a method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs. Once a network has been tuned, we can then capitalize on powerful discriminative features to generalize the predictive power of the network not just to new data, but to entirely new classes from unknown distributions. Using a convolutional architecture, we are able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.
摘要: 为机器学习应用学习良好特征的过程可能计算代价很高,而且在可用数据极少时会变得困难。一个典型的例子是单样本(one-shot)学习:我们必须在每个新类别只给出一个样本的情况下做出正确预测。在本文中,我们探索了一种学习孪生(siamese)神经网络的方法,这种网络采用独特的结构对输入之间的相似度进行自然排序。网络调优完成后,我们便可以利用其强大的判别特征,把网络的预测能力不仅推广到新数据,还推广到来自未知分布的全新类别。使用卷积架构,我们在单样本分类任务上取得了接近最先进水平的强劲结果,超过了其他深度学习模型。
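下面是孪生网络相似度打分的一种常见实现示意:两路共享权重的卷积编码器,对两个编码的逐元素L1距离做逻辑回归,输出“同类”的概率,可用于单样本分类。假设使用PyTorch,网络结构与输入尺寸均为示例。

```python
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # 两路输入共用同一个编码器
            nn.Conv2d(1, 64, 10), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 7), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, 256))
        self.head = nn.Linear(256, 1)            # 对L1距离做相似度打分

    def forward(self, x1, x2):
        f1, f2 = self.encoder(x1), self.encoder(x2)
        return torch.sigmoid(self.head(torch.abs(f1 - f2)))

net = SiameseNet()
p_same = net(torch.randn(4, 1, 105, 105), torch.randn(4, 1, 105, 105))
```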
[111] Sim-to-Real Robot Learning from Pixels with Progressive Nets (2016)
Abstract: Applying end-to-end learning to solve complex, interactive, pixel-driven control tasks on a robot is an unsolved problem. Deep Reinforcement Learning algorithms are too slow to achieve performance on a real robot, but their potential has been demonstrated in simulated environments. We propose using progressive networks to bridge the reality gap and transfer learned policies from simulation to the real world. The progressive net approach is a general framework that enables reuse of everything from low-level visual features to high-level policies for transfer to new tasks, enabling a compositional, yet simple, approach to building complex skills. We present an early demonstration of this approach with a number of experiments in the domain of robot manipulation that focus on bridging the reality gap. Unlike other proposed approaches, our real-world experiments demonstrate successful task learning from raw visual input on a fully actuated robot manipulator. Moreover, rather than relying on model-based trajectory optimisation, the task learning is accomplished using only deep reinforcement learning and sparse rewards.
摘要: 用端到端学习来解决机器人上复杂、交互式、像素驱动的控制任务仍是一个未解决的问题。深度强化学习算法太慢,难以直接在真实机器人上达到所需性能,但其潜力已在仿真环境中得到证明。我们建议使用渐进式网络(progressive networks)来弥合现实差距,把学到的策略从仿真迁移到真实世界。渐进式网络是一个通用框架,可以复用从低级视觉特征到高级策略的一切内容并迁移到新任务,从而以组合式而又简单的方式构建复杂技能。我们通过机器人操纵领域中一系列着眼于弥合现实差距的实验,对该方法进行了早期演示。与其他已提出的方法不同,我们的真实世界实验展示了在全驱动机械臂上从原始视觉输入成功学习任务的能力。而且,任务学习不依赖基于模型的轨迹优化,而是仅通过深度强化学习和稀疏奖励完成。
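下面用一段概念性代码示意渐进式网络的核心机制:先训练并冻结第1列(例如在仿真中),再为新任务新建第2列,并通过横向连接复用第1列的中间特征。假设使用PyTorch,这里用两层全连接代替论文中的卷积策略网络,仅为说明思路。

```python
import torch
import torch.nn as nn

class ProgressiveColumn2(nn.Module):
    def __init__(self, in_dim=64, hidden=128, n_actions=4):
        super().__init__()
        # 第1列:假定已在仿真中训练完毕,参数冻结
        self.col1_l1 = nn.Linear(in_dim, hidden)
        self.col1_l2 = nn.Linear(hidden, hidden)
        for p in [*self.col1_l1.parameters(), *self.col1_l2.parameters()]:
            p.requires_grad = False
        # 第2列:在新任务(真实机器人)上训练,接收第1列的横向连接
        self.col2_l1 = nn.Linear(in_dim, hidden)
        self.col2_l2 = nn.Linear(hidden, hidden)
        self.lateral = nn.Linear(hidden, hidden)     # 第1列第1层 -> 第2列第2层
        self.policy = nn.Linear(hidden, n_actions)

    def forward(self, x):
        h1 = torch.relu(self.col1_l1(x))             # 冻结列的中间特征
        h2 = torch.relu(self.col2_l1(x))
        h2 = torch.relu(self.col2_l2(h2) + self.lateral(h1))
        return self.policy(h2)

net = ProgressiveColumn2()
logits = net(torch.randn(8, 64))
```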
[112] Sparse generative adversarial network (2019)
Abstract: We propose a new approach to Generative Adversarial Networks (GANs) to achieve an improved performance with additional robustness to its so-called and well-recognized mode collapse. We first proceed by mapping the desired data onto a frame-based space for a sparse representation to lift any limitation of small support features prior to learning the structure. To that end, we start by dividing an image into multiple patches and modifying the role of the generative network from producing an entire image, at once, to creating a sparse representation vector for each image patch. We synthesize an entire image by multiplying generated sparse representations to a pre-trained dictionary and assembling the resulting patches. This approach restricts the output of the generator to a particular structure, obtained by imposing a Union of Subspaces (UoS) model to the original training data, leading to more realistic images, while maintaining a desired diversity. To further regularize GANs in generating high-quality images and to avoid the notorious mode-collapse problem, we introduce a third player in GANs, called reconstructor. This player utilizes an auto-encoding scheme to ensure that first, the input-output relation in the generator is injective and second each real image corresponds to some input noise. We present a number of experiments, where the proposed algorithm shows a remarkably higher inception score compared to the equivalent conventional GANs.
摘要: 我们为生成对抗网络(GAN)提出了一种新方法,在提升性能的同时增强对其众所周知的模式崩溃问题的鲁棒性。我们首先把目标数据映射到一个基于框架(frame)的空间进行稀疏表示,以在学习结构之前解除对小支撑特征的限制。为此,我们先把图像划分为多个图像块,并把生成网络的职责从一次生成整幅图像改为为每个图像块生成一个稀疏表示向量。我们把生成的稀疏表示乘以一个预训练的字典并拼装得到的图像块,从而合成整幅图像。这种做法把生成器的输出约束到一种特定结构上(该结构通过对原始训练数据施加子空间并集(UoS)模型得到),从而在保持所需多样性的同时生成更逼真的图像。为了进一步正则化GAN以生成高质量图像并避免臭名昭著的模式崩溃问题,我们在GAN中引入了第三个参与者,称为重构器(reconstructor)。它利用自编码方案确保:第一,生成器中的输入-输出关系是单射的;第二,每张真实图像都对应某个输入噪声。我们给出了多组实验,所提算法的Inception分数明显高于同等规模的常规GAN。
下载地址 | 返回目录 | [10.1109/ICCVW.2019.00369]
[113] Spatial pyramid pooling in deep convolutional networks for visual recognition (2014)
Abstract: Existing deep convolutional neural networks (CNNs) require a fixed-size (e.g. 224×224) input image. This requirement is “artificial” and may hurt the recognition accuracy for the images or sub-images of an arbitrary size/scale. In this work, we equip the networks with a more principled pooling strategy, “spatial pyramid pooling”, to eliminate the above requirement. The new network structure, called SPP-net, can generate a fixed-length representation regardless of image size/scale. By removing the fixed-size limitation, we can improve all CNN-based image classification methods in general. Our SPP-net achieves state-of-the-art accuracy on the datasets of ImageNet 2012, Pascal VOC 2007, and Caltech101. The power of SPP-net is more significant in object detection. Using SPP-net, we compute the feature maps from the entire image only once, and then pool features in arbitrary regions (sub-images) to generate fixed-length representations for training the detectors. This method avoids repeatedly computing the convolutional features. In processing test images, our method computes convolutional features 30-170× faster than the recent leading method R-CNN (and 24-64× faster overall), while achieving better or comparable accuracy on Pascal VOC 2007. © 2014 Springer International Publishing.
摘要: 现有的深度卷积神经网络(CNN)需要固定尺寸(例如224×224)的输入图像。这一要求是“人为的”,并且可能损害对任意大小/比例的图像或子图像的识别精度。在这项工作中,我们为网络配备了一种更符合原则的池化策略,即“空间金字塔池化”,以消除上述限制。这种新的网络结构称为SPP-net,无论图像大小/比例如何,都能生成固定长度的表示。通过消除固定尺寸的限制,我们总体上可以改进所有基于CNN的图像分类方法。我们的SPP-net在ImageNet 2012、Pascal VOC 2007和Caltech101数据集上达到了最先进的精度。SPP-net的优势在目标检测中更为显著:我们只需对整幅图像计算一次特征图,然后在任意区域(子图像)上池化特征,生成用于训练检测器的固定长度表示,从而避免了重复计算卷积特征。在处理测试图像时,我们的方法计算卷积特征的速度比最近领先的R-CNN方法快30-170倍(整体快24-64倍),同时在Pascal VOC 2007上取得更好或相当的精度。© 2014 Springer International Publishing.
下载地址 | 返回目录 | [10.1007/978-3-319-10578-9_23]
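摘要中的空间金字塔池化可以写成一个与输入尺寸无关的池化函数:对特征图分别做1×1、2×2、4×4等多级自适应最大池化再拼接,得到固定长度表示。下面是一个假设使用PyTorch的最小示意。

```python
import torch
import torch.nn.functional as F

def spatial_pyramid_pool(feat, levels=(1, 2, 4)):
    # feat: (batch, C, H, W),H、W可以任意
    pooled = [F.adaptive_max_pool2d(feat, output_size=k).flatten(1)
              for k in levels]                       # 每一级输出 C*k*k 维
    return torch.cat(pooled, dim=1)                  # 固定长度:C*(1+4+16)

x = torch.randn(2, 256, 13, 17)                      # 任意空间尺寸的卷积特征
print(spatial_pyramid_pool(x).shape)                 # torch.Size([2, 5376])
```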
[114] Speech recognition with deep recurrent neural networks (2013)
Abstract: Recurrent neural networks (RNNs) are a powerful model for sequential data. End-to-end training methods such as Connectionist Temporal Classification make it possible to train RNNs for sequence labelling problems where the input-output alignment is unknown. The combination of these methods with the Long Short-term Memory RNN architecture has proved particularly fruitful, delivering state-of-the-art results in cursive handwriting recognition. However RNN performance in speech recognition has so far been disappointing, with better results returned by deep feedforward networks. This paper investigates deep recurrent neural networks, which combine the multiple levels of representation that have proved so effective in deep networks with the flexible use of long range context that empowers RNNs. When trained end-to-end with suitable regularisation, we find that deep Long Short-term Memory RNNs achieve a test set error of 17.7% on the TIMIT phoneme recognition benchmark, which to our knowledge is the best recorded score. © 2013 IEEE.
摘要: 循环神经网络(RNN)是处理序列数据的强大模型。连接时序分类(CTC)等端到端训练方法使得在输入输出对齐未知的序列标注问题上训练RNN成为可能。这些方法与长短期记忆(LSTM)RNN架构的结合已被证明特别有效,在草书手写识别中取得了最先进的结果。然而到目前为止,RNN在语音识别中的表现一直令人失望,深度前馈网络反而取得了更好的结果。本文研究深度循环神经网络,它把在深度网络中被证明非常有效的多层次表示,与赋予RNN能力的长程上下文的灵活利用结合起来。在采用适当正则化进行端到端训练后,我们发现深层LSTM RNN在TIMIT音素识别基准上取得了17.7%的测试集错误率,据我们所知这是有记录以来的最好成绩。© 2013 IEEE。
下载地址 | 返回目录 | [10.1109/ICASSP.2013.6638947]
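下面按摘要中“深度双向LSTM加CTC目标函数”的组合给出一个最小训练片段示意。假设使用PyTorch;特征维度、层数与标签数均为示例。

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepBiLSTMCTC(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, layers=3, n_labels=62):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, num_layers=layers,
                           bidirectional=True, batch_first=True)
        self.out = nn.Linear(hidden * 2, n_labels + 1)    # +1 为CTC的空白符

    def forward(self, x):
        h, _ = self.rnn(x)
        return F.log_softmax(self.out(h), dim=-1)

model = DeepBiLSTMCTC()
x = torch.randn(4, 100, 40)                  # (batch, 帧数, 声学特征维度)
log_probs = model(x).transpose(0, 1)         # CTCLoss要求形状为(T, batch, 类别数)
targets = torch.randint(1, 63, (4, 20))      # 标签取值避开空白符0
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           input_lengths=torch.full((4,), 100),
                           target_lengths=torch.full((4,), 20))
```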
[115] SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016)
Abstract: Recent research on deep neural networks has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple DNN architectures that achieve that accuracy level. With equivalent accuracy, smaller DNN architectures offer at least three advantages: (1) Smaller DNNs require less communication across servers during distributed training. (2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car. (3) Smaller DNNs are more feasible to deploy on FPGAs and other hardware with limited memory. To provide all of these advantages, we propose a small DNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters. Additionally, with model compression techniques we are able to compress SqueezeNet to less than 0.5MB (510x smaller than AlexNet). The SqueezeNet architecture is available for download here: https://github.com/DeepScale/SqueezeNet
摘要: 近年来关于深度神经网络的研究主要集中在提高准确率上。对于给定的准确率水平,通常可以找到多个能达到该水平的DNN架构。在准确率相当的情况下,较小的DNN架构至少有三个优点:(1)较小的DNN在分布式训练期间需要更少的跨服务器通信;(2)把新模型从云端下发到自动驾驶汽车所需的带宽更少;(3)更适合部署在内存有限的FPGA等硬件上。为兼得这些优点,我们提出了一种名为SqueezeNet的小型DNN架构。SqueezeNet以少50倍的参数在ImageNet上达到了AlexNet级别的准确率。此外,借助模型压缩技术,我们可以把SqueezeNet压缩到小于0.5MB(比AlexNet小510倍)。SqueezeNet架构可在此处下载:https://github.com/DeepScale/SqueezeNet
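SqueezeNet减少参数量的核心构件是Fire模块:先用1×1卷积“压缩”通道,再并行用1×1和3×3卷积“扩张”。下面是该模块的一个简化示意(假设PyTorch,通道数仅为示例)。

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)      # 压缩
        self.expand1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)  # 1x1扩张
        self.expand3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        s = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1(s)),
                          self.relu(self.expand3(s))], dim=1)

fire = Fire(96, squeeze_ch=16, expand_ch=64)
y = fire(torch.randn(1, 96, 55, 55))         # 输出通道 = 64 + 64 = 128
```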
[116] Squeezed Very Deep Convolutional Neural Networks for Text Classification (2019)
Abstract: Embedding artificial intelligence on constrained platforms has become a trend since the growth of embedded systems and mobile devices, experimented in recent years. Although constrained platforms do not have enough processing capabilities to train a sophisticated deep learning model, like convolutional neural networks (CNN), they are already capable of performing inference locally by using a previously trained embedded model. This approach enables numerous advantages such as privacy, response latency, and no real time network dependence. Still, the use of a local CNN model on constrained platforms is restricted by its storage size. Most of the research in CNNs has focused on increasing network depth to improve accuracy. In the text classification area, deep models were proposed with excellent performance but relying on large architectures with thousands of parameters, and consequently, high storage size. We propose to modify the structure of the Very Deep Convolutional Neural Networks (VDCNN) model to reduce its storage size while keeping the model performance. In this paper, we evaluate the impact of Temporal Depthwise Separable Convolutions and Global Average Pooling in the network parameters, storage size, dedicated hardware dependence, and accuracy. The proposed squeezed model (SVDCNN) is between 10x and 20x smaller than the original version, depending on the network depth, maintaining a maximum disk size of 6MB. Regarding accuracy, the network experiences a loss between 0.4% and 1.3% in the accuracy performance while obtaining lower latency over non-dedicated hardware and higher inference time ratio compared to the baseline model.
摘要: 近年来,随着嵌入式系统和移动设备的发展,把人工智能嵌入受限平台已成为一种趋势。尽管受限平台没有足够的算力来训练复杂的深度学习模型(例如卷积神经网络,CNN),但借助预先训练好的嵌入式模型,它们已经能够在本地执行推理。这种方式带来了隐私、响应延迟、无需实时联网等诸多优势。然而,在受限平台上使用本地CNN模型仍受其存储大小的限制。CNN的大多数研究都集中在增加网络深度以提高准确率上。在文本分类领域,已提出的深度模型性能出色,但依赖于拥有数千参数的大型体系结构,存储开销因此很高。我们建议修改超深卷积神经网络(VDCNN)模型的结构,在保持模型性能的同时减少其存储大小。在本文中,我们评估了时间维深度可分离卷积和全局平均池化对网络参数量、存储大小、专用硬件依赖性和准确率的影响。所提出的压缩模型(SVDCNN)比原始版本小10到20倍(取决于网络深度),磁盘占用最大为6MB。在准确率方面,与基线模型相比,网络的准确率损失介于0.4%和1.3%之间,同时在非专用硬件上获得了更低的延迟和更高的推理时间比。
下载地址 | 返回目录 | [10.1007/978-3-030-30487-4_16]
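下面示意摘要中提到的两个压缩手段:时间维的深度可分离卷积(depthwise加pointwise两步)以及用全局平均池化代替“展平加全连接”。假设使用PyTorch,通道数与序列长度仅为示例。

```python
import torch
import torch.nn as nn

class SeparableTemporalConv(nn.Module):
    def __init__(self, channels, out_channels, kernel=3):
        super().__init__()
        # depthwise:每个通道单独卷积;pointwise:1x1卷积混合通道
        self.depthwise = nn.Conv1d(channels, channels, kernel,
                                   padding=kernel // 2, groups=channels)
        self.pointwise = nn.Conv1d(channels, out_channels, kernel_size=1)

    def forward(self, x):                        # x: (batch, 通道, 序列长度)
        return self.pointwise(self.depthwise(x))

block = SeparableTemporalConv(64, 128)
x = torch.randn(8, 64, 1014)                     # 字符级文本的嵌入序列(示例)
h = block(x)
pooled = h.mean(dim=-1)                          # 全局平均池化 -> (8, 128)
```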
[117] Studies on Ozone-oxidation of Dye in a Bubble Column Reactor at Different pH and Different Oxidation-reduction Potential. (2010)
Abstract: We show how to use “complementary priors” to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
摘要: 我们展示了如何使用“互补先验”来消除“解释消除”(explaining-away)效应,这种效应使得在具有许多隐藏层的稠密连接信念网中难以进行推理。利用互补先验,我们推导出一种快速的贪心算法:只要最顶上的两层构成无向联想记忆,就可以一次一层地学习深层有向信念网络。这一快速贪心算法用于初始化一个较慢的学习过程,后者使用wake-sleep算法的对比版本对权重进行微调。经过微调后,一个具有三个隐藏层的网络构成了手写数字图像及其标签联合分布的优秀生成模型,其数字分类效果优于最好的判别式学习算法。数字所在的低维流形由顶层联想记忆自由能景观中的长沟壑来建模,通过有向连接把联想记忆“心中所想”显示出来,就可以很容易地探索这些沟壑。
下载地址 | 返回目录 | [10.7763/ijesd.2010.v1.67]
[118] Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours (2016)
Abstract: Current model free learning-based robot grasping approaches exploit human-labeled datasets for training the models. However, there are two problems with such a methodology: (a) since each object can be grasped in multiple ways, manually labeling grasp locations is not a trivial task; (b) human labeling is biased by semantics. While there have been attempts to train robots using trial-and-error experiments, the amount of data used in such experiments remains substantially low and hence makes the learner prone to over-fitting. In this paper, we take the leap of increasing the available training data to 40 times more than prior work, leading to a dataset size of 50K data points collected over 700 hours of robot grasping attempts. This allows us to train a Convolutional Neural Network (CNN) for the task of predicting grasp locations without severe overfitting. In our formulation, we recast the regression problem to an 18-way binary classification over image patches. We also present a multi-stage learning approach where a CNN trained in one stage is used to collect hard negatives in subsequent stages. Our experiments clearly show the benefit of using large-scale datasets (and multi-stage training) for the task of grasping. We also compare to several baselines and show state-of-the-art performance on generalization to unseen objects for grasping.
摘要: 当前无模型(model-free)的基于学习的机器人抓取方法利用人工标注的数据集来训练模型。但这种做法存在两个问题:(a)由于每个物体都可以用多种方式抓取,手动标注抓取位置并非易事;(b)人工标注会受语义影响而带有偏差。虽然已有利用试错实验训练机器人的尝试,但此类实验使用的数据量仍然很小,容易使学习器过拟合。在本文中,我们把可用训练数据提高到此前工作的40倍,得到一个在700小时机器人抓取尝试中收集的5万个数据点的数据集。这使我们可以训练一个卷积神经网络(CNN)来预测抓取位置而不会严重过拟合。在我们的表述中,我们把回归问题改写为图像块上的18路二分类。我们还提出了一种多阶段学习方法,用前一阶段训练的CNN在后续阶段收集困难负样本(hard negatives)。我们的实验清楚地表明了大规模数据集(及多阶段训练)对抓取任务的好处。我们还与若干基线进行比较,并在对未见过物体的抓取泛化上展示了最先进的性能。
下载地址 | 返回目录 | [10.1109/ICRA.2016.7487517]
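摘要中“把抓取角度回归改写为图像块上的18路二分类”可示意如下:网络对每个图像块输出18个角度区间各自的可抓取logit,配合逐元素二元交叉熵训练。假设使用PyTorch,网络结构为简化示例。

```python
import torch
import torch.nn as nn

class GraspPatchNet(nn.Module):
    def __init__(self, n_angle_bins=18):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(64, n_angle_bins)      # 每个角度区间一个logit

    def forward(self, patch):
        return self.head(self.features(patch))

net = GraspPatchNet()
logits = net(torch.randn(16, 3, 227, 227))
loss = nn.BCEWithLogitsLoss()(logits, torch.randint(0, 2, (16, 18)).float())
```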
[119] Target-driven visual navigation in indoor scenes using deep reinforcement learning (2017)
Abstract: Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to be applied to real-world scenarios. In this paper, we address these two issues and apply our model to target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than the state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), (4) is end-to-end trainable and does not need feature engineering, feature matching between frames or 3D reconstruction of the environment.
摘要: 深度强化学习中两个较少被讨论的问题是:(1)缺乏对新目标的泛化能力;(2)数据效率低下,即模型需要多轮(往往代价高昂)的试错才能收敛,因而难以应用于真实场景。在本文中,我们针对这两个问题展开研究,并把模型应用于目标驱动的视觉导航。针对第一个问题,我们提出一种演员-评论家(actor-critic)模型,其策略同时是目标和当前状态的函数,从而获得更好的泛化能力。针对第二个问题,我们提出AI2-THOR框架,它提供了带有高质量3D场景和物理引擎的环境。我们的框架使智能体能够采取行动并与物体交互,因此可以高效地收集大量训练样本。我们表明,所提出的方法:(1)比最新的深度强化学习方法收敛更快;(2)能跨目标和场景泛化;(3)尽管模型是在仿真中训练的,只需少量微调即可推广到真实机器人场景;(4)可端到端训练,不需要特征工程、帧间特征匹配或环境的3D重建。
下载地址 | 返回目录 | [10.1109/ICRA.2017.7989381]
[120] Teaching machines to read and comprehend (2015)
Abstract: Teaching machines to read natural language documents remains an elusive challenge. Machine reading systems can be tested on their ability to answer questions posed on the contents of documents that they have seen, but until now large scale training and test datasets have been missing for this type of evaluation. In this work we define a new methodology that resolves this bottleneck and provides large scale supervised reading comprehension data. This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.
摘要: 让机器阅读自然语言文档仍然是一个难以实现的挑战。可以通过让机器阅读系统回答针对其所读文档内容提出的问题来测试其能力,但此前此类评估一直缺少大规模的训练和测试数据集。在这项工作中,我们定义了一种新方法来解决这一瓶颈,并提供大规模的有监督阅读理解数据。这使我们能够开发一类基于注意力的深度神经网络,它们只需极少的语言结构先验知识即可学会阅读真实文档并回答复杂问题。
[121] Texture networks: Feed-forward synthesis of textures and stylized images (2016)
Abstract: Gatys et al. recently demonstrated that deep networks can generate beautiful textures and stylized images from a single texture example. However, their methods require a slow and memory-consuming optimization process. We propose here an alternative approach that moves the computational burden to a learning stage. Given a single example of a texture, our approach trains compact feed-forward convolutional networks to generate multiple samples of the same texture of arbitrary size and to transfer artistic style from a given image to any other image. The resulting networks are remarkably light-weight and can generate textures of quality comparable to Gatys et al., but hundreds of times faster. More generally, our approach highlights the power and flexibility of generative feed-forward models trained with complex and expressive loss functions.
摘要: Gatys等人最近证明,深度网络可以仅凭单个纹理样例生成漂亮的纹理和风格化图像。然而,他们的方法需要缓慢且消耗内存的优化过程。我们在此提出一种替代方案,把计算负担转移到学习阶段。给定单个纹理样例,我们的方法训练紧凑的前馈卷积网络,既能生成同一纹理的任意大小的多个样本,也能把给定图像的艺术风格迁移到任何其他图像。得到的网络非常轻量,能生成质量与Gatys等人相当的纹理,但速度快数百倍。更一般地,我们的方法凸显了用复杂且富有表达力的损失函数训练的前馈生成模型的能力与灵活性。
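纹理网络这类前馈生成方法的训练目标通常建立在Gatys式的特征统计匹配损失之上,其核心是卷积特征的Gram矩阵。下面只给出Gram矩阵计算的示意(假设PyTorch),特征提取网络与完整训练流程从略。

```python
import torch

def gram_matrix(feat):
    # feat: (batch, C, H, W) 的卷积特征
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # (batch, C, C),通道间相关统计

g = gram_matrix(torch.randn(2, 64, 32, 32))
```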
[122] Toward Human Parity in Conversational Speech Recognition (2017)
Abstract: Conversational speech recognition has served as a flagship speech recognition task since the release of the Switchboard corpus in the 1990s. In this paper, we measure a human error rate on the widely used NIST 2000 test set for commercial bulk transcription. The error rate of professional transcribers is 5.9% for the Switchboard portion of the data, in which newly acquainted pairs of people discuss an assigned topic, and 11.3% for the CallHome portion, where friends and family members have open-ended conversations. In both cases, our automated system edges past the human benchmark, achieving error rates of 5.8% and 11.0%.
摘要: 自1990年代Switchboard语料库发布以来,会话语音识别一直是一项旗舰语音识别任务。在本文中,我们在广泛用于商业批量转录的NIST 2000测试集上测量了人工错误率。专业转录员在数据的Switchboard部分(由新认识的两人讨论指定话题)错误率为5.9%,在CallHome部分(朋友与家人之间的开放式对话)错误率为11.3%。在这两种情况下,我们的自动化系统都略微超过了人工基准,错误率分别为5.8%和11.0%。
下载地址 | 返回目录 | [10.1109/TASLP.2017.2756440]
[123] Towards AI-complete question answering: A set of prerequisite toy tasks (2016)
Abstract: One long-term goal of machine learning research is to produce methods that are applicable to reasoning and natural language, in particular building an intelligent dialogue agent. To measure progress towards that goal, we argue for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering. Our tasks measure understanding in several ways: whether a system is able to answer questions via chaining facts, simple induction, deduction and many more. The tasks are designed to be prerequisites for any system that aims to be capable of conversing with a human. We believe many existing learning systems can currently not solve them, and hence our aim is to classify these tasks into skill sets, so that researchers can identify (and then rectify) the failings of their systems. We also extend and improve the recently introduced Memory Networks model, and show it is able to solve some, but not all, of the tasks.
摘要: 机器学习研究的一个长期目标是产生适用于推理和自然语言的方法,尤其是构建智能对话代理。为了衡量朝这一目标的进展,我们论证了一组通过问答来评估阅读理解能力的代理任务的有用性。我们的任务从多个方面衡量理解能力:系统是否能够通过事实链接、简单归纳、演绎等方式回答问题。这些任务被设计为任何旨在与人类对话的系统的先决条件。我们认为许多现有学习系统目前还无法解决它们,因此我们的目标是把这些任务按技能集合分类,以便研究人员识别(进而修正)其系统的不足。我们还扩展并改进了最近提出的记忆网络(Memory Networks)模型,并表明它能够解决其中一部分(但不是全部)任务。
[124] Towards end-to-end speech recognition with recurrent neural networks (2014)
Abstract: This paper presents a speech recognition system that directly transcribes audio data with text, without requiring an intermediate phonetic representation. The system is based on a combination of the deep bidirectional LSTM recurrent neural network architecture and the Connectionist Temporal Classification objective function. A modification to the objective function is introduced that trains the network to minimise the expectation of an arbitrary transcription loss function. This allows a direct optimisation of the word error rate, even in the absence of a lexicon or language model. The system achieves a word error rate of 27.3% on the Wall Street Journal corpus with no prior linguistic information, 21.9% with only a lexicon of allowed words, and 8.2% with a trigram language model. Combining the network with a baseline system further reduces the error rate to 6.7%.
摘要: 本文提出了一种直接把音频数据转录为文本的语音识别系统,不需要中间的音素表示。该系统基于深度双向LSTM循环神经网络架构与连接时序分类(CTC)目标函数的组合。我们对目标函数做了一处修改,训练网络去最小化任意转录损失函数的期望,从而即使在没有词典或语言模型的情况下,也能直接优化词错误率。该系统在《华尔街日报》语料库上,在没有任何先验语言信息时词错误率为27.3%,只使用允许单词的词典时为21.9%,使用三元(trigram)语言模型时为8.2%。把该网络与基线系统结合,可把错误率进一步降低到6.7%。
[125] Transferring Rich Feature Hierarchies for Robust Visual Tracking (2015)
Abstract: Convolutional neural network (CNN) models have demonstrated great success in various computer vision tasks including image classification and object detection. However, some equally important tasks such as visual tracking remain relatively unexplored. We believe that a major hurdle that hinders the application of CNN to visual tracking is the lack of properly labeled training data. While existing applications that liberate the power of CNN often need an enormous amount of training data in the order of millions, visual tracking applications typically have only one labeled example in the first frame of each video. We address this research issue here by pre-training a CNN offline and then transferring the rich feature hierarchies learned to online tracking. The CNN is also fine-tuned during online tracking to adapt to the appearance of the tracked target specified in the first video frame. To fit the characteristics of object tracking, we first pre-train the CNN to recognize what is an object, and then propose to generate a probability map instead of producing a simple class label. Using two challenging open benchmarks for performance evaluation, our proposed tracker has demonstrated substantial improvement over other state-of-the-art trackers.
摘要: 卷积神经网络(CNN)模型已在图像分类和目标检测等各种计算机视觉任务中取得了巨大成功,但视觉跟踪等同样重要的任务仍相对未被充分探索。我们认为阻碍CNN应用于视觉跟踪的主要障碍是缺乏正确标注的训练数据:现有发挥CNN能力的应用通常需要数以百万计的训练数据,而视觉跟踪应用通常只有每段视频第一帧中的一个标注样本。我们通过先离线预训练CNN、再把学到的丰富特征层次迁移到在线跟踪中来解决这一研究问题。在线跟踪过程中还会对CNN进行微调,以适应第一帧中指定的跟踪目标的外观。为了契合目标跟踪的特点,我们先预训练CNN去识别“什么是物体”,然后提出生成概率图而不是简单的类别标签。在两个具有挑战性的公开基准上进行的性能评估表明,我们提出的跟踪器比其他最先进的跟踪器有显著改进。
[126] Unsupervised representation learning with deep convolutional generative adversarial networks (2016)
Abstract: In recent years, supervised learning with convolutional networks (CNNs) has seen huge adoption in computer vision applications. Comparatively, unsupervised learning with CNNs has received less attention. In this work we hope to help bridge the gap between the success of CNNs for supervised learning and unsupervised learning. We introduce a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrate that they are a strong candidate for unsupervised learning. Training on various image datasets, we show convincing evidence that our deep convolutional adversarial pair learns a hierarchy of representations from object parts to scenes in both the generator and discriminator. Additionally, we use the learned features for novel tasks - demonstrating their applicability as general image representations.
摘要: 近年来,使用卷积网络(CNN)的监督学习在计算机视觉应用中得到了广泛采用,相比之下,CNN的无监督学习受到的关注较少。在这项工作中,我们希望帮助弥合CNN在监督学习上的成功与无监督学习之间的差距。我们提出了一类称为深度卷积生成对抗网络(DCGAN)的CNN,它们带有特定的架构约束,并证明它们是无监督学习的有力候选方法。通过在多个图像数据集上训练,我们给出了令人信服的证据,表明我们的深度卷积对抗网络对在生成器和判别器中都学到了从物体局部到场景的表示层次。此外,我们把学到的特征用于新任务,展示了它们作为通用图像表示的适用性。
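下面按DCGAN常见的架构约束(转置卷积上采样、BatchNorm、生成器末层用Tanh)给出生成器的一个简化示意。假设使用PyTorch,通道数与输出分辨率仅为示例,并非论文的原始配置。

```python
import torch
import torch.nn as nn

generator = nn.Sequential(
    # 把100维噪声逐步上采样为3通道图像
    nn.ConvTranspose2d(100, 256, 4, 1, 0, bias=False), nn.BatchNorm2d(256), nn.ReLU(True),
    nn.ConvTranspose2d(256, 128, 4, 2, 1, bias=False), nn.BatchNorm2d(128), nn.ReLU(True),
    nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),  nn.BatchNorm2d(64),  nn.ReLU(True),
    nn.ConvTranspose2d(64, 3, 4, 2, 1, bias=False),    nn.Tanh())

z = torch.randn(16, 100, 1, 1)        # 噪声向量
fake_images = generator(z)            # (16, 3, 32, 32)
```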
[127] Very deep convolutional networks for large-scale image recognition (2015)
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
摘要: 在这项工作中,我们研究了卷积网络深度对其在大规模图像识别任务中精度的影响。我们的主要贡献是使用带有非常小的(3×3)卷积滤波器的架构,对不断加深的网络进行了全面评估;结果表明,把深度增加到16-19个权重层可以显著超越现有配置。这些发现是我们提交2014年ImageNet挑战赛的基础,我们的团队在定位和分类赛道上分别获得第一名和第二名。我们还表明,我们的表示可以很好地泛化到其他数据集,并在这些数据集上取得最先进的结果。我们已公开了两个表现最好的ConvNet模型,以促进对深度视觉表示在计算机视觉中应用的进一步研究。
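VGG的核心设计是反复堆叠3×3小卷积(保持分辨率)后接2×2最大池化,通过加深网络提升精度。下面用一个构造“卷积块”的小函数示意这种堆叠方式(假设PyTorch,通道数与块数仅为示例)。

```python
import torch
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))                 # 每个块结束时分辨率减半
    return nn.Sequential(*layers)

# 类似VGG前几段的堆叠方式
features = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2),
                         vgg_block(128, 256, 3))
out = features(torch.randn(1, 3, 224, 224))        # (1, 256, 28, 28)
```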
[128] You only look once: Unified, real-time object detection (2016)
Abstract: We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
摘要: 我们提出YOLO,一种新的目标检测方法。此前的目标检测工作是把分类器改造来执行检测;我们则把目标检测视为对空间上分离的边界框及其类别概率的回归问题。单个神经网络在一次评估中直接从整幅图像预测边界框和类别概率。由于整个检测流水线是单个网络,因此可以直接针对检测性能进行端到端优化。我们的统一架构非常快:基础YOLO模型以每秒45帧的速度实时处理图像;更小的网络Fast YOLO每秒可处理惊人的155帧,mAP仍是其他实时检测器的两倍。与最先进的检测系统相比,YOLO的定位错误更多,但在背景上预测出假阳性的可能性更小。最后,YOLO学到了非常通用的物体表示;在从自然图像泛化到艺术作品等其他领域时,它的表现优于DPM和R-CNN等其他检测方法。
下载地址 | 返回目录 | [10.1109/CVPR.2016.91]
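下面示意YOLO“单个网络一次前向同时回归边界框与类别概率”的输出张量布局:每个S×S网格单元预测B个框(各含x、y、w、h、置信度)外加C个类别概率。假设使用PyTorch;这里取S=7、B=2、C=20仅作示例,主干网络是极度简化的占位结构。

```python
import torch
import torch.nn as nn

S, B, C = 7, 2, 20                       # 网格数、每格框数、类别数(示例取值)

class TinyYOLOHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(   # 占位主干,实际应为深层卷积网络
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.1),
            nn.AdaptiveAvgPool2d(S), nn.Flatten())
        self.fc = nn.Linear(32 * S * S, S * S * (B * 5 + C))

    def forward(self, img):
        pred = self.fc(self.backbone(img))
        # 每个网格单元: [x, y, w, h, 置信度] * B 个框 + C 个类别概率
        return pred.view(-1, S, S, B * 5 + C)

net = TinyYOLOHead()
pred = net(torch.randn(2, 3, 448, 448))  # (2, 7, 7, 30)
```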
中文摘要仅供参考