References
- Abadi et al., 2016
Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). TensorFlow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283).
- Abdel-Hamid et al., 2014
Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1533–1545.
- Ahmed et al., 2012
Ahmed, A., Aly, M., Gonzalez, J., Narayanamurthy, S., & Smola, A. J. (2012). Scalable inference in latent variable models. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (pp. 123–132).
- Akiba et al., 2019
Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: a next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
- Alayrac et al., 2022
Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., et al. (2022). Flamingo: a visual language model for few-shot learning. ArXiv:2204.14198.
- Alsallakh et al., 2020
Alsallakh, B., Kokhlikyan, N., Miglani, V., Yuan, J., & Reblitz-Richardson, O. (2020). Mind the PAD – CNNs can develop blind spots. ArXiv:2010.02178.
- Anil et al., 2023
Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., et al. (2023). PaLM 2 Technical Report. ArXiv:2305.10403.
- Anil et al., 2020
Anil, R., Gupta, V., Koren, T., Regan, K., & Singer, Y. (2020). Scalable second-order optimization for deep learning. ArXiv:2002.09018.
- Aronszajn, 1950
Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3), 337–404.
- Ba et al., 2016
Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. ArXiv:1607.06450.
- Baevski & Auli, 2018
Baevski, A., & Auli, M. (2018). Adaptive input representations for neural language modeling. International Conference on Learning Representations.
- Bahdanau et al., 2014
Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. ArXiv:1409.0473.
- Bai et al., 2022
Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., et al. (2022). Constitutional AI: harmlessness from AI feedback. ArXiv:2212.08073.
- Baptista & Poloczek, 2018
Baptista, R., & Poloczek, M. (2018). Bayesian optimization of combinatorial structures. Proceedings of the 35th International Conference on Machine Learning.
- Bardenet et al., 2013
Bardenet, R., Brendel, M., Kégl, B., & Sebag, M. (2013). Collaborative hyperparameter tuning. Proceedings of the 30th International Conference on Machine Learning (ICML'13).
- Bay et al., 2006
Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. European Conference on Computer Vision (pp. 404–417).
- Bellman, 1966
Bellman, R. (1966). Dynamic programming. Science, 153, 34–37.
- Bellman, 1952
Bellman, R. (1952). On the theory of dynamic programming. Proceedings of the National Academy of Sciences, 38(8), 716–719.
- Bellman, 1957a
Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684. URL: http://www.jstor.org/stable/24900506
- Bellman, 1957b
Bellman, R. (1957). Dynamic Programming. Dover Publications.
- Beltagy et al., 2020
Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: the long-document transformer. ArXiv:2004.05150.
- Bengio et al., 2003
Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137–1155.
- Bengio et al., 1994
Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
- Bergstra et al., 2011
Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems, 24.
- Bergstra et al., 2010
Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., … Bengio, Y. (2010). Theano: a CPU and GPU math compiler in Python. Proc. 9th Python in Science Conference (pp. 3–10).
- Beutel et al., 2014
Beutel, A., Murray, K., Faloutsos, C., & Smola, A. J. (2014). CoBaFi: collaborative Bayesian filtering. Proceedings of the 23rd International Conference on World Wide Web (pp. 97–108).
- Bishop, 1995
Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1), 108–116.
- Bishop, 2006
Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
- Black & Scholes, 1973
Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–654.
- Bodla et al., 2017
Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Soft-NMS-improving object detection with one line of code. Proceedings of the IEEE International Conference on Computer Vision (pp. 5561–5569).
- Bojanowski et al., 2017
Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
- Bollobas, 1999
Bollobás, B. (1999). Linear Analysis. Cambridge University Press.
- Bommasani et al., 2021
Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2021). On the opportunities and risks of foundation models. ArXiv:2108.07258.
- Bottou, 2010
Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010 (pp. 177–186). Springer.
- Bottou & Le Cun, 1988
Bottou, L., & Le Cun, Y. (1988). SN: a simulator for connectionist models. Proceedings of NeuroNimes 88 (pp. 371–382). Nimes, France. URL: http://leon.bottou.org/papers/bottou-lecun-88
- Boucheron et al., 2005
Boucheron, S., Bousquet, O., & Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9, 323–375.
- Bowman et al., 2015
Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. ArXiv:1508.05326.
- Boyd & Vandenberghe, 2004
Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge, England: Cambridge University Press.
- Bradley & Terry, 1952
Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4), 324–345.
- Brown & Sandholm, 2017
Brown, N., & Sandholm, T. (2017). Libratus: the superhuman AI for no-limit poker. IJCAI (pp. 5226–5228).
- Brown et al., 1990
Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J., … Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.
- Brown et al., 1988
Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Mercer, R. L., & Roossin, P. (1988). A statistical approach to language translation. COLING Budapest 1988 Volume 1: International Conference on Computational Linguistics.
- Brown et al., 2020
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
- Buslaev et al., 2020
Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Kalinin, A. A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125.
- Campbell et al., 2002
Campbell, M., Hoane Jr, A. J., & Hsu, F.-h. (2002). Deep blue. Artificial Intelligence, 134(1-2), 57–83.
- Canny, 1987
Canny, J. (1987). A computational approach to edge detection. Readings in Computer Vision (pp. 184–203). Elsevier.
- Cer et al., 2017
Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., & Specia, L. (2017). SemEval-2017 Task 1: semantic textual similarity multilingual and crosslingual focused evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) (pp. 1–14).
- Chan et al., 2015
Chan, W., Jaitly, N., Le, Q. V., & Vinyals, O. (2015). Listen, attend and spell. ArXiv:1508.01211.
- Chen et al., 2021
Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., … Mordatch, I. (2021). Decision transformer: reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 15084–15097.
- Chen et al., 2015
Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., … Zhang, Z. (2015). MXNET: a flexible and efficient machine learning library for heterogeneous distributed systems. ArXiv:1512.01274.
- Cheng et al., 2016
Cheng, J., Dong, L., & Lapata, M. (2016). Long short-term memory-networks for machine reading. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 551–561).
- Chetlur et al., 2014
Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., & Shelhamer, E. (2014). CuDNN: Efficient primitives for deep learning. ArXiv:1410.0759.
- Cho et al., 2014a
Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder–decoder approaches. ArXiv:1409.1259.
- Cho et al., 2014b
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. ArXiv:1406.1078.
- Chowdhery et al., 2022
Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., et al. (2022). PaLM: scaling language modeling with pathways. ArXiv:2204.02311.
- Chung et al., 2014
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv:1412.3555.
- Clark et al., 2020
Clark, K., Luong, M.-T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: pre-training text encoders as discriminators rather than generators. International Conference on Learning Representations.
- Collobert et al., 2011
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.
- Cordonnier et al., 2020
Cordonnier, J.-B., Loukas, A., & Jaggi, M. (2020). On the relationship between self-attention and convolutional layers. International Conference on Learning Representations.
- Cover & Thomas, 1999
Cover, T., & Thomas, J. (1999). Elements of Information Theory. John Wiley & Sons.
- Csiszar, 2008
Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10(3), 261–273.
- Cybenko, 1989
Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303–314.
- Dalal & Triggs, 2005
Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (pp. 886–893).
- DeCock, 2011
De Cock, D. (2011). Ames, Iowa: alternative to the Boston housing data as an end of semester regression project. Journal of Statistics Education, 19(3).
- Dean et al., 2012
Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., et al. (2012). Large scale distributed deep networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 1 (pp. 1223–1231).
- DeCandia et al., 2007
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., … Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. ACM SIGOPS Operating Systems Review (pp. 205–220).
- Deng et al., 2009
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255).
- DerKiureghian & Ditlevsen, 2009
Der Kiureghian, A., & Ditlevsen, O. (2009). Aleatory or epistemic? does it matter? Structural Safety, 31(2), 105–112.
- Devlin et al., 2018
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv:1810.04805.
- Dinh et al., 2014
Dinh, L., Krueger, D., & Bengio, Y. (2014). NICE: non-linear independent components estimation. ArXiv:1410.8516.
- Dinh et al., 2017
Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2017). Density estimation using real NVP. International Conference on Learning Representations.
- Doersch et al., 2015
Doersch, C., Gupta, A., & Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. Proceedings of the IEEE International Conference on Computer Vision (pp. 1422–1430).
- Dosovitskiy et al., 2021
Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16 x 16 words: transformers for image recognition at scale. International Conference on Learning Representations.
- Duchi et al., 2011
Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
- Dumoulin & Visin, 2016
Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. ArXiv:1603.07285.
- Dwivedi & Bresson, 2020
Dwivedi, V. P., & Bresson, X. (2020). A generalization of transformer networks to graphs. ArXiv:2012.09699.
- Dwork et al., 2015
Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. L. (2015). Preserving statistical validity in adaptive data analysis. Proceedings of the 47th Annual ACM Symposium on Theory of Computing (pp. 117–126).
- Elman, 1990
Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
- Elsken et al., 2018
Elsken, T., Metzen, J. H., & Hutter, F. (2018). Neural architecture search: a survey. ArXiv:1808.05377 [stat.ML].
- Fechner, 1860
Fechner, G. T. (1860). Elemente der Psychophysik. Vol. 2. Breitkopf u. Härtel.
- Fedus et al., 2022
Fedus, W., Zoph, B., & Shazeer, N. (2022). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120), 1–39.
- Fernando, 2004
Fernando, R. (2004). GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics. Addison-Wesley.
- Feurer & Hutter, 2018
Feurer, M., & Hutter, F. (2018). Hyperparameter optimization. Automatic Machine Learning: Methods, Systems, Challenges. Springer.
- Feurer et al., 2022
Feurer, M., Letham, B., Hutter, F., & Bakshy, E. (2022). Practical transfer learning for Bayesian optimization. ArXiv:1802.02219 [stat.ML].
- Field, 1987
Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. JOSA A, 4(12), 2379–2394.
- Fisher, 1925
Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.
- Flammarion & Bach, 2015
Flammarion, N., & Bach, F. (2015). From averaging to acceleration, there is only a step-size. Conference on Learning Theory (pp. 658–695).
- Forrester et al., 2007
Forrester, A. I., Sóbester, A., & Keane, A. J. (2007). Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2088), 3251–3269.
- Franceschi et al., 2017
Franceschi, L., Donini, M., Frasconi, P., & Pontil, M. (2017). Forward and reverse gradient-based hyperparameter optimization. Proceedings of the 34th International Conference on Machine Learning (ICML'17).
- Frankle & Carbin, 2018
Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: finding sparse, trainable neural networks. ArXiv:1803.03635.
- Frazier, 2018
Frazier, P. I. (2018). A tutorial on Bayesian optimization. ArXiv:1807.02811.
- Freund & Schapire, 1996
Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. Proceedings of the International Conference on Machine Learning (pp. 148–156).
- Friedman, 1987
Friedman, J. H. (1987). Exploratory projection pursuit. Journal of the American Statistical Association, 82(397), 249–266.
- Frostig et al., 2018
Frostig, R., Johnson, M. J., & Leary, C. (2018). Compiling machine learning programs via high-level tracing. Proceedings of Systems for Machine Learning.
- Fukushima, 1982
Fukushima, K. (1982). Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. Competition and Cooperation in Neural Nets (pp. 267–285). Springer.
- Gardner et al., 2018
Gardner, J., Pleiss, G., Weinberger, K. Q., Bindel, D., & Wilson, A. G. (2018). GPyTorch: blackbox matrix–matrix Gaussian process inference with GPU acceleration. Advances in Neural Information Processing Systems.
- Garg et al., 2021
Garg, S., Balakrishnan, S., Kolter, Z., & Lipton, Z. (2021). RATT: leveraging unlabeled data to guarantee generalization. International Conference on Machine Learning (pp. 3598–3609).
- Gatys et al., 2016
Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414–2423).
- Gauss, 1809
Gauss, C. F. (1809). Theoria motus corporum coelestium. Werke. Königlich Preussische Akademie der Wissenschaften.
- Gibbs, 1902
Gibbs, J. W. (1902). Elementary Principles of Statistical Mechanics. Scribner's.
- Ginibre, 1965
Ginibre, J. (1965). Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), 440–449.
- Girshick, 2015
Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448).
- Girshick et al., 2014
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587).
- Glorot & Bengio, 2010
Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 249–256).
- Goh, 2017
Goh, G. (2017). Why momentum really works. Distill. URL: http://distill.pub/2017/momentum
- Goldberg et al., 1992
Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61–71.
- Golub & VanLoan, 1996
Golub, G. H., & Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press.
- Goodfellow et al., 2016
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org.
- Goodfellow et al., 2014
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (pp. 2672–2680).
- Gotmare et al., 2018
Gotmare, A., Keskar, N. S., Xiong, C., & Socher, R. (2018). A closer look at deep learning heuristics: learning rate restarts, warmup and distillation. ArXiv:1810.13243.
- Goyal et al., 2021
Goyal, A., Bochkovskiy, A., Deng, J., & Koltun, V. (2021). Non-deep networks. ArXiv:2110.07641.
- Graham, 2014
Graham, B. (2014). Fractional max-pooling. ArXiv:1412.6071.
- Graves, 2013
Graves, A. (2013). Generating sequences with recurrent neural networks. ArXiv:1308.0850.
- Graves et al., 2008
Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., & Schmidhuber, J. (2008). A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5), 855–868.
- Graves & Schmidhuber, 2005
Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6), 602–610.
- Griewank, 1989
Griewank, A. (1989). On automatic differentiation. Mathematical Programming: Recent Developments and Applications (pp. 83–107). Kluwer.
- Gulati et al., 2020
Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., et al. (2020). Conformer: convolution-augmented transformer for speech recognition. Proc. Interspeech 2020 (pp. 5036–5040).
- Gunawardana & Shani, 2015
Gunawardana, A., & Shani, G. (2015). Evaluating recommender systems. Recommender Systems Handbook (pp. 265–308). Springer.
- Guo et al., 2017
Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: a factorization-machine based neural network for CTR prediction. Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1725–1731).
- Guyon et al., 2008
Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (2008). Feature Extraction: Foundations and Applications. Springer.
- Hadjis et al., 2016
Hadjis, S., Zhang, C., Mitliagkas, I., Iter, D., & Ré, C. (2016). Omnivore: an optimizer for multi-device deep learning on CPUs and GPUs. ArXiv:1606.04487.
- Hartley & Zisserman, 2000
Hartley, R., & Zisserman, A. (2000). Multiple View Geometry in Computer Vision. Cambridge University Press.
- Hartley & Kahl, 2009
Hartley, R. I., & Kahl, F. (2009). Global optimization through rotation space search. International Journal of Computer Vision, 82(1), 64–79.
- He et al., 2022
He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16000–16009).
- He et al., 2017a
He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).
- He et al., 2015
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision (pp. 1026–1034).
- He et al., 2016a
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
- He et al., 2016b
He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity mappings in deep residual networks. European Conference on Computer Vision (pp. 630–645).
- He & Chua, 2017
He, X., & Chua, T.-S. (2017). Neural factorization machines for sparse predictive analytics. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 355–364).
- He et al., 2017b
He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering. Proceedings of the 26th International Conference on World Wide Web (pp. 173–182).
- Hebb, 1949
Hebb, D. O. (1949). The Organization of Behavior. Wiley.
- Hendrycks & Gimpel, 2016
Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (GELUs). ArXiv:1606.08415.
- Hennessy & Patterson, 2011
Hennessy, J. L., & Patterson, D. A. (2011). Computer Architecture: A Quantitative Approach. Elsevier.
- Herlocker et al., 1999
Herlocker, J. L., Konstan, J. A., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. 22nd Annual International ACM Conference on Research and Development in Information Retrieval, SIGIR 1999 (pp. 230–237).
- Hidasi et al., 2015
Hidasi, B., Karatzoglou, A., Baltrunas, L., & Tikk, D. (2015). Session-based recommendations with recurrent neural networks. ArXiv:1511.06939.
- Ho et al., 2020
Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
- Hochreiter et al., 2001
Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
- Hochreiter & Schmidhuber, 1997
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
- Hoffmann et al., 2022
Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., et al. (2022). Training compute-optimal large language models. ArXiv:2203.15556.
- Howard et al., 2019
Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., … Adam, H. (2019). Searching for MobileNetV3. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1314–1324).
- Hoyer et al., 2009
Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems (pp. 689–696).
- Hu et al., 2018
Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7132–7141).
- Hu et al., 2008
Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. 2008 8th IEEE International Conference on Data Mining (pp. 263–272).
- Hu et al., 2022
Hu, Z., Lee, R. K.-W., Aggarwal, C. C., & Zhang, A. (2022). Text style transfer: a review and experimental evaluation. SIGKDD Explor. Newsl., 24(1). URL: https://doi.org/10.1145/3544903.3544906
- Huang et al., 2018
Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Simon, I., Hawthorne, C., Shazeer, N., … Eck, D. (2018). Music transformer: generating music with long-term structure. International Conference on Learning Representations.
- Huang et al., 2017
Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700–4708).
- Huang et al., 2015
Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM–CRF models for sequence tagging. ArXiv:1508.01991.
- Hubel & Wiesel, 1959
Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. Journal of Physiology, 148(3), 574–591.
- Hubel & Wiesel, 1962
Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160(1), 106–154.
- Hubel & Wiesel, 1968
Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195(1), 215–243.
- Hutter et al., 2011
Hutter, F., Hoos, H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. Proceedings of the Fifth International Conference on Learning and Intelligent Optimization (LION'11).
- Hutter et al., 2019
Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.) (2019). Automated Machine Learning: Methods, Systems, Challenges. Springer.
- Ioffe, 2017
Ioffe, S. (2017). Batch renormalization: towards reducing minibatch dependence in batch-normalized models. Advances in Neural Information Processing Systems (pp. 1945–1953).
- Ioffe & Szegedy, 2015
Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. ArXiv:1502.03167.
- Izmailov et al., 2018
Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging weights leads to wider optima and better generalization. ArXiv:1803.05407.
- Jacot et al., 2018
Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural tangent kernel: convergence and generalization in neural networks. Advances in Neural Information Processing Systems.
- Jaeger, 2002
Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD-Forschungszentrum Informationstechnik Bonn.
- Jamieson & Talwalkar, 2016
Jamieson, K., & Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. Proceedings of the 17th International Conference on Artificial Intelligence and Statistics.
- Jenatton et al., 2017
Jenatton, R., Archambeau, C., González, J., & Seeger, M. (2017). Bayesian optimization with tree-structured dependencies. Proceedings of the 34th International Conference on Machine Learning (ICML'17).
- Jia et al., 2018
Jia, X., Song, S., He, W., Wang, Y., Rong, H., Zhou, F., et al. (2018). Highly scalable deep learning training system with mixed-precision: training ImageNet in four minutes. ArXiv:1807.11205.
- Jia et al., 2014
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., … Darrell, T. (2014). Caffe: convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia (pp. 675–678).
- Joshi et al., 2020
Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2020). SpanBERT: improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8, 64–77.
- Jouppi et al., 2017
Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., et al. (2017). In-datacenter performance analysis of a tensor processing unit. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) (pp. 1–12).
- Kalchbrenner et al., 2014
Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. ArXiv:1404.2188.
- Kalman & Kwasny, 1992
Kalman, B. L., & Kwasny, S. C. (1992). Why tanh: choosing a sigmoidal function. Proceedings of the International Joint Conference on Neural Networks (IJCNN) (pp. 578–581).
- Kaplan et al., 2020
Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., … Amodei, D. (2020). Scaling laws for neural language models. ArXiv:2001.08361.
- Karnin et al., 2013
Karnin, Z., Koren, T., & Somekh, O. (2013). Almost optimal exploration in multi-armed bandits. Proceedings of the 30th International Conference on Machine Learning (ICML'13).
- Karras et al., 2017
Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. ArXiv:1710.10196.
- Kim et al., 2017
Kim, J., El-Khamy, M., & Lee, J. (2017). Residual LSTM: design of a deep recurrent architecture for distant speech recognition. ArXiv:1701.03360.
- Kim, 2014
Kim, Y. (2014). Convolutional neural networks for sentence classification. ArXiv:1408.5882.
- Kimeldorf & Wahba, 1971
Kimeldorf, G. S., & Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.
- Kingma & Ba, 2014
Kingma, D. P., & Ba, J. (2014). Adam: a method for stochastic optimization. ArXiv:1412.6980.
- Kingma & Welling, 2014
Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. International Conference on Learning Representations (ICLR).
- Kipf & Welling, 2016
Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. ArXiv:1609.02907.
- Kojima et al., 2022
Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. ArXiv:2205.11916.
- Koller & Friedman, 2009
Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
- Kolmogorov, 1933
Kolmogorov, A. (1933). Sulla determinazione empirica di una legge di distribuzione. Inst. Ital. Attuari, Giorn., 4, 83–91.
- Kolter, 2008
Kolter, Z. (2008). Linear algebra review and reference. Available online: http://cs229.stanford.edu/section/cs229-linalg.pdf.
- Koren et al., 2009
Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, pp. 30–37.
- Krizhevsky et al., 2012
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (pp. 1097–1105).
- Kung, 1988
Kung, S. Y. (1988). VLSI Array Processors. Prentice Hall.
- Kuzovkin et al., 2018
Kuzovkin, I., Vicente, R., Petton, M., Lachaux, J.-P., Baciu, M., Kahane, P., … Aru, J. (2018). Activations of deep convolutional neural networks are aligned with gamma band activity of human visual cortex. Communications Biology, 1(1), 1–12.
- Lan et al., 2019
Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: a lite BERT for self-supervised learning of language representations. ArXiv:1909.11942.
- Lavin & Gray, 2016
Lavin, A., & Gray, S. (2016). Fast algorithms for convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4013–4021).
- Le, 2013
Le, Q. V. (2013). Building high-level features using large scale unsupervised learning. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8595–8598).
- LeCun et al., 1995a
LeCun, Y., Bengio, Y., et al. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks (p. 3361). MIT Press.
- LeCun et al., 1989
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
- LeCun et al., 1998a
LeCun, Y., Bottou, L., Orr, G., & Muller, K.-R. (1998). Efficient backprop. Neural Networks: Tricks of the Trade. Springer.
- LeCun et al., 1998b
LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
- LeCun et al., 1995b
LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., et al. (1995). Comparison of learning algorithms for handwritten digit recognition. International Conference on Artificial Neural Networks (pp. 53–60).
- Legendre, 1805
Legendre, A. M. (1805). Mémoire sur les Opérations Trigonométriques: dont les Résultats Dépendent de la Figure de la Terre. F. Didot.
- Lewis et al., 2019
Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., … Zettlemoyer, L. (2019). BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. ArXiv:1910.13461.
- Lewkowycz et al., 2022
Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., et al. (2022). Solving quantitative reasoning problems with language models. ArXiv:2206.14858.
- Li et al., 2018
Li, L., Jamieson, K., Rostamizadeh, A., Gonina, K., Hardt, M., Recht, B., & Talwalkar, A. (2018). Massively parallel hyperparameter tuning. ArXiv:1810.05934.
- Li, 2017
Li, M. (2017). Scaling Distributed Machine Learning with System and Algorithm Co-design (Doctoral dissertation). CMU.
- Li et al., 2014a
Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., … Su, B.-Y. (2014). Scaling distributed machine learning with the parameter server. 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14) (pp. 583–598).
- Li et al., 2014b
Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 661–670).
- Liaw et al., 2018
Liaw, R., Liang, E., Nishihara, R., Moritz, P., Gonzalez, J., & Stoica, I. (2018). Tune: a research platform for distributed model selection and training. ArXiv:1807.05118.
- Lin et al., 2013
Lin, M., Chen, Q., & Yan, S. (2013). Network in network. ArXiv:1312.4400.
- Lin et al., 2017a
Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision (pp. 2980–2988).
- Lin et al., 2010
Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., … et al. (2010). ImageNet classification: fast descriptor coding and large-scale SVM training. Large Scale Visual Recognition Challenge.
- Lin et al., 2017b
Lin, Z., Feng, M., Santos, C. N. d., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. ArXiv:1703.03130.
- Lipton et al., 2015
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. ArXiv:1506.00019.
- Lipton et al., 2016
Lipton, Z. C., Kale, D. C., Elkan, C., & Wetzel, R. (2016). Learning to diagnose with LSTM recurrent neural networks. International Conference on Learning Representations (ICLR).
- Lipton & Steinhardt, 2018
Lipton, Z. C. 和 Steinhardt, J. (2018)。Troubling trends in machine learning scholarship。Communications of the ACM, 17, 45–77。
- Liu & Nocedal, 1989
Liu, D. C. 和 Nocedal, J. (1989)。On the limited memory BFGS method for large scale optimization。Mathematical Programming, 45(1), 503–528。
- Liu et al., 2018
Liu, H.、Simonyan, K. 和 Yang, Y. (2018)。DARTS: differentiable architecture search。ArXiv:1806.09055。
- Liu et al., 2016
Liu, W.、Anguelov, D.、Erhan, D.、Szegedy, C.、Reed, S.、Fu, C.-Y. 和 Berg, A. C. (2016)。SSD: single shot multibox detector。欧洲计算机视觉会议 (第 21–37 页)。
- Liu et al., 2019
Liu, Y.、Ott, M.、Goyal, N.、Du, J.、Joshi, M.、Chen, D. … Stoyanov, V. (2019)。RoBERTa: a robustly optimized BERT pretraining approach。ArXiv:1907.11692。
- Liu et al., 2021
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., … Guo, B. (2021). Swin Transformer: hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012–10022).
- Liu et al., 2022
Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. ArXiv:2201.03545.
- Long et al., 2015
Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431–3440).
- Loshchilov & Hutter, 2016
Loshchilov, I., & Hutter, F. (2016). SGDR: stochastic gradient descent with warm restarts. ArXiv:1608.03983.
- Lowe, 2004
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
- Luo et al., 2018
Luo, P., Wang, X., Shao, W., & Peng, Z. (2018). Towards understanding regularization in batch normalization. ArXiv:1809.00846.
- Maas et al., 2011
Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 142–150).
- Mack & Silverman, 1982
Mack, Y.-P., & Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61(3), 405–415.
- MacKay, 2003
MacKay, D. J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
- Maclaurin et al., 2015
Maclaurin, D., Duvenaud, D., & Adams, R. (2015). Gradient-based hyperparameter optimization through reversible learning. Proceedings of the 32nd International Conference on Machine Learning (ICML'15).
- Mangasarian, 1965
Mangasarian, O. L. (1965). Linear and nonlinear separation of patterns by linear programming. Oper. Res., 13, 444–452.
- Mangram, 2013
Mangram, M. E. (2013). A simplified perspective of the Markowitz portfolio theory. Global Journal of Business Research, 7(1), 59–70.
- Matthews et al., 2018
Matthews, A. G. d. G., Rowland, M., Hron, J., Turner, R. E., & Ghahramani, Z. (2018). Gaussian process behaviour in wide deep neural networks. ArXiv:1804.11271.
- McCann et al., 2017
McCann, B., Bradbury, J., Xiong, C., & Socher, R. (2017). Learned in translation: Contextualized word vectors. Advances in Neural Information Processing Systems (pp. 6294–6305).
- McCulloch & Pitts, 1943
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115–133.
- McMahan et al., 2013
McMahan, H. B., Holt, G., Sculley, D., Young, M., Ebner, D., Grady, J., et al. (2013). Ad click prediction: a view from the trenches. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1222–1230).
- Mead, 1980
Mead, C. (1980). Introduction to VLSI systems. IEE Proceedings I-Solid-State and Electron Devices, 128(1), 18.
- Merity et al., 2016
Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv:1609.07843.
- Micchelli, 1984
Micchelli, C. A. (1984). Interpolation of scattered data: distance matrices and conditionally positive definite functions. Approximation Theory and Spline Functions (pp. 143–145). Springer.
- Mikolov et al., 2013a
Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. ArXiv:1301.3781.
- Mikolov et al., 2013b
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (pp. 3111–3119).
- Miller, 1995
Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39–41.
- Mirhoseini et al., 2017
Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., … Dean, J. (2017). Device placement optimization with reinforcement learning. 34th International Conference on Machine Learning (pp. 2430–2439).
- Mnih et al., 2014
Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems (pp. 2204–2212).
- Mnih et al., 2013
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. ArXiv:1312.5602.
- Mnih et al., 2015
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
- Moon et al., 2010
Moon, T., Smola, A., Chang, Y., & Zheng, Z. (2010). IntervalRank: isotonic regression with listwise and pairwise constraints. Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (pp. 151–160).
- Morey et al., 2016
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103–123.
- Morozov, 1984
Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems. Springer.
- Nadaraya, 1964
Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & its Applications, 9(1), 141–142.
- Nair & Hinton, 2010
Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. ICML.
- Nakkiran et al., 2021
Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., & Sutskever, I. (2021). Deep double descent: where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12), 124003.
- Naor & Reingold, 1999
Naor, M., & Reingold, O. (1999). On the construction of pseudorandom permutations: Luby–Rackoff revisited. Journal of Cryptology, 12(1), 29–66.
- Neal, 1996
Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer.
- Nesterov, 2018
Nesterov, Y. (2018). Lectures on Convex Optimization. Springer.
- Nesterov & Vial, 2000
Nesterov, Y., & Vial, J.-P. (2000). Confidence level solutions for stochastic programming. Automatica, 44(6), 1559–1568.
- Neyman, 1937
Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236(767), 333–380.
- Norelli et al., 2022
Norelli, A., Fumero, M., Maiorca, V., Moschella, L., Rodolà, E., & Locatello, F. (2022). ASIF: coupled data turns unimodal models to multimodal without training. ArXiv:2210.01738.
- Novak et al., 2018
Novak, R., Xiao, L., Lee, J., Bahri, Y., Yang, G., Hron, J., … Sohl-Dickstein, J. (2018). Bayesian deep convolutional networks with many channels are Gaussian processes. ArXiv:1810.05148.
- Novikoff, 1962
Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata (pp. 615–622).
- Olshausen & Field, 1996
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
- Ong et al., 2005
Ong, C. S., Smola, A., & Williamson, R. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043–1071.
- OpenAI, 2023
OpenAI. (2023). GPT-4 Technical Report. ArXiv:2303.08774.
- Ouyang et al., 2022
Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., et al. (2022). Training language models to follow instructions with human feedback. ArXiv:2203.02155.
- Papineni et al., 2002
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318).
- Parikh et al., 2016
Parikh, A. P., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. ArXiv:1606.01933.
- Park et al., 2019
Park, T., Liu, M.-Y., Wang, T.-C., & Zhu, J.-Y. (2019). Semantic image synthesis with spatially-adaptive normalization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2337–2346).
- Parzen, 1957
Parzen, E. (1957). On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics, 28, 329–348.
- Paszke et al., 2019
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., et al. (2019). PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 8026–8037.
- Paulus et al., 2017
Paulus, R., Xiong, C., & Socher, R. (2017). A deep reinforced model for abstractive summarization. ArXiv:1705.04304.
- Penedo et al., 2023
Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., … Launay, J. (2023). The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. ArXiv:2306.01116.
- Pennington et al., 2017
Pennington, J., Schoenholz, S., & Ganguli, S. (2017). Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. Advances in Neural Information Processing Systems (pp. 4785–4795).
- Pennington et al., 2014
Pennington, J., Socher, R., & Manning, C. (2014). GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
- Peters et al., 2017a
Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
- Peters et al., 2017b
Peters, M., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 1 (pp. 1756–1765).
- Peters et al., 2018
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 2227–2237).
- Petersen & Pedersen, 2008
Petersen, K. B., & Pedersen, M. S. (2008). The Matrix Cookbook. Technical University of Denmark.
- Pleiss et al., 2017
Pleiss, G., Chen, D., Huang, G., Li, T., Van Der Maaten, L., & Weinberger, K. Q. (2017). Memory-efficient implementation of DenseNets. ArXiv:1707.06990.
- Polyak, 1964
Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5), 1–17.
- Prakash et al., 2016
Prakash, A., Hasan, S. A., Lee, K., Datla, V., Qadir, A., Liu, J., & Farri, O. (2016). Neural paraphrase generation with stacked residual LSTM networks. ArXiv:1610.03098.
- Qin et al., 2023
Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., & Yang, D. (2023). Is ChatGPT a general-purpose natural language processing task solver? ArXiv:2302.06476.
- Quadrana et al., 2018
Quadrana, M., Cremonesi, P., & Jannach, D. (2018). Sequence-aware recommender systems. ACM Computing Surveys, 51(4), 66.
- Quinlan, 1993
Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Elsevier.
- Rabiner & Juang, 1993
Rabiner, L., & Juang, B.-H. (1993). Fundamentals of Speech Recognition. Prentice-Hall.
- Radford et al., 2021
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning (pp. 8748–8763).
- Radford et al., 2015
Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv:1511.06434.
- Radford et al., 2018
Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI.
- Radford et al., 2019
Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
- Radosavovic et al., 2019
Radosavovic, I., Johnson, J., Xie, S., Lo, W.-Y., & Dollár, P. (2019). On network design spaces for visual recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1882–1890).
- Radosavovic et al., 2020
Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollár, P. (2020). Designing network design spaces. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10428–10436).
- Rae et al., 2021
Rae, J. W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., et al. (2021). Scaling language models: methods, analysis & insights from training Gopher. ArXiv:2112.11446.
- Raffel et al., 2020
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., … Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21, 1–67.
- Rajpurkar et al., 2016
Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. ArXiv:1606.05250.
- Ramachandran et al., 2019
Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., & Shlens, J. (2019). Stand-alone self-attention in vision models. Advances in Neural Information Processing Systems, 32.
- Ramachandran et al., 2017
Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. ArXiv:1710.05941.
- Ramesh et al., 2022
Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. ArXiv:2204.06125.
- Cajal & Azoulay, 1894
Ramón y Cajal, Santiago, & Azoulay, L. (1894). Les Nouvelles Idées sur la Structure du Système Nerveux chez l'Homme et chez les Vertébrés. Paris, C. Reinwald & Cie.
- Ranzato et al., 2007
Ranzato, M.-A., Boureau, Y.-L., Chopra, S., & LeCun, Y. (2007). A unified energy-based framework for unsupervised learning. Artificial Intelligence and Statistics (pp. 371–379).
- Rasmussen & Williams, 2006
Rasmussen, C. E., & Williams, C. K. (2006). Gaussian Processes for Machine Learning. MIT Press.
- Reddi et al., 2019
Reddi, S. J., Kale, S., & Kumar, S. (2019). On the convergence of Adam and beyond. ArXiv:1904.09237.
- Redmon et al., 2016
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788).
- Redmon & Farhadi, 2018
Redmon, J., & Farhadi, A. (2018). YOLOv3: an incremental improvement. ArXiv:1804.02767.
- Reed & DeFreitas, 2015
Reed, S., & De Freitas, N. (2015). Neural programmer-interpreters. ArXiv:1511.06279.
- Reed et al., 2022
Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., Barth-Maron, G., et al. (2022). A generalist agent. ArXiv:2205.06175.
- Ren et al., 2015
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (pp. 91–99).
- Rendle, 2010
Rendle, S. (2010). Factorization machines. 2010 IEEE International Conference on Data Mining (pp. 995–1000).
- Rendle et al., 2009
Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (pp. 452–461).
- Revels et al., 2016
Revels, J., Lubin, M., & Papamarkou, T. (2016). Forward-mode automatic differentiation in Julia. ArXiv:1607.07892.
- Rezende et al., 2014
Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. International Conference on Machine Learning (pp. 1278–1286).
- Riesenhuber & Poggio, 1999
Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.
- Rockafellar, 1970
Rockafellar, R. T. (1970). Convex Analysis. Princeton University Press.
- Rolnick et al., 2017
Rolnick, D., Veit, A., Belongie, S., & Shavit, N. (2017). Deep learning is robust to massive label noise. ArXiv:1705.10694.
- Rudin, 1973
Rudin, W. (1973). Functional Analysis. McGraw-Hill.
- Rumelhart et al., 1988
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1.
- Russakovsky et al., 2013
Russakovsky, O., Deng, J., Huang, Z., Berg, A. C., & Fei-Fei, L. (2013). Detecting avocados to zucchinis: what have we done, and where are we going? International Conference on Computer Vision (ICCV).
- Russakovsky et al., 2015
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
- Russell & Norvig, 2016
Russell, S. J., & Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson Education Limited.
- Saharia et al., 2022
Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. ArXiv:2205.11487.
- Salinas et al., 2022
Salinas, D., Seeger, M., Klein, A., Perrone, V., Wistuba, M., & Archambeau, C. (2022). Syne Tune: a library for large scale hyperparameter tuning and reproducible research. First Conference on Automated Machine Learning.
- Sanh et al., 2019
Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv:1910.01108.
- Sanh et al., 2021
Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., et al. (2021). Multitask prompted training enables zero-shot task generalization. ArXiv:2110.08207.
- Santurkar et al., 2018
Santurkar, S., Tsipras, D., Ilyas, A., & Madry, A. (2018). How does batch normalization help optimization? Advances in Neural Information Processing Systems (pp. 2483–2493).
- Sarwar et al., 2001
Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proceedings of 10th International Conference on World Wide Web (pp. 285–295).
- Scao et al., 2022
Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., et al. (2022). BLOOM: a 176B-parameter open-access multilingual language model. ArXiv:2211.05100.
- Schein et al., 2002
Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 253–260).
- Schuhmann et al., 2022
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., et al. (2022). LAION-5B: an open large-scale dataset for training next generation image-text models. ArXiv:2210.08402.
- Schuster & Paliwal, 1997
Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.
- Scholkopf et al., 2001
Schölkopf, B., Herbrich, R., & Smola, A. J. (2001). A generalized representer theorem. In D. P. Helmbold & B. Williamson (Eds.), Proceedings of the Annual Conference on Computational Learning Theory (pp. 416–426). Springer-Verlag.
- Scholkopf et al., 1996
Schölkopf, B., Burges, C., & Vapnik, V. (1996). Incorporating invariances in support vector learning machines. International Conference on Artificial Neural Networks (pp. 47–52).
- Scholkopf & Smola, 2002
Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
- Sedhain et al., 2015
Sedhain, S., Menon, A. K., Sanner, S., & Xie, L. (2015). AutoRec: autoencoders meet collaborative filtering. Proceedings of the 24th International Conference on World Wide Web (pp. 111–112).
- Sennrich et al., 2015
Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. ArXiv:1508.07909.
- Sergeev & DelBalso, 2018
Sergeev, A., & Del Balso, M. (2018). Horovod: fast and easy distributed deep learning in TensorFlow. ArXiv:1802.05799.
- Shannon, 1948
Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
- Shao et al., 2020
Shao, H., Yao, S., Sun, D., Zhang, A., Liu, S., Liu, D., et al. (2020). ControlVAE: controllable variational autoencoder. Proceedings of the 37th International Conference on Machine Learning.
- Shaw et al., 2018
Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. ArXiv:1803.02155.
- Shoeybi et al., 2019
Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: training multi-billion parameter language models using model parallelism. ArXiv:1909.08053.
- Silver et al., 2016
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.
- Silverman, 1986
Silverman, B. W. (1986). Density Estimation for Statistical and Data Analysis. Chapman and Hall.
- Simard et al., 1998
Simard, P. Y., LeCun, Y. A., Denker, J. S., & Victorri, B. (1998). Transformation invariance in pattern recognition – tangent distance and tangent propagation. Neural Networks: Tricks of the Trade (pp. 239–274). Springer.
- Simonyan & Zisserman, 2014
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ArXiv:1409.1556.
- Sindhwani et al., 2015
Sindhwani, V., Sainath, T. N., & Kumar, S. (2015). Structured transforms for small-footprint deep learning. ArXiv:1510.01722.
- Sivic & Zisserman, 2003
Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. Proceedings of the IEEE International Conference on Computer Vision (p. 1470).
- Smith et al., 2022
Smith, S., Patwary, M., Norick, B., LeGresley, P., Rajbhandari, S., Casper, J., et al. (2022). Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. ArXiv:2201.11990.
- Smola & Narayanamurthy, 2010
Smola, A., & Narayanamurthy, S. (2010). An architecture for parallel topic models. Proceedings of the VLDB Endowment, 3(1-2), 703–710.
- Snoek et al., 2012
Snoek, J., Larochelle, H., & Adams, R. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 25 (pp. 2951–2959).
- Sohl-Dickstein et al., 2015
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning (pp. 2256–2265).
- Song & Ermon, 2019
Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32.
- Song et al., 2021
Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations.
- Speelpenning, 1980
Speelpenning, B. (1980). Compiling fast partial derivatives of functions given by algorithms (Doctoral dissertation). University of Illinois at Urbana-Champaign.
- Srivastava et al., 2022
Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., et al. (2022). Beyond the imitation game: quantifying and extrapolating the capabilities of language models. ArXiv:2206.04615.
- Srivastava et al., 2014
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
- Srivastava et al., 2015
Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Highway networks. ArXiv:1505.00387.
- Strang, 1993
Strang, G. (1993). Introduction to Linear Algebra. Wellesley–Cambridge Press.
- Su & Khoshgoftaar, 2009
Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009.
- Sukhbaatar et al., 2015
Sukhbaatar, S., Weston, J., & Fergus, R. (2015). End-to-end memory networks. Advances in Neural Information Processing Systems (pp. 2440–2448).
- Sutskever et al., 2013
Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. International Conference on Machine Learning (pp. 1139–1147).
- Sutskever et al., 2014
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (pp. 3104–3112).
- Szegedy et al., 2017
Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. 31st AAAI Conference on Artificial Intelligence.
- Szegedy et al., 2015
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
- Szegedy et al., 2016
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818–2826).
- Tallec & Ollivier, 2017
Tallec, C., & Ollivier, Y. (2017). Unbiasing truncated backpropagation through time. ArXiv:1705.08209.
- Tan & Le, 2019
Tan, M., & Le, Q. (2019). EfficientNet: rethinking model scaling for convolutional neural networks. International Conference on Machine Learning (pp. 6105–6114).
- Tang & Wang, 2018
Tang, J., & Wang, K. (2018). Personalized top-n sequential recommendation via convolutional sequence embedding. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 565–573).
- Taskar et al., 2004
Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. Advances in Neural Information Processing Systems, 16, 25.
- Tay et al., 2020
Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2020). Efficient transformers: a survey. ArXiv:2009.06732.
- Taylor et al., 2022
Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., et al. (2022). Galactica: a large language model for science. ArXiv:2211.09085.
- Teye et al., 2018
Teye, M., Azizpour, H., & Smith, K. (2018). Bayesian uncertainty estimation for batch normalized deep networks. ArXiv:1802.06455.
- Thomee et al., 2016
Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., et al. (2016). YFCC100M: the new data in multimedia research. Communications of the ACM, 59(2), 64–73.
- Tieleman & Hinton, 2012
Tieleman, T., & Hinton, G. (2012). Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, Lecture 6.5-rmsprop.
- Tikhonov & Arsenin, 1977
Tikhonov, A. N., & Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. W.H. Winston.
- Tolstikhin et al., 2021
Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., et al. (2021). MLP-Mixer: an all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34.
- Torralba et al., 2008
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1958–1970.
- Touvron et al., 2021
Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021). Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning (pp. 10347–10357).
- Touvron et al., 2023a
Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., et al. (2023a). LLaMA: open and efficient foundation language models. ArXiv:2302.13971.
- Touvron et al., 2023b
Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., et al. (2023b). LLaMA 2: open foundation and fine-tuned chat models. ArXiv:2307.09288.
- Tsoumakas & Katakis, 2007
Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
- Turing, 1950
Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236), 433.
- Toscher et al., 2009
Töscher, A., Jahrer, M., & Bell, R. M. (2009). The BigChaos solution to the Netflix grand prize.
- Uijlings et al., 2013
Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
- Vapnik, 1995
Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.
- Vapnik, 1998
Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley and Sons.
- Vapnik & Chervonenkis, 1964
Vapnik, V., & Chervonenkis, A. (1964). A note on one class of perceptrons. Automation and Remote Control, 25.
- Vapnik & Chervonenkis, 1968
Vapnik, V., & Chervonenkis, A. (1968). Uniform convergence of frequencies of occurrence of events to their probabilities. Dokl. Akad. Nauk SSSR, 181, 915–918.
- Vapnik & Chervonenkis, 1971
Vapnik, V., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16(2), 264–281.
- Vapnik & Chervonenkis, 1981
Vapnik, V., & Chervonenkis, A. (1981). The necessary and sufficient conditions for the uniform convergence of averages to their expected values. Teoriya Veroyatnostei i Ee Primeneniya, 26(3), 543–564.
- Vapnik & Chervonenkis, 1991
Vapnik, V., & Chervonenkis, A. (1991). The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3), 283–305.
- Vapnik & Chervonenkis, 1974
Vapnik, V. N., & Chervonenkis, A. Y. (1974). Ordered risk minimization. Automation and Remote Control, 35, 1226–1235, 1403–1412.
- Vapnik, 1992
Vapnik, V. (1992). Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems (pp. 831–838).
- Vapnik et al., 1994
Vapnik, V., Levin, E., & Le Cun, Y. (1994). Measuring the VC-dimension of a learning machine. Neural Computation, 6(5), 851–876.
- Vaswani et al., 2017
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems (pp. 5998–6008).
- Wahba, 1990
Wahba, G. (1990). Spline Models for Observational Data. SIAM.
- Waibel et al., 1989
Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), 328–339.
- Wang et al., 2022
Wang, H., Zhang, A., Zheng, S., Shi, X., Li, M., & Wang, Z. (2022). Removing batch normalization boosts adversarial training. International Conference on Machine Learning (pp. 23433–23445).
- Wang et al., 2018
Wang, L., Li, M., Liberty, E., & Smola, A. J. (2018). Optimal message scheduling for aggregation. Networks, 2(3), 2–3.
- Wang 等人, 2019
Wang, Q.、Li, B.、Xiao, T.、Zhu, J.、Li, C.、Wong, D. F. 和 Chao, L. S. (2019). Learning deep transformer models for machine translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (第 1810–1822 页)。
- Wang 等人, 2023
Wang, X.、Wei, J.、Schuurmans, D.、Le, Q.、Chi, E. 和 Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations。
- Wang 等人, 2016
Wang, Y.、Davidson, A.、Pan, Y.、Wu, Y.、Riffel, A. 和 Owens, J. D. (2016). Gunrock: a high-performance graph processing library on the GPU. ACM SIGPLAN Notices (p. 11)。
- Warstadt et al., 2019
Warstadt, A., Singh, A., and Bowman, S. R. (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7, 625–641.
- Wasserman, 2013
Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. Springer.
- Watkins & Dayan, 1992
Watkins, C. J., and Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
- Watson, 1964
Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, pp. 359–372.
- Wei et al., 2021
Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., et al. (2021). Finetuned language models are zero-shot learners. ArXiv:2109.01652.
- Wei et al., 2022a
Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., et al. (2022). Emergent abilities of large language models. ArXiv:2206.07682.
- Wei et al., 2022b
Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., and Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. ArXiv:2201.11903.
- Welling & Teh, 2011
Welling, M., and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 681–688).
- Wengert, 1964
Wengert, R. E. (1964). A simple automatic derivative evaluation program. Communications of the ACM, 7(8), 463–464.
- Werbos, 1990
Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.
- Wigner, 1958
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Ann. Math. (pp. 325–327).
- Wilson & Izmailov, 2020
Wilson, A. G., and Izmailov, P. (2020). Bayesian deep learning and a probabilistic perspective of generalization. Advances in Neural Information Processing Systems, 33, 4697–4708.
- Wistuba et al., 2019
Wistuba, M., Rawat, A., and Pedapati, T. (2019). A survey on neural architecture search. ArXiv:1905.01392 [cs.LG].
- Wistuba et al., 2018
Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2018). Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning, 108, 43–78.
- Wolpert & Macready, 1995
Wolpert, D. H., and Macready, W. G. (1995). No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute.
- Wood et al., 2011
Wood, F., Gasthaus, J., Archambeau, C., James, L., and Teh, Y. W. (2011). The sequence memoizer. Communications of the ACM, 54(2), 91–98.
- Wu et al., 2018
Wu, B., Wan, A., Yue, X., Jin, P., Zhao, S., Golmant, N., et al. (2018). Shift: a zero flop, zero parameter alternative to spatial convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9127–9135).
- Wu et al., 2016
Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., et al. (2016). Google's neural machine translation system: bridging the gap between human and machine translation. ArXiv:1609.08144.
- Xiao et al., 2017
Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. ArXiv:1708.07747.
- Xiao et al., 2018
Xiao, L., Bahri, Y., Sohl-Dickstein, J., Schoenholz, S., and Pennington, J. (2018). Dynamical isometry and a mean field theory of CNNs: how to train 10,000-layer vanilla convolutional neural networks. International Conference on Machine Learning (pp. 5393–5402).
- Xie et al., 2017
Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1492–1500).
- Xiong et al., 2020
Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., et al. (2020). On layer normalization in the transformer architecture. International Conference on Machine Learning (pp. 10524–10533).
- Xiong et al., 2018
Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., and Stolcke, A. (2018). The Microsoft 2017 conversational speech recognition system. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5934–5938).
- Yamaguchi et al., 1990
Yamaguchi, K., Sakamoto, K., Akabane, T., and Fujimoto, Y. (1990). A neural network for speaker-independent isolated word recognition. First International Conference on Spoken Language Processing.
- Yang et al., 2016
Yang, Z., Hu, Z., Deng, Y., Dyer, C., and Smola, A. (2016). Neural machine translation with recurrent attention modeling. ArXiv:1607.05108.
- Yang et al., 2015
Yang, Z., Moczulski, M., Denil, M., De Freitas, N., Smola, A., Song, L., and Wang, Z. (2015). Deep fried convnets. Proceedings of the IEEE International Conference on Computer Vision (pp. 1476–1483).
- Ye et al., 2011
Ye, M., Yin, P., Lee, W.-C., and Lee, D.-L. (2011). Exploiting geographical influence for collaborative point-of-interest recommendation. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 325–334).
- You et al., 2017
You, Y., Gitman, I., and Ginsburg, B. (2017). Large batch training of convolutional networks. ArXiv:1708.03888.
- Yu et al., 2022
Yu, J., Xu, Y., Koh, J. Y., Luong, T., Baid, G., Wang, Z., et al. (2022). Scaling autoregressive models for content-rich text-to-image generation. ArXiv:2206.10789.
- Zaheer et al., 2018
Zaheer, M., Reddi, S., Sachan, D., Kale, S., and Kumar, S. (2018). Adaptive methods for nonconvex optimization. Advances in Neural Information Processing Systems (pp. 9793–9803).
- Zeiler, 2012
Zeiler, M. D. (2012). ADADELTA: an adaptive learning rate method. ArXiv:1212.5701.
- Zeiler & Fergus, 2013
Zeiler, M. D., and Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional neural networks. ArXiv:1301.3557.
- Zhang et al., 2021a
Zhang, A., Tay, Y., Zhang, S., Chan, A., Luu, A. T., Hui, S. C., and Fu, J. (2021). Beyond fully-connected layers with quaternions: parameterization of hypercomplex multiplications with 1/n parameters. International Conference on Learning Representations.
- Zhang et al., 2021b
Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107–115.
- Zhang et al., 2019
Zhang, S., Yao, L., Sun, A., and Tay, Y. (2019). Deep learning based recommender system: a survey and new perspectives. ACM Computing Surveys, 52(1), 5.
- Zhang et al., 2022
Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., et al. (2022). OPT: open pre-trained transformer language models. ArXiv:2205.01068.
- Zhang et al., 1988
Zhang, W., Tanida, J., Itoh, K., and Ichioka, Y. (1988). Shift-invariant pattern recognition neural network and its optical architecture. Proceedings of Annual Conference of the Japan Society of Applied Physics.
- Zhang et al., 2021c
Zhang, Y., Sun, P., Jiang, Y., Yu, D., Yuan, Z., Luo, P., et al. (2021). ByteTrack: multi-object tracking by associating every detection box. ArXiv:2110.06864.
- Zhang et al., 2023a
Zhang, Z., Zhang, A., Li, M., and Smola, A. (2023). Automatic chain of thought prompting in large language models. International Conference on Learning Representations.
- Zhang et al., 2023b
Zhang, Z., Zhang, A., Li, M., Zhao, H., Karypis, G., and Smola, A. (2023). Multimodal chain-of-thought reasoning in language models. ArXiv:2302.00923.
- Zhao et al., 2019
Zhao, Z.-Q., Zheng, P., Xu, S.-t., and Wu, X. (2019). Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212–3232.
- Zhou et al., 2023
Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., et al. (2023). Least-to-most prompting enables complex reasoning in large language models. International Conference on Learning Representations.
- Zhu et al., 2017
Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2223–2232).
- Zhu et al., 2015
Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., and Fidler, S. (2015). Aligning books and movies: towards story-like visual explanations by watching movies and reading books. Proceedings of the IEEE International Conference on Computer Vision (pp. 19–27).
- Zoph & Le, 2016
Zoph, B., and Le, Q. V. (2016). Neural architecture search with reinforcement learning. ArXiv:1611.01578.