References

Abadi et al., 2016

Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). TensorFlow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283).

Abdel-Hamid et al., 2014

Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1533–1545.

Ahmed et al., 2012

Ahmed, A., Aly, M., Gonzalez, J., Narayanamurthy, S., & Smola, A. J. (2012). Scalable inference in latent variable models. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (pp. 123–132).

Akiba et al., 2019

Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: a next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

Alayrac et al., 2022

Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., et al. (2022). Flamingo: a visual language model for few-shot learning. ArXiv:2204.14198.

Alsallakh et al., 2020

Alsallakh, B., Kokhlikyan, N., Miglani, V., Yuan, J., & Reblitz-Richardson, O. (2020). Mind the PAD – CNNs can develop blind spots. ArXiv:2010.02178.

Anil et al., 2023

Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., et al. (2023). PaLM 2 Technical Report. ArXiv:2305.10403.

Anil et al., 2020

Anil, R., Gupta, V., Koren, T., Regan, K., & Singer, Y. (2020). Scalable second-order optimization for deep learning. ArXiv:2002.09018.

Aronszajn, 1950

Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3), 337–404.

Ba et al., 2016

Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. ArXiv:1607.06450.

Baevski & Auli, 2018

Baevski, A., & Auli, M. (2018). Adaptive input representations for neural language modeling. International Conference on Learning Representations.

Bahdanau et al., 2014

Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. ArXiv:1409.0473.

Bai et al., 2022

Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., et al. (2022). Constitutional AI: harmlessness from AI feedback. ArXiv:2212.08073.

Baptista & Poloczek, 2018

Baptista, R., & Poloczek, M. (2018). Bayesian optimization of combinatorial structures. Proceedings of the 35th International Conference on Machine Learning.

Bardenet et al., 2013

Bardenet, R., Brendel, M., Kégl, B., & Sebag, M. (2013). Collaborative hyperparameter tuning. Proceedings of the 30th International Conference on Machine Learning (ICML'13).

Bay et al., 2006

Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. European Conference on Computer Vision (pp. 404–417).

Bellman, 1966

Bellman, R. (1966). Dynamic programming. Science, 153, 34–37.

Bellman, 1952

Bellman, R. (1952). On the theory of dynamic programming. Proceedings of the National Academy of Sciences, 38(8), 716–719.

Bellman, 1957a

Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684. URL: http://www.jstor.org/stable/24900506

Bellman, 1957b

Bellman, R. (1957). Dynamic Programming. Dover Publications.

Beltagy et al., 2020

Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: the long-document transformer. ArXiv:2004.05150.

Bengio et al., 2003

Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137–1155.

Bengio et al., 1994

Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.

Bergstra et al., 2011

Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems, 24.

Bergstra et al., 2010

Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., … Bengio, Y. (2010). Theano: a CPU and GPU math compiler in Python. Proc. 9th Python in Science Conference (pp. 3–10).

Beutel et al., 2014

Beutel, A., Murray, K., Faloutsos, C., & Smola, A. J. (2014). CoBaFi: collaborative Bayesian filtering. Proceedings of the 23rd International Conference on World Wide Web (pp. 97–108).

Bishop, 1995

Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1), 108–116.

Bishop, 2006

Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

Black & Scholes, 1973

Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–654.

Bodla et al., 2017

Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Soft-NMS – improving object detection with one line of code. Proceedings of the IEEE International Conference on Computer Vision (pp. 5561–5569).

Bojanowski et al., 2017

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.

Bollobas, 1999

Bollobás, B. (1999). Linear Analysis. Cambridge University Press.

Bommasani et al., 2021

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2021). On the opportunities and risks of foundation models. ArXiv:2108.07258.

Bottou, 2010

Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010 (pp. 177–186). Springer.

Bottou & Le Cun, 1988

Bottou, L., & Le Cun, Y. (1988). SN: a simulator for connectionist models. Proceedings of NeuroNimes 88 (pp. 371–382). Nimes, France. URL: http://leon.bottou.org/papers/bottou-lecun-88

Boucheron et al., 2005

Boucheron, S., Bousquet, O., & Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9, 323–375.

Bowman et al., 2015

Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. ArXiv:1508.05326.

Boyd & Vandenberghe, 2004

Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge, England: Cambridge University Press.

Bradley & Terry, 1952

Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4), 324–345.

Brown & Sandholm, 2017

Brown, N., & Sandholm, T. (2017). Libratus: the superhuman AI for no-limit poker. IJCAI (pp. 5226–5228).

Brown et al., 1990

Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J., … Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.

Brown et al., 1988

Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Mercer, R. L., & Roossin, P. (1988). A statistical approach to language translation. COLING Budapest 1988 Volume 1: International Conference on Computational Linguistics.

Brown et al., 2020

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.

Buslaev et al., 2020

Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Kalinin, A. A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125.

Campbell et al., 2002

Campbell, M., Hoane Jr, A. J., & Hsu, F.-h. (2002). Deep Blue. Artificial Intelligence, 134(1-2), 57–83.

Canny, 1987

Canny, J. (1987). A computational approach to edge detection. Readings in Computer Vision (pp. 184–203). Elsevier.

Cer et al., 2017

Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., & Specia, L. (2017). SemEval-2017 Task 1: semantic textual similarity multilingual and crosslingual focused evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) (pp. 1–14).

Chan et al., 2015

Chan, W., Jaitly, N., Le, Q. V., & Vinyals, O. (2015). Listen, attend and spell. ArXiv:1508.01211.

Chen et al., 2021

Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., … Mordatch, I. (2021). Decision transformer: reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 15084–15097.

Chen et al., 2015

Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., … Zhang, Z. (2015). MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. ArXiv:1512.01274.

Cheng et al., 2016

Cheng, J., Dong, L., & Lapata, M. (2016). Long short-term memory-networks for machine reading. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 551–561).

Chetlur et al., 2014

Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., & Shelhamer, E. (2014). cuDNN: Efficient primitives for deep learning. ArXiv:1410.0759.

Cho et al., 2014a

Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder–decoder approaches. ArXiv:1409.1259.

Cho et al., 2014b

Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. ArXiv:1406.1078.

Chowdhery et al., 2022

Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., et al. (2022). PaLM: scaling language modeling with pathways. ArXiv:2204.02311.

Chung et al., 2014

Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv:1412.3555.

Clark et al., 2020

Clark, K., Luong, M.-T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: pre-training text encoders as discriminators rather than generators. International Conference on Learning Representations.

Collobert et al., 2011

Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.

Cordonnier et al., 2020

Cordonnier, J.-B., Loukas, A., & Jaggi, M. (2020). On the relationship between self-attention and convolutional layers. International Conference on Learning Representations.

Cover & Thomas, 1999

Cover, T., & Thomas, J. (1999). Elements of Information Theory. John Wiley & Sons.

Csiszar, 2008

Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10(3), 261–273.

Cybenko, 1989

Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303–314.

Dalal & Triggs, 2005

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (pp. 886–893).

DeCock, 2011

De Cock, D. (2011). Ames, Iowa: alternative to the Boston housing data as an end of semester regression project. Journal of Statistics Education, 19(3).

Dean et al., 2012

Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., et al. (2012). Large scale distributed deep networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 1 (pp. 1223–1231).

DeCandia et al., 2007

DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., … Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. ACM SIGOPS Operating Systems Review (pp. 205–220).

Deng et al., 2009

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255).

DerKiureghian & Ditlevsen, 2009

Der Kiureghian, A., & Ditlevsen, O. (2009). Aleatory or epistemic? Does it matter? Structural Safety, 31(2), 105–112.

Devlin et al., 2018

Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv:1810.04805.

Dinh et al., 2014

Dinh, L., Krueger, D., & Bengio, Y. (2014). NICE: non-linear independent components estimation. ArXiv:1410.8516.

Dinh et al., 2017

Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2017). Density estimation using real NVP. International Conference on Learning Representations.

Doersch et al., 2015

Doersch, C., Gupta, A., & Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. Proceedings of the IEEE International Conference on Computer Vision (pp. 1422–1430).

Dosovitskiy et al., 2021

Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16 x 16 words: transformers for image recognition at scale. International Conference on Learning Representations.

Duchi et al., 2011

Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.

Dumoulin & Visin, 2016

Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. ArXiv:1603.07285.

Dwivedi & Bresson, 2020

Dwivedi, V. P., & Bresson, X. (2020). A generalization of transformer networks to graphs. ArXiv:2012.09699.

Dwork et al., 2015

Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. L. (2015). Preserving statistical validity in adaptive data analysis. Proceedings of the 47th Annual ACM Symposium on Theory of Computing (pp. 117–126).

Elman, 1990

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.

Elsken et al., 2018

Elsken, T., Metzen, J. H., & Hutter, F. (2018). Neural architecture search: a survey. ArXiv:1808.05377 [stat.ML].

Fechner, 1860

Fechner, G. T. (1860). Elemente der Psychophysik. Vol. 2. Breitkopf u. Härtel.

Fedus et al., 2022

Fedus, W., Zoph, B., & Shazeer, N. (2022). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120), 1–39.

Fernando, 2004

Fernando, R. (2004). GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics. Addison-Wesley.

Feurer & Hutter, 2018

Feurer, M., & Hutter, F. (2018). Hyperparameter optimization. Automatic Machine Learning: Methods, Systems, Challenges. Springer.

Feurer et al., 2022

Feurer, M., Letham, B., Hutter, F., & Bakshy, E. (2022). Practical transfer learning for Bayesian optimization. ArXiv:1802.02219 [stat.ML].

Field, 1987

Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. JOSA A, 4(12), 2379–2394.

Fisher, 1925

Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.

Flammarion & Bach, 2015

Flammarion, N., & Bach, F. (2015). From averaging to acceleration, there is only a step-size. Conference on Learning Theory (pp. 658–695).

Forrester et al., 2007

Forrester, A. I., Sóbester, A., & Keane, A. J. (2007). Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2088), 3251–3269.

Franceschi et al., 2017

Franceschi, L., Donini, M., Frasconi, P., & Pontil, M. (2017). Forward and reverse gradient-based hyperparameter optimization. Proceedings of the 34th International Conference on Machine Learning (ICML'17).

Frankle & Carbin, 2018

Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: finding sparse, trainable neural networks. ArXiv:1803.03635.

Frazier, 2018

Frazier, P. I. (2018). A tutorial on Bayesian optimization. ArXiv:1807.02811.

Freund & Schapire, 1996

Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. Proceedings of the International Conference on Machine Learning (pp. 148–156).

Friedman, 1987

Friedman, J. H. (1987). Exploratory projection pursuit. Journal of the American Statistical Association, 82(397), 249–266.

Frostig et al., 2018

Frostig, R., Johnson, M. J., & Leary, C. (2018). Compiling machine learning programs via high-level tracing. Proceedings of Systems for Machine Learning.

Fukushima, 1982

Fukushima, K. (1982). Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. Competition and Cooperation in Neural Nets (pp. 267–285). Springer.

Gardner et al., 2018

Gardner, J., Pleiss, G., Weinberger, K. Q., Bindel, D., & Wilson, A. G. (2018). GPyTorch: blackbox matrix–matrix Gaussian process inference with GPU acceleration. Advances in Neural Information Processing Systems.

Garg et al., 2021

Garg, S., Balakrishnan, S., Kolter, Z., & Lipton, Z. (2021). RATT: leveraging unlabeled data to guarantee generalization. International Conference on Machine Learning (pp. 3598–3609).

Gatys et al., 2016

Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414–2423).

Gauss, 1809

Gauss, C. F. (1809). Theoria motus corporum coelestium. Werke. Königlich Preussische Akademie der Wissenschaften.

Gibbs, 1902

Gibbs, J. W. (1902). Elementary Principles of Statistical Mechanics. Scribner's.

Ginibre, 1965

Ginibre, J. (1965). Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), 440–449.

Girshick, 2015

Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448).

Girshick et al., 2014

Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587).

Glorot & Bengio, 2010

Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 249–256).

Goh, 2017

Goh, G. (2017). Why momentum really works. Distill. URL: http://distill.pub/2017/momentum

Goldberg et al., 1992

Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61–71.

Golub & VanLoan, 1996

Golub, G. H., & Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press.

Goodfellow et al., 2016

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. http://www.deeplearningbook.org

Goodfellow et al., 2014

Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (pp. 2672–2680).

Gotmare et al., 2018

Gotmare, A., Keskar, N. S., Xiong, C., & Socher, R. (2018). A closer look at deep learning heuristics: learning rate restarts, warmup and distillation. ArXiv:1810.13243.

Goyal et al., 2021

Goyal, A., Bochkovskiy, A., Deng, J., & Koltun, V. (2021). Non-deep networks. ArXiv:2110.07641.

Graham, 2014

Graham, B. (2014). Fractional max-pooling. ArXiv:1412.6071.

Graves, 2013

Graves, A. (2013). Generating sequences with recurrent neural networks. ArXiv:1308.0850.

Graves et al., 2008

Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., & Schmidhuber, J. (2008). A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5), 855–868.

Graves & Schmidhuber, 2005

Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6), 602–610.

Griewank, 1989

Griewank, A. (1989). On automatic differentiation. Mathematical Programming: Recent Developments and Applications (pp. 83–107). Kluwer.

Gulati et al., 2020

Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., et al. (2020). Conformer: convolution-augmented transformer for speech recognition. Proc. Interspeech 2020, pp. 5036–5040.

Gunawardana & Shani, 2015

Gunawardana, A., & Shani, G. (2015). Evaluating recommender systems. Recommender Systems Handbook (pp. 265–308). Springer.

Guo et al., 2017

Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: a factorization-machine based neural network for CTR prediction. Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1725–1731).

Guyon et al., 2008

Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (2008). Feature Extraction: Foundations and Applications. Springer.

Hadjis et al., 2016

Hadjis, S., Zhang, C., Mitliagkas, I., Iter, D., & Ré, C. (2016). Omnivore: an optimizer for multi-device deep learning on CPUs and GPUs. ArXiv:1606.04487.

Hartley & Zisserman, 2000

Hartley, R., & Zisserman, A. (2000). Multiple View Geometry in Computer Vision. Cambridge University Press.

Hartley & Kahl, 2009

Hartley, R. I., & Kahl, F. (2009). Global optimization through rotation space search. International Journal of Computer Vision, 82(1), 64–79.

He et al., 2022

He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16000–16009).

He et al., 2017a

He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).

He et al., 2015

He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision (pp. 1026–1034).

He et al., 2016a

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).

He et al., 2016b

He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity mappings in deep residual networks. European Conference on Computer Vision (pp. 630–645).

He & Chua, 2017

He, X., & Chua, T.-S. (2017). Neural factorization machines for sparse predictive analytics. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 355–364).

He et al., 2017b

He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering. Proceedings of the 26th International Conference on World Wide Web (pp. 173–182).

Hebb, 1949

Hebb, D. O. (1949). The Organization of Behavior. Wiley.

Hendrycks & Gimpel, 2016

Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (GELUs). ArXiv:1606.08415.

Hennessy & Patterson, 2011

Hennessy, J. L., & Patterson, D. A. (2011). Computer Architecture: A Quantitative Approach. Elsevier.

Herlocker et al., 1999

Herlocker, J. L., Konstan, J. A., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. 22nd Annual International ACM Conference on Research and Development in Information Retrieval, SIGIR 1999 (pp. 230–237).

Hidasi et al., 2015

Hidasi, B., Karatzoglou, A., Baltrunas, L., & Tikk, D. (2015). Session-based recommendations with recurrent neural networks. ArXiv:1511.06939.

Ho et al., 2020

Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.

Hochreiter et al., 2001

Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.

Hochreiter & Schmidhuber, 1997

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Hoffmann et al., 2022

Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., et al. (2022). Training compute-optimal large language models. ArXiv:2203.15556.

Howard et al., 2019

Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., … Adam, H. (2019). Searching for MobileNetV3. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1314–1324).

Hoyer et al., 2009

Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems (pp. 689–696).

Hu et al., 2018

Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7132–7141).

Hu et al., 2008

Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. 2008 8th IEEE International Conference on Data Mining (pp. 263–272).

Hu et al., 2022

Hu, Z., Lee, R. K.-W., Aggarwal, C. C., & Zhang, A. (2022). Text style transfer: a review and experimental evaluation. SIGKDD Explor. Newsl., 24(1). URL: https://doi.org/10.1145/3544903.3544906

Huang et al., 2018

Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Simon, I., Hawthorne, C., Shazeer, N., … Eck, D. (2018). Music transformer: generating music with long-term structure. International Conference on Learning Representations.

Huang et al., 2017

Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700–4708).

Huang et al., 2015

Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM–CRF models for sequence tagging. ArXiv:1508.01991.

Hubel & Wiesel, 1959

Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. Journal of Physiology, 148(3), 574–591.

Hubel & Wiesel, 1962

Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160(1), 106–154.

Hubel & Wiesel, 1968

Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195(1), 215–243.

Hutter et al., 2011

Hutter, F., Hoos, H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. Proceedings of the Fifth International Conference on Learning and Intelligent Optimization (LION'11).

Hutter et al., 2019

Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.) (2019). Automated Machine Learning: Methods, Systems, Challenges. Springer.

Ioffe, 2017

Ioffe, S. (2017). Batch renormalization: towards reducing minibatch dependence in batch-normalized models. Advances in Neural Information Processing Systems (pp. 1945–1953).

Ioffe & Szegedy, 2015

Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. ArXiv:1502.03167.

Izmailov et al., 2018

Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging weights leads to wider optima and better generalization. ArXiv:1803.05407.

Jacot et al., 2018

Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural tangent kernel: convergence and generalization in neural networks. Advances in Neural Information Processing Systems.

Jaeger, 2002

Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD-Forschungszentrum Informationstechnik Bonn.

Jamieson & Talwalkar, 2016

Jamieson, K., & Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. Proceedings of the 17th International Conference on Artificial Intelligence and Statistics.

Jenatton et al., 2017

Jenatton, R., Archambeau, C., González, J., & Seeger, M. (2017). Bayesian optimization with tree-structured dependencies. Proceedings of the 34th International Conference on Machine Learning (ICML'17).

Jia et al., 2018

Jia, X., Song, S., He, W., Wang, Y., Rong, H., Zhou, F., et al. (2018). Highly scalable deep learning training system with mixed-precision: training ImageNet in four minutes. ArXiv:1807.11205.

Jia et al., 2014

Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., … Darrell, T. (2014). Caffe: convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia (pp. 675–678).

Joshi et al., 2020

Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2020). SpanBERT: improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8, 64–77.

Jouppi et al., 2017

Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., et al. (2017). In-datacenter performance analysis of a tensor processing unit. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) (pp. 1–12).

Kalchbrenner et al., 2014

Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. ArXiv:1404.2188.

Kalman & Kwasny, 1992

Kalman, B. L., & Kwasny, S. C. (1992). Why tanh: choosing a sigmoidal function. Proceedings of the International Joint Conference on Neural Networks (IJCNN) (pp. 578–581).

Kaplan et al., 2020

Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., … Amodei, D. (2020). Scaling laws for neural language models. ArXiv:2001.08361.

Karnin et al., 2013

Karnin, Z., Koren, T., & Somekh, O. (2013). Almost optimal exploration in multi-armed bandits. Proceedings of the 30th International Conference on Machine Learning (ICML'13).

Karras et al., 2017

Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. ArXiv:1710.10196.

Kim et al., 2017

Kim, J., El-Khamy, M., & Lee, J. (2017). Residual LSTM: design of a deep recurrent architecture for distant speech recognition. ArXiv:1701.03360.

Kim, 2014

Kim, Y. (2014). Convolutional neural networks for sentence classification. ArXiv:1408.5882.

Kimeldorf & Wahba, 1971

Kimeldorf, G. S., & Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.

Kingma & Ba, 2014

Kingma, D. P., & Ba, J. (2014). Adam: a method for stochastic optimization. ArXiv:1412.6980.

Kingma & Welling, 2014

Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. International Conference on Learning Representations (ICLR).

Kipf & Welling, 2016

Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. ArXiv:1609.02907.

Kojima et al., 2022

Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. ArXiv:2205.11916.

Koller & Friedman, 2009

Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.

Kolmogorov, 1933

Kolmogorov, A. (1933). Sulla determinazione empirica di una legge di distribuzione. Inst. Ital. Attuari, Giorn., 4, 83–91.

Kolter, 2008

Kolter, Z. (2008). Linear algebra review and reference. Available online: http://cs229.stanford.edu/section/cs229-linalg.pdf

Koren et al., 2009

Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer, pp. 30–37.

Krizhevsky et al., 2012

Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (pp. 1097–1105).

Kung, 1988

Kung, S. Y. (1988). VLSI Array Processors. Prentice Hall.

Kuzovkin et al., 2018

Kuzovkin, I., Vicente, R., Petton, M., Lachaux, J.-P., Baciu, M., Kahane, P., … Aru, J. (2018). Activations of deep convolutional neural networks are aligned with gamma band activity of human visual cortex. Communications Biology, 1(1), 1–12.

Lan et al., 2019

Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: a lite BERT for self-supervised learning of language representations. ArXiv:1909.11942.

Lavin & Gray, 2016

Lavin, A., & Gray, S. (2016). Fast algorithms for convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4013–4021).

Le, 2013

Le, Q. V. (2013). Building high-level features using large scale unsupervised learning. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8595–8598).

LeCun et al., 1995a

LeCun, Y., Bengio, Y., et al. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks (p. 3361). MIT Press.

LeCun et al., 1989

LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.

LeCun et al., 1998a

LeCun, Y., Bottou, L., Orr, G., & Muller, K.-R. (1998). Efficient backprop. Neural Networks: Tricks of the Trade. Springer.

LeCun et al., 1998b

LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.

LeCun et al., 1995b

LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., et al. (1995). Comparison of learning algorithms for handwritten digit recognition. International Conference on Artificial Neural Networks (pp. 53–60).

Legendre, 1805

Legendre, A. M. (1805). Mémoire sur les Opérations Trigonométriques: dont les Résultats Dépendent de la Figure de la Terre. F. Didot.

Lewis et al., 2019

Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., … Zettlemoyer, L. (2019). BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. ArXiv:1910.13461.

Lewkowycz et al., 2022

Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., et al. (2022). Solving quantitative reasoning problems with language models. ArXiv:2206.14858.

Li et al., 2018

Li, L., Jamieson, K., Rostamizadeh, A., Gonina, K., Hardt, M., Recht, B., & Talwalkar, A. (2018). Massively parallel hyperparameter tuning. ArXiv:1810.05934.

Li, 2017

Li, M. (2017). Scaling Distributed Machine Learning with System and Algorithm Co-design (Doctoral dissertation). CMU.

Li et al., 2014a

Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., … Su, B.-Y. (2014). Scaling distributed machine learning with the parameter server. 11th Symposium on Operating Systems Design and Implementation (OSDI 14) (pp. 583–598).

Li et al., 2014b

Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 661–670).

Liaw et al., 2018

Liaw, R., Liang, E., Nishihara, R., Moritz, P., Gonzalez, J., & Stoica, I. (2018). Tune: a research platform for distributed model selection and training. ArXiv:1807.05118.

Lin et al., 2013

Lin, M., Chen, Q., & Yan, S. (2013). Network in network. ArXiv:1312.4400.

Lin et al., 2017a

Lin, T.-Y., Goyal, P., Girshick, R., He, K., & Dollár, P. (2017). Focal loss for dense object detection. Proceedings of the IEEE International Conference on Computer Vision (pp. 2980–2988).

Lin et al., 2010

Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., … et al. (2010). ImageNet classification: fast descriptor coding and large-scale SVM training. Large Scale Visual Recognition Challenge.

Lin et al., 2017b

Lin, Z., Feng, M., Santos, C. N. d., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. ArXiv:1703.03130

Lipton et al., 2015

Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. ArXiv:1506.00019

Lipton et al., 2016

Lipton, Z. C., Kale, D. C., Elkan, C., & Wetzel, R. (2016). Learning to diagnose with LSTM recurrent neural networks. International Conference on Learning Representations (ICLR).

Lipton & Steinhardt, 2018

Lipton, Z. C., & Steinhardt, J. (2018). Troubling trends in machine learning scholarship. Communications of the ACM, 17, 45–77.

Liu & Nocedal, 1989

Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1), 503–528.

Liu et al., 2018

Liu, H., Simonyan, K., & Yang, Y. (2018). DARTS: differentiable architecture search. ArXiv:1806.09055

Liu et al., 2016

Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: single shot multibox detector. European Conference on Computer Vision (pp. 21–37).

Liu et al., 2019

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: a robustly optimized BERT pretraining approach. ArXiv:1907.11692

Liu et al., 2021

Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., … Guo, B. (2021). Swin transformer: hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012–10022).

Liu et al., 2022

Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. ArXiv:2201.03545

Long et al., 2015

Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431–3440).

Loshchilov & Hutter, 2016

Loshchilov, I., & Hutter, F. (2016). SGDR: stochastic gradient descent with warm restarts. ArXiv:1608.03983

Lowe, 2004

Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.

Luo et al., 2018

Luo, P., Wang, X., Shao, W., & Peng, Z. (2018). Towards understanding regularization in batch normalization. ArXiv:1809.00846

Maas et al., 2011

Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 142–150).

Mack & Silverman, 1982

Mack, Y.-P., & Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61(3), 405–415.

MacKay, 2003

MacKay, D. J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.

Maclaurin et al., 2015

Maclaurin, D., Duvenaud, D., & Adams, R. (2015). Gradient-based hyperparameter optimization through reversible learning. Proceedings of the 32nd International Conference on Machine Learning (ICML'15).

Mangasarian, 1965

Mangasarian, O. L. (1965). Linear and nonlinear separation of patterns by linear programming. Oper. Res., 13, 444–452.

Mangram, 2013

Mangram, M. E. (2013). A simplified perspective of the Markowitz portfolio theory. Global Journal of Business Research, 7(1), 59–70.

Matthews et al., 2018

Matthews, A. G. d. G., Rowland, M., Hron, J., Turner, R. E., & Ghahramani, Z. (2018). Gaussian process behaviour in wide deep neural networks. ArXiv:1804.11271

McCann et al., 2017

McCann, B., Bradbury, J., Xiong, C., & Socher, R. (2017). Learned in translation: Contextualized word vectors. Advances in Neural Information Processing Systems (pp. 6294–6305).

McCulloch & Pitts, 1943

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115–133.

McMahan et al., 2013

McMahan, H. B., Holt, G., Sculley, D., Young, M., Ebner, D., Grady, J., et al. (2013). Ad click prediction: a view from the trenches. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1222–1230).

Mead, 1980

Mead, C. (1980). Introduction to VLSI systems. IEE Proceedings I-Solid-State and Electron Devices, 128(1), 18.

Merity et al., 2016

Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv:1609.07843

Micchelli, 1984

Micchelli, C. A. (1984). Interpolation of scattered data: distance matrices and conditionally positive definite functions. Approximation Theory and Spline Functions (pp. 143–145). Springer.

Mikolov et al., 2013a

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. ArXiv:1301.3781

Mikolov et al., 2013b

Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (pp. 3111–3119).

Miller, 1995

Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39–41.

Mirhoseini et al., 2017

Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., … Dean, J. (2017). Device placement optimization with reinforcement learning. 34th International Conference on Machine Learning (pp. 2430–2439).

Mnih et al., 2014

Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems (pp. 2204–2212).

Mnih et al., 2013

Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. ArXiv:1312.5602

Mnih et al., 2015

Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.

Moon et al., 2010

Moon, T., Smola, A., Chang, Y., & Zheng, Z. (2010). Intervalrank: isotonic regression with listwise and pairwise constraints. Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (pp. 151–160).

Morey et al., 2016

Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103–123.

Morozov, 1984

Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems. Springer.

Nadaraya, 1964

Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & its Applications, 9(1), 141–142.

Nair & Hinton, 2010

Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. ICML.

Nakkiran et al., 2021

Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., & Sutskever, I. (2021). Deep double descent: where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12), 124003.

Naor & Reingold, 1999

Naor, M., & Reingold, O. (1999). On the construction of pseudorandom permutations: Luby–Rackoff revisited. Journal of Cryptology, 12(1), 29–66.

Neal, 1996

Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer.

Nesterov, 2018

Nesterov, Y. (2018). Lectures on Convex Optimization. Springer.

Nesterov & Vial, 2000

Nesterov, Y., & Vial, J.-P. (2000). Confidence level solutions for stochastic programming. Automatica, 44(6), 1559–1568.

Neyman, 1937

Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236(767), 333–380.

Norelli et al., 2022

Norelli, A., Fumero, M., Maiorca, V., Moschella, L., Rodolà, E., & Locatello, F. (2022). ASIF: coupled data turns unimodal models to multimodal without training. ArXiv:2210.01738

Novak et al., 2018

Novak, R., Xiao, L., Lee, J., Bahri, Y., Yang, G., Hron, J., … Sohl-Dickstein, J. (2018). Bayesian deep convolutional networks with many channels are Gaussian processes. ArXiv:1810.05148

Novikoff, 1962

Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata (pp. 615–622).

Olshausen & Field, 1996

Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.

Ong et al., 2005

Ong, C. S., Smola, A., & Williamson, R. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043–1071.

OpenAI, 2023

OpenAI. (2023). GPT-4 Technical Report. ArXiv:2303.08774

Ouyang et al., 2022

Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., et al. (2022). Training language models to follow instructions with human feedback. ArXiv:2203.02155

Papineni et al., 2002

Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318).

Parikh et al., 2016

Parikh, A. P., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. ArXiv:1606.01933

Park et al., 2019

Park, T., Liu, M.-Y., Wang, T.-C., & Zhu, J.-Y. (2019). Semantic image synthesis with spatially-adaptive normalization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2337–2346).

Parzen, 1957

Parzen, E. (1957). On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics, 28, 329–348.

Paszke et al., 2019

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., et al. (2019). PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 8026–8037.

Paulus et al., 2017

Paulus, R., Xiong, C., & Socher, R. (2017). A deep reinforced model for abstractive summarization. ArXiv:1705.04304

Penedo et al., 2023

Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., … Launay, J. (2023). The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. ArXiv:2306.01116

Pennington et al., 2017

Pennington, J., Schoenholz, S., & Ganguli, S. (2017). Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. Advances in Neural Information Processing Systems (pp. 4785–4795).

Pennington et al., 2014

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).

Peters et al., 2017a

Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.

Peters et al., 2017b

Peters, M., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 1 (pp. 1756–1765).

Peters et al., 2018

Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 2227–2237).

Petersen & Pedersen, 2008

Petersen, K. B., & Pedersen, M. S. (2008). The Matrix Cookbook. Technical University of Denmark.

Pleiss et al., 2017

Pleiss, G., Chen, D., Huang, G., Li, T., Van Der Maaten, L., & Weinberger, K. Q. (2017). Memory-efficient implementation of densenets. ArXiv:1707.06990

Polyak, 1964

Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5), 1–17.

Prakash et al., 2016

Prakash, A., Hasan, S. A., Lee, K., Datla, V., Qadir, A., Liu, J., & Farri, O. (2016). Neural paraphrase generation with stacked residual LSTM networks. ArXiv:1610.03098

Qin et al., 2023

Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., & Yang, D. (2023). Is ChatGPT a general-purpose natural language processing task solver? ArXiv:2302.06476

Quadrana et al., 2018

Quadrana, M., Cremonesi, P., & Jannach, D. (2018). Sequence-aware recommender systems. ACM Computing Surveys, 51(4), 66.

Quinlan, 1993

Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Elsevier.

Rabiner & Juang, 1993

Rabiner, L., & Juang, B.-H. (1993). Fundamentals of Speech Recognition. Prentice-Hall.

Radford et al., 2021

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning (pp. 8748–8763).

Radford et al., 2015

Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv:1511.06434

Radford et al., 2018

Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI.

Radford et al., 2019

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.

Radosavovic et al., 2019

Radosavovic, I., Johnson, J., Xie, S., Lo, W.-Y., & Dollár, P. (2019). On network design spaces for visual recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1882–1890).

Radosavovic et al., 2020

Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollár, P. (2020). Designing network design spaces. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10428–10436).

Rae et al., 2021

Rae, J. W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., et al. (2021). Scaling language models: methods, analysis & insights from training Gopher. ArXiv:2112.11446

Raffel et al., 2020

Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., … Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21, 1–67.

Rajpurkar et al., 2016

Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. ArXiv:1606.05250

Ramachandran et al., 2019

Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., & Shlens, J. (2019). Stand-alone self-attention in vision models. Advances in Neural Information Processing Systems, 32.

Ramachandran et al., 2017

Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. ArXiv:1710.05941

Ramesh et al., 2022

Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. ArXiv:2204.06125

Cajal & Azoulay, 1894

Ramón y Cajal, Santiago, & Azoulay, L. (1894). Les Nouvelles Idées sur la Structure du Système Nerveux chez l'Homme et chez les Vertébrés. Paris, C. Reinwald & Cie.

Ranzato et al., 2007

Ranzato, M.-A., Boureau, Y.-L., Chopra, S., & LeCun, Y. (2007). A unified energy-based framework for unsupervised learning. Artificial Intelligence and Statistics (pp. 371–379).

Rasmussen & Williams, 2006

Rasmussen, C. E., & Williams, C. K. (2006). Gaussian Processes for Machine Learning. MIT Press.

Reddi et al., 2019

Reddi, S. J., Kale, S., & Kumar, S. (2019). On the convergence of Adam and beyond. ArXiv:1904.09237

Redmon et al., 2016

Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788).

Redmon & Farhadi, 2018

Redmon, J., & Farhadi, A. (2018). YOLOv3: an incremental improvement. ArXiv:1804.02767

Reed & DeFreitas, 2015

Reed, S., & De Freitas, N. (2015). Neural programmer-interpreters. ArXiv:1511.06279

Reed et al., 2022

Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., Barth-Maron, G., et al. (2022). A generalist agent. ArXiv:2205.06175

Ren et al., 2015

Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (pp. 91–99).

Rendle, 2010

Rendle, S. (2010). Factorization machines. 2010 IEEE International Conference on Data Mining (pp. 995–1000).

Rendle et al., 2009

Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (pp. 452–461).

Revels et al., 2016

Revels, J., Lubin, M., & Papamarkou, T. (2016). Forward-mode automatic differentiation in Julia. ArXiv:1607.07892

Rezende et al., 2014

Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. International Conference on Machine Learning (pp. 1278–1286).

Riesenhuber & Poggio, 1999

Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.

Rockafellar, 1970

Rockafellar, R. T. (1970). Convex Analysis. Princeton University Press.

Rolnick et al., 2017

Rolnick, D., Veit, A., Belongie, S., & Shavit, N. (2017). Deep learning is robust to massive label noise. ArXiv:1705.10694

Rudin, 1973

Rudin, W. (1973). Functional Analysis. McGraw-Hill.

Rumelhart et al., 1988

Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1.

Russakovsky et al., 2013

Russakovsky, O., Deng, J., Huang, Z., Berg, A. C., & Fei-Fei, L. (2013). Detecting avocados to zucchinis: what have we done, and where are we going? International Conference on Computer Vision (ICCV).

Russakovsky et al., 2015

Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.

Russell & Norvig, 2016

Russell, S. J., & Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson Education Limited.

Saharia et al., 2022

Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. ArXiv:2205.11487

Salinas et al., 2022

Salinas, D., Seeger, M., Klein, A., Perrone, V., Wistuba, M., & Archambeau, C. (2022). Syne Tune: a library for large scale hyperparameter tuning and reproducible research. First Conference on Automated Machine Learning.

Sanh et al., 2019

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv:1910.01108

Sanh et al., 2021

Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., et al. (2021). Multitask prompted training enables zero-shot task generalization. ArXiv:2110.08207

Santurkar et al., 2018

Santurkar, S., Tsipras, D., Ilyas, A., & Madry, A. (2018). How does batch normalization help optimization? Advances in Neural Information Processing Systems (pp. 2483–2493).

Sarwar et al., 2001

Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proceedings of 10th International Conference on World Wide Web (pp. 285–295).

Scao et al., 2022

Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., et al. (2022). BLOOM: a 176B-parameter open-access multilingual language model. ArXiv:2211.05100

Schein et al., 2002

Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 253–260).

Schuhmann et al., 2022

Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., et al. (2022). LAION-5B: an open large-scale dataset for training next generation image-text models. ArXiv:2210.08402

Schuster & Paliwal, 1997

Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.

Scholkopf et al., 2001

Schölkopf, B., Herbrich, R., & Smola, A. J. (2001). A generalized representer theorem. In D. P. Helmbold & B. Williamson (Eds.), Proceedings of the Annual Conference on Computational Learning Theory (pp. 416–426). Springer-Verlag.

Scholkopf et al., 1996

Schölkopf, B., Burges, C., & Vapnik, V. (1996). Incorporating invariances in support vector learning machines. International Conference on Artificial Neural Networks (pp. 47–52).

Scholkopf & Smola, 2002

Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.

Sedhain et al., 2015

Sedhain, S., Menon, A. K., Sanner, S., & Xie, L. (2015). Autorec: autoencoders meet collaborative filtering. Proceedings of the 24th International Conference on World Wide Web (pp. 111–112).

Sennrich et al., 2015

Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. ArXiv:1508.07909

Sergeev & DelBalso, 2018

Sergeev, A., & Del Balso, M. (2018). Horovod: fast and easy distributed deep learning in TensorFlow. ArXiv:1802.05799

Shannon, 1948

Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.

Shao et al., 2020

Shao, H., Yao, S., Sun, D., Zhang, A., Liu, S., Liu, D., et al. (2020). ControlVAE: controllable variational autoencoder. Proceedings of the 37th International Conference on Machine Learning.

Shaw et al., 2018

Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. ArXiv:1803.02155

Shoeybi et al., 2019

Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: training multi-billion parameter language models using model parallelism. ArXiv:1909.08053

Silver et al., 2016

Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.

Silverman, 1986

Silverman, B. W. (1986). Density Estimation for Statistical and Data Analysis. Chapman and Hall.

Simard et al., 1998

Simard, P. Y., LeCun, Y. A., Denker, J. S., & Victorri, B. (1998). Transformation invariance in pattern recognition – tangent distance and tangent propagation. Neural Networks: Tricks of the Trade (pp. 239–274). Springer.

Simonyan & Zisserman, 2014

Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ArXiv:1409.1556

Sindhwani et al., 2015

Sindhwani, V., Sainath, T. N., & Kumar, S. (2015). Structured transforms for small-footprint deep learning. ArXiv:1510.01722

Sivic & Zisserman, 2003

Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. Proceedings of the IEEE International Conference on Computer Vision (pp. 1470–1477).

Smith et al., 2022

Smith, S., Patwary, M., Norick, B., LeGresley, P., Rajbhandari, S., Casper, J., et al. (2022). Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. ArXiv:2201.11990

Smola & Narayanamurthy, 2010

Smola, A., & Narayanamurthy, S. (2010). An architecture for parallel topic models. Proceedings of the VLDB Endowment, 3(1-2), 703–710.

Snoek et al., 2012

Snoek, J., Larochelle, H., & Adams, R. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 25 (pp. 2951–2959).

Sohl-Dickstein et al., 2015

Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning (pp. 2256–2265).

Song & Ermon, 2019

Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32.

Song et al., 2021

Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations.

Speelpenning, 1980

Speelpenning, B. (1980). Compiling fast partial derivatives of functions given by algorithms (Doctoral dissertation). University of Illinois at Urbana-Champaign.

Srivastava et al., 2022

Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., et al. (2022). Beyond the imitation game: quantifying and extrapolating the capabilities of language models. ArXiv:2206.04615

Srivastava et al., 2014

Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.

Srivastava et al., 2015

Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Highway networks. ArXiv:1505.00387

Strang, 1993

Strang, G. (1993). Introduction to Linear Algebra. Wellesley–Cambridge Press.

Su & Khoshgoftaar, 2009

Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009.

Sukhbaatar et al., 2015

Sukhbaatar, S., Weston, J., & Fergus, R. (2015). End-to-end memory networks. Advances in Neural Information Processing Systems (pp. 2440–2448).

Sutskever et al., 2013

Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. International Conference on Machine Learning (pp. 1139–1147).

Sutskever et al., 2014

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (pp. 3104–3112).

Szegedy et al., 2017

Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. 31st AAAI Conference on Artificial Intelligence.

Szegedy et al., 2015

Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).

Szegedy et al., 2016

Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818–2826).

Tallec & Ollivier, 2017

Tallec, C., & Ollivier, Y. (2017). Unbiasing truncated backpropagation through time. ArXiv:1705.08209

Tan & Le, 2019

Tan, M., & Le, Q. (2019). EfficientNet: rethinking model scaling for convolutional neural networks. International Conference on Machine Learning (pp. 6105–6114).

Tang & Wang, 2018

Tang, J., & Wang, K. (2018). Personalized top-n sequential recommendation via convolutional sequence embedding. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 565–573).

Taskar et al., 2004

Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. Advances in Neural Information Processing Systems, 16, 25.

Tay et al., 2020

Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2020). Efficient transformers: a survey. ArXiv:2009.06732

Taylor et al., 2022

Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., et al. (2022). Galactica: a large language model for science. ArXiv:2211.09085

Teye et al., 2018

Teye, M., Azizpour, H., & Smith, K. (2018). Bayesian uncertainty estimation for batch normalized deep networks. ArXiv:1802.06455

Thomee et al., 2016

Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., et al. (2016). YFCC100M: the new data in multimedia research. Communications of the ACM, 59(2), 64–73.

Tieleman & Hinton, 2012

Tieleman, T., & Hinton, G. (2012). Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, Lecture 6.5-rmsprop.

Tikhonov & Arsenin, 1977

Tikhonov, A. N., & Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. W.H. Winston.

Tolstikhin et al., 2021

Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., et al. (2021). MLP-mixer: an all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34.

Torralba et al., 2008

Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1958–1970.

Touvron et al., 2021

Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021). Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning (pp. 10347–10357).

Touvron et al., 2023a

Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., et al. (2023). LLaMA: open and efficient foundation language models. ArXiv:2302.13971

Touvron et al., 2023b

Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., et al. (2023). LLaMA 2: open foundation and fine-tuned chat models. ArXiv:2307.09288

Tsoumakas & Katakis, 2007

Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.

Turing, 1950

Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236), 433.

Toscher et al., 2009

Töscher, A., Jahrer, M., & Bell, R. M. (2009). The BigChaos solution to the Netflix grand prize.

Uijlings et al., 2013

Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.

Vapnik, 1995

Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.

Vapnik, 1998

Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley and Sons.

Vapnik & Chervonenkis, 1964

Vapnik, V., & Chervonenkis, A. (1964). A note on one class of perceptrons. Automation and Remote Control, 25.

Vapnik & Chervonenkis, 1968

Vapnik, V., & Chervonenkis, A. (1968). Uniform convergence of frequencies of occurrence of events to their probabilities. Dokl. Akad. Nauk SSSR, 181, 915–918.

Vapnik & Chervonenkis, 1971

Vapnik, V., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16(2), 264–281.

Vapnik & Chervonenkis, 1981

Vapnik, V., & Chervonenkis, A. (1981). The necessary and sufficient conditions for the uniform convergence of averages to their expected values. Teoriya Veroyatnostei i Ee Primeneniya, 26(3), 543–564.

Vapnik & Chervonenkis, 1991

Vapnik, V., & Chervonenkis, A. (1991). The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3), 283–305.

Vapnik & Chervonenkis, 1974

Vapnik, V. N., & Chervonenkis, A. Y. (1974). Ordered risk minimization. Automation and Remote Control, 35, 1226–1235, 1403–1412.

Vapnik, 1992

Vapnik, V. (1992). Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems (pp. 831–838).

Vapnik et al., 1994

Vapnik, V., Levin, E., & Le Cun, Y. (1994). Measuring the VC-dimension of a learning machine. Neural Computation, 6(5), 851–876.

Vaswani et al., 2017

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems (pp. 5998–6008).

Wahba, 1990

Wahba, G. (1990). Spline Models for Observational Data. SIAM.

Waibel et al., 1989

Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), 328–339.

Wang et al., 2022

Wang, H., Zhang, A., Zheng, S., Shi, X., Li, M., & Wang, Z. (2022). Removing batch normalization boosts adversarial training. International Conference on Machine Learning (pp. 23433–23445).

Wang et al., 2018

Wang, L., Li, M., Liberty, E., & Smola, A. J. (2018). Optimal message scheduling for aggregation. Networks, 2(3), 2–3.

Wang et al., 2019

Wang, Q., Li, B., Xiao, T., Zhu, J., Li, C., Wong, D. F., & Chao, L. S. (2019). Learning deep transformer models for machine translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1810–1822).

Wang et al., 2023

Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations.

Wang et al., 2016

Wang, Y., Davidson, A., Pan, Y., Wu, Y., Riffel, A., & Owens, J. D. (2016). Gunrock: a high-performance graph processing library on the GPU. ACM SIGPLAN Notices (p. 11).

Warstadt et al., 2019

Warstadt, A., Singh, A., & Bowman, S. R. (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7, 625–641.

Wasserman, 2013

Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. Springer.

Watkins & Dayan, 1992

Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.

Watson, 1964

Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, pp. 359–372.

Wei et al., 2021

Wei, J., Bosma, M., Zhao, V. Y., Guu, K., Yu, A. W., Lester, B., et al. (2021). Finetuned language models are zero-shot learners. ArXiv:2109.01652

Wei et al., 2022a

Wei, J., Tay, Y., Bommasani, R., Raffel, C., Zoph, B., Borgeaud, S., et al. (2022). Emergent abilities of large language models. ArXiv:2206.07682

Wei et al., 2022b

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Chi, E., Le, Q., and Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. ArXiv:2201.11903

Welling & Teh, 2011

Welling, M., and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11) (pp. 681–688).

Wengert, 1964

Wengert, R. E. (1964). A simple automatic derivative evaluation program. Communications of the ACM, 7(8), 463–464.

Werbos, 1990

Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560.

Wigner, 1958

Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Ann. Math. (pp. 325–327).

Wilson & Izmailov, 2020

Wilson, A. G., and Izmailov, P. (2020). Bayesian deep learning and a probabilistic perspective of generalization. Advances in Neural Information Processing Systems, 33, 4697–4708.

Wistuba et al., 2019

Wistuba, M., Rawat, A., and Pedapati, T. (2019). A survey on neural architecture search. ArXiv:1905.01392 [cs.LG]

Wistuba et al., 2018

Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2018). Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning, 108, 43–78.

Wolpert & Macready, 1995

Wolpert, D. H., and Macready, W. G. (1995). No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute.

Wood et al., 2011

Wood, F., Gasthaus, J., Archambeau, C., James, L., and Teh, Y. W. (2011). The sequence memoizer. Communications of the ACM, 54(2), 91–98.

Wu et al., 2018

Wu, B., Wan, A., Yue, X., Jin, P., Zhao, S., Golmant, N., et al. (2018). Shift: a zero flop, zero parameter alternative to spatial convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9127–9135).

Wu et al., 2016

Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., et al. (2016). Google's neural machine translation system: bridging the gap between human and machine translation. ArXiv:1609.08144

Xiao et al., 2017

Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. ArXiv:1708.07747

Xiao et al., 2018

Xiao, L., Bahri, Y., Sohl-Dickstein, J., Schoenholz, S., and Pennington, J. (2018). Dynamical isometry and a mean field theory of CNNs: how to train 10,000-layer vanilla convolutional neural networks. International Conference on Machine Learning (pp. 5393–5402).

Xie et al., 2017

Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1492–1500).

Xiong et al., 2020

Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., et al. (2020). On layer normalization in the transformer architecture. International Conference on Machine Learning (pp. 10524–10533).

Xiong et al., 2018

Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., and Stolcke, A. (2018). The Microsoft 2017 conversational speech recognition system. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5934–5938).

Yamaguchi et al., 1990

Yamaguchi, K., Sakamoto, K., Akabane, T., and Fujimoto, Y. (1990). A neural network for speaker-independent isolated word recognition. First International Conference on Spoken Language Processing

Yang et al., 2016

Yang, Z., Hu, Z., Deng, Y., Dyer, C., and Smola, A. (2016). Neural machine translation with recurrent attention modeling. ArXiv:1607.05108

Yang et al., 2015

Yang, Z., Moczulski, M., Denil, M., De Freitas, N., Smola, A., Song, L., and Wang, Z. (2015). Deep fried convnets. Proceedings of the IEEE International Conference on Computer Vision (pp. 1476–1483).

Ye et al., 2011

Ye, M., Yin, P., Lee, W.-C., and Lee, D.-L. (2011). Exploiting geographical influence for collaborative point-of-interest recommendation. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 325–334).

You et al., 2017

You, Y., Gitman, I., and Ginsburg, B. (2017). Large batch training of convolutional networks. ArXiv:1708.03888

Yu et al., 2022

Yu, J., Xu, Y., Koh, J. Y., Luong, T., Baid, G., Wang, Z., et al. (2022). Scaling autoregressive models for content-rich text-to-image generation. ArXiv:2206.10789

Zaheer et al., 2018

Zaheer, M., Reddi, S., Sachan, D., Kale, S., and Kumar, S. (2018). Adaptive methods for nonconvex optimization. Advances in Neural Information Processing Systems (pp. 9793–9803).

Zeiler, 2012

Zeiler, M. D. (2012). ADADELTA: an adaptive learning rate method. ArXiv:1212.5701

Zeiler & Fergus, 2013

Zeiler, M. D., and Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional neural networks. ArXiv:1301.3557

Zhang et al., 2021a

Zhang, A., Tay, Y., Zhang, S., Chan, A., Luu, A. T., Hui, S. C., and Fu, J. (2021). Beyond fully-connected layers with quaternions: parameterization of hypercomplex multiplications with 1/n parameters. International Conference on Learning Representations

Zhang et al., 2021b

Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107–115.

Zhang et al., 2019

Zhang, S., Yao, L., Sun, A., and Tay, Y. (2019). Deep learning based recommender system: a survey and new perspectives. ACM Computing Surveys, 52(1), 5.

Zhang et al., 2022

Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., et al. (2022). OPT: open pre-trained transformer language models. ArXiv:2205.01068

Zhang et al., 1988

Zhang, W., Tanida, J., Itoh, K., and Ichioka, Y. (1988). Shift-invariant pattern recognition neural network and its optical architecture. Proceedings of Annual Conference of the Japan Society of Applied Physics

Zhang et al., 2021c

Zhang, Y., Sun, P., Jiang, Y., Yu, D., Yuan, Z., Luo, P., et al. (2021). ByteTrack: multi-object tracking by associating every detection box. ArXiv:2110.06864

Zhang et al., 2023a

Zhang, Z., Zhang, A., Li, M., and Smola, A. (2023). Automatic chain of thought prompting in large language models. International Conference on Learning Representations

Zhang et al., 2023b

Zhang, Z., Zhang, A., Li, M., Zhao, H., Karypis, G., and Smola, A. (2023). Multimodal chain-of-thought reasoning in language models. ArXiv:2302.00923

Zhao et al., 2019

Zhao, Z.-Q., Zheng, P., Xu, S.-t., and Wu, X. (2019). Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212–3232.

Zhou et al., 2023

Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., et al. (2023). Least-to-most prompting enables complex reasoning in large language models. International Conference on Learning Representations

Zhu et al., 2017

Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2223–2232).

Zhu et al., 2015

Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., and Fidler, S. (2015). Aligning books and movies: towards story-like visual explanations by watching movies and reading books. Proceedings of the IEEE International Conference on Computer Vision (pp. 19–27).

Zoph & Le, 2016

Zoph, B., and Le, Q. V. (2016). Neural architecture search with reinforcement learning. ArXiv:1611.01578