References

Abadi et al., 2016: Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). TensorFlow: a system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16) (pp. 265–283).
Abdel-Hamid et al., 2014: Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., Deng, L., Penn, G., & Yu, D. (2014). Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(10), 1533–1545.
Ahmed et al., 2012: Ahmed, A., Aly, M., Gonzalez, J., Narayanamurthy, S., & Smola, A. J. (2012). Scalable inference in latent variable models. Proceedings of the Fifth ACM International Conference on Web Search and Data Mining (pp. 123–132).
Akiba et al., 2019: Akiba, T., Sano, S., Yanase, T., Ohta, T., & Koyama, M. (2019). Optuna: a next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Alayrac et al., 2022: Alayrac, J.-B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., et al. (2022). Flamingo: a visual language model for few-shot learning. ArXiv:2204.14198.
Alsallakh et al., 2020: Alsallakh, B., Kokhlikyan, N., Miglani, V., Yuan, J., & Reblitz-Richardson, O. (2020). Mind the PAD – CNNs can develop blind spots. ArXiv:2010.02178.
Anil et al., 2023: Anil, R., Dai, A. M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., et al. (2023). PaLM 2 Technical Report. ArXiv:2305.10403.
Anil et al., 2020: Anil, R., Gupta, V., Koren, T., Regan, K., & Singer, Y. (2020). Scalable second-order optimization for deep learning. ArXiv:2002.09018.
Aronszajn, 1950: Aronszajn, N. (1950). Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3), 337–404.
Ba et al., 2016: Ba, J. L., Kiros, J. R., & Hinton, G. E. (2016). Layer normalization. ArXiv:1607.06450.
Baevski & Auli, 2018: Baevski, A., & Auli, M. (2018). Adaptive input representations for neural language modeling. International Conference on Learning Representations.
Bahdanau et al., 2014: Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. ArXiv:1409.0473.
Bai et al., 2022: Bai, Y., Kadavath, S., Kundu, S., Askell, A., Kernion, J., Jones, A., et al. (2022). Constitutional AI: harmlessness from AI feedback. ArXiv:2212.08073.
Baptista & Poloczek, 2018: Baptista, R., & Poloczek, M. (2018). Bayesian optimization of combinatorial structures. Proceedings of the 35th International Conference on Machine Learning.
Bardenet et al., 2013: Bardenet, R., Brendel, M., Kégl, B., & Sebag, M. (2013). Collaborative hyperparameter tuning. Proceedings of the 30th International Conference on Machine Learning (ICML'13).
Bay et al., 2006: Bay, H., Tuytelaars, T., & Van Gool, L. (2006). SURF: Speeded up robust features. European Conference on Computer Vision (pp. 404–417).
Bellman, 1966: Bellman, R. (1966). Dynamic programming. Science, 153, 34–37.
Bellman, 1952: Bellman, R. (1952). On the theory of dynamic programming. Proceedings of the National Academy of Sciences, 38(8), 716–719.
Bellman, 1957a: Bellman, R. (1957). A Markovian decision process. Journal of Mathematics and Mechanics, 6(5), 679–684. URL: http://www.jstor.org/stable/24900506
Bellman, 1957b: Bellman, R. (1957). Dynamic Programming. Dover Publications.
Beltagy et al., 2020: Beltagy, I., Peters, M. E., & Cohan, A. (2020). Longformer: the long-document transformer. ArXiv:2004.05150.
Bengio et al., 2003: Bengio, Y., Ducharme, R., Vincent, P., & Jauvin, C. (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3(Feb), 1137–1155.
Bengio et al., 1994: Bengio, Y., Simard, P., & Frasconi, P. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2), 157–166.
Bergstra et al., 2011: Bergstra, J., Bardenet, R., Bengio, Y., & Kégl, B. (2011). Algorithms for hyper-parameter optimization. Advances in Neural Information Processing Systems, 24.
Bergstra et al., 2010: Bergstra, J., Breuleux, O., Bastien, F., Lamblin, P., Pascanu, R., Desjardins, G., … Bengio, Y. (2010). Theano: a CPU and GPU math compiler in Python. Proc. 9th Python in Science Conference (pp. 3–10).
Beutel et al., 2014: Beutel, A., Murray, K., Faloutsos, C., & Smola, A. J. (2014). CoBaFi: collaborative Bayesian filtering. Proceedings of the 23rd International Conference on World Wide Web (pp. 97–108).
Bishop, 1995: Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation, 7(1), 108–116.
Bishop, 2006: Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
Black & Scholes, 1973: Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–654.
Bodla et al., 2017: Bodla, N., Singh, B., Chellappa, R., & Davis, L. S. (2017). Soft-NMS – improving object detection with one line of code. Proceedings of the IEEE International Conference on Computer Vision (pp. 5561–5569).
Bojanowski et al., 2017: Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5, 135–146.
Bollobas, 1999: Bollobás, B. (1999). Linear Analysis. Cambridge University Press.
Bommasani et al., 2021: Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., et al. (2021). On the opportunities and risks of foundation models. ArXiv:2108.07258.
Bottou, 2010: Bottou, L. (2010). Large-scale machine learning with stochastic gradient descent. Proceedings of COMPSTAT'2010 (pp. 177–186). Springer.
Bottou & Le Cun, 1988: Bottou, L., & Le Cun, Y. (1988). SN: a simulator for connectionist models. Proceedings of NeuroNimes 88 (pp. 371–382). Nimes, France. URL: http://leon.bottou.org/papers/bottou-lecun-88
Boucheron et al., 2005: Boucheron, S., Bousquet, O., & Lugosi, G. (2005). Theory of classification: a survey of some recent advances. ESAIM: Probability and Statistics, 9, 323–375.
Bowman et al., 2015: Bowman, S. R., Angeli, G., Potts, C., & Manning, C. D. (2015). A large annotated corpus for learning natural language inference. ArXiv:1508.05326.
Boyd & Vandenberghe, 2004: Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge, England: Cambridge University Press.
Bradley & Terry, 1952: Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs: I. The method of paired comparisons. Biometrika, 39(3/4), 324–345.
Brown & Sandholm, 2017: Brown, N., & Sandholm, T. (2017). Libratus: the superhuman AI for no-limit poker. IJCAI (pp. 5226–5228).
Brown et al., 1990: Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Lafferty, J., … Roossin, P. S. (1990). A statistical approach to machine translation. Computational Linguistics, 16(2), 79–85.
Brown et al., 1988: Brown, P. F., Cocke, J., Della Pietra, S. A., Della Pietra, V. J., Jelinek, F., Mercer, R. L., & Roossin, P. (1988). A statistical approach to language translation. COLING Budapest 1988 Volume 1: International Conference on Computational Linguistics.
Brown et al., 2020: Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
Buslaev et al., 2020: Buslaev, A., Iglovikov, V. I., Khvedchenya, E., Parinov, A., Druzhinin, M., & Kalinin, A. A. (2020). Albumentations: Fast and flexible image augmentations. Information, 11(2), 125.
Campbell et al., 2002: Campbell, M., Hoane Jr, A. J., & Hsu, F.-h. (2002). Deep Blue. Artificial Intelligence, 134(1-2), 57–83.
Canny, 1987: Canny, J. (1987). A computational approach to edge detection. Readings in Computer Vision (pp. 184–203). Elsevier.
Cer et al., 2017: Cer, D., Diab, M., Agirre, E., Lopez-Gazpio, I., & Specia, L. (2017). SemEval-2017 Task 1: semantic textual similarity multilingual and crosslingual focused evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017) (pp. 1–14).
Chan et al., 2015: Chan, W., Jaitly, N., Le, Q. V., & Vinyals, O. (2015). Listen, attend and spell. ArXiv:1508.01211.
Chen et al., 2021: Chen, L., Lu, K., Rajeswaran, A., Lee, K., Grover, A., Laskin, M., … Mordatch, I. (2021). Decision transformer: reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34, 15084–15097.
Chen et al., 2015: Chen, T., Li, M., Li, Y., Lin, M., Wang, N., Wang, M., … Zhang, Z. (2015). MXNet: a flexible and efficient machine learning library for heterogeneous distributed systems. ArXiv:1512.01274.
Cheng et al., 2016: Cheng, J., Dong, L., & Lapata, M. (2016). Long short-term memory-networks for machine reading. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 551–561).
Chetlur et al., 2014: Chetlur, S., Woolley, C., Vandermersch, P., Cohen, J., Tran, J., Catanzaro, B., & Shelhamer, E. (2014). cuDNN: Efficient primitives for deep learning. ArXiv:1410.0759.
Cho et al., 2014a: Cho, K., Van Merriënboer, B., Bahdanau, D., & Bengio, Y. (2014). On the properties of neural machine translation: Encoder–decoder approaches. ArXiv:1409.1259.
Cho et al., 2014b: Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder–decoder for statistical machine translation. ArXiv:1406.1078.
Chowdhery et al., 2022: Chowdhery, A., Narang, S., Devlin, J., Bosma, M., Mishra, G., Roberts, A., et al. (2022). PaLM: scaling language modeling with pathways. ArXiv:2204.02311.
Chung et al., 2014: Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. ArXiv:1412.3555.
Clark et al., 2020: Clark, K., Luong, M.-T., Le, Q. V., & Manning, C. D. (2020). ELECTRA: pre-training text encoders as discriminators rather than generators. International Conference on Learning Representations.
Collobert et al., 2011: Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., & Kuksa, P. (2011). Natural language processing (almost) from scratch. Journal of Machine Learning Research, 12, 2493–2537.
Cordonnier et al., 2020: Cordonnier, J.-B., Loukas, A., & Jaggi, M. (2020). On the relationship between self-attention and convolutional layers. International Conference on Learning Representations.
Cover & Thomas, 1999: Cover, T., & Thomas, J. (1999). Elements of Information Theory. John Wiley & Sons.
Csiszar, 2008: Csiszár, I. (2008). Axiomatic characterizations of information measures. Entropy, 10(3), 261–273.
Cybenko, 1989: Cybenko, G. (1989). Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 2(4), 303–314.
Dalal & Triggs, 2005: Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) (pp. 886–893).
DeCock, 2011: De Cock, D. (2011). Ames, Iowa: alternative to the Boston housing data as an end of semester regression project. Journal of Statistics Education, 19(3).
Dean et al., 2012: Dean, J., Corrado, G. S., Monga, R., Chen, K., Devin, M., Le, Q. V., et al. (2012). Large scale distributed deep networks. Proceedings of the 25th International Conference on Neural Information Processing Systems, Volume 1 (pp. 1223–1231).
DeCandia et al., 2007: DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., … Vogels, W. (2007). Dynamo: Amazon's highly available key-value store. ACM SIGOPS Operating Systems Review (pp. 205–220).
Deng et al., 2009: Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., & Fei-Fei, L. (2009). ImageNet: a large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition (pp. 248–255).
DerKiureghian & Ditlevsen, 2009: Der Kiureghian, A., & Ditlevsen, O. (2009). Aleatory or epistemic? Does it matter? Structural Safety, 31(2), 105–112.
Devlin et al., 2018: Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. ArXiv:1810.04805.
Dinh et al., 2014: Dinh, L., Krueger, D., & Bengio, Y. (2014). NICE: non-linear independent components estimation. ArXiv:1410.8516.
Dinh et al., 2017: Dinh, L., Sohl-Dickstein, J., & Bengio, S. (2017). Density estimation using real NVP. International Conference on Learning Representations.
Doersch et al., 2015: Doersch, C., Gupta, A., & Efros, A. A. (2015). Unsupervised visual representation learning by context prediction. Proceedings of the IEEE International Conference on Computer Vision (pp. 1422–1430).
Dosovitskiy et al., 2021: Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., et al. (2021). An image is worth 16 x 16 words: transformers for image recognition at scale. International Conference on Learning Representations.
Duchi et al., 2011: Duchi, J., Hazan, E., & Singer, Y. (2011). Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12, 2121–2159.
Dumoulin & Visin, 2016: Dumoulin, V., & Visin, F. (2016). A guide to convolution arithmetic for deep learning. ArXiv:1603.07285.
Dwivedi & Bresson, 2020: Dwivedi, V. P., & Bresson, X. (2020). A generalization of transformer networks to graphs. ArXiv:2012.09699.
Dwork et al., 2015: Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., & Roth, A. L. (2015). Preserving statistical validity in adaptive data analysis. Proceedings of the 47th Annual ACM Symposium on Theory of Computing (pp. 117–126).
Elman, 1990: Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179–211.
Elsken et al., 2018: Elsken, T., Metzen, J. H., & Hutter, F. (2018). Neural architecture search: a survey. ArXiv:1808.05377 [stat.ML].
Fechner, 1860: Fechner, G. T. (1860). Elemente der Psychophysik. Vol. 2. Breitkopf u. Härtel.
Fedus et al., 2022: Fedus, W., Zoph, B., & Shazeer, N. (2022). Switch transformers: scaling to trillion parameter models with simple and efficient sparsity. Journal of Machine Learning Research, 23(120), 1–39.
Fernando, 2004: Fernando, R. (2004). GPU Gems: Programming Techniques, Tips, and Tricks for Real-Time Graphics. Addison-Wesley.
Feurer & Hutter, 2018: Feurer, M., & Hutter, F. (2018). Hyperparameter optimization. Automatic Machine Learning: Methods, Systems, Challenges. Springer.
Feurer et al., 2022: Feurer, M., Letham, B., Hutter, F., & Bakshy, E. (2022). Practical transfer learning for Bayesian optimization. ArXiv:1802.02219 [stat.ML].
Field, 1987: Field, D. J. (1987). Relations between the statistics of natural images and the response properties of cortical cells. JOSA A, 4(12), 2379–2394.
Fisher, 1925: Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver & Boyd.
Flammarion & Bach, 2015: Flammarion, N., & Bach, F. (2015). From averaging to acceleration, there is only a step-size. Conference on Learning Theory (pp. 658–695).
Forrester et al., 2007: Forrester, A. I., Sóbester, A., & Keane, A. J. (2007). Multi-fidelity optimization via surrogate modelling. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 463(2088), 3251–3269.
Franceschi et al., 2017: Franceschi, L., Donini, M., Frasconi, P., & Pontil, M. (2017). Forward and reverse gradient-based hyperparameter optimization. Proceedings of the 34th International Conference on Machine Learning (ICML'17).
Frankle & Carbin, 2018: Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: finding sparse, trainable neural networks. ArXiv:1803.03635.
Frazier, 2018: Frazier, P. I. (2018). A tutorial on Bayesian optimization. ArXiv:1807.02811.
Freund & Schapire, 1996: Freund, Y., & Schapire, R. E. (1996). Experiments with a new boosting algorithm. Proceedings of the International Conference on Machine Learning (pp. 148–156).
Friedman, 1987: Friedman, J. H. (1987). Exploratory projection pursuit. Journal of the American Statistical Association, 82(397), 249–266.
Frostig et al., 2018: Frostig, R., Johnson, M. J., & Leary, C. (2018). Compiling machine learning programs via high-level tracing. Proceedings of Systems for Machine Learning.
Fukushima, 1982: Fukushima, K. (1982). Neocognitron: a self-organizing neural network model for a mechanism of visual pattern recognition. Competition and Cooperation in Neural Nets (pp. 267–285). Springer.
Gardner et al., 2018: Gardner, J., Pleiss, G., Weinberger, K. Q., Bindel, D., & Wilson, A. G. (2018). GPyTorch: blackbox matrix–matrix Gaussian process inference with GPU acceleration. Advances in Neural Information Processing Systems.
Garg et al., 2021: Garg, S., Balakrishnan, S., Kolter, Z., & Lipton, Z. (2021). RATT: leveraging unlabeled data to guarantee generalization. International Conference on Machine Learning (pp. 3598–3609).
Gatys et al., 2016: Gatys, L. A., Ecker, A. S., & Bethge, M. (2016). Image style transfer using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2414–2423).
Gauss, 1809: Gauss, C. F. (1809). Theoria motus corporum coelestium. Werke. Königlich Preussische Akademie der Wissenschaften.
Gibbs, 1902: Gibbs, J. W. (1902). Elementary Principles of Statistical Mechanics. Scribner's.
Ginibre, 1965: Ginibre, J. (1965). Statistical ensembles of complex, quaternion, and real matrices. Journal of Mathematical Physics, 6(3), 440–449.
Girshick, 2015: Girshick, R. (2015). Fast R-CNN. Proceedings of the IEEE International Conference on Computer Vision (pp. 1440–1448).
Girshick et al., 2014: Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 580–587).
Glorot & Bengio, 2010: Glorot, X., & Bengio, Y. (2010). Understanding the difficulty of training deep feedforward neural networks. Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (pp. 249–256).
Goh, 2017: Goh, G. (2017). Why momentum really works. Distill. URL: http://distill.pub/2017/momentum
Goldberg et al., 1992: Goldberg, D., Nichols, D., Oki, B. M., & Terry, D. (1992). Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12), 61–71.
Golub & VanLoan, 1996: Golub, G. H., & Van Loan, C. F. (1996). Matrix Computations. Johns Hopkins University Press.
Goodfellow et al., 2016: Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press. URL: http://www.deeplearningbook.org
Goodfellow et al., 2014: Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … Bengio, Y. (2014). Generative adversarial nets. Advances in Neural Information Processing Systems (pp. 2672–2680).
Gotmare et al., 2018: Gotmare, A., Keskar, N. S., Xiong, C., & Socher, R. (2018). A closer look at deep learning heuristics: learning rate restarts, warmup and distillation. ArXiv:1810.13243.
Goyal et al., 2021: Goyal, A., Bochkovskiy, A., Deng, J., & Koltun, V. (2021). Non-deep networks. ArXiv:2110.07641.
Graham, 2014: Graham, B. (2014). Fractional max-pooling. ArXiv:1412.6071.
Graves, 2013: Graves, A. (2013). Generating sequences with recurrent neural networks. ArXiv:1308.0850.
Graves et al., 2008: Graves, A., Liwicki, M., Fernández, S., Bertolami, R., Bunke, H., & Schmidhuber, J. (2008). A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5), 855–868.
Graves & Schmidhuber, 2005: Graves, A., & Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5-6), 602–610.
Griewank, 1989: Griewank, A. (1989). On automatic differentiation. Mathematical Programming: Recent Developments and Applications (pp. 83–107). Kluwer.
Gulati et al., 2020: Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., et al. (2020). Conformer: convolution-augmented transformer for speech recognition. Proc. Interspeech 2020 (pp. 5036–5040).
Gunawardana & Shani, 2015: Gunawardana, A., & Shani, G. (2015). Evaluating recommender systems. Recommender Systems Handbook (pp. 265–308). Springer.
Guo et al., 2017: Guo, H., Tang, R., Ye, Y., Li, Z., & He, X. (2017). DeepFM: a factorization-machine based neural network for CTR prediction. Proceedings of the 26th International Joint Conference on Artificial Intelligence (pp. 1725–1731).
Guyon et al., 2008: Guyon, I., Gunn, S., Nikravesh, M., & Zadeh, L. A. (2008). Feature Extraction: Foundations and Applications. Springer.
Hadjis et al., 2016: Hadjis, S., Zhang, C., Mitliagkas, I., Iter, D., & Ré, C. (2016). Omnivore: an optimizer for multi-device deep learning on CPUs and GPUs. ArXiv:1606.04487.
Hartley & Zisserman, 2000: Hartley, R., & Zisserman, A. (2000). Multiple View Geometry in Computer Vision. Cambridge University Press.
Hartley & Kahl, 2009: Hartley, R. I., & Kahl, F. (2009). Global optimization through rotation space search. International Journal of Computer Vision, 82(1), 64–79.
He et al., 2022: He, K., Chen, X., Xie, S., Li, Y., Dollár, P., & Girshick, R. (2022). Masked autoencoders are scalable vision learners. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 16000–16009).
He et al., 2017a: He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969).
He et al., 2015: He, K., Zhang, X., Ren, S., & Sun, J. (2015). Delving deep into rectifiers: surpassing human-level performance on ImageNet classification. Proceedings of the IEEE International Conference on Computer Vision (pp. 1026–1034).
He et al., 2016a: He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778).
He et al., 2016b: He, K., Zhang, X., Ren, S., & Sun, J. (2016). Identity mappings in deep residual networks. European Conference on Computer Vision (pp. 630–645).
He & Chua, 2017: He, X., & Chua, T.-S. (2017). Neural factorization machines for sparse predictive analytics. Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 355–364).
He et al., 2017b: He, X., Liao, L., Zhang, H., Nie, L., Hu, X., & Chua, T.-S. (2017). Neural collaborative filtering. Proceedings of the 26th International Conference on World Wide Web (pp. 173–182).
Hebb, 1949: Hebb, D. O. (1949). The Organization of Behavior. Wiley.
Hendrycks & Gimpel, 2016: Hendrycks, D., & Gimpel, K. (2016). Gaussian error linear units (GELUs). ArXiv:1606.08415.
Hennessy & Patterson, 2011: Hennessy, J. L., & Patterson, D. A. (2011). Computer Architecture: A Quantitative Approach. Elsevier.
Herlocker et al., 1999: Herlocker, J. L., Konstan, J. A., Borchers, A., & Riedl, J. (1999). An algorithmic framework for performing collaborative filtering. 22nd Annual International ACM Conference on Research and Development in Information Retrieval, SIGIR 1999 (pp. 230–237).
Hidasi et al., 2015: Hidasi, B., Karatzoglou, A., Baltrunas, L., & Tikk, D. (2015). Session-based recommendations with recurrent neural networks. ArXiv:1511.06939.
Ho et al., 2020: Ho, J., Jain, A., & Abbeel, P. (2020). Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33, 6840–6851.
Hochreiter et al., 2001: Hochreiter, S., Bengio, Y., Frasconi, P., & Schmidhuber, J. (2001). Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press.
Hochreiter & Schmidhuber, 1997: Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hoffmann et al., 2022: Hoffmann, J., Borgeaud, S., Mensch, A., Buchatskaya, E., Cai, T., Rutherford, E., et al. (2022). Training compute-optimal large language models. ArXiv:2203.15556.
Howard et al., 2019: Howard, A., Sandler, M., Chu, G., Chen, L.-C., Chen, B., Tan, M., … Adam, H. (2019). Searching for MobileNetV3. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1314–1324).
Hoyer et al., 2009: Hoyer, P. O., Janzing, D., Mooij, J. M., Peters, J., & Schölkopf, B. (2009). Nonlinear causal discovery with additive noise models. Advances in Neural Information Processing Systems (pp. 689–696).
Hu et al., 2018: Hu, J., Shen, L., & Sun, G. (2018). Squeeze-and-excitation networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 7132–7141).
Hu et al., 2008: Hu, Y., Koren, Y., & Volinsky, C. (2008). Collaborative filtering for implicit feedback datasets. 2008 8th IEEE International Conference on Data Mining (pp. 263–272).
Hu et al., 2022: Hu, Z., Lee, R. K.-W., Aggarwal, C. C., & Zhang, A. (2022). Text style transfer: a review and experimental evaluation. SIGKDD Explor. Newsl., 24(1). URL: https://doi.org/10.1145/3544903.3544906
Huang et al., 2018: Huang, C.-Z. A., Vaswani, A., Uszkoreit, J., Simon, I., Hawthorne, C., Shazeer, N., … Eck, D. (2018). Music transformer: generating music with long-term structure. International Conference on Learning Representations.
Huang et al., 2017: Huang, G., Liu, Z., Van Der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700–4708).
Huang et al., 2015: Huang, Z., Xu, W., & Yu, K. (2015). Bidirectional LSTM–CRF models for sequence tagging. ArXiv:1508.01991.
Hubel & Wiesel, 1959: Hubel, D. H., & Wiesel, T. N. (1959). Receptive fields of single neurones in the cat's striate cortex. Journal of Physiology, 148(3), 574–591.
Hubel & Wiesel, 1962: Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology, 160(1), 106–154.
Hubel & Wiesel, 1968: Hubel, D. H., & Wiesel, T. N. (1968). Receptive fields and functional architecture of monkey striate cortex. Journal of Physiology, 195(1), 215–243.
Hutter et al., 2011: Hutter, F., Hoos, H., & Leyton-Brown, K. (2011). Sequential model-based optimization for general algorithm configuration. Proceedings of the Fifth International Conference on Learning and Intelligent Optimization (LION'11).
Hutter et al., 2019: Hutter, F., Kotthoff, L., & Vanschoren, J. (Eds.) (2019). Automated Machine Learning: Methods, Systems, Challenges. Springer.
Ioffe, 2017: Ioffe, S. (2017). Batch renormalization: towards reducing minibatch dependence in batch-normalized models. Advances in Neural Information Processing Systems (pp. 1945–1953).
Ioffe & Szegedy, 2015: Ioffe, S., & Szegedy, C. (2015). Batch normalization: accelerating deep network training by reducing internal covariate shift. ArXiv:1502.03167.
Izmailov et al., 2018: Izmailov, P., Podoprikhin, D., Garipov, T., Vetrov, D., & Wilson, A. G. (2018). Averaging weights leads to wider optima and better generalization. ArXiv:1803.05407.
Jacot et al., 2018: Jacot, A., Gabriel, F., & Hongler, C. (2018). Neural tangent kernel: convergence and generalization in neural networks. Advances in Neural Information Processing Systems.
Jaeger, 2002: Jaeger, H. (2002). Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the “echo state network” approach. GMD-Forschungszentrum Informationstechnik Bonn.
Jamieson & Talwalkar, 2016: Jamieson, K., & Talwalkar, A. (2016). Non-stochastic best arm identification and hyperparameter optimization. Proceedings of the 17th International Conference on Artificial Intelligence and Statistics.
Jenatton et al., 2017: Jenatton, R., Archambeau, C., González, J., & Seeger, M. (2017). Bayesian optimization with tree-structured dependencies. Proceedings of the 34th International Conference on Machine Learning (ICML'17).
Jia et al., 2018: Jia, X., Song, S., He, W., Wang, Y., Rong, H., Zhou, F., et al. (2018). Highly scalable deep learning training system with mixed-precision: training ImageNet in four minutes. ArXiv:1807.11205.
Jia et al., 2014: Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., … Darrell, T. (2014). Caffe: convolutional architecture for fast feature embedding. Proceedings of the 22nd ACM International Conference on Multimedia (pp. 675–678).
Joshi et al., 2020: Joshi, M., Chen, D., Liu, Y., Weld, D. S., Zettlemoyer, L., & Levy, O. (2020). SpanBERT: improving pre-training by representing and predicting spans. Transactions of the Association for Computational Linguistics, 8, 64–77.
Jouppi et al., 2017: Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., et al. (2017). In-datacenter performance analysis of a tensor processing unit. 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA) (pp. 1–12).
Kalchbrenner et al., 2014: Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A convolutional neural network for modelling sentences. ArXiv:1404.2188.
Kalman & Kwasny, 1992: Kalman, B. L., & Kwasny, S. C. (1992). Why tanh: choosing a sigmoidal function. Proceedings of the International Joint Conference on Neural Networks (IJCNN) (pp. 578–581).
Kaplan et al., 2020: Kaplan, J., McCandlish, S., Henighan, T., Brown, T. B., Chess, B., Child, R., … Amodei, D. (2020). Scaling laws for neural language models. ArXiv:2001.08361.
Karnin et al., 2013: Karnin, Z., Koren, T., & Somekh, O. (2013). Almost optimal exploration in multi-armed bandits. Proceedings of the 30th International Conference on Machine Learning (ICML'13).
Karras et al., 2017: Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of GANs for improved quality, stability, and variation. ArXiv:1710.10196.
Kim et al., 2017: Kim, J., El-Khamy, M., & Lee, J. (2017). Residual LSTM: design of a deep recurrent architecture for distant speech recognition. ArXiv:1701.03360.
Kim, 2014: Kim, Y. (2014). Convolutional neural networks for sentence classification. ArXiv:1408.5882.
Kimeldorf & Wahba, 1971: Kimeldorf, G. S., & Wahba, G. (1971). Some results on Tchebycheffian spline functions. J. Math. Anal. Appl., 33, 82–95.
Kingma & Ba, 2014: Kingma, D. P., & Ba, J. (2014). Adam: a method for stochastic optimization. ArXiv:1412.6980.
Kingma & Welling, 2014: Kingma, D. P., & Welling, M. (2014). Auto-encoding variational Bayes. International Conference on Learning Representations (ICLR).
Kipf & Welling, 2016: Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. ArXiv:1609.02907.
Kojima et al., 2022: Kojima, T., Gu, S. S., Reid, M., Matsuo, Y., & Iwasawa, Y. (2022). Large language models are zero-shot reasoners. ArXiv:2205.11916.
Koller & Friedman, 2009: Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press.
Kolmogorov, 1933: Kolmogorov, A. (1933). Sulla determinazione empirica di una legge di distribuzione. Inst. Ital. Attuari, Giorn., 4, 83–91.
Kolter, 2008: Kolter, Z. (2008). Linear algebra review and reference. Available online: http://cs229.stanford.edu/section/cs229-linalg.pdf
Koren et al., 2009: Koren, Y., Bell, R., & Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer (pp. 30–37).
Krizhevsky et al., 2012: Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems (pp. 1097–1105).
Kung, 1988: Kung, S. Y. (1988). VLSI Array Processors. Prentice Hall.
Kuzovkin et al., 2018: Kuzovkin, I., Vicente, R., Petton, M., Lachaux, J.-P., Baciu, M., Kahane, P., … Aru, J. (2018). Activations of deep convolutional neural networks are aligned with gamma band activity of human visual cortex. Communications Biology, 1(1), 1–12.
Lan et al., 2019: Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., & Soricut, R. (2019). ALBERT: a lite BERT for self-supervised learning of language representations. ArXiv:1909.11942.
Lavin & Gray, 2016: Lavin, A., & Gray, S. (2016). Fast algorithms for convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4013–4021).
Le, 2013: Le, Q. V. (2013). Building high-level features using large scale unsupervised learning. Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 8595–8598).
LeCun et al., 1995a: LeCun, Y., Bengio, Y., et al. (1995). Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks (p. 3361). MIT Press.
LeCun et al., 1989: LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.
LeCun et al., 1998a: LeCun, Y., Bottou, L., Orr, G., & Muller, K.-R. (1998). Efficient backprop. Neural Networks: Tricks of the Trade. Springer.
LeCun et al., 1998b: LeCun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
LeCun et al., 1995b: LeCun, Y., Jackel, L., Bottou, L., Brunot, A., Cortes, C., Denker, J., et al. (1995). Comparison of learning algorithms for handwritten digit recognition. International Conference on Artificial Neural Networks (pp. 53–60).
Legendre, 1805: Legendre, A. M. (1805). Mémoire sur les Opérations Trigonométriques: dont les Résultats Dépendent de la Figure de la Terre. F. Didot.
Lewis et al., 2019: Lewis, M., Liu, Y., Goyal, N., Ghazvininejad, M., Mohamed, A., Levy, O., … Zettlemoyer, L. (2019). BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. ArXiv:1910.13461.
Lewkowycz et al., 2022: Lewkowycz, A., Andreassen, A., Dohan, D., Dyer, E., Michalewski, H., Ramasesh, V., et al. (2022). Solving quantitative reasoning problems with language models. ArXiv:2206.14858.
Li et al., 2018: Li, L., Jamieson, K., Rostamizadeh, A., Gonina, K., Hardt, M., Recht, B., & Talwalkar, A. (2018). Massively parallel hyperparameter tuning. ArXiv:1810.05934.
Li, 2017: Li, M. (2017). Scaling Distributed Machine Learning with System and Algorithm Co-design (Doctoral dissertation). CMU.
Li et al., 2014a: Li, M., Andersen, D. G., Park, J. W., Smola, A. J., Ahmed, A., Josifovski, V., … Su, B.-Y. (2014). Scaling distributed machine learning with the parameter server. 11th Symposium on Operating Systems Design and Implementation (OSDI 14) (pp. 583–598).
Li et al., 2014b: Li, M., Zhang, T., Chen, Y., & Smola, A. J. (2014). Efficient mini-batch training for stochastic optimization. Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 661–670).
Liaw et al., 2018: Liaw, R.、Liang, E.、Nishihara, R.、Moritz, P.、Gonzalez, J. 和 Stoica, I. (2018)。Tune: a research platform for distributed model selection and training。ArXiv:1807.05118。
Lin et al., 2013: Lin, M.、Chen, Q. 和 Yan, S. (2013)。Network in network。ArXiv:1312.4400。
Lin et al., 2017a: Lin, T.-Y.、Goyal, P.、Girshick, R.、He, K. 和 Dollár, P. (2017)。Focal loss for dense object detection。IEEE 国际计算机视觉会议论文集 (第 2980–2988 页)。
Lin et al., 2010: Lin, Y.、Lv, F.、Zhu, S.、Yang, M.、Cour, T.、Yu, K. … 等 (2010)。ImageNet classification: fast descriptor coding and large-scale SVM training。大规模视觉识别挑战赛。
Lin et al., 2017b: Lin, Z., Feng, M., Santos, C. N. d., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. (2017). A structured self-attentive sentence embedding. ArXiv:1703.03130.
Lipton et al., 2015: Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A critical review of recurrent neural networks for sequence learning. ArXiv:1506.00019.
Lipton et al., 2016: Lipton, Z. C., Kale, D. C., Elkan, C., & Wetzel, R. (2016). Learning to diagnose with LSTM recurrent neural networks. International Conference on Learning Representations (ICLR).
Lipton & Steinhardt, 2018: Lipton, Z. C., & Steinhardt, J. (2018). Troubling trends in machine learning scholarship. Communications of the ACM, 17, 45–77.
Liu & Nocedal, 1989: Liu, D. C., & Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1), 503–528.
Liu et al., 2018: Liu, H., Simonyan, K., & Yang, Y. (2018). DARTS: differentiable architecture search. ArXiv:1806.09055.
Liu et al., 2016: Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.-Y., & Berg, A. C. (2016). SSD: single shot multibox detector. European Conference on Computer Vision (pp. 21–37).
Liu et al., 2019: Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., … Stoyanov, V. (2019). RoBERTa: a robustly optimized BERT pretraining approach. ArXiv:1907.11692.
Liu et al., 2021: Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., … Guo, B. (2021). Swin Transformer: hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 10012–10022).
Liu et al., 2022: Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T., & Xie, S. (2022). A ConvNet for the 2020s. ArXiv:2201.03545.
Long et al., 2015: Long, J., Shelhamer, E., & Darrell, T. (2015). Fully convolutional networks for semantic segmentation. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 3431–3440).
Loshchilov & Hutter, 2016: Loshchilov, I., & Hutter, F. (2016). SGDR: stochastic gradient descent with warm restarts. ArXiv:1608.03983.
Lowe, 2004: Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2), 91–110.
Luo et al., 2018: Luo, P., Wang, X., Shao, W., & Peng, Z. (2018). Towards understanding regularization in batch normalization. ArXiv:1809.00846.
Maas et al., 2011: Maas, A. L., Daly, R. E., Pham, P. T., Huang, D., Ng, A. Y., & Potts, C. (2011). Learning word vectors for sentiment analysis. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 142–150).
Mack & Silverman, 1982: Mack, Y.-P., & Silverman, B. W. (1982). Weak and strong uniform consistency of kernel regression estimates. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 61(3), 405–415.
MacKay, 2003: MacKay, D. J. (2003). Information Theory, Inference and Learning Algorithms. Cambridge University Press.
Maclaurin et al., 2015: Maclaurin, D., Duvenaud, D., & Adams, R. (2015). Gradient-based hyperparameter optimization through reversible learning. Proceedings of the 32nd International Conference on Machine Learning (ICML'15).
Mangasarian, 1965: Mangasarian, O. L. (1965). Linear and nonlinear separation of patterns by linear programming. Oper. Res., 13, 444–452.
Mangram, 2013: Mangram, M. E. (2013). A simplified perspective of the Markowitz portfolio theory. Global Journal of Business Research, 7(1), 59–70.
Matthews et al., 2018: Matthews, A. G. d. G., Rowland, M., Hron, J., Turner, R. E., & Ghahramani, Z. (2018). Gaussian process behaviour in wide deep neural networks. ArXiv:1804.11271.
McCann et al., 2017: McCann, B., Bradbury, J., Xiong, C., & Socher, R. (2017). Learned in translation: Contextualized word vectors. Advances in Neural Information Processing Systems (pp. 6294–6305).
McCulloch & Pitts, 1943: McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115–133.
McMahan et al., 2013: McMahan, H. B., Holt, G., Sculley, D., Young, M., Ebner, D., Grady, J., et al. (2013). Ad click prediction: a view from the trenches. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 1222–1230).
Mead, 1980: Mead, C. (1980). Introduction to VLSI systems. IEE Proceedings I-Solid-State and Electron Devices, 128(1), 18.
Merity et al., 2016: Merity, S., Xiong, C., Bradbury, J., & Socher, R. (2016). Pointer sentinel mixture models. ArXiv:1609.07843.
Micchelli, 1984: Micchelli, C. A. (1984). Interpolation of scattered data: distance matrices and conditionally positive definite functions. Approximation Theory and Spline Functions (pp. 143–145). Springer.
Mikolov et al., 2013a: Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. ArXiv:1301.3781.
Mikolov et al., 2013b: Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (pp. 3111–3119).
Miller, 1995: Miller, G. A. (1995). WordNet: a lexical database for English. Communications of the ACM, 38(11), 39–41.
Mirhoseini et al., 2017: Mirhoseini, A., Pham, H., Le, Q. V., Steiner, B., Larsen, R., Zhou, Y., … Dean, J. (2017). Device placement optimization with reinforcement learning. 34th International Conference on Machine Learning (pp. 2430–2439).
Mnih et al., 2014: Mnih, V., Heess, N., Graves, A., et al. (2014). Recurrent models of visual attention. Advances in Neural Information Processing Systems (pp. 2204–2212).
Mnih et al., 2013: Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D., & Riedmiller, M. (2013). Playing Atari with deep reinforcement learning. ArXiv:1312.5602.
Mnih et al., 2015: Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529–533.
Moon et al., 2010: Moon, T., Smola, A., Chang, Y., & Zheng, Z. (2010). IntervalRank: isotonic regression with listwise and pairwise constraints. Proceedings of the 3rd ACM International Conference on Web Search and Data Mining (pp. 151–160).
Morey et al., 2016: Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E.-J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic Bulletin & Review, 23(1), 103–123.
Morozov, 1984: Morozov, V. A. (1984). Methods for Solving Incorrectly Posed Problems. Springer.
Nadaraya, 1964: Nadaraya, E. A. (1964). On estimating regression. Theory of Probability & its Applications, 9(1), 141–142.
Nair & Hinton, 2010: Nair, V., & Hinton, G. E. (2010). Rectified linear units improve restricted Boltzmann machines. ICML.
Nakkiran et al., 2021: Nakkiran, P., Kaplun, G., Bansal, Y., Yang, T., Barak, B., & Sutskever, I. (2021). Deep double descent: where bigger models and more data hurt. Journal of Statistical Mechanics: Theory and Experiment, 2021(12), 124003.
Naor & Reingold, 1999: Naor, M., & Reingold, O. (1999). On the construction of pseudorandom permutations: Luby–Rackoff revisited. Journal of Cryptology, 12(1), 29–66.
Neal, 1996: Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer.
Nesterov, 2018: Nesterov, Y. (2018). Lectures on Convex Optimization. Springer.
Nesterov & Vial, 2000: Nesterov, Y., & Vial, J.-P. (2000). Confidence level solutions for stochastic programming. Automatica, 44(6), 1559–1568.
Neyman, 1937: Neyman, J. (1937). Outline of a theory of statistical estimation based on the classical theory of probability. Philosophical Transactions of the Royal Society of London. Series A, Mathematical and Physical Sciences, 236(767), 333–380.
Norelli et al., 2022: Norelli, A., Fumero, M., Maiorca, V., Moschella, L., Rodolà, E., & Locatello, F. (2022). ASIF: coupled data turns unimodal models to multimodal without training. ArXiv:2210.01738.
Novak et al., 2018: Novak, R., Xiao, L., Lee, J., Bahri, Y., Yang, G., Hron, J., … Sohl-Dickstein, J. (2018). Bayesian deep convolutional networks with many channels are Gaussian processes. ArXiv:1810.05148.
Novikoff, 1962: Novikoff, A. B. J. (1962). On convergence proofs on perceptrons. Proceedings of the Symposium on the Mathematical Theory of Automata (pp. 615–622).
Olshausen & Field, 1996: Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381(6583), 607–609.
Ong et al., 2005: Ong, C. S., Smola, A., & Williamson, R. (2005). Learning the kernel with hyperkernels. Journal of Machine Learning Research, 6, 1043–1071.
OpenAI, 2023: OpenAI. (2023). GPT-4 Technical Report. ArXiv:2303.08774.
Ouyang et al., 2022: Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C. L., Mishkin, P., et al. (2022). Training language models to follow instructions with human feedback. ArXiv:2203.02155.
Papineni et al., 2002: Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: a method for automatic evaluation of machine translation. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (pp. 311–318).
Parikh et al., 2016: Parikh, A. P., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. ArXiv:1606.01933.
Park et al., 2019: Park, T., Liu, M.-Y., Wang, T.-C., & Zhu, J.-Y. (2019). Semantic image synthesis with spatially-adaptive normalization. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2337–2346).
Parzen, 1957: Parzen, E. (1957). On consistent estimates of the spectrum of a stationary time series. Annals of Mathematical Statistics, 28, 329–348.
Paszke et al., 2019: Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., et al. (2019). PyTorch: an imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 8026–8037.
Paulus et al., 2017: Paulus, R., Xiong, C., & Socher, R. (2017). A deep reinforced model for abstractive summarization. ArXiv:1705.04304.
Penedo et al., 2023: Penedo, G., Malartic, Q., Hesslow, D., Cojocaru, R., Cappelli, A., Alobeidli, H., … Launay, J. (2023). The RefinedWeb dataset for Falcon LLM: outperforming curated corpora with web data, and web data only. ArXiv:2306.01116.
Pennington et al., 2017: Pennington, J., Schoenholz, S., & Ganguli, S. (2017). Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice. Advances in Neural Information Processing Systems (pp. 4785–4795).
Pennington et al., 2014: Pennington, J., Socher, R., & Manning, C. (2014). GloVe: global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1532–1543).
Peters et al., 2017a: Peters, J., Janzing, D., & Schölkopf, B. (2017). Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press.
Peters et al., 2017b: Peters, M., Ammar, W., Bhagavatula, C., & Power, R. (2017). Semi-supervised sequence tagging with bidirectional language models. Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, Volume 1 (pp. 1756–1765).
Peters et al., 2018: Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (pp. 2227–2237).
Petersen & Pedersen, 2008: Petersen, K. B., & Pedersen, M. S. (2008). The Matrix Cookbook. Technical University of Denmark.
Pleiss et al., 2017: Pleiss, G., Chen, D., Huang, G., Li, T., Van Der Maaten, L., & Weinberger, K. Q. (2017). Memory-efficient implementation of DenseNets. ArXiv:1707.06990.
Polyak, 1964: Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5), 1–17.
Prakash et al., 2016: Prakash, A., Hasan, S. A., Lee, K., Datla, V., Qadir, A., Liu, J., & Farri, O. (2016). Neural paraphrase generation with stacked residual LSTM networks. ArXiv:1610.03098.
Qin et al., 2023: Qin, C., Zhang, A., Zhang, Z., Chen, J., Yasunaga, M., & Yang, D. (2023). Is ChatGPT a general-purpose natural language processing task solver? ArXiv:2302.06476.
Quadrana et al., 2018: Quadrana, M., Cremonesi, P., & Jannach, D. (2018). Sequence-aware recommender systems. ACM Computing Surveys, 51(4), 66.
Quinlan, 1993: Quinlan, J. R. (1993). C4.5: Programs for Machine Learning. Elsevier.
Rabiner & Juang, 1993: Rabiner, L., & Juang, B.-H. (1993). Fundamentals of Speech Recognition. Prentice-Hall.
Radford et al., 2021: Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., et al. (2021). Learning transferable visual models from natural language supervision. International Conference on Machine Learning (pp. 8748–8763).
Radford et al., 2015: Radford, A., Metz, L., & Chintala, S. (2015). Unsupervised representation learning with deep convolutional generative adversarial networks. ArXiv:1511.06434.
Radford et al., 2018: Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI.
Radford et al., 2019: Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1(8), 9.
Radosavovic et al., 2019: Radosavovic, I., Johnson, J., Xie, S., Lo, W.-Y., & Dollár, P. (2019). On network design spaces for visual recognition. Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 1882–1890).
Radosavovic et al., 2020: Radosavovic, I., Kosaraju, R. P., Girshick, R., He, K., & Dollár, P. (2020). Designing network design spaces. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 10428–10436).
Rae et al., 2021: Rae, J. W., Borgeaud, S., Cai, T., Millican, K., Hoffmann, J., Song, F., et al. (2021). Scaling language models: methods, analysis & insights from training Gopher. ArXiv:2112.11446.
Raffel et al., 2020: Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., … Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research, 21, 1–67.
Rajpurkar et al., 2016: Rajpurkar, P., Zhang, J., Lopyrev, K., & Liang, P. (2016). SQuAD: 100,000+ questions for machine comprehension of text. ArXiv:1606.05250.
Ramachandran et al., 2019: Ramachandran, P., Parmar, N., Vaswani, A., Bello, I., Levskaya, A., & Shlens, J. (2019). Stand-alone self-attention in vision models. Advances in Neural Information Processing Systems, 32.
Ramachandran et al., 2017: Ramachandran, P., Zoph, B., & Le, Q. V. (2017). Searching for activation functions. ArXiv:1710.05941.
Ramesh et al., 2022: Ramesh, A., Dhariwal, P., Nichol, A., Chu, C., & Chen, M. (2022). Hierarchical text-conditional image generation with CLIP latents. ArXiv:2204.06125.
Cajal & Azoulay, 1894: Ramón y Cajal, Santiago, & Azoulay, L. (1894). Les Nouvelles Idées sur la Structure du Système Nerveux chez l'Homme et chez les Vertébrés. Paris, C. Reinwald & Cie.
Ranzato et al., 2007: Ranzato, M.-A., Boureau, Y.-L., Chopra, S., & LeCun, Y. (2007). A unified energy-based framework for unsupervised learning. Artificial Intelligence and Statistics (pp. 371–379).
Rasmussen & Williams, 2006: Rasmussen, C. E., & Williams, C. K. (2006). Gaussian Processes for Machine Learning. MIT Press.
Reddi et al., 2019: Reddi, S. J., Kale, S., & Kumar, S. (2019). On the convergence of Adam and beyond. ArXiv:1904.09237.
Redmon et al., 2016: Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: unified, real-time object detection. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 779–788).
Redmon & Farhadi, 2018: Redmon, J., & Farhadi, A. (2018). YOLOv3: an incremental improvement. ArXiv:1804.02767.
Reed & DeFreitas, 2015: Reed, S., & De Freitas, N. (2015). Neural programmer-interpreters. ArXiv:1511.06279.
Reed et al., 2022: Reed, S., Zolna, K., Parisotto, E., Colmenarejo, S. G., Novikov, A., Barth-Maron, G., et al. (2022). A generalist agent. ArXiv:2205.06175.
Ren et al., 2015: Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (pp. 91–99).
Rendle, 2010: Rendle, S. (2010). Factorization machines. 2010 IEEE International Conference on Data Mining (pp. 995–1000).
Rendle et al., 2009: Rendle, S., Freudenthaler, C., Gantner, Z., & Schmidt-Thieme, L. (2009). BPR: Bayesian personalized ranking from implicit feedback. Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (pp. 452–461).
Revels et al., 2016: Revels, J., Lubin, M., & Papamarkou, T. (2016). Forward-mode automatic differentiation in Julia. ArXiv:1607.07892.
Rezende et al., 2014: Rezende, D. J., Mohamed, S., & Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. International Conference on Machine Learning (pp. 1278–1286).
Riesenhuber & Poggio, 1999: Riesenhuber, M., & Poggio, T. (1999). Hierarchical models of object recognition in cortex. Nature Neuroscience, 2(11), 1019–1025.
Rockafellar, 1970: Rockafellar, R. T. (1970). Convex Analysis. Princeton University Press.
Rolnick et al., 2017: Rolnick, D., Veit, A., Belongie, S., & Shavit, N. (2017). Deep learning is robust to massive label noise. ArXiv:1705.10694.
Rudin, 1973: Rudin, W. (1973). Functional Analysis. McGraw-Hill.
Rumelhart et al., 1988: Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1988). Learning representations by back-propagating errors. Cognitive Modeling, 5(3), 1.
Russakovsky et al., 2013: Russakovsky, O., Deng, J., Huang, Z., Berg, A. C., & Fei-Fei, L. (2013). Detecting avocados to zucchinis: what have we done, and where are we going? International Conference on Computer Vision (ICCV).
Russakovsky et al., 2015: Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 115(3), 211–252.
Russell & Norvig, 2016: Russell, S. J., & Norvig, P. (2016). Artificial Intelligence: A Modern Approach. Pearson Education Limited.
Saharia et al., 2022: Saharia, C., Chan, W., Saxena, S., Li, L., Whang, J., Denton, E., et al. (2022). Photorealistic text-to-image diffusion models with deep language understanding. ArXiv:2205.11487.
Salinas et al., 2022: Salinas, D., Seeger, M., Klein, A., Perrone, V., Wistuba, M., & Archambeau, C. (2022). Syne Tune: a library for large scale hyperparameter tuning and reproducible research. First Conference on Automated Machine Learning.
Sanh et al., 2019: Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. ArXiv:1910.01108.
Sanh et al., 2021: Sanh, V., Webson, A., Raffel, C., Bach, S. H., Sutawika, L., Alyafeai, Z., et al. (2021). Multitask prompted training enables zero-shot task generalization. ArXiv:2110.08207.
Santurkar et al., 2018: Santurkar, S., Tsipras, D., Ilyas, A., & Madry, A. (2018). How does batch normalization help optimization? Advances in Neural Information Processing Systems (pp. 2483–2493).
Sarwar et al., 2001: Sarwar, B. M., Karypis, G., Konstan, J. A., & Riedl, J. (2001). Item-based collaborative filtering recommendation algorithms. Proceedings of 10th International Conference on World Wide Web (pp. 285–295).
Scao et al., 2022: Scao, T. L., Fan, A., Akiki, C., Pavlick, E., Ilić, S., Hesslow, D., et al. (2022). BLOOM: a 176B-parameter open-access multilingual language model. ArXiv:2211.05100.
Schein et al., 2002: Schein, A. I., Popescul, A., Ungar, L. H., & Pennock, D. M. (2002). Methods and metrics for cold-start recommendations. Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 253–260).
Schuhmann et al., 2022: Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., et al. (2022). LAION-5B: an open large-scale dataset for training next generation image-text models. ArXiv:2210.08402.
Schuster & Paliwal, 1997: Schuster, M., & Paliwal, K. K. (1997). Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11), 2673–2681.
Scholkopf et al., 2001: Schölkopf, B., Herbrich, R., & Smola, A. J. (2001). Helmbold, D. P., & Williamson, B. (Eds.). A generalized representer theorem. Proceedings of the Annual Conference on Computational Learning Theory (pp. 416–426). Springer-Verlag.
Scholkopf et al., 1996: Schölkopf, B., Burges, C., & Vapnik, V. (1996). Incorporating invariances in support vector learning machines. International Conference on Artificial Neural Networks (pp. 47–52).
Scholkopf & Smola, 2002: Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
Sedhain et al., 2015: Sedhain, S., Menon, A. K., Sanner, S., & Xie, L. (2015). AutoRec: autoencoders meet collaborative filtering. Proceedings of the 24th International Conference on World Wide Web (pp. 111–112).
Sennrich et al., 2015: Sennrich, R., Haddow, B., & Birch, A. (2015). Neural machine translation of rare words with subword units. ArXiv:1508.07909.
Sergeev & DelBalso, 2018: Sergeev, A., & Del Balso, M. (2018). Horovod: fast and easy distributed deep learning in TensorFlow. ArXiv:1802.05799.
Shannon, 1948: Shannon, C. E. (1948). A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
Shao et al., 2020: Shao, H., Yao, S., Sun, D., Zhang, A., Liu, S., Liu, D., et al. (2020). ControlVAE: controllable variational autoencoder. Proceedings of the 37th International Conference on Machine Learning.
Shaw et al., 2018: Shaw, P., Uszkoreit, J., & Vaswani, A. (2018). Self-attention with relative position representations. ArXiv:1803.02155.
Shoeybi et al., 2019: Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., & Catanzaro, B. (2019). Megatron-LM: training multi-billion parameter language models using model parallelism. ArXiv:1909.08053.
Silver et al., 2016: Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., et al. (2016). Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587), 484.
Silverman, 1986: Silverman, B. W. (1986). Density Estimation for Statistical and Data Analysis. Chapman and Hall.
Simard et al., 1998: Simard, P. Y., LeCun, Y. A., Denker, J. S., & Victorri, B. (1998). Transformation invariance in pattern recognition – tangent distance and tangent propagation. Neural Networks: Tricks of the Trade (pp. 239–274). Springer.
Simonyan & Zisserman, 2014: Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. ArXiv:1409.1556.
Sindhwani et al., 2015: Sindhwani, V., Sainath, T. N., & Kumar, S. (2015). Structured transforms for small-footprint deep learning. ArXiv:1510.01722.
Sivic & Zisserman, 2003: Sivic, J., & Zisserman, A. (2003). Video Google: a text retrieval approach to object matching in videos. Proceedings of the IEEE International Conference on Computer Vision (pp. 1470–1470).
Smith et al., 2022: Smith, S., Patwary, M., Norick, B., LeGresley, P., Rajbhandari, S., Casper, J., et al. (2022). Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model. ArXiv:2201.11990.
Smola & Narayanamurthy, 2010: Smola, A., & Narayanamurthy, S. (2010). An architecture for parallel topic models. Proceedings of the VLDB Endowment, 3(1-2), 703–710.
Snoek et al., 2012: Snoek, J., Larochelle, H., & Adams, R. (2012). Practical Bayesian optimization of machine learning algorithms. Advances in Neural Information Processing Systems 25 (pp. 2951–2959).
Sohl-Dickstein et al., 2015: Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., & Ganguli, S. (2015). Deep unsupervised learning using nonequilibrium thermodynamics. International Conference on Machine Learning (pp. 2256–2265).
Song & Ermon, 2019: Song, Y., & Ermon, S. (2019). Generative modeling by estimating gradients of the data distribution. Advances in Neural Information Processing Systems, 32.
Song et al., 2021: Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations.
Speelpenning, 1980: Speelpenning, B. (1980). Compiling fast partial derivatives of functions given by algorithms (Doctoral dissertation). University of Illinois at Urbana-Champaign.
Srivastava et al., 2022: Srivastava, A., Rastogi, A., Rao, A., Shoeb, A. A. M., Abid, A., Fisch, A., et al. (2022). Beyond the imitation game: quantifying and extrapolating the capabilities of language models. ArXiv:2206.04615.
Srivastava et al., 2014: Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15(1), 1929–1958.
Srivastava et al., 2015: Srivastava, R. K., Greff, K., & Schmidhuber, J. (2015). Highway networks. ArXiv:1505.00387.
Strang, 1993: Strang, G. (1993). Introduction to Linear Algebra. Wellesley–Cambridge Press.
Su & Khoshgoftaar, 2009: Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009.
Sukhbaatar et al., 2015: Sukhbaatar, S., Weston, J., & Fergus, R. (2015). End-to-end memory networks. Advances in Neural Information Processing Systems (pp. 2440–2448).
Sutskever et al., 2013: Sutskever, I., Martens, J., Dahl, G., & Hinton, G. (2013). On the importance of initialization and momentum in deep learning. International Conference on Machine Learning (pp. 1139–1147).
Sutskever et al., 2014: Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems (pp. 3104–3112).
Szegedy et al., 2017: Szegedy, C., Ioffe, S., Vanhoucke, V., & Alemi, A. A. (2017). Inception-v4, Inception-ResNet and the impact of residual connections on learning. 31st AAAI Conference on Artificial Intelligence.
Szegedy et al., 2015: Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–9).
Szegedy et al., 2016: Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the Inception architecture for computer vision. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2818–2826).
Tallec & Ollivier, 2017: Tallec, C., & Ollivier, Y. (2017). Unbiasing truncated backpropagation through time. ArXiv:1705.08209.
Tan & Le, 2019: Tan, M., & Le, Q. (2019). EfficientNet: rethinking model scaling for convolutional neural networks. International Conference on Machine Learning (pp. 6105–6114).
Tang & Wang, 2018: Tang, J., & Wang, K. (2018). Personalized top-n sequential recommendation via convolutional sequence embedding. Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining (pp. 565–573).
Taskar et al., 2004: Taskar, B., Guestrin, C., & Koller, D. (2004). Max-margin Markov networks. Advances in Neural Information Processing Systems, 16, 25.
Tay et al., 2020: Tay, Y., Dehghani, M., Bahri, D., & Metzler, D. (2020). Efficient transformers: a survey. ArXiv:2009.06732.
Taylor et al., 2022: Taylor, R., Kardas, M., Cucurull, G., Scialom, T., Hartshorn, A., Saravia, E., et al. (2022). Galactica: a large language model for science. ArXiv:2211.09085.
Teye et al., 2018: Teye, M., Azizpour, H., & Smith, K. (2018). Bayesian uncertainty estimation for batch normalized deep networks. ArXiv:1802.06455.
Thomee et al., 2016: Thomee, B., Shamma, D. A., Friedland, G., Elizalde, B., Ni, K., Poland, D., et al. (2016). YFCC100M: the new data in multimedia research. Communications of the ACM, 59(2), 64–73.
Tieleman & Hinton, 2012: Tieleman, T., & Hinton, G. (2012). Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, Lecture 6.5-rmsprop.
Tikhonov & Arsenin, 1977: Tikhonov, A. N., & Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. W.H. Winston.
Tolstikhin et al., 2021: Tolstikhin, I. O., Houlsby, N., Kolesnikov, A., Beyer, L., Zhai, X., Unterthiner, T., et al. (2021). MLP-Mixer: an all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34.
Torralba et al., 2008: Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11), 1958–1970.
Touvron et al., 2021: Touvron, H., Cord, M., Douze, M., Massa, F., Sablayrolles, A., & Jégou, H. (2021). Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning (pp. 10347–10357).
Touvron et al., 2023a: Touvron, H., Lavril, T., Izacard, G., Martinet, X., Lachaux, M.-A., Lacroix, T., et al. (2023a). LLaMA: open and efficient foundation language models. ArXiv:2302.13971.
Touvron et al., 2023b: Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., et al. (2023b). LLaMA 2: open foundation and fine-tuned chat models. ArXiv:2307.09288.
Tsoumakas & Katakis, 2007: Tsoumakas, G., & Katakis, I. (2007). Multi-label classification: an overview. International Journal of Data Warehousing and Mining, 3(3), 1–13.
Turing, 1950: Turing, A. (1950). Computing machinery and intelligence. Mind, 59(236), 433.
Toscher et al., 2009: Töscher, A., Jahrer, M., & Bell, R. M. (2009). The BigChaos solution to the Netflix Grand Prize.
Uijlings et al., 2013: Uijlings, J. R., Van De Sande, K. E., Gevers, T., & Smeulders, A. W. (2013). Selective search for object recognition. International Journal of Computer Vision, 104(2), 154–171.
Vapnik, 1995: Vapnik, V. (1995). The Nature of Statistical Learning Theory. New York: Springer.
Vapnik, 1998: Vapnik, V. (1998). Statistical Learning Theory. New York: John Wiley and Sons.
Vapnik & Chervonenkis, 1964: Vapnik, V., & Chervonenkis, A. (1964). A note on one class of perceptrons. Automation and Remote Control, 25.
Vapnik & Chervonenkis, 1968: Vapnik, V., & Chervonenkis, A. (1968). Uniform convergence of frequencies of occurrence of events to their probabilities. Dokl. Akad. Nauk SSSR, 181, 915–918.
Vapnik & Chervonenkis, 1971: Vapnik, V., & Chervonenkis, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl., 16(2), 264–281.
Vapnik & Chervonenkis, 1981: Vapnik, V., & Chervonenkis, A. (1981). The necessary and sufficient conditions for the uniform convergence of averages to their expected values. Teoriya Veroyatnostei i Ee Primeneniya, 26(3), 543–564.
Vapnik & Chervonenkis, 1991: Vapnik, V., & Chervonenkis, A. (1991). The necessary and sufficient conditions for consistency in the empirical risk minimization method. Pattern Recognition and Image Analysis, 1(3), 283–305.
Vapnik & Chervonenkis, 1974: Vapnik, V. N., & Chervonenkis, A. Y. (1974). Ordered risk minimization. Automation and Remote Control, 35, 1226–1235, 1403–1412.
Vapnik, 1992: Vapnik, V. (1992). Principles of risk minimization for learning theory. Advances in Neural Information Processing Systems (pp. 831–838).
Vapnik et al., 1994: Vapnik, V., Levin, E., & Le Cun, Y. (1994). Measuring the VC-dimension of a learning machine. Neural Computation, 6(5), 851–876.
Vaswani et al., 2017: Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems (pp. 5998–6008).
Wahba, 1990: Wahba, G. (1990). Spline Models for Observational Data. SIAM.
Waibel et al., 1989: Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., & Lang, K. J. (1989). Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37(3), 328–339.
Wang et al., 2022: Wang, H., Zhang, A., Zheng, S., Shi, X., Li, M., & Wang, Z. (2022). Removing batch normalization boosts adversarial training. International Conference on Machine Learning (pp. 23433–23445).
Wang et al., 2018: Wang, L., Li, M., Liberty, E., & Smola, A. J. (2018). Optimal message scheduling for aggregation. Networks, 2(3), 2–3.
Wang et al., 2019: Wang, Q., Li, B., Xiao, T., Zhu, J., Li, C., Wong, D. F., & Chao, L. S. (2019). Learning deep transformer models for machine translation. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 1810–1822).
Wang et al., 2023: Wang, X., Wei, J., Schuurmans, D., Le, Q., Chi, E., & Zhou, D. (2023). Self-consistency improves chain of thought reasoning in language models. International Conference on Learning Representations.
Wang et al., 2016: Wang, Y., Davidson, A., Pan, Y., Wu, Y., Riffel, A., & Owens, J. D. (2016). Gunrock: a high-performance graph processing library on the GPU. ACM SIGPLAN Notices (p. 11).
Warstadt et al., 2019: Warstadt, A., Singh, A., & Bowman, S. R. (2019). Neural network acceptability judgments. Transactions of the Association for Computational Linguistics, 7, 625–641.
Wasserman, 2013: Wasserman, L. (2013). All of Statistics: A Concise Course in Statistical Inference. Springer.
Watkins & Dayan, 1992: Watkins, C. J., & Dayan, P. (1992). Q-learning. Machine Learning, 8(3–4), 279–292.
Watson, 1964: Watson, G. S. (1964). Smooth regression analysis. Sankhyā: The Indian Journal of Statistics, Series A, 第 359–372 页。
Wei 等人, 2021: Wei, J.、Bosma, M.、Zhao, V. Y.、Guu, K.、Yu, A. W.、Lester, B. 等人 (2021). Finetuned language models are zero-shot learners. ArXiv:2109.01652。
Wei 等人, 2022a: Wei, J.、Tay, Y.、Bommasani, R.、Raffel, C.、Zoph, B.、Borgeaud, S. 等人 (2022). Emergent abilities of large language models. ArXiv:2206.07682。
Wei 等人, 2022b: Wei, J.、Wang, X.、Schuurmans, D.、Bosma, M.、Chi, E.、Le, Q. 和 Zhou, D. (2022). Chain of thought prompting elicits reasoning in large language models. ArXiv:2201.11903。
Welling & Teh, 2011: Welling, M., 和 Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. Proceedings of the 28th International Conference on Machine Learning (ICML-11) (第 681–688 页)。
Wengert, 1964: Wengert, R. E. (1964). A simple automatic derivative evaluation program. Communications of the ACM, 7(8), 463–464。
Werbos, 1990: Werbos, P. J. (1990). Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10), 1550–1560。
Wigner, 1958: Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Ann. Math. (pp. 325–327).
Wilson & Izmailov, 2020: Wilson, A. G., and Izmailov, P. (2020). Bayesian deep learning and a probabilistic perspective of generalization. Advances in Neural Information Processing Systems, 33, 4697–4708.
Wistuba et al., 2019: Wistuba, M., Rawat, A., and Pedapati, T. (2019). A survey on neural architecture search. ArXiv:1905.01392 [cs.LG].
Wistuba et al., 2018: Wistuba, M., Schilling, N., and Schmidt-Thieme, L. (2018). Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning, 108, 43–78.
Wolpert & Macready, 1995: Wolpert, D. H., and Macready, W. G. (1995). No free lunch theorems for search. Technical Report SFI-TR-95-02-010, Santa Fe Institute.
Wood et al., 2011: Wood, F., Gasthaus, J., Archambeau, C., James, L., and Teh, Y. W. (2011). The sequence memoizer. Communications of the ACM, 54(2), 91–98.
Wu et al., 2018: Wu, B., Wan, A., Yue, X., Jin, P., Zhao, S., Golmant, N., et al. (2018). Shift: a zero flop, zero parameter alternative to spatial convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 9127–9135).
Wu et al., 2016: Wu, Y., Schuster, M., Chen, Z., Le, Q. V., Norouzi, M., Macherey, W., et al. (2016). Google's neural machine translation system: bridging the gap between human and machine translation. ArXiv:1609.08144.
Xiao et al., 2017: Xiao, H., Rasul, K., and Vollgraf, R. (2017). Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. ArXiv:1708.07747.
Xiao et al., 2018: Xiao, L., Bahri, Y., Sohl-Dickstein, J., Schoenholz, S., and Pennington, J. (2018). Dynamical isometry and a mean field theory of CNNs: how to train 10,000-layer vanilla convolutional neural networks. International Conference on Machine Learning (pp. 5393–5402).
Xie et al., 2017: Xie, S., Girshick, R., Dollár, P., Tu, Z., and He, K. (2017). Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1492–1500).
Xiong et al., 2020: Xiong, R., Yang, Y., He, D., Zheng, K., Zheng, S., Xing, C., et al. (2020). On layer normalization in the transformer architecture. International Conference on Machine Learning (pp. 10524–10533).
Xiong et al., 2018: Xiong, W., Wu, L., Alleva, F., Droppo, J., Huang, X., and Stolcke, A. (2018). The Microsoft 2017 conversational speech recognition system. 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5934–5938).
Yamaguchi et al., 1990: Yamaguchi, K., Sakamoto, K., Akabane, T., and Fujimoto, Y. (1990). A neural network for speaker-independent isolated word recognition. First International Conference on Spoken Language Processing.
Yang et al., 2016: Yang, Z., Hu, Z., Deng, Y., Dyer, C., and Smola, A. (2016). Neural machine translation with recurrent attention modeling. ArXiv:1607.05108.
Yang et al., 2015: Yang, Z., Moczulski, M., Denil, M., De Freitas, N., Smola, A., Song, L., and Wang, Z. (2015). Deep fried convnets. Proceedings of the IEEE International Conference on Computer Vision (pp. 1476–1483).
Ye et al., 2011: Ye, M., Yin, P., Lee, W.-C., and Lee, D.-L. (2011). Exploiting geographical influence for collaborative point-of-interest recommendation. Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 325–334).
You et al., 2017: You, Y., Gitman, I., and Ginsburg, B. (2017). Large batch training of convolutional networks. ArXiv:1708.03888.
Yu et al., 2022: Yu, J., Xu, Y., Koh, J. Y., Luong, T., Baid, G., Wang, Z., et al. (2022). Scaling autoregressive models for content-rich text-to-image generation. ArXiv:2206.10789.
Zaheer et al., 2018: Zaheer, M., Reddi, S., Sachan, D., Kale, S., and Kumar, S. (2018). Adaptive methods for nonconvex optimization. Advances in Neural Information Processing Systems (pp. 9793–9803).
Zeiler, 2012: Zeiler, M. D. (2012). ADADELTA: an adaptive learning rate method. ArXiv:1212.5701.
Zeiler & Fergus, 2013: Zeiler, M. D., and Fergus, R. (2013). Stochastic pooling for regularization of deep convolutional neural networks. ArXiv:1301.3557.
Zhang et al., 2021a: Zhang, A., Tay, Y., Zhang, S., Chan, A., Luu, A. T., Hui, S. C., and Fu, J. (2021). Beyond fully-connected layers with quaternions: parameterization of hypercomplex multiplications with 1/n parameters. International Conference on Learning Representations.
Zhang et al., 2021b: Zhang, C., Bengio, S., Hardt, M., Recht, B., and Vinyals, O. (2021). Understanding deep learning (still) requires rethinking generalization. Communications of the ACM, 64(3), 107–115.
Zhang et al., 2019: Zhang, S., Yao, L., Sun, A., and Tay, Y. (2019). Deep learning based recommender system: a survey and new perspectives. ACM Computing Surveys, 52(1), 5.
Zhang et al., 2022: Zhang, S., Roller, S., Goyal, N., Artetxe, M., Chen, M., Chen, S., et al. (2022). OPT: open pre-trained transformer language models. ArXiv:2205.01068.
Zhang et al., 1988: Zhang, W., Tanida, J., Itoh, K., and Ichioka, Y. (1988). Shift-invariant pattern recognition neural network and its optical architecture. Proceedings of Annual Conference of the Japan Society of Applied Physics.
Zhang et al., 2021c: Zhang, Y., Sun, P., Jiang, Y., Yu, D., Yuan, Z., Luo, P., et al. (2021). ByteTrack: multi-object tracking by associating every detection box. ArXiv:2110.06864.
Zhang et al., 2023a: Zhang, Z., Zhang, A., Li, M., and Smola, A. (2023). Automatic chain of thought prompting in large language models. International Conference on Learning Representations.
Zhang et al., 2023b: Zhang, Z., Zhang, A., Li, M., Zhao, H., Karypis, G., and Smola, A. (2023). Multimodal chain-of-thought reasoning in language models. ArXiv:2302.00923.
Zhao et al., 2019: Zhao, Z.-Q., Zheng, P., Xu, S.-t., and Wu, X. (2019). Object detection with deep learning: a review. IEEE Transactions on Neural Networks and Learning Systems, 30(11), 3212–3232.
Zhou et al., 2023: Zhou, D., Schärli, N., Hou, L., Wei, J., Scales, N., Wang, X., et al. (2023). Least-to-most prompting enables complex reasoning in large language models. International Conference on Learning Representations.
Zhu et al., 2017: Zhu, J.-Y., Park, T., Isola, P., and Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE International Conference on Computer Vision (pp. 2223–2232).
Zhu et al., 2015: Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., and Fidler, S. (2015). Aligning books and movies: towards story-like visual explanations by watching movies and reading books. Proceedings of the IEEE International Conference on Computer Vision (pp. 19–27).
Zoph & Le, 2016: Zoph, B., and Le, Q. V. (2016). Neural architecture search with reinforcement learning. ArXiv:1611.01578.
