Neural architecture search is the task of automatically finding one or more neural network architectures that will yield models with good results (low losses), relatively quickly, for a given dataset. Neural architecture search is currently an emergent area: there is a lot of research going on, there are many different approaches to the task, and there isn't a single best method in general – or even a single best method for a specialized kind of problem, such as object identification in images.
Neural architecture search is an aspect of AutoML, along with feature engineering, transfer learning, and hyperparameter optimization. It is probably the hardest machine learning problem currently under active research; even the evaluation of neural architecture search methods is hard. Neural architecture search research can also be expensive and time-consuming. The metric for the search and training time is often given in GPU-days, sometimes thousands of GPU-days.
The motivation for improving neural architecture search is fairly obvious. Most of the advances in neural network models, for example in image classification and language translation, have required considerable hand-tuning of the neural network architecture, which is time-consuming and error-prone. Even compared to the cost of high-end GPUs on public clouds, the cost of data scientists is very high, and their availability tends to be low.
Neural architecture search evaluation
As multiple authors (e.g. Lindauer and Hutter, Yang et al., and Li and Talwalkar) have noted, many neural architecture search (NAS) studies are irreproducible, for any of several reasons. Additionally, many neural architecture search algorithms either fail to outperform random search (with early termination criteria applied) or were never compared to a useful baseline.
Yang et al. showed that many neural architecture search techniques struggle to significantly beat a randomly sampled average architecture baseline. (They titled their paper “NAS evaluation is frustratingly hard.”) They also released a repository that includes the code used to evaluate neural architecture search methods on several different datasets, as well as the code used to augment architectures with different protocols.
Lindauer and Hutter have suggested a NAS best practices checklist based on their article (also referenced above):
Code release best practices
For all experiments you report, check if you released:
_ Code for the training pipeline used to evaluate the final architectures
_ Code for the search space
_ Hyperparameters used for the final evaluation pipeline, as well as random seeds
_ Code for your NAS method
_ Hyperparameters for your NAS method, as well as random seeds
Note that the easiest way to satisfy the first three of these is to use existing NAS benchmarks, rather than changing them or introducing new ones.
Best practices for comparing NAS methods
_ For all NAS methods you compare, did you use exactly the same NAS benchmark, including the same dataset (with the same training-test split), search space, and the same code for training the architectures and hyperparameters for that code?
_ Did you control for confounding factors (different hardware, versions of DL libraries, different runtimes for the different methods)?
_ Did you run ablation studies?
_ Have you used the same evaluation protocol for the methods being compared?
_ Have you compared performance over time?
_ Did you compare against random search?
_ Did you perform multiple runs of your experiments and report seeds?
_ Did you use tabular or surrogate benchmarks for in-depth evaluations?
Best practices for reporting critical details
_ Did you report how you tuned hyperparameters, and what time and resources this required?
_ Did you report the time for the entire end-to-end NAS method (rather than, e.g., only for the search phase)?
_ Did you report all the details of your experimental setup?
It is worth discussing the term “ablation studies” mentioned in the second group of criteria. Ablation studies originally referred to the surgical removal of body tissue. When applied to the brain, ablation studies (generally prompted by a serious medical condition, with the research done after the surgery) help to determine the function of parts of the brain.
In neural network research, ablation means removing features from neural networks to determine their importance. In NAS research, it refers to removing features from the search pipeline and training techniques, including hidden components, again to determine their importance.
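The idea can be sketched in a few lines: evaluate the full pipeline, then re-evaluate it with each component removed and report the change. The component names and the scoring function below are purely illustrative, not taken from any specific NAS paper.

```python
# Hypothetical ablation study sketch: measure each pipeline component's
# contribution by removing it and re-scoring. All names and numbers here
# are made up for illustration.

def evaluate_pipeline(components):
    """Stand-in for training + validation; returns a toy validation accuracy."""
    contribution = {"cutout": 0.02, "cosine_lr": 0.03, "weight_decay": 0.01}
    return 0.90 + sum(contribution[c] for c in components)

full = {"cutout", "cosine_lr", "weight_decay"}
baseline = evaluate_pipeline(full)

for component in sorted(full):
    ablated = evaluate_pipeline(full - {component})
    print(f"without {component}: {ablated:.3f} (delta {baseline - ablated:+.3f})")
```

In a real study, `evaluate_pipeline` would be a full training run, which is why ablations over many components are themselves expensive.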
Neural architecture search methods
Elsken et al. (2018) conducted a survey of neural architecture search methods, and categorized them in terms of search space, search strategy, and performance estimation strategy. Search spaces can cover whole architectures, layer by layer (macro search), or can be restricted to assembling pre-defined cells (cell search). Architectures built from cells use a drastically reduced search space; Zoph et al. (2018) estimated a 7x speedup.
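A quick back-of-the-envelope calculation shows why cell search shrinks the search space so much: a macro search picks an operation for every layer independently, while a cell search picks operations only for a small cell that is then stacked. The numbers below are made up for illustration and are not from Zoph et al.

```python
# Illustrative search-space sizes (toy numbers, not from any paper).
OPS = 5            # candidate operations per position (conv3x3, pool, skip, ...)
LAYERS = 20        # depth of the full network
CELL_EDGES = 4     # positions to decide inside one cell

# Macro search: choose an operation for each of the 20 layers independently.
macro_space = OPS ** LAYERS

# Cell search: choose operations only inside the cell, then stack copies
# of the same cell to build the full 20-layer network.
cell_space = OPS ** CELL_EDGES

print(f"macro: {macro_space:,}")
print(f"cell:  {cell_space:,}")
print(f"reduction factor: {macro_space // cell_space:,}")
```

Even with these toy numbers the cell-based space is smaller by a factor of 5^16, which is why cell search is so much cheaper to explore (the 7x figure refers to wall-clock speedup, not space size).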
The strategies for searching neural architectures include random search, Bayesian optimization, evolutionary methods, reinforcement learning, and gradient-based methods. There have been indications of success for all of these approaches, but none has really stood out.
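Random search with early termination, the baseline that many NAS methods struggle to beat, can be sketched as follows. The search space, the stand-in training function, and the operation names are all hypothetical; a real implementation would train each candidate network on actual data.

```python
import random

def sample_architecture(rng):
    """Draw a random architecture from a toy search space:
    a random depth and a random operation per layer (illustrative only)."""
    depth = rng.randint(2, 8)
    return [rng.choice(["conv3x3", "conv5x5", "maxpool", "skip"])
            for _ in range(depth)]

def train_and_validate(arch, max_epochs, patience, rng):
    """Stand-in for real training with early termination: stop when
    validation accuracy stops improving for `patience` epochs."""
    best, stale = 0.0, 0
    for _epoch in range(max_epochs):
        val = rng.uniform(0.5, 0.9)   # placeholder for a real validation score
        if val > best:
            best, stale = val, 0
        else:
            stale += 1
            if stale >= patience:      # early termination criterion
                break
    return best

rng = random.Random(0)
best_arch, best_score = None, -1.0
for _ in range(20):                    # evaluate 20 random candidates
    arch = sample_architecture(rng)
    score = train_and_validate(arch, max_epochs=10, patience=3, rng=rng)
    if score > best_score:
        best_arch, best_score = arch, score
print(best_arch, round(best_score, 3))
```

Despite its simplicity, this is the baseline any proposed search strategy has to beat to justify its added complexity.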
The simplest way to estimate the performance of neural networks is to train and validate the networks on data. Unfortunately, this can lead to computational demands on the order of thousands of GPU-days for neural architecture search. Ways of reducing the computation include lower-fidelity estimates (fewer training epochs, less data, and downscaled models); learning-curve extrapolation (based on only a few epochs); warm-started training (initialize weights by copying them from a parent model); and one-shot models with weight sharing (the subgraphs use the weights from the one-shot model). All of these methods can reduce the training time to a few GPU-days instead of a few thousand GPU-days. However, the biases introduced by these approximations aren't yet well understood.
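The lower-fidelity idea can be illustrated with a toy two-stage search: score every candidate cheaply with fewer epochs and less data, then retrain only the best few at full fidelity. The cost and accuracy models below are fabricated for the sketch; the bias risk mentioned above corresponds to the cheap proxy mis-ranking candidates.

```python
# Sketch of lower-fidelity performance estimation (toy numbers throughout).

def train(arch_id, epochs, data_fraction):
    """Stand-in for real training; returns (val_accuracy, gpu_hours)."""
    cost = epochs * data_fraction                     # toy cost model
    accuracy = 0.5 + 0.04 * arch_id * data_fraction   # toy quality model
    return accuracy, cost

candidates = range(1, 11)                             # 10 candidate architectures

# Cheap proxy pass: 5 epochs on 10% of the data.
proxy = {a: train(a, epochs=5, data_fraction=0.1) for a in candidates}
top3 = sorted(candidates, key=lambda a: proxy[a][0], reverse=True)[:3]

# Full-fidelity pass only for the survivors: 100 epochs on all the data.
final = {a: train(a, epochs=100, data_fraction=1.0) for a in top3}

proxy_cost = sum(cost for _, cost in proxy.values())
full_cost = sum(cost for _, cost in final.values())
print(f"proxy cost {proxy_cost:.1f} + full cost {full_cost:.1f} "
      f"vs {len(candidates) * 100} for training every candidate fully")
```

Here the two-stage search trains fully only 3 of the 10 candidates; in this toy model the cheap proxy happens to rank candidates correctly, which real low-fidelity proxies do not always do.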
Project Petridish from Microsoft
Microsoft Research claims to have developed a new approach to neural architecture search that adds shortcut connections to existing network layers and uses weight-sharing. The added shortcut connections effectively perform gradient boosting on the augmented layers. They call this Project Petridish.
This method supposedly reduces training time to a few GPU-days instead of a few thousand GPU-days, and supports warm-started training. According to the researchers, the method works well both for cell search and for macro search.
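The basic building block, a shortcut (skip) connection added to an existing layer's output, can be shown with a minimal toy model. This is a generic sketch of what a shortcut connection does, not Microsoft's actual Petridish algorithm; the scalar "layers" stand in for real convolutions.

```python
# Minimal illustration of adding a shortcut connection to an existing
# network: the earlier layer's output is added to a later layer's output.
# Toy scalar math only -- not Petridish's implementation.

def layer(x, weight):
    """A toy 'layer': a scalar multiply standing in for conv + activation."""
    return weight * x

def forward(x, w1, w2, shortcut=False):
    h = layer(x, w1)
    out = layer(h, w2)
    if shortcut:
        out = out + h   # candidate shortcut: layer-1 output bypasses layer 2
    return out

print(forward(1.0, 0.5, 0.8))                 # without shortcut: 0.4
print(forward(1.0, 0.5, 0.8, shortcut=True))  # with shortcut: 0.9
```

Petridish's contribution is deciding *which* such candidate connections to add, by evaluating how much each one reduces the loss on the layers it augments.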
The experimental results quoted were very good for the CIFAR-10 image dataset, but unremarkable for the Penn Treebank language dataset. While Project Petridish sounds interesting taken in isolation, without a detailed comparison to the other methods discussed, it isn't clear whether it's a major improvement for neural architecture search compared to the other speedup methods we've discussed, or just another way of getting to the same place.