The use of artificial intelligence (AI) continues to transform enterprises by creating new products, boosting revenues, cutting costs and increasing efficiency. But getting to those successful implementations has been tricky for some organizations, given the complexity of the technology and the potential for a high failure rate among those who jump in without a plan.
Teams that have been given the green light to move ahead with an AI project must make sure they have a solid IT-led strategy that can unlock and accelerate AI’s potential. This involves choosing the right infrastructure that allows for growth and flexibility, having the right team — both internally and externally — and being able to effectively control costs throughout the project’s lifecycle.
“IT teams are now evolving and adapting their infrastructures to handle the unique demands of AI,” said Tony Paikeday, senior director of AI systems at NVIDIA. “What IT knows really well is the discipline and rigor involved in the continuous innovation and development cycle of how to take concepts into prototypes, test and validate them, and put them into production. AI, like a lot of other things, is never one and done. It’s a recursive process, because the data that fuels them changes over time.”
In some cases, AI projects are handled by business leaders, or they are developed by data scientists in different groups that don’t have IT support. That may be fine for initial “ad-hoc pilot projects,” says Paikeday, but once an AI program develops into a major initiative, it needs to be led by IT.
“IT knows the DevOps rigor of how to continuously innovate and deploy applications, so they can offer a bridge between the data science practitioners who know how to experiment, and those who understand the reality of landing something in production,” he said. “When these two teams come together on top of the right infrastructure that has been purpose-built for the unique demands of AI, then good things happen.”
Know the AI infrastructure
A key task for IT before an AI project begins is understanding the infrastructure necessary for a successful initiative. Many companies start exclusively in the cloud because of its ease of access and familiarity, but later find that escalating costs force a switch back to an on-premises solution or a hybrid offering.
A major obstacle to weigh carefully here is data gravity: large datasets tend to pull resources and applications toward them. As with planetary gravity, resisting that pull has a price. If your compute is somewhere other than where your data is created and stored, you’ll inevitably spend more time and money on data storage and transit. To avoid these pitfalls, IT should keep the project’s data as close to the computing resources as possible. For example, AI model training should be carried out on premises if the data for the project is generated on site. Similarly, data generated in the cloud should be processed there. Avoiding large data-transfer (egress) costs helps alleviate the data gravity issue.
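To make the data gravity trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It estimates what it might cost, in money and time, to repeatedly move a dataset away from where it lives. The function names, the per-gigabyte price, the dataset size and the link speed are all hypothetical values chosen for illustration. They are not figures from NVIDIA or any cloud provider, and real pricing is usually tiered.

```python
def egress_cost_usd(dataset_tb: float, price_per_gb: float, transfers: int = 1) -> float:
    """Estimate the cost of moving a dataset out of its home location.

    Assumes a flat per-GB transfer price (hypothetical; real pricing is
    often tiered) and decimal units (1 TB = 1,000 GB).
    """
    if dataset_tb < 0 or price_per_gb < 0 or transfers < 0:
        raise ValueError("inputs must be non-negative")
    return dataset_tb * 1000 * price_per_gb * transfers


def transfer_hours(dataset_tb: float, link_gbps: float) -> float:
    """Best-case hours to move the dataset once over a network link.

    Ignores protocol overhead and contention, so real transfers take longer.
    """
    if link_gbps <= 0:
        raise ValueError("link speed must be positive")
    bits = dataset_tb * 8e12           # 1 TB = 8 * 10^12 bits
    seconds = bits / (link_gbps * 1e9)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical scenario: a 50 TB training set generated on site is
    # copied to remote compute once a month for a year.
    cost = egress_cost_usd(dataset_tb=50, price_per_gb=0.08, transfers=12)
    hours = transfer_hours(dataset_tb=50, link_gbps=10)
    print(f"Estimated yearly transfer cost: ${cost:,.0f}")
    print(f"Best-case time per transfer: {hours:.1f} hours")
```

Plugging in your own dataset size and your provider’s actual rates gives a quick sense of whether compute should move to the data rather than the other way around.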
However, those two options, on-premises platforms like NVIDIA DGX and the cloud, are not the only ones. Managed infrastructure such as NVIDIA DGX Foundry, which offers private, dedicated infrastructure for rent in a colocation facility, lets companies enjoy the benefits of an on-premises solution (data and processing in the same place) without the headaches of managing infrastructure or even owning a data center. This may be a good option for companies that want their data close to their compute but would rather not run their own hardware.