IT operations teams have a lot to juggle. They manage servers, networks, cloud infrastructure, user experience, application performance, and cybersecurity, often in silos that work independently of each other. Staff are often overworked, overwhelmed by excessive alerts, and struggle to solve problems that span multiple domains.
Enter AIOps, a burgeoning field of technologies and strategies that inject artificial intelligence into IT operations to address the challenges these teams face: reducing false positives, using machine learning to spot problems before they occur, automating remediation, and providing a full picture from an enterprise-wide viewpoint.
According to an October survey of IT leaders conducted by ZK Research and Masergy, 65% of companies already use AIOps, and 94% say AIOps is “important or very important” for managing the performance of network and cloud applications. In addition, 84% see AIOps as a path to a fully automated network environment, and 86% expect to have a fully automated network within the next five years.
Although AIOps is still new, it is already proving its worth. According to a survey by Enterprise Management Associates (EMA) released this summer, 62% of companies see a “very high” or “high” ROI from their AIOps investments; the rest say they have broken even or that it is too early to tell.
But the path to AIOps is not always smooth. More than half of EMA respondents also stated that implementing AIOps is “difficult” or “extremely difficult”. The most common obstacles reported by companies include cost, data quality, conflicts within IT, mistrust of AI, skills shortages, and integration challenges.
No clear strategy before adoption
Today’s IT organizations work under high pressure and can feel that there is not enough time for systematic preparation.
“Organizations in general are short on time and resources,” says John Carey, managing director of the technology practice at AArete, a global management consulting firm.
Often, AI projects begin as experiments that grow into opportunities. “You need a strategy,” says Carey. “AIOps should be comprehensive and planned.”
Rolling out a tech solution without clearly identifying the challenge you’re trying to solve is an age-old problem for IT, agrees Doncha Carroll, partner in the revenue growth practice at Axiom Consulting Partners. Carroll recommends that companies take the time to define the problem they will solve and how solving it will affect the business.
“And make sure that a more conventional solution is not appropriate or effective,” he says. “Otherwise, you could invest a lot of dollars in implementing a solution that doesn’t deliver the vision you set for it.”
In fact, according to the EMA survey, even though companies are generally positive about their AIOps investments, a staggering 80% are looking for a new platform, and half plan to switch within the next year.
The biggest reasons? They want more flexibility and scalability, as well as more artificial intelligence, machine learning, and advanced analytics. Such drastic switches underscore the fact that companies often forget to look at the broader picture to ensure that the solution they choose can serve the business in the long run, Carroll says.
“It’s important to think about developing a comprehensive strategy, and then implement it based on a use case,” he says.
Poor or incomplete data
According to the EMA survey, data issues are the second-biggest obstacle to successful AIOps deployments, after cost.
AI and machine learning live and die on training data. But a company’s legacy systems and processes may not collect performance data consistently. Critical data may also be missing, or the information reported may be contradictory.
“The market today is in its first-generation phase,” says Gregory Murray, senior research director at Gartner. “We analyze the data that we have because it is the data that we have.”
He says something similar happened with hard drives. For years now, hard drives have had tools and analytics that predict drive failure, and they’re equipped with exactly the telemetry they need to make those predictions.
“Outside of this use case, you don’t need that data,” Murray says.
The same will happen with AIOps, he says: as the industry rolls out the technology, it will learn more about which data actually needs to be collected.
“There is promise of improved accuracy and precision once we start creating data sets suitable for this purpose,” he says.
Even when the data is available, it may not be in a format that makes a good training data set. For example, companies may want to know whether a particular change will cause problems, based on the servers and applications it affects, says George Machado, partner at McKinsey & Co. For this analysis, a written description of the change is a critical input.
“If it’s poorly written, then running NLP on that text won’t give you any interesting insights,” Machado says. Likewise, AI won’t be able to pick out patterns in open ticket descriptions if they are poorly written, he adds.
Worse, important data sets are often incomplete. For example, a company may want to associate an event with related applications, networks, or servers. “But no customer has a perfect change management database,” Machado says, adding that these issues take significant effort to resolve.
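To make the data-quality problem concrete, here is a minimal Python sketch of the kind of completeness check a team might run before feeding change records into an NLP or correlation model. The record fields, the five-word minimum for a usable description, and the names `ChangeRecord` and `audit_change_records` are hypothetical assumptions for illustration; they are not drawn from any particular change management database or AIOps product.

```python
from dataclasses import dataclass, field

# Hypothetical change-record shape; real change management schemas differ.
@dataclass
class ChangeRecord:
    change_id: str
    description: str = ""
    affected_servers: list[str] = field(default_factory=list)
    affected_apps: list[str] = field(default_factory=list)

# Assumed threshold: descriptions shorter than this are too thin for NLP.
MIN_DESCRIPTION_WORDS = 5

def audit_change_records(records: list[ChangeRecord]) -> dict[str, list[str]]:
    """Return the IDs of records that fail each (assumed) quality check."""
    problems: dict[str, list[str]] = {
        "thin_description": [],
        # "CIs" = configuration items: the servers and apps a change touches.
        "no_affected_cis": [],
    }
    for rec in records:
        if len(rec.description.split()) < MIN_DESCRIPTION_WORDS:
            problems["thin_description"].append(rec.change_id)
        if not rec.affected_servers and not rec.affected_apps:
            problems["no_affected_cis"].append(rec.change_id)
    return problems

if __name__ == "__main__":
    sample = [
        ChangeRecord("CHG-1", "Upgrade TLS library on web tier to patch CVE",
                     ["web01"], ["storefront"]),
        ChangeRecord("CHG-2", "fix", ["db02"]),
        ChangeRecord("CHG-3", "Rotate database credentials for the billing service"),
    ]
    print(audit_change_records(sample))
```

On the three sample records, the check flags CHG-2 for its one-word description and CHG-3 for listing no affected servers or applications, the two kinds of gaps Machado describes.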
Insufficient coverage
To get the full benefits of AIOps, companies need to bring as many systems under its umbrella as possible, since a problem in one part of the environment can have ripple effects in another. A network issue might actually be a cybersecurity issue, or a user experience issue might be caused by a slow database server.
“As more companies move to digital technology, there are more interconnections between applications,” Machado says. “If one application is performing poorly, it can potentially cause problems in other systems.”
But there are many obstacles to getting there. One is the cost of such a system. Another is the integration work needed to make all the relevant data sources work together. There are also regulatory issues to address, Machado says. Ultimately, organizational fragmentation drives tool fragmentation.
He adds that it is not just about IT silos. AIOps needs input from other areas of the business to be effective. For example, if a company has a big product launch, a new marketing campaign, or a huge discount, it can cause a spike in calls to the data center or in website traffic, and crash systems.
“You need to correlate not just application performance and server performance but also upcoming events on the business side,” he says.
Will McKeon-White, an analyst at Forrester Research, agrees: “The most successful AIOps we’ve seen have multi-department use cases.” That means not just IT-related functions such as cybersecurity, but also groups outside IT, such as marketing, he says.
McKeon-White says an AIOps system that collects real-time user monitoring data could become a common business service, not just something that helps automate IT. “These are the most successful use cases we’ve seen.”
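Machado’s point about business events can be illustrated with a minimal Python sketch that tags traffic spikes falling near a scheduled business event, such as a product launch or a sale. The event calendar, the two-hour window, the 2x spike threshold, and the function name `annotate_spikes` are hypothetical assumptions; this is a deliberately simple time-window heuristic, not how any particular AIOps platform correlates events.

```python
from datetime import datetime, timedelta

# Hypothetical business-event calendar (e.g., supplied by marketing).
EVENTS = {
    "spring sale launch": datetime(2022, 3, 1, 9, 0),
}

# Assumed correlation window around each event.
WINDOW = timedelta(hours=2)

def annotate_spikes(samples, baseline, threshold=2.0, events=EVENTS, window=WINDOW):
    """Flag samples above threshold * baseline and attach nearby business events.

    samples: list of (timestamp, value) pairs, e.g. requests per minute.
    Returns a list of (timestamp, value, [event names]) for each spike.
    """
    spikes = []
    for ts, value in samples:
        if value > threshold * baseline:
            # Any event within the window is a candidate explanation.
            nearby = [name for name, when in events.items()
                      if abs(ts - when) <= window]
            spikes.append((ts, value, nearby))
    return spikes

if __name__ == "__main__":
    # Made-up traffic samples against an assumed baseline of 1,000 req/min.
    samples = [
        (datetime(2022, 3, 1, 9, 30), 5000),
        (datetime(2022, 3, 1, 14, 0), 4800),
        (datetime(2022, 3, 1, 10, 0), 900),
    ]
    for spike in annotate_spikes(samples, baseline=1000):
        print(spike)
```

In the sample, the 9:30 spike is attributed to the sale launch, while the 14:00 spike has no matching event and would still need investigation; the 10:00 sample stays below the threshold and is not flagged.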
Paying double
Another problem that can cause internal organizational conflicts is when individual teams or divisions have their own favorite toolkit and don’t want to give it up.
“Getting rid of other monitoring solutions can be a political nightmare in a lot of organizations,” McKeon-White says.
Companies often compromise, keeping their existing systems and adding the AIOps platform on top. But this can duplicate functions, compound integration challenges, and raise costs, he says. “Organizations are paying too much for these tools and not getting the value they need.”
To solve this dilemma, some companies are turning to AIOps capabilities embedded in domain-specific systems. For example, application performance monitoring systems increasingly use artificial intelligence and machine learning to identify problems. Large cloud vendors, database vendors, and cybersecurity vendors are also adding intelligent monitoring and automation features.
It’s a relatively easy way to get some of the features of AIOps, but at the cost of a multi-domain, multimodal view of operations.
Using the built-in features is also faster than building or deploying an entire AIOps platform, a project that typically takes 16 months or more, says Bradley Shimmin, senior analyst for AI platforms, analytics, and data management at Omdia.
“Bringing all of these sources of information together, all of those signals coming from many different sources — the cloud, application sensor APIs, sensors on physical devices — it all requires integration,” he says. “This is the challenge companies have been facing for decades now.”
Missing the big picture
Domain-specific platforms can provide native automation of their functions and make AI tools transparent to users. But while maintaining silos avoids integration challenges, companies will not see the full potential of AIOps.
“If you’re trying to do something like root cause analysis on increased latency, you should be able to talk to the networking system and the application server, to see all the different areas,” says Shimmin. “No one wants to create a Jupyter notebook to check network logs to see what happened with response time.”
Eventually, cloud providers may be able to offer a full set of AIOps functionality, which could be useful for businesses operating entirely within one cloud provider. “Then you could see AIOps nirvana come true for you,” he says. “But it’s not something you’ll get today.”
Besides, most companies are multicloud, Shimmin says. At the same time, the EMA survey shows a strong preference for a single, cross-domain AIOps platform. Of the companies that said their AIOps efforts were “extremely successful,” 80% were using a single platform. Among companies that were not using a single platform, 57% were “marginally successful.”
So it’s not surprising that, while only 46% of companies overall use a single AIOps platform, the rest either plan to adopt one or use more than one.
Culture change
Finally, many companies find that their employees do not trust AI systems, or are reluctant to embrace change.
In the EMA survey, even among companies reporting the highest level of success with AIOps, 22% of respondents named “fear or mistrust of AI” as one of the biggest challenges to their AIOps initiatives, tying it with “skills shortage” for fourth place on the list.
“We’re trying to break that down with explainable AI, and in some ways it works, and in some ways it doesn’t,” says Sanjay Srivastava, chief digital officer of Genpact, a global digital transformation consultancy.
Managing AIOps also requires a different set of skills than traditional IT management, he says. AI-oriented work calls for more data engineering and the ability to model AI algorithms.
AIOps is rapidly evolving to the point where it can make operational decisions automatically, such as redirecting traffic, reallocating resources, and spinning up new instances. When the preparation isn’t done carefully, things can easily go wrong, says AArete’s Carey.
“When you actually program it to make decisions like shutting down systems, it can shut down your business,” he says. “This is probably the worst-case scenario.”
More commonly, the system might make expensive mistakes.
“The most common result is that it will kick in and keep adding servers, and all of a sudden the cloud computing bill goes from $20,000 an hour to $100,000 an hour,” he says. “This homework must be done.”
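One way to guard against the runaway scaling Carey describes is a hard budget ceiling that an automated policy cannot exceed without human approval. The Python sketch below is a minimal illustration under assumed inputs: the per-server hourly cost, the budget cap, and the names `ScalingDecision` and `decide_scale_out` are hypothetical, and the code does not call any real cloud provider’s API.

```python
from dataclasses import dataclass

@dataclass
class ScalingDecision:
    servers_to_add: int
    needs_human_approval: bool
    reason: str

def decide_scale_out(
    current_servers: int,
    requested_servers: int,
    cost_per_server_hour: float,
    hourly_budget_cap: float,
) -> ScalingDecision:
    """Cap an automated scale-out request at what the hourly budget allows.

    All parameters are hypothetical inputs; a real system would read them
    from its own billing and capacity data.
    """
    if requested_servers <= 0:
        return ScalingDecision(0, False, "no scale-out requested")

    current_cost = current_servers * cost_per_server_hour
    headroom = hourly_budget_cap - current_cost
    # How many extra servers fit under the cap (never negative).
    affordable = max(0, int(headroom // cost_per_server_hour))

    if requested_servers <= affordable:
        return ScalingDecision(requested_servers, False, "within budget")
    # Add only what fits, and escalate the remainder to a human.
    return ScalingDecision(
        affordable,
        True,
        f"requested {requested_servers}, budget allows {affordable}",
    )

if __name__ == "__main__":
    # Assumed numbers: 200 servers at $100/hour each = $20,000/hour,
    # with a $25,000/hour cap.
    print(decide_scale_out(200, 800, 100.0, 25_000.0))
```

With these assumed figures, granting all 800 requested servers would push the bill from $20,000 to $100,000 an hour; the guardrail adds only the 50 servers that fit under the cap and flags the rest for review.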