Here’s how two organisations in Asia Pacific are adopting AIOps (artificial intelligence for IT operations) to enhance the efficiency and resiliency of their IT operations.
Taiwan’s National Center for High-Performance Computing (NCHC) is leveraging IBM Cloud Pak for Watson AIOps software to maximise the resilience and performance of its supercomputers.
NCHC helps accelerate research and innovation across Taiwan by providing access to supercomputers and analytics, and by facilitating nationwide networks for data sharing and collaboration. It is building a central network exchange to support cross-discipline efforts and cross-network collaboration. The IBM software acts as a central integrator for the network exchange’s diverse array of IT operations tools, producing a holistic view of the entire infrastructure.
With that view, NCHC can train AI models to manage problems and incidents automatically and proactively. As a result, the team has shortened the mean time to detect service-affecting issues by 55%, and NCHC can now detect potential outages 25 hours earlier than it could before.
Globe Telecom, a Philippines-based telco, uses AIOps to monitor 180 business systems and 2,000 servers across the company, with the aim of catching every technical glitch before an outage occurs or service degrades.
“We wanted to monitor our networks, applications and databases from a single pane of glass so we could keep a watchful eye on our systems 24 hours a day, seven days per week — without missing any critical alert. [That way, we can] offer seamless services and enhance end-user satisfaction,” says Joseph Manalang, Globe Telecom’s service operations intelligence centre manager.
With the help of Splunk’s AIOps platform, Globe Telecom replaced 20 monitoring screens with a centralised framework. The framework lets the team investigate events through a single, at-a-glance view and displays data-driven analytics results on customised dashboards.
As a result, the telco could save at least 40% of its outsourcing costs and resources for system monitoring. The time needed to detect incidents in tier-one systems and route them to the correct response team has since fallen from 85 minutes to 15 minutes, while its incident reporting time has been halved.