Event Management

Operational Practices: Event Management

The ROC will operate under a series of normal activities or tasks, as documented on the Daily ROC Routine. Daily routines are supported through an event management methodology, described below.

Statement of Procedures and Service Standards v1.0

It is the ROC's goal not only to maintain high standards in our approach to resource management and optimization, but to provide clear, consistent, and established communications with our call centers, management and supporting organizations in the event of incidents which impact our performance. The main focus of the ROC is to provide call center performance monitoring, resource management support, and real-time fault isolation capabilities for the company's Customer Care department. The ROC utilizes an event management system for status reporting, event correlation, and incident tracking to support returning operations to normal when events cause degradation to customer service.

Scope

The Southern Division Customer Care department operates on an Avaya phone network, currently designed with multiple hosts: ACD1/2/3/4/5. Workforce management software is integrated with the Avaya platform for resource planning and management, with a single Verint WFM database managing the forecast, schedules and real-time adherence data for over 6000 customer service agents. The ROC utilizes a series of hardware and software solutions to aggregate data across the division for quick identification and event management when an incident impacts the performance of the call centers. Upon initial operational launch, October 22, 2009, the ROC will operate 21x5 (6am to 3am M-F) and 18x2 (8am to 12am S-S).

Event Management: Tracking Tools

The ROC will be utilizing two software applications to manage, track and communicate all duties assigned. To effectively manage this workload, the two applications are aligned with the severity & type of event or request:

OpsCenter: Severity 1, 2 & 3 Incidents

OpsCenter is an incident management application allowing the ROC to effectively manage unplanned events that directly impact the service we deliver to our customers. Cases will be opened and classified in severity when unplanned events cause our service level within a call center to drop below acceptable limits.

CSC Tickets: Severity 4 Requests

The standard ticketing system is utilized by the ROC to manage and work standard ticket requests generated by call center Workforce Management teams. These requests would not be classified as "customer impacting", but would include standard business requests such as new requests, change requests, or enhancements to our WFM & associated call routing systems.

Obtaining ROC Support

In monitoring the health of all call centers throughout the Southern Division, the ROC will actively engage the real-time teams and call center management when incidents arise which cause call center performance to deteriorate for an extended period of time.

Services and support may also be obtained from the call center in need of assistance related to resource management, call routing and related WFM support. Services and support should be obtained by predetermined primary and secondary WFM points of contact from each site rather than a variety of call center supervisors/managers. ROC services and support can be accessed by one of the following methods:

  1. Telephone: 678-xxx-xxxx  (Severity 1, 2, 3)
  2. Ticket Submission:  (Severity 4)

The use of the two methods should align with the severity matrix.
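
As a minimal sketch of the alignment described above, the mapping from severity to contact method could be expressed as follows. The function name and the returned labels are illustrative assumptions, not part of any ROC system.

```python
def contact_method(severity: int) -> str:
    """Return the ROC contact method for a severity level (1-4).

    Hypothetical helper: Severity 1, 2 and 3 go by telephone,
    Severity 4 goes through ticket submission.
    """
    if severity in (1, 2, 3):
        return "telephone"
    if severity == 4:
        return "ticket"
    raise ValueError(f"unknown severity: {severity}")
```

For example, `contact_method(2)` returns `"telephone"`, while `contact_method(4)` returns `"ticket"`.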

Telephone Contact

Telephone contact with the ROC is the most immediate means of receiving any type of service or support requiring immediate attention. Call centers inquiring about IVR issues, call routing, Service Level degradation, 3rd Party performance issues or system issues supporting the WFM operations may place a call to the ROC. The ROC staff will immediately determine if an incident has already been identified. If the inquiry is already identified, the ROC will communicate the incident case # being tracked through OpsCenter. If the call is reporting a new incident, the ROC will immediately create a case in the OpsCenter database, and provide the caller with an incident case # for future reference. The ROC includes the caller's e-mail address in the Reported By field of the incident case so that the caller/requester can receive and post updates via the ticket.

Email via Tickets

Tickets should only be submitted for Severity 4 requests. These are standard business requests that do not immediately impact service delivered to our customers. Once a ticket is submitted via - an e-mail is generated to the ROC via the [ROC-Requests] distribution. These tickets are worked during periods in which the ROC is not working Severity 1, 2 or 3 issues.

Submitters of tickets through the ( ) will receive an automated response which includes the ticket number in the subject line of the email. As the ticket is updated, e-mails using this same subject line will automatically post ticket entries with the e-mailed message content. When submitting a ticket to the ROC, provide as much detail as possible within the ticket itself. This detailed information will be accessible by all ROC staff. Progress toward resolution is captured within the ticket, and notification of these updates and actions is distributed via e-mail to the ROC team as well as to the party submitting the ticket. In addition, ticket status changes are communicated: as the ticket is picked up, worked and closed, the ( ) generates an email to reflect the change in status.

Event Management Severity Matrix

Event management is our process for categorizing unplanned events which can potentially impact the service received by our customers. As a core goal of optimizing our customer care resources, the ROC must respond to unplanned events to help facilitate the restoration of service to normal operating ranges. A severity system is used to prioritize unplanned events.

Business and financial exposure

   * Severity 1 (Critical): The failure creates a serious business and financial exposure.
   * Severity 2 (High): The failure creates a serious business and financial exposure.
   * Severity 3 (Medium): The failure creates a low business and financial exposure.
   * Severity 4 (Low): Minimal or no business and financial exposure.

Work Outage

   * Severity 1 (Critical): The failure causes an entire call center or multiple centers to be unable to work or perform some significant portion of their job, such as call routing, networking or facility failures.
   * Severity 2 (High): The failure causes significant degradation to service level; however, call routing, network and facility are all functional.
   * Severity 3 (Medium): The failure causes service level to slip below an acceptable threshold. All call center facilities are operational, but volume and/or resource planning does not match forecasted arrivals, causing Service Level degradation.
   * Severity 4 (Low): Does not reflect outage. Severity 4 is associated with questions, requests for information, or requests for change.

Service Level

   * Severity 1 (Critical): 0% due to routing, facility or network failure
   * Severity 2 (High): <30% sustained for >60 minutes
   * Severity 3 (Medium): <60% sustained for >30 minutes
   * Severity 4 (Low): N/A

Response Time

   * Severity 1 (Critical): Within 5 minutes.
   * Severity 2 (High): Within 30 minutes.
   * Severity 3 (Medium): Within 1 hour.
   * Severity 4 (Low): Within two days.

Resolution Time

   * Severity 1 (Critical): The maximum acceptable resolution time is 1 hour, after initial response time.
   * Severity 2 (High): The maximum acceptable resolution time is 4 hours.
   * Severity 3 (Medium): The maximum acceptable resolution time is 1 business day.
   * Severity 4 (Low): The maximum acceptable resolution time is 5 business days.
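
The response and resolution targets in the matrix can be captured as a small lookup table. The following is a sketch only: the names `SeverityTargets`, `SEVERITY_MATRIX` and `response_due` are hypothetical, resolution targets are kept as text because business-day arithmetic is not defined here, and "within two days" is assumed to mean calendar days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SeverityTargets:
    label: str
    response: timedelta   # maximum time to first response
    resolution: str       # kept as text; business days are not computed


# Values taken from the severity matrix; the structure itself is an assumption.
SEVERITY_MATRIX = {
    1: SeverityTargets("Critical", timedelta(minutes=5), "1 hour after initial response"),
    2: SeverityTargets("High", timedelta(minutes=30), "4 hours"),
    3: SeverityTargets("Medium", timedelta(hours=1), "1 business day"),
    4: SeverityTargets("Low", timedelta(days=2), "5 business days"),
}


def response_due(severity: int, opened: datetime) -> datetime:
    """Latest acceptable time for the first response to a case."""
    return opened + SEVERITY_MATRIX[severity].response
```

With this table, a Severity 2 case opened at 09:00 would have a response due by 09:30 on the same day.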


The following describes the procedures and time-frames associated with event management.

Assessment and Classification

At the time an incident is identified and an incident case is created in OpsCenter, the incident is classified with an appropriate degree of severity. Incident severity is classified as one of the following:

   * Severity 1: (path to resolution needed within 0-59 minutes)
   * Severity 2: (path to resolution needed within 1-4 hours)
   * Severity 3: (path to resolution needed within 24 hours)
   * Severity 4: (referred to http://csc ticket, path to resolution needed within 24 hours)

A Severity 1 designation assumes a network or a key network resource is down and unavailable. The failure causes an entire call center or multiple centers to be unable to work or perform some significant portion of their job. This classification of incident receives immediate action, including a notification to ROC management, the Call Center WFM Team, Call Center Director/VP, the LMC Operations Manager and Director, Division Care VP, Division IT Team & IT VP, Division Operations VP, and the Division President.

A Severity 2 designation assumes that a failure causes significant degradation to service level; however, call routing, network and facility are all functional. A Severity 2 is declared when the interval service level to our customers drops below 30% over two consecutive intervals (a sustained period of 60 minutes or longer). This designation is also used when a network resource is down but an operating redundant resource is available. It also receives immediate action, including a notification to ROC management, the Call Center WFM Team, and Call Center Director/VP. The primary difference in this classification is that immediate action is evaluated against Severity 1 incidents, which allows the ROC to prioritize multiple service-affecting incidents occurring at the same time and assign resources accordingly.

A Severity 3 designation relates to a failure which causes the interval service level to slip below a level of 60% for one or more consecutive 30-minute intervals. All call center facilities are operational, but volume and/or resource planning does not match forecasted arrivals, causing service level degradation. Severity 3 issues are addressed within 1 hour of the 2-interval trigger, and are prioritized behind Severity 1 and 2 cases. Notification is generated to ROC management, the Call Center WFM Team, and Call Center Director/VP.

A Severity 4 designation relates to requests generated by the Customer Care WFM teams, and should be managed through ticket submission to the csc. The classification is used to track the timely response to requests, which are prioritized behind active Severity 1, 2 & 3 cases. Target response time is within 2 days, with maximum resolution time of 5 days. Notification is generated through the http://csc to the ticket requester, and to the ROC analyst team.
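
The service level thresholds described above can be sketched as a simple classification function. This is an illustration under stated assumptions, not the OpsCenter logic: the function name `classify` and the `outage` flag are hypothetical, service levels are percentages for consecutive 30-minute intervals listed oldest first, and a single interval below 60% is assumed to be enough to trigger Severity 3.

```python
from typing import Optional, Sequence


def classify(interval_sl: Sequence[float], outage: bool = False) -> Optional[int]:
    """Suggest an incident severity from recent interval service levels.

    interval_sl: service level (%) per 30-minute interval, oldest first.
    outage: True when routing, facility or network has failed.
    Returns 1, 2 or 3, or None when no incident threshold is met.
    """
    if outage:
        return 1
    # Below 30% over two consecutive intervals (60 minutes).
    if len(interval_sl) >= 2 and all(sl < 30 for sl in interval_sl[-2:]):
        return 2
    # Below 60% for the latest 30-minute interval.
    if interval_sl and interval_sl[-1] < 60:
        return 3
    return None
```

For instance, `classify([80, 25, 20])` yields 2, while `classify([80, 55])` yields 3. Severity 4 is not covered because it applies to ticket requests rather than service level events.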

Fault Diagnosis

After assessment and severity classification, the ROC will utilize a Real-Time Cause and Effect Fishbone to identify the factors leading to Service Level objectives being missed. This standardized process for correlating SL failure events to root causes is critical in identifying patterns, and targeting areas for improvement in WFM planning.

Notification to ROC & Call Center Management

To maintain high standards of service, the ROC uses a combination of phone calls, broadcast calls, text messaging to cell phones, and e-mails to ensure that any member of the ROC team may be reached regardless of their location. These notifications occur for each incident classified as Severity 1 & 2 and are sent as soon as practical after the problem begins, but in no case later than 30 minutes after the problem is observed. For Severity 1 cases, a notification is sent hourly to ROC Management with problem status, until the problem is resolved.

For Severity 2 cases, a notification with problem status is sent twice daily, at approximately 0800 and 1600, to ROC & Call Center Management until the problem is resolved. When the incident is resolved, a final e-mail notification is sent to ROC Management summarizing the problem and its final resolution.
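
The status notification cadence for Severity 1 and 2 cases can be sketched as a function that computes when the next update is due. The function name is hypothetical, and the 0800 and 1600 slots are treated as exact times even though the procedure describes them as approximate.

```python
from datetime import datetime, time, timedelta

# Twice-daily slots for Severity 2 status updates (approximate in practice).
SEV2_SLOTS = (time(8, 0), time(16, 0))


def next_status_update(severity: int, last_sent: datetime) -> datetime:
    """Return when the next status notification is due for an open case."""
    if severity == 1:
        # Hourly updates until the problem is resolved.
        return last_sent + timedelta(hours=1)
    if severity == 2:
        for slot in SEV2_SLOTS:
            candidate = datetime.combine(last_sent.date(), slot)
            if candidate > last_sent:
                return candidate
        # Both slots have passed today; use the first slot tomorrow.
        next_day = last_sent.date() + timedelta(days=1)
        return datetime.combine(next_day, SEV2_SLOTS[0])
    raise ValueError("recurring status updates apply to Severity 1 and 2 only")
```

A Severity 2 update last sent at 09:15 would be followed by one at 16:00 the same day; one sent at 17:00 would be followed by 08:00 the next day.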

Escalation

The ROC strives to assure that escalation is seamless to those who report Resource Management & call center performance problems. When a problem involving network failures associated with Severity 1 cases is beyond the capability of the ROC team, the ROC escalates the problem to the Division IT team. This escalation may occur within a shorter time frame, as the ROC team may realize immediately that additional support is needed.

The Division IT team is informed of the problem or failure and is provided with all supporting information. At this point a strategy is decided upon and documented in the trouble ticket.

For Severity 2 & 3 tickets: (Escalation process to be defined)

Closure Communication

Upon resolution of an incident, the ROC will change the status of an incident case to resolved, which triggers an email notification to the distributions associated with the case.

For Severity 1, 2 & 3 cases, the email generated states the following:

Incident Case [ROC #1722] regarding:

" Atlanta Degraded Service Level 50%",

is shown as completed in our records and will be closed unless you respond to this notice requesting additional support and/or information.

Thank You,

ROC Operations 678-xxx-xxxx


Post-Resolution Analysis

All incident cases categorized as Severity 1 are reviewed jointly with the Call Center Leadership, the Division IT & Care Leadership and the ROC Manager in a post-resolution analysis meeting. The purpose of the meeting is to review all of the data relevant to the problem with a goal of improving operational procedures.

This analysis is also available to our clients upon request.

All Severity 1, 2 & 3 cases generate an incident report contained within the OpsCenter application.

When Severity 2 incidents recur multiple times within a week, the ROC will generate a ROC Performance Analysis Report (Sample Report). This report gives an analysis of the recurring incidents, including:

  • Incident Description
  • Type of Investigation Undertaken
  • Findings
  • Positive Features and Good Practice Recognized
  • Recommendations and Action Plan
  • Appendices with supporting detail

The ROC Management will facilitate meetings with call center sites to review Severity 2 Performance Analysis Reports, and make recommendations for actions to address the causes identified as leading to Service Level Failures.
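
One way to detect when a Performance Analysis Report is warranted is sketched below. The procedure does not define "multiple times within a week" precisely, so this example assumes two or more Severity 2 incidents at the same site within a rolling 7-day window. The function name, the tuple layout and the site labels are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta


def sites_needing_report(incidents, window=timedelta(days=7), threshold=2):
    """Return the set of sites with recurring Severity 2 incidents.

    incidents: iterable of (site, severity, opened_date) tuples.
    A site is flagged when `threshold` Severity 2 incidents were
    opened less than `window` apart.
    """
    by_site = defaultdict(list)
    for site, severity, opened in incidents:
        if severity == 2:
            by_site[site].append(opened)

    flagged = set()
    for site, dates in by_site.items():
        dates.sort()
        # Compare each incident with the one `threshold - 1` places later.
        for i in range(len(dates) - threshold + 1):
            if dates[i + threshold - 1] - dates[i] < window:
                flagged.add(site)
                break
    return flagged
```

Given two Severity 2 incidents at "Site A" two days apart and two at "Site B" two weeks apart, only "Site A" would be flagged.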

Regular Status Reporting

The ROC prepares status reports on a daily, weekly and monthly basis. Each daily status report is generated from OpsCenter, and contains a list of all open incident cases sorted by severity status in ascending order, and by date opened (oldest to newest) within each status. Each weekly report contains a list of all tickets closed since the date of the previous weekly report.
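
The ordering of the daily status report can be illustrated with a short sort. This is a sketch, not the OpsCenter report itself: the `Case` fields and the example case numbers are assumptions made for illustration.

```python
from datetime import date
from typing import List, NamedTuple


class Case(NamedTuple):
    case_id: int
    severity: int
    opened: date
    is_open: bool


def daily_report(cases: List[Case]) -> List[Case]:
    """Open cases sorted by severity (ascending), then oldest first."""
    open_cases = [c for c in cases if c.is_open]
    return sorted(open_cases, key=lambda c: (c.severity, c.opened))


# Hypothetical example data.
cases = [
    Case(1001, 3, date(2009, 10, 20), True),
    Case(1002, 1, date(2009, 10, 22), True),
    Case(1003, 3, date(2009, 10, 19), True),
    Case(1004, 2, date(2009, 10, 21), False),
]
report_ids = [c.case_id for c in daily_report(cases)]
```

Here the Severity 1 case is listed first, followed by the two Severity 3 cases with the older one ahead; the closed case is excluded.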