Incident Response

  • Defining incident. An organization’s incident response policy needs to include a precise definition of a security incident. For example, “an event or anomaly that has been determined with high probability to indicate a breach.”
  • Defining risk-based prioritization of incidents. Responders need to classify incidents based on severity. Classification should be simple (high, medium and low) and based on the scale and scope of the attack as well as the impact on confidentiality, integrity and availability of information and operations in the context of enterprise risk.
  • Describing the security response organization. The description should do the following:
    • include staff roles, responsibilities and levels of authority;
    • address compliance and regulatory requirements;
    • include overarching guidelines for external communications; and
    • describe handoff and escalation points in the incident management process.
  • Determining plans and procedures of the policy. These cover the specific nuts and bolts of response, including metrics for measuring the incident response capability and its effectiveness, checklists, detailed processes and forms the incident response team uses.
  • Having a battle-tested approach to internal and external communications. Incident response policies should include plans and timeframes for communicating proactively with both internal stakeholders — including legal, human resources and client services — and external ones, such as customers, the press and law enforcement. Where possible, the plan should include scripts the team can build on when issuing statements and updates.
  • Having a templated approach for incident detection, analysis, containment and remediation. The more cookie-cutter the response, the faster and more effective it is. The incident response policy should quickly classify incidents into categories — denial of service, data exfiltration and so on — and prescribe broad-based approaches to responding to each category.
  • Generating an auditable log that can serve as proof of chain of evidence. A security breach is a disaster, but it is also very likely a crime. That means that data is evidence — and the best way to protect that evidence is to have in place automated logging systems that track and document how evidence has been captured and preserved. Logs can serve as technical documentation for post-mortems and should include a variety of information:
    • identifying information (e.g., the location, serial number, model number, hostname);
    • name, title and contact information for each individual who collected or handled the evidence during the investigation;
    • time and date — including time zone — of each occurrence of evidence handling; and
    • locations where evidence was stored.
  • Conducting effective post-mortems. The incident response policy should call for holding a “lessons learned” meeting with all involved parties after a major incident. This is critical when it comes to improving security measures and the incident response process. The National Transportation Safety Board (NTSB) provides a good model that focuses on fact-finding rather than fault-finding. Senior management should consciously create an NTSB-like culture, even going so far as to name its team the Information Safety Board. The post-mortem should generate two things: an incident report, which serves as institutional knowledge for future reference, and a list of any changes needed in the policy and the security infrastructure. These two documents ensure that future responses are faster and more effective.
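
The evidence-log fields listed under the auditable-log item above (identifying information, handler details, time and date with time zone, storage location) could be captured in a small append-only record. The following Python sketch is illustrative only and is not a prescribed format: the names CustodyEntry, new_entry and append_entry, the "action" field and the JSON Lines file layout are all assumptions introduced here.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEntry:
    """One evidence-handling event (hypothetical record layout)."""
    item_id: str           # identifying information, e.g. serial number or hostname
    handler_name: str
    handler_title: str
    handler_contact: str
    action: str            # assumed field, e.g. "collected", "transferred", "stored"
    storage_location: str
    timestamp: str         # ISO 8601 string that carries an explicit UTC offset


def new_entry(item_id, handler_name, handler_title, handler_contact,
              action, storage_location, when=None):
    """Build an entry, rejecting timestamps that lack a time zone."""
    when = when or datetime.now(timezone.utc)
    if when.tzinfo is None or when.utcoffset() is None:
        raise ValueError("timestamp must include a time zone")
    return CustodyEntry(item_id, handler_name, handler_title, handler_contact,
                        action, storage_location, when.isoformat())


def append_entry(path, entry):
    """Append the entry as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as log:
        log.write(json.dumps(asdict(entry), sort_keys=True) + "\n")
```

Opening the file in append mode keeps earlier entries untouched, but it does not by itself make the log tamper-proof; access controls and write-once storage would still be needed for that.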

Incident Handlers Checklist

1. Preparation

a. Are all members of the Computer Incident Response Team (CIRT) aware of the
security policies of the organization?
b. Do all members of the Computer Incident Response Team know whom to contact?
c. Do all incident responders have access to journals and to the incident response
toolkits needed to carry out the incident response process?
d. Have all members participated in incident response drills on a regular basis to
practice the incident response process and improve overall proficiency?

2. Identification

a. Where did the incident occur?
b. Who reported or discovered the incident?
c. How was it discovered?
d. Are there any other areas that have been compromised by the incident? If so, what
are they and when were they discovered?
e. What is the scope of the impact?
f. What is the business impact?
g. Have the source(s) of the incident been located? If so, where, when, and what are
they?

3. Containment

a. Short-term containment
i. Can the problem be isolated?
1. If so, then proceed to isolate the affected systems.
2. If not, then work with system owners and/or managers to determine
further action necessary to contain the problem.
ii. Are all affected systems isolated from non-affected systems?
1. If so, then continue to the next step.
2. If not, then continue to isolate affected systems until short-term
containment has been accomplished to prevent the incident from
escalating any further.
b. System-backup
i. Have forensic copies of affected systems been created for further analysis?
ii. Has documentation of all commands run and other actions taken since the
incident occurred been kept up to date?
1. If not, document all actions taken as soon as possible to ensure all
evidence is retained for prosecution and/or lessons learned.
iii. Are the forensic copies stored in a secure location?
1. If so, then continue to the next step.
2. If not, then place the forensic images in a secure location to
prevent accidental damage and/or tampering.
c. Long-term containment
i. If the system can be taken offline, then proceed to the Eradication phase.
ii. If the system must remain in production, proceed with long-term containment
by removing all malware and other artifacts from affected systems, and harden
the affected systems against further attacks until circumstances allow them
to be reimaged.
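
The containment questions above form a fixed decision sequence, which can be sketched as a function that returns the first outstanding action. This is a hedged illustration rather than part of the checklist: ContainmentStatus, its field names and next_containment_step are hypothetical, and the step that creates forensic copies when none exist is implied by the checklist rather than stated in it.

```python
from dataclasses import dataclass


@dataclass
class ContainmentStatus:
    """Answers to the containment checklist (hypothetical field names)."""
    can_isolate: bool
    all_affected_isolated: bool
    forensic_copies_made: bool
    actions_documented: bool
    copies_stored_securely: bool
    can_go_offline: bool


def next_containment_step(s: ContainmentStatus) -> str:
    """Return the first outstanding action, following checklist order."""
    # Short-term containment
    if not s.can_isolate:
        return "work with system owners to determine further containment action"
    if not s.all_affected_isolated:
        return "continue isolating affected systems"
    # System backup (creating missing copies is an assumed, implied step)
    if not s.forensic_copies_made:
        return "create forensic copies of affected systems"
    if not s.actions_documented:
        return "document all actions taken so far"
    if not s.copies_stored_securely:
        return "move forensic images to a secure location"
    # Long-term containment
    if s.can_go_offline:
        return "proceed to Eradication"
    return "remove artifacts and harden systems until reimaging is possible"
```

For example, a status in which every answer is yes except all_affected_isolated yields "continue isolating affected systems", matching item a.ii.2 of the checklist.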

4. Eradication

a. Can the system be reimaged and then hardened with patches and/or other
countermeasures to prevent or reduce the risk of attacks?
i. If not, state why.
b. Have all malware and other artifacts left behind by the attackers been removed and
the affected systems hardened against further attacks?
i. If not, explain why.

5. Recovery

a. Have the affected systems been patched and hardened against the recent attack, as
well as possible future ones?
b. What day and time would be feasible to restore the affected systems to
production?
c. What tools are you going to use to test, monitor, and verify that the systems being
restored to production are not compromised by the same methods that caused the
original incident?
d. How long are you planning to monitor the restored systems and what are you going to
look for?
e. Are there any prior benchmarks that can serve as a baseline against which to
compare monitoring results from the restored systems?
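
The baseline comparison in item e can be sketched as a check that flags any monitored metric that drifts too far from its benchmark. This is a minimal illustration under stated assumptions: the function name deviations_from_baseline, the metric names in the usage note and the 20% relative tolerance are all hypothetical, and a real threshold would depend on the system being monitored.

```python
def deviations_from_baseline(baseline, observed, tolerance=0.2):
    """Return metrics that differ from the baseline by more than `tolerance`
    (a relative fraction), or that have no observation at all.

    Both arguments map metric names to numbers. The result maps each flagged
    metric name to a (baseline_value, observed_value) pair.
    """
    flagged = {}
    for name, expected in baseline.items():
        actual = observed.get(name)
        if actual is None:
            # A metric that stopped reporting is itself worth investigating.
            flagged[name] = (expected, None)
            continue
        if expected == 0:
            # No relative difference exists; any nonzero value is a deviation.
            exceeded = actual != 0
        else:
            exceeded = abs(actual - expected) / abs(expected) > tolerance
        if exceeded:
            flagged[name] = (expected, actual)
    return flagged
```

With a baseline of {"logins_per_hour": 100, "outbound_mb": 50} and observations of {"logins_per_hour": 110, "outbound_mb": 400}, only outbound_mb is flagged, because 110 is within 20% of 100 while 400 is far outside 20% of 50.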

6. Lessons Learned

a. Has all necessary documentation from the incident been written?
i. If so, then generate the incident response report for the lessons learned
meeting.
ii. If not, then have documentation written as soon as possible before anything is
forgotten and left out of the report.
b. Assuming the incident response report has been completed, does it document and
answer the following questions for each phase of the incident response process:
Who? What? Where? Why? How?
c. Can a lessons learned meeting be scheduled within two weeks after the incident has
been resolved?
i. If not, explain why and state the next convenient time to hold it.
d. Lessons Learned Meeting
i. Review the incident response process for the incident with all CIRT
members.
ii. Did the meeting discuss any mistakes or areas where the response process
could have been handled better?

IT Incident Response Summary Report

This section describes the incident briefly and identifies when it happened and when it was resolved, along with the impact, such as the number of requests that resulted in errors and the problem that was the root cause of the incident.

Timeline

This section identifies the precise times of all related events and lists the time zone, if relevant. These events include the first report of the incident, all actions taken to resolve the issue, any consequent events and the time that the incident was resolved.
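
When events are reported from different time zones, converting them to a single zone before sorting avoids ordering mistakes in the timeline. The sketch below shows one possible way to do this in Python; the function name build_timeline, the (timestamp, description) input shape and the choice of UTC as the common zone are assumptions for illustration.

```python
from datetime import datetime, timezone


def build_timeline(events):
    """Sort (timestamp, description) pairs chronologically and render in UTC.

    Each timestamp is an ISO 8601 string with an explicit UTC offset,
    for example "2024-03-01T09:15:00-05:00".
    """
    parsed = []
    for stamp, description in events:
        when = datetime.fromisoformat(stamp)
        if when.tzinfo is None:
            raise ValueError(f"missing time zone: {stamp!r}")
        parsed.append((when.astimezone(timezone.utc), description))
    # Sort on the normalized instant, not on the original local clock time.
    parsed.sort(key=lambda pair: pair[0])
    return [f"{when:%Y-%m-%d %H:%M} UTC  {description}"
            for when, description in parsed]
```

An event logged at 09:15 with offset -05:00 (14:15 UTC) is listed before one logged at 15:30 with offset +01:00 (14:30 UTC), matching the order in which they actually happened.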

Root Cause

This section describes the problem that caused the incident in as much detail as possible.

Resolution and Recovery

This section describes in detail all the actions taken, along with the times when they were implemented. The results of each action should also be described, even if the measure was not effective.

Corrective and Preventative Measures

This section discusses what measures should be taken to prevent a similar incident in the future, including any changes to systems or procedures that are recommended. The section also includes any recommended improvements to the incident response system.