Sunday 23 November 2014

Monitoring SharePoint 2010 – Tutorial


To ensure the availability and reliability of your SharePoint Server 2010 environment, you must actively monitor the physical platform, the operating system, and all important SharePoint Server 2010 services. Preventative maintenance will help you identify potential errors before an error causes problems with the operation of your SharePoint environment. Preventative maintenance combined with disaster recovery planning and regular backups will help minimize problems if they occur. Monitoring your SharePoint environment involves checking for problems with connections, services, server resources, and system resources. You can also set alerts to notify administrators when problems occur. Windows Server and SharePoint Server 2010 provide many monitoring tools and services to ensure that your SharePoint environment is running smoothly.
The following maintenance tasks let you establish criteria for the normal behavior of your environment and detect abnormal activity. It is important to implement these daily maintenance tasks so that you can capture and maintain data about your SharePoint environment, such as usage levels, possible performance bottlenecks, and administrative changes. The following sections describe specific monitoring tasks, which map to the checklists described below.
1. Diagnostic Logging
The Unified Logging Service (ULS) provides a single, centralized location for logging error and informational messages related to SharePoint Server and SharePoint solutions. Systems administrators have one place to look when they need to troubleshoot an issue or monitor the overall health of the environment.
SharePoint Server 2010 includes improvements to the management of the Unified Logging Service (ULS) logs, also called Trace Logs, that make it easier for administrators to troubleshoot issues. These are described in the following sections.
2. Event Throttling
Event throttling enables administrators to control the types of events that SharePoint Server logs, based on their level of severity. The administration of throttling is divided into two sections:
1. Destination
Log entries can be reported in two places. The first is the “Event Log”, which is the standard Windows Event Log. Administrators can use the Windows Event Viewer application to review its entries. The second is the ULS or “Trace Log”, a text-based log format that is specific to SharePoint Server and is stored on the file system. The default location is C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\14\LOGS.
2. Category
The event throttling dial can be applied to specific categories which map directly to SharePoint Server functionality. This enables the administrator to increase the logging detail for SharePoint components individually, thereby managing the size of the logs and the amount of information to review.
The default settings for all categories are as follows:
• Event Log: Information
• Trace Log: Medium Level
During normal operation, these settings are an appropriate balance of detail and performance. During substantial reconfiguration of SharePoint Server, during the installation of custom solutions, or when SharePoint Server is experiencing issues, throttling should be turned down so that logging becomes more verbose (for example, by changing the Trace Log level from Medium to Verbose). This ensures that as much information as possible is available for troubleshooting.
Finally, after completing any troubleshooting, logging can be returned to the default by selecting the “Reset to default” option in the throttling drop-downs. Settings that are not currently configured with the default option will appear in a bold font.
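The per-category throttling described above can be pictured as a simple threshold check. The following Python sketch is only an illustration of the idea, not SharePoint code: the level names and their ordering are an assumption modelled on the Trace Log severities, and the category names ("Search", "Timer") are hypothetical.

```python
from enum import IntEnum


class TraceLevel(IntEnum):
    # Assumed ordering for illustration: lower value = more severe.
    UNEXPECTED = 1
    MONITORABLE = 2
    HIGH = 3
    MEDIUM = 4
    VERBOSE = 5


# Default from the article: Trace Log is set to Medium for all categories.
DEFAULT_TRACE_LEVEL = TraceLevel.MEDIUM


def should_log(category, level, overrides):
    """Return True if a message of this level passes the category's threshold.

    `overrides` maps a category name to a non-default threshold,
    mimicking an administrator changing the level for one component.
    """
    threshold = overrides.get(category, DEFAULT_TRACE_LEVEL)
    return level <= threshold


# Hypothetical scenario: raise detail for one category while troubleshooting.
overrides = {"Search": TraceLevel.VERBOSE}
search_verbose_logged = should_log("Search", TraceLevel.VERBOSE, overrides)
timer_verbose_logged = should_log("Timer", TraceLevel.VERBOSE, overrides)
```

Removing the entry from `overrides` corresponds to "Reset to default": the category falls back to the Medium threshold.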
Correlation IDs
Correlation IDs are GUIDs that are assigned to events which occur during the lifecycle of a resource request. This value is surfaced within error messages, the ULS logs, and tools like the Developer Dashboard. This value helps an administrator locate and isolate a specific request across the ULS log, Usage Logging database, and SQL Server Profiler data sets for debugging purposes.
For example, administrators can take the Correlation ID that appears on an error page in their browsers and then rapidly locate any related entries in the ULS logs through a simple search.
Correlation IDs also span machine boundaries. If a request, such as a front-end Web server calling a Web service on an application server, crosses a machine boundary the assigned Correlation ID can provide a complete overview of activities during the life-cycle of the request.
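As a rough illustration of the "simple search" mentioned above, the following Python sketch filters trace log lines by Correlation ID. It assumes that each line is tab-delimited and that the Correlation ID is the last column; the sample lines, server names, and GUIDs are made up for the example and are not taken from a real farm.

```python
def find_by_correlation_id(lines, correlation_id):
    """Return the log lines whose last tab-separated field matches the ID.

    Assumption: tab-delimited lines with the Correlation ID in the final
    column. Adjust the column index if your log layout differs.
    """
    wanted = correlation_id.strip().lower()
    matches = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[-1].strip().lower() == wanted:
            matches.append(line.rstrip("\n"))
    return matches


# Hypothetical entries merged from a Web front end and an application server.
REQUEST_ID = "5ca5269c-8de5-4091-8c0e-000000000001"
OTHER_ID = "5ca5269c-8de5-4091-8c0e-000000000002"
SAMPLE_LINES = [
    "\t".join(["11/23/2014 10:00:01", "WFE01", "Request started", REQUEST_ID]),
    "\t".join(["11/23/2014 10:00:01", "WFE01", "Unrelated request", OTHER_ID]),
    "\t".join(["11/23/2014 10:00:02", "APP01", "Web service call failed", REQUEST_ID]),
]

related = find_by_correlation_id(SAMPLE_LINES, REQUEST_ID)
```

Because the same ID appears in the entries from both servers, concatenating the logs from each machine before filtering gives the end-to-end view of the request.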
Event Log Flood Protection
Event Log Flood Protection prevents the “Event Log” from being overwhelmed with many repetitive events. When Event Log Flood Protection is enabled (default), it will start trimming events after the same event is logged five times within two minutes. At this point it suppresses additional entries. After an additional two minutes, it throws a summary event that describes the number of times that the event would have been repeated. An administrator can modify these thresholds.
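The suppression behavior described above can be modelled in a few lines. This Python sketch is a simplified model under stated assumptions, not SharePoint's actual implementation: the class and attribute names are hypothetical, times are plain seconds, and the summary is produced only when the next occurrence of the event arrives (a real service would use a timer).

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class _EventState:
    recent: deque = field(default_factory=deque)  # times of written entries
    suppress_start: Optional[float] = None        # when suppression began
    suppressed: int = 0                           # entries dropped so far


class FloodProtector:
    """Simplified model: 5 identical events in 2 minutes trigger suppression."""

    def __init__(self, threshold=5, window=120.0, quiet_period=120.0):
        self.threshold = threshold
        self.window = window
        self.quiet_period = quiet_period
        self._state = {}

    def log(self, event_id, now):
        """Return the entries that would be written for this occurrence."""
        st = self._state.setdefault(event_id, _EventState())
        out = []
        # Suppression has lasted long enough: write a summary and reset.
        if st.suppress_start is not None and now - st.suppress_start >= self.quiet_period:
            out.append(f"summary: event {event_id} suppressed {st.suppressed} times")
            st.suppress_start, st.suppressed = None, 0
            st.recent.clear()
        # Still suppressing: count the occurrence but write nothing.
        if st.suppress_start is not None:
            st.suppressed += 1
            return out
        # Forget written entries that fall outside the sliding window.
        while st.recent and now - st.recent[0] > self.window:
            st.recent.popleft()
        st.recent.append(now)
        out.append(f"event {event_id}")
        if len(st.recent) >= self.threshold:
            st.suppress_start = now
        return out


protector = FloodProtector()
# Seven occurrences of a hypothetical event 1234, one second apart.
burst = [protector.log(1234, float(t)) for t in range(7)]
# The next occurrence arrives after the quiet period has passed.
later = protector.log(1234, 130.0)
```

The constructor arguments stand in for the thresholds that an administrator can modify.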

3. ULS or Trace Logging

Trace Logs can quickly consume disk space, especially when configured to use the more verbose output settings. To manage this growth, administrators can implement two types of restrictions:
a) Administrators can determine the number of days that log files should be kept. By default this is set to 14 days.
b) Administrators can also place a limitation on the overall disk space that log files can consume. This is disabled by default but provides for an additional layer of protection aimed at preventing excessive disk space consumption.
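To make the two restrictions concrete, here is a minimal Python sketch of how they could combine. It is an illustration only: the `LogFile` record and the function name are hypothetical, and removing the oldest files first when the space cap is exceeded is an assumption, not documented SharePoint behavior.

```python
from dataclasses import dataclass


@dataclass
class LogFile:
    name: str
    age_days: int
    size_mb: int


def select_for_deletion(files, max_age_days=14, max_total_mb=None):
    """Return names of files to delete under the two restrictions.

    a) Files older than `max_age_days` are always removed (default 14).
    b) If `max_total_mb` is set, the oldest remaining files are removed
       until the total size fits (assumed policy: oldest first).
    """
    to_delete = [f for f in files if f.age_days > max_age_days]
    kept = [f for f in files if f.age_days <= max_age_days]
    if max_total_mb is not None:
        kept.sort(key=lambda f: f.age_days, reverse=True)  # oldest first
        total = sum(f.size_mb for f in kept)
        while kept and total > max_total_mb:
            oldest = kept.pop(0)
            total -= oldest.size_mb
            to_delete.append(oldest)
    return [f.name for f in to_delete]


# Hypothetical trace log files.
files = [
    LogFile("a.log", age_days=20, size_mb=100),
    LogFile("b.log", age_days=10, size_mb=500),
    LogFile("c.log", age_days=5, size_mb=400),
    LogFile("d.log", age_days=1, size_mb=300),
]
age_only = select_for_deletion(files)
age_and_space = select_for_deletion(files, max_total_mb=800)
```

With the space cap left at `None` (the disabled default), only the age rule applies.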
2. Usage Data and Health Data Collection
In addition to Diagnostic Logging, SharePoint Server 2010 also proactively logs information that is related to the overall health of the farm. As an administrator you can individually select which events are monitored, for example the usage of features, page load times, and search queries.
This functionality both consumes disk space and has a performance overhead. Like Diagnostic Logging, care needs to be taken to manage it appropriately. The following options are available to administrators:
1) Health Data Collection
Health reports are built by taking snapshots of various resources, data, and processes at specific points in time. The number of Timer Jobs to schedule will depend on the number of events that you selected to monitor. The frequency of these jobs can be modified to manage the performance impact.
2) Log Collection Schedule
The Log Collection Schedule Timer Job is responsible for collecting Usage Logs from the various servers in the farm, processing them, and then populating a centralized database from where they can be queried for reporting. Once processed, the logs are deleted from disk, freeing up the space they were consuming. The frequency of this job can be modified to manage the consumption of disk space.
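The collect, process, store, and delete cycle of the log collection job can be sketched as follows. This is a toy model, not how SharePoint implements it: the `.usage` file extension, the comma-separated line layout (server, page, load time in milliseconds), and the table schema are all assumptions made for the example, and SQLite stands in for the logging database.

```python
import os
import sqlite3
import tempfile


def collect_usage_logs(log_dir, conn):
    """Import every *.usage file in log_dir into the database, then delete it.

    Returns the number of rows imported.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS usage (server TEXT, page TEXT, load_ms INTEGER)"
    )
    imported = 0
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".usage"):
            continue
        path = os.path.join(log_dir, name)
        with open(path, encoding="utf-8") as fh:
            rows = [line.strip().split(",") for line in fh if line.strip()]
        conn.executemany(
            "INSERT INTO usage VALUES (?, ?, ?)",
            [(server, page, int(ms)) for server, page, ms in rows],
        )
        conn.commit()
        os.remove(path)  # free the disk space once the data is stored
        imported += len(rows)
    return imported


# Hypothetical run: two servers each left one usage file behind.
with tempfile.TemporaryDirectory() as log_dir:
    with open(os.path.join(log_dir, "wfe01.usage"), "w", encoding="utf-8") as fh:
        fh.write("WFE01,/sites/hr/default.aspx,120\nWFE01,/sites/it/default.aspx,80\n")
    with open(os.path.join(log_dir, "wfe02.usage"), "w", encoding="utf-8") as fh:
        fh.write("WFE02,/sites/hr/default.aspx,95\n")
    conn = sqlite3.connect(":memory:")
    imported = collect_usage_logs(log_dir, conn)
    remaining_files = os.listdir(log_dir)
    row_count = conn.execute("SELECT COUNT(*) FROM usage").fetchone()[0]
```

Running the job more often keeps fewer unprocessed files on disk at any one time, which is the trade-off that the schedule setting controls.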
Note:
Everything that is being logged to the Windows Event Viewer and to the SharePoint log files is also being stored in the SharePoint Server 2010 logging database. The logging database is also used by the SharePoint Health Analyzer and by SharePoint usage reporting.
Use the following checklist to implement these features in your daily operations:
4. SharePoint Health Analyzer
SharePoint has a number of features that log and gather detailed statistics about all aspects of the health of the environment. The SharePoint Health Analyzer aggregates all of this data, proactively identifies possible problems, and recommends solutions.
Many solutions that it finds will include a “Repair Now” link, which when selected will automatically resolve the problem. Other solutions will link to online help content which is constantly updated with the latest information about the problem.
Like the “Best Practices Analyzers” available for other platforms (such as Microsoft Exchange Server), Health Analyzer includes a set of rules that developers can extend. These rules are continuously compared against the existing settings and metrics drawn from your production environment. Rules are applied across a number of categories, including security, performance, configuration, and availability.
5. Timer Jobs
The monitoring features in SharePoint Server 2010 use specific timer jobs to perform monitoring tasks and collect monitoring data. The health and usage data might consist of performance counter data, event log data, timer service data, metrics for site collections and sites, search usage data, or various performance aspects of the Web servers. The system uses this data to create health reports, Web Analytics reports, and administrative reports. The system writes usage and health data to the logging folder and subsequently to the logging database.
You might want to change the schedules that the timer jobs run on to collect data more frequently or less frequently. You might even want to disable jobs that collect data if you are not interested in them. You can perform the following tasks on timer jobs:
• Modify the schedule that the timer job runs on.
• Run timer jobs immediately.
• Enable or disable timer jobs.
• View timer job status. You can view currently scheduled jobs, failed jobs, currently running jobs, and a complete timer job history.
6. Web Analytics
The reports that the Web Analytics functionality in SharePoint Server generates provide detailed insight into how your SharePoint environment is being used, and how well it’s performing. Administrators should become familiar with these reports and how they can create their own (directly in the browser) to plan future capacity and to produce benchmarks to compare with future farm configurations.
All of these reports can be used to help you decide if the current architecture remains “fit for purpose,” meaning that it meets the desired service levels.
The reports are broken down into three categories (Traffic, Search, and Inventory) and can be reviewed at the Web Application, Site Collection, Site, and Search Service levels.

Reference:

http://www.learningsharepoint.com/2010/10/16/monitoring-sharepoint-2010-%E2%80%93-tutorial/

