In an era in which data increasingly drives business decisions, the role of the database administrator (DBA) is becoming ever more important. But businesses’ unwavering appetite for information has given rise to a proliferation of databases, making life harder for DBAs at the time when executives need them most.
Fortunately, there are a few simple steps that you can take to maximise the value of your databases and your SQL server deployments in particular. While it would be impractical to formulate a strategy for every possible situation, you can put in place a handful of strategies to overcome panic and methodically focus on issues within your control.
1. Assess your SQL server environment
Whenever you take on any new database management assignment, begin by assessing the overall condition of the SQL Server environment. Start organising your efforts as follows:
Take stock of your inventory
How many instances and databases in the enterprise are you managing? An up-to-date inventory of the hardware and systems in your purview is critical. This helps identify the areas you need to address.
Knowledge of your end users, database history and server names becomes the framework that locates and evaluates trouble spots in your infrastructure. Equipped with a wider view of the trouble spots, you can start to identify and prioritise problem areas.
Prioritise issues by severity
With your framework in hand, you can determine the severity of issues and assign priorities. Analyse your environment with the big three resources in mind: disk I/O, memory and CPU. The capacity and performance of these resources will delineate what you can and cannot do in terms of optimising SQL Server performance.
Identify and prioritise the most critical handful of issues in terms of environment performance, security or (in extreme cases) viability. It will help you if you think of database performance through the lens of the business rather than as a purely technological issue. Once your systems are ranked in order of importance, rank them again in order of the severity of issues.
Identify required resources
Once you’ve captured tactical data and prioritised which issues represent the biggest and most immediate threats to business operations, you need to identify which resources are required to stabilise the SQL Server environment in the short term. Possible resources include technical expertise, software tools, server capacity and backup storage. Ask questions such as:
If backups or database performance are an issue, what steps can be taken and roughly when can a long-term fix be identified?
Your triage report should show that you have methodically assessed the situation and its importance.
Assess security measures
While performance and availability will naturally be front-of-mind during the first stages of triage, you’ll also need to assess security liabilities, even though some may represent middle- or long-term threats. In order to do this, you should:
- Review user access, permissions and passwords
- Find and document compliance liabilities
- Institute a code free during triage step
By taking inventory, prioritising issues, identifying resources and assessing security, you’ll have the information you need to move to the next step which is to develop best practices for effective backup and recovery.
2. Establishing effective backup and recovery
With your infrastructure continually growing in volume and complexity, you find yourself in a constant balancing act. Disaster recovery remains at the forefront of your responsibilities. With maintenance windows shrinking and data volumes exploding, the need to follow strict recovery time objectives (RTOs) and recovery point objectives (RPOs) is more important than ever.
Many DBAs make the mistake of thinking that because they have high availability (HA), a backup and recovery strategy is unnecessary. However, the possibility of data loss is a risk even in HA scenarios. Moreover, end-user downtime will surely attract unwanted attention. Therefore, it’s critical to have a disaster recovery (DR) strategy even if you have high availability.
What’s the difference between HA and DR?
High availability is the measurement of a system’s ability to remain accessible in the event of a system component failure. Generally, HA is implemented by building in multiple levels of fault tolerance into a system.
Disaster recovery is the process by which a system is restored to a previous acceptable state, after a natural or man-made disaster. With DR, there can be a significant loss of service while the DR plan is executed and the environment is restored.
In short, HA is about maintaining service, and DR is about retaining data. It is imperative that both HA and DR strategies be driven by business requirements.
Therefore, they should address non-functional requirements such as:
- System availability
- Fault tolerance
- Data retention
- Business continuity
- User experience
High availability tools and functionality built into SQL Server are designed to ensure uptime. Here are a few options to consider:
- AlwaysOn Failover Cluster Instances
- AlwaysOn Availability Groups
- Database mirroring
- Log shipping
Establish both a backup strategy and a recovery strategy
There is no way to simply implement a recovery strategy without first assessing backup processes. Backups are instrumental to protecting critical data and giving you the ability to execute point-in-time recovery for the data in your environment.
If you do identify servers that are not currently backed up, you may need some backup of your own:
- Engage your managers and explicitly declare a backup gap as a major problem.
- Investigate whether there may be a valid reason for databases not being backed up and suggest a minimum level of protection (according to your tolerance).
- Follow up with the owners of the databases to determine their RTOs and RPOs.
You should also be using transaction log backups to ensure you can restore your data to a specific point in time and create a reliable restore process to guarantee seamless and automated backups.
3. Continue optimising SQL server management
Once you have assessed your SQL Server environment and established effective backup and recovery, you need to consider how to maintain SQL Server management optimisation going forward.
Communicate proactively across your organisation
Proactive communication is a critical skill for any successful DBA. As you work to establish and maintain an optimal SQL Server environment, you’ll need to have discussions with appropriate parties within your department, including both your manager and peers, as well as with key extra-departmental players whose needs have a large bearing on your work. Best practices for proactive communication include:
- Establish good relationships with developers and management — Capture and discuss key database performance issues and events for your team.
- Gain credibility — Demonstrate awareness of problems and share what you intend to do about them.
- Establish transparency and accountability — Become a conduit of relevant information for management and colleagues, keeping them abreast of the health, status and ongoing priorities in managing the environment.
- Report on both problems and solutions — Keep a standardised report in which you update stakeholders on the status and next steps of key issues to show that you’re bringing constructive solutions along with the bad news.
Shift from reactive mode into proactive mode
In chapters one and two, you ensured that your environment’s most immediate problems have been addressed and that your data is recoverable within acceptable parameters. It’s now time to look ahead and introduce stability and visibility into your environment. This is your opportunity to switch from reactive to proactive mode by tuning your environment to prevent problems in advance.
Follow best practices for deploying code
You can further protect your gains by instituting best practices that introduce new code into the environment. Many DBAs face constant requests to push out code from
various points in the organisation, which results in adhoc deployments that not only cause a great deal of stress, but also introduce security and performance risks.
Part of the solution lies in developing a standard process for reviewing and scheduling code deployment requests. Working with developers and other colleagues to craft this process might take some doing — be sure to take advantage of the best practices for proactive communication outlined earlier.
Take advantage of automation
Stabilising your environment also means stabilising your routine; as you implement proactive strategies, automation becomes a necessity. It also gives you the time you need to engage with your organisation, tackle problems like ensuring your system can scale with the business and become a rock-star DBA.
Any repeatable task is a candidate for automation. As you identify ways to move from being a firefighter to the guardian of an environment that breezes along, look for tasks that can be run automatically and for tools that can help deliver that automation.
Solving the challenges of your evolving database environment can be stressful, but also intensely satisfying. We know it’s not easy with growing data volumes, increasingly complex database environments and diverse customer demands. But, by following the best practices, you can proactively manage your SQL Server infrastructure, protect your data and continue to thrive in the new database world. You might even be able to enjoy more evenings and weekends doing the things you love – besides database administration.