Location
Palo Alto, CA 94301
Industries
Internet Services
Job Type
Full Time
Employee
Relevant Work Experience
7+ to 10 Years
Education Level
Bachelor's Degree
Career Level
Manager (Manager/Supervisor of Staff)
Salary
Generous bonus and stock package
Manager, Site Reliability Engineering
About the Job
The Manager of Site Reliability Engineering is responsible for the day-to-day health and uptime of all Facebook services. As the team's leader, you are responsible for maintaining and improving service uptime and stability, headcount growth, and personnel management. This role is also responsible for handling both planned and unplanned maintenance events and for executing capacity and capability growth as Facebook expands. This position is located at our Palo Alto, CA headquarters.
Responsibilities
* Responsible for directing and growing a team of engineers across many time zones who analyze and maintain service stability and document policies and best practices in a 24x7x365 operation
* Responsible for the day-to-day health of all network, server, storage, and ancillary infrastructure
* Focus on the full lifecycle (deployment, maintenance, management, and decommissioning) of applications, components, and processes for Facebook products and services
* Work closely with cross-functional teams to negotiate requirements, specifications, schedules, quality, and acceptance criteria
* Work closely with engineering, project management, and operational peers to develop innovative technical solutions that meet Facebook’s needs with respect to functionality, performance, scalability, and reliability
* Identify tactical issues and emerging areas of concern
* Work with regional leads to establish organizational goals, meet recruiting objectives, and fulfill the mission of unyielding site stewardship
* Participate in recovery from and forensic examination of major site incidents
* Develop reports and feedback to inform technical solutions that meet design needs
Requirements
* 4-6 years of experience managing an Operations organization
* A natural team leader who can motivate and encourage personal advancement
* Excellent project management skills and the ability to work in a fast-paced and hectic work environment
* Ability to prioritize tasks effectively
* Excellent communication skills (written and verbal) and an ability to work seamlessly with organizational partners and peers
* 4-6 years of experience planning and rolling out infrastructure in a global enterprise environment
* Must demonstrate experience with: server OS and application management in large-scale production environments; global infrastructure management in 24x7 co-located environments; network and system troubleshooting and maintenance practices; and management of engineering leads and support staff
* Must be willing to travel to domestic and international datacenter and office locations
* Understanding of best practices, change management, SLAs, policies, procedures, and design-review-driven standards
PLEASE APPLY ONLINE AT:
www.facebook.com/careers/apply.php?id=558&jobBoardId=1