Supporting a client, in our case a Department of Defense (DoD) client, can offer some surprisingly rich takeaways beyond the final products themselves. The following paragraphs summarize several valuable "lessons learned" that my team and I gathered from a recent project that used an R backend hosted on a DoD cloud. They may be helpful if you are thinking about developing a web application yourself or would like to move a legacy application to the cloud.
Cloud Platforms: It is no secret the government is migrating from traditional computer systems managed on-site to cloud-based services in order to gain operational efficiencies. Government agencies use a range of cloud solutions, which differ mainly in how much the cloud service provider manages for you. With Infrastructure as a Service (IaaS), such as Amazon Web Services, the provider manages the underlying hardware, storage, and networking while you manage everything above that. The last two types of cloud solutions require the least amount of internal IT management: Platform as a Service (PaaS), where you manage just the applications and the data, and Software as a Service (SaaS), where the provider manages everything. However, this shift also necessitates migrating existing legacy applications to the cloud.
Cloud and App Certification: Cloud solutions and new applications need to meet clear guidelines to ensure sufficient security safeguards are in place. The process for a cloud to receive an Authority to Operate (ATO) can be a lengthy one. When agencies decide to move their applications to the commercial cloud, the Defense Information Systems Agency (DISA) mandates specific approval and certification to connect through the Cloud Access Point (CAP). At the individual application level, many of the required controls or safeguards can be inherited from the cloud solution, particularly with PaaS. This can speed the certification and deployment of new applications. The hardest part of navigating that process is determining exactly what the process is. 18F, a digital services team formed in 2014 within the General Services Administration to help the government build digital services, has unfortunately found that the certification process differs from application to application and from agency to agency, and even within a single department. Since this can be the "long pole in the tent," you would be wise to learn as much as you can and start developing the necessary certification products early.
Agile Development: In the past, government applications were traditionally built using the so-called "Waterfall" development process: the customer defines the set of requirements for the software solution, then the project proceeds to design, then to development, and so on. The problem with this process is that the customer may not have the best idea of what the requirements should be at the outset. The customer may include requirements nobody wants, and only later realize a whole new set of requirements was missing from the original design. "Agile Development" refers to a development process whose principles include satisfying the customer through early and continuous delivery of working software, delivered at intervals ranging from a couple of weeks to a couple of months, with a preference for the shorter timescale. This requires continuous communication between the developers and the customer. As government agencies migrate legacy applications to the cloud or introduce new digital services, most are adopting the agile approach to increase the effectiveness of their resources and to speed the delivery of working software.
Jira: Jira is an issue tracking and project management tool used for agile development. It comes with a standard set of issue types such as Bug, New Feature, Task, Improvement, Sub-task, and User Story. A User Story is typically something a user wants the software solution to do, and it can be broken down further into a series of tasks. All of the issues, of whatever type, go into the "backlog" of things to be done. In an Agile Scrum workflow, issues (user stories, tasks, etc.) that can be executed in a manageable chunk of time, typically two weeks, are pulled from the backlog and grouped into what is called a sprint, and then assigned to individual developers. In Agile Scrum, the project is managed with daily stand-ups (quick meetings) to review the status of each developer's issues and identify any "blockers" to workflow progress.
Sprint Planning: Sprint planning takes place prior to the start of a two-week sprint. What can be delivered in the increment resulting from the upcoming sprint? How will the work needed to deliver that increment be achieved? These two questions are asked and answered to develop the planned workflow for each sprint. You may look at the set of tasks or stories needed to complete a new feature in the software; each task or story is estimated for effort and then assigned to a developer. A construct known as "story points" is typically used to assign relative workload to a task. You generally do not assign more than one eight-point task per sprint, and estimating that a developer can execute about three story points per day over the working days left after sprint ceremonies, this equates to roughly 25 to 27 story points per developer per sprint. We used a fun tool called PlanITpoker, a free online application, to develop task estimates with the team. Team members vote on the story points to assign to a task and then try to reach consensus on its workload. If something is pointed too high, it probably needs to be broken down into multiple tasks.
Git and GitHub: Git is version-control software created by Linus Torvalds, the creator of Linux. Git enables you to easily keep track of every revision you and your team make during software development. You all share one repository of code that is worked on independently and then merged back together. You do not need to be connected all the time, because the project is saved both locally on each machine and remotely (likely at GitHub or Bitbucket). The repository, often referred to as a repo, is the collection of files and folders you are using Git to track, along with the entire history of your team's changes to the project. At CANA, we use GitHub for our repository storage. As developers complete software revisions, known as "commits," they push them to the GitHub repo so other developers can easily review the changes, provide feedback, and approve them to be merged into the main branch of code.
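If you prefer to stay inside R, the gert package wraps the everyday Git workflow; the command-line git add, git commit, and git push do exactly the same thing. A minimal sketch (the file name and commit message are made up for illustration, and it assumes your GitHub credentials are already configured):

    library(gert)  # R bindings to Git

    git_add("R/api.R")                             # stage the file you edited (hypothetical name)
    git_commit("Add input validation to the API")  # record the revision, i.e., a "commit"
    git_push()                                     # publish it to the shared GitHub repo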
R Packages for Web Applications: R is an open source programming language widely used among statisticians and data miners for developing statistical software and data analysis. Shiny is an open source R package that provides a web framework for building web applications in R. Many DoD analysts have used Shiny to build dashboard applications for their government clients. It allows R users to develop powerful web applications entirely in R, without having to understand HTML, CSS, and JavaScript, and it lets us embed the statistical power of R directly into those web applications. Plumber is an R package that converts your existing R code into a web API using a handful of special one-line comments, so you can build your application and still call R code for what R does best. To illustrate, you may have heard that skiing is fairly easy to pick up but hard to truly master, whereas snowboarding is hard to pick up, but once you get past the initial phase you can master it. Shiny skills are like skiing: easy to pick up, hard to master. Plumber is like snowboarding: once you have figured it out, you will have mastered a very powerful tool for exposing your R code as a web service.
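To make this concrete, here is a minimal Shiny sketch: a slider drives a histogram, and everything is plain R (the inputs and plot are placeholders, not part of our client application):

    library(shiny)

    ui <- fluidPage(
      sliderInput("n", "Number of observations", min = 10, max = 500, value = 100),
      plotOutput("hist")
    )

    server <- function(input, output) {
      output$hist <- renderPlot(hist(rnorm(input$n)))  # re-draws whenever the slider moves
    }

    shinyApp(ui, server)

And here is a minimal Plumber sketch; the special #* comments are what turn ordinary R functions into web endpoints (the file name plumber.R, the endpoint paths, and the port are our own choices for illustration):

    # plumber.R
    #* Echo back a message
    #* @param msg The message to echo
    #* @get /echo
    function(msg = "") {
      list(msg = paste0("The message is: '", msg, "'"))
    }

    #* Add two numbers
    #* @param a First number
    #* @param b Second number
    #* @get /sum
    function(a, b) {
      as.numeric(a) + as.numeric(b)
    }

    # In a separate script or console, turn the file into a running API:
    plumber::plumb("plumber.R")$run(port = 8000)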
What is a Web API? The Hypertext Transfer Protocol (HTTP) is the dominant way information is exchanged on the Internet. An Application Programming Interface (API) is a broad term for the set of rules that guide your interaction with a piece of software. In the case of HTTP APIs, you have a defined set of endpoints that accept particular inputs. Plumber translates the annotations you place on your functions into an HTTP API that can be called from other machines on your network. If you host your Plumber API on a public server, you can even make it available to the public Internet.
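For example, once the Plumber API sketched above is running locally, any HTTP client, including another R session, can call its endpoints; a small sketch using the httr package (the address and port match the local example above):

    library(httr)

    # Call the /sum endpoint of the locally running Plumber API
    resp <- GET("http://127.0.0.1:8000/sum", query = list(a = 2, b = 3))
    status_code(resp)  # 200 if the request succeeded
    content(resp)      # the parsed JSON response, here the value 5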
Swagger: Upon starting the API, the Plumber package provides an IP address and port, and a client, e.g., another R instance, can begin sending REST requests (requests for information). Plumber also opens a browser tool called Swagger, which is useful for checking that your API is working as intended. Swagger lets you describe the structure of your APIs so that machines can read them; this ability of an API to describe its own structure is one of Swagger's best features. It also helps you develop the technical documentation for your application as you build it.
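With recent versions of plumber (1.x), you can also control whether that interactive documentation page is served when the API is launched; a small sketch, assuming the plumber.R file from earlier:

    library(plumber)

    # Serve the API and the interactive Swagger page alongside it
    pr_run(pr("plumber.R"), port = 8000, docs = TRUE)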
Docker: In simple terms, Docker is a software platform that simplifies the process of building, running, managing, and distributing applications. It does this by virtualizing the operating system of the computer on which it is installed and running. Suppose you have three different R-based applications that you plan to host on a single server, which could be either a physical or a virtual machine. Each application uses a different version of R, along with its own libraries and dependencies; this can easily happen when applications are developed at different times by different developers. Maintaining conflicting versions of R and its libraries side by side on the same machine is impractical, which makes hosting all three applications on one computer difficult. A Docker container does not have its own operating system installed and running inside it; instead it gets a virtual copy of the process table, network interface(s), and file system mount point(s) inherited from the host operating system, while the host's kernel is shared across all the containers running on it. This keeps each container isolated from the others on the same host. Thus, Docker allows multiple containers with different application requirements and dependencies to run on the same host, as long as they share the same operating system requirements.
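As an illustration, a Plumber API like the one sketched earlier can be containerized with a short Dockerfile. This is only a sketch: the base image, R version, file name, and port are all assumptions, not what we shipped:

    # Dockerfile
    # Start from a public image that pins a specific R version
    FROM rocker/r-ver:4.2.0
    RUN R -e "install.packages('plumber')"
    # Copy in the API definition from earlier
    COPY plumber.R /app/plumber.R
    EXPOSE 8000
    CMD ["R", "-e", "plumber::pr_run(plumber::pr('/app/plumber.R'), host = '0.0.0.0', port = 8000)"]

Because each image carries its own R installation and package library, the three hypothetical applications above can each be built from their own Dockerfile and run side by side on the same host.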
Packrat and renv: Packrat and renv are dependency management systems for R. R package dependencies can be frustrating. Have you ever used trial and error to figure out which R packages you need to install to make someone else's code work, and been left with those packages globally installed forever because you are not sure whether you still need them? renv allows you to lock in the packages a particular project needs, and you can also revert to an older version of a package if you run into a problem. These project libraries can be shared across your development team, regardless of whether some members are running Linux and some Windows.
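In practice, the day-to-day renv workflow comes down to a few calls; a minimal sketch, run from inside the project directory:

    renv::init()      # create a private, per-project package library
    # ...install and use packages as usual...
    renv::snapshot()  # record the exact package versions in renv.lock
    renv::restore()   # on another machine, reinstall exactly those versions

The renv.lock file lives in the Git repo alongside the code, so every developer (and any Docker image you build) can restore the same library.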
GUI Mock-ups & Pair Programming: We have found a graphical user interface, or GUI, mockups to be an important part of the development process working in concert with our client. Another important tip is the use of pair programming. In our DoD project, we were fortunate to have three of us working together on the backend, and it worked extremely well. To summarize the value of pair programming in your project:
Two heads are better than one. If the driver encounters a hitch with the code, there are two people to solve the problem.
More efficient. Common thinking is that pair programming slows down project completion because you are effectively putting two programmers on a single program instead of having them work independently on two different programs. But studies have shown that two programmers working on the same program are only slightly slower than when they work independently, rather than the presupposed 50% slowdown, and they have more efficient code to show for it.
Fewer coding mistakes. Because there is another programmer looking over your work, there are fewer bugs. Plus, pair programming allows the driver to remain focused on the code being written while the navigator attends to external matters or interruptions.
Sharing knowledge. Partners share knowledge about the application's purpose as well as development operations more generally. Programmers get instant face-to-face instruction, which is much better and faster than searching for online resources and tutorials. You may also learn things more easily from your partner, especially in areas that are unfamiliar to you, and developers can pick up best practices and better techniques from more advanced programmers. Pair programming can also foster mentoring relationships between programmers.
Develops your staff’s interpersonal skills and team building. Collaborating on a single project helps your team to appreciate the value of communication and teamwork.
Renee G. Carlucci is a CANA Advisors Principal Operations Research Analyst.