Artifactory is an excellent solution to the problem of package management and binary artefact distribution when teams are spread around the world. The caching and data replication features of the Artifactory Enterprise edition hugely simplify global software development.
Package management and artefact distribution with globally-spread teams
We were on site this week with a client in the UK that has software development teams in the UK and in the Asia Pacific region. Their live environments (both traditional “on premise” and AWS) are also geographically spread across two or three time zones. We were discussing the challenges of keeping test data, deployable artefacts, and other assets in sync across these different locations, along with the problem of huge network delays when transferring files between sites that are 8 to 10 time zones apart. The conversation turned to the benefits of using a first-class artefact management solution like Artifactory.
First-class artefact management with caching, replication, and tagging in Artifactory
Artifactory helps to solve some of the geographic challenges associated with package and binary artefact management in several ways, which we’ll explore below. These all help with globally-spread software development.
Caching of public packages
Artifactory does intelligent (actually, lazy) caching of public package repositories (npm, NuGet, etc.): it proxies package requests from local package clients, calls the remote repository, and then caches the public package locally. This has several benefits:
- We can block access to the public package endpoints at the edge firewall to prevent unwanted packages from being downloaded and installed. With all requests going via our own package repo, we have the chance to block access to certain packages and/or perform a security scan on first caching the package.
- With access to the public repos blocked, we have also ensured that all software/firmware builds are dependent only on our own package repos, not on a public or 3rd party repo, making for more reproducible builds.
- We can now audit all package dependency requests.
- We might save some significant outbound bandwidth by caching the public packages locally within our network; this is especially true if we are caching container images.
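Artifactory makes this caching of public packages seamless with its Remote Repository and Virtual Repository concepts: a remote repository proxies and caches the upstream source, and a virtual repository presents local and remote repositories to package clients under a single URL.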
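To make the lazy caching pattern concrete, here is a minimal sketch in Python. It is not how Artifactory is implemented; the class, the blocklist, and the fake upstream function are all hypothetical names invented for illustration. It shows the three ideas from the list above: fetch from upstream only on the first request, refuse blocked packages, and record every request for auditing.

```python
from typing import Callable, Dict, List, Set


class BlockedPackageError(Exception):
    """Raised when a requested package is on the local blocklist."""


class PullThroughCache:
    """Toy lazy cache: fetch from upstream on first request, then serve locally."""

    def __init__(self, fetch_upstream: Callable[[str], bytes], blocked: Set[str]):
        self._fetch_upstream = fetch_upstream  # hypothetical upstream client
        self._blocked = blocked                # packages we refuse to serve
        self._store: Dict[str, bytes] = {}     # local copies, keyed by package id
        self.audit_log: List[str] = []         # every request is recorded here

    def get(self, package: str) -> bytes:
        self.audit_log.append(package)
        if package in self._blocked:
            raise BlockedPackageError(package)
        if package not in self._store:
            # Cache miss: one trip to the public repo, then keep a local copy.
            self._store[package] = self._fetch_upstream(package)
        return self._store[package]


upstream_calls: List[str] = []


def fake_public_repo(package: str) -> bytes:
    """Stand-in for a public package repository (assumption for the demo)."""
    upstream_calls.append(package)
    return f"contents of {package}".encode()


cache = PullThroughCache(fake_public_repo, blocked={"bad-lib@0.0.1"})
cache.get("example-lib@1.0.0")  # fetched from upstream
cache.get("example-lib@1.0.0")  # served from the local cache
print(upstream_calls)           # the upstream was contacted only once
```

The second request never leaves the local network, which is where the bandwidth saving comes from.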
Caching of internal packages and container images
By setting up remote caches of certain internal package or container image collections, we can avoid the need to push many GB or TB of data around the network, and instead simply allow the remote locations (say, the Production environment) to pull packages “on-demand” from a central Artifactory instance. Effectively, this is the “lazy caching” pattern but with our own internal package repo as the upstream repository.
Replication of packages and binaries
With globally-distributed teams, it’s important to have packages and binary artefacts (test data, container images, etc.) located close to each team so that downloads are fast. You cannot have effective Continuous Integration and deployment pipelines if it takes 30 minutes to download a package dependency. Artifactory Enterprise helps to address this challenge with its smart binary replication options.
Essentially, you set up an Artifactory instance in a location close to one set of engineering teams, and another instance in a location close to another set of teams in a different geographic location. You can then configure the replication settings in Artifactory Enterprise to automatically replicate the packages and binaries seamlessly between the locations, making it feel like there is a single, rapid central location for packages and binaries.
Replication is also the mechanism used to achieve active-active High Availability in a given location.
Using metadata on packages to restrict or promote package consumption
One of the key tenets of Continuous Delivery is Build Your Binaries Once; this means that once we have a binary in our artefact repo, it has the potential to be used in Production/Live if sufficient tests pass. That is, we do not build separate “Dev” and “Release” versions of binaries, but instead just a single binary that is tested before releasing. How can Artifactory help with this?
We can use Artifactory ‘properties’ (metadata) to tag packages and binaries that have passed certain kinds of testing (say UnitTest, UAT, etc.) and then have dependent builds or dependent deployments look only for packages with a certain tag. This really helps teams to choose their degree of exposure to the “latest bleeding edge” version versus a more well-tested version.
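As a rough sketch of how this tagging might be driven from a pipeline, the Python below builds the URLs for two calls in the Artifactory REST API as I understand it: setting a property on an item (an HTTP PUT to `api/storage`) and searching by property (an HTTP GET to `api/search/prop`). The host name, repository name, package path, and the `tested` property are all hypothetical, and the exact endpoint syntax should be checked against the documentation for the Artifactory version in use.

```python
from urllib.parse import quote, urlencode

# Hypothetical Artifactory base URL; replace with your own instance.
BASE_URL = "https://artifactory.example.com/artifactory"


def set_property_url(repo: str, path: str, key: str, value: str) -> str:
    """URL for an HTTP PUT that attaches one property to an artefact."""
    return (
        f"{BASE_URL}/api/storage/{repo}/{quote(path)}"
        f"?properties={quote(key)}={quote(value)}"
    )


def property_search_url(repo: str, **props: str) -> str:
    """URL for an HTTP GET that finds artefacts carrying all given properties."""
    query = urlencode({**props, "repos": repo})
    return f"{BASE_URL}/api/search/prop?{query}"


# After UAT passes, the pipeline would PUT to this URL to tag the package...
tag_url = set_property_url("nuget-local", "app/app.1.4.2.nupkg", "tested", "UAT")
# ...and a deployment job would GET this URL to find only UAT-tested packages.
search_url = property_search_url("nuget-local", tested="UAT")
print(tag_url)
print(search_url)
```

The sketch only builds URLs; sending the requests (with authentication) is left to whichever HTTP client the pipeline already uses.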
Licence auditing and security scans for packages
A very nice feature of Artifactory for larger organisations is the ability to manage and audit the licences used by packages (Licence Control). From a compliance perspective, this hugely simplifies the otherwise daunting task of ensuring that only acceptable licences are in use and that the organisation is complying with their terms.
Background – Artifactory at scale
Back in 2011-2014 I was part of a small team responsible for build & deployment at an online retailer in the UK. We had introduced NuGet for .NET package management and also RPMs for managing the RedHat Linux servers (actually, Oracle Linux, but don’t ask…). At first, we used various NuGet package hosting solutions that turned out to be quite half-baked and caused almost daily operational problems. We also had to deal with a huge daily shipment of binaries back and forth between the London office and the Bangalore office (around 6 GB per day, I think – the poor MPLS link was hammered 😦). This supported the work of around 250 developers across the two locations.
Developers are product advocates
Eventually, we evaluated and rolled out Artifactory Enterprise for NuGet packages and RPMs because of its support for different package endpoints and its native replication capabilities. We never looked back. In 2012, NuGet support was quite new in Artifactory; in fact, we were one of the first customers using Artifactory Enterprise for hosting a NuGet package collection, and we worked closely with the engineers at JFrog to diagnose and fix some awkward bugs in their implementation of the (rather strange) NuGet feed specification around results paging (cue several hours of XML inspection – yuk). The JFrog engineers were really helpful and responsive, and this encouraged us to continue with Artifactory.
Artifactory replication topology
In addition to the London and Bangalore offices, we also had to ship the artefacts to the primary and secondary datacentres, at the time located in the north of the UK (this was before a move to AWS). We used the replication and caching features of Artifactory to build a simple and reliable binary distribution mechanism across the four locations:
- London and Bangalore had mirrors of all packages from development teams – the packages here were used for early-stage deployment pipeline tests (unit tests, component tests, some integration tests)
- We had separate Artifactory collections for packages that were “more tested” and heading closer to Production
- We had an instance of Artifactory running close to the primary datacentre and set up as a lazy cache pointing to the Production-ready package collection in the London office; when deployments ran in Production, they requested packages from the Artifactory server near Production, which then pulled the packages from London (or Bangalore) and cached them. This kept the volume of packages in Production small and helped with auditing.
Package cleardown is essential
We did some analysis on the packages in the main London/Bangalore repositories and discovered that around 70% of the packages had never been requested (for this analysis we initially used the access logs from an Nginx instance in front of Artifactory). This high rate of non-use was due to the way in which every CI build would push a new package, auto-incrementing the build and the package number. This meant that clearing down the packages was simple:
- Find the packages over (say) 30 days old
- Select those packages that have been superseded by a newer package
- Select those packages that have never been requested
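The three steps above can be sketched as a simple filter. This is a minimal illustration in Python, not the script we actually ran: the `Package` record, its field names, and the sample data are hypothetical, and it assumes that request counts have already been gathered (for example, from access logs).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class Package:
    name: str
    build: int         # auto-incremented by CI on every build
    created: datetime
    downloads: int     # number of times the package was requested


def cleardown_candidates(packages: List[Package], now: datetime,
                         max_age_days: int = 30) -> List[Package]:
    """Return packages that are old, superseded, and never requested."""
    cutoff = now - timedelta(days=max_age_days)
    # Highest build number seen for each package name.
    latest: Dict[str, int] = {}
    for p in packages:
        latest[p.name] = max(latest.get(p.name, p.build), p.build)
    return [
        p for p in packages
        if p.created < cutoff          # over max_age_days old
        and p.build < latest[p.name]   # a newer package exists
        and p.downloads == 0           # never requested
    ]


now = datetime(2014, 6, 1)
packages = [
    Package("basket", 101, datetime(2014, 3, 1), 0),   # old, superseded, unused
    Package("basket", 102, datetime(2014, 3, 2), 5),   # old but was requested
    Package("basket", 103, datetime(2014, 5, 30), 0),  # recent and latest
    Package("search", 7, datetime(2014, 1, 1), 0),     # old, unused, but latest
]
to_delete = cleardown_candidates(packages, now)
print([(p.name, p.build) for p in to_delete])
```

Only build 101 of the hypothetical `basket` package meets all three conditions; the latest build of each package is always kept, even if it is old and unused.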
Regular, sensible cleardown of binary artefact repositories is essential in order to avoid excessive storage costs. Of course, you need to adhere to compliance rules for data retention or other legal reasons, but in cases like the one we had, with clear cases of non-use, cleardown is straightforward.
A first-class artefact repository is essential for modern enterprise software development. Artifactory Enterprise works really well for globally-distributed software development due to its smart caching, intelligent replication, and integrated licence management and security scanning.