Debugging failures seen only in CI


(Pushkar N Kulkarni) #1

Hi Tony,

In this context, would it be acceptable if we (temporarily) add some diagnostic prints to TestNSURLSession to log information only on failures?

Pushkar N Kulkarni,
IBM Runtimes

Simplicity is prerequisite for reliability - Edsger W. Dijkstra

-----Pushkar N Kulkarni/India/IBM wrote: -----
To: swift-dev@swift.org, swift-corelibs-dev@swift.org
From: Pushkar N Kulkarni/India/IBM
Date: 05/19/2017 01:24AM
Subject: Debugging failures seen only in CI

Over the past few months, we've seen random and intermittent failures in the URLSession tests in TestFoundation in the CI builds running on Ubuntu 16.04. Surprisingly, these failures never occur on PR builds. We've tried to reproduce them locally (on matching Ubuntu levels) and fixed the issues we observed locally, assuming those were the issues that broke the CI.

However, the tests continue to fail intermittently in the CI, and we seem to have reached a point where we simply aren't able to reproduce these failures locally - we tried 2/4/16 CPUs with 8G/16G memory running Ubuntu 16.04 on bare metal. Even when we are able to reproduce a failure, as in the case of a segfault, there's no way to ascertain that the local failure has the same cause (e.g. the same backtrace for a segfault) as the ones in the CI.

I have two questions in this context:

  1. Is there a way to collect failure data from the CI environment - core dumps on a crash or thread stacks on a timeout (I don't think XCTest helps)?

  2. Is there a way to obtain more information about the environment - is it virtualised, # of CPU cores, memory size, CPU/memory available for each build, what's the load (CPU/memory utilisation)?

In general, it'd be really helpful to have some suggestions about how to go about debugging failures that are being observed ONLY in the CI.

Thanks!
Pushkar N Kulkarni,
IBM Runtimes
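
Regarding question 2, some of the environment details can be recorded from inside the test process itself. The following is only a minimal sketch, not something TestFoundation or the CI is known to do: `environmentSummary` is a hypothetical helper name, it relies solely on Foundation's `ProcessInfo`, and the load-average line assumes a Linux host where `/proc/loadavg` is readable. It does not reveal whether the machine is virtualised.

```swift
import Foundation

// Hypothetical helper (not part of TestFoundation): describes the machine
// the tests are running on, using only ProcessInfo and /proc.
func environmentSummary() -> String {
    let info = ProcessInfo.processInfo
    let memoryMiB = info.physicalMemory / (1024 * 1024)

    // Assumption: Linux host. On other platforms this file does not exist
    // and the summary falls back to "unavailable".
    let load = (try? String(contentsOfFile: "/proc/loadavg", encoding: .utf8))?
        .trimmingCharacters(in: .whitespacesAndNewlines) ?? "unavailable"

    return [
        "os: \(info.operatingSystemVersionString)",
        "cores (total/active): \(info.processorCount)/\(info.activeProcessorCount)",
        "physical memory: \(memoryMiB) MiB",
        "load average: \(load)"
    ].joined(separator: "\n")
}

print(environmentSummary())
```

Printing this once at the start of a test run would at least put the core count, memory size and load next to any failure in the CI log.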


(Tony Parker) #2

I'm ok with that if it helps us get to the bottom of this once and for all.

I wonder if it would be possible to enable the logging conditionally, so it only appears in CI? Maybe via a runtime call ("turn logging on now").

- Tony
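
One way to get CI-only logging, sketched here under stated assumptions: gate the diagnostic prints on an environment variable that only the CI job would export. The variable name `CI_DIAGNOSTICS` and the `Diagnostics` type are hypothetical and are not known to exist in the Swift CI or in TestFoundation. This uses an environment check in place of the explicit "turn logging on now" runtime call suggested above.

```swift
import Foundation

// Hypothetical namespace for test diagnostics; not an existing API.
enum Diagnostics {
    // Assumption: the CI job exports CI_DIAGNOSTICS=1. The name is made up
    // for this sketch.
    static func isEnabled(in environment: [String: String]) -> Bool {
        return environment["CI_DIAGNOSTICS"] == "1"
    }

    // Evaluated once, on first use, from the real process environment.
    static let enabled = Diagnostics.isEnabled(in: ProcessInfo.processInfo.environment)

    // Writes to stderr only when diagnostics are enabled. The message is an
    // autoclosure, so it is not even constructed when logging is off.
    static func log(_ message: @autoclosure () -> String,
                    file: StaticString = #file,
                    line: UInt = #line) {
        guard enabled else { return }
        let text = "[diag] \(file):\(line) \(message())\n"
        FileHandle.standardError.write(Data(text.utf8))
    }
}

// Example call site inside a test; prints nothing unless CI_DIAGNOSTICS=1.
Diagnostics.log("URLSession task finished with an unexpected state")
```

Local runs stay quiet by default, and anyone can still opt in by setting the variable before running the tests.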
