Debugging failures seen only in CI


(Pushkar N Kulkarni) #1

Over the past few months, we’ve seen random and intermittent failures in the URLSession tests in TestFoundation in the CI builds running on Ubuntu 16.04. Surprisingly, these failures never occur on PR builds. We’ve tried to reproduce them locally (on matching Ubuntu levels) and fixed the issues we observed locally, assuming those were the issues that broke the CI.

However, the tests continue to fail intermittently in the CI, and we seem to have reached a point where we simply aren’t able to reproduce these failures locally - we tried 2/4/16 CPUs with 8G/16G memory running Ubuntu 16.04 on bare metal. Even when we are able to reproduce a failure, as in the case of a segfault, there’s no way to ascertain that the local failure has the same cause (e.g. the same backtrace for a segfault) as the one in the CI.

I have two questions in this context:

  1. Is there a way to collect failure data from the CI environment - core dumps on a crash or thread stacks on a timeout (I don’t think XCTest helps)?

  2. Is there a way to obtain more information about the environment - is it virtualised, # of CPU cores, memory size, CPU/memory available for each build, what’s the load (CPU/memory utilisation)?

In general, it’d be really helpful to have some suggestions about how to go about debugging failures that are being observed ONLY in the CI.

Thanks!
Pushkar N Kulkarni,
IBM Runtimes

Simplicity is prerequisite for reliability - Edsger W. Dijkstra
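
One possible way to approach the first question, if the tests can be launched through a wrapper, is sketched below in Python. This is only an illustration under stated assumptions, not something the Swift CI is known to do: the names `enable_core_dumps` and `run_with_diagnostics` are hypothetical, the host is assumed to be Linux with a non-zero hard limit for core files, and where the core file ends up depends on the kernel's `core_pattern` setting.

```python
import resource
import signal
import subprocess


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Intended to run in the child just before exec. If the hard limit
    is 0, core dumps stay disabled.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_with_diagnostics(argv, timeout_seconds, abort_grace_seconds=30):
    """Run a test command and return (exit_status, timed_out).

    A negative exit status is the number of the signal that ended the
    process, for example -11 for SIGSEGV.
    """
    process = subprocess.Popen(argv, preexec_fn=enable_core_dumps)
    try:
        return process.wait(timeout=timeout_seconds), False
    except subprocess.TimeoutExpired:
        # The default action for SIGABRT is to terminate with a core dump,
        # which preserves every thread's stack for later inspection.
        process.send_signal(signal.SIGABRT)
        try:
            return process.wait(timeout=abort_grace_seconds), True
        except subprocess.TimeoutExpired:
            # The process blocked or handled SIGABRT; give up and kill it.
            process.kill()
            return process.wait(), True
```

A call such as `run_with_diagnostics(["./TestFoundation"], 600)` (both the path and the timeout are placeholders) would leave a core file behind on a crash or a hang, provided the system allows it. The thread stacks can then be read from the core with a debugger.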


(Mishal Shah) #2

On May 18, 2017, at 12:54 PM, Pushkar N Kulkarni via swift-dev <swift-dev@swift.org> wrote:

Over the past few months, we've seen random and intermittent failures in the URLSession tests in TestFoundation in the CI builds running on Ubuntu 16.04. Surprisingly, these failures never occur on PR builds. We've tried to reproduce them locally (on matching Ubuntu levels) and fixed the issues we observed locally, assuming those were the issues that broke the CI.

However, the tests continue to fail intermittently in the CI, and we seem to have reached a point where we simply aren't able to reproduce these failures locally - we tried 2/4/16 CPUs with 8G/16G memory running Ubuntu 16.04 on bare metal. Even when we are able to reproduce a failure, as in the case of a segfault, there's no way to ascertain that the local failure has the same cause (e.g. the same backtrace for a segfault) as the one in the CI.

I have two questions in this context:

1. Is there a way to collect failure data from the CI environment - core dumps on a crash or thread stacks on a timeout (I don't think XCTest helps)?

Please send me an email with the build in which you are seeing the failure, and where on the system the core dump is located.

2. Is there a way to obtain more information about the environment - is it virtualised, # of CPU cores, memory size, CPU/memory available for each build, what's the load (CPU/memory utilisation)?

I updated the CI jobs to display the following info for each build. Please let me know if you would like to see additional info.

========================
OS Info:

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.10
Release: 16.10
Codename: yakkety

CPU Info:

CPU op-mode(s): 32-bit, 64-bit
CPU(s): 48
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
CPU family: 6
CPU MHz: 2600.000
CPU max MHz: 2600.0000
CPU min MHz: 1200.0000

Free:

              total        used        free      shared  buff/cache   available
Mem:       65860844     3029364    48991112       47452    13840368    62111280
Swap:        999420      271228      728192

Uptime:

19:39:19 up 195 days, 1:51, 0 users, load average: 18.37, 25.89, 42.99

Number of executors: 4

https://ci.swift.org/job/oss-swift-incremental-RA-linux-ubuntu-16_10/3733/consoleFull

Thanks,
Mishal Shah
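
To compare a local machine against the CI report above, a similar snapshot can be collected with a short script. The sketch below is an assumption about how such a report could be produced, not the actual CI job configuration: the command list mirrors the section titles in the report, and the helper names (`run_or_note`, `load_per_cpu`, `environment_report`) are hypothetical. It also divides the load averages by the number of logical CPUs, which makes machines of different sizes easier to compare.

```python
import os
import shutil
import subprocess

# Commands chosen to mirror the report's sections; the exact commands
# used by the CI job are an assumption.
COMMANDS = {
    "OS Info": ["lsb_release", "-a"],
    "CPU Info": ["lscpu"],
    "Free": ["free"],
    "Uptime": ["uptime"],
}


def run_or_note(argv):
    """Return the command's stdout, or a short note if it cannot be run."""
    if shutil.which(argv[0]) is None:
        return "({} not found)".format(argv[0])
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            universal_newlines=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "({} failed: {})".format(argv[0], exc)
    return result.stdout.strip()


def load_per_cpu(load_averages, cpu_count):
    """Divide the 1/5/15-minute load averages by the logical CPU count."""
    return tuple(round(load / cpu_count, 2) for load in load_averages)


def environment_report():
    """Collect each section's output into a dict keyed by section title."""
    sections = {title: run_or_note(argv) for title, argv in COMMANDS.items()}
    try:
        loads = os.getloadavg()
    except OSError:
        sections["Load per CPU"] = "(load average unavailable)"
    else:
        sections["Load per CPU"] = str(load_per_cpu(loads, os.cpu_count() or 1))
    return sections


if __name__ == "__main__":
    for title, body in environment_report().items():
        print("{}:\n\n{}\n".format(title, body))
```

With the figures from the report (48 CPUs and load averages of 18.37, 25.89 and 42.99), `load_per_cpu` gives about 0.38, 0.54 and 0.9 runnable tasks per CPU over the last 1, 5 and 15 minutes.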

