You are here

The Trouble with Tribbles... - Peter Tribble

Subscribe to The Trouble with Tribbles... - Peter Tribble feed
Peter Tribblehttp://www.blogger.com/profile/09363446984245451854noreply@blogger.comBlogger514125
Updated: 1 week 14 hours ago

Thoughts on SPARC support in illumos

Sun, 02/10/2019 - 16:10
One interesting property of illumos is that its legacy stretches back decades - there is truly ancient code rubbing shoulders with the very modern.

An area where we have really old code is on SPARC, where illumos has support in the codebase for a large variety of Sun desktops and servers.

There's a reasonable chance that quite a bit of this code is currently broken. Not because it's fundamentally poor code (although it's probably fair to say that the code quality is of its time, and a lot of it is really old), but it lives within an evolving codebase and hasn't been touched in the lifetime of illumos, and likely much longer. Not only that, but it's probably making more assumptions about being built with the old Studio toolchain rather than with gcc.

What of this code is useful and worth keeping and fixing, and what should be dropped?

A first step in this was that I have recently removed support for starfire - the venerable Sun E10K. It seems extremely unlikely that anyone is running illumos on such a machine. Or indeed that anyone has them running at all - they're museum pieces at this point.

A similar, if rather newer, class of system is the starcat, the Sun F15K and variants. Again, it's big, expensive, requires dedicated controller hardware, and is unlikely to be the kind of thing anyone's going to have lying about. (And, if you're a business, there's no real point in trying to make such a system work - you would be much better off, both operationally and financially, in getting a current SPARC system.)

And if nobody has such a system, then not only is the code useless, it's also untestable.

The domained systems, like starfire and starcat, are also good candidates for removal because of the relative complexity and uniqueness of their code. And it's not as if the design specs for this hardware are out there to study.

What else might we consider removing (with starfire done and starcat a given)?

  1. The serengeti, Sun-Fire E2900-E6800. Another big blob of complex code.
  2. The lw8 (lightweight 8), aka the V-1280. This is basically some serengeti boards in a volume server chassis.
  3. Anything using Sbus. That would be the Ultra-2, and the E3000-E6000 (sunfire). There's also the socal, sf, and bpp drivers. One snag  with removing the Ultra-2 is that it's used as the base platfrom for the newer US-II desktops, which link back to it.
  4. The olympus platform. That's anything from Fujitsu. One slight snag here is that the M3000 was quite a useful box and is readily available on eBay, and quite affordable too.
  5. Netra systems. (Specifically NetraCT - there's a US-IIi NetraCT, and two US-IIe systems, the NetraCT-40 and the NetraCT-60. Code names montecarlo and makaha (something about Tonga too). Also CP2300 aka snowbird.
  6. Server blade. I'm talking the old B100s blade here.
  7. Binary compatibility with SunOS 4 - this is kernel support for a.out, and libbc.
I'm not saying at this point that all of this code and platform support will go, just that it lists the potential candidates. For example, I regard support for M3000 as useful, and definitely worth thinking about keeping.

What does that leave as supported? Most of the US-II and US-III desktops, most of the V-series servers, and pretty much all the early sun4v (T1 through T3 chips) systems. In other words, the sort of thing that you can pick up second hand fairly easily at this point.

Getting rid of code that we can never use has a number of benefits:

  • We end up with a smaller body of code, that is thus easier to manage.
  • We end up with less code that needs to be updated, for example to make it gcc7 clean, or to fix problems found by smatch, or to enable illumos to adopt newer toolchains.
  • We can concentrate on the code that we have left, and improve its quality.
If we put those together into a single strategy, the overall aim is to take illumos for SPARC from a large body of unknown, untested, and unsupportable code to a smaller body of properly maintained, testable, and supportable code. Reduce quantity to improve quality, if you like.

As part of this project, I've looked through much of the SPARC codebase. And it's not particularly pretty. One reason for attacking starfire was that I was able to convince myself relatively quickly that I could come up with a removal plan that was well-bounded - it was possible to take all of it out without accidentally affecting anything else. Some of the other platforms need bit more analysis to tease out all the dependencies and complexity - bits of code are shared between platforms in a variety of non-obvious ways.

The above represents my thoughts on what would be a reasonable strategy for supporting SPARC in illumos. I would naturally be interested in the views of others, and specifically if anyone is actually using illumos on any of the platforms potentially on the chopping block.
Categories: Personal Blogs

SPARC and tod modules on illumos

Sat, 02/09/2019 - 00:21
Following up from removing starfire support from illumos, I've been browsing through the codebase to identify more legacy code that shouldn't be there any more.

Along the way, I discovered a little tidbit about how the tod (time of day) modules - the interface to the hardware clock - work on SPARC.

If you look, there are a whole bunch of tod modules, and it's not at all obvious how they fit together - they all appear to be candidates for loading, and it's not obvious how the correct one for a platform is chosen.

The mechanism is actually pretty simple, if a little odd.

There's a global variable in the kernel named:

tod_module_name
This can be set in several ways - for some platforms, it's hard-coded in that platform's platmod. Or it could be extracted from the firmware (OBP). That tells the system which tod module should be used.

And the way this works is that each tod module has _init code that looks like

if (tod_module_name is myself) {
   initialize the driver
} else {
   do nothing
}
so at boot all the tod modules get loaded, but only the one that matches the name set by the platform actually initializes itself.

Later in boot, there's an attempt to unload all modules. Similarly the _fini for each driver essentially does

if (tod_module_name is myself) {
   I'm busy and can't be unloaded
} else {
   yeah, unload me
}
So, when the system finishes booting, you end up with only one tod module loaded and functional, and it's the right one.

Returning to the original question, can any of the tod modules be safely removed because no platform uses them? To be honest, I don't know. Some have names that match the platform they're for. It's pretty obvious, for example, that todstarfire goes with the starfire platform, so it was safe to take that out. But I don't know the module names returned by every possible piece of SPARC hardware, so it isn't really safe to remove some of the others. (And, as a further problem, I know that at least one is only referenced in closed source, binary only, platform modules.)
Categories: Personal Blogs

Tribblix and the python transition

Sun, 11/04/2018 - 21:53
It's been over a decade since python 3 came out, and a lot of the world is still using python 2.

But we're at the point now where the world has said enough is enough, and it's time to finally get the 2-to-3 transition over with.

And while Tribblix is all about retro styling, it's also all about keeping up. So I put together a plan for migrating Tribblix from python 2 to python 3.
  • Ship all the modules for python 3 as well as python 2, ready to switch
  • Move the python consumers (eg mercurial) across to python 3
  • Make python 3 the default
  • Deprecate and remove python 2
This is made a little easier by the fact that there's nothing in Tribblix itself that uses python directly - I haven't made the mistake of having my packaging system or anything like that written in python, for example.

There was a little wrinkle in all this. I had got this planned out, and then python 3.7 was just around the corner. So I ended up waiting a little, and put a python 3.6 to 3.7 transition at the beginning of the list.

So where am I right now? I've now got all the modules built and packaged for python 3.7, and python 3.6 has been removed from Tribblix. This was made somewhat easier by the fact that no packages in Tribblix yet depended on python 3 - the transition hadn't been started properly, so I could simply throw away all the python 3.6 stuff.

As an aside, this had the odd side effect that all the python 3.7 modules were packaged straight away for SPARC, whereas python 3.6 was never finished there - the 3.6 to 3.7 switch was all scripted, rather than manual, so was very little actual work. There were a couple of modules that needed to be updated anyway to work with 3.7 (pyyaml for example), and I took the opportunity to do a bunch of routine module updates at the same time.

So just having all the modules turned out to be nearly trivial. Now there's going to be a longer slog migrating all the python consumers across and making python 3 the default. (It might be easiest to make python 3 the default first, so that when building the consumers they automatically pick up the python I want.)

I was originally thinking of a fairly slow and structured approach where each step would be a point release of 3.7. But I'm well ahead of that already, and the remaining steps are likely to occur fairly promptly. (Or, as promptly as I have time to do the work.)

So it won't be long before we bid farewell to python 2 in Tribblix.
Categories: Personal Blogs