OSMF should host a Forgejo/GitLab instance

(we’re veering away from Github vs alternatives but)

It doesn’t need to be a separate category - you can use tags very much like categories - you’ll notice that that list has a “new topic” button. I’ve often directed DWG correspondents that way when they want to talk about a country that doesn’t have a category.

From the other side of things, many of the OSM-related software repositories (the website, mod_tile, etc.) get “issues” raised that aren’t software issues at all, but instead “I was trying to do something with OSM data and something went wrong - help!” - those would be far better on this forum than in Github (or any equivalent).

1 Like

(Yes, I know I’m gently steering us away from window shopping for Git frontends. That’s called “taking a step back”.)

Right, Discourse isn’t a full replacement for a bug tracker. It’s more of a solution to the problem of people relying on proprietary chat platforms than the problem of relying on a proprietary platform for software development. But many of the arguments seem to apply equally, particularly the concern you expressed about sharing personal information or having to maintain an account on an unfamiliar service. (Discourse’s UI is also much more comprehensively localized for a global audience than GitHub’s, which isn’t saying much.)

Some projects do use GitHub discussions, but that feature tries to replicate Stack Overflow post voting just like this forum’s post voting plugin, with all the attendant confusion. I wouldn’t recommend it.

I think this discussion somewhat mixes reporting problems and tracking issues. Those two things may look like they are one and the same on the surface, but this is a misconception. They are different problems and need different tools.

Most users who report problems do not have the experience to use an issue tracker in a structured way that helps developers with tracking. It is not rare that the first handful of comments is spent with a back and forth trying to understand what the actual problem may be. This is a necessary part of problem reporting but really not helpful for structured issue tracking.

Personally, I somewhat prefer when people start a discussion here in the forum when they see a problem with Nominatim. It’s perfect for asking clarifying questions. There are lots of knowledgeable people here, who can help with background information on OSM or help tracking down the OSM data that causes the problem. And there are regulars here who are happy to repeat answers to the questions that get asked again and again. I can’t emphasize enough how helpful this is for me as a maintainer. So, when it comes to reporting problems, this forum is just a fine tool for it and it ticks all the boxes on privacy (as far as a public venue can do that).

Issue tracking is a different beast. It has to work for the maintainers and developers. Github currently does the job, not perfectly but well enough. Access to the CI and the ability to communicate across project boundaries are a huge win. Most of the privacy concerns don’t really apply because I do open source development only. Everything is in the open for everyone to see: code, issues, comments. That is a conscious choice. There is also no real lock-in. If you are worried that some information only lives on Github, then there is always the option to mirror/backup the important stuff.
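For illustration, mirroring the git side of a repository could look roughly like the sketch below. It is a minimal example rather than what any project here actually runs: the function names `mirror_command` and `mirror` are made up, and the URL and destination path are whatever you pass in. It only covers git data (branches, tags and other refs). Issues and comments are not part of the repository, so they would need a separate export through the hosting service's API.

```python
import subprocess
from pathlib import Path


def mirror_command(url: str, dest: Path) -> list[str]:
    """Return the git command that creates or refreshes a bare mirror of `url`.

    Hypothetical helper for illustration only.
    """
    if dest.exists():
        # Mirror already cloned: fetch all refs and prune ones deleted upstream.
        return ["git", "--git-dir", str(dest), "remote", "update", "--prune"]
    # First run: a bare clone that copies every ref, not just the branches.
    return ["git", "clone", "--mirror", url, str(dest)]


def mirror(url: str, dest: Path) -> None:
    """Run the mirror command; raises CalledProcessError if git fails."""
    subprocess.run(mirror_command(url, dest), check=True)
```

Calling `mirror(url, dest)` from a scheduled job would keep the copy current. Splitting the command construction from the execution makes it easy to check what would be run before pointing it at a real repository.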

I’m not naive. I understand that there is no such thing as a free lunch, also not on Github. What Microsoft gets out of it are millions of lines of code they can use without paying license fees. That likely pays for itself a hundred times over. So the deal is: I get an issue tracker and CI, they get the code. I’m fine with that. It’s open source.

13 Likes

From tech press coverage, it seems they also use it heavily for their internal code repositories, and some of their early improvements were driven by needs they had managing their own code and teams. They’re also said to have moved to an “inner source” methodology for all their code, which a GitHub variant would be convenient for. IMHO their acquisition was probably more about preserving tooling that they find useful than snaffling code they could already just fork as and when they needed it.

I really don’t think OSM will benefit much from shifting away and it’ll be costly in terms of manpower to run the thing unless we just hand everything over to yet another faceless entity liable to being taken over.

1 Like

While I could arrange for a server and host the forge there, I’m increasingly demotivated to do so given the responses of the OSM community (…at least, what fraction of it speaks here). It seems nobody will move their projects to a free platform even if I do provide them a perfect alternative on a platter, so my efforts will probably be wasted.

In response to the recent replies, as well as the people downvoting my replies - the ignorance, apathy, and denial of the community continues to be disappointing and depressing. I had expected better from - as I said before - a community which is otherwise so invested in privacy, freedom, attribution, and copyleft. Do better.

1 Like

Is it also “open source” to reproduce nontrivial parts of code verbatim, violating attribution and copyleft? Because that’s what’s going on with Copilot. Would the pro-GitHub section of the OSM community also be okay with the same being done for OSM data? :roll_eyes:

Anyone claiming to be in favor of open source or copyleft and still using GitHub is utterly hypocritical. The amount of support this nonsensical thought process is receiving is nothing short of disgusting.

What about those of us who release our organic, unassisted code into the public domain? Or those of us who checked the box to “consider our map edits to be in the public domain”, perhaps knowing it would amount to nothing? You may not agree with those choices, but consider the environment that permitted those choices to even be made.

The OSM community is already developing a nonzero amount of code on self-hosted or non-GitHub-hosted repositories. Aside from JOSM, you just haven’t heard about them – that’s one of the points that others are trying to make here.

7 Likes

Is it also “open source” to reproduce nontrivial parts of code verbatim, violating attribution and copyleft? Because that’s what’s going on with Copilot.

This is essentially the same discussion that is being had the world over with regard to ChatGPT/Midjourney and the likes scouring the web for content to learn from, and to some degree our OSM data also feeds some AI models that essentially feed off our copylefted work. I am mildly critical of those endeavours even though I can see the conundrum of how we would allow a human to do all this “learning from stuff others have made” without claiming copyleft on their brains, while not allowing a machine to do the same.

But using a self-hosted whatever instance wouldn’t protect us from third parties ingesting the stuff we publish there and - potentially - violating our license.

If GitHub is doing bad things with our code, this wouldn’t stop just because we self-host; we’d still have the same issues, plus the additional work of self-hosting.

Anyone claiming to be in favor of open source or copyleft and still using GitHub is utterly hypocritical.

I think you have a very narrow perspective. Someone could also say: “They’re abusing my stuff anyway no matter what I do, so why not get something out of it at least”.

The amount of support this nonsensical thought process is receiving is nothing short of disgusting.

The fact that so little makes sense to you could also come from you not having given the matter enough thought.

12 Likes

Slightly OT.

I would point out that the jury is still out (literally :-)) on this from a legal pov.

The thing is that a copy is a copy is a copy: if you produce a verbatim copy of something that you “saw” once, you are likely going to run into trouble. How that maps to the specific combination of producers and users of LLMs is going to be interesting, but currently is undecided. There are many dimensions to this, just consider the US fair use doctrine that doesn’t exist elsewhere in such an expansive form.

In any case as long as the contributors to a github hosted project are aware of the situation IMHO it is their call if they want to use it or not. OSS isn’t just about licences.

1 Like

It was mentioned already, but may be worth repeating: insulting others very rarely convinces them.

15 Likes

The legality of ML models that can reproduce source data is still up in the air - anyone who has an answer they’re sure of world-wide is incorrect. Regardless of which way the various courts and governments go on the matter, it doesn’t really matter for this discussion. Code for OSM projects will generally be at least mirrored to Github, so it will get picked up anyways. Companies could clone the repos and do anything copyright law and the licenses allow.

7 Likes

Hi all,

As mapcomplete developer, I have been experimenting with Forgejo and I love it!

The transition (still ongoing) was pretty smooth. I got actions running and login with osm too. And yes, a contributor popped up who rarely contributed before because ‘github = GAFAM = bad’.

Imho, GitHub’s enshittification has started (slowly but surely). I also agree with the ideological argument that we should try to use FLOSS as much as (pragmatically) possible (but pragmatism and stuff that works is important too).

Feel free to make an account and to test on https://source.mapcomplete.org

3 Likes

See my comment here: OSMF should host a Forgejo/GitLab instance - #18 by SimonPoole

I think at this time, we could do like @amapanda_ᚐᚋᚐᚅᚇᚐ did with the Mastodon instance: find a reliable open source-friendly provider, pay for hosting a Forgejo instance (and set up independent backups), proclaim it the official OSMF instance, and in a year, if people start migrating their code, we could expand the instance and ask for OSMF funding.

5 Likes

Running forgejo itself should be easy enough (unlike gitlab which is a huge pain to run) but just running forgejo falls far short of being a replacement for github or any other hosting service.

The obvious huge problem is that you don’t get any CI or other automation - that in itself would probably prevent anybody much moving.

Then there’s the problem of network effects and the fact that you need to get everybody to move or you fragment the community.

I just don’t see that it’s worth us spending effort on when the existing solutions work fine unless you’re doing it for ideological reasons and I’m just not that interested in those.

2 Likes

You got me worried for a minute, but Forgejo can do CI.

I then thought about GitHub Pages, but there is Codeberg, so that’s covered.

I agree that network effects are the main problem. But I remember that having a separate SVN didn’t stop people from contributing (see also JOSM). And OAuth2 integration would make it easier for OSM people. I doubt people come to Every Door issues by searching on GitHub rather than by following a link from a website.

2 Likes

I think that must be quite new, and it will significantly complicate running it.

The bigger problem though is that it works using the forgejo runner which, by its own admission, is:

alpha release, should not be considered secure enough to deploy in production

2 Likes

Hi all,

As a uMap maintainer, I’m happy to see this discussion.

I’d be in favor of moving uMap out of Github, and in my opinion it would be ideal if there were an “OSM related” projects hub, so it would be easier for contributors and newcomers, because same UI, shared auth, etc.

In my eyes, moving to codeberg sounds like a nice option, so it’s not yet another service on the shoulders of the foundation ops. But of course a self hosted forgejo would be awesome too from my point of view.

So, ready to move wherever the community decides to!

Yohan

3 Likes

As I pointed out somewhere above, the only option that doesn’t increase PI exposure is the OSMF running the instance itself. As soon as you use a hosted solution, you are adding another party and literally just making the problem worse.

PS: as to code in public repos being used to train Generative AI models, there is unlikely to be any difference.

1 Like

I agree that’s a problem, although the underlying ACT is a production-ready thing with 57k stars. I guess the problem is with security, and by allowing CI/CD only by a hand-reviewed maintainer request, we can keep it more or less secure.

(Codeberg is probably also a good option.)