osm2pgrouting (#24) - working with large files (#297) - Message List

working with large files

First of all, I love the tool. Thanks a bunch for all the hard work and the clean source code.

As the title implies, I need to load 300GB of data, which I understand osm2pgrouting cannot currently do. I am not totally clear on the node/way/relation interdependency, or on what happens if I try to sequentially load overlapping maps. Can I use Osmosis (or a shell script, or an XML parser, or whatever) to break out a few thousand physically overlapping smaller boxes and load them one by one, eliminating incomplete ways and relations? Or would the lack of internal references to already added ways and relations result in those entries being added multiple times? Should I do that anyway and then write a de-duping query to run inside the db after a full load?

Or, is the only reasonable approach to extend the osm2pgrouting source code? If so, what would you change?

Should I perhaps do a little of one and a little of the other?

Anyway, thank you again.

~ Aaron

  • Message #1074

    Hi Aaron,

    OSM data already has the topology information, but we actually don't care: osm2pgrouting runs our topology creation function anyway. For you this means that you can add ways to the table in small chunks and then run the assign_vertex_id function on the complete ways table.
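To illustrate why chunked loading is safe under that approach: the topology is rebuilt from the way endpoints alone, so it does not matter which file a way came from. The Python sketch below mimics the idea in memory. It is not the pgRouting implementation; the function name `assign_vertex_ids`, the tuple layout, and the tolerance value are assumptions made for this example, and snapping by grid rounding is a simplification of a real distance test.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Way = Tuple[int, Point, Point]  # (gid, start point, end point), hypothetical layout


def assign_vertex_ids(ways: List[Way], tolerance: float = 1e-5) -> Dict[int, Tuple[int, int]]:
    """Give every way a (source, target) vertex id pair.

    Endpoints that fall into the same tolerance-sized grid cell share a vertex id.
    """
    vertex_ids: Dict[Tuple[int, int], int] = {}

    def vertex_for(p: Point) -> int:
        key = (round(p[0] / tolerance), round(p[1] / tolerance))
        if key not in vertex_ids:
            vertex_ids[key] = len(vertex_ids) + 1
        return vertex_ids[key]

    return {gid: (vertex_for(a), vertex_for(b)) for gid, a, b in ways}


# Two "chunks" loaded separately; way 2 (chunk A) and way 3 (chunk B) meet at (1.0, 1.0).
chunk_a = [(1, (0.0, 0.0), (1.0, 0.0)), (2, (1.0, 0.0), (1.0, 1.0))]
chunk_b = [(3, (1.0, 1.0), (2.0, 1.0))]

# Topology is computed once, over the complete table.
topology = assign_vertex_ids(chunk_a + chunk_b)
```

Because the ids are assigned over the combined list, the target of way 2 and the source of way 3 end up with the same vertex id even though the ways arrived in different chunks.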

    Of course osm2pgrouting is far from perfect. We got a few patches from community members, but unfortunately never had time to test them well.

    For example, there was a patch for storing temp links in the db, #161 (but recently somebody complained that it doesn't work well: http://lists.postlbs.org/pipermail/pgrouting-users/2009-September/thread.html)

    We got another patch from another person, and you can also try it: #176

    You might also be interested in this topic - http://pgrouting.postlbs.org/discussion/topic/279

    Cheers!

    Anton.

    • Message #1100

      I am writing a tool that uses osmosis to split large files into small ones and then imports the small files with a patched osm2pgrouting.

      With an unchanged osm2pgrouting you can't import another file.

      So far: routing works, also across two separately imported files, but the routing algorithms are slow now.

      • Message #1101

        There is a major bug in patch Ticket #161

        While creating ways, every node is read from db, and a NEW object is created from it instead of reading the right node from the vector:

        node->numsOfUse is never >1,

        so function OSMDocument::SplitWays() fails
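A minimal Python sketch of the failure mode described in this message. The real code is C++; the class and function names below are hypothetical stand-ins, not taken from the patch. If each way builds a fresh node object instead of looking up the shared one, the per-node use counter never exceeds 1, so no node is ever recognised as a shared point at which to split.

```python
class Node:
    def __init__(self, node_id: int) -> None:
        self.id = node_id
        self.nums_of_use = 0  # how many way references point at this node


def count_uses_buggy(ways):
    """Creates a NEW Node per reference, so counts never accumulate."""
    created = []
    for way in ways:
        for node_id in way:
            node = Node(node_id)  # fresh object each time, counter starts at 0
            node.nums_of_use += 1
            created.append(node)
    return max(n.nums_of_use for n in created)


def count_uses_shared(ways):
    """Looks each node up in one shared table, so counts accumulate."""
    nodes = {}
    for way in ways:
        for node_id in way:
            if node_id not in nodes:
                nodes[node_id] = Node(node_id)
            nodes[node_id].nums_of_use += 1
    return {node_id: n.nums_of_use for node_id, n in nodes.items()}


# Two ways crossing at node 2.
ways = [[1, 2, 3], [4, 2, 5]]
buggy_max = count_uses_buggy(ways)
shared = count_uses_shared(ways)
split_points = [nid for nid, uses in shared.items() if uses > 1]
```

With the shared table, node 2 is counted twice and shows up in `split_points`; with the buggy variant the highest count is 1 and nothing would be split.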

        • Message #1136

          Is there a new patch or a patch to this patch that fixes the problem?

      • Message #1135

        Hi ole,

        I'm interested in loading multiple OSM download files into a single database. Would your patch work for this? Would it also work for combining multiple export files back into a single routable dataset? Will you be posting your patch so others can use it?

        ## separate OSM download files
        /home/woodbri/work/osm/data/great_britain.osm.bz2
        /home/woodbri/work/osm/data/ireland.osm.bz2
        /home/woodbri/work/osm/data/isle_of_man.osm.bz2
        ## multiple overlapping exports files
        /home/woodbri/work/osm/data/guadeloupe-20090928-1.osm
        /home/woodbri/work/osm/data/guadeloupe-20090928-2.osm
        /home/woodbri/work/osm/data/guadeloupe-20090928-3.osm
        
        • Message #1140

          Hi, yes, with my fix you can import the overlapping files. You could also import non-overlapping files if you split them with the osmosis bbox parameter "-completeWays", but with this parameter, splitting a large file with osmosis is much slower.

          I'm just working on some issues. If it's stable, I'll post it here. That will be very soon.
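As a rough illustration of the splitting step discussed here, the following Python sketch cuts a bounding box into a grid of overlapping tiles and builds one osmosis command line per tile. The grid size, overlap, coordinates, and file names are made-up example values, and the helper names are hypothetical. The commands use the osmosis `--read-xml`, `--bounding-box` (with `left`, `bottom`, `right`, `top`) and `--write-xml` tasks; adding `completeWays=yes` to the bounding-box task is the slower alternative mentioned above and is left out here.

```python
from typing import Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (left, bottom, right, top)


def overlapping_tiles(bbox: BBox, cols: int, rows: int, overlap: float) -> Iterator[BBox]:
    """Yield cols x rows tiles, each grown by `overlap` and clamped to bbox."""
    left, bottom, right, top = bbox
    width = (right - left) / cols
    height = (top - bottom) / rows
    for r in range(rows):
        for c in range(cols):
            yield (
                max(left, left + c * width - overlap),
                max(bottom, bottom + r * height - overlap),
                min(right, left + (c + 1) * width + overlap),
                min(top, bottom + (r + 1) * height + overlap),
            )


def osmosis_command(src: str, dst: str, tile: BBox) -> List[str]:
    """Build (but do not run) an osmosis call that extracts one tile."""
    left, bottom, right, top = tile
    return [
        "osmosis",
        "--read-xml", f"file={src}",
        "--bounding-box",
        f"left={left}", f"bottom={bottom}", f"right={right}", f"top={top}",
        "--write-xml", f"file={dst}",
    ]


# Example values only: a 2 x 2 grid with 0.1 degrees of overlap.
tiles = list(overlapping_tiles((8.0, 53.0, 12.0, 55.0), cols=2, rows=2, overlap=0.1))
commands = [osmosis_command("input.osm", f"tile-{i}.osm", t) for i, t in enumerate(tiles)]
```

Each entry in `commands` is an argument list that could be passed to `subprocess.run`; the sketch only builds the lists so they can be inspected first.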

          • Message #1146

            http://pgrouting.postlbs.org/wiki/osm2pgroutingPatchToAppendFile

            Let me know if you have any problems. I made some untested changes *duck*

            • Message #1238

              Hi Ole, thanks for sharing that patch! It all worked well and quickly until it came to the second (and last) file and the message "deleting dublicate ways" appeared. That message appeared two days ago and has not yet been replaced by a new one. However, the CPU is still running at 100%, around 90% of which is used by the postmaster process.

              I imported two German federal states: hamburg.osm -clean schleswig-holstein.osm -append -finalize. The virtual Suse-Linux 11.1 machine has 2.8 GB RAM and one core of an Intel Core 2 Duo E8500 at 3.16 GHz.

              Do you have any idea whether this poor performance is due to bad configuration on my side or due to a problem with your patch? (The unpatched osm2pgrouting works well.) I'm looking forward to your reply, Christof

              • Message #1239

                Hi Ole,

                I changed the SQL that deletes the duplicates from

                    DELETE FROM ways WHERE gid IN (SELECT b.gid FROM ways a, ways b WHERE a.the_geom ~= b.the_geom AND a.gid > b.gid);

                to

                    DELETE FROM ways WHERE gid IN (SELECT b.gid FROM ways a, ways b WHERE a.gid > b.gid AND a.x1 = b.x1 AND a.y1 = b.y1 AND a.the_geom ~= b.the_geom);

                Now it works fine. Thanks again for that patch!
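One plausible reading of why the extra `x1`/`y1` conditions help: they give the self-join a cheap equality test that rules out most pairs before the geometry comparison runs. The Python sketch below shows that effect on an in-memory list. It is an illustration only: how PostgreSQL actually plans either query depends on the planner and the available indexes, and the data layout and function names are assumptions made for the example.

```python
from collections import defaultdict
from typing import List, Set, Tuple

Point = Tuple[float, float]
Way = Tuple[int, Tuple[Point, ...]]  # (gid, geometry as a tuple of points)


def duplicates_all_pairs(ways: List[Way]) -> Tuple[Set[int], int]:
    """Compare every pair of geometries, as in the original query."""
    doomed, comparisons = set(), 0
    for gid_a, geom_a in ways:
        for gid_b, geom_b in ways:
            if gid_a > gid_b:
                comparisons += 1
                if geom_a == geom_b:
                    doomed.add(gid_b)  # the lower gid of a duplicate pair is deleted
    return doomed, comparisons


def duplicates_bucketed(ways: List[Way]) -> Tuple[Set[int], int]:
    """Group by start point (x1, y1) first, then compare only within a group."""
    buckets = defaultdict(list)
    for gid, geom in ways:
        buckets[geom[0]].append((gid, geom))
    doomed, comparisons = set(), 0
    for group in buckets.values():
        for gid_a, geom_a in group:
            for gid_b, geom_b in group:
                if gid_a > gid_b:
                    comparisons += 1
                    if geom_a == geom_b:
                        doomed.add(gid_b)
    return doomed, comparisons


ways = [
    (1, ((0.0, 0.0), (1.0, 0.0))),
    (2, ((1.0, 0.0), (1.0, 1.0))),
    (3, ((0.0, 0.0), (1.0, 0.0))),  # duplicate of way 1 from an overlapping file
    (4, ((2.0, 2.0), (3.0, 3.0))),
]
slow_result, slow_comparisons = duplicates_all_pairs(ways)
fast_result, fast_comparisons = duplicates_bucketed(ways)
```

Both functions mark the same gids for deletion, but the bucketed version performs far fewer geometry comparisons, and the gap grows with the size of the table.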

                • Message #1264

                  Sorry for the late reply. I also had problems with this statement, but on larger data. I'll replace the statement in the patch.

            • Message #1209

              We have completely rewritten osm2pgrouting in Java, so there are no more memory problems. You can now import any file with one tool. I'll post it soon.

              We imported Europe in about 10 hours.

              • Message #1265

                Sorry for not posting it 'soon' :)

                If anyone is interested in the tool, write to oliver.bindel[at]htw-saarland.de. Maybe there is no need for it in pgRouting: pgRouting runs out of memory while routing across Europe. I think pgRouting really isn't made for large data.

                • Message #1271

                  Hi,

                  We are very interested in your tool. Would you mind making it available for download and giving us a link? Or can we put it on this website for download?

                  Regarding the memory problem: you're right, but I have to note that it is not pgRouting that runs out of memory, but the hardware. And I'm not surprised. If you load all roads, even small and tiny ones, you will easily run out of memory. Long-distance routing needs a different approach, a data hierarchy for example. We are aware of this problem and it has been discussed many times here on the forum, but currently we don't have the resources to start such a project.

                  Cheers,

                  Anton.