developer tip

삽입 순서를 추적하는 std :: map?

copycodes 2020. 8. 22. 09:06
반응형

삽입 순서를 추적하는 std :: map?


현재 std::map<std::string,int>고유 한 문자열 식별자에 정수 값을 저장하는이 있으며 문자열을 검색합니다. 게재 신청서를 추적하지 않는다는 점을 제외하면 대부분 내가 원하는 작업을 수행합니다. 따라서 맵을 반복하여 값을 출력하면 문자열에 따라 정렬됩니다. 하지만 (첫 번째) 삽입 순서에 따라 정렬되기를 바랍니다.

vector<pair<string,int>>대신 a 사용하려고 생각 했지만 문자열을 찾아서 정수 값을 약 10,000,000 번 늘려야하므로 a std::vector가 상당히 느려질 지 여부를 알 수 없습니다 .

사용 방법이 있습니까, std::map아니면 std제 필요에 더 적합한 다른 용기가 있습니까?

[저는 GCC 3.4를 사용 중이며 아마도 내 값이 50 쌍 이하일 것 std::map입니다.].

감사.


std :: map에 50 개의 값만있는 경우 인쇄하기 전에 std :: vector에 복사하고 적절한 functor를 사용하여 std :: sort를 통해 정렬 할 수 있습니다.

또는 boost :: multi_index 사용할 수 있습니다 . 여러 인덱스를 사용할 수 있습니다. 귀하의 경우 다음과 같이 보일 수 있습니다.

struct value_t {
      string s;
      int    i;
};
struct string_tag {};
typedef multi_index_container<
    value_t,
    indexed_by<
        random_access<>, // this index represents insertion order
        hashed_unique< tag<string_tag>, member<value_t, string, &value_t::s> >
    >
> values_t;

a std::vectorstd::tr1::unordered_map(해시 테이블)을 결합 할 수 있습니다 . 여기에 링크의 부스트의 문서 에 대한이 unordered_map. 벡터를 사용하여 삽입 순서를 추적하고 해시 테이블을 사용하여 빈번한 조회를 수행 할 수 있습니다. 수십만 조회를 수행 std::map하는 경우 해시 테이블에 대한 O (log n) 조회 와 O (1)의 차이가 클 수 있습니다.

std::vector<std::string> insertOrder;
std::tr1::unordered_map<std::string, long> myTable;

// Initialize the hash table and record insert order.
myTable["foo"] = 0;
insertOrder.push_back("foo");
myTable["bar"] = 0;
insertOrder.push_back("bar");
myTable["baz"] = 0;
insertOrder.push_back("baz");

/* Increment things in myTable 100000 times */

// Print the final results.
for (int i = 0; i < insertOrder.size(); ++i)
{
    const std::string &s = insertOrder[i];
    std::cout << s << ' ' << myTable[s] << '\n';
}

평행을 유지하십시오 list<string> insertionOrder.

인쇄 할 시간이되면 목록을 반복 하고 지도를 조회합니다 .

each element in insertionOrder  // walks in insertionOrder..
    print map[ element ].second // but lookup is in map

Tessil은 MIT 라이센스 인 주문형 맵 (및 세트)을 매우 훌륭하게 구현했습니다. 여기에서 찾을 수 있습니다 : ordered-map

지도 예

#include <iostream>
#include <string>
#include <cstdlib>
#include "ordered_map.h"

int main() {
tsl::ordered_map<char, int> map = {{'d', 1}, {'a', 2}, {'g', 3}};
map.insert({'b', 4});
map['h'] = 5;
map['e'] = 6;

map.erase('a');


// {d, 1} {g, 3} {b, 4} {h, 5} {e, 6}
for(const auto& key_value : map) {
    std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
}


map.unordered_erase('b');

// Break order: {d, 1} {g, 3} {e, 6} {h, 5}
for(const auto& key_value : map) {
    std::cout << "{" << key_value.first << ", " << key_value.second << "}" << std::endl;
}
}

두 조회 전략이 모두 필요한 경우 두 개의 컨테이너로 끝납니다. vector실제 값 ( ints) 과 함께 a 사용하고 map< string, vector< T >::difference_type> 에 a를 넣어 인덱스를 벡터에 반환 할 수 있습니다.

이 모든 것을 완료하려면 두 가지를 하나의 클래스로 캡슐화 할 수 있습니다.

하지만 부스트에는 여러 인덱스 가있는 컨테이너가 있다고 생각 합니다.


You cannot do that with a map, but you could use two separate structures - the map and the vector and keep them synchronized - that is when you delete from the map, find and delete the element from the vector. Or you could create a map<string, pair<int,int>> - and in your pair store the size() of the map upon insertion to record position, along with the value of the int, and then when you print, use the position member to sort.


This is somewhat related to Faisals answer. You can just create a wrapper class around a map and vector and easily keep them synchronized. Proper encapsulation will let you control the access method and hence which container to use... the vector or the map. This avoids using Boost or anything like that.


Another way to implement this is with a map instead of a vector. I will show you this approach and discuss the differences:

Just create a class that has two maps behind the scenes.

#include <map>
#include <string>

using namespace std;

class SpecialMap {
  // usual stuff...

 private:
  int counter_;
  map<int, string> insertion_order_;
  map<string, int> data_;
};

You can then expose an iterator to iterator over data_ in the proper order. The way you do that is iterate through insertion_order_, and for each element you get from that iteration, do a lookup in the data_ with the value from insertion_order_

You can use the more efficient hash_map for insertion_order since you don't care about directly iterating through insertion_order_.

To do inserts, you can have a method like this:

void SpecialMap::Insert(const string& key, int value) {
  // This may be an over simplification... You ought to check
  // if you are overwriting a value in data_ so that you can update
  // insertion_order_ accordingly
  insertion_order_[counter_++] = key;
  data_[key] = value;
}

There are a lot of ways you can make the design better and worry about performance, but this is a good skeleton to get you started on implementing this functionality on your own. You can make it templated, and you might actually store pairs as values in data_ so that you can easily reference the entry in insertion_order_. But I leave these design issues as an exercise :-).

Update: I suppose I should say something about efficiency of using map vs. vector for insertion_order_

  • lookups directly into data, in both cases are O(1)
  • inserts in the vector approach are O(1), inserts in the map approach are O(logn)
  • deletes in the vector approach are O(n) because you have to scan for the item to remove. With the map approach they are O(logn).

Maybe if you are not going to use deletes as much, you should use the vector approach. The map approach would be better if you were supporting a different ordering (like priority) instead of insertion order.


// Should be like this man!

// This maintains the complexity of insertion is O(logN) and deletion is also O(logN).

class SpecialMap {
private:
  int counter_;
  map<int, string> insertion_order_;
  map<string, int> insertion_order_reverse_look_up; // <- for fast delete
  map<string, Data> data_;
};

What you want (without resorting to Boost) is what I call an "ordered hash", which is essentially a mashup of a hash and a linked list with string or integer keys (or both at the same time). An ordered hash maintains the order of the elements during iteration with the absolute performance of a hash.

I've been putting together a relatively new C++ snippet library that fills in what I view as holes in the C++ language for C++ library developers. Go here:

https://github.com/cubiclesoft/cross-platform-cpp

Grab:

templates/detachable_ordered_hash.cpp
templates/detachable_ordered_hash.h
templates/detachable_ordered_hash_util.h

If user-controlled data will be placed into the hash, you might also want:

security/security_csprng.cpp
security/security_csprng.h

Invoke it:

#include "templates/detachable_ordered_hash.h"
...
// The 47 is the nearest prime to a power of two
// that is close to your data size.
//
// If your brain hurts, just use the lookup table
// in 'detachable_ordered_hash.cpp'.
//
// If you don't care about some minimal memory thrashing,
// just use a value of 3.  It'll auto-resize itself.
int y;
CubicleSoft::OrderedHash<int> TempHash(47);
// If you need a secure hash (many hashes are vulnerable
// to DoS attacks), pass in two randomly selected 64-bit
// integer keys.  Construct with CSPRNG.
// CubicleSoft::OrderedHash<int> TempHash(47, Key1, Key2);
CubicleSoft::OrderedHashNode<int> *Node;
...
// Push() for string keys takes a pointer to the string,
// its length, and the value to store.  The new node is
// pushed onto the end of the linked list and wherever it
// goes in the hash.
y = 80;
TempHash.Push("key1", 5, y++);
TempHash.Push("key22", 6, y++);
TempHash.Push("key3", 5, y++);
// Adding an integer key into the same hash just for kicks.
TempHash.Push(12345, y++);
...
// Finding a node and modifying its value.
Node = TempHash.Find("key1", 5);
Node->Value = y++;
...
Node = TempHash.FirstList();
while (Node != NULL)
{
  if (Node->GetStrKey())  printf("%s => %d\n", Node->GetStrKey(), Node->Value);
  else  printf("%d => %d\n", (int)Node->GetIntKey(), Node->Value);

  Node = Node->NextList();
}

I ran into this SO thread during my research phase to see if anything like OrderedHash already existed without requiring me to drop in a massive library. I was disappointed. So I wrote my own. And now I've shared it.


Here is solution that requires only standard template library without using boost's multiindex:
You could use std::map<std::string,int>; and vector <data>; where in map you store the index of the location of data in vector and vector stores data in insertion order. Here access to data has O(log n) complexity. displaying data in insertion order has O(n) complexity. insertion of data has O(log n) complexity.

For Example:

#include<iostream>
#include<map>
#include<vector>

struct data{
int value;
std::string s;
}

typedef std::map<std::string,int> MapIndex;//this map stores the index of data stored 
                                           //in VectorData mapped to a string              
typedef std::vector<data> VectorData;//stores the data in insertion order

void display_data_according_insertion_order(VectorData vectorData){
    for(std::vector<data>::iterator it=vectorData.begin();it!=vectorData.end();it++){
        std::cout<<it->value<<it->s<<std::endl;
    }
}
int lookup_string(std::string s,MapIndex mapIndex){
    std::MapIndex::iterator pt=mapIndex.find(s)
    if (pt!=mapIndex.end())return it->second;
    else return -1;//it signifies that key does not exist in map
}
int insert_value(data d,mapIndex,vectorData){
    if(mapIndex.find(d.s)==mapIndex.end()){
        mapIndex.insert(std::make_pair(d.s,vectorData.size()));//as the data is to be
                                                               //inserted at back 
                                                               //therefore index is
                                                               //size of vector before
                                                               //insertion
        vectorData.push_back(d);
        return 1;
    }
    else return 0;//it signifies that insertion of data is failed due to the presence
                  //string in the map and map stores unique keys
}

One thing you need to consider is the small number of data elements you are using. It is possible that it will be faster to use just the vector. There is some overhead in the map that can cause it to be more expensive to do lookups in small data sets than the simpler vector. So, if you know that you will always be using around the same number of elements, do some benchmarking and see if the performance of the map and vector is what you really think it is. You may find the lookup in a vector with only 50 elements is near the same as the map.


Use boost::multi_index with map and list indices.


A map of pair (str,int) and static int that increments on insert calls indexes pairs of data. Put in a struct that can return the static int val with an index () member perhaps?

참고URL : https://stackoverflow.com/questions/1098175/a-stdmap-that-keep-track-of-the-order-of-insertion

반응형